Class: String
- Inherits: Object
- Defined in: lib/entity_converter.rb
Overview
Extends the String class to convert HTML entities. Use with care: the conversion cannot be reversed, because some entities are simply skipped as they are not useful for the search generation process.
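Why the conversion cannot be undone can be shown with a small stand-alone sketch. It does not use the class documented here. `DECODED` and `decode_entities` are hypothetical names, and the catch-all pattern mirrors the `&[a-zA-Z]{4,6};` rule used by `#decode_html_entities`. Two different inputs decode to the same string, so no encoder can tell which entity to restore.

```ruby
# Hypothetical stand-in for the decoding rule described on this page.
DECODED = {
  '&auml;' => 'ä', '&Auml;' => 'Ä',
  '&ouml;' => 'ö', '&Ouml;' => 'Ö',
  '&uuml;' => 'ü', '&Uuml;' => 'Ü',
  '&szlig;' => 'ß'
}.freeze

# Every named entity of 4 to 6 letters is looked up; unknown ones become a space.
def decode_entities(text)
  text.gsub(/&[a-zA-Z]{4,6};/) { |entity| DECODED.fetch(entity, ' ') }
end

a = decode_entities('Gr&uuml;&szlig;e&nbsp;Welt')
b = decode_entities('Gr&uuml;&szlig;e&copy;Welt')
puts a        # prints "Grüße Welt"
puts a == b   # prints "true": &nbsp; and &copy; both collapsed to a space
```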
Instance Method Summary
- #decode_html_entities ⇒ Object
  Converts encoded entities to their UTF-8 equivalents.
- #encode_html_entities ⇒ Object
  Encodes umlauts and ß as HTML entities.
- #umlaut_to_downcase ⇒ Object
  Converts uppercase umlauts to lowercase.
Instance Method Details
#decode_html_entities ⇒ Object
Converts encoded entities to their UTF-8 equivalents. Be careful: any other named entity of 4 to 6 letters (such as &nbsp;) is replaced by a space, because such entities are of no special use for the semantic search. Shorter entities such as &amp; and &lt; are left unchanged.
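```ruby
# File 'lib/entity_converter.rb', line 10
def decode_html_entities
  # mgsub (defined elsewhere, not shown on this page) applies [pattern, replacement] pairs.
  # Known umlaut and ß entities are decoded; any other named entity of
  # 4 to 6 letters is replaced by a space.
  mgsub([[/&auml;/, 'ä'], [/&Auml;/, 'Ä'],
         [/&ouml;/, 'ö'], [/&Ouml;/, 'Ö'],
         [/&uuml;/, 'ü'], [/&Uuml;/, 'Ü'],
         [/&szlig;/, 'ß'],
         [/&[a-zA-Z]{4,6};/, ' ']])
end
```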
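`mgsub` is called by all three methods, but it is not part of Ruby core and its definition is not shown on this page. The following is a minimal sketch of a single-pass multiple-substitution helper, assuming `mgsub` takes an array of `[Regexp, String]` pairs as the calls suggest; the project's actual helper may differ.

```ruby
# Sketch only: the project's real mgsub is not shown on this page.
# Assumes mgsub receives an array of [Regexp, String] pairs, as in the calls above.
class String
  def mgsub(pairs)
    # Anchored copies of each pattern, used to find which pair produced a match.
    anchored = pairs.map { |pattern, replacement| [/\A#{pattern}\z/, replacement] }
    # One combined alternation: the string is scanned once, so a replacement is
    # never re-matched by a later pattern, and earlier pairs win at the same position.
    gsub(Regexp.union(pairs.map(&:first))) do |match|
      pair = anchored.find { |pattern, _| pattern.match?(match) }
      # Keep the matched text if no pattern matches it in isolation
      # (e.g. patterns relying on lookaround or boundary assertions).
      pair ? pair.last : match
    end
  end
end

puts 'Gr&uuml;&szlig;e&nbsp;!'.mgsub([[/&uuml;/, 'ü'], [/&szlig;/, 'ß'],
                                       [/&[a-zA-Z]{4,6};/, ' ']])
# prints "Grüße !"
```

Combining the patterns into one alternation, rather than calling `gsub` once per pair, is what lets a specific entity such as `&uuml;` take priority over the catch-all pattern listed after it.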
#encode_html_entities ⇒ Object
Encodes umlauts and ß as named HTML entities. Entities that #decode_html_entities replaced by spaces cannot be restored.
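```ruby
# File 'lib/entity_converter.rb', line 15
def encode_html_entities
  # Replaces each umlaut and ß with its named HTML entity (mgsub is not shown on this page).
  mgsub([[/ä/, '&auml;'], [/Ä/, '&Auml;'],
         [/ö/, '&ouml;'], [/Ö/, '&Ouml;'],
         [/ü/, '&uuml;'], [/Ü/, '&Uuml;'],
         [/ß/, '&szlig;']])
end
```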
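Because encoding maps single characters to fixed strings, the same result can be sketched with Ruby's built-in `String#gsub(pattern, hash)` form instead of `mgsub`. This is an alternative, not the project's implementation; `ENTITY_FOR`, `ENCODABLE` and `encode_umlauts` are hypothetical names.

```ruby
# Hypothetical alternative to encode_html_entities; not the project's code.
# Maps each umlaut and ß to its named HTML entity.
ENTITY_FOR = {
  'ä' => '&auml;', 'Ä' => '&Auml;',
  'ö' => '&ouml;', 'Ö' => '&Ouml;',
  'ü' => '&uuml;', 'Ü' => '&Uuml;',
  'ß' => '&szlig;'
}.freeze

# An alternation of the hash keys; gsub replaces each match with its hash value.
ENCODABLE = Regexp.union(ENTITY_FOR.keys)

def encode_umlauts(text)
  text.gsub(ENCODABLE, ENTITY_FOR)
end

puts encode_umlauts('Übergröße')  # prints "&Uuml;bergr&ouml;&szlig;e"
```

Because the pattern lists exact characters, a plain `U` is never touched, and the lookup table keeps every character-to-entity pair in one place.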
#umlaut_to_downcase ⇒ Object
Converts the uppercase umlauts Ä, Ö and Ü to lowercase; all other characters are left unchanged.
```ruby
# File 'lib/entity_converter.rb', line 20
def umlaut_to_downcase
  # Only Ä, Ö and Ü are lowercased (mgsub is not shown on this page).
  mgsub([[/Ä/, 'ä'], [/Ö/, 'ö'], [/Ü/, 'ü']])
end
```
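Since only three fixed characters are mapped, `String#tr` can do the same in a single call. The following is a hedged alternative sketch, not the project's code; `umlauts_to_lowercase` is a hypothetical name.

```ruby
# Hypothetical alternative to umlaut_to_downcase; not the project's code.
# String#tr maps characters position by position: Ä to ä, Ö to ö, Ü to ü.
def umlauts_to_lowercase(text)
  text.tr('ÄÖÜ', 'äöü')
end

puts umlauts_to_lowercase('ÄPFEL ÖL ÜBER')  # prints "äPFEL öL üBER"
```

Unlike `String#downcase`, which in Ruby 2.4 and later lowercases all Unicode letters, this touches only the three umlauts, matching the documented behavior.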