Class: String

Inherits: Object
Defined in: lib/entity_converter.rb

Overview

Extends the String class to convert HTML entities. Use with care: the conversion cannot be reversed, because entities that are not useful for the search-generation process are skipped (replaced with spaces) rather than converted.

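Instance Method Summary

  • #decode_html_entities ⇒ Object
    Converts encoded entities to their UTF-8 equivalents.
  • #encode_html_entities ⇒ Object
    Encodes HTML entities.
  • #umlaut_to_downcase ⇒ Object
    Converts uppercase umlauts to lowercase.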

Instance Method Details

#decode_html_entities ⇒ Object

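This method converts the encoded entities for ä, Ä, ö, Ö, ü, Ü and ß to their UTF-8 equivalents. Be careful: it replaces every other named entity whose name is four to six letters long (for example &nbsp;) with a space, because such entities are of no special use for the semantic search. Entities with shorter names such as &amp; and &lt;, entities with longer names, and numeric references such as &#228; are left unchanged.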

# File 'lib/entity_converter.rb', line 10

def decode_html_entities
  mgsub([[/&auml;/,'ä'],[/&Auml;/,'Ä'],[/&ouml;/,'ö'],[/&Ouml;/,'Ö'],[/&uuml;/,'ü'],[/&Uuml;/,'Ü'],[/&szlig;/,'ß'],[/&[a-zA-Z]{4,6};/,' ']])
end
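
The page does not show where mgsub comes from, and it is not a core Ruby String method. The sketch below assumes it applies a list of [pattern, replacement] pairs in a single left-to-right pass and supplies a hypothetical implementation built on Regexp.union, so that the documented method can be run as written. The examples show which entities are converted, which are blanked out, and which survive untouched. Regexp#match? needs Ruby 2.4 or later.

```ruby
class String
  # Hypothetical mgsub: the real helper is not shown on this page.
  # Builds one alternation from all patterns and, for each match,
  # uses the replacement of the first pattern that matches that text.
  def mgsub(pairs)
    gsub(Regexp.union(pairs.map(&:first))) do |match|
      pairs.find { |pattern, _| pattern.match?(match) }.last
    end
  end

  # As documented: umlaut and sharp-s entities become UTF-8 characters,
  # any other named entity with a 4-6 letter name becomes a space.
  def decode_html_entities
    mgsub([[/&auml;/, 'ä'], [/&Auml;/, 'Ä'], [/&ouml;/, 'ö'], [/&Ouml;/, 'Ö'],
           [/&uuml;/, 'ü'], [/&Uuml;/, 'Ü'], [/&szlig;/, 'ß'],
           [/&[a-zA-Z]{4,6};/, ' ']])
  end
end

puts 'Gr&uuml;&szlig;e&nbsp;aus&nbsp;K&ouml;ln'.decode_html_entities # prints Grüße aus Köln
puts 'Fa&ccedil;ade'.decode_html_entities    # prints Fa ade (the ç is lost)
puts 'A &amp; B &#228;'.decode_html_entities # prints A &amp; B &#228; (unchanged)
```

Because &ccedil; falls under the catch-all pattern, its character is replaced by a space and cannot be recovered later, which is the one-way behavior the overview warns about.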

#encode_html_entities ⇒ Object

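Encodes the characters ä, Ä, ö, Ö, ü, Ü and ß as their named HTML entities (&auml;, &Auml;, &ouml;, &Ouml;, &uuml;, &Uuml; and &szlig;). All other characters are left unchanged.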

# File 'lib/entity_converter.rb', line 15

def encode_html_entities
  mgsub([[/ä/,'&auml;'],[/Ä/,'&Auml;'],[/ö/,'&ouml;'],[/Ö/,'&Ouml;'],[/ü/,'&uuml;'],[/Ü/,'&Uuml;'],[/ß/,'&szlig;']])
end
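
As a sketch of the same mapping without the mgsub helper: Ruby's String#gsub accepts a hash as its replacement, so a single character class plus a lookup table encodes all seven characters in one pass. This swaps in a different technique from the documented method, and the constant name UMLAUT_ENTITIES is hypothetical. Because the character class lists only these seven characters, a plain ASCII U is left alone.

```ruby
class String
  # Hypothetical lookup table: character => named HTML entity.
  UMLAUT_ENTITIES = {
    'ä' => '&auml;', 'Ä' => '&Auml;', 'ö' => '&ouml;', 'Ö' => '&Ouml;',
    'ü' => '&uuml;', 'Ü' => '&Uuml;', 'ß' => '&szlig;'
  }.freeze

  # Replaces each of the seven characters with its entity; everything else,
  # including ASCII letters such as "U", is copied unchanged.
  def encode_html_entities
    gsub(/[äÄöÖüÜß]/, UMLAUT_ENTITIES)
  end
end

puts 'Über Ulm nach Köln'.encode_html_entities # prints &Uuml;ber Ulm nach K&ouml;ln
```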

#umlaut_to_downcase ⇒ Object

Converts the uppercase umlauts Ä, Ö and Ü to lowercase (ä, ö, ü). All other characters, including ASCII capitals, are left unchanged.

# File 'lib/entity_converter.rb', line 20

def umlaut_to_downcase
  mgsub([[/Ä/,'ä'],[/Ö/,'ö'],[/Ü/,'ü']])
end
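
A minimal sketch of the same conversion using String#tr, which maps characters one to one and needs neither regular expressions nor mgsub; this is a swapped-in technique, not the documented implementation. On Ruby 2.4 and later, String#downcase also lowercases non-ASCII letters, but it lowercases every letter, whereas this method changes only Ä, Ö and Ü.

```ruby
class String
  # Maps Ä, Ö, Ü to ä, ö, ü; all other characters are left as they are.
  def umlaut_to_downcase
    tr('ÄÖÜ', 'äöü')
  end
end

puts 'ÄRGER ÜBER ÖL'.umlaut_to_downcase # prints äRGER üBER öL
puts 'ÄRGER'.downcase                   # on Ruby 2.4+ prints ärger
```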