Class: REXML::Text

Inherits:
Object
Defined in:
lib/crawler.rb

Overview

Extends REXML::Text with the functionality to extract semantic information.

Class Method Summary

Instance Method Summary

Class Method Details

.store_semantics(tags) ⇒ Object

Stores an array of meaningful tag names. The rank value of each tag is looked up in $config['crawler']['tags'] when #semantic_value runs.



# File 'lib/crawler.rb', line 157

def self.store_semantics(tags)
  @@semantic_tags = tags
end

Instance Method Details

#semantic_value ⇒ Object

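Extracts the semantic value of a text block. The rank starts at 1 and the method walks up the parent chain, adding the configured rank of each ancestor for as long as the ancestor is a semantic tag. Returns nil when a script element is encountered on the way.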



# File 'lib/crawler.rb', line 162

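def semantic_value
  # Lazily initialise the tag list from the crawler configuration.
  REXML::Text.store_semantics($config['crawler']['tags'].keys) unless defined?(@@semantic_tags)
  rank = 1
  node = parent
  # Text directly inside a script element carries no semantic value.
  return nil if node.name == 'script'
  # Walk up the ancestors while they are semantic tags, summing their ranks.
  while @@semantic_tags.include?(node.name)
    rank += $config['crawler']['tags'][node.name]
    node = node.parent
    return nil if node.name == 'script'
  end
  rank
end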
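
A minimal usage sketch may help to show how the rank accumulates. It repeats the two methods from above so that it runs on its own. The contents of $config (the tags h1 and b and their ranks) and the sample markup are assumptions made for illustration; they are not taken from lib/crawler.rb.

```ruby
require 'rexml/document'

# Hypothetical configuration: tag names and rank values are illustrative only.
$config = { 'crawler' => { 'tags' => { 'h1' => 5, 'b' => 2 } } }

# The extension as documented above, repeated so the sketch is self-contained.
class REXML::Text
  def self.store_semantics(tags)
    @@semantic_tags = tags
  end

  def semantic_value
    REXML::Text.store_semantics($config['crawler']['tags'].keys) unless defined?(@@semantic_tags)
    rank = 1
    node = parent
    return nil if node.name == 'script'
    while @@semantic_tags.include?(node.name)
      rank += $config['crawler']['tags'][node.name]
      node = node.parent
      return nil if node.name == 'script'
    end
    rank
  end
end

# Sample markup, also an assumption for this sketch.
xml = '<html><body><h1><b>Title</b></h1><script>var x;</script><p>plain</p></body></html>'
doc = REXML::Document.new(xml)

title  = REXML::XPath.first(doc, '//b').get_text      # nested in b and h1
script = REXML::XPath.first(doc, '//script').get_text # inside a script element
plain  = REXML::XPath.first(doc, '//p').get_text      # no semantic ancestor

p title.semantic_value   # 1 + 2 (b) + 5 (h1) = 8
p script.semantic_value  # nil
p plain.semantic_value   # 1
```

The walk stops at the first ancestor that is not a configured tag, so under these assumptions a semantic tag further up the tree, above a non-semantic one, would not contribute to the rank.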