Class: IMW::Parsers::HtmlMatchers::MatchHash
- Inherits: Object
- Defined in: lib/imw/parsers/html_parser/matchers.rb
Overview
Class for building a hash of values by using appropriate matchers against an HTML document.
Instance Attribute Summary
- #match_hash ⇒ Object
  Returns the value of attribute match_hash.
Class Method Summary
- .scrub!(hsh) ⇒ Object
  Meant to kill off keys with nil values; the filtering is currently commented out, so hsh is returned unchanged.
Instance Method Summary
- #initialize(match_hash) ⇒ MatchHash (constructor)
  The match_hash must be a Hash of symbols matched to HTML matchers (subclasses of IMW::Parsers::HtmlMatchers::Matcher).
- #match(doc) ⇒ Object
  Use the match_hash this MatchHash was initialized with to select elements from doc and extract information from them.
Constructor Details
#initialize(match_hash) ⇒ MatchHash
The match_hash must be a Hash of symbols matched to HTML matchers (subclasses of IMW::Parsers::HtmlMatchers::Matcher). An error is raised if match_hash is not a Hash.
# File 'lib/imw/parsers/html_parser/matchers.rb', line 214
def initialize match_hash
  # Kludge? maybe.
  raise "MatchHash requires a hash of :attributes => matchers." unless match_hash.is_a?(Hash)
  self.match_hash = match_hash
end
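The type check in the constructor can be tried without loading the rest of the library. The sketch below uses a hypothetical stand-in class, SketchMatchHash, that copies only the guard from the source above. It is not the IMW class, and the attr_accessor is an assumption inferred from the reader and the `self.match_hash =` assignment.

```ruby
# Hypothetical stand-in: reproduces only the constructor guard shown above.
class SketchMatchHash
  # Assumed accessor; the original exposes a reader and assigns via a writer.
  attr_accessor :match_hash

  def initialize(match_hash)
    raise "MatchHash requires a hash of :attributes => matchers." unless match_hash.is_a?(Hash)
    self.match_hash = match_hash
  end
end

# A Hash is accepted and stored as given (the value here is a placeholder,
# not a real matcher).
SketchMatchHash.new({ :name => :placeholder_matcher })

# Anything else raises. `raise` with a String produces a RuntimeError.
begin
  SketchMatchHash.new([:name, :placeholder_matcher])
rescue RuntimeError => e
  puts e.message
end
```

Note that the guard only checks the container type. It does not verify that the values respond to match, so a bad matcher would surface later, when #match is called.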
Instance Attribute Details
#match_hash ⇒ Object
Returns the value of attribute match_hash.
# File 'lib/imw/parsers/html_parser/matchers.rb', line 209
def match_hash
  @match_hash
end
Class Method Details
.scrub!(hsh) ⇒ Object
Meant to kill off keys with nil values. The reject call in the body is commented out, so hsh is currently returned unchanged.
# File 'lib/imw/parsers/html_parser/matchers.rb', line 252
def self.scrub! hsh
  hsh # .reject{|k,v| v.nil? }
end
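For comparison, here is a minimal sketch of what the method would do if the commented-out reject were enabled. The helper name scrub_nils is hypothetical and not part of IMW; it only applies core Ruby's Hash#reject as written in the comment.

```ruby
# Hypothetical helper: the behavior the commented-out code would give.
# Hash#reject returns a new Hash and leaves the receiver untouched.
def scrub_nils(hsh)
  hsh.reject { |_key, value| value.nil? }
end

scrub_nils({ :name => "John Chimpo", :order_status => nil })
# => {:name=>"John Chimpo"}

# Only nil is dropped; false and empty values are kept.
scrub_nils({ :shipped => false, :products => [] })
# => {:shipped=>false, :products=>[]}
```

Despite the bang in the original method name, reject is non-destructive; an in-place variant would use reject! or delete_if instead.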
Instance Method Details
#match(doc) ⇒ Object
Use the match_hash this MatchHash was initialized with to select elements from doc and extract information from them:
m = MatchHash.new({
  :name         => MatchFirstElement.new('li/span.customer'),
  :order_status => MatchAttribute.new('li/ul[@status]','status'),
  :products     => MatchArray.new('li/ul/li')
})
m.match('<li><span class="customer">John Chimpo</span>
  <ul status="shipped">
    <li>bananas</li>
    <li>mangos</li>
    <li>banangos</li>
  </ul></li>')
# => {
#   :name         => "John Chimpo",
#   :order_status => "shipped",
#   :products     => ["bananas", "mangos", "banangos"]
# }
# File 'lib/imw/parsers/html_parser/matchers.rb', line 239
def match doc
  doc = Hpricot(doc) if doc.is_a?(String)
  hsh = { }
  match_hash.each do |attr, m|
    val = m.match(doc)
    case attr
    when Array then hsh.merge!(Hash.zip(attr, val).reject{|k,v| v.nil? }) if val
    else hsh[attr] = val
    end
  end
  self.class.scrub!(hsh)
end
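The source above also handles a case the usage example does not show: a key that is an Array of names, paired with a matcher that returns several values. The sketch below walks through that branch under stated assumptions. FixedMatcher and sketch_match are hypothetical stand-ins, the sample values are made up, and Hash.zip (not a core Ruby method) is assumed to pair keys with values by position, which is approximated here with Hash[keys.zip(values)].

```ruby
# Hypothetical stand-in matcher: returns a canned value instead of
# querying a parsed document.
class FixedMatcher
  def initialize(value)
    @value = value
  end

  def match(_doc)
    @value
  end
end

# Mirrors the loop in #match, without Hpricot parsing or scrub!.
def sketch_match(match_hash, doc)
  hsh = {}
  match_hash.each do |attr, matcher|
    val = matcher.match(doc)
    if attr.is_a?(Array)
      # Array key: pair each name with the value at the same position,
      # dropping pairs whose value is nil. A nil result skips the key entirely.
      hsh.merge!(Hash[attr.zip(val)].reject { |_k, v| v.nil? }) if val
    else
      # Single key: the value is stored even when it is nil.
      hsh[attr] = val
    end
  end
  hsh
end

sketch_match(
  {
    :name             => FixedMatcher.new("John Chimpo"),
    [:city, :country] => FixedMatcher.new(["Austin", nil]),
    [:phone, :fax]    => FixedMatcher.new(nil)
  },
  "<li>ignored by the stand-in matchers</li>"
)
# => {:name=>"John Chimpo", :city=>"Austin"}
```

One consequence of this design is that nil handling differs by key type: nil values under an Array key are filtered out, while a nil under a single key stays in the result.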