Method: Wgit::Document.define_extractor
- Defined in:
- lib/wgit/document.rb
.define_extractor(var, xpath, opts = {}) {|value, source, type| ... } ⇒ Symbol
Defines a content extractor, which extracts HTML elements/content
into instance variables upon Document initialization. See the default
extractors defined in 'document_extractors.rb' as examples. Defining an
extractor means that every subsequently crawled/initialized document
will attempt to extract the xpath's content. Use #extract for a one-off
content extraction on any document.
Note that defined extractors work for Documents initialized from HTML (via Wgit::Crawler methods) as well as from database objects. Once defined, an extractor initializes a private instance variable with the xpath or database object result(s).
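The registration mechanism described above can be sketched without Wgit itself. The following is a minimal model, not the library's implementation: `MiniDocument` is a hypothetical stand-in for Wgit::Document, it reads from a plain Hash rather than HTML or a real database object, and it omits the xpath and block handling entirely.

```ruby
# Hypothetical stand-in for Wgit::Document; not part of Wgit.
class MiniDocument
  @extractors = []

  class << self
    attr_reader :extractors

    # Registers an extractor and defines a private
    # init_<var>_from_object instance method for it.
    def define_extractor(var, singleton: true)
      var = var.to_sym

      func_name = define_method("init_#{var}_from_object") do |obj|
        # The stored value is taken as is; singleton only picks the default.
        default = singleton ? nil : []
        instance_variable_set("@#{var}", obj.fetch(var.to_s, default))
      end
      private func_name

      @extractors << var
      var
    end
  end

  # Runs every extractor defined so far against the given Hash.
  def initialize(obj)
    self.class.extractors.each { |var| send("init_#{var}_from_object", obj) }
  end
end

MiniDocument.define_extractor(:title)
MiniDocument.define_extractor(:links, singleton: false)

doc = MiniDocument.new('title' => 'Home')
```

In this sketch `doc` ends up with `@title` set from the Hash and `@links` set to the empty default, and only documents created after a `define_extractor` call pick up that extractor.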
When initializing from HTML, a singleton value of true returns only
the first result found; otherwise all results are returned in an
Enumerable. When initializing from a database object, the value is
taken as is and singleton is only used to define the default empty
value. If a value cannot be found (in either the HTML or database
object), the default is used instead:
singleton ? nil : [].
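The singleton rule for the HTML case can be illustrated with a small helper. This is a sketch under stated assumptions: `pick_result` is a hypothetical name that does not exist in Wgit, and `results` stands in for whatever the xpath query returned.

```ruby
# Hypothetical helper, not part of Wgit: chooses the value an extractor
# would store, given the raw results found in the HTML.
def pick_result(results, singleton:)
  default = singleton ? nil : []
  return default if results.nil? || results.empty?

  singleton ? results.first : results
end

pick_result(%w[a b], singleton: true)   # => "a"
pick_result(%w[a b], singleton: false)  # => ["a", "b"]
pick_result([], singleton: true)        # => nil
pick_result([], singleton: false)       # => []
```

For a database object the stored value would be used unchanged, so only the two default cases above would apply.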
# File 'lib/wgit/document.rb', line 139

def self.define_extractor(var, xpath, opts = {}, &block)
  var = var.to_sym
  defaults = { singleton: true, text_content_only: true }
  opts = defaults.merge(opts)

  raise "var must match #{REGEX_EXTRACTOR_NAME}" unless \
  var =~ REGEX_EXTRACTOR_NAME

  # Define the private init_*_from_html method for HTML.
  # Gets the HTML's xpath value and creates a var for it.
  func_name = Document.send(:define_method, "init_#{var}_from_html") do
    result = extract_from_html(xpath, **opts, &block)
    init_var(var, result)
  end
  Document.send(:private, func_name)

  # Define the private init_*_from_object method for a Database object.
  # Gets the Object's 'key' value and creates a var for it.
  func_name = Document.send(
    :define_method, "init_#{var}_from_object"
  ) do |obj|
    result = extract_from_object(
      obj, var.to_s, singleton: opts[:singleton], &block
    )
    init_var(var, result)
  end
  Document.send(:private, func_name)

  @extractors << var
  var
end