Method: Wgit::Document#extract_from_html

Defined in:
lib/wgit/document.rb

#extract_from_html(xpath, singleton: true, text_content_only: true) {|value, source, type| ... } ⇒ String, Object (protected)

Extracts a value/object from this Document's @html using the given xpath parameter.

Parameters:

  • xpath (String, #call, nil)

    The XPath used to find the value/object in @html. If the argument responds to #call, it is called and its return value is used as the XPath. Passing nil skips the HTML extraction, which isn't always needed.

  • singleton (Boolean) (defaults to: true)

    If true, only the first matching result is returned (a single Object); if false, all matching results are returned (an Enumerable).

  • text_content_only (Boolean) (defaults to: true)

    If true, the text content of each result is returned (String); if false, the result itself is returned (Nokogiri object).
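
The way the two flags combine can be sketched without Nokogiri. The snippet below is an illustration only: StubNode and shape_result are hypothetical names that are not part of Wgit, and a plain Array stands in for the set of matched nodes.

```ruby
# Hypothetical stand-in for a Nokogiri node: it only needs to respond to #content.
StubNode = Struct.new(:content)

# Hypothetical helper (not part of Wgit) that mirrors how the two flags
# shape the value, given an Array of already-matched nodes.
def shape_result(nodes, singleton: true, text_content_only: true)
  result = singleton ? nodes.first : nodes
  return result unless result && text_content_only

  singleton ? result.content : result.map(&:content)
end

nodes = [StubNode.new("First"), StubNode.new("Second")]

shape_result(nodes)                            # => "First"
shape_result(nodes, singleton: false)          # => ["First", "Second"]
shape_result(nodes, text_content_only: false)  # => the first StubNode itself
shape_result([])                               # => nil
shape_result([], singleton: false)             # => []
```

The last two calls show where the no-match defaults come from: an empty match yields nil for a singleton and an empty collection otherwise.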

Yields:

  • (value, source, type)

    Optionally pass a block to read or modify the result value before it's returned.

Yield Parameters:

  • value (Object)

    The result value to be returned.

  • source (Wgit::Document, Object)

    This Document instance.

  • type (Symbol)

    The source type, which is :document.

Yield Returns:

  • (Object)

    The block's return value becomes the method's return value. Return the block's value param unchanged if you only want to inspect it.
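
The block contract above can be sketched in isolation. In this illustration, finish is a hypothetical helper that reproduces only the final yield step of the method, and a bare Object stands in for the Wgit::Document instance.

```ruby
# Hypothetical helper (not part of Wgit) reproducing only the yield step:
# the block receives (value, source, type) and its return value wins.
def finish(result, source)
  result = yield(result, source, :document) if block_given?
  result
end

source = Object.new # stands in for the Wgit::Document instance

# Inspect only: hand the value back unchanged.
seen_type = nil
inspected = finish("Hello", source) do |value, _source, type|
  seen_type = type
  value
end
# inspected => "Hello", seen_type => :document

# Modify: whatever the block returns replaces the result.
rewritten = finish("Hello", source) { |value, _source, _type| value.upcase }
# rewritten => "HELLO"

# Pitfall: a block whose last expression is nil (for example, one that
# ends with a puts call) discards the extracted value.
lost = finish("Hello", source) { |_value, _source, _type| nil }
# lost => nil
```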

Returns:

  • (String, Object)

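    The value found in the HTML. If nothing matches, the result is nil when singleton is true and empty when singleton is false. If xpath is nil, extraction is skipped and the result starts out as nil.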



# File 'lib/wgit/document.rb', line 661

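def extract_from_html(xpath, singleton: true, text_content_only: true)
  result = nil

  if xpath
    # A callable xpath is resolved lazily, at extraction time.
    xpath  = xpath.call if xpath.respond_to?(:call)
    # singleton returns the first match (or nil); otherwise all matches.
    result = singleton ? at_xpath(xpath) : xpath(xpath)
  end

  # Reduce the Nokogiri result(s) to text content when requested.
  if result && text_content_only
    result = singleton ? result.content : result.map(&:content)
  end

  result = Wgit::Utils.sanitize(result)
  # An optional block can inspect or replace the result before it's returned.
  result = yield(result, self, :document) if block_given?
  result
end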
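
The xpath handling at the top of the method accepts a String, a callable, or nil. A minimal sketch of that resolution step follows; resolve_xpath is a hypothetical name used for illustration and is not part of Wgit.

```ruby
# Hypothetical helper (not part of Wgit) mirroring the xpath resolution:
# callables are invoked, everything else is passed through as-is.
def resolve_xpath(xpath)
  xpath.respond_to?(:call) ? xpath.call : xpath
end

resolve_xpath("//title")       # => "//title"
resolve_xpath(-> { "//h1" })   # => "//h1"
resolve_xpath(nil)             # => nil (the method then skips extraction)
```

A callable is useful when the XPath depends on state that is only known at extraction time, since it is not evaluated until the method runs.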