Class: Scrubyt::SimpleExampleLookup

Inherits:

Object

Object
Scrubyt::SimpleExampleLookup

show all

Defined in:: lib/scrubyt/utils/simple_example_lookup.rb

Overview

`Lookup of simple examples`

There are two types of string examples in scRUBYt! right now: the simple example and the compound example.

This class is responsible for finding elements matched by simple examples. In the futre probably more sophisticated matching algorithms will be added (e.g. match the n-th which matches the text, or element that matches the text but also contains a specific attribute etc.)

Class Method Summary collapse

.find_node_from_text(doc, text, next_link = false, index = 0) ⇒ Object

From the example text defined by the user, find the lowest possible node which contains the text ‘text’.

Class Method Details

.find_node_from_text(doc, text, next_link = false, index = 0) ⇒ `Object`

From the example text defined by the user, find the lowest possible node which contains the text ‘text’. The text can be also a mixed content text, e.g.

<a>Bon nuit, monsieur!</a>

In this case, <a>‘s text is considered to be “Bon nuit, monsieur”

# File 'lib/scrubyt/utils/simple_example_lookup.rb', line 17

def self.find_node_from_text(doc, text, next_link=false, index = 0)
  text.gsub!('»', '&#187;')
  #Process immediate attribute extraction (like "go to google.com/@href")
  if text =~ /.+\/@.+$/
    text = text.scan(/^(.+?)\/@.+$/)[0][0]
  elsif text =~ /.+\[\d+\]$/
    res = text.scan(/(.+)\[(\d+)\]$/)
    text = res[0][0]
    index = res[0][1].to_i
  elsif text =~ /.+\[.+\]$/
    final_element_name = text.scan(/^(.+?)\[/)[0][0]
    text = text.scan(/\[(.+?)\]/)[0][0]
  end
  if final_element_name
    text = Regexp.escape(text) if text.is_a? String
    result = SharedUtils.traverse_for_match(doc,/#{text}/)[index]
    result = XPathUtils.traverse_up_until_name(result,final_element_name)
  else
    text = Regexp.escape(text) if text.is_a? String
    result = SharedUtils.traverse_for_match(doc,/^#{text}$/)[index]
  end
end