Class: Scrubyt::SimpleExampleLookup

Inherits:
Object
Defined in:
lib/scrubyt/utils/simple_example_lookup.rb

Overview

Lookup of simple examples

There are two types of string examples in scRUBYt! right now: the simple example and the compound example.

This class is responsible for finding elements matched by simple examples. In the future, more sophisticated matching algorithms will probably be added (e.g. matching the n-th element that matches the text, or an element that matches the text and also has a specific attribute).

Class Method Details

.find_node_from_text(doc, text, next_link = false, index = 0) ⇒ Object

From the example text defined by the user, find the lowest possible node that contains the text 'text'. The text may also be mixed content, e.g.

<a>Bon nuit, monsieur!</a>

In this case, <a>'s text is considered to be "Bon nuit, monsieur".
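The idea of matching against flattened mixed content can be sketched as follows. This is a minimal illustration, not scRUBYt!'s implementation: `Node`, `inner_text` and `lowest_node_with_text` are hypothetical names, and the nested `<b>` element inside the link is an assumption made only to show text being joined across child elements.

```ruby
# Hypothetical stand-in for a parsed element: children are Strings or Nodes.
Node = Struct.new(:name, :children) do
  # Text of the element with all nested markup flattened away.
  def inner_text
    children.map { |c| c.is_a?(Node) ? c.inner_text : c }.join
  end
end

# Depth-first search that visits children before their parent, so the
# deepest node whose flattened text equals +text+ is returned (or nil).
def lowest_node_with_text(node, text)
  node.children.each do |child|
    next unless child.is_a?(Node)
    found = lowest_node_with_text(child, text)
    return found if found
  end
  node.inner_text == text ? node : nil
end

# Assumed markup: <div><p>Hello</p><a>Bon <b>nuit</b>, monsieur!</a></div>
link  = Node.new('a', ['Bon ', Node.new('b', ['nuit']), ', monsieur!'])
page  = Node.new('div', [Node.new('p', ['Hello']), link])
match = lowest_node_with_text(page, 'Bon nuit, monsieur!')
puts match.name
```

Here the `<a>` node is found even though no single text node holds the whole phrase, because the comparison is made against the concatenated text of the element.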



# File 'lib/scrubyt/utils/simple_example_lookup.rb', line 17

def self.find_node_from_text(doc, text, next_link=false, index = 0)
  text.gsub!('»', '&#187;')
  #Process immediate attribute extraction (like "go to google.com/@href")
  if text =~ /.+\/@.+$/
    text = text.scan(/^(.+?)\/@.+$/)[0][0]
  elsif text =~ /.+\[\d+\]$/
    res = text.scan(/(.+)\[(\d+)\]$/)
    text = res[0][0]
    index = res[0][1].to_i
  elsif text =~ /.+\[.+\]$/
    final_element_name = text.scan(/^(.+?)\[/)[0][0]
    text = text.scan(/\[(.+?)\]/)[0][0]
  end
  if final_element_name
    text = Regexp.escape(text) if text.is_a? String
    result = SharedUtils.traverse_for_match(doc,/#{text}/)[index]
    result = XPathUtils.traverse_up_until_name(result,final_element_name)
  else
    text = Regexp.escape(text) if text.is_a? String
    result = SharedUtils.traverse_for_match(doc,/^#{text}$/)[index]
  end
end
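
The three example forms handled by the listing above (`text/@attribute`, `text[n]` and `element[text]`) can be isolated into a small standalone sketch. `parse_example` is a hypothetical helper, not part of scRUBYt!. It reuses the regular expressions from the listing but works on a copy of the input, and it stops before the document traversal, which depends on `SharedUtils` and `XPathUtils`.

```ruby
# Hypothetical helper (not part of scRUBYt!): split an example string into
# the pieces that find_node_from_text derives from it.
def parse_example(example, index = 0)
  # gsub returns a new string, so the caller's argument is left untouched.
  text = example.gsub('»', '&#187;')
  element = nil

  if text =~ /.+\/@.+$/            # "text/@attribute": drop the attribute part
    text = text[/^(.+?)\/@.+$/, 1]
  elsif text =~ /.+\[\d+\]$/       # "text[n]": index into the list of matches
    index = text[/\[(\d+)\]$/, 1].to_i
    text  = text[/(.+)\[\d+\]$/, 1]
  elsif text =~ /.+\[.+\]$/        # "element[text]": climb to a named ancestor
    element = text[/^(.+?)\[/, 1]
    text    = text[/\[(.+?)\]/, 1]
  end

  escaped = Regexp.escape(text)
  # Anchored match for plain examples, substring match when an element is named.
  pattern = element ? /#{escaped}/ : /^#{escaped}$/
  { text: text, index: index, element: element, pattern: pattern }
end

p parse_example('go to google.com/@href')
p parse_example('Next[2]')
p parse_example('td[Price]')
```

Note that the original method calls `gsub!` on its argument and therefore modifies the caller's string in place. The sketch avoids that by using the non-destructive `gsub`.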