Class: Scrubyt::XPathUtils

Inherits:
Object
Defined in:
lib/scrubyt/utils/xpathutils.rb

Overview

Various XPath utility functions

Class Method Details

.find_image(doc, example, index = 0) ⇒ Object

Find an image based on the src of the example

parameters

doc - The containing document

example - The value of the src attribute of the img tag. This is convenient: if the user right-clicks an image and copies the image location, that string lands on the clipboard and can be pasted in as the example.

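index - Several images on the page may share the same src. The user will most often want the first one (index 0); pass a different index to select another. An index appended to the example in the form [n], directly after a jpg, png, gif, or jpeg extension, overrides this argument.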



# File 'lib/scrubyt/utils/xpathutils.rb', line 96

def self.find_image(doc, example, index=0)
  if example =~ /\.(jpg|png|gif|jpeg)(\[\d+\])$/
    res = example.scan(/(.+)\[(\d+)\]$/)
    example = res[0][0]
    index = res[0][1].to_i
  end
  (doc/"//img[@src='#{example}']")[index]
end
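
The index-suffix handling can be tried without a document. The following is a minimal sketch that reproduces only the string parsing; `split_image_example` is a hypothetical name used for illustration and is not part of Scrubyt.

```ruby
# Hypothetical helper, not part of Scrubyt: mirrors the suffix parsing
# that find_image performs before it queries the document.
def split_image_example(example, index = 0)
  if example =~ /\.(jpg|png|gif|jpeg)(\[\d+\])$/
    res = example.scan(/(.+)\[(\d+)\]$/)
    example = res[0][0]      # the src without the trailing [n]
    index = res[0][1].to_i   # the n from the trailing [n]
  end
  [example, index]
end

src, idx = split_image_example("http://example.com/logo.png[2]")
puts "#{src} (index #{idx})"
```

An example without a suffix is returned unchanged, together with the index that was passed in.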

.find_nearest_node_with_attribute(node, attribute) ⇒ Object

Used when automatically looking up href attributes (for detail or next links). If the detail pattern did not extract a link, we first look at the node's children, and if no link is found there, we traverse up through its parents.



# File 'lib/scrubyt/utils/xpathutils.rb', line 122

def self.find_nearest_node_with_attribute(node, attribute)
  @node = nil
  return node if node.is_a? Hpricot::Elem and node[attribute]
  first_child_node_with_attribute(node, attribute)
  first_parent_node_with_attribute(node, attribute) if !@node
  @node
end
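
The helpers first_child_node_with_attribute and first_parent_node_with_attribute are not shown on this page, so the lookup order can only be sketched under assumptions. The version below uses a hypothetical `AttrNode` struct in place of Hpricot elements and assumes a depth-first search through descendants before walking up; the real helpers may differ.

```ruby
# Hypothetical stand-in for an Hpricot element (an assumption for this sketch).
AttrNode = Struct.new(:name, :attrs, :parent, :children)

def add_child(parent, name, attrs = {})
  child = AttrNode.new(name, attrs, parent, [])
  parent.children << child
  child
end

# Assumed order: depth-first through the descendants.
def first_descendant_with(node, attribute)
  node.children.each do |child|
    return child if child.attrs.key?(attribute)
    hit = first_descendant_with(child, attribute)
    return hit if hit
  end
  nil
end

# Walk up the parent chain until a node carries the attribute.
def first_ancestor_with(node, attribute)
  node = node.parent
  node = node.parent until node.nil? || node.attrs.key?(attribute)
  node
end

def sketch_nearest_with_attribute(node, attribute)
  return node if node.attrs.key?(attribute)
  first_descendant_with(node, attribute) || first_ancestor_with(node, attribute)
end

root = AttrNode.new("div", {}, nil, [])
link = add_child(root, "a", "href" => "/detail")
span = add_child(link, "span")
puts sketch_nearest_with_attribute(span, "href").attrs["href"]
```

Here the span has no href and no children, so the search falls back to its parent link.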

.generate_generalized_relative_XPath(elem, relative_root) ⇒ Object

Generate a generalized XPath (i.e. without indices) of the node, relative to the given relative_root.

For example, if the elem's absolute XPath is /a/b/c and the relative root's XPath is /a/b, the result of the function will be /c.



# File 'lib/scrubyt/utils/xpathutils.rb', line 77

def self.generate_generalized_relative_XPath( elem,relative_root )
  return nil if (elem == relative_root)
  generate_XPath(elem, relative_root, false)
end

.generate_relative_XPath(elem, relative_root) ⇒ Object

Generate an XPath of the node with indices, relative to the given relative_root.

For example, if the elem's absolute XPath is /a/b/c and the relative root's XPath is /a/b, the result of the function will be the c step with its index, in the form /c[n].



# File 'lib/scrubyt/utils/xpathutils.rb', line 66

def self.generate_relative_XPath( elem,relative_root )
  return nil if (elem == relative_root)
  generate_XPath(elem, relative_root, true)
end

.generate_relative_XPath_from_XPaths(parent_xpath, child_xpath) ⇒ Object

Generate a relative XPath from two XPaths: a parent one (which points higher in the tree) and a child one. The result is the XPath of the node pointed to by the second XPath, relative to the node pointed to by the first XPath.



# File 'lib/scrubyt/utils/xpathutils.rb', line 134

def self.generate_relative_XPath_from_XPaths(parent_xpath, child_xpath)
  original_child_xpath_parts = child_xpath.split('/').reject{|s|s==""}
  pairs = to_general_XPath(child_xpath).split('/').reject{|s|s==""}.zip to_general_XPath(parent_xpath).split('/').reject{|s|s==""}
  i = 0
  pairs.each_with_index do |pair,index|
    i = index
    break if pair[0] != pair[1]
  end
  "/" + original_child_xpath_parts[i..-1].join('/')
end
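
To see what the method returns for concrete inputs, the sketch below repeats the same steps on plain strings. to_general_XPath is not shown on this page; `strip_indices` is a hypothetical stand-in that assumes it simply removes bracketed indices, and `relative_from_xpaths` is likewise an illustrative name.

```ruby
# Assumption: stand-in for XPathUtils.to_general_XPath (not shown here),
# taken to mean "remove every [n] index".
def strip_indices(xpath)
  xpath.gsub(/\[\d+\]/, '')
end

def relative_from_xpaths(parent_xpath, child_xpath)
  child_parts    = child_xpath.split('/').reject { |s| s.empty? }
  general_child  = strip_indices(child_xpath).split('/').reject { |s| s.empty? }
  general_parent = strip_indices(parent_xpath).split('/').reject { |s| s.empty? }
  i = 0
  # Compare the two paths step by step and stop at the first difference.
  general_child.zip(general_parent).each_with_index do |(child, parent), index|
    i = index
    break if child != parent
  end
  "/" + child_parts[i..-1].join('/')
end

puts relative_from_xpaths("/html/body/table", "/html/body/table/tr[1]/td[2]")
```

Because the loop records the last index it visited, two identical paths do not produce an empty result: the last step of the child path is returned.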

.generate_XPath(node, stopnode = nil, write_indices = false) ⇒ Object

Generate XPath for the given node

parameters

node - The node we are looking up the XPath for

stopnode - XPath generation stops when this node is reached, and the XPath generated so far is returned. If the stopnode is never reached, the result is nil.

write_indices - whether the index inside the parent should be added to each step, in the form name[index]



# File 'lib/scrubyt/utils/xpathutils.rb', line 35

def self.generate_XPath(node, stopnode=nil, write_indices=false)
  path = []
  indices = []
  found = false
  while !node.nil? && node.class != Hpricot::Doc do
    if node == stopnode
      found = true
      break
    end
    path.push node.name
    indices.push find_index(node) if write_indices
    node = node.parent
  end
  # This condition ensures that if there is a stopnode and we did not find it along the way,
  # we return nil (since the stopnode is not an ancestor of the node at all)
  return nil if stopnode != nil && !found
  result = ""
  if write_indices
    path.reverse.zip(indices.reverse).each { |node,index| result += "#{node}[#{index}]/" }
  else
    path.reverse.each{ |node| result += "#{node}/" }
  end
  "/" + result.chop
end
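
The effect of stopnode is easiest to see on a small tree. The sketch below covers only the index-free case and uses a hypothetical `PathNode` struct instead of Hpricot elements; a nil parent plays the role of reaching Hpricot::Doc. find_index is not shown on this page, so indices are left out rather than guessed.

```ruby
# Hypothetical stand-in for an Hpricot element: only a name and a parent.
PathNode = Struct.new(:name, :parent)

# Sketch of the upward walk without indices.
def sketch_xpath(node, stopnode = nil)
  path = []
  found = false
  until node.nil?
    if node.equal?(stopnode)
      found = true
      break
    end
    path.unshift(node.name)   # prepend, so the path reads root-first
    node = node.parent
  end
  # A stopnode that was never reached is not an ancestor of the node.
  return nil if !stopnode.nil? && !found
  "/" + path.join("/")
end

html  = PathNode.new("html", nil)
body  = PathNode.new("body", html)
table = PathNode.new("table", body)
td    = PathNode.new("td", PathNode.new("tr", table))

puts sketch_xpath(td)          # absolute path
puts sketch_xpath(td, table)   # path relative to table
p sketch_xpath(body, table)    # table is not an ancestor of body
```

The three calls show the absolute path, the path relative to an ancestor, and the nil result when the stopnode lies outside the ancestor chain.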

.lowest_common_ancestor(node1, node2) ⇒ Object

Find the LCA (Lowest Common Ancestor) of two nodes



# File 'lib/scrubyt/utils/xpathutils.rb', line 10

def self.lowest_common_ancestor(node1, node2)
  path1 = traverse_up(node1)
  path2 = traverse_up(node2)
  return node1.parent if path1 == path2

  closure = nil
  while (!path1.empty? && !path2.empty?)
    closure = path1.pop
    return closure.parent if (closure != path2.pop)
  end
  path1.size > path2.size ? path1.last.parent : path2.last.parent
end
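
traverse_up is not shown on this page, so the exact contents of path1 and path2 are unknown. As an illustration of the general idea only, and not the library's algorithm, the sketch below computes a lowest common ancestor by comparing two root-first ancestor chains. `TreeNode`, `ancestors_from_root`, and `sketch_lca` are hypothetical names.

```ruby
# Hypothetical stand-in for an Hpricot element.
TreeNode = Struct.new(:name, :parent)

# Ancestors of a node from the root down, excluding the node itself.
def ancestors_from_root(node)
  chain = []
  while (node = node.parent)
    chain.unshift(node)
  end
  chain
end

# The deepest node that appears at the same position in both chains.
def sketch_lca(node1, node2)
  common = nil
  ancestors_from_root(node1).zip(ancestors_from_root(node2)).each do |a, b|
    break unless a.equal?(b)
    common = a
  end
  common
end

table = TreeNode.new("table", TreeNode.new("body", TreeNode.new("html", nil)))
row1  = TreeNode.new("tr", table)
row2  = TreeNode.new("tr", table)
cell  = TreeNode.new("td", row1)

puts sketch_lca(cell, row2).name
```

Since the chains exclude the nodes themselves, this sketch always returns a proper ancestor of both arguments.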

.to_full_XPath(doc, xpath, generalize) ⇒ Object

Look up xpath in doc, take the first matching element, and generate its absolute XPath. Note that generalize is passed to generate_XPath as the write_indices argument, so a true value produces a path with indices.



# File 'lib/scrubyt/utils/xpathutils.rb', line 145

def self.to_full_XPath(doc, xpath, generalize)
  elem = doc/xpath
  elem = elem[0] if elem.is_a? Hpricot::Elements  # take the first match; map without a block returns an Enumerator on Ruby 1.9+
  XPathUtils.generate_XPath(elem, nil, generalize)
end

.traverse_up_until_name(node, name) ⇒ Object

Used to find the nearest ancestor of a node with the given name - for example, the <form> node that contains an <input> node. If the node itself already has that name, it is returned.



# File 'lib/scrubyt/utils/xpathutils.rb', line 108

def self.traverse_up_until_name(node, name)
  while node.class != Hpricot::Doc do
    #raise "The element is nil! This probably means the widget with the specified name ('#{name}') does not exist" unless node
    return nil unless node
    break if node.name == name
    node = node.parent
  end
  node
end
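
The upward walk can be sketched on a tiny tree. `FormNode` and `sketch_traverse_up_until_name` are hypothetical names used for illustration; the sketch has no document object, so a missing ancestor yields nil here, whereas the method above stops at Hpricot::Doc.

```ruby
# Hypothetical stand-in for an Hpricot element.
FormNode = Struct.new(:name, :parent)

# Climb the parent chain until a node with the given name is found.
def sketch_traverse_up_until_name(node, name)
  node = node.parent until node.nil? || node.name == name
  node
end

form  = FormNode.new("form", nil)
input = FormNode.new("input", FormNode.new("div", form))

puts sketch_traverse_up_until_name(input, "form").name
p sketch_traverse_up_until_name(input, "table")
```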