Class: Ariel::Node::Extracted

Inherits:
Ariel::Node
Defined in:
lib/ariel/node/extracted.rb

Overview

Each Node::Extracted has a name, a TokenStream, and a structure_node that points to the relevant Node::Structure. Skip straight to #search, #/ and #at for the query interface. These are strongly recommended over the built-in method accessors: an accessor method is not defined when a given field was not extracted, so code that relies on accessors has to rescue a lot of potential errors.

Instance Attribute Summary

Attributes inherited from Ariel::Node

#children, #node_name, #parent

Instance Method Summary

Methods inherited from Ariel::Node

#add_child, #each_descendant

Constructor Details

#initialize(name, tokenstream, structure) ⇒ Extracted

Returns a new instance of Extracted.



# File 'lib/ariel/node/extracted.rb', line 13

def initialize(name, tokenstream, structure)
  super(name)
  @structure_node=structure
  @tokenstream=tokenstream
end

Instance Attribute Details

#structure_node ⇒ Object

Returns the value of attribute structure_node.



# File 'lib/ariel/node/extracted.rb', line 11

def structure_node
  @structure_node
end

#tokenstream ⇒ Object

Returns the value of attribute tokenstream.



# File 'lib/ariel/node/extracted.rb', line 11

def tokenstream
  @tokenstream
end

Instance Method Details

#[](*args) ⇒ Object

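Index-based accessor for the node's children, which is most useful when a Node::Extracted has list children. Accepts one or more indices as well as Range objects. Passing a Range to Node::Extracted#[] returns an array, while passing a single index does not. This behaviour is the same as Ruby’s standard Array class.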



# File 'lib/ariel/node/extracted.rb', line 29

def [](*args)
  dont_splat=false #determines whether to splat or not if there is only a single result
  args.collect! do |arg|
    if arg.kind_of? Range
      arg=arg.to_a
      dont_splat=true
    end
    arg
  end
  args.flatten!
  dont_splat=true if args.size > 1
  result=@children.values_at(*args).compact
  if result.size==1 && dont_splat==true
    return result
  else
    return *result
  end
end
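
The comparison with Array can be checked in plain Ruby, without Ariel loaded. The sketch below only illustrates the Array behaviour that #[] is described as mirroring; the items array and its contents are made up for the example.

```ruby
# Plain Ruby Array, used only to illustrate the indexing convention.
items = [:first, :second, :third]

single  = items[0]               # a bare index yields the element itself
ranged  = items[0..0]            # a one-element range still yields an array
several = items.values_at(0, 2)  # values_at is the call #[] makes on its children
missing = items[10]              # an index with nothing behind it gives nil

puts single.inspect   # :first
puts ranged.inspect   # [:first]
puts several.inspect  # [:first, :third]
puts missing.inspect  # nil
```

In practice this means a caller who always wants an array back can pass a range, even one that covers a single position.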

#at(search_string) ⇒ Object

Acts exactly like #search, but returns only the first match or nil if there are no matches.



# File 'lib/ariel/node/extracted.rb', line 79

def at(search_string)
  self.search(search_string).first
end

#extracted_text ⇒ Object

Returns the text contained in the TokenStream.



# File 'lib/ariel/node/extracted.rb', line 20

def extracted_text
  tokenstream.text
end

#inspect ⇒ Object



# File 'lib/ariel/node/extracted.rb', line 83

def inspect
  [super,
  "structure_node=#{self.structure_node.node_name.inspect};",
  "extracted_text=\"#{text=self.extracted_text; text.size > 100 ? text[0..100]+'...' : text}\";"
  ].join ' '
end

#search(search_string) ⇒ Object Also known as: /

The preferred way of querying extracted information. If nothing was extracted, an empty array is returned. This is much safer than using Node::Extracted accessors: code that reads doc.address.phone_number.area_code will raise an error if any one of these fields was not extracted, so (doc/'address/phone_number/area_code') is preferred. Numbered list items can be queried, e.g. (doc/'comment_list/2'), and basic globbing with * is supported: (doc/'*/*/title').



# File 'lib/ariel/node/extracted.rb', line 55

def search(search_string)
  queue=search_string.split '/'
  current_term=queue.shift
  return [self] if current_term.nil? #If for some reason nothing is given in the search string
  matches=[]
  if current_term=='*'
    new_matches=self.children.values
    new_matches.sort! {|a, b| a.node_name <=> b.node_name} rescue nil #is this evil?
    matches.concat new_matches
  elsif current_term[/\d+/]==current_term
    matches << @children[current_term.to_i]
  else
    matches << @children[current_term.to_sym]
  end
  if queue.empty?
    return matches.flatten.compact
  else
    return matches.collect {|match| match.search(queue.join('/'))}.flatten.compact
  end
end
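
To make the query rules concrete without requiring Ariel, here is a minimal sketch built around a hypothetical SketchNode class. It is a stand-in and not the library's class: it assumes children are kept in a hash keyed by symbol or integer, it follows the same term-matching rules as the source above, and it leaves out the sorting of * matches. The tree contents are invented for the example.

```ruby
# Hypothetical stand-in for Ariel::Node::Extracted, just enough to run queries.
class SketchNode
  attr_reader :node_name, :children, :text

  def initialize(node_name, text = nil)
    @node_name = node_name
    @text = text
    @children = {}
  end

  # Registers a child under its node_name and returns the child.
  def add_child(node)
    @children[node.node_name] = node
    node
  end

  # '*' matches every child, an all-digit term is an integer key,
  # anything else is looked up as a symbol key.
  def search(search_string)
    current_term, *rest = search_string.split('/')
    return [self] if current_term.nil?

    matches =
      if current_term == '*'
        @children.values
      elsif current_term.match?(/\A\d+\z/)
        [@children[current_term.to_i]]
      else
        [@children[current_term.to_sym]]
      end
    matches = matches.compact
    return matches if rest.empty?

    matches.flat_map { |match| match.search(rest.join('/')) }
  end
  alias_method :/, :search

  # First match, or nil when nothing matched.
  def at(search_string)
    search(search_string).first
  end
end

doc = SketchNode.new(:root)
address = doc.add_child(SketchNode.new(:address))
address.add_child(SketchNode.new(:street, '1 Example Road'))
comments = doc.add_child(SketchNode.new(:comment_list))
comments.add_child(SketchNode.new(0, 'first'))
comments.add_child(SketchNode.new(1, 'second'))

streets    = doc / 'address/street'                  # one match
missing    = doc / 'address/phone_number/area_code'  # nothing extracted: []
second     = doc.at('comment_list/1')                # numbered list item
all_leaves = doc / '*/*'                             # glob over two levels

puts streets.map(&:text).inspect     # ["1 Example Road"]
puts missing.inspect                 # []
puts second.text                     # second
puts all_leaves.map(&:text).inspect  # ["1 Example Road", "first", "second"]
```

The query for a field that was never extracted returns an empty array instead of raising, which is the property that makes #search and #at safer than chained accessors.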