Class: Ariel::Node::Extracted
- Inherits: Ariel::Node (ancestry: Object > Ariel::Node > Ariel::Node::Extracted)
- Defined in: lib/ariel/node/extracted.rb
Overview
Each Node::Extracted has a name, a TokenStream and a structure node (#structure_node) that points to the relevant Node::Structure. Skip straight to #search, #/ and #at for the query interface. Using it is strongly recommended over the built-in method accessors: an accessor method isn't defined if its field wasn't extracted, so code that relies on accessors has to catch a lot of potential errors.
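To show why accessor chains are fragile, here is a small self-contained sketch. `AccessorNode` and `area_code_via_accessors` are hypothetical names, not part of Ariel; the class only imitates the behaviour described above, where a reader method exists solely for fields that were actually extracted.

```ruby
# Hypothetical stand-in (not Ariel's class): a reader method is defined only
# for fields that were actually extracted, as the overview describes.
class AccessorNode
  attr_reader :children

  def initialize(children = {})
    @children = children
    # One singleton reader per extracted field; absent fields get no method.
    children.each_key do |name|
      define_singleton_method(name) { @children[name] }
    end
  end
end

# Reading a nested field through accessors means guarding against
# NoMethodError, because any missing link leaves the next method undefined.
def area_code_via_accessors(doc)
  doc.address.phone_number.area_code
rescue NoMethodError
  nil
end

partial = AccessorNode.new(address: AccessorNode.new(street: AccessorNode.new))
p area_code_via_accessors(partial)  # => nil (phone_number was never extracted)
```

Every accessor-based read needs a guard like this, which is the overhead the query interface avoids.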
Instance Attribute Summary
- #structure_node ⇒ Object
  Returns the value of attribute structure_node.
- #tokenstream ⇒ Object
  Returns the value of attribute tokenstream.
Attributes inherited from Ariel::Node
#children, #node_name, #parent
Instance Method Summary
- #[](*args) ⇒ Object
  Access list children by index.
- #at(search_string) ⇒ Object
  Acts exactly like #search, but returns only the first match or nil if there are no matches.
- #extracted_text ⇒ Object
  Returns the text contained in the TokenStream.
- #initialize(name, tokenstream, structure) ⇒ Extracted (constructor)
  Returns a new instance of Extracted.
- #inspect ⇒ Object
- #search(search_string) ⇒ Object (also: #/)
  The preferred way of querying extracted information.
Methods inherited from Ariel::Node
Constructor Details
#initialize(name, tokenstream, structure) ⇒ Extracted
Returns a new instance of Extracted.
    # File 'lib/ariel/node/extracted.rb', line 13
    def initialize(name, tokenstream, structure)
      super(name)
      @structure_node=structure
      @tokenstream=tokenstream
    end
Instance Attribute Details
#structure_node ⇒ Object
Returns the value of attribute structure_node.
    # File 'lib/ariel/node/extracted.rb', line 11
    def structure_node
      @structure_node
    end
#tokenstream ⇒ Object
Returns the value of attribute tokenstream.
    # File 'lib/ariel/node/extracted.rb', line 11
    def tokenstream
      @tokenstream
    end
Instance Method Details
#[](*args) ⇒ Object
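Access list children by index. Passing a Range or several indices (e.g. node[0..2] or node[0, 2]) returns an array, while a single index (e.g. node[1]) returns just that child rather than a one-element array. This mirrors the behaviour of Ruby's standard Array#[].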
    # File 'lib/ariel/node/extracted.rb', line 29
    def [](*args)
      dont_splat=false #determines whether to splat or not if there is only a single result
      args.collect! do |arg|
        if arg.kind_of? Range
          arg=arg.to_a
          dont_splat=true
        end
        arg
      end
      args.flatten!
      dont_splat=true if args.size > 1
      result=@children.values_at(*args).compact
      if result.size==1 && dont_splat==true
        return result
      else
        return *result
      end
    end
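The indexing rule described above can be sketched on a plain Hash of list children keyed by integer index. This assumes list items are stored under Integer keys (suggested by the use of values_at, but not confirmed here), and `list_children` is a hypothetical helper. The sketch picks the return shape explicitly instead of using `return *result`, because splatting in a return statement yields an array even for a single element on Ruby 1.9 and later.

```ruby
# Hypothetical helper mirroring the documented #[] semantics:
# - a single index returns that child (or nil if it is missing);
# - a Range or several indices return an Array, with missing entries dropped.
def list_children(children, *args)
  wants_array = args.size > 1 || args.any? { |arg| arg.is_a?(Range) }
  indices = args.flat_map { |arg| arg.is_a?(Range) ? arg.to_a : [arg] }
  found = children.values_at(*indices).compact
  wants_array ? found : found.first
end

comments = { 0 => "first", 1 => "second", 2 => "third" }
p list_children(comments, 1)     # => "second"
p list_children(comments, 0..1)  # => ["first", "second"]
p list_children(comments, 0, 2)  # => ["first", "third"]
p list_children(comments, 5)     # => nil
```

One design choice differs from the source: an array-style lookup with no hits returns [] here, so callers can always iterate over the result.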
#at(search_string) ⇒ Object
Acts exactly like #search, but returns only the first match or nil if there are no matches.
    # File 'lib/ariel/node/extracted.rb', line 79
    def at(search_string)
      self.search(search_string).first
    end
#extracted_text ⇒ Object
Returns the text contained in the TokenStream.
    # File 'lib/ariel/node/extracted.rb', line 20
    def extracted_text
      tokenstream.text
    end
#inspect ⇒ Object
    # File 'lib/ariel/node/extracted.rb', line 83
    def inspect
      [super,
      "structure_node=#{self.structure_node.node_name.inspect};",
      "extracted_text=\"#{text=self.extracted_text; text.size > 100 ? text[0..100]+'...' : text}\";"
      ].join ' '
    end
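The truncation in #inspect uses text[0..100]. Because that range is inclusive, a long text keeps 101 characters before the '...' is appended. A minimal sketch of just that rule, with `inspect_preview` as a hypothetical helper name:

```ruby
# Same rule as #inspect: text longer than `limit` characters is cut to
# text[0..limit] (inclusive, so limit + 1 characters) and suffixed with '...'.
def inspect_preview(text, limit = 100)
  text.size > limit ? text[0..limit] + '...' : text
end

p inspect_preview('short text')    # => "short text"
p inspect_preview('x' * 150).size  # => 104 (101 characters plus '...')
```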
#search(search_string) ⇒ Object (also known as: #/)
The preferred way of querying extracted information. If nothing was extracted, an empty array is returned. This is much safer than using the Node::Extracted accessors. Suppose your code reads doc.address.phone_number.area_code: this raises an error if any one of these fields was not extracted, so (doc/'address/phone_number/area_code') is preferred. Numbered list items can be queried, e.g. (doc/'comment_list/2'), and basic globbing with * (matching every child at that level) is supported, e.g. (doc/'*/*/title').
    # File 'lib/ariel/node/extracted.rb', line 55
    def search(search_string)
      queue=search_string.split '/'
      current_term=queue.shift
      return [self] if current_term.nil? #If for some reason nothing is given in the search string
      matches=[]
      if current_term=='*'
        new_matches=self.children.values
        new_matches.sort! {|a, b| a.node_name <=> b.node_name} rescue nil #is this evil?
        matches.concat new_matches
      elsif current_term[/\d+/]==current_term
        matches << @children[current_term.to_i]
      else
        matches << @children[current_term.to_sym]
      end
      if queue.empty?
        return matches.flatten.compact
      else
        return matches.collect {|match| match.search(queue.join('/'))}.flatten.compact
      end
    end
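The path query can be sketched outside Ariel with a minimal, hypothetical `QueryNode` struct: a name plus a Hash of children keyed by Symbol (named fields) or Integer (list items), which is an assumption about how Ariel stores children. It follows the same steps as the source: split on '/', resolve one step ('*', a number, or a field name), then recurse on the remainder for every match. Unlike the source shown above, which applies .compact only after recursing, the sketch drops missing children before recursing, so a missing intermediate step yields [] rather than reaching #search on nil.

```ruby
# Hypothetical stand-in for an extracted node (not Ariel's class).
QueryNode = Struct.new(:node_name, :children) do
  def search(path)
    steps = path.split('/')
    step = steps.shift
    return [self] if step.nil?        # empty path matches the node itself

    matches =
      if step == '*'                  # wildcard: every child, sorted by name
        children.values.sort_by { |child| child.node_name.to_s }
      elsif step.match?(/\A\d+\z/)    # numbered list item
        [children[step.to_i]]
      else                            # named field
        [children[step.to_sym]]
      end
    matches.compact!                  # drop missing children before recursing
    return matches if steps.empty?

    rest = steps.join('/')
    matches.flat_map { |match| match.search(rest) }
  end

  alias_method :/, :search

  # Like #search, but only the first match (or nil).
  def at(path)
    search(path).first
  end
end

leaf = ->(name) { QueryNode.new(name, {}) }
doc = QueryNode.new(:doc, {
  address: QueryNode.new(:address, { street: leaf.(:street) }),
  comments: QueryNode.new(:comments, { 0 => leaf.(0), 1 => leaf.(1) })
})

p doc.search('comments/1').map(&:node_name)     # => [1]
p doc.search('address/phone_number/area_code')  # => []
wildcard = doc/'*/street'
p wildcard.map(&:node_name)                     # => [:street]
p doc.at('address/street').node_name            # => :street
```

Sorting wildcard matches by node_name.to_s avoids the `rescue nil` in the source when Symbol and Integer names are mixed, at the cost of a lexical order ("10" sorts before "2").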