Class: Ariel::Node::Structure
- Inherits: Ariel::Node
  - Object
  - Ariel::Node
  - Ariel::Node::Structure
- Defined in: lib/ariel/node/structure.rb
Overview
Implements a Node object used to represent the structure of the document tree. Each node stores start and end rules to extract the desired content from its parent node. Could be viewed as a rule-storing object.
Instance Attribute Summary
- #node_type ⇒ Object
  Returns the value of attribute node_type.
- #ruleset ⇒ Object
  Returns the value of attribute ruleset.
Attributes inherited from Ariel::Node
#children, #node_name, #parent
Instance Method Summary
- #apply_extraction_tree_on(root_node, extract_labels = false) ⇒ Object
  Applies the extraction rules stored in the current Node::Structure and all its descendants.
- #extend_structure {|_self| ... } ⇒ Object
  Used to extend an already created Node.
- #extract_from(node) ⇒ Object
  Given a Node to apply its rules to, this method creates a new node for each extracted item and adds it as a child of the given node.
- #initialize(name = :root, type = :not_list) {|_self| ... } ⇒ Structure (constructor)
  A new instance of Structure.
- #item(name, &block) ⇒ Object (also: #list)
  Use when defining any object that occurs once.
- #list_item(name, &block) ⇒ Object
  See the docs for #item for a discussion of when to use #item and when to use #list_item.
Methods inherited from Ariel::Node
#add_child, #each_descendant, #inspect
Constructor Details
#initialize(name = :root, type = :not_list) {|_self| ... } ⇒ Structure
Returns a new instance of Structure.
# File 'lib/ariel/node/structure.rb', line 11
def initialize(name=:root, type=:not_list, &block)
  super(name)
  @node_type=type
  yield self if block_given?
end
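The constructor's defaults and its yield-self block can be tried in isolation. The sketch below is a minimal stand-in, not the library's class: SketchNode is a hypothetical name, and it skips the call to the Ariel::Node superclass, so it only shows how the arguments and the block behave.

```ruby
# Hypothetical stand-in for Ariel::Node::Structure (assumption: no superclass,
# only the two values the constructor above sets are modelled).
class SketchNode
  attr_reader :node_name, :node_type

  def initialize(name = :root, type = :not_list)
    @node_name = name
    @node_type = type
    yield self if block_given? # hand the node being built to the caller's block
  end
end

seen = nil
node = SketchNode.new { |n| seen = n }          # defaults: :root, :not_list
list_node = SketchNode.new(:comment, :list_item) # explicit name and type, no block
```

Here the block receives the very object that the constructor returns, which is what lets a structure be declared in place with nested blocks.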
Instance Attribute Details
#node_type ⇒ Object
Returns the value of attribute node_type.
# File 'lib/ariel/node/structure.rb', line 9
def node_type
  @node_type
end
#ruleset ⇒ Object
Returns the value of attribute ruleset.
# File 'lib/ariel/node/structure.rb', line 9
def ruleset
  @ruleset
end
Instance Method Details
#apply_extraction_tree_on(root_node, extract_labels = false) ⇒ Object
Applies the extraction rules stored in the current Node::Structure and all its descendant children.
# File 'lib/ariel/node/structure.rb', line 49
def apply_extraction_tree_on(root_node, extract_labels=false)
  extraction_queue = [root_node]
  until extraction_queue.empty? do
    new_parent = extraction_queue.shift
    new_parent.structure_node.children.values.each do |child|
      if extract_labels
        extractions=LabelUtils.extract_labeled_region(child, new_parent)
      else
        extractions=child.extract_from(new_parent)
      end
      extractions.each {|extracted_node| extraction_queue.push extracted_node}
    end
  end
  return root_node
end
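The method above walks the tree breadth-first: each extracted node goes on a queue and later becomes the parent for the next level of rules. The sketch below reproduces only that traversal. SketchRule, SketchExtracted, and apply_tree are hypothetical names, children are held in a plain array rather than the hash used above, and a regular expression stands in for a learnt ruleset.

```ruby
# Stand-in types, assumed for illustration only; not Ariel's classes.
SketchRule = Struct.new(:name, :children, :pattern) do
  # Creates one child node per regex match and returns the new nodes,
  # in the same spirit as #extract_from.
  def extract_from(parent)
    parent.text.scan(pattern).map do |match|
      node = SketchExtracted.new(name, match, self, [])
      parent.kids << node
      node
    end
  end
end
SketchExtracted = Struct.new(:name, :text, :structure_node, :kids)

def apply_tree(root_node)
  queue = [root_node]
  until queue.empty?
    parent = queue.shift # FIFO order gives a breadth-first walk
    parent.structure_node.children.each do |rule|
      rule.extract_from(parent).each { |node| queue.push(node) }
    end
  end
  root_node
end

word = SketchRule.new(:word, [], /\w+/)
line = SketchRule.new(:line, [word], /[^\n]+/)
root_rule = SketchRule.new(:root, [line], nil)
doc = SketchExtracted.new(:root, "a b\nc", root_rule, [])
apply_tree(doc)
```

After the call, doc.kids holds one node per line of text and each of those holds one node per word, because every extracted node was queued and then processed against its own structure node's children.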
#extend_structure {|_self| ... } ⇒ Object
Used to extend an already created Node. For example:
node.extend_structure do |r|
  r.item :new_field1
  r.item :new_field2
end
# File 'lib/ariel/node/structure.rb', line 22
def extend_structure(&block)
  yield self if block_given?
end
#extract_from(node) ⇒ Object
Given a Node to apply its rules to, this method creates a new node for each extracted item and adds it as a child of the given node. It returns an array of the nodes extracted by the rule, which is empty if no ruleset has been learnt yet.
# File 'lib/ariel/node/structure.rb', line 29
def extract_from(node)
  extractions=[]
  i=0
  return extractions if @ruleset.nil? #no extractions if no rule has been learnt
  @ruleset.apply_to(node.tokenstream) do |newstream|
    if self.node_type==:list_item
      new_node_name=i
      i+=1
    else
      new_node_name=@node_name
    end
    extracted_node = Node::Extracted.new(new_node_name, newstream, self)
    node.add_child extracted_node
    extractions << extracted_node
  end
  return extractions
end
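The naming branch in the listing above is the part that distinguishes list items from ordinary items: list items are named by a running index, while everything else keeps the structure node's own name. A minimal sketch of just that branch follows; names_for is a hypothetical helper and match_count stands in for the number of streams the ruleset yields.

```ruby
# Hypothetical helper mirroring only the naming logic of #extract_from.
def names_for(node_type, node_name, match_count)
  index = 0
  Array.new(match_count) do
    if node_type == :list_item
      name = index # list items are named 0, 1, 2, ...
      index += 1
      name
    else
      node_name    # any other node keeps its structure name
    end
  end
end

list_names = names_for(:list_item, :comment, 3) # => [0, 1, 2]
item_names = names_for(:not_list, :title, 1)    # => [:title]
```

One consequence of this scheme is that the children extracted for a list_item are keyed by position rather than by the name given in the structure definition.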
#item(name, &block) ⇒ Object Also known as: list
Use when defining any object that occurs once. #list is a synonym, but it is recommended that you use it only when defining a container for list_items. The children of a list_item are just items. For example:
structure = Ariel::Node::Structure.new do |r|
  r.list :comments do |c| # r.item :comments would be equivalent, but less readable
    c.list_item :comment do |c|
      c.item :author # Now these are just normal items, as they are extracted once from their parent
      c.item :date
      c.item :body
    end
  end
end
# File 'lib/ariel/node/structure.rb', line 77
def item(name, &block)
  self.add_child(Node::Structure.new(name, &block))
end
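How #item, #list, and #list_item relate can be sketched with a self-contained stand-in. SketchStructure is a hypothetical class, its add_child and name-keyed children hash are assumptions about Ariel::Node, and the body of list_item is a guess (it is not shown on this page) based on the :list_item type that #extract_from checks for.

```ruby
# Hypothetical stand-in; not the library's implementation.
class SketchStructure
  attr_reader :node_name, :node_type, :children

  def initialize(name = :root, type = :not_list)
    @node_name = name
    @node_type = type
    @children = {}
    yield self if block_given?
  end

  def item(name, &block)
    add_child(SketchStructure.new(name, &block))
  end
  alias_method :list, :item # same method under a more readable name

  # Assumed body: passes :list_item as the node type.
  def list_item(name, &block)
    add_child(SketchStructure.new(name, :list_item, &block))
  end

  private

  # Assumed behaviour: children are keyed by node name.
  def add_child(node)
    @children[node.node_name] = node
  end
end

tree = SketchStructure.new do |r|
  r.list :comments do |c|
    c.list_item :comment do |i|
      i.item :author
    end
  end
end
```

In this sketch the :comments container and the :author field both end up with the default :not_list type, and only :comment is marked :list_item, which matches the statement above that #list and #item are interchangeable.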