Class: SuffixTree
- Inherits:
-
Object
- Object
- SuffixTree
- Defined in:
- lib/suffix_tree.rb
Overview
Builds a suffix tree from one or more DataSource instances
Constant Summary collapse
- NO_SUFFIX_OFFSET =
-1
Instance Attribute Summary collapse
-
#location ⇒ Object
readonly
where we are in the implicit tree building process.
-
#nodeFactory ⇒ Object
readonly
Returns the value of attribute nodeFactory.
-
#root ⇒ Object
readonly
the root of the tree, and the terminal value (for making implicit trees explicit).
-
#rootDataSource ⇒ Object
readonly
first data source we use.
-
#startOffset ⇒ Object
readonly
when there are a sequence of data sources, treat them as one long one, this is where next source starts.
-
#suffixLinker ⇒ Object
readonly
keep track of which nodes need suffix links.
-
#terminalValue ⇒ Object
readonly
the root of the tree, and the terminal value (for making implicit trees explicit).
Instance Method Summary collapse
-
#addDataSource(dataSource) ⇒ Object
Add all values in a given dataSource.
-
#addValue(value, offset) ⇒ Object
Adding one value at a time, rootDataSource must be set for this to work.
-
#extend(value, offset) ⇒ Object
Extend a single suffix at the current location, returns true if there is another suffix to extend.
-
#finish ⇒ Object
Finish building the tree by adding a value that is not part of the data source.
-
#initialize(terminalValue = nil, configuration = nil, persister = nil) ⇒ SuffixTree
constructor
A new instance of SuffixTree.
-
#setDataSource(dataSource) ⇒ Object
Set the data source, but do not add any values from the data source.
Constructor Details
#initialize(terminalValue = nil, configuration = nil, persister = nil) ⇒ SuffixTree
Returns a new instance of SuffixTree.
28 29 30 31 32 33 34 35 36 37 38 39 |
# File 'lib/suffix_tree.rb', line 28 def initialize(terminalValue = nil, configuration = nil, persister = nil) @nextNodeId = 0 @nodeFactory = NodeFactory.new(nil, persister) @nodeFactory.setConfiguration(configuration) if (configuration != nil) @root = @nodeFactory.newRoot() @rootDataSource = nil @location = Location.new(@root) @startOffset = 0 @suffixOffset = 0 @suffixLinker = SuffixLinker.new @terminalValue = terminalValue end |
Instance Attribute Details
#location ⇒ Object (readonly)
where we are in the implicit tree building process
18 19 20 |
# File 'lib/suffix_tree.rb', line 18 def location @location end |
#nodeFactory ⇒ Object (readonly)
Returns the value of attribute nodeFactory.
20 21 22 |
# File 'lib/suffix_tree.rb', line 20 def nodeFactory @nodeFactory end |
#root ⇒ Object (readonly)
the root of the tree, and the terminal value (for making implicit trees explicit)
23 24 25 |
# File 'lib/suffix_tree.rb', line 23 def root @root end |
#rootDataSource ⇒ Object (readonly)
first data source we use
12 13 14 |
# File 'lib/suffix_tree.rb', line 12 def rootDataSource @rootDataSource end |
#startOffset ⇒ Object (readonly)
when there are a sequence of data sources, treat them as one long one, this is where next source starts
15 16 17 |
# File 'lib/suffix_tree.rb', line 15 def startOffset @startOffset end |
#suffixLinker ⇒ Object (readonly)
keep track of which nodes need suffix links
26 27 28 |
# File 'lib/suffix_tree.rb', line 26 def suffixLinker @suffixLinker end |
#terminalValue ⇒ Object (readonly)
the root of the tree, and the terminal value (for making implicit trees explicit)
23 24 25 |
# File 'lib/suffix_tree.rb', line 23 def terminalValue @terminalValue end |
Instance Method Details
#addDataSource(dataSource) ⇒ Object
Add all values in a given dataSource
54 55 56 57 58 59 60 61 62 63 64 65 |
# File 'lib/suffix_tree.rb', line 54 def addDataSource(dataSource) @suffixOffset = 0 self.setDataSource(dataSource) dataSource.each_with_index(@startOffset) do |value, offset| self.addValue(value, offset) end if (@terminalValue != nil) then @lastOffsetAdded += 1 self.addValue(@terminalValue, @lastOffsetAdded) end @startOffset = @lastOffsetAdded + 1 end |
#addValue(value, offset) ⇒ Object
Adding one value at a time, rootDataSource must be set for this to work
70 71 72 73 74 75 |
# File 'lib/suffix_tree.rb', line 70 def addValue(value, offset) while (extend(value, offset)) do @suffixLinker.update(@location) end @lastOffsetAdded = offset end |
#extend(value, offset) ⇒ Object
Extend a single suffix at the current location, returns true if there is another
suffix to extend.
Handles these cases:
On a node:
if there is a child starting with the extension value, traverse down that one value, return FALSE
if no child has the extension value, add a leaf,
if we are at root, return FALSE,
otherwise traverse to the next suffix and return TRUE
On an edge:
if next character has the value, traverse past it, return FALSE
if next character is not the value, split edge at that location, locate at the new node, and return TRUE
102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 |
# File 'lib/suffix_tree.rb', line 102 def extend(value,offset) if (@location.onNode) if (@location.node.children.has_key?(value)) then @location.traverseDownChildValue(value) return false # rule 3 else @nodeFactory.addLeaf(@location.node, value, offset) return @location.traverseToNextSuffix(@rootDataSource) # rule 1, traverse returns false when at root end elsif (@rootDataSource.valueAt(@location.incomingEdgeOffset) == value) then @location.traverseDownEdgeValue() return false # found value on edge, rule 3 else newNode = @nodeFactory.splitEdgeAt(@location.node, @location.incomingEdgeOffset) @suffixLinker.nodeNeedingSuffixLink(newNode) @location.jumpToNode(newNode) return true # rule 2 end end |
#finish ⇒ Object
Finish building the tree by adding a value that is not part of the data source
80 81 82 83 84 |
# File 'lib/suffix_tree.rb', line 80 def finish() if (@rootDataSource.has_terminator?) then self.addValue(@rootDataSource.terminator, @startOffset) end end |
#setDataSource(dataSource) ⇒ Object
Set the data source, but do not add any values from the data source
44 45 46 47 48 49 |
# File 'lib/suffix_tree.rb', line 44 def setDataSource(dataSource) if (@rootDataSource == nil) then @rootDataSource = dataSource end @nodeFactory.extendDataSource(dataSource, @startOffset) end |