Class: AhoC::Trie
- Inherits:
-
Object
- Object
- AhoC::Trie
- Defined in:
- lib/ahocorasick.rb
Overview
Aho-Corasick Trie Class
Description
A class for constructing and accessing an Aho-Corasick trie data structure. Any pattern that supports the “each” method in ruby can be added to the Trie. (e.g. strings or arrays)
Example Usage:
Building a Aho-Corasick Trie
myTrie = AhoC::Trie.new myTrie.add(“he”) myTrie.add(“she”) myTrie.add(“hers”) myTrie.add(“his”) myTrie.build
Looking up a string
myTrie.lookup(“ushers”)
Output from Lookup
- “she”, “he”, “hers”
Instance Method Summary collapse
-
#add(pattern) ⇒ Object
Add a pattern to the Trie.
-
#build ⇒ Object
Sets the failure transitions and output for each node.
-
#initialize(*arg) ⇒ Trie
constructor
Creates an empty AhoC Trie with only a root node.
-
#lookup(target, *arg) ⇒ Object
Finds all occurrences of patterns in the Trie contained in target.
Constructor Details
#initialize(*arg) ⇒ Trie
Creates an empty AhoC Trie with only a root node
Accepts an optional argument (either :DFA or :NFA) indicating the type of automaton to build. If no argument is passed an NFA will be built.
55 56 57 58 59 60 61 62 63 64 65 |
# File 'lib/ahocorasick.rb', line 55 def initialize *arg @root = Node.new if !arg[0] || arg[0] == :NFA @type = :NFA elsif arg[0] == :DFA @type = :DFA else raise "Only :DFA or :NFA accepted as arguments" end end |
Instance Method Details
#add(pattern) ⇒ Object
Add a pattern to the Trie.
Accepts an optional block that can be used to modify the node output.
71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 |
# File 'lib/ahocorasick.rb', line 71 def add pattern node = @root # If this is a string process each character if String(pattern) == pattern pattern.each_char do |char| if node.goto(char) == nil node = node.add(char) else node = node.goto(char) end end else # Otherwise, pattern should support "each" method. for item in pattern if node.goto(item) == nil node = node.add(item) else node = node.goto(item) end end end if block_given? node.output = yield pattern else node.output = [pattern] end end |
#build ⇒ Object
Sets the failure transitions and output for each node. Call this method once all the patterns have been added to the Trie.
Accepts an optional block that can be used to modify the output constructed from the node and its failure nodes.
105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 |
# File 'lib/ahocorasick.rb', line 105 def build fifo_q = Array.new # Set the failures for the nodes coming out of the root node. @root.get.each_pair do |item, node| node.failure = @root fifo_q.push node end # Set the failures in breadth-first search order # using a FIFO queue. A failure identifies the deepest node # that is a proper suffix of the current node. while !fifo_q.empty? p_node = fifo_q.shift if p_node.get p_node.get.each_pair do |item, node| # Push the current node onto the queue, so any child # nodes can be processed later. fifo_q.push node f_node = p_node.failure # Follow the failures until we find a goto transition # or arrive back at the root node while f_node.goto(item) == nil and !f_node.eql? @root f_node = f_node.failure end if f_node.eql? @root and f_node.goto(item) == nil node.failure = @root else node.failure = f_node.goto(item) if block_given? node.output = yield node.output, (node.failure).output else if node.output && (node.failure).output node.output = node.output + (node.failure).output elsif (node.failure).output node.output = (node.failure).output end end end end end end build_dfa if @type == :DFA end |
#lookup(target, *arg) ⇒ Object
Finds all occurrences of patterns in the Trie contained in target. Outputs an array of patterns contained within the target.
Accepts an optional argument to override the type of the output (e.g. String.new) Accepts an optional block that can modify how the output is built from each node.
160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 |
# File 'lib/ahocorasick.rb', line 160 def lookup target, *arg output = arg[0] ? arg[0] : Array.new node = @root # If this is a string process each character if String(target) == target target.each_char do |char| # Follow the failures until a goto transition is found # or we return to the root node. while(!node.goto(char) and !node.eql? @root) node = node.failure end # If there is a goto transition follow it; otherwise, # we can assume we are at the root node. if node.goto(char) node = node.goto(char) if node.output if block_given? output = yield output, node.output else output = output + node.output end end end end else # Otherwise, target should support "each" method. for item in target # Follow the failures until a goto transition is found # or we return to the root node. while(!node.goto(item) and !node.eql? @root) node = node.failure end # If there is a goto transition follow it; otherwise, # we can assume we are at the root node. if node.goto(item) node = node.goto(item) if node.output if block_given? output = yield output, node.output else output = output + node.output end end end end end return output end |