Class: AhoC::Trie

Inherits:
Object
  • Object
show all
Defined in:
lib/ahocorasick.rb

Overview

Aho-Corasick Trie Class

Description

A class for constructing and accessing an Aho-Corasick trie data structure. Any pattern that supports the “each” method in ruby can be added to the Trie. (e.g. strings or arrays)

Example Usage:

Building a Aho-Corasick Trie

myTrie = AhoC::Trie.new myTrie.add(“he”) myTrie.add(“she”) myTrie.add(“hers”) myTrie.add(“his”) myTrie.build

Looking up a string

myTrie.lookup(“ushers”)

Output from Lookup

“she”, “he”, “hers”

Instance Method Summary collapse

Constructor Details

#initialize(*arg) ⇒ Trie

Creates an empty AhoC Trie with only a root node

Accepts an optional argument (either :DFA or :NFA) indicating the type of automaton to build. If no argument is passed an NFA will be built.



55
56
57
58
59
60
61
62
63
64
65
# File 'lib/ahocorasick.rb', line 55

def initialize *arg
	@root = Node.new

	if !arg[0] || arg[0] == :NFA
		@type = :NFA	
	elsif arg[0] == :DFA
		@type = :DFA	
	else
		raise "Only :DFA or :NFA accepted as arguments"
	end
end

Instance Method Details

#add(pattern) ⇒ Object

Add a pattern to the Trie.

Accepts an optional block that can be used to modify the node output.



71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
# File 'lib/ahocorasick.rb', line 71

def add pattern
	node = @root

	# If this is a string process each character
	if String(pattern) == pattern
		pattern.each_char do |char|
			if node.goto(char) == nil
				node = node.add(char)
			else
				node = node.goto(char)
			end
		end
	else # Otherwise, pattern should support "each" method.
		for item in pattern
			if node.goto(item) == nil
				node = node.add(item)
			else
				node = node.goto(item)
			end
		end
	end

	if block_given?
		node.output = yield pattern
	else
		node.output = [pattern]
	end
end

#buildObject

Sets the failure transitions and output for each node. Call this method once all the patterns have been added to the Trie.

Accepts an optional block that can be used to modify the output constructed from the node and its failure nodes.



105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
# File 'lib/ahocorasick.rb', line 105

def build
	fifo_q = Array.new
	
	# Set the failures for the nodes coming out of the root node.
	@root.get.each_pair do |item, node|
		node.failure = @root
		fifo_q.push node
	end

	# Set the failures in breadth-first search order
	# using a FIFO queue. A failure identifies the deepest node
	# that is a proper suffix of the current node. 
	while !fifo_q.empty?
		p_node = fifo_q.shift
		if p_node.get
			p_node.get.each_pair do |item, node|
				# Push the current node onto the queue, so any child
				# nodes can be processed later.
				fifo_q.push node 
			
				f_node = p_node.failure
				
				# Follow the failures until we find a goto transition
				# or arrive back at the root node
				while f_node.goto(item) == nil and !f_node.eql? @root
					f_node = f_node.failure
				end

				if f_node.eql? @root and f_node.goto(item) == nil
					node.failure = @root
				else
					node.failure = f_node.goto(item)
					if block_given?
						node.output = yield node.output, (node.failure).output
					else
						if node.output && (node.failure).output
							node.output = node.output + (node.failure).output
						elsif (node.failure).output
							node.output = (node.failure).output
						end
					end
				end
			end
		end
	end

	build_dfa if @type == :DFA

end

#lookup(target, *arg) ⇒ Object

Finds all occurrences of patterns in the Trie contained in target. Outputs an array of patterns contained within the target.

Accepts an optional argument to override the type of the output (e.g. String.new) Accepts an optional block that can modify how the output is built from each node.



160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
# File 'lib/ahocorasick.rb', line 160

def lookup target, *arg
	output = arg[0] ? arg[0] : Array.new
	node = @root

	# If this is a string process each character
	if String(target) == target
		target.each_char do |char|
			# Follow the failures until a goto transition is found
			# or we return to the root node.
			while(!node.goto(char) and !node.eql? @root)
				node = node.failure
			end

			# If there is a goto transition follow it; otherwise, 
			# we can assume we are at the root node.
			if node.goto(char)
				node = node.goto(char)

				if node.output		
					if block_given?
						output = yield output, node.output
					else
						output = output + node.output
					end
				end

			end
		end
	else # Otherwise, target should support "each" method.
		for item in target
			# Follow the failures until a goto transition is found
			# or we return to the root node.
			while(!node.goto(item) and !node.eql? @root)
				node = node.failure
			end

			# If there is a goto transition follow it; otherwise, 
			# we can assume we are at the root node.
			if node.goto(item)
				node = node.goto(item)

				if node.output		
					if block_given?
						output = yield output, node.output
					else
						output = output + node.output
					end
				end

			end
		end
	end

	return output
end