Module: Stamina::Induction::Commons
Overview
Defines common utilities used by rpni and redblue. About acronyms:
-
pta stands for Prefix Tree Acceptor
-
ufds stands for Union-Find Data Structure
Methods pta2ufds, sample2pta and sample2ufds are simply conversion methods used when the induction algorithm starts (executed on a sample, it first built a pta then convert it to a union find). Method ufds2pta is used when the algorithm ends, to convert refined union find to a dfa.
The merge_user_data method is probably the most important as it actually computes the merging of two states and build information about merging for determinization.
Instance Method Summary collapse
-
#merge_user_data(d1, d2, determinization) ⇒ Object
Merges two user data hashes d1 and d2 according to rules defined below.
-
#pta2ufds(pta) ⇒ Object
Factors and returns a UnionFind data structure from a PTA, keeping natural order of its states for union-find elements.
-
#sample2pta(sample) ⇒ Object
Converts a Sample to an (augmented) prefix tree acceptor.
-
#sample2ufds(sample) ⇒ Object
Converts a Sample instance to a ‘ready to refine’ union find data structure.
-
#ufds2dfa(ufds) ⇒ Object
Computes the quotient automaton from a refined UnionFind data structure.
Instance Method Details
#merge_user_data(d1, d2, determinization) ⇒ Object
Merges two user data hashes d1 and d2 according to rules defined below. Also fills a determinization array with pairs of state indices that are reached from d1 and d2 through the same symbol and should be merged for determinization. This method does NOT ensure that those pairs correspond to distinguish states according to the union find. In other words state indices in these pairs do not necessarily corespond to master states (see UnionFind for this term).
Returns the resulting data if the merge is successful (does not lead to merging an error state with an accepting one), nil otherwise.
The merging procedure for the different hash keys is as follows:
146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 |
# File 'lib/stamina/induction/commons.rb', line 146 def merge_user_data(d1, d2, determinization) # we compute flags first new_data = {:initial => d1[:initial] || d2[:initial], :accepting => d1[:accepting] || d2[:accepting], :error => d1[:error] || d2[:error], :master => d1[:master] < d2[:master] ? d1[:master] : d2[:master]} # merge failure if accepting and error states are merged return nil if new_data[:accepting] and new_data[:error] # we recompute the delta function of the resulting state # keeping merging for determinization as pairs in _determinization_ new_data[:delta] = d1[:delta].merge(d2[:delta]) do |symbol, t1, t2| determinization << [t1, t2] t1 < t2 ? t1 : t2 end # returns merged data new_data end |
#pta2ufds(pta) ⇒ Object
Factors and returns a UnionFind data structure from a PTA, keeping natural order of its states for union-find elements. The resulting UnionFind contains a Hash as mergeable user data, presenting the following keys:
-
:initial, :accepting and :error flags of each state
-
:master indicating the index of the state in the PTA
-
:delta a delta function through a Hash => state_index
In this version, other user data attached to PTA states is lost during the conversion.
30 31 32 33 34 35 36 37 38 39 40 41 |
# File 'lib/stamina/induction/commons.rb', line 30 def pta2ufds(pta) Stamina::Induction::UnionFind.new(pta.state_count) do |i| state = pta.ith_state(i) data = {:initial => state.initial?, :accepting => state.accepting?, :error => state.error?, :master => i, :delta => {}} state.out_edges.each {|edge| data[:delta][edge.symbol] = edge.target.index} data end end |
#sample2pta(sample) ⇒ Object
Converts a Sample to an (augmented) prefix tree acceptor. This method ensures that the states of the PTA are in lexical order, according to the <=>
operator defined on symbols. States reached by negative strings are tagged as non accepting and error.
49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 |
# File 'lib/stamina/induction/commons.rb', line 49 def sample2pta(sample) Automaton.new do |pta| initial_state = add_state(:initial => true, :accepting => false) # Fill the PTA with each string sample.each do |str| # split string using the dfa parsed, reached, remaining = pta.dfa_split(str, initial_state) # remaining symbols are not empty -> build the PTA unless remaining.empty? remaining.each do |symbol| newone = pta.add_state(:initial => false, :accepting => false, :error => false) pta.connect(reached, newone, symbol) reached = newone end end # flag state str.positive? ? reached.accepting! : reached.error! # check consistency, should not arrive as Sample does not allow # inconsistencies. Should appear only if _sample_ is not a Sample # instance but some other enumerable. raise(InconsistencyError, "Inconsistent sample on #{str}", caller)\ if (reached.error? and reached.accepting?) end # Reindex states by applying BFS to_index, index = [initial_state], 0 until to_index.empty? state = to_index.shift state[:__index__] = index state.out_edges.sort{|e,f| e.symbol<=>f.symbol}.each {|e| to_index << e.target} index += 1 end # Force the automaton to reindex pta.order_states{|s0,s1| s0[:__index__]<=>s1[:__index__]} # Remove marks pta.states.each{|s| s.remove_mark(:__index__)} end end |
#sample2ufds(sample) ⇒ Object
Converts a Sample instance to a ‘ready to refine’ union find data structure. This method is simply a shortcut for pta2ufds(sample2pta(sample))
.
96 97 98 |
# File 'lib/stamina/induction/commons.rb', line 96 def sample2ufds(sample) pta2ufds(sample2pta(sample)) end |
#ufds2dfa(ufds) ⇒ Object
Computes the quotient automaton from a refined UnionFind data structure.
In this version, only accepting and initial flags are taken into account when creating quotient automaton states. Other user data is lost during the conversion.
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 |
# File 'lib/stamina/induction/commons.rb', line 107 def ufds2dfa(ufds) Automaton.new(false) do |fa| mergeable_datas = ufds.mergeable_datas mergeable_datas.each do |data| state_data = data.reject {|key,value| [:master, :count, :delta].include?(key)} state_data[:name] = data[:master].to_s state_data[:error] = false fa.add_state(state_data) end mergeable_datas.each do |data| source = fa.get_state(data[:master].to_s) data[:delta].each_pair do |symbol, target| target = fa.get_state(ufds.find(target).to_s) fa.connect(source, target, symbol) end end end end |