Module: Stamina::Induction::Commons

Included in:
RPNI, RedBlue
Defined in:
lib/stamina/induction/commons.rb

Overview

Defines common utilities used by rpni and redblue. About acronyms:

  • pta stands for Prefix Tree Acceptor

  • ufds stands for Union-Find Data Structure

Methods pta2ufds, sample2pta and sample2ufds are simply conversion methods used when the induction algorithm starts (executed on a sample, it first built a pta then convert it to a union find). Method ufds2pta is used when the algorithm ends, to convert refined union find to a dfa.

The merge_user_data method is probably the most important as it actually computes the merging of two states and build information about merging for determinization.

Instance Method Summary collapse

Instance Method Details

#merge_user_data(d1, d2, determinization) ⇒ Object

Merges two user data hashes d1 and d2 according to rules defined below. Also fills a determinization array with pairs of state indices that are reached from d1 and d2 through the same symbol and should be merged for determinization. This method does NOT ensure that those pairs correspond to distinguish states according to the union find. In other words state indices in these pairs do not necessarily corespond to master states (see UnionFind for this term).

Returns the resulting data if the merge is successful (does not lead to merging an error state with an accepting one), nil otherwise.

The merging procedure for the different hash keys is as follows:



146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
# File 'lib/stamina/induction/commons.rb', line 146

def merge_user_data(d1, d2, determinization)
  # we compute flags first
  new_data = {:initial => d1[:initial] || d2[:initial], 
              :accepting => d1[:accepting] || d2[:accepting],
              :error => d1[:error] || d2[:error],
              :master => d1[:master] < d2[:master] ? d1[:master] : d2[:master]}
          
  # merge failure if accepting and error states are merged
  return nil if new_data[:accepting] and new_data[:error]
          
  # we recompute the delta function of the resulting state
  # keeping merging for determinization as pairs in _determinization_
  new_data[:delta] = d1[:delta].merge(d2[:delta]) do |symbol, t1, t2|
    determinization << [t1, t2]
    t1 < t2 ? t1 : t2
  end
  
  # returns merged data
  new_data
end

#pta2ufds(pta) ⇒ Object

Factors and returns a UnionFind data structure from a PTA, keeping natural order of its states for union-find elements. The resulting UnionFind contains a Hash as mergeable user data, presenting the following keys:

  • :initial, :accepting and :error flags of each state

  • :master indicating the index of the state in the PTA

  • :delta a delta function through a Hash => state_index

In this version, other user data attached to PTA states is lost during the conversion.



30
31
32
33
34
35
36
37
38
39
40
41
# File 'lib/stamina/induction/commons.rb', line 30

def pta2ufds(pta)
  Stamina::Induction::UnionFind.new(pta.state_count) do |i|
    state = pta.ith_state(i)
    data = {:initial => state.initial?,
            :accepting => state.accepting?,
            :error => state.error?,
            :master => i,
            :delta => {}}
    state.out_edges.each {|edge| data[:delta][edge.symbol] = edge.target.index}
    data
  end
end

#sample2pta(sample) ⇒ Object

Converts a Sample to an (augmented) prefix tree acceptor. This method ensures that the states of the PTA are in lexical order, according to the <=> operator defined on symbols. States reached by negative strings are tagged as non accepting and error.



49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
# File 'lib/stamina/induction/commons.rb', line 49

def sample2pta(sample)
  Automaton.new do |pta|
    initial_state = add_state(:initial => true, :accepting => false)

    # Fill the PTA with each string
    sample.each do |str|
      # split string using the dfa
      parsed, reached, remaining = pta.dfa_split(str, initial_state)

      # remaining symbols are not empty -> build the PTA
      unless remaining.empty?
        remaining.each do |symbol|
          newone = pta.add_state(:initial => false, :accepting => false, :error => false)
          pta.connect(reached, newone, symbol)
          reached = newone
        end
      end

      # flag state      
      str.positive? ? reached.accepting! : reached.error!
      
      # check consistency, should not arrive as Sample does not allow
      # inconsistencies. Should appear only if _sample_ is not a Sample
      # instance but some other enumerable.
      raise(InconsistencyError, "Inconsistent sample on #{str}", caller)\
        if (reached.error? and reached.accepting?)
    end

    # Reindex states by applying BFS
    to_index, index = [initial_state], 0
    until to_index.empty?
      state = to_index.shift
      state[:__index__] = index
      state.out_edges.sort{|e,f| e.symbol<=>f.symbol}.each {|e| to_index << e.target} 
      index += 1 
    end
    # Force the automaton to reindex
    pta.order_states{|s0,s1| s0[:__index__]<=>s1[:__index__]}
    # Remove marks
    pta.states.each{|s| s.remove_mark(:__index__)}
  end
end

#sample2ufds(sample) ⇒ Object

Converts a Sample instance to a ‘ready to refine’ union find data structure. This method is simply a shortcut for pta2ufds(sample2pta(sample)).



96
97
98
# File 'lib/stamina/induction/commons.rb', line 96

def sample2ufds(sample)
  pta2ufds(sample2pta(sample))
end

#ufds2dfa(ufds) ⇒ Object

Computes the quotient automaton from a refined UnionFind data structure.

In this version, only accepting and initial flags are taken into account when creating quotient automaton states. Other user data is lost during the conversion.



107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
# File 'lib/stamina/induction/commons.rb', line 107

def ufds2dfa(ufds)
  Automaton.new(false) do |fa|
    mergeable_datas = ufds.mergeable_datas
    mergeable_datas.each do |data|
      state_data = data.reject {|key,value| [:master, :count, :delta].include?(key)}
      state_data[:name] = data[:master].to_s
      state_data[:error] = false
      fa.add_state(state_data)
    end
    mergeable_datas.each do |data|
      source = fa.get_state(data[:master].to_s)
      data[:delta].each_pair do |symbol, target|
        target = fa.get_state(ufds.find(target).to_s)
        fa.connect(source, target, symbol)
      end
    end
  end
end