Module: Stamina::Induction::Commons

Included in:
BlueFringe, RPNI
Defined in:
lib/stamina-induction/stamina/induction/commons.rb

Overview

Defines common utilities used by rpni and blue_fringe. About acronyms:

  • pta stands for Prefix Tree Acceptor

  • ufds stands for Union-Find Data Structure

Methods pta2ufds and sample2ufds are simply conversion methods used when the induction algorithm starts (executed on a sample, it first built a pta then convert it to a union find). Method ufds2dfa is used when the algorithm ends, to convert refined union find to a dfa.

The merge_user_data method is probably the most important as it actually computes the merging of two states and build information about merging for determinization.

Constant Summary collapse

DEFAULT_OPTIONS =
{
  :verbose    => false,
  :verbose_io => $stderr
}

Instance Attribute Summary collapse

Instance Method Summary collapse

Instance Attribute Details

#optionsObject (readonly)

Additional options of the algorithm



25
26
27
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 25

def options
  @options
end

Instance Method Details

#info(msg) ⇒ Object

Display an information message (when verbose)



37
38
39
40
41
42
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 37

def info(msg)
  if verbose?
    verbose_io << msg << "\n"
    verbose_io.flush
  end
end

#merge_user_data(d1, d2, determinization) ⇒ Object

Merges two user data hashes d1 and d2 according to rules defined below. Also fills a determinization array with pairs of state indices that are reached from d1 and d2 through the same symbol and should be merged for determinization. This method does NOT ensure that those pairs correspond to distinguish states according to the union find. In other words state indices in these pairs do not necessarily corespond to master states (see UnionFind for this term).

Returns the resulting data if the merge is successful (does not lead to merging an error state with an accepting one), nil otherwise.

The merging procedure for the different hash keys is as follows:



132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 132

def merge_user_data(d1, d2, determinization)
  # we compute flags first
  new_data = {:initial => d1[:initial] || d2[:initial],
              :accepting => d1[:accepting] || d2[:accepting],
              :error => d1[:error] || d2[:error],
              :master => d1[:master] < d2[:master] ? d1[:master] : d2[:master]}

  # merge failure if accepting and error states are merged
  return nil if new_data[:accepting] and new_data[:error]

  # we recompute the delta function of the resulting state
  # keeping merging for determinization as pairs in _determinization_
  new_data[:delta] = d1[:delta].merge(d2[:delta]) do |symbol, t1, t2|
    determinization << [t1, t2]
    t1 < t2 ? t1 : t2
  end

  # returns merged data
  new_data
end

#pta2ufds(pta) ⇒ Object

Factors and returns a UnionFind data structure from a PTA, keeping natural order of its states for union-find elements. The resulting UnionFind contains a Hash as mergeable user data, presenting the following keys:

  • :initial, :accepting and :error flags of each state

  • :master indicating the index of the state in the PTA

  • :delta a delta function through a Hash => state_index

In this version, other user data attached to PTA states is lost during the conversion.



55
56
57
58
59
60
61
62
63
64
65
66
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 55

def pta2ufds(pta)
  Stamina::Induction::UnionFind.new(pta.state_count) do |i|
    state = pta.ith_state(i)
    data = {:initial => state.initial?,
            :accepting => state.accepting?,
            :error => state.error?,
            :master => i,
            :delta => {}}
    state.out_edges.each {|edge| data[:delta][edge.symbol] = edge.target.index}
    data
  end
end

#sample2pta(sample) ⇒ Object

Converts a Sample to an (augmented) prefix tree acceptor. This method ensures that the states of the PTA are in lexical order, according to the <=> operator defined on symbols. States reached by negative strings are tagged as non accepting and error.



74
75
76
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 74

def sample2pta(sample)
  sample.to_pta
end

#sample2ufds(sample) ⇒ Object

Converts a Sample instance to a ‘ready to refine’ union find data structure. This method is simply a shortcut for pta2ufds(sample2pta(sample)).



82
83
84
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 82

def sample2ufds(sample)
  pta2ufds(sample2pta(sample))
end

#ufds2dfa(ufds) ⇒ Object

Computes the quotient automaton from a refined UnionFind data structure.

In this version, only accepting and initial flags are taken into account when creating quotient automaton states. Other user data is lost during the conversion.



93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 93

def ufds2dfa(ufds)
  Automaton.new(false) do |fa|
    mergeable_datas = ufds.mergeable_datas
    mergeable_datas.each do |data|
      state_data = data.reject {|key,value| [:master, :count, :delta].include?(key)}
      state_data[:name] = data[:master].to_s
      state_data[:error] = false
      fa.add_state(state_data)
    end
    mergeable_datas.each do |data|
      source = fa.get_state(data[:master].to_s)
      data[:delta].each_pair do |symbol, target|
        target = fa.get_state(ufds.find(target).to_s)
        fa.connect(source, target, symbol)
      end
    end
  end
end

#verbose?Boolean

Is the verbose mode on ?

Returns:

  • (Boolean)


28
29
30
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 28

def verbose?
  @verbose ||= !!options[:verbose]
end

#verbose_ioObject



32
33
34
# File 'lib/stamina-induction/stamina/induction/commons.rb', line 32

def verbose_io
  @verbose_io ||= options[:verbose_io] || $stderr
end