Class: Stamina::Induction::RPNI

Inherits:
Object
  • Object
show all
Includes:
Commons
Defined in:
lib/stamina-induction/stamina/induction/rpni.rb

Overview

Implementation of the standard Regular Positive and Negative Induction (RPNI) algorithm. From a given sample, containing positive and negative strings, RPNI computes the smallest deterministic automaton compatible with the sample.

See J. Oncina and P. Garcia, Infering Regular Languages in Polynomial Update Time, In N. Perez de la Blanca, A. Sanfeliu and E. Vidal, editors, Pattern Recognition and Image Analysis, volume 1 of Series in Machines Perception and Artificial Intelligence, pages 49-61, World Scientific, 1992.

Example:

# sample typically comes from an ADL file
sample = Stamina::ADL.parse_sample_file('sample.adl')

# let RPNI build the smallest dfa
dfa = Stamina::Induction::RPNI.execute(sample, {:verbose => true})

Remarks:

  • Constructor and instance methods of this class are public but not intended to be used directly. They are left public for testing purposes only.

  • This class intensively uses the Stamina::Induction::UnionFind class and methods defined in the Stamina::Induction::Commons module which are worth reading to understand the algorithm implementation.

Constant Summary

Constants included from Commons

Commons::DEFAULT_OPTIONS

Instance Attribute Summary collapse

Attributes included from Commons

#options

Class Method Summary collapse

Instance Method Summary collapse

Methods included from Commons

#info, #merge_user_data, #pta2ufds, #sample2pta, #sample2ufds, #ufds2dfa, #verbose?, #verbose_io

Constructor Details

#initialize(options = {}) ⇒ RPNI

Creates an algorithm instance with given options.

Raises:

  • (ArgumentError)


35
36
37
38
# File 'lib/stamina-induction/stamina/induction/rpni.rb', line 35

def initialize(options={})
  raise ArgumentError, "Invalid options #{options.inspect}" unless options.is_a?(Hash)
  @options = DEFAULT_OPTIONS.merge(options)
end

Instance Attribute Details

#ufdsObject (readonly)

Union-find data structure used internally



32
33
34
# File 'lib/stamina-induction/stamina/induction/rpni.rb', line 32

def ufds
  @ufds
end

Class Method Details

.execute(sample, options = {}) ⇒ Object

Build the smallest DFA compatible with the sample given as input.

Options (the options hash):

  • :verbose can be set to true to trace algorithm execution on standard output.

Preconditions:

  • The sample is consistent (does not contains the same string both labeled as positive and negative) and contains at least one string.

Postconditions:

  • The returned DFA is the smallest DFA that correctly labels the learning sample given as input.



179
180
181
# File 'lib/stamina-induction/stamina/induction/rpni.rb', line 179

def self.execute(sample, options={})
  RPNI.new(options).execute(sample)
end

Instance Method Details

#execute(sample) ⇒ Object

Build the smallest DFA compatible with the sample given as input.

Preconditions:

  • The sample is consistent (does not contains the same string both labeled as positive and negative) and contains at least one string.

Postconditions:

  • The returned DFA is the smallest DFA that correctly labels the learning sample given as input.

Remarks:

  • This instance version of RPNI.execute is not intended to be used directly and is mainly provided for testing purposes. Please use the class variant of this method if possible.



155
156
157
158
159
160
161
162
163
# File 'lib/stamina-induction/stamina/induction/rpni.rb', line 155

def execute(sample)
  # create union-find
  info("Creating PTA and UnionFind structure")
  ufds = sample2ufds(sample)
  # refine it
  ufds = main(ufds)
  # compute and return quotient automaton
  ufds2dfa(ufds)
end

#main(ufds) ⇒ Object

Main method of the algorithm. Refines the union find passed as first argument by merging well chosen state pairs. Returns the refined union find.

Preconditions:

  • The union find ufds is correctly initialized (contains :initial, :accepting, and :error boolean flags as well as a :delta sub hash)

Postconditions:

  • The union find has been refined. It encodes a quotient automaton (of the PTA it comes from) such that all positive and negative strings of the underlying sample are correctly classified by it.



117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
# File 'lib/stamina-induction/stamina/induction/rpni.rb', line 117

def main(ufds)
  @ufds = ufds
  info("Starting RPNI (#{@ufds.size} states)")
  # First loop, iterating all PTA states
  (1...@ufds.size).each do |i|
    # we ignore those that have been previously merged
    next if @ufds.slave?(i)
    # second loop: states of lower rank, with ignore
    (0...i).each do |j|
      next if @ufds.slave?(j)
      # try to merge this pair, including determinization
      # simply break the loop if it works!
      success = successfull_merge_or_nothing(i,j)
      if success
        info("#{i} and #{j} successfully merged")
        break
      end
    end # j loop
  end # i loop
  @ufds
end

#merge_and_determinize(i, j) ⇒ Object

Merges a state of rank j with a state of lower rank i. This merge method includes merging for determinization.

Preconditions:

  • States denoted by i and j are expected leader states (non merged ones)

  • States denoted by i and j are expected to be different

Postconditions:

  • Union find is refined, states i and j having been merged, as well as all state pairs that need to be merged to ensure the deterministic property of the quotient automaton.

  • If the resulting quotient automaton is consistent with the negative sample, this method returns true and the refined union-find correctly encodes the quotient automaton. Otherwise, the method returns false and the union-find information must be considered inaccurate.



57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
# File 'lib/stamina-induction/stamina/induction/rpni.rb', line 57

def merge_and_determinize(i, j)
  # Make the union (keep additional merges to be performed in determinization)
  # and recompute the user data attached to the new state group (new_data)
  determinization = []
  @ufds.union(i, j) do |d1, d2|
    new_data = merge_user_data(d1, d2, determinization)
    return false unless new_data
    new_data
  end

  # Merge for determinization
  determinization.each do |pair|
    # we take the leader states of the pair to merge
    pair = pair.collect{|i| @ufds.find(i)}
    # do nothing if already the same leader state
    next if pair[0]==pair[1]
    # otherwise recurse or fail
    return false unless merge_and_determinize(pair[0], pair[1])
  end

  # Everything seems ok!
  true
end

#successfull_merge_or_nothing(i, j) ⇒ Object

Makes a complete merge (including determinization), or simply do nothing if it leads accepting a negative string.

Preconditions:

  • States denoted by i and j are expected leader states (non merged ones)

  • States denoted by i and j are expected to be different

Postconditions:

  • Union find is refined, states i and j having been merged, as well as all state pairs that need to be merged to ensure the deterministic property of the quotient automaton.

  • If the resulting quotient automaton is consistent with the negative sample, this method returns true and the refined union-find correctly encodes the quotient automaton. Otherwise, the union find has not been changed.



97
98
99
100
101
102
# File 'lib/stamina-induction/stamina/induction/rpni.rb', line 97

def successfull_merge_or_nothing(i,j)
  # try a merge and determinize inside a transaction on the ufds
  @ufds.transactional do
    merge_and_determinize(i, j)
  end
end