Class: Stamina::Induction::RPNI

Inherits:
Object
  • Object
show all
Includes:
Commons
Defined in:
lib/stamina/induction/rpni.rb

Overview

Implementation of the standard Regular Positive and Negative Induction (RPNI) algorithm. From a given sample, containing positive and negative strings, RPNI computes the smallest deterministic automaton compatible with the sample.

See J. Oncina and P. Garcia, Infering Regular Languages in Polynomial Update Time, In N. Perez de la Blanca, A. Sanfeliu and E. Vidal, editors, Pattern Recognition and Image Analysis, volume 1 of Series in Machines Perception and Artificial Intelligence, pages 49-61, World Scientific, 1992.

Example:

# sample typically comes from an ADL file
sample = Stamina::ADL.parse_sample_file('sample.adl')

# let RPNI build the smallest dfa
dfa = Stamina::Induction::RPNI.execute(sample, {:verbose => true})

Remarks:

  • Constructor and instance methods of this class are public but not intended to be used directly. They are left public for testing purposes only.

  • This class intensively uses the Stamina::Induction::UnionFind class and methods defined in the Stamina::Induction::Commons module which are worth reading to understand the algorithm implementation.

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Methods included from Commons

#merge_user_data, #pta2ufds, #sample2pta, #sample2ufds, #ufds2dfa

Constructor Details

#initialize(options = {}) ⇒ RPNI

Creates an algorithm instance with given options.



38
39
40
# File 'lib/stamina/induction/rpni.rb', line 38

def initialize(options={})
  @options = options
end

Instance Attribute Details

#optionsObject (readonly)

Additional options of the algorithm



35
36
37
# File 'lib/stamina/induction/rpni.rb', line 35

def options
  @options
end

#ufdsObject (readonly)

Union-find data structure used internally



32
33
34
# File 'lib/stamina/induction/rpni.rb', line 32

def ufds
  @ufds
end

Class Method Details

.execute(sample, options = {}) ⇒ Object

Build the smallest DFA compatible with the sample given as input.

Options (the options hash):

  • :verbose can be set to true to trace algorithm execution on standard output.

Preconditions:

  • The sample is consistent (does not contains the same string both labeled as positive and negative) and contains at least one string.

Postconditions:

  • The returned DFA is the smallest DFA that correctly labels the learning sample given as input.



181
182
183
# File 'lib/stamina/induction/rpni.rb', line 181

def self.execute(sample, options={})
  RPNI.new(options).execute(sample)
end

Instance Method Details

#execute(sample) ⇒ Object

Build the smallest DFA compatible with the sample given as input.

Preconditions:

  • The sample is consistent (does not contains the same string both labeled as positive and negative) and contains at least one string.

Postconditions:

  • The returned DFA is the smallest DFA that correctly labels the learning sample given as input.

Remarks:

  • This instance version of RPNI.execute is not intended to be used directly and is mainly provided for testing purposes. Please use the class variant of this method if possible.



157
158
159
160
161
162
163
164
165
# File 'lib/stamina/induction/rpni.rb', line 157

def execute(sample)
  # create union-find
  puts "Creating PTA and UnionFind structure" if @options[:verbose]
  ufds = sample2ufds(sample)
  # refine it
  ufds = main(ufds)
  # compute and return quotient automaton
  ufds2dfa(ufds)
end

#main(ufds) ⇒ Object

Main method of the algorithm. Refines the union find passed as first argument by merging well chosen state pairs. Returns the refined union find.

Preconditions:

  • The union find ufds is correctly initialized (contains :initial, :accepting, and :error boolean flags as well as a :delta sub hash)

Postconditions:

  • The union find has been refined. It encodes a quotient automaton (of the PTA it comes from) such that all positive and negative strings of the underlying sample are correctly classified by it.



119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
# File 'lib/stamina/induction/rpni.rb', line 119

def main(ufds)
  @ufds = ufds
  puts "Starting RPNI (#{@ufds.size} states)" if @options[:verbose]
  # First loop, iterating all PTA states
  (1...@ufds.size).each do |i|
    # we ignore those that have been previously merged
    next if @ufds.slave?(i) 
    # second loop: states of lower rank, with ignore
    (0...i).each do |j|
      next if @ufds.slave?(j)
      # try to merge this pair, including determinization
      # simply break the loop if it works!
      success = successfull_merge_or_nothing(i,j)
      if success
        puts "#{i} and #{j} successfully merged" if @options[:verbose]
        break
      end
    end # j loop
  end # i loop
  @ufds
end

#merge_and_determinize(i, j) ⇒ Object

Merges a state of rank j with a state of lower rank i. This merge method includes merging for determinization.

Preconditions:

  • States denoted by i and j are expected leader states (non merged ones)

  • States denoted by i and j are expected to be different

Postconditions:

  • Union find is refined, states i and j having been merged, as well as all state pairs that need to be merged to ensure the deterministic property of the quotient automaton.

  • If the resulting quotient automaton is consistent with the negative sample, this method returns true and the refined union-find correctly encodes the quotient automaton. Otherwise, the method returns false and the union-find information must be considered inaccurate.



59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
# File 'lib/stamina/induction/rpni.rb', line 59

def merge_and_determinize(i, j)
  # Make the union (keep additional merges to be performed in determinization)
  # and recompute the user data attached to the new state group (new_data)
  determinization = []
  @ufds.union(i, j) do |d1, d2|
    new_data = merge_user_data(d1, d2, determinization)
    return false unless new_data
    new_data
  end
  
  # Merge for determinization
  determinization.each do |pair|
    # we take the leader states of the pair to merge
    pair = pair.collect{|i| @ufds.find(i)}
    # do nothing if already the same leader state
    next if pair[0]==pair[1]
    # otherwise recurse or fail
    return false unless merge_and_determinize(pair[0], pair[1]) 
  end
  
  # Everything seems ok!
  true
end

#successfull_merge_or_nothing(i, j) ⇒ Object

Makes a complete merge (including determinization), or simply do nothing if it leads accepting a negative string.

Preconditions:

  • States denoted by i and j are expected leader states (non merged ones)

  • States denoted by i and j are expected to be different

Postconditions:

  • Union find is refined, states i and j having been merged, as well as all state pairs that need to be merged to ensure the deterministic property of the quotient automaton.

  • If the resulting quotient automaton is consistent with the negative sample, this method returns true and the refined union-find correctly encodes the quotient automaton. Otherwise, the union find has not been changed.



99
100
101
102
103
104
# File 'lib/stamina/induction/rpni.rb', line 99

def successfull_merge_or_nothing(i,j)
  # try a merge and determinize inside a transaction on the ufds
  @ufds.transactional do
    merge_and_determinize(i, j)
  end
end