Class: Stamina::Sample

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/stamina-induction/stamina/sample.rb

Overview

A sample is an ordered collection of InputString instances, each labeled as positive or negative (or left unlabeled).

Tips and tricks

  • Loading samples from disk is easy thanks to ADL.

Detailed API


Constructor Details

#initialize(strings = nil) ⇒ Sample

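Creates a new sample. If `strings` is given, each of its elements is appended with Sample#<<; otherwise the sample is empty.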



# File 'lib/stamina-induction/stamina/sample.rb', line 31

def initialize(strings = nil)
  @strings = []
  @size, @positive_count, @negative_count = 0, 0, 0
  strings.each{|s| self << s } unless strings.nil?
end

Instance Attribute Details

#negative_count ⇒ Object (readonly)

Number of negative strings in the sample.



# File 'lib/stamina-induction/stamina/sample.rb', line 20

def negative_count
  @negative_count
end

#positive_count ⇒ Object (readonly)

Number of positive strings in the sample.



# File 'lib/stamina-induction/stamina/sample.rb', line 17

def positive_count
  @positive_count
end

#size ⇒ Object (readonly)

Number of strings in the sample.



# File 'lib/stamina-induction/stamina/sample.rb', line 14

def size
  @size
end

Class Method Details

.[](*args) ⇒ Object

Creates a new sample and appends `args` to it with Sample#<<, which adds each element in turn.



# File 'lib/stamina-induction/stamina/sample.rb', line 26

def self.[](*args) Sample.new << args end

.coerce(arg) ⇒ Object

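Coerces `arg` to a Sample instance: a Sample is returned unchanged, a String is parsed as ADL with Sample.parse, and any other argument raises an ArgumentError.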



# File 'lib/stamina-induction/stamina/sample.rb', line 40

def self.coerce(arg)
  if arg.is_a?(Sample)
    arg
  elsif arg.is_a?(String)
    parse(arg)
  else
    raise ArgumentError, "Invalid argument #{arg} for `Sample`"
  end
end
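The dispatch performed by `coerce` can be illustrated outside the library. The sketch below is self-contained: `MiniSample` is a hypothetical stand-in for Sample, and its toy `parse` (one string per non-empty line) is an assumption replacing the real ADL parser.

```ruby
# MiniSample is a hypothetical stand-in for Stamina::Sample.
class MiniSample
  attr_reader :lines

  def initialize(lines = [])
    @lines = lines
  end

  # Toy parser (an assumption, not ADL): one string per non-empty line.
  def self.parse(text)
    new(text.lines.map(&:strip).reject(&:empty?))
  end

  # Same structure as Sample.coerce: samples pass through unchanged,
  # strings are parsed, anything else is rejected.
  def self.coerce(arg)
    if arg.is_a?(MiniSample)
      arg
    elsif arg.is_a?(String)
      parse(arg)
    else
      raise ArgumentError, "Invalid argument #{arg} for `MiniSample`"
    end
  end
end

s = MiniSample.coerce("+ a b\n- b\n")
s.lines                          # => ["+ a b", "- b"]
MiniSample.coerce(s).equal?(s)   # => true
```

Returning the same object for an existing sample lets callers coerce unconditionally without copying.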

.parse(adl) ⇒ Object

Parses an ADL input and returns the corresponding Sample.



# File 'lib/stamina-induction/stamina/sample.rb', line 53

def self.parse(adl)
  ADL::parse_sample(adl)
end

.to_pta(sample) ⇒ Object

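Converts a Sample to an (augmented) prefix tree acceptor. States of the PTA are numbered in breadth-first order from the initial state, visiting outgoing edges in the order given by the <=> operator on symbols; this is the length-lexicographic order of the strings reaching them. States reached by positive strings are tagged as accepting, and states reached by negative strings as non-accepting and error. An InconsistencyError is raised if a state would be both accepting and error.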



# File 'lib/stamina-induction/stamina/sample.rb', line 235

def self.to_pta(sample)
  thepta = Automaton.new do |pta|
    initial_state = pta.add_state(:initial => true, :accepting => false)

    # Fill the PTA with each string
    sample.each do |str|
      # split string using the dfa
      parsed, reached, remaining = pta.dfa_split(str, initial_state)

      # remaining symbols are not empty -> build the PTA
      unless remaining.empty?
        remaining.each do |symbol|
          newone = pta.add_state(:initial => false, :accepting => false, :error => false)
          pta.connect(reached, newone, symbol)
          reached = newone
        end
      end

      # flag state
      str.positive? ? reached.accepting! : reached.error!

      # check consistency: a state cannot be both accepting and error. This
      # happens if the same string occurs with both labels (Sample#<< does
      # not currently reject this) or if _sample_ is some other enumerable.
      raise(InconsistencyError, "Inconsistent sample on #{str}", caller)\
        if (reached.error? and reached.accepting?)
    end

    # Reindex states by applying BFS
    to_index, index = [initial_state], 0
    until to_index.empty?
      state = to_index.shift
      state[:__index__] = index
      state.out_edges.sort{|e,f| e.symbol<=>f.symbol}.each{|e| to_index << e.target}
      index += 1
    end
  end

  # Now we rebuild a fresh one with states in order.
  # This looks more efficient than reordering the states of the PTA.
  Automaton.new do |ordered|
    ordered.add_n_states(thepta.state_count)
    thepta.each_state do |pta_state|
      source = ordered.ith_state(pta_state[:__index__])
      source.initial!   if pta_state.initial?
      source.accepting! if pta_state.accepting?
      source.error!     if pta_state.error?
      pta_state.out_edges.each do |e|
        target = ordered.ith_state(e.target[:__index__])
        ordered.connect(source, target, e.symbol)
      end
    end
  end

end
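The two phases of `to_pta` (build a prefix tree, then renumber its states breadth-first with edges sorted by symbol) can be sketched on plain Ruby data. This is an illustrative reimplementation, not the library's code: `build_pta` is a hypothetical helper, strings are assumed to be `[symbols, positive]` pairs, and the automaton is an array of edge hashes plus an array of state labels instead of a Stamina::Automaton.

```ruby
# Hypothetical helper mirroring the two phases of Sample.to_pta.
# Input: [symbols, positive] pairs. Output: [edges, labels], where
# edges[i] maps a symbol to a target state and labels[i] is
# :accepting, :error or nil.
def build_pta(strings)
  # Phase 1: prefix tree, states numbered in creation order (0 = initial).
  edges  = [{}]
  labels = [nil]
  strings.each do |symbols, positive|
    state = 0
    symbols.each do |sym|
      state = (edges[state][sym] ||= begin
        edges << {}
        labels << nil
        edges.size - 1
      end)
    end
    label = positive ? :accepting : :error
    # The library raises InconsistencyError here; ArgumentError keeps
    # this sketch self-contained.
    if labels[state] && labels[state] != label
      raise ArgumentError, "Inconsistent sample on #{symbols.inspect}"
    end
    labels[state] = label
  end

  # Phase 2: breadth-first search from the initial state, visiting
  # out-edges sorted by symbol, assigns the final state indices.
  index = {}
  queue = [0]
  until queue.empty?
    s = queue.shift
    index[s] = index.size
    edges[s].sort.each { |_sym, target| queue << target }
  end

  # Rebuild edges and labels under the new numbering.
  new_edges  = Array.new(edges.size) { {} }
  new_labels = Array.new(labels.size)
  edges.each_with_index do |out, s|
    new_labels[index[s]] = labels[s]
    out.each { |sym, target| new_edges[index[s]][sym] = index[target] }
  end
  [new_edges, new_labels]
end

edges, labels = build_pta([[%w[b], true], [%w[a b], false], [%w[a], true]])
# States are now ordered by the strings reaching them: "", "a", "b", "a b".
edges == [{ "a" => 1, "b" => 2 }, { "b" => 3 }, {}, {}]   # => true
labels                                                    # => [nil, :accepting, :accepting, :error]
```

Even though "b" was inserted first, it ends up after "a" because the breadth-first pass visits edges in symbol order.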

Instance Method Details

#+(other) ⇒ Object

Returns a new sample containing all strings of `self` followed by all strings of `other`. Duplicates are kept, since Sample#<< does not remove them.



# File 'lib/stamina-induction/stamina/sample.rb', line 116

def +(other)
  s = Sample.new
  each{|x| s << x}
  other.each{|x| s << x}
  s
end

#<<(str) ⇒ Object

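Adds a string to the sample and returns the sample itself, so calls can be chained. The str argument may be an InputString instance, a String (parsed using ADL), a Sample instance (all of its strings are added) or an Array (each element is added recursively).

Raises an ArgumentError if the str argument is not recognized. The check that would raise an InconsistencyError when the same string already exists with the opposite label is commented out in the source below, so such strings are currently accepted; Sample.to_pta still detects the conflict.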



# File 'lib/stamina-induction/stamina/sample.rb', line 73

def <<(str)
  case str
    when InputString
      #raise(InconsistencyError, "Inconsistent sample on #{str}", caller) if self.include?(str.negate)
      @size += 1
      str.positive? ? (@positive_count += 1) : (@negative_count += 1)
      @strings << str
    when String
      self << ADL::parse_string(str)
    when Sample
      str.each {|s| self << s}
    when Array
      str.each {|s| self << s}
    else
      raise(ArgumentError, "#{str} is not a valid argument.", caller)
  end
  self
end
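The type dispatch and label counting in `<<` can be reproduced in a few lines. In this self-contained sketch, `LabeledString` and `MiniSample` are hypothetical stand-ins for InputString and Sample, and the String/ADL branch is left out.

```ruby
# Hypothetical stand-in for Stamina::InputString.
LabeledString = Struct.new(:symbols, :positive) do
  def positive?
    positive
  end
end

# Hypothetical stand-in for Stamina::Sample with the dispatch of
# Sample#<< (the String/ADL branch is omitted).
class MiniSample
  include Enumerable
  attr_reader :size, :positive_count, :negative_count

  def initialize
    @strings = []
    @size = @positive_count = @negative_count = 0
  end

  def each(&block)
    @strings.each(&block)
  end

  def <<(str)
    case str
    when LabeledString
      @size += 1
      str.positive? ? (@positive_count += 1) : (@negative_count += 1)
      @strings << str
    when MiniSample, Array
      str.each { |s| self << s }   # add every element, recursively
    else
      raise ArgumentError, "#{str} is not a valid argument."
    end
    self                           # returning self allows chaining
  end
end

a = LabeledString.new(%w[a], true)
b = LabeledString.new(%w[b], false)
s = MiniSample.new << a << [b, [a]]
[s.size, s.positive_count, s.negative_count]   # => [3, 2, 1]
```

Nested arrays are flattened by the recursion, and the counters are updated only in the leaf case, so they always match the stored strings.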

#==(other) ⇒ Object Also known as: eql?

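Compares with another sample `other`, which is required to be a Sample instance. Returns true if the two samples contain the same strings (including labels), false otherwise. Since the comparison is mutual inclusion, the order of the strings and repeated strings are ignored.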



# File 'lib/stamina-induction/stamina/sample.rb', line 128

def ==(other)
  include?(other) and other.include?(self)
end
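Because `==` is defined as mutual inclusion, it behaves like set equality. A minimal sketch of that semantics, using hypothetical helpers over plain arrays of `[string, label]` pairs instead of InputString instances:

```ruby
# Hypothetical helpers: arrays of [string, label] pairs stand in for
# samples, mirroring include?(other) and other.include?(self).
def includes_all?(mine, theirs)
  theirs.all? { |str| mine.include?(str) }
end

def same_sample?(x, y)
  includes_all?(x, y) && includes_all?(y, x)
end

x = [["a b", :positive], ["b", :negative]]
y = [["b", :negative], ["a b", :positive], ["b", :negative]]
same_sample?(x, y)                                    # => true
same_sample?([["b", :positive]], [["b", :negative]])  # => false
```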

#correctly_classified_by?(classifier) ⇒ Boolean

Checks whether the sample is correctly classified by a given classifier (expected to include the Stamina::Classifier module). Unlabeled strings are ignored.

Returns:

  • (Boolean)


# File 'lib/stamina-induction/stamina/sample.rb', line 193

def correctly_classified_by?(classifier)
  classifier.correctly_classify?(self)
end

#each ⇒ Object

Yields each string of the sample to the block, in insertion order. This method has no effect if no block is given.



# File 'lib/stamina-induction/stamina/sample.rb', line 144

def each
  return unless block_given?
  @strings.each {|str| yield str}
end

#each_negative ⇒ Object

Yields each negative string to the block. This method has no effect if no block is given.



# File 'lib/stamina-induction/stamina/sample.rb', line 173

def each_negative
  return unless block_given?
  each {|str| yield str if str.negative?}
end
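A `return unless block_given?` guard, as in each_positive, is what makes such a filtering method a no-op without a block: the inner `each` always receives a block, so a missing caller block only surfaces when `yield` runs on a matching string, raising LocalJumpError. A self-contained sketch with a hypothetical `Filtered` class, where negative integers play the role of negative strings:

```ruby
# Hypothetical class: negative integers play the role of negative strings.
class Filtered
  def initialize(items)
    @items = items
  end

  def each
    return unless block_given?
    @items.each { |x| yield x }
  end

  # No guard: each always receives a block, so the missing caller block
  # only surfaces when yield runs on a matching item.
  def each_negative_unguarded
    each { |x| yield x if x < 0 }
  end

  # Guarded, like each_positive: a no-op without a block.
  def each_negative
    return unless block_given?
    each { |x| yield x if x < 0 }
  end
end

f = Filtered.new([1, -2, 3])
f.each_negative                  # => nil
begin
  f.each_negative_unguarded
rescue LocalJumpError => e
  puts "unguarded: #{e.class}"   # prints "unguarded: LocalJumpError"
end
```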

#each_positive ⇒ Object

Yields each positive string to the block. This method has no effect if no block is given.



# File 'lib/stamina-induction/stamina/sample.rb', line 153

def each_positive
  return unless block_given?
  each {|str| yield str if str.positive?}
end

#empty? ⇒ Boolean

Returns true if this sample does not contain any string, false otherwise.

Returns:

  • (Boolean)


# File 'lib/stamina-induction/stamina/sample.rb', line 61

def empty?()
  @size==0
end

#hash ⇒ Object

Computes a hash code for this sample. The hash codes of the strings are added, so the result does not depend on their order.



# File 'lib/stamina-induction/stamina/sample.rb', line 136

def hash
  self.inject(37){|memo,str| memo + 17*str.hash}
end
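Since addition is commutative, any ordering of the same strings yields the same hash code. A minimal sketch of the same combination scheme, using a hypothetical `sample_hash` helper over arbitrary elements:

```ruby
# Hypothetical helper using the combination scheme of Sample#hash.
def sample_hash(strings)
  strings.inject(37) { |memo, str| memo + 17 * str.hash }
end

a = ["a b", :positive]
b = ["b", :negative]
sample_hash([a, b]) == sample_hash([b, a])   # => true
sample_hash([])                              # => 37
```

Each repeated string still adds its own term, whereas `==` ignores repetitions, so two samples that compare equal can get different hash codes when only one of them contains a duplicate.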

#include?(str) ⇒ Boolean

Returns true if a given string is included in the sample, false otherwise. The str argument accepts the same kinds of values as <<; for an Array or a Sample, true is returned only if every element is included.

Returns:

  • (Boolean)


# File 'lib/stamina-induction/stamina/sample.rb', line 96

def include?(str)
  case str
    when InputString
      @strings.include?(str)
    when String
      include?(ADL::parse_string(str))
    when Array
      str.each {|s| return false unless include?(s)}
      true
    when Sample
      str.each {|s| return false unless include?(s)}
      true
    else
      raise(ArgumentError, "#{str} is not a valid argument.", caller)
  end
end

#negative_enumerator ⇒ Object

Returns an enumerator on negative strings.



# File 'lib/stamina-induction/stamina/sample.rb', line 180

def negative_enumerator
  # Object#enum_for is available from Ruby 1.8.7 on, including recent
  # versions that no longer accept Enumerator.new(self, :each_negative).
  enum_for(:each_negative)
end

#positive_enumerator ⇒ Object

Returns an enumerator on positive strings.



# File 'lib/stamina-induction/stamina/sample.rb', line 161

def positive_enumerator
  # Object#enum_for is available from Ruby 1.8.7 on, including recent
  # versions that no longer accept Enumerator.new(self, :each_positive).
  enum_for(:each_positive)
end
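An enumerator over a filtered iteration method can also be obtained with Object#enum_for, which needs no Ruby version check. A self-contained sketch, where `Labeled` is a hypothetical class over `[string, positive]` pairs:

```ruby
# Hypothetical class over [string, positive] pairs.
class Labeled
  def initialize(pairs)
    @pairs = pairs
  end

  def each_positive
    return unless block_given?
    @pairs.each { |str, pos| yield str if pos }
  end

  # Counterpart of positive_enumerator built with Object#enum_for:
  # iterating the enumerator calls each_positive with a block.
  def positive_enumerator
    enum_for(:each_positive)
  end
end

e = Labeled.new([["a", true], ["b", false], ["c", true]]).positive_enumerator
e.to_a   # => ["a", "c"]
e.next   # => "a"
```

Because the enumerator is lazy, the underlying pairs are only filtered when it is iterated, and every Enumerable method (`map`, `select`, `first`) is available on it.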

#signature ⇒ Object

Computes and returns the binary signature of the sample: a String with one character per string, in sample order. A ‘1’ is used for positive strings, ‘0’ for negative ones and ‘?’ for unlabeled ones.



# File 'lib/stamina-induction/stamina/sample.rb', line 202

def signature
  signature = ''
  each do |str|
    signature << (str.unlabeled? ? '?' : str.positive? ? '1' : '0')
  end
  signature
end
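The same mapping from labels to characters, as a self-contained sketch over hypothetical `[symbols, label]` pairs (label `true`, `false` or `nil` for positive, negative and unlabeled):

```ruby
# Hypothetical helper: label is true (positive), false (negative)
# or nil (unlabeled).
def signature(strings)
  strings.map { |_symbols, label| label.nil? ? '?' : (label ? '1' : '0') }.join
end

signature([[%w[a], true], [%w[b], false], [%w[a b], nil]])   # => "10?"
```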

#take(proportion = 0.5) ⇒ Object

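Randomly selects approximately the given proportion of this sample and returns it as a new Sample. Each positive and negative string is kept independently with probability `proportion` (using Kernel.rand), so the size of the result varies between calls; positive strings are added before negative ones.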



# File 'lib/stamina-induction/stamina/sample.rb', line 213

def take(proportion = 0.5)
  taken = Stamina::Sample.new
  each_positive{|s| taken << s if Kernel.rand < proportion}
  each_negative{|s| taken << s if Kernel.rand < proportion}
  taken
end
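`take` performs independent Bernoulli sampling, so the result holds `proportion * size` strings only on average. The sketch below is a hypothetical variant, `take_proportion`, that receives the random generator as a parameter (the original calls Kernel.rand directly); a seeded Random then makes the selection reproducible, which can be convenient in tests.

```ruby
# Hypothetical variant of Sample#take over plain arrays: every item is
# kept independently with probability `proportion`, positives first.
def take_proportion(positives, negatives, proportion = 0.5, rng = Random.new)
  positives.select { rng.rand < proportion } +
    negatives.select { rng.rand < proportion }
end

pos = (1..1000).to_a
neg = (-1000..-1).to_a
taken = take_proportion(pos, neg, 0.3, Random.new(42))
taken.size   # close to 600 on average; the exact value depends on the seed
```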

#to_adl(buffer = "") ⇒ Object Also known as: to_s, inspect

Appends an ADL description of this sample to `buffer`, one string per line (each preceded by a newline), and returns the buffer.



# File 'lib/stamina-induction/stamina/sample.rb', line 223

def to_adl(buffer="")
  self.inject(buffer) {|memo,str| memo << "\n" << str.to_adl}
end
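The `inject` idiom used by `to_adl` threads a single buffer through every step. A self-contained sketch, where the hypothetical `append_lines` helper takes plain strings standing for the results of `str.to_adl`:

```ruby
# Hypothetical helper: `lines` stands for the str.to_adl results.
def append_lines(lines, buffer = String.new)
  # String#<< mutates and returns its receiver, so inject passes the
  # same buffer to every step and finally returns it.
  lines.inject(buffer) { |memo, line| memo << "\n" << line }
end

append_lines(%w[x y])   # => "\nx\ny"
```

Since IO#<< also returns its receiver, this helper would equally accept an IO such as `$stdout` as the buffer.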

#to_cdfa ⇒ Object

Converts this sample to a canonical DFA, by first building its PTA.



# File 'lib/stamina-induction/stamina/sample.rb', line 299

def to_cdfa
  to_pta.to_cdfa
end

#to_dot ⇒ Object

Returns a DOT (Graphviz) description of the PTA of this sample.



# File 'lib/stamina-induction/stamina/sample.rb', line 304

def to_dot
  to_pta.to_dot
end

#to_pta ⇒ Object Also known as: to_fa, to_dfa

Converts this sample to a prefix tree acceptor (PTA); see Sample.to_pta.



# File 'lib/stamina-induction/stamina/sample.rb', line 292

def to_pta
  Sample.to_pta(self)
end