Class: Pocketsphinx::Decoder

Inherits:

Object


Defined in: lib/pocketsphinx/decoder.rb

Constant Summary

Error = Class.new(StandardError)

Instance Attribute Summary

#ps_api ⇒ Object
#ps_decoder ⇒ Object readonly

Returns the value of attribute ps_decoder.

Instance Method Summary

#decode(audio_path_or_file, max_samples = 2048) ⇒ Object

Decode a raw audio stream as a single utterance, opening a file if path given.
#decode_raw(audio_file, max_samples = 2048) ⇒ Object

Decode a raw audio stream as a single utterance.
#end_utterance ⇒ Object

End utterance processing.
#hypothesis ⇒ String

Get the hypothesis string (the path score is not yet returned).
#in_speech? ⇒ Boolean

Checks if the last fed audio buffer contained speech.
#initialize(configuration) ⇒ Decoder constructor

A new instance of Decoder.
#process_raw(buffer, size, no_search = false, full_utt = false) ⇒ Object

Decode raw audio data.
#start_utterance(name = nil) ⇒ Object

Start utterance processing.

Constructor Details

#initialize(configuration) ⇒ `Decoder`

Returns a new instance of Decoder.

# File 'lib/pocketsphinx/decoder.rb', line 8

def initialize(configuration)
  @configuration = configuration
  @ps_decoder = ps_api.ps_init(configuration.ps_config)
end

Instance Attribute Details

#ps_api ⇒ `Object`




# File 'lib/pocketsphinx/decoder.rb', line 96

def ps_api
  @ps_api || API::Pocketsphinx
end
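Because #ps_api falls back to API::Pocketsphinx only when @ps_api is unset, and the attribute is not marked readonly, it appears that a different binding can be assigned, for example a stub in a unit test. The following is a minimal sketch of that fallback pattern only; SketchDecoder and DefaultApi are hypothetical stand-ins, not part of the gem.

```ruby
# Hypothetical default binding, standing in for API::Pocketsphinx.
module DefaultApi
end

# Hypothetical class reproducing only the ps_api fallback pattern.
class SketchDecoder
  attr_writer :ps_api

  # Return the injected binding if one was assigned, else the default.
  def ps_api
    @ps_api || DefaultApi
  end
end

decoder = SketchDecoder.new
default_binding = decoder.ps_api   # nothing assigned yet, so the default is used

stub = Object.new                  # e.g. a test double
decoder.ps_api = stub
injected_binding = decoder.ps_api  # the assigned object now takes precedence
```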

#ps_decoder ⇒ `Object` (readonly)

Returns the value of attribute ps_decoder.




# File 'lib/pocketsphinx/decoder.rb', line 5

def ps_decoder
  @ps_decoder
end

Instance Method Details

#decode(audio_path_or_file, max_samples = 2048) ⇒ `Object`

Decode a raw audio stream as a single utterance, opening a file if path given

See #decode_raw

Parameters:

audio_path_or_file (IO, String) —

The raw audio stream, or the path of a raw audio file, to decode as a single utterance
max_samples (Fixnum) (defaults to: 2048) —

The maximum samples to process from the stream on each iteration

# File 'lib/pocketsphinx/decoder.rb', line 19

def decode(audio_path_or_file, max_samples = 2048)
  case audio_path_or_file
  when String
    File.open(audio_path_or_file, 'rb') { |f| decode_raw(f, max_samples) }
  else
    decode_raw(audio_path_or_file, max_samples)
  end
end
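The dispatch in #decode can be tried without the native library. In this sketch, consume is a hypothetical stand-in for #decode_raw that only counts bytes, and consume_path_or_io mirrors the case statement above: a String is treated as a path and opened in binary mode, anything else is assumed to be IO-like.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical stand-in for decode_raw: reports how many bytes it was given.
def consume(io)
  io.read.bytesize
end

# Same dispatch shape as Decoder#decode.
def consume_path_or_io(path_or_io)
  case path_or_io
  when String
    File.open(path_or_io, 'rb') { |f| consume(f) }
  else
    consume(path_or_io)
  end
end

# An in-memory stream is passed straight through.
bytes_from_io = consume_path_or_io(StringIO.new("\x00\x01\x02\x03"))

# A path is opened first, then handled the same way.
bytes_from_path = Tempfile.create('raw') do |tmp|
  tmp.binmode
  tmp.write("\x00\x01")
  tmp.flush
  consume_path_or_io(tmp.path)
end
```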

#decode_raw(audio_file, max_samples = 2048) ⇒ `Object`

Decode a raw audio stream as a single utterance.

No headers are recognized in these files. The configuration parameters samprate and input_endian determine the sampling rate and endianness of the stream, respectively. Audio is always assumed to be 16-bit signed PCM.

Parameters:

audio_file (IO) —

The raw audio stream to decode as a single utterance
max_samples (Fixnum) (defaults to: 2048) —

The maximum samples to process from the stream on each iteration

# File 'lib/pocketsphinx/decoder.rb', line 36

def decode_raw(audio_file, max_samples = 2048)
  start_utterance

  FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
    while data = audio_file.read(max_samples * 2)
      buffer.write_string(data)
      process_raw(buffer, data.length / 2)
    end
  end

  end_utterance
end
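The read loop above works in bytes while #process_raw is given a count of samples, so each chunk is read as max_samples * 2 bytes and its length is halved. The sketch below reproduces only that arithmetic on an in-memory stream; each_chunk_sample_count is a hypothetical helper, not part of the gem.

```ruby
require 'stringio'

# Hypothetical helper: collects the number of 16-bit samples in each chunk
# read from a raw PCM stream, using the same loop shape as decode_raw.
def each_chunk_sample_count(io, max_samples = 2048)
  counts = []
  # Each sample occupies 2 bytes, so request max_samples * 2 bytes per read.
  # IO#read with a positive length returns nil at end of stream.
  while (data = io.read(max_samples * 2))
    counts << data.bytesize / 2
  end
  counts
end

# 5000 samples of silence packed as 16-bit signed little-endian integers.
pcm = StringIO.new(([0] * 5000).pack('s<*'))
chunk_counts = each_chunk_sample_count(pcm)
```

With the default of 2048 samples per iteration, a 5000-sample stream is split into two full chunks and one partial chunk of 904 samples.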

#end_utterance ⇒ `Object`

End utterance processing

# File 'lib/pocketsphinx/decoder.rb', line 77

def end_utterance
  ps_api.ps_end_utt(@ps_decoder).tap do |result|
    raise Error, "Decoder#end_utterance failed with error code #{result}" if result < 0
  end
end

#hypothesis ⇒ `String`

TODO:

Expand to return path score and utterance ID

Get the hypothesis string. The path score is not yet returned (see the TODO above).

Returns:

(String) —

Hypothesis string




# File 'lib/pocketsphinx/decoder.rb', line 92

def hypothesis
  ps_api.ps_get_hyp(@ps_decoder, nil, nil)
end

#in_speech? ⇒ `Boolean`

Checks if the last fed audio buffer contained speech

Returns:

(Boolean)




# File 'lib/pocketsphinx/decoder.rb', line 84

def in_speech?
  ps_api.ps_get_in_speech(@ps_decoder) != 0
end

#process_raw(buffer, size, no_search = false, full_utt = false) ⇒ `Object`

Decode raw audio data.

Parameters:

buffer (FFI::MemoryPointer) —

Buffer holding 16-bit signed PCM samples, as allocated in #decode_raw
size (Fixnum) —

The number of samples in the buffer to process
no_search (Boolean) (defaults to: false) —

If true, perform feature extraction but don’t do any recognition yet. This may be necessary if your processor has trouble doing recognition in real-time.
full_utt (Boolean) (defaults to: false) —

If true, this block of data is a full utterance worth of data. This may allow the recognizer to produce more accurate results.

Returns:

Number of frames of data searched

# File 'lib/pocketsphinx/decoder.rb', line 57

def process_raw(buffer, size, no_search = false, full_utt = false)
  ps_api.ps_process_raw(@ps_decoder, buffer, size, no_search ? 1 : 0, full_utt ? 1 : 0).tap do |result|
    raise Error, "Decoder#process_raw failed with error code #{result}" if result < 0
  end
end
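#process_raw, #start_utterance and #end_utterance share one error-handling shape: the native return code is passed through tap, and a negative value raises Error. The sketch below isolates that pattern; FakeApi, SketchError and checked_process_raw are hypothetical names, and FakeApi simply returns whatever code it was constructed with.

```ruby
# Hypothetical stand-in for the native binding.
class FakeApi
  def initialize(code)
    @code = code
  end

  # Ignores its arguments and returns the preset code.
  def ps_process_raw(*_args)
    @code
  end
end

# Hypothetical error class, standing in for Pocketsphinx::Decoder::Error.
SketchError = Class.new(StandardError)

# Same shape as Decoder#process_raw: tap yields the result for the check
# and then returns it unchanged, so callers still receive the frame count.
def checked_process_raw(api, buffer, size)
  api.ps_process_raw(nil, buffer, size, 0, 0).tap do |result|
    raise SketchError, "process_raw failed with error code #{result}" if result < 0
  end
end

frames = checked_process_raw(FakeApi.new(12), nil, 0)

message = begin
  checked_process_raw(FakeApi.new(-1), nil, 0)
rescue SketchError => e
  e.message
end
```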

#start_utterance(name = nil) ⇒ `Object`

Start utterance processing.

This function should be called before any utterance data is passed to the decoder. It marks the start of a new utterance and reinitializes internal data structures.

Parameters:

name (String) (defaults to: nil) —

String uniquely identifying this utterance. If nil, one will be created.

# File 'lib/pocketsphinx/decoder.rb', line 70

def start_utterance(name = nil)
  ps_api.ps_start_utt(@ps_decoder, name).tap do |result|
    raise Error, "Decoder#start_utterance failed with error code #{result}" if result < 0
  end
end
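The required call order (start the utterance, feed data, end the utterance) can be illustrated with a recording stub. RecordingDecoder below is hypothetical: it borrows the method names and signatures of Decoder but only logs calls, and the chunks are placeholder arrays rather than real audio buffers.

```ruby
# Hypothetical stub with the same method names as Decoder; it records the
# order of calls instead of talking to the native library.
class RecordingDecoder
  attr_reader :calls

  def initialize
    @calls = []
  end

  def start_utterance(name = nil)
    @calls << :start_utterance
  end

  def process_raw(buffer, size, no_search = false, full_utt = false)
    @calls << :process_raw
  end

  def end_utterance
    @calls << :end_utterance
  end
end

decoder = RecordingDecoder.new
chunks = [[0, 0, 0], [0, 0]] # placeholder sample buffers

decoder.start_utterance                                          # before any data
chunks.each { |chunk| decoder.process_raw(chunk, chunk.length) } # feed each chunk
decoder.end_utterance                                            # after the last chunk

call_order = decoder.calls
```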
