Class: Pocketsphinx::Decoder

Inherits: Object

Includes: API::CallHelpers

Defined in: lib/pocketsphinx/decoder.rb

Defined Under Namespace

Classes: Hypothesis, Word

Instance Attribute Summary

#configuration ⇒ Object

Returns the value of attribute configuration.
#ps_api ⇒ Object

Instance Method Summary

#decode(audio_path_or_file, max_samples = 2048) ⇒ Object

Decode a raw audio stream as a single utterance, opening a file if path given.
#decode_raw(audio_file, max_samples = 2048) ⇒ Object

Decode a raw audio stream as a single utterance.
#end_utterance ⇒ Object

End utterance processing.
#get_search ⇒ Object

Returns the name of the current search in the decoder.
#hypothesis ⇒ Hypothesis

Get the hypothesis string (with its path score and posterior probability).
#in_speech? ⇒ Boolean

Checks whether the last audio buffer fed to the decoder contained speech.
#initialize(configuration, ps_decoder = nil) ⇒ Decoder constructor

Initialize a Decoder.
#process_raw(buffer, size, no_search = false, full_utt = false) ⇒ Object

Decode raw audio data.
#ps_decoder ⇒ Object
#reconfigure(configuration = nil) ⇒ Object

Reinitialize the decoder with updated configuration.
#set_jsgf_string(jsgf_string, name = 'default') ⇒ Object

Adds new search using JSGF model.
#set_search(name = 'default') ⇒ Object

Activates the search with the provided name.
#start_utterance ⇒ Object

Start utterance processing.
#unset_search(name = 'default') ⇒ Object

Unsets the search and releases related resources.
#words ⇒ Array

Get an array of words with start/end frame values (10 ms per frame) for the current hypothesis.

Methods included from API::CallHelpers

#api_call

Constructor Details

#initialize(configuration, ps_decoder = nil) ⇒ `Decoder`

Initialize a Decoder

Note that initialization also updates the Configuration with settings read from the feat.params file that accompanies the acoustic model.

Parameters:

configuration (Configuration)
ps_decoder (FFI::Pointer) (defaults to: nil) —

An optional Pocketsphinx decoder. One is initialized if not provided.

# File 'lib/pocketsphinx/decoder.rb', line 31

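def initialize(configuration, ps_decoder = nil)
  @configuration = configuration
  @ps_decoder = ps_decoder          # keep a caller-supplied decoder, if any
  init_decoder if @ps_decoder.nil?  # otherwise build one from the configuration
end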

Instance Attribute Details

#configuration ⇒ `Object`

Returns the value of attribute configuration.

# File 'lib/pocketsphinx/decoder.rb', line 22

def configuration
  @configuration
end

#ps_api ⇒ `Object`

# File 'lib/pocketsphinx/decoder.rb', line 186

def ps_api
  @ps_api || API::Pocketsphinx
end

Instance Method Details

#decode(audio_path_or_file, max_samples = 2048) ⇒ `Object`

Decode a raw audio stream as a single utterance, opening a file if path given

See #decode_raw

Parameters:

audio_path_or_file (String, IO) —

The raw audio stream, or the path of a raw audio file, to decode as a single utterance
max_samples (Integer) (defaults to: 2048) —

The maximum number of samples to process from the stream on each iteration

# File 'lib/pocketsphinx/decoder.rb', line 54

def decode(audio_path_or_file, max_samples = 2048)
  case audio_path_or_file
  when String
    File.open(audio_path_or_file, 'rb') { |f| decode_raw(f, max_samples) }
  else
    decode_raw(audio_path_or_file, max_samples)
  end
end

#decode_raw(audio_file, max_samples = 2048) ⇒ `Object`

Decode a raw audio stream as a single utterance.

No headers are recognized in these files. The configuration parameters samprate and input_endian determine the sampling rate and endianness of the stream, respectively. Audio is always assumed to be 16-bit signed PCM.
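
Because decode_raw reads bare samples, any audio passed to it must already be in that layout. The sketch below is an illustration, not part of the gem: it packs Ruby integers into 16-bit signed PCM and back. The names to_raw_pcm, from_raw_pcm and pcm_directive are hypothetical, and the endianness you choose is assumed to match the decoder's input_endian setting.

```ruby
# Sketch (hypothetical helpers): convert integer samples to and from
# headerless 16-bit signed PCM. The endianness must match the decoder's
# input_endian setting; little-endian is the default here.
PCM_MIN = -32_768
PCM_MAX = 32_767

def pcm_directive(endian)
  case endian
  when :little then 's<*'
  when :big    then 's>*'
  else raise ArgumentError, "unknown endianness: #{endian.inspect}"
  end
end

# Clamp each sample into the int16 range, then pack it as 2 bytes.
def to_raw_pcm(samples, endian: :little)
  samples.map { |s| s.clamp(PCM_MIN, PCM_MAX) }.pack(pcm_directive(endian))
end

# Inverse: split a raw byte string back into signed 16-bit samples.
def from_raw_pcm(bytes, endian: :little)
  bytes.unpack(pcm_directive(endian))
end

# Usage: one second of a 440 Hz tone, assuming samprate = 16000.
tone = (0...16_000).map { |i| (8_000 * Math.sin(2 * Math::PI * 440 * i / 16_000.0)).round }
raw = to_raw_pcm(tone) # 2 bytes per sample, no header
```

The resulting string could be written with File.binwrite and then passed by path to #decode.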

Parameters:

audio_file (IO) —

The raw audio stream to decode as a single utterance
max_samples (Integer) (defaults to: 2048) —

The maximum number of samples to process from the stream on each iteration

# File 'lib/pocketsphinx/decoder.rb', line 71

def decode_raw(audio_file, max_samples = 2048)
  start_utterance

  # Each 16-bit sample is 2 bytes, so read up to max_samples * 2 bytes per pass
  FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
    while (data = audio_file.read(max_samples * 2))
      buffer.write_string(data)
      process_raw(buffer, data.length / 2) # size is in samples, not bytes
    end
  end

  end_utterance
end
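
The read loop in decode_raw can be traced without FFI or a decoder. This sketch reproduces only its arithmetic, using StringIO in place of a file. The helper name chunk_sample_counts is hypothetical.

```ruby
require 'stringio'

# Sketch of decode_raw's read loop with the FFI buffer removed: each pass
# reads at most max_samples 16-bit samples (2 bytes each), and half the
# byte count is the sample count that would be handed to process_raw.
def chunk_sample_counts(io, max_samples = 2048)
  counts = []
  while (data = io.read(max_samples * 2)) # nil once the stream is exhausted
    counts << data.bytesize / 2            # a trailing odd byte is dropped
  end
  counts
end

# 5,000 bytes = 2,500 samples: one full chunk plus a shorter final one.
counts = chunk_sample_counts(StringIO.new("\x00".b * 5_000))
```

With the default max_samples of 2048, a 5,000-byte stream produces two process_raw calls: 2,048 samples, then 452.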

#end_utterance ⇒ `Object`

End utterance processing

# File 'lib/pocketsphinx/decoder.rb', line 106

def end_utterance
  api_call :ps_end_utt, ps_decoder
end

#get_search ⇒ `Object`

Returns the name of the current search in the decoder.

# File 'lib/pocketsphinx/decoder.rb', line 166

def get_search
  ps_api.ps_get_search(ps_decoder)
end

#hypothesis ⇒ `Hypothesis`

Get the hypothesis string (with its path score and posterior probability).

Returns:

(Hypothesis) —

Hypothesis (behaves like a string)

# File 'lib/pocketsphinx/decoder.rb', line 118

def hypothesis
  mp_path_score = FFI::MemoryPointer.new(:int32, 1)

  hypothesis = ps_api.ps_get_hyp(ps_decoder, mp_path_score)
  posterior_prob = ps_api.ps_get_prob(ps_decoder)

  hypothesis.nil? ? nil : Hypothesis.new(
    hypothesis,
    log_prob_to_linear(mp_path_score.get_int32(0)),
    log_prob_to_linear(posterior_prob)
  )
end
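
The private helper log_prob_to_linear is not shown on this page. The sketch below shows only the arithmetic such a conversion involves: linear = base ** score. It assumes PocketSphinx's default log base of 1.0001 (the -logbase setting) and no logmath shift. The names log_to_linear and DEFAULT_LOG_BASE are hypothetical.

```ruby
# Sketch: convert a log-domain score to a linear value, assuming the
# default PocketSphinx log base of 1.0001 and no logmath shift.
DEFAULT_LOG_BASE = 1.0001

def log_to_linear(log_value, base = DEFAULT_LOG_BASE)
  base**log_value # a score of 0 maps to 1.0; negative scores map into (0, 1)
end
```

Path scores are usually large negative numbers. Their linear values can therefore underflow to 0.0 in double precision. For example, a score of -10,000,000 does this.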

#in_speech? ⇒ `Boolean`

Checks whether the last audio buffer fed to the decoder contained speech.

Returns:

(Boolean)

# File 'lib/pocketsphinx/decoder.rb', line 111

def in_speech?
  ps_api.ps_get_in_speech(ps_decoder) != 0
end
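
In live recognition, in_speech? is typically checked after each process_raw call. A change from speech to silence is a natural point to call end_utterance, read the hypothesis, and call start_utterance again. The sketch below isolates that state machine as plain Ruby over a recorded list of in_speech? readings. speech_segments is a hypothetical helper, not part of the gem.

```ruby
# Sketch: group consecutive in_speech? readings (one per process_raw
# call) into utterances, returned as [first_chunk, last_chunk] pairs.
def speech_segments(in_speech_flags)
  segments = []
  start = nil
  in_speech_flags.each_with_index do |speaking, i|
    if speaking && start.nil?
      start = i                  # silence -> speech: utterance begins
    elsif !speaking && start
      segments << [start, i - 1] # speech -> silence: end_utterance point
      start = nil
    end
  end
  # Still in speech when the stream ends: close the final utterance.
  segments << [start, in_speech_flags.length - 1] if start
  segments
end
```

For the readings [false, true, true, false, false, true], it returns [[1, 2], [5, 5]]. The first utterance spans chunks 1 to 2. The second is still open when the stream ends.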

#process_raw(buffer, size, no_search = false, full_utt = false) ⇒ `Object`

Decode raw audio data.

Parameters:

buffer (FFI::Pointer) —

Pointer to the 16-bit signed PCM samples to decode
size (Integer) —

The number of samples (not bytes) in buffer
no_search (Boolean) (defaults to: false) —

If true, perform feature extraction but don’t do any recognition yet. This may be necessary if your processor has trouble doing recognition in real time.
full_utt (Boolean) (defaults to: false) —

If true, this block of data is a full utterance's worth of data. This may allow the recognizer to produce more accurate results.

Returns:

Number of frames of data searched

# File 'lib/pocketsphinx/decoder.rb', line 92

def process_raw(buffer, size, no_search = false, full_utt = false)
  api_call :ps_process_raw, ps_decoder, buffer, size, no_search ? 1 : 0, full_utt ? 1 : 0
end

#ps_decoder ⇒ `Object`

# File 'lib/pocketsphinx/decoder.rb', line 190

def ps_decoder
  init_decoder if @ps_decoder.nil?
  @ps_decoder
end

#reconfigure(configuration = nil) ⇒ `Object`

Reinitialize the decoder with updated configuration.

This function allows you to switch the acoustic model, dictionary, or other configuration without creating an entirely new decoding object.

Parameters:

configuration (Configuration) (defaults to: nil) —

An optional new configuration to use. If this is nil, the previous configuration will be reloaded, with any changes applied.

# File 'lib/pocketsphinx/decoder.rb', line 43

def reconfigure(configuration = nil)
  self.configuration = configuration if configuration
  reinit_decoder
end

#set_jsgf_string(jsgf_string, name = 'default') ⇒ `Object`

Adds new search using JSGF model.

Convenience method that parses a JSGF grammar from a string and creates a search from it.

Parameters:

jsgf_string (String) —

The JSGF grammar
name (String) (defaults to: 'default') —

The search name

# File 'lib/pocketsphinx/decoder.rb', line 161

def set_jsgf_string(jsgf_string, name = 'default')
  api_call :ps_set_jsgf_string, ps_decoder, name, jsgf_string
end
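
set_jsgf_string expects the complete grammar text. The hypothetical helper below is not part of the gem. It assembles a minimal JSGF V1.0 grammar with one public rule that lists alternative phrases. Every word in those phrases must appear in the decoder's pronunciation dictionary.

```ruby
# Hypothetical helper: build a minimal JSGF grammar whose single public
# rule accepts exactly one of the given phrases.
def jsgf_alternatives(grammar_name, rule_name, phrases)
  raise ArgumentError, 'at least one phrase is required' if phrases.empty?

  <<~JSGF
    #JSGF V1.0;

    grammar #{grammar_name};

    public <#{rule_name}> = #{phrases.join(' | ')};
  JSGF
end

grammar = jsgf_alternatives('robot', 'command', ['go forward', 'go back', 'stop'])
```

The result can then be registered with decoder.set_jsgf_string(grammar, 'robot') and activated with decoder.set_search('robot'). Here 'robot' is an arbitrary search name chosen for this example.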

#set_search(name = 'default') ⇒ `Object`

Activates the search with the provided name.

The search must have been added beforehand using ps_set_fsg(), ps_set_lm(), or ps_set_kws() (or #set_jsgf_string).

# File 'lib/pocketsphinx/decoder.rb', line 174

def set_search(name = 'default')
  api_call :ps_set_search, ps_decoder, name
end
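
Because set_search changes the decoder's active search for all later decoding, code that needs a different search only briefly can save the previous one and restore it afterwards. This sketch assumes that get_search returns the active search name, or nil when none is set. with_search is a hypothetical helper that works with any object responding to get_search and set_search.

```ruby
# Hypothetical helper: temporarily activate a named search, then restore
# the previously active one, even if the block raises.
def with_search(decoder, name)
  previous = decoder.get_search
  decoder.set_search(name)
  yield decoder
ensure
  # Switch back only if there was an active search to return to.
  decoder.set_search(previous) if previous
end
```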

#start_utterance ⇒ `Object`

Start utterance processing.

This function should be called before any utterance data is passed to the decoder. It marks the start of a new utterance and reinitializes internal data structures.

# File 'lib/pocketsphinx/decoder.rb', line 101

def start_utterance
  api_call :ps_start_utt, ps_decoder
end
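
start_utterance and end_utterance must bracket each batch of audio, as decode_raw does. A hypothetical helper (not part of the gem) can enforce that pairing with ensure. A failure while feeding audio then still closes the utterance.

```ruby
# Hypothetical helper: run the block between start_utterance and
# end_utterance, closing the utterance even if the block raises, then
# return the decoder's hypothesis.
def transcribe_utterance(decoder)
  decoder.start_utterance
  begin
    yield decoder # feed audio here, e.g. through process_raw
  ensure
    decoder.end_utterance
  end
  decoder.hypothesis
end
```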

#unset_search(name = 'default') ⇒ `Object`

Unsets the search and releases related resources.

Unsets a search previously added using ps_set_fsg(), ps_set_lm(), or ps_set_kws().

# File 'lib/pocketsphinx/decoder.rb', line 182

def unset_search(name = 'default')
  api_call :ps_unset_search, ps_decoder, name
end

#words ⇒ `Array`

Get an array of words with start/end frame values (10 ms per frame) for the current hypothesis.

Returns:

(Array) —

Array of words with start/end frame values (10 ms per frame)

# File 'lib/pocketsphinx/decoder.rb', line 134

def words
  mp_path_score = FFI::MemoryPointer.new(:int32, 1)
  start_frame   = FFI::MemoryPointer.new(:int32, 1)
  end_frame     = FFI::MemoryPointer.new(:int32, 1)

  seg_iter = ps_api.ps_seg_iter(ps_decoder, mp_path_score)
  words    = []

  until seg_iter.null? do
    ps_api.ps_seg_frames(seg_iter, start_frame, end_frame)
    words << Pocketsphinx::Decoder::Word.new(
      ps_api.ps_seg_word(seg_iter),
      start_frame.get_int32(0),
      end_frame.get_int32(0)
    )
    seg_iter = ps_api.ps_seg_next(seg_iter)
  end

  words
end
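
The frame numbers returned by #words can be turned into timestamps using the documented 10 ms frame length. This sketch uses WordSegment, a stand-in Struct assumed to mirror the three values that #words passes to Pocketsphinx::Decoder::Word.new. It also uses a hypothetical word_times helper.

```ruby
# Stand-in for Pocketsphinx::Decoder::Word: a word plus its start and end
# frame numbers, in the order #words passes them.
WordSegment = Struct.new(:word, :start_frame, :end_frame)

# 10 ms per frame, as documented for #words, is 100 frames per second.
FRAMES_PER_SECOND = 100

# Convert each segment's frame numbers into seconds from the start of
# the utterance.
def word_times(segments)
  segments.map do |seg|
    {
      word: seg.word,
      start_sec: seg.start_frame / FRAMES_PER_SECOND.to_f,
      end_sec: seg.end_frame / FRAMES_PER_SECOND.to_f
    }
  end
end
```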
