Class: Pocketsphinx::Decoder
- Inherits:
- Object
- Object
- Pocketsphinx::Decoder
- Includes:
- API::CallHelpers
- Defined in:
- lib/pocketsphinx/decoder.rb
Defined Under Namespace
Classes: Hypothesis, Word
Instance Attribute Summary
- #configuration ⇒ Object
  Returns the value of attribute configuration.
- #ps_api ⇒ Object
Instance Method Summary
- #decode(audio_path_or_file, max_samples = 2048) ⇒ Object
  Decode a raw audio stream as a single utterance, opening the file if a path is given.
- #decode_raw(audio_file, max_samples = 2048) ⇒ Object
  Decode a raw audio stream as a single utterance.
- #end_utterance ⇒ Object
  End utterance processing.
- #get_search ⇒ Object
  Returns the name of the current search in the decoder.
- #hypothesis ⇒ Hypothesis
  Get the hypothesis string (with its path score and posterior probability).
- #in_speech? ⇒ Boolean
  Checks whether the last fed audio buffer contained speech.
- #initialize(configuration, ps_decoder = nil) ⇒ Decoder
  constructor
  Initialize a Decoder.
- #process_raw(buffer, size, no_search = false, full_utt = false) ⇒ Object
  Decode raw audio data.
- #ps_decoder ⇒ Object
- #reconfigure(configuration = nil) ⇒ Object
  Reinitialize the decoder with updated configuration.
- #set_jsgf_string(jsgf_string, name = 'default') ⇒ Object
  Adds a new search using a JSGF model.
- #set_search(name = 'default') ⇒ Object
  Activates the search with the provided name.
- #start_utterance ⇒ Object
  Start utterance processing.
- #unset_search(name = 'default') ⇒ Object
  Unsets the search and releases related resources.
- #words ⇒ Array
  Get an array of words with start/end frame values (10 ms per frame) for the current hypothesis.
Methods included from API::CallHelpers
Constructor Details
#initialize(configuration, ps_decoder = nil) ⇒ Decoder
Initialize a Decoder
Note that initialization updates the Configuration with the settings found in the feat.params file that accompanies the acoustic model.
    # File 'lib/pocketsphinx/decoder.rb', line 31
    def initialize(configuration, ps_decoder = nil)
      @configuration = configuration
      init_decoder if ps_decoder.nil?
    end
Instance Attribute Details
#configuration ⇒ Object
Returns the value of attribute configuration.
    # File 'lib/pocketsphinx/decoder.rb', line 22
    def configuration
      @configuration
    end
#ps_api ⇒ Object
    # File 'lib/pocketsphinx/decoder.rb', line 186
    def ps_api
      @ps_api || API::Pocketsphinx
    end
Instance Method Details
#decode(audio_path_or_file, max_samples = 2048) ⇒ Object
Decode a raw audio stream as a single utterance, opening the file if a path is given.
See #decode_raw
    # File 'lib/pocketsphinx/decoder.rb', line 54
    def decode(audio_path_or_file, max_samples = 2048)
      case audio_path_or_file
      when String
        File.open(audio_path_or_file, 'rb') { |f| decode_raw(f, max_samples) }
      else
        decode_raw(audio_path_or_file, max_samples)
      end
    end
#decode_raw(audio_file, max_samples = 2048) ⇒ Object
Decode a raw audio stream as a single utterance.
No headers are recognized in this file. The configuration parameters samprate and input_endian determine the sampling rate and endianness of the stream, respectively. Audio is always assumed to be 16-bit signed PCM.
    # File 'lib/pocketsphinx/decoder.rb', line 71
    def decode_raw(audio_file, max_samples = 2048)
      start_utterance

      FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
        while data = audio_file.read(max_samples * 2)
          buffer.write_string(data)
          process_raw(buffer, data.length / 2)
        end
      end

      end_utterance
    end
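The chunking arithmetic in #decode_raw can be tried without a decoder. The following is a minimal sketch, not part of the library: each_pcm_chunk is a hypothetical helper, and a StringIO of silence stands in for a headerless raw audio file. It reads max_samples * 2 bytes at a time because each 16-bit sample occupies two bytes, and reports the sample count that would be passed to #process_raw.

```ruby
require 'stringio'

# Hypothetical helper (not part of pocketsphinx-ruby): yields each chunk of
# raw 16-bit PCM together with its sample count, mirroring decode_raw's loop.
def each_pcm_chunk(io, max_samples = 2048)
  return enum_for(:each_pcm_chunk, io, max_samples) unless block_given?

  bytes_per_sample = 2 # 16-bit signed PCM
  # IO#read with a length returns nil at end of stream, which ends the loop.
  while (data = io.read(max_samples * bytes_per_sample))
    yield data, data.bytesize / bytes_per_sample
  end
end

# 5000 samples of silence stand in for a headerless raw audio file.
audio = StringIO.new([0].pack('s<') * 5000)
sample_counts = each_pcm_chunk(audio).map { |_data, samples| samples }
```

The final chunk is usually shorter than max_samples, which is why the sample count is derived from the data actually read rather than from max_samples.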
#end_utterance ⇒ Object
End utterance processing
    # File 'lib/pocketsphinx/decoder.rb', line 106
    def end_utterance
      api_call :ps_end_utt, ps_decoder
    end
#get_search ⇒ Object
Returns the name of the current search in the decoder.
    # File 'lib/pocketsphinx/decoder.rb', line 166
    def get_search
      ps_api.ps_get_search(ps_decoder)
    end
#hypothesis ⇒ Hypothesis
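Get the hypothesis string, together with its path score and posterior probability (both converted from log to linear scale). Returns nil if no hypothesis is available.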
    # File 'lib/pocketsphinx/decoder.rb', line 118
    def hypothesis
      mp_path_score = FFI::MemoryPointer.new(:int32, 1)

      hypothesis = ps_api.ps_get_hyp(ps_decoder, mp_path_score)
      posterior_prob = ps_api.ps_get_prob(ps_decoder)

      hypothesis.nil? ? nil : Hypothesis.new(
        hypothesis,
        log_prob_to_linear(mp_path_score.get_int32(0)),
        log_prob_to_linear(posterior_prob)
      )
    end
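The source of log_prob_to_linear is not shown on this page. As a rough illustration of what a log-to-linear conversion involves, the sketch below assumes the scores are integer logarithms in base 1.0001 (believed to be PocketSphinx's default log base; this is an assumption, and the library's helper may work differently). linear_from_log is a hypothetical name.

```ruby
# Assumption: scores are integer logs in base 1.0001. This is a stand-in for
# illustration only, not the gem's log_prob_to_linear implementation.
ASSUMED_LOG_BASE = 1.0001

def linear_from_log(log_prob, base = ASSUMED_LOG_BASE)
  base**log_prob
end

certain  = linear_from_log(0)     # a log score of 0 corresponds to probability 1
unlikely = linear_from_log(-5000) # more negative scores map to smaller values
```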
#in_speech? ⇒ Boolean
Checks whether the last fed audio buffer contained speech.
    # File 'lib/pocketsphinx/decoder.rb', line 111
    def in_speech?
      ps_api.ps_get_in_speech(ps_decoder) != 0
    end
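One common use of #in_speech? is to split a continuous stream into utterances: when the flag drops from true to false, the current utterance is closed and a new one is started. The sketch below illustrates that pattern under stated assumptions. FakeDecoder is a hypothetical stand-in that replays a scripted list of speech flags and accepts plain strings, whereas the real #process_raw expects an FFI buffer; count_utterances is likewise a hypothetical helper.

```ruby
# Hypothetical stand-in for Pocketsphinx::Decoder. It replays a scripted
# sequence of speech flags instead of analysing audio.
class FakeDecoder
  attr_reader :events

  def initialize(speech_flags)
    @speech_flags = speech_flags
    @index = -1
    @events = []
  end

  def start_utterance
    @events << :start
  end

  def end_utterance
    @events << :end
  end

  def process_raw(_buffer, _size)
    @index += 1
  end

  def in_speech?
    @speech_flags[@index]
  end
end

# Feeds chunks to the decoder and closes an utterance on each
# speech-to-silence transition. Returns the number of utterances closed.
def count_utterances(decoder, chunks)
  utterances = 0
  was_speaking = false

  decoder.start_utterance
  chunks.each do |chunk|
    decoder.process_raw(chunk, chunk.bytesize / 2)
    speaking = decoder.in_speech?
    if was_speaking && !speaking
      decoder.end_utterance
      utterances += 1
      decoder.start_utterance
    end
    was_speaking = speaking
  end
  decoder.end_utterance

  utterances
end

flags   = [false, true, true, false, false, true, false]
decoder = FakeDecoder.new(flags)
chunks  = Array.new(flags.size) { "\x00\x00" * 4 }
utterance_count = count_utterances(decoder, chunks)
```

With a real decoder, #hypothesis would typically be read right after each #end_utterance call.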
#process_raw(buffer, size, no_search = false, full_utt = false) ⇒ Object
Decode raw audio data.
    # File 'lib/pocketsphinx/decoder.rb', line 92
    def process_raw(buffer, size, no_search = false, full_utt = false)
      api_call :ps_process_raw, ps_decoder, buffer, size, no_search ? 1 : 0, full_utt ? 1 : 0
    end
#ps_decoder ⇒ Object
    # File 'lib/pocketsphinx/decoder.rb', line 190
    def ps_decoder
      init_decoder if @ps_decoder.nil?
      @ps_decoder
    end
#reconfigure(configuration = nil) ⇒ Object
Reinitialize the decoder with updated configuration.
This function allows you to switch the acoustic model, dictionary, or other configuration without creating an entirely new decoding object.
    # File 'lib/pocketsphinx/decoder.rb', line 43
    def reconfigure(configuration = nil)
      self.configuration = configuration if configuration
      reinit_decoder
    end
#set_jsgf_string(jsgf_string, name = 'default') ⇒ Object
Adds new search using JSGF model.
Convenience method to parse JSGF model from string and create a search.
    # File 'lib/pocketsphinx/decoder.rb', line 161
    def set_jsgf_string(jsgf_string, name = 'default')
      api_call :ps_set_jsgf_string, ps_decoder, name, jsgf_string
    end
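The jsgf_string argument is a complete JSGF grammar in text form. The sketch below shows one way such a string might be assembled. jsgf_for, the grammar name, and the rule name are hypothetical choices made for illustration, and the commented calls assume a decoder that has already been configured.

```ruby
# Hypothetical helper: builds a JSGF grammar whose single public rule
# matches any one of the given phrases.
def jsgf_for(phrases, grammar_name: 'commands', rule_name: 'command')
  alternatives = phrases.join(' | ')
  <<~JSGF
    #JSGF V1.0;
    grammar #{grammar_name};
    public <#{rule_name}> = #{alternatives};
  JSGF
end

jsgf = jsgf_for(['go forward', 'go back', 'stop'])

# With a real, already configured decoder, the grammar could then be
# registered under a name and activated:
#   decoder.set_jsgf_string(jsgf, 'commands')
#   decoder.set_search('commands')
```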
#set_search(name = 'default') ⇒ Object
Activates the search with the provided name.
The search must already have been added using ps_set_fsg(), ps_set_lm() or ps_set_kws().
    # File 'lib/pocketsphinx/decoder.rb', line 174
    def set_search(name = 'default')
      api_call :ps_set_search, ps_decoder, name
    end
#start_utterance ⇒ Object
Start utterance processing.
This function should be called before any utterance data is passed to the decoder. It marks the start of a new utterance and reinitializes internal data structures.
    # File 'lib/pocketsphinx/decoder.rb', line 101
    def start_utterance
      api_call :ps_start_utt, ps_decoder
    end
#unset_search(name = 'default') ⇒ Object
Unsets the search and releases related resources.
Unsets a search previously added using ps_set_fsg(), ps_set_lm() or ps_set_kws().
    # File 'lib/pocketsphinx/decoder.rb', line 182
    def unset_search(name = 'default')
      api_call :ps_unset_search, ps_decoder, name
    end
#words ⇒ Array
Get an array of words with start/end frame values (10 ms per frame) for the current hypothesis.
    # File 'lib/pocketsphinx/decoder.rb', line 134
    def words
      mp_path_score = FFI::MemoryPointer.new(:int32, 1)
      start_frame   = FFI::MemoryPointer.new(:int32, 1)
      end_frame     = FFI::MemoryPointer.new(:int32, 1)

      seg_iter = ps_api.ps_seg_iter(ps_decoder, mp_path_score)
      words    = []

      until seg_iter.null? do
        ps_api.ps_seg_frames(seg_iter, start_frame, end_frame)
        words << Pocketsphinx::Decoder::Word.new(
          ps_api.ps_seg_word(seg_iter),
          start_frame.get_int32(0),
          end_frame.get_int32(0)
        )
        seg_iter = ps_api.ps_seg_next(seg_iter)
      end

      words
    end
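Because frames are 10 ms long, frame indices convert to time offsets by dividing by 100. The sketch below illustrates that conversion. The Word struct is a local stand-in that mirrors the constructor arguments used above (word, start_frame, end_frame); its accessor names and the sample values are assumptions, not taken from the library.

```ruby
FRAMES_PER_SECOND = 100 # 10 ms per frame

# Stand-in for Pocketsphinx::Decoder::Word; accessor names are assumed.
Word = Struct.new(:word, :start_frame, :end_frame)

# Converts a frame index to an offset in seconds from the utterance start.
def frame_to_seconds(frame)
  frame / FRAMES_PER_SECOND.to_f
end

# Made-up sample data in place of the result of decoder.words.
words = [Word.new('go', 12, 45), Word.new('forward', 46, 110)]

timings = words.map do |w|
  [w.word, frame_to_seconds(w.start_frame), frame_to_seconds(w.end_frame)]
end
```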