Class: Pocketsphinx::SpeechRecognizer

Inherits:
Object
Defined in:
lib/pocketsphinx/speech_recognizer.rb

Overview

Reads audio data from a recordable interface and decodes it into utterances.

Essentially orchestrates interaction between Recordable and Decoder, and detects new utterances.

Constructor Details

#initialize(configuration = nil) ⇒ SpeechRecognizer

Returns a new instance of SpeechRecognizer.



# File 'lib/pocketsphinx/speech_recognizer.rb', line 10

def initialize(configuration = nil)
  @configuration = configuration
end

Instance Attribute Details

#decoder ⇒ Object



# File 'lib/pocketsphinx/speech_recognizer.rb', line 18

def decoder
  @decoder ||= Decoder.new(configuration)
end

#recordable ⇒ Object



# File 'lib/pocketsphinx/speech_recognizer.rb', line 14

def recordable
  @recordable or raise "A SpeechRecognizer must have a recordable interface"
end

Instance Method Details

#configuration ⇒ Object



# File 'lib/pocketsphinx/speech_recognizer.rb', line 22

def configuration
  @configuration ||= Configuration.default
end
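
The constructor and the three accessors above share one pattern: each value is resolved lazily on first access. The following is a minimal sketch of that pattern, not the library's code. `DemoConfig`, `DemoDecoder`, and `DemoRecognizer` are hypothetical stand-ins so the example runs without the gem, and the `attr_writer` line is an assumption about how a recordable or decoder would be injected.

```ruby
# Hypothetical stand-ins for illustration only; these are not the
# pocketsphinx-ruby classes.
DemoConfig = Struct.new(:name) do
  def self.default
    new("default")
  end
end

DemoDecoder = Struct.new(:configuration)

class DemoRecognizer
  # Assumption: writers exist so callers can inject their own objects.
  attr_writer :recordable, :decoder

  def initialize(configuration = nil)
    @configuration = configuration
  end

  # Falls back to the default configuration when none was passed in.
  def configuration
    @configuration ||= DemoConfig.default
  end

  # Built once, on first access, from whatever configuration is in effect.
  def decoder
    @decoder ||= DemoDecoder.new(configuration)
  end

  # There is no sensible default audio source, so a missing one is an error.
  def recordable
    @recordable or raise "A recognizer must have a recordable interface"
  end
end

recognizer = DemoRecognizer.new
recognizer.configuration.name                  # => "default"
recognizer.decoder.equal?(recognizer.decoder)  # => true (memoized)
```

Because `decoder` reads `configuration` only when it is first built, a configuration passed to the constructor takes effect without any extra wiring, while calling `recordable` before one has been assigned raises a `RuntimeError`.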

#in_speech? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/pocketsphinx/speech_recognizer.rb', line 53

def in_speech?
  # Use Pocketsphinx's implementation by default
  decoder.in_speech?
end

#recognize(max_samples = 4096) ⇒ Object

Recognizes utterances and yields hypotheses in an infinite loop.

Splits speech into utterances by detecting silence between them. By default this uses Pocketsphinx’s internal Voice Activity Detection (VAD), which can be configured by adjusting the `vad_postspeech`, `vad_prespeech`, and `vad_threshold` settings.

Parameters:

  • max_samples (Fixnum) (defaults to: 4096)

    Number of samples to process at a time



# File 'lib/pocketsphinx/speech_recognizer.rb', line 33

def recognize(max_samples = 4096)
  decoder.start_utterance

  recordable.record do
    FFI::MemoryPointer.new(:int16, max_samples) do |buffer|
      loop do
        if in_speech?
          while decoder.in_speech?
            process_audio(buffer, max_samples) or break
          end

          yield get_hypothesis
        else
          process_audio(buffer, max_samples) or break
        end
      end
    end
  end
end
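
The control flow of `#recognize` can be hard to follow through the nested blocks, so here is a simplified, self-contained sketch of the same idea: feed audio until speech starts, keep feeding until it stops, then yield one hypothesis per utterance. Everything in it is hypothetical. `SketchDecoder` and `sketch_recognize` are stand-ins that use `:speech` and `:silence` symbols instead of real audio buffers, and, unlike the method above, the sketch stops explicitly when its input runs out in the middle of an utterance.

```ruby
# Hypothetical stand-in for a decoder with voice activity detection.
# Each chunk it processes is either :speech or :silence.
class SketchDecoder
  def initialize
    @in_speech = false
    @speech_chunks = 0
  end

  def process(chunk)
    @in_speech = (chunk == :speech)
    @speech_chunks += 1 if @in_speech
  end

  def in_speech?
    @in_speech
  end

  # Summarizes the utterance heard so far and resets for the next one.
  def hypothesis
    result = "utterance of #{@speech_chunks} speech chunk(s)"
    @speech_chunks = 0
    result
  end
end

def sketch_recognize(chunks, decoder = SketchDecoder.new)
  queue = chunks.dup

  # Returns false once the input is exhausted, true otherwise.
  feed = lambda do
    return false if queue.empty?
    decoder.process(queue.shift)
    true
  end

  loop do
    if decoder.in_speech?
      # Inside an utterance: consume audio until silence is detected.
      more_input = true
      while more_input && decoder.in_speech?
        more_input = feed.call
      end

      yield decoder.hypothesis
      break unless more_input
    else
      # Between utterances: consume audio until speech begins.
      feed.call or break
    end
  end
end

chunks = [:silence, :speech, :speech, :silence, :speech, :silence]
sketch_recognize(chunks) { |hypothesis| puts hypothesis }
# prints:
#   utterance of 2 speech chunk(s)
#   utterance of 1 speech chunk(s)
```

The silence that closes an utterance is consumed by the inner loop, so every yielded hypothesis corresponds to one contiguous run of speech.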