pocketsphinx-ruby

This gem provides Ruby FFI bindings for Pocketsphinx, a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop. Pocketsphinx is part of the CMU Sphinx Open Source Toolkit For Speech Recognition.

Pocketsphinx's SWIG interface was initially considered for this gem, but was dropped in favor of FFI for many of the reasons outlined here, most importantly ease of maintenance and JRuby support.

The goal of this project is to make it as easy as possible for the Ruby community to experiment with speech recognition. Please do contribute fixes and enhancements.

Installation

This gem depends on Pocketsphinx (libpocketsphinx) and Sphinxbase (libsphinxbase and libsphinxad). The current stable versions (0.8) date from late 2012 and are now outdated, so either build the libraries manually from source or, on OSX, install the latest development (potentially unstable) versions using Homebrew as follows (more information here).

Add the Homebrew tap:

$ brew tap watsonbox/cmu-sphinx

You'll see some warnings as these formulae conflict with those in the main repository, but that's fine.

Install the libraries:

$ brew install --HEAD watsonbox/cmu-sphinx/cmu-sphinxbase
$ brew install --HEAD watsonbox/cmu-sphinx/cmu-sphinxtrain # optional
$ brew install --HEAD watsonbox/cmu-sphinx/cmu-pocketsphinx

You can test continuous recognition as follows:

$ pocketsphinx_continuous -inmic yes

Then add this line to your application's Gemfile:

gem 'pocketsphinx-ruby'

And then execute:

$ bundle

Or install it yourself as:

$ gem install pocketsphinx-ruby

Basic Usage

The LiveSpeechRecognizer is modeled on the same class in Sphinx4. It uses the Microphone and Decoder classes internally to provide a simple, high-level recognition interface:

require 'pocketsphinx-ruby'

Pocketsphinx::LiveSpeechRecognizer.new.recognize do |speech|
  puts speech
end

The AudioFileSpeechRecognizer decodes directly from an audio file by coordinating interactions between an AudioFile and a Decoder:

recognizer = Pocketsphinx::AudioFileSpeechRecognizer.new

recognizer.recognize('spec/assets/audio/goforward.raw') do |speech|
  puts speech # => "go forward ten years"
end

These two classes split speech into utterances by detecting silence between them. By default this uses Pocketsphinx's internal Voice Activity Detection (VAD), which can be configured by adjusting the vad_postspeech, vad_prespeech, and vad_threshold configuration settings.
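To make the idea of silence-based segmentation concrete, here is a toy sketch in plain Ruby. It is not Pocketsphinx's VAD algorithm and does not use the gem: it simply treats a frame as speech when its RMS level reaches a fixed threshold, and closes an utterance after a run of silent frames. The method name split_utterances and its parameters (frame_size, threshold, min_silence_frames) are hypothetical and chosen for illustration only.

```ruby
# Toy illustration only (not Pocketsphinx's VAD): split an array of 16-bit
# samples into utterances by looking for runs of low-energy frames.
# frame_size 160 corresponds to 10 ms of audio at 16 kHz.
def split_utterances(samples, frame_size: 160, threshold: 500.0, min_silence_frames: 3)
  utterances = []
  current = []   # samples of the utterance being collected
  pending = []   # silent frames that may turn out to be a short pause

  samples.each_slice(frame_size) do |frame|
    # Root-mean-square level of this frame
    rms = Math.sqrt(frame.sum { |s| s * s } / frame.size.to_f)

    if rms >= threshold
      # Speech: keep any short pause that preceded it, then the frame itself
      current.concat(pending.flatten, frame)
      pending = []
    elsif !current.empty?
      # Silence inside an utterance: wait to see whether speech resumes
      pending << frame
      if pending.size >= min_silence_frames
        utterances << current   # long enough silence, so the utterance ends
        current = []
        pending = []
      end
    end
  end

  utterances << current unless current.empty?
  utterances
end

# Synthetic input: S S _ S _ _ _ _ _ S  (S = loud frame, _ = silent frame)
silence = [0] * 160
speech  = [1000, -1000] * 80
samples = speech * 2 + silence + speech + silence * 5 + speech

p split_utterances(samples).map(&:size) # => [640, 160]
```

In this sketch the single silent frame is kept as a pause inside the first utterance, while the run of five silent frames ends it. The real VAD settings named above play a loosely comparable role, but their exact semantics are defined by Pocketsphinx, not by this example.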

Configuration

All of Pocketsphinx's decoding settings are managed by the Configuration class, which can be passed into the high-level speech recognizers:

configuration = Pocketsphinx::Configuration.default
configuration.details('vad_threshold')
# => {
#   :name => "vad_threshold",
#   :type => :float,
#   :default => 2.0,
#   :value => 2.0,
#   :info => "Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level."
# }

configuration['vad_threshold'] = 4

Pocketsphinx::LiveSpeechRecognizer.new(configuration)

For more information on the various settings, you can find the output of configuration.details here.

Microphone

The Microphone class uses Pocketsphinx's libsphinxad to record audio for speech recognition. For desktop applications this should normally be 16bit/16kHz raw PCM audio, so these are the default settings. The exact audio backend depends on what was selected when libsphinxad was built. On OSX, OpenAL is now supported and should work just fine.

For example, to record and save a 5-second raw audio file:

microphone = Pocketsphinx::Microphone.new

File.open("test.raw", "wb") do |file|
  microphone.record do
    FFI::MemoryPointer.new(:int16, 4096) do |buffer|
      50.times do
        sample_count = microphone.read_audio(buffer, 4096)
        file.write buffer.get_bytes(0, sample_count * 2)

        sleep 0.1
      end
    end
  end
end

To open this audio file, take a look at this wiki page.
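Another option is to wrap the raw samples in a WAV header so that ordinary audio players can open the file. The following is a minimal sketch in plain Ruby, assuming the recording is 16-bit signed little-endian mono PCM at 16 kHz (mono and little-endian are assumptions, not stated above). The helper names wav_header and raw_to_wav are made up for this example and are not part of the gem.

```ruby
# Build a canonical 44-byte WAV (RIFF) header for uncompressed PCM data.
# Defaults assume 16 kHz, mono, 16-bit samples (an assumption about the recording).
def wav_header(data_size, sample_rate: 16_000, channels: 1, bits_per_sample: 16)
  block_align = channels * bits_per_sample / 8 # bytes per sample frame
  byte_rate   = sample_rate * block_align      # bytes per second of audio

  [
    'RIFF', 36 + data_size, 'WAVE',            # RIFF chunk (size excludes the first 8 bytes)
    'fmt ', 16, 1, channels,                   # fmt chunk: 16 bytes long, format 1 = PCM
    sample_rate, byte_rate, block_align, bits_per_sample,
    'data', data_size                          # data chunk header
  ].pack('a4Va4a4VvvVVvva4V')                  # V = 32-bit LE, v = 16-bit LE
end

# Copy a raw PCM file to a new file with a WAV header prepended.
def raw_to_wav(raw_path, wav_path)
  raw = File.binread(raw_path)
  File.binwrite(wav_path, wav_header(raw.bytesize) + raw)
end

# Convert the recording from the example above, if it exists.
raw_to_wav('test.raw', 'test.wav') if File.exist?('test.raw')
```

With these assumptions each second of audio takes 32,000 bytes (16,000 samples of 2 bytes each), which is the byte rate written into the header.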

Decoder

The Decoder class uses Pocketsphinx's libpocketsphinx to decode audio data into text. For example, to decode a single utterance:

decoder = Pocketsphinx::Decoder.new(Pocketsphinx::Configuration.default)
decoder.decode 'spec/assets/audio/goforward.raw'

puts decoder.hypothesis # => "go forward ten years"

Contributing

Fork it ( https://github.com/[my-github-username]/pocketsphinx-ruby/fork )
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create a new Pull Request