Class: Google::Cloud::Speech::Audio

Inherits:
Object
  • Object
Defined in:
lib/google/cloud/speech/audio.rb

Overview

# Audio

Represents a source of audio data, with related metadata such as the [audio encoding](cloud.google.com/speech/docs/basics#audio-encodings), [sample rate](cloud.google.com/speech/docs/basics#sample-rates), and [language](cloud.google.com/speech/docs/basics#languages).

See Project#audio.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000
results = audio.recognize

result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163

Constructor Details

#initialize ⇒ Audio

Returns a new instance of Audio.



# File 'lib/google/cloud/speech/audio.rb', line 125

def initialize
  @grpc = V1beta1::RecognitionAudio.new
end

Instance Attribute Details

#encoding ⇒ String, Symbol

Encoding of audio data to be recognized.

Acceptable values are:

* `raw` - Uncompressed 16-bit signed little-endian samples.
  (LINEAR16)
* `flac` - The [Free Lossless Audio
  Codec](http://flac.sourceforge.net/documentation.html) encoding.
  Only 16-bit samples are supported. Not all fields in STREAMINFO
  are supported. (FLAC)
* `mulaw` - 8-bit samples that compand 14-bit audio samples using
  G.711 PCMU/mu-law. (MULAW)
* `amr` - Adaptive Multi-Rate Narrowband codec. (`sample_rate` must
  be 8000 Hz.) (AMR)
* `amr_wb` - Adaptive Multi-Rate Wideband codec. (`sample_rate` must
  be 16000 Hz.) (AMR_WB)

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000
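
The constraints in the list above can be checked before any audio is sent. The following is a minimal sketch in plain Ruby; `ENCODING_RULES` and `check_encoding` are hypothetical names invented for this illustration and are not part of the gem.

```ruby
# Hypothetical lookup table built from the list of acceptable values above;
# not part of the gem.
ENCODING_RULES = {
  raw:    { api_name: "LINEAR16", required_rate: nil },
  flac:   { api_name: "FLAC",     required_rate: nil },
  mulaw:  { api_name: "MULAW",    required_rate: nil },
  amr:    { api_name: "AMR",      required_rate: 8000 },
  amr_wb: { api_name: "AMR_WB",   required_rate: 16000 }
}.freeze

# Raises ArgumentError when the encoding is unknown or the sample rate
# conflicts with a fixed-rate codec; otherwise returns the API constant name.
def check_encoding(encoding, sample_rate)
  rule = ENCODING_RULES[encoding.to_s.downcase.to_sym]
  raise ArgumentError, "Unknown encoding: #{encoding.inspect}" if rule.nil?

  required = rule[:required_rate]
  if required && sample_rate != required
    raise ArgumentError,
          "#{encoding} requires sample_rate #{required}, got #{sample_rate}"
  end
  rule[:api_name]
end

check_encoding(:raw, 16000)  # returns "LINEAR16"
check_encoding("amr", 8000)  # returns "AMR"
```

Failing early on an AMR or AMR_WB rate mismatch avoids a round trip to the service for a request that cannot succeed.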

Returns:

  • (String, Symbol)


# File 'lib/google/cloud/speech/audio.rb', line 83

def encoding
  @encoding
end

#grpc ⇒ Object (readonly)



# File 'lib/google/cloud/speech/audio.rb', line 51

def grpc
  @grpc
end

#language ⇒ String, Symbol

The language of the supplied audio as a [BCP-47](www.rfc-editor.org/rfc/bcp/bcp47.txt) language code. If not specified, the language defaults to “en-US”. See [Language Support](cloud.google.com/speech/docs/best-practices#language_support) for a list of the currently supported language codes.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000,
                     language: :en
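
As an illustration of the documented default, here is a small sketch that resolves a language value to a String. `language_code` is a hypothetical helper, not a method of the gem, and its shape check is deliberately loose rather than a full BCP-47 validator.

```ruby
# Hypothetical helper: applies the documented "en-US" default and turns
# Symbols into Strings. The pattern only checks the general shape of a
# language tag (a 2-3 letter primary subtag plus optional subtags).
def language_code(language = nil)
  code = language.nil? ? "en-US" : language.to_s
  unless code.match?(/\A[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*\z/)
    raise ArgumentError, "Not a BCP-47-style code: #{code.inspect}"
  end
  code
end

language_code          # returns "en-US"
language_code(:en)     # returns "en"
language_code("pt-BR") # returns "pt-BR"
```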

Returns:

  • (String, Symbol)


# File 'lib/google/cloud/speech/audio.rb', line 121

def language
  @language
end

#sample_rate ⇒ Integer

Sample rate in Hertz of the audio data to be recognized. Valid values are 8000-48000; 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that’s not possible, use the native sample rate of the audio source instead of re-sampling.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000
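
The guidance above amounts to a small decision rule. The sketch below expresses it in plain Ruby; `pick_sample_rate` and both constants are hypothetical names used only for this illustration.

```ruby
VALID_SAMPLE_RATES = (8000..48000).freeze # Hz, the valid range stated above
PREFERRED_SAMPLE_RATE = 16000             # Hz, the stated optimum

# Hypothetical helper: returns the rate to declare for a recording.
# native_rate is the rate the source was captured at; can_capture_at_16k
# says whether the source itself can be configured to record at 16000 Hz.
def pick_sample_rate(native_rate, can_capture_at_16k: false)
  return PREFERRED_SAMPLE_RATE if can_capture_at_16k

  unless VALID_SAMPLE_RATES.cover?(native_rate)
    raise ArgumentError,
          "sample_rate must be 8000-48000 Hz, got #{native_rate}"
  end
  native_rate # keep the native rate rather than re-sampling
end

pick_sample_rate(44100)                           # returns 44100
pick_sample_rate(44100, can_capture_at_16k: true) # returns 16000
```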

Returns:

  • (Integer)


# File 'lib/google/cloud/speech/audio.rb', line 101

def sample_rate
  @sample_rate
end

#speech ⇒ Object (readonly)



# File 'lib/google/cloud/speech/audio.rb', line 53

def speech
  @speech
end

Class Method Details

.from_source(source, speech) ⇒ Object



# File 'lib/google/cloud/speech/audio.rb', line 255

def self.from_source source, speech
  audio = new
  audio.instance_variable_set :@speech, speech
  if source.respond_to?(:read) && source.respond_to?(:rewind)
    source.rewind
    audio.grpc.content = source.read
    return audio
  end
  # Convert Storage::File objects to the URL
  source = source.to_gs_url if source.respond_to? :to_gs_url
  # Everything should be a string from now on
  source = String source
  # Create an Audio from the Google Storage URL
  if source.start_with? "gs://"
    audio.grpc.uri = source
    return audio
  end
  # Create an audio from a file on the filesystem
  if File.file? source
    fail ArgumentError, "Cannot read #{source}" unless \
      File.readable? source
    audio.grpc.content = File.read source, mode: "rb"
    return audio
  end
  fail ArgumentError, "Unable to convert #{source} to an Audio"
end
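
To make the branching in `from_source` easier to follow, the sketch below mirrors the same checks in the same order but only reports which branch a source would take. `classify_source` is a hypothetical stand-in and the bucket name is made up; no Audio object or gRPC message is built.

```ruby
require "stringio"
require "tempfile"

# Hypothetical stand-in for the branching in from_source: returns a Symbol
# naming the branch instead of populating a RecognitionAudio message.
def classify_source(source)
  # IO-like objects are rewound and read into the content field.
  if source.respond_to?(:read) && source.respond_to?(:rewind)
    return :content_from_io
  end

  # Storage::File-like objects are converted to their gs:// URL first.
  source = source.to_gs_url if source.respond_to?(:to_gs_url)
  source = String(source)
  return :uri if source.start_with?("gs://")
  return :content_from_file if File.file?(source)

  raise ArgumentError, "Unable to convert #{source} to an Audio"
end

classify_source(StringIO.new("raw bytes"))    # returns :content_from_io
classify_source("gs://some-bucket/audio.raw") # returns :uri

Tempfile.create("audio") do |file|
  classify_source(file.path)                  # returns :content_from_file
end
```

Note that the order matters: an open File object responds to both `read` and `rewind`, so it takes the first branch, while its path String falls through to the filesystem check.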

Instance Method Details

#content? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/google/cloud/speech/audio.rb', line 132

def content?
  @grpc.audio_source == :content
end

#recognize(max_alternatives: nil, profanity_filter: nil, phrases: nil) ⇒ Array<Result>

Performs synchronous speech recognition. Sends audio data to the Speech API, which performs recognition on that data, and returns results only after all audio has been processed. Limited to audio data of 1 minute or less in duration.

The Speech API will take roughly the same amount of time to process audio data sent synchronously as the duration of the supplied audio data. That is, if you send audio data of 30 seconds in length, expect the synchronous request to take approximately 30 seconds to return results.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000
results = audio.recognize

result = results.first
result.transcript #=> "how old is the Brooklyn Bridge"
result.confidence #=> 0.9826789498329163
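
Because synchronous recognition is limited to 1 minute of audio, it can help to estimate a clip's duration before choosing a method. The sketch below does this for headerless `raw` (LINEAR16) data and assumes a single channel, which the text above does not state. The helper names are hypothetical.

```ruby
SYNC_LIMIT_SECONDS = 60 # "1 minute or less", per the description above

# Duration of headerless LINEAR16 audio.
# Assumption: mono audio, 2 bytes (16 bits) per sample.
def raw_duration_seconds(byte_count, sample_rate)
  byte_count / (2.0 * sample_rate)
end

# Hypothetical helper: suggests which Audio method fits the clip length.
def recognition_method(byte_count, sample_rate)
  if raw_duration_seconds(byte_count, sample_rate) <= SYNC_LIMIT_SECONDS
    :recognize
  else
    :recognize_job
  end
end

raw_duration_seconds(960_000, 16000) # returns 30.0
recognition_method(960_000, 16000)   # returns :recognize
recognition_method(3_200_000, 16000) # returns :recognize_job (100 seconds)
```

In practice `byte_count` could come from `File.size` on the audio file.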

Parameters:

  • max_alternatives (Integer) (defaults to: nil)

    The maximum number of recognition hypotheses to be returned. The service may return fewer. Valid values are 0-30. Defaults to 1. Optional.

  • profanity_filter (Boolean) (defaults to: nil)

    When `true`, the service will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. “f***”. Default is `false`.

  • phrases (Array<String>) (defaults to: nil)

    A list of strings containing word and phrase “hints” that make the speech recognition more likely to recognize them. See [usage limits](cloud.google.com/speech/limits#content). Optional.

Returns:

  • (Array<Result>)

    The transcribed text of audio recognized.
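
The masking format described for `profanity_filter` can be pictured with a short sketch. It only illustrates the output shape (initial character kept, the rest replaced with asterisks); the filtering itself happens in the service, and `mask_word` is a hypothetical name.

```ruby
# Hypothetical illustration of the masking format only: keeps the first
# character and replaces every remaining character with an asterisk.
def mask_word(word)
  return word if word.length < 2

  word[0] + "*" * (word.length - 1)
end

mask_word("darn") # returns "d***"
```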

# File 'lib/google/cloud/speech/audio.rb', line 187

def recognize max_alternatives: nil, profanity_filter: nil, phrases: nil
  ensure_speech!

  speech.recognize self, encoding: encoding, sample_rate: sample_rate,
                         language: language,
                         max_alternatives: max_alternatives,
                         profanity_filter: profanity_filter,
                         phrases: phrases
end

#recognize_job(max_alternatives: nil, profanity_filter: nil, phrases: nil) ⇒ Job

Performs asynchronous speech recognition. Requests are processed asynchronously, meaning a Job is returned once the audio data has been sent, and can be refreshed to retrieve recognition results once the audio data has been processed.

Examples:

require "google/cloud/speech"

speech = Google::Cloud::Speech.new

audio = speech.audio "path/to/audio.raw",
                     encoding: :raw, sample_rate: 16000
job = audio.recognize_job

job.done? #=> false
job.reload!
job.done? #=> true
results = job.results
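
A job is rarely finished after a single reload, so callers usually poll. The sketch below is one way to do that under stated assumptions: `wait_for_job` and `StubJob` are hypothetical, and the helper relies only on the `done?` and `reload!` methods used in the example above.

```ruby
# Hypothetical helper: polls any object that responds to done? and reload!
# until it reports completion, or raises after max_attempts reloads.
def wait_for_job(job, interval: 1.0, max_attempts: 30)
  max_attempts.times do
    return job if job.done?

    sleep interval
    job.reload!
  end
  return job if job.done?

  raise "Job still running after #{max_attempts} attempts"
end

# Stand-in job for trying the helper without calling the API.
class StubJob
  def initialize(reloads_needed)
    @remaining = reloads_needed
  end

  def done?
    @remaining <= 0
  end

  def reload!
    @remaining -= 1
    self
  end
end

job = wait_for_job(StubJob.new(3), interval: 0)
job.done? # returns true
```

With a real job, the object returned by `audio.recognize_job` would be passed in place of `StubJob`, and a non-zero `interval` keeps the polling rate modest.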

Parameters:

  • max_alternatives (Integer) (defaults to: nil)

    The maximum number of recognition hypotheses to be returned. The service may return fewer. Valid values are 0-30. Defaults to 1. Optional.

  • profanity_filter (Boolean) (defaults to: nil)

    When `true`, the service will attempt to filter out profanities, replacing all but the initial character in each filtered word with asterisks, e.g. “f***”. Default is `false`.

  • phrases (Array<String>) (defaults to: nil)

    A list of strings containing word and phrase “hints” that make the speech recognition more likely to recognize them. See [usage limits](cloud.google.com/speech/limits#content). Optional.

Returns:

  • (Job)

    A resource that represents the long-running, asynchronous processing of a speech-recognition operation.

# File 'lib/google/cloud/speech/audio.rb', line 235

def recognize_job max_alternatives: nil, profanity_filter: nil,
                  phrases: nil
  ensure_speech!

  speech.recognize_job self, encoding: encoding,
                             sample_rate: sample_rate,
                             language: language,
                             max_alternatives: max_alternatives,
                             profanity_filter: profanity_filter,
                             phrases: phrases
end

#to_grpc ⇒ Object



# File 'lib/google/cloud/speech/audio.rb', line 249

def to_grpc
  @grpc
end

#url? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/google/cloud/speech/audio.rb', line 139

def url?
  @grpc.audio_source == :uri
end