Class: IBMWatson::SpeechToTextV1

Inherits:
  IBMCloudSdkCore::BaseService • Object
Includes:
  Concurrent::Async
Defined in:
  lib/ibm_watson/speech_to_text_v1.rb

Overview

The Speech to Text V1 service.

Constant Summary

DEFAULT_SERVICE_NAME = "speech_to_text"
DEFAULT_SERVICE_URL = "https://api.us-south.speech-to-text.watson.cloud.ibm.com"

Instance Method Summary

Constructor Details

#initialize(args) ⇒ SpeechToTextV1

Construct a new client for the Speech to Text service.

Parameters:

  • args (Hash)

    The args to initialize with

Options Hash (args):

  • service_url (String)

    The base service URL to use when contacting the service. The base service_url may differ between IBM Cloud regions.

  • authenticator (Object)

    The Authenticator instance to be configured for this service.

  • service_name (String)

    The name of the service to configure. Will be used as the key to load any external configuration, if applicable.



# File 'lib/ibm_watson/speech_to_text_v1.rb', line 80

def initialize(args = {})
  @__async_initialized__ = false
  defaults = {}
  defaults[:service_url] = DEFAULT_SERVICE_URL
  defaults[:service_name] = DEFAULT_SERVICE_NAME
  defaults[:authenticator] = nil
  user_service_url = args[:service_url] unless args[:service_url].nil?
  args = defaults.merge(args)
  args[:authenticator] = IBMCloudSdkCore::ConfigBasedAuthenticatorFactory.new.get_authenticator(service_name: args[:service_name]) if args[:authenticator].nil?
  super
  @service_url = user_service_url unless user_service_url.nil?
end
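
For example, a minimal construction sketch following the SDK's usual pattern (the API key placeholder and regional URL are illustrative assumptions; `IamAuthenticator` is provided by the ibm_cloud_sdk_core dependency):

require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"
include IBMWatson

# Hypothetical credentials: substitute your own API key and regional endpoint.
authenticator = Authenticators::IamAuthenticator.new(
  apikey: "{apikey}"
)
speech_to_text = SpeechToTextV1.new(
  authenticator: authenticator
)
# Optional: override the default US-South endpoint for your region.
speech_to_text.service_url = "https://api.us-south.speech-to-text.watson.cloud.ibm.com"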

Instance Method Details

#add_audio(customization_id: , audio_name: , audio_resource: , content_type: nil, contained_content_type: nil, allow_overwrite: nil) ⇒ nil

Add an audio resource. Adds an audio resource to a custom acoustic model. Add audio content that reflects

the acoustic characteristics of the audio that you plan to transcribe. You must
use credentials for the instance of the service that owns a model to add an audio
resource to it. Adding audio data does not affect the custom acoustic model until
you train the model for the new data by using the [Train a custom acoustic
model](#trainacousticmodel) method.

You can add individual audio files or an archive file that contains multiple audio
files. Adding multiple audio files via a single archive file is significantly more
efficient than adding each file individually. You can add audio resources in any
format that the service supports for speech recognition.

You can use this method to add any number of audio resources to a custom model by
calling the method once for each audio or archive file. You can add multiple
different audio resources at the same time. You must add a minimum of 10 minutes
and a maximum of 200 hours of audio that includes speech, not just silence, to a
custom acoustic model before you can train it. No audio resource, audio- or
archive-type, can be larger than 100 MB. To add an audio resource that has the
same name as an existing audio resource, set the `allow_overwrite` parameter to
`true`; otherwise, the request fails.

The method is asynchronous. It can take several seconds or minutes to complete
depending on the duration of the audio and, in the case of an archive file, the
total number of audio files being processed. The service returns a 201 response
code if the audio is valid. It then asynchronously analyzes the contents of the
audio file or files and automatically extracts information about the audio such as
its length, sampling rate, and encoding. You cannot submit requests to train or
upgrade the model until the service's analysis of all audio resources for current
requests completes.

To determine the status of the service's analysis of the audio, use the [Get an
audio resource](#getaudio) method to poll the status of the audio. The method
accepts the customization ID of the custom model and the name of the audio
resource, and it returns the status of the resource. Use a loop to check the
status of the audio every few seconds until it becomes `ok`.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Add audio to the custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acoustic#addAudio).

### Content types for audio-type resources

 You can add an individual audio file in any format that the service supports for
speech recognition. For an audio-type resource, use the `Content-Type` parameter
to specify the audio format (MIME type) of the audio file, including specifying
the sampling rate, channels, and endianness where indicated.
* `audio/alaw` (Specify the sampling rate (`rate`) of the audio.)
* `audio/basic` (Use only with narrowband models.)
* `audio/flac`
* `audio/g729` (Use only with narrowband models.)
* `audio/l16` (Specify the sampling rate (`rate`) and optionally the number of
channels (`channels`) and endianness (`endianness`) of the audio.)
* `audio/mp3`
* `audio/mpeg`
* `audio/mulaw` (Specify the sampling rate (`rate`) of the audio.)
* `audio/ogg` (The service automatically detects the codec of the input audio.)
* `audio/ogg;codecs=opus`
* `audio/ogg;codecs=vorbis`
* `audio/wav` (Provide audio with a maximum of nine channels.)
* `audio/webm` (The service automatically detects the codec of the input audio.)
* `audio/webm;codecs=opus`
* `audio/webm;codecs=vorbis`

The sampling rate of an audio file must match the sampling rate of the base model
for the custom model: for broadband models, at least 16 kHz; for narrowband
models, at least 8 kHz. If the sampling rate of the audio is higher than the
minimum required rate, the service down-samples the audio to the appropriate rate.
If the sampling rate of the audio is lower than the minimum required rate, the
service labels the audio file as `invalid`.

 **See also:** [Supported audio
formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).

### Content types for archive-type resources

 You can add an archive file (**.zip** or **.tar.gz** file) that contains audio
files in any format that the service supports for speech recognition. For an
archive-type resource, use the `Content-Type` parameter to specify the media type
of the archive file:
* `application/zip` for a **.zip** file
* `application/gzip` for a **.tar.gz** file.

When you add an archive-type resource, the `Contained-Content-Type` header is
optional depending on the format of the files that you are adding:
* For audio files of type `audio/alaw`, `audio/basic`, `audio/l16`, or
`audio/mulaw`, you must use the `Contained-Content-Type` header to specify the
format of the contained audio files. Include the `rate`, `channels`, and
`endianness` parameters where necessary. In this case, all audio files contained
in the archive file must have the same audio format.
* For audio files of all other types, you can omit the `Contained-Content-Type`
header. In this case, the audio files contained in the archive file can have any
of the formats not listed in the previous bullet. The audio files do not need to
have the same format.

Do not use the `Contained-Content-Type` header when adding an audio-type resource.

### Naming restrictions for embedded audio files

 The name of an audio file that is contained in an archive-type resource can
include a maximum of 128 characters. This includes the file extension and all
elements of the name (for example, slashes).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • audio_name (String) (defaults to: )

    The name of the new audio resource for the custom acoustic model. Use a localized name that matches the language of the custom model and reflects the contents of the resource.

    • Include a maximum of 128 characters in the name.

    • Do not use characters that need to be URL-encoded. For example, do not use

spaces, slashes, backslashes, colons, ampersands, double quotes, plus signs, equals signs, question marks, and so on in the name. (The service does not prevent the use of these characters. But because they must be URL-encoded wherever used, their use is strongly discouraged.)

    • Do not use the name of an audio resource that has already been added to the

    custom model.

  • audio_resource (File) (defaults to: )

    The audio resource that is to be added to the custom acoustic model, an individual audio file or an archive file.

With the `curl` command, use the `--data-binary` option to upload the file for the request.

  • content_type (String) (defaults to: nil)

    For an audio-type resource, the format (MIME type) of the audio. For more information, see **Content types for audio-type resources** in the method description.

    For an archive-type resource, the media type of the archive file. For more information, see **Content types for archive-type resources** in the method description.

  • contained_content_type (String) (defaults to: nil)

_For an archive-type resource_, specify the format of the audio files that are contained in the archive file if they are of type `audio/alaw`, `audio/basic`, `audio/l16`, or `audio/mulaw`. Include the `rate`, `channels`, and `endianness` parameters where necessary. In this case, all audio files that are contained in the archive file must be of the indicated type.

    For all other audio formats, you can omit the header. In this case, the audio files can be of multiple types as long as they are not of the types listed in the previous paragraph.

    The parameter accepts all of the audio formats that are supported for use with speech recognition. For more information, see **Content types for audio-type resources** in the method description.

    _For an audio-type resource_, omit the header.

  • allow_overwrite (Boolean) (defaults to: nil)

If `true`, the specified audio resource overwrites an existing audio resource with the same name. If `false`, the request fails if an audio resource with the same name already exists. The parameter has no effect if an audio resource with the same name does not already exist.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 3328

def add_audio(customization_id:, audio_name:, audio_resource:, content_type: nil, contained_content_type: nil, allow_overwrite: nil)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("audio_name must be provided") if audio_name.nil?

  raise ArgumentError.new("audio_resource must be provided") if audio_resource.nil?

  headers = {
    "Content-Type" => content_type,
    "Contained-Content-Type" => contained_content_type
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "add_audio")
  headers.merge!(sdk_headers)

  params = {
    "allow_overwrite" => allow_overwrite
  }

  data = audio_resource

  method_url = "/v1/acoustic_customizations/%s/audio/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(audio_name)]

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    data: data,
    accept_json: true
  )
  nil
end
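
A usage sketch (assumes a `speech_to_text` client constructed as shown for #initialize; the file path and customization ID are hypothetical placeholders):

# Add a single WAV file as an audio-type resource, overwriting any
# existing resource with the same name.
File.open(Dir.getwd + "/resources/audio1.wav") do |audio_file|
  speech_to_text.add_audio(
    customization_id: "{customization_id}",
    audio_name: "audio1",
    audio_resource: audio_file,
    content_type: "audio/wav",
    allow_overwrite: true
  )
end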

#add_corpus(customization_id: , corpus_name: , corpus_file: , allow_overwrite: nil) ⇒ nil

Add a corpus. Adds a single corpus text file of new training data to a custom language model.

Use multiple requests to submit multiple corpus text files. You must use
credentials for the instance of the service that owns a model to add a corpus to
it. Adding a corpus does not affect the custom language model until you train the
model for the new data by using the [Train a custom language
model](#trainlanguagemodel) method.

Submit a plain text file that contains sample sentences from the domain of
interest to enable the service to parse the words in context. The more sentences
you add that represent the context in which speakers use words from the domain,
the better the service's recognition accuracy.

The call returns an HTTP 201 response code if the corpus is valid. The service
then asynchronously processes and automatically extracts data from the contents of
the corpus. This operation can take on the order of minutes to complete depending
on the current load on the service, the total number of words in the corpus, and,
_for custom models that are based on previous-generation models_, the number of
new (out-of-vocabulary) words in the corpus. You cannot submit requests to add
additional resources to the custom model or to train the model until the service's
analysis of the corpus for the current request completes. Use the [Get a
corpus](#getcorpus) method to check the status of the analysis.

_For custom models that are based on previous-generation models_, the service
auto-populates the model's words resource with words from the corpus that are not
found in its base vocabulary. These words are referred to as out-of-vocabulary
(OOV) words. After adding a corpus, you must validate the words resource to ensure
that each OOV word's definition is complete and valid. You can use the [List
custom words](#listwords) method to examine the words resource. You can use other
words-related methods to eliminate typos and modify how words are pronounced as needed.

To add a corpus file that has the same name as an existing corpus, set the
`allow_overwrite` parameter to `true`; otherwise, the request fails. Overwriting
an existing corpus causes the service to process the corpus text file and extract
its data anew. _For a custom model that is based on a previous-generation model_,
the service first removes any OOV words that are associated with the existing
corpus from the model's words resource unless they were also added by another
corpus or grammar, or they have been modified in some way with the [Add custom
words](#addwords) or [Add a custom word](#addword) method.

The service limits the overall amount of data that you can add to a custom model
to a maximum of 10 million total words from all sources combined. _For a custom
model that is based on a previous-generation model_, you can add no more than 90
thousand custom (OOV) words to a model. This includes words that the service
extracts from corpora and grammars, and words that you add directly.

**See also:**
* [Add a corpus to the custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#addCorpus)
* [Working with corpora for previous-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#workingCorpora)
* [Working with corpora for next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#workingCorpora-ng)
* [Validating a words resource for previous-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#validateModel)
* [Validating a words resource for next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#validateModel-ng).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • corpus_name (String) (defaults to: )

    The name of the new corpus for the custom language model. Use a localized name that matches the language of the custom model and reflects the contents of the corpus.

    • Include a maximum of 128 characters in the name.

    • Do not use characters that need to be URL-encoded. For example, do not use

spaces, slashes, backslashes, colons, ampersands, double quotes, plus signs, equals signs, question marks, and so on in the name. (The service does not prevent the use of these characters. But because they must be URL-encoded wherever used, their use is strongly discouraged.)

    • Do not use the name of an existing corpus or grammar that is already defined for

    the custom model.

• Do not use the name `user`, which is reserved by the service to denote custom

    words that are added or modified by the user.

• Do not use the name `base_lm` or `default_lm`. Both names are reserved for

    future use by the service.

  • corpus_file (File) (defaults to: )

    A plain text file that contains the training data for the corpus. Encode the file in UTF-8 if it contains non-ASCII characters; the service assumes UTF-8 encoding if it encounters non-ASCII characters.

Make sure that you know the character encoding of the file. You must use that same encoding when working with the words in the custom language model. For more information, see [Character encoding for custom words](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageWords#charEncoding).

With the `curl` command, use the `--data-binary` option to upload the file for the request.

  • allow_overwrite (Boolean) (defaults to: nil)

If `true`, the specified corpus overwrites an existing corpus with the same name. If `false`, the request fails if a corpus with the same name already exists. The parameter has no effect if a corpus with the same name does not already exist.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2005

def add_corpus(customization_id:, corpus_name:, corpus_file:, allow_overwrite: nil)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("corpus_name must be provided") if corpus_name.nil?

  raise ArgumentError.new("corpus_file must be provided") if corpus_file.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "add_corpus")
  headers.merge!(sdk_headers)

  params = {
    "allow_overwrite" => allow_overwrite
  }

  form_data = {}

  unless corpus_file.instance_of?(StringIO) || corpus_file.instance_of?(File)
    corpus_file = corpus_file.respond_to?(:to_json) ? StringIO.new(corpus_file.to_json) : StringIO.new(corpus_file)
  end
  form_data[:corpus_file] = HTTP::FormData::File.new(corpus_file, content_type: "text/plain", filename: corpus_file.respond_to?(:path) ? corpus_file.path : nil)

  method_url = "/v1/customizations/%s/corpora/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(corpus_name)]

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    form: form_data,
    accept_json: true
  )
  nil
end
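
A usage sketch (the file path and customization ID are placeholders):

# Submit a plain-text corpus file; the service then analyzes it asynchronously.
File.open(Dir.getwd + "/resources/corpus1.txt") do |corpus_file|
  speech_to_text.add_corpus(
    customization_id: "{customization_id}",
    corpus_name: "corpus1",
    corpus_file: corpus_file,
    allow_overwrite: true
  )
end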

#add_grammar(customization_id: , grammar_name: , grammar_file: , content_type: , allow_overwrite: nil) ⇒ nil

Add a grammar. Adds a single grammar file to a custom language model. Submit a plain text file in

UTF-8 format that defines the grammar. Use multiple requests to submit multiple
grammar files. You must use credentials for the instance of the service that owns
a model to add a grammar to it. Adding a grammar does not affect the custom
language model until you train the model for the new data by using the [Train a
custom language model](#trainlanguagemodel) method.

The call returns an HTTP 201 response code if the grammar is valid. The service
then asynchronously processes the contents of the grammar and automatically
extracts new words that it finds. This operation can take a few seconds or minutes
to complete depending on the size and complexity of the grammar, as well as the
current load on the service. You cannot submit requests to add additional
resources to the custom model or to train the model until the service's analysis
of the grammar for the current request completes. Use the [Get a
grammar](#getgrammar) method to check the status of the analysis.

_For grammars that are based on previous-generation models,_ the service populates
the model's words resource with any word that is recognized by the grammar that is
not found in the model's base vocabulary. These are referred to as
out-of-vocabulary (OOV) words. You can use the [List custom words](#listwords)
method to examine the words resource and use other words-related methods to
eliminate typos and modify how words are pronounced as needed. _For grammars that
are based on next-generation models,_ the service extracts no OOV words from the
grammars.

To add a grammar that has the same name as an existing grammar, set the
`allow_overwrite` parameter to `true`; otherwise, the request fails. Overwriting
an existing grammar causes the service to process the grammar file and extract OOV
words anew. Before doing so, it removes any OOV words associated with the existing
grammar from the model's words resource unless they were also added by another
resource or they have been modified in some way with the [Add custom
words](#addwords) or [Add a custom word](#addword) method.

_For grammars that are based on previous-generation models,_ the service limits
the overall amount of data that you can add to a custom model to a maximum of 10
million total words from all sources combined. Also, you can add no more than 90
thousand OOV words to a model. This includes words that the service extracts from
corpora and grammars and words that you add directly.

**See also:**
* [Understanding
grammars](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUnderstand#grammarUnderstand)
* [Add a grammar to the custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarAdd#addGrammar)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • grammar_name (String) (defaults to: )

    The name of the new grammar for the custom language model. Use a localized name that matches the language of the custom model and reflects the contents of the grammar.

    • Include a maximum of 128 characters in the name.

    • Do not use characters that need to be URL-encoded. For example, do not use

spaces, slashes, backslashes, colons, ampersands, double quotes, plus signs, equals signs, question marks, and so on in the name. (The service does not prevent the use of these characters. But because they must be URL-encoded wherever used, their use is strongly discouraged.)

    • Do not use the name of an existing grammar or corpus that is already defined for

    the custom model.

• Do not use the name `user`, which is reserved by the service to denote custom

    words that are added or modified by the user.

• Do not use the name `base_lm` or `default_lm`. Both names are reserved for

    future use by the service.

  • grammar_file (File) (defaults to: )

A plain text file that contains the grammar in the format specified by the `Content-Type` header. Encode the file in UTF-8 (ASCII is a subset of UTF-8). Using any other encoding can lead to issues when compiling the grammar or to unexpected results in decoding. The service ignores an encoding that is specified in the header of the grammar.

With the `curl` command, use the `--data-binary` option to upload the file for the request.

  • content_type (String) (defaults to: )

    The format (MIME type) of the grammar file:

• `application/srgs` for Augmented Backus-Naur Form (ABNF), which uses a

    plain-text representation that is similar to traditional BNF grammars.

• `application/srgs+xml` for XML Form, which uses XML elements to represent the

    grammar.

  • allow_overwrite (Boolean) (defaults to: nil)

If `true`, the specified grammar overwrites an existing grammar with the same name. If `false`, the request fails if a grammar with the same name already exists. The parameter has no effect if a grammar with the same name does not already exist.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2610

def add_grammar(customization_id:, grammar_name:, grammar_file:, content_type:, allow_overwrite: nil)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("grammar_name must be provided") if grammar_name.nil?

  raise ArgumentError.new("grammar_file must be provided") if grammar_file.nil?

  raise ArgumentError.new("content_type must be provided") if content_type.nil?

  headers = {
    "Content-Type" => content_type
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "add_grammar")
  headers.merge!(sdk_headers)

  params = {
    "allow_overwrite" => allow_overwrite
  }

  data = grammar_file

  method_url = "/v1/customizations/%s/grammars/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(grammar_name)]

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    data: data,
    accept_json: true
  )
  nil
end
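
A usage sketch (the ABNF file and names are hypothetical; `content_type` must match the grammar format):

# Add an ABNF grammar; use "application/srgs+xml" instead for XML Form grammars.
File.open(Dir.getwd + "/resources/confirm.abnf") do |grammar_file|
  speech_to_text.add_grammar(
    customization_id: "{customization_id}",
    grammar_name: "confirm-abnf",
    grammar_file: grammar_file,
    content_type: "application/srgs"
  )
end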

#add_word(customization_id: , word_name: , word: nil, sounds_like: nil, display_as: nil) ⇒ nil

Add a custom word. Adds a custom word to a custom language model. You can use this method to add a

word or to modify an existing word in the words resource. _For custom models that
are based on previous-generation models_, the service populates the words resource
for a custom model with out-of-vocabulary (OOV) words from each corpus or grammar
that is added to the model. You can use this method to modify OOV words in the
model's words resource.

_For a custom model that is based on a previous-generation model_, the words
resource for a model can contain a maximum of 90 thousand custom (OOV) words. This
includes words that the service extracts from corpora and grammars and words that
you add directly.

You must use credentials for the instance of the service that owns a model to add
or modify a custom word for the model. Adding or modifying a custom word does not
affect the custom model until you train the model for the new data by using the
[Train a custom language model](#trainlanguagemodel) method.

Use the `word_name` parameter to specify the custom word that is to be added or
modified. Use the `CustomWord` object to provide one or both of the optional
`display_as` or `sounds_like` fields for the word.
* The `display_as` field provides a different way of spelling the word in a
transcript. Use the parameter when you want the word to appear different from its
usual representation or from its spelling in training data. For example, you might
indicate that the word `IBM` is to be displayed as `IBM™`.
* The `sounds_like` field, _which can be used only with a custom model that is
based on a previous-generation model_, provides an array of one or more
pronunciations for the word. Use the parameter to specify how the word can be
pronounced by users. Use the parameter for words that are difficult to pronounce,
foreign words, acronyms, and so on. For example, you might specify that the word
`IEEE` can sound like `i triple e`. You can specify a maximum of five sounds-like
pronunciations for a word. If you omit the `sounds_like` field, the service
attempts to set the field to its pronunciation of the word. It cannot generate a
pronunciation for all words, so you must review the word's definition to ensure
that it is complete and valid.

If you add a custom word that already exists in the words resource for the custom
model, the new definition overwrites the existing data for the word. If the
service encounters an error, it does not add the word to the words resource. Use
the [Get a custom word](#getword) method to review the word that you add.

**See also:**
* [Add words to the custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#addWords)
* [Working with custom words for previous-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#workingWords)
* [Working with custom words for next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#workingWords-ng)
* [Validating a words resource for previous-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#validateModel)
* [Validating a words resource for next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#validateModel-ng).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • word_name (String) (defaults to: )

The custom word that is to be added to or updated in the custom language model. Do not include spaces in the word. Use a `-` (dash) or `_` (underscore) to connect the tokens of compound words. URL-encode the word if it includes non-ASCII characters. For more information, see [Character encoding](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#charEncoding).

  • word (String) (defaults to: nil)

For the [Add custom words](#addwords) method, you must specify the custom word that is to be added to or updated in the custom model. Do not include spaces in the word. Use a `-` (dash) or `_` (underscore) to connect the tokens of compound words.

    Omit this parameter for the [Add a custom word](#addword) method.

  • sounds_like (Array[String]) (defaults to: nil)

    _For a custom model that is based on a previous-generation model_, an array of sounds-like pronunciations for the custom word. Specify how words that are difficult to pronounce, foreign words, acronyms, and so on can be pronounced by users.

    • For a word that is not in the service’s base vocabulary, omit the parameter to

    have the service automatically generate a sounds-like pronunciation for the word.

    • For a word that is in the service’s base vocabulary, use the parameter to

    specify additional pronunciations for the word. You cannot override the default pronunciation of a word; pronunciations you add augment the pronunciation from the base vocabulary.

    A word can have at most five sounds-like pronunciations. A pronunciation can include at most 40 characters not including spaces.

_For a custom model that is based on a next-generation model_, omit this field. Custom models based on next-generation models do not support the `sounds_like` field. The service ignores the field.

  • display_as (String) (defaults to: nil)

    An alternative spelling for the custom word when it appears in a transcript. Use the parameter when you want the word to have a spelling that is different from its usual representation or from its spelling in corpora training data.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2378

def add_word(customization_id:, word_name:, word: nil, sounds_like: nil, display_as: nil)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("word_name must be provided") if word_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "add_word")
  headers.merge!(sdk_headers)

  data = {
    "word" => word,
    "sounds_like" => sounds_like,
    "display_as" => display_as
  }

  method_url = "/v1/customizations/%s/words/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(word_name)]

  request(
    method: "PUT",
    url: method_url,
    headers: headers,
    json: data,
    accept_json: true
  )
  nil
end
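
A usage sketch (the word and pronunciations are illustrative):

# Add the word "NCAA" with two sounds-like pronunciations and a display form.
speech_to_text.add_word(
  customization_id: "{customization_id}",
  word_name: "NCAA",
  sounds_like: ["N. C. A. A.", "N. C. double A."],
  display_as: "NCAA"
)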

#add_words(customization_id: , words: ) ⇒ nil

Add custom words. Adds one or more custom words to a custom language model. You can use this method

to add words or to modify existing words in a custom model's words resource. _For
custom models that are based on previous-generation models_, the service populates
the words resource for a custom model with out-of-vocabulary (OOV) words from each
corpus or grammar that is added to the model. You can use this method to modify
OOV words in the model's words resource.

_For a custom model that is based on a previous-generation model_, the words
resource for a model can contain a maximum of 90 thousand custom (OOV) words. This
includes words that the service extracts from corpora and grammars and words that
you add directly.

You must use credentials for the instance of the service that owns a model to add
or modify custom words for the model. Adding or modifying custom words does not
affect the custom model until you train the model for the new data by using the
[Train a custom language model](#trainlanguagemodel) method.

You add custom words by providing a `CustomWords` object, which is an array of
`CustomWord` objects, one per word. Use the object's `word` parameter to identify
the word that is to be added. You can also provide one or both of the optional
`display_as` or `sounds_like` fields for each word.
* The `display_as` field provides a different way of spelling the word in a
transcript. Use the parameter when you want the word to appear different from its
usual representation or from its spelling in training data. For example, you might
indicate that the word `IBM` is to be displayed as `IBM™`.
* The `sounds_like` field, _which can be used only with a custom model that is
based on a previous-generation model_, provides an array of one or more
pronunciations for the word. Use the parameter to specify how the word can be
pronounced by users. Use the parameter for words that are difficult to pronounce,
foreign words, acronyms, and so on. For example, you might specify that the word
`IEEE` can sound like `i triple e`. You can specify a maximum of five sounds-like
pronunciations for a word. If you omit the `sounds_like` field, the service
attempts to set the field to its pronunciation of the word. It cannot generate a
pronunciation for all words, so you must review the word's definition to ensure
that it is complete and valid.

If you add a custom word that already exists in the words resource for the custom
model, the new definition overwrites the existing data for the word. If the
service encounters an error with the input data, it returns a failure code and
does not add any of the words to the words resource.

The call returns an HTTP 201 response code if the input data is valid. It then
asynchronously processes the words to add them to the model's words resource. The
time that it takes for the analysis to complete depends on the number of new words
that you add but is generally faster than adding a corpus or grammar.

You can monitor the status of the request by using the [Get a custom language
model](#getlanguagemodel) method to poll the model's status. Use a loop to check
the status every 10 seconds. The method returns a `Customization` object that
includes a `status` field. A status of `ready` means that the words have been
added to the custom model. The service cannot accept requests to add new data or
to train the model until the existing request completes.

You can use the [List custom words](#listwords) or [Get a custom word](#getword)
method to review the words that you add. Words with an invalid `sounds_like` field
include an `error` field that describes the problem. You can use other
words-related methods to correct errors, eliminate typos, and modify how words are
pronounced as needed.

**See also:**
* [Add words to the custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#addWords)
* [Working with custom words for previous-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#workingWords)
* [Working with custom words for next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#workingWords-ng)
* [Validating a words resource for previous-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#validateModel)
* [Validating a words resource for next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords-ng#validateModel-ng).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • words (Array[CustomWord]) (defaults to: )

An array of `CustomWord` objects that provides information about each custom word that is to be added to or updated in the custom language model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2263

def add_words(customization_id:, words:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("words must be provided") if words.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "add_words")
  headers.merge!(sdk_headers)

  data = {
    "words" => words
  }

  method_url = "/v1/customizations/%s/words" % [ERB::Util.url_encode(customization_id)]

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    json: data,
    accept_json: true
  )
  nil
end
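
A usage sketch (the words are illustrative; each element mirrors the fields of a `CustomWord` object):

# Each hash is serialized as a CustomWord in the JSON request body.
words = [
  { "word" => "HHonors", "sounds_like" => ["hilton honors", "h honors"], "display_as" => "HHonors" },
  { "word" => "IEEE", "sounds_like" => ["i triple e"] }
]
speech_to_text.add_words(
  customization_id: "{customization_id}",
  words: words
)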

#check_job(id: ) ⇒ IBMCloudSdkCore::DetailedResponse

Check a job. Returns information about the specified job. The response always includes the

status of the job and its creation and update times. If the status is `completed`,
the response includes the results of the recognition request. You must use
credentials for the instance of the service that owns a job to list information
about it.

You can use the method to retrieve the results of any job, regardless of whether
it was submitted with a callback URL and the `recognitions.completed_with_results`
event, and you can retrieve the results multiple times for as long as they remain
available. Use the [Check jobs](#checkjobs) method to request information about
the most recent jobs associated with the calling credentials.

**See also:** [Checking the status and retrieving the results of a
job](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#job).

Parameters:

  • id (String) (defaults to: )

    The identifier of the asynchronous job that is to be used for the request. You must make the request with credentials for the instance of the service that owns the job.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1425

def check_job(id:)
  raise ArgumentError.new("id must be provided") if id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "check_job")
  headers.merge!(sdk_headers)

  method_url = "/v1/recognitions/%s" % [ERB::Util.url_encode(id)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end
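
A polling sketch (the job ID is a placeholder returned by [Create a job](#createjob)):

# Poll every five seconds until the job completes or fails; the parsed JSON
# body is available on the DetailedResponse via #result.
loop do
  response = speech_to_text.check_job(id: "{job_id}")
  status = response.result["status"]
  puts "status: #{status}"
  break if %w[completed failed].include?(status)
  sleep(5)
end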

#check_jobs ⇒ IBMCloudSdkCore::DetailedResponse

Check jobs. Returns the ID and status of the latest 100 outstanding jobs associated with the

credentials with which it is called. The method also returns the creation and
update times of each job, and, if a job was created with a callback URL and a user
token, the user token for the job. To obtain the results for a job whose status is
`completed` or not one of the latest 100 outstanding jobs, use the [Check a
job](#checkjob) method. A job and its results remain available until you delete
them with the [Delete a job](#deletejob) method or until the job's time to live
expires, whichever comes first.

**See also:** [Checking the status of the latest
jobs](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#jobs).

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

An `IBMCloudSdkCore::DetailedResponse` object representing the response.



# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1387

def check_jobs
  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "check_jobs")
  headers.merge!(sdk_headers)

  method_url = "/v1/recognitions"

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end
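
A usage sketch:

# List the most recent jobs for these credentials and print each status.
response = speech_to_text.check_jobs
response.result["recognitions"].each do |job|
  puts "#{job["id"]}: #{job["status"]}"
end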

#create_acoustic_model(name: , base_model_name: , description: nil) ⇒ IBMCloudSdkCore::DetailedResponse

Create a custom acoustic model. Creates a new custom acoustic model for a specified base model. The custom

acoustic model can be used only with the base model for which it is created. The
model is owned by the instance of the service whose credentials are used to create
it.

You can create a maximum of 1024 custom acoustic models per owning credentials.
The service returns an error if you attempt to create more than 1024 models. You
do not lose any models, but you cannot create any more until your model count is
below the limit.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**Important:** Effective 15 March 2022, previous-generation models for all
languages other than Arabic and Japanese are deprecated. The deprecated models
remain available until 15 September 2022, when they will be removed from the
service and the documentation. You must migrate to the equivalent next-generation
model by the end of service date. For more information, see [Migrating to
next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).

**See also:** [Create a custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acoustic#createModel-acoustic).

Parameters:

  • name (String) (defaults to: )

A user-defined name for the new custom acoustic model. Use a name that is unique among all custom acoustic models that you own. Use a localized name that matches the language of the custom model. Use a name that describes the acoustic environment of the custom model, such as `Mobile custom model` or `Noisy car custom model`.

  • base_model_name (String) (defaults to: )

The name of the base language model that is to be customized by the new custom acoustic model. The new custom model can be used only with the base model that it customizes. (Note: The model `ar-AR_BroadbandModel` is deprecated; use `ar-MS_BroadbandModel` instead.)

To determine whether a base model supports acoustic model customization, refer to [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

  • description (String) (defaults to: nil)

    A description of the new custom acoustic model. Use a localized description that matches the language of the custom model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2774

def create_acoustic_model(name:, base_model_name:, description: nil)
  raise ArgumentError.new("name must be provided") if name.nil?

  raise ArgumentError.new("base_model_name must be provided") if base_model_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "create_acoustic_model")
  headers.merge!(sdk_headers)

  data = {
    "name" => name,
    "base_model_name" => base_model_name,
    "description" => description
  }

  method_url = "/v1/acoustic_customizations"

  response = request(
    method: "POST",
    url: method_url,
    headers: headers,
    json: data,
    accept_json: true
  )
  response
end
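
A usage sketch (the name and description are illustrative):

# Create a custom acoustic model against a previous-generation base model.
response = speech_to_text.create_acoustic_model(
  name: "Noisy car custom model",
  base_model_name: "en-US_BroadbandModel",
  description: "Custom acoustic model for recognition in a noisy car"
)
puts response.result["customization_id"]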

#create_job(audio: , content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil) ⇒ IBMCloudSdkCore::DetailedResponse

Create a job. Creates a job for a new asynchronous recognition request. The job is owned by the

instance of the service whose credentials are used to create it. How you learn the
status and results of a job depends on the parameters you include with the job
creation request:
* By callback notification: Include the `callback_url` parameter to specify a URL
to which the service is to send callback notifications when the status of the job
changes. Optionally, you can also include the `events` and `user_token` parameters
to subscribe to specific events and to specify a string that is to be included
with each notification for the job.
* By polling the service: Omit the `callback_url`, `events`, and `user_token`
parameters. You must then use the [Check jobs](#checkjobs) or [Check a
job](#checkjob) methods to check the status of the job, using the latter to
retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job
status or obtain results from the service manually even if you include a callback
URL. In both cases, you can include the `results_ttl` parameter to specify how
long the results are to remain available after the job is complete. Using the
HTTPS [Check a job](#checkjob) method to retrieve results is more secure than
receiving them via callback notification over HTTP because it provides
confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as other HTTP and WebSocket
recognition requests. It also supports the following parameters specific to the
asynchronous interface:
* `callback_url`
* `events`
* `user_token`
* `results_ttl`

You can pass a maximum of 1 GB and a minimum of 100 bytes of audio with a request.
The service automatically detects the endianness of the incoming audio and, for
audio that includes multiple channels, downmixes the audio to one-channel mono
during transcoding. The method returns only final results; to enable interim
results, use the WebSocket API. (With the `curl` command, use the `--data-binary`
option to upload the file for the request.)

**See also:** [Creating a
job](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#create).

### Streaming mode

 For requests to transcribe live audio as it becomes available, you must set the
`Transfer-Encoding` header to `chunked` to use streaming mode. In streaming mode,
the service closes the connection (status code 408) if it does not receive at
least 15 seconds of audio (including silence) in any 30-second period. The service
also closes the connection (status code 400) if it detects no speech for
`inactivity_timeout` seconds of streaming audio; use the `inactivity_timeout`
parameter to change the default of 30 seconds.

**See also:**
* [Audio
transmission](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#transmission)
*
[Timeouts](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#timeouts)

### Audio formats (content types)

 The service accepts audio in the following formats (MIME types).
* For formats that are labeled **Required**, you must use the `Content-Type`
header with the request to specify the format of the audio.
* For all other formats, you can omit the `Content-Type` header or specify
`application/octet-stream` with the header to have the service automatically
detect the format of the audio. (With the `curl` command, you can specify either
`"Content-Type:"` or `"Content-Type: application/octet-stream"`.)

Where indicated, the format that you specify must include the sampling rate and
can optionally include the number of channels and the endianness of the audio.
* `audio/alaw` (**Required.** Specify the sampling rate (`rate`) of the audio.)
* `audio/basic` (**Required.** Use only with narrowband models.)
* `audio/flac`
* `audio/g729` (Use only with narrowband models.)
* `audio/l16` (**Required.** Specify the sampling rate (`rate`) and optionally the
number of channels (`channels`) and endianness (`endianness`) of the audio.)
* `audio/mp3`
* `audio/mpeg`
* `audio/mulaw` (**Required.** Specify the sampling rate (`rate`) of the audio.)
* `audio/ogg` (The service automatically detects the codec of the input audio.)
* `audio/ogg;codecs=opus`
* `audio/ogg;codecs=vorbis`
* `audio/wav` (Provide audio with a maximum of nine channels.)
* `audio/webm` (The service automatically detects the codec of the input audio.)
* `audio/webm;codecs=opus`
* `audio/webm;codecs=vorbis`

The sampling rate of the audio must match the sampling rate of the model for the
recognition request: for broadband models, at least 16 kHz; for narrowband models,
at least 8 kHz. If the sampling rate of the audio is higher than the minimum
required rate, the service down-samples the audio to the appropriate rate. If the
sampling rate of the audio is lower than the minimum required rate, the request
fails.

 **See also:** [Supported audio
formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).

### Next-generation models

 The service supports next-generation `Multimedia` (16 kHz) and `Telephony` (8
kHz) models for many languages. Next-generation models have higher throughput than
the service's previous generation of `Broadband` and `Narrowband` models. When you
use next-generation models, the service can return transcriptions more quickly and
also provide noticeably better transcription accuracy.

You specify a next-generation model by using the `model` query parameter, as you
do a previous-generation model. Many next-generation models also support the
`low_latency` parameter, which is not available with previous-generation models.
Next-generation models do not support all of the parameters that are available for
use with previous-generation models.

**Important:** Effective 15 March 2022, previous-generation models for all
languages other than Arabic and Japanese are deprecated. The deprecated models
remain available until 15 September 2022, when they will be removed from the
service and the documentation. You must migrate to the equivalent next-generation
model by the end of service date. For more information, see  [Migrating to
next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).

**See also:**
* [Next-generation languages and
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng)
* [Supported features for next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-features).
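
A job-creation sketch using the polling approach described above (the file path, model, and `results_ttl` value are illustrative assumptions, not requirements of the API):

# Submit the audio without a callback URL, then poll with check_job
# using the returned job ID.
File.open(Dir.getwd + "/resources/audio.flac") do |audio_file|
  response = speech_to_text.create_job(
    audio: audio_file,
    content_type: "audio/flac",
    model: "en-US_Telephony",
    results_ttl: 60
  )
  puts response.result["id"]
end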

Parameters:

  • audio (File) (defaults to: )

    The audio to transcribe.

  • content_type (String) (defaults to: nil)

    The format (MIME type) of the audio. For more information about specifying an audio format, see **Audio formats (content types)** in the method description.

  • model (String) (defaults to: nil)

The identifier of the model that is to be used for the recognition request. (Note: The model `ar-AR_BroadbandModel` is deprecated; use `ar-MS_BroadbandModel` instead.) See [Using a model for speech recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-use).

  • callback_url (String) (defaults to: nil)

    A URL to which callback notifications are to be sent. The URL must already be successfully allowlisted by using the [Register a callback](#registercallback) method. You can include the same callback URL with any number of job creation requests. Omit the parameter to poll the service for job completion and results.

Use the `user_token` parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

  • events (String) (defaults to: nil)

    If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are

• `recognitions.started` generates a callback notification when the service begins

    to process the job.

• `recognitions.completed` generates a callback notification when the job is

    complete. You must use the [Check a job](#checkjob) method to retrieve the results before they time out or are deleted.

• `recognitions.completed_with_results` generates a callback notification when the

    job is complete. The notification includes the results of the request.

• `recognitions.failed` generates a callback notification if the service

    experiences an error while processing the job.

The `recognitions.completed` and `recognitions.completed_with_results` events are incompatible. You can specify only one of the two events.

If the job includes a callback URL, omit the parameter to subscribe to the default events: `recognitions.started`, `recognitions.completed`, and `recognitions.failed`. If the job does not include a callback URL, omit the parameter.

  • user_token (String) (defaults to: nil)

    If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

  • results_ttl (Fixnum) (defaults to: nil)

    The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

  • language_customization_id (String) (defaults to: nil)

The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the `model` parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See [Using a custom language model for speech recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse).

Note: Use this parameter instead of the deprecated `customization_id` parameter.

  • acoustic_customization_id (String) (defaults to: nil)

The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the `model` parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See [Using a custom acoustic model for speech recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acousticUse).

  • base_model_version (String) (defaults to: nil)

The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See [Making speech recognition requests with upgraded custom models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade-use#custom-upgrade-use-recognition).

  • customization_weight (Float) (defaults to: nil)

    If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

    Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

    The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases.

    See [Using customization weight](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).

  • inactivity_timeout (Fixnum) (defaults to: nil)

    The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use `-1` for infinity. See [Inactivity timeout](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#timeouts-inactivity).

  • keywords (Array[String]) (defaults to: nil)

    An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

    You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

    See [Keyword spotting](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).

  • keywords_threshold (Float) (defaults to: nil)

    A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See [Keyword spotting](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).

  • max_alternatives (Fixnum) (defaults to: nil)

    The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of `0`, the service uses the default value, `1`. See [Maximum alternatives](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#max-alternatives).

  • word_alternatives_threshold (Float) (defaults to: nil)

    A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See [Word alternatives](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#word-alternatives).

  • word_confidence (Boolean) (defaults to: nil)

    If `true`, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See [Word confidence](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-confidence).

  • timestamps (Boolean) (defaults to: nil)

    If `true`, the service returns time alignment for each word. By default, no timestamps are returned. See [Word timestamps](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-timestamps).

  • profanity_filter (Boolean) (defaults to: nil)

    If `true`, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to `false` to return results with no censoring.

    Note: The parameter can be used with US English and Japanese transcription only. See [Profanity filtering](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#profanity-filtering).

  • smart_formatting (Boolean) (defaults to: nil)

    If `true`, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

    Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

    See [Smart formatting](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#smart-formatting).

  • speaker_labels (Boolean) (defaults to: nil)

    If `true`, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting `speaker_labels` to `true` forces the `timestamps` parameter to be `true`, regardless of whether you specify `false` for the parameter.

    • _For previous-generation models,_ the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.

    • _For next-generation models,_ the parameter can be used with Czech, English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.

    See [Speaker labels](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-speaker-labels).

  • customization_id (String) (defaults to: nil)

    Deprecated. Use the `language_customization_id` parameter to specify the customization ID (GUID) of a custom language model that is to be used with the recognition request. Do not specify both parameters with a request.

  • grammar_name (String) (defaults to: nil)

    The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the `language_customization_id` parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model’s words resource.

    See [Using a grammar for speech recognition](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUse).

  • redaction (Boolean) (defaults to: nil)

    If `true`, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an `X` character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

    When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the `keywords` and `keywords_threshold` parameters) and returns only a single final transcript (forces the `max_alternatives` parameter to be `1`).

    Note: The parameter can be used with US English, Japanese, and Korean transcription only.

    See [Numeric redaction](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#numeric-redaction).

  • processing_metrics (Boolean) (defaults to: nil)

    If `true`, requests processing metrics about the service’s transcription of the input audio. The service returns processing metrics at the interval specified by the `processing_metrics_interval` parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics.

    See [Processing metrics](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#processing-metrics).

  • processing_metrics_interval (Float) (defaults to: nil)

    Specifies the interval in real wall-clock seconds at which the service is to return processing metrics. The parameter is ignored unless the `processing_metrics` parameter is set to `true`.

    The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125.

    The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events.

    See [Processing metrics](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#processing-metrics).

  • audio_metrics (Boolean) (defaults to: nil)

    If `true`, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

    See [Audio metrics](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#audio-metrics).

  • end_of_phrase_silence_time (Float) (defaults to: nil)

    Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

    Specify a value for the pause interval in the range of 0.0 to 120.0.

    • A value greater than 0 specifies the interval that the service is to use for speech recognition.

    • A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.

    The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

    See [End of phrase silence time](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#silence-time).

  • split_transcript_at_phrase_end (Boolean) (defaults to: nil)

    If `true`, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

    By default, the service splits transcripts based solely on the pause interval. If the parameters are used together on the same request, `end_of_phrase_silence_time` has precedence over `split_transcript_at_phrase_end`.

    See [Split transcript at phrase end](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#split-transcript).

  • speech_detector_sensitivity (Float) (defaults to: nil)

    The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

    Specify a value between 0.0 and 1.0:

    • 0.0 suppresses all audio (no speech is transcribed).

    • 0.5 (the default) provides a reasonable compromise for the level of sensitivity.

    • 1.0 suppresses no audio (speech detection sensitivity is disabled).

    The values increase on a monotonic curve.

    The parameter is supported with all next-generation models and with most previous-generation models. See [Speech detector sensitivity](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-sensitivity) and [Language model support](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-support).

  • background_audio_suppression (Float) (defaults to: nil)

    The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

    Specify a value in the range of 0.0 to 1.0:

    • 0.0 (the default) provides no suppression (background audio suppression is disabled).

    • 0.5 provides a reasonable level of audio suppression for general usage.

    • 1.0 suppresses all audio (no audio is transcribed).

    The values increase on a monotonic curve.

    The parameter is supported with all next-generation models and with most previous-generation models. See [Background audio suppression](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-suppression) and [Language model support](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-support).

  • low_latency (Boolean) (defaults to: nil)

    If `true` for next-generation `Multimedia` and `Telephony` models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The `low_latency` parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

    The parameter is not available for previous-generation `Broadband` and `Narrowband` models. It is available only for some next-generation models. For a list of next-generation models that support low latency, see [Supported next-generation language models](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported).

    • For more information about the `low_latency` parameter, see [Low latency](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1314

def create_job(audio:, content_type: nil, model: nil, callback_url: nil, events: nil, user_token: nil, results_ttl: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
  raise ArgumentError.new("audio must be provided") if audio.nil?

  headers = {
    "Content-Type" => content_type
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "create_job")
  headers.merge!(sdk_headers)
  keywords = keywords.join(",") unless keywords.nil? # the service expects keywords as a comma-separated string

  params = {
    "model" => model,
    "callback_url" => callback_url,
    "events" => events,
    "user_token" => user_token,
    "results_ttl" => results_ttl,
    "language_customization_id" => language_customization_id,
    "acoustic_customization_id" => acoustic_customization_id,
    "base_model_version" => base_model_version,
    "customization_weight" => customization_weight,
    "inactivity_timeout" => inactivity_timeout,
    "keywords" => keywords,
    "keywords_threshold" => keywords_threshold,
    "max_alternatives" => max_alternatives,
    "word_alternatives_threshold" => word_alternatives_threshold,
    "word_confidence" => word_confidence,
    "timestamps" => timestamps,
    "profanity_filter" => profanity_filter,
    "smart_formatting" => smart_formatting,
    "speaker_labels" => speaker_labels,
    "customization_id" => customization_id,
    "grammar_name" => grammar_name,
    "redaction" => redaction,
    "processing_metrics" => processing_metrics,
    "processing_metrics_interval" => processing_metrics_interval,
    "audio_metrics" => audio_metrics,
    "end_of_phrase_silence_time" => end_of_phrase_silence_time,
    "split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
    "speech_detector_sensitivity" => speech_detector_sensitivity,
    "background_audio_suppression" => background_audio_suppression,
    "low_latency" => low_latency
  }

  data = audio

  method_url = "/v1/recognitions"

  response = request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    data: data,
    accept_json: true
  )
  response
end
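
A minimal usage sketch for this method, assuming a client authenticated with an IAM API key; the API key, audio file, and callback URL are placeholders, and the callback URL must already have been allowlisted with the service's Register a callback method.

require "ibm_watson/authenticators"
require "ibm_watson/speech_to_text_v1"

authenticator = IBMWatson::Authenticators::IamAuthenticator.new(apikey: "{apikey}")
speech_to_text = IBMWatson::SpeechToTextV1.new(authenticator: authenticator)

File.open("audio-file.flac") do |audio_file|
  response = speech_to_text.create_job(
    audio: audio_file,
    content_type: "audio/flac",
    model: "en-US_Telephony",
    callback_url: "https://example.com/job_results", # placeholder; must be pre-registered
    events: "recognitions.completed_with_results",
    results_ttl: 60 # keep the results for one hour
  )
  # The response body contains the job ID and its initial status.
  puts response.result["id"]
end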

#create_language_model(name: , base_model_name: , dialect: nil, description: nil) ⇒ IBMCloudSdkCore::DetailedResponse

Create a custom language model. Creates a new custom language model for a specified base model. The custom language model can be used only with the base model for which it is created. The model is owned by the instance of the service whose credentials are used to create it.

You can create a maximum of 1024 custom language models per owning credentials.
The service returns an error if you attempt to create more than 1024 models. You
do not lose any models, but you cannot create any more until your model count is
below the limit.

**Important:** Effective 15 March 2022, previous-generation models for all
languages other than Arabic and Japanese are deprecated. The deprecated models
remain available until 15 September 2022, when they will be removed from the
service and the documentation. You must migrate to the equivalent next-generation
model by the end of service date. For more information, see [Migrating to
next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).

**See also:**
* [Create a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#createModel-language)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • name (String) (defaults to: )

    A user-defined name for the new custom language model. Use a name that is unique among all custom language models that you own. Use a localized name that matches the language of the custom model. Use a name that describes the domain of the custom model, such as `Medical custom model` or `Legal custom model`.

  • base_model_name (String) (defaults to: )

    The name of the base language model that is to be customized by the new custom language model. The new custom model can be used only with the base model that it customizes.

    To determine whether a base model supports language model customization, use the [Get a model](#getmodel) method and check that the attribute `custom_language_model` is set to `true`. You can also refer to [Language support for customization](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

  • dialect (String) (defaults to: nil)

    The dialect of the specified language that is to be used with the custom language model. _For all languages, it is always safe to omit this field._ The service automatically uses the language identifier from the name of the base model. For example, the service automatically uses `en-US` for all US English models.

    If you specify the `dialect` for a new custom model, follow these guidelines. _For non-Spanish previous-generation models and for next-generation models,_ you must specify a value that matches the five-character language identifier from the name of the base model. _For Spanish previous-generation models,_ you must specify one of the following values:

    • `es-ES` for Castilian Spanish (`es-ES` models)

    • `es-LA` for Latin American Spanish (`es-AR`, `es-CL`, `es-CO`, and `es-PE` models)

    • `es-US` for Mexican (North American) Spanish (`es-MX` models)

    All values that you pass for the `dialect` field are case-insensitive.

  • description (String) (defaults to: nil)

    A description of the new custom language model. Use a localized description that matches the language of the custom model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1540

def create_language_model(name:, base_model_name:, dialect: nil, description: nil)
  raise ArgumentError.new("name must be provided") if name.nil?

  raise ArgumentError.new("base_model_name must be provided") if base_model_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "create_language_model")
  headers.merge!(sdk_headers)

  data = {
    "name" => name,
    "base_model_name" => base_model_name,
    "dialect" => dialect,
    "description" => description
  }

  method_url = "/v1/customizations"

  response = request(
    method: "POST",
    url: method_url,
    headers: headers,
    json: data,
    accept_json: true
  )
  response
end
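
A sketch of creating a model and capturing its customization ID for later requests; the `speech_to_text` client is assumed to be configured as above, and the name and description are placeholders.

response = speech_to_text.create_language_model(
  name: "Medical custom model",
  base_model_name: "en-US_BroadbandModel",
  description: "Custom language model for medical dictation"
)
# The customization ID identifies the new model in subsequent requests.
customization_id = response.result["customization_id"]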

#delete_acoustic_model(customization_id: ) ⇒ nil

Delete a custom acoustic model. Deletes an existing custom acoustic model. The custom model cannot be deleted if another request, such as adding an audio resource to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Deleting a custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#deleteModel-acoustic).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2900

def delete_acoustic_model(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "delete_acoustic_model")
  headers.merge!(sdk_headers)

  method_url = "/v1/acoustic_customizations/%s" % [ERB::Util.url_encode(customization_id)]

  request(
    method: "DELETE",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end
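
A sketch of deleting a model defensively, assuming a configured `speech_to_text` client and an existing `customization_id`; `IBMCloudSdkCore::ApiException` is the error the underlying SDK core raises for non-2xx responses.

begin
  speech_to_text.delete_acoustic_model(customization_id: customization_id)
rescue IBMCloudSdkCore::ApiException => e
  # For example, a 409 if another request against the model is still being processed.
  puts "Delete failed: #{e}"
end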

#delete_audio(customization_id: , audio_name: ) ⇒ nil

Delete an audio resource. Deletes an existing audio resource from a custom acoustic model. Deleting an archive-type audio resource removes the entire archive of files. The service does not allow deletion of individual files from an archive resource.

Removing an audio resource does not affect the custom model until you train the
model on its updated data by using the [Train a custom acoustic
model](#trainacousticmodel) method. You can delete an existing audio resource from
a model while a different resource is being added to the model. You must use
credentials for the instance of the service that owns a model to delete its audio
resources.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Deleting an audio resource from a custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAudio#deleteAudio).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • audio_name (String) (defaults to: )

    The name of the audio resource for the custom acoustic model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 3440

def delete_audio(customization_id:, audio_name:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("audio_name must be provided") if audio_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "delete_audio")
  headers.merge!(sdk_headers)

  method_url = "/v1/acoustic_customizations/%s/audio/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(audio_name)]

  request(
    method: "DELETE",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end

#delete_corpus(customization_id: , corpus_name: ) ⇒ nil

Delete a corpus. Deletes an existing corpus from a custom language model. Removing a corpus does not affect the custom model until you train the model with the [Train a custom language model](#trainlanguagemodel) method. You must use credentials for the instance of the service that owns a model to delete its corpora.

_For custom models that are based on previous-generation models_, the service
removes any out-of-vocabulary (OOV) words that are associated with the corpus from
the custom model's words resource unless they were also added by another corpus or
grammar, or they were modified in some way with the [Add custom words](#addwords)
or [Add a custom word](#addword) method.

**See also:** [Deleting a corpus from a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageCorpora#deleteCorpus).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • corpus_name (String) (defaults to: )

    The name of the corpus for the custom language model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2101

def delete_corpus(customization_id:, corpus_name:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("corpus_name must be provided") if corpus_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "delete_corpus")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/corpora/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(corpus_name)]

  request(
    method: "DELETE",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end

#delete_grammar(customization_id: , grammar_name: ) ⇒ nil

Delete a grammar. Deletes an existing grammar from a custom language model. _For grammars that are based on previous-generation models,_ the service removes any out-of-vocabulary
(OOV) words associated with the grammar from the custom model's words resource
unless they were also added by another resource or they were modified in some way
with the [Add custom words](#addwords) or [Add a custom word](#addword) method.
Removing a grammar does not affect the custom model until you train the model with
the [Train a custom language model](#trainlanguagemodel) method. You must use
credentials for the instance of the service that owns a model to delete its
grammar.

**See also:**
* [Deleting a grammar from a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageGrammars#deleteGrammar)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • grammar_name (String) (defaults to: )

    The name of the grammar for the custom language model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2707

def delete_grammar(customization_id:, grammar_name:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("grammar_name must be provided") if grammar_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "delete_grammar")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/grammars/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(grammar_name)]

  request(
    method: "DELETE",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end

#delete_job(id: ) ⇒ nil

Delete a job. Deletes the specified job. You cannot delete a job that the service is actively processing. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must use credentials for the instance of the service that owns a job to delete it.

**See also:** [Deleting a
job](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#delete-async).

Parameters:

  • id (String) (defaults to: )

    The identifier of the asynchronous job that is to be used for the request. You must make the request with credentials for the instance of the service that owns the job.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1459

def delete_job(id:)
  raise ArgumentError.new("id must be provided") if id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "delete_job")
  headers.merge!(sdk_headers)

  method_url = "/v1/recognitions/%s" % [ERB::Util.url_encode(id)]

  request(
    method: "DELETE",
    url: method_url,
    headers: headers,
    accept_json: false
  )
  nil
end
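
A sketch of the typical lifecycle for a job without a callback URL: poll with the [Check a job](#checkjob) method, read the results, then delete the job. It assumes a configured `speech_to_text` client and a `job_id` captured from create_job.

job = nil
loop do
  job = speech_to_text.check_job(id: job_id).result
  break if %w[completed failed].include?(job["status"])
  sleep 5 # poll interval is arbitrary
end
results = job["results"] if job["status"] == "completed"
speech_to_text.delete_job(id: job_id)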

#delete_language_model(customization_id: ) ⇒ nil

Delete a custom language model. Deletes an existing custom language model. The custom model cannot be deleted if another request, such as adding a corpus or grammar to the model, is currently being processed. You must use credentials for the instance of the service that owns a model to delete it.

**See also:**
* [Deleting a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#deleteModel-language)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1667

def delete_language_model(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "delete_language_model")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s" % [ERB::Util.url_encode(customization_id)]

  request(
    method: "DELETE",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end

#delete_user_data(customer_id: ) ⇒ nil

Delete labeled data. Deletes all data that is associated with a specified customer ID. The method deletes all data for the customer ID, regardless of the method by which the information was added. The method has no effect if no data is associated with the customer ID. You must issue the request with credentials for the same instance of the service that was used to associate the customer ID with the data. You associate a customer ID with data by passing the `X-Watson-Metadata` header with a request that passes the data.

**Note:** If you delete an instance of the service from the service console, all
data associated with that service instance is automatically deleted. This includes
all custom language models, corpora, grammars, and words; all custom acoustic
models and audio resources; all registered endpoints for the asynchronous HTTP
interface; and all data related to speech recognition requests.

**See also:** [Information
security](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-information-security#information-security).

Parameters:

  • customer_id (String) (defaults to: )

    The customer ID for which all data is to be deleted.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 3485

def delete_user_data(customer_id:)
  raise ArgumentError.new("customer_id must be provided") if customer_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "delete_user_data")
  headers.merge!(sdk_headers)

  params = {
    "customer_id" => customer_id
  }

  method_url = "/v1/user_data"

  request(
    method: "DELETE",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: false
  )
  nil
end
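
A sketch of the association-then-deletion flow; the customer ID is a placeholder, and `add_default_headers` is assumed to be the SDK core helper for attaching headers to subsequent requests.

# Associate the data of subsequent requests with a customer ID.
speech_to_text.add_default_headers(headers: { "X-Watson-Metadata" => "customer_id=my_customer_ID" })
# ... recognition or customization requests ...

# Later, delete all data associated with that customer ID.
speech_to_text.delete_user_data(customer_id: "my_customer_ID")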

#delete_word(customization_id: , word_name: ) ⇒ nil

Delete a custom word. Deletes a custom word from a custom language model. You can remove any word that you added to the custom model's words resource via any means. However, if the word also exists in the service's base vocabulary, the service removes the word only from the words resource; the word remains in the base vocabulary. Removing a custom word does not affect the custom model until you train the model with the [Train a custom language model](#trainlanguagemodel) method. You must use credentials for the instance of the service that owns a model to delete its words.

**See also:** [Deleting a word from a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageWords#deleteWord).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • word_name (String) (defaults to: )

    The custom word that is to be deleted from the custom language model. URL-encode the word if it includes non-ASCII characters. For more information, see [Character encoding](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#charEncoding).

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2464

def delete_word(customization_id:, word_name:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("word_name must be provided") if word_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "delete_word")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/words/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(word_name)]

  request(
    method: "DELETE",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end
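
A sketch of deleting a word that contains non-ASCII characters. Because the method URL-encodes path segments with `ERB::Util.url_encode` (see the source above), the raw word can be passed directly; the word shown is a placeholder.

speech_to_text.delete_word(
  customization_id: customization_id,
  word_name: "Société" # placeholder custom word with non-ASCII characters
)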

#get_acoustic_model(customization_id: ) ⇒ IBMCloudSdkCore::DetailedResponse

Get a custom acoustic model. Gets information about a specified custom acoustic model. You must use credentials for the instance of the service that owns a model to list information about it.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Listing custom acoustic
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#listModels-acoustic).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2864

def get_acoustic_model(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "get_acoustic_model")
  headers.merge!(sdk_headers)

  method_url = "/v1/acoustic_customizations/%s" % [ERB::Util.url_encode(customization_id)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end

#get_audio(customization_id: , audio_name: ) ⇒ IBMCloudSdkCore::DetailedResponse

Get an audio resource. Gets information about an audio resource from a custom acoustic model. The method returns an `AudioListing` object whose fields depend on the type of audio resource that you specify with the method's `audio_name` parameter:
* _For an audio-type resource_, the object's fields match those of an
`AudioResource` object: `duration`, `name`, `details`, and `status`.
* _For an archive-type resource_, the object includes a `container` field whose
fields match those of an `AudioResource` object. It also includes an `audio`
field, which contains an array of `AudioResource` objects that provides
information about the audio files that are contained in the archive.

The information includes the status of the specified audio resource. The status is
important for checking the service's analysis of a resource that you add to the
custom model.
* _For an audio-type resource_, the `status` field is located in the
`AudioListing` object.
* _For an archive-type resource_, the `status` field is located in the
`AudioResource` object that is returned in the `container` field.

You must use credentials for the instance of the service that owns a model to list
its audio resources.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Listing audio resources for a custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAudio#listAudio).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • audio_name (String) (defaults to: )

    The name of the audio resource for the custom acoustic model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 3395

def get_audio(customization_id:, audio_name:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("audio_name must be provided") if audio_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "get_audio")
  headers.merge!(sdk_headers)

  method_url = "/v1/acoustic_customizations/%s/audio/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(audio_name)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end
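
A sketch of reading the status of an audio resource of either type, following the field layout described above; it assumes a configured client and existing IDs, and the resource name is a placeholder.

listing = speech_to_text.get_audio(
  customization_id: customization_id,
  audio_name: "audio1"
).result

# Archive-type resources report their status in the "container" field;
# audio-type resources report it at the top level of the AudioListing.
status = listing.key?("container") ? listing["container"]["status"] : listing["status"]
puts status # e.g. "ok", "being_processed", or "invalid"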

#get_corpus(customization_id: , corpus_name: ) ⇒ IBMCloudSdkCore::DetailedResponse

Get a corpus. Gets information about a corpus from a custom language model. The information includes the name, status, and total number of words for the corpus. _For custom models that are based on previous-generation models_, it also includes the number of out-of-vocabulary (OOV) words from the corpus. You must use credentials for the instance of the service that owns a model to list its corpora.

**See also:** [Listing corpora for a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageCorpora#listCorpora).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • corpus_name (String) (defaults to: )

    The name of the corpus for the custom language model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2057

def get_corpus(customization_id:, corpus_name:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("corpus_name must be provided") if corpus_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "get_corpus")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/corpora/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(corpus_name)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end
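
A sketch of waiting for a newly added corpus to finish analysis before training the model; it assumes a configured client, the corpus name is a placeholder, and "analyzed" is the success status the service reports for a corpus.

loop do
  corpus = speech_to_text.get_corpus(
    customization_id: customization_id,
    corpus_name: "corpus1"
  ).result
  break if corpus["status"] == "analyzed"
  sleep 10 # poll interval is arbitrary
end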

#get_grammar(customization_id: , grammar_name: ) ⇒ IBMCloudSdkCore::DetailedResponse

Get a grammar. Gets information about a grammar from a custom language model. For each grammar, the information includes the name, status, and (for grammars that are based on previous-generation models) the total number of out-of-vocabulary (OOV) words. You must use credentials for the instance of the service that owns a model to list its grammars.

**See also:**
* [Listing grammars from a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageGrammars#listGrammars)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • grammar_name (String) (defaults to: )

    The name of the grammar for the custom language model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2663

def get_grammar(customization_id:, grammar_name:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("grammar_name must be provided") if grammar_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "get_grammar")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/grammars/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(grammar_name)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end

#get_language_model(customization_id: ) ⇒ IBMCloudSdkCore::DetailedResponse

Get a custom language model. Gets information about a specified custom language model. You must use credentials for the instance of the service that owns a model to list information about it.

**See also:**
* [Listing custom language
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#listModels-language)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1631

def get_language_model(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "get_language_model")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s" % [ERB::Util.url_encode(customization_id)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end
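
A sketch of checking whether a custom model is ready to be trained or used; it assumes a configured client and an existing `customization_id`.

model = speech_to_text.get_language_model(customization_id: customization_id).result
# Typical status values include "pending", "ready", "training", "available", and "failed".
puts model["status"]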

#get_model(model_id: ) ⇒ IBMCloudSdkCore::DetailedResponse

Get a model. Gets information for a single specified language model that is available for use with the service. The information includes the name of the model and its minimum sampling rate in Hertz, among other things.

**See also:** [Listing a specific
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list#models-list-specific).

Parameters:

  • model_id (String) (defaults to: )

    The identifier of the model in the form of its name from the output of the [List models](#listmodels) method. (Note: The model `ar-AR_BroadbandModel` is deprecated; use `ar-MS_BroadbandModel` instead.)

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 138

def get_model(model_id:)
  raise ArgumentError.new("model_id must be provided") if model_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "get_model")
  headers.merge!(sdk_headers)

  method_url = "/v1/models/%s" % [ERB::Util.url_encode(model_id)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end
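
A sketch of reading a few of the returned fields, assuming a configured `speech_to_text` client.

model = speech_to_text.get_model(model_id: "en-US_BroadbandModel").result
puts model["name"]
puts model["rate"] # minimum sampling rate in Hertz
# Whether the base model supports language model customization (see create_language_model).
puts model["supported_features"]["custom_language_model"]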

#get_word(customization_id: , word_name: ) ⇒ IBMCloudSdkCore::DetailedResponse

Get a custom word. Gets information about a custom word from a custom language model. You must use credentials for the instance of the service that owns a model to list information about its words.

**See also:** [Listing words from a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageWords#listWords).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • word_name (String) (defaults to: )

    The custom word that is to be read from the custom language model. URL-encode the word if it includes non-ASCII characters. For more information, see [Character encoding](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-corporaWords#charEncoding).

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2422

def get_word(customization_id:, word_name:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  raise ArgumentError.new("word_name must be provided") if word_name.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "get_word")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/words/%s" % [ERB::Util.url_encode(customization_id), ERB::Util.url_encode(word_name)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end

#list_acoustic_models(language: nil) ⇒ IBMCloudSdkCore::DetailedResponse

List custom acoustic models. Lists information about all custom acoustic models that are owned by an instance of the service. Use the `language` parameter to see all custom acoustic models for the specified language. Omit the parameter to see all custom acoustic models for all languages. You must use credentials for the instance of the service that owns a model to list information about it.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Listing custom acoustic
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#listModels-acoustic).

Parameters:

  • language (String) (defaults to: nil)

    The identifier of the language for which custom language or custom acoustic models are to be returned. Specify the five-character language identifier; for example, specify `en-US` to see all custom language or custom acoustic models that are based on US English models. Omit the parameter to see all custom language or custom acoustic models that are owned by the requesting credentials. (Note: The identifier `ar-AR` is deprecated; use `ar-MS` instead.)

    To determine the languages for which customization is available, see [Language support for customization](cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.



# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2827

def list_acoustic_models(language: nil)
  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "list_acoustic_models")
  headers.merge!(sdk_headers)

  params = {
    "language" => language
  }

  method_url = "/v1/acoustic_customizations"

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: true
  )
  response
end
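
A sketch of listing the custom acoustic models for one language, assuming a configured `speech_to_text` client; the same pattern applies to list_language_models.

response = speech_to_text.list_acoustic_models(language: "en-US")
response.result["customizations"].each do |model|
  puts "#{model["customization_id"]}: #{model["name"]} (#{model["status"]})"
end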

#list_audio(customization_id: ) ⇒ IBMCloudSdkCore::DetailedResponse

List audio resources. Lists information about all audio resources from a custom acoustic model. The information includes the name of the resource and information about its audio data, such as its duration. It also includes the status of the audio resource, which is important for checking the service's analysis of the resource in response to a request to add it to the custom acoustic model. You must use credentials for the instance of the service that owns a model to list its audio resources.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Listing audio resources for a custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAudio#listAudio).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 3154

def list_audio(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "list_audio")
  headers.merge!(sdk_headers)

  method_url = "/v1/acoustic_customizations/%s/audio" % [ERB::Util.url_encode(customization_id)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end

#list_corpora(customization_id: ) ⇒ IBMCloudSdkCore::DetailedResponse

List corpora. Lists information about all corpora from a custom language model. The information includes the name, status, and total number of words for each corpus. _For custom models that are based on previous-generation models_, it also includes the number of out-of-vocabulary (OOV) words from the corpus. You must use credentials for the instance of the service that owns a model to list its corpora.

**See also:** [Listing corpora for a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageCorpora#listCorpora).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1891

def list_corpora(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "list_corpora")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/corpora" % [ERB::Util.url_encode(customization_id)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end

#list_grammars(customization_id: ) ⇒ IBMCloudSdkCore::DetailedResponse

List grammars. Lists information about all grammars from a custom language model. For each grammar, the information includes the name, status, and (for grammars that are based on previous-generation models) the total number of out-of-vocabulary (OOV) words. You must use credentials for the instance of the service that owns a model to list its grammars.

**See also:**
* [Listing grammars from a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageGrammars#listGrammars)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2506

def list_grammars(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "list_grammars")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/grammars" % [ERB::Util.url_encode(customization_id)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end
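
A similar hedged sketch for grammars (placeholder customization ID):

response = speech_to_text.list_grammars(customization_id: "{customization_id}")
# A status of "analyzed" means the grammar has been processed and can be trained.
response.result["grammars"].each do |grammar|
  puts "#{grammar["name"]}: #{grammar["status"]}"
end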

#list_language_models(language: nil) ⇒ IBMCloudSdkCore::DetailedResponse

List custom language models. Lists information about all custom language models
that are owned by an instance of the service. Use the `language` parameter to see
all custom language models for the specified language. Omit the parameter to see
all custom language models for all languages. You must use credentials for the
instance of the service that owns a model to list information about it.

**See also:**
* [Listing custom language
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#listModels-language)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • language (String) (defaults to: nil)

    The identifier of the language for which custom language or custom acoustic models are to be returned. Specify the five-character language identifier; for example, specify `en-US` to see all custom language or custom acoustic models that are based on US English models. Omit the parameter to see all custom language or custom acoustic models that are owned by the requesting credentials. (Note: The identifier `ar-AR` is deprecated; use `ar-MS` instead.)

    To determine the languages for which customization is available, see [Language support for customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.



# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1594

def list_language_models(language: nil)
  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "list_language_models")
  headers.merge!(sdk_headers)

  params = {
    "language" => language
  }

  method_url = "/v1/customizations"

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: true
  )
  response
end
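
A hedged sketch using the optional language filter:

# Restrict the list to models based on US English; omit `language` to list all.
response = speech_to_text.list_language_models(language: "en-US")
response.result["customizations"].each do |custom|
  puts "#{custom["name"]} (#{custom["customization_id"]}): #{custom["status"]}"
end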

#list_modelsIBMCloudSdkCore::DetailedResponse

List models. Lists all language models that are available for use with the
service. The information includes the name of the model and its minimum sampling
rate in Hertz, among other things. The ordering of the list of models can change
from call to call; do not rely on an alphabetized or static list of models.

**See also:** [Listing all
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-list#models-list-all).

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.



# File 'lib/ibm_watson/speech_to_text_v1.rb', line 108

def list_models
  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "list_models")
  headers.merge!(sdk_headers)

  method_url = "/v1/models"

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  response
end
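
A short sketch; the method takes no parameters:

response = speech_to_text.list_models
response.result["models"].each do |model|
  puts "#{model["name"]}: minimum sampling rate #{model["rate"]} Hz"
end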

#list_words(customization_id: , word_type: nil, sort: nil) ⇒ IBMCloudSdkCore::DetailedResponse

List custom words. Lists information about custom words from a custom language
model. You can list all words from the custom model's words resource, only custom
words that were added or modified by the user, or, _for a custom model that is
based on a previous-generation model_, only out-of-vocabulary (OOV) words that
were extracted from corpora or are recognized by grammars. You can also indicate
the order in which the service is to return words; by default, the service lists
words in ascending alphabetical order. You must use credentials for the instance
of the service that owns a model to list information about its words.

**See also:** [Listing words from a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageWords#listWords).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • word_type (String) (defaults to: nil)

    The type of words to be listed from the custom language model’s words resource:

    • `all` (the default) shows all words.

    • `user` shows only custom words that were added or modified by the user directly.

    • `corpora` shows only OOV words that were extracted from corpora.

    • `grammars` shows only OOV words that are recognized by grammars.

    _For a custom model that is based on a next-generation model_, only `all` and `user` apply. Both options return the same results. Words from other sources are not added to custom models that are based on next-generation models.

  • sort (String) (defaults to: nil)

    Indicates the order in which the words are to be listed, `alphabetical` or by `count`. You can prepend an optional `+` or `-` to an argument to indicate whether the results are to be sorted in ascending or descending order. By default, words are sorted in ascending alphabetical order. For alphabetical ordering, the lexicographical precedence is numeric values, uppercase letters, and lowercase letters. For count ordering, values with the same count are ordered alphabetically. With the `curl` command, URL-encode the `+` symbol as `%2B`.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2159

def list_words(customization_id:, word_type: nil, sort: nil)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "list_words")
  headers.merge!(sdk_headers)

  params = {
    "word_type" => word_type,
    "sort" => sort
  }

  method_url = "/v1/customizations/%s/words" % [ERB::Util.url_encode(customization_id)]

  response = request(
    method: "GET",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: true
  )
  response
end
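
A hedged sketch that lists only user-added words, sorted by descending count (the SDK URL-encodes query values, so the `-` prefix is passed as-is):

response = speech_to_text.list_words(
  customization_id: "{customization_id}",
  word_type: "user",
  sort: "-count"
)
puts response.result["words"]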

#recognize(audio: , content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil) ⇒ IBMCloudSdkCore::DetailedResponse

Recognize audio. Sends audio and returns transcription results for a recognition
request. You can pass a maximum of 100 MB and a minimum of 100 bytes of audio with
a request. The service automatically detects the endianness of the incoming audio
and, for audio that includes multiple channels, downmixes the audio to one-channel
mono during transcoding. The method returns only final results; to enable interim
results, use the WebSocket API. (With the `curl` command, use the `--data-binary`
option to upload the file for the request.)

**See also:** [Making a basic HTTP
request](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-http#HTTP-basic).

### Streaming mode

 For requests to transcribe live audio as it becomes available, you must set the
`Transfer-Encoding` header to `chunked` to use streaming mode. In streaming mode,
the service closes the connection (status code 408) if it does not receive at
least 15 seconds of audio (including silence) in any 30-second period. The service
also closes the connection (status code 400) if it detects no speech for
`inactivity_timeout` seconds of streaming audio; use the `inactivity_timeout`
parameter to change the default of 30 seconds.

**See also:**
* [Audio
transmission](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#transmission)
*
[Timeouts](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#timeouts)

### Audio formats (content types)

 The service accepts audio in the following formats (MIME types).
* For formats that are labeled **Required**, you must use the `Content-Type`
header with the request to specify the format of the audio.
* For all other formats, you can omit the `Content-Type` header or specify
`application/octet-stream` with the header to have the service automatically
detect the format of the audio. (With the `curl` command, you can specify either
`"Content-Type:"` or `"Content-Type: application/octet-stream"`.)

Where indicated, the format that you specify must include the sampling rate and
can optionally include the number of channels and the endianness of the audio.
* `audio/alaw` (**Required.** Specify the sampling rate (`rate`) of the audio.)
* `audio/basic` (**Required.** Use only with narrowband models.)
* `audio/flac`
* `audio/g729` (Use only with narrowband models.)
* `audio/l16` (**Required.** Specify the sampling rate (`rate`) and optionally the
number of channels (`channels`) and endianness (`endianness`) of the audio.)
* `audio/mp3`
* `audio/mpeg`
* `audio/mulaw` (**Required.** Specify the sampling rate (`rate`) of the audio.)
* `audio/ogg` (The service automatically detects the codec of the input audio.)
* `audio/ogg;codecs=opus`
* `audio/ogg;codecs=vorbis`
* `audio/wav` (Provide audio with a maximum of nine channels.)
* `audio/webm` (The service automatically detects the codec of the input audio.)
* `audio/webm;codecs=opus`
* `audio/webm;codecs=vorbis`

The sampling rate of the audio must match the sampling rate of the model for the
recognition request: for broadband models, at least 16 kHz; for narrowband models,
at least 8 kHz. If the sampling rate of the audio is higher than the minimum
required rate, the service down-samples the audio to the appropriate rate. If the
sampling rate of the audio is lower than the minimum required rate, the request
fails.

 **See also:** [Supported audio
formats](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats).

### Next-generation models

 The service supports next-generation `Multimedia` (16 kHz) and `Telephony` (8
kHz) models for many languages. Next-generation models have higher throughput than
the service's previous generation of `Broadband` and `Narrowband` models. When you
use next-generation models, the service can return transcriptions more quickly and
also provide noticeably better transcription accuracy.

You specify a next-generation model by using the `model` query parameter, as you
do a previous-generation model. Many next-generation models also support the
`low_latency` parameter, which is not available with previous-generation models.
Next-generation models do not support all of the parameters that are available for
use with previous-generation models.

**Important:** Effective 15 March 2022, previous-generation models for all
languages other than Arabic and Japanese are deprecated. The deprecated models
remain available until 15 September 2022, when they will be removed from the
service and the documentation. You must migrate to the equivalent next-generation
model by the end of service date. For more information, see [Migrating to
next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-migrate).

**See also:**
* [Next-generation languages and
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng)
* [Supported features for next-generation
models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-features)

### Multipart speech recognition

 **Note:** The asynchronous HTTP interface, WebSocket interface, and Watson SDKs
do not support multipart speech recognition.

The HTTP `POST` method of the service also supports multipart speech recognition.
With multipart requests, you pass all audio data as multipart form data. You
specify some parameters as request headers and query parameters, but you pass JSON
metadata as form data to control most aspects of the transcription. You can use
multipart recognition to pass multiple audio files with a single request.

Use the multipart approach with browsers for which JavaScript is disabled or when
the parameters used with the request are greater than the 8 KB limit imposed by
most HTTP servers and proxies. You can encounter this limit, for example, if you
want to spot a very large number of keywords.

**See also:** [Making a multipart HTTP
request](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-http#HTTP-multi).

Parameters:

  • audio (File) (defaults to: )

    The audio to transcribe.

  • content_type (String) (defaults to: nil)

    The format (MIME type) of the audio. For more information about specifying an audio format, see **Audio formats (content types)** in the method description.

  • model (String) (defaults to: nil)

    The identifier of the model that is to be used for the recognition request. (Note: The model `ar-AR_BroadbandModel` is deprecated; use `ar-MS_BroadbandModel` instead.) See [Using a model for speech recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-use).

  • language_customization_id (String) (defaults to: nil)

    The customization ID (GUID) of a custom language model that is to be used with the recognition request. The base model of the specified custom language model must match the model specified with the `model` parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom language model is used. See [Using a custom language model for speech recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse).

    Note: Use this parameter instead of the deprecated `customization_id` parameter.

  • acoustic_customization_id (String) (defaults to: nil)

    The customization ID (GUID) of a custom acoustic model that is to be used with the recognition request. The base model of the specified custom acoustic model must match the model specified with the `model` parameter. You must make the request with credentials for the instance of the service that owns the custom model. By default, no custom acoustic model is used. See [Using a custom acoustic model for speech recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acousticUse).

  • base_model_version (String) (defaults to: nil)

    The version of the specified base model that is to be used with the recognition request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See [Making speech recognition requests with upgraded custom models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade-use#custom-upgrade-use-recognition).

  • customization_weight (Float) (defaults to: nil)

    If you specify the customization ID (GUID) of a custom language model with the recognition request, the customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

    Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

    The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases.

    See [Using customization weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).

  • inactivity_timeout (Fixnum) (defaults to: nil)

    The time in seconds after which, if only silence (no speech) is detected in streaming audio, the connection is closed with a 400 error. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use `-1` for infinity. See [Inactivity timeout](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#timeouts-inactivity).

  • keywords (Array[String]) (defaults to: nil)

    An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

    You can spot a maximum of 1000 keywords with a single request. A single keyword can have a maximum length of 1024 characters, though the maximum effective length for double-byte languages might be shorter. Keywords are case-insensitive.

    See [Keyword spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).

  • keywords_threshold (Float) (defaults to: nil)

    A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. If you specify a threshold, you must also specify one or more keywords. The service performs no keyword spotting if you omit either parameter. See [Keyword spotting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#keyword-spotting).

  • max_alternatives (Fixnum) (defaults to: nil)

    The maximum number of alternative transcripts that the service is to return. By default, the service returns a single transcript. If you specify a value of `0`, the service uses the default value, `1`. See [Maximum alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#max-alternatives).

  • word_alternatives_threshold (Float) (defaults to: nil)

    A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. By default, the service computes no alternative words. See [Word alternatives](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-spotting#word-alternatives).

  • word_confidence (Boolean) (defaults to: nil)

    If `true`, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, the service returns no word confidence scores. See [Word confidence](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-confidence).

  • timestamps (Boolean) (defaults to: nil)

    If `true`, the service returns time alignment for each word. By default, no timestamps are returned. See [Word timestamps](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metadata#word-timestamps).

  • profanity_filter (Boolean) (defaults to: nil)

    If `true`, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to `false` to return results with no censoring.

    Note: The parameter can be used with US English and Japanese transcription only. See [Profanity filtering](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#profanity-filtering).

  • smart_formatting (Boolean) (defaults to: nil)

    If `true`, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, the service performs no smart formatting.

    Note: The parameter can be used with US English, Japanese, and Spanish (all dialects) transcription only.

    See [Smart formatting](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#smart-formatting).

  • speaker_labels (Boolean) (defaults to: nil)

    If `true`, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, the service returns no speaker labels. Setting `speaker_labels` to `true` forces the `timestamps` parameter to be `true`, regardless of whether you specify `false` for the parameter.

    • _For previous-generation models,_ the parameter can be used with Australian English, US English, German, Japanese, Korean, and Spanish (both broadband and narrowband models) and UK English (narrowband model) transcription only.

    • _For next-generation models,_ the parameter can be used with Czech, English (Australian, Indian, UK, and US), German, Japanese, Korean, and Spanish transcription only.

    See [Speaker labels](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-speaker-labels).

  • customization_id (String) (defaults to: nil)

    Deprecated. Use the `language_customization_id` parameter to specify the customization ID (GUID) of a custom language model that is to be used with the recognition request. Do not specify both parameters with a request.

  • grammar_name (String) (defaults to: nil)

    The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the `language_customization_id` parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model’s words resource.

    See [Using a grammar for speech recognition](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUse).

  • redaction (Boolean) (defaults to: nil)

    If `true`, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an `X` character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

    When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the `keywords` and `keywords_threshold` parameters) and returns only a single final transcript (forces the `max_alternatives` parameter to be `1`).

    Note: The parameter can be used with US English, Japanese, and Korean transcription only.

    See [Numeric redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-formatting#numeric-redaction).

  • audio_metrics (Boolean) (defaults to: nil)

    If `true`, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

    See [Audio metrics](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-metrics#audio-metrics).

  • end_of_phrase_silence_time (Float) (defaults to: nil)

    Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

    Specify a value for the pause interval in the range of 0.0 to 120.0.

    • A value greater than 0 specifies the interval that the service is to use for speech recognition.

    • A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.

    The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

    See [End of phrase silence time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#silence-time).

  • split_transcript_at_phrase_end (Boolean) (defaults to: nil)

    If `true`, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript.

    By default, the service splits transcripts based solely on the pause interval. If the parameters are used together on the same request, `end_of_phrase_silence_time` has precedence over `split_transcript_at_phrase_end`.

    See [Split transcript at phrase end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-parsing#split-transcript).

  • speech_detector_sensitivity (Float) (defaults to: nil)

    The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

    Specify a value between 0.0 and 1.0:

    • 0.0 suppresses all audio (no speech is transcribed).

    • 0.5 (the default) provides a reasonable compromise for the level of sensitivity.

    • 1.0 suppresses no audio (speech detection sensitivity is disabled).

    The values increase on a monotonic curve.

    The parameter is supported with all next-generation models and with most previous-generation models. See [Speech detector sensitivity](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-sensitivity) and [Language model support](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-support).

  • background_audio_suppression (Float) (defaults to: nil)

    The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

    Specify a value in the range of 0.0 to 1.0:

    • 0.0 (the default) provides no suppression (background audio suppression is disabled).

    • 0.5 provides a reasonable level of audio suppression for general usage.

    • 1.0 suppresses all audio (no audio is transcribed).

    The values increase on a monotonic curve.

    The parameter is supported with all next-generation models and with most previous-generation models. See [Background audio suppression](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-parameters-suppression) and [Language model support](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-detection#detection-support).

  • low_latency (Boolean) (defaults to: nil)

    If `true` for next-generation `Multimedia` and `Telephony` models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The `low_latency` parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

    The parameter is not available for previous-generation `Broadband` and `Narrowband` models. It is available only for some next-generation models. For a list of next-generation models that support low latency, see [Supported next-generation language models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported).

    For more information about the `low_latency` parameter, see [Low latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 514

def recognize(audio:, content_type: nil, model: nil, language_customization_id: nil, acoustic_customization_id: nil, base_model_version: nil, customization_weight: nil, inactivity_timeout: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, customization_id: nil, grammar_name: nil, redaction: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil)
  raise ArgumentError.new("audio must be provided") if audio.nil?

  headers = {
    "Content-Type" => content_type
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "recognize")
  headers.merge!(sdk_headers)
  keywords *= "," unless keywords.nil?

  params = {
    "model" => model,
    "language_customization_id" => language_customization_id,
    "acoustic_customization_id" => acoustic_customization_id,
    "base_model_version" => base_model_version,
    "customization_weight" => customization_weight,
    "inactivity_timeout" => inactivity_timeout,
    "keywords" => keywords,
    "keywords_threshold" => keywords_threshold,
    "max_alternatives" => max_alternatives,
    "word_alternatives_threshold" => word_alternatives_threshold,
    "word_confidence" => word_confidence,
    "timestamps" => timestamps,
    "profanity_filter" => profanity_filter,
    "smart_formatting" => smart_formatting,
    "speaker_labels" => speaker_labels,
    "customization_id" => customization_id,
    "grammar_name" => grammar_name,
    "redaction" => redaction,
    "audio_metrics" => audio_metrics,
    "end_of_phrase_silence_time" => end_of_phrase_silence_time,
    "split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
    "speech_detector_sensitivity" => speech_detector_sensitivity,
    "background_audio_suppression" => background_audio_suppression,
    "low_latency" => low_latency
  }

  data = audio

  method_url = "/v1/recognize"

  response = request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    data: data,
    accept_json: true
  )
  response
end
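
A hedged sketch of a basic recognition request (the file path and model are placeholders):

File.open("path/to/audio.flac") do |audio_file|
  response = speech_to_text.recognize(
    audio: audio_file,
    content_type: "audio/flac",
    model: "en-US_Multimedia",
    timestamps: true,
    max_alternatives: 2
  )
  # Print the best transcript of each final result.
  response.result["results"].each do |result|
    puts result["alternatives"].first["transcript"]
  end
end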

#recognize_using_websocket(content_type: nil, recognize_callback: , audio: nil, chunk_data: false, model: nil, language_customization_id: nil, customization_id: nil, acoustic_customization_id: nil, customization_weight: nil, base_model_version: nil, inactivity_timeout: nil, interim_results: nil, keywords: nil, keywords_threshold: nil, max_alternatives: nil, word_alternatives_threshold: nil, word_confidence: nil, timestamps: nil, profanity_filter: nil, smart_formatting: nil, speaker_labels: nil, grammar_name: nil, redaction: nil, processing_metrics: nil, processing_metrics_interval: nil, audio_metrics: nil, end_of_phrase_silence_time: nil, split_transcript_at_phrase_end: nil, speech_detector_sensitivity: nil, background_audio_suppression: nil, low_latency: nil) ⇒ WebSocketClient, IBMCloudSdkCore::DetailedResponse

Sends audio for speech recognition using web sockets.

Parameters:

  • content_type (String) (defaults to: nil)

    The type of the input: audio/basic, audio/flac, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis, or multipart/form-data.

  • recognize_callback (RecognizeCallback) (defaults to: )

    The instance handling events returned from the service.

  • audio (IO) (defaults to: nil)

    Audio to transcribe in the format specified by the `Content-Type` header.

  • chunk_data (Boolean) (defaults to: false)

    If `true`, then the WebSocketClient will expect to receive data in chunks rather than as a single audio file.

  • model (String) (defaults to: nil)

    The identifier of the model to be used for the recognition request.

  • customization_id (String) (defaults to: nil)

    The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the `model` parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.

  • acoustic_customization_id (String) (defaults to: nil)

    The GUID of a custom acoustic model that is to be used with the request. The base model of the specified custom acoustic model must match the model specified with the `model` parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom acoustic model is used.

  • language_customization_id (String) (defaults to: nil)

    The GUID of a custom language model that is to be used with the request. The base model of the specified custom language model must match the model specified with the `model` parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. By default, no custom language model is used.

  • base_model_version (String) (defaults to: nil)

    The version of the specified base `model` that is to be used for speech recognition. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. For more information, see [Base model version](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#version).

  • inactivity_timeout (Integer) (defaults to: nil)

    The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use `-1` for infinity.

  • interim_results (Boolean) (defaults to: nil)

    Send back non-final previews of each “sentence” as it is being processed. These results are ignored in text mode.

  • keywords (Array<String>) (defaults to: nil)

    Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. If you specify any keywords, you must also specify a keywords threshold. Omit the parameter or specify an empty array if you do not need to spot keywords.

  • keywords_threshold (Float) (defaults to: nil)

    Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords.

  • max_alternatives (Integer) (defaults to: nil)

    Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

  • word_alternatives_threshold (Float) (defaults to: nil)

    Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter.

  • word_confidence (Boolean) (defaults to: nil)

    If `true`, confidence measure per word is returned.

  • timestamps (Boolean) (defaults to: nil)

    If `true`, time alignment for each word is returned.

  • profanity_filter (Boolean) (defaults to: nil)

    If `true` (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to `false` to return results with no censoring. Applies to US English transcription only.

  • smart_formatting (Boolean) (defaults to: nil)

    If `true`, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If `false` (the default), no formatting is performed. Applies to US English transcription only.

  • speaker_labels (Boolean) (defaults to: nil)

    Indicates whether labels that identify which words were spoken by which participants in a multi-person exchange are to be included in the response. The default is `false`; no speaker labels are returned. Setting `speaker_labels` to `true` forces the `timestamps` parameter to be `true`, regardless of whether you specify `false` for the parameter. To determine whether a language model supports speaker labels, use the `GET /v1/models` method and check that the attribute `speaker_labels` is set to `true`. You can also refer to [Speaker labels](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-output#speaker_labels).

  • grammar_name (String) (defaults to: nil)

    The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the `language_customization_id` parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model’s words resource. See [Grammars](https://cloud.ibm.com/docs/speech-to-text/output.html).

  • redaction (Boolean) (defaults to: nil)

    If `true`, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an `X` character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, the service performs no redaction.

    When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the `keywords` and `keywords_threshold` parameters) and returns only a single final transcript (forces the `max_alternatives` parameter to be `1`).

    Note: Applies to US English, Japanese, and Korean transcription only.

    See [Numeric redaction](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-output#redaction).

  • processing_metrics (Boolean) (defaults to: nil)

    If `true`, requests processing metrics about the service’s transcription of the input audio. The service returns processing metrics at the interval specified by the `processing_metrics_interval` parameter. It also returns processing metrics for transcription events, for example, for final and interim results. By default, the service returns no processing metrics.

  • processing_metrics_interval (Float) (defaults to: nil)

    Specifies the interval in real wall-clock seconds at which the service is to return processing metrics. The parameter is ignored unless the `processing_metrics` parameter is set to `true`. The parameter accepts a minimum value of 0.1 seconds. The level of precision is not restricted, so you can specify values such as 0.25 and 0.125.

    The service does not impose a maximum value. If you want to receive processing metrics only for transcription events instead of at periodic intervals, set the value to a large number. If the value is larger than the duration of the audio, the service returns processing metrics only for transcription events.

  • audio_metrics (Boolean) (defaults to: nil)

    If `true`, requests detailed information about the signal characteristics of the input audio. The service returns audio metrics with the final transcription results. By default, the service returns no audio metrics.

  • end_of_phrase_silence_time (Float) (defaults to: nil)

    Specifies the duration of the pause interval at which the service splits a transcript into multiple final results. If the service detects pauses or extended silence before it reaches the end of the audio stream, its response can include multiple final results. Silence indicates a point at which the speaker pauses between spoken words or phrases.

    Specify a value for the pause interval in the range of 0.0 to 120.0.

    • A value greater than 0 specifies the interval that the service is to use for speech recognition.

    • A value of 0 indicates that the service is to use the default interval. It is equivalent to omitting the parameter.

    The default pause interval for most languages is 0.8 seconds; the default for Chinese is 0.6 seconds.

    See [End of phrase silence time](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-output#silence_time).

  • split_transcript_at_phrase_end (Boolean) (defaults to: nil)

    If `true`, directs the service to split the transcript into multiple final results based on semantic features of the input, for example, at the conclusion of meaningful phrases such as sentences. The service bases its understanding of semantic features on the base language model that you use with a request. Custom language models and grammars can also influence how and where the service splits a transcript. By default, the service splits transcripts based solely on the pause interval.

    See [Split transcript at phrase end](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-output#split_transcript).

  • speech_detector_sensitivity (Float) (defaults to: nil)

    The sensitivity of speech activity detection that the service is to perform. Use the parameter to suppress word insertions from music, coughing, and other non-speech events. The service biases the audio it passes for speech recognition by evaluating the input audio against prior models of speech and non-speech activity.

    Specify a value between 0.0 and 1.0:

    • 0.0 suppresses all audio (no speech is transcribed).

    • 0.5 (the default) provides a reasonable compromise for the level of sensitivity.

    • 1.0 suppresses no audio (speech detection sensitivity is disabled).

    The values increase on a monotonic curve. See [Speech Activity Detection](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#detection).

  • background_audio_suppression (Float) (defaults to: nil)

    The level to which the service is to suppress background audio based on its volume to prevent it from being transcribed as speech. Use the parameter to suppress side conversations or background noise.

    Specify a value in the range of 0.0 to 1.0:

    • 0.0 (the default) provides no suppression (background audio suppression is disabled).

    • 0.5 provides a reasonable level of audio suppression for general usage.

    • 1.0 suppresses all audio (no audio is transcribed).

    The values increase on a monotonic curve. See [Speech Activity Detection](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-input#detection).

  • low_latency (Boolean) (defaults to: nil)

    If `true` for next-generation `Multimedia` and `Telephony` models that support low latency, directs the service to produce results even more quickly than it usually does. Next-generation models produce transcription results faster than previous-generation models. The `low_latency` parameter causes the models to produce results even more quickly, though the results might be less accurate when the parameter is used.

    Note: The parameter is beta functionality. It is not available for previous-generation `Broadband` and `Narrowband` models. It is available only for some next-generation models.

    • For a list of next-generation models that support low latency, see [Supported language models](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-models-ng#models-ng-supported) for next-generation models.

    • For more information about the `low_latency` parameter, see [Low latency](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-interim#low-latency).

Returns:

  • (WebSocketClient)

    A `WebSocketClient` object for the streaming recognition request.

# File 'lib/ibm_watson/speech_to_text_v1.rb', line 702

def recognize_using_websocket(
  content_type: nil,
  recognize_callback:,
  audio: nil,
  chunk_data: false,
  model: nil,
  language_customization_id: nil,
  customization_id: nil,
  acoustic_customization_id: nil,
  customization_weight: nil,
  base_model_version: nil,
  inactivity_timeout: nil,
  interim_results: nil,
  keywords: nil,
  keywords_threshold: nil,
  max_alternatives: nil,
  word_alternatives_threshold: nil,
  word_confidence: nil,
  timestamps: nil,
  profanity_filter: nil,
  smart_formatting: nil,
  speaker_labels: nil,
  grammar_name: nil,
  redaction: nil,
  processing_metrics: nil,
  processing_metrics_interval: nil,
  audio_metrics: nil,
  end_of_phrase_silence_time: nil,
  split_transcript_at_phrase_end: nil,
  speech_detector_sensitivity: nil,
  background_audio_suppression: nil,
  low_latency: nil
)
  raise ArgumentError("Audio must be provided") if audio.nil? && !chunk_data
  raise ArgumentError("Recognize callback must be provided") if recognize_callback.nil?
  raise TypeError("Callback is not a derived class of RecognizeCallback") unless recognize_callback.is_a?(IBMWatson::RecognizeCallback)

  require_relative("./websocket/speech_to_text_websocket_listener.rb")
  headers = {}
  headers = conn.default_options.headers.to_hash unless conn.default_options.headers.to_hash.empty?
  @authenticator.authenticate(headers)
  service_url = @service_url.gsub("https:", "wss:")
  params = {
    "model" => model,
    "customization_id" => customization_id,
    "language_customization_id" => language_customization_id,
    "acoustic_customization_id" => acoustic_customization_id,
    "customization_weight" => customization_weight,
    "base_model_version" => base_model_version
  }
  params.delete_if { |_, v| v.nil? }
  service_url += "/v1/recognize?" + HTTP::URI.form_encode(params)
  options = {
    "content_type" => content_type,
    "inactivity_timeout" => inactivity_timeout,
    "interim_results" => interim_results,
    "keywords" => keywords,
    "keywords_threshold" => keywords_threshold,
    "max_alternatives" => max_alternatives,
    "word_alternatives_threshold" => word_alternatives_threshold,
    "word_confidence" => word_confidence,
    "timestamps" => timestamps,
    "profanity_filter" => profanity_filter,
    "smart_formatting" => smart_formatting,
    "speaker_labels" => speaker_labels,
    "grammar_name" => grammar_name,
    "redaction" => redaction,
    "processing_metrics" => processing_metrics,
    "processing_metrics_interval" => processing_metrics_interval,
    "audio_metrics" => audio_metrics,
    "end_of_phrase_silence_time" => end_of_phrase_silence_time,
    "split_transcript_at_phrase_end" => split_transcript_at_phrase_end,
    "speech_detector_sensitivity" => speech_detector_sensitivity,
    "background_audio_suppression" => background_audio_suppression,
    "low_latency" => low_latency
  }
  options.delete_if { |_, v| v.nil? }
  WebSocketClient.new(audio: audio, chunk_data: chunk_data, options: options, recognize_callback: recognize_callback, service_url: service_url, headers: headers, disable_ssl_verification: @disable_ssl_verification)
end
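
A hedged sketch of the WebSocket flow. The callback subclass overrides only the events it needs; the hook names and keyword signatures shown are assumptions based on `IBMWatson::RecognizeCallback`, and the file path is a placeholder:

class MyRecognizeCallback < IBMWatson::RecognizeCallback
  # Called with each final transcript (assumed keyword signature).
  def on_transcription(transcript:)
    puts transcript
  end

  # Called when the service or the connection reports an error.
  def on_error(error:)
    warn "Error: #{error}"
  end
end

File.open("path/to/audio.webm") do |audio_file|
  ws_client = speech_to_text.recognize_using_websocket(
    audio: audio_file,
    recognize_callback: MyRecognizeCallback.new,
    content_type: "audio/webm",
    interim_results: true
  )
  # Open the connection and begin streaming the audio.
  ws_client.start
end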

#register_callback(callback_url: , user_secret: nil) ⇒ IBMCloudSdkCore::DetailedResponse

Register a callback. Registers a callback URL with the service for use with
subsequent asynchronous recognition requests. The service attempts to register, or
allowlist, the callback URL if it is not already registered by sending a `GET`
request to the callback URL. The service passes a random alphanumeric challenge
string via the `challenge_string` parameter of the request. The request includes
an `Accept` header that specifies `text/plain` as the required response type.

To be registered successfully, the callback URL must respond to the `GET` request
from the service. The response must send status code 200 and must include the
challenge string in its body. Set the `Content-Type` response header to
`text/plain`. Upon receiving this response, the service responds to the original
registration request with response code 201.

The service sends only a single `GET` request to the callback URL. If the service
does not receive a reply with a response code of 200 and a body that echoes the
challenge string sent by the service within five seconds, it does not allowlist
the URL; it instead sends status code 400 in response to the request to register a
callback. If the requested callback URL is already allowlisted, the service
responds to the initial registration request with response code 200.

If you specify a user secret with the request, the service uses it as a key to
calculate an HMAC-SHA1 signature of the challenge string in its response to the
`POST` request. It sends this signature in the `X-Callback-Signature` header of
its `GET` request to the URL during registration. It also uses the secret to
calculate a signature over the payload of every callback notification that uses
the URL. The signature provides authentication and data integrity for HTTP
communications.

After you successfully register a callback URL, you can use it with an indefinite
number of recognition requests. You can register a maximum of 20 callback URLs in
a one-hour span of time.

**See also:** [Registering a callback
URL](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#register).

Parameters:

  • callback_url (String) (defaults to: )

    An HTTP or HTTPS URL to which callback notifications are to be sent. To be allowlisted, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the `X-Callback-Signature` header to verify the origin of the request.

  • user_secret (String) (defaults to: nil)

    A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the `X-Callback-Signature` header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 834

def register_callback(callback_url:, user_secret: nil)
  raise ArgumentError.new("callback_url must be provided") if callback_url.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "register_callback")
  headers.merge!(sdk_headers)

  params = {
    "callback_url" => callback_url,
    "user_secret" => user_secret
  }

  method_url = "/v1/register_callback"

  response = request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: true
  )
  response
end
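
A hedged sketch (the URL and secret are placeholders; the URL must echo the challenge string as described above):

response = speech_to_text.register_callback(
  callback_url: "https://example.com/stt-notifications",
  user_secret: "{user_secret}"
)
# 201 means the URL was newly allowlisted; 200 means it was already registered.
puts response.status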

#reset_acoustic_model(customization_id: ) ⇒ nil

Reset a custom acoustic model. Resets a custom acoustic model by removing all audio resources from the model.

Resetting a custom acoustic model initializes the model to its state when it was
first created. Metadata such as the name and language of the model are preserved,
but the model's audio resources are removed and must be re-created. The service
cannot reset a model while it is handling another request for the model. The
service cannot accept subsequent requests for the model until the existing reset
request completes. You must use credentials for the instance of the service that
owns a model to reset it.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Resetting a custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageAcousticModels#resetModel-acoustic).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 3039

def reset_acoustic_model(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "reset_acoustic_model")
  headers.merge!(sdk_headers)

  method_url = "/v1/acoustic_customizations/%s/reset" % [ERB::Util.url_encode(customization_id)]

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end
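
A hedged sketch; the method returns nil on success, so failures surface as exceptions (`ApiException` is defined by the ibm_cloud_sdk_core gem):

begin
  speech_to_text.reset_acoustic_model(customization_id: "{customization_id}")
  puts "Reset request accepted"
rescue IBMCloudSdkCore::ApiException => e
  warn "Reset failed: #{e}"
end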

#reset_language_model(customization_id: ) ⇒ nil

Reset a custom language model. Resets a custom language model by removing all
corpora, grammars, and words from the model. Resetting a custom language model
initializes the model to its state when it was first created. Metadata such as the
name and language of the model are preserved, but the model's words resource is
removed and must be re-created. You must use credentials for the instance of the
service that owns a model to reset it.

**See also:**
* [Resetting a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-manageLanguageModels#resetModel-language)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1805

def reset_language_model(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "reset_language_model")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/reset" % [ERB::Util.url_encode(customization_id)]

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end
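
The call mirrors the acoustic reset. A sketch that assumes `speech_to_text` is an authenticated `SpeechToTextV1` client, set up as in the reset_acoustic_model example above:

# Remove all corpora, grammars, and words from the custom language model;
# the model's metadata (name, language) is preserved.
speech_to_text.reset_language_model(customization_id: "{customization_id}")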

#train_acoustic_model(customization_id: , custom_language_model_id: nil) ⇒ IBMCloudSdkCore::DetailedResponse

Train a custom acoustic model. Initiates the training of a custom acoustic model with new or changed audio resources.

After adding or deleting audio resources for a custom acoustic model,
use this method to begin the actual training of the model on the latest audio
data. The custom acoustic model does not reflect its changed data until you train
it. You must use credentials for the instance of the service that owns a model to
train it.

The training method is asynchronous. Training time depends on the cumulative
amount of audio data that the custom acoustic model contains and the current load
on the service. When you train or retrain a model, the service uses all of the
model's audio data in the training. Training a custom acoustic model takes
approximately as long as the length of its cumulative audio data. For example, it
takes approximately 2 hours to train a model that contains a total of 2 hours of
audio. The method returns an HTTP 200 response code to indicate that the training
process has begun.

You can monitor the status of the training by using the [Get a custom acoustic
model](#getacousticmodel) method to poll the model's status. Use a loop to check
the status once a minute. The method returns an `AcousticModel` object that
includes `status` and `progress` fields. A status of `available` indicates that
the custom model is trained and ready to use. The service cannot train a model
while it is handling another request for the model. The service cannot accept
subsequent training requests, or requests to add new audio resources, until the
existing training request completes.

You can use the optional `custom_language_model_id` parameter to specify the GUID
of a separately created custom language model that is to be used during training.
Train with a custom language model if you have verbatim transcriptions of the
audio files that you have added to the custom model or you have either corpora
(text files) or a list of words that are relevant to the contents of the audio
files. For training to succeed, both of the custom models must be based on the
same version of the same base model, and the custom language model must be fully
trained and available.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:**
* [Train the custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-acoustic#trainModel-acoustic)
* [Using custom acoustic and custom language models
together](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-useBoth#useBoth)

### Training failures

Training can fail to start for the following reasons:
* The service is currently handling another request for the custom model, such as
another training request or a request to add audio resources to the model.
* The custom model contains less than 10 minutes or more than 200 hours of audio
data.
* You passed a custom language model with the `custom_language_model_id` query
parameter that is not in the available state. A custom language model must be
fully trained and available to be used to train a custom acoustic model.
* You passed an incompatible custom language model with the
`custom_language_model_id` query parameter. Both custom models must be based on
the same version of the same base model.
* The custom model contains one or more invalid audio resources. You can correct
the invalid audio resources or set the `strict` parameter to `false` to exclude
the invalid resources from the training. The model must contain at least one valid
resource for training to succeed.

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • custom_language_model_id (String) (defaults to: nil)

    The customization ID (GUID) of a custom language model that is to be used during training of the custom acoustic model. Specify a custom language model that has been trained with verbatim transcriptions of the audio resources or that contains words that are relevant to the contents of the audio resources. The custom language model must be based on the same version of the same base model as the custom acoustic model, and the custom language model must be fully trained and available. The credentials specified with the request must own both custom models.

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 2994

def train_acoustic_model(customization_id:, custom_language_model_id: nil)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "train_acoustic_model")
  headers.merge!(sdk_headers)

  params = {
    "custom_language_model_id" => custom_language_model_id
  }

  method_url = "/v1/acoustic_customizations/%s/train" % [ERB::Util.url_encode(customization_id)]

  response = request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: true
  )
  response
end
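
A sketch of the train-and-poll pattern described above, assuming an authenticated `speech_to_text` client and placeholder GUIDs. Treating `failed` as a terminal status is an assumption about the service's training-failure behavior:

# Start training; the call returns as soon as the service accepts the request.
speech_to_text.train_acoustic_model(
  customization_id: "{customization_id}",
  custom_language_model_id: "{custom_language_model_id}" # omit unless training with a custom language model
)

# Poll the model's status once a minute until training finishes.
loop do
  model = speech_to_text.get_acoustic_model(customization_id: "{customization_id}").result
  puts "status=#{model["status"]} progress=#{model["progress"]}"
  break if ["available", "failed"].include?(model["status"])
  sleep(60)
end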

#train_language_model(customization_id: , word_type_to_add: nil, customization_weight: nil) ⇒ IBMCloudSdkCore::DetailedResponse

Train a custom language model. Initiates the training of a custom language model with new resources such as corpora, grammars, and custom words.

After adding, modifying, or deleting
resources for a custom language model, use this method to begin the actual
training of the model on the latest data. You can specify whether the custom
language model is to be trained with all words from its words resource or only
with words that were added or modified by the user directly. You must use
credentials for the instance of the service that owns a model to train it.

The training method is asynchronous. It can take on the order of minutes to
complete depending on the amount of data on which the service is being trained and
the current load on the service. The method returns an HTTP 200 response code to
indicate that the training process has begun.

You can monitor the status of the training by using the [Get a custom language
model](#getlanguagemodel) method to poll the model's status. Use a loop to check
the status every 10 seconds. The method returns a `LanguageModel` object that
includes `status` and `progress` fields. A status of `available` means that the
custom model is trained and ready to use. The service cannot accept subsequent
training requests or requests to add new resources until the existing request
completes.

**See also:**
* [Train the custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageCreate#trainModel-language)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support)

### Training failures

Training can fail to start for the following reasons:
* The service is currently handling another request for the custom model, such as
another training request or a request to add a corpus or grammar to the model.
* No training data have been added to the custom model.
* The custom model contains one or more invalid corpora, grammars, or words (for
example, a custom word has an invalid sounds-like pronunciation). You can correct
the invalid resources or set the `strict` parameter to `false` to exclude the
invalid resources from the training. The model must contain at least one valid
resource for training to succeed.

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • word_type_to_add (String) (defaults to: nil)

    _For custom models that are based on previous-generation models_, the type of words from the custom language model’s words resource on which to train the model:

    • `all` (the default) trains the model on all new words, regardless of whether they were extracted from corpora or grammars or were added or modified by the user.

    • `user` trains the model only on custom words that were added or modified by the user directly. The model is not trained on new words extracted from corpora or grammars.

    _For custom models that are based on next-generation models_, the service ignores the parameter. The words resource contains only custom words that the user adds or modifies directly, so the parameter is unnecessary.

  • customization_weight (Float) (defaults to: nil)

    Specifies a customization weight for the custom language model. The customization weight tells the service how much weight to give to words from the custom language model compared to those from the base model for speech recognition. Specify a value between 0.0 and 1.0; the default is 0.3.

    The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model’s domain, but it can negatively affect performance on non-domain phrases.

    The value that you assign is used for all recognition requests that use the model. You can override it for any recognition request by specifying a customization weight for that request.

    See [Using customization weight](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-languageUse#weight).

Returns:

  • (IBMCloudSdkCore::DetailedResponse)

    An `IBMCloudSdkCore::DetailedResponse` object representing the response.

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1761

def train_language_model(customization_id:, word_type_to_add: nil, customization_weight: nil)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "train_language_model")
  headers.merge!(sdk_headers)

  params = {
    "word_type_to_add" => word_type_to_add,
    "customization_weight" => customization_weight
  }

  method_url = "/v1/customizations/%s/train" % [ERB::Util.url_encode(customization_id)]

  response = request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: true
  )
  response
end
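
A sketch that makes the defaults explicit, again assuming an authenticated `speech_to_text` client; treating `failed` as a terminal status is an assumption:

# Train on all words in the words resource, with the default weight spelled out.
speech_to_text.train_language_model(
  customization_id: "{customization_id}",
  word_type_to_add: "all",      # ignored for next-generation base models
  customization_weight: 0.3     # raise only for OOV-heavy, domain-specific audio
)

# Poll every 10 seconds, as recommended above.
loop do
  model = speech_to_text.get_language_model(customization_id: "{customization_id}").result
  break if ["available", "failed"].include?(model["status"])
  sleep(10)
end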

#unregister_callback(callback_url: ) ⇒ nil

Unregister a callback. Unregisters a callback URL that was previously allowlisted with a [Register a callback](#registercallback) request for use with the asynchronous interface.

Once unregistered, the URL can no longer be used with asynchronous recognition requests.

**See also:** [Unregistering a callback
URL](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-async#unregister).

Parameters:

  • callback_url (String) (defaults to: )

    The callback URL that is to be unregistered.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 871

def unregister_callback(callback_url:)
  raise ArgumentError.new("callback_url must be provided") if callback_url.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "unregister_callback")
  headers.merge!(sdk_headers)

  params = {
    "callback_url" => callback_url
  }

  method_url = "/v1/unregister_callback"

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: false
  )
  nil
end
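
A sketch assuming an authenticated `speech_to_text` client; the URL is a hypothetical example and must match the URL that was registered:

# Remove the callback URL from the allowlist. Returns nil.
speech_to_text.unregister_callback(callback_url: "https://example.com/stt_job_results")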

#upgrade_acoustic_model(customization_id: , custom_language_model_id: nil, force: nil) ⇒ nil

Upgrade a custom acoustic model. Initiates the upgrade of a custom acoustic model to the latest version of its base language model.

The upgrade method is asynchronous. It can take on the order of
minutes or hours to complete depending on the amount of data in the custom model
and the current load on the service; typically, upgrade takes approximately twice
the length of the total audio contained in the custom model. A custom model must
be in the `ready` or `available` state to be upgraded. You must use credentials
for the instance of the service that owns a model to upgrade it.

The method returns an HTTP 200 response code to indicate that the upgrade process
has begun successfully. You can monitor the status of the upgrade by using the
[Get a custom acoustic model](#getacousticmodel) method to poll the model's
status. The method returns an `AcousticModel` object that includes `status` and
`progress` fields. Use a loop to check the status once a minute.

While it is being upgraded, the custom model has the status `upgrading`. When the
upgrade is complete, the model resumes the status that it had prior to upgrade.
The service cannot upgrade a model while it is handling another request for the
model. The service cannot accept subsequent requests for the model until the
existing upgrade request completes.

If the custom acoustic model was trained with a separately created custom language
model, you must use the `custom_language_model_id` parameter to specify the GUID
of that custom language model. The custom language model must be upgraded before
the custom acoustic model can be upgraded. Omit the parameter if the custom
acoustic model was not trained with a custom language model.

**Note:** Acoustic model customization is supported only for use with
previous-generation models. It is not supported for next-generation models.

**See also:** [Upgrading a custom acoustic
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-acoustic).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom acoustic model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

  • custom_language_model_id (String) (defaults to: nil)

    If the custom acoustic model was trained with a custom language model, the customization ID (GUID) of that custom language model. The custom language model must be upgraded before the custom acoustic model can be upgraded. The custom language model must be fully trained and available. The credentials specified with the request must own both custom models.

  • force (Boolean) (defaults to: nil)

    If `true`, forces the upgrade of a custom acoustic model for which no input data has been modified since it was last trained. Use this parameter only to force the upgrade of a custom acoustic model that is trained with a custom language model, and only if you receive a 400 response code and the message `No input data modified since last training`. See [Upgrading a custom acoustic model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-acoustic).

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 3107

def upgrade_acoustic_model(customization_id:, custom_language_model_id: nil, force: nil)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "upgrade_acoustic_model")
  headers.merge!(sdk_headers)

  params = {
    "custom_language_model_id" => custom_language_model_id,
    "force" => force
  }

  method_url = "/v1/acoustic_customizations/%s/upgrade_model" % [ERB::Util.url_encode(customization_id)]

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    params: params,
    accept_json: true
  )
  nil
end
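
A sketch assuming an authenticated `speech_to_text` client and an acoustic model that was trained with a custom language model (omit `custom_language_model_id` otherwise):

speech_to_text.upgrade_acoustic_model(
  customization_id: "{acoustic_customization_id}",
  custom_language_model_id: "{language_customization_id}"
)
# Pass force: true only after a 400 response with the message
# "No input data modified since last training".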

#upgrade_language_model(customization_id: ) ⇒ nil

Upgrade a custom language model. Initiates the upgrade of a custom language model to the latest version of its base language model.

The upgrade method is asynchronous. It can take on the order of
minutes to complete depending on the amount of data in the custom model and the
current load on the service. A custom model must be in the `ready` or `available`
state to be upgraded. You must use credentials for the instance of the service
that owns a model to upgrade it.

The method returns an HTTP 200 response code to indicate that the upgrade process
has begun successfully. You can monitor the status of the upgrade by using the
[Get a custom language model](#getlanguagemodel) method to poll the model's
status. The method returns a `LanguageModel` object that includes `status` and
`progress` fields. Use a loop to check the status every 10 seconds.

While it is being upgraded, the custom model has the status `upgrading`. When the
upgrade is complete, the model resumes the status that it had prior to upgrade.
The service cannot accept subsequent requests for the model until the upgrade
completes.

**See also:**
* [Upgrading a custom language
model](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-upgrade#custom-upgrade-language)
* [Language support for
customization](https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-custom-support).

Parameters:

  • customization_id (String) (defaults to: )

    The customization ID (GUID) of the custom language model that is to be used for the request. You must make the request with credentials for the instance of the service that owns the custom model.

Returns:

  • (nil)

Raises:

  • (ArgumentError)


# File 'lib/ibm_watson/speech_to_text_v1.rb', line 1854

def upgrade_language_model(customization_id:)
  raise ArgumentError.new("customization_id must be provided") if customization_id.nil?

  headers = {
  }
  sdk_headers = Common.new.get_sdk_headers("speech_to_text", "V1", "upgrade_language_model")
  headers.merge!(sdk_headers)

  method_url = "/v1/customizations/%s/upgrade_model" % [ERB::Util.url_encode(customization_id)]

  request(
    method: "POST",
    url: method_url,
    headers: headers,
    accept_json: true
  )
  nil
end
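
A sketch of the upgrade-and-poll pattern, assuming an authenticated `speech_to_text` client:

speech_to_text.upgrade_language_model(customization_id: "{customization_id}")

# Poll every 10 seconds while the model reports `upgrading`; it resumes
# its prior status when the upgrade completes.
loop do
  model = speech_to_text.get_language_model(customization_id: "{customization_id}").result
  break unless model["status"] == "upgrading"
  sleep(10)
end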