Class: SpokenDataAPI

Inherits:
Object
Defined in:
lib/spokenDataAPI.rb

Overview

An SDK for the Spoken Data API.
date: 22/07/2015
Link to Spoken Data API specifications
The Spoken Data API returns XML.
This SDK does not implement all of the RESTful methods.
It implements:

  • retrieve_subtitles_srt(recording_id): retrieves the srt file of a recording, identified by its Spoken Data uid. Returns the srt file as a string if the recording is ready, and the boolean false if it is not.
  • send_by_video_url(video_url): sends a video by URL and returns the Spoken Data uid of the recording, which you should save for later retrieval.
  • get_recordings_list: gets the list of recordings (not used in the example).
  • get_recording_by_recording_id(recording_id): returns a recording object (also used as a helper method).
  • delete_recording(recording_id): deletes a recording.

You can test this as a Rails runner:

  • uncomment the three-step example at the end of the page
  • and run it from the terminal, from the root of the project: rails r app/models/spokenDataAPI.rb

To use it in Rails, add it as a model, spokenDataAPI.rb.

Author:

  • Pietro Passarelli

Constant Summary

@@BASE_URL =
"https://spokendata.com/api/"
@@ENDPOINTS =

Setup the endpoints

{}

Instance Method Summary

Constructor Details

#initialize(user_id, api_token) ⇒ SpokenDataAPI

Initializer. Stores the Spoken Data user id and API token that are used to build request URLs.


# File 'lib/spokenDataAPI.rb', line 38

def initialize(user_id, api_token)
  @@USER_ID = user_id.to_s
  @@API_TOKEN = api_token.to_s
end

Instance Method Details

#delete_recording(recording_id) ⇒ Object

Deletes the recording's video file on the Spoken Data server.

Parameters:

  • (spoken data uid)

# File 'lib/spokenDataAPI.rb', line 46

def delete_recording(recording_id)
  request_url = get_base_api_request.to_s + "/recording/#{recording_id}/delete"
  open(request_url)
end

#get_api_key ⇒ Object


# File 'lib/spokenDataAPI.rb', line 85

def get_api_key
  return @@API_TOKEN
end

#get_base_api_request ⇒ Object


# File 'lib/spokenDataAPI.rb', line 93

def get_base_api_request
  url = get_base_url + get_user_id+"/" + get_api_key
  return url
end
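
The way the request URL is assembled can be sketched in isolation. This is only an illustration: `build_request_url` and the `DEMO_TOKEN` value are hypothetical and not part of the SDK; the base URL is the one listed in the constant summary.

```ruby
# Hypothetical helper, not part of the SDK: joins the pieces of a request URL.
def build_request_url(user_id, api_token, path = "", base_url: "https://spokendata.com/api/")
  # The base URL already ends with a slash, so only the user id and the token
  # need a separator between them. Endpoint paths start with their own slash.
  "#{base_url}#{user_id}/#{api_token}#{path}"
end

request_url = build_request_url(18, "DEMO_TOKEN", "/recording/845/subtitles.srt")
puts request_url
```

Every endpoint path in this class is appended to the same user id and token prefix, which is why the paths used by the other methods begin with a slash.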

#get_base_url ⇒ Object

API Query builder helper methods


# File 'lib/spokenDataAPI.rb', line 81

def get_base_url
  return @@BASE_URL
end

#get_recording_by_recording_id(recording_id) ⇒ Object

Gets a recording by its id. The id is the one defined by Spoken Data.

Parameters:

  • (spoken data uid)

# File 'lib/spokenDataAPI.rb', line 110

def get_recording_by_recording_id(recording_id)
  url = get_base_api_request + @@ENDPOINTS['recordingList']
  Rails.logger.info "url: #{url}"
  result = parse_xml(url)
  Rails.logger.info "result: #{result}"
  # Hash.from_xml yields a single Hash instead of an Array when there is only one recording
  recordings = Array.wrap(result['recordings']['recording'])
  # returns the matching recording, or nil if no recording has this id
  recordings.find { |recording| recording['id'].to_i == recording_id.to_i }
end

#get_recording_status(recording) ⇒ true if "done", false if "processing"

Helper method for recording_processed?. Takes in a recording object.

Parameters:

  • (recording object)

Returns:

  • (true if "done", false if "processing")

# File 'lib/spokenDataAPI.rb', line 130

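def get_recording_status(recording)
  # a missing recording (nil) is treated like an unknown status
  recording_status = recording && recording['status']
  if recording_status == "done"
    return true
  elsif recording_status == "processing"
    return false
  else
    # raise rather than return the message: a returned String is truthy and would be mistaken for "done"
    raise "There was an error assessing the status of the recording"
  end
end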

#get_recordings_list ⇒ Object


# File 'lib/spokenDataAPI.rb', line 98

def get_recordings_list
  url = get_base_api_request + @@ENDPOINTS['recordingList']
  result = parse_xml(url)
  return  result['recordings']['recording']
end

#get_srt(recording_id) ⇒ srt file String

Helper method for retrieve_subtitles_srt.
Spoken Data returns the srt file as a String containing the content of the file.
Sample output of what Spoken Data returns, using the demo API user and key:
http://spokendata.com/api/18/br3sp59a2it7fig94jdtbt3p9ife5qpx39fd8npp/recording/845/subtitles.srt

=> "1\r\n00:00:00,680 --> 00:00:01,400\r\ni everybody\r\n\r\n2\r\n00:00:02,530 --> 00:00:08,510\r\nthis week i plan to join the students teachers\r\nbusinesses and nonprofit organisations taking\r\nbig\r\n\r\n ...."

Parameters:

  • (spoken data uid)

Returns:

  • (srt file String)

# File 'lib/spokenDataAPI.rb', line 161

def get_srt(recording_id)
  url = get_base_api_request.to_s + "/recording/#{recording_id}/subtitles.srt"
  # Rails.logger.debug "DEBUG #{open(url).class}"
  return open(url)
end
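
Since the srt content arrives as one String, a caller may want to split it into individual cues. The following is a minimal sketch, assuming well-formed input shaped like the sample output above; `parse_srt` is a hypothetical helper and not part of the SDK.

```ruby
# Hypothetical helper, not part of the SDK: splits an srt String into cue hashes.
# Assumes every cue has an index line, a timing line and at least one text line.
def parse_srt(srt)
  # Normalise Windows line endings, then split on the blank line between cues.
  srt.gsub("\r\n", "\n").strip.split(/\n{2,}/).map do |block|
    index, timing, *text = block.lines.map(&:chomp)
    start_time, end_time = timing.split(" --> ")
    { index: index.to_i, start: start_time, end: end_time, text: text.join(" ") }
  end
end

sample = "1\r\n00:00:00,680 --> 00:00:01,400\r\ni everybody\r\n\r\n" \
         "2\r\n00:00:02,530 --> 00:00:08,510\r\nthis week i plan to join the students teachers\r\n"
cues = parse_srt(sample)
cues.each { |cue| puts "#{cue[:start]} #{cue[:text]}" }
```

Multi-line cue text is joined with a space here; a caller that needs the original line breaks could keep the `text` array instead.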

#get_user_id ⇒ Object


# File 'lib/spokenDataAPI.rb', line 89

def get_user_id
  return @@USER_ID
end

#open(url) ⇒ content of url

External libraries helper method. Opens a URL and returns its content, which can be a file such as srt or XML.

Parameters:

  • (url string)

Returns:

  • (content of url)

# File 'lib/spokenDataAPI.rb', line 56

def open(url)
  # URI.parse raises URI::InvalidURIError for a malformed url, so no explicit check is needed
  uri = URI.parse(url)
  Net::HTTP.get(uri)
end

#parse_xml(url) ⇒ ruby hash

Fetches the XML at the given url (as received from the API) and parses it into a Ruby hash.

Parameters:

  • (url / XML (as received by the API))

Returns:

  • (ruby hash)

# File 'lib/spokenDataAPI.rb', line 68

def parse_xml(url)
  # IMPORTANT: there is an issue with the XML: the encoding returned by the API is written `utf8`
  # instead of `utf-8`, and that trips up the parser, hence the substitution.
  result_string = open(url).gsub("utf8", "utf-8")
  # Parse the XML with Nokogiri.
  nokogiri_xml_document = Nokogiri::XML(result_string)
  # Transform the XML into a Ruby hash using built-in Active Support methods.
  result = Hash.from_xml(nokogiri_xml_document.to_s)
  # The `data` tag encapsulates the rest of the keys/tags.
  return result['data']
end
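
The substitution in parse_xml replaces every occurrence of `utf8` in the response, including any that might appear inside the data itself. A narrower variant, sketched below, rewrites only the encoding declared in the XML prolog. `normalize_xml_encoding` is a hypothetical helper, and the sample XML is made up for illustration rather than taken from a real API response.

```ruby
# Hypothetical helper, not part of the SDK: rewrites only the declared encoding.
def normalize_xml_encoding(xml)
  # sub (not gsub) touches just the first match, and the pattern is tied to the
  # encoding attribute, so "utf8" elsewhere in the document is left untouched.
  xml.sub(/encoding=(["'])utf8\1/i) { "encoding=#{$1}utf-8#{$1}" }
end

# Made-up sample document: "utf8" appears in the prolog and in the content.
raw = %(<?xml version="1.0" encoding="utf8"?><data><title>utf8 notes</title></data>)
fixed = normalize_xml_encoding(raw)
puts fixed
```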

#recording_processed?(recording_id) ⇒ true if "done" false if "processing"

From a recording id, returns a boolean for the status of the recording: true if "done", false if "processing".

Parameters:

  • (spoken data uid)

Returns:

  • (true if "done" false if "processing")

# File 'lib/spokenDataAPI.rb', line 146

def recording_processed?(recording_id)
  recording = get_recording_by_recording_id(recording_id)
  return get_recording_status(recording)
end

#retrieve_subtitles_srt(recording_id) ⇒ false if recording status is "processing", string containing srt file if status is "done"

Retrieves the subtitles srt file.

Parameters:

  • (spoken data uid)

Returns:

  • (false if recording status is "processing", string containing srt file if status is "done")

# File 'lib/spokenDataAPI.rb', line 171

def retrieve_subtitles_srt(recording_id)
  if recording_processed?(recording_id)
    return get_srt(recording_id)
  else
    false
  end

end
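
Because retrieve_subtitles_srt returns false while a recording is still processing, a caller typically has to ask more than once. The sketch below shows one possible polling loop. `wait_for_subtitles` and the stand-in client are hypothetical: the stand-in only mimics the false-then-String behaviour described above so that the example runs without network access.

```ruby
# Hypothetical stand-in for a SpokenDataAPI instance:
# reports "not ready" twice, then returns an srt String.
calls = 0
fake_client = Object.new
fake_client.define_singleton_method(:retrieve_subtitles_srt) do |_recording_id|
  calls += 1
  calls < 3 ? false : "1\r\n00:00:00,680 --> 00:00:01,400\r\ni everybody\r\n"
end

# Hypothetical helper: polls until the subtitles are ready or the attempts run out.
# Returns the srt String, or nil if the recording never became ready.
def wait_for_subtitles(client, recording_id, attempts: 5, delay: 0)
  attempts.times do
    srt = client.retrieve_subtitles_srt(recording_id)
    return srt if srt
    sleep(delay) # seconds to wait between attempts
  end
  nil
end

srt = wait_for_subtitles(fake_client, 845)
puts srt ? "subtitles ready" : "still processing"
```

With a real client the delay would need to be much longer than zero, and in a Rails application a background job is likely a better fit than a blocking loop.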

#send_by_video_url(url) ⇒ spoken data uid

Sends a video for captioning. Takes in the location URL of the video: for instance, if you are using Amazon S3 this is the full path, and if you are using YouTube it is just the normal URL. It also works with Vimeo.
Returns the recording id, so that you can check the status and retrieve the captions later. It is best to save this id in the database.
The language options, from the API documentation, are:

  RECORDING-URL - YouTube or any direct URL of a media file
  LANGUAGE - english | english-broadcastnews | english-test | russian | chinese-ma | spanish-us | czech | czech-medicine | czech-broadcastnews | slovak
  ANNOTATOR-ID - id of assigned annotator (leave empty if no annotator)

If you are working with languages other than English, you can modify the params of the request URL to change the language option.

Parameters:

  • (url of video)

Returns:

  • (spoken data uid)

# File 'lib/spokenDataAPI.rb', line 194

def send_by_video_url(url)
  # percent-encode the video URL so that any "?" or "&" it contains does not break the query string
  request_url = get_base_api_request.to_s + "/recording/add?url=#{URI.encode_www_form_component(url)}&language=english"
  response = parse_xml(request_url)
  # example response from the API:
  # {"message"=>"This media URL and language have already been entered.", "recording"=>{"id"=>"5747"}}
  return response['recording']['id']
end