Class: Typingpool::Project
- Inherits:
- Object (hierarchy: Object → Typingpool::Project)
- Defined in:
- lib/typingpool/project.rb,
lib/typingpool/project/local.rb,
lib/typingpool/project/remote.rb,
lib/typingpool/project/remote/s3.rb,
lib/typingpool/project/remote/sftp.rb
Overview
Class representing a transcription job, typically associated with a single interview or other event and with one or more audio files containing recordings of that event. Locally, a project is associated with a filesystem directory. On Amazon Mechanical Turk, a project is associated with various HITs. A project is also associated with audio files on a remote server.
Defined Under Namespace
Instance Attribute Summary
-
#bitrate ⇒ Object
Returns the desired bitrate of processed audio files.
-
#config ⇒ Object
Accessor for the Config object associated with the project.
-
#interval ⇒ Object
Returns a time interval corresponding to the length of each audio chunk within the project.
-
#name ⇒ Object
Accessor for the name of the project (sometimes referred to as the ‘title’ in command line code).
Class Method Summary
-
.local_basename_from_url(url) ⇒ Object
Takes a URL.
-
.url_regex ⇒ Object
Returns a Regexp for breaking a URL down into the original project basename as well as the audio chunk offset.
Instance Method Summary
-
#create_assignment_csv(args) ⇒ Object
Writes a CSV file into project.local directory, storing information about the specified files.
-
#create_local(basedir = @config.transcripts) ⇒ Object
Creates a local filesystem directory corresponding to the project and constructs and returns a Project::Local instance associated with that directory and with this Project instance.
-
#create_remote_names(files) ⇒ Object
Takes an array of file paths, file names, or Filer instances.
-
#initialize(name, config = Config.file) ⇒ Project
Constructor.
-
#interval_as_min_dot_sec ⇒ Object
Returns the project.interval in a format understood by the Unix utility mp3splt: $min.$sec.
-
#local(dir = @config.transcripts) ⇒ Object
Constructs and returns a Project::Local instance associated with this Project instance IF the project exists at the appropriate location in the filesystem.
-
#remote(config = @config) ⇒ Object
Constructs and returns a Project::Remote instance associated with this Project instance.
Constructor Details
#initialize(name, config = Config.file) ⇒ Project
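Constructor. Takes the project name and an optional Config instance (defaults to Config.file). The name must be a valid directory name on the local filesystem; otherwise Error::Argument::Format is raised. The project does not have to exist locally or remotely.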
# File 'lib/typingpool/project.rb', line 32
def initialize(name, config=Config.file)
  Local.valid_name?(name) or raise Error::Argument::Format, "Must be a valid name for a directory in the local filesystem. Eliminate '/' or any other illegal character."
  @name = name
  @config = config
end
Instance Attribute Details
#bitrate ⇒ Object
Returns the desired bitrate of processed audio files.
# File 'lib/typingpool/project.rb', line 20
def bitrate
  @bitrate
end
#config ⇒ Object
Accessor for the Config object associated with the project.
# File 'lib/typingpool/project.rb', line 27
def config
  @config
end
#interval ⇒ Object
Returns a time interval corresponding to the length of each audio chunk within the project. (Each chunk may be transcribed separately.)
# File 'lib/typingpool/project.rb', line 17
def interval
  @interval
end
#name ⇒ Object
Accessor for the name of the project (sometimes referred to as the ‘title’ in command line code).
# File 'lib/typingpool/project.rb', line 24
def name
  @name
end
Class Method Details
.local_basename_from_url(url) ⇒ Object
Takes a URL. Returns the basename of the associated project.local file. This probably shouldn’t need to exist. (TODO: Make this unnecessary.)
# File 'lib/typingpool/project.rb', line 166
def self.local_basename_from_url(url)
  matches = Project.url_regex.match(url) or raise Error::Argument::Format, "Unexpected format to url '#{url}'"
  URI.unescape([matches[2..4].join('.'), matches[5]].join)
end
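As a hedged illustration of what this method computes, the sketch below reimplements it as a standalone function. It is not the library's code: it copies the pattern returned by Project.url_regex into a constant, raises a plain ArgumentError instead of Error::Argument::Format, and swaps URI.unescape (removed in Ruby 3.0) for URI.decode_www_form_component, which additionally turns '+' into a space. The example URL is hypothetical.

```ruby
require 'uri'

# Same pattern as Project.url_regex, copied here so the sketch is self-contained.
URL_REGEX = Regexp.new('.+\/((.+)\.(\d+)\.(\d\d)\.[a-fA-F0-9]{32}\.[A-Z]{6}(\.\w+))')

# Standalone stand-in for Project.local_basename_from_url (assumption: plain
# ArgumentError replaces the library's Error::Argument::Format).
def local_basename_from_url(url)
  matches = URL_REGEX.match(url) or raise ArgumentError, "Unexpected format to url '#{url}'"
  # Groups 2..4 are the original basename and the two numeric offset fields;
  # group 5 is the file extension, including its leading dot.
  URI.decode_www_form_component([matches[2..4].join('.'), matches[5]].join)
end

# Hypothetical remote URL: basename, offset, 32 hex characters, 6 uppercase letters, extension.
url = 'https://example.com/audio/My%20Interview.00.20.0123456789abcdef0123456789abcdef.ABCDEF.mp3'
puts local_basename_from_url(url) # prints "My Interview.00.20.mp3"
```

The 32-character hexadecimal segment and the six uppercase letters are dropped, which leaves the name the file had before it was uploaded.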
.url_regex ⇒ Object
Returns a Regexp for breaking a URL down into the original project basename as well as the audio chunk offset. This probably shouldn’t need to exist. (TODO: Make this unnecessary.)
# File 'lib/typingpool/project.rb', line 159
def self.url_regex
  Regexp.new('.+\/((.+)\.(\d+)\.(\d\d)\.[a-fA-F0-9]{32}\.[A-Z]{6}(\.\w+))')
end
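To make the capture groups easier to follow, here is a small sketch that applies the same pattern to a hypothetical URL and labels each group. The hash keys and the url_parts helper are illustrative names, not part of Typingpool, and reading groups 3 and 4 as the two fields of the chunk offset is an interpretation of the description above.

```ruby
# Same pattern as Project.url_regex, copied so the sketch runs on its own.
URL_PATTERN = Regexp.new('.+\/((.+)\.(\d+)\.(\d\d)\.[a-fA-F0-9]{32}\.[A-Z]{6}(\.\w+))')

# Hypothetical helper: returns the labeled capture groups, or nil if the url does not match.
def url_parts(url)
  match = URL_PATTERN.match(url) or return nil
  {
    remote_basename:  match[1], # whole file name as stored remotely
    project_basename: match[2], # original basename before splitting
    offset_major:     match[3], # first offset field (one or more digits)
    offset_minor:     match[4], # second offset field (exactly two digits)
    extension:        match[5]  # file extension, including the leading dot
  }
end

parts = url_parts('https://example.com/audio/interview.01.30.0123456789abcdef0123456789abcdef.QWERTY.mp3')
puts parts[:project_basename] # prints "interview"
puts parts[:extension]        # prints ".mp3"
```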
Instance Method Details
#create_assignment_csv(args) ⇒ Object
Writes a CSV file into project.local directory, storing information about the specified files.
Params
- :path
  Relative path where the file will be written. Array of relative path elements. See Filer::Dir#file docs for details.
- :urls
  Array of URLs corresponding to project files.
- :unusual
  Optional. Array of unusual words spoken in the audio to be transcribed. This list is ultimately provided to transcribers to aid in their work.
- :voices
  Optional. Array of hashes, each having a :name and a :description element. Each hash corresponds to a person whose voice is on the audio. These details are ultimately provided to transcribers to allow them to correctly label sections of the transcript.
Returns
Path to the resulting CSV file.
# File 'lib/typingpool/project.rb', line 111
def create_assignment_csv(args)
  [:path, :urls].each{|arg| args[arg] or raise Error::Argument, "Missing arg '#{arg}'" }
  headers = ['audio_url',
             'project_id',
             'unusual',
             'chunk',
             'chunk_hours',
             'chunk_minutes',
             'chunk_seconds',
             'voices_count',
             (1 .. args[:voices].count).map{|n| ["voice#{n}", "voice#{n}title"]}
            ].flatten
  csv = args[:urls].map do |url|
    [url,
     local.id,
     args[:unusual].join(', '),
     interval_as_time_string,
     interval_as_hours_minutes_seconds.map{|n| (n == 0) ? nil : n },
     args[:voices].count,
     args[:voices].map{|v| [v[:name], v[:description]]}
    ].flatten
  end
  local.file(*args[:path]).as(:csv).write_arrays(csv, headers)
  local.file_path(*args[:path])
end
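The column layout can be hard to read from the listing, so the following sketch builds the same headers and rows as plain arrays, without touching the filesystem. Everything here is an assumption made for illustration: assignment_rows is a hypothetical helper, the project id and chunk values are passed in instead of coming from local.id and the interval helpers, and the optional arguments default to empty arrays.

```ruby
# Hypothetical helper mirroring the header and row layout of create_assignment_csv.
# chunk is the interval as a display string; chunk_hms is [hours, minutes, seconds].
def assignment_rows(urls:, project_id:, chunk:, chunk_hms:, unusual: [], voices: [])
  headers = ['audio_url', 'project_id', 'unusual', 'chunk',
             'chunk_hours', 'chunk_minutes', 'chunk_seconds', 'voices_count',
             (1..voices.count).map { |n| ["voice#{n}", "voice#{n}title"] }].flatten
  rows = urls.map do |url|
    [url,
     project_id,
     unusual.join(', '),
     chunk,
     chunk_hms.map { |n| n == 0 ? nil : n }, # zero components become empty cells
     voices.count,
     voices.map { |v| [v[:name], v[:description]] }].flatten
  end
  [headers, rows]
end

headers, rows = assignment_rows(
  urls: ['https://example.com/audio/interview.00.00.mp3'],
  project_id: '0123456789abcdef0123456789abcdef',
  chunk: '1:00',
  chunk_hms: [0, 1, 0],
  unusual: ['Typingpool', 'mp3splt'],
  voices: [{ name: 'Ann', description: 'Interviewer' },
           { name: 'Bob', description: 'Guest' }]
)
puts headers.length == rows.first.length # prints "true"
```

Each voice contributes two columns (a name and a title), so the header and every row grow by two cells per voice.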
#create_local(basedir = @config.transcripts) ⇒ Object
Creates a local filesystem directory corresponding to the project and constructs and returns a Project::Local instance associated with that directory and with this Project instance. Takes an optional path to a base directory in which to create the project directory; default is project.config.transcripts.
# File 'lib/typingpool/project.rb', line 58
def create_local(basedir=@config.transcripts)
  Local.create(@name, basedir, File.join(Utility.lib_dir, 'templates', 'project'))
end
#create_remote_names(files) ⇒ Object
Takes an array of file paths, file names, or Filer instances. Returns an array of file basenames. The returned basenames will be the original basenames with the project id and a random or pseudo-random string inserted between the root basename and the file extension. The purpose of this is to make it difficult to guess the name of one remote file after seeing another, thus significantly complicating any attempt to download the entirety of a project (such as a journalistic interview) after seeing a single assignment on Amazon Mechanical Turk. (This should be considered an effort at obfuscation. It is not any guarantee of true security.)
# File 'lib/typingpool/project.rb', line 148
def create_remote_names(files)
  files.map do |file|
    name = [File.basename(file, '.*'), local.id, pseudo_random_uppercase_string].join('.')
    name += File.extname(file) if not(File.extname(file).to_s.empty?)
    name
  end
end
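The naming scheme can be sketched as a standalone function. This is an approximation under stated assumptions: the project id is passed in as an argument instead of being read from local.id, only string paths are handled, and pseudo_random_uppercase_string is a hypothetical stand-in whose six-letter length is inferred from the [A-Z]{6} segment of Project.url_regex, since the real helper is not shown on this page.

```ruby
# Hypothetical stand-in for the helper of the same name (assumed: six random A-Z letters).
def pseudo_random_uppercase_string(length = 6)
  letters = ('A'..'Z').to_a
  Array.new(length) { letters.sample }.join
end

# Standalone sketch of create_remote_names; project_id replaces local.id.
def create_remote_names(files, project_id)
  files.map do |file|
    # Root basename, project id and random string are joined with dots...
    name = [File.basename(file, '.*'), project_id, pseudo_random_uppercase_string].join('.')
    # ...and the original extension, if any, is appended last.
    name += File.extname(file) unless File.extname(file).empty?
    name
  end
end

names = create_remote_names(['audio/interview.00.20.mp3'], '0123456789abcdef0123456789abcdef')
puts names.first # e.g. interview.00.20.0123456789abcdef0123456789abcdef.KQZJHB.mp3 (letters vary)
```

Array#sample is not a cryptographically secure source of randomness, which is consistent with the note above that this scheme is obfuscation and not a security guarantee.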
#interval_as_min_dot_sec ⇒ Object
Returns the project.interval in a format understood by the Unix utility mp3splt: $min.$sec.
# File 'lib/typingpool/project.rb', line 75
def interval_as_min_dot_sec
  seconds = @interval % 60
  if seconds > seconds.to_i
    #mp3splt takes fractions of a second to hundredths of a second precision
    seconds = seconds.round(2)
  end
  min_dot_sec = "#{(@interval.to_i / 60).floor}.#{seconds}"
end
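A quick way to see the resulting format is to run the same arithmetic on a few intervals. The sketch below is a standalone variant that takes the interval in seconds as an argument instead of reading @interval; that parameter is the only change to the logic, and the input values are made up.

```ruby
# Standalone variant of interval_as_min_dot_sec; interval is a number of seconds.
def interval_as_min_dot_sec(interval)
  seconds = interval % 60
  # Fractional seconds are rounded to hundredths, the precision noted in the listing.
  seconds = seconds.round(2) if seconds > seconds.to_i
  "#{(interval.to_i / 60).floor}.#{seconds}"
end

puts interval_as_min_dot_sec(90)     # prints "1.30"
puts interval_as_min_dot_sec(75.456) # prints "1.15.46"
```

Whole-second intervals produce two dot-separated fields (minutes and seconds); a fractional interval adds a third field for hundredths of a second.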
#local(dir = @config.transcripts) ⇒ Object
Constructs and returns a Project::Local instance associated with this Project instance IF the project exists at the appropriate location in the filesystem. Takes an optional path to a base directory to look in; default is project.config.transcripts.
# File 'lib/typingpool/project.rb', line 49
def local(dir=@config.transcripts)
  Local.named(@name, dir)
end
#remote(config = @config) ⇒ Object
Constructs and returns a Project::Remote instance associated with this Project instance. Takes an optional Config instance; default is project.config.
# File 'lib/typingpool/project.rb', line 41
def remote(config=@config)
  Remote.from_config(config)
end