Class: Typingpool::Project

Inherits:
Object
Defined in:
lib/typingpool/project.rb,
lib/typingpool/project/local.rb,
lib/typingpool/project/remote.rb,
lib/typingpool/project/remote/s3.rb,
lib/typingpool/project/remote/sftp.rb

Overview

Represents a transcription job, typically associated with a single interview or other event and with one or more audio files containing recordings of that event. Locally, a project is associated with a filesystem directory. On Amazon Mechanical Turk, a project is associated with various HITs. A project is also associated with audio files on a remote server.

Defined Under Namespace

Classes: Local, Remote

Constructor Details

#initialize(name, config = Config.file) ⇒ Project

Constructor. Takes the project name and an optional Config instance (defaults to Config.file). The project does not have to exist locally or remotely.



# File 'lib/typingpool/project.rb', line 32

def initialize(name, config=Config.file)
  Local.valid_name?(name) or raise Error::Argument::Format, "Must be a valid name for a directory in the local filesystem. Eliminate '/' or any other illegal character."
  @name = name
  @config = config
end

Instance Attribute Details

#bitrate ⇒ Object

Returns the desired bitrate of processed audio files.



# File 'lib/typingpool/project.rb', line 20

def bitrate
  @bitrate
end

#config ⇒ Object

Accessor for the Config object associated with the project.



# File 'lib/typingpool/project.rb', line 27

def config
  @config
end

#interval ⇒ Object

Returns a time interval corresponding to the length of each audio chunk within the project. (Each chunk may be transcribed separately.)



# File 'lib/typingpool/project.rb', line 17

def interval
  @interval
end

#name ⇒ Object

Accessor for the name of the project (sometimes referred to as the ‘title’ in command-line code).



# File 'lib/typingpool/project.rb', line 24

def name
  @name
end

Class Method Details

.local_basename_from_url(url) ⇒ Object

Takes a URL. Returns the basename of the associated project.local file. This probably shouldn’t need to exist. (TODO: Make this unnecessary.)



# File 'lib/typingpool/project.rb', line 166

def self.local_basename_from_url(url)
  matches = Project.url_regex.match(url) or raise Error::Argument::Format, "Unexpected format to url '#{url}'"
  URI.unescape([matches[2..4].join('.'), matches[5]].join)
end
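The reconstruction above can be tried outside the library with a minimal standalone sketch. It copies the pattern from Project.url_regex into a local method and joins the same capture groups. Two things here are assumptions rather than library behavior: the sample URLs are invented, and `URI.decode_www_form_component` is swapped in for `URI.unescape`, which was removed in Ruby 3.0. The replacement also turns `+` into a space, which `URI.unescape` did not do.

```ruby
require 'uri'

# Copy of the pattern returned by Project.url_regex.
def url_regex
  Regexp.new('.+\/((.+)\.(\d+)\.(\d\d)\.[a-fA-F0-9]{32}\.[A-Z]{6}(\.\w+))')
end

# Standalone version of the method. Raises ArgumentError instead of the
# library's Error::Argument::Format so the sketch has no dependencies.
def local_basename_from_url(url)
  matches = url_regex.match(url) or raise ArgumentError, "Unexpected format to url '#{url}'"
  # Groups 2..4 are the root basename and the two numeric chunk fields;
  # group 5 is the file extension, including its leading dot.
  escaped = [matches[2..4].join('.'), matches[5]].join
  # Assumption: stand-in for URI.unescape (also converts '+' to ' ').
  URI.decode_www_form_component(escaped)
end

# Hypothetical remote URL with a percent-encoded space in the basename.
url = 'https://example.com/audio/my%20interview.01.00.' \
      '0123456789abcdef0123456789abcdef.ABCDEF.mp3'

puts local_basename_from_url(url) # my interview.01.00.mp3
```

The 32-character hex segment and the six uppercase letters are dropped, leaving only the parts that name the local chunk file.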

.url_regex ⇒ Object

Returns a Regexp for breaking a URL down into the original project basename and the audio chunk offset. This probably shouldn’t need to exist. (TODO: Make this unnecessary.)



# File 'lib/typingpool/project.rb', line 159

def self.url_regex
  Regexp.new('.+\/((.+)\.(\d+)\.(\d\d)\.[a-fA-F0-9]{32}\.[A-Z]{6}(\.\w+))')
end
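To see what each capture group holds, the pattern can be applied to a sample URL. This is only an illustration: the URL is invented, and reading groups 3 and 4 as the minutes and seconds of the chunk offset is an inference from the description above, not something the source states.

```ruby
# Same pattern as the one returned by Project.url_regex.
regex = Regexp.new('.+\/((.+)\.(\d+)\.(\d\d)\.[a-fA-F0-9]{32}\.[A-Z]{6}(\.\w+))')

# Hypothetical remote URL, made up for this example.
url = 'https://example.com/audio/interview.01.00.' \
      '0123456789abcdef0123456789abcdef.ABCDEF.mp3'

m = regex.match(url)

puts m[1] # the full remote basename (everything after the last '/')
puts m[2] # interview  (original root basename)
puts m[3] # 01         (first numeric field, presumably minutes)
puts m[4] # 00         (second numeric field, presumably seconds)
puts m[5] # .mp3       (file extension, with its dot)
```

The 32 hex characters and the six uppercase letters are matched but not captured, so they cannot be recovered from the MatchData groups other than through group 1.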

Instance Method Details

#create_assignment_csv(args) ⇒ Object

Writes a CSV file into the project.local directory, storing information about the specified files.

Params

:path

Relative path where the file will be written, given as an array of relative path elements. See Filer::Dir#file docs for details.

:urls

Array of URLs corresponding to project files.

:unusual

Optional. Array of unusual words spoken in the audio to be transcribed. This list is ultimately provided to transcribers to aid in their work.

:voices

Optional. Array of hashes, each with a :name and a :description element. Each hash corresponds to a person whose voice is on the audio. These details are ultimately provided to transcribers to allow them to correctly label sections of the transcript.

Returns

Path to the resulting CSV file.



# File 'lib/typingpool/project.rb', line 111

def create_assignment_csv(args)
  [:path, :urls].each{|arg| args[arg] or raise Error::Argument, "Missing arg '#{arg}'" }
  headers = ['audio_url',
             'project_id',
             'unusual',
             'chunk',
             'chunk_hours',
             'chunk_minutes',
             'chunk_seconds',
             'voices_count',
             (1 .. args[:voices].count).map{|n| ["voice#{n}", "voice#{n}title"]}
            ].flatten
  csv = args[:urls].map do |url|
    [url, 
     local.id,
     args[:unusual].join(', '),
     interval_as_time_string,
     interval_as_hours_minutes_seconds.map{|n| (n == 0) ? nil : n },
     args[:voices].count,
     args[:voices].map{|v| [v[:name], v[:description]]}
    ].flatten
  end
  local.file(*args[:path]).as(:csv).write_arrays(csv, headers)
  local.file_path(*args[:path])
end
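The header and row construction can be followed more easily with concrete data. The sketch below rebuilds both arrays for two voices without touching the filesystem. All input values are invented, and `project_id`, `time_string` and `hours_min_sec` are placeholders standing in for `local.id`, `interval_as_time_string` and `interval_as_hours_minutes_seconds`, whose real return formats are not documented in this section.

```ruby
voices  = [{ name: 'Ann', description: 'Interviewer' },
           { name: 'Bob', description: 'Subject' }]
unusual = ['Typingpool', 'mp3splt']
urls    = ['https://example.com/a.mp3', 'https://example.com/b.mp3']

project_id    = 'placeholder-id' # stands in for local.id
time_string   = '1:00'           # stands in for interval_as_time_string (format assumed)
hours_min_sec = [0, 1, 0]        # stands in for interval_as_hours_minutes_seconds (assumed)

# Eight fixed columns, then one name/title pair per voice.
headers = ['audio_url', 'project_id', 'unusual', 'chunk',
           'chunk_hours', 'chunk_minutes', 'chunk_seconds', 'voices_count',
           (1..voices.count).map { |n| ["voice#{n}", "voice#{n}title"] }].flatten

# One row per URL; zero time components become nil (empty CSV cells).
rows = urls.map do |url|
  [url,
   project_id,
   unusual.join(', '),
   time_string,
   hours_min_sec.map { |n| n == 0 ? nil : n },
   voices.count,
   voices.map { |v| [v[:name], v[:description]] }].flatten
end

puts headers.length    # 12
puts rows.first.length # 12
p headers.last(4)
p rows.first
```

Because both arrays are flattened, the number of columns grows by two for every extra voice, and each row stays aligned with the headers.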

#create_local(basedir = @config.transcripts) ⇒ Object

Creates a local filesystem directory corresponding to the project and constructs and returns a Project::Local instance associated with that directory and with this Project instance. Takes an optional path to a base directory in which to create the project directory; default is project.config.transcripts.



58
59
60
# File 'lib/typingpool/project.rb', line 58

def create_local(basedir=@config.transcripts)
  Local.create(@name, basedir, File.join(Utility.lib_dir, 'templates', 'project'))
end

#create_remote_names(files) ⇒ Object

Takes an array of file paths, file names, or Filer instances. Returns an array of file basenames. The returned basenames are the original basenames with the project id and a random or pseudo-random string inserted between the root basename and the file extension. The purpose of this is to make it difficult to guess the name of one remote file after seeing another, thus significantly complicating any attempt to download the entirety of a project (such as a journalistic interview) after seeing a single assignment on Amazon Mechanical Turk. (This should be considered an effort at obfuscation. It is not any guarantee of true security.)



# File 'lib/typingpool/project.rb', line 148

def create_remote_names(files)
  files.map do |file|
    name = [File.basename(file, '.*'), local.id, pseudo_random_uppercase_string].join('.')
    name += File.extname(file) if not(File.extname(file).to_s.empty?)
    name
  end
end
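A rough standalone sketch of the naming scheme follows. It takes the project id as an argument instead of reading `local.id`, and `random_uppercase_string` is a hypothetical stand-in for the library's `pseudo_random_uppercase_string`, whose implementation is not shown here. The length of six letters is an assumption inferred from the `[A-Z]{6}` segment in url_regex.

```ruby
require 'securerandom'

# Hypothetical stand-in for pseudo_random_uppercase_string.
# Assumption: six letters, to match [A-Z]{6} in url_regex.
def random_uppercase_string(length = 6)
  letters = ('A'..'Z').to_a
  Array.new(length) { letters[SecureRandom.random_number(letters.size)] }.join
end

# Same joining logic as create_remote_names, with the id passed in.
def remote_names(files, project_id)
  files.map do |file|
    name = [File.basename(file, '.*'), project_id, random_uppercase_string].join('.')
    ext = File.extname(file)
    name += ext unless ext.empty?
    name
  end
end

project_id = '0123456789abcdef0123456789abcdef' # made-up 32-character hex id
names = remote_names(['audio/interview.01.00.mp3', 'notes'], project_id)
puts names
```

Each call produces different names because of the random segment, so the output is not reproducible. A file without an extension simply ends in the random letters.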

#interval_as_min_dot_sec ⇒ Object

Returns the project.interval in a format understood by the Unix utility mp3splt: $min.$sec.



# File 'lib/typingpool/project.rb', line 75

def interval_as_min_dot_sec
  seconds = @interval % 60
  if seconds > seconds.to_i
    # mp3splt takes fractions of a second to hundredths of a second precision
    seconds = seconds.round(2)
  end
  min_dot_sec = "#{(@interval.to_i / 60).floor}.#{seconds}"
end
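For a few concrete values, the conversion can be sketched as a standalone method that takes the interval in seconds as an argument instead of reading `@interval`. The method name `min_dot_sec` and the sample intervals are made up for illustration.

```ruby
# Standalone version of the conversion shown above.
def min_dot_sec(interval)
  seconds = interval % 60
  # Round only when the seconds value has a fractional part.
  seconds = seconds.round(2) if seconds > seconds.to_i
  "#{interval.to_i / 60}.#{seconds}"
end

puts min_dot_sec(60)     # 1.0
puts min_dot_sec(90)     # 1.30
puts min_dot_sec(125.25) # 2.5.25
```

With a fractional interval, the decimal point of the seconds value becomes the second dot in the result, so 125.25 seconds reads as 2 minutes, 5 seconds and 25 hundredths.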

#local(dir = @config.transcripts) ⇒ Object

Constructs and returns a Project::Local instance associated with this Project instance IF the project exists at the appropriate location in the filesystem. Takes an optional path to a base directory to look in; default is project.config.transcripts.



# File 'lib/typingpool/project.rb', line 49

def local(dir=@config.transcripts)
  Local.named(@name, dir) 
end

#remote(config = @config) ⇒ Object

Constructs and returns a Project::Remote instance associated with this Project instance. Takes an optional Config instance; default is project.config.



# File 'lib/typingpool/project.rb', line 41

def remote(config=@config)
  Remote.from_config(config)
end