Class: SmTranscript::Transcript

Inherits:

Object


Defined in: lib/sm_transcript/transcript.rb

Instance Method Summary

#cleanup_phrase(phrase) ⇒ Object

Some word combinations occur so regularly that they call out to be fixed.
#get_time_expression(milliseconds) ⇒ Object

Currently a stub that returns milliseconds unchanged.
#initialize(word_arr) ⇒ Transcript constructor

A new instance of Transcript.
#words_to_phrase(start_time) ⇒ Object

Times are expressed in milliseconds, far more granularity than is useful for most user-facing apps, especially since the player reports elapsed time only ten times a second.
#write_html(dest_file) ⇒ Object
#write_ttml(dest_file) ⇒ Object

Constructor Details

#initialize(word_arr) ⇒ `Transcript`

Returns a new instance of Transcript.

# File 'lib/sm_transcript/transcript.rb', line 15

def initialize(word_arr)
  @metadata = {}
  @words = word_arr
end
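The constructor only stores word_arr. Judging from how write_html and write_ttml use it, each element is expected to respond to word, start_time and end_time, with times in milliseconds (possibly as strings, since those methods call to_i). A minimal stand-in, assuming a hypothetical Word struct that is not part of sm_transcript:

```ruby
# Hypothetical stand-in for the word objects passed to Transcript.new.
# Only #word, #start_time and #end_time are used by the methods on this
# page; times are in milliseconds.
Word = Struct.new(:word, :start_time, :end_time)

word_arr = [
  Word.new("welcome", "0",    "420"),  # times may arrive as strings;
  Word.new("to",      "430",  "510"),  # the transcript calls #to_i on them
  Word.new("m",       "1020", "1100"),
  Word.new("I",       "1110", "1180"),
  Word.new("t",       "1190", "1300")
]

# Every element must respond to the three readers the transcript relies on.
word_arr.each do |w|
  %i[word start_time end_time].each do |m|
    raise "#{w.inspect} lacks ##{m}" unless w.respond_to?(m)
  end
end

# transcript = SmTranscript::Transcript.new(word_arr)  # requires the gem
```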

Instance Method Details

#cleanup_phrase(phrase) ⇒ `Object`

Some word combinations occur so regularly that they call out to be fixed. For example, “m I t” is unambiguously MIT. These edits can only be made once the phrase has been assembled. The current implementation is a stub that returns phrase unchanged.

# File 'lib/sm_transcript/transcript.rb', line 124

def cleanup_phrase(phrase)
  phrase
end
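One way the described fix-ups could be done is a table of regular-expression substitutions applied to the assembled phrase. This is a sketch only: the FIXUPS table and the cleanup_phrase_sketch name are hypothetical, and only the “m I t” to MIT rule comes from the description above.

```ruby
# Hypothetical substitution table: pattern => replacement.
# \b keeps a rule from firing inside longer words; the match is
# case-insensitive because recognizers vary in capitalization.
FIXUPS = {
  /\bm i t\b/i => "MIT"
}.freeze

# Applies every fix-up to an assembled phrase and returns a new string.
def cleanup_phrase_sketch(phrase)
  FIXUPS.reduce(phrase) { |text, (pattern, replacement)| text.gsub(pattern, replacement) }
end

puts cleanup_phrase_sketch("welcome to m I t")  # prints: welcome to MIT
```

Because the rules run on the whole phrase, patterns that span word boundaries (as “m I t” does) are easy to express, which is why such edits have to wait until the phrase is assembled.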

#get_time_expression(milliseconds) ⇒ `Object`

Currently a stub that returns milliseconds unchanged.

# File 'lib/sm_transcript/transcript.rb', line 117

def get_time_expression(milliseconds)
  milliseconds
end
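write_ttml builds TTML offset times such as "1234ms" inline rather than calling this method. If the intent of get_time_expression is to produce a TTML time expression, one possibility is the clock-time form hh:mm:ss.fff. This is an assumption about the method's purpose, and clock_time_sketch is a hypothetical name.

```ruby
# Converts a millisecond count (Integer or numeric String) into a
# TTML clock-time expression of the form "hh:mm:ss.fff".
def clock_time_sketch(milliseconds)
  ms = Integer(milliseconds.to_s, 10)
  raise ArgumentError, "negative time: #{ms}" if ms.negative?

  hours, rest   = ms.divmod(3_600_000)
  minutes, rest = rest.divmod(60_000)
  seconds, frac = rest.divmod(1_000)
  format("%02d:%02d:%02d.%03d", hours, minutes, seconds, frac)
end

clock_time_sketch(0)         # => "00:00:00.000"
clock_time_sketch("3723456") # => "01:02:03.456"
```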

#words_to_phrase(start_time) ⇒ `Object`

Times are expressed in milliseconds, far more granularity than is useful for most user-facing apps, especially since the player reports elapsed time only ten times a second. Reducing the granularity by orders of magnitude (dividing by 1000 yields whole seconds) provides these benefits:

1) Multiple words fall within a single <span> element.
2) Start times map better onto the player's time tracking.

# File 'lib/sm_transcript/transcript.rb', line 113

def words_to_phrase(start_time)
  start_time.to_i/1000
end
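Because to_i / 1000 is integer division, every start time within the same second maps to the same value, which is what lets consecutive words share one element. The illustration below applies the same reduction that words_to_phrase performs (and that write_html and write_ttml repeat inline) to a made-up word list, using plain Ruby only.

```ruby
# Made-up [word, start_time_ms] pairs; start times arrive as strings.
words = [["welcome", "0"], ["to", "430"], ["m", "1020"],
         ["I", "1110"], ["t", "1999"], ["today", "2000"]]

# Same reduction as words_to_phrase: milliseconds -> whole seconds.
to_secs = ->(start_time) { start_time.to_i / 1000 }

# chunk_while keeps consecutive words together while their second is
# unchanged, mirroring how the writers walk the word list in order.
groups = words.chunk_while { |a, b| to_secs.(a[1]) == to_secs.(b[1]) }
              .map { |chunk| [to_secs.(chunk.first[1]), chunk.map(&:first).join(" ")] }

p groups # prints [[0, "welcome to"], [1, "m I t"], [2, "today"]]
```

Note that 1999 ms still falls in second 1 while 2000 ms starts second 2: the boundary is a hard cut, not rounding.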

#write_html(dest_file) ⇒ `Object`
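
Writes the transcript to dest_file as HTML, wrapping words that start within the same second in a <span id='T<seconds>'> element. An existing file is overwritten.
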
# File 'lib/sm_transcript/transcript.rb', line 20

def write_html(dest_file)
  # TODO: Do we want to notify user when overwriting existing file?
  # if File.exist?(dest_file)
  #   p "overwriting existing destination file"
  # end
  File.open(dest_file, "w") do |f|
    span_element = nil    # text of the open <span>; nil before the first word
    prev_start_time = nil # start second of the open <span>
    @words.each do |w|
      # get the start time and reduce its granularity (milliseconds to whole
      # seconds) so that multiple words fall within a <span> element.
      start_time = w.start_time.to_i / 1000
      if start_time == prev_start_time # same second: append word
        span_element << " #{w.word}"
      else # new second: write the previous span (if any), then open a new one
        f.puts span_element << "</span> " unless span_element.nil?
        span_element = "<span id='T#{start_time}'>#{w.word}"
        prev_start_time = start_time
      end
    end
    # A span is only written when the next one begins, so the last span
    # is always written here (nothing is written if there were no words).
    # The nil sentinel also keeps words in second 0 inside a <span>.
    f.puts span_element << "</span> " unless span_element.nil?
  end
end
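The grouping in write_html can be checked without touching the file system by building the markup in a string. This is a sketch under the same assumed word interface (word and start_time in milliseconds); spans_html and the Word struct are hypothetical and not part of sm_transcript.

```ruby
Word = Struct.new(:word, :start_time, :end_time)

# Builds the same <span id='T<seconds>'> markup as write_html, one line per span.
def spans_html(words)
  lines = []
  current = nil   # text of the open span, nil until the first word
  prev_secs = nil # second of the open span
  words.each do |w|
    secs = w.start_time.to_i / 1000
    if secs == prev_secs
      current << " #{w.word}"
    else
      lines << "#{current}</span> " unless current.nil?
      current = "<span id='T#{secs}'>#{w.word}"
      prev_secs = secs
    end
  end
  lines << "#{current}</span> " unless current.nil?
  lines.join("\n")
end

words = [Word.new("welcome", "0", "420"), Word.new("to", "430", "510"),
         Word.new("MIT", "1020", "1300")]
puts spans_html(words) # prints two spans: T0 ("welcome to") and T1 ("MIT")
```

Starting from nil rather than 0 is the design choice that matters here: a word that begins in second 0 still opens a span, and an empty word list produces no stray closing tag.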

#write_ttml(dest_file) ⇒ `Object`
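
Writes the transcript to dest_file as a TTML document. Words are grouped by start second, as in #write_html, and each group becomes a <p> element carrying xml:id, begin and end attributes. An existing file is overwritten.
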
# File 'lib/sm_transcript/transcript.rb', line 52

def write_ttml(dest_file)
  # TODO: Do we want to notify user when overwriting existing file?
  # if File.exist?(dest_file)
  #   p "overwriting existing destination file"
  # end
  buf = ""
  bldr = Builder::XmlMarkup.new(:target => buf, :indent => 2)
  bldr.instruct!
  bldr.tt("xmlns" => "http://www.w3.org/2006/04/ttaf1",
          "xmlns:tts" => "http://www.w3.org/ns/ttml#styling",
          "xmlns:ttm" => "http://www.w3.org/ns/ttml#metadata",
          "xml:lang" => "en") {
    bldr.head { |b|
      b.ttm :title, 'Document Metadata Example'
      b.ttm :desc,  'This document employs document metadata.'
    }
    bldr.body {
      bldr.div {
        span_element = nil    # text of the open group; nil before the first word
        prev_start_secs = nil # start second of the open group
        start_ms = end_ms = 0
        # Writes the open group as a <p>. Its xml:id and begin/end belong to
        # the group being closed, not to the word that starts the next one.
        write_p = lambda {
          bldr.p(span_element,
                 "xml:id" => "T#{prev_start_secs}",
                 "begin"  => "#{start_ms}ms",
                 "end"    => "#{end_ms}ms")
        }
        @words.each do |w|
          # get the start time and reduce its granularity so that multiple
          # words fall within a single <p> element.
          start_secs = w.start_time.to_i / 1000
          if start_secs == prev_start_secs # same second: append word
            end_ms = w.end_time.to_i
            span_element << " #{w.word}"
          else # new second: write the previous group (if any), start a new one
            write_p.call unless span_element.nil?
            start_ms = w.start_time.to_i
            end_ms   = w.end_time.to_i
            span_element = "#{w.word}"
            prev_start_secs = start_secs
          end
        end
        # A group is only written when the next one begins, so the last
        # group is always written here (nothing if there were no words).
        write_p.call unless span_element.nil?
      }
    }
  }
  File.open(dest_file, "w") do |f|
    f.puts buf
  end
end
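write_ttml applies the same per-second grouping but also records each group's begin (its first word's start_time) and end (its last word's end_time) as TTML "ms" offset times. The sketch below computes those values as plain data so they can be inspected before handing them to Builder; ttml_groups and the Word struct are hypothetical names, not part of sm_transcript.

```ruby
Word = Struct.new(:word, :start_time, :end_time)

# Returns one [text, attributes] pair per <p> element: the words starting in
# the same second, plus the xml:id and begin/end offset times for that group.
def ttml_groups(words)
  words.chunk_while { |a, b| a.start_time.to_i / 1000 == b.start_time.to_i / 1000 }
       .map { |chunk|
         text  = chunk.map(&:word).join(" ")
         attrs = {
           "xml:id" => "T#{chunk.first.start_time.to_i / 1000}",
           "begin"  => "#{chunk.first.start_time.to_i}ms",
           "end"    => "#{chunk.last.end_time.to_i}ms"
         }
         [text, attrs]
       }
end

words = [Word.new("welcome", "0", "420"), Word.new("to", "430", "510"),
         Word.new("MIT", "1020", "1300")]
groups = ttml_groups(words)

# With Builder, each pair maps onto one paragraph, for example:
#   groups.each { |text, attrs| bldr.p(text, attrs) }
```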
