Class: IiifPrint::TextExtraction::AltoReader

Inherits:
Object
Defined in:
lib/iiif_print/text_extraction/alto_reader.rb

Overview

Class to obtain plain text and JSON word-coordinates from ALTO source

Defined Under Namespace

Classes: AltoDocStream

Constructor Details

#initialize(xml, image_width = nil, image_height = nil) ⇒ AltoReader

Construct with either a path to an ALTO XML file or XML source, and process the document

Parameters:

  • xml (String)

    either path to XML file or XML source



# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 93

def initialize(xml, image_width = nil, image_height = nil)
  @source = isxml?(xml) ? xml : File.read(xml)
  @image_width = image_width
  @image_height = image_height
  @doc_stream = AltoDocStream.new(image_width)
  parser = Nokogiri::XML::SAX::Parser.new(doc_stream)
  parser.parse(@source)
end
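
The first line of the constructor accepts either inline XML or a file path. The following is a minimal, standalone sketch of that dispatch, not the library's code: `resolve_alto_source` is a hypothetical helper name, and the sample markup is made up for illustration. It only shows that both input forms yield the same source string before parsing.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the constructor's first line:
# treat the argument as XML source if it looks like markup,
# otherwise read it from disk as a path.
def resolve_alto_source(xml)
  xml.lstrip.start_with?('<') ? xml : File.read(xml)
end

# Made-up sample markup, used only to exercise both branches.
inline = '<alto><Layout/></alto>'

# Write the same markup to a temporary file to exercise the path branch.
file = Tempfile.new(['page', '.xml'])
file.write(inline)
file.close

from_string = resolve_alto_source(inline)
from_path = resolve_alto_source(file.path)
file.unlink

puts from_string == from_path
```

With the real class, the equivalent calls would presumably be `AltoReader.new(inline)` and `AltoReader.new(path)`, optionally followed by the image width and height.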

Instance Attribute Details

#doc_stream ⇒ Object

Returns the value of attribute doc_stream.



# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 10

def doc_stream
  @doc_stream
end

#source ⇒ Object

Returns the value of attribute source.



# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 10

def source
  @source
end

Instance Method Details

#isxml?(xml) ⇒ true, false

Determine whether the source parameter is a path or XML source

Parameters:

  • xml (String)

    either path to xml file or xml source

Returns:

  • (true, false)

    true if string appears to be XML source, not path



# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 106

def isxml?(xml)
  xml.lstrip.start_with?('<')
end
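
The check is a heuristic on the first non-whitespace character rather than a validation of the XML. As a rough illustration, the same expression can be tried on a few inputs; `xml_source?` is a hypothetical standalone copy of the method body, and the sample strings are invented.

```ruby
# Hypothetical standalone copy of the heuristic shown above.
def xml_source?(text)
  text.lstrip.start_with?('<')
end

samples = [
  '<alto/>',                        # inline markup
  "  \n<?xml version=\"1.0\"?>",    # leading whitespace is ignored
  '/tmp/page.xml',                  # looks like a path
  ''                                # empty string
]

samples.each do |s|
  puts "#{s.inspect} => #{xml_source?(s)}"
end
```

Because only the leading character is inspected, malformed markup that starts with `<` is still treated as XML source, and any string that does not start with `<` is passed to `File.read` by the constructor.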

#json ⇒ String

Output JSON flattened word coordinates

Returns:

  • (String)

    JSON serialization of flattened word coordinates



# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 113

def json
  words = @doc_stream.words
  IiifPrint::TextExtraction::WordCoordsBuilder.json_coordinates_for(
    words: words,
    width: @image_width,
    height: @image_height
  )
end