Class: IiifPrint::TextExtraction::AltoReader
- Inherits: Object
  - Object
    - IiifPrint::TextExtraction::AltoReader
- Defined in: lib/iiif_print/text_extraction/alto_reader.rb
Overview
Class to obtain plain text and JSON word-coordinates from ALTO source
Defined Under Namespace
Classes: AltoDocStream
Instance Attribute Summary
- #doc_stream ⇒ Object
  Returns the AltoDocStream SAX handler that received the parsed document.
- #source ⇒ Object
  Returns the ALTO XML source as a string.
Instance Method Summary
- #initialize(xml, image_width = nil, image_height = nil) ⇒ AltoReader (constructor)
  Construct with either a path to an ALTO XML file or a string of ALTO XML.
- #isxml?(xml) ⇒ true, false
  Determine whether the source parameter is a file path or XML markup.
- #json ⇒ String
  Output flattened word coordinates as JSON.
Constructor Details
#initialize(xml, image_width = nil, image_height = nil) ⇒ AltoReader
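Construct with either a path to an ALTO XML file or a string of ALTO XML. The source is parsed immediately with a Nokogiri SAX parser, using an AltoDocStream as the document handler.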
# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 93
def initialize(xml, image_width = nil, image_height = nil)
  @source = isxml?(xml) ? xml : File.read(xml)
  @image_width = image_width
  @image_height = image_height
  @doc_stream = AltoDocStream.new(image_width)
  parser = Nokogiri::XML::SAX::Parser.new(doc_stream)
  parser.parse(@source)
end
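The constructor accepts both forms because it only reads from disk when the argument does not look like XML. The following is a minimal, stdlib-only sketch of that dispatch, not the gem's code: `load_source` is a hypothetical helper name, and the ALTO snippet is a made-up placeholder rather than a valid ALTO document.

```ruby
require 'tempfile'

# Hypothetical helper mirroring the first line of the constructor:
# return the argument unchanged when it looks like XML, otherwise
# treat it as a file path and read the file.
def load_source(xml)
  xml.lstrip.start_with?('<') ? xml : File.read(xml)
end

# Placeholder markup for illustration only (not a complete ALTO document).
alto_snippet = '<alto><Layout/></alto>'

# Passing markup directly returns it unchanged.
from_string = load_source(alto_snippet)

# Passing a path returns the contents of the file at that path.
from_path = Tempfile.create(['page', '.xml']) do |file|
  file.write(alto_snippet)
  file.flush
  load_source(file.path)
end

puts from_string == from_path
```

Both calls end up with the same string, which is why callers can hand the reader either a path or markup they already hold in memory.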
Instance Attribute Details
#doc_stream ⇒ Object
Returns the AltoDocStream SAX handler that received the parsed document.
# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 10
def doc_stream
  @doc_stream
end
#source ⇒ Object
Returns the ALTO XML source as a string.
# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 10
def source
  @source
end
Instance Method Details
#isxml?(xml) ⇒ true, false
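Determine whether the source parameter is a file path or XML markup. The value is treated as XML when its first non-whitespace character is '<'.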
# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 106
def isxml?(xml)
  xml.lstrip.start_with?('<')
end
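To make the heuristic concrete, here is a standalone copy of the same check run against a few inputs. The method name `looks_like_xml?` and the sample paths are hypothetical and chosen only for illustration.

```ruby
# Standalone copy of the check, under a hypothetical name.
def looks_like_xml?(source)
  source.lstrip.start_with?('<')
end

samples = [
  '<?xml version="1.0"?><alto/>', # XML declaration
  "\n  <alto/>",                  # leading whitespace is skipped by lstrip
  '/data/ocr/page-0001.xml',      # absolute file path
  'page-0001.xml'                 # relative file path
]

# Map each sample to the result of the check.
results = samples.map { |sample| looks_like_xml?(sample) }

samples.zip(results).each do |sample, is_xml|
  puts "#{sample.inspect} => #{is_xml}"
end
```

The check is purely textual: it does not validate the markup or confirm that a path exists, so anything that does not start with '<' after leading whitespace is handed to File.read by the constructor.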
#json ⇒ String
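Output flattened word coordinates as JSON. The words collected by the doc stream are passed to WordCoordsBuilder together with the image width and height given to the constructor.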
# File 'lib/iiif_print/text_extraction/alto_reader.rb', line 113
def json
  words = @doc_stream.words
  IiifPrint::TextExtraction::WordCoordsBuilder.json_coordinates_for(
    words: words,
    width: @image_width,
    height: @image_height
  )
end
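The page does not show what WordCoordsBuilder produces, so the following is only a rough sketch of what "flattened" word coordinates could look like. Everything here is an assumption made for illustration: the shape of each word entry (`:word` plus an `[x, y, width, height]` box), the `flatten_words` helper, and the output keys `width`, `height`, and `coords`. Consult the WordCoordsBuilder source for the actual format.

```ruby
require 'json'

# Assumed shape of each parsed word: the token plus an [x, y, width, height] box.
words = [
  { word: 'ALTO', coordinates: [10, 20, 50, 12] },
  { word: 'text', coordinates: [70, 20, 40, 12] },
  { word: 'ALTO', coordinates: [10, 40, 50, 12] }
]

# Hypothetical flattening: group boxes by word so that every occurrence
# of the same word is listed under a single key.
def flatten_words(words, width:, height:)
  coords = words.each_with_object({}) do |entry, acc|
    (acc[entry[:word]] ||= []) << entry[:coordinates]
  end
  JSON.generate({ width: width, height: height, coords: coords })
end

payload = flatten_words(words, width: 800, height: 1200)
puts payload
```

Grouping by word keeps lookups cheap for search highlighting, since a client can fetch every box for a matched term with one key access.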