Class: JhoveTechnicalMetadata
- Inherits: Nokogiri::XML::SAX::Document
  - Object
  - Nokogiri::XML::SAX::Document
  - JhoveTechnicalMetadata
- Defined in: lib/jhove_technical_metadata.rb
Overview
A SAX handler for filtering JHOVE output to create a technicalMetadata datastream. The previous mechanism (using XSLT transformation) was causing out-of-memory errors, due to XSLT's behavior of loading both the input and output objects into memory.
Instance Attribute Summary
-
#digital_object_id ⇒ String
The druid of the object, which gets inserted in the root element of the output.
-
#ios ⇒ IO
The output stream for the result.
Instance Method Summary
-
#characters(string) ⇒ void
This method is called by the SAX parser when a text node is encountered.
-
#end_element(tag) ⇒ void
This method is called by the SAX parser at the end of an element.
-
#file_wrapper_close ⇒ void
Append a closing </file> tag to the output, but first insert a textMD stanza if the file has a text format.
-
#file_wrapper_open(attrs) ⇒ void
Append a <file> element to the output, setting the id attribute to the file path.
-
#initialize ⇒ JhoveTechnicalMetadata
constructor
A new instance of JhoveTechnicalMetadata.
-
#jhove_close(tag) ⇒ void
Output a closing tag, preceded by cached data, if such exists.
-
#jhove_open(tag, attrs) ⇒ void
Copy this jhove element tag and its attributes verbatim.
-
#linebreak_close(tag) ⇒ void
Look for the LineEndings name/value pair, which is spread across multiple elements.
-
#linebreak_open(tag) ⇒ void
Keep clearing the text cache any time a new element is encountered.
-
#mix_close(tag) ⇒ void
Output a closing tag, preceded by cached data, if such exists.
-
#mix_open(tag) ⇒ void
Copy any MIX data verbatim.
-
#output(string) ⇒ void
Append the specified string to the output stream.
-
#output_file=(pathname) ⇒ void
Opens the output stream pointing to the specified file.
-
#output_textmd(linebreak) ⇒ void
Output a textMD section within the properties element.
-
#properties_close ⇒ void
Appending of a closing tag is handled elsewhere.
-
#properties_open ⇒ void
Output a <properties> element if one was encountered in the input, then ignore most input data from within the properties element, except MIX data and the LineEndings property.
-
#root_close ⇒ void
Add the closing element of the output document.
-
#root_open(attrs) ⇒ void
Create the <technicalMetadata> root element of the XML output and include namespace declarations.
-
#start_element(tag, attrs = []) ⇒ void
This method is called by the SAX parser at the beginning of an element.
Constructor Details
#initialize ⇒ JhoveTechnicalMetadata
Returns a new instance of JhoveTechnicalMetadata.
# File 'lib/jhove_technical_metadata.rb', line 17
def initialize()
  @indent = 0
  @ios = STDOUT #File.open(STDOUT, 'w')
end
Instance Attribute Details
#digital_object_id ⇒ String
Returns the druid of the object, which gets inserted in the root element of the output.
# File 'lib/jhove_technical_metadata.rb', line 15
def digital_object_id
  @digital_object_id
end
#ios ⇒ IO
Returns the output stream for the result.
# File 'lib/jhove_technical_metadata.rb', line 12
def ios
  @ios
end
Instance Method Details
#characters(string) ⇒ void
This method returns an undefined value.
This method is called by the SAX parser when a text node is encountered.
# File 'lib/jhove_technical_metadata.rb', line 64
def characters(string)
  @text = string
end
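SAX parsers may report the text of a single node through several `characters` calls, and because the method above assigns rather than appends, only the last chunk would be kept in that situation. The following is a minimal sketch of an accumulating variant; `TextBuffer` and `take` are hypothetical names used for illustration and are not part of JhoveTechnicalMetadata.

```ruby
# Hypothetical helper, not part of JhoveTechnicalMetadata: collects text
# chunks so that a node split across several callbacks is kept whole.
class TextBuffer
  def initialize
    @text = nil
  end

  # Mirrors the SAX `characters` callback, but appends instead of overwriting.
  def characters(string)
    @text = (@text || '') + string
  end

  # Returns the accumulated text (or nil) and clears the buffer,
  # which is what the handler does with @text when an element closes.
  def take
    text = @text
    @text = nil
    text
  end
end

buffer = TextBuffer.new
buffer.characters('UTF')
buffer.characters('-8')
format_name = buffer.take
puts format_name
```

Whether splitting actually occurs for JHOVE output depends on the parser and the input, so this is a defensive pattern rather than a confirmed fix.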
#end_element(tag) ⇒ void
This method returns an undefined value.
This method is called by the SAX parser at the end of an element.
# File 'lib/jhove_technical_metadata.rb', line 70
def end_element(tag)
  case tag
  when 'jhove'
    root_close
  when 'repInfo'
    file_wrapper_close
  when 'properties'
    properties_close
  else
    if tag[0..2] == 'mix'
      mix_close(tag)
    elsif @in_jhove
      jhove_close(tag)
    elsif @in_properties
      linebreak_close(tag)
    end
  end
end
#file_wrapper_close ⇒ void
This method returns an undefined value.
Append a closing </file> tag to the output, but first insert a textMD stanza if the file has a text format.
# File 'lib/jhove_technical_metadata.rb', line 123
def file_wrapper_close
  case @format
  when 'ASCII', 'HTML','TEXT','UTF-8'
    output_textmd(@linebreak)
  end
  @indent -= 1
  output " </jhove:properties>" if @in_properties
  output "</file>"
  @in_jhove = false
  @in_properties=false
end
#file_wrapper_open(attrs) ⇒ void
This method returns an undefined value.
Append a <file> element to the output, setting the id attribute to the file path.
# File 'lib/jhove_technical_metadata.rb', line 113
def file_wrapper_open(attrs)
  filepath=nil
  attrs.each { |attr| filepath=attr[1] if attr[0]=='uri'}
  output "<file id='#{filepath}'>"
  @indent += 1
  @in_jhove = true
end
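The method above interpolates the `uri` attribute value directly into a single-quoted XML attribute. A path containing an apostrophe or ampersand would then produce malformed output. The sketch below shows one way the value could be escaped first; `escape_xml_attr` and `XML_ATTR_ESCAPES` are hypothetical names, and the sample path is invented for illustration.

```ruby
# Hypothetical helper (not in the original class): escapes the characters
# that are unsafe inside an XML attribute value.
XML_ATTR_ESCAPES = {
  '&' => '&amp;',
  '<' => '&lt;',
  '>' => '&gt;',
  "'" => '&apos;',
  '"' => '&quot;'
}.freeze

def escape_xml_attr(value)
  value.to_s.gsub(/[&<>'"]/, XML_ATTR_ESCAPES)
end

# attrs uses the same [name, value] pair layout that file_wrapper_open reads.
attrs = [['uri', "reports/Q&A 'final'.txt"]]
filepath = attrs.assoc('uri')&.last
file_tag = "<file id='#{escape_xml_attr(filepath)}'>"
puts file_tag
```

The same helper could be applied where `jhove_open` builds its attribute string, since that method uses the same single-quoted interpolation.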
#jhove_close(tag) ⇒ void
This method returns an undefined value.
Output a closing tag, preceded by cached data, if such exists.
# File 'lib/jhove_technical_metadata.rb', line 157
def jhove_close(tag)
  if @text && tag == @jhove_tag
    output "<jhove:#{@jhove_tag}#{@jhove_attrs}>#{@text}</jhove:#{tag}>"
  elsif tag == @jhove_tag
    output "<jhove:#{@jhove_tag}#{@jhove_attrs}/>"
  else
    @indent -=1
    output "</jhove:#{tag}>"
  end
  @format = @text if tag == 'format'
  @text = nil
  @jhove_tag = nil
  @jhove_attrs=""
end
#jhove_open(tag, attrs) ⇒ void
This method returns an undefined value.
Copy this jhove element tag and its attributes verbatim.
# File 'lib/jhove_technical_metadata.rb', line 139
def jhove_open(tag, attrs)
  if @jhove_tag # saved previously
    # we encountered a new element so output what was previously cached
    output "<jhove:#{@jhove_tag}#{@jhove_attrs}>"
    @indent += 1
  end
  # cache the element name and its attributes
  @jhove_tag = tag
  @jhove_attrs = ""
  attrs.each do |attr|
    @jhove_attrs += " #{attr[0]}='#{attr[1]}'"
  end
  @text = nil
  @linebreak='LF'
end
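`jhove_open` and `jhove_close` work as a pair: an opening tag is cached rather than written immediately, so the handler can decide later whether the element is a container, a leaf with text, or empty. The following is a simplified sketch of that deferred-emission idea, without the `jhove:` prefix or attribute handling; `DeferredTagWriter` and its method names are hypothetical and the two-space indent is an assumption.

```ruby
require 'stringio'

# Hypothetical, simplified stand-in for the jhove_open/jhove_close pair.
class DeferredTagWriter
  attr_reader :io

  def initialize(io = StringIO.new)
    @io = io
    @indent = 0
    @pending = nil # name of the element whose opening tag is still cached
    @text = nil
  end

  def open(tag)
    if @pending
      # A child arrived, so the cached element is a container: emit it now.
      emit "<#{@pending}>"
      @indent += 1
    end
    @pending = tag
    @text = nil
  end

  def text(string)
    @text = string
  end

  def close(tag)
    if tag == @pending && @text
      emit "<#{tag}>#{@text}</#{tag}>" # leaf with text, written on one line
    elsif tag == @pending
      emit "<#{tag}/>"                 # element with no content
    else
      @indent -= 1
      emit "</#{tag}>"                 # container whose opening tag was emitted earlier
    end
    @pending = nil
    @text = nil
  end

  private

  def emit(string)
    @io.puts('  ' * @indent + string)
  end
end

writer = DeferredTagWriter.new
writer.open('repInfo')
writer.open('format')
writer.text('UTF-8')
writer.close('format')
writer.open('note')
writer.close('note')
writer.close('repInfo')
puts writer.io.string
```

Caching a single pending tag is enough here because a decision is only ever needed for the most recently opened element.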
#linebreak_close(tag) ⇒ void
This method returns an undefined value.
Look for the LineEndings name/value pair, which is spread across multiple elements.
# File 'lib/jhove_technical_metadata.rb', line 222
def linebreak_close(tag)
  case tag
  when 'name'
    @in_line_endings = false
    @in_line_endings = true if @text == 'LineEndings'
  when 'value'
    @linebreak = @text if @in_line_endings
    @in_line_endings = false
  end
end
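Because the property name and its value arrive in separate elements, the method above keeps a flag between callbacks. The sketch below replays that small state machine on a hand-written event sequence; `LineEndingScanner` is a hypothetical class, and the sample property names and values are illustrative rather than taken from real JHOVE output.

```ruby
# Hypothetical, stripped-down version of the LineEndings lookup.
class LineEndingScanner
  attr_reader :linebreak

  def initialize
    @linebreak = 'LF' # same default that jhove_open assigns
    @in_line_endings = false
    @text = nil
  end

  def characters(string)
    @text = string
  end

  def end_element(tag)
    case tag
    when 'name'
      # Arm the flag only when the property is the one we want.
      @in_line_endings = (@text == 'LineEndings')
    when 'value'
      @linebreak = @text if @in_line_endings
      @in_line_endings = false
    end
  end
end

scanner = LineEndingScanner.new
events = [
  ['name', 'Charset'], ['value', 'US-ASCII'],
  ['name', 'LineEndings'], ['value', 'CRLF']
]
events.each do |tag, text|
  scanner.characters(text)
  scanner.end_element(tag)
end
puts scanner.linebreak
```

The value of an unrelated property is ignored because the flag is reset on every `name` element that does not match.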
#linebreak_open(tag) ⇒ void
This method returns an undefined value.
Keep clearing the text cache any time a new element is encountered.
# File 'lib/jhove_technical_metadata.rb', line 216
def linebreak_open(tag)
  @text = nil if @text
end
#mix_close(tag) ⇒ void
This method returns an undefined value.
Output a closing tag, preceded by cached data, if such exists.
# File 'lib/jhove_technical_metadata.rb', line 201
def mix_close(tag)
  if @text && tag == @mix_tag
    output "<#{tag}>#{@text}</#{tag}>"
  elsif tag == @mix_tag
    output "<#{tag}/>"
  else
    @indent -=1
    output "</#{tag}>"
  end
  @text = nil
  @mix_tag = nil
end
#mix_open(tag) ⇒ void
This method returns an undefined value.
Copy any MIX data verbatim.
# File 'lib/jhove_technical_metadata.rb', line 188
def mix_open(tag)
  if @mix_tag
    # we encountered a new element so output what was previously cached
    output "<#{@mix_tag}>"
    @indent += 1
  end
  # cache the element name
  @mix_tag = tag
  @text = nil
end
#output(string) ⇒ void
This method returns an undefined value.
Append the specified string to the output stream.
# File 'lib/jhove_technical_metadata.rb', line 30
def output(string)
  @ios.puts " "*@indent + string
end
#output_file=(pathname) ⇒ void
This method returns an undefined value.
Opens the output stream pointing to the specified file.
# File 'lib/jhove_technical_metadata.rb', line 24
def output_file=(pathname)
  @ios = pathname.open('w')
end
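Since the setter calls `open('w')` on its argument, it appears to expect a `Pathname` (or another object with a public `open` method) rather than a plain String. The sketch below shows the equivalent steps outside the class; the temporary directory and file name are assumptions made for the example.

```ruby
require 'pathname'
require 'tmpdir'

written = nil
Dir.mktmpdir do |dir|
  # Hypothetical output location; any writable Pathname would do.
  pathname = Pathname.new(dir).join('technicalMetadata.xml')

  ios = pathname.open('w') # the same call output_file= makes to set @ios
  ios.puts '<technicalMetadata/>'
  ios.close                # in the handler, root_close performs this step

  written = pathname.read
end
puts written
```

Note that the stream is only closed when the closing `jhove` element is reached, so an aborted parse may leave the file handle open.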
#output_textmd(linebreak) ⇒ void
This method returns an undefined value.
Output a textMD section within the properties element.
# File 'lib/jhove_technical_metadata.rb', line 235
def output_textmd(linebreak)
  indent = @indent
  @indent = 0
  if @in_properties
    # properties element tags provided by other code
    output <<-EOF
      <textmd:textMD>
        <textmd:character_info>
          <textmd:byte_order>big</textmd:byte_order>
          <textmd:byte_size>8</textmd:byte_size>
          <textmd:character_size>1</textmd:character_size>
          <textmd:linebreak>#{linebreak}</textmd:linebreak>
        </textmd:character_info>
      </textmd:textMD>
    EOF
  else
    # there were no properties elements in the input, so we must supply them ourselves
    output <<-EOF
    <jhove:properties>
      <textmd:textMD>
        <textmd:character_info>
          <textmd:byte_order>big</textmd:byte_order>
          <textmd:byte_size>8</textmd:byte_size>
          <textmd:character_size>1</textmd:character_size>
          <textmd:linebreak>#{linebreak}</textmd:linebreak>
        </textmd:character_info>
      </textmd:textMD>
    </jhove:properties>
    EOF
  end
  @indent = indent
end
#properties_close ⇒ void
This method returns an undefined value.
Only decreases the indent level; the closing tag is appended by #file_wrapper_close.
# File 'lib/jhove_technical_metadata.rb', line 182
def properties_close
  @indent -= 1
end
#properties_open ⇒ void
This method returns an undefined value.
Output a <properties> element if one was encountered in the input, then ignore most input data from within the properties element, except MIX data and the LineEndings property.
# File 'lib/jhove_technical_metadata.rb', line 174
def properties_open
  output "<jhove:properties>"
  @indent += 1
  @in_jhove = false
  @in_properties=true
end
#root_close ⇒ void
This method returns an undefined value.
Add the closing element of the output document and close the output stream.
# File 'lib/jhove_technical_metadata.rb', line 105
def root_close
  @indent -= 1
  output "</technicalMetadata>"
  @ios.close
end
#root_open(attrs) ⇒ void
This method returns an undefined value.
Create the <technicalMetadata> root element of the XML output and include namespace declarations.
# File 'lib/jhove_technical_metadata.rb', line 91
def root_open(attrs)
  if @digital_object_id
    output "<technicalMetadata objectId='#{@digital_object_id}' datetime='#{Time.now.utc.iso8601}'"
  else
    output "<technicalMetadata datetime='#{Time.now.utc.iso8601}'"
  end
  @indent += 2
  output "xmlns:jhove='http://hul.harvard.edu/ois/xml/ns/jhove'"
  output "xmlns:mix='http://www.loc.gov/mix/v10'"
  output "xmlns:textmd='info:lc/xmlns/textMD-v3' >"
  @indent -= 1
end
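The `datetime` attribute relies on `Time#iso8601`, which on many Ruby versions is only available after `require 'time'`. The sketch below builds the first line of the root element in isolation; `technical_metadata_open_tag` is a hypothetical helper, the druid is a made-up example, and the timestamp is fixed so the result is reproducible.

```ruby
require 'time' # provides Time#iso8601 on Ruby versions where it is not built in

# Hypothetical helper showing how the first line of the root element
# could be assembled; the namespace declarations would follow it.
def technical_metadata_open_tag(digital_object_id = nil, now = Time.now)
  attrs = []
  attrs << "objectId='#{digital_object_id}'" if digital_object_id
  attrs << "datetime='#{now.utc.iso8601}'"
  "<technicalMetadata #{attrs.join(' ')}"
end

fixed_time = Time.utc(2012, 3, 4, 5, 6, 7) # fixed so the result is stable
open_tag = technical_metadata_open_tag('druid:ab123cd4567', fixed_time)
puts open_tag
```

Passing the clock in as a parameter is a design choice of this sketch that makes the output testable; the original method reads `Time.now` directly.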
#start_element(tag, attrs = []) ⇒ void
This method returns an undefined value.
This method is called by the SAX parser at the beginning of an element.
# File 'lib/jhove_technical_metadata.rb', line 37
def start_element(tag, attrs = [])
  case tag
  when 'jhove'
    # <jhove> is the root element of the input
    root_open(attrs)
  when 'repInfo'
    # A <repInfo> element contains the data for each file
    file_wrapper_open(attrs)
  when 'properties'
    # A <properties> element contains the variable data for the file
    properties_open
  else
    if tag[0..2] == 'mix'
      # JHOVE output for image files contains tech md in MIX format that we copy verbatim to output
      mix_open(tag)
    elsif @in_jhove
      # we've encountered one of the JHOVE elements that we want to automatically copy
      jhove_open(tag, attrs)
    elsif @in_properties
      # we're looking for the LineEndings property in the JHOVE output
      linebreak_open(tag)
    end
  end
end
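The dispatch above treats any tag whose first three characters are `mix` as MIX data, which would also match an unrelated element name that happens to start with those letters. A stricter check, sketched below, tests for the namespace prefix including the colon. It assumes the parser reports qualified names such as `mix:ImageWidth`; `mix_tag?` is a hypothetical helper and the sample tag names are illustrative.

```ruby
# Hypothetical predicate: true only for names carrying the "mix:" prefix.
def mix_tag?(tag)
  tag.start_with?('mix:')
end

# Compare the original three-character test with the stricter prefix test.
%w[mix:ImageWidth mixedContent format].each do |tag|
  loose = tag[0..2] == 'mix'
  puts "#{tag}: loose=#{loose} strict=#{mix_tag?(tag)}"
end
```

Whether the looser test ever misfires depends on the element names JHOVE actually emits, so this is a precaution rather than a known defect.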