Class: MARC::UnsafeXMLWriter

Inherits:
XMLWriter
Defined in:
lib/marc/unsafe_xmlwriter.rb

Overview

UnsafeXMLWriter bypasses real XML libraries such as REXML or Nokogiri and just concatenates strings to produce the XML document. It makes no guarantee of validity if the MARC record you’re encoding isn’t valid, and it won’t do things like entity expansion. It does escape element text using Ruby’s String#encode(xml: :text), although tag, indicator, and subfield-code attribute values are inserted as-is. In return it’s much, much faster: 4-5 times faster than using Nokogiri, and 15-20 times faster than the REXML version.
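The escaping mentioned above is plain Ruby: String#encode with the xml: option. This standalone sketch (standard library only, no MARC objects involved) shows what xml: :text escapes, what it leaves alone, and how xml: :attr differs:

```ruby
# Element text: xml: :text escapes &, < and >.
text = "Smith & Sons <1st ed.>".encode(xml: :text)
# text == "Smith &amp; Sons &lt;1st ed.&gt;"

# Double quotes pass through xml: :text unchanged, so the result is
# not safe to place inside a quoted attribute value.
quoted = %(say "hi").encode(xml: :text)
# quoted == %(say "hi")

# xml: :attr also escapes " and wraps the value in double quotes.
attr = %(say "hi").encode(xml: :attr)
# attr == %("say &quot;hi&quot;")

puts text, quoted, attr
```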

Constant Summary

XML_HEADER =
'<?xml version="1.0" encoding="UTF-8"?>'
NS_ATTRS =
%(xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/MARC21/slim" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd")
NS_COLLECTION =
"<collection #{NS_ATTRS}>".freeze
COLLECTION =
"<collection>".freeze
NS_RECORD =
"<record #{NS_ATTRS}>".freeze
RECORD =
"<record>".freeze

Constants inherited from XMLWriter

XMLWriter::COLLECTION_TAG
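The namespaced opening tags are built once by interpolating NS_ATTRS and are then frozen. The sketch below re-declares two of the constants locally (rather than loading them from the marc gem) to show why code that appends to them has to work on a copy, which is what the .dup calls in .encode and .single_record_document are for:

```ruby
# Re-declared locally for illustration; in the gem these live on
# MARC::UnsafeXMLWriter.
NS_ATTRS = %(xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.loc.gov/MARC21/slim" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd")
NS_RECORD = "<record #{NS_ATTRS}>".freeze

# Appending to the frozen opening tag raises FrozenError (Ruby 2.5+)...
frozen_error = nil
begin
  NS_RECORD << "<leader>"
rescue FrozenError => e
  frozen_error = e
end

# ...so a builder must start from a mutable copy.
buffer = NS_RECORD.dup
buffer << "<leader>"

puts frozen_error.class # FrozenError
puts NS_RECORD.frozen?  # true
puts buffer.frozen?     # false
```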

Class Method Summary

Instance Method Summary

Methods inherited from XMLWriter

#close, fix_leader, #initialize, #stylesheet_tag

Constructor Details

This class inherits a constructor from MARC::XMLWriter

Class Method Details

.encode(record, include_namespace: true) ⇒ String

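Take a record and turn it into a MARC-XML string. Note that this is an XML snippet (a single <record> element), without an XML header or <collection> enclosure. With include_namespace: true (the default), the MARC21 namespace attributes are placed on the <record> tag.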



# File 'lib/marc/unsafe_xmlwriter.rb', line 58

def encode(record, include_namespace: true)
  xml = open_record(include_namespace: include_namespace).dup

  # MARCXML only allows alphanumerics or spaces in the leader
  lead = fix_leader(record.leader)

  xml << "<leader>" << lead.encode(xml: :text) << "</leader>"
  record.each do |f|
    if f.instance_of?(MARC::DataField)
      xml << open_datafield(f.tag, f.indicator1, f.indicator2)
      f.each do |sf|
        xml << open_subfield(sf.code) << sf.value.encode(xml: :text) << "</subfield>"
      end
      xml << "</datafield>"
    elsif f.instance_of?(MARC::ControlField)
      xml << open_controlfield(f.tag) << f.value.encode(xml: :text) << "</controlfield>"
    end
  end
  xml << "</record>"
  xml.force_encoding("utf-8")
end
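To see the shape of the output .encode builds, here is a simplified, standalone sketch of the same concatenation strategy. It uses hypothetical Struct stand-ins for MARC::ControlField, MARC::DataField and MARC::Subfield (the real classes come from the marc gem; the stand-in DataField exposes a plain subfields array, whereas .encode iterates with f.each), omits the namespace attributes, and skips the fix_leader clean-up:

```ruby
# Hypothetical stand-ins; not the marc gem's classes.
ControlField = Struct.new(:tag, :value)
Subfield     = Struct.new(:code, :value)
DataField    = Struct.new(:tag, :indicator1, :indicator2, :subfields)

# Simplified concatenation: element text is escaped with xml: :text,
# attribute values are interpolated as-is (as in .encode).
def encode_sketch(leader, fields)
  xml = +"<record>" # unary plus gives a mutable string
  xml << "<leader>" << leader.encode(xml: :text) << "</leader>"
  fields.each do |f|
    case f
    when DataField
      xml << %(<datafield tag="#{f.tag}" ind1="#{f.indicator1}" ind2="#{f.indicator2}">)
      f.subfields.each do |sf|
        xml << %(<subfield code="#{sf.code}">) << sf.value.encode(xml: :text) << "</subfield>"
      end
      xml << "</datafield>"
    when ControlField
      xml << %(<controlfield tag="#{f.tag}">) << f.value.encode(xml: :text) << "</controlfield>"
    end
  end
  xml << "</record>"
end

fields = [
  ControlField.new("001", "12345"),
  DataField.new("245", "1", "0", [Subfield.new("a", "Cats & dogs")])
]
out = encode_sketch("00000nam a2200000 a 4500", fields)
puts out
```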

.open_collection(include_namespace: true) ⇒ String

Return the opening <collection> tag, with or without the MARC21 namespace attributes.

# File 'lib/marc/unsafe_xmlwriter.rb', line 26

def open_collection(include_namespace: true)
  if include_namespace
    NS_COLLECTION
  else
    COLLECTION
  end
end

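.open_controlfield(tag) ⇒ String

Return the opening <controlfield> tag for the given field tag. The tag is interpolated as-is, without escaping.
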
# File 'lib/marc/unsafe_xmlwriter.rb', line 88

def open_controlfield(tag)
  "<controlfield tag=\"#{tag}\">"
end

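.open_datafield(tag, ind1, ind2) ⇒ String

Return the opening <datafield> tag with the given tag and indicators. The values are interpolated as-is, without escaping.
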
# File 'lib/marc/unsafe_xmlwriter.rb', line 80

def open_datafield(tag, ind1, ind2)
  "<datafield tag=\"#{tag}\" ind1=\"#{ind1}\" ind2=\"#{ind2}\">"
end
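open_datafield, like open_controlfield and open_subfield, interpolates its arguments straight into attribute values, so a tag or indicator containing a double quote or < would produce malformed XML. If your input might contain such values, one option is to escape each value with String#encode(xml: :attr), which also supplies the surrounding quotes. The helper below is hypothetical, not part of the marc gem, and assumes String arguments:

```ruby
# Hypothetical hardened variant of open_datafield; not part of the marc gem.
# encode(xml: :attr) escapes &, <, > and " and wraps the value in quotes.
# Assumes tag, ind1 and ind2 are Strings.
def safe_open_datafield(tag, ind1, ind2)
  "<datafield tag=#{tag.encode(xml: :attr)} ind1=#{ind1.encode(xml: :attr)} ind2=#{ind2.encode(xml: :attr)}>"
end

normal = safe_open_datafield("245", "1", "0")  # same output as open_datafield
blank  = safe_open_datafield("650", " ", "0")  # blank indicators are kept
odd    = safe_open_datafield("24\"5", "1", "0") # quote is escaped, not fatal
puts normal, blank, odd
```

The trade-off is an extra string allocation per attribute, which likely eats into the speed advantage this class exists for, so it is only worth it when the input is not trusted to be well-formed.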

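.open_record(include_namespace: true) ⇒ String

Return the opening <record> tag, with or without the MARC21 namespace attributes. The returned string is a frozen constant, so callers that append to it must dup it first, as .encode does.
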
# File 'lib/marc/unsafe_xmlwriter.rb', line 34

def open_record(include_namespace: true)
  if include_namespace
    NS_RECORD
  else
    RECORD
  end
end

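.open_subfield(code) ⇒ String

Return the opening <subfield> tag for the given subfield code. The code is interpolated as-is, without escaping.
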
# File 'lib/marc/unsafe_xmlwriter.rb', line 84

def open_subfield(code)
  "<subfield code=\"#{code}\">"
end

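.single_record_document(record, include_namespace: true) ⇒ String

Produce a complete XML document containing a single record: the XML declaration, a <collection> element (carrying the namespace attributes when include_namespace is true), and the record itself, encoded without its own namespace attributes.
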
# File 'lib/marc/unsafe_xmlwriter.rb', line 45

def single_record_document(record, include_namespace: true)
  xml = XML_HEADER.dup
  xml << open_collection(include_namespace: include_namespace)
  xml << encode(record, include_namespace: false)
  xml << "</collection>".freeze
  xml
end
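Because single_record_document passes include_namespace: false to .encode, the namespace is declared once, on <collection>, and the nested <record> inherits it. A standalone sketch of that assembly, assuming the record body is passed in as an already-encoded, namespace-free string (a simplification; the real method encodes a MARC::Record) and using a shortened namespace attribute for readability:

```ruby
module SingleDocSketch
  XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'
  # Shortened for readability; the real NS_ATTRS also declares
  # xmlns:xsi and xsi:schemaLocation.
  NS_COLLECTION = '<collection xmlns="http://www.loc.gov/MARC21/slim">'.freeze

  # record_xml is assumed to be a namespace-free "<record>...</record>" string.
  def self.single_record_document(record_xml)
    xml = XML_HEADER.dup # copy so the constant is never mutated
    xml << NS_COLLECTION
    xml << record_xml
    xml << "</collection>"
    xml
  end
end

doc = SingleDocSketch.single_record_document('<record><controlfield tag="001">1</controlfield></record>')
puts doc
```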

Instance Method Details

#write(record) ⇒ Object

Encode the record with .encode (namespace attributes included) and write the result to the target.

# File 'lib/marc/unsafe_xmlwriter.rb', line 20

def write(record)
  @fh.write(self.class.encode(record))
end
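#write delegates all formatting to the class-level .encode and writes the result to the output handle @fh set up by the inherited constructor. A standalone sketch of that delegation pattern, using StringIO in place of a file, plain strings in place of MARC records, and a hypothetical MiniWriter class instead of MARC::UnsafeXMLWriter:

```ruby
require "stringio"

# Hypothetical miniature writer illustrating the delegation in #write;
# it is not the marc gem's class.
class MiniWriter
  def initialize(io)
    @fh = io
  end

  # Class-level encoder, called from #write via self.class.encode.
  def self.encode(record)
    "<record>" + record.encode(xml: :text) + "</record>"
  end

  def write(record)
    @fh.write(self.class.encode(record))
  end
end

io = StringIO.new
writer = MiniWriter.new(io)
writer.write("a < b")
writer.write("c & d")
puts io.string
```

Because #write calls self.class.encode rather than a fixed method, a subclass that overrides .encode changes what #write emits without redefining #write.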