Class: MARC::XMLReader

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/marc/xmlreader.rb

Overview

The constructor accepts a filename:

reader = MARC::XMLReader.new('/Users/edsu/marc.xml')

or a File object:

reader = MARC::XMLReader.new(File.new('/Users/edsu/marc.xml'))

or any object that responds to read(n):

reader = MARC::XMLReader.new(StringIO.new(xml))
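
The three accepted input forms come down to a small duck-typing check. The following is a minimal sketch of that check using only the standard library; `resolve_handle` is a hypothetical helper written for illustration and is not part of MARC::XMLReader.

```ruby
require "stringio"

# Hypothetical helper mirroring the constructor's input check:
# a String is treated as a path, anything that responds to #read is used as-is.
def resolve_handle(source)
  if source.is_a?(String)
    File.new(source)
  elsif source.respond_to?(:read)
    source
  else
    raise ArgumentError, "must pass in path or File"
  end
end

xml = "<collection xmlns=\"http://www.loc.gov/MARC21/slim\"></collection>"
io = StringIO.new(xml)

handle = resolve_handle(io)
puts handle.equal?(io)   # the StringIO is passed through untouched
puts handle.read(11)     # read(n) returns the next n characters
```

Anything that is neither a String nor readable, such as an Integer, is rejected with an ArgumentError.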

By default, XMLReader uses REXML’s pull parser, but you can swap that out with Nokogiri or jrexml (or let the system choose the ‘best’ one). The :parser option can be either one of the defined constants or that constant’s value.

reader = MARC::XMLReader.new(fh, :parser=>'magic')

It is also possible to set the default parser at the class level so that all subsequent instances use it. Note that best_available only reports the parser name:

MARC::XMLReader.best_available
"nokogiri" # returns parser name, but doesn't set it.

To set the default, use:

MARC::XMLReader.best_available!

or

MARC::XMLReader.nokogiri!
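
How a class-level default interacts with a per-instance option can be sketched with a stand-in class. `DemoReader` below is hypothetical and only imitates the pattern (a class variable for the default, an option that overrides it); it is not the real MARC::XMLReader.

```ruby
# DemoReader is a hypothetical stand-in used only to illustrate the pattern.
class DemoReader
  @@parser = "rexml" # class-wide default

  def self.parser
    @@parser
  end

  # Bang method: changes the default for every later instance.
  def self.nokogiri!
    @@parser = "nokogiri"
  end

  attr_reader :chosen

  def initialize(options = {})
    # A per-instance :parser option wins; otherwise the class default applies.
    @chosen = options[:parser] ? options[:parser].to_s : @@parser
  end
end

before   = DemoReader.new.chosen
DemoReader.nokogiri!
after    = DemoReader.new.chosen
override = DemoReader.new(parser: :rexml).chosen

puts [before, after, override].inspect
```

Instances created before the default changes keep whatever they chose at construction time; only later instances pick up the new default.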

By default, all XML parsers except REXML require the MARC namespace (http://www.loc.gov/MARC21/slim) to be declared in the document. Passing the option `ignore_namespace` with a true value to `new` allows parsing to proceed without it, e.g.,

reader = MARC::XMLReader.new(filename, parser: :nokogiri, ignore_namespace: true)

You can also pass in an error_handler option that will be called if there are any validation errors found when parsing a record.

reader = MARC::XMLReader.new(fh, error_handler: ->(reader, record, block) { ... })

If no error_handler is given, a MARC::RecordException is raised, halting all further parsing.
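
Two possible shapes for such a handler are sketched below. The lambdas take the three arguments shown in the signature above, but they are invoked by hand with stand-in values rather than by the library. What `block` refers to is an assumption here: the sketch treats it as the caller's per-record block.

```ruby
# Sketch only: the handlers are invoked by hand below, not by MARC::XMLReader.
skipped = []

# Arguments follow the signature shown above: reader, record, block.
skip_invalid = lambda do |_reader, record, _block|
  skipped << record # remember the bad record and move on
end

# Assumption: `block` is the caller's per-record block, so calling it
# hands the invalid record to the caller anyway.
keep_invalid = lambda do |_reader, record, block|
  block.call(record)
end

# Hand-rolled invocation with stand-in values (all hypothetical).
seen = []
caller_block = ->(rec) { seen << rec }
skip_invalid.call(:stand_in_reader, "invalid record A", caller_block)
keep_invalid.call(:stand_in_reader, "invalid record B", caller_block)

puts skipped.inspect
puts seen.inspect
```

With the real library, a handler of this shape would be passed to `new` as `error_handler: skip_invalid`.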

Constant Summary

USE_BEST_AVAILABLE = "magic"
USE_REXML = "rexml"
USE_NOKOGIRI = "nokogiri"
USE_JREXML = "jrexml"
USE_JSTAX = "jstax"
USE_LIBXML = "libxml"
@@parser = USE_REXML

Constructor Details

#initialize(file, options = {}) ⇒ XMLReader

Returns a new instance of XMLReader.



# File 'lib/marc/xmlreader.rb', line 58

def initialize(file, options = {})
  if file.is_a?(String)
    handle = File.new(file)
  elsif file.respond_to?(:read, 5)
    handle = file
  else
    raise ArgumentError, "must pass in path or File"
  end
  @handle = handle

  if options[:ignore_namespace]
    @ignore_namespace = options[:ignore_namespace]
  end

  parser = if options[:parser]
    self.class.choose_parser(options[:parser].to_s)
  else
    @@parser
  end

  case parser
  when "magic" then extend MagicReader
  when "rexml" then extend REXMLReader
  when "jrexml"
    raise ArgumentError, "jrexml only available under jruby" unless defined? JRUBY_VERSION
    extend JREXMLReader
  when "nokogiri" then extend NokogiriReader
  when "jstax"
    raise ArgumentError, "jstax only available under jruby" unless defined? JRUBY_VERSION
    extend JRubySTAXReader
  when "libxml"
    extend LibXMLReader
    raise ArgumentError, "libxml not available under jruby" if defined? JRUBY_VERSION
  end

  @error_handler = options[:error_handler]
end

Instance Attribute Details

#error_handler ⇒ Object (readonly)

Returns the value of attribute error_handler.



# File 'lib/marc/xmlreader.rb', line 56

def error_handler
  @error_handler
end

#parser ⇒ Object (readonly)

Returns the value of attribute parser.



# File 'lib/marc/xmlreader.rb', line 56

def parser
  @parser
end

Class Method Details

.best_available ⇒ Object

Returns the value of the best available parser



# File 'lib/marc/xmlreader.rb', line 117

def best_available
  parser = nil
  if defined? JRUBY_VERSION
    unless parser
      begin
        require "nokogiri"
        parser = USE_NOKOGIRI
      rescue LoadError
      end
    end
    unless parser
      begin
        # try to find the class, so we throw an error if not found
        java.lang.Class.forName("javax.xml.stream.XMLInputFactory")
        parser = USE_JSTAX
      rescue java.lang.ClassNotFoundException
      end
    end
    unless parser
      begin
        require "jrexml"
        parser = USE_JREXML
      rescue LoadError
      end
    end
  else
    begin
      require "nokogiri"
      parser = USE_NOKOGIRI
    rescue LoadError
    end
    unless defined? JRUBY_VERSION
      unless parser
        begin
          require "xml"
          parser = USE_LIBXML
        rescue LoadError
        end
      end
    end
  end
  parser ||= USE_REXML
  parser
end
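
The method above follows a "try to require, fall back on LoadError" pattern. A minimal generic sketch of that pattern is shown below; `first_loadable` and the library name `no_such_library_for_demo` are hypothetical and exist only for illustration.

```ruby
# Hypothetical helper: try each library in order and return the label of
# the first one that can be required; otherwise return the fallback.
def first_loadable(candidates, fallback)
  candidates.each do |lib, label|
    begin
      require lib
      return label
    rescue LoadError
      next # library not installed, try the next candidate
    end
  end
  fallback
end

# "stringio" ships with Ruby, so it stands in for an installed library.
choice = first_loadable(
  [["no_such_library_for_demo", "missing"], ["stringio", "stringio"]],
  "rexml"
)
puts choice

# When nothing loads, the fallback is returned.
puts first_loadable([["no_such_library_for_demo", "missing"]], "rexml")
```

Ordering the candidate list is what encodes the preference, in the same way the method above tries Nokogiri before the alternatives.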

.best_available! ⇒ Object

Sets the best available parser as the default



# File 'lib/marc/xmlreader.rb', line 163

def best_available!
  @@parser = best_available
end

.choose_parser(p) ⇒ Object

Raises:

  • (ArgumentError) if p does not match the value of any USE_ constant


# File 'lib/marc/xmlreader.rb', line 182

def choose_parser(p)
  match = false
  constants.each do |const|
    next unless const.to_s.match?("^USE_")
    if const_get(const) == p
      match = true
      return p
    end
  end
  raise ArgumentError.new("Parser '#{p}' not defined") unless match
end
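
The lookup above validates a parser name against the values of the USE_ constants. The same idea can be sketched in a self-contained form; `DemoParsers` and its `choose` method are hypothetical stand-ins, not part of the library.

```ruby
# DemoParsers is a hypothetical stand-in for a class that defines USE_ constants.
module DemoParsers
  USE_REXML = "rexml"
  USE_NOKOGIRI = "nokogiri"
  OTHER = "ignored" # not a USE_ constant, so never a valid parser name

  def self.choose(name)
    # Collect the values of every constant whose name starts with USE_.
    values = constants.grep(/\AUSE_/).map { |const| const_get(const) }
    return name if values.include?(name)

    raise ArgumentError, "Parser '#{name}' not defined"
  end
end

puts DemoParsers.choose("nokogiri")
puts DemoParsers.choose(:rexml.to_s) # symbols must be converted to strings first

begin
  DemoParsers.choose("ignored")
rescue ArgumentError => e
  puts e.message
end
```

Converting the option with `to_s` before the lookup, as the constructor does, is what lets callers pass either a symbol or a string.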

.jrexml! ⇒ Object

Sets jrexml as the default parser



# File 'lib/marc/xmlreader.rb', line 173

def jrexml!
  @@parser = USE_JREXML
end

.nokogiri! ⇒ Object

Sets Nokogiri as the default parser



# File 'lib/marc/xmlreader.rb', line 168

def nokogiri!
  @@parser = USE_NOKOGIRI
end

.parser ⇒ Object

Returns the currently set parser type



# File 'lib/marc/xmlreader.rb', line 97

def parser
  @@parser
end

.parser=(p) ⇒ Object

Sets the class parser



# File 'lib/marc/xmlreader.rb', line 112

def parser=(p)
  @@parser = choose_parser(p)
end

.parsers ⇒ Object

Returns an array of the names of all defined USE_ parser constants



# File 'lib/marc/xmlreader.rb', line 102

def parsers
  p = []
  constants.each do |const|
    next unless const.match?("^USE_")
    p << const
  end
  p
end
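
Because the method collects the entries of `constants`, the result holds constant names (Symbols) rather than the parser strings themselves. The sketch below shows the difference on a hypothetical stand-in module, `DemoConstants`, which is not part of the library.

```ruby
# DemoConstants is a hypothetical stand-in with two USE_ constants.
module DemoConstants
  USE_REXML = "rexml"
  USE_NOKOGIRI = "nokogiri"
  VERSION = "0.0" # filtered out: name does not start with USE_
end

# Same filter as the method above: keep constants named USE_*.
names = DemoConstants.constants.select { |const| const.match?("^USE_") }

# Map the names to their string values when the parser names are needed.
values = names.map { |const| DemoConstants.const_get(const) }

puts names.inspect
puts values.inspect
```

Mapping through `const_get` is one way to turn the returned names into values that the :parser option accepts.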

.rexml! ⇒ Object

Sets REXML as the default parser



# File 'lib/marc/xmlreader.rb', line 178

def rexml!
  @@parser = USE_REXML
end