Class: Traject::MarcReader

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/traject/marc_reader.rb

Overview

  • "marc_source.type": serialization type. Default 'binary'.
    • "binary": standard ISO 2709 "binary" MARC format; will use ruby-marc MARC::Reader. (Note: if you are using type 'binary', you probably also want to set 'marc_source.encoding'.)
    • "xml": MarcXML; will use ruby-marc MARC::XMLReader.
    • "json" (synonym 'ndj'): the "marc-in-json" format, encoded as newline-separated JSON. This is a simplistic format: no comments are allowed, and no unescaped internal newlines are allowed in the JSON objects, because the reader just reads line by line and assumes each line is one marc-in-json record. See http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/. Will use Traject::NDJReader, which uses MARC::Record.new_from_hash.
  • "marc_source.encoding": only used for marc_source.type 'binary'. Character encoding of the source MARC records; can be any encoding recognized by Ruby, OR 'MARC-8'. For 'MARC-8', content will be transcoded (by ruby-marc) to UTF-8 in internal MARC::Record Strings. Default nil, meaning MARC::Reader uses its own default, which is your system's Encoding.default_external and will probably be UTF-8 (but may be something unexpected or undesired on Windows, where you may want to set this explicitly). Right now Traject::MarcReader is hard-coded to transcode to UTF-8 as an internal encoding.
  • "marc_reader.xml_parser": for the XML type, which XML parser to tell MARC::XMLReader to use. Anything recognized by the MARC::XMLReader :parser argument is accepted. By default, MARC::XMLReader.best_available is used, which is ruby-marc's best guess at the highest-performance parser installed. Probably best to leave as the default.
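The line-by-line rule for the "json" type can be illustrated with a minimal sketch that uses only Ruby's standard json library. `NDJSketchReader` is a hypothetical name, not part of traject: it yields plain Hashes, whereas Traject::NDJReader, as described above, turns each one into a MARC::Record with MARC::Record.new_from_hash. Skipping blank lines is an assumption of this sketch, not documented traject behavior.

```ruby
require "json"
require "stringio"

# Hypothetical sketch, not traject's NDJReader: reads one marc-in-json
# object per line and yields it as a plain Hash.
class NDJSketchReader
  include Enumerable

  def initialize(io)
    @io = io
  end

  def each
    @io.each_line do |line|
      line = line.strip
      next if line.empty? # assumption of this sketch: tolerate blank lines
      # traject would build a MARC::Record here (MARC::Record.new_from_hash)
      yield JSON.parse(line)
    end
  end
end

input = StringIO.new(<<~NDJ)
  {"leader":"00000nam a2200000 a 4500","fields":[{"001":"rec1"},{"245":{"ind1":"1","ind2":"0","subfields":[{"a":"First title"}]}}]}
  {"leader":"00000nam a2200000 a 4500","fields":[{"001":"rec2"}]}
NDJ

hashes = NDJSketchReader.new(input).to_a
hashes.each { |h| puts h["fields"].first["001"] } # prints rec1, then rec2
```

A pretty-printed (multi-line) JSON object would make JSON.parse raise on its first, incomplete line, which is why the format forbids unescaped internal newlines.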

Example

In a configuration file:

require 'traject/marc_reader'

settings do
  provide "reader_class_name", "Traject::MarcReader"
  provide "marc_source.type", "xml"
end
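A similar configuration for binary records in MARC-8, following the advice to set 'marc_source.encoding' alongside type 'binary', might look like this (a sketch that uses only the settings documented above):

```ruby
require 'traject/marc_reader'

# Sketch: ISO 2709 binary MARC whose source encoding is MARC-8;
# ruby-marc transcodes the content to UTF-8 internally.
settings do
  provide "reader_class_name", "Traject::MarcReader"
  provide "marc_source.type", "binary"
  provide "marc_source.encoding", "MARC-8"
end
```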

Constant Summary

@@best_xml_parser = MARC::XMLReader.best_available

The XML parser used for type 'xml' when "marc_reader.xml_parser" is not set.

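Instance Attribute Summary

  • #input_stream ⇒ Object (readonly): Returns the value of attribute input_stream.
  • #settings ⇒ Object (readonly): Returns the value of attribute settings.

Instance Method Summary

  • #initialize(input_stream, settings) ⇒ MarcReader: Returns a new instance of MarcReader.
  • #each(*args, &block) ⇒ Object
  • #internal_reader ⇒ Object: Creates the proper kind of ruby-marc reader, depending on settings.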

Constructor Details

#initialize(input_stream, settings) ⇒ MarcReader

Returns a new instance of MarcReader.

# File 'lib/traject/marc_reader.rb', line 61

def initialize(input_stream, settings)
  @settings = Traject::Indexer::Settings.new settings
  @input_stream = input_stream
end

Instance Attribute Details

#input_stream ⇒ Object (readonly)

Returns the value of attribute input_stream

# File 'lib/traject/marc_reader.rb', line 57

def input_stream
  @input_stream
end

#settings ⇒ Object (readonly)

Returns the value of attribute settings: a Traject::Indexer::Settings wrapping the settings passed to the constructor.

# File 'lib/traject/marc_reader.rb', line 57

def settings
  @settings
end

Instance Method Details

#each(*args, &block) ⇒ Object

Yields each MARC::Record read from the input stream, by delegating to #internal_reader.

# File 'lib/traject/marc_reader.rb', line 86

def each(*args, &block)
  self.internal_reader.each(*args, &block)
end
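Because the class includes Enumerable and defines #each by delegation, the usual Enumerable methods (first, map, count, and so on) come for free. A minimal sketch of that pattern, using a hypothetical `DelegatingReaderSketch` and an Array of strings standing in for the ruby-marc reader and its records:

```ruby
# Hypothetical stand-in: any object with #each can act as the internal reader here.
class DelegatingReaderSketch
  include Enumerable

  def initialize(internal_reader)
    @internal_reader = internal_reader
  end

  # Same shape as Traject::MarcReader#each: forward args and block to the internal reader.
  def each(*args, &block)
    @internal_reader.each(*args, &block)
  end
end

reader = DelegatingReaderSketch.new(%w[rec1 rec2 rec3])
first_two = reader.first(2)     # => ["rec1", "rec2"]
upcased   = reader.map(&:upcase) # => ["REC1", "REC2", "REC3"]
total     = reader.count         # => 3
```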

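#internal_reader ⇒ Object

Creates the proper kind of ruby MARC reader for the configured "marc_source.type": MARC::XMLReader for 'xml', Traject::NDJReader for 'json', and MARC::Reader otherwise. The reader is memoized, so later calls return the same instance.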

# File 'lib/traject/marc_reader.rb', line 68

def internal_reader
  unless defined? @internal_reader
    @internal_reader =
      case settings["marc_source.type"]
      when "xml"
        parser = settings["marc_reader.xml_parser"] || @@best_xml_parser
        MARC::XMLReader.new(self.input_stream, :parser=> parser)
      when 'json'
        Traject::NDJReader.new(self.input_stream, settings)
      else
        args = { :invalid => :replace }
        args[:external_encoding] = settings["marc_source.encoding"]
        MARC::Reader.new(self.input_stream, args)
      end
  end
  return @internal_reader
end
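The selection logic in #internal_reader can be exercised without ruby-marc installed. The sketch below is hypothetical: `DispatchSketch` and `ReaderChoice` are illustrative names, settings are a plain Hash rather than Traject::Indexer::Settings, and it records which reader would be built (and with which options) instead of constructing real readers. It also makes one property of the listed code easy to check: the `case` matches only the literal string "json", so a "marc_source.type" of "ndj" reaches the binary branch unless something outside this method normalizes it (not verified here).

```ruby
# Hypothetical record of which reader would be built, and with what options.
ReaderChoice = Struct.new(:kind, :options)

class DispatchSketch
  # Placeholder for MARC::XMLReader.best_available (assumption of this sketch).
  DEFAULT_XML_PARSER = "best_available"

  attr_reader :settings

  def initialize(settings)
    @settings = settings
  end

  # Mirrors the case statement in #internal_reader, memoized with defined?.
  def internal_reader
    unless defined? @internal_reader
      @internal_reader =
        case settings["marc_source.type"]
        when "xml"
          parser = settings["marc_reader.xml_parser"] || DEFAULT_XML_PARSER
          ReaderChoice.new(:xml, { parser: parser })
        when "json"
          ReaderChoice.new(:ndj, {})
        else
          ReaderChoice.new(:binary, { invalid: :replace,
                                      external_encoding: settings["marc_source.encoding"] })
        end
    end
    @internal_reader
  end
end

p DispatchSketch.new({ "marc_source.type" => "ndj" }).internal_reader.kind
```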