Class: ONIX::Reader

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/onix/reader.rb

Overview

This is the primary class for reading data from an ONIX file, and there’s really not much to it.

Each file should contain a single header and one or more products:

reader = ONIX::Reader.new("somefile.xml")

puts reader.header.inspect

reader.each do |product|
  puts product.inspect
end
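
Since ONIX::Reader includes Enumerable and defines #each, the usual Enumerable methods (map, first, count, select and so on) should also be available on a reader. The sketch below illustrates this with a hypothetical StubReader that yields placeholder strings, so it runs without the onix gem or a real ONIX file; with a real reader, each yielded item would be an ONIX::Product.

```ruby
# StubReader is a hypothetical stand-in for ONIX::Reader. Like the real
# class, it includes Enumerable and implements #each by yielding one
# object per product; the strings below are placeholder data.
class StubReader
  include Enumerable

  def initialize(products)
    @products = products
  end

  # Enumerable derives #map, #first, #count, #select, etc. from this method.
  def each
    @products.each { |product| yield product }
  end
end

reader = StubReader.new(%w[product-1 product-2 product-3])

upcased   = reader.map(&:upcase)                   # every product, transformed
first_two = reader.first(2)                        # stops after two products
matches   = reader.count { |p| p.end_with?("2") }  # products passing a test

puts upcased.inspect
puts first_two.inspect
puts matches
```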

The header will be returned as an ONIX::Header object, and each product as an ONIX::Product.

The ONIX::Product class can be a bit of a hassle to work with, as data can be nested in it fairly deeply. To wrap every product returned by the reader in a shim that provides simple accessors for common attributes, pass the shim class as the :product_class option:

reader = ONIX::Reader.new("somefile.xml", :product_class => ONIX::APAProduct)

puts reader.header.inspect

reader.each do |product|
  puts product.inspect
end

APA stands for Australian Publishers Association; ONIX::APAProduct provides simple access to the ONIX attributes commonly used in the Australian market.
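
Judging by the #each source further down, a product class only needs to respond to from_xml(str), which receives the outer XML of a Product element, and to new with no arguments, which is used when that XML is nil. The following is a minimal sketch of a custom shim under that assumption; RecordRefShim and its record_reference accessor are hypothetical, and the regular expression is a simplistic placeholder for a proper XML parser.

```ruby
# Hypothetical shim class. Only the from_xml/new contract is taken from
# ONIX::Reader#each; everything else is illustrative.
class RecordRefShim
  attr_reader :xml

  # Called by the reader with the outer XML of each <Product> element.
  def self.from_xml(str)
    new(str)
  end

  # Called with no arguments when the reader has no XML for a product.
  def initialize(xml = nil)
    @xml = xml
  end

  # Text of the first <RecordReference> element, or nil if there is none.
  # A regex is used only to keep the sketch dependency-free.
  def record_reference
    return nil if @xml.nil?

    @xml[%r{<RecordReference>([^<]*)</RecordReference>}, 1]
  end
end

product = RecordRefShim.from_xml(
  "<Product><RecordReference>ABC123</RecordReference></Product>"
)
puts product.record_reference
```

With the onix gem loaded, such a class would be passed the same way as APAProduct: `ONIX::Reader.new("somefile.xml", :product_class => RecordRefShim)`.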

As well as the file header, there are a handful of other read-only attributes that might be useful:

reader = ONIX::Reader.new("somefile.xml")

puts reader.version
puts reader.xml_lang
puts reader.xml_version
puts reader.encoding

The version attribute is particularly useful. There are multiple revisions of the ONIX spec, and you may need to handle the file differently depending on its version.
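
As a sketch of branching on the version, the hypothetical helper below maps a version value to a processing strategy. It assumes the value looks like "2.1" or 3.0 (a String or a Float); the exact type and format returned by reader.version are not documented on this page, and the handler symbols are placeholders.

```ruby
# Hypothetical helper: choose a processing strategy from an ONIX version.
# Assumes versions such as "2.1" or 3.0; only the major number is used.
def handler_for(version)
  major = version.to_s.split(".").first.to_i

  case major
  when 2 then :onix2_handler
  when 3 then :onix3_handler
  else
    raise ArgumentError, "unsupported ONIX version: #{version.inspect}"
  end
end

puts handler_for("2.1")
puts handler_for(3.0)
```

In real code the call would be `handler_for(reader.version)`, with the chosen strategy deciding how products are processed.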

File Encoding

ONIX::Reader returns all strings as UTF-8. The source file’s encoding is detected from the encoding declaration at the top of the file, like so:

<?xml version="1.0" encoding="iso-8859-1"?>

If the encoding declaration is missing, the file is assumed to be UTF-8.
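
The detection rule just described can be sketched with Ruby's standard library alone. This illustrates the rule and is not ONIX::Reader's actual code, which hands the work to Nokogiri; the regular expression is a simplified assumption that ignores byte-order marks and UTF-16 input.

```ruby
# Matches the encoding pseudo-attribute of a leading XML declaration,
# e.g. <?xml version="1.0" encoding="iso-8859-1"?>. Simplified: no BOM
# or UTF-16 handling.
DECLARATION = /\A\s*<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']/

# Returns the declared encoding name, or "UTF-8" when none is declared.
def declared_encoding(bytes)
  match = bytes.b.match(DECLARATION)
  match ? match[1] : "UTF-8"
end

# Reinterprets raw bytes in their declared encoding and transcodes to UTF-8.
# Note that the result still contains the original declaration text.
def to_utf8(bytes)
  source = Encoding.find(declared_encoding(bytes))
  bytes.dup.force_encoding(source).encode(Encoding::UTF_8)
end

latin1 = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><Title>caf\xE9</Title>".b
puts declared_encoding(latin1)
puts to_utf8(latin1)
```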

If the encoding declaration is missing or wrong and the file isn’t UTF-8, you can manually set or override it like so:

reader = ONIX::Reader.new("somefile.xml", :encoding => "iso-8859-1")

If the file contains bytes that are invalid in the source encoding, an exception will be raised. This isn’t ideal, but I’m still looking for ways to make this behaviour configurable.
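
Until that becomes configurable, one possible workaround is to clean the bytes yourself before the reader sees them. The hypothetical helper below is not part of ONIX::Reader; it assumes the source is meant to be UTF-8 and uses Ruby's String#scrub to replace each invalid byte sequence with a placeholder character.

```ruby
# Hypothetical pre-processing step, not an ONIX::Reader feature: interpret
# raw bytes as UTF-8 and replace any invalid byte sequences.
def scrub_utf8(bytes, replacement = "?")
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  text.valid_encoding? ? text : text.scrub(replacement)
end

dirty = "<Title>abc\xFFdef</Title>".b  # \xFF can never appear in valid UTF-8
puts scrub_utf8(dirty)
```

Because the constructor shown below accepts either a file path String or an IO object, the cleaned text could then be written to a temporary file and that file's path passed to ONIX::Reader.new. Replacement silently discards the offending bytes, so this trades the exception for possible data loss.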

If you’re running Ruby 1.9, you might imagine that passing ONIX::Reader an IO stream that automatically transcodes to UTF-8 would have the same effect, but that isn’t the case. Nokogiri is used to parse the file, and it seems to ignore the IO’s encoding and just read raw bytes.
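
A likely, but unconfirmed, explanation lies in how Ruby's IO#read behaves: with no argument it applies the stream's transcoding, but when given a length it returns raw bytes as ASCII-8BIT without any conversion. A parser that consumes its input in fixed-size chunks would therefore only ever see the raw bytes. The sketch below demonstrates that Ruby behaviour with a temporary ISO-8859-1 file; it does not involve Nokogiri, and whether Nokogiri reads this way is an assumption. In practice, the :encoding option described above is the way to tell the reader about a non-UTF-8 file.

```ruby
require "tempfile"

# Reads the same file through two IOs that are both set to transcode
# ISO-8859-1 to UTF-8: one read without a length, one with a length.
def read_both_ways(path)
  mode = "r:ISO-8859-1:UTF-8"
  whole = File.open(path, mode) { |io| io.read }      # no length: transcoded
  chunk = File.open(path, mode) { |io| io.read(16) }  # with length: raw bytes
  [whole, chunk]
end

whole = chunk = nil
Tempfile.create("latin1") do |tmp|
  File.binwrite(tmp.path, "caf\xE9".b)  # "café" encoded as ISO-8859-1
  whole, chunk = read_both_ways(tmp.path)
end

puts whole.encoding, whole.bytesize
puts chunk.encoding, chunk.bytesize
```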

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(input, *args) ⇒ Reader

Returns a new instance of Reader.



# File 'lib/onix/reader.rb', line 81

def initialize(input, *args)
  opts = args.last.kind_of?(Hash) ? args.pop : {}
  if args.size > 0
    ActiveSupport::Deprecation.warn("Passing a klass as ONIX::Reader's second argument is deprecated, use the :product_class option instead", caller)
  end
  @product_klass = opts[:product_class] || args.pop || ::ONIX::Product

  if input.kind_of?(String)
    @file   = File.open(input, "r")
    @reader = Nokogiri::XML::Reader(@file, nil, opts[:encoding]) { |cfg| cfg.dtdload.noent }
  elsif input.kind_of?(IO)
    @reader = Nokogiri::XML::Reader(input, nil, opts[:encoding]) { |cfg| cfg.dtdload.noent }
  else
    raise ArgumentError, "Unable to read from file or IO stream"
  end

  @release = find_release
  @header = find_header

  @xml_lang    ||= @reader.lang
  @xml_version ||= @reader.xml_version.to_f
end
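
The first lines of the constructor give the :product_class option precedence over the deprecated positional class argument, with ::ONIX::Product as the fallback. The same logic, extracted into a hypothetical standalone method so it runs without the onix gem, looks like this; Kernel#warn stands in for ActiveSupport::Deprecation.warn, and plain symbols stand in for real classes.

```ruby
# Hypothetical extraction of the argument handling in ONIX::Reader#initialize.
# :default_product stands in for ::ONIX::Product.
def resolve_product_class(*args)
  # A trailing Hash (including keyword-style arguments) is the options hash.
  opts = args.last.kind_of?(Hash) ? args.pop : {}

  # Anything left over is the deprecated positional class argument.
  warn "Passing a class positionally is deprecated, use :product_class" if args.size > 0

  # Option first, then the positional argument, then the default.
  opts[:product_class] || args.pop || :default_product
end

puts resolve_product_class.inspect
puts resolve_product_class(:product_class => :apa_product).inspect
```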

Instance Attribute Details

#header ⇒ Object (readonly)

Returns the value of attribute header.



# File 'lib/onix/reader.rb', line 79

def header
  @header
end

#release ⇒ Object (readonly)

Returns the value of attribute release.



# File 'lib/onix/reader.rb', line 79

def release
  @release
end

Instance Method Details

#close ⇒ Object



# File 'lib/onix/reader.rb', line 119

def close
  @reader.close if @reader
end
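
Since close has to be called explicitly, a block-style wrapper using ensure is one way to guarantee it runs even if product processing raises. with_reader and FakeReader below are hypothetical names; FakeReader only records whether close was called and stands in for an ONIX::Reader. Note also that, going by the source above, close only closes the internal Nokogiri reader; the File opened in the constructor when a path is given (@file) is not closed explicitly here.

```ruby
# Hypothetical helper: yields the reader and always closes it afterwards,
# whether the block finishes normally or raises.
def with_reader(reader)
  yield reader
ensure
  reader.close if reader
end

# Stand-in for ONIX::Reader that just records whether #close was called.
class FakeReader
  def initialize
    @closed = false
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

ok = FakeReader.new
with_reader(ok) { |r| r }  # normal completion

failing = FakeReader.new
begin
  with_reader(failing) { raise "boom" }
rescue RuntimeError
  # the error still propagates, but close has already run
end

puts ok.closed?
puts failing.closed?
```

With the real class, usage would look like `with_reader(ONIX::Reader.new("somefile.xml")) { |reader| reader.each { |product| puts product.inspect } }`.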

#each(&block) ⇒ Object

Iterates over all the products in an ONIX file.



# File 'lib/onix/reader.rb', line 106

def each(&block)
  @reader.each do |node|
    if @reader.node_type == 1 && @reader.name == "Product"
      str = @reader.outer_xml
      if str.nil?
        yield @product_klass.new
      else
        yield @product_klass.from_xml(str)
      end
    end
  end
end
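
The node check in #each matters because a streaming XML reader visits a non-empty element twice, once at its start tag and once at its end tag. In libxml2, which Nokogiri wraps, element starts have node type 1 and element ends type 15, so matching only type 1 yields each Product once. The simulation below replays that logic over hand-made Struct nodes instead of a real Nokogiri::XML::Reader; Node, each_product and EchoProduct are hypothetical names.

```ruby
# Hand-made stand-in for the nodes a streaming XML reader reports.
Node = Struct.new(:node_type, :name, :outer_xml)

ELEMENT     = 1   # libxml2 XML_READER_TYPE_ELEMENT (start tag)
END_ELEMENT = 15  # libxml2 XML_READER_TYPE_END_ELEMENT (end tag)

# Mirrors the filtering in ONIX::Reader#each: only Product start tags are
# turned into product objects, and missing XML yields an empty object.
def each_product(nodes, klass)
  nodes.each do |node|
    next unless node.node_type == ELEMENT && node.name == "Product"

    yield(node.outer_xml.nil? ? klass.new : klass.from_xml(node.outer_xml))
  end
end

# Minimal product class that keeps whatever XML it was built from.
class EchoProduct
  attr_reader :xml

  def self.from_xml(str)
    new(str)
  end

  def initialize(xml = nil)
    @xml = xml
  end
end

nodes = [
  Node.new(ELEMENT, "ONIXMessage", nil),
  Node.new(ELEMENT, "Product", "<Product>first</Product>"),
  Node.new(END_ELEMENT, "Product", nil),
  Node.new(ELEMENT, "Product", nil),
  Node.new(END_ELEMENT, "Product", nil),
  Node.new(END_ELEMENT, "ONIXMessage", nil)
]

products = []
each_product(nodes, EchoProduct) { |product| products << product }
puts products.map(&:xml).inspect
```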