Class: ONIX::Normaliser

Inherits:
Object
Defined in:
lib/onix/normaliser.rb

Overview

A standalone class that normalises ONIX files into a standardised form. If you’re accepting ONIX files from a wide range of suppliers, you’re guaranteed to get all sorts of dialects.

This will create a new file that:

  • is UTF-8 encoded

  • uses reference tags, not short tags

  • has no named entities (ndash, etc) other than &amp;amp; &amp;lt; and &amp;gt;

Usage:

ONIX::Normaliser.process("oldfile.xml", "newfile.xml")

Dependencies:

At this stage the class depends on several external apps, all commonly available on *nix systems: xsltproc, isutf8, iconv and sed
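As a rough illustration of the "is UTF-8 encoded" guarantee, the kind of check a tool like isutf8 performs can be approximated in pure Ruby. This is a sketch only, not how the class works internally; `utf8_bytes?` is a hypothetical helper name.

```ruby
# Hypothetical helper, not part of ONIX::Normaliser.
# Returns true when the given bytes form a valid UTF-8 sequence.
def utf8_bytes?(bytes)
  # dup so the caller's string keeps its original encoding tag
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

utf8_bytes?("caf\xC3\xA9".b) # "é" encoded as UTF-8
utf8_bytes?("caf\xE9".b)     # "é" encoded as ISO-8859-1
```

A file that fails a check like this would need transcoding (for example with iconv) before it can be treated as UTF-8.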

Constructor Details

#initialize(oldfile, newfile) ⇒ Normaliser

Returns a new instance of Normaliser.

Raises:

  • (ArgumentError)


# File 'lib/onix/normaliser.rb', line 39

def initialize(oldfile, newfile)
  raise ArgumentError, "#{oldfile} does not exist" unless File.file?(oldfile)
  raise ArgumentError, "#{newfile} already exists" if File.file?(newfile)
  raise "xsltproc app not found" unless app_available?("xsltproc")
  raise "tr app not found"       unless app_available?("tr")

  @oldfile = oldfile
  @newfile = newfile
  @curfile = next_tempfile
  FileUtils.cp(@oldfile, @curfile)
  @head    = File.open(@oldfile, "r") { |f| f.read(1024) }
end

Class Method Details

.process(oldfile, newfile) ⇒ Object

Normalise oldfile and save it as newfile. oldfile is left untouched.



# File 'lib/onix/normaliser.rb', line 34

def process(oldfile, newfile)
  self.new(oldfile, newfile).run
end

Instance Method Details

#app_available?(app) ⇒ Boolean

Check whether the specified app is available on the system.

Returns:

  • (Boolean)


# File 'lib/onix/normaliser.rb', line 72

def app_available?(app)
  # `which` prints the app's path when it is found and nothing otherwise
  !`which #{app}`.strip.empty?
end
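The same check can be sketched without shelling out to `which`, by scanning the directories in PATH directly. `on_path?` is a hypothetical name used for illustration and is not part of the class.

```ruby
# Hypothetical alternative to app_available?, not part of ONIX::Normaliser.
# Returns true when an executable file named `app` exists in a PATH directory.
def on_path?(app)
  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, app)
    File.file?(candidate) && File.executable?(candidate)
  end
end

on_path?("xsltproc") # result depends on the local system
```

This avoids interpolating the app name into a shell command, at the cost of ignoring shell aliases and builtins.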

#next_tempfile ⇒ Object

Generate a temp filename. The temp file itself is removed before the path is returned.



# File 'lib/onix/normaliser.rb', line 78

def next_tempfile
  p = nil
  Tempfile.open("onix") do |tf|
    p = tf.path
    tf.close!
  end
  p
end
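The behaviour of this method can be shown in isolation: Tempfile picks a unique name, and `close!` closes and unlinks the file, so only the path survives. The sketch below uses a hypothetical method name, `reserve_temp_path`, and assumes nothing beyond Ruby's standard `tempfile` library.

```ruby
require "tempfile"

# Hypothetical stand-alone version of the idea behind next_tempfile.
# Returns a unique path in the temp directory; the file no longer exists.
def reserve_temp_path(prefix = "onix")
  tf = Tempfile.new(prefix)
  path = tf.path
  tf.close! # close and delete the file, keeping only its name
  path
end

path = reserve_temp_path
File.exist?(path) # the path is free for another program to write to
```

Because the file is deleted before the name is used, another process could in principle claim the same path in between. That is usually acceptable for a command-line tool but worth knowing.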

#remove_control_chars(src, dest) ⇒ Object

XML files shouldn’t contain low ASCII control chars. Strip every character below 0x20 except tab, line feed and carriage return.



# File 'lib/onix/normaliser.rb', line 102

def remove_control_chars(src, dest)
  inpath = File.expand_path(src)
  outpath = File.expand_path(dest)
  `cat #{inpath} | tr -d "\\000-\\010\\013\\014\\016-\\037" > #{outpath}`
end
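For comparison, the same character set can be stripped from an in-memory string in pure Ruby. This is a sketch rather than the class's implementation; `strip_control_chars` and `CONTROL_CHARS` are hypothetical names, and the whole input is assumed to fit in memory.

```ruby
# Same ranges as the tr command: 0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F.
# Tab (0x09), line feed (0x0A) and carriage return (0x0D) are kept.
CONTROL_CHARS = "\x00-\x08\x0B\x0C\x0E-\x1F"

# Hypothetical helper, not part of ONIX::Normaliser.
def strip_control_chars(text)
  # Work on a binary copy so invalid byte sequences do not raise,
  # then restore the caller's encoding tag.
  text.b.delete(CONTROL_CHARS).force_encoding(text.encoding)
end

strip_control_chars("a\x00b\tc\n\x1Fd\r") # => "ab\tc\nd\r"
```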

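#run ⇒ Object

Converts short tags to reference tags when the head of the file contains "ONIXmessage", strips control characters, then copies the result to newfile.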



# File 'lib/onix/normaliser.rb', line 52

def run
  # remove short tags
  if @head.include?("ONIXmessage")
    dest = next_tempfile
    to_reference_tags(@curfile, dest)
    @curfile = dest
  end

  # remove control chars
  dest = next_tempfile
  remove_control_chars(@curfile, dest)
  @curfile = dest

  FileUtils.cp(@curfile, @newfile)
end
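The short-tag test in `run` relies on the root element name: ONIX 2.1 short-tag files use `ONIXmessage`, while reference-tag files use `ONIXMessage`, and `include?` is case sensitive. A minimal sketch of that check, with `short_tags?` as a hypothetical helper name and made-up sample headers:

```ruby
# Hypothetical helper mirroring the condition used in run.
# `head` is the first chunk of the file (the class reads 1024 bytes).
def short_tags?(head)
  head.include?("ONIXmessage")
end

short_tags?('<?xml version="1.0"?><ONIXmessage release="2.1">') # short tags
short_tags?('<?xml version="1.0"?><ONIXMessage release="2.1">') # reference tags
```

A file whose root element starts after the first 1024 bytes (for example, after a long comment or DOCTYPE) would not be detected by this approach.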

#to_reference_tags(src, dest) ⇒ Object

Uses an XSLT stylesheet provided by EDItEUR to convert a file from short tags to reference tags.

more detail here:

http://www.editeur.org/files/ONIX%203/ONIX%20tagname%20converter%20v2.htm


# File 'lib/onix/normaliser.rb', line 93

def to_reference_tags(src, dest)
  inpath = File.expand_path(src)
  outpath = File.expand_path(dest)
  xsltpath = File.dirname(__FILE__) + "/../../support/switch-onix-2.1-short-to-reference.xsl"
  `xsltproc -o #{outpath} #{xsltpath} #{inpath}`
end
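The backtick form above interpolates paths straight into a shell command, so a path containing spaces would be split into several arguments. One way around that, sketched here with a hypothetical helper `xsltproc_argv`, is to build an argument list and hand it to `system`, which bypasses the shell when given multiple arguments.

```ruby
# Hypothetical helper, not part of ONIX::Normaliser.
# Builds the same xsltproc invocation as an argument array.
def xsltproc_argv(src, dest, stylesheet)
  [
    "xsltproc",
    "-o", File.expand_path(dest),  # output file
    File.expand_path(stylesheet),  # XSLT stylesheet
    File.expand_path(src)          # input file
  ]
end

argv = xsltproc_argv("/tmp/in file.xml", "/tmp/out.xml", "/tmp/convert.xsl")
# system(*argv) would run xsltproc without a shell, so the space is safe.
```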