XMLObject

(This is inspired by Python’s xml_objectify)

XMLObject tries to make accessing small, well-formed XML structures convenient, by using dot notation for both attributes and child elements whenever possible.

XML parsing libraries (in general) have interfaces that are useful when one is using XML for its intended purpose, but cumbersome when one always sends the same XML structure, and always processes all of it in the same way. This one aims to be a bit different.

At the moment, I aim for compatibility with Ruby 1.8, 1.9, and JRuby 1.1.

Dependencies

None outside of Ruby, though some optional gems add extra features. See “Adapters” and “Collection pluralization” below.

Installation instructions

gem install xml-object

Or from GitHub’s gem server:

gem install jordi-xml-object --source http://gems.github.com

Both are the same, and are loaded the same way:

require 'xml-object'

Example usage

<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="8" unit="dL">Flour</ingredient>
  <ingredient amount="10" unit="grams">Yeast</ingredient>
  <ingredient amount="4" unit="dL" state="warm">Water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions easy="yes" hard="false">
    <step>Mix all ingredients together.</step>
    <step>Knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Knead again.</step>
    <step>Place in a bread baking tin.</step>
    <step>Cover with a cloth, and leave for one hour in warm room.</step>
    <step>Bake in the oven at 180(degrees)C for 30 minutes.</step>
  </instructions>
</recipe>

require 'xml-object'
recipe = XMLObject.new io_with_recipe_xml_shown_above

recipe.name                      => "bread"
recipe.title                     => "Basic bread"

recipe.ingredients.is_a?(Array)  => true
recipe.ingredients.first.amount  => "8" # Not a Fixnum. Too hard. :(

recipe.instructions.easy?        => true

recipe.instructions.first.upcase => "MIX ALL INGREDIENTS TOGETHER."
recipe.instructions.steps.size   => 7

Motivation

XML is an extensible markup language. It is extensible because it is meant to define markup languages for any type of document, so new tags are needed depending on the problem domain.

Sometimes, however, XML ends up being used to solve a much simpler problem: passing a data structure over the network, and/or between two different languages. Formats like JSON or YAML are a much better fit for this kind of job, but one doesn’t always have that luxury.

Caveats

The dot notation is used as follows. For the given file:

<outer id="root" name="foo">
  <name>Outer Element</name>
</outer>

outer.name is the name element. Child elements are always looked up first, then attributes. To access the attribute in the case of ambiguity, use outer[:attr => 'name'].

outer.id is really Object#id, because all of the object methods are preserved (this is on purpose). To access the attribute id, use outer[:attr => 'id'], or outer['id'] since there’s no element/attribute ambiguity.
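
The lookup order described above can be sketched with plain hashes. This is only an illustration under assumptions: ElementSketch and its two hashes are hypothetical names, not XMLObject's internals.

```ruby
# ElementSketch is a hypothetical stand-in, not XMLObject's real class.
class ElementSketch
  def initialize(elements, attributes)
    @elements   = elements    # child element name => text
    @attributes = attributes  # attribute name     => value
  end

  # sketch['name']          looks at child elements first, then attributes.
  # sketch[:attr => 'name'] skips the elements and reads the attribute.
  def [](key)
    return @attributes[key.fetch(:attr).to_s] if key.is_a?(Hash)

    name = key.to_s
    @elements.key?(name) ? @elements[name] : @attributes[name]
  end

  # Dot notation only gets here for names Ruby does not already define,
  # which is why built-in Object methods keep working.
  def method_missing(name, *args)
    key = name.to_s
    return self[key] if args.empty? && (@elements.key?(key) || @attributes.key?(key))

    super
  end

  def respond_to_missing?(name, include_private = false)
    key = name.to_s
    @elements.key?(key) || @attributes.key?(key) || super
  end
end

outer = ElementSketch.new({ 'name' => 'Outer Element' },
                          { 'id' => 'root', 'name' => 'foo' })

outer.name             # => "Outer Element" (the child element wins)
outer[:attr => 'name'] # => "foo"
outer['id']            # => "root"
```

Because method_missing is only consulted for undefined methods, any name that Ruby objects already respond to has to go through the bracket form instead.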

Features & Problems

Adapters

XMLObject supports different adapters to do the actual XML parsing. It ships with REXML, Hpricot, JREXML, and LibXML adapters. By default, the REXML adapter is used.

To use an adapter other than the REXML default, require XMLObject first, then the adapter you want:

require 'xml-object'                  # Require XMLObject first
require 'xml-object/adapters/hpricot' # (Under MRI or JRuby)
require 'xml-object/adapters/libxml'  # (Under MRI only)
require 'xml-object/adapters/jrexml'  # (Under JRuby only)

Collection auto-folding

Similar to XmlSimple, XMLObject folds same-named elements at the same level into an Array. For example:

<student>
  <name>Bob</name>
  <course>Math</course>
  <course>Biology</course>
</student>

student = XMLObject.new(xml_file)

student.course.is_a? Array       => true
student.course.first == 'Math'   => true
student.course.last  == 'Biology' => true
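
For the curious, the folding idea can be sketched directly on top of REXML. This is a rough illustration, not XMLObject's code: fold_children is a hypothetical helper, and it only keeps the text of each child element.

```ruby
require 'rexml/document'

# Hypothetical helper: group the text of the root's child elements by tag
# name. A name that occurs once maps to a String; a repeated name is folded
# into an Array.
def fold_children(xml)
  root   = REXML::Document.new(xml).root
  groups = Hash.new { |hash, name| hash[name] = [] }
  root.elements.each { |child| groups[child.name] << child.text }

  folded = {}
  groups.each do |name, texts|
    folded[name] = texts.size == 1 ? texts.first : texts
  end
  folded
end

student = fold_children(<<XML)
<student>
  <name>Bob</name>
  <course>Math</course>
  <course>Biology</course>
</student>
XML

student['name']    # => "Bob"
student['course']  # => ["Math", "Biology"]
```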

Collection pluralization

With the same file from the Collection auto-folding section above, you also get this:

student.courses.first == student.course.first => true

Note that the pluralization algorithm just tacks an ‘s’ onto the end of the singular, unless ActiveSupport is installed, in which case you get irregular plurals, as well as the ability to teach the Inflector about new ones.
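
A minimal sketch of that fallback, assuming nothing about how XMLObject actually wires it up: naive_plural, plural_name and INFLECTOR_LOADED are hypothetical names, while ActiveSupport::Inflector.pluralize is the real ActiveSupport call.

```ruby
# Try to load ActiveSupport's inflector; carry on without it if the gem
# is not installed.
begin
  require 'active_support/inflector'
  INFLECTOR_LOADED = true
rescue LoadError
  INFLECTOR_LOADED = false
end

# The naive rule: tack an "s" onto the singular.
def naive_plural(singular)
  "#{singular}s"
end

# Prefer the inflector (irregular plurals) when it is available.
def plural_name(singular)
  if INFLECTOR_LOADED
    ActiveSupport::Inflector.pluralize(singular)
  else
    naive_plural(singular)
  end
end

plural_name('course')   # => "courses" either way
naive_plural('person')  # => "persons" (the inflector would know better)
```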

Collection proxy

Sometimes, collections are expressed with a container element in XML:

<student>
  <name>Bob</name>
  <courses>
    <course>Math</course>
    <course>Biology</course>
  </courses>
</student>

In this case, since the container element courses has no text element of its own, and it only has elements of one name under it, it delegates all methods it doesn’t contain to the collection below, so you get:

student.courses.collect { |c| c.downcase.to_sym } => [:math, :biology]
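
The delegation itself can be sketched in a few lines. This is an assumption-laden illustration: ContainerSketch is a hypothetical class, and the real proxy may well be built differently.

```ruby
# Hypothetical container: any method it does not define itself is passed
# on to the collection of children it wraps.
class ContainerSketch
  def initialize(children)
    @children = children
  end

  def method_missing(name, *args, &block)
    return @children.public_send(name, *args, &block) if @children.respond_to?(name)

    super
  end

  def respond_to_missing?(name, include_private = false)
    @children.respond_to?(name, include_private) || super
  end
end

courses = ContainerSketch.new(%w[Math Biology])

courses.collect { |c| c.downcase.to_sym }  # => [:math, :biology]
courses.first                              # => "Math"
```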

Question mark notation

Strings that look like booleans are “booleanized” if called by their question mark names (such as enabled?).
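
As a rough sketch of what “booleanizing” could look like: the word lists below are an assumption (this README does not say exactly which strings count), and booleanize is a hypothetical helper rather than part of XMLObject's API.

```ruby
# Assumed word lists; the strings XMLObject really accepts may differ.
TRUE_WORDS  = %w[true yes t y 1].freeze
FALSE_WORDS = %w[false no f n 0].freeze

# Returns true or false for boolean-looking strings, nil for anything else.
def booleanize(text)
  word = text.to_s.strip.downcase
  return true  if TRUE_WORDS.include?(word)
  return false if FALSE_WORDS.include?(word)

  nil
end

booleanize('yes')    # => true
booleanize('false')  # => false
booleanize('bread')  # => nil
```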

Recursive

The adapters are designed to parse documents recursively. Deeply nested files are bound to raise SystemStackError, but for the kinds of files I need to read, things are working fine so far. In any case, stream parsing is on the TODO list.
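
To see why depth matters, here is a small recursive walk over a REXML tree. It is only a sketch: nesting_depth is a hypothetical helper, not something XMLObject provides. Each level of nesting costs one more method call on the stack, which is what eventually overflows on very deep documents.

```ruby
require 'rexml/document'

# Hypothetical helper: one recursive call per level of nesting.
def nesting_depth(element)
  child_depths = element.elements.map { |child| nesting_depth(child) }
  1 + (child_depths.max || 0)
end

doc = REXML::Document.new('<a><b><c/></b><d/></a>')
nesting_depth(doc.root)  # => 3
```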

Incomplete

It most likely doesn’t work with a ton of features of complex XML files. I’ll always try to accommodate those, as long as they don’t make the basic usage more complex. As usual, patches welcome.

Copyright © 2008 Jordi Bunster, released under the MIT license