DigestR - Fast XML Digester for Ruby

What is this?

DigestR is a fast rules-based XML processor (similar to the Jakarta Commons Digester) for Ruby, based upon the libxml-ruby Libxml2 binding. Its SAX-based rules engine allows code to be executed based on patterns and individual elements in the source XML.

For the whys and wherefores of this kind of thing, see:

jakarta.apache.org/commons/digester

Prerequisites

DigestR requires Libxml-Ruby (libxml.rubyforge.org) version 0.3.7 (2005/4/14, developmental) / 0.3.8 (release) or later. If you are not installing via RubyGems (or do not wish to auto-install dependencies), you will need to install it before installing DigestR.

Installation …

… with RubyGems?

If you have RubyGems, you can install DigestR by simply issuing the command:

  gem install -r digestr

This should download the latest version and install it. If you experience problems, or wish to perform an offline installation, download the .gem file from the FRS and run the gem command from within the same directory.

Note if auto-installing dependencies: The libxml-ruby gem includes native extensions, and will require a sane build environment on the installation machine. If you experience problems with the libxml install you may need to install manually with additional extconf options - see the libxml-ruby guide (libxml.rubyforge.org/install.html) for more information.

… with install.rb?

If you don’t have RubyGems, you can install from one of the tarball or zip packages by running the following command from the unpacked root directory:

  ruby install.rb

This will copy the libraries to the appropriate place for your Ruby installation.

Did it work?

With that done, you should be able to run:

  ruby [-rubygems] -rxml/digestr -e 'puts XML::Digester::VERSION'

to verify that the installation succeeded and the library can be loaded by Ruby.

How do I use it?

Please see the API reference for usage information. The latest version can be found online at digestr.rubyforge.org/, and the documentation source for a specific release is included in the release package.

The RDoc can be built by running ‘rake doc’ in the source directory.

How fast is ‘fast’?

Currently, ‘fast’ is a relative term. There is certainly room for improvement in DigestR itself, but the fact that it’s based on the (native) libxml2 Ruby binding gives a good burst of speed, and I think DigestR should be fast enough for most uses. To give an idea, here are some informal benchmarks comparing DigestR with the REXML-based xmldigester (rubyforge.org/projects/xmldigester/), using the addressbook example included with that package (over 500 runs):

  ###### ORIGINAL TWO-PERSON ADDRESSBOOK ######
  ###### XMLDIGESTER ######
        user     system      total        real
    2.860000   0.170000   3.030000 (  3.097948)
        user     system      total        real
    2.820000   0.160000   2.980000 (  3.061908)
  ###### DIGESTR ######
        user     system      total        real
    0.980000   0.070000   1.050000 (  1.118739)
        user     system      total        real
    0.970000   0.060000   1.030000 (  1.089957)

  ###### TWENTY-PERSON ADDRESSBOOK ######
  ###### XMLDIGESTER ######
        user     system      total        real
   23.000000   0.990000  23.990000 ( 24.265204)
        user     system      total        real
   22.610000   1.010000  23.620000 ( 23.936342)
  ###### DIGESTR ######
        user     system      total        real
    8.880000   0.140000   9.020000 (  9.144904)
        user     system      total        real
    8.930000   0.140000   9.070000 (  9.227588)
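The user/system/total/real columns are the format printed by Ruby's standard Benchmark module. The script used for the figures above is not included here, so the following is only a rough sketch of how a comparable 500-run timing could be taken. The sample_workload lambda is a hypothetical stand-in for the parsing code being measured, not DigestR or xmldigester code.

```ruby
require 'benchmark'

# Number of repetitions, matching the "over 500 runs" mentioned above.
RUNS = 500

# Hypothetical stand-in for the work being timed. Replace the body with
# the parsing code you actually want to measure.
sample_workload = lambda { (1..100).map { |i| i.to_s }.join }

# Benchmark.bm prints a "user system total real" header followed by one
# line of timings for each report block.
results = Benchmark.bm do |bm|
  bm.report { RUNS.times { sample_workload.call } }
end
```

Timings taken this way vary between runs and machines, so they are best read as a rough comparison rather than exact measurements.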

Notes

A note about version numbers

DigestR uses odd/even numbers for development/release versions. When the final version component is odd, the package is an ‘unofficial’ build - generally this means built manually from source, during development. These will never be distributed, and there’s no guarantee that any two packages with the same development version will actually be the same. These packages will have no corresponding SCM tag.

Even numbers always denote ‘official’ releases, which are released on RubyForge and tagged as such in SCM. These packages can be trusted to exhibit version consistency.

If you are bundling DigestR with your product, please ensure you use an official release version whenever possible. If you must use a developmental version, please modify the package version to reflect the fact that it is a custom build (e.g. 0.1.3-mycompany-20051021) to prevent inconsistent development packages from escaping into the wild.
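As an illustration of the scheme described above, here is a minimal sketch of a check that a bundling script might perform. The official_release? helper is hypothetical and is not part of DigestR. It assumes that official versions are plain dotted numbers and treats anything with a custom suffix as unofficial.

```ruby
# Hypothetical helper, not part of DigestR: classifies a version string
# according to the odd/even scheme described above.
def official_release?(version)
  # Assumption: only plain dotted-numeric versions can be official, so a
  # custom build such as "0.1.3-mycompany-20051021" is rejected here.
  return false unless version =~ /\A\d+(\.\d+)*\z/

  # An even final component denotes an official release.
  version.split('.').last.to_i.even?
end

puts official_release?("0.1.2")                     # true
puts official_release?("0.1.3")                     # false
puts official_release?("0.1.3-mycompany-20051021")  # false
```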

Further information

DigestR is developed by Ross Bamford (rosco <at> roscopeco.co.uk), with help from the developers listed in CONTRIBUTORS. Any bugs are probably all his own.

As you may have guessed, DigestR’s hosting and development services are provided by RubyForge.org - many thanks to Tom Copeland and all concerned.

Thanks also to Yukihiro Matsumoto for a consistently amazing platform, and all those who write and contribute to the libraries DigestR depends on.