AXML

AXML provides a simple, minimalistic DOM for working with data stored in an XML document. The API is very similar to LibXML's, differing slightly in the handling of text nodes. It is designed with very large documents in mind: nodes are represented as memory-efficient Struct objects, and it works with either XMLParser or LibXML.

‘AXML’ literally translates into ‘ax XML’ which succinctly describes the occasional feeling of a programmer towards XML or its myriad parsers. AXML won’t solve all your XML woes, but it does make working with XML much less painful.

Overview

  • fast: runs on either XMLParser or LibXML

  • lean: as in ‘lines of code’ and as in ‘memory consumption’

Examples

require 'axml'  

# a little example xml string to use
string = "
<n1>
  <n2 size='big'>
    <n3>words here</n3>
    <n3></n3>
  </n2>
  <n2 size='small' >
    <n3 id='3' thinks='out loud'></n3>
  </n2>
</n1>
"

Read a string, io, or file

n1_node = AXML.parse(string)          # <- can read xml as string
n1_node = AXML.parse(io)              # <- can read an io object
n1_node = AXML.parse('path/to/file')  # <- can read a file

Access children

n1_node.children # -> [array]
n1_node.each do |child|
  # do something with each child
end

Traverse the whole tree structure

n1_node.traverse do |node|
  # pre traversal
end

n1_node.traverse(:post) do |node|
  # post traversal
end
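
To make the difference between the two traversal orders concrete, here is a minimal sketch that does not use AXML at all. `SketchNode` and its `traverse` method are hypothetical stand-ins written for this illustration; they only assume that pre-order visits a node before its children and post-order visits it after them.

```ruby
# Hypothetical stand-in for an element node (NOT AXML's own class).
SketchNode = Struct.new(:name, :children) do
  # Yield every node in the subtree: parents first (:pre) or children first (:post).
  def traverse(order = :pre, &block)
    block.call(self) if order == :pre
    children.each { |child| child.traverse(order, &block) }
    block.call(self) if order == :post
  end
end

# Same shape as the example document: n1 > (n2 > n3, n3), (n2 > n3)
sketch_tree = SketchNode.new('n1', [
  SketchNode.new('n2', [SketchNode.new('n3', []), SketchNode.new('n3', [])]),
  SketchNode.new('n2', [SketchNode.new('n3', [])])
])

pre_order = []
sketch_tree.traverse { |node| pre_order << node.name }

post_order = []
sketch_tree.traverse(:post) { |node| post_order << node.name }

puts pre_order.join(' ')   # root comes first
puts post_order.join(' ')  # root comes last
```

Post-order is the natural choice when a node can only be processed once all of its children have been handled, for example when summing values up the tree.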

Get attributes and text

n2_node['size'] == 'big'
n3_node = n2_node.child
n3_node.text    # -> 'words here'
n3_node.content # -> 'words here'

Navigate nodes

n2_node = n1_node.child
the_other_n2_node = n2_node.next
the_other_n2_node.next == nil

Does a little xpath

# find_first (returns the first node)
n3_node = n1_node.find_first('descendant::n3')
other_n3_node = n3_node.find_first('following-sibling::n3')
n1_node.find_first('child::n3')    # -> nil
# also callable as find_first_child and find_first_descendant

# find (returns an array)
n1_node.find('child::n2')          # -> [array of 2 <n2> nodes]
n1_node.find('descendant::n3')     # -> [array of all 3 <n3> nodes]
# also callable as find_child and find_descendant
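
The two axes used above differ only in how deep they look. The following sketch shows one way such lookups could work over a plain tree; it is an illustration under assumptions, not AXML's implementation. `AxisNode`, `children_named`, and `descendants_named` are hypothetical names invented for this example.

```ruby
# Hypothetical node type for illustrating the child:: and descendant:: axes.
AxisNode = Struct.new(:name, :children) do
  # child::NAME -> only direct children with that element name
  def children_named(wanted)
    children.select { |child| child.name == wanted }
  end

  # descendant::NAME -> matches at any depth, collected in document order
  def descendants_named(wanted)
    children.flat_map do |child|
      hits = child.name == wanted ? [child] : []
      hits + child.descendants_named(wanted)
    end
  end
end

# Same shape as the example document: n1 > (n2 > n3, n3), (n2 > n3)
axis_tree = AxisNode.new('n1', [
  AxisNode.new('n2', [AxisNode.new('n3', []), AxisNode.new('n3', [])]),
  AxisNode.new('n2', [AxisNode.new('n3', [])])
])

puts axis_tree.children_named('n2').size     # direct children only
puts axis_tree.descendants_named('n3').size  # any depth
puts axis_tree.children_named('n3').empty?   # n3 is never a direct child of n1
```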

Manipulate tree structure

node.drop  # drop the node from its parent
## (insert?)

Output

XML Output is currently tested only with XMLParser.

node.to_s             # -> formatted xml
node.to_doc           # -> with xml header line
node.to_doc(filename) # -> written to filename

See the ‘spec/’ directory for more examples and functionality.

Details

If using XMLParser, AXML builds nodes out of Struct objects (AXML::El). It currently parses only elements, attributes, and text content (no CDATA right now).
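
As a rough illustration of the Struct-based approach, the sketch below models an element whose text is stored on the element itself rather than in a separate text node. `SketchEl`, its member list, and `add_child` are assumptions made for this example; the real AXML::El members may differ.

```ruby
# Hypothetical layout; AXML::El's actual members and methods may differ.
SketchEl = Struct.new(:name, :attrs, :text, :children, :parent) do
  # Attribute lookup in the node['size'] style used in the examples.
  # (This deliberately overrides Struct's own member lookup via [].)
  def [](key)
    attrs[key]
  end

  # Append a child and record its parent; returns the child.
  def add_child(node)
    node.parent = self
    children << node
    node
  end
end

n2_sketch = SketchEl.new('n2', { 'size' => 'big' }, nil, [], nil)
n3_sketch = n2_sketch.add_child(SketchEl.new('n3', {}, 'words here', [], nil))

puts n2_sketch['size']       # attribute value
puts n3_sketch.text          # text lives on the element, not in a child node
puts n3_sketch.parent.name   # back-reference to the parent element
```

A Struct holds a fixed set of members without a per-object instance variable table, which is one reason it suits documents with very many nodes.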

If using LibXML, it uses the underlying LibXML nodes already available. It overrides some methods to treat the text in a text node as the parent node’s text attribute.

Warnings

Output of XML (i.e., node#to_s) under LibXML is untested (and probably buggy) since the node text behavior has been modified. This will be worked out in a future release.

Doesn’t parse CDATA using XMLParser right now.

Installation

gem install axml

You can get instructions for installing XMLParser and LibXML by running this command:

ruby -rubygems -e 'require "axml"; puts AXML::Autoload.install_instructions(:all)'

See Also

If you are parsing HTML or complex word processing documents, this is not the parser for you. Try something like hpricot or LibXML.