Introduction and purpose

Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying.

Key features:

  • Intuitive, Ruby-idiomatic API for XML manipulation

  • Consistent interface across different XML libraries

  • Efficient node mapping for XPath queries

  • Support for all XML node types and features

  • Easy switching between XML processing engines

  • Clean separation between interface and implementation

Supported XML libraries

General

Moxml supports the following XML libraries:

REXML

REXML, a pure Ruby XML parser distributed with standard Ruby (as a bundled gem since Ruby 3.0). Not the fastest, but always available.

Nokogiri

(default) Nokogiri, a widely used implementation which wraps around the performant libxml2 C library.

Oga

Oga, a pure Ruby XML parser. Recommended when you need a pure Ruby solution, for example for Opal.

Ox

Ox, a fast XML parser.

LibXML

libxml-ruby, Ruby bindings for the performant libxml2 C library. Alternative to Nokogiri with similar performance characteristics.

Feature table

Moxml makes a best effort to provide a consistent interface across basic XML features, but the underlying XML libraries differ in their features and capabilities.

The following table summarizes the features supported by each library.

Note
The checkmarks indicate support for the feature, while the numbered notes below the table provide additional context for specific features.

| Feature | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Parsing, serializing | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SAX parsing | ✅ Full (10/10 events) | ✅ Full (10/10 events) | ✅ Full (10/10 events) | ✅ Full (10/10 events) | ⚠️ Core (4/10 events). See NOTE 4. | ⚠️ Core (4/10 events). See NOTE 4. |
| Node manipulation | ✅ | ✅ | ✅ | ✅ | ✅ See NOTE 1. | ✅ See NOTE 1. |
| Basic XPath | ✅ | ✅ | ✅ | ✅ | Uses Ox-specific API locate. See NOTE 2. | ✅ Full XPath 1.0. See NOTE 3. |
| XPath with namespaces | ✅ | ✅ | ✅ | ✅ | Uses Ox-specific API locate. See NOTE 2. | ⚠️ Basic. See NOTE 3. |

NOTE 1
Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure.

NOTE 2
Ox: Limited XPath support via the locate() method. See the adapter limitations section.

NOTE 3
HeadedOx provides full XPath 1.0 support via a pure Ruby XPath engine layered on top of Ox’s C parser. See HeadedOx documentation for details.

NOTE 4
Ox/HeadedOx SAX: Only core events supported (start_element, end_element, characters, errors). No separate CDATA, comment, or processing instruction events.

Adapter comparison

Feature compatibility matrix

| Feature/Operation | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Core Operations | | | | | | |
| Parse XML string | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Parse XML file/IO | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Serialize to XML | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Element Operations | | | | | | |
| Create elements | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Get/set attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Add/remove children | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Replace nodes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited [1] | ⚠️ Limited [1] |
| Namespace Operations | | | | | | |
| Add namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Default namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Basic | ⚠️ Basic |
| Namespace inheritance | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None [5] |
| Namespaced attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited | ⚠️ Limited [5] |
| XPath Queries | | | | | | |
| Basic paths (//element) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Attribute predicates ([@id]) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Existence only [2] | ✅ Full |
| Attribute values ([@id='123']) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None [3] | ✅ Full |
| Logical operators ([@a and @b]) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Position predicates ([1], [last()]) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Text predicates ([text()='x']) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Namespace-aware queries | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ⚠️ Basic [5] |
| Parent axis (..) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Sibling axes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None [5] |
| XPath functions (count(), etc.) | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ All 27 |
| Special Content | | | | | | |
| CDATA sections | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Comments | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Processing instructions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| DOCTYPE declarations | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Performance | | | | | | |
| Parse speed | Fast | Fast | Medium | Fast | Very Fast | Very Fast |
| Serialize speed | Fast | Fast | Medium | Medium | Very Fast | Very Fast |
| Memory usage | Good | Medium | Medium | Good | Excellent | Excellent |
| Thread safety | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |

[1] Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure.
[2] Ox: //book[@id] works (returns all book elements), but doesn’t filter by attribute existence.
[3] HeadedOx: Full XPath 1.0 with all 27 functions and 6 axes. Pure Ruby XPath engine on Ox’s C parser. 99.20% pass rate. See docs/headed-ox.adoc.
[4] Ox: Use .find { |el| el["id"] == "123" } instead of XPath attribute value predicates.
[5] HeadedOx limitations: Namespace introspection and 7 axes not implemented. See docs/HEADED_OX_LIMITATIONS.md.
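
The XPath rows of the matrix can also be consulted programmatically. The following is a minimal sketch, not part of Moxml: it transcribes the Ox and HeadedOx columns into a plain Ruby hash (the other four adapters are "Full" in every XPath row) and lists the adapters that fully support a set of features. The feature keys and the helper names are hypothetical, and the adapter symbols :rexml, :libxml and :ox are assumed by analogy with :nokogiri, :oga and :headed_ox.

```ruby
# Adapter symbols; :rexml, :libxml and :ox are assumed names.
ADAPTERS = %i[nokogiri oga rexml libxml ox headed_ox].freeze

# XPath rows of the matrix for the two adapters that deviate from "Full".
# :partial stands for "Existence only" or "Basic" in the table.
OX_FAMILY_XPATH = {
  basic_paths:         { ox: :full,    headed_ox: :full },
  attribute_existence: { ox: :partial, headed_ox: :full },
  attribute_values:    { ox: :none,    headed_ox: :full },
  logical_operators:   { ox: :none,    headed_ox: :full },
  position_predicates: { ox: :none,    headed_ox: :full },
  text_predicates:     { ox: :none,    headed_ox: :full },
  namespace_queries:   { ox: :none,    headed_ox: :partial },
  parent_axis:         { ox: :none,    headed_ox: :full },
  sibling_axes:        { ox: :none,    headed_ox: :none },
  functions:           { ox: :none,    headed_ox: :full }
}.freeze

# Returns :full, :partial or :none for one adapter and one feature.
def xpath_support(adapter, feature)
  raise ArgumentError, "unknown adapter: #{adapter}" unless ADAPTERS.include?(adapter)

  row = OX_FAMILY_XPATH.fetch(feature) { raise ArgumentError, "unknown feature: #{feature}" }
  row.fetch(adapter, :full) # adapters not listed in a row are "Full"
end

# Lists the adapters that fully support every requested feature.
def adapters_supporting(*features)
  ADAPTERS.select { |adapter| features.all? { |f| xpath_support(adapter, f) == :full } }
end

p adapters_supporting(:attribute_values, :functions)
# => [:nokogiri, :oga, :rexml, :libxml, :headed_ox]
p adapters_supporting(:sibling_axes)
# => [:nokogiri, :oga, :rexml, :libxml]
```

A lookup like this can back a startup check that fails early when the configured adapter cannot serve the queries an application relies on.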

Adapter selection guide

Choose Nokogiri when:

  • You need industry-standard compatibility

  • Large community support is important

  • C extension performance is acceptable

  • Cross-platform deployment is required

Choose Oga when:

  • Pure Ruby environment is required (JRuby, TruffleRuby)

  • Best test coverage is needed (98%)

  • No C extensions are allowed

  • Memory usage is not the primary concern

Choose REXML when:

  • Standard library only (no external gems)

  • Maximum portability is required

  • Small to medium documents

  • Deployment simplicity is critical

Choose LibXML when:

  • Alternative to Nokogiri is desired

  • Full namespace support is required

  • Good performance with correctness

  • Native C extension is acceptable

Choose Ox when:

  • Maximum parsing speed is critical

  • Simple document structures (limited nesting)

  • XPath usage is minimal or absent

  • Memory efficiency is paramount

Choose HeadedOx when:

  • You need Ox’s fast parsing with full XPath support

  • You want comprehensive XPath 1.0 features (functions, predicates)

  • You prefer a pure Ruby XPath implementation for debugging

  • You need more XPath capabilities than standard Ox provides

  • Memory efficiency is important but XPath features are required

Caution
Ox’s custom XPath engine supports common patterns but cannot handle complex XPath expressions. Test thoroughly if your use case requires advanced XPath.

TODO: Raise an error when an unsupported XPath feature is used with Ox or HeadedOx, to prevent silent failures.
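
Until such errors exist, an application can run its own pre-flight check. The sketch below is a heuristic illustration only and is not how Moxml behaves: it scans an XPath string with regular expressions for constructs that the matrix marks as unsupported on Ox and raises a hypothetical UnsupportedXPathError. The error class, the pattern table and both helper names are assumptions, and the patterns can misfire on string literals (for example a quoted value containing " and ").

```ruby
# Hypothetical error class; Moxml's own error hierarchy is not used here.
class UnsupportedXPathError < StandardError; end

# Heuristic patterns for XPath constructs the Ox adapter does not support.
OX_UNSUPPORTED_PATTERNS = {
  "attribute value predicate" => /\[\s*@[\w:-]+\s*(?:=|!=|<|>)/,
  "logical operator"          => /\[[^\]]*\b(?:and|or)\b[^\]]*\]/,
  "position predicate"        => /\[\s*(?:\d+|last\(\))\s*\]/,
  "text predicate"            => /\[\s*text\(\)/,
  "namespace prefix"          => /[A-Za-z_][\w.-]*:(?!:)[A-Za-z_*]/,
  "parent axis"               => /\.\.|parent::/,
  "sibling axis"              => /(?:following|preceding)-sibling::/,
  "function call"             => /\b(?:count|sum|contains|starts-with|string-length|normalize-space|not|position)\s*\(/
}.freeze

# Returns the names of the unsupported constructs found in the expression.
def unsupported_ox_features(xpath)
  OX_UNSUPPORTED_PATTERNS.select { |_name, pattern| xpath.match?(pattern) }.keys
end

# Returns the expression unchanged, or raises if Ox is unlikely to handle it.
def assert_ox_compatible!(xpath)
  found = unsupported_ox_features(xpath)
  return xpath if found.empty?

  raise UnsupportedXPathError, "Ox cannot evaluate #{xpath.inspect}: #{found.join(', ')}"
end

p unsupported_ox_features("//book")             # => []
p unsupported_ox_features("//book[@id='123']")  # => ["attribute value predicate"]
p unsupported_ox_features("count(//book)")      # => ["function call"]
```

Calling assert_ox_compatible! before doc.xpath turns a silently wrong result into an explicit failure during development.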

Getting started

Installation

Install the gem and at least one supported XML library:

# In your Gemfile
gem 'moxml'
gem 'nokogiri'  # Or 'oga', 'rexml', 'ox', or 'libxml-ruby'
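
To see which of these libraries are present in the current environment, RubyGems can be queried directly. This is a small sketch under stated assumptions: the gem names come from the Gemfile comment above, while the adapter symbols used as hash keys (other than :nokogiri and :oga) and the helper name installed_adapters are hypothetical.

```ruby
# Adapter symbol => gem that provides it. Gem names are taken from the
# Gemfile comment above; the :rexml, :ox and :libxml symbols are assumed.
ADAPTER_GEMS = {
  nokogiri: "nokogiri",
  oga: "oga",
  rexml: "rexml",
  ox: "ox",
  libxml: "libxml-ruby"
}.freeze

# Returns the adapter symbols whose gem is installed, in declaration order.
def installed_adapters(gems = ADAPTER_GEMS)
  gems.select { |_adapter, gem_name| Gem::Specification.find_all_by_name(gem_name).any? }.keys
end

puts "Installed XML libraries: #{installed_adapters.inspect}"
```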

Basic document creation

doc = Moxml.new.create_document

# Add XML declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))

# Create root element with namespace
root = doc.create_element('book')
root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
doc.add_child(root)

# Add content
title = doc.create_element('dc:title')
title.text = 'XML Processing with Ruby'
root.add_child(title)

# Output formatted XML
puts doc.to_xml(indent: 2)

Real-world examples

Practical, runnable examples demonstrating Moxml usage in common scenarios are available in the examples directory.

These examples include:

RSS Parser

Parse RSS/Atom feeds with XPath queries and namespace handling

Web Scraper

Extract data from HTML/XML using DOM navigation and table parsing

API Client

Build and parse XML API requests/responses with SOAP

Each example is:

  • Fully documented with detailed README

  • Self-contained and runnable

  • Demonstrates best practices

  • Includes sample data files

  • Shows comprehensive error handling

Run any example directly:

ruby examples/rss_parser/rss_parser.rb
ruby examples/web_scraper/web_scraper.rb
ruby examples/api_client/api_client.rb

See the examples README for complete documentation and learning paths.

Working with documents

Using the builder pattern

The builder pattern provides a clean DSL for creating XML documents:

doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"

  element 'library', xmlns: 'http://example.org/library' do
    element 'book' do
      element 'title' do
        text 'Ruby Programming'
      end

      element 'author' do
        text 'Jane Smith'
      end

      comment 'Publication details'
      element 'published', year: '2024'

      cdata '<custom>metadata</custom>'
    end
  end
end

Direct document manipulation

doc = Moxml.new.create_document

# Add declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))

# Create root with namespace
root = doc.create_element('library')
root.add_namespace(nil, 'http://example.org/library')
root.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
doc.add_child(root)

# Add elements with attributes
book = doc.create_element('book')
book['id'] = 'b1'
book['type'] = 'technical'
root.add_child(book)

# Add mixed content
book.add_child(doc.create_comment('Book details'))
title = doc.create_element('title')
title.text = 'Ruby Programming'
book.add_child(title)

Fluent interface API

Moxml provides a fluent, chainable API for improved developer experience:

element = doc.create_element('book')
  .set_attributes(id: "123", type: "technical")
  .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
  .with_child(doc.create_element("title"))

For complete fluent API documentation, including all chainable methods, convenience methods, and practical examples, see the Working with Documents Guide.

SAX (Event-Driven) Parsing

SAX (Simple API for XML) provides memory-efficient, event-driven XML parsing for large documents.

When to use SAX:

  • Processing very large XML files (>100MB)

  • Memory-constrained environments

  • Streaming data extraction

  • Need to process data as it arrives

Quick example:

class BookExtractor < Moxml::SAX::ElementHandler
  attr_reader :books

  def initialize
    super
    @books = []
  end

  def on_start_element(name, attributes = {}, namespaces = {})
    super
    @books << { id: attributes["id"] } if name == "book"
  end
end

handler = BookExtractor.new
Moxml.new.sax_parse(xml_string, handler)
puts handler.books.inspect

For complete SAX documentation, including all handler types, event methods, adapter support, and best practices, see the SAX Parsing Guide.
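
Because SAX delivers text separately from the element that contains it, a handler usually keeps a little state between events. The sketch below illustrates that technique with a plain Ruby object driven by hand, so it runs without any XML library. Only on_start_element appears in the example above; on_characters and on_end_element are assumed names modelled on the "characters" and "end_element" events, and TitleCollector is hypothetical.

```ruby
# Plain Ruby stand-in for a SAX handler that collects <title> text.
class TitleCollector
  attr_reader :titles

  def initialize
    @titles = []
    @buffer = nil # non-nil only while inside a <title> element
  end

  def on_start_element(name, attributes = {}, namespaces = {})
    @buffer = +"" if name == "title"
  end

  # Text may arrive in several chunks, so append rather than assign.
  def on_characters(text)
    @buffer << text if @buffer
  end

  def on_end_element(name)
    return unless name == "title" && @buffer

    @titles << @buffer.strip
    @buffer = nil
  end
end

# Simulate the events a parser would emit for:
#   <book id="b1"><title>Ruby Programming</title></book>
handler = TitleCollector.new
handler.on_start_element("book", { "id" => "b1" })
handler.on_start_element("title")
handler.on_characters("Ruby ")
handler.on_characters("Programming")
handler.on_end_element("title")
handler.on_end_element("book")

p handler.titles # => ["Ruby Programming"]
```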

XML objects and their methods

For the complete node API reference, including traversal methods, manipulation, queries, type checking, and node information, see the Node API Reference.

Node identity

Moxml provides a consistent #identifier method across all node types to safely identify nodes:

element = doc.at_xpath("//book")
puts element.identifier  # => "book"

attr = element.attribute("id")
puts attr.identifier     # => "id"

The #identifier method returns the primary identifier for each node type (tag name for elements, attribute name for attributes, target for processing instructions, or nil for content nodes).

Important
Always use type-safe patterns when working with mixed node types. See the Node API Consistency Guide for complete documentation on safe coding patterns, API surface by node type, and migration guidelines.
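
One such pattern is to treat a nil identifier as "this node has no name" instead of calling name-specific methods on every node. The following sketch uses a Struct as a stand-in for Moxml nodes so that it runs on its own; FakeNode, named_nodes and describe are hypothetical, and only the #identifier contract described above is assumed.

```ruby
# Stand-in for Moxml nodes: only `identifier` mirrors the documented API.
FakeNode = Struct.new(:kind, :identifier)

nodes = [
  FakeNode.new(:element, "book"),
  FakeNode.new(:text, nil),
  FakeNode.new(:attribute, "id"),
  FakeNode.new(:comment, nil),
  FakeNode.new(:processing_instruction, "xml-stylesheet")
]

# Keeps only nodes that carry an identifier (elements, attributes, PIs).
def named_nodes(nodes)
  nodes.reject { |node| node.identifier.nil? }
end

# Builds a label without assuming that every node has a name.
def describe(node)
  node.identifier ? "#{node.kind} #{node.identifier}" : "#{node.kind} (no identifier)"
end

p named_nodes(nodes).map(&:identifier) # => ["book", "id", "xml-stylesheet"]
puts nodes.map { |node| describe(node) }
```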

Advanced features

XPath querying

Moxml provides efficient XPath querying with consistent node mapping:

# Find all book elements
books = doc.xpath('//book')

# Find with namespaces
titles = doc.xpath('//dc:title', 'dc' => 'http://purl.org/dc/elements/1.1/')

# Find first matching node
first_book = doc.at_xpath('//book')

Namespace handling

# Add namespace to element
element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')

# Create element in namespace
title = doc.create_element('dc:title')

For complete documentation on XPath querying, namespace handling, and accessing native implementations, see the Advanced Features Guide.

Error handling

Moxml provides comprehensive error classes with enhanced context for debugging:

begin
  doc = Moxml.new.parse(xml_string, strict: true)
  results = doc.xpath("//book[@id='123']")
rescue Moxml::ParseError => e
  puts "Parse failed at line #{e.line}: #{e.message}"
rescue Moxml::XPathError => e
  puts "XPath error: #{e.expression}"
rescue Moxml::Error => e
  puts "XML processing error: #{e.message}"
end

For the complete error class hierarchy, error types, best practices, and debugging techniques, see the Error Handling Guide.

Configuration

Moxml can be configured globally or per instance:

# Global configuration
Moxml.configure do |config|
  config.default_adapter = :nokogiri
  config.strict = true
  config.encoding = 'UTF-8'
end

# Instance configuration
context = Moxml.new do |config|
  config.adapter = :oga
  config.strict = false
end

For all configuration options, adapter selection, serialization options, and environment-based configuration, see the Configuration Guide.
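
As an illustration of environment-based configuration, the adapter name can be read from an environment variable and validated before it reaches Moxml. This is a sketch only: the variable name MOXML_ADAPTER, the helper adapter_from_env and the adapter symbols :rexml, :ox and :libxml are assumptions, not documented Moxml features.

```ruby
# Adapter symbols; :rexml, :ox and :libxml are assumed names.
KNOWN_ADAPTERS = %i[nokogiri oga rexml ox libxml headed_ox].freeze

# Reads the adapter from a hash-like environment, falling back to a default.
# MOXML_ADAPTER is a hypothetical variable name chosen for this sketch.
def adapter_from_env(env = ENV, default = :nokogiri)
  raw = env["MOXML_ADAPTER"].to_s.strip.downcase
  return default if raw.empty?

  adapter = raw.tr("-", "_").to_sym
  return adapter if KNOWN_ADAPTERS.include?(adapter)

  raise ArgumentError, "Unknown adapter #{raw.inspect}; expected one of #{KNOWN_ADAPTERS.join(', ')}"
end

p adapter_from_env({})                                  # => :nokogiri
p adapter_from_env({ "MOXML_ADAPTER" => "Headed-Ox" })  # => :headed_ox

# With Moxml loaded, the result could be passed on, for example:
#   context = Moxml.new(adapter_from_env)
```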

Thread safety

For complete information on thread-safe patterns, context management, and concurrent processing, see the Thread Safety Guide.

Performance considerations

For detailed performance optimization strategies, memory management best practices, and efficient querying patterns, see the Performance Considerations Guide.

Best practices

For comprehensive best practices covering XPath queries, adapter selection, error handling, namespace handling, memory management, thread safety, performance optimization, and testing strategies, see the Best Practices Guide.

Specific adapter limitations

Ox adapter

The Ox adapter provides maximum parsing speed but has XPath limitations.

XPath limitations:

  • No attribute value predicates, e.g. //book[@id='123'] ❌

  • No logical operators, position predicates, text predicates ❌

  • No namespace queries, parent axis, sibling axes ❌

  • No XPath functions ❌

Workaround: Use Ruby enumerable methods:

# Instead of: doc.xpath("//book[@id='123']")
doc.xpath("//book").find { |book| book["id"] == "123" }

For complete Ox adapter documentation, including all limitations and workarounds, see the Ox Adapter Guide.
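
The same workaround extends to several attributes at once, which stands in for a predicate such as [@id='123' and @type='technical']. The helper below is a hypothetical sketch: select_by_attributes is not a Moxml method, and it only assumes that each node answers node["name"], as elements do in the example above. Plain hashes stand in for elements so the sketch runs without an XML library.

```ruby
# Keeps the nodes whose attributes equal every name/value pair in `criteria`.
def select_by_attributes(nodes, criteria)
  nodes.select do |node|
    criteria.all? { |name, expected| node[name.to_s] == expected }
  end
end

# Hashes stand in for elements; both respond to ["attribute-name"].
books = [
  { "id" => "123", "type" => "technical" },
  { "id" => "456", "type" => "fiction" },
  { "id" => "789", "type" => "technical" }
]

p select_by_attributes(books, type: "technical").map { |book| book["id"] }
# => ["123", "789"]
p select_by_attributes(books, id: "123", type: "fiction")
# => []
```

With a parsed document, the first argument would be the result of doc.xpath("//book").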

HeadedOx adapter

The HeadedOx adapter combines Ox’s fast C-based XML parsing with Moxml’s comprehensive pure Ruby XPath 1.0 engine.

Status: Production-ready v1.2 (99.20% pass rate, 1,992/2,008 tests)

Key features:

  • Fast XML parsing (Ox C extension)

  • All 27 XPath 1.0 functions

  • 6 XPath axes (child, descendant, parent, attribute, self, descendant-or-self)

  • Expression caching for performance

  • Pure Ruby XPath engine (debuggable)

When to use:

  • Need Ox’s fast parsing with comprehensive XPath

  • Want XPath functions (count, sum, contains, etc.)

  • Prefer pure Ruby XPath for debugging

  • Basic namespace queries are sufficient

# Use HeadedOx adapter
context = Moxml.new(:headed_ox)
doc = context.parse(xml_string)

# Full XPath 1.0 support
books = doc.xpath('//book[@price < 20]')
count = doc.xpath('count(//book)')
titles = doc.xpath('//book/title[contains(., "Ruby")]')

For complete HeadedOx documentation, including architecture, XPath capabilities, known limitations, and usage examples, see the HeadedOx Adapter Guide and the Limitations Documentation.

LibXML adapter

Performance:

  • Serialization speed: ~120 iterations per second (ips), slower than target

  • Parsing speed: Good

  • For high-throughput serialization, consider Ox or Nokogiri
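
An "ips" figure counts how many times an operation completes per second. The sketch below shows one simple way to measure it with a monotonic clock; it is not the benchmark behind the number above. The helper iterations_per_second is hypothetical, and the string operation is only a placeholder workload that would be replaced by a call such as doc.to_xml.

```ruby
# Runs the block repeatedly for at least `min_seconds` and returns the
# number of completed iterations per second.
def iterations_per_second(min_seconds = 0.2)
  iterations = 0
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  elapsed = 0.0
  while elapsed < min_seconds
    yield
    iterations += 1
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
  iterations / elapsed
end

# Placeholder workload; substitute the serialization call being measured.
sample = "<book><title>Ruby</title></book>" * 100
ips = iterations_per_second { sample.upcase }
puts format("%.1f ips", ips)
```

Results from a loop like this vary with hardware and document size, so comparisons are only meaningful when every adapter is measured on the same machine with the same input.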

Other adapters

Nokogiri, Oga, REXML:

All three adapters have near-complete feature support with only minor edge case limitations. Use these adapters when you need full XPath and namespace support.

Development and testing

For complete information on development setup, testing strategies, benchmarking, and coverage reporting, see the Development and Testing Guide.

Contributing

  1. Fork the repository

  2. Create your feature branch (git checkout -b feature/my-new-feature)

  3. Commit your changes (git commit -am 'Add some feature')

  4. Push to the branch (git push origin feature/my-new-feature)

  5. Create a new Pull Request

License

Copyright Ribose.

This project is licensed under the Ribose 3-Clause BSD License. See the LICENSE.md file for details.