- Introduction and purpose
- Supported XML libraries
- Adapter comparison
- Getting started
- Real-world examples
- Working with documents
- XML objects and their methods
- Advanced features
- Error handling
- Configuration
- Thread safety
- Performance considerations
- Best practices
- Specific adapter limitations
- Development and testing
- Contributing
- License
Introduction and purpose
Moxml provides a unified, modern XML processing interface for Ruby applications. It offers a consistent API that abstracts away the underlying XML implementation details while maintaining high performance through efficient node mapping and native XPath querying.
Key features:
- Intuitive, Ruby-idiomatic API for XML manipulation
- Consistent interface across different XML libraries
- Efficient node mapping for XPath queries
- Support for all XML node types and features
- Easy switching between XML processing engines
- Clean separation between interface and implementation
Supported XML libraries
Moxml supports the following XML libraries:
- REXML: a pure Ruby XML parser distributed with standard Ruby. Not the fastest, but always available.
- Nokogiri (default): a widely used implementation that wraps the performant libxml2 C library.
- Oga: a pure Ruby XML parser. Recommended when you need a pure Ruby solution, for example for Opal.
- Ox: a fast XML parser.
- LibXML: libxml-ruby, Ruby bindings for the performant libxml2 C library. An alternative to Nokogiri with similar performance characteristics.
Feature table
Moxml makes a best effort to provide a consistent interface for basic XML features, but the underlying XML libraries differ in their features and capabilities.
The following table summarizes the features supported by each library.
NOTE: The checkmarks indicate support for the feature, while the numbered notes provide additional context for specific features.
| Feature | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| Parsing, serializing | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| SAX parsing | ✅ Full (10/10 events) | ✅ Full (10/10 events) | ✅ Full (10/10 events) | ✅ Full (10/10 events) | ⚠️ Core (4/10 events). See NOTE 7. | ⚠️ Core (4/10 events). See NOTE 7. |
| Node manipulation | ✅ | ✅ | ✅ | ✅ | ✅ See NOTE 1. | ✅ See NOTE 1. |
| Basic XPath | ✅ | ✅ | ✅ | ✅ | Uses Ox-specific API | ✅ Full XPath 1.0. See NOTE 3. |
| XPath with namespaces | ✅ | ✅ | ❌ | ✅ | Uses Ox-specific API | ⚠️ Basic. See NOTE 3. |
NOTE 1 (Ox/HeadedOx): Text node replacement may fail in some cases due to internal node structure.
NOTE 2 (Ox): Limited XPath support via the locate() method. See the adapter limitations section.
NOTE 3 (HeadedOx): Full XPath 1.0 support via a pure Ruby XPath engine layered on top of Ox’s C parser. See the HeadedOx documentation for details.
NOTE 7 (Ox/HeadedOx SAX): Only core events are supported (start_element, end_element, characters, errors). There are no separate CDATA, comment, or processing instruction events.
Adapter comparison
Feature compatibility matrix
| Feature/Operation | Nokogiri | Oga | REXML | LibXML | Ox | HeadedOx |
|---|---|---|---|---|---|---|
| **Core Operations** | | | | | | |
| Parse XML string | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Parse XML file/IO | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Serialize to XML | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| **Element Operations** | | | | | | |
| Create elements | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Get/set attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Add/remove children | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Replace nodes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited [1] | ⚠️ Limited [1] |
| **Namespace Operations** | | | | | | |
| Add namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Default namespaces | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Basic | ⚠️ Basic |
| Namespace inheritance | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None [6] |
| Namespaced attributes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited | ⚠️ Limited [6] |
| **XPath Queries** | | | | | | |
| Basic paths | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Attribute predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ⚠️ Existence only [2] | ✅ Full |
| Attribute values | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None [4] | ✅ Full |
| Logical operators | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Position predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Text predicates | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Namespace-aware queries | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ⚠️ Basic [6] |
| Parent axis | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ Full |
| Sibling axes | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ❌ None [6] |
| XPath functions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ❌ None | ✅ All 27 [3] |
| **Special Content** | | | | | | |
| CDATA sections | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Comments | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| Processing instructions | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full | ✅ Full |
| DOCTYPE declarations | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited [5] | ✅ Full | ✅ Full |
| **Performance** | | | | | | |
| Parse speed | Fast | Fast | Medium | Fast | Very Fast | Very Fast |
| Serialize speed | Fast | Fast | Medium | Medium | Very Fast | Very Fast |
| Memory usage | Good | Medium | Medium | Good | Excellent | Excellent |
| Thread safety | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
[1] Ox/HeadedOx: Text node replacement may fail in some cases due to internal node structure.
[2] Ox: //book[@id] works (returns all book elements), but doesn’t filter by attribute existence.
[3] HeadedOx: Full XPath 1.0 with all 27 functions and 6 axes. Pure Ruby XPath engine on Ox’s C parser. 99.20% pass rate. See docs/headed-ox.adoc.
[4] Ox: Use .find { |el| el["id"] == "123" } instead of XPath attribute value predicates.
[5] LibXML: DOCTYPE parsing works, but serialization is limited (no round-trip preservation).
[6] HeadedOx limitations: Namespace introspection and 7 axes are not implemented. See docs/HEADED_OX_LIMITATIONS.md.
Adapter selection guide
Choose Nokogiri when:
- You need industry-standard compatibility
- Large community support is important
- C extension performance is acceptable
- Cross-platform deployment is required
Choose Oga when:
- Pure Ruby environment is required (JRuby, TruffleRuby)
- Best test coverage is needed (98%)
- No C extensions are allowed
- Memory usage is not the primary concern
Choose REXML when:
- Standard library only (no external gems)
- Maximum portability is required
- Small to medium documents
- Deployment simplicity is critical
Choose LibXML when:
- Alternative to Nokogiri is desired
- Full namespace support is required
- Good performance with correctness
- Native C extension is acceptable
Choose Ox when:
- Maximum parsing speed is critical
- Simple document structures (limited nesting)
- XPath usage is minimal or absent
- Memory efficiency is paramount
Choose HeadedOx when:
- Need Ox’s fast parsing with full XPath support
- Want comprehensive XPath 1.0 features (functions, predicates)
- Prefer pure Ruby XPath implementation for debugging
- Need more XPath capabilities than standard Ox provides
- Memory efficiency is important but XPath features are required
CAUTION: Ox’s custom XPath engine supports common patterns but may not handle complex XPath expressions. Test thoroughly if your use case requires advanced XPath.
Getting started
Installation
Install the gem and at least one supported XML library:
# In your Gemfile
gem 'moxml'
gem 'nokogiri' # Or 'oga', 'rexml', 'ox', or 'libxml-ruby'
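
Because Moxml needs at least one of these libraries at runtime, it can help to check which ones are actually installed. The following is a minimal sketch in plain Ruby, not part of Moxml: the `ADAPTER_LIBRARIES` constant, the `first_available_adapter` helper, and the adapter symbols and require paths in the map are assumptions made for illustration.

```ruby
# Hypothetical map from adapter name to the path its library is required with.
# Both the symbols and the paths are assumptions, not taken from Moxml itself.
ADAPTER_LIBRARIES = {
  nokogiri: "nokogiri",
  oga: "oga",
  rexml: "rexml/document",
  ox: "ox",
  libxml: "libxml"
}.freeze

# Returns the first adapter whose library loads successfully, or nil if none do.
def first_available_adapter(libraries = ADAPTER_LIBRARIES)
  libraries.each do |adapter, path|
    begin
      require path
      return adapter
    rescue LoadError
      next # library not installed, try the next candidate
    end
  end
  nil
end

adapter = first_available_adapter
puts adapter ? "Using #{adapter}" : "No supported XML library is installed"
```

The hash order doubles as a preference order, so placing `nokogiri` first mirrors the default named above.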
Basic document creation
doc = Moxml.new.create_document
# Add XML declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
# Create root element with namespace
root = doc.create_element('book')
root.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
doc.add_child(root)
# Add content
title = doc.create_element('dc:title')
title.text = 'XML Processing with Ruby'
root.add_child(title)
# Output formatted XML
puts doc.to_xml(indent: 2)
Real-world examples
Practical, runnable examples demonstrating Moxml usage in common scenarios are available in the examples directory.
These examples include:
- RSS Parser: Parse RSS/Atom feeds with XPath queries and namespace handling
- Web Scraper: Extract data from HTML/XML using DOM navigation and table parsing
- API Client: Build and parse XML API requests/responses with SOAP
Each example:
- Is fully documented with a detailed README
- Is self-contained and runnable
- Demonstrates best practices
- Includes sample data files
- Shows comprehensive error handling
Run any example directly:
ruby examples/rss_parser/rss_parser.rb
ruby examples/web_scraper/web_scraper.rb
ruby examples/api_client/api_client.rb
See the examples README for complete documentation and learning paths.
Working with documents
Using the builder pattern
The builder pattern provides a clean DSL for creating XML documents:
doc = Moxml::Builder.new(Moxml.new).build do
  declaration version: "1.0", encoding: "UTF-8"
  element 'library', xmlns: 'http://example.org/library' do
    element 'book' do
      element 'title' do
        text 'Ruby Programming'
      end
      element 'author' do
        text 'Jane Smith'
      end
      comment 'Publication details'
      element 'published', year: '2024'
      cdata '<custom>metadata</custom>'
    end
  end
end
Direct document manipulation
doc = Moxml.new.create_document
# Add declaration
doc.add_child(doc.create_declaration("1.0", "UTF-8"))
# Create root with namespace
root = doc.create_element('library')
root.add_namespace(nil, 'http://example.org/library')
root.add_namespace("dc", "http://purl.org/dc/elements/1.1/")
doc.add_child(root)
# Add elements with attributes
book = doc.create_element('book')
book['id'] = 'b1'
book['type'] = 'technical'
root.add_child(book)
# Add mixed content
book.add_child(doc.create_comment('Book details'))
title = doc.create_element('title')
title.text = 'Ruby Programming'
book.add_child(title)
Fluent interface API
Moxml provides a fluent, chainable API for improved developer experience:
element = doc.create_element('book')
  .set_attributes(id: "123", type: "technical")
  .with_namespace("dc", "http://purl.org/dc/elements/1.1/")
  .with_child(doc.create_element("title"))
For complete fluent API documentation including all chainable methods, convenience methods, and practical examples, see Working with Documents Guide.
SAX (Event-Driven) Parsing
SAX (Simple API for XML) provides memory-efficient, event-driven XML parsing for large documents.
When to use SAX:
- Processing very large XML files (>100MB)
- Memory-constrained environments
- Streaming data extraction
- Need to process data as it arrives
Quick example:
class BookExtractor < Moxml::SAX::ElementHandler
  attr_reader :books
  def initialize
    super
    @books = []
  end
  def on_start_element(name, attributes = {}, namespaces = {})
    super
    @books << { id: attributes["id"] } if name == "book"
  end
end
handler = BookExtractor.new
Moxml.new.sax_parse(xml_string, handler)
puts handler.books.inspect
For complete SAX documentation including all handler types, event methods, adapter support, and best practices, see SAX Parsing Guide.
XML objects and their methods
For complete node API reference including traversal methods, manipulation, queries, type checking, and node information, see Node API Reference.
Advanced features
XPath querying
Moxml provides efficient XPath querying with consistent node mapping:
# Find all book elements
books = doc.xpath('//book')
# Find with namespaces
titles = doc.xpath('//dc:title', 'dc' => 'http://purl.org/dc/elements/1.1/')
# Find first matching node
first_book = doc.at_xpath('//book')
Namespace handling
# Add namespace to element
element.add_namespace('dc', 'http://purl.org/dc/elements/1.1/')
# Create element in namespace
title = doc.create_element('dc:title')
For complete documentation on XPath querying, namespace handling, and accessing native implementations, see Advanced Features Guide.
Error handling
Moxml provides comprehensive error classes with enhanced context for debugging:
begin
  doc = Moxml.new.parse(xml_string, strict: true)
  results = doc.xpath("//book[@id='123']")
rescue Moxml::ParseError => e
  puts "Parse failed at line #{e.line}: #{e.message}"
rescue Moxml::XPathError => e
  puts "XPath error: #{e.expression}"
rescue Moxml::Error => e
  puts "XML processing error: #{e.message}"
end
For complete error class hierarchy, error types, best practices, and debugging techniques, see Error Handling Guide.
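
The order of the `rescue` clauses in the example above matters: Ruby tries them top to bottom, so the most specific classes must come before the general one. The sketch below illustrates this with a stand-in hierarchy; the `Sketch` module and the `classify` helper are hypothetical, and the assumption that the specific Moxml errors inherit from `Moxml::Error` is inferred from the rescue order rather than confirmed here.

```ruby
# Stand-in error hierarchy (hypothetical). It assumes, as the rescue order in
# the document suggests, that specific errors inherit from one base class.
module Sketch
  class Error < StandardError; end
  class ParseError < Error; end
  class XPathError < Error; end
end

# Runs the block and reports which rescue clause handled the failure.
def classify
  yield
  :ok
rescue Sketch::ParseError
  :parse_error
rescue Sketch::XPathError
  :xpath_error
rescue Sketch::Error
  # Catch-all for the library: must come last, or it would shadow the
  # more specific clauses above.
  :other_xml_error
end

puts classify { raise Sketch::ParseError, "unexpected end of input" }
puts classify { raise Sketch::Error, "something else" }
puts classify { 1 + 1 }
```

If the base class were listed first, every subclass would match it and the specific handlers would never run.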
Configuration
Moxml can be configured globally or per instance:
# Global configuration
Moxml.configure do |config|
  config.default_adapter = :nokogiri
  config.strict = true
  config.encoding = 'UTF-8'
end
# Instance configuration
context = Moxml.new do |config|
  config.adapter = :oga
  config.strict = false
end
For all configuration options, adapter selection, serialization options, and environment-based configuration, see Configuration Guide.
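
As one possible shape for environment-based configuration, the sketch below reads the adapter name from an environment variable and validates it before use. The `MOXML_ADAPTER` variable name, the `adapter_from_env` helper, and the `SUPPORTED_ADAPTERS` list (in particular the `:libxml` symbol) are assumptions for illustration, not names defined by Moxml.

```ruby
# Adapter names accepted by this sketch (assumed; check the Configuration Guide).
SUPPORTED_ADAPTERS = %i[nokogiri oga rexml ox libxml headed_ox].freeze

# Reads the adapter from a hypothetical MOXML_ADAPTER variable, falling back
# to the default. Accepts ENV or any Hash-like object that responds to #fetch.
def adapter_from_env(env = ENV, default: :nokogiri)
  name = env.fetch("MOXML_ADAPTER", default.to_s).strip.downcase.to_sym
  return name if SUPPORTED_ADAPTERS.include?(name)

  raise ArgumentError,
        "Unsupported adapter #{name.inspect}; expected one of #{SUPPORTED_ADAPTERS.join(', ')}"
end

puts adapter_from_env({ "MOXML_ADAPTER" => "oga" })
puts adapter_from_env({})

# Possible use with the global configuration block shown above:
#   Moxml.configure { |config| config.default_adapter = adapter_from_env }
```

Failing fast on an unknown name keeps a typo in the deployment environment from surfacing later as a harder-to-trace adapter error.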
Thread safety
For complete information on thread-safe patterns, context management, and concurrent processing, see the Thread Safety Guide.
Performance considerations
For detailed performance optimization strategies, memory management best practices, and efficient querying patterns, see the Performance Considerations Guide.
Best practices
For comprehensive best practices covering XPath queries, adapter selection, error handling, namespace handling, memory management, thread safety, performance optimization, and testing strategies, see Best Practices Guide.
Specific adapter limitations
Ox adapter
The Ox adapter provides maximum parsing speed but has XPath limitations.
XPath limitations:
- No attribute value predicates: //book[@id='123'] ❌
- No logical operators, position predicates, text predicates ❌
- No namespace queries, parent axis, sibling axes ❌
- No XPath functions ❌
Workaround: Use Ruby enumerable methods:
# Instead of: doc.xpath("//book[@id='123']")
doc.xpath("//book").find { |book| book["id"] == "123" }
For complete Ox adapter documentation including all limitations and workarounds, see Ox Adapter Guide.
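
To show that the Enumerable workaround selects the same node as the attribute value predicate, here is a small sketch. It deliberately uses REXML directly rather than Moxml so that it runs with a standard Ruby installation; the sample XML is invented for illustration, and the REXML calls are not the Moxml API shown above.

```ruby
require "rexml/document"

# Sample data invented for this sketch.
xml = <<~XML
  <library>
    <book id="123"><title>Ruby Programming</title></book>
    <book id="456"><title>XML Processing with Ruby</title></book>
  </library>
XML

doc = REXML::Document.new(xml)

# Predicate form: the XPath engine filters by attribute value.
by_xpath = REXML::XPath.first(doc, "//book[@id='123']")

# Enumerable form: select every book with a simple path, then filter in Ruby.
by_find = REXML::XPath.match(doc, "//book").find do |book|
  book.attributes["id"] == "123"
end

puts by_find.elements["title"].text
puts "same node: #{by_xpath.equal?(by_find)}"
```

The Enumerable form walks every `book` element in Ruby, so it trades some query speed for independence from the adapter's XPath predicate support.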
HeadedOx adapter
The HeadedOx adapter combines Ox’s fast C-based XML parsing with Moxml’s comprehensive pure Ruby XPath 1.0 engine.
Status: Production-ready v1.2 (99.20% pass rate, 1,992/2,008 tests)
Key features:
- Fast XML parsing (Ox C extension)
- All 27 XPath 1.0 functions
- 6 XPath axes (child, descendant, parent, attribute, self, descendant-or-self)
- Expression caching for performance
- Pure Ruby XPath engine (debuggable)
When to use:
- Need Ox’s fast parsing with comprehensive XPath
- Want XPath functions (count, sum, contains, etc.)
- Prefer pure Ruby XPath for debugging
- Basic namespace queries are sufficient
# Use HeadedOx adapter
context = Moxml.new(:headed_ox)
doc = context.parse(xml_string)
# Full XPath 1.0 support
books = doc.xpath('//book[@price < 20]')
count = doc.xpath('count(//book)')
titles = doc.xpath('//book/title[contains(., "Ruby")]')
For complete HeadedOx documentation including architecture, XPath capabilities, known limitations, and usage examples, see HeadedOx Adapter Guide and Limitations Documentation.
LibXML adapter
DOCTYPE limitations:
- DOCTYPE parsing works
- DOCTYPE round-trip preservation is limited
- DOCTYPE cannot be reliably re-serialized after parsing
Performance:
- Serialization speed: ~120 ips (slower than target)
- Parsing speed: Good
- For high-throughput serialization, consider Ox or Nokogiri
Other adapters
Nokogiri, Oga, REXML:
All three adapters have near-complete feature support with only minor edge case limitations. Use these adapters when you need full XPath and namespace support.
Development and testing
For complete information on development setup, testing strategies, benchmarking, and coverage reporting, see the Development and Testing Guide.
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin feature/my-new-feature)
- Create a new Pull Request
License
Copyright Ribose.
This project is licensed under the Ribose 3-Clause BSD License. See the LICENSE.md file for details.