RXerces
A Ruby XML library with a Nokogiri-compatible API, powered by Apache Xerces-C instead of libxml2.
Overview
RXerces provides a familiar Nokogiri-like interface for XML parsing and manipulation, but uses the robust Apache Xerces-C XML parser under the hood. This allows Ruby developers to leverage Xerces-C's performance and standards compliance while maintaining compatibility with existing Nokogiri-based code.
Features
- ✅ Nokogiri-compatible API
- ✅ Powered by Apache Xerces-C
- ✅ Parse XML documents
- ✅ Navigate and manipulate DOM trees
- ✅ Read and write node attributes
- ✅ Query nodes with XPath (basic support)
- ✅ Serialize documents back to XML strings
Installation
Prerequisites
You need to have Xerces-C installed on your system:
macOS (Homebrew):
brew install xerces-c
Ubuntu/Debian:
sudo apt-get install libxerces-c-dev
Fedora/RHEL:
sudo yum install xerces-c-devel
Install the Gem
Add this line to your application's Gemfile:
gem 'rxerces'
And then execute:
bundle install
Or install it yourself as:
gem install rxerces
Usage
Basic Parsing
require 'rxerces'
# Parse XML string
xml = '<root><person name="Alice">Hello</person></root>'
doc = RXerces.XML(xml)
# Access root element
root = doc.root
puts root.name # => "root"
Nokogiri Compatibility
RXerces provides a Nokogiri module for drop-in compatibility:
require 'rxerces'
# Use Nokogiri syntax
doc = Nokogiri.XML('<root><child>text</child></root>')
puts doc.root.name # => "root"
# Classes are aliased
Nokogiri::XML::Document == RXerces::XML::Document # => true
Working with Nodes
# Parse XML
xml = " <library>\n <book id=\"1\" title=\"1984\">\n <author>George Orwell</author>\n <year>1949</year>\n </book>\n <book id=\"2\" title=\"Brave New World\">\n <author>Aldous Huxley</author>\n <year>1932</year>\n </book>\n </library>\n"
doc = RXerces.XML(xml)
root = doc.root
# Get attributes
book = root.children.find { |n| n.is_a?(RXerces::XML::Element) }
puts book['id'] # => "1"
puts book['title'] # => "1984"
# Set attributes
book['isbn'] = '978-0451524935'
puts book['isbn'] # => "978-0451524935"
# Get text content
author = book.children.find { |n| n.name == 'author' }
puts author.text # => "George Orwell"
# Set text content
author.text = "Eric Arthur Blair"
puts author.text # => "Eric Arthur Blair"
Navigating the DOM
# Get all children
root.children.each do |child|
puts "#{child.name}: #{child.class}"
end
# Find specific elements
books = root.children.select { |n| n.is_a?(RXerces::XML::Element) && n.name == 'book' }
books.each do |book|
puts "Book ID: #{book['id']}"
end
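The examples above only look at direct children. To reach nested elements without XPath, one option is a small recursive walk. The sketch below is an illustration, not part of RXerces: `descendants` is a hypothetical helper that assumes only that every node responds to `#children` and returns an array, as described in the API Reference. The `require` is wrapped in a rescue so the helper can be loaded even where the gem is not installed.

```ruby
begin
  require 'rxerces'
rescue LoadError
  # RXerces is not installed; the helper below still loads.
end

# Hypothetical helper (not part of RXerces): return every node below
# +node+ in document order, using a depth-first walk over #children.
def descendants(node)
  node.children.flat_map { |child| [child, *descendants(child)] }
end

if defined?(RXerces)
  xml = '<library><book id="1"><author>George Orwell</author></book></library>'
  doc = RXerces.XML(xml)

  # Collect nested <author> elements, however deep they sit.
  authors = descendants(doc.root).select { |n| n.name == 'author' }
  authors.each { |author| puts author.text }
end
```

Because the helper returns a plain array, it composes with `select`, `find`, and `map` in the same way as `root.children`.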
Serialization
# Convert document back to XML string
xml_string = doc.to_xml
puts xml_string
# or use to_s
puts doc.to_s
XPath Queries
RXerces supports XPath queries using Xerces-C's XPath implementation:
xml = " <library>\n <book>\n <title>1984</title>\n <author>George Orwell</author>\n </book>\n <book>\n <title>Brave New World</title>\n <author>Aldous Huxley</author>\n </book>\n </library>\n"
doc = RXerces.XML(xml)
# Find all book elements
books = doc.xpath('//book')
puts books.length # => 2
# Find all titles
titles = doc.xpath('//title')
titles.each do |title|
puts title.text.strip
end
# Use path expressions
authors = doc.xpath('/library/book/author')
puts authors.length # => 2
# Query from a specific node
first_book = books[0]
title = first_book.xpath('.//title').first
puts title.text # => "1984"
Note on XPath Support: Xerces-C implements the XML Schema XPath subset, not full XPath 1.0. Supported features include:
- Basic path expressions (/, //, ., ..)
- Element selection by name
- Descendant and child axes
Not supported:
- Attribute predicates ([@attribute="value"])
- XPath functions (last(), position(), text())
- Comparison operators in predicates
For more complex queries, you can combine basic XPath with Ruby's select and find methods.
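As a rough illustration of that workaround, an attribute predicate such as `//book[@id="2"]` can be emulated by selecting all `book` elements with a basic path and filtering them in Ruby. `select_by_attribute` is a hypothetical helper, not part of RXerces. It assumes only the `#[]` attribute accessor and the `Enumerable` behavior of `NodeSet` listed in the API Reference. The `require` is wrapped in a rescue so the helper can be loaded even where the gem is not installed.

```ruby
begin
  require 'rxerces'
rescue LoadError
  # RXerces is not installed; the helper below still loads.
end

# Hypothetical helper (not part of RXerces): keep only the nodes whose
# attribute +name+ equals +value+. Works with any Enumerable of objects
# that respond to #[], which is all this sketch assumes about nodes.
def select_by_attribute(nodes, name, value)
  nodes.select { |node| node[name] == value }
end

if defined?(RXerces)
  xml = '<library><book id="1"/><book id="2"/></library>'
  doc = RXerces.XML(xml)

  # Stands in for the unsupported query //book[@id="2"]
  matches = select_by_attribute(doc.xpath('//book'), 'id', '2')
  matches.each { |book| puts book['id'] }
end
```

The same pattern covers comparison operators: replace the equality check in the block with whatever Ruby condition the predicate would have expressed.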
API Reference
RXerces Module
- RXerces.XML(string) - Parse XML string and return Document
- RXerces.parse(string) - Alias for XML
RXerces::XML::Document
- .parse(string) - Parse XML string (class method)
- #root - Get root element
- #to_s / #to_xml - Serialize to XML string
- #xpath(path) - Query with XPath (returns NodeSet)
RXerces::XML::Node
- #name - Get node name
- #text / #content - Get text content
- #text= / #content= - Set text content
- #[attribute] - Get attribute value
- #[attribute]= - Set attribute value
- #children - Get array of child nodes
- #xpath(path) - Query descendants with XPath
RXerces::XML::Element
Inherits all methods from Node. Represents element nodes.
RXerces::XML::Text
Inherits all methods from Node. Represents text nodes.
RXerces::XML::NodeSet
- #length / #size - Get number of nodes
- #[] - Access node by index
- #each - Iterate over nodes (Enumerable)
- #to_a - Convert to array
Development
Building the Extension
bundle install
bundle exec rake compile
Running Tests
bundle exec rspec
Running Tests with Compilation
bundle exec rake
Implementation Notes
- Uses Apache Xerces-C 3.x for XML parsing
- C++ extension compiled with Ruby's native extension API
- XPath support is basic (full XPath requires additional implementation)
- Memory management handled by Ruby's GC and Xerces-C's DOM
Differences from Nokogiri
While RXerces aims for API compatibility with Nokogiri, there are some differences:
- Parser Backend: Uses Xerces-C instead of libxml2
- XPath: Limited to the XML Schema XPath subset (see the note under XPath Queries)
- Features: Subset of Nokogiri's full feature set
- Performance: Different performance characteristics due to Xerces-C
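Since only a subset of Nokogiri's API is available, code meant to run against either library may want to check for the methods it needs before relying on them. The sketch below uses plain `respond_to?` for this. `missing_methods` is a hypothetical helper, and the method list is illustrative only: it makes no claim about which of those methods RXerces implements beyond those listed in the API Reference.

```ruby
begin
  require 'rxerces'
rescue LoadError
  # RXerces is not installed; the helper below still loads.
end

# Hypothetical helper (not part of RXerces or Nokogiri): return the
# method names from +required+ that +object+ does not respond to.
def missing_methods(object, required)
  required.reject { |name| object.respond_to?(name) }
end

if defined?(RXerces)
  doc = RXerces.XML('<root/>')

  # Illustrative list; replace with the calls your own code makes.
  needed = %i[root xpath to_xml css]
  missing = missing_methods(doc, needed)

  if missing.empty?
    puts 'All required methods are available'
  else
    puts "Missing: #{missing.join(', ')}"
  end
end
```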
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
License
MIT License - see LICENSE file for details
Credits
- Built with Apache Xerces-C
- API inspired by Nokogiri
Misc
This library was written almost entirely with AI (Claude Sonnet 4.5). It was mainly a reaction to the lack of maintainers for libxml2 and the generally sorry state of that library. Since Nokogiri uses it under the hood, I thought it best to create an alternative.
Copyright
(C) 2025, Daniel J. Berger All Rights Reserved
Author
- Daniel J. Berger