Moxml provides a unified API for XML processing in Ruby, supporting multiple XML parsing backends (Nokogiri, Ox, and Oga).
Moxml ("mox-em-el") stands for "Modular XML" and aims to provide a consistent interface for working with XML documents, regardless of the underlying XML library.
Installation
gem 'moxml'
Basic usage
Configuration
Configure Moxml to use your preferred XML backend:
require 'moxml'
Moxml.configure do |config|
config.backend = :nokogiri # or :ox, :oga
end
Creating and parsing documents
# Create new empty document
doc = Moxml::Document.new
# Parse from string
doc = Moxml::Document.parse("<root><child>content</child></root>")
# Parse with encoding
doc = Moxml::Document.parse(xml_string, encoding: 'UTF-8')
Document creation patterns
# Method 1: Create and build
doc = Moxml::Document.new
root = doc.create_element('root')
doc.add_child(root)
# Method 2: Parse from string
doc = Moxml::Document.parse("<root/>")
# Method 3: Parse with encoding
doc = Moxml::Document.parse(xml_string, encoding: 'UTF-8')
# Method 4: Parse with options (passed as keyword arguments)
doc = Moxml::Document.parse(xml_string, encoding: 'UTF-8', strict: true)
Common XML patterns
# Working with namespaces
doc = Moxml::Document.new
root = doc.create_element('root')
root['xmlns:custom'] = 'http://example.com/ns'
child = doc.create_element('custom:element')
root.add_child(child)
# Creating structured data
person = doc.create_element('person')
person['id'] = '123'
name = doc.create_element('name')
name.add_child(doc.create_text('John Doe'))
person.add_child(name)
# Working with attributes
element = doc.create_element('div')
element['class'] = 'container'
element['data-id'] = '123'
element['style'] = 'color: blue'
# Handling special characters
text = doc.create_text('Special chars: < > & " \'')
cdata = doc.create_cdata('<script>alert("Hello!");</script>')
# Processing instructions
pi = doc.create_processing_instruction('xml-stylesheet',
'type="text/xsl" href="style.xsl"')
doc.add_child(pi)
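The special-character example above relies on the serializer to escape markup characters in text nodes. As a rough illustration of what that escaping amounts to, here is a minimal sketch using a hypothetical helper, `escape_xml_text`, which is not part of Moxml. The exact output of each backend may differ (some serializers also escape quotes), so treat this as an approximation rather than a description of Moxml's behavior.

```ruby
# Hypothetical helper (not part of Moxml): escape the characters that are
# significant in XML text content. Quotes only need escaping inside
# attribute values, so they are left untouched here.
XML_TEXT_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def escape_xml_text(text)
  # String#gsub accepts a Hash and replaces each match with its mapped value.
  text.gsub(/[&<>]/, XML_TEXT_ESCAPES)
end

puts escape_xml_text('Special chars: < > & " \'')
```

CDATA sections, by contrast, carry their content unescaped, which is why the `<script>` example above uses one.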
Working with elements
# Create new element
element = Moxml::Element.new('tagname')
# Add attributes
element['class'] = 'content'
# Access attributes
class_attr = element['class']
# Add child elements
child = element.create_element('child')
element.add_child(child)
# Access text content
text_content = element.text
# Add text content
text = element.create_text('content')
element.add_child(text)
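# Chaining operations (assumes add_child returns the receiver, as the chained calls imply)
element
  .add_child(doc.create_element('child'))
  .add_child(doc.create_text('content'))
# Attribute assignment cannot be part of the chain, so it is a separate statement
element['class'] = 'new-class'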
# Complex element creation
div = doc.create_element('div')
div['class'] = 'container'
div.add_child(doc.create_element('span'))
.add_child(doc.create_text('Hello'))
div.add_child(doc.create_element('br'))
div.add_child(doc.create_text('World'))
Working with different node types
# Text nodes with various content
plain_text = Moxml::Text.new("Simple text")
multiline_text = Moxml::Text.new("Line 1\nLine 2")
special_chars = Moxml::Text.new("Special: & < > \" '")
# CDATA sections for different content types
script_cdata = Moxml::Cdata.new("function() { alert('Hello!'); }")
xml_cdata = Moxml::Cdata.new("<data><item>value</item></data>")
mixed_cdata = Moxml::Cdata.new("Text with ]]> characters")
# Comments for documentation
todo_comment = Moxml::Comment.new("TODO: Add validation")
section_comment = Moxml::Comment.new("----- Section Break -----")
debug_comment = Moxml::Comment.new("DEBUG: Remove in production")
# Processing instructions for various uses
style_pi = Moxml::ProcessingInstruction.new(
"xml-stylesheet",
'type="text/css" href="style.css"'
)
php_pi = Moxml::ProcessingInstruction.new(
  "php",
  'echo $var;' # PI content must not contain "?>", which would end the instruction
)
custom_pi = Moxml::ProcessingInstruction.new(
"custom-processor",
'param1="value1" param2="value2"'
)
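The `mixed_cdata` example above contains `]]>`, the sequence that terminates a CDATA section, so it cannot appear inside a single section verbatim. Whether Moxml or the selected backend handles this automatically is not stated here. The following is only a sketch of the conventional workaround, using a hypothetical helper `cdata_sections` that is not part of Moxml:

```ruby
# Hypothetical helper (not part of Moxml): serialize text as one or more
# CDATA sections so that the terminator "]]>" never appears inside a section.
def cdata_sections(text)
  # Split every "]]>" between "]]" and ">": close the current section after
  # "]]" and open a new one that starts with ">".
  "<![CDATA[" + text.gsub("]]>", "]]]]><![CDATA[>") + "]]>"
end

puts cdata_sections("Text with ]]> characters")
# prints: <![CDATA[Text with ]]]]><![CDATA[> characters]]>
```

A parser reads the result as two adjacent sections, `Text with ]]` and `> characters`, whose concatenation is the original text.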
Element manipulation examples
# Building complex structures
doc = Moxml::Document.new
root = doc.create_element('html')
doc.add_child(root)
# Create head section
head = doc.create_element('head')
root.add_child(head)
title = doc.create_element('title')
title.add_child(doc.create_text('Example Page'))
head.add_child(title)
meta = doc.create_element('meta')
meta['charset'] = 'UTF-8'
head.add_child(meta)
# Create body section
body = doc.create_element('body')
root.add_child(body)
div = doc.create_element('div')
div['class'] = 'container'
body.add_child(div)
# Add multiple paragraphs
3.times do |i|
p = doc.create_element('p')
p.add_child(doc.create_text("Paragraph #{i + 1}"))
div.add_child(p)
end
# Working with lists
ul = doc.create_element('ul')
div.add_child(ul)
['Item 1', 'Item 2', 'Item 3'].each do |text|
li = doc.create_element('li')
li.add_child(doc.create_text(text))
ul.add_child(li)
end
# Adding link element
a = doc.create_element('a')
a['href'] = 'https://example.com'
a.add_child(doc.create_text('Visit Example'))
div.add_child(a)
Advanced node manipulation
# Cloning nodes
original = doc.create_element('div')
original['id'] = 'original'
clone = original.clone
# Moving nodes
target = doc.create_element('target')
source = doc.create_element('source')
source.add_child(doc.create_text('Content'))
target.add_child(source)
# Replacing nodes
old_node = doc.at_xpath('//old')
new_node = doc.create_element('new')
old_node.replace(new_node)
# Inserting before/after
reference = doc.create_element('reference')
before = doc.create_element('before')
after = doc.create_element('after')
reference.add_previous_sibling(before)
reference.add_next_sibling(after)
# Conditional manipulation
element = doc.at_xpath('//conditional')
if element['flag'] == 'true'
element.add_child(doc.create_text('Flag is true'))
else
element.remove
end
Working with namespaces
# Creating namespaced document
doc = Moxml::Document.new
root = doc.create_element('root')
root['xmlns'] = 'http://example.com/default'
root['xmlns:custom'] = 'http://example.com/custom'
doc.add_child(root)
# Adding namespaced elements
default_elem = doc.create_element('default-elem')
custom_elem = doc.create_element('custom:elem')
root.add_child(default_elem)
root.add_child(custom_elem)
# Working with attributes in namespaces
custom_elem['custom:attr'] = 'value'
# Accessing namespaced content
ns_elem = doc.at_xpath('//custom:elem')
ns_attr = ns_elem['custom:attr']
Document serialization examples
# Basic serialization
xml_string = doc.to_xml
# Pretty printing with indentation
formatted_xml = doc.to_xml(
indent: 2,
pretty: true
)
# Controlling XML declaration
with_declaration = doc.to_xml(
xml_declaration: true,
encoding: 'UTF-8',
standalone: 'yes'
)
# Compact output
minimal_xml = doc.to_xml(
indent: 0,
pretty: false,
xml_declaration: false
)
# Custom formatting
custom_format = doc.to_xml(
indent: 4,
encoding: 'ISO-8859-1',
xml_declaration: true
)
Implementation details
Memory management
# Efficient document handling
doc = Moxml::Document.parse(large_xml)
begin
# Process document
result = process_document(doc)
ensure
# Clear references
doc = nil
GC.start
end
# Streaming large node sets
doc.xpath('//large-set/*').each do |node|
# Process node
process_node(node)
# Clear reference
node = nil
end
# Handling large collections
def process_large_nodeset(nodeset)
nodeset.each do |node|
yield node if block_given?
end
ensure
# Clear references
nodeset = nil
GC.start
end
Backend-specific optimizations
# Nokogiri-specific optimizations
if Moxml.config.backend == :nokogiri
# Use native CSS selectors
nodes = doc.native.css('complex > selector')
nodes.each do |native_node|
node = Moxml::Node.wrap(native_node)
# Process node
end
# Use native XPath
results = doc.native.xpath('//complex/xpath/expression')
end
# Ox-specific optimizations
if Moxml.config.backend == :ox
# Use native parsing options (passed as keyword arguments)
doc = Moxml::Document.parse(xml, mode: :generic, effort: :tolerant, smart: true)
# Direct element creation
element = Ox::Element.new('name')
wrapped = Moxml::Element.new(element)
end
# Oga-specific optimizations
if Moxml.config.backend == :oga
# Use native parsing features (passed as keyword arguments)
doc = Moxml::Document.parse(xml, encoding: 'UTF-8', strict: true)
# Direct access to native methods
nodes = doc.native.xpath('//element')
end
Threading patterns
# Thread-safe document creation
require 'thread'
class ThreadSafeXmlProcessor
def initialize
@mutex = Mutex.new
end
def process_document(xml_string)
@mutex.synchronize do
doc = Moxml::Document.parse(xml_string)
# Process document
result = doc.to_xml
doc = nil
result
end
end
end
# Parallel document processing
def process_documents(xml_strings)
threads = xml_strings.map do |xml|
Thread.new do
doc = Moxml::Document.parse(xml)
# Process document
doc = nil
end
end
threads.each(&:join)
end
# Thread-local document storage
Thread.new do
Thread.current[:document] = Moxml::Document.new
# Process document
ensure
Thread.current[:document] = nil
end
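The parallel example above joins its threads but discards whatever each thread computed. If the results are needed, `Thread#value` joins a thread and returns its block's result. The sketch below shows that pattern with a stand-in lambda in place of real parsing so that it runs without the gem; `process_in_parallel` and `PROCESS` are hypothetical names, not Moxml API.

```ruby
# Stand-in for something like Moxml::Document.parse(xml).to_xml, so the
# sketch runs without the gem.
PROCESS = ->(xml) { xml.strip.upcase }

# Run the given block once per input, each in its own thread, and return
# the results in input order.
def process_in_parallel(inputs, &work)
  threads = inputs.map do |input|
    # Passing the input as a thread argument gives each thread its own copy
    # of the reference.
    Thread.new(input) { |item| work.call(item) }
  end
  # Thread#value waits for the thread and returns its block's result,
  # re-raising any exception that terminated the thread.
  threads.map(&:value)
end

p process_in_parallel([" <a/> ", "<b/>"], &PROCESS)
```

Because every thread works on its own input and shares no mutable state, no mutex is needed in this sketch. Whether a given backend is safe to use concurrently is a separate question that this example does not answer.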
Troubleshooting
Common issues and solutions
Parsing errors
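# Handle malformed XML (retry only once after cleanup to avoid an endless loop)
attempts = 0
begin
  doc = Moxml::Document.parse(xml_string)
rescue Moxml::ParseError => e
  puts "Parse error at line #{e.line}, column #{e.column}: #{e.message}"
  attempts += 1
  raise if attempts > 1
  # Attempt recovery; cleanup_xml is an application-defined helper
  xml_string = cleanup_xml(xml_string)
  retry
end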
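# Handle encoding issues
encoding = 'UTF-8'
begin
  doc = Moxml::Document.parse(xml_string, encoding: encoding)
rescue Moxml::ParseError => e
  if e.message =~ /encoding/
    # Try detecting the encoding; detect_encoding is an application-defined helper
    detected_encoding = detect_encoding(xml_string)
    # Retry only with a different encoding, so the loop always terminates
    if detected_encoding && detected_encoding != encoding
      encoding = detected_encoding
      retry
    end
  end
  raise
end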
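Both recovery patterns above rely on `retry`, which re-runs the `begin` body and has no built-in limit of its own. One way to keep the limit in a single place is a small wrapper. The sketch below uses a stand-in error class so that it runs without the gem; `with_parse_retries` and `DemoParseError` are hypothetical names, and in real code the rescued class would be `Moxml::ParseError`.

```ruby
# Stand-in error class so the sketch runs without Moxml.
class DemoParseError < StandardError; end

# Hypothetical helper: call the block with the current attempt number,
# retrying on DemoParseError until max_attempts is reached, then re-raise.
def with_parse_retries(max_attempts: 2)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue DemoParseError
    retry if attempt < max_attempts
    raise
  end
end

# The first attempt fails, the second succeeds.
result = with_parse_retries do |attempt|
  raise DemoParseError, "malformed input" if attempt == 1
  "parsed on attempt #{attempt}"
end
puts result
```

The block receives the attempt number so that it can switch strategy between attempts, for example by cleaning the input or changing the encoding before parsing again.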
Memory issues
# Handle large documents
def process_large_document(path)
# Read and process in chunks
File.open(path) do |file|
doc = Moxml::Document.parse(file)
doc.xpath('//chunk').each do |chunk|
process_chunk(chunk)
chunk = nil
end
doc = nil
end
GC.start
end
# Monitor memory usage
require 'get_process_mem'
def memory_safe_processing(xml)
memory = GetProcessMem.new
initial_memory = memory.mb
doc = Moxml::Document.parse(xml)
result = process_document(doc)
doc = nil
GC.start
final_memory = memory.mb
puts "Memory usage: #{final_memory - initial_memory}MB"
result
end
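The monitor above depends on the `get_process_mem` gem and reports process memory in megabytes. Where adding a gem is not an option, Ruby's built-in `GC.stat` offers a different and coarser signal: the number of objects allocated. The sketch below uses a hypothetical helper, `allocated_objects`, with a stand-in workload in place of document parsing. It counts allocations rather than bytes, so it complements the measurement above rather than replacing it.

```ruby
# Hypothetical helper: report how many Ruby objects a block allocates,
# using only the interpreter's own GC statistics.
def allocated_objects
  GC.start
  before = GC.stat(:total_allocated_objects)
  result = yield
  after = GC.stat(:total_allocated_objects)
  # total_allocated_objects only ever grows, so the difference is the
  # number of allocations made while the block ran.
  [result, after - before]
end

# Stand-in workload; in real code this would parse and process a document.
result, allocated = allocated_objects { Array.new(1_000) { |i| "item #{i}" } }
puts "Allocated roughly #{allocated} objects for #{result.size} items"
```

Backends implemented as C extensions allocate most of their memory outside the Ruby heap, so a low object count does not by itself imply low memory use.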
Backend-specific issues
# Handle backend limitations
def safe_xpath(doc, xpath)
case Moxml.config.backend
when :nokogiri
doc.xpath(xpath)
when :ox
# Ox has limited XPath support
fallback_xpath_search(doc, xpath)
when :oga
# Handle Oga-specific XPath syntax
modified_xpath = adjust_xpath_for_oga(xpath)
doc.xpath(modified_xpath)
end
end
# Handle backend switching
def with_backend(backend)
original_backend = Moxml.config.backend
Moxml.config.backend = backend
yield
ensure
Moxml.config.backend = original_backend
end
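The value of the `ensure` clause in `with_backend` is that the original backend is restored even when the block raises. The sketch below demonstrates this with a stand-in configuration object so that it runs without the gem; `DemoConfig`, `CONFIG`, and `with_demo_backend` are hypothetical names that only mirror the shape of the helper above.

```ruby
# Stand-in for Moxml.config so the sketch runs without the gem.
DemoConfig = Struct.new(:backend)
CONFIG = DemoConfig.new(:nokogiri)

# Same shape as with_backend above, operating on the stand-in config.
def with_demo_backend(backend)
  original = CONFIG.backend
  CONFIG.backend = backend
  yield
ensure
  # Runs on normal return and on exceptions alike.
  CONFIG.backend = original
end

begin
  with_demo_backend(:ox) do
    puts "inside: #{CONFIG.backend}"
    raise "simulated failure"
  end
rescue RuntimeError
  puts "after:  #{CONFIG.backend}"
end
```

If the configuration is a single global object, as `Moxml.config` suggests (an assumption here), switching it temporarily affects every thread that reads it during the block, so this pattern is best kept to single-threaded code or guarded by a lock.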
Performance optimization
Document creation
# Efficient document building
def build_large_document
doc = Moxml::Document.new
root = doc.create_element('root')
doc.add_child(root)
# Pre-allocate elements
elements = Array.new(1000) do |i|
elem = doc.create_element('item')
elem['id'] = i.to_s
elem
end
# Batch add elements
elements.each do |elem|
root.add_child(elem)
end
doc
end
# Memory-efficient processing
def process_large_xml(xml_string)
result = []
doc = Moxml::Document.parse(xml_string)
doc.xpath('//item').each do |item|
# Process and immediately discard
result << process_item(item)
item = nil
end
doc = nil
GC.start
result
end
Query optimization
# Optimize node selection
def efficient_node_selection(doc)
# Cache frequently used nodes
@header_nodes ||= doc.xpath('//header').to_a
# Use specific selectors
doc.xpath('//specific/path') # Better than '//*[name()="specific"]'
# Combine queries when possible
doc.xpath('//a | //b') # Better than two separate queries
end
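One caveat with the `@header_nodes ||=` cache above: it stores the result for the first document only, so a later call with a different document would return stale nodes. A sketch of a cache keyed by document and query follows. `QueryCache` is a hypothetical class, not part of Moxml, and plain objects stand in for parsed documents so that the example runs without the gem.

```ruby
# Hypothetical per-document query cache (not part of Moxml).
class QueryCache
  def initialize
    # compare_by_identity keys entries by the document object itself rather
    # than by #hash and #eql?, which wrapper classes may override.
    @store = {}.compare_by_identity
  end

  # Return the cached result for (doc, xpath), running the block on a miss.
  def fetch(doc, xpath)
    per_doc = (@store[doc] ||= {})
    per_doc.fetch(xpath) { per_doc[xpath] = yield }
  end
end

cache = QueryCache.new
doc_a = Object.new # stand-ins for two parsed documents
doc_b = Object.new
calls = 0
cache.fetch(doc_a, '//header') { calls += 1; [:a_header] }
cache.fetch(doc_a, '//header') { calls += 1; [:a_header] } # served from the cache
cache.fetch(doc_b, '//header') { calls += 1; [:b_header] }
puts calls # prints 2
```

The cache holds a reference to every document it has seen, which keeps those documents alive. Discard the cache together with the documents when they are no longer needed.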
# Optimize attribute access
def efficient_attribute_handling(element)
# Cache attribute values
@cached_attrs ||= element.attributes
# Direct attribute access
value = element['attr'] # Better than element.attributes['attr']
# Batch attribute updates
attrs = {'id' => '1', 'class' => 'new', 'data' => 'value'}
attrs.each { |k,v| element[k] = v }
end
Serialization optimization
# Efficient output generation
def optimized_serialization(doc)
# Minimal output
compact = doc.to_xml(
indent: 0,
pretty: false,
xml_declaration: false
)
# Balanced formatting
readable = doc.to_xml(
indent: 2,
pretty: true,
xml_declaration: true
)
# Stream large documents
File.open('large.xml', 'w') do |file|
doc.write_to(file, indent: 2)
end
end
Debugging tips
Inspection helpers
# Debug node structure
def inspect_node(node, level = 0)
indent = " " * level
puts "#{indent}#{node.class.name}: #{node.name}"
if node.respond_to?(:attributes)
node.attributes.each do |name, attr|
puts "#{indent} @#{name}=#{attr.value.inspect}"
end
end
if node.respond_to?(:children)
node.children.each { |child| inspect_node(child, level + 1) }
end
end
# Track node operations
# The block receives a counter hash and increments it as it creates or removes nodes
def debug_node_operations
  counts = { created: 0, removed: 0 }
  yield counts
ensure
  puts "Nodes created: #{counts[:created]}"
  puts "Nodes removed: #{counts[:removed]}"
end
Backend validation
# Verify backend behavior
def verify_backend_compatibility
doc = Moxml::Document.new
# Test basic operations
element = doc.create_element('test')
doc.add_child(element)
# Verify node handling
raise "Node creation failed" unless doc.root
raise "Node type wrong" unless doc.root.is_a?(Moxml::Element)
# Verify serialization
xml = doc.to_xml
raise "Serialization failed" unless xml.include?('<test/>')
puts "Backend verification successful"
rescue => e
puts "Backend verification failed: #{e.message}"
end
Error handling
Moxml provides unified error handling:
- Moxml::Error: base error class
- Moxml::ParseError: XML parsing errors
- Moxml::ArgumentError: invalid argument errors
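A shared base class means that callers can rescue at whatever level of detail they need. The sketch below mirrors the list above with a stand-in module so that it runs without the gem. It assumes that the two specific classes inherit from the base class, which the description "base error class" suggests but does not spell out; `DemoXml` and `classify` are hypothetical names.

```ruby
# Stand-in hierarchy mirroring the list above; the real classes live under Moxml.
module DemoXml
  class Error < StandardError; end
  class ParseError < Error; end
  class ArgumentError < Error; end
end

# Classify an exception by the most specific class that matches.
# Order matters: subclasses must be tested before their base class.
def classify(error)
  case error
  when DemoXml::ParseError    then :parse
  when DemoXml::ArgumentError then :argument
  when DemoXml::Error         then :library
  else                             :other
  end
end

p classify(DemoXml::ParseError.new("bad markup"))  # => :parse
p classify(DemoXml::Error.new("generic"))          # => :library
p classify(::ArgumentError.new("core Ruby error")) # => :other
```

A namespaced `ArgumentError` is a different class from Ruby's built-in `::ArgumentError`, so rescuing one does not catch the other.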
Error handling patterns
# Handle parsing errors
begin
doc = Moxml::Document.parse(xml_string)
rescue Moxml::ParseError => e
logger.error "Parse error: #{e.message}"
logger.error "At line #{e.line}, column #{e.column}"
raise
end
# Handle invalid operations
begin
element['invalid/name'] = 'value'
rescue Moxml::ArgumentError => e
logger.warn "Invalid operation: #{e.message}"
# Use alternative approach
end
# Custom error handling
class XmlProcessor
def process(xml)
doc = Moxml::Document.parse(xml)
yield doc
rescue Moxml::Error => e
handle_moxml_error(e)
rescue StandardError => e
handle_standard_error(e)
ensure
doc = nil
end
end
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/lutaml/moxml.
Development guidelines
- Follow Ruby style guide
- Add tests for new features
- Update documentation
- Ensure backwards compatibility
- Consider performance implications
- Test with all supported backends
Copyright and license
Copyright Ribose.
The gem is available as open source under the terms of the BSD-2-Clause License.