Class: XML::SAX::FragmentBuilder

Inherits:
Builder
  • Object
Defined in:
lib/xml-sax-machines/fragment_builder.rb

Overview

Builds Nokogiri::XML::Document fragments that match an XPath.

Stream large (or small) record-based XML documents, building each element that matches an XPath into a document fragment. This makes further manipulation of each record easier.

Notes

  • To save memory, well-balanced elements that do not match any XPath are unlinked. This means you cannot match records by their position relative to siblings.

  • Because we are parsing a SAX stream there is no read-ahead. You cannot match records by any children the element may have, since those arrive only as further events are pushed.

  • You can match by attributes of an element.
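
The notes above follow from the order in which a SAX stream delivers events. The sketch below is plain Ruby and does not use this library: the event tuples and the known_at_start helper are hypothetical, chosen only to show what is known when an opening tag arrives. Under that assumption, an attribute predicate such as //record[@id="2"] can be decided at that point, while a predicate on children cannot.

```ruby
# Hypothetical SAX-style event stream for:
#   <record id="2"><name>two</name></record>
# The tuples are illustrative only; a real parser delivers these as method calls.
EVENTS = [
  [:start_element, 'record', { 'id' => '2' }],
  [:start_element, 'name', {}],
  [:characters, 'two'],
  [:end_element, 'name'],
  [:end_element, 'record']
].freeze

# What a matcher can know when the event at `index` (an opening tag) arrives:
# the element name and its attributes. No child events have been seen yet.
def known_at_start(events, index)
  _type, name, attributes = events[index]
  { name: name, attributes: attributes, children: [] }
end

seen = known_at_start(EVENTS, 0)
puts seen[:attributes]['id'] == '2' # an attribute test can be evaluated here
puts seen[:children].empty?         # nothing is known about children yet
```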

Example

builder =  XML::SAX::FragmentBuilder.new(
  '//record' => lambda { |record| puts record.to_s } # Process each matched record element.
)
parser  =  Nokogiri::XML::SAX::PushParser.new(builder)
parser  << %q{
  <root>
    <record id="1">record one</record>
    <record id="2">record two</record>
  </root>
}
#=> <record id="1">record one</record>
#=> <record id="2">record two</record>
parser.finish

See

  • XML::SAX::Builder

  • XML::SAX::Filter

– TODO:

  • Namespaces.

Instance Attribute Summary

Attributes inherited from Builder

#document

Attributes inherited from Filter

#filter

Instance Method Summary

Methods inherited from Builder

#start_document

Methods inherited from Filter

#end_document, #error, #start_document, #warning

Constructor Details

#initialize(options = {}) ⇒ FragmentBuilder

Parameters

handler<Nokogiri::XML::SAX::Document>

Optional next XML::SAX::Filter or Nokogiri::XML::SAX::Document (final) in the chain. By default a Nokogiri::XML::SAX::Document will be used, making the chain final.

options<Hash>

XPath<String> => &block<Proc> pairs. The first argument passed to the block will be the matching Nokogiri::XML::Node. Keep in mind the node will be unlinked after your block returns.



# File 'lib/xml-sax-machines/fragment_builder.rb', line 50

def initialize(options = {})
  super()
  @find   = options
  @found  = {}
  @buffer = 0
end

Instance Method Details

#cdata_block(string) ⇒ Object

:nodoc:



# File 'lib/xml-sax-machines/fragment_builder.rb', line 95

def cdata_block(string) # :nodoc:
  @buffer > 0 ? super : (filter && filter.cdata_block(string))
end

#characters(string) ⇒ Object

:nodoc:



# File 'lib/xml-sax-machines/fragment_builder.rb', line 87

def characters(string) # :nodoc:
  @buffer > 0 ? super : (filter && filter.characters(string))
end

#comment(string) ⇒ Object

:nodoc:



# File 'lib/xml-sax-machines/fragment_builder.rb', line 91

def comment(string) # :nodoc:
  @buffer > 0 ? super : (filter && filter.comment(string))
end
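
The cdata_block, characters and comment handlers above share one pattern: while a matched element is open (@buffer > 0) the event is built into the document via super, otherwise it is passed to the next filter if there is one. The following is a minimal sketch of that pattern in plain Ruby. RecordingFilter, SketchBuilder and SketchFragmentBuilder are hypothetical stand-ins, not classes from this library, and the writable buffer accessor exists only so the sketch can be driven by hand.

```ruby
# Stand-in for a downstream filter; records what is forwarded to it.
class RecordingFilter
  attr_reader :received

  def initialize
    @received = []
  end

  def characters(string)
    @received << string
  end
end

# Stand-in for the builder superclass; records what is built into the document.
class SketchBuilder
  attr_reader :built, :filter

  def initialize(filter = nil)
    @filter = filter
    @built  = []
  end

  def characters(string)
    @built << string
  end
end

# Same conditional as the handlers above: build while buffering, else forward.
class SketchFragmentBuilder < SketchBuilder
  attr_accessor :buffer

  def initialize(filter = nil)
    super
    @buffer = 0
  end

  def characters(string)
    @buffer > 0 ? super : (filter && filter.characters(string))
  end
end

downstream = RecordingFilter.new
builder    = SketchFragmentBuilder.new(downstream)
builder.characters('outside') # buffer is 0, so the text is forwarded
builder.buffer = 1
builder.characters('inside')  # buffer is positive, so the text is built
```

The filter && guard means that with no downstream filter the event is simply dropped and the handler returns nil.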

#end_element(name) ⇒ Object

:nodoc:



# File 'lib/xml-sax-machines/fragment_builder.rb', line 69

def end_element(name) #:nodoc:
  path = @context.path
  if @buffer > 0 && block = @found.delete(path)
    @buffer -= 1
    block.call(@context)
  end
  super

  if @buffer == 0 && !(path == '/')
    @document.at(path).unlink

    # Unlinked children are not garbage collected till the document they were created in is (I think).
    # This hack job halves memory usage but it still grows too fast for my liking :(
    @document = @document.dup
    @context  = @document.at(@context.path) rescue nil
  end
end

#start_element(name, attributes = []) ⇒ Object

:nodoc:



# File 'lib/xml-sax-machines/fragment_builder.rb', line 57

def start_element(name, attributes = []) #:nodoc:
  super
  @find.each_pair do |xpath, block|
    if match = @document.at(xpath)
      unless @found[match.path]
        @buffer += 1
        @found[match.path] = block
      end
    end
  end
end
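
Taken together, start_element and end_element keep two pieces of state: @found maps the path of each open matched element to its block, and @buffer counts how many matched elements are currently open. The sketch below imitates that bookkeeping in plain Ruby under loud simplifications: paths come from a stack of element names rather than from Nokogiri, and matching is a lookup on the element name rather than an XPath query. BookkeepingSketch is a hypothetical class written for illustration only.

```ruby
# Simplified imitation of the @found / @buffer bookkeeping.
# Assumptions: element-name lookup stands in for XPath matching, and a
# stack of names stands in for the node path reported by the document.
class BookkeepingSketch
  attr_reader :buffer

  def initialize(find)
    @find   = find # element name => callable
    @found  = {}   # path of an open matched element => callable
    @buffer = 0    # number of matched elements currently open
    @stack  = []
  end

  def start_element(name)
    @stack.push(name)
    path  = current_path
    block = @find[name]
    return unless block && !@found.key?(path)

    @buffer += 1
    @found[path] = block
  end

  def end_element(_name)
    path = current_path
    if @buffer > 0 && (block = @found.delete(path))
      @buffer -= 1
      block.call(path)
    end
    @stack.pop
  end

  private

  def current_path
    '/' + @stack.join('/')
  end
end

matched = []
sketch  = BookkeepingSketch.new('record' => ->(path) { matched << path })
sketch.start_element('root')
sketch.start_element('record') # buffer becomes 1
sketch.end_element('record')   # block is called, buffer returns to 0
sketch.start_element('record') # a second sibling matches again
sketch.end_element('record')
sketch.end_element('root')
```

Because the entry is deleted from the hash when the element closes, a later sibling with the same path is treated as a fresh match.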