Class: LibXML::XML::Parser

Inherits:
Object
Defined in:
ext/libxml/ruby_xml_parser.c,
lib/libxml/parser.rb

Overview

The XML::Parser provides a tree-based API for processing XML documents, in contrast to XML::Reader’s stream-based API and XML::SaxParser’s callback-based API.

As a result, parsing a document creates an in-memory document object that consists of any number of XML::Node instances. This is a simple and powerful model, but it has one major limitation: the size of the document that can be processed is bounded by the amount of available memory. For documents too large to hold in memory, it is better to use XML::Reader.

Using the parser is simple:

parser = XML::Parser.file('my_file')
doc = parser.parse

You can also parse documents (see XML::Parser.document), strings (see XML::Parser.string) and io objects (see XML::Parser.io).

Defined Under Namespace

Modules: Options Classes: Context

Constructor Details

#initialize(context) ⇒ XML::Parser

Creates a new XML::Parser from the specified XML::Parser::Context.



# File 'ext/libxml/ruby_xml_parser.c', line 39

static VALUE rxml_parser_initialize(int argc, VALUE *argv, VALUE self)
{
  VALUE context = Qnil;

  rb_scan_args(argc, argv, "01", &context);

  if (context == Qnil)
  {
    rb_raise(rb_eArgError, "An instance of a XML::Parser::Context must be passed to XML::Parser.new");
  }

  rb_ivar_set(self, CONTEXT_ATTR, context);
  return self;
}

Instance Attribute Details

#context ⇒ Object (readonly)

#input ⇒ Object (readonly)

Class Method Details

.document(doc) ⇒ Object

call-seq:

XML::Parser.document(document) -> XML::Parser

Creates a new parser for the specified document.

Parameters:

document - A preparsed document.


# File 'lib/libxml/parser.rb', line 14

def self.document(doc)
  context = XML::Parser::Context.document(doc)
  self.new(context)
end

.file(path, options = {}) ⇒ Object

call-seq:

XML::Parser.file(path) -> XML::Parser
XML::Parser.file(path, :encoding => XML::Encoding::UTF_8,
                       :options => XML::Parser::Options::NOENT) -> XML::Parser

Creates a new parser for the specified file or uri.

You may provide an optional hash table to control how the parsing is performed. Valid options are:

encoding - The document encoding, defaults to nil. Valid values
           are the encoding constants defined on XML::Encoding.
options - Parser options.  Valid values are the constants defined on
          XML::Parser::Options.  Multiple options can be combined
          by using Bitwise OR (|).


# File 'lib/libxml/parser.rb', line 34

def self.file(path, options = {})
  context = XML::Parser::Context.file(path)
  context.encoding = options[:encoding] if options[:encoding]
  context.options = options[:options] if options[:options]
  self.new(context)
end

.io(io, options = {}) ⇒ Object

call-seq:

XML::Parser.io(io) -> XML::Parser
XML::Parser.io(io, :encoding => XML::Encoding::UTF_8,
                   :options => XML::Parser::Options::NOENT,
                   :base_uri => "http://libxml.org") -> XML::Parser

Creates a new parser for the specified io object.

Parameters:

io - IO object that contains the XML to parse.
base_uri - The base URL for the parsed document.
encoding - The document encoding, defaults to nil. Valid values
           are the encoding constants defined on XML::Encoding.
options - Parser options.  Valid values are the constants defined on
          XML::Parser::Options.  Multiple options can be combined
          by using Bitwise OR (|).


# File 'lib/libxml/parser.rb', line 58

def self.io(io, options = {})
  context = XML::Parser::Context.io(io)
  context.base_uri = options[:base_uri] if options[:base_uri]
  context.encoding = options[:encoding] if options[:encoding]
  context.options = options[:options] if options[:options]
  self.new(context)
end

.register_error_handler(proc) ⇒ Object



# File 'lib/libxml/parser.rb', line 91

def self.register_error_handler(proc)
  warn('Parser.register_error_handler is deprecated.  Use Error.set_handler instead')
  if proc.nil?
    Error.reset_handler
  else
    Error.set_handler(&proc)
  end
end

.string(string, options = {}) ⇒ Object

call-seq:

XML::Parser.string(string) -> XML::Parser
XML::Parser.string(string, :encoding => XML::Encoding::UTF_8,
                           :options => XML::Parser::Options::NOENT,
                           :base_uri => "http://libxml.org") -> XML::Parser

Creates a new parser for the specified string.

You may provide an optional hash table to control how the parsing is performed. Valid options are:

base_uri - The base url for the parsed document.
encoding - The document encoding, defaults to nil. Valid values
           are the encoding constants defined on XML::Encoding.
options - Parser options.  Valid values are the constants defined on
          XML::Parser::Options.  Multiple options can be combined
          by using Bitwise OR (|).


# File 'lib/libxml/parser.rb', line 83

def self.string(string, options = {})
  context = XML::Parser::Context.string(string)
  context.base_uri = options[:base_uri] if options[:base_uri]
  context.encoding = options[:encoding] if options[:encoding]
  context.options = options[:options] if options[:options]
  self.new(context)
end

Instance Method Details

#parse ⇒ XML::Document

Parses the input XML and creates an XML::Document with its content. If parsing fails or the document is not well-formed, and the context is not in recovery mode, an XML::Error is raised.



# File 'ext/libxml/ruby_xml_parser.c', line 62

static VALUE rxml_parser_parse(VALUE self)
{
  xmlParserCtxtPtr ctxt;
  VALUE context = rb_ivar_get(self, CONTEXT_ATTR);
  
  Data_Get_Struct(context, xmlParserCtxt, ctxt);

  if ((xmlParseDocument(ctxt) == -1 || !ctxt->wellFormed) && ! ctxt->recovery)
  {
    rxml_raise(&ctxt->lastError);
  }

  rb_funcall(context, rb_intern("close"), 0);

  return rxml_document_wrap(ctxt->myDoc);
}