Class: Nokogiri::HTML::SAX::Parser

Inherits:
XML::SAX::Parser
Defined in:
lib/nokogiri/html/sax/parser.rb

Overview

This class lets you perform SAX-style parsing on HTML, with HTML error correction applied to malformed input.

Here is a basic usage example:

class MyDoc < Nokogiri::XML::SAX::Document
  def start_element name, attributes = []
    puts "found a #{name}"
  end
end

parser = Nokogiri::HTML::SAX::Parser.new(MyDoc.new)
parser.parse(File.read(ARGV[0], mode: 'rb'))

For more information on SAX parsers, see Nokogiri::XML::SAX.

Constant Summary

Constants inherited from XML::SAX::Parser

XML::SAX::Parser::ENCODINGS

Instance Attribute Summary

Attributes inherited from XML::SAX::Parser

#document, #encoding

Instance Method Summary

Methods inherited from XML::SAX::Parser

#initialize, #parse

Constructor Details

This class inherits a constructor from Nokogiri::XML::SAX::Parser

Instance Method Details

#parse_file(filename, encoding = 'UTF-8') {|ctx| ... } ⇒ Object

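Parse the HTML file at filename using encoding. If a block is given, the parser context is yielded to it before parsing starts.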

Yields:

  • (ctx)

Raises:

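  • (ArgumentError) if filename is nil
  • (Errno::ENOENT) if filename does not exist
  • (Errno::EISDIR) if filename is a directory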

# File 'lib/nokogiri/html/sax/parser.rb', line 51

def parse_file filename, encoding = 'UTF-8'
  raise ArgumentError unless filename
  raise Errno::ENOENT unless File.exist?(filename)
  raise Errno::EISDIR if File.directory?(filename)
  ctx = ParserContext.file(filename, encoding)
  yield ctx if block_given?
  ctx.parse_with self
end

#parse_io(io, encoding = 'UTF-8') {|ctx| ... } ⇒ Object

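Parse HTML read from the given io using encoding. If a block is given, the parser context is yielded to it before parsing starts.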

Yields:

  • (ctx)

# File 'lib/nokogiri/html/sax/parser.rb', line 41

def parse_io io, encoding = 'UTF-8'
  check_encoding(encoding)
  @encoding = encoding
  ctx = ParserContext.io(io, ENCODINGS[encoding])
  yield ctx if block_given?
  ctx.parse_with self
end

#parse_memory(data, encoding = 'UTF-8') {|ctx| ... } ⇒ Object

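Parse HTML stored in data using encoding. Returns without parsing if data is empty. If a block is given, the parser context is yielded to it before parsing starts.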

Yields:

  • (ctx)

Raises:

  • (ArgumentError)

# File 'lib/nokogiri/html/sax/parser.rb', line 31

def parse_memory data, encoding = 'UTF-8'
  raise ArgumentError unless data
  return unless data.length > 0
  ctx = ParserContext.memory(data, encoding)
  yield ctx if block_given?
  ctx.parse_with self
end