Class: SAXish

Inherits:
Object
Defined in:
lib/xamplr-pp/saxish.rb

Instance Attribute Details

#handler ⇒ Object

The Ruby implementation of the xampl-pp parser is called Xampl_PP, and SAXish will be the name of our SAX-like parser.



# File 'lib/xamplr-pp/saxish.rb', line 54

def handler
  @handler
end

#processNamespace ⇒ Object

SAX parsers need an event handler; ‘handler’ is it. The handler is expected to implement the methods defined in the module ‘SaxishHandler’. SaxishHandler is intended to be an adapter (you can include it in any handler you write), so only the handlers for the events you are interested in need to be redefined. SAXdemo is an implementation of SaxishHandler that gathers some statistics.

Xampl-pp requires something it calls a resolver. This is an object that implements a method called resolve. There are a number of predefined entities in xampl-pp: &amp;, &apos;, &gt;, &lt;, and &quot;. More entities can be added by adding entries to the entityMap hashtable. If an entity is encountered that is not in entityMap, the resolve method on the resolver is called. If you specify your own resolver, you can do anything you like to obtain a value for the entity, or you can return nil, in which case an exception is thrown. By default, xampl-pp is its own resolver and simply returns nil.
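The lookup order described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: ENTITY_MAP, NilResolver, TableResolver, and expand_entity are hypothetical names, not part of xampl-pp, and the one-argument form of resolve is assumed rather than confirmed by the text.

```ruby
# Hypothetical stand-in for the parser's entity table (not xampl-pp's own code).
ENTITY_MAP = {
  'amp'  => '&',
  'apos' => "'",
  'gt'   => '>',
  'lt'   => '<',
  'quot' => '"'
}

# Mirrors the default behaviour: every unknown entity resolves to nil.
class NilResolver
  def resolve(name)
    nil
  end
end

# A custom resolver that knows a few extra entities.
class TableResolver
  def initialize(table)
    @table = table
  end

  def resolve(name)
    @table[name]
  end
end

# Look in the entity map first, then ask the resolver; nil is an error.
def expand_entity(name, resolver, entity_map = ENTITY_MAP)
  value = entity_map[name] || resolver.resolve(name)
  raise ArgumentError, "unresolved entity: #{name}" if value.nil?
  value
end
```

With this sketch, expand_entity('amp', NilResolver.new) finds the value in the map, while an unknown name reaches the resolver and raises only if the resolver also returns nil.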

We are going to require that our saxish handler also be the entity resolver. This is reflected in the SaxishHandler module, which implements a resolve method that always returns nil.
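As a rough sketch of the adapter idea, the module below supplies do-nothing versions of some callbacks, and a handler overrides only the ones it cares about. AdapterSketch and ElementCounter are hypothetical names made up for this example; the callback names and argument counts follow the calls made in #work, the parameter names are invented, and the one-argument resolve is an assumption.

```ruby
# Hypothetical stand-in for the adapter module; the real SaxishHandler may differ.
module AdapterSketch
  def startDocument; end
  def endDocument; end
  def startElement(name, namespace, qname, prefix, attributeCount, isEmpty, saxparser); end
  def endElement(name, namespace, qname, prefix); end
  def text(text, isWhitespace); end

  # Default resolver behaviour: unknown entities are not resolved.
  def resolve(name)
    nil
  end
end

# Counts elements and resolves one extra entity; everything else is inherited.
class ElementCounter
  include AdapterSketch

  attr_reader :count

  def initialize
    @count = 0
  end

  def startElement(name, namespace, qname, prefix, attributeCount, isEmpty, saxparser)
    @count += 1
  end

  def resolve(name)
    name == 'copy' ? '(c)' : nil
  end
end
```

Because the handler doubles as the resolver, a single object can be handed to the parser for both roles.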



# File 'lib/xamplr-pp/saxish.rb', line 80

def processNamespace
  @processNamespace
end

#reportNamespaceAttributes ⇒ Object

Returns the value of attribute reportNamespaceAttributes.



# File 'lib/xamplr-pp/saxish.rb', line 81

def reportNamespaceAttributes
  @reportNamespaceAttributes
end

Instance Method Details

#attributeCount ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 193

def attributeCount
 	return @xpp.attributeName.length
end

#attributeName(i) ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 197

def attributeName(i)
	return @xpp.attributeName[i]
end

#attributeNamespace(i) ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 201

def attributeNamespace(i)
	return @xpp.attributeNamespace[i]
end

#attributePrefix(i) ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 209

def attributePrefix(i)
	return @xpp.attributePrefix[i]
end

#attributeQName(i) ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 205

def attributeQName(i)
	return @xpp.attributeQName[i]
end

#attributeValue(i) ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 213

def attributeValue(i)
	return @xpp.attributeValue[i]
end

#column ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 225

def column
	return @xpp.column
end

#depth ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 217

def depth
 	return @xpp.depth
end

#line ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 221

def line
	return @xpp.line
end

#parse(filename) ⇒ Object

This block of comments can be ignored, certainly on a first reading. It describes the control you have over how xampl-pp works. The default behaviour is the most commonly used.

There are two main controls used here: processNamespace and reportNamespaceAttributes. If processNamespace is true, then namespaces in the XML file being parsed will be processed. Processing means that if an element <prefix:name/> is encountered, four variables will be set up in the parser instance: name is ‘name’, prefix is ‘prefix’, qname is ‘prefix:name’, and namespace is defined. If the namespace cannot be defined, an exception is thrown. In addition, the xmlns attributes are processed. If processNamespace is false, then name and qname will both be ‘prefix:name’, and both prefix and namespace will be undefined. If reportNamespaceAttributes is true, the xmlns attributes will be reported along with all the other attributes; if false, they will be hidden. The default behaviour is to process namespaces but not to report the namespace attributes.
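To make the two modes concrete, here is a small sketch of how an element name could be reported in each. It does not use xampl-pp: describe_name and the bindings hash (prefix to namespace URI) are hypothetical, the example URI is made up, and the kind of exception the real parser raises is not stated in the text.

```ruby
# Hypothetical helper, not part of xampl-pp: shows which of the four values
# are set for an element name in each mode.
def describe_name(raw, bindings, process_namespace = true)
  unless process_namespace
    # Namespaces ignored: name and qname are the raw text, nothing else is set.
    return { name: raw, qname: raw, prefix: nil, namespace: nil }
  end

  prefix, local = raw.include?(':') ? raw.split(':', 2) : [nil, raw]
  namespace = bindings[prefix]
  if prefix && namespace.nil?
    raise ArgumentError, "no namespace bound to prefix '#{prefix}'"
  end

  { name: local, qname: raw, prefix: prefix, namespace: namespace }
end

bindings = { 'prefix' => 'http://example.com/ns' }
with_ns    = describe_name('prefix:name', bindings)        # name is 'name'
without_ns = describe_name('prefix:name', bindings, false) # name is 'prefix:name'
```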

There are two other controls that should be mentioned. They are not used here.

Pull parsers are fairly low-level tools, and they are meant to be fast. While many well-formedness constraints are enforced, not all are. If the control checkWellFormed is true, additional checks are made. Even so, xampl-pp does not guarantee that it will parse only well-formed XML documents; it will parse some XML files that are not well formed without objecting. In future releases, it will be possible to have xampl-pp accept only well-formed documents. If checkWellFormed is false, the parser doesn’t go out of its way to notice ill-formed documents. The default is true.

The fourth control is ‘utf8encode’. If this is true, which is the default, then when an entity like &#1234; is encountered it will be encoded using UTF-8 rules. Given the current state of the parser, it would be best to leave it set to true. If you want to change this, you must either never use &#; encodings with numbers greater than 255 (Ruby will throw an exception), or redefine xampl-pp’s encode method to do the right thing.
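The point about numbers greater than 255 can be illustrated with plain Ruby. The sketch below is not xampl-pp’s encode method; decode_char_ref is a hypothetical helper that uses Array#pack with the 'U' directive to produce the UTF-8 encoding of a numeric character reference.

```ruby
# Hypothetical helper: turn "&#1234;" or "&#x4D2;" into a UTF-8 string.
def decode_char_ref(ref)
  match = /\A&#(?:x([0-9A-Fa-f]+)|(\d+));\z/.match(ref)
  raise ArgumentError, "not a character reference: #{ref}" unless match

  code = match[1] ? match[1].hex : match[2].to_i
  [code].pack('U') # 'U' packs a code point as UTF-8
end

decode_char_ref('&#1234;').bytes # two bytes: 0xD3 and 0x92
```

By contrast, Integer#chr called without an encoding argument typically raises a RangeError for values above 255, which is consistent with the warning above.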



# File 'lib/xamplr-pp/saxish.rb', line 124

def parse(filename)
  @xpp = Xampl_PP.new
  @xpp.input = File.new(filename)
  @xpp.processNamespace = @processNamespace
  @xpp.reportNamespaceAttributes = @reportNamespaceAttributes
  @xpp.resolver = @handler

  work
end

#parseString(string) ⇒ Object



# File 'lib/xamplr-pp/saxish.rb', line 134

def parseString(string)
  @xpp = Xampl_PP.new
  @xpp.input = string
  @xpp.processNamespace = @processNamespace
  @xpp.reportNamespaceAttributes = @reportNamespaceAttributes
  @xpp.resolver = @handler

  work
end

#work ⇒ Object

Constructing an instance of xampl-pp is straightforward: Xampl_PP.new

Xampl_PP accepts two kinds of input: IO and String. The same method, ‘input’, is used to specify either kind. The input can be set at any time, but if you do so, the current input will be closed if it is of type IO, and parsing will begin at the current location of the new input.

The methods parse and parseString illustrate this.



# File 'lib/xamplr-pp/saxish.rb', line 155

def work
  until @xpp.endDocument?
    case @xpp.nextEvent
    when Xampl_PP::START_DOCUMENT
      @handler.startDocument
    when Xampl_PP::END_DOCUMENT
      @handler.endDocument
    when Xampl_PP::START_ELEMENT
      @handler.startElement(@xpp.name,
                            @xpp.namespace,
                            @xpp.qname,
                            @xpp.prefix,
                            attributeCount,
                            @xpp.emptyElement,
                            self)
    when Xampl_PP::END_ELEMENT
      @handler.endElement(@xpp.name,
                          @xpp.namespace,
                          @xpp.qname,
                          @xpp.prefix)
    when Xampl_PP::TEXT
      @handler.text(@xpp.text, @xpp.whitespace?)
    when Xampl_PP::CDATA_SECTION
      @handler.cdataSection(@xpp.text)
    when Xampl_PP::ENTITY_REF
      @handler.entityRef(@xpp.name, @xpp.text)
    when Xampl_PP::IGNORABLE_WHITESPACE
      @handler.ignoreableWhitespace(@xpp.text)
    when Xampl_PP::PROCESSING_INSTRUCTION
      @handler.processingInstruction(@xpp.text)
    when Xampl_PP::COMMENT
      @handler.comment(@xpp.text)
    when Xampl_PP::DOCTYPE
      @handler.doctype(@xpp.text)
    end
  end
end
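
The loop in work is what turns a pull parser into a SAX-style push parser. The following self-contained sketch shows the same pattern without xampl-pp: ToyPullParser, its event symbols, RecordingHandler, and drive are all invented for illustration and are much simpler than the real classes.

```ruby
# Toy pull parser: hands out pre-built events one at a time.
class ToyPullParser
  attr_reader :name, :text

  def initialize(events)
    @events = events.dup
    @done = false
  end

  def endDocument?
    @done
  end

  # Advance to the next event and expose its data through name and text.
  def nextEvent
    type, @name, @text = @events.shift
    @done = type.nil? || type == :end_document
    type
  end
end

# Handler that simply records each callback it receives.
class RecordingHandler
  attr_reader :log

  def initialize
    @log = []
  end

  def startElement(name); @log << [:start, name]; end
  def endElement(name);   @log << [:end, name];   end
  def text(text);         @log << [:text, text];  end
end

# Pull events until the document ends and push each one to the handler.
def drive(parser, handler)
  until parser.endDocument?
    case parser.nextEvent
    when :start_element then handler.startElement(parser.name)
    when :end_element   then handler.endElement(parser.name)
    when :text          then handler.text(parser.text)
    end
  end
end
```

The caller of drive never sees individual events; it only supplies a handler, which is the essential difference between the pull and push styles.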