Class: SCXML::Document

Inherits: Object

Includes: XPath

Defined in: lib/scxml/document.rb

Overview

SCXML Documents keep one central instance of an XML string. All elements in a document are tracked using ranges. For example:

<?xml version="1.0"?>
<doc>
  <a>1</a>
</doc>

Is tracked as:

doc.string_range = 0...45
doc.string = "<?xml version=\"1.0\"?>\n<doc>\n  <a>1</a>\n</doc>"
doc.content_range = 22...45
doc.content = "<doc>\n  <a>1</a>\n</doc>"

a.string_range = 30...-7
a.string = "<a>1</a>"
a.content_range = 3...-4
a.content = "1"
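The ranges above can be checked with plain Ruby string slicing, without loading SCXML at all. This sketch rebuilds the example string and slices it with the listed ranges; it finds where `<doc>` starts with `String#index` rather than hard-coding the offset.

```ruby
# Plain-Ruby check of the range example above; no SCXML classes are used.
doc_string = "<?xml version=\"1.0\"?>\n<doc>\n  <a>1</a>\n</doc>"

content_start = doc_string.index("<doc>")       # where the root element begins
doc_content   = doc_string[content_start...45]   # document content range
a_string      = doc_string[30...-7]              # <a> string range, relative to the document
a_content     = a_string[3...-4]                 # <a> content range, relative to a_string

p doc_string.length  # => 45
p content_start      # => 22
p doc_content        # => "<doc>\n  <a>1</a>\n</doc>"
p a_string           # => "<a>1</a>"
p a_content          # => "1"
```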

In lightweight mode, the string and content of elements are not stored internally but are instead computed from the ranges and extracted from the document. Access is therefore slower, but the memory footprint is significantly smaller, especially for large documents. Content for the document is always computed, and never stored internally.

Attributes are parsed into a hash on their first access. In lightweight mode, the hash is not stored internally. These attribute hashes account for most of the access/footprint difference.
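The trade-off between lightweight and cached mode can be sketched with a small stand-alone class. `SketchElement` is hypothetical and is not the `SCXML::Element` API; it only shows values being recomputed from a range on every call in lightweight mode versus memoized otherwise. The attribute regex is a simplification that handles only double-quoted `name="value"` pairs.

```ruby
# Hypothetical element class, NOT the SCXML::Element API: it only illustrates
# computing values from ranges on every access (lightweight) versus caching them.
class SketchElement
  def initialize(doc_string, string_range, lightweight: true)
    @doc_string   = doc_string
    @string_range = string_range
    @lightweight  = lightweight
  end

  # Sliced out of the document string; memoized only when not lightweight.
  def string
    return @string if @string
    s = @doc_string[@string_range]
    @string = s unless @lightweight
    s
  end

  # Attributes are parsed from the start tag on first access; the hash is
  # kept only when not lightweight. Simplified: double-quoted values only.
  def attributes
    return @attributes if @attributes
    start_tag = string[/\A<[^>]*>/].to_s
    attrs = {}
    start_tag.scan(/(\w+)="([^"]*)"/) { |name, value| attrs[name] = value }
    @attributes = attrs unless @lightweight
    attrs
  end
end

xml   = '<scxml><state id="s1" initial="a">x</state></scxml>'
light = SketchElement.new(xml, 7...-8, lightweight: true)
heavy = SketchElement.new(xml, 7...-8, lightweight: false)

p light.attributes["initial"]                # => "a"
p light.attributes.equal?(light.attributes)  # => false (a new hash per call)
p heavy.attributes.equal?(heavy.attributes)  # => true (the cached hash)
```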

Note that ranges exclude the last indexed character (i.e. '...' is used in the range rather than '..'), and that a range end will ONLY be positive if the range extends to the end of the document.
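Converting an absolute exclusive range into the stored form described above (a negative end unless the range reaches the end of the string it is relative to) takes one subtraction. `to_stored_range` is a hypothetical helper for illustration, not part of SCXML.

```ruby
# Hypothetical helper: convert an absolute, exclusive range into the stored form,
# where the end is negative unless the range reaches the end of the string.
def to_stored_range(range, length)
  stop = range.end == length ? length : range.end - length
  range.begin...stop
end

p to_stored_range(30...38, 45)  # => 30...-7  (<a> within the 45-char document)
p to_stored_range(22...45, 45)  # => 22...45  (reaches the end, stays positive)
p to_stored_range(3...4, 8)     # => 3...-4   (content within "<a>1</a>")
```

Both forms select the same characters from a string of that length, so the negative end simply stays valid if text is appended after the element's parent is closed off.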

String ranges encompass the entire tag and are relative to the full document. Content ranges are relative to the string range, and indicate all content within the tag. Ergo:

doc.string == doc.string[doc.string_range]
doc.content == doc.string[doc.content_range]
a.string == doc.string[a.string_range]
a.content == a.string[a.content_range]
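Because a content range is relative to its element's string range, locating an element's content in the full document means resolving both ranges. `absolute_content_range` below is a hypothetical helper sketching that arithmetic for the `<a>` example; it assumes negative ends count back from the length of the string each range is relative to.

```ruby
# Hypothetical helper: resolve a content range (relative to an element's string)
# into an absolute range within the document string.
def absolute_content_range(doc_length, string_range, content_range)
  # Normalize the string range's end against the document length.
  str_begin  = string_range.begin
  str_end    = string_range.end < 0 ? doc_length + string_range.end : string_range.end
  str_length = str_end - str_begin
  # Normalize the content range's end against the element string's length.
  con_end = content_range.end < 0 ? str_length + content_range.end : content_range.end
  (str_begin + content_range.begin)...(str_begin + con_end)
end

doc_string = "<?xml version=\"1.0\"?>\n<doc>\n  <a>1</a>\n</doc>"
r = absolute_content_range(doc_string.length, 30...-7, 3...-4)
p r              # => 33...34
p doc_string[r]  # => "1"
```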

Instance Attribute Summary

#root ⇒ Object readonly

Returns the value of attribute root.
#scanner ⇒ Object readonly

Returns the value of attribute scanner.

Attributes included from XPath

#string

Instance Method Summary

#content_range ⇒ Object

The range from the beginning of the first element tag to the end of the corresponding end tag.
#initialize(string, options = {}) ⇒ Document constructor

Creates a new Document from the input string.
#lightweight? ⇒ Boolean

Returns true if the document is set to lightweight mode.
#node_names ⇒ Object

Returns an array of all node names present in the document.
#select(xpath) ⇒ Object

Select elements using XPath statements.
#string_range ⇒ Object

The full range of the document (i.e. 0...length).
#tableize(options = {}, &block) ⇒ Object

Returns a table of element contents as configured.

Methods included from XPath

#content

Constructor Details

#initialize(string, options = {}) ⇒ `Document`

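Creates a new Document from the input string. Options:

lightweight:: In lightweight mode the string, content, and attributes of elements are recalculated on every access. This results in slower access, but a much smaller memory footprint. default => true
remove_whitespace:: If true, every run of whitespace that contains a line break (\n or \r\n) is removed from the input string before parsing. default => false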

# File 'lib/scxml/document.rb', line 49

def initialize(string, options={})
	@options = {:lightweight => true, :remove_whitespace => false}.merge(options)
	
	# Use the merged @options so the documented default applies.
	@string = @options[:remove_whitespace] ? string.gsub(/\s*\r?\n\s*/, '') : string
	
	@scanner = StringScanner.new(@string)
	s, range = scan_node(@scanner)
	# Without a root node, span the whole string; an end of 0 is treated as "to the end".
	range = range.nil? ? 0...@string.length : range.begin...(range.end == 0 ? @string.length : range.end)

	@content_range = range
	@root = Element.new(self, range, s)
end
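The effect of :remove_whitespace can be previewed without the library by running the same `gsub` call the constructor uses on a sample string. Whitespace that does not touch a line break (such as the spaces inside `<b> 2 </b>`) is kept.

```ruby
# Same substitution as the constructor's :remove_whitespace branch.
xml     = "<doc>\n  <a>1</a>\r\n  <b> 2 </b>\n</doc>"
compact = xml.gsub(/\s*\r?\n\s*/, '')
p compact  # => "<doc><a>1</a><b> 2 </b></doc>"
```

Because the whitespace is removed before scanning, all ranges computed afterwards refer to the compacted string, not the original input.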

Instance Attribute Details

#root ⇒ `Object` (readonly)

Returns the value of attribute root.

# File 'lib/scxml/document.rb', line 43

def root
  @root
end

#scanner ⇒ `Object` (readonly)

Returns the value of attribute scanner.

# File 'lib/scxml/document.rb', line 43

def scanner
  @scanner
end

Instance Method Details

#content_range ⇒ `Object`

The range from the beginning of the first element tag to the end of the corresponding end tag.

# File 'lib/scxml/document.rb', line 73

def content_range
	@content_range #||= string_range
end

#lightweight? ⇒ `Boolean`

Returns true if the document is set to lightweight mode.

Returns:

(Boolean)

# File 'lib/scxml/document.rb', line 63

def lightweight?
	@options[:lightweight]
end

#node_names ⇒ `Object`

Returns an array of all node names present in the document.

# File 'lib/scxml/document.rb', line 88

def node_names
	return @nodes if @nodes
	
	nodes = string.scan(/<(\w+)/m).flatten.uniq
	@nodes = nodes unless lightweight?
	nodes
end
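`node_names` reduces to a single regex scan over the document string (cached in @nodes unless the document is lightweight). Running the same expression on a sample shows that end tags and the `<?xml ...?>` declaration are skipped, since `/` and `?` are not word characters. As a regex-level observation, any `<name` sequence inside CDATA text would also be counted.

```ruby
# Same scan as node_names, applied to a sample string.
xml   = "<?xml version=\"1.0\"?>\n<doc>\n  <a>1</a>\n  <a>2</a>\n  <b/>\n</doc>"
names = xml.scan(/<(\w+)/m).flatten.uniq
p names  # => ["doc", "a", "b"]
```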

#select(xpath) ⇒ `Object`

Select elements using XPath statements. Not all statements are supported. See the introduction or tests for allowed statements.

# File 'lib/scxml/document.rb', line 79

def select(xpath)
	return [] if xpath.nil?
	return [self] if xpath == '/'

	paths = xpath.scan(/\/*[^\/]+/)
	select_by_paths(paths)
end
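Before delegating to `select_by_paths` (whose matching rules are not shown on this page), `select` splits the XPath into steps, each keeping its leading slashes. This sketch applies the same `scan` to a few expressions so the resulting steps are visible; `split_steps` is a hypothetical name used only here.

```ruby
# Same step-splitting regex as select; split_steps is a hypothetical wrapper.
split_steps = ->(xpath) { xpath.scan(/\/*[^\/]+/) }

p split_steps.call("/doc/a")   # => ["/doc", "/a"]
p split_steps.call("//state")  # => ["//state"]
p split_steps.call("a/b")      # => ["a", "/b"]
```

Keeping the slashes attached lets a later stage tell a child step ('/a') from a descendant step ('//state').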

#string_range ⇒ `Object`

The full range of the document (i.e. 0...length).

# File 'lib/scxml/document.rb', line 68

def string_range
	0...string.length
end

#tableize(options = {}, &block) ⇒ `Object`

Returns a table of element contents as configured. Options:

target:: Specifies the output target of the tableize operation. By default a string, but any object responding to '<<' can be provided. The target is returned by +tableize+.
row:: The xpath expression used to select rows of the table. default => '*'
col:: The xpath expression used to select columns relative to the row elements. default => '*'
header_row:: 
header_col:: These select the header row and its columns (header_row defaults to nil, so no header is output; header_col defaults to '*'). They should be replaced in favor of a more intuitive interface.
row_delimit:: The row delimiter. default => "\n"
col_delimit:: The column delimiter. default => "\t"
index:: If true, each output row is prefixed by its zero-based row index.
col_width:: Specifies the width of the columns. Cells are padded to this width, justified left if width > 0 and justified right if width < 0. Content longer than the width is not trimmed.

Each selected row is passed to the block together with the array of its selected column elements (as row, cols), and the block's return value is used as that row's array of cell values. Without a block, the cell values are the contents of the column elements.

# File 'lib/scxml/document.rb', line 113

def tableize(options={}, &block)
	options = {
		:target => "",
		:row_delimit => "\n",
		:col_delimit => "\t",
		:row => "*",
		:col => "*",
		:header_row => nil,
		:header_col => "*",
		:index => false,
		:col_width => nil
	}.merge(options)

	target = options[:target]
	col_delimit = options[:col_delimit]
	row_delimit = options[:row_delimit]
	index = options[:index]
	col_width = options[:col_width]
	
	['header_', ''].each do |prefix|
		row_xpath = options[ "#{prefix}row".to_sym ]
		col_xpath = options[ "#{prefix}col".to_sym ]
		
		rows = select(row_xpath)
		rows.each_index do |i|
			row = rows[i]
			cols = row.select(col_xpath)
			cols = block_given? ? 
				yield(row, cols) : 
				cols.collect {|col| col.content}
			
			cols.unshift i if index
			unless col_width.nil?
				cols = cols.collect do |c| 
					col_width < 0 ? c.to_s.rjust(-col_width) : c.to_s.ljust(col_width) 
				end 
			end
			
			target << cols.join(col_delimit)
			target << row_delimit
		end
	end
	
	target
end
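The per-row formatting at the end of `tableize` (optional index, optional justification, then joining with the delimiters) can be exercised on plain arrays. `format_row` is a hypothetical stand-alone sketch of that logic, not an SCXML method; unlike the original it copies the cell array before prepending the index, so the caller's array is not mutated.

```ruby
# Hypothetical stand-alone version of tableize's per-row formatting.
def format_row(cells, i, index: false, col_width: nil, col_delimit: "\t", row_delimit: "\n")
  cells = cells.dup                 # avoid mutating the caller's array
  cells.unshift(i) if index         # optional zero-based row index
  unless col_width.nil?
    # Pad each cell: right-justify for negative widths, left-justify otherwise.
    cells = cells.map { |c| col_width < 0 ? c.to_s.rjust(-col_width) : c.to_s.ljust(col_width) }
  end
  cells.join(col_delimit) + row_delimit
end

target = +""
[%w[s1 idle], %w[s2 running]].each_with_index do |cells, i|
  target << format_row(cells, i, index: true, col_width: -5, col_delimit: "|")
end
p target  # => "    0|   s1| idle\n    1|   s2|running\n"
```

The second row shows that a cell wider than `col_width` ("running") is passed through unchanged rather than trimmed.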
