Class: SCXML::Document

Inherits:
Object
Includes:
XPath
Defined in:
lib/scxml/document.rb

Overview

SCXML Documents keep one central instance of an XML string. All elements in a document are tracked using ranges. For example:

<?xml version="1.0"?> 
<doc>
  <a>1</a>
</doc>

Is tracked as:

doc.string_range = 0...45
doc.string = "<?xml version=\"1.0\"?>\n<doc>\n  <a>1</a>\n</doc>"
doc.content_range = 22...45
doc.content = "<doc>\n  <a>1</a>\n</doc>"

a.string_range = 30...-7 
a.string = "<a>1</a>"
a.content_range = 3...-4
a.content = "1"

In lightweight mode, the string and content of elements are not stored internally but are computed from the ranges and extracted from the document on each access. Access is therefore slower, but the memory footprint is significantly smaller, especially in large documents. Content for the document itself is always computed and never stored internally.

Attributes are parsed into a hash on their first access. In lightweight mode, the hash is not stored internally. These attribute hashes account for most of the access/footprint difference.
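
The trade-off between the two modes can be sketched in plain Ruby. SketchElement below is a hypothetical stand-in written for this illustration, not the library's Element class; it assumes only that an element holds a reference to the source string and a range into it.

```ruby
# Hypothetical illustration of lightweight vs. stored access.
# SketchElement is NOT part of SCXML; it only mimics the idea.
class SketchElement
  def initialize(source, range, lightweight)
    @source = source
    @range = range
    @lightweight = lightweight
  end

  def string
    # Lightweight: slice the source again on every call, keep nothing.
    return @source[@range] if @lightweight

    # Otherwise: slice once and keep the copy for later calls.
    @string ||= @source[@range]
  end
end

source = "<doc><a>1</a></doc>"
light  = SketchElement.new(source, 5...13, true)
stored = SketchElement.new(source, 5...13, false)

puts light.string   # same text from either mode
puts stored.string
```

Both objects return the same text; the difference is that the lightweight one builds a fresh String on each call while the other holds on to a copy.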

Note that ranges are exclusive of the last indexed character (i.e. '...' is used in the range rather than '..'), and that the range end will ONLY be positive if the range extends to the end of the document.
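
In plain Ruby terms, the difference between the two range forms looks like this. The string is an arbitrary illustration, not data from the library.

```ruby
s = "abcdef"

exclusive = s[1...-1]        # '...' stops before the last character
inclusive = s[1..-1]         # '..' runs through the last character
to_end    = s[1...s.length]  # a positive end that reaches the end of the string

puts exclusive
puts inclusive
puts to_end
```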

String ranges encompass the entire tag and are relative to the full document. Content ranges are relative to the string range, and indicate all content within the tag. Ergo:

doc.string == doc.string[doc.string_range]
doc.content == doc.string[doc.content_range]
a.string == doc.string[a.string_range]
a.content == a.string[a.content_range]
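
These identities can be checked against the sample document with ordinary String slicing. This is a sketch that uses no SCXML classes; the variable names are made up for the illustration, and it assumes the XML declaration has no trailing whitespace, as in the doc.string shown above.

```ruby
# The sample document as a single string (45 characters).
xml = %Q{<?xml version="1.0"?>\n<doc>\n  <a>1</a>\n</doc>}

# Document-level ranges: both run to the end, so their ends are positive.
doc_string_range  = 0...xml.length
doc_content_range = xml.index("<doc>")...xml.length

# Element-level ranges as listed above: string_range is relative to the
# full document, content_range is relative to the element's own string.
a_string_range  = 30...-7
a_content_range = 3...-4

a_string  = xml[a_string_range]
a_content = a_string[a_content_range]

puts xml[doc_content_range]
puts a_string
puts a_content
```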

Instance Attribute Summary

Attributes included from XPath

#string

Instance Method Summary

Methods included from XPath

#content

Constructor Details

#initialize(string, options = {}) ⇒ Document

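Creates a new Document from the input string. Options:

lightweight:: In lightweight mode the string, content, and attributes of elements are recalculated on every access. This results in slower access, but a much smaller memory footprint. default => true
remove_whitespace:: If true, line breaks and the whitespace around them are removed from the input string before scanning. default => false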


# File 'lib/scxml/document.rb', line 49

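def initialize(string, options={})
	# Caller-supplied options override the defaults.
	@options = {:lightweight => true, :remove_whitespace => false}.merge(options)
	
	# Optionally strip line breaks together with the whitespace around them.
	@string = @options[:remove_whitespace] ? string.gsub(/\s*\r?\n\s*/, '') : string
	
	@scanner = StringScanner.new(@string)
	s, range = scan_node(@scanner)
	# A nil range falls back to the whole string; a range end of 0 is replaced by the string length.
	range = range.nil? ? 0...@string.length : range.begin...(range.end == 0 ? @string.length : range.end)

	@content_range = range
	@root = Element.new(self, range, s)
end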
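
The option handling in the constructor can be tried in isolation. The sketch below reuses only the defaults hash and the gsub pattern from the source above; the input string and variable names are made up for the illustration, and no Document is constructed.

```ruby
# Defaults as they appear in the constructor source.
defaults = {:lightweight => true, :remove_whitespace => false}

# A caller passing :remove_whitespace => true overrides only that key.
options = defaults.merge(:remove_whitespace => true)

raw = "<doc>\n  <a>1</a>\n</doc>"

# Same pattern as the source: each line break, plus any whitespace
# directly before or after it, is removed.
string = options[:remove_whitespace] ? raw.gsub(/\s*\r?\n\s*/, '') : raw

puts string
```

Note that the pattern only touches whitespace adjacent to a line break, so spaces inside a single line are left alone.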

Instance Attribute Details

#root ⇒ Object (readonly)

Returns the value of attribute root.



# File 'lib/scxml/document.rb', line 43

def root
  @root
end

#scanner ⇒ Object (readonly)

Returns the value of attribute scanner.



# File 'lib/scxml/document.rb', line 43

def scanner
  @scanner
end

Instance Method Details

#content_range ⇒ Object

The range from the beginning of the first element tag to the end of the corresponding end tag.



# File 'lib/scxml/document.rb', line 73

def content_range
	@content_range #||= string_range
end

#lightweight? ⇒ Boolean

Returns true if the document is set to lightweight mode.

Returns:

  • (Boolean)


# File 'lib/scxml/document.rb', line 63

def lightweight?
	@options[:lightweight]
end

#node_names ⇒ Object

Returns an array of all node names present in the document.



# File 'lib/scxml/document.rb', line 88

def node_names
	return @nodes if @nodes
	
	nodes = string.scan(/<(\w+)/m).flatten.uniq
	@nodes = nodes unless lightweight?
	nodes
end
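
The scan used by node_names can be run on any string. This sketch applies the same regular expression to a made-up document; it shows that closing tags and the XML declaration are skipped because the character after '<' is not a word character, and that repeated names are collapsed by uniq.

```ruby
xml = %Q{<?xml version="1.0"?>\n<doc><a>1</a><a>2</a><b/></doc>}

# Same expression as in node_names: capture the word that follows each '<'.
names = xml.scan(/<(\w+)/m).flatten.uniq

puts names.inspect
```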

#select(xpath) ⇒ Object

Select elements using XPath statements. Not all statements are supported. See the introduction or tests for allowed statements.



# File 'lib/scxml/document.rb', line 79

def select(xpath)
	return [] if xpath.nil?
	return [self] if xpath == '/'

	paths = xpath.scan(/\/*[^\/]+/)
	select_by_paths(paths)
end
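
How the xpath is split into steps can be seen by running the same scan on a few strings. split_xpath is a hypothetical helper named for this sketch; it wraps only the regular expression from the source above and does not call select_by_paths.

```ruby
# Hypothetical helper: the path-splitting step of select, on its own.
def split_xpath(xpath)
  # Each step is an optional run of slashes followed by non-slash characters.
  xpath.scan(/\/*[^\/]+/)
end

puts split_xpath("/doc/a").inspect
puts split_xpath("//a/b").inspect
puts split_xpath("doc/*").inspect
puts split_xpath("/").inspect
```

A bare "/" yields no steps at all under this expression, which is consistent with select handling that case before the scan.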

#string_range ⇒ Object

The full range of the document (i.e. 0...length)



# File 'lib/scxml/document.rb', line 68

def string_range
	0...string.length
end

#tableize(options = {}, &block) ⇒ Object

Returns a table of element contents as configured. Options:

target:: Specify the output target of the tableize operation. By default a string, but any object responding to '<<' can be provided. The target is returned by +tableize+.
row:: The xpath expression used to select rows of the table. default => '*'
col:: The xpath expression used to select columns relative to the row elements. default => '*'
header_row, header_col:: These should currently select the header row and cols, but should be replaced in favor of a more intuitive interface. default => nil for header_row (no header is written), '*' for header_col
row_delimit:: The row delimiter. default => "\n"
col_delimit:: The column delimiter. default => "\t"
index:: If true, the output rows will be prefixed by an index corresponding to the row.
col_width:: Specifies the width of the columns. Cells are justified left if width > 0 and justified right if width < 0. The source pads with ljust/rjust, which do not trim content that exceeds the width.

If a block is given, each row element and its selected column elements are passed to it, and the block's return value (an array) supplies the cells for that row. If no block is given, each cell is the content of the corresponding column element.



# File 'lib/scxml/document.rb', line 113

def tableize(options={}, &block)
	options = {
		:target => "",
		:row_delimit => "\n",
		:col_delimit => "\t",
		:row => "*",
		:col => "*",
		:header_row => nil,
		:header_col => "*",
		:index => false,
		:col_width => nil
	}.merge(options)

	target = options[:target]
	col_delimit = options[:col_delimit]
	row_delimit = options[:row_delimit]
	index = options[:index]
	col_width = options[:col_width]
	
	['header_', ''].each do |prefix|
		row_xpath = options[ "#{prefix}row".to_sym ]
		col_xpath = options[ "#{prefix}col".to_sym ]
		
		rows = select(row_xpath)
		rows.each_index do |i|
			row = rows[i]
			cols = row.select(col_xpath)
			cols = block_given? ? 
				yield(row, cols) : 
				cols.collect {|col| col.content}
			
			cols.unshift i if index
			unless col_width.nil?
				cols = cols.collect do |c| 
					col_width < 0 ? c.to_s.rjust(-col_width) : c.to_s.ljust(col_width) 
				end 
			end
			
			target << cols.join(col_delimit)
			target << row_delimit
		end
	end
	
	target
end
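
The formatting half of tableize (padding, joining, and appending to the target) can be sketched without any XML. format_rows below is a hypothetical helper written for this illustration; it takes plain arrays of cell values in place of selected elements and mirrors only the col_width, delimiter, and target handling from the source above.

```ruby
# Hypothetical helper: NOT part of SCXML. Rows are plain arrays here.
def format_rows(rows, col_width: nil, col_delimit: "\t", row_delimit: "\n", target: +"")
  rows.each do |cols|
    cells = cols.map do |c|
      next c.to_s if col_width.nil?
      # Negative width right-justifies, positive width left-justifies.
      col_width < 0 ? c.to_s.rjust(-col_width) : c.to_s.ljust(col_width)
    end
    # Any object responding to '<<' works as the target.
    target << cells.join(col_delimit) << row_delimit
  end
  target
end

left  = format_rows([["a", "bb"]], col_width: 3)
right = format_rows([["a", "bb"]], col_width: -3)
long  = format_rows([["abcdef"]], col_width: 3)
array = format_rows([["x", "y"]], target: [])

puts left.inspect
puts right.inspect
puts long.inspect
puts array.inspect
```

Passing an Array as the target collects the joined rows and delimiters as separate entries, since Array#<< appends rather than concatenates.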