Class: SCXML::Document
Overview
SCXML Documents keep one central instance of an XML string. All elements in a document are tracked using ranges. For example:
  <?xml version="1.0"?>
  <doc>
    <a>1</a>
  </doc>
Is tracked as:
  doc.string_range = 0...45
  doc.string = "<?xml version="1.0"?>\n<doc>\n  <a>1</a>\n</doc>"
  doc.content_range = 22...45
  doc.content = "<doc>\n  <a>1</a>\n</doc>"
  a.string_range = 30...-7
  a.string = "<a>1</a>"
  a.content_range = 3...-4
  a.content = "1"
In lightweight mode, the string and content of elements are not stored internally; they are computed from the ranges and extracted from the document on each access. Access is therefore slower, but the memory footprint is significantly smaller, especially for large documents. Content for the document itself is always computed and never stored internally.
Attributes are parsed into a hash on their first access. In lightweight mode, the hash is not stored internally. These attribute hashes account for most of the access/footprint difference.
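The trade-off can be illustrated with a small stand-in class. This is a minimal sketch, not the library's implementation: `SketchElement`, its `parse_count` counter, and the attribute regex are hypothetical names introduced only to show a hash that is parsed on first access and cached unless lightweight mode is on.

```ruby
# Hypothetical stand-in for an element; not taken from the library.
class SketchElement
  attr_reader :parse_count

  def initialize(tag, lightweight)
    @tag = tag
    @lightweight = lightweight
    @attributes = nil
    @parse_count = 0
  end

  # Parses name="value" pairs out of the start tag on demand.
  # The resulting hash is cached only when lightweight mode is off.
  def attributes
    return @attributes if @attributes
    @parse_count += 1
    attrs = {}
    @tag.scan(/(\w+)="([^"]*)"/) { |name, value| attrs[name] = value }
    @attributes = attrs unless @lightweight
    attrs
  end
end

cached = SketchElement.new('<a id="1" kind="x">', false)
2.times { cached.attributes }

light = SketchElement.new('<a id="1" kind="x">', true)
2.times { light.attributes }

puts cached.parse_count  # parsed once, then served from the cached hash
puts light.parse_count   # parsed again on every access
```

The cached variant pays the memory cost of one hash per element; the lightweight variant pays the parsing cost on every access.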
Note that ranges are exclusive of the last indexed character (i.e. '...' is used in the range rather than '..'), and that the range end will ONLY be positive if the range extends to the end of the document.
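These range semantics are plain Ruby string indexing, so they can be checked without the library. The snippet below is a sketch on an arbitrary string; the variable names are illustrative only.

```ruby
s = "abcdef"

# Three dots exclude the end index; two dots include it.
exclusive = s[1...4]
inclusive = s[1..4]

# A negative end counts back from the end of the string, so
# 1...-2 stops before the last two characters.
negative_end = s[1...-2]

# A positive end equal to the length reaches the end of the string.
to_the_end = s[1...s.length]

puts exclusive, inclusive, negative_end, to_the_end
```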
String ranges encompass the entire tag and are relative to the full document. Content ranges are relative to the string range, and indicate all content within the tag. Ergo:
doc.string == doc.string[doc.string_range]
doc.content == doc.string[doc.content_range]
a.string == doc.string[a.string_range]
a.content == a.string[a.content_range]
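The same identities can be reproduced with ordinary string slicing. This sketch does not use SCXML::Document; it assumes the `<a>` line is indented by two spaces (the indentation that the 45-character length and the 30...-7 range imply), and the local variable names are illustrative.

```ruby
doc_string = "<?xml version=\"1.0\"?>\n<doc>\n  <a>1</a>\n</doc>"

# The document string range covers the whole string.
doc_string_range = 0...doc_string.length

# The document content starts at the root element's start tag.
root_start = doc_string.index("<doc>")
doc_content = doc_string[root_start...doc_string.length]

# Element string ranges are relative to the full document;
# content ranges are relative to the element's own string.
a_string_range = 30...-7
a_content_range = 3...-4

a_string = doc_string[a_string_range]
a_content = a_string[a_content_range]

puts doc_string[doc_string_range] == doc_string
puts a_string
puts a_content
```

Because the content range is relative to the element string rather than the document, it stays the same wherever the element sits in the document.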
Instance Attribute Summary
-
#root ⇒ Object
readonly
Returns the value of attribute root.
-
#scanner ⇒ Object
readonly
Returns the value of attribute scanner.
Attributes included from XPath
Instance Method Summary
-
#content_range ⇒ Object
The range from the beginning of the first element tag to the end of the corresponding end tag.
-
#initialize(string, options = {}) ⇒ Document
constructor
Creates a new Document from the input string.
-
#lightweight? ⇒ Boolean
Returns true if the document is set to lightweight mode.
-
#node_names ⇒ Object
Returns an array of all node names present in the document.
-
#select(xpath) ⇒ Object
Select elements using XPath statements.
-
#string_range ⇒ Object
The full range of the document (i.e. 0...length).
-
#tableize(options = {}, &block) ⇒ Object
Returns a table of element contents as configured.
Methods included from XPath
Constructor Details
#initialize(string, options = {}) ⇒ Document
Creates a new Document from the input string. Options:
lightweight
-
In lightweight mode the string, content, and attributes of elements are recalculated on
every access. This results in slower access, but a much smaller memory footprint. default => true
remove_whitespace
-
If true, line breaks and the whitespace around them are removed from the input string
before it is scanned. default => false
# File 'lib/scxml/document.rb', line 49
def initialize(string, options={})
  @options = {:lightweight => true, :remove_whitespace => false}.merge(options)
  @string = options[:remove_whitespace] ? string.gsub(/\s*\r?\n\s*/, '') : string
  @scanner = StringScanner.new(@string)

  s, range = scan_node(@scanner)
  range = range.nil? ? 0...@string.length : range.begin...(range.end == 0 ? @string.length : range.end)
  @content_range = range

  @root = Element.new(self, range, s)
end
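Two steps of the constructor are plain Ruby and can be tried in isolation: merging the caller's options over the defaults, and the substitution applied when `:remove_whitespace` is set. The sketch below copies the defaults and the regular expression from the listing above; the sample input and variable names are illustrative.

```ruby
defaults = {:lightweight => true, :remove_whitespace => false}

# Caller-supplied options override the defaults key by key.
options = defaults.merge(:remove_whitespace => true)

xml = "<doc>\n  <a>1</a>\r\n</doc>"

# Each line break is removed together with any whitespace around it.
compact = xml.gsub(/\s*\r?\n\s*/, '')

puts options.inspect
puts compact
```

Note that the substitution also removes line breaks inside element text, so it is only safe when that whitespace is not significant.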
Instance Attribute Details
#root ⇒ Object (readonly)
Returns the value of attribute root.
# File 'lib/scxml/document.rb', line 43
def root
  @root
end
#scanner ⇒ Object (readonly)
Returns the value of attribute scanner.
# File 'lib/scxml/document.rb', line 43
def scanner
  @scanner
end
Instance Method Details
#content_range ⇒ Object
The range from the beginning of the first element tag to the end of the corresponding end tag.
# File 'lib/scxml/document.rb', line 73
def content_range
  @content_range #||= string_range
end
#lightweight? ⇒ Boolean
Returns true if the document is set to lightweight mode.
# File 'lib/scxml/document.rb', line 63
def lightweight?
  @options[:lightweight]
end
#node_names ⇒ Object
Returns an array of all node names present in the document.
# File 'lib/scxml/document.rb', line 88
def node_names
  return @nodes if @nodes
  nodes = string.scan(/<(\w+)/m).flatten.uniq
  @nodes = nodes unless lightweight?
  nodes
end
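The scan used by node_names can be run on any string. This sketch applies the same regular expression to a sample document; the sample XML and variable names are illustrative.

```ruby
string = "<?xml version=\"1.0\"?>\n<doc>\n  <a>1</a>\n  <a>2</a>\n  <b/>\n</doc>"

# Capture the word characters that directly follow each '<'.
# End tags ('</a>') and the XML declaration ('<?xml') do not match,
# because '/' and '?' are not word characters.
names = string.scan(/<(\w+)/m).flatten.uniq

p names
```

Since `\w` does not match ':', a namespaced tag such as `<ns:tag>` would be reported by this pattern as `ns`.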
#select(xpath) ⇒ Object
Select elements using XPath statements. Not all statements are supported. See the introduction or tests for allowed statements.
# File 'lib/scxml/document.rb', line 79
def select(xpath)
  return [] if xpath.nil?
  return [self] if xpath == '/'
  paths = xpath.scan(/\/*[^\/]+/)
  select_by_paths(paths)
end
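The path-splitting step of select can be examined on its own. In this sketch `split_xpath` is a hypothetical helper that wraps the same regular expression; how `select_by_paths` then interprets each segment is not shown here.

```ruby
# Hypothetical helper: splits an xpath into segments, keeping any
# leading slashes attached to the segment they precede.
def split_xpath(xpath)
  xpath.scan(/\/*[^\/]+/)
end

absolute   = split_xpath("/doc/a")
descendant = split_xpath("//a/b")
relative   = split_xpath("doc/a")

p absolute, descendant, relative
```

Keeping the slashes with each segment preserves the difference between a single slash and a double slash for whatever processes the segments next.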
#string_range ⇒ Object
The full range of the document (i.e. 0...length).
# File 'lib/scxml/document.rb', line 68
def string_range
  0...string.length
end
#tableize(options = {}, &block) ⇒ Object
Returns a table of element contents as configured. Options:
target
-
Specify the output target of the tableize operation. By default a string, but any object
responding to '<<' can be provided. The target is returned by tableize.
row
-
The xpath expression used to select rows of the table. default => '*'
col
-
The xpath expression used to select columns relative to the row elements. default => '*'
header_row, header_col
-
These currently select the header row and columns, but should be replaced in favor
of a more intuitive interface.
row_delimit
-
The row delimiter. default => '\n'
col_delimit
-
The column delimiter. default => '\t'
index
-
If true, the output rows will be prefixed by an index corresponding to the row.
col_width
-
Specifies the width of the columns. Content will be trimmed if it exceeds this width,
and will be justified left if width > 0 and justified right if width < 0.
Selected elements are passed to the block. The content for each table cell will be the return value of the block, or the element contents if no block is given.
# File 'lib/scxml/document.rb', line 113
def tableize(options={}, &block)
  options = {
    :target => "",
    :row_delimit => "\n",
    :col_delimit => "\t",
    :row => "*",
    :col => "*",
    :header_row => nil,
    :header_col => "*",
    :index => false,
    :col_width => nil
  }.merge(options)

  target = options[:target]
  col_delimit = options[:col_delimit]
  row_delimit = options[:row_delimit]
  index = options[:index]
  col_width = options[:col_width]

  ['header_', ''].each do |prefix|
    row_xpath = options[ "#{prefix}row".to_sym ]
    col_xpath = options[ "#{prefix}col".to_sym ]

    rows = select(row_xpath)
    rows.each_index do |i|
      row = rows[i]
      cols = row.select(col_xpath)
      cols = block_given? ? yield(row, cols) : cols.collect {|col| col.content}
      cols.unshift i if index
      unless col_width.nil?
        cols = cols.collect do |c|
          col_width < 0 ? c.to_s.rjust(-col_width) : c.to_s.ljust(col_width)
        end
      end
      target << cols.join(col_delimit)
      target << row_delimit
    end
  end

  target
end
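The row formatting at the end of the listing can be tried without a document. In this sketch `format_rows` is a hypothetical helper that takes already-selected cell contents as arrays of strings and applies the same index, justification, and delimiter steps. Note that Ruby's `ljust` and `rjust` only pad; they never shorten a string that is longer than the requested width.

```ruby
# Hypothetical helper mirroring the formatting steps of the listing above.
def format_rows(rows, col_delimit: "\t", row_delimit: "\n", index: false, col_width: nil)
  target = String.new
  rows.each_with_index do |cols, i|
    cols = cols.dup
    cols.unshift(i) if index
    unless col_width.nil?
      # Negative widths right-justify, non-negative widths left-justify.
      cols = cols.map do |c|
        col_width < 0 ? c.to_s.rjust(-col_width) : c.to_s.ljust(col_width)
      end
    end
    target << cols.join(col_delimit) << row_delimit
  end
  target
end

plain   = format_rows([["a", "1"], ["bb", "22"]])
indexed = format_rows([["a", "1"]], index: true, col_width: -3)
long    = format_rows([["abcdef"]], col_width: 3)

print plain, indexed, long
```

Because the output is built with `<<`, any object responding to `<<` (an array or an open file, for example) could stand in for the string, which is the same flexibility the target option describes.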