Structured Text Utilities

This module provides utilities for working with structured text: comment handling and delimited-field (CSV) parsing with support for quoted strings.

Commented Text

The StructuredText::CommentedReader class removes comments from text.

Comments start with a comment delimiter (by default "#") and continue to the end of the line. For example, the following text:

    line 1 # comment 1
    # comment 2
    line 3 # comment 3

becomes:

    line 1
    line 3

A different comment delimiter may be specified when the object is created. Blank lines may either be returned or ignored.
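The comment-stripping behavior described above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not the StructuredText::CommentedReader implementation: the method name strip_comments and its keyword arguments delimiter and skip_blank are hypothetical, and it returns an array instead of yielding lines to a block.

```ruby
# Hypothetical sketch; strip_comments is NOT part of the StructuredText API.
# Removes everything from the comment delimiter to the end of each line.
def strip_comments(text, delimiter: "#", skip_blank: true)
  text.each_line.filter_map do |line|
    # partition returns [before, delimiter, after]; keep only the text before it.
    content = line.chomp.partition(delimiter).first.rstrip
    # Returning nil drops the line from the result (filter_map skips nils).
    next nil if skip_blank && content.strip.empty?
    content
  end
end

text = "line 1 # comment 1\n# comment 2\nline 3 # comment 3\n"
p strip_comments(text)                    # blank lines ignored
p strip_comments(text, skip_blank: false) # blank lines returned
```

String#partition is used rather than String#split so that an arbitrary delimiter string, including a single space, is matched literally.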

Delimited Text

The StructuredText::DelimitedReader class parses field-delimited text, yielding one record per line.

In field-delimited text, each line is a record consisting of a series of fields separated by a delimiter character. When the delimiter is a comma, the format is called comma-separated values (CSV).

An array of fields is yielded for each line of field-delimited text. For example, the following text:

    apples, red, round
    bananas, yellow, oblong

is parsed into these arrays:

    ['apples', 'red', 'round']
    ['bananas', 'yellow', 'oblong']

By default, leading and trailing whitespace is removed from each field; this behavior can be turned off.

The field text may contain quoted strings. Delimiter characters inside quotes are not treated as field delimiters. So:

    apples,"red,green",round
    bananas,yellow,oblong

becomes:

    ['apples', '"red,green"', 'round']
    ['bananas', 'yellow', 'oblong']

Note that the second field of the first line is "red,green", with the quote characters retained.

The caller may specify a custom field delimiter and custom left- and right-hand quote characters.
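A quote-aware field splitter like the one described above can be sketched as a small character-by-character state machine. This is an assumption-laden illustration, not the StructuredText::DelimitedReader code: the method split_fields and its keywords delimiter, left_quote, right_quote and strip are hypothetical names. As in the example above, quote characters are kept in the field text, and whitespace stripping is on by default.

```ruby
# Hypothetical sketch; split_fields is NOT part of the StructuredText API.
def split_fields(line, delimiter: ",", left_quote: '"', right_quote: '"', strip: true)
  fields = []
  current = +""   # unfrozen buffer for the field being built
  quoted = false  # true while between a left and a right quote character
  line.chomp.each_char do |ch|
    if quoted
      # Inside quotes every character, including the delimiter, is literal.
      current << ch
      quoted = false if ch == right_quote
    elsif ch == left_quote
      current << ch
      quoted = true
    elsif ch == delimiter
      fields << current
      current = +""
    else
      current << ch
    end
  end
  fields << current # the last field has no trailing delimiter
  strip ? fields.map(&:strip) : fields
end

text = "apples,\"red,green\",round\nbananas,yellow,oblong\n"
text.each_line { |line| p split_fields(line) }
```

Checking for the closing quote before the opening quote lets the same character serve as both left and right quote, as with the default `"`.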

The StructuredText::LabeledDelimitedReader class extends this functionality by treating the first line of the text as a header row containing field names. For each subsequent line of input, it yields a hash that maps each header name to the corresponding field value. For example, the following text:

    Fruit,Color,Shape
    apples,red,round
    bananas,yellow,oblong

is parsed into these hashes:

    {"Fruit"=>"apples", "Color"=>"red", "Shape"=>"round"}
    {"Fruit"=>"bananas", "Color"=>"yellow", "Shape"=>"oblong"}
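The header-row mapping can be illustrated by zipping the header fields with each record's fields. This is a simplified sketch, not the StructuredText::LabeledDelimitedReader implementation: labeled_records is a hypothetical name, it returns an array of hashes instead of yielding, and it uses a plain String#split, so it does not honor quoted delimiters (a quote-aware splitter would be needed for that).

```ruby
# Hypothetical sketch; labeled_records is NOT part of the StructuredText API.
def labeled_records(text, delimiter: ",")
  lines = text.each_line.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?
  # The first non-empty line holds the field names.
  header = lines.shift.split(delimiter, -1).map(&:strip)
  lines.map do |line|
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    values = line.split(delimiter, -1).map(&:strip)
    # zip pads with nil when a record has fewer fields than the header.
    header.zip(values).to_h
  end
end

text = "Fruit,Color,Shape\napples,red,round\nbananas,yellow,oblong\n"
labeled_records(text).each { |record| p record }
```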

History

1.0.0

Comment handling and field-delimited text

1.0.1

Source code refinement; no functionality change

1.1.0

Strip whitespace from field edges

Copyright

Copyright 2009, William Patrick McNeill

This program is distributed under the GNU General Public License.

Author

W.P. McNeill