Class: Text::Reform

Inherits:
Object
Defined in:
lib/text/reform.rb

Overview

Introduction

The Text::Reform class is a Ruby rewrite of the Perl module of the same name by Damian Conway. Much of this documentation has been copied from the original documentation and adapted to the Ruby version.

The interface is subject to change, since it will undergo major Rubyfication.

Synopsis

require 'text/reform'
f = Text::Reform.new

puts f.format(template, data)

Description

The Reform#format method

Reform#format takes a series of format (or “picture”) strings followed by replacement values, interpolates those values into each picture string, and returns the result.

A picture string consists of sequences of the following characters:

<

Left-justified field indicator. A series of two or more sequential < characters specifies a left-justified field to be filled by a subsequent value. A single < is formatted as the literal character ‘<’.

>

Right-justified field indicator. A series of two or more sequential > characters specifies a right-justified field to be filled by a subsequent value. A single > is formatted as the literal character ‘>’.

<<>>

Fully-justified field indicator. Field may be of any width, and brackets need not balance, but there must be at least 2 ‘<’ and 2 ‘>’.

^

Centre-justified field indicator. A series of two or more sequential ^ characters specifies a centred field to be filled by a subsequent value. A single ^ is formatted as the literal character ‘^’.

>>.<<<<

A numerically formatted field with the specified number of digits to either side of the decimal place. See _Numerical formatting_ below.

[

Left-justified block field indicator. Just like a < field, except it repeats as required on subsequent lines. See below. A single [ is formatted as the literal character ‘[’.

]

Right-justified block field indicator. Just like a > field, except it repeats as required on subsequent lines. See below. A single ] is formatted as the literal character ‘]’.

[[]]

Fully-justified block field indicator. Just like a <<<>>> field, except it repeats as required on subsequent lines. See below. Field may be of any width, and brackets need not balance, but there must be at least 2 ‘[’ and 2 ‘]’.

|

Centre-justified block field indicator. Just like a ^ field, except it repeats as required on subsequent lines. See below. A single | is formatted as the literal character ‘|’.

]]].[[[[

A numerically formatted block field with the specified number of digits to either side of the decimal place. Just like a >>>.<<<< field, except it repeats as required on subsequent lines. See below.

~

A one-character wide block field.

\

Literal escape of next character (e.g. \~ is formatted as ‘~’, not a one-character-wide block field).

Any other character

That literal character.

Any substitution value which is nil (either explicitly so, or because it is missing) is replaced by an empty string.

Controlling Reform instance options

There are several ways to influence options set in the Reform instance:

  1. At creation:

      # using a hash
    r1 = Text::Reform.new(:squeeze => true)
    
      # using a block
    r2 = Text::Reform.new do |rf|
      rf.squeeze = true
      rf.fill    = true
    end
    
  2. Using accessors:

    r         = Text::Reform.new
    r.squeeze = true
    r.fill    = true
    

The Perl way of interleaving option changes with picture strings and data is currently NOT supported.

Controlling line filling

#squeeze replaces each sequence of spaces or tabs with a single space; #fill removes newlines from the input. To minimize all whitespace, you need to specify both options. Hence:

format  = "EG> [[[[[[[[[[[[[[[[[[[[["
data    = "h  e\t l lo\nworld\t\t\t\t\t"
r         = Text::Reform.new
r.squeeze = false # default, implied
r.fill    = false # default, implied
puts r.format(format, data)
  # all whitespace preserved:
  #
  # EG> h  e        l lo
  # EG> world

r.squeeze = true
r.fill    = false # default, implied
puts r.format(format, data)
  # only newlines preserved
  #
  # EG> h e l lo
  # EG> world

r.squeeze = false # default, implied
r.fill    = true
puts r.format(format, data)
  # only spaces/tabs preserved:
  #
  # EG> h  e        l lo world

r.fill    = true
r.squeeze = true
puts r.format(format, data)
  # no whitespace preserved:
  #
  # EG> h e l lo world
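
The effect of the two options on the input data can be approximated in plain Ruby. The following is a minimal sketch, not the library's implementation: normalize_whitespace is a hypothetical helper, and turning each newline into a single space under fill is an assumption inferred from the example output above.

```ruby
# Hypothetical helper (not part of Text::Reform): approximates what the
# squeeze and fill options do to a data string before it is formatted.
def normalize_whitespace(text, squeeze: false, fill: false)
  result = text.dup
  # squeeze: collapse each run of spaces and/or tabs into one space
  result = result.gsub(/[ \t]+/, ' ') if squeeze
  # fill: remove newlines (assumed here to become a single space)
  result = result.gsub("\n", ' ') if fill
  result
end

data = "h  e\t l lo\nworld\t\t"
p normalize_whitespace(data)                              # nothing changed
p normalize_whitespace(data, squeeze: true)               # newline kept
p normalize_whitespace(data, fill: true)                  # spaces/tabs kept
p normalize_whitespace(data, squeeze: true, fill: true)   # fully minimized
```

Because the two steps are independent, the order in which the options are set does not matter in this sketch.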

Whether or not filling or squeezing is in effect, #format can also be directed to trim any extra whitespace from the end of each line it formats, using the #trim option. If this option is specified with a true value, every line returned by #format will automatically have its trailing spaces and tabs removed.

r.format("[[[[[[[[[[[", 'short').length # => 11
r.trim = true
r.format("[[[[[[[[[[[", 'short').length # => 6
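
As a rough illustration of what trimming means for a formatted line, here is a small sketch in plain Ruby. The helper trim_line_ends is hypothetical and the regular expression is an assumption about the behaviour described above, not code taken from the library.

```ruby
# Hypothetical helper: strip spaces and tabs at the end of every line,
# leaving the newline characters themselves in place.
def trim_line_ends(text)
  text.gsub(/[ \t]+$/, '')
end

padded  = "short" + (" " * 6) + "\n"   # a field padded out with spaces
trimmed = trim_line_ends(padded)
p trimmed          # the word followed directly by the newline
p trimmed.length
```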

It is also possible to control the character used to fill lines that are too short, using the #filler option. If this option is specified the value of the #filler flag is used as the fill string, rather than the default “ ”.

For example:

r.filler = '*'
print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$123.4')

prints:

Pay bearer: *******$123.4*******

If the filler string is longer than one character, it is truncated to the appropriate length. So:

r.filler = '-->'
print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$123.4')
print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$13.4')
print r.format("Pay bearer: ^^^^^^^^^^^^^^^^^^^^", '$1.4')

prints:

Pay bearer: -->-->-$123.4-->-->-
Pay bearer: -->-->--$13.4-->-->-
Pay bearer: -->-->--$1.4-->-->--

If the value of the #filler option is a hash, then its :left and :right entries specify separate filler strings for each side of an interpolated value.
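
The padding arithmetic behind the examples above can be sketched as follows. This is an illustration only: centre_fill and pad_with are hypothetical names, and giving the odd extra column to the left side is an assumption read off the sample output, not a documented rule.

```ruby
# Hypothetical helper: repeat the filler and cut it to exactly `length`
# characters, which is how a multi-character filler gets truncated.
def pad_with(filler, length)
  return '' if length <= 0 || filler.empty?
  (filler * length)[0, length]
end

# Hypothetical helper: centre `value` in a field of `width` columns.
# `filler` is either one string or a hash with :left and :right entries.
def centre_fill(value, width, filler = ' ')
  left_fill, right_fill =
    filler.is_a?(Hash) ? [filler[:left], filler[:right]] : [filler, filler]
  text  = value.to_s
  pad   = [width - text.length, 0].max
  left  = (pad + 1) / 2          # assumption: extra column goes left
  right = pad - left
  pad_with(left_fill, left) + text + pad_with(right_fill, right)
end

puts centre_fill('$123.4', 20, '*')
puts centre_fill('$13.4', 20, '-->')
puts centre_fill('total', 11, { :left => '<', :right => '>' })
```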

Options

The Perl variant supports option switching during processing of the arguments of a single call to #format. This has been removed while porting to Ruby, since I believe that this does not add to clarity of code. So you have to change options explicitly.

Data argument types and handling

The data part of the call to format can be either in String form, the items being newline separated, or in Array form. The array form can contain any kind of type you want, as long as it supports #to_s.

So all of the following examples return the same result:

  # String form
r.format("]]]].[[", "1234\n123")
  # Array form
r.format("]]]].[[", [ 1234, 123 ])
  # Array with another type
r.format("]]]].[[", [ 1234.0, 123.0 ])

Multi-line format specifiers and interleaving

By default, if a format specifier contains two or more lines (i.e. one or more newline characters), the entire format specifier is repeatedly filled as a unit, until all block fields have consumed their corresponding arguments. For example, to build a simple look-up table:

values    = (1..12).to_a
squares   = values.map { |el| sprintf "%.6g", el**2         }
roots     = values.map { |el| sprintf "%.6g", Math.sqrt(el) }
logs      = values.map { |el| sprintf "%.6g", Math.log(el)  }
inverses  = values.map { |el| sprintf "%.6g", 1.0/el        }  # 1.0 forces float division

puts r.format(
  "  N      N**2    sqrt(N)      log(N)      1/N",
  "=====================================================",
  "| [[  |  [[[  |  [[[[[[[[[[ | [[[[[[[[[ | [[[[[[[[[ |\n" +
  "-----------------------------------------------------",
  values, squares, roots, logs, inverses
)

The multiline format specifier:

"| [[  |  [[[  |  [[[[[[[[[[ | [[[[[[[[[ | [[[[[[[[[ |\n" +
"-----------------------------------------------------"

is treated as a single logical line. So #format alternately fills the first physical line (interpolating one value from each of the arrays) and the second physical line (which puts a line of dashes between each row of the table) producing:

  N      N**2    sqrt(N)      log(N)      1/N
=====================================================
| 1   |  1    |  1          | 0         | 1         |
-----------------------------------------------------
| 2   |  4    |  1.41421    | 0.693147  | 0.5       |
-----------------------------------------------------
| 3   |  9    |  1.73205    | 1.09861   | 0.333333  |
-----------------------------------------------------
| 4   |  16   |  2          | 1.38629   | 0.25      |
-----------------------------------------------------
| 5   |  25   |  2.23607    | 1.60944   | 0.2       |
-----------------------------------------------------
| 6   |  36   |  2.44949    | 1.79176   | 0.166667  |
-----------------------------------------------------
| 7   |  49   |  2.64575    | 1.94591   | 0.142857  |
-----------------------------------------------------
| 8   |  64   |  2.82843    | 2.07944   | 0.125     |
-----------------------------------------------------
| 9   |  81   |  3          | 2.19722   | 0.111111  |
-----------------------------------------------------
| 10  |  100  |  3.16228    | 2.30259   | 0.1       |
-----------------------------------------------------
| 11  |  121  |  3.31662    | 2.3979    | 0.0909091 |
-----------------------------------------------------
| 12  |  144  |  3.4641     | 2.48491   | 0.0833333 |
-----------------------------------------------------

This implies that formats and the variables from which they’re filled need to be interleaved. That is, a multi-line specification like this:

puts r.format(
  "Passed:\n" +             # single format specification
  "   [[[[[[[[[[[[[[[\n" +  # (needs two sets of data)
  "Failed:\n" +
  "   [[[[[[[[[[[[[[[",
  passes, fails)            # data for previous format

would print:

Passed:
   <pass 1>
Failed:
   <fail 1>
Passed:
   <pass 2>
Failed:
   <fail 2>
Passed:
   <pass 3>
Failed:
   <fail 3>

because the four-line format specifier is treated as a single unit, to be repeatedly filled until all the data in passes and fails has been consumed.

Unlike the table example, where this unit filling correctly put a line of dashes between lines of data, in this case the alternation of passes and fails is probably not the desired effect.

Judging by the labels, it is far more likely that the user wanted:

Passed:
   <pass 1>
   <pass 2>
   <pass 3>
Failed:
   <fail 1>
   <fail 2>
   <fail 3>

To achieve that, either explicitly interleave the formats and their data sources:

puts r.format(
  "Passed:",               ## single format (no data required)
  "   [[[[[[[[[[[[[[[",    ## single format (needs one set of data)
      passes,              ## data for previous format
  "Failed:",               ## single format (no data required)
  "   [[[[[[[[[[[[[[[",    ## single format (needs one set of data)
      fails)               ## data for previous format

or instruct #format to do it for you automagically, by setting the ‘interleave’ flag to true:

r.interleave = true
puts r.format(
  "Passed:\n" +             # single format
  "   [[[[[[[[[[[[[[[\n" +  # (needs two sets of data)
  "Failed:\n" +
  "   [[[[[[[[[[[[[[[",
  passes, fails)            # data to be automagically interleaved as
                            # necessary between lines of previous format
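
The difference between filling a multi-line specifier as one unit and interleaving it line by line can be modelled without the library. The sketch below is a simplified model under stated assumptions: fill_as_unit and fill_interleaved are hypothetical names, each label is paired with exactly one block field, and every value fits on one line.

```ruby
# Hypothetical model of the default behaviour: the whole multi-line unit
# (label, field, label, field) is repeated until all data is consumed.
def fill_as_unit(labels, columns)
  rows = columns.map(&:length).max || 0
  (0...rows).flat_map do |i|
    labels.zip(columns).flat_map { |label, col| [label, "   #{col[i]}"] }
  end
end

# Hypothetical model of interleave = true: each label line is emitted
# once, and its block field repeats until that field's data runs out.
def fill_interleaved(labels, columns)
  labels.zip(columns).flat_map do |label, col|
    [label] + col.map { |value| "   #{value}" }
  end
end

labels = ['Passed:', 'Failed:']
passes = ['<pass 1>', '<pass 2>']
fails  = ['<fail 1>', '<fail 2>']

puts fill_as_unit(labels, [passes, fails])
puts '--'
puts fill_interleaved(labels, [passes, fails])
```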

How #format hyphenates

Any line with a block field repeats on subsequent lines until all block fields on that line have consumed all their data. Non-block fields on these lines are replaced by the appropriate number of spaces.

Words are wrapped whole, unless they will not fit into the field at all, in which case they are broken and (by default) hyphenated. Simple hyphenation is used (i.e. break at the N-1th character and insert a ‘-’), unless a suitable alternative subroutine is specified instead.

Words will not be broken if the break would leave less than 2 characters on the current line. This minimum can be varied by setting the min_break option to a numeric value indicating the minimum total broken characters (including hyphens) required on the current line. Note that, for very narrow fields, words will still be broken (but unhyphenated). For example:

puts r.format('~', 'split')

would print:

s
p
l
i
t

whilst:

r.min_break = 1
puts r.format('~', 'split')

would print:

s-
p-
l-
i-
t

Alternative breaking strategies can be specified using the break option. For example:

r.break = MyBreaker.new
r.format(fmt, data)

#format expects a user-defined line-breaking strategy to respond to the method #break, which takes three arguments (the string to be broken, the maximum permissible length of the initial section, and the total width of the field being filled). #break must return an array of two strings: the initial (broken) section of the word and the remainder of the string, respectively.

For example:

class MyBreaker
  def break(str, initial, total)
    [ str[0, initial - 1] + '~', str[initial - 1..-1] ]
  end
end

r.break = MyBreaker.new

makes ‘~’ the hyphenation character, whilst:

class WrapAndSlop
  def break(str, initial, total)
    if (initial == total)
      str =~ /\A(\s*\S*)(.*)/
      [ $1, $2 ]
    else
      [ '', str ]
    end
  end
end

r.break = WrapAndSlop.new

wraps excessively long words to the next line and “slops” them over the right margin if necessary.
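
To see how such a strategy object is driven, the following sketch calls a breaker repeatedly on one long word. It is illustrative only: SketchBreakWith and wrap_word are hypothetical names, only the three-argument #break protocol comes from the description above, and the real formatter does considerably more (whole-word wrapping, min_break, justification).

```ruby
# Hypothetical strategy following the #break protocol: cut the text so
# that head plus hyphen fits into `initial` columns.
class SketchBreakWith
  def initialize(hyphen)
    @hyphen = hyphen
  end

  def break(str, initial, total)
    # Always keep at least one character so that callers make progress.
    keep = [initial - @hyphen.length, 1].max
    [str[0, keep] + @hyphen, str[keep..-1]]
  end
end

# Hypothetical driver: keep breaking until the remainder fits the field.
def wrap_word(word, width, breaker)
  lines = []
  rest  = word
  while rest.length > width
    head, rest = breaker.break(rest, width, width)
    lines << head
  end
  lines << rest
end

puts wrap_word('methodology', 5, SketchBreakWith.new('-'))
```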

The Text::Reform class provides several class methods to simplify the use of variant hyphenation schemes. Text::Reform::break_wrap returns an instance implementing the “wrap-and-slop” algorithm shown in the last example, which could therefore be rewritten:

r.break = Text::Reform.break_wrap

Text::Reform::break_with takes a single string argument and returns an instance of a class which hyphenates by cutting off the text at the right margin and appending the string argument. Hence the first of the two examples could be rewritten:

r.break = Text::Reform.break_with('~')

The method Text::Reform::break_at takes a single string argument and returns an instance of a class which hyphenates by breaking immediately after that string. For example:

r.break = Text::Reform.break_at('-')
r.format("[[[[[[[[[[[[[[", "The Newton-Raphson methodology")

returns:

"The Newton-
 Raphson
 methodology"

Note that this differs from the behaviour of Text::Reform::break_with, where:

r.break = Text::Reform.break_with('-')
r.format("[[[[[[[[[[[[[[", "The Newton-Raphson methodology")

returns:

"The Newton-R-
 aphson metho-
 dology"

Choosing the correct breaking strategy depends on your kind of data.
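
The idea of breaking immediately after a given string can be expressed as a strategy object in a few lines. This is a minimal sketch, assuming the #break protocol described earlier: SketchBreakAt is a hypothetical class, not the library's BreakAt, and the plain cut used when the string does not occur is an assumption.

```ruby
# Hypothetical strategy: break just after the last occurrence of `bat`
# that still fits into the first `initial` columns.
class SketchBreakAt
  def initialize(bat)
    @bat = bat
  end

  def break(str, initial, total)
    head = str[0, initial]
    cut  = head.rindex(@bat)
    if cut
      cut += @bat.length
      [str[0, cut], str[cut..-1]]
    else
      # Assumed fallback: cut at the margin without adding a hyphen.
      [head, str[initial..-1].to_s]
    end
  end
end

breaker = SketchBreakAt.new('-')
p breaker.break('Newton-Raphson', 10, 14)
p breaker.break('methodology', 5, 14)
```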

The method Text::Reform::break_hyphenator returns an instance of a class which hyphenates using a Ruby hyphenator. The hyphenator must be provided to the method. At the time of release, there are two implementations of hyphenators available: TeX::Hyphen by Martin DeMello and Austin Ziegler (a Ruby port of Jan Pazdziora’s TeX::Hyphen module); and Text::Hyphen by Austin Ziegler (a significant recoding of TeX::Hyphen to better support non-English languages).

For example:

r.break = Text::Reform.break_hyphenator(hyphenator)  # hyphenator: a TeX::Hyphen or Text::Hyphen instance

Note that in the previous examples the calls to .break_at, .break_wrap and .break_hyphenator produce instances of the corresponding strategy class.

The algorithm #format uses is:

  1. If interleaving is specified, split the first string in the argument list into individual format lines and add a terminating newline (unless one is already present). Otherwise, treat the entire string as a single “line” (like /s does in regexes).

  2. For each format line…

    1. Determine the number of fields and shift that many values off the argument list and into the filling list. If insufficient arguments are available, generate as many empty strings as are required.

    2. Generate a text line by filling each field in the format line with the initial contents of the corresponding arg in the filling list (and remove those initial contents from the arg).

    3. Replace any <, >, or ^ fields by an equivalent number of spaces. Splice out the corresponding args from the filling list.

    4. Repeat from step 2.2 until all args in the filling list are empty.

  3. Concatenate the text lines generated in step 2.

Note that, unlike the Perl version of Text::Reform, this version does not currently loop over several format strings in one function call.
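
The repeat-until-empty loop in step 2 can be illustrated for the simplest case: a literal prefix followed by one left-justified block field. This is a simplified sketch, not the library's algorithm. fill_block_field is a hypothetical helper, and words longer than the field are left unbroken here, whereas #format would hand them to the break strategy.

```ruby
# Hypothetical helper: fill one block field of `width` columns, emitting
# a new line (with the literal prefix repeated) until the text is used up.
def fill_block_field(prefix, width, text)
  lines   = []
  current = ''
  text.split.each do |word|
    candidate = current.empty? ? word : "#{current} #{word}"
    if candidate.length <= width
      current = candidate          # the word still fits on this line
    else
      lines << current unless current.empty?
      current = word               # start the next repetition
    end
  end
  lines << current unless current.empty?
  # Left-justify each chunk within the field, as a [ field would.
  lines.map { |line| prefix + line.ljust(width) }
end

puts fill_block_field('EG> ', 10, 'The quick brown fox')
```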

Reform#format examples

As an example of the use of #format, the following:

count = 1
text = "A big long piece of text to be formatted exquisitely"
output = ''
output << r.format("       ||||  <<<<<<<<<<   ", count, text)
output << r.format("       ----------------   ",
                    "       ^^^^  ]]]]]]]]]]|  ", count+11, text)

results in output:

1    A big lon-
----------------
12      g piece|
        of text|
     to be for-|
     matted ex-|
      quisitely|

Note that block fields in a multi-line format string cause the entire multi-line format to be repeated as often as necessary.

Unlike traditional Perl #format arguments, picture strings and arguments cannot be interleaved in the Ruby version. This is partly intentional, to see whether the feature is worth keeping or can be dispensed with. Another example:

report = ''
report << r.format(
            'Name           Rank    Serial Number',
            '====           ====    =============',
            '<<<<<<<<<<<<<  ^^^^    <<<<<<<<<<<<<',
            name,           rank,   serial_number
         )

results in:

Name           Rank    Serial Number
====           ====    =============
John Doe       high    314159

Numerical formatting

The “>>>.<<<” and “]]].[[[” field specifiers may be used to format numeric values about a fixed decimal place marker. For example:

puts r.format('(]]]]].[[)', %w{
             1
             1.0
             1.001
             1.009
             123.456
             1234567
             one two
})

would print:

(   1.0)
(   1.0)
(   1.00)
(   1.01)
( 123.46)
(#####.##)
(?????.??)
(?????.??)

Fractions are rounded to the specified number of places after the decimal, but only significant digits are shown. That’s why, in the above example, 1 and 1.0 are formatted as “1.0”, whilst 1.001 is formatted as “1.00”.

You can specify that the maximal number of decimal places always be used by giving the configuration option ‘numeric’ the value NUMBERS_ALL_PLACES. For example:

r.numeric = Text::Reform::NUMBERS_ALL_PLACES
puts r.format('(]]]]].[[)', <<EONUMS)
  1
  1.0
EONUMS

would print:

(   1.00)
(   1.00)

Note that although decimal digits are rounded to fit the specified width, the integral part of a number is never modified. If there are not enough places before the decimal place to represent the number, the entire number is replaced with hashes.

If a non-numeric sequence is passed as data for a numeric field, it is formatted as a series of question marks. This querulous behaviour can be changed by giving the configuration option ‘numeric’ the value NUMBERS_SKIP_NAN, in which case any invalid numeric data is simply ignored. For example:

r.numeric = Text::Reform::NUMBERS_SKIP_NAN
puts r.format('(]]]]].[[)', %w{
             1
             two three
             4
})

would print:

(   1.0)
(   4.0)
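
The rules in this section (rounding, hashes on overflow, question marks for non-numeric data) can be sketched in plain Ruby. Everything here is an assumption made for illustration: format_numeric is a hypothetical helper, the number of decimals shown is guessed as the number given in the input (at least one, at most the field allows), frac_places must be at least 1, and the exact padding may differ from what the library prints.

```ruby
# Hypothetical helper: format `value` for a field with `int_places`
# columns before and `frac_places` columns after the decimal point.
def format_numeric(value, int_places, frac_places, all_places: false, skip_nan: false)
  text  = value.to_s.strip
  match = /\A[-+]?\d+(?:\.(\d+))?\z/.match(text)
  if match.nil?
    # Non-numeric data: skip it, or show question marks.
    return skip_nan ? nil : ('?' * int_places) + '.' + ('?' * frac_places)
  end

  given = match[1] ? match[1].length : 0
  shown = all_places ? frac_places : given.clamp(1, frac_places)

  rounded = Kernel.format("%.#{frac_places}f", text.to_f)
  int_part, frac_part = rounded.split('.')
  # The integral part is never modified, so overflow becomes hashes.
  return ('#' * int_places) + '.' + ('#' * frac_places) if int_part.length > int_places

  int_part.rjust(int_places) + '.' + frac_part[0, shown].ljust(frac_places)
end

%w[1 1.001 1.009 123.456 1234567 one].each do |v|
  puts "(#{format_numeric(v, 5, 2)})"
end
```

With skip_nan: true the helper returns nil for invalid data, so a caller can drop those entries with Array#compact.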

Filling block fields with lists of values

If an argument contains an array, then #format automatically joins the elements of the array into a single string, separating each element with a newline character. As a result, a call like this:

svalues = %w{ 1 10 100 1000 }
nvalues = [1, 10, 100, 1000]
puts r.format(
  "(]]]].[[)",
  svalues                         # you could also use nvalues here.
)

will print out

(  1.00)
( 10.00)
(100.00)
(1000.00)

as might be expected.

Note: While String arguments are consumed during the formatting process and will be empty at the end of formatting, array arguments are not. So svalues (nvalues) still contains [1,10,100,1000] after the call to #format.
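
Why an array survives the call can be seen from the joining step itself. The sketch below is an illustration under assumptions: data_to_string is a hypothetical helper mirroring the description above, not the library's internal method.

```ruby
# Hypothetical helper: turn a data argument into the newline-separated
# string form. Arrays are joined into a new string, so the caller's
# array is never modified; strings are passed through unchanged.
def data_to_string(data)
  data.is_a?(Array) ? data.map(&:to_s).join("\n") : data
end

nvalues = [1, 10, 100, 1000]
joined  = data_to_string(nvalues)
joined.slice!(0, joined.length)   # consume the working copy completely

p joined    # the working string is now empty
p nvalues   # the original array is untouched
```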

Headers, footers, and pages

The #format method can also insert headers, footers, and page-feeds as it formats. These features are controlled by the “header”, “footer”, “page_feed”, “page_len”, and “page_num” options.

If the page_num option is set to an Integer value, page numbering will start at that value.

The page_len option specifies the total number of lines in a page (including headers, footers, and page-feeds).

The page_width option specifies the total number of columns in a page.

If the header option is specified with a string value, that string is used as the header of every page generated. If it is specified as a block, that block is called at the start of every page and its return value used as the header string. When called, the block is passed the current page number.

Likewise, if the footer option is specified with a string value, that string is used as the footer of every page generated. If it is specified as a block, that block is called at the start of every page and its return value used as the footer string. When called, the footer block is passed the current page number.

Both the header and footer options can also be specified as hashes. In this case the hash entries for keys left, centre (or center), and right specify what is to appear on the left, centre, and right of the header/footer. The entry for the key width specifies how wide the footer is to be. If the width key is omitted, the page_width configuration option (which defaults to 72 characters) is used.

The :left, :centre, and :right values may be literal strings, or blocks (just as a normal header/footer specification may be.) See the second example, below.

Another alternative for header and footer options is to specify them as a block that returns a hash. The block is called for each page, then the resulting hash is treated like the hashes described in the preceding paragraph. See the third example, below.

The page_feed option acts in exactly the same way, to produce a page_feed which is appended after the footer. But note that the page_feed is not counted as part of the page length.

All three of these page components are recomputed at the *start of each new page*, before the page contents are formatted (recomputing the header and footer first makes it possible to determine how many lines of data to format so as to adhere to the specified page length).

When the call to #format is complete and the data has been fully formatted, the footer block is called one last time, with an extra argument of true. The string returned by this final call is used as the final footer.

So for example, a 60-line per page report, starting at page 7, with appropriate headers and footers might be set up like so:

small = Text::Reform.new
r.header = lambda do |page| "Page #{page}\n\n" end
r.footer = lambda do |page, last|
  if last
    ''
  else
    ('-'*50 + "\n" + small.format('>'*50, "...#{page+1}"))
  end
end
r.page_feed = "\n\n"
r.page_len = 60
r.page_num = 7

r.format(template, data)

Note that you can’t reuse the r instance of Text::Reform inside the footer: it would end up calling itself recursively until the stack is exhausted.

Alternatively, to set up headers and footers such that the running head is right justified in the header and the page number is centred in the footer:

r.header = { :right => 'Running head' }
r.footer = { :centre => lambda do |page| "page #{page}" end }
r.page_len = 60

r.format(template, data)

The footer in the previous example could also have been specified the other way around, as a block that returns a hash (rather than a hash containing a block):

r.header = { :right => 'Running head' }
r.footer = lambda do |page| { :center => "page #{page}" } end
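
The page-length bookkeeping described in this section can be sketched independently of the library. This is a simplified model, not the actual implementation: paginate is a hypothetical helper, page feeds are left out, and the final footer is obtained by calling the footer lambda again with true for the last page.

```ruby
# Hypothetical helper: split already formatted lines into pages of
# `page_len` lines, counting header and footer lines against the total.
def paginate(body_lines, page_len:, header:, footer:, first_page: 1)
  pages     = []
  remaining = body_lines.dup
  page      = first_page
  until remaining.empty?
    # Header and footer are computed first, so we know how much room is left.
    head = header.call(page)
    foot = footer.call(page, false)
    room = page_len - head.count("\n") - foot.count("\n")
    raise ArgumentError, 'page_len leaves no room for data' if room < 1

    chunk = remaining.shift(room)
    # The last page gets the footer produced with the extra true argument.
    foot = footer.call(page, true) if remaining.empty?
    pages << head + chunk.map { |line| "#{line}\n" }.join + foot
    page += 1
  end
  pages
end

header = lambda { |page| "Page #{page}\n" }
footer = lambda { |page, last| last ? '' : "...#{page + 1}\n" }

puts paginate(%w[a b c d e], page_len: 4, header: header, footer: footer, first_page: 7)
```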

AUTHOR

Original Perl library and documentation: Damian Conway (damian at conway dot org)

Translating everything to Ruby (and leaving a lot of stuff out): Kaspar Schiess (eule at space dot ch)

BUGS

There are undoubtedly serious bugs lurking somewhere in code this funky :-) Bug reports and other feedback are most welcome.

COPYRIGHT

Copyright © 2005, Kaspar Schiess. All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the terms of the Ruby License (see www.ruby-lang.org/en/LICENSE.txt)

Defined Under Namespace

Classes: BreakAt, BreakHyphenator, BreakWith, BreakWrap

Constant Summary

VERSION =
"0.3.0"
BSPECIALS =

various regexp parts for matching patterns.

%w{ [ | ] }
LSPECIALS =
%w{ < ^ > }
LJUSTIFIED =
"[<]{2,} [>]{2,}"
BJUSTIFIED =
"[\\[]{2,} [\\]]{2,}"
BSINGLE =
"~+"
SPECIALS =
[BSPECIALS, LSPECIALS].flatten.map { |spec| Regexp.escape(spec)+"{2,}" }
FIXED_FIELDPAT =
[LJUSTIFIED, BJUSTIFIED, BSINGLE, SPECIALS ].flatten.join('|')
DECIMAL =

TODO: Make this locale dependent

'.'
LNUMERICAL =

Matches one or more > followed by . followed by one or more <

"[>]+ (?:#{Regexp.escape(DECIMAL)}[<]{1,})"
BNUMERICAL =

Matches one or more ] followed by . followed by one or more [

"[\\]]+ (?: #{Regexp.escape(DECIMAL)} [\\[]{1,})"
FIELDPAT =
[LNUMERICAL, BNUMERICAL, FIXED_FIELDPAT].join('|')
LFIELDMARK =
[LNUMERICAL, LJUSTIFIED, LSPECIALS.map { |l| Regexp.escape(l) + "{2}" } ].flatten.join('|')
BFIELDMARK =
[BNUMERICAL, BJUSTIFIED, BSINGLE, BSPECIALS.map { |l| Regexp.escape(l) + "{2}" } ].flatten.join('|')
FIELDMARK =
[LNUMERICAL, BNUMERICAL, BSINGLE, LJUSTIFIED, BJUSTIFIED, LFIELDMARK, BFIELDMARK].flatten.join('|')
CLEAR_BLOCK =

For use with #header, #footer, and #page_feed; this will clear the header, footer, or page feed block result to be an empty block.

lambda { |*args| "" }
NUMBERS_NORMAL =

Numbers are printed, leaving off unnecessary decimal places. Non-numeric data is printed as a series of question marks. This is the default for formatting numbers.

0
NUMBERS_ALL_PLACES =

Numbers are printed, retaining all decimal places. Non-numeric data is printed as a series of question marks.

]]]]].[[       # format
1.0 ->     1.00
1   ->     1.00
1
NUMBERS_SKIP_NAN =

Numbers are printed as for NUMBERS_NORMAL, but NaN (“not a number”) values are skipped.

2
NUMBERS_ALL_AND_SKIP =

Numbers are printed as for NUMBERS_ALL_PLACES, but NaN values are skipped.

NUMBERS_ALL_PLACES | NUMBERS_SKIP_NAN

Constructor Details

#initialize(options = {}) {|_self| ... } ⇒ Reform

Create a Text::Reform object. Accepts an optional hash of construction options (this will change to named parameters in Ruby 2.0). After the initial object is constructed (with either the provided or default values), the object will be yielded (as self) to an optional block for further construction and operation.

Yields:

  • (_self)

Yield Parameters:

  • _self (Text::Reform)

    the object that the method was called on



# File 'lib/text/reform.rb', line 918

def initialize(options = {}) #:yields self:
  @debug      = options[:debug]       || false
  @header     = options[:header]      || CLEAR_BLOCK
  @footer     = options[:footer]      || CLEAR_BLOCK
  @page_feed  = options[:page_feed]   || CLEAR_BLOCK
  @page_len   = options[:page_len]    || nil
  @page_num   = options[:page_num]    || nil
  @page_width = options[:page_width]  || 72
  @break      = options[:break]       || Text::Reform.break_with('-')
  @min_break  = options[:min_break]   || 2
  @squeeze    = options[:squeeze]     || false
  @fill       = options[:fill]        || false
  @filler     = options[:filler]      || { :left => ' ', :right => ' ' }
  @interleave = options[:interleave]  || false
  @numeric    = options[:numeric]     || 0
  @trim       = options[:trim]        || false

  yield self if block_given?
end
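
The construction pattern used here (defaults from a hash, then an optional block receiving the new object) can be tried in isolation. The class below is a cut-down sketch with a hypothetical name and only three of the options. It mirrors the shape of the constructor above and is not a replacement for it.

```ruby
# Hypothetical miniature of the constructor pattern shown above.
class SketchFormatter
  attr_accessor :squeeze, :fill, :page_width

  def initialize(options = {})
    # A missing (nil) entry falls back to the default on the right.
    @squeeze    = options[:squeeze]    || false
    @fill       = options[:fill]       || false
    @page_width = options[:page_width] || 72

    yield self if block_given?
  end
end

from_hash  = SketchFormatter.new(:squeeze => true)
from_block = SketchFormatter.new do |f|
  f.fill       = true
  f.page_width = 60
end

p [from_hash.squeeze, from_hash.fill, from_hash.page_width]
p [from_block.squeeze, from_block.fill, from_block.page_width]
```

One consequence of the `||` idiom is that a default which is truthy cannot be overridden with false or nil through the hash; in that case the accessor has to be used after construction.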

Instance Attribute Details

#break ⇒ Object

Break class instance that is used to break words in hyphenation. This class must have a #break method accepting the three arguments str, initial_max_length and maxLength.

You can directly call the break_* class methods to produce such a class instance for you; available methods are .break_with, .break_at, .break_wrap, .break_hyphenator.

Default

Text::Reform.break_with('-')



# File 'lib/text/reform.rb', line 806

def break
  @break
end

#fill ⇒ Object

If true, causes newlines to be removed from the input. If you want to squeeze all whitespace, set #fill and #squeeze to true.

Default

false



# File 'lib/text/reform.rb', line 825

def fill
  @fill
end

#filler ⇒ Object

Controls the character that is used to fill lines that are too short. If this attribute has a hash value, the symbols :left and :right store the filler character to use on the left and the right, respectively.

Default

‘ ’ on both sides



# File 'lib/text/reform.rb', line 833

def filler
  @filler
end

#footer ⇒ Object

Proc returning the page footer. This gets called before the page gets formatted to permit calculation of page length.

Default

CLEAR_BLOCK



# File 'lib/text/reform.rb', line 773

def footer
  @footer
end

#header ⇒ Object

Proc returning page header. This is called before the page actually gets formatted to permit calculation of page length.

Default

CLEAR_BLOCK



# File 'lib/text/reform.rb', line 767

def header
  @header
end

#interleave ⇒ Object

Controls how a multi-line format specifier is filled. By default (false), formats and the variables from which they’re filled need to be interleaved explicitly. That is, a multi-line specification like this:

puts r.format(
  "Passed:\n" +             # single format specification
  "   [[[[[[[[[[[[[[[\n" +  # (needs two sets of data)
  "Failed:\n" +
  "   [[[[[[[[[[[[[[[",
  passes, fails)            # two arrays, data for previous format

would print:

Passed:
   <pass 1>
Failed:
   <fail 1>
Passed:
   <pass 2>
Failed:
   <fail 2>
Passed:
   <pass 3>
Failed:
   <fail 3>

because the four-line format specifier is treated as a single unit, to be repeatedly filled until all the data in passes and fails has been consumed. Setting interleave to true instead makes #format split the specifier into individual lines and interleave the data between them as necessary.

Default

false



# File 'lib/text/reform.rb', line 878

def interleave
  @interleave
end

#min_break ⇒ Object

Specifies the minimal number of characters that must be left on a line. This prevents breaking of words below its value.

Default

2



# File 'lib/text/reform.rb', line 812

def min_break
  @min_break
end

#numeric ⇒ Object

Specifies handling method for numerical data. Allowed values include:

  • NUMBERS_NORMAL

  • NUMBERS_ALL_PLACES

  • NUMBERS_SKIP_NAN

  • NUMBERS_ALL_AND_SKIP

Default

NUMBERS_NORMAL



# File 'lib/text/reform.rb', line 905

def numeric
  @numeric
end

#page_feed ⇒ Object

Proc to be called for page feed text. This is also called at the start of each page, but does not count towards page length.

Default

CLEAR_BLOCK



# File 'lib/text/reform.rb', line 779

def page_feed
  @page_feed
end

#page_len ⇒ Object

Specifies the total number of lines in a page (including headers, footers, and page-feeds).

Default

nil



# File 'lib/text/reform.rb', line 785

def page_len
  @page_len
end

#page_num ⇒ Object

Where to start page numbering.

Default

nil



# File 'lib/text/reform.rb', line 790

def page_num
  @page_num
end

#page_width ⇒ Object

Specifies the total number of columns in a page.

Default

72



# File 'lib/text/reform.rb', line 795

def page_width
  @page_width
end

#squeeze ⇒ Object

If true, causes any sequence of spaces and/or tabs (but not newlines) in an interpolated string to be replaced with a single space.

Default

false



# File 'lib/text/reform.rb', line 819

def squeeze
  @squeeze
end

#trim ⇒ Object

Controls trimming of whitespace at end of lines.

Default

false



# File 'lib/text/reform.rb', line 910

def trim
  @trim
end

Class Method Details

.break_at(bat) ⇒ Object

Takes a bat string as argument, breaks by looking for that substring and breaking just after it.



# File 'lib/text/reform.rb', line 1296

def break_at(bat)
  BreakAt.new(bat)
end

.break_hyphenator(hyphenator) ⇒ Object

Hyphenates with a class that implements the API of TeX::Hyphen or Text::Hyphen.



# File 'lib/text/reform.rb', line 1307

def break_hyphenator(hyphenator)
  BreakHyphenator.new(hyphenator)
end

.break_with(hyphen) ⇒ Object

Takes a hyphen string as argument, breaks by inserting that hyphen into the word to be hyphenated.



# File 'lib/text/reform.rb', line 1290

def break_with(hyphen)
  BreakWith.new(hyphen)
end
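
The BreakWith class is not reproduced here, so this is a minimal sketch of the described behaviour only: cut the word so that the inserted hyphen still fits in the available width. The name break_with_hyphen and the [head, tail] return value are assumptions.

```ruby
# Hypothetical stand-in for a "break with hyphen" strategy.
def break_with_hyphen(word, hyphen, width)
  keep = width - hyphen.length
  return ['', word] if keep < 1               # no room for a character plus hyphen
  return [word, ''] if word.length <= width   # the word already fits
  [word[0, keep] + hyphen, word[keep..-1]]
end

break_with_hyphen('formatting', '-', 5)
```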

.break_wrap ⇒ Object

Breaks by using a ‘wrap and slop’ algorithm.



# File 'lib/text/reform.rb', line 1301

def break_wrap
  BreakWrap.new
end

Instance Method Details

#__construct_type(str, justifiedPattern) ⇒ Object

Constructs a type that can be passed to #replace from a string.



# File 'lib/text/reform.rb', line 1408

def __construct_type(str, justifiedPattern)
  if str =~ /#{justifiedPattern}/x
    'J'
  else
    str
  end
end

#count_lines(*args) ⇒ Object

Counts the occurrences of \n (newlines) in all strings passed as parameters.



# File 'lib/text/reform.rb', line 1401

def count_lines(*args)
  args.inject(0) do |sum, el|
    sum + el.count("\n")
  end
end
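
Because the method relies only on String#count, its behaviour can be checked in isolation. The snippet below is a standalone copy of the method body for experimentation outside the class, not a separate API.

```ruby
# Standalone copy of the method shown above.
def count_lines(*args)
  args.inject(0) { |sum, el| sum + el.count("\n") }
end

# Two newlines in the first string plus one in the second.
count_lines("header\n\n", "footer\n")
```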

#debug ⇒ Object

Turn on internal debugging output for the duration of the block.



# File 'lib/text/reform.rb', line 1280

def debug
  d = @debug
  @debug = true
  yield
  @debug = d
end

#format(*args) ⇒ Object

Formats the given data according to the given picture strings and returns the resulting text.



# File 'lib/text/reform.rb', line 939

def format(*args)
  @page_num ||= 1

  __debug("Acquiring header and footer: ", @page_num)
  header = __header(@page_num)
  footer = __footer(@page_num, false)

  previous_footer = footer

  line_count  = count_lines(header, footer)
  hf_count    = line_count

  text          = header
  format_stack  = []

  while (args and not args.empty?) or (not format_stack.empty?)
    __debug("Arguments: ", args)
    __debug("Formats left: ", format_stack)

    if format_stack.empty?
      if @interleave
        # split format in its parts and recombine line by line
        format_stack += args.shift.split(%r{\n}o).collect { |fmtline| fmtline << "\n" }
      else
        format_stack << args.shift
      end
    end

    format = format_stack.shift

    parts = format.split(%r{
      (              # Capture
       \n          | # newline... OR
       (?:\\.)+    | # one or more escapes... OR
       #{FIELDPAT} | # patterns
      )}ox)
    parts << "\n" unless parts[-1] == "\n"
    __debug("Parts: ", parts)

    # Count all fields (inject 0, increment when field) and prepare
    # data.
    field_count = parts.inject(0) do |count, el|
      if (el =~ /#{LFIELDMARK}/ox or el =~ /#{FIELDMARK}/ox)
        count + 1
      else
        count
      end
    end

    if field_count.nonzero?
      data = args.first(field_count).collect do |el|
        if el.kind_of?(Array)
          el.join("\n")
        else
          el.to_s
        end
      end
      # shift all arguments that we have just consumed
      args = args[field_count..-1]
      # Is argument count correct?
      data += [''] * (field_count - data.length) unless data.length == field_count
    else
      data = [[]] # one line of data, contains nothing
    end

    first_line = true
    data_left = true
    while data_left
      idx = 0
      data_left = false

      parts.each do |part|
        # Is part an escaped format literal ?
        if part =~ /\A (?:\\.)+/ox
          __debug("esc literal: ", part)
          text << part.gsub(/\\(.)/, '\1')
          # Is part a once field mark ?
        elsif part =~ /(#{LFIELDMARK})/ox
          if first_line
            type = __construct_type($1, LJUSTIFIED)

            __debug("once field: ", part)
            __debug("data is: ", data[idx])
            text << replace(type, part.length, data[idx])
            __debug("data now: ", data[idx])
          else
            text << (@filler[:left] * part.length)[0, part.length]
            __debug("missing once field: ", part)
          end
          idx += 1
        # Is part a multi field mark ?
        elsif part =~ /(#{FIELDMARK})/ox and part[0, 2] != '~~'
          type = __construct_type($1, BJUSTIFIED)

          __debug("multi field: ", part)
          __debug("data is: ", data[idx])
          text << replace(type, part.length, data[idx])
          __debug("text is: ", text)
          __debug("data now: ", data[idx])
          data_left = true if data[idx].strip.length > 0
          idx += 1
          # Part is a literal.
        else
          __debug("literal: ", part)
          text << part.gsub(/\0(\0*)/, '\1')  # XXX: What is this gsub for ?

          # New line ?
          if part == "\n"
            line_count += 1
            if @page_len && line_count >= @page_len
              __debug("\tejecting page: #@page_num")

              @page_num += 1
              page_feed = __pagefeed
              header = __header(@page_num)

              text << footer + page_feed + header
              previous_footer = footer

              footer = __footer(@page_num, false)

              line_count = hf_count = (header.count("\n") + footer.count("\n"))

              header = page_feed + header
            end
          end
        end  # multiway if on part
      end # parts.each

      __debug("Accumulated: ", text)

      first_line = false
    end
  end  # while args or formats left

  # Adjust final page header or footer as required
  if hf_count > 0 and line_count == hf_count
    # there is a header that we don't need
    text.sub!(/#{Regexp.escape(header)}\Z/, '')
  elsif line_count > 0 and @page_len and @page_len > 0
    # missing footer:
    text << "\n" * (@page_len - line_count) + footer
    previous_footer = footer
  end

  # Replace last footer
  if previous_footer and not previous_footer.empty?
    lastFooter = __footer(@page_num, true)
    footerDiff = lastFooter.count("\n") - previous_footer.count("\n")

    # Enough space to squeeze the longer final footer in ?
    if footerDiff > 0 && text =~ /(#{'^[^\S\n]*\n' * footerDiff}#{Regexp.escape(previous_footer)})\Z/
      previous_footer = $1
      footerDiff = 0
    end

    # If not, create an empty page for it.
    if footerDiff > 0
      @page_num += 1
      lastHeader = __header(@page_num)
      lastFooter = __footer(@page_num, true)

      text << lastHeader
      text << "\n" * (@page_len - lastHeader.count("\n") - lastFooter.count("\n"))
      text << lastFooter
    else
      lastFooter = "\n" * (-footerDiff) + lastFooter
      text[-(previous_footer.length), text.length] = lastFooter
    end
  end

  # Trim text
  text.gsub!(/[ ]+$/m, '') if @trim
  text
end
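
One step of the method that is easy to miss is how arguments are matched to fields: one argument is taken per field, arrays are joined with newlines, everything else is converted with to_s, and missing arguments are padded with empty strings. The sketch below isolates that step; take_field_data is a hypothetical helper, not a method of Text::Reform.

```ruby
# Hypothetical helper mirroring the data preparation in #format.
# Returns the per-field strings and the arguments left for later formats.
def take_field_data(args, field_count)
  data = args.first(field_count).map do |el|
    el.kind_of?(Array) ? el.join("\n") : el.to_s
  end
  # Pad with empty strings when fewer arguments than fields were given.
  data += [''] * (field_count - data.length)
  rest = args[field_count..-1] || []
  [data, rest]
end

data, rest = take_field_data([%w[a b], 42], 3)
```

Here three fields are requested but only two arguments are supplied, so the third field receives an empty string and nothing is left over.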

#quote(str) ⇒ Object

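Intended to quote any characters in str that might otherwise be interpreted as field markers, so that they are treated as normal characters. As the source below shows, it currently returns str unchanged and prints a warning when debugging is enabled.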



# File 'lib/text/reform.rb', line 1273

def quote(str)
  puts 'Text::Reform warning: not quoting string...' if @debug
  str
end

#replace(format, length, value) ⇒ Object

Replaces a placeholder with the given text. The format string determines the kind of field: a format of at most two characters (including the internal 'J' type for fully justified fields) is a text field, while a longer format is a numeric field.



# File 'lib/text/reform.rb', line 1118

def replace(format, length, value)
  text      = ''
  remaining = length
  filled    = 0

  __debug("value is: ", value)

  if @fill
    value.sub!(/\A\s*/m, '')
  else
    value.sub!(/\A[ \t]*/, '')
  end

  if value and format.length > 2
    # find length of numerical fields
    if format =~ /([\]>]+)#{Regexp.escape(DECIMAL)}([\[<]+)/
      ilen, dlen = $1.length, $2.length
    end

    # Try to extract a numeric value from +value+
    done = false
    while not done
      num, extra = scanf_remains(value, "%f")
      __debug "Number split into: ", [num, extra]
      done = true

      if extra.length == value.length
        value.sub!(/\s*\S*/, '')  # skip offending non number value
        if (@numeric & NUMBERS_SKIP_NAN) > 0 && value =~ /\S/
          __debug("Not a Number, retrying ", value)
          done = false
        else
          text = '?' * ilen + DECIMAL + '?' * dlen
          return text
        end
      end
    end

    num = num.first if num.kind_of?(Array)
    __debug("Finally number is: ", num)

    formatted = "%#{format.length}.#{dlen}f" % num

    if formatted.length > format.length
      text = '#' * ilen + DECIMAL + '#' * dlen
    else
      text = formatted
    end

    __debug("Formatted number is: ", text)

    # Only output significant digits: replace trailing zeros with
    # spaces, unless all places were explicitly requested or the
    # number has more digits than we just output.
    unless (@numeric & NUMBERS_ALL_PLACES > 0) or num.to_s =~ /#{Regexp.escape(DECIMAL)}\d\d{#{dlen},}$/
      text.sub!(/(#{Regexp.escape(DECIMAL)}\d+?)(0+)$/) do |mv|
        $1 + ' ' * $2.length
      end
    end

    value.replace(extra)
    remaining = 0
  else
    while !((value =~ /\S/o).nil?)
      # Only whitespace remaining ?
      if ! @fill && value.sub!(/\A[ \t]*\n/, '')
        filled = 2
        break
      end
      break unless value =~ /\A(\s*)(\S+)(.*)\z/om

      ws, word, extra = $1, $2, $3

      # Replace all newlines by spaces when fill was specified.
      nonnl = (ws =~ /[^\n]/o)
      if @fill
        ws.gsub!(/\n/) do |match|
          nonnl ? '' : ' '
        end
      end

      # Replace all whitespace by one space if squeeze was specified.
      lead = @squeeze ? (ws.length > 0 ? ' ' : '') : ws
      match = lead + word

      __debug("Extracted: ", match)
      break if text and match =~ /\n/o

      if match.length <= remaining
        __debug("Accepted: ", match)
        text << match
        remaining -= match.length
        value.replace(extra)
      else
        __debug("Need to break: ", match)
        if (remaining - lead.length) >= @min_break
          __debug("Trying to break: ", match)
          broken, left = @break.break(match, remaining, length)
          text << broken
          __debug("Broke as: ", [broken, left])
          value.replace left + extra

          # Adjust remaining chars, but allow for underflow.
          t = remaining-broken.length
          if t < 0
            remaining = 0
          else
            remaining = t
          end
        end
        break
      end

      filled = 1
    end
  end

  if filled.zero? and remaining > 0 and value =~ /\S/ and text.empty?
    value.sub!(/^\s*(.{1,#{remaining}})/, '')
    text = $1
    remaining -= text.length
  end

  # Justify format?
  if text =~ / /o and format == 'J' and value =~ /\S/o and filled != 2
    # Fully justified
    text.reverse!
    text.gsub!(/( +)/o) do |mv|
      remaining -= 1
      if remaining > 0
        " #{$1}"
      else
        $1
      end
    end while remaining > 0
    text.reverse!
  elsif format =~ /\>|\]/o
    # Right justified
    text[0, 0] = (@filler[:left] * remaining)[0, remaining] if remaining > 0
  elsif format =~ /\^|\|/o
    # Center justified
    half_remaining = remaining / 2
    text[0, 0] = (@filler[:left] * half_remaining)[0, half_remaining]
    half_remaining = remaining - half_remaining
    text << (@filler[:right] * half_remaining)[0, half_remaining]
  else
    # Left justified
    text << (@filler[:right] * remaining)[0, remaining]
  end

  text
end
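
The numeric branch is easier to follow in isolation. The sketch below is a simplified stand-in, not the library's code: it assumes DECIMAL is '.', handles only the > and < markers, and always blanks trailing zeros (the real method skips that step when all places are requested or the number has more digits than the field shows). format_number is a hypothetical name.

```ruby
# Simplified stand-in for the numeric branch of #replace.
def format_number(picture, num)
  m = picture.match(/\A(>+)\.(<+)\z/)
  raise ArgumentError, "not a numeric picture: #{picture}" unless m
  ilen = m[1].length
  dlen = m[2].length

  formatted = "%#{picture.length}.#{dlen}f" % num
  # Overflow: the number does not fit, so fill the field with '#'.
  return '#' * ilen + '.' + '#' * dlen if formatted.length > picture.length

  # Blank out trailing zeros, keeping at least one decimal digit.
  formatted.sub(/(\.\d+?)(0+)\z/) { $1 + ' ' * $2.length }
end

format_number('>>>.<<', 1.5)
```

The result keeps the width of the picture, so columns of numbers stay aligned on the decimal point.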

#scanf_remains(value, fstr, &block) ⇒ Object

Uses the Scanf module to scan a string and returns the unmatched remainder of the string in addition to the normal scanf result.



# File 'lib/text/reform.rb', line 1388

def scanf_remains(value, fstr, &block)
  if block.nil?
    unless fstr.kind_of?(Scanf::FormatString)
      fstr = Scanf::FormatString.new(fstr)
    end
    [ fstr.match(value), fstr.string_left ]
  else
    value.block_scanf(fstr, &block)
  end
end
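
The idea of returning both the parsed value and the unmatched remainder can be tried without the Scanf dependency. The sketch below swaps in a regular expression for Scanf, so it is an approximation only: it recognises plain decimal numbers, whereas scanf's %f may accept more forms. split_leading_number is a hypothetical name.

```ruby
# Regexp-based stand-in: pull a leading decimal number off +value+ and
# return it together with the text that was not matched.
def split_leading_number(value)
  if (m = value.match(/\A\s*([-+]?\d+(?:\.\d+)?)/))
    [m[1].to_f, m.post_match]
  else
    [nil, value]   # nothing matched, so everything remains
  end
end

split_leading_number('3.14 apples')
```

When nothing matches, the remainder has the same length as the input, which is the condition #replace uses to detect a non-numeric value.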

#unchomp(str) ⇒ Object

Adds a \n character to the end of the line unless the line already ends with \n. Returns a modified copy of str.



# File 'lib/text/reform.rb', line 1418

def unchomp(str)
  unchomp!(str.dup)
end

#unchomp!(str) ⇒ Object

Adds a \n character to the end of the line unless the line already ends with \n. Modifies str in place.



# File 'lib/text/reform.rb', line 1424

def unchomp!(str)
  if str.empty? or str[-1] == ?\n
    str
  else
    str << "\n"
  end
end
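
The difference between the copying and the in-place variant can be seen with a standalone version of the pair. This sketch uses String#end_with? instead of the character comparison above, which is an assumption made to keep it independent of the Ruby version.

```ruby
# Standalone versions of the two methods, for experimentation.
def unchomp!(str)
  str << "\n" unless str.empty? || str.end_with?("\n")
  str
end

def unchomp(str)
  unchomp!(str.dup)   # work on a copy so the caller's string is untouched
end

original = 'abc'.dup
copy = unchomp(original)   # original is still "abc"
unchomp!(original)         # now original ends with a newline too
```

An empty string is returned unchanged by both variants.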