Class: FatTable::Table

Inherits:
Object
  • Object
show all
Includes:
Enumerable
Defined in:
lib/fat_table/table.rb

Overview

A container for a two-dimensional table. All cells in the table must be a String, a DateTime (or Date), a Numeric (Bignum, Integer, or BigDecimal), or a Boolean (TrueClass or FalseClass). All columns must be of one of those types or be a string convertible into one of them. It is considered an error if a single column contains cells of different types. Any cell that cannot be parsed as one of the Numeric, DateTime, or Boolean types will be treated as a String and have #to_s applied. Until the column type is determined, it will have the type NilClass.

You can initialize a Table in several ways:

  1. ::new, which will return an empty table to which rows or columns can be added later,

  2. ::from_csv_file('table.csv'), where the argument is the name of a .csv file, in which case, the headers will be taken from the first row of the data.

  3. ::from_org_file('table.org'), where the argument is the name of an .org file and the first Emacs org mode table found in the file will be read. The headers will be taken from the first row of the table if it is followed by an hrule, otherwise the headers will be synthesized as +:col_1+, +:col_2+, etc.

  4. ::from_csv_string('csv_string'), where +csv_string+ is a string in the same form as a .csv file, and it will be parsed in the same way.

  5. ::from_org_string('org_string'), where +org_string+ is a string in the same form as an Emacs org mode table, and it will be parsed in the same way.

  6. ::from_aoa(+aoa+), where +aoa+ is an Array of elements that are either Arrays or nil. The headers will be taken from the first Array if it is followed by a nil, otherwise the headers will be synthesized as +:col_1+, +:col_2+, etc. Each inner Array will be read as a row of the table and each nil, after the first will be take as a group boundary.

  7. ::from_aoh(+aoh+), where +aoh+ is an Array of elements each of which is either (1) a Hash (or any object that responds to #to_h) or (2) a nil. All Hashes must have the same keys, which become the headers for the table. Each nil will be taken as marking a group boundary.

  8. ::from_table(+table+), where +table+ is another FatTable::Table object.

In the resulting Table, the headers are converted into symbols, with all spaces converted to underscore and everything down-cased. So, the heading, 'Two Words' becomes the header +:two_words+.

Constant Summary collapse

JOIN_TYPES =

An Array of symbols for the valid join types.

%i[inner left right full cross].freeze

Instance Attribute Summary collapse

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(*heads, **types) ⇒ Table

Constructors

:category: Constructors

Return an empty FatTable::Table object. Specifying headers is optional. By default, all columns start our as having an "open" type and get assigned a type based on their contents. For example, if a column contains items that can be interpreted as dates, the column gets assigned a DateTime type. Other types are Numeric, Boolean, and String. Once a type is assigned to a column, any non-conforming vaules in that column raise an IncompatibleType error. If a column is marked "tolerant", however, the incompatible item is converted to a string and allowed to remain in the column without raising an error. They count as nils when calculations are performed on the column and paricipate only in string formatting directives on output.

Rather than have a column's type determined by content, you can also specify a column type by providing a type hash, where the key is the header's name and the value is the desired type. In that case, any incompatible type raises an an IncompatibleTypeError unless the column is also marked tolerant, in which case it gets converted to a string as discussed above. If the type name in the types hash ends in a '~', it is treated as a specifying the given type but marking it as tolerant as well. The values in the type hash can be any string or sybol that starts with 'num', 'dat', 'bool', or 'str' to specify Numeric, DateTime, Boolean, or String types respectively.



98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
# File 'lib/fat_table/table.rb', line 98

def initialize(*heads, **types)
  @heads = heads.flatten.map(&:as_sym)
  @types = types
  @columns = []
  @tolerant_cols = []
  # Check for the special 'omni' key
  @omni_type = 'NilClass'
  @omni_tol = false
  if types.keys.map(&:to_s).include?('omni')
    # All columns not otherwise included in types should have the type and
    # tolerance of omni.
    omni_val = (types['omni'] || types[:omni])
    @omni_type, @omni_tol = Table.typ_tol(omni_val)
    # Remove omni from types.
    types.delete(:omni)
    types.delete('omni')
  end
  # heads += types.keys
  (heads.flatten + types.keys).uniq.each do |h|
    if types[h]
      typ, tol = Table.typ_tol(types[h])
    else
      typ = @omni_type
      tol = @omni_tol
    end
    @tolerant_cols << h.to_s.as_sym if tol
    @columns << Column.new(header: h.to_s.sub(/~\s*\z/, ''), type: typ,
                           tolerant: tol)
  end
  @explicit_boundaries = []
end

Instance Attribute Details

#columnsObject (readonly)

An Array of FatTable::Columns that constitute the table.



55
56
57
# File 'lib/fat_table/table.rb', line 55

def columns
  @columns
end

#explicit_boundariesObject

Record boundaries set explicitly with mark_boundaries or from reading hlines from input. When we want to access boundaries, however, we want to add an implict boundary at the last row of the table. Since, as the table grows, the implict boundary changes index, we synthesize the boundaries by dynamically adding the final boundary with the #boundaries method call.



68
69
70
# File 'lib/fat_table/table.rb', line 68

def explicit_boundaries
  @explicit_boundaries
end

#headsObject (readonly)

Returns the value of attribute heads.



56
57
58
# File 'lib/fat_table/table.rb', line 56

def heads
  @heads
end

#omni_tolObject (readonly)

Returns the value of attribute omni_tol.



60
61
62
# File 'lib/fat_table/table.rb', line 60

def omni_tol
  @omni_tol
end

#omni_typObject (readonly)

Returns the value of attribute omni_typ.



60
61
62
# File 'lib/fat_table/table.rb', line 60

def omni_typ
  @omni_typ
end

#tolerant_colsObject

Headers of columns that are to be tolerant when they are built.



59
60
61
# File 'lib/fat_table/table.rb', line 59

def tolerant_cols
  @tolerant_cols
end

Class Method Details

.from_aoa(aoa, hlines: false, **types) ⇒ Object

Construct a new table from an Array of Arrays +aoa+. By default, with +hlines+ set to false, do not look for separators, i.e. +nils+, just treat the first row as headers. With +hlines+ set true, expect +nil+ separators to mark the header row and any boundaries. If the second element of the array is a +nil+, interpret the first element of the array as a row of headers. Otherwise, synthesize headers of the form +:col_1+, +:col_2+, ... and so forth. The remaining elements are taken as the body of the table, except that if an element of the outer array is a +nil+, mark the preceding row as a group boundary. Note for Emacs users: In org mode code blocks when an org-mode table is passed in as a variable it is passed in as an Array of Arrays. By default (+ HEADER: :hlines no +) org-mode strips all hrules from the table; otherwise (+ HEADER: :hlines yes +) they are indicated with nil elements in the outer array.



213
214
215
# File 'lib/fat_table/table.rb', line 213

def self.from_aoa(aoa, hlines: false, **types)
  from_array_of_arrays(aoa, hlines: hlines, **types)
end

.from_aoh(aoh, hlines: false, **types) ⇒ Object

Construct a Table from +aoh+, an Array of Hashes or an Array of any objects that respond to the #to_h method. All hashes must have the same keys, which, when converted to symbols will become the headers for the Table. If hlines is set true, mark a group boundary whenever a nil, rather than a hash appears in the outer array.



224
225
226
227
228
229
230
231
# File 'lib/fat_table/table.rb', line 224

def self.from_aoh(aoh, hlines: false, **types)
  if aoh.first.respond_to?(:to_h)
    from_array_of_hashes(aoh, hlines: hlines, **types)
  else
    raise UserError,
          "Cannot initialize Table with an array of #{input[0].class}"
  end
end

.from_csv_file(fname, **types) ⇒ Object

Construct a Table from the contents of a CSV file named +fname+. Headers will be taken from the first CSV row and converted to symbols.



155
156
157
158
159
160
161
# File 'lib/fat_table/table.rb', line 155

def self.from_csv_file(fname, **types)
  File.open(fname, 'r') do |io|
    from_csv_io(io, **types)
  end
rescue NoTable
  raise NoTable, "no table found in CSV file '#{fname}'"
end

.from_csv_string(str, has_headers: true) ⇒ Object

Construct a Table from a CSV string +str+, treated in the same manner as the input from a CSV file in ::from_org_file.



167
168
169
170
171
# File 'lib/fat_table/table.rb', line 167

def self.from_csv_string(str, has_headers: true, **)
  from_csv_io(StringIO.new(str), has_headers:, **)
rescue NoTable
  raise NoTable, "no table found in string '#{str[0..20]}...'"
end

.from_org_file(fname, **types) ⇒ Object

Construct a Table from the first table found in the given Emacs org-mode file named +fname+. Headers are taken from the first row if the second row is an hrule. Otherwise, synthetic headers of the form +:col_1+, +:col_2+, etc. are created.



179
180
181
182
183
184
185
# File 'lib/fat_table/table.rb', line 179

def self.from_org_file(fname, **types)
  File.open(fname, 'r') do |io|
    from_org_io(io, **types)
  end
rescue NoTable
  raise NoTable, "no table found in file '#{fname}'"
end

.from_org_string(str, **types) ⇒ Object

Construct a Table from a string +str+, treated in the same manner as the contents of an org-mode file in ::from_org_file.



191
192
193
194
195
# File 'lib/fat_table/table.rb', line 191

def self.from_org_string(str, **types)
  from_org_io(StringIO.new(str), **types)
rescue NoTable
  raise NoTable, "no table found in string '#{str[0..20]...}'"
end

.from_sql(query, **types) ⇒ Object

Construct a Table by running a SQL +query+ against the database set up with FatTable.connect, with the rows of the query result as rows.

Raises:



245
246
247
248
249
250
251
252
253
254
255
# File 'lib/fat_table/table.rb', line 245

def self.from_sql(query, **types)
  msg = 'FatTable.db must be set with FatTable.connect'
  raise UserError, msg if FatTable.db.nil?

  result = Table.new
  rows = FatTable.db[query]
  rows.each do |h|
    result << h
  end
  result
end

.from_table(table) ⇒ Object

Construct a new table from another FatTable::Table object +table+. Inherit any group boundaries from the input table.



237
238
239
# File 'lib/fat_table/table.rb', line 237

def self.from_table(table)
  table.deep_dup
end

.typ_tol(str) ⇒ Object

Return [typ, tol] based on the type string, str.



263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
# File 'lib/fat_table/table.rb', line 263

def typ_tol(str)
  tol = str ? str.match?(/~\s*\Z/) : false
  typ =
    case str
    when /\A\s*num/i
      'Numeric'
    when /\A\s*boo/i
      'Boolean'
    when /\A\s*dat/i
      'DateTime'
    when /\A\s*str/i
      'String'
    else
      'NilClass'
    end
  [typ, tol]
end

Instance Method Details

#<<(row) ⇒ Object

Add a +row+ to this Table without marking it as a group boundary.



1595
1596
1597
# File 'lib/fat_table/table.rb', line 1595

def <<(row)
  add_row(row)
end

#[](key) ⇒ Object

Return the array of items of the column with the given header symbol +key+, or if +key+ is an Integer, return that row at that index. So a table's rows can be accessed by number, and its columns can be accessed by column header. Also, double indexing works in either row-major or column-major order: \tab[:id][8] returns the 9th item in the column headed :id and so does \tab[8][:id].



475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
# File 'lib/fat_table/table.rb', line 475

def [](key)
  case key
  when Integer
    msg = "index '#{key}' out of range"
    raise UserError, msg unless (0..size - 1).cover?(key.abs)

    rows[key]
  when String
    msg = "header '#{key}' not in table"
    raise UserError, msg unless headers.include?(key)

    column(key).items
  when Symbol
    msg = "header ':#{key}' not in table"
    raise UserError, msg unless headers.include?(key)

    column(key).items
  else
    raise UserError, "cannot index table with a #{key.class}"
  end
end

#add_column(col) ⇒ Object

Add a FatTable::Column object +col+ to the table.



1602
1603
1604
1605
1606
1607
1608
# File 'lib/fat_table/table.rb', line 1602

def add_column(col)
  msg = "Table already has a column with header '#{col.header}'"
  raise msg if column?(col.header)

  columns << col
  self
end

#add_row(row, mark: false) ⇒ Object

Add a +row+ represented by a Hash having the headers as keys. If +mark:+ is set true, mark this row as a boundary. All tables should be built ultimately using this method as a primitive.



1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
# File 'lib/fat_table/table.rb', line 1574

def add_row(row, mark: false)
  row.transform_keys!(&:as_sym)
  # Make sure there is a column for each known header and each new key
  # present in row.
  new_heads = row.keys - headers
  new_heads.each do |h|
    # This column is new, so it needs nil items for all prior rows lest
    # the value be added to a prior row.
    items = Array.new(size, nil)
    columns << Column.new(header: h, items: items, tolerant: tolerant_col?(h))
  end
  headers.each do |h|
    # NB: This adds a nil if h is not in row.
    column(h) << row[h]
  end
  self
end

#boundariesObject

Return the explicit_boundaries, augmented by an implicit boundary for the end of the table, unless it's already an implicit boundary.



751
752
753
754
755
756
757
758
759
# File 'lib/fat_table/table.rb', line 751

def boundaries
  return [] if empty?

  if explicit_boundaries.last == size - 1
    explicit_boundaries
  else
    explicit_boundaries + [size - 1]
  end
end

#col_typesObject

Return a Hash of the Table's Column header symbols to type strings.



507
508
509
510
511
512
513
# File 'lib/fat_table/table.rb', line 507

def col_types
  result = {}
  columns.each do |c|
    result[c.header] = c.type
  end
  result
end

#column(key) ⇒ FatTable::Column

Return the table's Column with the given +key+ as its header.

Parameters:

  • key (Symbol)

    symbol for header of column to return

Returns:



433
434
435
# File 'lib/fat_table/table.rb', line 433

def column(key)
  columns.detect { |c| c.header == key.as_sym }
end

#column?(key) ⇒ Boolean

Return true if the table has a Column with the given +key+ as a header.

Returns:

  • (Boolean)


500
501
502
# File 'lib/fat_table/table.rb', line 500

def column?(key)
  headers.include?(key.as_sym)
end

#cross_join(other) ⇒ Object

Perform a cross join as described in FatTable::Table#join.



1356
1357
1358
# File 'lib/fat_table/table.rb', line 1356

def cross_join(other)
  join(other, join_type: :cross)
end

#degroup!Object

Return this table mutated with all groups removed. Useful after something like #order_by, which adds groups as a side-effect, when you do not want the groups displayed in the output. This modifies the input table, so is a departure from the otherwise immutability of Tables.



709
710
711
712
# File 'lib/fat_table/table.rb', line 709

def degroup!
  self.explicit_boundaries = []
  self
end

#distinctObject

Return a new table with all duplicate rows eliminated. Resets groups. Same as #uniq.



1096
1097
1098
1099
1100
1101
1102
1103
# File 'lib/fat_table/table.rb', line 1096

def distinct
  result = empty_dup
  uniq_rows = rows.uniq
  uniq_rows.each do |row|
    result << row
  end
  result
end

#eachObject

Yield each row of the table as a Hash with the column symbols as keys.



607
608
609
610
611
612
613
614
615
616
# File 'lib/fat_table/table.rb', line 607

def each
  if block_given?
    rows.each do |row|
      yield row
    end
    self
  else
    to_enum(:each)
  end
end

#empty?Boolean

Return whether this Table is empty.

Returns:

  • (Boolean)


552
553
554
# File 'lib/fat_table/table.rb', line 552

def empty?
  size.zero?
end

#empty_dup(result_cols = nil) ⇒ Object

Return an new table based on this Table but with empty columns named by the result_cols parameter, by default the this Table's columns. If any of the result_cols have the same name as an existing column, inherit that column's type and tolerance. Also, set any instance variables that might have been set by a subclass instance.



137
138
139
140
141
142
143
144
145
146
147
148
149
# File 'lib/fat_table/table.rb', line 137

def empty_dup(result_cols = nil)
  result_cols ||= heads
  result_types = types.select { |k,_v| result_cols.include?(k) }
  result = self.class.new(result_cols, **result_types)
  tolerant_cols.each do |h|
    result.tolerant_cols << h
    result.column(h).tolerant = true
  end
  (instance_variables - result.instance_variables).each do |v|
    result.instance_variable_set(v, instance_variable_get(v))
  end
  result
end

#except(other) ⇒ Object

Return a Table that includes the rows of this table except for any rows that are the same as those in Table +other+. In other words, return the set difference between this table and +other+. The headers of this table are used in the result. There must be the same number of columns of the same type in the two tables, or an exception will be raised. Duplicates are eliminated from the result. Any groups present in either Table are eliminated in the output Table.



1180
1181
1182
# File 'lib/fat_table/table.rb', line 1180

def except(other)
  set_operation(other, :difference, distinct: true)
end

#except_all(other) ⇒ Object

Return a Table that includes the rows of this table except for any rows that are the same as those in Table +other+. In other words, return the set difference between this table an the other. The headers of this table are used in the result. There must be the same number of columns of the same type in the two tables, or an exception will be thrown. Duplicates are /not/ eliminated from the result. Any groups present in either Table are eliminated in the output Table.



1193
1194
1195
# File 'lib/fat_table/table.rb', line 1193

def except_all(other)
  set_operation(other, :difference, distinct: false)
end

#force_string!(*keys) ⇒ Object

Set the column type for Column with the given +key+ as a String type.



458
459
460
461
462
463
464
465
# File 'lib/fat_table/table.rb', line 458

def force_string!(*keys)
  keys.each do |h|
    raise UserError, "force_string!: #{h} not a column in table" unless column(h)

    column(h).force_string!
  end
  self
end

#full_join(other, *exps) ⇒ Object

Perform a full join as described in FatTable::Table#join.



1349
1350
1351
# File 'lib/fat_table/table.rb', line 1349

def full_join(other, *exps)
  join(other, *exps, join_type: :full)
end

#group_by(*group_cols, **agg_cols) ⇒ Object

Return a Table with a single row for each group of rows in the input table where the value of all columns +group_cols+ named as simple symbols are equal. All other columns, +agg_cols+, are set to the result of aggregating the values of that column within the group according to a aggregate function (:count, :sum, :min, :max, etc.) that you can specify by adding a hash parameter with the column as the key and a symbol for the aggregate function as the value. For example, consider the following call:

tab.group_by(:date, :code, :price, shares: :sum).

The first three parameters are simple symbols and count as +group_cols+, so the table is divided into groups of rows in which the value of :date, :code, and :price are equal. The shares: hash parameter is an +agg_col+ parameter set to the aggregate function :sum, so it will appear in the result as the sum of all the :shares values in each group. Because of the way Ruby parses parameters to a method call, all the grouping symbols must appear first in the parameter list before any hash parameters.



1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
# File 'lib/fat_table/table.rb', line 1530

def group_by(*group_cols, **agg_cols)
  sorted_tab = order_by(group_cols)
  groups = sorted_tab.rows.group_by do |r|
    group_cols.map { |k| r[k] }
  end
  grp_types = types.select { |k, _v| group_cols.include?(k) }
  result = Table.new(*group_cols, **grp_types)
  groups.each_pair do |_vals, grp_rows|
    result << row_from_group(grp_rows, group_cols, agg_cols)
  end
  result.normalize_boundaries
  result
end

#group_cols(col) ⇒ Object

Return an Array of Column objects for header +col+ representing a sub-column for each group in the table under that header.



690
691
692
693
694
695
696
697
698
699
700
701
# File 'lib/fat_table/table.rb', line 690

def group_cols(col)
  normalize_boundaries
  cols = []
  (0..boundaries.size - 1).each do |k|
    range = group_row_range(k)
    tab_col = column(col)
    gitems = tab_col.items[range]
    cols << Column.new(header: col, items: gitems,
                       type: tab_col.type, tolerant: tab_col.tolerant?)
  end
  cols
end

#group_row_range(k) ⇒ Object

Return the range of row indexes for boundary number +k+



672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
# File 'lib/fat_table/table.rb', line 672

def group_row_range(k)
  last_k = boundaries.size - 1
  if k < 0 || k > last_k
    raise ArgumentError, "boundary number '#{k}' out of range in boundary_row_range"
  end

  if boundaries.size == 1
    (0..boundaries.first)
  elsif k.zero?
    # Keep index at or above zero
    (0..boundaries[k])
  else
    ((boundaries[k - 1] + 1)..boundaries[k])
  end
end

#groupsObject

Boundaries mark the last row in each "group" within the table. The last row of the table is always an implicit boundary, and having the last row as the sole boundary is the default for new tables unless mentioned otherwise. Resetting the boundaries means to put it back in that default state.

Boundaries can be added when a table is read in, for example, from the text of an org table in which each hline (other than the one separating the headers from the body) marks a boundary for the row immediately preceding the hline.

Boundaries can also be added manually with the +mark_boundary+ method.

The #order_by method resets the boundaries then adds boundaries at the last row of each group of rows on which the sort keys were equal as a boundary.

The #union_all (but not #union since it deletes duplicates) method adds a boundary between the constituent tables. #union_all also preserves any boundary markers within the constituent tables. In doing so, the boundaries of the second table in the #union_all are increased by the size of the first table so that they refer to rows in the new table.

The #select method preserves any boundaries from the input table without change, since it only selects columns for the output and deletes no rows.

Perhaps surprisingly, the #group_by method does /not/ result in any groups in the output table since the result of #group_by is to reduce all groups it finds into a single row, and having a group for each row of the output table would have no use.

All the other table-transforming methods reset the boundaries in the new table. For example, #where re-arranges and deletes rows, so the old boundaries would make no sense anyway. Likewise, #union, #intersection,

except, and #join reset the boundaries to their default.

Return an array of an Array of row Hashes for the groups in this Table.



657
658
659
660
661
662
663
664
# File 'lib/fat_table/table.rb', line 657

def groups
  normalize_boundaries
  groups = []
  (0..boundaries.size - 1).each do |k|
    groups << group_rows(k)
  end
  groups
end

#headersObject

Return the headers for the Table as an Array of Symbols.



518
519
520
# File 'lib/fat_table/table.rb', line 518

def headers
  columns.map(&:header)
end

#inner_join(other, *exps) ⇒ Object

Perform an inner join as described in FatTable::Table#join.



1328
1329
1330
# File 'lib/fat_table/table.rb', line 1328

def inner_join(other, *exps)
  join(other, *exps)
end

#intersect(other) ⇒ Object

Return a Table that includes the rows that appear in this table and in +other+ table. In other words, return the intersection of this table with the other. The headers of this table are used in the result. There must be the same number of columns of the same type in the two tables, or an exception will be thrown. Duplicates are eliminated from the result. Any groups present in either Table are eliminated in the output Table.



1154
1155
1156
# File 'lib/fat_table/table.rb', line 1154

def intersect(other)
  set_operation(other, :intersect, distinct: true)
end

#intersect_all(other) ⇒ Object

Return a Table that includes all the rows in this table that also occur in +other+ table. Note that the order of the operands matters. Duplicates in this table will be included in the output, but duplicates in other will not. The headers of this table are used in the result. There must be the same number of columns of the same type in the two tables, or an exception will be thrown. Duplicates are not eliminated from the result. Resets groups.



1167
1168
1169
# File 'lib/fat_table/table.rb', line 1167

def intersect_all(other)
  set_operation(other, :intersect, distinct: false)
end

#join(other, *exps, join_type: :inner) ⇒ FatTable::Table

:category: Operators

Return a table that joins this Table to +other+ based on one or more join expressions +exps+ using the +join_type+ in determining the rows of the result table. There are several possible forms for the join expressions +exps+:

  1. If no join expressions are given, the tables will be joined when all values with the same name in both tables have the same value, a "natural" join. However, if the join type is :cross, the join expression will be taken to be 'true'. Otherwise, if there are no common column names, an exception will be raised.

  2. If the join expressions are one or more symbols, the join condition requires that the values of both tables are equal for all columns named by the symbols. A column that appears in both tables can be given without modification and will be assumed to require equality on that column. If an unmodified symbol is not a name that appears in both tables, an exception will be raised. Column names that are unique to the first table must have a '_a' appended to the column name and column names that are unique to the other table must have a '_b' appended to the column name. These disambiguated column names must come in pairs, one for the first table and one for the second, and they will imply a join condition that the columns must be equal on those columns. Several such symbol expressions will require that all such implied pairs are equal in order for the join condition to be met.

  3. Finally, a string expression can be given that contains an arbitrary ruby expression that will be evaluated for truthiness. Within the string, all column names must be disambiguated with the '_a' or '_b' modifiers whether they are common to both tables or not. The names of the columns in both tables (without the leading ':' for symbols) are available as variables within the expression.

The join_type parameter specifies what sort of join is performed, :inner, :left, :right, :full, or :cross. The default is an :inner join. The types of joins are defined as follows where T1 means this table, the receiver, and T2 means other. These descriptions are taken from the Postgresql documentation.

:inner:: For each row R1 of T1, the joined table has a row for each row in T2 that satisfies the join condition with R1.

:left:: First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.

:right:: First, an inner join is performed. Then, for each row in T2 that does not satisfy the join condition with any row in T1, a joined row is added with null values in columns of T1. This is the converse of a left join: the result table will always have a row for each row in T2.

:full:: First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

:cross:: For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the joined table will have N * M rows.

Any groups present in either Table are eliminated in the output Table. See the README for examples.

Parameters:

Returns:



1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
# File 'lib/fat_table/table.rb', line 1273

def join(other, *exps, join_type: :inner)
  unless other.is_a?(Table)
    raise UserError, 'need other table as first argument to join'
  end
  unless JOIN_TYPES.include?(join_type)
    raise UserError, "join_type may only be: #{JOIN_TYPES.join(', ')}"
  end

  # These may be needed for outer joins.
  self_row_nils = headers.map { |h| [h, nil] }.to_h
  other_row_nils = other.headers.map { |h| [h, nil] }.to_h
  join_exp, other_common_heads =
    build_join_expression(exps, other, join_type)
  ev = Evaluator.new
  result = empty_dup
  other_rows = other.rows
  other_row_matches = Array.new(other_rows.size, false)
  rows.each do |self_row|
    self_row_matched = false
    other_rows.each_with_index do |other_row, k|
      # Same as other_row, but with keys that are common with self and equal
      # in value, removed, so the output table need not repeat them.
      locals = build_locals_hash(row_a: self_row, row_b: other_row)
      matches = ev.evaluate(join_exp, locals: locals)
      next unless matches

      self_row_matched = other_row_matches[k] = true
      out_row = build_out_row(row_a: self_row, row_b: other_row,
                              common_heads: other_common_heads,
                              type: join_type)
      result << out_row
    end
    next unless [:left, :full].include?(join_type)
    next if self_row_matched

    result << build_out_row(row_a: self_row,
                            row_b: other_row_nils,
                            type: join_type)
  end
  if [:right, :full].include?(join_type)
    other_rows.each_with_index do |other_row, k|
      next if other_row_matches[k]

      result << build_out_row(row_a: self_row_nils,
                              row_b: other_row,
                              type: join_type)
    end
  end
  result.normalize_boundaries
  result
end

#left_join(other, *exps) ⇒ Object

Perform a left join as described in FatTable::Table#join.



1335
1336
1337
# File 'lib/fat_table/table.rb', line 1335

def left_join(other, *exps)
  join(other, *exps, join_type: :left)
end

#mark_boundary(row_num = nil) ⇒ Object

Mark a group boundary at row +row+, and if +row+ is +nil+, mark the last row in the table as a group boundary. An attempt to add a boundary to an empty table has no effect. We adopt the convention that the last row of the table always marks an implicit boundary even if it is not in the be an explicit boundary, even if it marks the last row of the table.



720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
# File 'lib/fat_table/table.rb', line 720

def mark_boundary(row_num = nil)
  return self if empty?

  if row_num
    unless row_num < size
      raise ArgumentError, "can't mark boundary at row #{row_num}, last row is #{size - 1}"
    end
    unless row_num >= 0
      raise ArgumentError, "can't mark boundary at non-positive row #{row_num}"
    end
    explicit_boundaries.push(row_num)
  elsif size > 0
    explicit_boundaries.push(size - 1)
  end
  normalize_boundaries
  self
end

#normalize_boundariesObject

Make sure size - 1 is last boundary and that they are unique and sorted.



741
742
743
744
745
746
747
# File 'lib/fat_table/table.rb', line 741

def normalize_boundaries
  unless empty?
    self.explicit_boundaries = explicit_boundaries.uniq.sort
  end
  explicit_boundaries
  self
end

#number_of_groupsObject

Return the number of groups in the table.



667
668
669
# File 'lib/fat_table/table.rb', line 667

def number_of_groups
  empty? ? 0 : boundaries.size
end

#order_by(*sort_heads) ⇒ Object

Return a new Table sorting the rows of this Table on the possibly multiple keys given in +sort_heads+ as an Array of Symbols. Append a ! to the symbol name to indicate reverse sorting on that column.

tab.order_by(:ref, :date) => sorted table tab.order_by(:date!) => reverse sort on :date

After sorting, the output Table will have group boundaries added after each row where the sort key changes.



827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
# File 'lib/fat_table/table.rb', line 827

def order_by(*sort_heads)
  # Sort the rows in order and add to new_rows.
  key_hash = partition_sort_keys(sort_heads)
  new_rows = rows.sort do |r1, r2|
    # Set the sort keys based on direction
    key1 = []
    key2 = []
    key_hash.each_pair do |h, dir|
      if dir == :forward
        key1 << r1[h]
        key2 << r2[h]
      else
        key1 << r2[h]
        key2 << r1[h]
      end
    end
    # Make any booleans comparable with <=>
    key1 = key1.map_booleans
    key2 = key2.map_booleans

    # If there are any nils, <=> will return nil, and we have to use the
    # special comparison method, compare_with_nils, instead.
    result = (key1 <=> key2)
    result.nil? ? compare_with_nils(key1, key2) : result
  end

  # Add the new_rows to the table, but mark a group boundary at the points
  # where the sort key changes value.  NB: I use self.class.new here
  # rather than Table.new because if this class is inherited, I want the
  # new_tab to be an instance of the subclass.  With Table.new, this
  # method's result will be an instance of FatTable::Table rather than of
  # the subclass.
  new_tab = empty_dup
  last_key = nil
  new_rows.each_with_index do |nrow, k|
    new_tab << nrow
    key = nrow.fetch_values(*key_hash.keys)
    new_tab.mark_boundary(k - 1) if last_key && key != last_key
    last_key = key
  end
  new_tab.normalize_boundaries
  new_tab
end

#order_with(expr) ⇒ Object

Return a new Table sorting the rows of this Table on an any expression +expr+ that is valid with the +select+ method, except that they expression may end with an exclamation mark +!+ to indicate a reverse sort. The new table will have an additional column called +sort_key+ populated with the result of evaluating the given expression and will be sorted (or reverse sorted) on that column.

tab.order_with('date.year') => table sorted by date's year tab.order_with('date.year!') => table reverse sorted by date's year

After sorting, the output Table will have group boundaries added after each row where the sort key changes.



885
886
887
888
889
890
891
892
893
894
895
896
897
# File 'lib/fat_table/table.rb', line 885

def order_with(expr)
  unless expr.is_a?(String)
    raise "must call FatTable::Table\#order_with with a single string expression"
  end

  rev = false
  if expr.match?(/\s*!\s*\z/)
    rev = true
    expr = expr.sub(/\s*!\s*\z/, '')
  end
  sort_sym = rev ? :sort_key! : :sort_key
  dup.select(*headers, sort_key: expr).order_by(sort_sym)
end

#right_join(other, *exps) ⇒ Object

Perform a right join as described in FatTable::Table#join.



1342
1343
1344
# File 'lib/fat_table/table.rb', line 1342

def right_join(other, *exps)
  join(other, *exps, join_type: :right)
end

#rowsObject

Return the rows of the Table as an Array of Hashes, keyed by the headers.



559
560
561
562
563
564
565
566
567
568
569
570
571
# File 'lib/fat_table/table.rb', line 559

def rows
  rows = []
  unless columns.empty?
    0.upto(columns.first.items.last_i) do |rnum|
      row = {}
      columns.each do |col|
        row[col.header] = col[rnum]
      end
      rows << row
    end
  end
  rows
end

#select(*cols, **new_cols) ⇒ Object

:category: Operators

Return a Table having the selected column expressions. Each expression can be either a

  1. in +cols+, a symbol, +:old_col+, representing a column in the current table,

  2. a hash in +new_cols+ of the form +new_col: :old_col+ to rename an existing +:old_col+ column as +:new_col+, or

  3. a hash in +new_cols+ of the form +new_col: 'expression'+, to add a new column +new_col+ that is computed as an arbitrary ruby expression in which there are local variables bound to the names of existing columns (whether selected for the output table or not) as well as any +new_col+ defined earlier in the argument list. The expression string can also access the instance variable @row, as the row number of the row being evaluated, and @group, as the group number of the row being evaluated.

  4. a hash in +new_cols+ with one of the special keys, +ivars: hash+, +before_hook: 'ruby-code'+, or +after_hook: 'ruby-code'+ for defining custom instance variables to be used during evaluation of parameters described in point 3 and hooks of ruby code snippets to be evaluated before and after processing each row.

The bare symbol arguments +cols+ (1) must precede any hash arguments +new_cols+ (2 or 3). Each expression results in a column in the resulting Table in the order given in the argument list. The expressions are evaluated in left-to-right order as well. The output table preserves any groups present in the input table.

tab.select(:ref, :date, :shares) => table with only 3 columns selected tab.select(:ref, :date, shares: :quantity) => rename :shares->:quantity tab.select(:ref, :date, :shares, cost: 'price * shares') => new column tab.select(:ref, :date, :shares, seq: '@row') => add sequential nums

The instance variables and hooks mentioned in point 4 above allow you to keep track of things that cross row boundaries, such as running sums or the values of columns before or after construction of the new row. You can define instance variables other than the default @row and @group variables to be available when evaluating normal string expressions for constructing a new row.

You define custom instance variables by passing a Hash to the ivars parameter. The names of the instance variables will be the keys and their initial values will be the values. For example, you can keep track of a running sum of the cost of shares and the number of shares in the prior row by adding two custom instance variables and the appropriate hooks:

tab.select(:ref, :date, :shares, :price, cost: 'shares * price', cumulative_cost: '@total_cost' ivars: { total_cost: 0, prior_shares: 0}, before_hook: '@total_cost += shares * price, after_hook: '@prior_shares = shares')

Notice that in the +ivars:+ parameter, the '@' is not prefixed to the name since it is a symbol, but must be prefixed when the instance variable is referenced in an expression, otherwise it would be interpreted as a column name. You could include the '@' if you use a string as a key, e.g., +{ '@total_cost' => 0 }+ The ivars values are evaluated once, before the first row is processed with the select statement.

For each row, the +before_hook+ is evaluated, then the +new_cols+ expressions for setting the new value of columns, then the +after_hook+ is evaluated.

In the before_hook, the values of all columns are available as local variables as they were before processing the row. The values of all instance variables are available as well with the values they had after processing the prior row of the table.

In the string expressions for new columns, all the instance variables are available with the values they have after the before_hook is evaluated. You could also modify instance variables in the new_cols expression, but remember, they are evaluated once for each new column expression. Also, the new column is assigned the value of the entire expression, so you must ensure that the last expression is the one you want assigned to the new column. You might want to use a semicolon: +cost: '@total_cost += shares * price; shares * price'

In the after_hook, the new, updated values of all columns, old and new are available as local variables, and the instance variables are available with the values they had after executing the before_hook.



982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
# File 'lib/fat_table/table.rb', line 982

def select(*cols, **new_cols)
  # Set up the Evaluator
  ivars = { row: 0, group: 0 }
  if new_cols.key?(:ivars)
    ivars = ivars.merge(new_cols[:ivars])
    new_cols.delete(:ivars)
  end
  if new_cols.key?(:before_hook)
    before_hook = new_cols[:before_hook].to_s
    new_cols.delete(:before_hook)
  end
  after_hook = nil
  if new_cols.key?(:after_hook)
    after_hook = new_cols[:after_hook].to_s
    new_cols.delete(:after_hook)
  end
  ev = Evaluator.new(ivars: ivars,
                     before: before_hook,
                     after: after_hook)
  # Compute the new Table from this Table
  result_cols =
    if cols.include?(:omni)
      (headers + new_cols.keys - [:omni])
    else
      (cols + new_cols.keys)
    end
  result = empty_dup(result_cols)
  normalize_boundaries

  rows.each_with_index do |old_row, old_k|
    # Set the group number in the before hook and run the hook with the
    # local variables set to the row before the new row is evaluated.
    grp = row_index_to_group_index(old_k)
    ev.update_ivars(row: old_k + 1, group: grp)
    ev.eval_before_hook(locals: old_row)
    # Compute the new row.
    new_row = {}
    # Allow the :omni col to stand for all columns if it is alone and
    # first.
    cols_to_include =
      if cols.size == 1 && cols.first.as_sym == :omni
        headers
      else
        cols
      end
    cols_to_include.each do |k|
      h = k.as_sym
      msg = "Column '#{h}' in select does not exist"
      raise UserError, msg unless column?(h)

      new_row[h] = old_row[h]
    end
    new_cols.each_pair do |key, expr|
      key = key.as_sym
      vars = old_row.merge(new_row)
      case expr
      when Symbol
        msg = "select column '#{expr}' does not exist"
        raise UserError, msg unless vars.key?(expr)

        new_row[key] = vars[expr]
      when String
        if expr.match?(/\A\s*:/)
          # Leading colon signal a literal string
          new_row[key] = expr.sub(/\A\s*:/, '')
        else
          # Otherwise, evaluate the string.
          new_row[key] = ev.evaluate(expr, locals: vars)
        end
      when Numeric, DateTime, Date, TrueClass, FalseClass
        new_row[key] = expr
      else
        msg = "select can't set column at '#{key}' to '#{expr}' of class #{expr.class}"
        raise UserError, msg
      end
    end
    ev.eval_after_hook(locals: new_row)
    result << new_row
  end
  result.explicit_boundaries = explicit_boundaries
  result.normalize_boundaries
  result
end

#sizeObject

Return the number of rows in the Table.



534
535
536
537
538
# File 'lib/fat_table/table.rb', line 534

def size
  return 0 if columns.empty?

  columns.first.size
end

#to_any(fmt_type, options = {}) ⇒ Object

Return a string or ruby object according to the format type +fmt_type+ given in the first argument, passing the +options+ on to the Formatter. Valid format types are :psv, :aoa, :aoh, :latex, :org, :term, :text, or their string equivalents. If a block is given, it will yield a Formatter of the appropriate type to which format and footers can be applied. Otherwise, the default format for the type will be used.

:call-seq: to_any(fmt_type, options = {}) { |fmt| ... }

Raises:



1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
# File 'lib/fat_table/table.rb', line 1650

def to_any(fmt_type, options = {})
  fmt = fmt_type.as_sym
  msg = "unknown format '#{fmt}'"
  raise UserError, msg unless FatTable::FORMATS.include?(fmt)

  method = "to_#{fmt}"
  if block_given?
    send(method, options, &Proc.new)
  else
    send(method, options)
  end
end

#to_aoa(options = {}) {|fmt| ... } ⇒ Object

Return the table as an Array of Array of Strings, passing the +options+ on to the AoaFormatter. If no block is given, default formatting is applies to the table's cells. If a block is given, it yields an AoaFormatter to the block to which formatting instructions and footers can be added by calling methods on it.

Yields:

  • (fmt)


1685
1686
1687
1688
1689
# File 'lib/fat_table/table.rb', line 1685

def to_aoa(options = {})
  fmt = FatTable::AoaFormatter.new(self, **options)
  yield fmt if block_given?
  fmt.output
end

#to_aoh(options = {}) {|fmt| ... } ⇒ Object

Return the table as an Array of Hashes, passing the +options+ on to the AohFormatter. Each inner hash uses the Table's columns as keys and it values are strings representing the cells of the table. If no block is given, default formatting is applies to the table's cells. If a block is given, it yields an AohFormatter to the block to which formatting instructions and footers can be added by calling methods on it.

Yields:

  • (fmt)


1699
1700
1701
1702
1703
# File 'lib/fat_table/table.rb', line 1699

def to_aoh(options = {})
  fmt = AohFormatter.new(self, **options)
  yield fmt if block_given?
  fmt.output
end

#to_format(options = {}) ⇒ Object

Return a string or ruby object according to the format specified in FatTable.format, passing the +options+ on to the Formatter. If a block is given, it will yield a Formatter of the appropriate type to which format and footers can be applied. Otherwise, the default format for the type will be used.

:call-seq: to_format(options = {}) { |fmt| ... }



1631
1632
1633
1634
1635
1636
1637
# File 'lib/fat_table/table.rb', line 1631

def to_format(options = {})
  if block_given?
    to_any(FatTable.format, self, options, &Proc.new)
  else
    to_any(FatTable.format, self, options)
  end
end

#to_latex(options = {}) {|fmt| ... } ⇒ Object

Return the table as a string containing a LaTeX table, passing the +options+ on to the LaTeXFormatter. If no block is given, default formatting applies to the table's cells. If a block is given, it yields a LaTeXFormatter to the block to which formatting instructions and footers can be added by calling methods on it.

Yields:

  • (fmt)


1712
1713
1714
1715
1716
# File 'lib/fat_table/table.rb', line 1712

def to_latex(options = {})
  fmt = LaTeXFormatter.new(self, **options)
  yield fmt if block_given?
  fmt.output
end

#to_org(options = {}) {|fmt| ... } ⇒ Object

Return the table as a string containing an Emacs org-mode table, passing the +options+ on to the OrgFormatter. If no block is given, default formatting applies to the table's cells. If a block is given, it yields a OrgFormatter to the block to which formatting instructions and footers can be added by calling methods on it.

Yields:

  • (fmt)


1725
1726
1727
1728
1729
# File 'lib/fat_table/table.rb', line 1725

def to_org(options = {})
  fmt = OrgFormatter.new(self, **options)
  yield fmt if block_given?
  fmt.output
end

#to_psv(options = {}) {|fmt| ... } ⇒ Object

Return the table as a string formatted as a pipe-separated values, passing the +options+ on to the Formatter. If no block is given, default formatting is applies to the table's cells. If a block is given, it yields a Formatter to the block to which formatting instructions and footers can be added by calling methods on it. Since the pipe-separated format is the default format for Formatter, there is no class PsvFormatter as you might expect.

Yields:

  • (fmt)


1672
1673
1674
1675
1676
# File 'lib/fat_table/table.rb', line 1672

def to_psv(options = {})
  fmt = Formatter.new(self, **options)
  yield fmt if block_given?
  fmt.output
end

#to_term(options = {}) {|fmt| ... } ⇒ Object

Return the table as a string containing ANSI terminal text representing table, passing the +options+ on to the TermFormatter. If no block is given, default formatting applies to the table's cells. If a block is given, it yields a TermFormatter to the block to which formatting instructions and footers can be added by calling methods on it.

Yields:

  • (fmt)


1738
1739
1740
1741
1742
# File 'lib/fat_table/table.rb', line 1738

def to_term(options = {})
  fmt = TermFormatter.new(self, **options)
  yield fmt if block_given?
  fmt.output
end

#to_text(options = {}) {|fmt| ... } ⇒ String

Return the table as a string containing ordinary text representing table, passing the +options+ on to the TextFormatter. If no block is given, default formatting applies to the table's cells. If a block is given, it yields a TextFormatter to the block to which formatting instructions and footers can be added by calling methods on it.

Yields:

  • (fmt)

Returns:



1752
1753
1754
1755
1756
# File 'lib/fat_table/table.rb', line 1752

def to_text(options = {})
  fmt = TextFormatter.new(self, **options)
  yield fmt if block_given?
  fmt.output
end

#tolerant_col?(h) ⇒ Boolean

Return whether the column with the given head is supposed to be tolerant. We can't just look up the Column because it may not be build yet, as when we do a row-by-row add.

Returns:

  • (Boolean)


527
528
529
# File 'lib/fat_table/table.rb', line 527

def tolerant_col?(h)
  tolerant_cols.include?(h.to_s.as_sym) || self.omni_tol
end

#type(key) ⇒ Object

Return the type of the Column with the given +key+ as its header as a String.



441
442
443
# File 'lib/fat_table/table.rb', line 441

def type(key)
  column(key).type
end

#typesObject

Return the type of the Column with the given +key+ as its header as a String.



447
448
449
450
451
452
453
# File 'lib/fat_table/table.rb', line 447

def types
  result = {}
  headers.each do |h|
    result[h] = type(h)
  end
  result
end

#union(other) ⇒ Object

Return a Table that combines this table with +other+ table, i.e., return the union of this table with the other. The headers of this table are used in the result. There must be the same number of columns of the same type in the two tables, otherwise an exception will be raised. Duplicates are eliminated from the result. Any groups present in either Table are eliminated in the output Table.



1121
1122
1123
1124
1125
# File 'lib/fat_table/table.rb', line 1121

def union(other)
  set_operation(other, :+,
                distinct: true,
                add_boundaries: true)
end

#union_all(other) ⇒ Object

Return a Table that combines this table with +other+ table. In other words, return the union of this table with the other. The headers of this table are used in the result. There must be the same number of columns of the same type in the two tables, or an exception will be thrown. Duplicates are not eliminated from the result. Adds group boundaries at boundaries of the constituent tables. Preserves and adjusts the group boundaries of the constituent table.



1136
1137
1138
1139
1140
1141
1142
1143
1144
# File 'lib/fat_table/table.rb', line 1136

def union_all(other)
  set_operation(
    other,
    :+,
    distinct: false,
    add_boundaries: true,
    inherit_boundaries: true
  )
end

#uniqObject

Return this table with all duplicate rows eliminated. Resets groups. Same as #distinct.



1109
1110
1111
# File 'lib/fat_table/table.rb', line 1109

def uniq
  distinct
end

#where(expr) ⇒ Object

Return a Table containing only rows for which the Ruby where expression, +exp+, evaluates to a truthy value. Within the string expression +exp+, each header is a local variable bound to the value of the current row in that column, and the instance variables @row and @group are available as the row and group number of the row being evaluated. Any groups present in the input Table are eliminated in the output Table.

tab.where('date > Date.today - 30') => rows with recent dates tab.where('@row.even? && shares > 500') => even rows with lots of shares



1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
# File 'lib/fat_table/table.rb', line 1077

def where(expr)
  expr = expr.to_s
  result = empty_dup
  ev = Evaluator.new(ivars: { row: 0, group: 0 })
  rows.each_with_index do |row, k|
    grp = row_index_to_group_index(k)
    ev.update_ivars(row: k + 1, group: grp)
    ev.eval_before_hook(locals: row)
    result << row if ev.evaluate(expr, locals: row)
    ev.eval_after_hook(locals: row)
  end
  result.normalize_boundaries
  result
end

#widthObject

Return the number of Columns in the Table.



543
544
545
546
547
# File 'lib/fat_table/table.rb', line 543

def width
  return 0 if columns.empty?

  columns.size
end