Class: Table

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/tablestakes.rb

Overview

This class is a Ruby representation of a table. All data is captured as type String by default. Columns are referred to by their String headers which are assumed to be identified in the first row of the input file. Output is written by default to tab-delimited files with the first row serving as the header names.
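
The storage model described above can be pictured with plain Ruby structures. This is only a sketch: the `headers` and `columns` variables are hypothetical stand-ins chosen to mirror the `@headers` and `@table` instance variables that appear in the source listings on this page.

```ruby
# A minimal sketch of column-oriented storage (variable names are illustrative).
rows = [["City", "State"], ["New York", "NY"], ["Dallas", "TX"]]

headers = rows.first                                   # the first row names the columns
columns = headers.each_with_object({}) { |h, acc| acc[h] = [] }

# Every later row contributes one String value to each column.
rows.drop(1).each do |row|
  headers.each_with_index { |h, i| columns[h] << row[i].to_s }
end

# A row is rebuilt by reading the same index from every column.
second_row = headers.map { |h| columns[h][1] }
```

Keeping one Array per header makes column lookups a single Hash access, at the cost of rebuilding rows on demand.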

Constructor Details

#initialize(input = nil) ⇒ Table

Instantiate a Table object from an Array of rows, a tab-delimited file, or another Table. With no argument, an empty Table is created.

Attributes

input

OPTIONAL Array of rows, String naming the tab-delimited file to read, or an existing Table to copy

Examples

cities = Table.new() # empty table
cities = Table.new([ ["City", "State"], ["New York", "NY"], ["Dallas", "TX"] ]) # create from Array of rows
cities = Table.new("cities.txt") # read from file
cities = Table.new(capitals)  # create from table


# File 'lib/tablestakes.rb', line 43

def initialize(input=nil)
  @headers = []
  @table = {}
  @indices = {}
  
  if input.respond_to?(:fetch)
    if input[0].respond_to?(:fetch)
      #create Table from rows
      add_rows(input)
    end
  elsif input.respond_to?(:upcase)
    # a string, then read_file
    read_file(input)
  elsif input.respond_to?(:headers)
    input.each {|row| add_row(row) }
  end
  # else create empty +Table+
end
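
The constructor chooses a code path by asking what the input can do rather than what class it is. The sketch below isolates that dispatch; `input_kind` is a hypothetical helper written for illustration and is not part of tablestakes.

```ruby
# Sketch of the duck-typed dispatch used by the constructor listing.
# `input_kind` is a hypothetical helper, not part of tablestakes.
def input_kind(input)
  if input.respond_to?(:fetch)            # Array-like input
    input[0].respond_to?(:fetch) ? :rows : :ignored
  elsif input.respond_to?(:upcase)        # String-like input, treated as a filename
    :filename
  elsif input.respond_to?(:headers)       # Table-like input
    :table
  else
    :empty
  end
end

nested = input_kind([["City", "State"], ["Austin", "TX"]])
flat   = input_kind(["City", "State"])
name   = input_kind("cities.txt")
```

Under this reading of the listing, a flat Array such as `["City", "State"]` matches the first branch but not the nested check, so it appears to produce an empty Table without raising an error.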

Instance Attribute Details

#headers ⇒ Object (readonly)

The headers attribute contains the table headers used to reference columns in the Table. All headers are represented as String types.



# File 'lib/tablestakes.rb', line 24

def headers
  @headers
end

Instance Method Details

#add_column(*args) ⇒ Object

Add a column to the Table. Raises ArgumentError if the column name is already taken or the number of values does not match the length of the existing columns.

Attributes

args

The column name (String) followed by the column values, passed either as separate arguments or as Arrays (see examples)

Examples

cities.add_column("City", ["New York", "Dallas", "San Francisco"])
cities.add_column(["City","New York", "Dallas", "San Francisco"])
cities.add_column("City", "New York", "Dallas", "San Francisco")

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 114

def add_column(*args)
  if args.kind_of? Array
    args.flatten!
    colname = args.shift
    column_vals = args
  else
    raise ArgumentError, "Invalid Arguments to add_column"
  end
  # check arguments
  raise ArgumentError, "Duplicate Column Name!" if @table.has_key?(colname)
  unless self.empty?
    if column_vals.length != @table[@headers.first].length
      raise ArgumentError, "Number of elements in column does not match existing table"
    end
  end
  append_col(colname, column_vals)    
end
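
The three call forms in the examples work because the splat collects every argument into one Array, which is then flattened. A minimal sketch of that normalization follows; `split_column_args` is a hypothetical helper, not part of tablestakes.

```ruby
# Sketch: how a splat plus flatten normalizes the three call forms.
# `split_column_args` is a hypothetical helper, not part of tablestakes.
def split_column_args(*args)
  flat = args.flatten            # nested and flat argument lists become one Array
  [flat.first, flat.drop(1)]     # column name first, then the column values
end

a = split_column_args("City", ["New York", "Dallas"])
b = split_column_args(["City", "New York", "Dallas"])
c = split_column_args("City", "New York", "Dallas")
```

All three calls yield the same name and values. One consequence of flattening is that a value which is itself an Array cannot be stored as a single cell.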

#add_row(*row) ⇒ Object Also known as: <<

Add a row to the Table, appending it to the end. If the Table has no headers yet, the row becomes the headers. Raises ArgumentError if the number of values does not match the number of headers.

Attributes

row

Array to hold the row values

Examples

cities = Table.new.add_row( ["City", "State"] ) # create new Table with headers
cities.add_row("New York", "NY") # add data row to Table


# File 'lib/tablestakes.rb', line 158

def add_row(*row)
  if row.kind_of? Array
    row = row.flatten
  else
    raise ArgumentError, "Invalid Arguments to add_row"
  end
  if @headers.empty?
      @headers = row
  else
    unless row.length == @headers.length
      raise ArgumentError, "Wrong number of fields in Table input"
    end
    append_row(row)
  end
  return self
end

#add_rows(array_of_rows) ⇒ Object

Add one or more rows to the Table, appending them to the end. Raises ArgumentError if a row does not have the correct number of values. The first row becomes the table headers if they are currently undefined.

Attributes

array_of_rows

Array of Arrays holding the row values

Examples

cities.add_rows([ ["New York", "NY"], ["Austin", "TX"] ])


# File 'lib/tablestakes.rb', line 141

def add_rows(array_of_rows)
  array_of_rows.each do |r|
    add_row(r.clone)
  end
  return self
end

#bottom(colname, num = 1) ⇒ Object

Returns counts of the least frequent values found in a given column in the form of a Table. Raises ArgumentError if the column is not found. If no limit is given to the number of values, only the least frequent value will be returned.

Attributes

colname

String to identify the column to count

num

OPTIONAL Integer number of values to return

Examples

cities.bottom("State")  # returns a Table with the least frequent state in the cities Table
cities.bottom("State", 10)  # returns a Table with the 10 least frequent states in the cities Table


# File 'lib/tablestakes.rb', line 320

def bottom(colname, num=1)
  freq = tally(colname).to_a[1..-1].sort_by {|k,v| v }
  return Table.new(freq[0..num-1].unshift([colname,"Count"]))
end
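
The selection step can be illustrated on plain value and count pairs. This is a sketch with made-up data; the `pairs` Array stands in for the data rows of a tally result and is an assumption, not output from the library.

```ruby
# Sketch: picking the least frequent values from [value, count] pairs.
# The pairs below are illustrative stand-ins for the rows of a tally result.
pairs = [["NY", 3], ["TX", 1], ["CA", 2]]

ascending  = pairs.sort_by { |_value, count| count }   # rarest first
bottom_two = ascending.first(2)
top_one    = ascending.reverse.first(1)                # reversing gives the most frequent
```

Because `sort_by` makes no promise about the relative order of equal counts, which of several tied values is returned should not be relied on.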

#column(colname) ⇒ Object

Return a copy of a column from the table, identified by column name. Returns empty Array if column name not found.

Attributes

colname

String to identify the name of the column



# File 'lib/tablestakes.rb', line 80

def column(colname)
  # return empty Array if column name not found
  unless @table.has_key?(colname) 
    Array.new()
  else
    Array(@table[colname])
  end
end

#count(colname = nil, value = nil) ⇒ Object Also known as: size, length

Counts the number of instances of a particular string, given a column name, and returns an integer >= 0. Raises ArgumentError if the column is not found. If either parameter is omitted, returns the number of rows in the table.

Attributes

colname

OPTIONAL String to identify the column to count

value

OPTIONAL String value to count

Examples

cities.count  # returns number of rows in cities Table
cities.size   # same as cities.count
cities.length # same as cities.count
cities.count("State", "NY")  # returns the number of rows with State == "NY"

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 266

def count(colname=nil, value=nil)
  if colname.nil? || value.nil?
    if @table.size > 0
      @table.each_key {|e| return @table.fetch(e).length }
    else
      return 0
    end
  end
  raise ArgumentError, "Invalid column name" unless @headers.include?(colname)
  
  if @table[colname]
    result = 0
    @table[colname].each do |val|
      val == value.to_s ? result += 1 : nil 
    end
    result
  else
    nil 
  end
end

#del_column(colname) ⇒ Object

Delete a column from the Table. Raises ArgumentError if the column name does not exist.

Attributes

colname

String to identify the name of the column

Examples

cities.del_column("State") # returns table without "State" column

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 184

def del_column(colname)
  # check arguments
  raise ArgumentError, "Column name does not exist!" unless @table.has_key?(colname)
  
  @headers.delete(colname)
  @table.delete(colname)
  return self
end

#del_row(rownum) ⇒ Object

Delete a row from the Table. Raises ArgumentError if the Table is empty or the row number is beyond the last row.

Attributes

rownum

Integer row index (zero-based; negative values count from the end)

Examples

cities.del_row(3)  # deletes row with index 3 (4th row)
cities.del_row(-1) # deletes last row (per Ruby convention)


# File 'lib/tablestakes.rb', line 202

def del_row(rownum)
  # check arguments
  if self.empty? || rownum >= @table[@headers.first].length
    raise ArgumentError, "Row number does not exist!" 
  end
  @headers.each do |col|
    @table[col].delete_at(rownum)
  end
  return self
end
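
The negative-index convention comes from `Array#delete_at`, which the listing calls once per column. The sketch below shows that behavior on a single illustrative column.

```ruby
# Sketch: how Array#delete_at treats the indices that del_row passes through.
column = ["New York", "Dallas", "Austin"]

removed_last = column.delete_at(-1)    # a negative index counts from the end
removed_none = column.delete_at(-10)   # out of range: returns nil, nothing is removed
```

Going by the listing, only indices at or beyond the row count raise ArgumentError, so a large negative index such as `-10` would pass the check and leave the Table unchanged.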

#each ⇒ Object

Defines an iterator for Table which produces rows of data (headers omitted) for its calling block.



# File 'lib/tablestakes.rb', line 65

def each
  @table[@headers.first].each_index do |index|
    nextrow = []
    @headers.each do |col|
      nextrow << @table[col][index].clone
    end
    yield nextrow
  end
end
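
Because the class includes Enumerable, defining `each` is what makes methods such as `map`, `find`, and `first` available. The sketch below uses `TinyTable`, a hypothetical stand-in class written for this example, not the real Table.

```ruby
# Sketch: a tiny stand-in class (hypothetical, not the real Table) showing
# what including Enumerable provides once #each yields rows.
class TinyTable
  include Enumerable

  def initialize(headers, rows)
    @headers = headers
    @rows = rows
  end

  # Yield a copy of each data row; the header row is not yielded.
  def each
    return to_enum(:each) unless block_given?
    @rows.each { |row| yield row.dup }
  end
end

t = TinyTable.new(["City", "State"], [["New York", "NY"], ["Dallas", "TX"]])
city_names = t.map { |row| row[0] }
first_row  = t.first
texan      = t.find { |row| row[1] == "TX" }
```

Methods that the class defines itself, such as `count`, `select`, and `sort` on this page, take precedence over the Enumerable versions of the same name.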

#empty? ⇒ Boolean

Return true if the Table is empty, false otherwise.

Returns:

  • (Boolean)


# File 'lib/tablestakes.rb', line 100

def empty?
  @headers.length == 0 && @table.length == 0
end

#intersect(table2, colname, col2name = colname) ⇒ Object

Return an Array with the intersection of columns from different tables, eliminating duplicates. Raises ArgumentError if a column is not found.

Attributes

table2

Table to identify the secondary table in the intersection

colname

String to identify the column to intersect

col2name

OPTIONAL String to identify the column in the second table to intersect

Examples

cities.intersect(capitals, "City", "Capital")  # returns Array with all capitals that are also in the cities table

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 530

def intersect(table2, colname, col2name=colname)
  # check arguments
  raise ArgumentError, "Invalid table!" unless table2.is_a?(Table)
  raise ArgumentError, "Invalid column name" unless @table.has_key?(colname)
  raise ArgumentError, "Invalid column name" unless table2.headers.include?(col2name)

  return self.column(colname) & table2.column(col2name)
end
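
The listing relies on Ruby's Array set operators. A short sketch on illustrative column data shows `&`, together with its counterpart `|`, which the union method on this page uses.

```ruby
# Sketch: Array set operators applied to two illustrative columns.
city_column    = ["New York", "Austin", "Albany", "Austin"]
capital_column = ["Albany", "Austin", "Sacramento"]

common = city_column & capital_column   # values present in both, duplicates removed
either = city_column | capital_column   # values present in either, duplicates removed
```

Both operators keep the order in which values first appear in the left-hand Array.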

#join(table2, colname, col2name = colname) ⇒ Object

Given a second table to join against, and a field/column, return a Table which contains a join of the two tables. Join only lists the common column once, under the column name of the first table (if different from the name of the second). All columns from both tables are returned. Raises ArgumentError if a column is not found; returns nil if no rows match.

Attributes

table2

Table to identify the secondary table in the join

colname

String to identify the column to join on

col2name

OPTIONAL String to identify the column in the second table to join on

Examples

cities.join(capitals, "City", "Capital")  # returns a Table of cities that are also state capitals
capitals.join(cities, "State")  # returns a Table of capital cities with population info from the cities table

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 427

def join(table2, colname, col2name=colname)
  # check arguments
  raise ArgumentError, "Invalid table!" unless table2.is_a?(Table)
  raise ArgumentError, "Invalid column name" unless @table.has_key?(colname)
  raise ArgumentError, "Invalid column name" unless table2.headers.include?(col2name)
  t2_col_index = table2.headers.index(col2name)
  
  dedupe_headers(table2, colname)

  result = [ Array(@headers) + Array(table2.headers) ]
  @table[colname].each_index do |index|
    t2_index = table2.column(col2name).find_index(@table[colname][index])
    unless t2_index.nil?
      result << self.row(index) + table2.row(t2_index)
    end
  end
  if result.length == 1 #no rows selected
    return nil
  else
    return Table.new(result) 
  end
end
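
The general shape of this kind of join can be sketched on plain Arrays of rows. `inner_join` below is a hypothetical helper and the data is made up; it is not the library's implementation, and how the library's private `dedupe_headers` helper treats the shared column is not shown on this page.

```ruby
# Sketch of an inner join on plain Arrays of rows (header row first).
# `inner_join` is a hypothetical helper; it drops the second table's key
# column so the shared column appears once in the result.
def inner_join(left, right, left_key, right_key = left_key)
  li = left.first.index(left_key)
  ri = right.first.index(right_key)
  raise ArgumentError, "Invalid column name" if li.nil? || ri.nil?

  drop_key = ->(row) { row.reject.with_index { |_v, i| i == ri } }
  result = [left.first + drop_key.call(right.first)]
  left.drop(1).each do |row|
    match = right.drop(1).find { |r| r[ri] == row[li] }   # first match only
    result << row + drop_key.call(match) if match
  end
  result
end

cities   = [["City", "Rank"], ["Austin", "1"], ["Dallas", "2"]]
capitals = [["Capital", "State"], ["Austin", "TX"], ["Albany", "NY"]]
joined   = inner_join(cities, capitals, "City", "Capital")
```

Like the listing, the sketch pairs each row of the first table with only the first matching row of the second.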

#row(index) ⇒ Object

Return a copy of a row from the table as an Array, given an index (i.e. row number). Returns empty Array if the index is out of bounds.

Attributes

index

Integer indicating the index of the row.



# File 'lib/tablestakes.rb', line 94

def row(index)    
  Array(get_row(index))
end

#select(*columns) ⇒ Object Also known as: get_columns

Select columns from the table, given one or more column names. Returns an instance of Table with the results. Raises ArgumentError if any column is not valid.

Attributes

columns

Variable String arguments to identify the columns to select

Examples

cities.select("City", "State")  # returns a Table of "City" and "State" columns
cities.select(cities.headers)  # returns a new Table that is a duplicate of cities

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 357

def select(*columns)
  # check arguments
  raise ArgumentError, "Invalid column name(s)" unless columns
  columns.kind_of?(Array) ? columns.flatten! : nil
  columns.each do |c|
    raise ArgumentError, "Invalid column name" unless @table.has_key?(c)
  end

  result = []
  result_headers = []
  columns.each { |col| @headers.include?(col) ? result_headers << col : nil }
  result << result_headers
  @table[@headers.first].each_index do |index|
    this_row = []
    result_headers.each do |col|
      this_row << @table[col][index]
    end
    result << this_row
  end
  result_headers.empty? ? Table.new() : Table.new(result)
end
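
Column selection amounts to looking up each requested header's position and keeping only those positions in every row. A minimal sketch on plain Arrays follows; `project` is a hypothetical helper, not part of tablestakes.

```ruby
# Sketch: projecting named columns out of an Array of rows (header row first).
# `project` is a hypothetical helper, not part of tablestakes.
def project(rows, *names)
  indices = names.flatten.map do |name|
    idx = rows.first.index(name)
    raise ArgumentError, "Invalid column name" if idx.nil?
    idx
  end
  rows.map { |row| row.values_at(*indices) }   # keep the requested positions only
end

rows = [["City", "State", "Rank"], ["Austin", "TX", "1"], ["Albany", "NY", "2"]]
projected = project(rows, "City", "Rank")
```

The result lists columns in the order they were requested, which may differ from their order in the source rows.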

#sort(column = nil, &block) ⇒ Object Also known as: sort!

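Sort the table on the given column and return the result as a new Table. Rows are compared by their values in that column; an optional block supplies a custom comparison. By default, the table is sorted by the values in the first column.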

Attributes

column

OPTIONAL String name or Integer index of the column on which to sort

Options

datatype => :Fixnum
datatype => :Float
datatype => :Date

Examples

cities.sort("State")  # Re-orders the cities table based on State name
cities.sort { |a,b| b<=>a }  # Reverse the order of the cities table
cities.sort("State") { |a,b| b<=>a }  # Sort by State in reverse alpha order


# File 'lib/tablestakes.rb', line 555

def sort(column=nil, &block)
  col_index = 0
  if column.kind_of? String
    col_index = @headers.index(column)
  elsif column.kind_of? Integer
    col_index = column 
  end
  # return empty Table if empty
  if self.empty? 
    return Table.new() 
  end

  neworder = []
  self.each { |row| neworder << OrderedRow.new(row,col_index) }

  result = [neworder.shift.data] # take off headers
  block_given? ? neworder.sort!(&block) : neworder.sort!
  neworder.each { |row| result << row.data }

  return Table.new(result)
end
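
Sorting by a column can be sketched on plain Arrays by setting the header row aside and ordering the remaining rows. The data is illustrative, and the library's `OrderedRow` class is not shown on this page, so this is not a description of its internals.

```ruby
# Sketch: ordering data rows by one column while keeping the header row first.
rows = [["City", "State"], ["Dallas", "TX"], ["Albany", "NY"], ["Sacramento", "CA"]]

header, *data = rows
col = header.index("State")

ascending  = [header] + data.sort_by { |row| row[col] }
descending = [header] + data.sort { |a, b| b[col] <=> a[col] }
```

Since values are held as Strings, a numeric column compares as text (`"10"` sorts before `"9"`) unless the comparison converts the values, for example with `to_i`.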

#sub(colname, re = nil, replace = nil, &block) ⇒ Object

Given a field/column, and a regular expression to match against, and a replacement string, create a new table which performs a substitute operation on column data. In the case that the given replacement is a String, a direct substitute is performed. In the case that it is a Hash and the matched text is one of its keys, the corresponding Hash value will be substituted.

Optionally takes a block containing an operation to perform on all matching data elements in the given column. Raises ArgumentError if the column is not found.

Attributes

colname

String to identify the column in which to substitute

re

Regexp to match the value in the selected column

replace

OPTIONAL String or Hash to specify the replacement text for the given Regexp

&block

OPTIONAL block to execute against matching values

Examples

cities.sub("Population", /(.*?),(.*?)/, '\1\2')  # eliminate commas
capitals.sub("State", /NY/, "New York")  # replace acronym with full name
capitals.sub("State") { |state| state.downcase } # Lowercase for all values

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 470

def sub(colname, re=nil, replace=nil, &block)
  # check arguments
  raise ArgumentError, "Invalid column name" unless @table.has_key?(colname)
  unless block_given?
    # a pattern and a replacement are only required when no block is given
    raise ArgumentError, "No regular expression to match against" unless re
    unless replace.respond_to?(:to_str) || replace.kind_of?(Hash)
      raise ArgumentError, "Replacement must be String or Hash"
    end
  end

  result = Table.new([@headers])
  col_index = @headers.index(colname)

  self.each do |row|
    if block_given?
      row[col_index] = block.call row[col_index]
    else
      # String#sub accepts either a String or a Hash as the replacement
      row[col_index] = row[col_index].sub(re, replace)
    end
    result.add_row(row)
  end
  return result
end
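
The three replacement styles described above map onto standard String methods. The sketch below applies them to illustrative values rather than to a Table.

```ruby
# Sketch: the String operations behind the three replacement styles.
population = "1,234,567"
first_only = population.sub(/,/, "")     # sub replaces only the first match
all_commas = population.gsub(",", "")    # gsub replaces every match

names    = { "NY" => "New York", "TX" => "Texas" }
expanded = "NY".sub(/NY|TX/, names)      # Hash form: the matched text is the lookup key

lowered = ["NY", "TX"].map { |state| state.downcase }   # block form, applied per value
```

Because `String#sub` stops after the first match, a value containing several commas keeps all but the first one.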

#tally(colname) ⇒ Object

Count instances in a particular field/column and return a Table of the results. Raises ArgumentError if the column is not found.

Attributes

colname

String to identify the column to tally

Examples

cities.tally("State")  # returns each State in the cities Table with number of occurrences

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 336

def tally(colname)
  # check arguments
  raise ArgumentError, "Invalid column name"  unless @table.has_key?(colname)

  result = {}
  @table[colname].each do |val|
    result.has_key?(val) ? result[val] += 1 : result[val] = 1
  end
  return Table.new([[colname,"Count"]] + result.to_a)
end
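
The counting step can be sketched with a Hash whose default value is zero. The column data is illustrative.

```ruby
# Sketch: counting occurrences with a Hash that defaults to zero.
states = ["NY", "TX", "NY", "CA", "NY"]

counts = states.each_with_object(Hash.new(0)) { |state, acc| acc[state] += 1 }

# Prepending a header row gives the Array-of-rows shape accepted by Table.new.
table_rows = [["State", "Count"]] + counts.to_a
```

Ruby Hashes preserve insertion order, so values appear in the order they were first seen.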

#to_a ⇒ Object

Converts a Table object to an array of arrays (one per row). The first entry holds the table headers.

Attributes

none



# File 'lib/tablestakes.rb', line 239

def to_a
  result = [ Array(@headers) ]
  
  @table[@headers.first].each_index do |index|
    items = []
    @headers.each do |col|
      items << @table[col][index]
    end
    result << items
  end
  result
end

#to_s ⇒ Object

Converts a Table object to a tab-delimited string.

Attributes

none



# File 'lib/tablestakes.rb', line 218

def to_s
  result = @headers.join("\t") << "\n"
  
  @table[@headers.first].each_index do |index|
    @headers.each do |col|
      result << @table[col][index].to_s
      unless col == @headers.last
        result << "\t"
      else
        result << "\n"
      end
    end
  end
  result
end
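
The output format is one line per row, with tabs between fields and a newline after each row. A compact sketch of that format on plain Arrays, including parsing it back, is shown here with illustrative data.

```ruby
# Sketch: writing rows as tab-delimited text and reading them back.
rows = [["City", "State"], ["Austin", "TX"]]

text = rows.map { |row| row.join("\t") }.join("\n") + "\n"

# The -1 limit keeps trailing empty fields that split would otherwise drop.
parsed = text.lines.map { |line| line.chomp.split("\t", -1) }
```

The format has no escaping, so a value that itself contains a tab or newline would not survive the round trip.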

#top(colname, num = 1) ⇒ Object

Returns counts of the most frequent values found in a given column in the form of a Table. Raises ArgumentError if the column is not found. If no limit is given to the number of values, only the top value will be returned.

Attributes

colname

String to identify the column to count

num

OPTIONAL Integer number of values to return

Examples

cities.top("State")  # returns a Table with the most frequent state in the cities Table
cities.top("State", 10)  # returns a Table with the 10 most frequent states in the cities Table


# File 'lib/tablestakes.rb', line 302

def top(colname, num=1)
  freq = tally(colname).to_a[1..-1].sort_by {|k,v| v }.reverse
  return Table.new(freq[0..num-1].unshift([colname,"Count"]))
end

#union(table2, colname, col2name = colname) ⇒ Object

Return an Array with the union of the elements of the given columns in the two tables, eliminating duplicates. Raises ArgumentError if a column is not found.

Attributes

table2

Table to identify the secondary table in the union

colname

String to identify the column to union

col2name

OPTIONAL String to identify the column in the second table to union

Examples

cities.union(capitals, "City", "Capital")  # returns Array with all cities in both tables

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 510

def union(table2, colname, col2name=colname)
  # check arguments
  raise ArgumentError, "Invalid table!" unless table2.is_a?(Table)
  raise ArgumentError, "Invalid column name" unless @table.has_key?(colname)
  raise ArgumentError, "Invalid column name" unless table2.headers.include?(col2name)

  return self.column(colname) | table2.column(col2name)
end

#where(colname, condition = nil) ⇒ Object Also known as: get_rows

Given a condition for a field/column, return a subtable containing the rows that match the condition. If no condition is given, a new Table is returned with all records. Returns an empty Table if no rows meet the condition. Raises ArgumentError if the column is not found.

Attributes

colname

String to identify the column to test

condition

OPTIONAL String containing a Ruby condition to evaluate against each value in the column

Examples

cities.where("State", "=='NY'")  # returns a Table of cities in New York state 
cities.where("State", "=~ /New.*/")  # returns a Table of cities in states that start with "New"
cities.where("Population", ".to_i > 1000000")  # returns a Table of cities with population over 1 million

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 395

def where(colname, condition=nil)
  # check arguments
  raise ArgumentError, "Invalid Column Name" unless @headers.include?(colname)

  result = []
  result << @headers
  self.each do |row|
    if condition
      eval(%q["#{row[headers.index(colname)]}"] << "#{condition}") ? result << row : nil
    else
      result << row
    end
  end
  result.length > 1 ? Table.new(result) : Table.new()
end
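
The listing passes the condition String to `eval`, so the String runs as Ruby code and should only come from trusted sources. Where that is a concern, the same filtering can be written with a block. The sketch below uses `where_rows`, a hypothetical helper on plain Arrays with made-up data; it is not part of tablestakes.

```ruby
# Sketch: filtering rows with a block instead of an eval'd String.
# `where_rows` is a hypothetical helper, not part of tablestakes.
def where_rows(rows, colname)
  header, *data = rows
  col = header.index(colname)
  raise ArgumentError, "Invalid Column Name" if col.nil?
  [header] + data.select { |row| yield row[col] }   # keep rows whose value passes the block
end

cities = [["City", "Population"], ["Smallville", "500"], ["Metropolis", "2000000"]]
big    = where_rows(cities, "Population") { |value| value.to_i > 1_000_000 }
```

A block receives the raw String value, so conversions such as `to_i` stay explicit in the caller's code.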

#write_file(filename) ⇒ Object

Write a representation of the Table object to a file (tab delimited).

Attributes

filename

String to identify the name of the file to write



# File 'lib/tablestakes.rb', line 583

def write_file(filename)
  # the block form closes the file once the write completes
  File.open(filename, "w") { |file| file.print self.to_s }
end
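
Writing tab-delimited text and reading it back can be sketched with the standard library alone. The example writes to a temporary file so it leaves nothing behind; the text is illustrative.

```ruby
require "tempfile"

# Sketch: write tab-delimited text to a file and read it back.
text = "City\tState\nAustin\tTX\n"

round_trip = Tempfile.create("cities") do |tmp|
  # The block form of File.open closes the file when the block returns.
  File.open(tmp.path, "w") { |file| file.print text }
  File.read(tmp.path)
end
```

Closing the file before reading matters because buffered output may not reach disk until the handle is closed.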