Class: Table

Inherits:
Object
Defined in:
lib/tablestakes.rb

Overview

This class is a Ruby representation of a table. All data is captured as type String by default. Columns are referred to by their String headers which are assumed to be identified in the first row of the input file. Output is written by default to tab-delimited files with the first row serving as the header names.
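
As a rough illustration of the conventions described above, the sketch below parses tab-delimited text so that the first row supplies the headers and every value is kept as a String, stored per column. The helper name `parse_tab_delimited` is hypothetical and is not part of tablestakes; this is a minimal sketch, not the library's reader.

```ruby
# Hypothetical helper (not part of tablestakes): parse tab-delimited text
# into an Array of headers and a Hash mapping each header to its column.
def parse_tab_delimited(text)
  # A limit of -1 keeps trailing empty fields instead of dropping them
  rows = text.each_line.map { |line| line.chomp.split("\t", -1) }
  headers = rows.shift || []
  columns = headers.each_with_object({}) { |h, cols| cols[h] = [] }
  rows.each do |row|
    # Every value is stored as a String; missing fields become ""
    headers.each_with_index { |h, i| columns[h] << row[i].to_s }
  end
  [headers, columns]
end

headers, columns = parse_tab_delimited("City\tState\nNew York\tNY\nDallas\tTX\n")
puts headers.inspect
puts columns["State"].inspect
```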

Constructor Details

#initialize(input = nil) ⇒ Table

Instantiate a Table object from an Array of rows, a tab-delimited file, or another Table. With no argument, an empty Table is created.

Attributes

input

OPTIONAL Array of rows, String naming the tab-delimited file to read, or an existing Table to copy

Examples

cities = Table.new() # empty table
cities = Table.new([ ["City", "State"], ["New York", "NY"], ["Dallas", "TX"] ]) # create from Array of rows
cities = Table.new("cities.txt") # read from file
cities = Table.new(capitals)  # create from table


# File 'lib/tablestakes.rb', line 42

def initialize(input=nil)
  @headers = []
  @table = {}
  @indices = {}
  
  if input.respond_to?(:fetch)
    if input[0].respond_to?(:fetch)
      #create Table from rows
      add_rows(input)
    end
  elsif input.respond_to?(:upcase)
    # a string, then read_file
    read_file(input)
  elsif input.respond_to?(:headers)
    @headers = input.headers.dup
    input.each {|row| add_row(row) }
  end
  # else create empty +Table+
end
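
The constructor above chooses its behavior by duck typing rather than by class checks. The sketch below mirrors those `respond_to?` tests in a standalone helper so each branch can be tried in isolation. `input_kind` and `TableLike` are hypothetical names used only for illustration.

```ruby
# Stand-in for a table-like object; only the #headers method matters here.
TableLike = Struct.new(:headers)

# Hypothetical helper mirroring the respond_to? checks in the listing above.
def input_kind(input)
  if input.respond_to?(:fetch)
    # Array-like input is used only when its first element is itself a row
    input[0].respond_to?(:fetch) ? :rows : :ignored
  elsif input.respond_to?(:upcase)
    :filename
  elsif input.respond_to?(:headers)
    :table
  else
    :empty
  end
end

puts input_kind([["City", "State"], ["Dallas", "TX"]])
puts input_kind("cities.txt")
puts input_kind(TableLike.new(["City"]))
puts input_kind(nil)
puts input_kind(["City", "State"]) # a flat Array is not treated as a row
```

One consequence visible in the sketch is that a flat Array such as `["City", "State"]` falls into the first branch but fails the inner check, so nothing is added.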

Instance Attribute Details

#headers ⇒ Object (readonly)

The headers attribute contains the table headers used to reference columns in the Table. All headers are represented as String types.



# File 'lib/tablestakes.rb', line 23

def headers
  @headers
end

Instance Method Details

#add_column(*args) ⇒ Object

Add a column to the Table. Raises ArgumentError if the column name is already taken or there are not the correct number of values.

Attributes

args

Column name followed by the column values, given as a String plus an Array, as a single Array, or as a list of Strings (see examples)

Examples

cities.add_column("City", ["New York", "Dallas", "San Francisco"])
cities.add_column(["City","New York", "Dallas", "San Francisco"])
cities.add_column("City", "New York", "Dallas", "San Francisco")

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 119

def add_column(*args)
  if args.kind_of? Array
    args.flatten!
    colname = args.shift
    column_vals = args
  end
  # check arguments
  raise ArgumentError, "Duplicate Column Name!" if @table.has_key?(colname)
  unless self.empty?
    if column_vals.length != @table[@headers.first].length
      raise ArgumentError, "Number of elements in column does not match existing table"
    end
  end
  append_col(colname, column_vals)    
end
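
The three call forms shown in the examples are interchangeable because the splat arguments are flattened before the name is shifted off. A minimal sketch of that normalization, using a hypothetical helper `split_column_args` that is not part of the library:

```ruby
# Hypothetical helper: reduce any of the three call forms to [name, values].
def split_column_args(*args)
  values = args.flatten   # non-destructive here; the listing uses flatten!
  name = values.shift     # the first element is the column name
  [name, values]
end

a = split_column_args("City", ["New York", "Dallas"])
b = split_column_args(["City", "New York", "Dallas"])
c = split_column_args("City", "New York", "Dallas")
puts a.inspect
puts (a == b && b == c)
```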

#add_row(*row) ⇒ Object Also known as: <<

Add a row to the Table, appending it to the end. If the Table has no headers yet, the row becomes the headers. Raises ArgumentError if there are not the correct number of values.

Attributes

row

Array to hold the row values

Examples

cities = Table.new.add_row( ["City", "State"] ) # create new Table with headers
cities.add_row("New York", "NY") # add data row to Table


# File 'lib/tablestakes.rb', line 189

def add_row(*row)
  if row.kind_of? Array
    row = row.flatten
  end
  if @headers.empty?
    @headers = row
  else
    unless row.length == @headers.length
      raise ArgumentError, "Wrong number of fields in Table input"
    end
    append_row(row)
  end
  return self
end

#add_rows(array_of_rows) ⇒ Object

Add one or more rows to the Table, appending them to the end. Raises ArgumentError if a row does not have the correct number of values. The first row becomes the table headers if they are currently undefined.

Attributes

array_of_rows

Array of Arrays to hold the row values

Examples

cities.add_rows([ ["New York", "NY"], ["Austin", "TX"] ])


# File 'lib/tablestakes.rb', line 144

def add_rows(array_of_rows)
  array_of_rows.each do |r|
    add_row(r.clone)
  end
  return self
end

#append(a_table) ⇒ Object

Append one Table object to another. Raises ArgumentError if the header values and order do not align with the destination Table. Return self if appending an empty table. Return given table if appending to an empty table.

Attributes

a_table

Table to be added

Examples

cities.append(more_cities)


# File 'lib/tablestakes.rb', line 160

def append(a_table)
  if !a_table.kind_of? Table 
    raise ArgumentError, "Argument to append is not a Table"
  end
  if self.empty? 
    return a_table
  elsif a_table.empty? 
    return self
  end
  if a_table.headers != @headers 
    raise ArgumentError, "Argument to append does not have matching headers"
  end

  a_table.each do |r|
    add_row(r.clone)
  end
  return self
end

#bottom(colname, num = 1) ⇒ Object

Returns counts of the least frequent values found in a given column in the form of a Table. Raises ArgumentError if the column is not found. If no limit is given to the number of values, only the least frequent value will be returned.

Attributes

colname

String to identify the column to count

num

OPTIONAL Integer number of values to return (default 1)

Examples

cities.bottom("State")  # returns a Table with the least frequent state in the cities Table
cities.bottom("State", 10)  # returns a Table with the 10 least frequent states in the cities Table


# File 'lib/tablestakes.rb', line 363

def bottom(colname, num=1)
  freq = tally(colname).to_a[1..-1].sort_by {|k,v| v }
  return Table.new(freq[0..num-1].unshift([colname,"Count"]))
end
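
The ranking step can be tried on a plain Array without a Table. The sketch below counts values, sorts the counts in ascending order for the least frequent entries, and in descending order for the most frequent ones. The helper names are hypothetical, and the counts are kept as Integers, which may differ from how the library stores them.

```ruby
# Hypothetical helpers: rank the values of one column by frequency.
def value_counts(values)
  counts = Hash.new(0)
  values.each { |v| counts[v] += 1 }
  counts
end

def least_frequent(values, num = 1)
  value_counts(values).sort_by { |_value, count| count }.first(num)
end

def most_frequent(values, num = 1)
  value_counts(values).sort_by { |_value, count| -count }.first(num)
end

states = %w[NY TX TX CA TX CA]
puts least_frequent(states).inspect
puts most_frequent(states, 2).inspect
```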

#column(colname) ⇒ Object

Return a copy of a column from the table, identified by column name. Returns empty Array if column name not found.

Attributes

colname

String to identify the name of the column



# File 'lib/tablestakes.rb', line 90

def column(colname)
  Array(get_col(colname))
end

#count(colname = nil, value = nil) ⇒ Object Also known as: size, length

Counts the number of instances of a particular string, given a column name, and returns an integer >= 0. Raises ArgumentError if the column is not found. If the column name or the value is omitted, returns the number of rows in the table.

Attributes

colname

OPTIONAL String to identify the column to count

value

OPTIONAL String value to count

Examples

cities.count  # returns number of rows in cities Table
cities.size   # same as cities.count
cities.length # same as cities.count
cities.count("State", "NY")  # returns the number of rows with State == "NY"

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 309

def count(colname=nil, value=nil)
  if colname.nil? || value.nil?
    if @table.size > 0
      @table.each_key {|e| return @table.fetch(e).length }
    else
      return 0
    end
  end
  raise ArgumentError, "Invalid column name" unless @headers.include?(colname)
  
  if @table[colname]
    result = 0
    @table[colname].each do |val|
      val == value.to_s ? result += 1 : nil 
    end
    result
  else
    nil 
  end
end

#del_column(colname) ⇒ Object

Delete a column from the Table. Raises ArgumentError if the column name does not exist.

Attributes

colname

String to identify the name of the column

Examples

cities.del_column("State") # returns table without "State" column

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 213

def del_column(colname)
  # check arguments
  raise ArgumentError, "Column name does not exist!" unless @table.has_key?(colname)
  
  @headers.delete(colname)
  @table.delete(colname)
  return self
end

#del_row(rownum) ⇒ Object

Delete a row from the Table. Raises ArgumentError if the row number is not found.

Attributes

rownum

Integer index of the row to delete

Examples

cities.del_row(3)  # deletes row with index 3 (4th row)
cities.del_row(-1) # deletes last row (per Ruby convention)


# File 'lib/tablestakes.rb', line 231

def del_row(rownum)
  # check arguments
  if self.empty? || rownum >= @table[@headers.first].length
    raise ArgumentError, "Row number does not exist!" 
  end
  @headers.each do |col|
    @table[col].delete_at(rownum)
  end
  return self
end
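
The listing above checks only the upper bound of `rownum`. A negative index that reaches past the first row would pass that check, and `Array#delete_at` returns nil for an out-of-range index rather than raising. The sketch below shows one way to validate both ends on column-oriented data. `delete_row!` is a hypothetical helper, not the library's method.

```ruby
# Hypothetical helper: delete one row from a Hash of column Arrays,
# accepting negative indexes but rejecting anything out of range.
def delete_row!(columns, rownum)
  length = columns.values.first.to_a.length
  unless rownum.between?(-length, length - 1)
    raise ArgumentError, "Row number does not exist!"
  end
  columns.each_value { |col| col.delete_at(rownum) }
  columns
end

columns = {
  "City"  => ["New York", "Dallas", "Austin"],
  "State" => ["NY", "TX", "TX"]
}
delete_row!(columns, -1) # removes the last row
puts columns.inspect
```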

#each ⇒ Object

Defines an iterator for Table which produces rows of data (headers omitted) for its calling block.



# File 'lib/tablestakes.rb', line 65

def each

  if block_given?
    @table[@headers.first].each_index do |index|
      nextrow = []
      @headers.each do |col|
        begin
          nextrow << @table[col][index].clone 
        rescue
          nextrow << @table[col][index]
        end
      end
      yield nextrow
    end
  else
    self.to_enum(:each)
  end

end
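
Because the data is held per column, producing rows means walking every column at the same index. The sketch below reproduces that pattern in a small standalone class, including the `to_enum` fallback when no block is given. `ColumnStore` is a hypothetical stand-in for illustration only, and unlike the listing it does not clone the values.

```ruby
# Hypothetical stand-in for a column-oriented table.
class ColumnStore
  include Enumerable

  def initialize(headers, columns)
    @headers = headers
    @columns = columns
  end

  # Yields one Array per row (headers omitted); returns an Enumerator
  # when called without a block.
  def each
    return to_enum(:each) unless block_given?
    @columns.fetch(@headers.first, []).each_index do |index|
      yield @headers.map { |col| @columns[col][index] }
    end
  end
end

store = ColumnStore.new(
  ["City", "State"],
  { "City" => ["New York", "Dallas"], "State" => ["NY", "TX"] }
)
store.each { |row| puts row.inspect }
puts store.map(&:first).inspect # Enumerable methods build on #each
```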

#empty? ⇒ Boolean

Return true if the Table is empty, false otherwise.

Returns:

  • (Boolean)


# File 'lib/tablestakes.rb', line 105

def empty?
  @headers.length == 0 && @table.length == 0
end

#intersect(table2, colname, col2name = colname) ⇒ Object

Return an Array with the intersection of columns from different tables, eliminating duplicates. Raises ArgumentError if a column is not found.

Attributes

table2

Table to identify the secondary table in the intersection

colname

String to identify the column to intersect

col2name

OPTIONAL String to identify the column in the second table to intersect

Examples

cities.intersect(capitals, "City", "Capital")  # returns Array with all capitals that are also in the cities table

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 573

def intersect(table2, colname, col2name=colname)
  # check arguments
  raise ArgumentError, "Invalid table!" unless table2.is_a?(Table)
  raise ArgumentError, "Invalid column name" unless @table.has_key?(colname)
  raise ArgumentError, "Invalid column name" unless table2.headers.include?(col2name)

  return self.column(colname) & table2.column(col2name)
end
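
The last line of the listing relies on `Array#&`, which keeps the elements common to both Arrays, removes duplicates, and preserves the order of the left operand. A short standalone illustration, where `column_intersection` is a hypothetical name:

```ruby
# Hypothetical helper showing the Array#& behavior used above.
def column_intersection(first_column, second_column)
  first_column & second_column
end

cities   = ["New York", "Austin", "Albany", "Austin"]
capitals = ["Albany", "Austin", "Sacramento"]
puts column_intersection(cities, capitals).inspect # => ["Austin", "Albany"]
```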

#join(table2, colname, col2name = colname) ⇒ Object

Given a second table to join against, and a field/column, return a Table which contains a join of the two tables. Join only lists the common column once, under the column name of the first table (if different from the name of the second). All columns from both tables are returned. Returns nil if no rows match. Raises ArgumentError if a column is not found.

Attributes

table2

Table to identify the secondary table in the join

colname

String to identify the column to join on

col2name

OPTIONAL String to identify the column in the second table to join on

Examples

cities.join(capitals, "City", "Capital")  # returns a Table of cities that are also state capitals
capitals.join(cities, "State")  # returns a Table of capital cities with population info from the cities table

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 470

def join(table2, colname, col2name=colname)
  # check arguments
  raise ArgumentError, "Invalid table!" unless table2.is_a?(Table)
  raise ArgumentError, "Invalid column name" unless @table.has_key?(colname)
  raise ArgumentError, "Invalid column name" unless table2.headers.include?(col2name)
  
  dedupe_headers(table2, colname)

  result = [ Array(@headers) + Array(table2.headers) ]
  @table[colname].each_index do |index|
    t2_index = table2.column(col2name).find_index(@table[colname][index])
    unless t2_index.nil?
      result << self.row(index) + table2.row(t2_index)
    end
  end
  if result.length == 1 #no rows selected
    return nil
  else
    return Table.new(result) 
  end
end
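
To make the join semantics concrete, here is a minimal sketch of an inner join over plain Arrays of rows, where the first row of each Array holds the headers. It follows the description above (the key column appears once, under the first table's name, and only the first match in the second table is used), but `inner_join` is a hypothetical helper and the way it drops the duplicate column is an assumption, since `dedupe_headers` is not shown on this page.

```ruby
# Hypothetical helper: inner join of two Arrays of rows on one key column.
def inner_join(left, right, left_key, right_key = left_key)
  left_headers, *left_rows = left
  right_headers, *right_rows = right
  li = left_headers.index(left_key)
  ri = right_headers.index(right_key)
  raise ArgumentError, "Invalid column name" if li.nil? || ri.nil?

  # Remove the key column from right-hand rows so it is listed only once
  strip = ->(row) { row[0...ri] + row[(ri + 1)..-1] }

  result = [left_headers + strip.call(right_headers)]
  left_rows.each do |row|
    match = right_rows.find { |r| r[ri] == row[li] } # first match only
    result << row + strip.call(match) if match
  end
  result
end

cities   = [["City", "State"], ["Albany", "NY"], ["Dallas", "TX"], ["Austin", "TX"]]
capitals = [["Capital", "Code"], ["Albany", "A1"], ["Austin", "A2"]]
inner_join(cities, capitals, "City", "Capital").each { |row| puts row.inspect }
```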

#rename_header(orig_name, new_name) ⇒ Object

Rename a header value for this Table object.

Attributes

orig_name

String current header name

new_name

String indicating new header name

Raises:

  • (ArgumentError)


248
249
250
251
252
253
254
255
# File 'lib/tablestakes.rb', line 248

def rename_header(orig_name, new_name)
  raise ArgumentError, "Original Column name type invalid" unless orig_name.kind_of? String
  raise ArgumentError, "New Column name type invalid" unless new_name.kind_of? String
  raise ArgumentError, "Column Name does not exist!" unless @headers.include? orig_name

  update_header(orig_name, new_name)
  return self
end

#row(index) ⇒ Object

Return a copy of a row from the table as an Array, given an index (i.e. row number). Returns empty Array if the index is out of bounds.

Attributes

index

Integer indicating the index of the row.



# File 'lib/tablestakes.rb', line 99

def row(index)    
  Array(get_row(index))
end

#select(*columns) ⇒ Object Also known as: get_columns

Select columns from the table, given one or more column names. Returns an instance of Table with the results. Raises ArgumentError if any column is not valid.

Attributes

columns

Variable String arguments to identify the columns to select

Examples

cities.select("City", "State")  # returns a Table of "City" and "State" columns
cities.select(cities.headers)  # returns a new Table that is a duplicate of cities

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 400

def select(*columns)
  # check arguments
  raise ArgumentError, "Invalid column name(s)" unless columns
  columns.kind_of?(Array) ? columns.flatten! : nil
  columns.each do |c|
    raise ArgumentError, "Invalid column name" unless @table.has_key?(c)
  end

  result = []
  result_headers = []
  columns.each { |col| @headers.include?(col) ? result_headers << col : nil }
  result << result_headers
  @table[@headers.first].each_index do |index|
    this_row = []
    result_headers.each do |col|
      this_row << @table[col][index]
    end
    result << this_row
  end
  result_headers.empty? ? Table.new() : Table.new(result)
end
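
Column selection amounts to a projection: look up the position of each requested header, then pick those positions from every row. A minimal sketch over an Array of rows, using the hypothetical helper `select_columns` and the built-in `Array#values_at`:

```ruby
# Hypothetical helper: project an Array of rows onto the named columns.
# The first row is assumed to hold the headers.
def select_columns(rows, *names)
  header, *data = rows
  names = names.flatten # accept a list of names or a single Array
  indexes = names.map do |name|
    header.index(name) || raise(ArgumentError, "Invalid column name")
  end
  [names] + data.map { |row| row.values_at(*indexes) }
end

rows = [["City", "State", "Code"], ["Albany", "NY", "A1"], ["Austin", "TX", "A2"]]
puts select_columns(rows, "State", "City").inspect
```

The columns come back in the order requested, which also makes this a simple way to reorder them.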

#sort(column = nil, &block) ⇒ Object Also known as: sort!

Sort the table on the given column and return the result as a new Table. By default the table is sorted on the first column. An optional block overrides the default comparison.

Attributes

column

OPTIONAL String (column name) or Integer (column index) to identify the column on which to sort

Options

datatype => :Fixnum
datatype => :Float
datatype => :Date

Examples

cities.sort("State")  # Re-orders the cities table based on State name
cities.sort { |a,b| b<=>a }  # Reverse the order of the cities table
cities.sort("State") { |a,b| b<=>a }  # Sort by State in reverse alpha order


# File 'lib/tablestakes.rb', line 598

def sort(column=nil, &block)
  col_index = 0
  if column.kind_of? String
    col_index = @headers.index(column)
  elsif column.kind_of? Integer  # Fixnum was removed in Ruby 3.2
    col_index = column 
  end
  # return empty Table if empty
  if self.empty? 
    return Table.new() 
  end

  neworder = []
  self.each { |row| neworder << OrderedRow.new(row,col_index) }

  result = [neworder.shift.data] # take off headers
  block_given? ? neworder.sort!(&block) : neworder.sort!
  neworder.each { |row| result << row.data }

  return Table.new(result)
end
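
The `OrderedRow` helper used above is not shown on this page, so the sketch below swaps in plain Array sorting to illustrate the idea: keep the header row aside, order the data rows by one column, and let an optional block replace the comparison. `sort_rows` is a hypothetical name and this is not the library's implementation.

```ruby
# Hypothetical helper: sort an Array of rows on one column index.
# The first row is assumed to hold the headers and stays in place.
def sort_rows(rows, col_index = 0, &block)
  header, *data = rows
  return [] if header.nil?
  sorted =
    if block
      data.sort { |a, b| block.call(a[col_index], b[col_index]) }
    else
      data.sort_by { |row| row[col_index] }
    end
  [header] + sorted
end

rows = [["City", "State"], ["Dallas", "TX"], ["Albany", "NY"], ["Fresno", "CA"]]
puts sort_rows(rows, 1).inspect                  # by State, ascending
puts sort_rows(rows, 0) { |a, b| b <=> a }.inspect # by City, descending
```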

#sub(colname, match = nil, replace = nil, &block) ⇒ Object

Given a field/column, and a regular expression to match against, and a replacement string, create a new table which performs a substitute operation on column data. In the case that the given replacement is a String, a direct substitute is performed. In the case that it is a Hash and the matched text is one of its keys, the corresponding Hash value will be substituted.

Optionally takes a block containing an operation to perform on all matching data elements in the given column. Raises ArgumentError if the column is not found.

Attributes

colname

String to identify the column to substitute on

match

OPTIONAL String or Regexp to match the value in the selected column

replace

OPTIONAL String or Hash to specify the replacement text for the given match value

&block

OPTIONAL block to execute against matching values

Examples

cities.sub("Population", /(.*?),(.*?)/, '\1\2')  # eliminate commas
capitals.sub("State", /NY/, "New York")  # replace acronym with full name
capitals.sub("State", /North|South/, {"North" => "South", "South" => "North"}) # Northern states for Southern and vice-versa
capitals.sub("State") { |state| state.downcase } # Lowercase for all values

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 513

def sub(colname, match=nil, replace=nil, &block)
  # check arguments
  raise ArgumentError, "No regular expression to match against" unless match || block_given?
  raise ArgumentError, "Invalid column name" unless @table.has_key?(colname)

  if ! block_given?
    if ! (String.try_convert(match) || Regexp.try_convert(match))
      raise ArgumentError, "Match expression must be String or Regexp"
    elsif ! (replace.respond_to?(:fetch) || replace.respond_to?(:to_str))
      raise ArgumentError, "Replacement must be String or Hash"
    end
  end

  result = Table.new([@headers])
  col_index = @headers.index(colname)

  self.each do |row|
    if block_given?
      row[col_index] = block.call row[col_index]
    else
      row[col_index] = row[col_index].sub(match, replace)
    end  
    result.add_row(row)
  end
  return result
end
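
The replacement forms map directly onto `String#sub`, which replaces only the first match in each value. The sketch below applies the same three variants (String replacement, Hash replacement, and a block applied to the whole value) to a plain Array of column values. `sub_column` is a hypothetical helper, and it omits the type checks performed in the listing.

```ruby
# Hypothetical helper: apply a substitution to every value of one column.
def sub_column(values, match = nil, replace = nil, &block)
  raise ArgumentError, "No regular expression to match against" unless match || block
  values.map { |value| block ? block.call(value) : value.sub(match, replace) }
end

# String replacement with backreferences; only the first comma is removed
puts sub_column(["1,234,567"], /(.*?),(.*?)/, '\1\2').inspect
# Hash replacement: the matched text is looked up as a key
swap = { "North" => "South", "South" => "North" }
puts sub_column(["North Dakota", "South Carolina"], /North|South/, swap).inspect
# Block form: the block receives the whole value
puts sub_column(["NY", "TX"]) { |state| state.downcase }.inspect
```

Removing every occurrence rather than the first would call for `String#gsub`, which this method does not use.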

#tally(colname) ⇒ Object

Count instances in a particular field/column and return a Table of the results. Raises ArgumentError if the column is not found.

Attributes

colname

String to identify the column to tally

Examples

cities.tally("State")  # returns each State in the cities Table with number of occurrences

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 379

def tally(colname)
  # check arguments
  raise ArgumentError, "Invalid column name"  unless @table.has_key?(colname)

  result = {}
  @table[colname].each do |val|
    result.has_key?(val) ? result[val] += 1 : result[val] = 1
  end
  return Table.new([[colname,"Count"]] + result.to_a)
end
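
The counting loop in the listing builds a Hash of value to count and then turns it into rows under a two-column header. The sketch below shows the same shape of result on a plain Array, using the built-in `Enumerable#tally` (available from Ruby 2.7). `tally_rows` is a hypothetical helper, and it returns an Array of rows rather than a Table.

```ruby
# Hypothetical helper: count the values of one column and return the
# counts as rows under a [column name, "Count"] header.
def tally_rows(colname, values)
  [[colname, "Count"]] + values.tally.to_a
end

tally_rows("State", %w[NY TX TX CA TX CA]).each { |row| puts row.inspect }
```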

#to_a ⇒ Object

Converts a Table object to an array of arrays (one per row). The first entry is the array of table headers.

Attributes

none



# File 'lib/tablestakes.rb', line 282

def to_a
  result = [ Array(@headers) ]
  
  @table[@headers.first].each_index do |index|
    items = []
    @headers.each do |col|
      items << @table[col][index]
    end
    result << items
  end
  result
end

#to_s ⇒ Object

Converts a Table object to a tab-delimited string.

Attributes

none



# File 'lib/tablestakes.rb', line 261

def to_s
  result = @headers.join("\t") << "\n"
  
  @table[@headers.first].each_index do |index|
    @headers.each do |col|
      result << @table[col][index].to_s
      unless col == @headers.last
        result << "\t"
      else
        result << "\n"
      end
    end
  end
  result
end
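
The output format is one line per row, fields separated by tabs, with the header row first and a newline after every row. A compact sketch of the same serialization over an Array of rows, where `to_tab_delimited` is a hypothetical helper:

```ruby
# Hypothetical helper: serialize rows (headers first) as tab-delimited text.
def to_tab_delimited(rows)
  rows.map { |row| row.map(&:to_s).join("\t") + "\n" }.join
end

text = to_tab_delimited([["City", "State"], ["New York", "NY"], ["Dallas", "TX"]])
print text
```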

#top(colname, num = 1) ⇒ Object

Returns counts of the most frequent values found in a given column in the form of a Table. Raises ArgumentError if the column is not found. If no limit is given to the number of values, only the top value will be returned.

Attributes

colname

String to identify the column to count

num

OPTIONAL Integer number of values to return (default 1)

Examples

cities.top("State")  # returns a Table with the most frequent state in the cities Table
cities.top("State", 10)  # returns a Table with the 10 most frequent states in the cities Table


# File 'lib/tablestakes.rb', line 345

def top(colname, num=1)
  freq = tally(colname).to_a[1..-1].sort_by {|k,v| v }.reverse
  return Table.new(freq[0..num-1].unshift([colname,"Count"]))
end

#union(table2, colname, col2name = colname) ⇒ Object

Return an Array with the union of the elements of the given columns in the two tables, eliminating duplicates. Raises an ArgumentError if a column is not found.

Attributes

table2

Table to identify the secondary table in the union

colname

String to identify the column to union

col2name

OPTIONAL String to identify the column in the second table to union

Examples

cities.union(capitals, "City", "Capital")  # returns Array with all cities in both tables

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 553

def union(table2, colname, col2name=colname)
  # check arguments
  raise ArgumentError, "Invalid table!" unless table2.is_a?(Table)
  raise ArgumentError, "Invalid column name" unless @table.has_key?(colname)
  raise ArgumentError, "Invalid column name" unless table2.headers.include?(col2name)

  return self.column(colname) | table2.column(col2name)
end
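
The union is computed with `Array#|`, which concatenates the two Arrays and removes duplicates while keeping the order of first appearance. A short standalone illustration, where `column_union` is a hypothetical name:

```ruby
# Hypothetical helper showing the Array#| behavior used above.
def column_union(first_column, second_column)
  first_column | second_column
end

cities   = ["New York", "Austin", "Albany", "Austin"]
capitals = ["Albany", "Austin", "Sacramento"]
puts column_union(cities, capitals).inspect
# => ["New York", "Austin", "Albany", "Sacramento"]
```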

#where(colname, condition = nil) ⇒ Object Also known as: get_rows

Given a condition on a field/column, return a sub-table of the rows that match it. If no condition is given, a new Table is returned with all records. Returns an empty Table if no row meets the condition. Raises ArgumentError if the column is not found.

Attributes

colname

String to identify the column to test

condition

OPTIONAL String containing a ruby condition to evaluate

Examples

cities.where("State", "=='NY'")  # returns a Table of cities in New York state 
cities.where("State", "=~ /New.*/")  # returns a Table of cities in states that start with "New"
cities.where("Population", ".to_i > 1000000")  # returns a Table of cities with population over 1 million

Raises:

  • (ArgumentError)


# File 'lib/tablestakes.rb', line 438

def where(colname, condition=nil)
  # check arguments
  raise ArgumentError, "Invalid Column Name" unless @headers.include?(colname)

  result = []
  result << @headers
  self.each do |row|
    if condition
      eval(%q["#{row[headers.index(colname)]}"] << "#{condition}") ? result << row : nil
    else
      result << row
    end
  end
  result.length > 1 ? Table.new(result) : Table.new()
end
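
The condition String is evaluated as Ruby code: `%q[...]` does not interpolate, so the text handed to `eval` is the quoted cell value followed by the condition, for example `"#{value}"=='NY'`. The sketch below isolates that step in a hypothetical helper `matches?`, and adds a block-based filter (`filter_rows`, also hypothetical) as an alternative that avoids `eval`. Since `eval` runs arbitrary code, condition Strings should only come from trusted sources.

```ruby
# Hypothetical helper: evaluate a condition String against one value,
# in the same way as the listing above.
def matches?(value, condition)
  # The evaluated source is: "#{value}" immediately followed by condition
  eval(%q["#{value}"] + condition) ? true : false
end

# Hypothetical alternative: filter rows with a block instead of eval.
def filter_rows(rows, colname)
  header, *data = rows
  index = header.index(colname) || raise(ArgumentError, "Invalid Column Name")
  [header] + data.select { |row| yield row[index] }
end

puts matches?("NY", "=='NY'")
puts matches?("New York", "=~ /New.*/")
puts matches?("2500000", ".to_i > 1000000")

rows = [["City", "State"], ["Albany", "NY"], ["Dallas", "TX"]]
puts filter_rows(rows, "State") { |state| state == "NY" }.inspect
```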

#write_file(filename) ⇒ Object

Write a representation of the Table object to a file (tab delimited).

Attributes

filename

String to identify the name of the file to write



# File 'lib/tablestakes.rb', line 626

def write_file(filename)
  # The block form closes the file, which flushes the buffered output
  File.open(filename, "w") { |file| file.print self.to_s }
end
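
Writing through the block form of `File.open` guarantees that the file is closed, and therefore flushed, before the method returns. The sketch below demonstrates a write followed by a read using a temporary file from Ruby's standard `tempfile` library. `write_text` is a hypothetical helper that takes the text directly instead of calling `to_s` on a Table.

```ruby
require "tempfile"

# Hypothetical helper: write text to a file, closing it when done.
def write_text(filename, text)
  File.open(filename, "w") { |file| file.print(text) }
end

text = "City\tState\nDallas\tTX\n"
round_trip = Tempfile.create("cities") do |tmp|
  tmp.close                  # only the path is needed here
  write_text(tmp.path, text)
  File.read(tmp.path)        # the block's value is returned by create
end
puts round_trip == text
```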