Class: Optimus::Reader::TabfileParser

Inherits:
Object
Defined in:
lib/tabfile_parser.rb

Overview

This class reads tab-delimited Optimus files (or, really, any tab-delimited file). The main option of interest is :skip_lines, which specifies how many lines to skip before the line that holds the column names. For example:

TabfileParser.new(stream, :skip_lines => 1)

is what you’d use to skip the filename line in a standard Optimus Excel file.

Note: you’ll generally be using subclasses of this, and not manually specifying skip_lines.
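To make the effect of :skip_lines concrete, here is a minimal plain-Ruby sketch that does not use the Optimus classes at all. The sample text and the variable names are assumptions made up for illustration; it only mimics the idea of discarding a preamble line before reading the column names.

```ruby
require 'stringio'

# Hypothetical input: a filename line, then a header line, then one data row.
stream = StringIO.new("experiment.txt\nSubject\tRT\n1\t350\n")

skip_lines = 1
lines = stream.read.split("\n")
lines.shift(skip_lines) # drop the filename line, as :skip_lines => 1 would

# The next line is treated as the column names; the rest are data rows.
column_names = lines.shift.split("\t", -1).map(&:strip)
rows = lines.map { |line| line.split("\t", -1).map(&:strip) }
```

With :skip_lines left at 0, the filename line itself would be read as the header, so the data rows would no longer line up with the expected column count.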

Direct Known Subclasses

ExcelParser, OptimustabParser, RawTabParser

Instance Method Summary

Constructor Details

#initialize(file, options = {}) ⇒ TabfileParser

Returns a new instance of TabfileParser.



# File 'lib/tabfile_parser.rb', line 23

def initialize(file, options = {})
  @file = file
  @skip_lines = options[:skip_lines] || 0
  @columns = options[:columns]
  @merge_header_lines = options[:merge_header_lines] || 1
end
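The :merge_header_lines option controls how many header rows are combined into one set of column names. The sketch below isolates the zip-and-join step that to_optimus applies to those rows; the header values are invented for illustration.

```ruby
# Two hypothetical header rows, already split on tabs and stripped.
headers = [
  ["Block", "Trial", "Stim"],
  ["ID",    "ID",    "Name"]
]

# Pair up the labels column by column, then join each group with a space.
merged = headers[0].zip(*headers[1..-1]).map { |labels| labels.join(' ') }

# With a single header row (the default of 1), the names pass through unchanged.
single = [["Subject", "RT"]]
unmerged = single[0].zip(*single[1..-1]).map { |labels| labels.join(' ') }
```

Here merged is ["Block ID", "Trial ID", "Stim Name"] and unmerged is ["Subject", "RT"].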

Instance Method Details

#to_optimus ⇒ Object

Raises:

  • DamagedFileError (for example, when a data row's column count differs from the header's)


# File 'lib/tabfile_parser.rb', line 30

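def to_optimus
  # Read the whole stream and normalize CRLF / CR line endings to LF.
  lines = @file.read.gsub(/\r\n?/, "\n").split("\n")

  # String#split never returns nil, so test for an empty array instead.
  raise DamagedFileError.new("File #{@file.path} appears to be empty.") if lines.empty?

  # Discard any preamble (e.g. a filename line) that precedes the column names.
  lines.shift(@skip_lines)

  # Collect the header rows; multiple rows are merged column by column.
  headers = []
  @merge_header_lines.times do
    l = lines.shift
    raise DamagedFileError.new("File #{@file.path} has no header line.") if l.nil?
    headers << l.split("\t", -1).map { |elt| elt.strip }
  end
  file_columns = headers[0].zip(*headers[1..-1]).map { |labels| labels.join(' ') }

  expected_size = file_columns.size
  data = Optimus::Data.new(file_columns)
  # 1-based number of the last header line; incremented before each data row.
  current_line = @skip_lines + @merge_header_lines
  lines.each do |line|
    current_line += 1
    row = data.add_row
    # A limit of -1 keeps trailing empty fields so the column count stays honest.
    col_data = line.split("\t", -1).map { |e| e.strip }
    if col_data.size != expected_size
      raise DamagedFileError.new("In #{@file.path}, line #{current_line} should have #{expected_size} columns but had #{col_data.size}.")
    end
    row.values = col_data
  end
  data
end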
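The column-count check in to_optimus depends on passing -1 as the limit to String#split. Without it, Ruby drops trailing empty fields, so a row whose last cells are blank would look too short. A small sketch with an invented row:

```ruby
# Hypothetical data row with two empty trailing cells (four columns in total).
line = "1\t350\t\t"

without_limit = line.split("\t")     # trailing empty strings are removed
with_limit    = line.split("\t", -1) # trailing empty strings are kept
```

Here without_limit has 2 elements while with_limit has 4, which is why the parser can compare col_data.size against the header size without rejecting rows that simply end in blank cells.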