Module: IMW::Formats::Delimited (abstract)
Overview
Defines methods for parsing and writing delimited data formats (CSV, TSV, etc.) with the FasterCSV library. This module is not used to extend a resource directly. Instead, more specific modules (e.g. IMW::Resources::Formats::Csv) include it and also define delimited_options, which is the options hash actually passed to FasterCSV.
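The include-and-override arrangement described above can be sketched with Ruby's standard `csv` library (FasterCSV was adopted as Ruby's `CSV` in 1.9). The names `DelimitedSketch`, `TsvSketch` and `StringResource` are hypothetical stand-ins, not IMW's API; the sketch only assumes, as the overview says, that the including module supplies `delimited_options`.

```ruby
require 'csv'

# Hypothetical stand-in for IMW::Formats::Delimited: generic parsing logic
# that relies on whatever includes it to supply #delimited_options and #read.
module DelimitedSketch
  def load(&block)
    CSV.parse(read, **delimited_options, &block)
  end
end

# Hypothetical format module: includes the generic module and defines the
# options that are actually handed to the CSV parser.
module TsvSketch
  include DelimitedSketch

  def delimited_options
    { col_sep: "\t" }
  end
end

# Hypothetical resource that simply wraps a string.
class StringResource
  include TsvSketch

  def initialize(text)
    @text = text
  end

  def read
    @text
  end
end

rows = StringResource.new("a\tb\n1\t2\n").load
p rows # => [["a", "b"], ["1", "2"]]
```

Keeping the parser call in one shared module means each concrete format only has to describe its dialect (separator, quoting, headers).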
Instance Method Summary
- #delimited_options ⇒ Hash
  Default options to be passed to FasterCSV; see its documentation for more information.
- #each(&block) ⇒ Object
  Call block with each row in this delimited resource.
- #emit(data, options = {}) ⇒ Object (also: #<<)
  Emit a single array or an array of arrays into this resource.
- #fields_in_first_line? ⇒ true, false
  Do a heuristic check to determine whether the first row of this delimited data is a row of headers.
- #guess_fields! ⇒ Object
  If the first line appears to contain field names, use them to define this resource's fields.
- #load {|Array| ... } ⇒ Array
  Return the data in this delimited resource as an array of arrays.
- #snippet ⇒ Array<Array>
  Return a sample of up to the first 10 rows of this file.
Instance Method Details
#delimited_options ⇒ Hash
Default options to be passed to FasterCSV; see its documentation for more information.
```ruby
# File 'lib/imw/formats/delimited.rb', line 19
def delimited_options
  @delimited_options ||= {
    :headers => fields && fields.map { |field| field['name'] }
  }.merge()
end
```
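To illustrate what the `:headers` entry does, here is a sketch using Ruby's standard `csv` library (the FasterCSV successor) and a made-up `fields` array in the `{ 'name' => ... }` shape the method expects. Passing an array as `:headers` names each column and treats every line as data.

```ruby
require 'csv'

# Field descriptions in the shape #delimited_options reads (values are
# invented for illustration).
fields = [{ 'name' => 'id' }, { 'name' => 'city' }]

# Mirror of the :headers entry built by #delimited_options.
options = { headers: fields && fields.map { |field| field['name'] } }

# With an Array for :headers, no line is consumed as a header row.
table = CSV.parse("1,Austin\n2,Boston\n", **options)
table.each { |row| puts "#{row['id']}: #{row['city']}" }
# prints "1: Austin" and then "2: Boston"
```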
#each(&block) ⇒ Object
Call block with each row in this delimited resource.
```ruby
# File 'lib/imw/formats/delimited.rb', line 41
def each &block
  require 'fastercsv'
  FasterCSV.new(io, delimited_options).each(&block)
end
```
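Because `each` wraps the resource's `io` instead of reading the whole file first, rows can be processed one at a time. A sketch with the standard `csv` library, where a `StringIO` stands in for `io` and `headers: true` is an assumption made for the example so that the first line acts as the header row:

```ruby
require 'csv'
require 'stringio'

# Stand-in for the resource's IO object (hypothetical data).
io = StringIO.new("x,y\n1,2\n3,4\n")

# Stream rows one at a time, as #each does with FasterCSV.new(io, ...).
seen = []
CSV.new(io, headers: true).each { |row| seen << row['y'] }
p seen # => ["2", "4"]
```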
#emit(data, options = {}) ⇒ Object Also known as: <<
Emit a single array or an array of arrays into this resource.
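```ruby
# File 'lib/imw/formats/delimited.rb', line 51
def emit data, options={}
  require 'fastercsv'
  # wrap a single row so a flat array and an array of arrays are handled alike
  data = [data] unless data.first.is_a?(Array)
  data.each do |row|
    write(FasterCSV.generate_line(row, delimited_options))
  end
  self
end
```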
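The single-row/multi-row normalization in `emit` can be tried in isolation. This sketch uses the standard `csv` library's `CSV.generate_line` in place of FasterCSV's, and collects the lines into a string instead of calling the resource's `write`; `emit_lines` is a hypothetical helper, not part of IMW.

```ruby
require 'csv'

# Hypothetical helper mirroring #emit: wrap a single row so that a flat
# array and an array of arrays go through the same loop.
def emit_lines(data, options = {})
  data = [data] unless data.first.is_a?(Array)
  data.map { |row| CSV.generate_line(row, **options) }.join
end

p emit_lines(['a', 'b'])                 # => "a,b\n"
p emit_lines([['a', 'b'], ['c', 'd,e']]) # => "a,b\nc,\"d,e\"\n"
```

Since `emit` returns `self` and is aliased as `<<`, calls on a resource can be chained, e.g. `resource << row1 << row2`.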
#fields_in_first_line? ⇒ true, false
Do a heuristic check to determine whether or not the first row of this delimited data is a row of headers.
```ruby
# File 'lib/imw/formats/delimited.rb', line 65
def fields_in_first_line?
  # grab the header and up to 10 body rows
  require 'fastercsv'
  copy   = FasterCSV.new(io, delimited_options.merge(:headers => false))
  header = (copy.shift || []) rescue []
  body   = 10.times.map { (copy.shift || []) rescue [] }.flatten

  # guess how many elements in a row
  #size_guess = ((header.size + body.map(&:size).inject(0.0) { |e, s| s += e }).to_f / (1 + body.length).to_f).to_i

  # calculate the fraction of bytes that are [-A-Za-z_] (letters +
  # underscore + hyphen) for header and body and compute a
  # threshold determinant
  header_chars        = header.map(&:to_s).join
  header_schema_bytes = header_chars.bytes.find_all { |byte| (byte >= 65 && byte <= 90) || (byte >= 97 && byte <= 122) || byte == 95 || byte == 45 }
  body_chars          = body.map(&:to_s).join
  body_schema_bytes   = body_chars.bytes.find_all { |byte| (byte >= 65 && byte <= 90) || (byte >= 97 && byte <= 122) || byte == 95 || byte == 45 }

  header_schema_fraction = header_schema_bytes.size.to_f / header_chars.size.to_f rescue nil
  body_schema_fraction   = body_schema_bytes.size.to_f / body_chars.size.to_f rescue nil
  determinant = (body_schema_fraction - header_schema_fraction).abs / 2.0 rescue nil

  # decide, setting the threshold at 0.05 based on some guesswork...
  determinant && determinant >= 0.05
end
```
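A worked example of the heuristic may help. For the first row and for the flattened body, the method computes the fraction f of characters that are ASCII letters, `_` or `-`, then takes |f_body − f_header| / 2 and compares it with 0.05. With the invented data below, the header `citypopulation` gives 14/14 = 1.0, the body `Austin961855Boston675647` gives 12/24 = 0.5, so the determinant is 0.25 and the first row is judged to be headers. The sketch takes rows as arrays instead of reading from `io`, and returns 0.0 for empty input where the original may yield NaN; `schema_fraction` and `looks_like_header?` are hypothetical names.

```ruby
# Fraction of characters that are ASCII letters, '_' or '-', using the same
# byte ranges as #fields_in_first_line?.
def schema_fraction(cells)
  chars = cells.map(&:to_s).join
  return 0.0 if chars.empty?
  schema = chars.bytes.count do |b|
    (65..90).cover?(b) || (97..122).cover?(b) || b == 95 || b == 45
  end
  schema.to_f / chars.size
end

# Compare the first row against the flattened body rows.
def looks_like_header?(header, body_rows)
  body = body_rows.flatten
  determinant = (schema_fraction(body) - schema_fraction(header)).abs / 2.0
  determinant >= 0.05
end

header = ['city', 'population']
body   = [['Austin', '961855'], ['Boston', '675647']]
p looks_like_header?(header, body)                       # => true
p looks_like_header?(['Austin', '961855'], [body.last])  # => false
```

Both rows in the second call have the same letter fraction (0.5), so the determinant is 0 and no header is detected.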
#guess_fields! ⇒ Object
If it seems like there are fields in the first line of this data then go ahead and use them to define this resource’s fields.
This overwrites any fields already defined for this resource.
```ruby
# File 'lib/imw/formats/delimited.rb', line 95
def guess_fields!
  return unless fields_in_first_line?
  copy  = FasterCSV.new(io, delimited_options.merge(:headers => false))
  names = (copy.shift || []) rescue []
  self.fields = names.map { |n| { 'name' => n } }
  delimited_options[:headers] = names
end
```
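The core of `guess_fields!` is reading just the first row and turning it into field hashes. A minimal sketch with the standard `csv` library, assuming a `StringIO` in place of the resource's `io` and skipping the heuristic check:

```ruby
require 'csv'
require 'stringio'

# Hypothetical input; only its first row is read.
io = StringIO.new("id,city\n1,Austin\n")

# Read the first row as plain values, then build the field descriptions
# in the { 'name' => ... } shape that guess_fields! assigns to self.fields.
names  = CSV.new(io, headers: false).shift || []
fields = names.map { |n| { 'name' => n } }
p fields
```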
#load {|Array| ... } ⇒ Array
Return the data in this delimited resource as an array of arrays.
Yield each outer array (row) if passed a block.
```ruby
# File 'lib/imw/formats/delimited.rb', line 32
def load &block
  require 'fastercsv'
  FasterCSV.parse(read, delimited_options, &block)
end
```
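Unlike `each`, `load` calls `read`, so the whole resource is loaded into memory before parsing. The two calling styles (with and without a block) can be sketched with the standard `csv` library's `CSV.parse` on a hypothetical in-memory string:

```ruby
require 'csv'

text = "1,2\n3,4\n"

# Without a block: the whole parse comes back as an array of arrays.
rows = CSV.parse(text)
p rows # => [["1", "2"], ["3", "4"]]

# With a block: each row is yielded as it is parsed.
sums = []
CSV.parse(text) { |row| sums << row.map(&:to_i).sum }
p sums # => [3, 7]
```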
#snippet ⇒ Array<Array>
Return a sample of up to the first 10 rows of this file.
```ruby
# File 'lib/imw/formats/delimited.rb', line 106
def snippet
  require 'fastercsv'
  returning([]) do |rows|
    row_num = 1
    each do |row|
      break if row_num > 10
      rows << row.size.times.map { |index| row[index] }
      row_num += 1
    end
  end
end
```
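The `returning` helper used above comes from ActiveSupport (later deprecated in favor of Ruby's `Object#tap`). The same "stop after 10 rows" logic can be sketched in plain Ruby with the standard `csv` library; `snippet_of` is a hypothetical helper that parses a string rather than calling the resource's `each`.

```ruby
require 'csv'

# Collect at most `limit` rows, converting each row to a plain array of
# values. Indexing by position works for both Arrays and CSV::Row objects.
def snippet_of(text, limit = 10)
  rows = []
  CSV.parse(text) do |row|
    break if rows.size >= limit # stop parsing once the sample is full
    rows << row.size.times.map { |index| row[index] }
  end
  rows
end

text = (1..25).map { |i| "#{i},x\n" }.join
sample = snippet_of(text)
p sample.size # => 10
```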