Class: BioDSL::MergeTable
- Inherits: Object
- Defined in: lib/BioDSL/commands/merge_table.rb
Overview
Merge records on a given key with tabular data from one or more files.
merge_table reads in one or more tabular files and merges each table row into the records in the stream that have an identical value for a given key. The values for the given key must be unique in the tabular files, but not necessarily in the stream.
Consult read_table for details on how the tabular files are read.
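The merge rule above can be sketched in plain Ruby, without BioDSL. The helper names build_lookup and merge_records are hypothetical, and raising on a duplicate table value is an assumption of this sketch, not necessarily what merge_table does:

```ruby
# Index table rows by their value for `key`. Table values must be unique;
# raising on a duplicate is an assumption of this sketch.
def build_lookup(rows, key)
  rows.each_with_object({}) do |row, lookup|
    value = row[key]
    next if value.nil? # rows without the key cannot be merged on it
    raise ArgumentError, "duplicate value for #{key}: #{value.inspect}" if lookup.key?(value)
    lookup[value] = row
  end
end

# Merge every stream record with the table row that shares its key value.
# Records without a matching row pass through unchanged.
def merge_records(records, lookup, key)
  records.map do |record|
    row = lookup[record[key]]
    row ? record.merge(row) : record
  end
end

lookup = build_lookup([{ ID: 1, COUNT: 5423 }, { ID: 2, COUNT: 34 }], :ID)
stream = [
  { ID: 1, ORGANISM: "parrot" },
  { ID: 1, ORGANISM: "macaw" }, # a repeated key in the stream is fine
  { ID: 9, ORGANISM: "eel" }    # no table row for this key
]
merge_records(stream, lookup, :ID).each { |record| p record }
```

Both records with ID 1 receive the same COUNT, while the record with ID 9 is emitted as it came in.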
The stats for merge_table include the following values:
- rows_total - total number of table rows.
- rows_matched - number of table rows with the given key.
- rows_unmatched - number of table rows without the given key.
- merged - number of records that were merged.
- non_merged - number of records that were not merged.
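To make the counters concrete, here is a minimal sketch that computes them over plain arrays of hashes. The function merge_with_stats is hypothetical and only mirrors the definitions listed above; it is not part of BioDSL:

```ruby
# Returns [merged_records, stats] for table rows and stream records
# that are both given as arrays of hashes.
def merge_with_stats(rows, records, key)
  stats = { rows_total: 0, rows_matched: 0, rows_unmatched: 0,
            merged: 0, non_merged: 0 }
  table = {}

  # Table side: a row counts as matched when it carries the key.
  rows.each do |row|
    if row[key]
      table[row[key]] = row
      stats[:rows_matched] += 1
    else
      stats[:rows_unmatched] += 1
    end
  end
  stats[:rows_total] = stats[:rows_matched] + stats[:rows_unmatched]

  # Stream side: a record is merged when its key value is found in the table.
  merged = records.map do |record|
    if record[key] && table[record[key]]
      stats[:merged] += 1
      record.merge(table[record[key]])
    else
      stats[:non_merged] += 1
      record
    end
  end

  [merged, stats]
end

rows    = [{ ID: 1, COUNT: 5423 }, { COUNT: 34 }]
records = [{ ID: 1, ORGANISM: "parrot" }, { ID: 2, ORGANISM: "eel" }]
_, stats = merge_with_stats(rows, records, :ID)
p stats
```

Note that rows_matched and rows_unmatched describe the table files, whereas merged and non_merged describe the records in the stream.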
Usage
merge_table(<input: <glob>>, <key: <string>>[, columns: <list>
[, keys: <list>[, skip: <uint>[, delimiter: <string>]]]])
Options
- input <glob> - Input file or file glob expression.
- key <string> - Key used to merge.
- columns <list> - List of columns to read in that order.
- keys <list> - List of key identifiers to use for each column.
- skip <uint> - Number of initial lines to skip (default=0).
- delimiter <string> - Delimiter to use for separating columns (default="\s+").
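As a rough illustration of how the skip, delimiter, columns and keys options could interact, the sketch below parses tabular text in plain Ruby. The function parse_table is hypothetical, and several details are assumptions rather than documented behaviour: columns are taken as zero-based indices, a line starting with # is treated as the header, and all-digit fields are converted to integers. See read_table for what the library actually does.

```ruby
# Parse tabular text into an array of hashes (illustrative only).
def parse_table(text, delimiter: '\s+', skip: 0, columns: nil, keys: nil)
  splitter = Regexp.new(delimiter)
  header = nil
  rows = []

  text.each_line.drop(skip).each do |line|
    line = line.chomp
    next if line.empty?

    # Assumption: a leading '#' marks the header line naming the columns.
    if line.start_with?("#")
      header = line.delete_prefix("#").strip.split(splitter).map(&:to_sym)
      next
    end

    fields = line.split(splitter)
    names = header
    if columns
      # Keep only the requested columns, in the requested order.
      fields = fields.values_at(*columns)
      names = header&.values_at(*columns)
    end
    names = keys.map(&:to_sym) if keys # keys override the header names
    raise ArgumentError, "no header line or keys given" unless names

    # Assumption: all-digit fields are converted to integers.
    values = fields.map { |f| f.match?(/\A-?\d+\z/) ? f.to_i : f }
    rows << names.zip(values).to_h
  end

  rows
end

text = "exported by some tool\n#ID\tCOUNT\tNOTE\n1\t5423\tok\n2\t34\tlow\n"
p parse_table(text, delimiter: "\t", skip: 1, columns: [0, 1], keys: [:ID, :N])
```

Here skip drops the leading comment line, columns keeps the first two fields, and keys renames COUNT to N.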
Examples
Consider the following two files:
test1.tab:
#ID ORGANISM
1 parrot
2 eel
3 platypus
4 beetle
test2.tab:
#ID COUNT
1 5423
2 34
3 2423
4 234
We can merge the data with merge_table like this:

  BD.new.
    read_table(input: "test1.tab").
    merge_table(input: "test2.tab", key: :ID).
    dump.
    run

{:ID=>1, :ORGANISM=>"parrot", :COUNT=>5423}
{:ID=>2, :ORGANISM=>"eel", :COUNT=>34}
{:ID=>3, :ORGANISM=>"platypus", :COUNT=>2423}
{:ID=>4, :ORGANISM=>"beetle", :COUNT=>234}
Constant Summary
- STATS = %i(records_in records_out rows_total rows_matched rows_unmatched merged non_merged)
Instance Method Summary
- #initialize(options) ⇒ MergeTable (constructor)
  Constructor for MergeTable.
- #lmb ⇒ Proc
  Return command lambda for merge_table.
Constructor Details
#initialize(options) ⇒ MergeTable
Constructor for MergeTable.
  # File 'lib/BioDSL/commands/merge_table.rb', line 117

  def initialize(options)
    @options = options
    defaults
    @table = {}
    @key = @options[:key].to_sym
    # Symbolize the optional column keys; nil when none were given.
    @keys = @options[:keys] ? @options[:keys].map(&:to_sym) : nil
  end
Instance Method Details
#lmb ⇒ Proc
Return command lambda for merge_table.
  # File 'lib/BioDSL/commands/merge_table.rb', line 131

  def lmb
    lambda do |input, output, status|
      status_init(status, STATS)

      parse_input_tables

      input.each do |record|
        @status[:records_in] += 1

        if record[@key] && @table[record[@key]]
          @status[:merged] += 1
          record = record.merge(@table[record[@key]])
        else
          @status[:non_merged] += 1
        end

        output << record
        @status[:records_out] += 1
      end

      @status[:rows_total] = @status[:rows_matched] + @status[:rows_unmatched]
    end
  end