Class: BioDSL::UniqueValues

Inherits:
Object
Defined in:
lib/BioDSL/commands/unique_values.rb

Overview

Select unique or non-unique records based on the value of a given key.

unique_values selects records from the stream by checking the value of a given key. When several records share the same value for that key, only the first of them is output. If the invert option is used, the second and later occurrences are output instead, and the first occurrence of each value is dropped.

Usage

unique_values(key: <string>[, invert: <bool>])

Options

  • key: <string> - Key for which the value is checked for uniqueness.

  • invert: <bool> - Select non-unique records (default=false).

Examples

Consider the following two-column table in the file 'test.tab':

Human   H1
Human   H2
Human   H3
Dog     D1
Dog     D2
Mouse   M1

To output only unique values for the first column we first read the table with read_table and then pass the result to unique_values:

BD.new.read_table(input: "test.tab").unique_values(key: :V0).dump.run

{:V0=>"Human", :V1=>"H1"}
{:V0=>"Dog", :V1=>"D1"}
{:V0=>"Mouse", :V1=>"M1"}

To output duplicate records instead, use the invert option:

BD.new.
read_table(input: "test.tab").
unique_values(key: :V0, invert: true).
dump.
run

{:V0=>"Human", :V1=>"H2"}
{:V0=>"Human", :V1=>"H3"}
{:V0=>"Dog", :V1=>"D2"}
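The two outputs above can be reproduced in plain Ruby without BioDSL. The following is a minimal sketch, assuming the table rows have already been parsed into hashes with the keys :V0 and :V1; the variable names are illustrative and the code is not taken from the library.

```ruby
require 'set'

# Rows of test.tab, written out by hand as the hashes read_table would yield.
records = [
  { V0: 'Human', V1: 'H1' },
  { V0: 'Human', V1: 'H2' },
  { V0: 'Human', V1: 'H3' },
  { V0: 'Dog',   V1: 'D1' },
  { V0: 'Dog',   V1: 'D2' },
  { V0: 'Mouse', V1: 'M1' }
]

seen = Set.new

# Set#add? returns the set (truthy) the first time a value is added and nil
# afterwards, so partition splits first occurrences from later duplicates.
unique, duplicates = records.partition { |record| seen.add?(record[:V0]) }

p unique     # corresponds to unique_values(key: :V0)
p duplicates # corresponds to unique_values(key: :V0, invert: true)
```

Every input record lands in exactly one of the two arrays, which is why the default and inverted outputs shown above never overlap.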

Constant Summary

STATS =
%i(records_in records_out)

Instance Method Summary

Constructor Details

#initialize(options) ⇒ UniqueValues

Constructor for UniqueValues.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :key (String, Symbol)
  • :invert (Boolean)


# File 'lib/BioDSL/commands/unique_values.rb', line 88

def initialize(options)
  @options     = options
  @lookup      = Set.new
  @key         = options[:key].to_sym
  @invert      = options[:invert]

  check_options
end
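Because the constructor calls to_sym on the :key option, a String and a Symbol key end up as the same lookup key, and an omitted :invert option is nil, which Ruby treats as false. A small illustration in plain Ruby follows; the normalize helper is hypothetical and only mirrors the two assignments shown above.

```ruby
# Hypothetical helper mirroring `options[:key].to_sym` and `options[:invert]`.
def normalize(options)
  { key: options[:key].to_sym, invert: options[:invert] }
end

from_string = normalize(key: 'V0')
from_symbol = normalize(key: :V0, invert: true)

p from_string[:key] == from_symbol[:key] # both keys are the Symbol :V0
p from_string[:invert]                   # nil, which behaves like false
```

Note that a missing :key would raise NoMethodError at the to_sym call, since nil does not respond to to_sym.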

Instance Method Details

#lmb ⇒ Proc

Returns the command lambda for unique_values.

Returns:

  • (Proc)

    Command lambda.



# File 'lib/BioDSL/commands/unique_values.rb', line 100

def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    input.each do |record|
      @status[:records_in] += 1

      if output_record?(record)
        output << record
        @status[:records_out] += 1
      end
    end
  end
end
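To see how such a command lambda can be driven, here is a self-contained sketch that uses plain arrays for input and output and a hash for status. The class UniqueValuesSketch is a stand-in written for illustration. The bodies of status_init and output_record? are not shown in this listing, so the versions below are assumptions based on the behaviour described in the Overview.

```ruby
require 'set'

# Stand-in class for illustration only; not the BioDSL implementation.
class UniqueValuesSketch
  STATS = %i[records_in records_out].freeze

  def initialize(key:, invert: false)
    @key    = key.to_sym
    @invert = invert
    @lookup = Set.new
  end

  # Returns a lambda with the same three-argument shape as lmb above.
  def lmb
    lambda do |input, output, status|
      STATS.each { |stat| status[stat] = 0 } # assumed stand-in for status_init

      input.each do |record|
        status[:records_in] += 1
        next unless output_record?(record)

        output << record
        status[:records_out] += 1
      end
    end
  end

  private

  # Assumed rule: the first record per key value passes, unless inverted.
  def output_record?(record)
    first_time = !@lookup.add?(record[@key]).nil?
    @invert ? !first_time : first_time
  end
end

input  = [{ V0: 'Dog', V1: 'D1' }, { V0: 'Dog', V1: 'D2' }, { V0: 'Mouse', V1: 'M1' }]
output = []
status = {}

UniqueValuesSketch.new(key: :V0).lmb.call(input, output, status)
p output
p status
```

Any object that responds to each can serve as input and any object that responds to << can serve as output, which is what lets the sketch substitute arrays for the record stream.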