Class: BioDSL::SplitValues

Inherits:
Object
Defined in:
lib/BioDSL/commands/split_values.rb

Overview

Split the values of a key into new key/value pairs.

split_values splits the value of a given key into multiple values and adds them to the record as new key/value pairs. By default, the new keys are named after the given key with an appended index (e.g. ID_0, ID_1, ...); the keys option lets you specify a list of key names to use instead. Records whose value does not split into more than one part are passed through unchanged.
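The naming rule above can be sketched in plain Ruby. This is a minimal illustration, not BioDSL's implementation. `split_into_pairs` is a hypothetical helper. It leaves the split values as Strings, whereas the examples below show the real command converting numeric values. Falling back to indexed names when `keys` is too short is also an assumption of this sketch.

```ruby
# Hypothetical helper (not part of BioDSL) that mimics how split_values
# names the new keys. Split values are left as Strings in this sketch.
def split_into_pairs(record, key, delimiter: '_', keys: nil)
  value = record[key]
  return record unless value # key missing: record is unchanged

  values = value.split(delimiter)
  return record unless values.size > 1 # delimiter not found: nothing to add

  values.each_with_index do |v, i|
    name = keys&.at(i) # name from the keys: list, if one was given
    # Assumption: fall back to "<key>_<index>" when keys is absent or too short.
    new_key = name ? name.to_sym : :"#{key}_#{i}"
    record[new_key] = v
  end
  record
end

# Default naming: adds ID_0, ID_1 and ID_2
split_into_pairs({ ID: 'FOO_10_20', SEQ: 'gataag' }, :ID)

# Custom delimiter and key names: overwrites ID and adds COUNT
split_into_pairs({ ID: 'FOO:count=10', SEQ: 'gataag' }, :ID,
                 delimiter: ':count=', keys: ['ID', :COUNT])
```

Note that a name in `keys` that matches an existing key (here `"ID"`) overwrites that key's original value, as in the last example below.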

Usage

split_values(key: <string>[, delimiter: <string>[, keys: <list>]])

Options

  • key: <string> - Key whose value to split.

  • keys: <list> - List of keys to use with split values.

  • delimiter: <string> - Delimiter (default=‘_’).

Examples

Consider the following records:

{ID: "FOO:count=10", SEQ: "gataag"}
{ID: "FOO_10_20", SEQ: "gataag"}

To split the value belonging to ID, do:

split_values(key: :ID)

{:ID=>"FOO:count=10", :SEQ=>"gataag"}
{:ID=>"FOO_10_20", :SEQ=>"gataag", :ID_0=>"FOO", :ID_1=>10, :ID_2=>20}

Using a different delimiter:

split_values(key: "ID", delimiter: ':count=')

{:ID=>"FOO:count=10", :SEQ=>"gataag", :ID_0=>"FOO", :ID_1=>10}
{:ID=>"FOO_10_20", :SEQ=>"gataag"}

Using a different delimiter and a list of keys:

split_values(key: "ID", keys: ["ID", :COUNT], delimiter: ':count=')

{:ID=>"FOO", :SEQ=>"gataag", :COUNT=>10}
{:ID=>"FOO_10_20", :SEQ=>"gataag"}
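In the outputs above, numeric fields such as ID_1 and COUNT come back as Integers. The constructor's `@first` and `@convert` variables, together with the `determine_types(values) if @first` call in `lmb`, suggest that conversions are worked out once and then reused. The sketch below illustrates that idea under stated assumptions. `determine_converters` and `convert` are hypothetical helpers, and the regular expressions deciding what counts as an integer or a float are our own choice, not BioDSL's.

```ruby
# Hypothetical sketch: pick a conversion for each split position once,
# from the first value list, then reuse it for every later record.
INT_RE   = /\A[-+]?\d+\z/
FLOAT_RE = /\A[-+]?\d*\.\d+\z/

def determine_converters(values)
  values.map do |v|
    case v
    when INT_RE   then :to_i
    when FLOAT_RE then :to_f
    else :to_s
    end
  end
end

def convert(values, converters)
  # Positions beyond the first record's length default to plain Strings.
  values.each_with_index.map { |v, i| v.public_send(converters.fetch(i, :to_s)) }
end

converters = determine_converters(%w[FOO 10 20]) # => [:to_s, :to_i, :to_i]
convert(%w[FOO 10 20], converters)                # => ["FOO", 10, 20]
convert(%w[BAR 7 x], converters)                  # => ["BAR", 7, 0]
```

Because the conversions are fixed by the first record in this sketch, a later non-numeric value at a numeric position silently becomes 0 (`"x".to_i` is 0). Whether BioDSL behaves the same way depends on `determine_types`, which this page does not show.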

Constant Summary

STATS = %i(records_in records_out)

Instance Method Summary

Constructor Details

#initialize(options) ⇒ SplitValues

Constructor for SplitValues.

Parameters:

  • options (Hash)

    Options hash.

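Options Hash (options):

  • :key (String) - Key whose value to split.
  • :keys (Array) - List of keys to use with split values.
  • :delimiter (String) - Delimiter (default='_').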

# File 'lib/BioDSL/commands/split_values.rb', line 84

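def initialize(options)
  @options = options

  check_options # validates the options hash (private method, not shown here)

  @first       = true                        # gates the determine_types call in lmb
  @convert     = []                          # type conversions, filled in later
  @keys        = @options[:keys]             # optional list of names for the new keys
  @key         = @options[:key].to_sym       # key whose value is split
  @delimiter   = @options[:delimiter] || '_' # default delimiter is '_'
end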

Instance Method Details

#lmb ⇒ Proc

Return command lambda for split_values.

Returns:

  • (Proc)

    Command lambda.



# File 'lib/BioDSL/commands/split_values.rb', line 99

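def lmb
  lambda do |input, output, status|
    status_init(status, STATS) # helper (not shown) that sets up the STATS counters

    input.each do |record|
      @status[:records_in] += 1

      if (value = record[@key])
        values = value.split(@delimiter)

        # Only values that split into two or more parts change the record.
        if values.size > 1
          determine_types(values) if @first # type detection is gated on @first

          split_values(values, record) # adds the new key/value pairs
        end
      end

      output << record # every record is passed on, split or not

      @status[:records_out] += 1
    end
  end
end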
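The lambda pattern used by `lmb` can be exercised outside a BioDSL pipeline. In this hypothetical stand-alone sketch, plain Arrays stand in for the input and output streams and a Hash stands in for `status`. The counters are reset inline instead of through BioDSL's `status_init`, type conversion is omitted, and `make_split_lambda` is an illustrative name, not part of the library.

```ruby
STATS = %i(records_in records_out)

# Hypothetical stand-alone version of the lmb pattern: read records from
# input, optionally split one key, write every record to output, and count
# records in the status hash.
def make_split_lambda(key, delimiter: '_')
  lambda do |input, output, status|
    STATS.each { |field| status[field] = 0 } # stands in for status_init

    input.each do |record|
      status[:records_in] += 1

      if (value = record[key])
        values = value.split(delimiter)
        if values.size > 1
          values.each_with_index { |v, i| record[:"#{key}_#{i}"] = v }
        end
      end

      output << record # records pass through whether or not they were split
      status[:records_out] += 1
    end
  end
end

input  = [{ ID: 'FOO:count=10' }, { ID: 'FOO_10_20' }]
output = []
status = {}
make_split_lambda(:ID).call(input, output, status)
```

After the call, both records are in `output` and `status` holds `records_in: 2, records_out: 2`. Only the second record gains ID_0, ID_1 and ID_2, because the first contains no `'_'`.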