Class: BioDSL::Sort

Inherits:
Object
Defined in:
lib/BioDSL/commands/sort.rb

Overview

Sort records in the stream.

Sorts records in the stream by the value of a single key. Sorting on multiple keys is currently not supported.

Usage

sort(key: <value>[, reverse: <bool>[, block_size: <uint>]])

Options

  • key: <value> - Sort records on the value for key.

  • reverse: <bool> - Reverse the sort order.

  • block_size: <uint> - Block size in bytes used for disk-based sorting (default=250_000_000).

Examples

Consider the following table in the file 'test.tab':

#COUNT  ORGANISM
4 Dog
3 Cat
1 Eel

To sort this according to COUNT in ascending order:

BD.new.read_table(input: "test.tab").sort(key: :COUNT).dump.run

{:COUNT=>1, :ORGANISM=>"Eel"}
{:COUNT=>3, :ORGANISM=>"Cat"}
{:COUNT=>4, :ORGANISM=>"Dog"}

And in descending order:

BD.new.
read_table(input: "test.tab").
sort(key: :COUNT, reverse: true).
dump.
run

{:COUNT=>4, :ORGANISM=>"Dog"}
{:COUNT=>3, :ORGANISM=>"Cat"}
{:COUNT=>1, :ORGANISM=>"Eel"}

The type of the value determines the ordering. String values such as ORGANISM are sorted alphabetically:

BD.new.read_table(input: "test.tab").sort(key: :ORGANISM).dump.run

{:COUNT=>3, :ORGANISM=>"Cat"}
{:COUNT=>4, :ORGANISM=>"Dog"}
{:COUNT=>1, :ORGANISM=>"Eel"}

And in reverse alphabetical order:

BD.new.
read_table(input: "test.tab").
sort(key: :ORGANISM, reverse: true).
dump.
run

{:COUNT=>1, :ORGANISM=>"Eel"}
{:COUNT=>4, :ORGANISM=>"Dog"}
{:COUNT=>3, :ORGANISM=>"Cat"}
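
The point that the value type determines the ordering can be illustrated in plain Ruby. This is a minimal sketch that does not use BioDSL; the sample records are made up for illustration, and it assumes values are compared with Ruby's ordinary comparison rules:

```ruby
# Plain Ruby illustration (not BioDSL code): the same values sort
# differently depending on whether they are Integers or Strings.
records = [
  { COUNT: 10,  ORGANISM: "Dog" },
  { COUNT: 9,   ORGANISM: "Cat" },
  { COUNT: 100, ORGANISM: "Eel" }
]

# Integer values are compared numerically.
numeric = records.sort_by { |r| r[:COUNT] }.map { |r| r[:COUNT] }

# The same values as Strings are compared character by character.
lexical = records.sort_by { |r| r[:COUNT].to_s }.map { |r| r[:COUNT] }

p numeric
p lexical
```

If a numeric column ends up holding strings, the result may therefore look unsorted even though it is in correct string order.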

Constant Summary

STATS = %i(records_in records_out)

SORT_BLOCK_SIZE = 250_000_000
Maximum number of bytes to hold in memory.


Constructor Details

#initialize(options) ⇒ Sort

Constructor for Sort.

Parameters:

  • options (Hash)

    Options hash.

Options Hash (options):

  • :key (String, Symbol)
  • :reverse (Boolean)
  • :block_size (Integer)


# File 'lib/BioDSL/commands/sort.rb', line 108

def initialize(options)
  @options    = options
  @block_size = options[:block_size] || SORT_BLOCK_SIZE
  @key        = options[:key].to_sym
  @files      = []
  @records    = []
  @size       = 0
  @pqueue     = pqueue_init
  @fds        = nil

  check_options
end
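
The two option-handling lines of the constructor can be tried in isolation. The sketch below is a stand-in, not BioDSL code: `normalize_sort_options` is a hypothetical helper, and `SORT_BLOCK_SIZE` is redefined locally with the value documented above.

```ruby
# Local stand-in for the documented constant.
SORT_BLOCK_SIZE = 250_000_000

# Hypothetical helper mirroring two lines of the constructor:
# the key is converted to a Symbol and block_size falls back to the default.
def normalize_sort_options(options)
  {
    key:        options[:key].to_sym,
    block_size: options[:block_size] || SORT_BLOCK_SIZE
  }
end

with_string_key = normalize_sort_options(key: "COUNT")
with_block_size = normalize_sort_options(key: :COUNT, block_size: 1_000)

p with_string_key
p with_block_size
```

A String key and a Symbol key end up equivalent. Note that in the constructor as shown, `options[:key].to_sym` runs before `check_options`, so omitting :key would raise NoMethodError on nil at that line.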

Instance Method Details

#lmb ⇒ Proc

Return command lambda for Sort.

Returns:

  • (Proc)

    Command lambda.



# File 'lib/BioDSL/commands/sort.rb', line 124

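def lmb
  lambda do |input, output, status|
    status_init(status, STATS)

    # Buffer incoming records and track their approximate size.
    input.each do |record|
      @status[:records_in] += 1
      @records << record
      @size += record.to_s.size
      # Save the buffer as a block once it exceeds the block size.
      save_block if @size > @block_size
    end

    # Save the remaining records, then pass the blocks through the
    # priority queue to the output.
    save_block
    open_block_files
    fill_pqueue
    output_pqueue(output)
  end
end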
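
The lambda above has the shape of an external merge sort: buffer records, save a block when the buffer exceeds the block size, then merge the blocks. The helpers `save_block`, `open_block_files`, `fill_pqueue` and `output_pqueue` are not shown on this page, so the following is only a sketch of the general technique under stated assumptions, not the BioDSL implementation. `external_sort` is a hypothetical function; it uses Tempfile and Marshal for the blocks, and it replaces the priority queue with a linear scan over the first unread record of each block.

```ruby
require "tempfile"

# Hypothetical helper, not part of BioDSL: sorts hashes by +key+, writing
# sorted blocks to temporary files when the buffer exceeds +block_size+.
def external_sort(records, key:, block_size: 250_000_000)
  files  = []
  buffer = []
  size   = 0

  # Write the buffered records to disk as one sorted block.
  flush = lambda do
    unless buffer.empty?
      file = Tempfile.new("sort_block")
      file.binmode
      buffer.sort_by { |r| r[key] }.each { |r| Marshal.dump(r, file) }
      file.rewind
      files << file
      buffer = []
      size   = 0
    end
  end

  records.each do |record|
    buffer << record
    size += record.to_s.size
    flush.call if size > block_size
  end
  flush.call

  # Read the next record from a block file, or nil at end of file.
  read_next = ->(file) { file.eof? ? nil : Marshal.load(file) }

  # One "head" record per block; the smallest head is emitted next.
  heads  = files.map { |file| read_next.call(file) }
  sorted = []

  while (i = heads.each_index.reject { |j| heads[j].nil? }.min_by { |j| heads[j][key] })
    sorted << heads[i]
    heads[i] = read_next.call(files[i])
  end

  sorted
ensure
  # Close and delete the temporary block files.
  files.each(&:close!) if files
end

rows = [
  { COUNT: 4, ORGANISM: "Dog" },
  { COUNT: 3, ORGANISM: "Cat" },
  { COUNT: 1, ORGANISM: "Eel" }
]

# A tiny block size forces a separate block for every record.
by_count    = external_sort(rows, key: :COUNT, block_size: 10)
by_organism = external_sort(rows, key: :ORGANISM)
```

The linear scan keeps the sketch short; with many blocks, a priority queue such as the one the constructor initializes would make selecting the next record cheaper. The sketch does not cover the reverse option.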