Class: Unipept::CSVFormatter

Inherits:
Formatter
Defined in:
lib/formatters.rb

Instance Method Summary

Methods inherited from Formatter

available, default, #format, formatters, #group_by_first_key, hidden?, #integrate_fasta_headers, new_for_format, register

Instance Method Details

#convert(data, _first) ⇒ String

Converts the given input data to the CSV format.

Parameters:

  • data (Array)

    The data we wish to convert

  • _first (Boolean)

    Is this the first output batch?

Returns:

  • (String)

    The converted input data in the CSV format



# File 'lib/formatters.rb', line 237

def convert(data, _first)
  keys = get_keys(data)

  CSV.generate do |csv|
    data.each do |o|
      row = {}
      o.each do |k, v|
        if %w[ec go ipr].include? k
          # Annotation types without entries set nothing here: their columns
          # are looked up as nil below and end up as empty CSV cells.
          if v && !v.empty?
            v.first.each_key do |key|
              row[key == 'protein_count' ? "#{k}_protein_count" : key] = (v.map { |el| el[key] }).join(' ').strip
            end
          end
        else
          row[k] = (v == '' ? nil : v)
        end
      end
      csv << keys.map { |k| row[k] }
    end
  end
end
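
The way `convert` collapses the `ec`, `go`, and `ipr` annotation lists into single cells can be illustrated in isolation. The following is a minimal standalone sketch, not the library's code: the helper name `flatten_annotations`, the constant `ANNOTATION_TYPES`, and the sample record are hypothetical, and only Ruby's standard `csv` library is used.

```ruby
require 'csv'

ANNOTATION_TYPES = %w[ec go ipr].freeze

# Hypothetical helper: turns one record into a flat Hash of column => cell.
# Every field of an annotation list becomes one cell that holds the values
# of all entries, joined by spaces.
def flatten_annotations(record)
  record.each_with_object({}) do |(key, value), row|
    if ANNOTATION_TYPES.include?(key)
      # Nothing to flatten when the annotation list is missing or empty.
      next if value.nil? || value.empty?

      value.first.each_key do |field|
        column = field == 'protein_count' ? "#{key}_protein_count" : field
        row[column] = value.map { |entry| entry[field] }.join(' ').strip
      end
    else
      # Empty strings are stored as nil so that they yield empty cells.
      row[key] = (value == '' ? nil : value)
    end
  end
end

# Made-up sample record, shaped like the input the method above iterates over.
record = {
  'peptide' => 'AALTER',
  'total_protein_count' => 3,
  'ec' => [
    { 'ec_number' => '2.7.11.1', 'protein_count' => 2 },
    { 'ec_number' => '3.6.4.12', 'protein_count' => 1 }
  ]
}

row = flatten_annotations(record)
columns = %w[peptide total_protein_count ec_number ec_protein_count]
puts CSV.generate { |csv| csv << columns.map { |c| row[c] } }
# prints: AALTER,3,2.7.11.1 3.6.4.12,2 1
```

Looking up each column by name, as in the last lines, means that a column without a value is simply written as an empty cell.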


#footer ⇒ Object

# File 'lib/formatters.rb', line 226

def footer
  ''
end

#get_keys(data, fasta_mapper = nil) ⇒ Object



# File 'lib/formatters.rb', line 172

def get_keys(data, fasta_mapper = nil)
  # This global variable is necessary because we need to know how many items should be
  # nil in the convert function.
  $keys_length = 0 # rubocop:disable Style/GlobalVars
  # This hash keeps track, per annotation type, of a row in which that annotation is certainly filled in.
  non_empty_items = { 'ec' => nil, 'go' => nil, 'ipr' => nil }

  # First we look for rows in which the EC numbers, GO terms, or IPR codes are filled in.
  data.each do |row|
    non_empty_items.each_key do |annotation_type|
      non_empty_items[annotation_type] = row if row[annotation_type] && !row[annotation_type].empty?
    end
  end

  keys = fasta_mapper ? ['fasta_header'] : []
  keys += (data.first.keys - %w[ec go ipr])
  processed_keys = keys

  non_empty_items.each do |annotation_type, non_empty_item|
    next unless non_empty_item

    keys += (non_empty_item.keys - processed_keys)
    processed_keys += non_empty_item.keys

    idx = keys.index(annotation_type)
    keys.delete_at(idx)
    keys.insert(idx, *non_empty_item[annotation_type].first.keys.map { |el| %w[ec_number go_term ipr_code].include?(el) ? el : "#{annotation_type}_#{el}" })
    $keys_length = *non_empty_item[annotation_type].first.keys.length # rubocop:disable Style/GlobalVars
  end

  keys
end
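
The column naming rule in `get_keys` is easier to follow on a small example. The sketch below is illustrative only: the helpers `annotation_columns` and `expand_column`, the constant `SELF_DESCRIBING`, and the sample entry are hypothetical names introduced here, not part of the library.

```ruby
# Fields whose names already identify their annotation type.
SELF_DESCRIBING = %w[ec_number go_term ipr_code].freeze

# Hypothetical helper mirroring the naming rule shown above: a field is
# prefixed with the annotation type unless its name already identifies it.
def annotation_columns(annotation_type, entry)
  entry.keys.map do |field|
    SELF_DESCRIBING.include?(field) ? field : "#{annotation_type}_#{field}"
  end
end

# Hypothetical helper: replaces the single annotation key (for example 'go')
# in a list of keys by the expanded column names, keeping its position.
def expand_column(keys, annotation_type, entry)
  idx = keys.index(annotation_type)
  return keys unless idx

  keys[0...idx] + annotation_columns(annotation_type, entry) + keys[(idx + 1)..]
end

entry = { 'go_term' => 'GO:0005524', 'protein_count' => 4 }
p annotation_columns('go', entry)
# => ["go_term", "go_protein_count"]
p expand_column(%w[peptide go total_protein_count], 'go', entry)
# => ["peptide", "go_term", "go_protein_count", "total_protein_count"]
```

Unlike the method above, this sketch returns a new array instead of modifying `keys` in place, which keeps the example free of side effects.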

#header(data, fasta_mapper = nil) ⇒ String

Returns the header row for the given data and fasta_mapper. This row contains all the keys of the first element of the data, preceded by ‘fasta_header’ if a fasta_mapper is given.

Parameters:

  • data (Array)

    The data that we will use to extract the keys from.

  • fasta_mapper (Array<Array<String>>) (defaults to: nil)

    Optional mapping between the input data and the corresponding fasta header. The mapping is represented as a list of tuples in which the first element is the fasta header and the second element is the input data. If a fasta_mapper is given, the output will be preceded by ‘fasta_header’.

Returns:

  • (String)

    The header row



# File 'lib/formatters.rb', line 218

def header(data, fasta_mapper = nil)
  keys = get_keys(data, fasta_mapper)

  CSV.generate do |csv|
    csv << keys.map(&:to_s) if keys.length.positive?
  end
end
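
Because the header row and the data rows are generated by separate methods, a caller that receives results in batches would write the header only once, before the first batch. The sketch below shows that pattern with Ruby's standard `csv` library; the helper `render_batches` and the sample rows are hypothetical, and this is not necessarily how the library itself combines `header` and `convert`.

```ruby
require 'csv'

# Hypothetical helper: renders several batches of rows as one CSV document
# and writes the header line only before the first batch.
def render_batches(columns, batches)
  output = String.new
  batches.each_with_index do |batch, index|
    if index.zero? && !columns.empty?
      output << CSV.generate { |csv| csv << columns.map(&:to_s) }
    end

    body = CSV.generate do |csv|
      batch.each { |row| csv << columns.map { |c| row[c] } }
    end
    output << body
  end
  output
end

# Made-up sample data split over two batches.
batches = [
  [{ 'peptide' => 'AALTER', 'taxon_id' => 1 }],
  [{ 'peptide' => 'MDGTEYIIVK', 'taxon_id' => 2 }]
]

puts render_batches(%w[peptide taxon_id], batches)
# prints:
# peptide,taxon_id
# AALTER,1
# MDGTEYIIVK,2
```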

#type ⇒ String

Returns the type of the current formatter: csv.

Returns:

  • (String)

    The type of the current formatter: csv



# File 'lib/formatters.rb', line 168

def type
  'csv'
end