Class: Bio::Blast::Fastacmd

Inherits:
Object
Includes:
Enumerable
Defined in:
lib/bio/io/fastacmd.rb

Overview

DESCRIPTION

Retrieves FASTA-formatted sequences from a BLAST database using the NCBI fastacmd command.

This class requires the ‘fastacmd’ command and a BLAST database formatted with the ‘-o’ option of ‘formatdb’ (so that sequence IDs are parsed and indexed).

USAGE

require 'bio'

fastacmd = Bio::Blast::Fastacmd.new("/db/myblastdb")

entry = fastacmd.get_by_id("sp:128U_DROME")
fastacmd.fetch("sp:128U_DROME")
fastacmd.fetch(["sp:1433_SPIOL", "sp:1432_MAIZE"])

fastacmd.fetch(["sp:1433_SPIOL", "sp:1432_MAIZE"]).each do |fasta|
  puts fasta
end


Constructor Details

#initialize(blast_database_file_path) ⇒ Fastacmd

This method provides a handle to a BLASTable database, which you can then use to retrieve sequences.

Prerequisites:

  • You have created a BLASTable database with the ‘-o T’ option.

  • You have the NCBI fastacmd tool installed.

For example, suppose the original input file looks like:

>my_seq_1
ACCGACCTCCGGAACGGATAGCCCGACCTACG
>my_seq_2
TCCGACCTTTCCTACCGCACACCTACGCCATCAC
...

and you’ve created a BLASTable database from that with the command

cd /my_dir/
formatdb -i my_input_file -t Test -n Test -o T

then you can get a handle to this database with the command

fastacmd = Bio::Blast::Fastacmd.new("/my_dir/Test")

Arguments:

  • blast_database_file_path

    path and base name of the BLASTable database, without a file extension (e.g. "/my_dir/Test")



# File 'lib/bio/io/fastacmd.rb', line 80

def initialize(blast_database_file_path)
  @database = blast_database_file_path
  @fastacmd = 'fastacmd'
end
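The constructor only stores the path: nothing checks that the database or the fastacmd command exists until a method actually runs the command. A minimal pre-flight check might look like the sketch below, assuming a Unix-like system. Both helper names are hypothetical (not part of BioRuby), and the list of index file extensions (.pin/.nin, plus .pal/.nal alias files) is an assumption about formatdb's output.

```ruby
# Hypothetical helper (not part of BioRuby): true if an executable file
# named `command` exists in one of the directories listed in `path`.
# Only the exact name is checked (no Windows .exe/PATHEXT handling).
def command_on_path?(command, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, command)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Hypothetical helper: true if a formatdb index file appears to exist for
# the database base name (e.g. "/my_dir/Test").
# ASSUMPTION: formatdb writes <name>.pin (protein) or <name>.nin
# (nucleotide); .pal/.nal alias files are accepted as well.
def blast_db_index_exists?(db_path)
  %w[.pin .nin .pal .nal].any? { |ext| File.exist?(db_path + ext) }
end
```

For example, both checks could be run on "/my_dir/Test" before that path is passed to Bio::Blast::Fastacmd.new, so that a missing database is reported up front rather than as empty or failed command output later.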

Instance Attribute Details

#database ⇒ Object

Database file path.



# File 'lib/bio/io/fastacmd.rb', line 54

def database
  @database
end

#fastacmd ⇒ Object

File path of the fastacmd command (the constructor sets it to 'fastacmd').



# File 'lib/bio/io/fastacmd.rb', line 57

def fastacmd
  @fastacmd
end

Instance Method Details

#each_entry ⇒ Object

Also known as: each

Iterates over all sequences in the database.

fastacmd.each_entry do |fasta|
  p [ fasta.definition[0..30], fasta.seq.size ]
end

Yields

a Bio::FastaFormat object for each entry

Returns

self



# File 'lib/bio/io/fastacmd.rb', line 129

def each_entry
  cmd = [ @fastacmd, '-d', @database, '-D', '1' ]
  Bio::Command.call_command(cmd) do |io|
    io.close_write
    Bio::FlatFile.open(Bio::FastaFormat, io) do |f|
      f.each_entry do |entry|
        yield entry
      end
    end
  end
  self
end
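each_entry streams the output of `fastacmd -D 1` (a FASTA dump of the whole database) through Bio::FlatFile and yields one entry at a time. Because the class includes Enumerable and aliases each to each_entry, methods such as map, select and count also work on a Fastacmd object. The same shape can be sketched without the fastacmd binary by reading FASTA text from a String instead of a pipe. FastaEntry and FastaStream are hypothetical stand-ins for Bio::FastaFormat and Bio::Blast::Fastacmd, and the parser is a simplified substitute for Bio::FlatFile, not its actual logic.

```ruby
# Hypothetical stand-in for Bio::FastaFormat: only the two fields used in
# the each_entry example above.
FastaEntry = Struct.new(:definition, :seq)

# Hypothetical class mirroring the shape of Fastacmd#each_entry: yield one
# entry at a time, return self, and expose `each` so Enumerable works.
class FastaStream
  include Enumerable

  def initialize(text)
    @text = text # FASTA text instead of fastacmd's piped output
  end

  def each_entry
    # Return an Enumerator when called without a block (an addition; the
    # BioRuby method shown above has no such check).
    return enum_for(:each_entry) unless block_given?

    definition = nil
    seq = +''
    @text.each_line do |line|
      line = line.chomp
      if line.start_with?('>')
        # A new header line ends the previous entry, if there is one.
        yield FastaEntry.new(definition, seq) if definition
        definition = line[1..-1]
        seq = +''
      else
        # Sequence lines are concatenated without whitespace.
        seq << line.strip
      end
    end
    yield FastaEntry.new(definition, seq) if definition
    self
  end
  alias each each_entry
end
```

With this in place, `FastaStream.new(text).map { |e| e.seq.size }` behaves like the `fastacmd.each_entry` loop above, collecting one sequence length per entry.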

#fetch(list) ⇒ Object

Gets the sequences for one or more IDs in the database.

For example:

p fastacmd.fetch(["sp:1433_SPIOL", "sp:1432_MAIZE"])

This method always returns an array of Bio::FastaFormat objects, even when the result is a single entry.


Arguments:

  • list: a single ID, or an array of IDs, to retrieve from the database

Returns

array of Bio::FastaFormat objects



# File 'lib/bio/io/fastacmd.rb', line 108

def fetch(list)
  if list.respond_to?(:join)
    entry_id = list.join(",")
  else
    entry_id = list
  end

  cmd = [ @fastacmd, '-d', @database, '-s', entry_id ]
  Bio::Command.call_command(cmd) do |io|
    io.close_write
    Bio::FlatFile.new(Bio::FastaFormat, io).to_a
  end
end
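fetch accepts either a single ID or an array of IDs: anything that responds to join is joined with commas and passed to fastacmd's -s option in a single call, and any other value is passed through unchanged. The hypothetical helper below (not part of BioRuby) isolates that command-building step so it can be checked without running fastacmd.

```ruby
# Hypothetical helper (not part of BioRuby) reproducing how Fastacmd#fetch
# builds its command line.
def fetch_command(database, list, fastacmd = 'fastacmd')
  # Arrays (anything responding to #join) become one comma-separated
  # argument; a single ID String is used as-is.
  entry_id = list.respond_to?(:join) ? list.join(',') : list
  [fastacmd, '-d', database, '-s', entry_id]
end
```

Because both forms go through the same call and the parsed output is converted with to_a, fetch returns an array even for a single ID; get_by_id then simply takes the first element of that array.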

#get_by_id(entry_id) ⇒ Object

Gets the sequence of a single entry in the BLASTable database. For example:

entry = fastacmd.get_by_id("sp:128U_DROME")

Arguments:

  • entry_id: ID of an entry in the BLAST database

Returns

a Bio::FastaFormat object (the first entry returned by #fetch), or nil if #fetch returns no entries



# File 'lib/bio/io/fastacmd.rb', line 93

def get_by_id(entry_id)
  fetch(entry_id).shift
end
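Since get_by_id is just fetch(entry_id).shift, it returns the first entry, or nil when fetch finds nothing. Code that depends on a Fastacmd can therefore be unit-tested without the binary or a database by substituting any object with the same two methods. FakeFastacmd below is a hypothetical test double, not part of BioRuby; it stores plain FASTA strings rather than Bio::FastaFormat objects, and silently omitting unknown IDs is an assumption, not documented fastacmd behaviour.

```ruby
# Hypothetical in-memory test double (not part of BioRuby) following the
# same contract as Fastacmd#fetch and Fastacmd#get_by_id.
class FakeFastacmd
  def initialize(entries)
    @entries = entries # Hash: ID => FASTA text (plain Strings)
  end

  # Like Fastacmd#fetch: one ID or an Array of IDs, always returns an Array.
  # ASSUMPTION: unknown IDs are simply omitted.
  def fetch(list)
    ids = list.respond_to?(:join) ? list : [list]
    ids.map { |id| @entries[id] }.compact
  end

  # Like Fastacmd#get_by_id: the first fetched entry, or nil if none.
  def get_by_id(entry_id)
    fetch(entry_id).shift
  end
end
```

An object like this can be passed wherever the code under test expects a Bio::Blast::Fastacmd, as long as it only calls fetch and get_by_id.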