Class: Bio::Blast

Inherits:
Object
Defined in:
lib/bio/appl/blast.rb,
lib/bio/io/fastacmd.rb,
lib/bio/appl/blast/rexml.rb,
lib/bio/appl/blast/remote.rb,
lib/bio/appl/blast/report.rb,
lib/bio/appl/bl2seq/report.rb,
lib/bio/appl/blast/format0.rb,
lib/bio/appl/blast/format8.rb,
lib/bio/appl/blast/wublast.rb,
lib/bio/appl/blast/rpsblast.rb,
lib/bio/appl/blast/xmlparser.rb,
lib/bio/appl/blast/ncbioptions.rb

Overview

Description

The Bio::Blast class contains methods for running local or remote BLAST searches and for parsing their output (the BLAST reports). For more information on similarity searches and the BLAST program, see www.ncbi.nlm.nih.gov/Education/BLASTinfo/similarity.html.

Usage

require 'bio'

# To run an actual BLAST analysis:
#   1. create a BLAST factory
remote_blast_factory = Bio::Blast.remote('blastp', 'swissprot',
                                         '-e 0.0001', 'genomenet')
# or:
local_blast_factory = Bio::Blast.local('blastn', '/path/to/db')

#   2. run the actual BLAST by querying the factory
#      with a FASTA-formatted String (or a Bio::Sequence object)
sequence_text = ">lcl|MySequence\nMPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF\n"
report = remote_blast_factory.query(sequence_text)

# Then, to parse the report, see Bio::Blast::Report

See also

  • Bio::Blast::Report

  • Bio::Blast::Report::Hit

  • Bio::Blast::Report::Hsp


Defined Under Namespace

Modules: Default, RPSBlast, Remote, WU

Classes: Bl2seq, Fastacmd, NCBIOptions, Report, Report_tab


Constructor Details

#initialize(program, db, opt = [], server = 'local') ⇒ Blast

Creates a Bio::Blast factory object.

To run a BLAST search, you first create a factory that describes the BLAST pipeline: the program to use, the database to search, any options, and the server to use. For example:

blast_factory = Bio::Blast.new('blastn','dbsts', '-e 0.0001 -r 4', 'genomenet')

Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’

  • db (required): name of the (local or remote) database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (e.g. ‘genomenet’; DEFAULT = ‘local’)

Returns

Bio::Blast factory object



# File 'lib/bio/appl/blast.rb', line 316

def initialize(program, db, opt = [], server = 'local')
  @program  = program
  @db       = db

  @blastall = 'blastall'
  @matrix   = nil
  @filter   = nil

  @output   = ''
  @parser   = nil
  @format   = nil

  @options = set_options(opt, program, db)
  self.server = server
end

Instance Attribute Details

#blastall ⇒ Object

Full path to the blastall executable (default: ‘blastall’).



# File 'lib/bio/appl/blast.rb', line 279

def blastall
  @blastall
end

#db ⇒ Object

Database name (-d option for blastall)



# File 'lib/bio/appl/blast.rb', line 248

def db
  @db
end

#filter ⇒ Object

Filter option for blastall -F (T or F).



# File 'lib/bio/appl/blast.rb', line 285

def filter
  @filter
end

#format ⇒ Object

Output report format for blastall -m (an integer):

0, pairwise; 1; 2; 3; 4; 5; 6; 7, XML BLAST output; 8, tabular; 9, tabular with comment lines; 10, ASN text; 11, ASN binary.



# File 'lib/bio/appl/blast.rb', line 294

def format
  @format
end

#matrix ⇒ Object

Substitution matrix for blastall -M



# File 'lib/bio/appl/blast.rb', line 282

def matrix
  @matrix
end

#options ⇒ Object

Options for blastall



# File 'lib/bio/appl/blast.rb', line 251

def options
  @options
end

#output ⇒ Object (readonly)

Returns a String containing the raw BLAST output, in the format selected by Bio::Blast#format.



# File 'lib/bio/appl/blast.rb', line 288

def output
  @output
end

#parser=(value) ⇒ Object (writeonly)

Selects the parser used for the BLAST output: :xmlparser, :rexml or :tab.



# File 'lib/bio/appl/blast.rb', line 297

def parser=(value)
  @parser = value
end
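The -m values listed under #format and the parser symbols accepted by #parser= go together: in Bio::Blast.reports, XML output is read with :xmlparser or :rexml and tabular output with :tab. The following minimal sketch, in plain Ruby, records that pairing in a lookup table. The constant BLASTALL_FORMATS and the method suggested_parser are hypothetical (not part of BioRuby), and returning nil for the other formats, on the assumption that the default handling applies, is a simplification.

```ruby
# Hypothetical lookup table (not BioRuby API): some blastall -m values,
# as described for Bio::Blast#format.
BLASTALL_FORMATS = {
  0  => 'pairwise',
  7  => 'XML BLAST output',
  8  => 'tabular',
  9  => 'tabular with comment lines',
  10 => 'ASN text',
  11 => 'ASN binary'
}.freeze

# Hypothetical helper: suggests a symbol for Bio::Blast#parser=.
# Returns nil when no explicit parser is suggested (assumption: the
# default report handling is used for those formats).
def suggested_parser(format, xml_parser: :rexml)
  case format.to_i
  when 7 then xml_parser # :xmlparser or :rexml
  when 8 then :tab
  else nil
  end
end

puts BLASTALL_FORMATS[7]                        # prints "XML BLAST output"
p suggested_parser(8)                           # prints :tab
p suggested_parser('7', xml_parser: :xmlparser) # prints :xmlparser
```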

#program ⇒ Object

Program name (-p option for blastall): blastp, blastn, blastx, tblastn or tblastx



# File 'lib/bio/appl/blast.rb', line 245

def program
  @program
end

#server ⇒ Object

Server to which BLAST searches are submitted (e.g. ‘local’ or ‘genomenet’).



# File 'lib/bio/appl/blast.rb', line 259

def server
  @server
end
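Several of the attributes above correspond directly to blastall flags: -p (program), -d (db), -m (format), -M (matrix) and -F (filter). Under that assumption, the sketch below assembles such settings into an argument vector that starts with the blastall path. The method build_blastall_argv is hypothetical; BioRuby builds its command line internally and may order or merge the options differently.

```ruby
# Hypothetical sketch (not BioRuby's implementation): turn factory-style
# settings into a blastall argument vector. nil settings are omitted.
def build_blastall_argv(program:, db:, blastall: 'blastall',
                        format: nil, matrix: nil, filter: nil, options: [])
  argv = [blastall, '-p', program, '-d', db]
  argv.concat(['-m', format.to_s]) if format # e.g. 7 for XML, 8 for tabular
  argv.concat(['-M', matrix])      if matrix # substitution matrix name
  argv.concat(['-F', filter])      if filter # 'T' or 'F'
  argv.concat(options)                       # remaining options, already split
  argv
end

argv = build_blastall_argv(program: 'blastp', db: 'swissprot', format: 7,
                           options: ['-e', '0.0001'])
p argv
# prints ["blastall", "-p", "blastp", "-d", "swissprot", "-m", "7", "-e", "0.0001"]
```

Keeping the command as an Array lets it be passed to Kernel#system or Open3 as separate arguments, so no shell quoting is involved.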

Class Method Details

.local(program, db, options = '', blastall = nil) ⇒ Object

This is a shortcut for Bio::Blast.new:

Bio::Blast.local(program, database, options)

is equivalent to

Bio::Blast.new(program, database, options, 'local')

Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’

  • db (required): name of the local database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • blastall: full path to blastall program (e.g. “/opt/bin/blastall”; DEFAULT: “blastall”)

Returns

Bio::Blast factory object



# File 'lib/bio/appl/blast.rb', line 78

def self.local(program, db, options = '', blastall = nil)
  f = self.new(program, db, options, 'local')
  if blastall then
    f.blastall = blastall
  end
  f
end

.remote(program, db, option = '', server = 'genomenet') ⇒ Object

Bio::Blast.remote does exactly the same as Bio::Blast.new, but sets the remote server ‘genomenet’ as its default.


Arguments:

  • program (required): ‘blastn’, ‘blastp’, ‘blastx’, ‘tblastn’ or ‘tblastx’

  • db (required): name of the remote database

  • options: blastall options (see www.genome.jp/dbget-bin/show_man?blast2)

  • server: server to use (DEFAULT = ‘genomenet’)

Returns

Bio::Blast factory object



# File 'lib/bio/appl/blast.rb', line 96

def self.remote(program, db, option = '', server = 'genomenet')
  self.new(program, db, option, server)
end

.reports(input, parser = nil) ⇒ Object

Bio::Blast.reports parses the given data and returns an array of report objects (Bio::Blast::Report or Bio::Blast::Default::Report), or yields each report object when a block is given.

Supported formats: NCBI default (-m 0), XML (-m 7), tabular (-m 8).


Arguments:

  • input (required): input data

  • parser: type of parser. see Bio::Blast::Report.new

Returns

Undefined when a block is given (the current implementation returns an empty Array). Otherwise, an Array containing report (Bio::Blast::Report or Bio::Blast::Default::Report) objects.



# File 'lib/bio/appl/blast.rb', line 113

def self.reports(input, parser = nil)
  begin
    istr = input.to_str
  rescue NoMethodError
    istr = nil
  end
  if istr then
    input = StringIO.new(istr)
  end
  raise 'unsupported input data type' unless input.respond_to?(:gets)

  # if proper parser is given, emulates old behavior.
  case parser
  when :xmlparser, :rexml
    ff = Bio::FlatFile.new(Bio::Blast::Report, input)
    if block_given? then
      ff.each do |e|
        yield e
      end
      return []
    else
      return ff.to_a
    end
  when :tab
    istr = input.read unless istr
    rep = Report.new(istr, parser)
    if block_given? then
      yield rep
      return []
    else
      return [ rep ]
    end
  end

  # preparation of the new format autodetection rule if needed
  if !defined?(@@report_format_autodetection_rule) or
      !@@report_format_autodetection_rule then
    regrule = Bio::FlatFile::AutoDetect::RuleRegexp
    blastxml = regrule[ 'Bio::Blast::Report',
                        /\<\!DOCTYPE BlastOutput PUBLIC / ]
    blast    = regrule[ 'Bio::Blast::Default::Report',
                        /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tblast   = regrule[ 'Bio::Blast::Default::Report_TBlast',
                        /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/ ]
    tab      = regrule[ 'Bio::Blast::Report_tab',
                        /^([^\t]*\t){11}[^\t]*$/ ]
    auto = Bio::FlatFile::AutoDetect[ blastxml,
                                      blast,
                                      tblast,
                                      tab
                                    ]
    # sets priorities
    blastxml.is_prior_to blast
    blast.is_prior_to tblast
    tblast.is_prior_to tab
    # rehash
    auto.rehash
    @@report_format_autodetection_rule = auto
  end

  # Creates a FlatFile object with dummy class
  ff = Bio::FlatFile.new(Object, input)
  ff.dbclass = nil

  # file format autodetection
  3.times do
    break if ff.eof? or
      ff.autodetect(31, @@report_format_autodetection_rule)
  end
  # If format detection failed, assumed to be tabular (-m 8)
  ff.dbclass = Bio::Blast::Report_tab unless ff.dbclass

  if block_given? then
    ff.each do |entry|
      yield entry
    end
    ret = []
  else
    ret = ff.to_a
  end
  ret
end
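Bio::Blast.reports recognizes the report format by matching the start of the input against the regular expressions above, in priority order XML, default BLAST, default TBLAST, then tabular, and falls back to tabular (-m 8) when nothing matches. The plain-Ruby sketch below applies those expressions (the XML one written without the redundant escapes) to a String. The method detect_blast_format and the returned symbols are hypothetical, and checking only the first 31 lines, echoing the argument passed to ff.autodetect, is an assumption; this is not a reimplementation of Bio::FlatFile::AutoDetect.

```ruby
# Hypothetical sketch: guess the BLAST report format of a String using the
# regular expressions from Bio::Blast.reports, tried in priority order.
BLAST_FORMAT_RULES = [
  [:xml,     /<!DOCTYPE BlastOutput PUBLIC /],
  [:default, /^BLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/],
  [:tblast,  /^TBLAST.? +[\-\.\w]+ +\[[\-\.\w ]+\]/],
  [:tab,     /^([^\t]*\t){11}[^\t]*$/]
].freeze

def detect_blast_format(text, max_lines = 31)
  head = text.each_line.first(max_lines).join # only inspect the beginning
  BLAST_FORMAT_RULES.each do |name, regexp|
    return name if regexp.match?(head)
  end
  :tab # like Bio::Blast.reports, assume tabular (-m 8) when nothing matched
end

p detect_blast_format("BLASTP 2.2.18 [Mar-02-2008]\n") # prints :default
```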

.reports_xml(input, parser = nil) ⇒ Object

This is the old implementation of Bio::Blast.reports, kept for compatibility with older BLAST XML documents that neither the new Bio::Blast.reports nor Bio::FlatFile may be able to parse (though it is not certain that such documents exist).

Bio::Blast.reports_xml parses given data, and returns an array of Bio::Blast::Report objects, or yields each Bio::Blast::Report object when a block is given.

It can be used only for the XML format. For the default (-m 0) format, use Bio::FlatFile or Bio::Blast.reports instead.


Arguments:

  • input (required): input data

  • parser: type of parser. see Bio::Blast::Report.new

Returns

Undefined when a block is given (the current implementation returns an empty Array). Otherwise, an Array containing Bio::Blast::Report objects.



# File 'lib/bio/appl/blast.rb', line 219

def self.reports_xml(input, parser = nil)
  ary = []
  input.each_line("</BlastOutput>\n") do |xml|
    xml.sub!(/[^<]*(<?)/, '\1') # skip before <?xml> tag
    next if xml.empty?          # skip trailing no hits
    rep = Report.new(xml, parser)
    if rep.reports then
      if block_given?
        rep.reports.each { |r| yield r }
      else
        ary.concat rep.reports
      end
    else
      if block_given?
        yield rep
      else
        ary.push rep
      end
    end
  end
  return ary
end
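Bio::Blast.reports_xml splits its input into separate XML documents by reading up to each closing </BlastOutput> tag, then discards any text before the first '<' and skips chunks that end up empty. The standalone sketch below applies the same two steps to a String and returns the document texts without parsing them. The method split_blast_xml is hypothetical; the substitution is taken from the source above, using the non-destructive String#sub.

```ruby
# Hypothetical sketch of the splitting step in Bio::Blast.reports_xml:
# one chunk per XML document, each ending with "</BlastOutput>\n".
def split_blast_xml(input)
  docs = []
  input.each_line("</BlastOutput>\n") do |chunk|
    xml = chunk.sub(/[^<]*(<?)/, '\1') # drop text before the first '<'
    next if xml.empty?                 # trailing text without a document
    docs << xml
  end
  docs
end

two = "<?xml version=\"1.0\"?>\n<BlastOutput>a</BlastOutput>\n" \
      "junk\n<?xml version=\"1.0\"?>\n<BlastOutput>b</BlastOutput>\n\n"
p split_blast_xml(two).size # prints 2
```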

Instance Method Details

#option ⇒ Object

Returns the blastall options (see #options) joined into a single command-line String; kept for backward compatibility.



# File 'lib/bio/appl/blast.rb', line 373

def option
  # backward compatibility
  Bio::Command.make_command_line(options)
end

#option=(str) ⇒ Object

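Sets the blastall options from a command-line String, which is split into an Array with Shellwords; kept for backward compatibility.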



# File 'lib/bio/appl/blast.rb', line 379

def option=(str)
  # backward compatibility
  self.options = Shellwords.shellwords(str)
end
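#option= splits a command-line String into the Array stored in #options using Shellwords.shellwords, and #option joins that Array back into one String with Bio::Command.make_command_line. The sketch below shows this round trip with the Ruby standard library only. Shellwords.join stands in for Bio::Command.make_command_line, whose quoting rules may differ (an assumption), and the options themselves are only illustrative.

```ruby
require 'shellwords'

# Splitting: what #option= does to its argument.
options = Shellwords.shellwords("-e 0.0001 -o '/tmp/my results.txt'")
p options
# prints ["-e", "0.0001", "-o", "/tmp/my results.txt"]

# Joining: a stdlib stand-in for Bio::Command.make_command_line.
line = Shellwords.join(options)
puts line
# prints -e 0.0001 -o /tmp/my\ results.txt

# The round trip preserves the individual arguments.
p Shellwords.shellwords(line) == options # prints true
```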

#query(query) ⇒ Object

This method submits a sequence to a BLAST factory, which performs the actual BLAST.

# example 1
seq = Bio::Sequence::NA.new('agggcattgccccggaagatcaagtcgtgctcctg')
report = blast_factory.query(seq)

# example 2
str = <<END_OF_FASTA
>lcl|MySequence
MPPSAISKISNSTTPQVQSSSAPNLTMLEGKGISVEKSFRVYSEEENQNQHKAKDSLGF
KELEKDAIKNSKQDKKDHKNWLETLYDQAEQKWLQEPKKKLQDLIKNSGDNSRVILKDS
END_OF_FASTA
report = blast_factory.query(str)

Bug note: When a multi-FASTA query is given and the format is 7 (XML) or 8 (tab), it should return an array of Bio::Blast::Report objects, but it returns a single Bio::Blast::Report object. This is a known bug and should be fixed in the future.


Arguments:

  • query (required): single- or multiple-FASTA formatted sequence(s)

Returns

A Bio::Blast::Report (or Bio::Blast::Default::Report) object when a single query is given. When multiple sequences are given as the query, an array of Bio::Blast::Report (or Bio::Blast::Default::Report) objects is returned. If the result cannot be parsed, nil is returned.



# File 'lib/bio/appl/blast.rb', line 357

def query(query)
  case query
  when Bio::Sequence
    query = query.output(:fasta)
  when Bio::Sequence::NA, Bio::Sequence::AA, Bio::Sequence::Generic
    query = query.to_fasta('query', 70)
  else
    query = query.to_s
  end

  @output = self.__send__("exec_#{@server}", query)
  report = parse_result(@output)
  return report
end
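Before running BLAST, #query turns sequence objects into FASTA text (for Bio::Sequence::NA and the related classes via to_fasta('query', 70), i.e. a "query" header and 70 residues per line) and calls to_s on anything else. The plain-Ruby sketch below reproduces that kind of formatting for a raw sequence String, and also counts the records in a FASTA String, which tells a caller whether to expect a single report or an Array (subject to the bug note above). Both helpers are hypothetical and not part of BioRuby.

```ruby
# Hypothetical helper: format a raw sequence as FASTA with a fixed line
# width, similar in spirit to to_fasta('query', 70) in #query.
def to_fasta_text(header, seq, width = 70)
  residues = seq.gsub(/\s+/, '')         # drop whitespace and newlines
  lines = residues.scan(/.{1,#{width}}/) # wrap into lines of at most width
  ">#{header}\n" + lines.map { |l| "#{l}\n" }.join
end

# Hypothetical helper: number of records in a (multi-)FASTA String.
def fasta_record_count(text)
  text.each_line.count { |line| line.start_with?('>') }
end

fasta = to_fasta_text('query', 'agggcattgccccggaagatcaagtcgtgctcctg', 10)
puts fasta
p fasta_record_count(fasta) # prints 1
```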