Class: Bio::Blat::Report::Hit

Inherits:
Object
Defined in:
lib/bio/appl/blat/report.rb

Overview

Hit class for the BLAT result parser. It is similar to Bio::Blast::Report::Hit but provides far fewer methods. A Hit object may contain one or more Bio::Blat::Report::SegmentPair objects.

Constructor Details

#initialize(str) ⇒ Hit

Creates a new Hit object from a piece of BLAT result text. It is designed to be called internally by a Bio::Blat::Report object; users should not call it directly.



# File 'lib/bio/appl/blat/report.rb', line 293

def initialize(str)
  @data = str.chomp.split(/\t/)
end
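
As a rough illustration of what the constructor stores, the sketch below applies the same chomp-and-split step to one tab-separated line. The sample line is invented for this example and does not come from a real BLAT run.

```ruby
# Hypothetical PSL-style line with 21 tab-separated columns (values invented).
line = %w[
  59 1 0 0 1 2 0 0 + query1 62 0 62 chr1 1000 100 160 2 30,30, 0,32, 100,130,
].join("\t") + "\n"

# Same operation as in Hit#initialize: drop the newline, split on tabs.
data = line.chomp.split(/\t/)

puts data.size   # number of columns
puts data[9]     # query name
puts data[18]    # block sizes, still a raw comma-separated string
```

Every field stays a string at this point; the accessors convert individual fields on demand.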

Instance Attribute Details

#data ⇒ Object (readonly)

Raw data of the hit, as an array of the tab-separated fields. Position numbers are kept as they appear in the BLAT output; 1 is not added to them.



# File 'lib/bio/appl/blat/report.rb', line 299

def data
  @data
end
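
The array indices used by the accessors on this page follow the PSL column order. The sketch below names those columns so the indices are easier to read; `PSL_COLUMNS` and the column labels are illustrative choices, not constants or names defined by Bio::Blat, and the sample row is invented.

```ruby
# Column positions in PSL order (index 0 first).
# PSL_COLUMNS is a hypothetical name used only for this illustration.
PSL_COLUMNS = %w[
  match mismatch rep_match n_s
  q_gap_count q_gap_bases t_gap_count t_gap_bases
  strand
  q_name q_size q_start q_end
  t_name t_size t_start t_end
  block_count block_sizes q_starts t_starts
].freeze

# Invented sample row, already split into fields.
data = %w[59 1 0 0 1 2 0 0 + query1 62 0 62 chr1 1000 100 160 2 30,30, 0,32, 100,130,]

# Pair each label with its raw (string) value.
row = PSL_COLUMNS.zip(data).to_h
puts row["strand"]
puts row["block_count"]
```

Output that also carries sequences has two further columns (indices 21 and 22), which the query and target methods read.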

Instance Method Details

#block_count ⇒ Object

Number of blocks (exons, segment pairs).



# File 'lib/bio/appl/blat/report.rb', line 350

def block_count; @data[17].to_i; end

#block_sizes ⇒ Object

Sizes of all blocks (exons, segment pairs). Returns an array of integers.



# File 'lib/bio/appl/blat/report.rb', line 354

def block_sizes
  unless defined?(@block_sizes) then
    @block_sizes = split_comma(@data[18]).collect { |x| x.to_i }
  end
  @block_sizes
end
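
The private `split_comma` helper is not shown on this page. The sketch below assumes it simply breaks a comma-terminated list into its items; `parse_int_list` is a hypothetical stand-in, not the library's helper.

```ruby
# Hypothetical stand-in for split_comma followed by integer conversion.
# Assumption: the field looks like "30,30," (comma-separated, trailing comma).
def parse_int_list(field)
  # String#split drops trailing empty strings, so the final comma is harmless.
  field.to_s.split(",").map { |x| x.to_i }
end

sizes = parse_int_list("30,30,")
p sizes
```

Passing `nil` (a missing column) yields an empty array under this assumption.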

#blocks ⇒ Object Also known as: exons, hsps

Returns the blocks (exons, segment pairs) of the hit as an array of Bio::Blat::Report::SegmentPair objects.



# File 'lib/bio/appl/blat/report.rb', line 363

def blocks
  unless defined?(@blocks)
    bs    = block_sizes
    qst   = query.starts
    tst   = target.starts
    qseqs = query.seqs
    tseqs = target.seqs
    pflag = self.protein?
    @blocks = (0...block_count).collect do |i|
      SegmentPair.new(query.size, target.size, strand, bs[i],
                      qst[i], tst[i], qseqs[i], tseqs[i],
                      pflag)
    end
  end
  @blocks
end
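
The core of the method is an index-wise pairing of the parallel arrays. A minimal sketch of that pairing follows, using a hypothetical `Block` struct in place of SegmentPair and invented coordinates.

```ruby
# Hypothetical lightweight record; the real method builds SegmentPair objects.
Block = Struct.new(:block_size, :qstart, :tstart)

# Invented parallel arrays, one entry per block.
block_sizes   = [30, 30]
query_starts  = [0, 32]
target_starts = [100, 130]

# Combine the i-th entry of each array into one record.
blocks = (0...block_sizes.size).map do |i|
  Block.new(block_sizes[i], query_starts[i], target_starts[i])
end

blocks.each do |b|
  puts "size=#{b.block_size} q=#{b.qstart} t=#{b.tstart}"
end
```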

#each(&x) ⇒ Object

Iterates over each block (exon, segment pair) of the hit. Yields a Bio::Blat::Report::SegmentPair object.



# File 'lib/bio/appl/blat/report.rb', line 404

def each(&x) #:yields: segmentpair
  exons.each(&x)
end

#match ⇒ Object

Number of matching nucleotides.



# File 'lib/bio/appl/blat/report.rb', line 332

def match;       @data[0].to_i;  end

#milli_bad ⇒ Object

Calculates the pslCalcMilliBad value defined in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4), from which the algorithm is taken.



# File 'lib/bio/appl/blat/report.rb', line 418

def milli_bad
  w = (self.protein? ? 3 : 1)
  qalen = w * (self.query.end - self.query.start)
  talen = self.target.end - self.target.start
  alen = (if qalen < talen then qalen; else talen; end)
  return 0 if alen <= 0
  d = qalen - talen
  d = 0 if d < 0
  total = w * (self.match + self.rep_match + self.mismatch)
  return 0 if total == 0
  return (1000 * (self.mismatch * w + self.query.gap_count +
                    (3 * Math.log(1 + d)).round) / total)
end
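
To make the arithmetic concrete, the sketch below restates the same steps as a standalone function over plain integers. The function name, keyword names, and input values are illustrative and not part of Bio::Blat.

```ruby
# Standalone restatement of the milli_bad arithmetic (illustrative names).
def milli_bad_sketch(match:, rep_match:, mismatch:, q_gap_count:,
                     q_start:, q_end:, t_start:, t_end:, protein: false)
  w = protein ? 3 : 1                 # protein residues count as 3 bases
  qalen = w * (q_end - q_start)       # aligned span on the query
  talen = t_end - t_start             # aligned span on the target
  return 0 if [qalen, talen].min <= 0
  d = [qalen - talen, 0].max          # only a longer query span is penalized
  total = w * (match + rep_match + mismatch)
  return 0 if total == 0
  # Integer division, as in the method above.
  1000 * (mismatch * w + q_gap_count + (3 * Math.log(1 + d)).round) / total
end

mb = milli_bad_sketch(match: 59, rep_match: 0, mismatch: 1, q_gap_count: 1,
                      q_start: 0, q_end: 62, t_start: 100, t_end: 160)
puts mb
```

With these invented inputs the spans are 62 and 60, so d is 2, the rounded log term is 3, and the result is 1000 * (1 + 1 + 3) / 60, which integer division truncates to 83.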

#mismatch ⇒ Object

Number of mismatching nucleotides.



# File 'lib/bio/appl/blat/report.rb', line 334

def mismatch;    @data[1].to_i;  end

#n_s ⇒ Object

“N’s”. Number of ‘N’ bases.



# File 'lib/bio/appl/blat/report.rb', line 342

def n_s;         @data[3].to_i;  end

#percent_identity ⇒ Object

Calculates the percent identity compatible with the BLAT web server, as described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4), from which the algorithm is taken.



# File 'lib/bio/appl/blat/report.rb', line 438

def percent_identity
  100.0 - self.milli_bad * 0.1
end
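
Since milli_bad is expressed in tenths of a percent, the conversion is a single subtraction. A small worked example follows, with a hypothetical milli_bad value of 83.

```ruby
# Hypothetical milli_bad value (tenths of a percent of "badness").
milli_bad = 83

# Same formula as percent_identity above.
percent_identity = 100.0 - milli_bad * 0.1

# Round for display; the raw value is a Float.
puts percent_identity.round(1)
```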

#protein? ⇒ Boolean

Returns true when the output data comes from a protein query and false when it comes from a nucleotide query. Returns nil if this cannot be determined.

The algorithm is taken from the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4).

Note: It appears to return true only for a protein query against a translated nucleotide database (blat options: -q=prot -t=dnax).

Returns:

  • (Boolean)


# File 'lib/bio/appl/blat/report.rb', line 451

def protein?
  return nil if self.block_sizes.empty?
  case self.strand[1,1]
  when '+'
    if self.target.end == self.target.starts[-1] +
        3 * self.block_sizes[-1] then
      true
    else
      false
    end
  when '-'
    if self.target.start == self.target.size -
        self.target.starts[-1] - 3 * self.block_sizes[-1] then
      true
    else
      false
    end
  else
    nil
  end
end
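
The check asks whether the last block, measured in units of three target bases, ends exactly at the target end of the alignment. The sketch below restates it over plain values; the function and argument names are hypothetical and the coordinates are invented.

```ruby
# Illustrative restatement of the check in protein? (hypothetical names).
def protein_sketch(strand:, t_size:, t_start:, t_end:, last_t_start:, last_block_size:)
  # The second strand character is the target strand; it is "" for "+" or "-".
  case strand[1, 1]
  when "+"
    t_end == last_t_start + 3 * last_block_size
  when "-"
    t_start == t_size - last_t_start - 3 * last_block_size
  else
    nil
  end
end

# Last block of 20 residues starting at 1000 ends at 1000 + 3 * 20 = 1060.
p protein_sketch(strand: "++", t_size: 5000, t_start: 940, t_end: 1060,
                 last_t_start: 1000, last_block_size: 20)

# A one-character strand has no second character, so nothing can be decided.
p protein_sketch(strand: "+", t_size: 5000, t_start: 940, t_end: 1060,
                 last_t_start: 1000, last_block_size: 20)
```

In this sketch a plain one-character strand falls through to the `else` branch, which is one way the "cannot be determined" case arises.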

#query ⇒ Object

Returns sequence information for the query as a Bio::Blat::Report::SeqDesc object. This method is specific to Bio::Blat.



# File 'lib/bio/appl/blat/report.rb', line 310

def query
  unless defined?(@query)
    d = @data
    @query = SeqDesc.new(d[4], d[5], d[9], d[10], d[11], d[12],
                         split_comma(d[19]), split_comma(d[21]))
  end
  @query
end

#query_def ⇒ Object Also known as: query_id

Returns the name of the query sequence.



# File 'lib/bio/appl/blat/report.rb', line 390

def query_def;  query.name;  end

#query_len ⇒ Object

Returns the length of the query sequence.



# File 'lib/bio/appl/blat/report.rb', line 387

def query_len;  query.size;  end

#rep_match ⇒ Object

“rep. match”. Number of bases that match but are part of repeats. Note that the current version of BLAT always sets this to 0.



# File 'lib/bio/appl/blat/report.rb', line 339

def rep_match;   @data[2].to_i;  end

#score ⇒ Object

Calculates the score compatible with the BLAT web server, as described in the BLAT FAQ (genome.ucsc.edu/FAQ/FAQblat#blat4), from which the algorithm is taken.



# File 'lib/bio/appl/blat/report.rb', line 479

def score
  w = (self.protein? ? 3 : 1)
  w * (self.match + (self.rep_match >> 1)) -
    w * self.mismatch - self.query.gap_count - self.target.gap_count
end
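
A worked example may help here. The sketch below restates the formula as a standalone function; the function name, keyword names, and counts are illustrative and not part of Bio::Blat.

```ruby
# Standalone restatement of the score formula (illustrative names).
def score_sketch(match:, rep_match:, mismatch:, q_gap_count:, t_gap_count:, protein: false)
  w = protein ? 3 : 1   # protein residues count as 3 bases
  # Repeat matches count half (integer shift); each gap opening costs 1.
  w * (match + (rep_match >> 1)) - w * mismatch - q_gap_count - t_gap_count
end

# Invented nucleotide hit: 59 matches, 1 mismatch, 1 gap in the query.
puts score_sketch(match: 59, rep_match: 0, mismatch: 1, q_gap_count: 1, t_gap_count: 0)

# Same counts treated as a protein hit.
puts score_sketch(match: 59, rep_match: 0, mismatch: 1, q_gap_count: 1,
                  t_gap_count: 0, protein: true)
```

The nucleotide case evaluates to 59 - 1 - 1 - 0 = 57, and the protein case to 3 * 59 - 3 * 1 - 1 - 0 = 173. Gap penalties are not scaled by the weight.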

#strand ⇒ Object

Returns strand information of the hit, ‘+’ or ‘-’. For translated alignments the value has two characters, and protein? examines the second one. This method is specific to Bio::Blat.



# File 'lib/bio/appl/blat/report.rb', line 347

def strand;      @data[8];       end

#target ⇒ Object

Returns sequence information for the target (hit) as a Bio::Blat::Report::SeqDesc object. This method is specific to Bio::Blat.



# File 'lib/bio/appl/blat/report.rb', line 322

def target
  unless defined?(@target)
    d = @data
    @target = SeqDesc.new(d[6], d[7], d[13], d[14], d[15], d[16],
                          split_comma(d[20]), split_comma(d[22]))
  end
  @target
end

#target_def ⇒ Object Also known as: target_id, definition

Returns the name of the target (subject) sequence.



# File 'lib/bio/appl/blat/report.rb', line 398

def target_def; target.name; end

#target_len ⇒ Object Also known as: len

Returns the length of the target (subject) sequence.



# File 'lib/bio/appl/blat/report.rb', line 394

def target_len; target.size; end