Class: Bio::SPTR

Inherits:
EMBLDB show all
Includes:
EMBLDB::Common
Defined in:
lib/bio/db/embl/sptr.rb

Overview

Parser class for UniProtKB/SwissProt and TrEMBL database entry.

Direct Known Subclasses

SwissProt, TrEMBL, UniProt

Constant Summary collapse

@@entry_regrexp =
/[A-Z0-9]{1,4}_[A-Z0-9]{1,5}/
@@data_class =
["STANDARD", "PRELIMINARY"]
@@ac_regrexp =

Bio::EMBLDB::Common#ac -> ary

#accessions  -> ary
#accession  -> String (accessions.first)
/[OPQ][0-9][A-Z0-9]{3}[0-9]/
@@cc_topics =
['PHARMACEUTICAL',
'BIOTECHNOLOGY',
'TOXIC DOSE', 
'ALLERGEN',   
'RNA EDITING',
'POLYMORPHISM',
'BIOPHYSICOCHEMICAL PROPERTIES',
'MASS SPECTROMETRY',
'WEB RESOURCE', 
'ENZYME REGULATION',
'DISEASE',
'INTERACTION',
'DEVELOPMENTAL STAGE',
'INDUCTION',
'CAUTION',
'ALTERNATIVE PRODUCTS',
'DOMAIN',
'PTM',
'MISCELLANEOUS',
'TISSUE SPECIFICITY',
'COFACTOR',
'PATHWAY',
'SUBUNIT',
'CATALYTIC ACTIVITY',
'SUBCELLULAR LOCATION',
'FUNCTION',
'SIMILARITY']
@@dr_database_identifier =

returns databases cross-references in the DR lines.

  • Bio::SPTR#dr -> Hash w/in Array

DR Line; defabases cross-reference (>=0)

  DR  database_identifier; primary_identifier; secondary_identifier.
a cross_ref pre one line
['EMBL','CARBBANK','DICTYDB','ECO2DBASE',
'ECOGENE',
'FLYBASE','GCRDB','HIV','HSC-2DPAGE','HSSP','INTERPRO','MAIZEDB',
'MAIZE-2DPAGE','MENDEL','MGD''MIM','PDB','PFAM','PIR','PRINTS',
'PROSITE','REBASE','AARHUS/GHENT-2DPAGE','SGD','STYGENE','SUBTILIST',
'SWISS-2DPAGE','TIGR','TRANSFAC','TUBERCULIST','WORMPEP','YEPD','ZFIN']

Constants included from EMBLDB::Common

EMBLDB::Common::DELIMITER, EMBLDB::Common::RS, EMBLDB::Common::TAGSIZE

Instance Method Summary collapse

Methods included from EMBLDB::Common

#ac, #accession, #de, #initialize, #kw, #oc, #og

Methods inherited from EMBLDB

#initialize

Methods inherited from DB

#exists?, #fetch, #get, open, #tags

Instance Method Details

#cc(topic = nil) ⇒ Object

returns contents in the CC lines.

  • Bio::SPTR#cc -> Hash

returns an object of contents in the TOPIC.

  • Bio::SPTR#cc(TOPIC) -> Array w/in Hash, Hash

returns contents of the “ALTERNATIVE PRODUCTS”.

  • Bio::SPTR#cc('ALTERNATIVE PRODUCTS') -> Hash

    {'Event' => str, 
     'Named isoforms' => int,  
     'Comment' => str,
     'Variants'=>[{'Name' => str, 'Synonyms' => str, 'IsoId' => str, 'Sequence' => []}]}
    
    CC   -!- ALTERNATIVE PRODUCTS:
    CC       Event=Alternative splicing; Named isoforms=15;
    ...
    CC         placentae isoforms. All tissues differentially splice exon 13;
    CC       Name=A; Synonyms=no del;
    CC         IsoId=P15529-1; Sequence=Displayed;
    

returns contents of the “DATABASE”.

  • Bio::SPTR#cc('DATABASE') -> Array

    [{'NAME'=>str,'NOTE'=>str, 'WWW'=>URI,'FTP'=>URI}, ...]
    
    CC   -!- DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"].
    

returns contents of the “MASS SPECTROMETRY”.

  • Bio::SPTR#cc('MASS SPECTROMETRY') -> Array

    [{'MW"=>float,'MW_ERR'=>float, 'METHOD'=>str,'RANGE'=>str}, ...]
    
    CC   -!- MASS SPECTROMETRY: MW=XXX[; MW_ERR=XX][; METHOD=XX][;RANGE=XX-XX].
    

CC lines (>=0, optional)

CC   -!- TISSUE SPECIFICITY: HIGHEST LEVELS FOUND IN TESTIS. ALSO PRESENT
CC       IN LIVER, KIDNEY, LUNG AND BRAIN.

CC   -!- TOPIC: FIRST LINE OF A COMMENT BLOCK;
CC       SECOND AND SUBSEQUENT LINES OF A COMMENT BLOCK.

See also www.expasy.org/sprot/userman.html#CC_line


612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
# File 'lib/bio/db/embl/sptr.rb', line 612

def cc(topic = nil)
  unless @data['CC']
    cc  = Hash.new
    comment_border= '-' * (77 - 4 + 1)
    dlm = /-!- /

    # 12KD_MYCSM has no CC lines.
    return cc if get('CC').size == 0
    
    cc_raw = fetch('CC')

    # Removing the copyright statement.
    cc_raw.sub!(/ *---.+---/m, '')

    # Not any CC Lines without the copyright statement.
    return cc if cc_raw == ''

    begin
      cc_raw, copyright = cc_raw.split(/#{comment_border}/)[0]
      cc_raw = cc_raw.sub(dlm,'')
      cc_raw.split(dlm).each do |tmp|
        tmp = tmp.strip

        if /(^[A-Z ]+[A-Z]): (.+)/ =~ tmp
          key  = $1
          body = $2
          body.gsub!(/- (?!AND)/,'-')
          body.strip!
          unless cc[key]
            cc[key] = [body]
          else
            cc[key].push(body)
          end
        else
          raise ["Error: [#{entry_id}]: CC Lines", '"', tmp, '"',
                 '', get('CC'),''].join("\n")
        end
      end
    rescue NameError
      if fetch('CC') == ''
        return {}
      else
        raise ["Error: Invalid CC Lines: [#{entry_id}]: ",
               "\n'#{self.get('CC')}'\n", "(#{$!})"].join
      end
    rescue NoMethodError
    end
    
    @data['CC'] = cc
  end


  case topic
  when 'ALLERGEN'
    return @data['CC'][topic]
  when 'ALTERNATIVE PRODUCTS'
    return cc_alternative_products(@data['CC'][topic])
  when 'BIOPHYSICOCHEMICAL PROPERTIES'
    return cc_biophysiochemical_properties(@data['CC'][topic])
  when 'BIOTECHNOLOGY'
    return @data['CC'][topic]
  when 'CATALITIC ACTIVITY'
    return cc_catalytic_activity(@data['CC'][topic])
  when 'CAUTION'
    return cc_caution(@data['CC'][topic])
  when 'COFACTOR'
    return @data['CC'][topic]
  when 'DEVELOPMENTAL STAGE'
    return @data['CC'][topic].join('')
  when 'DISEASE'
    return @data['CC'][topic].join('')
  when 'DOMAIN'
    return @data['CC'][topic]
  when 'ENZYME REGULATION'
    return @data['CC'][topic].join('')
  when 'FUNCTION'
    return @data['CC'][topic].join('')
  when 'INDUCTION'
    return @data['CC'][topic].join('')
  when 'INTERACTION'
    return cc_interaction(@data['CC'][topic])
  when 'MASS SPECTROMETRY'
    return cc_mass_spectrometry(@data['CC'][topic])
  when 'MISCELLANEOUS'
    return @data['CC'][topic]
  when 'PATHWAY'
    return cc_pathway(@data['CC'][topic])
  when 'PHARMACEUTICAL'
    return @data['CC'][topic]
  when 'POLYMORPHISM'
    return @data['CC'][topic]
  when 'PTM'
    return @data['CC'][topic]
  when 'RNA EDITING'
    return cc_rna_editing(@data['CC'][topic])
  when 'SIMILARITY'
    return @data['CC'][topic]
  when 'SUBCELLULAR LOCATION'
    return cc_subcellular_location(@data['CC'][topic])
  when 'SUBUNIT'
    return @data['CC'][topic]
  when 'TISSUE SPECIFICITY'
    return @data['CC'][topic]
  when 'TOXIC DOSE'
    return @data['CC'][topic]
  when 'WEB RESOURCE'
    return cc_web_resource(@data['CC'][topic])
  when 'DATABASE'
    # DATABASE: NAME=Text[; NOTE=Text][; WWW="Address"][; FTP="Address"].
    tmp = Array.new
    db = @data['CC']['DATABASE']
    return db unless db

    db.each do |e|
      db = {'NAME' => nil, 'NOTE' => nil, 'WWW' => nil, 'FTP' => nil}
      e.sub(/.$/,'').split(/;/).each do |line|
        case line
        when /NAME=(.+)/
          db['NAME'] = $1
        when /NOTE=(.+)/
          db['NOTE'] = $1
        when /WWW="(.+)"/
          db['WWW'] = $1
        when /FTP="(.+)"/
          db['FTP'] = $1
        end 
      end
      tmp.push(db)
    end
    return tmp
  when nil
    return @data['CC']
  else
    return @data['CC'][topic]
  end
end

#cc_web_resource(data) ⇒ Object

CC -!- WEB RESOURCE: NAME=ResourceName[; NOTE=FreeText][; URL=WWWAddress].


924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
# File 'lib/bio/db/embl/sptr.rb', line 924

def cc_web_resource(data)
  data.map {|x|
    entry = {'NAME' => nil, 'NOTE' => nil, 'URL' => nil}
    x.split(';').each do |y|
      case y
      when /NAME=(.+)/
        entry['NAME'] = $1.strip
      when /NOTE=(.+)/
        entry['NOTE'] = $1.strip
      when /URL="(.+)"/
        entry['URL'] = $1.strip
      end
    end
    entry
  }
end

#dr(key = nil) ⇒ Object

Bio::SPTR#dr


959
960
961
962
963
964
965
966
967
968
969
970
# File 'lib/bio/db/embl/sptr.rb', line 959

def dr(key = nil)
  unless key
    embl_dr
  else
    (embl_dr[key] or []).map {|x|
      {'Accession' => x[0],
       'Version' => x[1],
       ' ' => x[2],
       'Molecular Type' => x[3]}
    }
  end
end

#dt(key = nil) ⇒ Object

returns a Hash of information in the DT lines.

hash keys: 
  ['created', 'sequence', 'annotation']
also Symbols acceptable (ASAP):
  [:created, :sequence, :annotation]

returns a String of information in the DT lines by a given key..

DT Line; date (3/entry)

DT DD-MMM-YYY (rel. NN, Created)
DT DD-MMM-YYY (rel. NN, Last sequence update)
DT DD-MMM-YYY (rel. NN, Last annotation update)

123
124
125
126
127
128
129
130
131
132
133
# File 'lib/bio/db/embl/sptr.rb', line 123

def dt(key = nil)
  return dt[key] if key
  return @data['DT'] if @data['DT']

  part = self.get('DT').split(/\n/)
  @data['DT'] = {
    'created'    => part[0].sub(/\w{2}   /,'').strip,
    'sequence'   => part[1].sub(/\w{2}   /,'').strip,
    'annotation' => part[2].sub(/\w{2}   /,'').strip
  }
end

#embl_drObject

Backup Bio::EMBLDB#dr as embl_dr


956
# File 'lib/bio/db/embl/sptr.rb', line 956

alias :embl_dr :dr

#entry_idObject Also known as: entry_name, entry

returns a ENTRY_NAME in the ID line.


79
80
81
# File 'lib/bio/db/embl/sptr.rb', line 79

def entry_id
  id_line('ENTRY_NAME')
end

#ft(feature_key = nil) ⇒ Object

returns contents in the feature table.

Examples

sp = Bio::SPTR.new(entry)
ft = sp.ft
ft.class #=> Hash
ft.keys.each do |feature_key|
  ft[feature_key].each do |feature|
    feature['From'] #=> '1'
    feature['To']   #=> '21'
    feature['Description'] #=> ''
    feature['FTId'] #=> ''
    feature['diff'] #=> []
    feature['original'] #=> [feature_key, '1', '21', '', '']
  end
end
  • Bio::SPTR#ft -> Hash

    {FEATURE_KEY => [{'From' => int, 'To' => int, 
                      'Description' => aStr, 'FTId' => aStr,
                      'diff' => [original_residues, changed_residues],
                      'original' => aAry }],...}
    

returns an Array of the information about the feature_name in the feature table.

  • Bio::SPTR#ft(feature_name) -> Array of Hash

    [{'From' => str, 'To' => str, 'Description' => str, 'FTId' => str},...]
    

FT Line; feature table data (>=0, optional)

Col     Data item
-----   -----------------
 1- 2   FT
 6-13   Feature name 
15-20   `FROM' endpoint
22-27   `TO' endpoint
35-75   Description (>=0 per key)
-----   -----------------

Note: 'FROM' and 'TO' endopoints are allowed to use non-numerial charactors including '<', '>' or '?'. (c.f. '<1', '?42')

See also www.expasy.org/sprot/userman.html#FT_line


1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
# File 'lib/bio/db/embl/sptr.rb', line 1024

def ft(feature_key = nil)
  return ft[feature_key] if feature_key
  return @data['FT'] if @data['FT']

  table = []
  begin
    get('FT').split("\n").each do |line|
      if line =~ /^FT   \w/
        feature = line.chomp.ljust(74)
        table << [feature[ 5..12].strip,   # Feature Name
                  feature[14..19].strip,   # From
                  feature[21..26].strip,   # To
                  feature[34..74].strip ]  # Description
      else
        table.last << line.chomp.sub!(/^FT +/, '')
      end
    end

    # Joining Description lines
    table = table.map { |feature| 
      ftid = feature.pop if feature.last =~ /FTId=/
      if feature.size > 4
        feature = [feature[0], 
                   feature[1], 
                   feature[2], 
                   feature[3, feature.size - 3].join(" ")]
      end
      feature << if ftid then ftid else '' end
    }

    hash = {}
    table.each do |feature|
      hash[feature[0]] = [] unless hash[feature[0]]
      hash[feature[0]] << {
        # Removing '<', '>' or '?' in FROM/TO endopoint.
        'From' => feature[1].sub(/\D/, '').to_i,  
        'To'   => feature[2].sub(/\D/, '').to_i, 
        'Description' => feature[3], 
        'FTId' => feature[4].to_s.sub(/\/FTId=/, '').sub(/\.$/, ''),
        'diff' => [],
        'original' => feature
      }

      case feature[0]
      when 'VARSPLIC', 'VARIANT', 'VAR_SEQ', 'CONFLICT'
        case hash[feature[0]].last['Description']
        when /(\w[\w ]*\w*) - ?> (\w[\w ]*\w*)/
          original_res = $1
          changed_res = $2
          original_res = original_res.gsub(/ /,'').strip
          chenged_res = changed_res.gsub(/ /,'').strip
        when /Missing/i
          original_res = seq.subseq(hash[feature[0]].last['From'],
                                    hash[feature[0]].last['To'])
          changed_res = ''
        end
        hash[feature[0]].last['diff'] = [original_res, chenged_res]
      end
    end
  rescue
    raise "Invalid FT Lines(#{$!}) in #{entry_id}:, \n'#{self.get('FT')}'\n"
  end

  @data['FT'] = hash
end

#gene_nameObject

returns a String of the first gene name in the GN line.


275
276
277
# File 'lib/bio/db/embl/sptr.rb', line 275

def gene_name
  gene_names.first
end

#gene_namesObject

returns a Array of gene names in the GN line.


264
265
266
267
268
269
270
271
# File 'lib/bio/db/embl/sptr.rb', line 264

def gene_names
  gn # set @data['GN'] if it hasn't been already done
  if @data['GN'].first.class == Hash then
    @data['GN'].collect { |element| element[:name] }
  else
    @data['GN'].first
  end
end

#gnObject

returns gene names in the GN line.

New UniProt/SwissProt format:

  • Bio::SPTR#gn -> [ <gene record>* ]

where <gene record> is:

{ :name => '...', 
  :synonyms => [ 's1', 's2', ... ],
  :loci   => [ 'l1', 'l2', ... ],
  :orfs     => [ 'o1', 'o2', ... ] 
}

Old format:

GN Line: Gene name(s) (>=0, optional)


188
189
190
191
192
193
194
195
196
197
198
# File 'lib/bio/db/embl/sptr.rb', line 188

def gn
  unless @data['GN']
    case fetch('GN')
    when /Name=/,/ORFNames=/,/OrderedLocusNames=/,/Synonyms=/
      @data['GN'] = gn_uniprot_parser
    else
      @data['GN'] = gn_old_parser
    end
  end
  @data['GN']
end

#hiObject

The HI line

Bio::SPTR#hi #=> hash


528
529
530
531
532
533
534
535
536
537
538
539
540
541
# File 'lib/bio/db/embl/sptr.rb', line 528

def hi
  unless @data['HI']
    @data['HI'] = []
    fetch('HI').split(/\. /).each do |hlist|
      hash = {'Category' => '',  'Keywords' => [], 'Keyword' => ''}
      hash['Category'], hash['Keywords'] = hlist.split(': ')
      hash['Keywords'] = hash['Keywords'].split('; ')
      hash['Keyword'] = hash['Keywords'].pop
      hash['Keyword'].sub!(/\.$/, '')
      @data['HI'] << hash
    end
  end
  @data['HI']
end

#id_line(key = nil) ⇒ Object

returns a Hash of the ID line.

returns a content (Int or String) of the ID line by a given key. Hash keys: ['ENTRY_NAME', 'DATA_CLASS', 'MODECULE_TYPE', 'SEQUENCE_LENGTH']

ID Line

ID   P53_HUMAN      STANDARD;      PRT;   393 AA.
#"ID  #{ENTRY_NAME} #{DATA_CLASS}; #{MOLECULE_TYPE}; #{SEQUENCE_LENGTH}."

Examples

obj.id_line  #=> {"ENTRY_NAME"=>"P53_HUMAN", "DATA_CLASS"=>"STANDARD", 
                  "SEQUENCE_LENGTH"=>393, "MOLECULE_TYPE"=>"PRT"}

obj.id_line('ENTRY_NAME') #=> "P53_HUMAN"

63
64
65
66
67
68
69
70
71
72
73
74
# File 'lib/bio/db/embl/sptr.rb', line 63

def id_line(key = nil)
  return id_line[key] if key
  return @data['ID'] if @data['ID']

  part = @orig['ID'].split(/ +/)         
  @data['ID'] = {
    'ENTRY_NAME'      => part[1],
    'DATA_CLASS'      => part[2].sub(/;/,''),
    'MOLECULE_TYPE'   => part[3].sub(/;/,''),
    'SEQUENCE_LENGTH' => part[4].to_i 
  }
end

#moleculeObject Also known as: molecule_type

returns a MOLECULE_TYPE in the ID line.

A short-cut for Bio::SPTR#id_line('MOLECULE_TYPE').


89
90
91
# File 'lib/bio/db/embl/sptr.rb', line 89

def molecule
  id_line('MOLECULE_TYPE')
end

#ohObject

The OH Line;

OH NCBI_TaxID=TaxID; HostName. br.expasy.org/sprot/userman.html#OH_line


358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
# File 'lib/bio/db/embl/sptr.rb', line 358

def oh
  unless @data['OH']
    @data['OH'] = fetch('OH').split("\. ").map {|x|
      if x =~ /NCBI_TaxID=(\d+);/
        taxid = $1
      else
        raise ArgumentError, ["Error: Invalid OH line format (#{self.entry_id}):",
                              $!, "\n", get('OH'), "\n"].join
        
      end
      if x =~ /NCBI_TaxID=\d+; (.+)/ 
        host_name = $1
        host_name.sub!(/\.$/, '')
      else
        host_name = nil
      end
      {'NCBI_TaxID' => taxid, 'HostName' => host_name}
    }
  end
  @data['OH']
end

#os(num = nil) ⇒ Object

returns a Array of Hashs or a String of the OS line when a key given.

  • Bio::EMBLDB#os -> Array

[{'name' => '(Human)', 'os' => 'Homo sapiens'}, 
 {'name' => '(Rat)', 'os' => 'Rattus norveticus'}]
{'name' => "(Human)", 'os' => 'Homo sapiens'}
  • Bio::SPTR#os['name'] -> “(Human)”

  • Bio::EPTR#os(0) -> “Homo sapiens (Human)”

OS Line; organism species (>=1)

OS   Genus species (name).
OS   Genus species (name0) (name1).
OS   Genus species (name0) (name1).
OS   Genus species (name0), G s0 (name0), and G s (name0) (name1).
OS   Homo sapiens (Human), and Rarrus norveticus (Rat)
OS   Hippotis sp. Clark and Watts 825.
OS   unknown cyperaceous sp.

297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
# File 'lib/bio/db/embl/sptr.rb', line 297

def os(num = nil)
  unless @data['OS']
    os = Array.new
    fetch('OS').split(/, and|, /).each do |tmp|
      if tmp =~ /(\w+ *[\w\d \:\'\+\-\.]+[\w\d\.])/
        org = $1
        tmp =~ /(\(.+\))/ 
        os.push({'name' => $1, 'os' => org})
      else
        raise "Error: OS Line. #{$!}\n#{fetch('OS')}\n"
      end
    end
    @data['OS'] = os
  end

  if num
    # EX. "Trifolium repens (white clover)"
    return "#{@data['OS'][num]['os']} #{@data['OS'][num]['name']}"
  else
    return @data['OS']
  end
end

#oxObject

returns a Hash of oraganism taxonomy cross-references.

  • Bio::SPTR#ox -> Hash

    {'NCBI_TaxID' => ['1234','2345','3456','4567'], ...}
    

OX Line; organism taxonomy cross-reference (>=1 per entry)

OX   NCBI_TaxID=1234;
OX   NCBI_TaxID=1234, 2345, 3456, 4567;

341
342
343
344
345
346
347
348
349
350
351
352
# File 'lib/bio/db/embl/sptr.rb', line 341

def ox
  unless @data['OX']
    tmp = fetch('OX').sub(/\.$/,'').split(/;/).map { |e| e.strip }
    hsh = Hash.new
    tmp.each do |e|
      db,refs = e.split(/=/)
      hsh[db] = refs.split(/, */)
    end
    @data['OX'] = hsh
  end
  return @data['OX']
end

#protein_nameObject

returns the proposed official name of the protein.

DE Line; description (>=1)

"DE #{OFFICIAL_NAME} (#{SYNONYM})"
"DE #{OFFICIAL_NAME} (#{SYNONYM}) [CONTEINS: #1; #2]."
OFFICIAL_NAME  1/entry
SYNONYM        >=0
CONTEINS       >=0

144
145
146
147
148
149
150
151
152
# File 'lib/bio/db/embl/sptr.rb', line 144

def protein_name
  name = ""
  if de_line = fetch('DE') then
    str = de_line[/^[^\[]*/] # everything preceding the first [ (the "contains" part)
    name = str[/^[^(]*/].strip
    name << ' (Fragment)' if str =~ /fragment/i
  end
  return name
end

#refObject

returns contents in the R lines.

  • Bio::EMBLDB::Common#ref -> [ <refernece information Hash>* ]

where <reference information Hash> is:

{'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '', 
 'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''}

R Lines

  • RN RC RP RX RA RT RL RG


394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
# File 'lib/bio/db/embl/sptr.rb', line 394

def ref
  unless @data['R']
    @data['R'] = [get('R').split(/\nRN   /)].flatten.map { |str|
      hash = {'RN' => '', 'RC' => '', 'RP' => '', 'RX' => '', 
             'RA' => '', 'RT' => '', 'RL' => '', 'RG' => ''}
      str = 'RN   ' + str unless /^RN   / =~ str

      str.split("\n").each do |line|
        if /^(R[NPXARLCTG])   (.+)/ =~ line
          hash[$1] += $2 + ' '
        else
          raise "Invalid format in R lines, \n[#{line}]\n"
        end
      end

      hash['RN'] = set_RN(hash['RN'])
      hash['RC'] = set_RC(hash['RC'])
      hash['RP'] = set_RP(hash['RP'])
      hash['RX'] = set_RX(hash['RX'])
      hash['RA'] = set_RA(hash['RA'])
      hash['RT'] = set_RT(hash['RT'])
      hash['RL'] = set_RL(hash['RL'])
      hash['RG'] = set_RG(hash['RG'])

      hash
    }

  end
  @data['R']
end

#referencesObject

returns Bio::Reference object from Bio::EMBLDB::Common#ref.

  • Bio::EMBLDB::Common#ref -> Bio::References


488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
# File 'lib/bio/db/embl/sptr.rb', line 488

def references
  unless @data['references']
    ary = self.ref.map {|ent|
      hash = Hash.new('')
      ent.each {|key, value|
        case key
        when 'RA'
          hash['authors'] = value.split(/, /)
        when 'RT'
          hash['title'] = value
        when 'RL'
          if value =~ /(.*) (\d+) \((\d+)\), (\d+-\d+) \((\d+)\)$/
            hash['journal'] = $1
            hash['volume']  = $2
            hash['issue']   = $3
            hash['pages']   = $4
            hash['year']    = $5
          else
            hash['journal'] = value
          end
        when 'RX'  # PUBMED, MEDLINE, DOI
          value.each do |tag, xref|
            hash[ tag.downcase ]  = xref
          end
        end
      }
      Reference.new(hash)
    }
    @data['references'] = References.new(ary)
  end
  @data['references']
end

#seqObject Also known as: aaseq

returns a Bio::Sequence::AA of the amino acid sequence.

  • Bio::SPTR#seq -> Bio::Sequence::AA

blank Line; sequence data (>=1)


1134
1135
1136
1137
1138
1139
# File 'lib/bio/db/embl/sptr.rb', line 1134

def seq
  unless @data['']
    @data[''] = Sequence::AA.new( fetch('').gsub(/ |\d+/,'') )
  end
  return @data['']
end

#sequence_lengthObject Also known as: aalen

returns a SEQUENCE_LENGTH in the ID line.

A short-cut for Bio::SPTR#id_line('SEQUENCE_LENGHT').


98
99
100
# File 'lib/bio/db/embl/sptr.rb', line 98

def sequence_length
  id_line('SEQUENCE_LENGTH')
end

#set_RN(data) ⇒ Object


425
426
427
# File 'lib/bio/db/embl/sptr.rb', line 425

def set_RN(data)
  data.strip
end

#sq(key = nil) ⇒ Object

returns a Hash of conteins in the SQ lines.

  • Bio::SPTRL#sq -> hsh

returns a value of a key given in the SQ lines.

  • Bio::SPTRL#sq(key) -> int or str

  • Keys: ['MW', 'mw', 'molecular', 'weight', 'aalen', 'len', 'length',

    'CRC64']
    

SQ Line; sequence header (1/entry)

SQ   SEQUENCE   233 AA;  25630 MW;  146A1B48A1475C86 CRC64;
SQ   SEQUENCE  \d+ AA; \d+ MW;  [0-9A-Z]+ CRC64;

MW, Dalton unit. CRC64 (64-bit Cyclic Redundancy Check, ISO 3309).


1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
# File 'lib/bio/db/embl/sptr.rb', line 1106

def sq(key = nil)
  unless @data['SQ']
    if fetch('SQ') =~ /(\d+) AA\; (\d+) MW; (.+) CRC64;/
      @data['SQ'] = { 'aalen' => $1.to_i, 'MW' => $2.to_i, 'CRC64' => $3 }
    else
      raise "Invalid SQ Line: \n'#{fetch('SQ')}'"
    end
  end

  if key
    case key
    when /mw/, /molecular/, /weight/
      @data['SQ']['MW']
    when /len/, /length/, /AA/
      @data['SQ']['aalen']
    else
      @data['SQ'][key]
    end
  else 
    @data['SQ']
  end
end

#synonymsObject

returns an array of synonyms (unofficial names).

synonyms are each placed in () following the official name on the DE line.


158
159
160
161
162
163
164
165
166
167
168
169
# File 'lib/bio/db/embl/sptr.rb', line 158

def synonyms
  ary = Array.new
  if de_line = fetch('DE') then
    line = de_line.sub(/\[.*\]/,'') # ignore stuff between [ and ].  That's the "contains" part
    line.scan(/\([^)]+/) do |synonym| 
      unless synonym =~ /fragment/i then 
        ary << synonym[1..-1].strip # index to remove the leading (  
      end
    end
  end
  return ary
end