Class: MiGA::RemoteDataset
Defined in:
- lib/miga/remote_dataset.rb
- lib/miga/remote_dataset/base.rb
- lib/miga/remote_dataset/download.rb
Overview
MiGA representation of datasets with data in remote locations.
Defined Under Namespace
Constant Summary
Constants included from MiGA
CITATION, VERSION, VERSION_DATE, VERSION_NAME
Instance Attribute Summary
-
#db ⇒ Object
readonly
Database storing the dataset.
-
#ids ⇒ Object
readonly
Array of IDs of the entries composing the dataset.
-
#metadata ⇒ Object
readonly
Internal metadata hash.
-
#universe ⇒ Object
readonly
Universe of the dataset.
Class Method Summary
-
.download(universe, db, ids, format, file = nil, extra = [], obj = nil) ⇒ Object
Download data from the universe in the database db with IDs ids and in format.
-
.download_rest(opts) ⇒ Object
(also: download_net)
Download data using the REST method.
-
.download_url(url) ⇒ Object
Download the given url and return the result regardless of response code.
-
.ncbi_asm_acc2id(acc) ⇒ Object
-
.ncbi_asm_rest(opts) ⇒ Object
Download data from NCBI Assembly database using the REST method.
-
.ncbi_gb_rest(opts) ⇒ Object
Download data from NCBI GenBank (nuccore) database using the REST method.
-
.ncbi_map(id, dbfrom, db) ⇒ Object
Looks for the entry id in dbfrom, and returns the linked identifier in db (or nil).
-
.UNIVERSE ⇒ Object
Instance Method Summary
-
#get_metadata(metadata_def = {}) ⇒ Object
Get metadata from the remote location.
-
#get_ncbi_taxid ⇒ Object
Get NCBI Taxonomy ID.
-
#get_ncbi_taxonomy ⇒ Object
Get NCBI taxonomy as MiGA::Taxonomy.
-
#get_type_status(metadata) ⇒ Object
Get the type material status and return an (updated) metadata hash.
-
#initialize(ids, db, universe) ⇒ RemoteDataset
constructor
Initialize MiGA::RemoteDataset with ids in database db from universe.
-
#ncbi_asm_json_doc ⇒ Object
Get the JSON document describing an NCBI assembly entry.
-
#save_to(project, name = nil, is_ref = true, metadata_def = {}) ⇒ Object
Save dataset to the MiGA::Project project identified with name.
-
#update_metadata(dataset, metadata = {}) ⇒ Object
Updates the MiGA::Dataset dataset with the remotely available metadata, and optionally the Hash metadata.
Methods included from Download
Methods inherited from MiGA
CITATION, CITATION_ARRAY, DEBUG, DEBUG_OFF, DEBUG_ON, DEBUG_TRACE_OFF, DEBUG_TRACE_ON, FULL_VERSION, LONG_VERSION, VERSION, VERSION_DATE, #advance, debug?, debug_trace?, initialized?, #like_io?, #num_suffix, rc_path, #result_files_exist?, #say
Methods included from Common::Path
Methods included from Common::Format
#clean_fasta_file, #seqs_length, #tabulate
Methods included from Common::Net
#download_file_ftp, #known_hosts, #remote_connection
Methods included from Common::SystemCall
Constructor Details
#initialize(ids, db, universe) ⇒ RemoteDataset
Initialize MiGA::RemoteDataset with ids in database db from universe.
# File 'lib/miga/remote_dataset.rb', line 42
def initialize(ids, db, universe)
  ids = [ids] unless ids.is_a? Array
  @ids = (ids.is_a?(Array) ? ids : [ids])
  @db = db.to_sym
  @universe = universe.to_sym
  @metadata = {}
  @metadata[:"#{universe}_#{db}"] = ids.join(',')
  @@UNIVERSE.keys.include?(@universe) or
    raise "Unknown Universe: #{@universe}. Try: #{@@UNIVERSE.keys}"
  @@UNIVERSE[@universe][:dbs].include?(@db) or
    raise "Unknown Database: #{@db}. Try: #{@@UNIVERSE[@universe][:dbs]}"
  @_ncbi_asm_json_doc = nil
  # FIXME: Part of the +map_to+ support:
  # unless @@UNIVERSE[@universe][:dbs][@db][:map_to].nil?
  #   MiGA::RemoteDataset.download
  # end
end
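The checks performed by the constructor can be illustrated without loading the library. The following is a minimal sketch, assuming a hypothetical `UNIVERSE_SKETCH` table with the same `:dbs` layout as the listing above; its entries, the helper name `validate_remote`, and the accession string are illustrative only and do not reflect the library's actual configuration.

```ruby
# Hypothetical universe table (illustrative entries only); the real table is
# exposed by MiGA::RemoteDataset.UNIVERSE.
UNIVERSE_SKETCH = {
  ncbi: { dbs: { nuccore: {}, assembly: {} } },
  web: { dbs: { assembly_gz: {} } }
}.freeze

# Mirrors the constructor's checks: wrap a single ID in an Array, coerce the
# names to Symbols, and raise on an unknown universe or database.
def validate_remote(ids, db, universe)
  ids = [ids] unless ids.is_a? Array
  db = db.to_sym
  universe = universe.to_sym
  UNIVERSE_SKETCH.key?(universe) or
    raise "Unknown Universe: #{universe}. Try: #{UNIVERSE_SKETCH.keys}"
  UNIVERSE_SKETCH[universe][:dbs].key?(db) or
    raise "Unknown Database: #{db}. Try: #{UNIVERSE_SKETCH[universe][:dbs].keys}"
  { ids: ids, db: db, universe: universe }
end

# Made-up accession, used only to show that Strings are accepted.
p validate_remote('GCA_000000000.0', 'assembly', 'ncbi')
```

Accepting either a single ID or an Array, and either Strings or Symbols for the names, keeps call sites short while the stored values stay uniform.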
Instance Attribute Details
#db ⇒ Object (readonly)
Database storing the dataset.
# File 'lib/miga/remote_dataset.rb', line 32
def db
  @db
end
#ids ⇒ Object (readonly)
Array of IDs of the entries composing the dataset.
# File 'lib/miga/remote_dataset.rb', line 34
def ids
  @ids
end
#metadata ⇒ Object (readonly)
Internal metadata hash.
# File 'lib/miga/remote_dataset.rb', line 36
def metadata
  @metadata
end
#universe ⇒ Object (readonly)
Universe of the dataset.
# File 'lib/miga/remote_dataset.rb', line 30
def universe
  @universe
end
Class Method Details
.download(universe, db, ids, format, file = nil, extra = [], obj = nil) ⇒ Object
Download data from the universe in the database db with IDs ids and in format. If passed, it saves the result in file. Additional parameters specific to the download method can be passed using extra. Returns String. The obj can also be passed as MiGA::RemoteDataset or MiGA::Dataset.
# File 'lib/miga/remote_dataset/download.rb', line 14
def download(universe, db, ids, format, file = nil, extra = [], obj = nil)
  ids = [ids] unless ids.is_a? Array
  getter = @@UNIVERSE[universe][:dbs][db][:getter] || :download
  method = @@UNIVERSE[universe][:method]
  opts = {
    universe: universe,
    db: db,
    ids: ids,
    format: format,
    file: file,
    extra: extra,
    obj: obj
  }
  doc = send("#{getter}_#{method}", opts)
  unless opts[:file].nil?
    ofh = File.open(opts[:file], 'w')
    ofh.print doc.force_encoding('UTF-8')
    ofh.close
  end
  doc
end
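The listing above selects its handler by assembling a method name from a per-database getter and a per-universe method. The sketch below reproduces only that dispatch pattern; the module `DispatchSketch`, its `UNIVERSE` entries, and both handlers are hypothetical stand-ins, not part of the library.

```ruby
# Hypothetical stand-in for the class-level dispatch: the handler name is
# "#{getter}_#{method}", where the getter defaults to :download.
module DispatchSketch
  UNIVERSE = {
    demo: { method: :rest, dbs: { plain: {}, special: { getter: :custom } } }
  }.freeze

  # Default handler, used when the database defines no :getter.
  def self.download_rest(opts)
    "default handler for #{opts[:ids].join(',')}"
  end

  # Database-specific handler, selected by getter: :custom.
  def self.custom_rest(opts)
    "custom handler for #{opts[:ids].join(',')}"
  end

  def self.download(universe, db, ids)
    ids = [ids] unless ids.is_a? Array
    getter = UNIVERSE[universe][:dbs][db][:getter] || :download
    method = UNIVERSE[universe][:method]
    send("#{getter}_#{method}", { universe: universe, db: db, ids: ids })
  end
end

puts DispatchSketch.download(:demo, :plain, 'a')
puts DispatchSketch.download(:demo, :special, %w[b c])
```

With this layout, supporting a new database only requires a new table entry and, if the default handler does not fit, one additional method following the naming convention.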
.download_rest(opts) ⇒ Object Also known as: download_net
Download data using the REST method. Supported opts (Hash) include:
- universe (mandatory): Symbol
- db (mandatory): Symbol
- ids (mandatory): Array of String
- format: String
- extra: Array
# File 'lib/miga/remote_dataset/download.rb', line 76
def download_rest(opts)
  u = @@UNIVERSE[opts[:universe]]
  url = sprintf(
    u[:url], opts[:db], opts[:ids].join(','), opts[:format], *opts[:extra]
  )
  url = u[:api_key][url] unless u[:api_key].nil?
  download_url url
end
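To make the URL construction concrete, here is a minimal sketch of the template filling done with `sprintf`. The template `TEMPLATE_SKETCH`, the helper `build_url`, and the `add_key` lambda are assumptions made for illustration; in particular, the sketch assumes that the `:api_key` entry is a callable invoked with `[]`, which the listing suggests but does not state.

```ruby
# Hypothetical URL template (not the library's real one). The placeholders are
# filled, in order, with the database, the comma-joined IDs, and the format.
TEMPLATE_SKETCH = 'https://example.org/api/%s/%s?format=%s'

def build_url(template, db, ids, format, extra = [])
  sprintf(template, db, ids.join(','), format, *extra)
end

# Assumed shape of an :api_key entry: a lambda that decorates the URL.
# Proc#[] is an alias of Proc#call, so add_key[url] invokes it.
add_key = ->(url) { "#{url}&api_key=HYPOTHETICAL" }

url = build_url(TEMPLATE_SKETCH, :nuccore, %w[A1 B2], :fasta)
puts url
puts add_key[url]
```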
.download_url(url) ⇒ Object
Download the given url and return the result regardless of response code. Attempts the download up to three times, sleeping five seconds between attempts, and re-raises the last error (for example, Net::ReadTimeout) if every attempt fails.
# File 'lib/miga/remote_dataset/download.rb', line 92
def download_url(url)
  doc = ''
  @timeout_try = 0
  begin
    DEBUG 'GET: ' + url
    URI.parse(url).open(read_timeout: 600) { |f| doc = f.read }
  rescue => e
    @timeout_try += 1
    raise e if @timeout_try >= 3

    sleep 5 # <- For: 429 Too Many Requests
    DEBUG "RETRYING after: #{e}"
    retry
  end
  doc
end
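The retry behavior can be tried out without network access. This is a minimal sketch of the same begin/rescue/retry pattern; the helper `with_retries` and its `tries` and `wait` parameters are hypothetical, and the failures are simulated rather than produced by a real request.

```ruby
# Runs the block up to `tries` times and re-raises the last error afterwards.
# `wait` is a hypothetical knob for the pause between attempts.
def with_retries(tries: 3, wait: 0)
  attempt = 0
  begin
    yield
  rescue StandardError => e
    attempt += 1
    raise e if attempt >= tries

    sleep wait
    retry
  end
end

# Simulated download that fails twice and succeeds on the third call.
calls = 0
result = with_retries do
  calls += 1
  raise IOError, 'simulated failure' if calls < 3

  "ok after #{calls} calls"
end
puts result
```

Keeping the attempt counter in a local variable, as in this sketch, avoids sharing state between concurrent callers.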
.ncbi_asm_acc2id(acc) ⇒ Object
# File 'lib/miga/remote_dataset.rb', line 15
def ncbi_asm_acc2id(acc)
  return acc if acc =~ /^\d+$/

  search_doc = MiGA::Json.parse(
    download(:ncbi_search, :assembly, acc, :json),
    symbolize: false, contents: true
  )
  (search_doc['esearchresult']['idlist'] || []).first
end
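The lookup above returns numeric input unchanged and otherwise takes the first entry of the search result's ID list. The sketch below shows that logic with the standard `json` library in place of MiGA::Json; the helper `first_id` and the document `SAMPLE_SEARCH` are hypothetical, and the sample is hand-written to match the two keys read in the listing, not copied from a real response.

```ruby
require 'json'

# Hand-written document shaped like the keys the listing reads
# ('esearchresult' -> 'idlist'); it is not a real search response.
SAMPLE_SEARCH = '{"esearchresult": {"idlist": ["12345", "67890"]}}'

def first_id(acc, search_json)
  return acc if acc =~ /^\d+$/ # already a numeric ID, nothing to resolve

  doc = JSON.parse(search_json)
  (doc['esearchresult']['idlist'] || []).first
end

p first_id('42', SAMPLE_SEARCH)
p first_id('GCA_000000000.0', SAMPLE_SEARCH)
p first_id('GCA_000000000.0', '{"esearchresult": {}}')
```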
.ncbi_asm_rest(opts) ⇒ Object
Download data from NCBI Assembly database using the REST method. Supported opts (Hash) include:
- obj (mandatory): MiGA::RemoteDataset
- ids (mandatory): String or Array of String
- file: String, passed to download
- extra: Array, passed to download
- format: String, passed to download
# File 'lib/miga/remote_dataset/download.rb', line 44
def ncbi_asm_rest(opts)
  url_dir = opts[:obj].ncbi_asm_json_doc['ftppath_genbank']
  url = "#{url_dir}/#{File.basename url_dir}_genomic.fna.gz"
  download(
    :web, :assembly_gz, url,
    opts[:format], opts[:file], opts[:extra], opts[:obj]
  )
end
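The file URL in the listing is derived from the assembly directory by reusing the directory's base name as the file prefix. A minimal sketch of that string manipulation follows; the helper `genomic_fasta_url` and the directory URL are hypothetical.

```ruby
# Builds the genomic FASTA URL from an assembly directory URL by repeating
# the directory's base name as the file prefix.
def genomic_fasta_url(url_dir)
  "#{url_dir}/#{File.basename url_dir}_genomic.fna.gz"
end

# Hypothetical directory URL, used only for illustration.
puts genomic_fasta_url('ftp://example.org/genomes/GCA_000000000.0_ASM0v0')
```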
.ncbi_gb_rest(opts) ⇒ Object
Download data from NCBI GenBank (nuccore) database using the REST method. If the downloaded sequence is empty, it falls back to downloading the gzipped FASTA from NCBI Assembly. Supported opts (Hash) are the same as .download_rest and .ncbi_asm_rest.
# File 'lib/miga/remote_dataset/download.rb', line 56
def ncbi_gb_rest(opts)
  o = download_rest(opts)
  return o unless o.strip.empty?

  MiGA::MiGA.DEBUG 'Empty sequence, attempting download from NCBI assembly'
  opts[:format] = :fasta_gz
  if opts[:file]
    File.unlink(opts[:file]) if File.exist? opts[:file]
    opts[:file] = "#{opts[:file]}.gz"
  end
  ncbi_asm_rest(opts)
end
.ncbi_map(id, dbfrom, db) ⇒ Object
Looks for the entry id in dbfrom, and returns the linked identifier in db (or nil).
# File 'lib/miga/remote_dataset/download.rb', line 112
def ncbi_map(id, dbfrom, db)
  doc = download(:ncbi_map, dbfrom, id, :json, nil, [db])
  return if doc.empty?

  tree = MiGA::Json.parse(doc, contents: true)
  [:linksets, 0, :linksetdbs, 0, :links, 0].each do |i|
    tree = tree[i]
    break if tree.nil?
  end
  tree
end
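The traversal in the listing descends one key or index at a time and stops at the first missing level, which is how the method ends up returning nil when no link exists. The sketch below isolates that loop; the helper `walk` and the sample structure are hypothetical, with the sample hand-written to match the keys traversed above.

```ruby
# Walks a nested structure along `path`, stopping at the first missing level.
def walk(tree, path)
  path.each do |i|
    tree = tree[i]
    break if tree.nil?
  end
  tree
end

# Hand-written sample shaped like the keys traversed in the listing.
sample = { linksets: [{ linksetdbs: [{ links: ['9606'] }] }] }
path = [:linksets, 0, :linksetdbs, 0, :links, 0]

p walk(sample, path)           # prints "9606"
p walk({ linksets: [] }, path) # prints nil
```

For structures made only of Hashes and Arrays, Ruby's built-in `dig` (for example, `sample.dig(*path)`) performs the same walk.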
.UNIVERSE ⇒ Object
# File 'lib/miga/remote_dataset/base.rb', line 7
def UNIVERSE
  @@UNIVERSE
end
Instance Method Details
#get_metadata(metadata_def = {}) ⇒ Object
Get metadata from the remote location.
# File 'lib/miga/remote_dataset.rb', line 101
def get_metadata(metadata_def = {})
  metadata_def.each { |k, v| @metadata[k] = v }
  case universe
  when :ebi, :ncbi, :web
    # Get taxonomy
    @metadata[:tax] = get_ncbi_taxonomy
  end
  @metadata = get_type_status(metadata)
end
#get_ncbi_taxid ⇒ Object
Get NCBI Taxonomy ID.
# File 'lib/miga/remote_dataset.rb', line 113
def get_ncbi_taxid
  origin = (universe == :ncbi and db == :assembly) ? :web : universe
  send("get_ncbi_taxid_from_#{origin}")
end
#get_ncbi_taxonomy ⇒ Object
Get NCBI taxonomy as MiGA::Taxonomy.
# File 'lib/miga/remote_dataset.rb', line 133
def get_ncbi_taxonomy
  tax_id = get_ncbi_taxid
  return nil if tax_id.nil?

  lineage = { ns: 'ncbi' }
  doc = MiGA::RemoteDataset.download(:ncbi, :taxonomy, tax_id, :xml)
  doc.scan(%r{<Taxon>(.*?)</Taxon>}m).map(&:first).each do |i|
    name = i.scan(%r{<ScientificName>(.*)</ScientificName>}).first.to_a.first
    rank = i.scan(%r{<Rank>(.*)</Rank>}).first.to_a.first
    rank = nil if rank == 'no rank' or rank.empty?
    rank = 'dataset' if lineage.empty? and rank.nil?
    lineage[rank] = name unless rank.nil? or rank.nil?
  end
  MiGA.DEBUG "Got lineage: #{lineage}"
  MiGA::Taxonomy.new(lineage)
end
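The lineage extraction above relies on regular expressions rather than an XML parser. The following is a minimal sketch of that approach on a hand-written fragment; `SAMPLE_XML` and `parse_lineage` are hypothetical, the fragment is not an actual NCBI Taxonomy response, and the sketch returns a plain Hash instead of a MiGA::Taxonomy. Unlike the listing, it skips a taxon when either the name or the rank is missing.

```ruby
# Hand-written XML fragment shaped like the tags the listing scans for.
SAMPLE_XML = <<~DOC
  <Taxon>
    <ScientificName>Bacteria</ScientificName>
    <Rank>superkingdom</Rank>
  </Taxon>
  <Taxon>
    <ScientificName>unclassified group</ScientificName>
    <Rank>no rank</Rank>
  </Taxon>
  <Taxon>
    <ScientificName>Escherichia</ScientificName>
    <Rank>genus</Rank>
  </Taxon>
DOC

# Collects rank => name pairs, skipping taxa without a usable rank or name.
def parse_lineage(xml)
  lineage = { ns: 'ncbi' }
  xml.scan(%r{<Taxon>(.*?)</Taxon>}m).map(&:first).each do |taxon|
    name = taxon[%r{<ScientificName>(.*)</ScientificName>}, 1]
    rank = taxon[%r{<Rank>(.*)</Rank>}, 1]
    next if name.nil? || rank.nil? || rank.empty? || rank == 'no rank'

    lineage[rank] = name
  end
  lineage
end

p parse_lineage(SAMPLE_XML)
```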
#get_type_status(metadata) ⇒ Object
Get the type material status and return an (updated) metadata hash.
# File 'lib/miga/remote_dataset.rb', line 121
def get_type_status(metadata)
  if metadata[:ncbi_asm]
    get_type_status_ncbi_asm metadata
  elsif metadata[:ncbi_nuccore]
    get_type_status_ncbi_nuccore metadata
  else
    metadata
  end
end
#ncbi_asm_json_doc ⇒ Object
Get the JSON document describing an NCBI assembly entry.
# File 'lib/miga/remote_dataset.rb', line 152
def ncbi_asm_json_doc
  return @_ncbi_asm_json_doc unless @_ncbi_asm_json_doc.nil?

  metadata[:ncbi_asm] ||= ids.first if universe == :ncbi and db == :assembly
  return nil unless metadata[:ncbi_asm]

  ncbi_asm_id = self.class.ncbi_asm_acc2id metadata[:ncbi_asm]
  txt = nil
  3.times do
    txt = self.class.download(:ncbi_summary, :assembly, ncbi_asm_id, :json)
    txt.empty? ? sleep(1) : break
  end
  doc = MiGA::Json.parse(txt, symbolize: false, contents: true)
  return if doc.nil? || doc['result'].nil? || doc['result'].empty?

  @_ncbi_asm_json_doc = doc['result'][ doc['result']['uids'].first ]
end
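Two steps in the listing above lend themselves to a small offline illustration: repeating a request while the response is empty, and selecting the summary entry keyed by the first UID. The sketch below uses the standard `json` library in place of MiGA::Json; the helpers `fetch_non_empty` and `first_result` are hypothetical, and the responses are simulated, hand-written strings.

```ruby
require 'json'

# Calls the block up to `tries` times until it returns non-empty text.
def fetch_non_empty(tries = 3)
  txt = nil
  tries.times do
    txt = yield
    break unless txt.empty?
  end
  txt
end

# Picks the entry keyed by the first UID, as in the listing above.
def first_result(txt)
  return if txt.nil? || txt.empty?

  doc = JSON.parse(txt)
  return if doc['result'].nil? || doc['result'].empty?

  doc['result'][doc['result']['uids'].first]
end

# Simulated responses: empty at first, then a hand-written summary document.
responses = ['', '{"result": {"uids": ["77"], "77": {"assemblyname": "demo"}}}']
txt = fetch_non_empty { responses.shift }
p first_result(txt)
```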
#save_to(project, name = nil, is_ref = true, metadata_def = {}) ⇒ Object
Save dataset to the MiGA::Project project identified with name. is_ref indicates whether it should be a reference dataset, and metadata_def provides additional metadata to store with it. If metadata_def includes metadata_only: true, no input data is downloaded.
# File 'lib/miga/remote_dataset.rb', line 65
def save_to(project, name = nil, is_ref = true, metadata_def = {})
  name ||= ids.join('_').miga_name
  project = MiGA::Project.new(project) if project.is_a? String
  MiGA::Dataset.exist?(project, name) and
    raise "Dataset #{name} exists in the project, aborting..."
  @metadata = get_metadata(metadata_def)
  udb = @@UNIVERSE[universe][:dbs][db]
  @metadata["#{universe}_#{db}"] = ids.join(',')
  unless @metadata[:metadata_only]
    respond_to?("save_#{udb[:stage]}_to", true) or
      raise "Unexpected error: Unsupported stage #{udb[:stage]} for #{db}."
    send "save_#{udb[:stage]}_to", project, name, udb
  end
  dataset = MiGA::Dataset.new(project, name, is_ref, metadata)
  project.add_dataset(dataset.name)
  unless @metadata[:metadata_only]
    result = dataset.add_result(udb[:stage], true, is_clean: true)
    result.nil? and
      raise 'Empty dataset: seed result not added due to incomplete files.'
    result.clean!
    result.save
  end
  dataset
end
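The stage-specific saving step in the listing checks for a handler with `respond_to?(name, true)`, where the second argument makes the check include private methods, and then invokes it with `send`. The sketch below shows only that guard-and-dispatch pattern; the class `StageSaverSketch` and its single handler are hypothetical and do not write any files.

```ruby
# Hypothetical saver: the stage name selects a private handler method.
class StageSaverSketch
  def save(stage, name)
    handler = "save_#{stage}_to"
    # The second argument (true) makes respond_to? consider private methods.
    respond_to?(handler, true) or
      raise "Unexpected error: Unsupported stage #{stage}."
    send handler, name
  end

  private

  # Stand-in for a stage-specific save routine.
  def save_assembly_to(name)
    "#{name}: saved as assembly"
  end
end

puts StageSaverSketch.new.save(:assembly, 'demo_dataset')
```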
#update_metadata(dataset, metadata = {}) ⇒ Object
Updates the MiGA::Dataset dataset with the remotely available metadata, and optionally the Hash metadata.
# File 'lib/miga/remote_dataset.rb', line 93
def update_metadata(dataset, metadata = {})
  metadata = get_metadata(metadata)
  metadata.each { |k, v| dataset.metadata[k] = v }
  dataset.save
end