Class: MiGA::RemoteDataset
Defined in:
- lib/miga/remote_dataset.rb
- lib/miga/remote_dataset/base.rb
- lib/miga/remote_dataset/download.rb
Overview
MiGA representation of datasets with data in remote locations.
Defined Under Namespace
Constant Summary
Constants included from MiGA
CITATION, VERSION, VERSION_DATE, VERSION_NAME
Instance Attribute Summary
- #db ⇒ Object (readonly)
  Database storing the dataset.
- #ids ⇒ Object (readonly)
  Array of IDs of the entries composing the dataset.
- #metadata ⇒ Object (readonly)
  Internal metadata hash.
- #universe ⇒ Object (readonly)
  Universe of the dataset.
Class Method Summary
- .download(universe, db, ids, format, file = nil, extra = []) ⇒ Object
  Download data from the universe in the database db with IDs ids and in format.
- .download_rest(universe, db, ids, format, extra = []) ⇒ Object (also: download_net)
  Download data using a REST method from the universe in the database db with IDs ids and in format.
- .download_url(url) ⇒ Object
  Download the given url and return the result regardless of response code.
- .ncbi_map(id, dbfrom, db) ⇒ Object
  Looks for the entry id in dbfrom, and returns the linked identifier in db (or nil).
- .UNIVERSE ⇒ Object
Instance Method Summary
- #get_metadata(metadata_def = {}) ⇒ Object
  Get metadata from the remote location.
- #get_ncbi_taxid ⇒ Object
  Get NCBI Taxonomy ID.
- #get_ncbi_taxonomy ⇒ Object
  Get NCBI taxonomy as MiGA::Taxonomy.
- #get_type_status(metadata) ⇒ Object
  Get the type material status and return an (updated) metadata hash.
- #initialize(ids, db, universe) ⇒ RemoteDataset (constructor)
  Initialize MiGA::RemoteDataset with ids in database db from universe.
- #save_to(project, name = nil, is_ref = true, metadata_def = {}) ⇒ Object
  Save dataset to the MiGA::Project project identified with name.
- #update_metadata(dataset, metadata = {}) ⇒ Object
  Updates the MiGA::Dataset dataset with the remotely available metadata, and optionally the Hash metadata.
Methods included from Download
Methods inherited from MiGA
CITATION, DEBUG, DEBUG_OFF, DEBUG_ON, DEBUG_TRACE_OFF, DEBUG_TRACE_ON, FULL_VERSION, LONG_VERSION, VERSION, VERSION_DATE, initialized?, #result_files_exist?
Methods included from Common::Path
Methods included from Common::Format
#clean_fasta_file, #seqs_length, #tabulate
Constructor Details
#initialize(ids, db, universe) ⇒ RemoteDataset
Initialize MiGA::RemoteDataset with ids in database db from universe.
# File 'lib/miga/remote_dataset.rb', line 25
def initialize(ids, db, universe)
  ids = [ids] unless ids.is_a? Array
  @ids = (ids.is_a?(Array) ? ids : [ids])
  @db = db.to_sym
  @universe = universe.to_sym
  @metadata = {}
  @metadata[:"#{universe}_#{db}"] = ids.join(",")
  @@UNIVERSE.keys.include?(@universe) or
    raise "Unknown Universe: #{@universe}. Try: #{@@UNIVERSE.keys}"
  @@UNIVERSE[@universe][:dbs].include?(@db) or
    raise "Unknown Database: #{@db}. Try: #{@@UNIVERSE[@universe][:dbs]}"
  # FIXME: Part of the +map_to+ support:
  # unless @@UNIVERSE[@universe][:dbs][@db][:map_to].nil?
  #   MiGA::RemoteDataset.download
  # end
end
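The constructor accepts a single ID or an Array of IDs, coerces db and universe to symbols, and rejects anything not registered in the class-level universe table. The following is a minimal stand-alone sketch of that normalize-and-validate pattern. It does not load MiGA: `SKETCH_UNIVERSE` and `sketch_normalize` are hypothetical names, and the registry contents are assumptions, since the real contents of `@@UNIVERSE` are not shown on this page.

```ruby
# Hypothetical registry standing in for MiGA's @@UNIVERSE (contents assumed).
SKETCH_UNIVERSE = {
  ncbi: { dbs: { nuccore: {}, assembly: {} } },
  web:  { dbs: { assembly: {} } }
}.freeze

# Normalize the arguments and validate them in the same order as the
# constructor: wrap a lone ID, symbolize names, then check the registry.
def sketch_normalize(ids, db, universe)
  ids = [ids] unless ids.is_a? Array
  db = db.to_sym
  universe = universe.to_sym
  SKETCH_UNIVERSE.key?(universe) or
    raise "Unknown Universe: #{universe}. Try: #{SKETCH_UNIVERSE.keys}"
  SKETCH_UNIVERSE[universe][:dbs].include?(db) or
    raise "Unknown Database: #{db}. Try: #{SKETCH_UNIVERSE[universe][:dbs].keys}"
  # The seed metadata entry is keyed by "<universe>_<db>", as in the constructor.
  { ids: ids, db: db, universe: universe,
    metadata: { :"#{universe}_#{db}" => ids.join(',') } }
end
```

Calling `sketch_normalize('NC_000001', 'nuccore', 'ncbi')` would wrap the single ID in an Array and store it under the `:ncbi_nuccore` metadata key; an unregistered universe raises a RuntimeError before any database lookup happens.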
Instance Attribute Details
#db ⇒ Object (readonly)
Database storing the dataset.
# File 'lib/miga/remote_dataset.rb', line 17
def db
  @db
end
#ids ⇒ Object (readonly)
Array of IDs of the entries composing the dataset.
# File 'lib/miga/remote_dataset.rb', line 19
def ids
  @ids
end
#metadata ⇒ Object (readonly)
Internal metadata hash.
# File 'lib/miga/remote_dataset.rb', line 21
def metadata
  @metadata
end
#universe ⇒ Object (readonly)
Universe of the dataset.
# File 'lib/miga/remote_dataset.rb', line 15
def universe
  @universe
end
Class Method Details
.download(universe, db, ids, format, file = nil, extra = []) ⇒ Object
Download data from the universe in the database db with IDs ids and in format. If file is passed, the result is also saved to that file. Additional parameters specific to the download method can be passed using extra. Returns String.
# File 'lib/miga/remote_dataset/download.rb', line 14
def download(universe, db, ids, format, file = nil, extra = [])
  ids = [ids] unless ids.is_a? Array
  case @@UNIVERSE[universe][:method]
  when :rest
    doc = download_rest(universe, db, ids, format, extra)
  when :net
    doc = download_net(universe, db, ids, format, extra)
  end
  unless file.nil?
    ofh = File.open(file, 'w')
    ofh.print doc
    ofh.close
  end
  doc
end
.download_rest(universe, db, ids, format, extra = []) ⇒ Object Also known as: download_net
Download data using a REST method from the universe in the database db with IDs ids and in format. Additional URL parameters can be passed using extra. Returns the doc as String.
# File 'lib/miga/remote_dataset/download.rb', line 34
def download_rest(universe, db, ids, format, extra = [])
  u = @@UNIVERSE[universe]
  url = sprintf(u[:url], db, ids.join(","), format, *extra)
  url = u[:api_key][url] unless u[:api_key].nil?
  download_url url
end
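The URL is assembled by filling a per-universe format string with the database, the comma-joined IDs, the format, and any extra values, and is then optionally passed through an `api_key` hook. A minimal sketch of that step follows, without any network access. The template `SKETCH_REST`, its URL, the `DEMO` key, and the helper `sketch_rest_url` are all hypothetical; the sketch also assumes that `u[:api_key]` is a Proc, so that `u[:api_key][url]` calls it.

```ruby
# Hypothetical universe entry: the URL template and the key are assumptions.
SKETCH_REST = {
  url: 'https://example.org/fetch?db=%s&id=%s&rettype=%s',
  # Proc#[] is an alias of Proc#call, so api_key[url] invokes this lambda.
  api_key: ->(url) { "#{url}&api_key=DEMO" }
}.freeze

# Build the request URL the way download_rest does, but return it
# instead of downloading it.
def sketch_rest_url(u, db, ids, format, extra = [])
  url = sprintf(u[:url], db, ids.join(','), format, *extra)
  url = u[:api_key][url] unless u[:api_key].nil?
  url
end
```

With the template above, `sketch_rest_url(SKETCH_REST, :nuccore, %w[A1 B2], :fasta)` fills the three `%s` slots in order and then appends the key; an entry without `:api_key` returns the formatted URL unchanged.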
.download_url(url) ⇒ Object
Download the given url and return the result regardless of response code. Attempts the download up to three times; if the third attempt also fails, the error (for example Net::ReadTimeout) is re-raised.
# File 'lib/miga/remote_dataset/download.rb', line 50
def download_url(url)
  doc = ''
  @timeout_try = 0
  begin
    open(url, read_timeout: 600) { |f| doc = f.read }
  rescue => e
    @timeout_try += 1
    raise e if @timeout_try >= 3
    retry
  end
  doc
end
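The retry logic relies on Ruby's `begin`/`rescue`/`retry`: each failure increments a counter, and the error is re-raised once the counter reaches three. The sketch below isolates that pattern from the network call so it can be exercised with a block that fails on demand. `sketch_with_retries` is a hypothetical helper, not part of MiGA. Note also that opening URLs through `Kernel#open` requires open-uri and was removed in Ruby 3.0, where `URI.open` is used instead.

```ruby
# Run the block, retrying on any StandardError, and re-raise the last
# error once max_tries attempts have failed. The block receives the
# 1-based attempt number.
def sketch_with_retries(max_tries = 3)
  tries = 0
  begin
    yield tries + 1
  rescue StandardError => e
    tries += 1
    raise e if tries >= max_tries
    retry # re-enters the begin block with the updated counter
  end
end
```

A block that raises on its first two attempts and succeeds on the third returns normally; a block that always raises is attempted exactly three times before its error propagates to the caller.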
.ncbi_map(id, dbfrom, db) ⇒ Object
Looks for the entry id in dbfrom, and returns the linked identifier in db (or nil).
# File 'lib/miga/remote_dataset/download.rb', line 66
def ncbi_map(id, dbfrom, db)
  doc = download(:ncbi_map, dbfrom, id, :json, nil, [db])
  return if doc.empty?
  tree = JSON.parse(doc, symbolize_names: true)
  [:linksets, 0, :linksetdbs, 0, :links, 0].each do |i|
    tree = tree[i]
    break if tree.nil?
  end
  tree
end
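The lookup walks the parsed JSON along a fixed path of keys and indexes, stopping with nil as soon as any step is missing. The sketch below applies the same walk to a string instead of a download. `sketch_first_link` and `SKETCH_LINK_JSON` are hypothetical names, and the payload is made up to match the path; it is not a captured NCBI response.

```ruby
require 'json'

# Made-up payload shaped like the path walked below (an assumption).
SKETCH_LINK_JSON =
  '{"linksets":[{"linksetdbs":[{"links":["12345","67890"]}]}]}'

# Return the first linked identifier, or nil if the document is empty
# or any step of the path is absent.
def sketch_first_link(doc)
  return if doc.empty?
  tree = JSON.parse(doc, symbolize_names: true)
  [:linksets, 0, :linksetdbs, 0, :links, 0].each do |i|
    tree = tree[i]
    break if tree.nil?
  end
  tree
end
```

For well-formed input, the loop behaves much like `tree.dig(:linksets, 0, :linksetdbs, 0, :links, 0)`, which also returns nil at the first missing step.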
.UNIVERSE ⇒ Object
# File 'lib/miga/remote_dataset/base.rb', line 9
def UNIVERSE ; @@UNIVERSE ; end
Instance Method Details
#get_metadata(metadata_def = {}) ⇒ Object
Get metadata from the remote location.
# File 'lib/miga/remote_dataset.rb', line 77
def get_metadata(metadata_def = {})
  metadata_def.each { |k,v| @metadata[k] = v }
  case universe
  when :ebi, :ncbi, :web
    # Get taxonomy
    @metadata[:tax] = get_ncbi_taxonomy
  end
  @metadata = get_type_status(metadata)
end
#get_ncbi_taxid ⇒ Object
Get NCBI Taxonomy ID.
# File 'lib/miga/remote_dataset.rb', line 89
def get_ncbi_taxid
  send("get_ncbi_taxid_from_#{universe}")
end
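The method dispatches by name: it builds a method name from the universe and invokes it with `send`, so each universe needs its own `get_ncbi_taxid_from_<universe>` helper. The sketch below shows the mechanism in isolation. The class `SketchRemote`, its two helpers, and their placeholder return values are hypothetical; the helpers MiGA actually defines are not shown on this page.

```ruby
class SketchRemote
  attr_reader :universe

  def initialize(universe)
    @universe = universe.to_sym
  end

  # Build the helper name from the universe and call it dynamically.
  def get_ncbi_taxid
    send("get_ncbi_taxid_from_#{universe}")
  end

  private

  # Hypothetical per-universe helpers; the return values are placeholders.
  def get_ncbi_taxid_from_ncbi
    'taxid-from-ncbi'
  end

  def get_ncbi_taxid_from_ebi
    'taxid-from-ebi'
  end
end
```

Because `send` bypasses visibility, the helpers can stay private. A universe without a matching helper raises NoMethodError, which is the failure mode to expect from this style of dispatch.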
#get_ncbi_taxonomy ⇒ Object
Get NCBI taxonomy as MiGA::Taxonomy.
# File 'lib/miga/remote_dataset.rb', line 108
def get_ncbi_taxonomy
  tax_id = get_ncbi_taxid
  lineage = {}
  doc = MiGA::RemoteDataset.download(:ncbi, :taxonomy, tax_id, :xml)
  doc.scan(%r{<Taxon>(.*?)</Taxon>}m).map(&:first).each do |i|
    name = i.scan(%r{<ScientificName>(.*)</ScientificName>}).first.to_a.first
    rank = i.scan(%r{<Rank>(.*)</Rank>}).first.to_a.first
    rank = nil if rank == 'no rank' or rank.empty?
    rank = 'dataset' if lineage.empty? and rank.nil?
    lineage[rank] = name unless rank.nil? or rank.nil?
  end
  MiGA::Taxonomy.new(lineage)
end
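The lineage is extracted with regular expressions rather than an XML parser: each `<Taxon>` block yields a name and a rank, entries with "no rank" are skipped, and the first entry is stored under the rank `dataset` when it has no rank of its own. The sketch below runs the same extraction on an inline string. `sketch_lineage` and `SKETCH_TAXONOMY_XML` are hypothetical, the XML is a made-up flat fragment rather than a real NCBI response, and the result is a plain Hash instead of a MiGA::Taxonomy. Unlike the listing above, the sketch checks for a missing rank before calling `empty?` and skips entries without a name.

```ruby
# Made-up, flat XML fragment (an assumption, not real NCBI output).
SKETCH_TAXONOMY_XML = <<~XML
  <Taxon>
    <ScientificName>Escherichia coli K-12</ScientificName>
    <Rank>no rank</Rank>
  </Taxon>
  <Taxon>
    <ScientificName>cellular organisms</ScientificName>
    <Rank>no rank</Rank>
  </Taxon>
  <Taxon>
    <ScientificName>Bacteria</ScientificName>
    <Rank>superkingdom</Rank>
  </Taxon>
  <Taxon>
    <ScientificName>Escherichia</ScientificName>
    <Rank>genus</Rank>
  </Taxon>
XML

# Build a rank => name Hash from the Taxon blocks of the document.
def sketch_lineage(doc)
  lineage = {}
  doc.scan(%r{<Taxon>(.*?)</Taxon>}m).map(&:first).each do |i|
    name = i.scan(%r{<ScientificName>(.*)</ScientificName>}).first.to_a.first
    rank = i.scan(%r{<Rank>(.*)</Rank>}).first.to_a.first
    rank = nil if rank.nil? || rank == 'no rank' || rank.empty?
    # The first unranked entry is taken to be the dataset itself.
    rank = 'dataset' if lineage.empty? && rank.nil?
    lineage[rank] = name unless rank.nil? || name.nil?
  end
  lineage
end
```

On the fragment above, the first entry is kept under `dataset`, the second unranked entry is dropped, and the remaining two are stored under their own ranks.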
#get_type_status(metadata) ⇒ Object
Get the type material status and return an (updated) metadata hash.
# File 'lib/miga/remote_dataset.rb', line 96
def get_type_status(metadata)
  if metadata[:ncbi_asm]
    get_type_status_ncbi_asm metadata
  elsif metadata[:ncbi_nuccore]
    get_type_status_ncbi_nuccore metadata
  else
    metadata
  end
end
#save_to(project, name = nil, is_ref = true, metadata_def = {}) ⇒ Object
Save dataset to the MiGA::Project project identified with name. is_ref indicates whether it should be a reference dataset. The entries of the Hash metadata_def are merged into the dataset's metadata before saving.
# File 'lib/miga/remote_dataset.rb', line 45
def save_to(project, name = nil, is_ref = true, metadata_def = {})
  name ||= ids.join('_').miga_name
  project = MiGA::Project.new(project) if project.is_a? String
  MiGA::Dataset.exist?(project, name) and
    raise "Dataset #{name} exists in the project, aborting..."
  @metadata = get_metadata(metadata_def)
  udb = @@UNIVERSE[universe][:dbs][db]
  @metadata["#{universe}_#{db}"] = ids.join(',')
  respond_to?("save_#{udb[:stage]}_to", true) or
    raise "Unexpected error: Unsupported stage #{udb[:stage]} for #{db}."
  send "save_#{udb[:stage]}_to", project, name, udb
  dataset = MiGA::Dataset.new(project, name, is_ref, metadata)
  project.add_dataset(dataset.name)
  result = dataset.add_result(udb[:stage], true, is_clean: true)
  result.nil? and
    raise 'Empty dataset: seed result not added due to incomplete files.'
  result.clean!
  result.save
  dataset
end
#update_metadata(dataset, metadata = {}) ⇒ Object
Updates the MiGA::Dataset dataset with the remotely available metadata, and optionally the Hash metadata.
# File 'lib/miga/remote_dataset.rb', line 69
def update_metadata(dataset, metadata = {})
  metadata = get_metadata(metadata)
  metadata.each { |k,v| dataset.metadata[k] = v }
  dataset.save
end