Class: Bio::PubMed

Inherits:
Object
Defined in:
lib/bio/io/pubmed.rb

Overview

Description

The Bio::PubMed class provides several ways to retrieve bibliographic information from the PubMed database at www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed. Two types of queries are possible:

  • searching for PubMed IDs given a query string:

    • Bio::PubMed.search

    • Bio::PubMed.esearch

  • retrieving the MEDLINE text (i.e. authors, journal, abstract, …) given a PubMed ID:

    • Bio::PubMed.query

    • Bio::PubMed.pmfetch

    • Bio::PubMed.efetch

The different methods within the same group are interchangeable and should return the same result.

Additional information about the MEDLINE format and PubMed programmable APIs can be found on the following websites:

Usage

require 'bio'

# If you don't know the PubMed ID:
Bio::PubMed.search("(genome AND analysis) OR bioinformatics").each do |x|
  p x
end
Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics").each do |x|
  p x
end

# To retrieve the MEDLINE entry for a given PubMed ID:
puts Bio::PubMed.query("10592173")
puts Bio::PubMed.pmfetch("10592173")
puts Bio::PubMed.efetch("10592173", "14693808")
# This can be converted into a Bio::MEDLINE object:
manuscript = Bio::PubMed.query("10592173")
medline = Bio::MEDLINE.new(manuscript)

Class Method Summary

Class Method Details

.efetch(*ids) ⇒ Object

Retrieves PubMed entries by PMID using Entrez efetch and returns them in MEDLINE format. Multiple PubMed IDs can be provided:

Bio::PubMed.efetch(123)
Bio::PubMed.efetch(123,456,789)
Bio::PubMed.efetch([123,456,789])

Arguments:

  • ids: list of PubMed IDs (required)

Returns

Array of MEDLINE formatted Strings, one per entry (an empty Array if no IDs are given)



# File 'lib/bio/io/pubmed.rb', line 175

def self.efetch(*ids)
  return [] if ids.empty?

  host = "eutils.ncbi.nlm.nih.gov"
  path = "/entrez/eutils/efetch.fcgi?tool=bioruby&db=pubmed&retmode=text&rettype=medline&id="

  ids = ids.join(",")

  http = Bio::Command.new_http(host)
  response, = http.get(path + ids)
  result = response.body
  result = result.split(/\n\n+/)
  return result
end
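The two string-handling steps in the listing above (joining the IDs into the request path, then splitting the response on blank lines) can be sketched without network access. The helper names `efetch_path` and `split_records` and the sample MEDLINE text are hypothetical and are not part of Bio::PubMed.

```ruby
# Hypothetical helpers (not part of Bio::PubMed) that mirror the
# pure-string steps of the efetch listing.

# Build the request path. Array#join flattens nested arrays, which is
# why efetch(123, 456) and efetch([123, 456]) produce the same request.
def efetch_path(*ids)
  base = "/entrez/eutils/efetch.fcgi?tool=bioruby&db=pubmed&retmode=text&rettype=medline&id="
  base + ids.join(",")
end

# Split a response body into one string per record on blank lines.
def split_records(body)
  body.split(/\n\n+/)
end

# Hand-written sample standing in for a server response.
sample_body = "PMID- 123\nTI  - First title\n\nPMID- 456\nTI  - Second title\n"

puts efetch_path(123, 456)
puts split_records(sample_body).length
```

Because the response is split on blank lines, the caller receives one array element per record rather than a single string.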

.esearch(str, hash = {}) ⇒ Object

Searches the PubMed database for the given keywords using E-Utils and returns an array of PubMed IDs.

For information on the possible arguments, see eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html#PubMed


Arguments:

  • str: query string (required)

  • field

  • reldate

  • mindate

  • maxdate

  • datetype

  • retstart

  • retmax (default 100)

  • retmode

  • rettype

Returns

array of PubMed IDs



# File 'lib/bio/io/pubmed.rb', line 106

def self.esearch(str, hash = {})
  hash['retmax'] = 100 unless hash['retmax']

  opts = []
  hash.each do |k, v|
    opts << "#{k}=#{v}"
  end

  host = "eutils.ncbi.nlm.nih.gov"
  path = "/entrez/eutils/esearch.fcgi?tool=bioruby&db=pubmed&#{opts.join('&')}&term="

  http = Bio::Command.new_http(host)
  response, = http.get(path + CGI.escape(str))
  result = response.body
  result = result.scan(/<Id>(.*?)<\/Id>/m).flatten
  return result
end
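As a minimal offline sketch of the listing above, the following shows how the options hash becomes query parameters and how IDs are pulled out of the XML response. The helper names `esearch_path` and `extract_ids` and the sample XML are hypothetical. Unlike the listing, this sketch does not modify the caller's hash.

```ruby
require 'cgi'

# Hypothetical helper (not part of Bio::PubMed): build the esearch path
# from a query string and an options hash.
def esearch_path(str, hash = {})
  # retmax defaults to 100 unless the caller supplies it.
  opts = { 'retmax' => 100 }.merge(hash)
  query = opts.map { |k, v| "#{k}=#{v}" }.join('&')
  "/entrez/eutils/esearch.fcgi?tool=bioruby&db=pubmed&#{query}&term=" + CGI.escape(str)
end

# Hypothetical helper: collect the text of every <Id> element.
def extract_ids(xml)
  xml.scan(/<Id>(.*?)<\/Id>/m).flatten
end

# Hand-written sample standing in for a server response.
sample_xml = "<IdList>\n<Id>10592173</Id>\n<Id>14693808</Id>\n</IdList>"

puts esearch_path("genome analysis", { 'retmax' => 5 })
p extract_ids(sample_xml)
```

The IDs come back as strings, so they can be passed directly to the fetch methods.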

.pmfetch(id) ⇒ Object

Retrieves a PubMed entry by PMID using Entrez pmfetch and returns it as a MEDLINE formatted string. Raises a RuntimeError if the response reports an error for the given ID.


Arguments:

  • id: PubMed ID (required)

Returns

MEDLINE formatted String



# File 'lib/bio/io/pubmed.rb', line 151

def self.pmfetch(id)
  host = "www.ncbi.nlm.nih.gov"
  path = "/entrez/utils/pmfetch.fcgi?tool=bioruby&mode=text&report=medline&db=PubMed&id="

  http = Bio::Command.new_http(host)
  response, = http.get(path + id.to_s)
  result = response.body
  if result =~ /#{id}\s+Error/
    raise( result )
  else
    result = result.gsub("\r", "\n").squeeze("\n").gsub(/<\/?pre>/, '')
    return result
  end
end
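The error check and text cleanup in the listing above can be tried in isolation. This is a sketch under stated assumptions: the helper name `clean_medline` and both sample bodies are hypothetical, and the exact wording of real error responses is not shown in this documentation.

```ruby
# Hypothetical helper (not part of Bio::PubMed) isolating the response
# handling used by the pmfetch listing.
def clean_medline(body, id)
  # A body matching "<id> Error" is treated as a failed lookup.
  raise body if body =~ /#{id}\s+Error/

  body.gsub("\r", "\n")      # turn carriage returns into newlines
      .squeeze("\n")         # collapse runs of newlines into one
      .gsub(/<\/?pre>/, '')  # drop the <pre> and </pre> wrapper tags
end

# Hand-written sample standing in for a server response.
raw = "<pre>\r\nPMID- 10592173\r\nTI  - Example title\r\n</pre>"
puts clean_medline(raw, 10592173)
```

Since `raise` is called with a plain string, the failure surfaces as a RuntimeError whose message is the full response body.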

.query(id) ⇒ Object

Retrieves a PubMed entry by PMID using Entrez query and returns it as a MEDLINE formatted string. Raises a RuntimeError if the response reports an error for the given ID.


Arguments:

  • id: PubMed ID (required)

Returns

MEDLINE formatted String



# File 'lib/bio/io/pubmed.rb', line 130

def self.query(id)
  host = "www.ncbi.nlm.nih.gov"
  path = "/entrez/query.fcgi?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="

  http = Bio::Command.new_http(host)
  response, = http.get(path + id.to_s)
  result = response.body
  if result =~ /#{id}\s+Error/
    raise( result )
  else
    result = result.gsub("\r", "\n").squeeze("\n").gsub(/<\/?pre>/, '')
    return result
  end
end

.search(str) ⇒ Object

Searches the PubMed database for the given keywords using Entrez query and returns an array of PubMed IDs.


Arguments:

  • str: query string (required)

Returns

array of PubMed IDs



# File 'lib/bio/io/pubmed.rb', line 76

def self.search(str)
  host = "www.ncbi.nlm.nih.gov"
  path = "/entrez/query.fcgi?tool=bioruby&cmd=Search&doptcmdl=MEDLINE&db=PubMed&term="

  http = Bio::Command.new_http(host)
  response, = http.get(path + CGI.escape(str))
  result = response.body
  result = result.gsub("\r", "\n").squeeze("\n")
  result = result.scan(/<pre>(.*?)<\/pre>/m).flatten
  return result
end
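The listing above returns the text found inside each `<pre>` element of the response. A minimal offline sketch of that extraction follows. The helper name `extract_pre_blocks` and the sample page are hypothetical, and what a real response places inside `<pre>` is not shown in this documentation.

```ruby
# Hypothetical helper (not part of Bio::PubMed) mirroring the
# extraction step of the search listing.
def extract_pre_blocks(body)
  body.gsub("\r", "\n")               # turn carriage returns into newlines
      .squeeze("\n")                  # collapse runs of newlines into one
      .scan(/<pre>(.*?)<\/pre>/m)     # capture the contents of each <pre>
      .flatten                        # one string per block
end

# Hand-written sample standing in for a server response.
page = "<html><pre>\r\nfirst block\r\n</pre><p>x</p><pre>second block</pre></html>"
p extract_pre_blocks(page)
```

The `m` flag lets `.` match newlines, so a block that spans several lines is captured as a single string.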