Class: OBAClient

Inherits:
Object
Defined in:
lib/oba_client.rb

Overview

A class for interacting with the Open Biomedical Annotator. The client does two things: it fetches annotated text from the service, and it parses the XML that comes back. The two steps can be used independently or one after the other.

Constant Summary

VERSION =
"2.0.3"
DEFAULT_TIMEOUT =

A high HTTP read timeout (in seconds), as the service sometimes takes a while to respond.

30
DEFAULT_URI =

The endpoint URI for the production version of the Annotator service.

"http://rest.bioontology.org/obs/annotator"
HEADER =

The header for every request. There’s no need to specify this per-instance.

{"Content-Type" => "application/x-www-form-urlencoded"}
ANNOTATOR_PARAMETERS =

Parameters the annotator accepts. Any parameter not in this list (other than textToAnnotate) is invalid; the constructor prints a warning for it but still sends it with the request.

[
  :email,
  :filterNumber,
  :format,
  :isStopWordsCaseSensitive,
  :isVirtualOntologyID,
  :levelMax,
  :longestOnly,
  :ontologiesToExpand,
  :ontologiesToKeepInResult,
  :mappingTypes,
  :minTermSize,
  :scored,
  :semanticTypes,
  :stopWords,
  :wholeWordOnly,
  :withDefaultStopWords,
  :withSynonyms,
]
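
As a rough illustration of how such a list can be used, the sketch below splits a caller's options into recognized and unrecognized keys before a client is built. The helper `partition_options` and the shortened `KNOWN_PARAMETERS` list are hypothetical stand-ins and are not part of OBAClient.

```ruby
# Shortened copy of the parameter list, for illustration only.
KNOWN_PARAMETERS = [:email, :longestOnly, :ontologiesToExpand, :withSynonyms]

# Hypothetical helper: split options into [recognized, unrecognized] Hashes.
def partition_options(options)
  options.partition { |key, _| KNOWN_PARAMETERS.include?(key) }.map(&:to_h)
end

# Parameter names are case-sensitive Symbols, so :longestonly is not recognized.
valid, invalid = partition_options(:email => "user@example.org", :longestonly => true)
```

Checking keys up front like this catches misspelled parameter names before a request is sent.
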
STATISTICS_BEANS_XPATH =
"/success/data/annotatorResultBean/statistics/statisticsBean"
ANNOTATION_BEANS_XPATH =
"/success/data/annotatorResultBean/annotations/annotationBean"
ONTOLOGY_BEANS_XPATH =
"/success/data/annotatorResultBean/ontologies/ontologyUsedBean"
CONCEPT_ATTRIBUTES =

Attributes parsed from a concept (there is only one type of concept).

{
  :id              => lambda {|c| c.xpath("id").text.to_i},
  :localConceptId  => lambda {|c| c.xpath("localConceptId").text},
  :localOntologyId => lambda {|c| c.xpath("localOntologyId").text.to_i},
  :isTopLevel      => lambda {|c| to_b(c.xpath("isTopLevel").text)},
  :fullId          => lambda {|c| c.xpath("fullId").text},
  :preferredName   => lambda {|c| c.xpath("preferredName").text},

  :synonyms        => lambda do |c| 
    c.xpath("synonyms/synonym").map do |s|
      s.xpath("string").text
    end
  end,

  :semanticTypes   => lambda do |c| 
    c.xpath("semanticTypes/semanticTypeBean").map do |s|
      {
        :id           => s.xpath("id").text.to_i,
        :semanticType => s.xpath("semanticType").text,
        :description  => s.xpath("description").text
      }
    end
  end
}
CONTEXT_ATTRIBUTES =

Attributes shared by mapping and mgrep contexts (each adds its own attributes on top of these).

{
  :contextName     => lambda {|c| c.xpath("contextName").text},
  :isDirect        => lambda {|c| to_b(c.xpath("isDirect").text)},
  :from            => lambda {|c| c.xpath("from").text.to_i},
  :to              => lambda {|c| c.xpath("to").text.to_i},
}
ANNOTATION_CONTEXT_ATTRIBUTES =

Attributes for annotation contexts.

{
  :score   => lambda {|c| c.xpath("score").text.to_i},
  :concept => lambda {|c| parse_concept(c.xpath("concept").first)},
  :context => lambda {|c| parse_context(c.xpath("context").first)}
}
MAPPED_CONTEXT_ATTRIBUTES =

Attributes for mapping contexts.

CONTEXT_ATTRIBUTES.merge(
  :mappingType   => lambda {|c| c.xpath("mappingType").text},
  :mappedConcept => lambda {|c| parse_concept(c.xpath("mappedConcept").first)}
)
MGREP_CONTEXT_ATTRIBUTES =

Attributes for mgrep contexts.

CONTEXT_ATTRIBUTES.merge(
  :name           => lambda {|c| c.xpath("term/name").text},
  :localConceptId => lambda {|c| c.xpath("term/localConceptId").text},
  :isPreferred    => lambda {|c| to_b(c.xpath("term/isPreferred").text)},
  :dictionaryId   => lambda {|c| c.xpath("term/dictionaryId").text}
)
CONTEXT_CLASSES =

Map the bean type to the set of attributes we parse from it.

{
  "annotationContextBean" => ANNOTATION_CONTEXT_ATTRIBUTES,
  "mgrepContextBean"      => MGREP_CONTEXT_ATTRIBUTES,
  "mappingContextBean"    => MAPPED_CONTEXT_ATTRIBUTES,
}
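
The attribute tables above share one pattern: a Hash from output key to a lambda that pulls a single value out of a node, with the specialised tables built by merging onto a shared base. The sketch below shows that pattern in isolation. It assumes a plain Hash as a stand-in for a Nokogiri node, and the names `BASE_ATTRIBUTES`, `MAPPING_ATTRIBUTES`, and `extract`, as well as the sample values, are hypothetical.

```ruby
# Stand-in for an XML node: a plain Hash of element name => text.
# (The real tables call xpath on a Nokogiri node instead.)
BASE_ATTRIBUTES = {
  :contextName => lambda { |n| n["contextName"] },
  :from        => lambda { |n| n["from"].to_i },
  :to          => lambda { |n| n["to"].to_i }
}

# A specialised table adds its own extractors on top of the shared ones.
MAPPING_ATTRIBUTES = BASE_ATTRIBUTES.merge(
  :mappingType => lambda { |n| n["mappingType"] }
)

# Run every extractor against the node and collect the results in a Hash.
def extract(attributes, node)
  Hash[attributes.map { |key, extractor| [key, extractor.call(node)] }]
end

node   = {"contextName" => "MAPPING", "from" => "3", "to" => "11", "mappingType" => "manual"}
parsed = extract(MAPPING_ATTRIBUTES, node)
```

Because `merge` returns a new Hash, the base table is left unchanged and can be reused for each specialised table.
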

Constructor Details

#initialize(options = {}) ⇒ OBAClient

Instantiate the class with a set of options that are reused for every request. The options consumed by the constructor itself (all others are passed on to the annotator) are:

* {String} **uri**: the URI of the annotator service (default: 
  {DEFAULT_URI}).
* {Fixnum} **timeout**: the length of the read timeout (default: 
  {DEFAULT_TIMEOUT}).
* {Boolean} **parse_xml**: whether to parse the received text (default: 
  false).
* {Array}<{String}> **ontologies**: a pseudo-parameter which sets both
   ontologiesToExpand and ontologiesToKeepInResult.

Parameters:

  • options (Hash<Symbol, Object>) (defaults to: {})

    Parameters for the annotation.



# File 'lib/oba_client.rb', line 62

def initialize(options = {})
  @uri         = URI.parse(options.delete(:uri) || DEFAULT_URI)
  @timeout     = options.delete(:timeout)       || DEFAULT_TIMEOUT
  @parse_xml   = options.delete(:parse_xml)

  if ontologies = options.delete(:ontologies)
    [:ontologiesToExpand, :ontologiesToKeepInResult].each do |k|
      if options.include?(k)
        puts "WARNING: specified both :ontologies and #{k}, ignoring #{k}."
      end
      options[k] = ontologies
    end
  end

  @options = {}
  options.each do |k, v|
    if !ANNOTATOR_PARAMETERS.include?(k)
      puts "WARNING: #{k} is not a valid annotator parameter."
    end
    if v.is_a? Array
      @options[k] = v.join(",")
    else
      @options[k] = v
    end
  end

  if !@options.include?(:email)
    puts "TIP: as a courtesy, consider including your email in the " + 
         "request (:email => '[email protected]')"
  end
end
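
To make the effect of the :ontologies pseudo-parameter and the Array handling concrete, here is a minimal sketch of the same normalisation as a pure function. The name `normalize_options` is hypothetical, the ontology IDs are made up, and the warnings printed by the real constructor are left out.

```ruby
# Hypothetical pure-function version of the option handling (no warnings).
def normalize_options(options)
  options = options.dup # leave the caller's Hash untouched
  if (ontologies = options.delete(:ontologies))
    options[:ontologiesToExpand]       = ontologies
    options[:ontologiesToKeepInResult] = ontologies
  end
  # Arrays become comma-separated strings; other values pass through.
  Hash[options.map { |k, v| [k, v.is_a?(Array) ? v.join(",") : v] }]
end

normalized = normalize_options(:ontologies => [1032, 1009], :longestOnly => true)
# normalized[:ontologiesToExpand] and normalized[:ontologiesToKeepInResult]
# both hold the joined string "1032,1009".
```

Unlike this sketch, the constructor shown above calls `delete` directly on the Hash it receives, so a caller who wants to reuse an options Hash may prefer to pass in a copy.
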

Class Method Details

.parse(xml) ⇒ Hash<Symbol, Object>

Parse raw XML, returning a Hash with three elements: statistics, annotations, and ontologies. Respectively, these represent the annotation statistics (annotations by mapping type, etc., as a Hash), an Array of each annotation (as a Hash), and an Array of ontologies used (also as a Hash).

Parameters:

  • xml (String)

    The XML we’ll be parsing.

Returns:

  • (Hash<Symbol, Object>)

    A Hash representation of the XML, as described in the README.



# File 'lib/oba_client.rb', line 238

def self.parse(xml)
  puts "WARNING: text is empty!" if (xml.gsub(/\n/, "") == "")
  doc = Nokogiri::XML.parse(xml)

  statistics = Hash[doc.xpath(STATISTICS_BEANS_XPATH).map do |sb|
    [sb.xpath("mapping").text, sb.xpath("nbAnnotation").text.to_i]
  end]

  annotations = doc.xpath(ANNOTATION_BEANS_XPATH).map do |annotation|
    parse_context(annotation)
  end

  ontologies = doc.xpath(ONTOLOGY_BEANS_XPATH).map do |ontology|
    {
      :localOntologyId   => ontology.xpath("localOntologyId").text.to_i,
      :virtualOntologyId => ontology.xpath("virtualOntologyId").text.to_i,
      :name              => ontology.xpath("name").text,
      :version           => ontology.xpath("version").text.to_f,
      :nbAnnotation      => ontology.xpath("nbAnnotation").text.to_i
    }
  end

  {
    :statistics  => statistics,
    :annotations => annotations,
    :ontologies  => ontologies
  }
end
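
The shape of the returned Hash can be sketched as follows. Every value here is made up for illustration (the real keys of :statistics come from the service's mapping names), and the helper `summarize` is hypothetical.

```ruby
# Illustrative result; all values are invented.
result = {
  :statistics  => {"MGREP" => 2, "MAPPING" => 1},
  :annotations => [
    {
      :score   => 20,
      :concept => {:preferredName => "Melanoma"},
      :context => {:contextName => "MGREP", :from => 1, :to => 8}
    }
  ],
  :ontologies  => [{:localOntologyId => 1, :name => "Example Ontology", :nbAnnotation => 3}]
}

# Hypothetical helper: list each annotation's preferred name and text span.
def summarize(result)
  result[:annotations].map do |a|
    "#{a[:concept][:preferredName]} (#{a[:context][:from]}-#{a[:context][:to]})"
  end
end

lines = summarize(result)
```
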

.parse_concept(concept) ⇒ Hash<Symbol, Object>

Parse a concept: a top-level annotation concept, or an annotation’s mapping concept.

Parameters:

  • concept (Nokogiri::XML::Node)

    The root node of the concept.

Returns:

  • (Hash<Symbol, Object>)

    The parsed concept.



# File 'lib/oba_client.rb', line 221

def self.parse_concept(concept)
  Hash[CONCEPT_ATTRIBUTES.map do |k, v| 
    [k, v.call(concept)]
  end]
end

.parse_context(context) ⇒ Hash<Symbol, Object>

Parse a context: an annotation, or a mapping/mgrep context bean.

Parameters:

  • context (Nokogiri::XML::Node)

    The root node of the context.

Returns:

  • (Hash<Symbol, Object>)

    The parsed context.



# File 'lib/oba_client.rb', line 199

def self.parse_context(context)
  # Annotations (annotationBeans) do not have a class, so we'll refer to them
  # as annotationContextBeans. context_class will be one of the types in
  # {CONTEXT_CLASSES}.
  context_class = if context.attribute("class").nil?
    "annotationContextBean"
  else
    context.attribute("class").value
  end

  Hash[CONTEXT_CLASSES[context_class].map do |k, v|
    [k, v.call(context)]
  end]
end
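
The class lookup in this method can be sketched on its own. The version below assumes a small Struct as a stand-in for a node, and the names `FakeNode`, `KNOWN_CLASSES`, and `context_class_for` are hypothetical. It also raises an ArgumentError for an unrecognized class, which is a design choice of the sketch: in the method above, an unknown class makes the `CONTEXT_CLASSES` lookup return nil, and the following `map` call then fails with a NoMethodError.

```ruby
# Stand-in node: only the "class" attribute matters for this sketch.
FakeNode = Struct.new(:css_class)

KNOWN_CLASSES = ["annotationContextBean", "mgrepContextBean", "mappingContextBean"]

# Hypothetical helper mirroring the fallback: a node without a class
# attribute is treated as an annotation.
def context_class_for(node)
  name = node.css_class || "annotationContextBean"
  raise ArgumentError, "unknown context class: #{name}" unless KNOWN_CLASSES.include?(name)
  name
end

context_class_for(FakeNode.new(nil))                # falls back to the annotation class
context_class_for(FakeNode.new("mgrepContextBean")) # uses the attribute as given
```
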

.to_b(value) ⇒ true, false

A little helper: convert a string true/false or 1/0 value to a boolean. Any other value yields nil. AFAIK, there’s no better way to do this.

Parameters:

  • value (String)

    The value to convert.

Returns:

  • (true, false)


# File 'lib/oba_client.rb', line 274

def self.to_b(value)
  case value
  when "0"     then false
  when "1"     then true
  when "false" then false
  when "true"  then true
  end
end
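
Since the `case` has no `else` branch, any input outside the four recognized strings falls through to nil. The sketch below restates the helper as a standalone method and adds a stricter variant, `to_b!`, which is hypothetical and not part of OBAClient.

```ruby
# Standalone restatement of the helper shown above.
def to_b(value)
  case value
  when "0", "false" then false
  when "1", "true"  then true
  end
end

# Hypothetical stricter variant: fail loudly instead of returning nil.
def to_b!(value)
  result = to_b(value)
  raise ArgumentError, "not a boolean string: #{value.inspect}" if result.nil?
  result
end

to_b("1")    # true
to_b("TRUE") # nil, because the comparison is case-sensitive
to_b("")     # nil
```

Callers that treat the result as a plain boolean should keep in mind that nil is falsy, so an unrecognized value behaves like false in a condition.
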

Instance Method Details

#execute(text) ⇒ Hash<Symbol, Array>, ...

Perform the annotation.

Parameters:

  • text (String)

    The text to annotate.

Returns:

  • (Hash<Symbol, Array>, String, nil)

    A Hash representing the parsed document, the raw XML if parsing is not requested, or nil if the request times out.
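
Because the return value can be one of three types, calling code usually has to branch on it. A minimal sketch, assuming a hypothetical helper named `describe_result`:

```ruby
# Hypothetical helper: describe whatever came back from execute.
def describe_result(result)
  case result
  when Hash   then "parsed: #{result[:annotations].length} annotation(s)"
  when String then "raw XML: #{result.bytesize} bytes"
  when nil    then "request timed out"
  end
end

describe_result(nil) # "request timed out"
```
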



# File 'lib/oba_client.rb', line 100

def execute(text)
  request = Net::HTTP::Post.new(@uri.path, HEADER)
  request.body = {:textToAnnotate => text}.merge(@options).map do |k, v|
    "#{CGI.escape(k.to_s)}=#{CGI.escape(v.to_s)}"
  end.join("&")
  puts request.body if $DEBUG

  begin
    response = Net::HTTP.new(@uri.host, @uri.port).start do |http|
      http.read_timeout = @timeout
      http.request(request)
    end
    @parse_xml ? self.class.parse(response.body) : response.body
  rescue Timeout::Error
    puts "Request for #{text[0..10]}... timed-out at #{@timeout} seconds."
  end
end
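
The request body built at the top of this method is an ordinary form-encoded string. The sketch below reproduces that step as a standalone function so the encoding is easy to inspect. The name `build_body` is hypothetical, and `URI.encode_www_form_component` is used here as a stand-in for `CGI.escape`; both encode a space as `+` and `@` as `%40`.

```ruby
require "uri"

# Build a form-encoded body in the same way as execute, with
# URI.encode_www_form_component standing in for CGI.escape.
def build_body(text, options = {})
  {:textToAnnotate => text}.merge(options).map do |k, v|
    "#{URI.encode_www_form_component(k.to_s)}=#{URI.encode_www_form_component(v.to_s)}"
  end.join("&")
end

body = build_body("melanoma of the skin", :email => "user@example.org", :longestOnly => true)
# body == "textToAnnotate=melanoma+of+the+skin&email=user%40example.org&longestOnly=true"
```

Because the options are merged after the text, a :textToAnnotate key in the options would override the text argument.
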