Class: Abner

Inherits:
NER
  • Object
Defined in:
lib/rbbt/ner/abner.rb

Overview

Provides a Ruby interface to ABNER, a Java package for named entity recognition.

Class Method Summary

Instance Method Summary

Methods inherited from NER

#entities, #extract

Constructor Details

#initialize(modelfile = nil) ⇒ Abner

If modelfile is given, the custom-trained model stored in that file is used; otherwise, the default BioCreative model is used.



# File 'lib/rbbt/ner/abner.rb', line 22

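def initialize(modelfile=nil)
  Abner.init
  if modelfile.nil?
    # No model file given: use the default BioCreative model
    @tagger = @@Tagger.new(@@Tagger.BIOCREATIVE)
  else
    # Load a custom-trained model from the given file
    @tagger = @@Tagger.new(@@JFile.new(modelfile))
  end
end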
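
A minimal usage sketch for the constructor, assuming the library that provides lib/rbbt/ner/abner.rb is installed along with a working Java runtime. The helper name `load_abner` and the model path in the comment are hypothetical and only serve the illustration; the helper returns nil when the library cannot be loaded.

```ruby
# Hypothetical helper (not part of Rbbt): builds an Abner instance,
# or returns nil when the rbbt NER library is not installed.
def load_abner(modelfile = nil)
  require 'rbbt/ner/abner'
  # With no argument the default BioCreative model is used;
  # with a path, the custom-trained model in that file is loaded.
  modelfile ? Abner.new(modelfile) : Abner.new
rescue LoadError
  nil
end

default_ner = load_abner
# custom_ner = load_abner('/path/to/custom.model')  # hypothetical path
puts(default_ner ? "Abner tagger ready" : "rbbt NER library not available")
```

Wrapping the `require` in the helper keeps the sketch loadable on machines without the library, which is convenient for scripts that treat NER as an optional step.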

Class Method Details

.init ⇒ Object



# File 'lib/rbbt/ner/abner.rb', line 13

def self.init
  Rbbt.software.opt.ABNER.produce
  @@JFile   ||= Rjb::import('java.io.File')
  @@Tagger  ||= Rjb::import('abner.Tagger')
  @@Trainer ||= Rjb::import('abner.Trainer')
end

Instance Method Details

#match(text, fix_encode = true) ⇒ Object

Finds all the mentions that appear in the given text. Mentions of every entity type are returned together, for consistency with the other NER packages in Rbbt. An empty array is returned when text is nil or empty.



# File 'lib/rbbt/ner/abner.rb', line 34

def match(text, fix_encode = true)
  return [] if text.nil? || text.empty?

  # Optionally drop bytes that cannot be represented in UTF-8
  text = text.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace, :replace => '') if fix_encode
  res = @tagger.getEntities(text)
  types = res[1]
  strings = res[0]

  docid = Misc.digest(text)
  global_offset = 0 # characters already consumed from the front of text
  strings.zip(types).collect do |mention, type|
    mention = mention.to_s
    offset = text.index(mention)
    if offset.nil?
      # Mention not found verbatim in the remaining text: no offset is set
      NamedEntity.setup(mention, :docid => docid, :entity_type => type.to_s)
    else
      NamedEntity.setup(mention, :offset => offset + global_offset, :docid => docid, :entity_type => type.to_s)
      # Continue searching after this mention so repeated mentions get distinct offsets
      text = text[offset + mention.length..-1]
      global_offset += offset + mention.length
    end

    mention
  end
end
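
The offset bookkeeping in #match can be tried out without Java or ABNER. The sketch below is a standalone illustration of the same idea: search for each mention only in the part of the text that has not been consumed yet, then convert the local index into an absolute one. The helper `locate_mentions` is hypothetical and not part of Rbbt, and it returns plain pairs instead of NamedEntity objects.

```ruby
# Hypothetical helper mirroring the offset logic of #match.
# Returns [mention, absolute_offset] pairs; the offset is nil when
# the mention does not occur in the remaining text.
def locate_mentions(text, mentions)
  remaining = text.dup # part of the text not yet searched
  consumed  = 0        # number of characters cut from the front
  mentions.map do |mention|
    local = remaining.index(mention)
    if local.nil?
      [mention, nil]
    else
      absolute  = consumed + local
      cut       = local + mention.length
      remaining = remaining[cut..-1]
      consumed += cut
      [mention, absolute]
    end
  end
end

text  = "BRCA1 binds BARD1; BRCA1 also binds RAD51."
pairs = locate_mentions(text, ["BRCA1", "BARD1", "BRCA1", "RAD51"])
pairs.each { |mention, offset| puts "#{mention} at #{offset.inspect}" }
```

Because the search resumes after each match, the second "BRCA1" resolves to its own position rather than to the first occurrence. The approach relies on the mentions arriving in the order in which they appear in the text.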