Class: Treat::Workers::Lexicalizers::Taggers::Stanford

Inherits: Object

Defined in: lib/treat/workers/lexicalizers/taggers/stanford.rb

Overview

POS tagging using a maximum entropy model, with (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional log-linear models, and (iv) fine-grained modeling of unknown word features.
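To make "maximum entropy" and "conditional log-linear model" concrete, the sketch below scores the candidate tags for a single word as P(t | h) = exp(Σ_j λ_j f_j(h, t)) / Σ_t' exp(Σ_j λ_j f_j(h, t')). It is a toy, not the Stanford implementation: the feature names and weights are hypothetical, the context is reduced to one preceding-tag and one following-tag feature, and training (including the priors on the weights) is left out.

```ruby
# Toy maximum entropy (log-linear) local tag model.
# Feature names and weights are hypothetical, for illustration only.

# Weight lambda_j for each (context feature, tag) pair; missing pairs count as 0.
WEIGHTS = {
  ['word=run', 'VB']    => 1.2,
  ['word=run', 'NN']    => 0.8,
  ['prev_tag=DT', 'NN'] => 1.5,
  ['prev_tag=DT', 'VB'] => -0.5,
  ['next_tag=IN', 'VB'] => 0.3
}.freeze

# Linear score of a tag: sum of the weights of the features that fire with it.
def score(features, tag)
  features.sum { |f| WEIGHTS.fetch([f, tag], 0.0) }
end

# Normalize exp(score) over all candidate tags to get P(tag | context).
def tag_distribution(features, tags)
  exps = tags.map { |t| [t, Math.exp(score(features, t))] }.to_h
  z = exps.values.sum
  exps.transform_values { |v| v / z }
end

dist = tag_distribution(%w[word=run prev_tag=DT next_tag=IN], %w[NN VB])
```

With these made-up weights, "run" after a determiner scores 2.3 as NN and 1.0 as VB, so the model prefers NN; the real tagger combines thousands of learned weights in the same way.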

Original paper: Toutanova, Klein, Manning and Singer. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003).

Constant Summary

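DefaultOptions = { :tagger_model => nil }

Holds the default options. :tagger_model optionally names a custom POS model, which get_options passes to StanfordCoreNLP.set_model.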

@@taggers = {}

Holds one tagger per language: a cache keyed by language symbol and filled by init_tagger.

Class Method Summary

.get_options(options, language) ⇒ Object

Handle the options for the tagger.
.get_token_list(entity) ⇒ Object

Build a Java ArrayList of Word objects from the entity's tokens; returns the tokens and the list.
.init_tagger(language) ⇒ Object

Initialize the tagger for a language.
.tag(entity, options = {}) ⇒ Object

Tag a token, or the tokens of a larger entity, using one of the Stanford taggers.

Class Method Details

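.get_options(options, language) ⇒ Object

Handle the options for the tagger: merge the caller's options over DefaultOptions, register a custom POS model if :tagger_model is given, and set :tag_set to the tag set used for the language.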

# File 'lib/treat/workers/lexicalizers/taggers/stanford.rb', line 76

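def self.get_options(options, language)
  # Caller options take precedence over DefaultOptions.
  options = DefaultOptions.merge(options)
  # Point the StanfordCoreNLP gem at a custom POS model, if one was given.
  if options[:tagger_model]
    StanfordCoreNLP.set_model('pos.model',
                              options[:tagger_model])
  end
  # Record which tag set this language's tagger produces.
  options[:tag_set] =
    StanfordCoreNLP::Config::TagSets[language]
  options
end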
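The option handling in get_options can be tried without the Stanford gem. In this sketch, TAG_SETS is a hypothetical stand-in for StanfordCoreNLP::Config::TagSets, its tag-set symbols are illustrative only, and the set_model call is omitted.

```ruby
# Stand-in for DefaultOptions.
DEFAULT_OPTIONS = { :tagger_model => nil }.freeze

# Hypothetical stand-in for StanfordCoreNLP::Config::TagSets.
TAG_SETS = { :english => :penn, :french => :paris7 }.freeze

def get_options(options, language)
  # Hash#merge returns a new hash; keys from the caller override the defaults.
  options = DEFAULT_OPTIONS.merge(options)
  # (The real method would call StanfordCoreNLP.set_model here when
  # options[:tagger_model] is set.)
  options[:tag_set] = TAG_SETS[language]
  options
end

# '/path/to/custom.tagger' is a placeholder path.
opts = get_options({ :tagger_model => '/path/to/custom.tagger' }, :english)
```

Because merge builds a new hash, writing :tag_set into the result changes neither the defaults nor the caller's hash; a language missing from the table simply yields a nil :tag_set.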

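.get_token_list(entity) ⇒ Object

Build a Java ArrayList of StanfordCoreNLP::Word objects from the entity (a single token, or all of its tokens) and return both the Ruby tokens and the list.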

# File 'lib/treat/workers/lexicalizers/taggers/stanford.rb', line 88

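def self.get_token_list(entity)
  # Java list of Word objects to pass to the tagger.
  list = StanfordCoreNLP::ArrayList.new
  # A single token is tagged on its own; otherwise tag all of the
  # entity's tokens.
  if entity.is_a?(Treat::Entities::Token)
    tokens = [entity]
  else
    tokens = entity.tokens
  end
  tokens.each do |token|
    list.add(StanfordCoreNLP::Word.new(token.to_s))
  end
  # Return the Ruby tokens too, so tags can be mapped back by position.
  return tokens, list
end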
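A plain-Ruby sketch of what get_token_list does. FakeToken, FakeSentence and FakeWord are hypothetical stand-ins for the Treat entity classes and StanfordCoreNLP::Word, and an ordinary Array replaces the Java ArrayList.

```ruby
# Hypothetical stand-ins for Treat entities and the Java Word class.
FakeToken = Struct.new(:text) do
  def to_s
    text
  end
end
FakeSentence = Struct.new(:tokens)
FakeWord = Struct.new(:word)

def get_token_list(entity)
  # Wrap a lone token in an array so both cases share one code path.
  tokens = entity.is_a?(FakeToken) ? [entity] : entity.tokens
  # One word object per token, in the same order.
  list = tokens.map { |token| FakeWord.new(token.to_s) }
  # Two return values arrive as one array; the caller destructures it.
  return tokens, list
end

sentence = FakeSentence.new([FakeToken.new('The'), FakeToken.new('cat')])
tokens, list = get_token_list(sentence)
single_tokens, single_list = get_token_list(FakeToken.new('ran'))
```

Keeping the token array alongside the word list is what lets the caller write each tag back onto the right token by index.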

.init_tagger(language) ⇒ Object

Initialize the tagger for a language.

# File 'lib/treat/workers/lexicalizers/taggers/stanford.rb', line 61

def self.init_tagger(language)
  unless @@taggers[language]
    # Set up the StanfordCoreNLP bindings for this language.
    Treat::Loaders::Stanford.load(language)
    # Load the Java MaxentTagger class only once.
    unless StanfordCoreNLP.const_defined?('MaxentTagger')
      StanfordCoreNLP.load_class('MaxentTagger',
                                 'edu.stanford.nlp.tagger.maxent')
    end
    # Build the tagger from this language's POS model and cache it.
    model = Treat::Loaders::Stanford.find_model(:pos, language)
    tagger = StanfordCoreNLP::MaxentTagger.new(model)
    @@taggers[language] = tagger
  end
  @@taggers[language]
end
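The point of init_tagger is that loading a tagger model is expensive, so it happens at most once per language. The sketch below shows the same lazy check-then-store pattern; FakeTagger and the load counter are hypothetical stand-ins for building a MaxentTagger.

```ruby
# Hypothetical stand-in for StanfordCoreNLP::MaxentTagger.
class FakeTagger
  attr_reader :language

  def initialize(language)
    @language = language
  end
end

class TaggerCache
  attr_reader :loads

  def initialize
    @taggers = {} # plays the role of @@taggers: one tagger per language
    @loads = 0    # counts how many "models" were actually loaded
  end

  # Load a tagger only the first time a language is requested.
  def tagger_for(language)
    unless @taggers[language]
      @loads += 1
      @taggers[language] = FakeTagger.new(language)
    end
    @taggers[language]
  end
end

cache = TaggerCache.new
a = cache.tagger_for(:english)
b = cache.tagger_for(:english)
c = cache.tagger_for(:french)
```

Repeated requests for :english return the same object, and only two loads occur. Like the original, the check and the store are not synchronized, so concurrent first calls for one language could each load a model.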

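.tag(entity, options = {}) ⇒ Object

Tag a token, or the tokens of a larger entity, using one of the Stanford taggers. Sentences, phrases, fragments and other groups receive a one-letter code ('S', 'P', 'F', 'G') instead of a POS tag; for an isolated token, the tag is returned.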

# File 'lib/treat/workers/lexicalizers/taggers/stanford.rb', line 25

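def self.tag(entity, options = {})

  # Handle tags for sentences and phrases. The options have not been
  # merged through get_options yet, so :tag_set is only present here
  # if the caller supplied it.
  if entity.is_a?(Treat::Entities::Group) &&
     !entity.parent_sentence
    tag_set = options[:tag_set]
    entity.set :tag_set, tag_set
  end

  # Groups get a one-letter code instead of a POS tag; the specific
  # group types are checked before the generic Group.
  return 'S' if entity.is_a?(Treat::Entities::Sentence)
  return 'P' if entity.is_a?(Treat::Entities::Phrase)
  return 'F' if entity.is_a?(Treat::Entities::Fragment)
  return 'G' if entity.is_a?(Treat::Entities::Group)

  # Handle options and initialize the tagger.
  lang = entity.language.intern
  init_tagger(lang) unless @@taggers[lang]
  options = get_options(options, lang)
  tokens, t_list = get_token_list(entity)

  # Do the tagging; output is aligned with the input tokens by position.
  i = 0
  isolated_token = entity.is_a?(Treat::Entities::Token)

  @@taggers[lang].apply(t_list).each do |tok|
    # Drop any suffix after a hyphen, but keep tags such as '-LRB-'
    # that begin with one ('-LRB-'.split('-').first would be '').
    tag = tok.tag
    tag = tag.split('-').first unless tag.start_with?('-')
    tokens[i].set(:tag, tag)
    if isolated_token
      tokens[i].set(:tag_set, options[:tag_set])
      return tok.tag
    end
    i += 1
  end

end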
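Two details of .tag are easy to get wrong: the order of the group checks and the tag cleanup. The sketch below isolates both with hypothetical stand-in classes, assuming (as the order of the checks suggests) that Sentence, Phrase and Fragment are subclasses of Group, so the generic Group test must come last. It also shows why a bare split('-').first is risky: '-LRB-'.split('-') is ["", "LRB"], which would leave an empty tag for a parenthesis.

```ruby
# Hypothetical stand-ins for the Treat group classes.
class Group; end
class Sentence < Group; end
class Phrase < Group; end
class Fragment < Group; end

# case/when matches with Class#=== (an is_a? test), so the first hit wins.
def group_code(entity)
  case entity
  when Sentence then 'S'
  when Phrase   then 'P'
  when Fragment then 'F'
  when Group    then 'G'
  end
end

# Keep only the part before the first hyphen (e.g. a hypothetical 'NN-X'
# becomes 'NN'), but leave tags that start with a hyphen, such as '-LRB-'.
def normalize_tag(tag)
  tag.start_with?('-') ? tag : tag.split('-').first
end

# Tagger output is matched to the input tokens purely by position.
def align(tokens, tagged)
  tokens.each_with_index.map { |tok, i| [tok, normalize_tag(tagged[i])] }
end

pairs = align(%w[( cats ) run], %w[-LRB- NNS -RRB- VBP])
```

Here pairs pairs each token with its cleaned tag, and the parentheses keep '-LRB-' and '-RRB-' rather than empty strings.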
