Class: Treat::Workers::Lexicalizers::Taggers::Brill

Inherits:
Object
Defined in:
lib/treat/workers/lexicalizers/taggers/brill.rb

Overview

Part-of-speech (POS) tagging using a set of rules developed by Eric Brill.

Original paper: Eric Brill. 1992. A simple rule-based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing.

Constant Summary

@@tagger = nil

Holds a single shared instance of the tagger, created on the first call to .tag.

Class Method Summary

Class Method Details

.tag(entity, options = {}) ⇒ Object

Tags words using a native Brill tagger. Performs its own tokenization.

Options (see the rbtagger gem for more info):

  :lexicon => String (lexicon file to use)
  :lexical_rules => String (lexical rules file to use)
  :contextual_rules => String (contextual rules file to use)
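Because the tagger is stored in @@tagger and created with ||=, these options appear to be read only on the first call; later calls reuse the cached instance whatever options they pass. A minimal sketch of that behaviour follows. FakeTagger and TaggerCache are hypothetical stand-ins written for this illustration and are not part of Treat or rbtagger, and the file names are made up.

```ruby
# Hypothetical stand-in for ::Brill::Tagger: it only records its arguments.
class FakeTagger
  attr_reader :lexicon, :lexical_rules, :contextual_rules

  def initialize(lexicon = nil, lexical_rules = nil, contextual_rules = nil)
    @lexicon = lexicon
    @lexical_rules = lexical_rules
    @contextual_rules = contextual_rules
  end
end

# Hypothetical module mimicking the "create once, then reuse" pattern.
module TaggerCache
  @tagger = nil

  def self.tagger(options = {})
    # Built on the first call only; later option hashes are ignored.
    @tagger ||= FakeTagger.new(options[:lexicon],
                               options[:lexical_rules],
                               options[:contextual_rules])
  end
end

first  = TaggerCache.tagger(lexicon: 'my_lexicon.txt')
second = TaggerCache.tagger(lexicon: 'other_lexicon.txt')

puts first.equal?(second) # prints "true": the same instance is returned
puts second.lexicon       # prints "my_lexicon.txt": the second hash was ignored
```

If different rule files are needed within one process, the cached instance would have to be reset first; nothing in the listing below does that.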



# File 'lib/treat/workers/lexicalizers/taggers/brill.rb', line 23

def self.tag(entity, options = {})

  # Create the tagger if necessary
  @@tagger ||= ::Brill::Tagger.new(options[:lexicon],
                                   options[:lexical_rules],
                                   options[:contextual_rules])
  
  isolated_token = entity.is_a?(Treat::Entities::Token)
  tokens = isolated_token ? [entity] : entity.tokens
  tokens_s = tokens.map { |t| t.value }
  
  tags = @@tagger.tag_tokens( tokens_s )

  pairs = tokens.zip(tags)

  pairs.each do |pair|
    pair[0].set :tag, pair[1]
    pair[0].set :tag_set, :penn if isolated_token
    return pair[1] if isolated_token
  end
  
  if entity.is_a?(Treat::Entities::Group) &&
     !entity.parent_sentence
    entity.set :tag_set, :penn
  end
  
  return 'S' if entity.is_a?(Treat::Entities::Sentence)
  return 'P' if entity.is_a?(Treat::Entities::Phrase)
  return 'F' if entity.is_a?(Treat::Entities::Fragment)
  return 'G' if entity.is_a?(Treat::Entities::Group)
end
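
The core of the method is the pairing of tokens with tags: token values are passed to the tagger as an array of strings, and the returned tags are zipped back onto the tokens in order. The sketch below reproduces only that step. Token, TinyTagger and tag_all are hypothetical stand-ins for this illustration (not Treat or rbtagger classes), and the three-word tag table is invented.

```ruby
# Hypothetical stand-in for a Treat token: a value plus a feature hash.
Token = Struct.new(:value, :features) do
  def set(name, value)
    features[name] = value
  end
end

# Hypothetical stand-in for the tagger: looks words up in a tiny table
# and falls back to 'NN' for anything it does not know.
class TinyTagger
  TAGS = { 'the' => 'DT', 'cat' => 'NN', 'sleeps' => 'VBZ' }.freeze

  def tag_tokens(words)
    words.map { |word| TAGS.fetch(word.downcase, 'NN') }
  end
end

# Tag every token in place, mirroring the zip-and-set loop in the listing.
def tag_all(tokens, tagger)
  tags = tagger.tag_tokens(tokens.map(&:value))
  tokens.zip(tags).each { |token, tag| token.set(:tag, tag) }
  tokens
end

tokens = %w[The cat sleeps].map { |word| Token.new(word, {}) }
tag_all(tokens, TinyTagger.new)

tokens.each { |t| puts "#{t.value}/#{t.features[:tag]}" }
# prints:
#   The/DT
#   cat/NN
#   sleeps/VBZ
```

The zip relies on the tagger returning exactly one tag per input string, in the same order; the listing above makes the same assumption.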