Class: Treat::Workers::Lexicalizers::Taggers::Brill
- Inherits: Object
- Defined in: lib/treat/workers/lexicalizers/taggers/brill.rb
Overview
Part-of-speech (POS) tagging using a set of rules developed by Eric Brill.
Original paper: Eric Brill. 1992. A simple rule-based part of speech tagger. In Proceedings of the third conference on Applied natural language processing.
Constant Summary
- @@tagger = nil
  Hold one instance of the tagger.
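Because the tagger is held in a single class variable, it is presumably built once and reused by later calls. The sketch below illustrates that memoization pattern in isolation. `FakeTagger` and `MemoizedWorker` are hypothetical stand-ins, not part of Treat or rbtagger. One consequence worth noting: options passed after the first call do not replace the tagger that already exists.

```ruby
# Hypothetical stand-in for the real tagger; it only counts instantiations.
class FakeTagger
  @instances = 0
  class << self
    attr_accessor :instances
  end

  attr_reader :lexicon

  def initialize(lexicon)
    self.class.instances += 1
    @lexicon = lexicon
  end
end

# Hypothetical worker that mirrors the "hold one instance" pattern.
class MemoizedWorker
  # Shared by every call to .tagger.
  @@tagger = nil

  def self.tagger(options = {})
    # ||= assigns only while @@tagger is nil (or false).
    @@tagger ||= FakeTagger.new(options[:lexicon])
  end
end

first  = MemoizedWorker.tagger(lexicon: 'a.txt')
second = MemoizedWorker.tagger(lexicon: 'b.txt')

puts first.equal?(second)  # true: the same object is returned
puts second.lexicon        # a.txt: the second call's option was ignored
puts FakeTagger.instances  # 1
```

If different rule files are needed within one process, the shared instance would have to be reset first; this sketch does not attempt that.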
Class Method Summary
- .tag(entity, options = {}) ⇒ Object
  Tag words using a native Brill tagger.
Class Method Details
.tag(entity, options = {}) ⇒ Object
Tag words using a native Brill tagger. Performs own tokenization.
Options (see the rbtagger gem for more info):
- :lexicon => String (lexicon file to use)
- :lexical_rules => String (lexical rules file to use)
- :contextual_rules => String (contextual rules file to use)
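As a rough illustration of how these three keys are read, the sketch below collects them from an options hash in the order listed. `brill_arguments` is a hypothetical helper written for this example, and the file path is a placeholder. Any key that is omitted simply reads as nil; how rbtagger treats a nil argument is not documented here and should be checked against that gem.

```ruby
# Hypothetical helper: gathers the three documented option keys in order.
def brill_arguments(options = {})
  [
    options[:lexicon],          # lexicon file, or nil if not given
    options[:lexical_rules],    # lexical rules file, or nil if not given
    options[:contextual_rules]  # contextual rules file, or nil if not given
  ]
end

p brill_arguments
# prints [nil, nil, nil]

p brill_arguments(lexicon: '/path/to/LEXICON')
# prints ["/path/to/LEXICON", nil, nil]
```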
# File 'lib/treat/workers/lexicalizers/taggers/brill.rb', line 23

    def self.tag(entity, options = {})
      # Create the tagger if necessary
      @@tagger ||= ::Brill::Tagger.new(options[:lexicon],
        options[:lexical_rules], options[:contextual_rules])

      # A lone token is tagged by itself; any other entity contributes its tokens
      isolated_token = entity.is_a?(Treat::Entities::Token)
      tokens = isolated_token ? [entity] : entity.tokens
      tokens_s = tokens.map { |t| t.value }

      tags = @@tagger.tag_tokens(tokens_s)

      # Pair each token with the tag at the same position
      pairs = tokens.zip(tags)
      pairs.each do |pair|
        pair[0].set :tag, pair[1]
        pair[0].set :tag_set, :penn if isolated_token
        return pair[1] if isolated_token
      end

      if entity.is_a?(Treat::Entities::Group) && !entity.parent_sentence
        entity.set :tag_set, :penn
      end

      # Non-token entities get a one-letter code for their own type
      return 'S' if entity.is_a?(Treat::Entities::Sentence)
      return 'P' if entity.is_a?(Treat::Entities::Phrase)
      return 'F' if entity.is_a?(Treat::Entities::Fragment)
      return 'G' if entity.is_a?(Treat::Entities::Group)
    end
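The core of the method is pairing each token with the tag at the same position and storing that tag on the token. The following is a minimal, self-contained sketch of that step only. `FakeToken`, `FAKE_TAGS`, `fake_tag_tokens`, and `tag_all` are hypothetical stand-ins, and the lookup table takes the place of the real Brill rules, so the tags shown are illustrative rather than actual tagger output.

```ruby
# Hypothetical stand-in for a Treat token: a value plus a feature hash.
class FakeToken
  attr_reader :value, :features

  def initialize(value)
    @value = value
    @features = {}
  end

  def set(key, val)
    @features[key] = val
  end
end

# Hypothetical lookup table used in place of the real tagger.
FAKE_TAGS = { 'the' => 'DT', 'cat' => 'NN', 'sleeps' => 'VBZ' }.freeze

# Returns one tag per input string, in the same order.
def fake_tag_tokens(values)
  values.map { |v| FAKE_TAGS.fetch(v, 'NN') }
end

# Tags every token in place and returns the list of tags.
def tag_all(tokens)
  tags = fake_tag_tokens(tokens.map(&:value))
  # zip aligns token i with tag i, as in the method above
  tokens.zip(tags).each { |token, tag| token.set(:tag, tag) }
  tags
end

tokens = %w[the cat sleeps].map { |w| FakeToken.new(w) }
tag_all(tokens)
tokens.each { |t| puts "#{t.value}/#{t.features[:tag]}" }
# prints:
#   the/DT
#   cat/NN
#   sleeps/VBZ
```

This pairing relies on the tagger returning exactly one tag per input string; if it returned fewer, `zip` would pad the missing positions with nil.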