Class: Treat::Workers::Lexicalizers::Taggers::Lingua
- Inherits: Object
  - Object
  - Treat::Workers::Lexicalizers::Taggers::Lingua
- Defined in: lib/treat/workers/lexicalizers/taggers/lingua.rb
Overview
Part-of-speech (POS) tagger for English text, based on tag statistics from the Penn Treebank. The tagger applies a bigram (two-word) Hidden Markov Model to guess the appropriate POS tag for a word.
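The bigram idea can be sketched as follows: for each candidate tag of a word, multiply the probability of that tag following the previous tag by the probability of the word carrying that tag, then keep the best-scoring tag. The tables, the numbers, and the `guess_tag` helper are invented for illustration; they are not the tagger's actual data or API.

```ruby
# Toy tables; every number here is made up for the example.
# TRANSITIONS[prev_tag][tag] ~ P(tag | prev_tag)
TRANSITIONS = {
  'dt' => { 'nn' => 0.6,  'vb' => 0.01 },
  'to' => { 'nn' => 0.05, 'vb' => 0.7 }
}.freeze

# LEXICON[word][tag] ~ P(tag | word)
LEXICON = {
  'run' => { 'nn' => 0.3, 'vb' => 0.7 }
}.freeze

# Hypothetical helper: pick the tag with the highest combined score.
# Returns nil when the word is not in the toy lexicon.
def guess_tag(prev_tag, word)
  candidates = LEXICON.fetch(word, {})
  return nil if candidates.empty?

  best = candidates.max_by do |tag, word_prob|
    # Unseen transitions get a small floor instead of zero.
    TRANSITIONS.fetch(prev_tag, {}).fetch(tag, 0.0001) * word_prob
  end
  best.first
end

puts guess_tag('dt', 'run') # nn
puts guess_tag('to', 'run') # vb
```

The same word receives a different tag depending on the tag to its left, which is the behavior a bigram model is meant to capture.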
Constant Summary
- DefaultOptions =
Hold the default options.
{ :relax => false }
- Punctuation =
Map the punctuation tags used by the EngTagger gem to the standard Penn Treebank (PTB) tags.
{ 'pp' => '.', 'pps' => ';', 'ppc' => ',', 'ppd' => '$', 'ppl' => 'lrb', 'ppr' => 'rrb' }
- @@tagger =
Hold one instance of the tagger.
nil
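As a rough sketch of how the Punctuation table is used, the helper below mirrors the normalization steps that appear in the body of .tag: empty results fall back to 'fw', two gem-specific tags are renamed, punctuation tags are mapped, and the result is upcased. The `normalize_tag` name is hypothetical, and the real method interleaves these steps with other bookkeeping.

```ruby
# Same mapping as the Punctuation constant.
PUNCTUATION = {
  'pp' => '.', 'pps' => ';', 'ppc' => ',',
  'ppd' => '$', 'ppl' => 'lrb', 'ppr' => 'rrb'
}.freeze

# Hypothetical helper that applies the normalization steps in order.
def normalize_tag(raw)
  t = raw
  t = 'fw' if t.nil? || t == ''        # no tag found: fall back to 'fw'
  t = 'prp$' if t == 'prps'            # rename to the PTB spelling
  t = 'dt' if t == 'det'               # rename to the PTB spelling
  t = PUNCTUATION[t] if PUNCTUATION[t] # map punctuation tags
  t.upcase
end

puts normalize_tag('prps') # PRP$
puts normalize_tag('ppl')  # LRB
puts normalize_tag('pp')   # .
puts normalize_tag(nil)    # FW
```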
Class Method Summary
- .tag(entity, options = {}) ⇒ Object
Tag the word using a probabilistic model taking into account known words found in a lexicon and the tag of the previous word.
Class Method Details
.tag(entity, options = {}) ⇒ Object
Tag the word using a probabilistic model taking into account known words found in a lexicon and the tag of the previous word.
Options:
- (Boolean) :relax => Relax the HMM; this may improve accuracy for uncommon words, particularly words used polysemously.
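The sketch below illustrates two things about option handling, under the assumption that the merged options hash is what gets passed to the tagger's constructor: caller options override the defaults, and, because the tagger instance is created once with `||=`, options passed on a later call would not reach an instance that already exists. `FakeTagger` and `Demo` are hypothetical stand-ins, not part of Treat or EngTagger.

```ruby
DEFAULT_OPTIONS = { :relax => false }.freeze

# Hypothetical stand-in for the real tagger; it only records its options.
class FakeTagger
  attr_reader :conf

  def initialize(options = {})
    @conf = options
  end
end

module Demo
  @@tagger = nil

  # Merge caller options over the defaults, then create the tagger
  # only if no instance has been created yet.
  def self.tagger_for(options = {})
    merged = DEFAULT_OPTIONS.merge(options)
    @@tagger ||= FakeTagger.new(merged)
  end
end

first  = Demo.tagger_for(:relax => true)
second = Demo.tagger_for # no options: the defaults alone say :relax => false

puts first.equal?(second) # true: the same instance is reused
puts second.conf[:relax]  # true: the first call's options are kept
```

If the real tagger behaves the same way, the value of :relax is effectively fixed by the first call in a process.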
# File 'lib/treat/workers/lexicalizers/taggers/lingua.rb', line 39

def self.tag(entity, options = {})
  options = DefaultOptions.merge(options)

  @@tagger ||= ::EngTagger.new(options)
  left_tag = @@tagger.conf[:current_tag] = 'pp'
  isolated_token = entity.is_a?(Treat::Entities::Token)
  tokens = isolated_token ? [entity] : entity.tokens

  tokens.each do |token|
    next if token.to_s == ''
    w = @@tagger.clean_word(token.to_s)
    t = @@tagger.assign_tag(left_tag, w)
    t = 'fw' if t.nil? || t == ''
    @@tagger.conf[:current_tag] = left_tag = t
    t = 'prp$' if t == 'prps'
    t = 'dt' if t == 'det'
    t = Punctuation[t] if Punctuation[t]
    token.set :tag, t.upcase
    token.set :tag_set, :penn if isolated_token
    return t.upcase if isolated_token
  end

  if entity.is_a?(Treat::Entities::Group) &&
     !entity.parent_sentence
    entity.set :tag_set, :penn
  end

  return 'S' if entity.is_a?(Treat::Entities::Sentence)
  return 'P' if entity.is_a?(Treat::Entities::Phrase)
  return 'F' if entity.is_a?(Treat::Entities::Fragment)
  return 'G' if entity.is_a?(Treat::Entities::Group)
end
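The core of the method is the loop that carries the previous tag forward as left context. A minimal, self-contained sketch of that pattern is shown here; the `ASSIGN` lambda is an invented lookup standing in for the tagger's `assign_tag(left_tag, word)` call, and `tag_sequence` is a hypothetical name.

```ruby
# Invented stand-in for assign_tag: the tag for 'run' depends on the left tag.
ASSIGN = lambda do |left_tag, word|
  case word
  when 'the' then 'det'
  when 'run' then left_tag == 'det' ? 'nn' : 'vb'
  when '.'   then 'pp'
  end
end

# Hypothetical helper: tag words left to right, threading the previous tag.
def tag_sequence(words)
  left_tag = 'pp' # start with the tag that Punctuation maps to '.'
  words.reject(&:empty?).map do |word|
    t = ASSIGN.call(left_tag, word)
    t = 'fw' if t.nil? || t == '' # same fallback as in .tag
    left_tag = t                  # the next word sees this tag on its left
    [word, t]
  end
end

p tag_sequence(%w[the run .]) # [["the", "det"], ["run", "nn"], [".", "pp"]]
p tag_sequence(%w[run])       # [["run", "vb"]]
```

Note that the left context stores the raw tag (for example 'det'), as in the method above, where renaming to PTB spellings happens only after `left_tag` has been updated.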