Class: Treat::Workers::Lexicalizers::Taggers::Stanford
- Inherits: Object
  - Object
    - Treat::Workers::Lexicalizers::Taggers::Stanford
- Defined in: lib/treat/workers/lexicalizers/taggers/stanford.rb
Overview
POS tagging using a maximum entropy model, with (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features.
Original paper: Toutanova, Manning, Klein and Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.
Constant Summary
- DefaultOptions = { :tagger_model => nil }
  Holds the default options.
- @@taggers = {}
  Holds one tagger per language.
Class Method Summary
- .get_options(options, language) ⇒ Object
  Handle the options for the tagger.
- .get_token_list(entity) ⇒ Object
  Retrieve a Java ArrayList object.
- .init_tagger(language) ⇒ Object
  Initialize the tagger for a language.
- .tag(entity, options = {}) ⇒ Object
  Tag the word using one of the Stanford taggers.
Class Method Details
.get_options(options, language) ⇒ Object
Handle the options for the tagger.
    # File 'lib/treat/workers/lexicalizers/taggers/stanford.rb', line 76
    def self.get_options(options, language)
      options = DefaultOptions.merge(options)
      if options[:tagger_model]
        StanfordCoreNLP.set_model('pos.model',
        options[:tagger_model])
      end
      options[:tag_set] =
      StanfordCoreNLP::Config::TagSets[language]
      options
    end
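The merge order in `.get_options` means caller-supplied keys override the defaults, while the tag set is always derived from the language. A minimal, self-contained sketch of that pattern follows. `DEFAULTS`, `TAG_SETS`, and `resolve_options` are hypothetical stand-ins, not Treat or StanfordCoreNLP names, and the tag set values are illustrative only.

```ruby
# Hypothetical stand-ins; not part of Treat or StanfordCoreNLP.
DEFAULTS = { :tagger_model => nil }.freeze
TAG_SETS = { :english => :penn, :french => :paris7 }.freeze # illustrative values

def resolve_options(options, language)
  # Caller-supplied keys win over the defaults; missing keys are filled in.
  merged = DEFAULTS.merge(options)
  # The tag set is looked up from the language and overwrites any caller value.
  merged[:tag_set] = TAG_SETS[language]
  merged
end

p resolve_options({}, :english)
p resolve_options({ :tagger_model => 'custom.tagger' }, :french)
```

Because `Hash#merge` returns a new hash, the frozen defaults are never mutated between calls.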
.get_token_list(entity) ⇒ Object
Retrieve a Java ArrayList object.
    # File 'lib/treat/workers/lexicalizers/taggers/stanford.rb', line 88
    def self.get_token_list(entity)
      list = StanfordCoreNLP::ArrayList.new
      if entity.is_a?(Treat::Entities::Token)
        tokens = [entity]
      else
        tokens = entity.tokens
      end
      tokens.each do |token|
        list.add(StanfordCoreNLP::Word.new(token.to_s))
      end
      return tokens, list
    end
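`.get_token_list` normalizes its input so that a lone token and a container of tokens are handled the same way, then returns two parallel lists. The sketch below illustrates that shape in plain Ruby. `Token`, `Phrase`, and `token_pair` are hypothetical stand-ins for the Treat entities, and a Ruby array of strings replaces the Java ArrayList of `Word` objects.

```ruby
# Hypothetical stand-ins for Treat entities; not the library's classes.
Token = Struct.new(:value) do
  def to_s
    value
  end
end
Phrase = Struct.new(:tokens)

def token_pair(entity)
  # A lone token is wrapped in an array; containers expose #tokens.
  tokens = entity.is_a?(Token) ? [entity] : entity.tokens
  # Build the parallel list of plain strings handed to the tagger.
  words = tokens.map(&:to_s)
  return tokens, words
end

tokens, words = token_pair(Phrase.new([Token.new('green'), Token.new('ideas')]))
puts words.join(' ')
```

Returning the original tokens alongside the converted list lets the caller write results back to the entities by position.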
.init_tagger(language) ⇒ Object
Initialize the tagger for a language.
    # File 'lib/treat/workers/lexicalizers/taggers/stanford.rb', line 61
    def self.init_tagger(language)
      unless @@taggers[language]
        Treat::Loaders::Stanford.load(language)
        unless StanfordCoreNLP.const_defined?('MaxentTagger')
          StanfordCoreNLP.load_class('MaxentTagger',
          'edu.stanford.nlp.tagger.maxent')
        end
        model = Treat::Loaders::Stanford.find_model(:pos,language)
        tagger = StanfordCoreNLP::MaxentTagger.new(model)
        @@taggers[language] = tagger
      end
      @@taggers[language]
    end
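`.init_tagger` builds a tagger only on the first request for a language and reuses it afterwards. A minimal sketch of that per-language cache follows. `FakeTagger` and `TaggerCache` are hypothetical names, with `FakeTagger` standing in for the Java-backed tagger, and the sketch uses a class-level instance variable where the original uses the class variable `@@taggers`.

```ruby
# Hypothetical sketch of a per-language cache; FakeTagger stands in
# for the real tagger, which is expensive to construct.
class FakeTagger
  attr_reader :language

  def initialize(language)
    @language = language
  end
end

class TaggerCache
  @taggers = {}
  @loads = 0

  class << self
    attr_reader :loads

    def fetch(language)
      # Build the tagger only on the first request for this language.
      @taggers[language] ||= begin
        @loads += 1
        FakeTagger.new(language)
      end
    end
  end
end

a = TaggerCache.fetch(:english)
b = TaggerCache.fetch(:english)
puts a.equal?(b)       # true: the same object is reused
puts TaggerCache.loads # 1: the tagger was built once
```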
.tag(entity, options = {}) ⇒ Object
Tag the word using one of the Stanford taggers.
    # File 'lib/treat/workers/lexicalizers/taggers/stanford.rb', line 25
    def self.tag(entity, options = {})

      # Handle tags for sentences and phrases.
      if entity.is_a?(Treat::Entities::Group) &&
        !entity.parent_sentence
        tag_set = options[:tag_set]
        entity.set :tag_set, tag_set
      end

      return 'S' if entity.is_a?(Treat::Entities::Sentence)
      return 'P' if entity.is_a?(Treat::Entities::Phrase)
      return 'F' if entity.is_a?(Treat::Entities::Fragment)
      return 'G' if entity.is_a?(Treat::Entities::Group)

      # Handle options and initialize the tagger.
      lang = entity.language.intern
      init_tagger(lang) unless @@taggers[lang]
      options = get_options(options, lang)
      tokens, t_list = get_token_list(entity)

      # Do the tagging.
      i = 0
      isolated_token = entity.is_a?(Treat::Entities::Token)

      @@taggers[lang].apply(t_list).each do |tok|
        tokens[i].set(:tag, tok.tag.split('-').first)
        tokens[i].set(:tag_set,
        options[:tag_set]) if isolated_token
        return tok.tag if isolated_token
        i += 1
      end

    end
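The loop in `.tag` pairs each token with the tagger's output by position and stores only the part of the tag before the first hyphen. The sketch below shows that pairing in plain Ruby. `TaggedWord` and `assign_tags` are hypothetical names, and the hyphenated tag `'VBP-X'` is an invented value used only to exercise the split.

```ruby
# Hypothetical stand-in: TaggedWord mimics an object exposing #tag.
TaggedWord = Struct.new(:tag)

def assign_tags(tokens, tagged)
  # Walk both lists in step; keep only the part of the tag before any '-'.
  tokens.zip(tagged).map do |token, word|
    [token, word.tag.split('-').first]
  end
end

tagged = [TaggedWord.new('NNS'), TaggedWord.new('VBP-X')] # 'VBP-X' is illustrative
pairs = assign_tags(%w[dogs bark], tagged)
pairs.each { |token, tag| puts "#{token}/#{tag}" }
```

Using `zip` avoids the manual index counter, at the cost of assuming both lists have the same length.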