Class: Treat::Workers::Processors::Parsers::Stanford

Inherits:
Object
Defined in:
lib/treat/workers/processors/parsers/stanford.rb

Overview

Parsing using an interface to the Stanford Parser, a Java implementation of probabilistic natural language parsers: optimized PCFG and lexicalized dependency parsers, as well as a lexicalized PCFG parser.

Original paper: Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. Proceedings of the 41st Meeting of the Association for Computational Linguistics, pp. 423-430.

Constant Summary

Pttc =
Treat.tags.aligned.phrase_tags_to_category
DefaultOptions =
{ model: nil }
@@parsers =

Holds one parser instance per language and model file.

{}

Class Method Summary

Class Method Details

.get_token_list(entity) ⇒ Object



# File 'lib/treat/workers/processors/parsers/stanford.rb', line 80

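# Build a Java ArrayList of Stanford Word objects, one per token
# of the entity, in the form expected by the parser.
def self.get_token_list(entity)
  list = StanfordCoreNLP::ArrayList.new
  entity.tokens.each do |token|
    list.add(StanfordCoreNLP::Word.new(token.to_s))
  end
  list
end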

.parse(entity, options = {}) ⇒ Object

Parse the entity using the Stanford parser.



# File 'lib/treat/workers/processors/parsers/stanford.rb', line 20

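def self.parse(entity, options = {})

  lang = entity.language.intern

  # Load the Stanford bindings and look up the tag set for this language.
  Treat::Loaders::Stanford.load(lang)
  tag_set = StanfordCoreNLP::Config::TagSets[lang]

  # Collect the tokens before clearing the entity's children; the
  # parse tree built below replaces them.
  list = get_token_list(entity)
  entity.remove_all!

  # Use the caller's model if given, otherwise the language default.
  model_file = options[:model] ||
               StanfordCoreNLP::Config::Models[:parse][lang]

  # Load each parser only once per language and model file.
  unless @@parsers[lang] && @@parsers[lang][model_file]
    model_path   = Treat.libraries.stanford.model_path ||
                   StanfordCoreNLP.model_path
    model_folder = StanfordCoreNLP::Config::ModelFolders[:parse]
    model = File.join(model_path, model_folder, model_file)
    @@parsers[lang] ||= {}
    # Named parser_options so it does not shadow the options argument.
    parser_options = StanfordCoreNLP::Options.new
    parser = StanfordCoreNLP::LexicalizedParser
             .getParserFromFile(model, parser_options)
    @@parsers[lang][model_file] = parser
  end

  parser = @@parsers[lang][model_file]

  text = parser.apply(list)

  # Start from the first child of the returned tree and rebuild
  # its structure under the entity.
  recurse(text.children[0], entity, tag_set)
  entity.set :tag_set, tag_set

end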
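
The lookup in .parse follows a two-level memoization pattern: parsers are cached first by language and then by model file, so each model is loaded at most once. Below is a minimal sketch of that pattern in plain Ruby, without the Java bridge. ParserCache, fetch, loads, and the loader lambda are hypothetical names introduced for illustration, the model file names are only examples, and the returned string stands in for a real LexicalizedParser object.

```ruby
# Hypothetical stand-in for the @@parsers class variable; not part of Treat.
class ParserCache
  attr_reader :loads

  def initialize
    @parsers = {} # { lang => { model_file => parser } }
    @loads = 0    # counts how many times the loader block actually ran
  end

  # Returns the cached parser for (lang, model_file), calling the block
  # to load it only on the first request for that pair.
  def fetch(lang, model_file)
    @parsers[lang] ||= {}
    @parsers[lang][model_file] ||= begin
      @loads += 1
      yield(model_file)
    end
  end
end

cache = ParserCache.new
# Placeholder for an expensive model load (assumption for this sketch).
loader = ->(file) { "parser for #{file}" }

first  = cache.fetch(:english, 'englishPCFG.ser.gz', &loader)
second = cache.fetch(:english, 'englishPCFG.ser.gz', &loader)
other  = cache.fetch(:english, 'englishFactored.ser.gz', &loader)
```

In this sketch the second call with the same language and model file returns the object created by the first call, while a different model file triggers a separate load.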

.recurse(java_node, ruby_node, tag_set) ⇒ Object



# File 'lib/treat/workers/processors/parsers/stanford.rb', line 55

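# Walk the Java parse tree and mirror it under ruby_node: nodes whose
# tag is a known phrase tag for the tag set become phrases, and all
# other nodes become tokens.
def self.recurse(java_node, ruby_node, tag_set)

  java_node.children.each do |java_child|

    label = java_child.label
    tag = label.get(:category).to_s

    if Pttc[tag] && Pttc[tag][tag_set]
      # Phrase-level node: create a phrase and descend into it.
      ruby_child = Treat::Entities::Phrase.new
      ruby_child.set :tag, tag
      ruby_node << ruby_child
      unless java_child.children.empty?
        recurse(java_child, ruby_child, tag_set)
      end
    else
      # Word-level node: its first child holds the token text.
      val = java_child.children[0].to_s
      ruby_child = Treat::Entities::Token.from_string(val)
      ruby_child.set :tag, tag
      ruby_node << ruby_child
    end

  end

end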
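
The traversal in .recurse can be tried out without the Java bridge. The following is a rough sketch under stated assumptions, not the library's implementation: JavaNode, RubyNode, PHRASE_TAGS, and convert are hypothetical stand-ins for the Stanford tree nodes, the Treat entities, the Pttc lookup, and .recurse respectively, and the three phrase tags are an arbitrary subset chosen for the example.

```ruby
# Stand-ins for the Java tree nodes and the Treat entities; every name
# here is hypothetical and exists only for this sketch.
JavaNode = Struct.new(:tag, :children) do
  def to_s
    tag
  end
end
RubyNode = Struct.new(:type, :tag, :value, :children)

# Tags treated as phrase categories in this sketch (assumed subset).
PHRASE_TAGS = %w[S NP VP].freeze

def convert(java_node, ruby_node)
  java_node.children.each do |java_child|
    tag = java_child.tag
    if PHRASE_TAGS.include?(tag)
      # Phrase-level node: mirror it and descend into its children.
      ruby_child = RubyNode.new(:phrase, tag, nil, [])
      ruby_node.children << ruby_child
      convert(java_child, ruby_child) unless java_child.children.empty?
    else
      # Word-level node: its first child is the leaf holding the word.
      word = java_child.children[0].to_s
      ruby_node.children << RubyNode.new(:token, tag, word, [])
    end
  end
  ruby_node
end

leaf = ->(word) { JavaNode.new(word, []) }
tree = JavaNode.new('S', [
  JavaNode.new('NP', [
    JavaNode.new('DT', [leaf.call('The')]),
    JavaNode.new('NN', [leaf.call('dog')])
  ]),
  JavaNode.new('VP', [JavaNode.new('VBZ', [leaf.call('barks')])])
])

sentence = convert(tree, RubyNode.new(:sentence, 'S', nil, []))
```

Here the sentence node ends up with two phrase children (NP and VP), and each phrase holds token nodes that carry the part-of-speech tag and the word text.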