Method: Treat::Workers::Processors::Tokenizers::Stanford.tokenize

Defined in:
lib/treat/workers/processors/tokenizers/stanford.rb

.tokenize(entity, options = {}) ⇒ Object

Perform tokenization of the entity and add the resulting tokens as its children.

Options:

  • (Boolean) :directional_quotes => Whether to attempt to get correct directional quotes, replacing straight quotes ("…") with directional ones (“…”). Off by default.
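Since the option is off by default, a caller only needs to pass it to opt in. The following is a minimal sketch of how such a default can be combined with caller-supplied options through `Hash#merge`. The names `DEFAULT_OPTIONS` and `effective_options` are hypothetical stand-ins for illustration, not identifiers taken from Treat.

```ruby
# Hypothetical stand-in for the tokenizer's default options.
# The documented default for :directional_quotes is off.
DEFAULT_OPTIONS = { directional_quotes: false }.freeze

# Caller-supplied keys override the defaults; missing keys fall back to them.
def effective_options(options = {})
  DEFAULT_OPTIONS.merge(options)
end

defaults_only = effective_options                            # option stays off
opted_in      = effective_options(directional_quotes: true)  # caller opts in

puts defaults_only.inspect
puts opted_in.inspect
```

Because `Hash#merge` returns a new hash, the frozen defaults are never mutated by a call.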



# File 'lib/treat/workers/processors/tokenizers/stanford.rb', line 26

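def self.tokenize(entity, options = {})
  # Make sure the Stanford CoreNLP bindings are loaded.
  Treat::Loaders::Stanford.load
  # Caller-supplied options take precedence over the defaults.
  options = DefaultOptions.merge(options)
  # Build the tokenizer pipeline once and reuse it across calls.
  @@tokenizer ||= StanfordCoreNLP.load(:tokenize)
  # Tokenizing an entity that already has children is not allowed.
  entity.check_hasnt_children
  # Wrap the entity's text in an annotation and run the tokenizer on it.
  text = ::StanfordCoreNLP::Annotation.new(entity.to_s)
  @@tokenizer.annotate(text)
  # Attach the resulting tokens to the entity as children.
  add_tokens(entity, text.get(:tokens), options)
end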
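
The listing above follows a recognizable flow: build a shared tokenizer lazily with `||=`, refuse entities that already have children, then attach the tokens as children. The sketch below mirrors that flow in plain Ruby so it can run without Java or CoreNLP. `SketchEntity`, `SketchTokenizer`, the `builds` counter, and the regex splitter are all hypothetical stand-ins. In particular, the regex is not the Stanford tokenization algorithm, and the error class raised by the guard is an assumption.

```ruby
# Hypothetical minimal entity: holds text and a list of child tokens.
class SketchEntity
  attr_reader :children

  def initialize(text)
    @text = text
    @children = []
  end

  def to_s
    @text
  end

  # Assumed behaviour of a "has no children" guard: raise if any exist.
  def check_hasnt_children
    raise ArgumentError, 'entity already has children' unless @children.empty?
  end
end

module SketchTokenizer
  # Counts how often the stand-in tokenizer is built (for illustration only).
  @builds = 0

  class << self
    attr_reader :builds
  end

  def self.tokenize(entity)
    # Built on the first call, reused on every later call.
    @splitter ||= begin
      @builds += 1
      /\w+|[^\w\s]/ # words, or single punctuation characters
    end
    entity.check_hasnt_children
    entity.to_s.scan(@splitter).each { |token| entity.children << token }
    entity
  end
end

entity = SketchTokenizer.tokenize(SketchEntity.new('Hello, world!'))
puts entity.children.inspect
```

Running the sketch prints `["Hello", ",", "world", "!"]`. Calling `SketchTokenizer.tokenize` a second time on the same entity raises, because the guard sees the children added by the first call.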