Method: Treat::Workers::Processors::Tokenizers::Stanford.tokenize
Defined in: lib/treat/workers/processors/tokenizers/stanford.rb
.tokenize(entity, options = {}) ⇒ Object
Perform tokenization of the entity and add the resulting tokens as its children.
Options:
- (Boolean) :directional_quotes => Whether to attempt to get correct directional quotes, replacing straight quotes ("…") with directional ones. Off by default.
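The option is resolved by merging the caller's hash over a set of defaults, so `:directional_quotes` stays `false` unless it is passed explicitly. The sketch below illustrates that merge and a naive straight-to-curly quote conversion. `DEFAULT_OPTIONS`, `resolve_options`, and `directionalize` are hypothetical names. Treat's real `DefaultOptions` may hold other keys, and the Stanford tokenizer may emit a different quote style (for example PTB-style quotes) rather than the Unicode curly quotes assumed here.

```ruby
# Hypothetical defaults mirroring the documented option; Treat's real
# DefaultOptions constant may contain additional keys.
DEFAULT_OPTIONS = { directional_quotes: false }.freeze

# Caller-supplied options override the defaults (plain Hash#merge semantics).
def resolve_options(options = {})
  DEFAULT_OPTIONS.merge(options)
end

# Illustrative only (an assumption, not the Stanford behavior): replace
# straight double quotes with alternating opening/closing curly quotes.
def directionalize(text)
  opening = true
  text.gsub('"') do
    quote = opening ? "\u201C" : "\u201D"
    opening = !opening
    quote
  end
end
```

With no arguments, `resolve_options` returns `{ directional_quotes: false }`; passing `directional_quotes: true` turns the behavior on. The alternating replacement is deliberately simple and does not handle nested or unbalanced quotes.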
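# File 'lib/treat/workers/processors/tokenizers/stanford.rb', line 26

def self.tokenize(entity, options = {})
  # Make sure the Stanford CoreNLP bindings are loaded.
  Treat::Loaders::Stanford.load
  # Caller-supplied options override the defaults.
  options = DefaultOptions.merge(options)
  # Build the tokenizer once and reuse it across calls.
  @@tokenizer ||= StanfordCoreNLP.load(:tokenize)
  # Check that the entity has no children yet.
  entity.check_hasnt_children
  text = ::StanfordCoreNLP::Annotation.new(entity.to_s)
  @@tokenizer.annotate(text)
  add_tokens(entity, text.get(:tokens), options)
end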
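To show the control flow without the Java-backed CoreNLP dependency, here is a minimal self-contained sketch of the same steps: guard against existing children, lazily build one shared tokenizer, tokenize the entity's text, and attach each token as a child. `Entity`, `RegexTokenizer`, and `SketchTokenizer` are hypothetical stand-ins, and the regex split is an assumption that only approximates what the Stanford tokenizer does.

```ruby
# Hypothetical stand-in for a Treat entity: it only holds text and children.
class Entity
  attr_reader :children

  def initialize(text)
    @text = text
    @children = []
  end

  def to_s
    @text
  end

  # Mirrors the guard in the original code: raise if already tokenized.
  def check_hasnt_children
    raise ArgumentError, "entity already has children" unless @children.empty?
  end

  def <<(child)
    @children << child
    self
  end
end

# Hypothetical tokenizer standing in for the Stanford CoreNLP annotator:
# words are runs of \w, and each punctuation mark is its own token.
class RegexTokenizer
  def tokenize(text)
    text.scan(/\w+|[^\w\s]/)
  end
end

module SketchTokenizer
  # Built once and reused, like @@tokenizer ||= ... in the original.
  def self.tokenizer
    @tokenizer ||= RegexTokenizer.new
  end

  # options is accepted for signature parity but ignored in this sketch.
  def self.tokenize(entity, options = {})
    entity.check_hasnt_children
    tokenizer.tokenize(entity.to_s).each { |token| entity << token }
    entity
  end
end
```

Calling `SketchTokenizer.tokenize(Entity.new("Hello, world!"))` leaves the entity with the children `["Hello", ",", "world", "!"]`, and a second call on the same entity raises `ArgumentError`. The sketch memoizes with a module instance variable rather than a class variable; either works for a single shared instance.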