Class: Treat::Workers::Processors::Segmenters::Stanford

Inherits: Object

Defined in: lib/treat/workers/processors/segmenters/stanford.rb

Overview

Detects sentence boundaries by first tokenizing the text and then deciding whether each period ends a sentence or serves another purpose (abbreviations, etc.). The resulting tokens are then grouped into sentences.
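The tokenize-then-group idea described above can be illustrated with a small toy sketch. This is not the Stanford segmenter: the abbreviation list and the helper names (`toy_tokenize`, `sentence_final?`, `toy_segment`) are hypothetical and exist only for illustration.

```ruby
# Toy illustration only; the real segmenter delegates to Stanford's tools.
# TOY_ABBREVIATIONS is a hypothetical, hand-picked list.
TOY_ABBREVIATIONS = %w[Dr. Mr. Mrs. etc. e.g. i.e.].freeze

# Split the text into whitespace-delimited tokens.
def toy_tokenize(text)
  text.split(/\s+/).reject(&:empty?)
end

# A token ends a sentence if it ends in '.', '!' or '?'
# and is not a known abbreviation.
def sentence_final?(token)
  token.match?(/[.!?]\z/) && !TOY_ABBREVIATIONS.include?(token)
end

# Group the tokens into sentences (arrays of tokens).
def toy_segment(text)
  sentences = []
  current = []
  toy_tokenize(text).each do |token|
    current << token
    if sentence_final?(token)
      sentences << current
      current = []
    end
  end
  # Keep any trailing tokens that lack final punctuation.
  sentences << current unless current.empty?
  sentences
end

toy_sentences = toy_segment("Dr. Smith arrived. He was late.")
```

Here "Dr." is treated as an abbreviation, so the sample text yields two sentences rather than three.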

Constant Summary

DefaultOptions =

{
  :also_tokenize => false
}

@@segmenter =

nil

Keeps one copy of the Stanford Core NLP pipeline.
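A class variable initialized to nil is commonly paired with lazy initialization so that an expensive object is built only once. The sketch below shows that general pattern under the assumption that this is how the cached pipeline is used; `ToyPipeline` and `ToySegmenter` are hypothetical stand-ins, not Treat or Stanford classes.

```ruby
# Hypothetical stand-in for an expensive-to-build pipeline object.
class ToyPipeline
  @@build_count = 0

  def initialize
    @@build_count += 1
  end

  # Number of pipelines constructed so far.
  def self.build_count
    @@build_count
  end
end

class ToySegmenter
  # Starts out as nil, as in the constant summary above.
  @@segmenter = nil

  # Build the pipeline on first use, then reuse the same instance.
  def self.pipeline
    @@segmenter ||= ToyPipeline.new
  end
end

first  = ToySegmenter.pipeline
second = ToySegmenter.pipeline
```

Both calls return the same object, so the costly construction happens only once per process.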

Class Method Summary

.segment(entity, options = {}) ⇒ Object

Segment sentences using the sentence splitter supplied by the Stanford parser.

Class Method Details

.segment(entity, options = {}) ⇒ Object

Segment sentences using the sentence splitter supplied by the Stanford parser. For better performance when tokens are also needed, set the :also_tokenize option to true; the segmenter will then also add the tokens as children of the sentences.

Options:

(Boolean) :also_tokenize - Whether to also add the tokens as children of the sentence. Defaults to false.
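As a rough illustration of how a default-options hash and the :also_tokenize flag might interact, the sketch below merges caller options over defaults and attaches tokens only when asked. It assumes a simple punctuation-based splitter and plain hashes in place of Treat entities; `TOY_DEFAULT_OPTIONS` and `toy_segment_with_options` are hypothetical names, and this is not the library's actual implementation.

```ruby
# Mirrors the shape of DefaultOptions; the name is hypothetical.
TOY_DEFAULT_OPTIONS = { :also_tokenize => false }.freeze

# Returns one hash per sentence. Tokens are attached as :children
# only when :also_tokenize is true.
def toy_segment_with_options(text, options = {})
  # Caller-supplied options override the defaults.
  options = TOY_DEFAULT_OPTIONS.merge(options)

  # Naive splitter: a run of non-terminal characters plus an
  # optional terminal punctuation mark.
  pieces = text.scan(/[^.!?]+[.!?]?/).map(&:strip).reject(&:empty?)

  pieces.map do |sentence|
    node = { :text => sentence, :children => [] }
    node[:children] = sentence.split(/\s+/) if options[:also_tokenize]
    node
  end
end

plain     = toy_segment_with_options("One two. Three.")
tokenized = toy_segment_with_options("One two. Three.", :also_tokenize => true)
```

With the default options each sentence has no children; passing `:also_tokenize => true` fills in the tokens during the same pass.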