Module: Splitta

Defined in:
lib/splitta.rb,
lib/splitta/doc.rb,
lib/splitta/frag.rb,
lib/splitta/model.rb,
lib/splitta/version.rb,
lib/splitta/word_tokenizer.rb

Overview

A list of (regexp, repl) pairs is applied in sequence, and the resulting string is then split on whitespace. (Adapted from the Punkt Word Tokenizer.)
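
The rewrite-then-split approach described above can be sketched as follows. This is a minimal illustration only: EXAMPLE_RULES and example_tokenize are hypothetical names, and the two rules are made up for the example; they are not the rules the gem actually ships.

```ruby
# Hypothetical rules for illustration; NOT the gem's real rule list.
EXAMPLE_RULES = [
  [/([,;:!?])/, ' \1 '], # pad common punctuation with spaces
  [/\.\s*\z/, ' .']      # detach a period at the very end of the string
].freeze

# Hypothetical helper: apply each (regexp, repl) pair in order,
# feeding the result of one substitution into the next, then
# split the rewritten string on whitespace.
def example_tokenize(text, rules = EXAMPLE_RULES)
  rewritten = rules.reduce(text) do |str, (pattern, repl)|
    str.gsub(pattern, repl)
  end
  rewritten.split
end

example_tokenize('Hello, world.')
# => ["Hello", ",", "world", "."]
```

Because the pairs run in sequence, rule order matters: a later rule sees the spacing introduced by an earlier one.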

Defined Under Namespace

Modules: WordTokenizer

Classes: Doc, Frag, Model

Constant Summary

VERSION = '4.2.5'

Current gem version

Class Method Summary

Class Method Details

.sentences(text) ⇒ Object



# File 'lib/splitta.rb', line 17

def self.sentences(text)
  Doc.new(text, model: Model.instance).segments.map(&:strip)
end