Module: Splitta
- Defined in:
  - lib/splitta.rb
  - lib/splitta/doc.rb
  - lib/splitta/frag.rb
  - lib/splitta/model.rb
  - lib/splitta/version.rb
  - lib/splitta/word_tokenizer.rb
Overview
A list of (regexp, repl) pairs applied in sequence to the input text; the resulting string is then split on whitespace. (Adapted from the Punkt Word Tokenizer.)
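The rule-based approach described above can be sketched as follows. This is a minimal illustration, not the gem's actual rule list: the constant `SKETCH_RULES`, the method `sketch_tokenize`, and the three patterns are all hypothetical names and rules chosen for the example.

```ruby
# Hypothetical (regexp, repl) pairs; the real WordTokenizer rules are not shown on this page.
SKETCH_RULES = [
  [/([,;:!?])/, ' \1 '],    # pad common punctuation with spaces
  [/\.\s*\z/, ' .'],        # split off a period at the very end of the text
  [/(\w)'s\b/, "\\1 's"]    # separate a trailing possessive/contraction 's
].freeze

# Apply every rule in order, then split the rewritten string on whitespace.
def sketch_tokenize(text)
  rewritten = SKETCH_RULES.reduce(text) do |str, (pattern, repl)|
    str.gsub(pattern, repl)
  end
  rewritten.split
end

p sketch_tokenize("Hello, world.")
# => ["Hello", ",", "world", "."]
```

Because each rule sees the output of the previous one, the order of the pairs matters: a later rule can rely on spacing that an earlier rule introduced.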
Defined Under Namespace
Modules: WordTokenizer
Classes: Doc, Frag, Model
Constant Summary
- VERSION = '4.2.5'
  Current gem version
Class Method Summary
- .sentences(text) ⇒ Object
Class Method Details
.sentences(text) ⇒ Object
# File 'lib/splitta.rb', line 17
def self.sentences(text)
  Doc.new(text, model: Model.instance).segments.map(&:strip)
end
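To show the shape of the call chain in `.sentences` (build a `Doc` with a shared model, take its `segments`, strip each one), here is a self-contained sketch. Everything inside `SplittaSketch` is a stand-in written for this example: the real `Doc` and `Model` classes are not shown on this page, and the naive punctuation-based boundary rule below is an assumption, not the gem's model.

```ruby
require 'singleton'

module SplittaSketch
  # Stand-in for the model: a single shared instance, as Model.instance suggests.
  class Model
    include Singleton

    # Naive assumption: a boundary follows ., ! or ? when whitespace
    # and a capital letter come next. Zero-width, so no text is consumed.
    def boundary_pattern
      /(?<=[.!?])(?=\s+[A-Z])/
    end
  end

  # Stand-in for the document: holds the text and cuts it at model boundaries.
  class Doc
    def initialize(text, model:)
      @text = text
      @model = model
    end

    # Raw segments keep their leading whitespace; the caller strips it.
    def segments
      @text.split(@model.boundary_pattern)
    end
  end

  # Same shape as the method documented above.
  def self.sentences(text)
    Doc.new(text, model: Model.instance).segments.map(&:strip)
  end
end

p SplittaSketch.sentences("Hi there. How are you? Fine.")
# => ["Hi there.", "How are you?", "Fine."]
```

With the gem itself installed, the equivalent call would presumably be `Splitta.sentences(text)`, returning an array of stripped sentence strings.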