Module: Splitta
Defined in:
- lib/splitta.rb
- lib/splitta/doc.rb
- lib/splitta/frag.rb
- lib/splitta/model.rb
- lib/splitta/version.rb
- lib/splitta/word_tokenizer.rb
Overview
Word tokenization uses a list of (regexp, repl) pairs that are applied to the input in sequence. The resulting string is then split on whitespace. (Adapted from the Punkt Word Tokenizer.)
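The approach described above can be sketched in a few lines of Ruby. This is a minimal illustration only: the `RULES` constant, its three patterns, and the `sketch_tokenize` method are hypothetical names invented for this example and are not taken from Splitta's source, whose actual rule list is not shown on this page.

```ruby
# Hypothetical rule list for illustration; NOT Splitta's actual rules.
# Each entry is a [regexp, replacement] pair.
RULES = [
  [/([,;:!?])/, ' \1 '],   # pad common punctuation with spaces
  [/\.\s*\z/, ' .'],       # detach a sentence-final period
  [/(\w)'s\b/, "\\1 's"]   # split off the possessive clitic
].freeze

# Apply every rule in order, then split the rewritten string on whitespace.
def sketch_tokenize(text)
  rewritten = RULES.reduce(text) do |str, (regexp, repl)|
    str.gsub(regexp, repl)
  end
  rewritten.split
end

p sketch_tokenize("Hello, world's end.")
# prints ["Hello", ",", "world", "'s", "end", "."]
```

Because the rules run in sequence, each one sees the output of the previous one, so the order of the pairs matters. Splitting on whitespace at the end means the rules only need to insert spaces where token boundaries belong.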
Defined Under Namespace
Modules: WordTokenizer
Classes: Doc, Frag, Model
Constant Summary
- VERSION = '4.2.5'
Current gem version