Class: PragmaticTokenizer::FullStopSeparator
- Inherits: Object
- Defined in: lib/pragmatic_tokenizer/full_stop_separator.rb
Overview
This class separates true full stops while ignoring periods that are part of an abbreviation.
Constant Summary
- REGEXP_ENDS_WITH_DOT = /\A(.*\w)\.\z/
- REGEXP_ONLY_LETTERS = /\A[a-z]\z/i
- REGEXP_ABBREVIATION = /[a-z](?:\.[a-z])+\z/i
- DOT = '.'.freeze
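To make the patterns concrete, the snippet below copies the three regular expressions from the summary and checks them against a few sample strings. The sample strings and local variable names are illustrative assumptions. How the class combines these patterns internally is not shown on this page.

```ruby
# Patterns copied from the Constant Summary so the snippet runs without the gem.
REGEXP_ENDS_WITH_DOT = /\A(.*\w)\.\z/
REGEXP_ONLY_LETTERS  = /\A[a-z]\z/i
REGEXP_ABBREVIATION  = /[a-z](?:\.[a-z])+\z/i

# A token ending in a word character plus a dot matches; group 1 is the stem.
stem = "hello.".match(REGEXP_ENDS_WITH_DOT)[1]

# A bare dot has no word character before it, so it does not match.
bare_dot = REGEXP_ENDS_WITH_DOT.match?(".")

# Exactly one ASCII letter, in either case.
single_letter = REGEXP_ONLY_LETTERS.match?("A")
two_letters   = REGEXP_ONLY_LETTERS.match?("ab")

# Letters joined by dots, anchored at the end of the string.
dotted = REGEXP_ABBREVIATION.match?("u.s")
plain  = REGEXP_ABBREVIATION.match?("hello")
```

Here `stem` is `"hello"`, `single_letter` and `dotted` are true, and `bare_dot`, `two_letters`, and `plain` are false.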
Instance Method Summary
- #initialize(tokens:, abbreviations:, downcase:) ⇒ FullStopSeparator (constructor): a new instance of FullStopSeparator.
- #separate ⇒ Object
Constructor Details
#initialize(tokens:, abbreviations:, downcase:) ⇒ FullStopSeparator
Returns a new instance of FullStopSeparator.
    # File 'lib/pragmatic_tokenizer/full_stop_separator.rb', line 13

    def initialize(tokens:, abbreviations:, downcase:)
      @tokens        = tokens
      @abbreviations = abbreviations
      @downcase      = downcase
    end
Instance Method Details
#separate ⇒ Object
    # File 'lib/pragmatic_tokenizer/full_stop_separator.rb', line 19

    def separate
      @cleaned_tokens = create_cleaned_tokens
      replace_last_token unless @cleaned_tokens.empty?
      @cleaned_tokens
    end
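The private helpers called by #separate (create_cleaned_tokens and replace_last_token) are not documented on this page, so the following is only a rough sketch of the general idea in the overview, not the gem's implementation. The module name FullStopSketch, its separate method, and the convention that abbreviations are stored lowercased without a trailing dot are all assumptions made for illustration. The downcase option is ignored here.

```ruby
require "set"

# Hypothetical stand-in, NOT the gem's code: it only illustrates splitting
# a trailing full stop off a token unless the stem is a known abbreviation.
module FullStopSketch
  ENDS_WITH_DOT = /\A(.*\w)\.\z/ # same pattern as REGEXP_ENDS_WITH_DOT
  DOT = "."

  # tokens: Array of Strings; abbreviations: Set of lowercased stems (assumed).
  def self.separate(tokens, abbreviations)
    tokens.flat_map do |token|
      match = ENDS_WITH_DOT.match(token)
      next [token] unless match # no dot directly after a word character

      stem = match[1]
      # Keep the dot attached for abbreviations, otherwise emit it separately.
      abbreviations.include?(stem.downcase) ? [token] : [stem, DOT]
    end
  end
end

tokens = ["mr.", "smith", "left."]
result = FullStopSketch.separate(tokens, Set["mr"])
```

With this input, `result` is `["mr.", "smith", "left", "."]`: the dot after "mr" stays attached because "mr" is in the abbreviation set, while the dot after "left" becomes its own token.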