Class: PragmaticTokenizer::FullStopSeparator

Inherits:
Object
Defined in:
lib/pragmatic_tokenizer/full_stop_separator.rb

Overview

This class separates true full stops while ignoring periods that are part of an abbreviation.

Constant Summary

REGEXP_ENDS_WITH_DOT =
/\A(.*\w)\.\z/
REGEXP_ONLY_LETTERS =
/\A[a-z]\z/i
REGEXP_ABBREVIATION =
/[a-z](?:\.[a-z])+\z/i
DOT =
'.'.freeze
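
To make the patterns concrete, the snippet below copies the constants from the listing above and probes them with a few sample strings. The sample strings are illustrative assumptions; how the class itself combines these patterns is not shown on this page. Note that REGEXP_ONLY_LETTERS, despite its name, matches exactly one letter.

```ruby
# Constants copied verbatim from the summary above so the snippet runs alone.
REGEXP_ENDS_WITH_DOT = /\A(.*\w)\.\z/
REGEXP_ONLY_LETTERS  = /\A[a-z]\z/i
REGEXP_ABBREVIATION  = /[a-z](?:\.[a-z])+\z/i
DOT                  = '.'.freeze

# String#[] with a regexp and a group index returns that capture or nil.
'hello.'[REGEXP_ENDS_WITH_DOT, 1]    # => "hello"
'U.S.'[REGEXP_ENDS_WITH_DOT, 1]      # => "U.S"
'.'[REGEXP_ENDS_WITH_DOT, 1]         # => nil (no word character before the dot)

# Matches a single letter only, case-insensitively.
REGEXP_ONLY_LETTERS.match?('A')      # => true
REGEXP_ONLY_LETTERS.match?('ab')     # => false

# Matches dotted letter sequences at the end of a string.
REGEXP_ABBREVIATION.match?('e.g')    # => true
REGEXP_ABBREVIATION.match?('U.S.')   # => false (trailing dot still attached)
REGEXP_ABBREVIATION.match?('hello')  # => false
```

The second and seventh probes together suggest why the trailing dot would need to be stripped before testing for an abbreviation: "U.S." fails REGEXP_ABBREVIATION, while its captured stem "U.S" passes.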

Instance Method Summary

Constructor Details

#initialize(tokens:, abbreviations:, downcase:) ⇒ FullStopSeparator

Returns a new instance of FullStopSeparator.



# File 'lib/pragmatic_tokenizer/full_stop_separator.rb', line 13

def initialize(tokens:, abbreviations:, downcase:)
  @tokens        = tokens
  @abbreviations = abbreviations
  @downcase      = downcase
end

Instance Method Details

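#separate ⇒ Object

Builds the cleaned token list, replaces its last token when the list is not empty, and returns the list.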



# File 'lib/pragmatic_tokenizer/full_stop_separator.rb', line 19

def separate
  @cleaned_tokens = create_cleaned_tokens
  replace_last_token unless @cleaned_tokens.empty?
  @cleaned_tokens
end