Class: PragmaticSegmenter::AbbreviationReplacer

Inherits:
Object
  • Object
Defined in:
lib/pragmatic_segmenter/abbreviation_replacer.rb

Overview

This class finds the periods that belong to abbreviations and replaces them, so that they are not mistaken for sentence boundaries.

Defined Under Namespace

Modules: AmPmRules

Constant Summary

PossessiveAbbreviationRule =
Rule.new(/\.(?='s\s)|\.(?='s$)|\.(?='s\z)/, '')
KommanditgesellschaftRule =
Rule.new(/(?<=Co)\.(?=\sKG)/, '')
MULTI_PERIOD_ABBREVIATION_REGEX =
/\b[a-z](?:\.[a-z])+[.]/i
SENTENCE_STARTERS =
%w(A Being Did For He How However I In It Millions More She That The There They We What When Where Who Why)
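
The patterns above can be tried out in isolation. The following is a minimal sketch, not the library's own code: `SketchRule` and `apply_rule` are hypothetical stand-ins (the real `Rule` class is defined elsewhere in the gem and may behave differently), and the replacement strings are the empty strings shown in the listing above.

```ruby
# Hypothetical stand-in for the gem's Rule type: a pattern plus a replacement.
SketchRule = Struct.new(:pattern, :replacement)

# Constants copied from the Constant Summary above.
POSSESSIVE_ABBREVIATION_RULE =
  SketchRule.new(/\.(?='s\s)|\.(?='s$)|\.(?='s\z)/, '')
KOMMANDITGESELLSCHAFT_RULE =
  SketchRule.new(/(?<=Co)\.(?=\sKG)/, '')
MULTI_PERIOD_ABBREVIATION_REGEX =
  /\b[a-z](?:\.[a-z])+[.]/i

# Hypothetical helper: apply one rule with a plain String#gsub.
def apply_rule(text, rule)
  text.gsub(rule.pattern, rule.replacement)
end

# Only the period directly before a possessive 's is touched.
apply_rule("The U.S.'s policy", POSSESSIVE_ABBREVIATION_RULE)
# => "The U.S's policy"

# Only the period between "Co" and " KG" is touched.
apply_rule("Schmidt & Co. KG", KOMMANDITGESELLSCHAFT_RULE)
# => "Schmidt & Co KG"

# Single letters separated by periods, ending in a period.
"Visit the U.S.A. e.g. in May".scan(MULTI_PERIOD_ABBREVIATION_REGEX)
# => ["U.S.A.", "e.g."]
```

Note that `MULTI_PERIOD_ABBREVIATION_REGEX` requires a single letter at a word boundary before the first period, so a form such as "Ph.D." is not matched by this pattern alone.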

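Instance Attribute Summary

  • #text ⇒ Object (readonly)
    Returns the value of attribute text.

Instance Method Summary

  • #initialize(text:) ⇒ AbbreviationReplacer (constructor)
    Returns a new instance of AbbreviationReplacer.
  • #replace ⇒ Object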

Constructor Details

#initialize(text:) ⇒ AbbreviationReplacer

Returns a new instance of AbbreviationReplacer.



# File 'lib/pragmatic_segmenter/abbreviation_replacer.rb', line 37

def initialize(text:)
  @text = Text.new(text)
end

Instance Attribute Details

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/pragmatic_segmenter/abbreviation_replacer.rb', line 36

def text
  @text
end

Instance Method Details

#replace ⇒ Object

Applies the abbreviation-handling steps in sequence and returns the result of the final step.



# File 'lib/pragmatic_segmenter/abbreviation_replacer.rb', line 41

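def replace
  # Replace the period that precedes a possessive 's.
  @reformatted_text = text.apply(PossessiveAbbreviationRule)
  # Replace the period in "Co. KG". This call starts from `text` again, not
  # from the previous result, so the step above only carries over if
  # Text#apply modifies `text` in place.
  @reformatted_text = text.apply(KommanditgesellschaftRule)
  # From here on, each step receives the output of the previous one.
  @reformatted_text = PragmaticSegmenter::SingleLetterAbbreviation.new(text: @reformatted_text).replace
  @reformatted_text = search_for_abbreviations_in_string(@reformatted_text, abbreviations)
  @reformatted_text = replace_multi_period_abbreviations(@reformatted_text)
  @reformatted_text = @reformatted_text.apply(AmPmRules::All)
  # The value of the last expression is the method's return value.
  replace_abbreviation_as_sentence_boundary(@reformatted_text)
end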
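
The method above is a pipeline: each step takes text and hands its result to the next step. The sketch below illustrates that pattern with the two rules from the Constant Summary. It is an illustration under stated assumptions, not the gem's implementation: `PipelineRule`, `PipelineText`, `run_pipeline`, and `restart_each_step` are hypothetical names, and `PipelineText#apply` is assumed to be non-mutating, which may not match the real `Text#apply`.

```ruby
# Hypothetical stand-ins for the gem's Rule and Text classes.
PipelineRule = Struct.new(:pattern, :replacement)

class PipelineText < String
  # Assumption: non-mutating. Returns a new PipelineText with every rule
  # applied in order; the receiver is left unchanged.
  def apply(*rules)
    rules.flatten.reduce(self) do |acc, rule|
      PipelineText.new(acc.gsub(rule.pattern, rule.replacement))
    end
  end
end

POSSESSIVE = PipelineRule.new(/\.(?='s\s)|\.(?='s$)|\.(?='s\z)/, '')
KG         = PipelineRule.new(/(?<=Co)\.(?=\sKG)/, '')

# Each step consumes the previous step's result.
def run_pipeline(input)
  text = PipelineText.new(input)
  text = text.apply(POSSESSIVE)
  text = text.apply(KG)
  text
end

# Each step restarts from the original text, so with a non-mutating
# `apply` only the last rule's effect survives.
def restart_each_step(input)
  text = PipelineText.new(input)
  result = text.apply(POSSESSIVE)
  result = text.apply(KG)
  result
end

sample = "Meyer & Co. KG praised the U.S.'s policy."
run_pipeline(sample)      # => "Meyer & Co KG praised the U.S's policy."
restart_each_step(sample) # => "Meyer & Co KG praised the U.S.'s policy."
```

Whether the first two assignments in `#replace` behave like `run_pipeline` or like `restart_each_step` depends on whether the real `Text#apply` modifies its receiver, which this page does not show.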