Class: PragmaticSegmenter::SentenceBoundaryPunctuation

Inherits:
Object
Defined in:
lib/pragmatic_segmenter/sentence_boundary_punctuation.rb

Overview

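This class splits text into segments at sentence boundary punctuation marks. Quoted and parenthesized spans that are followed by a capitalized word are matched as segments of their own.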

Constant Summary

SENTENCE_BOUNDARY_REGEX =
/\u{ff08}(?:[^\u{ff09}])*\u{ff09}(?=\s?[A-Z])|\u{300c}(?:[^\u{300d}])*\u{300d}(?=\s[A-Z])|\((?:[^\)]){2,}\)(?=\s[A-Z])|'(?:[^'])*[^,]'(?=\s[A-Z])|"(?:[^"])*[^,]"(?=\s[A-Z])|“(?:[^”])*[^,]”(?=\s[A-Z])|\S.*?[。．.！!?？ȸȹ☉☈☇☄]/

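Instance Attribute Summary

  • #text ⇒ Object (readonly): Returns the value of attribute text.

Instance Method Summary

  • #initialize(text:) ⇒ SentenceBoundaryPunctuation (constructor): Returns a new instance of SentenceBoundaryPunctuation.
  • #split ⇒ Object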

Constructor Details

#initialize(text:) ⇒ SentenceBoundaryPunctuation

Returns a new instance of SentenceBoundaryPunctuation.



# File 'lib/pragmatic_segmenter/sentence_boundary_punctuation.rb', line 9

def initialize(text:)
  @text = text
end

Instance Attribute Details

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/pragmatic_segmenter/sentence_boundary_punctuation.rb', line 8

def text
  @text
end

Instance Method Details

#split ⇒ Object

Scans text with SENTENCE_BOUNDARY_REGEX and returns an array of the matched segments.



# File 'lib/pragmatic_segmenter/sentence_boundary_punctuation.rb', line 13

def split
  text.scan(SENTENCE_BOUNDARY_REGEX)
end
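
To illustrate how a scan-based split behaves, here is a minimal, self-contained sketch. It does not load the gem: `BoundarySketch` and `SKETCH_REGEX` are hypothetical names introduced for this example, and the pattern is a trimmed-down assumption that keeps only the ASCII-parenthesis alternative and the final catch-all, with ASCII terminators.

```ruby
# Hypothetical stand-in for the documented class, so the example runs without the gem.
class BoundarySketch
  # Trimmed pattern (an assumption for illustration): the ASCII-parenthesis
  # alternative plus the final catch-all, restricted to ASCII terminators.
  SKETCH_REGEX = /\((?:[^\)]){2,}\)(?=\s[A-Z])|\S.*?[.!?]/

  attr_reader :text

  def initialize(text:)
    @text = text
  end

  # String#scan returns every non-overlapping match, left to right.
  def split
    text.scan(SKETCH_REGEX)
  end
end

plain  = BoundarySketch.new(text: "Hello world. How are you? Fine!").split
# => ["Hello world.", "How are you?", "Fine!"]

parens = BoundarySketch.new(text: "(An aside) Next sentence.").split
# => ["(An aside)", "Next sentence."]

tail   = BoundarySketch.new(text: "Complete. no terminator here").split
# => ["Complete."]
```

The last call shows a property of this approach: because `scan` only returns what the pattern matches, trailing text without a terminator is dropped in this sketch, so a caller would presumably need to make sure the input ends with one of the recognized marks.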