Class: PragmaticSegmenter::SentenceBoundaryPunctuation
- Inherits: Object
  - Object
  - PragmaticSegmenter::SentenceBoundaryPunctuation
- Defined in: lib/pragmatic_segmenter/sentence_boundary_punctuation.rb
Overview
This class splits text at sentence boundary punctuation marks.
Direct Known Subclasses
Languages::Amharic::SentenceBoundaryPunctuation, Languages::Arabic::SentenceBoundaryPunctuation, Languages::Armenian::SentenceBoundaryPunctuation, Languages::Burmese::SentenceBoundaryPunctuation, Languages::Greek::SentenceBoundaryPunctuation, Languages::Hindi::SentenceBoundaryPunctuation, Languages::Persian::SentenceBoundaryPunctuation, Languages::Urdu::SentenceBoundaryPunctuation
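How the language subclasses listed above are written is not shown on this page. The sketch below only illustrates a general Ruby point that matters for any subclass of a class like this one: constants referenced inside a method are resolved lexically, so a subclass that redefines only the pattern constant does not change an inherited split. All class names here (BaseSplitter, ConstantOnly, WithSplit) and both patterns are hypothetical.

```ruby
# Hypothetical stand-in for the base class, with a simplified ASCII pattern.
class BaseSplitter
  PATTERN = /\S.*?[.!?]/

  attr_reader :text

  def initialize(text:)
    @text = text
  end

  def split
    # PATTERN is looked up lexically, i.e. in BaseSplitter.
    text.scan(PATTERN)
  end
end

# Redefines only the constant: the inherited #split still uses BaseSplitter::PATTERN.
class ConstantOnly < BaseSplitter
  PATTERN = /\S.*?[।!?]/
end

# Redefines the constant and #split, so the new pattern is actually used.
class WithSplit < BaseSplitter
  PATTERN = /\S.*?[।!?]/

  def split
    text.scan(PATTERN)
  end
end

sample = "यह एक वाक्य है। यह दूसरा है।"

constant_only_result = ConstantOnly.new(text: sample).split
with_split_result    = WithSplit.new(text: sample).split

puts constant_only_result.length
puts with_split_result.length
```

In this sketch the first subclass finds no sentences, because the sample contains none of the ASCII terminators, while the second finds two.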
Constant Summary
- SENTENCE_BOUNDARY_REGEX =
/\u{ff08}(?:[^\u{ff09}])*\u{ff09}(?=\s?[A-Z])|\u{300c}(?:[^\u{300d}])*\u{300d}(?=\s[A-Z])|\((?:[^\)]){2,}\)(?=\s[A-Z])|'(?:[^'])*[^,]'(?=\s[A-Z])|"(?:[^"])*[^,]"(?=\s[A-Z])|“(?:[^”])*[^,]”(?=\s[A-Z])|\S.*?[。．.！!？?ȸȹ☉☈☇☄]/
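The pattern is an alternation. The first six branches match a complete parenthesised or quoted span when it is followed by whitespace and a capital letter. The last branch lazily matches from the first non-space character up to the next terminator. The following is a minimal sketch of that behaviour using String#scan. The pattern is copied into a local variable, and the fullwidth terminators in its final character class are an assumption about the original source. The sample strings are made up.

```ruby
# Local copy of the documented pattern (fullwidth terminators assumed).
pattern = /\u{ff08}(?:[^\u{ff09}])*\u{ff09}(?=\s?[A-Z])|\u{300c}(?:[^\u{300d}])*\u{300d}(?=\s[A-Z])|\((?:[^\)]){2,}\)(?=\s[A-Z])|'(?:[^'])*[^,]'(?=\s[A-Z])|"(?:[^"])*[^,]"(?=\s[A-Z])|“(?:[^”])*[^,]”(?=\s[A-Z])|\S.*?[。．.！!？?ȸȹ☉☈☇☄]/

# Plain sentences: only the final, lazy branch applies.
plain_parts = "Hello world. How are you? Fine!".scan(pattern)

# A quoted span followed by a capitalised word is kept as one piece.
quoted_parts = "\"Stop now.\" Then he left.".scan(pattern)

# Same idea for a parenthesised span of at least two characters.
paren_parts = "(See the appendix.) Next we continue.".scan(pattern)

# Text after the last terminator matches no branch and is not returned.
trailing_parts = "Done. trailing words".scan(pattern)

puts plain_parts.inspect
puts quoted_parts.inspect
puts paren_parts.inspect
puts trailing_parts.inspect
```

Because the pattern contains no capturing groups, scan returns a flat array of matched strings.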
Instance Attribute Summary
- #text ⇒ Object (readonly)
  Returns the value of attribute text.
Instance Method Summary
- #initialize(text:) ⇒ SentenceBoundaryPunctuation (constructor)
  A new instance of SentenceBoundaryPunctuation.
- #split ⇒ Object
Constructor Details
#initialize(text:) ⇒ SentenceBoundaryPunctuation
Returns a new instance of SentenceBoundaryPunctuation.
# File 'lib/pragmatic_segmenter/sentence_boundary_punctuation.rb', line 9
def initialize(text:)
  @text = text
end
Instance Attribute Details
#text ⇒ Object (readonly)
Returns the value of attribute text.
# File 'lib/pragmatic_segmenter/sentence_boundary_punctuation.rb', line 8
def text
  @text
end
Instance Method Details
#split ⇒ Object
Scans text with SENTENCE_BOUNDARY_REGEX and returns the array of matches.
# File 'lib/pragmatic_segmenter/sentence_boundary_punctuation.rb', line 13
def split
  text.scan(SENTENCE_BOUNDARY_REGEX)
end
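A short usage sketch may help tie the constructor, the reader, and split together. To stay runnable without the gem installed, it defines a stand-in class under a hypothetical Sketch namespace that mirrors the three listings above. Its pattern is deliberately simplified to the final alternative with ASCII terminators only, so results can differ from the real class on quoted or parenthesised input.

```ruby
# Stand-in mirroring the documented interface; not the gem's own file.
module Sketch
  class SentenceBoundaryPunctuation
    # Simplified assumption: last branch of the documented pattern, ASCII only.
    SENTENCE_BOUNDARY_REGEX = /\S.*?[.!?]/

    attr_reader :text

    def initialize(text:)
      @text = text
    end

    def split
      text.scan(SENTENCE_BOUNDARY_REGEX)
    end
  end
end

splitter  = Sketch::SentenceBoundaryPunctuation.new(text: "It works. Does it? Yes!")
sentences = splitter.split
puts sentences.inspect

# The constructor takes a keyword argument, so a positional call is rejected.
positional_error =
  begin
    Sketch::SentenceBoundaryPunctuation.new("It works.")
    nil
  rescue ArgumentError => e
    e
  end
puts positional_error.class
```

Since text is exposed through a reader only, a new instance is needed for each string to be split.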