Module: PragmaticSegmenter::Languages::Arabic
- Includes:
- Common
- Defined in:
- lib/pragmatic_segmenter/languages/arabic.rb
Defined Under Namespace
Modules: Abbreviation
Classes: AbbreviationReplacer, Process
Constant Summary
- Punctuations =
['?', '!', ':', '.', '؟', '،']
- SENTENCE_BOUNDARY_REGEX =
/.*?[:\.!\?؟،]|.*?\z|.*?$/
- ReplaceColonBetweenNumbersRule =
Rubular: rubular.com/r/RX5HpdDIyv
Rule.new(/(?<=\d):(?=\d)/, '♭')
- ReplaceNonSentenceBoundaryCommaRule =
Rubular: rubular.com/r/kPRgApNHUg
Rule.new(/،(?=\s\S+،)/, '♬')
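The following is a minimal sketch of how these constants might work together, not the library's actual implementation. It copies the three patterns and the two placeholder symbols from the summary above, but applies them with plain `String#gsub` and `String#scan` rather than through `Rule` objects. The module name `ArabicRuleSketch` and the methods `protect` and `split_candidates` are hypothetical names introduced only for illustration.

```ruby
# Illustrative stand-in only: ArabicRuleSketch, protect and
# split_candidates are hypothetical names, not part of PragmaticSegmenter.
module ArabicRuleSketch
  # Patterns copied from the constant summary above.
  COLON_BETWEEN_NUMBERS = /(?<=\d):(?=\d)/
  NON_BOUNDARY_COMMA    = /،(?=\s\S+،)/
  SENTENCE_BOUNDARY     = /.*?[:\.!\?؟،]|.*?\z|.*?$/

  module_function

  # Swap punctuation that should not end a sentence for placeholder symbols:
  # a colon between two digits becomes '♭', and an Arabic comma that is
  # followed by one more word and another Arabic comma becomes '♬'.
  def protect(text)
    text.gsub(COLON_BETWEEN_NUMBERS, '♭').gsub(NON_BOUNDARY_COMMA, '♬')
  end

  # Cut the text at every remaining boundary character and drop the
  # empty matches that the regex produces at the end of the string.
  def split_candidates(text)
    text.scan(SENTENCE_BOUNDARY).map(&:strip).reject(&:empty?)
  end
end

time_text = 'الساعة 10:30.'
p ArabicRuleSketch.split_candidates(time_text)
p ArabicRuleSketch.split_candidates(ArabicRuleSketch.protect(time_text))
```

Without the protection step the colon inside `10:30` is treated as a boundary and the text is cut in two; with it, the text stays in one piece and carries `♭` in place of the colon. A full pipeline would presumably swap the placeholders back to the original characters after splitting; that step is omitted here.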
Constants included from Common
Common::BETWEEN_DOUBLE_QUOTES_REGEX, Common::CONTINUOUS_PUNCTUATION_REGEX, Common::KommanditgesellschaftRule, Common::MULTI_PERIOD_ABBREVIATION_REGEX, Common::PARENS_BETWEEN_DOUBLE_QUOTES_REGEX, Common::PossessiveAbbreviationRule, Common::QUOTATION_AT_END_OF_SENTENCE_REGEX, Common::SPLIT_SPACE_QUOTATION_AT_END_OF_SENTENCE_REGEX
Constants included from Rules
Rules::AbbreviationsWithMultiplePeriodsAndEmailRule, Rules::ConsecutiveForwardSlashRule, Rules::ConsecutivePeriodsRule, Rules::DoubleNewLineRule, Rules::DoubleNewLineWithSpaceRule, Rules::EscapedCarriageReturnRule, Rules::EscapedNewLineRule, Rules::ExtraWhiteSpaceRule, Rules::GeoLocationRule, Rules::InlineFormattingRule, Rules::NEWLINE_IN_MIDDLE_OF_SENTENCE_REGEX, Rules::NO_SPACE_BETWEEN_SENTENCES_DIGIT_REGEX, Rules::NO_SPACE_BETWEEN_SENTENCES_REGEX, Rules::NewLineFollowedByBulletRule, Rules::NewLineFollowedByPeriodRule, Rules::NewLineInMiddleOfWordRule, Rules::NoSpaceBetweenSentencesDigitRule, Rules::NoSpaceBetweenSentencesRule, Rules::PDF_NewLineInMiddleOfSentenceNoSpacesRule, Rules::PDF_NewLineInMiddleOfSentenceRule, Rules::QuestionMarkInQuotationRule, Rules::QuotationsFirstRule, Rules::QuotationsSecondRule, Rules::ReplaceNewlineWithCarriageReturnRule, Rules::SingleNewLineRule, Rules::SubSingleQuoteRule, Rules::TableOfContentsRule, Rules::TypoEscapedCarriageReturnRule, Rules::TypoEscapedNewLineRule, Rules::URL_EMAIL_KEYWORDS