Module: PragmaticSegmenter::Languages::Deutsch

Includes:
Common
Defined in:
lib/pragmatic_segmenter/languages/deutsch.rb

Defined Under Namespace

Modules: Abbreviation
Classes: AbbreviationReplacer, BetweenPunctuation, Cleaner, Number, Process

Constant Summary

BETWEEN_UNCONVENTIONAL_DOUBLE_QUOTE_DE_REGEX =
/,,(?>[^“\\]+|\\{2}|\\.)*“/
SPLIT_DOUBLE_QUOTES_DE_REGEX =
/\A„(?>[^“\\]+|\\{2}|\\.)*“/
BETWEEN_DOUBLE_QUOTES_DE_REGEX =
/„(?>[^“\\]+|\\{2}|\\.)*“/
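
To illustrate what the three quote patterns above match, here is a minimal sketch that tries them on sample strings with plain `String#[]`. The regex literals are copied from the constants; the module name `DeutschQuoteSketch`, the helper `first_quoted_span`, and the sample sentence are hypothetical and are not part of the library.

```ruby
# Hypothetical wrapper module; only the three regex literals come from the
# constant summary above.
module DeutschQuoteSketch
  # Span opened by two commas (",,") and closed by “.
  BETWEEN_UNCONVENTIONAL_DOUBLE_QUOTE_DE_REGEX = /,,(?>[^“\\]+|\\{2}|\\.)*“/
  # Same as the „…“ pattern below, but anchored to the start of the string.
  SPLIT_DOUBLE_QUOTES_DE_REGEX = /\A„(?>[^“\\]+|\\{2}|\\.)*“/
  # Span opened by „ and closed by “.
  BETWEEN_DOUBLE_QUOTES_DE_REGEX = /„(?>[^“\\]+|\\{2}|\\.)*“/

  # Hypothetical helper: returns the first span matched by +regex+, or nil.
  def self.first_quoted_span(text, regex)
    text[regex]
  end
end

sentence = 'Er sagte: „Komm her. Jetzt!“ Dann ging er.'
puts DeutschQuoteSketch.first_quoted_span(
  sentence, DeutschQuoteSketch::BETWEEN_DOUBLE_QUOTES_DE_REGEX
)
```

In this sample the anchored `SPLIT_DOUBLE_QUOTES_DE_REGEX` finds nothing, because the quotation does not begin the string.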
NumberPeriodSpaceRule =
Rule.new(/(?<=\s[0-9]|\s([1-9][0-9]))\.(?=\s)/, '∯')
NegativeNumberPeriodSpaceRule =
Rule.new(/(?<=-[0-9]|-([1-9][0-9]))\.(?=\s)/, '∯')
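
The two number rules above can be tried outside the library with a rough sketch like the following. It assumes a rule is simply a pattern and replacement pair; the `Rule` struct here is a stand-in for the library's own class, and `DeutschNumberSketch` and `apply` are hypothetical names. The period after a one- or two-digit number (as in German ordinals such as "5. Mai") is swapped for `∯`, presumably so that it is not read as a sentence boundary.

```ruby
# Hypothetical sketch module; only the two patterns and the '∯' replacement
# come from the constant summary above.
module DeutschNumberSketch
  # Stand-in for the library's Rule class (assumption: pattern + replacement).
  Rule = Struct.new(:pattern, :replacement)

  NumberPeriodSpaceRule =
    Rule.new(/(?<=\s[0-9]|\s([1-9][0-9]))\.(?=\s)/, '∯')
  NegativeNumberPeriodSpaceRule =
    Rule.new(/(?<=-[0-9]|-([1-9][0-9]))\.(?=\s)/, '∯')

  # Hypothetical helper: applies each rule in order with String#gsub.
  def self.apply(text, *rules)
    rules.reduce(text) { |acc, rule| acc.gsub(rule.pattern, rule.replacement) }
  end
end

puts DeutschNumberSketch.apply(
  'Am 5. Mai und am 12. Juni.',
  DeutschNumberSketch::NumberPeriodSpaceRule
)
puts DeutschNumberSketch.apply(
  'Bei -5. Grad',
  DeutschNumberSketch::NegativeNumberPeriodSpaceRule
)
```

The lookbehind accepts only one- or two-digit numbers preceded by whitespace (or by `-` in the negative variant), so the period after a longer number such as "2020." is left alone.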
MONTHS =
['Januar', 'Februar', 'März', 'April', 'Mai', 'Juni', 'Juli', 'August', 'September', 'Oktober', 'November', 'Dezember']
SingleLowerCaseLetterRule =
Rule.new(/(?<=\s[a-z])\.(?=\s)/, '∯')
SingleLowerCaseLetterAtStartOfLineRule =
Rule.new(/(?<=^[a-z])\.(?=\s)/, '∯')
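
As a rough illustration of the two single-letter rules above, the sketch below applies their patterns with `String#gsub`. The `Rule` struct is again a stand-in for the library's class, and `DeutschLetterSketch`, `apply`, and the sample strings are hypothetical.

```ruby
# Hypothetical sketch module; only the two patterns and the '∯' replacement
# come from the constant summary above.
module DeutschLetterSketch
  # Stand-in for the library's Rule class (assumption: pattern + replacement).
  Rule = Struct.new(:pattern, :replacement)

  SingleLowerCaseLetterRule =
    Rule.new(/(?<=\s[a-z])\.(?=\s)/, '∯')
  SingleLowerCaseLetterAtStartOfLineRule =
    Rule.new(/(?<=^[a-z])\.(?=\s)/, '∯')

  # Hypothetical helper: applies each rule in order with String#gsub.
  def self.apply(text, *rules)
    rules.reduce(text) { |acc, rule| acc.gsub(rule.pattern, rule.replacement) }
  end
end

puts DeutschLetterSketch.apply(
  'Siehe Abschnitt a. und b. im Anhang.',
  DeutschLetterSketch::SingleLowerCaseLetterRule
)
puts DeutschLetterSketch.apply(
  'a. Erster Punkt',
  DeutschLetterSketch::SingleLowerCaseLetterAtStartOfLineRule
)
```

Both patterns are restricted to `[a-z]`, so a period after a single uppercase letter is not touched by these two rules.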

Constants included from Common

Common::BETWEEN_DOUBLE_QUOTES_REGEX, Common::CONTINUOUS_PUNCTUATION_REGEX, Common::KommanditgesellschaftRule, Common::MULTI_PERIOD_ABBREVIATION_REGEX, Common::PARENS_BETWEEN_DOUBLE_QUOTES_REGEX, Common::PossessiveAbbreviationRule, Common::Punctuations, Common::QUOTATION_AT_END_OF_SENTENCE_REGEX, Common::SENTENCE_BOUNDARY_REGEX, Common::SPLIT_SPACE_QUOTATION_AT_END_OF_SENTENCE_REGEX

Constants included from Rules

Rules::AbbreviationsWithMultiplePeriodsAndEmailRule, Rules::ConsecutiveForwardSlashRule, Rules::ConsecutivePeriodsRule, Rules::DoubleNewLineRule, Rules::DoubleNewLineWithSpaceRule, Rules::EscapedCarriageReturnRule, Rules::EscapedNewLineRule, Rules::ExtraWhiteSpaceRule, Rules::GeoLocationRule, Rules::InlineFormattingRule, Rules::NEWLINE_IN_MIDDLE_OF_SENTENCE_REGEX, Rules::NO_SPACE_BETWEEN_SENTENCES_DIGIT_REGEX, Rules::NO_SPACE_BETWEEN_SENTENCES_REGEX, Rules::NewLineFollowedByBulletRule, Rules::NewLineFollowedByPeriodRule, Rules::NewLineInMiddleOfWordRule, Rules::NoSpaceBetweenSentencesDigitRule, Rules::NoSpaceBetweenSentencesRule, Rules::PDF_NewLineInMiddleOfSentenceNoSpacesRule, Rules::PDF_NewLineInMiddleOfSentenceRule, Rules::QuestionMarkInQuotationRule, Rules::QuotationsFirstRule, Rules::QuotationsSecondRule, Rules::ReplaceNewlineWithCarriageReturnRule, Rules::SingleNewLineRule, Rules::SubSingleQuoteRule, Rules::TableOfContentsRule, Rules::TypoEscapedCarriageReturnRule, Rules::TypoEscapedNewLineRule, Rules::URL_EMAIL_KEYWORDS