Class: PragmaticSegmenter::Languages::Japanese::Cleaner
- Inherits: PragmaticSegmenter::Cleaner
- Defined in: lib/pragmatic_segmenter/languages/japanese.rb
Constant Summary
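- NewLineInMiddleOfWordRule = Rule.new(/(?<=の)\n(?=\S)/, '')
  Deletes a newline that directly follows の and precedes a non-whitespace character, rejoining text that was broken across lines. Rubular: rubular.com/r/N4kPuJgle7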
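To illustrate what the pattern matches, here is a minimal sketch in plain Ruby. It assumes that applying the rule amounts to a String#gsub of the pattern with the replacement string ''. The helper join_broken_word is hypothetical and not part of pragmatic_segmenter.

```ruby
# Same pattern as NewLineInMiddleOfWordRule: a "\n" preceded by "の"
# (lookbehind) and followed by a non-whitespace character (lookahead).
NEWLINE_AFTER_NO = /(?<=の)\n(?=\S)/

# Hypothetical helper, assuming the rule is applied like String#gsub.
# The lookarounds are zero-width, so only the newline itself is deleted.
def join_broken_word(text)
  text.gsub(NEWLINE_AFTER_NO, '')
end

join_broken_word("これは日本語の\nテキストです。") # => "これは日本語のテキストです。"
join_broken_word("終わり。\n次の文。")             # => "終わり。\n次の文。" (no の before the newline)
join_broken_word("日本の\n 次")                    # => "日本の\n 次" (whitespace after the newline)
```

Because the lookbehind and lookahead do not consume characters, の and the following character stay in place and only the line break between them disappears.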
Constants included from Rules
Rules::AbbreviationsWithMultiplePeriodsAndEmailRule, Rules::ConsecutiveForwardSlashRule, Rules::ConsecutivePeriodsRule, Rules::DoubleNewLineRule, Rules::DoubleNewLineWithSpaceRule, Rules::EscapedCarriageReturnRule, Rules::EscapedNewLineRule, Rules::ExtraWhiteSpaceRule, Rules::GeoLocationRule, Rules::InlineFormattingRule, Rules::NEWLINE_IN_MIDDLE_OF_SENTENCE_REGEX, Rules::NO_SPACE_BETWEEN_SENTENCES_DIGIT_REGEX, Rules::NO_SPACE_BETWEEN_SENTENCES_REGEX, Rules::NewLineFollowedByBulletRule, Rules::NewLineFollowedByPeriodRule, Rules::NoSpaceBetweenSentencesDigitRule, Rules::NoSpaceBetweenSentencesRule, Rules::PDF_NewLineInMiddleOfSentenceNoSpacesRule, Rules::PDF_NewLineInMiddleOfSentenceRule, Rules::QuestionMarkInQuotationRule, Rules::QuotationsFirstRule, Rules::QuotationsSecondRule, Rules::ReplaceNewlineWithCarriageReturnRule, Rules::SingleNewLineRule, Rules::SubSingleQuoteRule, Rules::TableOfContentsRule, Rules::TypoEscapedCarriageReturnRule, Rules::TypoEscapedNewLineRule, Rules::URL_EMAIL_KEYWORDS
Instance Attribute Summary
Attributes inherited from Cleaner
Instance Method Summary
Methods inherited from Cleaner
Constructor Details
This class inherits a constructor from PragmaticSegmenter::Cleaner
Instance Method Details
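#clean ⇒ Object
Calls the inherited PragmaticSegmenter::Cleaner#clean, then passes @clean_text through remove_newline_in_middle_of_word and stores the result back in @clean_text.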
# File 'lib/pragmatic_segmenter/languages/japanese.rb', line 18

    def clean
      super
      @clean_text = remove_newline_in_middle_of_word(@clean_text)
    end
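The override in #clean follows a common Ruby pattern: call super to run the inherited cleanup, then post-process the result. The sketch below models that pattern with hypothetical BaseCleaner and JapaneseCleaner classes. The base cleanup (collapsing runs of spaces and tabs) and the gsub-based body of remove_newline_in_middle_of_word are assumptions for illustration, not the gem's actual implementation.

```ruby
# Hypothetical stand-in for PragmaticSegmenter::Cleaner: only the parts
# needed to show the override pattern are modelled here.
class BaseCleaner
  attr_reader :clean_text

  def initialize(text)
    @text = text
  end

  # Assumed language-independent cleanup: collapse runs of spaces/tabs.
  # Newlines are left alone so the subclass step still sees them.
  def clean
    @clean_text = @text.gsub(/[ \t]+/, ' ')
  end
end

# Mirrors the documented #clean: run the inherited cleanup first, then
# apply the Japanese-specific newline rule to its result.
class JapaneseCleaner < BaseCleaner
  NEW_LINE_IN_MIDDLE_OF_WORD = /(?<=の)\n(?=\S)/

  def clean
    super
    @clean_text = remove_newline_in_middle_of_word(@clean_text)
  end

  private

  # Assumed body: delete the matched newline, keep everything else.
  def remove_newline_in_middle_of_word(text)
    text.gsub(NEW_LINE_IN_MIDDLE_OF_WORD, '')
  end
end

cleaner = JapaneseCleaner.new("日本語の\nテキスト  です")
cleaner.clean # => "日本語のテキスト です"
```

One consequence of calling super first is that the language-specific rule always operates on text the base cleaner has already normalized, so other language subclasses can extend the shared behavior the same way without duplicating it.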