Module: Normatron::Filters::KeepFilter
- Extended by:
- Helpers
- Defined in:
- lib/normatron/filters/keep_filter.rb
Class Method Summary collapse
-
.evaluate(input, *properties) ⇒ String
Remove the characters that doesn’t match the given properties.
Methods included from Helpers
acronym_regex, acronyms, evaluate_regexp, inflections, mb_send
Class Method Details
.evaluate(input, *properties) ⇒ String
Raise exception for empty properties
Remove the characters that doesn’t match the given properties. The character properties follow the rule of \p{} construct described in Regexp class. The \p{} construct matches characters with the named property, much like POSIX bracket classes.
To pass named properties to this filter, use them as Symbols:
-
:Alnum- Alphabetic and numeric character -
:Alpha- Alphabetic character -
:Blank- Space or tab -
:Cntrl- Control character -
:Digit- Digit -
:Graph- Non-blank character (excludes spaces, control characters, and similar) -
:Lower- Lowercase alphabetical character -
:Print- Like :Graph, but includes the space character -
:Punct- Punctuation character -
:Space- Whitespace character ([:blank:], newline, carriage return, etc.) -
:Upper- Uppercase alphabetical -
:XDigit- Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F) -
:Word- A member of one of the following Unicode general category Letter, Mark, Number, Connector_Punctuation -
:ASCII- A character in the ASCII character set -
:Any- Any Unicode character (including unassigned characters) -
:Assigned- An assigned character
A Unicode character’s General Category value can also be matched with :Ab where Ab is the category’s abbreviation as described below:
-
:L- ‘Letter’ -
:Ll- ‘Letter: Lowercase’ -
:Lm- ‘Letter: Mark’ -
:Lo- ‘Letter: Other’ -
:Lt- ‘Letter: Titlecase’ -
:Lu- ‘Letter: Uppercase -
:Lo- ‘Letter: Other’ -
:M- ‘Mark’ -
:Mn- ‘Mark: Nonspacing’ -
:Mc- ‘Mark: Spacing Combining’ -
:Me- ‘Mark: Enclosing’ -
:N- ‘Number’ -
:Nd- ‘Number: Decimal Digit’ -
:Nl- ‘Number: Letter’ -
:No- ‘Number: Other’ -
:P- ‘Punctuation’ -
:Pc- ‘Punctuation: Connector’ -
:Pd- ‘Punctuation: Dash’ -
:Ps- ‘Punctuation: Open’ -
:Pe- ‘Punctuation: Close’ -
:Pi- ‘Punctuation: Initial Quote’ -
:Pf- ‘Punctuation: Final Quote’ -
:Po- ‘Punctuation: Other’ -
:S- ‘Symbol’ -
:Sm- ‘Symbol: Math’ -
:Sc- ‘Symbol: Currency’ -
:Sc- ‘Symbol: Currency’ -
:Sk- ‘Symbol: Modifier’ -
:So- ‘Symbol: Other’ -
:Z- ‘Separator’ -
:Zs- ‘Separator: Space’ -
:Zl- ‘Separator: Line’ -
:Zp- ‘Separator: Paragraph’ -
:C- ‘Other’ -
:Cc- ‘Other: Control’ -
:Cf- ‘Other: Format’ -
:Cn- ‘Other: Not Assigned’ -
:Co- ‘Other: Private Use’ -
:Cs- ‘Other: Surrogate’
Lastly, this method matches a character’s Unicode script. The following scripts are supported:
Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, and Yi.
95 96 97 |
# File 'lib/normatron/filters/keep_filter.rb', line 95 def self.evaluate(input, *properties) input.kind_of?(String) ? evaluate_regexp(input, :keep, properties) : input end |