Module: Normatron::Filters::KeepFilter
- Extended by:
- Helpers
- Defined in:
- lib/normatron/filters/keep_filter.rb
Class Method Summary
-
.evaluate(input, *properties) ⇒ String
Remove the characters that don’t match the given properties.
Methods included from Helpers
acronym_regex, acronyms, evaluate_regexp, inflections, mb_send
Class Method Details
.evaluate(input, *properties) ⇒ String
Todo: Raise exception for empty properties
Remove the characters that don’t match the given properties. The character properties follow the rules of the \p{} construct described in Ruby’s Regexp class documentation. The \p{} construct matches characters with the named property, much like POSIX bracket classes.
To pass named properties to this filter, use them as Symbols:
-
:Alnum
- Alphabetic and numeric character -
:Alpha
- Alphabetic character -
:Blank
- Space or tab -
:Cntrl
- Control character -
:Digit
- Digit -
:Graph
- Non-blank character (excludes spaces, control characters, and similar) -
:Lower
- Lowercase alphabetical character -
:Print
- Like :Graph, but includes the space character -
:Punct
- Punctuation character -
:Space
- Whitespace character ([:blank:], newline, carriage return, etc.) -
:Upper
- Uppercase alphabetical character -
:XDigit
- Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F) -
:Word
- A member of one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation -
:ASCII
- A character in the ASCII character set -
:Any
- Any Unicode character (including unassigned characters) -
:Assigned
- An assigned character
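The filtering described above can be sketched in plain Ruby without the gem. This is a minimal illustration only: `keep_chars` is a hypothetical helper, not Normatron's `evaluate_regexp`, and it assumes that multiple properties are combined as a union (a character is kept if it matches any of them).

```ruby
# Hypothetical helper (an assumption, not part of Normatron): keeps only the
# characters that match at least one of the given \p{} properties.
def keep_chars(input, *properties)
  # Mirror the documented guard: non-String input is returned untouched.
  return input unless input.is_a?(String)

  # Build a negated character class such as [^\p{Alpha}\p{Blank}]
  # and delete every character that falls outside the listed properties.
  char_class = properties.map { |prop| "\\p{#{prop}}" }.join
  input.gsub(/[^#{char_class}]/u, "")
end

puts keep_chars("Ruby 1.9.3, yay!", :Alpha)          # Rubyyay
puts keep_chars("Ruby 1.9.3, yay!", :Digit)          # 193
puts keep_chars("Ruby 1.9.3, yay!", :Alpha, :Blank)  # Ruby  yay
```

Interpolating the Symbol into the pattern is what lets the same helper serve every property name in the list above.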
A Unicode character’s General Category value can also be matched with :Ab where Ab is the category’s abbreviation as described below:
-
:L
- ‘Letter’ -
:Ll
- ‘Letter: Lowercase’ -
:Lm
- ‘Letter: Modifier’ -
:Lo
- ‘Letter: Other’ -
:Lt
- ‘Letter: Titlecase’ -
:Lu
- ‘Letter: Uppercase’ -
:M
- ‘Mark’ -
:Mn
- ‘Mark: Nonspacing’ -
:Mc
- ‘Mark: Spacing Combining’ -
:Me
- ‘Mark: Enclosing’ -
:N
- ‘Number’ -
:Nd
- ‘Number: Decimal Digit’ -
:Nl
- ‘Number: Letter’ -
:No
- ‘Number: Other’ -
:P
- ‘Punctuation’ -
:Pc
- ‘Punctuation: Connector’ -
:Pd
- ‘Punctuation: Dash’ -
:Ps
- ‘Punctuation: Open’ -
:Pe
- ‘Punctuation: Close’ -
:Pi
- ‘Punctuation: Initial Quote’ -
:Pf
- ‘Punctuation: Final Quote’ -
:Po
- ‘Punctuation: Other’ -
:S
- ‘Symbol’ -
:Sm
- ‘Symbol: Math’ -
:Sc
- ‘Symbol: Currency’ -
:Sk
- ‘Symbol: Modifier’ -
:So
- ‘Symbol: Other’ -
:Z
- ‘Separator’ -
:Zs
- ‘Separator: Space’ -
:Zl
- ‘Separator: Line’ -
:Zp
- ‘Separator: Paragraph’ -
:C
- ‘Other’ -
:Cc
- ‘Other: Control’ -
:Cf
- ‘Other: Format’ -
:Cn
- ‘Other: Not Assigned’ -
:Co
- ‘Other: Private Use’ -
:Cs
- ‘Other: Surrogate’
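As a rough illustration of matching by General Category, the sketch below collects the characters of one category with Ruby's own `\p{}` support. `keep_by_category` is a hypothetical name chosen for this example and is not part of Normatron.

```ruby
# Hypothetical helper (an assumption, not part of Normatron): returns only the
# characters of `input` that belong to the given Unicode General Category.
def keep_by_category(input, category)
  # Regexp.new receives the pattern source, e.g. "\p{Lu}" for category :Lu.
  pattern = Regexp.new("\\p{#{category}}")
  # scan yields every matching character; join rebuilds the String.
  input.scan(pattern).join
end

text = "Hello, World! 123"
puts keep_by_category(text, :Lu)  # HW
puts keep_by_category(text, :N)   # 123
puts keep_by_category(text, :P)   # ,!
```

A one-letter abbreviation such as `:N` or `:P` covers all of its subcategories, while a two-letter abbreviation such as `:Lu` selects a single one.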
Lastly, this method can match characters by their Unicode script. The following scripts are supported:
Arabic, Armenian, Balinese, Bengali, Bopomofo, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Cham, Cherokee, Common, Coptic, Cuneiform, Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Lao, Latin, Lepcha, Limbu, Linear_B, Lycian, Lydian, Malayalam, Mongolian, Myanmar, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_Persian, Oriya, Osmanya, Phags_Pa, Phoenician, Rejang, Runic, Saurashtra, Shavian, Sinhala, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh, Ugaritic, Vai, and Yi.
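Script matching can be sketched the same way. The example below is a plain-Ruby approximation under the assumption that script names are passed as Symbols like the other properties; `keep_script` is a hypothetical helper, not Normatron's API.

```ruby
# Hypothetical helper (an assumption, not part of Normatron): keeps only the
# characters of `input` that belong to the given Unicode script.
def keep_script(input, script)
  pattern = Regexp.new("\\p{#{script}}")
  # Test each character on its own and keep the ones in the script.
  input.each_char.select { |char| char.match?(pattern) }.join
end

text = "Ruby Руби Ρουμπι"
puts keep_script(text, :Latin)     # Ruby
puts keep_script(text, :Cyrillic)  # Руби
puts keep_script(text, :Greek)     # Ρουμπι
```

Spaces and most punctuation belong to the Common script, so they are dropped unless `:Common` is matched as well.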
# File 'lib/normatron/filters/keep_filter.rb', line 95
def self.evaluate(input, *properties)
  input.kind_of?(String) ? evaluate_regexp(input, :keep, properties) : input
end