Class: Regex::Character
- Inherits: AtomicExpression
  - Object
  - Expression
  - AtomicExpression
  - Regex::Character
- Defined in: lib/regex/character.rb
Overview
A regular expression that matches a specific character in a given character set
Constant Summary
- DigramSequences =
Constant with all special two-character escape sequences
    {
      "\\a" => 0x7,  # alarm
      "\\n" => 0xA,  # newline
      "\\r" => 0xD,  # carriage return
      "\\t" => 0x9,  # tab
      "\\e" => 0x1B, # escape
      "\\f" => 0xC,  # form feed
      "\\v" => 0xB,  # vertical tab
      # Single octal digit literals
      "\\0" => 0,
      "\\1" => 1,
      "\\2" => 2,
      "\\3" => 3,
      "\\4" => 4,
      "\\5" => 5,
      "\\6" => 6,
      "\\7" => 7
    }.freeze
- MetaChars =
'\^$+?.'.freeze
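As a rough illustration of how a two-character table such as DigramSequences could be consulted, the sketch below uses a local, abbreviated copy of the table and a hypothetical helper, digram_to_codepoint, which is not part of the library. It assumes, following the documentation's statement that other escaped characters are treated as literals, that an unknown digram falls back to the codepoint of its second character.

```ruby
# Local, abbreviated copy of the table; the real constant lives in Regex::Character.
DIGRAMS = {
  '\\n' => 0xA, # newline
  '\\t' => 0x9, # tab
  '\\0' => 0    # single octal digit
}.freeze

# Hypothetical helper (an assumption, not the library's API).
def digram_to_codepoint(digram)
  # Known escape: use the table. Otherwise treat the second character literally.
  DIGRAMS.fetch(digram) { digram[1].ord }
end

puts digram_to_codepoint('\\n') # 10
puts digram_to_codepoint('\\.') # 46
```

Note that the keys are two-character strings (a backslash followed by one character), which is why they are written with a doubled backslash.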
Instance Attribute Summary
-
#codepoint ⇒ Object
readonly
The integer value that uniquely identifies the character.
-
#lexeme ⇒ Object
readonly
The initial text representation of the character (if any).
Attributes inherited from Expression
Class Method Summary
-
.char2codepoint(aChar) ⇒ Object
Conversion method that returns the codepoint for the given single character.
-
.codepoint2char(aCodepoint) ⇒ Object
Conversion method that returns a character given a codepoint (integer) value.
-
.esc2codepoint(anEscapeSequence) ⇒ Object
Conversion method that returns the codepoint for the given escape sequence (a String).
Instance Method Summary
-
#==(other) ⇒ Object
Returns true iff this Character and the parameter 'other' represent the same character.
-
#char ⇒ Object
Return the character as a String object.
-
#explain ⇒ Object
Return a plain English description of the character.
-
#initialize(aValue) ⇒ Character
constructor
Constructor.
Methods inherited from AtomicExpression
Methods inherited from Expression
#atomic?, #cardinality, #options, #to_str
Constructor Details
#initialize(aValue) ⇒ Character
Constructor.
[aValue] Initialize the character with either a String literal or a codepoint value.
Examples:
Initializing with a codepoint value:
    Regex::Character.new(0x3a3) # Represents: Σ (Unicode GREEK CAPITAL LETTER SIGMA)
    Regex::Character.new(931)   # Also represents: Σ (931 dec == 3a3 hex)
Initializing with a single-character string:
    Regex::Character.new(?\u03a3) # Also represents: Σ
    Regex::Character.new('Σ')     # Obviously, represents a Σ
Initializing with an escape sequence string. Recognized escaped characters are:
- \a (alarm, 0x07), \n (newline, 0xA), \r (carriage return, 0xD), \t (tab, 0x9), \e (escape, 0x1B), \f (form feed, 0xC), \v (vertical tab, 0xB)
- \uXXXX where XXXX is a 4-hex-digit integer value, \uX...
- \ooo (octal)
- \xXX (hex)
Any other escaped character is treated as a literal character.
    Regex::Character.new('\n')     # Represents a newline
    Regex::Character.new('\u03a3') # Represents a Σ
    # File 'lib/regex/character.rb', line 58
    def initialize(aValue)
      case aValue
      when String
        if aValue.size == 1
          # Literal single character case...
          @codepoint = self.class.char2codepoint(aValue)
        else
          # Should be an escape sequence...
          @codepoint = self.class.esc2codepoint(aValue)
        end
        @lexeme = aValue
      when Integer
        @codepoint = aValue
      else
        raise StandardError, "Cannot initialize a Character with a '#{aValue}'."
      end
    end
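Which branch the constructor takes depends on how Ruby itself reads the literal before the constructor ever sees it. The sketch below mirrors the branching with a hypothetical helper, branch_for, which is not part of the library and only reports the branch that a given argument would reach.

```ruby
# Hypothetical helper mirroring the constructor's branching (not library code).
def branch_for(value)
  case value
  when String
    value.size == 1 ? :literal_character : :escape_sequence
  when Integer
    :codepoint
  else
    :unsupported
  end
end

puts branch_for('Σ')      # literal_character
puts branch_for("\n")     # literal_character (a real newline, one character)
puts branch_for('\n')     # escape_sequence (backslash + n, two characters)
puts branch_for('\u03a3') # escape_sequence (six characters in single quotes)
puts branch_for(0x3a3)    # codepoint
```

The practical consequence is that escape sequences meant for esc2codepoint should be passed in single quotes (or with a doubled backslash), since a double-quoted "\n" is already a one-character string.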
Instance Attribute Details
#codepoint ⇒ Object (readonly)
The integer value that uniquely identifies the character.
    # File 'lib/regex/character.rb', line 31
    def codepoint
      @codepoint
    end
#lexeme ⇒ Object (readonly)
The initial text representation of the character (if any).
    # File 'lib/regex/character.rb', line 34
    def lexeme
      @lexeme
    end
Class Method Details
.char2codepoint(aChar) ⇒ Object
Conversion method that returns the codepoint for the given single character.
Example:
    Regex::Character::char2codepoint('Σ') # Returns: 0x3a3
    # File 'lib/regex/character.rb', line 88
    def self.char2codepoint(aChar)
      return aChar.ord
    end
.codepoint2char(aCodepoint) ⇒ Object
Conversion method that returns a character given a codepoint (integer) value.
Example:
    Regex::Character::codepoint2char(0x3a3) # Returns: Σ (the Unicode GREEK CAPITAL LETTER SIGMA)
    # File 'lib/regex/character.rb', line 81
    def self.codepoint2char(aCodepoint)
      return [aCodepoint].pack('U') # Remark: chr() without an encoding argument fails with codepoints > 255
    end
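The two conversions above are inverses of each other. A minimal round-trip check in plain Ruby, using two hypothetical stand-alone helpers (to_char and to_codepoint are illustrative names, not library methods) built on the same core calls:

```ruby
# Hypothetical stand-alone equivalents of the two class methods.
def to_char(codepoint)
  [codepoint].pack('U') # 'U' packs an Integer as a UTF-8 character
end

def to_codepoint(char)
  char.ord
end

sigma = to_char(0x3a3)
puts sigma                                # Σ
puts to_codepoint(sigma) == 0x3a3         # true
puts 0x3a3.chr(Encoding::UTF_8) == sigma  # true
```

Integer#chr with an explicit encoding is shown only as an alternative to pack('U'); the source uses pack.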
.esc2codepoint(anEscapeSequence) ⇒ Object
Conversion method that returns the codepoint for the given escape sequence (a String). Recognized escaped characters are:
- \a (alarm, 0x07), \n (newline, 0xA), \r (carriage return, 0xD), \t (tab, 0x9), \e (escape, 0x1B), \f (form feed, 0xC), \v (vertical tab, 0xB)
- \uXXXX where XXXX is a 4-hex-digit integer value, \uX...
- \ooo (octal)
- \xXX (hex)
Any other escaped character is treated as a literal character.
Example:
    Regex::Character::esc2codepoint('\n') # Returns: 0xa
    # File 'lib/regex/character.rb', line 102
    def self.esc2codepoint(anEscapeSequence)
      # The backslash is doubled so that it actually appears in the message
      msg = "Escape sequence #{anEscapeSequence} does not begin with a backslash (\\)."
      raise StandardError, msg unless anEscapeSequence[0] == "\\"
      result = (anEscapeSequence.length == 2) ? digram2codepoint(anEscapeSequence) : esc_number2codepoint(anEscapeSequence)
      return result
    end
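The helpers digram2codepoint and esc_number2codepoint are not shown on this page. As a sketch only, the numeric forms listed above (\uX..., \xXX, \ooo) could be decoded along the following lines; numeric_escape_to_codepoint is a hypothetical name, and the accepted digit counts are assumptions rather than the library's actual rules.

```ruby
# Hypothetical helper (an assumption; the library's own helper is not shown here).
def numeric_escape_to_codepoint(sequence)
  unless sequence.start_with?('\\')
    raise ArgumentError, "#{sequence} does not begin with a backslash"
  end

  case sequence
  when /\A\\u(\h+)\z/       then Regexp.last_match(1).hex # \uXXXX: hexadecimal digits
  when /\A\\x(\h{1,2})\z/   then Regexp.last_match(1).hex # \xXX: one or two hex digits
  when /\A\\([0-7]{1,3})\z/ then Regexp.last_match(1).oct # \ooo: up to three octal digits
  else raise ArgumentError, "Unsupported escape sequence #{sequence}"
  end
end

puts numeric_escape_to_codepoint('\u03a3') # 931
puts numeric_escape_to_codepoint('\x41')   # 65
puts numeric_escape_to_codepoint('\101')   # 65
```

The arguments are single-quoted so that Ruby passes the backslash through unchanged instead of interpreting the escape itself.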
Instance Method Details
#==(other) ⇒ Object
Returns true iff this Character and the parameter 'other' represent the same character.
[other] any Object. The way equality is tested depends on the class of other.
Example:
    newOne = Character.new(?\u03a3)
    newOne == newOne                 # true. Identity
    newOne == Character.new(?\u03a3) # true. Both have the same codepoint
    newOne == ?\u03a3                # true. The single-character String matches the char attribute exactly.
    newOne == 0x03a3                 # true. The Integer is compared to the codepoint value.
Any other Object is converted with to_s and the comparison is retried against the resulting String.
    # File 'lib/regex/character.rb', line 124
    def ==(other)
      result = case other
               when Character
                 self.to_str == other.to_str
               when Integer
                 self.codepoint == other
               when String
                 other.size > 1 ? false : to_str == other
               else
                 # Unknown type: try with a conversion
                 self == other.to_s # Recursive call
               end
      return result
    end
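To see the type-dependent comparison in isolation, the sketch below reproduces only that logic in a small stand-in class. SketchCharacter is a hypothetical name and its to_str is an assumption: here it simply returns the character for the codepoint, which may differ from what Expression#to_str does in the library.

```ruby
# Stand-in class (hypothetical) reproducing only the comparison logic shown above.
class SketchCharacter
  attr_reader :codepoint

  def initialize(codepoint)
    @codepoint = codepoint
  end

  # Assumed text form: the character itself.
  def to_str
    [codepoint].pack('U')
  end

  def ==(other)
    case other
    when SketchCharacter then to_str == other.to_str
    when Integer         then codepoint == other
    when String          then other.size > 1 ? false : to_str == other
    else self == other.to_s # unknown type: retry with its String form
    end
  end
end

sigma = SketchCharacter.new(0x3a3)
puts sigma == SketchCharacter.new(931) # true
puts sigma == 'Σ'                      # true
puts sigma == 0x3a3                    # true
puts sigma == :"Σ"                     # true (Symbol#to_s yields 'Σ')
puts sigma == 'ΣΣ'                     # false
```

The comparison is not symmetric: 'Σ' == sigma is decided by String#== rather than by this method.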
#char ⇒ Object
Return the character as a String object.
    # File 'lib/regex/character.rb', line 111
    def char()
      self.class.codepoint2char(@codepoint)
    end
#explain ⇒ Object
Return a plain English description of the character.
    # File 'lib/regex/character.rb', line 144
    def explain()
      return "the character '#{to_str}'"
    end