Class: TwitterCldr::Shared::UnicodeRegex

Inherits:
Object
Extended by:
Forwardable
Defined in:
lib/twitter_cldr/shared/unicode_regex.rb

Constructor Details

#initialize(elements, modifiers = nil) ⇒ UnicodeRegex

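Returns a new instance of UnicodeRegex. Note that, as the source below is written, the constructor stores elements but always sets @modifiers to nil, so the modifiers argument is accepted and then discarded.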



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 58

def initialize(elements, modifiers = nil)
  @elements = elements
  @modifiers = nil
end

Instance Attribute Details

#elements ⇒ Object (readonly)

Returns the value of attribute elements.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 56

def elements
  @elements
end

#modifiers ⇒ Object (readonly)

Returns the value of attribute modifiers.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 56

def modifiers
  @modifiers
end

Class Method Details

.all_unicode ⇒ Object

All Unicode code points (0..0x10FFFF), returned as a memoized RangeSet.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 21

def all_unicode
  @all_unicode ||= TwitterCldr::Utils::RangeSet.new(
    [0..0x10FFFF]
  )
end

.compile(str, modifiers = "", symbol_table = nil) ⇒ Object



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 12

def compile(str, modifiers = "", symbol_table = nil)
  new(
    parser.parse(tokenizer.tokenize(str), {
      :symbol_table => symbol_table
    }), modifiers
  )
end

.invalid_regexp_chars ⇒ Object

A few <control> characters (code points 2..7) and the surrogate code points (55296..57343, i.e. U+D800..U+DFFF, which covers both the regular and the private-use surrogate blocks). These don’t play nicely with Ruby’s regular expression engine, and I think we can safely disregard them.



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 30

def invalid_regexp_chars
  @invalid_regexp_chars ||= TwitterCldr::Utils::RangeSet.new(
    [2..7, 55296..57343]
  )
end

.valid_regexp_chars ⇒ Object



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 36

def valid_regexp_chars
  @valid_regexp_chars ||= all_unicode.subtract(invalid_regexp_chars)
end
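
The subtraction above can be illustrated without the library. The following is a minimal sketch in plain Ruby, not the RangeSet implementation: `subtract_ranges` is a hypothetical helper written for this example, and it assumes the holes are inclusive integer ranges. It shows which code point ranges remain once the invalid ones are removed from 0..0x10FFFF.

```ruby
# Hypothetical helper (not part of TwitterCldr): subtract a list of
# inclusive integer ranges (holes) from a single inclusive range (base).
def subtract_ranges(base, holes)
  result = []
  cursor = base.first
  holes.sort_by(&:first).each do |hole|
    next if hole.last < cursor        # hole lies entirely before the cursor
    break if hole.first > base.last   # hole lies entirely after the base
    result << (cursor..(hole.first - 1)) if hole.first > cursor
    cursor = hole.last + 1
  end
  result << (cursor..base.last) if cursor <= base.last
  result
end

ALL_UNICODE = 0..0x10FFFF
INVALID     = [2..7, 55296..57343]   # same values as invalid_regexp_chars
VALID       = subtract_ranges(ALL_UNICODE, INVALID)

VALID.each { |r| puts format("U+%04X..U+%04X", r.first, r.last) }
```

Under these assumptions, three ranges remain: everything below the excluded control characters, everything between them and the surrogates, and everything above the surrogates.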

Instance Method Details

#to_regexp ⇒ Object



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 63

def to_regexp
  if RUBY_VERSION <= "1.8.7"
    begin
      Oniguruma::ORegexp.new(to_regexp_str, modifiers)
    rescue NameError
      raise "Unicode regular expressions require the Oniguruma gem when using Ruby 1.8. Please install, require, and retry."
    end
  else
    @regexp ||= Regexp.new(to_regexp_str, modifiers)
  end
end
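
Because the constructor shown earlier leaves `@modifiers` as nil, the non-1.8 branch effectively calls `Regexp.new(to_regexp_str, nil)`. The sketch below uses only core Ruby and a made-up pattern string to show what a nil second argument means, next to an explicit integer flag for comparison.

```ruby
source = "abc"   # stand-in for the string returned by to_regexp_str

# nil options: no flags are set, so matching is case sensitive.
plain = Regexp.new(source, nil)

# Integer options: flags are taken from the Regexp constants.
insensitive = Regexp.new(source, Regexp::IGNORECASE)

puts plain.options                    # flag bits of the compiled regexp
puts((plain =~ "ABC").inspect)
puts((insensitive =~ "ABC").inspect)
```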

#to_regexp_str ⇒ Object



# File 'lib/twitter_cldr/shared/unicode_regex.rb', line 75

def to_regexp_str
  @regexp_str ||= elements.map(&:to_regexp_str).join
end
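
The method above only requires that each element respond to `to_regexp_str`; the fragments are concatenated and the result is memoized. As a rough illustration, the sketch below uses hypothetical stand-in classes (`Literal`, `CharClass`, `SketchRegex`), since the library's real element classes are not shown in this section.

```ruby
# Hypothetical element types; each one knows how to render its own fragment.
Literal = Struct.new(:text) do
  def to_regexp_str
    Regexp.escape(text)
  end
end

CharClass = Struct.new(:ranges) do
  def to_regexp_str
    body = ranges.map { |r| format('\u{%04x}-\u{%04x}', r.first, r.last) }.join
    "[#{body}]"
  end
end

# Hypothetical container mirroring the join-and-memoize pattern.
class SketchRegex
  attr_reader :elements

  def initialize(elements)
    @elements = elements
  end

  def to_regexp_str
    @regexp_str ||= elements.map(&:to_regexp_str).join
  end

  def to_regexp
    @regexp ||= Regexp.new(to_regexp_str)
  end
end

regex = SketchRegex.new([Literal.new("id-"), CharClass.new([0x30..0x39])])
puts regex.to_regexp_str
puts regex.to_regexp.match?("id-7")
```

Keeping the rendering logic on the elements means the container does not need to know which kinds of elements the parser produced.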