Class: TwitterCldr::Segmentation::RuleSetBuilder

Inherits:
Object
Defined in:
lib/twitter_cldr/segmentation/rule_set_builder.rb

Class Method Details

.exception_rule_for(locale, boundary_type) ⇒ Object

See the comment above exceptions_for. Exceptions are only supported for the “sentence” boundary type, because the ULI JSON data doesn’t distinguish between boundary types.



# File 'lib/twitter_cldr/segmentation/rule_set_builder.rb', line 19

def exception_rule_for(locale, boundary_type)
  cache_key = TwitterCldr::Utils.compute_cache_key(locale, boundary_type)
  exceptions_cache[cache_key] ||= begin
    exceptions = exceptions_for(locale, boundary_type)
    regex_contents = exceptions.map { |exc| Regexp.escape(exc) }.join("|")
    parse("(?:#{regex_contents}) ×", nil).tap do |rule|
      rule.id = 0
    end
  end
end
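
As a rough illustration of how the exception pattern above is assembled, the following standalone sketch escapes each exception and joins the results into one non-capturing alternation. The exception list is hypothetical (the real entries come from exceptions_for), and the rule text is only built as a string here rather than handed to parse.

```ruby
# Hypothetical exception list; the real data is loaded by exceptions_for.
exceptions = ["Mr.", "Dr.", "e.g."]

# Escape regex metacharacters (notably ".") so each entry matches literally,
# then join the entries into a single alternation.
regex_contents = exceptions.map { |exc| Regexp.escape(exc) }.join("|")

# The rule text that would be passed to parse: "no break after an exception".
rule_text = "(?:#{regex_contents}) ×"

# For demonstration only: the left-hand side as a plain Ruby regexp.
pattern = Regexp.new("(?:#{regex_contents})")

puts rule_text
puts pattern.match?("Dr.")
puts pattern.match?("Drx")
```

Escaping matters because abbreviations usually contain periods, which would otherwise match any character and suppress breaks after unrelated text.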

.implicit_end_of_text_rule ⇒ Object

The implicit initial rules are always “start-of-text ÷” and “÷ end-of-text”. We don’t need the start-of-text one.



# File 'lib/twitter_cldr/segmentation/rule_set_builder.rb', line 40

def implicit_end_of_text_rule
  @implicit_end_of_text_rule ||=
    parse('.\z ÷', nil).tap do |rule|
      rule.id = 9998
    end
end

.implicit_final_rule ⇒ Object

The implicit final rule is always “Any ÷ Any”.



# File 'lib/twitter_cldr/segmentation/rule_set_builder.rb', line 31

def implicit_final_rule
  @implicit_final_rule ||=
    parse('. ÷ .', nil).tap do |rule|
      rule.id = 9999
    end
end
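
Both implicit rules follow the same shape: parse once, tag the result with a fixed id, and memoize it with `||=`. The sketch below shows that pattern in isolation. The `ImplicitRule` struct, the `ImplicitRulesSketch` module, and the stubbed `parse` are all hypothetical stand-ins, not the library's own classes; the real parse compiles the rule text.

```ruby
# Hypothetical stand-in for the rule object returned by the real parse.
ImplicitRule = Struct.new(:text, :id)

module ImplicitRulesSketch
  class << self
    # Counts how often the (stubbed) parser runs, to make memoization visible.
    attr_reader :parse_calls

    # Hypothetical stub: records the call and wraps the text in a struct.
    def parse(text)
      @parse_calls = (@parse_calls || 0) + 1
      ImplicitRule.new(text, nil)
    end

    # Parse on first use, assign the fixed id via tap, then reuse the object.
    def implicit_final_rule
      @implicit_final_rule ||=
        parse('. ÷ .').tap do |rule|
          rule.id = 9999
        end
    end
  end
end

first  = ImplicitRulesSketch.implicit_final_rule
second = ImplicitRulesSketch.implicit_final_rule

puts first.equal?(second)
puts ImplicitRulesSketch.parse_calls
```

Because `tap` returns the rule itself, the memoized value is the already-tagged object, so later calls skip both the parse and the id assignment.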

.load(locale, boundary_type, options = {}) ⇒ Object



# File 'lib/twitter_cldr/segmentation/rule_set_builder.rb', line 11

def load(locale, boundary_type, options = {})
  rules = compile_rules_for(boundary_type)
  RuleSet.new(locale, rules, boundary_type, options)
end