Module: BCP47::Parser

Defined in:
lib/bcp47_spec/parser.rb

Constant Summary

ALPHANUM =

Simplified check. It does not implement the top-level privateuse or grandfathered tag forms, and should be replaced with a proper check at some point.

/[a-zA-Z\d]/
SINGLETON =
/[\dA-WY-Za-wy-z]/
EXTLANG =
/[a-zA-Z]{3}(-[a-zA-Z]{3}){0,2}/
LANGUAGE =
/([a-zA-Z]{2,3}(-#{EXTLANG})?|[a-zA-Z]{4}|[a-zA-Z]{5,8})/
SCRIPT =
/[a-zA-Z]{4}/
REGION =
/([a-zA-Z]{2}|\d{3})/
VARIANT =
/(#{ALPHANUM}{5,8}|\d#{ALPHANUM}{3})/
EXTENSION =
/#{SINGLETON}(-[a-zA-Z]{2,8})+/
PRIVATEUSE =
/x(-#{ALPHANUM}{1,8})+/
LANGTAG =

Ruby's .match keeps only one capture per group, even when that group repeats. For repeatable parts such as variants and extensions, we therefore capture the whole run in a single named group and split it into individual subtags in a separate step.

%r{
  (?<language>#{LANGUAGE})
  (-(?<script>#{SCRIPT}))?
  (-(?<region>#{REGION}))?
  (?<variants>(-#{VARIANT})*)
  (?<extensions>(-#{EXTENSION})*)
  (-(?<private>#{PRIVATEUSE}))?
}x
LANGUAGE_TAG =
/\A#{LANGTAG}\z/
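The two-pass approach described for LANGTAG can be sketched as follows. This is a minimal illustration, not the library's code: it uses a reduced pattern (language plus variants only), and the method name `variants_sketch` and its local variables are hypothetical.

```ruby
# Sketch only: a reduced pattern (language plus variants) to show the
# two-pass approach. Method and variable names are hypothetical.
def variants_sketch(tag)
  variant = /(?:[a-zA-Z\d]{5,8}|\d[a-zA-Z\d]{3})/
  pattern = /\A(?<language>[a-zA-Z]{2,3})(?<variants>(?:-#{variant})*)\z/

  match = pattern.match(tag)
  return nil unless match

  # Pass 1: the named group holds the whole repeated run, hyphens included,
  # e.g. "-rozaj-biske" for "sl-rozaj-biske".
  raw = match[:variants]

  # Pass 2: drop the leading hyphen and split into individual subtags.
  raw.empty? ? [] : raw[/-(.*)/, 1].split('-')
end

p variants_sketch('sl-rozaj-biske') # => ["rozaj", "biske"]
p variants_sketch('sl')             # => []
```

Wrapping the repetition in one named group means the match yields a single string for the whole run, which is then cheap to split on hyphens.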

Class Method Summary

Class Method Details

.parse(language_tag) ⇒ Object



# File 'lib/bcp47_spec/parser.rb', line 109

def parse(language_tag)
  # Return nil when the input is not a well-formed language tag.
  return unless (match = language_tag.match(LANGUAGE_TAG))

  named_captures(match).tap do |captures|
    # Repeated subtags arrive as one hyphen-joined string; split them into arrays.
    captures['variants']   = captures['variants'].to_s.empty? ? [] : captures['variants'][/-(.*)/, 1].split('-').sort
    captures['extensions'] = split_extensions(captures['extensions'])
    captures['private']    = captures['private'].to_s.empty? ? [] : captures['private'][/x-(.*)/, 1].split('-').sort
  end
end
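The `split_extensions` helper called above is not shown on this page. As a rough sketch of what such a step might do, the hypothetical `split_extensions_sketch` below breaks the captured extensions string into one entry per singleton. Its name, return shape, and use of `String#scan` are assumptions, and the real helper may behave differently.

```ruby
# Hypothetical stand-in for the split_extensions helper; the real
# implementation is not shown on this page and may differ.
def split_extensions_sketch(raw)
  return [] if raw.to_s.empty?

  # Each extension starts with a one-character singleton (any alphanumeric
  # except x), followed by one or more subtags of 2 to 8 letters, mirroring
  # the SINGLETON and EXTENSION patterns documented above.
  raw.scan(/[\dA-WY-Za-wy-z](?:-[a-zA-Z]{2,8})+/)
end

p split_extensions_sketch('-u-co-phonebk-t-en') # => ["u-co-phonebk", "t-en"]
```

Because a singleton is a single character and every following subtag needs at least two, the scan stops each extension at the next singleton without extra bookkeeping.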