Module: Lang::Tag::Canonicalization

Included in:
Langtag
Defined in:
lib/lang/tag/canonicalization.rb

Defined Under Namespace

Classes: Error

Constant Summary

PRIVATE_LANGUAGE_REGEX =

RFC 5646, Section 2.2.1: The subtags ‘qaa’ through ‘qtz’ are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 639-2 for private use. These codes MAY be used for non-registered primary language subtags (instead of using private use subtags following ‘x-’).

/^q[a-t][a-z]$/i.freeze
PRIVATE_SCRIPT_REGEX =

RFC 5646, Section 2.2.3: The script subtags ‘Qaaa’ through ‘Qabx’ are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 15924 for private use. These codes MAY be used for non-registered script values. Please refer to Section 4.6 for more information on private use subtags.

/^Qa[ab][a-x]$/i.freeze
PRIVATE_REGION_REGEX =

RFC 5646, Section 2.2.4: The region subtags ‘AA’, ‘QM’-‘QZ’, ‘XA’-‘XZ’, and ‘ZZ’ are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 3166 for private use. These codes MAY be used for private use region subtags (instead of using a private use subtag sequence). Please refer to Section 4.6 for more information on private use subtags.

/^(?:AA|Q[M-Z]|X[A-Z]|ZZ)$/i.freeze
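
The three private-use patterns above can be tried out in isolation. The following is a minimal sketch that copies them into local constants so it runs without the library; the helper `private_use_kind` is hypothetical and is not part of this module.

```ruby
# Local copies of the three patterns listed above, so the snippet runs on its own.
PRIVATE_LANGUAGE = /^q[a-t][a-z]$/i
PRIVATE_SCRIPT   = /^Qa[ab][a-x]$/i
PRIVATE_REGION   = /^(?:AA|Q[M-Z]|X[A-Z]|ZZ)$/i

# Hypothetical helper: names which private-use class a subtag falls into, if any.
def private_use_kind(subtag)
  return :language if PRIVATE_LANGUAGE.match?(subtag)
  return :script   if PRIVATE_SCRIPT.match?(subtag)
  return :region   if PRIVATE_REGION.match?(subtag)
  nil
end

private_use_kind("qaa")  # => :language
private_use_kind("Qabx") # => :script
private_use_kind("XK")   # => :region
private_use_kind("US")   # => nil
```

Note that the script pattern as listed does not match ‘Qaay’ or ‘Qaaz’, although both fall inside the ‘Qaaa’ through ‘Qabx’ range quoted from the RFC.
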
PREFIX_REGEX =

RFC 5646, Section 3.1.8: The ‘Prefix’ also indicates when variant subtags make sense when used together (many that otherwise share a ‘Prefix’ are mutually exclusive) and what the relative ordering of variants is supposed to be. For example, the variant ‘1994’ (Standardized Resian orthography) has several ‘Prefix’ fields in the registry (“sl-rozaj”, “sl-rozaj-biske”, “sl-rozaj-njiva”, “sl-rozaj-osojs”, and “sl-rozaj-solba”). This indicates not only that ‘1994’ is appropriate to use with each of these five Resian variant subtags (‘rozaj’, ‘biske’, ‘njiva’, ‘osojs’, and ‘solba’), but also that it SHOULD appear following any of these variants in a tag. Thus, the language tag ought to take the form “sl-rozaj-biske-1994”, rather than “sl-1994-rozaj-biske” or “sl-rozaj-1994-biske”.

/^(#{PATTERN::LANGUAGE})(?:-(#{PATTERN::SCRIPT}))?(?:-(#{PATTERN::REGION}))?(?:-(.+))?$/io.freeze
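
To illustrate how a pattern of this shape splits a ‘Prefix’ value into its parts, here is a small sketch. The `PATTERN::*` fragments are not shown on this page, so the sketch substitutes simplified stand-ins (`LANGUAGE`, `SCRIPT`, `REGION`) that are assumptions, not the library's definitions; `split_prefix` is likewise a hypothetical helper.

```ruby
# Simplified stand-ins for PATTERN::LANGUAGE, PATTERN::SCRIPT and PATTERN::REGION.
# These are assumptions made for illustration only.
LANGUAGE = "[a-z]{2,3}"
SCRIPT   = "[a-z]{4}"
REGION   = "[a-z]{2}|\\d{3}"

# Same overall shape as PREFIX_REGEX: language, optional script,
# optional region, then everything else (the variant sequence).
PREFIX = /^(#{LANGUAGE})(?:-(#{SCRIPT}))?(?:-(#{REGION}))?(?:-(.+))?$/i

# Hypothetical helper: returns [language, script, region, variants] or nil.
def split_prefix(prefix)
  match = PREFIX.match(prefix)
  match && match.captures
end

split_prefix("sl-rozaj-biske") # => ["sl", nil, nil, "rozaj-biske"]
split_prefix("zh-Hant-HK")     # => ["zh", "Hant", "HK", nil]
split_prefix("de-CH-1996")     # => ["de", nil, "CH", "1996"]
```

Because the script and region groups are optional, a five-letter variant such as ‘rozaj’ falls through to the final catch-all group rather than being mistaken for a script or region.
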

Instance Method Details

#canonicalize ⇒ Object

Also known as: to_canonical_form



# File 'lib/lang/tag/canonicalization.rb', line 253

def canonicalize
  duplicated = self.dup
  duplicated.canonicalize!
  duplicated
end

#canonicalize! ⇒ Object

Also known as: to_canonical_form!



# File 'lib/lang/tag/canonicalization.rb', line 259

def canonicalize!

  # 1. Extension sequences are ordered into case-insensitive ASCII order
  # by singleton subtag.

  canonicalize_extensions

  # A redundant tag is a grandfathered
  # registration whose individual subtags appear with the same semantic
  # meaning in the registry. For example, the tag "zh-Hant" (Traditional
  # Chinese) can now be composed from the subtags 'zh' (Chinese) and
  # 'Hant' (Han script traditional variant). These redundant tags are
  # maintained in the registry as records of type 'redundant', mostly as
  # a matter of historical curiosity.

  # 2. Redundant or grandfathered tags are replaced by their 'Preferred-
  # Value', if there is one.

  if re = Subtags::Redundant(composition)
    return recompose(re.preferred_value) if re.preferred_value
  end

  # 3. Subtags are replaced by their 'Preferred-Value', if there is one.
  # For extlangs, the original primary language subtag is also
  # replaced if there is a primary language subtag in the 'Preferred-
  # Value'.

  canonicalize_language
  canonicalize_script
  canonicalize_region
  canonicalize_variants

  nil
end
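
Step 1 in the listing above (ordering extension sequences by singleton) can be sketched on its own. This is not the body of `canonicalize_extensions`, which is not shown on this page; `order_extensions` is a hypothetical helper that assumes each extension sequence is a string starting with its one-character singleton.

```ruby
# Hypothetical helper: orders extension sequences by their singleton
# (the first character), ignoring case, as step 1 describes.
def order_extensions(sequences)
  sequences.sort_by { |sequence| sequence[0].downcase }
end

order_extensions(["b-ccc-bbb", "A-aaa"])    # => ["A-aaa", "b-ccc-bbb"]
order_extensions(["u-co-phonebk", "a-foo"]) # => ["a-foo", "u-co-phonebk"]
```

Only the singleton decides the order; the subtags inside each sequence keep their original positions.
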

#same?(other) ⇒ Boolean

RFC 5646, Section 3.1.7: For example, the tags “zh-yue-Hant-HK” and “yue-Hant-HK” are semantically equivalent and ought to be treated as if they were the same tag.

Returns:

  • (Boolean)


# File 'lib/lang/tag/canonicalization.rb', line 249

def same?(other)
  self.canonicalize == other.canonicalize
end
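
The idea behind `same?` is that two tags are compared after both have been canonicalized. A minimal sketch on plain strings follows; it handles only the extlang case from the RFC quote, and `EXTLANG_PREFERRED`, `canonical` and `same_tag?` are hypothetical names with a hand-written excerpt of registry data, not the library's implementation.

```ruby
# Hypothetical excerpt of registry data: extlang subtag => its Preferred-Value.
EXTLANG_PREFERRED = { "yue" => "yue", "hak" => "hak" }.freeze

# Replaces a "primary-extlang" pair with the extlang's Preferred-Value.
def canonical(tag)
  subtags = tag.split("-")
  preferred = subtags[1] && EXTLANG_PREFERRED[subtags[1].downcase]
  subtags = [preferred] + subtags.drop(2) if preferred
  subtags.join("-")
end

# Tags are case-insensitive, so compare the canonical forms in lower case.
def same_tag?(left, right)
  canonical(left).downcase == canonical(right).downcase
end

canonical("zh-yue-Hant-HK")                # => "yue-Hant-HK"
same_tag?("zh-yue-Hant-HK", "yue-Hant-HK") # => true
same_tag?("zh-Hant-HK", "yue-Hant-HK")     # => false
```
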

#suppress_script ⇒ Object



# File 'lib/lang/tag/canonicalization.rb', line 361

def suppress_script
  duplicated = self.dup
  duplicated.suppress_script!
  duplicated
end

#suppress_script! ⇒ Object

RFC 5646, Section 4.1: The script subtag SHOULD NOT be used to form language tags unless the script adds some distinguishing information to the tag. … The field ‘Suppress-Script’ in the primary or extended language record in the registry indicates script subtags that do not add distinguishing information for most applications; this field defines when users SHOULD NOT include a script subtag with a particular primary language subtag.

For example, if an implementation selects content using Basic Filtering [RFC4647] (originally described in Section 14.4 of [RFC2616]) and the user requested the language range “en-US”, content labeled “en-Latn-US” will not match the request and thus not be selected. Therefore, it is important to know when script subtags will customarily be used and when they ought not be used.

Raises:

  • (Error) if the primary language subtag is not registered

# File 'lib/lang/tag/canonicalization.rb', line 340

def suppress_script!
  return unless @script && @language
  decompose_language unless @primary

  return if PRIVATE_LANGUAGE_REGEX === @primary

  subtag = Subtags::Language(@primary)
  raise Error, "Language #{@primary.inspect} is not registered." unless subtag
  if subtag.suppress_script && @script == subtag.suppress_script
    @script = nil
    dirty
  #elsif @extlang
  #  subtag = Subtags::Extlang(@extlang)
  #  raise Error, "Extlang #{@extlang.inspect} is not registered." unless subtag
  #  if subtag.suppress_script && @script == subtag.suppress_script
  #    dirty
  #  end
  end
  nil
end
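
The effect of ‘Suppress-Script’ can be sketched on plain strings. The table `SUPPRESS_SCRIPT` below is a hand-written excerpt standing in for registry lookups, and `suppress_script_in` is a hypothetical helper that assumes the script, when present, is the second subtag; unlike the listing above, it does not raise for unregistered languages.

```ruby
# Hypothetical excerpt of registry data: primary language => Suppress-Script.
SUPPRESS_SCRIPT = { "en" => "Latn", "ru" => "Cyrl" }.freeze

# Drops the second subtag when it equals the language's Suppress-Script.
def suppress_script_in(tag)
  language, script, *rest = tag.split("-")
  suppressed = SUPPRESS_SCRIPT[language.downcase]
  return tag unless script && suppressed && suppressed.casecmp?(script)
  [language, *rest].join("-")
end

suppress_script_in("en-Latn-US") # => "en-US"
suppress_script_in("en-US")      # => "en-US"
suppress_script_in("sr-Latn")    # => "sr-Latn"
```

After suppression, content labeled “en-Latn-US” carries the tag “en-US” and therefore matches the language range from the RFC example.
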

#to_extlang_form ⇒ Object



# File 'lib/lang/tag/canonicalization.rb', line 315

def to_extlang_form
  duplicated = self.dup
  duplicated.to_extlang_form!
  duplicated
end

#to_extlang_form! ⇒ Object

RFC 5646, Section 4.5: For example, “hak-CN” (Hakka, China) has the primary language subtag ‘hak’, which in turn has an ‘extlang’ record with a ‘Prefix’ ‘zh’ (Chinese). The extlang form is “zh-hak-CN” (Chinese, Hakka, China).



# File 'lib/lang/tag/canonicalization.rb', line 305

def to_extlang_form!
  canonicalize!
  subtag = Subtags::Extlang(@language)
  @primary = subtag.prefix
  @extlang = @language
  @language = "#{@primary}#{HYPHEN}#{@extlang}"
  dirty
  nil
end
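
The transformation into extlang form can be sketched on plain strings. `EXTLANG_PREFIX` is a hand-written excerpt standing in for the registry's extlang records, and `extlang_form_of` is a hypothetical helper, not the library's method. As a design assumption, the sketch returns the tag unchanged when the language has no extlang record, whereas the listing above calls `subtag.prefix` without checking the lookup result.

```ruby
# Hypothetical excerpt of registry data: extlang subtag => its Prefix.
EXTLANG_PREFIX = { "hak" => "zh", "yue" => "zh" }.freeze

# Prepends the extlang's Prefix to a tag whose primary language
# also has an extlang record; other tags are returned unchanged.
def extlang_form_of(tag)
  language, *rest = tag.split("-")
  prefix = EXTLANG_PREFIX[language.downcase]
  return tag unless prefix
  [prefix, language, *rest].join("-")
end

extlang_form_of("hak-CN")      # => "zh-hak-CN"
extlang_form_of("yue-Hant-HK") # => "zh-yue-Hant-HK"
extlang_form_of("en-US")       # => "en-US"
```
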