Module: Lang::Tag::Canonicalization
- Included in: Langtag
- Defined in: lib/lang/tag/canonicalization.rb
Defined Under Namespace
Classes: Error
Constant Summary
- PRIVATE_LANGUAGE_REGEX = /^q[a-t][a-z]$/i.freeze
  RFC 5646, Section 2.2.1: The subtags ‘qaa’ through ‘qtz’ are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 639-2 for private use. These codes MAY be used for non-registered primary language subtags (instead of using private use subtags following ‘x-’).
- PRIVATE_SCRIPT_REGEX = /^Qa[ab][a-x]$/i.freeze
  RFC 5646, Section 2.2.3: The script subtags ‘Qaaa’ through ‘Qabx’ are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 15924 for private use. These codes MAY be used for non-registered script values. Please refer to Section 4.6 for more information on private use subtags.
- PRIVATE_REGION_REGEX = /^(?:AA|Q[M-Z]|X[A-Z]|ZZ)$/i.freeze
  RFC 5646, Section 2.2.4: The region subtags ‘AA’, ‘QM’-‘QZ’, ‘XA’-‘XZ’, and ‘ZZ’ are reserved for private use in language tags. These subtags correspond to codes reserved by ISO 3166 for private use. These codes MAY be used for private use region subtags (instead of using a private use subtag sequence). Please refer to Section 4.6 for more information on private use subtags.
- PREFIX_REGEX = /^(#{PATTERN::LANGUAGE})(?:-(#{PATTERN::SCRIPT}))?(?:-(#{PATTERN::REGION}))?(?:-(.+))?$/io.freeze
  RFC 5646, Section 3.1.8: The ‘Prefix’ also indicates when variant subtags make sense when used together (many that otherwise share a ‘Prefix’ are mutually exclusive) and what the relative ordering of variants is supposed to be. For example, the variant ‘1994’ (Standardized Resian orthography) has several ‘Prefix’ fields in the registry (“sl-rozaj”, “sl-rozaj-biske”, “sl-rozaj-njiva”, “sl-rozaj-osojs”, and “sl-rozaj-solba”). This indicates not only that ‘1994’ is appropriate to use with each of these five Resian variant subtags (‘rozaj’, ‘biske’, ‘njiva’, ‘osojs’, and ‘solba’), but also that it SHOULD appear following any of these variants in a tag. Thus, the language tag ought to take the form “sl-rozaj-biske-1994”, rather than “sl-1994-rozaj-biske” or “sl-rozaj-1994-biske”.
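As a quick check, the three private-use patterns listed above can be exercised on their own. The sketch below copies the regexes verbatim from this page; the `private_use_kinds` helper is a hypothetical name introduced only for illustration and is not part of the library.

```ruby
# Patterns copied from the constant summary above.
PRIVATE_LANGUAGE_REGEX = /^q[a-t][a-z]$/i.freeze
PRIVATE_SCRIPT_REGEX   = /^Qa[ab][a-x]$/i.freeze
PRIVATE_REGION_REGEX   = /^(?:AA|Q[M-Z]|X[A-Z]|ZZ)$/i.freeze

# Hypothetical helper (not part of the library): reports which private-use
# ranges a single subtag falls into.
def private_use_kinds(subtag)
  kinds = []
  kinds << :language if PRIVATE_LANGUAGE_REGEX === subtag
  kinds << :script   if PRIVATE_SCRIPT_REGEX === subtag
  kinds << :region   if PRIVATE_REGION_REGEX === subtag
  kinds
end

p private_use_kinds("qaa")   # => [:language]
p private_use_kinds("Qabx")  # => [:script]
p private_use_kinds("XK")    # => [:region]
p private_use_kinds("en")    # => []
```

One detail worth noticing when reading the script pattern: its final character class `[a-x]` applies after both `Qaa` and `Qab`, so `Qaay` and `Qaaz` do not match even though they fall alphabetically inside the quoted range ‘Qaaa’ through ‘Qabx’.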
Instance Method Summary
- #canonicalize ⇒ Object (also: #to_canonical_form)
- #canonicalize! ⇒ Object (also: #to_canonical_form!)
- #same?(other) ⇒ Boolean
  RFC 5646, Section 3.1.7: For example, the tags “zh-yue-Hant-HK” and “yue-Hant-HK” are semantically equivalent and ought to be treated as if they were the same tag.
- #suppress_script ⇒ Object
- #suppress_script! ⇒ Object
  RFC 5646, Section 4.1: The script subtag SHOULD NOT be used to form language tags unless the script adds some distinguishing information to the tag.
- #to_extlang_form ⇒ Object
- #to_extlang_form! ⇒ Object
  RFC 5646, Section 4.5: For example, “hak-CN” (Hakka, China) has the primary language subtag ‘hak’, which in turn has an ‘extlang’ record with a ‘Prefix’ ‘zh’ (Chinese).
Instance Method Details
#canonicalize ⇒ Object Also known as: to_canonical_form
    # File 'lib/lang/tag/canonicalization.rb', line 253
    def canonicalize
      duplicated = self.dup
      duplicated.canonicalize!
      duplicated
    end
#canonicalize! ⇒ Object Also known as: to_canonical_form!
    # File 'lib/lang/tag/canonicalization.rb', line 259
    def canonicalize!
      # 1. Extension sequences are ordered into case-insensitive ASCII order
      #    by singleton subtag.
      canonicalize_extensions

      # A redundant tag is a grandfathered
      # registration whose individual subtags appear with the same semantic
      # meaning in the registry. For example, the tag "zh-Hant" (Traditional
      # Chinese) can now be composed from the subtags 'zh' (Chinese) and
      # 'Hant' (Han script traditional variant). These redundant tags are
      # maintained in the registry as records of type 'redundant', mostly as
      # a matter of historical curiosity.

      # 2. Redundant or grandfathered tags are replaced by their
      #    'Preferred-Value', if there is one.
      if re = Subtags::Redundant(composition)
        return recompose(re.preferred_value) if re.preferred_value
      end

      # 3. Subtags are replaced by their 'Preferred-Value', if there is one.
      #    For extlangs, the original primary language subtag is also
      #    replaced if there is a primary language subtag in the
      #    'Preferred-Value'.
      canonicalize_language
      canonicalize_script
      canonicalize_region
      canonicalize_variants
      nil
    end
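Step 1 in the listing above (ordering extension sequences by singleton) can be sketched in isolation. This is a minimal illustration, assuming each extension sequence is already available as a string such as "u-co-phonebk"; `order_extensions` is a hypothetical helper, and the library's own `canonicalize_extensions` is not shown on this page and may work differently.

```ruby
# Hypothetical helper (not the library's canonicalize_extensions): sorts
# extension sequences by their singleton, i.e. the first character of each
# sequence, ignoring case. Singletons are unique within a valid tag, so
# ties do not arise.
def order_extensions(sequences)
  sequences.sort_by { |sequence| sequence[0].downcase }
end

p order_extensions(["u-co-phonebk", "a-foo", "B-bar"])
# => ["a-foo", "B-bar", "u-co-phonebk"]
```

Comparing on the downcased singleton is what keeps "B-bar" after "a-foo"; a plain ASCII sort would place the uppercase "B" first.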
#same?(other) ⇒ Boolean
RFC 5646, Section 3.1.7: For example, the tags “zh-yue-Hant-HK” and “yue-Hant-HK” are semantically equivalent and ought to be treated as if they were the same tag.
    # File 'lib/lang/tag/canonicalization.rb', line 249
    def same?(other)
      self.canonicalize == other.canonicalize
    end
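The idea behind `same?` (compare canonical forms rather than raw tags) can be illustrated with plain strings. The sketch below is not the library's implementation: `EXTLANG_PREFERRED`, `canonical` and `same_tag?` are hypothetical names, and the one-entry table stands in for the registry lookup that the real canonicalization performs.

```ruby
# Hypothetical stand-in for registry data: an extlang form and the
# preferred form that replaces it. One entry only, for illustration.
EXTLANG_PREFERRED = { "zh-yue" => "yue" }.freeze

# Rewrites a leading extlang form to its preferred form; all other tags
# are returned unchanged.
def canonical(tag)
  lowered = tag.downcase
  EXTLANG_PREFERRED.each do |from, to|
    return to if lowered == from
    return to + tag[from.length..-1] if lowered.start_with?("#{from}-")
  end
  tag
end

# Two tags are treated as the same when their canonical forms are equal,
# compared case-insensitively.
def same_tag?(a, b)
  canonical(a).casecmp?(canonical(b))
end

p canonical("zh-yue-Hant-HK")                 # => "yue-Hant-HK"
p same_tag?("zh-yue-Hant-HK", "yue-Hant-HK")  # => true
p same_tag?("zh-Hant-HK", "yue-Hant-HK")      # => false
```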
#suppress_script ⇒ Object
    # File 'lib/lang/tag/canonicalization.rb', line 361
    def suppress_script
      duplicated = self.dup
      duplicated.suppress_script!
      duplicated
    end
#suppress_script! ⇒ Object
RFC 5646, Section 4.1: The script subtag SHOULD NOT be used to form language tags unless the script adds some distinguishing information to the tag. … The field ‘Suppress-Script’ in the primary or extended language record in the registry indicates script subtags that do not add distinguishing information for most applications; this field defines when users SHOULD NOT include a script subtag with a particular primary language subtag.
For example, if an implementation selects content using Basic Filtering [RFC4647] (originally described in Section 14.4 of [RFC2616]) and the user requested the language range “en-US”, content labeled “en-Latn-US” will not match the request and thus not be selected. Therefore, it is important to know when script subtags will customarily be used and when they ought not be used.
    # File 'lib/lang/tag/canonicalization.rb', line 340
    def suppress_script!
      return unless @script && @language
      decompose_language unless @primary
      return if PRIVATE_LANGUAGE_REGEX === @primary
      subtag = Subtags::Language(@primary)
      raise Error, "Language #{@primary.inspect} is not registered." unless subtag
      if subtag.suppress_script && @script == subtag.suppress_script
        @script = nil
        dirty
      #elsif @extlang
      #  subtag = Subtags::Extlang(@extlang)
      #  raise Error, "Extlang #{@extlang.inspect} is not registered." unless subtag
      #  if subtag.suppress_script && @script == subtag.suppress_script
      #    dirty
      #  end
      end
      nil
    end
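The Basic Filtering example quoted above can be reproduced with a small string-based sketch. It assumes tags of the simple shape language[-script][-rest] and uses a hypothetical two-entry `SUPPRESS_SCRIPT` table in place of the registry; `suppress_script` and `basic_filter_match?` here are illustrative free functions, not the library's methods.

```ruby
# Hypothetical excerpt of 'Suppress-Script' data, keyed by primary language.
SUPPRESS_SCRIPT = { "en" => "Latn", "ru" => "Cyrl" }.freeze

# Drops the script subtag when it equals the language's Suppress-Script.
# Assumes a simple language[-script][-rest] tag; extlangs are not handled.
def suppress_script(tag)
  language, script, *rest = tag.split("-")
  # A script subtag is exactly four letters; anything else is left alone.
  return tag unless script && script.match?(/\A[A-Za-z]{4}\z/)
  suppressed = SUPPRESS_SCRIPT[language.downcase]
  return tag unless suppressed && suppressed.casecmp?(script)
  [language, *rest].join("-")
end

# Basic Filtering (RFC 4647): the range matches when it equals the tag or
# is a prefix of it followed by "-", compared case-insensitively.
def basic_filter_match?(range, tag)
  r = range.downcase
  t = tag.downcase
  t == r || t.start_with?("#{r}-")
end

p basic_filter_match?("en-US", "en-Latn-US")                   # => false
p suppress_script("en-Latn-US")                                # => "en-US"
p basic_filter_match?("en-US", suppress_script("en-Latn-US"))  # => true
```

Tags whose language has no entry in the table, such as "sr-Latn-RS" here, pass through unchanged, which mirrors the case where the script does add distinguishing information.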
#to_extlang_form ⇒ Object
    # File 'lib/lang/tag/canonicalization.rb', line 315
    def to_extlang_form
      duplicated = self.dup
      duplicated.to_extlang_form!
      duplicated
    end
#to_extlang_form! ⇒ Object
RFC 5646, Section 4.5: For example, “hak-CN” (Hakka, China) has the primary language subtag ‘hak’, which in turn has an ‘extlang’ record with a ‘Prefix’ ‘zh’ (Chinese). The extlang form is “zh-hak-CN” (Chinese, Hakka, China).
    # File 'lib/lang/tag/canonicalization.rb', line 305
    def to_extlang_form!
      canonicalize!
      subtag = Subtags::Extlang(@language)
      @primary = subtag.prefix
      @extlang = @language
      @language = "#{@primary}#{HYPHEN}#{@extlang}"
      dirty
      nil
    end
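The “hak-CN” to “zh-hak-CN” conversion described above can be sketched on plain strings. This is an illustration under stated assumptions: `EXTLANG_PREFIX` is a hypothetical two-entry excerpt of extlang records, and `to_extlang_form` here is a free function rather than the library's method. Returning the tag unchanged when no record exists is a choice made for the sketch only.

```ruby
# Hypothetical excerpt of extlang records: extlang subtag => its 'Prefix'.
EXTLANG_PREFIX = { "hak" => "zh", "yue" => "zh" }.freeze

# Prepends the extlang 'Prefix' to a tag whose primary language subtag has
# an extlang record; other tags are returned unchanged.
def to_extlang_form(tag)
  language, *rest = tag.split("-")
  prefix = EXTLANG_PREFIX[language.downcase]
  return tag unless prefix
  [prefix, language, *rest].join("-")
end

p to_extlang_form("hak-CN")  # => "zh-hak-CN"
p to_extlang_form("en-US")   # => "en-US"
```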