Class: Langtag
Overview
Langtag class, implementing BCP 47 IETF language tags (RFC 4646 at the time of writing; RFC 5646 has since obsoleted RFC 4646). Provides decomposition of language tags into their components, and a well-formedness check.
Accessor methods
Getting: language, script, region, variants, extensions, private. Setting: language=, script=, region=, variants=, extensions=, private=. The variants and extensions accessors get and set Arrays; the other accessors get and set Strings. Because Ruby expands an abbreviated assignment such as a.b += c into a.b = a.b + c (calling the getter, then the setter), variants and extensions can be manipulated with, e.g.,
    myLangtag.extensions += ['e-example']
which adds ‘e-example’ as an extension to whatever extensions myLangtag already has. Similarly,
    myLangtag.extensions -= ['e-example']
removes the extension again. (Extension subtags are limited to 2 to 8 characters, so a value such as ‘e-Extension’ would not be well-formed.)
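A minimal sketch of the mechanism, assuming nothing about Langtag's internals: the hypothetical class TagParts below has an ordinary reader and a writer that counts its calls, which makes it visible that += and -= each go through the reader and then the writer.

```ruby
# Hypothetical stand-in for an Array-valued accessor such as
# Langtag#extensions; the real class may store its parts differently.
class TagParts
  attr_reader :extensions, :setter_calls

  def initialize
    @extensions = []
    @setter_calls = 0
  end

  # Custom writer, so we can observe that += and -= call it.
  def extensions=(list)
    @setter_calls += 1
    @extensions = list
  end
end

parts = TagParts.new
# Expanded by Ruby to: parts.extensions = parts.extensions + ['e-example']
parts.extensions += ['e-example']
p parts.extensions    # => ["e-example"]
# Expanded by Ruby to: parts.extensions = parts.extensions - ['e-example']
parts.extensions -= ['e-example']
p parts.extensions    # => []
p parts.setter_calls  # => 2
```

Because the writer receives a brand-new Array each time, a setter that recomposes the tag string would see every such change.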
Constant Summary
- Irregular = ['en-gb-oed', 'i-ami', 'i-bnn', 'i-default', 'i-enochian', 'i-hak', 'i-klingon', 'i-lux', 'i-mingo', 'i-navajo', 'i-pwn', 'i-tao', 'i-tay', 'i-tsu', 'sgn-be-fr', 'sgn-be-nl', 'sgn-ch-de']
  Array of irregular language tags.
- Grandfathered = Irregular + ['art-lojban', 'cel-gaulish', 'no-bok', 'no-nyn', 'zh-cmn', 'zh-cmn-hans', 'zh-cmn-hant', 'zh-gan', 'zh-guoyu', 'zh-hakka', 'zh-min', 'zh-min-nan', 'zh-wuu', 'zh-xiang', 'zh-yue']
  Array of grandfathered language tags.
Instance Method Summary
- #compose ⇒ Object
  compose the langtag from its parts, joining them with ‘-’; flatten first to deal with @variants/@extensions, which are arrays, then compact to remove nil values (mainly internal use).
- #decompose ⇒ Object
  decompose a language tag into its parts (mainly internal use).
- #grandfathered? ⇒ Boolean
  returns true if the language tag is grandfathered, false otherwise.
- #initialize(s) ⇒ Langtag
  constructor: returns a new instance of Langtag.
- #irregular? ⇒ Boolean
  returns true if the language tag is irregular, false otherwise.
- #nicecase ⇒ Object
  non-destructive variant of nicecase!: returns a nicecased copy.
- #nicecase! ⇒ Object
  changes case to look ‘nice’ (regions are UPPER-CASE, scripts are Title-Case, everything else is lower case).
- #wellformed? ⇒ Boolean
  returns true if the language tag is well-formed, false otherwise.
Constructor Details
#initialize(s) ⇒ Langtag
Returns a new instance of Langtag.
    # File 'lib/langtag.rb', line 44
    def initialize(s)
      super(s)
      decompose
    end
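The constructor parses once, and decompose skips re-parsing while the string is unchanged by comparing it with the copy saved in @saved. A minimal sketch of that pattern, using a hypothetical String subclass CachedTag that merely splits on ‘-’ instead of performing Langtag's real decomposition:

```ruby
# Hypothetical String subclass illustrating parse-on-construction plus a
# cache keyed on the string's contents (the idea behind @saved).
# It only splits on '-', standing in for the real decomposition.
class CachedTag < String
  attr_reader :parse_count

  def initialize(s)
    super(s)
    @parse_count = 0
    parse
  end

  # The '-'-separated subtags; re-parses only if the text has changed.
  def subtags
    parse
    @subtags
  end

  private

  def parse
    # For a String subclass, to_str returns a new plain String, so the
    # saved copy is not affected by later in-place edits of the tag.
    current = to_str
    return if @saved == current
    @saved = current
    @parse_count += 1
    @subtags = current.split('-')
  end
end

tag = CachedTag.new('en-US')
p tag.subtags      # => ["en", "US"]
p tag.parse_count  # => 1: subtags reused the constructor's parse
tag << '-x-foo'    # String#<< changes the contents in place
p tag.subtags      # => ["en", "US", "x", "foo"]
p tag.parse_count  # => 2
```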
Instance Method Details
#compose ⇒ Object
compose the langtag from its parts, joining them with ‘-’; flatten first to deal with @variants/@extensions, which are arrays, then compact to remove nil values (mainly internal use)
    # File 'lib/langtag.rb', line 102
    def compose
      replace([@language, @script, @region, @variants, @extensions, @private].flatten.compact.join('-'))
    end
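A standalone illustration of the flatten/compact/join chain, with hypothetical part values laid out the way decompose stores them (nil for absent parts, Arrays for variants and extensions):

```ruby
# Hypothetical parts of a tag: no script, one variant, no extensions.
language    = 'sl'
script      = nil
region      = 'IT'
variants    = ['nedis']
extensions  = []
private_use = 'x-foo'

parts = [language, script, region, variants, extensions, private_use]
# flatten splices the Array parts into the list (an empty Array vanishes),
# compact drops the nil components, join puts '-' between the rest.
tag = parts.flatten.compact.join('-')
p tag  # => "sl-IT-nedis-x-foo"
```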
#decompose ⇒ Object
decompose a language tag into parts (mainly internal use)
    # File 'lib/langtag.rb', line 108
    def decompose
      # check if we really need to decompose again
      return if @saved == self.to_str

      # initialize everything
      s = @saved = self.to_str # save for check next time around
      @wellformed = true # assume well-formed
      @language = @script = @region = @private = nil
      @variants = []   # two separate Arrays, so that changing one
      @extensions = [] # does not also change the other

      # deal with irregular and completely private langtags
      if irregular? || s =~ /^x-/i
        @language = s.dup # a copy, so that nicecase! cannot alter @saved
        return
      end

      # check well-formedness with a single regular expression,
      # except for irregulars (checked above) and multiple
      # occurrences of the same extension (checked below);
      # note the /i modifier for case-insensitive matching
      unless s =~ /^([a-z]{2,3}                        # shortest ISO 639 language
                     (-[a-z]{3}){0,3}                  # with optional extended language subtags
                   |[a-z]{4,8})                        # or reserved or registered language
                   (-[a-z]{4})?                        # optional script
                   (-([a-z]{2}|\d{3}))?                # optional region
                   (-([a-z0-9]{5,8}|\d[a-z0-9]{3}))*   # optional variants
                   (-[a-wyz0-9](-[a-z0-9]{2,8})+)*     # optional extensions
                   (-x(-[a-z0-9]{1,8})+)?              # optional private use part
                   $/ix
        @wellformed = false
      end

      # extract language
      if s =~ /^(([a-z]{2,3}(-[a-z]{3}){0,3}|[a-z]{4,8}))(-|$)/i
        @language, s = $1, $'
      end

      # extract private use tail
      if s =~ /(^|-)(x-.*)$/i
        s, @private = $`, $2
      end

      # extract extensions, peeling them off the end one at a time
      while s =~ /(^|-)([a-wyz0-9](-[a-z0-9]{2,8})+)$/i
        s, extension = $`, $2
        @extensions.unshift(extension) # unshift keeps the original order
      end
      # the same singleton must not occur twice
      singletons = @extensions.collect { |ext| ext[0].downcase }
      @wellformed = false if singletons.uniq.size != singletons.size

      # extract script
      @script = $2 if s =~ /(^|-)([a-z]{4})(-|$)/i
      # extract region
      @region = $2 if s =~ /(^|-)([a-z]{2}|\d{3})(-|$)/i
      # extract variants
      @variants = s.scan(/(^|-)([a-z0-9]{5,8}|\d[a-z0-9]{3})(?=(-|$))/i).
        collect { |match| match[1] }
    end
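The extension step is the least obvious part of decompose. A standalone sketch, assuming the language and private-use parts have already been removed from the string; extract_extensions and EXTENSION_AT_END are hypothetical names, and the method returns the remaining string, the extensions in their original order, and whether any singleton occurs twice:

```ruby
# Matches one extension (singleton plus subtags) at the end of the string.
EXTENSION_AT_END = /(^|-)([a-wyz0-9](-[a-z0-9]{2,8})+)$/i

def extract_extensions(rest)
  extensions = []
  while (m = EXTENSION_AT_END.match(rest))
    extensions.unshift(m[2])  # prepend, so the final order is left to right
    rest = m.pre_match        # continue with everything before this extension
  end
  # compare singletons case-insensitively to detect repeats
  singletons = extensions.map { |ext| ext[0].downcase }
  duplicates = singletons.uniq.size != singletons.size
  [rest, extensions, duplicates]
end

p extract_extensions('DE-a-bbb-b-ccc')  # => ["DE", ["a-bbb", "b-ccc"], false]
p extract_extensions('DE-a-bbb-A-ccc')  # => ["DE", ["a-bbb", "A-ccc"], true]
```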
#grandfathered? ⇒ Boolean
returns true if language tag is grandfathered, false otherwise
    # File 'lib/langtag.rb', line 73
    def grandfathered?
      Grandfathered.include? self.to_str.downcase
    end
#irregular? ⇒ Boolean
returns true if language tag is irregular, false otherwise
    # File 'lib/langtag.rb', line 78
    def irregular?
      Irregular.include? self.to_str.downcase
    end
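Both predicates are case-insensitive lookups in lower-case lists. A standalone sketch with abbreviated, hypothetical stand-ins for the Irregular and Grandfathered constants (the real lists are given under Constant Summary):

```ruby
# Abbreviated stand-ins; the constants store lower-case tags only.
IRREGULAR     = ['en-gb-oed', 'i-klingon', 'sgn-ch-de'].freeze
GRANDFATHERED = (IRREGULAR + ['art-lojban', 'zh-min-nan']).freeze

# Downcase the input so that any capitalization is accepted.
def irregular?(tag)
  IRREGULAR.include?(tag.downcase)
end

def grandfathered?(tag)
  GRANDFATHERED.include?(tag.downcase)
end

p irregular?('en-GB-oed')       # => true
p grandfathered?('en-GB-oed')   # => true, every irregular tag is grandfathered
p irregular?('art-lojban')      # => false
p grandfathered?('art-lojban')  # => true
```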
#nicecase ⇒ Object
non-destructive variant of nicecase!: returns a nicecased copy
    # File 'lib/langtag.rb', line 95
    def nicecase
      Langtag.new(self).nicecase!
    end
#nicecase! ⇒ Object
changes case to look ‘nice’ (regions are UPPER-CASE, scripts are Title-Case, everything else is lower case)
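    # File 'lib/langtag.rb', line 84
    def nicecase!
      decompose # make sure the parts reflect the current string
      # script, region and private use are nil when absent
      @language.downcase! if @language
      @script.capitalize! if @script
      @region.upcase! if @region
      @variants.each { |v| v.downcase! }
      @extensions.each { |e| e.downcase! }
      @private.downcase! if @private
      compose
    end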
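A nil-safe sketch of the same case rules, written as a non-destructive function over a Hash of parts (a hypothetical representation; Langtag keeps its parts in instance variables). It uses the safe-navigation operator &., which needs Ruby 2.3 or later:

```ruby
# Returns a new Hash with nicecased parts; the input Hash is not modified.
def nicecase_parts(parts)
  {
    language:   parts[:language]&.downcase,
    script:     parts[:script]&.capitalize,   # Title-Case
    region:     parts[:region]&.upcase,       # UPPER-CASE
    variants:   parts[:variants].map(&:downcase),
    extensions: parts[:extensions].map(&:downcase),
    private:    parts[:private]&.downcase
  }
end

parts = { language: 'ZH', script: 'hANT', region: 'tw',
          variants: [], extensions: ['A-BBB'], private: 'X-Foo' }
nice = nicecase_parts(parts)
# Joining the values in order reproduces what compose would build.
p nice.values.flatten.compact.join('-')  # => "zh-Hant-TW-a-bbb-x-foo"
p parts[:script]                         # => "hANT" (input left unchanged)
```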
#wellformed? ⇒ Boolean
returns true if language tag is well-formed, false otherwise
    # File 'lib/langtag.rb', line 67
    def wellformed?
      decompose
      @wellformed
    end
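A standalone sketch of the syntactic check that decompose performs with a single regular expression; WELLFORMED and wellformed_syntax? are hypothetical names. It anchors with \A and \z instead of ^ and $ (an editorial choice, so that a trailing newline such as "en-US\n" is rejected). As in decompose, irregular tags, private-use-only tags such as x-whatever, and repeated extension singletons are not covered by this expression and need the separate handling shown in decompose.

```ruby
# Regexp#match? needs Ruby 2.4 or later.
# No '/' may appear in the comments: it would end the regex literal.
WELLFORMED = /\A([a-z]{2,3}                        # shortest ISO 639 language
                 (-[a-z]{3}){0,3}                  # optional extended language subtags
               |[a-z]{4,8})                        # or reserved or registered language
               (-[a-z]{4})?                        # optional script
               (-([a-z]{2}|\d{3}))?                # optional region
               (-([a-z0-9]{5,8}|\d[a-z0-9]{3}))*   # optional variants
               (-[a-wyz0-9](-[a-z0-9]{2,8})+)*     # optional extensions
               (-x(-[a-z0-9]{1,8})+)?              # optional private use part
               \z/ix

def wellformed_syntax?(tag)
  WELLFORMED.match?(tag)
end

p wellformed_syntax?('zh-Hant-TW')      # => true
p wellformed_syntax?('sl-rozaj-biske')  # => true
p wellformed_syntax?('en-')             # => false
p wellformed_syntax?("en-US\n")         # => false
```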