Class: Langtag

Inherits:
String
  • Object
Includes:
Composite
Defined in:
lib/langtag.rb

Overview

Langtag class, implementing BCP 47 (currently RFC 4646) IETF language tags. Provides decomposition of language tags into components and a well-formedness check.

Accessor methods

Getting: language, script, region, variants, extensions, private. Setting: language=, script=, region=, variants=, extensions=, private=. The variants and extensions accessors get and set Arrays; the other accessors get and set Strings. Because of the way Ruby assignment methods are implemented, manipulating variants and extensions with e.g.

 myLangtag.extensions += ['e-Extension']

(adding ‘e-Extension’ as an extension to whatever extensions myLangtag already has) is possible. Similarly,

 myLangtag.extensions -= ['e-Extension']

will remove the extension again.
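
The `+=` form works because Ruby expands `obj.attr += value` into `obj.attr = obj.attr + value`, so the setter always receives a complete new Array. The following is a minimal sketch of that mechanism only; `TagParts` and `setter_calls` are hypothetical names for illustration and are not part of Langtag.

```ruby
# Hypothetical stand-in for Langtag, only to show how `+=` reaches the setter.
class TagParts
  attr_reader :extensions, :setter_calls

  def initialize
    @extensions = []
    @setter_calls = 0
  end

  # Called by `parts.extensions = [...]` and also by `+=` and `-=`.
  def extensions=(list)
    @setter_calls += 1
    @extensions = list
  end
end

parts = TagParts.new
parts.extensions += ['e-extension']  # parts.extensions = parts.extensions + [...]
parts.extensions -= ['e-extension']  # the setter receives the reduced Array
```

In a class like Langtag, the setter is the natural place to recompose the tag string, which is presumably why whole-Array assignment is the supported way to change these parts.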

Constant Summary

Irregular =

Array of irregular language tags

['en-gb-oed',
'i-ami', 'i-bnn', 'i-default', 'i-enochian', 'i-hak',
'i-klingon', 'i-lux', 'i-mingo', 'i-navajo', 'i-pwn',
'i-tao', 'i-tay', 'i-tsu', 'sgn-be-fr',
'sgn-be-nl', 'sgn-ch-de']
Grandfathered =

Array of grandfathered language tags

Irregular + ['art-lojban', 'cel-gaulish',
'no-bok', 'no-nyn', 'zh-cmn', 'zh-cmn-hans', 'zh-cmn-hant',
'zh-gan', 'zh-guoyu', 'zh-hakka', 'zh-min', 'zh-min-nan',
'zh-wuu', 'zh-xiang', 'zh-yue']

Instance Method Summary

Constructor Details

#initialize(s) ⇒ Langtag

Returns a new instance of Langtag.



# File 'lib/langtag.rb', line 44

def initialize(s)
  super(s)
  decompose
end

Instance Method Details

#compose ⇒ Object

compose the langtag from its parts, joining them with ‘-’: flatten first to deal with @variants and @extensions, which are Arrays, then compact to remove nil values (mainly internal use)



# File 'lib/langtag.rb', line 102

def compose
  replace([@language, @script, @region, @variants,
           @extensions, @private].flatten.compact.join('-'))
end
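
The flatten, compact, join chain can be tried on its own. This is a small sketch with made-up sample parts (the local variable names are assumptions, not Langtag internals); `nil` stands for a missing script and the empty Array for a tag without extensions.

```ruby
# Hypothetical sample parts; nil marks a component the tag does not have.
language    = 'de'
script      = nil
region      = 'CH'
variants    = ['1901']
extensions  = []
private_use = 'x-phonebk'

parts = [language, script, region, variants, extensions, private_use]

# flatten merges the nested Arrays, compact drops nil, join adds the hyphens.
tag = parts.flatten.compact.join('-')
puts tag  # prints "de-CH-1901-x-phonebk"
```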

#decompose ⇒ Object

decompose a language tag into parts (mainly internal use)



# File 'lib/langtag.rb', line 108

def decompose
  # check if we really need to decompose again
  return if @saved == self.to_str
  # initialize everything
  s = @saved = self.to_str # save for check next time around
  @wellformed = true   # assume well-formed
  @language = @script = @region = @private = nil
  @variants = []       # two separate Arrays, so that changing one
  @extensions = []     # does not change the other

  # deal with irregular and completely private langtags
  if irregular? || s =~ /^x-/i
    @language = s
    return
  end
  # check well-formedness with a single regular expression,
  # except for irregulars (checked above) and multiple
  # occurrences of the same extension (checked below)
  # notice /i modifier for case insensitive matching
  if not(s =~ /^([a-z]{2,3}                       # shortest ISO 639 language
                  (-[a-z]{3}){0,3}                  # with optional extlang subtags
                 |[a-z]{4,8})                       # or reserved or registered
                (-[a-z]{4})?                      # optional script
                (-([a-z]{2}|\d{3}))?              # optional region
                (-([a-z0-9]{5,8}|\d[a-z0-9]{3}))* # optional variants
                (-[a-wyz0-9](-[a-z0-9]{2,8})+)*   # optional extensions
                (-x(-[a-z0-9]{1,8})+)?            # optional private use part
                $/ix)
    @wellformed = false
  end
  # extract language
  if s =~ /^(([a-z]{2,3}(-[a-z]{3}){0,3}|[a-z]{4,8}))(-|$)/i
    @language, s = $1, $'
  end
  # extract private use tail
  if s =~ /(^|-)(x-.*)$/i
    s, @private = $`, $2
  end
  # extract extensions from the end of the tag;
  # unshift keeps them in their original order
  while s =~ /(^|-)([a-wyz0-9](-[a-z0-9]{2,8})+)$/i
    @extensions.unshift($2)
    s = $`
  end
  # check for duplicates: each singleton may occur only once
  singletons = @extensions.collect { |ext| ext[0, 1].downcase }
  if singletons.uniq.length != singletons.length
    @wellformed = false
  end
  if s =~ /(^|-)([a-z]{4})(-|$)/i    # extract script
    @script = $2
  end
  if s =~ /(^|-)([a-z]{2}|\d{3})(-|$)/i    # extract region
    @region = $2
  end
  # extract variants
  @variants = s.scan(/(^|-)([a-z0-9]{5,8}|\d[a-z0-9]{3})(?=(-|$))/i).
                collect { |match| match[1] }
end
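
The extension step is the least obvious part of the decomposition, so here is a standalone sketch of it. It uses the same extension pattern as the source, but `EXTENSION_TAIL`, `peel_extensions` and `duplicate_singletons?` are hypothetical helper names introduced for illustration. Extensions are peeled off the end of the tag one at a time, and a tag is flagged when the same singleton letter introduces more than one extension.

```ruby
# Hypothetical helpers, not part of Langtag.
# Same pattern as in decompose: a singleton (not x) plus 2-8 character subtags.
EXTENSION_TAIL = /(^|-)([a-wyz0-9](-[a-z0-9]{2,8})+)$/i

# Returns [remaining_tag, extensions], with extensions in original order.
def peel_extensions(tag)
  rest = tag.dup
  extensions = []
  while (m = EXTENSION_TAIL.match(rest))
    extensions.unshift(m[2])  # the last extension is found first
    rest = m.pre_match        # continue with everything before it
  end
  [rest, extensions]
end

# True if two extensions start with the same singleton, ignoring case.
def duplicate_singletons?(extensions)
  singletons = extensions.map { |ext| ext[0, 1].downcase }
  singletons.uniq.length != singletons.length
end

rest, exts = peel_extensions('en-Latn-US-a-foo-b-bar')
puts rest          # prints "en-Latn-US"
puts exts.inspect  # prints ["a-foo", "b-bar"]
```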

#grandfathered? ⇒ Boolean

returns true if language tag is grandfathered, false otherwise

Returns:

  • (Boolean)


# File 'lib/langtag.rb', line 73

def grandfathered?
  Grandfathered.include? self.to_str.downcase
end

#irregular? ⇒ Boolean

returns true if language tag is irregular, false otherwise

Returns:

  • (Boolean)


# File 'lib/langtag.rb', line 78

def irregular?
  Irregular.include? self.to_str.downcase
end
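
Both predicates lowercase the tag before the lookup, so matching is case-insensitive, and every irregular tag is also grandfathered because Grandfathered is built from Irregular. A brief sketch of that relationship, using shortened sample lists and a hypothetical `listed?` helper (the full arrays are given in the constants above):

```ruby
# Shortened sample lists for illustration only.
SAMPLE_IRREGULAR     = ['en-gb-oed', 'i-klingon', 'sgn-be-fr']
SAMPLE_GRANDFATHERED = SAMPLE_IRREGULAR + ['art-lojban', 'zh-min-nan']

# Hypothetical helper: case-insensitive membership test.
def listed?(list, tag)
  list.include?(tag.downcase)
end

puts listed?(SAMPLE_IRREGULAR, 'I-Klingon')       # prints "true"
puts listed?(SAMPLE_GRANDFATHERED, 'i-klingon')   # prints "true"
puts listed?(SAMPLE_IRREGULAR, 'art-lojban')      # prints "false"
```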

#nicecase ⇒ Object

non-destructive variant of nicecase!: returns a nicecased copy



# File 'lib/langtag.rb', line 95

def nicecase
  # work on a copy, so that the receiver stays unchanged
  Langtag.new(self).nicecase!
end

#nicecase! ⇒ Object

changes case to look ‘nice’ (regions are UPPER-CASE, scripts are Title-Case, everything else is lower case)



# File 'lib/langtag.rb', line 84

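def nicecase!
  # any component may be nil (e.g. no script), so guard each one
  @language.downcase! if @language
  @script.capitalize! if @script
  @region.upcase! if @region
  @variants.each { |v| v.downcase! }
  @extensions.each { |e| e.downcase! }
  @private.downcase! if @private
  compose
end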
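
The casing convention can be illustrated without the class. The sketch below is a non-destructive variant written as a free-standing function; `nicecase_parts` and its parameter names are hypothetical, and the parts are assumed to be already separated, with `nil` for missing components.

```ruby
# Hypothetical helper, not part of Langtag: nice-cases separated parts
# and joins them. Missing String components are passed as nil.
def nicecase_parts(language, script, region, variants, extensions, private_use)
  [
    language && language.downcase,        # language: lower case
    script && script.capitalize,          # script: Title-Case
    region && region.upcase,              # region: UPPER-CASE
    variants.map(&:downcase),
    extensions.map(&:downcase),
    private_use && private_use.downcase
  ].flatten.compact.join('-')
end

puts nicecase_parts('ZH', 'hANT', 'tw', [], [], nil)  # prints "zh-Hant-TW"
```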

#wellformed? ⇒ Boolean

returns true if language tag is well-formed, false otherwise

Returns:

  • (Boolean)


# File 'lib/langtag.rb', line 67

def wellformed?
  decompose
  @wellformed
end
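
For experimenting with the well-formedness rules outside the class, the pattern from decompose can be lifted into a constant. This is a sketch under stated assumptions: `WELLFORMED_TAG` and `wellformed_tag?` are hypothetical names, the pattern is anchored with `\A` and `\z` rather than the `^` and `$` used in the source, and neither the irregular tags nor the duplicate-extension check is covered here.

```ruby
# Hypothetical constant, copied from the pattern used in decompose.
WELLFORMED_TAG = /\A([a-z]{2,3}(-[a-z]{3}){0,3}      # language, optional extlang
                   |[a-z]{4,8})                      # or reserved or registered
                  (-[a-z]{4})?                       # optional script
                  (-([a-z]{2}|\d{3}))?               # optional region
                  (-([a-z0-9]{5,8}|\d[a-z0-9]{3}))*  # optional variants
                  (-[a-wyz0-9](-[a-z0-9]{2,8})+)*    # optional extensions
                  (-x(-[a-z0-9]{1,8})+)?             # optional private use part
                  \z/ix

# Hypothetical helper: true if the tag matches the pattern.
def wellformed_tag?(tag)
  WELLFORMED_TAG.match?(tag)
end

puts wellformed_tag?('zh-Hant-TW')  # prints "true"
puts wellformed_tag?('en--US')      # prints "false"
```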