Class: Langtag

Inherits: String

Inheritance chain: Object, String, Langtag

Includes: Composite

Defined in: lib/langtag.rb

Overview

The Langtag class implements BCP 47 (currently RFC 4646) IETF language tags. It decomposes language tags into their components and checks them for well-formedness.

Accessor methods

Getting: language, script, region, variants, extensions, private. Setting: language=, script=, region=, variants=, extensions=, private=. The variants and extensions accessors get and set Arrays; the other accessors get and set Strings. Because Ruby expands `a.b += c` into a call to the `b=` assignment method, variants and extensions can be manipulated with, e.g.,

 myLangtag.extensions += ['e-Extension']

which adds ‘e-Extension’ as an extension to whatever extensions myLangtag already has. Similarly,

 myLangtag.extensions -= ['e-Extension']

removes the extension again.
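The following minimal sketch illustrates why the `+=` and `-=` forms go through the setter while in-place mutation does not. `TagParts` and its `setter_calls` counter are hypothetical names invented for this illustration; they are not part of Langtag.

```ruby
# Hypothetical stand-in for Langtag's accessors; not the library's code.
class TagParts
  attr_reader :extensions, :setter_calls

  def initialize
    @extensions = []
    @setter_calls = 0
  end

  # A setter like this is where a class could recompose its tag string.
  def extensions=(list)
    @setter_calls += 1
    @extensions = list
  end
end

parts = TagParts.new
parts.extensions += ['e-extension']   # same as: parts.extensions = parts.extensions + [...]
parts.extensions -= ['e-extension']   # the setter runs a second time
parts.extensions << 'a-foo'           # mutates the array in place; the setter is not called
```

After these three statements the setter has run twice, which suggests preferring `+=` and `-=` over `<<` whenever the object needs to notice the change.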

Constant Summary

Irregular = Array of irregular language tags

['en-gb-oed',
'i-ami', 'i-bnn', 'i-default', 'i-enochian', 'i-hak',
'i-klingon', 'i-lux', 'i-mingo', 'i-navajo', 'i-pwn',
'i-tao', 'i-tay', 'i-tsu', 'sgn-be-fr',
'sgn-be-nl', 'sgn-ch-de']

Grandfathered = Array of grandfathered language tags

Irregular + ['art-lojban', 'cel-gaulish',
'no-bok', 'no-nyn', 'zh-cmn', 'zh-cmn-hans', 'zh-cmn-hant',
'zh-gan', 'zh-guoyu', 'zh-hakka', 'zh-min', 'zh-min-nan',
'zh-wuu', 'zh-xiang', 'zh-yue']

Instance Method Summary

#compose ⇒ Object

composes the langtag from its parts, joining them with ‘-’. The parts are flattened first, because @variants and @extensions are arrays, and then compacted to remove nil values (mainly internal use).
#decompose ⇒ Object

decompose a language tag into parts (mainly internal use).
#grandfathered? ⇒ Boolean

returns true if language tag is grandfathered, false otherwise.
#initialize(s) ⇒ Langtag constructor

A new instance of Langtag.
#irregular? ⇒ Boolean

returns true if language tag is irregular, false otherwise.
#nicecase ⇒ Object

non-destructive variant of nicecase!: returns a nicecased copy.
#nicecase! ⇒ Object

changes case to look ‘nice’ (regions are UPPER-CASE, scripts are Title-Case, everything else is lower case).
#wellformed? ⇒ Boolean

returns true if language tag is well-formed, false otherwise.

Constructor Details

#initialize(s) ⇒ `Langtag`

Returns a new instance of Langtag.

# File 'lib/langtag.rb', line 44

def initialize (s)
  super(s)
  decompose
end

Instance Method Details

#compose ⇒ `Object`

composes the langtag from its parts, joining them with ‘-’. The parts are flattened first, because @variants and @extensions are arrays, and then compacted to remove nil values (mainly internal use)

# File 'lib/langtag.rb', line 102

def compose
  replace([@language, @script, @region, @variants,
           @extensions, @private].flatten.compact.join('-'))
end
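As a small illustration of the flatten, compact, join chain used by compose, the sketch below runs the same steps on plain local variables. The values are made up for the example; in Langtag they would come from the instance variables.

```ruby
# Illustrative values only; not taken from a real Langtag instance.
language    = 'de'
script      = nil        # no script subtag
region      = 'DE'
variants    = ['1996']   # arrays are spliced in by flatten
extensions  = []         # an empty array contributes nothing
private_use = nil        # nil entries are dropped by compact

parts = [language, script, region, variants, extensions, private_use]
tag = parts.flatten.compact.join('-')
puts tag
```

Here `tag` ends up as `de-DE-1996`: the nested array is flattened into the list and the nil entries disappear before joining.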

#decompose ⇒ `Object`

decompose a language tag into parts (mainly internal use)

# File 'lib/langtag.rb', line 108

def decompose
  # check if we really need to decompose again
  if @saved == self.to_str
    return
  end
  # initialize everything
  s = @saved = self.to_str # save for check next time around
  @wellformed = true   # assume well-formed
  @language = @script = @region = @private = nil
  @variants = []     # two separate arrays, so that changing one
  @extensions = []   # in place cannot affect the other

  # deal with irregular and completely private langtags
  if irregular? || s =~ /^x-/i
    @language = s
    return
  end
  # check well-formedness with a single regular expression,
  # except for irregulars (checked above) and multiple
  # occurrences of the same extension (checked below)
  # notice /i modifier for case insensitive matching
  if not(s =~ /^([a-z]{2,3}                       # shortest ISO 639 language
                  (-[a-z]{3}){0,3}                  # with optional extended language subtags
                 |[a-z]{4,8})                       # or reserved or registered language
                (-[a-z]{4})?                      # optional script
                (-([a-z]{2}|\d{3}))?              # optional region
                (-([a-z0-9]{5,8}|\d[a-z0-9]{3}))* # optional variants
                (-[a-wyz0-9](-[a-z0-9]{2,8})+)*   # optional extensions
                (-x(-[a-z0-9]{1,8})+)?            # optional private use part
                $/ix)
    @wellformed = false
  end
  # extract language
  if s =~ /^(([a-z]{2,3}(-[a-z]{3}){0,3}|[a-z]{4,8}))(-|$)/i
    @language, s = $1, $'
  end
  # extract private use tail
  if s =~ /(^|-)(x-.*)$/i
    s, @private = $`, $2
  end
  # extract extensions (last one first) and check for duplicates
  @extensions = []
  while s =~ /(^|-)([a-wyz0-9](-[a-z0-9]{2,8})+)$/i
    s = $`
    @extensions.unshift($2) # prepend to keep the original order
  end
  singletons = @extensions.collect { |ext| ext[0..1].downcase }
  if singletons.uniq.length != singletons.length
    @wellformed = false
  end
  if s =~ /(^|-)([a-z]{4})(-|$)/i    # extract script
    @script = $2
  end
  if s =~ /(^|-)([a-z]{2}|\d{3})(-|$)/i    # extract region
    @region = $2
  end
  # extract variants
  @variants = s.scan(/(^|-)([a-z0-9]{5,8}|\d[a-z0-9]{3})(?=(-|$))/i).
                collect { |match| match[1] }
end
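The extension step of decompose can be tried in isolation. The sketch below is a hedged, standalone version of that step: `split_extensions` is a hypothetical helper, not a Langtag method, and it assumes the language subtag and any private-use tail have already been removed from the input.

```ruby
# Hypothetical helper, not part of Langtag: splits trailing extensions off
# a tag remainder and reports whether every singleton occurs only once.
def split_extensions(rest)
  extensions = []
  # Each pass removes the last "singleton-subtag(-subtag)*" group.
  while rest =~ /(^|-)([a-wyz0-9](-[a-z0-9]{2,8})+)$/i
    rest = $`                  # text before the match
    extensions.unshift($2)     # prepend to keep left-to-right order
  end
  singletons = extensions.map { |ext| ext[0].downcase }
  [rest, extensions, singletons.uniq.length == singletons.length]
end

rest, exts, unique = split_extensions('Latn-DE-a-foo-bar-b-baz')
_, dup_exts, dup_unique = split_extensions('a-foo-a-bar')
```

In the first call `rest` is `Latn-DE` and `exts` is `['a-foo-bar', 'b-baz']`; in the second call the repeated singleton `a` makes `dup_unique` false, which corresponds to the tag being not well-formed.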

#grandfathered? ⇒ `Boolean`

returns true if language tag is grandfathered, false otherwise

Returns:

(Boolean)




# File 'lib/langtag.rb', line 73

def grandfathered? ()
  Grandfathered.include? self.to_str.downcase
end

#irregular? ⇒ `Boolean`

returns true if language tag is irregular, false otherwise

Returns:

(Boolean)




# File 'lib/langtag.rb', line 78

def irregular? ()
  Irregular.include? self.to_str.downcase
end
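Both grandfathered? and irregular? compare the downcased tag against a fixed list, so the check is case-insensitive. A minimal sketch of that lookup, using a three-entry excerpt of the Irregular list and the hypothetical names `IRREGULAR_SAMPLE` and `irregular_tag?`:

```ruby
# Excerpt of the Irregular list shown earlier, for illustration only.
IRREGULAR_SAMPLE = ['en-gb-oed', 'i-klingon', 'sgn-be-fr'].freeze

# Hypothetical standalone version of the lookup; not a Langtag method.
def irregular_tag?(tag)
  IRREGULAR_SAMPLE.include?(tag.downcase)
end

irregular_tag?('I-Klingon')   # matches regardless of case
irregular_tag?('en-GB')       # not in the list
```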

#nicecase ⇒ `Object`

non-destructive variant of nicecase!: returns a nicecased copy




# File 'lib/langtag.rb', line 95

def nicecase
  # work on a copy, so that the receiver stays unchanged
  Langtag.new(self).nicecase!
end

#nicecase! ⇒ `Object`

changes case to look ‘nice’ (regions are UPPER-CASE, scripts are Title-Case, everything else is lower case)

# File 'lib/langtag.rb', line 84

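def nicecase!
  # language, script, region and private are nil when absent,
  # so guard each call to avoid a NoMethodError
  @language.downcase! if @language
  @script.capitalize! if @script
  @region.upcase! if @region
  @variants.each { |v| v.downcase! }
  @extensions.each { |e| e.downcase! }
  @private.downcase! if @private
  compose
end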
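The casing convention can be sketched without the class. `nice_tag` below is a hypothetical helper (not Langtag's API) that takes already separated parts as keyword arguments and skips any that are nil.

```ruby
# Hypothetical helper, not part of Langtag: builds a nicely cased tag
# from separate parts; nil parts are skipped.
def nice_tag(language:, script: nil, region: nil, variants: [])
  [language&.downcase,          # language: lower case
   script&.capitalize,          # script: Title-Case
   region&.upcase,              # region: UPPER-CASE
   variants.map(&:downcase)].flatten.compact.join('-')
end

full  = nice_tag(language: 'ZH', script: 'hant', region: 'tw')
short = nice_tag(language: 'DE', variants: ['1996'])
```

With these inputs `full` is `zh-Hant-TW` and `short` is `de-1996`; the safe-navigation operator `&.` is what lets the missing script and region pass through as nil.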

#wellformed? ⇒ `Boolean`

returns true if language tag is well-formed, false otherwise

Returns:

(Boolean)

# File 'lib/langtag.rb', line 67

def wellformed? ()
  decompose
  @wellformed
end
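For quick experiments, the single-regular-expression part of the check can be run on plain strings. The sketch below copies the pattern from the decompose listing; the names `WELLFORMED_PATTERN` and `pattern_wellformed?` are hypothetical. It deliberately leaves out the special handling of irregular tags, purely private tags such as those starting with `x-`, and duplicate extensions, so it is not a full replacement for wellformed?.

```ruby
# Pattern copied from the decompose listing; the constant and method
# names are hypothetical and not part of Langtag.
WELLFORMED_PATTERN = /\A([a-z]{2,3}(-[a-z]{3}){0,3}|[a-z]{4,8})  # language
                      (-[a-z]{4})?                               # script
                      (-([a-z]{2}|\d{3}))?                       # region
                      (-([a-z0-9]{5,8}|\d[a-z0-9]{3}))*          # variants
                      (-[a-wyz0-9](-[a-z0-9]{2,8})+)*            # extensions
                      (-x(-[a-z0-9]{1,8})+)?                     # private use
                      \z/ix

def pattern_wellformed?(tag)
  WELLFORMED_PATTERN.match?(tag)
end

pattern_wellformed?('zh-Hant-TW')      # language, script, region
pattern_wellformed?('en-a-foo-x-bar')  # extension plus private use part
pattern_wellformed?('en--US')          # empty subtag, rejected
```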
