Class: ISO_639

Inherits:
Array
  • Object
Defined in:
lib/iso-639.rb

Constant Summary

ISO_639_2 =

The ISO 639-2 dataset, loaded as an array of entries. Each entry is an array with the following format:

  • [0]: an alpha-3 (bibliographic) code

  • [1]: an alpha-3 (terminologic) code (when given)

  • [2]: an alpha-2 code (when given)

  • [3]: an English name

  • [4]: a French name

Dataset Source:

  • www.loc.gov/standards/iso639-2/ascii_8bits.html

  • www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt

lambda do
  dataset = []
  File.open(
    File.join(File.dirname(__FILE__), 'data', 'ISO-639-2_utf-8.txt'),
    'r:bom|utf-8'
  ) do |file|
    CSV.new(file, **{ col_sep: '|' }).each do |row|
      dataset << self[*row.map { |v| v || '' }].freeze
    end
  end
  return dataset
end.call.freeze
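
The loading step can be tried on a small scale. The following is a minimal sketch, assuming a two-row excerpt in the same pipe-delimited layout; the `SAMPLE` constant and the `entries` variable are illustrative names and are not part of the library.

```ruby
require 'csv'

# Two sample rows in the pipe-delimited layout described above
# (an illustrative excerpt, not the full dataset file).
SAMPLE = <<~DATA
  eng||en|English|anglais
  fre|fra|fr|French|français
DATA

# Parse each row and replace missing (nil) fields with empty strings,
# mirroring the `v || ''` normalization in the loader.
entries = CSV.parse(SAMPLE, col_sep: '|').map do |row|
  row.map { |v| v || '' }.freeze
end

# Each entry destructures into its five fields.
entries.each do |bib, term, alpha2, english, french|
  puts "#{bib} #{term.inspect} #{alpha2} #{english} / #{french}"
end
```

Replacing `nil` with an empty string means callers can invoke string methods such as `empty?` or `downcase` on every field without a nil check.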
INVERTED_INDEX =

An inverted index generated from the ISO_639_2 data. Used for searching all words and codes in all fields.

lambda do
  index = {}
  ISO_639_2.each_with_index do |record, i|
    record.each do |field|
      downcased = field.downcase
      words = (
        downcased.split(/[[:blank:]]|\(|\)|,|;/) +
        downcased.split(/;/)
      )
      words.each do |word|
        unless word.empty?
          index[word] ||= []
          index[word] << i
        end
      end
    end
  end
  return index
end.call.freeze
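
To see what the index contains, the same tokenization can be run over a couple of records. This is a sketch under the assumption of two hand-written sample records; `records` and `index` are illustrative local names.

```ruby
# Illustrative sample records in the five-field entry layout.
records = [
  ['eng', '', 'en', 'English', 'anglais'],
  ['gre', 'ell', 'el', 'Greek, Modern (1453-)', 'grec moderne (après 1453)']
]

index = {}
records.each_with_index do |record, i|
  record.each do |field|
    downcased = field.downcase
    # Individual words, plus each whole semicolon-separated name.
    words = downcased.split(/[[:blank:]]|\(|\)|,|;/) + downcased.split(/;/)
    words.each do |word|
      next if word.empty?

      (index[word] ||= []) << i
    end
  end
end

p index['greek']                 # positions of records containing the word
p index['greek, modern (1453-)'] # the whole name is indexed as well
p index['eng']                   # single-word fields are added by both splits
```

Because a single-word field is produced by both `split` calls, its record position is stored twice under that key. This is why a lookup against the index needs to deduplicate its results.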
ISO_639_1 =

The ISO 639-1 dataset as an array of entries. Each entry is an array with the following format:

  • [0]: an ISO 639-2 alpha-3 (bibliographic) code

  • [1]: an ISO 639-2 alpha-3 (terminologic) code (when given)

  • [2]: an ISO 639-1 alpha-2 code

  • [3]: an English name

  • [4]: a French name

ISO_639_2.collect do |entry|
  entry unless entry[2].empty?
end.compact.freeze
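
The filter keeps only entries whose alpha-2 field is non-empty. A minimal sketch on sample data (the `entries` array is illustrative) shows that the `collect`/`compact` form gives the same result as a plain `reject`:

```ruby
# Illustrative sample entries: one with an alpha-2 code, one without.
entries = [
  ['eng', '', 'en', 'English', 'anglais'],
  ['ace', '', '', 'Achinese', 'aceh']
]

# The form used above: map entries without an alpha-2 code to nil, then drop the nils.
with_alpha2 = entries.collect { |entry| entry unless entry[2].empty? }.compact

# An equivalent, more direct form.
same_result = entries.reject { |entry| entry[2].empty? }

p with_alpha2
```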

Class Method Summary

Instance Method Summary

Class Method Details

.find_by_code(code) ⇒ Object Also known as: find

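Returns the entry array for an alpha-2 or alpha-3 code, or nil when the code is nil, has any other length, or matches no entry. Three-letter codes are compared against both the bibliographic and the terminologic code; the comparison is case-sensitive.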



# File 'lib/iso-639.rb', line 65

def find_by_code(code)
  return if code.nil?

  case code.length
  when 3
    ISO_639_2.detect do |entry|
      entry if [entry.alpha3, entry.alpha3_terminologic].include?(code)
    end
  when 2
    ISO_639_1.detect do |entry|
      entry if entry.alpha2 == code
    end
  end
end
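
The length-based dispatch can be illustrated without the full dataset. The sketch below assumes a three-row sample table and uses plain arrays with numeric indexes; `ENTRIES` and `lookup` are hypothetical stand-ins for the library's constants and method.

```ruby
# Illustrative sample data; the library itself works on the full ISO 639-2 table.
ENTRIES = [
  ['eng', '', 'en', 'English', 'anglais'],
  ['fre', 'fra', 'fr', 'French', 'français'],
  ['ace', '', '', 'Achinese', 'aceh']
].freeze

# Hypothetical stand-in for find_by_code: dispatch on the code's length.
def lookup(code)
  return if code.nil?

  case code.length
  when 3 then ENTRIES.detect { |e| [e[0], e[1]].include?(code) }
  when 2 then ENTRIES.detect { |e| e[2] == code }
  end
end

p lookup('fra')     # matched through the terminologic code
p lookup('en')      # matched through the alpha-2 code
p lookup('english') # neither two nor three characters long
```

Codes of any other length fall through the `case` without a match, so the method returns nil for them.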

.find_by_english_name(name) ⇒ Object

Returns the entry array for a language specified by its English name.



# File 'lib/iso-639.rb', line 82

def find_by_english_name(name)
  ISO_639_2.detect do |entry|
    entry if entry.english_name == name
  end
end

.find_by_french_name(name) ⇒ Object

Returns the entry array for a language specified by its French name.



# File 'lib/iso-639.rb', line 89

def find_by_french_name(name)
  ISO_639_2.detect do |entry|
    entry if entry.french_name == name
  end
end

.search(term) ⇒ Object

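Returns an array of matching entries for the search term, or an empty array when nothing matches. The term can be a code of any kind, or one of the words contained in the English or French name field. The term is downcased and stripped of surrounding whitespace before the lookup, and it must equal an indexed word exactly; partial words do not match.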



# File 'lib/iso-639.rb', line 98

def search(term)
  term ||= ''
  normalized_term = term.downcase.strip
  indexes         = INVERTED_INDEX[normalized_term]
  indexes ? ISO_639_2.values_at(*indexes).uniq : []
end
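
The lookup can be sketched against a hand-written index. The example assumes two sample records and a small index literal in the same shape as INVERTED_INDEX; `RECORDS`, `INDEX` and `search_sample` are hypothetical names.

```ruby
# Illustrative sample records and a hand-written index over them.
RECORDS = [
  ['eng', '', 'en', 'English', 'anglais'],
  ['gre', 'ell', 'el', 'Greek, Modern (1453-)', 'grec moderne (après 1453)']
].freeze

INDEX = {
  'eng' => [0, 0],
  'english' => [0, 0],
  'greek' => [1],
  'modern' => [1]
}.freeze

# Hypothetical stand-in for search: normalize the term, then look it up.
def search_sample(term)
  term ||= ''
  positions = INDEX[term.downcase.strip]
  positions ? RECORDS.values_at(*positions).uniq : []
end

p search_sample('  English ') # case and surrounding whitespace are ignored
p search_sample('gree')       # partial words are not indexed
p search_sample(nil)          # nil is treated as an empty term
```

The `uniq` call matters because an index entry may list the same record position more than once.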

Instance Method Details

#alpha2 ⇒ Object

The entry’s alpha-2 code (when given).



# File 'lib/iso-639.rb', line 118

def alpha2
  self[2]
end

#alpha3_bibliographic ⇒ Object Also known as: alpha3

The entry’s alpha-3 bibliographic code.



# File 'lib/iso-639.rb', line 107

def alpha3_bibliographic
  self[0]
end

#alpha3_terminologic ⇒ Object

The entry’s alpha-3 terminologic code (when given).



# File 'lib/iso-639.rb', line 113

def alpha3_terminologic
  self[1]
end

#english_name ⇒ Object

The entry’s English name.



# File 'lib/iso-639.rb', line 123

def english_name
  self[3]
end

#french_name ⇒ Object

The entry’s French name.



# File 'lib/iso-639.rb', line 128

def french_name
  self[4]
end
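
Since the class inherits from Array, each entry is an ordinary array with named readers layered over fixed positions. The pattern can be sketched with a hypothetical `LanguageEntry` class, which is not part of the library:

```ruby
# Hypothetical Array subclass illustrating position-based accessors.
class LanguageEntry < Array
  def alpha3_bibliographic
    self[0]
  end
  alias alpha3 alpha3_bibliographic

  def alpha3_terminologic
    self[1]
  end

  def alpha2
    self[2]
  end

  def english_name
    self[3]
  end

  def french_name
    self[4]
  end
end

# Array.[] called on a subclass returns an instance of that subclass.
entry = LanguageEntry['fre', 'fra', 'fr', 'French', 'français'].freeze

puts entry.alpha3       # named access
puts entry[3]           # positional access still works
puts entry.english_name
```

Entries built this way still respond to every Array method, so they can be destructured, compared with plain arrays, or passed to code that expects an array.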