Class: ISO_639
- Inherits: Array (ancestor chain: Object, Array, ISO_639)
- Defined in: lib/iso-639.rb
Constant Summary
- ISO_639_2 =
Load the ISO 639-2 dataset as an array of entries. Each entry is an array with the following format:
- [0]: an alpha-3 (bibliographic) code
- [1]: an alpha-3 (terminologic) code (when given)
- [2]: an alpha-2 code (when given)
- [3]: an English name
- [4]: a French name
Dataset Source:
- www.loc.gov/standards/iso639-2/ascii_8bits.html
- www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt
    lambda do
      dataset = []
      File.open(
        File.join(File.dirname(__FILE__), 'data', 'ISO-639-2_utf-8.txt'),
        'r:bom|utf-8'
      ) do |file|
        CSV.new(file, **{ col_sep: '|' }).each do |row|
          dataset << self[*row.map { |v| v || '' }].freeze
        end
      end
      return dataset
    end.call.freeze
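To make the entry layout concrete, here is a minimal sketch of how pipe-delimited rows turn into five-element entries. The three rows are typed in by hand in the layout of the source file and are illustrative only; `SAMPLE_ROWS` and `entries` are hypothetical names, and plain arrays stand in for `ISO_639` instances.

```ruby
require 'csv'

# Hand-typed sample rows: bibliographic|terminologic|alpha-2|English|French.
# Illustrative only; the library reads the bundled data/ISO-639-2_utf-8.txt.
SAMPLE_ROWS = <<~ROWS
  ger|deu|de|German|allemand
  eng||en|English|anglais
  ace|||Achinese|aceh
ROWS

# CSV yields nil for an empty field, so each nil is replaced with ''
# (the same normalization the loader applies before freezing a row).
entries = CSV.parse(SAMPLE_ROWS, col_sep: '|').map do |row|
  row.map { |v| v || '' }.freeze
end

entries.each { |entry| p entry }
```

Because missing codes are stored as empty strings rather than `nil`, callers can test a field with `empty?` without a nil check.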
- INVERTED_INDEX =
An inverted index generated from the ISO_639_2 data. Used for searching all words and codes in all fields.
    lambda do
      index = {}
      ISO_639_2.each_with_index do |record, i|
        record.each do |field|
          downcased = field.downcase
          words = (
            downcased.split(/[[:blank:]]|\(|\)|,|;/) +
            downcased.split(/;/)
          )
          words.each do |word|
            unless word.empty?
              index[word] ||= []
              index[word] << i
            end
          end
        end
      end
      return index
    end.call.freeze
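The index construction can be tried on a few rows in isolation. The sketch below applies the same two splits to a hand-typed sample; `INDEX_SAMPLE` and `index` are hypothetical names, and the rows are illustrative rather than copied from the dataset.

```ruby
# Illustrative rows only; not loaded from the real dataset.
INDEX_SAMPLE = [
  ['fre', 'fra', 'fr', 'French', 'français'],
  ['ger', 'deu', 'de', 'German', 'allemand'],
  ['frm', '', '', 'French, Middle (ca.1400-1600)', 'français moyen (1400-1600)']
].freeze

index = {}
INDEX_SAMPLE.each_with_index do |record, i|
  record.each do |field|
    downcased = field.downcase
    # First split: individual words. Second split: whole ';'-separated names.
    words = downcased.split(/[[:blank:]]|\(|\)|,|;/) + downcased.split(/;/)
    words.each do |word|
      next if word.empty?
      (index[word] ||= []) << i
    end
  end
end

p index['french']        # => [0, 0, 2]
p index['french'].uniq   # => [0, 2]
```

A single-word field is produced by both splits, so its record position is stored twice. That is harmless here because the lookup side can de-duplicate the results.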
- ISO_639_1 =
The ISO 639-1 dataset as an array of entries. Each entry is an array with the following format:
- [0]: an ISO 639-2 alpha-3 (bibliographic) code
- [1]: an ISO 639-2 alpha-3 (terminologic) code (when given)
- [2]: an ISO 639-1 alpha-2 code (always present in this subset)
- [3]: an English name
- [4]: a French name
    ISO_639_2.collect do |entry|
      entry unless entry[2].empty?
    end.compact.freeze
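As a small illustration of the filter, the sketch below runs the same `collect`/`compact` expression over a hand-typed sample and compares it with an equivalent `reject`. The names `sample`, `with_alpha2`, and `same_result` are hypothetical.

```ruby
# Illustrative rows only; plain arrays stand in for ISO_639 entries.
sample = [
  ['ger', 'deu', 'de', 'German', 'allemand'],
  ['ace', '', '', 'Achinese', 'aceh'],
  ['eng', '', 'en', 'English', 'anglais']
]

# Same shape as the library expression: rows without an alpha-2 code
# map to nil, and compact then drops the nils.
with_alpha2 = sample.collect { |entry| entry unless entry[2].empty? }.compact

# An equivalent, shorter spelling of the same filter.
same_result = sample.reject { |entry| entry[2].empty? }

p with_alpha2.map { |entry| entry[2] }   # => ["de", "en"]
```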
Class Method Summary
- .find_by_code(code) ⇒ Object (also: find)
  Returns the entry array for an alpha-2 or alpha-3 code.
- .find_by_english_name(name) ⇒ Object
  Returns the entry array for a language specified by its English name.
- .find_by_french_name(name) ⇒ Object
  Returns the entry array for a language specified by its French name.
- .search(term) ⇒ Object
  Returns an array of matches for the search term.
Instance Method Summary
- #alpha2 ⇒ Object
  The entry’s alpha-2 code (when given).
- #alpha3_bibliographic ⇒ Object (also: #alpha3)
  The entry’s alpha-3 bibliographic code.
- #alpha3_terminologic ⇒ Object
  The entry’s alpha-3 terminologic code (when given).
- #english_name ⇒ Object
  The entry’s English name.
- #french_name ⇒ Object
  The entry’s French name.
Class Method Details
.find_by_code(code) ⇒ Object Also known as: find
Returns the entry array for an alpha-2 or alpha-3 code.
    # File 'lib/iso-639.rb', line 65
    def find_by_code(code)
      return if code.nil?
      case code.length
      when 3
        ISO_639_2.detect do |entry|
          entry if [entry.alpha3, entry.alpha3_terminologic].include?(code)
        end
      when 2
        ISO_639_1.detect do |entry|
          entry if entry.alpha2 == code
        end
      end
    end
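The length-based dispatch has a few consequences that are easy to miss: the comparison is exact and case-sensitive, and a code of any length other than 2 or 3 yields `nil`. The sketch below reproduces the logic over a two-row sample so it runs without the library; `SampleEntry`, `CODES_639_2`, `CODES_639_1`, and `sample_find_by_code` are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for an ISO_639 entry: an Array with named readers.
class SampleEntry < Array
  def alpha3
    self[0]
  end

  def alpha3_terminologic
    self[1]
  end

  def alpha2
    self[2]
  end
end

# Illustrative rows only.
CODES_639_2 = [
  SampleEntry['fre', 'fra', 'fr', 'French', 'français'],
  SampleEntry['ace', '', '', 'Achinese', 'aceh']
].freeze
CODES_639_1 = CODES_639_2.reject { |e| e.alpha2.empty? }.freeze

# Same dispatch as the method above: 3 letters search both alpha-3
# columns, 2 letters search the alpha-2 column, anything else is nil.
def sample_find_by_code(code)
  return if code.nil?

  case code.length
  when 3 then CODES_639_2.detect { |e| [e.alpha3, e.alpha3_terminologic].include?(code) }
  when 2 then CODES_639_1.detect { |e| e.alpha2 == code }
  end
end

p sample_find_by_code('fra')   # found via the terminologic code
p sample_find_by_code('FR')    # => nil (case-sensitive)
p sample_find_by_code('fren')  # => nil (length is neither 2 nor 3)
```

Callers that accept user input may want to downcase and strip the code before the lookup, since the method itself does not normalize it.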
.find_by_english_name(name) ⇒ Object
Returns the entry array for a language specified by its English name.
    # File 'lib/iso-639.rb', line 82
    def find_by_english_name(name)
      ISO_639_2.detect do |entry|
        entry if entry.english_name == name
      end
    end
.find_by_french_name(name) ⇒ Object
Returns the entry array for a language specified by its French name.
    # File 'lib/iso-639.rb', line 89
    def find_by_french_name(name)
      ISO_639_2.detect do |entry|
        entry if entry.french_name == name
      end
    end
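Both name lookups compare the argument against the whole name field with `==`, so the match is exact and case-sensitive, and a field that lists several names separated by `;` only matches when the full string is given. A minimal sketch over hand-typed rows, with `NAME_SAMPLE`, `by_english`, and `by_french` as hypothetical names:

```ruby
# Illustrative rows only; plain arrays stand in for ISO_639 entries.
NAME_SAMPLE = [
  ['spa', '', 'es', 'Spanish; Castilian', 'espagnol; castillan'],
  ['fre', 'fra', 'fr', 'French', 'français']
].freeze

# Exact comparison against the English name field (index 3).
by_english = ->(name) { NAME_SAMPLE.detect { |entry| entry[3] == name } }
# Exact comparison against the French name field (index 4).
by_french = ->(name) { NAME_SAMPLE.detect { |entry| entry[4] == name } }

p by_english.call('French')              # matches the 'fre' row
p by_english.call('french')              # => nil (case-sensitive)
p by_english.call('Spanish')             # => nil (field is 'Spanish; Castilian')
p by_english.call('Spanish; Castilian')  # matches the 'spa' row
p by_french.call('français')             # matches the 'fre' row
```

When only part of a name is known, `.search` is the better fit, since it works on individual words.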
.search(term) ⇒ Object
Returns an array of matches for the search term. The term can be a code of any kind, or it can be one of the words contained in the English or French name field.
    # File 'lib/iso-639.rb', line 98
    def search(term)
      term ||= ''
      normalized_term = term.downcase.strip
      indexes = INVERTED_INDEX[normalized_term]
      indexes ? ISO_639_2.values_at(*indexes).uniq : []
    end
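The lookup side can be sketched with a hand-written fragment of an index. `SEARCH_RECORDS`, `SEARCH_INDEX`, and `sample_search` are hypothetical names, and the rows and index entries are illustrative rather than taken from the real constants.

```ruby
# Illustrative rows only.
SEARCH_RECORDS = [
  ['fre', 'fra', 'fr', 'French', 'français'],
  ['ger', 'deu', 'de', 'German', 'allemand'],
  ['frm', '', '', 'French, Middle (ca.1400-1600)', 'français moyen (1400-1600)']
].freeze

# Hand-written index fragment: token => record positions (with duplicates).
SEARCH_INDEX = {
  'french' => [0, 0, 2],
  'deu' => [1, 1]
}.freeze

# Same steps as the method above: treat nil as '', normalize the term,
# fetch the positions, then de-duplicate the matching records.
sample_search = lambda do |term|
  term ||= ''
  positions = SEARCH_INDEX[term.downcase.strip]
  positions ? SEARCH_RECORDS.values_at(*positions).uniq : []
end

p sample_search.call('  French ').map(&:first)  # => ["fre", "frm"]
p sample_search.call('DEU').map(&:first)        # => ["ger"]
p sample_search.call(nil)                       # => []
```

The term must equal an indexed token after normalization, so a prefix such as `fren` returns an empty array rather than partial matches.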
Instance Method Details
#alpha2 ⇒ Object
The entry’s alpha-2 code (when given).
    # File 'lib/iso-639.rb', line 118
    def alpha2
      self[2]
    end
#alpha3_bibliographic ⇒ Object Also known as: alpha3
The entry’s alpha-3 bibliographic code.
    # File 'lib/iso-639.rb', line 107
    def alpha3_bibliographic
      self[0]
    end
#alpha3_terminologic ⇒ Object
The entry’s alpha-3 terminologic code (when given).
    # File 'lib/iso-639.rb', line 113
    def alpha3_terminologic
      self[1]
    end
#english_name ⇒ Object
The entry’s English name.
    # File 'lib/iso-639.rb', line 123
    def english_name
      self[3]
    end
#french_name ⇒ Object
The entry’s French name.
    # File 'lib/iso-639.rb', line 128
    def french_name
      self[4]
    end
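Since every reader is a thin wrapper over a fixed array position, an entry can be used either through the named methods or as a plain array. The sketch below shows both styles on a hypothetical `SampleLanguage` class that mirrors the readers documented above; it is not the library class itself.

```ruby
# Hypothetical Array subclass mirroring the documented readers.
class SampleLanguage < Array
  def alpha3_bibliographic
    self[0]
  end
  alias alpha3 alpha3_bibliographic

  def alpha3_terminologic
    self[1]
  end

  def alpha2
    self[2]
  end

  def english_name
    self[3]
  end

  def french_name
    self[4]
  end
end

# Illustrative row; frozen like the entries in the dataset constants.
entry = SampleLanguage['ger', 'deu', 'de', 'German', 'allemand'].freeze

puts entry.alpha3        # named reader
puts entry.english_name

# Positional access and destructuring work because the entry is an Array.
_bib, _term, two_letter, english, _french = entry
puts "#{two_letter}: #{english}"
```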