Class: Languages
- Inherits: Object
- Defined in: lib/langa/languages.rb
Overview
The Languages class handles attributes of different languages, in particular the DNA fingerprint used for language recognition.
The attributes for each language are stored in a YAML file of the form:
<three letter iso 639-3 language code>:
    name: <name of the language>
    iso1: <two letter iso 639-1 language code (optional)>
    bibl: <three letter iso 639-2 bibliographic code (optional)>
    source: <source file used for fingerprint creation>
    size: <number of relevant characters for fingerprint creation>
    utf8: <utf-8 representation of fingerprint>
    fingerprint: <dna fingerprint of language>
For example, the entry for the German language looks like this:
deu:
    name: German
    iso1: de
    bibl: ger
    source: corpora/ger.german.utf-8.txt
    size: 92273185
    utf8: enirtsadhlugcmobfkwzpvüäjöyxq
    fingerprint: 101-16251+110-9918+105-7865+114-7637+116-6348...
For ISO 639-x codes see www.sil.org/ISO639-3/codes.asp
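As a rough illustration of the file format above, the following sketch parses the German entry with Ruby's standard YAML library and decodes the fingerprint. It assumes (the overview does not state this) that each fingerprint pair has the form <Unicode code point>-<weight>; the meaning of the second number is not documented here. The fingerprint is cut to the five pairs shown above, and all variable names are illustrative.

```ruby
require 'yaml'

# Entry copied from the example above. The fingerprint is limited to the
# five pairs shown there; the real value continues.
yaml_text = <<~YAML
  deu:
      name: German
      iso1: de
      bibl: ger
      source: corpora/ger.german.utf-8.txt
      size: 92273185
      utf8: enirtsadhlugcmobfkwzpvüäjöyxq
      fingerprint: 101-16251+110-9918+105-7865+114-7637+116-6348
YAML

languages = YAML.load(yaml_text)
entry = languages['deu']

# ASSUMPTION: pairs are "<code point>-<weight>", joined by '+'.
pairs = entry['fingerprint'].split('+').map { |pair| pair.split('-').map(&:to_i) }

# Turn the code points back into characters.
chars = pairs.map { |codepoint, _weight| [codepoint].pack('U') }.join

puts chars                              # enirt
puts entry['utf8'].start_with?(chars)   # true
```

Under that assumption, the decoded characters match the beginning of the utf8 attribute, which suggests that utf8 is simply the readable form of the fingerprint's character order.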
Class Method Summary
- .to_paste(key, config, indent = 4) ⇒ Object
  Create the YAML representation of a language configuration for manually pasting it into the language configuration file.
Instance Method Summary
- #config(key) ⇒ Object
  Get the complete configuration for a specific language.
- #initialize(config_file) ⇒ Languages
  Constructor. Create a Languages object to query language attributes: la = Languages.new('language.dna').
- #keys ⇒ Object
  Get the keys of all known languages.
- #values_for(name, lcase = false) ⇒ Object
  Get the values of all languages for the given attribute name.
Constructor Details
Class Method Details
.to_paste(key, config, indent = 4) ⇒ Object
Create the YAML representation of a language configuration for manually pasting it into the language configuration file.
Languages.to_paste('deu', …) -> "deu:\n    name:   German\n…"
# File 'lib/langa/languages.rb', line 66
def Languages.to_paste(key, config, indent=4)
  ind = ' ' * indent
  cnf = config.dup
  str = "#{key}:\n"
  ['name','iso1','bibl','source','size','utf8','fingerprint'].each do |key|
    if cnf.has_key?(key)
      str << "%s%s:%s%s\n" % [ind, key, ind[(key.size+1).modulo(indent)..indent-1], cnf[key]]
      cnf.delete(key)
    end
  end
  cnf.each do |key, value|
    str << "%s%s:%s%s\n" % [ind, key, ind[(key.size+1).modulo(indent)..indent-1], value]
  end
  str
end
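The padding expression in to_paste is compact, so a small sketch may help. It isolates that expression in a hypothetical helper (padded_line is not part of the library) to show that each value starts at the next multiple of indent after "key:", with at least one space in between.

```ruby
# Hypothetical helper that isolates the padding logic used in to_paste.
def padded_line(key, value, indent = 4)
  ind = ' ' * indent
  # Pad "key:" up to the next multiple of `indent` (always at least one space).
  pad = ind[(key.size + 1).modulo(indent)..indent - 1]
  "#{ind}#{key}:#{pad}#{value}"
end

puts padded_line('name', 'German')
puts padded_line('source', 'corpora/ger.german.utf-8.txt')
puts padded_line('fingerprint', '101-16251+110-9918+105-7865+114-7637+116-6348...')
```

With the default indent of 4, "name:" is followed by three spaces, "source:" by one and "fingerprint:" by four, so values line up on 4-column stops rather than in a single column.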
Instance Method Details
#config(key) ⇒ Object
Get the complete configuration for a specific language. Any ISO 639 code can be used as a key (e.g. for German you can use 'deu', 'de' or 'ger'); the lowercased language name ('german') is accepted as well.
la.config('deu') -> {"name"=>"German", "iso1"=>"de", "bibl"=>"ger", ...}
# File 'lib/langa/languages.rb', line 118
def config(key)
  # => translate key, unless present
  unless @languages.has_key?(key)
    map = values_for('iso1')
    unless map.has_key?(key)
      map = values_for('bibl')
      unless map.has_key?(key)
        map = values_for('name', true)
        key = nil unless map.has_key?(key)
      end
    end
    key = map[key] unless key.nil?
  end
  key.nil? ? nil : @languages[key]
end
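The fallback order used by config (ISO 639-3 key, then iso1, then bibl, then lowercased name) can be sketched without the library. This is a simplified stand-in, not the method itself: resolve and the two-entry languages hash are hypothetical, and on duplicate attribute values it returns the first match rather than the last.

```ruby
# Illustrative in-memory data standing in for the loaded configuration file.
languages = {
  'deu' => { 'name' => 'German',  'iso1' => 'de', 'bibl' => 'ger' },
  'eng' => { 'name' => 'English', 'iso1' => 'en', 'bibl' => 'eng' }
}

# Hypothetical stand-in for Languages#config.
def resolve(languages, key)
  # 1. Direct hit on the ISO 639-3 key.
  return languages[key] if languages.key?(key)

  # 2. Fall back to iso1, bibl and finally the lowercased name.
  [['iso1', false], ['bibl', false], ['name', true]].each do |attr, lcase|
    languages.each_value do |cfg|
      value = cfg[attr]
      next if value.nil?
      value = value.downcase if lcase
      return cfg if value == key
    end
  end
  nil
end

%w[deu de ger german German].each do |key|
  found = resolve(languages, key)
  puts "#{key}: #{found ? found['name'] : 'not found'}"
end
```

Note that 'German' is not found: names are compared in lowercased form, but the key itself is not lowercased, which mirrors the code shown above.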
#keys ⇒ Object
Get the keys of all known languages, sorted alphabetically. The keys are named according to ISO 639-3.
la.keys -> ['deu', 'eng', ...]
# File 'lib/langa/languages.rb', line 94
def keys
  @languages.keys.sort
end
#values_for(name, lcase = false) ⇒ Object
Get the values of all languages for the given attribute name, as a hash mapping each value to its language key. With lcase you can force the values to be lowercased.
la.values_for('name') -> {'German'=>'deu', 'English'=>'eng', ...}
la.values_for('name', true) -> {'german'=>'deu', 'english'=>'eng', ...}
# File 'lib/langa/languages.rb', line 102
def values_for(name, lcase=false)
  return nil if name.nil?
  result = Hash.new
  @languages.each do |key, val|
    unless val[name].nil?
      result[lcase ? val[name].downcase : val[name]] = key
    end
  end
  result
end
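Two consequences of this implementation are easy to miss: languages that lack the requested attribute are skipped, and a nil attribute name yields nil instead of an empty hash. The sketch below reproduces the logic as a standalone function that takes the language hash as an argument (an assumption made so it runs without the library); the 'xxx' entry is made up to show an optional attribute being absent.

```ruby
# Illustrative entries; 'xxx' is a made-up key without the optional iso1 attribute.
languages = {
  'deu' => { 'name' => 'German',  'iso1' => 'de' },
  'eng' => { 'name' => 'English', 'iso1' => 'en' },
  'xxx' => { 'name' => 'Example' }
}

# Same logic as values_for above, with the data passed in explicitly.
def values_for(languages, name, lcase = false)
  return nil if name.nil?
  result = {}
  languages.each do |key, val|
    next if val[name].nil?                      # skip languages without the attribute
    result[lcase ? val[name].downcase : val[name]] = key
  end
  result
end

by_iso1 = values_for(languages, 'iso1')
by_name = values_for(languages, 'name', true)

puts by_iso1.keys.inspect   # 'xxx' contributes nothing here
puts by_name['example']     # xxx
```

Because the attribute value becomes the hash key, two languages sharing the same value would collide and the one processed later would win.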