Class: Languages

Inherits:
Object
Defined in:
lib/langa/languages.rb

Overview

The Languages class handles the attributes of different languages, in particular the DNA fingerprint used for language recognition.

The attributes for each language are stored in a YAML file of the form:

<three letter iso 639-3 language code>:
    name:   <name of the language>
    iso1:   <two letter iso 639-1 language code (optional)>
    bibl:   <three letter iso 639-2 bibliographic code (optional)>
    source: <source file used for fingerprint creation>
    size:   <number of relevant characters for fingerprint creation>
    utf8:   <utf-8 representation of fingerprint>
    fingerprint: <dna fingerprint of language>

For example, the entry for the German language:

deu:
  name:   German
  iso1:   de
  bibl:   ger
  source: corpora/ger.german.utf-8.txt
  size:   92273185
  utf8:   enirtsadhlugcmobfkwzpvüäjöyxq
  fingerprint: 101-16251+110-9918+105-7865+114-7637+116-6348...
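
The following is a minimal sketch of how such an entry could be read and inspected with Ruby's standard YAML library. It assumes that each `+`-separated fingerprint term has the form `<code point>-<weight>`; the documentation does not state this, but the first code points (101, 110, 105, ...) match the leading characters of the utf8 attribute. Only the five terms visible in the example are used, since the rest of the fingerprint is elided there, and the source attribute is left out for brevity.

```ruby
require 'yaml'

# Entry copied from the German example above. The fingerprint is limited
# to the five terms that are visible in the documentation.
entry = YAML.load(<<~YAML)
  deu:
    name:   German
    iso1:   de
    bibl:   ger
    size:   92273185
    utf8:   enirtsadhlugcmobfkwzpvüäjöyxq
    fingerprint: 101-16251+110-9918+105-7865+114-7637+116-6348
YAML

deu = entry['deu']

# Assumption: every term is "<code point>-<weight>".
terms = deu['fingerprint'].split('+').map do |term|
  code_point, weight = term.split('-').map(&:to_i)
  [code_point, weight]
end

# Turn the code points back into characters, in fingerprint order.
decoded = terms.map { |code_point, _weight| [code_point].pack('U') }.join

puts decoded
puts deu['utf8'].start_with?(decoded)
```

If the assumption holds, the decoded characters form a prefix of the utf8 attribute, which suggests that utf8 is simply a readable rendering of the fingerprint's character order.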

For ISO 639-x codes see www.sil.org/ISO639-3/codes.asp

Constructor Details

#initialize(config_file) ⇒ Languages

Create a Languages object to query language attributes.

la = Languages.new('language.dna')


# File 'lib/langa/languages.rb', line 86

def initialize(config_file)
  @languages = load_language_configuration(config_file)
  self
end
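
The body of load_language_configuration is not shown on this page. As a rough sketch only, a loader for the YAML format described in the overview might look like the following; the method body, the temporary file, and the sample entries are assumptions made for illustration, not the library's actual implementation.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical stand-in for the loader: parse the YAML file into a Hash
# keyed by ISO 639-3 code.
def load_language_configuration(config_file)
  YAML.load_file(config_file)
end

# Write a small illustrative configuration to a temporary file and load it.
languages = Tempfile.create(['language', '.dna']) do |file|
  file.write(<<~YAML)
    deu:
      name: German
      iso1: de
      bibl: ger
    fra:
      name: French
      iso1: fr
      bibl: fre
  YAML
  file.flush
  load_language_configuration(file.path)
end

puts languages.keys.sort.inspect
puts languages['deu']['name']
```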

Class Method Details

.to_paste(key, config, indent = 4) ⇒ Object

Create the YAML representation of a language configuration, for manually pasting it into the language configuration file.

Languages.to_paste('deu', …) -> "deu:\n    name:   German\n …"



# File 'lib/langa/languages.rb', line 66

def Languages.to_paste(key, config, indent=4)
  ind = ' ' * indent
  cnf = config.dup
  str = "#{key}:\n"
  # Emit the well-known attributes first, in a fixed order.
  ['name','iso1','bibl','source','size','utf8','fingerprint'].each do |attr|
    if cnf.has_key?(attr)
      str << "%s%s:%s%s\n" % [ind, attr,
        ind[(attr.size+1).modulo(indent)..indent-1], cnf[attr]]
      cnf.delete(attr)
    end
  end
  # Append any remaining attributes in hash order.
  cnf.each do |attr, value|
    str << "%s%s:%s%s\n" % [ind, attr,
      ind[(attr.size+1).modulo(indent)..indent-1], value]
  end
  str
end
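
The slice expression in the listing above is the least obvious part of to_paste. The sketch below isolates it in a helper (the name padding is hypothetical) to show its effect: the attribute name plus colon is padded up to the next multiple of indent, so values line up on tab-stop-like columns.

```ruby
# Same slice expression as in to_paste, isolated for inspection.
# The helper name `padding` is made up for this sketch.
def padding(key, indent = 4)
  ind = ' ' * indent
  ind[(key.size + 1).modulo(indent)..indent - 1]
end

%w[name source fingerprint].each do |key|
  label = "#{key}:#{padding(key)}"
  # The label width is always a multiple of the indent.
  puts "#{label}<value>  (label width: #{label.size})"
end
```

Note that a key whose length plus one is already a multiple of indent (such as fingerprint with the default indent of 4) receives a full indent of padding rather than none, and that indent = 0 would raise a ZeroDivisionError in modulo.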

Instance Method Details

#config(key) ⇒ Object

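Get the complete configuration for a specific language. You can use any ISO 639 code as a key (e.g. for German you can use 'deu', 'de' or 'ger'). As the source below shows, the lowercased language name is tried as a last resort, and nil is returned if the key cannot be resolved.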

la.config('deu') -> {"name"=>"German", "iso1"=>"de", "bibl"=>"ger", ...}


# File 'lib/langa/languages.rb', line 118

def config(key)
  # => translate key, unless present
  unless @languages.has_key?(key)
    map = values_for('iso1')
    unless map.has_key?(key)
      map = values_for('bibl')
      unless map.has_key?(key)
        map = values_for('name', true)
        key = nil unless map.has_key?(key)
      end
    end
    key = map[key] unless key.nil?
  end

  key.nil? ? nil : @languages[key]
end
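
The lookup order used by config can be sketched independently of the class. The snippet below is a simplified stand-in, not the library code: the LANGUAGES hash and the resolve helper are hypothetical, and the two entries are illustrative data. It tries the ISO 639-3 key first, then iso1, then bibl, and finally the lowercased name.

```ruby
# Illustrative data in the shape of the language configuration.
LANGUAGES = {
  'deu' => { 'name' => 'German', 'iso1' => 'de', 'bibl' => 'ger' },
  'fra' => { 'name' => 'French', 'iso1' => 'fr', 'bibl' => 'fre' }
}.freeze

# Hypothetical helper: map any accepted key to its ISO 639-3 code,
# following the same order of attempts as config.
def resolve(key)
  return key if LANGUAGES.key?(key)

  [['iso1', false], ['bibl', false], ['name', true]].each do |attr, lcase|
    LANGUAGES.each do |code, attrs|
      value = attrs[attr]
      next if value.nil?
      value = value.downcase if lcase
      return code if value == key
    end
  end
  nil
end

%w[deu de ger german German xx].each do |key|
  puts "#{key.inspect} -> #{resolve(key).inspect}"
end
```

Because the name map is lowercased but the incoming key is not, a capitalized name such as 'German' does not resolve in this sketch, which mirrors the behavior of the listing above.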

#keys ⇒ Object

Get the keys of all known languages, sorted alphabetically. The keys are ISO 639-3 codes.

la.keys -> ['deu', 'eng', ...]


# File 'lib/langa/languages.rb', line 94

def keys
  @languages.keys.sort
end

#values_for(name, lcase = false) ⇒ Object

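Get the values of all languages for the given attribute name. The result is a hash that maps each attribute value to the ISO 639-3 key of its language; languages that lack the attribute are skipped. With lcase you can force the values to be lowercased. Returns nil if name is nil.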

la.values_for('name') -> {'German'=>'deu', 'English'=>'eng', ...}
la.values_for('name', true) -> {'german'=>'deu', 'english'=>'eng', ...}


# File 'lib/langa/languages.rb', line 102

def values_for(name, lcase=false)
  return nil if name.nil?
  
  result = Hash.new
  @languages.each do |key, val|
    unless val[name].nil?
      result[lcase ? val[name].downcase : val[name]] = key
    end
  end
  result
end
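
As a small illustration of the skipping behavior, the sketch below re-implements values_for as a free-standing method over a plain hash. The extra languages parameter and the sample data are assumptions for this example; the second entry deliberately has no iso1 attribute, which the overview marks as optional.

```ruby
# Illustrative data; 'gsw' carries no iso1 attribute here.
languages = {
  'deu' => { 'name' => 'German', 'iso1' => 'de' },
  'gsw' => { 'name' => 'Swiss German' }
}

# Stand-alone variant of values_for that takes the language hash as an
# argument instead of reading an instance variable.
def values_for(languages, name, lcase = false)
  return nil if name.nil?

  languages.each_with_object({}) do |(key, attrs), result|
    value = attrs[name]
    next if value.nil?
    result[lcase ? value.downcase : value] = key
  end
end

p values_for(languages, 'iso1')
p values_for(languages, 'name', true)
```

Since the result is keyed by attribute value, two languages sharing the same value would collapse into one entry, with the later one winning.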