TranslitKit


TranslitKit is a framework for Hebrew-English transliteration.

Installation

# From the command line
gem install translit_kit

# Or in your Gemfile
gem 'translit_kit'

Requires Ruby 2.2 or later.

Usage

Basic transliteration

  require 'translit_kit'
  word = HebrewWord.new "אַברָהָם"
  word.transliterate(:single)
  # => ["avrohom"]

  # Shortcut
  word.t(:single)
  # => ["avrohom"]

Transliteration is powered by phoneme maps: files that map Hebrew phonemes (units of sound) to English characters. The file format is described under "Adding Custom Phoneme maps".

Three phoneme maps are provided: :long, :short, and :single. You can also add your own (see "Adding Custom Phoneme maps").

word.t(:single)
# => ["avrohom"]
word.t(:short)
# => ["avroom", "avroam", "avroem", "avrohom", "avroham",
# "avrohem", "avraom", "avraam", "avraem", "avrahom",
# "avraham", "avrahem", "avreom", "avream", "avreem",
# "avrehom", "avreham", "avrehem" ]
word.t(:long)
# => ["avroom", "avrooom", "avroohm", ... ] # 5,997 more!

The default is :short:

  word.t == word.t(:short)
  # => true

To get the total permutation count for each built-in map, call HebrewWord#inspect:

word.inspect
# => "אַברָהָם: Permutations: 1 single | 18 short | 6000 long"
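The counts above grow quickly because each phoneme can have several replacements, and every combination is a separate result. As a rough illustration only (not the gem's actual code), the sketch below expands per-phoneme options with Array#product. The helper name `permutations` and the option lists are hypothetical, chosen so that the total matches the 18 results shown for :short.

```ruby
# Hypothetical helper: expand per-phoneme replacement options into every
# possible spelling. This is an illustration, not TranslitKit's implementation.
def permutations(options_per_phoneme)
  first, *rest = options_per_phoneme
  return [] if first.nil?
  # Array#product yields every combination, one element from each list.
  first.product(*rest).map(&:join)
end

# Assumed option lists (not read from the real :short map).
options = [["a"], ["v"], ["r"], ["o", "a", "e"], ["", "h"], ["o", "a", "e"], ["m"]]

results = permutations(options)
results.length # => 18 (1 * 1 * 1 * 3 * 2 * 3 * 1)
results.first  # => "avroom"
```

The total is the product of the list sizes, which is why a map with more alternatives per phoneme, such as :long, produces thousands of results.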

Adding Custom Phoneme maps

Format

Phoneme Maps are simply JSON files, placed in the lib/phoneme_maps directory.

Each file should map a string (the phoneme) to an array of replacement strings.

{
  "ב": ["v"],
  "בּ": ["b", "bb"]
}

A phoneme can be a Hebrew character (א), a nekuda (ָ), or a character with modifiers, such as a dagesh (בּ). Keep in mind that many characters are normalized before transliteration (see "Appendix: Pre-Processing").
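Before installing a map, it can help to check that the file has the expected shape. The following is a minimal sketch using Ruby's standard JSON library. The function name `load_phoneme_map` is hypothetical, and the gem's own loading code may work differently.

```ruby
require 'json'

# Hypothetical validator: parse a phoneme map and confirm that every phoneme
# maps to an array of strings. Not part of TranslitKit's API.
def load_phoneme_map(json_text)
  map = JSON.parse(json_text)
  map.each do |phoneme, replacements|
    unless replacements.is_a?(Array) && replacements.all? { |r| r.is_a?(String) }
      raise ArgumentError, "#{phoneme.inspect} must map to an array of strings"
    end
  end
  map
end

sample = '{ "ב": ["v"], "בּ": ["b", "bb"] }'
map = load_phoneme_map(sample)
map.length # => 2
```

Note that ב with and without a dagesh are distinct keys, since the dagesh is a separate combining character.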

Installation

To install your custom map, place the file in lib/resources.

Your file will be available as the symbol :<filename>, without the .json extension.

Example: klingon.json becomes :klingon
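The naming rule can be expressed in one line of Ruby. This is only a sketch of the convention described above. The helper `map_name` is hypothetical and is not a method the gem is known to expose.

```ruby
# Hypothetical helper: derive the map symbol from a file path by dropping
# the directory and the .json extension.
def map_name(path)
  File.basename(path, ".json").to_sym
end

map_name("lib/resources/klingon.json") # => :klingon
```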

Now you can use it anywhere:

  word.transliterate(:klingon)
  # => (Results)

At present, custom maps are not included in the output of HebrewWord#inspect.

Contributing

TranslitKit is currently maintained by @AnalyzePlatypus. Contributions welcome!

Appendix: Pre-Processing

When a word is transliterated, it is pre-processed to normalize certain characters. Specifically:

  • Whitespace is stripped
  • The final letters [םןךףץ] are normalized to their standard forms
  • CHATAF nekudos ['ֲ','ֳ','ֱ'] are normalized to their standard forms
  • Full CHIRIK, TZEIREI, and CHOLOM nekudos have their letters removed
  • DAGESH characters are removed from all but the characters [בוכפת]
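Most of the steps above can be sketched with plain string operations. The code below is an approximation under stated assumptions, not the gem's implementation. The function name `preprocess` is hypothetical, the CHATAF nekudos are assumed to map to segol, patach, and kamatz, and the step for full CHIRIK, TZEIREI, and CHOLOM is left out because its exact rule is not spelled out here.

```ruby
# Hypothetical sketch of the pre-processing steps; not TranslitKit's own code.
def preprocess(word)
  finals    = "\u05DD\u05DF\u05DA\u05E3\u05E5" # final mem, nun, kaf, pe, tsadi
  standards = "\u05DE\u05E0\u05DB\u05E4\u05E6" # mem, nun, kaf, pe, tsadi
  chatafs   = "\u05B1\u05B2\u05B3"             # chataf segol, patach, kamatz
  plain     = "\u05B6\u05B7\u05B8"             # segol, patach, kamatz (assumed targets)
  dagesh    = "\u05BC"
  keep      = "\u05D1\u05D5\u05DB\u05E4\u05EA" # bet, vav, kaf, pe, tav

  text = word.strip                 # strip surrounding whitespace
  text = text.tr(finals, standards) # normalize final letters
  text = text.tr(chatafs, plain)    # normalize chataf nekudos

  # Drop a dagesh unless its base letter is in the keep list. Vowel points or
  # a shin/sin dot may sit between the letter and the dagesh.
  text.gsub(/([\u05D0-\u05EA])([\u05B0-\u05BB\u05C1\u05C2]*)\u05BC/) do
    letter, marks = $1, $2
    keep.include?(letter) ? "#{letter}#{marks}#{dagesh}" : "#{letter}#{marks}"
  end
end

# gimel + dagesh + patach + final nun, with surrounding spaces
preprocess(" \u05D2\u05BC\u05B7\u05DF ") # => "\u05D2\u05B7\u05E0" (gimel + patach + nun)
```

Final letters are normalized before the dagesh step so that a final kaf or final pe carrying a dagesh is treated like its standard form.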