Purpose

A seq2seq transformer suited for transliteration. Written in Ruby.

Secryst was originally built for the Interscript project (at GitHub).

The goal is to allow:

  • Developers to train models and provide the trained models to users. To train models, raw compute frameworks and their bindings, e.g. OpenCL, can be used.

  • Users of the library in Ruby who only want to use the trained models to run them without requiring special bindings.

Status

Currently Secryst works with the Khmer Romanization system as cited below.

Prerequisites

  • Ruby 2.7 (MUST - 2.6 does not work with the latest torch-rb)

  • libtorch (1.6.0)

  • fftw

  • gsl

  • lapack

  • openblas
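
The Ruby version requirement can be checked before installing anything else. The snippet below is a minimal sketch, not part of Secryst: `ruby_version_ok?` and `MINIMUM_RUBY` are hypothetical names, and it assumes that any Ruby at or above 2.7 is acceptable, which this README does not state explicitly.

```ruby
# Minimum Ruby version taken from the prerequisites list.
# Assumption: versions newer than 2.7 are also fine.
MINIMUM_RUBY = Gem::Version.new("2.7.0")

# Hypothetical helper (not part of Secryst): returns true when the given
# version string is at least MINIMUM_RUBY. Defaults to the running Ruby.
def ruby_version_ok?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

if ruby_version_ok?
  puts "Ruby #{RUBY_VERSION} meets the 2.7 requirement"
else
  warn "Ruby #{RUBY_VERSION} is too old; 2.7 is required"
end
```

`Gem::Version` is used instead of plain string comparison so that versions such as 2.10 would compare correctly against 2.7.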

On Ubuntu:

$ sudo apt-get -y install libfftw3-dev libgsl-dev libopenblas-dev \
    liblapack-dev liblapacke-dev unzip automake make gcc g++ \
    libtorch libtorch-dev
$ wget https://download.pytorch.org/libtorch/cu102/libtorch-cxx11-abi-shared-with-deps-1.6.0.zip
$ unzip libtorch-cxx11-abi-shared-with-deps-1.6.0.zip

$ gem install bundler -v "~> 2"
$ bundle config build.torch-rb \
    --with-torch-dir=$(pwd)/libtorch

$ bundle install
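
If building torch-rb fails, one thing worth checking is whether the directory passed to `--with-torch-dir` really contains the unpacked archive. The following is a rough sketch only: `libtorch_dir_ok?` is a hypothetical helper, and it assumes the unzipped `libtorch` directory contains `lib` and `include` subdirectories.

```ruby
# Hypothetical helper (not part of Secryst or torch-rb): returns true when
# `dir` contains the `lib` and `include` subdirectories that an unpacked
# libtorch archive is assumed to have.
def libtorch_dir_ok?(dir)
  %w[lib include].all? { |sub| File.directory?(File.join(dir, sub)) }
end

# Same location as $(pwd)/libtorch in the bundle config command.
libtorch_dir = File.join(Dir.pwd, "libtorch")

if libtorch_dir_ok?(libtorch_dir)
  puts "Found libtorch at #{libtorch_dir}"
else
  warn "No usable libtorch at #{libtorch_dir}; unzip the archive here first"
end
```

Run it from the same directory in which the archive was unzipped, since the path is resolved relative to the current working directory.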

On macOS:

$ brew install libtorch gsl lapack openblas fftw automake gcc

$ gem install bundler -v "~> 2"
$ bundle config build.numo-linalg \
    --with-openblas-dir=/usr/local/opt/openblas \
    --with-lapack-lib=/usr/local/opt/lapack

$ bundle install

Note (for macOS): If you mistakenly installed numo-linalg without the configuration options above, uninstall it with the following command, then configure the bundle as described above and run bundle install again:

$ bundle exec gem uninstall numo-linalg

References

Secryst is built on the transformer model with architecture based on:

  • Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. 2017. In: Advances in Neural Information Processing Systems, pages 6000-6010.

The sample transliteration system implemented is the Khmer system:

Origin of name

Scrying is the practice of peering into a crystal sphere for fortune telling. The purpose of seq2seq is much like scrying: looking into a crystal sphere and waiting for some machine-learning magic to happen.

“Secryst” comes from the combination of “seq2seq” + “crystal” + “scrying”.