Transformers.rb
🙂 State-of-the-art transformers for Ruby
Installation
First, install Torch.rb.
Then add this line to your application’s Gemfile:
gem "transformers-rb"
Getting Started
Models
sentence-transformers/all-MiniLM-L6-v2
sentences = ["This is an example sentence", "Each sentence is converted"]
model = Transformers::SentenceTransformer.new("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode(sentences)
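Sentence embeddings are usually compared with cosine similarity. The following is a minimal sketch in plain Ruby, assuming the embeddings are available as arrays of floats. The `cosine_similarity` helper and the 3-dimensional vectors are hypothetical and only stand in for real model output, which has many more dimensions.

```ruby
# Cosine similarity between two embedding vectors given as plain Ruby arrays.
# Hypothetical helper for illustration; it is not part of the library.
def cosine_similarity(a, b)
  raise ArgumentError, "vectors must have the same length" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

# Made-up 3-dimensional vectors standing in for real embeddings.
embeddings = [
  [1.0, 0.0, 0.0],
  [0.6, 0.8, 0.0]
]

similarity = cosine_similarity(embeddings[0], embeddings[1])
puts similarity.round(4)
```

Values close to 1 indicate sentences with similar meaning, and values near 0 indicate unrelated ones.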
sentence-transformers/multi-qa-MiniLM-L6-cos-v1
query = "How many people live in London?"
docs = ["Around 9 Million people live in London", "London is known for its financial district"]
model = Transformers::SentenceTransformer.new("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
query_emb = model.encode(query)
doc_emb = model.encode(docs)
scores = Torch.mm(Torch.tensor([query_emb]), Torch.tensor(doc_emb).transpose(0, 1))[0].cpu.to_a
doc_score_pairs = docs.zip(scores).sort_by { |d, s| -s }
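The matrix multiplication above computes one dot product per document. As a rough sketch of the same scoring and ranking step without Torch, assuming the embeddings are plain arrays of floats, it could be written as follows. The `dot` helper and the 2-dimensional vectors are hypothetical and invented for illustration only.

```ruby
# Dot product of two equal-length vectors (hypothetical helper).
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

docs = ["Around 9 Million people live in London", "London is known for its financial district"]

# Made-up 2-dimensional embeddings; real model output is much longer.
query_emb = [0.8, 0.6]
doc_emb = [
  [0.9, 0.4],
  [0.1, 0.7]
]

# One score per document, then sort from highest to lowest score.
scores = doc_emb.map { |emb| dot(query_emb, emb) }
doc_score_pairs = docs.zip(scores).sort_by { |_doc, score| -score }

doc_score_pairs.each do |doc, score|
  puts "#{score.round(2)}  #{doc}"
end
```

For large document sets the tensor version is likely the better choice, since it performs all dot products in a single operation.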
mixedbread-ai/mxbai-embed-large-v1
def transform_query(query)
  "Represent this sentence for searching relevant passages: #{query}"
end
docs = [
  transform_query("puppy"),
  "The dog is barking",
  "The cat is purring"
]
model = Transformers::SentenceTransformer.new("mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.encode(docs)
opensearch-project/opensearch-neural-sparse-encoding-v1
docs = ["The dog is barking", "The cat is purring", "The bear is growling"]
model_id = "opensearch-project/opensearch-neural-sparse-encoding-v1"
model = Transformers::AutoModelForMaskedLM.from_pretrained(model_id)
tokenizer = Transformers::AutoTokenizer.from_pretrained(model_id)
special_token_ids = tokenizer.special_tokens_map.map { |_, token| tokenizer.vocab[token] }
feature = tokenizer.(docs, padding: true, truncation: true, return_tensors: "pt", return_token_type_ids: false)
output = model.(**feature)[0]
values, _ = Torch.max(output * feature[:attention_mask].unsqueeze(-1), dim: 1)
values = Torch.log(1 + Torch.relu(values))
values[0.., special_token_ids] = 0
embeddings = values.to_a
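To make the tensor steps above easier to follow, here is a sketch of the same sequence (mask, max over token positions, `log(1 + relu(x))`, zeroing special tokens) for a single document in plain Ruby. All numbers, the 5-entry vocabulary, and the special token ids are hypothetical; real model output covers the full vocabulary.

```ruby
# Made-up logits for one document: 3 token positions x 5 vocabulary entries.
logits = [
  [0.0, 2.0, -1.0, 0.5, 0.0],
  [0.0, 1.0, 3.0, -2.0, 0.0],
  [9.0, 9.0, 9.0, 9.0, 9.0] # padding position, masked out below
]
attention_mask = [1, 1, 0]
special_token_ids = [0, 4] # hypothetical ids

# Zero out padded positions, mirroring the multiplication by the attention mask.
masked = logits.each_with_index.map { |row, i| row.map { |v| v * attention_mask[i] } }

# Max over token positions for each vocabulary entry.
pooled = masked.transpose.map(&:max)

# log(1 + relu(x)) keeps weights non-negative and dampens large values.
weights = pooled.map { |v| Math.log(1 + [v, 0.0].max) }

# Special tokens carry no meaning for retrieval, so their weights are cleared.
special_token_ids.each { |id| weights[id] = 0.0 }

# Keep only non-zero entries as { vocabulary id => weight }.
sparse = weights.each_with_index.reject { |w, _id| w.zero? }.to_h { |w, id| [id, w.round(4)] }
p sparse
```

Storing only the non-zero entries is what makes the representation sparse: most vocabulary entries end up with a weight of zero for any given document.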
Pipelines
Named-entity recognition
ner = Transformers.pipeline("ner")
ner.("Ruby is a programming language created by Matz")
Sentiment analysis
classifier = Transformers.pipeline("sentiment-analysis")
classifier.("We are very happy to show you the 🤗 Transformers library.")
Question answering
qa = Transformers.pipeline("question-answering")
qa.(question: "Who invented Ruby?", context: "Ruby is a programming language created by Matz")
Feature extraction
extractor = Transformers.pipeline("feature-extraction")
extractor.("We are very happy to show you the 🤗 Transformers library.")
Image classification
classifier = Transformers.pipeline("image-classification")
classifier.(URI("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"))
Image feature extraction
extractor = Transformers.pipeline("image-feature-extraction")
extractor.(URI("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"))
API
This library follows the Transformers Python API. Only a few model architectures are currently supported:
- BERT
- DistilBERT
- ViT
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/transformers-ruby.git
cd transformers-ruby
bundle install
bundle exec rake download:files
bundle exec rake test