TorchText

:fire: Data loaders and abstractions for text and NLP - for Ruby

Installation

Add this line to your application’s Gemfile:

gem 'torchtext'

Getting Started

This library follows the Python torchtext API. Many methods and options are still missing. PRs welcome!

Examples

Text classification

Datasets

Load a dataset

train_dataset, test_dataset = TorchText::Datasets::AG_NEWS.load(root: ".data", ngrams: 2)

Supported datasets include:

  • AG_NEWS

Data Utils

Supports:

  • tokenizer
  • ngrams_iterator
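To illustrate what an n-gram iterator produces, here is a minimal plain-Ruby sketch. It does not call the gem: `ngrams_sketch` is a hypothetical helper, and the output order (single tokens first, then longer n-grams) assumes the Ruby port mirrors the Python `ngrams_iterator`.

```ruby
# Hypothetical helper, not part of the gem: yields each token, then every
# contiguous run of 2..max_n tokens joined by a single space.
def ngrams_sketch(tokens, max_n)
  return enum_for(:ngrams_sketch, tokens, max_n) unless block_given?

  tokens.each { |token| yield token }
  (2..max_n).each do |n|
    tokens.each_cons(n) { |gram| yield gram.join(" ") }
  end
end

# A plain whitespace split stands in for a real tokenizer here.
tokens = "here we are".split
p ngrams_sketch(tokens, 2).to_a
# => ["here", "we", "are", "here we", "we are"]
```

This is also roughly what the `ngrams: 2` option in the dataset example implies: each text contributes its unigrams and its bigrams as features.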

Data Metrics

Compute the BLEU score

# each candidate is a list of tokens
candidate_corpus = [["My", "full", "pytorch", "test"], ["Another", "Sentence"]]
# each entry holds one or more reference token lists for the candidate at the same index
references_corpus = [[["My", "full", "pytorch", "test"], ["Completely", "Different"]], [["No", "Match"]]]
TorchText::Data::Metrics.bleu_score(candidate_corpus, references_corpus)
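For readers who want to see what the score measures, the following is a plain-Ruby sketch of corpus-level BLEU with uniform weights over 1- to 4-grams. It is an illustration under stated assumptions, not the gem's implementation: `ngram_counts` and `bleu_sketch` are hypothetical names, and the gem's result may differ slightly in floating-point precision.

```ruby
# Hypothetical helper: count every n-gram of length 1..max_n in a token list.
def ngram_counts(tokens, max_n)
  counts = Hash.new(0)
  (1..max_n).each do |n|
    tokens.each_cons(n) { |gram| counts[gram] += 1 }
  end
  counts
end

# Hypothetical sketch of corpus-level BLEU with uniform n-gram weights.
def bleu_sketch(candidates, references, max_n: 4)
  clipped = Array.new(max_n, 0) # matched n-grams, capped by reference counts
  totals = Array.new(max_n, 0)  # all candidate n-grams
  cand_len = 0
  ref_len = 0

  candidates.zip(references).each do |cand, refs|
    cand_len += cand.size
    # Use the reference whose length is closest to the candidate's.
    ref_len += refs.map(&:size).min_by { |len| (cand.size - len).abs }

    # For each n-gram, the largest count seen in any single reference.
    max_ref = Hash.new(0)
    refs.each do |ref|
      ngram_counts(ref, max_n).each do |gram, count|
        max_ref[gram] = [max_ref[gram], count].max
      end
    end

    ngram_counts(cand, max_n).each do |gram, count|
      clipped[gram.size - 1] += [count, max_ref[gram]].min
      totals[gram.size - 1] += count
    end
  end

  # Any n-gram order with no matches makes the geometric mean zero.
  return 0.0 if clipped.min.zero?

  log_precision = clipped.zip(totals).sum { |c, t| Math.log(c.to_f / t) } / max_n
  brevity_penalty = Math.exp([1 - ref_len.to_f / cand_len, 0].min)
  brevity_penalty * Math.exp(log_precision)
end

candidate_corpus = [["My", "full", "pytorch", "test"], ["Another", "Sentence"]]
references_corpus = [[["My", "full", "pytorch", "test"], ["Completely", "Different"]], [["No", "Match"]]]
puts bleu_sketch(candidate_corpus, references_corpus).round(4) # prints 0.8409
```

In this example the first candidate matches a reference exactly and the second matches nothing, so the n-gram precisions are 4/6, 3/4, 2/2 and 1/1, and the brevity penalty is 1 because the candidate and reference lengths are equal.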

NN

Supports:

  • InProjContainer
  • MultiheadAttentionContainer
  • ScaledDotProduct
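As background, scaled dot-product attention computes softmax(Q Kᵀ / √d) V. The plain-Ruby sketch below works on small nested arrays to show the arithmetic only. It does not use the gem or torch tensors, and `softmax_sketch` and `scaled_dot_product_sketch` are hypothetical names, not the library's API.

```ruby
# Hypothetical helper: numerically stable softmax over one row of scores.
def softmax_sketch(row)
  max = row.max
  exps = row.map { |x| Math.exp(x - max) }
  total = exps.sum
  exps.map { |e| e / total }
end

# query: [n_q][d], key: [n_k][d], value: [n_k][d_v], all nested arrays of floats.
# Returns the attended output and the attention weights.
def scaled_dot_product_sketch(query, key, value)
  scale = Math.sqrt(query.first.size)
  weights = query.map do |q|
    scores = key.map { |k| q.zip(k).sum { |a, b| a * b } / scale }
    softmax_sketch(scores)
  end
  output = weights.map do |w|
    value.transpose.map { |col| w.zip(col).sum { |a, b| a * b } }
  end
  [output, weights]
end

query = [[1.0, 0.0]]
key = [[1.0, 0.0], [0.0, 1.0]]
value = [[10.0, 0.0], [0.0, 10.0]]
output, weights = scaled_dot_product_sketch(query, key, value)
p weights # each row sums to 1; the first key gets the larger weight
p output
```

The query is aligned with the first key, so the output leans toward the first value row.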

Vocab

Supports:

  • Vocab
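Conceptually, a vocabulary maps tokens to integer ids and back, with a fallback id for unknown tokens. The sketch below shows that idea in plain Ruby (2.7+ for `tally`). It does not call the gem: `build_vocab_sketch`, the `<unk>` special token, and the frequency-then-alphabetical ordering are assumptions made for illustration.

```ruby
# Hypothetical helper: build token -> id and id -> token tables.
def build_vocab_sketch(sentences, specials: ["<unk>"])
  counts = sentences.flatten.tally
  # Most frequent first; ties broken alphabetically so the order is stable.
  ordered = counts.sort_by { |token, count| [-count, token] }.map(&:first)
  itos = specials + ordered
  stoi = itos.each_with_index.to_h
  [stoi, itos]
end

stoi, itos = build_vocab_sketch([%w[the cat sat], %w[the dog sat down]])
# Tokens missing from the vocabulary fall back to the <unk> id.
ids = %w[the bird sat].map { |token| stoi.fetch(token, stoi["<unk>"]) }
p itos # => ["<unk>", "sat", "the", "cat", "dog", "down"]
p ids  # => [2, 0, 1]
```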

Disclaimer

This library downloads and prepares public datasets. We don’t host any datasets. Be sure to adhere to the license for each dataset.

If you’re a dataset owner and wish to update any details or remove it from this project, let us know.

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/torchtext.git
cd torchtext
bundle install
bundle exec rake test