Bling Fire
Bling Fire - high-speed text tokenization - for Ruby
Installation
Add this line to your application’s Gemfile:
gem 'blingfire'
Getting Started
Create a model
model = BlingFire::Model.new
Tokenize words
model.text_to_words(text)
Tokenize sentences
model.text_to_sentences(text)
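The two calls above can be combined to tokenize a document sentence by sentence. The following is a minimal sketch, not part of the gem: `tokenize_document` is a hypothetical helper, and it assumes that `text_to_sentences` and `text_to_words` each return an array of strings.

```ruby
# Hypothetical helper (not provided by the blingfire gem).
# Splits text into sentences, then splits each sentence into words.
# Assumes both model methods return arrays of strings.
def tokenize_document(model, text)
  model.text_to_sentences(text).map do |sentence|
    model.text_to_words(sentence)
  end
end

# Usage, with the gem installed:
#   require "blingfire"
#   model = BlingFire::Model.new
#   tokenize_document(model, "This is one sentence. This is another.")
```

The helper only relies on the model responding to the two methods shown in this section, so any object with the same interface can be passed in for testing.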
Pre-trained Models
Bling Fire comes with a default model that follows the tokenization logic of NLTK, with a few changes. You can also download other models:
- BERT Base
- BERT Base Cased
- BERT Chinese
- BERT Multilingual Cased
- Laser 100k
- Laser 250k
- Laser 500k
- XLM Roberta
- XLNet
- XLNet No Norm
- WBD
Load a model
model = BlingFire.load_model("bert_base_tok.bin")
Convert text to ids
model.text_to_ids(text)
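Downstream code often needs id sequences of a fixed length. The following is a minimal sketch of one way to do that on top of `text_to_ids`. `encode_fixed_length` is a hypothetical helper, not part of the gem; it assumes that `text_to_ids` returns an array of integers, and the default `pad_id` of 0 is an assumption that should be checked against the vocabulary of the model you load.

```ruby
# Hypothetical helper (not provided by the blingfire gem).
# Encodes text, then truncates or right-pads the ids to max_length entries.
# Assumes model.text_to_ids returns an array of integers.
# pad_id: 0 is an assumed default; use the padding id of your vocabulary.
def encode_fixed_length(model, text, max_length:, pad_id: 0)
  ids = model.text_to_ids(text).first(max_length)
  ids + [pad_id] * (max_length - ids.length)
end

# Usage, with the gem installed and a downloaded model file:
#   require "blingfire"
#   model = BlingFire.load_model("bert_base_tok.bin")
#   encode_fixed_length(model, "Hello world", max_length: 8)
```

Truncation here simply drops ids past `max_length`; whether that is acceptable depends on the model that consumes the ids.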
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/blingfire.git
cd blingfire
bundle install
bundle exec rake vendor:all
bundle exec rake test