Bling Fire

Bling Fire (high-speed text tokenization) for Ruby


Installation

Add this line to your application’s Gemfile:

gem 'blingfire'

Getting Started

Create a model

require "blingfire"

model = BlingFire::Model.new

Tokenize words

model.text_to_words(text)

Tokenize sentences

model.text_to_sentences(text)

Pre-trained Models

BlingFire comes with a default model that follows the tokenization logic of NLTK, with a few changes. You can also download other pre-trained models and load them from a file.

Load a model

model = BlingFire.load_model("bert_base_tok.bin")

Convert text to ids

model.text_to_ids(text)

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/blingfire.git
cd blingfire
bundle install
bundle exec rake vendor:all
bundle exec rake test