Bling Fire

Bling Fire - high-speed text tokenization - for Ruby

Installation

Add this line to your application’s Gemfile:

gem 'blingfire'

Getting Started

Create a model

model = BlingFire::Model.new

Tokenize words

model.text_to_words(text)

Tokenize sentences

model.text_to_sentences(text)

Get offsets for words

words, start_offsets, end_offsets = model.text_to_words_with_offsets(text)

Get offsets for sentences

sentences, start_offsets, end_offsets = model.text_to_sentences_with_offsets(text)
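
The `*_with_offsets` methods above return three parallel arrays, which can be awkward to pass around. A minimal sketch of one way to pair each token with its offsets follows. `Span` and `zip_offsets` are hypothetical helpers, not part of the blingfire gem, and the sample arrays are made-up stand-ins for real model output. The sketch assumes the three arrays have equal length and makes no claim about whether offsets count bytes or characters, or whether the end offset is inclusive.

```ruby
# Hypothetical helper, not part of the blingfire gem.
# Holds one token together with the offsets reported for it.
Span = Struct.new(:token, :start_offset, :end_offset)

# Combine the three parallel arrays into an array of Span records.
# Raises ArgumentError if the arrays do not line up.
def zip_offsets(tokens, start_offsets, end_offsets)
  unless tokens.size == start_offsets.size && tokens.size == end_offsets.size
    raise ArgumentError, "tokens and offsets must have the same length"
  end

  tokens.zip(start_offsets, end_offsets).map { |t, s, e| Span.new(t, s, e) }
end

# Made-up values standing in for the result of
# model.text_to_words_with_offsets(text); real offsets may differ.
words = ["Hello", "world"]
start_offsets = [0, 6]
end_offsets = [5, 11]

spans = zip_offsets(words, start_offsets, end_offsets)
spans.each do |span|
  puts "#{span.token}: #{span.start_offset}, #{span.end_offset}"
end
```

The same helper should work for the sentence variant, since it only depends on the shape of the return value.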

Pre-trained Models

Bling Fire comes with a default model that follows the tokenization logic of NLTK with a few changes. Other pre-trained models can be downloaded separately.

Load a model

model = BlingFire.load_model("bert_base_tok.bin")

Convert text to ids

model.text_to_ids(text)

Get offsets for ids

ids, start_offsets, end_offsets = model.text_to_ids_with_offsets(text)

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/blingfire.git
cd blingfire
bundle install
bundle exec rake vendor:all download:models
bundle exec rake test