
A JRuby wrapper for the Apache OpenNLP tools library that lets you run common natural language processing tasks, such as:

* sentence detection
* tokenization
* part-of-speech tagging
* named entity extraction
* chunks detection
* parsing
* document categorization


Installation

Add this line to your application's Gemfile:

gem 'open_nlp'

And then execute:

$ bundle

Or install it yourself as:

$ gem install open_nlp


Usage

To use the open_nlp classes, require the gem in your sources:

require 'open_nlp'

Then you can create instances of the open_nlp classes and use them for your NLP tasks.

Sentence detection

sentence_detect_model = OpenNlp::Model::SentenceDetector.new("nlp_models/en-sent.bin")
sentence_detector = OpenNlp::SentenceDetector.new(sentence_detect_model)

# get sentences as array of strings
sentence_detector.detect('The red fox sleeps soundly.')

# get array of OpenNlp::Util::Span objects:
sentence_detector.pos_detect('"The sky is blue. The Grass is green."')
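Character-offset spans like the ones pos_detect returns are usually used to slice the original text. The sketch below is plain Ruby and does not call open_nlp. `Span` is a hypothetical stand-in Struct with a `start` offset and an exclusive `stop` offset (the convention used by Apache OpenNLP's Java Span class), and the offsets are written by hand. Check the accessor names on the real span objects before adapting it.

```ruby
# Hypothetical stand-in for a span object: start offset (inclusive) and
# stop offset (exclusive). This is NOT the gem's own class.
Span = Struct.new(:start, :stop)

# Slice the original text into one substring per span.
def slices(text, spans)
  spans.map { |span| text[span.start...span.stop] }
end

text  = 'The sky is blue. The Grass is green.'
spans = [Span.new(0, 16), Span.new(17, 36)] # hand-written offsets for illustration

slices(text, spans).each { |sentence| puts sentence }
```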


Tokenization

token_model = OpenNlp::Model::Tokenizer.new("nlp_models/en-token.bin")
tokenizer = OpenNlp::Tokenizer.new(token_model)
tokenizer.tokenize('The red fox sleeps soundly.')

Part-of-speech tagging

pos_model = OpenNlp::Model::POSTagger.new(File.join("nlp_models/en-pos-maxent.bin"))
pos_tagger = OpenNlp::POSTagger.new(pos_model)

# to tag a string, call OpenNlp::POSTagger#tag with a String argument
pos_tagger.tag('The red fox sleeps soundly.')

# to tag an array of tokens, call OpenNlp::POSTagger#tag with an Array argument
pos_tagger.tag(%w|The red fox sleeps soundly .|)
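When tagging an array of tokens, it is often handy to pair each token with its tag. The following is a minimal sketch in plain Ruby that assumes the tagger yields one tag per token, in the same order. The tags shown are Penn Treebank tags written by hand for illustration, not actual output of the gem.

```ruby
tokens = %w[The red fox sleeps soundly .]
# Illustrative tags written by hand, not actual tagger output.
tags   = %w[DT JJ NN VBZ RB .]

# Pair each token with its tag, then keep only the nouns.
tagged = tokens.zip(tags)
nouns  = tagged.select { |_token, tag| tag.start_with?('NN') }.map(&:first)

puts tagged.map { |token, tag| "#{token}/#{tag}" }.join(' ')
puts nouns.inspect
```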

Chunks detection

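# the chunker also needs tokenizer and pos-tagger models
# because it tokenizes and pos-tags the input as part of the chunk task
chunk_model = OpenNlp::Model::Chunker.new(File.join("nlp_models/en-chunker.bin"))
token_model = OpenNlp::Model::Tokenizer.new("nlp_models/en-token.bin")
pos_model = OpenNlp::Model::POSTagger.new(File.join("nlp_models/en-pos-maxent.bin"))
chunker = OpenNlp::Chunker.new(chunk_model, token_model, pos_model)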
chunker.chunk('The red fox sleeps soundly.')
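Because the chunker depends on three separate model files, it can help to check that they are all present before loading them. The sketch below is plain Ruby and does not call open_nlp. `missing_models` is a hypothetical helper, and the directory and file names are simply the ones used in the examples above.

```ruby
# Return the model files from +names+ that are missing under +dir+.
def missing_models(dir, names)
  names.reject { |name| File.file?(File.join(dir, name)) }
end

required = %w[en-chunker.bin en-token.bin en-pos-maxent.bin]
missing  = missing_models('nlp_models', required)

if missing.empty?
  puts 'All chunker models are present.'
else
  warn "Missing model files in nlp_models: #{missing.join(', ')}"
end
```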


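Parsing

# the parser also needs a tokenizer model because it tokenizes the input as part of the parse task
parse_model = OpenNlp::Model::Parser.new(File.join("nlp_models/en-parser-chunking.bin"))
token_model = OpenNlp::Model::Tokenizer.new("nlp_models/en-token.bin")
parser = OpenNlp::Parser.new(parse_model, token_model)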

# the result will be an instance of OpenNlp::Parser::Parse
parse_info = parser.parse('The red fox sleeps soundly.')

# you can get the tree bank string by calling
parse_info.tree_bank_string

# you can get the code tree structure of the parse result by calling
parse_info.code_tree
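If you want to post-process a tree bank string yourself, one option is to turn it into nested arrays. The sketch below is plain Ruby and assumes the string uses standard Penn Treebank bracket notation. `parse_treebank` and `read_node` are hypothetical helpers, not part of the gem, and the sample string is written by hand rather than taken from real parser output.

```ruby
# Parse a Penn Treebank style bracketed string into nested arrays:
# "(NP (DT The) (NN fox))" => ["NP", ["DT", "The"], ["NN", "fox"]]
def parse_treebank(string)
  tokens = string.scan(/[()]|[^\s()]+/)
  tree = read_node(tokens)
  raise ArgumentError, 'unexpected trailing input' unless tokens.empty?
  tree
end

# Consume one "(label child...)" node from the front of +tokens+.
def read_node(tokens)
  raise ArgumentError, 'expected "("' unless tokens.shift == '('
  node = []
  until tokens.first == ')'
    raise ArgumentError, 'unbalanced brackets' if tokens.empty?
    node << (tokens.first == '(' ? read_node(tokens) : tokens.shift)
  end
  tokens.shift # drop the closing ")"
  node
end

# Hand-written sample string, not actual parser output.
sample = '(S (NP (DT The) (JJ red) (NN fox)) (VP (VBZ sleeps) (ADVP (RB soundly))) (. .))'
tree = parse_treebank(sample)
puts tree.first
puts tree[1].inspect
```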


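Document categorization

doccat_model = OpenNlp::Model::Categorizer.new(File.join("nlp_models/en-doccat.bin"))
categorizer = OpenNlp::Categorizer.new(doccat_model)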
categorizer.categorize("Quick brown fox jumps very bad.")


Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request