Carrot2 Ruby

Ruby client for Carrot2 - the open-source document clustering server

Installation

First, download and run the Carrot2 server. With Homebrew, use:

brew install carrot2
brew services start carrot2

Then add this line to your application’s Gemfile:

gem "carrot2"

The latest version works with Carrot2 4. For Carrot2 3, use version 0.2.1 of this gem and the readme for that version.

How to Use

To cluster documents, use:

documents = [
  "Sign up for an exclusive coupon.",
  "Exclusive members get a free coupon.",
  "Coupons are going fast.",
  "This is completely unrelated to the other documents."
]

carrot2 = Carrot2::Client.new
carrot2.cluster(documents)

This returns:

{
  "clusters" => [
    {
      "labels" => ["Coupon"],
      "documents" => [0, 1, 2],
      "clusters" => [],
      "score" => 0.06418006702675011
    },
    {
      "labels" => ["Exclusive"],
      "documents" => [0, 1],
      "clusters" => [],
      "score" => 0.7040290701763807
    }
  ]
}

Documents are numbered in the order provided, starting with 0.
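
Because clusters refer to documents by index, a common follow-up step is to map those indexes back to the original strings. The following is a minimal sketch that works on the sample response shown above, so it does not need a running server. The helper names `labeled_documents` and `unclustered_indexes` are hypothetical and not part of the gem, and the sketch only looks at top-level clusters, not nested ones under "clusters".

```ruby
# Sample input and response copied from the example above.
documents = [
  "Sign up for an exclusive coupon.",
  "Exclusive members get a free coupon.",
  "Coupons are going fast.",
  "This is completely unrelated to the other documents."
]

response = {
  "clusters" => [
    {"labels" => ["Coupon"], "documents" => [0, 1, 2], "clusters" => [], "score" => 0.06418006702675011},
    {"labels" => ["Exclusive"], "documents" => [0, 1], "clusters" => [], "score" => 0.7040290701763807}
  ]
}

# Hypothetical helper: map each top-level cluster label to the text of its documents.
def labeled_documents(response, documents)
  response["clusters"].to_h do |cluster|
    [cluster["labels"].join(", "), documents.values_at(*cluster["documents"])]
  end
end

# Hypothetical helper: indexes that do not appear in any top-level cluster.
def unclustered_indexes(response, documents)
  clustered = response["clusters"].flat_map { |cluster| cluster["documents"] }
  (0...documents.size).to_a - clustered
end

grouped = labeled_documents(response, documents)
leftover = unclustered_indexes(response, documents)

# The cluster with the highest score.
strongest = response["clusters"].max_by { |cluster| cluster["score"] }
```

Here `grouped` is keyed by label, `leftover` holds the index of the unrelated fourth document, and `strongest` is the "Exclusive" cluster. A document can belong to more than one cluster, as documents 0 and 1 do in this response.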

Specify a language with:

carrot2.cluster(documents, language: "French")

Specify an algorithm with:

carrot2.cluster(documents, algorithm: "Lingo")

Get a list of supported languages and algorithms with:

carrot2.list

Specify parameters with:

parameters = {
  preprocessing: {
    phraseDfThreshold: 1,
    wordDfThreshold: 1
  }
}
carrot2.cluster(documents, parameters: parameters)

See supported parameters for Lingo, STC, and Bisecting K-Means.

Specify a template with:

carrot2.cluster(documents, template: "lingo")

Configuration

To specify the Carrot2 server, set ENV["CARROT2_URL"] or use:

Carrot2::Client.new(url: "http://localhost:8080")

Set timeouts with:

Carrot2::Client.new(open_timeout: 3, read_timeout: 5)
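
If you prefer to keep all of these settings in the environment, one possible approach is to build the keyword arguments in a small helper and pass them to the client. This is only a sketch: `carrot2_options` is a hypothetical helper, and `CARROT2_OPEN_TIMEOUT` and `CARROT2_READ_TIMEOUT` are made-up variable names that the gem itself does not read. Only `CARROT2_URL` comes from this readme.

```ruby
# Hypothetical helper: builds keyword arguments for Carrot2::Client.new
# from a hash-like environment. Keys that are not set are left out.
# CARROT2_OPEN_TIMEOUT and CARROT2_READ_TIMEOUT are assumed names,
# chosen for this example only.
def carrot2_options(env = ENV)
  options = {}
  options[:url] = env["CARROT2_URL"] if env["CARROT2_URL"]
  options[:open_timeout] = Float(env["CARROT2_OPEN_TIMEOUT"]) if env["CARROT2_OPEN_TIMEOUT"]
  options[:read_timeout] = Float(env["CARROT2_READ_TIMEOUT"]) if env["CARROT2_READ_TIMEOUT"]
  options
end

# A plain hash stands in for ENV so the example runs anywhere.
options = carrot2_options(
  "CARROT2_URL" => "http://localhost:8080",
  "CARROT2_OPEN_TIMEOUT" => "3",
  "CARROT2_READ_TIMEOUT" => "5"
)

# With the gem loaded, the result could be passed along like this:
# carrot2 = Carrot2::Client.new(**options)
```

`Float` raises an `ArgumentError` on a value such as "fast", so a malformed timeout fails at startup rather than being silently ignored.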

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/carrot2-ruby.git
cd carrot2-ruby
bundle install
bundle exec rake test