S3Search

S3Search is an add-on that provides powerful document-based full-text indexing and search for your application.

Adding S3Search to your application allows you to search your documents based on the actual text within them, as well as on any metadata fields you assign to them. Yes, that's right! S3Search indexes the text inside your documents.

Do you already have documents stored and want to index them and make them searchable? No worries. S3Search works by you sending it a URL from which to fetch the document content, along with a hash of metadata attributes to record against it. You can then perform powerful queries against that indexed data, using the rich features of Elasticsearch.

S3Search is a Heroku add-on.

Installation

Add this line to your application's Gemfile:

    gem 's3search'

And then execute:

    $ bundle

Provisioning the add-on

S3Search can be attached to a Heroku application via the CLI:

    $ heroku addons:add s3search
    -----> Adding s3search to sharp-mountain-4005... done, v18 (free)

A list of all plans available can be found [here](http://addons.heroku.com/s3search).

Once S3Search has been added, an S3SEARCH_URL setting will be available in the app configuration. It contains the custom URL used to access the newly provisioned S3Search service instance. This can be confirmed using the heroku config:get command.

    $ heroku config:get S3SEARCH_URL
    https://user:[email protected]
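
The client gem presumably reads S3SEARCH_URL on its own. If you want to log which S3Search endpoint an app is talking to without leaking the password, a minimal sketch using only Ruby's standard uri library could look like this. The method name redacted_s3search_url is hypothetical and not part of the s3search gem.

```ruby
require 'uri'

# Hypothetical helper (not part of the s3search gem): returns the S3Search URL
# with its password masked so it can be written to logs safely.
def redacted_s3search_url(url = ENV.fetch('S3SEARCH_URL'))
  uri = URI.parse(url)
  # Replace only the password; scheme, user and host are kept for debugging.
  uri.password = 'REDACTED' if uri.password
  uri.to_s
end
```

For example, an initializer could pass the result to Rails.logger.info at boot.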

After provisioning S3Search, configure the application to integrate fully with the add-on.

Using with Rails 3.x

Ruby on Rails applications need to add the following entry to their Gemfile to specify the S3Search client library.

    gem 's3search'

Update application dependencies with bundler.

    $ bundle install

Next, write some application code to index documents. Use the special field _content_url to specify a location from which S3Search should download content. S3Search downloads that content and makes it searchable via the special field _document_content.

    S3Search.create title: 'MyDocument', _content_url: 'https://s3-us-east-1.amazonaws.com/my_bucket/my_document.pdf'
    S3Search.create name: 'Bob Lob Law', resume_id: 25, _content_url: 'https://s3-us-east-1.amazonaws.com/resumes.mycompany.com/bob.pdf'
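
If you already keep a list of documents stored in S3, indexing them is a loop over those records. A minimal sketch, assuming each record is a hash with hypothetical :title, :bucket and :key entries and that the keys are already URL-safe. The indexer is injected so the sketch runs without the gem; in an application you could pass S3Search.method(:create).

```ruby
# Minimal sketch: index documents that already live in S3.
# Each record is assumed to be a hash with hypothetical :title, :bucket and
# :key entries; keys are assumed to be URL-safe (no escaping is done here).
# `indexer` is anything responding to #call, e.g. S3Search.method(:create).
def index_existing_documents(records, indexer, region: 'us-east-1')
  records.map do |record|
    # Same path-style URL layout as the README examples.
    url = "https://s3-#{region}.amazonaws.com/#{record.fetch(:bucket)}/#{record.fetch(:key)}"
    indexer.call(title: record.fetch(:title), _content_url: url)
  end
end
```

In an app this might be called as index_existing_documents(stored_docs, S3Search.method(:create)), where stored_docs is your own collection.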

The documents don't even need to be in S3; any URL that S3Search can download from will work, including one with credentials embedded in it.

    S3Search.create name: 'Bitcoin Pirate', resume_id: 42, _content_url: 'https://user:[email protected]/docs/jenny.pdf'
    S3Search.create title: 'Bitcoin', author: '[email protected]', tags: ['bitcoin', 'manifesto'], _content_url: 'http://bitcoin.org/bitcoin.pdf'
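
When the document server requires HTTP basic auth, as in the first example above, the credentials travel inside _content_url. Characters such as @ or : in a password would otherwise break the URL, so it is worth percent-encoding them. A sketch using only the standard library; authenticated_content_url and the host name in the test are hypothetical.

```ruby
require 'erb'

# Hypothetical helper: builds an https content URL with basic-auth credentials
# embedded in the userinfo part. ERB::Util.url_encode percent-encodes every
# character except letters, digits and - . _ ~ so reserved characters are safe.
def authenticated_content_url(host, path, user, password)
  userinfo = "#{ERB::Util.url_encode(user)}:#{ERB::Util.url_encode(password)}"
  "https://#{userinfo}@#{host}#{path}"
end
```

The result can then be passed as the _content_url value to S3Search.create.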

The documents don't even need to be documents! You can use S3Search's powerful search capability over your custom metadata alone.

    S3Search.create customer_id: 32, first_name: 'His Holiness', last_name: 'The Dalai Lama', religion: 'Buddhist', twitter_handle: '@DalaiLama'
    S3Search.create customer_id: 99, first_name: 'George', middle_name: 'R. R.', last_name: 'Martin', job_title: 'Author'
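
In a real application these attributes usually come from an existing model. A minimal sketch of turning a record into the metadata hash; Customer is a stand-in Struct for your own model class, and dropping nil attributes with Hash#compact is a choice of this sketch, not a documented S3Search requirement.

```ruby
# Stand-in for your own model (e.g. an ActiveRecord class); hypothetical.
Customer = Struct.new(:id, :first_name, :middle_name, :last_name, :job_title, keyword_init: true)

# Builds the metadata hash for a metadata-only entry (no _content_url).
# Nil attributes are dropped so they are not recorded as empty fields.
def metadata_for(customer)
  {
    customer_id: customer.id,
    first_name: customer.first_name,
    middle_name: customer.middle_name,
    last_name: customer.last_name,
    job_title: customer.job_title
  }.compact
end
```

The hash could then be indexed with S3Search.create(**metadata_for(customer)).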

Now retrieve some documents via the query API.

Search by a single metadata field.

    results = S3Search.search('title:MyDocument')
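
The query string appears to follow Elasticsearch's Lucene-style query_string syntax (field:value). When the value comes from user input and may contain spaces or quotes, one way to build a field query safely is to quote it as a phrase and escape backslashes and double quotes. A sketch; field_query is hypothetical and assumes S3Search passes the string through to Elasticsearch unchanged.

```ruby
# Hypothetical helper: builds a field-scoped query string.
# The value is wrapped in double quotes (a phrase query), so only backslashes
# and double quotes inside it need to be escaped with a backslash.
def field_query(field, value)
  escaped = value.to_s.gsub(/["\\]/) { |c| "\\#{c}" }
  %(#{field}:"#{escaped}")
end
```

For example, results = S3Search.search(field_query('name', 'Bob Lob Law')).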

Search all metadata fields AND the content of the documents.

    results = S3Search.search('bitcoin')

Search only the content of the documents.

    results = S3Search.search('_document_content:bitcoin')
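
Field-scoped clauses can presumably be combined with the boolean operators of the same query-string syntax (uppercase AND, OR, NOT). A hypothetical helper that ANDs several clauses together, where a nil field means an unscoped term searched across all fields. It assumes each term is a single word without special characters.

```ruby
# Hypothetical helper: joins [field, term] pairs into one query string.
# A nil field produces a bare term, which searches every field.
def all_of(clauses)
  clauses.map { |field, term| field ? "#{field}:#{term}" : term }.join(' AND ')
end
```

For example, S3Search.search(all_of([[:_document_content, 'bitcoin'], [:tags, 'manifesto']])) would restrict a content match to documents tagged manifesto.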

Boost the search ranking of a certain field.

    results = S3Search.search('bitcoin', boost: { title: 2.5 })
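
In Elasticsearch, a field boost is written as field^factor, and a factor of 2.5 makes matches in that field count roughly 2.5 times as much toward the relevance score as an unboosted field. Assuming the boost: option maps onto that notation (the gem's internals are not documented here), the translation is a one-liner; boosted_fields is hypothetical.

```ruby
# Sketch: converts a boost hash into Elasticsearch "field^factor" strings,
# the form used in the fields list of a query_string query.
def boosted_fields(boost)
  boost.map { |field, factor| "#{field}^#{factor}" }
end
```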

Find a single document based on its unique id.

    document = S3Search.get '833FCA4EEEF2943AC2D8E0'

Monitoring & Logging

Stats and the current state of S3Search can be displayed via the CLI.

    $ heroku s3search:status
    documents_indexed: 32842
    index_size: 640MB
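
If you want to feed these numbers into your own monitoring, the key: value lines above are easy to parse. A minimal sketch, assuming the output format shown (one pair per line, sizes with a KB/MB/GB suffix interpreted as binary units); parse_status is hypothetical, and you would pass it the command's output, for example captured with Ruby's backtick operator.

```ruby
# Binary multipliers assumed for the size suffixes.
UNITS = { 'KB' => 1024, 'MB' => 1024**2, 'GB' => 1024**3 }.freeze

# Hypothetical helper: parses "key: value" lines into a hash with symbol keys.
# Plain integers become Integers, sizes become byte counts, anything else stays a String.
def parse_status(output)
  output.each_line.with_object({}) do |line, stats|
    key, value = line.strip.split(/:\s*/, 2)
    next unless key && value # skip blank or malformed lines

    stats[key.to_sym] =
      if value.match?(/\A\d+\z/)
        Integer(value, 10)
      elsif (m = value.match(/\A(\d+(?:\.\d+)?)\s*([KMG]B)\z/))
        (Float(m[1]) * UNITS.fetch(m[2])).round
      else
        value
      end
  end
end
```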

S3Search activity can be observed within the Heroku log-stream.

    $ heroku logs -t | grep 's3search'

Dashboard

The S3Search dashboard allows you to view the current status of your S3Search cluster. For more information on the features available within the S3Search dashboard, please see the docs at [heroku.s3searchapp.com/docs](heroku.s3searchapp.com/docs).

The dashboard can be accessed via the CLI:

    $ heroku addons:open s3search
    Opening s3search for sharp-mountain-4005…

or by visiting the Heroku apps web interface, selecting the application in question, and then selecting S3Search from the Add-ons menu.

Migrating between plans

Application owners should carefully manage the migration timing to ensure proper application function during the migration process.

Use the heroku addons:upgrade command to migrate to a new plan.

    $ heroku addons:upgrade s3search:newplan
    -----> Upgrading s3search:newplan to sharp-mountain-4005... done, v18 ($49/mo)
           Your plan has been updated to: s3search:newplan

Removing the add-on

S3Search can be removed via the CLI.

This will destroy all metadata and indexes stored in S3Search and cannot be undone! Of course, documents indexed in S3Search but stored elsewhere will remain untouched.

    $ heroku addons:remove s3search
    -----> Removing s3search from sharp-mountain-4005... done, v20 (free)

Before removing S3Search, a data export can be requested by contacting [email protected] directly.

Support

All S3Search support and runtime issues should be submitted via one of the Heroku Support channels. Any non-support-related issues or product feedback are welcome at [email protected].

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request