RSolr
A Ruby client for Apache Solr. Has transparent JRuby support by using “org.apache.solr.servlet.DirectSolrConnection” as a connection adapter.
Installation:
gem sources -a http://gems.github.com
sudo gem install mwmitchell-rsolr
Simple usage:
require 'rubygems'
require 'rsolr'
rsolr = RSolr.connect(:http)
response = rsolr.query(:q=>'*:*')
To run tests:
Copy an Apache Solr 1.3.0/or later (http://apache.seekmeup.com/lucene/solr/1.3.0/) distribution into this directory and rename to "apache-solr"
Start Solr HTTP: rake rsolr:start_test_server
MRI Ruby: rake
JRuby: jruby -S rake
To get a connection in MRI/standard Ruby:
solr = RSolr.connect(:http)
To get a direct connection (no http) in jRuby using DirectSolrConnection:
solr = RSolr.connect(:direct, :home_dir=>'/path/to/solr/home', :dist_dir=>'/path/to/solr/distribution')
You can set RSolr params that will be sent on every request:
solr = RSolr.connect(:http, :global_params=>{:wt=>:ruby, :echoParams=>'EXPLICIT'})
Requests
Once you have a connection, you can execute queries, updates etc..
Querying
response = solr.query(:q=>'washington', :facet=>true, :facet.limit=>-1, :facet.field=>'cat', :facet.field=>'inStock')
response = solr.find_by_id(1)
Thanks to a little Ruby magic, we can chain symbols to create Solr “dot” syntax: :facet.field=>‘cat’
Using the #search method makes building more complex Solr queries easier:
response = solr.search 'my search', :filters=>{:price=>(0.00..10.00)}
response.docs.each do |doc|
doc[:price]
end
Pagination
Pagination is simplified by using the :page and :per_page params:
response = solr.query(:page=>1, :per_page=>10, :q=>'*:*')
response.per_page
response.total_pages
response.current_page
response.previous_page
response.next_page
If you use WillPaginate, just pass-in the response to the #will_paginate view helper:
<%= will_paginate(@response) %>
The #query and #search methods automatically figure out the :start and :rows value, based on the values of :page and :per_page. The will_paginate view helper just needs the right methods (#current_page, #previous_page, #next_page and #total_pages) to create the pagination view widget.
Updating Solr
Updating is done using native Ruby structures. Hashes are used for single documents and arrays are used for a collection of documents (hashes). These structures get turned into simple XML “messages”.
Single document
response = solr.add(:id=>1, :price=>1.00)
Multiple documents
response = solr.add([{:id=>1, :price=>1.00}, {:id=>2, :price=>10.50}])
When adding, you can also supply “add” attributes and/or a block for digging into the Solr “add” params:
doc = {:id=>1, :price=>1.00}
solr.add(doc, {:allowDups=>false, :commitWithin=>10.0}) do |doc_attrs|
doc_attrs[:boost] = 10.0
end
Delete by id
response = solr.delete_by_id(1)
or an array of ids
response = solr.delete_by_id([1, 2, 3, 4])
Delete by query:
response = solr.delete_by_query('price:1.00')
Delete by array of queries
response = solr.delete_by_query(['price:1.00', 'price:10.00'])
Commit & Optimize
solr.commit
solr.optimize
Response Formats
The default response format is Ruby. When the :wt param is set to :ruby, the response is eval’d and wrapped up in a nice RSolr::Response class. You can get an unwrapped response by setting the :wt to “ruby” - notice, the string – not a symbol. All other response formats are available as expected, :wt=>‘xml’ etc.. Currently, the only response format that gets eval’d and wrapped is :ruby.
You can access the original request context (path, params, url etc.) from response.request. The response.request is a hash that contains the generated params, url, path, post data, headers etc., very useful for debugging and testing.
Data Mapping
The RSolr::Mapper::Base class provides some nice ways of mapping data. You provide a hash mapping and a “data source”. The keys of the hash mapping become the Solr field names. The values of the hash mapping get processed differently based on the value. The data source must be an Enumerable type object. The hash mapping is processed for each item in the data source.
Hash Map Processing
If the value is a string, the value of the String is used as the final Solr field value. If the value is a Symbol, the Symbol is used as a key on the data source. An Enumerable type does the same as the Symbol, but for each item in the set. The most interesting and flexible processing occurs when the value is a Proc. When a Proc is used as a hash mapping value, the mapper executes the Proc’s #call method, passing in the current data source item.
Examples
mapping = {
:id=>:id,
:title=>:title,
:source=>'Example',
:meta=>[:author, :sub_title],
:web_id=>proc {|item|
WebService.fetch_item_id_by_name(item[:name])
}
}
data_source = [
{
:id=>100,
:title=>'Doc One',
:author=>'Mr. X',
:sub_title=>'A first class document.',
:name=>'doc_1'
},
{
:id=>200,
:title=>'Doc Two',
:author=>'Mr. XYZ',
:sub_title=>'A second class document.',
:name=>'doc_2'
}
]
mapper = RSolr::Mapper::Base(mapping)
mapped_data = mapper.map(data_source)
# the following would be true...
mapped_data == [
{
:id=>100,
:title=>'Doc One',
:source=>'Example',
:meta=>['Mr. X', 'A first class document'],
:web_id=>'web_id_for_doc_1_for_example'
},
{
:id=>200,
:title=>'Doc Two',
:source=>'Example',
:meta=>['Mr. XYZ', 'A second class document'],
:web_id=>'web_id_for_doc_2_for_example'
}
]
RSS Mapper
There is currently one built in mapper, RSolr::Mapper::RSS. Here’s an example usage:
mapper = RSolr::Mapper::RSS.new
mapping = {
:channel=>:'channel.title',
:url=>:'channel.link',
:total=>:'items.size',
:title=>proc {|item,m| item.title },
:link=>proc {|item,m| item.link },
:published=>proc {|item,m| item.date },
:description=>proc {|item,m| item.description }
}
mapped_data = m.map('http://site.com/feed.rss')
Indexing
RSolr comes with a simple indexer that makes use of the Solr mapper. Here’s an example, using the “mapping” and “mapped_data” variables above (RSS mapper):
solr = RSolr.connect(:http)
i = RSolr::Indexer.new(solr, mapping)
i.index(mapped_data)
HTTP Client Adapter
You can specify the http client adapter to use by setting RSolr::Connection::Adapter::HTTP.client_adapter to one of:
:net_http uses the standard Net::HTTP library
:curb uses the Ruby "curl" bindings
Example:
RSolr::Connection::Adapter::HTTP.client_adapter = :curb
Example of using the HTTP client only:
hclient = RSolr::HTTPClient.connect(url, :curb)
hclient = RSolr::HTTPClient.connect(url, :net_http)
After reading this apocryph.org/2008/11/09/more_indepth_analysis_ruby_http_client_performance/ - I would recommend using the :curb adapter. NOTE: You can’t use the :curb adapter under jRuby. To install curb:
sudo gem install curb