Class: Riddle::Client

Inherits:
Object
Defined in:
lib/riddle/client.rb,
lib/riddle/client/filter.rb,
lib/riddle/client/message.rb,
lib/riddle/client/response.rb

Overview

This class was heavily based on the existing Client API by Dmytro Shteflyuk and Alexy Kovyrin. Their code worked fine; I just wanted something a bit more Ruby-ish (i.e. lowercase and underscored method names). I have also used a few helper classes, just to neaten things up.

Feel free to use it wherever. Send bug reports, patches, comments and suggestions to pat at freelancing-gods dot com.

Most properties of the client are accessible through attribute accessors and, where relevant, use symbols instead of the long constants common in other clients. Some examples:

client.sort_mode  = :extended
client.sort_by    = "birthday DESC"
client.match_mode = :extended

To add a filter, you will need to create a Filter object:

client.filters << Riddle::Client::Filter.new("birthday",
  Time.local(1975, 1, 1).to_i..Time.local(1985, 1, 1).to_i, false)
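
A range filter over a timestamp attribute is an integer range of epoch seconds. The sketch below builds one with Time.utc, which is an assumption made here so the values do not depend on the machine's time zone; Time.local works the same way. The Riddle call is left as a comment so the snippet runs without a Sphinx daemon.

```ruby
# Build the integer range for a timestamp filter (epoch seconds).
# Time.utc is an assumption for illustration; use Time.local if your
# attribute values were indexed in local time.
from = Time.utc(1975, 1, 1).to_i
to   = Time.utc(1985, 1, 1).to_i
birthday_range = from..to

# client.filters << Riddle::Client::Filter.new("birthday", birthday_range, false)

puts birthday_range.first # 157766400
puts birthday_range.last  # 473385600
```

Note that Time.at takes a number of seconds since the epoch, not a year, month and day, which is why the calendar constructors are used here.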

Defined Under Namespace

Classes: Filter, Message, Response

Constant Summary

Commands =
{
  :search   => 0, # SEARCHD_COMMAND_SEARCH
  :excerpt  => 1, # SEARCHD_COMMAND_EXCERPT
  :update   => 2, # SEARCHD_COMMAND_UPDATE
  :keywords => 3  # SEARCHD_COMMAND_KEYWORDS
}
Versions =
{
  :search   => 0x113, # VER_COMMAND_SEARCH
  :excerpt  => 0x100, # VER_COMMAND_EXCERPT
  :update   => 0x101, # VER_COMMAND_UPDATE
  :keywords => 0x100  # VER_COMMAND_KEYWORDS
}
Statuses =
{
  :ok      => 0, # SEARCHD_OK
  :error   => 1, # SEARCHD_ERROR
  :retry   => 2, # SEARCHD_RETRY
  :warning => 3  # SEARCHD_WARNING
}
MatchModes =
{
  :all        => 0, # SPH_MATCH_ALL
  :any        => 1, # SPH_MATCH_ANY
  :phrase     => 2, # SPH_MATCH_PHRASE
  :boolean    => 3, # SPH_MATCH_BOOLEAN
  :extended   => 4, # SPH_MATCH_EXTENDED
  :fullscan   => 5, # SPH_MATCH_FULLSCAN
  :extended2  => 6  # SPH_MATCH_EXTENDED2
}
RankModes =
{
  :proximity_bm25 => 0, # SPH_RANK_PROXIMITY_BM25
  :bm25           => 1, # SPH_RANK_BM25
  :none           => 2, # SPH_RANK_NONE
  :wordcount      => 3  # SPH_RANK_WORDCOUNT
}
SortModes =
{
  :relevance     => 0, # SPH_SORT_RELEVANCE
  :attr_desc     => 1, # SPH_SORT_ATTR_DESC
  :attr_asc      => 2, # SPH_SORT_ATTR_ASC
  :time_segments => 3, # SPH_SORT_TIME_SEGMENTS
  :extended      => 4, # SPH_SORT_EXTENDED
  :expr          => 5  # SPH_SORT_EXPR
}
AttributeTypes =
{
  :integer    => 1, # SPH_ATTR_INTEGER
  :timestamp  => 2, # SPH_ATTR_TIMESTAMP
  :ordinal    => 3, # SPH_ATTR_ORDINAL
  :bool       => 4, # SPH_ATTR_BOOL
  :float      => 5, # SPH_ATTR_FLOAT
  :multi      => 0x40000000 # SPH_ATTR_MULTI
}
GroupFunctions =
{
  :day      => 0, # SPH_GROUPBY_DAY
  :week     => 1, # SPH_GROUPBY_WEEK
  :month    => 2, # SPH_GROUPBY_MONTH
  :year     => 3, # SPH_GROUPBY_YEAR
  :attr     => 4, # SPH_GROUPBY_ATTR
  :attrpair => 5  # SPH_GROUPBY_ATTRPAIR
}
FilterTypes =
{
  :values       => 0, # SPH_FILTER_VALUES
  :range        => 1, # SPH_FILTER_RANGE
  :float_range  => 2  # SPH_FILTER_FLOATRANGE
}

Constructor Details

#initialize(server = nil, port = nil) ⇒ Client

Instantiate with a specific server and port; otherwise the client assumes defaults of localhost and 3312 respectively. All other settings can be accessed and changed via the attribute accessors.



# File 'lib/riddle/client.rb', line 113

def initialize(server=nil, port=nil)
  @server = server || "localhost"
  @port   = port   || 3312
  
  reset
  
  @queue = []
end

Instance Attribute Details

#anchor ⇒ Object

Returns the value of attribute anchor.

# File 'lib/riddle/client.rb', line 103

def anchor
  @anchor
end

#cut_off ⇒ Object

Returns the value of attribute cut_off.

# File 'lib/riddle/client.rb', line 103

def cut_off
  @cut_off
end

#field_weights ⇒ Object

Returns the value of attribute field_weights.

# File 'lib/riddle/client.rb', line 103

def field_weights
  @field_weights
end

#filters ⇒ Object

Returns the value of attribute filters.

# File 'lib/riddle/client.rb', line 103

def filters
  @filters
end

#group_by ⇒ Object

Returns the value of attribute group_by.

# File 'lib/riddle/client.rb', line 103

def group_by
  @group_by
end

#group_clause ⇒ Object

Returns the value of attribute group_clause.

# File 'lib/riddle/client.rb', line 103

def group_clause
  @group_clause
end

#group_distinct ⇒ Object

Returns the value of attribute group_distinct.

# File 'lib/riddle/client.rb', line 103

def group_distinct
  @group_distinct
end

#group_function ⇒ Object

Returns the value of attribute group_function.

# File 'lib/riddle/client.rb', line 103

def group_function
  @group_function
end

#id_range ⇒ Object

Returns the value of attribute id_range.

# File 'lib/riddle/client.rb', line 103

def id_range
  @id_range
end

#index_weights ⇒ Object

Returns the value of attribute index_weights.

# File 'lib/riddle/client.rb', line 103

def index_weights
  @index_weights
end

#limit ⇒ Object

Returns the value of attribute limit.

# File 'lib/riddle/client.rb', line 103

def limit
  @limit
end

#match_mode ⇒ Object

Returns the value of attribute match_mode.

# File 'lib/riddle/client.rb', line 103

def match_mode
  @match_mode
end

#max_matches ⇒ Object

Returns the value of attribute max_matches.

# File 'lib/riddle/client.rb', line 103

def max_matches
  @max_matches
end

#max_query_time ⇒ Object

Returns the value of attribute max_query_time.

# File 'lib/riddle/client.rb', line 103

def max_query_time
  @max_query_time
end

#offset ⇒ Object

Returns the value of attribute offset.

# File 'lib/riddle/client.rb', line 103

def offset
  @offset
end

#port ⇒ Object

Returns the value of attribute port.

# File 'lib/riddle/client.rb', line 103

def port
  @port
end

#queue ⇒ Object (readonly)

Returns the value of attribute queue.

# File 'lib/riddle/client.rb', line 108

def queue
  @queue
end

#rank_mode ⇒ Object

Returns the value of attribute rank_mode.

# File 'lib/riddle/client.rb', line 103

def rank_mode
  @rank_mode
end

#retry_count ⇒ Object

Returns the value of attribute retry_count.

# File 'lib/riddle/client.rb', line 103

def retry_count
  @retry_count
end

#retry_delay ⇒ Object

Returns the value of attribute retry_delay.

# File 'lib/riddle/client.rb', line 103

def retry_delay
  @retry_delay
end

#server ⇒ Object

Returns the value of attribute server.

# File 'lib/riddle/client.rb', line 103

def server
  @server
end

#sort_by ⇒ Object

Returns the value of attribute sort_by.

# File 'lib/riddle/client.rb', line 103

def sort_by
  @sort_by
end

#sort_mode ⇒ Object

Returns the value of attribute sort_mode.

# File 'lib/riddle/client.rb', line 103

def sort_mode
  @sort_mode
end

#timeout ⇒ Object

Returns the value of attribute timeout.

# File 'lib/riddle/client.rb', line 103

def timeout
  @timeout
end

#weights ⇒ Object

Returns the value of attribute weights.

# File 'lib/riddle/client.rb', line 103

def weights
  @weights
end

Instance Method Details

#append_query(search, index = '*', comments = '') ⇒ Object

Append a query to the queue. This uses the same parameters as the query method.



# File 'lib/riddle/client.rb', line 173

def append_query(search, index = '*', comments = '')
  @queue << query_message(search, index, comments)
end

#excerpts(options = {}) ⇒ Object

Build excerpts from search terms (the words) and the text of documents. Excerpts are bodies of text that have the words highlighted. They may also be abbreviated to fit within a word limit.

As part of the options hash, you will need to define:

  • :docs

  • :words

  • :index

Optional settings include:

  • :before_match (defaults to <span class="match">)

  • :after_match (defaults to </span>)

  • :chunk_separator (defaults to ' &#8230; ' - which is an HTML ellipsis)

  • :limit (defaults to 256)

  • :around (defaults to 5)

  • :exact_phrase (defaults to false)

  • :single_passage (defaults to false)

The defaults differ from the official PHP client, as I’ve opted for semantic HTML markup.

Example:

client.excerpts(:docs => ["Pat Allan, Pat Cash"], :words => 'Pat', :index => 'pats')
#=> ["<span class=\"match\">Pat</span> Allan, <span class=\"match\">Pat</span> Cash"]

lorem_lipsum = "Lorem ipsum dolor..."

client.excerpts(:docs => ["Pat Allan, #{lorem_lipsum} Pat Cash"], :words => 'Pat', :index => 'pats')
#=> ["<span class=\"match\">Pat</span> Allan, Lorem ipsum dolor sit amet, consectetur adipisicing
       elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua &#8230; . Excepteur 
       sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est 
       laborum. <span class=\"match\">Pat</span> Cash"]

Workflow:

Excerpt creation is completely isolated from searching the index. The nominated index is only used to discover encoding and charset information.

Therefore, the workflow goes:

  1. Do the sphinx query.

  2. Fetch the documents found by sphinx from their repositories.

  3. Pass the documents’ text to excerpts for marking up of matched terms.
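
The three steps above can be sketched as a single helper. Everything here other than query and excerpts is hypothetical: highlighted_matches, fetch_text and StubClient are names invented for the illustration, and the stub only imitates the shape of the return values described on this page so the snippet runs without a Sphinx daemon.

```ruby
# Hypothetical helper tying the three steps together. `fetch_text` is any
# callable that maps a document id to its text (for example a database lookup).
def highlighted_matches(client, search, index, fetch_text)
  results = client.query(search, index)                 # 1. query Sphinx
  ids     = results[:matches].map { |match| match[:doc] }
  return {} if ids.empty?

  docs     = ids.map { |id| fetch_text.call(id) }       # 2. fetch the documents
  excerpts = client.excerpts(                           # 3. mark up matched terms
    :docs => docs, :words => search, :index => index
  )
  Hash[ids.zip(excerpts)]
end

# Stand-in for Riddle::Client (an assumption for this sketch only).
class StubClient
  def query(search, index = '*', comments = '')
    { :matches => [{ :doc => 1, :weight => 1, :attributes => {} }] }
  end

  def excerpts(options = {})
    options[:docs].map do |doc|
      doc.gsub(options[:words], "<span class=\"match\">#{options[:words]}</span>")
    end
  end
end

texts = { 1 => "Pat Allan, Pat Cash" }
p highlighted_matches(StubClient.new, "Pat", "pats", texts.method(:fetch))
```

With a real client, pass a Riddle::Client instance in place of the stub; excerpts come back in the same order as the :docs array, which is what lets the ids be zipped back on.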



# File 'lib/riddle/client.rb', line 335

def excerpts(options = {})
  options[:index]           ||= '*'
  options[:before_match]    ||= '<span class="match">'
  options[:after_match]     ||= '</span>'
  options[:chunk_separator] ||= ' &#8230; ' # ellipsis
  options[:limit]           ||= 256
  options[:around]          ||= 5
  options[:exact_phrase]    ||= false
  options[:single_passage]  ||= false
  
  response = Response.new request(:excerpt, excerpts_message(options))
  
  options[:docs].collect { response.next }
end

#keywords(query, index, return_hits = false) ⇒ Object

Generates a keyword list for a given query. Each keyword is represented by a hash, with keys :tokenised and :normalised. If return_hits is set to true it will also report on the number of hits and documents for each keyword (see :hits and :docs keys respectively).
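
As a rough illustration of working with the returned array, the sketch below ranks keywords from rarest to most common. The sample array stands in for the return value of a keywords call with return_hits set to true; its numbers are made up.

```ruby
# Illustrative return value of client.keywords("pat allan", "pats", true);
# the numbers are made up for the example.
keywords = [
  { :tokenised => "pat",   :normalised => "pat",   :docs => 12, :hits => 15 },
  { :tokenised => "allan", :normalised => "allan", :docs => 3,  :hits => 4 }
]

# Rank the normalised keywords from rarest to most common by document count.
rarest_first = keywords.sort_by { |keyword| keyword[:docs] }
                       .map { |keyword| keyword[:normalised] }

p rarest_first
```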



# File 'lib/riddle/client.rb', line 372

def keywords(query, index, return_hits = false)
  response = Response.new request(
    :keywords,
    keywords_message(query, index, return_hits)
  )
  
  (0...response.next_int).collect do
    hash = {}
    hash[:tokenised]  = response.next
    hash[:normalised] = response.next
    
    if return_hits
      hash[:docs] = response.next_int
      hash[:hits] = response.next_int
    end
    
    hash
  end
end

#query(search, index = '*', comments = '') ⇒ Object

Query the Sphinx daemon. The search covers all indexes by default, but you can name a specific one if you wish. The search parameter should be a string following Sphinx’s expectations.

The object returned from this method is a hash with the following keys:

  • :matches

  • :fields

  • :attributes

  • :attribute_names

  • :words

  • :total

  • :total_found

  • :time

  • :status

  • :warning (if appropriate)

  • :error (if appropriate)

The key :matches returns an array of hashes - the actual search results. Each hash has the document id (:doc), the result weighting (:weight), and a hash of the attributes for the document (:attributes).

The :fields and :attribute_names keys return lists of the fields and attributes for the documents. The key :attributes returns a hash of attribute name and type pairs, and :words returns a hash of hashes representing the words from the search, with the number of documents and hits for each, along the lines of:

results[:words]["Pat"] #=> {:docs => 12, :hits => 15}

:total, :total_found and :time return the number of matches available, the total number of matches (which may be greater than the maximum available, depending on the number of matches and your sphinx configuration), and the time in milliseconds that the query took to run.
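
A minimal sketch of how these counts might drive pagination, assuming a page size equal to the client's default limit of 20. The results hash is a made-up slice of what query returns.

```ruby
# Illustrative slice of a results hash; the numbers are made up.
results  = { :total => 1000, :total_found => 4321, :time => 0.012 }
per_page = 20 # matches the client's default limit

# Only :total matches can actually be paged through, even when
# :total_found is larger.
page_count = (results[:total] / per_page.to_f).ceil
capped     = results[:total_found] > results[:total]

# For page n (1-based), one would set client.offset = (n - 1) * per_page.
puts "#{page_count} pages available, capped: #{capped}"
```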

:status is the error code for the query - and if there was a related warning, it will be under the :warning key. Fatal errors will be described under :error.
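
Since :status is an integer, it can be mapped back to the symbols in the Statuses constant. The sketch below uses a local copy of that table so it runs without loading Riddle; describe is a hypothetical helper and the warning text is made up.

```ruby
# Local copy of the Statuses table from the constant summary; in real code
# use Riddle::Client::Statuses instead.
STATUSES = { :ok => 0, :error => 1, :retry => 2, :warning => 3 }

# Hypothetical helper: turn a results hash into a short log line.
def describe(results)
  case STATUSES.key(results[:status])
  when :ok      then "ok"
  when :warning then "warning: #{results[:warning]}"
  when :error   then "error: #{results[:error]}"
  else               "status #{results[:status]}"
  end
end

puts describe(:status => 3, :warning => "made-up warning text")
```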



# File 'lib/riddle/client.rb', line 285

def query(search, index = '*', comments = '')
  @queue << query_message(search, index, comments)
  self.run.first
end

#reset ⇒ Object

Reset attributes and settings to defaults.

# File 'lib/riddle/client.rb', line 123

def reset
  # defaults
  @offset         = 0
  @limit          = 20
  @max_matches    = 1000
  @match_mode     = :all
  @sort_mode      = :relevance
  @sort_by        = ''
  @weights        = []
  @id_range       = 0..0
  @filters        = []
  @group_by       = ''
  @group_function = :day
  @group_clause   = '@group desc'
  @group_distinct = ''
  @cut_off        = 0
  @retry_count    = 0
  @retry_delay    = 0
  @anchor         = {}
  # string keys are index names, integer values are weightings
  @index_weights  = {}
  @rank_mode      = :proximity_bm25
  @max_query_time = 0
  # string keys are field names, integer values are weightings
  @field_weights  = {}
  @timeout        = 0
end

#run ⇒ Object

Run all the queries currently in the queue. This will return an array of results hashes.
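
A minimal sketch of batching with append_query and run. Results come back in queue order, so they can be paired with the searches that produced them. batch_search and QueueStub are hypothetical names; the stub stands in for Riddle::Client so the snippet runs without a Sphinx daemon, and its :echo key does not exist in real results.

```ruby
# Hypothetical helper: queue one search per term, run them in a single
# request, and pair each term with its results hash. Assumes the client's
# queue is empty beforehand.
def batch_search(client, terms, index = '*')
  terms.each { |term| client.append_query(term, index) }
  Hash[terms.zip(client.run)]
end

# Stand-in for Riddle::Client (an assumption for this sketch only).
class QueueStub
  def initialize
    @queue = []
  end

  def append_query(search, index = '*', comments = '')
    @queue << search
  end

  # Returns one hash per queued search, in queue order, then empties the queue.
  def run
    results = @queue.map { |search| { :matches => [], :echo => search } }
    @queue.clear
    results
  end
end

p batch_search(QueueStub.new, ["pat", "allan"]).keys
```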



# File 'lib/riddle/client.rb', line 179

def run
  response = Response.new request(:search, @queue)
  
  results = @queue.collect do
    result = {
      :matches         => [],
      :fields          => [],
      :attributes      => {},
      :attribute_names => [],
      :words           => {}
    }

    result[:status] = response.next_int
    case result[:status]
    when Statuses[:warning]
      result[:warning] = response.next
    when Statuses[:error]
      result[:error] = response.next
      next result
    end
    
    result[:fields] = response.next_array

    attributes = response.next_int
    for i in 0...attributes
      attribute_name = response.next
      type           = response.next_int

      result[:attributes][attribute_name] = type
      result[:attribute_names] << attribute_name
    end

    matches   = response.next_int
    is_64_bit = response.next_int
    for i in 0...matches
      doc = is_64_bit > 0 ? response.next_64bit_int : response.next_int
      weight = response.next_int

      result[:matches] << {:doc => doc, :weight => weight, :index => i, :attributes => {}}
      result[:attribute_names].each do |attr|
        result[:matches].last[:attributes][attr] = attribute_from_type(
          result[:attributes][attr], response
        )
      end
    end

    result[:total] = response.next_int.to_i || 0
    result[:total_found] = response.next_int.to_i || 0
    result[:time] = ('%.3f' % (response.next_int / 1000.0)).to_f || 0.0

    words = response.next_int
    for i in 0...words
      word = response.next
      docs = response.next_int
      hits = response.next_int
      result[:words][word] = {:docs => docs, :hits => hits}
    end

    result
  end
  
  @queue.clear
  results
end

#set_anchor(lat_attr, lat, long_attr, long) ⇒ Object

Set the geo-anchor point: the names of the attributes that contain the latitude and longitude (in radians), and the reference position, also in radians. Note that for geocoding to work properly, you must also set match_mode to :extended. To sort results by distance, set sort_mode to :extended and sort_by to '@geodist asc', for example. Sphinx expects latitude and longitude to be returned from your SQL source in radians.

Example:

client.set_anchor('lat', -0.6591741, 'long', 2.530770)
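
Coordinates are usually stored in decimal degrees, so they need converting before being passed in. A minimal sketch, where to_radians is a hypothetical helper and the coordinates are approximate values chosen only for illustration:

```ruby
# Convert decimal degrees to the radians that set_anchor expects.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Approximate coordinates, chosen only for illustration.
lat  = to_radians(-37.768)
long = to_radians(145.002)

# client.set_anchor('lat', lat, 'long', long)
puts format('%.4f %.4f', lat, long)
```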


# File 'lib/riddle/client.rb', line 162

def set_anchor(lat_attr, lat, long_attr, long)
  @anchor = {
    :latitude_attribute   => lat_attr,
    :latitude             => lat,
    :longitude_attribute  => long_attr,
    :longitude            => long
  }
end

#update(index, attributes, values_by_doc) ⇒ Object

Update attributes - first parameter is the relevant index, second is an array of attributes to be updated, and the third is a hash, where the keys are the document ids, and the values are arrays with the attribute values - in the same order as the second parameter.

Example:

client.update('people', ['birthday'], {1 => [Time.local(1982, 8, 20).to_i]})
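
When several attributes are updated at once, each document's values must line up with the attribute list. A sketch of building that hash from application data, assuming made-up records keyed by document id and a hypothetical 'visits' attribute; Time.utc is used so the values do not depend on the local time zone.

```ruby
# Hypothetical records keyed by Sphinx document id; the data is made up.
people = {
  1 => { :birthday => Time.utc(1982, 8, 20), :visits => 3 },
  2 => { :birthday => Time.utc(1975, 1, 1),  :visits => 7 }
}

attributes = ['birthday', 'visits']

# One array of integer values per document, in the same order as `attributes`.
values_by_doc = people.each_with_object({}) do |(doc_id, person), values|
  values[doc_id] = attributes.map { |name| person[name.to_sym].to_i }
end

# client.update('people', attributes, values_by_doc)
p values_by_doc
```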


# File 'lib/riddle/client.rb', line 359

def update(index, attributes, values_by_doc)
  response = Response.new request(
    :update,
    update_message(index, attributes, values_by_doc)
  )
  
  response.next_int
end