Class: XapianFu::XapianDb

Inherits:
Object
Defined in:
lib/xapian_fu/xapian_db.rb

Overview

XapianFu::XapianDb encapsulates a Xapian database and handles setting up stemmers, stoppers, query parsers and the like. This is the core of XapianFu.

Opening and creating the database

The :dir option specifies where the Xapian database is to be read from and written to. Without this, an in-memory Xapian database will be used. By default, the on-disk database will not be created if it doesn't already exist. See the :create option.

Setting the :create option to true will allow XapianDb to create a new Xapian database on-disk. If one already exists, it is just opened. The default is false.

Setting the :overwrite option to true will force XapianDb to wipe the current on-disk database and start afresh. The default is false.

db = XapianDb.new(:dir => '/tmp/mydb', :create => true)
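
The way :create and :overwrite interact can be sketched in plain Ruby. The helper name open_mode and the symbols it returns are hypothetical stand-ins for the Xapian::DB_* constants; the precedence follows the constructor source shown under Constructor Details, where :overwrite is checked last and therefore wins.

```ruby
# Hypothetical helper mirroring the flag selection in XapianDb#initialize.
# Symbols stand in for Xapian::DB_OPEN, DB_CREATE_OR_OPEN and
# DB_CREATE_OR_OVERWRITE so the sketch runs without Xapian installed.
def open_mode(options = {})
  mode = :open                                         # default: never create
  mode = :create_or_open if options[:create]           # create only if missing
  mode = :create_or_overwrite if options[:overwrite]   # wipe and start afresh
  mode
end

open_mode                                        # plain open
open_mode(:create => true)                       # create if missing
open_mode(:create => true, :overwrite => true)   # overwrite wins
```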

Language, Stemmers and Stoppers

The :language option specifies the default document language, and controls the default type of stemmer and stopper that will be used when indexing. The stemmer and stopper can be overridden with the :stemmer and :stopper options.

The :language, :stemmer and :stopper options can be set to one of the following: :danish, :dutch, :english, :finnish, :french, :german, :hungarian, :italian, :norwegian, :portuguese, :romanian, :russian, :spanish, :swedish, :turkish. Set it to false to specify none.

The default for all is :english.

db = XapianDb.new(:language => :italian, :stopper => false)
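
A minimal sketch of how these defaults resolve, based on the Hash#fetch calls in the constructor source. The method name language_settings is hypothetical. Because fetch only falls back when the key is absent, an explicit false is kept rather than replaced by the language.

```ruby
# Hypothetical helper mirroring the defaulting in XapianDb#initialize.
def language_settings(options = {})
  language = options.fetch(:language, :english)
  {
    :language => language,
    # fetch falls back to the language only when the key is missing,
    # so :stemmer => false or :stopper => false is preserved.
    :stemmer  => options.fetch(:stemmer, language),
    :stopper  => options.fetch(:stopper, language)
  }
end

settings = language_settings(:language => :italian, :stopper => false)
# settings[:stemmer] follows the language; settings[:stopper] stays false.
```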

Spelling suggestions

The :spelling option controls generation of a spelling dictionary during indexing and its use during searches. When enabled, Xapian will build a dictionary of words for the database whilst indexing documents and will enable spelling suggestion by default for searches. Building the dictionary will impact indexing performance and database size. It is enabled by default. See the search section for information on getting spelling correction information during searches.

Fields and values

The :store option specifies which document fields should be stored in the database. By default, fields are only indexed - the original values cannot be retrieved.

The :sortable option specifies which document fields will be available for sorting results on. This does the same thing as :store and exists only to be explicit.

The :collapsible option specifies which document fields can be used to group (“collapse”) results. It also does the same thing as :store and exists only to be explicit.
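
Since all three options end up in the same list of stored fields, the merge can be sketched as follows. The helper name stored_fields is hypothetical, and the sketch ignores any fields contributed by the :fields option; the flatten, uniq and compact steps are taken from the constructor source.

```ruby
# Hypothetical helper: combine :store, :sortable and :collapsible the way
# the constructor does, accepting single symbols, arrays or nothing at all.
def stored_fields(options = {})
  [options[:store], options[:sortable], options[:collapsible]]
    .flatten   # arrays and single values become one flat list
    .uniq      # a field named twice is stored once
    .compact   # options that were not given (nil) are dropped
end

combined = stored_fields(:store => [:title],
                         :sortable => :created_at,
                         :collapsible => [:title, :author])
```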

A more complete way of defining fields is available:

XapianDb.new(:fields => { :title => { :type => String },
                          :slug => { :type => String, :index => false },
                          :created_at => { :type => Time, :store => true },
                          :votes => { :type => Fixnum, :store => true },
                        })

XapianFu will use the :type option when instantiating a stored value, so you'll get back a Time object rather than the result of Time's to_s method, as is the default. Defining the type for numerical classes (such as Time, Fixnum and Bignum) allows XapianFu to store them on-disk in a much more efficient way, and sort them efficiently (without having to resort to storing leading zeros or anything like that).
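
To see why typed storage matters for sorting, here is a small plain-Ruby illustration. It is not XapianFu's actual encoding; the 32-bit big-endian packing is only an assumed example of an order-preserving encoding for non-negative integers.

```ruby
votes = [9, 10, 100]

# Stored as plain strings, values compare character by character,
# so "10" and "100" sort before "9".
as_strings = votes.map(&:to_s).sort

# Hypothetical order-preserving encoding: fixed-width big-endian bytes.
encode = lambda { |n| [n].pack('N') }
decode = lambda { |bytes| bytes.unpack('N').first }

# Byte-wise comparison of the encoded values agrees with numeric order,
# with no need to pad the decimal form with leading zeros.
as_bytes = votes.map(&encode).sort.map(&decode)
```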

Term Weights

The :weights option accepts a Proc or Lambda that sets custom term weights.

Your function will receive the term key and value and the full list of fields, and should return an integer weight to be applied for that term when the document is indexed.

In this example,

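XapianDb.new(:weights => Proc.new do |key, value, fields|
  # next exits only this block; return would try to return from the
  # enclosing method and can raise LocalJumpError.
  next 10 if fields.keys.include?('culturally_important')
  next 3  if key == 'title'
  1
end)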

terms in the title will be weighted three times more than other terms, and all terms in 'culturally important' items will be weighted ten times more.
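
A weights function can be exercised on its own before handing it to XapianDb. The sketch below writes it as a lambda, where return exits only the lambda. The string keys and the sample fields hash are assumptions for illustration; check how your documents name their fields.

```ruby
# A standalone weights function; it takes the term's field key, the field
# value and the full hash of fields, and returns an integer weight.
weights = lambda do |key, value, fields|
  return 10 if fields.key?('culturally_important')
  return 3  if key == 'title'
  1
end

# Hypothetical documents, represented as plain hashes of field => value.
ordinary  = { 'title' => 'Teapots', 'body' => 'A history of teapots' }
important = ordinary.merge('culturally_important' => true)

title_weight     = weights.call('title', ordinary['title'], ordinary)
body_weight      = weights.call('body', ordinary['body'], ordinary)
important_weight = weights.call('body', important['body'], important)
```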

Constructor Details

#initialize(options = { }) ⇒ XapianDb

Returns a new instance of XapianDb



# File 'lib/xapian_fu/xapian_db.rb', line 144

def initialize( options = { } )
  @options = { :index_positions => true, :spelling => true }.merge(options)
  @dir = @options[:dir]
  @index_positions = @options[:index_positions]
  @db_flag = Xapian::DB_OPEN
  @db_flag = Xapian::DB_CREATE_OR_OPEN if @options[:create]
  @db_flag = Xapian::DB_CREATE_OR_OVERWRITE if @options[:overwrite]
  @tx_mutex = Mutex.new
  @language = @options.fetch(:language, :english)
  @stemmer = @options.fetch(:stemmer, @language)
  @stopper = @options.fetch(:stopper, @language)
  @field_options = {}
  setup_fields(@options[:fields])
  @store_values << @options[:store]
  @store_values << @options[:sortable]
  @store_values << @options[:collapsible]
  @store_values = @store_values.flatten.uniq.compact
  @spelling = @options[:spelling]
  @weights_function = @options[:weights]
end

Instance Attribute Details

#boolean_fields ⇒ Object (readonly)

An array of fields that will be treated as boolean terms



# File 'lib/xapian_fu/xapian_db.rb', line 136

def boolean_fields
  @boolean_fields
end

#db_flag ⇒ Object (readonly)

:nodoc:



# File 'lib/xapian_fu/xapian_db.rb', line 124

def db_flag
  @db_flag
end

#dir ⇒ Object (readonly)

Path to the on-disk database. Nil for an in-memory database.



# File 'lib/xapian_fu/xapian_db.rb', line 123

def dir
  @dir
end

#field_options ⇒ Object (readonly)

Returns the value of attribute field_options



# File 'lib/xapian_fu/xapian_db.rb', line 140

def field_options
  @field_options
end

#field_weights ⇒ Object (readonly)

Returns the value of attribute field_weights



# File 'lib/xapian_fu/xapian_db.rb', line 142

def field_weights
  @field_weights
end

#fields ⇒ Object (readonly)

A hash of field names and their types



# File 'lib/xapian_fu/xapian_db.rb', line 132

def fields
  @fields
end

#index_positions ⇒ Object (readonly)

True if term positions will be stored



# File 'lib/xapian_fu/xapian_db.rb', line 128

def index_positions
  @index_positions
end

#language ⇒ Object (readonly)

The default document language. Used for setting up stoppers and stemmers.



# File 'lib/xapian_fu/xapian_db.rb', line 130

def language
  @language
end

#sortable_fields ⇒ Object (readonly)

Returns the value of attribute sortable_fields



# File 'lib/xapian_fu/xapian_db.rb', line 139

def sortable_fields
  @sortable_fields
end

#spelling ⇒ Object (readonly)

Whether this db will generate a spelling dictionary during indexing



# File 'lib/xapian_fu/xapian_db.rb', line 138

def spelling
  @spelling
end

#store_values ⇒ Object (readonly)

An array of the fields that will be stored in the Xapian database



# File 'lib/xapian_fu/xapian_db.rb', line 126

def store_values
  @store_values
end

#unindexed_fields ⇒ Object (readonly)

An array of fields that will not be indexed



# File 'lib/xapian_fu/xapian_db.rb', line 134

def unindexed_fields
  @unindexed_fields
end

#weights_function ⇒ Object

Returns the value of attribute weights_function



# File 'lib/xapian_fu/xapian_db.rb', line 141

def weights_function
  @weights_function
end

Instance Method Details

#add_doc(doc) ⇒ Object Also known as: <<

Short-cut to documents.add



# File 'lib/xapian_fu/xapian_db.rb', line 196

def add_doc(doc)
  documents.add(doc)
end

#add_synonym(term, synonym) ⇒ Object

Add a synonym to the database.

If you want to search with synonym support, remember to add the option:

db.search("foo", :synonyms => true)

Note that in-memory databases don't support synonyms.



# File 'lib/xapian_fu/xapian_db.rb', line 210

def add_synonym(term, synonym)
  rw.add_synonym(term, synonym)
end

#close ⇒ Object

Closes the database.

Raises:

ConcurrencyError (if a transaction is in progress)



# File 'lib/xapian_fu/xapian_db.rb', line 346

def close
  raise ConcurrencyError if @tx_mutex.locked?

  @rw.close if @rw
  @rw = nil

  @ro.close if @ro
  @ro = nil
end

#documents ⇒ Object

The XapianFu::XapianDocumentsAccessor for this database



# File 'lib/xapian_fu/xapian_db.rb', line 191

def documents
  @documents_accessor ||= XapianDocumentsAccessor.new(self)
end

#flush ⇒ Object

Flush any changes to disk and reopen the read-only database. Raises ConcurrencyError if a transaction is in progress.

Raises:

ConcurrencyError (if a transaction is in progress)



# File 'lib/xapian_fu/xapian_db.rb', line 339

def flush
  raise ConcurrencyError if @tx_mutex.locked?
  rw.flush
  ro.reopen
end
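
The guard used by flush and close can be tried out in isolation. The sketch below is a plain-Ruby stand-in, not XapianFu code: GuardedStore, flush_count and this ConcurrencyError class are hypothetical names, and the only behaviour borrowed from the source is the Mutex#locked? check.

```ruby
# Stand-in error class so the sketch runs without XapianFu loaded.
class ConcurrencyError < StandardError; end

# Hypothetical store that refuses to flush while a transaction holds the lock.
class GuardedStore
  attr_reader :flush_count

  def initialize
    @tx_mutex = Mutex.new
    @flush_count = 0
  end

  # Holds the mutex for the duration of the block.
  def transaction
    @tx_mutex.synchronize { yield }
  end

  # Same guard as in the source: locked? is true while any transaction runs.
  def flush
    raise ConcurrencyError if @tx_mutex.locked?
    @flush_count += 1
  end
end

store = GuardedStore.new
store.flush                   # fine: no transaction in progress
store.transaction do
  begin
    store.flush               # refused: the transaction holds the mutex
  rescue ConcurrencyError
    # expected inside a transaction
  end
end
```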

#ro ⇒ Object

The read-only Xapian::Database



# File 'lib/xapian_fu/xapian_db.rb', line 181

def ro
  @ro ||= setup_ro_db
end

#rw ⇒ Object

The writable Xapian::WritableDatabase



# File 'lib/xapian_fu/xapian_db.rb', line 176

def rw
  @rw ||= setup_rw_db
end

#search(q, options = {}) ⇒ Object

Conduct a search on the Xapian database, returning an array of XapianFu::XapianDoc objects for the matches wrapped in a XapianFu::ResultSet.

The :limit option sets how many results to return. For compatibility with the will_paginate plugin, the :per_page option does the same thing (and takes precedence over :limit). Defaults to 10.

The :page option sets which page of results to return. Defaults to 1.

The :order option specifies the stored field to order the results by (instead of the default search result weight).

The :reverse option reverses the order of the results, so lowest search weight first (or lowest stored field value first).

The :collapse option specifies which stored field value to collapse (group) the results on. Works a bit like the SQL GROUP BY behaviour.

The :spelling option controls whether spelling suggestions will be made for queries. It defaults to whatever the database spelling setting is (true by default). When enabled, spelling suggestions are available using the XapianFu::ResultSet corrected_query method.

The :check_at_least option controls how many documents will be sampled. This allows for accurate page and facet counts. Specifying the special value of :all will make Xapian sample every document in the database. Be aware that this can hurt your query performance.

The :query_builder option allows you to pass a proc that will return the final query to be run. The proc receives the parsed query as its only argument.

The first parameter can also be :all or :nothing, to match all documents or no documents respectively.

For additional options on how the query is parsed, see XapianFu::QueryParser.



# File 'lib/xapian_fu/xapian_db.rb', line 260

def search(q, options = {})
  defaults = { :page => 1, :reverse => false,
    :boolean => true, :boolean_anycase => true, :wildcards => true,
    :lovehate => true, :spelling => spelling, :pure_not => false }
  options = defaults.merge(options)
  page = options[:page].to_i rescue 1
  page = page > 1 ? page - 1 : 0
  per_page = options[:per_page] || options[:limit] || 10
  per_page = per_page.to_i rescue 10
  offset = page * per_page

  check_at_least = options.include?(:check_at_least) ? options[:check_at_least] : 0
  check_at_least = self.size if check_at_least == :all

  qp = XapianFu::QueryParser.new({ :database => self }.merge(options))
  query = qp.parse_query(q.is_a?(Symbol) ? q : q.to_s)

  if options.include?(:query_builder)
    query = options[:query_builder].call(query)
  end

  query = filter_query(query, options[:filter]) if options[:filter]

  enquiry = Xapian::Enquire.new(ro)
  setup_ordering(enquiry, options[:order], options[:reverse])
  if options[:collapse]
    enquiry.collapse_key = XapianDocValueAccessor.value_key(options[:collapse])
  end
  if options[:facets]
    spies = options[:facets].inject({}) do |accum, name|
      accum[name] = spy = Xapian::ValueCountMatchSpy.new(XapianDocValueAccessor.value_key(name))
      enquiry.add_matchspy(spy)
      accum
    end
  end

  if options.include?(:posting_source)
    query = Xapian::Query.new(Xapian::Query::OP_AND_MAYBE, query, Xapian::Query.new(options[:posting_source]))
  end

  enquiry.query = query

  ResultSet.new(:mset => enquiry.mset(offset, per_page, check_at_least),
                :current_page => page + 1,
                :per_page => per_page,
                :corrected_query => qp.corrected_query,
                :spies => spies,
                :xapian_db => self
               )
end
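
The paging arithmetic at the top of search can be sketched as a standalone helper. The name page_window and the returned hash are hypothetical; the rules (1-based :page, :per_page taking precedence over :limit, default of 10) follow the source above.

```ruby
# Hypothetical helper reproducing how search turns :page, :per_page and
# :limit into the offset and count passed to Xapian.
def page_window(options = {})
  page = (options[:page] || 1).to_i
  page = page > 1 ? page - 1 : 0      # convert to a zero-based page index
  per_page = (options[:per_page] || options[:limit] || 10).to_i
  {
    :offset       => page * per_page, # first result to return
    :per_page     => per_page,
    :current_page => page + 1         # reported back as a 1-based page
  }
end

first_page = page_window
third_page = page_window(:page => 3, :limit => 20)
```

Pages below 1 are treated as the first page, so a :page of 0 yields an offset of 0.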

#serialize_value(field, value, type = nil) ⇒ Object



# File 'lib/xapian_fu/xapian_db.rb', line 356

def serialize_value(field, value, type = nil)
  if sortable_fields.include?(field)
    Xapian.sortable_serialise(value)
  else
    (type || fields[field] || Object).to_xapian_fu_storage_value(value)
  end
end

#size ⇒ Object

The number of docs in the Xapian database



# File 'lib/xapian_fu/xapian_db.rb', line 186

def size
  ro.doccount
end

#stemmer ⇒ Object

Return a new stemmer object for this database



# File 'lib/xapian_fu/xapian_db.rb', line 166

def stemmer
  StemFactory.stemmer_for(@stemmer)
end

#stopper ⇒ Object

The stopper object for this database



# File 'lib/xapian_fu/xapian_db.rb', line 171

def stopper
  StopperFactory.stopper_for(@stopper)
end

#transaction(flush_on_commit = true) ⇒ Object

Run the given block in a XapianDb transaction. Any changes to the Xapian database made in the block will be atomically committed at the end.

If an exception is raised by the block, all changes are discarded and the exception re-raised.

Xapian does not support multiple concurrent transactions on the same Xapian database. Any attempts at this will be serialized by XapianFu, which is not perfect but probably better than just kicking up an exception.



# File 'lib/xapian_fu/xapian_db.rb', line 322

def transaction(flush_on_commit = true)
  @tx_mutex.synchronize do
    begin
      rw.begin_transaction(flush_on_commit)
      yield
    rescue Exception => e
      rw.cancel_transaction
      ro.reopen
      raise e
    end
    rw.commit_transaction
    ro.reopen
  end
end
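
The commit-or-discard behaviour described above can be illustrated with a small in-memory stand-in. TinyTxStore, add and committed are hypothetical names and nothing here touches Xapian; the sketch only reproduces the control flow of the method: serialize with a mutex, discard and re-raise on error, commit otherwise.

```ruby
# Hypothetical in-memory store with the same transaction control flow.
class TinyTxStore
  attr_reader :committed

  def initialize
    @tx_mutex = Mutex.new
    @committed = []
    @pending = nil
  end

  # Only valid inside a transaction block in this sketch.
  def add(item)
    @pending << item
  end

  def transaction
    @tx_mutex.synchronize do           # concurrent callers wait their turn
      @pending = []
      begin
        yield
      rescue Exception                 # mirrors the source; always re-raised
        @pending = nil                 # discard everything from this block
        raise
      end
      @committed.concat(@pending)      # commit all changes together
      @pending = nil
    end
  end
end

store = TinyTxStore.new
store.transaction { store.add(:a); store.add(:b) }

begin
  store.transaction { store.add(:c); raise ArgumentError, "boom" }
rescue ArgumentError
  # the failed block's changes were discarded and the error re-raised
end
```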

#unserialize_value(field, value, type = nil) ⇒ Object



# File 'lib/xapian_fu/xapian_db.rb', line 364

def unserialize_value(field, value, type = nil)
  if sortable_fields.include?(field)
    Xapian.sortable_unserialise(value)
  else
    (type || fields[field] || Object).from_xapian_fu_storage_value(value)
  end
end