Module: Kithe::SolrUtil

Defined in:
app/indexing/kithe/solr_util.rb

Overview

This is all somewhat hacky code, but it gets the job done. Some convenience utilities for dealing with your Solr index, including issuing a query to delete everything (delete_all), and finding and deleting “orphaned” Kithe::Indexable Solr objects that no longer exist in the rdbms.

Unlike other parts of Kithe’s indexing support, this stuff IS very Solr-specific, and is generally implemented with [rsolr](github.com/rsolr/rsolr).

Class Method Details

.delete_all(solr_url: Kithe.indexable_settings.solr_url, commit: :hard) ⇒ Object

Just a utility method to delete everything from Solr and then issue a commit, using RSolr. Pretty trivial.

Intended for dev/test instances, not really production.

Parameters:

  • commit (defaults to: :hard)

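    :soft, :hard, or false. Default :hard. With :hard a commit is sent along with the delete request, with :soft a softCommit; with false no commit instruction is sent.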



# File 'app/indexing/kithe/solr_util.rb', line 89

def self.delete_all(solr_url: Kithe.indexable_settings.solr_url, commit: :hard)
  rsolr = RSolr.connect :url => solr_url

  # RSolr is a bit under-doc'd, but this SEEMS to work to send a commit
  # or softCommit instruction with the delete request.
  params = {}
  if commit == :hard
    params[:commit] = true
  elsif commit == :soft
    params[:softCommit] = true
  end

  rsolr.delete_by_query("*:*", params: params)
end
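
The commit handling in delete_all boils down to a small mapping from the commit argument to Solr update parameters. Below is a minimal sketch of that mapping, pulled out into a hypothetical helper so it can be tried without a Solr instance. The name commit_params is an assumption for illustration and is not part of Kithe.

```ruby
# Hypothetical helper (not part of Kithe): mirrors the if/elsif in delete_all.
# Returns the params hash that would accompany the delete-by-query request.
def commit_params(commit)
  case commit
  when :hard then { commit: true }      # hard commit sent with the delete
  when :soft then { softCommit: true }  # soft commit sent with the delete
  else {}                               # false (or anything else): no commit
  end
end

hard_params = commit_params(:hard)
soft_params = commit_params(:soft)
none_params = commit_params(false)
```

With commit: false the deletion is sent without any commit instruction, so it presumably stays pending until something else commits.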

.delete_solr_orphans(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object

Finds any Solr objects that have a `model_name_ssi` field (or `Kithe.indexable_settings.model_name_solr_field` if non-default), but don’t exist in the rdbms, and deletes them from Solr, then issues a commit.

Under normal use, you shouldn’t have to do this, but can if your Solr index has gotten out of sync and you don’t want to delete it and reindex from scratch.

Implemented in terms of .solr_orphan_ids.

The implementation is a bit hacky. A progress bar might be nice, but there isn’t one at present.

Returns an array of the IDs that were deleted.



# File 'app/indexing/kithe/solr_util.rb', line 70

def self.delete_solr_orphans(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url)
  rsolr = RSolr.connect :url => solr_url
  deleted_ids = []

  solr_orphan_ids(batch_size: batch_size, solr_url: solr_url) do |orphan_id|
    deleted_ids << orphan_id
    rsolr.delete_by_id(orphan_id)
  end

  rsolr.commit

  return deleted_ids
end
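
The flow above (delete each orphan by ID, commit once at the end, return the IDs) can be sketched without a live Solr. In this sketch RecordingSolr is a hypothetical stand-in for the RSolr client and delete_orphans is a hypothetical function with the same shape as delete_solr_orphans; neither is part of Kithe or RSolr.

```ruby
# Hypothetical stand-in for an RSolr client: it only records calls.
class RecordingSolr
  attr_reader :calls

  def initialize
    @calls = []
  end

  def delete_by_id(id)
    @calls << [:delete_by_id, id]
  end

  def commit
    @calls << [:commit]
  end
end

# Same shape as delete_solr_orphans, with the orphan IDs passed in
# instead of being discovered via solr_orphan_ids.
def delete_orphans(client, orphan_ids)
  deleted_ids = []
  orphan_ids.each do |orphan_id|
    deleted_ids << orphan_id
    client.delete_by_id(orphan_id)
  end
  client.commit # a single commit after all deletes
  deleted_ids
end

client  = RecordingSolr.new
deleted = delete_orphans(client, ["a1", "b2"])
```

Because the commit only happens once at the end, the return value is a convenient place to log or report what was removed.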

.solr_orphan_ids(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url) ⇒ Object

Based on Sunspot; does not depend on Blacklight. See github.com/sunspot/sunspot/blob/3328212da79178319e98699d408f14513855d3c0/sunspot_rails/lib/sunspot/rails/searchable.rb#L332

Kithe::SolrUtil.solr_orphan_ids do |orphaned_id|
   delete(orphaned_id)
end

It searches for any Solr object with a `Kithe.indexable_settings.model_name_solr_field` field (default `model_name_ssi`). Then it takes each ID and checks that it exists in the database using Kithe::Model. At the moment we assume everything is in Kithe::Model, rather than trying to use the `model_name_ssi` value to fetch from different tables. This could perhaps be enhanced to drop that assumption.

This is intended mostly for use by .delete_solr_orphans

The implementation is a bit hacky. It might be nice to support a progress bar, but it doesn’t at present.



# File 'app/indexing/kithe/solr_util.rb', line 29

def self.solr_orphan_ids(batch_size: 100, solr_url: Kithe.indexable_settings.solr_url)
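  # Without a block, return an Enumerator over this same method, forwarding the arguments.
  return enum_for(:solr_orphan_ids, batch_size: batch_size, solr_url: solr_url) unless block_given?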

  model_name_solr_field = Kithe.indexable_settings.model_name_solr_field
  model_solr_id_attr   = Kithe.indexable_settings.solr_id_value_attribute

  solr_page = -1

  rsolr = RSolr.connect :url => solr_url

  while (solr_page = solr_page.next)
    response = rsolr.get 'select', params: {
      rows: batch_size,
      start: (batch_size * solr_page),
      fl: "id",
      q: "#{model_name_solr_field}:[* TO *]"
    }

    solr_ids = response["response"]["docs"].collect { |h| h["id"] }

    break if solr_ids.empty?

    (solr_ids - Kithe::Model.where(model_solr_id_attr => solr_ids).pluck(model_solr_id_attr)).each do |orphaned_id|
      yield orphaned_id
    end
  end
end
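
The paging-plus-set-difference approach used above can be tried in isolation. The following is a minimal sketch under stated assumptions: OrphanSketch, SOLR_DOCS and DB_IDS are hypothetical in-memory stand-ins for the Solr index and the Kithe::Model table, and the slice call stands in for the Solr `select` request with `rows` and `start`.

```ruby
# Hypothetical in-memory stand-ins, not part of Kithe.
module OrphanSketch
  SOLR_DOCS = %w[a b c d e].map { |id| { "id" => id } } # "Solr index"
  DB_IDS    = %w[a c e]                                 # "database"

  def self.orphan_ids(batch_size: 2)
    # No block given: hand back an Enumerator over this same method.
    return enum_for(:orphan_ids, batch_size: batch_size) unless block_given?

    page = 0
    loop do
      # Stand-in for the Solr request: rows = batch_size, start = batch_size * page
      docs     = SOLR_DOCS[batch_size * page, batch_size] || []
      solr_ids = docs.map { |h| h["id"] }
      break if solr_ids.empty?

      # Stand-in for the database lookup restricted to this page's IDs
      found = DB_IDS & solr_ids
      (solr_ids - found).each { |orphaned_id| yield orphaned_id }

      page += 1
    end
  end
end

orphans = OrphanSketch.orphan_ids(batch_size: 2).to_a
```

Each page costs one lookup against the "database", so a larger batch_size means fewer round trips at the cost of bigger queries.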