Class: GeoCombine::GeoBlacklightHarvester
- Inherits: Object
- Defined in: lib/geo_combine/geo_blacklight_harvester.rb
Overview
A class to harvest and index results from GeoBlacklight sites.
You can configure the sites to be harvested via a configure command:
  GeoCombine::GeoBlacklightHarvester.configure do
    {
      SITE: { host: 'https://example.com', params: { f: { dct_provenance_s: ['SITE'] } } }
    }
  end
The class configuration also allows for various other things to be configured:
- A debug parameter to print out details of what is being harvested and indexed
- Crawl delays for each page of results (globally or on a per-site basis)
- Solr's commitWithin parameter (defaults to 5000)
- A document transformer proc to modify a document before indexing (defaults to removing _version_, score, and timestamp)
Example: GeoCombine::GeoBlacklightHarvester.new(:SITE).index
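The options listed above might be combined as in the following sketch. Only the :debug key and the shape of the SITE entry appear on this page; the crawl_delay and commit_within key names are assumptions inferred from the crawl_delay and commit_within helpers called in #index, so check them against the source before relying on them.

```ruby
# A hypothetical settings hash for GeoCombine::GeoBlacklightHarvester.configure.
# Key names marked "assumed" do not appear on this page.
harvester_settings = {
  debug: true,             # read as config[:debug] in #index
  crawl_delay: 1,          # assumed key: global delay (seconds) between pages
  commit_within: '10000',  # assumed key: overrides the documented 5000 default
  SITE: {
    host: 'https://example.com',
    crawl_delay: 2,        # assumed key: per-site override of the global delay
    params: { f: { dct_provenance_s: ['SITE'] } }
  }
}

# With the geo_combine gem loaded, the hash would be returned from the block:
#   GeoCombine::GeoBlacklightHarvester.configure { harvester_settings }
puts harvester_settings[:SITE][:host]
```

The block passed to configure must evaluate to the hash, which is why the settings are built as a plain Ruby value here.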
Defined Under Namespace
Classes: BlacklightResponseVersionFactory, LegacyBlacklightResponse, ModernBlacklightResponse
Class Attribute Summary
Instance Attribute Summary
- #site ⇒ Object (readonly)
  Returns the value of attribute site.
- #site_key ⇒ Object (readonly)
  Returns the value of attribute site_key.
Class Method Summary
Instance Method Summary
- #index ⇒ Object
- #initialize(site_key) ⇒ GeoBlacklightHarvester (constructor)
  A new instance of GeoBlacklightHarvester.
Constructor Details
#initialize(site_key) ⇒ GeoBlacklightHarvester
Returns a new instance of GeoBlacklightHarvester.
# File 'lib/geo_combine/geo_blacklight_harvester.rb', line 44
def initialize(site_key)
  @site_key = site_key
  @site = self.class.config[site_key]
  raise ArgumentError, "Site key #{@site_key.inspect} is not configured for #{self.class.name}" unless @site
end
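A minimal sketch of the lookup performed by the constructor. It uses a hypothetical stand-in class (SketchHarvester) that copies only the config and initialize logic shown on this page, so it runs without the gem. It illustrates that the key must match the configured hash key exactly: a Symbol key is not found by a String.

```ruby
# Stand-in for illustration only; this is not the real GeoCombine class.
class SketchHarvester
  class << self
    attr_writer :config

    def config
      @config || {}
    end
  end

  attr_reader :site, :site_key

  def initialize(site_key)
    @site_key = site_key
    @site = self.class.config[site_key]
    raise ArgumentError, "Site key #{@site_key.inspect} is not configured for #{self.class.name}" unless @site
  end
end

SketchHarvester.config = { SITE: { host: 'https://example.com' } }

harvester = SketchHarvester.new(:SITE)
puts harvester.site[:host]

begin
  SketchHarvester.new('SITE') # String key, but the hash was keyed by Symbol
rescue ArgumentError => e
  puts e.message
end
```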
Class Attribute Details
.document_transformer ⇒ Object
# File 'lib/geo_combine/geo_blacklight_harvester.rb', line 32
def document_transformer
  @document_transformer || ->(document) do
    document.delete('_version_')
    document.delete('score')
    document.delete('timestamp')
    document
  end
end
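A sketch of what a custom transformer could look like. The default lambda below is copied from the listing for .document_transformer; the extra field removed by the custom lambda (internal_note_s) is a made-up name for illustration, and how a custom proc gets assigned to the class (presumably through a class-level writer) is not shown on this page.

```ruby
# Default behaviour, copied from the .document_transformer listing.
default_transformer = ->(document) do
  document.delete('_version_')
  document.delete('score')
  document.delete('timestamp')
  document
end

# Hypothetical custom transformer: apply the default, then drop one more field.
custom_transformer = ->(document) do
  default_transformer.call(document)
  document.delete('internal_note_s') # illustrative field name, not from the source
  document
end

doc = {
  'id' => 'abc-123',
  '_version_' => 1,
  'score' => 0.5,
  'timestamp' => '2020-01-01T00:00:00Z',
  'internal_note_s' => 'draft'
}
p custom_transformer.call(doc)
```

Both lambdas return the document itself, which matters because #index replaces each element with the transformer's return value.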
Instance Attribute Details
#site ⇒ Object (readonly)
Returns the value of attribute site.
# File 'lib/geo_combine/geo_blacklight_harvester.rb', line 43
def site
  @site
end
#site_key ⇒ Object (readonly)
Returns the value of attribute site_key.
# File 'lib/geo_combine/geo_blacklight_harvester.rb', line 43
def site_key
  @site_key
end
Class Method Details
.config ⇒ Object
# File 'lib/geo_combine/geo_blacklight_harvester.rb', line 28
def config
  @config || {}
end
.configure(&block) ⇒ Object
# File 'lib/geo_combine/geo_blacklight_harvester.rb', line 24
def configure(&block)
  @config = yield block
end
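Because configure assigns the block's return value to @config, the block has to end with the settings hash itself. A minimal sketch with a hypothetical stand-in class (SketchConfig) that mirrors the two class methods listed in this section:

```ruby
# Stand-in for illustration only; this is not the real GeoCombine class.
class SketchConfig
  class << self
    def configure(&block)
      @config = yield block
    end

    def config
      @config || {}
    end
  end
end

p SketchConfig.config # an empty hash until configure has been called

SketchConfig.configure do
  { debug: true, SITE: { host: 'https://example.com' } }
end

p SketchConfig.config.keys
```

A second call to configure replaces the stored hash rather than merging into it, since @config is simply reassigned.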
Instance Method Details
#index ⇒ Object
# File 'lib/geo_combine/geo_blacklight_harvester.rb', line 51
def index
  puts "Fetching page 1 @ #{base_url}&page=1" if self.class.config[:debug]
  response = JSON.parse(Net::HTTP.get(URI("#{base_url}&page=1")))
  response_class = BlacklightResponseVersionFactory.call(response)

  response_class.new(response: response, base_url: base_url).documents.each do |docs|
    docs.map! do |document|
      self.class.document_transformer.call(document) if self.class.document_transformer
    end.compact

    puts "Adding #{docs.count} documents to solr" if self.class.config[:debug]
    solr_connection.update params: { commitWithin: commit_within, overwrite: true },
                           data: docs.to_json,
                           headers: { 'Content-Type' => 'application/json' }

    sleep(crawl_delay.to_i) if crawl_delay
  end
end
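One detail of the #index listing is worth checking when writing a transformer: map! replaces the elements of docs in place, but compact returns a new array, and that result is not assigned in the listing. The sketch below uses plain arrays (no Solr or HTTP involved) and a hypothetical transformer that returns nil; it suggests that such a transformer would leave a nil entry in the batch rather than dropping the document.

```ruby
docs = [{ 'id' => 'a' }, { 'id' => 'skip-me' }, { 'id' => 'b' }]

# Hypothetical transformer that returns nil in an attempt to drop a document.
transformer = ->(document) { document['id'] == 'skip-me' ? nil : document }

# Same call shape as in #index, with the compact result captured for inspection.
compacted = docs.map! { |document| transformer.call(document) }.compact

p docs.length      # docs still holds every slot, including the nil
p compacted.length # only the returned copy has the nil removed
```

Under that reading, a transformer used with this method should always return a document; filtering would need a separate step such as compact! on the batch.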