Class: Krikri::Harvesters::CouchdbHarvester
- Inherits:
  Object
  - Object
  - Krikri::Harvesters::CouchdbHarvester
- Includes:
- Krikri::Harvester
- Defined in:
- lib/krikri/harvesters/couchdb_harvester.rb
Overview
A harvester implementation for CouchDB
Constant Summary
Constants included from Krikri::Harvester
Instance Attribute Summary
- #client ⇒ Object
  Returns the value of attribute client.
Attributes included from Krikri::Harvester
Attributes included from SoftwareAgent
Class Method Summary
- .expected_opts ⇒ Object
Instance Method Summary
- #count(opts = {}) ⇒ Fixnum
  Return the total number of documents reported by a CouchDB view.
- #get_record(identifier) ⇒ Object
  Retrieves a specific document from CouchDB.
- #initialize(opts = {}) ⇒ CouchdbHarvester (constructor)
  A new instance of CouchdbHarvester.
- #record_ids(opts = {}) ⇒ Object
  Streams a response from a CouchDB view to yield identifiers.
- #record_rows(view, limit) ⇒ Enumerator
  Return an enumerator that provides individual records from batched view requests.
- #records(opts = {}) ⇒ Enumerator
  Makes requests to a CouchDB view to yield documents.
Methods included from Krikri::Harvester
Methods included from SoftwareAgent
Constructor Details
#initialize(opts = {}) ⇒ CouchdbHarvester
Returns a new instance of CouchdbHarvester.
# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 19

def initialize(opts = {})
  super
  @opts = opts.fetch(:couchdb, view: '_all_docs')
  @opts[:view] ||= '_all_docs'
  @opts[:limit] ||= 10
  @client = Analysand::Database.new(uri)
end
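The defaulting logic in the constructor can be tried in isolation. The following is a minimal sketch, not Krikri code: `resolve_couchdb_opts` is a hypothetical helper that reproduces only the `fetch` and `||=` steps, and it copies the hash with `dup`, which the original does not do.

```ruby
# Hypothetical helper mirroring the option defaults in #initialize.
# Not part of Krikri; for illustration only.
def resolve_couchdb_opts(opts = {})
  # Use the :couchdb sub-hash if present, otherwise start from a default view.
  couchdb = opts.fetch(:couchdb, { view: '_all_docs' }).dup
  couchdb[:view]  ||= '_all_docs' # applies when :couchdb was given without :view
  couchdb[:limit] ||= 10          # default batch size
  couchdb
end

p resolve_couchdb_opts({})
p resolve_couchdb_opts({ couchdb: { limit: 50 } })
p resolve_couchdb_opts({ couchdb: { view: 'foo/bar' } })
```

Under these assumptions, a caller who passes only `limit` still gets the `_all_docs` view, and a caller who passes only `view` still gets a limit of 10.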
Instance Attribute Details
#client ⇒ Object
Returns the value of attribute client.
# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 8

def client
  @client
end
Class Method Details
.expected_opts ⇒ Object
# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 138

def self.expected_opts
  {
    key: :couchdb,
    opts: {
      view: { type: :string, required: false }
    }
  }
end
Instance Method Details
#count(opts = {}) ⇒ Fixnum
Return the total number of documents reported by a CouchDB view.
# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 54

def count(opts = {})
  view = opts[:view] || @opts[:view]
  # The count that we want is the total documents in the database minus
  # CouchDB design documents. Asking for the design documents will give us
  # the total count in addition to letting us determine the number of
  # design documents.
  v = client.view(view,
                  include_docs: false,
                  stream: false,
                  startkey: '_design',
                  endkey: '_design0')
  total = v.total_rows
  design_doc_count = v.keys.size
  total - design_doc_count
end
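The arithmetic in `#count` can be shown with a stand-in response object. This is a sketch only: `FakeView` and `non_design_count` are hypothetical names, and the struct imitates just the `total_rows` and `keys` values that the method reads from the view response.

```ruby
# Hypothetical stand-in for a non-streaming view response.
# total_rows: every row in the view; keys: the rows returned by the
# '_design'..'_design0' range, i.e. the design documents.
FakeView = Struct.new(:total_rows, :keys)

# Mirrors the subtraction in #count: total rows minus design documents.
def non_design_count(view_response)
  view_response.total_rows - view_response.keys.size
end

# A view reporting 12 rows, 2 of which are design documents.
response = FakeView.new(12, ['_design/a', '_design/b'])
puts non_design_count(response) # prints 10
```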
#get_record(identifier) ⇒ Object
Retrieves a specific document from CouchDB.
Uses Analysand::Database#get!, which raises an exception if the document cannot be found.
# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 131

def get_record(identifier)
  doc = client.get!(CGI.escape(identifier)).body.to_json
  @record_class.build(mint_id(identifier), doc, 'application/json')
end
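`#get_record` passes the identifier through `CGI.escape` before requesting the document. The short sketch below shows what that escaping does to an identifier containing a slash and a space; the identifier value is made up for illustration.

```ruby
require 'cgi'

# Made-up identifier with characters that are not URL-safe.
identifier = 'a/b c'

escaped = CGI.escape(identifier)
puts escaped                             # prints a%2Fb+c
puts CGI.unescape(escaped) == identifier # prints true
```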
#record_ids(opts = {}) ⇒ Object
Streams a response from a CouchDB view to yield identifiers.
The following will only send requests to the endpoint until it has 1000 record ids:
record_ids.take(1000)
# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 37

def record_ids(opts = {})
  view = opts[:view] || @opts[:view]
  # The set of record ids is all of the record IDs in the database minus
  # the IDs of CouchDB design documents.
  view_opts = {include_docs: false, stream: true}
  client.view(view, view_opts).keys.lazy.select do |k|
    !k.start_with?('_design')
  end
end
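The lazy filtering used by `#record_ids` can be demonstrated without a CouchDB server. In this sketch a plain `Enumerator` stands in for the streamed view keys (an assumption; the real source is the Analysand view response), and a counter records how many keys are actually pulled.

```ruby
# Hypothetical stand-in for streamed view keys; counts how many keys
# are pulled from the source.
pulled = 0
keys = Enumerator.new do |y|
  %w[_design/views rec1 rec2 rec3 rec4].each do |k|
    pulled += 1
    y << k
  end
end

# Same filter as #record_ids: drop design-document IDs, lazily.
ids = keys.lazy.select { |k| !k.start_with?('_design') }

first_two = ids.take(2).to_a
puts first_two.inspect # prints ["rec1", "rec2"]
puts pulled            # fewer than the 5 keys available
```

Because the chain is lazy, `take` stops consuming the source once it has enough identifiers, which is the behavior the `record_ids.take(1000)` example relies on.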
#record_rows(view, limit) ⇒ Enumerator
Return an enumerator that provides individual records from batched view requests.
# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 104

def record_rows(view, limit)
  en = Enumerator.new do |e|
    view_opts = {include_docs: true, stream: false, limit: limit}
    rows_retrieved = 0
    total_rows = nil
    loop do
      v = client.view(view, view_opts)
      total_rows ||= v.total_rows
      rows_retrieved += v.rows.size
      v.rows.each do |row|
        next if row['id'].start_with?('_design')
        e.yield row
      end
      break if rows_retrieved == total_rows
      view_opts[:startkey] = v.rows.last['id'] + '0'
    end
  end
  en.lazy
end
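The `startkey` paging strategy in `#record_rows` can be exercised against an in-memory stand-in. This is a sketch under stated assumptions: `FakeClient`, `FakePage`, and `each_row` are hypothetical, the fake view orders ids with Ruby string comparison rather than CouchDB's collation, and the empty-page guard is an addition that the original method does not have.

```ruby
# Hypothetical in-memory stand-in for a CouchDB view; not Analysand.
FakePage = Struct.new(:total_rows, :rows)

class FakeClient
  def initialize(ids)
    @rows = ids.sort.map { |id| { 'id' => id, 'doc' => { '_id' => id } } }
  end

  # Honors :startkey (inclusive) and :limit, like a view query.
  def view(_name, opts)
    start = opts[:startkey]
    rows = start ? @rows.select { |r| r['id'] >= start } : @rows
    FakePage.new(@rows.size, rows.first(opts[:limit]))
  end
end

# Same paging strategy as #record_rows, plus a guard for empty pages.
def each_row(client, view, limit)
  Enumerator.new do |e|
    view_opts = { include_docs: true, stream: false, limit: limit }
    retrieved = 0
    loop do
      page = client.view(view, view_opts)
      break if page.rows.empty? # guard added in this sketch only
      retrieved += page.rows.size
      page.rows.each do |row|
        next if row['id'].start_with?('_design')
        e.yield(row)
      end
      break if retrieved == page.total_rows
      # Appending '0' gives a key that sorts after the last id seen.
      view_opts[:startkey] = page.rows.last['id'] + '0'
    end
  end.lazy
end

client = FakeClient.new(%w[_design/x a b c d e])
ids = each_row(client, '_all_docs', 2).map { |r| r['id'] }.to_a
puts ids.inspect # prints ["a", "b", "c", "d", "e"]
```

With a limit of 2, the sketch issues three requests and skips the design document in the first batch.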
#records(opts = {}) ⇒ Enumerator
Makes requests to a CouchDB view to yield documents.
The following will only send requests to the endpoint until it has 1000 records:
records.take(1000)
Records are requested in batches to avoid using `Analysand::StreamingViewResponse`. The CouchDB `startkey` parameter is used to locate the next page of records because it is more efficient than `skip`.
# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 86

def records(opts = {})
  view = opts[:view] || @opts[:view]
  limit = opts[:limit] || @opts[:limit]
  record_rows(view, limit).map do |row|
    @record_class.build(
      mint_id(row['doc']['_id']),
      row['doc'].to_json,
      'application/json'
    )
  end
end
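The row-to-record mapping in `#records` can be sketched with plain Ruby objects. `FakeRecord` is a hypothetical stand-in for the harvester's record class, and the sketch omits `mint_id`, using the document `_id` directly.

```ruby
require 'json'

# Hypothetical stand-in for the record class; Krikri's own builder differs.
FakeRecord = Struct.new(:id, :content, :content_type)

# One view row with its embedded document, as returned with include_docs.
rows = [{ 'id' => 'rec1', 'doc' => { '_id' => 'rec1', 'title' => 'A' } }].lazy

# Map each row to a record holding the document serialized as JSON.
records = rows.map do |row|
  FakeRecord.new(row['doc']['_id'], row['doc'].to_json, 'application/json')
end

first = records.first
puts first.content # prints {"_id":"rec1","title":"A"}
```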