Class: Krikri::Harvesters::CouchdbHarvester

Inherits:
Object
Includes:
Krikri::Harvester
Defined in:
lib/krikri/harvesters/couchdb_harvester.rb

Overview

A harvester implementation for CouchDB

Constant Summary

Constants included from Krikri::Harvester

Krikri::Harvester::Registry

Instance Attribute Summary

Attributes included from Krikri::Harvester

#name, #uri

Attributes included from SoftwareAgent

#entity_behavior

Class Method Summary

Instance Method Summary

Methods included from Krikri::Harvester

#run

Methods included from SoftwareAgent

#agent_name, #run

Constructor Details

#initialize(opts = {}) ⇒ CouchdbHarvester

Returns a new instance of CouchdbHarvester.

Parameters:

  • opts (Hash) (defaults to: {})

    options to pass through to client requests, read from the :couchdb key. If :view is not specified, it defaults to the CouchDB `_all_docs` view; if :limit is not specified, it defaults to 10.

# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 19

def initialize(opts = {})
  super
  @opts = opts.fetch(:couchdb, view: '_all_docs')
  @opts[:view] ||= '_all_docs'
  @opts[:limit] ||= 10
  @client = Analysand::Database.new(uri)
end
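
The option defaulting in the constructor can be tried in isolation. The following is a minimal sketch, not the library's code: `couchdb_opts` is a hypothetical helper, and the `dup` call is an addition so that the caller's hash is not modified.

```ruby
# Hypothetical helper mirroring the defaulting performed in #initialize.
def couchdb_opts(opts = {})
  # Use the :couchdb sub-hash if present, otherwise start from the default view.
  # dup is an addition in this sketch to avoid mutating the caller's hash.
  couch = opts.fetch(:couchdb, view: '_all_docs').dup
  couch[:view] ||= '_all_docs'
  couch[:limit] ||= 10
  couch
end

defaults    = couchdb_opts({})
custom_view = couchdb_opts(couchdb: { view: 'foo/bar' })
custom_size = couchdb_opts(couchdb: { limit: 50 })
```

Here `defaults` holds the `_all_docs` view with a limit of 10, while the other two calls keep the supplied value and fill in only the missing key.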

Instance Attribute Details

#client ⇒ Object

Returns the value of attribute client.



# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 8

def client
  @client
end

Class Method Details

.expected_opts ⇒ Object

See Also:

  • Krikri::Harvester::expected_opts


# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 138

def self.expected_opts
  {
    key: :couchdb,
    opts: {
      view: { type: :string, required: false }
    }
  }
end

Instance Method Details

#count(opts = {}) ⇒ Fixnum

Returns the total number of documents reported by a CouchDB view, excluding CouchDB design documents.

Parameters:

  • opts (Hash) (defaults to: {})

    Analysand::Database#view options

    • view: database view name

Returns:

  • (Fixnum)


# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 54

def count(opts = {})
  view = opts[:view] || @opts[:view]
  # The count that we want is the total documents in the database minus
  # CouchDB design documents.  Asking for the design documents will give us
  # the total count in addition to letting us determine the number of
  # design documents.
  v = client.view(view,
                  include_docs: false,
                  stream: false,
                  startkey: '_design',
                  endkey: '_design0')
  total = v.total_rows
  design_doc_count = v.keys.size
  total - design_doc_count
end
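
The arithmetic in #count, and the reason `'_design0'` works as an upper bound, can be illustrated without a database. This is a sketch under stated assumptions: `ViewResponse` and `non_design_count` are hypothetical stand-ins, and the key comparison uses Ruby's byte-wise string ordering, which is assumed here to match how the view orders these keys.

```ruby
# Hypothetical stand-in for the object returned by client.view.
ViewResponse = Struct.new(:total_rows, :keys)

# Same subtraction as #count: all rows minus the design documents returned.
def non_design_count(response)
  response.total_rows - response.keys.size
end

resp = ViewResponse.new(12, ['_design/a', '_design/b'])
doc_total = non_design_count(resp)

# '0' is the character immediately after '/' in ASCII, so every key of the
# form '_design/...' falls between '_design' and '_design0'.
in_range     = '_design/views'.between?('_design', '_design0')
out_of_range = 'doc1'.between?('_design', '_design0')
```

With 12 total rows and 2 design documents, `doc_total` is 10; `in_range` is true and `out_of_range` is false.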

#get_record(identifier) ⇒ Object

Retrieves a specific document from CouchDB.

Uses Analysand::Database#get!, which raises an exception if the document cannot be found.

See Also:

  • Analysand::Database#get!


# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 131

def get_record(identifier)
  doc = client.get!(CGI.escape(identifier)).body.to_json
  @record_class.build(mint_id(identifier), doc, 'application/json')
end
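
The identifier is escaped before it is used in the request. A small sketch of what `CGI.escape` from Ruby's standard library does to an identifier containing a slash; the identifier `'item/123'` is made up for illustration.

```ruby
require 'cgi'

# A made-up identifier containing a character that is significant in URLs.
identifier = 'item/123'

escaped   = CGI.escape(identifier)   # the slash becomes %2F
roundtrip = CGI.unescape(escaped)    # recovers the original identifier
```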

#record_ids(opts = {}) ⇒ Object

Streams a response from a CouchDB view to yield identifiers.

The following will only send requests to the endpoint until it has 1000 record ids:

record_ids.take(1000)

See Also:

  • Analysand::Viewing
  • Analysand::StreamingViewResponse


# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 37

def record_ids(opts = {})
  view = opts[:view] || @opts[:view]
  # The set of record ids is all of the record IDs in the database minus
  # the IDs of CouchDB design documents.
  view_opts = {include_docs: false, stream: true}
  client.view(view, view_opts).keys.lazy.select do |k|
    !k.start_with?('_design')
  end
end
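
The lazy filtering used by #record_ids can be sketched with a plain enumerator in place of the streamed view keys. The key list and the `pulled` counter are hypothetical and exist only to show how much of the source is consumed. Note that `take` on a lazy enumerator is itself lazy, so a call such as `to_a` or `first` is what actually drives the iteration.

```ruby
pulled = 0

# Stand-in for the streamed keys of a view; counts how many keys are read.
keys = Enumerator.new do |y|
  ['_design/views', 'doc1', 'doc2', 'doc3', 'doc4'].each do |k|
    pulled += 1
    y << k
  end
end

# Same filter as #record_ids: drop design document ids, lazily.
ids = keys.lazy.select { |k| !k.start_with?('_design') }

first_two = ids.take(2).to_a
```

After this runs, `first_two` contains `doc1` and `doc2`, and `pulled` shows that the source was not read to the end.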

#record_rows(view, limit) ⇒ Enumerator

Return an enumerator that provides individual records from batched view requests.

Returns:

  • (Enumerator)

# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 104

def record_rows(view, limit)
  en = Enumerator.new do |e|
    view_opts = {include_docs: true, stream: false, limit: limit}
    rows_retrieved = 0
    total_rows = nil
    loop do
      v = client.view(view, view_opts)
      total_rows ||= v.total_rows
      rows_retrieved += v.rows.size
      v.rows.each do |row|
        next if row['id'].start_with?('_design')
        e.yield row
      end
      break if rows_retrieved == total_rows
      view_opts[:startkey] = v.rows.last['id'] + '0'
    end
  end
  en.lazy
end
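
The batching loop can be exercised against an in-memory stand-in. This is a sketch, not the library's implementation: `FakeCouch` and `batched_rows` are hypothetical, the fake orders ids with Ruby's byte-wise string comparison, and the guard against an empty batch is an addition that the method above does not have.

```ruby
# Hypothetical in-memory stand-in for a CouchDB view endpoint.
class FakeCouch
  Response = Struct.new(:total_rows, :rows)
  attr_reader :requests

  def initialize(ids)
    @ids = ids.sort
    @requests = 0
  end

  # Returns up to `limit` rows whose id is >= startkey, plus the overall total.
  def view(limit:, startkey: nil)
    @requests += 1
    rows = @ids.select { |id| startkey.nil? || id >= startkey }
               .first(limit)
               .map { |id| { 'id' => id } }
    Response.new(@ids.size, rows)
  end
end

# Batched iteration in the style of #record_rows.
def batched_rows(client, limit)
  Enumerator.new do |e|
    startkey = nil
    retrieved = 0
    loop do
      v = client.view(limit: limit, startkey: startkey)
      break if v.rows.empty? # guard added in this sketch
      retrieved += v.rows.size
      v.rows.each do |row|
        next if row['id'].start_with?('_design')
        e.yield row
      end
      break if retrieved >= v.total_rows
      # Start the next batch just after the last id seen.
      startkey = v.rows.last['id'] + '0'
    end
  end.lazy
end

couch   = FakeCouch.new(%w[_design/v a b c d e])
all_ids = batched_rows(couch, 2).map { |row| row['id'] }.to_a
```

With six rows and a batch size of 2, the sketch issues three requests and yields the five non-design ids in order. Asking for fewer rows, for example with `first(3)`, stops requesting batches once enough rows have been yielded.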

#records(opts = {}) ⇒ Enumerator

Makes requests to a CouchDB view to yield documents.

The following will only send requests to the endpoint until it has 1000 records:

records.take(1000)

Records are requested in batches to avoid using `Analysand::StreamingViewResponse`. The CouchDB `startkey` parameter is used to locate the next page of records, which is more efficient than `skip`.

Returns:

  • (Enumerator)

# File 'lib/krikri/harvesters/couchdb_harvester.rb', line 86

def records(opts = {})
  view = opts[:view] || @opts[:view]
  limit = opts[:limit] || @opts[:limit]
  record_rows(view, limit).map do |row|
    @record_class.build(
      mint_id(row['doc']['_id']),
      row['doc'].to_json,
      'application/json'
    )
  end
end
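
The mapping from view rows to records can be sketched as follows. `SketchRecord` and `build_records` are hypothetical stand-ins for `@record_class.build` and the method above, and the id is passed through unchanged here rather than minted.

```ruby
require 'json'

# Hypothetical stand-in for the record class used by the harvester.
SketchRecord = Struct.new(:id, :content, :content_type)

# Lazily turn rows (with included docs) into record objects.
def build_records(rows)
  rows.lazy.map do |row|
    SketchRecord.new(
      row['doc']['_id'],       # the real method mints an id from this value
      row['doc'].to_json,      # document body serialized as JSON
      'application/json'
    )
  end
end

rows = [{ 'id' => 'doc1', 'doc' => { '_id' => 'doc1', 'title' => 'A' } }]
rec  = build_records(rows).first
```

Because the mapping is lazy, a record is built only when a consumer asks for it, for example with `first` or `to_a`.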