DPN Synchronization

An application for synchronizing DPN registry data from remote nodes, using the Sidekiq background jobs framework.


  • the DPN nodes are defined in config/settings.yml
    • the settings are handled by DPN::Workers
    • a set of DPN nodes is loaded by DPN::Workers.nodes
  • a set of DPN nodes is modeled by the DPN::Workers::Nodes class
    • it requires a local_namespace to identify a local_node
    • it makes an important distinction between a local_node and remote_nodes
    • it has methods to sync data from remote_nodes into the local_node
    • the DPN::Workers::SyncWorker is a Sidekiq::Worker
    • subclasses of DPN::Workers::Sync implement #sync
      • they use DPN::Workers::JobData for tracking success
  • a node is modeled by the DPN::Workers::Node class
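The class relationships above can be sketched in plain Ruby. This is a minimal, hypothetical sketch, not the project's code: the attribute names follow the nodes settings, but the method bodies, the in-memory `JobData` store and the `SyncNodes` subclass are assumptions made for illustration. The real classes run under Sidekiq and call remote HTTP APIs.

```ruby
module DPNSketch
  # A single node, with the attributes used in the nodes settings.
  Node = Struct.new(:namespace, :api_root, :auth_credential, keyword_init: true)

  # A set of nodes, split into the local node and the remote nodes.
  class Nodes
    attr_reader :nodes, :local_namespace

    def initialize(nodes, local_namespace)
      @nodes = nodes
      @local_namespace = local_namespace
      raise ArgumentError, "no node for #{local_namespace}" if local_node.nil?
    end

    def local_node
      nodes.find { |n| n.namespace == local_namespace }
    end

    def remote_nodes
      nodes.reject { |n| n.namespace == local_namespace }
    end
  end

  # Hypothetical stand-in for JobData: records the last successful sync time
  # per (job name, remote namespace) in memory, not in a persistent store.
  class JobData
    def initialize
      @success = {}
    end

    def success!(job, namespace, time = Time.now.utc)
      @success[[job, namespace]] = time
    end

    def last_success(job, namespace)
      @success[[job, namespace]]
    end
  end

  # Base class: each subclass implements #sync for one remote node.
  class Sync
    def initialize(local_node, remote_node, job_data)
      @local_node = local_node
      @remote_node = remote_node
      @job_data = job_data
    end

    def sync
      raise NotImplementedError, "#{self.class} must implement #sync"
    end
  end

  # Hypothetical subclass that only records success; a real one would fetch
  # records from the remote node's api_root and write them to the local node.
  class SyncNodes < Sync
    def sync
      @job_data.success!('sync_nodes', @remote_node.namespace)
      true
    end
  end
end
```

For example, `DPNSketch::Nodes.new(all_nodes, 'sdr').remote_nodes` returns every node except the local one, and each remote node gets its own `Sync` instance so that success can be tracked per remote node.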


Getting Started

  git clone
  cd dpn-sync
  bundle install
  # Start the Sidekiq daemon to run background jobs; some
  # jobs are managed by sidekiq-cron, see config/schedule.yml
  bundle exec rake sidekiq:service:start
  # Start the Sidekiq dashboard at http://localhost:9292/
  bundle exec rackup
  # Explore the dashboard web pages, then press
  # Ctrl-C to stop rackup, and stop the Sidekiq daemon
  bundle exec rake sidekiq:service:stop


The config gem provides several layers of specificity for settings; see

Configuring Nodes

The most important values in Settings are the nodes definitions and the local_namespace, which must match the namespace of one of the nodes. These values should be derived from the Node table of the dpn-server project. From a Rails console (rails c) in the dpn-server project, the nodes data can be dumped as YAML using:

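require 'yaml'
# Assumes the dpn-server Node model; string keys give plain YAML keys.
yml = Node.all.map do |n|
  {
    'namespace' => n.namespace,
    'api_root' => n.api_root,
    'auth_credential' => n.auth_credential
  }
end.to_yaml
puts yml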

Note that the auth_credential values are private and should be kept secret.
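Once dumped, the nodes and the local_namespace go into the settings files. The sketch below assumes a layout with top-level `local_namespace` and `nodes` keys (check config/settings.yml for the actual key names) and shows a sanity check that the local namespace belongs to one of the nodes. The `check_settings` helper is hypothetical, and the hostnames and tokens are placeholders.

```ruby
require 'yaml'

# Assumed shape of the nodes settings; the values are placeholders.
SETTINGS_YAML = <<~YAML
  local_namespace: sdr
  nodes:
    - namespace: sdr
      api_root: https://dpn.sdr.example.org/
      auth_credential: sdr_token
    - namespace: aptrust
      api_root: https://dpn.aptrust.example.org/
      auth_credential: aptrust_token
YAML

# Parse the settings, confirm that local_namespace names one of the nodes,
# and return the list of node namespaces.
def check_settings(yaml_text)
  settings = YAML.safe_load(yaml_text)
  namespaces = settings.fetch('nodes').map { |n| n.fetch('namespace') }
  local = settings.fetch('local_namespace')
  unless namespaces.include?(local)
    raise ArgumentError, "local_namespace #{local.inspect} is not in #{namespaces.inspect}"
  end
  namespaces
end
```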

The node information can also be retrieved from the HTTP REST API. The response includes many details, including the namespace and api_root values required for the settings, but not the auth_credential values. For example, when the dpn-server cluster is running locally, it can be retrieved using:

curl -k -H "Authorization: Token token=aptrust_token" -L

An abridged response looks like:

  {
    "count": 5,
    "next": null,
    "previous": null,
    "results": [{
      "name": "APTrust",
      "namespace": "aptrust",
      "api_root": ""
    }, {
      "name": "Chronopolis",
      "namespace": "chron",
      "api_root": ""
    }, {
      "name": "Hathi Trust",
      "namespace": "hathi",
      "api_root": ""
    }, {
      "name": "Stanford Digital Repository",
      "namespace": "sdr",
      "api_root": ""
    }, {
      "name": "Texas Digital Repository",
      "namespace": "tdr",
      "api_root": ""
    }]
  }

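The same request can be made from Ruby with the standard library. In this sketch the caller passes the full nodes endpoint URL (the path is not given here), the `Token token=...` header mirrors the curl example, and unlike `curl -L` it does not follow redirects. The method names are hypothetical; `parse_nodes` keeps only the namespace and api_root fields, since the response has no auth_credential values.

```ruby
require 'json'
require 'net/http'
require 'openssl'
require 'uri'

# GET a node listing from a DPN REST API. `url` is the full endpoint URL
# and `token` is the API token; both are supplied by the caller.
def fetch_nodes_json(url, token, verify_ssl: true)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  # Equivalent of curl -k, for self-signed certificates on a local cluster.
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE unless verify_ssl
  request = Net::HTTP::Get.new(uri)
  request['Authorization'] = "Token token=#{token}"
  response = http.request(request)
  raise "HTTP #{response.code} from #{url}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Keep only the namespace and api_root of each result.
def parse_nodes(json_text)
  JSON.parse(json_text).fetch('results').map do |node|
    { 'namespace' => node.fetch('namespace'), 'api_root' => node.fetch('api_root') }
  end
end
```

Keeping the parsing separate from the HTTP call means the response handling can be checked against a saved response without a running cluster.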
Configuring Test Cluster

In development, the dpn-server project can run a local test cluster; the default nodes settings in config/settings.yml should work with that cluster. See

Environment Variables

  • Environment variables can be set in several places, listed in the following order of importance:
    • On deployed apps running under Apache/Passenger, see /etc/httpd/conf.d/z*
      • the content of these config files is managed by Puppet
    • On the command line, e.g. RACK_ENV=production bundle exec rackup
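Wherever it is set, the app sees the value through ENV. A minimal sketch, assuming the environment name is read from RACK_ENV with a development default and used to pick the environment-specific settings file; the helper names and the fallback value are hypothetical.

```ruby
# Resolve the deployment environment from RACK_ENV, defaulting to development.
def settings_env(env = ENV)
  value = env['RACK_ENV'].to_s.strip
  value.empty? ? 'development' : value
end

# Path of the environment-specific settings file for that environment.
def env_settings_file(root = 'config', env = ENV)
  File.join(root, 'settings', "#{settings_env(env)}.yml")
end
```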


Capistrano is configured to run all the deployments; see cap -T for all the options. Private configuration files are kept in the DLSS shared_configs, in a branch named like dpn-*-sync. The generic settings.yml should contain config parameters that are independent of the deployment environment, whereas settings/{environment}.yml (e.g. development.yml or production.yml) should contain the nodes and other details that are specific to the deployment network.
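The split between generic and environment-specific settings can be pictured as a deep merge in which the more specific file overrides the generic one. This sketch is not the config gem's implementation: it assumes just two layers held as YAML strings, merges nested hashes recursively, and replaces every other value, including arrays such as nodes. The `http` keys are hypothetical.

```ruby
require 'yaml'

# Recursively merge `override` into `base`: nested hashes are merged,
# any other value in `override` replaces the one in `base`.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Merge layers given in order of increasing specificity.
def load_layers(*yaml_texts)
  yaml_texts.map { |t| YAML.safe_load(t) || {} }.reduce({}) { |acc, h| deep_merge(acc, h) }
end

# Generic layer (settings.yml): environment-independent values.
GENERIC = <<~YAML
  local_namespace: sdr
  http:
    timeout: 30
    retries: 3
YAML

# Environment layer (settings/production.yml): nodes and overrides.
PRODUCTION = <<~YAML
  http:
    timeout: 60
  nodes:
    - namespace: sdr
      api_root: https://dpn.sdr.example.org/
YAML
```

Here `load_layers(GENERIC, PRODUCTION)` keeps `retries` from the generic layer, takes `timeout` from the production layer, and adds the production nodes.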


Capistrano can start and stop the Sidekiq service. The tasks include:

cap sidekiq:quiet                  # Quiet sidekiq (stop processing new tasks)
cap sidekiq:respawn                # Respawn missing sidekiq processes
cap sidekiq:restart                # Restart sidekiq
cap sidekiq:rolling_restart        # Rolling-restart sidekiq
cap sidekiq:start                  # Start sidekiq
cap sidekiq:stop                   # Stop sidekiq


There are rake tasks for queuing dpn-sync jobs and for inspecting Sidekiq through its API. All the tasks can be listed using bundle exec rake -T, for example:

rake dpn:sync:bags                  # DPN - queue a job to fetch bag meta-data from remote nodes
rake dpn:sync:members               # DPN - queue a job to fetch member meta-data from remote nodes
rake dpn:sync:nodes                 # DPN - queue a job to fetch node meta-data from remote nodes
rake dpn:sync:replications          # DPN - queue a job to fetch replication request meta-data from remote nodes
rake sidekiq:default_queue:clear    # Sidekiq - clear the default queue
rake sidekiq:default_queue:entries  # Sidekiq - default queue entries
rake sidekiq:stats:all              # Sidekiq - statistics - all
rake sidekiq:stats:history[days]    # Sidekiq - statistics - history[days]
rake sidekiq:stats:reset            # Sidekiq - statistics - reset
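Each dpn:sync task only queues a job; a Sidekiq worker does the fetching in the background. The Rakefile-style sketch below shows that pattern with Rake's DSL. It is an assumption-laden illustration: in the real tasks the enqueue step would be a Sidekiq call (for example `perform_async` on a worker class), whereas here a hypothetical `ENQUEUED` array stands in so the example runs without Redis.

```ruby
require 'rake'

# Hypothetical stand-in for a Sidekiq queue.
ENQUEUED = []

# Define one task per record type, mirroring the dpn:sync:* names above.
module DPNSyncTasks
  extend Rake::DSL

  RECORD_TYPES = {
    'bags' => 'bag',
    'members' => 'member',
    'nodes' => 'node',
    'replications' => 'replication request'
  }.freeze

  namespace :dpn do
    namespace :sync do
      RECORD_TYPES.each do |type, noun|
        desc "DPN - queue a job to fetch #{noun} meta-data from remote nodes"
        task type.to_sym do
          # A real task would enqueue a Sidekiq job here instead.
          ENQUEUED << { job: 'sync', type: type }
        end
      end
    end
  end
end
```

Queuing rather than syncing inline keeps the rake task fast and lets Sidekiq handle retries and concurrency.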


  • To get a console: bundle exec rackup -d

    • if anything goes wrong, look at log/rack_debug.log
    • if the dpn-server cluster is running, the following works:
    #=> [true, true, true, true, true]
  • To see and test jobs:

    • bundle exec sidekiq -C ./config/sidekiq.yml -r ./config/initializers/sidekiq.rb
    • in another shell, run bundle exec rackup
    • use a browser to open http://localhost:9292
    • use the /test page to check that messages are processed by a worker
    • use the /sidekiq dashboard