braavos - Backup/Restore Tool

Introduction

This document describes the implementation of braavos, a backup/restore utility deployed on Cassandra and ElasticSearch servers. Its scope covers:

  • the management of backup data on S3

    • Backup operations

    • Restore operations

    • Pruning executions

  • full backup strategies (IMPLEMENTED)

  • restore capabilities

    • rewind to previous state

    • restore fallen node (IMPLEMENTED)

    • restore from different cluster (IMPLEMENTED)

  • quick replacement strategy (using an EBS volume)

  • secondary backup strategy (Google Drive)

Versions Targeted:

  • Cassandra: 1.2

  • ElasticSearch: 0.90

Terminology

Service : Cassandra (cass) or ElasticSearch (es)

Node : A running instance of a service.

Cluster : A named cluster of nodes for a service.

Design

Each service’s backups are maintained independently, whether the services run on dedicated or shared hardware. Nodes are numbered, and each node number is linked to its AWS instance ID. A backup operation validates the cluster name, environment and node name before proceeding with any mutation.

Cluster Layout

cluster.json

{
  "environment": "production",
  "name": "place_directory",
  "nodes": {
    "1": {"instance_id": "i-abcdefed" },
    "2": {"instance_id": "i-abcdefec" },
    "3": {"instance_id": "i-abcdefeb" }
  }
}
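
The validation described under Design can be sketched in Ruby as follows. This is a minimal illustration, not braavos's actual code: the helper name validate_cluster and its keyword arguments are hypothetical, and the instance ID is passed in rather than looked up from AWS.

```ruby
require 'json'

# Hypothetical helper (not part of braavos): checks a cluster.json document
# against the expected name and environment, then returns the node number
# whose instance_id matches the given instance.
def validate_cluster(cluster_json, name:, environment:, instance_id:)
  cluster = JSON.parse(cluster_json)
  unless cluster['name'] == name
    raise ArgumentError, "cluster name mismatch: #{cluster['name']} != #{name}"
  end
  unless cluster['environment'] == environment
    raise ArgumentError, "environment mismatch: #{cluster['environment']} != #{environment}"
  end
  # Each entry is a [node_number, attributes] pair.
  node = cluster.fetch('nodes', {}).find { |_, attrs| attrs['instance_id'] == instance_id }
  raise ArgumentError, "instance #{instance_id} is not a member of #{name}" if node.nil?
  node.first
end

CLUSTER_JSON = <<~JSON
  {
    "environment": "production",
    "name": "place_directory",
    "nodes": {
      "1": {"instance_id": "i-abcdefed" },
      "2": {"instance_id": "i-abcdefec" },
      "3": {"instance_id": "i-abcdefeb" }
    }
  }
JSON

node = validate_cluster(CLUSTER_JSON, name: 'place_directory',
                        environment: 'production', instance_id: 'i-abcdefec')
puts "running as node #{node}"
```

Raising before any S3 write keeps a misconfigured host from mutating another cluster's backups.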

Data Layout

General

/backup_bucket/name/environment/service/
/backup_bucket/name/environment/service/cluster.json
/backup_bucket/name/environment/service/full/datetime/node/
/backup_bucket/name/environment/service/full/datetime/node/_COMPLETED
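
As a rough sketch of how the general layout composes, the Ruby below builds the key paths from their parts. The helper names are hypothetical, the datetime format is borrowed from the restore example (full/20140321), and treating _COMPLETED as the marker of a finished backup is an assumption based on its name.

```ruby
# Hypothetical helpers; names are illustrative and not part of braavos.

def service_prefix(bucket:, name:, environment:, service:)
  "/#{bucket}/#{name}/#{environment}/#{service}"
end

def full_backup_dir(prefix, datetime, node)
  "#{prefix}/full/#{datetime}/#{node}/"
end

# Assumption: a backup directory counts as finished only once its
# _COMPLETED marker key exists.
def completed_backup_dirs(keys)
  keys.select { |key| File.basename(key) == '_COMPLETED' }
      .map { |key| "#{File.dirname(key)}/" }
      .sort
end

prefix = service_prefix(bucket: 'backup_bucket', name: 'place_directory',
                        environment: 'production', service: 'cassandra')
keys = [
  "#{full_backup_dir(prefix, '20140321', 1)}_COMPLETED",
  "#{full_backup_dir(prefix, '20140321', 1)}schema.cql",
  "#{full_backup_dir(prefix, '20140322', 1)}schema.cql"
]
# Only the 20140321 backup has a marker, so only that directory is listed.
puts completed_backup_dirs(keys)
```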

Cassandra

/cassandra/data/node/keyspace/table/place_directory_production-Places-ic-218008.tgz ...
/cassandra/full/datetime/node/contents.json
/cassandra/full/datetime/node/system.tar.gz
/cassandra/full/datetime/node/schema.cql
/cassandra/full/datetime/node/ring.txt

ElasticSearch

/elasticsearch/full/datetime/node/cluster_state.local.json
/elasticsearch/full/datetime/node/cluster_state.global.json
/elasticsearch/full/datetime/node/index_name.restore.sh
/elasticsearch/full/datetime/node/index_name.tar.gz

Braavos Config

cassandra.yml

name: place_directory
environment: production
service: cassandra
bucket_name: backup_bucket
data_loc: /var/lib/cassandra/data
sync_loc: /mnt/cassandra_snapshots/latest
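
A config file of this shape could be loaded and checked along the following lines. This is a sketch only: load_config is a hypothetical name, and treating all six keys as required is an assumption rather than documented braavos behaviour.

```ruby
require 'yaml'

# Assumption: every key shown in the sample config is mandatory.
REQUIRED_KEYS = %w[name environment service bucket_name data_loc sync_loc].freeze

# Hypothetical loader (not part of braavos): parses the YAML text and
# fails early when a required key is absent.
def load_config(yaml_text)
  config = YAML.safe_load(yaml_text)
  raise ArgumentError, 'config must be a mapping' unless config.is_a?(Hash)
  missing = REQUIRED_KEYS - config.keys
  raise ArgumentError, "missing config keys: #{missing.join(', ')}" unless missing.empty?
  config
end

CONFIG_YAML = <<~YAML
  name: place_directory
  environment: production
  service: cassandra
  bucket_name: backup_bucket
  data_loc: /var/lib/cassandra/data
  sync_loc: /mnt/cassandra_snapshots/latest
YAML

config = load_config(CONFIG_YAML)
puts "#{config['service']} backups go to #{config['bucket_name']}"
```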

Operations

Initiate a full backup: (IMPLEMENTED)

braavos -c cassandra.yml backup full

List available backups: (IMPLEMENTED)

braavos -c cassandra.yml show

Verify backup:

braavos -c cassandra.yml verify backup_loc

Restore backup: (IMPLEMENTED)

braavos -c cassandra.yml restore backup_loc restore_loc

Restore across clusters: (IMPLEMENTED)

braavos -c cassandra.yml restore --cluster 'name:environment:node_id' backup_loc restore_loc
Example: ./bin/braavos -c config/braavos.yml restore --cluster 'place_directory:production:1' full/20140321 /tmp/restore
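
The --cluster argument packs three fields into one colon-separated string. A minimal Ruby sketch of splitting it is shown here; ClusterSpec and parse_cluster_spec are hypothetical names, and rejecting empty fields is an assumption about the desired behaviour.

```ruby
# Hypothetical value object for the 'name:environment:node_id' argument.
ClusterSpec = Struct.new(:name, :environment, :node_id)

def parse_cluster_spec(spec)
  # A limit of -1 keeps trailing empty fields so "a:b:" is rejected below.
  parts = spec.split(':', -1)
  unless parts.length == 3 && parts.none?(&:empty?)
    raise ArgumentError, "expected 'name:environment:node_id', got #{spec.inspect}"
  end
  ClusterSpec.new(*parts)
end

spec = parse_cluster_spec('place_directory:production:1')
puts "restoring from #{spec.name} (#{spec.environment}), node #{spec.node_id}"
```

The node ID stays a string here because cluster.json keys its nodes by string ("1", "2", "3").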

Prune backups:

braavos -c cassandra.yml prune

Sync latest (copy to EBS):

braavos sync -c cassandra.yml

Mentions

braavos depends on the following awesome components:

  • GNU Parallel - The Command-Line Power Tool: O. Tange (2011)

  • GNU coreutils - The GNU Core Utils, including timeout.

  • s3cmd - Command Line S3 Client. Please use version 1.5.0-beta1 or greater.

  • Ruby - A Programmer’s Best Friend (including ERB templating)