braavos - Backup/Restore Tool
Introduction
This document covers the implementation of braavos, a backup/restore utility deployed on Cassandra and ElasticSearch servers. It implements:
- the management of backup data on S3
  - Backup operations
  - Restore operations
  - Prune operations
- full backup strategies (IMPLEMENTED)
- restore capabilities
  - rewind to previous state
  - restore fallen node (IMPLEMENTED)
  - restore from different cluster (IMPLEMENTED)
- quick replacement strategy (using EBS volume)
- secondary backup strategy (Google Drive)
Versions Targeted:
- Cassandra: 1.2
- ElasticSearch: 0.90
Terminology
Service: Cassandra (cass) or ElasticSearch (es).
Node: A running instance of a service.
Cluster: A named group of nodes running the same service.
Design
Backups for each service are maintained independently, whether the services run on separate or shared hardware. Nodes are numbered, and each number is mapped to an AWS instance ID. Before making any change, a backup operation validates the cluster name, environment and node name.
Cluster Layout
cluster.json
{
  "environment": "production",
  "name": "place_directory",
  "nodes": {
    "1": { "instance_id": "i-abcdefed" },
    "2": { "instance_id": "i-abcdefec" },
    "3": { "instance_id": "i-abcdefeb" }
  }
}
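The validation step described under Design can be sketched in Ruby against cluster.json. This is a minimal illustration, not braavos's actual code: the method name validate_cluster! is hypothetical, and it assumes the local node is identified by looking up its EC2 instance id in the "nodes" map (on a real host that id would come from the EC2 metadata service; here it is hard-coded).

```ruby
require 'json'

# Hypothetical pre-flight check: confirm the local node belongs to the
# expected cluster before any backup mutation. Returns the node number
# (a String key from cluster.json) that owns the given EC2 instance id.
def validate_cluster!(cluster_json, expected_name:, expected_env:, instance_id:)
  cluster = JSON.parse(cluster_json)

  if cluster['name'] != expected_name
    raise ArgumentError, "cluster name mismatch: #{cluster['name'].inspect}"
  end
  if cluster['environment'] != expected_env
    raise ArgumentError, "environment mismatch: #{cluster['environment'].inspect}"
  end

  # Hash#find yields [node_id, node] pairs; nil means the instance is unknown.
  node_id, _node = cluster.fetch('nodes', {}).find { |_id, node| node['instance_id'] == instance_id }
  raise ArgumentError, "instance #{instance_id} is not listed in cluster.json" if node_id.nil?

  node_id
end

manifest = <<~JSON
  {
    "environment": "production",
    "name": "place_directory",
    "nodes": {
      "1": { "instance_id": "i-abcdefed" },
      "2": { "instance_id": "i-abcdefec" },
      "3": { "instance_id": "i-abcdefeb" }
    }
  }
JSON

# Instance id hard-coded for illustration only.
puts validate_cluster!(manifest, expected_name: 'place_directory',
                                 expected_env: 'production',
                                 instance_id: 'i-abcdefec')
# prints 2
```

Raising before any upload or delete keeps a misconfigured node from writing into another cluster's prefix.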
Data Layout
General
/backup_bucket/name/environment/service/
/backup_bucket/name/environment/service/cluster.json
/backup_bucket/name/environment/service/full/datetime/node/
/backup_bucket/name/environment/service/full/datetime/node/_COMPLETED
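The general layout above can be expressed as a few path helpers. This Ruby sketch is an assumption-laden illustration: the BackupLayout module and its methods are hypothetical, the datetime value "20140321" mirrors the restore example later in this document, and treating a node's backup as finished only once _COMPLETED is present is an inferred reading of the marker's purpose.

```ruby
# Hypothetical helpers mirroring the documented S3 layout:
#   /backup_bucket/name/environment/service/full/datetime/node/_COMPLETED
module BackupLayout
  module_function

  def service_prefix(bucket, name, env, service)
    "/#{bucket}/#{name}/#{env}/#{service}"
  end

  def cluster_manifest(bucket, name, env, service)
    "#{service_prefix(bucket, name, env, service)}/cluster.json"
  end

  def node_prefix(bucket, name, env, service, datetime, node)
    "#{service_prefix(bucket, name, env, service)}/full/#{datetime}/#{node}/"
  end

  def completed_marker(bucket, name, env, service, datetime, node)
    "#{node_prefix(bucket, name, env, service, datetime, node)}_COMPLETED"
  end

  # Assumed semantics: a node's backup counts as finished only when its
  # marker appears in the bucket listing.
  def node_complete?(listed_keys, *location)
    listed_keys.include?(completed_marker(*location))
  end
end

location = ['backup_bucket', 'place_directory', 'production', 'cassandra', '20140321', '1']
puts BackupLayout.node_prefix(*location)
# prints /backup_bucket/place_directory/production/cassandra/full/20140321/1/
```

Under this reading, the marker would be uploaded last, so an interrupted run leaves a prefix without _COMPLETED rather than one that looks valid.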
Cassandra
/cassandra/data/node/keyspace/table/place_directory_production-Places-ic-218008.tgz ...
/cassandra/full/datetime/node/contents.json
/cassandra/full/datetime/node/system.tar.gz
/cassandra/full/datetime/node/schema.cql
/cassandra/full/datetime/node/ring.txt
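The example data object name suggests one archive per SSTable generation: Cassandra 1.2 names SSTable files keyspace-table-ic-generation-Component.db, and place_directory_production-Places-ic-218008.tgz drops only the component suffix. Assuming that grouping, a Ruby sketch of how component files could be batched into archives follows. The constant and method names are hypothetical, and the real tool may package files differently.

```ruby
# Matches Cassandra 1.2 SSTable component names such as
# "place_directory_production-Places-ic-218008-Data.db" and captures the
# generation prefix "place_directory_production-Places-ic-218008".
SSTABLE_COMPONENT = /\A(?<prefix>.+-[a-z]{2}-\d+)-[A-Za-z0-9]+\.[a-z0-9]+\z/

# Hypothetical grouping step: map each "<prefix>.tgz" archive name to the
# component files that would be packed into it. Non-SSTable files are skipped.
def group_sstables(paths)
  groups = Hash.new { |hash, archive| hash[archive] = [] }
  paths.each do |path|
    match = SSTABLE_COMPONENT.match(File.basename(path))
    groups["#{match[:prefix]}.tgz"] << path if match
  end
  groups
end

files = %w[
  place_directory_production-Places-ic-218008-Data.db
  place_directory_production-Places-ic-218008-Index.db
  place_directory_production-Places-ic-218008-Filter.db
  place_directory_production-Places-ic-218009-Data.db
]
group_sstables(files).each { |archive, parts| puts "#{archive}: #{parts.size} file(s)" }
```

Because SSTables are immutable once written, archiving by generation means an unchanged generation could be uploaded once and referenced by later backups; whether braavos relies on this is not stated here.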
ElasticSearch
/elasticsearch/full/datetime/node/cluster_state.local.json
/elasticsearch/full/datetime/node/cluster_state.global.json
/elasticsearch/full/datetime/node/index_name.restore.sh
/elasticsearch/full/datetime/node/index_name.tar.gz
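Following the ElasticSearch layout above, one way to check a node backup for completeness is to derive the expected object names and diff them against a bucket listing. This Ruby sketch assumes the caller already knows the index names (for example from the saved cluster state) and has the listed keys in the same path form as the layout. The method name and the index "places" are hypothetical.

```ruby
# Hypothetical completeness check for one ElasticSearch node backup,
# following the documented layout: two cluster-state files plus a restore
# script and an archive per index. Returns the expected keys not listed.
def missing_es_objects(datetime, node, index_names, listed_keys)
  base = "/elasticsearch/full/#{datetime}/#{node}"
  expected = ["#{base}/cluster_state.local.json", "#{base}/cluster_state.global.json"]
  index_names.each do |index|
    expected << "#{base}/#{index}.restore.sh" << "#{base}/#{index}.tar.gz"
  end
  expected - listed_keys
end

listed = [
  '/elasticsearch/full/20140321/1/cluster_state.local.json',
  '/elasticsearch/full/20140321/1/cluster_state.global.json',
  '/elasticsearch/full/20140321/1/places.restore.sh'
]
p missing_es_objects('20140321', '1', ['places'], listed)
# prints ["/elasticsearch/full/20140321/1/places.tar.gz"]
```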
Braavos Config
cassandra.yml
name: place_directory
environment: production
service: cassandra
bucket_name: backup_bucket
data_loc: /var/lib/cassandra/data
sync_loc: /mnt/cassandra_snapshots/latest
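A Ruby sketch of loading and checking this config file. It assumes every key in the example above is required and that the service value is either "cassandra" or "elasticsearch" (matching the S3 path segments); both are assumptions, and load_braavos_config is a hypothetical name. YAML.safe_load is the standard Psych call that refuses to instantiate arbitrary Ruby objects.

```ruby
require 'yaml'

# Keys shown in the example cassandra.yml; treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = %w[name environment service bucket_name data_loc sync_loc].freeze
KNOWN_SERVICES = %w[cassandra elasticsearch].freeze

def load_braavos_config(yaml_text)
  config = YAML.safe_load(yaml_text)
  raise ArgumentError, 'config must be a mapping' unless config.is_a?(Hash)

  missing = REQUIRED_KEYS.reject { |key| config.key?(key) }
  raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

  unless KNOWN_SERVICES.include?(config['service'])
    raise ArgumentError, "unknown service: #{config['service'].inspect}"
  end
  config
end

sample = <<~YAML
  name: place_directory
  environment: production
  service: cassandra
  bucket_name: backup_bucket
  data_loc: /var/lib/cassandra/data
  sync_loc: /mnt/cassandra_snapshots/latest
YAML

config = load_braavos_config(sample)
puts config['bucket_name']
# prints backup_bucket
```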
Operations
Initiate a full backup: (IMPLEMENTED)
braavos -c cassandra.yml backup full
List available backups: (IMPLEMENTED)
braavos -c cassandra.yml show
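Listing backups amounts to scanning the service prefix for _COMPLETED markers. The Ruby sketch below is a hypothetical illustration of that idea, not the show command's implementation: it takes object keys as plain strings (how they are fetched from S3 is left out) and assumes datetime strings are fixed-width, so sorting them as strings is chronological.

```ruby
# Matches ".../full/<datetime>/<node>/_COMPLETED" anywhere in a key.
COMPLETED_KEY = %r{(?:\A|/)full/(?<datetime>[^/]+)/(?<node>[^/]+)/_COMPLETED\z}

# Hypothetical sketch of `show`: map each full-backup datetime to the
# nodes that finished it, oldest backup first.
def completed_backups(keys)
  found = Hash.new { |hash, datetime| hash[datetime] = [] }
  keys.each do |key|
    match = COMPLETED_KEY.match(key)
    found[match[:datetime]] << match[:node] if match
  end
  # Node ids are numbers, so sort them numerically rather than as strings.
  found.sort.map { |datetime, nodes| [datetime, nodes.sort_by(&:to_i)] }.to_h
end

listing = [
  'full/20140320/1/_COMPLETED',
  'full/20140320/2/_COMPLETED',
  'full/20140321/1/system.tar.gz',
  'full/20140321/1/_COMPLETED',
  'full/20140321/2/system.tar.gz'
]
completed_backups(listing).each do |datetime, nodes|
  puts "#{datetime}: completed on nodes #{nodes.join(', ')}"
end
```

In this example node 2 of 20140321 has data but no marker, so it is not reported as complete.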
Verify backup:
braavos -c cassandra.yml verify backup_loc
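Verify is not marked as implemented, and the document does not say how it will check objects. One common technique is comparing a local MD5 with the S3 ETag: for a single-part upload without SSE-KMS or SSE-C encryption, the ETag is the hex MD5 of the object, while multipart uploads produce an ETag of the form "hex-parts" that is not a plain MD5. A hypothetical Ruby sketch of that per-object check:

```ruby
require 'digest/md5'

# Hypothetical per-object verify check. Returns true/false for a plain
# MD5 ETag, or nil when the ETag comes from a multipart upload and
# cannot be compared this way.
def etag_matches?(local_bytes, etag)
  clean = etag.delete('"') # ETags are usually reported in double quotes
  return nil if clean.include?('-')

  Digest::MD5.hexdigest(local_bytes) == clean
end

p etag_matches?('hello', '"5d41402abc4b2a76b9719d911017c592"')
# prints true
```

Returning nil instead of false for multipart ETags keeps "could not check" distinct from "checked and corrupt".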
Restore backup: (IMPLEMENTED)
braavos -c cassandra.yml restore backup_loc restore_loc
Restore across clusters: (IMPLEMENTED)
braavos -c cassandra.yml restore --cluster 'name:environment:node_id' backup_loc restore_loc
Example: ./bin/braavos -c config/braavos.yml restore --cluster 'place_directory:production:1' full/20140321 /tmp/restore
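The --cluster argument packs three fields into one string. This Ruby sketch parses it and builds the S3 prefix a cross-cluster restore would read from. It assumes backup_loc is relative to the service prefix (as full/20140321 in the example suggests) and that the bucket and service come from the local config; ClusterSpec, parse_cluster_spec and restore_source_prefix are hypothetical names.

```ruby
# Hypothetical parser for --cluster 'name:environment:node_id'.
ClusterSpec = Struct.new(:name, :environment, :node_id)

def parse_cluster_spec(spec)
  parts = spec.split(':', -1) # -1 keeps trailing empty fields so "a:b:" is rejected
  unless parts.size == 3 && parts.none?(&:empty?)
    raise ArgumentError, "expected name:environment:node_id, got #{spec.inspect}"
  end
  ClusterSpec.new(*parts)
end

# Assumes backup_loc is relative to /bucket/name/environment/service/.
def restore_source_prefix(bucket, service, cluster, backup_loc)
  "/#{bucket}/#{cluster.name}/#{cluster.environment}/#{service}/#{backup_loc}/#{cluster.node_id}/"
end

source = parse_cluster_spec('place_directory:production:1')
puts restore_source_prefix('backup_bucket', 'cassandra', source, 'full/20140321')
# prints /backup_bucket/place_directory/production/cassandra/full/20140321/1/
```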
Prune backups:
braavos -c cassandra.yml prune
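The document does not specify a retention policy for prune. As a hedged illustration only, here is a Ruby sketch of a "keep the newest N completed backups" rule; the keep count is a hypothetical parameter, and the sketch assumes fixed-width datetime strings so that string order is chronological.

```ruby
# Hypothetical retention rule: keep the newest `keep` completed full
# backups and return the older ones as candidates for deletion.
def backups_to_prune(completed_datetimes, keep:)
  raise ArgumentError, 'keep must be at least 1' if keep < 1

  newest_first = completed_datetimes.uniq.sort.reverse
  newest_first.drop(keep)
end

p backups_to_prune(%w[20140318 20140321 20140319 20140320], keep: 2)
# prints ["20140319", "20140318"]
```

Counting only completed backups toward the limit means an interrupted run cannot push the last good backup out of the retention window.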
Sync latest (copy to EBS):
braavos -c cassandra.yml sync
Mentions
braavos uses and depends on the following awesome components:
- GNU Parallel - The Command-Line Power Tool: O. Tange (2011)
- GNU coreutils - The GNU core utilities, including timeout.
- s3cmd - Command-line S3 client. Please use version 1.5.0-beta1 or greater.
- Ruby - A Programmer’s Best Friend (including ERB templating)