Bolts

Bolts is a group of commands that help you debug EC2 instances and ECS containers quickly.

After I exhaust debugging an ECS service with CloudWatch Logs, I usually take it to the next step: I ssh into an instance with a running task or docker container and poke around to figure out the issue.

In order to find the instance with the service's docker container, I click around on the ECS console website until I find the container instance's DNS name and then paste it into the terminal. The process is not complicated, but it is tedious. For example, the typical process is:

  1. Click on the cluster
  2. Click on the service
  3. Click on the tasks tab
  4. Click on one of the tasks
  5. Click on the container instance
  6. Highlight and copy the DNS name
  7. Paste the DNS name into the terminal to build up the ssh ec2-user@[dnsname] command
  8. Ssh into the machine
  9. Find the docker container with "docker ps"
  10. Run docker exec -ti [container_id] bash
  11. Finally, debug the actual problem

By the time I get into the container, I have to remind myself what the original issue was. This tool automates that process so you do not waste your precious finger energy clicking on links and can focus on better things, like fixing the actual issue.

Install

Install the gem locally on your machine:

gem install bolts-ssh

Set up your AWS credentials at ~/.aws/credentials and ~/.aws/config. This is the AWS standard way of setting up credentials.
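
For example, a minimal ~/.aws/credentials and ~/.aws/config might look like this (the key values and region here are placeholders):

# ~/.aws/credentials
[default]
aws_access_key_id = REPLACE_WITH_KEY_ID
aws_secret_access_key = REPLACE_WITH_SECRET_KEY

# ~/.aws/config
[default]
region = us-east-1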

Note that the gem is named bolts-ssh but the command is bolts.

Requirements

  • jq - a lightweight and flexible command-line JSON processor

If you also use the ecs-exec and ecs-run commands, then you will need to ensure that jq is installed on all of your ECS container instances. If you only use the bolts ssh command, then you do not need the jq dependency.
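
For example, on the Amazon Linux ECS-optimized AMI you could install jq with the instance user data script. This is only a sketch and assumes a yum-based AMI:

#!/bin/bash
# install jq so the bolts-generated script can parse the ECS agent metadata
yum install -y jq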

Usage

bolts ssh

To ssh into the host or container instance where an ECS service called my-service is running, simply run:

$ bolts ssh my-service --cluster my-cluster
# now you are on the container instance
$ docker ps
$ curl -s http://localhost:51678/v1/metadata | jq .

The my-service service can possibly be running on multiple container instances. The bolts command chooses the first container instance that it finds. If you need to ssh into a specific container instance, pass its instance id or arn instead, as shown below.

You can also use the instance id, container instance arn or task arn to ssh into the machine. Examples:

$ bolts ssh my-ecs-service --cluster my-cluster # cluster is required unless set in ~/.bolts/settings.yml
$ bolts ssh i-067c5e3f026c1e801
$ bolts ssh 7fbc8c75-4675-4d39-a5a4-0395ff8cd474
$ bolts ssh 1ed12abd-645c-4a05-9acf-739b9d790170

bolts ecs-exec

bolts ecs-exec will hop one more level and get you all the way into a live container for a service. To quickly get yourself into a docker exec bash shell for a service:

$ bolts ecs-exec SERVICE bash
$ bolts ecs-exec SERVICE # same as above, defaults to bash shell

This ultimately runs the following command after it sshes into the container instance:

$ docker exec -ti SERVICE_CONTAINER bash

Here are examples to show what is possible:

$ bolts ecs-exec my-service bash
# You're in the docker container now
$ ls # check out some files to make sure you're in the right place
$ ps auxxx | grep puma # is the web process up?
$ env # are the environment variables properly set?
$ bundle exec rails c # start up a rails console to debug

You can also pass in bundle exec rails console if you want to get to that as quickly as possible.

$ bolts ecs-exec my-service 'bundle exec rails console'
# You're in a rails console in the docker container now
> User.count

You must put commands with spaces in quotes.

You can also use the container instance id or instance id in place of the service name:

$ bolts ecs-exec 9f1dadc7-4f67-41da-abec-ec08810bfbc9 bash
$ bolts ecs-exec i-006a097bb10643e20 bash

bolts ecs-run

The bolts ecs-run command is similar to the bolts ecs-exec command, except that it runs a brand new container with the same environment variables as the task associated with the service. This lets you debug in a container with the exact same environment variables as the running tasks/containers without affecting the live service. It is also safer, since you cannot mess up a live container that is in service.
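
Under the hood, after sshing into the container instance, it ultimately runs something like the following (a sketch based on the Internals section below; VAR1, VAR2, and IMAGE stand in for the env vars and image captured from the task definition):

$ docker run -ti -e VAR1=value1 -e VAR2=value2 IMAGE bash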

This also allows you to run one-off commands like a rake task. Here's an example:

bolts ecs-run my-service 'bundle exec rake do:something'

The default command opens up a bash shell.

bolts ecs-run my-service # start a bash shell
bolts ecs-run my-service bash # same thing

Settings

A ~/.bolts/settings.yml file is supported that maps services to clusters. This is very useful if you get tired of typing the --cluster option every time. Here is an example ~/.bolts/settings.yml:

service_cluster:
  default: my-default-cluster
  hi-web-prod: prod
  hi-clock-prod: prod
  hi-worker-prod: prod
  hi-web-stag: stag
  hi-clock-stag: stag
  hi-worker-stag: stag

This results in shorter commands:

bolts ssh hi-web-prod
bolts ssh hi-clock-prod
bolts ssh hi-worker-stag

Instead of what you normally would have to type:

bolts ssh hi-web-prod --cluster prod
bolts ssh hi-clock-prod --cluster prod
bolts ssh hi-worker-stag --cluster stag

Help and CLI Options

bolts help
bolts help ssh
bolts help ecs-exec
bolts help ecs-run

Internals

The click-around process outlined above is close to the logic that actually takes place in the tool, but it is slightly different. Here's an overview of what actually happens internally, for those who are interested.

I thought it would be possible to map to the container instance info from aws ecs describe-services, but it is not. We can, however, map to the container instance DNS name starting from aws ecs list-tasks.

Steps:

  1. list-tasks: List all the task_arns for the service. We already know the service name, so this covers every task on the service. A CLI sketch of this lookup follows the list.
  2. describe-tasks: Using the task_arns from list-tasks. This provides the container instances scoped to the service, since list-tasks was scoped to the service. Keep the task_arn for step 7. Also call describe-task-definition and capture the env vars and image for step 8b.
  3. describe-container-instances: Using the container_instance_arn from step 2. This provides the instance_id to ssh into.
  4. ec2 describe-instances: Using the instance_id, get the DNS name or IP address.
  5. Copy files with the required info over to the server with scp.
  6. ssh into the machine with the IP address.
  7. Query the ECS agent metadata and pass it the task_arn from step 2. This provides the map to the container id.
  8. Run the docker command:
    • a) docker exec -ti CONTAINER_ID options[:command]
    • b) docker run options[:run_options] IMAGE options[:command]
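
Here is a rough sketch of steps 1-4 and 6 as plain AWS CLI calls with jq. CLUSTER and SERVICE are placeholders, and bolts implements the equivalent logic itself rather than shelling out exactly like this:

task_arn=$(aws ecs list-tasks --cluster CLUSTER --service-name SERVICE \
  | jq -r '.taskArns[0]')
ci_arn=$(aws ecs describe-tasks --cluster CLUSTER --tasks "$task_arn" \
  | jq -r '.tasks[0].containerInstanceArn')
instance_id=$(aws ecs describe-container-instances --cluster CLUSTER \
  --container-instances "$ci_arn" \
  | jq -r '.containerInstances[0].ec2InstanceId')
dns_name=$(aws ec2 describe-instances --instance-ids "$instance_id" \
  | jq -r '.Reservations[0].Instances[0].PublicDnsName')
ssh "ec2-user@$dns_name"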

In order to pass info from your local machine over to the container instance, a file is generated and copied over in step 5. The file contains:

  • The options from the original cli call, such as the command to run, in JSON form. A bash script is also copied over.
  • The bash script gets the container id using the task_arn, and then runs the docker exec or docker run command.
  • In other words, the bash script performs steps 7 and 8a or 8b.
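
Here is a minimal sketch of what such a bash script might do on the instance for steps 7 and 8a, assuming jq is installed there; the actual generated script differs in detail:

task_arn="$1"  # the task arn passed over from the local machine
# ask the ECS agent introspection API which docker container belongs to the task
container_id=$(curl -s http://localhost:51678/v1/tasks \
  | jq -r --arg arn "$task_arn" '.Tasks[] | select(.Arn == $arn) | .Containers[0].DockerId')
docker exec -ti "$container_id" bash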

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request