MTR Monitor


In December 2017, Hetzner, our hosting provider for the Build Platform, had a major network incident that lasted for almost a whole week. Our users were rightly frustrated.

To detect and monitor such situations in the future, we have set up a transatlantic monitoring system based on MTR reports and on curl requests to important vendors for our platform, such as GitHub and DockerHub. This system should report any issues in the network between Germany (Hetzner) and the US (GitHub, DockerHub).

This project is part of an effort to have MTR reports readily available before, during, and after incidents, so that we can send them to Hetzner.

The MTR monitor is an application that generates MTR reports every 5 minutes and uploads them to an S3 bucket. It is available as a standalone Docker container and as a Ruby gem that can be injected into other Ruby applications.

Currently, we have the following routes covered:

  • Germany (Hetzner) -> AWS US East 1 (part of Job Runner)
  • Germany (Hetzner) -> AWS US West 1 (part of Job Runner)
  • Germany (Hetzner) -> AWS US West 2 (part of Job Runner)
  • Germany (Hetzner) -> GitHub (part of Job Runner)
  • Germany (Hetzner) -> DockerHub (part of Job Runner)
  • Germany (Hetzner) -> Stripe (part of Job Runner)
  • Germany (Hetzner) -> SemaphoreCI (part of Job Runner)
  • AWS US East 1 -> Builder sb1 in Hetzner (standalone AWS instance with Docker container)
  • AWS US West 1 -> Builder sb1 in Hetzner (standalone AWS instance with Docker container)
  • AWS US West 2 -> Builder sb1 in Hetzner (standalone AWS instance with Docker container)

Dashboards for the MTR monitor can be found on the Platform — Network dashboard on Grafana.

The US-based MTR monitors have the following DNS names:

  • mtr-monitor.us-east-1.semaphoreci.com
  • mtr-monitor.us-west-1.semaphoreci.com
  • mtr-monitor.us-west-2.semaphoreci.com

To SSH into one of them, run ssh ubuntu@<address>.

Location of the generated MTR reports

The MTR monitor stores the generated MTR reports on the local machine and also uploads them to S3.

Local reports are located in the /var/log/mtr directory and follow this naming structure:

/var/log/mtr/<name>-<YYYY-MM-DD>-<host-ip-address>-<HH-MM>.log

For example, if you name your report hetzner-to-us-east-1, the host IP address is 142.21.43.11, and the report runs at 2017-12-18 12:33:06, the log is written to:

/var/log/mtr/hetzner-to-us-east-1-2017-12-18-142-21-43-11-12-33.log

On S3, the path follows the same convention but uses a nested directory structure:

s3://<bucket-name>/<name>/<YYYY-MM-DD>/<host-ip-address>/<HH-MM>.log
s3://<bucket-name>/hetzner-to-us-east-1/2017-12-18/142-21-43-11/12-33.log
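The path convention can be sketched in Ruby as follows. The helper name mtr_report_paths is hypothetical and not part of the gem; the sketch assumes that dots in the IP address are replaced with dashes and that the date is formatted as year-month-day, as in the examples above.

```ruby
# Hypothetical helper (not part of the gem's API): builds the local log path
# and the S3 path for a report, following the naming convention above.
# Dots in the IP address become dashes, e.g. 142.21.43.11 -> 142-21-43-11.
def mtr_report_paths(name, time, ip, bucket, log_dir: "/var/log/mtr")
  date    = time.strftime("%Y-%m-%d") # year-month-day, as in the example
  hhmm    = time.strftime("%H-%M")
  ip_part = ip.tr(".", "-")

  local = File.join(log_dir, "#{name}-#{date}-#{ip_part}-#{hhmm}.log")
  s3    = "s3://#{bucket}/#{name}/#{date}/#{ip_part}/#{hhmm}.log"

  [local, s3]
end

local, s3 = mtr_report_paths("hetzner-to-us-east-1",
                             Time.new(2017, 12, 18, 12, 33, 6),
                             "142.21.43.11",
                             "my-bucket")
puts local # /var/log/mtr/hetzner-to-us-east-1-2017-12-18-142-21-43-11-12-33.log
puts s3    # s3://my-bucket/hetzner-to-us-east-1/2017-12-18/142-21-43-11/12-33.log
```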

Report Name

The name of the report is used to group reports with the same purpose on S3 and on the local file system.

We use the following naming convention:

<from>-to-<destination>

Examples:

hetzner-to-github
us-east-1-to-hetzner-sb1
hetzner-to-us-west-2
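A minimal sketch of composing and checking names that follow this convention. Both helpers and the allowed character set (lowercase letters, digits, and dashes) are assumptions for illustration; the gem does not necessarily enforce them.

```ruby
# Hypothetical helpers for the <from>-to-<destination> naming convention.
# Allowing only lowercase letters, digits, and dashes keeps names safe both
# as file name prefixes and as S3 path segments (an assumption, not a gem rule).
NAME_PATTERN = /\A[a-z0-9]+(?:-[a-z0-9]+)*-to-[a-z0-9]+(?:-[a-z0-9]+)*\z/

def report_name(from, destination)
  "#{from}-to-#{destination}"
end

def valid_report_name?(name)
  NAME_PATTERN.match?(name)
end

puts report_name("us-east-1", "hetzner-sb1") # us-east-1-to-hetzner-sb1
puts valid_report_name?("hetzner-to-github")  # true
puts valid_report_name?("Hetzner to GitHub")  # false
```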

Using MTR Monitor as a gem

The MTR monitor can be used as a gem and injected into existing Ruby applications. Currently, we inject the MTR monitor into Job Runner.

First, add the mtr_monitor gem to your Gemfile:

gem 'mtr_monitor'

Then, use the MtrMonitor::Report class to generate a report:

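require "mtr_monitor"

# Name used to group reports on S3 and on the local file system
name   = "google"
# Domain that MTR traces the route to
domain = "google.com"

s3_bucket             = "my-private-bucket-name" # change this
aws_access_key_id     = "<KEY>"
aws_secret_access_key = "<KEY>"

report = MtrMonitor::Report.new(name,
                                domain,
                                s3_bucket,
                                aws_access_key_id,
                                aws_secret_access_key)

# Generates the report, uploads it to S3, and submits metrics
report.generate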

The snippet above will:

  • generate an MTR report on your local system under the /var/log/mtr directory
  • upload the report to the provided S3 bucket
  • submit metrics via Watchman and generate a "pulse" metric
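To avoid hardcoding AWS credentials, the constructor arguments can be read from environment variables. This is a hypothetical helper, not part of the gem; it assumes the same variable names that the standalone Docker container uses and that MtrMonitor::Report.new takes the five positional arguments in the order shown in the snippet above.

```ruby
# Hypothetical helper: reads the arguments for MtrMonitor::Report.new from
# environment variables instead of hardcoding AWS keys in source code.
# Variable names mirror the ones passed to the standalone Docker container.
REPORT_ENV_VARS = %w[NAME DOMAIN S3_BUCKET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY].freeze

def report_args_from_env(env = ENV)
  missing = REPORT_ENV_VARS.reject { |key| env[key] && !env[key].empty? }
  unless missing.empty?
    raise ArgumentError, "missing environment variables: #{missing.join(', ')}"
  end

  # Same order as the positional arguments of MtrMonitor::Report.new above
  REPORT_ENV_VARS.map { |key| env[key] }
end

# Usage (requires the mtr_monitor gem):
#   require "mtr_monitor"
#   MtrMonitor::Report.new(*report_args_from_env).generate
```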

To generate reports continuously, create a cron job that runs the code above. To check that the cron job is running as expected, set up a Grafana alert based on the "pulse" metric.

The pulse metric is named network.mtr.pulse and is tagged with the hostname of the server where the MTR monitor is running and with the name of the report.
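The idea behind the pulse alert is that a healthy monitor emits a pulse on every run, so a gap much longer than the report interval means the cron job has stopped. A minimal sketch of that check, assuming a 5-minute interval and alerting once no pulse has arrived for more than two intervals; the real alert is configured in Grafana and its thresholds may differ.

```ruby
# Hypothetical sketch of the condition a pulse alert evaluates.
# Interval and the number of tolerated intervals are assumptions.
REPORT_INTERVAL = 5 * 60 # seconds between reports (every 5 minutes)

def pulse_stale?(last_pulse_at, now: Time.now, interval: REPORT_INTERVAL, missed: 2)
  # Time - Time yields the elapsed seconds as a Float
  (now - last_pulse_at) > interval * missed
end

now = Time.at(1_513_600_000)
puts pulse_stale?(now - 4 * 60, now: now)  # false (last pulse 4 minutes ago)
puts pulse_stale?(now - 11 * 60, now: now) # true (last pulse 11 minutes ago)
```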

Per-hop MTR statistics are also submitted to Grafana. Based on these metrics, you can observe packet loss and the average, best, and worst latency on the network. For more information, read the code in lib/mtr_monitor/metrics.rb.
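For illustration, a sketch of extracting per-hop statistics from a single hop line of mtr --report output. It assumes mtr's default report columns (Loss%, Snt, Last, Avg, Best, Wrst, StDev); the parse_hop helper is hypothetical, and the gem's real parsing in lib/mtr_monitor/metrics.rb may work differently.

```ruby
# Hypothetical parser for one hop line of `mtr --report` output, assuming the
# default columns: Loss% Snt Last Avg Best Wrst StDev.
HOP_LINE = /\A\s*(?<hop>\d+)\.\|--\s+(?<host>\S+)\s+(?<loss>[\d.]+)%\s+(?<sent>\d+)\s+
            (?<last>[\d.]+)\s+(?<avg>[\d.]+)\s+(?<best>[\d.]+)\s+(?<worst>[\d.]+)\s+
            (?<stdev>[\d.]+)\s*\z/x

# Returns a hash of hop statistics, or nil for non-hop lines (e.g. the header).
def parse_hop(line)
  m = HOP_LINE.match(line)
  return nil unless m

  {
    hop:   m[:hop].to_i,
    host:  m[:host],
    loss:  m[:loss].to_f,  # packet loss in percent
    avg:   m[:avg].to_f,   # average latency in ms
    best:  m[:best].to_f,  # best latency in ms
    worst: m[:worst].to_f  # worst latency in ms
  }
end

hop = parse_hop("  3.|-- 142.21.43.11   0.0%    10   12.1  11.8  11.2  13.0   0.5")
puts "hop #{hop[:hop]}: loss #{hop[:loss]}%, avg #{hop[:avg]} ms" # hop 3: loss 0.0%, avg 11.8 ms
```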

Using MTR Monitor as a standalone Docker container

The MTR monitor can be used as a standalone Docker container. This is our current approach for monitors that run in the United States and probe Germany.

To run a standalone MTR monitor, run the following command:

docker run --name mtr-monitor -d \
  -v /var/log/mtr:/var/log/mtr \
  -e NAME=<> \
  -e DOMAIN=<> \
  -e MTR_OPTIONS=<> \
  -e S3_BUCKET=<> \
  -e AWS_ACCESS_KEY_ID=<> \
  -e AWS_SECRET_ACCESS_KEY=<> \
  -e SLEEP_TIME=<> \
  renderedtext/mtr_monitor

By default, the containers running on us-east-1, us-west-1, and us-west-2 are automatically deployed on every merge into master in this repository.

The container generates an MTR report every 5 minutes. Each time a report is generated, the following steps are executed:

  • a new MTR report is generated on the host under the /var/log/mtr directory
  • the report is uploaded to the provided S3 bucket
  • metrics are submitted via Watchman and a pulse is generated
  • the MTR cleaner runs and deletes all local reports that are older than 2 weeks
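The cleanup step can be sketched as follows. This is an assumption about how such a cleaner could work (selecting *.log files by modification time), not the actual MTR cleaner, which might instead use the date embedded in the file name; clean_old_reports is a hypothetical name.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical cleaner sketch: deletes *.log files in the report directory
# whose modification time is older than the retention period.
RETENTION = 14 * 24 * 60 * 60 # two weeks, in seconds

def clean_old_reports(dir = "/var/log/mtr", now: Time.now, retention: RETENTION)
  cutoff = now - retention
  old = Dir.glob(File.join(dir, "*.log")).select { |path| File.mtime(path) < cutoff }
  FileUtils.rm_f(old)
  old # the deleted paths, e.g. for logging
end

# Usage: simulate one fresh and one three-week-old report in a temporary directory.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "fresh.log"), "")
  stale = File.join(dir, "stale.log")
  File.write(stale, "")
  three_weeks_ago = Time.now - 21 * 24 * 60 * 60
  File.utime(three_weeks_ago, three_weeks_ago, stale) # backdate atime and mtime

  p clean_old_reports(dir).map { |path| File.basename(path) } # ["stale.log"]
end
```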

To check that the container keeps generating reports, set up a Grafana alert based on the "pulse" metric.

As with the gem, the pulse metric is named network.mtr.pulse and is tagged with the hostname of the server where the MTR monitor is running.

Per-hop MTR statistics are also submitted to Grafana. Based on these metrics, you can observe packet loss and the average, best, and worst latency on the network. For more information, read the code in lib/mtr_monitor/metrics.rb.

Setting up a new EC2 machine for an MTR monitor

  1. Launch a new EC2 instance on AWS. Choose the t2.nano instance type and the Ubuntu 14.04 operating system.

  2. SSH into the machine using the newly generated SSH key pair.

  3. Add the public keys of RT developers to the ~/.ssh/authorized_keys file. For the list of public keys, refer to s3://renderedtext-secrets/stg1-semaphore/authorized-keys.

  4. Install Docker. Run curl -fsSL https://get.docker.com | sh.

  5. Add the ubuntu user to the docker group: sudo usermod -aG docker ubuntu

  6. Log out and SSH back in so that the new docker group membership takes effect.

  7. Pull and run the MTR monitor:

docker run --name mtr-monitor -d \
  -v /var/log/mtr:/var/log/mtr \
  -e NAME=<> \
  -e DOMAIN=<> \
  -e MTR_OPTIONS=<> \
  -e S3_BUCKET=<> \
  -e AWS_ACCESS_KEY_ID=<> \
  -e AWS_SECRET_ACCESS_KEY=<> \
  -e SLEEP_TIME=<> \
  renderedtext/mtr_monitor

If you want to keep this machine permanently, add it to the list of continuously deployed servers.

Continuously deploying the MTR monitor to an EC2 machine

TODO @bmarkons

Setting up alerts and monitoring for an MTR monitor

TODO @bmarkons