cfn-monitor

Build Status

About

CloudWatch monitoring tool can query a cloudformation stack and return monitorable resources that can be placed into a config file. This config can then be used to generate a cloudformation stack to create and manage cloudwatch alarms.

It is packaged as a docker container base2/cfn-monitor and can be run by volume mounting in a local directory to access the config or by using within AWS CodePipeline.

Install Gem

gem install cfn_monitor

Commands

Commands:
  cfn_monitor --version, -v   # print the version
  cfn_monitor deploy          # Deploys gerenated cfn templates to S3 bucket
  cfn_monitor generate        # Generate monitoring cloudformation templates
  cfn_monitor help [COMMAND]  # Describe available commands or one specific command
  cfn_monitor query           # Queries a cloudformation stack for monitorable resources

Run With Docker

The docker image will manage the runtime environment and dependencies. You can pass in your AWS credentials and set the region and profile with environment variables.

Example

docker run -it --rm \
  -v $(pwd):/src \
  -v $HOME/.aws:/root/.aws \
  -e AWS_REGION=us-east-1 \
  -e AWS_PROFILE=default \
  base2/cfn_monitor cfn_monitor <command> [parameters]

Configuration files

There are 2 config files that can be utilised to configure your cloudwatch alarms.

  • alarms.yaml - Configure the resources you want to monitor
  • template.yaml - Create or override alarm templates

The bellow config structure allows for multiple monitoring stacks to be kept with a single code repository separated by directories parameterised as <application>.

.
├── _application_1
|   ├── alarms.yaml
|   └── template.yaml
└── _application_2
    ├── alarms.yaml
    └── template.yaml

Generate

The generate command takes you alarm configuration and turns it into cloudformation templates to be deployed into AWS.

Usage:
  cfn_monitor generate

Options:
  a, [--application=APPLICATION]  # application name

Description:
  Generates cloudformation templates from the alarm configuration and output to the output/ directory.

alarms.yml

This file is used to configure the AWS resources you want to monitor with CloudWatch.

source_bucket: [Name of S3 bucket where CloudFormation templates will be deployed]
source_region: [Region of source_bucket]

resources:
  [nested stack name].[resource name]: [template name]

Example:

source_bucket: source.example.com

resources:
  RDSStack.RDS: RDSInstance

Resources

Resources are referenced by the CloudFormation logical resource ID used to create them. Nested stacks are also referenced by their CloudFormation logical resource ID. See example above.

Target group configuration:

Target group alarms in CloudWatch require dimensions for both the target group and its associated load balancer. To configure a target group alarm provide the logical ID of the target group (including any stacks it's nested under) followed by "/", followed by the logical ID of the load balancer (also including any stacks it's nested under).

Example:

resources:
  LoadBalancerStack.WebDefTargetGroup/LoadBalancerStack.WebLoadBalancer: ApplicationELBTargetGroup

Custom Metrics

Custom metrics are configured with a similar syntax to resources. Use metrics instead of resources.

Example:

metrics:
  MyCustomMetric: MyCustomMetricTemplate

Endpoints

HTTP endpoint monitoring and alerting is enabled by configuring resources under endpoints. Each endpoint will create a cloudwatch event, scheduled to trigger the aws-lambda-http-check lambda function deployed with this stack. Alarms will be configured (based on the specified template) to alert on the cloudwatch metrics generated by the lambda function.

Example:

endpoints:
  http://www.base2services.com:
    template: HttpCheck
    statusCode: 200
    bodyRegex: 'DevOps'
endpoints:
  http://www.base2services.com:
    template: HttpCheck
    statusCode: 200
    bodyRegex: 'DevOps'
    payload: id_=123
    method: POST

Supported parameters:

Key Value Default
statusCode The expected response code 200
bodyRegex A regex expected in the response body Disabled
timeOut A timeout value for the endpoint monitoring 120 seconds
scheduleExpression A cron expression used to schedule the endpoint monitoring Every minute
environments A string or array of environment names. Monitoring will only be deployed for these environments (if specified) All environments

SSL certificate expiry date checking

To alert on the expiry date of an SSL certificate for a particular domain, add the following config:

ssl:
  https://www.base2services.com: Ssl

The Ssl template is scheduled to push a metric once every day.

DNS domain expiry date checking

To alert on the expiry date of an DNS domain, add the following config:

dns:
  base2services.com: Dns

The Dns template is scheduled to push a metric once every day.

Multiple templates

You can specify multiple templates for the resource by providing a list/array. You may want to do this if you want to deploy some custom alarms in addition to the default alarms for a resource.

Example:

resources:
  RDSStack.RDS: [ 'RDSInstance', 'MyRDSInstance' ]

or

resources:
  RDSStack.RDS:
    - RDSInstance
    - MyRDSInstance

Auto generate alarms config for resources

You can query an existing stack for monitorable resources using the query command. This will provide a list of resources in the correct config syntax, including the nested stacks and the default templates for those resources.

Example:

Usage:
  cfn_monitor query

Options:
  a, [--application=APPLICATION]  # application name
  s, [--stack=STACK]              # cfn stack name

Description:
  This will provide a list of resources in the correct config syntax,
  including the nested stacks and the default templates for those resources.

Make sure you query a prod sized stack so that all conditional resources are included. The output will list all monitorable resources found in the stack, the coverage your current alarms.yml config provides, and a list of any resources missing from your current alarms.yml config.

Templates

The "template" value you specify for a resource refers to either a default templates, or a custom/override template in your own templates.yml. This template can contain multiple alarms. The example below shows the default RDSInstance template, which has 2 alarms (FreeStorageSpaceCrit and FreeStorageSpaceTask). Using the RDSInstance template in this example will create 2 CloudWatch alarms for the RDS resource in the RDSStack nested stack.

Example: alarms.yml

resources:
  RDSStack.RDS: RDSInstance

Example: templates.yml

templates:
  RDSInstance: # AWS::RDS::DBInstance
    FreeStorageSpaceCrit:
      AlarmActions: crit
      Namespace: AWS/RDS
      MetricName: FreeStorageSpace
      ComparisonOperator: LessThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Minimum
      Threshold: 50000000000
      Threshold.development: 10000000000
      EvaluationPeriods: 1
    FreeStorageSpaceTask:
      AlarmActions: task
      Namespace: AWS/RDS
      MetricName: FreeStorageSpace
      ComparisonOperator: LessThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Minimum
      Threshold: 100000000000
      Threshold.development: 20000000000
      EvaluationPeriods: 1

Globally overriding a template

You can override a default template in your own templates.yml file if all instances of a particular resource require a non standard configuration.

Example:

templates:
  RDSInstance:
    FreeStorageSpaceCrit:
      Threshold: 80000000000

This configuration will be merged over the default RDSInstance template resulting in the following:

templates:
  RDSInstance:
    FreeStorageSpaceCrit:
      AlarmActions: crit
      Namespace: AWS/RDS
      MetricName: FreeStorageSpace
      ComparisonOperator: LessThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Minimum
      Threshold: 80000000000
      Threshold.development: 10000000000
      EvaluationPeriods: 1
    FreeStorageSpaceTask:
      AlarmActions: task
      Namespace: AWS/RDS
      MetricName: FreeStorageSpace
      ComparisonOperator: LessThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Minimum
      Threshold: 100000000000
      Threshold.development: 20000000000
      EvaluationPeriods: 1

Create a custom template

If the default template for your resource is completely inappropriate, you can create your own custom template in the monitoring/templates.yml file.

Example:

templates:
  MyRDSInstance:
    DatabaseConnections:
      AlarmActions: crit
      Namespace: AWS/RDS
      MetricName: DatabaseConnections
      ComparisonOperator: MoreThanThreshold
      DimensionsName: DBInstanceIdentifier
      Statistic: Average
      Threshold: 20
      EvaluationPeriods: 5

Inherit a template

If you have multiple instances of a particular resource and you want to adjust the configuration for only some of them, you can create your own custom template which inherits the configuration of a default template.

Example:

templates:
  MyRDSInstance:
    template: RDSInstance
    FreeStorageSpaceCrit:
      Threshold: 80000000000

The above example creates a new template MyRDSInstance which can now be used by one or many resources. The MyRDSInstance template inherits all of the alarms and configuration from RDSInstance, but sets Threshold to 80000000000 for the FreeStorageSpaceCrit alarm.

Environment type mappings

You can create environment type mappings if alarm configurations need to differ between different environment types. This may be useful in situations where development type environments are running different resource quantities or sizes.

Example:

templates:
  RDSInstance:
    FreeStorageSpaceCrit:
      Threshold: 40000000000
      Threshold.development: 20000000000
      Threshold.staging: 30000000000
      EvaluationPeriods: 5

The above example shows different Threshold values for EnvironmentType values of production (default), development or staging. Any value can be specified using the .envType syntax and the necessary mappings and EnvironmentType will be generated when rendered. The EvaluationPeriods value for development and staging type environments will be 5 in the above example as no .envType values where provided for this parameter.

Supported Parameters: Parameter | Mapping support --- | --- ActionsEnabled | true AlarmActions | false AlarmDescription | false ComparisonOperator | true Dimensions | false EvaluateLowSampleCountPercentile | false EvaluationPeriods | true ExtendedStatistic | false InsufficientDataActions | false MetricName | true Namespace | true OKActions | false Period | true Statistic | true Threshold | true TreatMissingData | true Unit | false

Template variables

The following variables can be used in templates:

Variable Key Variable Value
$name Metric/Resource Name (from alarms.yml)
$metric Metric Name (from alarms.yml)
$resource Resource Name (from alarms.yml)
$templateName Template Name (from templates.yml)
$alarmName Alarm Name (from templates.yml)

Example:

alarms.yml

metrics:
  Metric1: MyCustomMetric

templates.yml

templates:
  MyCustomMetric:
    ItemCountHigh:
      MetricName: ${metric}
      AlarmDescription: '#{templateName} #{alarmName} - #{name}'

Result:

templates:
  MyCustomMetric:
    ItemCountHigh:
      MetricName: Metric1
      AlarmDescription: 'MyCustomMetric ItemCountHigh - Metric1'

Alarm Actions

There are 3 classes of alarm actions: crit, warn and task.

Action Process
crit Alert on-call technician
warn Create alarm in pager service but do not alert on-call technician
task Create support ticket for investigation

An SNS topic is required per alarm action, these topics and their subscriptions are managed outside this stack

Deployment

The rendered CloudFormation templates should be deployed to [source_bucket]/cloudformation/monitoring/.

Usage:
  cfn_monitor deploy

Options:
  a, [--application=APPLICATION]  # application name

Description:
  Deploys gerenated cloudformation templates to the specified S3 source_bucket

Launch the Monitoring stack in the desired account with the following CloudFormation parameters:

Parameter Key Parameter Value
EnvironmentType production / development / custom env type
MonitoredStack The name of the stack you want monitored. EG prod
MonitoringDisabled true for disables alerts, false for enabled alerts
SnsTopicCrit SNS topic used by crit type alarms
SnsTopicTask SNS topic used by task type alarms
SnsTopicWarn SNS topic used by warn type alarms

Disabling Monitoring

It is possible to globally disable / snooze / downtime all alarms by setting the MonitoringDisabled CloudFormation parameter to true. This will disable alarm actions without removing removing them.

Disabling and excluding alarms

To disable or prevent creation of a specific alarm, specify either of the following parameters:

templates:
  MyAutoScalingGroup:
    template: AutoScalingGroup
    CPUUtilizationHighBase:
      CreateAlarm: false    # Don't create the alarm
      DisableAlarm: true    # Create the alarm but disable it