We strive to keep all aspects of our infrastructure managed by code instead of people; all of our servers are managed by Chef, Jenkins builds and tests our code, and so on. Additionally, we work to verify that all these things are behaving as expected - though Chef configures a service to run on a server, our monitoring server will verify that that service is, in fact, running. We wanted the same verification to make sure our AWS Security Groups are configured correctly and no unexpected changes have occurred.
This verification process is managed by the same service that watches all our other services for problems: Nagios. We check a copy of our Security Group configuration into git and ask Nagios to verify the state of the live Security Groups against the checked-in version. If there is any discrepancy, Nagios pages us and reports which rules have been added or removed. The extra content field in the alert contains an easy-to-read statement about what changed. For example, "ec2 has an extra rule in the web security group: tcp/80-80 is allowed from 0.0.0.0."
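As a rough illustration of the comparison described above, here is a minimal sketch in Python. It is not the actual code of the check: the `Rule` tuple, the `describe` and `compare` helpers, and the wording of the "missing rule" message are all assumptions made for this example. The only fixed convention it relies on is that a Nagios plugin exits 0 for OK and 2 for CRITICAL.

```python
from collections import namedtuple

# Hypothetical flat representation of one Security Group rule.
Rule = namedtuple("Rule", "group proto from_port to_port cidr")


def describe(rule):
    """Render a rule in the style of the alert text, e.g. tcp/80-80."""
    return "{0}/{1}-{2} is allowed from {3}".format(
        rule.proto, rule.from_port, rule.to_port, rule.cidr)


def compare(expected, live):
    """Return (nagios_exit_code, messages) for two collections of rules."""
    expected, live = set(expected), set(live)
    messages = []
    # Rules present in EC2 but not in the checked-in file.
    for rule in sorted(live - expected):
        messages.append("ec2 has an extra rule in the {0} security group: {1}"
                        .format(rule.group, describe(rule)))
    # Rules in the checked-in file that EC2 no longer has (wording assumed).
    for rule in sorted(expected - live):
        messages.append("ec2 is missing a rule in the {0} security group: {1}"
                        .format(rule.group, describe(rule)))
    # Nagios plugin convention: 0 is OK, 2 is CRITICAL.
    return (2 if messages else 0), messages
```

Treating both sides as sets means the order of rules in the file does not matter, which leaves you free to rearrange the checked-in copy for readability.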
Beyond allowing us to use our existing code review processes for peer review, keeping the configuration in git provides us with an added benefit: we can add comments to the file with more detail about the reason behind a specific rule. No more will you have to remember that port 5666 is for NRPE; you can write a comment to that effect right there in the configuration file. While this isn't too big a deal for standard services, it's a huge benefit when adding custom or obscure services.
Since the AWS API will give you a dump of your Security Groups in JSON, it seemed like the easiest format to use for the configuration file in git. Unfortunately, JSON does not allow comments (though it is lenient about differences in whitespace). So that we can have comments, the Nagios check strips any line that matches /^ *#/ (any line whose first non-space character is #) before running the file through the JSON parser.
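The comment-stripping step can be sketched in a few lines of Python. This is an illustration of the idea rather than the check's own code; the function name `load_commented_json` and the shape of the example data are assumptions.

```python
import json
import re

# Same pattern as in the text: optional leading spaces, then '#'.
COMMENT_LINE = re.compile(r"^ *#")


def load_commented_json(text):
    """Drop whole-line comments, then parse what remains as JSON."""
    kept = [line for line in text.splitlines()
            if not COMMENT_LINE.match(line)]
    return json.loads("\n".join(kept))


# Hypothetical file contents; the real dump format may differ.
example = """\
# Rules for the web tier
{
  "web": [
    # HTTP from anywhere
    {"proto": "tcp", "from_port": 80, "to_port": 80, "cidr": "0.0.0.0/0"}
  ]
}
"""
rules = load_commented_json(example)
```

Because only whole lines are removed, a # that appears after other content on a line (inside a string value, for instance) is left alone, so trailing comments on the same line as JSON data are not supported by this approach.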
To get started using this check,
- Grab a copy of check_aws_secgroups from our Ops repo on GitHub. The check requires Python and the Boto library to talk to AWS. Run it once with -h to see the various flags to configure file locations.
- Set up an automated git checkout on your Nagios server. You'll need to create an SSH key and a GitHub user that has read-only access.
- Configure it by putting your AWS credentials in
- Get an initial dump of your Security Groups from EC2: check_aws_secgroups --dump-ec2 > /tmp/serialized_rules.json. You'll probably want to rearrange and reformat the file to maximize readability, as well as add initial comments, before checking it into git.
- Configure the Nagios check appropriately.
Getting an automated git checkout working was a little harder than expected; a few environment variables (GIT_SSH, GIT_DIR, and GIT_WORK_TREE) make it easier.
Here is the crontab entry:
*/10 * * * * export GIT_SSH=/var/lib/git_checkout/.ssh/git_ssh_wrapper.sh; export GIT_DIR=/var/lib/git_checkout/chef-repo/.git; export GIT_WORK_TREE=/var/lib/git_checkout/chef-repo; git fetch -q && git fetch --tags -q && git reset -q --hard origin/master
The git_ssh_wrapper.sh makes sure ssh won't balk at github's host key and gives the path to your read-only user's ssh key.
/usr/bin/env ssh -o "StrictHostKeyChecking=no" -i "/var/lib/git_checkout/.ssh/id_rsa" "$@"
Enjoy your deeper sleep, as you rest assured that your Security Groups really are what you know they should be!