Did you know that Terraform can conditionally create resources, using a single feature flag as a toggle?
In my case, I didn’t realise I needed one until I destroyed my Jenkins / Amazon Elastic Compute Cloud (EC2) dev instance. Fortunately, I was able to rely on the automated snapshots created by the nightly Lambda function that is kicked off via CloudWatch.
The next hurdle was to restore the server to its previous state from those snapshots, in an automated, repeatable way with Terraform. This would require a more robust Terraform script to assist with future restorations, one that simply required typing “terraform apply”.
Chris Pisano’s Building Feature Toggles Into Terraform article provided me with the initial inspiration to use a feature flag. The flag determines whether an EC2 instance is restored from a snapshot or a new one is created.
Here are the steps I used to achieve this:
Restoring from a snapshot
Before I could create a feature toggle, I needed to code the restore. I broke the process down into the following steps:
- Find the Amazon Elastic Block Store (EBS) snapshots for restore.
- Create an Amazon Machine Image (AMI). This may seem a bit odd but you will see why.
- Restore the instance using the new AMI.
- Associate the new instance with the existing elastic IP.
Find the EBS snapshot
Using the aws_ebs_snapshot data source, you can find the snapshot to restore from using filters such as name and volume size:
data "aws_ebs_snapshot" "root" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "volume-size"
    values = ["64"]
  }

  filter {
    name   = "tag:Name"
    values = ["${var.snapshot_name}"]
  }
}
In this example, I am filtering on the size of my volume (root) and the name of my snapshot.
Note that I set up a variable, var.snapshot_name, to avoid hardcoding the snapshot name. This would need to be done for every snapshot that is to be restored.
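As a rough illustration of repeating the lookup for another snapshot, the sketch below adds a second data source for a data volume. The variable var.data_snapshot_name and the "data" label are assumptions made for this example; they do not appear in my actual configuration.

```hcl
# Hypothetical second lookup for a data volume snapshot.
# var.data_snapshot_name is an assumed variable, introduced for illustration.
variable "data_snapshot_name" {
  description = "Name tag of the data volume snapshot to restore"
}

data "aws_ebs_snapshot" "data" {
  # Pick the newest snapshot owned by this account that matches the tag.
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "tag:Name"
    values = ["${var.data_snapshot_name}"]
  }
}
```

Each snapshot gets its own data source and its own name variable, so the names stay out of the code.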
Create an AMI
An AMI needs to be created to launch the new restored server.
This may seem like an unnecessary step on the surface. However, if I just attached the snapshots under the aws_instance, they would only show up as additional volumes, and the root snapshot would not be used to restore the previous config and state in its entirety.
I can overcome this shortcoming by creating an AMI from the snapshot:
resource "aws_ami" "restore" {
  count               = "${var.restore}"
  name                = "from-${data.aws_ebs_snapshot.root.snapshot_id}"
  virtualization_type = "hvm"
  root_device_name    = "/dev/xvda"

  # Map the root snapshot onto the root device of the new AMI.
  ebs_block_device {
    device_name = "/dev/xvda"
    snapshot_id = "${data.aws_ebs_snapshot.root.id}"
    volume_size = "${data.aws_ebs_snapshot.root.volume_size}"
  }
}
Using the above code, the AMI is created with the snapshot defined as root. The purpose of the count parameter is explained in detail further down in the article.
The restored instance
This defines the instance to be restored, using the newly created restore AMI:
resource "aws_instance" "restore" {
  count         = "${var.restore}"
  ami           = "${aws_ami.restore.id}"
  instance_type = "t2.medium"
}
Associate the elastic IP
We will use the aws_eip_association resource so that the existing Elastic IP is reused rather than overwritten:
resource "aws_eip_association" "restore" {
  count         = "${var.restore}"
  instance_id   = "${aws_instance.restore.id}"
  allocation_id = "${aws_eip.this.id}"
}
The original instance needs an association as well, with the count inverted so that only one of the two associations exists at any time:
resource "aws_eip_association" "this" {
  count         = "${1 - var.restore}"
  instance_id   = "${aws_instance.this.id}"
  allocation_id = "${aws_eip.this.id}"
}
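If other parts of the configuration need the ID of whichever instance is currently live, one common idiom in this older interpolation syntax is to concatenate the two splat lists and take the first element. The sketch below assumes that aws_instance.this carries the inverted count (1 - var.restore), so exactly one of the two lists is non-empty. The output name jenkins_instance_id is made up for this example.

```hcl
# Sketch only: expose the ID of whichever Jenkins instance exists.
# With the toggle on, aws_instance.this.*.id is an empty list;
# with it off, aws_instance.restore.*.id is empty instead.
# Element 0 of the concatenation is therefore the live instance.
output "jenkins_instance_id" {
  value = "${element(concat(aws_instance.restore.*.id, aws_instance.this.*.id), 0)}"
}
```

The same expression could feed a single aws_eip_association instead of two, although I kept the two explicit resources because they are easier to read.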
The Terraform count parameter
The above examples have a count parameter on each of the restore resources, set to var.restore. The purpose of this count parameter is to tell Terraform how many copies of the resource it should create, with zero being an acceptable value.
No resources are created when the value is set to zero. In the interpolation syntax used here, Terraform converts a Boolean true or false to 1 or 0, so count can act as an if statement for a feature toggle, along the lines of:
# This is just pseudocode. It will not work in Terraform.
if ${var.restore} {
  resource "aws_ami" "restore",
  resource "aws_instance" "restore",
  resource "aws_eip_association" "restore"
}
else {
  resource "aws_instance" "this",
}
Note that when restoring, only the restored instance should be created, not a brand-new Jenkins server as well. Terraform’s simple math in interpolations handles this else branch: setting the count of the original resource to 1 - var.restore gives 1 - 1 = 0 (false) when the restore toggle is on, and 1 - 0 = 1 (true) when it is off:
variable "restore" {
  description = "Used to restore the jenkins server"
  default     = true
}

variable "snapshot_name" {
  description = "snapshot name"
}

resource "aws_instance" "restore" {
  count = "${var.restore}"
  ......

resource "aws_instance" "this" {
  count = "${1 - var.restore}"
  ....
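The snippets in this article use the older "${...}" interpolation style. For readers on Terraform 0.12 or later, where a Boolean is no longer usable directly as a count, the same toggle might look roughly like the sketch below. It is a fragment rather than a complete configuration: it assumes aws_ami.restore is declared with the same conditional count, and it leaves out the remaining instance arguments just as the snippets above do.

```hcl
# Sketch for Terraform 0.12+; not the configuration used in this article.
variable "restore" {
  description = "Used to restore the jenkins server"
  type        = bool
  default     = false
}

resource "aws_instance" "restore" {
  # Created only when the toggle is on.
  count = var.restore ? 1 : 0

  # Resources with count must be indexed explicitly in 0.12+.
  ami           = aws_ami.restore[0].id
  instance_type = "t2.medium"
}

resource "aws_instance" "this" {
  # Created only when the toggle is off.
  count = var.restore ? 0 : 1

  # Remaining arguments for the original instance are omitted here.
}
```

The conditional expression replaces the 1 - var.restore arithmetic, and the [0] index replaces the implicit reference to the first instance.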
Creating a Terraform toggle
The var.restore variable is now set up to control the creation of either the restored or brand new Jenkins server. Now a toggle is needed to turn this feature on or off when needed.
Terraform automatically loads any *.auto.tfvars file in the current directory (I named mine restore.auto.tfvars, but you can call it whatever you want). It is read at runtime and supplies the values for the variables it sets:
# restore.auto.tfvars
# restore controls the restore toggle.
# snapshot_name controls which snapshot to restore.
restore       = false
snapshot_name = "<insert snapshot name here>"
My variables.tf now only declares the variables. Their values come from restore.auto.tfvars, which now controls the restore toggle:
variable "restore" {
  description = "Used to restore the jenkins server"
}

variable "snapshot_name" {
  description = "snapshot name"
}
Further reading
I hope that the above tips aid you in quickly and easily recovering an EC2 server in Terraform. Feel free to make further additions and changes to the above approach to find the right one for your Amazon-powered environment.
Big thanks to Yevgeniy Brikman from Gruntwork for his handy Terraform Tips & Tricks: Loops, If-Statements, and Gotchas article, as well as Chris Pisano for his Building Feature Toggles Into Terraform write-up.