Backing up Windows servers in AWS

We have Windows servers in AWS, and the task is to set up backups. You can use plain snapshots, but then there may be problems with data integrity, because a plain snapshot does not coordinate with applications that are writing to disk. I also want to keep weekly and monthly snapshots, which the snapshot lifecycle does not offer. The new AWS Backup service cannot take consistent snapshots yet either, or at least I have not found how. And I want all of this to run with as little involvement from me as possible.

To accomplish this we need:

  1. Windows Server 2008 R2 or later running in AWS
  2. SSM Agent version or later
  3. AWS Tools for Windows PowerShell or later
  4. AWS Systems Manager
  5. IAM
  6. SNS
  7. Lambda

First we need a role for the server. The role should enable AWS SSM and the creation of EBS snapshots.

Go to IAM → Policies → Create policy.
Go to the JSON tab and insert

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2:CreateTags",
                "Resource": "arn:aws:ec2:*::snapshot/*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:CreateSnapshot"
                ],
                "Resource": "*"
            }
        ]
    }

Click Review policy, enter a name such as VssSnapshotPolicy, and save.
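The same policy could also be created from a script rather than the console. This is a minimal sketch, assuming boto3 and credentials that are allowed to manage IAM; the helper name vss_snapshot_policy_request is made up for illustration.

```python
import json


def vss_snapshot_policy_request(name="VssSnapshotPolicy"):
    """Build keyword arguments for the IAM create_policy call.

    The policy document is the same one shown above in JSON form.
    """
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ec2:CreateTags",
                "Resource": "arn:aws:ec2:*::snapshot/*",
            },
            {
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances", "ec2:CreateSnapshot"],
                "Resource": "*",
            },
        ],
    }
    # IAM expects the policy document as a JSON string, not a dict.
    return {"PolicyName": name, "PolicyDocument": json.dumps(document)}
```

With credentials configured, the call would look like boto3.client("iam").create_policy(**vss_snapshot_policy_request()).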

Now create a role.
IAM → Roles → Create Role

Choose AWS Service → EC2 and go to Permissions.

Here we add AmazonSSMManagedInstanceCore for SSM and the VssSnapshotPolicy policy that we created earlier. Optionally add tags to the role, then give it a name, say VssSnapshotRole.

Then we attach this role to the desired servers.

That's it: SSM can now “manage” these servers.

Now we need to install AwsVssComponents on the servers. To do this, open Run Command in Systems Manager, click Run command, and look for the AWS-ConfigureAWSPackage document.

In Command parameters, set Action to Install and Name to AwsVssComponents, and leave the version at latest.
In Targets, select the servers that we will back up.

Click Run.
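The same installation can be triggered through the API. This is a sketch under the assumption that boto3 is available and that the instances are already managed by SSM; the helper name and the instance ID in the usage note are placeholders.

```python
def install_vss_components_request(instance_ids):
    """Build keyword arguments for the SSM send_command call.

    No version is passed, so the document falls back to its default,
    which is the latest published version of the package.
    """
    return {
        "InstanceIds": list(instance_ids),
        "DocumentName": "AWS-ConfigureAWSPackage",
        "Parameters": {
            # SSM document parameters are always lists of strings.
            "action": ["Install"],
            "name": ["AwsVssComponents"],
        },
    }
```

Usage would be along the lines of boto3.client("ssm").send_command(**install_vss_components_request(["i-0123456789abcdef0"])).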

Once the command finishes, we can take a backup from the SSM console.

Open Run Command again and look for AWSEC2-CreateVssSnapshot. Set our servers as the targets. Set the options Exclude Boot Volume, Copy Only and No Writers as needed.

Click Run. The snapshots should now be created.
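For reference, here is how the same command might be sent from a script. This is a hedged sketch: the parameter names are the ones the AWSEC2-CreateVssSnapshot document exposes as far as I know, so check them against the document in your console; the helper name and the default description are invented for illustration.

```python
def create_vss_snapshot_request(instance_ids, description="VSS snapshot",
                                tags=None, exclude_boot_volume=False,
                                copy_only=False, no_writers=False):
    """Build keyword arguments for the SSM send_command call."""
    parameters = {
        # Booleans are passed as the strings "True" / "False".
        "ExcludeBootVolume": [str(exclude_boot_volume)],
        "CopyOnly": [str(copy_only)],
        "NoWriters": [str(no_writers)],
        "description": [description],
    }
    if tags:
        # Tags go in as "Key=...,Value=..." pairs separated by semicolons.
        parameters["tags"] = [
            ";".join(f"Key={key},Value={value}" for key, value in tags.items())
        ]
    return {
        "InstanceIds": list(instance_ids),
        "DocumentName": "AWSEC2-CreateVssSnapshot",
        "Parameters": parameters,
    }
```

The result is meant to be passed as boto3.client("ssm").send_command(**request). Building the request separately keeps the parameter handling easy to inspect before anything is sent.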

For backup notifications, create an SNS topic and subscribe to it. I use email notifications.

We create a policy that allows publishing messages to our topic (substitute your own region, account ID and topic name in the ARN):

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": "arn:aws:sns:ap-northeast-1:Account ID:Topic Name"
            }
        ]
    }

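And create a role with this policy. Systems Manager has to be able to assume this role, so its trust policy must allow the ssm.amazonaws.com service.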
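Since the topic ARN in the policy is assembled from the region, account ID and topic name, a small helper can fill it in. This is only a sketch; the account ID and topic name below are placeholders, not real values.

```python
import json


def sns_publish_policy(region, account_id, topic_name):
    """Build a policy document that allows publishing to one SNS topic."""
    topic_arn = f"arn:aws:sns:{region}:{account_id}:{topic_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sns:Publish", "Resource": topic_arn}
        ],
    }


# Placeholder account ID and topic name, for illustration only.
policy = sns_publish_policy("ap-northeast-1", "123456789012", "backup-notifications")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the JSON tab of the policy editor.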

To automate the process, we will use the SSM maintenance window.

Click Create maintenance window. Fill in the name and set the schedule to whatever you like.

Open the created maintenance window and register a Run command task. Fill in the parameters. In the tag parameter I record the type of backup: the tag key is SnapshotType and the value is one of Day, Week or Month. Accordingly, I have three maintenance windows, one per value. Check Enable SNS notifications and specify our SNS role and topic.

All snapshots will now be created on a schedule.
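The three windows differ only in name and schedule, so they can be described in one place. The sketch below assumes boto3 and uses invented window names, times, duration and cutoff; the cron expressions follow the Systems Manager format and are evaluated in UTC unless a time zone is set on the window.

```python
def maintenance_window_requests(duration_hours=2, cutoff_hours=1):
    """Build keyword arguments for three create_maintenance_window calls."""
    # Illustrative schedules, one per SnapshotType value.
    schedules = {
        "Day": "cron(0 2 ? * * *)",     # every day at 02:00
        "Week": "cron(0 3 ? * SUN *)",  # every Sunday at 03:00
        "Month": "cron(0 4 1 * ? *)",   # first day of each month at 04:00
    }
    return [
        {
            "Name": f"VssSnapshot-{snapshot_type}",
            "Schedule": schedule,
            "Duration": duration_hours,  # window length in hours
            "Cutoff": cutoff_hours,      # stop starting new tasks this many hours before the end
            "AllowUnassociatedTargets": False,
        }
        for snapshot_type, schedule in schedules.items()
    ]
```

Each dictionary can be passed to boto3.client("ssm").create_maintenance_window(**request). Targets and the Run command task still have to be registered on every window afterwards.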

After a while we will have too many snapshots, and they will need to be cleaned up. For this we will use another AWS service, Lambda.

First, create a role that can read and delete snapshots.

To do this, we create a policy in IAM

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "logs:DeleteSubscriptionFilter",
                    "ec2:DeleteSnapshot",
                    "ec2:DescribeSnapshots",
                    "logs:DeleteLogStream",
                    "logs:CreateExportTask",
                    "logs:DeleteResourcePolicy",
                    "logs:CreateLogStream",
                    "logs:DeleteMetricFilter",
                    "logs:TagLogGroup",
                    "logs:CancelExportTask",
                    "ec2:DescribeVolumes",
                    "logs:DeleteRetentionPolicy",
                    "logs:DeleteLogDelivery",
                    "logs:AssociateKmsKey",
                    "logs:PutDestination",
                    "logs:DisassociateKmsKey",
                    "logs:UntagLogGroup",
                    "logs:DeleteLogGroup",
                    "logs:PutDestinationPolicy",
                    "ec2:DescribeSnapshotAttribute",
                    "logs:DeleteDestination",
                    "logs:PutLogEvents",
                    "logs:CreateLogGroup",
                    "logs:PutMetricFilter",
                    "logs:CreateLogDelivery",
                    "logs:PutResourcePolicy",
                    "logs:UpdateLogDelivery",
                    "logs:PutSubscriptionFilter",
                    "logs:PutRetentionPolicy"
                ],
                "Resource": "*"
            }
        ]
    }

And we attach this policy to a new role.

Go to Lambda and create a new Python function.

    import time

    import boto3


    def get_volume_snapshots(client, volume_id, snapshot_type):
        """Return the completed snapshots of a volume that carry the given SnapshotType tag."""
        args = {
            "Filters": [
                {"Name": "volume-id", "Values": [volume_id]},
                {"Name": "status", "Values": ["completed"]},
                # tag:<key> matches key and value on the same tag
                {"Name": "tag:SnapshotType", "Values": [snapshot_type]},
            ],
            "OwnerIds": ["self"],
        }
        snapshots = []
        while True:
            resp = client.describe_snapshots(**args)
            snapshots += resp.get("Snapshots", [])
            if resp.get("NextToken"):
                args["NextToken"] = resp["NextToken"]
            else:
                break
        return snapshots


    def delete_snapshot(client, snapshot_id):
        """Delete one snapshot; return False if it is already gone."""
        wait_period = 5
        retries = 5
        while True:
            try:
                client.delete_snapshot(SnapshotId=snapshot_id)
                return True
            except Exception as ex:
                code = getattr(ex, "response", {}).get("Error", {}).get("Code", "")
                # The snapshot list is eventually consistent, so an already
                # deleted snapshot may still appear in it.
                if code == "InvalidSnapshot.NotFound":
                    return False
                # Throttling might occur when deleting snapshots too fast.
                if code == "RequestLimitExceeded" or "throttling" in str(ex).lower():
                    retries -= 1
                    if retries == 0:
                        raise
                    time.sleep(wait_period)
                    wait_period = min(wait_period + 10, 30)
                    continue
                raise


    def lambda_handler(event, context):
        # Number of snapshots to keep per SnapshotType tag value.
        retentions = {"Day": 5, "Week": 3, "Month": 2}
        client = boto3.client("ec2")
        vols = client.describe_volumes()
        snapshots_deleted = []
        for vol in vols["Volumes"]:
            vol_id = vol["VolumeId"]
            for snapshot_type, retention_count in retentions.items():
                # Newest first, so everything past retention_count is expired.
                snapshots_for_volume = sorted(
                    get_volume_snapshots(client, vol_id, snapshot_type),
                    key=lambda s: s["StartTime"],
                    reverse=True,
                )
                snapshots_to_delete = []
                if retention_count > 0:
                    snapshots_to_delete = [
                        b["SnapshotId"] for b in snapshots_for_volume[retention_count:]
                    ]
                for snapshot_id in snapshots_to_delete:
                    if delete_snapshot(client, snapshot_id):
                        snapshots_deleted.append(snapshot_id)
        return {"DeletedSnapshots": snapshots_deleted}

We use the role created above. As a trigger we use a scheduled CloudWatch Events rule.
The function goes through all volumes, finds each volume's completed snapshots that carry the SnapshotType tag, and deletes those beyond the retention count for their type. I keep the last 5 daily, 3 weekly and 2 monthly snapshots.
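The retention rule is easy to check in isolation. The sketch below applies the same sort-and-slice step to made-up snapshot records, with no AWS calls involved; the function name and the sample data are illustrative only.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, retention_count):
    """Return the IDs of all snapshots older than the newest retention_count."""
    if retention_count <= 0:
        # Same guard as in the Lambda: a non-positive count deletes nothing.
        return []
    newest_first = sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)
    return [s["SnapshotId"] for s in newest_first[retention_count:]]


# Seven made-up daily snapshots; snap-0 is the oldest.
start = datetime(2020, 1, 1)
daily = [
    {"SnapshotId": f"snap-{i}", "StartTime": start + timedelta(days=i)}
    for i in range(7)
]
print(snapshots_to_delete(daily, 5))  # ['snap-1', 'snap-0']
```

With a retention of 5, the five newest snapshots are kept and the two oldest are returned for deletion. A volume with fewer snapshots than the retention count yields an empty list.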
