Thursday, 23 February 2012

Secure backup of untrusted remote hosts

I haven't blogged for a long time, so this will be a long post, prompted by some nightmares I had about not doing proper backups on some of my hosts.

Servers - all those small and big machines that most geeks own, run or operate. As VPS pricing drops, we see more and more of those low-end, resource-strapped servers. Organic growth usually means you start with an empty server and some kind of definition of what it will be doing and... from there it just goes downhill. How do you back up such a VPS? Here is what I use myself.

My backup requirements

  • Automated - it has to run unattended at roughly regular intervals; if it's not automated it will never be done (read: no backup)
  • Off-site - in case I lose the whole machine for some reason (because RAID is not a backup, and what fire doesn't destroy, water poured by firemen will)
  • No cross-backups - because they require a trust relationship between machines, and if you think about using cheap VPSes for cross-backups, remember that you get what you pay for!
  • Automatically delete old backups - to save space, (my) time and money
  • Append only - a machine can only write data to its own designated backup volume and cannot delete or modify other volumes (accidents and rogue users do happen)
  • Confidentiality - no unauthorized access to backed-up data
  • Availability - the storage volume has to be highly available, so I can not only write to it knowing it's there, but also access backups when I need them
  • Access controls - ability to define granular access rules and enforce append-only usage
  • Economy - it has to have reasonable cost

Proposed solution

The server creates a tarball of the files I want to copy, using a simple shell script triggered from cron. The resulting file is encrypted with GnuPG using the public key of my backup user; the private key is stored off-line. The encrypted file is then uploaded to an off-site storage volume.

As I had used Amazon AWS before, it was my first choice. The company is big enough to do a quality job, offers all the building blocks, and pay-per-use is just what I need. By combining Amazon's services I can satisfy most of the requirements out of the box and easily add what is missing.

Amazon S3 is a storage service that lets you put your files into 'buckets' (think file shares) with globally unique names. Each bucket has a series of properties: geographical location, so you can select where your data will reside (think of legal issues and price differences across locations); object expiry time, which will be our auto-delete mechanism for old data; and finally ACLs. Because those ACLs are not enough for what I want to do (or rather how I want to have it done), I will be using the IAM service, which integrates nicely with S3 and many other AWS services. So let's get it set up.

Setting up S3

I create a separate S3 bucket for each host, so I can easily select a location and a different expiry time for each. I decided to name buckets after the host's FQDN with a '-backups' suffix, so for this blog post I have a bucket called aws-poc.home.lab-backups. In the bucket properties we are interested in the object expiry configuration: simply add an expiry rule there. If you leave the prefix empty, the rule affects all objects in the bucket, which is exactly what I want: backups are retained for 180 days.
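
For those who prefer the command line to the console, the same 180-day rule can be sketched as a lifecycle configuration. This is only a sketch under assumptions: the function name lifecycle_json and the rule ID are made up for illustration, and the commented-out step assumes the AWS CLI is installed and run with an administrative account, not with the backup user's keys.

```shell
# Print an S3 lifecycle configuration that expires every object
# in a bucket after the given number of days (default: 180).
# lifecycle_json and the rule ID are hypothetical names.
lifecycle_json() {
    local days="${1:-180}"
    cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": ${days} }
    }
  ]
}
EOF
}

# Assumed usage with the AWS CLI (admin credentials, not the backup user):
#   lifecycle_json 180 > lifecycle.json
#   aws s3api put-bucket-lifecycle-configuration \
#       --bucket aws-poc.home.lab-backups \
#       --lifecycle-configuration file://lifecycle.json
```

The empty prefix in the filter plays the same role as leaving the prefix field empty in the console: the rule applies to every object in the bucket.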

IAM configuration

Uploading to S3 via the web service requires a user's Access Key ID and Secret Access Key. For each server I want to back up I need a separate IAM user - this lets me tell them apart and revoke access to the backup bucket if needed. IAM lets us allow or deny specific actions for IAM users and groups, for example 'allow uploading files only to bucket X, block all other bucket operations' - we do that below.

After creating the user in the IAM service (yes, IAM, not S3), remember to write down the access keys - they can't be displayed later, so if you lose them you will have to generate new keys (see user properties).


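Now we need to define what the user can do. In the user properties, under the Permissions tab, we select Attach User Policy and choose Policy Generator. To get append-only access to our S3 bucket we need to grant the user the PutObject action (and only this one) and specify the ARN of our S3 bucket. Because PutObject acts on objects, the resource has to cover the objects in the bucket, i.e. arn:aws:s3:::aws-poc.home.lab-backups/* for the bucket used here. This is the minimum we need to do.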
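
The policy that comes out of the generator should look roughly like the JSON printed by the sketch below. The helper name append_only_policy, the user name and the policy name in the comments are hypothetical, the Version string is the current policy language version rather than anything taken from this setup, and the commented-out step assumes the AWS CLI.

```shell
# Print an IAM policy that allows s3:PutObject, and nothing else,
# on the objects of one bucket. PutObject is an object-level action,
# so the resource is "bucket/*" rather than the bare bucket ARN.
# append_only_policy is a hypothetical helper name.
append_only_policy() {
    local bucket="$1"
    cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Assumed usage with the AWS CLI (user and policy names are made up):
#   append_only_policy aws-poc.home.lab-backups > policy.json
#   aws iam put-user-policy --user-name aws-poc-backup \
#       --policy-name append-only-backups \
#       --policy-document file://policy.json
```

Since IAM denies everything that is not explicitly allowed, a user holding only this policy can upload objects to the one bucket but can neither list, read nor delete them.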


Backup and upload scripts

Backup scripts are really easy: just tar and bzip2 the directories that hold what is to be backed up, pipe the result through gpg and save it somewhere for a short time. Then upload it to S3, after which you can delete the local encrypted tarball. For example, it can be done this way:

#!/bin/bash
#
# This updated version adds the file hash to the name, so an attacker
# who cannot guess the hash can't overwrite files already uploaded.
#
WORKDIR=/tmp
DATE=$(date +%Y%m%d)
HOSTNAME=$(hostname --fqdn)
cd "$WORKDIR" || exit 1
# archive, compress and encrypt to the public key of the 'backups' user;
# stop here if gpg fails, so a broken file is never uploaded
tar cf - /etc /var/backups 2>/dev/null | bzip2 -9 | gpg -e -r backups > tmpbackup || exit 1
# name the file after the hash of the encrypted data
SHA256=$(sha256sum tmpbackup | awk '{ print $1 }')
BACKUPFILE="$DATE-$HOSTNAME-$SHA256.tar.bz2.gpg"
mv tmpbackup "$BACKUPFILE"
# delete the local copy only after a successful upload
s3upload.pl "$HOSTNAME-backups" "$BACKUPFILE" && rm "$BACKUPFILE"

That's all - the upload is done by the s3upload.pl script:

#!/usr/bin/perl
use strict;
use warnings;
use Net::Amazon::S3;

# requires:
# apt-get install libnet-amazon-s3-perl libwww-perl libxml-simple-perl

if ($#ARGV < 1) {
    print "Usage:\n\t$0 <bucket name> <file name>\n";
    exit 1;
}

my $s3 = Net::Amazon::S3->new({ 
    aws_access_key_id => "INSERT KEY ID HERE",
    aws_secret_access_key => "INSERT SECRET KEY HERE",
  });

# upload or die
my $bucket = $s3->bucket($ARGV[0]);
$bucket->add_key_filename($ARGV[1], $ARGV[1]) or die $s3->err . ": " . $s3->errstr;
exit 0;

Caveats

To run gpg the way I do above, importing the target key is not enough - you also have to edit the imported key and set its trust level to ultimate; otherwise, every time the script runs, gpg will ask you to confirm interactively that you really want to encrypt data to that key.

To change the trust level for the above key I did:

gpg --edit-key backups
trust
5                  <== for ultimate trust
y                  <== confirm setting ultimate trust
quit

That's all, now the key has ultimate trust and the process can be fully automated - no more questions asked.
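
When many servers have to be prepared, the interactive session can be replaced by importing an ownertrust record. This is a sketch only: ownertrust_line is a hypothetical helper, and the commented-out usage assumes the public key has already been imported under the name backups.

```shell
# Build an ownertrust record for "gpg --import-ownertrust".
# The format is "<fingerprint>:<level>:", where level 6 means ultimate.
# ownertrust_line is a hypothetical helper name.
ownertrust_line() {
    printf '%s:6:\n' "$1"
}

# Assumed usage on the server, after importing the public key:
#   FPR=$(gpg --with-colons --fingerprint backups | awk -F: '/^fpr:/ { print $10; exit }')
#   ownertrust_line "$FPR" | gpg --import-ownertrust
```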

Closing notes

The old saying goes that there are two kinds of people - those who do backups and those who will do backups. In fact there is a third kind - those who test their backups... so please, test your backups and check that you can restore the data; otherwise you have just wasted your time and money buying a false sense of security.
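
A first, cheap test can be sketched like this: check that a downloaded backup still matches the SHA256 embedded in its name, then try to decrypt it and list its contents. The function name verify_backup_name is hypothetical, the file is assumed to have been downloaded from S3 already, and the commented-out decryption step has to run on a machine that holds the private key.

```shell
# Check that the SHA256 embedded in a backup file name
# (DATE-HOST-SHA256.tar.bz2.gpg) matches the file contents.
# verify_backup_name is a hypothetical helper name.
verify_backup_name() {
    local file="$1" base expected actual
    base=$(basename "$file" .tar.bz2.gpg)
    expected="${base##*-}"          # text after the last hyphen
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}

# Assumed restore test on the machine holding the private key:
#   verify_backup_name "$f" && gpg -d "$f" | tar tjf - > /dev/null
```

The hash check only proves that the file was not altered after the upload; only a real restore of a few files proves that the backup is usable.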

UPDATE:
As the PutObject permission allows overwriting already existing files, it's desirable to have unique file names that can't be easily determined or guessed. I have updated the backup script above so that it calculates the SHA256 hash of the encrypted backup file and adds the resulting hash to the file name. This is just a result of my paranoia - better safe than sorry :-)
Another update is for s3upload.pl - it is now more generic, taking two command-line parameters, bucket name and file name, so you can also use it to upload things other than backups.