Fri, 02 Jan 2009

Backup Implementation

I previously considered different backup schemes. Writing that entry crystallized my basic ideas about what I was going to do about the backups. I then proceeded to implement them. This entry is a detailed description of what I did.

Backup Overview

I ended up buying two 80 GB hard drives and a ThermalTake BlacX external enclosure. The overall plan is to do backups to one of the drives on a daily, automated basis, and the other on a periodic, maybe monthly basis. Most of the time, the periodic backup drive will live at my office and will serve as my offsite backup.

I want to have a backup history in the same way that a tape rotation scheme would give me. That way, if I don't notice there's something wrong with a file for a time, there's still a good chance I can retrieve it. I also want things stored space-efficiently, so backing up unchanged files doesn't take up additional space. This is accomplished pretty easily with rsync; I do one full backup, and then subsequent backups use rsync's --link-dest option pointing to the most recent complete backup; any files that haven't changed are simply hardlinked together, so the two directory entries point to the same physical location on the disk.

For the backup history, I decided to go with a variant of the Towers of Hanoi backup rotation. Instead of using a preset number of pegs, as I would have to do if I were using physical tapes, I can just calculate the numbers on the fly, effectively behaving as if I had an infinite number of tapes. This rotation gives me roughly exponential backoff for my history; I can look at backups from yesterday, two days ago, four days ago, eight days ago, and so on.

Finally, I decided to encrypt the drives. This lets me cart them around with confidence that if they get lost or stolen, anyone who ends up with them won't have my personal files. I used LUKS to encrypt the disks, and used both a file on my computer and a password as keys. The file makes it easier to mount the drives without manual intervention (important if my desktop reboots), while the password lets me get at the data if the key file isn't available (important if the main disk dies and I need my backups).

Backup Procedure

Set up fstab.

The first thing I did was to set up /etc/fstab for the disks. Since I only have one enclosure, I have to swap disks, so only one will ever be connected at the same time. Thus, I mount whichever's in at the moment on /backups. Likewise, I will associate each one with the dm-crypt name of "backups", so the device will be /dev/mapper/backups. Thus, I added the following line to /etc/fstab:

/dev/mapper/backups  /backups  auto  defaults  0  0

Set up rsync filters.

I'm using rsync to do the backups, but in some cases I don't want everything in the source directories to be backed up. Thus, I tell rsync to look at a filter file for each directory so it knows what to skip and what to keep. rsync will go through the filters for each file or directory it considers, and will take the first action that matches. If nothing matched, the file is copied. If a directory is ignored, none of the files in that directory are considered at all, so I had to include a few explicit directory chains.

In /var, I only want to back up a few things, so the final pattern ignores anything that isn't explicitly included.

+ /backups
+ /backups/**
+ /lib
+ /lib/bzr
+ /lib/bzr/**
+ /lib/svn
+ /lib/svn/**
- *

For my home directory, I include everything, with a few exceptions. For instance, most of my music directory can be reripped from CD if needed, so I don't need to take up space backing up those files. On the other hand, I have some files that I either purchased online or downloaded and wouldn't be able to easily replace if they were lost, so I do back them up. Here's an excerpt from my home filter file:

+ /movies/Star_Wars_Holiday_Special
+ /movies/Star_Wars_Holiday_Special/**
- /movies

+ /music
+ /music/Bonerama
+ /music/Bonerama/Bringing_It_Home
+ /music/Bonerama/Bringing_It_Home/**
+ /music/Jonathan_Coulton
+ /music/Jonathan_Coulton/Thing_a_Week_Three
+ /music/Jonathan_Coulton/Thing_a_Week_Three/03-Code_Monkey*
+ /music/Nine_Inch_Nails
+ /music/Nine_Inch_Nails/Ghosts_I-IV
+ /music/Nine_Inch_Nails/Ghosts_I-IV/**
+ /music/Nine_Inch_Nails/The_Slip
+ /music/Nine_Inch_Nails/The_Slip/**
+ /music/Obelix
+ /music/Obelix/**
+ /music/Solo_String_Project
+ /music/Solo_String_Project/**
- /music/**

- /tmp

Initialize disks.

I wrote a script to initialize the disks for me: init-backup-disk. It takes two parameters: the name of the device for the backup disk, and the file to use as a key for the partition. If the key file doesn't exist, it will be created.

After a few sanity checks, the script starts doing things. It starts by checking that the disk is good with badblocks. If it encounters any errors, it stops there and the drive needs to be sent in for warranty replacement. Following that, it goes into the most time-consuming part of the initialization: writing random data to the disk. (badblocks already wrote random data, but its PRNG is somewhat simplistic; /dev/urandom is a much better source of pseudo-random numbers.) Without this step, it would be obvious which bits of the disk had encrypted data on them. I use pv to give a progress meter and time estimate. On my computer, badblocks took a little over two hours and /dev/urandom took about eight hours for each 80GB disk.

# Check and randomize disk.
badblocks -b 512 -s -w -t random -v $disk || exit 2
</dev/urandom pv -s $(fdisk -l $disk |
    perl -nle 'm{^Disk '${disk}': [0-9.]+ [KMGT]B, (\d+) bytes$} and print $1') |
  dd bs=512 conv=sync,noerror of=$disk

The next step is to format the encrypted partition. I use sfdisk to create a single partition that spans the entire drive, followed by cryptsetup to do the format. I explicitly specify the cipher in order to use ESSIV, which makes certain attacks more difficult. The --batch-mode option keeps it from asking for confirmation before writing. The second call to sfdisk just tells the kernel to reread the disk's partitions so it will pick up the UUID that cryptsetup created.

# Add and format the LUKS partition.
echo , | sfdisk --Linux $disk
cryptsetup luksFormat --cipher aes-cbc-essiv:sha256 --batch-mode ${disk}1 $keyfile
sfdisk -R $disk; sleep 5

Next, I open, format, and mount the partition. JFS is the filesystem that's been nicest to me, of all the journaling filesystems I've tried. (In the future, it might be nice to use ZFS for the backup disks--I'd get better detection of disk errors, at least--but I don't think it would play entirely well with running over an encrypted volume, and they haven't integrated encryption into ZFS yet, as far as I can tell.)

# Open LUKS partition, format and mount the encrypted volume.
cryptsetup --key-file $keyfile luksOpen ${disk}1 backups
mkfs -t jfs -q /dev/mapper/backups
mount /backups

Now I run the initial backup. Each backup version is just a separate numbered directory in the partition, so the first one is '1'. I'm backing up /etc, some of /var, my and my wife's home directories, and any databases I have. My wife's stuff isn't directly backed up here because she's on a different computer; I have to initiate her backup from there. The script, in its first requirement for user interaction, will wait until I tell it that's done.

# Do the initial backup.
mkdir -vp /backups/1/{etc,var,phil,postgresql,mysql,rivana}
chmod a-r /backups/1
chown rivana /backups/1/rivana
chown postgres /backups/1/postgresql
rsync -avP --filter 'merge /etc/backups/etc-filter' /etc/ /backups/1/etc/
rsync -avP --filter 'merge /etc/backups/var-filter' /var/ /backups/1/var/
rsync -avP --filter 'merge /etc/backups/phil-filter' /home/phil/ /backups/1/phil/
su -c 'pg_dumpall -v >/backups/1/postgresql/dump' postgres
mysqldump -v --all-databases >/backups/1/mysql/dump
echo -n "Hit enter when rivana is backed up.  "
read foo

Now that the backup is done, the script unmounts and deactivates the partition.

# Deactivate the encrypted volume.
umount /backups
cryptsetup luksClose backups

And I get prompted for the password that will unlock the partition if the key file isn't available.

# Add user password.
cryptsetup --key-file $keyfile --verify-passphrase luksAddKey ${disk}1

Finally, the script displays the UUID for the partition, which is needed for later use.

# Display the partition's UUID.
echo -n 'UUID: '
cryptsetup luksUUID ${disk}1

Set up crypttab.

Debian, at least, has an /etc/crypttab file that lists encrypted partitions to be enabled at boot time. I put the onsite backup disk in there so it'll be automatically mounted if the computer reboots. This plus a backup cronjob make the backup process completely automated.

backups  /dev/disk/by-uuid/<onsite UUID>  <key file>  luks

Do local backups.

I have a simple script to do the daily backups: perform-backup. It's basically the same as the initial backup, but with the --link-dest option as I mentioned previously.

last_num=$(ls -t /backups | head -1)

mkdir -p /backups/$num/{etc,var,phil,postgresql,mysql,rivana}
chown rivana /backups/$num/rivana
chmod a-r /backups/$num
rsync -a --filter 'merge /etc/backups/etc-filter' --link-dest=/backups/$last_num/etc /etc/ /backups/$num/etc/
rsync -a --filter 'merge /etc/backups/var-filter' --link-dest=/backups/$last_num/var /var/ /backups/$num/var/
rsync -a --filter 'merge /etc/backups/phil-filter' --link-dest=/backups/$last_num/phil /home/phil/ /backups/$num/phil/
chown postgres /backups/$num/postgresql
su -c "pg_dumpall >/backups/$num/postgresql/dump" postgres
mysqldump --all-databases >/backups/$num/mysql/dump

Do Becca's backup

My wife has her own computer but, fortunately, simpler backup requirements. I have ssh public key authentication set up so she can ssh to my computer without a password, which makes the backups work properly in an automated fashion.

The initial backup is a simple rsync one-liner.

rsync -avP ~/ mithrandir:/backups/1/rivana/

Subsequent backups are a simple script:


num=$(ssh mithrandir ls -t /backups | head -1)
last_num=$(ssh mithrandir ls -t /backups | head -2 | tail -1)

rsync -a --link-dest=/backups/${last_num}/rivana ${HOME}/ mithrandir:/backups/${num}/rivana/

Backup rotation.

The Towers of Hanoi rotation is effected by a script that goes through and deletes any directories that don't need to be there: purge-backups. I won't quote it here because I don't think it's all that interesting. It just finds the largest power of two less than or equal to the current number and then works its way down from that to enumerate all of the directories, deleting everything else.

Offsite backups.

Every so often I'll bring the offsite disk home and back up to it. The script for that is insert-offsite-backup. It unmounts the onsite disk, waits for me to insert the offsite disk, runs a backup, unmounts the offsite disk, waits for me to reinsert the onsite disk, then remounts that disk. It needs to be told what my offsite UUID is, but it picks up all the other settings from /etc/crypttab.

The backup takes about half an hour, so I have ample time to manually run the backup script on Becca's computer.

Phil! Gold