Backup Implementation

I previously considered different backup schemes.  Writing that entry crystallized my plans for the backups, and I then went ahead and implemented them.  This entry is a detailed description of what I did.

§ Backup Overview

I ended up buying two 80 GB hard drives and a ThermalTake BlacX external enclosure.  The overall plan is to do backups to one of the drives on a daily, automated basis, and the other on a periodic, maybe monthly basis.  Most of the time, the periodic backup drive will live at my office and will serve as my offsite backup.

I want a backup history like the one a tape rotation scheme would give me.  That way, if I don’t notice that something is wrong with a file for a while, there’s still a good chance I can retrieve an intact copy.  I also want things stored space-efficiently, so backing up unchanged files doesn’t take up additional space.  rsync accomplishes this pretty easily: I do one full backup, and then each subsequent backup uses rsync’s --link-dest option pointing at the most recent complete backup; any files that haven’t changed are simply hardlinked together, so the two directory entries point to the same data on disk.
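
The mechanics are simple enough to show in a two-line sketch, using the directory layout I describe below:

# Full backup first, then an incremental backup that hard-links unchanged
# files against it; identical files end up sharing an inode, so the second
# directory costs almost no extra space.
rsync -a /home/phil/ /backups/1/phil/
rsync -a --link-dest=/backups/1/phil /home/phil/ /backups/2/phil/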

For the backup history, I decided to go with a variant of the Towers of Hanoi backup rotation.  Instead of using a preset number of pegs, as I would have to do if I were using physical tapes, I can just calculate the numbers on the fly, effectively behaving as if I had an infinite number of tapes.  This rotation gives me roughly exponential backoff for my history; I can look at backups from yesterday, two days ago, four days ago, eight days ago, and so on.

Finally, I decided to encrypt the drives.  This lets me cart them around with confidence that if they get lost or stolen, anyone who ends up with them won’t have my personal files.  I used LUKS to encrypt the disks, and used both a file on my computer and a password as keys.  The file makes it easier to mount the drives without manual intervention (important if my desktop reboots), while the password lets me get at the data if the key file isn’t available (important if the main disk dies and I need my backups).
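
As a sanity check that the password route works, the partition can be opened by hand on any machine with cryptsetup; something like this does it (the device name is just an example of where the enclosure might show up):

# Open with the password instead of the key file, then mount anywhere handy.
cryptsetup luksOpen /dev/sdb1 backups    # prompts for the password
mount /dev/mapper/backups /mnt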

§ Backup Procedure

# Set up fstab.

The first thing I did was to set up /etc/fstab for the disks.  Since I only have one enclosure, I have to swap disks, so only one will ever be connected at the same time.  Thus, I mount whichever’s in at the moment on /backups.  Likewise, I will associate each one with the dm-crypt name of “backups”, so the device will be /dev/mapper/backups.  Thus, I added the following line to /etc/fstab:

/dev/mapper/backups  /backups  auto  defaults  0  0

# Set up rsync filters.

I’m using rsync to do the backups, but in some cases I don’t want everything in the source directories to be backed up.  So I tell rsync to look at a filter file for each directory so it knows what to skip and what to keep.  rsync goes through the filters for each file or directory it considers and takes the first action that matches.  If nothing matches, the file is copied.  If a directory is excluded, none of the files in that directory are considered at all, so I had to include a few explicit directory chains.

In /var, I only want to back up a few things, so the final pattern ignores anything that isn’t explicitly included.

+ /backups
+ /backups/**
+ /lib
+ /lib/bzr
+ /lib/bzr/**
+ /lib/svn
+ /lib/svn/**
- *

For my home directory, I include everything, with a few exceptions.  For instance, most of my music directory can be reripped from CD if needed, so I don’t need to take up space backing up those files.  On the other hand, I have some files that I either purchased online or downloaded and wouldn’t be able to easily replace if they were lost, so I do back them up.  Here’s an excerpt from my home filter file:

+ /movies/Star_Wars_Holiday_Special
+ /movies/Star_Wars_Holiday_Special/**
- /movies

+ /music
+ /music/Bonerama
+ /music/Bonerama/Bringing_It_Home
+ /music/Bonerama/Bringing_It_Home/**
+ /music/Jonathan_Coulton
+ /music/Jonathan_Coulton/Thing_a_Week_Three
+ /music/Jonathan_Coulton/Thing_a_Week_Three/03-Code_Monkey*
+ /music/Nine_Inch_Nails
+ /music/Nine_Inch_Nails/Ghosts_I-IV
+ /music/Nine_Inch_Nails/Ghosts_I-IV/**
+ /music/Nine_Inch_Nails/The_Slip
+ /music/Nine_Inch_Nails/The_Slip/**
+ /music/Obelix
+ /music/Obelix/**
+ /music/Solo_String_Project
+ /music/Solo_String_Project/**
- /music/**

- /tmp
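
Before trusting a filter file, it’s worth a dry run to see exactly what rsync would pick up; this isn’t part of the backup scripts, just a quick check:

# -n (--dry-run) lists what would be transferred without copying anything.
rsync -avn --filter 'merge /etc/backups/phil-filter' /home/phil/ /tmp/filter-test/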

# Initialize disks.

I wrote a script to initialize the disks for me: init-backup-disk.  It takes two parameters: the name of the device for the backup disk, and the file to use as a key for the partition.  If the key file doesn’t exist, it will be created.

After a few sanity checks, the script gets to work.  It starts by checking that the disk is good with badblocks.  If it encounters any errors, it stops there and the drive needs to be sent in for warranty replacement.  Following that, it goes into the most time-consuming part of the initialization: writing random data to the disk.  (badblocks already wrote random data, but its PRNG is somewhat simplistic; /dev/urandom is a much better source of pseudo-random numbers.)  Without this step, it would be obvious which bits of the disk had encrypted data on them.  I use pv to give a progress meter and time estimate.  On my computer, badblocks took a little over two hours and /dev/urandom took about eight hours for each 80 GB disk.

# Check and randomize disk. 
badblocks -b 512 -s -w -t random -v $disk || exit 2
</dev/urandom pv -s $(fdisk -l $disk |
    perl -nle 'm{^Disk '${disk}': [0-9.]+ [KMGT]B, (\d+) bytes$} and print $1') |
  dd bs=512 conv=sync,noerror of=$disk

The next step is to format the encrypted partition.  I use sfdisk to create a single partition that spans the entire drive, followed by cryptsetup to do the format.  I explicitly specify the cipher in order to use ESSIV, which makes certain attacks more difficult.  The --batch-mode option keeps it from asking for confirmation before writing.  The second call to sfdisk just tells the kernel to reread the disk’s partitions so it will pick up the UUID that cryptsetup created.

# Add and format the LUKS partition. 
echo , | sfdisk --Linux $disk
cryptsetup luksFormat --cipher aes-cbc-essiv:sha256 --batch-mode ${disk}1 $keyfile
sfdisk -R $disk; sleep 5

Next, I open, format, and mount the partition.  JFS is the filesystem that’s been nicest to me, of all the journaling filesystems I’ve tried.  (In the future, it might be nice to use ZFS for the backup disks—I’d get better detection of disk errors, at least—but I don’t think it would play entirely well with running over an encrypted volume, and they haven’t integrated encryption into ZFS yet, as far as I can tell.)

# Open LUKS partition, format and mount the encrypted volume. 
cryptsetup --key-file $keyfile luksOpen ${disk}1 backups
mkfs -t jfs -q /dev/mapper/backups
mount /backups

Now I run the initial backup.  Each backup version is just a separate numbered directory in the partition, so the first one is ‘1’.  I’m backing up /etc, some of /var, my and my wife’s home directories, and any databases I have.  My wife’s stuff isn’t directly backed up here because she’s on a different computer; I have to initiate her backup from there.  The script, at its first point of user interaction, waits until I tell it that her backup is done.

# Do the initial backup. 
mkdir -vp /backups/1/{etc,var,phil,postgresql,mysql,rivana}
chmod a-r /backups/1
chown rivana /backups/1/rivana
chown postgres /backups/1/postgresql
rsync -avP --filter 'merge /etc/backups/etc-filter' /etc/ /backups/1/etc/
rsync -avP --filter 'merge /etc/backups/var-filter' /var/ /backups/1/var/
rsync -avP --filter 'merge /etc/backups/phil-filter' /home/phil/ /backups/1/phil/
su -c 'pg_dumpall -v >/backups/1/postgresql/dump' postgres
mysqldump -v --all-databases >/backups/1/mysql/dump
echo -n "Hit enter when rivana is backed up.  "
read foo

Now that the backup is done, the script unmounts and deactivates the partition.

# Deactivate the encrypted volume. 
umount /backups
cryptsetup luksClose backups

Next, the script prompts me for the password that will unlock the partition when the key file isn’t available.

# Add user password. 
cryptsetup --key-file $keyfile --verify-passphrase luksAddKey ${disk}1

Finally, the script displays the UUID for the partition, which is needed for later use.

# Display the partition's UUID. 
echo -n 'UUID: '
cryptsetup luksUUID ${disk}1

# Set up crypttab.

Debian, at least, has an /etc/crypttab file that lists encrypted partitions to be enabled at boot time.  I put the onsite backup disk in there so it’ll be automatically mounted if the computer reboots.  This, plus a backup cronjob like the one sketched below, makes the backup process completely automated.

backups  /dev/disk/by-uuid/<onsite UUID>  <key file>  luks
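
The cronjob itself can be as simple as a nightly run of the backup script, followed by the rotation cleanup described later; the times and paths here are only an example:

# /etc/cron.d/backups -- daily backup, then prune old ones.
30 3 * * *  root  /usr/local/sbin/perform-backup
45 4 * * *  root  /usr/local/sbin/purge-backups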

# Do local backups.

I have a simple script to do the daily backups: perform-backup.  It’s basically the same as the initial backup, but with the --link-dest option as I mentioned previously.

last_num=$(ls -t /backups | head -1)
((num=$last_num+1))

mkdir -p /backups/$num/{etc,var,phil,postgresql,mysql,rivana}
chown rivana /backups/$num/rivana
chmod a-r /backups/$num
rsync -a --filter 'merge /etc/backups/etc-filter' --link-dest=/backups/$last_num/etc /etc/ /backups/$num/etc/
rsync -a --filter 'merge /etc/backups/var-filter' --link-dest=/backups/$last_num/var /var/ /backups/$num/var/
rsync -a --filter 'merge /etc/backups/phil-filter' --link-dest=/backups/$last_num/phil /home/phil/ /backups/$num/phil/
chown postgres /backups/$num/postgresql
su -c "pg_dumpall >/backups/$num/postgresql/dump" postgres
mysqldump --all-databases >/backups/$num/mysql/dump

# Do Becca’s backup.

My wife has her own computer but, fortunately, simpler backup requirements.  I have ssh public key authentication set up so she can ssh to my computer without a password, which makes the backups work properly in an automated fashion.
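
Setting that up is the usual ssh key dance, run as her on her machine (mithrandir is my desktop):

# Create a key with no passphrase (it has to work unattended) and install it
# on my machine so rsync-over-ssh doesn't stop to ask for a password.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
ssh-copy-id mithrandir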

The initial backup is a simple rsync one-liner.

rsync -avP ~/ mithrandir:/backups/1/rivana/

Subsequent backups are a simple script:

#!/bin/sh

num=$(ssh mithrandir ls -t /backups | head -1)
last_num=$(ssh mithrandir ls -t /backups | head -2 | tail -1)

rsync -a --link-dest=/backups/${last_num}/rivana ${HOME}/ mithrandir:/backups/${num}/rivana/

# Backup rotation.

The Towers of Hanoi rotation is effected by a script that goes through and deletes any directories that don’t need to be there: purge-backups.  I won’t quote it here because I don’t think it’s all that interesting.  It just finds the largest power of two less than or equal to the current backup number, works its way down from there to enumerate all of the directories that should be kept, and deletes everything else.
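
For the curious, here’s a rough sketch of the kind of logic involved.  This is my own reconstruction of the rotation, not the actual purge-backups script: each backup number gets a level from its trailing zero bits, and only the newest backup at each level survives, which works out to the exponential spacing described earlier.

#!/bin/bash
# Sketch only -- not the real purge-backups.  Keep the newest backup at each
# "level" (trailing zero bits of the backup number) and delete the rest.

backup_root=/backups

level() {                        # count trailing zero bits of $1
    local n=$1 l=0
    while [ $((n % 2)) -eq 0 ]; do n=$((n / 2)); l=$((l + 1)); done
    echo $l
}

declare -A newest                # newest backup number seen at each level
for d in $(ls $backup_root | grep -E '^[0-9]+$' | sort -n); do
    newest[$(level $d)]=$d       # larger numbers overwrite smaller ones
done

keep=" ${newest[*]} "
for d in $(ls $backup_root | grep -E '^[0-9]+$' | sort -n); do
    if [[ $keep != *" $d "* ]]; then
        echo "would delete $backup_root/$d"   # swap echo for rm -rf once trusted
    fi
done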

# Offsite backups.

Every so often I’ll bring the offsite disk home and back up to it.  The script for that is insert-offsite-backup.  It unmounts the onsite disk, waits for me to insert the offsite disk, runs a backup, unmounts the offsite disk, waits for me to reinsert the onsite disk, then remounts that disk.  It needs to be told what my offsite UUID is, but it picks up all the other settings from /etc/crypttab.
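
Roughly, the flow looks something like this; again, this is my reconstruction rather than the actual insert-offsite-backup script:

#!/bin/sh
# Sketch of the offsite backup flow.  The offsite UUID is the only argument;
# the key file and the onsite device are read from /etc/crypttab.

offsite_uuid=$1
keyfile=$(awk '$1 == "backups" { print $3 }' /etc/crypttab)
onsite_dev=$(awk '$1 == "backups" { print $2 }' /etc/crypttab)

umount /backups
cryptsetup luksClose backups
printf 'Insert the offsite disk and hit enter.  '; read foo
sleep 5

cryptsetup --key-file $keyfile luksOpen /dev/disk/by-uuid/$offsite_uuid backups
mount /backups
perform-backup

umount /backups
cryptsetup luksClose backups
printf 'Reinsert the onsite disk and hit enter.  '; read foo
sleep 5

cryptsetup --key-file $keyfile luksOpen $onsite_dev backups
mount /backups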

The backup takes about half an hour, so I have ample time to manually run the backup script on Becca’s computer.


Backups

I had a dream last night that the apartment beneath ours caught on fire, we had to rush out of the building, and my computer and all of its data were destroyed.

I’ve been pondering a formal backup system for a while now.  (My current system involves making sure important files are in a version control system and exist on at least my laptop and desktop.  This is pretty ad-hoc, inconsistently updated, and not entirely comprehensive.)  I’m taking my dream as impetus to actually set something up.  This post is to help me organize my thoughts and see if anyone has any comments or suggestions.

§ My Requirements

I want to have a full rotating versioned backup system, where I have complete daily backups for a recent time span (say a week or so) and more sporadic backups back to as much as a year in the past.  Ideally, the backups should be stored in a space-efficient manner so unchanged files don’t take up more space than a single copy would require.  The backups should have off-site redundancy.  They should be relatively easy to use; they should be fully automated on a day-to-day basis, with notification when things go wrong.  Ease of setup would be nice but not necessary.

§ My Data

I currently have about 720 GB of data in my home directory, plus a few hundred MB elsewhere on the computer that I’d want to back up.  I also have about 11 GB in a bzr repository, but all of that should remain duplicated in my home directory.  Most of the data in my home directory is in media files that I can either replace (rerip CDs, etc.) or live without; only 25 GB of it is stuff that I must back up.  (A further 130 GB is stuff that would be nice to back up, but I can just burn it to DVD and consider those my backups; the data is essentially static.)

§ JWZ Backups

The easiest approach is the JWZ backup solution.  For all of my data, that would be two 1 TB external hard drives, for about $220.  If I restrict myself to the “must backup” data, I could make do with two 60 GB external hard drives for about $80.  In either case, I’d keep one drive at the office and swap them periodically.

The advantage of this approach is that I control everything.  I can put encrypted volumes on the drives, so if they get lost or stolen, my data isn’t usable to other people.  I can use rsync with hardlinks between datestamped directories to get versioned backups with efficient disk usage.  The drawbacks are a modest initial monetary outlay and the need to coordinate shuttling drives back and forth.

§ Amazon S3

Another approach is to use Amazon S3 to store my data.  It’s offsite by definition (and stored among multiple data centers; if I write data to it, I can reasonably trust that I’ll get that data back).  It’s not too expensive: at $0.15/GB-month, my minimal backup will cost about $3.85/month.  Throw in transfer costs and churn, and I doubt I’d exceed $6/month.  (The initial upload would be $2.56.  A full restore would cost me $4.36.)  With S3, I would only back up the minimal data; the 130 GB of optional backups would cost an additional $20/month, which would exceed the cost of the full do-it-myself hard drive backups in one year.

The complication to S3 is that it’s just a web-based data storage service; you need additional software to make a reasonable backup solution.

# Jungle Disk

From everything I’ve read, Jungle Disk is currently the best software for storing filesystem data on S3.  It runs on Windows, Mac OS X, and Linux, and exports your S3 buckets as a WebDAV disk, which you can then mount and treat like an ordinary (unlimited capacity) disk drive.  All data is encrypted before it’s sent to S3.

I like this approach.  Since it looks like a disk, I can use the same rsync setup I would with my own disks, and since the data is encrypted, I don’t need to worry too much about it being transported over the Internet and stored on someone else’s servers.  The main drawback is that it’s proprietary software.  In addition to my principled preference for open source software over proprietary software, there’s the issue that, especially because the data’s encrypted, this software would be my only access to my backups.  If something went wrong and I couldn’t get support from the company (e.g. they went out of business), I’d be out of luck.

The software costs $20.  Assuming $5/month on S3, it would take one year for this approach to cost more than the minimal get-my-own-disks approach.

# Other S3 software

I haven’t seen anything else that will let me back up to S3 and keep versioned backups in a space-efficient manner.  Most of the S3 backup software I’ve seen doesn’t do versions, and the few that do don’t appear to do it space-efficiently.  As always, I have the option of writing my own, but that would take a fair amount of time and effort, and I’d be likely to give up partway through, continuing to leave myself without good backups.

§ Conclusion

Barring any better suggestions from others, I’m leaning towards the two smallish hard drives.  They’d pay for themselves after a year of use, and I get complete control of my data (for better or worse).  I like the idea of using S3, but it’s more expensive in the long run, and I’m not completely happy with any of the software I’ve found to use with it.


Truffle-Topped Amaretto Brownies

I made this for Thanksgiving this year, and have already been asked for the recipe, even though I haven’t had any yet.  So here goes.

  • Brownie Layer
    • 1 cup sugar
    • 1 cup butter
    • 4 Tablespoons water
    • 2 cups semi-sweet chocolate morsels (~300g)
    • 1/2 cup amaretto
    • 2 teaspoons vanilla extract
    • 4 eggs
    • 1 1/2 cups all-purpose flour (~190g)
    • 1/2 teaspoon baking soda
    • 1/2 teaspoon salt
    • 1 cup chopped or sliced almonds (optionally toasted)
    • 1/2 cup chopped maraschino cherries
  • Truffle Layer
    • 8 oz cream cheese, softened
    • 1/4 cup powdered sugar (30g)
    • 1 cup semi-sweet chocolate morsels (~150g)
    • 2-3 Tablespoons amaretto
  • Topping
    • 1 cup semi-sweet chocolate morsels (~150g)
    • 1/2 cup whipping cream
    • 1 cup sliced almonds, lightly toasted
    • maraschino cherries for garnish

§ Brownie Layer

Preheat oven to 325°F.  Prepare a 9×13 baking dish.  (I line it with a sling of parchment paper and then spray it with Baker’s Joy.)

In a saucepan, bring the sugar, butter, and water to a boil.  Remove from heat.  Add chocolate, amaretto, and vanilla extract, stirring until chocolate is melted.  Add eggs, one at a time, stirring until blended.

Whisk together flour, baking soda, and salt.  Add to chocolate mixture, stirring well.  Stir in almonds and cherries.

Pour mixture into baking dish and bake for 42-48 minutes.

Leave them in the dish to cool.

§ Truffle Layer

While the brownies are cooling, beat the cream cheese and powdered sugar in a stand mixer on medium speed until the mixture is smooth.  Melt chocolate and add with the amaretto to the cream cheese, mixing until well-blended.

Spread over brownies and refrigerate until firm, at least 1 hour.

§ Topping

In a saucepan, melt the chocolate in the whipping cream.  Spread evenly over the brownies.  Sprinkle with almonds and cherries.  Refrigerate until set, at least 1 hour.

Cut into bars and serve.


More Delphi Type Hate

I have simple needs.  I have a base class with some generic behavior and subclasses with specific information for that generic behavior.  More concretely, the subclasses need to provide the generic behavior with an ordered list of things that designate key fields on database tables.  The best representation of those “things” in Delphi seems to be members of an enumeration:

type
  TKeyField = (kfFoo, kfBar, kfBaz, kfQuux);

Since I need the list of fields to be ordered, I need them in an array:

type
  TKeyFieldArray = array of TKeyField;

The declaration of the base class is pretty simple:

type
  TBaseClass = class
   protected
    function GetKeyFieldList : TKeyFieldArray; virtual; abstract;
   public
    procedure DoSomethingWithKeyFields;
  end;

As is the declaration of the subclass:

type
  TSubClass = class(TBaseClass)
   protected
    function GetKeyFieldList : TKeyFieldArray; override;
  end;

So where’s the problem?  Where’s the hate?  The hate is in the implementation.  If Delphi had array literals, this would be easy.  Something like:

function TSubClass.GetKeyFieldList : TKeyFieldArray;
begin
  Result := [kfBar, kfFoo, kfQuux];
end;

But it doesn’t.  It has some special magic for array literals when they’re passed as parameters to a function, but not anywhere else.  It does, however, have a syntax for array constants.  Perhaps this will work:

function TSubClass.GetKeyFieldList : TKeyFieldArray;
  const
    keyFieldList : TKeyFieldArray = (kfBar, kfFoo, kfQuux);
begin
  Result := keyFieldList;
end;

But no.  That TKeyFieldArray is a dynamic array; Delphi doesn’t allocate any space for it, so it can’t be a constant value.  You have to tell Delphi how big each constant array is, even though you’re already telling it how many elements are in the array.  So perhaps this is the solution:

function TSubClass.GetKeyFieldList : TKeyFieldArray;
  const
    keyFieldList : array[0..2] of TKeyField = (kfBar, kfFoo, kfQuux);
begin
  Result := keyFieldList;
end;

But no.  Because of Delphi’s approach to static typing, those are actually different types, and are therefore not assignment-compatible.  (See previous hates on this subject.)  No, here is the code that Delphi makes me type for what should be a one-line function implementation:

function TSubClass.GetKeyFieldList : TKeyFieldArray;
begin
  SetLength(Result, 3);
  Result[0] := kfBar;
  Result[1] := kfFoo;
  Result[2] := kfQuux;
end;

And just earlier this morning I was pleased because I read that Delphi 2007 (to which I’ll soon be upgrading from Delphi 5) has for...in loops, so I can finally have foreach.  (Can’t get the generics and anonymous functions in Delphi 2009, because we need .NET and that’s not yet available for Delphi 2009.)  Oh, Delphi.  The one hand giveth, and the entire rest of the stupid, anemic, pox-ridden language taketh away.


Take My Stuff!

I’m divesting myself of a lot of computer-related things that are taking up too much space in my apartment.  Take a look at my stuff and let me know if there’s anything you’d like to take off of my hands.  Anything left after a couple of weeks will go to the electronics recycling center.