Mon, 22 Dec 2008


I had a dream last night that the apartment beneath ours caught on fire, we had to rush out of the building, and my computer and all of its data were destroyed.

I've been pondering a formal backup system for a while now. (My current system involves making sure important files are in a version control system and exist on at least my laptop and desktop. This is pretty ad-hoc, inconsistently updated, and not entirely comprehensive.) I'm taking my dream as impetus to actually set something up. This post is to help me organize my thoughts and see if anyone has any comments or suggestions.

My Requirements

I want to have a full rotating versioned backup system, where I have complete daily backups for a recent time span (say a week or so) and more sporadic backups back to as much as a year in the past. Ideally, the backups should be stored in a space-efficient manner so unchanged files don't take up more space than a single copy would require. The backups should have off-site redundancy. They should be relatively easy to use; they should be fully automated on a day-to-day basis, with notification when things go wrong. Ease of setup would be nice but not necessary.
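
A rotation policy like this can be pinned down as a function that decides which backup dates to keep. The sketch below is only illustrative: the seven-day daily window, the choice to keep the oldest backup of each calendar month, and the 365-day cutoff are assumptions, and `backups_to_keep` is a hypothetical helper rather than part of any backup tool.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily_days=7, max_age_days=365):
    """Return the backup dates a rotation would retain, oldest first.

    Keeps every backup from the last `daily_days` days, plus the oldest
    backup in each calendar month for anything older, and drops backups
    older than `max_age_days`. All three thresholds are assumptions.
    """
    keep = set()
    monthly = {}  # (year, month) -> oldest retained backup in that month
    for d in sorted(backup_dates):
        age = (today - d).days
        if age < 0 or age > max_age_days:
            continue  # future-dated or too old: not retained
        if age < daily_days:
            keep.add(d)  # recent window: keep every daily backup
        else:
            # Dates arrive sorted, so the first one seen per month is the oldest.
            monthly.setdefault((d.year, d.month), d)
    keep.update(monthly.values())
    return sorted(keep)


# Example: 400 consecutive daily backups ending on the date of this post.
today = date(2008, 12, 22)
history = [today - timedelta(days=n) for n in range(400)]
kept = backups_to_keep(history, today)
```

Anything not in the returned list is a candidate for deletion, which keeps the decision separate from whatever tool actually stores the backups.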

My Data

I currently have about 720 GB of data in my home directory, plus a few hundred MB elsewhere on the computer that I'd want to back up. I also have about 11GB in a bzr repository, but all of that should remain duplicated in my home directory. Most of the data in my home directory is in media files that I can either replace (rerip CDs, etc.) or live without; only 25 GB of it is stuff that I must back up. (A further 130 GB is stuff that would be nice to back up, but I can just burn it to DVD and consider those my backups; the data is essentially static.)

JWZ Backups

The easiest approach is the JWZ backup solution. For all of my data, that would be two 1 TB external hard drives, for about $220. If I restrict myself to the "must backup" data, I could make do with two 60 GB external hard drives for about $80. In either case, I'd keep one drive at the office and swap them periodically.

The advantage of this approach is that I control everything. I can put encrypted volumes on the drives, so if they get lost or stolen, my data isn't usable to other people. I can use rsync with hardlinks between datestamped directories to get versioned backups with efficient disk usage. The drawbacks are a modest initial monetary outlay and the need to coordinate shuttling drives back and forth.
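
The hardlink trick is what makes the datestamped directories cheap: a file that hasn't changed since the previous snapshot is linked rather than copied. The following is a rough Python rendering of that idea, not the rsync invocation itself. The `snapshot` and `_same` helpers are hypothetical, and treating "same size and modification time" as "unchanged" is an assumption modeled on rsync's default quick check.

```python
import os
import shutil


def _same(a, b):
    """Treat two files as identical if size and whole-second mtime match."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source, dest_root, name, previous=None):
    """Copy `source` into dest_root/name, hard-linking unchanged files.

    `name` and `previous` are snapshot directory names under `dest_root`,
    for example datestamps such as "2008-12-22".
    """
    target = os.path.join(dest_root, name)
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for filename in filenames:
            src = os.path.join(dirpath, filename)
            dst = os.path.normpath(os.path.join(target, rel, filename))
            old = None
            if previous is not None:
                old = os.path.normpath(
                    os.path.join(dest_root, previous, rel, filename))
            if old and os.path.isfile(old) and _same(src, old):
                os.link(old, dst)       # unchanged: share the existing data
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
    return target
```

Each snapshot directory looks like a full copy, but unchanged files occupy disk space only once. Hard links require the snapshots to live on the same filesystem.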

Amazon S3

Another approach is to use Amazon S3 to store my data. It's offsite by definition (and stored among multiple data centers; if I write data to it, I can reasonably trust that I'll get that data back). It's not too expensive: at $0.15/GB-month, my minimal backup will cost about $3.85/month. Throw in transfer costs and churn, and I doubt I'd exceed $6/month. (The initial upload would be $2.56. A full restore would cost me $4.36.) With S3, I would only back up the minimal data; the 130 GB of optional backups would cost an additional $20/month, which would exceed the cost of the full do-it-myself hard drive backups in one year.

The complication to S3 is that it's just a web-based data storage service; you need additional software to make a reasonable backup solution.

Jungle Disk

From everything I've read, Jungle Disk is currently the best software for storing filesystem data on S3. It runs on Windows, Mac OS X, and Linux, and exports your S3 buckets as a WebDAV disk, which you can then mount and treat like an ordinary (unlimited capacity) disk drive. All data is encrypted before it's sent to S3.

I like this approach. Since it looks like a disk, I can use the same rsync setup I would with my own disks, and since the data is encrypted, I don't need to worry too much about it being transported over the Internet and stored on someone else's servers. The main drawback is that it's proprietary software. In addition to my principled preference for open source software over proprietary, there's also the issue that, especially because the data's encrypted, this software would be my only access to my backups. If something went wrong and I couldn't get support from the company (e.g. they went out of business), I'd be out of luck.

The software costs $20. Assuming $5/month on S3, it would take one year for this approach to cost more than the minimal get-my-own-disks approach.
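
The break-even comparisons in this post are simple enough to check mechanically. This sketch uses only the dollar figures quoted in the post; `break_even_months` is a hypothetical helper, and it assumes a flat monthly charge greater than zero.

```python
def cumulative_cost(upfront, monthly, months):
    """Total spent after `months` months of a recurring service."""
    return upfront + monthly * months


def break_even_months(upfront, monthly, target):
    """Smallest whole number of months at which the total reaches `target`.

    Assumes `monthly` is positive; otherwise the loop would never end.
    """
    months = 0
    while cumulative_cost(upfront, monthly, months) < target:
        months += 1
    return months


# Jungle Disk ($20) plus $5/month on S3, versus two small drives ($80).
print(break_even_months(20, 5, 80))    # prints 12

# The optional 130 GB on S3 ($20/month), versus two 1 TB drives ($220).
print(break_even_months(0, 20, 220))   # prints 11
```

At the break-even month the two approaches have cost the same; every month after that, the S3 route is the more expensive one.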

Other S3 software

I haven't seen anything else that will let me back up to S3 and keep versioned backups in a space-efficient manner. Most of the S3 backup software I've seen doesn't do versions, and the few that do don't appear to do it space-efficiently. As always, I have the option of writing my own, but that would take a fair amount of time and effort, and I'd be likely to give up partway through, continuing to leave myself without good backups.


Barring any better suggestions from others, I'm leaning towards the two smallish hard drives. They'd pay for themselves after a year of use, and I get complete control of my data (for better or worse). I like the idea of using S3, but it's more expensive in the long run, and I'm not completely happy with any of the software I've found to use with it.

Wed, 26 Nov 2008

Truffle-Topped Amaretto Brownies

I made this for Thanksgiving this year, and have already been asked for the recipe, even though I haven't had any yet. So here goes.

Brownie Layer

Preheat oven to 325°F. Prepare a 9×13 baking dish. (I line it with a sling of parchment paper and then spray it with Baker's Joy.)

In a saucepan, bring the sugar, butter, and water to a boil. Remove from heat. Add chocolate, amaretto, and vanilla extract, stirring until chocolate is melted. Add eggs, one at a time, stirring until blended.

Whisk together flour, baking soda, and salt. Add to chocolate mixture, stirring well. Stir in almonds and cherries.

Pour mixture into baking dish and bake for 42-48 minutes.

Leave them in the dish to cool.

Truffle Layer

While the brownies are cooling, beat the cream cheese and powdered sugar in a stand mixer on medium speed until the mixture is smooth. Melt chocolate and add with the amaretto to the cream cheese, mixing until well-blended.

Spread over brownies and refrigerate until firm, at least 1 hour.


In a saucepan, melt the chocolate in the whipping cream. Spread evenly over the brownies. Sprinkle with almonds and cherries. Refrigerate until set, at least 1 hour.

Cut into bars and serve.

Wed, 19 Nov 2008

More Delphi Type Hate

I have simple needs. I have a base class with some generic behavior and subclasses with specific information for that generic behavior. More concretely, the subclasses need to provide the generic behavior with an ordered list of things that designate key fields on database tables. The best representation of those "things" in Delphi seems to be members of an enumeration:

  TKeyField = (kfFoo, kfBar, kfBaz, kfQuux);

Since I need the list of fields to be ordered, I need them in an array:

  TKeyFieldArray = array of TKeyField;

The declaration of the base class is pretty simple:

  TBaseClass = class
    function GetKeyFieldList : TKeyFieldArray; virtual; abstract;
    procedure DoSomethingWithKeyFields;
  end;

As is the declaration of the subclass:

  TSubClass = class(TBaseClass)
    function GetKeyFieldList : TKeyFieldArray; override;
  end;

So where's the problem? Where's the hate? The hate is in the implementation. If Delphi had array literals, this would be easy. Something like:

function TSubClass.GetKeyFieldList : TKeyFieldArray;
begin
  Result := [kfBar, kfFoo, kfQuux];
end;

But it doesn't. It has some special magic for array literals if they're the parameter to a function, but not anywhere else. It does, however, have a syntax for array constants. Perhaps this will work:

function TSubClass.GetKeyFieldList : TKeyFieldArray;
const
  keyFieldList : TKeyFieldArray = (kfBar, kfFoo, kfQuux);
begin
  Result := keyFieldList;
end;

But no. That TKeyFieldArray is a dynamic array; Delphi doesn't allocate any space for it, so it can't be a constant value. You have to tell Delphi how big each constant array is, even though you're already telling it how many elements are in the array. So perhaps this is the solution:

function TSubClass.GetKeyFieldList : TKeyFieldArray;
const
  keyFieldList : array[0..2] of TKeyField = (kfBar, kfFoo, kfQuux);
begin
  Result := keyFieldList;
end;

But no. Because of Delphi's approach to static typing, those are actually different types, and are therefore not assignment-compatible. (See previous hates on this subject.) No, here is the code that Delphi makes me type for what should be a one-line function implementation:

function TSubClass.GetKeyFieldList : TKeyFieldArray;
begin
  SetLength(Result, 3);
  Result[0] := kfBar;
  Result[1] := kfFoo;
  Result[2] := kfQuux;
end;

And just earlier this morning I was pleased because I read that Delphi 2007 (to which I'll soon be upgrading from Delphi 5) has for..in loops, so I can finally have foreach. (Can't get the generics and anonymous functions in Delphi 2009, because we need .NET and that's not yet available for Delphi 2009.) Oh, Delphi. The one hand giveth, and the entire rest of the stupid, anemic, pox-ridden language taketh away.

Mon, 17 Nov 2008

Take My Stuff!

I'm divesting myself of a lot of computer-related things that are taking up too much space in my apartment. Take a look at my stuff and let me know if there's anything you'd like to take off of my hands. Anything left after a couple of weeks will go to the electronics recycling center.

Tue, 14 Oct 2008

Change of Name

On September 27th, 2008, I got married. As a consequence of my marriage, I gave up my last name and took my wife's, going from Phillip Gregory to Phillip Gold. I've been asked about my decision a lot; this is my explanation.

I feel that the prevailing societal standard--the assumption that the woman must go through all the work to change her name and give up the identity she's had since birth--is unfair and an example of gender inequality. Rather than simply make that assumption, Rebecca and I discussed our names a lot before the wedding, starting with what we each wanted out of our married names, and working from there to a mutual decision.

I wanted us to both have the same name, as symbolic of our marriage. I also didn't want a hyphenated last name, because I feel that those are cumbersome and unwieldy. Rebecca also wanted to have a Jewish last name, to honor her cultural heritage. Finally, I was inclined to have a name that started with "G" so my (and her) initials would stay the same.

Our first thought was that we would find a new name that met all of our criteria and both change to that name. Unfortunately, there are only really two common Jewish surnames that start with "G": Gold and Green (plus all the variations thereof), and we couldn't find a variation on Green that we both liked. We started looking at other Jewish surnames, and I realized that I really did want to keep my initials, if only because I have the username "phil_g" on a lot of sites, not least of which is the email address I've had for over a decade now.

So I offered to just take Rebecca's name because that approach accomplished everything we wanted. She was a little hesitant, feeling that doing so would require more of me than her, but we eventually agreed that it seemed the best approach given our requirements.

Postscript: Some people suggested changing my middle name to my old last name, as some married women do. I opted against that approach, because my middle name is the same as my dad's. I would never want to give my child the same first name as myself, but I like the subtle continuity of shared middle names.

Fri, 11 Jul 2008

DVD Video to Matroska Video, Losslessly

I recently had the desire to rip some DVDs so I could watch them on my computer without swapping discs. I figured I could just pull everything from the DVD into Matroska files, since Matroska supports everything that DVDs do. When I went looking on the Internet, I found few resources for moving from DVD to MKV, and everything that did talk about it actually reencoded the DVD video to get it into its final destination. Since Matroska can contain all of the codecs native to DVDs, I wanted to transfer everything losslessly. This is how I did it. (Note that I'm using the Linux command line; I prefer Linux to Windows, and the command line to X.)

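The programs I used, all of which appear in the commands below, are dvdbackup, lsdvd, tccat, tcextract, dvdxchap, mplayer, subtitle2vobsub, and mkvmerge.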

Some Background

I don't know all the details of how data is stored on DVDs, but here's a rough overview. The video on DVDs is encoded in either MPEG-2 or MPEG-1 with a variable bit rate. The audio can be in raw PCM, DTS, MP2, or AC3. Most DVDs use AC3. Not all DVD players support DTS. Subtitles are stored as bitmaps with associated timecodes governing when to show them on screen.

In a DVD, the basic unit of video is a title. Each title consists of one or more video streams, zero or more audio streams, zero or more subtitle streams, and a list of timepoints to mark chapter boundaries. The titles on a DVD are grouped together into titlesets. The grouping may be arbitrarily-chosen by the DVD manufacturer, but all titles in a given titleset must have the same video encoding parameters (codec, dimensions, framerate, etc.). All of the data for each titleset is concatenated together into VOB format and then split into 1GB chunks. The net result is that there's no one-to-one correspondence between files on the DVD and individual titles. Worse, titles are actually implemented as start and end indices into the VOB stream, so it's entirely possible for titles to overlap each other. This often shows up in TV show DVDs with a "play all" option: all of the episodes are in a single titleset and the "play all" menu option goes to a title that spans the entire titleset, while individual episodes are titles that only span the relevant part of the titleset.
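
Thinking of titles as index ranges makes the "play all" overlap easy to see. This is a toy model only: the `Title` class, the `overlapping` helper, and the index values are made up for illustration and don't reflect the units a real DVD uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    number: int
    start: int  # first index into the titleset's VOB stream (inclusive)
    end: int    # index just past the title's last unit (exclusive)


def overlapping(titles):
    """Return pairs of title numbers whose spans share part of the stream."""
    pairs = []
    for i, a in enumerate(titles):
        for b in titles[i + 1:]:
            if a.start < b.end and b.start < a.end:
                pairs.append((a.number, b.number))
    return pairs


# A hypothetical TV disc: title 1 is "play all", titles 2-4 are the episodes.
titleset = [
    Title(1, 0, 300),
    Title(2, 0, 100),
    Title(3, 100, 200),
    Title(4, 200, 300),
]
print(overlapping(titleset))  # prints [(1, 2), (1, 3), (1, 4)]
```

The episodes never overlap each other, but each one overlaps the "play all" title, so ripping every title would store each episode twice.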

If a title has more than one video stream, one will be the primary stream while the others represent alternate angles. Few DVDs have multiple angles, so I'm not sure how the data for those works; all of the DVDs I've seen just have a single video stream for each title.

Also note that not all of the titles on a DVD are the feature content. Almost every bit of video, including DVD extras like bloopers and "making of" videos, is stored as a title. The one exception is the DVD menus. Those are also stored on the DVD as VOBs, but they're indexed differently, so they don't show up as titles. Be aware that DVD easter eggs, including some apparently-longer videos, are often implemented as menus, so they won't show up as titles.

Any of a DVD's titles may be encrypted with CSS. Either the DVD player or the DVD drive must have a licensed CSS decryption key in order to read the encrypted data. Fortunately, CSS is somewhat weak, and most Linux programs for accessing DVDs use libdvdcss to bypass the encryption.

Ripping the DVD

Ripping the DVD isn't strictly necessary, but it helps to have all of the data on your hard drive for processing. Even if you don't copy the videos to your hard drive, you'll have to mount the DVD and use its IFO files; I'll get to that later.

The easiest way to rip the DVD is with dvdbackup. It creates a directory for the DVD and then puts a VIDEO_TS subdirectory in the DVD directory. The VIDEO_TS directory contains all of the files in the DVD's VIDEO_TS directory. (Or, at least, it will if you use the -M option; other options give more restricted results.)

dest_dir=<destination directory>
dvd_name=<DVD name>
dvd_device=<DVD device, e.g. /dev/dvd>
dvdbackup -M -i $dvd_device -o $dest_dir -n $dvd_name

In theory, you could also mount the DVD and just copy all of the files over, but that has not worked well for me in the past, partly because of CSS problems, but also partly because my drive is a little wonky.

You can also just take an image of the DVD with dd. You'll need to disable the CSS beforehand. I've found that just running xine on the DVD is sufficient.

dest_dir=<destination directory>
dvd_name=<DVD name>
dvd_device=<DVD device, e.g. /dev/dvd>
xine dvd://
dd if=${dvd_device} bs=2048 conv=sync,noerror of=${dest_dir}/${dvd_name}.iso

If you have pv installed, you can get a fancy progress bar.

dest_dir=<destination directory>
dvd_name=<DVD name>
dvd_device=<DVD device, e.g. /dev/dvd>
xine dvd://
dd if=${dvd_device} bs=2048 conv=sync,noerror |
  pv -s $(fdisk -l $dvd_device |
          perl -nle 'm{^Disk '${dvd_device}': \d+ MB, (\d+) bytes$} and print $1') \
  >${dest_dir}/${dvd_name}.iso

Get Disc Info

Whether you've ripped the DVD to disk or not, you need to see what's on it. Change into your working directory and run lsdvd. (NB: From here on out, unless otherwise noted, all commands that reference a DVD will work equally well with a device (e.g. /dev/dvd), a disc image (like the one created with dd), or a directory containing a VIDEO_TS directory structure.)

dvd=<DVD device, image, or directory>
lsdvd -a -n -c -s -v $dvd > contents

Rip Each Title

The first order of business is to get the title data off of the DVD. tccat will pull just the given title's stream out of the DVD. (Note that the resulting file has the possibility of exceeding 7GB in size; make sure your filesystem can handle files that large.)

title=<title number, e.g. 01>
dvd=<DVD device, image, or directory>
tccat -i $dvd -t dvd -T ${title},-1 >${title}.vob

The information about the title's chapters isn't in the VOB, so you'll have to extract that separately with dvdxchap. In my experience, dvdxchap never gets useful information for the chapter names (perhaps the DVD only contains the timepoints with no names associated), so you may want to edit the resulting file to put in more meaningful names. (Note that mplayer will output chapter information if you use its -identify option, but dvdxchap is more precise in its timing and also generates the data in the format that mkvmerge wants.)

dvdxchap -t $title $dvd > ${title}.chapters

I've seen DVDs where the TOC info as reported by lsdvd doesn't match the actual streams in the titles, so it's good to check the track directly. Ideally, tcprobe would give all the information about the streams, but while it gives good information about audio and video streams, it doesn't give all the details we'll need about subtitle streams. Thus, we need to use mplayer. mplayer gives audio stream ids in decimal, not hex, so the first audio stream will show as 128, not 0x80. It numbers the subtitle streams from zero, though, so you have to add 0x20 to the numbers it gives to get the actual subtitle stream ids.

mplayer -dvd-device $dvd -vo null -ao null -frames 0 -v dvd://${title} 2>&1 | egrep '[as]id' > ${title}.streams

In an ideal world, mkvmerge would be able to operate directly on the VOB, but when I tried that, it had problems demuxing the data and it died halfway through. So I'll use tcextract to pull out the individual components. Video first.

tcextract -i ${title}.vob -t vob -x mpeg2 >${title}.video.m2v

Next up are the audio tracks. The VOB may contain more than one audio track. They should be labeled as to their language, but check mplayer's info, not lsdvd's. mplayer's info will also tell what format the audio is in. tcextract wants the audio tracks numbered from zero, but mplayer reports their actual track ids, which usually start at 128 and go up from there. The lowest-numbered track is track 0 to tcextract, and so on.

lang=<language code>
track=<source audio track: 0, 1, 2, etc.>
format=<extension for audio format; e.g. ac3, mp2>
tcextract -i ${title}.vob -t vob -x $format -a $track >${title}.audio-${lang}.${format}

The VOB also contains subtitles, although most programs that query it won't see them. Unlike when extracting audio, tcextract requires that you use the absolute track number, but mplayer reports a relative number. You will need to add 0x20 (32 in decimal) to the value that mplayer reports for the subtitle tracks. Some of the information for subtitles is stored in .IFO files on the DVD. Each titleset has its own .IFO file; check the contents file to see which titleset contains the title and use that titleset's .IFO file. It will be in the VIDEO_TS directory, named VTS_<titleset number>_0.IFO.

Matroska supports several subtitle formats, but VobSub is probably the easiest to use, because it's a series of bitmaps, just like the DVD subtitles. If you're not happy with VobSub, you'll need to OCR each image to get its text; there are instructions for doing so elsewhere on the Internet.

lang=<language code>
stream_id=<id of the subtitle stream: 0x20, 0x21, 32, 33, etc.>
ifo=<IFO file; e.g. /path/to/VIDEO_TS/VTS_nn_0.IFO>
tcextract -i ${title}.vob -t vob -x ps1 -a $stream_id >${title}.subs-${lang}.raw
subtitle2vobsub -p ${title}.subs-${lang}.raw -i $ifo -o ${title}.subs-${lang}
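
The two numbering conventions are easy to mix up, so here is a small sketch of the conversions described above. The function names are hypothetical; the constants come straight from the text (audio ids starting at 128, subtitle ids offset by 0x20).

```python
SUBTITLE_BASE = 0x20  # added to mplayer's zero-based subtitle numbers


def tcextract_audio_track(mplayer_aids, aid):
    """Map an audio id reported by mplayer to tcextract's track number.

    tcextract counts tracks from zero in ascending id order, so the
    lowest id (usually 128) is track 0, the next is track 1, and so on.
    """
    return sorted(mplayer_aids).index(aid)


def tcextract_subtitle_id(mplayer_sid):
    """Map mplayer's relative subtitle number to the absolute stream id."""
    return SUBTITLE_BASE + mplayer_sid


print(tcextract_audio_track([128, 129, 130], 129))  # prints 1
print(hex(tcextract_subtitle_id(1)))                # prints 0x21
```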

Finally, it's time to bring everything together with mkvmerge. When I use <title>, I mean the actual textual title for the video, like "Bob's House of Horror 2" or whatever. ${title} still refers to the title number on the DVD.

mkvmerge -o <final filename> \
         --title <title> \
         --chapters ${title}.chapters \
         ${title}.video.m2v \
         <audio clauses> \
         <subtitle clauses>

For each audio file, you'll need a clause giving the file and its language. The first file you list on the command line will be the default audio, unless you use mkvmerge's --default-track option to change it.

--language 0:${lang} ${title}.audio-${lang}.ac3

Likewise, you'll need a clause for each subtitle file. Since I generally don't want any subtitles displayed by default, I set things so that there isn't a default subtitle track.

--language 0:${lang} --default-track 0:0 ${title}.subs-${lang}.idx
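
With several audio and subtitle tracks, the command line gets long enough that it can help to assemble it programmatically. The sketch below only builds the argument list, using the same options and file-naming scheme as the commands above; `mkvmerge_args` and its parameters are hypothetical names, and the example values are placeholders.

```python
def mkvmerge_args(output, title_text, title_number, audio, subtitles):
    """Build an mkvmerge argument list for one DVD title.

    `audio` is a list of (language, extension) pairs in the order the
    tracks should appear; `subtitles` is a list of language codes.
    """
    args = [
        "mkvmerge", "-o", output,
        "--title", title_text,
        "--chapters", f"{title_number}.chapters",
        f"{title_number}.video.m2v",
    ]
    for lang, ext in audio:
        args += ["--language", f"0:{lang}",
                 f"{title_number}.audio-{lang}.{ext}"]
    for lang in subtitles:
        # --default-track 0:0 keeps the subtitles from being shown by default.
        args += ["--language", f"0:{lang}", "--default-track", "0:0",
                 f"{title_number}.subs-{lang}.idx"]
    return args


args = mkvmerge_args("movie.mkv", "Bob's House of Horror 2", "01",
                     audio=[("en", "ac3"), ("fr", "ac3")],
                     subtitles=["en"])
```

The resulting list can be passed to `subprocess.run(args, check=True)`, which avoids any shell quoting problems with titles that contain spaces or apostrophes.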

And that should do it. After a fair bit of disk-churning, you should have a Matroska file containing all of the elements from the original DVD title. You can now delete all of your intermediate files and just keep the MKV on your computer and the DVD in its box.

Thu, 29 May 2008

Auto-locking My Computer When I Walk Away

The other day, while I was waiting for several GB to transfer over the network at work, I finally got around to setting up something that's been dancing at the back of my mind for a while: computer-based proximity detection using Bluetooth.

I have a Treo 650. It has Bluetooth. I also have a USB Bluetooth Adapter. I originally planned to carry the Bluetooth adapter around and hook it up to different computers whenever I wanted to talk to the Treo, but I've only been using it at work, so I've been leaving the adapter connected to my Linux computer at work. The thought occurred to me that I could use the Bluetooth adapter to see whether my phone was nearby and do things based on that information. At least to start, I decided to have the computer lock itself when I wasn't around.

I have the BlueZ Bluetooth stack installed. (On Debian, that's the bluez-utils package.) It includes an l2ping program, but that establishes a full Bluetooth connection with the device, which makes my Treo turn on the screen, play a little sound, and show a pop-up dialog. That's a little intrusive for something that I want checked several times a minute. Some people use hcitool rssi to find out the strength of the phone's (or other device's) Bluetooth signal. That also requires a full Bluetooth connection. I ended up using hcitool name, which returns the name of the device if it's found and nothing if it's not. More importantly, it doesn't cause the Treo to do anything but silently send its response, and it works even if the Treo screen is off.

So I now have a stupid little shell script that looks like this:



# PHONE_ADDR, PHONE_NAME, and WAIT_TIME must be set before this loop runs.
while true; do
  if [ "$(hcitool name "$PHONE_ADDR")" != "$PHONE_NAME" ]; then
    xscreensaver-command -lock
  fi
  sleep "$WAIT_TIME"
done
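
The same loop can be sketched in Python, which makes it easier to test the logic without a phone nearby. This is an illustration rather than the script actually in use: it shells out to the same two commands, while the `rounds` parameter and the injectable `lookup`, `lock`, and `sleep` arguments are additions that exist only for testing.

```python
import subprocess
import time


def phone_name(addr):
    """Ask hcitool for the device's name; empty string if nothing answers."""
    result = subprocess.run(["hcitool", "name", addr],
                            capture_output=True, text=True)
    return result.stdout.strip()


def lock_screen():
    """Lock the display through xscreensaver."""
    subprocess.run(["xscreensaver-command", "-lock"])


def watch(addr, expected_name, wait_seconds,
          lookup=phone_name, lock=lock_screen, sleep=time.sleep, rounds=None):
    """Lock whenever the phone does not answer with its expected name.

    `rounds` limits the number of checks; None means loop forever.
    """
    checked = 0
    while rounds is None or checked < rounds:
        if lookup(addr) != expected_name:
            lock()
        sleep(wait_seconds)
        checked += 1
```

Calling `watch` with the phone's Bluetooth address, its name, and a wait time in seconds behaves like the shell loop above.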

There are programs for Windows that do similar things. Possibly one of the simplest is Blue Lock, which is also open-source (and written in Delphi). I'm probably just going to write a simple Windows program to listen on the network for a message from my Linux computer to tell it to lock the screen.

Fri, 02 May 2008

New Site Hosting

In the interests of better site availability and less Comcast AUP-breaking, I've finally gotten around to outsourcing my website hosting. I'm currently at NearlyFreeSpeech.NET, a webhost committed to the twin goals of free speech and affordable web hosting.

How free is their speech? Read their Abuse page:

"A NearlyFreeSpeech.NET member site is defaming me or otherwise injuring me civilly."

Please forward a copy of your legal finding from a court of competent jurisdiction to our contact address. If you have not yet obtained such a finding, a preliminary injunction or court order is also sufficient.

If you are not able to obtain the above, you will need to work directly with the site operator to resolve your differences. We will have to fall back on our members' contractual assertion that the content they upload is legitimate and therefore we will not be able to get involved.

How affordable is their hosting? You pay only for the bandwidth and storage that you actually use: $1 per gigabyte of bandwidth and $0.01 per megabyte-month of storage. (Plus the bandwidth cost goes down the more you use.)
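
As a rough illustration of that pricing, here is the arithmetic for a small site. The function name and the example usage figures are hypothetical, and the sketch assumes the flat rates quoted above, ignoring the discount for higher bandwidth use.

```python
def monthly_hosting_cost(bandwidth_gb, storage_mb,
                         dollars_per_gb=1.00, dollars_per_mb_month=0.01):
    """Estimate one month's bill from bandwidth used and storage held."""
    return bandwidth_gb * dollars_per_gb + storage_mb * dollars_per_mb_month


# A hypothetical site: 2 GB of traffic and 100 MB stored comes to about $3.
cost = monthly_hosting_cost(2, 100)
```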

They support a variety of CGI scripting languages, including C, PHP, Perl, Python, and Ruby. Oh, but also Fortran, Tcl, Lisp, Scheme, OCaml, and Haskell.

We'll see how it goes, but I think I'll like it here.

Tue, 19 Feb 2008

Java Reflection

I'm only a few weeks into my Java class and I'm already annoyed at the language. I'm completely willing to ascribe this to newbieness, where I'm just not working with what the language gives me, but the metaobject stuff in Java seems a bit painful.

I'm working on a project for the class where I have to accept input in several different units of heat (BTUs, calories, and joules) and output the measurement in joules. I've made an abstract base class for the various units and created concrete classes for each unit the program has to read. I'd like to just have a list of available classes and have my program enumerate them automatically (rather than hardcoding the behavior for each), but the way I would normally think about doing this is painful in Java.

In Delphi, I'd do something like this:

type
  THeatUnits = class
    constructor Create(Value : Real); virtual;
    class function GetUnitsName : String; virtual; abstract;
    function ConvertToJoules : Real; virtual; abstract;
  end;

  THeatUnitsClass = class of THeatUnits;

  TBTUs = class(THeatUnits);
  TCalories = class(THeatUnits);
  TJoules = class(THeatUnits);

const
  availableUnits : array [1..3] of THeatUnitsClass
                 = (TBTUs, TCalories, TJoules);

procedure DoStuff(Index : Integer; Value : Real);
var
  Units : THeatUnits;
begin
  Units := availableUnits[Index].Create(Value);
  // Now do things polymorphically with Units.
end;

Delphi's class types often seem like a quick hack to me, but they beat what Java does in the same situation. For one thing, there doesn't seem to really be a class type in the same way that Delphi does it. There are instead objects of type Class. As far as I can tell, the best way to get one of those is to call Class.forName("ClassName"). But the painful part is that there's no specialization of class types at compile time, so the Java code equivalent to my Units := availableUnits[Index].Create(Value); above would be something like this:

import java.lang.reflect.Constructor;

static final String[] AVAILABLE_UNITS = new String[] {"BTUs", "Calories", "Joules"};

public void doStuff(int index, double value) throws Exception {
  Class unitClass = Class.forName(AVAILABLE_UNITS[index]);
  Constructor unitConstructor = unitClass.getConstructor(new Class[] {double.class});
  HeatUnits units = (HeatUnits)unitConstructor.newInstance(new Object[] {new Double(value)});
  // now do things polymorphically with units.
}

Common Lisp, of course, would be more succinct than Delphi, because everything is first-class; I would probably do something like this:

(defclass heat-units () ((value :initarg :value)))

(defgeneric get-units-name (unit-class))
(defgeneric convert-to-joules (unit-class))

(defclass btus (heat-units) ())
(defclass calories (heat-units) ())
(defclass joules (heat-units) ())

(defparameter +available-units+ #(btus calories joules))

(defun do-stuff (index value)
  (let ((units (make-instance (svref +available-units+ index)
                              :value value)))
    ;; Now do things polymorphically with units.
    units))


As I said, though, I think this is just an artifact of not thinking in Java to the appropriate degree. Most of Java's reflection stuff seems set up to be useful at run-time while Delphi's run-time reflection is much uglier than Java's. And I think I'm going to approach my Java problem from a different direction, with an enum and a factory method. I was still struck by how annoyingly wordy (and not entirely typesafe) my first approach turned out to be in Java.
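
The enum-plus-factory idea isn't specific to Java. Here is a rough rendering of it in Python, offered as a sketch of the shape rather than the solution used for the class. The `HeatUnit` name, the labels, and the `from_label` factory are hypothetical, and the conversion factors assume the thermochemical calorie (4.184 J) and a BTU of roughly 1055.06 J.

```python
from enum import Enum


class HeatUnit(Enum):
    # Each member carries a display label and its size in joules.
    BTU = ("BTUs", 1055.06)
    CALORIE = ("calories", 4.184)
    JOULE = ("joules", 1.0)

    def __init__(self, label, joules_per_unit):
        self.label = label
        self.joules_per_unit = joules_per_unit

    def to_joules(self, value):
        """Convert a measurement in this unit to joules."""
        return value * self.joules_per_unit

    @classmethod
    def from_label(cls, label):
        """Factory method: look a unit up by its label, ignoring case."""
        for unit in cls:
            if unit.label.lower() == label.lower():
                return unit
        raise ValueError(f"unknown heat unit: {label!r}")


# The available units can be enumerated without hardcoding each one.
labels = [unit.label for unit in HeatUnit]
```

Adding a unit means adding one line to the enum; the enumeration and the factory pick it up automatically, and no class names are passed around as strings.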

Phil! Gold