Fri, 11 Jul 2008

DVD Video to Matroska Video, Losslessly

I recently had the desire to rip some DVDs so I could watch them on my computer without swapping discs. I figured I could just pull everything from the DVD into Matroska files, since Matroska supports everything that DVDs do. When I went looking on the Internet, I found few resources for moving from DVD to MKV, and everything that did talk about it actually reencoded the DVD video to get it into its final destination. Since Matroska can contain all of the codecs native to DVDs, I wanted to transfer everything losslessly. This is how I did it. (Note that I'm using the Linux command line; I prefer Linux to Windows, and the command line to X.)

The programs I used are as follows:

Some Background

I don't know all the details of how data is stored on DVDs, but here's a rough overview. The video on DVDs is encoded in either MPEG-2 or MPEG-1 with a variable bit rate. The audio can be in raw PCM, DTS, MP2, or AC3. Most DVDs use AC3. Not all DVD players support DTS. Subtitles are stored as bitmaps with associated timecodes governing when to show them on screen.

In a DVD, the basic unit of video is a title. Each title consists of one or more video streams, zero or more audio streams, zero or more subtitle streams, and a list of timepoints to mark chapter boundaries. The titles on a DVD are grouped together into titlesets. All of the data for each titleset is concatenated together into VOB format and then split into 1GB chunks. The net result is that there's no one-to-one correspondence between files on the DVD and individual titles.

If a title has more than one video stream, one will be the primary stream while the others represent alternate angles. Few DVDs have multiple angles, so I'm not sure how the data for those works; all of the DVDs I've seen just have a single video stream for each title.

Also note that not all of the titles on a DVD are the feature content. Practically everything displayed by a DVD, including the splash image, copyright warning, ads and trailers, and even the menus is a DVD title.

Any of a DVD's titles may be encrypted with CSS. Either the DVD player or the DVD drive must have a licensed CSS decryption key in order to read the encrypted data. Fortunately, CSS is somewhat weak, and most Linux programs for accessing DVDs use libdvdcss to bypass the encryption.

Ripping the DVD

Ripping the DVD isn't strictly necessary, but it helps to have all of the data on your hard drive for processing. Even if you don't copy the videos to your hard drive, you'll have to mount the DVD and use its IFO files; I'll get to that later.

The easiest way to rip the DVD is with dvdbackup. It creates a directory for the DVD and then puts a VIDEO_TS subdirectory in the DVD directory. The VIDEO_TS directory contains all of the files in the DVD's VIDEO_TS directory. (Or, at least, it will if you use the -M option; other options give more restricted results.)

dest_dir=<destination directory>
dvd_name=<DVD name>
dvd_device=<DVD device, e.g. /dev/dvd>
dvdbackup -M -i $dvd_device -o $dest_dir -n $dvd_name

In theory, you could also mount the DVD and just copy all of the files over, but that has not worked well for me in the past, partly because of CSS problems, but also partly because my drive is a little wonky.

You can also just take an image of the DVD with dd. You'll need to disable the CSS beforehand. I've found that just running xine on the DVD is sufficient.

dest_dir=<destination directory>
dvd_name=<DVD name>
dvd_device=<DVD device, e.g. /dev/dvd>
xine dvd://
dd if=${dvd_device} bs=2048 conv=sync,noerror of=${dest_dir}/${dvd_name}.iso

If you have pv installed, you can get a fancy progress bar.

dest_dir=<destination directory>
dvd_name=<DVD name>
dvd_device=<DVD device, e.g. /dev/dvd>
xine dvd://
dd if=${dvd_device} bs=2048 conv=sync,noerror |
  pv -s $(fdisk -l $dvd_device |
          perl -nle 'm{^Disk '${dvd_device}': \d+ MB, (\d+) bytes$} and print $1') \
  >${dest_dir}/${dvd_name}.iso

Get Disc Info

Whether you've ripped the DVD to disk or not, you need to see what's on it. Change into your working directory and run lsdvd. (NB: From here on out, unless otherwise noted, all commands that reference a DVD will work equally well with a device (e.g. /dev/dvd), a disc image (like the one created with dd), or a directory containing a VIDEO_TS directory structure.)

dvd=<DVD device, image, or directory>
lsdvd -a -n -c -s -v $dvd > contents

Rip Each Title

The first order of business is to get the title data off of the DVD. mplayer's -dumpstream option will pull just the given title's stream out of the DVD. (Note that the resulting file has the possibility of exceeding 7GB in size; make sure your filesystem can handle files that large.)

title=<title number, e.g. 01>
dvd=<DVD device, image, or directory>
mplayer dvd://${title} -dvd-device $dvd -dumpstream -dumpfile ${title}.vob

The information about the title's chapters isn't in the VOB, so you'll have to extract that separately with dvdxchap. In my experience, dvdxchap never gets useful information for the chapter names (perhaps the DVD only contains the timepoints with no names associated), so you may want to edit the resulting file to put in more meaningful names. (Note that mplayer will output chapter information if you use its -identify option, but dvdxchap is more precise in its timing and also generates the data in the format that mkvmerge wants.)

dvdxchap -t $title $dvd > ${title}.chapters

I've seen DVDs where the TOC info as reported by lsdvd doesn't match the actual streams in the titles, so it's good to check the track directly with mplayer. mplayer gives audio stream ids in decimal, not hex, so the first audio stream will show as 128, not 0x80. It numbers the subtitle streams from zero, though, so you have to add 0x20 to the numbers it gives to get the actual subtitle stream ids.

mplayer -dvd-device $dvd -vo null -ao null -frames 0 -v dvd://${title} 2>&1 | egrep '[as]id' > ${title}.streams

In an ideal world, mkvmerge would be able to operate directly on the VOB, but when I tried that, it had problems demuxing the data and it died halfway through. So I'll use mplayer to pull out the individual components. Video first.

mplayer ${title}.vob -dumpvideo -dumpfile ${title}.video.m2v

Next up are the audio tracks. The VOB may contain more than one audio track. They should be labeled as to to their language, but check mplayer's info, not lsdvd's. mplayer's info will also tell what format the audio is in. mplayer will accept the audio stream ids in either decimal or hex.

lang=<language code>
track=<source audio track: 128, 129, 0x80, 0x81, etc.>
format=<extension for audio format; e.g. ac3, mp2>
mplayer ${title}.vob -aid $audio_track -dumpaudio -dumpfile ${title}.audio-${lang}.${format}

The VOB also contains subtitles, although most programs that query it won't see them. I tried using mplayer's -dumpsub command to get the subtitles, but it didn't work properly for me. So we'll just use the info from mplayer to get the language and stream id for each stream, but then use transcode's tcextract to actually pull the data out. Remember to add 0x20 to mplayer's stream ids. Some of the information for subtitles is stored in .IFO files on the DVD. Each titleset has its own .IFO file; check the contents file to see what titleset contains the track and use that titleset's .IFO file. It will be in the VIDEO_TS directory, named VTS_<titleset number>_0.IFO.

Matroska supports several subtitle formats, but VobSub is probably the easiest to use, because it's a series of bitmaps, just like the DVD subtitles. If you're not happy with VobSub, you'll need to OCR each image to get its text; there are instructions for doing so elsewhere on the Internet.

lang=<language code>
stream_id=<id of the subtitle stream>
ifo=<IFO file; e.g. /path/to/VIDEO_TS/VTS_nn_0.IFO>
<${title}.vob tcextract -t vob -x ps1 -a $stream_id >${title}.subs-${lang}.raw
subtitle2vobsub -p ${title}.subs-${lang}.raw -i $ifo -o ${title}.subs-${lang}

Finally, it's time to bring everything together with mkvmerge. When I use <title>, I mean the actual textual title for the video, like "Bob's House of Horror 2" or whatever. ${title} still refers to the title number on the DVD.

mkvmerge -o <final filename> \
         --title <title> \
         --chapters ${title}.chapters \
         ${title}.video.mpg \
         <audio clauses> \
         <subtitle clauses>

For each audio file, you'll need a clause giving the file and its language. The first file you list on the command line will be the default audio, unless you use mkvmerge's --default-track option to change it.

--language 0:${lang} ${title}.audio-${lang}.ac3

Likewise, you'll need a clause for each subtitle file. Since I generally don't want any subtitles displayed by default, I set things so that there isn't a default subtitle track.

--language 0:${lang} --default-track 0:0 ${title}.subs-${lang}.idx

And that should do it. After a fair bit of disk-churning, you should have a Matroska file containing all of the elements from the original DVD title. You can now delete all of your intermediate files and just keep the MKV on your computer and the DVD in its box.

Thu, 29 May 2008

Auto-locking My Computer When I Walk Away

The other day, while I was wating for several GB to transfer over the network at work, I finally got around to setting something that's been dancing at the back of my mind for a while: computer-based proximity detection using Bluetooth.

I have a Treo 650. It has Bluetooth. I also have a USB Bluetooth Adapter. I originally planned to carry the bluetooth adapter around and hook it up to different computers whenever I wanted to talk to the Treo, but I've only been using it at work, so I've been leaving the adapter connected to my Linux computer at work. The thought occurred to me that I could use the Bluetooth adapter to see whether my phone was nearby and do things based on that information. At least to start, I decided to have the computer lock itself when I wasn't around.

I have the BlueZ Bluetooth stack installed. (On Debian, that's the bluez-utils package.) They include a l2ping program, but that establishes a full Bluetooth connection with the device, which makes my Treo turn on the screen, play a little sound, and show a pop-up dialog. That's a little intrusive for something that I want checked several times a minute. Some people use hcitool rssi to find out the strength of the phone's (or other device's) Bluetooth signal. That also requires a full Bluetooth connection. I ended up using hcitool name, which returns the name of the device if it's found and nothing if it's not. More importantly, it doesn't cause the Treo to do anything but silently send its response, and it works even if the Treo screen is off.

So I now have a stupid little shell script that looks like this:

#!/bin/sh

PHONE_ADDR=01:23:45:67:89:AB
PHONE_NAME="glamdring"
WAIT_TIME=15

while true; do
  if [ "$(hcitool name $PHONE_ADDR)" \!= "$PHONE_NAME" ]; then
    xscreensaver-command -lock
  fi
  sleep $WAIT_TIME
done

There are programs for Windows that do similar things. Possibly one of the simplest is Blue Lock, which is also open-source (and written in Delphi). I'm probably just going to write a simple Windows program to listen on the network for a message from my Linux computer to tell it to lock the screen.

Tue, 19 Feb 2008

Java Reflection

I'm only a few weeks into my Java class and I'm already annoyed at the language. I'm completely willing to ascribe this to newbieness, where I'm just not working with what the language gives me, but the metaobject stuff in Java seems a bit painful.

I'm working on a project for the class where I have to accept input in several different units of heat (BTUs, calories, and joules) and output the measurement in joules. I've made an abstract base class for the various units and created concrete classes for each unit the program has to read. I'd like to just have a list of available classes and have my program enumerate them automatically (rather than hardcoding the behavior for each), but the way I would normally think about doing this is painful in Java.

In Delphi, I'd do something like this:

type
  THeatUnits = class
   public
    constructor Create(Value : Real); virtual;
    class function GetUnitsName : String; virtual; abstract;
    function ConvertToJoules : Real; virtual; abstract;
  end;

  THeatUnitsClass = class of THeatUnits;

  TBTUs = class(THeatUnits);
  TCalories = class(THeatUnits);
  TJoules = class(THeatUnits);

const
  availableUnits : array [1..3] of THeatUnitsClass
                 = (TBTUs, TCalories, TJoules);

...

procedure DoStuff(Index : Integer; Value : Real);
  var
    Units : THeatUnits;
begin
  Units := availableUnits[Index].Create(Value);
  // Now do things polymorphically with Units.
end;

Delphi's class types often seem like a quick hack to me, but they beat what Java does in the same situation. For one thing, there doesn't seem to really be a class type in the same way that Delphi does it. There are instead objects of type Class. As far as I can tell, the best way to get one of those is to call Class.forName("ClassName"). But the painful part is that there's no specialization of class types at compile time, so the Java code equivalent to my Units := availableUnits[Index].Create(Value); above would be something like this:

static final AVAILABLE_UNITS = new String[] ("BTUs", "Calories", "Joules");

public void doStuff(int index, double value) {
  Class unitClass = Class.forName(AVAILABLE_UNITS[index]);
  Constructor unitConstructor = unitClass.getConstructor(new Class[] (double.class));
  HeatUnits units = (HeatUnits)unitConstructor.newInstance(new Object[] (new Double(value)));
  // now do things polymorphically with units.
}

(Common Lisp, of course, would be more succinct than Delphi, because everything is first-class; I would probably do something like this:

(defclass heat-units () ())

(defgeneric get-units-name (unit-class))
(defgeneric convert-to-joules (unit-class))

(defclass btus (heat-units) ())
(defclass calories (heat-units) ())
(defclass joules (heat-units) ())

(defparameter +available-units+ #(btus calories joules))

(defun do-stuff (index value)
  (let ((units (make-instance (svref +available-units+ index)
                              :value value)))
    ;; Now do things polymorphically with units.
    ))

)

As I said, though, I think this is just an artifact of not thinking in Java to the appropriate degree. Most of Java's reflection stuff seems set up to be useful at run-time while Delphi's run-time reflection is much uglier than Java's. And I think I'm going to approach my Java problem from a different direction, with an enum and a factory method. I was still struck by how annoyingly wordy (and not entirely typesafe) my first approach turned out to be in Java.

Fri, 31 Aug 2007

Finding Duplicate Elements in an Array

I came across an interesting programming puzzle today, and I'd like to share a couple of variants on it.

To start with, let's say we have an array A that contains some numbers. For simplicity, let's say that the array is 1-based. How can we efficiently find out if any of those numbers are duplicates?

The easiest approach is to make a hash table to keep track of the numbers we've seen so far. Something like this:

Given an array A of length n,

let h ← (new hash table)
for 1 <= i <= n:
  if A[i] is present in h: return A[i]
  set h[A[i]] ← True
return <no duplicates>

Now, suppose that the array is of length n and only contains positive integers less than n. We can be sure (by the pigeonhole principle) that there is at least one duplicate. The bounds on the values mean that we can be a little more efficient and store the already-seen numbers in an array rather than a hash table:

let s ← array [1..n]
initialize s to all zeroes
for 1 <= i <= n:
  if s[A[i]] > 0: return A[i]
  set s[A[i] ← 1

Now, suppose that we have to do this in constant space; no more creating hash tables and arrays on the fly. Is if still possible? If you add the constraint that there be exactly one duplicate value, this is easy.

Since there are n values from 1 to n-1 with one duplicate, we can use the fact that the sum of the integers from 1 to m is (m*(m+1))/2, take the sum from 1 to n-1, subtract that from the actual sum of numbers in the array, and the result will be the duplicated number:

let s ← 0
for 1 <= i <= n:
  s ← s + A[i]
return s - n*(n-1)/2

But suppose there are no guarantees about the number of duplicates, and the only assurance you have is that the values are between 0 and n, exclusive. Is it still possible to find a duplicated value in O(n) time and O(1) space? Feel free to stop here and try to work things out yourself, if you want.

As you might have guessed from the length of this entry, there is an approach that will work. It's something of a radical departure from the previous approaches, and it relies rather heavily on the particular bounds we have on the values in the array.

First, there's the solution-by-cheating. If you're working in a language with bounded integers, you can exploit that knowledge to meet the letter of the law in the "constant space" requirement.

Let's say we're using a language and platform where integers are 32 bits, signed. The maximum value for an array element would therefore be 231-1, while the minimum value would still be 1. We only need one bit of information per value (seen/not seen), so, at 8 bits per byte, an array 223 bytes (8MB) long would be enough to track any set of integers the environment is capable of passing. This algorithm is therefore O(n) time and O(1) space on a system with 32-bit signed integers:

let s ← array [1..2^31-1] of bit
for 1 <= i <= n:
  if s[A[i]] = 1: return A[i]
  set s[A[i]] ← 1

Of course, it relies on implementation details that might change (a similar array on a 64-bit system would require 32 petabytes of memory) and it might be nice to have an algorithm that works regardless of any artifical bounds on your integers. Fortunately, there are a couple more approaches we can take.

Recall that the array is 1-based, so its elements are numbered from 1 to n. The values are positive integers less than n, so they can range anywhere from 1 to n-1. Because the values are a subset of the array bounds, we can treat those values as indices into the array itself.

If we're allowed to modify the array, we can just go through it reordering the elements, trying to put each value into the corresponding element (so the value 1 goes into the first element, 2 goes into the second element, and so on). When we find a collision, that's the duplicate.

for 1 <= i <= n:
  while A[i] ≠ i:
    if A[A[i]] = A[i]: return A[i]
    swap(A[A[i]], A[i])

If the array is immutable, though, the solution is a little more involved. Given the view of each value as an index into the array, we can treat the array as a directed graph, where each element is a node and the value of that element is an edge pointing at the next node. Since every node points to another node, there is at least one cycle in the graph. We can also conclude that every node leads to a cycle, even if it is not directly part of one.

(Side note on terminology, for those not immediately familiar: A graph is simply a collection of nodes and edges. Each edge connects exactly two nodes. In a directed graph, each edge also has a direction; it goes from one node to another. If you start at a particular node, follow (one of) its edges to the next node, continue doing that for a certain amount of time, and eventually come back to the starting node, that's a cycle. If you start at a particular node, follow its edges out in the same way, and end up in a cycle that does not contain the starting node, all of the nodes and edges you crossed before getting to the cycle are called the tail. The first node that you reach that is in the cycle is defined to be the beginning of the cycle. The symbols λ (lambda) and μ (mu) represent the lengths of the tail and cycle, respectively. Technically, the directed graph we're dealing with here is also a sequence, since every node has at most one edge leading away from it (though it might have more than one leading into it).)

Since the values in the array are all less than n, no element in the array can point to the nth element. Therefore, the last element cannot be part of a cycle. Since every element must lead to a cycle, it must be part of a tail. If we follow the sequence from this point, we will eventually run into a cycle, and the last element in the tail will be a duplicate value.

So, how do we find the beginning of the cycle? The easiest approach is to use Floyd's cycle-finding algorithm. It works roughly like this: Start at the beginning of the sequence. Keep track of two values (call them ai and aj). At each step of the algorithm, move ai one step along the sequence, but move aj two steps. Stop when ai = aj.

At this point, the difference between i and j must be a multiple of the cycle length. If the algorithm took m steps to get to this point, then i = m and j = 2m, so m is a multiple of the cycle length. i has also traveled λ steps along the tail and m-λ steps along the cycle.

If we now start another value (call it ak) at the beginning of the sequence and move both it and ai simultaneously and at the same rate, after λ steps, ak will have traveled the entire tail and will have arrived at the beginning of the cycle. At the same time, ai will have moved a total of m steps along the cycle and, since m is a multiple of μ, will also have arrived at the beginning of the cycle. Thus, if you stop as soon as ai = ak, you will have found the beginning of the cycle and the end of the tail.

So. Applying all of that to our problem means that we start at the last element of the array, have two variables following the pointers in the array, one going twice as fast as the other, until they are equal to each other. At that point, start one of them back at the beginning and run them at the same speed until they again equal each other. Their value will then be the duplicate value.

let i ← n, j ← n
do: i ← A[i], j ← A[A[j]]; until i = j
set j ← n
do: i ← A[i], j ← A[j]; until i = j
return i

This algorithm also lends itself nicely to functional programming, since it doesn't actually require any side effects, so here's an implementation in Haskell:

import Data.Array

findDup a = findDup' n $ findCycle (a!n) (a!(a!n))
    where n = snd $ bounds a
          findCycle i j | i == j    = i
                        | otherwise = findCycle (a!i) (a!(a!j))
          findDup' i j | i == j    = i
                       | otherwise = findDup' (a!i) (a!j)

Sun, 01 Jul 2007

Reply-To: for Mailing Lists

Many mailing lists I'm on (particularly ones inhabited by sizable portions of geeks) have periodic "discussions" about the behavior of the Reply-To header on list email. The discussions usually follow fairly predictable paths. "Reply-To" Munging Considered Harmful is quoted. Proponents respond with Reply-To Munging Considered Useful.

My opinion is threefold:

First, not touching the header is the purest solution. For all the reasons in "Reply-To Munging Considered Harmful", mailing lists shouldn't touch it, in an ideal world, at least.

Second, "Reply-To Munging Considered Useful" gets one things right: it adds functionality. Even today, most mail clients can either "reply" or "reply-to-all". If "reply" goes to the message originator, then they're left with "reply-to-all" as the easiest way to send a message to the list. (And sending mail to the list should, in most cases, be easy.) Some MUAs, such at mutt and KMail [please email me if you know of others] add a "list-reply" function. If most common MUAs had such a feature, I would find it much easier to advocate against munging. But they don't, and it's rather elitist to insist that people switch MUAs just for your mailing list. So, in today's world and for an average list, I tend to vote for munging the header. (Though I really like the (void) mailing list, which provides twin lists, one with munging, the other without. All posts do, of course, appear on both lists.)

(Note that "reply-to-all" is not the same as "list-reply". "list-reply" sends one message--it goes to the list (unless the original email has a Mail-Followup-To: header, but that's another post). "reply-to-all" sends one message to the list, plus one message for each sender and recipient of the original email. That adds up quickly for involved discussion threads, and all of those extra messages are wasts of both time (human and computer) and bandwidth.)

Third, I don't have to care either way, because I use mutt and procmail. (And, actually, the procmail part is optional.) In my .procmailrc is the following recipe:

:0 fhw
| sed -e "s/^\\(Subject:.*\\)\\[$MATCH\\]\\( \\)*/\\1/I" \
      -e "/^Reply-To:.*$MATCH@/ d" \
      -e "s/^Old-Reply-To:/Reply-To:/"

$MATCH is already set to the name of the mailing list, which is usually also the local portion of its email address. (Note that this loses information if the list doesn't munge the header and someone has their own Reply-To header that happens to have the same local part as the mailing list. In my experience, this is highly unlikely.)

My mutt config contains the following directives:

set ignore_list_reply_to=yes
set reply_to=yes
subscribe <list addresses>

so if a Reply-To header is set to an email address that mutt knows to be a mailing list, it ignores that header. (And since those are ignored, I set "reply_to" to yes, so it always uses acceptable headers; by default it asks whether I want to use them.) Thus, the "r" key always replies to the person who sent the message, the "g" key always replies to the sender and all recipients of the message, and the "L" key sends the reply to the mailing list the email was from (and any other addresses in the Mail-Followup-To header).

Thu, 08 Mar 2007

Common Lisp Testing Frameworks

I've been evaluating the various Common Lisp testing frameworks. I have a large body of code in my Project Euler stuff, and I use unit testing extensively; extensively enough that the simplistic unit testing package I had been using was getting unwieldy. So I figured I'd take a look at what was available and see how everything stacked up.

Since this page is long, I'll present my summary up front: If your testing needs are simple, use lisp-unit. It you need many, well-organized tests (as I now do), use either Stefil or FiveAM, depending on whether you want highly-interactive or noninteractive tests, respectively.

First, a bit about my requirements. I use packages extensively in my Project Euler setup. I have a main package, :euler, which contains "common" code, both things that are used extensively, like my primality testing function, and things that might only be used in one place, but that seemed like they belonged in the common section, like the implementation of Dijkstra's shortest-path algorithm or the function for finding a minimum spanning tree. I have unit tests for most of these functions. (Not all yet, but I'm working on it.) There is a separate package for each problem, into which goes at least a function named get-answer as well as any other problem-specific code and tests. The problem-specific tests include at least tests to check my code against the examples given in the problems and (when I've gotten it) a test for the answer to the problem. I generally want to do one of three things: test everything, test just one problem's code, or test the library functions. I occasionally want to test just the answers to all of the problems, too.

In setting up the tests, I wrote a few macros and put all of my tests in those so I could redefine the test framework easily. When I could use hierarchies, I set them up as follows: suite euler-tests is the root of the tree. It contains euler-functions and euler-problems. euler-functions contains a suite/test (depending on the particular framework) for each common function. euler-problems contains one suite for each problem, of the form euler-<problem-number>-tests. That suite contains a test/suite for the examples given in the problem (euler-<problem-number>-givens), one for the answer (euler-<problem-number>-answer), and possibly one for each internal function I want to test (euler-<problem-number>-<function-name>). In my example runs, I've deliberately introduced several errors: a function signalling an error when it shouldn't and vice versa, a function returning t when it should return nil and vice versa, and a couple of functions that return an incorrect (numerical) value.

lisp-unit

lisp-unit was the package I'd been using, and I like it because it's dead simple to work with. It uses packages to group the tests, so your existing package structure serves the part of test suites. Obviously, they're not even hierarchical, let alone composable. There are no fixtures. While the design lacks some flexibility, since it's tied directly to the package system, it makes up for it in simplicity. Because there's no need to declare and organize your test suites, adding a test is as simple as

(lisp-unit:define-test <test-name>
  <test-clauses>)

The test clauses use the fairly obvious assert-true, assert-false, assert-equal, and assert-error tests (plus a couple of others). Running the tests is either a matter of (run-tests <test-names>), which runs tests in the current package (if no names are given, all tests are run); or (run-all-tests <package> <test-names>) to use a different package (again, listing no tests will run them all).

The main drawbacks for large bodies of tests are that there's no composition of test suites, so you have to give explicit commands to run all of your tests, and it prints one line for each test. Here's a sample run:

* (eval
   `(progn
      (lisp-unit:run-all-tests euler)
      ,@(loop for i from 1 to 142
              collect `(lisp-unit:run-all-tests
                        ,(intern (format nil "E~3,'0D" i)))))))
EULER-BOUNCYP: 17 assertions passed, 0 failed.
EULER-LEAST-DIVISIBLE-REPUNIT: 2 assertions passed, 0 failed.
EULER-MODPOWER: 6 assertions passed, 0 failed.
EULER-NEXT-PRIME: 9 assertions passed, 0 failed.
EULER-NUM-PARTITIONS: 12 assertions passed, 0 failed.
EULER-PALINDROMEP: (PALINDROMEP 10221) failed: 
Expected T but saw NIL
EULER-PALINDROMEP: 35 assertions passed, 1 failed.
EULER-PREV-PRIME: 7 assertions passed, 0 failed.
EULER-PRIMEP: (PRIMEP 2) failed: 
Should have signalled ERROR but saw T
EULER-PRIMEP: 11 assertions passed, 1 failed.
EULER-UNIQUE-DIGITS-P: (UNIQUE-DIGITS-P 1) failed: 
Expected NIL but saw T
EULER-UNIQUE-DIGITS-P: 9 assertions passed, 1 failed.
TOTAL: 108 assertions passed, 3 failed, 0 execution errors.
EULER-001-ANSWER: 1 assertions passed, 0 failed.
EULER-001-GIVENS: (GET-ANSWER 10) failed: 
Expected 24 but saw 23
EULER-001-GIVENS: 0 assertions passed, 1 failed.
TOTAL: 1 assertions passed, 1 failed, 0 execution errors.
EULER-002-ANSWER: 1 assertions passed, 0 failed.
EULER-002-GIVENS: 1 assertions passed, 0 failed.
TOTAL: 2 assertions passed, 0 failed, 0 execution errors.

[ 72 lines snipped ]

EULER-035-GIVENS: 1 assertions passed, 0 failed.
EULER-035-ROTATE-NUM: (ROTATE-NUM 1 1) failed: 
Expected 2 but saw 1
EULER-035-ROTATE-NUM: 9 assertions passed, 1 failed.
TOTAL: 11 assertions passed, 1 failed, 0 execution errors.

[ 123 lines snipped ]

    EULER-095-ANSWER: The value -1 is not of type (MOD 536870911).

[ 87 lines snipped ]

In most cases, the error test is very informative. It gives the code that it ran, the expected return value, and the actual value. The only exception is when an unexpected error was signalled; in that case, only the test name is given. The problem, as you can see, is that these informative failure messages get lost among all the noise of the successful tests. The fact that there's no composition means that you don't even get a count of total failed tests at the end. (That is a failing of many of the Common Lisp test frameworks available).

lisp-unit's main strength is that it's easy to use. For simple testing situations, it's easily the best one available, because it's so easy to use there's no barrier to writing all the tests you need. Its failure messages are useful and informative. Its main drawback is that it doesn't scale well. If your test needs are simple then lisp-unit is probably the best choice for you.

LIFT

Initially, LIFT looked good. It's obviously influenced by JUnit's ways of doing things. Test suites are defined very much like CLOS classes; they have a hierarchical inheritance structure (with multiple inheritance, even) and can have slots just like CLOS classes. Fixtures for setting up and tearing down resources for the tests are supported (with the resources references by custom slots). The test running function simply returns a test result object, but the package uses the pretty-printing function for that object to detail the test results, which is rather neat and would lend itself well to automated processing of the test results. Another interesting feature is randomized testing based on a specification, similar to QuickCheck for Haskell.

Setting up tests requires defining a test suite and adding tests to it, which can either be done in one statement:

(lift:deftestsuite <suite-name> (<parent-suites>)
  (<slots>)
  (:setup
    <forms to set up the slots>)
  (:teardown
    <forms to tear down slots>)
  (:tests
    (<test-clause>)
    (<test-clause>)
    (<test-clause>)))

or, since everything after the slot definitions is optional, in multiple statements:

(lift:deftestsuite <suite-name> (<parent-suites>)
  (<slots>))
(lift:addtest (<suite-name>)
  <test-clause>)

(Note that having the suite name in a list for addtest is a special case. Normally, that parameter would be a symbol giving the name of the test, which would be placed in the most recently-defined test suite. Using a list with a suite name in it overrides that behavior and assigns the test to a specific suite and gives the test a name of the form test-<number>. Another special case is that you can omit both the name and the suite; when the first form is a list of more than one element, it is treated as a test clause and the test is given an autogenerated name (as above) and placed in the most recently-defined test suite.)

Test clauses are very much like lisp-unit's, except that the test macros have slightly different names: ensure (analogous to lisp-unit:assert-true), ensure-same, and ensure-error. The analog to (lisp-unit:assert-false <form>) is (ensure (not <form>)).

The fact that test suites support multiple inheritance is a nice touch. In my setup, it means that I could have an additional suite named euler-answers, and every answer suite would inherit from both its project's test suite and the euler-answers suite. That way I would have a suite that would run just the tests on the problems' answers without interfering with the rest of my test hierarchy structure. Of the test frameworks I looked at, LIFT was the only one with multiple inheritance for test suites.

One feature that's not in LIFT just yet but which the author has mentioned adding is support for benchmarking tests. This is something that would really interest me, especially in conjunction with being able to run just the tests on the problem answers. (I've actually got some code that did benchmarking of my lisp-unit code, and I've used it to target my optimization of problems. Having support for that sort of thing baked into the testing framework would be very nice.)

Test and suite names appear to be effectively global; as far as I can tell, they're compared by string= on symbol-name. This means that it's easy to call tests that were defined in a different package than the current one, but it also means that the normal package exporting rules don't apply, and all names must be globally unique.

That's all good stuff. There were a few problems I ran into. First, whenever I defined a test, SBCL raised a style-warning about redefining things in defmethod, even if it was the first time I'd defined the test. SLIME didn't quite understand the warnings, so everytime I compiled a test suite or file, I got an extra SLIME window listing all the spurious warnings.

The larger problem, though, was its speed and memory footprint. Defining tests is very slow; when using LIFT, the time necessary to compile and load all of my Project Euler code jumped from the other frameworks' average of about 1.5 minutes to over nine minutes. Redefining tests felt even slower than defining them initially, but I don't have solid numbers on that. After loading everything, memory usage was more than twice that of other frameworks. Running all of the tests took more than a minute longer than other frameworks, though that seems mostly to be a result of swapping induced by LIFT's greater memory requirements.

Here's a sample run:

* (lift:run-tests :suite 'euler-tests)
#<Results for EULER-TESTS 380 Tests, 5 Failures, 1 Error.

Failure: euler-primep : test-12
  Condition: Expected ERROR but got NIL

Failure: euler-palindromep : test-36
  Condition: Ensure failed: (EULER:PALINDROMEP 10221) ()

Failure: euler-unique-digits-p : test-10
  Condition: Ensure failed: (NOT (EULER:UNIQUE-DIGITS-P 1)) ()

Failure: euler-001-givens : test-1
  Condition: Ensure-same: 24 is not EQUAL to 23

Failure: euler-035-rotate-num : test-1
  Condition: Ensure-same: 2 is not EQUAL to 1

ERROR  : euler-095-answer : test-1
  Condition: The value -1 is not of type (MOD 536870911).
  >

As I mentioned before, the use of the print-object generic function to give a summary of the tests is very nice. The fact that it only details failures is also quite nice; failures aren't lost in the noise of successful tests. The reporting of the failures isn't quite as nice as in lisp-unit. While failures of ensure show the code that failed, failures of ensure-same merely show the mismatched values, not the code that generated the values.

There's no feedback while the test routines are running, which is a minor annoyance, but completely in keeping with the nature of functional programming.

Overall, LIFT is an intriguing framework with a lot of good things going for it, but the speed and memory issues killed it for me. It might work better with fewer test cases and suites, but in that case I'd probably recommend just using lisp-unit and its elegant simplicity.

XLUnit

XLUnit bills itself as a low-programmer-overhead adaptation of JUnit. It's based on both XPTEST and clos-unit (so I didn't bother trying either of those individually). It follows the approach of JUnit in using native OO classes as test suites; each suite is simply a CLOS class that inherits from the test-case class. It does have its own macro for declaring test cases, but its format reads very much like a generic method being specialized on a particular class. Likewise, fixtures are supported by implementing the set-up and tear-down generic functions for your test suite class. The main drawback is that, although you can create hierarchies of test suites (since they're just CLOS classes), the suites aren't composable.

Setting up tests is pretty simple:

(defclass <suite-name> (xlunit:test-case)
  (<slots>))
(xlunit:def-test-method <test-name> ((test <suite-name>))
  <test-clauses>)

XLUnit uses fairly standard names for its test clauses, too: assert-true, assert-false, assert-equal, assert-condition, assert-eql, and assert-not-eql.

The UI is pretty similar to the standard *Unit text interface, too:

* (xlunit:textui-test-run (xlunit:get-suite euler-tests))
..F...F..F..
Time: 0.072


There were 3 failure:
1) EULER-UNIQUE-DIGITS-P: Assert false: (UNIQUE-DIGITS-P 1)
2) EULER-PALINDROMEP: Assert true: (PALINDROMEP 10221)
3) EULER-PRIMEP: Assert condition ERROR, but no condition signaled


FAILURES!!!
Run: 9   Failures: 3   Errors: 0
#<XLUNIT:TEST-RESULTS {AC399D9}>

* (xlunit:textui-test-run (xlunit:get-suite e001::euler-001-tests))
..F
Time: 0.002


There was 1 failure:
1) EULER-001-GIVENS: Assert equal: 24 23


FAILURES!!!
Run: 2   Failures: 1   Errors: 0
#<XLUNIT:TEST-RESULTS {B79A7F9}>

* (xlunit:textui-test-run (xlunit:get-suite e095::euler-095-tests))
.E
Time: 0.138


There was 1 error:
1) EULER-095-ANSWER: The value -1 is not of type (MOD 536870911).


FAILURES!!!
Run: 1   Failures: 0   Errors: 1
#<XLUNIT:TEST-RESULTS {B82F53A}>

It prints dots, 'F's, and 'E's as it runs each test and then summarizes the failures. As with LIFT, it gives details of the code that failed in some cases (assert-true, assert-false), but not in others (assert-equal, assert-error, unexpected condition).

The def-test-method macro feels a little clunky to me. I'd rather see the class instance and test suite named separately from each other, but I think part of that is just experience with other test frameworks that worked differently. It does make plenty of sense when viewed as just an extension of CLOS. The non-composability of suites is what really kills XLUnit for me. If I didn't want composition, I'd still be using lisp-unit, which has even less syntactic overhead and comparable resource usage. I'd say it has some usefulness for situations where you needed fixtures but still wanted a simple programming interface to the framework.

FiveAM

Initially, FiveAM reminded me a bit of LIFT with fewer gee-whiz features. It, too, does hierarchical, composable test suites. (In fact, it was the only other framework that supported composing suites.) It also does fixtures, but somewhat differently than anyone else. In FiveAM, fixtures are written as macros with a special keyword indicating where the test body will go. On one hand, it's a very Lispy approach to the issue. (In particular, it lets you keep all of your state in local variables, which is a bit more functional style than the OO approach of using class slots.) On the other hand, you have to wrap every set of tests that use the fixture in a with-fixture macro, while most other approaches tie the fixtures to the test suite so they're set up and torn down automatically for each test in the suite.

Like LIFT, FIveAM has support for specification-based random testing. I don't have anywhere I use things like that, so I again didn't test it.

FiveAM also has fewer test clauses; most cases will work with the is macro, which codewalks its body to determine what exactly is being tested.

Test suites are pretty easy to create:

(5am:def-suite <suite-name>)

It supports inheritance (and documentation) with key parameters:

(5am:def-suite <suite-name> :in <parent-suite>
               :documentation <doc-string>)

Test may either be bound to suites explicitly:

(5am:test (<test-name> <dependencies> <suite>)
  <test-clauses>)

or implicitly, as with the in-package form:

(5am:in-suite <suite-name>)
(5am:test <test-name>
  <test-clauses>)

The <dependencies> parameter is for declaring that a given test depends on another test completing successfully (or failing) first.

Sample run:

* (time (5am:run! '5am-euler-tests))
.................f...................................................f..........
...........f....................f...............................................
...........f....................................................................
..................................................X.............................
.............................................................
 Did 381 checks.
    Pass: 375 (98%)
    Skip: 0 ( 0%)
    Fail: 6 ( 1%)

 Failure Details:
 --------------------------------
 EULER-PRIMEP []: 
      Failed to signal a ERROR.
 --------------------------------
 --------------------------------
 EULER-PALINDROMEP []: 
      10221 did not satisfy PALINDROMEP.
 --------------------------------
 --------------------------------
 EULER-UNIQUE-DIGITS-P []: 
      1 satisfied UNIQUE-DIGITS-P.
 --------------------------------
 --------------------------------
 EULER-001-GIVENS []: 
      23 was not = to 24.
 --------------------------------
 --------------------------------
 EULER-035-ROTATE-NUM []: 
      1 was not = to 2.
 --------------------------------
 --------------------------------
 EULER-095-ANSWER []: 
      Unexpected Error: #<TYPE-ERROR {B2B7639}>
The value -1 is not of type (MOD 536870911)...
 --------------------------------
NIL

As with other *Unit frameworks, it prints one character for each test run, giving a bit of visual feedback. The failure messages are decent, but (as with most frameworks) no code is shown for unequal values.

Overall, I like FiveAM slightly more than the other frameworks I tried. (Though I like lisp-unit's failure messages, XLUnit's approach to fixtures, and several of LIFT's current and proposed features more than FiveAM.) FiveAM does support the one killer feature I was looking for--hierarchical, composable test suites--as well as a number of smaller things that seem very nice and in the spirit of Lisp.

ptester

ptester is barely a test framework. It has no test suites and no test functions. All it provides is a set of macros for checking function results (test (analogous to lisp-unit:assert-equal), test-error, test-no-error, test-warning, and test-no-warning) and a wrapper macro designed to enclose the test clauses which merely provides a count of success and failures at the end. ptester expects that all testing is done in predefined functions and lacks the dynamic approach present in other frameworks.

I didn't bother trying ptester with my code. The lack of any sort of organization meant that I would have had to rewrite most of my tests to get them all called properly. (I would have to put each set of test clauses in a function and then have the function called from somewhere. Compare to most frameworks, where the test is told which suite or clause it belongs to and running the suite automatically gets all of the tests it contains.)

CLUnit

Although there's a Debian package for CLUnit, it didn't load properly into my environment (SBCL 1.0.0.0); it complained that the function deftest was undefined. A survey of the documentation indicates that it's lacking my single most important requirement: a composable test suite hierarchy. I thus didn't spend much time trying to get it to work.

It does support grouping of tests, but not hierarchically. There are no fixtures. Tests are named with strings, which bypasses some of the scope problems that can arise when using symbols. The use of function passing looks a little clunky to me, but it could be abstracted away with a macro if necessary. It does support multiple value returns, which sets it apart from most other frameworks I looked at (MSL.TEST does them, as does ptester, though ptester only considers the first return value by default).

A simple test definition looks roughly like this:

(clunit:deftest "<test-name>"
  :category "<suite-name>"
  :test-fn (lambda () <test-clauses>))

The test is considered to have failed if test-fn returns nil; otherwise, it succeeded.

The functionality of other frameworks' test clause macros (lisp-unit:assert-same, etc.) is accomplished by use of other keyword arguments passed in to the deftest clause. For example, this lisp-unit test

(lisp-unit:define-test addition
  (lisp-unit:assert-equal 2 (+ 1 1)))

could be rendered as

(clunit:deftest "addition"
    :input-form '(1 1)
    :test-fn (lambda (args)
               (destructuring-bind (a b) args
                 (+ a b)))
    :output-form 2)

which, yes, is wordier, but the various keywords allow for some interesting combinations of tests.

When the tests are run, the names of failed tests are given, but nothing more. The setup appears to highly encourage one clause per test, so that imprecision shouldn't be too much of an impediment.

I didn't really experiment with this one much, since it failed to compile and was missing a key feature, but it does seem to have an interesting approach to testing. I fear, though, that its approach adds too much complexity to the process.

FReT

FReT also did not compile for me. It ran into problems with some of its internal macros. It says it is based on LIFT, which I'd already dismissed, and it has no documentation available aside from the source. I opted not the spend the time investigating it further.

MSL.TEST

MSL.TEST is another basic test framework. It checks multiple value returns by default, which is nice (though it made a bunch of my previously-passing tests fail; all the other frameworks I tried only compared the primary return value by default). The framework only has test groups, not a hierarchy, but it uses the return value of its test clauses to determine success or failure, and the run-tests function returns t only if all of its tests succeeded, so you can fake a hierarchy be having one of your test clauses run the tests in a different group. The test results are not aggregated, though, and its output format is very noisy, printing one line per passing test and more for failing tests.

Excerpt of a sample run:

* (time (msl.test:run-tests 'euler-tests))
** PASS: EULER-MODPOWER            [00:00:00.00]
** FAIL: EULER-PRIMEP              [00:00:00.00]
** PASS: EULER-PREV-PRIME          [00:00:00.00]
** PASS: EULER-NEXT-PRIME          [00:00:00.00]
Test EULER-PALINDROMEP failed
Form: (PALINDROMEP 10221)
Expected value: T
Actual value: NIL.
** FAIL: EULER-PALINDROMEP         [00:00:00.00]
** PASS: EULER-NUM-PARTITIONS      [00:00:00.05]
Test EULER-UNIQUE-DIGITS-P failed
Form: (UNIQUE-DIGITS-P 1)
Expected value: NIL
Actual value: T.
** FAIL: EULER-UNIQUE-DIGITS-P     [00:00:0.004]
** PASS: EULER-BOUNCYP             [00:00:00.00]
** PASS: EULER-LEAST-DIVISIBLE-REPUNIT [00:00:00.00]

Ran 9 of 9 tests in group EULER-FUNCTIONS
The following tests failed:
  (EULER-UNIQUE-DIGITS-P EULER-PALINDROMEP EULER-PRIMEP)

Totals -- Passed: 6       67%
          Failed: 3       33%
** PASS: EULER-001-ANSWER          [00:00:00.00]
Test EULER-001-GIVENS failed
Form: (EULER-001:GET-ANSWER 10)
Expected value: 24
Actual value: 23.
** FAIL: EULER-001-GIVENS          [00:00:00.00]

Ran 2 of 2 tests in group EULER-001::EULER-001-TESTS
The following tests failed:
  (EULER-001::EULER-001-GIVENS)

Totals -- Passed: 1       50%
          Failed: 1       50%
** FAIL: EULER-001-SUBGROUP        [00:00:00.00]
** PASS: EULER-002-ANSWER          [00:00:00.00]
** PASS: EULER-002-GIVENS          [00:00:00.00]

Ran 2 of 2 tests in group EULER-002::EULER-002-TESTS

Totals -- Passed: 2      100%
          Failed: 0        0%
** PASS: EULER-002-SUBGROUP        [00:00:00.00]

[ 78 lines snipped ]

Test EULER-013-ANSWER failed
Form: (EULER-013:GET-ANSWER)
Expected value: 5537376230
Actual values: 5537376230
               6107447457844511669265405808963434873323/15625000000000000000000000000000000000000.
** FAIL: EULER-013-ANSWER          [00:00:0.004]

Ran 1 of 1 test in group EULER-013::EULER-013-TESTS
The following tests failed:
  (EULER-013::EULER-013-ANSWER)

Totals -- Passed: 0        0%
          Failed: 1      100%
** FAIL: EULER-013-SUBGROUP        [00:00:00.01]

[ 651 lines snipped ]

Test EULER-095-ANSWER failed
Form: (EULER-095:GET-ANSWER)
The value -1 is not of type (MOD 536870911).
** FAIL: EULER-095-ANSWER          [00:00:00.22]

Ran 1 of 1 test in group EULER-095::EULER-095-TESTS
The following tests failed:
  (EULER-095::EULER-095-ANSWER)

Totals -- Passed: 0        0%
          Failed: 1      100%
** FAIL: EULER-095-SUBGROUP        [00:00:00.22]

[ 333 lines snipped ]

Ran 140 of 140 tests in group EULER-PROBLEMS
The following tests failed:
  (EULER-142::EULER-142-SUBGROUP EULER-141::EULER-141-SUBGROUP
   EULER-140::EULER-140-SUBGROUP EULER-138::EULER-138-SUBGROUP
   EULER-137::EULER-137-SUBGROUP EULER-128::EULER-128-SUBGROUP
   EULER-126::EULER-126-SUBGROUP EULER-121::EULER-121-SUBGROUP
   EULER-103::EULER-103-SUBGROUP EULER-095::EULER-095-SUBGROUP
   EULER-090::EULER-090-SUBGROUP EULER-084::EULER-084-SUBGROUP
   EULER-069::EULER-069-SUBGROUP EULER-062::EULER-062-SUBGROUP
   EULER-013::EULER-013-SUBGROUP EULER-001::EULER-001-SUBGROUP)

Totals -- Passed: 124     89%
          Failed: 16      11%
** FAIL: EULER-TEST-CHILDREN       [00:03:26.29]

Ran 1 of 1 test in group EULER-TESTS
The following tests failed:
  (EULER-TEST-CHILDREN)

Totals -- Passed: 0        0%
          Failed: 1      100%

MSL.TEST is exceedingly wordy, too much so for feasably running any large number of tests. As usual, I feel that in situations where the requirements are simple enough that MSL.TEST would work well, lisp-unit would work equally well, if not better.

Stefil

Stefil takes a much more interactive approach to testing. It is intended primarily for use from the REPL, preferably running within SLIME. Test failures are reported by raising conditions with information about the failing test:

Test assertion failed:

Binary predicate (= X Y) failed.
x: 24 => 24
y: (EULER-001:GET-ANSWER 10) => 23
   [Condition of type STEFIL::ASSERTION-FAILED]

Restarts:
  0: [CONTINUE] Record the failure and continue
  1: [CONTINUE-WITHOUT-DEBUGGING] Record the failure, turn off debugging for this test session and continue
  2: [CONTINUE] Skip the rest of the test (STEFIL::NAME-OF STEFIL::TEST) and continue
  3: [RETEST] Rerun the test (STEFIL::NAME-OF STEFIL::TEST)
  4: [CONTINUE] Skip the rest of the test (STEFIL::NAME-OF STEFIL::TEST) and continue
  5: [RETEST] Rerun the test (STEFIL::NAME-OF STEFIL::TEST)
  6: [CONTINUE-WITHOUT-DEBUGGING] Turn off debugging for this test session and invoke the first CONTINUE restart
  7: [ABORT-TESTING] Abort the entire test session started with EULER-001::TEST-ALL
  8: [ABORT-REQUEST] Abort handling SLIME request.
  9: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread" {AF0CB91}>)

Backtrace:
  0: (STEFIL::RECORD-FAILURE* STEFIL::FAILED-ASSERTION :SIGNAL-ASSERTION-FAILED T :DESCRIPTION-INITARGS (:FORM (STEFIL:IS (= 24 #1=#)) :FORMAT-CONTROL "Binary predicate ~A failed.~%~
                               x: ~S => ~S~%~
                               y: ~S => ~S" :FORMAT-ARGUMENTS ((= STEFIL::X STEFIL::Y) 24 24 #1# 23)))
 -- more --

Defining a test (via deftest) results in a function with the same name as the test; running that function runs the test. Tests are completely composable; you can call one test from within another and their results are aggregated. Test suites are defined by defsuite, but internally suites are just tests that call other tests. Adding a new test to a suite just adds that test's function to the functions called by the suite's function. Likewise, creating a suite adds it to the current suite, giving you a hierarchy. My only problem with the setup is that I don't see a way to explicitly assign tests to suites, aside from dynamically binding stefil::*suite*. Normally, the current suite is set by in-suite, which requires careful attention if you're jumping between different suites often. (A somewhat mitigating factor is that tests remember which suite they were created in, so the current suite only matters for newly-defined tests.)

As an example of a hierarchy, here's roughly how I set up things for my tests:

(in-package :euler)

(stefil:in-root-suite)
(stefil:defsuite* test-all)
(stefil:defsuite test-functions)
(stefil:defsuite test-problems)

(stefil:in-suite test-functions)

(stefil:deftest test-modpower ()
  (stefil:is (= 399 (modpower 123456 123456 789))))

(in-package :e001)

(stefil:in-suite euler:test-problems)
(stefil:defsuite* test-all)

(stefil:deftest test-answer ()
  (stefil:is (= <answer> (get-answer))))
(stefil:deftest test-givens ()
  (stefil:is (= 24 (get-answer 10))))

Note that (defsuite* <suite-name>) is equivalent to (defsuite <suite-name>) followed by (in-suite <suite-name>).

Fixtures are supported in pretty much the same way that FivaAM does them; you define them as macros and then run your tests within a with-fixture macro.

Running the tests looks like this:

* (test-all)
.................X...................................................X.....................X........
............X..........................................................X............................
..........................................................................................E.........
.................................................................................
#<test-run 362 tests, 381 assertions, 6 failures in 169.139 sec>

That is, of course, only half the picture, since failures are reported by raised conditions, and that returned object is designed to be browsed with the SLIME inspector. I showed the full debugger response to a condition earlier; here are the other messages from test failures I saw in my run:

Test assertion failed:

(PROGN (PRIMEP 2)) failed to signal condition ERROR
   [Condition of type STEFIL::ASSERTION-FAILED]

...

Expression (PALINDROMEP 10221) evaluated to false
0: 10221 => 10221
   [Condition of type STEFIL::ASSERTION-FAILED]

...

Expression (NOT (UNIQUE-DIGITS-P 1)) evaluated to true
0: 1 => 1
   [Condition of type STEFIL::ASSERTION-FAILED]

...

Binary predicate (= X Y) failed.
x: 24 => 24
y: (EULER-001:GET-ANSWER 10) => 23
   [Condition of type STEFIL::ASSERTION-FAILED]

...

The value -1 is not of type (MOD 536870911).
   [Condition of type TYPE-ERROR]

I like that you get lots of information about the code that was run, the result it produced, and the result that was expected. The only thing that could be clearer is the actual test that failed. You either have pore through the backtrace or wait for everything to finish and pore through the test result objects.

Speaking of test result objects, here's the one returned by the test. It's got a lot of information in it.

An object.
 [type: STEFIL::GLOBAL-CONTEXT]
--------------------
Class: #<STANDARD-CLASS STEFIL::GLOBAL-CONTEXT>
Slots:
FAILURE-DESCRIPTIONS = #(#<STEFIL::MISSING-CONDITION {C3F7741}> #<failure (STEFIL:IS (PALINDROMEP 10221)) backtrace: TEST-PALINDROMEP,TEST-FUNCTIONS,TEST-ALL> #<failure (STEFIL:IS (NOT (UNIQUE-DIGITS-P 1))) backtrace: TEST-UNIQUE-DIGITS-P,TEST-FUNCTIONS,TEST-ALL> #<failure (STEFIL:IS (= 24 (EULER-001:GET-ANSWER 10))) backtrace: TEST-GIVENS,TEST-ALL,TEST-PROBLEMS,TEST-ALL> #<failure (STEFIL:IS (= 2 (EULER-035::ROTATE-NUM 1 1))) backtrace: TEST-ROTATE-NUM,TEST-ALL,TEST-PROBLEMS,TEST-ALL> #<error TEST-ALL,TEST-PROBLEMS,TEST-ALL,TEST-ANSWER: #<TYPE-ERROR {A8FE971}>>)
ASSERTION-COUNT = 381
PROGRESS-CHAR-COUNT = 381
PRINT-TEST-RUN-PROGRESS-P = T
DEBUG-ON-UNEXPECTED-ERROR-P = T
DEBUG-ON-ASSERTION-FAILURE-P = T
TOPLEVEL-CONTEXT = #<test-run (TEST-ALL)>
CURRENT-TEST = #<test EULER-142::TEST-ALL>
RUN-TESTS = #<HASH-TABLE :TEST EQL :COUNT 362 {B129149}>
RUN-FIXTURES = #<HASH-TABLE :TEST EQL :COUNT 0 {B129341}>
TEST-LAMBDAS = #<HASH-TABLE :TEST EQL :COUNT 0 {B129539}>

failure-descriptions is an array containing an object for each failed test. run-tests is a hash table with one entry for each test run; the keys are the test objects and the values are "test context" objects.

Here's what one of the "assertion failed" objects looks like

An object.
 [type: STEFIL::FAILED-ASSERTION]
--------------------
Class: #<STANDARD-CLASS STEFIL::FAILED-ASSERTION>
Slots:
TEST-CONTEXT-BACKTRACE = (#<test-run (EULER-001::TEST-GIVENS)> #<test-run (EULER-001:TEST-ALL)> #<test-run (TEST-PROBLEMS)> #<test-run (TEST-ALL)>)
PROGRESS-CHAR = #\X
FORM = (STEFIL:IS (= 24 (EULER-001:GET-ANSWER 10)))
FORMAT-CONTROL = "Binary predicate ~A failed.~%~
                               x: ~S => ~S~%~
                               y: ~S => ~S"
FORMAT-ARGUMENTS = ((= STEFIL::X STEFIL::Y) 24 24 (EULER-001:GET-ANSWER 10) 23)

And here's what a test context object looks like.

An object.
 [type: STEFIL::CONTEXT]
--------------------
Class: #<STANDARD-CLASS STEFIL::CONTEXT>
Slots:
PARENT-CONTEXT = #7=#<test-run (EULER-001:TEST-ALL)>
TEST = #<test EULER-001::TEST-GIVENS>
INTERNAL-REALTIME-SPENT-WITH-TEST = 16
TEST-ARGUMENTS = NIL
NUMBER-OF-ADDED-FAILURE-DESCRIPTIONS = 1

I like the touch of having the test's runtime available. As I mentioned in the LIFT section, I'd written some functions to gather that data myself, but having the test collect it for me is even nicer.

I'm fairly impressed with Stefil. It's got a very interesting interface, and the use of the debugger and inspector lends itself well to the interactive, short feedback cycle of Lisp programming. I'm not sure if the approach will work for everyone, but I'm going to give Stefil an extended try and see how well its testing approach works for me. If the live testing proves too bothersome, I'll just use FiveAM.

Sun, 10 Dec 2006

FourFours in Common Lisp

I came across the FourFours problem recently. Stated succinctly, it asks: what are the ways to calculate each of the integers from 1 to 100 with formulas which use the digit four exactly four times? (No digits other than four can be used at all.)

People have solved it in C# (138 lines, 380ms), Python (119 lines, 20s), Haskell (73 lines, 900ms), and Perl (one (really long) line, 45s). I thought it would be interesting to try it in Common Lisp, so I did.

For one thing, I found that my code was slower than the C# and Haskell solutions; I consistently get about 2.5 seconds for my solution. That seems mostly to be a result of both the dynamic nature of Lisp and the particularly-dynamic approach I took (more on that in a bit). I also took more lines (148 significant lines; excludes comments, doc strings, empty lines, and a couple of ancillary definitions).

In my defense, I did things somewhat more generally than the others; I don't like hardcoded data, so my code generates a lot of stuff on the fly. One notable example is the ordering of operations on the four values: the other solutions I've seen all hardcode the orderings, but my code generates the code structure (just a binary tree) at runtime based on the number of elements. I then use the code/data duality of Lisp to generate both functions and the printed representation of the operations from those trees. Likewise, my code will figure out the possible values to use in the calculations from a supplied digit.

All that flexibility let me play around a lot more with the structure of the problem. Although everyone else used five operations (addition, subtraction, multiplication, division, and exponentiation), I found that only four were necessary (addition, subtraction, multiplication, and division). (I checked smaller numbers of operations, too. I considered all of the functions in Common Lisp that would return ratios or integers when given a pair of ratios or integers as parameters. I was unable to get all the way from 1 to 100 with fewer than four distinct functions.)

With the given parameters (four fours, with 4, 4!, sqrt(4), .4, .4bar, and sqrt(.4bar)) and operators (+, -, *, /), you can actually go all the way up to 102. (Adding exponentiation does not allow you to go any further.)

I can calculate the consecutive chains for arbitrary base digits:

1: 1

1: 2 / 2

1: 3! / (3 + 3)
2: (3 + 3) / 3
3: (3 + 3) - 3
4: 3 + (3 / 3)
5: (3! / 3) + 3
6: (3 * 3) - 3
7: 3! + (3 / 3)
8: 3! + (3! / 3)
9: 3 + (3 + 3)
10: (3! - 3) / .3
11: (.3 + 3) / .3
12: 3 + (3 * 3)
13: (3 / .3) + 3
14: (3! / .3) - 3!
15: 3! + (3 * 3)
16: (3 / .3) + 3!
17: (3! / .3) - 3
18: 3 * (3 + 3)
19: (3! - .3) / .3
20: (3 + 3) / .3
21: (3! * 3) + 3

1: (4 + 4) / (4 + 4)
2: (4 * 4) / (4 + 4)
3: (4 + (4 + 4)) / 4
4: 4 + (4 * (4 - 4))
5: (4 + (4 * 4)) / 4
6: 4 + ((4 + 4) / 4)
7: (4 + 4) - (4 / 4)
8: (4 + (4 + 4)) - 4
9: (4 / 4) + (4 + 4)
10: (4! + (4 * 4)) / 4
11: ((4! + 4) / 4) + 4
12: 4 * (4 - (4 / 4))
13: (4! + (4! + 4)) / 4
14: (4! / 4) + (4 + 4)
15: (4 * 4) - (4 / 4)
16: 4 + (4 + (4 + 4))
17: (4 / 4) + (4 * 4)
18: ((4! * 4) - 4!) / 4
19: 4! - (4 + (4 / 4))
20: 4 * (4 + (4 / 4))
21: (4 / 4) + (4! - 4)
22: 4! - ((4 + 4) / 4)
23: ((4! * 4) - 4) / 4
24: (4 * 4) + (4 + 4)
25: ((4! * 4) + 4) / 4
26: 4! + ((4 + 4) / 4)
27: (4! + 4) - (4 / 4)
28: (4 * (4 + 4)) - 4
29: (4 / 4) + (4! + 4)
30: (4! + (4! * 4)) / 4
31: 4! + ((4! + 4) / 4)
32: (4 * 4) + (4 * 4)
33: ((4 - .4) / .4) + 4!
34: (4! / 4) + (4! + 4)
35: ((4 / .4) + 4) / .4
36: 4 + (4 * (4 + 4))
37: ((.4 + 4!) / .4) - 4!
38: (4 / .4) + (4! + 4)
39: 4! + (4! / (.4 * 4))
40: (4 * (4 * 4)) - 4!
41: (.4 + (4 * 4)) / .4
42: (4! + 4!) - (4! / 4)
43: (4! + 4!) - (sqrt(4) / .4)
44: (4 * 4) + (4! + 4)
45: (4! - (4! / 4)) / .4
46: ((4! - 4) / .4) - 4
47: (4! + 4!) - (4 / 4)
48: 4 * (4 + (4 + 4))
49: (4 / 4) + (4! + 4!)
50: (4 + (4 * 4)) / .4
51: ((.4 + 4!) - 4) / .4
52: (4! / .4) - (4 + 4)
53: (sqrt(4) / .4) + (4! + 4!)
54: (4! / 4) + (4! + 4!)
55: ((4! - .4) / .4) - 4
56: 4! + (4 * (4 + 4))
57: ((.4 + 4!) / .4) - 4
58: (4 / .4) + (4! + 4!)
59: (4! / .4) - (4 / 4)
60: (4 * (4 * 4)) - 4
61: (4! / .4) + (4 / 4)
62: (4 * (4 * 4)) - sqrt(4)
63: ((4! - .4) / .4) + 4
64: (4 + 4) * (4 + 4)
65: ((.4 + 4!) / .4) + 4
66: ((4! + 4) / .4) - 4
67: ((4! + 4) / .4bar) + 4
68: 4 + (4 * (4 * 4))
69: ((4! + 4) - .4) / .4
70: (4! / .4) + (4 / .4)
71: (.4 + (4! + 4)) / .4
72: 4! * (4 - (4 / 4))
73: (sqrt(.4bar) + (4! + 4!)) / sqrt(.4bar)
74: ((4! + 4) / .4) + 4
75: (4! + (4! / 4)) / .4
76: ((4! - 4) * 4) - 4
77: ((4! - .4bar) / .4bar) + 4!
78: ((4! - 4) * 4) - sqrt(4)
79: ((4! - sqrt(4)) / .4) + 4!
80: 4 * (4 + (4 * 4))
81: ((4! / .4) - 4!) / .4bar
82: sqrt(4) + ((4! - 4) * 4)
83: ((4! - .4) / .4) + 4!
84: ((4! - 4) * 4) + 4
85: ((4 / .4) + 4!) / .4
86: (4! * 4) - (4 / .4)
87: (4! * 4) - (4 / .4bar)
88: (4! * 4) - (4 + 4)
89: ((sqrt(4) + 4!) / .4) + 4!
90: (4! * 4) - (4! / 4)
91: (4! * 4) - (sqrt(4) / .4)
92: (4! - (4 / 4)) * 4
93: (4! * 4) - (sqrt(4) / sqrt(.4bar))
94: ((4! + 4) / .4) + 4!
95: (4! * 4) - (4 / 4)
96: 4! * ((4 + 4) - 4)
97: (4 / 4) + (4! * 4)
98: .4 + ((.4 + 4!) * 4)
99: ((4! + 4!) - 4) / .4bar
100: (4! + (4 / 4)) * 4
101: (sqrt(4) / .4) + (4! * 4)
102: (4! / 4) + (4! * 4)

1: 5 / (5 + (5 * (5 - 5)))
2: ((5 + (5 + 5)) - 5) / 5
3: (5 + (5 * 5)) / (5 + 5)
4: (5 + (5 + (5 + 5))) / 5
5: (5 + (5 + 5)) - (5 + 5)
⋮
154: (5! / 5) + (5! + (5 + 5))
155: 5 + (5 * (5 + (5 * 5)))
156: (5! + (5! * (.5 + 5))) / 5
157: (5 / (.5bar * (.5bar - .5))) - 5
158: (((5! / 5) - 5) / .5) + 5!

1: (6 + (6 + 6)) / (6 + (6 + 6))
2: (6 + (6 + (6 + 6))) / (6 + 6)
3: ((6 + (6 + (6 + 6))) - 6) / 6
4: (6 * (6 + 6)) / (6 + (6 + 6))
5: (6 + (6 + (6 + (6 + 6)))) / 6
6: 6 + (6 * ((6 + 6) - (6 + 6)))
⋮
881: 6! + (((.6bar * (6! + 6!)) + 6) / 6)
882: ((6! / 6) + 6) * (6 + (6 / 6))
883: ((.6bar + (6! / (.6bar + 6))) / .6bar) + 6!
884: (6! / (.6 * 6)) + (6! - (6 * 6))
885: (.6 * ((6 * 6) + (6! + 6!))) - .6
886: ((6! / (6 - .6)) * (.6 + 6)) + 6

I'll probably experiment with solutions that involve concatenation of digits next. That'll involve some fairly significant code changes, though. And I'll probably just end up duplicating the work of the definitive Four Fours answer key.


Phil! Gregory