UTF-8 and XEmacs

In order to display Unicode characters properly, I also had to set

(set-terminal-coding-system 'utf-8)

in my startup files.  For input, I found set-input-method, which is bound to C-x RET C-\ (which is extra fun, because C-\ is my screen escape character; well, at least it’s not a command I’ll be using over-frequently).  I found I liked the latin-1-alt-postfix input method.  Latin-1 characters are entered via magic two-key sequences such as “a’” for “á”.  It indicates candidates for replacement by underlining them and showing the possible second characters in the minibuffer.  (That’s the “alt” part.  Normal postfix doesn’t prompt.)

There’s no real support for general Unicode, aside from switching input methods among the various charsets.  Reputedly, XEmacs 22 will have much better Unicode support.  With luck, it’ll include an RFC 1345 compliant input method.

Addendum: XEmacs still has a habit of trashing some characters when saving files.  (The characters get replaced by ASCII question marks.)  This happens to a lot of characters.  Plain old GNU Emacs gets almost everything right.  I did have to do the same thing with set-terminal-coding-system as with XEmacs, but that appears to be the only necessary config change.  About the only thing I see wrong is that it’s using strlen() (or its equivalent) to determine line lengths, so lines with UTF-8 characters get wrapped prematurely.  When selecting input methods, it seems to want to default to one named “rfc1345” (yay!), but there’s no method by that name (boo).  Perhaps it’s in another Debian package.
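The premature wrapping is what you would expect if line length is counted in bytes rather than in displayed columns.  Here is a minimal Python sketch of the difference; the function names are mine, and the width rule (combining marks take no column, East Asian wide characters take two) is a simplifying assumption, not how either Emacs actually computes it.

```python
import unicodedata

def byte_length(text):
    """Length as strlen() would see it: the number of UTF-8 bytes."""
    return len(text.encode("utf-8"))

def display_width(text):
    """Rough number of terminal columns the text occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks stack on the previous character
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        width += 2 if wide else 1
    return width

accented = "\u00e0\u00e9\u00ee\u00f5\u00fc"  # five precomposed Latin-1 letters
print(byte_length(accented), display_width(accented))  # 10 5
```

A line made entirely of such characters would look twice as long to a byte-counting editor as it does on screen.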

Next geek project: See how well my carefully-customized XEmacs environment transfers to GNU Emacs.


UTF-8

Last weekend I was feeling both bored and geeky, so I did something I’d been meaning to do for a while: I switched to UTF-8.  I’m running Debian unstable, and the transition was relatively painless, though I did run into some problems.

Markus Kuhn’s Unicode page proved invaluable for both theory and practice, as did his UTF-8 example files.  Also of use was Radovan Garabík’s Debian howto for switching to UTF-8.

I got rid of gnome-terminal, sadly, and went back to vanilla xterm.  There were some aspects of UTF-8 that gnome-terminal didn’t support (combining characters, notably), and there wasn’t a good Unicode font that it could use.  (The only monospace font with any sort of reasonable coverage was FreeMono, which looks horrible.  Terminus was actually decent in the Latin-1 sections, but I’d need more than 1152x864 to use it the way I’d want.  The fact that gnome-terminal refused to use traditional X fonts is a separate rant.)  I’m using xterm as xterm -fn '-misc-fixed-medium-r-semicondensed-*-*-120-*-*-*-*-iso10646-1'.

screen supports UTF-8 nicely.  I merely set defutf8 on in my .screenrc.  Debian has a separate package for mutt with UTF-8 support; it’s mutt-utf8.  Once installed, it diverts existing mutt binaries to mutt.ncurses, so just typing mutt works.  irssi happily handled UTF-8 without any intervention from me.  In order to get w3m working, I had to compile and install w3m-m17n.

XEmacs seems uneasy with the whole thing.  I’m using xemacs21-mule and I have

(require 'un-define)
(set-coding-priority-list '(utf-8))
(set-coding-category-system 'utf-8 'utf-8)

in my startup files.  That enables UTF-8 support and autodetects files that already have UTF-8 characters in them.  I still need to figure out how to open other files as UTF-8 (default translation still seems to be ISO-8859-1).  I also need to look at the displaying of Unicode characters.  XEmacs is running in screen in a UTF-8-aware xterm, so things should display properly, but most Unicode characters are displayed as tildes.  Finally, it appears that the easiest way to enter Unicode characters is to call the function insert-ucs-character and type in the decimal (not hex!)  number of the Unicode codepoint.  Addendum: XEmacs does seem to actually mess up some UTF-8 characters.  Sigh.
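Since code charts list codepoints in hex and insert-ucs-character wants decimal, a tiny converter helps.  This is just a Python sketch; decimal_codepoint is a name I made up for illustration.

```python
def decimal_codepoint(hex_codepoint):
    """Turn 'U+00E1' or '00E1' into the decimal number of the codepoint."""
    cleaned = hex_codepoint.strip().upper()
    if cleaned.startswith("U+"):
        cleaned = cleaned[2:]
    return int(cleaned, 16)

print(decimal_codepoint("U+00E1"))  # 225
print(decimal_codepoint("16A0"))    # 5792
```

So U+00E1 (á) would be typed as 225.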

I played a little with other editors to see what I could do with them.  yudit seems the best of the lot, but it’s GUI-only.  qemacs doesn’t look too bad, but it had some problems detecting UTF-8 documents, which led to munged characters when I saved.  And apparently vim has excellent UTF-8 support.  Figures.

zsh does not support UTF-8.  (Though it’s one of two items in the TODO list.)  It passes things through literally enough that you can paste UTF-8 into a command line and have the app handle it, but you can’t edit Unicode on the command line.  It also doesn’t deal properly with the size of UTF-8 characters, so no UTF-8 in my shell prompt.

Just for the fun of it, I switched the blog pages here over to UTF-8.  I might switch the web server as a whole, but that could break some of my text files.  It doesn’t really matter, because I generally use the HTML entities for non-ASCII characters, anyway.  Why type “àéîõü” (UTF-8) when I can use “&agrave;&eacute;&icirc;&otilde;&uuml;” (HTML entities) and be much more portable?  Of course, UTF-8 does let me put in things like “ᚠᛁᚢᛖ᛫ᚠᛟᛏ᛫ᚻᛁᚷᚻ᛫ᚦᛖ᛫ᛞᚩᚱ᛫ᚨᚾᛞ᛫ᚦᚱᛟ᛫ᛗᚨᚤ᛫ᚹᚨᛚᚴ᛫ᚨᛒᚱᛖᚨᛋᛏ”.  }:>
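For the curious, converting between the two forms is mechanical.  The sketch below uses Python's standard html module; to_entities is a hypothetical helper of mine, and it deliberately leaves ASCII alone (so it does not escape "&" or "<").  Characters without a named entity, such as the runes, fall back to numeric references.

```python
from html import unescape
from html.entities import codepoint2name

def to_entities(text):
    """Replace non-ASCII characters with HTML entities."""
    parts = []
    for ch in text:
        codepoint = ord(ch)
        if codepoint < 128:
            parts.append(ch)  # plain ASCII passes through untouched
        elif codepoint in codepoint2name:
            parts.append("&%s;" % codepoint2name[codepoint])
        else:
            parts.append("&#%d;" % codepoint)  # no named entity available
    return "".join(parts)

accented = "\u00e0\u00e9\u00ee\u00f5\u00fc"
print(to_entities(accented))  # &agrave;&eacute;&icirc;&otilde;&uuml;
print(to_entities("\u16a0"))  # &#5792;
```

Going the other way, unescape() from the same module turns either kind of entity back into the character.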


More messed up MTA with no communication.

I haven’t had problems with mass transit in a week or two.  So I was about due, right?  Right.

Went to catch the Light Rail on my way home from work.  Based on the timing, I figured I’d catch the 6:24 scheduled train from Timonium Fairgrounds.  From a distance, I watched a train go through at 6:19.  Based on the track signals and the fact that trains are almost never that early, I figured it was just really late and that another would be through shortly.

This turned out to be true; another one ran by at 6:34, presumably the 6:24 scheduled one.  I was assisting a couple of out-of-town women with directions and they hadn’t finished buying their tickets, so I said, “Oh, there should be another one in ten to twenty minutes.” (Reasoning that the next scheduled train (6:44) probably wouldn’t be more than ten minutes late itself.  More the fool I.)

Over the next twenty minutes, three trains went by heading north.  At 6:57, another one came by going south.  It drove right through the station without stopping while sporting a sign that said “not in service”.  Another train went by heading north at 7:12.  Finally, at 7:18, almost an hour after I’d gotten there and 45 minutes behind the last train, another train came by to pick us up.

At no point was there any use made of the loudspeakers at the stop, nor does there appear to have been any announcement made via the MTA’s website or mailing lists.  The transit police officer at the station didn’t know what was going on.  (Though the delays didn’t stop her from doing a fare check once the train was underway.)


Reply-To: for Mailing Lists

Many mailing lists I’m on (particularly ones inhabited by sizable portions of geeks) have periodic “discussions” about the behavior of the Reply-To header on list email.  The discussions usually follow fairly predictable paths.  “Reply-To” Munging Considered Harmful is quoted.  Proponents respond with Reply-To Munging Considered Useful.

My opinion is threefold:

First, not touching the header is the purest solution.  For all the reasons in “Reply-To Munging Considered Harmful”, mailing lists shouldn’t touch it, in an ideal world, at least.

Second, “Reply-To Munging Considered Useful” gets one thing right: it adds functionality.  Even today, most mail clients can either “reply” or “reply-to-all”.  If “reply” goes to the message originator, then users are left with “reply-to-all” as the easiest way to send a message to the list.  (And sending mail to the list should, in most cases, be easy.)  Some MUAs, such as mutt and KMail [please let me know if you know of others], add a “list-reply” function.  If most common MUAs had such a feature, I would find it much easier to advocate against munging.  But they don’t, and it’s rather elitist to insist that people switch MUAs just for your mailing list.  So, in today’s world and for an average list, I tend to vote for munging the header.  (Though I really like the (void) mailing list, which provides twin lists, one with munging, the other without.  All posts do, of course, appear on both lists.)

(Note that “reply-to-all” is not the same as “list-reply”.  “list-reply” sends one message: it goes to the list (unless the original email has a Mail-Followup-To: header, but that’s another post).  “reply-to-all” sends one message to the list, plus one message for each sender and recipient of the original email.  That adds up quickly for involved discussion threads, and all of those extra messages are wastes of both time (human and computer) and bandwidth.)

Third, I don’t have to care either way, because I use mutt and procmail.  (And, actually, the procmail part is optional.)  In my .procmailrc is the following recipe:

:0 fhw
| sed -e "s/^\\(Subject:.*\\)\\[$MATCH\\]\\( \\)*/\\1/I" \
      -e "/^Reply-To:.*$MATCH@/ d" \
      -e "s/^Old-Reply-To:/Reply-To:/"

$MATCH is already set to the name of the mailing list, which is usually also the local portion of its email address.  (Note that this loses information if the list doesn’t munge the header and someone has their own Reply-To header that happens to have the same local part as the mailing list.  In my experience, this is highly unlikely.)
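For anyone who doesn’t read sed fluently, here is roughly the same filter as a Python sketch.  It is an illustration only, not what I actually run: clean_headers and its arguments are names I made up, and unlike the sed version it escapes the list name before using it in a pattern.

```python
import re

def clean_headers(header_lines, list_name):
    """Undo list munging on header lines, mirroring the three sed expressions."""
    name = re.escape(list_name)
    # 1. Strip the "[listname] " tag from the Subject (case-insensitive).
    subject_tag = re.compile(r"^(Subject:.*)\[" + name + r"\] *", re.IGNORECASE)
    # 2. A Reply-To pointing at the list is one the list software added.
    munged_reply_to = re.compile(r"^Reply-To:.*" + name + "@")
    cleaned = []
    for line in header_lines:
        line = subject_tag.sub(r"\1", line, count=1)
        if munged_reply_to.search(line):
            continue  # drop the munged header entirely
        # 3. Restore the poster's own Reply-To if the list saved it.
        line = re.sub(r"^Old-Reply-To:", "Reply-To:", line)
        cleaned.append(line)
    return cleaned

headers = [
    "Subject: [geeks] Re: UTF-8",
    "Reply-To: geeks@example.org",
    "Old-Reply-To: alice@example.org",
    "From: alice@example.org",
]
for line in clean_headers(headers, "geeks"):
    print(line)
```

With the made-up headers above, the Subject loses its tag, the list’s Reply-To disappears, and Alice’s original Reply-To comes back.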

My mutt config contains the following directives:

set ignore_list_reply_to=yes
set reply_to=yes
subscribe <list addresses>

so if a Reply-To header is set to an email address that mutt knows to be a mailing list, it ignores that header.  (And since those are ignored, I set “reply_to” to yes, so it always uses acceptable headers; by default it asks whether I want to use them.)  Thus, the “r” key always replies to the person who sent the message, the “g” key always replies to the sender and all recipients of the message, and the “L” key sends the reply to the mailing list the email was from (and any other addresses in the Mail-Followup-To header).
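To make the three keys concrete, here is a toy Python model of the behavior I just described.  It is emphatically not mutt’s actual logic; reply_recipients, the dictionary layout, and the addresses are all invented for the example.

```python
def reply_recipients(message, key, known_lists):
    """Return the addresses a reply goes to for the "r", "g" or "L" key."""
    recipients = message.get("To", []) + message.get("Cc", [])
    reply_to = message.get("Reply-To")
    if reply_to in known_lists:
        reply_to = None  # a Reply-To naming a known list is ignored
    author = reply_to or message["From"]

    if key == "r":
        targets = [author]
    elif key == "g":
        targets = [author] + recipients
    elif key == "L":
        targets = [addr for addr in recipients if addr in known_lists]
        targets += message.get("Mail-Followup-To", [])
    else:
        raise ValueError("unknown key: %r" % key)

    unique = []
    for addr in targets:  # drop duplicates, keep first-seen order
        if addr not in unique:
            unique.append(addr)
    return unique

message = {
    "From": "alice@example.org",
    "To": ["geeks@example.org"],
    "Cc": ["bob@example.org"],
    "Reply-To": "geeks@example.org",  # munged by the list
}
lists = {"geeks@example.org"}
print(reply_recipients(message, "r", lists))  # ['alice@example.org']
print(reply_recipients(message, "L", lists))  # ['geeks@example.org']
```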


New Writings in SF 7

As far as I can tell, New Writings in SF 7 hasn’t been published since before ISBNs were adopted.  (Hence, no Open Library link.)  It’s a collection of short stories from authors that were, in 1966, “major new writers”.  Like many such collections, some stories are good while others are not.  The collection is, on balance, decent.

The first two stories, “The Pen and the Dark” and “Gifts of the Gods”, are typical science-driven stories of the era.  The main characters are all male and serve merely to advance some particular scientific speculation.  The third, “The Long Memory”, I found to be too scrawny a story with a too-abrupt ending.  From there on, things get better, however.  “The Man Who Missed the Ferry” is probably my favorite of the set, with a just-slightly-surreal approach to things.  “The Night of the Seventh Finger” is rather moving, and I enjoyed its characterization.  “Six Cubed Plus One” was also good, if a little forced at times.  “Defense Mechanism” was interesting in its depiction of a future Earth.

It’s a thin volume, and I’d say that “The Man Who Missed the Ferry” alone was worth the time taken to read the whole book.