Mon, 07 Mar 2005

UTF-8 and XEmacs

In order to display Unicode characters properly, I also had to set

(set-terminal-coding-system 'utf-8)

in my startup files. For input, I found set-input-method, which is bound to C-x RET C-\ (extra fun, because C-\ is my screen escape character; well, at least it's not a command I'll be using very often). I found I liked the latin-1-alt-postfix input method. Latin-1 characters are entered via two-key sequences such as "a'" for "á". The input method marks candidates for replacement by underlining them and shows the possible second characters in the minibuffer. (That's the "alt" part. Normal postfix doesn't prompt.)
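For the record, here's roughly what that looks like in a startup file. This is a minimal sketch, not a tested recipe: the default-input-method line is my assumption about how to avoid choosing the method by hand every session, and I haven't checked that it behaves identically in XEmacs and GNU Emacs.

```elisp
;; Tell Emacs that the terminal it is drawing on expects UTF-8.
(set-terminal-coding-system 'utf-8)

;; Assumption: pre-select the input method so that toggle-input-method
;; switches straight to it instead of prompting for a name.
(setq default-input-method "latin-1-alt-postfix")
```

Since toggle-input-method normally lives on C-\, anyone else using that key as a screen escape will probably want to call it via M-x or rebind it.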

There's no real support for general Unicode, aside from switching input methods among the various charsets. Reputedly, XEmacs 22 will have much better Unicode support. With luck, it'll include an RFC 1345-compliant input method.

Addendum: XEmacs still has a habit of trashing some characters when saving files. (The characters get replaced by ASCII question marks.) This happens to a lot of characters.

Plain old GNU Emacs, by contrast, gets almost everything right. I did have to make the same set-terminal-coding-system call as with XEmacs, but that appears to be the only necessary config change. About the only thing I see wrong is that it seems to be using strlen() (or its equivalent) to determine line lengths, so lines with UTF-8 characters get wrapped prematurely. When selecting input methods, it seems to want to default to one named "rfc1345" (yay!), but there's no method by that name (boo). Perhaps it's in another Debian package.
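To illustrate the strlen() guess about premature wrapping: a byte count and a character count disagree as soon as a line contains anything outside ASCII. The snippet below is only a sketch of that mismatch, evaluated in GNU Emacs (M-: or the *scratch* buffer); it says nothing about where the actual wrapping bug lives.

```elisp
;; "\u00e1" is the single character á (U+00E1).
(let ((s "\u00e1"))
  (list (length s)          ; number of characters
        (string-bytes s)    ; bytes in Emacs's internal representation
        (string-width s)))  ; columns the string occupies on screen
```

In a current GNU Emacs this should come out as (1 2 1): one character and one column, but two bytes. Code that measures a line in bytes will think it's longer than it looks and wrap it early.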

Next geek project: See how well my carefully-customized XEmacs environment transfers to GNU Emacs.


Phil! Gold