Fun With Unicode

A common annoyance when typing science-y things is that you often want to use symbols that aren’t available on standard keyboards: Greek letters, math symbols like ±, or em-dashes (if you’re the kind of person who thinks em-dashes are cool). It turns out that nowadays you can type all of these symbols and more with relative ease, and with good confidence that other people will be able to see them, thanks to the Unicode standard. By now, not only is basically everything that works with text on a computer Unicode-enabled, but people have also gotten their act together about entering Unicode characters that aren’t on your keyboard. I’ve found two mechanisms to be helpful for this.

The first is a “compose key”, which you can set up in your keyboard settings. This is a physical key on the keyboard that you tell the OS to interpret specially; I like to use the “context menu” key that lives near the right Control key. The compose key is a “do what I want” kind of feature: you tap it, then type a short sequence of ordinary keys, and the OS replaces that sequence with a single special character. It’s best illustrated by example:

Every graphical application that I’ve tried supports input this way. This table lists some of the many special characters you can construct with a compose key.

Some useful special characters aren’t supported by the compose key, though, including most of the Greek letters. It turns out that most programs that allow text input have a way of entering an arbitrary Unicode character if you know its Unicode code point. Code points are hexadecimal numbers, conventionally written with a “U+” prefix and at least four digits (four is enough for every character in this post). For instance, the code for the Greek letter alpha is U+03B1. I’ve only needed to figure out two ways to use this information:

Obviously, we’ve all got better things to do than memorize Unicode code point values, but it’s the kind of information that goes well on a Post-it note on one’s monitor. The Greek letters in particular go in sequence (β is U+03B2, γ is U+03B3, and so on), so one can guess and check a little bit. Another one that I like to use is U+2022, the bullet: •.
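If you have a Python prompt handy, the built-ins `chr` and `ord` convert between a code point and its character, which makes the guess-and-check approach quick. The sketch below is only an illustration (the helper name `u_plus` is made up for this example); it prints the lowercase Greek letters from their contiguous block. One wrinkle worth knowing: U+03C2 is the final sigma ς, so σ sits at U+03C3 rather than where a naive letter count would put it.

```python
def u_plus(char: str) -> str:
    """Format a single character's code point in the conventional U+XXXX style."""
    return f"U+{ord(char):04X}"


# Code point -> character, and back again.
alpha = chr(0x03B1)
print(alpha, u_plus(alpha))  # α U+03B1

# Lowercase Greek runs from U+03B1 (α) to U+03C9 (ω), inclusive.
# U+03C2 is the final sigma (ς), which pushes σ to U+03C3.
greek_lower = "".join(chr(cp) for cp in range(0x03B1, 0x03CA))
print(greek_lower)

# The bullet character.
print(chr(0x2022), u_plus(chr(0x2022)))  # • U+2022
```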

With all of these skills in hand, one can go from typing “alpha = 0.5 +- 0.1 (reduced chi-squared = 1.2)” to “α = 0.5 ± 0.1 (reduced χ² = 1.2)”. Hooray!
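When the text is being generated by a program instead of typed, Python offers a way around memorizing code points entirely: string literals accept `\N{...}` escapes containing a character’s official Unicode name, and the standard `unicodedata` module maps characters back to names. A small sketch that builds the “after” string this way:

```python
import unicodedata

# \N{...} escapes look characters up by their official Unicode names.
pretty = (
    "\N{GREEK SMALL LETTER ALPHA} = 0.5 \N{PLUS-MINUS SIGN} 0.1 "
    "(reduced \N{GREEK SMALL LETTER CHI}\N{SUPERSCRIPT TWO} = 1.2)"
)
print(pretty)  # α = 0.5 ± 0.1 (reduced χ² = 1.2)

# unicodedata goes the other way: report the code point and name of each
# non-ASCII character, handy for filling in that Post-it note.
for char in pretty:
    if not char.isascii():
        print(f"U+{ord(char):04X}", unicodedata.name(char))
```

The loop prints U+03B1 GREEK SMALL LETTER ALPHA, U+00B1 PLUS-MINUS SIGN, U+03C7 GREEK SMALL LETTER CHI, and U+00B2 SUPERSCRIPT TWO, one per line. `unicodedata.lookup("PLUS-MINUS SIGN")` does the name-to-character direction at run time.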

Technical note. You can embed Unicode characters into string constants in Python programs if you give the interpreter a hint that your program is UTF-8-encoded. One way to do this is to put an Emacs-style modeline on the first or second line of the file that reads

# -*- mode: python; coding: utf-8 -*-

More info in PEP 263. (This matters for Python 2; Python 3 treats source files as UTF-8 by default, per PEP 3120.)
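A minimal module along those lines might look like the following. It assumes Python 3 string semantics, where the declaration is optional; the `\u` escapes in the second string are an ASCII-only alternative that needs no declaration at all. The variable names are just for illustration.

```python
# -*- mode: python; coding: utf-8 -*-
# The coding declaration must be on the first or second line (PEP 263).

# Non-ASCII characters typed directly into a string constant.
fit_summary = "α = 0.5 ± 0.1 (reduced χ² = 1.2)"

# The same string spelled with ASCII-only \u escapes.
fit_summary_escaped = "\u03b1 = 0.5 \u00b1 0.1 (reduced \u03c7\u00b2 = 1.2)"

assert fit_summary == fit_summary_escaped
print(fit_summary)
```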

Technical note #2. It turns out that the compose key definitions live in the file /usr/share/X11/locale/en_US.UTF-8/Compose, for my current setup at least.
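Since that file is plain text, it is easy to search programmatically. Definition lines pair a sequence of keysyms with an output string, in the form `<Multi_key> <plus> <minus> : "±" plusminus # PLUS-MINUS SIGN`, where `Multi_key` is the compose key itself. The sketch below builds a reverse index from output character to key sequences, so you can ask how to type a given symbol. The function names are hypothetical, the `sample` lines are illustrative rather than copied from any particular file, and the regular expression only handles this common `<keysym> ... : "string"` form, not every construct the Compose format allows.

```python
import re
from collections import defaultdict

# Matches lines like: <Multi_key> <plus> <minus> : "±" plusminus # PLUS-MINUS SIGN
COMPOSE_LINE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')
KEYSYM = re.compile(r"<([^>]+)>")


def parse_compose(lines):
    """Map each output string to the list of key sequences that produce it."""
    index = defaultdict(list)
    for line in lines:
        match = COMPOSE_LINE.match(line)
        if match is None:
            continue  # comment, blank line, or syntax this sketch doesn't handle
        keys = tuple(KEYSYM.findall(match.group(1)))
        # Undo simple backslash escapes such as \" and \\ in the quoted output.
        output = re.sub(r"\\(.)", r"\1", match.group(2))
        index[output].append(keys)
    return index


def load_compose(path="/usr/share/X11/locale/en_US.UTF-8/Compose"):
    """Parse a Compose file from disk (default path as on the setup above)."""
    with open(path, encoding="utf-8") as handle:
        return parse_compose(handle)


sample = [
    "# UTF-8 (Unicode) compose sequences",
    '<Multi_key> <plus> <minus>  : "±"   plusminus  # PLUS-MINUS SIGN',
    '<Multi_key> <minus> <plus>  : "±"   plusminus  # PLUS-MINUS SIGN',
    '<Multi_key> <asciicircum> <2> : "²" twosuperior # SUPERSCRIPT TWO',
]
index = parse_compose(sample)
print(index["±"])  # [('Multi_key', 'plus', 'minus'), ('Multi_key', 'minus', 'plus')]
```

On a machine where the file exists at that path, `load_compose()["±"]` would list the real sequences for ±; the path may differ between systems and locales.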
