Bayesian Blocks Analysis in Python

I wrote a Python implementation of the Bayesian Blocks algorithm. There’s already a good implementation out in public from Jake Vanderplas, but it had some issues that I’ll briefly mention below. I’m being a bad member of the community by writing a new version instead of improving the existing one, and I feel bad about that, but I needed to get something going quickly, and I might as well put it out there.

Bayesian Blocks algorithms are nice for binning X-ray lightcurves into sections of different count rates; or, you can think of them as providing dynamic binning for histogram analysis. J. Scargle has been the main person developing the theory, and has put together a very nice detailed paper explaining the approach in 2013 ApJ 764 167. The paper’s written in a Reproducible Research style, coming with code to reproduce all the figures — which is really awesome, but something to talk about another day.

The Scargle implementation is in Matlab, with an IDL implementation mentioned but accidentally not provided, as far as I can tell. Jake Vanderplas’ code in the AstroML package seems to be the main Python implementation. It’s very nice, but has a few issues with “time-tagged” events, which is Scargle’s term for the X-ray-style case of binning up rates from individual Poisson events with precise timing. In particular, the AstroML code doesn’t let you specify the start and stop times of the observation, which can have huge semantic consequences — e.g., if you observed two events at times 1.0 and 2.0, your interpretation is very different depending on whether you observed from times 0.0 to 3.0 or times -1000 to +1000. The Vanderplas code also doesn’t implement iteration on the “p0″ parameter or bootstrap-based assessment of bin height uncertainties, as suggested by Scargle.

My implementation has these features. It also has a fix for a mistake in the Scargle paper: equation 21 has a right-hand side of “4 – [term]” when it should be “4 – log [term]“. The new module is here on GitHub, and a script for doing a blocks analysis on a Chandra events file is here. Docstrings, tests, etc, are lacking, because as mentioned before I’m a bad community member. But if you want to fool around with the code, there it is.

Typing Greek Letters Easily on Linux

I’ve already written about easily entering special characters on Linux using the “Compose Key”. The only inconvenient thing about this setup was entering Greek letters — they’re not included in the default list of compositions. I’ve learned a few of the raw Unicode values (Ctrl-Shift-u 0 3 c 3 for σ) but that’s not exactly ideal.

You can customize the composition list to include things like Greek letters, with some limitations. Here’s the recipe for Gnome 3 on Fedora:

  • Copy the default mapping file, /usr/share/X11/locale/en_US.UTF-8/Compose, to ~/.XCompose.
  • Edit the file to include your new mappings — the format should be obvious. Here are my new entries that add Greek letters (γαγ), math blackboard bold capitals, and the thin nonbreaking space. I prefixed the Greek alphabet with “g”, so that Compose-g-b gives β.
  • You must be using the xim input method for the file to take effect. On my current Gnome/Fedora setup, I just had to run the “Input Method Selector” program and choose “Use X Compose table” from the list of options.

That’s more or less it. However, the settings don’t get applied consistently — there seems to be a conflict between the way that the Gnome Shell wants to do input and this customization. If you start a program from the terminal, custom settings seem to take effect, but if you launch it from the Shell directly, they might not. I haven’t yet found a way to get everything on the same page.

Hack: after login, run imsettings-reload ; gnome-shell --replace &  in a terminal.

Note: I initially put some example blackboard bold capitals in this post. They showed up OK in the editor, but the saved post was truncated right where I put the capitals. So there must be some bugs in WordPress’ Unicode handling.

Note 2: I initially had Compose-~-~ for a nonbreaking space, but it turns out Compose-<space>-<space> already does that.

Propagating Uncertainties: The Lazy and Absurd Way

I needed to write some code that does calculations and propagates uncertainties under a fairly generic set of conditions. A well-explored problem, surely? And indeed, in Python there’s the uncertainties package which is quite sophisticated and seems to be the gold standard for this kind of thing.

Being eminently reasonable, uncertainties represents uncertain variables with a mean and standard deviation, propagating errors analytically. It does this quite robustly, computing the needed derivatives magically, but analytic propagation still fundamentally operates by ignoring nonlinear terms, which means, in the words of the uncertainties documentation, that “it is therefore important that uncertainties be small.” As far as I can tell, uncertainties does analytic propagation as well as anything out there, but honestly, if your method can’t handle large uncertainties, it’s pretty useless for astronomy.

Well, if analytic error propagation doesn’t work, I guess we have to do it empirically.

So I wrote a little Python module. To represent 5±3 I don’t create a variable that stores mean=5 and stddev=3 — I create an array that stores 1024 samples drawn from a normal distribution. Yep. To do math on it, I just use numpy‘s vectorized operations. When I report a result, I look at the 16th, 50th, and 84th percentile points of the resulting distribution.

Ridiculous? Yes. Inefficient? Oh yes. Effective? Also yes, in many cases.

For instance: the uncertainties package doesn’t support asymmetric error bars or upper limits. My understanding is that these could be implemented, but they badly break the assumptions of analytic error propagation — an asymmetric error bar by definition cannot be represented by a simple mean and standard deviation, and an upper limit measurement by definition has a large uncertainty compared to its best value. But I can do math with these values simply by drawing my 1024 samples from the right distribution —skew normal or uniform between zero and the limit. I can mix perfectly-known values, “standard” (i.e. normally-distributed) uncertain values, upper limits, and anything else, and everything Just Works. (It might be hard to define the “uncertainty” on a complex function of a mixture of all of these, but that’s because it’s genuinely poorly-defined — analytic propagation is just misleading you!)

Another example: uncertainties spends a lot of effort tracking correlations, so that if x = 5 ± 3, then x - x = 0 precisely, not 0±4.2. My approach gets this for free.

I’ve found that approaching uncertainties this way helps clarify your thinking too. You worry: is 1024 samples big enough? Well, did you actually measure 5±3 by taking 1024 samples? Probably not. As Boss Hogg points out, the uncertainties on your uncertainties are large. I’m pretty sure that only in extreme circumstances would the number of samples actually limit your ability to understand your uncertainties.

Likewise: what if you’re trying to compute log(x) for x = 9 ± 3? With 1024 samples, you’ll quite likely end up trying to take the logarithm of a negative number. Well, that’s telling you something. In many such cases, x is something like a luminosity, and while you might not be confident that it’s much larger than zero, I can guarantee you it’s not actually less than zero. The assumption that x is drawn from a normal distribution is failing. Now, living in the real world, you want to try to handle these corner cases, but if they happen persistently, you’re being told that the breakdown of the assumption is a significant enough effect that you need to figure out what to do about it.

Now obviously this approach has some severe drawbacks. But it was super easy to implement and has Just Worked remarkably well. Those are big deals.

My New Snoozing Technique is Unstoppable

One of the great things about an academic job is flexibility in the hours. But this blessing can be a bit of a curse: it can be hard to get the day started when you don’t have to show up at work at any particular time. (Cue tiny violins, #richpeopleproblems, etc.) Like many people in this position, I’ve had a longstanding love/hate relationship with the snooze button on my alarm clock. I’d usually hit snooze four or five times and spend about 45 minutes half-awake feeling guilty for not really wanting to get up and at it.

Around a month ago I decided to try and get out of the rut. Unsurprisingly, the internet is full of advice on how to get out of bed in the morning, and equally unsurprisingly, most of it is dubious at best. Even less surprisingly, a lot of the advice is from self-appointed self-improvement gurus (whom I often find to be kind of bizarrely fascinating — they’re so weird!). One of these people is named Steve Pavlina, and I have to admit that I gleaned something from a blog post of his on the the topic. I didn’t take his advice (simulate waking up quickly during the day), but I did like his perspective: relying on sheer willpower is not the answer. Maybe it works for some people, but not for me, and it’s really not fun to start every day with a demoralizing lost battle against inertia.

I decided that I needed to find something fun and easy to do first thing every morning. Not necessarily something that’ll get me up and out of bed instantly — but something that I’ll look forward to doing and that’ll keep me awake as my brain warms up. I’m a bit of a nerd, so what fits the bill perfectly for me? Catching up on my favorite blogs.

Specifically, instead of checking them compulsively all day, I now read them all in one big burst in bed using my phone and an app called NewsRob. My alarm goes off, I hit snooze once, and the next time it rings I sit up and grab my phone. Catching up on everything takes about the same amount of time as my previous snoozing, but it’s actually fun and doesn’t make me feel guilty. As a side benefit, I’m no longer compulsively checking blogs all day!

People who aren’t me will presumably have different priorities. A friend of mine watches the Daily Show instead. (One advantage of that over the blogs is that it’s not always easy to read a lot of dense text while your head’s still fuzzy.) I’m sure you could come up with other options.

So far, the new system’s been working great. My mornings feel way better and what I’m doing is so easy that I don’t think backsliding is going to be a problem. Baby steps to a better life!

(apologies to David Rees for the title [←bad language])

Fun with SSH configuration: connection sharing

Because nothing says “fun” like SSH tricks!

More and more organizations that use SSH are restricting it through “login” (aka “bastion”) hosts: locked-down machines that are the only bridge between the wilds of the Internet and internal networks. These often ban SSH public-key authentication, making you type in your password for every login. This can get to be a hassle if you’re frequently logging into your internal network. (As a side note, the link above is to the best explanation I could find, but I imagine it’d still be confusing for a newbie. Someone needs to write clear pedagogical material on SSH with public keys!)

There’s a somewhat-new feature in OpenSSH that can make life easier, though. You can now configure it to automatically “share” preexisting connections: if you’ve got one connection going and try to start another one, it’ll reuse the authentication and set up the connection without needing a password. So if you keep your first connection going, you can open more of them for free. Clearly this won’t always be useful, but for me, it often is.

To set this up, put something like the following lines in ~/.ssh/config:

ControlMaster = auto
ControlPath = ~/.ssh/connshare.%h_%p_%r.sock
ControlPersist = no

Read the ssh_config manpage to learn about the meaning of these settings and what some possible alternatives are. Some versions of SSH don’t support the ControlPersist option, in which case you can just leave it out.

With this setup, you can log in once at the beginning of the day and not worry about typing your password until quittin’ time, if you don’t close the original session. Or, if you have a program that needs to operate over SSH but for some reason can’t deal with the password prompts, you can pre-authenticate the connection and skip them.

While I’m at it, a few other SSH tips:

  • If you log in to remote computers a lot, the single biggest favor you can do yourself is to learn how to use screen or tmux.
  • Despite the lack of good introductory materials, it’s also really worthwhile to learn how public-key authentication works, and to use SSH keys when appropriate.
  • … and a super useful thing about SSH keys is the ssh-agent program, which remembers your decrypted keys. This lets you skip passphrase entry without compromising security (at least by real-world standards).
  • You may know that ssh user@host command will run command on your destination. To chain SSH invocations, use ssh -t user@outerhost ssh -t innerhost. The -t option is needed for unimportant reasons related to password entry.
  • Finally, you can also use your ~/.ssh/config file to preset usernames, ports, X11 forwarding, etc. for specific hosts. See the manual page.

Reference: Running Chandra cscview on Fedora Linux

You’re supposed to access the Chandra Source Catalog with a Java applet, but it appears there’s no version of Java compatible with both the applet and Firefox. (In the sense that many Java versions have big security bugs and Firefox blocks them.) Thus although there are recommendations for using the applet on Fedora Linux I believe they are now inoperative. Here’s a solution.

  1. Download and install the latest Java 1.6 JDK. The JRE won’t cut it since we need the appletviewer program. The instructions above call for 32-bit, but I think that’s only for Firefox compat; 64-bit should be OK.
  2. Download a local copy of the HTML for the viewer applet. The official version specifies width="100%" which appletviewer can’t handle; I can’t think of a clever way to get around this.
  3. Edit the local copy to say something like width="640" height="480".
  4. But now we’ve lost the base URL, so create a subdirectory called client and download the jar files mentioned in the HTML, e.g., jsamp 1.0.0.0.
  5. Edit jdk*/jre/lib/security/java.policy and copy the java.security.AllPermission to apply to all applets. Yeah, we’re classy.
  6. Finally, run jdk*/bin/appletviewer file:///path/to/your/cscview.html .
  7. (Optional) Discover the cscview doesn’t do what you want. ☹

How to Increase App Space with a Partitioned SD Card on an HTC Nexus One Running Android 2.3.6 on Linux

This recipe written December 2012. Keep in mind that techniques vary a lot over time. And, yeah, sorry for the SEOriffic title.

I like a lot about the HTC Nexus One phone, but one of its annoying problems is that it has a tiny amount of storage space for apps. Maps, Twitter, Firefox, Go SMS Pro, and K-9 Mail are pretty much must-haves for me, and they alone almost fill up the phone’s “internal storage” for apps. It’s basically one-in-one-out for installing apps, and the constant “space is running low” messages are not only annoying but also indicate that the phone won’t perform certain important functions.

If you Google around, you’ll see various “app2SD” or “apps2SD” apps or scripts that promise to let you move apps to the SD card, which isn’t huge (4 GB for me) but is bigger than the 180 MB of the internal storage. There’s built-in support for this with some apps on Android 2.3.6, and some of the app2sd tools seem to be remnants from an earlier time. (Others are probably just trying to cash in on ignorant people.) You also see people talking about reformatting your SD card to add an ext-format partition to allow app storage there, which makes sense; the VFAT filesystem used by the SD cards by default doesn’t support some of the features needed for system files.

It took me a surprisingly long amount of time to find what seems to be the current best system: Super APP2SD by XDA Developers user TweakerL with assistance from several others. The basic approach is straightforward: in the early stages of system startup, a partition on the SD card is bind-mounted to a few magic app directories, redirecting their storage from the internal NAND memory to the SD. And that’s all you need to do!

As usual, the code written by the XDA people is a little dubious and scary, and the instructions are not great and also targeted at Windows users. So I’ve adapted their technique and documented what I did here. That being said, big thanks to TweakerL and friends for figuring this out and publishing the technique. The caveats:

  • You need a rooted phone.
  • Of course you run the risk of bricking your phone, killing all your apps, etc.
  • Apps become substantially slower to load and less responsive. The difference is noticeable and annoying. I still prefer having the freedom to actually install new apps, which I basically didn’t before, but this downside is significant.
  • The transfer is all-or-nothing, so you can’t keep a few apps on internal storage for faster loading.
  • The storage space reports on the phone become inaccurate.
  • None of your apps are available if your SD card is unavailable.
  • I’ve only been using this system for a little while, so there may be other issues that I haven’t yet appreciated.

Here’s how it’s done:

  1. Do some preparation.
    1. Root your phone, perhaps using my instructions. This has its own large set of caveats and hassles. For me, at least, the lack of app storage is an annoying enough issue that I’m happy that I went to the trouble of the rooting.
    2. You’ll need the Android SDK and the adb program. You just need the SDK, not the ADT bundle. My post on rooting has a bit more info on the installation.
    3. You also need to install Busybox on the phone to get a more powerful mount program for later. You install the Busybox app, then run it to actually install the needed files. It wanted to place all sorts of crazy programs (httpd??) that I disabled, but they’re probably fine. I think the only truly necessary tools may be mount and mknod.
  2. Back up everything, especially your apps (duh), their data, and your SD card.
  3. Add an ext3 partition to your phone’s SD card.
    1. You backed up your SD card, right? Because you’re about to wipe it.
    2. Somewhat surprisingly, you can do this while your phone is running. Connect your phone to your computer via USB and turn on USB storage. Don’t mount the VFAT partition.
    3. I added partitions using the very nice GNOME Disks program. I essentially shrank the first partition (by deleting and recreating it), changing it from 4 to 3 GB, and then I created a new ext3 partition of 1 GB. The volume labels aren’t important. The partitions should be primary/bootable. Android 2.3.6 can’t handle ext4, so use ext3. Make sure that the type of the VFAT partition is W95 FAT32; I had to change this manually after making the partition. You should be able to mount the VFAT and ext3 partitions on your computer after setting them up and verify their setup.
    4. Restore your backup of the VFAT files.
    5. If you disconnect the USB, your phone should reload the SD card and not notice any problems. If it does, reformat the card (which the phone will offer to do), then reconnect to your computer and figure out what you did wrong. For me, it was setting the partition type.
  4. Get the ext3 partition mounted on your phone.
    1. This part of the process involved some trial-and-error so I’m not sure what’s strictly necessary. I’m pretty sure that you need to reboot the phone to get the OS to notice the new SD partition.
    2. Turn on USB Debugging on the phone. Connect the USB and open up a shell using the Android SDK’s platform-tools/adb shell command. Use su to become root.
    3. Many of the built-in shell utilities are surprisingly braindead, even given that we’re running Unix on a tiny piece of plastic and metal. When in doubt, prefix commands with busybox to use the Busybox versions, which are generally smarter.
    4. Run busybox mount -o rw,remount / and busybox mount -o rw,remount /system to be able to monkey with the filesystem.
    5. You may also need to manually create device nodes to be able to mount the ext3 partition. The VFAT partition of the SD card is mounted from something like /dev/block/vold/179:1, but the Busybox mount program seems to cut off the filename at the colon. There seems to be an equivalent device node called /dev/block/mmcblk0p1, so I created /dev/block/mmcblk0p2 using busybox mknod and emulating the parameters of the VFAT partition.
    6. After the right magic has happened, I created a mount point at /mnt/sd-ext and was able to mount the partition with busybox mount -o nosuid,nodev,noatime /dev/block/mmcblk0p2 /mnt/sd-ext .
    7. Also create a mount point at /mnt/nand-data for later.
  5. Mirror the apps and data from the internal storage to the SD.
    1. The relevant internal directories are /data/app and /data/data. The XDA Developers approach also mirrors /data/media/Android, but I can’t find anything resembling this directory on my phone, so I suspect that it’s only present in newer Android versions.
    2. Create /mnt/sd-ext/data/app and /mnt/sd-ext/data/data and duplicate the contents of /data/app and /data/data using your favorite technique. You may need to use busybox cp and not just cp to get more useful options. Make sure to preserve permissions, ownership, etc.
    3. I touched /data/data/this_is_old and /mnt/sd-ext/data/data/this_is_new as sanity checks for later.
  6. Install the code for futzing with the mounts at boot time.
    1. The trick here is to intercept an invocation of /system/bin/debuggerd, which is apparently called early in the boot process.
    2. Move /system/bin/debuggerd to /system/bin/debuggerd.bin.
    3. Create /system/bin/debuggerd.shim with the following contents:
      #!/system/bin/sh
      /system/xbin/pkgw.earlymounts
      exec /system/bin/debuggerd.bin

      I did this using adb push onto the VFAT SD card then moving the files. You can’t push directly into /system/bin because of permissions. Give the file permissions and ownership mirroring debuggerd.bin.

    4. cd /system/bin ; ln -s debuggerd.shim debuggerd
    5. Create /system/xbin/pkgw.earlymounts with the following contents:
      #!/system/bin/sh
      
      export PATH=/sbin:/system/sbin:/system/xbin:/system/bin
      
      nanddata_dev=/dev/block/mtdblock5
      nanddata_mount=/mnt/nand-data
      sdext_dev=/dev/block/mmcblk0p2
      sdext_mount=/mnt/sd-ext
      
      busybox mount -o remount,rw /
      busybox mkdir -p $sdext_mount $nanddata_mount
      busybox mount $nanddata_dev $nanddata_mount
      busybox mount -o nosuid,nodev,noatime $sdext_dev $sdext_mount
      if test -f $sdext_mount/data/remount_is_safe ; then
        busybox mount -o bind $sdext_mount/data/data /data/data
        busybox mount -o bind $sdext_mount/data/app /data/app
      fi
      busybox mount -o remount,ro /

      It should be pretty apparent how it all works. It should also be apparent that the device paths and mount paths should be checked against the output of mount to be sure that everything is pointing in the right place. Set ownership and permissions on pkgw.earlymounts as with debuggerd.shim. (The mkdir -p above might not be needed? I don’t know what persists across boots. I also discovered that the Android shell doesn’t have [ as an alias for test by default.)

  7. Reboot the phone to verify that the shim works. If you log in with adb shell, the /data directory should be unchanged, but the /mnt/sd-ext and /mnt/nand-data should be mounted appropriately right from bootup.
  8. Activate the bind mounts.
    1. Enable them by touching /mnt/sd-ext/data/remount_is_safe.
    2. Reboot the phone.
    3. Along with the new mounts in /mnt, you should see that /data/data and /data/app are mounted from /dev/block/mmcblk0p2 now. Their contents should be identical and your phone should work. But /data/data should contain a file called this_is_new, not this_is_old. Your apps are now all living on your SD card!
    4. Check out how much storage your phone thinks it has free in the internal app storage space. Try installing a new, large app. The amount of free space should remain unchanged. (If nothing else has happened. Firefox’s hungry cache ate up some of my space and made me think something had gone wrong.)
  9. Clean up.
    1. Delete files in /mnt/nand-data/data and /mnt/nand-data/app to free up space on the internal storage. Make sure you’re working in the right directory! Try deleting something noncritical, then confirming that it still runs on the phone. Reboot again if that’ll make you feel better. Don’t delete critical-sounding apps and data in case your SD card goes missing. After doing this, my internal storage has 74 MB free, which should be enough for all sorts of temporary stuff.
    2. In the app manager, move apps back “from” the SD “to” internal storage — even though internal storage is now also on the SD. We’re just relocating things from the VFAT partition to the ext3 partition. This just makes it so you can still use those apps when you’ve got the VFAT mounted over USB.
    3. Go to the app store, go to My Apps, slide over to the All listing, and install all those old apps that you removed because of space restrictions.
    4. As a side benefit, you can now mostly back up your apps and their data directly over USB, rather than by having to use a dedicated backup program.

Big thanks to TweakerL and the XDA Developers folks for blazing the trail!

Update: Having been using my phone this way for the past few days, I’m still happy I did this, but I might consider only moving the app data, and not the apps themselves, to the SD card. There would still be a lot of space left in the system storage for a reasonable number of apps, and there’s a possibility it would help with the responsiveness issues I’m seeing.

Silence is Golden

Scientists love to write programs that are too chatty.

Chatty programs are eager to tell you what they’re doing and how they’re doing it. On the command line, this means you get a lot of output:

$ ./run-simulation.py input.dat output.dat
Science simulation by Peter Williams
Opening the input "input.dat" ...
   ... 300 items
Running simulation!
Step 1
DEBUG: n=12
Step 2
Step 3
Opening "output.dat" for writing ...
All done!
$

Contrast this with classic Unix tools, where silence implies success:

$ cp source.txt destination.txt
$

There are a lot of reasons why people write chatty programs: some good, some at least understandable. But I want to suggest that chattiness is something to be actively fought.

One concrete issue is that it’s hard to notice when chatty programs have problems. Showstopping errors are generally easy to pick out because they’ve, well, stopped the show, but I can’t tell you how many times I’ve missed some important warning message because it got scrolled off the screen by a bunch of inane diagnostics. And beginning programmers — which is what most scientists are — generally have trouble writing error-handling code and often write programs that charge ahead past conditions that should be showstoppers.

Less concretely but just as importantly, chatter is distracting. When I’m using a computer, I’m trying to get stuff done, and I get more done when I’m not wasting time thinking about low-level details: I’m in a line of work where my attention is one of my most valuable resources. A chatty program is like a needy employee who can’t make progress without coming back to you over and over for handholding and reassurance. Just do what I asked and stop bothering me unless you run into a problem you can’t solve. When I need to move files around in the terminal, I can bang out commands and I just know that everything is OK because I’m not seeing any messages. I have to stop and read the output of my simulation to see if it went wrong.

My advice is look at every line your program prints and ask yourself, do I really care? Vital outputs should be respected by not being surrounded by chatter, or by being recorded to disk instead of left to scroll off the terminal screen. Diagnostics should be separate: either optional and off by default, or output in some secondary way so that the “main” output is clean. Always keep in mind that every unhelpful message is drawing your precious attention away from the important ones.

For the beginners out there, I’d argue that reducing chatter also helps build confidence. One tends to have this irrational fear that the program usually works but maybe this time it’s derailed in some crazy way and is producing garbage. Seeing the same output over and over again is reassuring. But take those training wheels off! If you’re seeing the same output over and over, by definition you’re not learning anything new. To be confident in quiet programs, you do have to evaluate which parts of your program are the most likely to go wrong, and you do need to get good at handling error conditions. These are vital skills for competent programming, and the faster you develop them, the better.

Reference: Modern pyGTK+ stack on RHEL 5

Certain software projects like to target “enterprise”-class Linux distributions that have super old versions of certain key pieces of software. If, for instance, you want to actually use a somewhat modern GTK+ from Python, here’s the ordered list of dependencies that you’ll need to install:

  1. Python >= 2.6. (Versus the 2004-vintage Python 2.4 that comes by default on RHEL 5.) CASA bundles Python 2.6.5 and you need to use its Python to get its special libraries, so use that.
  2. libffi 3.0.11
  3. glib 2.34.2
  4. pixman 0.28.0
  5. cairo 1.12.6
  6. gobject-introspection 1.34.2
  7. gdk-pixbuf 2.26.5
  8. freetype 2.4.10
  9. harfbuzz 0.9.6
  10. fontconfig 2.10.1
  11. pango 1.32.2
  12. atk 2.6.0
  13. gtk+ 2.24.13
  14. pygobject 2.28.6 with this patch
  15. py2cairo 1.10.0 — uses “waf” build tool that may require the futzing mentioned at the end of this thread
  16. pygtk 2.24.0

Obviously, somewhat different versions are probably OK, though too-new stuff probably targets Python 3 and GTK+ 3 and opens a huge can of worms if your code is targeting version 2 of each of those.

Building everything is tedious but not generally difficult, though there isn’t much documentation about how to pass extra info to waf if it can’t guess the right flags for things.

Barycentric Julian Dates in Python

(I wrote some code to calculate BJDs in Python. Skip the next few paragraphs if you don’t want any context.)

I’ve been working on a problem that involves (somewhat) precise timing of astronomical events over the span of a year or two. Doing this right can be tricky since we generally learn about events by the light they give off, and this light can take varying amounts of time to reach the Earth from its origin; most notably, our motion around the sun changes the distance between us and sources by enough to affect the apparent time of an event by several minutes.

Of course, astronomers have known about this issue for a while, which is why we have Barycentric Julian Dates (BJDs). Times in astronomy are usually reported as some kind of “Julian date,” which is just a large number counting days since some reference value. (This is a much subtler task than you might think.) When we report a BJD, the apparent time of an event has been adjusted to be what it would be as if it we were observing at the barycenter of the Solar System, which is the coordinate origin of all modern, precise astronomical positional calculations. If the source of the events is stationary with respect to the barycenter, this gives us a steady clock with which to measure when each event happened.

To time precisely, then, we need to be able to compute the BJD of an event from its apparent time as measured at some telescope. Pulsar astronomers need to do these calculations to nanosecond precision, so there must be some ridiculously detailed code out there somewhere, but Google doesn’t bring up many options. Jason Eastman and collaborators at Ohio State University have a great resource with documentation, online calculators, and code. Their core seems to be based on a routine called BARYCEN by Eckart Göhler. Both reference documentation and some helper code from Craig Markwardt.

Unfortunately, all of this code is written in IDL. Let us never forget that 1) IDL is not free and 2) it is a lame, bad, unpleasant language. I generally do things in Python instead. I haven’t been able to find any Python libraries for BJD computations, so I added support to my library for precision astronomy in Python.

Now, both the library and the BJD support are halfassed. The library builds off of two well-vetted libraries, IAU SOFA and USNO NOVAS, but itself is not well-tested and only exposes a few routines I’ve needed — none of which has, in fact, required precision beyond the most basic levels. The BJD support, meanwhile, only corrects for the Earth’s orbit, and so is accurate to only 0.1 second or so: that is, a hundred million times worse than what pulsar astronomers need.

But it’s good enough for me! And if you want to compute BJDs in Python, hopefully you can build off of precastro rather than starting from scratch. It should be capable of extremely precise calculations even if there may be some wrinkles to work out, and as always I’ve tried to make the API as nice as possible.

Further References

These are basically copied from the resources linked above; I haven’t read them myself. Standing on the shoulders of giants and all that.

Extracting PDF Figures as PDFs in Linux

The perennial problem: you have a PDF of a paper, and you want to extract one of its figures to show in a talk or something.

If you’re a normal person, you open it up big in your PDF viewer, take a screenshot, and call it a day. Or your PDF reader just lets you draw a box and save its contents.

But maybe you’re running Linux and don’t have commercial software available; and maybe you want to make a vector screenshot, not a bitmapped one, so that full detail will be maintained and you can scale the figure around. Evince doesn’t support draw-a-box-and-save-it, so you’re sunk.

But you’re not! You’ll need xpdf, poppler-utils, and pdfcrop — all widely-available. You can draw a box on the page in xpdf, and you can also set up xpdf to run a command when you press a key, with the box parameters able to be substituted into the command line. Then the pdftocairo utility provided by poppler-utils can rerender the page to PDF while cropping to your selected box. pdfcrop makes the margins nice.

You need to put this short helper script in your $PATH. It takes the coordinates from xpdf, converts them to the system needed by pdftocairo, then chooses an output filename (fig${n}.pdf) and makes the magic happen. There’s nothing complex about it and it can easily be monkeyed with to fit your preferences.

Then put this line in ~/.xpdfrc:

bind ctrl-e any "run(xpdf-extract-helper.sh '%f' %p %x %y %X %Y)"

Straightforward enough, I hope. I chose Control-E as the keybinding for “extract”.

Now just open up your PDFs in xpdf, draw boxes, and hit Control-E! Here’s a sample result … converted to a PNG so that it can render inline, so that in a way it misses the whole point of the exercise. But there’s little PDF figure behind it, I swear.

McLean et al (2012 ApJ 746 23), fig 2. I assert without proof that I have a PDF version of this image.

Finishing Up at Berkeley

Well, I’ve just filed my dissertation with UC Berkeley, so I’m essentially done with my PhD! I’ve uploaded an electronic copy of my thesis here. I’ll be giving an exit seminar summarizing my work (and probably digressing a fair bit, because why not?) on Monday. It’s been a great time at Berkeley, though I’m also pretty excited to return to Harvard to work with Edo Berger. Onward!

A Short Bibliography of Structure Functions

“Structure functions” (SFs) are often used for the analysis of variability in astronomy. Today I spent some time tracking down some of the original references and I thought I’d record what I found for posterity.

It’s surprisingly difficult to find pedagogical references for SFs as used in the astronomical context. Lindsey and Chie below refer to them as “Kolmogorov structure functions,” and there’s a Wikipedia article on that topic, but it’s far more abstract than what astronomers would want: in fact, I’m pretty sure that it’s totally unrelated, but it’s so abstract that I can’t actually tell. The basic references define SFs in terms of “increments” on random processes but I can’t find a lot of explanatory material in that direction either. For pedagogical background, Lindsey and Chie below reference generic textbooks on random processes (e.g., “Theory of Stationary Random Functions,” A. M. Yaglom, Prentice-Hall, 1962).

One of the original references, if not the original reference, seems to be Kolmogorov (1941), in “Doklady Akademiia Nauk SSSR” 30 301, which is translated and available online in 1991 RSPSA 434 9. (This is the Kolmogorov turbulence paper.) Unsurprisingly it isn’t very helpful pedagogically.

The next step along one particular intellectual path is the apparent introduction of SFs into the engineering literature by Lindsey and Chie (1976) [DOI], with an overview in their Appendix I. A few years later Rutman (1978) [DOI] also presented SFs in a very similar context (in that paper’s Section VIII). I haven’t digested the papers but Rutman is much more highly-cited than Lindsey and Chie. Prokhorov et al (1975) [DOI] appear to have introduced SFs in the IEEE Proceedings first but don’t explain things as clearly.

It looks like SFs were then brought to astronomy by Rickett, Coles, & Bourgois (1984), though a close second are Simonetti, Cordes, & Heeschen (1985). The latter reference Lindsey & Chie and Rutman and have an appendix that gives a nice rundown of some of the basic useful properties of SFs in the same style as those. Rickett et al have a less helpful explanation.

After writing the first version of this bibliography, I found Emmanoulopoulos et al (2010), which gives a nice and thorough astronomical SF bibliography and also discusses caveats of SFs in astronomical usage. Another couple recent papers going into SF properties in various specialized astronomical circumstances are Hughes, Aller, & Aller (1992) and de Vries et al (2005). There are many more relevant recent papers out there than just these, though.

Poisson Distribution Confidence Intervals in Scipy

As happens often when doing transient searches, lately I’ve found myself wanting to compute confidence intervals in rate measurements from a small number of events. Clearly I want to be using the properties of the Poisson distribution, but how so, exactly?

The “Parameter Estimation” section of the above Wikipedia page gives some hints but involves a few notation switches and also special functions that don’t seem to be implemented in the handy scipy.special or scipy.stats modules. Once you track down the notation, though, it turns out to be really easy. We want the inverse incomplete regularized Γ function.

Let’s say you observe n events in a period and want to compute the k confidence interval on the true rate — that is, 0 < k ≤ 1, and k = 0.95 would be the equivalent of 2σ. Let a = 1 − k, i.e. 0.05. The lower bound of the confidence interval, expressed as a potential number of events, is

scipy.special.gammaincinv (n, 0.5 * a)

and the upper bound is

scipy.special.gammaincinv (n + 1, 1 - 0.5 * a)

Conversion to rates is then just a matter of dividing by the relevant duration, area, whatever. These apparently give the “exact” interval which may be minutely more strict than is necessary. The halving of a is just because the 95% confidence interval is made up of two tails of 2.5% each, so the gammaincinv function is really, once you chop through the obscurity, exactly what you want.

So in this example, n is 14, a is 0.05, and we divide by 400 to get the interval bounds as reported (0.019135 to 0.058724, in case that webpage disappears).

It took me a while to understand the numbers in the first table on this page, but they’re very helpful once you get over that: they give the 95% confidence limits for a given number of observed events, when that number is small. I.e., if I observe one event over some period, my 95% confidence interval is 0.0253 to 5.5616 events. These numbers don’t quite make sense on their own, since you can only get integral numbers of events in practice, but you can use them to compute rate limits that are perfectly meaningful.

(Once you think about it, the relationship between the expressions for the two ends of the interval is kind of cute. Because the Poisson distribution can only take on integral values, the 2.5% upper limit to the value n must be the 2.5% lower limit to the value n + 1, because there are no intervening values that can be chosen.)

Quick Note on Compiling CASA

Today I started compiling my own version of CASA to track down and destroy a bug that’s stalling my data analysis. (It’s in what should be a simple and robust subsystem, which is depressing, but that’s a whole other story.)

Any normal person will find building CASA to be an enormous challenge, but fortunately I have a ton of experience in this area and although things are incredibly tedious, they’ve been fairly tractable.

Excluding one huge pitfall that I fell into.

CASA requires a software package called ccmtools. If you Google it, you find a stale SourceForge page for the project. Do not go there. If you read down the Google results, you find Building CCMTools as required by CASA. Do not go there. These resources reference an older version of ccmtools which uses the worst build system I have ever seen and doesn’t even work anyway. I found a bit of gruesome fascination in the intersection of idiotic design, worthless documentation, and grossly incompatible versions, but that was all below a layer of bottomless, unfathomable rage. Don’t go there.

Instead, go to Compiling casapy from source in Ubuntu and follow the tip: there’s now a CASA fork of ccmtools which builds in a much saner (though still somewhat weird) way. For the love of all that is good, use it. I’m going to give it some more links so that hopefully Google reorders its results. Look, ccmtools for CASA! If you want to compile ccmtools, use the NRAO version of ccmtools!

As I mentioned, the rest of the CASA build process is still a huge pain, and would be impossible for the average person, which is too bad, but (so far?)  this has been by far the worst part.

Finally, to the guy who wrote Confix: please do not ever write code again.

Reference: Gaussian Modeling with MIRIAD maths

A note to self. Sometimes I want to use maths to generate an image with a large Gaussian component. Let’s say that the total flux I want to capture is F (in Janskys). Using what we know about normalized Gaussians the expression I want is something of the form \frac{F}{2\pi\sigma_x\sigma_y} \exp\left(-\frac{1}{2}\left[\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}\right]\right). Assuming circularity and simplifying, we get A\exp\left(B\left[x^2+y^2\right]\right), where A = \frac{F}{2\pi\sigma^2} and B=-\frac{1}{2\sigma^2}.

For now I’ve thought through the implementation using arcsecond units. Say we have a 2047×2047 image with a pixel scale of 10 arcsec that’s uniform across the image. If I want σ to measure the ATA primary beam, with a FWHM of 3.5°/GHz, I get σ = 5350.7 arcsec/GHz. To generate a model-type image, we need to work in Jy/pixel, so A must be scaled by the arcsec²/pixel conversion ratio, 100 in this case.

Implementing this in maths is straightforward. You use the xrange and yrange keywords to implement the Gaussian expression, ranging them from -10230 to +10230 in the example here.

Multimedia Abstraction

I want to call out something neat I saw on astro-ph the other day. Linda Watson and collaborators posted a paper, Properties of Bulgeless Disk Galaxies II. Star Formation as a Function of Circular Velocity, with the following comment:

23 pages, 13 figures, 4 tables. Accepted for publication in ApJ. For a brief video explaining the main results of this paper, see this http URL

Indeed, if you follow the link, you can see a nice 5-minute video of Linda explaining the gist of the work, a sort of video abstract.

I like to think that my reading comprehension skills are pretty OK but I often scan abstracts on interesting-sounding but unfamiliar topics only to later realize that I’ve absorbed absolutely nothing from them. I think the video alternative is great for this situation, for the same reason that we’ve got lectures as well as textbooks: there’s something about watching a person talk out an idea that’s evidently a lot easier for the brain to follow, even if reading is more efficient.

It’s also nice to put a human face on the work. Even for those topics that I know more about, where the written abstract doesn’t go in one ear (eye?) and out the other, I think it’d be fun to see the author’s body language and maybe glean a bit about their approach to the topic. Hopefully Linda’s paper will be the start of a trend!

Imaging Algorithms vs. Perfect Data

Fun fact: when I’m not rooting my phone or Linuxing my computer, I do some astronomy. I’ve been doing some data simulations and used the opportunity to look into the effect that different approaches to interferometric imaging have. The image below summarizes what I’ve found:

Residuals from point-source imaging

Using real ATA observations as a template, I generated some perfect fake data of a single point source — no noise, calibration errors, etc. I’m interested in wide-field imaging so I put the source at 0.63 deg from phase center. I then imaged the data using the casarest lwimager fork of the CASA imager with various options.

The top-left panel shows the imaging residuals using traditional grid-and-FFT and 200 iterations of Högbom (1974) clean. The S/N is just 1001 — for ideal data with no noise!

The top-right panel turns on the w-projection algorithm with 128 planes. The S/N jumps to 2987. I’m too lazy to measure it, but the speed penalty is nontrivial. Annoyingly, a bunch of time is spent in startup calculating convolution kernels that are constant from one imaging run to another in most cases. In the imaging that I do, a lot of time could be saved by caching those kernels on disk.

The bottom-left panel turns on the “wide-field” Högbom clean which subtracts the modeled CLEAN components from the visibilities. This also adds a significant time penalty. The S/N increases to 8713.

Finally, and most interestingly to me, in the bottom-right panel I’ve moved the source slightly to land it precisely in the center of the nearest image pixel. The S/N jumps to 34542. This effect is investigated in Cotton & Uson (2008).

Not shown: using Cotton-Schwab clean with w-projection gives results nearly identical to those of the wide-field Högbom clean. Wide-field Högbom should theoretically give more accurate results since it uses the full dirty beam in the image domain. In a quick test, CS clean is about 30% faster than wide-field Högbom.

Reference: How To Root an HTC Nexus One Running Android 2.3.6 on Linux

This recipe written in March 2012. Keep in mind that the techniques vary a lot over time.

I recently had to get my Nexus One phone repaired, which involved discovering 1) that to do so requires getting the phone’s data wiped and 2) you need a rooted phone to actually back up all of your data. It was too late to be able to back up my data, but as long as my phone came back wiped, I figured I’d make it so I was able to back things up thoroughly in the future. This is what I learned.

Rooting your phone is a pain in the ass. It took me a long time to figure out this recipe. Instructions online are usually some combination of confusing, outdated, inaccurate, or inappropriate. Most instructions reference XDA Developers Forum threads which are all of the above except more so, difficult to browse, and often filled with useless chatter. (I can see why they do the things they do, but there are a lot of drawbacks too.) Most of the rooting instructions are for Windows or Macs. The phone rooters using Linux are, in general, clearly not very familiar with the OS.

Rooting your phone voids your warranty and wipes your user data. Of course, if you still have a Nexus One, your warranty is probably long gone anyway. In some versions of Android, there are ways to gain root without losing your data, but my impression was that these are more difficult and rely on exploits that have been closed in Android 2.3.6. (By the way, Android 2.3.6 is in the “Gingerbread” series; as always, Wikipedia has a good table matching Android codenames and versions.)

That being said, here’s the recipe:

  1. Preparatory work
    1. Turn on data syncing on your phone in Settings → Privacy → “Back up my data”. I have no idea how long it takes for these data to get synced, so if the setting wasn’t on before, do this well in advance of actually rooting your phone.
    2. Install backup software on your phone and use it. These back up various pieces of phone data to your SD card, which can then be transferred to your computer over USB. MyBackup Pro (non-free) seems to be the consensus choice. The whole point of this exercise is that without root you can’t back up everything, but you can back up a lot of important stuff like SMSes and call logs. MyBackup Pro wants you to create some online account but you can skip that part. (The hands-down favorite for general backup is Titanium Backup, but it requires root access, which we don’t have yet.)
    3. Back up your phone’s SD card onto your computer using the USB cable. Free up at least 500 MB on the card to make room for backups. (The backup tool won’t run if you have less than this amount free, so it’s not like you can get away with less if you don’t have much installed.) These backups will include the phone data backups you just made inside rerware/ on the SD card (if you used MyBackup Pro).
    4. Install the Android SDK. Once the tarball has been unpacked, you need to run tools/android to get a GUI that will let you install many of the components actually needed to do stuff with the phone. Be sure to install the components for the correct version of Android API. Version 2.3.3/API 10 worked for me. You can skip all of the vendor-specific packages (some of which take a long time to download). In the end, the programs platform-tools/adb and platform-tools/fastboot should be available.
    5. Download a “recovery image,” which is a tool that lets you boot into an alternate OS that lets you monkey around with your phone. These all seem to be posted on XDA Developers. They have a Nexus One Recovery Images wiki page that allegedly collects such images. “Amon_Ra’s Recovery” version 2.2.1 worked for me: the file’s called recovery-RA-passion-v2.2.1.img and I got it off this forum thread. Just to be clear, many of the following steps assume that you’re running this recovery image or a very similar version thereof.
    6. Download a “superuser utility,” which is a combination of Android software (“APK”) and some low-level OS hooks that almost all applications build on to do rooty things. There seem to be a ton of variants. Almost everybody links to the version mentioned on this XDA thread, but the version described there is incredibly out of date. If you trace the downloads, they lead to this listing page, which has more recent versions. The file called Superuser-3.0.7-efghi-signed.zip worked for me. (It was at this point that I realized that not only are Android OS versions named after desserts, but those names proceed in alphabetical order.)
    7. Set up a udev file allowing the fastboot program to identify your phone. This seems a little silly but does appear to be necessary. I created a file named /etc/udev/rules.d/99-android.rulescontaining these lines:
      SUBSYSTEM=="usb", SYSFS{idVendor}=="0bb4", MODE="0666", OWNER="peter"
      SUBSYSTEM=="usb", SYSFS{idVendor}=="18d1", MODE="0666", OWNER="peter"

      The OWNER parameter should be substituted with your username. The syntax for these rules is apparently deprecated but whatever.

  2. Unlocking the bootloader
    1. Ensure that your SD card is backed up and you have 500 MB of free space on it. Prepare to lose all of your data.
    2. Power off your phone. Disconnect it from the USB cable if it’s plugged in. (I haven’t verified that you need to do this, but other people’s instructions seem to suggest this.)
    3. Boot it into the “fastboot” screen by holding down the trackball while pressing the power button. You should quickly get a little semi-textual menu screen.
    4. Plug your phone into your computer with the USB. In the Android SDK, run platform-tools/fastboot oem unlock. If this complains about not being able to find your phone, your udev rule may not be functional.
    5. A screen will pop up on your phone asking you if you Really Want To Do This. You do. Unless you don’t.
    6. Magic will happen and your phone will boot up. You’ll see a little open lock icon at the bottom of the splash screen indicating that the bootloader is unlocked.
    7. Reenter your Google account information, Wi-Fi connection password, and other basic stuff.
  3. Booting into the recovery image
    1. Get back to the fastboot menu: power off, disconnect USB, boot with trackball held, reconnect USB.
    2. Run platform-tools/fastboot flash recovery /path/to/recovery-RA-passion-v2.2.1.img. This should succeed without errors.
    3. In the fastboot menu, choose Bootloader.
    4. In the bootloader menu, choose Recovery.
    5. You should boot up into the special recovery mode. NB: I originally thought that after I flashed in the special recovery image (step 3B) I could boot into this special mode whenever I wanted. In my experience, it only works if you boot into recovery mode immediately after flashing the image. Otherwise I get a hard lock that requires popping out the battery to fix.
  4. Monkeying around in recovery mode
    1. Choose the “Nandroid backup/restore” menu option. Back up everything that seems reasonable. Since the bootloader unlock has just wiped your data, this won’t back up anything personal, but it’ll back up your phone OS in case you break things later.
    2. Go back to the main menu (volume down key) and turn on “USB-MS”. (The USB cable should still be plugged in from the fastboot flash steps.) This will activate your phone as a USB filesystem.
    3. Copy your nandroid backup off the SD card (nandroid/ and potentially .android_secure/).
    4. Copy your superuser utility onto the SD card.
    5. Unmount the phone on your computer and turn off USB-MS on your phone.
    6. In the main recovery menu, choose “Flash zip from sdcard”, and choose your superuser utility. It should print out some messages about what it’s doing. You’re now rooted.
    7. In the main menu, reboot into the regular OS.
  5. Demonstration of rootiness
    1. Besides back up your data, one thing you can do while rooted is remove built-in applications. To get started, turn on “USB Debugging” in the Settings → Applications → Development menu on your phone, and plug in the USB cable.
    2. In the Android SDK, run platform-tools/adb shell.
    3. This is a tiny little busybox shell running on your phone. You can ls and see the contents of your phone’s builtin storage.
    4. And you should be able to run su to get root access. Do so.
    5. Remount the /system partition in read-write mode: mount -o remount,rw /dev/block/mtdblock3 /system.
    6. cd /system/app
    7. Running pm list packages -f will give you a list of installed packages and their corresponding APK files, which are found in this directory. To remove a package, first manually delete its APK file, then run pm uninstall [ident], where [ident] is the Java-style name of the package such as com.facebook.katana.
    8. Close shell, disconnect USB, disable USB debugging, etc.
  6. Cleanup
    1. Your phone should now be booted into your regular, barely-initialized Android OS. Reinstall your backup program and restore backed-up data. If using MyBackup Pro, I found that I should not restore my calendar — all of my calendar events got duplicated. I also got messages about massive deletion of my contacts, and I’m pretty sure I should have chosen not to restore those either. At some point all of my Twitter “contacts” also got merged into my Google contact list, which was really annoying.
    2. Begin the painstaking process of re-inputting all of your settings. There are a lot more than you thought.
    3. Remove the udev rules file you created.
    4. If you so desire, clean up some of the backups and things off your SD card.
    5. Finally — the main point of all of this — after you’ve had a while to get all of your settings back, run a root-powered full backup of your phone data, and copy that backup off the SD card. Once you have root, consensus seems to be that Titanium Backup is the gold standard.

Here are some of the resources I used when piecing together all of this information: