Artist’s impression of red dwarf star TVLM 513-46546. ALMA observations suggest that it has an amazingly powerful magnetic field (shown by the blue lines), potentially associated with a flurry of solar-flare-like eruptions. Credit: NRAO/AUI/NSF; Dana Berry / SkyWorks

“The First Millimeter Detection of a Non-Accreting Ultracool Dwarf”


Our latest paper is finally out! It’s called “The First Millimeter Detection of a Non-Accreting Ultracool Dwarf” and has been accepted to The Astrophysical Journal. The Center for Astrophysics and the NRAO are putting out press releases about it, including a cool animation.

This is a project that Edo and I did in collaboration with a group of brown dwarf experts in the UK: Sarah Casewell, Craig Stark, Stu Littlefair, and Christiane Helling. Craig is an expert in exotic plasma processes and came up with an idea about how some of them might operate in the upper atmospheres of very cool stars and brown dwarfs — in line with an overall physical model that Christiane has been developing for quite a while. If so, these processes would cause these cool stars to become quite bright at millimeter wavelengths (or, equivalently, frequencies around 100 GHz; these are the sorts of wavelengths used in new full-body airport scanners). The ALMA telescope would be the perfect tool to try to detect this emission, and just about two years ago Sarah got in touch with me about collaborating on this project since I have experience working with the long-wavelength interferometric data that ALMA produces.

ALMA has annual proposal deadlines and generally does things at a … stately … pace. Sarah led the proposal, which was eventually accepted, and ALMA finally delivered our data in late April of this year.

Craig’s plasma processes are, well, quite exotic, so frankly we weren’t sure if we were going to see anything. So I was quite excited when I took a first look at the data and saw a strong detection of our target, a famous low-mass object called TVLM 513–46546! (Well, it’s famous among people who care about exotic phenomena in ultracool dwarfs.) Stuart promptly went out and obtained a night’s worth of visible-light monitoring of our target, since it’s known to vary periodically at visible wavelengths — if there were any variability in the ALMA data, it would be interesting to see how it compared to the optical, and you want contemporaneous data in order to be able to compare effectively.

Once we got to sit down and look at the data carefully, it didn’t seem as if the emission matched Craig’s exotic plasma processes. Instead, it seems to come from a different, equally surprising source: the same general kind of synchrotron process that is likely responsible for TVLM 513’s emission at centimeter radio wavelengths (at least its non-bursting emission, but that’s a story for another time), just operating at much higher energies. The physics of the synchrotron process tells us that emission at ALMA wavelengths must be coming from very energetic electrons — ones moving close to the speed of light — gyrating in fairly strong magnetic fields. This is really awesome! It’s kind of incredible to me that this big, relatively cold ball of gas can both self-generate strong magnetic fields (in some places thousands of times stronger than the Earth’s) and somehow channel its energy into accelerating electrons to at least 99% of the speed of light.

Even though you might think that TVLM 513 would be a fairly dinky little object, our data help paint a picture in which it’s surrounded by an intense radiation environment — much stronger than the Sun’s, in fact. This is important since astronomers these days are very interested in finding planets around low-mass red dwarfs. In order to be warm enough to have liquid water, such planets would need to orbit fairly close to their host stars, but this means that they’d also be that much closer to the kind of radiation sources we observed with ALMA. It’s an active area of research to try to understand how harmful this radiation, and related effects, would be for any life on such planets.

Our finding is also very interesting physically since the electrons that produce the ALMA emission probably lose all their energy in a matter of a few hours. So, if these electrons are accelerated to such high energies only in big bursty events, we got very lucky during our observation. On the other hand, if the emission we observe turns out to be very steady, that implies that the acceleration process is probably operating continuously at a low level. The mechanism of this acceleration process is precisely one of the big mysteries in this field, so I’m excited about the possibility of using the time-dependence of the ALMA emission to learn more.
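
If you want a feel for the numbers, the standard synchrotron formulas get you most of the way there. Here's a little order-of-magnitude sketch in Python (the field strength and electron energy below are illustrative placeholders, not the values from our actual modeling):

import math

def synchrotron_numbers (B_gauss, gamma):
    """Order-of-magnitude synchrotron estimates for electrons with Lorentz
    factor `gamma` gyrating in a field of `B_gauss` gauss."""
    nu_gyro = 2.8e6 * B_gauss                 # electron gyrofrequency, Hz
    nu_char = gamma**2 * nu_gyro              # characteristic emission frequency, Hz
    t_cool = 7.75e8 / (B_gauss**2 * gamma)    # synchrotron cooling time, seconds
    return nu_char, t_cool

# Purely illustrative inputs: ~60 gauss fields and gamma ~ 25 put the emission
# near 100 GHz (ALMA Band 3) with a cooling time of a couple of hours.
nu, t = synchrotron_numbers (60., 25.)
print ('characteristic frequency: %.0f GHz' % (nu / 1e9))
print ('cooling time: %.1f hours' % (t / 3600.))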

In fact, there’s a hint in our data that the ALMA emission had a bit of a burst in sync with the time when the optical modulation would have been at its maximum. (We’re glad that Stu got his data!) My best estimate is that there’s about a 7% chance that the observed data are just a noise blip, which isn’t too high, but is higher than you generally want when trying to make a scientific claim. To be honest, I’m not quite sure what the explanation would be if these pieces of emission turn out to be synchronized, but you can bet that I want to find out if they are.

Unfortunately (or is it ironically?) we got these data the day before the 2015 ALMA proposal deadline, so we weren’t able to submit a plan for follow-up observations. We’ll certainly submit one in 2016, which means that hopefully we’ll get more data in 2017. I’d like to get data sooner, but ALMA is the only telescope in the world that’s powerful enough to get the data we need, so it’s hard to complain too much!

This result opens the door to a lot of new investigations. While there are lots of details that we’d like to fill in about TVLM 513, I’d really like to observe a larger sample of similar objects and see how many can sustain this high-energy acceleration process. Properties of the emission such as its time-dependence and polarization will fill in the physics and help us understand what’s driving these processes, as well as paint a clearer picture of how this radiation might affect habitability.

To be honest, there’s probably a small silver lining in the fact that we’ll have to wait a bit to try to get more ALMA data. The way ALMA schedules observations right now is very rigid, in the sense that it’s hard to make special requests: right now you can’t even request a single observation that lasts longer than two hours. (I had a proposal for a four-hour observation rejected as being “technically impossible”; not that I’m bitter.) The kinds of observations that we’d really like to do — variability monitoring; simultaneous observations with the Very Large Array — do not fit well into that scheme at all. There are not-crazy reasons for the current way of doing things, but I think it does preclude a lot of exciting science. We will advocate for more astronomer-friendly rules and with luck we’ll have more flexibility the next time we apply.

Extensible prompts in “jupyter console”

Scientific Python users may be aware that the well-known IPython project is transitioning to a new name, Jupyter, to reflect the fact that in recent versions you can use the notebooks, etc. with languages other than Python. As part of this, the venerable interactive console program ipython is transitioning to a new program, jupyter console.

Both of these programs let you customize their command prompts. However, the documentation on how to do this isn’t great, and the new Jupyter-ified architecture means that some of the examples out there don’t work anymore. The key issue is that under Jupyter, the “shell” that displays the user interface is a separate program from the kernel that actually runs your Python code, and the only way to modify the shell’s behavior is through extension plugins.

Here’s a snippet of shell extension code that sets up a special prompt substitution called “shorttime”. Under current versions of the Jupyter console program (4.0.3), it should go in a file named something like ~/.ipython/extensions/shorttime_ext.py:

# This file is dedicated to the public domain.
_loaded = False

def load_ipython_extension (ip):
    global _loaded

    if _loaded:
        return

    from IPython.core.prompts import LazyEvaluate
    import time
    ip.prompt_manager.lazy_evaluate_fields['shorttime'] = LazyEvaluate (time.strftime, '%H:%M')

    _loaded = True

Then you can cause this extension to be loaded and modify the prompt of your shell using the standard configuration framework. In current versions of the console (4.0.3), the relevant file is still ~/.ipython/profile_default/ipython_config.py. A standalone version of that file that would set things up is:

# This file is dedicated to the public domain.
c = get_config ()
c.InteractiveShellApp.extensions = ['shorttime_ext']
c.PromptManager.in_template = '{shorttime} \\# >>> '
c.PromptManager.in2_template = '{shorttime} {dots}...> '
c.PromptManager.out_template = '{shorttime}   \\# < '

# This suppresses 'Loading IPython extension: foo' messages on startup:
import logging
logging.getLogger ('ZMQTerminalIPythonApp').level = logging.WARN

At the moment, this setup should work in both the older ipython program as well as the newer jupyter console. As far as I can tell, there’s no simpler way to customize what your prompt can show.

A Brief Note on Geoff Marcy

We live in the future, a strange place full of contradictions, like the fact that the ridiculous clickbait factory BuzzFeed is also the venue that finally blew open the longstanding open secret that famed astronomer Geoff Marcy is a serial sexual harasser. For the most part I don’t have anything special to add to the discussion surrounding this story, and it seems better to leave space for other folks with more direct connections to speak for themselves. But I did have one thought that was marginally too long to fit into a tweet.

It seems that the report coming out of Berkeley’s recent Title IX investigation is pretty clear-cut regarding Geoff’s actions. I think — and it seems many others would agree — that UC Berkeley’s response to this report has been appalling. As far as I can tell there have been no real consequences, either negatively for Geoff or positively for his targets. This would be bad in its own right, but my understanding is that he’s been around this merry-go-round multiple times already, each time promising to do better but receiving no real punishment. I’m sure there are many things we on the outside don’t know about the case and its handling, but the core is this: while it is extraordinarily difficult to discipline tenured faculty, the people in positions of authority here owe it to everyone to find a way to do so.

Here’s my thought: while the people in positions of authority have let us down here, we shouldn’t fantasize that we’d necessarily have done any better. I know some of the people involved in this case and believe that some of them are in at least the 90th percentile of righteousness in our community. For those of us who are not serial sexual harassers or their buddies, I think this case should remind us that living ethically is a matter of hard work as much as it is of righteousness. How often do I really stop and effortfully interrogate myself about the implications of my actions and failures to act? I like to think of myself as a decent person, but the answer is surely “not often enough”.

It’s not fully rational, but news stories like this one generally inspire people to take actions in a way that everyday injustices don’t. Some of us — sexual harassers and their enablers — have their work cut out for them. But all of us should be inspired to question ourselves, raise our personal standards, and bust our asses, every single day, to keep them high.

OK, one other thought: the habit of silence that surrounds these things is truly strange. When it comes to institutions that don’t want to air dirty laundry, it’s not a surprise. But I’ve also had conversations about definite harassers, in completely private, trusted groups, in which we’ve still been intensely reluctant to name names. I don’t fully understand why, but empirically it’s true. My thanks to the reporter for writing her piece. And of course it’s impossible to be too vocal in expressing thanks and admiration for those of Geoff’s targets who went on the record, including the very courageous Jessica Kirkpatrick and Sarah Ballard, who attached their names to their stories.

“The Rotation Period and Magnetic Field of the T Dwarf 2MASSI J1047539+212423”

Long time, no blog! I’ve been busy with a bunch of things this summer. Mostly, those things have not involved my latest paper (DOI, arxiv, ADS), which was submitted back in February, but it’s finally out in ApJ now, so this feels like a good time to write about it.

The full title is “The Rotation Period and Magnetic Field of the T Dwarf 2MASSI J1047539+212423 Measured From Periodic Radio Bursts” and that pretty much sums it up. The work builds on an earlier paper describing VLA follow-up of this awesome, cold brown dwarf, a.k.a. 2M 1047+21, which was discovered to have radio emission by Route & Wolszczan in 2012. The main thing about this radio-emitting brown dwarf is that it’s much colder than the other published radio emitters; its effective temperature is just 900 K, while the next-coolest emitter at the time of discovery was at 1900 K. This is really encouraging since a big theme in the field is pushing to detect radio emission from cooler and less massive objects, hopefully down into the (exo)planetary regime one day. The initial detections suggested that even at temperatures 1000 K cooler, the radio and magnetic activity levels hold up pretty well.

Our follow-up paper confirms this. With a longer VLA observation, we both confirmed the first detection and discovered the periodic, highly-polarized radio bursts that are a hallmark of planet-style auroral current systems. This isn’t shocking, since both planets and higher-mass brown dwarfs show these bursts, but this is great evidence that the processes really do scale continuously between these regimes. And we also get to read the object’s rotation period off of the burst periodicity. The burst shapes are erratic so it’s a bit challenging to do this very precisely (see the paper if you’re curious), but the basic number is about 1.8 hours for a full rotation. This is about 150 times the Earth’s rotation speed, if you’re talking miles per hour, though it’s in line with similar objects. That being said, this is one of only a handful of rotation periods measured for mid/late T dwarfs, since they’re very faint and seem to have pretty uniformly-colored atmospheres (so that it’s hard to see clouds rotating in and out of view).
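
The miles-per-hour comparison is just circumference divided by rotation period. Here's the arithmetic, assuming a roughly Jupiter-sized radius, which is about right for T dwarfs:

import math

radius_mi = 44000.    # assumed radius in miles, roughly Jupiter-sized
period_hr = 1.8       # rotation period from the radio bursts

v_dwarf = 2 * math.pi * radius_mi / period_hr   # equatorial speed, mph
v_earth = 2 * math.pi * 3963. / 24.             # Earth's equatorial speed, mph

print ('dwarf: %.0f mph; Earth: %.0f mph; ratio: %.0f' % (v_dwarf, v_earth, v_dwarf / v_earth))
# the ratio comes out to roughly 150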

We also observed 2M 1047+21 at slightly higher frequencies and saw a single burst that seems to be due to the same process. The radio frequency tells you about the dwarf’s magnetic field strength, so we deduced that this dwarf’s magnetic field reaches strengths of about 3.5 kG. That’s as strong as you get in active stars, and about 250 times stronger than what we see in Jupiter, even though this object is only about 8 times warmer than Jupiter. This makes me wonder whether there’s some kind of big dropoff in magnetic field strengths somewhere between the T dwarfs and planets, or whether maybe some planets can have magnetic fields much stronger than we expect.
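
The reason the radio frequency translates into a field strength is that this kind of burst emission comes out near the electron cyclotron frequency, which is about 2.8 MHz per gauss. As a quick sketch (the ~10 GHz burst frequency below is an assumed, illustrative value):

nu_burst_mhz = 10000.          # assumed burst frequency of ~10 GHz, expressed in MHz
B_gauss = nu_burst_mhz / 2.8   # cyclotron relation: nu [MHz] ~= 2.8 * B [gauss]
print ('implied field strength: %.1f kG' % (B_gauss / 1000.))
# about 3.6 kG, in line with the ~3.5 kG quoted above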

Since the discovery of radio emission from 2M 1047+21, Melodie Kao and Gregg Hallinan at Caltech have announced that they’ve detected radio emission from similarly cold objects, although the work isn’t published yet. Their target-selection strategy also yielded a much higher success rate than we typically have in this field. This is pretty encouraging and means that the race is on to see if we can get radio emission from even colder objects!

The submitted version of the paper is available not only on arxiv, but also on my personal website as an HTML document using the “Webtex” system I’ve been developing. My goal here is to make it so that scientific articles are better looking and more valuable on the computer screen than on paper; all of our writing tools are designed to make printed documents, leading to a bunch of practices that don’t make sense in the electronic context (footnotes!). I’ve found it hard to articulate why I think this is so important, and I don’t think the current prototype makes it self-evident, but it’s what I’ve got right now. (That being said, there’s a lot going on under the hood!)

There’s also a video of me giving a short talk about this work at the Harvard ITC lunch.


A little while ago, Harvard grad student Phil Cowperthwaite and I put together a little experiment in hooking up astronomical data to cloud-based Web services. It’s called YOITSAGRB, and we described the experience on AstroBetter. It helps to have some hand-holding, but it can be amazingly straightforward to set up a web service these days! Check out the AstroBetter post for more information. YOITSAGRB is open source, of course, with the code on GitHub under the MIT License.

“Extragalactic Transients in the Era of Wide-Field Radio Surveys”

Over the past few months I’ve been working on a paper with my adviser Edo Berger and Brian Metzger of Columbia, and as of today it’s submitted to ApJ and out on Arxiv! It’s entitled Extragalactic Transients in the Era of Wide-Field Radio Surveys. I. Detection Rates and Light Curve Characteristics, and as the title might suggest it connects more to my doctoral research on radio transients than my more recent work on magnetism in cool stars and brown dwarfs.

The term “radio transient” generally refers to any astronomical source of radio emission that appears unexpectedly. Since the definition involves something new appearing, radio transients are generally associated with events rather than objects. Lots of astrophysical events are known — or predicted — to result in radio emission, so the set of radio transients includes many different kinds of phenomena: it is an astronomical definition rather than an astrophysical one.

But there’s a reason we lump many kinds of events into this overarching category. Up until the past decade or so, it’s been difficult to discover radio transients of any kind. There are several reasons for this, but one of the key ones is that fairly basic physical and technical tradeoffs have historically driven the best-performing radio telescopes to have very small “fields of view,” meaning that they can only observe small patches of the sky at once. And if you’re interested in unexpected events that could occur anywhere in the sky, you have to get pretty lucky to find one when you’re only ever looking at a tiny little piece of it.

You can’t change the laws of physics, but some of the same technologies that have driven the digital revolution have also significantly changed the tradeoffs involved in building radio telescopes. (For more on this, see Chapter 1 of my dissertation.) It’s now possible to build sensitive telescopes that can see much more of the sky at once than before, making searches for radio transients much more feasible. In astronomy, new ways of looking at the sky have almost always been associated with exciting new discoveries, so this new capability of studying the “time domain” radio sky has brought a lot of excitement to see what’s out there waiting to be found. That’s why there are a variety of ongoing and upcoming searches for radio transients such as VAST, VLASS (partially), the LOFAR RSM, and whatever acronym will eventually be given to the SKA transient surveys; and hence the “Era of Wide-Field Radio Surveys” in the title of our paper.

That’s the background. What our paper does is predict what these surveys might actually find. Our work is the first comprehensive end-to-end simulation of the search process, modeling the rates and distributions of various events, their radio emission, and the detection methods that surveys will likely use.

Science! Radio light curves of the various events we consider at 1.3 GHz. A big part of the work in the paper was to find and implement the best-available models for radio emission from a wide variety of astrophysical events.

To keep things tractable, we focused on a particular kind of potential radio transient — extragalactic synchrotron events. The “extragalactic” means that the events come from outside the Milky Way, which is usually the genre that people are most interested in. The “synchrotron” refers to the radio emission mechanism. For the general group of surveys we consider, all known radio transients are synchrotron emitters, and I’d argue that it’s hard to concoct plausible events in which synchrotron will not be the primary emission mechanism. This is important, because one of the things we show in the paper is that the synchrotron process brings certain basic restrictions to the kind of emission you can get. In particular, brighter events generally evolve on slower timescales, so that something bright enough to be seen from another galaxy cannot get significantly brighter or dimmer in less than about 100 days. That means that if you’re looking for radio transients, it doesn’t help to check the same part of the sky on a faster cadence than that.
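
To give a cartoon version of where that timescale comes from (this isn't the calculation in the paper, just the usual back-of-the-envelope argument): synchrotron self-absorption caps the brightness temperature of these sources at roughly 10^11 K, and causality ties the source size to how far light can travel over the variability timescale. Plugging in some assumed numbers:

import math

k_B = 1.38e-23     # Boltzmann constant, J/K
c = 3.0e8          # speed of light, m/s
Tb_max = 1e11      # K; rough brightness-temperature cap for synchrotron sources
nu = 1.3e9         # Hz; observing frequency
S = 1e-29          # W/m^2/Hz (1 mJy); assumed flux density of a just-detectable event
d = 9.3e24         # m (~300 Mpc); assumed distance

# A source of angular radius theta has S = (2 k_B Tb nu^2 / c^2) * pi theta^2.
# Capping Tb gives a minimum angular size, hence a minimum physical radius
# R = theta * d, hence a minimum light-crossing (variability) timescale R / c.
theta_min = math.sqrt (S * c**2 / (2 * math.pi * k_B * nu**2 * Tb_max))
t_min = theta_min * d / c
print ('minimum variability timescale: %.0f days' % (t_min / 86400.))
# about 90 days for these assumed numbers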

Various other papers have predicted detection rates for these sorts of events, but many of them have done so in a back-of-the-envelope kind of way. But we tried to do things right: take into account cosmological effects like non-Euclidean luminosity distances and K-corrections, use best-available radio emission models, and model the actual execution of a given survey. Doing this brought home a point that I’d say has been insufficiently appreciated: if you observe more frequently than the 100-day timescale mentioned above, typical back-of-the-envelope calculations will overestimate your survey’s efficacy. (If you do this you can recover some power by averaging together multiple visits, but in most cases it’s better to cover more sky rather than to revisit the same spot.)

Overall, we predict that relatively few radio transients (dozens, not hundreds) will be found before the advent of the Square Kilometre Array. Several of the choices that lead to this result are on the conservative side, but we feel that that’s justified — historically, radio transient surveys have turned up more false positives than real events, and you’re going to need a robust detection to actually extract useful information from an event. This is particularly true because radio emission generally lags that in other bands, so if you discover something in the radio, you have poor chances of being able to learn much about it from, say, optical or X-ray emission. This is unfortunate because it can be quite hard to learn much from radio observations without the context established from data at shorter wavelengths. We’ll pursue this idea in a follow-up paper.

There’s quite a lot more to the paper (… as you’d hope …) but this post is awfully long as it is. Overall, I’m very happy with our work — people have treated this topic handwavily for a while, and it’s finally getting the detailed treatment it deserves. The process of writing the paper was also rewarding: this happens to be the first one I’ve been on in which more than one person has had a heavy hand in the manuscript. (I’d call it a coincidence, but in my prior experiences the writing has been more-or-less entirely one person’s responsibility.) The mechanics of collaborative writing are still awkward — hence upstart websites like Overleaf or ShareLaTeX — but that aspect made it feel like more of a team project than other work I’ve done so far. I’ll be spearheading the follow-up paper, so there should be more of this feeling in my future!

Tech note: shell loops that won’t quit

For a long time, I’ve noticed that sometimes my shell scripts that use loops behave funny. Normally, when you hit control-C to cancel a script in the middle of a loop, the whole thing just exits. But in some cases I found that the control-C would cancel the current program, but not the whole script. This gets annoying if your loop is running time-consuming processing on hundreds of data sets — you have to sit there hitting control-C over and over again.

After a lot of detective work, I finally figured out what was going on. Here’s a clue. You can control-C this shell script and it will exit as expected:

for i in 1 2 3 4 5 ; do
  echo $i
  /usr/bin/sleep 10
done

But this one won’t:

for i in 1 2 3 4 5 ; do
  echo $i
  python -c "import time; time.sleep (10)"
done

What the heck is Python up to? As you’d expect, I found a lot of misinformation online, but unexpectedly, I’ve barely been able to dig up any relevant and correct information. Fortunately, I finally found this article by Martin Cracauer, which set me straight.

Imagine that a shell script is running a long-running subprogram. When you hit control-C, both of the programs receive a SIGINT signal. In most cases the subprogram dies immediately. However, the shell’s behavior needs to be more complicated, because some subprograms handle the SIGINT and don’t die. The shell needs to wait and see what happens to the subprogram: it shouldn’t kill itself if the subprogram didn’t. The shell implements this logic by waiting for the subprogram to exit and using POSIX-defined macros like WIFSIGNALED to test how it died: specifically, whether it was killed by a SIGINT or exited for some other reason.

If you’re familiar with Python, you might see the contours of the problem. Python catches SIGINT and turns it into a KeyboardInterrupt exception, which your code can then handle. However, it turns out that if you don’t handle it, Python exits through its normal means, effectively using sys.exit with an error code. In other words, from the shell’s perspective the subprogram doesn’t get killed by the SIGINT, and so then the shell decides that it shouldn’t give up either.

If you want to convince the shell that you did die from SIGINT, you can’t fake it with a special exit code or anything. You have to kill yourself with an honest-to-goodness SIGINT signal. Fortunately, it’s not hard to do that. I’d say this is a bug in Python: uncaught KeyboardInterrupts should lead to the process killing itself this way.

Once I figured out what was going on, it was easy to code up a fix that works by intercepting sys.excepthook. I’ve added it to my pwkit package, which includes several utilities useful for writing Python programs that operate on the command line, including a program called wrapout that I’ve found to be very useful.
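
The gist of the fix looks something like the following (a minimal sketch of the idea, not the exact code that lives in pwkit):

import os, signal, sys

def install_sigint_fix ():
    """Make uncaught KeyboardInterrupts kill the process with a real SIGINT,
    so that a parent shell sees WIFSIGNALED and bails out of its loop."""
    previous_hook = sys.excepthook

    def hook (exc_type, exc_value, exc_tb):
        if issubclass (exc_type, KeyboardInterrupt):
            # Restore the default handler, then deliver a genuine SIGINT to
            # ourselves; exiting with a special return code would not work.
            signal.signal (signal.SIGINT, signal.SIG_DFL)
            os.kill (os.getpid (), signal.SIGINT)
        else:
            previous_hook (exc_type, exc_value, exc_tb)

    sys.excepthook = hook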

And yes, the fix totally works. I’m sure that other sets of software suffer from the same issue, and it’s unfortunate that you have to explicitly enable the fix in Python. (I checked and this is true for Python 3 as well as Python 2.) But if you were ever befuddled about what was going on, now you know!

(Oh, by the way: nothing about this is specific to loops at all. They just expose the problem in the most obvious way.)

Extreme magnetic activity in NLTT 33370 AB

The paper that I’ve been working on for much of this year is finally submitted to The Astrophysical Journal and out on Arxiv! It’s the fourth in a series using simultaneous observations in multiple bands to better understand magnetic activity in the coolest stars and brown dwarfs. This particular paper targets a fascinating system called NLTT 33370 AB (aka 2MASS J13142039+1320011; it’s a little unfortunate that both names are nigh-unpronounceable).

NLTT 33370 came to our attention in 2011 when it was detected as a very bright radio emitter as part of a larger radio survey of the so-called “ultracool dwarfs”. In fact, it’s the brightest known radio emitter in this class. But it turns out that NLTT 33370 is doubly special because it is also a (relatively) short-period visual binary that will yield precise dynamical mass measurements of its components, which turn out to be nearly identical. It’s generally very difficult to measure masses directly in astronomy and so systems like NLTT 33370 provide important benchmarks for checking theories of stellar formation and evolution.

Even better, Josh Schlieder has shown that this system is fairly young, with an age of about 100 million years, which is extra-unusual for a mass benchmark. This is important since you want to check that your stellar evolution theories work throughout a star’s lifetime, not just when it’s old and settled.

There’s an interplay between these two unusual aspects of the system, however. The bright radio emission of NLTT 33370 indicates that it’s an extremely magnetically active system, and high levels of magnetic activity are in turn expected to have a strong impact on the structure and evolution of low-mass stars (see, e.g., MacDonald & Mullan 2009, Stassun et al 2012). If you want to test those evolution theories correctly, then, you have to understand how the magnetic phenomena might be affecting things.

On the flip side, the sheer brightness of this system makes it a great laboratory for studying how magnetic fields are generated and dissipated in the ultracool regime. This is something that we just don’t understand very well at the moment. However, any studies on this topic need to keep in mind this system’s unusual youth — as well as the simple fact that one is looking at a somewhat tight binary system, and not a single star.

Our NLTT 33370 project was designed to tackle the latter question of how the magnetic fields are generated and dissipated. We did this by observing the system in bands across the EM spectrum: radio, optical, the Hα line, UV, and X-ray. These all give complementary handles on the different ways that different parts of the star are affected by its magnetism. While studies of this kind (including a recent one of ours) often rely on single brightness measurements taken at quasi-random times, one of the hallmarks of magnetic activity is erratic variability on many time scales, so it can be hard to interpret such measurements. It’s far better to monitor each of these bands for long times, so you understand how they vary, and to do this at the same time in every band, so you can have as complete a view as possible of what’s going on at any given time. Of course, it’s far more difficult to observe an object in this way, because you need a lot of telescope time and the logistics are hard to work out. But the whole point of research is that you’d better be pushing the boundaries of what’s possible.

The centerpiece observations in our study were long stares with the upgraded Very Large Array (radio) and the MEarth observatory (optical) in 2012 and 2013. In 2013 we added simultaneous X-ray monitoring with the Chandra space telescope. I won’t show the complete data sets here since giving them a proper explanation would take up more space than I’d like, but rest assured that we saw a whole smorgasbord of magnetic phenomena. For a small sample, here are the radio-band light curves, showing both rapid flares and steadier, periodic variability at two frequencies. Each panel shows about ten hours of data:

The radio light curve of NLTT 33370 AB as obtained by the upgraded Very Large Array. From Williams et al. (1409.4411).

Strikingly, there’s a dramatic change in the radio emission between 2012 and 2013. Besides the changes that you can see above, the timing of the radio modulations relative to the optical ones shifts by about 180° — a clue that we only obtained thanks to the simultaneous observations made with MEarth. This kind of evolution was totally unexpected. Even more strangely, while the total radio emission undergoes this striking change, the “circularly polarized” sub-component (not shown above) stays more or less the same. These facts led us to speculate that the 2013 (“Campaign 2”) data probably combine a steadier radio-emitting structure with emission from a single large release of magnetic energy some time before our observation. But with only a night’s worth of observations, we don’t know how frequently such flares might occur, or how much energy they might contain, or how long their radio emission might last.

This is just one example of why it’s so important to monitor these objects in multiple bands simultaneously, for multiple consecutive rotations. But longer-term monitoring is also incredibly valuable — as the data above show, several different kinds of magnetic flares happen in this system, and you need to build up a good-sized sample of them to truly understand their rates and sizes. Thanks to the goals of the MEarth observatory and the support of its team (most notably co-author Jonathan Irwin), we got this kind of longer-term monitoring in the optical band. Those data showed yet another surprise: the optical variations are composed of two modulations at periods that are slightly (2%) different. This figure shows all of the optical data spanning 3.2 years crammed together; the parts with smooth curves are nights of intensive observations, while the noisy-looking parts are months where we get just a few measurements every night.

Optical light curve of NLTT 33370 AB from MEarth. The gaps between observations are squashed together so that all of the data points can be seen. Different colors represent data from different years. From Williams et al. (1409.4411).

The figure doesn’t make this apparent, but you can’t explain the data with a single periodicity, even if you let the modulation shape wiggle around. If you model the variation with multiple periodicities, however, you can do a very good job, even when you use a very simple sine curve to model the variations.
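
To illustrate what I mean by “very simple”: once you fix a pair of trial periods, a constant-plus-two-sinusoids model is linear in its remaining parameters, so you can fit it with plain least squares. Here's a toy version with made-up numbers (it has nothing to do with the actual MEarth analysis):

import numpy as np

def fit_two_sinusoids (t, y, period1, period2):
    """Least-squares fit of a constant plus two sinusoids with fixed periods.
    Writing each sinusoid as A*cos + B*sin keeps the model linear."""
    columns = [np.ones_like (t)]
    for period in (period1, period2):
        phase = 2 * np.pi * t / period
        columns.append (np.cos (phase))
        columns.append (np.sin (phase))
    design = np.vstack (columns).T
    coeffs = np.linalg.lstsq (design, y, rcond=None)[0]
    return coeffs, design.dot (coeffs)

# Toy data: two periods differing by 2%, plus noise.
t = np.linspace (0., 30., 2000)
y = 0.010 * np.sin (2 * np.pi * t / 1.00) + 0.008 * np.sin (2 * np.pi * t / 1.02)
y += np.random.normal (0., 0.002, t.size)
coeffs, model = fit_two_sinusoids (t, y, 1.00, 1.02)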

If these observations were of the Sun, you might suggest that the two periodic features correspond to two different sunspots. Like many astrophysical objects, the Sun has “differential rotation” where its atmosphere rotates at different rates depending on the latitude. Two sunspots at different latitudes would then imprint different rotation periods on the optical light curve. However, detailed studies of very cool stars such as the components of NLTT 33370 AB seem to show that they do not have differential rotation, even at levels much smaller than the 2% effect observed here. It’s of course possible that the two rotation rates correspond to each of the two stars in the binary, but their very similar values are, well, awfully suspicious. We’re still not sure what to make of this result.

There are a ton of other intriguing things about this system, including the fact that only one of the objects seems to emit radio waves, different kinds of X-ray flares that we observe, the possible phasing of periodic modulations in other bands such as the UV and Hα, and the frequency structure of the rapid radio flares. But this post is already quite lengthy! If you’d like to know more, you should dive into the full research paper.

We certainly hope to keep on studying NLTT 33370 AB in the future. We’d like to obtain a few key, specialized measurements that will give us some vital context for better understanding the data we have here; but also, continued observations of the kinds shown above will help us truly understand the different flaring events that happen in this system. Because NLTT 33370 AB is so unusually active and it can be a reference point for such precise physical measurements, I’m sure it will be quite extensively studied going forward.

“A Laboratory Introduction to git”

Earlier this summer I ran a tutorial for our new summer students on the git version control system.

Last year, I also ran a git tutorial, but was only partially happy with the results. I knew that I didn’t want to stand and lecture the students as a group, since git is fundamentally a tool: there’s no better way to learn about it than just to use it. But there are some important and subtle concepts underlying how git works and I wanted to explain them. I found myself lecturing more than I wanted, and I could tell that the Fundamental Problem of Lecturing was rearing its head: my one-size-fits-all approach was not working for everyone. Some students were interested, others not; some were following the concepts, and others were confused. I felt like the tutorial worked very well for a few students, but must have been boring, confusing, or frustrating for others.

This year I spent some time thinking about how to do better. The idea that I kept coming back to was that, in my experience, when you’re presenting technical material, different people can absorb it at very different rates — to the extent that this should be the dominant factor in considering how to prepare a group learning experience. I decided that my key goal was to find a way to let students learn the material at their own pace.

Almost immediately I realized that what I wanted to do was run my tutorial like the lab section of a college science class. I’d write up the material in a handout that would (if I did a good job) demonstrate the key concepts with hands-on activities. I’d pair up students so that they could help each other out if they got stuck: basic peer learning in action, with a whiff of pair programming too. Then I’d just tell the students to start reading and doing. Rather than “leading” the tutorial, I and my co-teachers would be lab assistants, there to step in when groups got really stuck.

The one downside of this approach that I could think of is that you can’t extemporize a half-assed, poorly-structured handout the same way you can a half-assed, poorly-structured lecture. That doesn’t seem like an entirely bad thing, but  I did need to spend some solid time planning and writing the “lab manual”.

The manual in question is here, with its full source code on GitHub. I was very happy with what I put together. I’d like to think I explained the concepts well, and I think the “lab manual” style ended up working out as well as I’d hoped. Furthermore, I stole some font ideas from Michelle Borkin’s dissertation (in turn borrowing from here and here) and I think the resulting document looks pretty snazzy too. Check it out!

And I was extremely happy with how the tutorial went, too. As you’d expect, some students got farther than others, but I don’t think anyone got frustrated; the uninterested students can let themselves get distracted, and the students that need some more time can take it. Another nice bonus of the lab approach is that the students can hang on to the manual and use it as a reference later. I highly recommend running technical tutorials in a  “lab” style! You do need to plan ahead to make a manual, but, well, sorry, sometimes it takes some work to make something you’re proud of.

I also highly encourage anyone to use or modify my git lab manual if they’re planning any analogous activities. Of course, I’d appreciate hearing about way to improve what I’ve got, say through a GitHub pull request.

I did come away with a few lessons learned from this first iteration, though:

  • Many, if not most, students will hit an immediate stumbling block just trying to use and understand a Unix text editor. This pops up particularly early in my git tutorial but of course will come up rapidly for any kind of Unixy technical computing. (The Software Carpentry folks encounter the same problem.) As far as I can see, right now there’s just no way to avoid this pain. Which is lame.
  • I also tried early in the manual to establish simple conventions for “type this text exactly … except for this one piece that I want you to replace,” but they were apparently not simple enough. I think that just a few more explicitly and very gently introduced examples would help.
  • Students ended up mostly working solo, rather than pairing, though they did help each other out in sticky spots. I half-expected that this might happen; in general, you seem to need to overcome an enormous psychological activation energy to actually get students to work together in a small group. I think doing a better job on this front is more about my teaching technique and presence rather than any concrete instruction I could give. It’s not too bad if the students at least help each other, but I still believe it’d be even better for them to work in pairs if I could convince them to.
  • After giving the tutorial, someone pointed out that I didn’t have anything in place to evaluate the students’ learning. There are questions to answer in the lab manual, but it was clear that I wasn’t actually going to be reviewing their answers or anything. Obviously no one’s going to be grading them, but evaluation is important for understanding what’s working and what isn’t … and I do tend to think that that small  bit of pressure on the students from knowing that I’d be looking at their work would be a positive thing. Next time I might have them write the answers to my questions in a separate packet that I actually collect (while emphasizing that it’s about judging my teaching, not them).
  • There are also a bunch of smaller things. I ask the students to run man find, which opens a pager, before telling them how to exit a pager. I ask them to type rm -rf * at one point, which is probably just too dangerous. Not-quite-substitutions like {edit the file foo.txt} were confusing. Despite my best efforts to make the bug in bloomdemo blazingly obvious, it was not obvious to some people.

I’m looking forward to revising the manual next year and trying to do an even better job!

The Bandpass of the MEarth Observatory

I recently found myself wanting to know the (approximate) bandpass of the MEarth observatory. A plot of the bandpass is published in Nutzman & Charbonneau (2008), but I wasn’t able to find any tabulated data. Fortunately, the MEarth team works down the hallway from me, so I could find out if there were any better options. The short story is, not really. Jonathan Irwin kindly sent me some of the underlying data, which I used to compute a bandpass. I wrote up my method and results in an IPython notebook, which is shown statically below.

scipy.stats Cheat Sheet

Key methods of the distribution classes in scipy.stats.

  • pdf(x): Probability density function
    ◦ Probability of obtaining x < q < x+dx is pdf(x) dx
    ◦ Derivative of the CDF
    ◦ Goes to 0 at ±∞ for anything not insane
    ◦ Not invertible because it’s hump-shaped!
  • cdf(x): Cumulative distribution function
    ◦ Probability of obtaining q < x is cdf(x)
    ◦ Integral of the PDF
    ◦ CDF = 1 − SF
    ◦ cdf(−∞) = 0 ; cdf(+∞) = 1
  • ppf(z): Percent-point function (inverse CDF)
    ◦ If many samples are drawn, a fraction z will have values q < ppf(z)
    ◦ PPF = inverse of CDF
    ◦ Domain is zero to unity, inclusive; range indeterminate, possibly infinite
  • sf(x): Survival function
    ◦ Probability of obtaining q > x is sf(x)
    ◦ SF = 1 − CDF
    ◦ sf(−∞) = 1 ; sf(+∞) = 0
  • isf(z): Inverse survival function
    ◦ If many samples are drawn, a fraction z will have values q > isf(z)
    ◦ ISF = inverse of SF (duh)
    ◦ Domain is zero to unity, inclusive; range indeterminate, possibly infinite
  • logpdf(x): Log of the PDF
  • logcdf(x): Log of the CDF
  • logsf(x): Log of the SF
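
For concreteness, here's how these methods look on a “frozen” distribution object:

from scipy import stats

d = stats.norm (loc=2.0, scale=0.5)   # a "frozen" normal distribution

d.pdf (2.0)     # density at x = 2
d.cdf (2.5)     # P(q < 2.5)
d.sf (2.5)      # P(q > 2.5) = 1 - cdf(2.5)
d.ppf (0.975)   # x such that P(q < x) = 0.975
d.isf (0.025)   # x such that P(q > x) = 0.025; same number as the line above
d.logpdf (2.0)  # log of the density, handy for likelihood work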

Elementary Gaussian Processes in Python

Gaussian processes are so hot right now, but I haven’t seen examples of the very basic computations you do when you’re “using Gaussian processes”. There are tons of packages that do these computations for you — scikit-learn, GPy, pygp — but I wanted to work through some examples using, and showing, the basic linear algebra involved. Below is what I came up with, as incarnated in an IPython notebook showing a few simple analyses.
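
The embedded notebook doesn't reproduce well in this format, but the heart of the computation is only a few lines of linear algebra. Here's a minimal sketch of GP regression with a squared-exponential kernel (textbook stuff, not a transcription of the notebook itself):

import numpy as np

def sqexp_kernel (xa, xb, ell=1.0, amp=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    return amp**2 * np.exp (-0.5 * (xa[:,None] - xb[None,:])**2 / ell**2)

def gp_predict (x_train, y_train, x_test, noise=0.1, **kernel_kw):
    """Posterior mean and covariance of a zero-mean GP conditioned on data."""
    K = sqexp_kernel (x_train, x_train, **kernel_kw) + noise**2 * np.eye (x_train.size)
    Ks = sqexp_kernel (x_train, x_test, **kernel_kw)
    Kss = sqexp_kernel (x_test, x_test, **kernel_kw)

    # A Cholesky factorization is the numerically sane way to apply K^-1.
    L = np.linalg.cholesky (K)
    alpha = np.linalg.solve (L.T, np.linalg.solve (L, y_train))
    v = np.linalg.solve (L, Ks)

    mean = np.dot (Ks.T, alpha)
    cov = Kss - np.dot (v.T, v)
    return mean, cov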

This post is also a pilot for embedding IPython notebooks on this blog. Overall it was pretty straightforward, although I had to insert a few small tweaks to get the layout to work right — definitely worth the effort! I hadn’t really used an IPython notebook before, but I gotta say it worked really well here. I generally prefer the console for getting work done, but it’s a really nice format for pedagogy.

Confidence intervals for Poisson processes with backgrounds

For some recent X-ray work, I’ve wanted to compute confidence intervals on the brightness of a source given a known background brightness. This is straightforward when the quantities in question are measured continuously, but for faint X-ray sources you’re in the Poisson regime, and things get a little trickier. If you’ve detected 3 counts in timespan τ, and you expect that 1.2 of them come from the background, what’s the 95% confidence interval on the number of source counts?

Of course, the formalism for this has been worked out for a while. Kraft, Burrows, and Nousek (1991) describe the fairly canonical (222 citations) approach. Their paper gives a lot of tables for representative values, but the formalism isn’t that complicated, so I thought I’d go ahead and implement it so that I can get values for arbitrary inputs.

Well, I wrote it, and I thought I’d share it in case anyone wants to do the same calculation. Here it is — in Python of course. There are a few subtleties but overall the calculation is indeed pretty straightforward. I’ve checked against the tables in KBN91 and everything seems hunky-dory. Usage is simple:

from kbn_conf import kbn_conf

n = 3 # number of observed counts
b = 1.2 # expected number of background counts
cl = 0.95 # confidence limit
source_rate_lo, source_rate_hi = kbn_conf (n, b, cl)

# here, source_rate_lo = 0, source_rate_hi = 6.61 --
# we have an upper limit on the source count rate.
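
If you just want to sanity-check a number without grabbing the file, the formalism is also easy to brute-force numerically. Here's a crude sketch that evaluates the KBN91 posterior on a grid (again, not the implementation linked above):

import numpy as np
from scipy.special import gammaln

def kbn_sketch (n, b, cl, smax=50., nsamp=200001):
    """Brute-force the KBN91 result: the posterior for the source rate s is
    proportional to exp(-(s+b)) (s+b)^n for s >= 0; return the shortest
    interval containing a fraction `cl` of the posterior probability."""
    s = np.linspace (0., smax, nsamp)
    ds = s[1] - s[0]
    p = np.exp (-(s + b) + n * np.log (s + b) - gammaln (n + 1.))
    p /= p.sum () * ds   # normalize on the grid

    # Accumulate the highest-density grid points until we reach `cl`; for a
    # unimodal posterior this gives the shortest covering interval.
    order = np.argsort (p)[::-1]
    coverage = np.cumsum (p[order]) * ds
    n_needed = np.searchsorted (coverage, cl) + 1
    chosen = np.sort (order[:n_needed])
    return s[chosen[0]], s[chosen[-1]]

print (kbn_sketch (3, 1.2, 0.95))   # roughly (0, 6.6), matching the example above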

Get in touch if you have any questions or suggestions!

CASA in Python without casapy

Like several large applications, CASA bundles its own Python interpreter. I can totally understand the decision, but sometimes it’s really annoying when you want to combine CASA’s Python modules with personal ones or those from another large package.

Fortunately, it’s not actually that hard to clone the CASA modules so that they can be accessed by your system’s Python interpreter — with the major caveat that the procedure might totally fail if the two different interpreters aren’t binary-compatible. I’ve had success in the two attempts I’ve made so far, though.

Really all you do is copy the key files. There’s a wrinkle, however, in that you need to set up the dynamically-linked libraries so that they can all find each other. This can all work automatically with the right RPATH/RUNPATH settings in the binary files, but empirically 99% of programmers are too clueless to use them correctly. Grrr. Fortunately, a tool called patchelf helps us fix things up.

Anyway, here’s how to equip an arbitrary Python interpreter with the key casac module — subject to binary compatibility of the Python module systems. I’m assuming Linux and 64-bit architecture; changes will be needed for other kinds of systems.

  1. Download and install patchelf. It’s painless.
  2. Download and unpack a CASA binary package. We’ll call the CASA directory {CASA}.
  3. Identify a directory that your Python interpreter will search for modules, that you can write to. The global directory is something like /usr/lib64/python2.7/site-packages/, but if you have a directory for personal python modules listed in your $PYTHONPATH environment variable, that’s better. We’ll call this directory {python}.
  4. Customize the following short script to your settings, and run it:
    #! /bin/sh
    casa={CASA} # customize this!
    python={python} # customize this!
    cd $casa/lib64
    # copy basic Python files
    cp -a python2.7/casac.py python2.7/__casac__ $python
    # copy dependent libraries, with moderate sophistication
    for f in lib*.so* ; do
      if [ -h $f ] ; then
        cp -a $f $python/__casac__ # copy symlinks as links
      else
        case $f in
          libgomp*)
            # somehow patchelf fries this particular file; just copy it
            cp -a $f $python/__casac__ ;;
          *)
            # (a couple of the lib*.so files are actually text files, not
            # binaries; patchelf just complains about those and leaves them be)
            cp -a $f $python/__casac__
            patchelf --set-rpath '$ORIGIN' $python/__casac__/$f ;;
        esac
      fi
    done
    # patch rpaths of Python module binary files
    cd $python/__casac__
    for f in _*.so ; do
      patchelf --set-rpath '$ORIGIN' $f
    done
  5. At this point you can blow away your unpacked CASA tree, though certain
    functionality will require files in its data/ directory.

All this does is copy the files (casac.py, the __casac__/ directory, and dependent shared libraries) and then run patchelf on the shared libraries as appropriate. For some reason patchelf fries the libgomp library, but that one doesn’t actually need patching anyway.

After doing this, you should be able to fire up your Python interpreter and execute

import casac

successfully, showing that you’ve got access to the CASA Python infrastructure. You can then use the standard CASA “tools” like this (assuming you’re using CASA version > 4.0; things were different before):

import casac
tb = casac.casac.table ()
ms = casac.casac.ms ()
ms.open ('yourdata.ms') # substitute the path to a real MeasurementSet
print ms.nrow ()
ms.close ()

I’ve written some modules that provide higher-level access to this functionality while relying only on the casac module: one for low-level setup (in particular, controlling logging without leaving turds all over your filesystem), and one providing a scripting-friendly library of basic CASA tasks, with a small shim called casatask to provide quick command-line access to them. With these, you can start processing data using CASA without suffering the huge, irritating overhead of the casapy environment.

Note: for Macs, I believe that instead of patchelf, the command to run is something like install_name_tool -add_rpath @loader_path libfoo.dylib — but I haven’t tested this.

Announcing: worklog-tools, for automating tedious CV activities

There are a lot of annoyances surrounding academic CV’s. Making a document that looks nice, for one. Maintaining different versions of the same information — short and full CV’s, PDF and HTML formats. Remembering how you’ve categorized your talks and publications so that you know where to file the latest one.

For me, one of the great frustrations has been that a CV is full of useful information, but that information is locked up in a format that’s impossible to do anything useful with — besides generate a CV. I’d like to collect citation statistics for my publications, and my CV contains the needed list of references, but I can’t pull them out for automatic processing. Likewise for things like previously-awarded NSF grants (hypothetically …) and lists of collaborators in the past N years. Some of these things are just interesting to know, and others are needed by agencies and employers.

Well, problem solved. Enter worklog-tools.

Based on the issues I’ve mentioned above, I feel like it’s pretty clear what you want to do: log CV-type activities — your academic output — in some kind of simple data format, and populate some kind of LaTeX template with information from the log. While we’re at it, there’s no need to restrict ourselves to LaTeX — we can also fill in an HTML template for slick, web-native versions of the same information.

I’ve actually gone and done this. There are a lot of ways you could implement things, but here’s what I do:

  • I log activities in a series of records in simple “INI format” files named 2011.txt, 2012.txt, etc.
  • Special hooks for publication records tie in to ADS to fetch citation information and compute things like h-indices.
  • A simple command-line tool fills in templates using information from these files, in the form of either arbitrary data from the raw records, or more specialized derived data like citation statistics.

Two components of this system are data — the log files and the templates. One component is software — the glue that derives things and fills in the templates. The worklog-tools are that software. They come with example data so you can see how they work in practice.

As is often the case, most of the work in this project involved making the system less complicated. I also spent a lot of time documenting the final design. Finally, I also worked to put together some LaTeX templates that I think are quite nice — you can judge the results for yourself.

Is any of this relevant to you? Yes! I sincerely think this system is straightforward enough that normal people would want to use it. A tiny bit of Python hacking is needed for certain kinds of changes, but the code is very simple. I think my templates are pretty nice — and I’m happy for people to use them. (If nothing else, you might learn some nice LaTeX tricks.) Finally, I think the value-add of being able to do things like collect citation statistics is pretty nice — and of course, you can build on this system to do whatever “career analytics” you want. For instance, I log all of my submitted proposals, so I can compute success rates, total time allocated, and so on.

The README on GitHub has many more details, including instructions about how to get started if you want to give it a try. I hope you enjoy!

By the way: INI format is highly underrated as a simple data format. It’s almost as underrated as XML is overrated.

By the way #2: Of course, nothing in this system makes it specific to CV’s — with different templates and log files, you can insert structured data into any kind of document.

By the way #3: Patches and pull requests are welcome! There are a zillion features that could be added.

By the way #4: A lot of this work was probably motivated by the fact that my name isn’t very ADS-able — a search for P Williams pulls in way too many hits, and though I can’t get a search for exactly PKG Williams to work, I have a fair number of publications without the middle initials.

Trends in ultracool dwarf magnetism: Papers I and II

Well, the pair of papers that I’ve been working on for much of this year have finally hit arxiv, showing up as 1310.6757 and 1310.6758. I’m very happy with how they turned out, and it’s great to finally get them out the door!

These papers are about magnetism in very small stars and brown dwarfs, which we refer to as “ultracool dwarfs” or UCDs. Observations show that UCDs produce strong magnetic fields that can lead to large flares. However, the internal structure of these objects is very different than that of the Sun (no radiative core), in a way that makes it challenging to develop a theory of how UCDs produce their magnetic fields, and of what configuration those fields assume.

So we turn to observations for guidance. Our papers present new observations of seven UCDs made with the Chandra space telescope, detecting X-rays, and the recently-upgraded Very Large Array, detecting radio waves. Magnetic short circuits (“reconnection events”) are understood to lead to both X-ray and radio emission, and observations in these bands have turned out to provide very useful diagnostics of magnetism in both distant stars and the Sun.

When people such as my boss started studying UCD magnetism, they soon discovered that the radio and X-ray emission of these small, cool objects has several surprising features when compared to Sun-like stars. We hope that by understanding these surprising observational features, we can develop a better theoretical understanding of what’s going on “under the hood.” This in turn will help us grapple with some challenging basic physics and also inform our understanding of what the magnetic fields of extrasolar planets might be like, which has large implications for their habitability (e.g.).

The first paper considers the ratio of radio to X-ray brightness. While this ratio is fairly steady across many stars, in some UCDs the radio emission is much too bright. The second paper considers X-ray brightness as a function of rotation rate. UCDs tend to rotate rapidly, and if they were Sun-like stars this would lead to them having fairly bright X-ray emission regardless of their precise rotation rate. But instead, they have depressed levels of X-ray emission, and the faster they rotate the fainter they seem to get.

Our papers make these effects clearer than ever, thanks to both the new data and to work we did to build up a database of relevant measurements from the literature. I’m really excited about the database since it’s not a one-off effort; it’s an evolving, flexible system inspired by the architecture of the Open Exoplanet Catalogue (technical paper here). It isn’t quite ready for prime time, but I believe the system to be quite powerful and I hope it can become a valuable, living resource for the community. More on it anon.

One of the things that the database helps us to see is that even if you look at two UCDs that are superficially similar, their properties that are influenced by magnetism (e.g., radio emission) may vary widely. This finding matches well with results from studies using an entirely unrelated technique called Zeeman-Doppler imaging (ZDI). The researchers using ZDI can measure certain aspects of the UCD magnetic fields directly, and they have concluded that these objects can generate magnetic fields in two modes that lead to very different field structures. These ideas are far from settled — ZDI is a complex, subtle technique — but we’ve found them intriguing and believe that the current observations match the paradigm well.

One of my favorite plots from the two papers is below. The two panels show measurements of two UCD properties: X-ray emission and magnetic field strength, with the latter being a representative value derived from ZDI. Each panel plots these numbers as a function of rotation (using a quantity called the Rossby number, abbreviated “Ro”). The shapes and colors further group the objects by mass (approximately; it’s hard to measure masses directly).
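
For the record, the Rossby number is the standard construction used in rotation–activity studies:

  \mathrm{Ro} = P_{\mathrm{rot}} / \tau_{\mathrm{c}},

where P_rot is the rotation period and τ_c is the convective turnover time; a small Rossby number means the object spins quickly compared to how long its convective motions take to churn.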

X-rays and magnetic field versus rotation. There’s scatter, but the general trends in the two parameters (derived by very different means) are surprisingly similar. From 1310.6758.

What we find striking is that even though the two panels show very different kinds of measurements, made with different techniques and looking at different sets of objects, they show similar trends: wide vertical scatter in the green (lowest-mass) objects; low scatter and high values in the purple (medium-mass) objects; and low scatter with a downward slope in the red (relatively high-mass) objects. This suggests to us that the different field structures hypothesized by the ZDI people result in tangible changes in standard observational quantities like X-ray emission.

In our papers we go further and sketch out a physical scenario that tries to explain the data holistically. The ZDI papers have argued that fast rotation is correlated with field structure; we argue that this can explain the decrease of X-rays with rotation, if the objects with low levels of X-rays have a field structure that produces only small “short circuits” that can’t heat gas to X-ray-emitting temperatures. But if these short circuits manage to provide a constant supply of energized electrons, that could explain the overly bright radio emission. The other objects may produce fewer, larger flares that can cause X-ray heating but are too infrequent to sustain the radio-emitting electrons. (There are parallels to this idea in studies of the X-ray flaring properties of certain UCDs.)

Our papers only sketch out this model, but I think we provide a great jumping-off point for more detailed investigation. What I’d like to do for Paper III is to do a better job of measuring rotation; right now, we use a method that has some degeneracies between the actual rotation rate and the orientation of the object with respect to Earth. Some people have argued that orientation is in fact important, so using different rotation metrics could help test our model and the orientation ideas. And of course, it’s important to get more data; our “big” sample has only about 40 objects, and we need more to really solidly investigate the trends that we think we see.
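
To illustrate the kind of degeneracy I mean, take a spectroscopic projected rotational velocity as an example of such a rotation measure (this is just a sketch of the general issue, not a recipe from the papers):

  v \sin i = (2 \pi R / P_{\mathrm{rot}}) \sin i,

so a fast rotator viewed nearly pole-on can yield the same measured value as a slower rotator viewed equator-on; you can’t untangle P_rot from the inclination i without extra information.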

One great part of this project is that I worked on it not only with my boss Edo Berger, but also with a fantastic summer REU student from Princeton, Ben Cook. Ben’s the lead author of Paper II and he did fantastic work on many aspects of the overall effort. It was a pleasure working with him and I suspect he’ll do quite well in the years to come.

Landscape pages, emulateapj, and arxiv

The past couple of times that I’ve put up a paper on Arxiv, I’ve had trouble with pages rotated to landscape mode. Here’s a quick note on what the issue is.

I’ve been using the pdflscape package to get landscape-oriented pages. It seems to be the standard LaTeX solution for this kind of thing. There’s also an lscape package; they’re almost the same, but if you’re using pdflatex the former inserts some magic to get the relevant page of the output file automatically turned when viewed on your computer.
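
For reference, standard usage outside of emulateapj looks something like this (nothing exotic):

\usepackage{pdflscape} % in the preamble

% in the body: the contents are set on a landscape page, and pdflscape
% additionally tags the page so that PDF viewers display it turned
\begin{landscape}
% wide table or figure goes here
\end{landscape}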

The problem is that lscape and pdflscape have some kind of conflict with the revtex4-1 document class, and revtex4-1 is what drives emulateapj under the hood. When you use them, the content on every page in the document is rotated in a bad-news kind of way. It took me a while to figure this out because I had only revtex4 installed on my laptop; emulateapj can work with either, and the older version doesn’t have the issue. Arxiv does have revtex4-1, so the problem would seem to only appear for Arxiv submission.

Recent versions of emulateapj have a [revtex4] option to force use of the older version of the package, which should make this go away. I don’t know if Arxiv’s version of emulateapj is new enough to support this.
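
I’d guess the incantation is something like the line below, though I haven’t verified the exact syntax against a current copy of the class file:

% ask emulateapj to build on revtex4 rather than revtex4-1
\documentclass[revtex4]{emulateapj}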

Alternatively, revtex comes with its own environment for rotating pages, turnpage. You can use this instead. Here’s an example of rotating a wide single-page deluxetable at the end of a document:

\begin{turnpage} % revtex’s own rotation environment
\begin{deluxetable} ... \end{deluxetable}
\end{turnpage}
\global\pdfpageattr\expandafter{\the\pdfpageattr/Rotate 90}

The \pdfpageattr line gets the magic PDF page rotation to happen. Obviously, this assumes that pdflatex is being used.

I do not know if this works with multipage tables. More annoyingly, you still seem to be forced to place the table at the end of the document, which is lame. At least, I don’t know how to get the deluxetable onto its own page without seriously messing with the flow of the surrounding document. (From what I read, the \pdfpageattr command lasts until you change it, so if you somehow got a rotated table into the middle of the document, you’d also need to figure out how to clear the PDF page rotation afterward.)
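
For what it’s worth, here’s an untested sketch of what “clearing” it might look like: \pdfpageattr is just a token register, so resetting it after the rotated page has been shipped out ought to work, with the caveat that it also wipes any other page attributes you might have set.

\clearpage             % make sure the rotated page has been shipped out
\global\pdfpageattr{}  % drop the /Rotate 90 (and anything else in there)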

Flashy interactive graphics in a scientific talk, realized

Following up on my previous post, I spent an hour hacking together my slide template with a little project called d3po, a hack from the recent dotAstronomy 5 conference that’s put together some demos presenting astronomical data with the d3 toolkit.

Here’s the result. It’s just a quick simple hack showing that this is possible, which in a way isn’t too impressive since there’s absolutely no reason it shouldn’t have been. But it’s still fun to see everything work.

Source code is available in the documents themselves, since this is the web!

Slides for scientific talks in HTML

As a Linux user, I’ve long been making slides for my talks in LibreOffice Impress (formerly OpenOffice Impress), the Free Software clone of PowerPoint. I don’t envy anyone who’s trying to maintain compatibility with Microsoft Office products, but frankly Impress has been slow and frustrating and buggy, and my use of it has been grudging at best.

Recently, however, I encountered an unusually wonderful bug where graphics in EPS format showed up everywhere except on the screen for the actual presentation — I confidently started my talk and moved to my first science slide, only to find an empty black expanse, as were most of my subsequent slides. Everything was fine on my laptop. And I said to myself: effffff this.

Today I happen to be at Bucknell University, where I just gave a colloquium to an audience of mostly physics undergraduates. I hadn’t given a talk aimed at this kind of audience before, so I had to make a bunch of new slides anyway. So, early last week, I took the plunge and tried to see if I could prepare my slides using HTML and show them in Firefox.

Why HTML? Well, I really like the idea of having my slides being stored in text format, so I can version-control them, edit quickly, and so on. And crucially, web browsers are now sophisticated enough that you can make great-looking slides in them — I don’t think this was true five years ago. It also seems really cool to have the option of embedding YouTube videos, Flash animations, interactive JavaScript demos, and all that kind of stuff. I didn’t do much of that in this talk, but I did put in a bunch of movies, which I’d never dared to do with Impress.

The experiment was a resounding success. I found a framework, reveal.js, that made it easy to prepare the slides and supported a few features that are pretty vital for scientific talks:

  • Ability to make a PDF out of the talk, in case disaster strikes or you’re not allowed to use your own computer.
  • Ability to use PDF graphics in slides (straightforwardly done with Mozilla’s pdf.js).
  • A “presenter console” mode with offscreen speaker notes.
  • As a bonus, easy embedding of relatively nice-looking equations with MathJax.

It took some time to get the appearance to be how I wanted and to figure out how best to construct slide layouts, but all told it wasn’t too bad. I loved being able to edit and rearrange my slides efficiently, and I’m very happy with the aesthetic appearance of the final product.

I put the presentation source code on GitHub so you can see how I did it. I did devise a few new tricks (such as working out how to embed PDFs), but most of the work was just using the stock reveal.js toolkit and tweaking the styling. Here’s a live, online version of the talk so you can see how it looks.

I wrote up some more detailed notes on the repository README. To be honest, this approach probably isn’t right for most astronomers — most astronomers have Mac laptops with Keynote, which is a lot better than Impress. And I needed to draw on a lot of technical background in order for the slide construction to feel “easy”. But once I got the template set up, it was quick and fun to make slides, and now here’s a template that you can use too!

Big thanks to Hakim El Hattab for making and publishing reveal.js, as well as to the authors of the various CSS/JavaScript tools I used. It was kind of incredible how easy it was to achieve some fancy, beautiful effects.

Prosser & Grankin, CfA Preprint 4539: A Bibliographic Story

A little while ago I was collecting published data on the rotation of Sun-like stars. As one often finds, there are helpful papers out there that compile lots of measurements into a few big tables. If you’re going to be thorough (and of course you are!), a standard part of the job is chasing down the references to make sure that you can honestly say that you know where all of the data came from. Pizzolato et al. 2003 is one of these compilation papers, and a very nice one at that. It has very good referencing, which was why I was surprised to see that some of the measurements were attributed to

Prosser, C. F., & Grankin, K. N. 1997, CfA preprint, 4539

Not a respectable paper in a respectable refereed journal, but some random preprint. Now, cutting-edge work can genuinely rely on results so new that they haven’t yet appeared in print, but that’s not quite the case here: Pizzolato et al. is from 2003, while the preprint citation is to 1997. Presumably the work hadn’t been “in preparation” for six years.

This kind of thing generally happens when authors are reading a paper, find some results that they want to use, and just blindly copy the reference down for use in their own work. Citing an old preprint, like in this case, is a bit of a red flag: it’s an academic no-no to be citing things that you haven’t actually read, and if you’d actually read the paper, you’d have updated the citation to refer to its published form.

In the spectrum of academic sins, I consider this one to be relatively minor, and I suspect it’s committed pretty often. It’d be nice if Pizzolato et al. had chased down the refereed paper that the preprint became, but sometimes people get lazy. There’s still enough information to dig up the published article, so it’s an inconvenience but not much worse. If this kind of blindly-copied reference gets propagated through several generations of papers (which I have seen before), you run the risk of a game-of-telephone situation where the meaning of the original work is warped; but in this case the reference is just providing measurements, which ought to propagate from one generation to the next pretty reliably.

Since I expected to use the relevant data, and I did feel like I should double-check that, say, the named preprint even exists, I set out to hunt down the final published form. In this day and age this is usually easy: everything modern is on ArXiv, where preprints almost always get cross-linked to their corresponding published paper when the latter appears. (I figure this must happen more-or-less automatically, since most authors don’t bother to fill in that information after they’ve made their posting.) Older preprints are more work, but generally not hard to track down thanks to ADS. If the citation has a title, you’re pretty much set; if not, it’s easy to search for something published by the lead author around the time of the preprint. No big deal.

I started doing my usual ADS searches: no “Prosser & Grankin, 1997”. No “Prosser & Grankin, 1998”. The “Prosser et al., 1998” papers clearly weren’t relevant. Hmm. A little bit more searching, and in fact there are no papers in the ADS database with both Prosser and Grankin as coauthors. Authorship lists can change between preprint-dom and publication, sure, but it’d be pretty surprising if “Prosser & Grankin” somehow became “Prosser & not-Grankin”. Some kind of dramatic falling-out?

I decided to take another tack. Googling had revealed that there wasn’t anything like an online database of the CfA preprint series, but the CfA library sits two floors under my office. If anyone knows how to look up items in the CfA preprint series, it’d better be them. And compared to some academic services (cough IT), I’ve virtually always had great “customer service” experiences with libraries — I imagine librarians are awfully strongly motivated (for a variety of reasons) to show how much more effective they can be than the Google search box.

Sure enough, just a day after I submitted a question in the CfA “Ask a Librarian” online form, I got a phenomenally helpful response from Maria McEachern. She had fetched the hardcopy preprint from the Harvard Depository, sent me a nice OCR’d scan of it, attempted to chase down the published form (also coming up empty), and even emailed the coauthor, Konstantin Grankin, to see if he could shed some light on the situation.

With a digital copy of the paper in hand, I felt confident. If the paper had been published somewhere, we’d definitely be able to track it down. If not, we had enough information to create a record in the ADS and upload the paper text, so that the text of the preprint would be easily accessible in the future — which, after all, is the whole point of providing precise references.

I had doubted that we’d ever hear anything from Grankin, but he actually replied in just a few days, filling in the last missing piece of the puzzle:

Dear Maria,

Unfortunately, Charles Franklin Prosser, the talented scientist, was lost in automobile accident in August 1998. […] Some papers have not been published because of this tragedy. That paper about which you ask, has not been published also. […]

Konstantin’s email included a link to Charles Prosser’s AAS obituary, written by his frequent collaborator John Stauffer. It makes for compelling reading in its way:

[…] Charles worked harder and put in longer hours than all but a few present-day professional astronomers. He could usually be found at the office seven days a week, among the first to arrive and the last to leave. Charles enjoyed harvesting astronomical data both from long observing runs on mountain tops and from rarely-read observatory publications. He was conservative in both his personal and his professional life; he very much preferred simply to state the results of his observations and to make as few extravagant interpretations of those observations as possible.

[…] Charles was survived by his parents, Charles Franklin Prosser, Sr. and Lucy Hogan Prosser, of Suwanee, Georgia, his sister Evelyn, and other relatives. His scientific papers will be offered by the family to NOAO. Charles was buried in Monterville, West Virginia on August 22, 1998. He will be remembered as a kind, dedicated, and highly moral colleague, whose greatest desire was to be allowed to work 12 hours a day in the field to which he devoted himself entirely.

Well, it’s hard to leave things half-finished after reading that. So I used the obscure ADS Abstract Submission form to submit a record and a copy of the paper text, resulting in the creation of ADS record 1997cfa..rept.....P. This achieves the most important thing to me: making it so that the preprint text is available online in a durable way. There’s also a subtle importance to the fact that the preprint is now integrated into the ADS citation system. Part of it is a certain imprimatur: this is a real publication, that you can legitimately cite. But there’s also something about how ADS bibcodes (the 1997cfa..rept.....P identifiers) are essentially our community’s vocabulary for talking about our literature. Without a bibcode and its backing record, it’s hard to use, share, discuss, or really do anything with a publication, even if you want to. (Side note: ADS is an incredibly important service in the field, and should get tons of money!)

I did some searching (mostly in Google Scholar, I must admit) and found 12 citing papers: pretty good for an unpublished preprint whose full text was virtually inaccessible until now. Those links have been added to ADS so that anyone reading one of the citing papers will easily be able to pull up the record and text of the preprint.

The ADS copy of the fulltext unfortunately loses the OCR in Maria’s PDF, making it so you can’t easily copy and paste the text and (more importantly) data tables. [Update: ADS has fixed this; I had assumed that if the OCR disappeared in the first place, their system didn’t allow for this.]  So I’ve submitted the tables to the CDS in the hopes that they’ll host a more easily-usable version of the data. CDS has a stated policy of only accepting data from papers published in refereed journals, but these circumstances are clearly exceptional; I’m waiting to hear a response about this case. If they accept the data, that should integrate the preprint into the Simbad system, further increasing its visibility.

I’m not sure if there’s a moral to this story, besides the fact that I have some OCD tendencies. Twelve citations is, frankly, not a huge number, and the very fact that there was a Mystery of Preprint 4539 signals its narrow reach: something more important would already have been integrated into ADS and the full-fledged literature, one way or another. But the research enterprise is built out of lots of tiny steps, not all coming in the form of new experimental results; I’d like to think that this is one of them.

Here’s the full text of CfA Preprint 4539, including the OCR information so that the text and tables can be easily copied. [Update: this is now fully redundant with the ADS version.] Here’s a preliminary version of the data tables that I submitted to the CDS — there are likely improvements to be made in the metadata to match the CDS formats, but the data should be fully usable and complete.