2025 July 2
At long last, the first data from the Vera C. Rubin Observatory are starting to become public! Here at the Minor Planet Center we were not only watching last week’s Rubin “First Look” event — we had some work to do too.
Asteroid-hunting and planetary defense have always been a major part of Rubin’s science case. When it hits full steam, Rubin will increase the rate of observations coming into the MPC by a factor of around five, and at this point years of effort have gone into getting the MPC ready for Rubin’s data stream, as I’ve mentioned a few times already. With Rubin getting closer to full operations, the MPC and Rubin teams have been working together closely to build the actual systems that will send Rubin data to the MPC and process them. We collectively agreed that as part of last Monday’s launch event, the MPC would accept a first official Rubin data delivery and process it for distribution through our standard, public-facing interfaces. Quite a big milestone for this long-running project!
While the Rubin team submitted their data in the same way that everyone else does, we processed the measurements using a set of next-generation pipelines that we’ve been developing as a major part of the broader effort — the first time that we’ve done so as part of production operations. This wasn’t strictly necessary, in a certain sense — the legacy pipelines that churn away every day could, in principle, have handled the data. The old and new systems are functionally equivalent, at the moment, and the data volume of this one submission wouldn't have been enough to tip our systems over. But the whole point of building the new pipeline is that once we start getting Rubin data every night — which will start happening in a matter of weeks! — the legacy system simply won’t be able to keep up. The time to get the new code up and running is now.
Now, the dirty secret, such as it is, is that we had already processed Monday’s submission from Rubin dozens of times. We have a “sandbox” system that’s been hosting the new pipelines, and we’ve been testing its results assiduously. So we knew exactly what we were going to be getting, well in advance.
That being said, there’s always a difference between testing and actual production deployment, so Monday was genuinely a big day for us.
How did it go? Pretty well! We did uncover some issues that needed working through, but I would generally characterize them as ones that I don’t feel too bad about — issues occurring in the gaps that we knew we wouldn’t realistically be able to test well in advance. Probably the most glaring issue was that the provisional designations generated by the new code were associated with the wrong dates. That wasn’t ideal, but it was a one-off mistake, rather than an indicator of a flawed design. MPC developer Brian Burt did a fantastic job squashing this and other issues last week, allowing us to process another batch of Rubin data on Friday with nearly ideal results.
I won’t run down all of the issues that we hit, but there was a clear theme, alluded to above: the problems occurred in places where there were differences between our test environment and the actual production environment. (Twelve-factor wins again.) Or, in some areas, like the issuing of provisional designations, we simply don’t yet have a way to test the code end-to-end in a non-production setting. No surprise that that’s an area where problems surfaced.
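For readers who haven’t run into it, the “twelve-factor” reference is to the twelve-factor app methodology, which (among other things) argues for keeping test and production deployments as similar as possible and pushing every environment-specific value into configuration rather than code. Here’s a minimal sketch of that idea in Python; the variable names and services are made up for illustration and aren’t the MPC’s actual settings:

```python
import os

# A toy illustration of the twelve-factor approach: the same code runs in every
# environment, and anything that differs between "sandbox" and "production"
# lives in the environment, not in the code. All names here are hypothetical.

class Settings:
    def __init__(self, env=None):
        env = os.environ if env is None else env
        # Endpoints and credentials come from configuration...
        self.database_url = env["MPC_DATABASE_URL"]
        self.submission_queue = env["MPC_SUBMISSION_QUEUE"]
        # ...and so do switches that must never be hard-coded, such as whether
        # this deployment is allowed to issue real provisional designations:
        self.issue_real_designations = env.get("MPC_ISSUE_REAL_DESIGNATIONS") == "true"

# A pretend "sandbox" configuration, for demonstration:
sandbox = Settings({
    "MPC_DATABASE_URL": "postgresql://sandbox-host/obs",
    "MPC_SUBMISSION_QUEUE": "sandbox-submissions",
})
assert not sandbox.issue_real_designations
```

The payoff is that a test deployment can be identical to production except for those configured values, which is exactly the kind of parity we were missing in the places where problems showed up.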
Fortunately — in a certain sense — this theme came as no surprise to us. We’re well aware of the limitations in our software testing capabilities, and have been working steadily to address them. We have what we need when it comes to unit testing, but integration tests are another story: we simply don’t have non-production versions of various subsystems needed to test end-to-end workflows.
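To make the distinction concrete, here’s a rough sketch, using pytest, of what that gap looks like. The function and environment-variable names are invented for the example; they’re not part of our actual codebase:

```python
import os
import pytest

def format_designation(packed: str) -> str:
    """A pure helper like this is easy to unit test in isolation."""
    return packed.strip().upper()

def test_format_designation():
    assert format_designation(" k25a00a ") == "K25A00A"

# An end-to-end check, by contrast, needs a live non-production copy of the
# designation-issuing subsystem, so it can only run where one exists:
@pytest.mark.skipif(
    "SANDBOX_DESIGNATION_SERVICE" not in os.environ,
    reason="no non-production designation service to test against",
)
def test_issue_designation_end_to_end():
    ...
```

The first kind of test we can run anywhere; the second kind silently never runs until the missing subsystem exists outside of production, which is precisely the hole we’re working to close.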
I was not at all surprised to discover this when I arrived at the MPC. In my experience, the idea of having parallel test and prod deployments is a great example of something that’s near-universal in industry, but that a surprising number of self-taught academic software developers aren’t used to. (Another example: semantic versioning and the concept of API breakage.) Historically, MPC certainly had nothing of the sort. But that’s something I intend to make sure we fix. A lot of my preoccupation with code deployability and topics like release automation comes precisely from the fact that these practices help ensure that you can construct multiple identical deployment environments, which can then be used to support complex testing and prototyping. Since MPC’s technology systems are both complex and legacy-filled, it will take a long time to complete the evolution, but I’ve found that it’s always worth the effort.
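Since semantic versioning came up as an aside above: the core idea is just a contract about what a version number promises. Here’s a toy sketch, built around an entirely made-up function, of how API changes map onto version bumps:

```python
# A toy sketch of semantic versioning as it relates to API breakage.
# Suppose a library publishes this (hypothetical) function in version 1.4.0:

def designation_epoch(desig: str) -> int:
    """Return the year encoded in a made-up designation format."""
    return 2000 + int(desig[1:3])

# Under semantic versioning ("MAJOR.MINOR.PATCH"):
#   - Fix a bug in its behavior without touching the signature -> 1.4.1
#   - Add a new optional argument, or a new function beside it -> 1.5.0
#   - Rename it, remove it, or change what the argument means  -> 2.0.0,
#     because code written against the 1.x API would break.

print(designation_epoch("K25A00A"))  # 2025, in this invented encoding
```

The version number itself doesn’t prevent breakage, of course; its value is that it gives downstream users an honest signal about when they need to pay attention.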