PKGW https://newton.cx/~peter/ A Tool and Workflow for Radio Astronomical “Peeling” Wed, 17 Apr 2024 15:20:26 -0400 https://newton.cx/~peter/2024/peeling-tool/ <p>With last Monday’s North American total solar eclipse throwing a lot of things off (ask me about my nine-hour drive from Vermont to Boston!), this is a good week for catching up on the blog backlog. So, nearly five years after I published it, here’s a quick advertisement for a radio interferometric peeling tool (<a href="https://github.com/pkgw/rubbl-rxpackage">code</a>, <a href="https://ui.adsabs.harvard.edu/abs/2019RNAAS...3..110W">publication</a>) that I’ve developed. This post will describe the associated workflow in a bit more detail than could fit in the length-constrained Research Note (<a href="https://ui.adsabs.harvard.edu/abs/2019RNAAS...3..110W">Williams et al., 2019</a>) that presented the tool.</p> <span id="continue-reading"></span> <p>Fair warning: I’m going to get right into the weeds since this is a specialist topic.</p> <p>When you’re doing interferometric imaging with a telescope like the <a href="https://public.nrao.edu/telescopes/vla/">VLA</a>, very bright sources are often a problem. In interferometry, calibration errors generally scatter flux from true sources across your image, potentially corrupting the measurements of whatever features you’re interested in. People interested in interferometric techniques therefore often spend a lot of time worrying about the dynamic range that they’re achieving, measured as the ratio of the brightest part of the image to some kind of noise metric (e.g., <a href="https://ui.adsabs.harvard.edu/abs/2013A%26A...551A..91B/abstract">Braun 2013</a>). Standard calibration techniques should easily achieve a dynamic range much better than 100:1 (e.g., 1% scattering), and if that’s good enough, fine. But if you have a bright source in your field, you might need to use more sophisticated calibration to reach for 10000:1 or even better, because 1% of a big number might be enough to cause problems for your science.</p> <p>Standard interferometric calibration techniques are “direction-independent”: they compute various parameters that are, in a certain sense, based on the total flux arriving at the telescope from the field that it’s looking at, without paying attention to <em>where</em> exactly the flux is coming from. To get more sophisticated, often one adopts “direction-dependent” calibrations: your antennas aren’t perfect, and so the telescope’s response to flux from one part of the sky isn’t the same as its response to the same amount of flux from another part of the sky. Accounting for this reality requires a more complex instrument model and extra computation, but can yield significantly better results.</p> <p>In the simplest situation, after solving for a direction-independent (DI) calibration, you might discover that a nuisance source dominates your image. If it’s bright enough and sufficiently easy to model, you should be able to self-calibrate to obtain a direction-dependent (DD) calibration optimized for that one particular source. If that source is off in the sidelobes of your antennas, those calibration parameters might be very different from the DI calibration parameters, which are generally derived by pointing your telescope right at a calibrator.</p> <p>In this scenario, the emission from your science target(s) is still best calibrated using the DI parameters.
So, you basically want to analyze the data using two different calibrations at the same time — but how? Some radio imaging tools can do this, but if we’re not using one of them, we’re not necessarily out of luck. In the “peeling” technique (e.g. <a href="https://ui.adsabs.harvard.edu/abs/2004SPIE.5489..817N/abstract">Noordam 2004</a>), we simply subtract the source away in the visibility domain. To the extent that our system is linear, and our calibrations are invertible, and that we can model our source (hopefully all pretty good assumptions), this will get rid of the source and allow us to proceed with standard DI analysis as if it was never even there.</p> <p>For the <a href="https://newton.cx/~peter/2023/brown-dwarf-windspeeds/">Allers+ 2020</a> windspeeds paper, this was what I wanted to do. But as far as I was able to tell (and to my knowledge, this is still true), the standard <a href="https://casa.nrao.edu/">CASA</a> analysis package didn’t provide all of the tools necessary to peel. Most of the elements were there, but you need some mechanism to, essentially, invert a set of calibration gains, and I couldn't see any way to do that with the out-of-the-box tasks. Meanwhile, I couldn't find any third-party peeling tools that looked like they could plug into an otherwise fairly vanilla VLA cm-wavelength analysis.</p> <p>But the underlying computation isn’t that complex, and it can be implemented if you’re willing and able to write some code to edit your <a href="https://casa.nrao.edu/Memos/229.html">Measurement Sets</a> directly. You could do it in Python, but I had been experimenting with building an interface between the CASA data formats and the Rust language, a project called <a href="https://github.com/pkgw/rubbl">rubbl</a>. For the large, complex data sets that you get in radio interferometry, Rust is worth it — I’ve implemented data-processing steps that become 10 times faster after porting from Python to Rust, and that’s not even factoring in the way that you can implement parallel algorithms in Rust in a way that Python just can’t handle. (If you’re using <a href="https://github.com/casacore/python-casacore">python-casacore</a> naively, at least; systems like <a href="https://dask-ms.readthedocs.io">dask-ms</a> might improve things.)</p> <p>So I ended up implementing the missing step inside a toolkit called <a href="https://github.com/pkgw/rubbl-rxpackage">rubbl-rxpackage</a>, in a command-line tool called <code>rubbl rxpackage peel</code>. (There are a couple of other mini-tasks inside <code>rubbl-rxpackage</code>, but not many.)</p> <p>The <code>rxpackage peel</code> tool implements only the very specific calibration-inversion step that I found to be missing from mainline CASA. To actually implement peeling in a pipeline, you need to include the tool within a broader set of steps in your workflow. The <a href="https://ui.adsabs.harvard.edu/abs/2019RNAAS...3..110W">associated Research Note</a> describes this workflow, but length limitations forced it to be quite terse. Below I reproduce the description in the note in an expanded format.</p> <h2 id="the-peeling-workflow">The Peeling Workflow<a class="zola-anchor" href="#the-peeling-workflow" aria-label="Anchor link for: the-peeling-workflow">🔗</a></h2> <p>Assume that we have two Measurement Sets, <code>main.ms</code> and <code>work.ms</code>. 
The fundamental operation implemented by the <code>peel</code> command is to perform the following update:</p> <pre style="background-color:#2b303b;color:#c0c5ce;"><code><span>main.ms[MODEL_DATA] += </span><span> work.ms[MODEL_DATA] * (work.ms[DATA] / work.ms[CORRECTED_DATA]) </span></code></pre> <p>Basic calibrations are multiplicative: <code>CORRECTED_DATA = (calibration) * DATA</code>. So the ratio <code>DATA / CORRECTED_DATA</code> is the inverse calibration term that we need, and the effect of the command is to update a MS model with another model <em>perturbed by an inverted calibration</em>. By computing the inversion this way (as opposed to, say, futzing with gain calibration tables directly, which was the first approach I considered), we get the nice property that we can invert any multiplicative calibration, without worrying about how exactly it was derived.</p>
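<p>To make the arithmetic concrete, here’s a minimal Python sketch of the same update using <a href="https://github.com/casacore/python-casacore">python-casacore</a>. (The real tool is written in Rust on top of rubbl; a production implementation would also process the tables in chunks and guard against flagged rows and zero-valued corrected visibilities.)</p> <pre style="background-color:#2b303b;color:#c0c5ce;"><code>from casacore.tables import table

# Open the two Measurement Sets; only main.ms gets modified.
main = table("main.ms", readonly=False)
work = table("work.ms")

# The empirical inverse-calibration factor: DATA / CORRECTED_DATA.
ratio = work.getcol("DATA") / work.getcol("CORRECTED_DATA")

# main.ms[MODEL_DATA] += work.ms[MODEL_DATA] * ratio
model = main.getcol("MODEL_DATA")
main.putcol("MODEL_DATA", model + work.getcol("MODEL_DATA") * ratio)

main.close()
work.close()
</code></pre>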
<p>With this building block, we can peel. We’ll assume that we have <em>n</em> &gt; 0 bright nuisance sources to peel, numbered sequentially with an index <em>i</em> by decreasing total flux. The steps are as follows.</p> <ol> <li>Image <code>main.ms[DATA]</code>, obtaining a CLEAN component image <code>main.model</code>. (Or multiple model images, if you’re using MFS with Taylor terms, etc.). So, we’re assuming that you’ve used a task like <code>split</code> to apply your DI calibration to your dataset. In the abstract this isn’t necessary, but CASA’s triple-data-column paradigm kind of forces you into this.</li> <li>For each bright source to peel ... <ol> <li>Create a CLEAN component image <code>template.$i.model</code>, where <code>$i</code> is the index number of the bright source in question. In this component image, zero out the sources with index numbers <em>j</em> ≤ <em>i</em>. That is, edit the actual component model image data to replace the pixel values around the bright source and the already-peeled ones with zeros. This requires some custom Python code, but it's straightforward and not I/O-intensive.</li> <li>Use the task <code>ft</code> to fill <code>main.ms[MODEL_DATA]</code> with the Fourier transform of <code>template.$i.model</code>. This set of model visibilities therefore captures the non-nuisance sources, and any bright sources that we haven’t yet started dealing with.</li> <li>For each previously peeled source with index <em>j</em> &lt; <em>i</em>, use the peeling tool with its work MS (see below). This will update <code>main.ms[MODEL_DATA]</code> to add in the best-available DD-calibrated models for these sources. Once this is done, the model will capture everything in the image <em>except</em> the <em>i</em>th nuisance source, because we zeroed it out when creating <code>template.$i.model</code>. We also zeroed out the <em>j</em> &lt; <em>i</em> sources, but now we’ve added them back in.</li> <li>Clear <code>main.ms[CORRECTED_DATA]</code> and use the task <code>uvsub</code>, which will set <code>CORRECTED_DATA = DATA - MODEL_DATA</code>. This will leave the <code>CORRECTED_DATA</code> column containing <em>only</em> source <em>i</em> — the things that are <em>not</em> in the model are the ones that remain after we subtract the model. Of course, this only holds to the extent that our calibrations and models are accurate, but if we’re focusing on mitigating bright nuisance sources, the imperfections shouldn’t be significant, by definition.</li> <li>Use the task <code>split</code> to create a new dataset <code>work.$i.ms</code>. Its <code>DATA</code> column will be equal to this <code>CORRECTED_DATA</code> column, so it will contain only the signal from source <em>i</em>. This signal has had the DI calibration applied, but still could benefit from additional DD calibration.</li> <li>Fill <code>work.$i.ms[MODEL_DATA]</code> with a model of source <em>i</em>, using whatever standard CASA tools are appropriate. E.g., you might use the “component list” functionality and the <code>ft</code> task.</li> <li>Use CASA’s standard calibration routines to determine a source-specific self-calibration for source <em>i</em>. You can use whatever calibrations are appropriate, so long as their net result is multiplicative on a per-visibility basis. You don’t have to worry about absolute flux calibration since we only care about removing the source in the “uncalibrated frame”. We now have the DD calibration for our source, since we subtracted all of the flux for everything that is <em>not</em> our source.</li> <li>Use task <code>applycal</code> to, well, apply the calibrations, filling in <code>work.$i.ms[CORRECTED_DATA]</code>. The only reason we need to do this is so that the peeling tool can then figure out the inverse calibration by computing the ratio <code>DATA / CORRECTED_DATA</code>. If there were a way to directly invert the calibration parameters obtained in the previous step, we could skip this step, but I would only feel confident doing so with unrealistically trivial calibrations.</li> </ol> </li> <li>After doing the above, we have <em>n</em> copies of our dataset, stored in the <code>work.$i.ms</code>, each of which encodes the DD calibration solution associated with its corresponding bright nuisance source. Clear <code>MODEL_DATA</code> and <code>CORRECTED_DATA</code> in <code>main.ms</code>.</li> <li>Now, for each bright source, use the peeling tool. This will build up the <code>MODEL_DATA</code> column of <code>main.ms</code> to contain only the DD-calibrated models of the bright nuisance sources. Note that, if the calibrations are multiplicative as required, the actual data “disappear” in the ratio of <code>DATA / CORRECTED_DATA</code>, so there is no danger of faint signals making their way into this model (which would be bad because they’re about to be subtracted from the science data). Put another way, this column will be noise-free. It contains a sum of ratios of analytically-derived quantities: source models (the <code>MODEL_DATA</code> in the <code>work.$i.ms</code>) divided by instrumental (calibration) models (more precisely, multiplied by the reciprocal <code>DATA / CORRECTED_DATA</code>).</li> <li>Finally, we can use <code>uvsub</code> to subtract this model from the science data. <code>main.ms[CORRECTED_DATA]</code> will now contain the science data with the bright sources peeled. You can then proceed to image, selfcal, etc.</li> </ol>
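<p>For orientation, here is a schematic Python sketch of the per-source loop, written against CASA 6’s <code>casatasks</code> module. It is emphatically not a drop-in pipeline: the imaging and model-editing steps are elided, the <code>gaincal</code> parameters depend entirely on your data, and the command-line invocation of the peel tool is illustrative rather than its documented interface.</p> <pre style="background-color:#2b303b;color:#c0c5ce;"><code>import subprocess
from casatasks import applycal, ft, gaincal, split, uvsub

n = 2  # number of bright nuisance sources (illustrative)

for i in range(n):
    # Fill main.ms[MODEL_DATA] from the edited component image (step 2).
    ft(vis="main.ms", model=f"template.{i}.model", usescratch=True)

    # Add back the DD-calibrated models of already-peeled sources (step 3).
    for j in range(i):
        subprocess.run(
            ["rubbl", "rxpackage", "peel", "main.ms", f"work.{j}.ms"],
            check=True,
        )

    # "Clear" CORRECTED_DATA back to DATA (elided), then subtract the
    # model so that only source i remains (step 4).
    uvsub(vis="main.ms")

    # Split out a dataset containing only source i (step 5).
    split(vis="main.ms", outputvis=f"work.{i}.ms", datacolumn="corrected")

    # Model source i (elided), self-calibrate against that model, and
    # apply the solutions to fill CORRECTED_DATA (steps 6-8).
    gaincal(vis=f"work.{i}.ms", caltable=f"work.{i}.gcal", solint="int")
    applycal(vis=f"work.{i}.ms", gaintable=[f"work.{i}.gcal"])
</code></pre>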
<p>Does it work? It sure does! The results were actually better than I had dared hope.</p> <div> <figure> <img src="rnaasab35d5f1_lr.jpg" alt="Demonstration of the workflow with a 10 hr VLA observation. The left panel shows the image before peeling, the right after. Color scales are the same. The circles indicate the half-power response of the VLA primary beam at the central observing frequency, 6 GHz. The CLEAN residual rms at the pointing center decreases from 3.9 to 2.5 μJy. From Williams et al., 2019."> <figcaption>Demonstration of the workflow with a 10 hr VLA observation. The left panel shows the image before peeling, the right after. Color scales are the same. The circles indicate the half-power response of the VLA primary beam at the central observing frequency, 6 GHz. The CLEAN residual rms at the pointing center decreases from 3.9 to 2.5 μJy. From Williams et al., 2019.</figcaption> </figure> </div> <p>I had probably been wishing for this functionality for, say, five years, before I sat down and implemented it. Rust helped, but really the breakthrough was realizing that I could invert the calibration gains by computing <code>DATA / CORRECTED_DATA</code>. Before that (in retrospect, obvious) idea came to me, I was getting hung up on how to invert a CASA gains table — you can take the reciprocals of all of the numbers in the table, but how does that interact with time-based interpolation? What if you also want to apply a bandpass correction? It just seemed like it was going to be really dodgy. The “empirical” approach isn’t as efficient, but it’s robust, and that’s a tradeoff I’m generally happy to make.</p> The xz Backdoor and Release Automation Thu, 04 Apr 2024 10:37:08 -0400 https://newton.cx/~peter/2024/xz-and-release-automation/ <p>Some of you may have spent a lot of the past week following the drama ensuing from the discovery of <a href="https://www.schneier.com/blog/archives/2024/04/xz-utils-backdoor.html">a backdoor inserted into the open-source <code>xz</code> library</a>. Others of you probably have no idea what I’m talking about. There’s been more than enough chin-stroking on this topic over the past week, but I want to at least point out a connection to a topic near and dear to my software-engineering heart: release automation.</p> <span id="continue-reading"></span> <p>The short version of the <code>xz</code> story is that, more or less by luck, an engineer discovered that someone had subtly inserted a malicious <a href="https://en.wikipedia.org/wiki/Backdoor_(computing)">backdoor</a> into a low-level compression library called <code>xz</code>. The backdoor was only introduced a few months ago, so it didn’t really have time to spread much around the internet — but it was designed in a way such that if it had, it could plausibly have given the perpetrator a nearly-invisible way to hack into huge numbers of computers all across the internet. Yikes!</p> <p>The scary thing is that the “someone” who inserted the backdoor was the nominal maintainer of the package — the person in charge of it. This “person”, operating under the name Jia Tan, almost certainly doesn't actually exist. And they weren’t the <em>original</em> maintainer of the package.</p> <p>With hindsight, it appears that <code>xz</code> was subject to a multi-year, state-sponsored social engineering attack. The original, non-nefarious developer posted in 2021 about being too burned out to maintain the library; almost immediately, “Jia Tan” appeared and started helping out. Eventually “Jia Tan” was given maintainership rights, and finally, after a few years, attempted to sneak in their malicious code.
I’m no security expert, but I agree with most of the armchair folks out there that between the long time horizon of the attack and the complexity of its implementation, it seems overwhelmingly likely that it was supported by an organization of national scale.</p> <p>As a side note, you might recall that I <a href="https://newton.cx/~peter/2023/seeking-tectonic-co-maintainers/">posted about seeking new Tectonic co-maintainers</a> and eventually <a href="https://github.com/tectonic-typesetting/tectonic/discussions/1142">added two people</a>. I remain very happy that I did! They were both previous contributors and during that process I got some information about their offline identities; I can’t say that there’s <em>zero</em> chance that I’m not the subject of a sophisticated intelligence op, but I’m not losing sleep over it and I don’t think that you should either. (Also, hindsight further suggests that the first step of the social-engineering campaign against the original <code>xz</code> maintainer was targeted bullying about his failure to be sufficiently responsive to feature requests; although it would be hard to distinguish this from everyday life as an open-source maintainer! Either way, that’s thankfully not something that I’ve experienced with Tectonic.)</p> <p>Anyway. The discovery of this attack has prompted a lot of musing about the open-source ecosystem; <a href="https://mastodon.world/@mikey@soylent.green/112186694087462654">this toot</a> from Mikey Dickerson (or is it “Mikey Dickerson”?) put it well:</p> <blockquote> <p>hey does anybody out there have any thoughts about the xz compromise or perhaps have you thought of a way to relate it to some axe you have been grinding for 20 years</p> </blockquote> <p>I sure do, and I sure have! But here I’m going to keep it to one specific axe that’s only a few years old.</p> <p>For something like the <code>xz</code> attack to succeed, the malicious code has to be well-hidden. People use libraries without reading their source code all of the time, but if your repository contains a big function called <code>trojan_horse_implementation</code>, someone is <em>eventually</em> going to look into it. You also only want to enable the code in narrow circumstances, so that routine security tests don’t even see that it’s there. A lot of the sophistication in the <code>xz</code> attack is how the malicious payload was obfuscated and disabled altogether when it wasn’t needed. One component of this was that the code that developers download and install from GitHub is unproblematic; a single, innocuous-looking line inserted into one of the build scripts in the packaged release files starts up the whole machinery that activates the hack.</p> <p>This has gotten some people worried, and rightly so. Generally, developers use the GitHub version of a piece of software, but most deployed versions (i.e., the stuff that ends up installed on thousands or millions of computers) are based on release packages created by the maintainer. The release package is supposed to capture what’s on GitHub, but usually the maintainer creates it by hand, so really it could contain … <em>anything</em>. If I offer a compiled version of a program, for instance, I could compile in whatever extra code that I want, and the nature of the packaging and compilation process makes it effectively impossible to verify that the “release artifacts” truly correspond to the public version of the source code found on GitHub.
This system requires you to completely trust the maintainer who creates the releases — which is now feeling like a scarier proposition than it used to!</p> <p>How can we mitigate this? Well, I claim that if we want to be able to verify that release artifacts truly correspond to their alleged source code, we need to automate the processes by which they are produced. And would you know that I’ve been excited about release automation <a href="https://newton.cx/~peter/2020/implementing-software-versioning/">for several years now</a>? Once you ask the question “why are maintainers creating these things by hand, anyway?”, I feel that the proper solution becomes self-evident.</p> <p>That being said … why <em>are</em> maintainers creating these things by hand? In my view, a big reason is simply inertia. The counter-question — “what’s the alternative to making a release by hand?” — only has an answer thanks to the existence of publicly-available, cloud-based <a href="https://www.redhat.com/en/topics/devops/what-is-ci-cd">CI/CD</a> (“continuous integration and deployment”) services, and I think that a lot of projects still really haven’t internalized the kinds of workflows that these services unlock. It’s been interesting to watch the evolution in this space. When I started using Travis CI, it was basically a way to trigger a VM to run test suites for various programming languages. But at some point we collectively realized that if you can do that, you can really run <em>any</em> kind of code — these services are really free, easy-to-configure platforms for cloud compute on demand, that just happen to have their workflows tied to Git repositories. (I have <em>no</em> idea how these services prevent people from abusing them to mine crypto; I’d guess that they have whole teams dedicated to stopping just that.) <a href="https://conda-forge.org/">Conda-forge</a> was quite ahead of its time in realizing that you could use CI/CD to build and publish software packages, but that sort of insight hasn’t sunk in everywhere.</p> <p>The other piece is that there are some genuine workflow problems that need to be solved in order for maintainers to create high-quality release artifacts on these public CI/CD systems. Admittedly, a lot of maintainers have been perfectly happy with the status quo, but I feel that there are some subtle issues at the root of this activity that need to be addressed carefully. This problem bothered me for <em>years</em> before <a href="https://newton.cx/~peter/2020/implementing-software-versioning/">I first sketched out a solution that felt adequate</a>, and then I had to create a whole <a href="https://pkgw.github.io/cranko/">new tool</a> and <a href="https://pkgw.github.io/cranko/book/latest/jit-versioning/">workflow</a> to implement it. While the ideas behind <a href="https://pkgw.github.io/cranko/">Cranko</a> are, I think, quite general, it’s also true that its approach benefits greatly from the sophistication of modern CI/CD systems; its release automation would be a lot more annoying without <a href="https://azure.microsoft.com/en-us/products/devops/pipelines">Azure Pipelines’</a> tools for accumulating artifacts and managing multi-stage builds across the Linux, Windows, and Mac platforms. That is to say, there are a <em>lot</em> of pieces of infrastructure that need to come together to make high-quality release automation feasible.</p> <p>Fortunately, those pieces currently exist.
Release automation on public CI/CD is far from an airtight solution, of course — a malicious maintainer can insert obfuscated build steps, use hidden environmental settings, or simply replace automatically-generated release artifacts with tampered ones after the fact. But it at least <em>enhances</em> our ability to audit release artifacts and understand how they’re produced. I didn’t have the supply-chain security angle in mind when I developed <a href="https://pkgw.github.io/cranko/">Cranko</a>, but I wouldn’t be surprised if people start adding release automation to the list of security-enhancing practices that they want to see open-source projects adopt.</p> DASCH Scanning is Complete — What’s Next? Thu, 28 Mar 2024 02:00:00 -0400 https://newton.cx/~peter/2024/dasch-scanning-completion/ <p>Today’s the day! After two decades of work, the DASCH scanning “prime mission” is now complete. Today we celebrated the scanning of the final DASCH-able plates in the HCO collection with a small event and a champagne toast.</p> <span id="continue-reading"></span> <p>This is the end of a chapter, but far from the end of the story. Just how much work is left to do is a question whose answer depends on how broad of a perspective you want to take.</p> <p>When it comes to making the existing DASCH data available and useful to the astronomical research community, there’s still plenty to be done. In the near future, that work will mostly take the form of the next data release, DR7, currently under development as <a href="https://newton.cx/~peter/2024/dasch-drnext-beta/">DRnext</a>. As my recent posts have perhaps conveyed, this is going to be a lot of effort — there’s basically an unbounded amount of work that could go into writing documentation, fleshing out software like <em>daschlab</em>, and other support activities — let alone actual improvement of the underlying data products. But DR7 also needs to come out in a finite amount of time. I can already tell that it’s going to be difficult to draw the line at which the polishing has to stop and the thing simply has to get out the damn door.</p> <p>Regardless of where exactly that line gets drawn, there’s absolutely no chance that DR7 will completely plumb the depths of the DASCH data. You could spend the rest of your life improving and expanding the analysis of a dataset as large and heterogeneous as DASCH. Whether we’ll see Data Releases 8, 9, <em>etc.</em> off into the future is, as ever, going to depend on money. If you ask me, a dataset as rich and unique as DASCH certainly deserves lots of funding to analyze and enhance it — and that’s not even considering its cultural and historical implications — but lots of projects deserve funding.</p> <p>It’s also important to underline that “DASCH-able plates” represents only a portion of the Plate Stacks holdings. About 430,000 plates have been scanned for DASCH, compared to an estimated 550,000–600,000 plates in the entire collection. This gap exists because DASCH explicitly focused on plates suitable for astronomical photometric and time-domain analysis. Plates that did not fit this definition include solar, lunar, and eclipse observations, and above all, the spectrum plates, containing the hundreds of thousands of stellar spectra studied by Annie Jump Cannon. These plates might not be useful for time-domain photometry, but that is not at all to imply that they're not valuable in their own right.
For instance, it’s believed that the spectrum plates could provide unique insight into climate change by measuring the gradual changes in the absorption lines that Earth’s atmosphere imprinted on the data.</p> <p>But wait, there’s more! The Plate Stacks holdings also include film, written materials, and other artifacts. We can consider DASCH to be one tentpole among many in the broader effort to digitize the holdings of the Plate Stacks. It was certainly the tallest tentpole, but others remain.</p> <p>Heading in a different direction, we can look beyond the Plate Stacks collection. <a href="https://dasch.cfa.harvard.edu/scanner/">The DASCH scanner</a> is still going strong, and I think it’s fair to say that in many ways it remains the highest-quality plate scanner in the world. It’s very tempting to consider digitizing other plate collections. This presents logistical challenges: the scanner is essentially immovable, but large collections of glass plates are difficult and expensive to transport as well. It’s also true that the scanner is a one-of-a-kind piece of hardware, analogous to a telescope instrument. Many of its components are virtually irreplaceable — and getting quite old. If you wanted me to promise that the system will keep working for the next decade, you’d have to give me a fairly significant sum of money to invest in documenting the existing setup, assessing potential failure modes, and developing recovery plans.</p> <p>Granting that baseline risk, the Plate Stacks are already in possession of a few external plate collections whose digitization with the DASCH scanner we hope to demonstrate. We have copies of <a href="https://skyserver.sdss.org/dr5/en/proj/advanced/skysurveys/poss.asp">POSS-I</a>, the ESO Southern Sky Survey (which appears to be surprisingly undocumented, but see <a href="https://ui.adsabs.harvard.edu/abs/1980Msngr..22....7S">Schuster in <em>Messenger</em>, 1980</a>), the <a href="https://en.wikipedia.org/wiki/Science_and_Engineering_Research_Council">SERC</a> Equatorial Red survey (SERC-ER) and associated <a href="https://aat.anu.edu.au/about-us/AAT">AAO</a> Second-Epoch Survey (<a href="https://ui.adsabs.harvard.edu/abs/1989BICDS..37...13L">AAO-SES</a>, aka AAO-R), and the Palomar “Quick V” (Pal-QV, <a href="https://ui.adsabs.harvard.edu/abs/1990AJ.....99.2019L">Lasker+ 1990</a>) survey done in support of the Hubble Guide Star Catalog project. Together these form a substantial fraction of the plates that went into building the <a href="https://archive.stsci.edu/missions-and-data/dss--gsc">Digitized Sky Survey</a>. I’d love to compare DASCH results to the DSS data! This could dovetail beautifully with my longstanding desire to upgrade the “DSS Terapixel” map that’s the default optical basemap for <a href="https://worldwidetelescope.org/home/">WorldWide Telescope</a>, which is generally great but has a few longstanding issues (most notably for scientists, a few-arcsecond global astrometric offset). What’s kind of amazing is that, as far as I’ve been able to figure, digitizations of these photographic surveys <em>still</em>, in the year 2024, represent the best way to obtain a deep, high-resolution optical map of every last nanosteradian of the sky. 
Modern surveys go awfully deep and awfully wide, but I’m not aware of a set of surveys that can be homogeneously combined across the entire sphere.</p> <p>To try to make concrete the sophistication of the DASCH system — not just the scanner, but the people and processes surrounding it — I’ll note that we could probably digitize all of POSS-I in a couple of good weeks.</p> <p>Finally, a call to action: these are things that I would like to see happen, but in truth probably <em>none</em> of them are going to occur (except DR7) without community support and collaboration. If any of these topics are at all interesting to you, please let me know! I would genuinely love to help other groups launch their own projects based on the DASCH data, the scanner, or other elements of the Plate Stacks collection.</p> Beta-Testing DASCH “DRnext” Thu, 21 Mar 2024 11:06:05 -0400 https://newton.cx/~peter/2024/dasch-drnext-beta/ <p>The work on <a href="https://dasch.cfa.harvard.edu/">DASCH</a> continues to move forward! Yesterday, I posted a first draft of a new set of resources for astronomers. These are collected under the <a href="https://dasch.cfa.harvard.edu/drnext/">DASCH DRnext</a> moniker and are now ready to be checked out.</p> <span id="continue-reading"></span> <p>The fact underlying the <a href="https://dasch.cfa.harvard.edu/drnext/">DRnext</a> designation is that while DASCH has historically had a series of “data releases” (DRs), they weren’t really releases in the usual sense. Normally a DR is associated with a specific set of immutable artifacts, so that if you choose to work with, say, <a href="https://skyserver.sdss.org/dr12/">SDSS DR12</a>, you’ll always get exactly the same results if you repeat the same queries. But DASCH only has one set of data servers, and they're always being updated as scanning proceeds and the data processing gets refined, so we’re unable to provide locked-in artifacts. Historically, the DASCH DRs were basically about lifting restrictions on public access to certain portions of the sky.</p> <p>It would be nice to be able to provide traditional, immutable DRs, but with the current resources and system architecture that’s not feasible. In the meantime, the “DRnext” label is the place where the latest-and-greatest DASCH docs and tools will accumulate — sort of like the <code>main</code> branch of a repository as opposed to a versioned release.</p> <p>So, what’s just landed on <code>main</code>?</p> <p>The centerpiece of DRnext is an effort to deliver better tools for scientific data analysis via <em>daschlab</em>, the Python package that I’ve <a href="https://newton.cx/~peter/2024/daschlab-sneak-peek/">mentioned here</a> <a href="https://newton.cx/~peter/2024/fun-python-filtering-pattern/">a few times</a>.
In support of this, we now have:</p> <ul> <li><a href="https://www.youtube.com/watch?v=GofXy8BZxjY">The demo video I posted about earlier</a></li> <li><a href="https://daschlab.readthedocs.io/">Python API reference docs</a> (which I also mentioned earlier)</li> <li><a href="https://dasch.cfa.harvard.edu/drnext/rycnc/">A tutorial slideshow</a> that lets you work through the notebook shown in the video, via <a href="https://mybinder.org/v2/gh/pkgw/daschlab/HEAD?labpath=notebooks%2FRY+Cnc.ipynb">a MyBinder notebook</a></li> <li><a href="https://dasch.cfa.harvard.edu/drnext/reduce-lightcurve/">The beginnings of a lightcurve reduction cookbook</a> based on <em>daschlab</em> (although there's no reason you couldn’t use the same techniques in a different data-analysis system)</li> <li><a href="https://dasch.cfa.harvard.edu/drnext/install-daschlab/">Instructions for installing <em>daschlab</em> locally</a></li> </ul> <p>Integrated with this are several other new resources:</p> <ul> <li>An <a href="https://dasch.cfa.harvard.edu/drnext/introduction/"><strong>Introduction to DASCH</strong> slideshow</a> aimed at astronomers</li> <li>A cloud-powered <a href="https://mybinder.org/v2/gh/pkgw/daschlab/HEAD?labpath=notebooks%2FQuicklook.ipynb">sample “quicklook” notebook</a> aiming to provide an alternative to the <a href="http://dasch.rc.fas.harvard.edu/lightcurve.php">Cannon web-based plotting tools</a></li> <li>Reference documentation of the <a href="https://dasch.cfa.harvard.edu/drnext/lightcurve-columns/">DASCH lightcurve table columns</a></li> <li>Thorough cross-referencing to the newly-launched <a href="https://starglass.cfa.harvard.edu/">StarGlass</a> website and API where appropriate</li> <li>Reorganizing the existing material to hopefully make it more manageable</li> </ul> <p>If you look at the <a href="https://dasch.cfa.harvard.edu/drnext/">DRnext landing page</a>, I have very plainly used <a href="https://newton.cx/~peter/2023/divio-documentation-system/">the Diátaxis (née Divio) documentation model</a> for organizing things. This is the first time that I’ve put together a landing page with an explicit Tutorial/How-To/Explainer/Reference breakdown; in this particular case, at least, I think it works well. (All of this might also help explain why I’ve been <a href="https://newton.cx/~peter/2024/digital-docs-are-web-apps/">thinking about</a> digital documentation <a href="https://newton.cx/~peter/2024/digital-docs-have-apis/">lately</a>, although that’s something I do pretty often regardless.)</p> <p>I’ve tried a couple of new things in assembling this documentation. One is the way that I’ve presented the <a href="https://dasch.cfa.harvard.edu/drnext/introduction/">“Introduction to DASCH” material</a> as a web-based slideshow; historically I would have written this kind of material up as a brief document. I find it a little painful to admit, but more and more I feel like slide decks are a good way to deliver this kind of information; the pagination breaks things up into digestible chunks in a way that’s easier to approach than even a relatively short single-column presentation. (Drive-by hot take: in practice if not stated intent, <a href="http://brokestream.com/tex-web.html">Knuth’s literate programming</a> is basically a way to PowerPoint-ify source code.) 
I would hate to only deliver this kind of slideshow in a proprietary format unrecognized by the browser, like a PPTX file, but since I can deliver the slides seamlessly using <a href="https://revealjs.com/">reveal.js</a>, I’m much more comfortable making them the exclusive publication format for this information.</p> <p>Building on that, <a href="https://dasch.cfa.harvard.edu/drnext/rycnc/">the <em>daschlab</em> RY Cnc tutorial</a> is delivered as yet another slideshow, one in which most of the slides contain video clips showing how the software is used interactively. The idea is that people can follow along with the slides, and play and replay the clips, while using the notebook in a separate browser window. I think that video examples are super important for showing people how to use the highly interactive <em>daschlab</em> software; hopefully the slide-based presentation once again breaks things into nicely digestible chunks. It turned out to be relatively easy to record one long screencast of myself running through the notebook, then to use <a href="https://kdenlive.org/">Kdenlive</a> to slice it up into short clips suitable for each slide.</p> <p>Finally, I’m hopeful that the <a href="https://mybinder.org/v2/gh/pkgw/daschlab/HEAD?labpath=notebooks%2FQuicklook.ipynb">“cloud quicklook notebook”</a> idea will provide a decent alternative to DASCH’s legacy web-based lightcurve plotting tool. While the legacy tool is not super featureful, can produce genuinely misleading output, and is nigh-unmaintainable, it is <em>clearly</em> really important to give people a way to peek at some data right in their web browser, and that’s something that’s generally tricky with Jupyter/code-oriented tools. I’m crossing my fingers that a lightweight notebook can meet that need, and hopefully the “quicklook” framing will help people see it in the right light.</p> <p>There are plenty of systems aiming to integrate this kind of notebook into web sites/apps in a much smoother way, but I don’t think that DASCH has the resources to pursue them right now. So for the time being the quicklook is based on <a href="https://mybinder.org/">MyBinder</a>, which is a great service, especially because it’s free … but it’s slow, and not always reliable. Even if we don’t add some slick <a href="https://solara.dev/">Solara-type</a> integration where you don’t even realize that you’re running code, I think that the UX would feel a <em>lot</em> better if the notebook could reliably spin up in just five or ten seconds. One of the next things I want to do is look into setting up <em>daschlab</em> on a science platform service like <a href="https://sciserver.org/">SciServer</a>, or something that can hopefully offer ease-of-use comparable to MyBinder with better performance. Here at the CfA, there’s a vision of building a “Nexus” science platform (based on <a href="https://data.lsst.cloud/">Rubin</a>). It’s still in the early stages, so it’s not something that I expect to be able to integrate any time soon, but it would be an <em>ideal</em> home for this kind of capability. </p> Digital Documents have APIs Thu, 14 Mar 2024 10:25:41 -0400 https://newton.cx/~peter/2024/digital-docs-have-apis/ <p>Following up on last week’s viral sensation <a href="https://newton.cx/~peter/2024/digital-docs-are-web-apps/">Digital Documents are Web Applications</a>, I want to add another idea into the mix.
Here’s an additional thing to keep in mind about digital technical documents: yes, they’re incarnated as web applications. But unlike many web applications, they also expose nontrivial APIs — by which I mean that they expose interfaces aimed not just at <em>humans</em> (the user interfaces) but ones aimed at computers as well. In particular, the primary API exposed by a digital technical document is its cross-referencing structure.</p> <span id="continue-reading"></span> <p>Consider the reference documentation for a Python library like <a href="https://numpy.org/">Numpy</a>. It describes various symbols exposed by the package, like <a href="https://numpy.org/doc/stable/reference/generated/numpy.hstack.html"><code>numpy.hstack()</code></a>. If I’m documenting my own Python library, I might want to reference that piece of documentation from my own text. Any reference like this implies a sort of contract — I’m expecting, or at least hoping, that the URL embedded in the link above will deliver you to documentation about the <a href="https://numpy.org/doc/stable/reference/generated/numpy.hstack.html"><code>numpy.hstack()</code></a> function and that it will continue doing so into the future. You could quibble about the terminology, but I’m happy to refer to any sort of contract between two distinct digital systems as an API — and that’s what we have here. (The bare word “interface” might be better, except that I think it too easily conjures up the idea of a <em>user</em> interface, which is specifically not what I want to focus on.) My document depends on a thing that I can call “the Numpy docs”, and in particular it depends on the Numpy docs providing a thing called “the <code>numpy.hstack()</code> docs”.</p> <p>The most fundamental API offered by any web document is its URL structure: what links into it are valid? Through this lens we can see <a href="https://en.wikipedia.org/wiki/Link_rot">linkrot</a> as a sort of API break, which feels right to me: you told me that I could rely on this thing, and now you’ve gone and taken it away. Please don’t do that!</p> <p>URLs, however, are semantically opaque in the aggregate. And I’d argue that one characteristic of technical documents is that they tend to expose <em>multiple</em> referencing structures that wish to be <em>semantically rich</em>. Think of a long scientific article. Discussing its contents, I might want to refer to Figure 1, Figure 2, Figure 3; or Table 1, Table 2, Table 3; or Equation 1; or Section 2; or Reference #3 (if its references are numbered); and so on. While each of these references can likely be resolved — flattened — into a specific URL in the case of a digital article, as an author I want to work at a higher level.</p> <p>But the phrase “want to” isn’t strong enough. In the technical sphere, the ability to make reliable, semantically-rich references among documents is <em>essential</em> functionality. People usually agree that scientists use TeX because of its ability to render equations, but I’ll claim that <a href="https://www.bibtex.org/">BibTeX</a> is just as important. The ability to write <code>\citep{latest.astropy}</code> and get a properly formatted reference (both in-text and metadata in the References section at the bottom) is <em>huge</em>. I’m happy to go even farther and argue we can think of the institutional architecture of worldwide academic publishing as being largely designed to promote reference-ability.
If you hand me a journal name, volume, and page number, I can probably locate the article that you’re talking about, even if it was published a century ago; contrast that with the durability of your typical web link. I’ve seen people talk about how impressed they are by how academics do such a good job of citing prior work, and I think that you can explain a lot of that as being because <em>we have designed our entire profession to make this kind of citation robust and convenient</em>. The framework of citation allows us to build a (mostly) coherent intellectual edifice out of individual workers’ labor.</p> <p>And we see the same phenomenon in software! As programming languages have evolved, they’ve tended to provide increasingly sophisticated and reliable systems for expressing and resolving dependencies between independent pieces of software: from downloading libraries from random websites, to <a href="https://www.cpan.org/">CPAN</a> to <a href="https://pypi.org/">PyPI</a> to <a href="https://www.npmjs.com/">NPM</a> and <a href="https://crates.io/">Cargo</a>. Just as in the case of scholarship, these affordances unlock scalability. While it’s hard to build a C project with more than a handful of external dependencies, Python projects easily have dozens, and a Rust project like <a href="https://tectonic-typesetting.github.io/">Tectonic</a> has hundreds. The key distinction between software and scholarship is the semantic richness of the interfaces between items. Scholarly citations are famously about as empty as you can get: if you write a paper saying “Williams <em>et al.</em> (2020) is garbage” or “Williams <em>et al.</em> (2020) is brilliant”, it’s a citation either way. Meanwhile, the APIs that capture the relationship between software components are complex and getting more so all the time: from function prototypes and header files in C to complex type systems with public/private visibility annotations in languages like Rust or Go.</p> <p>Technical documentation lags behind all of this. Within the framework of a specific programming language, cross-referencing is usually well-supported: tools like <a href="https://docs.readthedocs.io/en/stable/guides/intersphinx.html">Intersphinx</a> enable this in Python, and <a href="https://doc.rust-lang.org/rustdoc/">rustdoc</a> has it built in. But if I want to reference a Python method from the documentation of a Rust project? Or if I want to link to a particular passage of an online manual in a way that will keep working when the manual’s authors inevitably change their URL structure? I’m on my own.</p>
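<p>For concreteness, here is roughly what the well-supported intra-Python case looks like: a minimal sketch of a Sphinx <code>conf.py</code> using Intersphinx. The URL is Numpy’s real documentation root; the mapping key is your own choice.</p> <pre style="background-color:#2b303b;color:#c0c5ce;"><code># conf.py: declare that this project's docs depend on the Numpy docs.
extensions = ["sphinx.ext.intersphinx"]

intersphinx_mapping = {
    # None means "fetch the standard objects.inv inventory from this root".
    "numpy": ("https://numpy.org/doc/stable/", None),
}
</code></pre> <p>With that in place, a reference like <code>:func:`numpy.hstack`</code> resolves against the machine-readable object inventory that the Numpy docs publish — exactly the kind of document “API” I’m describing, but one that stops at the edge of the Python silo.</p>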
<p>For all of the reasons given above, I think that the lack of infrastructure in this area really limits our ability to create great digital technical documents. When it comes to declaring and resolving dependencies, we probably have the baseline of what we need — namely, URLs and the web. But right now we don’t have the tools to make explicit the “APIs” (cross-referencing structures) exposed by documents, and in my view, this lack makes it so that we don’t do a good job of rationalizing those APIs or monitoring compatibility. Further, it prevents us from building authoring tools that can span multiple technical silos. This feels to me like a very solvable problem.</p> Sneak Peek: daschlab Thu, 07 Mar 2024 10:53:26 -0500 https://newton.cx/~peter/2024/daschlab-sneak-peek/ <p><a href="https://newton.cx/~peter/2024/fun-python-filtering-pattern/">Recently</a> I mentioned that I’ve been working on a Python package called <a href="https://github.com/pkgw/daschlab">daschlab</a>, which will be the recommended analysis toolkit for <a href="https://dasch.cfa.harvard.edu/">DASCH</a> data. It’s designed for interactive data exploration, so I thought that I’d make a video giving a sense of what it’s like.</p> <span id="continue-reading"></span> <p>Here it is!</p> <div class="autoresize-iframe-4by3"> <div> <iframe src="https://www.youtube.com/embed/GofXy8BZxjY" webkitallowfullscreen mozallowfullscreen allowfullscreen> </iframe> </div> </div> <p>I haven’t yet written up a <em>lot</em> of the needed documentation, but if you’re feeling adventurous you can play with it today. Besides local installation, <a href="https://mybinder.org/v2/gh/pkgw/daschlab/HEAD">this MyBinder link</a> will load up a JupyterLab environment with daschlab installed. Any data that you download won’t persist, but you should be able to try a few things out.</p> <p>There’s also some <a href="https://daschlab.readthedocs.io/">API reference documentation</a>, but these docs are intentionally low-level; there are no installation instructions, tutorials, etc. That kind of stuff will end up on <a href="https://dasch.cfa.harvard.edu/">the main DASCH website</a>.</p> <p>In the course of making this video I went through my quasi-annual revisitation of what it takes to do efficient desktop video capture on Linux; a lot of that stuff is extremely hardware- and distribution-specific, but if you want to see what a glutton for punishment I am, the gory details are documented on my <a href="https://newton.cx/~peter/howto/capture-video-efficiently-on-linux/">Capture Video Efficiently on Linux</a> and <a href="https://newton.cx/~peter/howto/get-a-standard-browser/">Get a Standardized Browser Window</a> HOWTOs.</p> Get a Standardized Browser Window Wed, 06 Mar 2024 12:08:06 -0500 https://newton.cx/~peter/howto/get-a-standard-browser/ <p>To demonstrate an interactive webapp, it can be very helpful to record a screencast of its usage. If you just pop open your day-to-day web browser and record something, it <em>might</em> be fine, but there are two concerns. First is privacy/professionalism: if you type something in the URL bar, say, it will likely autocomplete items from your personal history. Second is repeatability: if you’re making a series of related videos, or updating an existing one, ideally the browser environment would change as little as possible; the window size in particular should stay the same.</p> <p>Here are some steps to create a repeatable browser environment using Firefox. Nothing here is magic; there are surely other approaches that would also work, but this is what I do.</p> <p>The key is that Firefox takes a command-line argument <code>--profile &lt;path&gt;</code> that lets you specify a custom input profile directory. The basic idea is to set up an extremely generic browser profile in a custom directory, then archive it. Every time you want to record a video, unpack a copy of the archive, use it, and then throw it away.</p>
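<p>In script form, that cycle might look something like this minimal sketch (the archive name and paths are illustrative):</p> <pre style="background-color:#2b303b;color:#c0c5ce;"><code>import shutil
import subprocess
import tarfile
import tempfile

# Unpack a fresh copy of the archived profile into a scratch directory.
workdir = tempfile.mkdtemp(prefix="ff-standard-")
with tarfile.open("standard.tar.gz") as tar:
    tar.extractall(workdir)

# Record the session using the pristine profile...
subprocess.run(["firefox", "--profile", f"{workdir}/standard"], check=True)

# ...and then throw the used copy away.
shutil.rmtree(workdir)
</code></pre>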
<p>This approach can of course be extended: you can create any number of standardized profiles, and they can be tweaked to save logins to relevant webservices, use project branding, etc.</p> <h1 id="creating-a-standardized-browser-profile">Creating a Standardized Browser Profile<a class="zola-anchor" href="#creating-a-standardized-browser-profile" aria-label="Anchor link for: creating-a-standardized-browser-profile">🔗</a></h1> <ol> <li>In some kind of work directory, <code>mkdir standard</code></li> <li>Run <code>firefox --profile standard</code></li> <li>Skip all personalization steps.</li> <li>Ctrl-Shift-B to hide the bookmark bar.</li> <li>Remove superfluous icons/features from the address bar.</li> <li>Right-click to “Customize Toolbar” and remove more stuff, including the flexible spaces around the address bar.</li> <li>Open up the Settings screen.</li> <li>Set homepage and new tab page to Blank.</li> <li>Disable all “Home” content.</li> <li>In Search, set the default engine to Wikipedia.</li> <li>Disable search suggestions and address bar suggestions.</li> <li>Remove as many default search shortcuts as possible.</li> <li>In Privacy &amp; Security, turn off “Ask to save passwords”.</li> <li>Turn off Autofill options.</li> <li>Turn <em>off</em> popup blocking (seems like the better default for this use case?).</li> <li>Follow the steps for standardizing the window size below.</li> <li>Quit Firefox.</li> <li>In a terminal, run <code>sqlite3 standard/places.sqlite</code>. We’ll clear some tables; a scripted version of this cleanup appears just after this list. The list here comes from looking at the schema and checking which tables are non-empty by default. <ol> <li><code>DELETE FROM moz_origins;</code></li> <li><code>DELETE FROM moz_places;</code></li> <li><code>DELETE FROM moz_historyvisits;</code></li> <li><code>DELETE FROM moz_bookmarks;</code></li> <li><code>DELETE FROM moz_keywords;</code></li> <li><code>DELETE FROM moz_anno_attributes;</code></li> <li><code>DELETE FROM moz_annos;</code></li> <li><code>DELETE FROM moz_places_metadata;</code></li> </ol> </li> <li><code>rm -rf standard/cache2</code></li> <li>Tar up <code>standard</code> and save the resulting archive somewhere.</li> </ol>
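<p>The history-scrubbing and archiving steps at the end of this checklist lend themselves to a small script. Here is a sketch in Python; the table list mirrors the checklist above, and the archive name is illustrative:</p> <pre style="background-color:#2b303b;color:#c0c5ce;"><code>import shutil
import sqlite3
import tarfile

# Tables in places.sqlite that are non-empty by default.
TABLES = [
    "moz_origins", "moz_places", "moz_historyvisits", "moz_bookmarks",
    "moz_keywords", "moz_anno_attributes", "moz_annos", "moz_places_metadata",
]

with sqlite3.connect("standard/places.sqlite") as db:
    for t in TABLES:
        db.execute(f"DELETE FROM {t}")

# Drop the cache, then tar up the scrubbed profile for reuse.
shutil.rmtree("standard/cache2", ignore_errors=True)
with tarfile.open("standard.tar.gz", "w:gz") as tar:
    tar.add("standard")
</code></pre>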
<h1 id="using-the-standardized-profile">Using the Standardized Profile<a class="zola-anchor" href="#using-the-standardized-profile" aria-label="Anchor link for: using-the-standardized-profile">🔗</a></h1> <ol> <li>Unpack the archived <code>standard</code> tree</li> <li>Run <code>firefox --profile standard</code></li> <li>If in doubt, follow the steps for standardizing the window size below.</li> </ol> <h1 id="standardizing-the-window-size">Standardizing the Window Size<a class="zola-anchor" href="#standardizing-the-window-size" aria-label="Anchor link for: standardizing-the-window-size">🔗</a></h1> <ol> <li>Ctrl-Shift-I to open devtools</li> <li>Pop the devtools out to their own window</li> <li>F1 to enable devtool options</li> <li>Activate “Toggle rulers for the page”</li> <li>Click the new right-angle ruler icon in the devtools header</li> <li>Resize the window so that the content area is 1280×875 px (see below)</li> <li>Close devtools</li> </ol> <p>The size here is chosen so that the final window with toolbars and decorations has dimensions of 1280×960, which is a 4:3 aspect ratio.</p> <p>You can figure out the padding by setting the browser content area to a reasonable size, then taking a screenshot of the browser window and looking at the size of the resulting image.</p> <p><strong>However</strong>, when using X.org, which is necessary to efficiently capture video, the final window size includes a fair amount of transparent blank space surrounding the actual window — I believe this is used for drop shadows and such. I can configure OBS to crop away this border, and this is desirable to avoid black borders around the video edge. <strong>Also</strong>, if the window is on a HiDPI display, various pixel sizes are doubled — we'd be aiming for 2560×1920.</p> <p>Currently, on my laptop’s display, which is HiDPI, the X.org window border consists of 46 px at top (in HiDPI pixels), 52 px left and right, and 58 px at bottom. This means that the target window size we're actually going for, in terms of what a screenshot will yield, is 2664×2024. On an attached regular-DPI monitor, all of these measurements are halved.</p> <p>Compared to the devtools content-area size readout, the <em>cropped</em> window size has the same width, but is 85 non-HiDPI px taller, at the moment.</p> <h1 id="bonus-setting-up-obs">Bonus: Setting up OBS<a class="zola-anchor" href="#bonus-setting-up-obs" aria-label="Anchor link for: bonus-setting-up-obs">🔗</a></h1> <p>The primary use case for all this is making <a href="https://obsproject.com/">OBS</a> video captures. Here are some notes about the setup of that.</p> <ol> <li>See <a href="https://newton.cx/~peter/howto/capture-video-efficiently-on-linux/">the video capture howto</a> for notes about proper video encoding settings. In theory that is all separate from the issues considered here.</li> <li>Set up a “window capture” device to capture the browser window.</li> <li>Set it up to crop as above: 46 / 52 / 58 / 52 px (in CSS ordering) if HiDPI; half those values if not. There is a &quot;source screenshot&quot; functionality that you can use to check the results. (Crop one pixel smaller on each axis and open in Gimp to verify that there is just 1px of border on all edges.)</li> <li>Use the “Edit Transform” (Ctrl-E) box to check the size of the stream, post-crop.</li> <li>Use &quot;Resize output (source size)&quot; to match the output to the window.</li> <li>If you're going to overlay a webcam feed, how about putting it in the top-right, 24 px from the edges? And giving it a size of 412×232 px? (These measurements assume HiDPI.)</li> <li>Finally, use <code>ffprobe</code> to check the pixel dimensions of a test recording.</li> </ol> <h1 id="see-also">See Also<a class="zola-anchor" href="#see-also" aria-label="Anchor link for: see-also">🔗</a></h1> <ul> <li><a href="https://newton.cx/~peter/howto/capture-video-efficiently-on-linux/">How to capture video efficiently on Linux</a></li> </ul> Digital Documents are Web Applications Fri, 01 Mar 2024 09:41:38 -0500 https://newton.cx/~peter/2024/digital-docs-are-web-apps/ <p>I spend a lot of time thinking about digital technical documents. A computer’s screen is a more capable medium than the printed page: can we use that capability to actually communicate more effectively? To date, the answer appears to be flatly “no”. To an incredible extent, scientists’ preferred digital format for technical material is the PDF file, a completely print-oriented format. I’m not saying that scientists are wrong to use PDFs so much.
Indeed, the fact that they do (unlike almost everyone else!) tells us something profoundly important. But it seems like we <em>should</em> be able to do better.</p> <span id="continue-reading"></span> <p>And, like it or not, the alternative is clear: when I read a news story, an essay, or a recipe, I’m reading it in a web browser. <a href="https://defector.com/how-many-angry-fellas-does-it-take-to-dislodge-cam-newtons-hat">The latest irreverent sports blog from Defector</a> may not always feel like a “document”, but that’s a totally valid way to look at it. Through that lens, we can see it as a document targeting <a href="https://en.wikipedia.org/wiki/Web_platform">the web platform</a> as its “output format” rather than something like PDF. The web platform is a complete mess, but (because) it benefits from billions of dollars of annual investment. You can use it to play videos, 3D games, analyze data, and, yes, read. Maybe things will change one day, but for the foreseeable future, it’s an inescapable conclusion that if we want to create digital technical documents that anyone actually uses, we need to target the web.</p> <p>More recently, I’ve come round to thinking that we should go a bit farther and try to start thinking of digital documents as <em>web applications</em>. This idea runs into a bit of an early roadblock because I’m not sure how exactly I would draw the line between a web application and a web … not-application. But it feels like a real and meaningful distinction. On the not-application side, we can have things like traditional <a href="https://www.latex2html.org/">latex2html</a> output: static HTML, CSS, asset files, and little or no JavaScript. On the other end of the spectrum, we have Jupyter, Google Maps, and all the rest. My intuition says that somewhere in the middle there’s a phase transition. Practically, it’s probably the point at which you start building your content using the web dev stack (NPM, bundlers, etc.) rather than hand-coding your HTML/CSS/JS as a bunch of static files.</p> <p>In this schema, it’s intuitive to want to put the things that we think of as “documents” on the not-app (let’s say “static content”) end of the spectrum, and not without reason. My problem with this is that you cut yourself off from a whole world of possibilities if you think of your document as an HTML file with some CSS and JS sprinkled on top. Instead of asking yourself “I can do <em>anything</em> that the web platform permits — what <em>should</em> I do?”, you end up asking yourself, “What <em>can</em> I do that won’t be too hard?”</p> <p>Consider search. One of the key benefits of having a digital document is that you can do full-text searches instead of having to consult an index. It’s great! And the longer your document is, the more valuable a search feature will be. But if your document isn’t <a href="https://clickhole.com/the-time-i-spent-on-a-commercial-whaling-ship-totally-c-1825124286/">one giant webpage</a>, the browser’s find-in-page feature isn’t going to suffice. So it’s reasonable, perhaps even incumbent, to implement a search UI. How are we going to show results? Preview snippets would be nice. So would complex queries. And history. Do we want to hand-code this all ourselves in low-level HTML/CSS/JS? 
The <a href="https://github.com/rust-lang/mdBook/blob/master/src/theme/searcher/searcher.js">mdBook</a> system used by a lot of Rust documentation does, and when I look over that code, I say to myself … yeah, this is why people adopt web development frameworks.</p> <p>The same considerations apply in other areas, like interactive figures and tables, navigation, embedded computation, or theming. I can imagine a lot of really cool features in these areas and I do <em>not</em> want to implement them all using the lowest level of tooling.</p> <p>Getting a bit more ambitious, we can also start envisioning how the structure of a document could become more dynamic. I was once at a conference session about Python documentation tools where people got into a big discussion about whether it was better to have each function documented on its own page, or to group chunks of related methods on one big page. <em>Why do we have to pick one?</em> If we think of our documentation as being a bunch of static files, anything else is a pain. But if we think of ourselves as building a web app, it’s not like there’s <em>less</em> work to do, but now we have a problem-solving vocabulary to bring to bear. OK, if we want to provide both options, what will the UX design be like? How do we need to structure the backend to implement that design?</p> <p>All this being said, digital documents are also <em>unusual</em> web applications. It is essential that documents are durable and can stand alone: if I deliver a document as a web app that becomes unusable if some third-party service goes down, I’ve done something wrong. They’re a perfect fit for the <a href="https://www.inkandswitch.com/local-first/">local-first</a> web design mentality.</p> <p>Likewise, when I describe a document as a “web app”, I don’t mean that it ought to have a million buttons, animations, and widgets everywhere. It’s reasonable to think of a document-as-app as having a “user interface”, but it’s one that should consist of almost all text, almost all of the time. The app-ness is about the sophistication of the tooling on the backend, not the apparent complexity of the frontend.</p> <p>The biggest challenge to this vision is undoubtedly archiving. I can save a PDF article as a file on disk with the confidence that my grandchildren will be able to read it if they want. Even if we ignore things like dependencies on third-party services, how can I possibly save a document-app with anything like the same level of confidence? At the most basic level, there seems to be no agreed-upon standard for archiving a website to disk — <a href="https://en.wikipedia.org/wiki/WARC_(file_format)">WARC</a> seems to be the leader, but to me it’s telling that none of the browsers have anything like a built-in “open WARC file” functionality. We can still work on <em>creating</em> state-of-the-art digital technical documents without solving this problem, but until that happens, the PDF isn’t going anywhere.</p> A Fun Python API Pattern for Filtering Fri, 23 Feb 2024 14:36:31 -0500 https://newton.cx/~peter/2024/fun-python-filtering-pattern/ https://newton.cx/~peter/2024/fun-python-filtering-pattern/ <p>The past few weeks, I’ve been working away on a new Python package called <a href="https://github.com/pkgw/daschlab">daschlab</a>, which will be the recommended analysis toolkit for <a href="https://dasch.cfa.harvard.edu/">DASCH</a> data.
As part of this, I’ve come up with what feels like a nice solution to a problem that’s annoyed me several times in the past: how do we provide a nice user-friendly API for filtering and subsetting data collections?</p> <span id="continue-reading"></span> <p>To be more specific: one of the core data types in <a href="https://github.com/pkgw/daschlab">daschlab</a> is a <code>Lightcurve</code>, which is just a wide <a href="https://docs.astropy.org/en/stable/table/">Astropy Table</a> with a bunch of photometry data about a source. In order to do their analysis, users are going to want to extract subsets of the lightcurve based on any one of a variety of dimensions: the date of the measurement, the telescope that was used, the plate emulsion, quality flags, and on and on.</p> <p>Users are <em>also</em> going to want to do a lot of different things with those subsets. They might want to remove selected rows from the table. They might want to remove <em>un</em>-selected rows from the table. They might want to plot those rows. They might want to apply some kind of tag to those rows.</p> <p>The same consideration comes up with other kinds of data collections as well. In general, you want to support both a lot of different ways of subsetting, and a lot of different actions. But you also don't want your API to become too “indirect”: it’s a lot nicer for users to be able to type <code>lightcurve.do_something_concrete()</code> without needing to create a bunch of intermediate variables along the way, or call multiple functions:</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python"><span>lc = </span><span style="color:#bf616a;">Lightcurve</span><span>(</span><span style="color:#d08770;">...</span><span>) </span><span>lc.</span><span style="color:#bf616a;">remove_nondetections</span><span>() </span><span style="color:#65737e;"># nice </span><span>lc.</span><span style="color:#bf616a;">remove</span><span>(lc.</span><span style="color:#bf616a;">nondetections</span><span>()) </span><span style="color:#65737e;"># gross, redundant </span><span>lc.</span><span style="color:#bf616a;">nondetections</span><span>().</span><span style="color:#bf616a;">remove</span><span>() </span><span style="color:#65737e;"># less redundant, still not as nice </span></code></pre> <p>But if we start providing functions like <code>lc.remove_nondetections()</code>, then we quickly reach a multiplicative explosion: the natural progression is to add <code>lc.keep_nondetections()</code>, <code>lc.count_nondetections()</code>, <code>lc.count_between_dates()</code>, and so on. 
That’s clearly not sustainable.</p> <p>The approach that I’ve devised looks like this:</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python"><span>lc = </span><span style="color:#bf616a;">Lightcurve</span><span>(</span><span style="color:#d08770;">...</span><span>) </span><span>lc.drop.</span><span style="color:#bf616a;">nondetections</span><span>() </span><span>lc.count.</span><span style="color:#bf616a;">nondetections</span><span>() </span><span>lc.count.</span><span style="color:#bf616a;">between_dates</span><span>(d1, d2) </span></code></pre> <p>This has a nice subject-verb-object structure and is super amenable to chaining:</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python"><span>lc.keep_only.</span><span style="color:#bf616a;">between_dates</span><span>(d1, d2).count.</span><span style="color:#bf616a;">detections</span><span>() </span></code></pre> <p>This pattern is really easy to implement, too. The approach that I’ve taken is as follows. An action like <code>drop</code> is expressed as a property that’s a lightweight <code>Selector</code> object:</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python"><span style="color:#b48ead;">class </span><span style="color:#ebcb8b;">Lightcurve</span><span style="color:#eff1f5;">(</span><span style="color:#a3be8c;">Table</span><span style="color:#eff1f5;">): </span><span> </span><span style="color:#65737e;"># [...] </span><span> </span><span> @</span><span style="color:#96b5b4;">property </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">drop</span><span>(</span><span style="color:#bf616a;">self</span><span>) -&gt; Selector: </span><span> </span><span style="color:#b48ead;">return </span><span style="color:#bf616a;">Selector</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#bf616a;">self</span><span>._apply_drop) </span><span> </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">_apply_drop</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#bf616a;">selection</span><span>: np.ndarray) -&gt; Lightcurve: </span><span> </span><span style="color:#65737e;"># reality is a bit more complicated, but basically: </span><span> </span><span style="color:#b48ead;">return </span><span style="color:#bf616a;">self</span><span>[~selection] </span></code></pre> <p>(Here I’ve implemented the property as a getter method, but it could be set up in <code>__init__</code>.)
The selector worries about different subsetting operations, but hands things back to the original object to actually take action:</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python"><span style="color:#b48ead;">class </span><span style="color:#ebcb8b;">Selector</span><span style="color:#eff1f5;">: </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#96b5b4;">__init__</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#bf616a;">parent</span><span>: Lightcurve, </span><span style="color:#bf616a;">action</span><span>: Callable): </span><span> </span><span style="color:#bf616a;">self</span><span>._parent = parent </span><span> </span><span style="color:#bf616a;">self</span><span>._action = action </span><span> </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">detections</span><span>(</span><span style="color:#bf616a;">self</span><span>, **</span><span style="color:#bf616a;">kwargs</span><span>): </span><span> selection = np.</span><span style="color:#bf616a;">isfinite</span><span>(</span><span style="color:#bf616a;">self</span><span>._parent[&quot;</span><span style="color:#a3be8c;">flux</span><span>&quot;]) </span><span> </span><span style="color:#b48ead;">return </span><span style="color:#bf616a;">self</span><span>.</span><span style="color:#bf616a;">_action</span><span>(selection, **kwargs) </span></code></pre> <p>The <code>selection</code> is always expressed as a boolean array. This is going to be inefficient sometimes, but establishing that invariant makes a lot of the surrounding logic <em>so</em> much simpler. By always accepting and forwarding <code>**kwargs</code> in the subsetting functions, we have a generic way to provide information to the action/application function. For instance:</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python"><span style="color:#b48ead;">class </span><span style="color:#ebcb8b;">Lightcurve</span><span style="color:#eff1f5;">(</span><span style="color:#a3be8c;">Table</span><span style="color:#eff1f5;">): </span><span> @</span><span style="color:#96b5b4;">property </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">tag</span><span>(</span><span style="color:#bf616a;">self</span><span>) -&gt; Selector: </span><span> </span><span style="color:#b48ead;">return </span><span style="color:#bf616a;">Selector</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#bf616a;">self</span><span>._apply_tag) </span><span> </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">_apply_tag</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#bf616a;">selection</span><span>, </span><span style="color:#bf616a;">name</span><span>: str = </span><span style="color:#d08770;">None</span><span>): </span><span> </span><span style="color:#b48ead;">if </span><span>not name: </span><span> </span><span style="color:#b48ead;">raise </span><span style="color:#bf616a;">ValueError</span><span>(name) </span><span> </span><span> </span><span style="color:#bf616a;">self</span><span>[name] |= selection </span><span> </span><span style="color:#65737e;"># [...]
</span><span> </span><span>(lc </span><span> .keep_only.</span><span style="color:#bf616a;">between_dates</span><span>(</span><span style="color:#d08770;">1920</span><span>, </span><span style="color:#d08770;">1930</span><span>) </span><span> .tag.</span><span style="color:#bf616a;">detections</span><span>(</span><span style="color:#bf616a;">name</span><span>=&quot;</span><span style="color:#a3be8c;">roaring_twenties</span><span>&quot;) </span><span>) </span></code></pre> <p>I really like this pattern because it feels extensible while maintaining a pleasing directness. Adding new actions and filters can be incredibly easy:</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python"><span style="color:#b48ead;">class </span><span style="color:#ebcb8b;">Lightcurve</span><span style="color:#eff1f5;">(</span><span style="color:#a3be8c;">Table</span><span style="color:#eff1f5;">): </span><span> @</span><span style="color:#96b5b4;">property </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">count</span><span>(</span><span style="color:#bf616a;">self</span><span>) -&gt; Selector: </span><span> </span><span style="color:#b48ead;">return </span><span style="color:#bf616a;">Selector</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#b48ead;">lambda </span><span style="color:#bf616a;">s</span><span>: s.</span><span style="color:#bf616a;">sum</span><span>()) </span><span> </span><span style="color:#b48ead;">class </span><span style="color:#ebcb8b;">Selector</span><span style="color:#eff1f5;">: </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">brighter_than</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#bf616a;">limit</span><span>, **</span><span style="color:#bf616a;">kwargs</span><span>): </span><span> selection = </span><span style="color:#bf616a;">self</span><span>._parent[&quot;</span><span style="color:#a3be8c;">flux</span><span>&quot;] &gt; limit </span><span> </span><span style="color:#b48ead;">return </span><span style="color:#bf616a;">self</span><span>.</span><span style="color:#bf616a;">_action</span><span>(selection, **kwargs) </span></code></pre> <p>I dunno — this all feels kind of obvious, but I haven’t seen this pattern before (although I’m sure it’s not at all novel), and I had to sit down and think for a while before I thought of this design.</p> <p>You might note that we don’t have a great way to do boolean logic on subsets.
A few extensions can make it possible, although not quite beautiful:</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python"><span style="color:#b48ead;">class </span><span style="color:#ebcb8b;">Lightcurve</span><span style="color:#eff1f5;">(</span><span style="color:#a3be8c;">Table</span><span style="color:#eff1f5;">): </span><span> @</span><span style="color:#96b5b4;">property </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">match</span><span>(</span><span style="color:#bf616a;">self</span><span>) -&gt; Selector: </span><span> </span><span style="color:#b48ead;">return </span><span style="color:#bf616a;">Selector</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#b48ead;">lambda </span><span style="color:#bf616a;">s</span><span>: s) </span><span> </span><span style="color:#b48ead;">class </span><span style="color:#ebcb8b;">Selector</span><span style="color:#eff1f5;">: </span><span> </span><span style="color:#b48ead;">def </span><span style="color:#8fa1b3;">where</span><span>(</span><span style="color:#bf616a;">self</span><span>, </span><span style="color:#bf616a;">selection</span><span>, **</span><span style="color:#bf616a;">kwargs</span><span>): </span><span> </span><span style="color:#b48ead;">return </span><span style="color:#bf616a;">self</span><span>.</span><span style="color:#bf616a;">_action</span><span>(selection, **kwargs) </span><span> </span><span style="color:#65737e;"># [...] </span><span> </span><span>lc.drop.</span><span style="color:#bf616a;">where</span><span>( </span><span> lc.match.</span><span style="color:#bf616a;">brighter_than</span><span>(</span><span style="color:#d08770;">100</span><span>) &amp; ~lc.match.</span><span style="color:#bf616a;">between_dates</span><span>(</span><span style="color:#d08770;">1900</span><span>, </span><span style="color:#d08770;">1905</span><span>) </span><span>) </span></code></pre> <p>I think the key to this pattern is something that I alluded to above — it covers the N×M space of subset-action combinations with what looks like a single function call, rather than two. We pull this off with two tricks. First, we express the desired actions as objects (initialized <code>Selector</code> instances), not functions, even though the latter choice is the “obvious” one. Second, by saying that selector methods will forward their keyword arguments back to the applicator function, one function call can straightforwardly provide any needed parameters to both “halves” of the operation: positional arguments to the subsetting piece, keyword arguments to the action piece.</p>
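<p>For reference, here’s a minimal self-contained sketch that assembles the fragments above into something runnable. To be clear, this is my illustrative reconstruction, not <code>daschlab</code>’s actual implementation: the <code>flux</code> column and the particular method set are placeholders, and <code>Selector</code> is defined first so the annotations resolve.</p> <pre data-lang="python" style="background-color:#2b303b;color:#c0c5ce;" class="language-python "><code class="language-python" data-lang="python">import numpy as np
from typing import Callable
from astropy.table import Table

class Selector:
    # Couples a family of subsetting predicates to one deferred action.
    def __init__(self, parent: &quot;Lightcurve&quot;, action: Callable):
        self._parent = parent
        self._action = action

    def where(self, selection: np.ndarray, **kwargs):
        # Escape hatch: accept an arbitrary boolean mask.
        return self._action(selection, **kwargs)

    def detections(self, **kwargs):
        return self._action(np.isfinite(self._parent[&quot;flux&quot;]), **kwargs)

    def brighter_than(self, limit, **kwargs):
        return self._action(self._parent[&quot;flux&quot;] &gt; limit, **kwargs)

class Lightcurve(Table):
    @property
    def match(self) -&gt; Selector:
        return Selector(self, lambda s: s)

    @property
    def count(self) -&gt; Selector:
        return Selector(self, lambda s: int(s.sum()))

    @property
    def keep_only(self) -&gt; Selector:
        return Selector(self, lambda s: self[s])

    @property
    def drop(self) -&gt; Selector:
        return Selector(self, lambda s: self[~s])

lc = Lightcurve({&quot;flux&quot;: [1.0, np.nan, 3.0, 8.0]})
assert lc.count.detections() == 3
assert len(lc.keep_only.brighter_than(2.0)) == 2
assert lc.count.where(lc.match.detections() &amp; ~lc.match.brighter_than(2.0)) == 1
</code></pre> <p>The closing asserts double as a usage demo: the subject-verb-object calls compose freely, and the <code>match</code>/<code>where</code> pair recovers the boolean-logic escape hatch.</p> Imaging an Extrasolar Radiation Belt Thu, 15 Feb 2024 10:19:02 -0500 https://newton.cx/~peter/2024/imaging-extrasolar-radiation-belt/ https://newton.cx/~peter/2024/imaging-extrasolar-radiation-belt/ <p>Time for some actual astrophysics! I want to point people to a result from last year that I’m really excited about: <a href="https://ui.adsabs.harvard.edu/abs/2023Natur.619..272K/abstract">Resolved imaging confirms a radiation belt around an ultracool dwarf</a>, by <a href="https://melodiekao.com/">Melodie Kao</a> and colleagues (<a href="https://doi.org/10.1038/s41586-023-06138-w">2023 Nature 619(7969) 272–275</a>).</p> <span id="continue-reading"></span> <p>One way to think about being a scholar is to ask: what ideas am I fighting for or against?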
A field like astronomy is made up of thousands of interconnected ideas about the world. The finitude of human experience being what it is, you can only engage with a small number of them in any deep, significant sense. The ideas that you engage with deeply aren’t <em>yours</em> — they don’t belong to any of us — but they do become part of your academic identity.</p> <p>As modern scholarship gets more and more specialized, the ideas that define our professional identities get narrower and narrower. Einstein was The Relativity Guy. Me, one of the ideas that I’ve tried to promote over the years is that we should think of magnetically active ultracool dwarfs as being scaled-up planets, magnetically speaking, rather than scaled-down stars. (Best citation for that: <a href="https://ui.adsabs.harvard.edu/abs/2018haex.bookE.171W/abstract">Williams 2018</a>.) It’s not the flashiest idea out there, for sure. But I think it’s a good, solid one that deserves to be more widely appreciated.</p> <p>An important line of evidence for this idea is that radio observations of ultracool dwarfs show a lot of features that can be fruitfully interpreted as signatures of planetary <a href="https://en.wikipedia.org/wiki/Van_Allen_radiation_belt">radiation belts</a> (aka Van Allen belts). These belts are associated with intense, bursty, highly polarized maser emission, which we definitely observe in the ultracool dwarfs, and also host a more quiescent population of energetic electrons forming what I like to call a “radio donut”. A super neat paper by Bob Sault et al. (<a href="https://ui.adsabs.harvard.edu/abs/1997A%26A...324.1190S/abstract">1997 A&amp;A 324 1190–1196</a>) imaged Jupiter’s radio donut in 3D:</p> <div> <figure> <img src="sodl97_fig3.jpg" alt="Interferometric 3D reconstruction of Jupiter’s synchrotron radiation belt emission at 2 GHz (Sault et al., Figure 3, partial)."> <figcaption>Interferometric 3D reconstruction of Jupiter’s synchrotron radiation belt emission at 2 GHz (Sault et al., Figure 3, partial).</figcaption> </figure> </div> <p>If you look at Jupiter using a radio telescope without Bob’s fancy techniques, you see a 2D projection of the donut. It varies periodically with the planet’s rotation, in a way that’s surprisingly strong due to synchrotron beaming effects:</p> <div> <figure> <img src="vlamoviecropped.gif" alt="Jupiter&#x27;s radiation belts in 2D over time (NASA&#x2F;JPL — Caltech; &lt;a href=&quot;http:&#x2F;&#x2F;www.vofoundation.org&#x2F;blog&#x2F;nasas-juno-spacecraft&#x2F;&quot;&gt;source&lt;&#x2F;a&gt;)."> <figcaption>Jupiter's radiation belts in 2D over time (NASA/JPL — Caltech; <a href="http://www.vofoundation.org/blog/nasas-juno-spacecraft/">source</a>).</figcaption> </figure> </div> <p>I will assert without proof that there’s a ton of neat physics in these radiation belts, and as an astronomer it’s really cool to dig into this stuff and learn about all the fun things that planetary scientists have been doing in this field for decades. There’s also a promising opportunity for the planetary scientists to profit from what we observe outside the solar system: Jupiter’s radiation belts are in many ways still quite mysterious, to the extent that there’s a NASA Heliophysics mission concept called <a href="https://ui.adsabs.harvard.edu/abs/2023BAAS...55c.067C/abstract">COMPASS</a> that envisions spending a billion-ish dollars to send a probe to understand them better.
(I’m involved in its science definition team.)</p> <p>Melodie is also an advocate for the idea that ultracool dwarfs have planet-like magnetospheres that should host radiation belts. If this idea’s actually correct, then around these objects you should have radio donuts resembling the Jovian one. I would argue that time-series data certainly support that conclusion, but can we actually make an image of one?</p> <p>It turns out: yes!</p> <div> <figure> <img src="kmvs23_fig1.jpg" alt="Three epochs of VLBI imaging of LSR J1835+3259 at a spatial resolution of around 1 mas (Kao et al., Figure 1)."> <figcaption>Three epochs of VLBI imaging of LSR J1835+3259 at a spatial resolution of around 1 mas (Kao et al., Figure 1).</figcaption> </figure> </div> <p>Melodie and her colleagues observed one of the well-known radio-emitting ultracool dwarfs with the ultra-high-resolution <a href="https://public.nrao.edu/telescopes/vlba/">Very Long Baseline Array</a> in an inspired gambit to image any radiation belts that might be there. And it worked! The overall morphology is stable across three epochs of observations, suggesting a long-lived structure as one would hope to observe, and the physical characteristics that you can infer from the data (the separation of the two lobes, total luminosity, etc.) are all plausible.</p> <p>(I should mention that just after Melodie et al.’s paper came out, <a href="https://ui.adsabs.harvard.edu/abs/2023Sci...381.1120C/abstract">Climent et al.</a> published essentially the same result in <em>Science</em> (<a href="https://doi.org/10.1126/science.adg6635">2023 <em>Science</em> 381(6662) 1120–1124</a>; <a href="https://arxiv.org/abs/2303.06453">arXiv:2303.06453</a>). Presumably there’s a bit of a story there, but I’m not familiar with it. I know Melodie and her coauthors, and have talked with her about this work, so that’s where I’m focusing.)</p> <p>As the <a href="https://eventhorizontelescope.org/">EHT</a> folks can tell you, it can be a really big deal to be able to attach an image to an idea! My job responsibilities have kept me away from doing much in this field (or really, any kind of day-to-day astrophysical research) for a while, so it’s genuinely really cool to see other people pushing forward some of the ideas that I want to see succeed. The results of Kao et al. and Climent et al. are awesome steps forward, and hopefully harbingers of more great stuff to come!</p>