<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>PKGW</title>
    <link>https://newton.cx/~peter/</link>
    <description></description>
    <generator>Zola</generator>
    <language>en</language>
    <atom:link href="https://newton.cx/~peter/rss.xml" rel="self" type="application/rss+xml"/>
    <lastBuildDate>Mon, 16 Mar 2026 14:55:10 -0400</lastBuildDate><item>
      <title>One Good Tutorial at HPC Best Practices Webinar: Slides</title>
      <pubDate>Mon, 16 Mar 2026 14:55:10 -0400</pubDate>
      <link>https://newton.cx/~peter/2026/one-good-tutorial-hpc-best-practices/</link>
      <guid>https://newton.cx/~peter/2026/one-good-tutorial-hpc-best-practices/</guid>
      <description>&lt;p&gt;On Wednesday I’m going to present my scientific software documentation project,
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;&quot;&gt;One Good Tutorial&lt;&#x2F;a&gt;, as part of the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;ideas-productivity.org&#x2F;&quot;&gt;IDEAS&lt;&#x2F;a&gt; &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;ideas-productivity.org&#x2F;events&#x2F;hpcbp-097-onegoodtutorial&quot;&gt;HPC Best Practices webinar
series&lt;&#x2F;a&gt;. It’ll take place at 1 PM US Eastern time. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.zoomgov.com&#x2F;meeting&#x2F;register&#x2F;wjQvbQjnQRiNqG9p6McZJw&quot;&gt;Register to
attend&lt;&#x2F;a&gt; if you’re able to watch it live, but you can always watch a
recording later if you’re not. My slides are attached to this post.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;&lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2026&#x2F;one-good-tutorial-hpc-best-practices&#x2F;.&#x2F;slides&#x2F;&quot;&gt;Here are the slides&lt;&#x2F;a&gt; if you want to check them out.&lt;&#x2F;p&gt;
&lt;p&gt;I can say that as I’ve been polishing and promoting the project, I’ve focused
more and more on the idea of a “minimum viable documentation product”. I think
it really captures the core idea that One Good Tutorial is trying to convey, and
hopefully it anchors that idea in a context that a lot of people are familiar
with. There is a bit of potential &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Mathieu_van_der_Poel&quot;&gt;acronym
confusion&lt;&#x2F;a&gt;, though.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;The work described in this post was supported by a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bssw.io&#x2F;pages&#x2F;bssw-fellowship-program&quot;&gt;Better Scientific Software
Fellowship&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>One Good Tutorial: Beta Release!</title>
      <pubDate>Wed, 21 Jan 2026 22:18:48 -0500</pubDate>
      <link>https://newton.cx/~peter/2026/one-good-tutorial-beta/</link>
      <guid>https://newton.cx/~peter/2026/one-good-tutorial-beta/</guid>
      <description>&lt;p&gt;I’m delighted to announce that a beta-testing version of my scientific software
documentation resource, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;&quot;&gt;One Good Tutorial&lt;&#x2F;a&gt;, is ready for your feedback!
Check it out at &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;&quot;&gt;onegoodtutorial.org&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;&lt;em&gt;(You might have noticed that my blogging pace has fallen off a cliff. Having a
kid will do that to you! I aspire to ramp things back up, but it’ll probably be
a little while …)&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;One Good Tutorial is the guide that I’ve been developing over the past ~year
thanks to the support of &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2024&#x2F;bssw-fellowship&#x2F;&quot;&gt;a Better Scientific Software (BSSw)
Fellowship&lt;&#x2F;a&gt;. My proposal was all about helping
maintainers of small-to-medium scientific software projects create better
documentation with more confidence, and in &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;one-good-tutorial-plan&#x2F;&quot;&gt;my last
update&lt;&#x2F;a&gt;, I laid out my first-draft vision for
what I would build.&lt;&#x2F;p&gt;
&lt;p&gt;Based on some initial work &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;state-of-the-docs&#x2F;&quot;&gt;surveying&lt;&#x2F;a&gt; &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;state-of-the-doc-tools&#x2F;&quot;&gt;the state
of the field&lt;&#x2F;a&gt;, I came to some decisions about
where I wanted to direct my focus. First, I wanted to hammer my target audience
over the head with the idea that there’s more to docs than API docs (hugely
influenced by &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;diataxis.fr&#x2F;&quot;&gt;Diátaxis&lt;&#x2F;a&gt; here). Second, I wanted to try to help out people who
want to do a good job with their docs, but don’t really know how to get started.&lt;&#x2F;p&gt;
&lt;p&gt;My initial concept was a “checklist matrix” supported by more detailed guides.
The idea was that a checklist would help give a sense of accomplishment and
manageability: if your documentation includes everything mentioned in the
checklist, you’ve done a good job. The “matrix” component was an effort to guide
people into a process for planning out how they’d write all their docs, rather
than just asking them to sit down and start scribbling.&lt;&#x2F;p&gt;
&lt;p&gt;In &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;&quot;&gt;the beta version&lt;&#x2F;a&gt;, I tweaked this design slightly. There’s still a
checklist, front-and-center on &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;&quot;&gt;the landing page&lt;&#x2F;a&gt;. But rather than show
this checklist as a matrix, I lay out a structured approach in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;playbook&#x2F;&quot;&gt;a
“playbook”&lt;&#x2F;a&gt;, a suggested workflow of about 20 steps that aims to guide
people from a (metaphorical) blank page to a filled-out checklist. As soon as I
created my first checklist matrix mockup for presentation at the
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;us-rse.org&#x2F;usrse25&#x2F;&quot;&gt;US-RSE&#x27;25&lt;&#x2F;a&gt; meeting last fall, I could tell that it
just had too many boxes, and I never found a way to fix that fundamental issue.
In comparison, I’m much happier with the playbook model. It provides a structure
for tackling the checklist, but avoids overcomplicating the checklist itself,
and I think the framing makes it clear that if you want to tackle the checklist
in some other fashion, that’s more than fine.&lt;&#x2F;p&gt;
&lt;p&gt;On the website I deliver the playbook as &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;playbook&#x2F;&quot;&gt;an HTML slideshow&lt;&#x2F;a&gt;, following in
the footsteps of &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;dasch.cfa.harvard.edu&#x2F;dr7&#x2F;introduction&#x2F;&quot;&gt;some of my work on
DASCH&lt;&#x2F;a&gt; (and, more broadly, a
hobbyhorse I’ve been riding &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2013&#x2F;09&#x2F;slides-for-scientific-talks-in-html&#x2F;&quot;&gt;for more than a
decade&lt;&#x2F;a&gt;). I’m getting more and
more enthusiastic about HTML slideshows as a very useful form-factor for
pedagogical, web-based documentation; I’m coming around to believing that
they’re the best way to avoid the “wall of text” fatigue that arises so easily
when reading lengthy material on a screen. I hope other people actually agree!&lt;&#x2F;p&gt;
&lt;p&gt;One thing that didn’t change from my initial plan is that in addition to the
checklist and playbook I do indeed have &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;in-depth&#x2F;&quot;&gt;a series of “in-depth guides”&lt;&#x2F;a&gt;
offering advice about how to prepare different elements of the documentation
checklist. Some of them aren’t really about writing; two elements on my
checklist are &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;in-depth&#x2F;software-citation&#x2F;&quot;&gt;citation
information&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;in-depth&#x2F;licensing-statements&#x2F;&quot;&gt;a
licensing
statement&lt;&#x2F;a&gt;, both of
which are the kind of thing that may only take up a few sentences in one’s final
documentation, but may require a great deal of prep work in order to be able to
write those few sentences. These guides will probably need tweaking here and
there but I’m very happy with how they’ve turned out so far, and I’d like to
think that they might become generally useful resources going forward.&lt;&#x2F;p&gt;
&lt;p&gt;I certainly have ideas about how to make One Good Tutorial even better, but I
can tell that it’s ready to be sent out into the world for beta-testing. So,
here we are! I’ll be reaching out to some folks individually, but if you’re
reading this and you have some experience with documenting scientific
software, I’d love some feedback. You can get in touch via &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pkgw&#x2F;onegoodtutorial&#x2F;&quot;&gt;the One Good
Tutorial repository&lt;&#x2F;a&gt; on GitHub or
by &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;about-me&#x2F;#contact&quot;&gt;contacting me directly&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Over the last few months of my fellowship I’ll be polishing the website and
working to get the word out. I’m quite proud of how this project has come
together so far and I hope you find it useful! Check out the One Good Tutorial
materials at &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;&quot;&gt;onegoodtutorial.org&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;The work described in this post was supported by a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bssw.io&#x2F;pages&#x2F;bssw-fellowship-program&quot;&gt;Better Scientific Software
Fellowship&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>One Good Tutorial: The Plan</title>
      <pubDate>Fri, 29 Aug 2025 14:02:11 -0400</pubDate>
      <link>https://newton.cx/~peter/2025/one-good-tutorial-plan/</link>
      <guid>https://newton.cx/~peter/2025/one-good-tutorial-plan/</guid>
      <description>&lt;p&gt;The past few posts have been about prep work for &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2024&#x2F;bssw-fellowship&#x2F;&quot;&gt;my BSSw
project&lt;&#x2F;a&gt;: &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;state-of-the-docs&#x2F;&quot;&gt;interviews&lt;&#x2F;a&gt;
and &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;state-of-the-doc-tools&#x2F;&quot;&gt;a survey of tools&lt;&#x2F;a&gt;. After all this
throat-clearing, I’m ready to sketch out the resource that I’m actually planning
to create!&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;I plan to call it &lt;strong&gt;One Good Tutorial&lt;&#x2F;strong&gt;. The target audience will be, of course,
developers of small-to-medium scientific software projects. The centerpiece of
the project will be a checklist: complete these actions, and you can sleep easy
knowing that you’ve documented your project adequately.&lt;&#x2F;p&gt;
&lt;p&gt;One thing that I absolutely want to beat people over the head with is the point
that &lt;em&gt;there is more to documentation than API docs&lt;&#x2F;em&gt;. This is the big idea behind
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;diataxis.fr&#x2F;&quot;&gt;Diátaxis&lt;&#x2F;a&gt;, of course, and I’m completely sold on it; and I also believe
that it’s an idea that many scientific software developers need to be exposed
to. I think this is such a big deal that, well, I let it drive the whole project
branding. In particular, I believe that the most important thing that &lt;em&gt;most&lt;&#x2F;em&gt;
projects lack is introductory, “getting started”-type material. So: &lt;em&gt;if your
docs have One Good Tutorial, you’ve done your job.&lt;&#x2F;em&gt; If the only thing that
people retain from being exposed to my project is those three words, I’ll be
happy.&lt;&#x2F;p&gt;
&lt;p&gt;I also believe that many scientific software developers don’t feel confident
about how they should approach documentation in general. This is, well, totally
reasonable: technical writing and information architecture are whole fields of
human endeavor, and we’re generally approaching them with no training or
support. Realistically, that’s not going to change: the goal is &lt;em&gt;not&lt;&#x2F;em&gt; to train
people to become expert technical writers. But I think it will make a real
difference if we can help scientific software developers feel like they’re not
quite so at sea. Hence the checklist format. I’m hopeful that a checklist will
work well to provide both tangible instructions and a rewarding sense of
clarity: “OK, I did everything they said I should — gold star for me!”&lt;&#x2F;p&gt;
&lt;p&gt;I also think that such a checklist will fill an unoccupied niche in this space.
The &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.thegooddocsproject.dev&#x2F;&quot;&gt;Good Docs Project&lt;&#x2F;a&gt; provides templates for authoring specific
documents, but doesn’t quite provide the holistic work plan that I think a
checklist will offer. The &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.writethedocs.org&#x2F;guide&#x2F;&quot;&gt;Write the Docs Guide&lt;&#x2F;a&gt; has a lot of resources
and guidance but, once again, doesn’t quite meet the needs of someone saying,
“Just tell me what to do!”&lt;&#x2F;p&gt;
&lt;p&gt;Another nice aspect of the checklist format, I think, is that it leads to a
natural structuring of the resource materials. The core artifact is, of course,
the checklist itself, which I’d expect to deliver as both HTML and a PDF
one-pager. Then, for each item on the checklist, there will be an associated
webpage with deeper explanation, references, and examples. In certain cases this
page might be quite short, but in other cases, it could get fairly extensive.
Contrast this with the “cookbook” or “recipe” format, which tends to be structured
more like prose text, which means that the length keeps on increasing as you
think of little details or clarifications to throw in. The recipe format also
implies that the steps should be followed in strict order, whereas checklists
allow for some level of skipping around. I think that’s a good thing in this
case.&lt;&#x2F;p&gt;
&lt;p&gt;More specifically, I’m currently envisioning what you might call a “checklist
matrix”. The checklist items will mostly correspond to important pieces of
documentation that must exist: &lt;strong&gt;Tutorial&lt;&#x2F;strong&gt; (of course!), &lt;strong&gt;Citation
Information&lt;&#x2F;strong&gt;, &lt;strong&gt;Installation Instructions&lt;&#x2F;strong&gt;, and so on. These are the rows of
the checklist matrix.&lt;&#x2F;p&gt;
&lt;p&gt;But I’m also envisioning four columns corresponding to four phases that I will
encourage people to work through: &lt;strong&gt;Plan&lt;&#x2F;strong&gt;, &lt;strong&gt;Draft&lt;&#x2F;strong&gt;, &lt;strong&gt;Assess&lt;&#x2F;strong&gt;, and
&lt;strong&gt;Revise&lt;&#x2F;strong&gt;. The basic guidance would be to go through these phases in order:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Take some time to think about your plan for all of the different components
of your documentation. Consider creating a Google Doc, or something similar,
to hold notes about your plans.&lt;&#x2F;li&gt;
&lt;li&gt;Actually draft the materials, and do the initial setup of whatever tools
you’re going to need to get your docs published.&lt;&#x2F;li&gt;
&lt;li&gt;Assess the complete first draft. Did the process of drafting reveal any
problems that need to be fixed?&lt;&#x2F;li&gt;
&lt;li&gt;Revise. Self-explanatory.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;I pointedly do &lt;em&gt;not&lt;&#x2F;em&gt; include a “publish” phase, because I think that encourages
people to think of the docs as a one-time project: “I wrote them and published
them, and now they’re done.” I think it’s important to approach both the
code and the docs as things that are never quite &lt;em&gt;done&lt;&#x2F;em&gt;, which to me means
having a mindset oriented around “making releases”, rather than “publishing”.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s my first draft of the rows for the checklist matrix:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Synopsis.&lt;&#x2F;strong&gt; A few sentences summarizing the software. Good to do first
because it helps you keep the big picture in mind, and it’s likely to be
copied around in READMEs, website landing pages, package descriptions, etc.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Personas.&lt;&#x2F;strong&gt; I want to encourage people to take a few minutes to imagine user
personas for their documentation: who’s going to be reading the docs? I hope
this will be genuinely helpful for people as they’re thinking about docs … and
I don’t hate the idea of sneaking in an idea that they might be able to apply
much more broadly, too. A row like this one would have checkboxes for the Plan
and Assess phases, but not Draft or Revise, since it doesn’t explicitly appear
in the documentation product.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Tutorial.&lt;&#x2F;strong&gt; This had better come early! I hope that suggesting that people
sit down and plan their tutorial before actually putting (virtual) pen to
(virtual) paper will help them think through bigger-picture issues: “oh, the
user is going to need to have access to some 10-gigabyte data file, so I had
better come up with a way to distribute it”. If people just plunge straight
into writing the tutorial (perhaps after burning out developing the code),
that’s the kind of problem that gets left unresolved.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Installation Instructions.&lt;&#x2F;strong&gt; Just one of those things that you need to have.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Citation Instructions.&lt;&#x2F;strong&gt; This is one of those sections that is unlikely to
take up many words in published documentation, but I really want to make sure
that people sit down and think about how they want to approach it. It would be
beyond the scope of One Good Tutorial to tell people what approach to citation
to adopt, but the supporting materials can point them to resources on the
topic.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;License.&lt;&#x2F;strong&gt; This is similar to citation instructions: a lot of people just
don’t even think about this issue.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;How to Contribute.&lt;&#x2F;strong&gt; Another item that I find is often overlooked. This is
the sort of thing where I can offer people boilerplate language to use. This
is where I would suggest that larger projects think about adopting a Code of
Conduct too.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;API Reference.&lt;&#x2F;strong&gt; I want to make a point of putting this really far down in
the list … but yeah, this is important for software.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Other Reference Materials.&lt;&#x2F;strong&gt; I’m not quite sure how to name or describe this
element, but in scientific software, there’s often some kind of theory
underlying the software, and it’s really important to document it precisely.
In many cases, the documentation here might basically consist of a reference
to a formally-published journal article.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Acknowledgments.&lt;&#x2F;strong&gt; Don’t forget to thank your funders! I will also ask
people to acknowledge One Good Tutorial if they have found it useful.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Authoring Tools.&lt;&#x2F;strong&gt; Once someone has gone down the list and thought about all
of the above materials, &lt;em&gt;now&lt;&#x2F;em&gt; it’s the time to think about: what tools are we
going to need to create this documentation? For some projects, you could
absolutely cover every item above in a single &lt;code&gt;README.md&lt;&#x2F;code&gt; on GitHub; for
others, you want to think about whether Sphinx (etc.) will suffice, or whether
you might need to adopt a combination of tools. The Draft phase is where you
would actually start wiring these tools into your workflows such as CI.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Release Processes.&lt;&#x2F;strong&gt; Once you’ve thought about what your docs are going to
look like and what tools you’ll use to write them … how specifically are they
going to make it out into the world? As mentioned above, here I want to
encourage people to think of doc publication as an ongoing process, possibly
one that’s integrated with the software release process.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
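To make the matrix idea concrete, here is one way the rows and phases above could be modeled. This is purely an illustrative sketch of the structure described in this post, not actual One Good Tutorial tooling; the item names come from my draft list, and the remaining_boxes helper is hypothetical.

```python
# Hypothetical sketch of the "checklist matrix": rows are documentation
# elements, columns are the four phases. Not real OGT code.

PHASES = ("Plan", "Draft", "Assess", "Revise")

# Each row lists the phases that apply to it. Most rows use all four;
# "Personas" only gets Plan and Assess, since it never appears in the
# published documentation itself.
CHECKLIST = {
    "Synopsis": PHASES,
    "Personas": ("Plan", "Assess"),
    "Tutorial": PHASES,
    "Installation Instructions": PHASES,
    "Citation Instructions": PHASES,
    "License": PHASES,
    "How to Contribute": PHASES,
    "API Reference": PHASES,
    "Other Reference Materials": PHASES,
    "Acknowledgments": PHASES,
    "Authoring Tools": PHASES,
    "Release Processes": PHASES,
}

def remaining_boxes(done):
    """Return the (item, phase) pairs not yet checked off, phase by phase."""
    todo = []
    for phase in PHASES:
        for item, phases in CHECKLIST.items():
            if phase in phases and (item, phase) not in done:
                todo.append((item, phase))
    return todo

# Example: after planning the Synopsis and the Tutorial, the next
# unchecked box is still in the Plan column.
done = {("Synopsis", "Plan"), ("Tutorial", "Plan")}
print(remaining_boxes(done)[0])  # prints ('Personas', 'Plan')
```

The point of ordering the output by phase rather than by row is the same guidance given above: work through Plan for everything before moving on to Draft, rather than writing each piece in isolation.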
&lt;p&gt;Each of these items will have a corresponding article on the One Good Tutorial
website, providing advice on how to approach the item in each of the four
phases. In at least some cases, these will branch out into more specific how-to
pages. For the Authoring Tools and Release Processes items, this is where I will
provide specific tool recommendations and step-by-step tutorials on how to
handle common scenarios (e.g., using Sphinx and ReadTheDocs to document a
pure-Python package; depositing your software to Zenodo). There should be ample
opportunity to refer people to existing resources like the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.thegooddocsproject.dev&#x2F;&quot;&gt;Good Docs
Project&lt;&#x2F;a&gt; templates. Time permitting, I could also see myself adding
supporting “explainers” giving information about, say, the topic of software
citation in general.&lt;&#x2F;p&gt;
&lt;p&gt;It will probably also make sense to have a section that I would describe as
“Extra Credit”. This would mostly be aimed at slightly larger projects,
addressing topics like codes of conduct, organizing multiple tutorials, how-tos,
social sites like StackExchange, and so on. There’s no shortage of material that
could be written here, but I expect that these topics will be out of scope for
most of the developers that I would want to visit One Good Tutorial. And in the
end, I’m aiming to reach &lt;em&gt;people&lt;&#x2F;em&gt; rather than projects, so I’d rather be
relevant to lots of people working on smaller efforts, even if the bulk of the
documentation-reading and -writing that happens might be concentrated on a small
number of high-profile pieces of software.&lt;&#x2F;p&gt;
&lt;p&gt;I’m feeling pretty good about this plan, so I’ve gone ahead and registered
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;onegoodtutorial.org&#x2F;&quot;&gt;onegoodtutorial.org&lt;&#x2F;a&gt;, set up a GitHub repo
(&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pkgw&#x2F;onegoodtutorial&quot;&gt;pkgw&#x2F;onegoodtutorial&lt;&#x2F;a&gt;), and wired up
a static site built with &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;getzola.org&#x2F;&quot;&gt;Zola&lt;&#x2F;a&gt; and hosted via &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pages.github.com&#x2F;&quot;&gt;GitHub
Pages&lt;&#x2F;a&gt;, with deployment automated using GitHub
Actions. I am pretty sure that a basic static site generator will suffice for
setting up the OGT website; if I run into limitations, it will be easy to
rebuild it to use different infrastructure instead.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;The work described in this post was supported by a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bssw.io&#x2F;pages&#x2F;bssw-fellowship-program&quot;&gt;Better Scientific Software
Fellowship&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>The State of the Doc Tools</title>
      <pubDate>Thu, 28 Aug 2025 15:13:21 -0400</pubDate>
      <link>https://newton.cx/~peter/2025/state-of-the-doc-tools/</link>
      <guid>https://newton.cx/~peter/2025/state-of-the-doc-tools/</guid>
      <description>&lt;p&gt;Last week I &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;state-of-the-docs&#x2F;&quot;&gt;presented&lt;&#x2F;a&gt; some of my takeaways
stemming from a small set of interviews with researchers about scientific
software documentation. This week, I’m reporting out the results of a survey and
review of tools for documenting research software: &lt;strong&gt;The State of The Doc
Tools&lt;&#x2F;strong&gt;, 2025.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;An early, big-picture finding: as a result of both my interviews and my survey,
I’ve concluded that it’s really important to construe the words “documentation”
and “tools” expansively. That first word, “documentation”, might conjure up an
image of a hefty spiral-bound manual. But that’s just one form-factor out of
many possibilities. I think that more than ever, it should be emphasized that
documentation can live in some surprising places. YouTube videos?
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=HTXScEOwze8&quot;&gt;Sure&lt;&#x2F;a&gt;! StackExchange Q&amp;amp;As?
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;unix.stackexchange.com&#x2F;questions&#x2F;744675&#x2F;resizing-an-lvm-storage-repository-in-xcp-ng&quot;&gt;Indeed&lt;&#x2F;a&gt;.
Discord chat groups? &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;shkspr.mobi&#x2F;blog&#x2F;2023&#x2F;07&#x2F;discord-is-not-documentation&#x2F;&quot;&gt;Empirically,
yes&lt;&#x2F;a&gt;. Zines?
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;wizardzines.com&#x2F;&quot;&gt;Why not&lt;&#x2F;a&gt;? Custom-trained LLMs? I wouldn’t be
surprised if they become standard for large projects, soon.&lt;&#x2F;p&gt;
&lt;p&gt;That’s not to suggest that one ought to try to occupy all of these spaces —
especially at the scale of a typical scientific software project! — but I think
it’s really important to think very openly about where your documentation could
live. And perhaps where it &lt;em&gt;does&lt;&#x2F;em&gt; live, in practice. Maybe a hefty book is right
for your project, but maybe not.&lt;&#x2F;p&gt;
&lt;p&gt;The corollary of this, however, is that it’s impossible to survey documentation
tools in a truly comprehensive and systematic way. So, I won’t try to present an
exhaustive list of software that one could use.&lt;&#x2F;p&gt;
&lt;p&gt;But that’s not all. I also wrote that I think it’s important to construe the
word “tools” expansively. Sure, there are things that everyone would agree fit
that definition: &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.sphinx-doc.org&#x2F;en&#x2F;master&#x2F;&quot;&gt;Sphinx&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;rust-lang.github.io&#x2F;mdBook&#x2F;&quot;&gt;mdBook&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;. But what about &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;diataxis.fr&#x2F;&quot;&gt;Diátaxis&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, the
“four kinds of docs” paradigm &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2023&#x2F;divio-documentation-system&#x2F;&quot;&gt;that I’ve written about before&lt;&#x2F;a&gt;? If we
think of a “tool” as “something that helps us accomplish a goal”, then I would
say that Diátaxis absolutely fits that definition. And that is indeed what I’m
saying!&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;state-of-the-docs&#x2F;&quot;&gt;Last week&lt;&#x2F;a&gt; I observed that, despite valiant efforts to build one, there
is not much of a “community of practice” for many of the people who are
documenting scientific software. I believe that the lack of this community,
while it definitely doesn’t do us any favors when it comes to tangible tools
like Sphinx or mdBook, is even &lt;em&gt;more&lt;&#x2F;em&gt; damaging when it comes to intangible,
intellectual tools like Diátaxis. The key difference is that intangible tools
tend to remain nebulous, un-named, un-documented, and therefore more difficult
to research on your own. I keep on coming back to the Diátaxis example, after
all, because Daniele Procida has turned what could have been a somewhat vague
mental model into a tangible, referenceable thing! (We see a bit of this now in
how everyone falls over themselves to brand &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.heartbleed.com&#x2F;&quot;&gt;their security vulnerability
discoveries&lt;&#x2F;a&gt;.) If we had stronger communities of
practice, the intellectual tools would spread more easily. In the meantime, more
people should follow Daniele’s lead.&lt;&#x2F;p&gt;
&lt;p&gt;All that being said, even after doing a fair amount of digging I’ve been unable
to unearth many other “intangible tools” to include in my survey. There is the
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;informationmapping.com&#x2F;&quot;&gt;Information Mapping&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; methodology, which might be useful, but I’m not
going to find out because it’s a proprietary system (!) — you have to pay to be
taught the model. For both practical and ethical reasons, I consider this to be
a total non-starter.&lt;&#x2F;p&gt;
&lt;p&gt;The kinds of intellectual tools that I’m looking for would, for the most part,
probably be found in the field known as &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Information_architecture&quot;&gt;Information Architecture&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; (IA).
There are techniques associated with the IA field like &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Card_sorting&quot;&gt;card sorting&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; and
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Tree_testing&quot;&gt;tree testing&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; that could be useful tools in the software documentation
toolbox to sit alongside Diátaxis. I’ve found the website of an outfit called
the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.nngroup.com&#x2F;&quot;&gt;Nielsen Norman Group&lt;&#x2F;a&gt; to be surprisingly useful in learning about some
of these concepts. While NNG is your typical corporate consultancy trying to
make a buck off of you, I’ve found their materials to be &lt;em&gt;much&lt;&#x2F;em&gt; more useful than
the usual content-marketing drivel that you find on such sites. I’ve also bought
a few IA books (including the aptly-named &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;search.worldcat.org&#x2F;title&#x2F;86110226&quot;&gt;&lt;em&gt;Information Architecture&lt;&#x2F;em&gt;&lt;&#x2F;a&gt;)
and will see if any of them seem worth recommending to non-specialists.&lt;&#x2F;p&gt;
&lt;p&gt;Turning back towards the traditionally-recognized software documentation tools,
we can return to &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.sphinx-doc.org&#x2F;en&#x2F;master&#x2F;&quot;&gt;Sphinx&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; as a starting point. One direction to go is to
consider comparable tools aimed at other programming systems (granting that
Sphinx is not actually Python-specific), which of course are numerous:
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;tsdoc.org&#x2F;&quot;&gt;tsdoc&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; for TypeScript, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;doc.rust-lang.org&#x2F;rustdoc&#x2F;what-is-rustdoc.html&quot;&gt;rustdoc&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; for Rust, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;doxygen.nl&#x2F;&quot;&gt;Doxygen&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;go.dev&#x2F;blog&#x2F;godoc&quot;&gt;godoc&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;,
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;swagger.io&#x2F;&quot;&gt;Swagger&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; for web APIs, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;ruby.github.io&#x2F;rdoc&#x2F;&quot;&gt;rdoc&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, and on and on and on and on. It’s
understandable that most modern programming languages deliver integrated
documentation systems; each language has its own distinctive semantics that need
to be captured in its API documentation framework. It’s also understandable that
developers may be naturally inclined to try to author &lt;em&gt;all&lt;&#x2F;em&gt; of their
documentation using these tools. But it’s worth pointing out explicitly that
there’s no particular reason that a tool designed to document APIs in a certain
language will be very good, or even adequate, for authoring general-purpose,
non-API documentation.&lt;&#x2F;p&gt;
&lt;p&gt;A very good reason to try to stick with language-specific documentation tools,
though, is that modern ones can pair with public services that allow you to host
documentation online for free. &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;readthedocs.io&#x2F;&quot;&gt;ReadTheDocs&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; is the standard for Python,
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.rs&#x2F;&quot;&gt;docs.rs&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; for Rust, and, in a certain sense, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.ctan.org&#x2F;&quot;&gt;CTAN&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; for LaTeX are all
examples. If there’s one thing I’ve come to appreciate over the years, it’s that
to first order nobody wants to host their own content, and in my experience
services like these have been transformative for how scientific software is
documented. That is, the universe of documentation tools to consider includes
not just authoring tools, but also publishing platforms.&lt;&#x2F;p&gt;
&lt;p&gt;Continuing along that thread, there are &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;myles&#x2F;awesome-static-generators&quot;&gt;&lt;em&gt;huge&lt;&#x2F;em&gt; numbers&lt;&#x2F;a&gt; of static site
generators out there that can (or could) be used to generate software
documentation websites, as well as “* Pages” hosting services to host such sites
for free. A random sub-sample from the former group: &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;jekyllrb.com&#x2F;&quot;&gt;Jekyll&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;gohugo.io&#x2F;&quot;&gt;Hugo&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;,
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.gatsbyjs.com&#x2F;&quot;&gt;Gatsby&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;; I’m partial to &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;getzola.org&#x2F;&quot;&gt;Zola&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; and have used it for docs;
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docusaurus.io&#x2F;&quot;&gt;Docusaurus&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.mkdocs.org&#x2F;&quot;&gt;MkDocs&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;; &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;rust-lang.github.io&#x2F;mdBook&#x2F;&quot;&gt;mdBook&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.gitbook.com&#x2F;&quot;&gt;GitBook&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;. In the latter
group: &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pages.github.com&#x2F;&quot;&gt;GitHub Pages&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.gitlab.com&#x2F;ee&#x2F;user&#x2F;project&#x2F;pages&#x2F;&quot;&gt;GitLab Pages&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.netlify.com&#x2F;&quot;&gt;Netlify&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;; &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;readthedocs.io&#x2F;&quot;&gt;ReadTheDocs&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;
can effectively act as a static pages host; and all of the cloud providers make
this drop-dead easy. This is a completely saturated market.&lt;&#x2F;p&gt;
&lt;p&gt;A market that’s a bit &lt;em&gt;less&lt;&#x2F;em&gt; saturated, somewhat to my surprise, is the one for
wikis. This is probably because wiki software needs to implement both authoring
and publishing (i.e., hosting) features, unlike the static-site case where
there’s a clean separation between the two halves. But more and more I’ve come
to feel that the basic wiki paradigm is a strong one — it works pretty well for
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;&quot;&gt;one of the top ten sites on the internet&lt;&#x2F;a&gt; — and that it would be a great
fit for a lot of use cases in software documentation. But for some reason
wiki software tends to have a 90’s feel, and a 90’s look as well. I’m becoming
more and more convinced that there’s a lot of untapped opportunity in this
space. Anyway, some of the main wiki tools are: &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.mediawiki.org&#x2F;&quot;&gt;MediaWiki&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.github.com&#x2F;en&#x2F;communities&#x2F;documenting-your-project-with-wikis&#x2F;about-wikis&quot;&gt;GitHub’s
wikis&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.dokuwiki.org&#x2F;&quot;&gt;DokuWiki&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;moinmo.in&#x2F;&quot;&gt;MoinMoin&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;; you could put &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.atlassian.com&#x2F;software&#x2F;confluence&quot;&gt;Confluence&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; into
this box too.&lt;&#x2F;p&gt;
&lt;p&gt;There are other platforms that integrate authoring and publishing like wikis,
but have modernized WYSIWYG styles. &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;curvenote.com&#x2F;&quot;&gt;Curvenote&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; (a commercial product) is
specifically aimed at scientific authoring. &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pubpub.org&#x2F;&quot;&gt;PubPub&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; is a unique open-source
platform aimed at a more general academic audience (combining Google-Docs-like
collaborative editing with features like DOI minting and citations), but sadly
the project’s very existence is under threat due to the withdrawal of a major
funder. PubPub is the only substantial noncommercial player in this space that
I’m aware of, so I really hope the worst doesn’t come to pass. (In
case it’s not obvious, I’m focusing almost exclusively on open-source and
noncommercial tools here; beyond generalized academic cheapness, I believe that
there are profound reasons to prefer them in this domain.) I’m a bit surprised
that &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.overleaf.com&#x2F;&quot;&gt;Overleaf&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, the online collaborative LaTeX editor, hasn’t gotten into
the publishing business, but as far as I can tell it hasn’t.&lt;&#x2F;p&gt;
&lt;p&gt;If you want to go a little bit farther, you can think of mainstream social media
platforms as being on the same continuum. It’s not incorrect to describe
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;&quot;&gt;YouTube&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; as a platform for creating and publishing content, after all, and
one can imagine scientific software projects where it would not be an
unreasonable place to host (nontextual) documentation. Likewise, the bulk of the
documentation regarding some software projects probably lives on
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;stackexchange.com&#x2F;&quot;&gt;StackExchange&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; sites, in the form of answers to user questions. This line
of thought brings us to publicly-visible “forum”-type systems (&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.discourse.org&#x2F;&quot;&gt;Discourse&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;,
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.phpbb.com&#x2F;&quot;&gt;phpBB&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;) and more-synchronous “chat”-type platforms (&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;discord.com&#x2F;&quot;&gt;Discord&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;,
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;gitter.im&#x2F;&quot;&gt;Gitter&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;zulip.com&#x2F;&quot;&gt;Zulip&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;, &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;slack.com&#x2F;&quot;&gt;Slack&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt;) which are usually less- or non-public.
These platforms are not generally what people think of when they think of
“documentation”, but I reiterate that it can very much happen that they end up
hosting a significant or even dominant portion of the recorded information about
a software project.&lt;&#x2F;p&gt;
&lt;p&gt;Finally, we can narrow our focus to the tools used to produce individual
documents. In the context of our touchstone Sphinx, this takes us to the
underlying markup options like &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docutils.sourceforge.io&#x2F;rst.html&quot;&gt;reStructuredText&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; and the hugely popular
family of &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Markdown&quot;&gt;Markdown&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; syntaxes (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;commonmark.org&#x2F;&quot;&gt;less well-defined than you would
hope&lt;&#x2F;a&gt;), especially the relatively new entrant &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;mystmd.org&#x2F;&quot;&gt;MyST&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; which emphasizes
technical applications. My perennial favorite
&lt;strong&gt;&lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2024&#x2F;what-tex-gets-right&#x2F;&quot;&gt;TeX&#x2F;LaTeX&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; is relevant, but for the vast
majority of users it only targets PDF output, and you’ll note that virtually
everything that I’ve discussed revolves around HTML and the web browser. (To me,
this is &lt;em&gt;the&lt;&#x2F;em&gt; fundamental issue holding TeX back in the 21st century.) You can
think of the &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;nbformat.readthedocs.io&#x2F;&quot;&gt;Jupyter notebook&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; as a document file format (one that happens
to come with a standard WYSIWYG editor), in which case &lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;nbconvert.readthedocs.io&#x2F;&quot;&gt;nbconvert&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; joins
&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pandoc.org&#x2F;&quot;&gt;pandoc&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; in the category of tools that connect the low-level document file
formats to higher-level systems like Sphinx and MkDocs.&lt;&#x2F;p&gt;
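&lt;p&gt;To make the “notebook as a document file format” idea concrete, here’s a minimal sketch of the JSON structure inside an &lt;code&gt;.ipynb&lt;&#x2F;code&gt; file, built by hand with only the Python standard library. The cell contents are made up, and real tools validate against the full nbformat schema; this only shows the shape of the format.&lt;&#x2F;p&gt;

```python
import json

# A minimal, hand-built Jupyter notebook. The .ipynb format is just JSON
# holding a list of typed cells, so prose and runnable code interleave.
# (Sketch of the nbformat 4 layout; real notebooks carry more metadata.)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Getting started\nThis prose cell is documentation.",
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": "print('example code lives alongside the prose')",
        },
    ],
}

# Writing it out yields a file that Jupyter or nbconvert can open.
with open("example.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
```

&lt;p&gt;From there, something like &lt;code&gt;jupyter nbconvert --to html example.ipynb&lt;&#x2F;code&gt; turns the document into a web page.&lt;&#x2F;p&gt;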
&lt;p&gt;Whew! I could keep going, too, but I &lt;em&gt;think&lt;&#x2F;em&gt; I’ve managed to touch on the major
categories of tools that go into producing software documentation.&lt;&#x2F;p&gt;
&lt;p&gt;Synthesizing all of the above, I think it might not be unreasonable to talk
about documentation as being produced within &lt;em&gt;documentation systems&lt;&#x2F;em&gt;. Here I’m
referring only to the technical implementation, not conceptual tools like
Diátaxis. I’ll claim that these technical systems include four kinds of
technologies:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Low-level file formats of individual documents&lt;&#x2F;li&gt;
&lt;li&gt;Tools for authoring individual documents&lt;&#x2F;li&gt;
&lt;li&gt;Tools for assembling documents into structured collections&lt;&#x2F;li&gt;
&lt;li&gt;Tools for publishing (i.e., hosting) such collections&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Some tools blur the boundaries between these layers, and you could probably
write a whole PhD thesis on the definition of the word “document”, but I’m going
to claim that if you look at anything that you can call “documentation” with a
straight face, you’ll be able to meaningfully isolate the technologies that are
used for each of these four layers.&lt;&#x2F;p&gt;
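&lt;p&gt;As a concrete illustration, here’s how a couple of familiar stacks decompose into those four layers. The groupings are my own reading, not anything these projects define themselves:&lt;&#x2F;p&gt;

```python
# Decomposing two common documentation stacks into the four layers named
# above. The assignments are illustrative, not authoritative.
LAYERS = ("file format", "authoring", "assembly", "publishing")

stacks = {
    "typical Python project": {
        "file format": "reStructuredText or MyST Markdown",
        "authoring": "any text editor",
        "assembly": "Sphinx",
        "publishing": "ReadTheDocs",
    },
    "typical Rust crate": {
        "file format": "Markdown in doc comments",
        "authoring": "any text editor",
        "assembly": "rustdoc",
        "publishing": "docs.rs",
    },
}

# The claim in the text: any "documentation system" should name a
# technology for each of the four layers.
for name, stack in stacks.items():
    assert set(stack) == set(LAYERS), name
```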
&lt;p&gt;&lt;em&gt;The work described in this post was supported by a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bssw.io&#x2F;pages&#x2F;bssw-fellowship-program&quot;&gt;Better Scientific Software
Fellowship&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>The State of the Docs</title>
      <pubDate>Fri, 22 Aug 2025 13:04:45 -0400</pubDate>
      <link>https://newton.cx/~peter/2025/state-of-the-docs/</link>
      <guid>https://newton.cx/~peter/2025/state-of-the-docs/</guid>
      <description>&lt;p&gt;As part of my &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2024&#x2F;bssw-fellowship&#x2F;&quot;&gt;BSSw Fellowship&lt;&#x2F;a&gt; project on scientific
software documentation, I’ve conducted free-form interviews with some of my
colleagues to try to learn a little about how working scientists are approaching
research software documentation in the year 2025. In this post I’ll report my
findings.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;But first, some disclaimers. I’m describing an &lt;em&gt;extremely&lt;&#x2F;em&gt; modest, qualitative
undertaking. While I very much enjoyed my conversations, and so have plans to
interview a few more people (reach out if you’d like to volunteer!), at the
moment my sample size is &lt;em&gt;n = 4&lt;&#x2F;em&gt;. Even if that sample size were a lot bigger,
I’m not doing structured interviews here (nor am I trained to do so), so this is
far from rigorous in any sense of the word. I’ve also tried to pursue a group of
interviewees that’s relatively diverse, but … again, we’re talking about four
people here. Three of them are astronomers, and of course the set of people
willing to sit down and talk to me is a biased subsample of the universe of
people working with research software. All this being said, I suspect that the
patterns I’m noticing are likely to generalize.&lt;&#x2F;p&gt;
&lt;p&gt;If I had to choose one word to summarize how my interviewees felt about
scientific software documentation in general, I’d go with &lt;em&gt;dissatisfaction&lt;&#x2F;em&gt;. I
think it&#x27;s fair to say that everyone I talked to has a real appreciation for
good documentation; but no small amount of that appreciation stems from its
rarity. Good documentation is hard to find. Part of the reason for that is
intrinsic: it’s hard to write well! But, based on my interviews, I’m not alone
in feeling that there are a lot of extrinsic factors holding things back,
especially authoring tools that are limited and limiting.&lt;&#x2F;p&gt;
&lt;p&gt;In that context, I was a little bit surprised that none of my interviewees
complained about documentation being &lt;em&gt;unappreciated&lt;&#x2F;em&gt;. A perennial — and
completely valid — complaint from scientific software developers is that they
don’t get the recognition that they deserve from their peers. I had expected
that at least one or two people would report that they were discouraged from
spending time on docs because it seemed like the results weren’t being valued.
But no one offered that sentiment, although I didn’t make a point of trying to
elicit it either. Nor did anyone bring up the challenges of making documentation
work legible to the traditional academic reward structures (e.g., making docs
citeable), which is another preoccupation when it comes to research code. It
could be that, given a baseline expectation that time spent on research software
is going to be underappreciated in general, a lack of appreciation for time
spent on software docs goes without saying.&lt;&#x2F;p&gt;
&lt;p&gt;In a similar vein, only one interviewee specifically mentioned the connection
between funding and software documentation, or the lack thereof. I’m confident
that the potential impact of funding is obvious to everyone involved; it’s just
that the prospects for any kind of systematic financial support for software
documentation feel grim at best, at the moment. I can imagine the contours of a
pitch to change this: fundamentally, documentation is education, and education
deserves investment, right? Well, lately there’s less agreement on that point
than I’d like. But I’d be happy to argue that most students would learn not just
as well but actually &lt;em&gt;more&lt;&#x2F;em&gt; effectively from great software documentation than
from a great textbook: it’s hard to beat learning by doing. Anyway, maybe one
day there’ll be somebody to listen to this pitch, but I’m not holding my breath.&lt;&#x2F;p&gt;
&lt;p&gt;The lack of systematic support — both financial and less tangible kinds — surely
has much to do with another theme that I noticed: a lack of a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Community_of_practice&quot;&gt;community of
practice&lt;&#x2F;a&gt; around research software documentation. It feels (emphasis on the
&lt;em&gt;feels&lt;&#x2F;em&gt;) like everybody’s out there on their own, figuring things out
independently. While some of my interviewees mentioned learning how to create
better documentation from looking at &lt;em&gt;other docs&lt;&#x2F;em&gt;, no one gave much indication
that they felt that they had learned much from &lt;em&gt;other people&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This is another result that can be filed under “Absolutely Zero Surprise
Involved”. But I do want to emphasize that there are people out there actively
trying to create exactly these communities. The two groups that come to mind
immediately are &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.writethedocs.org&#x2F;&quot;&gt;Write The Docs&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.thegooddocsproject.dev&#x2F;&quot;&gt;The Good Docs Project&lt;&#x2F;a&gt;, the
latter of which I only learned about recently. I’m sure there are more, and
everything I’ve seen suggests that the people in these organizations are doing
exactly what they ought to — it’s just that this kind of community-building is
slow, tedious, &lt;em&gt;difficult&lt;&#x2F;em&gt; work. It’s also worth pointing out that there is a
&lt;em&gt;much&lt;&#x2F;em&gt; bigger world out there once you start considering the field of technical
writing in general, as opposed to the narrower group of
scientists-doing-software-docs-amateurishly. (On the other hand, I was going to
link to the Society for Technical Communication as an example, but I see that
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.reddit.com&#x2F;r&#x2F;technicalwriting&#x2F;comments&#x2F;1id0m5d&#x2F;stc_is_gone&#x2F;&quot;&gt;they closed their doors earlier this year&lt;&#x2F;a&gt;.)&lt;&#x2F;p&gt;
&lt;p&gt;Getting a bit more pragmatic, one high-level idea that I took away from my
interviews is that there are basically two levels of documentation. At the lower
level, we have the things that we can think of as, approximately, individual
documents: a single tutorial, or the API reference for a single Python package.
At the higher level, we have document collections: a whole suite of tutorials
and API references, design docs, and so on. At both levels, we face challenges
relating to design, organization, and tooling, but the details of those
challenges are actually quite different. Smaller projects are basically
operating only at the lower level, but for people working in larger projects, it
seems that attention quickly migrates to the higher-level
issues, rather than the details of individual documents. For my BSSw project, I
plan to primarily address the lower level — preparing individual documents — but
I came out of my interviews feeling that it’ll be helpful to draw a line between
the two levels (even if it is, inevitably, fuzzy) and offer some guidance about
how to tackle the higher-level issues.&lt;&#x2F;p&gt;
&lt;p&gt;Another theme from my interviews was interest in experimenting with different
documentation “form factors”. I think that Jupyter notebooks came up in all of
my discussions, and everyone was pretty much on the same page in feeling that
they can be a very good vehicle for certain kinds of software documentation. I
brought up videos in a few interviews, because I’ve been struck by the seeming
trend in industry to produce video documentation for &lt;em&gt;everything&lt;&#x2F;em&gt;, even cases
where it feels like a few paragraphs of text would be more than sufficient. I
still don’t quite understand where that’s coming from — and none of my
interviewees seemed to see any special appeal either. There are cases where
video documentation can be very helpful (as I’ve experienced with &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.worldwidetelescope.org&#x2F;&quot;&gt;WorldWide
Telescope&lt;&#x2F;a&gt;) but, given that it takes a lot of work to produce, my
interviewees’ conception of “documentation” still centers on text.&lt;&#x2F;p&gt;
&lt;p&gt;One interviewee was sympathetic to an idea that’s been bouncing around in my
head: perhaps we should be making a lot more use of slide decks for
documentation. There’s no need to belabor the ways in which they can go wrong,
but at their best, slide decks can break down material and interleave text and
graphics in a way that seems a lot more digestible than a linear document. And,
as I’ve harped on for, gosh, &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2013&#x2F;09&#x2F;slides-for-scientific-talks-in-html&#x2F;&quot;&gt;more than a
decade&lt;&#x2F;a&gt;, you can make nice HTML
decks that can be viewed without the need to “context switch” into a dedicated
app. I tried this approach &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;dasch.cfa.harvard.edu&#x2F;dr7&#x2F;introduction&#x2F;&quot;&gt;in the DASCH documentation&lt;&#x2F;a&gt; and am
personally pretty happy with the results, although the scale of deployment is
small enough that I don’t have any firm feedback about what anyone else thinks.
One of the challenges for this idea is that authoring these decks is a bit
tricky, although maybe most people would be satisfied with links to Google
Slides or something along those lines.&lt;&#x2F;p&gt;
&lt;p&gt;Attention to authorship barriers — how hard or easy is it to just sit down and
write something? — is looming large in my thinking as a result of these
interviews. One interviewee was adamant: write docs using whatever tool you have
at hand; the best doc is one that actually exists. Others were focused on
getting the right tooling into place: setting up CI to run doctests (using a
tool like &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;sybil.readthedocs.io&#x2F;&quot;&gt;Sybil&lt;&#x2F;a&gt;) to ensure that they never break, or to automatically execute
Jupyter notebooks and integrate them into a Sphinx tree. Both of these attitudes
have their appeal, but there’s definitely a tension between them.&lt;&#x2F;p&gt;
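&lt;p&gt;To illustrate the “tested docs” idea in its simplest form, here’s a sketch using the standard library’s &lt;code&gt;doctest&lt;&#x2F;code&gt; module rather than Sybil, with a made-up example function. The point is that the examples readers see are the very commands that CI executes:&lt;&#x2F;p&gt;

```python
import doctest
import re

def slugify(title):
    """Turn a page title into a URL slug.

    The examples below are documentation *and* tests: a doctest runner
    executes them and fails if the printed output drifts from reality.

    >>> slugify("One Good Tutorial")
    'one-good-tutorial'
    >>> slugify("  Docs!  ")
    'docs'
    """
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# In CI you might run `python -m doctest yourmodule.py`, or point Sybil
# at your prose files; here we drive the doctest machinery directly.
runner = doctest.DocTestRunner()
for test in doctest.DocTestFinder().find(slugify):
    runner.run(test)
assert runner.failures == 0
```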
&lt;p&gt;I’m not quite sure how to navigate this tension in the resource that I’ll be
creating. My personal bias ought to be pretty clear: whatever the opposite of
quick-and-dirty is, that’s usually me. Turns out, though, that most people
aren’t like me. I’d like to think that my resource can offer people some
cookbook recipes to help them get the benefits of some of the more complex tools
(it’s good to verify that your example code actually runs!) without getting
mired in a slog of Sphinx configuration, but I’ll have to be careful not to go
overboard.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;The work described in this post was supported by a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bssw.io&#x2F;pages&#x2F;bssw-fellowship-program&quot;&gt;Better Scientific Software
Fellowship&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>MPC and the Rubin First Look</title>
      <pubDate>Wed, 02 Jul 2025 13:25:47 -0400</pubDate>
      <link>https://newton.cx/~peter/2025/mpc-rubin-first-look/</link>
      <guid>https://newton.cx/~peter/2025/mpc-rubin-first-look/</guid>
      <description>&lt;p&gt;At long last, the first data from the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;rubinobservatory.org&#x2F;&quot;&gt;Vera C. Rubin Observatory&lt;&#x2F;a&gt; are
starting to become public! Here at the Minor Planet Center we were not only
watching last week’s Rubin &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;rubinobservatory.org&#x2F;news&#x2F;first-imagery-rubin&quot;&gt;“First Look”&lt;&#x2F;a&gt; event — we had some work to do
too.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;Asteroid-hunting and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;science.nasa.gov&#x2F;planetary-defense&#x2F;&quot;&gt;planetary defense&lt;&#x2F;a&gt; have always been a major piece of
Rubin’s science case. When it hits full steam, Rubin will increase the rate of
observations coming into the MPC by a factor of around five, and at this point
&lt;em&gt;years&lt;&#x2F;em&gt; of effort have gone into getting the MPC ready for Rubin’s data stream,
as I’ve mentioned &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;the-next-chapter&#x2F;&quot;&gt;a few times&lt;&#x2F;a&gt; &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;mpc-is-hiring&#x2F;&quot;&gt;already&lt;&#x2F;a&gt;. With Rubin getting closer to
full operations, the MPC and Rubin teams have been working together closely to
build the actual systems that will send Rubin data to MPC and process them. We
collectively agreed that as part of last Monday’s launch event, the MPC would
accept a first official Rubin data delivery and process it for distribution
through our standard, public-facing interfaces. Quite a big milestone for this
long-lasting project!&lt;&#x2F;p&gt;
&lt;p&gt;While the Rubin team submitted their data in the same way that everyone else
does, we processed the measurements using a set of next-generation pipelines
that we’ve been developing as a major part of the broader effort — the first
time that we’ve done so as part of production operations. This wasn’t strictly
necessary, in a certain sense — the legacy pipelines that churn away every day
&lt;em&gt;could&lt;&#x2F;em&gt;, in principle, have handled the data. The old and new systems are
functionally equivalent, at the moment, and the data volume of this one
submission wouldn&#x27;t have been enough to tip our systems over. But the whole
point of building the new pipeline is that once we start getting Rubin data
every night — which will start happening in a matter of weeks! — the legacy
system simply won’t be able to keep up. The time to get the new code up and
running is now.&lt;&#x2F;p&gt;
&lt;p&gt;Now, the dirty secret, such as it is, is that we already processed Monday’s
submission from Rubin dozens of times. We have a “sandbox” system that’s been
hosting the new pipelines and we’ve been testing its results assiduously. So we
knew exactly what we were going to be getting, well in advance.&lt;&#x2F;p&gt;
&lt;p&gt;But, that being said, there’s always a difference between testing and actual
production deployment, so Monday was genuinely a big day for us.&lt;&#x2F;p&gt;
&lt;p&gt;How did it go? Pretty well! We did uncover some issues that needed working
through, but I would generally characterize them as ones that I don’t feel too
bad about — issues occurring in the gaps that we knew we wouldn’t realistically
be able to test well in advance. Probably the most glaring issue was that the
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Provisional_designation_in_astronomy&quot;&gt;provisional designations&lt;&#x2F;a&gt; generated by the new code were associated with
the wrong dates. That wasn’t ideal, but it was a one-off mistake, rather than an
indicator of a flawed design. MPC developer &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.cfa.harvard.edu&#x2F;people&#x2F;brian-burt&quot;&gt;Brian Burt&lt;&#x2F;a&gt; did a fantastic job
squashing this and other issues last week, allowing us to process another batch
of Rubin data on Friday with nearly ideal results.&lt;&#x2F;p&gt;
&lt;p&gt;I won’t run down all of the issues that we ran into, but there was a clear
theme, alluded to above: the problems occurred in places where there were
differences between our test environment and the actual production environment.
(&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;12factor.net&#x2F;&quot;&gt;Twelve-factor&lt;&#x2F;a&gt; wins again.) Or, in some areas like the issuing of
provisional designations, at the moment we simply don’t have a way to test such
code in an end-to-end fashion in a non-production setting. No surprise that
that’s an area where problems surfaced.&lt;&#x2F;p&gt;
&lt;p&gt;Fortunately — in a certain sense — this theme came as no surprise to us. We’re
well aware of the limitations in our software testing capabilities, and have
been working steadily to address them. We have what we need when it comes to
unit testing, but integration tests are another story: we simply don’t have
non-production versions of various subsystems needed to test end-to-end
workflows.&lt;&#x2F;p&gt;
&lt;p&gt;I was not at all surprised to discover this when I arrived at the MPC. In my
experience, the idea of having parallel test and prod deployments is a great
example of something that’s near-universal in industry, but that a surprising
number of self-taught academic software developers aren’t used to. (Another
example: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;semver.org&#x2F;&quot;&gt;semantic versioning&lt;&#x2F;a&gt; and the concept of API breakage.)
Historically, MPC certainly had nothing of the sort. But that’s something I
intend to make sure we fix — a lot of my preoccupation with code deployability
and topics like &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2024&#x2F;xz-and-release-automation&#x2F;&quot;&gt;release automation&lt;&#x2F;a&gt; is precisely because these practices
help you ensure that you can construct multiple identical deployment
environments, which can then be used to support complex testing and prototyping.
Since MPC’s technology systems are both complex and legacy-filled, it will take
a long time to complete the evolution, but I’ve found that it’s &lt;em&gt;always&lt;&#x2F;em&gt; worth
the effort.&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>CASA 6 is Now in Conda-forge</title>
      <pubDate>Tue, 29 Apr 2025 10:08:38 -0400</pubDate>
      <link>https://newton.cx/~peter/2025/casa-6-conda-forge/</link>
      <guid>https://newton.cx/~peter/2025/casa-6-conda-forge/</guid>
      <description>&lt;p&gt;I’m pleased to report that &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;casa.nrao.edu&#x2F;&quot;&gt;CASA 6&lt;&#x2F;a&gt; is now available in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;conda-forge.org&#x2F;&quot;&gt;conda-forge&lt;&#x2F;a&gt;!
CASA is a software suite for processing radio astronomy data from telescopes
such as the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.vla.nrao.edu&#x2F;&quot;&gt;Very Large Array&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.almaobservatory.org&#x2F;&quot;&gt;ALMA&lt;&#x2F;a&gt;, and more. The availability
includes the &lt;code&gt;casatools&lt;&#x2F;code&gt; and &lt;code&gt;casatasks&lt;&#x2F;code&gt; Python packages but not the full suite
of CASA end-user applications. Just run:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color-scheme: light dark; color: light-dark(#24292E, #E1E4E8); background-color: light-dark(#FFFFFF, #24292E);&quot;&gt;&lt;code data-lang=&quot;shellscript&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#6F42C1, #B392F0);&quot;&gt;$&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt; conda&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt; install&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt; casatasks&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;or the equivalent command in a Python environment that &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;conda-forge.org&#x2F;docs&#x2F;user&#x2F;introduction&#x2F;#how-can-i-install-packages-from-conda-forge&quot;&gt;has conda-forge
enabled&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
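&lt;p&gt;For instance, if conda-forge isn’t already your default channel, you can name it explicitly (a minimal sketch, assuming a standard conda setup):&lt;&#x2F;p&gt;

```shell
# Install the casatasks package, pulling explicitly from conda-forge:
conda install --channel conda-forge casatasks
```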
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;Somewhat terrifyingly, I’ve been working on packaging CASA &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pkgw&#x2F;conda-recipes&#x2F;commit&#x2F;a4a4b55416a403eb17b3182c3854d68dc98cfc84&quot;&gt;for almost exactly a
decade&lt;&#x2F;a&gt;. The origin of this is that like many complex data reduction packages
of its time, CASA was (and still is) distributed by &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;public.nrao.edu&#x2F;&quot;&gt;NRAO&lt;&#x2F;a&gt; as a large,
self-contained software environment: the latest CASA installer for Linux is just
shy of a gigabyte in size, including not just data files but also all of the
support libraries needed to ensure that the binaries can run on as many systems
as possible. The issue with CASA was that it was &lt;em&gt;also&lt;&#x2F;em&gt; trying to embrace Python
as a scripting language. In NRAO’s monolithic distribution model, this meant
embedding an entire freestanding Python interpreter in the CASA distribution.&lt;&#x2F;p&gt;
&lt;p&gt;This might seem like a minor packaging choice, but my claim is that it hugely
affects how you can work with the software. A key advantage to a scripting
language like Python is that it allows you to bring together a huge range of
codebases in one “place:” you can write a program that glues together the
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.netlib.org&#x2F;&quot;&gt;netlib&lt;&#x2F;a&gt; numerical libraries via &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;scipy.org&#x2F;&quot;&gt;Scipy&lt;&#x2F;a&gt;, with GUI toolkits like &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.gtk.org&#x2F;&quot;&gt;GTK&lt;&#x2F;a&gt; using
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pygobject.gnome.org&#x2F;&quot;&gt;PyGObject&lt;&#x2F;a&gt;, and data I&#x2F;O packages like &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.hdfgroup.org&#x2F;solutions&#x2F;hdf5&#x2F;&quot;&gt;HDF5&lt;&#x2F;a&gt; through &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.h5py.org&#x2F;&quot;&gt;h5py&lt;&#x2F;a&gt;. When a package
like CASA bundles its own interpreter, if you want access to these kinds of
libraries you can’t just reuse whatever software collection you’ve painstakingly
set up over the years — you have to install everything from scratch into &lt;em&gt;its&lt;&#x2F;em&gt;
environment. In this model, CASA isn’t a tool to add to your toolbox — it’s a
quarantine zone that can only be entered or exited through an airlock.&lt;&#x2F;p&gt;
&lt;p&gt;Even worse, back then CASA’s Python installation was missing key development
files, so that many packages with binary components couldn’t be installed into
its environment at all! At least, not unless you were willing and able to devise
some extreme hacks to fill in the needed files.&lt;&#x2F;p&gt;
&lt;p&gt;Of course, NRAO wasn’t distributing CASA as a monolith out of spite: for a long
time, that was the only realistic way to deliver a large application (or suite
of applications) with complex, specialized dependencies. But in 2012 (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.asu.cas.cz&#x2F;~barta&#x2F;ARC-doc&#x2F;casa-intro-prague-2012.pdf&quot;&gt;around
the time of CASA 3.3&lt;&#x2F;a&gt;) &lt;a rel=&quot;external&quot; href=&quot;http:&#x2F;&#x2F;ilan.schnell-web.net&#x2F;prog&#x2F;anaconda-history&#x2F;&quot;&gt;Anaconda first released the conda package
manager&lt;&#x2F;a&gt;. I’ve mentioned before that in my view &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2024&#x2F;all-in-on-pixi&#x2F;&quot;&gt;much of conda’s design was
more or less evolutionary&lt;&#x2F;a&gt;, but that’s not to downplay its impact — in my
view it has truly transformed the way that we can deliver scientific software to
users. In particular, conda and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;conda-forge.org&#x2F;&quot;&gt;conda-forge&lt;&#x2F;a&gt; have combined to create an
ecosystem where it can be amazingly straightforward to install complex
dependencies into arbitrary Python environments.&lt;&#x2F;p&gt;
&lt;p&gt;Back in 2015, that was what I wanted for CASA. So I started slogging through its
obscure and out-of-date dependencies and developed conda recipes &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pkgw&#x2F;conda-recipes&#x2F;blob&#x2F;2cce9ed2fb0f0b650cbbc68cd38b942cee9b2889&#x2F;ORDERED.md#the-casa-stack&quot;&gt;for the whole
stack&lt;&#x2F;a&gt;, starting with version 4.4. And lo, it was good.&lt;&#x2F;p&gt;
&lt;p&gt;I did discover that I had to write &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pkgw&#x2F;pwkit&#x2F;tree&#x2F;master&#x2F;pwkit&#x2F;environments&#x2F;casa&quot;&gt;a whole bunch of code&lt;&#x2F;a&gt; to make much of
CASA’s functionality meaningfully usable from Python. It turned out that despite
having its outermost layers written in Python, and despite claims of “scriptability”,
much of CASA simply wasn’t usable like a regular Python package.&lt;&#x2F;p&gt;
&lt;p&gt;In 2017 (CASA 4.7, Python 3.6), it was clear that it was time to really shift to
Python 3. But it was also clear that CASA wasn’t going to be supporting Python 3
for a long time yet, and that there wasn&#x27;t anything that individuals outside of
NRAO could do about that. So I &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pkgw&#x2F;casa&#x2F;commits&#x2F;casa3k-5.6&#x2F;&quot;&gt;added Python 3 support to CASA myself&lt;&#x2F;a&gt;, which
was … painful. But once again, I found it worthwhile to be able to actually have
CASA be a first-class member of my general software toolkit.&lt;&#x2F;p&gt;
&lt;p&gt;Around 2019, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;science.nrao.edu&#x2F;enews&#x2F;casa_008&#x2F;index.shtml#casa6&quot;&gt;CASA 6&lt;&#x2F;a&gt; was coming out, which both added support for Python 3
and started the process of making CASA’s architecture more Python-native. But &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2018&#x2F;operation-innovation&#x2F;&quot;&gt;I
had started spending a lot less time thinking about astrophysics&lt;&#x2F;a&gt;, so I
didn’t spend much time exploring it. I took an initial look at updating my
recipes for CASA 6, saw that things seemed about as challenging as in the CASA
5.x series, and decided not to worry about it for a while.&lt;&#x2F;p&gt;
&lt;p&gt;Now, after a long hiatus, I’ve had some reason to take another look at updating
my CASA conda packages. Finally, &lt;em&gt;finally&lt;&#x2F;em&gt;, the upstream source code is in
pretty good shape! I was able to put together conda recipes for &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;conda-forge&#x2F;casacpp-feedstock&#x2F;&quot;&gt;casacpp&lt;&#x2F;a&gt;,
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;conda-forge&#x2F;casatools-feedstock&#x2F;&quot;&gt;casatools&lt;&#x2F;a&gt;, and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;conda-forge&#x2F;casatasks-feedstock&#x2F;&quot;&gt;casatasks&lt;&#x2F;a&gt; over the course of just a few days. Instead of
thousands of lines of patches, I only needed a handful. And pretty much all of
the old, outdated dependencies needed by CASA 4&#x2F;5 are now gone. The combination
of all of these factors made it feasible for me to get these recipes integrated
into conda-forge, rather than building the packages myself. This will offer
massive maintainability gains going forward, thanks both to conda-forge’s
impressive infrastructure and a much more realistic possibility for other people
to help keep the packages up-to-date. Huzzah!&lt;&#x2F;p&gt;
&lt;p&gt;If you want to install CASA using these packages — or, really, create and manage
any kind of customized software environment — &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2024&#x2F;all-in-on-pixi&#x2F;&quot;&gt;I highly recommend using
Pixi&lt;&#x2F;a&gt;. Run &lt;code&gt;pixi add casatasks&lt;&#x2F;code&gt; and you should be good to go. But you
can use these packages to install CASA into any conda-backed environment just by
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;conda-forge.org&#x2F;docs&#x2F;user&#x2F;introduction&#x2F;#how-can-i-install-packages-from-conda-forge&quot;&gt;activating conda-forge&lt;&#x2F;a&gt; and running the analogous installation step.&lt;&#x2F;p&gt;
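&lt;p&gt;As a concrete sketch of the Pixi route (assuming Pixi is installed; “casa-demo” is just an example project name):&lt;&#x2F;p&gt;

```shell
# Create a fresh Pixi project and add the CASA Python packages to it:
pixi init casa-demo
cd casa-demo
pixi add casatasks

# Run Python inside the managed environment to confirm the install:
pixi run python -c "import casatasks"
```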
&lt;p&gt;As for &lt;em&gt;why&lt;&#x2F;em&gt; I’ve returned to CASA … I would say “more soon,” but that’s
unlikely. My wife is due to give birth to our first child within the week so I
don’t expect to be posting much for a while! I won’t be able to take as much of
a formal leave as I’d like because I’m within my first year of Smithsonian
employment and hence not eligible for &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.dol.gov&#x2F;agencies&#x2F;whd&#x2F;fmla&quot;&gt;FMLA&lt;&#x2F;a&gt;, but I have a sneaking suspicion
that my hands will be pretty full for the foreseeable future.&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>“Generic” Artifacts in GitHub Packages</title>
      <pubDate>Thu, 03 Apr 2025 11:38:56 -0400</pubDate>
      <link>https://newton.cx/~peter/2025/generic-github-packages/</link>
      <guid>https://newton.cx/~peter/2025/generic-github-packages/</guid>
      <description>&lt;p&gt;Service blogging today! For a while I’ve been pondering if it would be possible
to use the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.github.com&#x2F;en&#x2F;packages&quot;&gt;GitHub Packages&lt;&#x2F;a&gt; service to host “generic” files: namely,
arbitrary binary artifacts that aren’t necessarily NPM packages, Docker images,
etc. Motivated by some of my current &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;minorplanetcenter.net&#x2F;&quot;&gt;MPC&lt;&#x2F;a&gt; projects, I sat down this week to
look into the topic more deeply than I have before. Lo and behold, you can do
this! And it isn’t even (that big of) a hack.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;&lt;strong&gt;Warning:&lt;&#x2F;strong&gt; &lt;em&gt;I realized that I wrote this post like some ridiculous internet
casserole recipe. Skip down to the code blocks at the end if you just want to
see what to do.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Back when I was a lad, installing software was an adventure: for every program
you needed, you dug up its website, found the Downloads page, pulled whatever
file(s) the authors provided, and figured out how to actually get the damn thing
installed. (OK, well, actually, I remember the days of installing software from
stacks of floppy disks, but we&#x27;re not going back &lt;em&gt;that&lt;&#x2F;em&gt; far.) From the very
earliest days of the internet, though, people saw the value of pulling files
into shared repositories: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.cpan.org&#x2F;&quot;&gt;CPAN&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;ctan.org&#x2F;&quot;&gt;CTAN&lt;&#x2F;a&gt; were among the first; then we had
Linux distributions that packaged up and hosted amazingly wide-ranging
collections of software. But I feel like it took a while for people to
appreciate just how valuable these systems could be; I remember being struck by
the remarkably tight integration between the &lt;code&gt;npm&lt;&#x2F;code&gt; tool and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.npmjs.com&#x2F;&quot;&gt;npmjs.com&lt;&#x2F;a&gt; when
they launched in 2010.&lt;&#x2F;p&gt;
&lt;p&gt;Nowadays, you would be foolish to launch a new language or framework &lt;em&gt;without&lt;&#x2F;em&gt;
some kind of central package registry. But we&#x27;re actually seeing a trend towards
&lt;em&gt;de&lt;&#x2F;em&gt;-agglomeration as ecosystems get so large and complex that you start running
into problems if you&#x27;re limited to a single, global package namespace. For
instance, while we started with a single original &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;hub.docker.com&#x2F;&quot;&gt;Docker Hub&lt;&#x2F;a&gt; for hosting
Docker container images, we now live in a world where you can spin up your own
organizational registry using infrastructure provided by &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;aws.amazon.com&#x2F;ecr&#x2F;&quot;&gt;Amazon&lt;&#x2F;a&gt;,
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;azure.microsoft.com&#x2F;en-us&#x2F;products&#x2F;container-registry&quot;&gt;Azure&lt;&#x2F;a&gt;, or &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;cloud.google.com&#x2F;artifact-registry&#x2F;docs&quot;&gt;Google&lt;&#x2F;a&gt;, not to mention many other options. You see the
same pattern for NPM, Cargo, Conda, and other major packaging ecosystems as
well.&lt;&#x2F;p&gt;
&lt;p&gt;(As a side note, this emergent flexibility is a testament to the brilliant
simplicity of the Internet’s architecture! None of this would be possible
without the URL. Good job, team.)&lt;&#x2F;p&gt;
&lt;p&gt;In 2019, GitHub joined the fray with its own package hosting infrastructure:
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.github.com&#x2F;en&#x2F;packages&quot;&gt;GitHub Packages&lt;&#x2F;a&gt; (GHP). While a lot of people might only be familiar with
GHP through the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.blog&#x2F;news-insights&#x2F;product-news&#x2F;github-packages-container-registry-generally-available&#x2F;&quot;&gt;GitHub Container Registry&lt;&#x2F;a&gt;, the subset of the service
that deals specifically with Docker containers, it also supports NPM, RubyGems,
Maven, Gradle, and NuGet. You can see how all of these systems might have a lot
in common under the hood: they&#x27;re all basically dealing with versioned sets of
binary artifacts, and you can imagine building a common infrastructure for
naming, hosting, access control, and more.&lt;&#x2F;p&gt;
&lt;p&gt;That’s cool. But. What if I’d like to leverage the GHP infrastructure to manage
a binary artifact that isn’t a Docker image, an NPM package, or any of those
other things? A “generic” package, if you will — some kind of file whose
contents could be anything?&lt;&#x2F;p&gt;
&lt;p&gt;Obviously, if all else failed, you could embed your file in one of the schemes
that GHP &lt;em&gt;does&lt;&#x2F;em&gt; support. You could write a Dockerfile that constructed a Docker
image containing your file, and then you could fetch the image and extract the
file. It’s not pretty, but it works — it’s an approach that I’ve used myself
more than once. You could also do the same with NPM’s tooling, or probably &lt;em&gt;any&lt;&#x2F;em&gt;
of the other packaging systems supported by GHP.&lt;&#x2F;p&gt;
&lt;p&gt;Can we do better? Thankfully, we can.&lt;&#x2F;p&gt;
&lt;p&gt;The short story is that nowadays you can use the GitHub Container Registry to
manage generic packages in a pretty clean way. I’m not familiar with the
detailed history, but as best I can gather, the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;opencontainers.org&#x2F;&quot;&gt;Open Container Initiative&lt;&#x2F;a&gt;
has driven the development of standards and tools to allow container registries
to handle arbitrary file formats, and a side benefit is that we can (ab)use that
support to leverage these registries even if our binaries don’t correspond to
what we would normally think of as “container images”.&lt;&#x2F;p&gt;
&lt;p&gt;In particular, there’s a tool called &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;oras.land&#x2F;&quot;&gt;&lt;code&gt;oras&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; that can talk to GHCR in a
“generic” way rather than a “Docker-specific” way. (It seems that ORAS stands
for OCI Registry As Storage, based on the title of its webpage.) With this tool,
it’s quite straightforward to deal with generic packages.&lt;&#x2F;p&gt;
&lt;p&gt;Specifically, if you’re like me and you’d like to publish a generic package to
GHCR in a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;features&#x2F;actions&quot;&gt;GitHub Actions&lt;&#x2F;a&gt; workflow, all you need is the following:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color-scheme: light dark; color: light-dark(#24292E, #E1E4E8); background-color: light-dark(#FFFFFF, #24292E);&quot;&gt;&lt;code data-lang=&quot;yaml&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;-&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt; u&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt;ses&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt; o&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt;ras-project&#x2F;setup-oras@v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt;  w&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt;ith&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt;    v&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt;ersion&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#005CC5, #79B8FF);&quot;&gt; 1.2.2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#6A737D, #6A737D);&quot;&gt;#&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#6A737D, #6A737D);&quot;&gt; ... create `myfile.zip` somehow&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#6A737D, #6A737D);&quot;&gt;#&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#6A737D, #6A737D);&quot;&gt; $SLUG is your package slug, e.g. `pkgw&#x2F;my-generic-package`&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#6A737D, #6A737D);&quot;&gt;#&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#6A737D, #6A737D);&quot;&gt; $TAG is the version tag, e.g. `latest`&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;-&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt; n&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt;ame&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt; P&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt;ush package&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt;  r&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#22863A, #85E89D);&quot;&gt;un&lt;&#x2F;span&gt;&lt;span&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#D73A49, #F97583);&quot;&gt; |&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt;    echo ${{ secrets.GITHUB_TOKEN }} |oras login --username ${{ github.repository_owner }} --password-stdin ghcr.io&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt;    oras push ghcr.io&#x2F;$SLUG:$TAG --artifact-type application&#x2F;vnd.pkgw myfile.zip&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It’s basically the same thing as pushing with the &lt;code&gt;docker&lt;&#x2F;code&gt; CLI, except the
artifact data come from a file on disk, and you need to specify an associated
“media type”. If you need your artifact to be consumable by third-party systems
(say, Docker), you’re going to need to set up a variety of other metadata too.
But if all you care about is pushing and pulling bytes, you can skip that, make
up a meaningless media type, and call it a day.&lt;&#x2F;p&gt;
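&lt;p&gt;If you want to sanity-check what you pushed, oras can also fetch the manifest back; the artifact type you supplied should be echoed in it (a sketch, using the same $SLUG and $TAG placeholders as above):&lt;&#x2F;p&gt;

```shell
# Fetch and pretty-print the manifest of the pushed artifact; its
# artifactType field should match the media type supplied at push time:
oras manifest fetch --pretty ghcr.io/$SLUG:$TAG
```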
&lt;p&gt;To retrieve your package later, it&#x27;s exactly what you would hope:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color-scheme: light dark; color: light-dark(#24292E, #E1E4E8); background-color: light-dark(#FFFFFF, #24292E);&quot;&gt;&lt;code data-lang=&quot;shellscript&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#6A737D, #6A737D);&quot;&gt;#&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#6A737D, #6A737D);&quot;&gt; this will create `myfile.zip`:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: light-dark(#6F42C1, #B392F0);&quot;&gt;oras&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt; pull&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt; ghcr.io&#x2F;&lt;&#x2F;span&gt;&lt;span&gt;$&lt;&#x2F;span&gt;&lt;span&gt;SLUG&lt;&#x2F;span&gt;&lt;span style=&quot;color: light-dark(#032F62, #9ECBFF);&quot;&gt;:&lt;&#x2F;span&gt;&lt;span&gt;$&lt;&#x2F;span&gt;&lt;span&gt;TAG&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Boom, done!&lt;&#x2F;p&gt;
&lt;p&gt;Readers experienced with GitHub will note that all of this might seem a bit
redundant with &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.github.com&#x2F;en&#x2F;repositories&#x2F;releasing-projects-on-github&#x2F;about-releases&quot;&gt;GitHub Releases&lt;&#x2F;a&gt;, which you can also use to distribute
versioned binary artifacts associated with your repository.&lt;&#x2F;p&gt;
&lt;p&gt;That’s not at all off-base. As someone with plenty of &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pkgw.github.io&#x2F;cranko&#x2F;book&#x2F;latest&#x2F;&quot;&gt;experience automating the
creation of GitHub releases&lt;&#x2F;a&gt;, though, I have to say that the GHCR
approach feels a lot more lightweight. You don’t have to make up release notes,
and you can just push a file instead of having to make API calls to declare a
release and then attach artifacts to it. I also suspect that GHCR offers more
fine-grained access control settings. For my MPC needs, I was willing to use the
Releases system if it felt necessary, but I’m much happier to be able to use
GHCR and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;oras.land&#x2F;&quot;&gt;&lt;code&gt;oras&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; instead.&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>The MPC is Hiring</title>
      <pubDate>Fri, 21 Mar 2025 10:52:42 -0400</pubDate>
      <link>https://newton.cx/~peter/2025/mpc-is-hiring/</link>
      <guid>https://newton.cx/~peter/2025/mpc-is-hiring/</guid>
      <description>&lt;p&gt;The MPC is hiring! We have two positions currently open — one
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;trustcareers.si.edu&#x2F;en&#x2F;postings&#x2F;64e9c581-1659-42a8-aee1-05bd867bdd63&quot;&gt;Astronomer&lt;&#x2F;a&gt; and one &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;trustcareers.si.edu&#x2F;en&#x2F;postings&#x2F;dde574e5-5d74-432a-b7e1-06ccfd4a7f2c&quot;&gt;IT Specialist&lt;&#x2F;a&gt; (i.e., software developer).
Both applications close on March 31st.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;You should check out the links above for the formal descriptions of the two
roles. Less formally, I’d say that there’s a wide variety of work and science
that happens at the MPC and there are many different ways that a well-qualified
person could contribute to the MPC’s success. I personally came into the MPC
with essentially zero experience in minor-planet science (and it’s &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;the-next-chapter&#x2F;&quot;&gt;only been a
few months&lt;&#x2F;a&gt; so that’s absolutely still the case) but that hasn’t hampered my
ability to get involved, or my level of interest in what I’m doing!&lt;&#x2F;p&gt;
&lt;p&gt;The job ads have to be a bit more neutral, but I’ll take the opportunity to try
to sell the MPC a bit. Most narrowly, this is an important and exciting time for
the MPC: with Rubin&#x2F;LSST coming online &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.lsst.org&#x2F;about&#x2F;project-status&quot;&gt;soon&lt;&#x2F;a&gt;, there’s about to be a
boom in minor-planet science, as well as a five-fold increase in the rate of
data coming into the MPC. Just as we start getting used to The Era of Rubin,
NASA’s &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;science.nasa.gov&#x2F;mission&#x2F;neo-surveyor&#x2F;&quot;&gt;NEO Surveyor&lt;&#x2F;a&gt; will launch and ramp things up even more. All of
this will mean that MPC will have a lot of work to do, but our impact and
visibility will be higher too.&lt;&#x2F;p&gt;
&lt;p&gt;Of course, in these (cough) “interesting” times it’s hard to feel like you can
be at all sure what’s going to happen in a few months, let alone a few years.
But for what it’s worth, the funding for MPC (and a lot of that for Rubin and
NEO Surveyor) comes from NASA’s &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;science.nasa.gov&#x2F;planetary-defense&#x2F;&quot;&gt;planetary defense program&lt;&#x2F;a&gt;; that is, the
people who are worrying about city-destroying asteroids. Any scientist who tells
you that their funding is totally secure is either delusional or lying, but I
feel a lot better about MPC’s situation than some of the alternatives I can
think of.&lt;&#x2F;p&gt;
&lt;p&gt;Looking beyond the MPC’s immediate future, I think there’s a ton of opportunity
for the MPC to make a much bigger impact on both science and society more
broadly. As &lt;a href=&quot;https:&#x2F;&#x2F;newton.cx&#x2F;~peter&#x2F;2025&#x2F;the-next-chapter&#x2F;&quot;&gt;I wrote before&lt;&#x2F;a&gt;, I see MPC as a perfect harbinger of where
21st-century science is heading: more and more, your ability to accomplish
&lt;em&gt;anything&lt;&#x2F;em&gt; is going to depend critically on your ability to execute technology
projects, above all software projects. Historically, MPC hasn’t done a great job
of this, but my goal is to complete the task of turning that around. I’d love
for MPC to be seen as a leader both scientifically &lt;em&gt;and&lt;&#x2F;em&gt; technically, and my
sincere belief is that if you can excel on “both sides of the ball,” you’re
going to be &lt;em&gt;wildly&lt;&#x2F;em&gt; more successful than if you’re only strong on one side. If
you’re someone who feels the same way, please consider joining us!&lt;&#x2F;p&gt;
</description>
    </item><item>
      <title>Fun with Databases</title>
      <pubDate>Wed, 05 Mar 2025 13:35:03 -0500</pubDate>
      <link>https://newton.cx/~peter/2025/databases/</link>
      <guid>https://newton.cx/~peter/2025/databases/</guid>
      <description>&lt;p&gt;Yikes, it’s been hard to keep up a weekly posting schedule lately. I’m hoping to
ramp that back up, though — I’ve just been occupied the past couple of weeks
with some nitty-gritty database performance work.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;A couple of weeks ago we migrated the MPC’s primary production database from
physical hardware to a virtual machine running our “Borg” cluster, which runs
the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;xcp-ng.org&#x2F;&quot;&gt;XCP-ng&lt;&#x2F;a&gt;&#x2F;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;xen-orchestra.com&#x2F;&quot;&gt;Xen Orchestra&lt;&#x2F;a&gt; virtualization stack. This database runs
on &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.postgresql.org&#x2F;&quot;&gt;PostgreSQL&lt;&#x2F;a&gt;, is about 2 terabytes in size, and really constitutes the
beating heart of the MPC: it stores all of our core data assets, such as orbits,
designations, and about 500 million astrometric observations. All of MPC’s
operational systems are querying and modifying the database non-stop, 24&#x2F;7.&lt;&#x2F;p&gt;
&lt;p&gt;As one might imagine, you want to do such a migration carefully. Indeed, there
were several months of planning and rehearsals leading up to the switchover. It
went completely smoothly, thanks in no small part to all of the preparatory
work. We’re now running the latest version of PostgreSQL on a VM backed by a
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.dell.com&#x2F;en-us&#x2F;shop&#x2F;storage-servers-and-networking-for-business&#x2F;sf&#x2F;powervault&quot;&gt;Powervault&lt;&#x2F;a&gt; storage array with automated snapshots, hot migrations between
physical hosts, and all sorts of other nice resilience features.&lt;&#x2F;p&gt;
&lt;p&gt;We did discover, however, that some aspects of the database performance weren’t
living up to what we expected. In particular, there was a lot of variability:
the same query would sometimes run quickly, and other times run orders of
magnitude (plural!) slower. It wasn’t a crippling issue, but definitely
something to address.&lt;&#x2F;p&gt;
&lt;p&gt;And that’s what’s been keeping me busy since then. Pretty early on, we
discovered that the variable performance had something to do with the particular
way in which the VM was connecting to its Powervault storage. You can register
Powervault volumes with the Xen system and then use them as “storage
repositories” for virtual disks, which lets you manage and migrate them all
through the Xen interface. But something about the Xen layer seemed to be
responsible for the uneven performance — because you can also configure a VM to
connect directly to the Powervault using &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;ISCSI&quot;&gt;iSCSI&lt;&#x2F;a&gt;, and doing so gave much more
consistent results. &lt;em&gt;Part&lt;&#x2F;em&gt; of the reason might be that Xen disks currently are
limited to a maximum size of 2 TB, so that we had to use &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Logical_Volume_Manager_(Linux)&quot;&gt;LVM&lt;&#x2F;a&gt; in the database
VM to create a sufficiently large virtual disk. On the other hand, while I’m
sure this didn’t help performance, it’s hard for me to see how it would induce
the variability that we were seeing. (For what it’s worth, that 2 TB limit &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;xcp-ng.org&#x2F;forum&#x2F;topic&#x2F;10308&#x2F;dedicated-thread-removing-the-2tib-limit-with-qcow2-volumes&quot;&gt;is
going away&lt;&#x2F;a&gt;.)&lt;&#x2F;p&gt;
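&lt;p&gt;For the curious, the two storage arrangements boil down to something like the
sketch below. This is illustrative only: the initiator commands, IQN, portal
address, and device names are placeholders, not our actual configuration.&lt;&#x2F;p&gt;

```shell
# (a) The Xen-layer workaround: each virtual disk is capped at 2 TB, so
# several of them get concatenated into one big volume with LVM inside the VM.
pvcreate /dev/xvdb /dev/xvdc /dev/xvdd
vgcreate pgdata /dev/xvdb /dev/xvdc /dev/xvdd
lvcreate -l 100%FREE -n pgvol pgdata

# (b) The direct approach: log the VM's own iSCSI initiator into the array,
# bypassing the Xen storage layer entirely.
iscsiadm -m discovery -t sendtargets -p 192.0.2.10
iscsiadm -m node -T iqn.1988-11.com.dell:powervault.example -p 192.0.2.10 --login
# The LUN then shows up as a single block device (e.g. /dev/sdb), large
# enough that no LVM concatenation is needed on top of it.
```

&lt;p&gt;With (b) the VM talks to the array over its ordinary network stack, which is
exactly the path the direct-mount migration below switches onto.&lt;&#x2F;p&gt;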
&lt;p&gt;Migrating from the Xen + LVM approach to the direct mount would require us to
dump and reload the whole database, so we didn’t want to make that change unless
we were confident that it would help. We spent a while running tests to
understand the performance characteristics of the stack, and then planned out
the changeover once we were convinced that it was worth trying. Yesterday we
made the change, and things have been a lot smoother since.&lt;&#x2F;p&gt;
&lt;p&gt;The performance that we’re getting is still noticeably less than bare metal, but
the benefits of using the virtualized system are extremely real. In particular
I’m really excited that we have the option to “hot migrate” the database from
one physical machine to another, which means that we can do hardware maintenance
with zero downtime of the actual database service. But, if we really need to, we
can sacrifice that and get better performance by adopting a technology called
&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.kernel.org&#x2F;PCI&#x2F;pci-iov-howto.html&quot;&gt;SR-IOV&lt;&#x2F;a&gt;, which basically allows the VM to access the physical host’s network
cards in a lower-level, but still virtualized, way. In our low-level tests
SR-IOV gave a ~50% performance boost for some workloads, which apparently stems
from the fact that iSCSI often uses many small packets that require many
interrupts to service. For the time being, we’d rather keep the ability to hot
migrate, but the SR-IOV option is there in our back pocket if we need it.&lt;&#x2F;p&gt;
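&lt;p&gt;If we ever do pull that option out of the back pocket, the kernel side of
enabling SR-IOV is pleasantly small, per the howto linked above. A minimal
sketch, assuming an SR-IOV-capable NIC named &lt;code&gt;eth0&lt;&#x2F;code&gt; (the interface
name and VF count are placeholders):&lt;&#x2F;p&gt;

```shell
# How many virtual functions (VFs) does the NIC support?
cat /sys/class/net/eth0/device/sriov_totalvfs

# Create four VFs; each appears as its own PCI device that can be passed
# through to a VM, giving the guest near-direct access to the hardware.
echo 4 > /sys/class/net/eth0/device/sriov_numvfs
```

&lt;p&gt;The trade-off is the one described above: a VM holding a VF is pinned to
that physical NIC, which is what breaks hot migration.&lt;&#x2F;p&gt;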
</description>
    </item></channel>
</rss>
