PKGW: All In on Pixi

2024 November 25

I’ve been playing around with the Pixi software environment management tool for a little while now. After incorporating it into a several different projects, I think that I’m ready to declare: this is the way of the future. You should think about adopting it yourself.

Some context might be helpful. Pixi is a tool that creates and manages software environments, most similar to conda or its variants (mamba, micromamba); you can think of it as yet another drop-in replacement for any of these. In particular, Pixi uses exactly the same file formats and web APIs as conda and friends, and so in a certain sense they’re all interchangeable. My claim — always hard to prove — is that despite this, there are some key differences about Pixi’s design that make it a dramatic improvement on the alternatives.

When I use the term “software environments,” I mean a freestanding set of software packages installed somewhere on a computer; different environments are independent and can be activated at the user’s discretion. In the context of the past half-century of computing, it’s worth emphasizing how novel, perhaps even revolutionary, this concept is. People have always installed custom software on computers (that’s kind of the whole reason they exist), but for much of the history of computing, managing those installations has been an awfully ad-hoc affair. Traditionally, if I’m on a Unix machine, I might compile a library that I need and install it into $HOME/lib; then futz with some compiler flags to compile a binary that depends on that library and install it into $HOME/bin; and so on, on a case-by-case basis. If I want to update or uninstall anything, I need to manually recompile code, set up environment variables, rerun build scripts, and all other sorts of finicky things.

Given where I’m going to take things below, it’s worth pointing out explicitly that this difficulty, in turn, made it quite impractical to maintain multiple environments on a single machine. In general, you had what was installed globally, your user-specific customizations, and that was it. If you needed to install some special tool to support a specific workflow, you’d drop it into one of these big buckets. Anything finer-grained would take way too much effort.

The quality of the modern tools has completely changed things. It used to be the problem was getting people to be able to install Python at all; nowadays the problem is that they’ve lost track of all of their different Python installations. This is a real problem, but more and more I’m convinced that it’s a gamechanger to escape the “scarcity mindset” regarding environments. We can truly have as many of them as we want! (This can be hugely inefficient, it’s true, but disk space is cheap, man.)

The core breakthrough, as I see it, was the idea of creating standardized software packages that a generic package manager can then manipulate. But I’d argue that it wasn’t until Conda that we got the first end-to-end system with a really killer combination of features:

Arbitrary roots: You can create any number of software environments, rooted in any directory on your computer.
Language-agnostic: You can reliably install software written in any language.
User-level: No system administrator privileges needed.
Decently cross-platform: Support for the Big Three of Linux, Mac, and Windows.

(For what it’s worth, there’s a key technical innovation that enabled Conda to deliver all this: install-time rewriting of the install prefix and RPATH records in text and binary files, respectively, which makes the “arbitrary roots” feature possible without admin privileges.)

The conda ecosystem certainly has its issues — it has sometimes been frustrating to see these tools rediscover problems that were solved by RPM and dpkg twenty years ago — but I have found this particular combination of capabilities to be very powerful. Above all, the ability to create and throw away environments on-demand opens up all sorts of new work patterns. It’s exactly the same sort of space of opportunities that were created by Docker — which has been utterly transformative — just at a somewhat more user-facing level.

But if the conda ecosystem is so great, why am I saying that Pixi in particular is such an improvement? Because Pixi finally nails two things that the original conda tools didn’t do so well: it’s declarative and reproducible.

In contrast to the declarative-ness of Pixi, conda environments have to be managed imperatively. To create, change, or destroy an environment, you have explicitly signal your intent by running a command — conda create to create an environment, conda install to add a new piece of software into it, and so on.

This is fine as far as it goes, but it’s actually a fairly cumbersome, low-level approach. Pixi’s approach is higher-level: you declare that you want an environment with certain characteristics by putting a pixi.toml file in a directory; and then Pixi will do whatever is needed to ensure that the requested environment is available there.

This might not sound like a major difference, but I believe that it’s hugely important. Imagine that my team has a set of scripts that need to run in a certain software environment (say, ones supporting a research paper in progress), and the environment is evolving along with the scripts as we work. In the imperative model, every user needs to drive that evolution themselves: run the command to create the environment when they're first getting set up; install new packages as needed; delete and recreate the environment if anything goes wrong. It’s really hard to ensure that everyone is actually using the same software.

In the declarative, Pixi world, it’s much simpler: if you have the same pixi.toml, you’ll be running the same code. In essence, you’re telling the computer what you want and letting it figure out how to get there, instead of explicitly taking it there yourself. This is the difference between compiling software by manually running gcc and ld commands, and having a Makefile (or, my preference, a build.ninja) that reliably delivers your final product no matter what your exact starting point is.

You might be about to bring up environment.yml files, which can play a role similar to that of pixi.toml. I will claim, without delving into every detail, that they simply can’t support the declarative paradigm. They can help with the initial creation of an environment, but not with maintenance after it’s created. They also don’t work if you want to declare an environment that can be created on multiple OSes. Finally, they miss the mark on the other virtue of Pixi: maintainable reproducibility.

Pixi makes environments more reproducible using a technique that is now bog-standard: lockfiles. When I write a pixi.toml file, I specify requirements about what software needs to be in my environment, which can often be quite loose: “I need Python 3.10 or newer, but beyond that I don’t care about the precise version.” Whenever Pixi sets up an environment, it resolves my requirements into a specific set of packages to install (say, python-3.12.7-hc5c86c4_0_cpython) and records that result into a companion file, pixi.lock. Future operations will stick with what’s in the lockfile, unless I explicitly specify that I want to reconsider some of those decisions. If I distribute my lockfile, other users can get an environment that isn’t just consistent with my requirements, but that matches mine exactly. The hallmark of the lockfile approach is while a human controls the file that defines the space of allowed solutions (pixi.toml, here), the software tool controls the file capturing the specific solution that was arrived at (pixi.lock), and the tool ensures that the lockfile always reflects the ground truth of the actual installed environment.

Pixi’s developers describe it as being “project-oriented.” To me, this terminology gets at the fact that the state of a Pixi environment is completely compartmentalized on the filesystem — not just the installed environment but the pixi.toml and pixi.lock files that define it all necessarily live within the same directory; and nothing outside of that directory affects them. (Ideally, at least.) I feel that there’s something about this compartmentalization that enables a mental inversion: instead of setting up a software environment, inside of which I run code for various projects, instead I set up various projects, some of which might happen to define software environments that can run their code.

There are other aspects of Pixi that I like but are perhaps less interesting. One worth mentioning is that it’s written in Rust and distributed as a single executable, so installation is easy and reliable. (One of the decade-old mistakes repeated by conda: don’t write a package manager in a scripting language!) This also makes it pretty fast.

That being said, there will be times when Pixi isn’t the right tool to use. In particular, it’s not the kind of thing that would replace pip (or poetry or uv or …) for managing a Python software project, or npm (or pnpm or yarn or …) for JavaScript. Those are, broadly speaking, software-development tools that can manage software environments; Pixi is exclusively focused on the latter.

But I’m pretty sure that I can use Pixi everywhere that I used to use conda and mamba. I still maintain a what you might call a personal-but-global software environment, activated by default in all of my terminals — now it’s just more reproducible and better defined, since I have a pixi.toml defining what goes into that environment. Whereas in the past, I always dreaded upgrading my Python version, because it could easily break conda and wedge everything, now I’m kind of looking forward to the next upgrade — I’ll just update the python = "3.12.*" spec in my pixi.toml file and ask Pixi to re-solve everything. If anything goes wrong, I’ll git restore pixi.toml pixi.lock and I’ll be back where I started. Doing this kind of reversion is super hard in an imperative system, but trivial in a declarative one.

The other change is that now my personal-but-global software environment is getting slimmer, since I can break out rarely-used packages into more purpose-oriented environments. For instance, I use the phenomenal Beancount software to do accounting. (I like Beancount so much that I now think that accounting is fun.) The Beancount ecosystem is almost entirely written in Python, and I used to have all of those tools installed in my global environment. But now with Pixi, it makes way more sense to put a pixi.toml in the same Git repository where I maintain my Beancount data — I’m only going to use the tools in connection with those data files. The Beancount tools aren’t always compatible with the latest-and-greatest versions of things, so it’s genuinely helpful to be able to isolate that environment from my general-purpose one.

I got some good use out of conda environments before Pixi, but I was always a little dissatisfied with the tools that conda provided for working with them — or the lack thereof. Pixi solves essentially all of the problems that were bothering me: the reliable, declarative behavior takes conda’s support for environments from a sketch to a fully-realized, new tool in my toolbox. I can imagine how it will be especially helpful for collaborative projects, but even for purely individual efforts, I have a feeling that it will unlock powerful new work patterns. Kudos to the Pixi team!

Questions or comments? For better or worse this website isn’t interactive, so send me an email or, uh, Toot me.

To get notified of new posts, try subscribing to my lightweight newsletter or my RSS/Atom feed. No thirsty influencering — you get alerts about what I’m writing; I get warm fuzzies from knowing that someone’s reading!

Later: DASCH Data Services Now Expose Colorterms

Earlier: Preview Access to DASCH DR7 Now Available

See a list of all posts.

View the revision history of this page.