The Linux development process: Is it worth the hassle?

Linux has been around for almost three decades now. In its early days, contributions were managed manually by Linus himself, without any version control system whatsoever. Today, they are managed with git.

Throughout all of that time, however, one thing has been constant: code is sent to one (or many) mailing lists, where it is reviewed and debated until it is deemed ready for inclusion.

But despite its success, this process has been coming under fire. An article by Microsoft’s Sarah Novotny recently made quite the splash on social media by claiming that the collaboration tools used by Linux are outdated and should be replaced if the community is to attract new blood, with defenders and detractors of the process clashing over it.

I believe I am in a privileged position to offer some perspective. For almost a decade, I have written code for Linux and other projects with a similar workflow. During my tenure at Red Hat, I contributed code to the core x86 infrastructure, to the KVM hypervisor and QEMU, to the Xen hypervisor, and others. Although I haven’t touched Linux much in some seven years, that is only because I devoted my time to the Seastar C++ framework and the ScyllaDB database, both developed with a very similar methodology. Now I work as a Staff Engineer at Datadog, a company whose process is pretty much the opposite, and much more in line with what other web companies do.

So where do I stand? Let me first state unequivocally: I do not like the Linux development process. I fully believe it is not only a barrier to entry but also a barrier to sustained productivity (although not because of e-mail) and a source of frustration. I have no intention of following it in any project where I have the sole power to decide how things are done.

But at the same time, many critics of the process seem to believe that its defenders cling so hard to it only because Linux is full of old, gatekeeping boomers who are averse to change. That is not the case (although I am sure some, individually, are). The development process followed by Linux offers some unique and important advantages that would benefit any other organization.

Any tooling other than e-mail is opinionated enough to force Linux to give up those benefits, and it is to those benefits, not to e-mail, that people cling. Tooling that lowers the barrier to entry and fixes the frustrating aspects of the process, while still allowing organizations to realize the benefits that Linux enjoys, would truly advance the state of software development.

There are many such advantages, but in the interest of time I will focus on one of them, which I consider the most important. I will do my best to explain what it is, why it is so frustrating despite its advantages, and why, although it would benefit other organizations, it is simply crucial for Linux.

Commit messages and the patch

There is a rule in Linux that a code change sent for inclusion needs to be broken into individual patches. Each of them must do one thing and one thing only, and each of them should have its own descriptive commit message. It is not uncommon for commit messages to be much longer than the code change itself.


This is a prime example of something that organizations at large are missing. Most commit messages I see around GitHub in modern projects are things like “checkpoint for Aug 25th” or the slightly (but only slightly) better “implement function X”. If someone else needs to look at this code later, they will have a hard time understanding why the change was done the way it was. Some bugs are really subtle, and can easily resurface. By looking at a short, non-descriptive commit message, I would not necessarily know what conditions were present when the bug was discovered.

As a quick example, see this Linux commit from my great friend Johannes Weiner, which I could easily imagine some other project summarizing as “delete warnings”. By reading it, I know why it is safe to delete the warnings, in which circumstances that is indeed safe, and what guarantees I should preserve if I change this code in the future.

I am sure that many organizations have individuals who do this. But the Linux process enforces it, so I can be absolutely sure that by reading the commit message I will understand everything there is to understand about the change. If we are talking about a bug, I will know in which systems it was present, under which conditions it happened, why it hasn’t affected more systems, and what I should do to avoid making the same mistake again.

This is desirable for any organization: it is easier for other people (including future you) to understand why changes were made and why the code behaves the way it does. That shortens the ramp-up time of new engineers, prevents bugs from resurfacing, and reduces the danger of unrelated code sneaking in and breaking things.

But it is crucial for Linux for two reasons:

  1. The sheer number of people from different backgrounds and different companies, with different motivations and agendas. Massive projects happening within a company’s walls can use other mechanisms to spread information and guarantee accountability. And among Open Source projects, few (if any) are as big, long-lived, and touched-by-many as Linux.
  2. Backports: given its size and importance, Linux is in a constant state of forking. Even now, in 2020, distributions may add their own fixes on top of a version they deem LTS. And if this happens less frequently now than in the early 2000s, when a non-negligible part of Red Hat’s competitive moat was the stabilization fixes it brought to its kernel, that is only because Linux itself started having its own LTS series that distributions can draw from.

Backports are usually not a problem for modern online companies that do not have to maintain parallel product lines. They ship, and that’s it. But when backports are involved, things get more complex. A developer (likely not the original author) may have to decide how to slightly adapt that code to a somewhat different, older codebase. And the decision that minimizes risk can be, and often is, to backport only certain parts of a large change. Imagine a 2,000-line code change that has a 5-line bugfix embedded in it. Also imagine that the bugfix may have happened after a refactor of the API. Would you rather backport from a massive changeset, or from a well-documented, well-described, and well-split patchset? As someone who has done countless backports, I know where I stand.

Backports or no backports, organizing and caring for the individual changes brings real benefits, but they come at a steep cost. Not only does the programmer now have to worry about the code, but also about how to reorganize and reshuffle it.

Some of that reorganization is easy: you can use git add -p and select which parts go into each change. It gets a bit more complex once circular dependencies between pieces of code show up. Imagine a function that returns an object of a type that will only be introduced later on. You end up having to add code that will not make it into the final project and exists solely to act as temporary glue.

All of that is frustrating, but not insurmountably so. Let’s say you beautifully split all your work into easy-to-consume pieces. The real problem starts once people review the code. In any organization, code review is a thing: people read code and suggest (or demand) changes.

Let’s say that the change requested was that a method I added in the first change should have an extra parameter. And let’s also say that I happen to use that method in all subsequent patches.

Now I am forced to go back into the first patch and add the parameter, which will make all subsequent patches fail to apply. Not only do I now have to apply the cognitive effort to figure out exactly why, but I also have to fix all the failures manually. If I had tested the individual patches before, that testing is now invalid and I have to test them again.
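To make that failure concrete, here is a toy model in Python. It is a hypothetical simplification, not how git or patch actually apply hunks: each hunk anchors itself on one exact line of context, so once the reviewer-requested parameter changes the signature introduced by the first patch, the second patch can no longer find its place. The function and variable names are made up for illustration.

```python
def apply_hunk(lines, anchor, added):
    """Insert `added` right after the line equal to `anchor`.

    Hypothetical toy model: like a line-based diff hunk, it finds its place
    by matching context text exactly, and is rejected if that text changed.
    """
    if anchor not in lines:
        raise ValueError(f"hunk rejected: context {anchor!r} not found")
    i = lines.index(anchor) + 1
    return lines[:i] + added + lines[i:]


base = ["# bar.py"]

# Patch 1 as originally posted: introduce create_bar(x).
patch1_v1 = ("# bar.py", ["def create_bar(x):", "    return x"])
# Patch 1 after review: the reviewer asked for an extra parameter y.
patch1_v2 = ("# bar.py", ["def create_bar(x, y):", "    return x + y"])
# Patch 2 was written on top of patch 1 v1, so its context is the old signature.
patch2 = ("def create_bar(x):", ["    assert x >= 0"])

# Original series: both patches apply cleanly.
tree_v1 = apply_hunk(apply_hunk(base, *patch1_v1), *patch2)

# Revised series: patch 1 applies, but patch 2 no longer finds its context.
tree_v2 = apply_hunk(base, *patch1_v2)
try:
    apply_hunk(tree_v2, *patch2)
except ValueError as err:
    print(err)  # hunk rejected: context 'def create_bar(x):' not found
```

Real tools are more forgiving than this toy (GNU patch, for instance, can apply a hunk at an offset or with reduced context), but when the very lines a later patch depends on were rewritten, they fail in essentially the same way, and a human has to repair every patch in the series.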

Reorganizing work is a little bit of a problem. But rebasing existing work is a really big problem.

What I wish the Linux community and friends would understand: this is obviously all doable. But if this is not a barrier to entry, I don’t know what is. Spending time, effort, and both mental and computer cycles on reorganizing, rewriting, and reworking is not something people want to do. I also find the arguments that sometimes surface, such as “… but good programmers will have no problems with that” or “but it forces you to think in this or that way, which is how good programmers ought to think”, disingenuous and unhelpful. My God, I have just acknowledged all the benefits that this method has, and I still find all of this code reorganization absolutely soul-crushing and excruciating. A good analogy is cleaning your home: one can sing the praises of keeping a house clean (I do) and be perfectly capable of vacuuming it (I am), yet often not do it, for the simple reason that there are other things one deems more important. Which is why I am so happy with my Roomba, which lets me realize all the benefits of a clean home without having to do the cleaning myself. Which brings me to…

But there are also things I wish outsiders to Linux would understand: the process that Linux follows has tangible advantages, and no tool is fully up to the task of supporting it. GitHub, for example, works very well with a workflow where new code is always added on top of existing code. It is possible to force-push a branch, but comments that were attached to a commit then get orphaned and the discussion appears nonsensical.

Modern development tools make a lot of things easier: you can trigger actions, integrate CI/CD pipelines, notify people of changes, etc. But they objectively make the process of splitting one’s work into easy-to-consume pieces harder. Plain-text e-mails make many things harder, but they also don’t stand in the way of enforcing a process that has desirable outcomes.

Even if it were possible to say objectively and accurately how much Linux would gain by abandoning this process versus what it would lose (it isn’t), at this point it is perfectly human and reasonable to want to keep the aspects of a process that has been working so well.

Is there a solution?

I sincerely believe that if we had tools that allowed an organization to realize the same benefits that Linux gets from its process, that would be a huge win for everybody. Given such tools, maybe even Linux could move away from plain-text e-mails.

I don’t have the answer to what such a tool would look like. But maybe I can risk some musings:

  1. Git is a source control system, and source control systems naturally want to append to history, not rewrite it. However, that gets conflated with the development process, both on GitHub, where development and review are centered around git commits, and for plain-text Linux developers, who develop in their own local git trees, constantly rewriting history. Perhaps we need to break this in two, and allow development and review to happen in separate tools that are more ephemeral in nature and make it easier to move code around. Git would store the results. One good analogy for this is how CSS allowed HTML developers to dissociate presentation from content. Remember how HTML used to be before CSS? I, for one, am old enough…
  2. As an extension of the above, perhaps patches with line-by-line diffs are making everything harder. Could we have a system in which I could describe at a higher level what changes I am making to the code, and deterministically apply those changes somewhere else? For example, I could say “Move function create_foo() before create_bar()” or “add an integer parameter called y as the last parameter of create_bar()”. Even if subsequent changes add things around that code that would break a line-by-line diff, such a system would still be able to apply the changes to slightly different versions of the code base. Perhaps I am too naive and this is just impossible, but seeing the mind-boggling advances that GPT-3 is making makes me think this may become possible sooner rather than later.
  3. Or, in a less ambitious take, perhaps there is an intermediate solution where the code review process happens by always appending code. When all parties are happy, then, and only then, history is rewritten. A much simpler tool could then help the maintainer verify that the changes made are purely a reorganization, by ensuring that there is no diff against the code that was approved.
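The second musing, describing a change by what it means rather than by which lines it touches, can be loosely illustrated with Python’s standard ast module (ast.unparse requires Python 3.9 or later). This is only a sketch under strong assumptions: the AddParameter transformer is hypothetical, it works on Python rather than C, it assumes the target function has no default or keyword-only parameters, and it does not update callers. It does show the key property: the edit finds create_bar() by name, so it applies to two versions of a module whose line layout differs.

```python
import ast


class AddParameter(ast.NodeTransformer):
    """Hypothetical semantic edit: append an annotated parameter to a function.

    Targets every function with the given name, wherever it sits in the file.
    Assumes the function has no default or keyword-only parameters, and does
    not touch call sites.
    """

    def __init__(self, func_name, param_name, annotation):
        self.func_name = func_name
        self.param_name = param_name
        self.annotation = annotation

    def visit_FunctionDef(self, node):
        if node.name == self.func_name:
            node.args.args.append(
                ast.arg(arg=self.param_name,
                        annotation=ast.Name(id=self.annotation, ctx=ast.Load())))
        self.generic_visit(node)
        return node


def apply_semantic_edit(source, edit):
    """Parse, transform, and regenerate source (comments and layout are lost)."""
    tree = edit.visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# Two versions of a hypothetical module: in the newer one, create_bar() moved
# and grew an extra line, so a line-based hunk written against OLD would not
# match NEW.
OLD = """
def create_foo():
    return 1

def create_bar(x):
    return x
"""

NEW = """
def create_bar(x):
    print("creating bar")
    return x

def create_foo():
    return 2
"""

edit = AddParameter("create_bar", "y", "int")
old_result = apply_semantic_edit(OLD, edit)
new_result = apply_semantic_edit(NEW, edit)
print(old_result)
print(new_result)
```

A real tool would need to preserve formatting and comments and understand the semantics of C, which is far harder; the sketch only shows that anchoring an edit on program structure, instead of on surrounding text, survives unrelated changes nearby.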
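The check in the third musing is simple enough to sketch. In git terms it is roughly what git diff --quiet between the tip of the approved review branch and the tip of the rewritten series reports through its exit status. The Python below models it with plain dictionaries; the names (apply_series, is_pure_reorganization, and so on) are hypothetical, and trees and patches are simplified stand-ins for git objects.

```python
from typing import Dict, List, Optional

# A tree maps file paths to contents; a patch maps paths to new contents,
# with None meaning "delete this file". Both are simplifications of git objects.
Tree = Dict[str, str]
Patch = Dict[str, Optional[str]]


def apply_series(base: Tree, series: List[Patch]) -> Tree:
    """Apply each patch in order and return the resulting tree."""
    tree = dict(base)
    for patch in series:
        for path, content in patch.items():
            if content is None:
                tree.pop(path, None)
            else:
                tree[path] = content
    return tree


def differing_paths(a: Tree, b: Tree) -> List[str]:
    """Paths whose contents differ (or exist in only one tree), sorted."""
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))


def is_pure_reorganization(base: Tree, approved: List[Patch],
                           rewritten: List[Patch]) -> bool:
    """True if the rewritten series ends at exactly the approved code."""
    final_approved = apply_series(base, approved)
    final_rewritten = apply_series(base, rewritten)
    return not differing_paths(final_approved, final_rewritten)


base = {"README": "demo\n"}

# Review history, only ever appended to: the original patch plus two fixups.
approved = [
    {"bar.c": "int create_bar(int x);\n",
     "foo.c": "int create_foo(void) { return create_bar(1); }\n"},
    {"bar.c": "int create_bar(int x, int y);\n"},
    {"foo.c": "int create_foo(void) { return create_bar(1, 2); }\n"},
]

# The same work rewritten into a clean, reviewable series after approval.
rewritten = [
    {"bar.c": "int create_bar(int x, int y);\n"},
    {"foo.c": "int create_foo(void) { return create_bar(1, 2); }\n"},
]

# A rewrite that quietly changes behavior while "reorganizing".
sneaky = rewritten[:1] + [{"foo.c": "int create_foo(void) { return create_bar(1, 3); }\n"}]

print(is_pure_reorganization(base, approved, rewritten))  # True
print(is_pure_reorganization(base, approved, sneaky))     # False
```

Comparing only the final trees is the point of this design: reviewers approve the end result once, and the maintainer gets a cheap mechanical guarantee that the history rewrite did not change it. It does not check that each intermediate patch builds or passes tests; that would still need per-patch CI.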



Veteran infrastructure engineer with decades of experience in low-level systems. Previously Linux Kernel and ScyllaDB. Now at Datadog.