Zack's Kernel News

Article from Issue 258/2022
Author(s): Zack Brown

This month in Kernel News: Git Merge "Simplification" Advice; Loading Modules from Containers; Git Tree Synchronicity; and The New "No New Warnings" Warning.

Git Merge "Simplification" Advice

Bjorn Helgaas submitted some PCI patches in the form of a merge request from another Git tree. This is a standard part of the development process for larger distributed projects like the Linux kernel, and this one included work from dozens of contributors. The idea is that a bunch of people work on a given sub-project in relative isolation so their changes don't break everyone else's work on the main Linux tree. Then, with the merge request, the contributors ask Linus Torvalds to resolve any conflicts that their changes might have produced with other changes going into the kernel at the same time. No biggie, nothing to see here. Tens of thousands of contributors can get their hands dirty at the same time, without throwing dirt onto any of their fellow contributors' hands while they're at it.

In this case, Linus noticed some wonky twirling going on behind the scenes, and it posed a problem for him. Specifically, Bjorn and his fellow PCI travelers had already done some merging from multiple separate trees (used for different sub-sub-projects within their sub-project), followed by a patch reversion, so that all merges going into Linus's official repository would seem to come from the same tree. It's not psychotic; they were just trying to keep things simple.

So first of all, Linus objected to the patch reversion itself. Patch reversions remove a patch that was previously accepted into a tree, but a reversion is itself a patch that also needs to be accepted via the same process as other patches – including having a meaningful commit message, which the PCI patch reversion did not. However, it's a relatively common occurrence for patch reversions to have no meaningful commit message – developers don't tend to see the point of it because all the patch reversion does is take something out that had recently been put in.

In this particular case, Linus pointed out that the purpose of the reversion was not that the patch itself had been bad – it was specifically in order to reduce potential conflicts when Linus would eventually merge the PCI trees into his own.

There were two problems with this. First, the normal conclusion to draw about a patch reversion is that the patch was bad. In such a case, if the same patch can be seen coming from another tree (which was the case here – the PCI folks reverted the patch so it could come back again from somewhere else), the natural conclusion to draw, said Linus, was that he should avoid merging the patch from that other tree as well. If the identical patch was bad once, it would be bad again, right? So Linus had to figure out that the patch had actually been intended to go into the tree – which he did, but it meant more work.

The second problem was that the reversion and behind-the-scenes merges had been done to reduce the conflicts Linus would see against his own tree. Linus declared it "pure and utter garbage, because I end up with the merge conflict *ANYWAY* due to the other changes, and now instead of going 'ok, the PCI tree had that same commit, all good', I have to go 'ok, so the PCI tree had the same commit, but it was reverted in the networking tree, so now I have both sides making different changes and a very confusing merge'."

Linus compared this to a similar situation where developers "rebase" their tree as a different way to avoid merge conflicts. In Git, sub-projects will pull the latest version of the main project tree into their own to act as a "base" and do all their work against that version. But by the time they're ready to submit the code upstream to Linus, his kernel has already advanced. So the sub-project may choose to replay their own patches on top of his latest tree before submitting them, thus changing their base version to match the main official tree. Then they can resolve the conflicts themselves and send a conflict-free merge request to Linus.

The problem with rebasing is that it can utterly destroy the make-good-sense-ness of each individual patch in that merge, because now the code differences are not against the version the developers thought they were working on but are against other versions that the developers never even looked at. The end result is the same code, but the history of those developers' changes has been garbled and reduced to a fine slurry.
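To make the point about lost history concrete, here is a toy model (not how Git actually computes commit IDs, which also cover the tree, author, message, and more). It assumes a commit's identity is just a hash of its parent and its patch text; the function names and the tag-like base names are invented for illustration. Replaying the same patches on a different base produces an entirely new chain of commits, so nothing the developers tested still exists afterwards.

```python
import hashlib


def commit(parent: str, patch: str) -> str:
    """Toy commit ID: a hash of the parent ID and the patch text."""
    return hashlib.sha1(f"{parent}\n{patch}".encode()).hexdigest()[:12]


def build(base: str, patches: list[str]) -> list[str]:
    """Apply patches in order on top of base and return the new commit IDs."""
    ids, parent = [], base
    for patch in patches:
        parent = commit(parent, patch)
        ids.append(parent)
    return ids


def rebase(new_base: str, patches: list[str]) -> list[str]:
    """Replay the same patches on a different base (toy version of a rebase)."""
    return build(new_base, patches)


patches = ["fix probe order", "add quirk table"]   # made-up patch names
original = build("old-base", patches)               # what the developers tested
rebased = rebase("new-base", patches)               # what would be submitted
shared = set(original) & set(rebased)               # commits common to both
```

In this model `shared` is empty: the patch text is identical, but every commit in the submitted history is one that nobody has ever built or run.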

Linus wants clarity in all kernel patches. Among other things, it helps with debugging, when developers may need to identify an earlier patch to revert. If the bogus patch itself makes as much sense as possible, it's easier to see where it messed things up.

Linus summed up his official policy as, "Don't make my life 'easier' by doing stupid things, and DO put a reason for every single commit you do. Reverts aren't 'oh, I'm just turning back the clock, so no reason to say anything else'."
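A sub-project could catch unexplained reverts before they ever reach Linus. The following is a minimal sketch of such a check, not anything proposed in the thread: it treats the two lines that git revert writes by default, plus trailers such as Signed-off-by, as saying nothing, and asks for a few words beyond that. The function name, the word threshold, and the sample commit are all assumptions made for illustration.

```python
import re

# Lines that `git revert` generates by itself and that explain nothing.
_DEFAULT_LINES = (
    re.compile(r'^Revert ".*"$'),
    re.compile(r"^This reverts commit [0-9a-f]{7,40}\.$"),
)
# Trailers such as "Signed-off-by: ..." are not an explanation either.
_TRAILER = re.compile(r"^[A-Za-z-]+-by: .+$")


def revert_has_reason(message: str, min_words: int = 5) -> bool:
    """Heuristic: does a revert commit message say why it was done?"""
    words = 0
    for line in message.splitlines():
        line = line.strip()
        if not line:
            continue
        if any(p.match(line) for p in _DEFAULT_LINES) or _TRAILER.match(line):
            continue
        words += len(line.split())
    return words >= min_words


# A made-up revert with only the default text and a trailer.
bare = (
    'Revert "PCI: example change"\n\n'
    "This reverts commit 0123456789abcdef0123456789abcdef01234567.\n\n"
    "Signed-off-by: A Developer <dev@example.com>\n"
)
# The same revert with one sentence of explanation added.
explained = bare + "\nReverted here so the change can arrive via another tree.\n"
```

Here `revert_has_reason(bare)` is false and `revert_has_reason(explained)` is true. A word count is a crude stand-in for "meaningful," but it is enough to flag the case Linus complained about.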

Loading Modules from Containers

Thomas Weißschuh wanted to make life easier. He said, "We are using nested, privileged containers which are loading kernel modules. Currently we have to always pass around the contents of /lib/modules from the root namespace which contains the modules." He posted a Request for Comments (RFC) suggesting a new request_module() system call, which would allow his containers to get modules from the root system without having to do so much bookkeeping.

The whole point of containers, though, is that they are supposed to resemble a separate running system as much as possible. This is the principle behind Google and Amazon offering cloud services that provide computer "instances" to users. Those instances are nothing more than Linux containers running on top of other systems.

Greg Kroah-Hartman replied to Thomas, saying, "So you want any container to have the ability to 'bust through' the containers and load a module from the 'root' of the system? That feels dangerous." He went on, "why are modules somehow 'special' here, they are just a resource that has to be allowed (or not) to be accessed by a container like anything else on a filesystem."

Thomas said he wasn't trying to dissolve the barriers between the host and virtual system entirely – he wanted to use the CAP_SYS_MODULE capability to give a container the right to load modules in this way. Kernel capabilities were introduced in Linux kernel version 2.2, to divvy up abilities that used to be under the one umbrella of the "root" user. Before then, the root user could do anything and revel in the glory and the blood. After version 2.2, the root user was chastened and humbled and had to check the list of allowed capabilities before doing just any old thing.
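For readers who want to see what "checking the list" looks like from userspace, here is a small sketch. On Linux, /proc/self/status exposes a process's effective capabilities as a hexadecimal bitmask in the CapEff field, and CAP_SYS_MODULE is bit 16 in <linux/capability.h>. The helper names are made up for this example, and the sample masks are illustrative values rather than output from any system in the discussion.

```python
CAP_SYS_MODULE = 16  # bit number from <linux/capability.h>


def has_capability(cap_eff_hex: str, cap: int) -> bool:
    """Test one bit in a hex capability mask such as the CapEff field."""
    return bool((int(cap_eff_hex, 16) >> cap) & 1)


def effective_caps(status_path: str = "/proc/self/status") -> str:
    """Return the CapEff mask of the current process (Linux only)."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff not found in " + status_path)


# Illustrative masks: one with only bit 16 set, one with that bit clear.
can_load = has_capability("0000000000010000", CAP_SYS_MODULE)
cannot_load = has_capability("00000000a80425fb", CAP_SYS_MODULE)
```

On a Linux system, `has_capability(effective_caps(), CAP_SYS_MODULE)` reports whether the calling process is allowed to load modules at all.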

The thing about modules, Thomas said, was that they needed to match the running kernel on the host system in order to load. And this was something only the root namespace could access. And he said, "the biggest problems would probably arise if the root namespace has non-standard modules available which the container would normally not have access to."

Meanwhile Andy Lutomirski said, regarding Thomas's initial post, "I feel like I'm missing something, and I don't understand the purpose of this syscall. Wouldn't the right solution be for the container to have a stub module loader (maybe doable with a special /sbin/modprobe or maybe a kernel patch would be needed, depending on the exact use case) and have the stub call out to the container manager to request the module? The container manager would check its security policy and load the module or not load it as appropriate."

Christian Brauner agreed with Andy's assessment that the container manager should be the gatekeeper for loading modules into containers.

Andy asked Thomas, "What exactly is the container doing that causes the container's copy of modprobe to be called?" To which Thomas replied, "The container is running an instance of the docker daemon in swarm mode. That needs the 'ip_vs' module (amongst others) and explicitly tries to load it via modprobe." Swarm mode is a Docker virtualization feature that just means you're starting up and managing a bunch of virtual systems. And IP Virtual Server (ip_vs) is a load balancer for distributing network requests amid the "swarm."

To which Andy replied, "Do you mean it literally invokes /sbin/modprobe? If so, hooking this at /sbin/modprobe and calling out to the container manager seems like a decent solution." In other words, the container could invoke modprobe as if nothing special was going on, but modprobe would "hook" or "catch" or "get weird with" that call and divert it to the container manager, which would then do the right thing with the request, thus protecting the border between the container and the host system.
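A stub along the lines Andy describes might look like the sketch below. Nothing here comes from the thread beyond the general idea: the socket path, the one-line JSON request, and the "ok" reply are all invented for illustration, and a real replacement for /sbin/modprobe would also have to cope with modprobe's command-line options.

```python
import json
import os
import socket


def request_module(sock: socket.socket, name: str, release: str) -> bool:
    """Ask the manager to load `name` for kernel `release`; True if granted."""
    request = json.dumps({"module": name, "kernel": release}) + "\n"
    sock.sendall(request.encode())
    with sock.makefile("r") as reply_file:
        reply = reply_file.readline().strip()
    return reply == "ok"  # hypothetical protocol: anything else is a refusal


def main(argv: list[str], socket_path: str = "/run/container-manager.sock") -> int:
    """Stand-in for /sbin/modprobe inside the container (hypothetical path)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        granted = request_module(sock, argv[1], os.uname().release)
    return 0 if granted else 1
```

The container never touches the host's /lib/modules in this arrangement. It only names what it wants, and the decision stays on the other side of the socket.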

Thomas liked this idea and said he'd see if it would work for his project.

Meanwhile Luis Chamberlain's ears pricked up at this interesting use case, and he suggested that someone write the kind of documentation that could have helped Thomas and that would help others handle similar situations in the future.

Andy replied enthusiastically, "If someone wants to make this classy, we should probably have the container counterpart of a standardized paravirt interface. There should be a way for a container to, in a runtime-agnostic way, issue requests to its manager, and requesting a module by (name, Linux kernel version for which that name makes sense) seems like an excellent use of such an interface."

And suddenly there was an implementation discussion going on.

Apparently Thomas's original problem is something a bunch of people have been duct taping in various ways for quite some time, and a real solution would be welcome. Christian rattled off several ways users had loaded modules from the host system into containers. And he asked Andy, "So what was your idea: would it be like a device file that could be exposed to the container where it writes requests to the container manager? What would be the advantage to just standardizing a socket protocol?"

To which Andy said, "My idea is standardizing *something*. I think it would be nice if, for example, distros could ship a /sbin/modprobe that would do the right thing inside any compliant container runtime as well as when running outside a container."

And Christian replied with this suggestion:

"I think we never want to trust the container's modules.

"What probably should be happening is that the manager exposes a list of modules the container can request in some form. We have precedence for doing something like this.

"So now modprobe and similar tools can be made aware that if they are in a container they should request that module from the container manager be it via a socket request or something else."

Andy asked, "Why bother with a list? I think it should be sufficient for the container to ask for a module and either get it or not get it." And Christian clarified, "I just meant that the programs in the container can see the modules available on the host. [...] But yeah, it can likely be as simple as allowing it to ask for a module and not bother telling it about what is available."
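The manager's side of that "get it or not get it" exchange could be as small as the following sketch. It assumes a request carries the (name, kernel version) pair Andy mentioned earlier; the class name, its fields, and the sample values are hypothetical, and the step where the manager actually loads the module on the host is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModulePolicy:
    """Hypothetical per-container policy held by the container manager."""
    host_release: str          # kernel release the host is running
    allowed: frozenset         # module names this container may request

    def decide(self, module: str, kernel: str) -> bool:
        """Grant a request only for the running kernel and an allowed name."""
        # A module name only makes sense for the kernel it was built for.
        if kernel != self.host_release:
            return False
        return module in self.allowed


# Illustrative policy: this container may ask for ip_vs and nothing else.
policy = ModulePolicy(host_release="5.15.0", allowed=frozenset({"ip_vs"}))
granted = policy.decide("ip_vs", "5.15.0")
refused = policy.decide("nfs", "5.15.0")
```

The container learns nothing about what else is installed on the host: a refused request looks the same whether the module is forbidden or simply absent.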

Thinking further about this, Andy remarked, "If the container gets to see host modules, interesting races when containers are migrated CRIU-style will result." CRIU is a tool for completely stopping a running container and storing it to disk. The container can then be started up again at any time, from exactly the moment it stopped. So Andy was apparently saying that if a container saw a list of available modules and then was frozen with CRIU, migrated to a new running system, and then started up again, it might then request a module on that list that was no longer available on its new host system.

The conversation ended there, but it's clear that the ability for containers to load modules via the host system will be cleaned up and regularized at some point in the not-too-distant future. To me, it's exciting that Thomas ran into this thorny problem, tried to solve it in one way, and almost got the smackdown, but then other people joined in to find a more general solution that would work for all cases and make many lives easier.

Git Tree Synchronicity

Konstantin Komarov from Paragon Software has been maintaining the NTFS3 code for a little while and is ironing out the developer process for patch submissions, merge and pull requests, and whatnot. Konstantin asked Linus Torvalds, "Right now my github repo [is] still based on 5.14-rc7. Do I need to update it with git merge up to 5.15-rcX? Or will it be ok to send git pull request as is and back merge master only when 5.15 will release?"

Linus replied:

"Oh, keep your previous base, and just send me a pull request with your changes and no merge.

"In fact, the best workflow is generally to avoid merging from me as much as humanly possible, but then if you notice that we're all in sync, and you have nothing pending in your tree, you can basically fast-forward and start any new development at some new point.

"But even then, it's a good idea to make that new point be something well-defined – like a full release, or at least an rc release (usually avoiding rc1 is good, since rc1 can be a bit experimental).

"But I have no problems pulling from a git tree that is based on older kernels. I much prefer tha[t] to having people rebase their work overly aggressively, or having people do lots of merges of my tree.

"At some point, if you end up being multiple releases behind, it ends up being inconvenient for both of us just because some infrastructure has changed, so _occasionally_ syncing up is just a good idea.

"In my experience, people tend to do it too much, rather than too little. Don't worry too much about it."
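The "we're all in sync" condition Linus mentions has a precise meaning: a tree can fast-forward to a newer point when its current tip is an ancestor of that point, which is the test that git merge-base --is-ancestor performs. The sketch below models it on a hand-written parent map; the commit names are invented and do not correspond to any real history.

```python
def is_ancestor(parents: dict[str, list[str]], ancestor: str, tip: str) -> bool:
    """True if `ancestor` is reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return False


def can_fast_forward(parents: dict[str, list[str]], local: str, upstream: str) -> bool:
    """A fast-forward needs no merge: upstream already contains the local tip."""
    return is_ancestor(parents, local, upstream)


# Made-up history: n1 and n2 were developed on an old base and later pulled
# upstream via merge commit m; n3 is newer work that has not been pulled yet.
history = {
    "old-base": [],
    "n1": ["old-base"],
    "n2": ["n1"],
    "m": ["old-base", "n2"],
    "release": ["m"],
    "n3": ["n2"],
}
in_sync = can_fast_forward(history, "n2", "release")
pending = can_fast_forward(history, "n3", "release")
```

With the tip at n2, everything has been pulled and the tree can simply jump to the release. With n3 still pending, it cannot, and the advice above is to send the pull request from the old base rather than merge or rebase.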

And that was that.
