Zack's Kernel News

Article from Issue 257/2022

In kernel news: The "Filesystem" System; Maintaining GitHub Kernel Forks; and Going In or Going Out?

The "Filesystem" System

Dov Murik from IBM posted some confidential computing (coco) patches. This is gussied-up marketing speak for "sandbox" (i.e., an isolated set of processes that implement security protections for whatever is going on inside). But I'm not here to talk about coco.

The interesting thing was that Dov's patches used SecurityFS to provide the interface between the user and the secure area, which turned out to be controversial. SecurityFS came along about 15 years ago in the 2.6 timeframe in response to a frightening proliferation of homespun filesystems. The idea was for security modules to use the same SecurityFS application programming interface (API) and just do whatever insanity was boiling their brains in the back end. SecurityFS simply provided a consistent gateway so everyone's various concepts could be straightforwardly navigated by everyone else.

Greg Kroah-Hartman replied to Dov:

"Why are you using securityfs for this?

"securityfs is for LSMs [Linux Security Modules] to use. If you want your own filesystem to play around with stuff like this, great, write your own, it's only 200 lines or less these days. We used to do it all the time until people realized they should just use sysfs for driver stuff.

"But this isn't a driver, so sure, add your own virtual filesystem, mount it somewhere and away you go, no messing around with securityfs, right?"

But James Bottomley (also from IBM) took exception to Greg's statement. He replied, "we use it for non LSM security purposes as well, like for the TPM BIOS log and for IMA. What makes you think we should start restricting securityfs to LSMs only? That's not been the policy up to now." He added, "I really think things with a security purpose should use securityfs so people know where to look for them."

Greg replied that using SecurityFS for LSMs "was the original intent of the filesystem when it was created, but I guess it's really up to the LSM maintainers now what they want it for." But he suggested, "Why not just make a cocofs if those people want a filesystem interface? It's 200 lines or so these days, if not less, and that way you only mount what you actually need for the system. Why force this into securityfs if it doesn't have to be?"

James pointed out that coco was only the first of potentially many uses of the particular feature they were implementing – transferring secret data into a virtual machine without the virtual machine itself being able to read it. And he said, "It's not being forced. Secrets transfer is a security function in the same way the bios log is."

Greg was very dubious and was surprised to hear that the BIOS log used SecurityFS. James confirmed, "Yes. It's under /sys/kernel/security/tpm0/ All the ima policy control and its log is under /sys/kernel/security/ima/ that's why I think declaring securityfs as being for anything security related is already our de facto (if not de jure) policy."
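To see what James is describing on a given machine, a minimal Python sketch can list whatever currently sits at the top of SecurityFS. It assumes the conventional /sys/kernel/security mount point quoted in the thread; the function name list_security_entries is invented for this example, and reading the directory may require elevated privileges on some systems.

```python
import os

# Conventional securityfs mount point, as quoted in the thread.
SECURITYFS_ROOT = "/sys/kernel/security"


def list_security_entries(root=SECURITYFS_ROOT):
    """Return the sorted top-level entry names under `root`.

    Returns an empty list when the directory is missing or unreadable,
    for example when securityfs is not mounted or the caller lacks
    permission. (Hypothetical helper, not part of any kernel tooling.)
    """
    try:
        return sorted(os.listdir(root))
    except OSError:
        return []
```

Calling list_security_entries() with no arguments inspects the live system; passing another directory makes the function easy to try out without touching /sys at all.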

And James also added, "I know Al [Viro] likes this business of loads of separate filesystems, but personally I'm not in favour. For every one you do, you not only have to document it all, you also have to find a preferred mount point that the distributions can agree on and also have them agree to enable the mount for, which often takes months of negotiation. Having fewer filesystems grouped by common purpose which have agreed mount points that distros actually mount seems a far easier approach to enablement."

Greg did not find these arguments convincing. For one thing, he said, regardless of whether you make a new filesystem or use an existing one, you still need to document the data you store in it.

As for Linux distributions negotiating on mount point – or deciding to adopt the given feature in the first place – Greg felt this was normal. He said, "Enabling it does take time, which is good because if they do not think it should be present because they do not want to use it, then it will not be, which means either they do not need your new feature, or you have not made it useful enough. So again, not an issue."

Greg's fundamental issue was security. The wider the array of SecurityFS users, the more likely each of those users would be to add more features to SecurityFS, exposing various powers to user space, and generally presenting a more tempting surface for attackers to look for security holes.

The discussion ended inconclusively. There is probably not much at stake either way – if coco uses SecurityFS it will simply be doing the same as other things that use SecurityFS. And if it makes its own filesystem, it will likewise be doing the same as other things. What's interesting about a discussion like this is that it's possible to witness development policies emerging and transforming.

Various developers have different interpretations about how the parts of the kernel should be used for further development. These differences don't generally interfere with their ability to get patches in, so they can happily continue working, each believing different things about how to code for the kernel.

Then when something like IBM's coco patches comes along, suddenly everyone's different ideas become relevant in a practical way. Sometimes a giant clash follows and sometimes not. In the current case, we can see this question: Should projects all create their own filesystems, or should similar users group themselves together to minimize filesystem proliferation? Eventually the question will probably have to be resolved. But for now, there are just these faint rumblings.

Maintaining GitHub Kernel Forks

Konstantin Komarov from Paragon submitted an NTFS patch – or rather, he requested that Linus Torvalds pull the patch from Paragon's GitHub tree. Linus replied, saying:

"For github accounts (or really, anything but where I can just trust the account management), I really want the pull request to be a signed tag, not just a plain branch.

"In a perfect world, it would be a PGP signature that I can trace directly to you through the chain of trust, but I've never actually required that.

"So while I prefer to see a full chain of trust, I realize that isn't always easy to set up, and so at least I want to see an 'identity' that stays constant so that I can see that pulls come from the same consistent source that controls that key."

A few minutes later, he replied to his own email, saying he would just pull the tree this time to avoid delay, but he asked Paragon to set up PGP and start using signed tags moving forward.

On a related note, Linus added:

"I notice that you have a github merge commit in there.

"That's another of those things that I *really* don't want to see – github creates absolutely useless garbage merges, and you should never ever use the github interfaces to merge anything.

"This is the complete commit message of that merge:

Merge branch 'torvalds:master' into master

"Yeah, that's not an acceptable message. Not to mention that it has a bogus '' committer etc.

"github is a perfectly fine hosting site, and it does a number of other things well too, but merges is not one of those things.

"Linux kernel merges need to be done *properly*. That means proper commit messages with information about what is being merged and *why* you merge something. But it also means proper authorship and committer information etc. All of which github entirely screws up.

"We had this same issue with the ksmbd pull request, and my response is the same: the initial pull often has a few oddities and I'll accept them now, but for continued development you need to do things properly. That means doing merges from the command line, not using the entirely broken github web interface."

There was no discussion or controversy following Linus's posts. The interesting thing is that the signature issue is not hypothetical. Linus had to endure some legal hassles a few years back over licensing issues surrounding various patches. Consequently, patch signing became the standard way to trace all patches back to an origin point for licensing verification purposes.
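Linus's two requests, a signed tag and a merge done by hand with a real message, both map onto ordinary Git commands. The sketch below only builds the argument lists; it does not run Git. The helper names, tag name, and message layout are assumptions made for illustration, not a format Linus prescribed.

```python
def signed_tag_cmd(tag, message):
    """argv for creating a PGP-signed annotated tag (git tag -s)."""
    return ["git", "tag", "-s", tag, "-m", message]


def merge_cmd(branch, what, why):
    """argv for a command-line merge with a hand-written message.

    --no-ff forces a real merge commit, so the message is recorded.
    The three-part message layout is an assumption of this sketch.
    """
    message = f"Merge {branch}\n\n{what}\n\nWhy: {why}"
    return ["git", "merge", "--no-ff", "-m", message, branch]
```

Either list could be handed to subprocess.run(cmd, check=True) inside a repository. The point is simply that the person merging writes down what is being merged and why, instead of accepting the message a web interface generates.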

Going In or Going Out?

An ongoing question in operating systems development is which is better: a microkernel or a monolithic kernel? The debate is a little odd since there is no clear definition of a monolithic kernel, although everyone seems to agree Linux is one.

The microkernel camp, in the open source world, is most visibly represented by the GNU project with its Hurd, a set of servers that runs on top of a microkernel. The idea of a microkernel is that user space should do as much work as possible and that only the bare minimum functionality should be implemented in the kernel itself. The microkernel people argue that doing it this way makes it possible to produce a highly secure system because the kernel has fewer attack vectors. The smaller scope will mean less code churn over time and therefore fewer bugs that might be exploited by attackers.

The debate is odd because it frames the question in terms of efficiency and safety versus total chaotic insanity. In fact (and also the subject of this column), pretty much everyone wants their operating system kernel to be as small as possible for exactly those same reasons. Perhaps the debate really ought to be over exactly what belongs in user space and what doesn't.

For example, on the Linux Kernel Mailing List recently, Shijie Huang posted some non-uniform memory access (NUMA) patches. NUMA is a multiprocessing hardware architecture like symmetric multiprocessing (SMP). But whereas SMP treats all memory as equally close to every CPU, NUMA groups CPUs and memory into nodes, and a CPU can reach memory on its own node faster than memory on another node. For consumer products, this means less expensive manufacturing. The question for operating system developers is how to support the allocation process so that faster hardware is used when needed and slower hardware is used when speed isn't as important.

Shijie's patch addressed the need for the OS to be aware of which hardware, and therefore which speeds, would be used for a given task. For example, a given file loaded into memory would have only one cached version of itself, in spite of the fact that there were multiple hardware speeds that might be relevant to using that cached version.

Shijie wanted users to be able to specify the NUMA needs of any given file, so the kernel would use the faster or slower hardware whenever it needed to access that file. In other words, he asked, "is it possible to implement the per-node page cache for programs/libraries?"

This, however, was easier said than done. As you might imagine, controlling which hardware would operate on a given file takes place in a fairly deep part of the kernel. As Matthew Wilcox put it in his response:

"At this point, we have no way to support text replication within a process. So what you're suggesting (if implemented) would work for processes which limit themselves to a single node. That is, if you have a system with CPUs 0-3 on node 0 and CPUs 4-7 on node 1, a process which only works on node 0 or only works on node 1 will get text on the appropriate node.

"If there's a process which runs on both nodes 0 and 1, there's no support for per-node PGDs. So it will get a mix of pages from nodes 0 and 1, and that doesn't necessarily seem like a big win."

Al Viro also replied to Shijie, saying, "What do you mean, per-node page cache? Multiple pages for the same area of file? That'd be bloody awful on coherency…"

Coherence is a crucial part of operating system design – it means that when the kernel writes to a piece of memory, subsequent attempts to read that same piece of memory should show the written value rather than show an older value that was there before the write. Loss of memory coherence in an operating system is sort of the equivalent of suddenly discovering the building you're in is engulfed in flames.
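A toy model makes Al's worry concrete. The Python sketch below keeps one copy of a file page per node and shows how a write that reaches only one copy leaves the other stale. It is a simulation under invented names (ReplicatedPage, write_local, write_coherent), not a description of how the kernel's page cache is actually built.

```python
class ReplicatedPage:
    """Toy model: one file page cached separately on each NUMA node."""

    def __init__(self, nodes, data):
        self.copies = {node: data for node in nodes}

    def read(self, node):
        return self.copies[node]

    def write_local(self, node, data):
        # Incoherent: only the writer's node sees the new value.
        self.copies[node] = data

    def write_coherent(self, data):
        # Coherent: every copy is updated as part of the same write.
        for node in self.copies:
            self.copies[node] = data


page = ReplicatedPage(nodes=[0, 1], data=b"old")
page.write_local(0, b"new")
stale = page.read(1)         # node 1 missed the write
page.write_coherent(b"new")
fresh = page.read(1)         # every node now agrees
```

The coherent path has to touch every copy on every write, which is the cost and complexity that a single shared cached copy avoids.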

Shijie agreed that coherence was important, but he said that if the data was restricted to being read-only, coherence wouldn't be a problem because the values would never change. He posted a patch to demonstrate what he was talking about.

Linus Torvalds came in at this point, saying, "You absolutely don't want to actually duplicate it in the cache."

There was some implementation discussion between Shijie and various others, but at one point Linus came in again, a bit harder, saying:

"You can't have per-node i_mapping pointers without huge coherence issues.

"If you don't care about coherence, that's fine – but that has to be a user-space decision (ie 'I will just replicate this file').

"You can't just have the kernel decide 'I'll map this set of pages on this node, and that other [set] of pages on that other node', in case there's MAP_SHARED things going on.

"Anyway, I think very fundamentally this is one of those things where 99.9% of all people don't care, and DO NOT WANT the complexity.

"And the 0.1% that _does_ care really could and should do this in user space, because they know they care.

"Asking the kernel to do complex things in critical core functions for something that is very very rare and irrelevant to most people, and that can and should just be done in user space for the people who care is the wrong approach.

"Because the question here really should be 'is this truly important, and does this need kernel help because user space simply cannot do it itself'.

"And the answer is a fairly simple 'no'."

The discussion ended at that point, and Shijie went back to the drawing board.
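Linus's user-space alternative ("I will just replicate this file") might look something like the following Python sketch, which uses the layout from Matthew Wilcox's example. The helper names and the ".nodeN" file suffix are invented here. Copying a file does not by itself decide which node its cached pages land on, so a real setup would also have to pin the process and control memory placement (for example with a tool such as numactl), which this sketch leaves out.

```python
import shutil


def parse_cpulist(text):
    """Parse a kernel-style CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_of_cpu(cpu, node_cpulists):
    """Return the node whose CPU list contains `cpu`, or None."""
    for node, cpulist in node_cpulists.items():
        if cpu in parse_cpulist(cpulist):
            return node
    return None


def replicate(path, nodes):
    """Copy `path` once per node and return {node: replica_path}.

    The '.nodeN' suffix is a naming convention invented for this sketch.
    """
    replicas = {}
    for node in nodes:
        replica = f"{path}.node{node}"
        shutil.copyfile(path, replica)
        replicas[node] = replica
    return replicas


# Layout from Matthew Wilcox's example: CPUs 0-3 on node 0, 4-7 on node 1.
LAYOUT = {0: "0-3", 1: "4-7"}
```

A launcher could call node_of_cpu() for the CPU it intends to run on and then open the matching replica. Because each replica is a separate file, the people who care opt in explicitly and accept that the copies are independent, and everyone else pays nothing.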

For me, this is an example of a feature that could have gone into the monolithic Linux kernel. It would have performed a useful function for a certain bunch of users. It would have handled use cases that exist in the real world. It would have made the kernel as a whole slower, but it would be exactly the sort of thing the microkernel people would expect a monolithic kernel to include.

It seems very clear that proponents of both microkernels and monolithic kernels believe that if something can be done in user space, then it shouldn't be put into the kernel. The point of difference becomes where exactly to draw the line. At this point, it becomes largely a matter of personal opinion. Linus would probably say that it depends on a lot of factors. Can the feature run much faster in the kernel than in user space? Will the kernel code look sane and be maintainable? And so on. And probably the GNU Hurd people would have different opinions, as would the Redox people, the Genode people, the HelenOS people, and the contributors to various other microkernel projects.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
