Zack's Kernel News

Article from Issue 255/2022

Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

Supporting New Hardware Features: UINTR

Sohil Mehta from Intel brought up User Interrupts (UINTR), a hardware technology that allows delivering interrupts directly to user space. Interrupts are a general-purpose idea in multitasking that have been around for decades. The idea is that if something on the system changes state, it might "interrupt" whatever process is currently running, in order to handle this change. After that, process execution returns to normal. There are hardware interrupts, software interrupts, and a whole pile of tools and ideas related to triggering and handling interrupts. Interrupts happen all the time. For example, when multiple programs are running at the same time, the system clock triggers an interrupt many times per second to switch between those processes and make it look like they are all running simultaneously.

With UINTR, a particular interrupt targets a particular process already running in user space. This way the interrupt can bypass time-consuming operations in the kernel and go directly to where it is intended. Sohil said, "There is a ~9x or higher performance improvement using User IPI over other IPC mechanisms for event signaling." And he went on to say that "Today, virtually all communication across privilege boundaries happens by going through the kernel. These include signals, pipes, remote procedure calls and hardware interrupt based notifications. User interrupts provide the foundation for more efficient (low latency and low CPU utilization) versions of these common operations by avoiding transitions through the kernel."
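The interrupt, handle, and resume cycle described above has a familiar user-space analog in POSIX signals, one of the kernel-mediated mechanisms Sohil lists. The following minimal sketch is an illustration only: it does not use UINTR, the function name count_timer_signals is hypothetical, and a 1ms interval timer stands in for a periodic interrupt source. The code spins in user space while the kernel repeatedly delivers SIGALRM. Each delivery runs a small handler, after which the loop picks up where it left off. That kernel round trip on every notification is the kind of overhead UINTR is meant to avoid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Incremented only by the handler; read by the interrupted loop. */
static volatile sig_atomic_t ticks;

/* The "interrupt handler": runs asynchronously, then normal code resumes. */
static void on_tick(int signo)
{
    (void)signo;
    ticks++;
}

/* Hypothetical helper: spin in user space until `wanted` SIGALRM signals
 * have been delivered. Returns the number of ticks seen (-1 on error) and
 * stores the total number of loop iterations that ran meanwhile. */
long count_timer_signals(int wanted, long *loop_iterations)
{
    assert(wanted > 0);

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) < 0)
        return -1;

    /* A 1 ms interval timer stands in for a periodic clock interrupt. */
    struct itimerval every_ms;
    memset(&every_ms, 0, sizeof every_ms);
    every_ms.it_interval.tv_usec = 1000;
    every_ms.it_value.tv_usec = 1000;

    ticks = 0;
    if (setitimer(ITIMER_REAL, &every_ms, NULL) < 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    long iterations = 0;
    while (ticks < wanted)
        iterations++;   /* "normal execution" that keeps being interrupted */

    /* Disarm the timer before restoring the previous SIGALRM disposition. */
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old_sa, NULL);

    if (loop_iterations)
        *loop_iterations = iterations;
    return ticks;
}
```

Calling count_timer_signals(10, &iters) should return at least 10 and leave a nonzero iteration count, showing that ordinary execution kept running between interruptions. The handler only touches a volatile sig_atomic_t, which is the conventional way to keep a signal handler async-signal-safe.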

Aside from all the normal sources of interrupts – the kernel or hardware devices – Sohil said that UINTR also introduces interrupts that originate from another user process. These are called User IPIs (user interprocessor interrupts).

Because UINTR is a hardware feature, Sohil also added, "The first implementation of User IPIs will be in the Intel processor code-named Sapphire Rapids. Refer [to] Chapter 11 of the Intel Architecture instruction set extensions for details of the hardware architecture."

He also said, "I am also planning to talk about User Interrupts next week at the LPC Kernel summit," and concluded, "We are hoping to get some feedback on the direction of overall software architecture – starting with User IPI, extending it for kernel-to-user interrupt notifications and external interrupts in the future."

Sohil also wanted feedback from the kernel developers about the future direction Intel should take with this technology. For example, "Should Uintr interrupt all blocking system calls like sleep(), read(), poll(), etc?" And "Should the User Interrupt Target table (UITT) be shared between threads of a multi-threaded application or maybe even across processes?"

An interesting aspect of UINTR, Sohil pointed out, is that it is not available to all user processes by default. As he put it, "User Interrupts (Uintr) is an opt-in feature (unlike signals). Applications wanting to use Uintr are expected to register themselves with the kernel using the Uintr related system calls."

There were also some security issues that Sohil wanted to point out. He said, "The current implementation expects only trusted and cooperating processes to communicate using user interrupts. [...] Currently, a sender can easily cause a denial of service for the receiver by generating a storm of user interrupts. A user interrupt handler is invoked with interrupts disabled, but upon execution of uiret, interrupts get enabled again by the hardware. This can lead to the handler being invoked again before normal execution can resume."

Sohil added, "To enable untrusted processes to communicate, we need to add a per-vector masking option through another syscall (or maybe IOCTL). However, this can add some complexity to the kernel code. A vector can only be masked by modifying the UITT entries at the source. We need to be careful about races while removing and restoring the UPID from the UITT."

Dave Hansen replied to Sohil's announcement, suggesting some improved wording, and said, "Your problem in all of this is going to be convincing folks that this is a problem worth solving."

Dave also highlighted the 10x speedup as a real selling point for Sohil, and Sohil replied:

"One thing to note, the 10x gain is only applicable for User IPIs. For other source[s] of User Interrupts (like kernel-to-user notifications and other external sources), we don't have the data yet.

"I realized the User IPI data in the cover also needs some clarification. The 10x gain is only seen when the receiver is spinning in User space – waiting for interrupts.

"If the receiver were to block (wait) in the kernel, the performance would drop as expected. However, User IPI (blocked) would still be 10% faster than Eventfd and 40% faster than signals."
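To make the "blocked" baseline concrete, here is a minimal sketch of eventfd-based event signaling between two processes, in which every wake-up is a read() that sleeps in the kernel until the other side calls write(). It also reports an absolute mean round-trip time. This is an illustration under our own assumptions, not the benchmark Sohil used; the function name eventfd_ping_pong is hypothetical, and the numbers it produces will vary by machine and load.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep in the kernel until the eventfd counter is nonzero, then reset it. */
static int wait_event(int fd)
{
    uint64_t value;
    return read(fd, &value, sizeof value) == (ssize_t)sizeof value ? 0 : -1;
}

/* Add 1 to the eventfd counter, waking any process blocked in read(). */
static int post_event(int fd)
{
    uint64_t one = 1;
    return write(fd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Hypothetical helper: bounce `rounds` events between a parent and a child
 * process. Returns the number of completed round trips (or -1 on setup
 * failure) and stores the mean round-trip time in *ns_per_round. */
long eventfd_ping_pong(long rounds, double *ns_per_round)
{
    assert(rounds > 0);
    int ping = eventfd(0, 0);
    int pong = eventfd(0, 0);
    if (ping < 0 || pong < 0) {
        if (ping >= 0) close(ping);
        if (pong >= 0) close(pong);
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        close(ping);
        close(pong);
        return -1;
    }
    if (child == 0) {
        /* Receiver: block on "ping", answer on "pong". */
        for (long i = 0; i < rounds; i++)
            if (wait_event(ping) < 0 || post_event(pong) < 0)
                _exit(1);
        _exit(0);
    }

    struct timespec start, end;
    long completed = 0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < rounds; i++) {
        if (post_event(ping) < 0 || wait_event(pong) < 0)
            break;
        completed++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (completed < rounds)
        kill(child, SIGKILL);   /* don't leave the receiver blocked forever */
    waitpid(child, NULL, 0);
    close(ping);
    close(pong);

    if (ns_per_round && completed > 0) {
        double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
                       + (double)(end.tv_nsec - start.tv_nsec);
        *ns_per_round = elapsed / (double)completed;
    }
    return completed;
}
```

For example, eventfd_ping_pong(100000, &ns) returns 100000 on success and stores the average round-trip time in ns. Because both processes sleep in read() between events, this corresponds to the "blocked" case rather than the spinning receiver behind the larger speedup figure.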

Sohil posted a table comparing the latency of User IPIs with other mechanisms, such as pipes and signals, noting that the values were all "relative" to each other rather than absolute quantities of time.

To which Greg Kroah-Hartman replied:

"Relative is just that, 'relative'. If the real values are extremely tiny, then relative is just 'this goes a tiny tiny bit faster than what you have today in eventfd', right?

"So how about 'absolute'? What are we talking here?

"And this is really only for the 'one userspace task waking up another userspace task' policies. What real workload can actually use this?"

Sohil responded with some absolute microsecond measurements – though he hedged a bit, saying, "The data here is more of an approximation with the final performance expected to trend in this direction. [...] The overall gain in a real workload would depend on how it uses IPC."

Sohil also responded to Greg's question about what real-world workloads would use Intel's feature. He replied, "User mode runtimes is one [of] the usages that we think would benefit from User IPIs. Also as Jens mentioned in another thread, this could help kernel to user notifications in io_uring (using User Interrupts instead of eventfd for signaling). Libevent is another abstraction that we are evaluating."

Earlier, Greg had also pointed out that Intel's hardware feature was limited to a single upcoming CPU family. And he asked, "Are syscalls allowed to be created that would only work on obscure cpus like this one?"

To which Dave said, "Well, you have to start somewhere." And he pointed to memory protection keys – another kernel feature that had a very narrow focus when it first went into the kernel. Dave said, regarding that feature, "At the point that I started posting these, you couldn't even buy a system with this feature. For a while, there was only one Intel Xeon generation that had support. But, if you build it, they will come. Today, there is powerpc support and our friends at AMD added support to their processors. In addition, protection keys are found across Intel's entire CPU line."

And Dave concluded, "I encourage everyone submitting new hardware features to include information about where their feature will show up to end users *and* to say how widely it will be available. I'd actually prefer if maintainers rejected patches that didn't have this information."

To which Greg replied, "So, what are the answers to these questions for this new CPU feature?" But this question went unanswered.

Meanwhile, Pavel Machek noted that, as an Intel CPU feature, UINTR is exposed to programs through new processor instructions; Sohil's original post listed about half a dozen of them. Pavel asked, "Are other CPU vendors allowed to implement compatible instructions? If not, we should probably have VDSO entries so kernel can abstract differences between CPUs."

This wasn't answered, but several developers had a technical discussion about potential use cases for UINTR. Ultimately the thread ended inconclusively – although clearly the Linux kernel developers have a strong interest in supporting any CPU features that exist in the world and are not inherently insecure.

Grooming Corporate Filesystem Maintainers

Paragon Software has been the official maintainer of the NTFS3 filesystem for several months. With that in mind, there was an interesting conversation on the kernel mailing list recently. Konstantin Komarov from Paragon sent in a patch to update NTFS3, and Linus Torvalds replied:

"Well, I won't pull until the next merge window opens anyway (about a month away). But it would be good to have your tree in linux-next for at least a couple of weeks before that happens.

"Added Stephen to the participants list as a heads-up for him – letting him know where to fetch the git tree from will allow that to happen if you haven't done so already.

"The one other thing I do want when there's big new pieces like this being added is to ask you to make sure that everything is signed-off properly, and that there is no internal confusion about the GPLv2 inside Paragon, and that any legal people etc are all aware of this all and are on board. The last thing we want to see is some 'oops, we didn't mean to do this' brouhaha six months later.

"I doubt that's an issue, considering how public this all has been, but I just wanted to mention it just to be very obvious about it."

The GPL issue is no joke and is one of the main reasons Linus adopted the whole "signed-off-by" procedure that accompanies all patches these days. In the past, there were accusations that non-GPL code had been included in the kernel. At the time, it was extremely difficult to trace the origins of each patch, so the dispute dragged on for quite a while. Finally, Linus decided that each patch needed to be signed off by developers and reviewers, in part for that reason – to make sure that the person submitting the patch affirmed that it was being submitted under the terms of the GPLv2. Then, any future issues could be resolved by simply querying the patch history.

On another level, when Linus made this open reminder to the Paragon people, he might have been trying to preempt any future problems where Paragon might claim they had not been informed of the rules.

But on a less legal level, Linus seems to simply be training a new contributor in the ways of developing the kernel and submitting patches – making sure everything is GPLed, timing their submissions with the merge windows, and so on.

Konstantin thanked Linus for the info and sent Stephen Rothwell the location of the git tree to include in linux-next.

And Konstantin confirmed, "Indeed, there is no internal confusion about the GPLv2 and we mean to make this contribution."

Stephen replied to Konstantin, saying:

"Thanks for adding your subsystem tree as a participant of linux-next. As you may know, this is not a judgement of your code. The purpose of linux-next is for integration testing and to lower the impact of conflicts between subsystems in the next merge window.

"You will need to ensure that the patches/commits in your tree/series have been:

  • submitted under GPL v2 (or later) and include the Contributor's Signed-off-by,
  • posted to the relevant mailing list,
  • reviewed by you (or another maintainer of your subsystem tree),
  • successfully unit tested, and
  • destined for the current or next Linux merge window.

"Basically, this should be just what you would send to Linus (or ask him to fetch). It is allowed to be rebased if you deem it necessary."

So Stephen too, in the same spirit, seemed to be training the Paragon folks on proper GPL procedures and other code submission practices, as well as managing their expectations about what they'd be likely to see with their code in the linux-next tree in the near future.

Kari Argillander did much the same, reminding the Paragon folks to "add reviewed-by tag and signed-off-by tag" and to post the code as plain-text patches to the mailing list for review by other developers.

Later, Linus posted again with further gentle instruction on how code that has been marinating in linux-next actually makes it into the kernel:

"Ok, so I've merged the biggest pieces of this merge window, and I haven't actually seen a NTFSv3 pull request yet.

"I wonder if you expected that being in linux-next just 'automatically' causes the pull to happen, because that's not the case. We often have things 'brewing' in linux-next for a while, and it's there for testing but not necessarily ready for prime time.

"So linux-next is a preparatory thing, not a 'this will get merged'

"So to actually merge things, I will expect to get an explicit pull request with the usual diffstat and shortlog, to show that yes, you really think it's all good, and it's ready to merge.

"Don't worry about – and don't try to fix – merge conflicts with possible other work that has been going on. Stephen fixes it for linux-next and makes people aware of it, and I want to _know_ about them, but I will then handle and re-do the merge conflicts myself based on what I have actually merged up to that point.

"And of course, the other side of that is that if linux-next uncovered other issues, or if there are things holding things up, please _don't_ feel obligated to send me a pull request. There is always the next merge window."

And that was the end of the discussion. It may seem like a surprisingly soft, kid-glove approach, but among all the corporations that contribute code to the Linux kernel, a remarkable number don't seem to know their ass from their elbow when it comes to working with kernel developers.

Apparently, instead of letting each case of "first contact" crash and burn for a while, as the company's engineers, marketers, and lawyers bark conflicting internal orders and end up looking ever more absurd in public, Linus and others are trying to make the initiation as smooth and unsurprising as possible.

It seems like a much better approach than some of the freaky scenarios that have come before.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
