Zack's Kernel News

Kernel News

Article from Issue 237/2020

Zack covers: When a Security Hole Is OK; Kernel Documentation Updates; and Security Through Obscurity

When a Security Hole Is OK

Eric W. Biederman recently posted a patch to replace a 32-bit counter with a 64-bit counter. This would fix the problem that, as he put it, "With care an attacker can cause exec_id wrap and send arbitrary signals to a newly exec'd parent."

He added that he had tested this hole and found that he could wrap the 32-bit exec_id and exploit the problem in two weeks. Faster systems, of course, could do it more quickly.

However, Eric did acknowledge that on 32-bit CPUs, "reading self_exec_id is no longer atomic and can take two read instructions." This meant that on 32-bit systems there would be a microscopic window of time when the actual self_exec_id value would not match the value being read by the code. During that time, he said, this security hole remained exploitable.

Linus Torvalds acknowledged the patch, saying it didn't seem urgent, and asked Eric to put it in his own source tree, where it would percolate up to Linus the next time he took a merge from that tree.

Jann Horn was concerned. Jann wrote some test code that reduced Eric's 14-day rollover to 14 hours.

Jann said that while 64-bit CPUs would probably be fine, 32-bit CPUs would be vulnerable to "store tearing." Store tearing is the name for what Eric described – when it takes two instructions to write a piece of data, and those two instructions can be "torn" by malicious code to take advantage of the fact that the actual data is different from what we think it is in that brief instant. The reciprocal term "load tearing" refers to when it takes two instructions to read a piece of data.

But Linus said he didn't care.

He said that in order to exploit that vulnerability, "first you'd have to work however many weeks to do 4 billion execve() calls, and then you need to hit basically a single-instruction race to take advantage of it. Good luck with that. If you have that kind of God-like capability, whoever you're attacking stands no chance in the first place."

Jann agreed the risk was low, but said the code was simply technically wrong and amounted to having an "intentional race" condition in the kernel. This could theoretically break tools that tested kernel security. It could also lead developers to unwittingly copy that same broken code for use elsewhere. Jann suggested at least including a code comment to let people know this wasn't how to do things.

Even better, Jann said, would be that, "Since the read is already protected by the tasklist_lock, an alternative might be to let the execve path also take that lock to protect the sequence number update, given that execve is not a particularly hot path."

Linus agreed that a code comment would be fine. However, tasklist_lock was very time consuming and already highly utilized in the kernel and wasn't worth it for a security hole that could essentially never be exploited.

Eric asked Linus which code paths used tasklist_lock so much. He offered to go through those code paths and clear out some of the locks if Linus thought it was a problem.

Linus explained:

"It's generally not bad enough to show up on single-socket machines.

"But the problem with tasklist_lock is that it's one of our remaining completely global locks. So it scales like sh*t in some circumstances.

"On single-socket machines, most of the truly nasty hot paths aren't a huge problem, because they tend to be mostly readers. So you get the cacheline bounce, but you don't (usually) get much busy looping. The cacheline bounce is 'almost free' on a single socket.

"But because it's one of those completely global locks, on big multi-socket machines people have reported it as a problem forever. Even just readers can cause problems (because of the cacheline bouncing even when you just do the reader increment), but you also end up having more issues with writers scaling badly.

"Don't get me wrong – you can get bad scaling on other locks too, even when they aren't really global – we had that with just the reference counter increment for the user signal accounting, after all. Neither of the reference counts were actually global, but they were just effectively single counters under that particular load (ie the count was per-user, but the load ran as a single user).

"The reason tasklist_lock probably doesn't come up very much is that it's _always_ been expensive. It has also caused some fundamental issues (I think it's the main reason we have that rule that reader-writer locks are unfair to readers, because we have readers from interrupt context too, but can't afford to make normal readers disable interrupts).

"A lot of the tasklist lock readers end up looping quite a bit inside the lock (looping over threads etc), which is why it can then be a big deal when the rare reader shows up.

"We've improved a _lot_ of those loops. That has definitely helped for the common cases. But we've never been able to really fix the lock itself."

The conversation ended there.

It seems as though Jann has made a good case that the opportunity to encounter the race condition can be accomplished a lot quicker than Eric first thought. Though as Linus pointed out, an attacker would need to wrap the 32-bit exec_id variable over and over again in order to hit the actual race condition.

I often talk about how in Linux development security trumps all other considerations. And this is true, but it doesn't mean that all security holes get plugged. In a situation like this, for example, Linus made the judgment that "If you have that kind of God-like capability, whoever you're attacking stands no chance in the first place." It sounds a little flippant, but it's an actual principle of kernel development.

The same principle accounts for why Linus has sometimes refused to add security patches for cases when an attacker has physical access to a machine.

In those cases, Linus's assumption is, if someone has physical access to your machine, then that's the ball game. You've been owned. And trying to throw tiny obstacles into the path of that owning is simply pointless.

This is one reason why there is a contingent of hackers who feel that Linus is bad on security. That contingent believes it's important to shrink all possible attack surfaces as small as possible, whereas Linus doesn't care about security issues that are only theoretical, but impossible in practice, or that he considers irrelevant, because the conditions creating the vulnerability would also create other much larger vulnerabilities.

In general, if a security hole can't be exploited, Linus doesn't care about it. And if a security hole can only exist in the context of a much larger security hole, Linus won't care about the smaller hole if the larger hole remains unpatched.

Kernel Documentation Updates

There was recently some debate over documentation file formats. John Mathew posted some documentation for the scheduler, using reStructuredText format (RST) and DOT (a graph description language).

There will always and forever be holy wars fought over file formats. This time, Peter Zijlstra remarked, "I despise RST; it's an unreadable mess." As an example, he pointed to some DOT code describing a graph and said "it pretty much mandates you view this with something other than a text editor."

Daniel Bristot de Oliveira, in defense of DOT, replied, "The good thing about the dot format is that we can convert it to many other formats, including text." And to prove it, he ran some of the documentation through the "graph-easy" program and produced a lovely ASCII art depiction of what was being described.

Peter replied, "Oh, I know and love dot files; I generate them occasionally. But they stink as end-result, which is what it is here. If you can't read a document (or worse comment) in a code editor, it's broken (and yes, I know some subsystems have a different opinion here)."

And Valentin Schneider remarked, "Agreed. Feel free to use dot to draw stuff, but please include the textual version in the doc. If you really insist on having a fancier image in the web output, have a look at things like ditaa (or whatever equivalent is supported in rst)." The ditaa program is used to convert ASCII art diagrams into graphical images.

Jonathan Corbet also suggested, "just put the ascii art into the doc as a literal block. I don't see any real reason to embed Dot stuff unless there's really no alternative."

Daniel agreed with Jonathan, saying that it might make the most sense to keep the DOT file separate and use it to generate the ASCII art for the documentation – sort of a source file generating another source file.

Meanwhile Matthew Wilcox, coming from an entirely different file format religion, mentioned to John, "please format your line lengths to more like 75 characters; don't go all the way to 80. Just use 'fmt'; its defaults work fine." But there was no discussion on that point.

Documentation is an eternal struggle, and it's a miracle any two developers ever agree on anything about it. You've got your RST people, your wiki people, your plain text people, your man page people, your code comment people, and the list goes on. And since finding a developer who wants to write documentation is sort of like finding a two-headed donkey, it's often best to just let them do what they want and not ask too many questions. But that won't stop all the other two-headed donkeys from griping about it.

Security Through Obscurity

It can be very tempting to try to secure something by making it very difficult to decipher. A lot of websites do this with JavaScript. They need the code to run on the user's system so they have to transmit it as code, but they also don't want the user to figure out what it's doing, so they run it through a "compiler" to turn it into mush before they send it.

It doesn't really work. Someone always cracks the code, and that's the end of the security. It's pretty funny to realize the number of JavaScript "compilers" and "decompilers" available in the world.

In February, Ilya Dryomov had a similar objection to a security feature in the Linux kernel. He noticed that NULL and IS_ERR() error pointers were obfuscated and remarked, "it probably wouldn't take long for an attacker to find the hash that corresponds to 0. Although harder, the same goes for most common error values, such as -1, -2, -11, -14, etc."

Rather than increasing security, he added, "Obfuscating them just makes debugging based on existing pr_debug and friends excruciating."

He posted a patch to remove the obfuscation.

The issue is made more daunting by the fact that error codes are not universal. In a later email, Ilya pointed out that the EAGAIN code was different on the Alpha architecture than on Intel.

However, the patches fell right down on the floor and lay there for a couple of months, until Ilya brought them up again to ask if they would be accepted or not. He pointed out that although there had been some discussion elsewhere on the mailing list, the fix itself had come through the gauntlet OK. And Ilya added, "If you want to restructure the test suite before adding any new test cases, v1 doesn't have them. Other than the test cases, the only difference between v1 and v2 is added reviews and acks."

Andy Shevchenko replied, saying that Petr Mladek, who would ordinarily be the one to accept such patches, "has some obstacles that prevent him to pay attention on this and actually do anything right now for Linux kernel. If Rasmus, Sergey, you and maybe others will got consensus here, I think Andrew can take it thru his tree."

Ilya remarked that the other two maintainers, Sergey Senozhatsky and Steven Rostedt, had already approved the patches. Ilya reminded Andy that the delay therefore was because Petr had wanted to restructure the test suite first. Of course if he wasn't available for kernel work….

Andy suggested generating the patches again, based on the latest kernel release, and submit them for the next upcoming release. Then everyone would have another chance to say yay or nay.

At this point Linus Torvalds replied, "For something simple and straightforward like the obfuscation patches, I can take them directly if people are unavailable. Just make sure to re-post the patches, and make it clear that it's to me (because usually if I'm cc'd I end up just reading and then ignoring things that I expect to go through a maintainer)."

So that was that.

The interesting thing about all this to me is the way it reveals Linus's policy towards security patches. In this case, obfuscating the error codes did provide a benefit, in that they forced an attacker to do work in order to figure them out. And to a lot of security hackers, any obstacle they can put in front of a bad actor is a good obstacle. But Linus seems to disagree. He seems to believe that obstacles are only good if they actually prevent the bad actor from penetrating system security; anything else only tends to make the code less maintainable, more bug-prone, slower, and uglier. He wants the kernel source code to be sleek and beautiful, without any extras that are there for theoretical purposes only.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Kernel News

    Chronicler Zack Brown reports on the NOVA filesystem, making system calls userspace only, and extending module support to plain executables. 

  • Kernel News

    Zack Brown reports on: Trusted Computing and Linux; Load Balancer Improvements; and New Random Number Handling.

  • Linus Torvalds Invites Attackers to Join the Kernel Community

    He wants attackers to join the community instead of attacking the code.

  • Kernel News

    In kernel news: Heap Hardening Against Hostile Spraying; and Core Contention Improvements … or Not.

  • Kernel News

    Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

comments powered by Disqus