Zack's Kernel News

Zack's Kernel News

Article from Issue 223/2019

Zack Brown reports on enhancing KUnit, arguing over a nonexistent problem, and Cgroup core wars.

Enhancing KUnit

Since its introduction by Brendan Higgins in October 2018, the KUnit unit testing framework has seen a lot of activity, and Brendan has implemented a lot of enhancements and documentation. Recently, he submitted some patches to support the bailing out of any given unit test, which he said was needed for implementing assertions. An assertion is a useful and cute debugging tool, where at a certain point in your code you claim that a particular value or condition is true, because at that point in the code you really expect it to be true. You're not branching execution based on a conditional; you're just saying, "I think the world looks like this." Then, if it doesn't look like "this," the program can crash or throw an exception or do whatever you told it to do in case an assertion turns out to be false.

You can see how unit tests and assertions might get in each other's way. If you write a unit test to feed bogus inputs into a function to make sure it handles that case properly, the bogus inputs could end up triggering the assertion and short circuiting your goal, which is to test the actual code, not the assertion.

Brendan wanted unit tests to abort in the event of triggering an assertion within the code. At that point, there would be no reason to continue that particular test. As he put it at some point deep in the conversation, "The idea with assertions is that you use them to state all the preconditions for your test. Logically speaking, these are the premises of the test case, so if a premise isn't true, there is no point in continuing the test case, because there are no conclusions that can be drawn without the premises."

Near the periphery of his goals, however, in the error-handling portion of his code, he tested to see whether the unit test had failed to actually abort as intended, in which case his patch called BUG(). To that, Frank Rowand remarked, "BUG(), which is a panic, which is crashing the system, is not acceptable in the Linux kernel. You will just annoy Linus if you submit this."

Brendan replied that he thought BUG() would be OK, since KUnit would never be used in a production kernel. Consequently, there was no risk of panicking any production system. Also, he said, if KUnit ever did get to this particular point of execution, everything would go kablooey and trying to proceed normally would be a no-no.

But Frank replied that, in fact, even in a non-production kernel, it would be unacceptable to intentionally produce a panic unless it were the only way to avoid data corruption. He also added that it would be wrong to assume KUnit would not be compiled into production systems. It could very well happen, he said.

Even if the unit test failed, Frank also said that didn't mean it would be best to abort the test as a whole. He said, "The remainder of the unit tests can still run. There may be cascading failures, but that is OK."

Brendan said he didn't want to dispute Frank's point, but he did also reply for clarity that "the path where BUG() gets called only happens if there is a bug in KUnit itself, not just because a test case fails catastrophically." And he went on, "This should never actually happen, even if the assertion that called it was violated. KUNIT_ASSERT_* just says, 'this is a prerequisite property for this test case'; if it is violated, the test case should fail and bail, because the preconditions for the test case cannot be satisfied. Nevertheless, other tests cases will still run."

There was a little more discussion, but nothing conclusive. Several folks went back and forth on how best to handle various cases, like what null pointers traditionally meant in a testing context, and whatnot.

I love watching the Linux kernel ecosystem as it attracts more and more symbiotic entities to it. One of the greatest of these is of course the C compiler, but so are all aspects of revision control, debugging, and testing, including KUnit. It's clear this is a project that's still in the relatively early stages of what will be a long and elaborate life. For me, it's fun to watch KUnit's tendrils descending into the various inner reaches of the kernel development process.

Arguing over a Nonexistent Problem

A strange argument arose over a nonexistent problem in an inscrutable part of the kernel that turns out to be utterly relevant to all things. Andrea Parri started the whole debate with a patch to affect the ordering of memory reads during concurrent operation. Under a hypothetical circumstance, one read might depend on another, yet be performed in the reverse order. Andrea's patch was intended to fix this.

Paul McKenney had no objection and was all set to apply the patch and feed it up to Linus Torvalds, when Peter Zijlstra pointed out that, in fact, the particular hypothetical situation being fixed by Andrea's patch could never be part of an actual physical computer.

Andrea agreed wholeheartedly that there was no way that even the evilest and most depraved mind could ever construct a monstrous perversion of space and time so as to produce a computer capable of triggering the bug that he had fixed. By way of compromise, he said that in the future the bug could always be reinserted into the code if that's what the developers really wanted.

Peter was having none of this sophistry. He said flatly, "What I do object to is a model that's weaker than any possible sane hardware."

And in support of this, Will Deacon remarked, "Whilst I completely agree that relying on the ordering provided by dep **; **> rfi is subtle and error prone, having it forbid the outcome above appeals to a hardware-based mindset of how memory ordering works. In the kernel community, I would posit that the majority of developers are writing code with the underlying hardware in mind and so allowing behaviors in the memory model, which are counter to how a real machine operates, is likely to make things more confusing, rather than simplifying them!"

Alan Stern suggested letting Andrea's impossible-to-hit bug fix go into the kernel, but to also retain the code comment for the behavior that was being replaced, in order to "explain that the operational model predicts this ordering, but it has been left out of the LKMM for practical (not technical) reasons."

Andrea backed off, saying, "I won't try to push this patch further." But he did feel that Peter in particular was being a bit of a stickler, always insisting that kernel features should only be implemented if there were actual users to use them.

Meanwhile, Paul veered into untapped realms of theory, saying that since Andrea's patch was primarily related to issues of concurrent code execution, and reading and writing:

"We had some debates about this sort of thing at the C++ Standards Committee meeting last week.

"Pointer provenance and concurrent algorithms, though for once not affecting RCU! We might actually be on the road to a fix that preserves the relevant optimizations while still allowing most (if not all) existing concurrent C/C++ code to continue working correctly. (The current thought is that loads and stores involving inline assembly, C/C++ atomics, or volatile get their provenance stripped. There may need to be some other mechanisms for plain C-language loads and stores in some cases as well.)

"But if you know of any code in the Linux kernel that needs to compare pointers, one of which might be in the process of being freed, please do point me at it. I thought that the smp_call_function() code fit, but it avoids the problem, because only the sending CPU is allowed to push onto the stack of pending smp_call_function() invocations.

"That same concurrent linked stack pattern using cmpxchg() to atomically push and xchg() to atomically pop the full list would have this problem. The old pointer passed to cmpxchg() might reference an object that was freed between the time that the old pointer was loaded and the time that the cmpxchg() executed. One way to avoid this is to do the push operation in an RCU read-side critical section and use kfree_rcu() instead of kfree(). Of course, code in the idle loop or that might run on offline CPUs cannot use RCU, plus some data structures are not happy with kfree_rcu() delays, so…."

Peter tried to wrap his head around what Paul was asking, but had no sort of luck until he found the scholarly article, "Exploring C Semantics and Pointer Provenance" [1], which begins with the lovely and encouraging pronouncement:

"C values cannot be treated as either purely abstract or purely concrete entities: The language exposes their representations, but compiler optimizations rely on analyses that reason about provenance and initialization status, not just runtime representations. The ISO WG14 standard leaves much of this unclear, and in some respects differs with de facto standard usage – which itself is difficult to investigate."

Peter read this and the rest of that baleful article and then staggered back to the Linux Kernel Mailing List saying, "that's all massive bong-hits. That's utterly insane. Even the proposed semantics are crazy; they include UB and are therefore broken on principle."

Peter cast about desperately right and left, saying, "We need to find a GCC person to find/give us a knob to kill this entire class of nonsense. This is just horrible broken shit."

Paul calmly replied:

"Yes, this all is a bit on the insane side from a kernel viewpoint. But the paper you found does not impose this; it has instead been there for about 20 years, back before C and C++ admitted to the existence of concurrency. But of course compilers are getting more aggressive, and yes, some of the problems show up in single-threaded code.

"The usual response is 'then cast the pointers to intptr_t!, but of course that breaks type checking.

"There is an effort to claw back the concurrency pieces, and I would be happy to run the resulting paper past you guys.

"I must confess to not being all that sympathetic to code that takes advantage of happenstance stack-frame layout. Is there some reason we need that?"

Peter replied:

"Not that I'm aware; but if it gets this 'obvious' case wrong, I worry what else it gets wrong.

"At the very least, we should get this fixed and compile a kernel with the fixed compiler to see what (if anything) changes in the generated code and analyze the changes (if any) to make sure we were OK (or not).

"I mean, yes that example is UB, but the result is also clearly batshit insane."

So, there you have it – from fixing a nonexistent bug, to questioning the nature of reality and one's own place in the universe.

Cgroup Core Wars

Andrey Ryabinin had a bone to pick with the cgroup code. In particular, he didn't like the way Linux handled memory limits. Any time there was more than one virtual machine running, if anything at all hit any kind of memory limit, Linux would attempt to reclaim memory from all cgroups. Andrey didn't like this at all, because "the cgroup that allocates more often always wins. E.g., job that allocates a lot of clean rarely used page cache will push out of memory other jobs with active relatively small all in memory working set."

He pointed out that he knew about the memcg controls, designed to help avoid that very circumstance. But he said these were insufficient, because "memory cgroups are very hard to configure right, because it requires precise knowledge of the workload, which may vary during the execution. E.g., setting memory limit means that job won't be able to use all memory in the system for page cache even if the rest [of] the system is idle."

Essentially, as Andrey said, the current situation required painstakingly error-prone configuration of each and every cgroup.

He submitted a patch that he felt would do better. Instead of manually tweaking each cgroup, he wanted to fix the way memory was reclaimed in the first place. In particular, instead of Linux reclaiming RAM from all cgroups equally, it would reclaim only inactive memory pages – and only from cgroups that were hoarding a lot of inactive pages. Only when everyone's list of inactive pages was low would the kernel then revert to its previous behavior, shrinking all cgroups equally.

Rik van Riel liked this whole concept, but had some technical suggestions for Andrey's patch. Johannes Weiner also strongly supported Andrey's patch and also offered some technical suggestions for improvements.

Roman Gushchin, however, had doubts. As he put it, "What's considered active and inactive depends on memory pressure inside a cgroup. Actually active pages in one cgroup (e.g., just deleted) can be colder than inactive pages in another (e.g., a memory-hungry cgroup with a tight memory.max)." He added, "Also a workload inside a cgroup can to some extent control what's going to the active LRU. So it opens a way to get more memory unfairly by artificially promoting more pages to the active LRU. So a cgroup can get an unfair advantage over other cgroups."

Andrey objected to this description of fairness. He said, "If fair means to put equal reclaim pressure on all cgroups, then, yes, the patch increases such unfairness, but such unfairness is a good thing." He presented, as an example, "Cgroup A with active job loading a big and active working set, which creates high memory pressure, and cgroup B – idle (no memory pressure) with a huge not used cache. It's definitely preferable to reclaim from B rather than from A."

Roman insisted that while Andrey was logically correct, there was still no clear way to distinguish between active and inactive pages, and thus, no clear way to know that Andrey's implementation would do the right thing – or better than existing code – in any given instance. He concluded, "So it means that depending on memory usage pattern, some workloads will benefit from your change, and some will suffer."

There was no conclusion during the thread. And it's not at all clear whether Andrey's patch will ultimately make it into the kernel. Anytime something depends on average use-case patterns across the known universe, coupled with uncertain mechanisms to distinguish between the thing we want and the thing we don't want, we're probably looking at a long and controversial debate.

At the same time, Andrey's initial memory theft exploit may indeed be real – giving preference to the cgroup that allocates and wastes the most memory and pushing other cgroups out entirely. Depending on how you look at it, that could be considered a security hole. If so, there may start to be real pressure to address the problem one way or another, either with something along the lines of Andrey's patch, or something else that avoids the exploit.


  1. "Exploring C Semantics and Pointer Provenance":

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Kernel News

    This month Zack discusses the KUnit testing framework, removing the Ancient 00-INDEX files, and identifying process termination without polling.

  • Kernel News

    Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.

  • Kernel News

    Zack Brown reports on container-aware cgroups, a different type of RAM chip on a single system, new SARA security framework, and improving GPIO interrupt handling.

  • Security Lessons: cgroups and LXC

    The big virtualization tools like KVM and Xen can’t compete on a small scale with resource-spare cgroups and Linux Containers.

  • LinuxTag 2009: Resource Management with OpenVZ

    OpenVZ project leader Kir Kolyshkin clarified at LinuxTag 2009 that the software also lends itself to Linux resource management.

comments powered by Disqus

Direct Download

Read full article as PDF:

Price $2.95