Zack's Kernel News

Article from Issue 238/2020

Zack Brown looks at improving memory management, simplifying(ish) the Kernel Build System, and detecting firmware crashes.

Improving Memory Management

SeongJae Park wanted to improve Linux's memory management. To do this, he wanted to implement finer-grained data access tracking. Then, using that data, Linux would have a better idea of how to move data around in memory and which data to swap out to disk.

The problem was that finer-grained tracking would take CPU time that the system might otherwise use for executing user code, so there'd be a slowdown. On the other hand, the improved memory management made possible by SeongJae's patch would speed up the system – potentially enough to justify the hit taken by gathering the data.

SeongJae pointed out that there were already patches to take advantage of the kind of fine-grained data he wanted to track, but none of those patches had been merged into the Linux kernel because of the lack of that kind of tracking.

So, he announced DAMON, a data access monitoring subsystem that he claimed was accurate, lightweight, and scalable to large systems. He had also implemented it as a standalone kernel module, with a general-purpose API that let not just the kernel but also user code track memory access.

SeongJae posted some numbers illustrating the speed and effectiveness of the tracking system and said he had a lot more planned for the future. He wanted to automate more of DAMON's actions at run time and support more than just RAM; he felt his code could quickly be extended to track the page cache, NUMA nodes, specific files, and whole block devices.

He anticipated the potential criticism that DAMON should be implemented as part of the perf performance analysis system, saying that while perf was restricted to use by the Linux kernel only, DAMON's API supported user programs as well. So to that extent, the two tools offered different capabilities.

There was no discussion of whether or not to accept SeongJae's patches into the kernel, but there was a bit of a technical discussion between him and Jonathan Cameron. Jonathan noticed that SeongJae's code could merge an arbitrary number of regions of memory into a single region, while splitting regions of memory would only halve each region, thus doubling the total number of regions. He felt that since any number of regions could merge into one, but a single region could not split into any number of smaller regions, this created an imbalance that could cause the system to end up with a very small number of regions in the normal case.

The problem with that is that monitoring would become too coarse. With only a handful of very large regions, DAMON could no longer tell which parts of a region were heavily accessed and which were sitting idle, and that fine-grained picture was the whole point of the tracking.

Jonathan proposed changing DAMON's region splitting algorithm to split regions twice if certain conditions were met, thus ensuring that there would remain a useful number of regions for the system's needs.
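To make the imbalance concrete, here is a toy Python model. It is not DAMON's code: it assumes a region is represented only by its access rate, that one merge pass can fold any run of similar neighbors into a single region, and that one split pass cuts every region into a fixed number of equal pieces (2 for plain halving, 4 as one possible reading of Jonathan's "split twice" idea). All function names are hypothetical.

```python
def merge_pass(regions, threshold):
    """Merge runs of adjacent regions whose access rates differ by <= threshold.

    Any number of neighbors can be absorbed in a single pass. For simplicity,
    the merged region keeps the access rate of the first region in its run.
    """
    merged = [regions[0]]
    for rate in regions[1:]:
        if abs(rate - merged[-1]) <= threshold:
            continue  # absorbed into the previous region
        merged.append(rate)
    return merged


def split_pass(regions, pieces=2):
    """Split every region into `pieces` sub-regions that inherit its rate."""
    return [rate for rate in regions for _ in range(pieces)]


def passes_to_recover(target, pieces):
    """Count the split passes needed to grow one region back to `target`."""
    regions, passes = [0], 0
    while len(regions) < target:
        regions = split_pass(regions, pieces)
        passes += 1
    return passes


if __name__ == "__main__":
    uniform = [5] * 64  # 64 regions with near-identical access rates
    print(len(merge_pass(uniform, threshold=1)))  # one merge pass leaves 1 region
    print(passes_to_recover(64, pieces=2))        # halving needs 6 passes
    print(passes_to_recover(64, pieces=4))        # splitting "twice" needs 3
```

Real DAMON re-samples access counts between passes, so split pieces don't simply inherit their parent's rate; the toy only shows the asymmetry in region counts, where a single merge pass can undo what takes many split passes to build.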

SeongJae agreed that Jonathan's proposal would work. But he pointed out that DAMON's existing algorithm would get to the same place, only more slowly, and that Jonathan's fix would complicate the code somewhat. SeongJae wanted to wait until users had time to expose any serious bugs, and only then consider incremental improvements like Jonathan's suggestion.

But Jonathan didn't agree that the current code would accomplish the same thing, only more slowly. He felt the current code would not resolve the splitting/merging conflict at all. He posted an example, playing out the logic of SeongJae's code, showing how it wouldn't accomplish the same thing as Jonathan's suggestion, even at slower speeds.

And this made sense to SeongJae. The problem, as Jonathan had pointed out, was that if a "hot region" of memory was right in the middle of a given region, it wouldn't be noticed, because the splitting process would split that hot region itself.
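Here is a toy illustration of that case, again using a simplified model rather than DAMON's actual logic. It assumes memory is a row of pages marked hot (1) or cold (0), that a region's access rate is the fraction of hot pages it covers, and that splitting always cuts a region exactly in half. The helper names are made up for this sketch.

```python
def access_rate(pages):
    """Fraction of pages in the region that are hot."""
    return sum(pages) / len(pages)


def halve(pages):
    """Cut a region exactly in half (a simplifying assumption)."""
    mid = len(pages) // 2
    return pages[:mid], pages[mid:]


def split_reveals_hotspot(pages):
    """True if the two halves show different access rates after a split."""
    left, right = halve(pages)
    return access_rate(left) != access_rate(right)


# 100 pages with a 20-page hot area centered in the region (pages 40-59)
centered = [0] * 40 + [1] * 20 + [0] * 40
# The same hot area shifted toward the start (pages 10-29)
off_center = [0] * 10 + [1] * 20 + [0] * 70

print(split_reveals_hotspot(centered))    # False: both halves are 20% hot
print(split_reveals_hotspot(off_center))  # True: 40% hot vs. 0% hot
```

In the centered case, both halves look just like their parent, so a later merge pass would glue them back together and the hot spot would never be isolated.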

So, Jonathan admitted this was a "pathological case," and SeongJae accepted the patch to fix it.

I find discussions like this one interesting, because they illustrate the open source saying, "given enough eyeballs, all bugs are shallow." Jonathan identified a true bug, albeit one that was extremely unlikely to ever be a problem. SeongJae didn't see it at first, but together the two of them were eventually able to produce an improved patch. It's a fun aspect of the open source development model, as is the fact that we all get to watch.

Simplifying(ish) the Kernel Build System

The kernel build system uses GNU Make at its core, but it does not constrain itself to using only GNU Make's standard features. There are a lot of ifs, ands, and buts that go into deciding which code to build into the kernel versus as a module and which options to pass to GCC. It's not your ordinary build system.

Recently, Saeed Mahameed posted a patch to update the build system to handle certain dependencies in a clearer way. He wanted to address cases where one kernel feature, built into the kernel binary, needed to reach another feature that was only built as a loadable module. Technically, both could be considered part of the kernel, but there is still a disconnect: the built-in feature needs to know specific addresses inside the other feature, and those addresses don't exist until the module is loaded at run time.
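The kernel already has C preprocessor helpers for this situation in include/linux/kconfig.h, notably IS_ENABLED() and IS_REACHABLE(). The Python sketch below only mimics their documented truth tables to show the disconnect; the lowercase function names are illustrative stand-ins, not kernel APIs.

```python
# A Kconfig option is "y" (built in), "m" (loadable module), or "n" (disabled).

def is_enabled(option):
    """Mimics IS_ENABLED(): true if the option is built in or a module."""
    return option in ("y", "m")


def is_reachable(option, caller_is_module):
    """Mimics IS_REACHABLE(): true if the option is built in, or if it is a
    module and the calling code is itself being built as a module."""
    return option == "y" or (option == "m" and caller_is_module)


for foo in ("y", "m", "n"):
    print(f"FOO={foo}: enabled={is_enabled(foo)}, "
          f"reachable from built-in code={is_reachable(foo, caller_is_module=False)}")
```

For built-in code whose dependency FOO is set to m, the feature is enabled but not reachable: it exists, yet the built-in code can't call into it.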

Saeed's patch introduced the uses keyword into the build system, so if one feature "uses" another, the build system will know to make sure that the addresses are available.

Essentially, the uses keyword provides an intuitive way to express this type of dependency. Then during processing, the build system replaces it with a freakishly incomprehensible logic statement that means exactly the same thing and which can then be interpreted correctly by the build system.

Arnd Bergmann was happy to see Saeed's patch and confirmed that it worked on his system, for the most part. But Arnd was able to crash the build system with a more complex task, so he submitted a bug report. This prompted Saeed to produce a new version of his patch.

However, Masahiro Yamada didn't like Saeed's patch, or at least felt it was unnecessary. He pointed out that adding the uses keyword didn't provide any new functionality; it simply offered a simpler way to express something that was already possible.

Masahiro said, "It is true that it _hides_ the problems and makes the _surface_ cleaner at best, but the internal will be more complicated." He added that anyone trying to actually understand the uses keyword would still have to delve into the underlying nightmare logic. And for users who failed to understand the keyword, he feared they would start using uses all over the place, even where it wasn't truly appropriate. He concluded, "I do not want to extend Kconfig for the iffy syntax sugar."

You'll notice I've avoided actually showing any of the underlying logic until now. The type of statement being replaced by uses boils down to "X or not X," which in ordinary two-valued logic would mean nothing at all. Nicolas Pitre said it was the equivalent of "depends on X if X," which Masahiro did not feel was a big improvement. In any case, in the language of kernel configuration, he said, the "depends on" construct didn't support taking a conditional like that.

The real problem may be that the logic necessary to decide what to do simply doesn't lend itself to clear or simple statements. Suppose one kernel feature should use a second feature when that feature is built into the kernel, and should still work, gracefully doing without it, when the second feature is disabled entirely, but must not itself be built into the kernel when the second feature exists only as a loadable module. That's not necessarily something that's easy to express with a clear and simple keyword.
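Kconfig's tristate logic explains why "X or not X" isn't a no-op. The kconfig-language documentation treats n, m, and y as 0, 1, and 2, evaluates !x as 2 - x, and evaluates a || b as the maximum of the two. The Python sketch below applies those rules to a hypothetical symbol BAR that "depends on FOO || !FOO"; FOO and BAR are placeholder names.

```python
N, M, Y = 0, 1, 2  # Kconfig tristate values
NAMES = {N: "n", M: "m", Y: "y"}


def k_not(x):
    """Kconfig '!': 2 - x, so !m is still m."""
    return Y - x


def k_or(a, b):
    """Kconfig '||': the larger of the two values."""
    return max(a, b)


def allowed_values(dep_value):
    """A tristate symbol can't be set higher than its dependency's value."""
    return [NAMES[v] for v in (N, M, Y) if v <= dep_value]


for foo in (N, M, Y):
    dep = k_or(foo, k_not(foo))  # depends on FOO || !FOO
    print(f"FOO={NAMES[foo]}: dependency={NAMES[dep]}, BAR may be {allowed_values(dep)}")
# FOO=n: dependency=y, BAR may be ['n', 'm', 'y']
# FOO=m: dependency=m, BAR may be ['n', 'm']
# FOO=y: dependency=y, BAR may be ['n', 'm', 'y']
```

Only the FOO=m case restricts BAR, forcing it to be a module (or off) so it never ends up built in while FOO is a module.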

And so, in the normal course of events, side effects like code being "reachable" come to represent one or another complicated situation. Then those side effects are codified into keywords and paired up with other keywords into apparently logically connected AND/NOT/OR/XOR statements that do what's needed, but God help anyone who really needs to know what's going on.

Thus, the kernel configuration system continues to grow.

In the current debate, one of the main points at issue was how well users (i.e., developers of kernel modules) could be made to understand the keywords that would express their modules' dependencies on other parts of the system. Or at least, how easily they could figure out which configuration text to cut and paste from some other module into their own.

At one point, Nicolas offered one example of this problem when he remarked to Saeed, "I don't dispute your argument for having a new keyword. But the most difficult part as Arnd said is to find it. You cannot pretend that 'optional FOO' is clear when it actually imposes a restriction when FOO=m. Try to justify to people why they cannot select y because of this 'optional' thing."

There's a real benefit to getting this right. If users can't figure out how to express their dependencies correctly, they will end up depending on more than they need to, just to err on the side of caution. That, in turn, increases kernel build times and may bloat the compiled binary with unnecessary code.

Nicolas went on to propose his own solution. He wrote, "saying that 'this is weird but it is described in the documentation' is not good enough. We must make things clear in the first place." He continued:

"This is really a conditional dependency. That's all this is about. So why not simply making it so rather than fooling ourselves? All that is required is an extension that would allow:

depends on (expression) if (expression)

This construct should be obvious even without reading the doc, is already used extensively for other things already, and is flexible enough to cover all sort of cases in addition to this particular one."
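Under one plausible reading, assumed here rather than taken from Nicolas's patch, "depends on A if B" would mean "depends on A || !B": no dependency when B is n, and an ordinary dependency on A when B is y. The sketch below uses the same tristate encoding as the kconfig-language documentation (n=0, m=1, y=2, !x = 2 - x, || = max) to check that this reading turns "depends on FOO if FOO" into the same result as "FOO || !FOO". The function names are hypothetical.

```python
N, M, Y = 0, 1, 2  # Kconfig tristate values


def k_not(x):
    return Y - x


def k_or(a, b):
    return max(a, b)


def depends_on_if(a, b):
    """Assumed meaning of 'depends on A if B': A || !B."""
    return k_or(a, k_not(b))


# 'depends on FOO if FOO' matches 'depends on FOO || !FOO' for every FOO value
for foo in (N, M, Y):
    assert depends_on_if(foo, foo) == k_or(foo, k_not(foo))

print(depends_on_if(N, N))  # 2 (y): condition B is off, so no restriction
print(depends_on_if(N, Y))  # 0 (n): condition B holds, and A is off
```

The appeal of the spelling is that the condition is stated where a reader can see it, instead of being encoded in the tristate quirk that !m equals m.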

At this point Jani Nikula said he agreed with Nicolas's proposal and asked for an implementation. Nicolas sent in a patch, and Randy Dunlap said he preferred this over other proposed solutions.

The conversation then turned to one particular special case that various people wanted to fold into Nicolas's patch, and it petered out at around this point.

When I see debates like this, where it's obvious that everyone is struggling to make sense out of the nearly nonsensical, it gives me a strong sense of appreciation for the people choosing to bang their heads against these particular rocks. Eventually the kernel build system will be clean and beautiful and easy to use, but that day will only come because of the psychotic determination of obsessive lunatics like Nicolas, Saeed, and the rest.

Detecting Firmware Crashes

Luis Chamberlain wanted to make it easier for Linux to handle bad firmware. He said, "Device driver firmware can crash, and sometimes, this can leave your system in a state which makes the device or subsystem completely useless. Detecting this by inspecting /proc/sys/kernel/tainted instead of scraping some magical words from the kernel log, which is driver specific, is much easier. So instead provide a helper which lets drivers annotate this."
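The /proc/sys/kernel/tainted file holds a single integer bitmask, one bit per taint reason. The sketch below decodes it using the flag letters documented in the kernel's tainted-kernels admin guide for bits 0 through 17. It deliberately doesn't assume which bit or letter Luis's proposed firmware-crash flag would use, so any bit beyond that table is shown as "?". The function names are hypothetical.

```python
from pathlib import Path

# Letters for taint bits 0-17 (P, F, S, R, M, B, U, D, A, W, C, I, O, E, L, K, X, T);
# newer kernels define additional bits, which this table doesn't cover.
TAINT_LETTERS = "PFSRMBUDAWCIOELKXT"


def decode_taint(value):
    """Return (bit, letter) pairs for every taint bit set in `value`."""
    return [
        (bit, TAINT_LETTERS[bit] if bit < len(TAINT_LETTERS) else "?")
        for bit in range(value.bit_length())
        if (value >> bit) & 1
    ]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask (Linux only)."""
    return int(Path(path).read_text())
```

On a Linux system, decode_taint(read_taint()) lists the flags currently set; a value of 0 means the kernel is untainted.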

Luis's helper function actually does the magical scraping itself and simply puts the results into the /proc file. So the goal is to save everyone else from having to do it. But this means that Luis's code has to know how to scrape the truth out of every single device driver in the kernel.

As of his post, he had covered the device drivers that have names starting with Q.

He pointed out that one of the motivations for this work was not only to make it easier for Linux to detect these crashed firmwares, but also to make it easier to support users in general, by ruling out firmware as the cause of various bug reports.

Kees Cook loved Luis's code. But instead of just a single patch, he suggested splitting the entire project into separate patches for each maintainer – that way each patch could be reviewed by the appropriate people, instead of potentially getting hung up waiting for some maintainers, while others responded more quickly.

Daniel Vetter and Rafael Aquini agreed with splitting up the patch, and Luis said he'd give it a shot.

Steven Rostedt also said he liked Luis's work.

So there seemed to be universal approval. Aside from practical changes like splitting the patch, there were some technical comments, but no one objected to the project itself or the implementation.

The hardest part of the project, ultimately, may just be maintaining all the scraping code, so that crash detection continues to be accurate as each driver continues to develop. But it's possible that maintaining crash detection will become just another part of each maintainer's responsibilities, so that Linus may start refusing driver updates from maintainers who don't maintain good corresponding crash detection code. But that's just a guess.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
