Zack's Kernel News
Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.
No More Worlds to Conquer?
An interesting aspect of open source software is that pretty much everything is public. When Linux first arrived in the early 1990s, a common debate was how to deal with Microsoft inevitably bringing the hammer down. But the debate had to happen entirely in public, with Microsoft fully aware of the Linux community's entire strategy. Imagine getting into a street fight where the big bully knows with certainty everything you're going to do before you do it. You'd better pick the right moves, right? Thirty years later, voila! The Linux community picked the right moves and won the world (except the pesky desktop, but that's a different story).
But when everything you say is public, you don't always have the luxury of appearing to everyone as the infallible deity of operating system development and world domination. Sometimes you're just wrong about something, and sometimes you yourself are the one who discovers that you were wrong. And sometimes … you discover you were wrong about the thing you thought you were wrong about. And everyone gets to see.
Recently, Linus Torvalds responded to a patch from Masahiro Yamada. These were a bunch of updates to the Kconfig build system. So, not necessarily anything regular distro users would encounter, but definitely something that every single kernel developer would use over and over again.
Linus actually accepted the patch, but he complained about a slowdown in the build system. Nothing that would stop him from taking the patch, but something he wanted to see fixed eventually. In particular, Linus generally ran the make allmodconfig command as part of his standard routine, to make sure the kernel and every module compiled okay before pushing out a new release. But he noticed that this command took longer to run than expected. On his machine, he expected it to be instantaneous; instead, it took a very, very small amount of time.
Having taken a look into the source tree, Linus reported that the files in the scripts/kconfig/conf directory were being recompiled each and every time, regardless of whether they'd been compiled already. That defeats one of the lovely features of building compiled languages with a tool like make – if a source file doesn't change, you don't need to recompile it. You just reuse the existing binary and recompile only the small bits that are different.
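That incremental behavior can be seen with a toy example (all file names here are hypothetical, not the kernel's): make compares timestamps and re-runs a rule only when a prerequisite is newer than its target.

```shell
# Minimal sketch of make's incremental build behavior (file names are
# hypothetical). The rule runs only when input.txt is newer than output.txt.
demo_dir=$(mktemp -d)
cd "$demo_dir"

printf 'source data\n' > input.txt
printf 'output.txt: input.txt\n\tcp input.txt output.txt\n' > Makefile

make    # first run: executes "cp input.txt output.txt"
make    # second run: reports that output.txt is up to date
```

If make re-runs the rule every time anyway – as Linus thought was happening – something is wrong with how the prerequisites are declared or generated.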
Linus's obsessive/compulsive hindbrain needed to know why these files were being recompiled each time. By running the make --trace allmodconfig
command he saw that a bunch of files were being reported as not existing, which he knew to be false.
Linus said, "Yeah, I realize I'm being silly. Doing a 'time make allmodconfig' shows that it takes 1.5s to do. Should I care? No. But I feel that's an eternity for something that I think should just be instantaneous."
Within five minutes Linus responded to himself, saying that his previous evaluation had been a red herring. In fact, he said, the files reported as not existing were just misleading output from make. So he was still curious about why this slowdown was happening, but he no longer felt he understood exactly what the problem was.
The next day Linus replied to himself again, saying, "… and that red herring was what made me think that it always recompiles the 'conf' binary. But no, that's not what is going on. Profiling shows that it does spend a lot of time in the compiler (which was the other thing that made me incorrectly think it was the conf program getting recompiled every time)."
He posted some profile statistics, showing that most of the lost 1.5 seconds was simply testing the various options before compiling. He remarked, "Oh well, I clearly misread the problem. Maybe 1.5s is more reasonable than I really expected it to be."
But this just didn't sit well with him. That 1.5 seconds gnawed at Linus's brain like a vampiric zombie.
He replied again to himself, pointing out that a third of the profiled time was spent on a single invocation of the cc1plus back end of the GCC compiler, whose sole purpose was to verify that GCC plugins would work.
Masahiro came back into the discussion at this point. He confirmed what Linus had already said: The conf
binary was not, in fact, recompiled every time. And he confirmed Linus's profiling statistics for the cc1plus
invocation. He remarked, "Actually, I did not know this shell script was so expensive to run…." He added, "Even if we are able to manage this script somehow, Kconfig invocation still takes more than 1 sec due to the current design."
But now Linus, with the parasitic vampire zombie still firmly attached to his hindbrain, pointed out:
"So it turns out that one reason it's so expensive to run is that it does a *lot* more than it claims to do.
"It says 'we need a c++ compiler that supports the designated initializer GNU extension', but then it actually includes a header file from hell, rather than just test designated initializers."
In other words, instead of simply verifying GCC plugin support, cc1plus
was hauling in a monstrous block of code. And that's what was taking so long.
Linus posted a patch that brought the 1.5-second execution time down by a lot. However, he added, "I'm doubtful we really want gcc plugins at all, considering that the only real users have all apparently migrated to clang builtin functionality instead."
At this point – no, the story's not quite finished – Kees Cook pointed out that, in fact, the whole test for GCC plugin support was only needed by ancient GCC versions that the Linux kernel no longer supported, and that it would be perfectly acceptable to drop the test entirely, eliminating Linus's 1.5-second slowdown for good.
Furthermore, Kees said, "As for dropping GCC plugins entirely, I'd prefer not – the big hold-out for the very paranoid system builders is the randstruct plugin (though they tend to also use the entropy one too). Clang's version of randstruct has not gotten unstuck yet."
And Linus replied:
"Yes.
"It sounds like we might be able to delete the build test entirely if we just always expect to have a recent enough gcc.
"Testing the headers for existence would presumably still be needed, just to verify 'do we have plugin support installed at all'.
"But I'm not planning on applying this directly – I find the config overhead to be a bit annoying, but it's not like it is _objectively_ really a problem. More of a personal hangup ;)"
So there you have it. A completely unimportant bug that received a fair number of person-hours of attention – a series of red herrings that ultimately revealed the true problem to be something that actually didn't exist anymore.
Linus, on behalf of us all, I thank you for your obsessive/compulsive hindbrain, and the vampiric zombie monster beasties that feed on it and generate these personal hangups that inspire you to examine every square centimeter of kernel code.
Saving the World, One Graphics Card at a Time
Dmitry Osipenko was concerned about certain Linux systems "becoming burning hot" even while the system was idling. In particular, he said, some NVidia graphics cards were getting just a little too much voltage. Ideally, the Linux power management features would regulate the voltage allocations a bit more conservatively, giving just the minimum required voltage to each different chip, depending on its particular needs and level of activity.
Dmitry posted a bunch of patches to implement voltage scaling for some NVidia Tegra chips, which he had tested on a number of running systems. In these tests, he had seen temperatures drop by five degrees on some systems. He suggested that it would be fairly straightforward to extend these voltage scaling features to other chips, as needed.
There were some technical comments. Michal Miroslaw noticed some code duplication, and Ulf Hansson felt that Dmitry hadn't located the fix in the best spot in the kernel interface.
Viresh Kumar also spotted a bug in Dmitry's code. But he disagreed with Ulf's criticism of Dmitry's choice of interface. He remarked, "if the hardware (where the voltage is required to be changed) is indeed a regulator and is modeled as one, then what Dmitry has done looks okay. i.e. add a supply in the device's node and microvolt property in the DT entries."
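What Viresh describes might look roughly like the following device tree fragment, pairing a regulator supply in the device's node with per-frequency microvolt values. Every name and number here is illustrative only, not the actual Tegra bindings or Dmitry's patches:

```dts
/* Hypothetical device tree fragment: node names, the supply name, and
 * all values are illustrative only. */
gpu@57000000 {
        core-supply = <&vdd_core>;               /* supply in the device's node */
        operating-points-v2 = <&gpu_opp_table>;
};

gpu_opp_table: opp-table {
        compatible = "operating-points-v2";

        opp-76800000 {
                opp-hz = /bits/ 64 <76800000>;
                opp-microvolt = <800000>;        /* minimum voltage for this rate */
        };
};
```

With a table like this, the power management code can pick the lowest voltage that is sufficient for the chip's current clock rate, which is exactly the conservative allocation Dmitry was after.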
Ulf backed down, saying, "I guess I haven't paid enough attention how power domain regulators are being described then. I was under the impression that the CPUfreq case was a bit specific – and we had legacy bindings to stick with".
However, Viresh was not certain about his position either, remarking elsewhere, "I am also confused on if it should be a domain or regulator, but that is for Ulf to tell."
In fact, all of these folks, and others, joined with Dmitry in a general-purpose design/implementation discussion, trying to figure out the best shape and structure of Dmitry's proposed voltage scaling feature. Everyone seemed to agree that this would be good – especially given the improved temperature results – but it was clear that Dmitry's code was a first attempt, intended to generate exactly this sort of discussion.
The discussion itself ended in the middle, with patches flying madly about and more work on the way.
But personally, I love seeing patches like this. Imagine the energy savings they represent when applied to over a billion running Linux systems. It's an excellent moment in history for this sort of patch.
How to Train Your Kernel Developer
Julia Lawall reported that the official Linux kernel version 5.10 failed to boot on her organization's Intel Xeon CPU E7-8870 v4 @ 2.10GHz server. She added that Linux v5.9 and earlier worked fine. She posted some backtrace data to help diagnose the problem.
Linus Torvalds's quaint suburban house spun around three times and steam began rising from the rooftop.
Linus asked if the problem started with the -rc1 version (i.e., the first release candidate after the 5.9 release). And he requested, "Could you try bisecting – even partially? If you do only six bisections, the number of suspect commits drops from 15k to about 230 – which likely pinpoints the suspect area."
Bisecting is a very cool bug-hunting technique based on a binary search. You've got a known-working and a known-broken version, so you test out the version directly in the middle of those two versions. Whatever the result of your test, you've just eliminated 50 percent of all the patches that might be the culprit. Then you do the same thing again and keep doing it until you find the patch that actually caused the problem. As Linus said, just a few bisections will rapidly reduce the suspicious area from "everywhere" to "right here."
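The whole procedure can be sketched end to end with a toy repository, using git bisect run to automate the good/bad test. Everything here is illustrative; in Julia's case the "test" at each step was building and booting a kernel:

```shell
# Build a toy repo where commit 5 introduces a "bug" (the word BROKEN in
# a file), then let git bisect binary-search for the first bad commit.
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email demo@example.com
git config user.name demo

for i in 1 2 3 4 5 6 7 8; do
    if [ "$i" -ge 5 ]; then echo BROKEN > state; else echo OK > state; fi
    git add state && git commit -q -m "commit $i"
done

git bisect start HEAD "$(git rev-list HEAD | tail -n 1)"  # bad=HEAD, good=first commit
git bisect run grep -q OK state     # exit 0 means "good", nonzero means "bad"
first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format=%s "$first_bad" # the commit that introduced BROKEN
git bisect reset
```

With eight commits, bisect needs only about three test steps to isolate the bad one – the same logarithmic payoff Linus was counting on when he said six bisections would cut 15,000 suspects down to about 230.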
Linus also looked at Julia's backtrace data and said that there seemed to be some breakage in the SCSI device scanning code – though he couldn't be certain if that breakage was causing Julia's problem.
Julia confirmed that yes, the -rc1 release also had the problem, thus eliminating from consideration any patches that were added between -rc1 and the official 5.10 release. She also agreed to do a bit of bisecting to try to further narrow down the bad patch.
Martin K. Petersen said that the only SCSI change at that time had gone into the -rc2 release, not -rc1. This put the SCSI change out of the running as the cause of Julia's problem. But he said he'd dig further and see what he found.
Linus confirmed that he'd also found the -rc2 SCSI patch and "dismissed it for the same reason you did – it wasn't in rc1. Plus it looked very simple." He suggested, "maybe Julia might have misspoken, and rc1 was ok, so I guess it's possible."
But Julia replied definitively to Linus, saying, "rc1 was not ok. I just started rebasing and the first step was not ok either." She also posted the dmesg
file from a successful boot of Linux 5.9 to aid in the bug hunt.
Linus looked at the dmesg
file and said:
"Ok, from a quick look it's megaraid_sas.
"The only thing I see there is that commit 103fbf8e4020 ('scsi: megaraid_sas: Added support for shared host tagset for cpuhotplug').
"Of course, it could be something entirely unrelated that just triggers this, but I don't think I've seen any other reports, so at a first guess I'd blame something specific to that hardware, rather than some generic problem that just happens to hit Julia."
Without confirming that megaraid_sas
was in fact the real problem, John Garry reported that in that same area, "we did have an issue reported here already from Qian about a boot hang."
John suggested that Julia revert the megaraid_sas
patch and try a test boot.
And Linus said:
"Julia – if it's that thing, then a
git revert 103fbf8e4020
"would be the thing to test."
Linus also said that based on this info from John, megaraid_sas
did seem like a probable culprit.
And Julia replied, "This solves the problem. Starting from 5.10-rc7 and doing this revert, I get a kernel that boots."
Martin thanked Julia for testing and said he'd revert that change for the next official release.
In fact, the patch would be reverted, but then re-added after some more patches from Ming Lei that were expected to fix the underlying issue. John sent a link to Julia, asking her to test Ming's patches as they stood, to see if the boot problem recurred. Julia did so and confirmed, "5.10-rc7 plus these three commits boots fine."
If anyone ever writes a textbook about how to report a kernel bug and make sure the Linux developers look at it, this incident will be the first example given. Julia started off by reporting her hardware and the kernel version in question, and she included a pile of relevant output that could be used to trace the problem. It didn't hurt that she was reporting a bug in an official release, rather than just a release candidate.
Since her system was known to be able to reproduce the problem, she helped out by bisecting kernel versions to help narrow down the search. She reverted patches and tested alternate versions of the kernel, including testing new patches that were slated to go into the next official release. At every step along the way, she posted more and more data from various boot attempts, including those from the broken kernel as well as from known working versions.
The kernel developers, for their part, thanked Julia for each of those steps, and offered easy-to-follow guidance on what she could do to add further information. It's obvious that they were super happy to have such good bug reportage and wanted to encourage more of the same.
Infos
- The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.