Zack's Kernel News
Zack Brown reports on communicating with Linux during bootup, pruning SuperH, and bug hunting for Steam.
Communicating with Linux During Bootup
The open source development model is geared to maximize everyone's contribution, based on two assumptions: If you need something, it's probably a good idea to implement it, and if you implement it, there are probably other developers interested in helping you get your patch into an acceptable state. So the amount of wasted work is probably very low, though I'm not sure how to measure that precisely.
It's definitely the case that sometimes you can put in a lot of work, only to discover that other developers are simply opposed to whatever it is that you need.
Masami Hiramatsu recently tried to extend DeviceTree to pass debugging configuration information into the kernel. This could also be done on the kernel command line, but as Masami himself pointed out, "the kernel command line is a command line." It's not the ideal format for passing structured data.
His concern was that the command line actually didn't support passing all the information he wanted to pass. So, why not look for something else? And since DeviceTree already existed for the express purpose of passing structured data into the kernel, making a small extension for this use case seemed like a perfect fit.
He posted his patch in good faith and immediately ran into a fairly common syndrome in Linux kernel development: Everyone agrees that a better but harder solution is needed, so no one is allowed to do anything in that region of code except implement the better solution.
No easy hacks allowed! Frank Rowand replied to Masami's initial patch submission, pointing out some of the problems. DeviceTree was not something to be extended on a whim. It was a community standard with many corporate stakeholders. And when it comes to corporate stakeholders, you are likely to see each of them salivating to add their own favorite extensions in order to hit some short-term milestone or other. Even though the official standard was only version 0.2, it was still something that needed careful thought and policing.
Frank explained that DeviceTree existed to describe hardware to the kernel that could not otherwise be discovered. It was not a configuration tool. Its purpose was specifically to free the kernel from having to be hard-coded with tons of hardware descriptions and instead to pass that data in during bootup.
The possibility of extending DeviceTree in the way Masami wanted had actually been discussed and rejected at the Linux Plumbers Conference in 2015, Frank said.
And this was the problem – lots of people wanted to pass configuration and other data into the kernel at boot time, using something better than the command line. But, Frank said, DeviceTree was not the solution. Yes, it was a mechanism for passing data into the kernel; but it had a specific and inviolable purpose. If Masami and others wanted to pass configuration data and other forms of data into the kernel at boot time, they would have to develop a full-featured and generic solution of their own. They couldn't extend DeviceTree.
So there you have it: A policy statement – something that might be overridden eventually. But for the moment at least, that was Frank's position, and the position of the main DeviceTree stakeholders.
But no one ever takes it lying down, the quiet wheel never gets the grease, and you don't find out if a standard is truly locked down if you don't do your best to bust it open.
To some extent, Masami simply pleaded. It seemed obvious to him that the command line was a terrible data conduit, and that standards existed to be extended. He knew already that DeviceTree was not currently used for configuration data, but he said, "Can't we talk about some future things?" And when he offered up as an unrealistic suggestion the possibility of encoding large blobs of data in some sort of ASCII-safe format and including it on the command line, he was surprised when Frank said this would be acceptable – and certainly preferable to extending DeviceTree.
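As a rough illustration of the idea Masami floated, structured data could be packed into an ASCII-safe string and appended to the command line. The sketch below is only that: the parameter name `bootcfg` and the choice of JSON plus URL-safe base64 are assumptions made for this example, not anything proposed in the discussion, and the kernel does not parse such a parameter.

```python
import base64
import json

# Hypothetical parameter name, chosen for illustration; not a real kernel option.
PARAM = "bootcfg"


def encode_param(config: dict) -> str:
    """Serialize config to compact JSON, then to URL-safe base64."""
    raw = json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    blob = base64.urlsafe_b64encode(raw).decode("ascii")
    return f"{PARAM}={blob}"


def decode_param(cmdline: str) -> dict:
    """Find the hypothetical parameter on a command line and decode it."""
    for token in cmdline.split():
        # Split on the first '=' only, so base64 padding stays in the value.
        key, sep, value = token.partition("=")
        if key == PARAM and sep:
            return json.loads(base64.urlsafe_b64decode(value))
    raise KeyError(PARAM)
```

The encoded blob contains no whitespace, so it survives as a single command-line token. It also shows why the approach is awkward: The result is unreadable to humans, and the command line has a limited, architecture-dependent length.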
But Frank really wanted "people to come up with an additional boot-time communication channel or data object to support this use case. So far, no serious proposal that I am aware of."
It was unpleasant. Masami's initial patch had been 500 lines of code, while implementing a general-purpose boot-time communication channel would be some kind of ungodly amount of code, with many subtleties, security considerations, bizarre-yet-crucial use cases, and so on.
But there were other reasons not to use DeviceTree. Rob Herring said that not all bootloaders supported it; for Masami's purpose, it would definitely be best to use a mechanism from which everyone else could benefit. Rob said he would actually be fine with extending DeviceTree, if it could also be extended to work with all bootloaders.
Eventually Masami did abandon his patch in favor of trying to develop a new communication channel, as Frank had originally suggested.
Not all developers are as willing to change direction. Richard Gooch maintained DevFS for a very long time as a separate patch from the kernel, but was never able to win over the other kernel maintainers, particularly Greg Kroah-Hartman, who supported his own udev project. I don't think we can generalize from Masami's experience to say that nowadays disputes over technical direction are handled more agreeably, but it's nice to see it when it happens.
Pruning SuperH
Where oh where has SuperH gone – the RISC chip for embedded systems that came out in the 1990s? Recently Christoph Hellwig pointed out that the maintainers of the Linux port were not so responsive, and there hadn't been any pull requests coming from that project in quite a while. He suggested that the architecture was dead and could just be removed wholesale from the kernel.
This kind of bell-ringing is always only partly a genuine suggestion to ditch a kernel feature – it's also intended to alert any users of that feature that they should speak up quickly. If a feature has users, it won't be removed no matter how old it is. But if no one speaks up for it, then out the window it goes.
In fact, it's true that even very old features will be kept if they still have users. But it's also true that even a very popular feature will be removed if it constitutes a security hole. Security is always the top priority, even to the point of accepting massive slowdowns and other inconveniences. That's one of the differences between Microsoft's development philosophy and that of the open source community. Microsoft will allow long-term security flaws in their operating system and software, just because users like those features. Linux won't.
But security issues were not one of the things pushing SuperH out the door. It just seemed to Christoph that the code was rotting, and there were no users.
John Paul Adrian Glaubitz was the one who came to SuperH's rescue, saying that the architecture – at least version 4 of the chip – was indeed still available as a Debian package and was most definitely still in use. Maybe the code was getting old, but the compiled kernels would still run.
There was some skepticism. Adam Borowski suggested that merely being a Debian package didn't mean anyone used it. He pointed to the Debian Popularity Contest package, which showed a bottom-of-the-barrel score for SuperH; and he said he hadn't heard anyone talking about SuperH on any of the Debian mailing lists.
But John Paul proved his point, giving a link to a recent bug report from an industry user at Dell [1].
So that settled it. No security issues plus an active user meant that some support for SuperH would remain in the official Linux kernel. But that didn't mean absolutely all of SuperH had to remain. Arnd Bergmann in particular suggested that the SuperH maintainers should go through the code and identify which parts were truly necessary and which could go. Obviously SuperH v4 had users, so it would stay. But maybe other versions (he mentioned version 5) could be removed, since those parts "don't build, or are incomplete and not worked on for a long time."
Rich Felker and Yoshinori Sato were the official SuperH maintainers. Rich came into the discussion now, saying that version 5 could definitely be removed. It had already lost support in the C compiler, and he seemed to recall that the hardware chip itself had never really been available. Rich could also see a bunch of other infrastructure-related code that was probably broken – if it had ever run at all.
Rich also said that he had originally maintained the architecture as part of his job, but as his situation had changed, he was no longer being paid to work on it. So his motivation had declined in favor of other projects. But he said he still wanted to be involved in any remaining team of SuperH maintainers. In fact, he had some patches queued up and ready that he just hadn't gotten around to, because they'd require fixing other parts of the code that had also rotted.
Yoshinori, meanwhile, said he still had some actual SuperH hardware, specifically versions 2, 2A, and 3, and he'd be happy to update the code for those versions.
So SuperH will get a little love, even though only a handful of people still use it. Dead code will be removed. Living code will be fixed and stabilized, at least to the point of being easily maintainable over the long term.
I almost said that eventually the last users would move on, and the architecture would be removed at last. But this is not certain at all. You can never tell what ancient hardware will develop a hobbyist following and live on through the years, even coming to exist only as a software simulation with a modern kernel running both below (on the host system) and above (on the simulated hardware). Maybe SuperH will become one of those some day and remain in the official kernel tree for decades to come.
Bug Hunting for Steam
Linus Torvalds sometimes involves himself in tracking down a specific bug, either because it's code he particularly cares about, or for some other reason. Recently he helped track down a bug in the official 5.1.11 kernel release, because it affected a large group of users – all those running Valve's Steam platform for gaming.
It's unusual to have such a significant breakage slip through the release candidate process, but it does happen. This time, Pierre-Loup A. Griffais reported a lot of discussion traffic on GitHub and reddit, from users who couldn't run Steam on Linux 5.1.11.
The entire bug hunt, including pushing the fix out to both the development and stable Linux kernels, took less than two days. Linus shepherded the process along himself and just by his presence was able to ensure that every step went as quickly as possible.
At first, Greg Kroah-Hartman, the stable kernel maintainer, asked for more information, specifically whether Pierre-Loup had been able to bisect the kernel git repository to identify the exact patch that caused the problem.
But this wasn't as easy to do as it might seem. Linus pointed out that not everyone was experiencing the problem, and some experienced it only intermittently. This suggested it was a timing issue. And if the bug was intermittent, it would be harder to find a consistent way to reproduce it during the bisection process.
Since the advent of git, bisecting the tree has been one of the primary ways kernel developers identify the origins of bugs in the code. It's an incredibly convenient technique and, in many cases, can even be automated.
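The core of bisection is a binary search over the commit history. The following is a minimal sketch of that idea, not of git's actual implementation: It assumes a simple linear list of commits ordered oldest to newest, whereas `git bisect` works on the full commit graph. The helper names `first_bad` and `flaky_is_bad` are made up for this example. The second helper illustrates one way to cope with an intermittent bug like the one Linus described, by repeating the test several times per commit.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad]


def flaky_is_bad(run_once: Callable[[str], bool],
                 attempts: int = 5) -> Callable[[str], bool]:
    """Wrap a test for an intermittent bug: bad if any attempt reproduces it."""
    def check(commit: str) -> bool:
        return any(run_once(commit) for _ in range(attempts))
    return check
```

Each step halves the remaining range, so even thousands of commits need only a dozen or so tests. With an intermittent bug, a "good" verdict only means the bug did not show up in the given number of attempts, so a wrong answer is still possible. In practice, `git bisect run` automates the search by running a test command at each step.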
In this case, the problem turned out to be a security fix from Eric Dumazet. This complicated things slightly, because you can't simply abandon a security fix. Eric's patch prevented a denial-of-service attack. Without the patch, a hostile user could trigger a kernel panic by overloading the system with networking requests. But the patch also made it so Steam would no longer work.
In fact, it turned out that Eric's patch could be replaced by something much simpler – and that wouldn't break Steam – if the patch simply did nothing but tackle the security flaw and left out some of its other subtle details. And so the new fix went into both Linus's tree and Greg's and was made available to users right away.
It's interesting, because the pace of the bug hunt was so rapid. During normal development, with release candidates coming out once in a while, no one worries about getting fixes into the tree as soon as possible, because only kernel developers use release candidates. The fact that this bug affected an official kernel release meant that real users would be experiencing real problems. In that situation, Linus will always try to move as fast as possible, consistent with actually getting the fix right.
Infos
[1] SuperH bug report: https://marc.info/?l=linux-sh&m=155170489401832