Zack's Kernel News

Article from Issue 247/2021

Chronicler Zack Brown reports on: "Welcoming" a New Kernel Developer; and An Ancient Feature Goes Belly Up.

"Welcoming" a New Kernel Developer

Amy Parker, a newcomer to Linux kernel development, had an idea for a new filesystem. Ideally, she said, "once it's completed, rich, and stable I'd try to get it into the kernel." She asked what would be involved in such a process.

Andreas Dilger welcomed her and offered a few words of caution (a.k.a. doom 'n' gloom). First of all, he said, a new filesystem would need to offer users something unique. If it only did something that other filesystems already did well, there wouldn't be a need to include it in the source tree.

He added that filesystems had a particularly onerous burden of reliability, since users relied on them for their very lives. Unlike many software problems that could be fixed with a simple reboot, he said, if a filesystem lost user data, there was no way home. In light of this, Andreas added, "the general rule of thumb is 10 years before a new filesystem is stable enough for general use."

Because of this, Andreas suggested that instead of writing a whole new filesystem, it could sometimes make more sense to take whatever idea Amy had in mind and add it to an existing filesystem, if that would be a good enough solution. He said, "Otherwise, users would have to stop using their existing filesystem before they started using yours, and that is a very slow process, because your filesystem would have to be much better at *something* before they would make that switch."

In terms of Amy's actual question about the process for submitting a new filesystem for consideration, Andreas said the first step would probably be to describe her idea and see if there were any existing filesystems that would be a better fit for those features.

Finally, after sufficient doom 'n' gloom had been dispersed, he concluded, "Note that I don't want to discourage you from participating in the Linux filesystem development community, but there are definitely considerations going both ways wrt. [with regards to] accepting a new filesystem into the kernel. It may be that your ideas, time, and efforts are better spent in contributing to an existing project. It may also be that you have something groundbreaking work, and I look forward to reading about what that is."

Amy thanked him for his feedback and said she'd think about all those things.

Meanwhile, Randy Dunlap suggested that Amy shouldn't wait until her code was truly finished, but should release patches as she developed them, using a "release early, release often" philosophy. And Chaitanya Kulkarni suggested submitting patches and an overall description of the project as a Request For Comments (RFC), in order to get the discussion going.

Theodore Ts'o had some suggestions of his own:

"File systems are also complicated enough that it's useful to make the patches available via a git repo, and it's highly recommended that you are rebasing it against the latest kernel on a regular basis.

"I also strongly recommend that once you get something that mostly works, that you start doing regression testing of the file system. Most of the major file systems in Linux use xfstests for their testing. One of the things that I've done is to package up xfstests as a test appliance, suitable for running under KVM or using Google Compute Engine, as a VM, to make it super easy for people to run regression tests. (One of my original goals for packaging it up was to make it easy for graduate students who were creating research file systems to try running regression tests so they could find potential problems – and understand how hard it is to make a robust, production-ready file system, by giving them a relatively well documented, turn-key system for running file system regression tests.)"

And he concluded:

"The final thing I'll point out is that file system development is a team sport. Industry estimates are that it takes between 50 and 200 person-years to create a production-ready, general purpose enterprise file system. For example, ZFS took seven years to develop, starting with a core team of 4, and growing to over 14 developers by the time it was announced. And that didn't include all of the QA, release engineering, testers, performance engineers, to get it integrated into the Solaris product. Even after it was announced, it was a good four years before customers trusted it for production workloads.

"If you look at the major file systems in Linux: ext4, xfs, btrfs, f2fs, etc., you'll find that none of them are solo endeavors, and all of them have multiple companies who are employing the developers who work on them. Figuring out how to convince companies that there are good business reasons for them to support the developers of your file system is important, since in order to keep things going for the long haul, it really needs to be more than a single person's hobby."

Matthew Wilcox also had some advice of his own to offer. He said:

"Writing a new filesystem is fun! Everyone should do it.

"Releasing a filesystem is gut-churning. You're committing to a filesystem format that has to be supported for ~ever.

"Supporting a new filesystem is a weighty responsibility. People are depending on you to store their data reliably. And they demand boring and annoying features like xattrs, acls, support for time after 2038.

"We have quite a lot of actively developed filesystems for users to choose from already – ext4, btrfs, xfs are the main three. So you're going to face a challenge persuading people to switch.

"Finally, each filesystem represents a (small) maintenance burden to people who need to make changes that cross all filesystems. So it'd be nice to have a good justification for why we should include that cost.

"Depending exactly what your concept is, it might make more sense to make it part of an existing filesystem. Or develop it separately and have an existing filesystem integrate it."

Matthew concluded with some extra-doomy doom 'n' gloom, saying, "Anyway, I've been at this for twenty years, so maybe I'm just grouchy about new filesystems. By all means work on it and see if it makes sense, but there's a fairly low probability that it gets merged."

Rather than running for the hills, as I myself was at that moment doing out of sheer sympathy, Amy replied, "I'm bored and need something to dedicate myself to as a long-term commitment."

She thanked everyone for their advice. And since multiple people had suggested looking for existing filesystems to merge her idea into, she said she'd explore that possibility.

In response to Ted's suggestion that she should set up a Git repository, Amy replied that she had already been setting up the infrastructure for that.

As for Ted's further suggestion that Amy plan on doing some regression testing, Amy laughed into her sleeve, baiting him with a quote that actually came from Linus, "Regression testing? What's that? If it compiles, it is good; if it boots up, it is perfect." Though she immediately followed up with, "In all seriousness, though, yeah, already been planning for stuff like that."

And she remarked that she was already familiar with Ted's xfstests tool and had used it on a previous project.

And that was the end of the discussion.

So there you have it. Gone are the days of Linus Torvalds welcoming all comers with open arms, saying, "absolutely a filesystem would be a marvelous project, and here is the process for submitting patches; thank you for joining the community!"

Now it's, "hi, your project is utterly unrealistic, whatever it is, but we encourage you to give it a try anyway, sort of, not really, and in any case we're all so burnt out and bitter that we are sort of just speaking on autopilot. Welcome, whoever you are. The exit's over that way."

What conclusions can we draw? Is it possible that half-a-dozen big-time kernel hackers were simultaneously having a really bad day? Has COVID-19 fatigue caused a certain amount of brain atrophy or just outright depression? Or could it really be true that a newcomer can be told, sight-unseen, that their idea probably isn't really anything special and that they'd probably be better off working on someone else's project?

An Ancient Feature Goes Belly Up

Way back in September, "when the grass was still green and the pond was still wet and the clouds were still clean" (apologies to The Lorax), Linus Torvalds wrote, submitted, accepted, and applied a patch to remove the VGA soft scrollback feature from the Linux kernel.

VGA soft scrollback is what lets you scroll the console to see fleeting kernel messages as they flow past during bootup or crash down. You just hit Shift and Page Up to see whatever messages have scrolled past the top of the monitor.

Linus explained that VGA soft scrollback, "turns out to have various nasty small special cases that nobody really is willing to fight. The soft scrollback code was really useful a few decades ago when you typically used the console interactively as the main way to interact with the machine, but that just isn't the case any more."

Randy Dunlap said that with this patch going in, it should also be possible to remove the soft scrollback documentation at the same time.

Linus also clarified the situation somewhat:

"Note that scrollback hasn't actually gone away entirely – the original scrollback supported by _hardware_ still exists.

"Of course, that's really just the old-fashioned text VGA console, but that one actually scrolls not by moving any bytes around, but by moving the screen start address. And the scrollback similarly isn't about any software buffering, but about the ability of moving back that screen start address.

"Do people use that? Probably not. But it wasn't removed because it didn't have any of the complexities and bitrot that all the software buffering code had.

"That said, I didn't check how much of the documentation is for the VGA text console, and how much of it is for the actual software scrollback for fbcon etc. So it is entirely possible that all the docs are about the removed parts."
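For readers who have never met start-address scrolling, here is a toy model of the idea Linus describes. It is only a sketch under loud assumptions: the class name, the row counts, and the policy of copying the last visible rows back to the top when memory runs out are all invented for illustration and are not taken from the kernel's vgacon code.

```python
class StartAddressConsole:
    """Toy text console that scrolls by moving a start row, not by copying.

    The sizes are illustrative assumptions, not real VGA dimensions.
    """

    def __init__(self, memory_rows=6, visible_rows=3):
        self.memory = [""] * memory_rows   # stand-in for video memory
        self.visible_rows = visible_rows
        self.next_row = 0                  # where the next line is written
        self.start = 0                     # first row of the live screen
        self.offset = 0                    # rows currently scrolled back

    def write_line(self, text):
        if self.next_row == len(self.memory):
            # Memory exhausted: copy the newest rows back to the top.
            # Everything older is lost, which is why this kind of
            # scrollback only ever reaches a short distance.
            keep = self.memory[self.next_row - (self.visible_rows - 1):]
            self.memory = keep + [""] * (len(self.memory) - len(keep))
            self.next_row = len(keep)
        self.memory[self.next_row] = text
        self.next_row += 1
        # Normal scrolling only moves the start row forward.
        self.start = max(0, self.next_row - self.visible_rows)
        self.offset = 0                    # new output snaps back to live view

    def scroll_back(self, rows):
        # Scrollback moves the start row backward, never below row 0.
        self.offset = min(self.start, self.offset + rows)

    def screen(self):
        first = self.start - self.offset
        return self.memory[first:first + self.visible_rows]
```

In this model, scrolling back is free because nothing is buffered in software, but the history disappears the moment the start row wraps around to the top of memory.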

All seemed well until Pavel Machek cried out in anguish, "Could we pause this madness?"

Pavel went on to say:

"Scrollback is still useful. I needed it today… it was too small, so command results I was looking for already scrolled away, but… life will be really painful with 0 scrollback.

"You'll need it, too… as soon as you get oops and will want to see errors just prior to that oops.

"Kernel is now very verbose, so important messages during bootup scroll away. It is way bigger deal when you can no longer get to them using shift-pageup.

"fsck is rather verbose, too, and there's no easy way to run that under X terminal… and yes, that makes scrollback very useful, too."

Pavel put his money directly where his mouth was, saying, "If it means I get to maintain it… I'm not happy about it but that's better than no scrollback."

Adam Borowski felt Pavel's cry of pain. Adam unleashed his own tormented howl of "I concur," lamenting, "this a serious usability regression for regular users."

Adam pointed out that "without some kind of scrollback, there's no way of knowing why eg. your rootfs failed to mount (there was some oops, but its reason was at the beginning…). Or, any other problem the user would be able to solve, or pass the error messages to someone more knowledgeable."

He also said to Linus:

"I also wonder why did you choose to remove softscrollback which is actually useful, yet leave hardscrollback which doesn't come to use on any non-ancient hardware:

* on !x86 there's no vgacon at all

* on x86, in-tree drivers for GPUs by Intel, nVidia and AMD (others are dead) default to switching away from vgacon

* EFI wants its own earlycon

"… thus, the only niche left is nVidia proprietary drivers which, the last time I looked, still used CGA text mode."

Finally, to Pavel's willingness to maintain the code in question, Adam remarked, "That'd be greatly appreciated. There are also some simplifications/rewrites that could be done, like getting rid of redundant 1-byte/4-byte storage (or even the code for 1-byte…). Hard scrollback could be axed altogether (it provides only a small amount of scroll). Etc…."

Throwing his lot in with the rebellious Adam and Pavel, Maciej W. Rozycki confirmed that "For the record I keep using the console scrollback all the time, and FWIW I have gone through all the hoops required to keep using VGA hardware emulation and its console text mode with my most recent laptop, which is a ThinkPad P51; no longer manufactured, but still hardly an obsolete device by today's standards I believe." He therefore concluded that "no, it's not that nobody uses that stuff anymore, and not with obsolete hardware either."

At that point, the matter rested and several months passed. Then, as if no time whatsoever had passed, Phillip Susi replied to Pavel, "Amen! What self respecting admin installs a gui on servers? What do we have to do to get this back in? What was so buggy with this code that it needed to be removed? Why was it such a burden to just leave it be?"

To which Linus replied:

"It really was buggy, with security implications. And we have no maintainers.

"So the scroll-back code can't come back until we have a maintainer and a cleaner and simpler implementation.

"And no, maintaining it really doesn't mean 'just get it back to the old broken state'.

"So far I haven't actually seen any patches, which means that it's not coming back."

Phillip asked if there was any more information available. He said, "I can't try to fix it if I don't understand what is wrong with it. Are there any bug reports or anything I could look at?"

Meanwhile, Daniel Vetter was not going to let scrollback return without a fight. In addition to the problems Linus had identified, Daniel said, "on anything that is remotely modern [...] there's a pile more issues on top of just the scrollback/fbcon code being a mess." He continued:

"Specifically the locking is somewhere between yolo and outright deadlocks. This holds even more so if the use case here is 'I want scrollback for an oops'. There's rough sketches for how it could be solved, but it's all very tricky work.

"Also, we need testcases for this, both in-kernel unit-test style stuff and uapi testcases. Especially the full interaction on a modern stack between /dev/fb/0, /dev/drm/card0, vt ioctls and the console is a pure nightmare.

"Altogether this is a few years of full time hacking to get this back into shape, and until that's happening and clearly getting somewhere the only reasonable thing to do is to delete features in response to syzkaller crashes."

At this point, Greg Kroah-Hartman piled in, saying, "Along with what Daniel has already pointed out, just look at all of the old syzbot reports for the code in this area. Try fixing one of those reports in an older kernel to give yourself an idea of the issues involved. Best of luck!"

Phillip was utterly unwilling to let this go, however. And when Geert Uytterhoeven offered some comments on the overall situation, Phillip said, "Judging from some of the comments in the code, it looks like you were one of the original authors of fbcon?" And Geert replied, "Indeed, a looooong time ago…."

The two of them embarked on an implementation discussion. Phillip said he was willing to try to rewrite scrollback from scratch if that was what it took, and he proposed some ideas about how to do that. And Geert replied:

"There are multiple ways to implement scrolling:

1. If the hardware supports a larger virtual screen and panning, and the virtual screen is enabled, most scrolling can be implemented by panning, with a casual copy when reaching the bottom (or top) of the virtual screen. This mode is (was) available on most graphics hardware with dedicated graphics memory.

2. If a 2D acceleration engine is available, copying (and clearing/filling) can be implemented by rectangle copy/fill operations.

3. Rectangle copy/fill by the CPU is always available.

4. Redrawing characters by the CPU is always available.

"Which option was used depended on the hardware: not all options are available everywhere, and some perform better than others."
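Geert's list amounts to a capability check, which can be sketched in a few lines. Everything named here (`DisplayCaps`, its two flags, and the method labels) is hypothetical and is not a kernel structure. The ordering simply follows Geert's numbering and is an assumption, since he notes only that some options perform better than others.

```python
from dataclasses import dataclass


@dataclass
class DisplayCaps:
    """Hypothetical capability flags, invented for this sketch."""
    can_pan: bool = False        # larger virtual screen plus panning (option 1)
    has_2d_accel: bool = False   # hardware rectangle copy/fill (option 2)


def available_scroll_methods(caps):
    """Return the usable scrolling methods, in the order of Geert's list."""
    methods = []
    if caps.can_pan:
        methods.append("pan")
    if caps.has_2d_accel:
        methods.append("accel_copy")
    # Options 3 and 4 are done by the CPU and are always available.
    methods += ["cpu_copy", "cpu_redraw"]
    return methods
```

A display with neither capability still has the two CPU methods to fall back on, which matches Geert's point that options 3 and 4 are always available.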

Several people joined the discussion, but no patches seemed to come out of it.

Reimplementing this feature seems, on the one hand, like something a fair number of people want badly enough to do just about anything for it and, on the other hand, like something that's very hard to get right. And Linus doesn't seem inclined to accept any patches that don't actually get the thing right.

Will it come back? It seems like a fairly large mountain to climb for a feature that is only really useful for kernel developers debugging kernel code. And yet, it does seem to have a special place in the hearts of a fair number of those kernel developers. Time will tell.

The Author

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.
