Zack's Kernel News
Chronicler Zack Brown reports on the latest news, views, dilemmas, and developments within the Linux kernel community.
Status of OverlayFS and Union Filesystems in General
Recently, Miklos Szeredi requested that OverlayFS be included in the main kernel tree. OverlayFS allows two directory trees to appear as one. Files that live under the same directory path in each tree appear together in a single directory of the overlaid filesystem. The project has been in existence for several years, but this time Linus Torvalds replied, "Yes, I think we should just do it. It's in use, it's pretty small, and the other alternatives are worse. Let's just plan on getting this thing done with."
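As a rough illustration of what "two directory trees appearing as one" means, here is a toy user-space model in Python. It is not OverlayFS code and says nothing about the kernel implementation. The names `merged_listing`, `upper`, and `lower` are hypothetical, and the rule that an entry in the upper tree hides a same-named entry in the lower tree is an assumption of this sketch.

```python
import os
import tempfile


def merged_listing(upper, lower):
    """Return {name: path} for a one-level union of two directories.

    Assumption of this sketch: an entry in `upper` shadows a same-named
    entry in `lower`.
    """
    merged = {}
    for layer in (lower, upper):  # upper is applied last, so it wins
        for name in sorted(os.listdir(layer)):
            merged[name] = os.path.join(layer, name)
    return merged


def demo():
    """Build two small trees and return the file contents seen in the merged view."""
    with tempfile.TemporaryDirectory() as root:
        upper = os.path.join(root, "upper")
        lower = os.path.join(root, "lower")
        os.mkdir(upper)
        os.mkdir(lower)
        for directory, name, text in [
            (lower, "a.txt", "lower a"),
            (lower, "b.txt", "lower b"),
            (upper, "b.txt", "upper b"),
            (upper, "c.txt", "upper c"),
        ]:
            with open(os.path.join(directory, name), "w") as fh:
                fh.write(text)

        contents = {}
        for name, path in merged_listing(upper, lower).items():
            with open(path) as fh:
                contents[name] = fh.read()
        return contents


if __name__ == "__main__":
    for name, text in sorted(demo().items()):
        print(name, "->", text)
```

The hard problems raised in review, such as locking and directory removal, do not show up in a read-only model like this one. They arise once the merged view has to accept changes.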
Al Viro said he'd start reviewing the code, but he also suggested that if they were going to merge a union filesystem such as OverlayFS, they might as well consider merging other similar projects, such as Unionmount and Aufs. Unionmount in particular, he said, had been getting some good work lately from David Howells.
Meanwhile, Sedat Dilek jumped for joy at seeing OverlayFS close to acceptance. Al then followed up with his initial review. He'd identified some security issues and other technical problems, and he went back and forth with Miklos about them. The two at first didn't see eye-to-eye about how to fix the issues, or even whether a given issue was really a problem.
At one point, George Spelvin offered an admittedly somewhat hacky solution to one of Al's problems. The whole thing boiled down to the way OverlayFS or any union filesystem would behave under the full range of possible uses. Regarding George's particular suggestion, Al walked through the convoluted process necessary to remove a directory and replied, "I'm sorry, but this is insane."
Elsewhere, in an entirely different thread, Sedat asked about the status of David's Unionmount project. David replied, "It's being reengineered again to take account of VFS changes that went in in the last merge window."
He added, "It's a maze of twisty locking problems – some of which also apply to things like overlayfs:-(".
The discussion in both threads ended there. It appears everyone, including Linus, is ready to see union filesystems like OverlayFS in the kernel. But no one, including Al Viro and the maintainers of the various union filesystem projects, is able to solve the remaining technical problems satisfactorily. At the moment, none of the projects seems close to getting past Al's laser-beam code reviews, and until that happens, I'm certain none of them will be merged.
Astonishing Tux3 Performance Claims
There seems to be some suspicion between certain kernel developers and Tux3 developers. Tux3 is a versioning filesystem that's been in development since 2008. Recently, Daniel Phillips, the project leader, posted some benchmarks that showed Tux3 outperforming tmpfs. As he put it, "To put this in perspective, we normally regard tmpfs as unbeatable because it is just a thin shim between the standard VFS mechanisms that every filesystem must use, and the swap device."
Dave Chinner took a look at Daniel's numbers and found some issues that he felt indicated a deliberate attempt to mislead people. In particular, he pointed out that the Tux3 benchmark didn't include any "flush" operations – the Tux3 front end was off-loading all of its work to a back end that could take all the time it needed to complete the job. The front end would never block, and so it could simply race through the benchmark and exit. Dave said, "You've carefully crafted the benchmark to demonstrate a best case workload for the tux3 architecture, then carefully not measured the overhead of the work tux3 has offloaded, and then not disclosed any of this in the hope that all people will look at is the headline."
Hirofumi Ogawa, one of the Tux3 developers, responded, saying that fsync() had not yet been implemented, and the benchmarks were intended to show comparisons between just the parts of the code that had already been written.
Daniel also responded to Dave's post, saying, "I should indeed have noted that 'modified dbench' was used for this benchmark, thus amplifying Tux3's advantage in delete performance. This literary oversight does not make the results any less interesting: we beat Tmpfs on that particular load. Beating tmpfs at anything is worthy of note."
Regarding the specific issue Dave had raised about off-loading 100% of Tux3's work, Daniel said, "Yes, that is the entire point of our front/back design: reduce application latency for buffered filesystem transactions."
Theodore Ts'o pointed out that one couldn't simply ignore the fsync() data and expect a meaningful benchmark result. As he put it, "Since fsync() is defined as not returning until the data written to the file descriptor is flushed out to stable storage – so it is guaranteed to be seen after a system crash – it means that the foreground application must not continue until the data is written by Tux3's back-end." He added, "any advantage of decoupling the front/back end is nullified, since fsync() requires a temporal coupling."
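The measurement issue can be seen from user space with any filesystem. The sketch below is a generic Python illustration, not the modified dbench run discussed in the thread. It times the same write twice, once returning as soon as the data is buffered and once waiting in fsync() until the kernel reports the data flushed. The helper names `timed_write` and `compare` are hypothetical. The timings depend entirely on the machine and filesystem, so none are claimed here.

```python
import os
import tempfile
import time


def timed_write(path, payload, flush):
    """Write payload to path and return the elapsed seconds seen by the caller.

    With flush=False the call returns once the data is buffered.
    With flush=True it also waits in fsync() before returning.
    """
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        remaining = memoryview(payload)
        while remaining:
            written = os.write(fd, remaining)  # may be a partial write
            remaining = remaining[written:]
        if flush:
            os.fsync(fd)  # blocks until the kernel reports the data flushed
    finally:
        os.close(fd)
    return time.perf_counter() - start


def compare(size=1 << 20):
    """Time a buffered write and a write followed by fsync() of `size` bytes."""
    payload = b"x" * size
    with tempfile.TemporaryDirectory() as tmp:
        buffered_path = os.path.join(tmp, "buffered.dat")
        durable_path = os.path.join(tmp, "durable.dat")
        buffered = timed_write(buffered_path, payload, flush=False)
        durable = timed_write(durable_path, payload, flush=True)
        sizes = (os.path.getsize(buffered_path), os.path.getsize(durable_path))
    return {"buffered_s": buffered, "durable_s": durable, "sizes": sizes}


if __name__ == "__main__":
    print(compare())
```

A benchmark that reports only the first number measures how quickly the application was released, not how long the storage work took. That is the distinction Dave and Ted were drawing.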
Daniel replied that once fsync() was optimized, he expected "… Tux3 to perform competitively, because our delta commit scheme does manage the job with a minimal number of block writes …".
Elsewhere in the thread, Dave remarked on his real concern. He said, "I don't care how fast tux3 is – I care about being able to reproduce other people's results. Hence if you are going to report benchmark results comparing filesystems then you need to tell everyone exactly what you've tweaked and why, from the hardware all the way up to the benchmark config."
The discussion trailed off around there, but some kernel folks also seemed to feel that Daniel's approach was too marketing-oriented, trying to make big announcements at the expense of clarifying the real progress made.
Dealing with Empty Symlinks
Back in January, Pádraig Brady noticed that Linux didn't allow users to create empty symlinks, that is, symlinks whose target is the empty string. He asked why this was, because POSIX specified that it should be allowed, and other operating systems supported it. There was no discussion at the time, but he recently followed up, asking whether this was going to be fixed.
Part of the idea was that symlinks could be valuable just to store data in their name alone, without utilizing their traditional purpose of linking to other files.
But Al Viro thought this was "utterly pointless," especially considering that the behavior would end up being operating-system-dependent anyway. He said, "blanket refusal to traverse such beasts is a legitimate option."
Eric Blake replied that the real point was not whether creating an empty symlink should be allowed in Linux – it was the way Linux should behave when it encountered an empty symlink during path resolution.
After all, even if Linux didn't allow empty symlinks to be created, other operating systems did, and the filesystems containing those symlinks could be mounted under Linux. It would make sense to handle those cases correctly. Eric remarked:
"I personally don't care whether you fix the Linux kernel symlink() to allow empty symlinks, or successfully argue for a bug fix against POSIX to permit the existing Linux symlink() behavior. I'd love to see Linux obtain POSIX certification someday, and either of those two courses of action would get us closer. Meanwhile, I know there are enough other issues in the kernel … that it will be a long time before we ever get a POSIX certification of a Linux system."
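The symlink() behavior in question is easy to probe from user space. The following Python sketch (helper names `try_symlink` and `probe` are hypothetical) attempts to create a symlink to a nonexistent file and a symlink with an empty target, then reports what the system did. Because the result for the empty target is exactly what differs between operating systems, the script only reports it and asserts nothing about it.

```python
import errno
import os
import tempfile


def try_symlink(target, link_path):
    """Attempt symlink(target, link_path).

    Returns None on success, or the symbolic errno name (e.g. 'ENOENT')
    if the system refuses.
    """
    try:
        os.symlink(target, link_path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


def probe():
    """Try a dangling target and an empty target in a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        dangling = try_symlink("no-such-file", os.path.join(tmp, "dangling"))
        empty = try_symlink("", os.path.join(tmp, "empty"))
    return {"dangling": dangling, "empty": empty}


if __name__ == "__main__":
    for kind, outcome in probe().items():
        print(kind, "->", "created" if outcome is None else "refused: " + outcome)
```

This only covers creation. Eric's larger point concerns path resolution through an empty symlink that already exists on a mounted filesystem, which cannot be reproduced this way on a system that refuses to create one.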
Pavel Machek started exploring the extent of the issue under Linux, trying to identify which tools would break when encountering empty symlinks and how bad the breakage would be, but the discussion ended at that point, with no clear resolution on a course of action, or even on whether it was worth doing anything about the situation.
Linus Torvalds is notoriously disdainful of compliance for compliance's sake. If there's no cost to it, he's not opposed, but if there are valid technical reasons to implement something in a non-compliant way, he'll choose that over compliance every time, and he makes no secret of his contempt for certain parts of the POSIX standard.
On the other hand, if there's a danger that users might get burned if they mount a filesystem on which another OS has created an empty symlink, Linus would rather eat sand than let that go unfixed. The real question may boil down to whether the status quo would burn anyone. At the moment, it still seems unclear.