Linus Torvalds Upset over Ext3 and Ext4

Mar 30, 2009

Britta Wuelfing

Linus Torvalds, Ted Ts'o, Alan Cox, Ingo Molnar, Andrew Morton and other Linux kernel developers are embroiled in a contentious discussion over the sense -- or nonsense -- of journaling and delayed allocation before a commit in the ext3 and ext4 filesystems. Harsh words are flying.

It all started with a request for help from Jesper Krogh in one of the first responses to Torvalds's March 24 announcement of kernel 2.6.29 on the gmane.linux.kernel mailing list. Krogh reported a significant delay when writing from cache with the ext3 filesystem, despite faster hardware and extensive RAM. Was there a way to autotune it? Ingo Molnar opined that Krogh's wait time of 10 minutes was totally unacceptable: "it is the year 2009, not 1959." His personal "pain threshold" is about one second: "the historic limit for the hung tasks check was 10 seconds, then 60 seconds."

Ted Ts'o, a leading figure in the filesystem's development, joined the discussion. Only recently he had been confronted by users who lost data after installing their applications on the new ext4 filesystem. Ts'o had investigated the problem thoroughly, down to the source code, and explained it in detail. Once again he described the delay in writing data: ext3 synchronizes every five seconds, whereas ext4 normally writes from cache every two minutes. Ts'o got pretty defensive: "People can call file system developers idiots if it makes them feel better --- sure, OK, we all suck. If someone wants to try to create a better file system, show us how to do better, or send us some patches."

Torvalds, for one, didn't seem too excited about the delayed synchronization. He wrote on the mailing list, "Doesn't at least ext4 default to the insane model of 'data is less important than metadata, and it doesn't get journalled'? And ext3 with 'data=writeback' does the same, no? Both of which are -- as far as I can tell -- total brain damage. At least with ext3 it's not the default mode." To avoid the synchronization problem, Ts'o had recommended using ext4 on only a few separate systems, at least for the time being. Torvalds called this "crappy" advice and said that "we might as well go back to ext2 then."

In his response, Ts'o fell back on the performance benefits of delayed allocation, which POSIX has long allowed. In his experience, the difference between five seconds and three minutes "wasn't that big of a deal" in practice, "at least in the days when people were proud of their Linux systems having 2-3 year uptimes." Plus there was a remedy: "For precious files, applications that use fsync() will be safe." Anyone for whom this was a problem could "turn off delayed allocation with the nodelalloc mount option."
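The fsync() remedy Ts'o mentions can be sketched in a few lines. The following is a minimal illustration, not code from the mailing list thread: it assumes a POSIX system, and the helper name durable_write is hypothetical. It writes the new contents to a temporary file, calls fsync() on it, and only then renames it over the original.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data`, flushing to disk first."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's buffer into the kernel's cache
            os.fsync(tmp.fileno())  # ask the kernel to write the data to the device
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also fsync the directory so the rename itself is recorded (assumes POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application would call, for example, durable_write("settings.conf", b"key=value\n"). The price is the one the thread argues about: every fsync() forces a wait for the disk, so applications tend to reserve it for files they cannot afford to lose.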

Kernel chief Torvalds was hardly convinced by these arguments. In his view, "if you write your metadata earlier (say, every 5 sec) and the real data later (say, every 30 sec), you're actually more likely to see corrupt files than if you try to write them together... This is why I absolutely detest the idiotic ext3 writeback behavior. It literally does everything the wrong way around -- writing data later than the metadata that points to it. Whoever came up with that solution was a moron. No ifs, buts, or maybes about it."

Comments

You might as well use XFS

mcwilliam
If you go for big delayed writes to gain performance, we might as well go for XFS, or put our development efforts into it.

What FS Does Linus Use/Like?

Sa
Well looking at this, I'd be curious to know what FS Linus likes to use then?

THANKS

Ext3/4 reliability

William Boyle
Well, it's the old saw about performance vs. reliability in this case. In my opinion, as a designer and developer of large-scale distributed transaction processing systems, data is king. If a transaction commits, the data is on disc. I think that this should be the case for file systems as well. Journaling should allow roll-forward recovery (deltas are written, but the full file might not be updated) so that when the caller gets back control, the journal has been written to persistent store, including any required metadata. If the system fails after that fact, then roll-forward recovery should restore the data that the user application had written. In any case, if you want to look at power-fail safe file systems, look at QNX. They wrote the book on high-reliability file systems, IMO.

Mechanism for ext3/ext4 data loss?

Bob Gustafson
Has anyone worked through the/any mechanism by which data is lost?

I can see that if there is a power failure when data is in memory and it hasn't been written to a journal somewhere - it can be lost. A fairly old fix to this is to write the journal to a battery backed memory on the disk controller. If this write can be done before the main power supply capacitors are depleted, there shouldn't be any loss. Maybe there is a less expensive way to do it.

Delayed sync

Bob
You can delay sync and it (probably) does not matter in the majority of cases if that delay is 2 minutes, because in that time any reloads are likely to be still cached locally anyway.

Unless it is a day or time when your machine is busy.

But imagine the situation on a fresh install or when copying huge amounts of data. I can't help feeling that the caching system is going to be a terrible bottleneck.

Run an rsync in the background while you are editing pictures: when exactly is the sync going to catch up?

I know the ext4 guys are getting hot under the collar, but surely they can understand that people are going to wonder how good ext4 is at deciding on the best action on the fly. Journal now or journal after a delay?

As for those data losses, why respond with "If you know a better way tell us..."? Well, I know one that might be better: don't lose data.
