Linus Torvalds Upset over Ext3 and Ext4

Mar 30, 2009

Britta Wuelfing

Linus Torvalds, Ted Ts'o, Alan Cox, Ingo Molnar, Andrew Morton and other Linux kernel developers are embroiled in a contentious discussion over the sense -- or nonsense -- of journaling and delayed allocation before a commit in the ext3 and ext4 filesystems. Harsh words are flying.

It all started with a request for help from Jesper Krogh, in one of the first replies to Torvalds's March 24 announcement of kernel 2.6.29 on the gmane.linux.kernel mailing list. Krogh reported long delays when the ext3 filesystem wrote data out of the cache, despite fast hardware and plenty of RAM, and asked whether there was a way to autotune this behavior. Ingo Molnar said that Krogh's wait time of 10 minutes was totally unacceptable: "it is the year 2009, not 1959." His personal "pain threshold" is about one second; as he pointed out, "the historic limit for the hung tasks check was 10 seconds, then 60 seconds."
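
Why would more RAM make the stalls worse rather than better? A back-of-the-envelope sketch in Python, assuming a deliberately simple model: the kernel lets not-yet-written ("dirty") data grow to a fixed share of memory (roughly what Linux exposes as the vm.dirty_ratio sysctl) and the disk then drains it at a constant rate. The helper estimate_flush_seconds is hypothetical, and the RAM sizes, the 10% ratio and the 50 MiB/s throughput are illustration values, not figures from Krogh's report.

```python
def estimate_flush_seconds(ram_bytes: int, dirty_ratio_percent: float,
                           disk_bytes_per_sec: float) -> float:
    """Rough time to write back a full load of dirty page cache.

    Assumption (simplified model): dirty data may grow to dirty_ratio_percent
    of RAM before writeback is forced, and the disk sustains a constant
    disk_bytes_per_sec while draining it.
    """
    dirty_bytes = ram_bytes * dirty_ratio_percent / 100.0
    return dirty_bytes / disk_bytes_per_sec


GiB = 1024 ** 3
MiB = 1024 ** 2

# Hypothetical machines: same 10% threshold, same 50 MiB/s disk, more RAM.
for ram_gib in (1, 8, 32):
    secs = estimate_flush_seconds(ram_gib * GiB, 10, 50 * MiB)
    print(f"{ram_gib:>3} GiB RAM -> ~{secs:.0f} s to flush")
```

With these made-up numbers the loop prints about 2, 16 and 66 seconds: a fixed percentage of a larger memory means more backlog for the same disk, which is why a threshold that felt harmless on small machines can produce long stalls on big ones.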

Ted Ts'o, a key figure in the filesystem's development, joined the discussion. Only recently, users had confronted him over data loss after installing their applications on the new ext4 filesystem, and Ts'o had thrown himself into the problem, digging through the source code and explaining it in detail. Once again he described the delay in writing data: ext3 synchronizes every five seconds, whereas ext4 normally writes data out of the cache only every two minutes. Ts'o got pretty defensive: "People can call file system developers idiots if it makes them feel better --- sure, OK, we all suck. If someone wants to try to create a better file system, show us how to do better, or send us some patches."

Torvalds, for one, wasn't too excited about the delayed synchronization. He wrote on the mailing list: "Doesn't at least ext4 default to the insane model of 'data is less important than metadata, and it doesn't get journalled'? And ext3 with 'data=writeback' does the same, no? Both of which are -- as far as I can tell -- total brain damage. At least with ext3 it's not the default mode." To avoid the synchronization problem, Ts'o had recommended, at least temporarily, limiting ext4 to a few separate systems. Torvalds called this "crappy" advice, adding that "we might as well go back to ext2 then."

In his response, Ts'o pointed to the performance benefits of delayed allocation, which POSIX permits. In his experience, the difference between five seconds and three minutes "wasn't that big of a deal" in practice, "at least in the days when people were proud of their Linux systems having 2-3 year uptimes." And there was a remedy: "For precious files, applications that use fsync() will be safe." Anyone for whom this was a problem could "turn off delayed allocation with the nodelalloc mount option."
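
What "applications that use fsync()" looks like in practice: the Python sketch below uses the common write-to-temp-file, fsync, rename idiom, so that after a crash a reader sees either the old or the new contents of the file. It illustrates the general technique and is not code from the thread; the function name save_precious is hypothetical, and the final directory fsync assumes a POSIX system such as Linux.

```python
import os
import tempfile


def save_precious(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves either old or new contents.

    Pattern: write a temp file in the same directory, fsync it, rename it
    over the target, then fsync the directory so the rename is durable too.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to put the data on disk
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Anything failed before the rename: don't leave the temp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the directory entry that now points at the new file (POSIX-only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The temp file lives in the same directory because rename is only atomic within one filesystem. Note that mkstemp creates the file with mode 0600, so a real tool might copy the original file's permissions before the rename.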

Kernel chief Torvalds was hardly convinced by these arguments. In his view, "if you write your metadata earlier (say, every 5 sec) and the real data later (say, every 30 sec), you're actually more likely to see corrupt files than if you try to write them together... This is why I absolutely detest the idiotic ext3 writeback behavior. It literally does everything the wrong way around -- writing data later than the metadata that points to it. Whoever came up with that solution was a moron. No ifs, buts, or maybes about it."
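
Torvalds's ordering argument can be checked with a toy model. The Python sketch below is a deliberate simplification, not how ext3 or ext4 actually schedule I/O: it assumes metadata and data are flushed on fixed periodic timers (his example values of 5 and 30 seconds), treats a crash as instantaneous, and assumes a rewritten file gets fresh blocks so the old contents stay intact until the metadata changes. All function names are hypothetical.

```python
import math


def next_flush(t: float, interval: float) -> float:
    """Time of the first periodic flush strictly after t (flushes at k * interval)."""
    return (math.floor(t / interval) + 1) * interval


def outcome(crash: float, write: float, meta_every: float, data_every: float,
            data_first: bool) -> str:
    """What a reader finds after a crash at `crash` for a file rewritten at `write`."""
    meta_on_disk = next_flush(write, meta_every)
    data_on_disk = next_flush(write, data_every)
    if data_first:
        # Assumed ordering rule: metadata may not reach disk before its data.
        meta_on_disk = max(meta_on_disk, data_on_disk)
    if crash < meta_on_disk:
        return "old"      # on-disk metadata still describes the old file
    if crash < data_on_disk:
        return "corrupt"  # metadata points at blocks whose data never arrived
    return "new"


def corrupt_fraction(meta_every: float, data_every: float, data_first: bool,
                     write: float = 1.0, horizon: float = 60.0,
                     step: float = 0.5) -> float:
    """Share of evenly spaced crash times in (write, write + horizon] that corrupt the file."""
    crashes = [write + step * i for i in range(1, int(horizon / step) + 1)]
    bad = sum(outcome(c, write, meta_every, data_every, data_first) == "corrupt"
              for c in crashes)
    return bad / len(crashes)


# Torvalds's example intervals: metadata every 5 s, data every 30 s.
print("metadata first:", corrupt_fraction(5, 30, data_first=False))
print("data first:    ", corrupt_fraction(5, 30, data_first=True))
```

With metadata going out first, any crash between the 5-second metadata flush and the 30-second data flush leaves a file whose metadata points at data that never arrived. Holding the metadata back until its data is on disk (roughly the guarantee of ext3's default data=ordered mode) closes that window entirely in this model, at the cost of making the metadata wait.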

Comments

You might as well use XFS

mcwilliam
If you go for big delayed writes to gain performance, we might as well go for XFS, or put our development efforts in it.

What FS Does Linus Use/Like?

Sa
Well, looking at this, I'd be curious to know what FS Linus likes to use, then.

THANKS

Ext3/4 reliability

William Boyle
Well, it's the old saw about performance vs. reliability in this case. In my opinion, as a designer and developer of large-scale distributed transaction processing systems, data is king. If a transaction commits, the data is on disc. I think that this should be the case for file systems as well. Journaling should allow roll-forward recovery (deltas are written, but the full file might not be updated) so that when the caller regains control, the journal has been written to persistent store, including any required metadata. If the system fails after that, roll-forward recovery should restore the data that the user application had written. In any case, if you want to look at power-fail-safe file systems, look at QNX. They wrote the book on high-reliability file systems, IMO.

Mechanism for ext3/ext4 data loss?

Bob Gustafson
Has anyone worked through the/any mechanism by which data is lost?

I can see that if there is a power failure when data is in memory and hasn't been written to a journal somewhere, it can be lost. A fairly old fix for this is to write the journal to battery-backed memory on the disk controller. If this write can be done before the main power supply capacitors are depleted, there shouldn't be any loss. Maybe there is a less expensive way to do it.

Delayed sync

Bob
You can delay sync, and it (probably) does not matter in the majority of cases if that delay is 2 minutes, because in that time any reloads are likely to still be cached locally anyway.

Unless it is a day or time when your machine is busy.

But imagine the situation on a fresh install or when copying huge amounts of data: I can't help feeling that the caching system is going to be a terrible bottleneck.

Run rsync in the background while you are editing pictures: when exactly is the sync going to catch up?

I know the ext4 guys are getting hot under the collar, but surely they can understand that people are going to wonder how good ext4 is at deciding on the best action on the fly. Journal now or journal after a delay?

As for those data losses, why respond with "If you know a better way tell us..."? Well, I know one that might be better: don't lose data.

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.