Comparing the ext3, ext4, XFS, and Btrfs filesystems

XFS

XFS [4] is a 64-bit journaling filesystem. It was originally created in 1994 by Silicon Graphics, Inc. for its IRIX operating system. XFS was later ported to the Linux kernel version 2.4 in 2002.

A benefit of XFS is its stability and maturity. XFS is often seen as the filesystem for people with massive amounts of data. Because it is a full 64-bit filesystem, XFS is capable of handling filesystems as large as millions of terabytes (that is, exabytes). XFS ensures data consistency via metadata journaling, which allows it to restart very quickly after an unexpected interruption, regardless of the number of files it is managing. At the same time, XFS manages to minimize the performance impact of journaling.

XFS also supports write barriers (a mechanism for enforcing a particular ordering in a sequence of writes). Another specialty of XFS is its allocation groups. Allocation groups allow systems with multiple processors or multi-core processors to provide better throughput by simultaneously reading and writing through multiple application threads. XFS is capable of delivering close to the raw I/O performance that the underlying hardware can provide.

Performance

Table 1 shows the basic features of the four filesystems at a glance. Comparing performance is more of a challenge. Evaluating filesystem performance is a very difficult task because of the complex role a filesystem plays. What does "faster" mean? One system might be faster for accessing many small files, while another is faster for accessing a single large file. One filesystem might perform better on metadata operations, and another might handle data better. At the same time, problems writing metadata to the journal can thwart the overall I/O performance. Thus, a single number can never characterize the performance of a filesystem. Instead, it is better to isolate the different aspects of performance and measure them separately. Afterward, you can determine which aspects are most significant for the workloads you envision.

Table 1: Comparing Features

Name                        Btrfs                     Ext3         Ext4         XFS
Created                     2007                      1998         2006         1994
Original OS                 Linux                     Linux        Linux        IRIX

Limits
Max. filename length        255 bytes                 255 bytes    255 bytes    255 bytes
Max. file size (4k blocks)  8EB (Linux kernel limit)  2TB          16TB         8EB
Max. volume size            16EB                      16TB         1EB          16EB

Features
Hard links                  yes                       yes          yes          yes
Symbolic links              yes                       yes          yes          yes
Metadata journaling         no                        yes          yes          yes
Snapshots                   yes                       no           no           no
Clones                      yes                       no           no           no
Encryption                  no                        no           no           no
Compression                 yes                       no           no           no
Deduplication               yes                       no           no           no
Integrated LVM              yes                       no           no           no
Online resizing             grow/shrink               grow only    grow only    grow only
Offline resizing            no                        grow/shrink  grow/shrink  no
Extent allocation           yes                       no           yes          yes
Delayed allocation          yes                       no           yes          yes

Choosing the right benchmarks for measuring filesystem performance is important. Some benchmarks study a filesystem's ability to scale with increasing load; other benchmarks work by replaying traces of recorded workloads. Block device benchmarks such as iometer [5] or fio [6] evaluate bandwidth and latency of read and write operations on the physical device. These benchmarks are not very useful for this study, because I need benchmarks that operate on the filesystem layer, not on the block device.

My goal is to evaluate read and write performance as a function of the file size. An example of such a benchmark is iozone [7]. With small file sizes, such benchmarks run largely in memory and report "warm-cache" results. I mitigated this effect by running all benchmarks on the same server and RAID system, so that the influence of CPU and memory is the same in all cases. I used an Exus Data ProServII server with Ubuntu Linux Server 14.04 (kernel 3.13.0) and a Transtec SCSI RAID system.

A file size larger than the buffer cache (i.e., roughly the amount of free RAM) makes performance drop to the spindle speed of the underlying hard disk or RAID group. It is also possible to empty the page cache before running the benchmark by writing a special value to /proc/sys/vm/drop_caches. Writing a 3 frees the page cache, dentries, and inodes:

echo 3 > /proc/sys/vm/drop_caches
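The kernel only drops clean cache entries, so flushing dirty pages with sync beforehand usually frees more. A minimal sketch of that sequence follows; the function name and the optional target argument are assumptions added here for illustration (the argument lets you try the function on a scratch file without root privileges), not part of the original test setup.

```shell
# Flush dirty pages, then ask the kernel to drop its clean caches.
# The optional first argument is a hypothetical convenience for dry runs;
# by default the function writes to the real /proc file, which requires root.
drop_caches() {
    target="${1:-/proc/sys/vm/drop_caches}"
    sync                  # write dirty pages to disk so they can be dropped
    echo 3 > "$target"    # 3 = page cache plus dentries and inodes
}
```

Called without arguments as root before each benchmark pass, this should give every run a comparable cold-cache starting point.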

I chose iozone and let it run in an automated manner with the following command:

iozone -Raz -bExt3_auto_20G.xls -g20G

Iozone takes quite a while and produces a lot of data.

The program iterates through all file sizes, starting with 64KB, doubling each step, and going through all possible record sizes, starting with 4KB. The server has 16GB of RAM, so the last pass of the benchmark with 16GB file size will no longer fit in the buffer cache. It shows the speed of the physical device – all other results are defined by the speed of the CPU and buffer caches.
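To see which file sizes such a run visits, the doubling sequence can be recomputed by hand. The following sketch is not part of iozone; the function name is made up for illustration, and it simply doubles from 64KB up to the 20GB limit passed with -g20G.

```shell
# Print the file sizes (in KB) of a doubling sequence that starts at 64KB
# and stops at the given maximum, mirroring the description of the run above.
iozone_file_sizes() {
    max_kb="$1"
    size_kb=64
    while [ "$size_kb" -le "$max_kb" ]; do
        echo "$size_kb"
        size_kb=$((size_kb * 2))
    done
}

# Upper bound of 20GB, expressed in KB
iozone_file_sizes $((20 * 1024 * 1024))
```

The sequence has 19 steps and ends at 16777216KB (16GB), because the next doubling (32GB) would exceed the 20GB limit. That last step is the only one that no longer fits into the 16GB of RAM.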

I picked samples with typical file sizes out of the flood of data to compare the filesystems. They don't show big differences (Figure 1). The similarity is probably because RAM speed is the dominating effect in this case: most of the reads and writes go to the buffer cache. The situation changes slightly if you use iozone to measure throughput explicitly. The appropriate option (-t) lets you specify how many threads or processes are active during the measurement. Scaling performance with the number of processes is a strength of XFS, which clearly performs better than ext3 (Figure 2).

Figure 1: The four filesystems deliver similar performance when writing mid-sized files.
Figure 2: Reading via multiple streams shows bigger differences between the candidates, with ext3 clearly lagging behind the others.
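A throughput sweep over several thread counts could look like the following sketch. It only prints the iozone command lines instead of running them. The thread counts, the per-thread file size (-s), and the record size (-r) are illustrative assumptions, not the values behind Figure 2.

```shell
# Print one iozone throughput invocation per thread count (dry run).
# All sizes and thread counts below are assumed values for illustration.
throughput_cmds() {
    for threads in 1 2 4 8; do
        # -i 0 writes the test files, -i 1 then reads them back
        echo "iozone -t $threads -s 1g -r 64k -i 0 -i 1"
    done
}

throughput_cmds
```

In throughput mode, each thread or process works on its own file, so the total amount of data grows with the thread count and should be sized against the available RAM.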

Another question is how fast the candidates perform metadata tasks, such as creating or destroying files and directories. I used fdtree [8] to test metadata performance. fdtree is a highly portable shell script that recursively creates and removes directories and files. In my test, fdtree created and deleted four directory levels with 10 directories at each level for a total of 11111 directories, with 10 files of size 40KB per directory, for a total of 111110 files and 4.34GB. Figure 3 shows the results. Again, the differences were not huge, but ext4 takes first place, and ext3 ranks far behind in last place. It is also important to consider that file size and depth of the directory structure make a difference. It also makes a difference where the journal of an ext3/4 or XFS filesystem resides: on the same disk (bad), on an extra disk (better), or on a RAM disk or SSD (best).

Figure 3: Metadata performance includes tasks such as creating and removing files and directories.
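The size of the fdtree workload follows from simple arithmetic, which the sketch below recomputes. It does not call fdtree itself; the function name and argument order are made up for illustration.

```shell
# Recompute the size of a directory tree: a root directory plus `levels`
# levels below it, where every directory holds `dirs` subdirectories and
# `files` files of `file_kb` KB each.
fdtree_totals() {
    levels="$1"; dirs="$2"; files="$3"; file_kb="$4"
    total_dirs=1      # the root of the tree
    at_level=1
    i=0
    while [ "$i" -lt "$levels" ]; do
        at_level=$((at_level * dirs))
        total_dirs=$((total_dirs + at_level))
        i=$((i + 1))
    done
    total_files=$((total_dirs * files))
    # Output: directories, files, total size in KB
    echo "$total_dirs $total_files $((total_files * file_kb))"
}

fdtree_totals 4 10 10 40   # prints "11111 111110 4444400"
```

The total of 4,444,400KB corresponds to roughly 4,340MB, which is presumably where the figure of 4.34GB comes from.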

Conclusion

Choosing a RAID system with lots of spindles instead of a single disk, or choosing an SSD instead of a hard disk, has a much greater influence on I/O performance than filesystem selection. Today's filesystems aren't far apart. Nevertheless, you might gain some percentage points of performance by choosing the right filesystem for your workload.

If you need the latest features and benefit from volume manager and RAID integration, self-healing, or snapshots, your only choice is Btrfs. If stability is the most important criterion, a less complex, but well-established, solution such as ext3 might be the best option. Very large filesystems and the need for high stability lead to XFS, which is also good for reading or writing many parallel streams. Ext4 is a balanced compromise, with many new features and a rock-solid foundation, that excels at metadata operations.
