Tips for managing Linux filesystems

Blockwise

© Lead Image © Dmitry Sunagatov, Fotolia.com

Article from Issue 187/2016
Even with all the talk of Big Data and the storage revolution, a steady and reliable block-based filesystem is still a central feature of most Linux systems.

Linux is an operating system for lovers of variety, and filesystems are no exception: The range of filesystem options extends from block-based filesystems to temporary filesystems in RAM or pseudo filesystems. This workshop offers some tips for managing a filesystem in Linux.

Block-based filesystems are the most important components for storing data on disk. A useful way to picture a filesystem is as a library that stores data efficiently and in a structured way. Almost every Linux system has at least one block-based filesystem, such as Ext4, XFS, or Btrfs. Linux gives you several filesystems to choose from, and you have probably had some experience with at least the Ext series. If you work with a current distribution, you are likely to have met other filesystems, too. Table 1 shows the standard filesystem for some leading Linux distributions.

Table 1: Standard Filesystems

Distribution                        Standard Filesystem
Debian (from version 7.0 Wheezy)    Ext4
Ubuntu (from version 9.04)          Ext4
Fedora (from version 22)            XFS
SLES (from version 12)              Btrfs for the root partition, XFS for data partitions
RHEL 7                              XFS

Current filesystems are very similar, but they differ in some of the details. You will encounter the following terms when you work with Linux filesystems:

  • Superblock: The superblock stores metadata about a filesystem. This metadata includes information such as the total number of blocks and inodes, block sizes, UUIDs, and timestamps.
  • Inode: An inode or index node consists of metadata associated with a file. The inode data might contain permissions, owners, timestamps, and so on. In addition to this descriptive information, one inode can contain direct extents (data).
  • Extents: Older filesystems used direct and indirect blocks to reference blocks of data; modern filesystems use extents instead [1]. An extent describes a contiguous run of blocks, which makes extent mapping a more efficient way to map logical filesystem blocks to physical blocks.
  • Journaling: Journaling records operations performed on the filesystem, which helps you get back to a consistent state after a crash. A journal comes into its own in exceptional situations, such as recovering the filesystem after a sudden power failure.
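To make the inode term more tangible, the following minimal sketch prints some of the metadata that an inode holds for a scratch file. It assumes the stat command from GNU coreutils (the -c format option); the helper name inode_summary is hypothetical and only serves the illustration.

```shell
# Print selected inode metadata for a file (assumes GNU coreutils stat):
# %i inode number, %h hard link count, %s size in bytes,
# %A permissions, %U owner
inode_summary() {
    stat -c 'inode=%i links=%h size=%s perms=%A owner=%U' "$1"
}

# Try it on a scratch file so that no real data is touched
scratch=$(mktemp)
printf 'hello\n' > "$scratch"
inode_summary "$scratch"
rm -f "$scratch"

# On Ext4, "filefrag -v FILE" (part of e2fsprogs) additionally lists the
# extents that map the file's data, if the tool is installed.
```

The inode number and owner differ from system to system; the size reflects the six bytes written to the scratch file.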

From RAM to Persistent Storage

Random access memory (RAM) still has speed advantages over hard drives and SSDs. To improve performance and reduce the need for disk access, the Linux kernel uses a caching mechanism that keeps data in RAM. This cache is known as the page cache; running the free command reveals its current size (Listing 1).

Listing 1: Free Space

free -h
                    total   used   free   shared   buffers   cached
Mem:                 7.7G   4.9G   2.7G     228M      203M     2.7G
-/+ buffers/cache:          2.1G   5.6G
Swap:                1.0G     0B   1.0G

At first glance, only 2.7GB of the 7.7GB of RAM appears to be free. However, the kernel can reclaim the memory used for buffers and the page cache when applications need it; once this memory is deducted, 5.6GB is effectively free (the -/+ buffers/cache line). The page cache itself occupies 2.7GB (column "cached"). The "buffers" column also includes cached filesystem metadata.
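The same arithmetic can be reproduced from /proc/meminfo, which is where free gets its figures. The sketch below adds up MemFree, Buffers, and Cached; the function name reclaimable_mib and the sample values are assumptions made for illustration and are not taken from Listing 1. Newer kernels also report a MemAvailable field, which is usually the better estimate.

```shell
# Sum MemFree, Buffers, and Cached (reported in kB) from /proc/meminfo-style
# input on stdin and print the total in MiB.
reclaimable_mib() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
         END { printf "%d\n", kb / 1024 }'
}

# On a live system:
#   reclaimable_mib < /proc/meminfo
# Demonstration with hypothetical sample values:
reclaimable_mib <<'EOF'
MemTotal:        8000000 kB
MemFree:         2048000 kB
Buffers:          204800 kB
Cached:          2867200 kB
SwapCached:            0 kB
EOF
```

Exact field matching matters here: a looser pattern such as /Cached/ would also count SwapCached.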

The page cache consists of physical pages in RAM whose contents are associated with a block device. Its size is dynamic: It uses whatever RAM the operating system and applications do not otherwise need. If the system comes under memory pressure, the kernel shrinks the page cache, freeing up memory for applications. In terms of caching mechanisms, the page cache is a write-back cache. Such caches buffer data for both reading and writing. A read from the block device places the data in the cache, from where it is passed to the application. A write initially lands only in the cache and not on the block device. At this point, the system has dirty pages because the data has not yet been written persistently. The Linux kernel gradually writes the dirty pages from RAM to the block device.
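You can watch dirty pages come and go in /proc/meminfo, where the Dirty and Writeback fields show how much data is waiting to be written or is currently being written. The following sketch filters these two fields; the helper name dirty_pages is hypothetical.

```shell
# Print only the Dirty and Writeback lines from /proc/meminfo-style input.
dirty_pages() {
    awk '$1 == "Dirty:" || $1 == "Writeback:" { print $1, $2, $3 }'
}

# Show the values before and after asking the kernel to flush its caches.
if [ -r /proc/meminfo ]; then
    dirty_pages < /proc/meminfo
    sync
    dirty_pages < /proc/meminfo
fi
```

After the sync call, the Dirty value typically drops sharply, although other processes may already have produced new dirty pages by the time the second reading is taken.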

In addition to the kernel's periodic writeback, Ext4 explicitly synchronizes its data and metadata at a regular interval, which is five seconds by default. You can change the interval if necessary with the commit mount option (see the Ext4 documentation of the Linux kernel [2]). In the worst case, data that is still only in RAM is lost if the power fails suddenly. The risk of data loss increases with the length of the commit interval.
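As a rough sketch of how the commit option is passed, the helper below only assembles a remount command and prints it, so nothing changes until you run the printed line yourself as root. The function name remount_commit_cmd is hypothetical, and the mount point /mnt/ext4fs and the value of 15 seconds are example values.

```shell
# Build (but do not execute) a remount command that sets a new commit
# interval in seconds for the filesystem mounted at the given mount point.
remount_commit_cmd() {
    mountpoint=$1
    seconds=$2
    printf 'mount -o remount,commit=%s %s\n' "$seconds" "$mountpoint"
}

remount_commit_cmd /mnt/ext4fs 15
```

To make such a setting permanent, the same commit=15 option would go into the options field of the filesystem's /etc/fstab entry.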

The use of RAM as a cache provides huge performance advantages for the user. Don't forget, however, that RAM is volatile and not persistent. This fact forced itself into the awareness of many Ext4 users recently, when a bug with the title "Data corruption caused by unwritten and delayed extents" caused a stir. On Ext4, short-lived files may never even reach the block device [3] under certain circumstances. Ext4 uses a technique called delayed allocation, which means that it does not allocate blocks immediately when a write system call arrives. Although the space is reserved, the data is only kept in RAM for the time being, and the actual blocks are assigned later. Ext4 is not the only filesystem that uses this optimization: XFS, ZFS, and Btrfs also use delayed allocation. The filesystems benefit from the RAM speed, less fragmentation, and the ability to combine small random writes.
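If a script writes data that must survive a power failure, it can ask for the data to be flushed instead of waiting for the kernel. The following sketch writes to a temporary file, flushes it, and only then renames it to its final name. It assumes a sync command that accepts file arguments (as in recent GNU coreutils); the function name durable_write is hypothetical.

```shell
# Write stdin to a file so that the data has been flushed to the block
# device before the file appears under its final name.
# Assumes a sync command that accepts file arguments.
durable_write() {
    target=$1
    tmp="$target.tmp.$$"
    cat > "$tmp" || return 1
    sync "$tmp" || return 1          # flush the new file's data and metadata
    mv "$tmp" "$target" || return 1  # rename within the same directory
    sync "$(dirname "$target")"      # flush the directory entry as well
}

# Demonstration in a scratch directory
demo_dir=$(mktemp -d)
printf 'important data\n' | durable_write "$demo_dir/result.txt"
cat "$demo_dir/result.txt"
rm -rf "$demo_dir"
```

The rename step means that a reader sees either the old file or the complete new one, never a half-written version.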

Ext4

As the successor to Ext3, Ext4 is one of the most popular Linux filesystems. Whereas Ext3 is slowly reaching its limits with a maximum filesystem size of 16 tebibytes (about 17.6 terabytes), Ext4 provides sufficient space for many years to come with a capacity of up to 1 exbibyte.

To create a new Ext4 filesystem, you need an unused block device. You can simply use a spare partition (for example, /dev/sdb1, if you have created an unused partition on the second disk), or you can use an LVM logical volume. In the following examples, I will use a Logical Volume (/dev/vg00/ext4fs). With root privileges, run mkfs.ext4 to create the new filesystem:

mkfs.ext4 /dev/vg00/ext4fs

A newly created Ext4 filesystem requires that the inode tables and the journal contain no stale data. The corresponding areas must therefore be reliably overwritten with zeros ("zeroed"). This step can take a significant amount of time for larger filesystems, especially on hard drives. To let you use a new filesystem as soon as possible, the Ext4 developers have implemented what they refer to as "lazy initialization," meaning that initialization does not occur when you create a filesystem but in the background when you first mount the filesystem.

Little wonder then that you suddenly notice I/O activity after mounting a new filesystem. Caution is therefore advised if you want to run performance tests with a newly created filesystem. In such cases, I recommend creating the filesystem without lazy initialization.

To set up a filesystem without lazy initialization, use the following parameters:

mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vg00/ext4fs
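If you do not have a spare block device, you can practice on an ordinary file, which does not require root privileges. The sketch below creates a 64MB image file, formats it with the same non-lazy options, and reads a few superblock fields back with dumpe2fs. The image size and the function name make_test_image are assumptions for illustration; the block is skipped if e2fsprogs is not in the search path.

```shell
# Create a small Ext4 filesystem inside a regular file.
# -F lets mkfs.ext4 work on something that is not a block device,
# -q suppresses the progress output.
make_test_image() {
    image=$1
    truncate -s 64M "$image" || return 1
    mkfs.ext4 -q -F -E lazy_itable_init=0,lazy_journal_init=0 "$image"
}

if command -v mkfs.ext4 >/dev/null 2>&1 && command -v dumpe2fs >/dev/null 2>&1; then
    img=$(mktemp)
    # dumpe2fs -h prints only the superblock information
    make_test_image "$img" &&
        dumpe2fs -h "$img" 2>/dev/null |
        grep -E '^(Filesystem UUID|Inode count|Block count|Block size):'
    rm -f "$img"
else
    echo "mkfs.ext4 or dumpe2fs not found; skipping" >&2
fi
```

The fields shown are some of the superblock metadata mentioned earlier: the UUID, the number of inodes and blocks, and the block size.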

To mount the filesystem, create an appropriate mount point up front, and then run the mount command:

mkdir /mnt/ext4fs
mount /dev/vg00/ext4fs /mnt/ext4fs

If you want to mount the new filesystem automatically at boot time, add a corresponding entry in the /etc/fstab file.
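An /etc/fstab entry consists of six fields: device, mount point, filesystem type, mount options, dump flag, and fsck pass number. The following sketch prints a suitable line for the example volume without touching /etc/fstab itself. The helper name fstab_line is hypothetical, and the choice of "defaults" and the pass number 2 (checked after the root filesystem) are assumptions you might want to adapt.

```shell
# Print an fstab line for a non-root filesystem:
# dump flag 0 (no dump backups), fsck pass 2 (check after the root filesystem)
fstab_line() {
    device=$1
    mountpoint=$2
    fstype=$3
    options=$4
    printf '%s %s %s %s 0 2\n' "$device" "$mountpoint" "$fstype" "$options"
}

fstab_line /dev/vg00/ext4fs /mnt/ext4fs ext4 defaults
```

After reviewing the printed line, you could append it to /etc/fstab as root and test it with mount -a before the next reboot.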

You can optionally pass specific mount options with the -o parameter of the mount command; for example, -o ro mounts a filesystem read-only. See the Linux kernel documentation for a list of possible options [2]. Once the filesystem is mounted, /proc/mounts shows only a few options: some of the defaults (rw, relatime, data=ordered) plus any options that were set explicitly with the mount command or in /etc/fstab (for example, errors=remount-ro):

# grep ext4 /proc/mounts
/dev/sda1 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/mapper/vg00-ext4fs /mnt/ext4fs ext4 rw,relatime,data=ordered 0 0
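In scripts, it is often handier to look at the options of a single mount point, one option per line. The sketch below does this with awk on /proc/mounts-style input; the function name mount_options is hypothetical, and the sample lines are the ones shown above.

```shell
# Print the mount options (fourth field) of one mount point, one per line.
# Reads /proc/mounts-style input on stdin.
mount_options() {
    awk -v mp="$1" '$2 == mp {
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++) print opt[i]
    }'
}

# On a live system:
#   mount_options /mnt/ext4fs < /proc/mounts
# Demonstration with the two lines from the example output:
mount_options /mnt/ext4fs <<'EOF'
/dev/sda1 / ext4 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/mapper/vg00-ext4fs /mnt/ext4fs ext4 rw,relatime,data=ordered 0 0
EOF
```

Comparing the mount point with the complete second field avoids false matches that a plain grep for the path could produce.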

In addition to these options, other default options are active that /proc/mounts does not show. Since Linux kernel version 3.4, you can view this information through the proc filesystem. Listing 2 shows an example.

Listing 2: Filesystem Info in /proc

# cat /proc/fs/ext4/sda1/options
rw
delalloc
barrier
user_xattr
acl
resuid=0
resgid=0
errors=remount-ro
commit=5
min_batch_time=0
max_batch_time=15000
stripe=0
data=ordered
inode_readahead_blks=32
init_itable=10
max_dir_size_kb=0
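Because the options file contains one option per line, a script can easily look up a single setting, such as the active commit interval. The following is a minimal sketch; the function name ext4_option is hypothetical, and the sample input is an excerpt of the output in Listing 2.

```shell
# Look up one option in /proc/fs/ext4/<device>/options-style input on stdin.
# Prints the value for key=value options and "enabled" for plain flags;
# returns a non-zero status if the option is not listed.
ext4_option() {
    awk -F= -v key="$1" '
        $1 == key { v = ($2 == "" ? "enabled" : $2); print v; found = 1 }
        END { exit !found }'
}

# On a live system (device name as in Listing 2):
#   ext4_option commit < /proc/fs/ext4/sda1/options
# Demonstration with an excerpt of the listing:
ext4_option commit <<'EOF'
rw
delalloc
errors=remount-ro
commit=5
data=ordered
EOF
```

The non-zero exit status lets a calling script distinguish between an option that is switched off and one that is set to an empty or zero value.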

Filesystem Check

Before you check an Ext4 filesystem, be sure the filesystem is not mounted. To run the check, use the e2fsck program; as an alternative, you can also use the symbolic link, fsck.ext4. If the filesystem was cleanly unmounted, e2fsck considers it clean and exits without performing a full check; you can force a check with the -f parameter.
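When you call e2fsck from a script, its exit status tells you what happened. According to the e2fsck man page, the status is a sum of flags (1 for corrected errors, 2 for a recommended reboot, 4 for uncorrected errors, 8 for an operational error). The sketch below decodes these flags; the function name describe_fsck_status is hypothetical.

```shell
# Translate an e2fsck exit status into readable messages.
# The status is a bit mask, so several conditions can apply at once.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then echo "errors corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then echo "errors corrected, reboot recommended"; fi
    if [ $((status & 4)) -ne 0 ]; then echo "errors left uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then echo "operational error"; fi
    return 0
}

# Typical use as root on an unmounted filesystem:
#   e2fsck -f /dev/vg00/ext4fs
#   describe_fsck_status $?
describe_fsck_status 0
describe_fsck_status 4
```

A status of 4 or higher deserves attention, because it means that problems remain on the filesystem or that the check itself could not run properly.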