Clean your filesystem with FSlint

Garbage Collector

© Lead Image © denis cristo, 123RF.com

Article from Issue 177/2015

FSlint detects the source of filesystem problems and remedies or mitigates them while cleaning up the hard drive.

In principle, filesystems are large, not particularly intelligent databases that tend to gather a lot of dust and fluff over time. Occasionally checking and, if necessary, repairing filesystems to keep them functional is considered good practice.

At startup, fsck acts as a wrapper for various special tools that check and repair filesystems at the lowest (block) level. However, it does not take into account problems in the logical (directory) structure, which can cause virtually the same amount of damage in the case of a mishap. Problems in the directory layer of the filesystem typically include "dangling links" (symbolic links that point to non-existent files) and problematic file names, among other difficulties.
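To make the idea of a dangling link concrete, the following minimal sketch lists symbolic links whose targets no longer exist. It assumes GNU find (for the -xtype test); the function name list_dangling_links is a hypothetical helper for illustration and is not part of FSlint.

```shell
#!/bin/sh
# Sketch only: list symbolic links that cannot be resolved.
# Assumption: GNU find is available; -xtype l is true for a symlink
# whose target is missing, because find then falls back to the link itself.
# list_dangling_links is a hypothetical name, not an FSlint tool.
list_dangling_links() {
    find "${1:-.}" -xtype l -print
}
```

Calling list_dangling_links ~/projects would print one path per broken link below that directory; nothing is modified.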

Padraig Brady created FSlint [1], an easy-to-use solution with a straightforward graphical interface, fslint-gui, that combines and unifies more than a dozen tools, each of which addresses a specific task (Table 1). All tools can also be used independently of the interface, which enables scripting and supports work in the shell.

Table 1: FSlint Tools

Name             Function

findup           Finds duplicate files
findnl           Finds files with problematic names
findu8           Finds files with invalid UTF8 codes
findbl           Finds problematic symlinks (bad links)
findsn           Finds different files with identical names (same name)
finded           Finds empty directories
findid           Finds files with unknown users (dead user IDs)
findns           Finds non-stripped executables
findrs           Finds files with redundant spaces (redundant whitespace)
findtf           Finds temporary files
findul           Finds what seem to be unused libraries
zipdir           Shrinks ext2/3 directories

Support scripts under supprt/

md5sum_approx    Generates MD5 checksums for file parts
getffp           Generates paths in the find format
getffl           Generates library path in the find format
fslver           Processes errors
rmlint           Merges files

Support scripts under fstool/

lS               Sorts tasks by file size
edu              Determines file size
dupwaste         Calculates memory wasted by duplicate files
dir_size         Determines directory size

Setting up FSlint

Unfortunately, only a few distributions provide FSlint in their repositories. Those that do (including Ubuntu 14.04 LTS) install FSlint in the /usr/share/fslint/ directory or in a directory contained in the path ($PATH). Otherwise, you will have to download the tarball from the FSlint home page (Listing 1). An installation in the strictest sense is unnecessary because FSlint is a combination of Python code (GUI) and shell scripts (tools).

Listing 1

Installing FSlint

$ cd /usr/share/
$ sudo wget http://www.pixelbeat.org/fslint/fslint-2.44.tar.gz
$ sudo tar xf fslint-2.44.tar.gz
$ cd fslint-2.44/po/
$ sudo make
$ cd ../..
$ sudo rm fslint-2.44.tar.gz
$ PATH=$PATH:/usr/share/fslint-2.44/ && export PATH
$ fslint-gui &

Simply unpack the archive in any directory, switch to its po/ subdirectory, and call make to build the localization files. You must then add the FSlint directory to your $PATH to run the program. The PATH line from Listing 1 is fine for test purposes; for a permanent change, you would need to add or edit the line in your ~/.bashrc.
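One way to make the path change permanent without adding the same line twice is sketched below. The function name add_fslint_to_path is hypothetical, and the default target file ~/.bashrc assumes that Bash is your login shell; adjust both to your setup.

```shell
#!/bin/sh
# Sketch only: append an "export PATH" line to a shell startup file,
# but only if that exact line is not already present.
# add_fslint_to_path DIR [RCFILE] is a hypothetical helper, not part of FSlint.
add_fslint_to_path() {
    dir=$1
    rc=${2:-"$HOME/.bashrc"}
    # The line is written literally, so $PATH is expanded at login time.
    line="export PATH=\"\$PATH:$dir\""
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$rc" 2>/dev/null || printf '%s\n' "$line" >> "$rc"
}
```

For the directory used in Listing 1, the call would be add_fslint_to_path /usr/share/fslint-2.44; the change takes effect in newly started shells.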

If you need to package FSlint yourself to integrate it with your system's package manager, the FSlint homepage and the tool's FAQ offer instructions for how to do this in many common distributions [2].

Under the Hood

When you take a closer look at the FSlint package, you'll see that each tool is implemented as a shell script. To find file duplicates, for example, findup is used. The script is well structured and documented (Listing 2) and possesses a number of interesting details (e.g., filtering out duplicate files from the stream of file names).

Listing 2

findup Tool

[...]
# The following optional block, md5sums a small sample of each file,
# which can help when there are many files of the same size,
# even more so if they are large. This usually adds a small amount of
# runtime, however it can save a large amount of time in certain situations.
if "$script_dir"/supprt/md5sum_approx </dev/null 2>/dev/null; then
    xargs -r0 "$script_dir"/supprt/md5sum_approx |
    sort |                     #group duplicate files together
    uniq --all-repeated -w32 | #pick just duplicates
    cut -d' ' -f3- |           #get filenames
    sort |                     #sort by paths to try to minimise disk seeks
    tr '\n' '\0'               #delimit names with \0
else
    cat
fi |
[...]
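The checksum-sort-uniq idea in the excerpt can be tried in isolation. The following is a simplified sketch, not the findup implementation: it hashes every file completely with md5sum instead of sampling with md5sum_approx, and it assumes GNU coreutils and file names without newlines or backslashes. The name list_duplicates is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the paths of files whose content matches another file.
# Assumptions: GNU find, xargs, md5sum, and uniq; no newlines or
# backslashes in file names. list_duplicates is a hypothetical helper.
list_duplicates() {
    find "${1:-.}" -type f -print0 |
    xargs -r0 md5sum |             # one "<32 hex digits>  <path>" line per file
    sort |                         # group identical checksums together
    uniq -w32 --all-repeated |     # keep only lines whose checksum repeats
    cut -c35-                      # drop checksum and two-space separator
}
```

Hashing every file is much slower than the staged approach in the original script, which is exactly why findup narrows the candidates before computing full checksums.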

Furthermore, the shell scripts have additional options that are only indirectly available in the GUI. For example, -m (merge) merges the files found by using hard links (see the "Links: Hard and Soft" box), whereas -d (delete) removes all but a single copy of identical files. With -t (test), findup displays the identical files without executing an action. The script's output can be redirected to an external file, edited manually, and then reused.

Links: Hard and Soft

The man page for ln explains that links between files and directories can be created in four ways. A hard (physical) link is an additional name for the inode of an existing file; hard links can be created only for files and only within the same filesystem. You can generate soft or symbolic links for any target; the system resolves a relative symlink against the directory that contains the link at run time. Symlinks can also point at directories and can cross the boundaries of your own filesystem.
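The difference between the two link types can be observed directly. This minimal sketch assumes GNU stat (for the -c format option); inode_of and make_demo_links are hypothetical helper names, and the file names are arbitrary.

```shell
#!/bin/sh
# Sketch only: compare a hard link and a symbolic link to the same file.
# Assumption: GNU stat, which does not follow symlinks unless given -L.

# inode_of FILE: print the inode number of FILE itself.
inode_of() {
    stat -c '%i' -- "$1"
}

# make_demo_links DIR: create a file plus one link of each type in DIR.
make_demo_links() {
    printf 'data\n' > "$1/original"
    ln    "$1/original" "$1/hard"   # second name for the same inode
    ln -s original      "$1/soft"   # relative symlink with its own inode
}
```

After make_demo_links in a scratch directory, inode_of reports the same number for original and hard but a different one for soft. If you then delete original, hard still delivers the data, whereas soft becomes a dangling link.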

The use of find in the scripts has a special advantage: All relevant options for this command, such as the recursion level (-maxdepth and -mindepth), can be passed directly on the command line when you call a tool. Such restrictions can accelerate processing significantly or exclude certain directories and files. As shown in the first line of Listing 3, directories can be excluded; for individual files, you would use the construction in the second line.

Listing 3

Exclusions

$ findup \( -path "*/.svn" \) -prune -o
$ findup \( ! -name "*.tex" \)
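Because these expressions are ordinary find syntax, you can check what they select with plain find before handing them to an FSlint tool. The following sketch assumes GNU find; list_files_excluding is a hypothetical helper that combines both exclusions from Listing 3.

```shell
#!/bin/sh
# Sketch only: list regular files below a directory while skipping
# .svn directories and *.tex files, using the same find expressions
# as Listing 3. list_files_excluding is a hypothetical helper.
list_files_excluding() {
    # -prune stops find from descending into matching directories;
    # everything after -o applies to the remaining entries.
    find "${1:-.}" \( -path '*/.svn' \) -prune -o \
        -type f ! -name '*.tex' -print
}
```

If the file list looks right, the same expressions should be safe to pass to findup as shown in Listing 3.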

Whereas the FSlint "find" tools are used for finding files and directories (see the "FSlint find Tools" box), the scripts in the fslint/supprt/ directory all take actions. Some tools have two variants; for example, both fixdup.sh and fixdup delete or merge the selected files. Both draw on the formats generated by findup; however, fixdup is a Python script, so it works a little faster than the shell script variant. Four other interesting scripts reside under fstool/. Given the complex result sets (Figure 1), the best approach can be a fairly roundabout one.

FSlint find Tools

The FSlint tools are based on the powerful GNU find command, which walks through filesystems directory by directory. Ideally, you should use these tools in single-user mode, which you select when booting or enable later as superuser. Failing to enter this mode (e.g., because a server needs to remain accessible) can cause problems. For example, new duplicate files could appear in directories already checked by FSlint, or a user could modify files that had already been processed.

Like all technologies based on find, the tools reach their limits with big filesystems. Execution time increases proportionally with filesystem size. With today's terabyte-scale hard disks, even running scripts overnight cannot guarantee that the tools will find the relevant files. A second shortcoming can be serious as well: Not only does processing time grow with the size of the filesystem, so too does the RAM required. The system could then start to swap, slowing down processing even further.

Figure 1: In practice, you often receive very inhomogeneous results lists. A good alternative could be exporting into an external file that you then edit manually.

Working with the GUI

FSlint tools are a real asset for specialists and experienced Linux users. They make it possible to clean up filesystems effectively and accurately, albeit at an abstract level. For less savvy users, FSlint creator Brady developed a Python interface that provides roughly the same features as the packaged tools, but with a clear structure and with results displayed in an easy-to-understand manner. The tool requires superuser privileges if you want to use it on system directories.

Figure 2 shows the interface shortly after startup. You can select the directories to be searched in the upper section of the window (Search path). Mounted external filesystems can also be included here; FSlint processes them in exactly the same way. Checking the recurse? box adds subdirectories to the process. If this choice includes directories you would prefer to ignore, the Advanced search parameters tab lets you exclude individual directories from processing (e.g., .wine/ in the user's home directory).

Figure 2: The fslint-gui interface is straightforward.

In the lower row of buttons, you can select the desired action. Pressing Find displays the newly generated or updated list of results. To see all of the relevant information, you can enlarge the window and adjust the width of the columns.

At the very bottom of the window, a line displays statistical information for the results output and the error messages generated by the find* scripts or the commands called by them.
