Clean your filesystem with FSlint

Tools

The intended uses of the tools pooled under the GUI are relatively easy to understand, but a second glance at the details is still worthwhile.

  • Duplicates searches for files with identical content in the specified directories, independent of file names.
  • Installed packages lists the installed packages (on Debian-based systems only).
  • Bad names finds problematic file names.
  • Name clashes displays executable files with identical names.
  • Temp files finds files probably created for temporary caching that satisfy the typical name rules (*~, #*#, *.bak, …).
  • Bad symlinks displays faulty symbolic links.
  • Bad IDs identifies files with UIDs from non-existent users.
  • Empty directories lists directories without content.
  • Non stripped binaries supplies a list of programs and libraries from which unneeded debug code was not removed. You can normally correct this with strip <file>.
  • Redundant whitespace identifies files containing multiple spaces – more on this later.
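To see what the Non stripped binaries test looks for, you can reproduce it with file(1), which reports unstripped ELF binaries with the words "not stripped". The not_stripped function below is an illustrative sketch, not FSlint's own code:

```shell
#!/bin/sh
# Sketch of the "Non stripped binaries" check: scan a directory and
# print every ELF file that file(1) reports as "not stripped".
not_stripped() {
    find "$1" -type f 2>/dev/null | while read -r f; do
        case $(file -b "$f" 2>/dev/null) in
            *ELF*"not stripped"*) echo "$f" ;;
        esac
    done
}
```

Any file the function reports can then be treated with strip <file>, as described above.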

Detecting duplicate files is one of the foremost features of FSlint and is accordingly mature. The tool uses an algorithm that first identifies files of the same length. It then filters out those connected by hard links. Finally, it compares the remaining files using multiple checksums.
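The checksum stage of this strategy can be sketched in a few lines of shell. The dupes function below is a simplified illustration (it skips the size pre-filter and the hard-link test and goes straight to checksums): any hash that occurs more than once marks a group of identical files.

```shell
#!/bin/sh
# Simplified duplicate search: checksum every file and report the
# lines whose md5 repeats. Names play no role, only content.
dupes() {
    find "$1" -type f -exec md5sum {} + 2>/dev/null |
        sort |            # bring identical hashes next to each other
        uniq -w32 -D      # -w32: compare only the hash; -D: print all repeats
}
```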

Determining how you want to deal with the identified files occurs in two stages. For example, you can perform specific actions for individual or previously selected files via the context menu, or you can use the buttons under the results window – most of the time.

Select allows files to be marked in different ways. You can put the results list in a file using Save or delete the selected files with Delete. Merge combines the files found (i.e., replaces the duplicates with hard links). This action, unlike all other actions in FSlint, does not act on the files selected; rather, it acts on all the files not selected. The corresponding dialog window makes this matter clear (Figure 3) but can still be overlooked easily.
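What Merge does for a single duplicate pair boils down to replacing one file with a hard link to the other. The merge_pair function below is a minimal sketch of this idea (not FSlint's code); after it runs, both names refer to the same inode and the data is stored only once.

```shell
#!/bin/sh
# Sketch of merging one duplicate pair via a hard link.
# $1: the file to keep, $2: the duplicate to replace.
merge_pair() {
    cmp -s "$1" "$2" || return 1   # refuse unless the contents match
    ln -f "$1" "$2"                # -f: overwrite the duplicate with a link
}
```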

Figure 3: When merging files, the selected files remain unchanged.

Identical program names can lead to big problems if it is not unequivocally clear to the user which program should be used and when. Such situations typically occur when installing new, old, or different program versions – for example, under /usr/local/bin/. In such a case, the system $PATH order defines which version to use (Figure 4).
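You can reproduce this kind of clash detection in the shell: collect the entries of every $PATH directory and report the names that occur more than once. The path_clashes function below is an illustrative sketch:

```shell
#!/bin/sh
# List command names that occur in more than one search-path directory.
# $1: a colon-separated directory list (defaults to $PATH).
path_clashes() {
    echo "${1:-$PATH}" | tr ':' '\n' | while read -r d; do
        [ -d "$d" ] && ls -1 "$d" 2>/dev/null
    done | sort | uniq -d          # -d: print each name that repeats
}
```

For a single command, `which -a <name>` shows every match in search order; the first hit is the one a plain invocation would execute.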

Figure 4: As FSlint brings to light here, multiple versions of some programs coexist on the computer.

In the Search ($PATH) line above the results window, you can specify how and where FSlint searches for name conflicts. The variants offered under Conflicting files change as soon as you uncheck the Search ($PATH) box. Then, other conflicts, such as different spellings (different capitalization) or conflicts with alias definitions, can be detected.

You can determine in two places what FSlint finds when testing for Bad names: The Sensitivity slider above the results window sets the accuracy with which the test detects problems with four settings; high values (3 or 4) lead to more results. Additionally, you can display files with encoding problems by ticking the invalid UTF8 mode? checkbox. Both modifications require a separate run. In many cases – particularly when transferring files between multiple computers with different operating systems – you might want to use both the sensitivity slider and the UTF8 checkbox to minimize the remaining differences and combat compatibility issues. See also the "Bad Names" boxout.

Bad Names

FSlint uses findnl to find problematic file names. The options -1 (least stringent) to -3 (most stringent) check path and file names against maximum lengths (-3 requires the shortest names); the -p option checks POSIX compatibility. A higher sensitivity level also restricts the characters allowed in file names: At the lowest level, the script only flags non-standard characters (space, @, etc.) at the beginning or end of a name; higher levels also flag characters that need to be escaped in the shell, such as ?, *, $, and #; multiple dots in a name (*.*.*); and a number of other violations. The script states the rules very clearly in exactly 160 lines of code [3]. Under the Bad names option, FSlint also uses the findu8 and supprt/fslver scripts to check for UTF8 compatibility.
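One of these checks is easy to reproduce in the shell: flag any name containing characters outside the POSIX portable filename character set (letters, digits, dot, underscore, hyphen). The bad_names function below is an illustration, not findnl itself:

```shell
#!/bin/sh
# Report files whose names contain characters outside the POSIX
# portable filename character set (A-Z a-z 0-9 . _ -).
bad_names() {
    find "$1" -mindepth 1 2>/dev/null | while read -r p; do
        n=$(basename "$p")
        case $n in
            *[!A-Za-z0-9._-]*) echo "$p" ;;   # any disallowed character
        esac
    done
}
```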

The findtf script [4] detects Temp files. It has two important options that are also present in the graphical interface: The -c option, which corresponds to the core file mode? checkbox in the GUI, additionally finds core files, whereas --age=<days> corresponds to the minimum age spinner and only outputs files that are older than the specified threshold value (default =  ).

If you are still missing naming patterns for typical temporary files on your system, you can add them to the tmpFiles variable (Listing 4; lines 87 and 88 in the source) by specifying patterns with wildcards in single quotes (e.g., '*.del') or entering static file names directly (e.g., delme).

Listing 4

Adding Temp Files to Delete

[...]
tmpFiles="'*.del' delme core dead.letter '*~' ',*' '*.v' tmp junk '\#*' '.emacs_[0-9]*' '*.[Bb][Aa][Kk]' '*.swp' '*.pure'"
[...]
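The core of the temp-file search can be sketched with a single find call that combines the name patterns with an age threshold, much like the --age option. The find_temp function below is an assumed simplification, not findtf itself:

```shell
#!/bin/sh
# Find typical temp files older than a given number of days.
# $1: directory, $2: minimum age in days (default 0).
find_temp() {
    find "$1" \( -name '*~' -o -name '#*#' -o -name '*.bak' \
                 -o -name '*.swp' -o -name core \) \
         -type f -mtime +"${2:-0}" 2>/dev/null
}
```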

The findbl script identifies Bad symlinks – by default, dangling links (-d) – and provides options to detect the following problems:

  • -d: all dangling links (i.e., links that point to a nonexistent target)
  • -s: (suspicious) absolute links to a path below the current directory
  • -l: all relative links
  • -a: all absolute links
  • -n: all redundant links (/././., /////, /../, …)
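The default (-d) case is easy to reproduce with GNU find, whose -xtype l predicate matches symlinks whose target does not resolve. This is a sketch, not findbl's own code:

```shell
#!/bin/sh
# List dangling symlinks under a directory.
# -xtype l follows each link and is true only if the target is missing.
dangling_links() {
    find "$1" -xtype l 2>/dev/null
}
```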

FSlint returns all files whose owners or groups are not known or no longer exist in the system as matches for Bad IDs – the owners are thus missing in /etc/passwd, and the groups are missing in /etc/group. Such occurrences are indicative of installation problems or are the result of transferring files from other systems.
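The same test maps directly onto find's -nouser and -nogroup predicates, which match files whose numeric owner or group ID has no entry in /etc/passwd or /etc/group. A minimal sketch:

```shell
#!/bin/sh
# List files with orphaned owners or groups below a directory.
# -xdev keeps the search on one filesystem.
orphaned() {
    find "$1" -xdev \( -nouser -o -nogroup \) 2>/dev/null
}
```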

The check for Redundant whitespace checks the content of text files, not the file names. Lines with multiple successive spaces are sometimes found in automatically generated files (e.g., *.aux or *.log files) but are also caused by typos. Many programs, especially tools for evaluating protocols, have trouble dealing with redundant whitespace.

The relevant FSlint script supports three options: -c (count) counts the problematic lines, -w (whitespace) finds space characters at the ends of lines, and -t (tabs) reports mixed occurrences of tabs and spaces. The --view option displays the problematic passages in Vi, which requires the editor to be installed; however, Vi is installed by default in almost all popular distributions.
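The three checks can be approximated with plain grep calls. The functions below are assumed equivalents for illustration, not FSlint's own script:

```shell
#!/bin/sh
# Assumed grep equivalents of the three whitespace checks.
count_runs()  { grep -c '  '           "$1"; }  # -c: lines with repeated spaces
trailing_ws() { grep -n '[[:blank:]]$' "$1"; }  # -w: blanks at end of line
tab=$(printf '\t')
mixed_ws()    { grep -n "$tab \| $tab" "$1"; }  # -t: tab next to a space
```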

Conclusions

If you think your neat and tidy home directory and your well-organized system directories contain virtually no problematic files, then you are almost certainly laboring under a delusion. The results that come to light with the FSlint suite – even on purportedly the cleanest systems – speak to the quality of the clever garbage collection package.

To what extent the weaknesses identified are actual problems needs to be investigated on a case-by-case basis; however, dangling links, unnecessary duplicates, and directories without content are of no use to the user. The option to track down and remove orphaned temporary files alone would justify the regular use of FSlint. Thus, it is amazing that nearly all the popular distributions fail to include this practical tool in their repositories.

Quite aside from its practical benefits, FSlint proves to be a genuine treasure trove in another context: The suite can also serve as a source of inspiration for enthusiasts and beginners in the world of shell scripting – a perfect example of what readable scripts should look like.
