Clean your filesystem with FSlint
Garbage Collector
FSlint detects the source of filesystem problems and remedies or mitigates them while cleaning up the hard drive.
In principle, filesystems are large, not particularly intelligent databases that tend to gather a lot of dust and fluff over time. Occasionally checking and, if necessary, repairing filesystems to keep them functional is considered good practice.
At the system level at startup, fsck acts as a wrapper for various special tools that check and repair filesystems at the lowest (block) level; however, it does not take into account problems in the logical (directory) structure, which can cause virtually the same amount of damage in the case of a mishap. Problems in the directory layer of the filesystem typically include "dangling links" (symbolic links that point to non-existent files) and problematic file names, among other difficulties.
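Dangling links of this kind can also be spotted with plain GNU find, independent of FSlint. The following is a minimal sketch, assuming GNU find (for the -xtype test); the function name list_dangling_links is hypothetical and is not one of the FSlint tools.

```shell
# Sketch: list symbolic links whose target cannot be resolved.
# Assumes GNU find; -xtype l is true for symlinks that do not resolve.
# list_dangling_links is a hypothetical helper, not part of FSlint.
list_dangling_links() {
    find "${1:-.}" -xtype l -print
}
```

Called as list_dangling_links /some/directory, the function prints one path per broken link; without an argument, it searches the current directory.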
Padraig Brady created the easy-to-use solution FSlint [1] with a straightforward graphical interface, fslint-gui, that combines and unifies the use of more than a dozen tools, each of which addresses a specific task (Table 1). All tools can also be used independently of the interface, enabling scripting and supporting work in the shell.
Table 1
FSlint Tools
Name | Function |
---|---|
findup | Finds duplicate files |
findnl | Finds files with problematic names |
findu8 | Finds files with invalid UTF8 codes |
findbl | Finds problematic symlinks (bad links) |
findsn | Finds different files with identical names (same name) |
finded | Finds empty directories |
findid | Finds files with unknown users (dead user IDs) |
findns | Finds non-stripped executables |
findrs | Finds files with redundant spaces (redundant whitespace) |
findtf | Finds temporary files |
findul | Finds what seem to be unused libraries |
zipdir | Shrinks ext2/3 directories |
Support scripts under supprt/ | |
md5sum_approx | Generates MD5 checksums for file parts |
getffp | Generates paths in the find format |
getffl | Generates library path in the find format |
fslver | Processes errors |
rmlint | Merges files |
Support scripts under fstool/ | |
lS | Sorts tasks by file size |
edu | Determines file size |
dupwaste | Calculates memory wasted by duplicate files |
dir_size | Determines directory size |
Setting up FSlint
Unfortunately, very few distributions provide FSlint in their repositories. Those that do (including Ubuntu 14.04 LTS) install FSlint in the /usr/share/fslint/ directory or in a directory contained in the path ($PATH). In most cases, you will have to download the tarball from the FSlint home page (Listing 1). An installation in the strictest sense is unnecessary because FSlint is a combination of Python code (GUI) and shell scripts (tools).
Listing 1
Installing FSlint
$ cd /usr/share/
$ sudo wget http://www.pixelbeat.org/fslint/fslint-2.44.tar.gz
$ sudo tar xf fslint-2.44.tar.gz
$ cd fslint-2.44/po/
$ sudo make
$ cd ../..
$ sudo rm fslint-2.44.tar.gz
$ PATH=$PATH:/usr/share/fslint-2.44/ && export PATH
$ fslint-gui &
Simply unpack the archive in any directory, for example, below /usr/share/. Next, switch to fslint/po/ (fslint-2.44/po/ in Listing 1) and call make for correct localization. You must then add the FSlint directory to your $PATH to run the program. The PATH line from Listing 1 is fine for test purposes; for a permanent change, you would need to add or edit the line in your ~/.bashrc.
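For the permanent variant, lines along the following pattern could go into ~/.bashrc. This is only a sketch: the helper name path_append is hypothetical, and the directory is the one used in Listing 1, so it needs adjusting if you unpacked the archive elsewhere.

```shell
# Sketch for ~/.bashrc: append a directory to PATH only if it is missing.
# path_append is a hypothetical helper name; the directory below is the
# one from Listing 1 and is an assumption about where FSlint was unpacked.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, do nothing
        *) PATH="$PATH:$1" ;;     # append exactly once
    esac
}

path_append /usr/share/fslint-2.44
export PATH
```

Checking for an existing entry first keeps the variable from growing each time the file is sourced again.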
If you need to package FSlint yourself to integrate it with your system's package manager, the FSlint homepage and the tool's FAQ offer instructions for how to do this in many common distributions [2].
Under the Hood
When you take a closer look at the FSlint package, you'll see that each tool is implemented as a shell script. To find file duplicates, for example, findup is used. The script is well structured and documented (Listing 2) and possesses a number of interesting details (e.g., filtering out duplicate files from the stream of file names).
Listing 2
findup Tool
[...]
# The following optional block, md5sums a small sample of each file,
# which can help when there are many files of the same size,
# even more so if they are large. This usually adds a small amount of
# runtime, however it can save a large amount of time in certain situations.
if "$script_dir"/supprt/md5sum_approx </dev/null 2>/dev/null; then
    xargs -r0 "$script_dir"/supprt/md5sum_approx |
    sort |                     #group duplicate files together
    uniq --all-repeated -w32 | #pick just duplicates
    cut -d' ' -f3- |           #get filenames
    sort |                     #sort by paths to try to minimise disk seeks
    tr '\n' '\0'               #delimit names with \0
else
    cat
fi |
[...]
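The grouping step in Listing 2 can be tried in isolation. The sketch below is a simplification and not how findup itself works: It hashes whole files with md5sum instead of sampling them with md5sum_approx, assumes GNU findutils and coreutils, and assumes file names without newline characters. The function name list_duplicates is hypothetical.

```shell
# Sketch: print the paths of files with identical content below a directory.
# Simplification (not the findup implementation): hashes whole files with
# md5sum and assumes file names contain no newlines or backslashes.
# list_duplicates is a hypothetical helper name.
list_duplicates() {
    find "${1:-.}" -type f -print0 |
    xargs -r0 md5sum |            # one "digest  path" line per file
    sort |                        # bring identical digests together
    uniq --all-repeated -w32 |    # keep lines whose first 32 chars repeat
    cut -d' ' -f3-                # drop the digest and the two separators
}
```

Because an MD5 digest is 32 hexadecimal characters long, comparing only the first 32 characters of each line (-w32) means that uniq decides on the digest alone and ignores the path.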
Furthermore, the shell scripts have additional options that are only indirectly available in the GUI. For example, -m (merge) merges the files found by using hard links (see the "Links: Hard and Soft" box), whereas -d (delete) removes all but a single copy of identical files. With -t (test), findup displays the identical files without executing an action. The script's output can be redirected to an external file, edited manually, and then reused.
Links: Hard and Soft
The man page for ln explains that links between files and directories can be created in four ways. Hard (physical) links share the inode of an existing target and can therefore be created only for files within the same filesystem. You can generate soft or symbolic links for any target; the system interprets a relative symbolic link relative to its parent directory at run time. Symlinks can also point at directories and might go beyond the boundaries of your own filesystem.
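The difference shows up in the inode numbers. The following sketch assumes GNU coreutils (for stat -c) and works in a scratch directory created by mktemp; all file names are purely illustrative.

```shell
# Sketch: compare a hard link and a symbolic link to the same file.
# Assumes GNU coreutils (stat -c); all file names are illustrative only.
demo_dir=$(mktemp -d)
echo "data" > "$demo_dir/original"
ln    "$demo_dir/original" "$demo_dir/hard"   # hard link: shares the inode
ln -s original             "$demo_dir/soft"   # relative symlink: own inode

inode_original=$(stat -c %i "$demo_dir/original")
inode_hard=$(stat -c %i "$demo_dir/hard")
inode_soft=$(stat -c %i "$demo_dir/soft")     # inode of the link itself
link_count=$(stat -c %h "$demo_dir/original") # number of hard links

echo "original=$inode_original hard=$inode_hard soft=$inode_soft links=$link_count"
```

The hard link reports the same inode as the original and raises its link count, whereas the symbolic link is a separate small file that merely stores the text "original" and is resolved relative to the directory in which the link resides.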
The use of find in the scripts has a special advantage: All relevant options for this command, such as the recursion level (-maxdepth and -mindepth), can be passed directly on the command line when a tool is called. Restricting the search in this way can speed up processing significantly under certain circumstances or exclude certain directories and files. As shown in the first line of Listing 3, directories can be excluded; for individual files, you would use the construction in the second line.
Listing 3
Exclusions
$ findup \( -path "*/.svn" \) -prune -o
$ findup \( ! -name "*.tex" \)
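The expressions in Listing 3 are ordinary find syntax, so their effect can be checked with find alone before handing them to an FSlint tool. The following is a sketch under that assumption; the two function names are hypothetical, and the -type f and -print parts are additions made so that the commands stand on their own.

```shell
# Sketch: the find expressions from Listing 3, applied with plain find.
# Hypothetical helper names; FSlint itself is not required.

# Skip everything below any .svn directory, list the remaining regular files.
files_without_svn() {
    find "${1:-.}" \( -path "*/.svn" \) -prune -o -type f -print
}

# List regular files except those whose names end in .tex.
files_without_tex() {
    find "${1:-.}" -type f \( ! -name "*.tex" \) -print
}
```

The -prune action stops find from descending into the matched directory at all, which is why excluding large subtrees in this way also saves run time.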
Whereas the FSlint "find" tools are used for finding files and directories (see the "FSlint find Tools" box), the scripts in the fslint/supprt/ directory all take actions. Some tools have two variants; for example, both fixdup.sh and fixdup delete or merge the selected files. Both draw on the formats generated by findup; however, fixdup is a Python script, so it works a little faster than the shell script variant. Four other interesting scripts reside under fstool/. Given the complex result sets (Figure 1), the best approach can involve a fairly roundabout route.
FSlint find Tools
The FSlint tools are based on the powerful GNU find command, which reads filesystems piece by piece and directories one by one. Ideally, you should use these tools in single-user mode, which you select when booting or enable later as superuser. Failing to enter this mode (e.g., because a server needs to remain accessible) can cause problems. For example, new duplicate files could appear in directories already checked by FSlint, or a user could modify files that had already been processed.
Like all technologies based on find, the tools reach their limits with big filesystems. Execution time increases proportionally with filesystem size. With today's terabyte-scale hard disks, even running scripts overnight cannot guarantee that the tools will find the relevant files. A second shortcoming can be serious as well: Not only does processing time grow with the size of the filesystem, so too does the RAM required. The system could then start to swap, slowing down processing even further.
Working with the GUI
FSlint tools are a real asset for specialists and experienced Linux users. They make it possible to clean up filesystems effectively and accurately, albeit at an abstract level. For less experienced users, FSlint creator Brady developed a Python interface that provides roughly the same features as the packaged tools, but with a clear structure and with results that display in an easy-to-understand manner. The tool requires superuser privileges if you want to use it on system areas of the filesystem.
Figure 2 shows the interface shortly after startup. You can select the directories to be edited in the upper section of the window (Search path). Mounted external filesystems can also be included here, which FSlint processes in exactly the same way. Clicking the recurse? selection box adds subdirectories to the process. If this choice selects directories you would prefer to ignore, the Advanced search parameters tab lets you exclude individual directories from processing (e.g., .wine/ in the user's home directory).
In the lower row of buttons, you can select the desired action. Pressing Find reveals a list of the generated or updated results. To help display all of the relevant information, you can both enlarge the window and reduce the size of the columns.
At the very bottom of the window, a line displays statistical information for the results output and the error messages generated by the find* scripts or the commands called by them.