Detect duplicates with fdupes
Double Trouble
The command-line fdupes tool helps you find duplicate files across your directories.
Hard disks have the unpleasant tendency of filling up faster than expected. It is not always immediately obvious why. Keeping things tidy should not be underestimated in this context. Untidy, poorly organized hard disks tend to fill up faster than well-organized ones. Because life is a mixture of order and chaos, most users probably face this problem.
The unexpectedly high utilization level of hard disks is often caused by duplicate files. The typical candidates are photos, music, or videos, which can quickly occupy several gigabytes of space and are often difficult to find. There are several graphical applications on Linux to help you detect and remove duplicates like this, and there are several more for the command line.
GUI or CLI?
Well-known cleanup tools with a graphical interface include FSlint and dupeGuru. In this article, I will look at fdupes for the command line [1], first released in 2000. Most distributions include the tool, which weighs in at just over 100KB, in their repositories; you can install it using your distribution's package manager. Listing 1 shows the commands for Debian, Fedora, and Arch Linux.
Listing 1
Installing fdupes
### Debian and derivatives:
$ sudo apt install fdupes
### Fedora:
$ sudo dnf install fdupes
### Arch Linux and derivatives:
$ sudo pacman -S fdupes
The current 2.2.1 version from September 2022 has not made its way into all repositories [2]. If you want to compile fdupes from the source code, you can use the tarball from GitHub. After unpacking, just follow the familiar three-step process of ./configure, make, and make install. As of fdupes 2.0, there are two dependencies that you may also need to resolve yourself, depending on the distribution. To do this, follow the instructions in the INSTALL file from the unpacked archive.
After the install, you can use the tool immediately without any configuration. It identifies duplicate files in the specified directories in several steps. The file name plays no role in detecting a duplicate. Instead, two files must first be the same size; if they are, fdupes compares their MD5 checksums. Finally, the software performs a byte-by-byte comparison to make sure that the contents really are identical.
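The three stages can be approximated with standard tools. The following is only a minimal sketch of the idea, not how fdupes is implemented internally: the same_file function name is made up for illustration, and the script assumes that wc, md5sum, and cmp are available, as on a typical GNU/Linux system.

```shell
#!/bin/sh
# Illustrative only: mimics the size -> MD5 -> byte-by-byte sequence
# described above. same_file is a hypothetical helper, not part of fdupes.
same_file() {
    a=$1
    b=$2

    # Stage 1: files of different sizes cannot be duplicates.
    size_a=$(wc -c < "$a")
    size_b=$(wc -c < "$b")
    [ "$size_a" -eq "$size_b" ] || return 1

    # Stage 2: compare the MD5 checksums (first field of the md5sum output).
    sum_a=$(md5sum < "$a" | cut -d ' ' -f 1)
    sum_b=$(md5sum < "$b" | cut -d ' ' -f 1)
    [ "$sum_a" = "$sum_b" ] || return 1

    # Stage 3: a byte-by-byte comparison rules out a checksum collision.
    cmp -s "$a" "$b"
}

# Usage: identical content under different names, plus one unrelated file.
tmpdir=$(mktemp -d)
echo "fdupes finds and removes duplicates." > "$tmpdir/one.txt"
echo "fdupes finds and removes duplicates." > "$tmpdir/two.txt"
echo "something else entirely" > "$tmpdir/three.txt"

if same_file "$tmpdir/one.txt" "$tmpdir/two.txt"; then
    echo "one.txt and two.txt are duplicates"
fi
```

Checking the size first is cheap and filters out most candidates before any file content has to be read, which is presumably why the stages run in this order.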
Fdupes has numerous options that let you control the search and the subsequent deduplication. Initially, you will want to familiarize yourself with the tool by running the fdupes --help command. This will help you identify the options that suit your use case.
Test Run
For the test, I created an fdupes directory in the Documents directory and then created 10 text files whose content read "fdupes finds and removes duplicates." Listing 2 shows you how to do this quickly.
Listing 2
Create Multiple Text Files at the Same Time
mkdir /home/"$USER"/Documents/fdupes \
  && cd /home/"$USER"/Documents/fdupes \
  && for i in {1..10}; do echo "fdupes finds and removes duplicates." > fdupes${i}.txt; done
A subsequent ls -l confirms that the files were created. The easiest way to search for duplicates in the new directory is to use the fdupes ~/Documents/fdupes command (Figure 1). By separating the paths with spaces, you can specify multiple directories at the same time. To search recursively in directories, you need to use the -r option, as in fdupes -r ~/Documents (Figure 2). In this case, the tool finds my 10 text files along with some other duplicates. Use the -R option instead to recurse only into the directories listed after it.
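To see the difference between a flat and a recursive search without touching real data, you can try both on a throwaway tree. This is a sketch under assumptions: the directory layout and file names are invented for the example, and the script skips the search if fdupes is not installed.

```shell
#!/bin/sh
# Build a temporary tree with one duplicate hidden in a subdirectory.
# All paths below are made up for illustration.
tree=$(mktemp -d)
mkdir -p "$tree/top/sub" "$tree/other"
echo "same content" > "$tree/top/a.txt"
echo "same content" > "$tree/top/sub/b.txt"
echo "same content" > "$tree/other/c.txt"

if command -v fdupes >/dev/null 2>&1; then
    # Two directories, no recursion: sub/b.txt is not examined.
    fdupes "$tree/top" "$tree/other"
    echo "---"
    # With -r, fdupes also descends into top/sub.
    fdupes -r "$tree/top" "$tree/other"
else
    echo "fdupes is not installed; skipping the search" >&2
fi
```

The second call should list one more file than the first, because only the recursive search reaches the copy in the subdirectory.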
The -S (--size) option shows you the size of the hits. You can use -t or --time to find out when a file was last modified. -G or --minsize=SIZE and -L or --maxsize=SIZE let you further narrow down the selection.
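These options can be combined. The following sketch assumes an fdupes 2.x build that supports -t and -G, and it uses invented file names in a temporary directory; older versions may reject the options, which is why the call has a fallback message.

```shell
#!/bin/sh
# Two 2048-byte duplicates and two tiny duplicates in a scratch directory.
# File names and sizes are arbitrary choices for this example.
sizedemo=$(mktemp -d)
head -c 2048 /dev/zero > "$sizedemo/big1.bin"
cp "$sizedemo/big1.bin" "$sizedemo/big2.bin"
echo "tiny" > "$sizedemo/small1.txt"
cp "$sizedemo/small1.txt" "$sizedemo/small2.txt"

if command -v fdupes >/dev/null 2>&1; then
    # -S shows the size of each duplicate set, -t the modification time;
    # -G 1024 ignores files smaller than 1024 bytes, so the tiny pair
    # should not appear in the output.
    fdupes -S -t -G 1024 "$sizedemo" \
        || echo "this fdupes version may not support -t or -G" >&2
else
    echo "fdupes is not installed; skipping the search" >&2
fi
```

Setting a minimum size is a quick way to concentrate on the duplicates that actually cost disk space, such as photos or videos.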
Be Careful When Removing
But finding is only the first part of the task; after all, we want to delete duplicates to clean up the hard disk. This is where the -d (--delete) option comes in. When using -d, always make sure that your path specification is correct, because files deleted with fdupes cannot be recovered. The command
fdupes -d ~/Documents/fdupes
first lists the files in a numbered list (Figure 3). Note that the number at the beginning of the line will not necessarily match the number in the file name. If you now enter numbers separated by commas, the corresponding files are tagged with a plus sign and remain intact, while the software removes all of the duplicates tagged with a minus sign.
If you make a mistake, the rg command cancels your previous entries. Pressing Delete applies your entries. If you want to remove all duplicates except the first one displayed, use the command
fdupes -r -d -N /path
You do not need to press Delete here; the -N (--noprompt) option works without any confirmation.
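Because -N deletes without asking, it is worth previewing the result first. One way to do this, sketched here on a scratch directory with invented file names, is the -f (--omitfirst) option, which leaves the first file of each set out of the listing; what remains should correspond to the files that -d -N would remove.

```shell
#!/bin/sh
# Three identical files in a temporary directory (names are illustrative).
deldemo=$(mktemp -d)
for i in 1 2 3; do
    echo "fdupes finds and removes duplicates." > "$deldemo/fdupes$i.txt"
done

if command -v fdupes >/dev/null 2>&1; then
    # Preview: list every duplicate except the first file of each set.
    fdupes -r -f "$deldemo"

    # Delete for real, but only inside the scratch directory.
    fdupes -r -d -N "$deldemo"
else
    echo "fdupes is not installed; nothing was deleted" >&2
fi

# Count what is left; exactly one copy should survive the deletion.
remaining=$(ls "$deldemo" | wc -l)
echo "files remaining: $remaining"
```

Running the preview and the deletion with the same options and paths matters, because a different path order could change which file counts as the first in a set.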
Another selection option after calling fdupes with the -d option relies on the sel command. You can select all files with a specific term in the path by typing sel <term>. To select all files whose path starts with the term, use selb <term>. Use sele <term> to select files whose path ends with the term. To select all files whose path corresponds exactly to the term, use the selm <term> command. After that, you can decide which of the candidates you want to keep. Further options are described by the help command, which displays the matching fdupes man page sections.