Tools for reconstructing deleted data
Rescue Mission
One false click can quickly delete important data, or even an entire partition. If there is no backup, only a rescue specialist can help.
Accidentally deleting data without a backup is a nightmare scenario for many users. Even if the application prompts for confirmation before deleting, the user only has to click too fast, and the data is gone. The power of the command line is another threat. An incorrect command parameter could send an entire directory tree into the black hole of oblivion.
Despite the danger of losing data, surveys show that users often create inadequate or no backups [1]. Luckily, the Linux environment includes several tools for reconstructing lost data.
Organizational Matters
Mass storage – whether it be hard drives, solid state drives, or optical discs – always manages and stores data in an organizational structure. Some filesystems also inform the operating system about the size, location, and directory attributes of the file resource using metadata.
The filesystem maintains a table of contents, so the operating system can locate files on the storage medium. The system tracks the initial and subsequent clusters, as well as the number of clusters occupied by a file. The internal directory either creates references to the corresponding data or generates a table.
If you delete a file, the data does not automatically disappear. Instead, the filesystem simply drops the corresponding entry from its table of contents. In some cases, it might only delete the first letter of the filename in the directory, while the file lives on anonymously. The original file really only disappears when the system overwrites the occupied sectors of the storage medium with other data.
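The "first letter" behavior is typical of FAT filesystems, where a deleted directory entry has the first byte of its filename replaced with the marker 0xE5. The following Python sketch illustrates this, assuming classic 32-byte short-name (8.3) directory entries; the helper names are made up for this example, and long filenames and the cluster chain are ignored.

```python
import struct

ENTRY_SIZE = 32          # size of one short-name FAT directory entry
DELETED_MARKER = 0xE5    # first name byte of a deleted entry


def make_entry(name, ext, first_cluster, size):
    """Build a minimal 32-byte directory entry (hypothetical test helper)."""
    entry = bytearray(ENTRY_SIZE)
    entry[0:8] = name.ljust(8).encode("ascii")
    entry[8:11] = ext.ljust(3).encode("ascii")
    struct.pack_into("<H", entry, 20, first_cluster >> 16)
    struct.pack_into("<HI", entry, 26, first_cluster & 0xFFFF, size)
    return bytes(entry)


def parse_entry(entry):
    """Decode name, start cluster, and size; flag deleted entries."""
    if len(entry) != ENTRY_SIZE:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    deleted = entry[0] == DELETED_MARKER
    raw_name = entry[0:8]
    if deleted:
        # The first character is gone; show a placeholder instead.
        raw_name = b"?" + raw_name[1:]
    name = raw_name.decode("ascii", "replace").rstrip()
    ext = entry[8:11].decode("ascii", "replace").rstrip()
    (cluster_hi,) = struct.unpack_from("<H", entry, 20)
    cluster_lo, size = struct.unpack_from("<HI", entry, 26)
    return {
        "name": f"{name}.{ext}" if ext else name,
        "deleted": deleted,
        "first_cluster": (cluster_hi << 16) | cluster_lo,
        "size": size,
    }


def mark_deleted(entry):
    """Simulate deletion: only the first name byte changes."""
    return bytes([DELETED_MARKER]) + entry[1:]
```

In this sketch, the start cluster and file size survive the deletion, which is why a recovery tool can often still find the data as long as nothing has overwritten it.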
If you accidentally delete a file, the chances are quite good that the system administrator can still reconstruct the file or at least fragments of it with the appropriate tool, especially on large storage devices. (See the "Preparations" box for information on getting your storage medium ready for data recovery.)
Preparations
If data has disappeared from a storage medium, the administrator needs to consider a few factors to reconstruct it successfully.
You will need to use the affected medium in read-only mode to prevent accidental overwriting of the residual data. Specifying a directory on the affected storage medium as a target path for reconstructed files is not a good idea. It is best to launch the computer system from a Live CD or a USB flash drive with a Live system on it.
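Before starting, it is worth checking that no partition of the affected medium is mounted with write access. On Linux, /proc/mounts lists each mount as "device mount-point type options ...". The following sketch parses that format; the device names used in the usage note are only examples.

```python
def writable_mounts(device, mounts_text):
    """Return the mount points where `device` is mounted read-write."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        source, mount_point, _fstype, options = fields[:4]
        if source == device and "rw" in options.split(","):
            hits.append(mount_point)
    return hits


def check_device(device, mounts_file="/proc/mounts"):
    """Read the kernel's mount table and check one device."""
    with open(mounts_file) as handle:
        return writable_mounts(device, handle.read())
```

A call such as check_device("/dev/sdb1") returns an empty list if the partition is unmounted or mounted read-only. Mount points containing spaces appear escaped in /proc/mounts, which this sketch does not decode.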
Sometimes third parties want to edit the data carrier professionally and forensically – for example, in the case of official inquiries or legal conflicts that require watertight data reconstruction documentation. In this case, the administrator cannot work with the original storage medium anyway. Instead, the admin needs to make a complete copy of the medium and verify it using a checksum; this copy can then be used to attempt data recovery.
A copy of a faulty disk can be created with the fairly ancient but reliable command-line program dcfldd [2], which exists as an extension of the dd command in Linux. It is also found in the software repositories of many distributions, such as Debian, Ubuntu, Fedora, Slackware, CentOS, and Mageia.
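The copy-and-verify idea described in this box can be sketched in a few lines of Python. This is only an illustration of the principle, not a replacement for dcfldd: it assumes the source can be read without errors, and the function names are invented for this example.

```python
import hashlib

CHUNK = 1024 * 1024  # read in 1 MiB blocks to keep memory use low


def sha256_of(path):
    """Compute the SHA-256 checksum of a file or device."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_checksum(source, target):
    """Copy source to target and return the SHA-256 of the bytes read."""
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(target, "wb") as dst:
        for chunk in iter(lambda: src.read(CHUNK), b""):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def image_and_verify(source, target):
    """Create an image, then re-read it and compare the checksums."""
    source_hash = copy_with_checksum(source, target)
    if sha256_of(target) != source_hash:
        raise OSError("image checksum does not match the source")
    return source_hash
```

The returned checksum can be stored with the image, so anyone working on the copy later can confirm that it still matches the original.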
PhotoRec
PhotoRec [3], which is published under the GNU GPLv2 and is already part of most rescue distributions, is a classic tool for reconstructing deleted data. Also, many Mandriva-based distributions, openSUSE, Fedora, and CentOS offer PhotoRec as a separate package with an optional GUI.
In Linux distributions, PhotoRec is often part of the TestDisk package, which reconstructs whole mass storage partitions. It can be installed from the repositories on Debian and Ubuntu. The TestDisk project website [4], in addition to the source code, offers current versions of TestDisk that work with kernel 2.6.18 or later.
PhotoRec – contrary to what the name suggests – does not just recover image files, but also archives, various document formats, and multimedia data. In total, PhotoRec supports hundreds of formats from more than 100 format families [5].
At the same time, PhotoRec operates independently of the underlying filesystem. It doesn't matter whether the data to be reconstructed is on an ext2, ext3, or ext4 partition; a FAT, VFAT, or NTFS drive; or – in the Apple world – on an HFS filesystem.
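Filesystem independence is possible because tools of this kind search the raw data for known file signatures rather than relying on directory entries. The following Python sketch shows the idea for JPEG images only. It is a heavily simplified illustration and not PhotoRec's actual implementation: it assumes that each file is stored contiguously and that the end marker does not occur earlier inside the image data.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus next marker byte
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(data):
    """Return every byte range that looks like a complete JPEG file."""
    found = []
    pos = 0
    while True:
        start = data.find(JPEG_START, pos)
        if start == -1:
            break  # no further header signature
        end = data.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # header without a matching end marker
        end += len(JPEG_END)
        found.append(data[start:end])
        pos = end  # continue searching behind this file
    return found


def carve_from_image(path):
    """Carve JPEGs from a disk image small enough to fit in memory."""
    with open(path, "rb") as handle:
        return carve_jpegs(handle.read())
```

Because the search never consults a directory, carved files lose their original names and paths, which is why recovered files end up with generated names.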
PhotoRec can also cope with a wide variety of media: In addition to traditional hard drives and USB flash memory devices, source media also include SSDs, SD and compact flash cards, and optical discs.
After installation, the software is immediately available at the command line but requires administrative privileges. The QPhotoRec GUI [3] (Figure 1) for Qt-based work environments, such as KDE Plasma 5, also requires root privileges. After installation, it can be found under the System menu of the KDE desktop.
After running the command, first select the drive from which you want to reconstruct the data. At the command line, PhotoRec offers a selection from a table of detected drives, whereas QPhotoRec shows a selection box in the upper part of the window.
The GUI already anticipates the next step, which the user otherwise needs to complete at the command line: It shows the partition table of the activated mass storage device, from which you then select a partition if necessary. You then specify the filesystem of the selected partition; the software only supports two options here. If the source drive does not have an ext2, ext3, or ext4 filesystem, choose the Other option.
In the command-line version, PhotoRec asks for the destination where you would like to store the reconstructed files and then starts the search when you press C; it informs you of the progress in real time. The software stores the recovered files in numbered subdirectories within the target directory.
While the console displays the options in separate dialogs, the GUI user can change options in just two displays: After specifying the source drive, you specify the desired partition in a tabular list. Define the filesystem and the target path in the same dialog with a selection box and an input field.
Over time, mass storage can accumulate hundreds of thousands of files of various formats, which PhotoRec can recover. Reconstructing all available file types can take several hours and uses a huge amount of storage space in the target directory. The numerous subfolders very quickly make this type of complete reconstruction confusing.
The software therefore provides the option to exclude arbitrary file formats from the reconstruction. In the PhotoRec command-line version, a separate dialog lists the available formats one by one; an x to the left selects the file type. You can deactivate unwanted formats by removing the x; deselected formats are then ignored in the subsequent recovery.
QPhotoRec lets you ignore file formats much more conveniently by clicking on the File Formats button at the bottom of the program window. You can then select your favorites from a tabular list. If you only want to rescue a few file types, you first need to deactivate all formats by clicking the Reset button and add the desired file types by checking the boxes (Figure 2).
After setting the options in the dialog box, you can launch the reconstruction by clicking the Search button at the bottom of the window. QPhotoRec then displays the progress with a bar chart and a table. When finished, a click on Quit terminates the program.
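After a run, a short script can help regain an overview of the numbered subdirectories. The following Python sketch counts the recovered files by extension. It assumes the subdirectories use PhotoRec's usual recup_dir.N naming; adjust the prefix if your target directory looks different.

```python
from collections import Counter
from pathlib import Path


def summarize_recovery(target_dir, prefix="recup_dir."):
    """Count recovered files per extension across all subdirectories."""
    counts = Counter()
    for subdir in sorted(Path(target_dir).glob(prefix + "*")):
        if not subdir.is_dir():
            continue
        for item in subdir.iterdir():
            if item.is_file():
                # Files without an extension are grouped separately.
                counts[item.suffix.lower() or "(none)"] += 1
    return counts


def print_summary(target_dir):
    """Print the extensions, most frequent first."""
    for extension, number in summarize_recovery(target_dir).most_common():
        print(f"{extension:10} {number}")
```

The summary also shows at a glance which formats are worth excluding the next time you run a recovery on a similar medium.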
Out of the box, QPhotoRec also provides a logging function that stores data carrier settings in an ASCII file located in the logged-on user's home directory. In addition to information about partitions and filesystems, there are also hardware and operating system-specific settings in the file that particularly help in forensic work.
TestDisk
TestDisk [4], which many distributions provide in the same package as PhotoRec, reconstructs partitions and filesystems. The command-line program is included with many Live systems for data reconstruction [6], but it is also found in the repositories of all major Linux distributions. If you run the application with root privileges, a text screen pops up suggesting that you create a logfile. Later, all your work steps can be traced.
In another dialog box, you select the affected medium from a list of detected drives. If a valid partition table is missing on the medium, TestDisk lets you search for one with the Analyse option. This includes GUID Partition Tables (GPTs) that are used with today's very large data carriers instead of conventional master boot record (MBR) partition tables.
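The difference between the two partition table types can be seen in the first two sectors of a disk. An MBR ends with the boot signature 0x55 0xAA at offset 510, and a GPT header starts with the text "EFI PART" in the second sector. The following Python sketch checks for both. It assumes 512-byte logical sectors and only looks at the signatures, whereas a tool like TestDisk validates far more than that.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

# MBR partition entries store start and length as 32-bit sector counts,
# so a single entry can address at most 2**32 sectors.
MBR_LIMIT_BYTES = 2**32 * SECTOR_SIZE  # 2 TiB with 512-byte sectors


def detect_partition_table(first_sectors):
    """Classify the first two sectors of a disk as 'gpt', 'mbr', or 'none'."""
    has_boot_signature = first_sectors[510:512] == b"\x55\xaa"
    has_gpt_header = first_sectors[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART"
    if has_gpt_header:
        # GPT disks also carry a protective MBR in the first sector.
        return "gpt"
    if has_boot_signature:
        return "mbr"
    return "none"


def read_first_sectors(path):
    """Read the first two sectors of a disk or image file."""
    with open(path, "rb") as disk:
        return disk.read(2 * SECTOR_SIZE)
```

The MBR_LIMIT_BYTES constant shows why very large data carriers need a GPT: With 32-bit sector counts and 512-byte sectors, an MBR entry cannot describe more than 2TiB.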
Depending on the storage medium's size, the data structure analysis can take a while. If TestDisk detects damaged partitions, it displays them in the terminal, including the filesystem used (Figure 3).
Once you have selected the medium to be repaired, TestDisk asks for the partition table type. For large disks with partitions larger than 2TB, select EFI GPT; in the case of smaller partitions, you can usually keep the Intel default. The software then displays the available tools for repair. The first two options, Analyse and Advanced, are used to recover damaged partition tables and to work with files and entire partitions.
You can copy individual files or create images here. By using the Boot menu, you can also repair faulty boot sectors or set up the backup for a boot sector. In this case, you control the software by entering individual letters. Dialogs describe what the respective letters do. You need to take into account that the letters are partially case sensitive and trigger different actions depending on whether they are uppercase or lowercase.
The other options, Geometry, Options, MBR Code, and Delete, help to adapt faulty disk geometries, repair a storage medium's MBR, or completely empty a partition table.
TestDisk also provides the option to create a logfile in the user's home directory immediately after startup. Because the log is a plain text file, its format is deliberately independent of the application and operating system. It records all your work steps, so they can be easily traced at any time.