Locate and fix hardware faults
SMART Monitoring
Problems with mass storage are particularly critical, because, in the worst case, they result in data loss. Therefore, as early as the 1990s, several leading hard disk manufacturers collaborated with IBM to launch the Self-Monitoring, Analysis and Reporting Technology (SMART) standard [3], which, as a diagnostic tool, proactively warns you about mass storage medium failures. SMART technology works with threshold values and has long been integrated into virtually all mass storage devices. Even modern SSDs use parts of it.
On Linux, the Smartmontools package [4] takes care of testing and evaluating SMART data. Almost all distributions carry this collection of command-line tools in their repositories, so you can easily install it with your distribution's package manager.
Smartmontools, a command-line program run in a terminal, is now supported by various graphical front ends, probably because of the extensive set of command-line parameters. Most of these GUIs are tailored to individual desktop environments, and they often support both reading the mass storage medium's various operating parameters and options for running benchmarks or testing the hard disks or SSDs built into the system. The most well known are:
All three front ends display the selected parameters of the mass storage medium in a table and additionally run test routines (Figure 3).
Poor Values
Smartmontools normally allows both a quick test and a more in-depth test (Figure 4), but even if the drives pass all tests, individual values sometimes indicate problems.
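On the command line, the quick and in-depth tests correspond to smartctl's short and long self-tests. The following is a minimal sketch rather than a finished script: the function name smart_selftest and the SMARTCTL override variable are made up for illustration, and /dev/sdX is a placeholder for your drive.

```shell
#!/bin/sh
# Sketch: start a SMART self-test with smartctl (part of Smartmontools).
# SMARTCTL is a hypothetical override so the function can be dry-run
# with SMARTCTL=echo; it is not a smartctl feature.
SMARTCTL="${SMARTCTL:-smartctl}"

smart_selftest() {
    dev="$1"
    kind="${2:-short}"
    case "$kind" in
        short|long) ;;
        *) echo "usage: smart_selftest DEVICE [short|long]" >&2; return 2 ;;
    esac
    # -t asks the drive to start the test; it runs on the drive itself
    "$SMARTCTL" -t "$kind" "$dev"
}

# Once the drive has finished, the self-test log shows the outcome:
#   smartctl -l selftest /dev/sdX
```

Called with admin rights as smart_selftest /dev/sdX long, the function starts the in-depth test; the drive remains usable while the test runs.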
For example, with conventional hard drives in desktop systems, it is a good idea to take a look at the load-cycle-count value shown under ID 193 in the table. In some Linux distributions, the hard disk drive heads move to park position too frequently because of overly aggressive default ACPI settings. This not only means excessive mechanical wear and tear but also spoils any attempts to save energy: When repositioning from the park position, the heads need far more energy than they do in normal use.
If the table shows a six-digit value at this point, you will want to back up your data: All major hard drive manufacturers quote a rated lifetime of between 300,000 and 600,000 load cycles for their drives, depending on the model.
If you then compare the load cycle counts with the values displayed in the power-on-hours or power-on-time section of the SMART table and determine that the hard drive heads have often parked despite a relatively short period of operation, it is advisable to adjust the ACPI values to stop this undesirable behavior [8].
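The comparison can be sketched as a small filter over the attribute table that smartctl -A prints. It assumes the usual ten-column layout with the raw value in the last column, that power-on hours are reported under ID 9, and that load cycles are reported under ID 193; the function name cycles_per_hour is made up for illustration.

```shell
#!/bin/sh
# Sketch: relate the load cycle count (ID 193) to power-on hours (ID 9).
# Reads the output of "smartctl -A /dev/sdX" on standard input.
cycles_per_hour() {
    awk '
        $1 == 9   { hours  = $10 + 0 }   # +0 keeps only the leading number
        $1 == 193 { cycles = $10 + 0 }
        END {
            if (hours > 0) printf "%.1f\n", cycles / hours
            else exit 1                  # no usable power-on hours found
        }'
}

# Usage (with admin rights):
#   smartctl -A /dev/sdX | cycles_per_hour
```

A drive that reports 200,000 load cycles after 4,000 hours of operation averages 50 cycles per hour; at that rate, it would reach the lower rating of 300,000 cycles after only 6,000 hours.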
Smartmontools and its graphical front ends do not monitor the state of the overall system; rather, they focus on mass storage media, which also include optical drives. Problems can arise if the external hard disks you are monitoring use a USB or IEEE 1394 (FireWire) adapter. The signal converters required for these mass storage media often fail to pass through the SMART values; in this case, the drives cannot be analyzed.
The badblocks command-line tool, which is part of the e2fsprogs tool collection [9], can then step into the breach, if need be. Most Linux distributions come with the tool installed; the following command launches a non-destructive read-write test:
badblocks -n -s -v /dev/<drive label>
Here, the progress and results appear in the terminal, but be careful: If you use the application incorrectly, you risk losing data. It is strongly recommended that you read the corresponding man page before use or call the tool with the --help parameter to learn the command syntax.
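One way to reduce the risk is to make sure the drive is not in use before the test starts. The following sketch wraps the command shown above in a simple check against the kernel's mount table. The function names and the MOUNTS_FILE override are made up for illustration, and the prefix match is deliberately coarse: It does not detect drives used through LVM, RAID, or encrypted mappings.

```shell
#!/bin/sh
# Sketch: refuse to run the read-write test on a drive that is mounted.
# MOUNTS_FILE is a hypothetical override; the default is the kernel's
# mount table.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

is_mounted() {
    # True if DEVICE itself or one of its partitions (prefix match)
    # appears in the first column of the mount table.
    awk -v dev="$1" 'index($1, dev) == 1 { found = 1 }
                     END { exit !found }' "$MOUNTS_FILE"
}

safe_badblocks() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it first" >&2
        return 1
    fi
    # Same options as above: non-destructive read-write test with
    # progress display and verbose output.
    badblocks -n -s -v "$1"
}

# Usage (with admin rights):
#   safe_badblocks /dev/sdX
```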
hdparm
Hdparm [10] is a very powerful tool for modifying hard disk parameters. The software is provided by virtually all Linux distributions; if it is not already preinstalled, you can install it with the respective package manager. Because the program can permanently change the settings of mass storage media, you should carefully study the documentation before using the tool's various options; this will help you avoid potential data loss or even damage to the hardware from incorrect settings.
For an initial overview of the hard disk parameters, enter
hdparm -I /dev/<drive label>
at the prompt with admin rights; in current Linux distributions, the drive label will usually correspond to the block device sda. The software then outputs all the relevant data in a list (Figure 5). While doing so, hdparm can hide information not relevant for the type of drive: For example, a rotational speed does not appear for a flash drive.
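If you only need one value from this list, you can filter the output. The following sketch picks out the rotational speed and assumes that hdparm labels the line "Nominal Media Rotation Rate", as recent versions do; the function name rotation_rate is made up for illustration.

```shell
#!/bin/sh
# Sketch: extract the rotational speed from "hdparm -I" output read on
# standard input. The label text is an assumption about hdparm's output.
rotation_rate() {
    awk -F': *' '
        /Nominal Media Rotation Rate/ { print $2; found = 1; exit }
        END { if (!found) print "not reported" }'
}

# Usage (with admin rights):
#   hdparm -I /dev/sdX | rotation_rate
```

For a conventional hard disk, the function prints the speed in revolutions per minute; if the drive does not report the field, it prints "not reported".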