Troubleshooting for beginners

Painkiller

© Lead Image © Vasyl Nesterov, 123RF.com

It's not easy for beginners to solve problems in an operating system they haven't used before. We show you how to deal with some common issues.

One thing that might be new to a Windows user is the Linux terminal: a special window in which you can type text commands. This may seem a bit antiquated at first glance, in contrast to the GUIs found everywhere else, but that impression is deceiving. The command line is often a powerful and very efficient tool that lets you exploit Linux strengths that are otherwise inaccessible. This article shows the power of several such useful commands.

When working with these commands, there is one golden rule for Linux troubleshooting: Keep calm. Panicking and clicking blindly never helps. Such behavior not only prevents you from studying the cause of a problem but can also easily lead to undesirable and irreversible changes. Instead, the right approach is to try to understand the root cause of the problem. If the cause is not obvious, it often helps to systematically rule out one possible cause after another.

Becoming familiar with your own system can be a huge help. I've read a lot of letters to editors in which readers talked about installing two, three, or even more different Linux distributions in parallel. This makes no sense to me. It is far better to understand the peculiarities of one distribution deeply than to know half a dozen superficially.

What's Normal?

One thing you can do before an error occurs is baseline your system: Gather information on your freshly installed and healthy system that you can later compare with data from a system that might be in trouble. This comparison can tell you what is normal and what is probably a sign of problems.

For example, seeing a load average of 8 when you usually measure only 2 on your dual-core CPU is always suspicious. In this case, a tool such as top will show you what is eating up your compute power.
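The comparison against a known-good value can be scripted. The following is a minimal sketch, not a definitive tool: the function name load_exceeds and the tolerance factor are assumptions made for illustration, and awk does the comparison because shell arithmetic handles integers only.

```shell
# Hypothetical helper: succeeds (exit status 0) when the current load
# is greater than the baseline load multiplied by a tolerance factor.
# Arguments: $1 = current load, $2 = baseline load, $3 = factor.
load_exceeds() {
    awk -v c="$1" -v b="$2" -v f="$3" 'BEGIN { exit !(c > b * f) }'
}

# Example use on Linux: the first field of /proc/loadavg is the
# 1-minute load average.
# read -r load1 _ < /proc/loadavg
# load_exceeds "$load1" 2 2 && echo "load is more than twice the baseline"
```

With the numbers from the text, load_exceeds 8 2 2 succeeds, which would be the cue to start top.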

Or, if you typically have 20 or 30 open Internet connections (type lsof -i4 in a terminal window) and then suddenly have 2,000, something malicious may be going on. You can make a simple check of your write performance with dd:

dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync

If this command shows more than 80MBps on a regular day and then only 2MBps sometime later, you know you have to investigate your I/O stack. Listing 1 shows some commands you can use for simple baselining, just to get a feeling for what is normal on your system.

Listing 1

Simple Baselining Commands

# Memory usage (unit: MB)
free -m

# Load Average
uptime

# Available disk space
df -k /<mount point>

# All established TCP connections
lsof -i -sTCP:ESTABLISHED

# CPU, memory and I/O statistics
vmstat 2
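A baseline is only useful if it is still around when trouble starts, so it can help to save the output to a file. The following sketch assumes a hypothetical function name, baseline_snapshot, and an output directory of your choice; it records only three of the commands from Listing 1 and uses / as the mount point.

```shell
# Hypothetical helper: write a timestamped snapshot of a few baselining
# commands into the directory given as $1 and print the file name.
baseline_snapshot() {
    dir=$1
    mkdir -p "$dir" || return 1
    out="$dir/baseline-$(date +%Y%m%d-%H%M%S).txt"
    for cmd in "free -m" "uptime" "df -k /"; do
        printf '## %s\n' "$cmd"
        # ${cmd%% *} is the command name without its arguments.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            # $cmd is left unquoted on purpose so that it splits into words.
            $cmd 2>&1 || echo "(command failed)"
        else
            echo "(command not available)"
        fi
        echo
    done > "$out"
    printf '%s\n' "$out"
}

# Example use:
# baseline_snapshot "$HOME/baseline"
```

Two snapshots taken at different times can then be compared with diff.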

Who Knows What?

Another general rule for the troubleshooter is to consult the logs. Log files contain a large variety of information about the system and its behavior, and you will often find valuable hints there. I used the Linux distribution Fedora 20 for this article, which comes with a specialty: It uses the so-called systemd journal [1] instead of the classic syslog under /var/log.

This means the admin has to use the special command journalctl to read the logs. If called without parameters, journalctl will show the full contents of the journal, starting with the oldest entry collected. The big advantage of the journal, however, shows up when parameters are passed to the command. With them, it is possible to filter on any field of a log entry without using grep. For example:

journalctl _PID=1436

shows all log entries for the process with PID 1436.

journalctl --since "2014-06-18 10:00:00" \
  --until "2014-06-18 13:00:00"

lists all entries between 10:00 and 13:00 on June 18. Using

journalctl -k

brings up kernel messages.

As shortcuts for a few types of field/value matches, file paths may be specified. If such a path refers to an executable file, this is equivalent to an _EXE=</path/file> match. Similarly, if a path refers to a device node, this is equivalent to _KERNEL_DEVICE=<device file>. Thus,

journalctl /dev/sda

shows all log messages that refer to the disk /dev/sda.
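The filters shown above can also be combined in a single call. The following sketch wraps two such combinations in shell functions; the function names kernel_errors and pid_since are hypothetical, and the -p (priority) and --no-pager options are standard journalctl options that the text above does not cover.

```shell
# Hypothetical wrapper: kernel messages of priority "err" or worse.
# -k selects kernel messages, -p err keeps err, crit, alert, and emerg,
# and --no-pager prints directly so the output can be piped or saved.
kernel_errors() {
    journalctl -k -p err --no-pager "$@"
}

# Hypothetical wrapper: everything one process has logged since a
# point in time. $1 is a PID, $2 a timestamp like "2014-06-18 10:00:00".
pid_since() {
    journalctl _PID="$1" --since "$2" --no-pager
}

# Example use:
# kernel_errors | tail -n 20
# pid_since 1436 "2014-06-18 10:00:00"
```

Keeping such calls in small functions saves typing when you have to repeat the same query several times during a troubleshooting session.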

Boot Problems

The first problems you might encounter are related to the boot process, which can have a number of causes. The countermeasures required depend on the point in time during the boot process at which the error occurs. Most modern Linux distributions hide the boot messages behind a graphical splash screen.

The first step in this case is to remove the boot parameter quiet, as well as additional parameters like rhgb (Fedora) or splash (Ubuntu). To do so, select an entry in the GRUB menu that appears after powering on and press e. An editor opens in which you can delete the parameters mentioned above. Then, press Esc while booting and you should see all the messages during startup.
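Changes made in the GRUB editor apply to that one boot only. Once the system is up, /proc/cmdline shows the parameters the running kernel actually received, so you can check whether an edit took effect. The helper below is a sketch with a hypothetical name, has_boot_param; it matches only whole parameters such as quiet, not key=value pairs by key alone.

```shell
# Hypothetical helper: succeed if kernel parameter $1 appears as a whole
# word in the command line $2 (default: the contents of /proc/cmdline).
has_boot_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example use after booting the edited entry:
# has_boot_param quiet && echo "quiet is still set"
```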

If you see no messages except "operating system not found" or just a black screen, then the boot manager was not found or is damaged. If this happens, you first need to check whether the boot device is recognized by the BIOS. The second thing to check is the order of the boot devices. If you've placed an optical drive before the first HDD or SSD, it should contain no media. Remove any attached USB sticks. If this does not work, the partition table or the filesystem might be damaged. You should try to boot from an emergency disk – like SystemRescueCd [2] – and repair the filesystem.

If the boot manager has found the kernel, but booting stops with a blinking cursor or a sudden reboot, then the kernel itself or the hardware is to blame. There can be numerous causes in this case, and making a diagnosis might be difficult. You could try kernel parameters such as acpi=off and/or noapic, although modern CPUs need both ACPI (Advanced Configuration and Power Interface) and APIC (Advanced Programmable Interrupt Controller) to perform well. Updating the BIOS, if possible, is a good idea, and you could additionally remove all non-mandatory hardware components on a trial basis for further testing.

If booting seems to work but you end up with a black screen, then the problem is likely with the graphics driver. Try booting with the parameter nomodeset. This action will cause the kernel to use a simple VGA text mode. If this works, the problem may be the monitor detection. You could then test the kernel parameter video=1024x768-24@75, which configures a resolution of 1024x768 pixels, 24-bit color depth, and a 75Hz refresh rate. If necessary, you can play with the values until they match your monitor. Often, a good solution is to use another electrical connection to the monitor, for example, VGA, DVI, or HDMI instead of DisplayPort.
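The parts of the video parameter follow a fixed order: width, an x, height, a hyphen, color depth, an @ sign, and the refresh rate. The following sketch assembles the string from its four values; the function name video_param is hypothetical and exists only to make the format explicit.

```shell
# Hypothetical helper: build a video= kernel parameter.
# Arguments: $1 = width, $2 = height, $3 = color depth in bits,
# $4 = refresh rate in Hz.
video_param() {
    printf 'video=%sx%s-%s@%s\n' "$1" "$2" "$3" "$4"
}

# Example use:
# video_param 1024 768 24 75    # prints video=1024x768-24@75
```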

Last but not least, you might encounter the famous Unable to mount root fs message along with a kernel panic. This message indicates a problem with the initramfs, which is responsible for mounting the root device. In this case, you can use dmesg | less to scroll to the storage driver messages; dmesg is a little program that prints all messages from the kernel ring buffer.

Is there an error message? If so, it probably contains a useful hint. Otherwise, you can try blkid as the root user. This command lists all block devices with their labels and UUIDs, which most contemporary Linux distributions use to identify the root filesystem. The root device value from the GRUB 2 configuration mentioned above should appear in the blkid output (Figure 1).

Figure 1: The output from blkid should contain the UUIDs used in the GRUB configuration and in the /etc/fstab file.
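The comparison between the boot configuration and the blkid output can be partly automated. The sketch below assumes that the root device is given in the form root=UUID=...; installations that name a device path instead (for example, an LVM volume) will not match. The function name root_uuid is hypothetical.

```shell
# Hypothetical helper: print the UUID from a root=UUID=... parameter in
# the kernel command line $1 (default: the contents of /proc/cmdline).
# Returns a non-zero status if no such parameter is present.
root_uuid() {
    cmdline=${1-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case $word in
            root=UUID=*)
                printf '%s\n' "${word#root=UUID=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Example use as root: does blkid know the UUID the kernel was given?
# blkid | grep "$(root_uuid)"
```

If grep prints nothing, the UUID in the boot configuration does not belong to any block device that blkid can see.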
