Tools and techniques for performance tuning in Linux

Tuning Toolbox

© shocky, Fotolia

Article from Issue 100/2009

Tune up your systems and search out bottlenecks with these handy performance tools.

Over the past several years, the Linux Kernel Performance Project [1] has tracked the performance of Linux and tuned it for throughput and power efficiency on Intel platforms. This experience has given us some insights into the best tools and techniques for tuning Linux systems. In this article, we describe some of our favorite Linux performance utilities and provide a real-world example that shows how the Kernel Performance Project uses these tools to hunt down and solve a real Linux performance issue.

Finding Bottlenecks

The first task in performance tuning is to identify any bottlenecks that might be slowing down system performance.

The most common bottlenecks occur in I/O, memory management, or the scheduler. Linux offers a suite of tools for examining system use and searching out bottlenecks. Some tools reveal the general health of the system, and other tools offer information about specific system components.

The vmstat utility offers a useful summary of overall system performance. Listing 1 shows vmstat data collected every two seconds for a CPU-intensive, multi-threaded Java workload. The first two columns, r and b, show how many processes are runnable (running or waiting for a CPU) and how many are blocked waiting for I/O to complete. The presence of both blocked processes and idle time in the system is usually a sign of trouble.

Listing 1

vmstat Output

# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in     cs us sy id wa
 7  0  34328 757464   2712  26416    0    0     0     0   12 616773 34 28 37  0

The next four columns, under memory, show how memory is being used: swpd is the amount of swap space in use, free is idle memory, buff is memory used as buffers, and cache is memory used as page cache. A bigger cache means more files are cached in memory. The si and so columns under swap show how much memory is swapped in from and out to disk each second; frequent swapping slows the system. The two columns under io, bi and bo, give the number of blocks per second received from and sent to block devices, respectively, which gives an idea of the level of disk activity. The two columns under system, in and cs, report the number of interrupts and context switches per second.

If the interrupt rate is too high, you can use sar, from the sysstat package, to help uncover the cause. The command sar -I XALL 10 1000 breaks down the interrupts by source every 10 seconds, 1000 times (roughly 2.8 hours in total). A high number of context switches relative to the number of processes is undesirable because each switch can flush cached data, such as CPU cache and TLB contents, that belonged to the previously running task.
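The kernel exposes the per-IRQ counters behind such reports in /proc/interrupts. As a rough complement to sar, the following minimal Python sketch takes two snapshots of that file and ranks interrupt sources by how much they grew. The function names are hypothetical, and the parser assumes the usual layout: a header row of CPU names, then one row per IRQ with a count for each CPU followed by a description.

```python
def read_snapshot(path="/proc/interrupts"):
    """Read the kernel's per-IRQ counters once."""
    with open(path) as f:
        return f.read()


def parse_interrupts(text):
    """Return {irq: total count across all CPUs} from /proc/interrupts-style text."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep:
            continue  # not an IRQ row
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the description text; rows such as ERR hold a single total
            counts.append(int(field))
        totals[irq.strip()] = sum(counts)
    return totals


def top_interrupts(before, after, n=5):
    """Rank interrupt sources by how much their counters grew between two snapshots."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    deltas = {irq: count - old.get(irq, 0) for irq, count in new.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:n]
```

For example, call read_snapshot(), wait with time.sleep(10), take a second snapshot, and pass both to top_interrupts(); dividing each delta by 10 gives interrupts per second, which mirrors the 10-second interval in the sar example.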

The last four columns in Listing 1, us, sy, id, and wa, give the percentage of time the CPUs spent running userspace code, running kernel code, idle, and waiting for I/O, respectively. This output shows whether the CPUs are doing useful work or whether they are just idling or blocked. A high percentage of time spent in the kernel could indicate that the workload is making inefficient system calls. Idle time on a fully loaded system could point to lock contention.
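These rules of thumb can be turned into a quick automated check. The Python sketch below is hypothetical: parse_vmstat, diagnose, and every threshold are assumptions rather than part of vmstat, the parser assumes the column layout shown in Listing 1, and you must supply the number of CPUs yourself.

```python
def parse_vmstat(output):
    """Map vmstat column names to the values on the last (most recent) sample line."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = next(line.split() for line in lines if line.split()[:2] == ["r", "b"])
    # vmstat's first report shows averages since boot, so use the latest line.
    values = [int(v) for v in lines[-1].split()]
    return dict(zip(header, values))


def diagnose(sample, ncpus, idle_pct=10, sys_pct=30):
    """Apply the rules of thumb from the text; the thresholds are assumptions."""
    notes = []
    if sample["b"] > 0 and sample["id"] > 0:
        notes.append("blocked processes while CPUs sit idle: check I/O")
    if sample["r"] >= ncpus and sample["id"] >= idle_pct:
        notes.append("runnable work but idle CPUs: possible lock contention")
    if sample["sy"] >= sys_pct:
        notes.append("high kernel time: look at system call usage")
    if sample["si"] > 0 or sample["so"] > 0:
        notes.append("swapping in or out: memory pressure")
    return notes
```

On a live system, you could feed parse_vmstat() the text returned by subprocess.run(["vmstat", "2", "5"], capture_output=True, text=True).stdout. Applied to the sample in Listing 1 on an assumed four-CPU machine, only the lock-contention rule fires: seven runnable processes coexist with 37 percent idle time.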

Disk Performance

Hdparm is a good tool for determining whether the disks are healthy and configured properly:

# hdparm -tT /dev/sda
Timing buffered disk reads: 184 MB in 3.02 seconds = 60.88 MB/sec
Timing cached reads: 11724 MB in 2.00 seconds = 5870.80 MB/sec

The -T option times cached reads, which are served directly from the Linux buffer cache without disk access, so this figure mainly reflects processor, cache, and memory throughput. The -t option times buffered disk reads: reading through the buffer cache from the disk without any prior caching of data. This uncached speed should be reasonably close to the raw speed of the disk. If it is too low, check your BIOS to see whether the disk mode is configured properly. You can also check the hard disk parameter settings for an IDE disk:

# hdparm -I /dev/hda

or for a SCSI disk:

# sdparm /dev/sda
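The uncached hdparm -tT figure is the one to compare with the drive's rated speed. A hypothetical Python helper can pull both rates out of the output and flag a slow disk. The regular expression assumes the output format shown above, and both the expected speed (which you would take from the drive's specifications) and the 50 percent tolerance are assumptions.

```python
import re

# Assumed format: "Timing <kind> reads: <MB> MB in <s> seconds = <rate> MB/sec"
TIMING = re.compile(
    r"Timing (cached|buffered disk) reads:\s+([\d.]+) MB in\s+([\d.]+) seconds =\s+([\d.]+) MB/sec"
)


def parse_hdparm(output):
    """Return {'cached': MB/s, 'buffered disk': MB/s} from `hdparm -tT` output."""
    return {m.group(1): float(m.group(4)) for m in TIMING.finditer(output)}


def disk_looks_slow(output, expected_mb_s, tolerance=0.5):
    """True if the uncached rate falls below `tolerance` times the expected raw speed."""
    rates = parse_hdparm(output)
    return rates["buffered disk"] < expected_mb_s * tolerance
```

With the sample output above, a drive rated at 150MB/sec would be flagged (60.88 is below 75), while one rated at 100MB/sec would not.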

To study the I/O behavior of a running workload, use iostat. Listing 2 shows iostat reporting extended statistics (-x) for sda once per second. If %iowait is high, CPUs are idle while waiting for outstanding disk I/O requests. In that case, try modifying the workload to use asynchronous I/O, or dedicate a thread to file I/O so the rest of the workload doesn't stall.
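The "dedicated I/O thread" approach can be sketched in a few lines of Python. This is an illustration under assumptions, not a drop-in component: the class name BackgroundWriter is hypothetical, errors on the I/O thread are not propagated, and the queue is unbounded. The caller hands data to a queue and keeps working while a single worker thread performs the writes in order.

```python
import os
import queue
import threading


class BackgroundWriter:
    """Hand file writes to a dedicated thread so the caller never waits on the disk."""

    _STOP = object()  # sentinel telling the worker to finish

    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "ab")
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Runs on the I/O thread: drain the queue in order and write each chunk.
        while True:
            chunk = self._queue.get()
            if chunk is self._STOP:
                break
            self._file.write(chunk)
        self._file.flush()
        os.fsync(self._file.fileno())  # make sure the data reaches the disk
        self._file.close()

    def write(self, data):
        """Queue bytes for writing and return immediately."""
        self._queue.put(data)

    def close(self):
        """Block until every queued write has been written, then stop the worker."""
        self._queue.put(self._STOP)
        self._thread.join()
```

In CPython, the interpreter lock is released during blocking file writes, so the main thread keeps computing while the worker waits on the disk. A bounded queue.Queue(maxsize=...) would add back-pressure if the disk cannot keep up.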

Listing 2

# iostat -x sda 1
avg-cpu:   %user   %nice %system %iowait  %steal   %idle
            0.00    0.00    2.16   20.86    0.00   76.98

Device:    rrqm/s   wrqm/s     r/s     w/s    rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda      17184.16     0.00 1222.77    0.00 147271.29     0.00   120.44     3.08    2.52   0.81  99.01

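The other parameter to check is avgqu-sz, the average number of I/O requests queued for the device. This value should stay below 1; otherwise, requests are waiting on the disk and I/O will significantly slow things down. The %util parameter gives the percentage of time during which the device had requests outstanding, which is a good indication of how busy the disk is. In Listing 2, an avgqu-sz of 3.08 and a %util of 99.01 show that sda is saturated.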
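The columns in Listing 2 are related, which makes a useful sanity check. By Little's law (average requests in the system = arrival rate × time in the system), the request rate multiplied by await should approximate avgqu-sz, and the request rate multiplied by svctm should approximate %util. The helper names and the 90 percent busy threshold below are assumptions for illustration.

```python
def implied_queue_and_util(r_s, w_s, await_ms, svctm_ms):
    """Cross-check avgqu-sz and %util from the per-request figures.

    await already includes queueing time, so (r/s + w/s) * await approximates
    avgqu-sz, and (r/s + w/s) * svctm approximates %util."""
    iops = r_s + w_s
    queue_len = iops * await_ms / 1000.0           # milliseconds -> seconds
    util_pct = iops * svctm_ms / 1000.0 * 100.0
    return queue_len, util_pct


def is_saturated(avgqu_sz, util_pct, util_limit=90.0):
    """Rule of thumb from the text: a queue of 1 or more on a device busy most of the time."""
    return avgqu_sz >= 1.0 and util_pct >= util_limit
```

Plugging in Listing 2's figures (1222.77 reads/s, await 2.52 ms, svctm 0.81 ms) gives a queue of about 3.08 and a utilization of about 99 percent, matching the reported avgqu-sz and %util.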

CPU Cycles

One important way to identify a performance problem is to determine how the system is spending its CPU cycles, and the oprofile utility is well suited to this task. Oprofile is usually enabled by default in distribution kernels. If you compile your own kernel, make sure the kernel configuration options CONFIG_OPROFILE=y and CONFIG_HAVE_OPROFILE=y are set.
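If you are unsure whether a kernel has oprofile support, you can inspect its configuration. The minimal Python sketch below is hypothetical; it assumes the configuration is available in one of two common places, /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC) or /boot/config-<release>, and it checks the two options named above.

```python
import gzip
import os


def read_running_config():
    """Return the running kernel's .config text (assumed locations, see lead-in)."""
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as f:
            return f.read()
    with open(f"/boot/config-{os.uname().release}") as f:
        return f.read()


def config_values(text):
    """Parse CONFIG_FOO=value lines; '# CONFIG_FOO is not set' lines are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            values[key] = value
    return values


def missing_oprofile_options(text):
    """List the oprofile options that are not enabled in a kernel config."""
    values = config_values(text)
    missing = []
    # CONFIG_OPROFILE is a tristate option, so a module build (m) also works.
    if values.get("CONFIG_OPROFILE") not in ("y", "m"):
        missing.append("CONFIG_OPROFILE")
    if values.get("CONFIG_HAVE_OPROFILE") != "y":
        missing.append("CONFIG_HAVE_OPROFILE")
    return missing
```

An empty list from missing_oprofile_options(read_running_config()) means both options are enabled in the running kernel.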

The easiest way to use oprofile is through its GUI, which wraps the command-line options. For an Intel Core 2 processor, you need oprofile 0.9.3 or later; also install the oprofile-gui package. Launching the GUI brings up the Start profiler screen with Setup and Configuration tabs (Figure 1). First, select the Configuration tab. If you want to profile the kernel, enter the location of the kernel image file (that is, the uncompressed vmlinux file if you compile the kernel from source). Then return to the Setup tab.

In the Events table, select the CPU_CLK_UNHALTED event and the Unhalted core cycles unit mask. Normally, you do not need to sample the system more often than the value already listed in the Count field.

A lower count means that fewer events will need to happen before a sample is taken, thus increasing the sampling frequency. Now run the application you want to profile, and start oprofile by clicking on the Start button. When the application has stopped running, click the Stop button.
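To see how the Count field translates into a sampling rate, here is a small worked sketch using the figures reported in Listing 3: a 2400MHz Core 2 and a count of 1200000. The helper names are hypothetical, and the result is an upper bound, because a core that halts or runs at a lower frequency produces fewer unhalted cycles.

```python
def samples_per_second(cpu_mhz, count):
    """Upper bound on CPU_CLK_UNHALTED samples per second on one fully busy core.

    One sample is taken every `count` unhalted cycles, so a core that never
    halts produces (clock rate / count) samples each second."""
    return cpu_mhz * 1_000_000 / count


def count_for_rate(cpu_mhz, samples_wanted):
    """Count value giving roughly `samples_wanted` samples per second per busy core."""
    return round(cpu_mhz * 1_000_000 / samples_wanted)
```

At 2400MHz with a count of 1200000, a fully busy core yields 2400000000 / 1200000 = 2000 samples per second; halving the count to 600000 doubles that to 4000, along with the profiling overhead.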

To view the profile data, invoke:

# opreport -l

Listing 3 shows the output of this command: the percentage of CPU time spent in each application or in the kernel, along with the functions being executed. The report reveals where the system spends most of its time, and those hot spots are the best starting points for optimization.

Listing 3

Viewing Profile Data with oprofile

CPU: Core 2, speed 2400 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 1200000
samples  %        app name                 symbol name
295397   63.6911  cc1                      (no symbols)
22861     4.9291  vmlinux-2.6.25-rc9       clear_page_c
11382     2.4541              memset
10959     2.3629  genksyms                 yylex
9256      1.9957              _int_malloc
6076      1.3101  vmlinux-2.6.25-rc9       page_fault
5378      1.1596              memcpy
5178      1.1164  vmlinux-2.6.25-rc9       handle_mm_fault
3857      0.8316  genksyms                 yyparse
3822      0.8241              strlen
... ...
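When a flat profile runs to many rows, it can help to total the percentages per application or image. The following Python sketch is hypothetical and assumes the whitespace-separated layout of Listing 3. Rows that carry only samples, percentage, and symbol, like the memset row, have no app name and are grouped under "(unknown)".

```python
from collections import defaultdict


def share_by_image(report):
    """Sum the % column of `opreport -l` output per app name."""
    totals = defaultdict(float)
    for line in report.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header lines, the "..." marker, blank lines
        # "(no symbols)" splits into two fields, so a named row has 4 or more.
        app = fields[2] if len(fields) >= 4 else "(unknown)"
        totals[app] += float(fields[1])
    return dict(totals)
```

For the rows shown in Listing 3, the three vmlinux-2.6.25-rc9 entries add up to about 7.36 percent of samples, while cc1 alone accounts for 63.69 percent.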

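If you have collected call-graph information, type the command

# opreport -c

to obtain the output shown in Listing 4. In call-graph output, each entry between the dashed separators describes one function: the unindented line is the function itself, the indented lines above it are its callers, and the indented lines below it are its callees, with the function's own samples marked [self]. Listing 4 shows that this workload has very heavy memory allocation activity associated with getting free memory pages and clearing them: almost all (99.98 percent) of the time in clear_page_c, which zeroes a page, is attributed to calls from get_page_from_freelist, the page allocator routine that takes pages off the free lists.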

Listing 4

opreport Output

CPU: Core 2, speed 2400 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 1200000
samples  %        image name               app name                 symbol name
-------------------------------------------------------------------------------
295397   63.6911  cc1                      cc1                      (no symbols)
  295397   100.000  cc1                      cc1                      (no symbols) [self]
-------------------------------------------------------------------------------
  1         0.0044  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       path_walk
  2         0.0087  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       __alloc_pages
  2         0.0087  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       mntput_no_expire
  22922    99.9782  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       get_page_from_freelist
22861     4.9291  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       clear_page_c
  22861    99.7121  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       clear_page_c [self]
  36        0.1570  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       apic_timer_interrupt
  24        0.1047  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       ret_from_intr
  3         0.0131  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       smp_apic_timer_interrupt
  2         0.0087  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       mntput_no_expire
  1         0.0044  vmlinux-2.6.25-rc9       vmlinux-2.6.25-rc9       __link_path_walk
-------------------------------------------------------------------------------
11382     2.4541                  memset
  11382    100.000                  memset [self]
-------------------------------------------------------------------------------
10959     2.3629  genksyms                 genksyms                 yylex
  10959    100.000  genksyms                 genksyms                 yylex [self]
...  ...
