Anatomy of a simple Linux utility
How ls Works
A Linux utility program such as ls might look simple, but many steps happen behind the scenes between the time you type "ls" and the time you see the directory listing. In this article, we look at these behind-the-scenes details.
What really happens when you enter a program's name in a terminal window? This article is a journey into the workings of a commonly used program: the ubiquitous ls file listing command. This journey starts with the Bash [1] shell finding the ls program in response to the letters ls typed at the terminal, and it leads to a list of files and directories retrieved from the underlying filesystem [2].
To recreate these results, you'll need some basic understanding of standard debugging techniques using the GNU debugger (gdb), some familiarity with the SystemTap system information utility [3] [4], and an intermediate-level understanding of C programming code. SystemTap is a scripting language and an instrumentation framework that allows you to examine a Linux kernel dynamically. If you don't have all these skills, following along will still give you some insight into the inner workings of a program on Linux.
This article assumes you are running Linux kernel 3.18 [5] with the debug symbols for Bash installed, that a local copy of the 3.18 kernel source is available, and that SystemTap is set up properly. In the next section, I will describe how to configure your system to follow this article.
Setting Up Your System
To install the Bash debug symbols on Fedora 21, you can use the command:
# debuginfo-install bash
If you do not have the GNU debugger gdb installed, you can install it using yum install gdb.
The kernel 3.18 source can be downloaded from The Linux Kernel Archives [6], or, if you prefer to clone the kernel Git repository, check out the v3.18 tag. SystemTap can be installed on Fedora 21 with:
# yum install systemtap-devel systemtap-client
# stap-prep
The second command, stap-prep, installs the kernel development and debuginfo packages that match your running kernel.
Methodology
Before getting started, it is worth discussing the methodology I adopted for this investigation. The first step is to understand how the program (an executable script or a binary program) corresponding to a command entered on the command line is found. By placing breakpoints at key locations in Bash, you can halt the execution of Bash and examine key variables to get an idea of what the program is processing at that point. The next section makes this step clearer with an example that uses the ls program.
Once you know how the program to be executed is found, you want to know how the program itself works. System calls are a program's entry point into kernel space. A program invokes a system call either directly or, more commonly, through a library function such as those provided by the C library.
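To make the two routes concrete, here is a minimal C sketch that asks the kernel for the same value both ways. The first route uses the C library wrapper getpid(). The second uses the generic syscall() function with the SYS_getpid number. getpid() is chosen only because it takes no arguments. The helper names are illustrative, and the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE        /* exposes syscall() in <unistd.h> */
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* getpid(), syscall() */

/* Route 1: the C library wrapper issues the system call for us. */
pid_t pid_via_library(void)
{
    return getpid();
}

/* Route 2: name the system call by number and let syscall() pass it
 * to the kernel, bypassing the dedicated wrapper function. */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions should return the same process ID, because both end up in the same kernel entry point.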
After determining the key system call or calls, you then look into the kernel source code to find the function implementing that system call. SystemTap scripts can then trace entry to and exit from these functions, illustrating how control flows into and out of kernel space.
I adopt this methodology to understand how the ls program works, but the same techniques should be relevant for any program.
First Steps: Typing ls
When I type ls, Bash first searches for the binary corresponding to the command in the directories listed in the PATH environment variable. You can chart this action using the GNU debugger (gdb). You'll need either the debug symbols for Bash installed or a locally built copy of Bash with debugging enabled.
To begin, start a gdb session and pass in the bash binary:
> gdb bash
Place a breakpoint in the search_for_command() function and start bash, passing in ls as the argument (Listing 1).
Listing 1
Placing Breakpoints in Bash Source
As you can see from the line marked #0 in Listing 1, the pathname argument refers to the string ls. Bash now has to search for it in the directories specified by the user's $PATH variable. My $PATH is as follows:
> echo $PATH
/usr/lib64/qt-3.3/bin:/usr/lib64/ccache:/bin:/usr/bin:\
/usr/local/bin:/usr/local/sbin:/usr/sbin:/home/asaha/.local/bin:\
/home/asaha/bin
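The entries are tried from left to right, so /usr/lib64/qt-3.3/bin is searched before /bin. Below is a minimal C sketch of splitting such a string into its directory entries. path_entry() is a hypothetical helper written for illustration, not Bash's own code. It reports an empty entry as ".", which is how POSIX shells interpret a zero-length PATH prefix.

```c
#include <stddef.h>   /* size_t */
#include <string.h>   /* strchr(), strlen(), memcpy() */

/*
 * Copy the idx-th colon-separated entry of path_list into buf.
 * An empty entry (a leading ':' or "::") is reported as "." because
 * POSIX shells treat it as the current directory.
 * Returns 0 on success, -1 if idx is out of range or buf is too small.
 */
int path_entry(const char *path_list, size_t idx, char *buf, size_t bufsize)
{
    const char *start = path_list;

    /* Skip idx separators to reach the start of the wanted entry. */
    for (size_t i = 0; i < idx; i++) {
        start = strchr(start, ':');
        if (start == NULL)
            return -1;              /* fewer than idx + 1 entries */
        start++;                    /* step past the ':' */
    }

    const char *end = strchr(start, ':');
    size_t len = end ? (size_t)(end - start) : strlen(start);

    if (len == 0) {                 /* empty entry: current directory */
        start = ".";
        len = 1;
    }
    if (len + 1 > bufsize)
        return -1;                  /* no room for entry plus '\0' */

    memcpy(buf, start, len);
    buf[len] = '\0';
    return 0;
}
```

For example, with the list "/bin:/usr/bin::/home/asaha/bin", index 2 yields "." and index 3 yields "/home/asaha/bin".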
I now place a breakpoint in the find_user_command_in_path() function to see how Bash searches through all the locations present in $PATH (Listing 2).
Listing 2
Searching for the Program in $PATH
At the end of Listing 2, /usr/bin/ls has been found (/bin is a symlink to /usr/bin on Fedora 21). The function shell_execve() then invokes the execve() system call to execute the command.
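The overall pattern can be sketched in C as follows. The sketch assumes that, for an external command like ls typed in an interactive shell, the shell forks a child process and the child then calls execve(). run_program() is a hypothetical helper, not Bash's shell_execve(), which handles many more cases. Exiting with status 127 when execve() fails follows the shell convention for a command that could not be run.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>   /* pid_t */
#include <sys/wait.h>    /* waitpid(), WIFEXITED(), WEXITSTATUS() */
#include <unistd.h>      /* fork(), execve(), _exit() */

/*
 * Hypothetical helper: run the program at `path` in a child process and
 * return its exit status, or -1 if fork()/waitpid() fails or the child
 * did not exit normally.
 */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                   /* fork failed */

    if (pid == 0) {
        /* Child: replace this process image with the program. */
        execve(path, argv, envp);
        _exit(127);                  /* reached only if execve() failed */
    }

    /* Parent: wait for the child, as the shell does for a foreground job. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, calling run_program("/usr/bin/ls", argv, envp) with argv set to { "ls", NULL } would list the current directory and return the exit status of ls.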
Bash invokes the stat() system call to check whether an executable corresponding to ls exists in each path location. Listing 3 shows a snippet of the calls to stat() for three path locations.
Listing 3
stat() Calls to Path Locations
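The lookup loop itself can be approximated in a few lines of C. The sketch below is a simplified, hypothetical stand-in for Bash's find_user_command_in_path(). It calls stat() on "<dir>/<name>" for each directory in order and accepts the first regular file that access() reports as executable. It ignores refinements Bash adds, such as remembering found commands in its hash table. Because stat() follows symbolic links, a candidate such as /bin/ls resolves even when /bin is itself a symlink to /usr/bin.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>       /* snprintf() */
#include <string.h>      /* strchr(), strlen() */
#include <sys/stat.h>    /* stat(), S_ISREG() */
#include <unistd.h>      /* access(), X_OK */

/*
 * Hypothetical helper: search the colon-separated dir_list for an
 * executable regular file called `name`. Each candidate "<dir>/<name>"
 * is checked with stat() in list order; the first match is written to
 * `out`. Returns 0 on success, or -1 (with out set to "") on failure.
 */
int find_in_path(const char *dir_list, const char *name,
                 char *out, size_t outsize)
{
    const char *dir = dir_list;

    for (;;) {
        const char *colon = strchr(dir, ':');
        size_t dirlen = colon ? (size_t)(colon - dir) : strlen(dir);

        /* Build the candidate; an empty entry means the current directory. */
        int n = dirlen > 0
                    ? snprintf(out, outsize, "%.*s/%s", (int)dirlen, dir, name)
                    : snprintf(out, outsize, "./%s", name);

        struct stat st;
        if (n > 0 && (size_t)n < outsize   /* candidate not truncated */
            && stat(out, &st) == 0         /* the file exists         */
            && S_ISREG(st.st_mode)         /* it is a regular file    */
            && access(out, X_OK) == 0)     /* and we may execute it   */
            return 0;

        if (colon == NULL)
            break;                         /* last entry examined */
        dir = colon + 1;
    }

    if (outsize > 0)
        out[0] = '\0';
    return -1;
}
```

Running this under strace, or under the SystemTap script discussed below, should show one stat() per directory tried until the first match.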
A closer look at the kernel reveals how the stat() system call works. From here on, all source references are relative to the top-level kernel source directory.
The stat() system call is defined in fs/stat.c (Listing 4). It calls vfs_stat(), which is defined as shown in Listing 5. vfs_stat() in turn calls vfs_fstatat() with AT_FDCWD as the directory file descriptor and a flag value of 0. The vfs_fstatat() function looks up the path name, which yields the file's inode if the file exists, and then retrieves the file's attributes from it. To see what is happening in kernel space when stat() is invoked, I use the SystemTap script in Listing 6 to trace calls to and returns from vfs_fstatat().
Listing 4
Definition of stat() System Call
Listing 5
Definition of vfs_stat()
Listing 6
Tracing Call To and From vfs_fstatat()
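The relationship between vfs_stat() and vfs_fstatat() has a direct userspace analog. stat(path) and fstatat(AT_FDCWD, path, ..., 0) should describe the same file, because AT_FDCWD makes a relative path resolve against the current working directory, just as stat() does. The C sketch below checks this by comparing device and inode numbers. stat_matches_fstatat() is a hypothetical helper, and the userspace calls only mirror the kernel functions; they are not the kernel code itself. A missing file yields ENOENT, which is what a stat() of a nonexistent PATH candidate returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>       /* errno, ENOENT */
#include <fcntl.h>       /* AT_FDCWD */
#include <sys/stat.h>    /* stat(), fstatat(), struct stat */

/*
 * Hypothetical helper: return 1 if stat(path) and
 * fstatat(AT_FDCWD, path, ..., 0) describe the same file (same device
 * and inode), 0 if they differ, or -errno if either call fails.
 */
int stat_matches_fstatat(const char *path)
{
    struct stat a, b;

    if (stat(path, &a) != 0)
        return -errno;           /* e.g. -ENOENT for a missing file */
    if (fstatat(AT_FDCWD, path, &b, 0) != 0)
        return -errno;

    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```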
The vfs_fstatat() function has the prototype:
int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, int flag);
The filename parameter is what I am interested in here. When you run the SystemTap script, you will see the lines shown in Listing 7.
Listing 7
Output of Script in Listing 6
Now, execute the ls command in another terminal window. You should see the lines shown in Listing 8 in the SystemTap window.
Listing 8
Output of SystemTap Script in Listing 6
At this stage, I have a fairly good idea of what happens in userspace and kernel space to locate the program that corresponds to ls. Now, I am ready to see how the binary is executed.