Watching activity in the kernel with the bpftrace tool
Huge Selection
There's plenty of choice of probes in the kernel. The list runs from vfs_read (the function that reads data from files through the kernel's virtual filesystem layer and can pass the number of bytes read to a probe), through do_execve (for monitoring newly launched Unix processes), to trace_pagefault_reg (which is triggered when a memory page is reloaded). With these probes, users can inspect the kernel's workings at will and discover in real time what's going on and where the bottlenecks are.
Figure 2 lists the probes that bpftrace prints when called with the -l switch. BPF distinguishes between two kinds of probe. Kprobes track important kernel functions by name. Tracepoint probes are placed in the code and maintained by hand by the kernel developers at a slightly higher logical level, which makes them more resilient to changes in the kernel. Unlike the kernel's userspace-facing APIs, its internal functions are by no means guaranteed to remain stable.
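The naming scheme behind the two probe types can be illustrated with a small Python sketch. It is a toy model and not part of bpftrace. It splits probe specifiers like those in the bpftrace -l listing into their parts. It assumes only the simplified forms kprobe:FUNCTION and tracepoint:CATEGORY:NAME. The names Probe, parse_probe, and more_resilient_to_kernel_changes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    kind: str      # "kprobe" or "tracepoint"
    category: str  # tracepoint category such as "syscalls"; empty for kprobes
    name: str      # kernel function (kprobe) or tracepoint name


def parse_probe(spec: str) -> Probe:
    """Split a probe specifier into type, category, and name.

    Only 'kprobe:FUNC' and 'tracepoint:CATEGORY:NAME' are handled.
    Wildcards are not expanded; other probe types raise ValueError.
    """
    parts = spec.split(":")
    if parts[0] == "kprobe" and len(parts) == 2:
        return Probe("kprobe", "", parts[1])
    if parts[0] == "tracepoint" and len(parts) == 3:
        return Probe("tracepoint", parts[1], parts[2])
    raise ValueError(f"unsupported probe specifier: {spec!r}")


def more_resilient_to_kernel_changes(probe: Probe) -> bool:
    # Tracepoints are maintained deliberately by kernel developers, while
    # kprobes attach to internal functions that may change between releases.
    return probe.kind == "tracepoint"


if __name__ == "__main__":
    for spec in ("kprobe:vfs_read", "tracepoint:syscalls:sys_enter_execve"):
        probe = parse_probe(spec)
        print(probe.kind, probe.name, more_resilient_to_kernel_changes(probe))
```

The kprobe name consists of the bare function name. The tracepoint name also carries the category under which the kernel developers filed it.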
Potential for More
How about a script that outputs all newly created processes on the system in real time, including the command used to start them and its parameters? Listing 2 shows a short script that activates the sys_enter_execve tracepoint and prints the argument list argv from the args structure.
Listing 2
procs-new.bt
01 #!/usr/bin/bpftrace
02
03 BEGIN
04 {
05   printf("New processes with arguments\n");
06 }
07
08 tracepoint:syscalls:sys_enter_execve
09 {
10   join(args->argv);
11 }
This example also shows that bpftrace's range of functions still has room to grow. Take the join() function. It joins the elements of the command line in args->argv with spaces and prints the result directly. However, it cannot return the result as a string, so you cannot pass it to printf() for custom formatting. Hopefully, upcoming versions will resolve this issue.
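The difference between printing and returning a string is easier to see in a small Python model. It is not bpftrace code: join_argv() and format_exec_event() are hypothetical helpers. The NULL pointer that terminates the argv array passed to execve() is modelled as None. The process ID 4242 in the usage line is made up.

```python
from typing import Optional, Sequence


def join_argv(argv: Sequence[Optional[str]], sep: str = " ") -> str:
    """Join an argument vector that ends in a NULL pointer (modelled as None)."""
    words = []
    for arg in argv:
        if arg is None:  # NULL terminates argv, as in the array passed to execve()
            break
        words.append(arg)
    return sep.join(words)


def format_exec_event(pid: int, argv: Sequence[Optional[str]]) -> str:
    # Because join_argv() returns a string, the caller can pad and combine
    # it freely, the kind of formatting bpftrace's join() does not allow.
    return f"{pid:<6} {join_argv(argv)}"


if __name__ == "__main__":
    print(format_exec_event(4242, ["ls", "-l", "/tmp", None]))
```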
The BEGIN block from line 3 simply prints a greeting for the user. bpftrace follows the Awk programming model here. Anything the script should do right at startup, such as displaying a message or initializing a variable, goes into the BEGIN block, as shown in Listing 2.
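The Awk-style control flow can be pictured with a toy Python dispatcher. It is purely illustrative and says nothing about how bpftrace is implemented internally. run_script() and its callback parameters are hypothetical names. Like Awk, bpftrace also supports an END block, so the model includes one as an optional callback.

```python
from typing import Callable, Iterable, Optional


def run_script(begin: Callable[[], None],
               probe: Callable[[str], None],
               events: Iterable[str],
               end: Optional[Callable[[], None]] = None) -> None:
    """Toy model of Awk/bpftrace control flow, not bpftrace's real engine."""
    begin()              # BEGIN: runs exactly once, before any event
    for event in events:
        probe(event)     # probe block: runs once per matching event
    if end is not None:
        end()            # END: runs once when tracing stops


if __name__ == "__main__":
    run_script(lambda: print("New processes with arguments"),
               print,
               ["ls -l", "cat /etc/hostname"])
```

The BEGIN callback fires even if no event ever arrives. This is why it is the right place for start-up messages and initialization.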
In the Thick of It
However, things become more complicated when the probe that detects a problem cannot output the desired data because that data is only available somewhere else. Suppose you want to find processes that try to open files that do not exist (or that they are not allowed to access). Listing 3 taps into the sys_exit_openat tracepoint, which the kernel passes through when an openat() system call returns. glibc's open() uses openat() internally.
Listing 3
opens-failed.bt
01 #!/usr/bin/bpftrace
02
03 tracepoint:syscalls:sys_enter_openat
04 {
05   @filename[tid] = args->filename
06 }
07
08 tracepoint:syscalls:sys_exit_openat
09 / @filename[tid] /
10 {
11   if ( args->ret < 0 ) {
12     printf("%s %s\n", comm, str(@filename[tid]));
13   };
14   delete(@filename[tid]);
15 }
With the condition args->ret < 0, bpftrace checks whether the system call's return code was negative. A negative value indicates that the file could not be opened. In that case, the code should output the name of the process in question and the name of the file. However, the exit tracepoint does not have access to the file name. The name was only available earlier, when the kernel entered the system call at the sys_enter_openat tracepoint (notice the subtle difference between enter and exit).
The solution is to have bpftrace create a data structure when the call is entered and carry it over to the exit probe. The exit probe then extracts the file name from it and reports the error with the desired context. To do this, the script stores the name of each file to be opened in a map data structure when entering the system call (i.e., in the sys_enter_openat tracepoint). The key is the current kernel thread ID, which is available in the predefined tid variable. A thread can only be inside one system call at a time, so this ID uniquely identifies the pending call. If the file then fails to open, the sys_exit_openat tracepoint can look up its name in the map and report it to the user. The report also includes the command name of the process that experienced the error, which is available in comm.
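The enter/exit pairing can be modelled in a few lines of Python. This is a user-space simulation, not bpftrace. The Event record and the failed_opens() function are hypothetical. A dictionary stands in for the @filename map, and the events are fed in by hand rather than coming from the kernel. The interleaved threads in the usage example show why the thread ID is a suitable key.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Event:
    kind: str                       # "enter" or "exit" (hypothetical event record)
    tid: int                        # kernel thread ID, like bpftrace's tid
    comm: str                       # process name, like bpftrace's comm
    filename: Optional[str] = None  # set on "enter" events only
    ret: Optional[int] = None       # set on "exit" events only


def failed_opens(events: Iterable[Event]) -> List[Tuple[str, str]]:
    """Pair openat() entries with exits per thread and report failures."""
    pending: Dict[int, str] = {}    # stands in for the @filename map
    failures: List[Tuple[str, str]] = []
    for ev in events:
        if ev.kind == "enter":
            pending[ev.tid] = ev.filename or ""    # stash on entry, keyed by tid
        elif ev.kind == "exit":
            if ev.tid not in pending:              # like the / @filename[tid] / filter
                continue
            if ev.ret is not None and ev.ret < 0:  # negative return code = failure
                failures.append((ev.comm, pending[ev.tid]))
            del pending[ev.tid]                    # like delete(@filename[tid])
    return failures


if __name__ == "__main__":
    print(failed_opens([
        Event("enter", 101, "cat", filename="/etc/hostname"),
        Event("enter", 202, "ls", filename="/nonexistent"),
        Event("exit", 101, "cat", ret=3),
        Event("exit", 202, "ls", ret=-2),
    ]))
```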
The filter in line 9 of Listing 3 is / @filename[tid] /. It ensures that the probe executes the following code only if the current kernel thread has previously stored a file name in the map. An exit event may have no matching entry from the sys_enter_openat probe defined above, for example because the call began before the script was started. In that case the map entry won't exist, and the filter tells bpftrace to ignore the event.
After checking for and possibly reporting an error, the code proceeds to line 14. There it calls delete() to remove the map entry, whether or not the call failed. Without this step, the map would keep growing and eventually consume too much memory if the bpftrace script ran for a long period of time.
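A quick way to see why the cleanup matters is to count the entries left behind. In this hypothetical Python model, each thread stores one file name on entry. Only the delete step differs between runs. Without it, one entry stays behind per distinct thread ID. A running system keeps creating new threads, so that number keeps climbing.

```python
from typing import Dict, Iterable


def leftover_entries(tids: Iterable[int], delete_on_exit: bool) -> int:
    """Count map entries left after each listed thread makes one open call.

    Hypothetical model: the stored file name is a placeholder string.
    """
    stash: Dict[int, str] = {}
    for tid in tids:
        stash[tid] = "placeholder/file"  # enter: store under the thread ID
        if delete_on_exit:
            del stash[tid]               # exit: clean up, as line 14 does
    return len(stash)


if __name__ == "__main__":
    print(leftover_entries(range(1000), delete_on_exit=False))
```

Repeated calls from the same thread overwrite their entry, so the growth tracks the number of distinct threads rather than the number of calls.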