Search for processes by start time

Ghost Hunter

Lead Image © batareykin, 123RF.com

Article from Issue 239/2020

How do you find a process running on a Linux system by start time? The question sounds trivial, but the answer is trickier than it first appears.

As the maintainer of a computing cluster [1], Frank also provides his users with commercial software for calculations based on the fair-use principle. A limited number of license keys are available for this software (e.g., 10 keys for the MATLAB [2] simulation software).

Some of these calculations can take up to a week. When a calculation is finished and the process terminates, the license key is automatically returned to the pool of free keys and can be grabbed by another user. However, if users forget to end their processes, no more keys can be handed out, because all of them are still allocated. To prevent this, the admins want to automatically search for processes that are older than 10 days. If they find a process matching this criterion, they can check with the users to clarify what should happen to the process.

The Linux kernel manages processes and makes information relating to them available to the user in the /proc filesystem. At the command line, ps is the reliable interface to process management. Unfortunately, ps has dozens of options, and its output is often not very clear either. This can be remedied with a little shell code or possibly a scripting language. This article compares several potential solutions using Bash, Python, and Perl scripts, and the Go programming language.

Our goal is to find a solution that detects processes that are still running and were launched at least 10 days ago and then outputs the results in a list that is sorted by age, oldest first. The output will also include the user's login name or user ID, the PID, the executed program, and the time when the respective process began. If possible, we want to use only on-board tools. For the solutions based on Bash, you will need the ancient procps 3.3.0 release or newer (earlier versions lack some of the features used here).

Bash Variant 1

The first obvious solution is based on the ps command in combination with awk, date, sed, and sort. ps supports an optional output field lstart, which outputs a process's start time (and date) in a uniform, long format. Additionally, the option -h must be used to completely suppress the headers in the ps output.

While finding and implementing the solution (Listing 1) was quick, parsing ps's output is not trivial, which makes the script relatively unreadable as well as quite long. We encountered the following problems with this solution:

  • You have to set the LC_TIME environment variable to make sure that localized month names do not suddenly appear (env LC_TIME=C).
  • The day of the month has additional spaces before the single-digit numbers. To sort, you have to replace them with a zero using a sed expression (lines 5 and 6, Listing 1).
  • The start date contains the months in letters instead of numbers; you have to convert them to digits. This can be done with sed as shown in lines 7 through 18.
  • The order of the date components is not suitable for sorting (first month, then day, then time, and finally the year). awk changes the order of these four components.
  • awk also handles the filtering from a certain date, since it can compare strings with <.
  • The script uses date to generate the appropriate comparison date right at the outset, since date can also calculate dates from relative specifications. The specification "10 days ago from now" is returned by calling:
date -d 'now -10 days'

date can format the output very flexibly.

  • If you do not specify any parameters when calling the script, it shows all processes older than 10 days.
  • All numeric fields must be explicitly specified in sort; otherwise sort will only consider the first field as numeric.
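
The month conversion and field reordering described in the list above can also be done in a single awk step. The following is only a minimal sketch, not the article's Listing 1: the function name normalize_lstart is hypothetical, and the sketch assumes input lines in the form that ps produces for pid,lstart,user,cmd with LC_TIME=C.

```shell
#!/bin/sh
# Hypothetical helper (not part of the article's listings).
# Input : "PID Www Mmm D HH:MM:SS YYYY USER CMD..." (ps lstart format, C locale)
# Output: "YYYY MM DD HH:MM:SS PID USER CMD" (first word of CMD only)
normalize_lstart() {
  awk '{
    # Position of the month name in the lookup string: Jan=1, Feb=4, ..., Dec=34
    month = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $3) + 2) / 3
    printf "%s %02d %02d %s %s %s %s\n", $6, month, $4, $5, $1, $7, $8
  }'
}

# Demo on one sample line; on a live system, pipe in the output of
#   env LC_TIME=C ps -eaxho pid,lstart,user,cmd
printf '%s\n' '    1 Fri Apr  3 22:32:34 2020 root init' | normalize_lstart
```

The demo line prints 2020 04 03 22:32:34 1 root init, which is the format shown in the listings below. The lookup string replaces the 12 sed substitutions, and printf with %02d takes care of the leading zero for single-digit days.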

Listing 1

First Bash Attempt

01 #!/bin/sh
02 if [ -n "$1" ]; then limit=$1; else limit=10; fi
03 date="$(date '+%Y %m %d %T' -d "now -$limit days")"
04 env LC_TIME=C ps -eaxho pid,lstart,user,cmd | \
05   sed -e 's/^ *//;
06           s/  \([1-9]\) / 0\1 /;
07           s/Jan/01/;
08           s/Feb/02/;
09           s/Mar/03/;
10           s/Apr/04/;
11           s/May/05/;
12           s/Jun/06/;
13           s/Jul/07/;
14           s/Aug/08/;
15           s/Sep/09/;
16           s/Oct/10/;
17           s/Nov/11/;
18           s/Dec/12/' | \
19   awk '$6" "$3" "$4" "$5" "$1 < "'"$date"'" {print $6" "$3" "$4" "$5" "$1" "$7" "$8}' | \
20   sort -k1,1n -k2,2n -k3,3n -k4,4 -k5,5n

Without the -k options in the sort call, the output from Listing 1 looks like Listing 2 on a computer that was last booted on April 3, 2020.

Listing 2

Output from

$ ./list-processes1.sh | head
2020 04 03 22:32:34 1 root init
2020 04 03 22:32:34 10 root [ksoftirqd/0]
2020 04 03 22:32:34 104 root [kintegrityd]
2020 04 03 22:32:34 105 root [kblockd]
2020 04 03 22:32:34 106 root [blkcg_punt_bio]
2020 04 03 22:32:34 11 root [rcu_sched]
2020 04 03 22:32:34 12 root [migration/0]
2020 04 03 22:32:34 13 root [cpuhp/0]
2020 04 03 22:32:34 14 root [cpuhp/1]
2020 04 03 22:32:34 15 root [migration/1]

In Listing 2, you can immediately see that the sequence of the processes cannot be correct. This is because the time stamps in the lstart field are only accurate to the second, not to the micro- or nanosecond, so processes started within the same second cannot be told apart by time alone. Sorting by process number as the final key solves this problem for the most part. You have to specify all fields up to and including the process number in the sort call, as shown in line 20 of Listing 1. The output now looks like Listing 3.

Listing 3

Sorted Output

$ ./list-processes1.sh | head
2020 04 03 22:32:34 1 root init
2020 04 03 22:32:34 2 root [kthreadd]
2020 04 03 22:32:34 3 root [rcu_gp]
2020 04 03 22:32:34 4 root [rcu_par_gp]
2020 04 03 22:32:34 6 root [kworker/0:0H-kblockd]
2020 04 03 22:32:34 9 root [mm_percpu_wq]
2020 04 03 22:32:34 10 root [ksoftirqd/0]
2020 04 03 22:32:34 11 root [rcu_sched]
2020 04 03 22:32:34 12 root [migration/0]
2020 04 03 22:32:34 13 root [cpuhp/0]

Now the script only fails if so many processes are started within a single second that the process numbers wrap around and are reassigned starting from the beginning. For a long time, the default limit for this was 32,768 PIDs (the value in /proc/sys/kernel/pid_max), but Linux systems can now also cope with larger process IDs (PIDs).
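
The effect of the sort keys can be reproduced on a few sample lines without calling ps at all. This is a minimal sketch under the assumption that the input already has the format "year month day time PID user command"; the function name sort_by_start_and_pid and the entry [example] are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sort "YYYY MM DD HH:MM:SS PID ..." lines by start time
# and, for identical start times, by PID in ascending numeric order.
sort_by_start_and_pid() {
  # Each key is limited to one field; the time field sorts correctly as a string
  LC_ALL=C sort -k1,1n -k2,2n -k3,3n -k4,4 -k5,5n
}

# Sample data: three processes from the same second and one started a second later
printf '%s\n' \
  '2020 04 03 22:32:34 104 root [kintegrityd]' \
  '2020 04 03 22:32:34 10 root [ksoftirqd/0]' \
  '2020 04 03 22:32:35 3 root [example]' \
  '2020 04 03 22:32:34 1 root init' |
  sort_by_start_and_pid
```

The three entries from 22:32:34 appear in the order 1, 10, 104, followed by the entry from 22:32:35, even though the latter has the smaller PID 3.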

Bash Variant 2

An in-depth study of the ps man page reveals other fields that are useful for the task at hand, such as the etimes output field. etimes tells you the number of seconds since the process was started, reducing the complexity considerably because you no longer have to parse month names or re-sort fields. This shrinks the command so it can be written in one line. Listing 4 returns all processes that are more than two days old.

Listing 4

Compact Bash Variant

$ ps -eaxho etimes,pid,user,cmd | sort -nr | awk '$1 > 2*24*60*60 {print}' | head
 227081  106 root  [blkcg_punt_bio]
 227081  105 root  [kblockd]
 227081  104 root  [kintegrityd]
 227081   57 root  [khugepaged]
 227081   56 root  [ksmd]
 227081   55 root  [kcompactd0]
 227081   54 root  [writeback]
 227081   53 root  [oom_reaper]
 227081   52 root  [khungtaskd]
 227081   51 root  [kauditd]

However, this variant also works with an accuracy of one second. Since the code sorts in reverse order, this is even more noticeable, because PID 1 does not appear at the beginning of the list. This can be fixed by giving sort a second key, so that if the process age is identical, the PID is used as the sort criterion in ascending order. This is ensured by the key specification -k1,1nr -k2,2n (Listing 5). Each key has to be limited to a single field, because a specification such as -k1nr,2n defines only one key that spans the first two fields.

Listing 5

Improved Compact Bash Variant

$ ps -eaxho etimes,pid,user,cmd | sort -k1,1nr -k2,2n | awk '$1 > 2*24*60*60 {print}' | head
 226597   1 root  init [2]
 226597   2 root  [kthreadd]
 226597   3 root  [rcu_gp]
 226597   4 root  [rcu_par_gp]
 226597   6 root  [kworker/0:0H-kblockd]
 226597   9 root  [mm_percpu_wq]
 226597  10 root  [ksoftirqd/0]
 226597  11 root  [rcu_sched]
 226597  12 root  [migration/0]
 226597  13 root  [cpuhp/0]

The previous call contains the calculation of seconds by awk in detailed form: 2*24*60*60 corresponds to two times 24 hours of 60 minutes each with 60 seconds each. Instead, the value can also be written directly as 172800.
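
To make the arithmetic concrete, the conversion can also be run in the opposite direction, from a number of seconds back to days, hours, minutes, and seconds. The following sketch uses a hypothetical function named format_age; it is an illustration and not part of the article's listings.

```shell
#!/bin/sh
# Hypothetical helper: convert a number of seconds (as reported by etimes)
# into days, hours, minutes, and seconds.
format_age() {
  awk -v s="$1" 'BEGIN {
    printf "%dd %02d:%02d:%02d\n",
      int(s / 86400), int(s % 86400 / 3600), int(s % 3600 / 60), s % 60
  }'
}

format_age 172800   # exactly two days: prints 2d 00:00:00
format_age 227081   # the age shown in Listing 4: prints 2d 15:04:41
```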

When parameterizing the script, the number of seconds per day (86,400) is the useful constant. Listing 6 expects the number of days as a parameter and multiplies the value passed in by 86,400.

Listing 6

Number of Days as a Parameter

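#!/bin/sh
# Age limit in days; defaults to 10 if no parameter is given
if [ -n "$1" ]; then
  limit=$1
else
  limit=10
fi
# Oldest processes first; for equal ages, lowest PID first
ps -eaxho etimes,pid,user,cmd | sort -k1,1nr -k2,2n | awk '$1 > '"$limit"'*86400 {print}'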

If you do not pass a call parameter, the script uses a value of 10 as the default (10 days). The script does not check whether the parameter is actually numeric.
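
If the parameter should be checked before it reaches awk, one possible approach is sketched below. The function name older_than_days and the sample data (users alice and bob, PIDs 4711 and 4712) are made up for illustration; the sketch assumes input lines in the format "etimes pid user cmd".

```shell
#!/bin/sh
# Hypothetical helper: print the lines describing processes older than N days.
# Reads "etimes pid user cmd" lines on stdin; accepts only digits as N.
older_than_days() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: older_than_days DAYS" >&2; return 1 ;;
  esac
  # -v hands the value to awk as data instead of splicing it into the program
  awk -v days="$1" '$1 > days * 86400'
}

# Demo on sample data (950400 seconds are 11 days); on a live system, pipe in
#   ps -eaxho etimes,pid,user,cmd
printf '%s\n' '950400 4711 alice matlab' '3600 4712 bob bash' | older_than_days 10
```

Only the first sample line is printed, because 950,400 seconds exceed the limit of 864,000 seconds for 10 days.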

Bash Variant 3

The fact that fractions of a second were missing prompted us to make a third attempt. Instead of the ps command, entries from the /proc filesystem are used as the basis here.

The required value is found in field number 22 (starttime) of the /proc/<pid>/stat file. It gives the time at which the process was launched, measured in clock ticks since the Linux kernel booted. Interpreting the clock ticks is tricky; the conversion is based on the assumption of 100 ticks per second (100Hz) [3]:

$ getconf CLK_TCK
100

Not all distributions adhere to this: Some kernels run at 250 or 1000Hz internally instead. However, they always outwardly report 100Hz, because the values exported to userspace are scaled to a fixed unit (USER_HZ) that is independent of the kernel's internal timer frequency. On Debian GNU/Linux, the two values are identical: 100Hz.

Like the previous shell scripts, the one in Listing 7 first reads a parameter and, if no time span was specified, assumes 10 days as the default. awk is then called twice. The first call reads field 22 from the stat file of the script's own shell process (whose PID the shell provides in $$). Because the script has only just started, this value serves as the current time in clock ticks since the computer booted. The second call evaluates fields 1 and 22 (the PID and the start time in clock ticks) for every process.

Listing 7

Bash Script with Clock Ticks

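#!/bin/sh
# Age limit in days; defaults to 10 if no parameter is given
if [ -n "$1" ]; then
  limit=$1
else
  limit=10
fi
# Start time of this shell in clock ticks since boot; serves as "now"
now=$(awk '{print $22}' "/proc/$$/stat")
# Print all processes started more than $limit days before "now"
awk '$22 < '"$now"'-(100*86400*'"$limit"') {printf "Sec. since boot: %.2f - PID: %i\n", $22/100, $1}' /proc/[1-9]*/stat | sort -n -k4 -k7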

Then awk reads the stat files of all running processes; this is done by specifying:

/proc/[1-9]*/stat

The number of clock ticks per second (100) and seconds per day (86,400) are hardwired values here for simplicity's sake.
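
If the tick rate should not be hardwired, it can be queried with getconf at run time. The following is a minimal sketch with a hypothetical function named ticks_to_seconds; the optional second argument for the tick rate is an addition made for illustration and testing.

```shell
#!/bin/sh
# Hypothetical helper: convert clock ticks to seconds.
# The second argument (ticks per second) defaults to what getconf reports.
ticks_to_seconds() {
  hz=${2:-$(getconf CLK_TCK)}
  # LC_ALL=C keeps the decimal point independent of the locale
  LC_ALL=C awk -v t="$1" -v hz="$hz" 'BEGIN { printf "%.2f\n", t / hz }'
}

ticks_to_seconds 468 100   # 468 ticks at 100Hz: prints 4.68
```

Called with a single argument, as in ticks_to_seconds 468, the function uses the value of getconf CLK_TCK on the current system.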

Since we wanted the output as a floating-point number to look nice, the output is restricted to just two decimal places using printf – the clock ticks are no more accurate than this anyway. sort then numerically sorts the two relevant fields as columns. The first numeric column lists the number of seconds since boot, while the second lists the process ID.

The solution comes quite close to our objective, but cannot display the usernames for the processes. In addition, some processes that were definitely started long after the system booted (for example, the Tor Browser) unexpectedly appear as if they were started zero seconds after the system booted. A likely cause is that the second field of the stat file, the program name in parentheses, can itself contain spaces; awk then splits the name into several fields, and $22 no longer points to starttime. The init process, on the other hand, did not start until 468 clock ticks or 4.68 seconds after startup. In the test case, this was probably because the hard disk encryption password had to be entered first.
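
One possible explanation for start times of zero is a program name with spaces in field 2 of the stat file, which shifts all later fields when awk splits the line on whitespace. The following sketch shows one way to read the start time regardless of the program name. The function name stat_starttime is hypothetical, and the sample line contains made-up values.

```shell
#!/bin/sh
# Hypothetical helper: print "PID STARTTIME" for each /proc/<pid>/stat line
# on stdin, even if the program name in field 2 contains spaces.
stat_starttime() {
  awk '{
    pid = $1
    # Remove everything up to the last ") ", i.e., the PID and the program name
    sub(/^.*\) /, "")
    # The remainder starts with field 3 (state), so field 22 is now $20
    print pid, $20
  }'
}

# Sample line (made-up values) with a space in the program name
printf '%s\n' \
  '1234 (Web Content) S 1 1234 1234 0 -1 4194304 100 0 0 0 5 3 0 0 20 0 4 0 987654 0 0' |
  stat_starttime
```

For the sample line, the function prints 1234 987654, whereas a plain $22 would return the 0 in the field just before the start time. On a live system, the stat files could be fed in with cat /proc/[1-9]*/stat 2>/dev/null | stat_starttime.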

Removing awk from the code and specifying the matching fields 22 and 1 directly as parameters of the sort command makes everything a bit easier. Unfortunately, the result is unreadable output with a huge volume of data.

Annoyingly, the time data is still too imprecise to do without a final sort by PID. In theory, the data should be more precise than in the previous versions, because clock ticks provide more precise information than whole seconds. However, the problem of inaccuracy in case of a PID overflow obviously still exists. All in all, the variants with ps seem to be the better approach.
