Better Bash

Shell scripts from hell: Shebang

Nils Magnus

In the beginning was the double pound sign and the exclamation mark – or at least shell scripts always start this way. The inventor, Dennis Ritchie, really didn’t know how much pain this was going to cause users.

When a user opens an interactive program, the shell unfolds more magic than you might imagine, especially if you happen to be talking about scripts. Bash is so omnipresent that people tend to forget it is also code that follows certain rules.

Bash has very little to do with accepting key presses. This job is handled by the terminal driver, which feeds a pseudo-terminal either locally at the console or via a detour through SSH or an X client. The pseudo-terminal passes the input on to the interactive shell, which sends a prompt to the terminal beforehand.

The shell waits until it receives an EOL and then interprets the string up to that point according to Bash syntax. If the string is an internal command – such as a while loop, an if condition, or an assignment that uses = – the shell executable can take the necessary action directly. This also applies to the many shell built-ins, such as ulimit or history, or any shell functions you defined yourself.

If none of these cases applies, Bash assumes the user wants to launch an external program, but it first needs to find the program. To do so, it iterates against the content of the PATH environmental variable, which it first separates by the delimiting colons.

Bash opens each element as a path and searches for a file with the execute flag set and with the name of the command input. Recent versions of Bash use a cache for this to avoid the need to search the physical filesystem for the complete path, but this doesn’t really change the approach just described.

External Binaries

If the command interpreter finds something, it initially leaves the rest of the job to the kernel by enabling the execve() call in a new process. It passes in the full path, the command name input, and – if command-line arguments exist – the arguments to the process. The kernel opens the file and checks the first 2 bytes. If they reveal that the file is a genuine binary (e.g., in ELF format), the kernel launches it directly. In the meantime, the shell waits for the program to terminate and then proceeds to process the next command.

#! All Change!

However, if the first two bytes are #!, the control flow takes a convoluted path back to linux/fs/binfmt_script.c in the kernel; Listing 1 shows some simplified code. This triggers a complicated mechanism:lines 10 through 19 delimit the #! line with an end marker and remove any white spaces that occur before this. Line 20 eliminates any blanks that directly follow the Shebang, because Unix creator Dennis Ritchie explicitly permitted this behavior in an email back in 1980.

Listing 1: Kernel Parses Shebang

01 static int
02 load_script(struct linux_binprm *bprm, struct pt_regs *regs) {
03&nbsp;&nbsp; const char *i_arg, *i_name;
04&nbsp;&nbsp; char *cp;
05&nbsp;&nbsp; struct file *file;
06 
07&nbsp;&nbsp; if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
08&nbsp;&nbsp; return -ENOEXEC;
09 
10&nbsp;&nbsp; if ((cp = strchr(bprm->buf, '\n')) == NULL)
11&nbsp;&nbsp; cp = bprm->buf+BINPRM_BUF_SIZE-1;
12&nbsp;&nbsp; *cp = '\0';
13&nbsp;&nbsp; while (cp > bprm->buf) {
14&nbsp;&nbsp;&nbsp;&nbsp; cp--;
15&nbsp;&nbsp;&nbsp;&nbsp; if ((*cp == ' ') || (*cp == '\t'))
16&nbsp;&nbsp;&nbsp;&nbsp; *cp = '\0';
17&nbsp;&nbsp;&nbsp;&nbsp; else
18&nbsp;&nbsp;&nbsp;&nbsp; break;
19&nbsp;&nbsp; }
20&nbsp;&nbsp; for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
21&nbsp;&nbsp; if (*cp == '\0')
22&nbsp;&nbsp; return -ENOEXEC; /* No interpreter name found */
23&nbsp;&nbsp; i_name = cp;
24&nbsp;&nbsp; i_arg = NULL;
25&nbsp;&nbsp; for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++);
26&nbsp;&nbsp; while ((*cp == ' ') || (*cp == '\t'))
27&nbsp;&nbsp; *cp++ = '\0';
28&nbsp;&nbsp; if (*cp)
29&nbsp;&nbsp; i_arg = cp;
30&nbsp;&nbsp; [...]
31 }

If trimming in line 21 fails to create a meaningful string, there is no interpreter name and the function reports an error in line 22. If this is not the case, the interpreter name is now available in i_name; this is typically /bin/sh. Lines 24 through 29 then take precisely one argument from the remaining line – if it exists – and store it in i_arg. The kernel ignores the rest of the first line.

Finally, the function calls itself recursively within the kernel using the command interpreter it extracted and reappends the original command name and the arguments. For example, if /tmp/runme contains an initial line of #!/foo/bar --myarg, PATH=/tmp is true, and if the user types runme alpha beta, the kernel actually calls execve("/foo/bar", "bar", "--myarg", "/tmp/runme", "alpha", "beta"). But if the kernel fails to determine an interpreter by following this procedure, Bash takes control of the execution, assumes that the file contains valid shell code, opens the file, and interprets its content.

Weird

You might be wondering why the kernel takes this roundabout approach. Ritchie’s response to this was that it allows shell scripts to be launched by exec() calls, that the process display and accounting show more intuitive names, and – this might surprise you – that you can assign set UID flags to shell scripts, too.

Unfortunately, this idea caused administrators no end of security worries. If a Unix user links to a set UID script that starts with #!/bin/sh, and cunningly names the link -i, the resulting call is /bin/sh -i, which gives the user a very convenient, interactive root shell. This explains why Linux dumps the privileges that result from additional flags in the extra Shebang loop.

At the same time, there is also an attack vector for a race condition in which the file changes between parsing and execution. All told, this mechanism – no matter how well-meant it was on Ritchie’s part – has mainly caused confusion. In a survey, Sven Mascheck investigated this aspect in 48 Unix derivatives, finding very little common ground, but a number of vulnerabilities.

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

News

Canonical Releases Ubuntu 24.04

Gnome , Linux , open source , Ubuntu

After a brief pause because of the XZ vulnerability, Ubuntu 24.04 is now available for install.
Linux Servers Targeted by Akira Ransomware

Enterprise Linux , Linux , ransomware , Security

A group of bad actors who have already extorted $42 million have their sights set on the Linux platform.
TUXEDO Computers Unveils Linux Laptop Featuring AMD Ryzen CPU

Games , Hardware , laptop , Linux

This latest release is the first laptop to include the new CPU from Ryzen and Linux preinstalled.
XZ Gets the All-Clear

Arch Linux , Fedora , Linux , open source , Security , Ubuntu

The back door xz vulnerability has been officially reverted for Fedora 40 and versions 38 and 39 were never affected.
Canonical Collaborates with Qualcomm on New Venture

Artificial Inte... , Linux , open source , Security , Ubuntu

This new joint effort is geared toward bringing Ubuntu and Ubuntu Core to Qualcomm-powered devices.
Kodi 21.0 Open-Source Entertainment Hub Released

audio , Multimedia , Music , open source , streaming video , Video

After a year of development, the award-winning Kodi cross-platform, media center software is now available with many new additions and improvements.
Linux Usage Increases in Two Key Areas

Games , Linux , open source , Steam

If market share is your thing, you'll be happy to know that Linux is on the rise in two areas that, if they keep climbing, could have serious meaning for Linux's future.
Vulnerability Discovered in xz Libraries

Fedora , Linux , malware , Security

An urgent alert for Fedora 40 has been posted and users should pay attention.
Canonical Bumps LTS Support to 12 years

Linux , open source , Operating Systems , Ubuntu

If you're worried that your Ubuntu LTS release won't be supported long enough to last, Canonical has a surprise for you in the form of 12 years of security coverage.
Fedora 40 Beta Released Soon

Fedora , Gnome , open source , Plasma , Wayland

With the official release of Fedora 40 coming in April, it's almost time to download the beta and see what's new.

Better Bash

Shell scripts from hell: Shebang

Related content

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

News

Canonical Releases Ubuntu 24.04

Linux Servers Targeted by Akira Ransomware

TUXEDO Computers Unveils Linux Laptop Featuring AMD Ryzen CPU

XZ Gets the All-Clear

Canonical Collaborates with Qualcomm on New Venture

Kodi 21.0 Open-Source Entertainment Hub Released

Linux Usage Increases in Two Key Areas

Vulnerability Discovered in xz Libraries

Canonical Bumps LTS Support to 12 years

Fedora 40 Beta Released Soon

Better Bash

Shell scripts from hell: Shebang

Related content

Subscribe to our Linux Newsletters Find Linux and Open Source Jobs Subscribe to our ADMIN Newsletters

Support Our Work

News

Tag Cloud

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters