Delve into ELF Binary Magic

Article from Issue 202/2017
Author(s):

Discover what goes on inside executable files, how to reverse-engineer them, and how to make them as small as possible.

Back in the good old days, you could leave your door unlocked at night, music made sense, and writing computer programs was simply a case of putting some CPU instructions in the right order. Today, we have a mammoth range of libraries, toolkits, abstraction layers, and other things that make writing large programs easier – but it's increasingly difficult to understand what the CPU is actually doing. Open up LibreOffice, for example, and type a dot (period) character. What exactly happens here? How many CPU instructions are being executed between your finger hitting the key and that dot appearing on the screen?

Now, we don't want to sound like old codgers who think that everything should be written in assembly language. We have these layers of abstraction for good reason: They make software safer, easier to understand, and more portable. But sometimes it's good to go low-level and interact more closely with the CPU and operating system, to better understand what's going on. So, in this article, we'll get down and dirty with CPU instructions, the ELF executable format, and reverse-engineering binary files so you can see what they do.

I Can C Clearly Now

Let's start by writing a very simple C program. Put this into a file called test.c in your home directory:

#include <stdio.h>
int main()
{
        puts("Ciao!");
}

Now compile it in a terminal and then run it, using the following commands:

gcc test.c -o test
./test

As you'd expect, our "test" program simply prints "Ciao!" on the screen, using the standard C library's puts (put string) routine – no surprises there. But enter ls -l test, and you'll notice something odd: The program is around 8KB in size! Sure, that may sound trivial in today's world of terabyte hard drives, but 8KB is actually huge for a program so simple. (Consider that space exploration classic Elite, back in 1984, was squeezed into 22KB of RAM [1]. That included a whole galaxy to explore, 3D spacecraft, missions, trading, and more. And yet our "Ciao" program is more than a third of the size.)

Well, this "test" executable includes some information generated by the compiler that we can use for debugging purposes. Let's remove that:

strip test

Now do ls -l test again, and you'll see that it's slightly smaller – down to 6KB. But that still feels overly large. The Commodore 64's operating system and support routines (aka "KERNAL") fit into 8KB, so we must be able to do better.

When we start poking around inside our "test" binary executable file, however, we see that most of the data inside it has nothing to do with the printing bit. Run this command to see what kind of ASCII (text) data is stored inside the file:

strings test

You'll see results like those in Figure 1. GCC has generated lots of text strings that are of no importance to us, but if we look closely we can see the "Ciao!" string somewhere among all the gobbledygook. Let's see exactly what kind of file test is, using the command file test. Your results will look something like this:

test: ELF 64-bit LSB shared object, x86-64,
version 1 (SYSV), dynamically linked,
interpreter /lib64/ld-linux-x86-64.so.2,
for GNU/Linux 2.6.32, BuildID[sha1]=
83b8ef795a8d78af706be36db142d9d64aca2307, stripped

Wow, that's quite a bit of info. The most important part is "ELF": This means that the file is in "Executable and Linkable Format," which is the standard format on all GNU/Linux systems. But what exactly is so special about this format? What does it do?

Figure 1: After compiling our simple C program, we can see the extra data that GCC puts inside the executable file.

What's in an ELF?

Well, back in the days of early 8-bit and 16-bit computers and .com files on MS-DOS, executable files were simply bundles of CPU instructions and data. The operating system loaded them into a position in RAM and handed over execution to that position. There was no checking of the executable code beforehand, nor was there a distinct separation of code and data. It was a free-for-all, so to speak.

In modern operating systems, the situation is different. Multiple programs are run simultaneously, they can be loaded into many different places in RAM, and they are made up of multiple sections. In ELF files, like our test program, there's a "header" section that doesn't include executable code but provides the operating system with information about the program (e.g., the data we see in the output of the file command above).

Following this, the ELF file contains other sections: one for executable code, one for read-only data (like the "Ciao" string), and one for data that can be changed. Keeping these separate is an important security measure, as the operating system then knows which things can be changed, and which cannot. You don't want a compromised program able to start modifying its own code on the fly, for instance. A simplified view of ELF file structure is shown in Figure 2.

Figure 2: Here's a basic outline of the sections included in an ELF file (Image: Wikipedia).

So, that's one reason why our "test" file is larger than we'd expect – but there are others. We can "disassemble" the executable file to show the assembly language that corresponds to the CPU instructions like so:

objdump -d test

This produces a lot of output, so you may want to pipe it through less to view it: objdump -d test | less. As you scroll around, look for the .text section, which actually contains the main code of our program, despite its name. If you've never seen assembly language before, you may be surprised at how many instructions are there, just to print a single word. Are they all necessary?

The answer is no. GCC and other parts of the compiler tool chain add a lot of boilerplate and setup code that's useful but not strictly needed. Most of this is added later in the compilation process. To see the raw assembly language that GCC initially generates from our C code, run this command:

gcc -S test.c

This generates a file called test.s – have a look inside it, and you'll see results like in Figure 3. The parts on the left, beginning with the dot characters and no indentation, are labels that point to specific parts of the code. You can see that the .LC0 label points to our "Ciao" string, while the main program code begins at .LFB0.

Figure 3: GCC can show us the assembly language instructions it generates from our C code.

Various assembly language instructions set up the program and memory, but the two that do the work of putting "Ciao" on the screen are these:

leaq    .LC0(%rip), %rdi
call    puts@PLT

We won't go into the specifics of assembly language right now, but in a nutshell: This code executes (calls) the standard C library's puts routine, giving it the location of the text string to print. Once the C library has done its work, it hands control back to our program, which does a bit of cleaning up before it ends with the ret instruction (return – basically, give control back to the C startup code, which then exits to the operating system).

You Can Go Your Own Way

So, we've poked around inside an executable generated from a C file and done some reverse-engineering on it; now let's look at making the program as small as possible. One thing we want to do is remove our dependency on the GNU C library (glibc). Running this command

ldd test

shows the libraries on which test depends – and one of them is libc.so.6. The output of ldd shows where that library is on your system (it's probably a symlink to another file), so with ls -l followed by the full filename you can see how big it is. On our system, the C library weighs in at 1.8MB, but if you're running a super-sized Gentoo setup with optimizations galore, you may have shrunk it down a bit. In any case, it's a hefty dependency that we'd like to get rid of.

But, how do we print a message on the screen, without using puts, printf, or other common routines from the C library? Well, we can actually get the kernel to do the work for us. The Linux kernel includes a bunch of system calls for doing crucial tasks: opening and closing files, starting processes, and basic input and output. Many standard C library routines act as fancy wrappers around these system calls, adding extra features and checks to reduce bugs, which is why few programs interact directly with the kernel. But we're going to do it!

We'll write a short assembly language program that does the exact same job as test.c created earlier. (Note that we're using 32-bit x86 assembly language code here, so it'll work on 32-bit and 64-bit Intel/AMD PCs, but not on other architectures like the Raspberry Pi.) To convert the assembly code into an executable, we'll use the NASM assembler, so install it from your distro's package manager or – on Ubuntu-based distributions – enter the following:

sudo apt-get install nasm

Then, enter the following into a text editor and save it as test2.asm (Figure 4):

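section .data
        msg db "Ciao!", 10      ; the string, followed by a line feed
section .text
global _start
_start:
        mov ecx, msg            ; address of the string
        mov edx, 6              ; number of bytes to write
        mov ebx, 1              ; file descriptor 1 (stdout)
        mov eax, 4              ; system call 4 (write)
        int 0x80                ; hand over to the kernel
        mov ebx, 0              ; exit status 0 (success)
        mov eax, 1              ; system call 1 (exit)
        int 0x80

Figure 4: Reverse-engineer your own binary! Run objdump -d -M intel test2, and you'll see the same instructions as in test2.asm.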

Assemble it into a binary executable file (test2) and run it using these commands:

nasm -f elf test2.asm
ld -m elf_i386 -s -o test2 test2.o
./test2

Et voila – "Ciao!" is printed on the screen, just like with the C program we created at the start of this tutorial. But this program is very different, in that it uses a kernel system call to display the text on the screen.

At the start, we set up a "data" section, which contains our "Ciao!" text string. This is put next to a label called msg, which identifies exactly where the string can be found. Note that we end our string with the number 10, which is the ASCII code for a line feed (like pressing enter – see the online ASCII chart [2] for a reference).

So with our string prepared, we can start writing CPU instructions and talk to the Linux kernel. You may recall that the "text" section is the one that contains code, rather confusingly, so we start with this section. We immediately create another label called _start, which points to the beginning of the code – this is used by the operating system to determine exactly where in the file the program begins.

Next up, we need to populate some registers with important data. Registers are a bit like variables, in that they can store many different values, but they are actually memory storage spaces built in to the CPU. Working with them is extremely fast, compared to regular RAM, but there is a very limited set of registers.

Anyway, before we tell the Linux kernel to display the string, we need to provide it with some information. First, we put the location of the string into the ecx register. (NASM operands read from right to left, destination first and source second, so the first mov here means move – actually copy – the value of msg into the ecx register.)

Second, we need to tell the kernel how many characters of the string to print. With the exclamation point and trailing line feed (10) character, that's six characters in total, so we put that in the edx register. Then we put 1 and 4 into the ebx and eax registers, respectively, which tell the kernel where to print the text (1 means stdout) and which specific system call to use (4 means write).

With all the registers set up, the int 0x80 instruction does the magic of "interrupting" our program and handing control over to the Linux kernel. The kernel looks at the eax register and thinks: "Aha, the calling program wants me to run the 'write' system call. Let's see what's in the other registers, to find out where the string is, how long it is, and where I should display it."

Once the kernel has done its work, it hands execution back to our program. Then we put 1 into the eax register and call the kernel again – this time the 1 value selects the exit system call, which tells the kernel to safely terminate our program (the exit status is taken from the ebx register). And that's it!

Now run ls -l test2, and you'll see that the executable is down to around 350 bytes! That's way, way smaller than the C equivalent. We've still created a valid ELF executable file, but there's none of the extra startup and cleanup code added by the C compiler, nor are we using a C library.

And guess what? It's possible to make this executable even smaller! This involves some rather advanced tricks and hacks, but if this tutorial has whetted your appetite for minimalism, check out the fascinating "Creating Really Teensy ELF Executables for Linux" guide by Brian Raiter [3].

ELF and ARM

Although we focused on x86 assembly language in the latter part of this tutorial, ELF files are not constrained by any particular CPU architecture. If you have a Raspberry Pi and go into the /bin directory, for instance, and run file on a few of the executables there, you'll notice that they're also ELF files – but for the ARM architecture.

ARM is arguably a much more elegant and better designed instruction set than x86. The latter has a more limited set of registers (in 32-bit mode), and some instructions can only be performed on certain registers. There's also baggage everywhere, due to backwards compatibility over many decades. So, if you really want to get into assembly language, we recommend going with ARM first.

Sure, it's not the architecture used by most desktop and laptop PCs, but it's absolutely everywhere – in smartphones, embedded devices, and of course the Raspberry Pi. We ran a tutorial on ARM assembly previously that you can find on the website [4]. One especially fun ARM device to play around with and write assembly code for is the Nintendo Game Boy Advance. It's a fairly simple machine compared to the Pi, but you can do a lot with it. Parater [5] has a pretty good outline of the essentials, covering common ARM CPU instructions and how to interface with the Game Boy Advance's hardware.
