Fuzzing and the quest for safer software

Successful Thanks to Mutation

For all their power, model-based fuzzers have one decisive disadvantage: You first have to write a grammar for them. For rare or application-specific input formats, this means considerable effort, especially if you first need to understand the format. However, if you have sample input data, you might want to try the youngest class of test generators: evolution-based fuzz generators.

Evolution-based fuzz generators also aim to produce input that is as valid as possible in order to penetrate deep into the program. However, they achieve this goal not with an input description such as a grammar, but by systematically changing (mutating) existing input. For instance, consider a German address: Putzbrunner Straße 71, 81739 Munich. A mutation might consist of swapping two characters: Puztbrunner Straße 71, 81739 Munich. Alternatively, a mutation can replace one character with another (Putzbrunner Straxe 71, 81739 Munich) or delete a random character (Putzbrunner Straße 71, 8173 Munich).

Each mutation thus delivers new input to test. The range of variation is not as high as with purely random input or with model-based testing, but the probability of producing valid input remains high. In the preceding examples, for instance, deleting just one digit from the postal code already leads to an invalid address.
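
The three mutations described above are easy to express in code. The following Python sketch implements each operator; the function names are made up for illustration, and a real fuzzer would offer many more operators (bit flips, block duplication, and so on):

```python
import random

def delete_char(s):
    """Delete one random character (cf. 81739 -> 8173)."""
    i = random.randrange(len(s))
    return s[:i] + s[i+1:]

def replace_char(s):
    """Replace one random character with a random printable one
    (cf. Strasse -> Straxe)."""
    i = random.randrange(len(s))
    return s[:i] + chr(random.randrange(32, 127)) + s[i+1:]

def swap_chars(s):
    """Swap two adjacent characters (cf. Putzbrunner -> Puztbrunner)."""
    i = random.randrange(len(s) - 1)
    return s[:i] + s[i+1] + s[i] + s[i+2:]

def mutate(s):
    """Apply one randomly chosen mutation operator."""
    return random.choice([delete_char, replace_char, swap_chars])(s)

seed = "Putzbrunner Straße 71, 81739 Munich"
print(mutate(seed))
```

Each call to mutate() yields a new test input that differs from the seed in exactly one place.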

But the real trick with evolution-based fuzzers is that they use mutations to systematically evolve a set of successful inputs. The goal is to cover as much program behavior as possible. For this purpose, fuzzers maintain a set (known as a population) of particularly promising input. They start with a set of known and valid input, which they mutate, increasing the population.

The fuzzer now measures which places in the program the input reaches (coverage). Input that reaches previously uncovered locations is considered particularly promising, is retained, and serves as the basis for further mutations. In contrast, input that does not reach new locations is dropped from the population (selection). In this way, the population continues to evolve with respect to the goal of reaching as many program locations as possible and thus covering as many program behaviors as possible.

In the previous sample mutations, for example, the last mutation (8173) would be classified as promising because it is the first to reach the error-handling code with the four-digit postal code and thus would expose new program behavior. The mutation from Putzbrunner to Puztbrunner, on the other hand, is unlikely to cover new code; it would be removed from the population in the long run.

What about the chance of generating the special postal code 00000? If the code checks the postal code character by character, as in Listing 2, then each satisfied partial condition executes another piece of code that checks the next digit. Over numerous runs, 81739 thus mutates first to 01739, then to 00739, then to 00039, and finally to 00000, with more code being discovered at every step. In fact, evolution-based fuzzers can detect such partial success even when comparing strings and numbers, slowly working their way toward a goal through successive mutations.

Listing 2

Gradual Postal Code Test

if (len(zip) == 5 and zip[0] == '0'
  and zip[1] == '0' and zip[2] == '0'
  and zip[3] == '0' and zip[4] == '0'):
    do_something_special()
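
This evolutionary loop can be sketched in a few lines of Python. Coverage is approximated here by recording which of Listing 2's digit checks an input passes – a toy stand-in for the instrumentation that a real fuzzer such as AFL injects at compile time – and all function names are hypothetical:

```python
import random

def check_zip(zip_code):
    """Listing 2's gradual test, instrumented to report coverage:
    returns the set of code locations (digit checks) reached."""
    reached = set()
    if len(zip_code) == 5:
        reached.add(0)                    # entered the length check
        for i in range(5):
            if zip_code[i] != '0':
                break
            reached.add(i + 1)            # one location per matched digit
    return reached

def mutate(s):
    """Replace one random character with a random digit."""
    i = random.randrange(len(s))
    return s[:i] + random.choice('0123456789') + s[i+1:]

def evolve(seed, rounds=20000):
    """Keep only mutants that reach new code (selection)."""
    population = [seed]
    seen = check_zip(seed)                # coverage achieved so far
    for _ in range(rounds):
        candidate = mutate(random.choice(population))
        coverage = check_zip(candidate)
        if coverage - seen:               # reaches new locations: keep it
            population.append(candidate)
            seen |= coverage
        if candidate == '00000':
            return candidate
    return max(population, key=lambda s: len(check_zip(s)))

random.seed(1)
print(evolve('81739'))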

Evolution-based fuzzers resemble natural evolution – many failed attempts, but effective in the long run. Most importantly, they are frugal: Test input is often already available or can be recorded from real program runs, and the coverage of a test is easy to measure. Moreover, evolution-based fuzzers require no further knowledge about input formats. Their major drawback is their dependence on good starting input: If the initial input does not contain a certain feature, it is unlikely that the feature will ever be generated.

The classic evolution-based fuzzer is American Fuzzy Lop (AFL) [4], a highly optimized fuzzer that has detected numerous bugs in existing programs. Many security-critical Linux programs are now tested around the clock by AFL and its offshoots, for example in Google's OSS-Fuzz project.

Although AFL and its offshoots are primarily used in "hostile" scenarios where the program authors are not expected to cooperate, Sapienz [5] is an evolution-based fuzzer that Facebook uses to test its own programs and applications. Anyone who uses the command line on a Linux system or opens the Facebook app on their cell phone will benefit from the work of evolution-based fuzzers.

Detecting Conditions

The fourth class of fuzz generators also aims to reach as many places in the program code as possible. Instead of the trial-and-error principle that evolution-based fuzzers use, it relies on systematic recognition and resolution of path conditions. This refers to conditions under which execution takes a particular path in the program and thus reaches a particular code location.

In these constraint-based fuzzers, specialized constraint solvers are used – programs that automatically seek a solution for a given set of conditions. For a zip code constraint like the one described previously, a constraint solver produces the solution 00000 within hundredths of a second.

However, constraint solvers show their true strength in complex arithmetic calculations. If the condition is something like the one in Listing 3, a constraint solver will also determine suitable values for x and y in a very short time. Grammar-based fuzzers, on the other hand, still rely on chance, and evolution-based fuzzers need to hope that their starting population contains suitable values for x and y. Such strengths are of particular interest when deep conditions of program logic must be satisfied. But constraint solvers can also quickly solve non-trivial properties of input files such as checksums.

Listing 3

Constraint Condition

if (x * x + y * y > 3
    and x * x * x + y < 5):
    do_something_interesting()
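
For illustration, a naive stand-in for a constraint solver can simply enumerate small integer pairs until both of Listing 3's conditions hold. A real solver such as Z3 derives a solution symbolically rather than by guessing, but the sketch shows what "solving" the condition means:

```python
from itertools import product

def solve():
    """Enumerate small integer pairs until Listing 3's conditions hold."""
    for x, y in product(range(-10, 11), repeat=2):
        if x * x + y * y > 3 and x * x * x + y < 5:
            return x, y
    return None

print(solve())   # first hit in iteration order: (-10, -10)
```

Brute force works here only because the search space is tiny; with 64-bit integers or strings, enumeration becomes hopeless, which is exactly where symbolic constraint solvers earn their keep.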

Constraint solvers are powerful, but not omnipotent. If the task is to recover the plaintext from an encrypted text, even a constraint solver usually can do no better than guess the key. Moreover, constraint solvers are slow. Grammar- or evolution-based fuzzers can try thousands of inputs in the time a solver needs for a single query and – depending on the complexity of the constraints – may reach the goal faster through sheer lucky guessing.

KLEE [6], a very popular constraint-based fuzzer, has found hundreds of program errors in C programs. Microsoft uses its own constraint-based fuzzer named SAGE [7], which uses grammars for Office documents to systematically search for bugs in office software. It has saved Microsoft and its users hundreds of millions of dollars of damage. NASA also relies on constraint-based fuzzers to fully test any situation a spacecraft might encounter [8].

Outlook

Modern fuzzers have long since stopped relying on just one technique; instead, they combine techniques pragmatically. In the end, the right choice is whichever fuzzer finds the worst errors the fastest. For example, it makes sense to use simple random strings to look for errors in input processing first, before a grammar or mutated test inputs target errors at deeper levels.

Grammars, evolution, and constraints can be combined in a variety of ways. Which combination is most successful under which conditions is the subject of heated debate among fuzzing researchers and developers. The user is less interested in which variant of which tool is used – the most important thing is to get started with fuzzing in the first place. Software that has never been exposed to a fuzzer is very likely to have errors that can be triggered by random input.

For C programs, fuzzers such as AFL or KLEE promise quick results, and a grammar for simple input formats can be written in next to no time. Fuzzing matters to developers and attackers alike: If you don't fuzz your own software, you risk others doing so and exploiting the bugs they find. Once you have set up a fuzzer, however, you can sit back and relax: From here on, the fuzzer takes over, tirelessly testing with new input until the red light comes on, indicating that a bug has been found. Then troubleshooting and debugging begin – how to automate those is another story.

Talking Fuzz Testing with Bart Miller

By Jens-Christoph Brendel

We asked fuzzing pioneer Bart Miller for his thoughts on fuzz testing and the role it plays in software development.

Linux Magazine: Are some applications more suitable than others for fuzz testing?

It's not a question of suitability. It's a question of identifying the interface to the application and figuring out how best to test against that interface. For example, some programs are very simple and take their input from a file or keyboard ("standard input"). Generating input for such a program is pretty simple. Other programs might use the mouse and keyboard, which means their input comes from the window system (usually X Window in Unix/Linux, Win32/Win64 in Microsoft Windows, or Aqua in macOS). This type of input requires the test program to identify the windows and their sizes and then generate proper window events (mouse movements, key up, and key down events). A phone app has input based on touches and gestures, so you have to be able to randomly generate these events.

LM: What if one uses fuzz tests for security purposes? An attacker uses sophisticated techniques to overcome a victim's security measures. How likely is it that a random input string will match such a precisely constructed attack?

It's the other way around. An attacker often uses fuzz testing to see if they can crash a program. This means that they can cause the program to enter a state that the programmer didn't intend. In the intelligence community, they call this "owning the bits" … i.e., owning bits in the program state that you should not be able to own (for example, overflowing a buffer or causing a number to be negative when it should always be non-negative, such as an array subscript).

Once an attacker (or analyst) causes a crash, they know that the program is not checking everything that it should check. So, now the task is to craft your input in such a way as to exploit this reduced checking.

LM: Does a fuzz test possibly reveal rather simple errors? If the application crashes due to a gross violation of its expectations, might errors remain undetected that only occur at the end of a chain?

The random nature of fuzz testing means that you are randomly walking through the state space of the program. Pure random input ("unstructured" fuzz testing) is likely to trigger errors early in processing and not drive execution deep into the program's logic. However, it can and sometimes does find more complex logic errors.

You can add structure to your input: for example, properly structured window events or properly structured network packets. These will have random contents but a valid structure, which allows testing of deeper paths in the code.

The key to any test regime is "coverage" – in other words, how much of the program's code have you actually tested? Many modern fuzz testers, like AFL, libFuzzer, or honggfuzz, are "coverage-guided" fuzz testers. They track which blocks of code were executed for each input and then try to generate new inputs that will test other parts of the program's code.

LM: How confidently can one infer from a discovered bug that a class of bugs might be hiding behind it?

Certainly a human analyst, when they find a specific type of bug, should look elsewhere in the code for the same class of bug. I haven't seen any good way to automate this idea.

LM: Should fuzz testing always go hand in hand with debugging of detected errors to uncover the exact mechanism leading to the error?

This is a good question and not just for fuzz testing. When you find erroneous behavior in your program, you have not found the bug until you can identify the specific line(s) of code that caused the behavior and explain why the behavior happened. If you cannot do this, then you are just guessing about the fix. It might appear that the bug went away, but you have likely just disguised it or moved it. No good programmer will be happy until they have found the cause.

The Author

Andreas Zeller is a senior researcher at the CISPA Helmholtz Center for Information Systems and professor of software engineering at Saarland University. He has been honored with numerous awards for his work on program analysis and debugging. His textbooks The Fuzzing Book and The Debugging Book teach the basic techniques of automatic testing and automatic debugging with code examples that can be executed directly in the browser.
