Hashes, salt, and pepper

Cracked

MD5 has been cracked: In 2005, Wang and Yu [2] presented a process in which as few as two appropriately designed 128-bit blocks in the middle of a message are sufficient to make the status vector the same again after the two blocks [3]. Two files created this way may then differ in these 128 bits, but they have the same MD5 hash.

Although 128 bits doesn't sound like a lot, it is enough to prepare an attack. If attackers know which program their victim wants to use, they can incorporate a piece of code that checks which 128-bit block is in the middle. In theory, the query looks like this:

if (128-bit block == good version) then do_something_good() else do_something_evil();

The only thing left to do now is to calculate the 128-bit block correctly. Tools such as Evilize [4] are of help here. Anyone who takes a look at the hello_erase.c sample program on the project page will recognize the structure above. The selection of the desired function is defined with goodevil.c. If the memory block has the content for the good process, then the "good" program version should start, otherwise the "evil" one will. The file crib.h defines the memory area using a character string that the Evilize program later replaces with two appropriate 128-bit blocks, which lead to the same MD5 hash.

Anyone wanting to try this technique out themselves should extract the Evilize archive and compile the hello_erase.c source file:

gcc hello-erase.c goodevil.o -o hello-erase

The following call then generates two programs, good and evil, with the same MD5 hash:

./evilize hello-erase -g good -e evil

A check using diff and md5sum shows that the two files are different, but that the hash values are the same (Figure 3).

Figure 3: Two different files with the same MD5 hash value are created by cleverly manipulating 128 bits.

Passwords

An attack pattern similar to the one described in the previous section is conceivable for cryptographic signatures provided with PostScript, PDF, Word documents, TIFF images, and Flash movies [5]. At the Chaos Communication Congress in 2008, some participants demonstrated how easy it is to forge certificates and signatures using this technology [6].

If it's that easy to have the same MD5 hash for two different files, are passwords on a Linux computer really still safe? It depends on how easy it is to find a second password that has the same hash as another. Fortunately, that's not particularly likely because most passwords are likely to have fewer than 16 characters (128 bits), which is the minimum number required for an attack.

Moreover, the fairly simple MD5 algorithm takes up quite a bit of computing time. Even if an attacker calculates the MD5 hash in advance from all the possible passwords, it wouldn't be very practical. With eight-digit passwords consisting of uppercase and lowercase letters and numbers (no punctuation and special characters; i.e., passwords 8 bytes long), there are as many as 62^8 (2 x 10^14) possible combinations; such a list would require about 5 petabytes of memory space.

The approach is therefore not very practical and is only worthwhile for cracking multiple passwords. Brute force attacks are more efficient for one-off jobs.

Behind the Rainbow

One way to reduce memory requirements for a pure MD5 list and also to shorten computing time is to use rainbow tables. The rainbow table method uses a reduction function to generate a new password from a hash. Listing 2 shows an example implemented in Perl. The reduce function focuses on the printable ASCII characters ranging between 32 and 127. md5_hex calculates a hash value from the password obtained from this reduction function; the reduction function then generates a new password, and so on. Depending on the implementation, a chain of passwords and their hashes is created, and it is possible to restore the original password at any point. Figure 4 shows the Perl script output with Linux as the starting password.

Listing 2

Perl Script hashing.pl

01 #!/usr/bin/perl -w
02 use Digest::MD5 qw(md5 md5_hex md5_base64);
03 sub reduce
04 { my $what = shift;
05   for ($res='', $i=0; $i<8; $i++)
06     { $res.=chr(hex(substr($what,$i*2,2)) % 96 + 32)
07     }
08   return $res;
09 }
10
11 $password = "Linux";
12
13 for ($count=0; $count < 10; $count++)
14   {
15     print $password." -> ";
16     $hash = md5_hex($password);
17     $password = reduce($hash);
18     print $hash." -> ";
19   }
20 print "\n";
Figure 4: The Perl script in Listing 2 generates a rainbow table with a chain of passwords and their hashes from the password Linux.

This chain may be of any length; but it's not a good idea for it to be too long, because a long chain increases search time and memory requirements. Common implementations reduce memory hunger by storing just the first and the last password from a chain. Otherwise, the memory requirements would be just as big as with a full list of passwords and their hashes.

The process has two catches: First, there is no guarantee you will actually get hold of all the possible passwords, and second, it could lead to collisions and to the same password eventually appearing in several lists. From that point on, two tables would be identical, which massively reduces the efficiency and wastes memory.

In a paper published at the DFRWS conference in Montreal in 2009, Thing and Ying [7] presented a modification of the rainbow tables attack that reduces the risk of collision and improves speed.

Attackers wanting to calculate a password from a hash apply the reduction function to the hash value. They then receive a chain of hashes and passwords to compare with the rainbow tables. If a row ends in a password generated this way, it is clear that the sought-after hash is also in the row. The attacker then constructs the row, again starting with the first password, and compares each hash with the sought-after one. Once the values match, the password is to the left of the hash – and the password is cracked.

It is only worth using rainbow tables if attackers want to crack passwords repeatedly. Brute force attacks are more efficient for one-off attacks. Several ready-made rainbow tables [8] [9] that potential intruders can use are on the Internet. The question is how far they'll get with them. If the sought-after password doesn't appear in the list, a rainbow table attack won't find it either. It works particularly badly with pretty long passwords because most tables assume a password length between six and 10 characters.

Modern Linux systems use a salt, which is appended to the password, to make such attacks more difficult. The salt may be in the /etc/shadow file and is therefore not a secret, but the rainbow tables technique fails because a separate table would be needed for every conceivable salt. Instead of a single table, 256 are needed with a salt that's 8 bits long. Linux uses a 12-bit salt; the number of necessary tables thus increases by a factor of 4,096.

Linux uses a separate salt for each password. An attacker therefore needs to design a separate rainbow table for each password-salt combination. It's worth discussing whether to use an individual or a common salt (called pepper).

Pepper is beneficial if, for example, a PHP script writes password hashes to a database. The pepper can then be in the script and the hashes in the database. An attacker reading them via SQL injection will not know the pepper yet, meaning the effort required to create rainbow tables increases significantly.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Security Lessons: Password Storage

    High-performance graphics cards and proper storage can help keep your passwords secure.

  • Secure Online Passwords

    Securely storing passwords online can be a complex task. With a few tools, websites can offer better security, but users still need to choose their passwords wisely.

  • Security Lessons – Hash Maps

    What do all programs have in common? They store data at some point, usually in arrays – everything from commandline options to the input and output. But how is data actually stored by the program? Kurt explains.

  • Investigating Windows Systems

    A forensics expert explains how to extract interesting details from a confiscated Windows hard disk using standard Linux tools.

  • DM-Crypt

    If you’re serious about keeping secrets, try hard disk encryption with DM-Crypt and LUKS.

comments powered by Disqus
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

Learn More

News