m.schmidt am 4. December, 2006
At the weekend, i found the time to have a closer look on md5 and the attacks released for this algorithm. One of these attacks, published by Xiaoyun Wang deals with the problem of finding collisions. This attack is about two so called initial vectors having the same md5 hash. Afterwards it is possible to append arbitrary data to these vectors without changing the fact that the md5 hashes remain identical. (this is an inherited problem of block based (hash-)algorithms. If one has two Blocks with the same hash, after appending the same arbitrary data to these Blocks, they still have the same hash.) In clear:
if md5(x)==md5(y) then md5(x+z)=md5(y+z)
The initial vectors differ in Wangs example only by 6 Bit, but appending more of these blocks results in some byte free for adding payload. A closer mathematical view would exceed the frame of this article, therefore I like to refer (beside the initial paper from the beginning) to the text by Peter Selinger, which deals with the problem in a more detailed way. Here, a small example should show what to do with two initial vectors, named vec1 and vec2. Both are having a identical md5 hash, but are different, as proven by the sha1 hash.
ftp:/md5coll# md5sum vec*; sha1sum vec*
da5c61e1edc0f18337e46418e48c1290 vec1
da5c61e1edc0f18337e46418e48c1290 vec2
8f42c29f6ac45423d2a7dd614d666a26e39f29ee vec1
dfce366c23c88044ad57a5eaa7d5420024a7fd14 vec2
Now lets append some data, and observe that the md5 hashes stay identical.
ftp:/md5coll# dd if=/dev/urandom of=foo bs=32460 count=1
1+0 Datensätze ein
1+0 Datensätze aus
32460 Bytes (32 kB) kopiert, 0,044771 Sekunden, 725 kB/s
ftp:/md5coll# cat foo >> vec1
ftp:/md5coll# cat foo >> vec2
ftp:/md5coll# md5sum vec*; sha1sum vec*
64dbc8e1f2cc1855f09f37528181484b vec1
64dbc8e1f2cc1855f09f37528181484b vec2
b73dde6a98b46c53fd32fb709330dad835a9d116 vec1
e60795346ab6041ff1a315e8ca745c854ffe6ae2 vec2
While the attack is not published completely yet (there are only initial vectors, while it should be possible to have these ‘doppelgangers’ at any point in the data) there are some nice things to do with this.
Fore example a binary that record it’s way through p2p file sharing networks like Kkazaa, collecting Information about the users sharing this binary. Kazaas hashes not the whole file, but does this with 32KB blocks. So you can use one Bit for every block for your own purposes. Talking about a binary of 60MB, this makes 1920 bit of payload, without changing the hash of the file (changing the hash would result in multiple appearance of the file in the p2p network, which would surely be noticed) Now its not necessary for the binary to phone home, its enough to download it from time to time via the p2p network an see what it has collected. (For example MAC addresses, email data or host names.)
Security Tools that check the md5’s of a systems files also have no chance to detect a replacement.
Peter Selinger has published a nice tool that’s able to create a pair of executables with identical Hash, but distinct behaviour. While my computer is still computing my own example, I’ll use Peter’s example files here.
mschmidt@ftp:~$ md5sum erase; md5sum hello; sha1sum erase; sha1sum hello
da5c61e1edc0f18337e46418e48c1290 erase
da5c61e1edc0f18337e46418e48c1290 hello
dfce366c23c88044ad57a5eaa7d5420024a7fd14 erase
8f42c29f6ac45423d2a7dd614d666a26e39f29ee hello
mschmidt@ftp:~$ ./erase This program is evil!!!
Erasing hard drive…1Gb…2Gb… just kidding! Nothing was erased.
(press enter to quit)
mschmidt@ftp:~$ ./hello
Hello, world! (press enter to quit)
It scares me a Bit, that
if md5(binA)==md5(binB) then behaviour(binA)==behaviour(binB)
is no longer supposed to be true. So is this the end of md5? Well, i don’t think so, because the usage of the published attack is rather limited. But it should be clear that the end of md5 has started, and moving to an alternative (sha1) should happen as fast as possible.
Who’s interested in more details shoul have a look at this book, especially chapter 3 and 11 are showing what’s possible with md5 collisions.
Update
Security, Adventskalender | Kein Kommentar »
Tags: md5, hack, security, hash, algorithm