Ross Anderson

Security Engineering


Скачать книгу

upper M 2 right-parenthesis"/>. Unless the opponent can find a shortcut attack (which would mean the function wasn't pseudorandom) then the best way of finding a collision is to collect a large set of messages upper M Subscript i and their corresponding hashes h left-parenthesis upper M Subscript i Baseline right-parenthesis, sort the hashes, and look for a match. If the hash function output is an n-bit number, so that there are 2 Superscript n possible hash values, then the number of hashes the enemy will need to compute before he can expect to find a match will be about the square root of this, namely 2 Superscript n slash 2 hashes. This fact is of huge importance in security engineering, so let's look at it more closely.

      The birthday theorem gets its name from the following problem. A maths teacher asks a class of 30 pupils what they think is the probability that two of them have the same birthday. Most pupils intuitively think it's unlikely, and the maths teacher then asks the pupils to state their birthdays one after another. The odds of a match exceed 50% once 23 pupils have been called. As this surprises most people, it's also known as the ‘birthday paradox’.

      The birthday theorem was first used in the 1930’s to count fish, so it's also known as capture-recapture statistics [1668]. Suppose there are upper N fish in a lake and you catch m of them, ring them and throw them back, then when you first catch a fish you've ringed already, m should be ‘about’ the square root of upper N. The intuitive reason why this holds is that once you have StartRoot upper N EndRoot samples, each could potentially match any of the others, so the number of possible matches is about StartRoot upper N EndRoot x StartRoot upper N EndRoot or upper N, which is what you need3.

      This theorem has many applications for the security engineer. For example, if we have a biometric system that can authenticate a person's claim to identity with a probability of only one in a million that two randomly selected subjects will be falsely identified as the same person, this doesn't mean that we can use it as a reliable means of identification in a university with a user population of twenty thousand staff and students. This is because there will be almost two hundred million possible pairs. In fact, you expect to find the first collision – the first pair of people who can be mistaken for each other by the system – once you have somewhat over a thousand people enrolled. It may well, however, be OK to use it to verify a claimed identity (though many other things can go wrong; see the chapter on Biometrics in Part 2 for a discussion).

      There are some applications where collision-search attacks aren't a problem, such as in challenge-response protocols where an attacker has to find the answer to the challenge just issued, and where you can prevent challenges repeating. In identify-friend-or-foe (IFF) systems, for example, common equipment has a response length of 48 to 80 bits. You can't afford much more than that, as it costs radar accuracy.

      But there are other applications in which collisions are unacceptable. When we design digital signature systems, we typically pass the message upper M through a cryptographic hash function first, and then sign the hash h left-parenthesis upper M right-parenthesis, for a number of reasons we'll discuss later. In such an application, if it were possible to find collisions with h left-parenthesis upper M 1 right-parenthesis equals h left-parenthesis upper M 2 right-parenthesis but upper M 1 not-equals upper M 2, then a Mafia owned bookstore's web site might precalculate suitable pairs upper M 1 comma upper M 2, get you to sign an upper M 1 saying something like “I hereby order a copy of Rubber Fetish volume 7 for $32.95” and then present the signature together with an upper M 2 saying something like “I hereby mortgage my house for $75,000 and please send the funds to Mafia Holdings Inc., Bermuda.”

      For this reason, hash functions used with digital signature schemes have n large enough to make them collision-free. Historically, the two most common hash functions have been MD5, which has a 128-bit output and will thus require at most 2 Superscript 64 computations to break, and SHA1 with a 160-bit output and a work factor for the cryptanalyst of at most 2 Superscript 80. However, collision search gives at best an upper bound on the strength of a hash function, and both these particular functions have turned out to be disappointing, with cryptanalytic attacks that I'll describe later in section 5.6.2.

      To sum up: if you need a cryptographic hash function to be collision resistant, then you'd better choose a function with an output of at least 256 bits, such as SHA-2 or SHA-3. However if you only need to be sure that nobody will find a second preimage for an existing, externally given hash, then you can perhaps make do with less.

      5.3.2 Random generators – stream ciphers

      It can be used to protect the confidentiality of our backup data as follows: we go to the keystream generator, enter a key, get a long file of random bits, and exclusive-or it with our plaintext data to get ciphertext, which we then send to our backup service in the cloud. (This is also called an additive stream cipher as exclusive-or is addition modulo 2.) We can think of the elf generating a random tape of the required length each time he is presented with a new key, giving it to us and keeping a copy on his scroll for reference in case he's given the same input key again. If we need to recover the data, we go back to the generator, enter the same key, get the same keystream, and exclusive-or it with our ciphertext to get our plaintext back again. Other people with access to the keystream generator won't be able to generate the same keystream unless they know the key. Note that this would not give us any guarantee of file integrity; as we saw in the discussion of