alt="epsilon"/>.
The third model, which many theoreticians now call the standard model, is about indistinguishability. This enables us to reason about the specific properties of a cipher we care about. For example, most cipher systems don't hide the length of a message, so we can't define a cipher to be secure by just requiring that an adversary not be able to distinguish ciphertexts corresponding to two messages; we have to be more explicit and require that the adversary not be able to distinguish between two messages and
of the same length. This is formalised by having the cryptographer and the cryptanalyst play a game in which the analyst wins by finding an efficient discriminator of something she shouldn't be able to discriminate with more than negligible probability. If the cipher doesn't have perfect security this can be asymptotic, where we typically want the effort to grow faster than any polynomial function of a security parameter
– say the length of the key in bits. A security proof typically consists of a reduction where we show that if there exists a randomised (i.e., probabilistic) algorithm running in time polynomial in
that learns information it shouldn't with non-negligible probability, then this would give an efficient discriminator for an underlying cryptographic primitive that we already trust. Finally, a construction is said to have semantic security if there's no efficient distinguisher for the plaintext regardless of any side information the analyst may have about it; even if she knows all but one bit of it, and even if she can get a decryption of any other ciphertext, she can't learn anything more from the target ciphertext. This skips over quite a few mathematical details, which you can find in a standard text such as Katz and Lindell [1025].
The fourth model is the random oracle model, which is not as general as the standard model but which often leads to more efficient constructions. We call a cryptographic primitive pseudorandom if there's no efficient way of distinguishing it from a random function of that type, and in particular it passes all the statistical and other randomness tests we apply. Of course, the cryptographic primitive will actually be an algorithm, implemented as an array of gates in hardware or a program in software; but the outputs should “look random” in that they're indistinguishable from a suitable random oracle given the type and the number of tests that our model of computation permits.
To visualise a random oracle, we might imagine an elf sitting in a black box with a source of physical randomness and some means of storage (see Figure 5.9) – represented in our picture by the dice and the scroll. The elf will accept inputs of a certain type, then look in the scroll to see whether this query has ever been answered before. If so, it will give the answer it finds there; if not, it will generate an answer at random by throwing the dice, and keep a record for future reference. We'll further assume finite bandwidth – the elf will only answer so many queries every second. What's more, our oracle can operate according to several different rules.
Figure 5.9: The random oracle
5.3.1 Random functions – hash functions
The first type of random oracle is the random function. A random function accepts an input string of any length and outputs a string of fixed length, say bits long. The same input gives the same output, but the set of outputs appears random. So the elf just has a simple list of inputs and outputs, which grows steadily as it works.
Random functions are our model for cryptographic hash functions. These were first used in computer systems for one-way encryption of passwords in the 1960s and have many more uses today. For example, if the police seize your laptop, the standard forensic tools will compute checksums on all the files, to identify which files are already known (such as system files) and which are novel (such as user data). These hash values will change if a file is corrupted and so can assure the court that the police haven't tampered with evidence. And if we want evidence that we possessed a given electronic document by a certain date, we might submit it to an online time-stamping service or have it mined into the Bitcoin blockchain. However, if the document is still secret – for example an invention for which we want to establish a priority date – then we would not upload the whole document, but just the message hash. This is the modern equivalent of Hooke's anagram that we discussed in section 5.2.4 above.
5.3.1.1 Properties
The first main property of a random function is one-wayness. Given knowledge of an input we can easily compute the hash value
, but it is very difficult given
to find
if such an input is not already known. (The elf will only pick outputs for given inputs, not the other way round.) As the output is random, the best an attacker can do to invert a random function is to keep on feeding in more inputs until he gets lucky; with an
-bit output this will take about
guesses on average. A pseudorandom function will have the same properties, or they could be used to distinguish it from a random function, contrary to our definition. So a pseudorandom function will also be a one-way function, provided there are too many possible outputs for the opponent to guess an input that has a desired target output by chance. This means choosing
so that the opponent can't do anything near
computations. If we claim, for example, that SHA256 is a pseudorandom function, then we're saying that there's no practical way to find an input that hashes to a given 256-bit value, unless you knew it already and used it to compute that value.
A second property of pseudorandom functions is that the output will not give any information at all about even part of the input. So we can get a one-way encryption of the value by concatenating it with a secret key
and computing
. If the hash function isn't random enough, though, using it for one-way encryption in this manner is asking for trouble. (I'll discuss an example later in section 22.3.1: the hash function used by many phone companies in the 1990s and early 2000s to authenticate mobile phone users wasn't random enough, which led to attacks.)
A third property of pseudorandom functions with sufficiently long outputs is that it is hard to find collisions, that is, different messages with