chp 4.pdf

PART 3: CRYPTOGRAPHIC DATA INTEGRITY ALGORITHMS CHAPTER CRYPTOGRAPHIC HASH FUNCTIONS 11.1 Applications of Cryptographic Hash Functions Message Authentication Digital Signatures Other Applications 11.2 Two Simple Hash Functions 11.3 Requirements and Security Security Requirements for Cryptographic Hash Functions Brute-Force Attacks Cryptanalysis 11.4 Hash Functions Based on Cipher Block Chaining 11.5 Secure Hash Algorithm (SHA) SHA-512 Logic SHA-512 Round Function Example 11.6 SHA-3 11.7 Recommended Reading and Web Sites 11.8 Key Terms, Review Questions, and Problems Appendix 11A Mathematical Basis of the Birthday Attack 327 328 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS Each of the messages, like each one he had ever read of Stern’s commands, began with a number and ended with a number or row of numbers. No efforts on the part of Mungo or any of his experts had been able to break Stern’s code, nor was there any clue as to what the preliminary number and those ultimate numbers signified. —Talking to Strange Men, Ruth Rendell The Douglas Squirrel has a distinctive eating habit. It usually eats pine cones from the bottom end up. Partially eaten cones can indicate the presence of these squirrels if they have been attacked from the bottom first. If, instead, the cone has been eaten from the top end down, it is more likely to have been a crossbill finch that has been doing the dining. —Squirrels: A Wildlife Handbook, Kim Long KEY POINTS ◆ A hash function maps a variable-length message into a fixed-length hash value, or message digest. ◆ Virtually all cryptographic hash functions involve the iterative use of a compression function. ◆ The compression function used in secure hash algorithms falls into one of two categories: a function specifically designed for the hash function or an algorithm based on a symmetric block cipher. SHA and Whirlpool are examples of these two approaches, respectively. A hash function H accepts a variable-length block of data M as input and produces a fixed-size hash value h = H(M). A “good” hash function has the property that the results of applying the function to a large set of inputs will produce outputs that are evenly distributed and apparently random. In general terms, the principal object of a hash function is data integrity. A change to any bit or bits in M results, with high probability, in a change to the hash code. The kind of hash function needed for security applications is referred to as a cryptographic hash function. A cryptographic hash function is an algorithm for which it is computationally infeasible (because no attack is significantly more efficient than brute force) to find either (a) a data object that maps to a pre-specified hash result (the one-way property) or (b) two data objects that map to the same hash result (the collision-free property). Because of these characteristics, hash functions are often used to determine whether or not data has changed. Figure 11.1 depicts the general operation of a cryptographic hash function. Typically, the input is padded out to an integer multiple of some fixed length (e.g., 1024 bits), and the padding includes the value of the length of the original message in bits.The length field is a security measure to increase the difficulty for an attacker to produce an alternative message with the same hash value. 11.1 / APPLICATIONS OF CRYPTOGRAPHIC HASH FUNCTIONS 329 L bits Message or data block M (variable length) L H Hash value h (fixed length) Figure 11.1 Black Diagram of Cryptographic Hash Function; h = H(M) This chapter begins with a discussion of the wide variety of applications for cryptographic hash functions. Next, we look at the security requirements for such functions. Then we look at the use of cipher block chaining to implement a crypto- graphic hash function. The remainder of the chapter is devoted to the most impor- tant and widely used family of cryptographic hash functions, the Secure Hash Algorithm (SHA) family. Appendix N describes Whirlpool, another popular cryptographic hash function. 11.1 APPLICATIONS OF CRYPTOGRAPHIC HASH FUNCTIONS Perhaps the most versatile cryptographic algorithm is the cryptographic hash function. It is used in a wide variety of security applications and Internet protocols. To better understand some of the requirements and security implications for cryptographic hash functions, it is useful to look at the range of applications in which it is employed. Message Authentication Message authentication is a mechanism or service used to verify the integrity of a message. Message authentication assures that data received are exactly as sent (i.e., contain no modification, insertion, deletion, or replay). In many cases, there is a requirement that the authentication mechanism assures that purported identity of the sender is valid. When a hash function is used to provide message authenti- cation, the hash function value is often referred to as a message digest. Figure 11.2 illustrates a variety of ways in which a hash code can be used to provide message authentication, as follows. 330 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS Source A Destination B H M || E D M Compare K K E(K, [M || H(M)]) H H(M) (a) H M || M K K Compare H E D (b) E(K, H(M)) M || M || H S Compare S || H H(M || S) (c) M || E D M || H S Compare K K E(K, [M || H(M || S)]) S || H (d) H(M || S) Figure 11.2 Simplified Examples of the Use of a Hash Function for Message Authentication a. The message plus concatenated hash code is encrypted using symmetric encryption. Because only A and B share the secret key, the message must have come from A and has not been altered. The hash code provides the structure or redundancy required to achieve authentication. Because encryption is applied to the entire message plus hash code, confidentiality is also provided. b. Only the hash code is encrypted, using symmetric encryption. This reduces the processing burden for those applications that do not require confidentiality. c. It is possible to use a hash function but no encryption for message authentication. The technique assumes that the two communicating parties share a common secret value S.A computes the hash value over the concatenation of M and S and 11.1 / APPLICATIONS OF CRYPTOGRAPHIC HASH FUNCTIONS 331 appends the resulting hash value to M. Because B possesses S, it can recompute the hash value to verify. Because the secret value itself is not sent, an opponent cannot modify an intercepted message and cannot generate a false message. d. Confidentiality can be added to the approach of method (c) by encrypting the entire message plus the hash code. When confidentiality is not required, method (b) has an advantage over methods (a) and (d), which encrypts the entire message, in that less computation is required. Nevertheless, there has been growing interest in techniques that avoid encryption (Figure 11.2c). Several reasons for this interest are pointed out in [TSUD92]. Encryption software is relatively slow. Even though the amount of data to be encrypted per message is small, there may be a steady stream of messages into and out of a system. Encryption hardware costs are not negligible. Low-cost chip implementations of DES are available, but the cost adds up if all nodes in a network must have this capability. Encryption hardware is optimized toward large data sizes. For small blocks of data, a high proportion of the time is spent in initialization/invocation overhead. Encryption algorithms may be covered by patents, and there is a cost associated with licensing their use. More commonly, message authentication is achieved using a message authentica- tion code (MAC), also known as a keyed hash function. Typically, MACs are used between two parties that share a secret key to authenticate information exchanged between those parties. A MAC function takes as input a secret key and a data block and produces a hash value, referred to as the MAC. This can then be transmitted with or stored with the protected message. If the integrity of the message needs to be checked, the MAC function can be applied to the message and the result compared with the stored MAC value. An attacker who alters the message will be unable to alter the MAC value without knowledge of the secret key. Note that the verifying party also knows who the sending party is because no one else knows the secret key. Note that the combination of hashing and encryption results in an overall function that is, in fact, a MAC (Figure 11.2b). That is, E(K, H(M)) is a function of a variable-length message M and a secret key K, and it produces a fixed-size output that is secure against an opponent who does not know the secret key. In practice, specific MAC algorithms are designed that are generally more efficient than an encryption algorithm. We discuss MACs in Chapter 12. Digital Signatures Another important application, which is similar to the message authentication application, is the digital signature. The operation of the digital signature is similar to that of the MAC. In the case of the digital signature, the hash value of a message is encrypted with a user’s private key. Anyone who knows the user’s public key can verify the integrity of the message that is associated with the digital signature. In this 332 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS case, an attacker who wishes to alter the message would need to know the user’s private key. As we shall see in Chapter 14, the implications of digital signatures go beyond just message authentication. Figure 11.3 illustrates, in a simplified fashion, how a hash code is used to provide a digital signature. a. The hash code is encrypted, using public-key encryption with the sender’s pri- vate key. As with Figure 11.2b, this provides authentication. It also provides a digital signature, because only the sender could have produced the encrypted hash code. In fact, this is the essence of the digital signature technique. b. If confidentiality as well as a digital signature is desired, then the message plus the private-key-encrypted hash code can be encrypted using a symmetric secret key. This is a common technique. Other Applications Hash functions are commonly used to create a one-way password file. Chapter 20 explains a scheme in which a hash of a password is stored by an operating system rather than the password itself. Thus, the actual password is not retrievable by a hacker who gains access to the password file. In simple terms, when a user enters a password, the hash of that password is compared to the stored hash value for verification. This approach to password protection is used by most operating systems. Hash functions can be used for intrusion detection and virus detection. Store H(F) for each file on a system and secure the hash values (e.g., on a CD-R that is Source A Destination B H M || M PRa PUa Compare H E D (a) E(PRa, H(M)) H M || E D M PRa PUa Compare K K E(K, [M || E(PRa, H(M))]) H E D E(PRa, H(M)) (b) Figure 11.3 Simplified Examples of Digital Signatures 11.2 / TWO SIMPLE HASH FUNCTIONS 333 kept secure). One can later determine if a file has been modified by recomputing H(F). An intruder would need to change F without changing H(F). A cryptographic hash function can be used to construct a pseudorandom func- tion (PRF) or a pseudorandom number generator (PRNG). A common application for a hash-based PRF is for the generation of symmetric keys. We discuss this appli- cation in Chapter 12. 11.2 TWO SIMPLE HASH FUNCTIONS To get some feel for the security considerations involved in cryptographic hash functions, we present two simple, insecure hash functions in this section. All hash functions operate using the following general principles. The input (message, file, etc.) is viewed as a sequence of n-bit blocks. The input is processed one block at a time in an iterative fashion to produce an n-bit hash function. One of the simplest hash functions is the bit-by-bit exclusive-OR (XOR) of every block. This can be expressed as Ci = bi1 䊝 bi2 䊝 Á 䊝 bim where Ci = ith bit of the hash code, 1 … i … n m = number of n-bit blocks in the input bij = ith bit in jth block 䊝 = XOR operation This operation produces a simple parity for each bit position and is known as a longitudinal redundancy check. It is reasonably effective for random data as a data integrity check. Each n-bit hash value is equally likely. Thus, the probability that a data error will result in an unchanged hash value is 2-n. With more predictably formatted data, the function is less effective. For example, in most normal text files, the high-order bit of each octet is always zero. So if a 128-bit hash value is used, instead of an effectiveness of 2-128, the hash function on this type of data has an effectiveness of 2-112. A simple way to improve matters is to perform a one-bit circular shift, or rota- tion, on the hash value after each block is processed. The procedure can be summa- rized as follows. 1. Initially set the n-bit hash value to zero. 2. Process each successive n-bit block of data as follows: a. Rotate the current hash value to the left by one bit. b. XOR the block into the hash value. This has the effect of “randomizing” the input more completely and overcoming any regularities that appear in the input. Figure 11.4 illustrates these two types of hash functions for 16-bit hash values. Although the second procedure provides a good measure of data integrity, it is virtually useless for data security when an encrypted hash code is used with a plaintext 334 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS 16 bits XOR with 1-bit rotation to the right XOR of every 16-bit block Figure 11.4 Two Simple Hash Functions message, as in Figures 11.2b and 11.3a. Given a message, it is an easy matter to produce a new message that yields that hash code: Simply prepare the desired alternate message and then append an n-bit block that forces the new message plus block to yield the desired hash code. Although a simple XOR or rotated XOR (RXOR) is insufficient if only the hash code is encrypted, you may still feel that such a simple function could be useful when the message together with the hash code is encrypted (Figure 11.2a). But you must be careful. A technique originally proposed by the National Bureau of Standards used the simple XOR applied to 64-bit blocks of the message and then an encryption of the entire message that used the cipher block chaining (CBC) mode. We can define the scheme as follows: Given a message M consisting of a sequence of 64-bit blocks X1, X2, Á , XN, define the hash code h = H(M) as the block-by-block XOR of all blocks and append the hash code as the final block: h = XN+1 = X1 䊝 X2 䊝 Á 䊝 XN 11.3 / REQUIREMENTS AND SECURITY 335 Next, encrypt the entire message plus hash code using CBC mode to produce the encrypted message Y1, Y2, Á , YN+1. [JUEN85] points out several ways in which the ciphertext of this message can be manipulated in such a way that it is not detectable by the hash code. For example, by the definition of CBC (Figure 6.4), we have X1 = IV 䊝 D(K, Y1) Xi = Yi - 1 䊝 D(K, Yi) XN+1 = YN 䊝 D(K, YN+1) But XN+1 is the hash code: XN+1 = X1 䊝 X2 䊝 Á 䊝 XN = IV 䊝 D(K, Y1)] 䊝 [Y1 䊝 D(K, Y2)] 䊝 Á 䊝 [YN - 1 䊝 D(K, YN)] Because the terms in the preceding equation can be XORed in any order, it follows that the hash code would not change if the ciphertext blocks were permuted. 11.3 REQUIREMENTS AND SECURITY Before proceeding, we need to define two terms. For a hash value h = H(x), we say that x is the preimage of h. That is, x is a data block whose hash function, using the function H, is h. Because H is a many-to-one mapping, for any given hash value h, there will in general be multiple preimages. A collision occurs if we have x Z y and H(x) = H(y). Because we are using hash functions for data integrity, collisions are clearly undesirable. Let us consider how many preimages are there for a given hash value, which is a measure of the number of potential collisions for a given hash value. Suppose the length of the hash code is n bits, and the function H takes as input messages or data blocks of length b bits with b 7 n. Then, the total number of possible messages is 2b and the total number of possible hash values is 2n. On average, each hash value corresponds to 2b/n preimages. If H tends to uniformly distribute hash values then, in fact, each hash value will have close to 2b/n preimages. If we now allow inputs of arbitrary length, not just a fixed length of some number of bits, then the number of preimages per hash value is arbitrarily large. However, the security risks in the use of a hash function are not as severe as they might appear from this analysis. To understand better the security implications of cryptographic hash functions, we need precisely define their security requirements. Security Requirements for Cryptographic Hash Functions Table 11.1 lists the generally accepted requirements for a cryptographic hash function. The first three properties are requirements for the practical application of a hash function. The fourth property, preimage resistant, is the one-way property: it is easy to generate a code given a message, but virtually impossible to generate a message given a code. This property is important if the authentication technique involves the use of a secret value (Figure 11.2c). The secret value itself is not sent. However, if the 336 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS Table 11.1 Requirements for a Cryptographic Hash Function H Requirement Description Variable input size H can be applied to a block of data of any size. Fixed output size H produces a fixed-length output. Efficiency H(x) is relatively easy to compute for any given x, making both hardware and software implementa- tions practical. Preimage resistant (one-way property) For any given hash value h, it is computationally infeasible to find y such that H(y) = h. Second preimage resistant (weak collision For any given block x, it is computationally resistant) infeasible to find y Z x with H(y) = H(x). Collision resistant (strong collision resistant) It is computationally infeasible to find any pair (x, y) such that H(x) = H(y). Pseudorandomness Output of H meets standard tests for pseudorandomness. hash function is not one way, an attacker can easily discover the secret value: If the attacker can observe or intercept a transmission, the attacker obtains the message M, and the hash code h = H(S 7 M). The attacker then inverts the hash function to obtain S 7 M = H-1(MDM). Because the attacker now has both M and SAB 7 M, it is a trivial matter to recover SAB. The fifth property, second preimage resistant, guarantees that it is impossible to find an alternative message with the same hash value as a given message. This pre- vents forgery when an encrypted hash code is used (Figure 11.2b and Figure 11.3a). If this property were not true, an attacker would be capable of the following sequence: First, observe or intercept a message plus its encrypted hash code; second, generate an unencrypted hash code from the message; third, generate an alternate message with the same hash code. A hash function that satisfies the first five properties in Table 11.1 is referred to as a weak hash function. If the sixth property, collision resistant, is also satisfied, then it is referred to as a strong hash function. A strong hash function protects against an attack in which one party generates a message for another party to sign. For example, suppose Bob writes an IOU message, sends it to Alice, and she signs it. Bob finds two messages with the same hash, one of which requires Alice to pay a small amount and one that requires a large payment. Alice signs the first message, and Bob is then able to claim that the second message is authentic. Figure 11.5 shows the relationships among the three resistant properties. A function that is collision resistant is also second preimage resistant, but the reverse is not necessarily true. A function can be collision resistant but not preimage resistant and vice versa. A function can be collision resistant but not second preimage resistant and vice versa. See [MENE97] for a discussion. Table 11.2 shows the resistant properties required for various hash function applications. 11.3 / REQUIREMENTS AND SECURITY 337 Second preimage resistant Preimage Collision resistant resistant Figure 11.5 Relationship Among Hash Function Properties The final requirement in Table 11.1, pseudorandomness, has not traditionally been listed as a requirement of cryptographic hash functions but is more or less implied. [JOHN05] points out that cryptographic hash functions are commonly used for key derivation and pseudorandom number generation, and that in message integrity applications, the three resistant properties depend on the output of the hash function appearing to be random. Thus, it makes sense to verify that in fact a given hash function produces pseudorandom output. Brute-Force Attacks As with encryption algorithms, there are two categories of attacks on hash functions: brute-force attacks and cryptanalysis. A brute-force attack does not depend on the specific algorithm but depends only on bit length. In the case of a hash function, a brute-force attack depends only on the bit length of the hash value. A cryptanalysis, in contrast, is an attack based on weaknesses in a particular cryptographic algorithm. We look first at brute-force attacks. Table 11.2 Hash Function Resistance Properties Required for Various Data Integrity Applications Preimage Resistant Second Preimage Collision Resistant Resistant Hash + digital signature yes yes yes* Intrusion detection and virus yes detection Hash + symmetric encryption One-way password file yes MAC yes yes yes* * Resistance required if attacker is able to mount a chosen message attack 338 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS PREIMAGE AND SECOND PREIMAGE ATTACKS For a preimage or second preimage attack, an adversary wishes to find a value y such that H(y) is equal to a given hash value h. The brute-force method is to pick values of y at random and try each value until a collision occurs. For an m-bit hash value, the level of effort is proportional to 2m. Specifically, the adversary would have to try, on average, 2m - 1 values of y to find one that generates a given hash value h. This result is derived in Appendix 11A [Equation (11.1)]. C OLLISION R ESISTANT ATTACKS For a collision resistant attack, an adversary wishes to find two messages or data blocks, x and y , that yield the same hash function: H(x) = H(y). This turns out to require considerably less effort than a preimage or second preimage attack. The effort required is explained by a mathematical result referred to as the birthday paradox. In essence, if we choose random variables from a uniform distribution in the range 0 through N - 1, then the probability that a repeated element is encountered exceeds 0.5 after 1N choices have been made. Thus, for an m-bit hash value, if we pick data blocks at random, we can expect to find two data blocks with the same hash value within 22m = 2m/2 attempts. The mathematical derivation of this result is found in Appendix 11A. Yuval proposed the following strategy to exploit the birthday paradox in a collision resistant attack [YUVA79]. 1. The source, A, is prepared to sign a legitimate message x by appending the appropriate m-bit hash code and encrypting that hash code with A’s private key (Figure 11.3a). 2. The opponent generates 2m/2 variations x¿ of x, all of which convey essentially the same meaning, and stores the messages and their hash values. 3. The opponent prepares a fraudulent message y for which A’s signature is desired. 4. The opponent generates minor variations y¿ of y, all of which convey essentially the same meaning. For each y¿ , the opponent computes H(y¿), checks for matches with any of the H(x¿) values, and continues until a match is found. That is, the process continues until a y¿ is generated with a hash value equal to the hash value of one of the x¿ values. 5. The opponent offers the valid variation to A for signature. This signature can then be attached to the fraudulent variation for transmission to the intended recipient. Because the two variations have the same hash code, they will produce the same signature; the opponent is assured of success even though the encryption key is not known. Thus, if a 64-bit hash code is used, the level of effort required is only on the order of 232 [see Appendix 11A, Equation (11.7)]. The generation of many variations that convey the same meaning is not difficult. For example, the opponent could insert a number of “space-space-backspace” charac- ter pairs between words throughout the document. Variations could then be gener- ated by substituting “space-backspace-space” in selected instances. Alternatively, the 11.3 / REQUIREMENTS AND SECURITY 339 Figure 11.6 A Letter in 237 Variation [DAVI89] opponent could simply reword the message but retain the meaning. Figure 11.6 [DAVI89] provides an example. To summarize, for a hash code of length m, the level of effort required, as we have seen, is proportional to the following. Preimage resistant 2m Second preimage resistant 2m Collision resistant 2m/2 340 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS If collision resistance is required (and this is desirable for a general-purpose secure hash code), then the value 2m/2 determines the strength of the hash code against brute-force attacks. Van Oorschot and Wiener [VANO94] presented a design for a $10 million collision search machine for MD5, which has a 128-bit hash length, that could find a collision in 24 days. Thus, a 128-bit code may be viewed as inadequate. The next step up, if a hash code is treated as a sequence of 32 bits, is a 160-bit hash length. With a hash length of 160 bits, the same search machine would require over four thousand years to find a collision. With today’s technology, the time would be much shorter, so that 160 bits now appears suspect. Cryptanalysis As with encryption algorithms, cryptanalytic attacks on hash functions seek to exploit some property of the algorithm to perform some attack other than an exhaustive search. The way to measure the resistance of a hash algorithm to crypt- analysis is to compare its strength to the effort required for a brute-force attack. That is, an ideal hash algorithm will require a cryptanalytic effort greater than or equal to the brute-force effort. In recent years, there has been considerable effort, and some successes, in devel- oping cryptanalytic attacks on hash functions. To understand these, we need to look at the overall structure of a typical secure hash function, indicated in Figure 11.7. This structure, referred to as an iterated hash function, was proposed by Merkle [MERK79, MERK89] and is the structure of most hash functions in use today, including SHA, which is discussed later in this chapter. The hash function takes an input message and partitions it into L fixed-sized blocks of b bits each. If necessary, the final block is padded to b bits. The final block also includes the value of the total length of the input to the hash function. The inclusion of the length makes the job of the opponent more difficult. Either the opponent must find two messages of equal length that hash to the same value or two messages of differing lengths that, together with their length values, hash to the same value. The hash algorithm involves repeated use of a compression function, f, that takes two inputs (an n-bit input from the previous step, called the chaining variable, and a b-bit block) and produces an n-bit output. At the start of hashing, the chaining variable has an initial value that is specified as part of the algorithm. The final value Y0 Y1 YL1 b b b n IV n f n f n n f CVL CV0 CV1 CVL1 IV Initial value L Number of input blocks CVi Chaining variable n Length of hash code Yi ith input block b Length of input block f Compression algorithm Figure 11.7 General Structure of Secure Hash Code 11.4 / HASH FUNCTIONS BASED ON CIPHER BLOCK CHAINING 341 of the chaining variable is the hash value. Often, b 7 n; hence the term compression. The hash function can be summarized as CV0 = IV = initial n-bit value CVi = f(CVi - 1, Yi - 1) 1 … i … L H(M) = CVL where the input to the hash function is a message M consisting of the blocks Y0, Y1, Á , YL - 1. The motivation for this iterative structure stems from the observation by Merkle [MERK89] and Damgard [DAMG89] that if the compression function is collision resis- tant, then so is the resultant iterated hash function.1 Therefore, the structure can be used to produce a secure hash function to operate on a message of any length. The problem of designing a secure hash function reduces to that of designing a collision-resistant compression function that operates on inputs of some fixed size. Cryptanalysis of hash functions focuses on the internal structure of f and is based on attempts to find efficient techniques for producing collisions for a single execution of f. Once that is done, the attack must take into account the fixed value of IV. The attack on f depends on exploiting its internal structure. Typically, as with symmetric block ciphers, f consists of a series of rounds of processing, so that the attack involves analysis of the pattern of bit changes from round to round. Keep in mind that for any hash function there must exist collisions, because we are mapping a message of length at least equal to twice the block size b (because we must append a length field) into a hash code of length n, where b Ú n. What is required is that it is computationally infeasible to find collisions. The attacks that have been mounted on hash functions are rather complex and beyond our scope here. For the interested reader, [DOBB96] and [BELL97] are recommended. 11.4 HASH FUNCTIONS BASED ON CIPHER BLOCK CHAINING A number of proposals have been made for hash functions based on using a cipher block chaining technique, but without using the secret key. One of the first such proposals was that of Rabin [RABI78]. Divide a message M into fixed-size blocks M1, M2, Á , MN and use a symmetric encryption system such as DES to compute the hash code G as H0 = initial value Hi = E(Mi, Hi - 1) G = HN This is similar to the CBC technique, but in this case, there is no secret key.As with any hash code, this scheme is subject to the birthday attack, and if the encryption algorithm is DES and only a 64-bit hash code is produced, then the system is vulnerable. 1 The converse is not necessarily true. 342 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS Furthermore, another version of the birthday attack can be used even if the opponent has access to only one message and its valid signature and cannot obtain multiple signings. Here is the scenario: We assume that the opponent intercepts a message with a signature in the form of an encrypted hash code and that the unencrypted hash code is m bits long. 1. Use the algorithm defined at the beginning of this subsection to calculate the unencrypted hash code G. 2. Construct any desired message in the form Q1, Q2, Á , QN - 2. 3. Compute Hi = E(Qi, Hi - 1) for 1 … i … (N - 2). 4. Generate 2m/2 random blocks; for each block X, compute E(X, HN - 2). Generate an additional 2m/2 random blocks; for each block Y, compute D(Y, G), where D is the decryption function corresponding to E. 5. Based on the birthday paradox, with high probability there will be an X and Y such that E(X, HN - 2) = D(Y, G). 6. Form the message Q1, Q2, Á , QN - 2, X, Y. This message has the hash code G and therefore can be used with the intercepted encrypted signature. This form of attack is known as a meet-in-the-middle-attack. A number of researchers have proposed refinements intended to strengthen the basic block chaining approach. For example, Davies and Price [DAVI89] describe the variation: Hi = E(Mi, Hi - 1) 䊝 Hi - 1 Another variation, proposed in [MEYE88], is Hi = E(Hi - 1, Mi) 䊝 Mi However, both of these schemes have been shown to be vulnerable to a variety of attacks [MIYA90]. More generally, it can be shown that some form of birthday attack will succeed against any hash scheme involving the use of cipher block chain- ing without a secret key, provided that either the resulting hash code is small enough (e.g., 64 bits or less) or that a larger hash code can be decomposed into independent subcodes [JUEN87]. Thus, attention has been directed at finding other approaches to hashing. Many of these have also been shown to have weaknesses [MITC92]. 11.5 SECURE HASH ALGORITHM (SHA) In recent years, the most widely used hash function has been the Secure Hash Algorithm (SHA). Indeed, because virtually every other widely used hash func- tion had been found to have substantial cryptanalytic weaknesses, SHA was more or less the last remaining standardized hash algorithm by 2005. SHA was devel- oped by the National Institute of Standards and Technology (NIST) and published 11.5 / SECURE HASH ALGORITHM (SHA) 343 as a federal information processing standard (FIPS 180) in 1993. When weak- nesses were discovered in SHA, now known as SHA-0, a revised version was issued as FIPS 180-1 in 1995 and is referred to as SHA-1. The actual standards document is entitled “Secure Hash Standard.” SHA is based on the hash function MD4, and its design closely models MD4. SHA-1 is also specified in RFC 3174, which essentially duplicates the material in FIPS 180-1 but adds a C code implementation. SHA-1 produces a hash value of 160 bits. In 2002, NIST produced a revised version of the standard, FIPS 180-2, that defined three new versions of SHA, with hash value lengths of 256, 384, and 512 bits, known as SHA-256, SHA-384, and SHA-512, respectively. Collectively, these hash algorithms are known as SHA-2. These new versions have the same underlying structure and use the same types of modular arithmetic and logical binary operations as SHA-1. A revised document was issued as FIP PUB 180-3 in 2008, which added a 224-bit version (Table 11.3). SHA-2 is also specified in RFC 4634, which essentially duplicates the material in FIPS 180-3 but adds a C code implementation. In 2005, NIST announced the intention to phase out approval of SHA-1 and move to a reliance on SHA-2 by 2010. Shortly thereafter, a research team described an attack in which two separate messages could be found that deliver the same SHA-1 hash using 269 operations, far fewer than the 280 operations previously thought needed to find a collision with an SHA-1 hash [WANG05]. This result should hasten the transition to SHA-2. In this section, we provide a description of SHA-512. The other versions are quite similar. SHA-512 Logic The algorithm takes as input a message with a maximum length of less than 2128 bits and produces as output a 512-bit message digest. The input is processed in 1024-bit blocks. Figure 11.8 depicts the overall processing of a message to produce a digest. This follows the general structure depicted in Figure 11.7. The processing consists of the following steps. Table 11.3 Comparison of SHA Parameters SHA-1 SHA-224 SHA-256 SHA-384 SHA-512 Message 160 224 256 384 512 Digest Size Message Size < 264 < 264 < 264 < 2128 < 2128 Block Size 512 512 512 1024 1024 Word Size 32 32 32 64 64 Number of Steps 80 64 64 80 80 Note: All sizes are measured in bits. 344 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS N 1024 bits 128 bits L bits Message 1000000,... ,0 L 1024 bits 1024 bits 1024 bits M1 M2 MN F F F + + + IV = H0 H1 H2 HN hash code 512 bits 512 bits 512 bits + = word-by-word addition mod 2 64 Figure 11.8 Message Digest Generation Using SHA-512 Step 1 Append padding bits. The message is padded so that its length is congruent to 896 modulo 1024 [length K 896(mod 1024)]. Padding is always added, even if the message is already of the desired length. Thus, the number of padding bits is in the range of 1 to 1024. The padding consists of a single 1 bit followed by the necessary number of 0 bits. Step 2 Append length. A block of 128 bits is appended to the message. This block is treated as an unsigned 128-bit integer (most significant byte first) and contains the length of the original message (before the padding). The outcome of the first two steps yields a message that is an integer multiple of 1024 bits in length. In Figure 11.8, the expanded message is repre- sented as the sequence of 1024-bit blocks M1, M2, Á , MN, so that the total length of the expanded message is N * 1024 bits. Step 3 Initialize hash buffer. A 512-bit buffer is used to hold intermediate and final results of the hash function. The buffer can be represented as eight 64-bit reg- isters (a, b, c, d, e, f, g, h). These registers are initialized to the following 64-bit integers (hexadecimal values): a = 6A09E667F3BCC908 e = 510E527FADE682D1 b = BB67AE8584CAA73B f = 9B05688C2B3E6C1F c = 3C6EF372FE94F82B g = 1F83D9ABFB41BD6B d = A54FF53A5F1D36F1 h = 5BE0CD19137E2179 11.5 / SECURE HASH ALGORITHM (SHA) 345 These values are stored in big-endian format, which is the most significant byte of a word in the low-address (leftmost) byte position. These words were obtained by taking the first sixty-four bits of the fractional parts of the square roots of the first eight prime numbers. Step 4 Process message in 1024-bit (128-word) blocks. The heart of the algorithm is a module that consists of 80 rounds; this module is labeled F in Figure 11.8. The logic is illustrated in Figure 11.9. Each round takes as input the 512-bit buffer value, abcdefgh, and updates the contents of the buffer.At input to the first round, the buffer has the value of the intermediate hash value, Hi - 1. Each round t makes use of a 64-bit value Wt, derived from the current 1024-bit block being processed (Mi). These values are derived using a message schedule described subsequently. Each round also makes use of an additive constant Kt, where 0 … t … 79 indicates one of the 80 rounds. These words represent the first 64 bits of the fractional parts of the cube roots of the first 80 prime numbers.The constants provide a “randomized” set of 64-bit patterns, which should eliminate any regularities in the input data. Table 11.4 shows these constants in hexadecimal format (from left to right). The output of the eightieth round is added to the input to the first round (Hi - 1) to produce Hi.The addition is done independently for each of the eight Mi Hi1 Message schedule 64 a b c d e f g h W0 K0 Round 0 a b c d e f g h Wt Kt Round t a b c d e f g h W79 K79 Round 79 Hi Figure 11.9 SHA-512 Processing of a Single 1024-Bit Block 346 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS words in the buffer with each of the corresponding words in Hi - 1, using addi- tion modulo 264. Step 5 Output. After all N 1024-bit blocks have been processed, the output from the Nth stage is the 512-bit message digest. We can summarize the behavior of SHA-512 as follows: H0 = IV Hi = SUM64(Hi - 1, abcdefghi) MD = HN where IV = initial value of the abcdefgh buffer, defined in step 3 abcdefghi = the output of the last round of processing of the ith message block N = the number of blocks in the message (including padding and length fields) SUM64 = addition modulo 264 performed separately on each word of the pair of inputs MD = final message digest value SHA-512 Round Function Let us look in more detail at the logic in each of the 80 steps of the processing of one 512-bit block (Figure 11.10). Each round is defined by the following set of equations: A a 1 e B + Wt + Kt 512 T1 = h + Ch(e, f, g) + A a 0 a B + Maj(a, b, c) 512 T2 = h = g g = f f = e e = d + T1 d = c c = b b = a a = T1 + T2 where t = step number; 0 … t … 79 Ch(e, f, g) = (e AND f) 䊝 (NOT e AND g) the conditional function: If e then f else g 11.5 / SECURE HASH ALGORITHM (SHA) 347 Table 11.4 SHA-512 Constants 428a2f98d728ae22 7137449123ef65cd b5c0fbcfec4d3b2f e9b5dba58189dbbc 3956c25bf348b538 59f111f1b605d019 923f82a4af194f9b ab1c5ed5da6d8118 d807aa98a3030242 12835b0145706fbe 243185be4ee4b28c 550c7dc3d5ffb4e2 72be5d74f27b896f 80deb1fe3b1696b1 9bdc06a725c71235 c19bf174cf692694 e49b69c19ef14ad2 efbe4786384f25e3 0fc19dc68b8cd5b5 240ca1cc77ac9c65 2de92c6f592b0275 4a7484aa6ea6e483 5cb0a9dcbd41fbd4 76f988da831153b5 983e5152ee66dfab a831c66d2db43210 b00327c898fb213f bf597fc7beef0ee4 c6e00bf33da88fc2 d5a79147930aa725 06ca6351e003826f 142929670a0e6e70 27b70a8546d22ffc 2e1b21385c26c926 4d2c6dfc5ac42aed 53380d139d95b3df 650a73548baf63de 766a0abb3c77b2a8 81c2c92e47edaee6 92722c851482353b a2bfe8a14cf10364 a81a664bbc423001 c24b8b70d0f89791 c76c51a30654be30 d192e819d6ef5218 d69906245565a910 f40e35855771202a 106aa07032bbd1b8 19a4c116b8d2d0c8 1e376c085141ab53 2748774cdf8eeb99 34b0bcb5e19b48a8 391c0cb3c5c95a63 4ed8aa4ae3418acb 5b9cca4f7763e373 682e6ff3d6b2b8a3 748f82ee5defb2fc 78a5636f43172f60 84c87814a1f0ab72 8cc702081a6439ec 90befffa23631e28 a4506cebde82bde9 bef9a3f7b2c67915 c67178f2e372532b ca273eceea26619c d186b8c721c0c207 eada7dd6cde0eb1e f57d4f7fee6ed178 06f067aa72176fba 0a637dc5a2c898a6 113f9804bef90dae 1b710b35131c471b 28db77f523047d84 32caab7b40c72493 3c9ebe0a15c9bebc 431d67c49c100d4c 4cc5d4becb3e42b6 597f299cfc657e2a 5fcb6fab3ad6faec 6c44198c4a475817 Maj(a, b, c) = (a AND b) 䊝 (a AND c) 䊝 (b AND c) the function is true only of the majority (two or three) of the arguments are true A a 0 a B = ROTR28(a) 䊝 ROTR34(a) 䊝 ROTR39(a) 512 A a1 eB 512 = ROTR14(e) 䊝 ROTR18(e) 䊝 ROTR41(e) n ROTR (x) = circular right shift (rotation) of the 64-bit argument x by n bits Wt = a 64-bit word derived from the current 512-bit input block Kt = a 64-bit additive constant + = addition modulo 264 Two observations can be made about the round function. 1. Six of the eight words of the output of the round function involve simply per- mutation (b, c, d, f, g, h) by means of rotation. This is indicated by shading in Figure 11.10. 2. Only two of the output words (a, e) are generated by substitution. Word e is a function of input variables (d, e, f, g, h), as well as the round word Wt and the constant Kt. Word a is a function of all of the input variables except d, as well as the round word Wt and the constant Kt. 348 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS a b c d e f g h Maj Ch + + + + Wt + + Kt + a b c d e f g h 512 bits Figure 11.10 Elementary SHA-512 Operation (single round) It remains to indicate how the 64-bit word values Wt are derived from the 1024-bit message. Figure 11.11 illustrates the mapping. The first 16 values of Wt are taken directly from the 16 words of the current block. The remaining values are defined as Wt = s512 1 (Wt - 2)+Wt - 7 +s0 (Wt - 15)+Wt - 16 512 where s512 0 (x) = ROTR1(x) 䊝 ROTR8(x) 䊝 SHR7(x) s512 1 (x) = ROTR19(x) 䊝 ROTR61(x) 䊝 SHR6(x) ROTRn(x) circular right shift (rotation) of the 64-bit argument x by n bits = SHRn(x) = left shift of the 64-bit argument x by n bits with padding by zeros on the right + = addition modulo 264 1024 bits W0 W1 W9 W14 Wt–16 Wt–15 Wt–7 Wt–2 W63 W65 W71 W76 Mi σ0 σ1 σ0 σ1 σ0 σ1 + + + W0 W1 W15 W16 Wt W79 64 bits Figure 11.11 Creation of 80-word Input Sequence for SHA-512 Processing of Single Block 11.5 / SECURE HASH ALGORITHM (SHA) 349 Thus, in the first 16 steps of processing, the value of Wt is equal to the corre- sponding word in the message block. For the remaining 64 steps, the value of Wt consists of the circular left shift by one bit of the XOR of four of the preceding values of Wt, with two of those values subjected to shift and rotate operations. This introduces a great deal of redundancy and interdependence into the message blocks that are compressed, which complicates the task of finding a different message block that maps to the same compression function output. Figure 11.12 summarizes the SHA-512 logic. The SHA-512 algorithm has the property that every bit of the hash code is a function of every bit of the input. The complex repetition of the basic function F produces results that are well mixed; that is, it is unlikely that two messages chosen at random, even if they exhibit similar regularities, will have the same hash code. Unless there is some hidden weakness in SHA-512, which has not so far been published, the difficulty of coming up with two messages having the same message digest is on the order of 2256 operations, while the difficulty of finding a message with a given digest is on the order of 2512 operations. Example We include here an example based on one in FIPS 180. We wish to hash a one-block message consisting of three ASCII characters: “abc”, which is equivalent to the fol- lowing 24-bit binary string: 01100001 01100010 01100011 Recall from step 1 of the SHA algorithm, that the message is padded to a length congruent to 896 modulo 1024. In this case of a single block, the padding consists of 896 - 24 = 872 bits , consisting of a “1” bit followed by 871 “0” bits. Then a 128-bit length value is appended to the message, which contains the length of the original message (before the padding). The original length is 24 bits, or a hexadecimal value of 18. Putting this all together, the 1024-bit message block, in hexadecimal, is 6162638000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000018 This block is assigned to the words W0, Á ,W15 of the message schedule, which appears as follows. W0 = 6162638000000000 W5 = 0000000000000000 W1 = 0000000000000000 W6 = 0000000000000000 W2 = 0000000000000000 W7 = 0000000000000000 W3 = 0000000000000000 W8 = 0000000000000000 W4 = 0000000000000000 W9 = 0000000000000000 350 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS W10 = 0000000000000000 W13 = 0000000000000000 W11 = 0000000000000000 W14 = 0000000000000000 W12 = 0000000000000000 W15 = 0000000000000018 As indicated in Figure 11.12, the eight 64-bit variables, a through h, are initial- ized to values H0,0 through H0,7. The following table shows the initial values of these variables and their values after each of the first two rounds. a 6a09e667f3bcc908 f6afceb8bcfcddf5 1320f8c9fb872cc0 b bb67ae8584caa73b 6a09e667f3bcc908 f6afceb8bcfcddf5 c 3c6ef372fe94f82b bb67ae8584caa73b 6a09e667f3bcc908 d a54ff53a5f1d36f1 3c6ef372fe94f82b bb67ae8584caa73b e 510e527fade682d1 58cb02347ab51f91 c3d4ebfd48650ffa f 9b05688c2b3e6c1f 510e527fade682d1 58cb02347ab51f91 g 1f83d9abfb41bd6b 9b05688c2b3e6c1f 510e527fade682d1 h 5be0cd19137e2179 1f83d9abfb41bd6b 9b05688c2b3e6c1f Note that in each of the rounds, six of the variables are copied directly from variables from the preceding round. The process continues through 80 rounds. The output of the final round is 73a54f399fa4b1b2 10d9c4c4295599f6 d67806db8b148677 654ef9abec389ca9 d08446aa79693ed7 9bb4d39778c07f9e 25c96a7768fb2aa3 ceb9fc3691ce8326 The hash value is then calculated as H1,0 = 6a09e667f3bcc908 + 73a54f399fa4b1b2 = ddaf35a193617aba H1,1 = bb67ae8584caa73b + 10d9c4c4295599f6 = cc417349ae204131 H1,2 = 3c6ef372fe94f82b + d67806db8b148677 = 12e6fa4e89a97ea2 H1,3 = a54ff53a5f1d36f1 + 654ef9abec389ca9 = 0a9eeee64b55d39a H1,4 = 510e527fade682d1 + d08446aa79693ed7 = 2192992a274fc1a8 H1,5 = 9b05688c2b3e6c1f + 9bb4d39778c07f9e = 36ba3c23a3feebbd H1,6 = 1f83d9abfb41bd6b + 25c96a7768fb2aa3 = 454d4423643ce80e H1,7 = 5be0cd19137e2179 + ceb9fc3691ce8326 = 2a9ac94fa54ca49f The resulting 512-bit message digest is ddaf35a193617aba cc417349ae204131 12e6fa4e89a97ea2 0a9eeee64b55d39a 2192992a274fc1a8 36ba3c23a3feebbd 454d4423643ce80e 2a9ac94fa54ca49f 11.5 / SECURE HASH ALGORITHM (SHA) 351 The padded message consists blocks M1, M2, …, MN. Each message block Mi consists of 16 64-bit words Mi,0, Mi,1, …, Mi,15. All addition is performed modulo 264. H0,0 6A09E667F3BCC908 H0,4 510E527FADE682D1 H0,1 BB67AE8584CAA73B H0,5 9B05688C2B3E6C1F H0,2 3C6EF372FE94F82B H0,6 1F83D9ABFB41BD6B H0,3 A54FF53A5F1D36F1 H0,7 5BE0CDI9137E2179 for i 1 to N 1. Prepare the message schedule W for t 0 to 15 Wt Mi,t for t 16 to 79 1 1Wt - 22 + Wt - 7 + s0 1Wt - 152 + Wt - 16 Wt = s512 512 2. Initialize the working variables a Hi–1,0 e Hi–1,4 b Hi–1,1 f Hi–1,5 c Hi–1,2 g Hi–1,6 d Hi–1,3 h Hi–1,7 3. Perform the main hash computation for t 0 to 79 T1 h + Ch(e, f, g) + a a 1 eb + Wt + Kt 512 T2 a a 0 ab + Maj(a, b, c) 512 h g g f f e e d + T1 d c c b b a a T1 + T2 4. Compute the inermediate hash value Hi,0 a + Hi–1,0 Hi,4 a + Hi–1,4 Hi,1 a + Hi–1,1 Hi,5 a + Hi–1,5 Hi,2 a + Hi–1,2 Hi,6 a + Hi–1,6 Hi,3 a + Hi–1,3 Hi,7 a + Hi–1,7 return {HN,0 || HN,1 || HN,2 || HN,3 || HN,4 || HN,5 || HN,6 || HN,7} Figure 11.12 SHA-512 Logic 352 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS Suppose now that we change the input message by one bit, from “abc” to “cbc”. Then, the 1024-bit message block is 6362638000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000018 And the resulting 512-bit message digest is 531668966ee79b70 0b8e593261101354 4273f7ef7b31f279 2a7ef68d53f93264 319c165ad96d9187 55e6a204c2607e27 6e05cdf993a64c85 ef9e1e125c0f925f The number of bit positions that differ between the two hash values is 253, almost exactly half the bit positions, indicating that SHA-512 has a good avalanche effect. 11.6 SHA-3 As of this writing, SHA-1 has not yet been “broken.”That is, no one has demonstrated a technique for producing collisions in less than brute-force time. However, because SHA-1 is very similar in structure and in the basic mathematical operations used to MD5 and SHA-0, both of which have been broken, SHA-1 is considered insecure and has been phased out for SHA-2. SHA-2, particularly the 512-bit version, would appear to provide unassailable security. However, SHA-2 shares the same structure and mathematical operations as its predecessors, and this is a cause for concern. Because it will take years to find a suitable replacement for SHA-2, should it become vulnerable, NIST decided to begin the process of developing a new hash standard. Accordingly, NIST announced in 2007 a competition to produce the next generation NIST hash function, to be called SHA-3. NIST would like to have a new standard in place by the end of 2012, but emphasizes that this is not a fixed timeline and that the schedule could slip well beyond that date. The basic requirements that must be satisfied by any candidate for SHA-3 are the following. 1. It must be possible to replace SHA-2 with SHA-3 in any application by a simple drop-in substitution. Therefore, SHA-3 must support hash value lengths of 224, 256, 384, and 512 bits. 2. SHA-3 must preserve the online nature of SHA-2. That is, the algorithm must process comparatively small blocks (512 or 1024 bits) at a time instead of requiring that the entire message be buffered in memory before processing it. Beyond these basic requirements, NIST has defined a set of evaluation criteria. These criteria are designed to reflect the requirements for the main applications sup- ported by SHA-2, which include digital signatures, hashed message authentication 11.8 / KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS 353 codes, key generation, and pseudorandom number generation. The evaluation crite- ria for the new hash function, in decreasing order of importance, are as follows. Security: The security strength of SHA-3 should be close to the theoretical maximum for the different required hash sizes and for both preimage resis- tance and collision resistance. SHA-3 algorithms must be designed to resist any potentially successful attack on SHA-2 functions. In practice, this probably means that SHA-3 must be fundamentally different than the SHA-1, SHA-2, and MD5 algorithms in either structure, mathematical functions, or both. Cost: SHA-3 should be both time and memory efficient over a range of hard- ware platforms. Algorithm and implementation characteristics: Consideration will be given to such characteristics as flexibility (e.g., tunable parameters for security/ performance tradeoffs, opportunity for parallelization, and so on) and simplic- ity. The latter characteristic makes it easier to analyze the security properties of the algorithm 11.7 RECOMMENDED READING AND WEB SITES [PREN99] is a good survey of cryptographic hash functions. [GILB03] examines the security of SHA-256 through SHA-512. GILB03 Gilbert, H. and Handschuh, H. “Security Analysis of SHA-256 and Sisters.” “Proceedings, CRYPTO ’03, 2003; published by Springer-Verlag. PREN99 Preneel, B. “The State of Cryptographic Hash Functions.” Proceedings, EURO- CRYPT ’96, 1996; published by Springer-Verlag. Recommended Web Sites: NIST Secure Hashing Page: SHA FIPS and related documents. Cryptographic Hash Algorithm Competition: NIST page on its competition for a new standardized hash algorithm, to be called SHA-3. 11.8 KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS Key Terms big endian birthday paradox compression function birthday attack collision resistant cryptographic hash function 354 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS hash code MD4 SHA-224 hash function MD5 SHA-256 hash value message digest SHA-384 keyed hash function one-way hash function SHA-512 little endian preimage resistant strong collision resistance message authentication code second preimage resistant weak collision resistance (MAC) SHA-1 Review Questions 11.1 What characteristics are needed in a secure hash function? 11.2 What is the difference between weak and strong collision resistance? 11.3 What is the role of a compression function in a hash function? 11.4 What is the difference between little-endian and big-endian format? 11.5 What basic arithmetical and logical functions are used in SHA? Problems 11.1 The high-speed transport protocol XTP (Xpress Transfer Protocol) uses a 32-bit checksum function defined as the concatenation of two 16-bit functions: XOR and RXOR, defined in Section 11.4 as “two simple hash functions” and illustrated in Figure 11.4. a. Will this checksum detect all errors caused by an odd number of error bits? Explain. b. Will this checksum detect all errors caused by an even number of error bits? If not, characterize the error patterns that will cause the checksum to fail. c. Comment on the effectiveness of this function for use as a hash function for authentication. 11.2 a. Consider the Davies and Price hash code scheme described in Section 11.4 and assume that DES is used as the encryption algorithm: Hi = Hi - 1 䊝 E1Mi, Hi - 12 Recall the complementarity property of DES (Problem 3.14): If Y = E(K, X), then Y¿ = E(K¿, X¿). Use this property to show how a message consisting of blocks M1, M2, Á , MN can be altered without altering its hash code. b. Show that a similar attack will succeed against the scheme proposed in [MEYE88]: Hi = Mi { E1Hi - 1, Mi2 11.3 a. Consider the following hash function. Messages are in the form of a sequence of t numbers in Zn, M = 1a1, a2, Á at2. The hash value h is calculated as a a ai b for i=1 some predefined value n. Does this hash function satisfy any of the requirements for a hash function listed in Table 11.1? Explain your answer. t b. Repeat part (a) for the hash function h = a a (ai)2 b mod n. i=1 c. Calculate the hash function of part (b) for M = (189, 632, 900, 722, 349) and n = 989. 11.4 It is possible to use a hash function to construct a block cipher with a structure similar to DES. Because a hash function is one way and a block cipher must be reversible (to decrypt), how is it possible? 11.8 / KEY TERMS, REVIEW QUESTIONS, AND PROBLEMS 355 11.5 Now consider the opposite problem: using an encryption algorithm to construct a one- way hash function. Consider using RSA with a known key. Then process a message consisting of a sequence of blocks as follows: Encrypt the first block, XOR the result with the second block and encrypt again, etc. Show that this scheme is not secure by solving the following problem. Given a two-block message B1, B2, and its hash RSAH1B1, B22 = RSA1RSA1B12 䊝 B22 Given an arbitrary block C1, choose C2 so that RSAH(C1, C2) = RSAH(B1, B2).Thus, the hash function does not satisfy weak collision resistance. 11.6 Suppose H(m) is a collision-resistant hash function that maps a message of arbitrary bit length into an n-bit hash value. Is it true that, for all messages x, x¿ with x Z x¿ , we have H(x) Z H(x¿) Explain your answer. 11.7 In Figure 11.11, it is assumed that an array of 80 64-bit words is available to store the values of Wt, so that they can be precomputed at the beginning of the processing of a block. Now assume that space is at a premium. As an alternative, consider the use of a 16-word circular buffer that is initially loaded with W0 through W15. Design an algo- rithm that, for each step t, computes the required input value Wt. 11.8 For SHA-512, show the equations for the values of W16, W17, W18, and W19. 11.9 State the value of the padding field in SHA-512 if the length of the message is a. 1919 bits b. 1920 bits c. 1921 bits 11.10 State the value of the length field in SHA-512 if the length of the message is a. 1919 bits b. 1920 bits c. 1921 bits 11.11 Suppose a1a2a3a4 are the 4 bytes in a 32-bit word. Each ai can be viewed as an integer in the range 0 to 255, represented in binary. In a big-endian architecture, this word represents the integer a1224 + a2216 + a328 + a4 In a little-endian architecture, this word represents the integer a4224 + a3216 + a228 + a1 a. Some hash functions, such as MD5, assume a little-endian architecture. It is impor- tant that the message digest be independent of the underlying architecture. Therefore, to perform the modulo 2 addition operation of MD5 or RIPEMD-160 on a big-endian architecture, an adjustment must be made. Suppose X = x1 x2 x3 x4 and Y = y1 y2 y3 y4. Show how the MD5 addition operation (X + Y) would be carried out on a big-endian machine. b. SHA assumes a big-endian architecture. Show how the operation (X + Y) for SHA would be carried out on a little-endian machine. 11.12 This problem introduces a hash function similar in spirit to SHA that operates on letters instead of binary data. It is called the toy tetragraph hash (tth).2 Given a mes- sage consisting of a sequence of letters, tth produces a hash value consisting of four letters. First, tth divides the message into blocks of 16 letters, ignoring spaces, punctu- ation, and capitalization. If the message length is not divisible by 16, it is padded out with nulls. A four-number running total is maintained that starts out with the value (0, 0, 0, 0); this is input to the compression function for processing the first block. The compression function consists of two rounds. Round 1 Get the next block of text and arrange it as a row-wise 4 * 4 block of text and covert it to numbers (A = 0, B = 1 , etc.). For example, for the block ABCDEFGHIJKLMNOP, we have 2 I thank William K. Mason, of the magazine staff of The Cryptogram, for providing this example. 356 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS A B C D 0 1 2 3 E F G H 4 5 6 7 I J K L 8 9 10 11 M N O P 12 13 14 15 Then, add each column mod 26 and add the result to the running total, mod 26. In this example, the running total is (24, 2, 6, 10). Round 2 Using the matrix from round 1, rotate the first row left by 1, second row left by 2, third row left by 3, and reverse the order of the fourth row. In our example: B C D A 1 2 3 0 G H E F 6 7 4 5 L I J K 11 8 9 10 P O N M 15 14 13 12 Now, add each column mod 26 and add the result to the running total. The new running total is (5, 7, 9, 11). This running total is now the input into the first round of the compression function for the next block of text. After the final block is processed, convert the final running total to letters. For example, if the message is ABCDEFGHIJKLMNOP, then the hash is FHJL. a. Draw figures comparable to Figures 11.8 and 11.9 to depict the overall tth logic and the compression function logic. b. Calculate the hash function for the 48-letter message “I leave twenty million dol- lars to my friendly cousin Bill.” c. To demonstrate the weakness of tth, find a 48-letter block that produces the same hash as that just derived. Hint: Use lots of A’s. The remaining problems deal with the hash function Whirlpool, described in Appendix N. 11.13 Develop a table similar to Table 4.9 for GF(28) with m(x) = x8 + x4 + x3 + x2 + 1. 11.14 Show the E and E - 1 mini-boxes in Table N.2 in the traditional S-box square matrix format, such as that of Table 5.4. 11.15 Verify that Figure N.5 is a valid implementation of the S-box shown in Table N.2a. Do this by showing the calculations involved for three input values: 00, 55, 1E. 11.16 Provide a Boolean expression that defines the S-box functionality of Figure N.5. 11.17 Whirlpool makes use of the construction Hi = E1Hi - 1, Mi2 䊝 Hi - 1 䊝 Mi. Another construction that was shown by Preneel to be secure is Hi = E1Hi - 1, Mi2 䊝 Mi. Now notice that the key schedule for Whirlpool resembles encryption of the cipher key under a pseudo-key defined by the round constants, so that the core of the hashing process could be formally viewed as two interacting encryption lines. Consider the encryption E1Hi - 1, Mi2. We could write the final round key for this block as K10 = E1RC, Hi - 12. Now show that the two hash constructions are essentially equiv- alent because of the way that the key schedule is defined. APPENDIX 11A MATHEMATICAL BASIS OF THE BIRTHDAY ATTACK In this appendix, we derive the mathematical justification for the birthday attack. We begin with a related problem and then look at the problem from which the name “birthday attack” is derived. APPENDIX 11A / MATHEMATICAL BASIS OF THE BIRTHDAY ATTACK 357 Related Problem A general problem relating to hash functions is the following. Given a hash func- tion H, with n possible outputs and a specific value H(x), if H is applied to k ran- dom inputs, what must be the value of k so that the probability that at least one input y satisfies H(y) = H(x) is 0.5? For a single value of y, the probability that H(y) = H(x) is just 1/ n. Conversely, the probability that H(y) Z H(x) is [1 - 11/n)]. If we generate k ran- dom values of y, then the probability that none of them match is just the product of the probabilities that each individual value does not match, or [1 - (1/n)]k. Thus, the probability that there is at least one match is 1 - [1 - (1/n)]k. The binomial theorem can be stated as k1k - 12 k1k - 121k - 22 11 - a2k = 1 - ka + a2 - a3 Á 2! 3! For very small values of a, this can be approximated as (1 - ka). Thus, the probabil- ity of at least one match is approximated as 1 - [1 - (1/n)]k L 1 - [1 - (k/n)] = k/n. For a probability of 0.5, we have k = n/2. In particular, for an m-bit hash code, the number of possible codes is 2m and the value of k that produces a probability of one-half is k = 21m - 12 (11.1) The Birthday Paradox The birthday paradox is often presented in elementary probability courses to demonstrate that probability results are sometimes counterintuitive. The prob- lem can be stated as follows. What is the minimum value of k such that the prob- ability is greater than 0.5 that at least two people in a group of k people have the same birthday? Ignore February 29 and assume that each birthday is equally likely. We can reason to the answer as follows. The probability that the birthdays of any two people are not alike is clearly 364/365 (since there is only one chance in 365 that one person’s birthday will coincide with another’s). The probability that a third person’s birthday will differ from the other two is 363/365; a fourth person’s, 362/365; and so on, until we reach the 24th person (342/365). We thus obtain a series of 23 fractions which must be multiplied together to reach the probability that all 24 birthdays are different. The product is a fraction that reduces to about 0.507, or slightly better than 1/2, for a coincidence among 23 people. To derive this answer formally, let us define P1n, k2 = Pr[at least one duplicate in k items, with each item able to take on one of n equally likely values between 1 and n Thus, we are looking for the smallest value of k such that P(365, k) Ú 0.5. It is easier first to derive the probability that there are no duplicates, which we designate as Q(365, k). If k 7 365, then it is impossible for all values to be different. So we assume k … 365. Now consider the number of different ways, N, that we can have k 358 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS values with no duplicates. We may choose any of the 365 values for the first item, any of the remaining 364 numbers for the second item, and so on. Hence, the number of different ways is 365! N = 365 * 364 * Á 1365 - k + 12 = (11.2) 1365 - k2! If we remove the restriction that there are no duplicates, then each item can be any of 365 values, and the total number of possibilities is 365k. So the probability of no duplicates is simply the fraction of sets of values that have no duplicates out of all possible sets of values: 365!/(365 - k)! 365! Q(365, k) = = (365) k (365 - k)!(365)k and 365! P(365, k) = 1 - Q(365, k) = 1 - (11.3) (365 - k)!(365)k This function is plotted in Figure 11.13. The probabilities may seem surpris- ingly large to anyone who has not considered the problem before. Many people would guess that to have a probability greater than 0.5 that there is at least one duplicate, the number of people in the group would have to be about 100. In fact, the number is 23, with P(365, 23) = 0.5073. For k = 100, the probability of at least one duplicate is 0.9999997. 1.0 0.9 0.8 0.7 0.6 P(365, k) 0.5 0.4 0.3 0.2 0.1 0.0 0 10 20 30 40 50 60 70 k Figure 11.13 The Birthday Paradox APPENDIX 11A / MATHEMATICAL BASIS OF THE BIRTHDAY ATTACK 359 Perhaps the reason that the result seems so surprising is that if you consider a particular person in a group, the probability that some other person in the group has the same birthday is small. But the probability that we are concerned with is the probability that any pair of people in the group has the same birthday. In a group of 23, there are (23(23 - 1))/2 = 253 different pairs of people. Hence the high proba- bilities. Useful Inequality Before developing a generalization of the birthday problem, we derive an inequality that will be needed: (1 - x) … e-x for all x Ú 0 (11.4) Figure 11.14 illustrates the inequality. To see that the inequality holds, note that the lower line is the tangent to e - x at x = 0. The slope of that line is just the derivative of e - x at x = 0: f(x) = e - x d -x f¿(x) = e = -e - x dx f¿(0) = - 1 The tangent is a straight line of the form ax + b with a = -1, and the tangent at x = 0 must equal e - 0 = 1. Thus, the tangent is the function (1 - x), confirming the inequality of Equation (11.4). Furthermore, note that for small x, we have (1 - x) L e - x. 1.0 0.8 ex 0.6 0.4 1x 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 Figure 11.14 A Useful Inequality 360 CHAPTER 11 / CRYPTOGRAPHIC HASH FUNCTIONS The General Case of Duplications The birthday problem can be generalized to the following problem. Given a random variable that is an integer with uniform distribution between 1 and n and a selection of k instances (k … n) of the random variable, what is the probability, P(n, k), that there is at least one duplicate? The birthday problem is just the special case with n = 365. By the same reasoning as before, we have the following generalization of Equation (11.3): n! P(n, k) = 1 - (11.5) (n - k)!nk We can rewrite this as n * (n - 1) * Á * (n - k + 1) P(n, k2 = 1 - nk n - 1 n - 2 n - k+1 = 1 - c * * Á * d n n n k - 1 = 1 - c a1 - b * a1 - b * Á * a1 - bd 1 2 n n n Using the inequality of Equation (11.4), P(n, k) 7 1 - [(e - 1/n) * (e - 2/n) * Á * (e - (k - 1)/n)] 7 1 - e - [(1/n)+(2/n) + Á + 11k - 12>n2] 7 1 - e - Ak * (k - 1)B>2n Now let us pose the question: What value of k is required such that P(n, k) 7 0.5? To satisfy the requirement, we have 1>2 = 1 - e - (k * (k - 1))>2n 2 = e(k * (k - 1))>2n k * (k - 1) In 2 = 2n For large k, we can replace k * (k - 1) by k2, and we get k = 12(ln2)n = 1.181n L 1n (11.6) As a reality check, for n = 365, we get k = 1.18 * 1365 = 22.54, which is very close to the correct answer of 23. We can now state the basis of the birthday attack in the following terms. Suppose we have a function H, with 2m possible outputs (i.e., an m-bit output). If H is applied tok random inputs, what must be the value of k so that there is the probability of at least one duplicate [i.e., H(x) = H(y) for some inputs x , y )]? Using the approximation in Equation (11.6), k = 22m = 2m>2 (11.7) APPENDIX 11A / MATHEMATICAL BASIS OF THE BIRTHDAY ATTACK 361 Overlap between Two Sets There is a problem related to the general case of duplications that is also of rele- vance for our discussions. The problem is this: Given an integer random variable with uniform distribution between 1 and n and two sets of k instances (k … n) of the random variable, what is the probability, R(n, k), that the two sets are not disjoint; that is, what is the probability that there is at least one value found in both sets? Let us call the two sets X and Y, with elements {x1, x2,... , xk} and {y1, y2,... , yk}, respectively. Given the value of x1, the probability that y1 = x1 is just 1/n, and therefore the probability that y1 does not match x1 is [1 - (1/n)]. If we generate the k random values in Y, the probability that none of these values is equal to x1 is [1 - (1/n)]k. Thus, the probability that there is at least one match to x1 is 1 - [1 - (1/n)]k. To proceed, let us make the assumption that all the elements of X are distinct. If n is large and if k is also large (e.g., on the order of 1n), then this is a good approximation. In fact, there may be a few duplications, but most of the values will be distinct. With that assumption, we can make the following derivation: 1 k Pr[no match in Y to x1] = a1 - b n 2 1 k k 1 k Pr[no match in Y to X] = ¢ a1 - b ≤ = a1 - b n n 2 1 k R(n, k) = Pr[at least one match in Y to X] = 1 - a1 - b n Using the inequality of Equation (11.4), R(n, k) 7 1 - A e-1>n B k 2 R(n, k) 7 1 - A e-k >n B 2 Let us pose the question: What value of k is required such that R(n, k) 7 0.5? To sat- isfy the requirement, we have 1>2 = 1 - (e - k >n) 2 2 = ek >n 2 k2 ln(2) = n k = 1(ln(2))n = 0.831n L 1n (11.8) We can state this in terms related to birthday attacks as follows. Suppose we have a function H with 2m possible outputs (i.e., an m-bit output). Apply H to k random inputs to produce the set X and again to k additional random inputs to produce the set Y.What must be the value of k so that there is the probability of at least 0.5 that there is a match between the two sets (i.e., H(x) = H(y) for some inputs x H X, y H Y)? Using the approximation in Equation (11.8): k = 22m = 2m>2 CHAPTER MESSAGE AUTHENTICATION CODES 12.1 Message Authentication Requirements 12.2 Message Authentication Functions Message Encryption Message Authentication Code 12.3 Requirements for Message Authentication Codes 12.4 Security of MACs Brute-Force Attacks Cryptanalysis 12.5 MACs Based on Hash Functions: HMAC HMAC Design Objectives HMAC Algorithm Security of HMAC 12.6 MACs Based on Block Ciphers: DAA and CMAC Data Authentication Algorithm Cipher-Based Message Authentication Code (CMAC) 12.7 Authenticated Encryption: CCM and GCM Counter with Cipher Block Chaining-Message Authentication Code Galois/Counter Mode 12.8 Pseudorandom Number Generation Using Hash Functions and Macs PRNG Based on Hash function PRNG Based on MAC function 12.9 Recommended Reading and Web Site 12.10 Key Terms, Review Questions, and Problems 362 12.1 / MESSAGE AUTHENTICATION REQUIREMENTS 363 At cats’ green on the Sunday he took the message from the inside of the pillar and added Peter Moran’s name to the two names already printed there in the “Brontosaur” code. The message now read: “Leviathan to Dragon: Martin Hillman, Trevor Allan, Peter Moran: observe and tail.” What was the good of it John hardly knew. He felt better, he felt that at last he had made an attack on Peter Moran instead of waiting passively and effecting no retaliation. Besides, what was the use of being in possession of the key to the codes if he never took advantage of it? —Talking to Strange Men, Ruth Rendell KEY POINTS ◆ Message authentication is a mechanism or service used to verify the integrity of a message. Message authentication assures that data received are exactly as sent by (i.e., contain no modification, insertion, deletion, or replay) and that the purported identity of the sender is valid. ◆ Symmetric encryption provides authentication among those who share the secret key. ◆ A message authentication code (MAC) is an algorithm that requires the use of a secret key. A MAC takes a variable-length message and a secret key as input and produces an authentication code. A recipient in posses- sion of the secret key can generate an authentication code to verify the integrity of the message. ◆ One means of forming a MAC is to combine a cryptographic hash function in some fashion with a secret key. ◆ Another approach to constructing a MAC is to use a symmetric block cipher in such a way that it produces a fixed-length output for a variable- length input. One of the most fascinating and complex areas of cryptography is that of message authentication and the related area of digital signatures. It would be impossible, in anything less than book length, to exhaust all the cryptographic functions and pro- tocols that have been proposed or implemented for message authentication and dig- ital signatures. Instead, the purpose of this chapter and the next is to provide a broad overview of the subject and to develop a systematic means of describing the various approaches. This chapter begins with an introduction to the requirements for authentica- tion and digital signature and the types of attacks to be countered. Then the basic approaches are surveyed. The remainder of the chapter deals with the fundamental approach to message authentication known as the message authentication code (MAC). Following an overview of this topic, the chapter looks at security considera- tions for MACs. This is followed by a discussion of specific MACs in two categories: those built from cryptographic hash functions and those built using a block cipher 364 CHAPTER 12 / MESSAGE AUTHENTICATION CODES mode of operation. Next, we look at a relatively recent approach known as authen- ticated encryption. Finally, we look at the use of cryptographic hash functions and MACs for pseudorandom num

Document Details

Tags

Related

Full Transcript

Upgrade to continue