CPSC 52500 Encryption and Authentication Fall 2024 PDF
Document Details
Uploaded by EasyToUseWisdom5714
Lewis University
2024
Dr. Jason Perry
Tags
Summary
These lecture slides cover topics in the CPSC 52500 course on encryption and authentication for Fall 2024. The document includes course descriptions, objectives, grading, required readings, and topics such as the one-time pad and attack models. The document is from Lewis University.
Full Transcript
CPSC 52500 Encryption and Authentication S EC TIONS 0 0 1 A N D 0 0 4 , FA L L - 2 2 0 2 4 I N STR UC TOR : D R. JA S O N PE R RY ( PE R RYJN@L EWI S U.EDU ) M E E TING: W E DNESDAY A S - 1 0 4- A 1 Survey Please fill out the in...
CPSC 52500 Encryption and Authentication S EC TIONS 0 0 1 A N D 0 0 4 , FA L L - 2 2 0 2 4 I N STR UC TOR : D R. JA S O N PE R RY ( PE R RYJN@L EWI S U.EDU ) M E E TING: W E DNESDAY A S - 1 0 4- A 1 Survey Please fill out the initial “get-to-know-you” survey that was handed out. I have extra pens if you need to borrow one. You get 10 points of homework credit for completing it. 2 Why is this course important? We depend on encryption every day, even if we are not aware we are using it. Anytime we visit a secure web site, the web browser and web server negotiate and use an encrypted connection— without any help from us! Banks use encryption to keep our fund transfers secure. Encryption algorithms are some of the security technologies in which we have the highest confidence. However, many security breaches are caused by security professionals who have a poor understanding of the fundamental principles of encryption. 3 Learning Objectives Upon completing the course, you should understand: the principal concepts of secure communication; the basic algorithms and protocols of encryption and authentication; how encryption is used in existing internet applications; the basics of cryptanalysis (how people try to break encryption); best practices for employing cryptography to address real-world security problems. 4 Course Grading Weekly written homeworks and participation 35% Weekly quiz (online) 15% Midterm Exam (Week 4) 25% Final Exam (Week 8) 25% Each week’s quiz and homework will be due the next Tuesday at 11:59pm. ◦ Sunday night is a more common due date, but since the lecture is Wednesday I’m giving additional days. As this is an in-person section, the exams will be in person and on-paper only. See syllabus for details. 5 Attendance Requirement To receive credit for attending this in-person section, you must attend all class sessions. Any absence must be excused according to University policy and documented. Remember that Weeks 4 and 8 are exam weeks. You will sign an attendance sheet each week, including your Lewis ID number, so please bring your ID card. On-time arrival is important. After the first week, if you show up more than 5 minutes late, your attendance will scored as late (50%). 6 Required video viewing There is much more content to this course than can be presented in a single 2-hour weekly session. ◦ After this week, the in-person sessions will include going over homework solutions, as well as to make announcements and take your questions (going over homework solutions will not be recorded.) This means that the majority of the course material will be provided to you in the form of prerecorded lecture video. Since the lecture videos are such an important part of how the material is presented, you are required to watch any segments not recorded in class. The Panopto tool tracks whether you individually have watched the videos or not. You will need to watch the videos while logged in. 10 points of your homework grade each week will come from whether you have watched the videos. To receive this credit, Panopto must show that you have watched at least 80% of all video not presented in class. 7 Course Topic Outline Defining Secure Communication The math of Public-key Cryptography Block Ciphers (DES & AES) Diffie-Hellman Key Exchange Modes of Operation RSA Encryption Stream Ciphers Digital Signatures Hash Functions Digital Certificates Message Authentication Internet Protocols and Applications The Secure Channel Randomness and pseudorandomness 8 The Textbook Ferguson, Schneier, and Kohno: Cryptography Engineering Reading assignments for relevant chapters/sections will be given each week. 9 Note on Math Without the use of constructions described with mathematical precision, and mathematically rigorous arguments, we would have no basis for confidence in the security of any encryption schemes. We cannot claim to be knowledgeable in encryption without some level of understanding of the math behind it. But don’t worry! I do not assume a strong math background; in fact, my goal is to help everyone grow in their mathematical ability. Math is not ultimately about computation but about concepts and how they interrelate. ◦ Any object you can define precisely is a subject of mathematics—not just numbers. 10 Week 1 Assignments All outstanding materials (videos) and assignments for each week will be posted to the Blackboard page by 10am Thursday (often on Wednesday night.) Homework 1 ◦ You can type answers to the questions in the Word document or handwrite and scan (if you scan, PDF is preferred). ◦ Showing work may allow for partial credit Quiz 1 ◦ Multiple-choice, single attempt, 15 minutes to complete ◦ So please review the material well before starting! Both assignments are due by 11:59pm next Tuesday. If you haven’t already, please introduce yourself on the discussion forum. 11 On Academic Integrity All work must be your own. To fairly evaluate your understanding of the material, I need to see words that you wrote yourself and work that you did yourself. You may discuss broad solution approaches with each other and read or view supplemental material for understanding; but obtaining direct answers from any other source is unacceptable and will not be tolerated, including from AI text generators. The first time I detect copying or use of AI text generators in an assignment, a grade of zero for the entire assignment will be given, with no possibility to redo or change your grade. Further violations will result in a grade of F for the course and potential academic dismissal. 12 Communication All course material and announcements will be posted on Blackboard. Feel free to email me for any concern: [email protected]. Please use your Lewis email so it is easier for me to identify you. Online office hours by appointment through Zoom: send me an email to schedule. It is a course requirement to check your Lewis email once each weekday for any announcements. 13 Week 1 Topics Outline 1. Definitions for Encryption Schemes 2. Review of Binary and Hexadecimal Numbers 3. The One-Time Pad Encryption Scheme 4. Attack Models 5. Two “Toy” Encryption Schemes 14 Defining Encryption Schemes AND THEIR SECURITY 15 What is (Modern) Cryptography? The study of algorithms for secure communication and computation. This course will center on the first: problems of secure communication. Secure computation has been considered an esoteric field; but its importance is growing ◦ For example, data scientists may desire to perform computations on encrypted data to preserve privacy. In the goal of securing information systems, cryptography is a crucial component; but it’s not everything! 16 Relation of Cryptography to Security Cryptography Security None of the algorithms we will study can make anything secure by themselves; they are tools that can help make a system secure, but only so far as they are correctly implemented and deployed. 17 The Importance of Precise Definitions The first and most important step is to give a precise mathematical definition of the security we want cryptographic algorithms to give us. ◦ If you don’t precisely define the problem you want to solve, how do you know when you’ve solved it? ◦ Defining exactly what we mean by “secure” is surprisingly tricky. ◦ A precise definition goes a long way toward designing a secure algorithm. The goal of modern cryptography is algorithms that have a mathematical proof of security. ◦ This has not been fully achieved yet. ◦ The community will not accept a new algorithm without rigorous arguments for its security. 18 Three Problems of Secure Communication 1. Confidentiality ◦ Preventing messages from being read by unintended persons ◦ Solved with encryption schemes 2. Authenticity ◦ Ensuring that a message is from who it claims to be from and has not been tampered with ◦ Solved with MACs and Digital Signatures 3. Key Distribution ◦ Solutions for problems 1 & 2 make use of short strings of data known as keys ◦ How to securely derive and share keys is a problem in itself. 19 Encryption Schemes The intuitive idea: “scrambling” and “unscrambling” the letters/words/bits in a message, to keep it private. …not wrong, but we want to have a more precise definition. Defining Communication We define encryption in the context of an abstract communication model that applies to many real- world scenarios. The model consists of two parties, a communication channel, a single message, and a potential eavesdropper. The parties are traditionally named Alice and Bob (A and B), and eavesdropper Eve Eve 𝑚 Alice Bob 𝑚 𝑚 𝑚 How can we stop Eve from reading the message intended for only Bob? 21 Defining an Encryption Scheme An Encryption Algorithm, E, and Decryption Algorithm, D, which use a key K. Eve 𝑐 Alice Bob 𝑐 = 𝐸(𝐾, 𝑚) 𝑐 𝑚 = 𝐷(𝐾, 𝑐) E takes a message m and key K and produces a ciphertext, c. D takes a ciphertext c plus same key K and recovers the message m. The message m is also called the plaintext. 22 A Third (Randomized) Algorithm Key Generation: A randomized algorithm for outputting a key to use with our scheme: 𝐺(1𝑛) → 𝐾 The input 1n is a unary encoding of the number of bits of output we want. For the first part of the course, the key generation algorithm will always be the obvious thing: “produce a uniform random string of n bits” But we need to include it in our definitions because proofs of security depend on its randomness properties. Therefore, an encryption scheme is formally specified by three algorithms G, E, and D. 23 The three algorithms of an encryption scheme Key Generation (G): 𝐺(1𝑛) → 𝐾 Encryption (E): 𝐸(𝐾, 𝑚) = 𝑐 Decryption (D): 𝐷(𝐾, 𝑐) = 𝑚 24 Summary of Definitions Seen So Far Plaintext: the unencrypted message Ciphertext: the encrypted message Encryption algorithm: transforms a plaintext to a ciphertext Decryption algorithm: transforms a ciphertext to a plaintext Key: an additional input to the encryption and decryption algorithms Party: an entity participating in communication (Alice or Bob) Adversary: the hypothetical entity who tries to break the security of the communication (such as an eavesdropper) 25 When is an encryption scheme secure? An encryption scheme is considered perfectly secure when Eve receives no information about the message, apart from: ◦ its length ◦ the time it was sent. Q. Is this a precise mathematical definition? A. Yes, if we define what is meant by “no information”. Q. What if Eve already learned something about the message by some other means? A. That is outside the scope of this definition; yet, this definition implies that she can’t gain any additional information from eavesdropping. 26 Kerchoffs’ Principle The security of a scheme must not rely on keeping the encryption and decryption algorithms secret—only the key. Why? If security of a system depends on keeping an algorithm secret, and that secret is leaked, the security of every system that uses it is compromised. ◦ On the other hand, if only a single key is compromised, the scheme is still secure Auguste Kerchoffs, 1835-1903 for everyone else who uses a different key. ◦ Algorithms are harder to change than keys. If the scheme is made public, it can be inspected by experts, exposing flaws and having them fixed, building confidence in its security. ◦ On the other hand, if you make your encryption scheme secret so no one can test it, the first person who finds a vulnerability in it will probably be an attacker! 27 Binary and Hexadecimal Numbers 28 All we encrypt is strings of bits Ultimately, that’s how every kind of data is represented inside a computer. When we talk about encryption algorithms, we don’t care what kind of data we are encrypting. ◦ In fact, we depend on encryption algorithms to scramble things so completely that you can’t tell what kind of data it is. Typically, we will chop data strings up into fixed-size chunks. Remember that 1 byte = 8 bits. 29 Place value in the Decimal System In the decimal (base-10) numbering system, we use digits 0 through 9, and place values are powers of ten. The digit in the ones place is called the least significant digit, and the one at the other end is the most significant digit. 30 Binary Numbers Base-2 numbering The place values are powers of 2: 20, 21, 22, 23, … A bit is a “binary digit”, 0 or 1 ◦ The smallest possible amount of information? To convert binary numbers to decimal, just add up the place values where there is a 1 bit. When comparing values written in different bases, we can add a subscript indicating the base to make it clear. 31 “Counting up” in binary Write all possible three-bit binary strings in order. 32 How many possible bit strings are there of a given length? This is the most crucial mathematical fact behind the security of encryption. Consider a string of n bits. How many different values could it have? For each of the n positions, there are 2 possible values, 0 or 1. Therefore, by multiplication of possibilities, there are 2𝑛 different possible strings of n bits. ◦ Example: There are 24 = 16 different 4-bit strings. ◦ Example: There are 28 = 256 different byte values (8-bit strings). If we consider n-bit strings as binary numbers, they represent the values 0 to 2𝑛 – 1. ◦ But remember that bits in a computer can represent any kind of data at all. 33 Converting Decimal to Binary Subtract the largest power of 2 that you can from the number, and put a 1 in that column in the place-value table. Keep going until you get zero. Example: 57 34 Making it easier to work with binary numbers Binary is fine for computers, but for us binary strings are very long to write out – a digit for every bit. We want to use a base that lets us write values more concisely. Our familiar decimal base (base 10) is more concise, but is a poor choice for working with binary numbers, because each decimal digit does not correspond to a whole number of bits. ◦ This is because 10 is not a power of 2. ◦ Example: to write either 9 or 10 in binary takes 4 bits (1001, 1010), but they are different numbers of decimal digits. ◦ This makes converting back and forth awkward and error-prone. 35 Hexadecimal – base 16 We need 16 symbols for digits, so we use 0-9 plus a, b, c, d, e, f to represent 10-15 Because 16 is a power of 2, there is a simple and direct correspondence between binary and hexadecimal numbers. Each hexadecimal digit corresponds to exactly 4 bits, because 24 = 16 A convention is to prefix a hexadecimal number with 0x to indicate that it is hexadecimal. This is not part of the number itself. 36 Decimal-Hex-Binary Table Decimal Hex Binary Decimal Hex Binary 0 0 0000 8 8 1000 1 1 0001 9 9 1001 2 2 0010 10 a 1010 3 3 0011 11 b 1011 4 4 0100 12 c 1100 5 5 0101 13 d 1101 6 6 0110 14 e 1110 7 7 0111 15 f 1111 37 Bits and Bytes in Hexadecimal The bit length of a string is the same regardless of which base you write it in. But if you write it in hexadecimal, you use one-fourth as many digits as if you had written it in binary. One byte (8 bits) can conveniently be written as two hexadecimal digits. 0xC3 = 1100 00112 A 4-byte (32-bit) number can be written as eight hex digits: 1111 1100 1100 0011 0000 0011 1100 0000 = f c c 3 0 3 c 0 (in decimal: 4,240,638,912) 38 Thinking about key lengths In cryptography, a key is almost always a string of random bits whose length is a power or 2 (or multiple of 8) ◦ 128-bit key, 1024-bit key, etc. ◦ Usually, the longer the key used, the better the security. How long are these when written out in hexadecimal? 39 Real-world key lengths 128-bit key (= 16 bytes): 4d e9 9e db a5 16 43 94 c5 86 1f 60 67 5f 34 1d 1024-bit key: 27 72 ef cc a0 af 57 fc c3 03 c0 4f e3 bb be d6 07 75 9c eb fa a0 a1 18 24 89 2a 8b 75 7e 13 b2 b8 ec 22 a5 2c 19 6f 8c 57 cf 0e 96 bb f7 c6 dd 51 a9 f5 fc ab b6 84 b7 dc 6b b2 2e be 79 2c bc ac cc 23 f1 86 c3 d6 4d 48 12 a0 d5 b4 65 98 49 66 7e f0 27 38 75 65 5c a7 00 5e db 58 d2 b3 24 00 53 d5 fe 3a b8 da ab d9 2b f5 f5 cd f8 09 75 b9 66 88 14 32 02 02 13 83 f1 e2 c1 78 67 d3 a3 ◦ In decimal, a 309-digit number (approximately) 40 The One-Time Pad 41 Is there provably unbreakable encryption? Yes. It’s known as the one-time pad. 42 Bitwise Binary Operators AND (⋀), OR (⋁), XOR (⊕) We perform these operations on longer bit strings by applying the operator to each bit individually. 43 The laws of XOR Identity and cancellation laws: 𝟎⊕𝑎 =𝑎 𝑎⊕𝑎 =𝟎 Commutative and Associative laws: 𝑎⊕𝑏 =𝑏⊕𝑎 (𝑎 ⊕ 𝑏) ⊕ 𝑐 = 𝑎 ⊕ (𝑏 ⊕ 𝑐) Notation: A variable represents a bit string; 0 is a bit string of all zeroes. All bit strings in an equation are the same length. 44 The One-Time Pad encryption scheme (OTP) The key is a string of random bits of the same length as the message. The encryption function is to XOR the plaintext with the key: 𝑐 = 𝐸(𝐾, 𝑚) = 𝑚 ⊕ 𝐾 The decryption function is to XOR the ciphertext with the key. 𝐷(𝐾, 𝑐) = 𝑐 ⊕ 𝐾 Why does this work? = (𝑚 ⊕ 𝐾) ⊕ 𝐾 = 𝑚 ⊕ (𝐾 ⊕ 𝐾) =𝑚 A given key can only be used once (hence the name) 45 Defining One-time Pad Formally as an Encryption Scheme 𝐺(1𝑛) → string of 𝑛 random bits, 𝑛 = |𝑚| Key Generation algorithm G K = G(1n) Encryption algorithm E 𝐸(𝐾, 𝑚) = 𝐾 ⊕ 𝑚 c = E(K, m) Decryption algorithm D 𝐷(𝐾, 𝑐) = 𝐾 ⊕ 𝑐 m = D(K, c) Would OTP work with AND or OR? Why or why not? 46 Why is the one-time pad secure? Intuitively: the operation of XORing with a random string hides (“masks”) all information about the plaintext. More rigorously: For a given ciphertext, all plaintexts of the same length are equally likely to be its decryption. ◦ If you see a 0 in the ciphertext, it’s either from 0 ⊕ 0 or 1 ⊕ 1 ◦ If you see a 1 in the ciphertext, it’s either from 1 ⊕ 0 or 0 ⊕ 1 ◦ In either case, the bit of the original message could be 0 or 1 with equal probability (50%/50%), because the key bits are uniformly random. ◦ This is a mathematical definition of “no information”, from the field of information theory. This is a perfectly secure encryption scheme. 47 Why must you never reuse bits from a one-time pad, not even once? Say the same key K is used for two messages, m and m' : 𝑐 = 𝑚⊕𝐾 𝑐′ = 𝑚′ ⊕ 𝐾 An eavesdropper sees both c and c'. What can she do? XOR them together, of course! 𝑐 ⊕ 𝑐′ = (𝑚 ⊕ 𝐾) ⊕ (𝑚′ ⊕ 𝐾) = 𝑚 ⊕ 𝑚′ ⊕ 𝐾 ⊕ 𝐾 = 𝑚 ⊕ 𝑚′ The eavesdropper learns the XOR of the two plaintexts, which is potentially very revealing. 48 So why not use the one-time pad? It’s not practical for most real-life uses, especially the internet ◦ Requires a new key as long as the message for every message ◦ Privately exchanging such a key every time is impractical Intelligence agencies have used it; cryptographers would prepare two identical paper pads full of random numbers, and agents would receive them in person before going to the field. ◦ …and it was sometimes broken because people ran out of key bits and had to reuse the same pad! 49 Historical Information The one-time pad scheme was patented by Electrical Engineer G. S. Vernam in 1917, for the purpose of encrypting telegraph communications. Its secrecy properties were proved by C. E. Shannon in 1949. Claude Shannon (1916-2001) – the “Father of Information Theory” 50 Attack Models 51 What the eavesdropper can do An essential aspect of the practice of security is to think from the attacker’s perspective; encryption is no different. To have confidence that schemes are secure, we need to show that they stand up against adversaries who may potentially obtain a lot of information. In encryption, an attack model specifies the kinds of information an adversary may gain about the messages being encrypted. ◦ It doesn’t limit how the adversary will try to crack the encryption; we must assume they can try any computation. Cryptographers want to prove that the adversary can’t break the scheme in any reasonable amount of time, even with access to that information. We will see four attack models, in increasing order of what the attacker knows (or is able to learn) 52 1. Ciphertext-only attack The easiest attack type to defend against: the attacker only gets to see ciphertexts c In other words, attacker can do nothing but eavesdropping. Attacker has: a5 16 43 94 c5 86 c 53 2. Known Plaintext Attack The attacker gains possession of some plaintext m and its encryption c. (maybe more than one) Attacker tries to use this knowledge to decrypt other ciphertexts encrypted with the same key (perhaps by finding the key itself) Attacker learns that: a5 16 43 94 c5 86 corresponds to “Sunday” c m Could easily occur in the real world. An email sent to many people could become publicly known. 54 3. Chosen Plaintext Attack (CPA) Attacker can choose any plaintext m he wants and obtain the encryption of it. Attacker chooses any plaintext: “Monday” m Then finds out its encryption: a5 16 43 94 c5 86 c Note: When public-key encryption is used, the attacker always has this power. ◦ Referred to as giving the adversary an “encryption oracle” 55 4. Chosen Ciphertext Attack (CCA) Attacker can choose any ciphertext he wants and get the decryption of it. Attacker chooses any ciphertext: a5 16 43 94 c5 86 c Then finds out its decryption (“decryption oracle”): “Friday” m Only the key is not revealed to the attacker, or the decryption of one “challenge” message 56 How does OTP stack up? The One-Time Pad, if used properly, is still perfectly secure in all four attack models. ◦ For a trivial reason: knowledge of plaintext/ciphertext pairs cannot help when a different key is used with every message. The encryption schemes we study from now on will use the same short key for multiple messages; so these attack models will gain more significance. ◦ The adversary could potentially analyze many, many plaintext/ciphertext pairs to try to break the encryption. 57 Perfect vs. Computational Security Unlike the one-time pad, modern practical encryption schemes are technically all breakable. ◦ When the key is shorter than the message, perfect security is impossible to achieve (proved by Shannon) ◦ An adversary who does not know the key could eventually decrypt the message. The best we can hope for is that decryption by an adversary would take an unreasonably long time, even on the fastest computers/clusters; this is known as computational security. Computational security is mathematically formalized by the notion of indistinguishability— essentially, that there is no efficient way for the adversary to distinguish a ciphertext from random bits. If an encryption scheme is computationally secure, it can only be broken by a brute force attack. 58 Brute Force Attack A brute force attack is to try decrypting a ciphertext with every possible key. a5 16 43 94 c5 86 You know when you’ve succeeded, because with high c probability only one of the keys will produce a meaningful decrypted message. Once found, the key can be used to decrypt future messages. The one-time pad is not vulnerable to a brute-force attack, but all the schemes we’ll see from here on out are. If a scheme is only vulnerable to a brute-force attack, we feel very good about it! 59 How long does a brute force attack take? It’s a function of the key length, since you potentially have to try all possible keys ◦ 128-bit key: up to 2128 guesses ◦ On average, you expect to try half that many before hitting the right one. How long is that in actual time? 60 Brute Force Effort If an encryption scheme can only be broken by brute force, and it allows the key length to be increased, it is effectively “future-proof”. Why? 61 Two ‘toy’ encryption schemes 62 A Very, Very Simple Encryption Scheme Consider a 26-character alphabet. Encrypt by substituting each character with the character a fixed number of places further in the alphabet. Call this number the shift. Example with Shift = 2: VOYAGE becomes XQACIG abcdefghijklmnopqrstuvwxyz This is a Substitution Cipher because letters are substituted for other letters. It’s a very simple kind of substitution cipher, known as a Shift Cipher. What is the key in this scheme? How do you decrypt? 63 Why is this easy to break? For one thing, the number of keys is small: only 26 possible shifts, just try them all (brute force attack) But even if there were a large number of keys, this kind of cipher would be easy to break; brute force isn’t necessary. ◦ All A’s get mapped to the same letter, all B’s, etc. ◦ This is called monoalphabetic substitution We say that this encryption scheme does not hide the statistical properties (patterns) of the plaintext. Such schemes can be broken by statistical analysis of ciphertexts. 64 Statistical Analysis of Letter Frequency in English Text 65 Another Simple Encryption Scheme Take the plaintext and swap every pair of adjacent characters. VOYAGE becomes OVAYEG This is a type of Transposition Cipher, because it works by shuffling the plaintext characters/bytes around. Of course, this one is also easy to break. 66 Almost ready for the real thing The real-world encryption schemes we will study also use substitution and transposition operations (on sequences of bits, not letters of the alphabet). ◦ If you only use one-to-one substitution and transposition, you can guarantee that your algorithm is reversible, and thus ciphertexts can be decrypted. In a sense, the two ‘toy’ schemes we just looked at were not doing the wrong thing; they were just not doing enough of it! 67