class: big, middle # ECE 7420 / ENGI 9823: Security .title[ .lecture[Lecture 12:] .title[Modes, MACs and hashes] ] --- # Last time ### Classical cryptography ⇒ one-time pad ### Block ciphers -- ## Today: Modes, MACs and hash functions --- # Block cipher modes .floatright[ <img src="tux-ecb.png" width="200"/> <img src="beastie-ecb.png" width="200"/> ] ### Sounds pretty secure, right? ### Uhhh... .footnote[ Encrypted images generated with [encrypt-image.py](encrypt-image.py) ] -- * passing the same plaintext to a block cipher with the same key will yield the same ciphertext output -- * block ciphers alone lacks _semantic security_ ??? Semantic security (see: [Encyclopedia of Cryptography and Security, 2011 Edition, Springer](https://link.springer.com/referenceworkentry/10.1007/978-1-4419-5906-5_23)) is defined as the _indistinguishability_ of encryptions, i.e., an adversary cannot tell which of two candidate plaintexts has been encrypted to ciphertext. -- ##### Can you tell which of these is $m_0$ and which is $m_1$? --- # Block cipher modes ??? Block cipher modes are schemes for handling **multiple** blocks of plaintext and ciphertext. There are lots of modes (ECB, CBC, CTR, GCM, XTS, ...), each of which can be used with **any block cipher**. So, to identify a cipher, we need more than just the **algorithm** (e.g., AES): we also need to specify the **mode**. For example, AES-128-CBC is different from AES-128-GCM. -- .floatright[ <img src="tux-ecb.png" height="150"/> <img src="beastie-ecb.png" height="150"/> ] ### _Electronic codebook_ (ECB) mode * "bare" block cipher * encrypt each chunk of plaintext directly -- .floatright[ <img src="tux-cbc.png" height="150"/> <img src="beastie-cbc.png" height="150"/> ] ### More sophisticated modes * provide semantic security * e.g., Cipher Block Chaining (CBC) --- # Cipher Block Chaining .centered[ <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/CBC_encryption.svg/800px-CBC_encryption.svg.png"/> ] -- ## Ciphertext depends on _all_ previous blocks ??? The fact that each block of ciphertext depends on all previous blocks is an example of **diffusion** at work. --- # Other modes -- ### CTR and GCM modes * used to make **stream ciphers** out of block ciphers -- ### XTS mode * used for full-disk encryption -- #### ... and many others ... --- # Message Authentication Code .centered[ <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/CBC_encryption.svg/800px-CBC_encryption.svg.png" width="500" align="right"/> ] #### What if we: -- 1. encrypt in CBC mode and -- 2. throw away most of the ciphertext? -- #### Message Authentication Code (MAC): * _cryptographic_ checksum that can verify message integrity **even** in the presence of an attacker -- (vs. checksum like CRC32) --- # MAC Requirements <img src="https://y.yarn.co/a66ed947-2394-483f-9209-8e279f3cc191_screenshot.jpg" width="500" align="right"/> ??? Note: the Sealed Authenticator System (SAS) codes on a nuclear-armed submarine probably don't use keyed MACs, but rather purely-random codes that no human eyes have ever seen. Source: [Waller, "Practicing for Doomsday", Time Magazine, 4 Mar 2001]( http://content.time.com/time/magazine/article/0,9171,101361-3,00.html). -- 1. Arbitrary-length message -- 2. Small, fixed MAC length -- 3. Computationally efficent -- 4. _Collision resistance:_ -- * can't generate another message with the same MAC -- * can't generate another message with any valid MAC --- # MAC generalization -- ### Newer modes: _Authenticated encryption with associated data_ (AEAD) -- ### What if we don't want to use a key? ??? But why wouldn't we want to use a key? --- # AAA[A] | Category | Question | |:-|:-| | **Authentication** | Is something/someone authentic (is it really you)? | **Authorization** | Are you allowed to do that? | **Accounting** | Who has used which resources? | **Audit** | Who did what to what? -- #### _Message_ authentication ??? Examples of **message authentication** include the authenticated orders in _Crimson Tide_ and the payment authorization messages described by the [EMV protocol](https://www.emvco.com). In both of these cases, there are **secrets** required besides the **message** itself. -- vs _principal_ authentication ??? When authenticating **someone** instead of **something**, we can use messages in which the message itself is the secret, for example... --- # Passwords -- ### Old and terrible, but... ??? We'll talk later in the term about protocols that we can use for authentication based on a third party, but at some point, _somebody_ has to store a password -- ### Dictionary attack ??? A dictionary attack is a brute-force attack: instead of trying every possible key for a cipher, you try every possible password from a dictionary. This is generally cleverer than trying "aaaaaa", "aaaaab", etc., as some passwords are (unfortunately) likelier to be chosen than others. Also, the dictionary may include more than just "dictionary" words! -- * online -- * offline -- – ??? --- # Threats to authentication -- ### External threats -- * password guessing -- * MAC-based challenge/response guessing -- — human-computable? -- ### Internal threats -- * password database could be stolen -- * ... but so could a secret key for validating MACs! ??? MAC-based schemes only work when the secret key **is actually secret**. We can't guarantee that in general-purpose computers. We'll talk later about public-key schemes that can help with the theft issue, but they don't help with the human-computability problem. --- # Cryptographic hash functions ### Remember hash tables' hash functions? ??? These properties sound like some of the properties of MACs: variable-length input, fixed-length output, computationally efficient and avoiding collisions. However, while regular hash functions try to avoid collisions, they do happen, because the consequences of a collision aren't terribly serious. If we start to see lots of collisions in a hash table, we can always increase the size of the table. -- * variable-length input -- * fixed-length output -- ### Cryptographic hash functions ??? _Cryptographic_ hash functions, however, are something entirely different. A cryptographic hash function should still be fairly efficient to compute (in practice, we can hash millions of MB/s), but efficiency has to be traded off for _much_ stronger **collision resistance**. Once we start sending messages around with cryptographic hashes, we can't recall all of the messages and re-hash them. Instead, we must be very strict about **collisions** up front. -- MD4, MD5, SHA-1, RIPEMD-160, Whirlpool, SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256), SHA-3, BLAKE2/3... --- # Cryptographic hash function -- ### _Diffusion:_ small changes ⇒ large effects -- ### All values should be equally likely -- ### Should resist: Collision attack ??? ### Collision attack Finding **any two messages** that hash to the same value. When we get to digital signatures, we'll see that collision attacks can be quite important: if you can generate two messages with different meanings but the same hash, you can cause a lot of trouble! However, such attacks aren't so useful for password security. Even with the strongest hash function, collisions are **easier to find than you might think** due to the [birthday paradox](https://en.wikipedia.org/wiki/Birthday_problem). However, "easier" doesn't have to be "easy": if the hash output is large, you can still have a lot of work to do! $\sqrt{2^n}$ can still be a large number if $n$ is big enough... -- : find $X_1, X_2$ s.t. $h(X_1) = h(X_2)$ -- Preimage attack ??? ### Preimage attack Finding an input that hashes to the same value as a given hash. This could be the same input that was originally used to generate the hash or a different one. -- : given $h(X_1)$, find $X_2$ s.t. $h(X_1) = h(X_2)$ -- 2nd preimage attack ??? ### Second preimage attack Finding a **different** input that will hash to the same value as a given input. This is like a collision attack, but much harder: instead of generating lots of messages and finding two that hash to the same value, you have to find one that hashes to the same value **as a specific input**. -- : given $X_1$, find $X_2 \neq X_1$ s.t. $h(X_1) = h(X_2)$ --- # Password hashing ### What does this have to do with passwords? -- ### Resisting *offline* dictionary attacks* .footnote[ * see, e.g., <a href="https://www.openwall.com/john/">John the Ripper</a> ] ??? We don't need _any_ cryptography to resist an online dictionary attack. Protecting password databases is, instead, all about resisting **offline attack**, where an adversary has gained access to a password database and they want to get passwords from it. Without any cryptography, they can simply do a database lookup. With cryptography, however, we can make things much harder for them. --- # Password hashing ### What does this have to do with passwords? ### Resisting *offline* dictionary attacks* ### Rainbows† and salt .footnote[ * see, e.g., <a href="https://www.openwall.com/john/">John the Ripper</a> † Oeschslin, "Making a Faster Cryptanalytic Time-Memory Trade-Off", CRYPTO 2003: Advances in Cryptology - CRYPTO 2003, 2003. DOI: [10.1007/978-3-540-45146-4_36]( https://dx.doi.org/10.1007/978-3-540-45146-4_36). ] ??? We don't need _any_ cryptography to resist an online dictionary attack. Protecting password databases is, instead, all about resisting **offline attack**, where an adversary has gained access to a password database and they want to get passwords from it. Without any cryptography, they can simply do a database lookup. With cryptography, however, we can make things much harder for them. As a (very bad!) alternative to password hashing, check out [this analysis](https://nakedsecurity.sophos.com/2013/11/04/anatomy-of-a-password-disaster-adobes-giant-sized-cryptographic-blunder) of a major password database breach at Adobe. -- ### Iterative password hashing (KDFs) ??? Tools like GPUs are really good at parallel computation. Attackers can use them to try lots and lots of passwords concurrently to see if they can find the correct one (a bit like the Bombes in Bletchley Park!). **Key Derivation Functions** (KDFs) make life harder for an attacker by forcing computation to be **serial**. There is a cost for the user, too, but it's insignificant compared to the benefit of not having your password cracked when a business suffers a data breach! --- # What makes a good password? -- ## (we'll answer this next time) --- # MAC generalization ### ~~What if we don't want to use a key?~~ -- ### What if we don't use a block cipher? -- #### HMAC: hash-based message authentication code* .footnote[ * Bellare, Canetti and Krawczyk, "Keying Hash Functions for Message Authentication", [CRYPTO 1996](https://link.springer.com/chapter/10.1007/3-540-68697-5_1), 1996. Standardized by NIST ([FIPS 198-1](https://csrc.nist.gov/publications/detail/fips/198/1/final)) and the IETF ([RFC 2104](https://datatracker.ietf.org/doc/html/rfc2104)). ] -- $$ h\left((k \oplus p_o) || h((k \oplus p_i) || text) \right) $$ ??? An HMAC uses a hash function _with_ a key. This provides the same security properties as a block-cipher–based MAC, just with a different underlying cryptographic algorithm. HMACs are pretty popular in circumstances where you'd be doing a bunch of hashing anyway (e.g., Transport Layer Security cipher suites, which we'll talk about later). --- class: big, middle The End.