class: big, middle

# ECE 7420 / ENGI 9823: Security

.title[
.lecture[Lecture 17:]
.title[Code signing]
]

---

# Today

### Code signing

* interpreted code
* native code

### Platforms

---

# Software installation

### In days of yore:

<img src="https://www.unixmen.com/wp-content/uploads/2016/08/make-install-2.png" align="right" width="550"/>

--

* build the code

???

Once upon a time (or still today in some Unix environments!) the way to install code was to run `make install` (or worse, `make` followed by copying files into place).

---

# Software installation

<img src="https://i.stack.imgur.com/Ld8Uy.jpg" align="right" width="550"/>

### In days of yore:

* build the code
* setup programs

???

Once upon a time (or still today in some Unix environments!) the way to install code was to run `make install` (or worse, `make` followed by copying files into place).

Things evolved a bit with **setup programs** that could, well, set up a program. These programs typically ran (at least on PCs) in the days before Biba policy application, so they could generally **put whatever they wanted wherever they wanted**.

---

# Software installation

<img src="https://img-16.ccm2.net/sko1tI0wIJ10h6h_mFKNKFjr-c8=/500x/c4de28d80fd4459385b6312a68bf93d8/ccm-download/installer.jpg" align="right" width="550"/>

### In days of yore:

* build the code
* setup programs

### Coherence:

* installer frameworks

???

Once upon a time (or still today in some Unix environments!) the way to install code was to run `make install` (or worse, `make` followed by copying files into place).

Things evolved a bit with **setup programs** that could, well, set up a program. These programs typically ran (at least on PCs) in the days before Biba policy application, so they could generally **put whatever they wanted wherever they wanted**.

Eventually, this led to the creation of coherent abstractions for software installation.
In Windows-land, Microsoft pulled the rug out from underneath InstallShield by creating the Windows Installer framework, which allowed applications to specify what files they needed to install, registry keys they needed to update, etc., without having to resort to arbitrary code execution at install time. Clean abstractions lead to nice outcomes like idempotency, coherent transactions and rollback/uninstallation, etc. (though you _can_ still run arbitrary code via [hilarious workarounds](https://stackoverflow.com/questions/98778/executing-a-script-file-from-a-windows-installer-custom-action)).

---

# Software installation

<div style="float: right; text-align: right">
<img src="https://upload.wikimedia.org/wikipedia/commons/8/8a/Apt-get_logo.jpg" width="250"/>
<br/>
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTxOwHrXwOjXv_Ynh-WUdmJjKEeqJcxDsiUHAcsOa6AwGJyUkAulSfriVidmLpRRZF01tw&usqp=CAU" width="250"/>
<br/>
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSHdkhwxBjU22ZZ8TI1KPokvXK4hwAftNL8wXqMKl5bWWSIYeo9a7OR2cuwu7t1x9Qa1Qk&usqp=CAU" width="250"/>
<br/>
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/00/RPM_Logo.svg/640px-RPM_Logo.svg.png" width="150"/>
</div>

### In days of yore:

* build the code
<img src="https://learn-inside.com/wp-content/uploads/2019/09/chocolate-feature-800x400.png" align="right" width="400"/>
* setup programs

### Coherence:

* installer frameworks
* package managers

???

Once upon a time (or still today in some Unix environments!) the way to install code was to run `make install` (or worse, `make` followed by copying files into place).

Things evolved a bit with **setup programs** that could, well, set up a program. These programs typically ran (at least on PCs) in the days before Biba policy application, so they could generally **put whatever they wanted wherever they wanted**.

Eventually, this led to the creation of coherent abstractions for software installation.
In Windows-land, Microsoft pulled the rug out from underneath InstallShield by creating the Windows Installer framework, which allowed applications to specify what files they needed to install, registry keys they needed to update, etc., without having to resort to arbitrary code execution at install time. Clean abstractions lead to nice outcomes like idempotency, coherent transactions and rollback/uninstallation, etc. (though you _can_ still run arbitrary code via [hilarious workarounds](https://stackoverflow.com/questions/98778/executing-a-script-file-from-a-windows-installer-custom-action)).

In the open-source world, the abstraction of a "package" was typically managed by a _package manager_, which would also add the benefit of dependency management and automatic fetching and installation of dependencies (after all, you only want to run freely-available open source software, right?). Packages could contain arbitrary setup scripts to, e.g., set up new users and groups, but over time we're moving towards having package managers handle these things too so that there doesn't need to be any **arbitrary code execution**. Hooray!

---

# Software installation today

```terminal
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ curl -sSL https://get.rvm.io | bash
$ curl -L https://omnitruck.chef.io/install.sh | sudo bash
```

???

Having learned all of these lessons (reduce arbitrary code execution as much as possible, use abstractions that can be rolled up into reversible transactions, etc.), we've proceeded to throw them all away over the last decade or so.

--

### What are the risks?

--

* running with user privilege?

???

Installation scripts that run with user privilege can **do anything that your user account can do**. "But I can inspect the script and make sure it doesn't do anything malicious," you say... sure, but do you?

--

* running with system privilege?

???
Installation scripts that run with superuser privilege can **do anything that the superuser can do.** That sounds pretty scary! However, **package managers can also run arbitrary scripts as root**... so what's the critical difference?

---

# Recall: integrity

--

### Where did that high-integrity software come from?

--

### Where do your software updates come from?

???

You might _think_ you're getting Windows updates from update.microsoft.com, but how do you know? We'll see in the Network Security module that it isn't actually all that hard to spoof domain names on many networks (including Memorial's!), so **what are you trusting** when you install a Windows update? What is your **TCB**?

--

### Can we check these things _after_ download?

???

Is there anything that we can do to authenticate software updates _after_ we download them in order to **keep the whole network stack out of our TCB**?

---

# Recall

### Message authentication codes

Allow us to verify things...

--
but what's the problem?

???

We can use a MAC to validate the integrity of a message, but only if **both parties know the symmetric key**. That's not a great fit for such an **asymmetric** use case as software updates, where one vendor may send updates to millions (or billions!) of end users.

--

### Digital signatures

$$ S = D_{K_S} \left\\{ h(M) \right\\} $$

???

How about digital signatures? A vendor can create a signature over a message (e.g., a software package) and ship that along with the software itself.

--

$$ V = E_{K_P} \left\\{ S \right\\} \overset{?}{=} h(M^\prime) $$

???

Then, anyone who has the vendor's public key can verify that the package was actually sent by that vendor. However, **how can I reliably acquire the public key of every software vendor that I interact with**? Moreover, **what happens if the vendor ever loses their private key**? Do they have to expect **perfection** of their code signing system/team?
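To make the two formulas on this slide concrete, here is a minimal sketch of sign-then-verify using textbook RSA in Python (3.8 or later, for the modular inverse via `pow`). Everything named here is an assumption made for illustration: the toy key built from two small Mersenne primes, the helper names `h`, `sign` and `verify`, and the lack of padding. A real code signing system would use a vetted cryptographic library instead.

```python
import hashlib

# Toy key pair built from two Mersenne primes (an assumption of this sketch).
# Far too small and too structured for real use, and there is no padding.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # modulus shared by both keys
E = 65537                              # public exponent:  K_P = (E, N)
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent: K_S = (D, N)


def h(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % N


def sign(message: bytes) -> int:
    """S = D_{K_S}{ h(M) }: apply the private key to the hash."""
    return pow(h(message), D, N)


def verify(received: bytes, signature: int) -> bool:
    """V = E_{K_P}{ S }, then check whether V equals h(M')."""
    return pow(signature, E, N) == h(received)


if __name__ == "__main__":
    package = b"contents of some software package"
    signature = sign(package)
    print("original verifies:", verify(package, signature))
    print("tampered verifies:", verify(package + b" plus malware", signature))
```

Only the holder of `D` can produce a signature that `verify` accepts, but anyone holding `(E, N)` can run the check, which is the asymmetry that a MAC lacks.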
---

# Code signing

.floatright[
<img src="Chain_of_trust.svg" width="500"/>

.caption[
Source:
[Wikimedia Commons](https://commons.wikimedia.org/wiki/File:Chain_of_trust.svg)
]
]

#### Chain of Trust

--

I sign my code with my key

--

_Certificate authority_ signs that

--

*

--

A _root CA_ is ultimately **trusted**

???

Rather than distributing every vendor's public key reliably, we can distribute a (comparatively) small number of high-value root CA keys. Their corresponding private keys need to be protected, which is part of why we have a chain of trust: **the CA only rarely takes their root key out for use**. Instead, they **sign an intermediate key with the root key** and then **use the intermediate key for actual signing**.

--

#### Common instantiation: X.509

* standard for representing _who_ has signed _what_

--

* aside: ASN.1 BER originally a telco standard, easy to get wrong

???

ASN.1 is a classic example of an overcomplicated standard making what ought to be simple extraordinarily complex and error-prone. It features _more than one_ set of binary encoding rules, and getting those serialization and deserialization rules right has led to a [shocking number of security vulnerabilities](https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=asn.1). It shouldn't be this hard, but it is.

---

# Code signing certificates

#### Hype vs fact

.centered[
<img src="java-code-signing-cert.png" width="800"/>
]

???

Code signing is useful, but like anything to do with security (or computing, or life really), don't believe the hype!

---

# Purpose of code signing

### Goal: verify _identity_ of code

--

> This OS update was released by Microsoft, and I'm choosing to trust
> Microsoft, so I will choose to trust this OS update

--

### Note: not verifying _goodness_ of code

> This code was signed by Microsoft, so it doesn't have any bugs

???

This skeptical attitude towards verification is useful across all of computer security (and, again, across all of life!).

"My bank called and said that..."
Someone who **claimed to be from my bank** called and said that...

"This code is signed, so it must be trustworthy."

This code is signed, so we can attribute its origin to a vendor (or, even better/worse, we can attribute its signature to a root CA who **we trust to have been following the right procedures**).

"We can trust this shipping manifest because it's on the blockchain."

This possibly-fraudulent manifest was published before some other stuff was published. Probably.

---

# Code verification

### When?

???

We can verify digital signatures on code at install time or at run time. The details of the techniques will differ slightly, but what's really interesting is what **policies** these two **mechanisms** provide support for.

--

<img src="https://community.intel.com/t5/image/serverpage/image-id/8304i323CB4700615CD1E/image-size/large?v=1.0&px=999" align="right" width="550"/>

* installation time

--

* run time

--

### How?

---

# Java

--

<img src="java-getPermissions.png" align="right" width="500"/>

### `SecureClassLoader`

* loads code like any `ClassLoader`

--

<img src="java-CodeSource.png" align="right" width="450"/>

* adds `CodeSource` and `getPermissions`

--

#### Code signing tied to privileges

???

Signed JAR files can be loaded and associated with permissions that other code loaded at run time wouldn't have. For example, if I write a Java program that provides run-time plugin support, I can write a `SecureClassLoader` that will give my own plugins (or plugins that I've signed through my online store) permission to access the filesystem, but unsigned plugins may not be able to perform any such operations.

--

(no _confused deputy_)

???

Java provides support for tracking these privileges up and down the call stack, so that the Java runtime can tell whether code is being invoked [on behalf of only privileged code](https://www.ibm.com/docs/en/was-nd/8.5.5?topic=security-access-control-exception-java).
This avoids the _confused deputy_ problem, in which privileged code is tricked into executing a privileged operation on behalf of malicious and unprivileged code (for example, a malicious plugin calling a legitimate method that saves data into a configuration file, allowing malicious data to be written there).

---

# Native code signing

### That's nice for Java...

???

This works really well for bytecode-interpreted languages, where the language runtime is able to interpose itself in the execution of a program.

--

### ... but how about native code?

???

When executing native code, however, there is no monitor checking every instruction that gets executed. The only thing outside of the binary code which sees the instructions is the processor itself, and it ain't checking digital signatures when you invoke a function!

--

### The processor doesn't verify signatures... who does?

---

# Signed native code

--

<img src="https://1.bp.blogspot.com/-8JPq3oo1Qe8/XPfWIkz2lgI/AAAAAAAAC2I/LFomsIUexLsp6xeKopKxEMEtkPkE55iAgCLcBGAs/s1600/apt-update-gpg-error-no_pubkey.png" align="right" width="350"/>

### Installation time:
--
_installer_

???

Installers can check signatures and require that packages have been signed by someone with a trusted public key. That does, however, lead to funny advice sometimes, e.g.:

```
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
```

--

### Module load time:
--
_kernel loader_

--

### Ordinary applications:
--
_kernel_

<img src="https://i1.wp.com/jimmytechsf.com/wp-content/uploads/2019/10/Screen-Shot-2019-10-09-at-8.16.21-AM.png?ssl=1" align="right" width="350"/>

--

### Bootstrapping:
--
a bunch of places!

---

# Windows

--

<img src="https://upload.wikimedia.org/wikipedia/en/6/61/XPDriverWarning.png" width="300" align="right"/>

#### Device drivers

???
The Windows Hardware Quality Labs (WHQL) initiative didn't so much start as a security initiative as a "stop blaming us" initiative. Hardware vendors aren't always good at software, and buggy device drivers have caused a lot of Blue Screens of Death over the years. Microsoft created the WHQL program to ensure that, if a driver was to be loaded by the Windows kernel, it would have passed some quality assurance tests first (some of which were actually pretty cool, e.g., proving termination of interrupt handling routines).

--

#### Windows update

--

#### SmartScreen and EV certificates

--

#### Installers and UAC

<img src="uac.png" width="300" align="right"/>

---

# macOS

<img src="app-store-only.png" width="500" align="right"/>

### Not just for drivers!

--

### macOS vs iOS

???

A key distinction between macOS and iOS (at least for now) is that **you can turn mandatory verification off in macOS**. The mechanism exists in both, but the policy is different. However, **who knows if that freedom to install whatever you want will remain**...

---

# Unix-ey systems

--

<img src="https://1.bp.blogspot.com/-8JPq3oo1Qe8/XPfWIkz2lgI/AAAAAAAAC2I/LFomsIUexLsp6xeKopKxEMEtkPkE55iAgCLcBGAs/s1600/apt-update-gpg-error-no_pubkey.png" align="right" width="500"/>

### Package managers

--

### Merkle trees

???

Signing only the root of a package tree is a nice example of a Merkle DAG (which is also a key technology used in other places, e.g., copy-on-write filesystems and blockchains). If you sign a hash of a bunch of hashes of a bunch of hashes, you can effectively sign **an enormous tree of content** with a single signature!

---

# Merkle trees

.floatright[
<img src="merkle-tree-small.png" width="600"/>
]

--

### Commitment

???

A cryptographic hash can be used to express a _commitment:_ without revealing any data now, I can promise you what data I will reveal in the future. When I reveal the data, you can check that it matches the commitment using the hash that I already gave you.
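Both ideas (committing to one piece of data with a hash, and covering many pieces of data with a hash of hashes) fit in a few lines of Python. This is a sketch under stated assumptions rather than any particular package manager's format: SHA-256 is an arbitrary choice, the function names are made up for illustration, and duplicating the last hash on odd-sized levels is just one convention among several. Note also that a bare hash only hides data that is hard to guess, so practical commitment schemes usually mix in a random nonce.

```python
import hashlib
from typing import List


def commit(data: bytes) -> str:
    """Publish this digest now; reveal `data` itself later."""
    return hashlib.sha256(data).hexdigest()


def check_reveal(commitment: str, revealed: bytes) -> bool:
    """Does the revealed data match the commitment made earlier?"""
    return hashlib.sha256(revealed).hexdigest() == commitment


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash every leaf, then hash pairs of hashes until one root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption of this sketch: pad odd levels by repeating the last hash.
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


if __name__ == "__main__":
    promise = commit(b"the data I will reveal later")
    print(check_reveal(promise, b"the data I will reveal later"))
    print(check_reveal(promise, b"the data I will reveal Later"))

    packages = [b"package A", b"package B", b"package C"]
    print(merkle_root(packages).hex())
```

Signing the single value returned by `merkle_root` then speaks for every leaf underneath it: altering any one package changes the root, so the signature no longer matches.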
If I change even a single bit in the data, it will alter the hash in dramatic ways.

---

# Merkle trees

.floatright[
<img src="merkle-tree-bigger.png" width="550"/>
]

### Commitment

### Layers

.footnote[
Merkle, R. C., "A Digital Signature Based on a Conventional Encryption Function", _Advances in Cryptology — CRYPTO '87_, 1987.
DOI: [10.1007/3-540-48184-2_32](https://doi.org/10.1007/3-540-48184-2_32).
]

???

A cryptographic hash can be used to express a _commitment:_ without revealing any data now, I can promise you what data I will reveal in the future. When I reveal the data, you can check that it matches the commitment using the hash that I already gave you. If I change even a single bit in the data, it will alter the hash in dramatic ways.

A hash of data allows us to commit to that data. A hash of a bunch of hashes of data allows us to commit to all of the data. We can add more layers arbitrarily to this tree, allowing a single hash to speak for arbitrary amounts of data.

---

# Merkle DAGs

.centered[
<img src="merkle-dag.png" width="700"/>
]

???

Technically, a Merkle tree can actually be a more general DAG. This is because various pieces of data within the DAG can actually reference each other, but never with cycles: the only way to generate a hash $h(X)$ is to have $X$ in its entirety, so $X$ can't include a hash that depends on $h(X)$ itself.

---

# Aside: blockchain

.floatright[
<img src="blockchain.png" width="700"/>
]

--

### Hashes

???

Hash functions are now old hat to us. Nothing new here.

--

### Merkle trees

???

Using Merkle trees, each block in the blockchain can refer to an arbitrary amount of data. This could represent shipping manifests, quasi-financial transactions or anything else you might care to think of.

--

### Blocks

???

The new thing about a blockchain is that it has **blocks** which represent a **canonical serialization** of data added to the chain.
This **serialization** doesn't depend on timestamps per se, it depends on the logical **happens-before** relationship that's indicated by a hash function: in order to hash the previous block, it must have been available to you **before you created your hash** (which went into the new block).

--

### Permissions

???

The tricky thing about anything "canonical" is deciding **how that canon is determined / recognized**. People can get [surprisingly detailed about these questions when discussing things like comic books](https://www.quora.com/What-defines-a-comic-as-canon-or-non-canon-Marvel-and-DC). If we're going to establish a canonical serialization of all the transactions that have occurred in our new crypto-nerd utopia, who gets to decide whether my preferred transaction ordering is correct or yours is?

In a _permissioned_ blockchain, we can express authority to say "this happened next" via cryptographic mechanisms like digital signatures. Every client can check, "was this signed using a public key whose certificate was signed by an appropriate authority?"

In an _unpermissioned_ blockchain, we need some other way to determine who gets to say what comes next. Public blockchains like Bitcoin and Ethereum (at least for now) use a **proof-of-work** scheme in which whoever can solve a cryptographic puzzle gets to say what comes next: find $x$ such that $h(prev, x)$ starts with at least $n$ zeroes. As $n$ increases, this requires [an obscene amount of energy for computation](https://www.nature.com/articles/s41467-021-22256-3), so much so that [some people want to use all of Muskrat Falls to make a small dent in the global demand](https://www.saltwire.com/atlantic-canada/news/labrador-blockchain-company-wants-all-the-power-100598148/). Ethereum might eventually switch to a **proof-of-stake** scheme in which all of the most well-moneyed interests get to say what's what... and that's better than fiat currency how?
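For the curious, the puzzle above can be sketched in a few lines of Python. This is an illustration only, not how any real blockchain encodes its blocks: SHA-256, the 8-byte encoding of $x$, counting zeroes as hexadecimal digits, and the names `pow_hash`, `solve` and `check` are all assumptions of the sketch.

```python
import hashlib
from itertools import count


def pow_hash(prev: bytes, x: int) -> str:
    """h(prev, x): hash the previous block's hash together with a candidate x."""
    return hashlib.sha256(prev + x.to_bytes(8, "big")).hexdigest()


def solve(prev: bytes, n: int) -> int:
    """Brute-force search for x such that h(prev, x) starts with n zero hex digits."""
    target = "0" * n
    for x in count():
        if pow_hash(prev, x).startswith(target):
            return x


def check(prev: bytes, x: int, n: int) -> bool:
    """Verifying a claimed solution takes one hash, however long the search took."""
    return pow_hash(prev, x).startswith("0" * n)


if __name__ == "__main__":
    prev = hashlib.sha256(b"previous block").digest()
    x = solve(prev, 4)
    print(x, pow_hash(prev, x), check(prev, x, 4))
```

Each extra zero hex digit multiplies the expected search time by 16 while verification stays at a single hash, which is where the energy bill comes from.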
(/end skeptical rant)

---

# Unix-ey systems

<img src="https://1.bp.blogspot.com/-8JPq3oo1Qe8/XPfWIkz2lgI/AAAAAAAAC2I/LFomsIUexLsp6xeKopKxEMEtkPkE55iAgCLcBGAs/s1600/apt-update-gpg-error-no_pubkey.png" align="right" width="500"/>

### Package managers

### Merkle trees

--

### `veriexec`

???

Once you've installed your signed software, it's also possible to check digital signatures at run time when you execute a program. JunOS / NetBSD / FreeBSD have a `veriexec(1)` scheme that allows execution to be limited to signed files only. If you have an embedded appliance running a large portion of the Internet backbone, you probably want to ensure that only your code is running on that appliance!

However, much like Data Execution Prevention (DEP, used for `W^X` and `noexecstack`), that doesn't prevent an attacker from "living off the land" if they manage to subvert your code. It does, however, make their job harder, which is a worthwhile thing.

--

#### ... but where does it all start?

---

# Boot process

.centered[
(a.k.a., _bootstrapping_)
]

--

### "Secure" boot

???

The term "secure" boot is, unfortunately, a bit ambiguous. It can be used to mean one of two **very different** things.

--
, "verified" boot

???

_Verified boot_ means that a hardware root of trust (typically the platform firmware, which may work alongside a **Trusted Platform Module**, or TPM) gets involved in the boot process. This allows the bootloader's digital signature to be checked and policies such as "you must use a bootloader signed by Microsoft" can be **enforced**. If that initial "root of trust" verification fails, the system doesn't boot.

--
, "measured" boot...

???

An alternative is called _measured boot_, in which each piece of boot code is hashed ("measured") and the results are **stored in the TPM** for later inspection by software. Nothing stops the computer from booting with unsigned code, or code with an incorrect signature, but software can later check to see what code booted it.
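The "measure now, inspect later" idea can be sketched as a hash chain in Python. This is loosely modelled on how a TPM extends its platform configuration registers (new value = hash of old value followed by the new measurement), a detail that goes beyond what is on the slide. The register size, the SHA-256 choice and the names `extend` and `measure_boot` are assumptions of the sketch, not a real TPM interface.

```python
import hashlib
from typing import Iterable

REGISTER_SIZE = 32  # bytes in a SHA-256 digest


def extend(register: bytes, component: bytes) -> bytes:
    """New register value = H(old value || H(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(register + measurement).digest()


def measure_boot(components: Iterable[bytes]) -> bytes:
    """Fold every boot stage, in order, into a single register value."""
    register = bytes(REGISTER_SIZE)  # the register starts out as all zeroes
    for component in components:
        register = extend(register, component)
    return register


if __name__ == "__main__":
    expected = measure_boot([b"firmware", b"bootloader", b"kernel"])
    observed = measure_boot([b"firmware", b"evil bootloader", b"kernel"])
    print("boot chain matches expectation:", observed == expected)
```

Because the register can only be extended and never set directly, code that runs later in the boot cannot erase the record of what ran before it.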
In particular, the TPM can provide its "measurement list" as part of a **remote attestation** procedure, allowing, e.g., a server to only accept connections from computers booted from specific software signed by specific vendors.

--

### _Trusted Computing_ initiative

???

This is all part of the _trusted computing_ initiative, which was enormously controversial when it was introduced. On the one hand, "trusted computing" could help identify what code was running where, which could have some security benefit. On the other hand, it could also be used to prevent users from accessing DRM-protected content in "unapproved" ways or even prevent users from running "unapproved" OSes. That is to say, the ownership model of your computer would look more like that of your (non-jailbroken) phone: it wouldn't really be **your computer**.

--

### UEFI, "Certified for Windows 10/RT"...

???

These days [Microsoft has a program](https://docs.microsoft.com/en-us/windows/security/information-protection/secure-the-windows-10-boot-process) via which they will sign an open-source bootloader like GRUB for a small fee. This allows even systems locked down with mandatory verified boot to run non-Windows operating systems **if the owner wants that**.

---

# Summary

### Code signing

* interpreted code
* native code

### Platforms

---

class: big, middle

The End.