class: big, middle

# ECE 7420 / ENGI 9823: Security

.title[
.lecture[Lecture 7:]
.title[Finding memory unsafety]
]

---

# Last time

### Memory safety

### Memory-safe language concepts

### Safe unsafe languages?

---

# So... perfection?

### Write all software in a memory-safe language?

### TCB considerations

### Memory safety in compiled languages

1. Compiler-added run-time safety checks
2. Limited unsafety
3. Continued dangers of native instructions

???

High-level language interpreters have to be written in something. You might be
able to write a lot of a Java interpreter in Java, but at the lowest levels you
will find lots and lots of C++ code. At the lowest levels of the C standard
library, you will find **assembly code**, sometimes **generated from scripts**.

Languages like **Go** and **Rust** claim to provide memory safety, but they are
compiled languages. How is this possible? The compiler can add extra code to
check some accesses at run time. For example, if you are indexing within an
array, the compiler can implicitly add code such as `if 0 <= i < n`.

Languages that aspire to "systems programming" (i.e., programming that has to
be aware of, or manipulate, the lowest-level primitives such as hardware
registers) have to allow for unsafe operations. There is no memory-safe way to
perform arbitrary register, memory or I/O operations, so these languages have
to provide some way to break abstraction layers. C code can include assembly
via the `asm` keyword. Rust code can explicitly violate memory safety
guarantees if it uses the `unsafe` keyword.

Even with those checks, however, if you load someone else's native instructions
and execute them, **all bets are off**!

---

# Safe compiled code?

### What is a language?

### Software

[AddressSanitizer](https://clang.llvm.org/docs/AddressSanitizer.html),
[CCured](https://doi.org/10.1145/565816.503286),
[Cyclone](https://cyclone.thelanguage.org/wiki/Papers),
"fat pointers",
[Go](https://golang.org),
[Rust](https://www.rust-lang.org), ...

### Hardware:

[Arm MTE](https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety),
[CHERI](https://doi.org/10.1109/ISCA.2014.6853201),
[Hardbound](https://doi.org/10.1145/1353534.1346295),
[MPX](https://doi.org/10.1145/3224423),
segmentation,
[Watchdog](https://doi.org/10.1109/ISCA.2012.6237017), ...

???

When we think of a language, we typically think about **source code** and the
**rules** for writing it. In addition to those rules, however, a language is
also defined by its **tools** and — crucially — its **runtime support
libraries**. If we take this expanded view of what makes a language, we can
see a number of approaches applied in various places that can be used to
improve the security of compiled code, too.

### Software

[AddressSanitizer](https://clang.llvm.org/docs/AddressSanitizer.html) (and
other "sanitizers" like ThreadSanitizer and UndefinedBehaviorSanitizer) can
help spot memory errors during testing that might otherwise have gone
unnoticed.

[CCured](https://doi.org/10.1145/565816.503286) is an example of an approach
that uses static analysis to figure out how pointers in a C program are
"meant" to be used and dynamic analysis to ensure that they are, in fact,
used that way.

[Cyclone](https://cyclone.thelanguage.org/wiki/Papers) is a C dialect with
better memory safety properties than vanilla C, which it is designed to be
compatible with (or at least easy to adapt from).
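Returning to AddressSanitizer for a moment: as a minimal, purely illustrative
sketch (the file and variable names are made up), the following one-byte heap
overflow typically goes unnoticed in an ordinary build but is reported as a
`heap-buffer-overflow` when compiled with `-fsanitize=address`:

```c
/* overflow.c: writes one byte past the end of an 8-byte allocation.
 * Build and run under ASan:  clang -g -fsanitize=address overflow.c && ./a.out
 */
#include <stdlib.h>

int main(void)
{
	char *buf = malloc(8);
	buf[8] = 'x';   /* out of bounds: ASan reports heap-buffer-overflow here */
	free(buf);
	return 0;
}
```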
Newer languages like [Go](https://golang.org) and
[Rust](https://www.rust-lang.org) have more expressive type systems that make
it possible to write memory-safe code even in high-performance compiled
languages with limited run-time checking.

### Hardware

[Arm MTE](https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety)
has been adopted by Android to detect memory safety violations at run time.
[Hardbound](https://doi.org/10.1145/1353534.1346295),
[MPX](https://doi.org/10.1145/3224423) and
[Watchdog](https://doi.org/10.1109/ISCA.2012.6237017) attempt to provide
various forms of hardware memory safety enforcement.
[CHERI](https://doi.org/10.1109/ISCA.2014.6853201) is a designed-for-security
instruction set extension for ARM and MIPS that is just about to ship its
first hardware prototypes; it has the potential to change **everything** by
allowing high-level object accesses to be precisely enforced by hardware.

---

# Today

### Finding memory safety violations

* testing
* formal methods
* hybrid approaches

---

# Testing

### Default approach... if we're lucky!

???

Hopefully software developers are employing practices such as **test-driven
development**. Testing software helps us find defects, but it's more valuable
than just that: designing software to be testable also tends to encourage
**modularity** via **clear abstractions** with **information hiding**, all of
which also help with software quality. So, testing is good in more than one
way, and we know we ought to do it!

Let's be honest, though: we don't always do what we know we ought to do. And
even when we do, **how much testing is enough**?

--

### Code coverage

* what's that?

--

* `gcov`, `llvm-cov`, `SanitizerCoverage`...

???

We'll have a little demo involving code coverage data, how it's generated and
how it's used.

--

### Limitations

???

Testing is great, but ultimately testing can never **prove the absence of
bugs**: it can only **demonstrate their presence**.

---

# Formal methods

<img src="https://i.etsystatic.com/13643290/r/il/4afab3/1475168720/il_fullxfull.1475168720_qffz.jpg"
     alt="A C++ bowtie" align="right" width="350" />

--

### Modeling programs

???

Constructing a model of a high-level functional program is one thing: after
all, functional programming languages take their cues from mathematics!
Constructing a model of a low-level program in a language with lots of **side
effects** and pointer-based **indirection**, however, is something else.

People do, however, use formal methods in real (albeit somewhat size-limited)
systems. The seL4 microkernel has been formally verified by first verifying a
model, then generating code _from_ that model, then proving that the generated
code corresponds _to_ the model. That's not quite proving properties about C
code, but it's close!

--

### Proving properties of programs*

.footnote[
* Can prove more than you'd think, e.g.,
Cook et al.,
[Proving that programs eventually do something good](https://dx.doi.org/10.1145/1190215.1190257),
_ACM SIGPLAN Notices 42(1)_, 2007
(DOI: [10.1145/1190215.1190257](https://dx.doi.org/10.1145/1190215.1190257))
and
Cook et al.,
[Proving Program Termination](https://dx.doi.org/10.1145/1941487.1941509),
_Communications of the ACM 54(5)_, 2011
(DOI: [10.1145/1941487.1941509](https://dx.doi.org/10.1145/1941487.1941509)).
]

???
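What might a provable property look like at the source level? Here is a
minimal sketch (the function and its contract are invented for illustration)
in ACSL, the annotation language understood by the Frama-C verification tools.
The contract says: given a valid, non-empty array, the result is the array's
maximum. In practice, the loop would also need invariant annotations before a
prover such as Frama-C's WP plugin could discharge the proof.

```c
/* A hypothetical source-level specification in ACSL.  The function has no
 * side effects (assigns \nothing), and the postconditions state that the
 * result bounds every element and is itself one of the elements. */
/*@ requires n > 0 && \valid_read(a + (0 .. n-1));
    assigns \nothing;
    ensures \forall integer i; 0 <= i < n ==> a[i] <= \result;
    ensures \exists integer i; 0 <= i < n && a[i] == \result;
*/
int max_of(const int *a, int n)
{
	int max = a[0];
	for (int i = 1; i < n; i++)
		if (a[i] > max)
			max = a[i];
	return max;
}
```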
In addition to proving properties about software in its source code, one
practical (though computationally challenging) approach is to prove properties
about software in its **binary executable** form. This is useful for the
artifact that ultimately matters most: **what runs on the real machine**. To
do this, you need a formal model not of your software, but of
[the hardware it will execute on](https://www.cs.utexas.edu/users/moore/acl2/manuals/current/manual/index-seo.php/ACL2____SOFTWARE-VERIFICATION?path=3764/36439).

---

# Formal methods

<img src="https://i.etsystatic.com/13643290/r/il/4afab3/1475168720/il_fullxfull.1475168720_qffz.jpg"
     alt="A C++ bowtie" align="right" width="350" />

### Modeling programs

### Proving properties of programs* and compilers†

.footnote[
* Can prove more than you'd think, e.g.,
Cook et al.,
[Proving that programs eventually do something good](https://dx.doi.org/10.1145/1190215.1190257),
_ACM SIGPLAN Notices 42(1)_, 2007
(DOI: [10.1145/1190215.1190257](https://dx.doi.org/10.1145/1190215.1190257))
and
Cook et al.,
[Proving Program Termination](https://dx.doi.org/10.1145/1941487.1941509),
_Communications of the ACM 54(5)_, 2011
(DOI: [10.1145/1941487.1941509](https://dx.doi.org/10.1145/1941487.1941509)).

† e.g., Kästner et al.,
<a href="https://hal.inria.fr/hal-01643290">
CompCert: Practical experience on integrating and qualifying a formally
verified optimizing compiler
</a>,
_ERTS 2018: Embedded Real Time Software and Systems_, 2018.
]

???

Constructing a model of a high-level functional program is one thing: after
all, functional programming languages take their cues from mathematics!
Constructing a model of a low-level program in a language with lots of **side
effects** and pointer-based **indirection**, however, is something else.

People do, however, use formal methods in real (albeit somewhat size-limited)
systems. The seL4 microkernel has been formally verified by first verifying a
model, then generating code _from_ that model, then proving that the generated
code corresponds _to_ the model. That's not quite proving properties about C
code, but it's close!

In addition to proving properties about software in its source code, one
practical (though computationally challenging) approach is to prove properties
about software in its **binary executable** form. This is useful for the
artifact that ultimately matters most: **what runs on the real machine**. To
do this, you need a formal model not of your software, but of
[the hardware it will execute on](https://www.cs.utexas.edu/users/moore/acl2/manuals/current/manual/index-seo.php/ACL2____SOFTWARE-VERIFICATION?path=3764/36439).

People also use **certifying compilers** to prove that a compiler's output
really corresponds to its source input. This is meant to address the problem
raised in Ken Thompson's famous Turing Award lecture,
[Reflections on trusting trust](https://dl.acm.org/doi/10.1145/358198.358210).
CompCert is such a compiler... although every new release still "fixes a few
bugs"?

--

### Curry-Howard Correspondence

???

If you like monads, $\pi$-calculus or intuitionistic modalities... or just
**better type systems**!

Why are functional programs easier to verify? Because the language **doesn't
let you do certain things**. The more sophisticated the type system, the fewer
programs you can write... with the caveat that the programs you _can't_ write
are the **unsound** ones.

For a lot of programmers (including me!), a **type checker** is more natural
to work with than a **proof assistant**.
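To make the correspondence concrete: a type is a proposition, and a well-typed
program is a proof of it. A standard minimal example (textbook material, not
specific to any one language):

```latex
% The identity function inhabits the type A -> A, and thereby proves
% the proposition "A implies A":
\[
  \lambda x.\, x \;:\; A \to A
  \qquad\text{corresponds to}\qquad
  \vdash A \Rightarrow A
\]
% A fragment of the dictionary:
%   function type  A \to B      <->  implication  A \Rightarrow B
%   product type   A \times B   <->  conjunction  A \wedge B
%   sum type       A + B        <->  disjunction  A \vee B
```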
---

# Hybrid approaches

--

### _Concolic_ testing:

???

_Concolic_ is a portmanteau of **concrete** and **symbolic**. It aims to be
practical (concrete execution on real inputs, rather than theorems alone)
while also gaining some of the generality of symbolic approaches.

--

<img src="https://people.cs.aau.dk/~marius/tron/signalflow.png" align="right" width="450"/>

* model checking

???

_Model checking_ is heavily used by hardware engineers to verify (not just
**test**) things like state machines.

--

* concolic execution

???

Concolic execution can be applied to real software using tools like
[KLEE](https://verificaeconvalida.gitlab.io/gitbook-appunti/KLEE.html). In
such testing (for which we'll have a brief demo if there's time), we can run
real test cases while telling the tool (KLEE in this case) that it should
treat some values as **symbolic**, i.e., effectively try **all possible
values** instead of one specific one. This can help us spot tricky corner
cases that might escape our testing!

--

### Fuzzing

???

_Fuzzing_ is the process of running software with inputs that we mutate a
little bit every time we run it. This isn't strictly as powerful as concolic
execution, but it benefits from being highly practical: we can fuzz software
**without adapting it** for concolic execution.

---

# Fuzzing

### Black-box fuzzing

???

_Black-box fuzzing_ mutates input data with no knowledge of the target
program's internals. This can find bugs, but it's not as powerful as
_glass-box_ fuzzing.

--

### Glass-box fuzzing

???

In glass-box fuzzing, we allow the compiler to instrument programs, much like
**coverage tools**. This information is then used to decide whether we've
explored the control-flow graph of a particular function, etc., "enough". It
can also be used to decide whether two fuzz-created crashes are **effectively
the same**!

--

[DART](https://patricegodefroid.github.io/public_psfiles/pldi2005.pdf),
[SAGE](https://patricegodefroid.github.io/public_psfiles/ndss2008.pdf),
[AFL](http://lcamtuf.coredump.cx/afl),
[LibFuzzer](https://www.llvm.org/docs/LibFuzzer.html)

???

There are lots of tools for doing glass-box fuzzing. We'll have a demo (time
permitting) of [AFL: American Fuzzy Lop](https://github.com/google/AFL)
(named for a breed of fuzzy rabbit).

--

[OSS-Fuzz](https://github.com/google/oss-fuzz) and its
[trophies](https://bugs.chromium.org/p/oss-fuzz/issues/list?can=1&q=status%3AFixed%2CVerified+Type%3ABug%2CBug-Security+-component%3AInfra+)

???

The large-scale OSS-Fuzz project applies fuzzing to a number of critical,
widely-used open-source software projects. It includes components with fun
names like [ClusterFuzz](https://google.github.io/clusterfuzz), and it also
has quite a few
[trophies in large, complex projects](https://bugs.chromium.org/p/oss-fuzz/issues/list?can=1&q=status%3AFixed%2CVerified+Type%3ABug%2CBug-Security+-component%3AInfra+).

---

# Summary

### Testing

### Formal methods

### Hybrid approaches

--

## ... to _mitigate_ risk

???

All of this work — with the exception of formal proof, which is limited in
scope — serves to _mitigate_ the risks of imperfect software.

---

class: big, middle

The End.
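???

For reference, the heart of a glass-box fuzzing setup is just a tiny "fuzz
target". A minimal sketch for LibFuzzer follows; the buggy `parse_record`
function is invented for illustration.

```c
/* fuzz_target.c: a deliberately buggy toy parser plus a LibFuzzer entry
 * point. Build with:  clang -g -fsanitize=fuzzer,address fuzz_target.c
 * LibFuzzer calls LLVMFuzzerTestOneInput() repeatedly with mutated inputs,
 * using coverage feedback to steer the mutation (glass-box fuzzing); the
 * planted stack overflow is then reported by AddressSanitizer.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void parse_record(const uint8_t *data, size_t size)
{
	char buf[16];
	if (size > 0 && data[0] == 'R')
		memcpy(buf, data, size);   /* overflows buf when size > 16 */
	(void)buf;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	parse_record(data, size);
	return 0;   /* LibFuzzer expects 0 here */
}
```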