class: big, middle

# ECE 7420 / ENGI 9823: Security

.title[
.lecture[Lecture 6:]
.title[Memory safety]
]

---

# Previously

### Stages of code injection

#### 1. Inject code
#### 2. Hijack control flow

## But step 1 is getting harder!

---

# Code reuse attacks

### ~~0. Inject code~~
### 1. Hijack control flow

## How do we stop the hijacking?

---

# Stopping hijacking

### Stack protection
### CFI: control flow integrity
### Full _memory safety_

--

... which we'll discuss next time

--

... which is now

---

# Memory safety

### Two categories:

.floatright[
<img src="https://www.chromium.org/Home/chromium-security/memory-safety/piechart.png" width="500"/>
.caption[Source: [Chromium project](https://www.chromium.org/Home/chromium-security/memory-safety)]
]

--

* spatial memory safety

???

**Spatial** memory safety refers to an inability for code to write outside of defined boundaries. For example, modifications to an array should not be able to cause changes outside of that array. Modifications to an object should not be able to cause changes outside of that object.

--

* temporal memory safety

???

A related concept is **temporal** memory safety: an inability of code to access memory **when it's not supposed to**. For example, some code could be given a pointer to a heap-allocated object; we would like to know that this code will only be able to modify that memory as long as **the object remains allocated**.

This is also why friends don't let friends return **pointers to local variables** from functions: that pointer _used_ to point at a local variable, but now it points at some arbitrary chunk of stack memory that could be used for anything.

--

### How to achieve?

--

* write perfect software!

???

Writing perfect software is... not a realistic plan. People make mistakes, so we had better build systems that can accommodate the occasional human error!

<blockquote class="bluesky-embed" data-bluesky-uri="at://did:plc:2vjn5xqhbft2yzqiejabaqqx/app.bsky.feed.post/3ksddsmegzw2e" data-bluesky-cid="bafyreic2qlq55s6sztn3qwwc2dffeq4xbu7ngipnk6sm3yph4svoexnjwq">
<p lang="en">
Counterpoint: if one person’s brief lapse in judgement can bring down the whole org, we’re building our systems all wrong.
<br><br>
<a href="https://bsky.app/profile/did:plc:2vjn5xqhbft2yzqiejabaqqx/post/3ksddsmegzw2e?ref_src=embed">
<blockquote class="bluesky-embed" data-bluesky-uri="at://did:plc:m3topjwoknohxc4cwcr7zwra/app.bsky.feed.post/3kscyexgags2i" data-bluesky-cid="bafyreictxc3v74ni2jnsyhmuid7xpvdobunojy7dgwrlju36ecfuhh7bum"><p lang="en">We need to make online security a mandatory subject in our schools. It's not just about protection of personal devices and data, but one person's brief lapse in judgement can bring down a school, a payroll system, or a hospital. 2/2</p>— Kimler for SC (<a href="https://bsky.app/profile/did:plc:m3topjwoknohxc4cwcr7zwra?ref_src=embed">@kimlerforsc.bsky.social</a>) <a href="https://bsky.app/profile/did:plc:m3topjwoknohxc4cwcr7zwra/post/3kscyexgags2i?ref_src=embed">May 12, 2024 at 5:56 PM</a></blockquote>
</a>
</p>— <a href="https://bsky.app/profile/did:plc:2vjn5xqhbft2yzqiejabaqqx?ref_src=embed">@trombonehero.bsky.social</a>
</blockquote><script async src="https://embed.bsky.app/static/embed.js" charset="utf-8"></script>

--

* _memory-safe_ languages

???

Memory-safe languages, although excellent, are only a **partial answer** to the general problem. We'll talk about why at the end of this lecture.
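To make the two categories concrete, here is a small, hypothetical C++ fragment (the function names `greet` and `answer` are invented for illustration) containing one violation of each kind. This is exactly the sort of code that a memory-safe language is designed to reject at compile time or trap at run time:

```cpp
#include <cstring>

// Spatial violation: nothing stops the copy from running past the end of
// `buf` if the input is longer than 7 characters plus the terminating NUL.
void greet(const char *attacker_controlled) {
    char buf[8];
    std::strcpy(buf, attacker_controlled);      // no bounds check at all
}

// Temporal violation: the returned pointer outlives the variable it points
// to, so the caller is left holding a dangling pointer into dead stack memory.
const int *answer() {
    int local = 42;
    return &local;
}

int main() {
    greet("hi");                 // fine today; one longer input away from corruption
    const int *p = answer();     // p is already dangling here
    (void)p;                     // dereferencing it would be undefined behaviour
    return 0;
}
```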
---

# Program execution

<img src="https://upload.wikimedia.org/wikipedia/commons/6/6c/RAM_n.png" align="right" width="300"/>

### Q: how do we load a value from memory?

--

### A: it depends on the language!

???

Different languages provide for different modes of memory access. How do we categorize languages?

* programming **paradigm** (OO, functional, etc.)
* memory management (manual vs **garbage-collected**)
* **compiled** vs **interpreted**

--

* compiled

--

* interpreted

--

* bytecode-interpreted

---

# Compiled languages

### Examples?

???

Examples of languages that compile to machine instructions: **C**, **C++**, **Fortran**, **Go**, **Haskell**, **Rust**...

--

### Where are memory access decisions made?

???

The **compiler** may prevent certain kinds of accesses at compile time. For example, some code is supposed to be able to access **private fields** but other code isn't (see example: [private.cpp](private.cpp)). However, at runtime, all we have are **machine instructions** that **load** and **store** values.

---

# Interpreted languages

### Examples?

???

Examples of languages that are _at least primarily_ interpreted (they may use **just-in-time (JIT) compilation** or even **ahead-of-time (AOT) compilation** as an implementation detail) include **JavaScript**, **Lua**, **Python**, **Ruby** and, of course, **shell scripts**.

--

### Where are memory access decisions made?

???

In such languages, other people's code doesn't get compiled directly to native machine instructions; it is **interpreted**. An interpreted language has an **interpreter** that can make additional decisions about how (or whether!) to honour a request made by an interpreted statement or expression.

For example, in [private.js](private.js), the code outside of the `f` function has no way to inspect the low-level memory details of the object returned from `f`. The question of whether or not to allow an access doesn't depend on **machine instructions**; it depends on the **interpreter**.

---

# Bytecode-interpreted languages

### What's different?

???

A bytecode-interpreted language (e.g., anything that runs on the **JVM**) includes a **specification** for its bytecode. Instead of interpreting Java or Scala, those languages can be compiled to the Java bytecode format, which is executed by a lower-level **interpreter**.

This is also true for **WebAssembly**: you can compile languages like **C**, **Go**, **Java** and **Rust** (see: https://github.com/appcypher/awesome-wasm-langs) into **WebAssembly** and then execute the result in any Web browser with much greater speed than interpreting from source.

--

### Why?

???

In a bytecode-interpreted language, we get some of the benefits of compilation, e.g., we don't have to parse a bunch of program text every time we run the program. We _also_ get some of the benefits of an interpreter, such as **run-time checking of memory accesses**! That means we can't, for example, walk off the end of an array.

---

# Example: Java

.footnote[
Li Gong _et al._, <a href="https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/gong/gong.pdf">"Going Beyond the Sandbox: An Overview of the New Security Architecture in the Java Development Kit 1.2"</a>, in _USITS '97: Proceedings of the USENIX Symposium on Internet Technologies and Systems_, 1997.
]

--

### Memory management

???

A Java program, like any other program, runs in a **process** that has a **virtual address space**. One key difference from compiled programs, however, is that the user code is never exposed to those **virtual addresses**. It's kind of like a **virtualization of a virtualization** of real physical memory. Instead of pointers, Java programs see **references**, and unlike pointers, **you can't just dream up new references**.
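For contrast, here is a hypothetical C++ fragment that does exactly what Java forbids: it fabricates a "reference" to memory it was never given (the address below is made up purely for illustration):

```cpp
#include <cstdint>

int main() {
    // In C or C++, any integer can be turned into a pointer...
    std::uintptr_t guess = 0x12345678;           // made-up address
    int *p = reinterpret_cast<int *>(guess);

    // ...and nothing but luck stands between us and whatever lives there.
    // Dereferencing `p` would be undefined behaviour, so it stays commented out:
    // int value = *p;

    (void)p;
    return 0;
}
```

A Java program has no equivalent of `reinterpret_cast`: references can only come from object creation, fields or method returns, which is what lets the JVM reason about every access.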
--

### Memory access

???

In such a bytecode-interpreted language, all memory accesses have to go through **the interpreter**.

--

### Bytecode and TCBs

???

However, there is no such thing as a free lunch. One of the costs of using any sort of interpreter is that the interpreter becomes **part of the TCB**... and thus we tend to have a **very large TCB**!

---

# Example: Java

<img src="https://upload.wikimedia.org/wikipedia/commons/d/dd/JvmSpec7.png" width="500" align="right"/>

.footnote[
Li Gong _et al._, <a href="https://www.usenix.org/legacy/publications/library/proceedings/usits97/full_papers/gong/gong.pdf">"Going Beyond the Sandbox: An Overview of the New Security Architecture in the Java Development Kit 1.2"</a>, in _USITS '97: Proceedings of the USENIX Symposium on Internet Technologies and Systems_, 1997.
]

### Memory management
### Memory access
### Bytecode and TCBs

???

A Java program, like any other program, runs in a **process** that has a **virtual address space**. One key difference from compiled programs, however, is that the user code is never exposed to those **virtual addresses**. It's kind of like a **virtualization of a virtualization** of real physical memory. Instead of pointers, Java programs see **references**, and unlike pointers, **you can't just dream up new references**.

In such a bytecode-interpreted language, all memory accesses have to go through **the interpreter**.

However, there is no such thing as a free lunch. One of the costs of using any sort of interpreter is that the interpreter becomes **part of the TCB**... and thus we tend to have a **very large TCB**!

--

### `SecurityManager`

???

Java, in particular, also has interesting facilities for disabling features like reflection, which by design circumvent the normal type rules of the language. A `SecurityManager` running on the JVM will also allow you to control access to external resources like files and network sockets. You can even attach privileges like "can access this external URL" to specific pieces of code based on the code's identity... but more about that later when we get to the lecture on Code Signing.

---

# So... perfection?

--

### Write all software in a memory-safe language?

--

### TCB considerations

???

High-level language interpreters have to be written in something. You might be able to write a lot of a Java interpreter in Java, but at the lowest levels you will find lots and lots of C++ code. At the lowest levels of the C standard library, you will find **assembly code**, sometimes **generated from scripts**.

--

### Memory safety in compiled languages

???

Languages like **Go** and **Rust** claim to provide memory safety, but they are compiled languages. How is this possible?

--

1. Compiler-added run-time safety checks

???

The compiler can add extra code to check some accesses at run time. For example, if you index into an array, the compiler can implicitly add a check that `0 <= i < n` before the access, as sketched below.
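As a rough illustration (hand-written here, not real compiler output; `checked_read` is an invented stand-in), the inserted check behaves like this hypothetical C++ helper:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Roughly the check that a memory-safe compiler inserts (invisibly) around an
// indexed access: refuse to touch memory outside the array instead of quietly
// reading or writing past its end.
int checked_read(const int *array, std::size_t length, std::size_t i) {
    if (i >= length) {                    // enforces 0 <= i < n
        std::fprintf(stderr, "index %zu out of bounds (length %zu)\n", i, length);
        std::abort();                     // Rust panics; Go raises a run-time error
    }
    return array[i];
}

int main() {
    int values[4] = {10, 20, 30, 40};
    std::printf("%d\n", checked_read(values, 4, 2));   // fine: prints 30
    // checked_read(values, 4, 7);                     // would abort instead of corrupting memory
    return 0;
}
```

The cost is a comparison on each access, which is why such compilers also try to prove checks redundant and elide them.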
--

2. Limited unsafety

???

Languages that aspire to "systems programming" (i.e., things that have to be aware of or manipulate the lowest-level primitives such as hardware registers) have to allow for unsafe operations. There is no memory-safe way to perform arbitrary register, memory or I/O operations, so these kinds of languages have to provide some way to break abstraction layers. C code can include assembly via the `asm` keyword. Rust code can explicitly violate memory-safety guarantees if it uses the `unsafe` keyword.

--

3. Continued dangers of native instructions

???

Even with those checks, however, if you load someone else's native instructions and execute them, **all bets are off**!

---

# Safe compiled code?

--

### What is a language?

???

When we think of a language, we typically think about **source code** and the **rules** for writing it. However, in addition to **rules**, we also have **tools** that are defined by language specifications and — crucially — **runtime support libraries**.

If we take this expanded view of what makes a language, we can see a number of approaches applied in various places that can be used to improve the security of compiled code, too.

--

### Software

[AddressSanitizer](https://clang.llvm.org/docs/AddressSanitizer.html), [CCured](https://doi.org/10.1145/565816.503286), [Cyclone](https://cyclone.thelanguage.org/wiki/Papers), "fat pointers", [Go](https://golang.org), [Rust](https://www.rust-lang.org), ...

???

### Software

[AddressSanitizer](https://clang.llvm.org/docs/AddressSanitizer.html) (and other "sanitizers" like ThreadSanitizer and the Undefined Behaviour Sanitizer) can help spot memory errors during testing that might otherwise have gone unnoticed.

[CCured](https://doi.org/10.1145/565816.503286) is an example of an approach that uses static analysis to figure out how pointers in a C program are "meant" to be used and dynamic analysis to ensure that they are, in fact, used that way.

[Cyclone](https://cyclone.thelanguage.org/wiki/Papers) is a C dialect with better memory safety properties than vanilla C, which it is designed to be compatible with (or at least easy to adapt from).

Newer languages like [Go](https://golang.org) and [Rust](https://www.rust-lang.org) have more expressive type systems that make it possible to write memory-safe code even in high-performance compiled languages with limited run-time checking.

--

### Hardware: [Arm MTE](https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety), [CHERI](https://doi.org/10.1109/ISCA.2014.6853201), [Hardbound](https://doi.org/10.1145/1353534.1346295), [MPX](https://doi.org/10.1145/3224423), segmentation, [Watchdog](https://doi.org/10.1109/ISCA.2012.6237017), ...

???

### Hardware

[Arm MTE](https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/enhancing-memory-safety) has been adopted by Android to detect memory safety violations at run time.

[Hardbound](https://doi.org/10.1145/1353534.1346295), [MPX](https://doi.org/10.1145/3224423) and [Watchdog](https://doi.org/10.1109/ISCA.2012.6237017) attempt to provide various forms of hardware memory safety enforcement.

[CHERI](https://doi.org/10.1109/ISCA.2014.6853201) is a designed-for-security instruction set extension for ARM and MIPS that is just about to ship its first hardware prototypes; it has the potential to change **everything** by allowing high-level object accesses to be precisely enforced by hardware.

---

# Summary

### Memory safety
### Memory-safe language concepts
### Safe unsafe languages?

---

class: big, middle

The End.