class: big, middle

# ECE 7420 / ENGI 9823: Security

.title[
.lecture[Lecture 4:]
.title[Control-flow hijacking]
]

---

# Last time

### Code injection

--

1. Inject code (e.g., copying payload into buffers)

--

2. Hijack control flow (e.g., stack smashing)

---

# Today

--

### Mitigations

--

### Counter-mitigation attacks

--

### Counter-counter-mitigation mitigations

---

# Mitigations

### How can we prevent/reduce stack smashing?

--

* stack canaries: `-fstack-protector`

???

A _stack canary_, like a [canary in a coal mine](https://www.smithsonianmag.com/smart-news/story-real-canary-coal-mine-180961570/) (fun picture [here](http://history.alberta.ca/energyheritage/coal/the-early-development-of-the-coal-industry-1874-1914/early-methods-and-technology/canaries-in-the-coal-mine.aspx)), is something that can be checked to see if conditions are too dangerous to continue normal operations. In the case of a canary, it would faint from carbon monoxide before humans would, sending a signal that the mine wasn't safe.

In the case of a stack, a **random value** can be written to the stack between a function's local buffers and its saved return address. Code is inserted to check this "canary" value **when returning from a function** to ensure that **it hasn't been overwritten**.

--

* non-executable stacks

--

  ([we needed `-z execstack` to demo!](Makefile))

--

* `W^X`: memory regions writable **or** executable (limitations?)

???

Marking memory as _non-executable_ is something that wasn't possible on 32-bit x86 computers, but _is_ possible on **64-bit x86_64 computers**. This functionality can be used to prevent executable stacks (always a good idea!) and/or a full `W^X` policy.

--

* ASLR: address space layout randomization (more later)

--

### ... and more to follow

---

# The attacker strikes back

--

### Guessing precise addresses is hard

???

For this demo to work, I had to embed the stack address that would store the program counter. How did I do this? By **printing out the address** when I ran the program! That is not a very reproducible solution. In reality, it may not be possible to guess where a piece of data will land in a program's memory.

--

`nop` sleds, relative addressing

???

To deal with this difficulty, shellcode can include "nop sleds", which are long chains of `nop` instructions with shellcode at the end. If the program counter lands anywhere within the `nop` sled, it will "slide" all the way to the end and then execute the payload that's found there.

--

### Shellcode authors avoid zeroes (why?)

???

These kinds of payloads are often delivered via code that expects to read strings from somewhere (a file, the network, the user, etc.). When the target code receives a string and passes it around to functions, it's very likely to run functions like `strlen` to figure out how much data to pass. So, a shellcode author needs to avoid zeroes in their strings: otherwise, `strlen` will think that **it's reached the end of the string** and the code will **cut off the payload**!

--

### Is shellcode easy to spot?

--

See: [English shellcode](https://www.cs.jhu.edu/~sam/ccs243-mason.pdf)*

.footnote[
* "English Shellcode", Mason, Small, Monrose and MacManus, in _CCS '09: Proceedings of the 16th ACM conference on Computer and communications security_, 2009. DOI: [10.1145/1653662.1653725](https://dx.doi.org/10.1145/1653662.1653725)
]

???

You might think that shellcode would be easy to spot, but you can hide all kinds of things inside innocuous-looking content. Remember how, when we examined our malicious file with tools like `hexyl` or `xxd`, many of the characters in the shellcode were displayed like ordinary ASCII characters? A lot of instructions' opcodes **are also valid ASCII text**!
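For a concrete sense of how this works, here is a minimal sketch (byte values taken from the x86-64 opcode map; the string is illustrative, not real shellcode):

```c
/* A minimal sketch: each byte below is printable ASCII *and* decodes as
 * part of a valid x86-64 instruction stream -- with no 0x00 bytes to
 * trip up strlen(). This program just prints the string; it does not
 * execute it. */
#include <stdio.h>

int main(void)
{
    const char text[] = "PXj0X";
    /* As data:  the ASCII characters P X j 0 X.
     * As code:  50      push rax
     *           58      pop  rax
     *           6a 30   push 0x30   ('j' followed by '0')
     *           58      pop  rax                          */
    printf("looks like text: %s\n", text);
    return 0;
}
```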
---

# Higher-level languages?

--

### One mitigation: no stack access

???

One way to stop stack smashing is to avoid letting user code **access the stack**: if they can't write to **stack memory**, they can't _overwrite_ it.

However, that's not the only memory of interest. In fact, with non-executable stacks, the stack isn't really the most interesting target any more!

--

### Alternative technique: _heap spraying_

--

* Create lots of shellcode strings

???

Even very high-level languages running under bytecode interpreters will allow user code to create strings on the heap. These strings can contain things like `nop` sleds that lead to shellcode... lots of strings.

How much heap data can you create? Check `window.performance.memory.jsHeapSizeLimit` in your browser, for one. It's no big deal to create hundreds of MiB of `nop` sleds all around a browser's memory, just waiting to be exploited by a control-flow hijack.

--

  (how much? <a onClick="alert('jsHeapSizeLimit: ' + window.performance.memory.jsHeapSizeLimit)"> try me!</a>)

--

* Even further: _Heap Feng Shui*_

.footnote[
* Alexander Sotirov, "Heap Feng Shui in JavaScript", [Black Hat Europe](https://www.blackhat.com/presentations/bh-europe-07/Sotirov/Presentation/bh-eu-07-sotirov-apr19.pdf), 2007.
]

--

* Just need _one_ control-flow hack to trigger

???

It's important to note that heap spraying (in all of its flavours) doesn't actually _execute_ an attack: you still need to **hijack control flow**. We'll talk about some more ways this can be done in a few minutes.

---

# Stages of code injection

### 1. Inject code

### 2. Hijack control flow

---

# Code injection

--

### Writable buffers

--

* any executable memory region

???

In general, code injection can occur anywhere that code can be executed. That's typically not the stack anymore, and we'll soon see that the executable places an attacker can write to are getting scarcer over time.

--

### User-driven memory allocation

* user is _supposed_ to be able to request allocation

???

However, we can't stop the attacker from allocating _any_ memory: allocating memory is a pretty important and legitimate function of every programming language environment!

--

* e.g., untrusted JavaScript allocates strings

---

# Control-flow hijacking

???

If code injection is the first step of a software attack, the second step is to make the injected code actually **run**. This is done by subverting the regular control flow of the victim program. Anything that can be used for legitimate control flow can also be subverted for **malicious** control flow.

--

### Targets:

???

### Targets:

--

return addresses,

???

**Return addresses:** As we saw last time, call-and-return is a critical form of control flow for most programs, and it hinges on a detail of stack layout. If we can overwrite return addresses on the stack, we can cause all sorts of mischief (even if we can no longer do the classic stack smashing attack due to the default **non-executable stack**).

--

function pointers

--

(inc. vtables),

???

**Function pointers:** There are lots of reasons to use function pointers in real code. One of the most prominent is in _vtables,_ which support **virtual methods** in object-oriented systems (whether or not the languages themselves are object-oriented!).
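As a minimal sketch (with hypothetical names) of why function pointers make such attractive targets, consider a buffer that sits immediately before a code pointer:

```c
/* A minimal sketch (hypothetical struct and names): an overflow of `name`
 * that reaches `handler` redirects the next indirect call. Here we
 * simulate the overflow's effect with memcpy; a real attack would use an
 * unchecked copy of attacker-supplied data. Assumes a typical layout with
 * no padding between the fields. */
#include <stdio.h>
#include <string.h>

struct request {
    char name[16];          /* attacker-controlled data lands here... */
    void (*handler)(void);  /* ...right next to this code pointer */
};

static void legit(void)    { puts("handling request normally"); }
static void hijacked(void) { puts("attacker-chosen code runs!"); }

int main(void)
{
    struct request r = { .handler = legit };

    /* Simulate writing 8 bytes past the end of r.name: on the assumed
     * layout, those bytes land directly on r.handler. */
    void (*target)(void) = hijacked;
    memcpy(r.name + sizeof r.name, &target, sizeof target);

    r.handler();            /* indirect call through the corrupted pointer */
    return 0;
}
```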
--

conditions...

???

**Conditions:** Sometimes all an attacker wants to do is to make your program decide one thing incorrectly. Should I let this user access that thing? Should I let the player into the game without a license?

--

### Approaches:

???

### Approaches:

--

buffer overflows,

???

**Buffer overflows:** We saw these last time!

--

integer under/over-flows,

???

**Integer under/over-flows:** We'll look at these on the next slide.

--

format string vulnerabilities,

???

**Format string vulnerabilities:** We'll talk about these in just a few minutes.

--

application-level errors...

???

**Application-level errors:** We'll talk a _lot_ about these when we get to Web security (SQLi, XSS, CSRF...)

---

# Integer overflow

### Q: What is an integer?

--

How about on a computer?

???

An integer is a whole number that can be positive, negative or zero. What is the maximum value of an integer? **There is none**.

On a computer, however, an integer _in a register_ is not exactly the same as an integer in mathematical terms. They are _almost_ identical, but the small differences can matter a lot.

--

### See [demo code](integers.c)

--

### Lesson: the details matter!

--

* don't assume that integers behave like, well, integers

--

* don't trust user input

--

* use safe integer arithmetic ([US-CERT](https://www.us-cert.gov/bsi/articles/knowledge/coding-practices/safe-integer-operations), [Microsoft](https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/ntintsafe-design-guide))
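???

As a hedged sketch of the kind of bug this lesson is about (the helper below is illustrative, not from the demo code): size arithmetic that can wrap around turns a "safe" allocation into a tiny buffer that later writes will overflow.

```c
/* A minimal sketch (hypothetical helper): check allocation-size
 * arithmetic for wraparound before trusting it. */
#include <stdint.h>
#include <stdlib.h>

void *alloc_array(size_t count, size_t elem_size)
{
    /* BAD: with a large attacker-supplied count, count * elem_size can
     * wrap to a small value; malloc() succeeds, later writes overflow. */
    /* return malloc(count * elem_size); */

    /* GOOD: reject any request that would wrap. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;
    return malloc(count * elem_size);
}
```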
---

# Integer overflow...

--

still???

???

Integer overflow is _still_ very much a going concern!

--

[Over 3,000 reported CVEs](https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=integer+overflow), including dozens in 2024!

--

* Firefox: [CVE-2024-2608](https://nvd.nist.gov/vuln/detail/CVE-2024-2608)
* LLaMA: [CVE-2024-21836](https://nvd.nist.gov/vuln/detail/CVE-2024-21836)
* TP-Link router: [CVE-2024-25139](https://nvd.nist.gov/vuln/detail/CVE-2024-25139)
* Windows Defender: [CVE-2024-21420](https://nvd.nist.gov/vuln/detail/CVE-2024-21420)

--

* Probably [Cellebrite](https://arstechnica.com/information-technology/2021/04/in-epic-hack-signal-developer-turns-the-tables-on-forensics-firm-cellebrite) (older)

???

Cellebrite is a system for digital forensics relied on by police services around the world, and apparently their own security practices were... not good. Sadly, this is all too common in the security world: people not practicing what they preach.

We'll talk more about Cellebrite later in the course, but for now you may enjoy the following (genuinely amusing) read:
https://cyberlaw.stanford.edu/blog/2021/05/i-have-lot-say-about-signal’s-cellebrite-hack

---

# Format string vulnerabilities

### See [demo code](format-strings.c)

--

### Lesson: the details matter!

--

* don't trust user input

--

* put user strings in _values_, sure

--

* do **not** put user strings in _format_

--

* also important for higher-level languages (e.g., [Ruby](https://nvd.nist.gov/vuln/detail/CVE-2008-2664))
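???

As a hedged sketch (the logging function is illustrative, not from the demo code), the whole vulnerability class comes down to which argument the user's string occupies:

```c
/* A minimal sketch of the classic format-string mistake. */
#include <stdio.h>

static void log_message(const char *user_input)
{
    /* BAD: the user controls the format string, so input like
     * "%x %x %x" leaks stack contents and "%n" can even write
     * to memory. */
    /* printf(user_input); */

    /* GOOD: user data goes in a value slot, never in the format. */
    printf("%s", user_input);
}

int main(void)
{
    log_message("%x %x %x\n");  /* printed literally, not interpreted */
    return 0;
}
```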
---

# Notes about code injection

--

### Modern MMUs and DEP

???

Your computer's memory management unit (MMU) is the thing that translates virtual addresses to physical addresses. Along the way, there is an opportunity to check **whether such a translation should be allowed**. Specific mappings can be marked as read-only, or as inaccessible to user code, and on modern machines, as **non-executable**. This allows us to prevent the execution of bytes in specific regions like **the stack**.

--

### `W^X` policy

???

However, it's more general than that! In general, we would like to have memory be writable XOR executable. If it's possible for an attacker to write into the memory (whether directly, like providing a buffer of shellcode, or indirectly, by tricking a program into writing some data in a particular place), it should _not_ be possible to execute that code.

There are some exceptions (a JIT engine, by definition, needs to be able to write out executable code), but normally we would like to enforce a `W^X` policy that will completely prevent some of the attacks described in the previous slides.

---

# Stages of code injection

### 1. Inject code

### 2. Hijack control flow

--

## But step 1 is getting harder!

???

Policies such as `W^X` make it much tougher to inject attacker-controlled code into memory that can actually be executed. However, that doesn't mean that attackers just gave up! Instead, they did what attackers do: they thought creatively, outside the box, not limited by the constraints that defenders impose on them.

--

## What if...

---

# What if...

### ~~0. Inject code~~

### 1. Hijack control flow

???

Is it possible to attack running software _without_ injecting code? If we could still hijack the control flow of a program (which seems to often be the case!) and put non-executable data in memory (e.g., on the stack), how could we still have a viable attack?

--

## What code do we execute?

???

What code would we even execute?

---

# Return to libc

--

### Uses existing code from `libc`

???

If you can't add code to memory, you'll just have to use what's already there! This kind of "living off the land" is possible because there is already quite a lot of code lying around in memory. For example, there is _lots_ of code in the standard C library, which gets loaded into just about every process running on your system.

--

### e.g., return to `system()`

???

One common thing we'd like to be able to do when we attack a program is... anything! We'd like a general-purpose tool for letting us execute arbitrary commands once we've broken into a process, and `libc` provides us with just such a tool: the `system(3)` library function. This will allow us to execute any program we like, and if that program is a shell program, we can execute _more_ arbitrary actions.

--

### Especially easy on 32b x86

???

On 32b x86, function arguments are passed on the stack, so the same overflow that overwrites the return address can also lay out the argument to `system()` (e.g., a pointer to "/bin/sh"). On x86_64, arguments are passed in registers, which takes more work to arrange.

---

# ROP

### _Return-oriented programming_*

.footnote[
* See, e.g., Roemer et al, "Return-Oriented Programming: Systems, Languages, and Applications", ACM TISSEC 15(1), 2012. DOI: [10.1145/2133375.2133377](https://doi.org/10.1145/2133375.2133377)
]

--

### Generalization of return-to-libc attack

--

### Relies on existing "gadgets" (instruction + `ret`)

--

### Can be automated (e.g., [ROPC](https://github.com/pakt/ropc), [Ropper](https://github.com/sashs/Ropper))

???

For fun, try out the tutorials at https://ropemporium.com !

---

# ASLR

### _Address Space Layout Randomization_

???

Defenders can make the attacker's life harder by ensuring that `libc` (and other code) isn't loaded at the same location every time.

--

### Not super-helpful on 32b platforms

???

On a 32b machine, however, we might only have 16b or even 8b available for randomization. A lack of randomness _seems_ bad in a defensive technique called "randomization", but why? What would more randomness give us?

--

### Increases "work factor"

???

ASLR **doesn't provide definitive protection**. Unlike other security techniques, it won't always say "no" to an attack. What it will do is make an attacker have to do **additional work**. For example, on a 32b system, an attacker might have to **try their attack 128 or 32,768 times** in order to succeed.

--

### But maybe not by as much as you think!*

.footnote[
* "ASLR on the Line: Practical Cache Attacks on the MMU", Gras, Razavi, Bosman, Bos and Giuffrida, _Proceedings of the 2017 Network and Distributed System Security Symposium_, 2017. DOI: https://dx.doi.org/10.14722/ndss.2017.23271.
]

???

Practical attacks exist that use low-level properties of things like memory management units (MMUs) to break ASLR, even from JavaScript code!

---

# Code reuse attacks

### ~~0. Inject code~~

### 1. Hijack control flow

--

## How do we stop the hijacking?

---

# Stopping hijacking

--

### Stack protection

Non-executable memory

Stack canaries (`-fstack-protector`)

--

### CFI: control flow integrity

Static analysis, dynamic enforcement

--

### Full _memory safety_

--

(next time)

---

class: big, middle

The End.