Impalabs is releasing Hyperpom, a 64-bit ARM binary fuzzer written in Rust and based on the Apple Silicon's hypervisor. It is mutation-based and coverage-guided. This article gives an overview of its internals, presents the different components it consists of and how they relate to each other. Most importantly, it also gathers all the resources you need to get started and begin fuzzing your own 64-bit ARM targets.
Fuzzing a binary efficiently, without its source code, is not easy. It gets even harder if you target uncommon architectures and want both some degree of instrumentation and good performance. At Impalabs, we generally resort to emulation or symbolic execution, with tools such as Unicorn or Manticore. However, translating from one architecture to another at runtime means that you will always lose some precious CPU cycles.
For the past few years, we have been targeting mobile devices, which run primarily on ARM SoCs. And while ARM is not as poorly served as other "exotic" architectures, tooling is definitely lacking compared to x86 and its variants. However, with the introduction of its M1 (and now M2) Apple Silicon SoCs, Apple has opened the door to new possibilities for ARM binary analysis, thanks in part to the existing macOS ecosystem.
Binary fuzzing can be tackled through different methods, but in this blog post we will focus on hypervisor-based fuzzing. Using a hypervisor, which operates at EL2 on ARM, allows us to fuzz both kernel and userland targets, which run at EL1 and EL0 respectively. On macOS, a dedicated framework abstracts access to the hypervisor: the Hypervisor.framework. The rest of this blog post details how we leveraged this framework to build Hyperpom.
Introducing Hyperpom
Hyperpom is a 64-bit ARM binary fuzzer based on the Apple Silicon's hypervisor and developed entirely in Rust. It is mutation-based and coverage-guided. Using a hypervisor provides complete control over the targeted binaries and makes it fairly easy to add introspection mechanisms. For example, we can gather code coverage, implement a hooking system, add instrumentation, etc.
As stated in the introduction, Hyperpom is based on Apple's Hypervisor.framework, a framework designed to create and manage virtual machines. It abstracts virtual machines as processes, and virtual processors as threads. Once a virtual machine is created, the fuzzer can manage its physical memory and virtual CPUs just like a regular OS would. To access the Hypervisor.framework from Rust, bindings have been developed; they can be found here.
Hyperpom's architecture is pretty standard. Each virtual CPU runs a worker that fuzzes its own instance of the binary. These instances then share information, such as the corpus and coverage data, to speed up the process.
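To make the worker model more concrete, here is a minimal sketch in plain Rust of workers sharing a corpus and a coverage set. It is not Hyperpom's code: the `Shared` struct, `run_target` (a stand-in for executing a testcase on a vCPU) and `fuzz_in_parallel` are hypothetical names, and ordinary OS threads stand in for vCPU-backed workers.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// State shared by every worker (hypothetical layout, not Hyperpom's).
#[derive(Default)]
struct Shared {
    corpus: Vec<Vec<u8>>,
    coverage: HashSet<u64>,
}

/// Stand-in for running one testcase on a vCPU: returns the fake
/// "addresses" it reached. Purely illustrative.
fn run_target(input: &[u8]) -> Vec<u64> {
    input.iter().map(|&b| 0x1000 + u64::from(b % 8)).collect()
}

/// Spawns `n` workers; each runs its own inputs and publishes the ones
/// that reach new addresses to the shared corpus.
fn fuzz_in_parallel(n: usize, iterations: u8) -> Shared {
    let shared = Arc::new(Mutex::new(Shared::default()));
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                for i in 0..iterations {
                    let input = vec![id as u8, i];
                    let reached = run_target(&input);
                    let mut s = shared.lock().unwrap();
                    // Keep the input only if it covered at least one new address.
                    let mut new = false;
                    for addr in reached {
                        new |= s.coverage.insert(addr);
                    }
                    if new {
                        s.corpus.push(input);
                    }
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // All workers have finished, so this is the last reference.
    Arc::try_unwrap(shared)
        .ok()
        .expect("workers joined")
        .into_inner()
        .unwrap()
}

fn main() {
    let state = fuzz_in_parallel(4, 8);
    println!("{} testcases, {} addresses", state.corpus.len(), state.coverage.len());
}
```

A single `Mutex` keeps the sketch short; a real fuzzer would likely synchronize less often, or at a finer grain, so that workers do not contend on every iteration.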
However, at this point, since we only have access to physical memory and the vCPUs, we have to build the whole fuzzer from the ground up. First, we had to deal with memory management. Using a separate virtual address space for each worker gives us better control over the memory accessible to it and also prevents inadvertent accesses to another worker's memory while fuzzing (e.g. an OOB access that goes undetected because it landed on a page allocated for another guest). To do this, we implemented:
Then we implemented exception handling, which is one of the backbones of the fuzzer. By raising exceptions from the guest, we are able to give control back to the hypervisor and use them to detect whether a crash occurred, a hook was hit, a new coverage path was discovered, etc. Once an exception has been handled by the hypervisor, the guest resumes its execution and continues normally. Additionally, since virtual CPUs behave like real ones, we also have to handle the specificities of ARM implementations, such as caches.
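As an illustration of how a hypervisor might sort guest exceptions into fuzzer events, the sketch below decodes an AArch64 exception syndrome (ESR) value. The exception-class values and field positions come from the Arm architecture (EC in bits [31:26], ISS in bits [24:0], and the `BRK` immediate in ISS[15:0]). The convention that `BRK #1` marks a hook and `BRK #2` a coverage point, as well as the `Event` type and `classify` function, are assumptions made for this example, not necessarily how Hyperpom encodes them.

```rust
/// Exception classes (ESR_ELx.EC, bits [31:26]) defined by the Arm architecture.
const EC_INSTR_ABORT_LOWER_EL: u64 = 0x20;
const EC_INSTR_ABORT_SAME_EL: u64 = 0x21;
const EC_DATA_ABORT_LOWER_EL: u64 = 0x24;
const EC_DATA_ABORT_SAME_EL: u64 = 0x25;
const EC_BRK_AARCH64: u64 = 0x3c;

/// Hypothetical BRK immediates chosen for this sketch.
const BRK_HOOK: u64 = 1;
const BRK_COVERAGE: u64 = 2;

/// What the hypervisor should do with a guest exception (hypothetical).
#[derive(Debug, PartialEq)]
enum Event {
    Hook { pc: u64 },
    Coverage { pc: u64 },
    Crash { pc: u64, esr: u64 },
    Unhandled { pc: u64, esr: u64 },
}

/// Classifies an exception from its syndrome value and faulting PC.
fn classify(esr: u64, pc: u64) -> Event {
    let ec = (esr >> 26) & 0x3f; // exception class
    let iss = esr & 0x1ff_ffff; // instruction-specific syndrome, bits [24:0]
    match ec {
        // For BRK, ISS[15:0] holds the instruction's immediate.
        EC_BRK_AARCH64 => match iss & 0xffff {
            BRK_HOOK => Event::Hook { pc },
            BRK_COVERAGE => Event::Coverage { pc },
            _ => Event::Crash { pc, esr }, // a BRK the fuzzer did not place
        },
        // Faulting loads, stores or instruction fetches are treated as crashes.
        EC_DATA_ABORT_LOWER_EL
        | EC_DATA_ABORT_SAME_EL
        | EC_INSTR_ABORT_LOWER_EL
        | EC_INSTR_ABORT_SAME_EL => Event::Crash { pc, esr },
        _ => Event::Unhandled { pc, esr },
    }
}

fn main() {
    // 0xf200_0001: EC = 0x3c (BRK), IL = 1, immediate = 1.
    println!("{:?}", classify(0xf200_0001, 0x1000));
    println!("{:?}", classify(0x24 << 26, 0x1004));
}
```

After a `Hook` or `Coverage` event, the hypervisor would run the corresponding logic and resume the guest, matching the resume-after-handling flow described above.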
With these low-level building blocks, we can now start implementing actual features for the fuzzer, like a hooking system, to instrument and get information about our targets at runtime. Then, using hooks as a foundation, we can provide:
- calls to user-defined functions, by placing hooks at arbitrary addresses;
- code coverage, by hooking instructions that change the execution flow and storing their address when they are reached;
- tracing, by hooking every instruction of the binary.
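One common way to build such hooks, assumed here rather than taken from Hyperpom's source, is to overwrite the instruction at the hooked address with a `BRK` and keep the original instruction aside. The sketch models guest code as a `Vec<u32>`; `Code`, `Hooks`, `add` and `on_trap` are hypothetical names. With this scheme, coverage is a callback that records the address it was called from, and tracing amounts to hooking every instruction.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Encodes the A64 instruction `BRK #imm16` (0xd4200000 | imm16 << 5).
fn brk(imm16: u16) -> u32 {
    0xd420_0000 | (u32::from(imm16) << 5)
}

type Callback = Box<dyn FnMut(u64)>;

/// Toy guest code region: one u32 per instruction, starting at `base`.
struct Code {
    base: u64,
    insns: Vec<u32>,
}

struct Hooks {
    /// Original instruction saved for each patched address.
    saved: HashMap<u64, u32>,
    callbacks: HashMap<u64, Callback>,
}

impl Hooks {
    fn new() -> Self {
        Self { saved: HashMap::new(), callbacks: HashMap::new() }
    }

    /// Replaces the instruction at `addr` with a BRK and registers `cb`.
    fn add(&mut self, code: &mut Code, addr: u64, cb: Callback) {
        let idx = ((addr - code.base) / 4) as usize;
        // Keep the true original even if the address is hooked twice.
        self.saved.entry(addr).or_insert(code.insns[idx]);
        code.insns[idx] = brk(0);
        self.callbacks.insert(addr, cb);
    }

    /// Called when the guest traps on a BRK at `pc`: runs the callback and
    /// returns the original instruction so that it can still be executed.
    fn on_trap(&mut self, pc: u64) -> Option<u32> {
        let cb = self.callbacks.get_mut(&pc)?;
        cb(pc);
        self.saved.get(&pc).copied()
    }
}

fn main() {
    // Four NOPs (0xd503201f) mapped at 0x1000.
    let mut code = Code { base: 0x1000, insns: vec![0xd503_201f; 4] };
    let covered = Rc::new(RefCell::new(Vec::new()));
    let mut hooks = Hooks::new();
    // Coverage: hook the addresses of interest and record them when hit.
    for addr in [0x1004u64, 0x100c] {
        let covered = Rc::clone(&covered);
        hooks.add(&mut code, addr, Box::new(move |pc: u64| covered.borrow_mut().push(pc)));
    }
    // Pretend the guest trapped on the BRK at 0x100c.
    let original = hooks.on_trap(0x100c);
    println!("original insn: {:x?}, covered: {:?}", original, covered.borrow());
}
```

Executing the saved instruction after the callback (for example by emulating or single-stepping it) and invalidating the instruction cache after patching are left out of this sketch.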
A fuzzing campaign is orchestrated by the main process, which spawns fuzzing workers. These workers then execute instances of the target and feed them mutated testcases from a shared corpus. Then, using runtime information from the instrumentation of the binary, we can decide if the testcase should be kept depending on the paths it covered. When a crash occurs, information related to the current state of the fuzzer is stored in a file and can be reviewed later.
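The keep-or-discard decision can be sketched as a single-worker iteration: pick a testcase, mutate it, run it, keep it if it reached a new block, and record it if it crashed. Everything here is hypothetical: `toy_target` fakes a target that reports the blocks it executed, `Rng` is a tiny xorshift generator used only to avoid external crates, and crashes are kept in memory instead of being written to a file.

```rust
use std::collections::HashSet;

/// Minimal xorshift64 PRNG; the seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Outcome of running a testcase (stand-in for a real vCPU run).
#[derive(Debug, PartialEq)]
enum Outcome {
    Finished(Vec<u64>), // fake addresses of the blocks that were executed
    Crash(u64),         // fake faulting PC
}

/// Toy target: one extra block per matched prefix byte, "crashes" on b"PO".
fn toy_target(input: &[u8]) -> Outcome {
    let mut blocks = vec![0x1000];
    if input.first() == Some(&b'P') {
        blocks.push(0x1010);
        if input.get(1) == Some(&b'O') {
            return Outcome::Crash(0x1020);
        }
    }
    Outcome::Finished(blocks)
}

/// Overwrites one random byte with a random value.
fn mutate(rng: &mut Rng, input: &mut [u8]) {
    if input.is_empty() {
        return;
    }
    let i = (rng.next_u64() as usize) % input.len();
    input[i] = rng.next_u64() as u8;
}

struct Campaign {
    corpus: Vec<Vec<u8>>, // must contain at least one testcase
    coverage: HashSet<u64>,
    crashes: Vec<(Vec<u8>, u64)>,
}

impl Campaign {
    fn iteration(&mut self, rng: &mut Rng) {
        let pick = (rng.next_u64() as usize) % self.corpus.len();
        let mut input = self.corpus[pick].clone();
        mutate(rng, &mut input);
        match toy_target(&input) {
            Outcome::Finished(blocks) => {
                // Keep the testcase only if it reached a block never seen before.
                let mut new = false;
                for b in blocks {
                    new |= self.coverage.insert(b);
                }
                if new {
                    self.corpus.push(input);
                }
            }
            Outcome::Crash(pc) => self.crashes.push((input, pc)),
        }
    }
}

fn main() {
    let mut campaign = Campaign {
        corpus: vec![b"AA".to_vec()],
        coverage: HashSet::new(),
        crashes: Vec::new(),
    };
    let mut rng = Rng(0x9e37_79b9_7f4a_7c15);
    for _ in 0..100_000 {
        campaign.iteration(&mut rng);
    }
    println!("corpus: {}, crashes: {}", campaign.corpus.len(), campaign.crashes.len());
}
```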
Fuzzing a Target
Now that we have a general idea of the fuzzer's components and how they relate to each other, we can explain how targets are actually harnessed. Hyperpom is a framework that provides an API that mirrors the lifecycle of the fuzzer. By implementing the Loader trait, a user can customize the fuzzer however they see fit. It gives access to core components of the fuzzer, such as the virtual memory allocators or virtual CPUs, to load an arbitrary binary, define hooks and initialize the state of the CPU before starting the fuzzer.
The methods defined by this trait try to reflect, as closely as possible, all the steps a binary goes through while being fuzzed.
- The binary is first parsed to be mapped into the virtual address space of the fuzzer using Loader::map.
- User-defined hooks can then be applied using Loader::hooks.
- We’ve now reached the pre-snapshot stage. The method Loader::pre_snapshot performs all the remaining operations before a snapshot of the virtual address space and the CPU state is taken. This is the step where we can call, for example, initialization functions from the binary so that we don’t have to do it every iteration. After these operations have been performed, a snapshot of the virtual memory and vCPUs is taken.
- From this point on, the fuzzer enters the iteration loop, which means that we’ll return to this step when an iteration finishes and reset the fuzzer's state using the snapshot. For every iteration, the first operations are to retrieve a testcase from the corpus, mutate it using Loader::mutate, and pass it to the Loader::load_testcase function where it can be arbitrarily loaded into the address space and consumed by the targeted binary.
- Every action that needs to happen after the snapshot but before the actual execution can be defined in Loader::pre_exec.
- The execution itself happens next. This is the fuzzer’s job, so there is nothing to do here. :)
- If something needs to be cleaned up after the execution of a testcase, you can do it using Loader::post_exec.
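To illustrate the order in which these callbacks fire, here is a toy trait and driver that mirror the lifecycle above. Only the method names come from the article; the signatures, the `Machine` type (standing in for guest memory and vCPU registers), the `Demo` loader and the `run` driver are invented for this sketch. The real `Loader` methods give access to the fuzzer's core components instead, as described in its documentation. Here, the snapshot is a simple clone that each iteration restores.

```rust
/// Toy machine state standing in for guest memory and vCPU registers.
#[derive(Clone, Debug, Default, PartialEq)]
struct Machine {
    memory: Vec<u8>,
    x0: u64,
}

/// Hypothetical trait mirroring the lifecycle; signatures are invented.
trait Loader {
    fn map(&mut self, m: &mut Machine);
    fn hooks(&mut self, _m: &mut Machine) {}
    fn pre_snapshot(&mut self, _m: &mut Machine) {}
    fn mutate(&mut self, input: &mut Vec<u8>);
    fn load_testcase(&mut self, m: &mut Machine, input: &[u8]);
    fn pre_exec(&mut self, _m: &mut Machine) {}
    fn post_exec(&mut self, _m: &mut Machine) {}
}

/// Runs the one-time setup, takes a snapshot, then replays each testcase
/// from that snapshot. `exec` stands in for the actual guest execution.
fn run<L: Loader>(loader: &mut L, corpus: &[Vec<u8>], exec: impl Fn(&mut Machine)) -> Vec<Machine> {
    let mut m = Machine::default();
    loader.map(&mut m);
    loader.hooks(&mut m);
    loader.pre_snapshot(&mut m);
    let snapshot = m.clone(); // snapshot of memory and registers
    let mut results = Vec::new();
    for testcase in corpus {
        m = snapshot.clone(); // reset the state at the start of every iteration
        let mut input = testcase.clone();
        loader.mutate(&mut input);
        loader.load_testcase(&mut m, &input);
        loader.pre_exec(&mut m);
        exec(&mut m); // the fuzzer's job
        loader.post_exec(&mut m);
        results.push(m.clone());
    }
    results
}

/// Example loader: 16 bytes of memory, testcase copied at offset 1.
struct Demo;

impl Loader for Demo {
    fn map(&mut self, m: &mut Machine) {
        m.memory = vec![0; 16];
    }
    fn pre_snapshot(&mut self, m: &mut Machine) {
        m.memory[0] = 0xaa; // e.g. the effect of an initialization function
    }
    fn mutate(&mut self, input: &mut Vec<u8>) {
        input.push(b'!');
    }
    fn load_testcase(&mut self, m: &mut Machine, input: &[u8]) {
        let n = input.len().min(m.memory.len() - 1);
        m.memory[1..1 + n].copy_from_slice(&input[..n]);
        m.x0 = n as u64; // pass the length in x0
    }
}

fn main() {
    let corpus = [b"ab".to_vec(), b"c".to_vec()];
    // The "execution" just doubles x0; a real fuzzer would run the vCPU here.
    let results = run(&mut Demo, &corpus, |m: &mut Machine| m.x0 *= 2);
    for r in &results {
        println!("x0 = {}, memory = {:?}", r.x0, &r.memory[..4]);
    }
}
```

The second result shows why the snapshot matters: the bytes written by the first testcase are gone, because memory was restored before the second testcase was loaded.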
When a crash occurs, we break away from this lifecycle and switch over to the crash verification process. Because internal and global states can evolve while the fuzzer is active, we need to be able to control them when a testcase is replayed. If you need to reset variables that could influence crash reruns, you can do so by implementing Loader::reset_state. If it is a legitimate crash, it is formatted using Loader::format_crash and written to a file by the fuzzer.
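A crash-verification step along these lines could replay the crashing testcase several times, resetting global state before each run, and only report crashes that reproduce at the same address. This is a sketch under assumptions: the `CrashHandling` trait, `verify`, the toy `Target` and the same-PC criterion are hypothetical, with `reset_state` and `format_crash` merely named after the corresponding `Loader` methods. `Target` keeps a call counter that changes its behavior, which is the kind of state `reset_state` is meant to control.

```rust
/// Hypothetical crash record.
#[derive(Debug, Clone, PartialEq)]
struct Crash {
    pc: u64,
    input: Vec<u8>,
}

trait CrashHandling {
    /// Resets state that could influence a rerun (cf. `Loader::reset_state`).
    fn reset_state(&mut self);
    /// Runs the target once; returns the faulting PC if it crashed.
    fn execute(&mut self, input: &[u8]) -> Option<u64>;
    /// Renders the crash for the report file (cf. `Loader::format_crash`).
    fn format_crash(&self, crash: &Crash) -> String;
}

/// Replays `input` `reruns` times from a reset state; the crash is only
/// reported if every rerun faults at the same `pc`.
fn verify<T: CrashHandling>(t: &mut T, input: &[u8], pc: u64, reruns: usize) -> Option<String> {
    for _ in 0..reruns {
        t.reset_state();
        if t.execute(input) != Some(pc) {
            return None; // not reproducible: discard it
        }
    }
    Some(t.format_crash(&Crash { pc, input: input.to_vec() }))
}

/// Toy target whose global call counter alters its behavior.
struct Target {
    calls: u32,
    honor_reset: bool,
}

impl CrashHandling for Target {
    fn reset_state(&mut self) {
        if self.honor_reset {
            self.calls = 0;
        }
    }
    fn execute(&mut self, input: &[u8]) -> Option<u64> {
        self.calls += 1;
        // Only crashes on the first call since the last reset.
        (input == b"bug" && self.calls == 1).then_some(0x4242)
    }
    fn format_crash(&self, c: &Crash) -> String {
        format!("crash at {:#x}, input {:?}", c.pc, c.input)
    }
}

fn main() {
    let mut with_reset = Target { calls: 0, honor_reset: true };
    let mut without_reset = Target { calls: 0, honor_reset: false };
    println!("{:?}", verify(&mut with_reset, b"bug", 0x4242, 3));
    println!("{:?}", verify(&mut without_reset, b"bug", 0x4242, 3));
}
```

With `honor_reset` set, the crash reproduces on every rerun and is formatted; without it, the leftover counter makes the second rerun diverge and the crash is discarded, which is the kind of spurious result resetting state is meant to avoid.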
The information in this section is just an overview of the fuzzer's API. You can have a look at the documentation of the Loader trait to get a more thorough description.
Getting Started
To start using Hyperpom, you can have a look at the README in Hyperpom's GitHub repository. You should find all the information you need to install Hyperpom, set up a development environment and start fuzzing.
You can also have a look at the documentation, which explains the fuzzer's internals, and in particular the Loader trait, to get an in-depth presentation of the fuzzer's API.
It is also recommended to have a look at the examples provided in the repository, to get a better understanding of how the framework operates.
Conclusion
In this blog post, we have presented Hyperpom, a 64-bit ARM binary fuzzer written in Rust and based on the Apple Silicon's hypervisor. Hyperpom is still in its early stages and requires additional polishing to be more effective. Later versions should implement new coverage strategies, allow more control over the fuzzer by providing additional trait methods, etc.
Regarding performance, one of the main limitations is the time needed to perform a context switch between the host and its guests. It would be best to revamp some parts of the fuzzer to reduce the number of these switches as much as possible. In hindsight, it might have been a better idea to just write an operating system and implement everything inside it (maybe for a v2.0, who knows).
At the end of the day, while there is still work left to do, Hyperpom can already fuzz AArch64 targets and find bugs in proprietary binaries with decent speed. And even if it is not good enough in its current state for your use cases, you can hopefully repurpose some parts of the code to build fuzzers that better suit your needs. In any case, if you're facing a problem you can't fix or have suggestions to improve the project, you're welcome to open an issue on our GitHub repository to discuss it.
References
- Hyperpom: AArch64 fuzzer based on the Apple Silicon hypervisor.
- Applevisor: Rust bindings for the Apple Silicon Hypervisor.framework.