9/13/2007

HITB 2007 - CTF Daemon 03 writeup

As requested by many, by the voice of Hyperion, our reverse engineer the explanation about the HITB2007 CTF's daemon 03. Quite a job!

Among the flag daemons featured in the HITBSecConf2007 Kuala Lumpur CTF, daemon03 proved an unique and unusual challenge. While all other daemons had to be coerced into revealing their flag, number 3 eagerly offered its flag to inquirers… in its own, cryptic language, that had to be learned first. It was not enough to attack daemon03: one had to get to know it, greet it, listen to what it had to say. It was only fortunate that me, the team's reverser/coder, had to be the one to deal with the shy one.

Technical

In my experience, I found it useful to perform some high-level analysis before wasting time with the minutiae of decompilation (I have become notorious in some circles for beginning most of my reversing sessions from inside notepad). Thus, first things first, I gave daemon03 the ritual objdump greeting:

Hyperion@Regulus ~/HITB
$ objdump -x daemon03

daemon03: file format elf32-i386
daemon03
architecture: i386, flags 0x00000102:
EXEC_P, D_PAGED
start address 0x080489e0

Program Header:
PHDR off 0x00000034 vaddr 0x08048034 paddr 0x08048034 align 2**2
filesz 0x00000100 memsz 0x00000100 flags r-x
INTERP off 0x00000134 vaddr 0x08048134 paddr 0x08048134 align 2**0
filesz 0x00000013 memsz 0x00000013 flags r--
LOAD off 0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
filesz 0x00009bf8 memsz 0x00009bf8 flags r-x
LOAD off 0x00009bf8 vaddr 0x08052bf8 paddr 0x08052bf8 align 2**12
filesz 0x000001a8 memsz 0x000001b8 flags rw-
DYNAMIC off 0x00009c0c vaddr 0x08052c0c paddr 0x08052c0c align 2**2
filesz 0x000000d8 memsz 0x000000d8 flags rw-
NOTE off 0x00000148 vaddr 0x08048148 paddr 0x08048148 align 2**2
filesz 0x00000020 memsz 0x00000020 flags r--
STACK off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**2
filesz 0x00000000 memsz 0x00000000 flags rw-
0x65041580 off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**2
filesz 0x00000000 memsz 0x00000000 flags --- aaa0

Sections:
Idx Name Size VMA LMA File off Algn
SYMBOL TABLE:
no symbols

It seemed already daemon03 played the shy card: no section table. Since the executable clearly contained strings betraying the presence of a section table, I had to assume this was just a matter of a zeroed-out counter field somewhere. I could have restored the field, but it seemed unnecessarily invasive. I'm a shy guy myself and I could sympathize with daemon03: going so intimate with his internal structures (mutilated structures, at it) so early in the game would have only proven awkward and ultimately counterproductive. My knowledge of ELF is a bit rusty, but from having implemented an ELF loader for the ReactOS kernel I knew loadable program headers were what really mattered to the memory layout of an ELF executable. I could expect difficulties in locating internal tables (external symbols, the GOT, etc.), but the program was simple enough for that not to matter much.

IDA Pro, the ever reliable digital anatomopathologist, in fact, did not get squeamish and correctly loaded daemon03 on her examining table in a matter of seconds, despite some understandable perplexion that left some "organs" unidentified and some "nerve endings" unconnected. As expected, imported functions laid orphaned of cross-references, their stubs left unnamed, and line addresses in the disassembly view were to be prefixed by "LOAD" instead of the familiar ".text". I did not get squeamish, either, recognizing a familiar overall structure I spent hours examining a day prior, the conspicuous many fingers pointing at the tiny pale circle in the sky called main.

LOAD:080489E0 ; ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦ S U B R O U T I N E ¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦¦
LOAD:080489E0
LOAD:080489E0
LOAD:080489E0 public start
LOAD:080489E0 start proc near
LOAD:080489E0 xor ebp, ebp
LOAD:080489E2 pop esi
LOAD:080489E3 mov ecx, esp
LOAD:080489E5 and esp, 0FFFFFFF0h
LOAD:080489E8 push eax
LOAD:080489E9 push esp
LOAD:080489EA push edx
LOAD:080489EB push offset sub_804C990
LOAD:080489F0 push offset sub_804C930
LOAD:080489F5 push ecx
LOAD:080489F6 push esi
LOAD:080489F7 push offset sub_8048B92
LOAD:080489FC call sub_8048918

My experience with Linux executables was near zero before the competition. It's now at zero-point-something, but at least I've learned to recognize the above piece of code as the initial call to __libc_start_main. So let's mark "start", "sub_804C990" (actually _fini), "sub_804C930" (_init), "sub_8048918" and all routines below their call graph as library functions (Alt+P,L Enter), and rename "sub_8048B92" to "main", and move on to it.

Once inside main, we continue our campaign of marking uninteresting functions as "library routines", and manually naming imported external functions. We clearly begin with the initial call to read, which we annotate accordingly (and, while we're at it, we give readable names to outgoing argument pseudo-variables):

LOAD:08048BB1                 mov     [esp+2A8h+arg8], 512
LOAD:08048BB9 lea eax, [ebp+inbuf]
LOAD:08048BBF mov [esp+2A8h+arg4], eax
LOAD:08048BC3 mov [esp+2A8h+arg0], STDIN_FILENO
LOAD:08048BCA call _read
LOAD:08048BCF mov [ebp+inbufsize], eax

We know it will be followed by the call to flag_func, the function that authenticates and processes calls from the scorebot:

LOAD:08048BD2                 mov     [esp+2A8h+arg8], offset aEtcFlagsDaemon ; "/etc/flags/daemon03.txt"
LOAD:08048BDA mov eax, [ebp+inbufsize]
LOAD:08048BDD mov [esp+2A8h+arg4], eax
LOAD:08048BE1 lea eax, [ebp+inbuf]
LOAD:08048BE7 mov [esp+2A8h+arg0], eax
LOAD:08048BEA call sub_8048B24

The flag_func in daemon03 is identical to the function found in the other daemons, which let me mark three more functions (sub_804A9F0, sub_8048EC0, and sub_804A6C0, a jumbled mess of anti-hardening code, CRC32, MD5 and SHA1 I wasted hours delving into by mistake and my inexperience with CTFs) as library routines and identify the import stubs for write, fopen, fprintf, malloc, etc. At this point I finally had enough data to reconstruct all the imports, and make my reversing work easier. IDA Pro reverse-engineered the imports as such:

extern:8052DB0 ; ---------------------------------------------------------------------------
extern:8052DB0
extern:8052DB0 ; Segment type: Externs
extern:8052DB0 ; extern
extern:8052DB0 ; int feof(FILE *)
extern:8052DB0 extrn feof:near
extern:8052DB4 ; int write(int fildes,const void *buf,size_t nbyte)
extern:8052DB4 extrn write:near
extern:8052DB8 ; int fileno(FILE *)
extern:8052DB8 extrn fileno:near
extern:8052DBC ; int fprintf(FILE *,const char *,...)
extern:8052DBC extrn fprintf:near
extern:8052DC0 ; int fflush(FILE *)
extern:8052DC0 extrn fflush:near
extern:8052DC4 ; int system(const char *string)
extern:8052DC4 extrn system:near
extern:8052DC8 extrn random:near
extern:8052DCC ; void *malloc(size_t size)
extern:8052DCC extrn malloc:near
extern:8052DD0 ; size_t fread(void *,size_t size,size_t n,FILE *)
extern:8052DD0 extrn fread:near
extern:8052DD4 ; time_t time(time_t *timer)
extern:8052DD4 extrn time:near
extern:8052DD8 extrn __fxstat:far
extern:8052DDC extrn __libc_start_main:near
extern:8052DE0 ; void *memcpy(void *,const void *,size_t)
extern:8052DE0 extrn memcpy:near
extern:8052DE4 ; int fclose(FILE *)
extern:8052DE4 extrn fclose:near
extern:8052DE8 ; void exit(int status)
extern:8052DE8 extrn exit:near
extern:8052DEC ; void free(void *)
extern:8052DEC extrn free:near
extern:8052DF0 ; void *memset(void *,int,size_t)
extern:8052DF0 extrn memset:near
extern:8052DF4 ; FILE *fopen(const char *name,const char *type)
extern:8052DF4 extrn fopen:near
extern:8052DF8 extrn srandom:near
extern:8052DFC ; int sprintf(char *,const char *,...)
extern:8052DFC extrn sprintf:near
extern:8052E00 ; size_t fwrite(const void *,size_t size,size_t n,FILE *)
extern:8052E00 extrn fwrite:near
extern:8052E04 ; int read(int fildes,void *buf,size_t nbyte)
extern:8052E04 extrn read:near

In a normal program, all of these would be cross-referenced to their callers. In the poor, brutalized daemon03, I had to intervene manually with some reconstructive surgery. While identifying and renaming the stubs, I noticed they were in the same order as symbols in the externs section:

LOAD:08048868 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048868. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048878 ; [00000010 BYTES: COLLAPSED FUNCTION _write. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048888 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048888. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048898 ; [00000010 BYTES: COLLAPSED FUNCTION _fprintf. PRESS KEYPAD "+" TO EXPAND]
LOAD:080488A8 ; [00000010 BYTES: COLLAPSED FUNCTION sub_80488A8. PRESS KEYPAD "+" TO EXPAND]
LOAD:080488B8 ; [00000010 BYTES: COLLAPSED FUNCTION sub_80488B8. PRESS KEYPAD "+" TO EXPAND]
LOAD:080488C8 ; [00000010 BYTES: COLLAPSED FUNCTION sub_80488C8. PRESS KEYPAD "+" TO EXPAND]
LOAD:080488D8 ; [00000010 BYTES: COLLAPSED FUNCTION _malloc. PRESS KEYPAD "+" TO EXPAND]
LOAD:080488E8 ; [00000010 BYTES: COLLAPSED FUNCTION sub_80488E8. PRESS KEYPAD "+" TO EXPAND]
LOAD:080488F8 ; [00000010 BYTES: COLLAPSED FUNCTION sub_80488F8. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048908 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048908. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048918 ; [00000010 BYTES: COLLAPSED FUNCTION ___libc_start_main. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048928 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048928. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048938 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048938. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048948 ; [00000010 BYTES: COLLAPSED FUNCTION _exit. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048958 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048958. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048968 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048968. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048978 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048978. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048988 ; [00000010 BYTES: COLLAPSED FUNCTION _fopen. PRESS KEYPAD "+" TO EXPAND]
LOAD:08048998 ; [00000010 BYTES: COLLAPSED FUNCTION sub_8048998. PRESS KEYPAD "+" TO EXPAND]
LOAD:080489A8 ; [00000010 BYTES: COLLAPSED FUNCTION sub_80489A8. PRESS KEYPAD "+" TO EXPAND]
LOAD:080489B8 ; [00000010 BYTES: COLLAPSED FUNCTION sub_80489B8. PRESS KEYPAD "+" TO EXPAND]
LOAD:080489C8 ; [00000010 BYTES: COLLAPSED FUNCTION _read. PRESS KEYPAD "+" TO EXPAND]

Even in my ignorance of Linux executables, this looked too perfect to be a coincidence, so I proceeded to rename them and give them the Alt-P,L treatment. At this point, most of the executable's code was marked with the light blue of library routines. I briefly examined the remaining routines, marking a couple more: sub_8048AD4, which looks like an unused hex_dump function (it must be part of a standard library of CTF-specific routines together with flag_func), and sub_8048A32, a helper routine that lets the compiler emulate PC-relative addressing on the uncooperative x86 architecture, for the purposes of position-independent code. This long and extensive scrubbing gave us a sparkling clean call graph (Ctrl-F12):

Still too messy for my tastes, so I gave it some manual clean-up, removing the orphaned externs, the noisy light-blues and other irrelevant elements:

My top-down approach to reverse-engineering was finally paying off, revealing the program in all its straightforwardness: some standard library code, some custom library code shared with the other daemons, and a mere two custom routines, main and sub_804C9E0. It was finally time to get myself intimate with main and its mistery companion.

The flow graph of main appeared straightforward, as well:

I could clearly identify two failure points (the exhaustive approach really paid off, as IDA Pro automatically marked the "_exit" function as non-returning, removing some spurious execution paths from the graph) and a loop followed by a success path. As per standard procedure, I proceeded to collapse the function's prolog and group the failure points together, to make for a leaner graph. The success path and function epilog looked simple enough to be hidden away, too:

LOAD:08048E5B                 mov     eax, stdout
LOAD:08048E60 mov [esp+2A8h+argc], eax
LOAD:08048E64 mov eax, [ebp+inbufsize]
LOAD:08048E67 mov [esp+2A8h+arg8], eax
LOAD:08048E6B mov [esp+2A8h+arg4], 1
LOAD:08048E73 mov eax, [ebp+outbuf]
LOAD:08048E76 mov [esp+2A8h+arg0], eax
LOAD:08048E79 call _fwrite
LOAD:08048E7E mov eax, stdout
LOAD:08048E83 mov [esp+2A8h+arg0], eax
LOAD:08048E86 call _fflush
LOAD:08048E8B mov eax, [ebp+var_C]
LOAD:08048E8E mov [esp+2A8h+arg0], eax
LOAD:08048E91 call _free
LOAD:08048E96 mov eax, [ebp+outbuf]
LOAD:08048E99 mov [esp+2A8h+arg0], eax
LOAD:08048E9C call _free
LOAD:08048EA1 mov [esp+2A8h+arg0], offset aUsrBinDate ; "/usr/bin/date"
LOAD:08048EA8 call _system
LOAD:08048EAD mov eax, 0
LOAD:08048EB2 leave
LOAD:08048EB3 retn

The normal exit points of a function can be quite telling. This one's told me a buffer was written on standard output, two objects allocated on the heap, one of which being the aforementioned buffer, and that the "date" command was executed at the end so that its output would be concatenated with the daemon's. None of this appeared vital to the understanding of the loop (most certainly the heart of the program, which at this point pretty transparently appeared to be an encryption algorithm with a pseudo-random component to the secret key), so I collapsed the success path with the failure exits to form a single exit point. The beginning of the main function was quite familiar too: a fixed-width read from standard input followed by a call to flag_func, so I collapsed that section too. A cursory look at the loop confirmed my suspicions: a trivial loop of "inbufsize" iterations, and, luckily, the body only appeared to perform basic arithmetics. For the first time since the CTF had begun, I had a really good feeling about it…

Before turning my undivided attention to the loop, very obviously the meat-and-potatoes of daemon03, I started chipping away at its surroundings. First of all, it appeared only the first 4 bytes of client input were deemed of interest:

LOAD:08048BEF                 mov     [esp+2A8h+arg8], 4
LOAD:08048BF7 lea eax, [ebp+inbuf]
LOAD:08048BFD mov [esp+2A8h+arg4], eax
LOAD:08048C01 lea eax, [ebp+inbuf4bytes]
LOAD:08048C07 mov [esp+2A8h+arg0], eax
LOAD:08048C0A call _memcpy

Then, the flag file was opened:

LOAD:08048C0F                 mov     [esp+2A8h+arg4], offset aRb ; "rb"
LOAD:08048C17 mov [esp+2A8h+arg0], offset aEtcFlagsDaemon ; "/etc/flags/daemon03.txt"
LOAD:08048C1E call _fopen
LOAD:08048C23 mov [ebp+flagfile], eax

Next, the opened file description number of the flag file was retrieved with fileno, and passed to the mysterious sub_804C9E0, along with what appeared to be a structured output buffer:

LOAD:08048C55                 mov     eax, [ebp+flagfile]
LOAD:08048C58 mov [esp+2A8h+arg0], eax
LOAD:08048C5B call _fileno
LOAD:08048C60 mov edx, eax
LOAD:08048C62 lea eax, [ebp+outstruct]
LOAD:08048C65 mov [esp+2A8h+arg4], eax
LOAD:08048C69 mov [esp+2A8h+arg0], edx
LOAD:08048C6C call sub_804C9E0
LOAD:08048C71 mov eax, [ebp+field]
LOAD:08048C74 mov [ebp+var_10], eax

Much to my relief, sub_804C9E0 proved to be a thin wrapper around __fxstat, which I assumed to be some kind of versioned fstat system call that could multiplex between multiple modes (such as 32-bit vs 64-bit offsets), so I simply renamed sub_804C9E0 to "_fstat" and returned my attention to main. IDA Pro didn't seem to have a definition of struct stat that matched the Linux version the program was compiled for, but it was nevertheless pretty clear that fstat was being used to retrieve the flag file's size. The program, armed with this knowledge, then allocates two buffers as large as the flag file, one of which will be written to standard output in the success path:

LOAD:08048C77                 mov     eax, [ebp+flagfilesize]
LOAD:08048C7A mov [esp+2A8h+arg0], eax
LOAD:08048C7D call _malloc
LOAD:08048C82 mov [ebp+buffer1], eax
LOAD:08048C85 mov eax, [ebp+flagfilesize]
LOAD:08048C88 mov [esp+2A8h+arg0], eax
LOAD:08048C8B call _malloc
LOAD:08048C90 mov [ebp+outbuffer], eax

The flag is then read in memory, into the first dynamic buffer:

LOAD:08048CC0                 mov     eax, [ebp+flagfile]
LOAD:08048CC3 mov [esp+2A8h+argc], eax
LOAD:08048CC7 mov eax, [ebp+flagfilesize]
LOAD:08048CCA mov [esp+2A8h+arg8], eax
LOAD:08048CCE mov [esp+2A8h+arg4], 1
LOAD:08048CD6 mov eax, [ebp+flag]
LOAD:08048CD9 mov [esp+2A8h+arg0], eax
LOAD:08048CDC call _fread
LOAD:08048CE1 mov [ebp+inbufsize], eax
LOAD:08048CE4 mov eax, [ebp+flagfile]
LOAD:08048CE7 mov [esp+2A8h+arg0], eax
LOAD:08048CEA call _fclose

I could then group all the preceding basic blocks into a single graph node I called simply "initialization". The algorithm found itself cornered before me, corraled into a mere two-and-a-half basic blocks:

A realization hit me. I paused, smiled, turned right to Emanuele and asked "You know what time is it?" "Is it time for [expletive]?" (we had taken to using curse words as punctuation by that point) "Oh, no", I said, my smile now a chesiresque grin, "It's time to write code".

About two hours later, spent switching back and forth between Visual Studio, IDA Pro, the Windows calculator, pen-and-paper notes, a remote GDB session (on the live team server no less!) and my own overheating brain, I had the algorithm fully decompiled and tested against the binary. Here is it, in form of inline annotations to the disassembly:

LOAD:08048D13                 mov     [esp+2A8h+arg0], 0
LOAD:08048D1A call _time
LOAD:08048D1F mov [esp+2A8h+arg0], eax
LOAD:08048D22 call _srandom ; srandom(time(NULL));
LOAD:08048D27 call _random
LOAD:08048D2C mov edx, eax
LOAD:08048D2E lea eax, [ebp+key]
LOAD:08048D34 xor [eax], edx ; key ^= random(); // key comes from input
LOAD:08048D36 mov eax, [ebp+key]
LOAD:08048D3C mov [ebp+state], eax
LOAD:08048D42 mov ecx, [ebp+state] ; state = key;
LOAD:08048D48 and ecx, 0FF00FFFFh ; // in retrospect, I know this to be some clever
LOAD:08048D48 ; // compiler-generated code to perform division
LOAD:08048D48 ; // by 10 without divisions
LOAD:08048D4E mov eax, 0CCCCCCCDh
LOAD:08048D53 mul ecx
LOAD:08048D55 mov eax, edx
LOAD:08048D57 shr eax, 3
LOAD:08048D5A mov [ebp+index], eax
LOAD:08048D60 mov edx, [ebp+index]
LOAD:08048D66 mov eax, edx
LOAD:08048D68 shl eax, 2
LOAD:08048D6B add eax, edx
LOAD:08048D6D add eax, eax
LOAD:08048D6F sub ecx, eax
LOAD:08048D71 mov eax, ecx
LOAD:08048D73 mov [ebp+index], eax ; index = state % 10;
LOAD:08048D79 mov eax, [ebp+index]
LOAD:08048D7F mov edx, ds:keytable[eax*4]
LOAD:08048D86 lea eax, [ebp+state]
LOAD:08048D8C xor [eax], edx ; state ^= keytable[index];
LOAD:08048D8E mov [ebp+i], 0
LOAD:08048D98
LOAD:08048D98 loc_8048D98:
LOAD:08048D98 mov eax, [ebp+i]
LOAD:08048D9E cmp eax, [ebp+inbufsize]
LOAD:08048DA1 jge loc_8048E5B ; for(i = 0; i < inbufsize; ++ i)
LOAD:08048DA1 ; {
LOAD:08048DA7 mov eax, [ebp+i]
LOAD:08048DAD add eax, [ebp+flag]
LOAD:08048DB0 movzx eax, byte ptr [eax]
LOAD:08048DB3 mov [ebp+flagbyte], al ; flagbyte = flag[i];
LOAD:08048DB9 mov [ebp+const], 65h
LOAD:08048DC0 movzx eax, [ebp+const]
LOAD:08048DC7 add eax, eax
LOAD:08048DC9 mov [ebp+const], al ; const = 0xca; // no idea why this wasn't inlined
LOAD:08048DCF mov edx, [ebp+state]
LOAD:08048DD5 mov eax, edx
LOAD:08048DD7 add eax, eax
LOAD:08048DD9 add eax, edx
LOAD:08048DDB mov [ebp+state], eax ; state *= 3;
LOAD:08048DE1 mov eax, [ebp+state] ; // probably some more compiler-generated
LOAD:08048DE1 ; // code. No idea as to the intended purpose
LOAD:08048DE7 mov ecx, eax
LOAD:08048DE9 shr ecx, 18h ; tmp1 = key2 >> 24;
LOAD:08048DEC mov eax, 0F0F0F0F1h
LOAD:08048DF1 mul ecx
LOAD:08048DF3 shr edx, 4 ; tmp2 = ((0xf0f0f0f1 * (long long)tmp1) >> 32) >> 4;
LOAD:08048DF6 mov eax, edx
LOAD:08048DF8 shl eax, 4
LOAD:08048DFB add eax, edx ; tmp3 = (tmp2 << 4) + tmp2;
LOAD:08048DFD sub ecx, eax ; tmp1 -= tmp3;
LOAD:08048DFF mov eax, ecx
LOAD:08048E01 add eax, 8
LOAD:08048E04 movzx ecx, al ; tmp3 = (tmp1 + 8) & 0xff;
LOAD:08048E07 mov eax, [ebp+state]
LOAD:08048E0D shr eax, cl
LOAD:08048E0F and eax, 0FFh
LOAD:08048E14 xor eax, [ebp+state]
LOAD:08048E1A inc eax
LOAD:08048E1B mov [ebp+state], eax ; state = (((state >> tmp3) & 0xff) ^ state) + 1;
LOAD:08048E21 mov eax, [ebp+state]
LOAD:08048E27 mov dl, al
LOAD:08048E29 add dl, 2Eh ; key = ((state & 0xff) + 0x2e) & 0xff;
LOAD:08048E2C lea eax, [ebp+flagbyte]
LOAD:08048E32 xor [eax], dl ; flagbyte ^= key;
LOAD:08048E34 mov eax, [ebp+i]
LOAD:08048E3A mov edx, [ebp+outbuffer]
LOAD:08048E3D add edx, eax
LOAD:08048E3F movzx eax, [ebp+const]
LOAD:08048E46 add al, [ebp+flagbyte]
LOAD:08048E4C mov [edx], al ; outbuffer[i] = flagbyte + const;
LOAD:08048E4E lea eax, [ebp+i]
LOAD:08048E54 inc dword ptr [eax]
LOAD:08048E56 jmp loc_8048D98 ; }

Basically, the algorithm has a 4-byte initialization vector, filled with a value derived from the 4-byte key in the program input, the current timestamp after a round of srandom/random and an internal table of 10 magic numbers (that seem to have been borrowed at random from the S-boxes of the Korean SEED algorithm, RFC4269). This vector is used to produce a key the same length as the input. The key and input are combined with XOR into a temporary output. Finally, the value of every byte in the temporary output is rotated by 202 positions (0xCA in hexadecimal) to produce the final, encrypted output. Apart from the rotation step, the algorithm is symmetric: given an identical initialization vector and the encrypted output as input, it will produce the plain input. The tricky part is, of course, to reverse-engineer the initialization vector for any particular iteration.

Not having the time (nor inclination) to mathematically prove how many bits of the state were actually used to derive the key, and so whether a brute-force attack over a possibly limited keyspace was feasible, I had to attack the algorithm in the intended way: parsing the output of the UNIX date command back into a timestamp. I hate parsing. With a hint of shame for a rushed job (and a sigh of relief for a rushed parsing job), I limited myself to parsing the time fields, assuming the remote machine to have the same date and timezone as the local machine. The score server wanted a hexadecimal dump of the flag, so I quickly whipped up a dump routine. Getting impatient, I tested my half-finished decoder against an actual output from daemon03, and with a mounting wave of pride I witnessed a single, perfect line of output that matched the actual flag. My creature was ready for the field.

Tactical

The decoder written and tested (and "ported" from Windows to Linux — a mere recompilation sufficed — to use the right srandom/random), attacking daemon03 looked trivial. In went our half of the secret key (I picked a fixed value, "ABCD", or 0x44434241), out went the flag, ready for automated submission. This was the theory. The field proved less cooperative.

Intermittent failures were to be expected from the outset. More than one second could conceivably pass between the daemon's call to time and /usr/bin/date's, so we expected having to execute the attack multiple times in a row until the output stabilized. What we didn't expect was corrupted output. Flags were meant to be 20 bytes long, but some actually were 21 bytes, others 41 (I had to change the decoder to take a variable-length input. Most inconvenient. I'm not a huge vim fan). Some daemons didn't seem to execute /usr/bin/date (I hope all of these were at least supposed to count as failures. Corrupted output means denied service). In a few cases, completely bogus outputs resulted. To make it worse, we just happened to deploy our attack in the middle of a "flap" of bad flags, scorebot glitches and network issues.

By that time, though, I had turned my attention to daemons number 5 and 7 (trivially exploitable and they contained the shellcode themselves! sure wished I had looked at them earlier), the joys and worries of daemon number 3 already behind me.

Strategical

The overall impact of the daemon03 effort on the ongoing war was, frankly, disappointing. Very few valid flags could be milked from it, and the breakthrough was not worth much. Flags weren't refreshed as often as we expected (or ever, really), and only half-heartedly did we launch multiple waves of attack.

All in all, for a challenge that required hours of work, reversing, coding, debugging, and not just a canned input and/or some trial-and-error, we were pretty underwhelmed. Nothing to do but keep our ol' reliable strategy going until the end, slowly and steadily racking up points and solidifying our second place. We had to accept that the daemon03 challenge would be its own reward, and move on.

1 comment:

Anonymous said...

Argh! Quite a headache, wasn'it? Congratulation for the analysis.