Emulating decryption function with radare2
This is the first part of a three-part series on code emulation for reversing malware:
Part 1 describes how to use radare2 function emulation, with an exercise in cracking the password checked by a function, implemented with r2pipe, the radare2 Python scripting interface.
Part 2 describes how to use this feature to decode the configuration of the Mirai IoT botnet, implementing the solution with radare2's Python scripting capabilities.
Part 3 improves the script from the previous part by adding more features: searching for the addresses of encrypted strings, and creating a function signature to locate the decryption function instead of hard-coding its address.
radare2 is a reverse engineering tool that is very useful for reversing malware, or any other type of binary, because it supports many CPU architectures. One of the most striking features I found in radare2 is partial code emulation. I was initially sceptical about what this feature could actually be used for, but after thinking about it for a while and playing with it, I realized its potential. It's simply amazing.
Let's consider a frequent scenario: a malware author encrypts the strings in a binary and decrypts each one at run time, just before it is used. For example, when you inspect such an executable in PEView, the imports won't tell you much besides LoadLibrary, which loads a DLL into the process, and GetProcAddress, which resolves the address of a particular function in that library. The malware loads DLLs and resolves functions manually, so the imports section of the binary is nearly empty, and the function names are decrypted just before LoadLibrary and GetProcAddress are called. Nothing fancy, just simple string obfuscation. A decrypted string can also reveal the IP address or URL used in the attack. The malware author may use an encryption routine that you have to reverse, after which you write your own Python script to replicate the algorithm and decrypt all those strings. You might also have to go through the pain of locating those strings, going back and forth between IDA and the shell. We could avoid static analysis of the string decryption routine altogether if we could run just that part of the binary, passing a parameter to the function and reading the decrypted string from memory.
String decryption is not the only problem we face; there is also self-modifying code and shellcode, which can be analyzed automatically in the same way. The radare2 emulator can do a lot more than monitor CPU registers: we can even modify registers in the middle of execution. One thing to note is that it is only a partial code emulator. It doesn't emulate a full OS, so making system calls and invoking system functions isn't going to work. There are workarounds, but keep in mind that they require some extra work. Usually, though, these decryption functions are self-sufficient and do not depend on external library functions. So let's get right into it.
For the sake of simplicity, I have created a small C program that takes an input string, decrypts it, and compares the result with the string it is looking for. If the strings are equal, we have cracked the code. Below is the C code for the challenge.
Just to brush up on the basics of function parameter passing on a 32-bit x86 machine: parameters are passed to a function via the stack. There are two kinds of parameters: either the value is copied directly onto the stack, or a reference to the variable is passed on the stack. A reference is just a pointer that contains the address of the variable. When the function is done executing, it returns its result in the eax register (rax on 64-bit). Inside the function, the first parameter can be found at ebp+0x8, the second at ebp+0xc, and so on. To emulate a function call, we can therefore set up the stack frame ourselves, run the function, and read the value from the eax register.
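The stack layout described above can be checked with a small model. This is only an illustrative sketch in plain Python, not radare2 code: the `Stack32` class, the starting address 0x178000 and the argument values are assumptions chosen for the example.

```python
import struct


class Stack32:
    """Tiny model of a 32-bit little-endian stack (illustration only)."""

    def __init__(self, top=0x178000, size=0x100):
        self.low = top - size // 2          # lowest address covered by the model
        self.mem = bytearray(size)
        self.esp = top
        self.ebp = top

    def write32(self, addr, value):
        off = addr - self.low
        self.mem[off:off + 4] = struct.pack("<I", value)

    def read32(self, addr):
        off = addr - self.low
        return struct.unpack("<I", self.mem[off:off + 4])[0]

    def push(self, value):
        self.esp -= 4                       # the stack grows towards lower addresses
        self.write32(self.esp, value)


def call(stack, args, return_addr):
    """Caller side: push the arguments right to left, then the return address."""
    for arg in reversed(args):
        stack.push(arg)
    stack.push(return_addr)


def prologue(stack):
    """Callee side: push ebp; mov ebp, esp."""
    stack.push(stack.ebp)
    stack.ebp = stack.esp


stack = Stack32()
call(stack, [0x001780F0, 0x2A], return_addr=0x5F1)   # hypothetical values
prologue(stack)

first_param = stack.read32(stack.ebp + 0x8)
second_param = stack.read32(stack.ebp + 0xC)
saved_eip = stack.read32(stack.ebp + 0x4)
```

After the prologue has run, `ebp+0x4` holds the return address and the parameters start at `ebp+0x8`, which is the layout the emulator setup has to reproduce by hand.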
Before using the emulator we need to initialize it by providing environment information such as the CPU architecture and little/big-endian byte ordering, and by allocating memory for the stack. All of this can be done with the commands below:
- e asm.bits=32 : specifies that this is a 32-bit address space.
- e asm.arch=x86 : specifies the x86 architecture.
- aei : initializes the VM.
- aeim : allocates a stack at memory address 0x100000 with size 0xf0000; this can be changed by passing parameters to the command.
- s sym.is_valid : seeks the current radare2 pointer to the start of the function “is_valid”.
- aeip : sets EIP to the current seek. Use this command after you have seeked to the function you are trying to emulate.
- pxw @ esp : displays the stack as 32-bit words.
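The same initialization can be driven from Python. The sketch below assumes an object with a `cmd` method, which is what `r2pipe.open` returns; the helper names `esil_setup_commands`, `init_emulator` and `open_binary` are made up for this example. `r2pipe` is imported lazily so that the command list can be inspected without radare2 installed.

```python
def esil_setup_commands(function="sym.is_valid", bits=32, arch="x86"):
    """Return the radare2 commands from the list above, in order."""
    return [
        f"e asm.bits={bits}",   # 32-bit address space
        f"e asm.arch={arch}",   # x86 architecture
        "aei",                  # initialize the ESIL VM
        "aeim",                 # allocate the emulated stack
        f"s {function}",        # seek to the function to emulate
        "aeip",                 # set EIP to the current seek
    ]


def init_emulator(r2, function="sym.is_valid"):
    """Send the setup commands and return a dump of the stack for inspection."""
    for command in esil_setup_commands(function):
        r2.cmd(command)
    return r2.cmd("pxw 32 @ esp")


def open_binary(path):
    """Open a radare2 session (requires radare2 and `pip install r2pipe`)."""
    import r2pipe
    return r2pipe.open(path)
```

A typical use would be `print(init_emulator(open_binary("./challenge")))`, where `./challenge` is a placeholder path for the compiled program.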
In our case, we will write the string into memory at a higher address, where it does not overlap the stack memory used by this function call, and then write a reference to it (its address) at the top of the stack. This reference is the function's parameter, and writing it sets up the stack for the function call.
w o0ekma @ 0x001780f0 : the “w” command writes the string “o0ekma” at memory address “0x001780f0”, which lies within the memory allocated by the radare2 emulator.
wx 0xf0801700 @ ebp+0x4 : the “wx” command writes the hex value “0xf0801700” at address “ebp+0x4”. A couple of things in this command need to be explained:
- Note that we write the function parameter at “ebp+0x4” instead of “ebp+0x8”. The emulator is positioned at the function's entry, as if the call had just pushed the current EIP value onto the stack, and the function prologue (“push ebp”) has not been executed yet. Only the code that runs after the prologue refers to its parameters as ebp+0x8, ebp+0xc, and so on.
- The address written onto the stack is the byte-reversed form of the address the string was written to. x86 is little-endian, so the least significant byte comes first and we have to write the address bytes in reverse order.
| Stack data | Register |
|---|---|
| func param n | EBP + 0xc |
| func param 2 | EBP + 0x8 |
| func param 1 | EBP + 0x4 |
| callee EIP | <= SP when emulator is initialized |
| callee EBP | |
| caller local variables | |
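The byte reversal does not have to be done by hand. As a small sketch, Python's `struct` module can produce the little-endian bytes of the address and build both write commands; the helper names `le32_hex` and `param_commands` are made up for this example, and the generated `wx` argument is written as plain hex pairs without the `0x` prefix.

```python
import struct


def le32_hex(addr):
    """Return a 32-bit address as little-endian hex pairs, e.g. for `wx`."""
    return struct.pack("<I", addr).hex()


def param_commands(text, str_addr=0x001780F0, slot="ebp+0x4"):
    """Build the commands that place `text` in memory and its pointer on the stack.

    Assumes `text` is a plain word without spaces or radare2 special characters.
    """
    return [
        f"w {text} @ {str_addr:#010x}",          # write the string itself
        f"wx {le32_hex(str_addr)} @ {slot}",     # write the pointer, byte-reversed
    ]


commands = param_commands("o0ekma")
```

Here `le32_hex(0x001780f0)` yields `f0801700`, the same byte sequence used in the command above.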
pxw @ esp : this command is just for visual examination of the stack, to check that things are as we expect.
Now that we have done all the setup necessary for execution, I recommend as an exercise that you step through each instruction and observe the stack and registers. The next command helps you do that.
aes; aer ; pxW 50 @ esp ; pd 10 @ eip : this command steps one instruction at a time and displays the registers, the stack, and the next 10 assembly instructions from the current instruction. As you might have guessed, it is a combination of multiple commands separated by “;”. Each one is explained below.
- aes : single-steps one instruction.
- aer : displays all the registers.
- pxW 50 @ esp : displays 50 bytes of the stack (pointed to by the esp register) as 32-bit hex words.
- pd 10 @ eip : disassembles 10 instructions starting at the address in eip.
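When stepping from a script, it can be easier to look only at what an instruction changed. The sketch below takes a register snapshot with `aerj` (the JSON form of `aer`) before and after one `aes` step and reports the differences. It assumes an r2pipe-style object with `cmd` and `cmdj` methods; `changed_registers` and `step_and_diff` are hypothetical helper names.

```python
def changed_registers(before, after):
    """Return {register: (old, new)} for every register whose value changed."""
    return {
        name: (before.get(name), value)
        for name, value in after.items()
        if before.get(name) != value
    }


def step_and_diff(r2):
    """Execute one emulated instruction and report which registers it modified."""
    before = r2.cmdj("aerj")   # register snapshot as a dict
    r2.cmd("aes")              # single step
    after = r2.cmdj("aerj")
    return changed_registers(before, after)
```

Calling `step_and_diff(r2)` in a loop gives a compact trace of the function, for example showing esp decreasing by 4 after each push.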
aecu 0x5f1 : single-stepping each instruction is great for dynamic debugging, but you might just want to run the code to the end of the function and observe the return value. This is precisely what this command does: “aecu [addr]” executes the code until the specified address and stops. We can then read the value of the eax register with the “aer eax” command.
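In a script, these two commands can be wrapped in one helper. This is a minimal sketch assuming an r2pipe-style object with a `cmd` method and assuming that `aer eax` prints the value as a hexadecimal string such as `0x00000001`; the names `parse_register` and `run_until` are made up here, and 0x5f1 is the stop address from the example above, which will differ for other binaries.

```python
def parse_register(text):
    """Convert the textual output of `aer <reg>` (e.g. '0x00000001') to an int."""
    return int(text.strip(), 16)


def run_until(r2, stop_addr=0x5F1):
    """Emulate until `stop_addr` is reached, then return the value of eax."""
    r2.cmd(f"aecu {stop_addr:#x}")
    return parse_register(r2.cmd("aer eax"))
```

The function returns an integer, so the caller can compare it directly with 1 or 0.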
Now that we have all the commands necessary to complete the challenge described earlier, let's use radare2 scripting to complete the task.
As described in the challenge, the program decrypts the input string and compares it with the expected string; if they match, the function returns true. With the emulator, we will try different inputs and observe the eax register, which holds the function's return value: 1 means true and 0 means false. There is a Python binding for radare2 called “r2pipe”, which you can install with pip.
Below is the code to accomplish just what we described.
- The path of the binary is passed as the first parameter to r2pipe.open, which returns an instance. This is the initialized radare2 instance to which the commands will be sent.
- The radare2 instance has a cmd function that takes a string parameter, namely one of the commands described earlier. Depending on the command, it may return a value, which can be used to read memory or registers, as in the case of aer eax.
- A nice thing about radare2 commands is that they can return their response in JSON format just by appending j to the command. For example, the aerj command returns all the registers, with each register name as a key and the register value as its value.
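Putting the pieces together, a brute-force loop over candidate inputs might look like the sketch below. It is a reconstruction under stated assumptions rather than the original script: the helper names `check_candidate`, `crack` and `main` are made up, the addresses 0x001780f0 and 0x5f1 are the ones used in the walkthrough and are specific to that binary, and a fresh radare2 session is opened for every candidate so that no emulator state leaks between attempts (simple, but slow).

```python
import struct

# Setup commands from the walkthrough; the symbol name is specific to the challenge.
SETUP = ["e asm.bits=32", "e asm.arch=x86", "aei", "aeim", "s sym.is_valid", "aeip"]


def check_candidate(r2, candidate, str_addr=0x001780F0, stop_addr=0x5F1):
    """Emulate is_valid(candidate) and return True if eax ends up as 1.

    Assumes `candidate` is a plain word without spaces or radare2 special characters.
    """
    for command in SETUP:
        r2.cmd(command)
    r2.cmd(f"w {candidate} @ {str_addr:#010x}")                 # the input string
    r2.cmd(f"wx {struct.pack('<I', str_addr).hex()} @ ebp+0x4")  # pointer to it
    r2.cmd(f"aecu {stop_addr:#x}")                              # run to the end
    registers = r2.cmdj("aerj")                                 # registers as a dict
    return registers["eax"] == 1


def crack(open_session, candidates):
    """Try each candidate in a fresh session; return the first accepted one."""
    for candidate in candidates:
        r2 = open_session()
        try:
            if check_candidate(r2, candidate):
                return candidate
        finally:
            r2.quit()
    return None


def main(path, wordlist):
    """Run the search against a real binary (requires radare2 and r2pipe)."""
    import r2pipe
    return crack(lambda: r2pipe.open(path), wordlist)
```

For example, `main("./challenge", ["admin", "letmein", "o0ekma"])` would return the first word the function accepts, or `None` if none is accepted; the path and word list are placeholders. Passing the session factory as a parameter keeps `crack` independent of r2pipe, which makes it easy to exercise with a stub.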
We saw how we can use a function as a black box, feeding it input and monitoring its output. The same approach can be used to decrypt obfuscated strings by setting up the appropriate function parameters and reading the return value. In the next post, we will use this method to de-obfuscate the Mirai botnet configuration.