Decrypting Mirai configuration With radare2 (Part 2)
This is the third part of a three-part series about code emulation for reversing malware:
Part 1 describes how to use radare2 function emulation, along with an exercise in cracking the password of a function, implemented using r2pipe, the radare2 Python scripting plugin.
Part 2 describes how to use this feature to decode the configuration of the Mirai IoT botnet by implementing the solution with radare2’s Python scripting capabilities.
Part 3 improves the script created in the previous part by adding features that search for the addresses of encrypted strings and by creating a function signature to locate the decryption function instead of relying on its hard-coded address.
In the previous two posts we looked at how to emulate a string decryption function call, and we created a radare2 macro and a Python script that use that emulation. We managed to decrypt some of the configuration, but not all of it. In this post we will continue to improve the script: we will pick up the problem of finding the addresses of the encrypted data, left open in the previous post, and use those addresses to decrypt the configuration. There was another interesting problem I came across when testing the script on other variants of Mirai: the decryption function was not at the same address as in the previous binary, although its implementation was the same. I fixed that by creating a function signature, another cool feature of radare2. We will also explore other features that improve the script and make it more portable, so that if a sample uses the same decryption method, our Python script should be able to decrypt its configuration. Let’s get right into it.
If you paid close attention to the reversing of the encryption function in the previous post, you might argue that we took the wrong approach to decrypting the configuration: instead of setting up the configuration struct on the stack and changing the register values, we could have just passed the configuration index as an argument and let the function emulation take care of the rest. I would agree with you, but that would only work if the whole array of structures were already in place in memory, and that was not the case. This array of data structures is created at run-time. Let’s see in radare2 what is present at address 0x08052800, which is the base address of the data structure.
As you can see, there are lots of references around this memory location, pointed out by the DATA XREFS entries.
As you can see, there are lots of references to global data; maybe the code is copying each encrypted string along with its length. You can see a recurring pattern: push one byte, push a global reference address, push a register, and call a function. That function appears to copy a string from one location to another, with the length passed as a parameter. Let’s disassemble that function (address 0x0804e0f0).
Maybe this function (address 0x0804e0f0) copies a string from one location to another, with the length passed as a parameter. As you can see, there is a loop that copies a byte from edx + ebx to edx + esi, edx being the loop counter. So we can conclude that this is a memory copy function. The question is, why not use the standard library memory copy function? Because the standard library might not be available in every Linux environment. Remember, this malware tries to run on every possible device running Linux, and not every environment has the luxury of the libc standard functions; there may be many other such functions that mimic the standard C run-time functions. Let’s go back to the function we came from.
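To make the loop concrete, here is a minimal Python sketch of the same byte-wise copy. It is only an illustration of the behaviour described above, not a decompilation of the sample: the `memory` bytearray stands in for the process address space, and the `copy_bytes` name and its parameters are made up for this example.

```python
def copy_bytes(memory, src, dst, length):
    """Byte-wise copy that mirrors the loop described above.

    memory : bytearray standing in for the address space (illustrative)
    src    : plays the role of ebx (source base)
    dst    : plays the role of esi (destination base)
    i      : plays the role of edx (loop counter)
    """
    for i in range(length):
        memory[dst + i] = memory[src + i]
    return memory


# Copy four bytes from offset 0 to offset 4.
mem = bytearray(b"ABCD" + b"\x00" * 4)
copy_bytes(mem, src=0, dst=4, length=4)
print(mem)
```

The loop does nothing more than a memcpy would, which supports the reading that the malware ships its own copy routine to avoid depending on libc.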
Now that we know that global data is being referenced from this function, let’s find all the data references from this function and see whether these addresses lead to any meaningful strings. To get all the data references from this function we can use the agaj command, which gives us the result in JSON format. This command returns the globally referenced address in the title field of the JSON; we are not interested in the other fields. Below is the Python code to iterate over this JSON and run the decryption function on these global references.
# start address of the function sub.7_1_700
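A minimal sketch of such a loop is given here, under several assumptions. The `r2` argument is expected to be a handle returned by `r2pipe.open(...)`. The agaj result is assumed to be either a dictionary with a `nodes` list or a bare list of nodes, each carrying the address as a hex string in `title`; the exact layout may differ between radare2 versions. `decrypt_at` is a hypothetical callback standing in for the emulation routine built in the previous post, and `func_addr` is whatever start address you have for the configuration function.

```python
def data_ref_addresses(graph):
    """Collect the addresses found in the `title` field of each graph node.

    Assumption: titles that are addresses look like "0x0805....";
    anything else (for example a flag name) is skipped.
    """
    nodes = graph.get("nodes", []) if isinstance(graph, dict) else graph
    addrs = []
    for node in nodes or []:
        title = str(node.get("title", ""))
        if not title.startswith("0x"):
            continue
        try:
            addrs.append(int(title, 16))
        except ValueError:
            continue
    return addrs


def decrypt_data_refs(r2, func_addr, decrypt_at):
    """Run `decrypt_at` (hypothetical helper) on every data reference."""
    graph = r2.cmdj("agaj @ 0x%x" % func_addr)
    results = {}
    for addr in data_ref_addresses(graph or {}):
        results[addr] = decrypt_at(r2, addr)
    return results
```

Keeping the JSON parsing in its own function makes it easy to adjust if your radare2 build names the fields differently.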
This is the output we get.
As you can see, there are lots of meaningful strings, such as domain names, /proc/* paths, etc., which were not decrypted by the earlier string reference method. But other configuration entries that the previous method produced are missing from this one, which means we still don’t have the full configuration. We could combine both methods, or we could try another method, as shown in the next section.
Earlier we saw a push, push, push, call pattern used to copy the encrypted string from one address to another. We could find all the push instructions, extract the address from each, and try to decrypt the data at that address. Again, we can use the instruction search functionality from the previous post to find all push-type instructions: /atj push searches for all push instructions and returns the result in JSON format.
Before searching, we first have to limit the search to just this function, or else radare2 will search for push instructions in the whole binary. We can do that with the e search.from and e search.to configuration variables, setting them to the start and end of the function respectively. Below is the Python code to do what we just discussed.
# start address of the function sub.7_1_700
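The following is a hedged sketch of the push search rather than a definitive implementation. It assumes each hit returned by /atj push is a dictionary whose disassembly text sits under an `opstr` or `code` key, which may not match every radare2 version. The `min_addr` filter, which drops small immediates such as the one-byte length push, defaults to 0x08048000 as an assumed image base for a 32-bit x86 ELF; adjust it for your sample. `start` and `end` are the function bounds you already have.

```python
import re

# Matches "push 0x8051234" and "push dword 0x8051234"; register pushes are ignored.
PUSH_IMM = re.compile(r"^push\s+(?:dword\s+)?(0x[0-9a-fA-F]+)$")


def pushed_addresses(hits, min_addr=0x08048000):
    """Extract immediate operands of push instructions that look like addresses."""
    addrs = []
    for hit in hits or []:
        text = hit.get("opstr") or hit.get("code") or ""  # assumed field names
        match = PUSH_IMM.match(text.strip())
        if not match:
            continue
        value = int(match.group(1), 16)
        if value >= min_addr:
            addrs.append(value)
    return addrs


def find_pushed_addresses(r2, start, end):
    """Restrict the search to [start, end] and collect pushed addresses."""
    r2.cmd("e search.from=0x%x" % start)
    r2.cmd("e search.to=0x%x" % end)
    return pushed_addresses(r2.cmdj("/atj push"))
```

Each returned address can then be handed to the decryption routine in the same way as the data references in the previous section.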
The output of this method is shown below.
As you can see, there are lots of meaningful strings, and this time we can see strings that were not found by either of the earlier methods, such as busybox commands. It seems the push, push pattern is the best method. Now that we are able to decrypt the data, let’s shift the focus to making this script more portable, i.e. removing the hard-coded function addresses, searching for those functions in the binary, and using the discovered addresses instead.
To search for a function in a binary we first have to create a signature of that function; radare2 has its own signature format. All the functionality related to signatures is listed by the z? command. Creating a function signature is very simple: seek to the function and type zaf [function_name] [signature_name], which generates a signature with that name for the function. For our example, there are two functions for which we need to create a signature:
- zaf sub.7_1__700 config_func: this command creates the signature for the function that builds the configuration data structure, which we use to find the addresses of the encrypted configuration strings; the resulting signature is named config_func.
- zaf fcn.decrypt decrypt: this creates the signature for the decryption function.
Signatures can be saved to a file with the zos [filename] command and reloaded with the zo [filename] command. We will later use these signatures in our Python script to search for the start and end addresses of these functions.
To search for the signatures, use the z/ command. The resulting addresses are placed in the sign flag space, where each match is flagged as sign.bytes.[signature name]. To see the search results, switch to the sign flag space with the fs sign command and then use the f command to list the flags. The fj command returns the same flags, with their addresses, in JSON format, which is what we will use in our Python script.
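The command sequence above can be sketched in Python as follows. This is an illustration under assumptions: `r2` is an r2pipe handle, `sig_file` is the path of a file previously written with zos, and each flag returned by fj is assumed to carry its name in `name` and its address in `offset` (some builds may use `addr` instead, so both are tried). Depending on the radare2 version, the flag name may carry a trailing match counter after the signature name.

```python
def signature_hits(flags, prefix="sign.bytes."):
    """Group flag addresses by the part of the name after `prefix`."""
    hits = {}
    for flag in flags or []:
        name = flag.get("name", "")
        if not name.startswith(prefix):
            continue
        addr = flag.get("offset", flag.get("addr"))  # assumed field names
        hits.setdefault(name[len(prefix):], []).append(addr)
    return hits


def search_signatures(r2, sig_file):
    """Load saved signatures, search for them, and return the matches."""
    r2.cmd("zo %s" % sig_file)  # reload signatures saved with zos
    r2.cmd("z/")                # run the signature search
    r2.cmd("fs sign")           # switch to the sign flag space
    flags = r2.cmdj("fj")       # list the flags as JSON
    r2.cmd("fs *")              # go back to all flag spaces
    return signature_hits(flags)
```

Returning a list per signature keeps the sketch honest about the possibility that one signature matches at more than one address.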
The next task is to get the start and end addresses of the function, which is done by the function below. It returns the start address, the address of the instruction where we set the base address of the configuration data structure, and the end address of the function.
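One possible way to obtain the start and end addresses is sketched here; it does not attempt to locate the instruction that sets the base address. It assumes the function has already been analysed and that the afij command returns a list whose first entry holds the start address under `offset` (or `addr` in some builds) and the length under `size`. The helper names are made up for this example.

```python
def function_bounds(info):
    """Return (start, end) from an afij-style result.

    `end` is the first address past the function (start + size).
    """
    entries = info if isinstance(info, list) else [info]
    if not entries or not entries[0]:
        raise ValueError("no function found at this address")
    entry = entries[0]
    start = entry.get("offset", entry.get("addr"))  # assumed field names
    return start, start + entry["size"]


def bounds_at(r2, addr):
    """Ask radare2 for the function containing `addr` and return its bounds."""
    return function_bounds(r2.cmdj("afij @ 0x%x" % addr))
```

Given an address found by the signature search, the pair returned here can be fed to the search.from and search.to settings used earlier.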
Similarly, there is another function that searches for the function containing the references to all the data structures.
This completes our script: we have discovered the function addresses and used them in the last two lines of code.
In this experiment I ran into a lot of trouble running the emulation, so here are some debugging tips:
- Deeply nested calls can contain system calls that radare2 may not emulate, which might hang the execution.
- An uninitialized global variable used inside a function might hang the emulation.
- Take care of byte ordering (little/big endian) when setting up the structure on the stack, or else the emulator might reference an address outside its valid memory range.
This post ends the three-part series on the partial code emulation feature of radare2. We used this feature to decrypt the configuration of the Mirai malware, and we saw how to make the script more portable by replacing the hard-coded function addresses with a signature-based search. We also looked at some of the techniques for finding the addresses of the encrypted strings; the point of this exercise was to explore other capabilities of radare2.
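To illustrate the byte-ordering tip, the sketch below packs the base address 0x08052800 both ways with Python’s struct module and builds a radare2 wx write command from the result. The `write_pointer_cmd` helper and the stack address in the example are hypothetical; only the packing behaviour is the point.

```python
import struct

CONFIG_BASE = 0x08052800  # base address of the data structure mentioned earlier

little = struct.pack("<I", CONFIG_BASE)  # byte order used by x86
big = struct.pack(">I", CONFIG_BASE)     # byte order used by big-endian targets

print(little.hex())  # 00280508
print(big.hex())     # 08052800


def write_pointer_cmd(stack_addr, value, little_endian=True):
    """Build a `wx` command that writes a 32-bit pointer at `stack_addr`."""
    fmt = "<I" if little_endian else ">I"
    return "wx %s @ 0x%x" % (struct.pack(fmt, value).hex(), stack_addr)


# 0x00178000 is a made-up stack address used only for this example.
print(write_pointer_cmd(0x00178000, CONFIG_BASE))
```

Writing the bytes in the wrong order turns the same pointer into 0x00280508, which is exactly the kind of out-of-range reference that can stall the emulator.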