Docker IDA is an open-source tool that makes large-scale reverse engineering simpler and faster.
Thwarting Malware Threats – It’s Time to Stop Playing in the Sandbox
Many companies contend with hundreds or thousands of suspected malware threats per day. The popular solutions for dealing with such threats today are sandboxes and simplistic static analysis tools, and most of the time these are insufficient. They can return inconsistent results for a file, and a file that uses anti-VM or anti-debugging techniques often leads the investigation to a dead end. In most cases, a professional reverse engineer has to analyze the code manually for deeper inspection and more complex insights. Across a large number of files, this is a slow process and not very cost-effective.
Reverse engineering can give you accurate results about a file, but it is very time-consuming. In the optimal scenario, the researcher would be able to go over all of the files. With the currently available reverse engineering tools, such as IDA Pro or OllyDbg, that is not simple to do on a large scale. For example, if you execute thousands of files in a sandbox and manage to filter out 80% of them, a reverse engineer still has to manually go through every one of the remaining 20%. Automating this deep analysis process would make a huge impact in the malware analysis world.
Here is what the ideal solution would look like:
So why hasn’t this process been scaled before?
Current reverse engineering tools are not designed to work on a large scale. IDA, for example, has a resource-heavy GUI, and its text-mode interface is very challenging to use and automate. You cannot simply deploy instances of IDA with IDAPython, or of OllyDbg with ODbgScript, to a cluster of servers: many scripts, processes, and manual steps would be involved.
Mass-Scale Malware Threats Need a Mass-Scale Solution
We, at Intezer Labs, have developed just such a solution to this problem: Large-Scale Reverse Engineering.
Our Solution ‘Contains’ the Problem
Intezer Labs was inspired by a relatively new trend in software development that allows processing at scale: containers. Containers solve the problem of how to get software to run reliably when moved from one computing environment to another. A container wraps a piece of software in a complete file system that includes everything it needs to run. This is very useful when moving software from a development environment to a test or production environment, and for quick, large-scale deployment. Realizing that the container solution is consistent and reliable, we combined the power of containers with the power of reverse engineering, achieving our goal of large-scale reverse engineering.
The Docker IDA Project
We chose IDA as our preferred reverse engineering tool because many reverse engineers consider it the most powerful choice. We selected Docker as our container technology since it is the most widely supported and stable option.
There were many challenges in dockerizing IDA. For example, IDA was not designed to run at scale. Another challenge was bundling all of the Python libraries we wanted to embed for use with IDAPython (e.g., Sark) to make automation development easy. With Docker IDA and the framework we created in hand, one can very easily create a Docker image, deploy thousands of IDA instances, and have an automated reverse engineering experience.
What can be done with Large-Scale Reverse Engineering?
Run scripts that can do things such as:
- Automated unpacking
- String de-obfuscation
- Checking for buffer overflow exploits
And much more… all on a large scale!
Proof of Concept
To show you the power of Docker IDA, we built a proof of concept and ran it on a large number of files.
We wrote a script that counts the number of calls to each API function; from that data, it can report the number of calls for each “family” of APIs. For example, with the script, we can learn that 40% of a file’s calls go to network functions, 30% to I/O functions, and 30% to cryptography functions. We could not do this with any static tool or sandbox, because we are not only checking the import table but also counting the number of calls to each function, which only a disassembler can do accurately.
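The per-family breakdown described above can be sketched in plain Python. This is only an illustration of the aggregation step, not the script we ran: the API_FAMILIES table and the family_breakdown function are hypothetical names, and the list of call sites is assumed to have been collected already (inside IDA, an IDAPython script would produce it from the disassembly).

```python
from collections import Counter

# Hypothetical mapping from API name to family; a real table would be far larger.
API_FAMILIES = {
    "connect": "network",
    "send": "network",
    "recv": "network",
    "CreateFileW": "io",
    "ReadFile": "io",
    "WriteFile": "io",
    "CryptEncrypt": "crypto",
    "CryptDecrypt": "crypto",
}


def family_breakdown(called_apis, families=API_FAMILIES):
    """Return the percentage of call sites that fall into each API family.

    called_apis holds one entry per call site, so an API that is called
    five times appears five times. Unknown APIs are grouped under "other".
    """
    counts = Counter(families.get(name, "other") for name in called_apis)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 100.0 * n / total for family, n in counts.items()}


# Ten call sites: four network, three I/O, three cryptography.
calls = ["connect"] + ["send"] * 3 + ["ReadFile"] * 3 + ["CryptEncrypt"] * 3
breakdown = family_breakdown(calls)
```

Because the input is a list of call sites rather than a list of imported names, an API that is imported once but called many times carries its full weight in the result, which is the difference from an import-table check.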
We gathered about 1 million malware samples from various online malware repositories. Next, we created a cluster with many Docker IDA instances. Then, we ran our script on that cluster of deployed Docker IDA containers.
We got the results back in a short period of time, and there are now multiple ways to analyze the data and come to useful conclusions.
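One simple way to spread a sample set over many containers is to split it into one batch per worker. The sketch below assumes round-robin assignment; shard_samples and the file names are hypothetical and are not part of the Docker IDA project.

```python
def shard_samples(sample_paths, num_workers):
    """Split sample paths round-robin into num_workers batches.

    Batch i receives every num_workers-th path, starting at index i,
    so batch sizes differ by at most one.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [sample_paths[i::num_workers] for i in range(num_workers)]


# Five hypothetical samples spread over two workers.
batches = shard_samples(["a.bin", "b.bin", "c.bin", "d.bin", "e.bin"], 2)
```

Each batch can then be handed to one container, which runs the same script on every file in its batch.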
Where can I get Docker IDA?
The Docker IDA project, including the Dockerfile with which you can build the Docker image, is available at Intezer’s GitHub repository. If any researcher or team wants to run an IDAPython script on a large number of samples, they are most welcome to contact us. We have a cluster of IDA instances that is already up and running.
Thanks for reading, and also a special thanks to Hex-Rays for creating IDA. We hope you find Docker IDA useful!
After years of experience and research, we are soon launching a game-changing cloud service for instant, deep analysis of unknown cyber threats. It will transform unknown files and memory dumps into an open book, as if an experienced reverse engineer had analyzed the whole assembly code.
Subscribe to our mailing list through our website to get notified about our upcoming launch!