Decompilation gets real

Analyzing binary executables can be a very boring activity, especially when you get used to the
regular patterns. You see the same things again and again. A tool that automates the analysis, or
at least reduces the amount of text to browse, quickly becomes a dream.

On the other hand, if you are new to binary analysis, assembly language looks very unusual
at first. You need to learn not only the processor instructions but also the standard code
sequences, calling conventions, and lots of other stuff. It is understandable that you might
nostalgically remember the high-level languages you know.

Here is a tool that can help you in both cases: a decompiler that can handle real-world
code. Take a binary file, analyze it, and get nicely formatted C text as the output.
I’m tempted to say that you could even recompile it, but no: recompilation is not the goal,
program analysis is.

Let’s see how it works. Take a file, say, a virus – these days many of them are written in a high
level language. One could say that virus writers have a more comfortable environment than
virus analysts. Hopefully the decompiler will change the situation a bit.

Go to the WinMain, check the code.

Here we call several functions; apparently the worm works
with the Internet (see WSAStartup). We could switch to the graph view to display the logic more
explicitly.

As we see, the first block checks a condition. If it is not satisfied, we go to the end
of the function; otherwise there are some checks and some actions. We have to zoom in
and check each block to understand how the worm works.

Or we could use the decompiler to get this nice text:

Watch the demo to see the decompiler at work (5.5MB flash with audio):

Decompilation demo

The decompiler as it is today can handle compiler-generated code. While it produces nice results, it lacks many features, like floating point support, exception handling, and proper type derivation (and I’m sure there are hundreds of bugs in there), but eventually these things will be implemented and fixed.

The beta testing will be open soon. If you want to participate, please apply at (only professional email addresses, please; keep in mind that the decompiler works only with IDA v5.1)

For more news, subscribe to the
mailing list. It is a read-only list.

I’d like to say a couple of words about the decompiler internals. Like anything else in reverse
engineering, the decompiler uses many probabilistic methods and heuristics. This means that
its output cannot be made 100% reliable. This is a caveat for anyone who wants a
decompiler to recover lost source code. If you want automatic recovery, try something else;
this decompiler won’t work for you.

It relies heavily on data-flow analysis methods to analyze the program. In fact, the decompiler
consists of two parts. The first part is an engine that works with microcode. This engine can
reason about the microcode and optimize it with the goal of making it as concise as possible.
The second part converts the microcode to a human-readable form: C text.
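
To make the two-part split more concrete, here is a minimal sketch in Python. It is not the
decompiler's real microcode or its real algorithms: the `Insn` format, the opcode names, and the
two passes (constant propagation and dead-code removal over straight-line code, ignoring 32-bit
wraparound) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Union

Operand = Union[int, str]  # an integer constant or a register/variable name


@dataclass(frozen=True)
class Insn:
    """One hypothetical microcode instruction: dest = a <op> b (or dest = a for "mov")."""
    dest: str
    op: str            # "mov", "add", "sub" or "mul"
    a: Operand
    b: Operand = 0     # ignored for "mov"


FOLD = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y, "mul": lambda x, y: x * y}
SYMBOL = {"add": "+", "sub": "-", "mul": "*"}


def propagate_constants(code):
    """Forward pass: replace names with known constant values and fold constant arithmetic."""
    known = {}  # name -> constant value currently held
    out = []
    for insn in code:
        a = known.get(insn.a, insn.a) if isinstance(insn.a, str) else insn.a
        b = known.get(insn.b, insn.b) if isinstance(insn.b, str) else insn.b
        if insn.op == "mov":
            new = Insn(insn.dest, "mov", a)
        elif isinstance(a, int) and isinstance(b, int):
            new = Insn(insn.dest, "mov", FOLD[insn.op](a, b))
        else:
            new = Insn(insn.dest, insn.op, a, b)
        if new.op == "mov" and isinstance(new.a, int):
            known[new.dest] = new.a      # dest now holds a known constant
        else:
            known.pop(new.dest, None)    # dest no longer holds a known constant
        out.append(new)
    return out


def eliminate_dead(code, live_out):
    """Backward pass: drop instructions whose result is never read afterwards."""
    live = set(live_out)
    kept = []
    for insn in reversed(code):
        if insn.dest not in live:
            continue                     # nobody reads this value
        live.discard(insn.dest)
        operands = (insn.a,) if insn.op == "mov" else (insn.a, insn.b)
        live.update(o for o in operands if isinstance(o, str))
        kept.append(insn)
    kept.reverse()
    return kept


def to_c(code):
    """The 'second part': print the remaining microcode as C-like statements."""
    lines = []
    for insn in code:
        if insn.op == "mov":
            lines.append(f"{insn.dest} = {insn.a};")
        else:
            lines.append(f"{insn.dest} = {insn.a} {SYMBOL[insn.op]} {insn.b};")
    return "\n".join(lines)


example = [
    Insn("eax", "mov", 4),
    Insn("ecx", "mov", 3),
    Insn("eax", "mul", "eax", "ecx"),
    Insn("edx", "add", "eax", "arg0"),
    Insn("ecx", "sub", "ecx", 1),
    Insn("eax", "mov", "edx"),
]
print(to_c(eliminate_dead(propagate_constants(example), live_out={"eax"})))
```

With only `eax` treated as live at the end, the six instructions shrink to two statements. That
is the kind of conciseness the optimization step is after, though a real engine has to cope with
control flow, memory, and much more.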

The second part is quite simple: it just displays nicely formatted text on the screen. The
first part, the optimization engine (I haven’t come up with a nice name for it yet, nor for the
decompiler), is much more interesting. It can be developed into something bigger: something
capable of answering questions about variable ranges, code and data coverage (it
will need inter-function analysis for that), and other things. It could check whether some
invariants hold at given program locations. A program verification tool could be built on top
of such an engine quite easily. Imagine an automatically generated report covering not only
trivial buffer overflows but also other logic flaws in the program. This engine can evolve
into such a platform. This is how I see its bright and promising future :)
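
As a rough illustration of what "answering questions about variable ranges" and "checking
invariants" could look like, here is a small interval-arithmetic sketch in Python. The `Interval`
class and the `check_index` helper are hypothetical names invented for this example; nothing here
describes how the engine actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """All integer values a variable may hold at some program point: lo..hi inclusive."""
    lo: int
    hi: int

    def __add__(self, other):
        # Sum of two ranges: add the lower bounds and the upper bounds.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def scale(self, k):
        # Multiply by a constant; a negative k swaps the bounds.
        ends = (self.lo * k, self.hi * k)
        return Interval(min(ends), max(ends))

    def join(self, other):
        # Merge the ranges coming from two different control-flow paths.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def within(self, lo, hi):
        return lo <= self.lo and self.hi <= hi


def check_index(index, buffer_len):
    """Invariant: every possible index is a valid offset into a buffer of buffer_len bytes."""
    return index.within(0, buffer_len - 1)


# offset = i * 4 + 2, used to access a 64-byte buffer
safe = Interval(0, 15).scale(4) + Interval(2, 2)      # i in 0..15
unsafe = Interval(0, 16).scale(4) + Interval(2, 2)    # i in 0..16
print(check_index(safe, 64), check_index(unsafe, 64))
```

The first access stays inside the buffer and the second one can run past its end, which is the
"trivial buffer overflow" case. Finding logic flaws would need far richer invariants than simple
ranges.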


25 Responses to Decompilation gets real

  1. Joe says:

    Looks very promising, Ilfak! I’ll have to bug people about our request to approve funds for upgrades to 5.1… ;-)

  2. Jeremy says:

    This looks very nice and reminds me of REC (http://www.backerstreet.com/rec/rec.htm), although this looks much more complete than REC.

  3. It is miles and miles beyond REC. It could be said that I am biased because of my relationship with Ilfak, but which decompiler prototype can be run on large real-life programs and yield useful results? It does build upon previous research, sure, but one needs coding abilities to bring past and new research into the realm of usability and practicality.

  4. Chris Rohlf says:

    Ilfak,
    Nice work! One of the things I have implemented in VIPE (my personal disassembler pet project) is dumping all function calls into a C prototype next to the disassembly inline.
    I also build and store that function prototype in a first pass and when I come across the actual function code, I print all the C function prototypes from the locations that reference it (if possible). So it ends up being a nice cross reference when you can see not only that other pieces of code call that function, but exactly what arguments are pushed to it as well.

  5. D says:

    Sorry if this is a stupid question, but I suppose that this decompiler only works on x86 code?

  6. Ilfak Guilfanov says:

    Yes, currently only the x86 processor in the 32-bit flat mode is supported. However, the main engine uses universal intermediate representation (microcode) and can be improved to handle other processors.

  7. Cool stuff, good luck with the beta.

  8. Robert Burns says:

    This is a nice step forward in the field of Binary Analysis, great work Ilfak Guilfanov. I would love to beta test this, I sent you an email if you can get back to me I would love to hear more about this

  9. Marty says:

    What are the conditions of eligibility to this beta testing ? If I have a license for IDA 5.1 and that I send today an e-mail to your address, could I inevitably take part in the beta testing ?

  10. Ilfak Guilfanov says:

    The eligibility conditions are not strict and may change. One thing is certain: if you use a non-professional email address, do not want to disclose your identity, do not seem to have a valid IDA license, or you want the decompiler to recover some lost source code, your application will be rejected. The final decision is mine, sorry for that.
    Subscription to the beta testing seems to be very successful and I already have more beta testers than I planned initially. However, it is still possible to apply. If you have experience in compilers, program optimization, your input will probably be valuable.

  11. lallous says:

    Ilfak, nice work.
    I am more interested in the part of how it analyzes and figures out complex / optimized expressions, loops and if/switches.
    Imagine a scenario of code written by hand in ASM (where you have no really known code constructs/patterns), how well would it perform?

  12. Ilfak Guilfanov says:

    It depends on how many decompiler assumptions are broken in the hand-written code and what you expect as a result.
    For example, note the following.
    If you pass parameters in unusual places or return them in non-standard registers (like CF or EDI), then the decompiler will fail to notice it (without human intervention).
    If you use EBP for your own purposes in a function marked as “bp-framed”, then it will fail.
    It can handle complex expressions but subexpression grouping won’t always be what you want.
    However, I have to say that it does a really good job with loops, ifs, and other constructs. I could hardly do better myself.

  13. J.C. Roberts says:

    wow.
    Just this morning I was reading over docs for COQ (don’t laugh, it’s the real name of the software -link below) to look at some of the available code verification tools out there, and sure enough, (as always), you’re working on something (potentially) better. Congrats Ilfak!
    http://coq.inria.fr/coq-eng.html
    kind regards,
    jcr

  14. GiM says:

    Now this look great. Congratulations Ilfak.
    I’d like to save some money for IDA just to see your baby in action.

  15. Martin Mocko says:

    Great job.
    I hope that there will be some “interactivity” for this feature too.
    That is what I like about IDA most – the machine does its automatic part, and the human resolves what the machine can’t.
    Do you plan some interactivity for decompilation too? For example things like switching for/while/goto, giving decompiler hints about range of loops, renaming temporary variables (which represent registers) etc..

  16. Jawmht says:

    Doesn’t this neglect the possibility that the program might be packed, if it is then wouldn’t that just generate millions of integers and a single function?
    Also, what would IDA make of programs written in Delphi or Assembly?

  17. Iifak, this is wonderful, Its going to make the reversing easier than ever.
    But how about the packed code, say program is packed with UPX…will it be able to handle?
    anyway..its going to be interesting to watch the decompiler in action…!

  18. Ilfak Guilfanov says:

    How about unpacking the file before analyzing it?…

  19. Greg Wroblewski says:

    You may want to research previous work in this area, for an example:
    http://www.itee.uq.edu.au/~cristina/dcc.html
    and further research done by Cristina.
    Greg

  20. Ilfak Guilfanov says:

    Your comment is somewhat late, to put it mildly.
    I read and was inspired by her PhD thesis more than ten years ago. Have you seen my paper from 2001?
    Some background checks before posting comments like this would be really appreciated ;)

  21. Florian Maier says:

    Great work, i love it. Will there be a release for the public?
    //Flo

  22. reflux says:

    yea, will there be a public release?

  23. Mike says:

    Have you considered the problem of aliased sub-registers, like al/ah/ax for eax? Experience tells me that can be a bitch if using SSA (it can create a virtual explosion of assignments), but it unfortunately is a problem that must be solved.
    Also, just in case you haven’t thought of it, eax:edx for int64 return values. :-)

  24. Ilfak Guilfanov says:

    The decompiler knows that al/ax/eax have common parts. I don’t use SSA but had to find a solution for the problem anyway. I plan to describe my approach in a paper. It solves the problem with multiple register return values too.

  25. beou says:

    Hello!!
    Can this stuff decompile a hex file and turn it to C programm?
    What about 8051 and avr code???