Some functions are neater than the decompiler thinks

The decompiler makes some assumptions about the input code: for example, that call instructions usually return, that the memory model is flat, that the function frame is set up properly, and so on. When these assumptions hold, the output is good. When they do not, the output does not correspond to the input. Take, for example, the following snippet:

The decompiler produces the following pseudocode:

Apparently, the v3 variable (it corresponds to edx) is not initialized at all. Why?

This happens because called functions usually spoil some registers. The calling conventions on x86 stipulate that only the esi, edi, ebx, and ebp registers are saved across calls. In other words, other registers may change their values (or be spoiled) by a function call. Since the decompiler assumes that functions obey the regular calling conventions, it separates edx before the call and after the call into two variables. The first variable gets optimized away and is replaced by a1. The second variable (v3) becomes uninitialized.
In fact, there are three possible cases. With respect to the called function, the edx register could be:

  1. unmodified
  2. used to return a value
  3. spoiled

The decompiler chose the default case (#3). Let’s check if it was right. Here’s the disassembly of sub_2A795:

As we see, the edx register is not referenced at all, so this is case #1. If the decompiler could find this out by itself, without our help, our life would be much easier (maybe it will do so in the future!). Meanwhile, we have to add the required information ourselves. We do it using the Edit, Functions, Set function type command in IDA. The callee does not spoil any registers:

The decompiler produces different pseudocode:

Since it knows that edx is not modified by the call, it creates just one variable for both edx instances (before and after the call).
If the called function returned its value in edx (case #2), we would set its type like this:

(this prototype means: a function with one argument on the stack, which is popped by the callee; the result is returned in edx)
The decompiler would create two separate variables for edx, as in case #3. The first one would be optimized away, but the second one would be initialized with the returned value:

As you see, type information plays a very important role in decompilation. To get correct output, correct input (or assumptions) must be given. Otherwise the decompiler works in “garbage in – garbage out” mode.
Always pay attention to the types; it is a good thing to do.
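
To recap the three cases, here is a toy model in Python. It is only a sketch of the reasoning described above, not the decompiler's actual algorithm. The names `Effect`, `default_effect`, `split_across_call`, and the `_before`/`_after` variable names are hypothetical, invented for this illustration.

```python
from enum import Enum

# Registers that, per the article, are saved across calls on x86.
CALLEE_SAVED = {"esi", "edi", "ebx", "ebp"}


class Effect(Enum):
    """What the callee does to a register (hypothetical names)."""
    UNMODIFIED = "unmodified"   # case 1
    RETURNS_VALUE = "returns"   # case 2
    SPOILED = "spoiled"         # case 3


def default_effect(reg):
    """Effect assumed when no type information is given for the callee."""
    return Effect.UNMODIFIED if reg in CALLEE_SAVED else Effect.SPOILED


def split_across_call(reg, effect=None):
    """Return (name_before, name_after, after_is_initialized) for `reg`.

    The variable names are made up for this sketch only.
    """
    if effect is None:
        effect = default_effect(reg)
    before = f"{reg}_before"
    if effect is Effect.UNMODIFIED:
        # One variable covers the register before and after the call.
        return before, before, True
    # Otherwise a second variable is needed after the call; it has a
    # defined value only if the callee returns its result in `reg`.
    return before, f"{reg}_after", effect is Effect.RETURNS_VALUE


if __name__ == "__main__":
    for effect in (None, Effect.UNMODIFIED, Effect.RETURNS_VALUE):
        print(effect, split_across_call("edx", effect))
```

With no type information, edx falls into the spoiled case and the second variable is reported as uninitialized, which mirrors the v3 situation above. Supplying the callee's real effect changes the result.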

4 thoughts on “Some functions are neater than the decompiler thinks”

  1. Ryan,
    It would really surprise me if Microsoft did that. If EBX is modified by inline assembler, the compiler will detect it and preserve the register:

    In addition, by using EBX, ESI or EDI in inline assembly code, you force the compiler to save and restore those registers in the function prologue and epilogue.

    Four registers, EBX, ESI, EDI, and EBP, must be preserved across calls (unless the function is static, in which case the compiler may change the calling convention to a non-standard one – a very common optimization today).

  2. I noticed a bug in IDA: it currently assumes that every procedure argument takes exactly 1 dword of stack space, so with LongLong arguments the stack variables are a mess :(
    call ds:NdisAllocateMemory ; (PVOID *VirtualAddress,UINT Length,UINT MemoryFlags,NDIS_PHYSICAL_ADDRESS HighestAcceptableAddress)
    The last argument above is of type LARGE_INTEGER, so the compiler pushes twice, and IDA doesn’t notice that. :(

  3. Please note that this is not the right place for technical support.
    The described problems were all solved long ago, feel free to upgrade to the latest version.
