Decompiling floating point

It is a nice feeling when, after long nights of debugging, your software
finally runs and produces meaningful results. Another milestone is when other users
start to use it and obtain useful results. Usually this period is very busy: lots
of new bugs are discovered and fixed, and unforeseen corner cases are handled.
Then another period starts: users come back
for more copies, with more ideas, requests for more functionality, etc. This is what is happening
with the decompiler now, and I feel it is time to update you with the latest news.


In short, things are going well. We can currently handle floating point instructions
for Borland and Visual Studio, and some GCC-generated code. Problems remain (especially with optimized code), but we are making good progress. Below are a couple of samples. The first one is very simple. The following assembly function:

_my_sincos proc near

arg_0 = qword ptr 8

push ebp
mov ebp, esp
fld [ebp+arg_0]
fsincos
fxch st(1)
fmul st, st
fxch st(1)
fmul st, st
faddp st(1), st
fsqrt
mov esp, ebp
pop ebp
retn
_my_sincos endp

is converted into the following one-liner:

long double __cdecl my_sincos(double a1)
{
  return sqrt(sin(a1) * sin(a1) + cos(a1) * cos(a1));
}

Pretty simple, you may say… Well, here’s a longer one (sorry for the length of the assembler listing, please scroll down):

?ld_ull_test@@YAOO_K@Z proc near

var_40 = qword ptr -40h
var_38 = qword ptr -38h
var_30 = qword ptr -30h
var_28 = qword ptr -28h
var_20 = qword ptr -20h
var_18 = qword ptr -18h
var_10 = qword ptr -10h
var_8 = qword ptr -8
arg_0 = qword ptr 8
arg_8 = qword ptr 10h

push ebp
mov ebp, esp
sub esp, 28h
mov eax, dword ptr [ebp+arg_8+4]
push eax ; int
mov ecx, dword ptr [ebp+arg_8]
push ecx ; int
sub esp, 8
fld [ebp+arg_0]
fstp [esp+38h+var_38]
call ?ld_ull_add@@YAOO_K@Z
add esp, 10h
fstp [ebp+arg_0]
sub esp, 8
fld [ebp+arg_0]
fstp [esp+30h+var_30]
call ?ld_ull_cvt@@YA_KO@Z
add esp, 8
mov dword ptr [ebp+arg_8], eax
mov dword ptr [ebp+arg_8+4], edx
mov edx, dword ptr [ebp+arg_8+4]
push edx ; int
mov eax, dword ptr [ebp+arg_8]
push eax ; int
sub esp, 8
fld [ebp+arg_0]
fstp [esp+38h+var_38]
call ?ld_ull_sub@@YAOO_K@Z
add esp, 10h
fstp [ebp+arg_0]
mov ecx, dword ptr [ebp+arg_8+4]
push ecx ; int
mov edx, dword ptr [ebp+arg_8]
push edx ; int
sub esp, 8
fld [ebp+arg_0]
fstp [esp+38h+var_38]
call ?ld_ull_mul@@YAOO_K@Z
add esp, 10h
fstp [ebp+arg_0]
mov eax, dword ptr [ebp+arg_8]
mov ecx, dword ptr [ebp+arg_8+4]
mov dword ptr [ebp+var_8], eax
mov dword ptr [ebp+var_8+4], ecx
mov edx, dword ptr [ebp+var_8+4]
mov dword ptr [ebp+var_10+4], edx
and dword ptr [ebp+var_8+4], 7FFFFFFFh
fild [ebp+var_8]
and dword ptr [ebp+var_10+4], 80000000h
mov dword ptr [ebp+var_10], 0
fild [ebp+var_10]
fchs
faddp st(1), st
fcomp [ebp+arg_0]
fnstsw ax
test ah, 41h
jnz short loc_F0A
mov eax, dword ptr [ebp+arg_8+4]
push eax ; int
mov ecx, dword ptr [ebp+arg_8]
push ecx ; int
sub esp, 8
fld [ebp+arg_0]
fstp [esp+38h+var_38]
call ?ld_ull_div@@YAOO_K@Z
add esp, 10h
fstp [ebp+arg_0]
jmp short loc_F2D
; ---------------------------------------------------------------------------
loc_F0A: ; int
push 0
push 4D2h ; int
mov edx, dword ptr [ebp+arg_8+4]
push edx ; int
mov eax, dword ptr [ebp+arg_8]
push eax ; int
sub esp, 8
fld [ebp+arg_0]
fstp [esp+40h+var_40]
call ?ld_ull_calc@@YAOO_K0@Z
add esp, 18h
fstp [ebp+arg_0]
loc_F2D:
mov ecx, dword ptr [ebp+arg_8+4]
push ecx ; int
mov edx, dword ptr [ebp+arg_8]
push edx ; int
sub esp, 8
fld [ebp+arg_0]
fstp [esp+38h+var_38]
call ?ld_ull_cmpeq@@YA_NO_K@Z
add esp, 10h
movzx eax, al
test eax, eax
jz short loc_F83
mov ecx, dword ptr [ebp+arg_8]
mov edx, dword ptr [ebp+arg_8+4]
mov dword ptr [ebp+var_18], ecx
mov dword ptr [ebp+var_18+4], edx
mov eax, dword ptr [ebp+var_18+4]
mov dword ptr [ebp+var_20+4], eax
and dword ptr [ebp+var_18+4], 7FFFFFFFh
fild [ebp+var_18]
and dword ptr [ebp+var_20+4], 80000000h
mov dword ptr [ebp+var_20], 0
fild [ebp+var_20]
fchs
faddp st(1), st
fstp [ebp+var_28]
jmp short loc_F89
; ---------------------------------------------------------------------------
loc_F83:
fld [ebp+arg_0]
fstp [ebp+var_28]
loc_F89:
fld [ebp+var_28]
mov esp, ebp
pop ebp
retn
?ld_ull_test@@YAOO_K@Z endp

The above code is translated into:

double __cdecl ld_ull_test(double a1, __int64 a2)
{
  double v2; // st7@1
  double v4; // [sp+18h] [bp-28h]@5
  double v5; // [sp+48h] [bp+8h]@1
  double v6; // [sp+48h] [bp+8h]@1
  double v7; // [sp+48h] [bp+8h]@2
  unsigned __int64 v8; // [sp+50h] [bp+10h]@1

  v6 = ld_ull_add(a1, a2);
  v8 = ld_ull_cvt(v6);
  v2 = ld_ull_sub(v6, v8);
  v5 = ld_ull_mul(v2, v8);
  if ( (double)v8 <= v5 )
    v7 = ld_ull_calc(v5, v8, 1234i64);
  else
    v7 = ld_ull_div(v5, v8);
  if ( ld_ull_cmpeq(v7, v8) )
    v4 = (double)v8;
  else
    v4 = v7;
  return v4;
}
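Each `(double)v8` cast in the output stands for a longer instruction sequence in the listing: the value is split into its low 63 bits and its top bit, both parts are loaded with the signed `fild`, the top-bit part is negated with `fchs`, and the two are added with `faddp`. The detour is needed because `fild` only loads signed integers. Below is a minimal C sketch of the same arithmetic. The function names are made up for illustration, and because this version rounds to `double` at each step, it may differ from a direct cast in the last bit for some large inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64-bit to double using only signed loads, mirroring the
   and / fild / and / fild / fchs / faddp pattern in the listing. */
double u64_to_double(uint64_t x)
{
    /* and ..., 7FFFFFFFh + fild: low 63 bits, always non-negative */
    int64_t low63 = (int64_t)(x & 0x7FFFFFFFFFFFFFFFULL);

    /* and ..., 80000000h + fild: the top bit alone reads as 0 or -2^63 */
    double top = (x >> 63) ? (double)INT64_MIN : 0.0;

    /* fchs + faddp: negate the top part and add it to the low part */
    return (double)low63 + (-top);
}

/* Spot checks on values that are exactly representable as doubles. */
void u64_to_double_checks(void)
{
    assert(u64_to_double(0) == 0.0);
    assert(u64_to_double(1234) == 1234.0);
    assert(u64_to_double(0x8000000000000000ULL) == 9223372036854775808.0);
}
```

Seeing the whole pattern collapse into a single cast is a large part of why the C listing is so much shorter than the assembly.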

I strongly prefer the second listing to the first. In fact, the more I use
the decompiler, the less I want to return to the assembly level (this means that
you may expect source level debugging and other similar improvements in the future ;)

In order to handle floating point, we also had to improve many other aspects
of the decompiler. Here are the things I remember offhand:

  • We changed the stack variable allocation mechanism to use data flow information.
    In practice this means that reused stack frame slots are recognized and multiple
    variables are created for them. No more funny casts because of stack slot reuse!

  • Stack variables are now treated as first-class citizens by the propagation and
    other algorithms. Previous versions of the decompiler optimized registers,
    but stack variables were not optimized much. In practice: shorter and cleaner output.
    This improvement, combined with the previous one, allows us to handle reused
    function stack arguments very smoothly. It goes without saying that aliased
    stack variables are still not optimized (unfortunately, this cannot be done
    automatically).

  • Made the optimization rules more robust and more efficient
  • Added more rules to remove unnecessary casts
  • Added a new algorithm to recognize call arguments
  • Better user interface (as usual, improving the UI is always a good idea ;)
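To make the first two items concrete, here is a hand-written C illustration of stack slot reuse. It is not decompiler output; the function names and bodies are invented. The first function models a single variable standing in for a slot that holds an `int` and later a `float` (with `memcpy` standing in for the pointer casts), and the second shows the same logic once each lifetime gets its own variable.

```c
#include <assert.h>
#include <string.h>

/* One variable for a reused 4-byte slot.
   Assumption: sizeof(int) == sizeof(float). */
int scale_one_slot(int n)
{
    int v1;
    float tmp;

    v1 = n * 3;                     /* first lifetime: an int */
    tmp = (float)v1 / 2.0f;
    memcpy(&v1, &tmp, sizeof v1);   /* second lifetime: float bits in the same slot */
    memcpy(&tmp, &v1, sizeof tmp);  /* read back, like *(float *)&v1 */
    return (int)tmp;
}

/* Same logic with one variable per lifetime: no reinterpretation needed. */
int scale_two_vars(int n)
{
    int v1 = n * 3;
    float v2 = (float)v1 / 2.0f;
    return (int)v2;
}

/* Both shapes compute the same value. */
void slot_reuse_checks(void)
{
    assert(scale_one_slot(5) == scale_two_vars(5));
}
```

The two functions behave identically; the difference is only in how readable the result is, which is the point of splitting reused slots into separate variables.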

This list could go on with more details, but let’s stop here.
Since there are some substantial changes, we will run a beta test for the next
release. It is not that far away now – probably even this month!


4 Responses to Decompiling floating point

  1. Kender says:

    Excellent! I’ve been waiting/hoping for this for a long time.
    I *hate* floating point in assembly.
    And nicely within my year of free updates too :)

  2. karthik says:

    awesome! looking forward to it.
    Just waiting for my purchase dept to get a copy.

  3. Alex Ionescu says:

    Excellent — although there’s not much FPU code in the areas I work on, I’m really happy to see improvements #1 and 2. It seems with kernel code especially (and on checked builds), Hex-Rays was way too nice with stack allocations.
    I can finally stop having an NTSTATUS Status1, Status2, Status3, Status4, Status5, Status6, Status7……15 in a function, if I understand correctly that Hex-Rays now understands that they’re all actually the same variable.
    Feature request: Have cmpxchg, xchg recognized and converted to MSVC inlines (_InterlockedCompareExchange, etc)

  4. Jerremy says:

    Great step forward, I’ve been waiting for floating point support to be added before my purchase.
    Although the reverse engineer project I’m working on uses a lot of MMX instructions.