x64 decompiler not far away

Just a short post to show you the current state of the x64 decompiler. In fact, it already mostly works but we still have to solve some minor problems. Let us consider this source code:

struct color_t
{
  short red;
  short green;
  short blue;
  short alpha;
};

extern color_t lighten(color_t c);

color_t func(int red, int green, int blue, int alpha)
{
  color_t c;
  c.red = red;
  c.green = green;
  c.blue = blue;
  c.alpha = alpha;
  return lighten(c);
}

After compilation we get the following binary code:

.text:0000000000000000 ?func@@YA?AUcolor_t@@HHHH@Z proc near
.text:0000000000000000 c = color_t ptr -18h
.text:0000000000000000 var_10 = qword ptr -10h
.text:0000000000000000 arg_0 = dword ptr 8
.text:0000000000000000 arg_8 = dword ptr 10h
.text:0000000000000000 arg_10 = dword ptr 18h
.text:0000000000000000 arg_18 = dword ptr 20h
.text:0000000000000000 mov [rsp+arg_18], r9d ; $LN3
.text:0000000000000005 mov [rsp+arg_10], r8d
.text:000000000000000A mov [rsp+arg_8], edx
.text:000000000000000E mov [rsp+arg_0], ecx
.text:0000000000000012 sub rsp, 38h
.text:0000000000000016 movzx eax, word ptr [rsp+38h+arg_0]
.text:000000000000001B mov [rsp+38h+c.red], ax
.text:0000000000000020 movzx eax, word ptr [rsp+38h+arg_8]
.text:0000000000000025 mov [rsp+38h+c.green], ax
.text:000000000000002A movzx eax, word ptr [rsp+38h+arg_10]
.text:000000000000002F mov [rsp+38h+c.blue], ax
.text:0000000000000034 movzx eax, word ptr [rsp+38h+arg_18]
.text:0000000000000039 mov [rsp+38h+c.alpha], ax
.text:000000000000003E mov rcx, qword ptr [rsp+38h+c.red] ; c
.text:0000000000000043 call ?lighten@@YA?AUcolor_t@@U1@@Z ; lighten(color_t)
.text:0000000000000048 mov [rsp+38h+var_10], rax
.text:000000000000004D mov rax, [rsp+38h+var_10]
.text:0000000000000052 add rsp, 38h
.text:0000000000000056 retn
.text:0000000000000056 ?func@@YA?AUcolor_t@@HHHH@Z endp

Please note that c, which is a structure, is passed by value in a register: the single qword load at offset 3Eh places all four of its fields in rcx. We had to rework quite a few things in the decompiler to support such variables (we call them scattered variables). However, the output was worth it:

color_t __fastcall func(__int16 cx0, __int16 dx0, __int16 r8_0, __int16 r9_0)
{
  color_t c;

  c.red = cx0;
  c.green = dx0;
  c.blue = r8_0;
  c.alpha = r9_0;
  return lighten(c);
}
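
To make the register passing more concrete, here is a minimal sketch in portable C++ that mirrors what the listing does. It is not decompiler output: the helper names make_color and as_register are ours, and the byte order mentioned below assumes a little-endian x64 target.

```cpp
#include <cstdint>
#include <cstring>

struct color_t {
    short red;
    short green;
    short blue;
    short alpha;
};

// Four 16-bit fields and no padding: the structure is 8 bytes in total,
// which is why it can fit in one 64-bit register.
static_assert(sizeof(color_t) == 8, "color_t is expected to occupy 8 bytes");

// Mirrors each `movzx eax, word ptr [...]` / `mov [c.field], ax` pair:
// only the low 16 bits of every int argument are stored.
color_t make_color(int red, int green, int blue, int alpha) {
    color_t c;
    c.red   = static_cast<short>(red);
    c.green = static_cast<short>(green);
    c.blue  = static_cast<short>(blue);
    c.alpha = static_cast<short>(alpha);
    return c;
}

// Mirrors `mov rcx, qword ptr [c.red]`: all four fields are read as one
// 64-bit value (hypothetical helper, for illustration only).
std::uint64_t as_register(const color_t &c) {
    std::uint64_t value;
    std::memcpy(&value, &c, sizeof value);
    return value;
}
```

On a little-endian machine, as_register(make_color(1, 2, 3, 4)) yields 0x0004000300020001, with red in the low 16 bits. The truncation in make_color is also why the decompiler can safely show the four arguments as __int16.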

There is still some work to be done, but it seems we have solved the most problematic issues. Stay tuned, there will be more decompiler news soon!


New features in Hex-Rays Decompiler 1.6

Last week we released IDA 6.2 and Hex-Rays Decompiler 1.6. Many of the new IDA features have been described in previous posts, but there have been notable additions in the decompiler as well. They let you make the decompilation cleaner and closer to the original source. However, it might not be obvious how to use some of them, so we will describe them in more detail.

1. Variable mapping

This is probably the simplest new feature and can be used without any extra preparation.

Sometimes the compiler stores the same variable in several places (e.g. a register and a stack slot). While the decompiler often manages to combine such locations, sometimes it cannot prove that they always contain the same value (especially in the presence of calls that take the address of stack variables). In such cases the user can help by performing the merge, or mapping, manually.

Consider the following very common case:

int __stdcall SciFreeFilterInstance(_FILTER_INSTANCE *pFilterInstance)
  _FILTER_INSTANCE *v1; // esi@1

  v1 = pFilterInstance;
  if ( pFilterInstance->Signature != 'FrtS' )
  StreamClassDebugPrint(2, "Freeing filterinstance %p still open streams\n", v1);

The compiler copied an incoming argument (pFilterInstance) into a register (v1==esi). To get rid of the extra name, right-click the left-hand variable and choose “Map to another variable”, or place the cursor on it and press ‘=’:


Choose the right-hand variable from the list.


Once decompilation is refreshed, both the left-hand variable (v1) and the assignment are gone. Now we have only one variable – the incoming argument.

int __stdcall SciFreeFilterInstance(_FILTER_INSTANCE *pFilterInstance)
  if ( pFilterInstance->Signature != 'FrtS' )
  StreamClassDebugPrint(2, "Freeing filterinstance %p still open streams\n", pFilterInstance);

You can map several variables to the same name, if necessary.

Made a mistake or mapped too much? It’s simple to fix. Right-click the wrongly mapped name and choose “Unmap variables”. Then choose the variable you want to see again.

2. Union selection

This feature, naturally, only applies to unions. That means that you need to have union types in your database and assign the types to some variables or fields.

Normally the decompiler tries to choose a union field which matches the expression best, but sometimes there are several equally valid matches, and sometimes other types in the expression are wrong. In such cases, you can override the decompiler’s decision. For example, this code is common in Windows drivers:

NTSTATUS __stdcall DispatchDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
  PIO_STACK_LOCATION stacklocation; // ebx@1

  stacklocation = Irp->Tail.Overlay.CurrentStackLocation;
  if ( *&stacklocation->Parameters.Create.FileAttributes == 0x224010 )
    v8 = stacklocation->Parameters.Create.Options == 20;
    if ( !v8 )
      goto LABEL_18;
    if ( stacklocation->Parameters.Create.SecurityContext < 1 )
      goto LABEL_87;
    v23 = Irp->AssociatedIrp.MasterIrp;

Since we know we’re in a DeviceControl handler, it’s likely the code is inspecting the Parameters.DeviceIoControl substructure and not Parameters.Create.

Right-click the field and choose “Select union field”, or place the cursor on it and press Alt-Y.


Choose the Parameters.DeviceIoControl.IoControlCode field.


Other references to Parameters.Create can be fixed the same way. The updated decompilation makes more sense:

NTSTATUS __stdcall DispatchDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
  PIO_STACK_LOCATION stacklocation; // ebx@1

  stacklocation = Irp->Tail.Overlay.CurrentStackLocation;
  if ( stacklocation->Parameters.DeviceIoControl.IoControlCode == 0x224010 )
    v8 = stacklocation->Parameters.DeviceIoControl.InputBufferLength == 20;
    if ( !v8 )
      goto LABEL_18;
    if ( stacklocation->Parameters.DeviceIoControl.OutputBufferLength < 1 )
      goto LABEL_87;
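
To illustrate why both readings are equally valid to the decompiler, here is a minimal sketch of a union whose members overlap the same way. The layouts below are simplified, hypothetical 32-bit stand-ins (pointers are modeled as 32-bit integers) and not the real WDK definitions; the helper names are ours.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for Parameters.Create (assumed 32-bit layout).
struct CreateParams {
    std::uint32_t SecurityContext;   // a pointer in the real structure
    std::uint32_t Options;
    std::uint16_t FileAttributes;
    std::uint16_t ShareAccess;
    std::uint32_t EaLength;
};

// Simplified stand-in for Parameters.DeviceIoControl (assumed 32-bit layout).
struct DeviceIoControlParams {
    std::uint32_t OutputBufferLength;
    std::uint32_t InputBufferLength;
    std::uint32_t IoControlCode;
    std::uint32_t Type3InputBuffer;  // a pointer in the real structure
};

union Parameters {
    CreateParams          Create;
    DeviceIoControlParams DeviceIoControl;
};

// The fields the decompiler picked and the fields we selected by hand
// live at the same offsets, so the bytes alone cannot tell them apart.
static_assert(offsetof(CreateParams, FileAttributes) ==
              offsetof(DeviceIoControlParams, IoControlCode), "same offset");
static_assert(offsetof(CreateParams, Options) ==
              offsetof(DeviceIoControlParams, InputBufferLength), "same offset");
static_assert(offsetof(CreateParams, SecurityContext) ==
              offsetof(DeviceIoControlParams, OutputBufferLength), "same offset");

// Hypothetical helper: fill the union the way a device control request would.
Parameters make_ioctl(std::uint32_t code, std::uint32_t in_len, std::uint32_t out_len) {
    Parameters p;
    p.DeviceIoControl = DeviceIoControlParams{out_len, in_len, code, 0};
    return p;
}

// The first two checks of the handler, written against the matching member.
bool is_expected_request(const Parameters &p) {
    return p.DeviceIoControl.IoControlCode == 0x224010 &&
           p.DeviceIoControl.InputBufferLength == 20;
}
```

Note that FileAttributes is only 16 bits wide while the code compares 32 bits, which is a likely reason for the odd-looking *& in the first decompilation.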


3. CONTAINING_RECORD macro

The CONTAINING_RECORD macro is commonly used in Windows drivers to get a pointer to the parent structure when we have a pointer to one of its fields.

For example, consider these two structures, used in a driver:

struct _HW_STREAM_OBJECT
{
  ULONG  SizeOfThisPacket;
  ULONG  StreamNumber;
  PVOID  HwStreamExtension;
};

struct _STREAM_OBJECT
{
  _FILE_OBJECT *FilterFileObject;
  _FILE_OBJECT *FileObject;
  _FILTER_INSTANCE *FilterInstance;
  _HW_STREAM_OBJECT HwStreamObject;
};

The following function accepts a pointer to _HW_STREAM_OBJECT:

void __cdecl StreamClassStreamNotification(
  int NotificationType,
  _HW_STREAM_OBJECT *StreamObject,
  _KSEVENT_ENTRY *EventEntry,
  GUID *EventSet,
  ULONG EventId);

But it immediately converts it into the containing _STREAM_OBJECT:

mov     eax, [ebp+StreamObject]
test    eax, eax
push    ebx
push    esi
lea     esi, [eax-_STREAM_OBJECT.HwStreamObject]

Default decompilation doesn’t look great:

  char *v6; // esi@1
  v6 = (char *)&StreamObject[-2] - 36;

There are two ways to make it nicer:

  1. Change the type of v6 to _STREAM_OBJECT*. The decompiler will detect that the expression “lines up” and convert it to use the macro.
  2. Right-click the delta being subtracted (-36), select “Structure offset” and choose _STREAM_OBJECT from the list.

In both cases you should get a nice expression:

  v6 = CONTAINING_RECORD(StreamObject, _STREAM_OBJECT, HwStreamObject);

N.B.: currently you need to refresh the decompilation (press F5) to see the changes. We’ll improve this to happen automatically in the future.
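
For readers who have not met the macro before, here is a minimal sketch of the pointer arithmetic behind it. The macro is deliberately named CONTAINING_RECORD_SKETCH to mark it as our illustration, and DemoHwStream and DemoStream are hypothetical stand-ins for the driver structures, not their real definitions.

```cpp
#include <cassert>
#include <cstddef>

// Step back from the address of a field to the start of the structure
// that contains it: subtract the field's offset from the field's address.
#define CONTAINING_RECORD_SKETCH(address, type, field) \
    (reinterpret_cast<type *>(                          \
        reinterpret_cast<char *>(address) - offsetof(type, field)))

// Hypothetical stand-in for the embedded structure.
struct DemoHwStream {
    unsigned long SizeOfThisPacket;
    unsigned long StreamNumber;
    void         *HwStreamExtension;
};

// Hypothetical stand-in for the parent structure.
struct DemoStream {
    void        *FilterFileObject;
    void        *FileObject;
    void        *FilterInstance;
    DemoHwStream HwStreamObject;   // callers only ever see a pointer to this
};

// What the driver does on entry: recover the parent from the field pointer.
DemoStream *owner_of(DemoHwStream *hw) {
    assert(hw != nullptr);
    return CONTAINING_RECORD_SKETCH(hw, DemoStream, HwStreamObject);
}
```

The subtraction is the same one the lea instruction performs above. Without type information the decompiler can only express it as pointer arithmetic on StreamObject, which is where the unreadable [-2] and -36 come from.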

4. Kernel and user-mode macros involving fs segment access

On Windows, the fs segment is used to store various thread-specific (in user mode) or processor-specific (in kernel mode) data. Hex-Rays Decompiler 1.6 detects the most common ways of accessing this data and converts them to the corresponding macros. However, this functionality requires the presence of specific types in the database: for user mode it is the _TEB structure, and for kernel mode it is the KPCR structure.

For example, consider the following code:

mov     eax, large fs:18h
mov     eax, [eax+30h]
push    24h
push    8
push    dword ptr [eax+18h]
call    ds:__imp__RtlAllocateHeap@12 ; RtlAllocateHeap(x,x,x)
mov     esi, eax

If you don’t have the _TEB structure in types, this will be decompiled to:

  v5 = RtlAllocateHeap(*(_DWORD *)(*(_DWORD *)(__readfsdword(24) + 48) + 24), 8, 36);

However, if you do add the type, it will look much nicer:

  v5 = RtlAllocateHeap(NtCurrentTeb()->ProcessEnvironmentBlock->ProcessHeap, 8, 36);
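
The two forms read exactly the same memory; the types only give names to the offsets. Here is a minimal sketch that models this on a toy address space, since portable code cannot read fs directly. Teb32, Peb32 and FakeMemory are our simplified, hypothetical models: only the three fields used by the snippet are named, the rest is padding, and all addresses are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Peb32 {
    std::uint8_t  reserved0[0x18];
    std::uint32_t ProcessHeap;               // offset 0x18 (24)
};

struct Teb32 {
    std::uint8_t  reserved0[0x18];
    std::uint32_t Self;                      // offset 0x18 (24), read via fs:18h
    std::uint8_t  reserved1[0x30 - 0x1C];
    std::uint32_t ProcessEnvironmentBlock;   // offset 0x30 (48)
};

static_assert(offsetof(Peb32, ProcessHeap) == 24, "matches [eax+18h]");
static_assert(offsetof(Teb32, Self) == 24, "matches fs:18h");
static_assert(offsetof(Teb32, ProcessEnvironmentBlock) == 48, "matches [eax+30h]");

// A toy 32-bit address space: an "address" is an index into bytes.
struct FakeMemory { std::uint8_t bytes[0x400]; };

std::uint32_t read32(const FakeMemory &m, std::uint32_t addr) {
    assert(addr + 4 <= sizeof m.bytes);
    std::uint32_t v;
    std::memcpy(&v, m.bytes + addr, sizeof v);
    return v;
}

void write32(FakeMemory &m, std::uint32_t addr, std::uint32_t v) {
    assert(addr + 4 <= sizeof m.bytes);
    std::memcpy(m.bytes + addr, &v, sizeof v);
}

// Untyped form: *(_DWORD *)(*(_DWORD *)(__readfsdword(24) + 48) + 24)
std::uint32_t heap_untyped(const FakeMemory &m, std::uint32_t fs_base) {
    return read32(m, read32(m, read32(m, fs_base + 24) + 48) + 24);
}

// Typed form: NtCurrentTeb()->ProcessEnvironmentBlock->ProcessHeap
std::uint32_t heap_typed(const FakeMemory &m, std::uint32_t fs_base) {
    const std::uint32_t self =
        read32(m, fs_base + static_cast<std::uint32_t>(offsetof(Teb32, Self)));
    Teb32 teb;
    std::memcpy(&teb, m.bytes + self, sizeof teb);
    Peb32 peb;
    std::memcpy(&peb, m.bytes + teb.ProcessEnvironmentBlock, sizeof peb);
    return peb.ProcessHeap;
}

// Build a tiny memory image: TEB at 0x100, PEB at 0x200 (made-up addresses).
FakeMemory make_demo_memory() {
    FakeMemory m{};
    write32(m, 0x100 + 0x18, 0x100);        // TEB.Self points at the TEB
    write32(m, 0x100 + 0x30, 0x200);        // TEB.ProcessEnvironmentBlock
    write32(m, 0x200 + 0x18, 0x00150000);   // PEB.ProcessHeap (made-up handle)
    return m;
}
```

With the image from make_demo_memory and an fs base of 0x100, both heap_untyped and heap_typed return the same made-up heap handle, 0x00150000.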

Currently we support the following macros:

Macro                        Required types
NtCurrentTeb                 _TEB
KeGetCurrentPrcb             KPCR, KPRCB
KeGetCurrentProcessorNumber  KPCR
KeGetCurrentThread           KPCR, _KTHREAD

Hint: the easiest way to get _TEB or KPCR types into your database is using the PDB plugin. Invoke it from File|Load file|PDB file…, enter a path to kernel32.dll (for user-mode code) or ntoskrnl.exe (for kernel-mode code), and check the “Types only” checkbox.


PDBs for those two files usually contain the necessary OS structures.

We hope you will like these new additions. Note that version 1.6 includes even more improvements and fixes; see the full list of new features and the comparison page.

Recon 2011: Practical C++ Decompilation

Last month I visited the Recon conference and had a great time again. I gave a talk on C++ decompilation and how to handle it in IDA and the Hex-Rays decompiler. You can get the slides here, and download the recorded talk here.

Edit: for some reason the streaming version does not show anything after the intro, please download the Quicktime version until it’s fixed.


Hex-Rays against Aurora

As everyone knows, Google and some other companies were under a targeted attack a few days ago. A vulnerability in Internet Explorer was used to penetrate the computers.
An IDA user very kindly sent us the following link

Continue reading Hex-Rays against Aurora

Hex-Rays Decompiler primer

The Hex-Rays Decompiler 1.0 was released more than two years ago.
Since then it has improved a lot and does a great job decompiling real-life code, but sometimes there are additional things that you might wish to do with its output.
For that purpose we have released the Hex-Rays Decompiler SDK and several sample plugins.
However, the header files alone do not give a complete picture and it can be difficult to see where to start.

In this post we will outline the architecture of the Hex-Rays Decompiler SDK, cover some principles and finally wrap everything we discussed and write a small plugin.

Continue reading Hex-Rays Decompiler primer

Decompiling floating point

It is a nice feeling when, after long debugging nights, your software finally runs and produces meaningful results. Another hallmark is when other users start to use it and obtain useful results. Usually this period is very busy: lots of new bugs are discovered and fixed, and unforeseen corner cases are handled. Then another period starts, when users come back for more copies, with more ideas, requests for more functionality, etc. This is what is happening with the decompiler now, and I feel it is time to update you with the latest news.

Continue reading Decompiling floating point

From simple to complex

Last week Elias ran a sample malware in the Bochs emulator and I was curious to see what exactly it does. So I took the unpacked version of the malware and fed it into the decompiler. It turned out to be a pretty short downloader (different AV vendors give it different names: Lighty after the compression method, or FraudLoad, or FakeAlert, etc). Such simple code is very easy to decompile. I renamed some functions and added some comments to it. The final text looks like this:

Continue reading From simple to complex

BITS used as a covert channel

The idea to use BITS to download files from the internet is not new. If you check the corresponding page from Wikipedia, you will find that
Background Intelligent Transfer Service (BITS) is a component of modern Microsoft Windows operating systems that facilitates prioritized, throttled, and asynchronous transfer of files between machines using idle network bandwidth.
The web page ends with a list of third-party applications that use BITS. However, like any technical method, it can be used for evil purposes as well. Eric Landuyt analyzed a piece of malware that abuses it:
I liked the “proof of concept” WinDbg script that runs the malware in a controlled manner. Breakpoints with actions are very powerful, indeed.
Nice work, Eric!

Some functions are neater than the decompiler thinks

The decompiler makes some assumptions about the input code: that call instructions usually return, that the memory model is flat, that the function frame is set up properly, and so on. When these assumptions are correct, the output is good. When they are wrong, well, the output does not correspond to the input. Take, for example, the following snippet:

The decompiler produces the following pseudocode:

Apparently, the v3 variable (it corresponds to edx) is not initialized at all. Why?

Continue reading Some functions are neater than the decompiler thinks