Recon 2012: Compiler Internals

This year I again was lucky to present at Recon in Montreal. There were many great talks as usual. I combined the topic of my last year’s talk on C++ reversing and my OpenRCE article on Visual C++ internals. New material was implementation of exceptions and RTTI in MSVC x64 and GCC (including Apple’s iOS).

The videos are not up yet but here are the slides of my presentation and a few demo scripts I made for it to parse GCC’s RTTI structures and exception tables. I also added my old scripts from OpenRCE which I amended slightly for the current IDA versions (mostly changed hotkeys).


Calling IDA APIs from IDAPython with ctypes

IDAPython provides wrappers for a big chunk of IDA SDK. Still, there are some APIs that are not wrapped because of SWIG limitations or just because we didn’t get to them yet. Recently, I needed to test the get_loader_name() API which is not available in IDAPython but I didn’t want to write a full plugin just for one call. For such cases it’s often possible to use the ctypes module to call the function manually.

The IDA APIs are provided by the kernel dynamic library. In Windows, it’s called ida.wll (or ida64.wll), in Linux libida[64].so and on OS X libida[64].dylib. ctypes provides a nice feature that dynamically creates a callable wrapper for a DLL export by treating it as an attribute of a special class instance. Here’s how to get that instance under the three platforms supported by IDA:

import ctypes
idaname = "ida64" if __EA64__ else "ida"
if sys.platform == "win32":
    dll = ctypes.windll[idaname + ".wll"]
elif sys.platform == "linux2":
    dll = ctypes.cdll["lib" + idaname + ".so"]
elif sys.platform == "darwin":
    dll = ctypes.cdll["lib" + idaname + ".dylib"]

We use “windll” because IDA APIs use stdcall calling convention on Windows (check the definition of idaapi in pro.h).

Now we just need to call our function just as if it was an attribute of the “dll” object. But first we need to prepare the arguments. Here’s the declaration from loader.hpp:

idaman ssize_t ida_export get_loader_name(char *buf, size_t bufsize);

ctypes provides a convenience functions for creating character buffers:

buf = ctypes.create_string_buffer(256)

And now we can call the function:

dll.get_loader_name(buf, 256)

To retrieve the contents of the buffer as a Python byte string, just use its .raw attribute. The complete script now looks like this:

import ctypes
idaname = "ida64" if __EA64__ else "ida"
if sys.platform == "win32":
    dll = ctypes.windll[idaname + ".wll"]
elif sys.platform == "linux2":
    dll = ctypes.cdll["lib" + idaname + ".so"]
elif sys.platform == "darwin":
    dll = ctypes.cdll["lib" + idaname + ".dylib"]
buf = ctypes.create_string_buffer(256)
dll.get_loader_name(buf, 256)
print "loader:", buf.raw

ctypes offers many means to interface with C code, so you can use it to call almost any IDA API.

The trace replayer

One of the new features that will be available in the next version of IDA is a trace re-player. This pseudo-debugger allows to re-play execution traces of programs debugged in IDA. The replayer debugger allows replaying traces recorded with any of the currently supported debuggers, ranging from local Linux or win32 debuggers to remote GDB targets. Currently supported targets include x86, x86_64, ARM, MIPS and PPC.

When we are re-playing a recorded trace, we can step forward and backward, set breakpoints, inspect register values, change the instruction pointer to any recorded IP, etc…

Also, trace management capabilities have been added to IDA in order to allow saving and loading recorded execution traces. Let’s see an example.

Continue reading The trace replayer

New features in Hex-Rays Decompiler 1.6

Last week we released IDA 6.2 and Hex-Rays Decompiler 1.6. Many of the new IDA features have been described in previous posts, but there have been notable additions in the decompiler as well. They will let you make the decompilation cleaner and closer to the original source. However, it might be not very obvious how to use some of them, so we will describe them in more detail.

1. Variable mapping

This is probably the simplest new feature and can be used without any extra preparation.

Sometimes the compiler stores the same variable in several places (e.g. a register and a stack slot). While the decompiler often manages to combine such locations, sometimes it’s not able to prove that they always contain the same value (especially in presence of calls that take address of stack variables). In such cases the user can help by performing such a merge or mapping manually.

Consider the following very common case:

int __stdcall SciFreeFilterInstance(_FILTER_INSTANCE *pFilterInstance)
  _FILTER_INSTANCE *v1; // esi@1

  v1 = pFilterInstance;
  if ( pFilterInstance->Signature != 'FrtS' )
  StreamClassDebugPrint(2, "Freeing filterinstance %p still open streams\n", v1);

The compiler copied an incoming argument (pFilterInstance) into a register (v1==esi). To get rid of the extra name, right-click the left-hand variable and choose “Map to another variable”, or place cursor on it and press ‘=’:


Choose the right-hand variable from the list.


Once decompilation is refreshed, both the left-hand variable (v1) and the assignment are gone. Now we have only one variable – the incoming argument.

int __stdcall SciFreeFilterInstance(_FILTER_INSTANCE *pFilterInstance)
  if ( pFilterInstance->Signature != 'FrtS' )
  StreamClassDebugPrint(2, "Freeing filterinstance %p still open streams\n",

You can map several variables to the same name, if necessary.

Made a mistake or mapped too much? It’s simple to fix. Right-click the wrongly mapped name and choose “Unmap variables”. Then choose the variable you want to see again.

2. Union selection.

This feature, naturally, only applies to unions. That means that you need to have union types in your database and assign the types to some variables or fields.

Normally the decompiler tries to choose a union field which matches the expression best, but sometimes there are several equally valid matches, and sometimes other types in the expression are wrong. In such cases, you can override the decompiler’s decision. For example, this code is common in Windows drivers:

NTSTATUS __stdcall DispatchDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
  PIO_STACK_LOCATION stacklocation; // ebx@1

  stacklocation = Irp->Tail.Overlay.CurrentStackLocation;
  if ( *&stacklocation->Parameters.Create.FileAttributes == 0x224010 )
    v8 = stacklocation->Parameters.Create.Options == 20;
    if ( !v8 )
      goto LABEL_18;
    if ( stacklocation->Parameters.Create.SecurityContext < 1 )
      goto LABEL_87;
    v23 = Irp->AssociatedIrp.MasterIrp;

Since we know we’re in a DeviceControl handler, it’s likely the code is inspecting the Parameters.DeviceIoControl substructure and not Parameters.Create.

Right-click the field and choose “Select union field”, or place cursor on it and press Alt-Y.


Choose the Parameters.DeviceIoControl.IoControlCode field.


Other references to Parameters.Create can be fixed the same way. The updated decompilation makes more sense:

NTSTATUS __stdcall DispatchDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
  PIO_STACK_LOCATION stacklocation; // ebx@1

  stacklocation = Irp->Tail.Overlay.CurrentStackLocation;
  if ( stacklocation->Parameters.DeviceIoControl.IoControlCode == 0x224010 )
    v8 = stacklocation->Parameters.DeviceIoControl.InputBufferLength == 20;
    if ( !v8 )
      goto LABEL_18;
    if ( stacklocation->Parameters.DeviceIoControl.OutputBufferLength < 1 )
      goto LABEL_87;


This macro is commonly use in Windows drivers to get a pointer to the parent structure when we have a pointer to one of its fields.

For example, consider these two structures, used in a driver:

  ULONG  SizeOfThisPacket;
  ULONG  StreamNumber;
  PVOID  HwStreamExtension;

  _FILE_OBJECT *FilterFileObject;
  _FILE_OBJECT *FileObject;
  _FILTER_INSTANCE *FilterInstance;
  _HW_STREAM_OBJECT HwStreamObject;

The following function accepts a pointer to _HW_STREAM_OBJECT:

void __cdecl StreamClassStreamNotification(
  int NotificationType,
  _HW_STREAM_OBJECT *StreamObject,
  _KSEVENT_ENTRY *EventEntry,
  GUID *EventSet,
  ULONG EventId);

But immediately converts it into the containing _STREAM_OBJECT:

mov     eax, [ebp+StreamObject]
test    eax, eax
push    ebx
push    esi
lea     esi, [eax-_STREAM_OBJECT.HwStreamObject]

Default decompilation doesn’t look great:

  char *v6; // esi@1
  v6 = (char *)&StreamObject[-2] - 36;

There are two ways to make it nicer:

  1. Change type of v6 to be _STREAM_OBJECT*. The decompiler will detect that the expression “lines up” and convert it to use the macro.
  2. Right-click on the delta being subtracted (-36), select “Structure offset” and choose _STREAM_OBJECT from the list.

In both cases you should get a nice expression:

  v6 = CONTAINING_RECORD(StreamObject, _STREAM_OBJECT, HwStreamObject);

N.B.: currently you need to refresh the decompilation (press F5) to see the changes. We’ll improve it to happen automatically in future.

4. Kernel and user-mode macros involving fs segment access.

On Windows, the fs segment is used to store various thread-specific (for user-mode) or processor-specific (for kernel mode) data. Hex-Rays Decompiler 1.6 detects the most common ways of accessing them and converts them to corresponding macros. However, this functionality requires presence of specific types in the database. For user mode, it is the _TEB structure, for kernel mode it’s the KPCR structure.

For example, consider the following code:

mov     eax, large fs:18h
mov     eax, [eax+30h]
push    24h
push    8
push    dword ptr [eax+18h]
call    ds:__imp__RtlAllocateHeap@12 ; RtlAllocateHeap(x,x,x)
mov     esi, eax

If you don’t have the _TEB structure in types, this will be decompiled to:

  v5 = RtlAllocateHeap(*(_DWORD *)(*(_DWORD *)(__readfsdword(24) + 48) + 24), 8, 36);

However, if you do add the type, it will look much nicer:

  v5 = RtlAllocateHeap(NtCurrentTeb()->ProcessEnvironmentBlock->ProcessHeap, 8, 36);

Currently we support the following macros:

Macro Required types
NtCurrentTeb _TEB
KeGetCurrentPrcb KPCR, KPCRB
KeGetCurrentProcessorNumber KPCR
KeGetCurrentThread KPCR, _KTHREAD

Hint: the easiest way to get _TEB or KPCR types into your database is using the PDB plugin. Invoke it from File|Load file|PDB file…, enter a path to kernel32.dll (for user-mode code) or ntoskrnl.exe (for kernel-mode code), and check the “Types only” checkbox.


PDBs for those two files usually contain the necessary OS structures.

We hope you will like these new additions. Note that the version 1.6 includes even more improvements and fixes, see the full list of the new features and the comparison page.

IDA Pro 6.2 beta

Soon we are going to start testing the next IDA version. There will be many improvements. Some of them we have mentioned previously:

Proximity view
PE+ support for Bochs (64-bit PE files)
UI shortcut editor
Filters in choosers
Database snapshots

Other new major features:

  • GUI installers for Linux and OS X

  • Automatic check for new versions:

  • Cross-references to structure members:

  • Floating licenses: our licensing system is now more flexible and allows big enterprises to purchase floating licenses. Contact for more information.

If you have an active license and would like to test the beta, please send a message to

Filters & Shortcuts

Two of the new UI highlights in the upcoming IDA release are filtering capability for choosers and shortcut management. I’ll be discussing them in this post, although seeing them live in action is much nicer. 😉


Filters make it possible to either show, hide or highlight one or more categories of items. But enough talk, let’s start with a screenshot.

Filters demo
Continue reading Filters & Shortcuts

Book review: IDA Pro Book, 2nd Edition

A few weeks ago we received an electronic copy of the “IDA Pro Book, 2nd Edition”. In the second edition of his 26 chapters book, Chris Eagle did a good job updating the book and covering the latest changes in IDA Pro 6.1: the IDA Qt graphical interface is illustrated in this edition (all screenshots are up to date), some chapters are slightly updated whereas some have new sections that cover topics such as IDAPython, various debugger plugins and other features.

Continue reading Book review: IDA Pro Book, 2nd Edition

Recon 2011: Practical C++ Decompilation

Last month I visited the Recon conference and had a great time again. I gave a talk on C++ decompilation and how to handle it in IDA and Hex-Rays decompiler. You can get the slides here, and download the recorded talk here.

Edit: for some reason the streaming version does not show anything after the intro, please download the Quicktime version until it’s fixed.