Posts by Igor Skochinsky:

    Recon 2012: Compiler Internals

    June 22nd, 2012

    This year I was again lucky to present at Recon in Montreal. There were many great talks, as usual. My talk combined the topic of last year’s talk on C++ reversing with my OpenRCE article on Visual C++ internals. The new material covered the implementation of exceptions and RTTI in MSVC x64 and GCC (including Apple’s iOS).

    The videos are not up yet but here are the slides of my presentation and a few demo scripts I made for it to parse GCC’s RTTI structures and exception tables. I also added my old scripts from OpenRCE which I amended slightly for the current IDA versions (mostly changed hotkeys).

    Slides
    Scripts


    Calling IDA APIs from IDAPython with ctypes

    April 5th, 2012

    IDAPython provides wrappers for a big chunk of the IDA SDK. Still, some APIs are not wrapped, either because of SWIG limitations or simply because we haven’t gotten to them yet. Recently, I needed to test the get_loader_name() API, which is not available in IDAPython, but I didn’t want to write a full plugin just for one call. In such cases it’s often possible to use the ctypes module to call the function manually.

    The IDA APIs are provided by the kernel dynamic library. On Windows it’s called ida.wll (or ida64.wll), on Linux libida[64].so, and on OS X libida[64].dylib. ctypes provides a nice feature that dynamically creates a callable wrapper for a DLL export by treating it as an attribute of a special class instance. Here’s how to get that instance on the three platforms supported by IDA:

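    import sys
    import ctypes
    # __EA64__ is provided by IDAPython: True when running in the 64-bit IDA
    idaname = "ida64" if __EA64__ else "ida"
    if sys.platform == "win32":
        dll = ctypes.windll[idaname + ".wll"]
    elif sys.platform.startswith("linux"):  # "linux2" on Python 2, "linux" on Python 3
        dll = ctypes.cdll["lib" + idaname + ".so"]
    elif sys.platform == "darwin":
        dll = ctypes.cdll["lib" + idaname + ".dylib"]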
    

    We use “windll” because IDA APIs use the stdcall calling convention on Windows (check the definition of idaapi in pro.h).

    Now we can call our function as if it were an attribute of the “dll” object. But first we need to prepare the arguments. Here’s the declaration from loader.hpp:

    idaman ssize_t ida_export get_loader_name(char *buf, size_t bufsize);

    ctypes provides a convenience function for creating character buffers:

    buf = ctypes.create_string_buffer(256)

    And now we can call the function:

    dll.get_loader_name(buf, 256)

    To retrieve the contents of the buffer as a Python byte string, use its .value attribute, which stops at the first NUL terminator (.raw returns all 256 bytes, padding included). The complete script now looks like this:

    import sys
    import ctypes
    idaname = "ida64" if __EA64__ else "ida"
    if sys.platform == "win32":
        dll = ctypes.windll[idaname + ".wll"]
    elif sys.platform.startswith("linux"):
        dll = ctypes.cdll["lib" + idaname + ".so"]
    elif sys.platform == "darwin":
        dll = ctypes.cdll["lib" + idaname + ".dylib"]
    buf = ctypes.create_string_buffer(256)
    dll.get_loader_name(buf, 256)
    # .value is the NUL-terminated string stored in the buffer
    print("loader: %s" % buf.value)

    ctypes offers many means to interface with C code, so you can use it to call almost any IDA API.
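
    By default ctypes assumes that a foreign function returns a C int and guesses argument types from the Python values passed in. Declaring argtypes and restype explicitly avoids surprises with pointer-sized values. The sketch below shows the pattern outside IDA, using the C library's gethostname() as a stand-in because it takes the same (buffer, size) pair of arguments as get_loader_name(). It assumes a POSIX system, where ctypes.CDLL(None) exposes the C library. For get_loader_name() the analogous declaration would presumably use ctypes.c_ssize_t as the restype, since the header declares ssize_t.

```python
import ctypes

# Assumption: POSIX system; CDLL(None) gives access to the symbols already
# loaded into the process, including the C library.
libc = ctypes.CDLL(None)

# int gethostname(char *name, size_t len);
gethostname = libc.gethostname
gethostname.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
gethostname.restype = ctypes.c_int

buf = ctypes.create_string_buffer(256)      # zero-filled, writable buffer
rc = gethostname(buf, ctypes.sizeof(buf))   # 0 on success
hostname = buf.value                        # bytes up to the first NUL

print("rc=%d hostname=%r" % (rc, hostname))
```

    With argtypes set, ctypes rejects a call with the wrong number or kind of arguments instead of silently passing garbage to the native function.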


    New features in Hex-Rays Decompiler 1.6

    October 10th, 2011

    Last week we released IDA 6.2 and Hex-Rays Decompiler 1.6. Many of the new IDA features have been described in previous posts, but there have been notable additions in the decompiler as well. They will let you make the decompilation cleaner and closer to the original source. However, it might not be obvious how to use some of them, so we will describe them in more detail.

    1. Variable mapping

    This is probably the simplest new feature and can be used without any extra preparation.

    Sometimes the compiler stores the same variable in several places (e.g. a register and a stack slot). While the decompiler often manages to combine such locations, sometimes it cannot prove that they always contain the same value (especially in the presence of calls that take the address of stack variables). In such cases the user can help by performing the merge, or mapping, manually.

    Consider the following very common case:

    int __stdcall SciFreeFilterInstance(_FILTER_INSTANCE *pFilterInstance)
    {
      _FILTER_INSTANCE *v1; // esi@1
    
      v1 = pFilterInstance;
      if ( pFilterInstance->Signature != 'FrtS' )
        RtlAssert(
          "(pFilterInstance)->Signature==SIGN_FILTER_INSTANCE",
          "d:\\xpsprtm\\drivers\\wdm\\dvd\\class\\codinit.c",
          0x17A2u,
          0);
      StreamClassDebugPrint(2, "Freeing filterinstance %p still open streams\n", v1);
    

    The compiler copied an incoming argument (pFilterInstance) into a register (v1==esi). To get rid of the extra name, right-click the left-hand variable and choose “Map to another variable”, or place the cursor on it and press ‘=’.

    Choose the right-hand variable from the list.


    Once decompilation is refreshed, both the left-hand variable (v1) and the assignment are gone. Now we have only one variable – the incoming argument.

    int __stdcall SciFreeFilterInstance(_FILTER_INSTANCE *pFilterInstance)
    {
      if ( pFilterInstance->Signature != 'FrtS' )
        RtlAssert(
          "(pFilterInstance)->Signature==SIGN_FILTER_INSTANCE",
          "d:\\xpsprtm\\drivers\\wdm\\dvd\\class\\codinit.c",
          0x17A2u,
          0);
      StreamClassDebugPrint(2, "Freeing filterinstance %p still open streams\n",
        pFilterInstance);

    You can map several variables to the same name, if necessary.

    Made a mistake or mapped too much? It’s simple to fix. Right-click the wrongly mapped name and choose “Unmap variables”. Then choose the variable you want to see again.
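
    To make the effect of a mapping concrete, here is a toy, text-level model of it. It is only an illustration and has nothing to do with how the decompiler works internally: the apply_mapping helper is hypothetical, it renames each mapped variable and then drops any assignment that has turned into a copy of a variable to itself.

```python
import re

def apply_mapping(lines, mapping):
    """Rename mapped variables and drop assignments that become 'x = x;'."""
    result = []
    for line in lines:
        for old, new in mapping.items():
            # \b keeps v1 from matching inside names such as v10
            line = re.sub(r"\b%s\b" % re.escape(old), new, line)
        if re.match(r"^\s*(\w+) = \1;\s*$", line):
            continue  # the copy is now a no-op, so it disappears
        result.append(line)
    return result

code = [
    "v1 = pFilterInstance;",
    'StreamClassDebugPrint(2, "Freeing filterinstance %p still open streams\\n", v1);',
]
mapped = apply_mapping(code, {"v1": "pFilterInstance"})
for line in mapped:
    print(line)
```

    Several entries of the mapping may point to the same target name, which mirrors mapping several variables to one.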

    2. Union selection

    This feature, naturally, only applies to unions. That means that you need to have union types in your database and assign the types to some variables or fields.

    Normally the decompiler tries to choose a union field which matches the expression best, but sometimes there are several equally valid matches, and sometimes other types in the expression are wrong. In such cases, you can override the decompiler’s decision. For example, this code is common in Windows drivers:

    NTSTATUS __stdcall DispatchDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
      PIO_STACK_LOCATION stacklocation; // ebx@1
    
      stacklocation = Irp->Tail.Overlay.CurrentStackLocation;
      if ( *&stacklocation->Parameters.Create.FileAttributes == 0x224010 )
      {
        v8 = stacklocation->Parameters.Create.Options == 20;
        if ( !v8 )
          goto LABEL_18;
        if ( stacklocation->Parameters.Create.SecurityContext < 1 )
          goto LABEL_87;
        v23 = Irp->AssociatedIrp.MasterIrp;

    Since we know we’re in a DeviceControl handler, it’s likely the code is inspecting the Parameters.DeviceIoControl substructure and not Parameters.Create.

    Right-click the field and choose “Select union field”, or place the cursor on it and press Alt-Y.

    Choose the Parameters.DeviceIoControl.IoControlCode field.


    Other references to Parameters.Create can be fixed the same way. The updated decompilation makes more sense:

    NTSTATUS __stdcall DispatchDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
    {
      PIO_STACK_LOCATION stacklocation; // ebx@1
    
      stacklocation = Irp->Tail.Overlay.CurrentStackLocation;
      if ( stacklocation->Parameters.DeviceIoControl.IoControlCode == 0x224010 )
      {
        v8 = stacklocation->Parameters.DeviceIoControl.InputBufferLength == 20;
        if ( !v8 )
          goto LABEL_18;
        if ( stacklocation->Parameters.DeviceIoControl.OutputBufferLength < 1 )
          goto LABEL_87;
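
    The reason both readings are equally valid for the decompiler is that the union members occupy the same bytes. The sketch below models two of the Parameters members with ctypes. It is a simplified 32-bit layout written from memory of wdm.h (pointers are modelled as 32-bit integers, and only two of the many union members are included), so treat the class names and layouts as assumptions.

```python
import ctypes

class CreateParams(ctypes.Structure):           # models Parameters.Create
    _fields_ = [("SecurityContext", ctypes.c_uint32),   # pointer on 32-bit
                ("Options", ctypes.c_uint32),
                ("FileAttributes", ctypes.c_uint16),
                ("ShareAccess", ctypes.c_uint16),
                ("EaLength", ctypes.c_uint32)]

class DeviceIoControlParams(ctypes.Structure):  # models Parameters.DeviceIoControl
    _fields_ = [("OutputBufferLength", ctypes.c_uint32),
                ("InputBufferLength", ctypes.c_uint32),
                ("IoControlCode", ctypes.c_uint32),
                ("Type3InputBuffer", ctypes.c_uint32)]  # pointer on 32-bit

class Parameters(ctypes.Union):
    _fields_ = [("Create", CreateParams),
                ("DeviceIoControl", DeviceIoControlParams)]

p = Parameters()
p.DeviceIoControl.IoControlCode = 0x224010
p.DeviceIoControl.InputBufferLength = 20
p.DeviceIoControl.OutputBufferLength = 4

# The same bytes, read through the other union member
print("Create.Options         = %d" % p.Create.Options)
print("Create.SecurityContext = %d" % p.Create.SecurityContext)
print("FileAttributes offset  = %d" % CreateParams.FileAttributes.offset)
print("IoControlCode offset   = %d" % DeviceIoControlParams.IoControlCode.offset)
```

    In this model FileAttributes is only 16 bits wide while IoControlCode is 32 bits at the same offset, which is one plausible reason for the `*&` cast around FileAttributes in the first listing.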

    3. CONTAINING_RECORD macro

    This macro is commonly used in Windows drivers to get a pointer to the parent structure when we have a pointer to one of its fields.

    For example, consider these two structures, used in a driver:

    typedef struct _HW_STREAM_OBJECT {
      ULONG  SizeOfThisPacket;
      ULONG  StreamNumber;
      PVOID  HwStreamExtension;
      ...
    } HW_STREAM_OBJECT, *PHW_STREAM_OBJECT;
    
    struct _STREAM_OBJECT
    {
      _COMMON_OBJECT ComObj;
      _FILE_OBJECT *FilterFileObject;
      _FILE_OBJECT *FileObject;
      _FILTER_INSTANCE *FilterInstance;
      _HW_STREAM_OBJECT HwStreamObject;
      ...
    };

    The following function accepts a pointer to _HW_STREAM_OBJECT:

    void __cdecl StreamClassStreamNotification(
      int NotificationType,
      _HW_STREAM_OBJECT *StreamObject,
      _HW_STREAM_REQUEST_BLOCK *pSrb,
      _KSEVENT_ENTRY *EventEntry,
      GUID *EventSet,
      ULONG EventId);

    But immediately converts it into the containing _STREAM_OBJECT:

    mov     eax, [ebp+StreamObject]
    test    eax, eax
    push    ebx
    push    esi
    lea     esi, [eax-_STREAM_OBJECT.HwStreamObject]
    

    Default decompilation doesn’t look great:

      char *v6; // esi@1
      v6 = (char *)&StreamObject[-2] - 36;
    

    There are two ways to make it nicer:

    1. Change type of v6 to be _STREAM_OBJECT*. The decompiler will detect that the expression “lines up” and convert it to use the macro.
    2. Right-click on the delta being subtracted (-36), select “Structure offset” and choose _STREAM_OBJECT from the list.

    In both cases you should get a nice expression:

      v6 = CONTAINING_RECORD(StreamObject, _STREAM_OBJECT, HwStreamObject);

    N.B.: currently you need to refresh the decompilation (press F5) to see the changes. We’ll make this happen automatically in a future version.
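
    The arithmetic behind the macro is simply “address of the field minus the offset of that field in the parent”. Here is a minimal sketch of it using ctypes. The Inner and Outer structures are small hypothetical stand-ins, not the real _HW_STREAM_OBJECT and _STREAM_OBJECT layouts, and containing_record is an illustrative helper.

```python
import ctypes

class Inner(ctypes.Structure):      # hypothetical stand-in for _HW_STREAM_OBJECT
    _fields_ = [("SizeOfThisPacket", ctypes.c_uint32),
                ("StreamNumber", ctypes.c_uint32)]

class Outer(ctypes.Structure):      # hypothetical stand-in for _STREAM_OBJECT
    _fields_ = [("Tag", ctypes.c_uint32),
                ("Flags", ctypes.c_uint32),
                ("HwStreamObject", Inner)]

def containing_record(field_obj, parent_type, field_name):
    """Return the parent_type instance that contains field_obj."""
    offset = getattr(parent_type, field_name).offset
    return parent_type.from_address(ctypes.addressof(field_obj) - offset)

outer = Outer(Tag=0x1234, Flags=1)
inner = outer.HwStreamObject        # a view into outer's memory, not a copy

recovered = containing_record(inner, Outer, "HwStreamObject")
print("offset=%d tag=%#x" % (Outer.HwStreamObject.offset, recovered.Tag))
```

    The object returned by from_address does not keep the original alive, so the sketch holds on to outer for as long as recovered is used.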

    4. Kernel and user-mode macros involving fs segment access

    On Windows, the fs segment is used to store various thread-specific (in user mode) or processor-specific (in kernel mode) data. Hex-Rays Decompiler 1.6 detects the most common ways of accessing this data and converts them to the corresponding macros. However, this functionality requires the presence of specific types in the database: the _TEB structure for user mode and the KPCR structure for kernel mode.

    For example, consider the following code:

    mov     eax, large fs:18h
    mov     eax, [eax+30h]
    push    24h
    push    8
    push    dword ptr [eax+18h]
    call    ds:__imp__RtlAllocateHeap@12 ; RtlAllocateHeap(x,x,x)
    mov     esi, eax

    If you don’t have the _TEB structure in types, this will be decompiled to:

      v5 = RtlAllocateHeap(*(_DWORD *)(*(_DWORD *)(__readfsdword(24) + 48) + 24), 8, 36);

    However, if you do add the type, it will look much nicer:

      v5 = RtlAllocateHeap(NtCurrentTeb()->ProcessEnvironmentBlock->ProcessHeap, 8, 36);
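
    The two forms describe the same pointer chain. The following simulation walks it over a fake flat memory to show how the raw offsets line up with the field names on 32-bit Windows: fs:18h holds the TEB's pointer to itself, TEB+30h holds ProcessEnvironmentBlock, and PEB+18h holds ProcessHeap. The addresses and helper names are made up for the illustration.

```python
import struct

# Made-up addresses inside a small simulated address space
TEB_BASE, PEB_BASE, HEAP = 0x1000, 0x2000, 0x00150000
memory = bytearray(0x3000)

def write_dword(addr, value):
    struct.pack_into("<I", memory, addr, value)

def read_dword(addr):
    return struct.unpack_from("<I", memory, addr)[0]

write_dword(TEB_BASE + 0x18, TEB_BASE)   # TEB.NtTib.Self
write_dword(TEB_BASE + 0x30, PEB_BASE)   # TEB.ProcessEnvironmentBlock
write_dword(PEB_BASE + 0x18, HEAP)       # PEB.ProcessHeap

def readfsdword(offset):
    # In 32-bit user mode the fs segment base is the current thread's TEB
    return read_dword(TEB_BASE + offset)

teb = readfsdword(24)          # NtCurrentTeb()
peb = read_dword(teb + 48)     # ->ProcessEnvironmentBlock
heap = read_dword(peb + 24)    # ->ProcessHeap
print("teb=%#x peb=%#x heap=%#x" % (teb, peb, heap))
```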

    Currently we support the following macros:

    Macro                         Required types
    NtCurrentTeb                  _TEB
    KeGetPcr                      KPCR
    KeGetCurrentPrcb              KPCR, KPRCB
    KeGetCurrentProcessorNumber   KPCR
    KeGetCurrentThread            KPCR, _KTHREAD

    Hint: the easiest way to get _TEB or KPCR types into your database is using the PDB plugin. Invoke it from File|Load file|PDB file…, enter a path to kernel32.dll (for user-mode code) or ntoskrnl.exe (for kernel-mode code), and check the “Types only” checkbox.


    PDBs for those two files usually contain the necessary OS structures.

    We hope you will like these new additions. Note that version 1.6 includes even more improvements and fixes; see the full list of the new features and the comparison page.


    IDA Pro 6.2 beta

    September 12th, 2011

    Soon we are going to start testing the next IDA version. There will be many improvements. Some of them we have mentioned previously:

    Proximity view
    PE+ support for Bochs (64-bit PE files)
    UI shortcut editor
    Filters in choosers
    Database snapshots

    Other new major features:

    • GUI installers for Linux and OS X

    • Automatic check for new versions

    • Cross-references to structure members

    • Floating licenses: our licensing system is now more flexible and allows big enterprises to purchase floating licenses. Contact sales@hex-rays.com for more information.

    If you have an active license and would like to test the beta, please send a message to support@hex-rays.com.


    Recon 2011: Practical C++ Decompilation

    August 2nd, 2011

    Last month I visited the Recon conference and had a great time again. I gave a talk on C++ decompilation and how to handle it in IDA and Hex-Rays decompiler. You can get the slides here, and download the recorded talk here.

    Edit: for some reason the streaming version does not show anything after the intro, please download the Quicktime version until it’s fixed.

     


    Recon 2010: Intro to Embedded Reverse Engineering for PC reversers

    August 24th, 2010

    In July I had the honor to speak at the Recon conference in Montreal, Canada. It was my first conference, and I really liked the experience. I hope I’ll be able to attend it again in the future.
    The presentations were recorded and hopefully will appear on the Recon site soon but for now you can check out the slides (ODP, PDF). I have also uploaded some of the tools I mentioned, most notably various filesystem extractors compiled for Win32 (download).


    Debugging ARM code snippets in IDA Pro 5.6 using QEMU emulator

    January 8th, 2010

    Introduction

    IDA Pro 5.6 has a new feature: automatic running of the QEMU emulator. It can be used to debug small code snippets directly from the database.
    In this tutorial we will show how to dynamically run code that can be difficult to analyze statically.

    Target

    As an example we will use shellcode from the article “Alphanumeric RISC ARM Shellcode” in Phrack 66.
    It is self-modifying and, because of the alphanumeric restriction, can be quite hard to understand. So we will use the debugging feature to decode it.



    Advanced Windows Kernel Debugging with VMWare and IDA’s GDB debugger

    February 19th, 2009

    We have already published a short tutorial on Windows kernel debugging with IDA and VMWare on our site, but the debugging experience can still be improved.

    VMWare’s GDB stub is very basic: it doesn’t know anything about processes or threads (for Windows guests), so for anything high-level we’ll need to do some extra work. We will show how to get the loaded module list and load symbols for all of them using IDAPython.

