Microcode in pictures

Since a picture is worth thousand words below are a few drawings for your perusal. Let us start at the top level, with the mbl_array_t class, which represents the entire microcode object:

The above picture does not show the control flow graph. For that we use predecessor and successor lists:

Pay attention to the block types here. Then, each basic block (mblock_t) contains a list of instructions:

Instructions (insn_t) can be nested, and the next drawing shows how it looks like:

As you see, conceptually things are quite simple. But the devil is in the details, as usual 🙂

 

IDAPython: wrappers are only wrappers

Intended audience

IDAPython developers who enjoy the occasional headache, leaky abstraction enthousiasts, or simply the curious.

TL;DR

IDAPython wraps C++ types, and the lifecycle of C++ objects (and in particular members of larger objects) is not necessarily the same as that of the Python wrapper object that is wrapping it.

The problem

One of our users reported IDA crashes when an IDAPython script of theirs. The user came up with a very simple way to reproduce the issue (thank you!), showing that this had to do with accessing the parents member of a ida_hexrays.ctree_visitor_t instance.

Here is (an even more simplified version of) the script the user sent us:

from ida_hexrays import *

my_parents = None

class my_visitor_t(ctree_visitor_t):
    def __init__(self, func):
        ctree_visitor_t.__init__(self, CV_PARENTS)

    def visit_expr(self, i):
        global my_parents
        if self.parents is not None:
            my_parents = self.parents
        return 0


def my_cb(event, *args):
    if event == hxe_print_func:
        f = args[0]
        my_visitor_t(f).apply_to(f.body, None)
        import gc
        gc.collect()
        my_parents.front() # will crash
    return 0

install_hexrays_callback(my_cb)

Note: I threw a gc.collect() in there, to make crashes more likely.

The script above is provided in its entirety for the sake of completeness, but really the important lines are only the following:

    def visit_expr(self, i):
        global my_parents
        if self.parents is not None:
            my_parents = self.parents

    (...)

        my_visitor_t(f).apply_to(f.body, None)
        my_parents.front() # will crash

Details, details, details

Since this issue is non-trivial, I’ll try and provide a step-by-step explanation, hopefully as clear as can be, by annotating the important lines of code mentioned above:

        my_visitor_t(f)

Create a my_visitor_t instance. That is a subclass of the ctree_visitor_t type, which means it eventually extends a C++ object of type ctree_visitor_t.

When the underlying C++ ctree_visitor_t object is created, its member named parents (a ctree_items_t vector) is initialized. For the sake of the example, let’s say the C++ ctree_visitor_t instance is located at memory 0x1000 and the parents member is placed at memory 0x100C.

                       .apply_to(f.body, None)

Call ctree_visitor_t::apply_to. Thanks to SWiG “magic”, C++ virtual method calls will be properly redirected and our my_visitor_t.visit_expr method will be called for each cexpr_t in the tree, as expected.

        if self.parents is not None:

Access self.parents. This will create a Python wrapper object. The key here is to understand that it’s a wrapper object which is backed by the real, C++ ctree_items_t instance.

For example, any access to the object returned by self.parents, will in fact translate to an access into the C++ ctree_items_t vector, so if one were to write, e.g., self.parents.size() (or even len(self.parents)), it’s actually the real underlying C++ ctree_items_t instance’s size() method that will end up being called.

            my_parents = self.parents

Another access to self.parents, and another Python wrapper will be created (once again backed by the actual ctree_items_t vector)

[Note: the fact that another wrapper is created is not a problem (in fact since it went out of scope, the previous wrapper might already have been garbage collected!)]

Once again, for the sake of the example, let’s say the wrapping PyObject instance is placed in memory, at 0xB000.
That wrapper is then bound to the global variable my_parents, causing its python refcount to increase to 2. Past that line, the refcount will drop back to 1 (again, because of scope logic), which means that Python wrapper object will remain alive.

[...apply_to() returns, and we are now back to the `my_cb` function...]

At this point, it’s likely my_visitor_t(f) has just been garbage collected since nobody keeps a reference to it.

That means:

  • the my_visitor_t instance has been destroyed, which means
  • the underlying ctree_visitor_t C++ object located at memory 0x1000 has been deleted, which in turn means
  • its parents object, which was located at memory 0x100C, is now invalid

                my_parents.front()
    

We are now calling front() on the my_parents Python object. If you recall, that my_parents object is a Python wrapper object located in memory at 0xB000. That wrapper object still has a refcount of (at least) 1, and is thus alive.

What is not quite alive anymore, however, is the actual C++ ctree_items_t vector, which was deleted as part of deleting the C++ ctree_visitor_t it belonged to.

In other words, we have a perfectly valid Python wrapper object, that has a dangling pointer to a member of a freshly-deleted C++ object.

The solution

The solution is, in terms of effort, rather simple: make a copy of the vector:

-            my_parents = self.parents
+            my_parents = ctree_items_t(self.parents)

since it doesn’t belong to the C++ ctree_visitor_t object, this copy won’t be thrashed when it is deleted.

Deobfuscating xor’ed strings

A few days ago a customer sent us a sample file. The code he sent us was using a very simple technique to obfuscate string constants by building them on the fly and using ‘xor’ to hide the string contents from static disassembly:


The decompiler recovered most of the xor’ed values but some of them were left obfuscated:

After some investigation it turned out that it is a shortcoming our the decompiler: the value propagation (or constant folding) can not handle the situation when an unusual part of a value is used in another expression. For example, if an instruction defines a four byte value, the second byte of the value can not be propagated to other expressions. More standard cases, like the low or high two bytes, or even just one byte, are handled well.

It seems that compilers never leave such constants unpropagated, this is why we did not encounter this case before.

Let us write a short decompiler plugin that would handle this situation and propagate a part of a constant into another expression. The idea is simple: as soon as we find a situation when a constant is used in a binary operation like xor, we will try to find the definition of the second operand, and if it is a constant, then we will propagate it. Graphically it will look like this:

mov #N, var.4           ; put a 4 byte constant into var
...
xor var @1.1, #M, var2.1 ; xor the second byte of var

is converted into

mov #N, var.4
...
xor #N>>8, #M, var2.1

The resulting xor will then automatically get optimized by the decompiler. However, to speed up things (to avoid another loop of optimization rules), we will call the optimize_flat() function ourselves.

Please note that we do not rely on the instruction opcode: the xor opcode can be replaced by any other binary operation, our logic will still work correctly.

Also we do not rely on the operand sizes (well, to speed up things we do not handle operands wider that 1 byte because they are handled fine by the decompiler).

Also we can handle not only the second byte, but any byte of the variable.

The final version of the plugin can be downloaded here. It is fully automatic, you just need to drop it into the plugins/ directory.

And the decompiler output looks nice now:

We could further improve the output and convert these assignments into a call to the strcpy() function, but this is left as an exercise for our dear readers 😉

P.S. Naturally, we will improve the decompiler to handle this case. The next version will include this improvement.

IDA on non-OS X/Retina Hi-DPI displays

The problem

Some users running IDA on Windows & Linux X11 platforms with Hi-DPI displays, have reported that IDA looks rather odd: the navigator bar is too narrow, the text under it gets truncated, and there is overall feeling of packing & clumsiness:

  • Windows:
  • Linux X11:

Looking closely, one can notice the following issues (probably not an exhaustive list)

Note: this should not happen and shouldn’t apply for OS X users running IDA on Retina displays. Nor should it happen (but we didn’t get a chance to test this) on non-X11 Linux display managers, such as Wayland.

Fix / mitigation:

On Linux X11 & Windows, if you are using Hi-DPI monitors and IDA looks somewhat like it does in the above screenshots, please try setting the environment variable QT_AUTO_SCREEN_SCALE_FACTOR to 1:

E.g., on Linux/X11:

~# export QT_AUTO_SCREEN_SCALE_FACTOR=1
~# path/to/ida my.idb

IDA should now look more pleasant:

  • Windows:
  • Linux X11:

Some things are still not perfect (e.g., checkboxes might remain small), but IDA definitely looks better.

Please give it a try!

Gory details

When one applies scaling/zooming, either in Windows and Linux X11, that will have the effect of causing the OS to return scaled values for font metrics when queried using point sizes (which is almost always the case.)

For example, when the font metrics for a font of size 12pt are requested by a Qt application, instead of returning 14 pixels as it would on a non-scaled system, the operating system will instead return 28 pixels on a 200% scaled one (in other words, this is essentially a font database-related feature).

That will, in turn, have the net effect of causing Qt to compute layout of the surrounding widgets according to those scaled font metrics, which explains why the text is (for the most part) not truncated.

However, what applying scaling does not do, is tell Qt that it should scale all other pixel measurements by that scale factor.

Consequently, paddings, margins, scrollbars, etc… all have uncomfortably small dimensions, especially when compared to text.

The QT_AUTO_SCREEN_SCALE_FACTOR environment variable is an opt-in that program users can define, in order to control how the program should look. It will in essence instruct Qt to perform automatic scaling of (non-font-related) graphical operations according to the pixel density of the screen(s).

More information can be found on Qt’s website.

Why is this not needed under OS X + Retina?

This is not needed under OS X + Retina, because Qt does not need to perform any kind of scaling there: the scaling is performed by the drawing primitives of the OS itself, and is entirely transparent to the application.

(In fact, an OSX application doesn’t even work with the real screen geometry, but rather with an “abstract” coordinate system, normalizing pixel sizes across screen densities.)

IDA 7.1: Qt 5.6.3 configure options & patch

A handful of our users have already requested information regarding the Qt 5.6.3 build, that is shipped with IDA 7.1.

Configure options

Here are the options that were used to build the libraries on:

  • Windows: ...\5.6.3\configure.bat "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "win32-msvc2015" "-opengl" "desktop" "-prefix" "C:/Qt/5.6.3-x64"
    • Note that you will have to build with Visual Studio 2015, to obtain compatible libs
  • Linux: .../5.6.3/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "linux-g++-64" "-developer-build" "-fontconfig" "-qt-freetype" "-qt-libpng" "-glib" "-qt-xcb" "-dbus" "-qt-sql-sqlite" "-gtkstyle" "-prefix" "/usr/local/Qt/5.6.3-x64"
  • Mac OSX: .../5.6.3/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "macx-g++" "-debug-and-release" "-fontconfig" "-qt-freetype" "-qt-libpng" "-qt-sql-sqlite" "-prefix" "/Users/Shared/Qt/5.6.3-x64"

patch

In addition to the specific configure options, the Qt build that ships with IDA includes the following patch. You should therefore apply it to your own Qt 5.6.3 sources before compiling, in order to obtain similar binaries (patch -p 1 < path/to/qt-5_6_3_full-IDA71.patch)

Note that this patch should work without any modification, against the 5.6.3 release as found there. You may have to fiddle with it, if your Qt 5.6.3 sources come from somewhere else.

IDAPython: namespacing for plugins, loaders & processor modules.

Intended audience

IDAPython plugins, loaders or processor modules developers.

The problem

Until now, IDAPython would load all loaders, processor modules & plugins in the '__main__' module.

This causes namespace pollution, which can sometimes leads to very obscure errors.

The solution

Starting with version 7.1, IDA will import plugins, loaders & processor modules in their own, separate Python modules.

The names of those Python modules is derived from the plugin, loader or processor module’s file name.

E.g.,

  • a plugin named myplg.py will now be imported into its own '__plugins__myplg' Python module.
  • a loader named myldr.py will now be imported into its own '__loaders__myldr' Python module.
  • a processor module named myprc.py will now be imported into its own '__procs__myprc' Python module.

My plugin/loader/processor module complains about unexisting variables/functions!

It very likely means that the code in question was, in effect, relying on some other code being loaded before it, and that was “polluting” the '__main__' module in a way that was very fortunate from its point-of-view (a happy coincidence, if you will.)

The solution is of course to fix the code in question so that it imports everything it needs, before making use of it.

However…

I’m in a hurry

As a temporary workaround, you can set NAMESPACE_AWARE to NO in cfg/python.cfg.

This will cause IDA to revert to the old behavior, where the '__main__' module is shared across all plugins, loaders, processor modules & user scripts.

However, please note that support for NAMESPACE_AWARE will be removed sometimes in the future.

IDA and common Python issues

With IDA 7.0 switching fully to native x64 architecture, we also switched to the x64 Python which brought some new issues but also exposed some we’ve seen before. This post tries to summarize the most common issues we’ve seen our users encounter as well as suggestions how to fix them or at least diagnose where it went wrong before you have to contact support.

Continue reading IDA and common Python issues

IDA 7.0: Qt 5.6.0 configure options & patch

A handful of our users have already requested information regarding the Qt 5.6.0 build, that is shipped with IDA 7.0.

Configure options

Here are the options that were used to build the libraries on:

  • Windows: ...\5.6.0\configure.bat "-nomake" "tests" "-qtnamespace" "QT"
    "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "win32-msvc2015" "-opengl" "desktop" "-prefix" "C:/Qt/5.6.0-x64"

    • Note that you will have to build with Visual Studio 2015, to obtain compatible libs
  • Linux: .../5.6.0/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "linux-g++-64" "-developer-build" "-fontconfig" "-qt-freetype" "-qt-libpng" "-glib" "-qt-xcb" "-dbus" "-qt-sql-sqlite" "-gtkstyle" "-prefix" "/usr/local/Qt/5.6.0-x64"
  • Mac OSX: .../5.6.0/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "macx-g++" "-debug-and-release" "-fontconfig" "-qt-freetype" "-qt-libpng" "-qt-sql-sqlite" "-prefix" "/Users/Shared/Qt/5.6.0-x64"

patch

In addition to the specific configure options, the Qt build that ships with IDA includes the following patch. You should therefore apply it to your own Qt 5.6.0 sources before compiling, in order to obtain similar binaries.

Note that this patch should work without any modification, against the 5.6.0 release as found there. You may have to fiddle with it, if your Qt 5.6.0 sources come from somewhere else.

News about the x64 edition

Sorry for the long silence since IDA v6.95, we all were incredibly busy with the transition to the 64-bit version. We are happy to say now that we are close to the finish line and will announce the beta test soon.

Transition to x64 itself was not that hard. We have been compiling IDA in x64 mode since many years, so making it actually work was a piece of cake.

It was much more time consuming to clean up the API: make it more logical, easier to use, and even remove some obsolete stuff that we have been carrying around since ages. Switching to the x64 is the unique opportunity for this cleanup because there are no existing x64 plugins yet and we won’t be breaking any working plugins. We hope that you will like the new API much more.

Just to give you an idea: we made more than 8000 commits to our source code repository and just the commit descriptions are more than 1MB. We reached 1.6M lines of source code without taking into account any third party or auto-generated files. I haven’t counted how many lines had changed since v6.95, but it can easily be that every second line got modified during the transition. In short, it was a huge undertaking.

While it was huge, we did not manage to make everything ideal (is it possible at all?…) We will continue to work on the API in the future.

Naturally, we were also busy fixing our past bugs (309 bugs since the public release of v6.95, to be exact; fortunately almost all of them reveal themselves in very specific circumstances).

We were also working on new features. Just to give you an idea of the new stuff, see very short descriptions below.

First of all, IDA fully switched to UTF-8. It will be possible to use Unicode strings everywhere, and specify any specific encoding for a string in the input file. The databases will be kept in UTF-8 as well, which will allow us to get rid of inconsistencies between Windows and Unix versions of IDA. In fact it is deeper that a simple support of UTF-8, we have a new system for international character support but it deserves a separate blog entry. We will talk about it in more detail after the release.

We will release the PowerPC 64-bit decompiler along with IDA v7. This assembler code (which did not fit into the screenshot):

Gets converted into one line:

We added support for exception handling. IDA recognizes try/except blocks and neatly comments them in the listing:

The iOS debugger improves support for debugging dylibs from dyld_shared_cache and adds support for source code level debugging.

We have tons of other improvements: better GDBServer support, updated FLAIR signatures, improved decompiler heuristics, updated built-in functions for IDAPython and IDC, new switch table patterns, etc.

Stay tuned, we will announce the beta test soon!

IDA 6.95: Qt 5.6.0 configure options & patch

A handful of our users have already requested information regarding the Qt 5.6.0 build, that is shipped with IDA 6.95.

Configure options

Here are the options that were used to build the libraries on:

  • Windows: ...\5.6.0\configure.bat "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "win32-msvc2015" "-opengl" "desktop" "-prefix" "C:/Qt/5.6.0"
    • Note that you will have to build with Visual Studio 2015, to obtain compatible libs
  • Linux: .../5.6.0/configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "linux-g++-32" "-developer-build" "-fontconfig" "-qt-freetype" "-qt-libpng" "-glib" "-qt-xcb" "-dbus" "-qt-sql-sqlite" "-gtkstyle" "-prefix" "/usr/local/Qt/5.6.0"
  • Mac OSX: .../5.6.0/build/../configure "-nomake" "tests" "-qtnamespace" "QT" "-confirm-license" "-accessibility" "-opensource" "-force-debug-info" "-platform" "macx-g++-32" "-debug-and-release" "-fontconfig" "-qt-freetype" "-qt-libpng" "-qt-sql-sqlite" "-prefix" "/Users/Shared/Qt/5.6.0"

patch

In addition to the specific configure options, the Qt build that ships with IDA includes the following patch. You should therefore apply it to your own Qt 5.6.0 sources before compiling, in order to obtain similar binaries.

Note that this patch should work without any modification, against the 5.6.0 release as found there. You may have to fiddle with it, if your Qt 5.6.0 sources come from somewhere else.