Trunk, Branches, and Leaves

Being an old and time-proven platform for binary analysis, IDA Pro has grown
many plugins. There are custom-made plugins for new processors
and file formats. There are deobfuscators, exporters, data visualizers,
object reconstructors, and other stuff.


No one can foresee and implement everything. Some “innovations” are
the result of improvements in software analysis: malware authors
come up with something new when the old obfuscation methods stop working.
New platforms and compilers also require different analysis: for example,
the latest GNU compilers generate quite complex code which requires a much
deeper approach.

An open architecture gives users the opportunity to extend the core engine and build on it.
Be it a small one-off script or plugin or something fundamental and serious,
it benefits everyone.

That’s why the decompiler will have an API. Just as the decompiler itself is built on top of IDA,
you will be able to build on top of the decompiler. This is a pretty natural growth pattern.

Below are descriptions of some possible plugins, in no particular order:

  • Typist

    This plugin reconstructs object types used in the program. As a side effect,
    object boundaries can be approximately determined.

  • Ranger

    This plugin uses data flow analysis to find out possible value ranges of local
    variables and global data.

  • Classifier

    The output of the Typist is turned into class (object) definitions.
    A class hierarchy emerges as a result, and the notion of virtual functions
    comes into existence.

  • Inliner

    Finds code sequences which can be converted into inline functions,
    making the output more readable.

  • Code Slicer

    This plugin optimizes functions by ‘slicing’ them according to the
    possible input argument values. For example, if a function with two arguments
    is known to always be called with the second argument equal to zero, the plugin can
    remove all code which handles the non-zero cases. A more generic form of this plugin
    performs slicing on other data values, not only on function arguments.

  • FlowVisor

    Data flow visualizer. It uses information provided by the decompiler
    engine and other plugins, and may have several different display methods.
    The least intrusive display is in the form of mouse hints (locations where the current
    variable is used/defined, its possible values, whether it is tainted). It can also display
    graphs and plain text. Other plugins will have their own visualization methods,
    but this plugin will provide services for other plugins to use.

  • TaintStopper

    Performs taint analysis and displays potential uses of untrusted data.

  • VeriHeap

    Memory allocation verifier. Typical problems like failure to verify
    the result of memory allocation, double frees, and frees of non-allocated
    memory can be detected.

  • CleanBounds

    Verifies that object boundaries are respected and there are no overflows.

  • JunkCollector

    Detects unreachable functions and removes them from further analysis.

  • Idiomizer

    This is a generic name for plugins which verify consistent use of programming
    idioms. For example, if we acquire a lock before modifying a variable
    in all program locations but one, we have an idiom violation. There are many
    programming idioms and there can be many different idiom verifiers.

  • Exporter

    Generic name for plugins which export information into other systems. The output
    can be ubiquitous XML or good old SQL databases.

  • Transformer

    Generic name for plugins which modify the decompiler output. The goal can vary
    tremendously, from making the output more human readable to optimizing or instrumenting it.
    Code Slicer and Inliner are examples of such plugins.

  • Microgen

    Generic name for plugins which translate assembly text into microcode.
    Microgens are also responsible for mapping CPU registers into microcode
    registers and resolving memory references. Microgens ‘port’ the decompiler
    to new processors and platforms. Ideally, each microgen should be divided into two parts:
    a processor-specific part and an operating system (environment) specific part.

  • Procrustes

    Generic name for plugins which modify the assembly text to conform to the
    decompiler’s assumptions. An example: low-level assembly instructions which
    are not used by compilers and therefore cannot be decompiled are replaced
    by equivalent function calls. These plugins are add-ons to microgens.

  • Vizier

    A plugin which modifies the core decompiler engine by adding a new transformation rule.
    For example, if some data is known to be read-only but the decompiler
    has no means of knowing it, a plugin could replace “load memory”
    instructions with “load constant” instructions for this data.
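To make the Code Slicer idea a little more concrete, here is a minimal Python sketch of the underlying technique: substitute the known argument values, fold the conditions that become constant, and keep only the branch that can actually run (a form of partial evaluation). It does not use the decompiler API. The tiny node classes `Var`, `Const`, `Eq`, `If`, and `Call` and the functions `fold` and `slice_body` are hypothetical stand-ins for real decompiler output, assumed here only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


# Hypothetical toy nodes standing in for decompiler output (not a real API).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Eq:
    left: Expr
    right: Expr


Expr = Union[Var, Const, Eq]


@dataclass(frozen=True)
class Call:
    name: str


@dataclass(frozen=True)
class If:
    cond: Expr
    then: Tuple[Stmt, ...]
    orelse: Tuple[Stmt, ...]


Stmt = Union[If, Call]


def fold(e: Expr, known: Dict[str, int]) -> Expr:
    """Substitute known argument values and fold comparisons of constants."""
    if isinstance(e, Var):
        return Const(known[e.name]) if e.name in known else e
    if isinstance(e, Eq):
        left, right = fold(e.left, known), fold(e.right, known)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(int(left.value == right.value))
        return Eq(left, right)
    return e


def slice_body(body, known: Dict[str, int]) -> List[Stmt]:
    """Drop branches that can never run under the known argument values."""
    out: List[Stmt] = []
    for stmt in body:
        if isinstance(stmt, If):
            cond = fold(stmt.cond, known)
            if isinstance(cond, Const):
                # The condition is decided statically: inline only the live branch.
                live = stmt.then if cond.value else stmt.orelse
                out.extend(slice_body(live, known))
            else:
                out.append(If(cond,
                              tuple(slice_body(stmt.then, known)),
                              tuple(slice_body(stmt.orelse, known))))
        else:
            out.append(stmt)
    return out


# func(a, b): if (b == 0) handle_zero(); else handle_nonzero(); if (a == 1) one();
body = [
    If(Eq(Var("b"), Const(0)), (Call("handle_zero"),), (Call("handle_nonzero"),)),
    If(Eq(Var("a"), Const(1)), (Call("one"),), ()),
]
print(slice_body(body, {"b": 0}))
```

Slicing with `{"b": 0}` keeps only the `handle_zero` call from the first test, while the test on `a` survives because nothing is known about `a`. A real plugin would also have to prove that the argument value holds at every call site; this sketch simply assumes it.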

Plugins like CleanBounds, VeriHeap, and Idiomizer can be used to solve today’s practical problems.
Other plugins can facilitate binary analysis and make it less time-consuming.
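An Idiomizer check for the lock example above can be sketched in a few lines. This is a toy in Python, assuming some earlier data flow pass has already produced a list of writes, each with its location, the modified variable, and the set of locks held at that point. The `Write` record, the location strings, and the `min_sites` threshold are hypothetical.

```python
from collections import defaultdict
from typing import FrozenSet, List, NamedTuple, Tuple


class Write(NamedTuple):
    """One modification of a variable, as reported by a hypothetical data flow pass."""
    location: str               # illustrative label such as "func+offset"
    variable: str
    locks_held: FrozenSet[str]  # locks held when the write executes


def find_lock_violations(writes: List[Write],
                         min_sites: int = 3) -> List[Tuple[str, str, str]]:
    """Report (variable, lock, location) where a lock is held at all writes but one."""
    by_var = defaultdict(list)
    for w in writes:
        by_var[w.variable].append(w)

    violations = []
    for var, sites in by_var.items():
        if len(sites) < min_sites:
            continue  # too few sites to infer an idiom
        counts = defaultdict(int)
        for w in sites:
            for lock in w.locks_held:
                counts[lock] += 1
        for lock, n in counts.items():
            if n == len(sites) - 1:  # held everywhere except one place
                culprit = next(w for w in sites if lock not in w.locks_held)
                violations.append((var, lock, culprit.location))
    return sorted(violations)


writes = [
    Write("f1+10", "counter", frozenset({"mtx"})),
    Write("f2+4", "counter", frozenset({"mtx"})),
    Write("f3+8", "counter", frozenset()),
    Write("f4+2", "flag", frozenset()),
]
print(find_lock_violations(writes))  # [('counter', 'mtx', 'f3+8')]
```

The write at `f3+8` is reported because `mtx` is held at every other write to `counter`, while `flag` is skipped since a single site says nothing about an idiom. The `min_sites` value is an arbitrary choice to avoid inferring an idiom from only one or two locations.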

I tried to come up with the list of plugins I’d personally like to have.
The list is far from being exhaustive. Feel free to add to it ;)



Plugin names and descriptions are completely fictional.

This entry was posted in Decompilation.

7 Responses to Trunk, Branches, and Leaves

  1. igorsk says:

    Hmm, here’s what comes immediately to mind…
    PyRays
    Scripting access to the API and ability to implement new plugins in Python.
    OOReconstructor
    Identify classes, methods, and relationships between them. Recover class structures.

  2. Ilfak Guilfanov says:

    PyRays is a good idea; maybe I should make the header files SWIG-compatible to facilitate new language bindings.
    OOReconstructor looks the same as Classifier.

  3. igorsk says:

    Oops, you’re right, I missed it :)

  4. hume says:

    It seems great!
    My questions are:
    1. Does the microcode support type info?
    2. When will Hex-Rays be released? I can’t wait anymore! Will it be included in the Advanced version or be a separate product?

  5. Ilfak Guilfanov says:

    Thanks.
    The microcode does not really have type information. It is added at a later phase, when the microcode is converted into the ctree.
    Concerning the release date, we have to finish the beta testing first. So far so good but there are no firm deadlines yet. It will be a separate product.

  6. Rolf Rolles says:

    The ability to manipulate the AST would be a nice touch.

  7. Rolf Rolles says:

    By the way, how’d you make that nice graphic?