Improving IDA analysis

For a typical MS Windows executable, IDA does quite a good job of recognizing code and creating functions, and the result is usually eye-pleasing and easy to decipher. The analysis is good but not perfect: there are cases when it mistakes data for code or determines function boundaries incorrectly.

The good news is that there are easy ways to improve the situation.

It was obvious from the beginning that we cannot make a perfect engine to tell code apart from data. Therefore we prepared several ways to alleviate the problem. First, the user has ultimate control over the listing and can convert data to code and vice versa at any time. Second, we created hooks for plugins. For example, each time IDA creates a function, it calls a hook named processor_t::func_bounds, and a plugin has a chance to correct the function boundaries. Before creating any instruction, IDA calls processor_t::make_code, and if it yields 0, IDA will not create the instruction. The same scenario is used for data items (processor_t::make_data) and names (processor_t::rename).

In addition to these hooks, there are also events happening before and after the analysis. In fact, there are several events – one for each analysis queue. IDA has several of them:

code queue – addresses from this queue will be converted to code
function queue – addresses from this queue will be converted to functions
reanalysis queue – these addresses will be reanalyzed. This queue is used to create stack variables, correct cross references if a segment register gets modified and so on.
undefine queue – addresses from this queue will become unexplored
final queue – if an address from this queue is unexplored, IDA will try to convert it to something (data or code). While this queue makes the listing nicer, all decisions for this queue are arbitrary ‘best guesses’. If you prefer to work with a more precise listing that leaves such addresses unexplored, you might want to turn off the final analysis. The option is available from the kernel options.

There are some other queues (for FLIRT signature files and other stuff) but the ones mentioned are the most important. When any queue becomes empty, an event (processor_t::auto_queue_empty) is generated. When all queues become empty, a final event (processor_t::auto_empty) is generated, and if no plugin or processor module adds anything to the queues in response to it, the analysis is declared completed (processor_t::auto_empty_finally). Many processor modules react to these events and fine-tune the listing one way or another.
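To make the queue and event flow above more concrete, here is a small standalone toy model. It is not IDA code: ToyQueue, ToyAnalyzer, the log strings and the processing order of the queues are all assumptions made for illustration. It only shows the shape of the loop: drain the queues, announce that everything is empty, give a handler a chance to plan more work, and finish only when nothing new was added.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: these names are illustrative and are not IDA SDK identifiers.
// The enum order doubles as the processing priority, which is an assumption.
enum class ToyQueue { Undefine, Code, Function, Reanalysis, Final };

inline const char *toy_queue_name(ToyQueue q)
{
  switch ( q )
  {
    case ToyQueue::Undefine:   return "undefine";
    case ToyQueue::Code:       return "code";
    case ToyQueue::Function:   return "function";
    case ToyQueue::Reanalysis: return "reanalysis";
    case ToyQueue::Final:      return "final";
  }
  return "?";
}

struct ToyAnalyzer
{
  std::map<ToyQueue, std::set<std::uint64_t>> queues; // one address set per queue
  std::vector<std::string> log;                       // trace of what happened
  // Stand-in for a plugin reacting to the "all queues are empty" event.
  std::function<void(ToyAnalyzer &)> on_all_empty;

  void plan(ToyQueue q, std::uint64_t ea) { queues[q].insert(ea); }

  bool all_empty() const
  {
    for ( const auto &kv : queues )
      if ( !kv.second.empty() )
        return false;
    return true;
  }

  // Process one address from the first non-empty queue in priority order.
  void step()
  {
    for ( auto &kv : queues )
    {
      if ( kv.second.empty() )
        continue;
      std::uint64_t ea = *kv.second.begin();
      kv.second.erase(kv.second.begin());
      log.push_back(std::string(toy_queue_name(kv.first)) + ":" + std::to_string(ea));
      if ( kv.second.empty() )   // mirrors the per-queue "empty" event
        log.push_back(std::string("queue_empty:") + toy_queue_name(kv.first));
      return;
    }
  }

  void run()
  {
    for ( ;; )
    {
      while ( !all_empty() )
        step();
      log.push_back("all_empty");
      if ( on_all_empty )
        on_all_empty(*this);     // the handler may plan more work
      if ( all_empty() )
        break;                   // nothing was added: analysis is complete
    }
    log.push_back("finished");
    assert(all_empty());
  }
};
```

In this sketch, a handler that plans a function at some address when it sees the first "all_empty" causes one more round of processing before "finished" is logged, which is the same idea as a plugin adding items to the queues in response to the final event.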

The basic autoanalysis algorithm is quite simple. Guys from Determina guessed it right: http://www.determina.com/security.research/ (btw, check the presentation for more interesting stuff; they also have developed a better pdb plugin).

The answer to this problem is “use events!” You will find dozens of them in idp.hpp. You can completely change analysis outcome by providing more information to the kernel. It is very easy to hook to the events:

hook_to_notification_point(HT_IDP, my_event_handler, your_data);

and the handler will be:

int idaapi my_event_handler(void *your_data,
                            int notification_code,
                            va_list va)
{
  if ( notification_code == processor_t::make_code )
  {
    // take care of instruction creation...
  }
  return 0; // pass on the event further
}
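Since the handler above needs a running IDA to do anything, here is a standalone mock of the same callback pattern that can be compiled on its own. Everything prefixed with toy_ is hypothetical and not part of the SDK: the event codes, the single address argument and the return-value convention (0 passes the event on, any other value stops the chain) are assumptions for this sketch. Check idp.hpp for the real arguments and return values of each event.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdint>
#include <vector>

// Toy stand-ins, not the IDA SDK. Event codes and argument layout are assumptions.
enum toy_event_t { toy_make_code = 1, toy_make_data = 2 };

typedef int toy_handler_t(void *user_data, int code, va_list va);

struct toy_hook_t { toy_handler_t *cb; void *user_data; };

static std::vector<toy_hook_t> g_toy_hooks;

// Register a handler together with its private data pointer.
void toy_hook(toy_handler_t *cb, void *user_data)
{
  g_toy_hooks.push_back({cb, user_data});
}

// Deliver an event to each handler in registration order.
// Assumed convention: 0 passes the event on, anything else stops the chain.
int toy_notify(int code, ...)
{
  for ( const toy_hook_t &h : g_toy_hooks )
  {
    va_list va;
    va_start(va, code);          // every handler gets a fresh argument list
    int res = h.cb(h.user_data, code, va);
    va_end(va);
    if ( res != 0 )
      return res;
  }
  return 0;
}

// Example handler: blocks instruction creation inside a range known to be data.
struct toy_range_t { std::uint64_t start, end; int hits; };

int toy_block_range(void *user_data, int code, va_list va)
{
  toy_range_t *r = static_cast<toy_range_t *>(user_data);
  if ( code == toy_make_code )
  {
    // First (and only) assumed argument of this toy event: the address.
    std::uint64_t ea = va_arg(va, std::uint64_t);
    if ( ea >= r->start && ea < r->end )
    {
      r->hits++;
      return -1;                 // block the event
    }
  }
  return 0;                      // pass the event on
}
```

The caller must pass arguments with exactly the types the handler extracts with va_arg, for example toy_notify(toy_make_code, std::uint64_t(0x1800)). The same discipline applies to real handlers: the argument types for each event must match what the header documents.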

If the analysis is not up to your expectations, just hook to events.
