Improving IDA analysis

For a typical MS Windows executable, IDA does quite a good job of recognizing code and creating functions, and the result is usually eye-pleasing and easy to decipher. The analysis is good but not perfect: there are cases when it mistakes data for code or determines function boundaries incorrectly.

The good news is that there are easy ways to improve the situation.

It was obvious from the beginning that we could not make a perfect engine to tell code apart from data. Therefore we prepared several ways to alleviate the problem. First, the user has ultimate control over the listing and can convert data to code and vice versa at any time. Second, we created hooks for plugins. For example, each time IDA creates a function, it calls a hook named processor_t::func_bounds and a plugin has a chance to correct the function boundaries. Before creating any instruction, IDA calls processor_t::make_code and if it yields 0, IDA will not create the instruction. The same scheme is used for data items (processor_t::make_data) and names (processor_t::rename).

In addition to these hooks, there are also events that fire before and after the analysis. In fact, there are several such events, one for each analysis queue. IDA has several queues:

  • code queue – addresses from this queue will be converted to code
  • function queue – addresses from this queue will be converted to functions
  • reanalysis queue – these addresses will be reanalyzed. This queue is used to create stack variables, correct cross references if a segment register gets modified and so on.
  • undefine – addresses from this queue will become unexplored
  • final queue – if an address from this queue is unexplored, IDA will try to convert it to something (data or code). While this queue makes the listing nicer, all decisions made for it are arbitrary best guesses. If you prefer a more precise listing that leaves such areas unexplored, you might want to turn off the final analysis. The option is available from the kernel options.

There are some other queues (for FLIRT signature files and other stuff) but the ones mentioned are the most important. When any queue becomes empty, an event (processor_t::auto_queue_empty) is generated. When all queues become empty, a final event (processor_t::auto_empty) is generated and if no plugin or processor module adds anything to the queues in response to it, then the analysis is declared completed (processor_t::auto_empty_finally). Many processor modules react to these events and fine-tune the listing one way or another.
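To make the event sequence concrete, here is a toy model of the queue-draining loop. It is not IDA's implementation and it does not use the SDK: ToyAnalyzer, the Queue enum, the on_auto_empty callback and the order in which queues are drained are all assumptions made for illustration. Only the three event names come from the description above.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy model only. Queue names follow the list above; their ordering and every
// type and function name here are assumptions, not IDA's implementation.
enum class Queue { Undefine, Code, Function, Reanalysis, Final };

struct ToyAnalyzer
{
  std::map<Queue, std::deque<unsigned long>> queues; // pending addresses
  std::vector<std::string> log;                      // events, in firing order
  std::function<void(ToyAnalyzer &)> on_auto_empty;  // may push more work

  void push(Queue q, unsigned long ea) { queues[q].push_back(ea); }

  bool all_empty() const
  {
    for ( const auto &kv : queues )
      if ( !kv.second.empty() )
        return false;
    return true;
  }

  void run()
  {
    for ( ;; )
    {
      // std::map iterates in key order, so queues drain in enum order.
      for ( auto &kv : queues )
      {
        if ( kv.second.empty() )
          continue;
        kv.second.clear();                 // stand-in for processing each address
        log.push_back("auto_queue_empty");
      }
      assert(all_empty());
      log.push_back("auto_empty");
      if ( on_auto_empty )
        on_auto_empty(*this);              // a listener may refill a queue
      if ( all_empty() )
        break;                             // nobody added work: we are done
    }
    log.push_back("auto_empty_finally");
  }
};
```

If the callback pushes an address when it sees the first auto_empty, the loop makes another pass and only then reports auto_empty_finally. The model leaves out one important detail of real analysis: processing an address can itself add work to other queues.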

The basic autoanalysis algorithm is quite simple. The guys from Determina guessed it right: http://www.determina.com/security.research/ (btw, check the presentation for more interesting stuff; they have also developed a better pdb plugin).

The answer to imperfect analysis is “use events!” You will find dozens of them in idp.hpp. You can completely change the analysis outcome by providing more information to the kernel. It is very easy to hook to the events:

hook_to_notification_point(HT_IDP, my_event_handler, your_data);

and the handler will be:

int idaapi my_event_handler(void *your_data,
                            int notification_code,
                            va_list va)
{
  if ( notification_code == processor_t::make_code )
  {
    // take care of instruction creation...
  }
  return 0; // pass on the event further
}
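The handler above receives its arguments through a va_list and returns 0 to pass the event on. The following self-contained sketch models that pattern without the SDK. Everything prefixed with toy_ is a hypothetical stand-in, the single unsigned long argument is an assumption (check idp.hpp for the real argument list of each event), and the rule that a nonzero return ends the chain is an assumption of this model, not a statement about the kernel.

```cpp
#include <cassert>
#include <cstdarg>
#include <vector>

// Toy stand-ins, NOT the IDA SDK: event codes, handler type and the
// "nonzero stops the chain" rule are assumptions made for this sketch.
enum toy_event_t { toy_make_code = 1, toy_make_data = 2 };
typedef int (*toy_handler_t)(void *user_data, int code, va_list va);

struct toy_hook_t { toy_handler_t fn; void *user_data; };
static std::vector<toy_hook_t> g_hooks;

void toy_hook(toy_handler_t fn, void *user_data)
{
  assert(fn != nullptr);
  g_hooks.push_back({fn, user_data});
}

// Deliver an event to each handler in registration order.
// A handler returning 0 passes the event on; any other value ends the chain.
int toy_notify(int code, ...)
{
  for ( const toy_hook_t &h : g_hooks )
  {
    va_list va;
    va_start(va, code);          // each handler reads the arguments from the start
    int rc = h.fn(h.user_data, code, va);
    va_end(va);
    if ( rc != 0 )
      return rc;
  }
  return 0;
}

// Counts the events it sees and never claims them.
int counting_handler(void *user_data, int /*code*/, va_list /*va*/)
{
  ++*static_cast<int *>(user_data);
  return 0;
}

// Claims make_code events that fall inside a hypothetical data table.
int table_guard_handler(void * /*user_data*/, int code, va_list va)
{
  if ( code == toy_make_code )
  {
    unsigned long ea = va_arg(va, unsigned long);
    if ( ea >= 0x403000UL && ea < 0x403100UL )
      return 1;                  // handled: later handlers are not called
  }
  return 0;                      // pass on the event further
}
```

Registering table_guard_handler before counting_handler means the counter only sees the events that the guard lets through, so in this model the registration order matters.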

If the analysis is not up to your expectations, just hook to events.


2 Responses to Improving IDA analysis

  1. Joe Bruce says:

    A powerful notification that appears to be “missing” is an equivalent of the processor_t::is_switch() function that processor modules can define. In cases where the module source code is unavailable (e.g., PPC), it would be nice to extend this function for various compilers so switch statements could be recognized “automatically” during auto-analysis rather than requiring later repair via IDC scripts or regular plugins.
    The processor_t::is_insn_table_jump notification does not appear to have all the necessary arguments, so it does not appear to be a way to achieve this goal. Unless processor extension modules can modify the parent module’s processor_t instance, I don’t see a way to detect switch statements during auto-analysis…

  2. Ilfak Guilfanov says:

    Sorry for the delayed answer – I had no idea that the site was storing all comments in the junk folder. This is a side effect of the spam war…
    A plugin can replace the pointer to the is_switch() function with its own and achieve the goal. Some care is required to restore the old pointer before unloading the plugin but it is quite easy.
    You are right that the processor_t::is_insn_table_jump seems to be too weak and can not be used in plugins. Manually hooking to is_switch() is the only option for the moment.
    P.S. The structure to update is ‘ph’. The kernel copies the processor description from LPH to this variable at the loading time.