In May a contest was open on Datarescue’s forum:

http://www.datarescue.com/ubb/ultimatebb.php?/topic/4/375.html

There were some nice tries but nobody guessed it right.

It seems Datarescue will have to repeat the contest with another question

If you are curious to learn the correct answer, please read on.

For IDA Pro a program consists of functions.

A function has many attributes, like the entry point and frame size.

It may also have some locally defined items: labels and stack based variables.

Usually IDA automatically handles the stack variables: it creates stack variable

definitions (stkvars)

and converts the referencing operands into the **stkvar** type. Let’s consider

this source code:

The first sample function has been analyzed perfectly well:

There are three stack variables. They were created by IDA because of very recognizable

stack references in the **ESP+n** form.

I gave the stack variables meaningful names:

**hThread**, **enabled**, and **counter**.

In the green column after the addresses, IDA displays the stack pointer values.

The displayed values represent the difference between the initial value of the stack

pointer at the function entry point and the current value.

Look at the **push** instruction in this code. Right after it the difference

becomes 4.

There is no corresponding **pop** instruction, but this is not a mystery

for knowledgeable programmers working with MS Windows API:

functions with so called stdcall

^{1}

calling convention remove their arguments from the stack.

Fortunately, IDA knows about such functions

and carefully adjusts the stack after the call. The stack pointer

at 401012 becomes 0 as it should. Just as a sidenote, the information

about the stdcall functions

comes from the IDS files in the **ids** subdirectory and you can handle them with

special utilities available on Datarescue’s site.

But what happens when IDA does not know about the calling convention?

Look at this *wrong* listing:

At **401042** there is an indirect call and there are 2 push instructions just before it.

By examining the whole function we can tell with 100% certainty that the called function

must remove its arguments from the stack. Otherwise the return instruction at **401051**

could not return to the caller because the stack would still contain the additional bytes.

IDA did not know what to do with the call and left it as is.

As a consequence, the stack pointer values after the call became incorrect and

all references using ESP were incorrect too.

For example, the instruction at 401048 does not increment **p** but the third operand.

This was completely missed by IDA.

The current version of IDA behaves like this: it has a very simple stack tracer

which does not analyze the whole function but just traces

individual sp modifications for each instruction. Naturally, it fails

as soon as any indirect call or a call to a function with unknown effect

on the stack is made. To tell the whole truth, there is a mechanism to handle

the simplest cases when the function ends after several pushes and a call.

But this mechanism is too simple and fails more than succeeds.

The stdcall function problem is important for MS Windows programs. This calling

convention is a de facto standard for MS Windows API. It is also used in many projects

since it is the only calling convention which guarantees compatibility between different

programming languages. For example, all IDA exported functions are stdcall (there are

some exceptions, most notably functions with the variable number of arguments; for the

obvious reasons they can not be stdcall).

The good news are that the next version of IDA will handle the stack trace much better.

The second sample function will look like this:

IDA did determine that the call removes 8 bytes from the stack, adjusted the stack pointer

and correctly represented the subsequent stack variable. The function takes three arguments,

which is correct.

This information will be propagated in the disassembly and the whole disassembly

will be better.

How did IDA determine the stdcall functions? Here is the outline of the algorithm.

First we break the function down into *basic blocks*

^{2}:

(the red numbers denote the block numbers)

The second step is to build a system of linear equations based on the graph edges

and available push/call information. For each basic block we introduce two variables: the

value of the stack pointer at the entry and at the exit of the block.

For our sample function, the following equations have been generated:

The equation for block #1 uses inequality since the block may leave the 8 bytes on stack

as well as may clean them.

The third (last) step is to solve the equations and apply the results to the disassembly.

My first urge was: “hey, this is so simple, I’ll just solve these equations”.

So I implemented a tiny Gaussian elimination method (there were no inequalities in the equations yet)

and was happy with the result until a test.

The real wild world of programs turned out to be much more versatile and

weird, if not hostile. There were all kinds of functions.

Some functions **did** change the stack pointer so at the function exit it was different from its initial value.

I found their names and excluded them from the analysis.

Some functions were malformed.

There was no point of trying to analyze them.

Some functions gave so little information

about the stack pointer that it was not enough to build a sufficient number of equations.

The equations would have too many (infinite) number of solutions. I needed only one.

To choose only one solution among many possible, I had to restate the stack tracing

as an optimization problem and solve it with the simplex method ^{3}.

The optimization criteria was to find the minimum of the sum of all variables.

This is not completely correct but gives nice results in practice.

Probably it works well because compilers try to consume as little stack as

possible and do not issue meaningless **push** instructions ^{4}..

I also had to set the low bounds for all inner variables. Inner variables are

all variables which are not in_{0} and out_{n} where n is the number of a basic block

ending with **return**. Without the low bounds the solution is incorrect and can not

be applied.

The last step was to apply the obtained result back to the disassembly, which was straightforward.

There were some nasty questions, the biggest one was how to resolve

the situation with several indirect calls in a basic block.

A simple heuristic of assigning as much push instructions to a call a possible worked well enough.

The next version of IDA will use an external DLL to solve systems of linear equations.

It’s name is COIN/Clp and it comes from IBM.

This library is an overkill because our equations are derived from a graph and can be

solved in a simpler way but I opted for it because it was not obvious that something simple

would fit. In fact, it was the contrary: I constantly had to switch from simple

approaches to more sophisticated ones until the result was acceptable.

Alas, the new method does not handle all cases. There will still be stack analysis failures,

especially when the function uses structured exception handlers (SEH). Overall,

the success rate is about 97%.

I’d compare having a correct stack pointer trace to having a very flat and solid

workbench. It is not a big deal if you have a nice one. You do not even notice it when

everything works correctly. Things are different when your table is shaky and

your efforts are spent to keep it stable (look again at the wrong listing).

1. More about calling conventions:

http://blogs.msdn.com/oldnewthing/archive/2004/01/08/48616.aspx

2. Basic blocks are used quite often for program analysis:

http://en.wikipedia.org/wiki/Basic_block

3. More about simplex method:

http://en.wikipedia.org/wiki/Simplex_algorithm

4. I have to admit that I encountered some compiler-generated functions which were allocating stack space

before each call and releasing immediately after it. The allocated space was not used at all.

Nice work, I look forward to the new functionality.

So… those green stack offset counters. Is there a mechanism for me to manually give it the answer on a per-line basis? I.e. on a Call, I can manually tag it -8?

I’m guessing one of the functions you excluded from analysis is one that is used my the MS compliler that takes as an arg how many bytes to allocate on the stack of the caller?

If you don’t know which one I mean, I’ll go look it up…

Great article, nice approach to solving the stack trace problem!

I have a question though: How would you correctly trace the stack if you just have *one* big basic block?

(You may be able to assure that ESP at beginning and end are zero, but in-between how can you track correctly after each function call?)

Excellent article, well done!

Russel,

Yes, you guessed right, it is mostly about functions which allocate the stack space. If they don’t get recognized, the analysis will be very poor but I added a logic to reanalyze the appropriate functions if a function is renamed as SEH_prolog or given similar name.

You can specify the stack deltas using the Alt-K hotkey.

Lallous,

When there is only one big block, the simplex method immediately returns the answer: 0 at the entry and 0 at the end.

We apply the information to the disassembly the usual way. Several heuristics are used during this process. The main rule is to detect several pushes followed by a call and correct the call by the required amount.

Here the required amount depends on the input/output values and the instructions already encountered in the linear block (we strip off constant instructions at the beginning and end of the block). While the actual algorithm is quite complicated and takes into account compiler idiosyncrasies, it is nothing special.

Hey Ilfak,

Regarding merging the results back into the disasm: did you try splitting the basic blocks at call sites?

No, but I considered the idea. Having smaller basic blocks does not help much because the equations you can derive from them are essentially “we have a linear control flow”. In other words you would have 2 more variables and 2 more equations:

pre – in(i) = c1

out(i) – post = c2

where

preis the new variable denoting the stack state before the call andpostdenotes the state after the call.I also have an impression that the simplex method it too greedy to perform well on linear sequences like this. It will try to pop the arguments from the stack as soon as possible which is not always desirable.

I was working on a plugin that finds the places where IDA’s stack tracking algorithm fails. You just have to trace the control flow of a function and look at the stack pointer values. At every point where two control flow paths merge, the incoming stack pointer values must match. If they don’t, at least one of them is invalid and this can be reported.

This might be useful for finding the 3% cases where the new algorithm still has trouble.

The 7th equation should be “in4 – out0 = 0”, not “in4 – out3 = 0”.

Nice job any way.

Alex,

The simplex method fails not only because of different SP values on joining edges, but because of other reasons as well (for example, it can fail because of a tail call optimization). Besides, the problem of finding the failed cases does not exist. All failed cases are added to the problem list.

Thanks for telling about the wrong equation.

Fixed.

An excellent idea. But why do you say ESP must be zero when the function returns? For stdcall functions, this isn’t the case. The remaining (in)equalities seem enough to generate a solution without restricting ESP at that point. Am I missing something obvious? Still, a really great solution to the problem of tracing the stack pointer.

Thanks

1. It is the case for stdcall functions too: the esp must point to the returns address in all cases

2. Unfortunately, the remaining inequalities are not enough.