Reading assembly code

Even unobfuscated code is difficult to understand. Look at this function. Can you tell its purpose?

The answer is very simple. Here it is:

unsigned char div61(unsigned char x) { return x / 61; }

In is just a division by 61!
At the first sight it looks overly complicated. Why not use the div instruction? The reason is that this longer(!) code is faster. See some instruction timings here.
Then, when we understand this, the code still looks like a magic. There is nothing like division there, only all kinds of shuffling bits to the left and to the right. If you are really curious how this code works, consider reading this article by Torbjorn Granlund and Peter L. Montgomery:
Division by Invariant Integers using Multiplication
It would be nice to have code sequences like this automatically deciphered. Once we know the answer, the mystery disappears and only the tedious work of finding the dividend remains. A decent program analysis tool or decompiler definitely could (and should) do it.

17 thoughts on “Reading assembly code”

  1. nice to know that magic just for fun compiled it and disassembled it
    on gcc+gdb looks like the algo is almost same so some kind of pattern recognition
    should be able to isolate
    these magic chunks
    nice blog you started here ilfak and pretty informative too
    > cat ilfak.c
    unsigned char div61(unsigned char x) { return x / 61; }
    int a;
    int main (void)
    printf(“hello world\n”);
    a = div61(182);
    return 1;
    > gcc -o ilfak ilfak.c
    > ./ilfak
    hello world
    > gdb -q ilfak
    (gdb) info fun div
    All functions matching regular expression “div”:
    Non-debugging symbols:
    0x0804833c div61
    (gdb) set di i
    (gdb) disassemble div61
    Dump of assembler code for function div61:
    0x0804833c : push ebp
    0x0804833d : mov ebp,esp
    0x0804833f : sub esp,0x4
    0x08048342 : mov eax,DWORD PTR [ebp+8]
    0x08048345 : mov BYTE PTR [ebp-1],al
    0x08048348 : mov cl,BYTE PTR [ebp-1]
    0x0804834b : mov edx,0x0
    0x08048350 : mov dl,cl
    0x08048352 : mov eax,edx
    0x08048354 : add eax,eax
    0x08048356 : add eax,edx
    0x08048358 : shl eax,0x2
    0x0804835b : add eax,edx
    0x0804835d : mov edx,eax
    0x0804835f : shr dx,0x8
    0x08048363 : mov al,cl
    0x08048365 : sub al,dl
    0x08048367 : shr al
    0x08048369 : add eax,edx
    0x0804836b : shr al,0x5
    0x0804836e : and eax,0xff
    0x08048373 : leave
    0x08048374 : ret
    End of assembler dump.

  2. It must be a very clever pattern matching algorithm since modern compilers not only can reorder instructions but also can combine the several calculations into one…
    The second aspect is that the magic multiplier constant is different for different dividends and operand sizes. I mean that 8-bit division by 61 will use another magic than 32-bit division by the same number.

  3. Here is the same function but for a 32-bit argument:
    push ebp
    mov ebp, esp
    mov eax, 4325C53Fh
    mul [ebp+arg_0]
    shr edx, 4
    mov eax, edx
    mov esp, ebp
    pop ebp

  4. Nice blog entry!
    There is a simple way to see what calculation is done by doing some math with the costant.
    I’ll take your 32-Bit example:
    Convert 4325C53Fh to decimal: 1126548799
    Divide it by 2^32+number of shifts (in our case 2^32+4): 0,016…
    Then 1/x: 60,999 which is rounded 61
    In my experience this works for most cases.
    Sometimes the problem is to figure out if this is a / or a %.
    Yeah mod can also be done this way…

  5. ilfak
    thanks for replying back
    was the above snippet coded and compiled by you ?
    what compiler did you use
    i want to know why the stack is erased right in the entry stage itself after using the argument to the call i thought it must be some peculiar code in some malware the other day i was poking around w2k3
    nt!NtGdiStretchBlt and i see that stub is also employing the same kind of
    push ebp,mov ebp,esp
    then popping off ebp right before entering inside
    vc6 as far as i tried doesnt seem to generate such as proc
    neither does bcc free commandline or gcc
    could you shed some light
    is it some kind of equivalent code mov edi,edi that was then clarified to be a mechanism for hotpatching the system dlls
    also does ida by default doesnt use the size specifier ? in the above snippet movzx ecx,[ebp+arg_0] the src operand can have both byte as well as word ptr aka 0x0fb6 ,0x0fb7 how to recognize what was moved
    (well may be it doesnt make much of a difference i am just asking from newbie point of view)
    thanks and regards

  6. I just took this source code:
    unsigned char div61(unsigned char x) { return x / 61; }
    and compiled it with gcc 3.4.4 (cygming special). I do not remember the options and the calling convention, sorry.
    IDA does use the size specifier – either in the instruction or in the stack variable definition. The stack variable definitions are not visible in my code, that’s why you do not see the size specifier.
    Sometimes compilers do really strange things. In my example it decided to save a register and almost immediately restored it. With another compiler (icc?) I saw something even more stupefying:
    push ebp
    mov ebp, esp
    push esi
    mov [ebp-4], edi

    One could think that here we save esi on the stack and he would be totally wrong. In fact this code snippet saves edi!

  7. I see that I missed a comment which deserves a reply (from Atz-Soft).
    It is easy to tell apart / and % since the remainder operation is usually coded like this:
    x – x / n * n;
    If you can recognize division then recognizing remainder is pretty easy. But division can be really nasty, especially signed division since the magic number is calculated with less precision and the forumals working in most cases will fail. It seems I’ll have to post another blog entry about these funny things.

  8. Simplifying Atz-Soft’s method, there is a simpler formula:
    divisor = magic / 2^(32+shift)
    1126548799 / 2^36 = 60.99

  9. We need to make the formula more complex, not to simpify 🙂
    Consider the following snippet:
    push ebp
    shr ecx, 2
    mov eax, ecx
    mov edx, 24924925h
    mov ebp, esp
    mul edx
    pop ebp
    mov eax, edx
    The formulas will not work. No simple formula like this will work in fact.
    See the article – it is really enlightening!

  10. After hearing a portion of an interview on KFI 640, I was filled with dread. So I immediately went to the guest’s website at, and followed the instructions to download your “Windows WMF Metafile Vulnerability HotFix v. 1.2” It seemed to install properly, but when I ran the WMF Vulnerability Checker on that website, I received the error message that my system remained vulnerable.
    I then went to your site at, and found “Windows WMF Metafile Vulnerability HotFix v. 1.4”, which I then installed. However, when I ran the Vulnerability Checker from your website, I again received the error message that my system was vulnerable.
    As I type this, I assume that I made one mistake. I did not remove v. 1.2, before installing v. 1.4. But that does not explain why v. 1.2, did not correctly install, to satisfy the Vulnerability Checker.

  11. Ooops!!! I obviously misidentified your website. I don’t know what I was thinking, or typing. I only need to look at the command line, to know that this is Unfortunately, I could not find a means to edit a previous post. Silly me!!!
    In the mean time, I went to Control Panel, and found that Hotfix v. 1.4 did install. Presumably, on top of v. 1.2, since it was not listed. I removed the Hotfix patch, as reccomended, because it evidently was not working. That, or the Vulnerability Checker is not functioning properly. It is over my head, and beyond my ken.

  12. After downloading your hotfix my “Spyware Begone” (by Microsmarts) detected 2 browser hijacking files. I removed them, but the hotfix did not work any more. After the second download I tried the same with Microsoft Anti Spyware and it didn´t recognaise any spies..and the hotfix seems to work ok. Thanks !

  13. I don’t agree that we need complex pattern matching here; we need intelligent transformation. The Boomerang decompiler does something like this , though it needs much more work. After some hand editing for screwing up the parameters, it emits
    local2 = param1;
    local3 = (local2 + local2 + local2) * 4;
    local1 = (unsigned char) (local3 + local2) / 256;
    local4 = (local3 + local2 – local1) / 2;
    local5 = (local3 + local2 + local4) / 32;
    return (local5);
    When you unravel this, it does the same thing as the original program, in otherwords, it returns
    param1 * 269 / 16384,
    which is roughly division by 261. (Considering the effects of truncation, it is exactly division by 261).
    With a few more intelligent transformations, it would be there. But this is the thing: how do you pick the right transformations to apply, so that you end up with the “canonical” or “simplest” (in some sense) equivalent source code?
    By the way, the way it propagates the 12*param1 and 1*param1 into later expressions is an example of “too much propagation”. This is an intersting problem, to which I have found only an approximate solution so far.
    – Mike

  14. Thanks for the comment, Mike! I appreciate it. Here are my thoughts about the topic:
    One needs to recognize a sequence of instructions (or a subexpression) before a transformation rule can be applied – that’s why I was talking about a pattern matching. I do not know what exactly an “intelligent transformation” is so there might be something I miss.
    It seems that the most difficult part of the transformation is the pattern matching. The transformation itself – replacing a sequence of instructions with a “div” instruction – is simple and does not pose any problems. Even a very aggressive propagation can be handled without much difficulty because only the end result of the sequence will be propagated. However, it can be propagated far away from the rest of the sequence and this might cause more problems during pattern matching.
    Leaving the sequence “as it is” is not a good idea. It prevents the decompiler from recognizing the division and remainder operations in the code and consequentally from recognizing other idioms. For example:
    size_t size = ptr – array;
    callfunc(array, size);
    If the array elements are, say, 15 bytes long then the calculation of ‘ptr-array’ will involve a division. If the decompiler does not recognize it then instead of two nice and simple lines we will get a long sequence of calculations.
    To get the simplest code we have to apply only the rules which decrease the code complexity. While there are some undecidable cases, the division by multiplication rule is certainly the one to apply in all cases.
    Too much propagation exists; we can solve it at the late stages by using the classical CSE elimination algorthims. Something to consider when the decompiler reaches its maturity 🙂

Comments are closed.