I talked briefly before about disassembly, which is the first step in decompilation.
So, for example, if the output from your disassembler is:
push bp
mov bp,sp
mov word ptr [d_0021],0000h
c_000c:
cmp word ptr [d_0021],+0ah
jge c_0026
mov si,[d_0021]
shl si,1
mov ax,[d_0021]
mov [si+d_0023],ax
inc word ptr [d_0021]
jmp short c_000c
c_0026:
pop bp
ret
what is the corresponding C code?
Well, let's look at this in pieces:
push bp
mov bp,sp
this is standard C entry code for a function:
fn()
{
Notice that no local variables are allocated -- this would be shown by
a statement like:
sub sp,+02h
The next standard piece is at the end:
c_0026:
pop bp
ret
this is standard C exit code for a function:
}
In the actual code block, we have a 2-byte integer being set:
mov word ptr [d_0021],0000h
So we know we have a global integer:
int d_0021;
that is set in a C statement:
d_0021 = 0;
Note that we don't know what the integer was originally named.
Next we check the variable we just set:
c_000c:
cmp word ptr [d_0021],+0ah
jge c_0026
In C:
if (d_0021 < 10)
We notice that d_0021 is being used as an index (the SI addressing
in the following):
mov si,[d_0021]
shl si,1
mov ax,[d_0021]
mov [si+d_0023],ax
and that this is a 2-byte data reference (the SHL SI,1 is a
multiply by 2). This also tells us that we have a global array
of integers:
int d_0023[some size];
If we think about it, since we know that the index variable is
limited to 0 .. 9 by the previous IF statement, we can guess that
the size is 10. Or, we could just look further down in the
disassembly:
d_0021 dw 0000h
d_0023 dw 000ah dup (0000h)
and realize that this is really
int d_0023[10];
Again, we don't know what it was originally named.
The previous block also shows d_0021 being used as a data value
as well as an index and a counter:
d_0023[d_0021] = d_0021;
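To make the SHL SI,1 step concrete, here is a minimal sketch of the same store done two ways: once with an explicit byte offset, as the assembly does it, and once with ordinary C indexing. It assumes int16_t as a stand-in for the 2-byte int of the original 16-bit compiler, and the helper names scaled_offset, store_via_offset and store_via_index are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names as the disassembler gives them. int16_t models the 2-byte int
   (an assumption, so the sketch behaves the same on a modern compiler). */
int16_t d_0021;
int16_t d_0023[10];

/* Hypothetical helper: the byte offset the assembly builds in SI.
   Assumes index is in 0 .. 9. */
size_t scaled_offset(int16_t index)
{
    return (size_t)index << 1;              /* shl si,1 : index * 2 */
}

/* Hypothetical helper: the store as the assembly performs it,
   addressing the array by byte offset from its start. */
void store_via_offset(void)
{
    unsigned char *base = (unsigned char *)d_0023;
    size_t si = scaled_offset(d_0021);      /* mov si,[d_0021] / shl si,1 */
    int16_t ax = d_0021;                    /* mov ax,[d_0021] */

    memcpy(base + si, &ax, sizeof ax);      /* mov [si+d_0023],ax */
}

/* The same store written as the C statement recovered above. */
void store_via_index(void)
{
    d_0023[d_0021] = d_0021;
}
```

With d_0021 set to 3, both functions write the value 3 into d_0023[3]; the factor of 2 in the offset is just the size of one array element.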
Next, we have:
inc word ptr [d_0021]
Or, in C:
d_0021++;
And, finally, we have:
jmp short c_000c
which goes back to the
if (d_0021 < 10)
statement. A backward jump to a test is how a compiler implements a
loop, and GOTO is almost never used in C, so the earlier translation
as an IF was wrong, and this really should be
while (d_0021 < 10)
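As a rough check on that reasoning, the sketch below writes the function twice: once as a literal, jump-for-jump translation that keeps the disassembler's labels as C labels, and once with the loop recognized. The function names fn_goto and fn_while are invented for this comparison; they do not come from the disassembly.

```c
int d_0021;
int d_0023[10];

/* Literal translation: each jump in the listing becomes a goto. */
void fn_goto(void)
{
    d_0021 = 0;                       /* mov word ptr [d_0021],0000h */
c_000c:
    if (d_0021 >= 10) goto c_0026;    /* cmp ... / jge c_0026 */
    d_0023[d_0021] = d_0021;
    d_0021++;                         /* inc word ptr [d_0021] */
    goto c_000c;                      /* jmp short c_000c */
c_0026:
    return;
}

/* Structured translation: the test plus the backward jump is a while loop. */
void fn_while(void)
{
    d_0021 = 0;
    while (d_0021 < 10)
    {
        d_0023[d_0021] = d_0021;
        d_0021++;
    }
}
```

Both versions leave d_0021 at 10 and fill d_0023 with 0 through 9. The JGE in the listing tests the exit condition, so it appears inverted as < in the while.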
Putting this all together, the decompiled function is:
int d_0021;
int d_0023[10];
fn()
{
d_0021 = 0;
while (d_0021 < 10)
{
d_0023[d_0021] = d_0021;
d_0021++;
}
}
Compare this to the original C code:
int i, j[10];
main()
{
for (i = 0; i < 10; i++)
j[i] = i;
}
And you'll note that this is the same ambiguity I mentioned before. The loop comes back as a while rather than a for, and the original names i and j are gone. Also, while I could have determined that this was really the main() function, not a general function, it would have involved a few more steps that I didn't want to get into at this time.
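For readers who want to check that the two listings really do the same work, here is a small sketch that puts both loops side by side. The function names decompiled and original are placeholders chosen for this comparison (the original was main(), which is renamed here so both can live in one file).

```c
int d_0021, d_0023[10];   /* names assigned by the disassembler */
int i, j[10];             /* names from the original source */

/* The loop as recovered by decompilation. */
void decompiled(void)
{
    d_0021 = 0;
    while (d_0021 < 10)
    {
        d_0023[d_0021] = d_0021;
        d_0021++;
    }
}

/* The loop as originally written. */
void original(void)
{
    for (i = 0; i < 10; i++)
        j[i] = i;
}
```

After calling both, i and d_0021 are both 10 and the two arrays hold the same values. A for loop without a continue in its body is just shorthand for the initialization followed by the while form, which is why nothing in the generated code distinguishes the two.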