Personally Interesting Stuff: Re-creating source code, Part 2

As I said before, I'm going to talk more about disassembly.

With a von Neumann architecture processor like the x86, code and data share the same memory, so a disassembler needs an algorithm to distinguish code from data.

Long ago, when I was writing anti-virus software, I wrote my own disassembler called codegen. The algorithm I used to separate code from data was based on noticing that processor instructions came in 4 "flavors" of flow control:

NORMAL flow control is a standard instruction like ADD or SUB. The instruction does not change the instruction sequence, so the next instruction is the one after the current instruction.

GOTO flow control is an instruction like JMP. The instruction changes the instruction sequence, and the next instruction is the target of the instruction. It is not known whether there is an instruction after this one; that must be determined elsewhere.

CALL flow control is an instruction like CALL, INT, or JNE. The instruction can change the instruction sequence, and there are instructions both at the target of the instruction and after the current instruction. (A conditional jump such as JNE belongs here because it may or may not be taken, so both paths hold code.)

EXIT flow control is an instruction like INT 20H. The processor does not really "stop" processing, but there is no target instruction associated with this instruction: the program has exited. It is not known whether there is an instruction after this one; that must be determined elsewhere.
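A minimal Python sketch of the four flavors, assuming each instruction has already been decoded into a mnemonic and an operand. The names `Flow`, `classify`, and `successors` are hypothetical, and the lookup table covers only the example mnemonics above, not the whole x86 instruction set. `successors` encodes the rules themselves: which addresses are known to hold code once an instruction of each flavor has been seen.

```python
from enum import Enum
from typing import List, Optional


class Flow(Enum):
    """The four flow-control "flavors"."""
    NORMAL = "normal"  # falls through to the next instruction
    GOTO = "goto"      # continues only at the target
    CALL = "call"      # may continue at the target and at the next instruction
    EXIT = "exit"      # no known successor


# Hypothetical table covering only the example mnemonics from the text;
# a real disassembler would classify every opcode it can decode.
FLOW_BY_MNEMONIC = {
    "ADD": Flow.NORMAL,
    "SUB": Flow.NORMAL,
    "JMP": Flow.GOTO,
    "CALL": Flow.CALL,
    "INT": Flow.CALL,
    "JNE": Flow.CALL,
}


def classify(mnemonic: str, operand: str = "") -> Flow:
    """Return the flow-control flavor of one decoded instruction."""
    m = mnemonic.upper()
    # INT 20H is the DOS "terminate program" interrupt, so nothing follows it.
    if m == "INT" and operand.upper() == "20H":
        return Flow.EXIT
    # Unlisted mnemonics are assumed to fall through (an assumption of this sketch).
    return FLOW_BY_MNEMONIC.get(m, Flow.NORMAL)


def successors(flow: Flow, next_addr: int, target: Optional[int]) -> List[int]:
    """Addresses known to hold code, given one instruction's flavor.

    target is None when it is not known statically (e.g. an interrupt
    handler that lives outside the program).
    """
    if flow is Flow.NORMAL:
        return [next_addr]
    if flow is Flow.GOTO:
        return [target] if target is not None else []
    if flow is Flow.CALL:
        return ([target] if target is not None else []) + [next_addr]
    return []  # EXIT: whatever follows must be found some other way
```

For example, `classify("INT", "20H")` gives `Flow.EXIT` while `classify("INT", "21H")` gives `Flow.CALL`, and `successors(Flow.CALL, 0x105, 0x200)` returns both the target and the fall-through address.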

These 4 rules, combined with the program's known entry point, can find most of the code in a program. Examining the instructions that were found more closely then shows whether they access data, and if so where, which locates the data in the program.
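A minimal sketch of that tracing step in Python, not codegen's actual implementation. It assumes instructions are already decoded into a dictionary keyed by address (a real tool decodes the bytes itself), and `Insn`, `find_code_and_data`, and the sample program are hypothetical. It uses an explicit worklist instead of recursion, which avoids deep call stacks on large programs. The sample places the data word 1234H at 120H; its bytes 34H 12H would also decode as XOR AL,12H, which is exactly why a blind linear decode cannot tell code from data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Insn:
    """A pre-decoded instruction (hypothetical stand-in for a real x86 decoder)."""
    text: str
    length: int
    flow: str                       # "NORMAL", "GOTO", "CALL" or "EXIT"
    target: Optional[int] = None    # statically known branch target, if any
    data_ref: Optional[int] = None  # address of data the instruction accesses, if any


def find_code_and_data(decode: Dict[int, Insn], entry: int) -> Tuple[Set[int], Set[int]]:
    """Trace from the entry point using the four flow-control rules."""
    code: Set[int] = set()
    data: Set[int] = set()
    worklist: List[int] = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in code or addr not in decode:
            continue  # already visited, or outside the decoded image
        code.add(addr)
        insn = decode[addr]
        if insn.data_ref is not None:
            data.add(insn.data_ref)  # the operand points at data, not code
        if insn.flow in ("GOTO", "CALL") and insn.target is not None:
            worklist.append(insn.target)
        if insn.flow in ("NORMAL", "CALL"):
            worklist.append(addr + insn.length)
        # GOTO and EXIT add no fall-through successor.
    return code, data


# A tiny .COM-style program loaded at 100H.
program = {
    0x100: Insn("MOV AX,[0120H]", 3, "NORMAL", data_ref=0x120),
    0x103: Insn("JNE 0107H", 2, "CALL", target=0x107),
    0x105: Insn("INT 20H", 2, "EXIT"),
    0x107: Insn("ADD AX,1", 3, "NORMAL"),
    0x10A: Insn("INT 20H", 2, "EXIT"),
    0x120: Insn("XOR AL,12H", 2, "NORMAL"),  # really the data word 1234H
}

code, data = find_code_and_data(program, entry=0x100)
print("code:", [hex(a) for a in sorted(code)])
print("data:", [hex(a) for a in sorted(data)])
```

The bytes at 120H are never reached by the flow-control rules, so they are reported only as data referenced by the MOV at 100H.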

There are some places where this algorithm needs a little human intervention. One is an interrupt vector set by the program: this is a data access that determines a code location, and none of the 4 "flavors" of flow control described above can find it. Another is an indirect JMP, whose target is held in a register or in memory and is only known at run time. On an x86 processor, this would be an instruction like
JMP BX
or
JMP WORD PTR [SI+1234H]
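One way to fit that human intervention into the tracer, sketched in Python under stated assumptions: the tracer accepts extra root addresses ("hints") supplied by a person, and it reports any GOTO whose target is not known statically so that person knows where to look. The `trace` function, the `Decoded` layout, and the sample program are all hypothetical. The sample installs an interrupt 60H handler through the DOS set-vector call (INT 21H with AH=25H, DS:DX pointing at the handler), and the handler contains a JMP BX.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical pre-decoded program: address -> (text, length, flow, static target).
Decoded = Dict[int, Tuple[str, int, str, Optional[int]]]


def trace(decode: Decoded, roots: Iterable[int]) -> Tuple[Set[int], List[int]]:
    """Follow the four flow-control rules from every root address.

    Returns the code addresses found, plus the GOTO instructions whose
    target is not known statically, which a human has to resolve.
    """
    code: Set[int] = set()
    unresolved: Set[int] = set()
    worklist = list(roots)
    while worklist:
        addr = worklist.pop()
        if addr in code or addr not in decode:
            continue
        code.add(addr)
        _text, length, flow, target = decode[addr]
        if flow == "GOTO" and target is None:
            unresolved.add(addr)  # e.g. JMP BX: the target exists only at run time
        elif flow in ("GOTO", "CALL") and target is not None:
            worklist.append(target)
        if flow in ("NORMAL", "CALL"):
            worklist.append(addr + length)
    return code, sorted(unresolved)


program: Decoded = {
    0x100: ("MOV DX,010AH", 3, "NORMAL", None),  # DS:DX -> handler at 10AH
    0x103: ("MOV AX,2560H", 3, "NORMAL", None),  # AH=25H set vector, AL=60H
    0x106: ("INT 21H", 2, "CALL", None),         # DOS call; its handler is outside the program
    0x108: ("INT 20H", 2, "EXIT", None),
    0x10A: ("ADD AX,1", 3, "NORMAL", None),      # interrupt 60H handler
    0x10D: ("JMP BX", 2, "GOTO", None),          # indirect jump
    0x10F: ("INT 20H", 2, "EXIT", None),         # reached only through JMP BX
}

# Pass 1: entry point only. To the tracer, the handler address is just data in DX.
code1, todo1 = trace(program, [0x100])
# Pass 2: a person reads the MOV DX / INT 21H pair and adds the handler as a root.
code2, todo2 = trace(program, [0x100, 0x10A])
# Pass 3: suppose the person determines that JMP BX can only reach 10FH.
code3, todo3 = trace(program, [0x100, 0x10A, 0x10F])
```

Each pass finds more code, and the list of unresolved jumps tells the person which instructions still need attention; JMP BX stays on that list because a hint adds a root without changing the instruction itself.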

Monday, December 22, 2008