A compiler takes high-level source code as input and outputs assembly code, which is then assembled into binary (machine) code. Most assemblers come with a disassembler that can be used to convert binary code back to (human readable... sort of) assembly code.
With NASM, the disassembler is called ndisasm. Say you have a C program called cprogram.c as follows:
#include <stdio.h>
int main() {
    int i;
    int sum = 0x1234;
    sum += 0xABCDEF;
    for (i=0; i < 10; i++) {
        sum += i;
    }
    printf("sum=%d\n",sum);
    sum = 0x2345;
}
Here is a sequence of commands to look at the assembly code generated by the compiler (in a 32-bit world):
% gcc -m32 cprogram.c -o cprogram
% ndisasm -b 32 cprogram > cprogram.asm
The file cprogram.asm now contains the disassembled code. On my Linux box it has 3323 lines! Note that in the C code I have put some constants that are easy to spot once translated to assembly (such as 0x1234 and 0x2345). The relevant piece of assembly is:
000003ED C744241C34120000 mov dword [esp+0x1c],0x1234
000003F5 C744241800000000 mov dword [esp+0x18],0x0
000003FD EB0D jmp short 0x40c
000003FF 8B442418 mov eax,[esp+0x18]
00000403 0144241C add [esp+0x1c],eax
00000407 8344241801 add dword [esp+0x18],byte +0x1
0000040C 837C241809 cmp dword [esp+0x18],byte +0x9
00000411 7EEC jng 0x3ff
00000413 B810850408 mov eax,0x8048510
00000418 8B54241C mov edx,[esp+0x1c]
0000041C 89542404 mov [esp+0x4],edx
00000420 890424 mov [esp],eax
00000423 E8D8FEFFFF call dword 0x300
00000428 C744241C45230000 mov dword [esp+0x1c],0x2345
On each line the disassembler conveniently prints the address of the instruction (its offset in the file) followed by the binary code for the instruction, both in hex.
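To see how the address and byte columns relate to the operands in the listing, here is a small sketch in C that recomputes a few of them by hand. The function names (le32, disp8, rel_target) are made up for this illustration and are not part of NASM or ndisasm. The sketch assumes the usual x86 rules: immediates and displacements are stored little-endian, short jumps (opcodes EB and 7E) carry a signed 8-bit displacement, call (opcode E8) carries a signed 32-bit displacement, and displacements are relative to the address of the next instruction.

```c
#include <stdint.h>

/* Assemble a 32-bit little-endian value from four consecutive bytes,
   e.g. the bytes 34 12 00 00 at the end of C744241C34120000. */
uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Sign-extend the 8-bit displacement of a short jump. */
int32_t disp8(uint8_t byte)
{
    return (int32_t)(int8_t)byte;
}

/* Target of a relative branch: address of the next instruction
   (insn_addr + insn_len) plus the signed displacement.
   Arithmetic wraps modulo 2^32, as 32-bit addresses do. */
uint32_t rel_target(uint32_t insn_addr, uint32_t insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint32_t)disp;
}
```

For example, the line "000003FD EB0D" is 2 bytes long with displacement 0x0D, so rel_target(0x3FD, 2, disp8(0x0D)) gives 0x40C, which matches the "jmp short 0x40c" printed by ndisasm. Likewise, for "00000411 7EEC" the displacement 0xEC is negative (-20), and 0x411 + 2 - 20 gives 0x3FF, the top of the loop.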
It also prints the assembly code, but variables show up only as stack locations such as [esp+0x1c] and all addresses are in hex: we “lost” the high-level code information such as variable names. Still, it’s possible to reverse-engineer what the program does if you know assembly.
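As an illustration of that reverse-engineering, here is one possible hand translation of the excerpt back into C. It is only a sketch: the names reconstructed, slot_1c and slot_18 are invented (the binary has no names, just the stack locations [esp+0x1c] and [esp+0x18]), and I am assuming that the call to 0x300 is printf and that address 0x8048510 holds the format string "sum=%d\n". It mirrors only the instructions shown in the excerpt, so any statement of the C program whose instructions are not in the excerpt (such as sum += 0xABCDEF) is not reflected.

```c
#include <stdio.h>

/* Hypothetical reconstruction of the disassembled excerpt.
   Returns the value that was passed to printf. */
int reconstructed(void)
{
    int slot_1c = 0x1234;          /* 3ED: mov dword [esp+0x1c],0x1234   */
    int slot_18 = 0;               /* 3F5: mov dword [esp+0x18],0x0      */
    int printed;

    /* 3FD jumps straight to the test at 40C, so this is a
       test-at-the-bottom loop that behaves like a while loop. */
    while (slot_18 <= 9) {         /* 40C/411: cmp ...,0x9 ; jng 0x3ff   */
        slot_1c += slot_18;        /* 3FF/403: mov eax,... ; add ...,eax */
        slot_18 += 1;              /* 407: add dword [esp+0x18],0x1      */
    }

    printed = slot_1c;
    printf("sum=%d\n", slot_1c);   /* 413-423: args at [esp], [esp+0x4],
                                      then call (assumed to be printf)   */
    slot_1c = 0x2345;              /* 428: mov dword [esp+0x1c],0x2345   */
    (void)slot_1c;                 /* value is never read again          */
    return printed;
}
```

Comparing this with the original program, slot_1c plays the role of sum and slot_18 the role of i; the condition i < 10 came out as "not greater than 9", which is the same thing for integers.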