Basics for Binary Exploitation

2022-07-20

We all know how a C program gets compiled.

First, your C file goes to the compiler, which converts it into a sequence of operations that the computer will execute.
Each operation is encoded as a sequence of bytes called an operation code, or opcode.

Why Assembly?

Reading the raw bytes that our computer executes is impractical for a human. Assembly is a language designed to express those instructions in human-readable form. To understand what happens when an executable runs, you must first understand the assembly of that executable.

Basic components of a C program

Heap
Stack
Registers
Instructions

There are two main architectures you will meet in assembly:

x86 (32-bit) (covered in this blog)
x64 (64-bit)

1. Heap

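When functions like malloc() or calloc() are called, the memory they return comes from the HEAP. In other words, the heap is where manual (dynamic) memory allocation lives. Global and static variables are not stored on the heap; they live in the program's data segments (.data and .bss).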
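To make the difference concrete, here is a minimal sketch in C. The names make_filled_array, next_id and global_counter are made up for this illustration; only malloc() and free() are real library calls.

```c
#include <assert.h>
#include <stdlib.h>

int global_counter = 7;   /* global: lives in the data segment, not on the heap */

/* Hypothetical helper: returns a heap-allocated array of n ints, all set to value.
   The caller owns the memory and must free() it. */
int *make_filled_array(size_t n, int value)
{
    int *arr = malloc(n * sizeof *arr);   /* manual allocation: comes from the heap */
    assert(arr != NULL);                  /* sketch only; real code should handle failure */
    for (size_t i = 0; i < n; i++) {
        arr[i] = value;
    }
    return arr;                           /* still valid after this function returns */
}

/* Static local: keeps its value between calls and also lives in the data segment. */
int next_id(void)
{
    static int id = 0;
    id++;
    return id;
}
```

Calling make_filled_array(3, 9) gives back a pointer that stays usable until you pass it to free(), while an ordinary local variable disappears together with its stack frame.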

2. Registers

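Registers are small storage areas inside the CPU. They are used to store values such as addresses or variables. On x86 (32-bit) each register is 4 bytes wide, so it can hold anything that fits in 4 bytes or less (on x64 the registers are 8 bytes wide). There are 6 registers commonly used for general-purpose work: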

eax (Primary accumulator)
ebx (Base Register)
ecx (Counter Register)
edx (Data Register)
esi (Source Index)
edi (Destination Index)

There are 3 registers reserved for specific purposes:

ebp (Base Pointer)
esp (Stack Pointer)
eip (Instruction Pointer)

3. Stack

The stack is a data structure whose elements can be added (push) or removed (pop):

push adds an element on TOP of the stack
pop removes an element from the TOP of the stack

Each stack element has its own address. The stack grows towards lower memory addresses, which means it goes from high address values to low address values. Whenever a function is called, it gets its own stack frame. For the current stack frame, esp (stack pointer) points to the TOP of the stack and ebp (base pointer) points to the BASE of the frame. Anything below esp (at lower addresses) is outside the stack and is treated as JUNK.

Let's understand the stack frame through code and its execution process.

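#include <stdio.h>

void fun(int x)
{
   int a = 0;   /* local variable, stored in fun's stack frame */
   int b = x;   /* copy of the argument, also a local */
}

int main(void)
{
   fun(10);
   return 0;
}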

First of all, the value of the argument, i.e. 10, is pushed onto the stack by the caller (main)
Then the return address is pushed, so the CPU knows where to continue in main after fun finishes
Then, in a typical unoptimized build, fun saves the old ebp and memory is allocated for the variable a (assume 4 bytes) and the variable b
Now the value of variable x is not stored directly into variable b. The value of x is first loaded into a general-purpose register such as eax, and then the value in that register is stored into b

[Figure: the stack frame of fun, with numbers in red denoting the steps above]
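The steps above can be replayed with a small simulation. This is only a sketch under assumptions: the addresses (STACK_TOP), the Machine struct and the helper names are invented for illustration, and the layout is the one an unoptimized 32-bit build typically produces, not something a compiler is required to do.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16
#define STACK_TOP   0x1040u   /* hypothetical address just past the highest stack word */

typedef struct {
    uint32_t esp, ebp;
    uint32_t mem[STACK_WORDS];   /* covers addresses STACK_TOP - 64 up to STACK_TOP - 1 */
} Machine;

/* Maps a simulated address to its slot in mem. */
static uint32_t *word_at(Machine *m, uint32_t addr)
{
    assert(addr % 4 == 0);
    assert(addr < STACK_TOP && addr >= STACK_TOP - 4u * STACK_WORDS);
    return &m->mem[(addr - (STACK_TOP - 4u * STACK_WORDS)) / 4];
}

static void push(Machine *m, uint32_t value)
{
    m->esp -= 4;                      /* the stack grows towards lower addresses */
    *word_at(m, m->esp) = value;
}

uint32_t read_word(Machine *m, uint32_t addr) { return *word_at(m, addr); }

/* Builds the frame for fun(10) step by step. */
void build_fun_frame(Machine *m, uint32_t return_addr)
{
    m->esp = STACK_TOP;
    m->ebp = STACK_TOP;                       /* stand-in for main's base pointer */
    push(m, 10);                              /* 1. the argument x */
    push(m, return_addr);                     /* 2. the return address */
    push(m, m->ebp);                          /* 3. fun saves the caller's ebp ... */
    m->ebp = m->esp;                          /*    ... and starts its own frame */
    m->esp -= 8;                              /* 4. room for the locals a and b */
    *word_at(m, m->ebp - 4) = 0;              /* a = 0 */
    uint32_t eax = *word_at(m, m->ebp + 8);   /* x goes through a register first */
    *word_at(m, m->ebp - 8) = eax;            /* b = x */
}
```

After build_fun_frame runs, x sits at ebp+8, the return address at ebp+4, a at ebp-4 and b at ebp-8, which is the usual way arguments and locals are addressed relative to ebp.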

x86 Assembly

This section covers how your computer executes the code. There are 2 syntaxes for x86 assembly:

AT&T
Intel

We will use Intel syntax in this section.

Instructions

Every assembly instruction has two parts: an operation followed by its operands. E.g.

mov eax,0x5

1. mov

The mov instruction takes 2 arguments

mov arg1,arg2

This instruction copies the value of argument 2 into argument 1

E.g.

mov eax,0x5

This is similar to eax = 5.

Note: to copy the value of one register into another, just write the register names. Square brackets [] mean a memory access: whatever is inside the brackets is used as an address, and the instruction reads the memory at that address. Registers themselves do not have addresses.

E.g. (1)

consider eax = 0, ebx = 5

mov eax,ebx

The above instruction copies the value of ebx, i.e. 5, into eax. Now eax will be eax = 5

E.g. (2)

consider eax = 0, ebx = 0x1793, and the memory at address 0x1793 holds the value 5

mov eax,[ebx]

The above instruction treats the value of ebx as an address and loads the value stored at that address, i.e. 5, into eax. Now eax will be eax = 5
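The difference between a plain register operand and a bracketed one can be sketched in C with a toy CPU. Everything here (the Cpu struct, MEM_BASE, load32, store32) is a made-up model for illustration, not a real API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_BASE 0x1790u   /* hypothetical start address of the toy memory */
#define MEM_SIZE 16u       /* size of the toy memory in bytes */

typedef struct {
    uint32_t eax, ebx;
    uint8_t  mem[MEM_SIZE];
} Cpu;

/* Reads the 4 bytes stored at addr (in the host's byte order). */
uint32_t load32(const Cpu *c, uint32_t addr)
{
    uint32_t v;
    assert(addr >= MEM_BASE && addr + 4 <= MEM_BASE + MEM_SIZE);
    memcpy(&v, &c->mem[addr - MEM_BASE], sizeof v);
    return v;
}

/* Writes 4 bytes at addr (in the host's byte order). */
void store32(Cpu *c, uint32_t addr, uint32_t v)
{
    assert(addr >= MEM_BASE && addr + 4 <= MEM_BASE + MEM_SIZE);
    memcpy(&c->mem[addr - MEM_BASE], &v, sizeof v);
}

/* mov eax,ebx : copy the VALUE held in ebx */
void mov_eax_ebx(Cpu *c) { c->eax = c->ebx; }

/* mov eax,[ebx] : use ebx as an ADDRESS and load what is stored there */
void mov_eax_mem_ebx(Cpu *c) { c->eax = load32(c, c->ebx); }
```

In C terms, mov eax,ebx behaves like eax = ebx, while mov eax,[ebx] behaves like eax = *ebx with ebx acting as a pointer.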

2. add

The add instruction takes 2 arguments

add arg1,arg2

This instruction adds the value of arg2 to arg1 and stores the result in arg1

E.g. (1)

consider eax = 2

add eax,0x5

After the above instruction is executed, the value of eax will be eax = eax + 5, i.e. eax = 2 + 5, so eax = 7

E.g. (2)

consider eax = 4 ebx = 10

add ebx,eax

After the above instruction is executed, the value of ebx will be ebx = ebx + eax, i.e. ebx = 10 + 4, so ebx = 14

3. sub

The sub instruction takes 2 arguments. It works similarly to add

sub arg1,arg2

This instruction subtracts the value of arg2 from arg1 and stores the result in arg1

E.g. (1)

consider eax = 11

sub eax,0x5

After the above instruction is executed, the value of eax will be eax = eax - 5, i.e. eax = 11 - 5, so eax = 6

E.g. (2)

consider eax = 4 ebx = 10

sub ebx,eax

After the above instruction is executed, the value of ebx will be ebx = ebx - eax, i.e. ebx = 10 - 4, so ebx = 6
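The add and sub examples can be replayed in C. This is a minimal sketch: add32, sub32 and replay_examples are names invented here, and uint32_t stands in for a 32-bit register, so results wrap around at 32 bits just like the real thing.

```c
#include <assert.h>
#include <stdint.h>

/* add arg1,arg2 : arg1 = arg1 + arg2, wrapping around at 32 bits */
void add32(uint32_t *arg1, uint32_t arg2) { *arg1 += arg2; }

/* sub arg1,arg2 : arg1 = arg1 - arg2, also wrapping at 32 bits */
void sub32(uint32_t *arg1, uint32_t arg2) { *arg1 -= arg2; }

/* Replays the worked examples from the add and sub sections. */
void replay_examples(void)
{
    uint32_t eax = 2, ebx = 10;

    add32(&eax, 0x5);          /* add eax,0x5 */
    assert(eax == 7);

    eax = 4;
    add32(&ebx, eax);          /* add ebx,eax */
    assert(ebx == 14);

    eax = 11;
    sub32(&eax, 0x5);          /* sub eax,0x5 */
    assert(eax == 6);

    eax = 4;
    ebx = 10;
    sub32(&ebx, eax);          /* sub ebx,eax */
    assert(ebx == 6);

    eax = 0;
    sub32(&eax, 1);            /* 0 - 1 wraps to the largest 32-bit value */
    assert(eax == 0xFFFFFFFFu);
}
```

The last case is worth remembering for exploitation: a register never goes "negative", it wraps around, and it is up to later instructions to interpret the bits as signed or unsigned.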

4. push / pop

The push instruction takes 1 argument

push arg

This instruction pushes arg onto the TOP of the stack

E.g.

consider eax = 3

push eax

When push is given an argument, it first decrements esp (stack pointer) by 4 bytes and then stores the value, here 3, at the new TOP of the stack.
Note: what does decrementing esp mean? The stack addresses go from high to low, so the TOP of the stack moves to a lower address every time something is pushed

E.g.

if the stack frame runs from 0x1735 down to 0x1720, then esp starts at 0x1735 and can go down to 0x1720

The pop instruction takes a register as its argument

pop reg

This instruction stores the value of the element on TOP of the stack into reg, then removes (pops) that TOP element from the stack

E.g.

consider the top element on the stack is 3

pop eax

pop first copies the value from the top of the stack into the register given as the argument, so eax = 3
then pop increases esp (stack pointer) by 4 bytes
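Here is a small C sketch of how push and pop move esp. The Stack struct and the addresses STACK_HIGH and STACK_LOW are assumptions chosen for this example (aligned to 4 bytes to keep the arithmetic simple); they are not the exact addresses used above.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_HIGH 0x1740u   /* hypothetical: one past the highest stack byte */
#define STACK_LOW  0x1720u   /* hypothetical: lowest usable stack address */
#define WORDS      ((STACK_HIGH - STACK_LOW) / 4)

typedef struct {
    uint32_t esp;
    uint32_t mem[WORDS];
} Stack;

/* An empty stack has esp at the highest address. */
void stack_init(Stack *s) { s->esp = STACK_HIGH; }

/* push arg : decrement esp by 4, then store the value at the new top */
void push32(Stack *s, uint32_t value)
{
    assert(s->esp >= STACK_LOW + 4);   /* refuse to overflow the toy stack */
    s->esp -= 4;
    s->mem[(s->esp - STACK_LOW) / 4] = value;
}

/* pop reg : read the value at the top, then increment esp by 4 */
uint32_t pop32(Stack *s)
{
    assert(s->esp < STACK_HIGH);       /* nothing to pop from an empty stack */
    uint32_t value = s->mem[(s->esp - STACK_LOW) / 4];
    s->esp += 4;
    return value;
}
```

Pushing 3 and then 8 moves esp from 0x1740 to 0x173C and then 0x1738; popping returns 8 first and 3 second, and esp climbs back to 0x1740. Note that pop does not erase the old value in memory, it only moves esp past it.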

5. lea

lea stands for load effective address. This instruction takes a register and a memory operand (an address written in square brackets) as arguments

lea reg,[addr]

This instruction computes the address written inside the brackets and stores that address, not the value stored there, into the given register

E.g.

lea eax,[0x1739]

The address, i.e. 0x1739, gets stored in the eax register. Now eax will be eax = 0x1739
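A rough C analogy for lea is the address-of operator: it gives you where something is, without reading it. The sketch below uses made-up helper names; the base + index*scale + disp form is how x86 addresses are built in general, which goes slightly beyond the single-address example above.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of the form base + index*scale + disp,
   as in lea eax,[ebx+ecx*4+8]. Pure arithmetic: no memory is read. */
uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + disp;
}

/* C-level analogy: taking an address (like lea) ... */
int *address_of_element(int *arr, int i) { return &arr[i]; }

/* ... versus loading the value stored there (like mov with brackets). */
int value_of_element(const int *arr, int i) { return arr[i]; }
```

Because lea only does arithmetic, compilers also use it as a cheap way to add and multiply, so seeing lea in a disassembly does not always mean a pointer is involved.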

Control flow of Executable

All if statements and loops are built from the instructions in this section
Every instruction has its own address
eip (instruction pointer) holds the address of the instruction to execute; after each instruction it moves on to the next one unless a jump changes it

1. cmp

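cmp is a compare instruction which takes two arguments. It works the same as sub, but rather than storing the result it sets flags in the flags register (EFLAGS). In this post we summarize those flags as <0, >0 or =0. The examples below use plain numbers for readability; in real code the first argument is a register or a memory location.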

cmp arg1,arg2

E.g. (1)

cmp 1,3

After this instruction executes, 1-3 = -2, so the flags register records <0.

E.g. (2)

cmp 5,2

Here 5-2 = 3, so the flags register records >0.

E.g. (3)

cmp 5,5

Here 5-5 = 0, so the flags register records =0.

Note: the cmp (compare) instruction is usually followed by a conditional jump instruction
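Under the hood the "<0 / >0 / =0" summary comes from individual flag bits. The following C sketch models four of them; the Flags struct and function names are invented for illustration, while the flag meanings (zero, sign, overflow, carry) are the standard x86 ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool zf;   /* zero flag: the result was 0 */
    bool sf;   /* sign flag: the top bit of the result is set */
    bool of;   /* overflow flag: the signed subtraction overflowed */
    bool cf;   /* carry flag: an unsigned borrow happened */
} Flags;

/* cmp a,b : compute a - b, keep only the flags, throw the result away */
Flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    Flags f;
    f.zf = (r == 0);
    f.sf = (r >> 31) != 0;
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;
    f.cf = a < b;
    return f;
}

/* The simplified view used in this post, for a signed comparison:
   -1 means "<0", 0 means "=0", +1 means ">0". */
int cmp_summary(uint32_t a, uint32_t b)
{
    Flags f = cmp32(a, b);
    if (f.zf) {
        return 0;
    }
    return (f.sf != f.of) ? -1 : 1;
}
```

With this model cmp_summary(1, 3) gives -1, cmp_summary(5, 2) gives 1 and cmp_summary(5, 5) gives 0, matching the three examples.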

2. jump

The jump instructions take an address as an argument

jmp addr

jmp itself always jumps to the address (jumping basically means changing the value of eip (instruction pointer) to the given address). The conditional jump instructions first check the current state of the flags and jump only if their condition holds.

Types of conditional jump instruction:

jne -> jump if not equal
je -> jump if equal
jg -> jump if greater than
jl -> jump if less than

E.g. (1)

cmp 1,3
jl addr27

When the above instructions are executed, eip reaches jl addr27 (as shown in the original figure)
At this point the flags are checked, and they say < 0
The condition is true because the instruction is jump if less than, so eip goes to Addr 27 and executes instruction 27

E.g. (2)

cmp 5,2
jl addr27

When the above instructions are executed, eip reaches jl addr27 (as shown in the original figure)
At this point the flags are checked, and they say > 0
The condition is false because the instruction is jump if less than, so eip simply continues to the next instruction, Addr 4, and executes instruction 4
If the instruction were jg rather than jl, the condition would be true and eip would go to Addr 27 and execute instruction 27
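The two examples boil down to one decision: does eip take the jump target or fall through to the next instruction? Here is a minimal C sketch of that decision. The Jump enum and function names are hypothetical, and the comparison is done directly on signed values instead of modelling the flag bits.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { JE, JNE, JG, JL } Jump;

/* Decides whether a conditional jump is taken after cmp a,b (signed comparison). */
int jump_taken(Jump kind, int32_t a, int32_t b)
{
    switch (kind) {
    case JE:  return a == b;
    case JNE: return a != b;
    case JG:  return a > b;
    case JL:  return a < b;
    }
    assert(0 && "unknown jump kind");
    return 0;
}

/* Returns the next value of eip: the target if the jump is taken,
   otherwise the address of the instruction right after the jump. */
uint32_t next_eip(Jump kind, int32_t a, int32_t b,
                  uint32_t fallthrough, uint32_t target)
{
    return jump_taken(kind, a, b) ? target : fallthrough;
}
```

For instance, next_eip(JL, 1, 3, 4, 27) yields 27 and next_eip(JL, 5, 2, 4, 27) yields 4. This is also how a C if statement is compiled: a cmp followed by a conditional jump that skips the body when the condition is false.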

3. call

The call instruction calls a function, whether it is user-defined or built in. call takes a function as its argument

call <func>

When the call instruction is executed, it pushes the return address (the address of the instruction right after the call) onto the stack and jumps to the first instruction of the function

4. leave / return

The leave instruction is usually followed by the ret (return) instruction

leave
- the leave instruction sets esp (stack pointer) to ebp (base pointer), then pops the caller's saved ebp back into ebp
- this means leave destroys the current stack frame
ret
- now the return address is on top of the stack
- ret pops that return address from the stack and sets eip (instruction pointer) to it
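To see call, leave and ret working together, here is a toy simulation in C. It is a sketch under assumptions: the Cpu struct, the addresses and the do_* helper names are all made up, and the prologue (push ebp, mov ebp,esp, sub esp,N) is the typical one for unoptimized 32-bit code.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_HIGH 0x2000u   /* hypothetical: one past the highest stack byte */
#define WORDS      32u

typedef struct {
    uint32_t eip, esp, ebp;
    uint32_t mem[WORDS];
} Cpu;

/* Maps a simulated stack address to its slot in mem. */
static uint32_t *slot(Cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr < STACK_HIGH && addr >= STACK_HIGH - 4 * WORDS);
    return &c->mem[(addr - (STACK_HIGH - 4 * WORDS)) / 4];
}

static void push(Cpu *c, uint32_t v) { c->esp -= 4; *slot(c, c->esp) = v; }

static uint32_t pop(Cpu *c)
{
    uint32_t v = *slot(c, c->esp);
    c->esp += 4;
    return v;
}

/* call func : push the address of the NEXT instruction, then jump to func */
void do_call(Cpu *c, uint32_t func, uint32_t next_instr)
{
    push(c, next_instr);
    c->eip = func;
}

/* Typical function prologue: push ebp ; mov ebp,esp ; sub esp,local_bytes */
void do_prologue(Cpu *c, uint32_t local_bytes)
{
    push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= local_bytes;
}

/* leave : mov esp,ebp ; pop ebp */
void do_leave(Cpu *c)
{
    c->esp = c->ebp;
    c->ebp = pop(c);
}

/* ret : pop the return address into eip */
void do_ret(Cpu *c) { c->eip = pop(c); }
```

Running do_call, do_prologue, do_leave and do_ret in that order brings esp and ebp back to their starting values and leaves eip at the instruction after the call. The return address sitting on the stack between those steps is exactly what stack-based exploits try to overwrite.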

Thank you, guys, for reading my blog post on Binary Exploitation Basics!

Resource

x86 Assembly Crash Course, this blog is based on this video
Try this for Visualization of Stack and Heap By running C code