assembly-crash-course
http://www.electronics-tutorials.ws/logic/logic_1.html
All roads lead to the CPU
[Figure: source code (Python, JavaScript, C, C++, Rust, Java) becomes binary-encoded instructions, which run on the CPU]
Too deep! (Logic gate diagrams: http://www.electronics-tutorials.ws/category/combination)
Computer Architecture, from very high level, drilling down, further down, to as far as we'll go
[Figure series; labels include CPU, ALU, Cache, and Network + Others]
#
All Roads Lead to the CPU
[Figure: every language eventually reaches the CPU as binary (010101)]
C, C++, Rust: Traditional Compiler
C#, Java: JIT Compiler
Python, Ruby, Lua, JavaScript: Bytecode Compiler, then Bytecode Interpreter
Bash, Perl: Interpreter
#
Speaking Binary
Humans have a hard time with binary code... 01010101
So we created a text representation of the binary: that byte is written as "push rbp".
Instructions work on data: data we directly give it as part of the instruction, data that is close at hand (in registers), and data in storage (in memory).
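To make the "text representation" idea concrete, here is a tiny Python lookup that turns a few single-byte x86-64 opcodes back into assembly text. It is a sketch under stated assumptions: the table only covers push/pop of the eight original 64-bit registers (0x50+r and 0x58+r) plus ret and int3; a real disassembler handles far more.

```python
# Register order used by the one-byte push/pop encodings (0x50+r, 0x58+r).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def disassemble_byte(b):
    """Translate one single-byte instruction into Intel-syntax text (tiny subset only)."""
    if 0x50 <= b <= 0x57:
        return f"push {REGS[b - 0x50]}"
    if 0x58 <= b <= 0x5F:
        return f"pop {REGS[b - 0x58]}"
    if b == 0xC3:
        return "ret"
    if b == 0xCC:
        return "int3"
    raise ValueError(f"opcode {b:#04x} is not in this tiny table")

print(f"{0b01010101:08b} -> {disassemble_byte(0b01010101)}")  # 01010101 -> push rbp
```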
#
Verbs / Operations
What might you want to tell the computer to do with data?
Some ideas:
add some data together
subtract some data
multiply some data
divide some data
move some data into or out of storage
compare two pieces of data with each other
test some other properties of data
The list goes on! Regardless of dialect, an assembly instruction looks like one of:
OPERATION
OPERATION OPERAND
OPERATION OPERAND OPERAND
OPERATION OPERAND OPERAND OPERAND
#
Dialects of Assembly
In the beginning (of x86), Intel created:
... the Intel 8085 CPU
... then the Intel 8086 CPU
... then the Intel 80186 CPU
... then the Intel 80286 CPU
... then the Intel 80386 CPU, which became modern x86
... and gave us a great Assembly dialect for all of them!
AT&T came along and created a (subjectively) TERRIBLE Assembly syntax for
x86.
Why? No one knows.
tl;dr: there are two competing Assembly syntaxes for x86:
the right one (Intel) and the VERY WRONG one (AT&T).
Use Intel x86 syntax. They literally made the architecture.
#
Binary?
Described mathematically by Thomas Harriot, Juan Caramuel y Lobkowitz, and/or Leibniz sometime in the 16th and 17th centuries.
But also known earlier: https://en.wikipedia.org/wiki/Binary_code
Decimal (base 10) has digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
Binary (base 2) has digits 0, 1.
A binary digit is called a bit.
Decimal  Binary
0        0
1        1
2        10
3        11
4        100
5        101
6        110
7        111
8        1000
9        1001
10       1010
11       1011
12       1100
13       1101
14       1110
15       1111
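The table above can be generated by hand with repeated division by 2: each remainder is the next bit, least significant first. A minimal Python sketch (the function name to_binary is our own, for illustration), checked against Python's built-in formatting:

```python
def to_binary(n):
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder = next bit, least significant first
        n //= 2
    return "".join(reversed(digits))  # most significant bit goes on the left

for n in range(16):
    print(n, to_binary(n))
```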
But if we use a base 2^X, each digit represents X binary digits at once! Common bases:
Octal (base 2^3, or 8), commonly prefixed with 0.
Hexadecimal (base 2^4, or 16), commonly prefixed with 0x. Caveat: how do we represent digits of 10 or more? A, B, C, D, E, and F!
Decimal  Binary    Octal  Hex
0        0         00     0x0
...
17       10001     021    0x11
18       10010     022    0x12
19       10011     023    0x13
20       10100     024    0x14
128      10000000  0200   0x80
192      11000000  0300   0xc0
224      11100000  0340   0xe0
240      11110000  0360   0xf0
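Why do octal and hex line up so neatly with binary? Splitting the bits into groups of 3 (octal) or 4 (hex) gives exactly one digit per group. A small Python sketch, using 224 from the table (group_bits is an illustrative helper, not a standard function):

```python
def group_bits(n, group):
    """Split n's binary digits into chunks of `group` bits, zero-padding on the left."""
    bits = format(n, "b")
    width = -(-len(bits) // group) * group  # round the length up to a multiple of group
    bits = bits.zfill(width)
    return [bits[i:i + group] for i in range(0, width, group)]

n = 224
print(group_bits(n, 3), oct(n))  # ['011', '100', '000'] 0o340: one octal digit per 3 bits
print(group_bits(n, 4), hex(n))  # ['1110', '0000'] 0xe0: one hex digit per 4 bits
```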
#
Expressing Text
Bits in a computer typically do something useful.
Examples: encoding assembly instructions, whole programs, images, text...
The 8-bit byte:
IBM invented 8-bit EBCDIC in 1963 for use on their terminals.
ASCII (also released in 1963!) replaced it, but the 8-bit byte stuck.
Every modern architecture uses 8-bit bytes.
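As a quick illustration of text stored as bytes, the Python sketch below encodes a short string as ASCII. ASCII is a 7-bit code, so each character fits in one 8-bit byte with the top bit clear.

```python
text = "Hi!"
data = text.encode("ascii")  # one 8-bit byte per character

for ch, byte in zip(text, data):
    print(ch, hex(byte), format(byte, "08b"))
# H 0x48 01001000
# i 0x69 01101001
# ! 0x21 00100001
```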
#
Grouping Bytes into Words
Bytes are 8-bit, but modern architectures are (mostly) 64-bit...
Words are groupings of 8-bit bytes.
Architectures define the word width.
For historical reasons, the terminology is really messed up.
Nibble: half of a byte, 4 bits
Byte: 1 byte, 8 bits
Half word / "word": 2 bytes, 16 bits
Double word (dword): 4 bytes, 32 bits
Quad word (qword): 8 bytes, 64 bits
Note that the term word on a 64-bit architecture can refer to either 16 or 64 bits!
Be precise.
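One way to stay precise is to name sizes explicitly. Python's struct module happens to use the same byte counts as the x86 names when given the "<" prefix (standard sizes, no platform padding); the mapping of struct format letters to x86 names below is our own illustration.

```python
import struct

# "<" selects standard sizes, so these do not depend on the platform.
SIZES = {
    "byte": struct.calcsize("<B"),
    "word": struct.calcsize("<H"),   # x86 "word" = 16 bits
    "dword": struct.calcsize("<I"),
    "qword": struct.calcsize("<Q"),
}
for name, size in SIZES.items():
    print(f"{name:5}: {size} bytes, {size * 8} bits")
```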
#
Expressing Numbers
A 64-bit machine can reason about 64 bits at a time.
Caveat: in practice, even more. Modern x86 can use specialized hardware to crunch data 512 bits (64 bytes) at a time!
Negative numbers are stored in two's complement. Bonus: the sign bit (the most significant bit) is still there, so testing for negative numbers is easy!
Note: the most negative expressible number (for 8 bits) is 0b10000000 = -128.
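A minimal Python sketch of 8-bit two's complement, assuming we only care about converting between a signed value and its raw bit pattern (the helper names are illustrative):

```python
def to_twos_complement(value, bits=8):
    """Return the unsigned bit pattern that encodes `value` in `bits`-bit two's complement."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range")
    return value & ((1 << bits) - 1)

def from_twos_complement(pattern, bits=8):
    """Interpret a `bits`-bit pattern as a signed number; the top bit is the sign bit."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern

print(format(to_twos_complement(-1), "08b"))    # 11111111
print(format(to_twos_complement(-128), "08b"))  # 10000000
print(from_twos_complement(0b10000000))         # -128
```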
#
Anatomy of a Word
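Consider 0xc001c475:
11000000 00000001 11000100 01110101
On the left are the "most significant" bits/bytes: the first bit (also the sign bit, for signed numbers) and the first byte (0xc0).
On the right are the "least significant" bits/bytes: the last bit and the last byte (0x75).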
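The same breakdown can be computed with shifts and masks. A short Python sketch, treating 0xc001c475 as a 32-bit value:

```python
value = 0xC001C475

msb = (value >> 31) & 1  # most significant bit (the sign bit if read as signed)
lsb = value & 1          # least significant bit
bytes_msb_first = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]

print(msb, lsb)  # 1 1
print(" ".join(f"{b:08b}" for b in bytes_msb_first))
# 11000000 00000001 11000100 01110101
```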
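[Figure: registers such as rdx, and the partial registers of rax: eax (lower 32 bits), ax (lower 16 bits), and ah/al (upper and lower bytes of ax)]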
Data specified directly in the instruction (like the 0x5 in mov ah, 0x5 below) is called an Immediate Value.
You can also load data into partial registers:
mov ah, 0x5
mov al, 0x39
32-bit CAVEAT!
If you write to a 32-bit partial (e.g., eax), the CPU will zero out the rest of the register!
This was done for (believe it or not) performance reasons.
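To see how partial writes combine, here is a toy Python model of rax and its partials. It is a simplification we made up for illustration (it models only rax, and only the write behavior described above): 8- and 16-bit writes leave the other bits alone, while a 32-bit write zeroes the upper half.

```python
MASK64 = (1 << 64) - 1

class Rax:
    """Toy model of rax and its partial registers (eax, ax, ah, al)."""
    def __init__(self, value=0):
        self.value = value & MASK64

    def write_al(self, v):
        # 8-bit write: only bits 0-7 change
        self.value = (self.value & ~0xFF & MASK64) | (v & 0xFF)

    def write_ah(self, v):
        # 8-bit write: only bits 8-15 change
        self.value = (self.value & ~0xFF00 & MASK64) | ((v & 0xFF) << 8)

    def write_ax(self, v):
        # 16-bit write: only bits 0-15 change
        self.value = (self.value & ~0xFFFF & MASK64) | (v & 0xFFFF)

    def write_eax(self, v):
        # 32-bit write: the upper 32 bits are zeroed
        self.value = v & 0xFFFFFFFF

    @property
    def ax(self):
        return self.value & 0xFFFF

r = Rax(MASK64)          # start with every bit set
r.write_ah(0x5)
r.write_al(0x39)
print(hex(r.value), r.ax)  # 0xffffffffffff0539 1337
r.write_eax(0x1337)
print(hex(r.value))        # 0x1337
```

Note that 0x0539 is 1337 in decimal, which is why the mov ah, 0x5 / mov al, 0x39 pair is a common demo.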
shr rax, 10   rax = rax >> 10                      shift rax's bits right by 10, filling the left with 10 zero bits
sar rax, 10   rax = rax >> 10 (signed)             shift rax's bits right by 10, filling the now "missing" bits with copies of the sign bit (sign extension)
ror rax, 10   rax = (rax >> 10) | (rax << 54)      rotate the bits of rax right by 10
rol rax, 10   rax = (rax << 10) | (rax >> 54)      rotate the bits of rax left by 10
Curious how these work? Play around with the rappel tool (https://github.com/yrp604/rappel)!
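If you want to check your intuition without an assembler, here is a Python sketch of the four operations on 64-bit values. It assumes shift counts between 0 and 63 and is not a model of flags or other CPU side effects.

```python
MASK64 = (1 << 64) - 1

def shr(x, n):
    """Logical shift right: zeroes fill in from the left."""
    return (x & MASK64) >> n

def sar(x, n):
    """Arithmetic shift right: copies of the sign bit fill in from the left."""
    x &= MASK64
    if x >> 63:          # negative when read as signed
        x -= 1 << 64     # convert to Python's negative int
    return (x >> n) & MASK64  # Python's >> on negative ints is arithmetic

def ror(x, n):
    """Rotate right: bits shifted out on the right re-enter on the left."""
    n %= 64
    x &= MASK64
    return ((x >> n) | (x << (64 - n))) & MASK64

def rol(x, n):
    """Rotate left: bits shifted out on the left re-enter on the right."""
    n %= 64
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64

x = 0x8000000000000000
print(hex(shr(x, 4)))  # 0x800000000000000
print(hex(sar(x, 4)))  # 0xf800000000000000
print(hex(ror(1, 1)))  # 0x8000000000000000
print(hex(rol(x, 1)))  # 0x1
```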
#
Some Registers are Special
rip contains the memory address of the next instruction to be executed (ip = Instruction Pointer).
You cannot directly read from or write to rip (for example, mov rax, rip is not a valid instruction).
Your process can ask for more memory from the Operating System (more on this later)!
[Figure: process memory, spanning addresses 0x10000 to 0x7fffffffffff]
[Figure: the stack, holding (top to bottom) c001ca75, b0bacafe, c001ca75]
Values can be popped back off of the stack (to any register!).
pop rbx # sets rbx to 0xc001ca75
pop rcx # sets rcx to 0xb0bacafe
[Figure: the stack afterwards, holding c001ca75]
#
Addressing the Stack
The CPU knows where the stack is because its address is stored in rsp.
rsp = 0x7f01f3453050
[Figure: the stack, with c001ca75 at 0x7f01f3453050]
push 0xb0bacafe
rsp = 0x7f01f3453048
[Figure: the stack, with b0bacafe at 0x7f01f3453048 followed by c001ca75 at 0x7f01f3453050]
pop rcx
rsp = 0x7f01f3453050
[Figure: the stack, with c001ca75 at 0x7f01f3453050]
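The rsp bookkeeping above can be modeled in a few lines of Python. This is a toy sketch: memory is a dictionary of 8-byte slots, and the starting rsp of 0x7f01f3453058 is our own assumption, chosen so that the first push lands at the slide's 0x7f01f3453050.

```python
class Stack:
    """Toy model of the x86-64 stack: rsp moves down by 8 on push, up by 8 on pop."""
    def __init__(self, rsp):
        self.rsp = rsp
        self.memory = {}  # address -> 64-bit value (one qword per slot)

    def push(self, value):
        self.rsp -= 8  # the stack grows toward lower addresses
        self.memory[self.rsp] = value & ((1 << 64) - 1)

    def pop(self):
        value = self.memory[self.rsp]
        self.rsp += 8
        return value

s = Stack(0x7F01F3453058)
s.push(0xC001CA75)  # rsp = 0x7f01f3453050
s.push(0xB0BACAFE)  # rsp = 0x7f01f3453048
rcx = s.pop()       # rcx = 0xb0bacafe, rsp back to 0x7f01f3453050
print(hex(rcx), hex(s.rsp))
```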
This will store the 64-bit value in rbx into memory at address 0x133337:
mov rax, 0x133337
mov [rax], rbx
Don't forget: changing a 32-bit partial (e.g., by loading into it from memory) zeroes out the upper 32 bits of the whole 64-bit register. Storing 32 bits to memory has no such problem, though.
#
Memory Endianness
Data on most modern systems is stored "backwards" in memory, least significant byte first: this is little endian.
Bytes are only shuffled for multi-byte stores and loads between registers and memory!
Individual bytes never have their bits shuffled.
Yes, writes to the stack behave just like any other write to memory.
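Python's struct module can show both byte orders side by side. The sketch below packs the qword 0xc001ca75 the way a little-endian store lays it out, and big-endian for comparison; only the order of whole bytes changes, never the bits inside a byte.

```python
import struct

value = 0xC001CA75
little = struct.pack("<Q", value)  # layout of an x86 qword store
big = struct.pack(">Q", value)     # big-endian, for comparison

print(little.hex(" "))  # 75 ca 01 c0 00 00 00 00
print(big.hex(" "))     # 00 00 00 00 c0 01 ca 75
print(hex(struct.unpack("<Q", little)[0]))  # a load undoes the shuffle: 0xc001ca75
```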
Why little endian?
Intel created the 8008 for a company called Datapoint in 1972.
Datapoint used little endian for easier implementation of carry in arithmetic!
Intel used little endian in 8008 for compatibility with Datapoint's processors!
Every step in the evolution between 8008 and modern x86 maintained some level of binary compatibility with its predecessor.
https://en.wikipedia.org/wiki/Endianness
#
Address Calculation
You can do some limited calculation for memory addresses.
Here, rax is used as an index from a base address (in this case, the stack pointer).
mov rax, 0
mov rbx, [rsp+rax*8] # read a qword right at the stack pointer
inc rax
mov rcx, [rsp+rax*8] # read the next qword (8 bytes higher in memory than the previous one)
You can get the calculated address with Load Effective Address (lea).
mov rax, 1
pop rcx
lea rbx, [rsp+rax*8+5] # rbx now holds the computed address for double-checking
mov rbx, [rbx]
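The address inside the brackets follows the pattern base + index*scale + displacement, where x86 only allows a scale of 1, 2, 4, or 8. A Python sketch of that arithmetic (the rsp value is hypothetical):

```python
def effective_address(base, index=0, scale=1, disp=0):
    """Compute base + index*scale + disp, wrapping to 64 bits like the CPU."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows scale factors of 1, 2, 4, or 8")
    return (base + index * scale + disp) & ((1 << 64) - 1)

rsp = 0x7F01F3453048  # hypothetical stack pointer value
rax = 1
print(hex(effective_address(rsp, rax, 8, 5)))  # what lea rbx, [rsp+rax*8+5] computes: 0x7f01f3453055
```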
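rip can also be the base of an address calculation, so you can use mov to read directly from locations relative to the instruction pointer!
mov rax, [rip] # load 8 bytes from the location pointed to by the address of the next instruction
This is useful for working with data embedded near your code!
This RIP-relative addressing is what makes certain security features on modern machines possible (e.g., position-independent code that can be loaded at a randomized address).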
#
Writing Immediate Values
You can also write immediate values. However, you must specify their size!
This writes a 32-bit 0x1337 (padded with 0 bits) to address 0x133337.
mov rax, 0x133337
mov DWORD PTR [rax], 0x1337
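What does DWORD PTR change? It fixes the store at exactly 4 bytes. A Python sketch, where a zeroed bytearray stands in for the memory at 0x133337 (an assumption for illustration):

```python
import struct

memory = bytearray(8)                      # pretend memory starting at 0x133337, all zero
struct.pack_into("<I", memory, 0, 0x1337)  # DWORD: exactly 4 bytes, little endian
print(memory.hex(" "))                     # 37 13 00 00 00 00 00 00
```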
Example: a program loaded at address 0x400800
Program          Binary Code
pop rax          58
pop rbx          5b
add rax, rbx     48 01 d8
push rax         50
In memory, the whole program is just the bytes 58 5b 48 01 d8 50.
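Each instruction starts right where the previous one ends, so instruction addresses follow from the encoding lengths. A Python sketch using the encodings listed for this example (the layout helper is our own):

```python
# Encodings from the example program, in program order.
PROGRAM = [
    ("pop rax", bytes([0x58])),
    ("pop rbx", bytes([0x5B])),
    ("add rax, rbx", bytes([0x48, 0x01, 0xD8])),
    ("push rax", bytes([0x50])),
]

def layout(program, load_address):
    """Return (address, text) pairs: each instruction starts where the previous one ended."""
    addr = load_address
    result = []
    for text, encoding in program:
        result.append((addr, text))
        addr += len(encoding)
    return result

code = b"".join(enc for _, enc in PROGRAM)
print(code.hex(" "))  # 58 5b 48 01 d8 50
for addr, text in layout(PROGRAM, 0x400800):
    print(hex(addr), text)
```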
#
Control Flow: Jumps
CPUs execute instructions in sequence until told not to.
One way to interrupt the sequence is with a jmp instruction:
mov cx, 0x1337
jmp STAY_LEET
mov cx, 0
STAY_LEET:
push rcx
In memory (STAY_LEET is at 0x40080a):
0x400800   66 b9 37 13   mov cx, 0x1337
0x400804   eb 04         jmp STAY_LEET (skip the next 4 bytes)
0x400806   66 b9 00 00   mov cx, 0
0x40080a   51            push rcx
A conditional jump has the same layout with a different opcode. Here, 75 04 (jnz STAY_LEET) replaces eb 04, so the 4 bytes are only skipped when the zero flag is not set:
66 b9 37 13 75 04 66 b9 00 00 51
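The 04 in eb 04 is measured from the end of the jump instruction, not its start. A Python sketch of computing that 8-bit displacement (rel8 is an illustrative helper name):

```python
def rel8(jump_address, jump_length, target):
    """Displacement byte for a short jump: target minus the address of the NEXT instruction."""
    offset = target - (jump_address + jump_length)
    if not -128 <= offset <= 127:
        raise ValueError("target too far for an 8-bit displacement")
    return offset & 0xFF  # stored as a signed (two's complement) byte

# jmp STAY_LEET sits at 0x400804, is 2 bytes long, and STAY_LEET is at 0x40080a
print(f"eb {rel8(0x400804, 2, 0x40080A):02x}")  # eb 04
```

Backward jumps simply produce a negative displacement, e.g. jumping from 0x400804 back to 0x400800 gives 0xfa (-6).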
Common patterns:
cmp rax, rbx; ja STAY_LEET # unsigned rax > rbx. 0xffffffffffffffff > 0
cmp rax, rbx; jle STAY_LEET # signed rax <= rbx. 0xffffffffffffffff = -1 < 0
test rax, rax; jnz STAY_LEET # rax != 0
cmp rax, rbx; je STAY_LEET # rax == rbx
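The same 64 bits compare differently depending on whether the jump treats them as signed or unsigned. A Python sketch of the first two patterns, with rax = 0xffffffffffffffff and rbx = 0:

```python
MASK64 = (1 << 64) - 1

def as_signed(x):
    """Read a 64-bit pattern as a two's-complement signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

rax, rbx = 0xFFFFFFFFFFFFFFFF, 0

ja_taken = (rax & MASK64) > (rbx & MASK64)    # ja compares unsigned
jle_taken = as_signed(rax) <= as_signed(rbx)  # jle compares signed (-1 <= 0)
print(ja_taken, jle_taken)  # True True
```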
With looping and conditional control flow, we have almost everything we need to
write anything we want!
#
Control Flow: Function Calls!
Assembly code is split into functions with call and ret.
call pushes rip (address of the next instruction after the call) and jumps away!
ret pops the return address off the stack into rip, jumping back to it!
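A toy Python model of call and ret, tracking only rip, rsp, and a dictionary-based stack. The addresses are hypothetical, and we assume a 5-byte call (opcode e8 plus a 4-byte offset).

```python
class CPU:
    """Toy model of call/ret: only rip, rsp, and a dict-based stack are tracked."""
    def __init__(self, rip, rsp):
        self.rip, self.rsp = rip, rsp
        self.stack = {}

    def call(self, target, call_length):
        return_address = self.rip + call_length  # address of the instruction after the call
        self.rsp -= 8
        self.stack[self.rsp] = return_address    # push the return address
        self.rip = target                        # jump away

    def ret(self):
        self.rip = self.stack[self.rsp]          # pop the return address into rip
        self.rsp += 8

cpu = CPU(rip=0x400800, rsp=0x7FFFFFFFE000)
cpu.call(target=0x401000, call_length=5)
print(hex(cpu.rip))  # 0x401000
cpu.ret()
print(hex(cpu.rip))  # 0x400805
```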
The read() system call returns the number of bytes read in rax, so we can easily write them back out:
write(1, buf, n);
mov rdi, 1 # the stdout file descriptor
mov rsi, rsp # write the data from the stack
mov rdx, rax # the number of bytes to write (same as what we read in)
mov rax, 1 # system call number of write()
syscall # do the system call
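The same read-then-write pattern can be tried from Python, whose os.read and os.write are thin wrappers over the read() and write() calls. To keep the sketch self-contained, two pipes stand in for the input and for stdout (an assumption; the assembly writes to fd 1).

```python
import os

in_r, in_w = os.pipe()
out_r, out_w = os.pipe()
os.write(in_w, b"hello\n")
os.close(in_w)

data = os.read(in_r, 100)             # like read(fd, buf, 100)
n = len(data)                         # the assembly gets this count back in rax
written = os.write(out_w, data[:n])   # like write(fd, buf, n)
os.close(out_w)

echoed = os.read(out_r, 100)
print(n, written, echoed)  # 6 6 b'hello\n'
for fd in (in_r, out_r):
    os.close(fd)
```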
Critical resource: https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
#
System Calls
System calls have very well-defined interfaces that very rarely change.
There are over 300 system calls in Linux. Here are some examples:
int open(const char *pathname, int flags) - returns a new file descriptor for the opened file (also shows up in /proc/self/fd!)
ssize_t read(int fd, void *buf, size_t count) - reads data from the file descriptor
ssize_t write(int fd, void *buf, size_t count) - writes data to the file descriptor
pid_t fork() - forks off an identical child process. Returns 0 if you're the child and the PID of the child if you're the parent.
int execve(const char *filename, char **argv, char **envp) - replaces your process with a new program.
pid_t wait(int *wstatus) - waits for a child to terminate, returns its PID, and writes its status into *wstatus.
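These interfaces also surface almost unchanged in higher-level languages. A Python sketch of open/write/read/close through the os module; the temporary directory and the file name "flag" are hypothetical, and O_CREAT requires the extra mode argument that the two-argument open() above omits.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flag")  # hypothetical file name
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)  # returns a new file descriptor
    os.write(fd, b"pwn\n")            # write(fd, buf, count)
    os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    contents = os.read(fd, 100)       # read(fd, buf, count)
    os.close(fd)

print(contents)  # b'pwn\n'
```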
Look familiar?
Goodbye, world!
#
From Assembly to Binary
We built a quitter... Now we have to put it in an Assembly file:
# .intel_syntax tells the assembler that we are using Intel assembly syntax
# noprefix tells it that we will not prefix all register names with "%" (because that looks silly)
.intel_syntax noprefix
mov rdi, 42 # our program's return code (e.g., for bash scripts)
mov rax, 60 # system call number of exit()
syscall # do the system call
If the linker (ld) warns that it cannot find the entry symbol _start, add this to the beginning of the program so that it doesn't have to guess where your code starts:
.global _start
_start:
# then the rest of your code!
https://en.wikipedia.org/wiki/Software_bug
#
Debugging
Debugging is done with debuggers, such as gdb.
Debuggers use (among other methods) a special debug instruction:
mov rdi, 42 # our program's return code (e.g., for bash scripts)
mov rax, 60 # system call number of exit()
int3 # trigger the debugger with a breakpoint!
syscall # do the system call
strace lets you figure out how your program is interacting with the OS.
A great first stop for debugging.
Documentation of x86:
Opcode listing by byte value: http://ref.x86asm.net/coder64.html
Instruction documentation: https://www.felixcloutier.com/x86/
Intel's x86_64 architecture manual: https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf