Analysis of Virtualization-based Obfuscation
Tim Blazytko
@mr_phrazer
https://synthesis.to
Personal Details
binary security researcher, co-founder of emproof GmbH and former PhD student
Today
• manual analysis
Slides, Code and Samples
https://github.com/mrphrazer/r2con2021_deobfuscation
Virtual Machine Basics
Virtual Machines
__secret_ip:
mov edx, eax
add edx, ebx
mov eax, ebx
mov ebx, edx
loop __secret_ip
Virtual Machines
➟
                    vpop r1     vldi #1
add edx, ebx        vld r2      vld r3
mov eax, ebx        vld r1      vsub r3
                    vadd r1     vld #0
mov ebx, edx        vld r2      veq r3
loop __secret_ip    vpop r0     vbr0 #-0E
Virtual Machines
__secret_ip:
push __bytecode
call vm_entry
ret
➟
__bytecode:
db 54 68 69 73 20 64 6f
db 65 73 6e 27 74 20 6c
db 6f 6f 6b 20 6c 69 6b
db 65 20 61 6e 79 74 68
db 69 6e 67 20 74 6f 20
db 6d 65 2e de ad be ef
Virtual Machines
Core Components
VM Entry/Exit    Context switch: native context ⇔ virtual context
VM Dispatcher    Fetch–decode–execute loop
Handler Table    Individual VM ISA instruction semantics
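A minimal Python sketch may help to see how the three components interact. The opcodes, the handler names, and the `vm_entry` signature are hypothetical and chosen only for illustration; they do not correspond to any real protector or to the workshop sample.

```python
# Toy stack-based VM: entry/exit, dispatcher, and handler table.
# All opcodes and names are hypothetical, chosen for illustration only.

def handle_vpush(vm):
    # The constant to push follows the opcode in the bytecode stream.
    vm["stack"].append(vm["bytecode"][vm["vip"]])
    vm["vip"] += 1

def handle_vadd(vm):
    # Pop two operands and push their 32-bit sum.
    b, a = vm["stack"].pop(), vm["stack"].pop()
    vm["stack"].append((a + b) & 0xFFFFFFFF)

def handle_vexit(vm):
    # Signal the dispatcher to leave the fetch-decode-execute loop.
    vm["running"] = False

# Handler table: maps each opcode to the code implementing its semantics.
HANDLER_TABLE = {0x00: handle_vpush, 0x01: handle_vadd, 0x02: handle_vexit}

def vm_entry(bytecode):
    # VM entry: switch from the native context to a fresh virtual context.
    vm = {"bytecode": bytecode, "vip": 0, "stack": [], "running": True}
    # VM dispatcher: fetch, decode, execute.
    while vm["running"]:
        opcode = vm["bytecode"][vm["vip"]]  # fetch
        vm["vip"] += 1
        handler = HANDLER_TABLE[opcode]     # decode (table look-up)
        handler(vm)                         # execute
    # VM exit: hand the result back to the native context.
    return vm["stack"].pop()

# vpush 2; vpush 3; vadd; vexit
result = vm_entry(bytes([0x00, 2, 0x00, 3, 0x01, 0x02]))
print(result)  # 5
```

From the analyst's point of view, only `vm_entry` and an opaque byte array are visible at the call site; the actual program logic lives in the bytecode.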
Virtual Machines
[Figure: VM Entry and handler look up; handlers shown: handle_vpush, handle_vadd, handle_vxor, handle_vexit, handle_vpop, …]
Data Structures
• bytecode
Virtual Machines
__vm_dispatcher:
mov bl, [rsi]
inc rsi
movzx rax, bl
jmp __handler_table[rax * 8]
VM Dispatcher
Virtual Machines
__handle_vnor:
mov rcx, [rbp]
mov rbx, [rbp + 4]
not rcx
not rbx
and rcx, rbx
mov [rbp + 4], rcx
pushf
pop [rbp]
jmp __vm_dispatcher
Instruction Handler Arguments
• stack-based architecture
• pop arguments from stack
• push results onto stack
• examples: JVM, CPython, WebAssembly, …
• register-based architecture
• pass arguments in virtual registers
• store results in virtual registers
• examples: Dalvik, Lua, LLVM, …
• hybrid architectures possible
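The difference between the two argument-passing styles can be sketched with two hypothetical add handlers. The function names and the register names `r0` to `r2` are assumptions made for this illustration only.

```python
# Hypothetical handlers contrasting the two argument-passing styles.

def vadd_stack(stack):
    """Stack-based: pop both operands, push the result."""
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def vadd_reg(regs, dst, src1, src2):
    """Register-based: operands and destination are named virtual registers."""
    regs[dst] = regs[src1] + regs[src2]

# Stack-based: operands are implicit, they are whatever sits on top of the stack.
stack = [2, 3]
vadd_stack(stack)

# Register-based: the bytecode has to encode which registers are involved.
regs = {"r0": 0, "r1": 2, "r2": 3}
vadd_reg(regs, "r0", "r1", "r2")
```

In the stack-based variant the instruction needs no operands in the bytecode, whereas the register-based variant has to encode the register indices next to the opcode.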
Breaking Virtual Machine Obfuscation
Manual Analysis
Goals
Sample
• 11 VM handlers
1 https://tigress.wtf/
Task #1: Identification of VM Components
Task #2: Recovering Handler Semantics I
Task #3: Recovering Handler Semantics II
Task #4: Recovering Handler Semantics III
Lessons Learned
• handler 0x11e1 loads a constant from the bytecode and pushes it onto the stack
Symbolic Execution
Symbolic Execution
⇒ dynamic VM disassembler
Symbolic Execution
__handle_vnor:
mov rcx, [rbp]          rcx ← [rbp]
mov rbx, [rbp + 4]      rbx ← [rbp + 4]
not rcx                 rcx ← ¬ rcx = ¬ [rbp]
not rbx                 rbx ← ¬ rbx = ¬ [rbp + 4]
and rcx, rbx            rcx ← rcx ∧ rbx
                            = (¬ [rbp]) ∧ (¬ [rbp + 4])
                            = [rbp] ↓ [rbp + 4]
mov [rbp + 4], rcx      [rbp + 4] ← rcx = [rbp] ↓ [rbp + 4]
pushf                   rsp ← rsp − 4
                        [rsp] ← flags
pop [rbp]               [rbp] ← [rsp] = flags
                        rsp ← rsp + 4
jmp __vm_dispatcher
Handler performing nor (with flag side-effects)
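The walkthrough of `__handle_vnor` can be reproduced with a very small symbolic executor. This is a teaching sketch under simplifying assumptions: expressions are plain strings, only `mov`, `not`, and `and` are modelled, and the `pushf`/`pop` flag side-effects are left out. The function name `symbolic_execute` and the tuple encoding of instructions are made up for this example; real engines such as Miasm work on actual machine code.

```python
# Minimal symbolic execution of the __handle_vnor handler.
# Expressions are strings; flags and the stack pointer are not modelled.

def symbolic_execute(instructions):
    regs = {}    # register name -> symbolic expression
    memory = {}  # memory operand (e.g. "[rbp + 4]") -> symbolic expression

    def read(operand):
        # Unknown registers and memory cells stay symbolic (they name themselves).
        if operand.startswith("["):
            return memory.get(operand, operand)
        return regs.get(operand, operand)

    def write(operand, value):
        if operand.startswith("["):
            memory[operand] = value
        else:
            regs[operand] = value

    for mnemonic, *ops in instructions:
        if mnemonic == "mov":
            write(ops[0], read(ops[1]))
        elif mnemonic == "not":
            write(ops[0], f"¬{read(ops[0])}")
        elif mnemonic == "and":
            write(ops[0], f"({read(ops[0])} ∧ {read(ops[1])})")
        else:
            raise NotImplementedError(mnemonic)
    return regs, memory

handler = [
    ("mov", "rcx", "[rbp]"),
    ("mov", "rbx", "[rbp + 4]"),
    ("not", "rcx"),
    ("not", "rbx"),
    ("and", "rcx", "rbx"),
    ("mov", "[rbp + 4]", "rcx"),
]
regs, memory = symbolic_execute(handler)
print(memory["[rbp + 4]"])  # (¬[rbp] ∧ ¬[rbp + 4])

# Concrete sanity check: ¬a ∧ ¬b equals a NOR b (here on 32-bit values).
MASK = 0xFFFFFFFF
a, b = 0x12345678, 0x0F0F0F0F
nor_via_handler = (~a & ~b) & MASK
nor_direct = ~(a | b) & MASK
```

The final memory expression is the summary of the handler: the value written to `[rbp + 4]` is the nor of the two topmost stack slots.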
Symbolic Execution on the Binary Level
• free of side effects (explicit formulas for implicit flag and stack pointer updates)
• pre-configure the symbolic state with concrete values (for concolic execution)
2 https://github.com/cea-sec/miasm
Task #5: SE-based Handler Analysis I
Reminder: The handler loads a constant (bytecode) and pushes it onto the stack.
Task #6: SE-based Handler Analysis II
Lessons Learned
• load a constant from the bytecode and store it onto the stack
• pop two values from the stack, add them, and push the result onto the stack
Writing an SE-based Disassembler
Overview
• basic VM layout
VM Deobfuscation Automation Primer
1. build a symbolic execution engine that automatically follows the execution flow
3. each time SE stops, check why and hardcode register/memory values (bytecode, …)
• dump values
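The iterative workflow (run, see why the engine stopped, add concrete knowledge, re-run) can be mimicked with a toy model in plain Python. Everything here is an assumption made for illustration: the opcode values, the bytecode base address `0x4000`, the handler addresses `0x2000` and `0x3000`, and the function `follow_execution`. Only the address `0x11e1` of the constant-loading handler is taken from the workshop sample. A real implementation would drive a symbolic execution engine instead of a dictionary look-up.

```python
# Toy model of the workflow: follow the execution flow until a value is
# unknown, report the reason, then hardcode more knowledge and re-run.

# opcode -> (handler address, name, number of operand bytes); opcodes are made up
HANDLERS = {
    0x00: (0x11E1, "vpush_const", 1),  # handler address as seen in the sample
    0x01: (0x2000, "vadd", 0),         # hypothetical address
    0x02: (0x3000, "vexit", 0),        # hypothetical address
}

def follow_execution(known_bytes, vip):
    """Walk the bytecode while every needed byte is concrete."""
    trace = []
    while True:
        opcode = known_bytes.get(vip)
        if opcode is None:
            return trace, f"bytecode byte at {vip:#x} is unknown"
        address, name, operand_count = HANDLERS[opcode]
        operands = [known_bytes.get(vip + 1 + i) for i in range(operand_count)]
        if None in operands:
            return trace, f"operand of {name} at {vip:#x} is unknown"
        # Dump values: this list is the VM disassembly.
        trace.append((vip, address, name, operands))
        if name == "vexit":
            return trace, "VM exit reached"
        vip += 1 + operand_count

BYTECODE_BASE = 0x4000  # hypothetical location of the bytecode

# First run: nothing is known, so the engine stops immediately.
first_trace, first_reason = follow_execution({}, BYTECODE_BASE)

# Add knowledge (hardcode the bytecode) and re-run.
bytecode = bytes([0x00, 2, 0x00, 3, 0x01, 0x02])
known = {BYTECODE_BASE + i: byte for i, byte in enumerate(bytecode)}
trace, reason = follow_execution(known, BYTECODE_BASE)

for vip, address, name, operands in trace:
    print(f"{vip:#x}: handler {address:#x} {name} {operands}")
```

Each stop reason tells the analyst which value to provide next; once the run reaches the VM exit, the dumped trace is a first disassembly of the bytecode.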
Task #7: Following the Execution Flow
• Add more and more knowledge about the VM and re-run the script.
• Use multiple concrete inputs for the VM and derive their corresponding outputs.
Task #8: Building a VM Disassembler
Task #9: Reconstruction of VM Disassembly
Lessons Learned
• var_0x4 := 0
Conclusion
Takeaways
• far more advanced VMs exist, but the approach stays the same
Conclusion
Today:
• manual analysis of a VM
• writing an SE-based disassembler
• reconstruction of VM disassembly
• slides, code and samples:
https://github.com/mrphrazer/r2con2021_deobfuscation
@mr_phrazer
https://synthesis.to