
Windows Kernel Internals II

Virtual Machine Architecture


University of Tokyo – July 2004

Dave Probert, Ph.D.


Advanced Operating Systems Group
Windows Core Operating Systems Division
Microsoft Corporation

© Microsoft Corporation 2004 1


Hosted VM Model
Windows acts as a “host”
– Resources for each VM are allocated from the host
– All I/O with external devices is performed through the host
“Guest” code runs within a separate context
– Independent address space
– Specialized “VMM” kernel

[Diagram: Host context (Process, Process, Virtual Machine, Host Kernel) and Guest context (Guest Code, VMM Kernel), both above the Host Physical Machine]

© Microsoft Corporation 2004 2


VM Components
VMM Kernel
– Thin layer, all in assembly
– Code executed at ring-0
– Exception handling
– External interrupt pass-through
– Page table maintenance
– Located within a 32MB area of address space known as the “VMM work area”
– Work area is relocatable
– One VMM instance per virtual processor

[Diagram: Host context (Virtual PC, Virtual Server, Host Kernel, NDIS Driver, VMM Driver) and Guest context (Guest Code, Virtual Machine “Additions”, VMM Kernel), both above the Host Physical Machine]

© Microsoft Corporation 2004 3


VM Components
VMM Driver
- Provides kernel-level VM-related services
  - CreateVirtualMachine
  - CreateVirtualProcessor
  - ExecuteVirtualProcessor
- Implements context switching mechanism between the host and guest contexts
- Loads and bootstraps the VMM kernel
- Much of the security work we’ve done recently involved repackaging the VMM kernel code into the VMM driver

[Diagram: Host context (Virtual PC, Virtual Server, Host Kernel, NDIS Driver, VMM Driver) and Guest context (Guest Code, Virtual Machine “Additions”, VMM Kernel), both above the Host Physical Machine]

© Microsoft Corporation 2004 4


VM Components
NDIS Filter Driver
- Allows VM to send and receive Ethernet packets via physical Ethernet hardware
- Spoofs unique MAC addresses for virtual NICs
- Injects packets into host Ethernet stack for guest-to-host networking

[Diagram: Host context (Virtual PC, Virtual Server, Host Kernel, NDIS Driver, VMM Driver) and Guest context (Guest Code, Virtual Machine “Additions”, VMM Kernel), both above the Host Physical Machine]

© Microsoft Corporation 2004 5


VM Components
Virtual PC / Virtual Server executables
- Device emulation modules
- Resource allocation
- VM configuration creation & editing
- VM control (start, stop, pause, save)
- Scripting APIs
- User interaction
- Host side of guest/host integration features

[Diagram: Host context (Virtual PC, Virtual Server, Host Kernel, NDIS Driver, VMM Driver) and Guest context (Guest Code, Virtual Machine “Additions”, VMM Kernel), both above the Host Physical Machine]

© Microsoft Corporation 2004 6


VM Components
Virtual Machine “Additions”
– Collection of components installed within the guest environment by the user
– Implement optimizations
  • Video
  • SCSI
  • Networking (in the future)
  • Guest kernel patches
– Implement guest half of guest/host integration features
  • Clipboard sharing
  • File drag and drop
  • Arbitrary video resizing

[Diagram: Host context (Virtual PC, Virtual Server, Host Kernel, NDIS Driver, VMM Driver) and Guest context (Guest Code, Virtual Machine “Additions”, VMM Kernel), both above the Host Physical Machine]

© Microsoft Corporation 2004 7


VM Execution Loop
Host code repeatedly calls ExecuteVirtualProcessor
VMM acts as “co-routine” (i.e. VMM state is saved and
restored each time ExecuteVirtualProcessor is
called)
Cycles spent inside guest context are counted against
the calling thread
– Host code can control how much time is spent in
guest
Return code indicates why ExecuteVirtualProcessor
returned
– Time slice complete
– IN or OUT instruction encountered
– HLT instruction encountered
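
The loop described on this slide can be sketched as a toy model in Python. This is only an illustration: the `ExitReason` names, the `ToyVirtualProcessor` class, and the `run_vm` function are hypothetical stand-ins, and the real ExecuteVirtualProcessor interface is not shown in the slides.

```python
from enum import Enum, auto


class ExitReason(Enum):
    # Hypothetical names for the three return codes listed on the slide.
    TIME_SLICE_COMPLETE = auto()
    IO_INSTRUCTION = auto()
    HLT = auto()


class ToyVirtualProcessor:
    """Stand-in for the guest: replays a scripted list of exit reasons."""

    def __init__(self, script):
        self._script = list(script)
        self.slices_run = 0

    def execute(self):
        # Models one ExecuteVirtualProcessor call: the guest "runs" until
        # something forces a return to the host, then reports why.
        self.slices_run += 1
        return self._script.pop(0)


def run_vm(vp, handle_io):
    """Host-side loop: re-enter the guest until it executes HLT."""
    io_exits = 0
    while True:
        reason = vp.execute()
        if reason is ExitReason.IO_INSTRUCTION:
            handle_io()      # device emulation runs in the host context
            io_exits += 1
        elif reason is ExitReason.HLT:
            # Assumption: the toy simply stops here; a real host would
            # idle until the next virtual interrupt is due.
            return io_exits
        # TIME_SLICE_COMPLETE: nothing to do, just re-enter the guest.
```

Because the host thread makes every call itself, it decides how often the guest runs, which is the sense in which guest cycles are counted against the calling thread.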

© Microsoft Corporation 2004 8


Processor Virtualization
x86 Virtualization
– Processor is non-virtualizable
• Poor privileged and user state separation
– For example, EFLAGS register contains condition codes
(user state) and interrupt mask (privileged state)
• Some instructions that access privileged state are
non-trapping
– Overly complex and messy architecture
• Many modes, legacy protection mechanisms and
general “warts”
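
A small Python model may make the EFLAGS problem concrete. The bit positions are the architectural ones (CF bit 0, ZF bit 6, SF bit 7, IF bit 9, IOPL bits 12-13). The `popf` function is a simplified sketch that models only these flags and is not a full description of the instruction.

```python
CF, ZF, SF = 1 << 0, 1 << 6, 1 << 7   # condition codes (user state)
IF = 1 << 9                            # interrupt enable flag (privileged state)
IOPL_SHIFT = 12                        # I/O privilege level lives in bits 12-13


def popf(current, new, cpl):
    """Toy model of protected-mode POPFD; returns the resulting EFLAGS."""
    iopl = (current >> IOPL_SHIFT) & 3
    writable = CF | ZF | SF            # only the flags modelled in this sketch
    if cpl <= iopl:
        writable |= IF                 # privileged enough to change IF
    # When cpl > iopl the IF bit is silently left unchanged and no trap is
    # raised, so a VMM never gets the chance to see the attempt.
    return (current & ~writable) | (new & writable)
```

With interrupts enabled and IOPL 0, `popf(IF, ZF, cpl=3)` sets ZF but leaves IF set, while the same call with `cpl=0` clears IF.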

© Microsoft Corporation 2004 9


Processor Emulation
In general, emulation is necessary
– VM uses a binary translation mechanism
• Most instructions are copied directly
• Non-virtualizable (“dangerous”) instructions are
modified
– Binary translation execution imposes ~50%
performance overhead
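
The copy-or-modify idea can be shown with a toy translator that works on mnemonic strings rather than machine code. The `DANGEROUS` set and the `emulate_*` helper names are hypothetical. The slides do not say which instructions the real translator rewrites or how.

```python
# Hypothetical set of instructions that the translator must rewrite.
DANGEROUS = {"pushfd", "popfd", "cli", "sti"}


def translate_block(instructions):
    """Return a translated copy of a basic block (a list of mnemonic strings)."""
    out = []
    for insn in instructions:
        mnemonic = insn.split()[0]
        if mnemonic in DANGEROUS:
            # Replaced by a call into emulation code (hypothetical helper name).
            out.append(f"call emulate_{mnemonic}")
        else:
            out.append(insn)           # most instructions are copied directly
    return out
```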

© Microsoft Corporation 2004 10


Direct Execution
In some processor modes, it’s safe to use direct execution; others require emulation

Processor mode             Execution strategy
Real Mode                  Emulation
Virtual 8086 (v86) mode    Direct Execution
Protected Mode Ring 3      Direct Execution (with a few exceptions)
Protected Mode Ring 0      Emulation, unless known to be safe
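
Read as a decision procedure, the table might look like the sketch below. The function name, the mode strings, and the `known_safe` flag are assumptions made for illustration. The "few exceptions" in ring 3 are not modelled, and guest rings 1 and 2 raise an error because this slide does not list them.

```python
def execution_strategy(mode, ring=None, known_safe=False):
    """Pick "direct" or "emulation" for a guest processor mode."""
    if mode == "real":
        return "emulation"
    if mode == "v86":
        return "direct"
    if mode == "protected":
        if ring == 3:
            return "direct"            # ignoring the few exceptions
        if ring == 0:
            return "direct" if known_safe else "emulation"
    raise ValueError(f"combination not covered by the table: {mode!r}, ring {ring!r}")
```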

© Microsoft Corporation 2004 11


Direct Execution
“Ring Compression”
– Guest ring-0, 1, 2 code is executed at ring 1
– Guest ring-3 code is executed at ring 3
– Provides correct MMU protection semantics (since ring 0-2 can
access privileged pages)
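
The ring mapping and the reason it preserves MMU semantics can be checked with a short sketch. The function names are illustrative. The supervisor rule reflects x86 paging, which treats rings 0-2 as supervisor and ring 3 as user.

```python
def physical_ring(guest_ring):
    """Ring at which guest code actually runs under ring compression."""
    if guest_ring not in (0, 1, 2, 3):
        raise ValueError("x86 has rings 0 through 3")
    return 3 if guest_ring == 3 else 1


def may_access_supervisor_page(ring):
    # x86 paging distinguishes only supervisor (rings 0-2) from user (ring 3).
    return ring < 3
```

For every guest ring, `may_access_supervisor_page(physical_ring(r))` equals `may_access_supervisor_page(r)`, so page-level protection behaves as the guest expects even though its kernel never runs at ring 0.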

Direct execution of ring-0 code is only allowed if the VMM is notified that it’s “safe”
– This requires patching certain “dangerous” instruction sequences
in the Windows kernel and HAL
– Patching is performed at runtime in memory only
– Patches are different for each version of Windows kernel & HAL

© Microsoft Corporation 2004 12


Guest OS Patching
Examples:
– PUSHFD / POPFD
– CLI / STI
– Spin lock acquisition failure (in the future)

Original Code

    pushfd              ; pushfd never traps (breaks IF virtualization)
    cli                 ; cli traps, but cannot be easily patched with a jmp
                        ; because it only takes up one byte
    mov eax,[ebp+8]
    call [eax]
    popfd               ; popfd never traps (breaks IF virtualization)
    ret

This sequence prevents correct behavior in direct execution

© Microsoft Corporation 2004 13


Guest OS Patching
Synthetic instructions
– Use an illegal instruction form (reserved for us by Intel)
– Five bytes in length (for ease in patching)
– Exhibit same side effects of real instruction

Original Code        With Synthetic Instructions

pushfd               vmpushfd
cli                  vmcli
mov eax,[ebp+8]      mov eax,[ebp+8]
call [eax]           call [eax]
popfd                vmpopf
ret                  ret

All synthetic instructions trap and are five bytes long so they can be replaced with jmp or call instructions at runtime

This sequence allows correct behavior in direct execution, but generates three traps

© Microsoft Corporation 2004 14


Guest OS Patching
Runtime Guest OS Patching
– Replace synthetic instructions with subroutine calls
– This technique prevents us from exposing internal VMM
implementation details to OS vendors. We can change the
subroutine implementations in the future.
Original Code        With Synthetic Instructions    With Runtime Patches

pushfd               vmpushfd                       call _vmpushfd
cli                  vmcli                          call _vmcli
mov eax,[ebp+8]      mov eax,[ebp+8]                mov eax,[ebp+8]
call [eax]           call [eax]                     call [eax]
popfd                vmpopf                         call _vmpopfd
ret                  ret                            ret

This patched sequence is correct and fast
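
The five-byte length matters because a near relative call on x86 is exactly five bytes: the opcode 0xE8 followed by a 32-bit little-endian displacement measured from the end of the call. A one-byte instruction such as cli (opcode 0xFA) leaves no room for it. The sketch below builds such a patch in Python. The addresses and helper names are made up for illustration, and the actual encoding of the synthetic instructions is not given in the slides.

```python
import struct

CALL_REL32 = 0xE8   # x86 opcode for a near relative call
PATCH_LEN = 5       # opcode byte plus 32-bit displacement


def make_call_patch(site_addr, target_addr):
    """Encode `call target_addr` for placement at site_addr (32-bit code)."""
    # The displacement is relative to the address of the next instruction.
    rel = (target_addr - (site_addr + PATCH_LEN)) & 0xFFFFFFFF
    return bytes([CALL_REL32]) + struct.pack("<I", rel)


def apply_patch(code, offset, base_addr, target_addr):
    """Overwrite five bytes of a bytearray in place with a call."""
    patch = make_call_patch(base_addr + offset, target_addr)
    code[offset:offset + PATCH_LEN] = patch
```

Because the patch has the same length as the instruction it replaces, the surrounding code does not move and no other branch targets need fixing up.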

© Microsoft Corporation 2004 15


Direct Execution Overhead
Necessary to trap into the VMM kernel on some
instructions
– IN & OUT for I/O device emulation
– STI & CLI for interrupt mask virtualization
– INT & IRET to catch ring transitions
– INVLPG and MOV to CR3 for page table virtualization
Traps are expensive – and getting worse
– ~500 cycles on Pentium III or AMD processors; ~2000
cycles on Pentium 4
– Runtime patching of some trapping instructions is
possible
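
A back-of-the-envelope calculation shows why the cycle counts matter. The trap rate and clock speeds below are invented for illustration. Only the per-trap cycle counts come from the slide.

```python
def trap_overhead_fraction(traps_per_second, cycles_per_trap, clock_hz):
    """Fraction of CPU time spent just entering and leaving the VMM."""
    return traps_per_second * cycles_per_trap / clock_hz


# Assumed workload: 100,000 traps per second (not a measured figure).
pentium3 = trap_overhead_fraction(100_000, 500, 1_000_000_000)    # 1 GHz assumed
pentium4 = trap_overhead_fraction(100_000, 2000, 2_000_000_000)   # 2 GHz assumed
```

Under these assumptions the trap cost is 5% of the processor at 500 cycles per trap and 10% at 2000 cycles per trap, even though the second clock is twice as fast.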

© Microsoft Corporation 2004 16


Physical Memory & RAM
Virtualized RAM
– User decides how much RAM is associated with each
virtual machine
Physical pages
– Allocated by VMM from host OS
– Currently allocated at the time the VM starts, but
could be allocated on demand
– Host physical addresses don’t match guest physical
addresses
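
Since host and guest physical addresses differ, something has to translate between them. A minimal sketch, assuming 4KB pages and a simple per-page lookup table (the class and its fields are hypothetical, not the VMM's actual data structure):

```python
PAGE_SIZE = 4096


class GuestRam:
    """Maps guest physical pages to host physical frames."""

    def __init__(self, ram_bytes, host_frames):
        pages = ram_bytes // PAGE_SIZE
        if len(host_frames) < pages:
            raise ValueError("not enough host frames for the requested RAM")
        # Every page is backed up front, matching allocation at VM start.
        self.frame_of = list(host_frames[:pages])

    def to_host_physical(self, guest_phys):
        page, offset = divmod(guest_phys, PAGE_SIZE)
        if not 0 <= page < len(self.frame_of):
            raise ValueError("address is outside guest RAM")
        return self.frame_of[page] * PAGE_SIZE + offset
```

Allocating on demand would amount to filling `frame_of` lazily, on the first access to each page.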

© Microsoft Corporation 2004 17


Logical Page Mappings
Logical Memory
– Logical mappings defined by guest page tables
(mostly)
– VMM finds 32MB unused area for the VMM code and
data (the “VMM work area”).
– VMM monitors guest OS address space usage and
relocates itself if necessary
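
Finding a home for the work area is a search for a free 32MB hole in the guest's 4GB address space. The sketch below assumes the work area is 32MB-aligned and that guest usage is known as a list of half-open `(start, end)` ranges. Both are simplifications, and the slides do not describe the real search.

```python
WORK_AREA = 32 * 1024 * 1024
ADDRESS_SPACE = 1 << 32


def find_work_area(used_ranges):
    """Return the lowest free 32MB-aligned base address, or None if none exists."""
    for base in range(0, ADDRESS_SPACE, WORK_AREA):
        end = base + WORK_AREA
        # The slot is free if it overlaps no range the guest is using.
        if all(end <= start or base >= stop for start, stop in used_ranges):
            return base
    return None
```

Relocation then amounts to calling the search again once the guest begins using an address inside the current work area.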

© Microsoft Corporation 2004 18


VMM Page Tables
VMM maintains its own private page table
– Initially, only the VMM work area is mapped

[Diagram: VMM page tables (PD, Table; Physical CR3) beside guest page tables (PD, Table; Virtual CR3); the VMM work area is mapped where the guest page tables have an unused area]

© Microsoft Corporation 2004 19


VMM Page Tables
VMM maintains its own private page table
– Initially, only the VMM work area is mapped
– Guest pages are mapped on demand as they are
accessed

[Diagram: VMM page tables (PD, Table; Physical CR3) beside guest page tables (PD, Table; Virtual CR3); the VMM work area is mapped where the guest page tables have an unused area]

© Microsoft Corporation 2004 20


VMM Page Tables
VMM maintains its own private page table
– Initially, only the VMM work area is mapped
– Guest pages are mapped on demand as they are
accessed
– Guest pages are unmapped when guest flushes its TLB
– VMM work area is relocated as necessary

[Diagram: VMM page tables (PD, Table; Physical CR3) beside guest page tables (PD, Table; Virtual CR3); the VMM work area is now mapped at a new location, and the previous VMM location is in use by the guest]
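
The map-on-demand and unmap-on-flush behaviour can be modelled with two dictionaries. This is a sketch of the general technique only. The class and method names are hypothetical, and the work area and its relocation are left out.

```python
class ShadowPageTable:
    """Toy model of the VMM's private page table."""

    def __init__(self, guest_page_table):
        self.guest = guest_page_table   # guest virtual page -> guest physical page
        self.mapped = {}                # entries currently present in the VMM's table
        self.faults = 0

    def access(self, vpage):
        if vpage not in self.mapped:
            # First touch: the VMM takes a fault and copies the guest's mapping.
            # A KeyError here stands for a genuine guest page fault.
            self.faults += 1
            self.mapped[vpage] = self.guest[vpage]
        return self.mapped[vpage]

    def guest_tlb_flush(self):
        # The guest flushed its TLB, so every cached mapping may be stale.
        self.mapped.clear()
```

Dropping all mappings on a flush mirrors hardware TLB semantics: a change the guest makes to its own page tables is only guaranteed to take effect after the flush.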

© Microsoft Corporation 2004 21


Memory Sharing
Memory allocated with VMM APIs
can be used in three ways
– Mapped within the VMM work area
– As guest virtual RAM (mapped into the guest address
space according to the guest page tables)
– Mapped within the host context (for emulated DMA
operations)

© Microsoft Corporation 2004 22


Device Emulation
Device emulation modules
- Emulate behaviors of a real hardware device
- Register “callbacks” for I/O port accesses
- Can access virtualized “RAM” for emulated DMA operations
- Communicate among themselves (e.g. Ethernet module “plugs into” the PCI bus module and communicates with the PIC module to assert interrupts)
- May call host services to perform emulation
- Can be suspended, saved and restored

Device Emulation Models
- 440BX chipset with PIIX4
- System BIOS (AMI)
- PCI Bus
- ISA Bus
- Power Management
- SM Bus
- 8259 PIC
- PIT
- DMA Controller
- CMOS
- RTC
- Memory Controller
- RAM & VRAM
- COM (Serial) Ports
- LPT (Parallel) Ports
- IDE/ATAPI Controllers
- SCSI Adapters (Adaptec 2940)
- SVGA Video Adapter (S3 Trio64)
- VESA BIOS
- 2D Graphics Accelerator
- Hardware Cursor
- Ethernet Adapters (DEC 21140)
- SoundBlaster Sound Card
- Keyboard
- Mouse
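
Registering callbacks for I/O ports might look like the sketch below. `IoBus` and `ToyCmos` are hypothetical classes written for illustration. The only real-world detail used is that PC CMOS is reached through index port 0x70 and data port 0x71. Returning 0xFF for unclaimed ports is an assumption.

```python
class IoBus:
    """Dispatches emulated IN/OUT accesses to registered device callbacks."""

    def __init__(self):
        self._handlers = {}

    def register(self, first_port, count, on_read, on_write):
        ports = range(first_port, first_port + count)
        if any(port in self._handlers for port in ports):
            raise ValueError("port range already claimed")
        for port in ports:
            self._handlers[port] = (on_read, on_write)

    def port_in(self, port):
        handler = self._handlers.get(port)
        return handler[0](port) if handler else 0xFF   # assumed default

    def port_out(self, port, value):
        handler = self._handlers.get(port)
        if handler:
            handler[1](port, value)


class ToyCmos:
    """Minimal CMOS-like device: an index register and 128 data cells."""

    def __init__(self, bus):
        self.index = 0
        self.cells = [0] * 128
        bus.register(0x70, 2, self.read, self.write)

    def read(self, port):
        return self.cells[self.index] if port == 0x71 else 0xFF

    def write(self, port, value):
        if port == 0x70:
            self.index = value & 0x7F
        else:
            self.cells[self.index] = value & 0xFF
```

Keeping all device state in plain fields, as `ToyCmos` does, is also what makes it straightforward to suspend, save, and restore a module.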

© Microsoft Corporation 2004 23


Device I/O Accesses
I/O accesses (IN & OUT instructions)
- Trap into VMM kernel
- Force a context switch back to the host context where device emulation module is invoked
- “Fast I/O handlers” can be called from within the VMM context
- Some OUTs can be batched
MMIO accesses
- Caught in VMM’s page fault handler
- Very expensive

[Diagram: Host context (Virtual PC with Device Emulation Module, ring 3; Host Kernel, VMM Driver and Host HAL, ring 0) and Guest context (Guest User Code, ring 3; Guest Kernel and Guest HAL, ring 1; VMM Kernel, ring 0) above the Host Physical Machine; arrows labeled “OUT instr.”, “GPF trap” and “Context Switch”]

© Microsoft Corporation 2004 24


Discussion

© Microsoft Corporation 2004 25
