AMD64 Architecture Programmer's Manual Volume 2 System Programming
AMD64 Architecture
Programmer’s Manual
Volume 2:
System Programming
Trademarks
AMD, the AMD arrow logo, AMD Athlon, AMD Opteron, and combinations thereof, AMD Virtualization, and 3DNow!
are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc.
MMX is a trademark and Pentium is a registered trademark of Intel Corporation.
Windows NT is a registered trademark of Microsoft Corporation.
HyperTransport is a licensed trademark of the HyperTransport Technology Consortium.
Other product names used in this publication are for identification purposes only and may be trademarks of their
respective companies.
24593—Rev. 3.14—September 2007 AMD64 Technology
Contents
Revision History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxv
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
About This Book. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxix
Terms and Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxix
Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxvi
Endian Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxix
Related Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxix
1 System-Programming Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1
1.1 Memory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Memory Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Memory Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Canonical Address Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Mixing Segmentation and Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Real Addressing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Operating Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Long Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
64-Bit Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Compatibility Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Legacy Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
System Management Mode (SMM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 System Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5 System-Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.7 Additional System-Programming Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Hardware Multitasking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Machine Check . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Software Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Performance Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2 x86 and AMD64 Architecture Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .23
2.1 Operating Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Long Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Legacy Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
System-Management Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2 Memory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Memory Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Page Translation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Protection Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
General-Purpose Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
128-Bit Media Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Flags Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Instruction Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Stack Pointer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Control Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Debug Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Extended Feature Register (EFER) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Memory Type Range Registers (MTRRs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Other Model-Specific Registers (MSRs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 Instruction Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
REX Prefixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Segment-Override Prefixes in 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Operands and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Address Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Instructions that Reference RSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
NOP Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Single-Byte INC and DEC Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
MOVSXD Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Invalid Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
FXSAVE and FXRSTOR Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.6 Interrupts and Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Interrupt Descriptor Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Stack Frame Pushes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Stack Switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
IRET Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Task-Priority Register (CR8) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
New Exception Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.7 Hardware Task Switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.8 Long-Mode vs. Legacy-Mode Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 System Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .41
3.1 System-Control Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
CR0 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
CR2 and CR3 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
CR4 Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
CR1 and CR5–CR7 Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
64-Bit-Mode Extended Control Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
CR8 (Task Priority Register, TPR) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
RFLAGS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Extended Feature Enable Register (EFER) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2 Model-Specific Registers (MSRs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
System Configuration Register (SYSCFG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
System-Linkage Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Memory-Typing Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Debug-Extension Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Performance-Monitoring Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Machine-Check Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3 Processor Feature Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4 Segmented Virtual Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .63
4.1 Real Mode Segmentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2 Virtual-8086 Mode Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.3 Protected Mode Segmented-Memory Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Multi-Segmented Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Flat-Memory Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Segmentation in 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 Segmentation Data Structures and Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.5 Segment Selectors and Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Segment Selectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Segment Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Segment Registers in 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.6 Descriptor Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Global Descriptor Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Global Descriptor-Table Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Local Descriptor Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Local Descriptor-Table Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Interrupt Descriptor Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Interrupt Descriptor-Table Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.7 Legacy Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Descriptor Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Code-Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Data-Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
System Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Gate Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.8 Long-Mode Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Code-Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Data-Segment Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
System Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Gate Descriptors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Long Mode Descriptor Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.9 Segment-Protection Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Privilege-Level Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
Privilege-Level Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.10 Data-Access Privilege Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Accessing Data Segments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Accessing Stack Segments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.11 Control-Transfer Privilege Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Direct Control Transfers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Control Transfers Through Call Gates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Return Control Transfers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.12 Limit Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Figures
Figure 1-1. Segmented-Memory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Figure 1-2. Flat Memory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Figure 1-3. Paged Memory Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Figure 1-4. 64-Bit Flat, Paged-Memory Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Figure 1-5. Real-Address Memory Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Figure 1-6. Operating Modes of the AMD64 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Figure 1-7. System Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Figure 1-8. System-Data Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Figure 3-1. Control Register 0 (CR0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Figure 3-2. Control Register 2 (CR2)—Legacy-Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Figure 3-3. Control Register 2 (CR2)—Long Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Figure 3-4. Control Register 3 (CR3)—Legacy-Mode Non-PAE Paging. . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Figure 3-5. Control Register 3 (CR3)—Legacy-Mode PAE Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Figure 3-6. Control Register 3 (CR3)—Long Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Figure 3-7. Control Register 4 (CR4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Figure 3-8. RFLAGS Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Figure 3-9. Extended Feature Enable Register (EFER). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Figure 3-10. AMD64 Architecture Model-Specific Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Figure 3-11. System-Configuration Register (SYSCFG) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Figure 4-1. Segmentation Data Structures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Figure 4-2. Segment and Descriptor-Table Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Figure 4-3. Segment Selector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Figure 4-4. Segment-Register Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Figure 4-5. FS and GS Segment-Register Format—64-Bit Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Figure 4-6. Global and Local Descriptor-Table Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Figure 4-7. GDTR and IDTR Format—Legacy Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Figure 4-8. GDTR and IDTR Format—Long Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Figure 4-9. Relationship between the LDT and GDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Figure 4-10. LDTR Format—Legacy Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Figure 4-11. LDTR Format—Long Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Figure 4-12. Indexing an IDT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Tables
Table 1-1. Operating Modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Table 1-2. Interrupts and Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Table 2-1. Instructions That Reference RSP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Table 2-2. 64-Bit Mode Near Branches, Default 64-Bit Operand Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Table 2-3. Invalid Instructions in 64-Bit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 2-4. Invalid Instructions in Long Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Table 2-5. Reassigned Instructions in 64-Bit Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Table 2-6. Differences Between Long Mode and Legacy Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Table 4-1. Segment Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Table 4-2. Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Table 4-3. Code-Segment Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Table 4-4. Data-Segment Descriptor Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Table 4-5. System-Segment Descriptor Types (S=0)—Legacy Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Table 4-6. System-Segment Descriptor Types—Long Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Table 4-7. Descriptor-Entry Field Changes in Long Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Table 5-1. Supported Paging Alternatives (CR0.PG=1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Table 5-2. Physical-Page Protection, CR0.WP=0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
Table 5-3. Effect of CR0.WP=1 on Supervisor Page Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Table 6-1. System Management Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Table 7-1. Memory Access by Memory Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Table 7-2. Caching Policy by Memory Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Table 7-3. Memory Access Ordering Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Table 7-4. AMD64 Architecture Cache-Operating Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Table 7-5. MTRR Type Field Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
Table 7-6. Fixed-Range MTRR Address Ranges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Table 7-7. Combined MTRR and Page-Level Memory Type with Unmodified PAT MSR . . . . . . . . . . . 191
Table 7-8. PAT Type Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Table 7-9. PAT-Register PA-Field Indexing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
Table 7-10. Combined Effect of MTRR and PAT Memory Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Table 7-11. Extended Fixed-Range MTRR Type Encodings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
Table 8-1. Interrupt-Vector Source and Cause. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
Table 8-2. Interrupt-Vector Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
Table 8-3. Double-Fault Exception Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Table 8-4. Invalid-TSS Exception Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Revision History
Date            Revision  Description
September 2006  3.12      Added numerous minor clarifications.
December 2005   3.11      Added Chapter 15, Secure Virtual Machine. Incorporated numerous factual
                          corrections and updates.
Preface
Audience
This volume (Volume 2) is intended for programmers writing operating systems, loaders, linkers,
device drivers, or system utilities. It assumes an understanding of AMD64 architecture application-
level programming as described in Volume 1.
This volume describes the AMD64 architecture’s resources and functions that are managed by system
software, including operating-mode control, memory management, interrupts and exceptions, task and
state-change management, system-management mode (including power management), multiprocessor
support, debugging, and processor initialization.
Application-programming topics are described in Volume 1. Volumes 3, 4, and 5 describe each
instruction in detail.
Organization
This volume begins with an overview of system programming and of the differences between the x86 and
AMD64 architectures. The overview is followed by chapters that describe the following details of system
programming:
• System Resources—The system registers and processor ID (CPUID) functions.
• Segmented Virtual Memory—The segmented-memory models supported by the architecture and
their associated data structures and protection checks.
• Page Translation and Protection—The page-translation functions supported by the architecture
and their associated data structures and protection checks.
Definitions
Some of the following definitions assume a knowledge of the legacy x86 architecture. See “Related
Documents” on page xxxix for descriptions of the legacy x86 architecture.
absolute
Said of a displacement that references the base of a code segment rather than an instruction pointer.
Contrast with relative.
ASID
Address space identifier.
biased exponent
The sum of a floating-point value’s exponent and a constant bias for a particular floating-point data
type. The bias makes the range of the biased exponent always positive, which allows reciprocation
without overflow.
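As a worked illustration of the biased-exponent definition, the sketch below extracts the 11-bit biased exponent of a double-precision value, whose bias is 1023. It assumes the host double is IEEE 754 binary64, as it is for AMD64 compilers; the function and macro names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Double-precision layout: 1 sign bit, 11 exponent bits, 52 fraction bits. */
#define DP_EXPONENT_BIAS 1023

/* Return the raw 11-bit biased exponent field of a double. */
unsigned biased_exponent(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the bits without aliasing issues */
    return (unsigned)((bits >> 52) & 0x7FF);
}

/* Recover the true (unbiased) exponent of a normal, nonzero double. */
int unbiased_exponent(double x)
{
    return (int)biased_exponent(x) - DP_EXPONENT_BIAS;
}
```

For example, 1.0 is stored with a biased exponent of 1023 and 0.5 with 1022: both fields are positive even though the true exponent of 0.5 is -1.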
byte
Eight bits.
clear
To write a bit value of 0. Compare set.
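To make the clear/set terminology concrete, here is a minimal sketch of helpers that write a 0 or a 1 to one bit of a 64-bit value. The helper names are hypothetical, and they operate on an ordinary variable holding a register image, not on a live processor register.

```c
#include <stdint.h>

/* "Set": write a 1 to bit n (n must be less than 64). */
uint64_t set_bit(uint64_t value, unsigned n)
{
    return value | (UINT64_C(1) << n);
}

/* "Clear": write a 0 to bit n (n must be less than 64). */
uint64_t clear_bit(uint64_t value, unsigned n)
{
    return value & ~(UINT64_C(1) << n);
}
```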
compatibility mode
A submode of long mode. In compatibility mode, the default address size is 32 bits, and legacy 16-
bit and 32-bit applications run without modification.
commit
To irreversibly write, in program order, an instruction’s result to software-visible storage, such as a
register (including flags), the data cache, an internal write buffer, or memory.
CPL
Current privilege level.
CR0–CR4
A register range, from register CR0 through CR4, inclusive, with the low-order register first.
CR0.PE = 1
Notation indicating that the PE bit of the CR0 register has a value of 1.
direct
Referencing a memory location whose address is included in the instruction’s syntax as an
immediate operand. The address may be an absolute or relative address. Compare indirect.
dirty data
Data held in the processor’s caches or internal buffers that is more recent than the copy held in
main memory.
displacement
A signed value that is added to the base of a segment (absolute addressing) or an instruction pointer
(relative addressing). Same as offset.
doubleword
Two words, or four bytes, or 32 bits.
double quadword
Eight words, or 16 bytes, or 128 bits. Also called octword.
DS:rSI
The contents of a memory location whose segment address is in the DS register and whose offset
relative to that segment is in the rSI register.
EFER.LME = 0
Notation indicating that the LME bit of the EFER register has a value of 0.
effective address size
The address size for the current instruction after accounting for the default address size and any
address-size override prefix.
effective operand size
The operand size for the current instruction after accounting for the default operand size and any
operand-size override prefix.
element
See vector.
exception
An abnormal condition that occurs as the result of executing an instruction. The processor’s
response to an exception depends on the type of the exception. For all exceptions except 128-bit
media SIMD floating-point exceptions and x87 floating-point exceptions, control is transferred to
the handler (or service routine) for that exception, as defined by the exception’s vector. For
floating-point exceptions defined by the IEEE 754 standard, there are both masked and unmasked
responses. When unmasked, the exception handler is called, and when masked, a default response
is provided instead of calling the handler.
FF /0
Notation indicating that FF is the first byte of an opcode, and a subopcode in the ModR/M byte has
a value of 0.
flush
An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in “flush the cache
line,” or (2) invalidate, as in “flush the pipeline,” or (3) change a value, as in “flush to zero.”
GDT
Global descriptor table.
GIF
Global interrupt flag.
IDT
Interrupt descriptor table.
IGN
Ignore. Field is ignored.
indirect
Referencing a memory location whose address is in a register or other memory location. The
address may be an absolute or relative address. Compare direct.
IRB
The virtual-8086 mode interrupt-redirection bitmap.
IST
The long-mode interrupt-stack table.
IVT
The real-address mode interrupt-vector table.
LDT
Local descriptor table.
legacy x86
The legacy x86 architecture. See “Related Documents” on page xxxix for descriptions of the
legacy x86 architecture.
legacy mode
An operating mode of the AMD64 architecture in which existing 16-bit and 32-bit applications and
operating systems run without modification. A processor implementation of the AMD64
architecture can run in either long mode or legacy mode. Legacy mode has three submodes: real
mode, protected mode, and virtual-8086 mode.
long mode
An operating mode unique to the AMD64 architecture. A processor implementation of the
AMD64 architecture can run in either long mode or legacy mode. Long mode has two submodes:
64-bit mode and compatibility mode.
lsb
Least-significant bit.
LSB
Least-significant byte.
main memory
Physical memory, such as RAM and ROM (but not cache memory), that is installed in a particular
computer system.
mask
(1) A control bit that prevents the occurrence of a floating-point exception from invoking an
exception-handling routine. (2) A field of bits used for a control purpose.
MBZ
Must be zero. If software attempts to set an MBZ bit to 1, a general-protection exception (#GP)
occurs.
memory
Unless otherwise specified, main memory.
ModRM
A byte following an instruction opcode that specifies address calculation based on mode (Mod),
register (R), and memory (M) variables.
moffset
A 16-, 32-, or 64-bit offset that specifies a memory operand directly, without using a ModRM or SIB
byte.
msb
Most-significant bit.
MSB
Most-significant byte.
multimedia instructions
A combination of 128-bit media instructions and 64-bit media instructions.
octword
Same as double quadword.
offset
Same as displacement.
overflow
The condition in which a floating-point number is larger in magnitude than the largest, finite,
positive or negative number that can be represented in the data-type format being used.
packed
See vector.
PAE
Physical-address extensions.
physical memory
Actual memory, consisting of main memory and cache.
probe
A check for an address in a processor’s caches or internal buffers. External probes originate
outside the processor, and internal probes originate within the processor.
protected mode
A submode of legacy mode.
quadword
Four words, or eight bytes, or 64 bits.
RAZ
Read as zero (0), regardless of what is written.
real-address mode
See real mode.
real mode
A short name for real-address mode, a submode of legacy mode.
relative
Referencing with a displacement (also called offset) from an instruction pointer rather than the
base of a code segment. Contrast with absolute.
reserved
Fields marked as reserved may be used at some future time.
To preserve compatibility with future processors, reserved fields require special handling when
read or written by software.
Reserved fields may be further qualified as MBZ, RAZ, SBZ or IGN (see definitions).
Software must not depend on the state of a reserved field, nor upon the ability of such fields to
return to a previously written state.
If a reserved field is not marked with one of the above qualifiers, software must not change the state
of that field; it must reload that field with the same values returned from a prior read.
REX
An instruction prefix that specifies a 64-bit operand size and provides access to additional
registers.
RIP-relative addressing
Addressing relative to the 64-bit RIP instruction pointer.
SBZ
Should be zero. An attempt by software to set an SBZ bit to 1 results in undefined behavior.
set
To write a bit value of 1. Compare clear.
SIB
A byte following an instruction opcode that specifies address calculation based on scale (S), index
(I), and base (B).
SIMD
Single instruction, multiple data. See vector.
SSE
Streaming SIMD extensions instruction set. See 128-bit media instructions and 64-bit media
instructions.
SSE2
Extensions to the SSE instruction set. See 128-bit media instructions and 64-bit media
instructions.
SSE3
Further extensions to the SSE instruction set. See 128-bit media instructions.
sticky bit
A bit that is set or cleared by hardware and that remains in that state until explicitly changed by
software.
TOP
The x87 top-of-stack pointer.
TSS
Task-state segment.
underflow
The condition in which a floating-point number is smaller in magnitude than the smallest nonzero,
positive or negative number that can be represented in the data-type format being used.
vector
(1) A set of integer or floating-point values, called elements, that are packed into a single operand.
Most of the 128-bit and 64-bit media instructions use vectors as operands. Vectors are also called
packed or SIMD (single-instruction multiple-data) operands.
(2) An index into an interrupt descriptor table (IDT), used to access exception handlers. Compare
exception.
virtual-8086 mode
A submode of legacy mode.
VMCB
Virtual machine control block.
VMM
Virtual machine monitor.
word
Two bytes, or 16 bits.
x86
See legacy x86.
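The ModRM and SIB entries above describe three bit fields packed into a single byte, and the FF /0 notation refers to the reg field of the ModRM byte holding 0. The following C sketch splits each byte into its fields; the struct and function names are illustrative, not taken from this manual, and the sketch does not model the REX prefix, which can extend the reg, index, base, and r/m fields to four bits.

```c
#include <stdint.h>

/* ModRM byte: mod (bits 7:6), reg (bits 5:3), r/m (bits 2:0). */
struct modrm { uint8_t mod, reg, rm; };

/* SIB byte: scale (bits 7:6), index (bits 5:3), base (bits 2:0).
 * The 2-bit scale field encodes a factor of 1, 2, 4, or 8. */
struct sib { uint8_t scale_factor, index, base; };

struct modrm decode_modrm(uint8_t b)
{
    struct modrm m = { (uint8_t)(b >> 6),          /* mod */
                       (uint8_t)((b >> 3) & 7),    /* reg (or /digit subopcode) */
                       (uint8_t)(b & 7) };         /* r/m */
    return m;
}

struct sib decode_sib(uint8_t b)
{
    struct sib s = { (uint8_t)(1u << (b >> 6)),    /* 1, 2, 4, or 8 */
                     (uint8_t)((b >> 3) & 7),      /* index register number */
                     (uint8_t)(b & 7) };           /* base register number */
    return s;
}
```

For example, decode_modrm(0xC3) yields mod 3, reg 0, and r/m 3, and decode_sib(0x88) yields a scale factor of 4 with index 1 and base 0.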
Registers
In the following list of registers, the names are used to refer either to a given register or to the contents
of that register:
AH–DH
The high 8-bit AH, BH, CH, and DH registers. Compare AL–DL.
AL–DL
The low 8-bit AL, BL, CL, and DL registers. Compare AH–DH.
AL–r15B
The low 8-bit AL, BL, CL, DL, SIL, DIL, BPL, SPL, and R8B–R15B registers, available in 64-bit
mode.
BP
Base pointer register.
CRn
Control register number n.
CS
Code segment register.
eAX–eSP
The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers or the 32-bit EAX, EBX, ECX, EDX,
EDI, ESI, EBP, and ESP registers. Compare rAX–rSP.
EFER
Extended features enable register.
eFLAGS
16-bit or 32-bit flags register. Compare rFLAGS.
EFLAGS
32-bit (extended) flags register.
eIP
16-bit or 32-bit instruction-pointer register. Compare rIP.
EIP
32-bit (extended) instruction-pointer register.
FLAGS
16-bit flags register.
GDTR
Global descriptor table register.
GPRs
General-purpose registers. For the 16-bit data size, these are AX, BX, CX, DX, DI, SI, BP, and SP.
For the 32-bit data size, these are EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP. For the 64-bit
data size, these include RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, and R8–R15.
IDTR
Interrupt descriptor table register.
IP
16-bit instruction-pointer register.
LDTR
Local descriptor table register.
MSR
Model-specific register.
r8–r15
The 8-bit R8B–R15B registers, or the 16-bit R8W–R15W registers, or the 32-bit R8D–R15D
registers, or the 64-bit R8–R15 registers.
rAX–rSP
The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers, or the 32-bit EAX, EBX, ECX, EDX,
EDI, ESI, EBP, and ESP registers, or the 64-bit RAX, RBX, RCX, RDX, RDI, RSI, RBP, and RSP
registers. Replace the placeholder r with nothing for 16-bit size, “E” for 32-bit size, or “R” for 64-
bit size.
RAX
64-bit version of the EAX register.
RBP
64-bit version of the EBP register.
RBX
64-bit version of the EBX register.
RCX
64-bit version of the ECX register.
RDI
64-bit version of the EDI register.
RDX
64-bit version of the EDX register.
rFLAGS
16-bit, 32-bit, or 64-bit flags register. Compare RFLAGS.
RFLAGS
64-bit flags register. Compare rFLAGS.
rIP
16-bit, 32-bit, or 64-bit instruction-pointer register. Compare RIP.
RIP
64-bit instruction-pointer register.
RSI
64-bit version of the ESI register.
RSP
64-bit version of the ESP register.
SP
Stack pointer register.
SS
Stack segment register.
TPR
Task priority register (CR8), a new register introduced in the AMD64 architecture to speed
interrupt management.
TR
Task register.
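The rAX–rSP entry defines a mechanical naming rule: the lowercase r is a placeholder for a size prefix. The sketch below applies that rule to one of the eight two-letter 16-bit names; expand_reg is a hypothetical helper written for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Expands the rAX-rSP notation for one register. base16 is the 16-bit name
 * (AX, BX, CX, DX, DI, SI, BP, or SP). The placeholder r becomes nothing for
 * 16-bit size, "E" for 32-bit size, or "R" for 64-bit size.
 * Returns 0 on success, -1 for an unsupported size or name. */
int expand_reg(const char *base16, unsigned size_bits, char *out, size_t len)
{
    const char *prefix;

    if (strlen(base16) != 2)
        return -1;                      /* all eight base names are two letters */
    switch (size_bits) {
    case 16: prefix = "";  break;
    case 32: prefix = "E"; break;
    case 64: prefix = "R"; break;
    default: return -1;                 /* the notation covers only these sizes */
    }
    snprintf(out, len, "%s%s", prefix, base16);
    return 0;
}
```

For example, expanding "AX" at 64 bits gives RAX, and "SP" at 32 bits gives ESP. The same prefix rule underlies the eFLAGS/rFLAGS and eIP/rIP entries.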
Endian Order
The x86 and AMD64 architectures address memory using little-endian byte-ordering. Multibyte
values are stored with their least-significant byte at the lowest byte address, and they are illustrated
with their least-significant byte at the right side. Strings are illustrated in reverse order, because the
addresses of their bytes increase from right to left.
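As a concrete illustration of little-endian ordering, the 32-bit value 12345678h occupies four bytes, and the byte at the lowest address holds 78h. The following C sketch (the helper names are illustrative) stores and reloads a 32-bit value one byte at a time, so it produces this layout regardless of the byte order of the machine that runs it.

```c
#include <stdint.h>

/* Stores v at p in little-endian order: least-significant byte at the
 * lowest address, independent of the host's native byte order. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Reassembles the value from four bytes, lowest address first. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

On an x86 or AMD64 host, copying a uint32_t into a byte array with memcpy yields the same four bytes, because the processor itself stores multibyte values in this order.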
Related Documents
• Peter Abel, IBM PC Assembly Language and Programming, Prentice-Hall, Englewood Cliffs, NJ,
1995.
• Rakesh Agarwal, 80x86 Architecture & Programming: Volume II, Prentice-Hall, Englewood
Cliffs, NJ, 1991.
• AMD data sheets and application notes for particular hardware implementations of the AMD64
architecture.
• AMD, AMD-K6™ MMX™ Enhanced Processor Multimedia Technology, Sunnyvale, CA, 2000.
• AMD, 3DNow!™ Technology Manual, Sunnyvale, CA, 2000.
• AMD, AMD Extensions to the 3DNow!™ and MMX™ Instruction Sets, Sunnyvale, CA, 2000.
• AMD, SYSCALL and SYSRET Instruction Specification Application Note, Sunnyvale, CA, 1998.
• Don Anderson and Tom Shanley, Pentium Processor System Architecture, Addison-Wesley, New
York, 1995.
• Nabajyoti Barkakati and Randall Hyde, Microsoft Macro Assembler Bible, Sams, Carmel, IN, 1992.
• Barry B. Brey, 8086/8088, 80286, 80386, and 80486 Assembly Language Programming,
Macmillan Publishing Co., New York, 1994.
• Barry B. Brey, Programming the 80286, 80386, 80486, and Pentium Based Personal Computer,
Prentice-Hall, Englewood Cliffs, NJ, 1995.
• Ralf Brown and Jim Kyle, PC Interrupts, Addison-Wesley, New York, 1994.
• Penn Brumm and Don Brumm, 80386/80486 Assembly Language Programming, Windcrest
McGraw-Hill, 1993.
• Geoff Chappell, DOS Internals, Addison-Wesley, New York, 1994.
• Chips and Technologies, Inc., Super386 DX Programmer’s Reference Manual, Chips and
Technologies, Inc., San Jose, CA, 1992.
• John Crawford and Patrick Gelsinger, Programming the 80386, Sybex, San Francisco, 1987.
• Cyrix Corporation, 5x86 Processor BIOS Writer's Guide, Cyrix Corporation, Richardson, TX,
1995.
• Cyrix Corporation, M1 Processor Data Book, Cyrix Corporation, Richardson, TX, 1996.
• Cyrix Corporation, MX Processor MMX Extension Opcode Table, Cyrix Corporation, Richardson,
TX, 1996.
• Cyrix Corporation, MX Processor Data Book, Cyrix Corporation, Richardson, TX, 1997.
• Ray Duncan, Extending DOS: A Programmer's Guide to Protected-Mode DOS, Addison-Wesley,
New York, 1991.
• William B. Giles, Assembly Language Programming for the Intel 80xxx Family, Macmillan, New
York, 1991.
• Frank van Gilluwe, The Undocumented PC, Addison-Wesley, New York, 1994.
• John L. Hennessy and David A. Patterson, Computer Architecture, Morgan Kaufmann Publishers,
San Mateo, CA, 1996.
• Thom Hogan, The Programmer’s PC Sourcebook, Microsoft Press, Redmond, WA, 1991.
• Hal Katircioglu, Inside the 486, Pentium, and Pentium Pro, Peer-to-Peer Communications, Menlo
Park, CA, 1997.
• IBM Corporation, 486SLC Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT,
1993.
• IBM Corporation, 486SLC2 Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT,
1993.
• IBM Corporation, 80486DX2 Processor Floating Point Instructions, IBM Corporation, Essex
Junction, VT, 1995.
• IBM Corporation, 80486DX2 Processor BIOS Writer's Guide, IBM Corporation, Essex Junction,
VT, 1995.
• IBM Corporation, Blue Lightning 486DX2 Data Book, IBM Corporation, Essex Junction, VT,
1994.
• Institute of Electrical and Electronics Engineers, IEEE Standard for Binary Floating-Point
Arithmetic, ANSI/IEEE Std 754-1985.
• Institute of Electrical and Electronics Engineers, IEEE Standard for Radix-Independent Floating-
Point Arithmetic, ANSI/IEEE Std 854-1987.
• Muhammad Ali Mazidi and Janice Gillispie Mazidi, 80X86 IBM PC and Compatible Computers,
Prentice-Hall, Englewood Cliffs, NJ, 1997.
• Hans-Peter Messmer, The Indispensable Pentium Book, Addison-Wesley, New York, 1995.
• Karen Miller, An Assembly Language Introduction to Computer Architecture: Using the Intel
Pentium, Oxford University Press, New York, 1999.
• Stephen Morse, Eric Isaacson, and Douglas Albert, The 80386/387 Architecture, John Wiley &
Sons, New York, 1987.
• NexGen Inc., Nx586™ Processor Data Book, NexGen Inc., Milpitas, CA, 1993.
• NexGen Inc., Nx686™ Processor Data Book, NexGen Inc., Milpitas, CA, 1994.
• Bipin Patwardhan, Introduction to the Streaming SIMD Extensions in the Pentium® III,
www.x86.org/articles/sse_pt1/simd1.htm, June 2000.
• Peter Norton, Peter Aitken, and Richard Wilton, PC Programmer’s Bible, Microsoft Press,
Redmond, WA, 1993.
• PharLap 386|ASM Reference Manual, PharLap, Cambridge, MA, 1993.
• PharLap TNT DOS-Extender Reference Manual, PharLap, Cambridge, MA, 1995.
• Sen-Cuo Ro and Sheau-Chuen Her, i386/i486 Advanced Programming, Van Nostrand Reinhold,
New York, 1993.
• Jeffrey P. Royer, Introduction to Protected Mode Programming, course materials for an onsite
class, 1992.
• Tom Shanley, Protected Mode System Architecture, Addison-Wesley, New York, 1996.
• SGS-Thomson Corporation, 80486DX Processor SMM Programming Manual, SGS-Thomson
Corporation, 1995.
• Walter A. Triebel, The 80386DX Microprocessor, Prentice-Hall, Englewood Cliffs, NJ, 1992.
• John Wharton, The Complete x86, MicroDesign Resources, Sebastopol, CA, 1994.
• Web sites and newsgroups:
- www.amd.com
- news.comp.arch
- news.comp.lang.asm.x86
- news.intel.microprocessors
- news.microsoft
1 System-Programming Overview
This entire volume is intended for system-software developers—programmers writing operating
systems, loaders, linkers, device drivers, or utilities that require access to system resources. These
system resources are generally available only to software running at the highest-privilege level
(CPL=0), also referred to as privileged software. Privilege levels and their interactions are fully
described in “Segment-Protection Overview” on page 93.
This chapter introduces the basic features and capabilities of the AMD64 architecture that are available
to system-software developers. The concepts include:
• The supported address forms and how memory is organized.
• How memory-management hardware makes use of the various address forms to access memory.
• The processor operating modes, and how the memory-management hardware supports each of
those modes.
• The system-control registers used to manage system resources.
• The interrupt and exception mechanism, and how it is used to interrupt program execution and to
report errors.
• Additional, miscellaneous features available to system software, including support for hardware
multitasking, reporting machine-check exceptions, debugging software problems, and optimizing
software performance.
Many of the legacy features and capabilities are enhanced by the AMD64 architecture to support 64-
bit operating systems and applications, while providing backward-compatibility with existing
software.
The segment selector specifies an entry in either the global or local descriptor table. The specified
descriptor-table entry describes the segment location in virtual-address space, its size, and other
characteristics. The effective address is used as an offset into the segment specified by the selector.
Logical addresses are often referred to as far pointers. Far pointers are used in software addressing
when the segment reference must be explicit (i.e., a reference to a segment outside the current
segment).
Effective Addresses. The offset into a memory segment is referred to as an effective address (see
“Segmentation” on page 5 for a description of segmented memory). Effective addresses are formed by
adding together elements comprising a base value, a scaled-index value, and a displacement value. The
effective-address computation is represented by the equation
Effective Address = Base + (Scale x Index) + Displacement
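The effective-address equation translates directly into code. This is a minimal sketch with illustrative names; it also truncates the sum to the effective address size, so a 16-bit or 32-bit computation wraps instead of carrying into higher bits. The scale argument is assumed to be 1, 2, 4, or 8, the factors an SIB byte can encode.

```c
#include <stdint.h>

/* Effective Address = Base + (Scale x Index) + Displacement,
 * truncated to addr_size_bits (16, 32, or 64). The displacement is signed;
 * converting it to uint64_t and adding gives the intended modular result. */
uint64_t effective_address(uint64_t base, uint64_t index, unsigned scale,
                           int64_t displacement, unsigned addr_size_bits)
{
    uint64_t ea = base + (uint64_t)scale * index + (uint64_t)displacement;

    if (addr_size_bits < 64)
        ea &= (UINT64_C(1) << addr_size_bits) - 1;   /* wrap at the address size */
    return ea;
}
```

For example, base 1000h, index 4, scale 8, and displacement 10h give 1030h; with a 16-bit address size, base FFF0h plus displacement 20h wraps to 0010h.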
Linear (Virtual) Addresses. The segment-selector portion of a logical address specifies a segment-
descriptor entry in either the global or local descriptor table. The specified segment-descriptor entry
contains the segment-base address, which is the starting location of the segment in linear-address
space. A linear address is formed by adding the segment-base address to the effective address
(segment offset), which creates a reference to any byte location within the supported linear-address
space. Linear addresses are often referred to as virtual addresses, and both terms are used
interchangeably throughout this document.
Linear Address = Segment Base Address + Effective Address
When the flat-memory model is used—as in 64-bit mode—a segment-base address is treated as 0. In
this case, the linear address is identical to the effective address. In long mode, linear addresses must be
in canonical address form, as described in “Canonical Address Form” on page 4.
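The linear-address formula and the canonical-address requirement can be sketched together in C. The function names are illustrative, and the va_bits parameter is an assumption standing in for the number of virtual-address bits an implementation supports (48 in the example values below). An address is canonical when every bit from bit 63 down to the most-significant implemented bit has the same value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear Address = Segment Base Address + Effective Address.
 * To model the flat-memory model, pass segment_base = 0. */
uint64_t linear_address(uint64_t segment_base, uint64_t effective_addr)
{
    return segment_base + effective_addr;
}

/* Canonical check for an implementation with va_bits virtual-address bits
 * (va_bits between 32 and 64): bits 63 through va_bits-1 must be all zeros
 * or all ones, i.e. a sign extension of bit va_bits-1. */
bool is_canonical(uint64_t addr, unsigned va_bits)
{
    uint64_t upper = addr >> (va_bits - 1);                  /* bits 63..va_bits-1 */
    uint64_t all_ones = (UINT64_C(1) << (65 - va_bits)) - 1; /* that many 1 bits */
    return upper == 0 || upper == all_ones;
}
```

With va_bits set to 48, 00007FFF_FFFFFFFFh and FFFF8000_00000000h are canonical, while 00008000_00000000h is not.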
Physical Addresses. A physical address is a reference into the physical-address space, typically
main memory. Physical addresses are translated from virtual addresses using page-translation
mechanisms. See “Paging” on page 7 for information on how the paging mechanism is used for
virtual-address to physical-address translation. When the paging mechanism is not enabled, the virtual
(linear) address is used as the physical address.
Virtual Memory. Software uses virtual addresses to access locations within the virtual-memory
space. System software is responsible for managing the relocation of applications and data in virtual-
memory space using segment-memory management. System software is also responsible for mapping
virtual memory to physical memory through the use of page translation. The AMD64 architecture
supports different virtual-memory sizes using the following address-translation modes:
• Protected Mode—This mode supports 4 gigabytes of virtual-address space using 32-bit virtual
addresses.
• Long Mode—This mode supports 16 exabytes of virtual-address space using 64-bit virtual
addresses.
Physical Memory. Physical addresses are used to directly access main memory. For a particular
computer system, the size of the available physical-address space is equal to the amount of main
memory installed in the system. The maximum amount of physical memory accessible depends on the
processor implementation and on the address-translation mode. The AMD64 architecture supports
varying physical-memory sizes using the following address-translation modes:
• Real-Address Mode—This mode, also called real mode, supports 1 megabyte of physical-address
space using 20-bit physical addresses. This address-translation mode is described in “Real
Addressing” on page 10. Real mode is available only from legacy mode (see “Legacy Modes” on
page 14).
• Legacy Protected Mode—This mode supports several different address-space sizes, depending on
the translation mechanism used and whether extensions to those mechanisms are enabled.
Legacy protected mode supports 4 gigabytes of physical-address space using 32-bit physical
addresses. Both segment translation (see “Segmentation” on page 5) and page translation (see
“Paging” on page 7) can be used to access the physical address space, when the processor is
running in legacy protected mode.
When the physical-address size extensions are enabled (see “Physical-Address Extensions (PAE)
Bit” on page 119), the page-translation mechanism can be extended to support 52-bit physical
addresses. 52-bit physical addresses allow up to 4 petabytes of physical-address space to be
supported. (Currently, the AMD64 architecture supports 40-bit addresses in this mode, allowing up
to 1 terabyte of physical-address space to be supported.)
• Long Mode—This mode is unique to the AMD64 architecture. This mode supports up to 4
petabytes of physical-address space using 52-bit physical addresses. Long mode requires the use of
page-translation and the physical-address size extensions (PAE).
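The address-space sizes quoted in this section follow from powers of two: n address bits reach 2^n bytes. The sketch below (hypothetical helper names) converts an address width into a count of binary units, where each unit is 2^10 times the previous one. It reproduces the figures above: 20 bits give 1 Mbyte, 32 bits give 4 Gbytes, 40 bits give 1 Tbyte, 52 bits give 4 Pbytes, and 64 bits give 16 Ebytes.

```c
#include <stdio.h>

/* Binary units: each is 2^10 = 1,024 times the previous one. */
static const char *const UNIT_NAMES[] = {
    "byte", "Kbyte", "Mbyte", "Gbyte", "Tbyte", "Pbyte", "Ebyte"
};

/* Splits 2^addr_bits bytes into *count units and returns the unit index.
 * Only the power of two left over after choosing the unit is computed,
 * so a 64-bit address space (16 Ebytes) does not overflow. */
unsigned space_unit(unsigned addr_bits, unsigned long long *count)
{
    unsigned unit = addr_bits / 10;
    if (unit > 6)
        unit = 6;                        /* Ebyte is the largest unit listed */
    *count = 1ULL << (addr_bits - unit * 10);
    return unit;
}

/* Prints, for example, "52-bit addresses: 4 Pbytes". */
void print_space(unsigned addr_bits)
{
    unsigned long long count;
    unsigned unit = space_unit(addr_bits, &count);
    printf("%u-bit addresses: %llu %s%s\n", addr_bits, count,
           UNIT_NAMES[unit], count == 1 ? "" : "s");
}
```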
1.2.1 Segmentation
Segmentation was originally created as a method by which system software could isolate software
processes (tasks), and the data used by those processes, from one another in an effort to increase the
reliability of systems running multiple processes simultaneously.
The AMD64 architecture is designed to support all forms of legacy segmentation. However, most
modern system software does not use the segmentation features available in the legacy x86
architecture. Instead, system software typically handles program and data isolation using page-level
protection. For this reason, the AMD64 architecture dispenses with multiple segments in 64-bit mode
and, instead, uses a flat-memory model. The elimination of segmentation allows new 64-bit system
software to be coded more simply, and it supports more efficient management of multi-processing than
is possible in the legacy x86 architecture.
Segmentation is, however, used in compatibility mode and legacy mode. Here, segmentation is a form
of base memory-addressing that allows software and data to be relocated in virtual-address space relative to
an arbitrary base address. Software and data can be relocated in virtual-address space using one or
more variable-sized memory segments. The legacy x86 architecture provides several methods of
restricting access to segments from other segments so that software and data can be protected from
interfering with each other.
In compatibility and legacy modes, up to 16,383 unique segments can be defined. The base-address
value, segment size (called a limit), protection, and other attributes for each segment are contained in a
data structure called a segment descriptor. Collections of segment descriptors are held in descriptor
tables. Specific segment descriptors are referenced or selected from the descriptor table using a
segment selector register. Six segment-selector registers are available, providing access to as many as
six segments at a time.
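A segment selector packs three fields into 16 bits: a 13-bit descriptor index (bits 15:3), a table indicator that chooses the GDT or the LDT (bit 2), and a requested privilege level (bits 1:0). The following sketch (illustrative names) decodes those fields. It also shows one way to account for the 16,383 figure, assuming one GDT and one LDT: each table can hold 2^13 = 8,192 descriptors, and GDT entry 0 is reserved as the null descriptor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 16-bit segment selector. */
struct selector_fields {
    uint16_t index;   /* bits 15:3, descriptor index within the table */
    bool     ldt;     /* bit 2, table indicator: false = GDT, true = LDT */
    uint8_t  rpl;     /* bits 1:0, requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ldt   = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}

/* 13 index bits give 8,192 entries per table. Excluding the GDT null
 * descriptor, one GDT plus one LDT can describe 8,191 + 8,192 segments. */
enum { ENTRIES_PER_TABLE = 1 << 13,
       MAX_SEGMENTS = (ENTRIES_PER_TABLE - 1) + ENTRIES_PER_TABLE };
```

For example, selector 002Bh decodes to GDT index 5 with RPL 3, and MAX_SEGMENTS evaluates to 16,383.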
Figure 1-1 on page 6 shows an example of segmented memory. Segmentation is described in
Chapter 4, “Segmented Virtual Memory.”
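Figure 1-1. Segmented-memory model. A selector in CS, DS, ES, FS, GS, or SS references a segment descriptor in a descriptor table; the descriptor supplies the segment's base and limit in virtual-address space, and the effective address is the offset within that segment.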
Flat Segmentation. One special case of segmented memory is the flat-memory model. In the legacy
flat-memory model, all segment-base addresses have a value of 0, and the segment limits are fixed at
4 Gbytes. Segmentation cannot be disabled, but use of the flat-memory model effectively disables
segment translation. The result is a virtual address that equals the effective address. Figure 1-2 on
page 7 shows an example of the flat-memory model.
Software running in 64-bit mode automatically uses the flat-memory model. In 64-bit mode, the
segment base is treated as if it were 0, and the segment limit is ignored. This allows effective
addresses to access the full virtual-address space supported by the processor.
Figure 1-2. Flat-memory model. A single flat segment covers the virtual-address space.
1.2.2 Paging
Paging allows software and data to be relocated in physical-address space using fixed-size blocks
called physical pages. The legacy x86 architecture supports three different physical-page sizes of
4 Kbytes, 2 Mbytes, and 4 Mbytes. As with segment translation, access to physical pages by lesser-
privileged software can be restricted.
Page translation uses a hierarchical data structure called a page-translation table to translate virtual
pages into physical pages. The number of levels in the translation-table hierarchy can be as few as one
or as many as four, depending on the physical-page size and processor operating mode. Translation
tables are aligned on 4-Kbyte boundaries. Physical pages must be aligned on 4-Kbyte, 2-Mbyte, or 4-
Mbyte boundaries, depending on the physical-page size.
Each table in the translation hierarchy is indexed by a portion of the virtual-address bits. The entry
referenced by the table index contains a pointer to the base address of the next-lower-level table in the
translation hierarchy. In the case of the lowest-level table, its entry points to the physical-page base
address. The physical page is then indexed by the least-significant bits of the virtual address to yield
the physical address.
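In long mode with 4-Kbyte pages, the indexing described above uses four table levels, each indexed by nine virtual-address bits, while the low 12 bits select a byte within the page. The sketch below assumes that configuration (other page sizes and the legacy modes split the address differently) and uses illustrative names; it only extracts the indices and combines a page base with the offset, without reading any translation tables.

```c
#include <stdint.h>

/* Table indices for a long-mode virtual address translated with 4-Kbyte
 * pages (assumed). Each 9-bit index selects one of 512 8-byte entries in
 * a 4-Kbyte table. */
struct va_parts {
    unsigned pml4;    /* bits 47:39, top-level table */
    unsigned pdp;     /* bits 38:30 */
    unsigned pd;      /* bits 29:21 */
    unsigned pt;      /* bits 20:12, lowest-level table */
    unsigned offset;  /* bits 11:0, byte within the 4-Kbyte page */
};

struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (va >> 39) & 0x1FF;
    p.pdp    = (va >> 30) & 0x1FF;
    p.pd     = (va >> 21) & 0x1FF;
    p.pt     = (va >> 12) & 0x1FF;
    p.offset = va & 0xFFF;
    return p;
}

/* Final step: the lowest-level entry supplies the 4-Kbyte-aligned
 * physical-page base, and the low 12 virtual-address bits index into it. */
uint64_t physical_address(uint64_t page_base, uint64_t va)
{
    return (page_base & ~UINT64_C(0xFFF)) | (va & 0xFFF);
}
```

For example, virtual address 403123h splits into indices 0, 0, 2, and 3 with byte offset 123h; if the lowest-level entry points at page base 12345000h, the physical address is 12345123h.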
Figure 1-3 on page 8 shows an example of paged memory with three levels in the translation-table
hierarchy. Paging is described in Chapter 5, “Page Translation and Protection.”
Figure 1-3. Paged-memory model. Pages in virtual-address space are mapped to pages in physical-address space.
The simplest, most efficient method of memory management is the flat-memory model. In the flat-
memory model, all segment base addresses have a value of 0 and the segment limits are fixed at 4
Gbytes. The segmentation mechanism is still used each time a memory reference is made, but because
virtual addresses are identical to effective addresses in this model, the segmentation mechanism is
effectively ignored. Translation of virtual (or effective) addresses to physical addresses takes place
using the paging mechanism only.
Because 64-bit mode disables segmentation, it uses a flat, paged-memory model for memory
management. The 4 Gbyte segment limit is ignored in 64-bit mode. Figure 1-4 shows an example of
this model.
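Figure 1-4. 64-bit flat, paged-memory model. The effective address is used as the virtual address within a flat segment, and paging maps it to a page frame at a physical address.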
Figure 1-5. Real-address memory model. A 16-bit segment selector (CS, DS, ES, FS, GS, or SS), shifted left four bits, is added to the effective address to form a 20-bit physical address.
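Figure 1-6. Operating modes of the AMD64 architecture. Within long mode, CS.L=1 selects 64-bit mode and CS.L=0 selects compatibility mode. Setting CR0.PE (CR0.PE=1) leaves real mode for protected mode, and clearing it (CR0.PE=0) returns to real mode. SMI# enters system-management mode and RSM exits it. Reset places the processor in real mode.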
Long Mode Activation.” Long mode features are described throughout this document, where
applicable.
Real Mode. In this mode, also called real-address mode, the processor supports a physical-memory
space of 1 Mbyte and operand sizes of 16 bits (default) or 32 bits (with instruction prefixes). Interrupt
handling and address generation are nearly identical to the 80286 processor's real mode. Paging is not
supported. All software runs at privilege level 0.
Real mode is entered after reset or processor power-up. The mode is not supported when the processor
is operating in long mode because long mode requires that paged protected mode be enabled.
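Real-mode address generation reduces to one expression: the 16-bit segment value is multiplied by 16 (shifted left four bits) and added to the 16-bit effective address, giving a 20-bit physical address. The sketch below (illustrative name) assumes 8086-style wraparound at 1 Mbyte, that is, with the A20 address line masked; with A20 enabled, a real-mode address can extend slightly past 1 Mbyte, which this sketch does not model.

```c
#include <stdint.h>

/* Real-mode physical address = segment * 16 + offset, masked to 20 bits
 * (assumes wraparound at 1 Mbyte, as on the 8086). */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFF;
}
```

For example, 1234h:0010h maps to 12350h, and FFFFh:0010h wraps to 00000h.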
Protected Mode. In this mode, the processor supports virtual-memory and physical-memory spaces
of 4 Gbytes and operand sizes of 16 or 32 bits. All segment translation, segment protection, and
hardware multitasking functions are available. System software can use segmentation to relocate
effective addresses in virtual-address space. If paging is not enabled, virtual addresses are equal to
physical addresses. Paging can be optionally enabled to allow translation of virtual addresses to
physical addresses and to use the page-based memory-protection mechanisms.
In protected mode, software runs at privilege levels 0, 1, 2, or 3. Typically, application software runs at
privilege level 3, system software runs at privilege levels 0 and 1, and privilege level 2 is available
to system software for other uses. The 16-bit version of this mode was first introduced in the 80286
processor.
Virtual-8086 Mode. Virtual-8086 mode allows system software to run 16-bit real-mode software on a
virtualized-8086 processor. In this mode, software written for the 8086, 8088, 80186, or 80188
processor can run as a privilege-level-3 task under protected mode. The processor supports a
virtual-memory space of 1 Mbyte and operand sizes of 16 bits (default) or 32 bits (with instruction prefixes),
and it uses real-mode address translation.
Virtual-8086 mode is enabled by setting the virtual-machine bit in the EFLAGS register
(EFLAGS.VM). EFLAGS.VM can only be set or cleared when the EFLAGS register is loaded from
the TSS as a result of a task switch, or by executing an IRET instruction from privileged software. The
POPF instruction cannot be used to set or clear the EFLAGS.VM bit.
Virtual-8086 mode is not supported when the processor is operating in long mode. When long mode is
enabled, any attempt to enable virtual-8086 mode is silently ignored.
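The conditions that select each operating mode can be collected into a single decision. The following C sketch is illustrative (the enum and function names are hypothetical) and ignores system-management mode. It assumes the long-mode-active bit EFER.LMA as the indicator that long mode is in effect: CR0.PE=0 means real mode; with EFER.LMA set, CS.L chooses 64-bit mode or compatibility mode and EFLAGS.VM has no effect; otherwise EFLAGS.VM chooses between virtual-8086 mode and protected mode.

```c
#include <stdbool.h>

enum cpu_mode { REAL_MODE, PROTECTED_MODE, VIRTUAL_8086_MODE,
                COMPATIBILITY_MODE, MODE_64BIT };

/* Classifies the current mode from CR0.PE, EFLAGS.VM, EFER.LMA, and the
 * L bit of the current code-segment descriptor (CS.L). */
enum cpu_mode classify_mode(bool cr0_pe, bool eflags_vm, bool efer_lma, bool cs_l)
{
    if (!cr0_pe)
        return REAL_MODE;                 /* real mode exists only in legacy mode */
    if (efer_lma)                         /* long mode: virtual-8086 mode unavailable */
        return cs_l ? MODE_64BIT : COMPATIBILITY_MODE;
    return eflags_vm ? VIRTUAL_8086_MODE : PROTECTED_MODE;
}
```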
24593—Rev. 3.14—September 2007 AMD64 Technology
Also defined as system registers are a number of model-specific registers included in the AMD64
architectural definition, and shown in Figure 1-7:
• Extended-Feature-Enable Register—The EFER register is used to enable and report status on
special features not controlled by the CRn control registers. In particular, EFER is used to control
activation of long mode. See “Extended Feature Enable Register (EFER)” on page 54 for more
information.
interrupt handlers. System software must initialize the global-descriptor and interrupt-descriptor
tables, while use of the local-descriptor table is optional. See “Descriptor Tables” on page 71 for
more information.
• Task-State Segment—The task-state segment is a special segment for holding processor-state
information for a specific program, or task. It also contains the stack pointers used when switching
to more-privileged programs. The hardware multitasking mechanism uses the state information in
the segment when suspending and resuming a task. Calls and interrupts that switch stacks cause the
stack pointers to be read from the task-state segment. System software must create at least one
task-state segment, even if hardware multitasking is not used. See “Legacy Task-State Segment”
on page 313, and “64-Bit Task State Segment” on page 317 for details.
• Page-Translation Tables—Use of page translation is optional in protected mode, but it is required
in long mode. A four-level page-translation data structure is provided to allow long-mode
operating systems to translate a 64-bit virtual-address space into a 52-bit physical-address space.
Legacy protected mode can use two- or three-level page-translation data structures. See “Page
Translation Overview” on page 115 for more information on page translation.
1.6 Interrupts
The AMD64 architecture provides a mechanism for the processor to automatically suspend (interrupt)
software execution and transfer control to an interrupt handler when an interrupt or exception occurs.
An interrupt handler is privileged software designed to identify and respond to the cause of an interrupt
or exception, and return control to the interrupted software. Interrupts can be caused when
system hardware signals an interrupt condition using one of the external-interrupt signals on the
processor. Interrupts can also be caused by software that executes an interrupt instruction. Exceptions
occur when the processor detects an abnormal condition as a result of executing an instruction. The
term “interrupts” as used throughout this volume includes both interrupts and exceptions when the
distinction is unnecessary.
System software not only sets up the interrupt handlers, but it must also create and initialize the data
structures the processor uses to execute an interrupt handler when an interrupt occurs. The data
structures include the code-segment descriptors for the interrupt-handler software and any data-
segment descriptors for data and stack accesses. Interrupt-gate descriptors must also be supplied.
Interrupt gates point to interrupt-handler code-segment descriptors, and the entry point in an interrupt
handler. Interrupt gates are stored in the interrupt-descriptor table. The code-segment and data-
segment descriptors are stored in the global-descriptor table and, optionally, the local-descriptor table.
When an interrupt occurs, the processor uses the interrupt vector to find the appropriate interrupt gate
in the interrupt-descriptor table. The gate points to the interrupt-handler code segment and entry point,
and the processor transfers control to that location. Before invoking the interrupt handler, the processor
saves information required to return to the interrupted program. For details on how the processor
transfers control to interrupt handlers, see “Legacy Protected-Mode Interrupt Control Transfers” on
page 231, and “Long-Mode Interrupt Control Transfers” on page 241.
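The vector-to-gate lookup described above amounts to indexing the interrupt-descriptor table by the vector number. The C sketch below assumes 8-byte gates in legacy mode and 16-byte gates in long mode, and treats the IDT limit as the offset of the last valid byte; the helper name `idt_gate_address` is hypothetical. On hardware, a gate that lies beyond the limit causes a general-protection exception rather than a `false` return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the linear address of the gate for `vector`
 * in an IDT described by a base and limit (as loaded by LIDT).
 * Gate size: 8 bytes in legacy mode, 16 bytes in long mode.
 * Returns false if any byte of the gate lies beyond the table limit. */
static bool idt_gate_address(uint64_t idt_base, uint16_t idt_limit,
                             uint8_t vector, bool long_mode,
                             uint64_t *gate_addr)
{
    uint32_t gate_size = long_mode ? 16u : 8u;
    uint32_t offset = (uint32_t)vector * gate_size;

    /* The limit is the offset of the last valid byte of the table. */
    if (offset + gate_size - 1u > idt_limit)
        return false;

    *gate_addr = idt_base + offset;
    return true;
}
```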
Table 1-2 shows the supported interrupts and exceptions, ordered by their vector number. Refer to
“Vectors” on page 208 for a complete description of each interrupt, and a description of the interrupt
mechanism.
Support for hardware multitasking is provided by implementations of the AMD64 architecture when
software is running in legacy mode. Hardware multitasking provides automated mechanisms for
switching tasks, saving the execution state of the suspended task, and restoring the execution state of
the resumed task. When hardware multitasking is used to switch tasks, the processor takes the
following actions:
• The processor automatically suspends execution of the task, allowing any executing instructions to
complete and save their results.
• The execution state of a task is saved in the task TSS.
• The execution state of a new task is loaded into the processor from its TSS.
• The processor begins executing the new task at the location specified in the new task TSS.
Use of hardware-multitasking features is optional in legacy mode. Generally, modern operating
systems do not use the hardware-multitasking features, and instead perform task management entirely
in software. Long mode does not support hardware multitasking at all.
Whether hardware multitasking is used or not, system software must create and initialize at least one
task-state segment data-structure. This requirement holds for both long-mode and legacy-mode
software. The single task-state segment holds critical pieces of the task execution environment and is
referenced during certain control transfers.
Detailed information on hardware multitasking is available in Chapter 12, “Task Management,” along
with a full description of the requirements that must be met in initializing a task-state segment when
hardware multitasking is not used.
64-Bit Mode. 64-bit mode provides full support for 64-bit system software and applications. The new
features introduced in support of 64-bit mode are summarized throughout this chapter. To use 64-bit
mode, a 64-bit operating system and tool chain are required.
Compatibility Mode. Compatibility mode allows 64-bit operating systems to implement binary
compatibility with existing 16-bit and 32-bit x86 applications. It allows these applications to run,
without recompilation, under control of a 64-bit operating system in long mode. The architectural
enhancements introduced by the AMD64 architecture that support compatibility mode are
summarized throughout this chapter.
Unsupported Modes. Long mode does not support the following two operating modes:
• Virtual-8086 Mode—The virtual-8086 mode bit (EFLAGS.VM) is ignored when the processor is
running in long mode. When long mode is enabled, any attempt to enable virtual-8086 mode is
silently ignored. System software must leave long mode in order to use virtual-8086 mode.
• Real Mode—Real mode is not supported when the processor is operating in long mode because
long mode requires that protected mode be enabled.
systems. Legacy mode supports real mode, protected mode, and virtual-8086 mode. A reset always
places the processor in legacy mode (real mode), and the processor continues to run in legacy mode
until system software activates long mode. New features added by the AMD64 architecture that are
supported in legacy mode are summarized in this chapter.
Page-Size Extensions (PSE). Page-size extensions (CR4.PSE) are ignored in long mode. Long
mode does not support the 4-Mbyte page size enabled by page-size extensions. Long mode does,
however, support 4-Kbyte and 2-Mbyte page sizes.
Paging Data Structures. The AMD64 architecture extends the page-translation data structures in
support of long mode. The extensions are:
• Page-map level-4 (PML4)—Long mode defines a new page-translation data structure, the PML4
table. The PML4 table sits at the top of the page-translation hierarchy and references PDP tables.
• Page-directory pointer (PDP)—The PDP tables in long mode are expanded from 4 entries to 512
entries each.
• Page-directory pointer entry (PDPE)—Previously undefined fields within the legacy-mode PDPE
are defined by the AMD64 architecture.
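Because each long-mode table holds 512 entries, every level of the hierarchy is indexed by 9 bits of the virtual address. The C sketch below splits an address for 4-Kbyte pages, assuming a 48-bit implemented virtual-address width; `struct va_parts` and `split_va` are hypothetical names. With 2-Mbyte pages, the page-table level is skipped and the low 21 bits form the page offset instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decomposition of a virtual address for long-mode
 * 4-Kbyte paging: four 9-bit table indices plus a 12-bit byte offset. */
struct va_parts {
    unsigned pml4;   /* bits 47:39, index into the PML4 table */
    unsigned pdp;    /* bits 38:30, index into the PDP table  */
    unsigned pd;     /* bits 29:21, index into the page directory */
    unsigned pt;     /* bits 20:12, index into the page table */
    unsigned offset; /* bits 11:0, byte offset within the 4-Kbyte page */
};

static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdp    = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```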
CR3 Register. The CR3 register is expanded to 64 bits for use in long-mode page translation. When
long mode is active, the CR3 register references the base address of the PML4 table. In legacy mode,
the upper 32 bits of CR3 are masked by the processor to support legacy page translation. CR3
references the PDP base-address when physical-address extensions are enabled, or the page-directory
table base-address when physical-address extensions are disabled.
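Which bits of CR3 form the table base depends on the paging mode. The C sketch below applies masks taken from the CR3 layouts in Chapter 3: bits 31:12 for the legacy page directory, bits 31:5 for the legacy PAE page-directory-pointer table, and bits 51:12 for the long-mode PML4 table. The enum and function names are hypothetical, and the sketch ignores the fact that an implementation may support fewer than 52 physical-address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical paging-mode selector for this sketch. */
enum paging_mode { LEGACY_NON_PAE, LEGACY_PAE, LONG_MODE };

/* Extract the physical base address of the top-level table from CR3.
 * Legacy-mode masks keep only the low 32 bits, matching the masking of
 * the upper CR3 bits in legacy mode. */
static uint64_t cr3_table_base(uint64_t cr3, enum paging_mode mode)
{
    switch (mode) {
    case LEGACY_NON_PAE: return cr3 & 0x00000000FFFFF000ull; /* bits 31:12 */
    case LEGACY_PAE:     return cr3 & 0x00000000FFFFFFE0ull; /* bits 31:5  */
    case LONG_MODE:      return cr3 & 0x000FFFFFFFFFF000ull; /* bits 51:12 */
    }
    return 0;
}
```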
• PSE—The use of page-size extensions allows legacy mode software to define 4-Mbyte pages using
the 32-bit page-translation tables. When page-size extensions are enabled (CR4.PSE=1), the
AMD64 architecture enhances the 4-Mbyte PDE to support 40 physical-address bits.
See “Legacy-Mode Page Translation” on page 120 for more information on these enhancements.
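To show how a 4-Mbyte PDE can carry 40 physical-address bits, the C sketch below assumes the AMD64 non-PAE 4-Mbyte PDE layout in which PDE bits 31:22 hold physical-address bits 31:22 and PDE bits 20:13 hold physical-address bits 39:32; the low 22 bits of the virtual address are the offset within the page. The function name `pse40_physical_address` is hypothetical, and present, size, and permission bits are not checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: physical address for a 4-Mbyte page (CR4.PSE=1,
 * non-PAE), assuming PDE[31:22] = PA[31:22] and PDE[20:13] = PA[39:32]. */
static uint64_t pse40_physical_address(uint32_t pde, uint32_t va)
{
    uint64_t base_lo = pde & 0xFFC00000u;                      /* PA[31:22] */
    uint64_t base_hi = (uint64_t)((pde >> 13) & 0xFFu) << 32;  /* PA[39:32] */
    return base_hi | base_lo | (va & 0x003FFFFFu);             /* + offset  */
}
```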
2.2.3 Segmentation
In long mode, the effects of segmentation depend on whether the processor is running in compatibility
mode or 64-bit mode:
• In compatibility mode, segmentation functions just as it does in legacy mode, using legacy 16-bit
or 32-bit protected mode semantics.
• 64-bit mode requires a flat-memory model for creating a flat 64-bit virtual-address space. Much of
the segmentation capability present in legacy mode and compatibility mode is disabled when the
processor is running in 64-bit mode.
The differences in the segmentation model as defined by the AMD64 architecture are summarized in
the following sections. See Chapter 4, “Segmented Virtual Memory,” for a thorough description of
these differences.
Descriptor-Table Registers. In long mode, the base-address portion of the descriptor-table registers
(GDTR, IDTR, LDTR, and TR) is expanded to 64 bits. The full 64-bit base address can only be
loaded by software when the processor is running in 64-bit mode (using the LGDT, LIDT, LLDT, and
LTR instructions, respectively). However, the full 64-bit base address is used by a processor running in
compatibility mode (in addition to 64-bit mode) when making a reference into a descriptor table.
A processor running in legacy mode can only load the low 32 bits of the base address, and the high 32
bits are ignored when references are made to the descriptor tables.
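For LGDT and LIDT, the memory operand is a 16-bit limit followed by the base address, so in 64-bit mode the operand is 10 bytes long. The C sketch below models that operand; `struct pseudo_descriptor64` and `make_pseudo_descriptor` are hypothetical names, and `__attribute__((packed))` is a GCC/Clang extension used to suppress the padding a compiler would otherwise insert after `limit`. LLDT and LTR take a selector instead, so they are not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 64-bit-mode LGDT/LIDT memory operand. */
struct __attribute__((packed)) pseudo_descriptor64 {
    uint16_t limit; /* offset of the last valid byte in the table */
    uint64_t base;  /* 64-bit linear base address of the table    */
};

_Static_assert(sizeof(struct pseudo_descriptor64) == 10, "10-byte operand");
_Static_assert(offsetof(struct pseudo_descriptor64, base) == 2, "base at offset 2");

/* Build the operand for a table occupying `table_bytes` bytes. */
static struct pseudo_descriptor64 make_pseudo_descriptor(uint64_t table_base,
                                                         uint32_t table_bytes)
{
    assert(table_bytes >= 1 && table_bytes <= 0x10000u);
    struct pseudo_descriptor64 d;
    d.limit = (uint16_t)(table_bytes - 1u); /* limit = size - 1 */
    d.base = table_base;
    return d;
}
```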
Data-Segment Descriptors. The following differences exist for data-segment descriptors in 64-bit
mode only:
• The DS, ES, and SS descriptor base-address fields are ignored by the processor.
• The FS and GS descriptor base-address fields are expanded to 64 bits and used in effective-address
calculations. The 64 bits of base address are mapped to model-specific registers (MSRs), and can
only be loaded using the WRMSR instruction.
• The limit fields and attribute fields of all data-segment descriptors (DS, ES, FS, GS, and SS) are
ignored by the processor.
In compatibility mode, the processor treats data-segment descriptors as it does in legacy mode.
Compatibility mode ignores the high 32 bits of base address in the FS and GS segment descriptors
when calculating an effective address.
System-Segment Descriptors. In 64-bit mode only, the LDT and TSS system-segment descriptor
formats are expanded by 64 bits, allowing them to hold 64-bit base addresses. LLDT and LTR
instructions can be used to load these descriptors into the LDTR and TR registers, respectively, from
64-bit mode.
In compatibility mode and legacy mode, the formats of the LDT and TSS system-segment descriptors
are unchanged. Also, unlike code-segment and data-segment descriptors, system-segment descriptor
limits are checked by the processor in long mode.
Some legacy mode LDT and TSS type-field encodings are illegal in long mode (both compatibility
mode and 64-bit mode), and others are redefined to new types. See “System Descriptors” on page 88
for additional information.
Gate Descriptors. The following differences exist between gate descriptors in long mode (both
compatibility mode and 64-bit mode) and in legacy mode:
• In long mode, all 32-bit gate descriptors are redefined as 64-bit gate descriptors, and are expanded
to hold 64-bit offsets. The length of a gate descriptor in long mode is therefore 128 bits (16 bytes),
versus the 64 bits (8 bytes) in legacy mode.
• Some type-field encodings are illegal in long mode, and others are redefined to new types. See
“Gate Descriptors” on page 90 for additional information.
• The interrupt-gate and trap-gate descriptors define a new field, called the interrupt-stack table
(IST) field.
• Code and data segments used in 64-bit mode are treated as both readable and writable.
See “Page-Protection Checks” on page 142 and “Segment-Protection Overview” on page 93 for
detailed information on the protection-check changes.
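The 16-byte long-mode gate can be pictured as the legacy 8-byte gate followed by a dword holding offset bits 63:32 and a reserved dword. The C sketch below assumes the usual layout (offset 15:0, selector, IST in bits 2:0 of byte 4, type/DPL/P in byte 5, offset 31:16, offset 63:32, reserved) and the type values 0xE for a 64-bit interrupt gate and 0xF for a 64-bit trap gate. The struct and function names are hypothetical; for a call gate (type 0xC), byte 4 has no IST meaning and should be left zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 16-byte long-mode gate descriptor. */
struct gate64 {
    uint16_t offset_15_0;
    uint16_t selector;     /* target code-segment selector            */
    uint8_t  ist;          /* bits 2:0: IST index (0 = no IST switch) */
    uint8_t  type_attr;    /* bits 3:0 type, bits 6:5 DPL, bit 7 P    */
    uint16_t offset_31_16;
    uint32_t offset_63_32;
    uint32_t reserved;     /* must be zero */
};
_Static_assert(sizeof(struct gate64) == 16, "long-mode gates are 16 bytes");

static struct gate64 make_gate64(uint64_t handler, uint16_t selector,
                                 uint8_t type, uint8_t dpl, uint8_t ist)
{
    assert(ist <= 7 && dpl <= 3 && type <= 0xF);
    struct gate64 g;
    g.offset_15_0  = (uint16_t)(handler & 0xFFFF);
    g.selector     = selector;
    g.ist          = ist;
    g.type_attr    = (uint8_t)(0x80u | ((unsigned)dpl << 5) | type); /* P=1 */
    g.offset_31_16 = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_63_32 = (uint32_t)(handler >> 32);
    g.reserved     = 0;
    return g;
}

/* Reassemble the 64-bit target offset from the three offset fields. */
static uint64_t gate64_offset(const struct gate64 *g)
{
    return ((uint64_t)g->offset_63_32 << 32) |
           ((uint64_t)g->offset_31_16 << 16) |
           g->offset_15_0;
}
```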
2.4 Registers
The AMD64 architecture adds new registers to the architecture, and in many cases expands the
size of existing registers to 64 bits. The 80-bit floating-point stack registers and their overlaid 64-bit
MMX™ registers are not modified by the AMD64 architecture.
• Specify additional control registers. One additional control register, CR8, is defined in 64-bit
mode.
• Specify additional debug registers (although none are currently defined).
Not all instructions require a REX prefix. The prefix is necessary only if an instruction references one
of the extended registers or uses a 64-bit operand. If a REX prefix is used when it has no meaning, it is
ignored.
Default 64-Bit Operand Size. In 64-bit mode, two groups of instructions have a default operand size
of 64 bits and thus do not need a REX prefix for this operand size:
• Near branches.
• All instructions, except far branches, that implicitly reference the RSP. See “Instructions that
Reference RSP” on page 31 for additional information.
Operand-Size Overrides. In 64-bit mode, the default operand size is 32 bits. A REX prefix can be
used to specify a 64-bit operand size. Software uses a legacy operand-size (66h) prefix to toggle to 16-
bit operand size. The REX prefix takes precedence over the legacy operand-size prefix.
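The override rules above can be expressed as a small decision function. The C sketch below builds a REX byte (0100WRXB) and resolves the operand size for instructions whose 64-bit-mode default is 32 bits: REX.W=1 selects 64 bits regardless of a 66h prefix, and otherwise 66h selects 16 bits. The function names are hypothetical, and instructions that default to 64 bits (near branches and implicit RSP references) are deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode a REX prefix byte, 0100WRXB. */
static uint8_t rex_prefix(bool w, bool r, bool x, bool b)
{
    return (uint8_t)(0x40u | ((unsigned)w << 3) | ((unsigned)r << 2) |
                     ((unsigned)x << 1) | (unsigned)b);
}

/* Effective operand size in 64-bit mode for a default-32-bit instruction.
 * `rex` is 0 when no REX prefix is present. */
static unsigned operand_size_64bit_mode(bool has_66h, uint8_t rex)
{
    bool rex_present = (rex & 0xF0u) == 0x40u;
    if (rex_present && (rex & 0x08u)) /* REX.W wins over 66h */
        return 64;
    return has_66h ? 16 : 32;
}
```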
Zero Extension of Results. In 64-bit mode, when performing 32-bit operations with a GPR
destination, the processor zero-extends the 32-bit result into the full 64-bit destination. Both 8-bit and
16-bit operations on GPRs preserve all unwritten upper bits of the destination GPR. This is consistent
with legacy 16-bit and 32-bit semantics for partial-width results.
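The asymmetry between 32-bit results and narrower results is easy to misremember, so the C sketch below models it on plain integers rather than real registers. The function name `gpr_write` is hypothetical; only low-byte 8-bit writes (such as AL) are modeled, not the legacy high-byte registers (such as AH).

```c
#include <assert.h>
#include <stdint.h>

/* Model of merging a result of `width` bits into a 64-bit GPR in
 * 64-bit mode: 32-bit results are zero-extended, 8-bit and 16-bit
 * results preserve the unwritten upper bits, 64-bit results replace. */
static uint64_t gpr_write(uint64_t old, uint64_t result, unsigned width)
{
    switch (width) {
    case 8:  return (old & ~0xFFull)   | (result & 0xFFull);
    case 16: return (old & ~0xFFFFull) | (result & 0xFFFFull);
    case 32: return result & 0xFFFFFFFFull; /* upper 32 bits cleared */
    case 64: return result;
    default: assert(!"unsupported width"); return old;
    }
}
```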
Address-Size Overrides. In 64-bit mode, the default-address size is 64 bits. The address size can be
overridden to 32 bits by using the address-size prefix (67h). 16-bit addresses are not supported in 64-
bit mode. In compatibility mode and legacy mode, address-size overrides function the same as in x86
legacy architecture.
Displacements and Immediates. Generally, displacement and immediate values in 64-bit mode are
not extended to 64 bits. They are still limited to 32 bits and are sign extended during effective-address
calculations. In 64-bit mode, however, support is provided for some 64-bit displacement and
immediate forms of the MOV instruction.
Zero Extending 16-Bit and 32-Bit Addresses. All 16-bit and 32-bit address calculations are zero-
extended in long mode to form 64-bit addresses. Address calculations are first truncated to the
effective-address size of the current mode (64-bit mode or compatibility mode), as overridden by any
address-size prefix. The result is then zero-extended to the full 64-bit address width.
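Taken together, the displacement and zero-extension rules give a simple recipe: sign-extend the 32-bit displacement, add, truncate to the effective address size, then zero-extend to 64 bits. The C sketch below follows that recipe; `effective_address` is a hypothetical name, segment bases are ignored (as for DS, ES, and SS in 64-bit mode), and the 16-bit case models only the truncation, not the restricted base/index combinations of real 16-bit addressing.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of long-mode address formation from base + index*scale + disp32.
 * Unsigned 64-bit arithmetic wraps modulo 2^64, like the hardware adder. */
static uint64_t effective_address(uint64_t base, uint64_t index, unsigned scale,
                                  int32_t disp32, unsigned addr_size)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Sign-extend the 32-bit displacement to 64 bits before adding. */
    uint64_t ea = base + index * scale + (uint64_t)(int64_t)disp32;

    /* Truncate to the address size; the masked value is zero-extended. */
    switch (addr_size) {
    case 64: return ea;
    case 32: return ea & 0xFFFFFFFFull; /* e.g. 67h prefix in 64-bit mode */
    case 16: return ea & 0xFFFFull;     /* compatibility mode only        */
    default: assert(!"bad address size"); return ea;
    }
}
```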
2.5.6 Branches
The AMD64 architecture expands two branching mechanisms to accommodate branches in the full 64-
bit virtual-address space:
• In 64-bit mode, near-branch semantics are redefined.
• In both 64-bit and compatibility modes, a 64-bit call-gate descriptor is defined for far calls.
In addition, enhancements are made to the legacy SYSCALL and SYSRET instructions.
Near Branches. In 64-bit mode, the operand size for all near branches defaults to 64 bits (see
Table 2-2 for a listing). Therefore, these instructions update the full 64-bit RIP without the need for a
REX operand-size prefix. The following aspects of near branches default to 64 bits:
• Truncation of the instruction pointer.
• Size of a stack pop or stack push, resulting from a CALL or RET.
• Size of a stack-pointer increment or decrement, resulting from a CALL or RET.
• Size of the operand fetched by an indirect branch.
The operand size for near branches can be overridden to 16 bits in 64-bit mode.
Table 2-2. 64-Bit Mode Near Branches, Default 64-Bit Operand Size
Mnemonic   Opcode (hex)   Description
CALL       E8, FF/2       Call Procedure Near
Jcc        many           Jump Conditional Near
JMP        E9, EB, FF/4   Jump Near
LOOP       E2             Loop
LOOPcc     E0, E1         Loop Conditional
RET        C3, C2         Return From Call (near)
The address size of near branches is not forced in 64-bit mode. Such addresses are 64 bits by default,
but they can be overridden to 32 bits by a prefix.
The size of the displacement field for relative branches is still limited to 32 bits.
Far Branches Through Long-Mode Call Gates. Long mode redefines the 32-bit call-gate
descriptor type as a 64-bit call-gate descriptor and expands the call-gate descriptor size to hold a 64-bit
offset. The long-mode call-gate descriptor allows far branches to reference any location in the
supported virtual-address space. In long mode, the call-gate mechanism is changed as follows:
• In long mode, CALL and JMP instructions that reference call-gates must reference 64-bit call
gates.
• A 64-bit call-gate descriptor must reference a 64-bit code-segment.
• When a control transfer is made through a 64-bit call gate, the 64-bit target address is read from the
64-bit call-gate descriptor. The base address in the target code-segment descriptor is ignored.
Stack Switching. Automatic stack switching is also modified when a control transfer occurs through
a call gate in long mode:
• The target-stack pointer read from the TSS is a 64-bit RSP value.
• The SS register is loaded with a null selector. Setting the new SS selector to null allows nested
control transfers in 64-bit mode to be handled properly. The SS.RPL value is updated to remain
consistent with the newly loaded CPL value.
• The size of pushes onto the new stack is modified to accommodate the 64-bit RIP and RSP values.
• Automatic parameter copying is not supported in long mode.
Far Returns. In long mode, far returns can load a null SS selector from the stack under the following
conditions:
• The target operating mode is 64-bit mode.
• The target CPL<3.
Allowing RET to load SS with a null selector under these conditions makes it possible for the
processor to unnest far CALLs (and interrupts) in long mode.
Task Gates. Control transfers through task gates are not supported in long mode.
Branches to 64-Bit Offsets. Because immediate values are generally limited to 32 bits, the only way
a full 64-bit absolute RIP can be specified in 64-bit mode is with an indirect branch. For this reason,
direct forms of far branches are eliminated from the instruction set in 64-bit mode.
SYSCALL and SYSRET Instructions. The AMD64 architecture expands the function of the legacy
SYSCALL and SYSRET instructions in long mode. In addition, two new STAR registers, LSTAR and
CSTAR, are provided to hold the 64-bit target RIP for the instructions when they are executed in long
mode. The legacy STAR register is not expanded in long mode. See “SYSCALL and SYSRET” on
page 150 for additional information.
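System software typically prepares these registers once during initialization. The C sketch below only composes the STAR value; it assumes the layout in which bits 31:0 hold the legacy SYSCALL target EIP, bits 47:32 hold the selector SYSCALL loads into CS (SS is derived from it), and bits 63:48 hold the selector base used by SYSRET. The MSR addresses C000_0081h (STAR), C000_0082h (LSTAR), and C000_0083h (CSTAR) are listed for reference; actually writing them requires WRMSR at CPL 0, which is not shown. `make_star` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_STAR  0xC0000081u /* legacy target EIP + CS/SS selector bases */
#define MSR_LSTAR 0xC0000082u /* 64-bit-mode SYSCALL target RIP           */
#define MSR_CSTAR 0xC0000083u /* compatibility-mode SYSCALL target RIP    */

/* Hypothetical helper: compose a STAR value. */
static uint64_t make_star(uint16_t sysret_sel_base, uint16_t syscall_cs,
                          uint32_t legacy_eip)
{
    return ((uint64_t)sysret_sel_base << 48) |
           ((uint64_t)syscall_cs << 32) |
           legacy_eip;
}
```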
SWAPGS Instruction. The AMD64 architecture provides the SWAPGS instruction as a fast method
for system software to load a pointer to system data-structures. SWAPGS is valid only in 64-bit mode.
An undefined-opcode exception (#UD) occurs if software attempts to execute SWAPGS in legacy
mode or compatibility mode. See “SWAPGS Instruction” on page 152 for additional information.
SYSENTER and SYSEXIT Instructions. The SYSENTER and SYSEXIT instructions are invalid in
long mode, and result in an invalid opcode exception (#UD) if software attempts to use them. Software
should use the SYSCALL and SYSRET instructions when running in long mode. See “SYSENTER
and SYSEXIT (Legacy Mode Only)” on page 152 for additional information.
Table 2-5 on page 36 lists the instructions that are no longer valid in 64-bit mode because their
opcodes have been reassigned. The reassigned opcodes are used in 64-bit mode as REX instruction
prefixes.
Long-Mode Stack Switches. When stacks are switched as part of a long-mode privilege-level
change resulting from an interrupt, the following occurs:
• The target-stack pointer read from the TSS is a 64-bit RSP value.
• The SS register is loaded with a null selector. Setting the new SS selector to null allows nested
control transfers in 64-bit mode to be handled properly. The SS.RPL value is cleared to 0.
• The old SS and RSP are saved on the new stack.
Interrupt Stack Table. In long mode, a new interrupt stack table (IST) mechanism is available as an
alternative to the modified legacy stack-switching mechanism. The IST mechanism unconditionally
switches stacks when it is enabled. It can be enabled for individual interrupt vectors using a field in the
IDT entry. This allows mixing interrupt vectors that use the modified legacy mechanism with vectors
that use the IST mechanism. The IST pointers are stored in the long-mode TSS. The IST mechanism is
only available when long mode is enabled.
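The IST pointers and the privilege-level stack pointers both live in the 64-bit TSS. The C sketch below assumes the 104-byte layout with RSP0 at offset 4 and IST1 at offset 0x24; because several 64-bit fields are not naturally aligned, it uses the GCC/Clang `packed` attribute. The struct name `tss64` and helper `ist_stack` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 64-bit task-state segment. */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];     /* RSP0..RSP2: stacks for CPL changes         */
    uint64_t reserved1;
    uint64_t ist[7];     /* IST1..IST7, stored at ist[0]..ist[6]       */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base; /* offset of the I/O-permission bitmap        */
};
_Static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss64, rsp) == 4, "RSP0 at offset 4");
_Static_assert(offsetof(struct tss64, ist) == 36, "IST1 at offset 0x24");

/* Stack pointer selected by a nonzero IST field (1..7) in an IDT gate.
 * An IST field of 0 means the modified legacy mechanism applies instead. */
static uint64_t ist_stack(const struct tss64 *tss, unsigned ist_index)
{
    assert(ist_index >= 1 && ist_index <= 7);
    return tss->ist[ist_index - 1];
}
```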
3 System Resources
The operating system manages the software-execution environment and general system operation
through the use of system resources. These resources consist of system registers (control registers and
model-specific registers) and system-data structures (memory-management and protection tables).
The system-control registers are described in detail in this chapter; many of the features they control
are described elsewhere in this volume. The model-specific registers supported by the AMD64
architecture are introduced in this chapter.
Because of their complexity, system-data structures are described in separate chapters. Refer to the
following chapters for detailed information on these data structures:
• Descriptors and descriptor tables are described in “Segmentation Data Structures and Registers”
on page 65.
• Page-translation tables are described in “Legacy-Mode Page Translation” on page 120 and “Long-
Mode Page Translation” on page 128.
• The task-state segment is described in “Legacy Task-State Segment” on page 313 and “64-Bit Task
State Segment” on page 317.
Not all processor implementations are required to support all possible features. The last section in this
chapter addresses processor-feature identification. System software uses the capabilities described in
that section to determine which features are supported so that the appropriate service routines are
loaded.
• EFER—This model-specific register contains status and controls for additional features not
managed by the CR0 and CR4 registers. Included in this register are the long-mode enable and
activation controls introduced by the AMD64 architecture.
Control registers CR1, CR5–CR7, and CR9–CR15 are reserved.
In legacy mode, all control registers and RFLAGS are 32 bits. The EFER register is 64 bits in all
modes. The AMD64 architecture expands all 32-bit system-control registers to 64 bits. In 64-bit mode,
the MOV CRn instructions read or write all 64 bits of these registers (operand-size prefixes are
ignored). In compatibility and legacy modes, control-register writes fill the low 32 bits with data and
the high 32 bits with zeros, and control-register reads return only the low 32 bits.
In 64-bit mode, the high 32 bits of CR0 and CR4 are reserved and must be written with zeros. Writing
a 1 to any of the high 32 bits results in a general-protection exception, #GP(0). All 64 bits of CR2 are
writable. However, the MOV CRn instructions do not check that addresses written to CR2 are within
the virtual-address limitations of the processor implementation.
All CR3 bits are writable, except for unimplemented physical address bits, which must be cleared to 0.
The upper 32 bits of RFLAGS are always read as zero by the processor. Attempts to load the upper 32
bits of RFLAGS with anything other than zero are ignored by the processor.
Control Register 0 (CR0):
Bits 63–32: Reserved, MBZ
Bit 31 PG | 30 CD | 29 NW | 28–19 Reserved | 18 AM | 17 Reserved | 16 WP | 15–6 Reserved | 5 NE | 4 ET | 3 TS | 2 EM | 1 MP | 0 PE
The functions of the CR0 control bits are (unless otherwise noted, all bits are read/write):
Protected-Mode Enable (PE) Bit. Bit 0. Software enables protected mode by setting PE to 1, and
disables protected mode by clearing PE to 0. When the processor is running in protected mode,
segment-protection mechanisms are enabled.
See “Segment-Protection Overview” on page 93 for information on the segment-protection
mechanisms.
Monitor Coprocessor (MP) Bit. Bit 1. Software uses the MP bit with the task-switched control bit
(CR0.TS) to control whether execution of the WAIT/FWAIT instruction causes a device-not-available
exception (#NM) to occur, as follows:
• If both the monitor-coprocessor and task-switched bits are set (CR0.MP=1 and CR0.TS=1), then
executing the WAIT/FWAIT instruction causes a device-not-available exception (#NM).
• If either the monitor-coprocessor or task-switched bits are clear (CR0.MP=0 or CR0.TS=0), then
executing the WAIT/FWAIT instruction proceeds normally.
Software typically should set MP to 1 if the processor implementation supports x87 instructions. This
allows the CR0.TS bit to completely control when the x87-instruction context is saved as a result of a
task switch.
Emulate Coprocessor (EM) Bit. Bit 2. Software forces all x87 instructions to cause a device-not-
available exception (#NM) by setting EM to 1. Likewise, setting EM to 1 forces an invalid-opcode
exception (#UD) when an attempt is made to execute any of the 64-bit or 128-bit media instructions.
The exception handlers can emulate these instruction types if desired. Setting the EM bit to 1 does not
cause an #NM exception when the WAIT/FWAIT instruction is executed.
Task Switched (TS) Bit. Bit 3. When an attempt is made to execute an x87 or media instruction while
TS=1, a device-not-available exception (#NM) occurs. Software can use this mechanism—sometimes
referred to as “lazy context-switching”—to save the unit contexts before executing the next instruction
of those types. As a result, the x87 and media instruction-unit contexts are saved only when necessary
as a result of a task switch.
When a hardware task switch occurs, TS is automatically set to 1. System software that implements
software task-switching rather than using the hardware task-switch mechanism can still use the TS bit
to control x87 and media instruction-unit context saves. In this case, the task-management software
uses a MOV CR0 instruction to explicitly set the TS bit to 1 during a task switch. Software can clear
the TS bit either by executing the CLTS instruction or by writing to the CR0 register directly. Long-mode
system software can use this approach even though the hardware task-switch mechanism is not
supported in long mode.
The CR0.MP bit controls whether the WAIT/FWAIT instruction causes an #NM exception when
TS=1.
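The interaction of MP, EM, and TS can be summarized as two predicates on a CR0 value. The C sketch below covers x87 instructions only (media instructions raise #UD rather than #NM when EM=1, which is not modeled); the function names are hypothetical, while the bit positions come from the CR0 descriptions in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1) /* Monitor Coprocessor */
#define CR0_EM (1u << 2) /* Emulate Coprocessor */
#define CR0_TS (1u << 3) /* Task Switched       */

/* WAIT/FWAIT raises #NM only when both MP and TS are set; EM has no effect. */
static bool wait_raises_nm(uint64_t cr0)
{
    return (cr0 & CR0_MP) && (cr0 & CR0_TS);
}

/* Other x87 instructions raise #NM when either EM or TS is set. */
static bool x87_raises_nm(uint64_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) != 0;
}
```

A lazy context-switching scheduler would set TS on every task switch and save the x87 state only inside the #NM handler, after which it clears TS with CLTS.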
Extension Type (ET) Bit. Bit 4, read-only. In some early x86 processors, software set ET to 1 to
indicate support of the 387DX math-coprocessor instruction set. This bit is now reserved and forced to
1 by the processor. Software cannot clear this bit to 0.
Numeric Error (NE) Bit. Bit 5. Clearing the NE bit to 0 disables internal control of x87 floating-point
exceptions and enables external control. When NE is cleared to 0, the IGNNE# input signal controls
whether x87 floating-point exceptions are ignored:
• When IGNNE# is 1, x87 floating-point exceptions are ignored.
• When IGNNE# is 0, x87 floating-point exceptions are reported by setting the FERR# output signal
to 1. External logic can use the FERR# signal as an external interrupt.
When NE is set to 1, internal control over x87 floating-point exception reporting is enabled and the
external reporting mechanism is disabled. It is recommended that software set NE to 1. This enables
optimal performance in handling x87 floating-point exceptions.
Write Protect (WP) Bit. Bit 16. Read-only pages are protected from supervisor-level writes when the
WP bit is set to 1. When WP is cleared to 0, supervisor software can write into read-only pages.
See “Page-Protection Checks” on page 142 for information on the page-protection mechanism.
Alignment Mask (AM) Bit. Bit 18. Software enables automatic alignment checking by setting the
AM bit to 1 when eFLAGS.AC=1. Alignment checking can be disabled by clearing either AM or
eFLAGS.AC to 0. When automatic alignment checking is enabled and CPL=3, a memory reference to
an unaligned operand causes an alignment-check exception (#AC).
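The alignment-check condition combines three enables with a misalignment test. The C sketch below assumes that an operand is aligned when its address is a multiple of its size and that the size is a power of two; some instructions have other alignment rules, which this sketch ignores. `alignment_check_fault` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* #AC occurs only with CR0.AM=1, rFLAGS.AC=1, CPL=3 and a misaligned operand.
 * `operand_size` must be a power of two (1, 2, 4, 8, ...). */
static bool alignment_check_fault(bool cr0_am, bool rflags_ac, unsigned cpl,
                                  uint64_t addr, uint64_t operand_size)
{
    bool enabled = cr0_am && rflags_ac && cpl == 3;
    bool misaligned = (addr & (operand_size - 1)) != 0;
    return enabled && misaligned;
}
```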
Not Writethrough (NW) Bit. Bit 29. Ignored. This bit can be set to 1 or cleared to 0, but its value is
ignored. The NW bit exists only for legacy purposes.
Cache Disable (CD) Bit. Bit 30. When CD is cleared to 0, the internal caches are enabled. When CD
is set to 1, no new data or instructions are brought into the internal caches. However, the processor still
accesses the internal caches when CD=1 under the following situations:
• Reads that hit in an internal cache cause the data to be read from the internal cache that reported the
hit.
• Writes that hit in an internal cache cause the cache line that reported the hit to be written back to
memory and invalidated in the cache.
Cache misses do not affect the internal caches when CD=1. Software can prevent cache access by
writing back and invalidating the caches before setting CD to 1 (this avoids caching the instructions
that set CD to 1).
Setting CD to 1 also causes the processor to ignore the page-level cache-control bits (PWT and PCD)
when paging is enabled. These bits are located in the page-translation tables and CR3 register. See
“Page-Level Writethrough (PWT) Bit” on page 137 and “Page-Level Cache Disable (PCD) Bit” on
page 137 for information on page-level cache control.
See “Memory Caches” on page 176 for information on the internal caches.
Paging Enable (PG) Bit. Bit 31. Software enables page translation by setting PG to 1, and disables
page translation by clearing PG to 0. Page translation cannot be enabled unless the processor is in
protected mode (CR0.PE=1). If software attempts to set PG to 1 when PE is cleared to 0, the processor
causes a general-protection exception (#GP).
See “Page Translation Overview” on page 115 for information on the page-translation mechanism.
Reserved Bits. Bits 28–19, 17, 15–6, and 63–32. When writing the CR0 register, software should set
the values of reserved bits to the values found during the previous CR0 read. No attempt should be
made to change reserved bits, and software should never rely on the values of reserved bits. In long
mode, bits 63–32 are reserved and must be written with zero, otherwise a #GP occurs.
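Two of the rules above can be checked before a CR0 write: the reserved high half must be zero in long mode, and PG cannot be set while PE is clear. The C sketch below checks only those two conditions and preserves reserved bits by updating from the previously read value; the helper names and the `CR0_DEFINED` mask (the defined bits listed in this section) are assumptions of the sketch, and other #GP causes are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1ull << 0)
#define CR0_PG (1ull << 31)
/* PG, CD, NW, AM, WP, NE, ET, TS, EM, MP, PE */
#define CR0_DEFINED 0xE005003Full

/* Would writing `value` to CR0 raise #GP under the two rules above? */
static bool cr0_write_faults(uint64_t value, bool long_mode)
{
    if (long_mode && (value >> 32) != 0)
        return true; /* bits 63:32 are reserved, MBZ */
    if ((value & CR0_PG) && !(value & CR0_PE))
        return true; /* paging requires protected mode */
    return false;
}

/* Change only defined bits, keeping reserved bits as previously read. */
static uint64_t cr0_update(uint64_t current, uint64_t set, uint64_t clear)
{
    assert(((set | clear) & ~CR0_DEFINED) == 0);
    return (current | set) & ~clear;
}
```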
System Resources 45
AMD64 Technology 24593—Rev. 3.14—September 2007
[Figure: CR2 register format; bits 31–0 in legacy mode, bits 63–0 in long mode.]
See “CR2 Register” on page 220 for a description of the CR2 register.
The CR3 register is used to point to the base address of the highest-level page-translation table.
[Figure: legacy-mode CR3 format (non-PAE paging). Bits 31–12: page-directory-table base address; bits 11–5: reserved; bit 4: PCD; bit 3: PWT; bits 2–0: reserved.]
[Figure: legacy-mode CR3 format (PAE paging). Bits 31–5: page-directory-pointer-table base address; bit 4: PCD; bit 3: PWT; bits 2–0: reserved.]
[Figure: long-mode CR3 format. Bits 63–52: reserved, MBZ; bits 51–12: page-map level-4 table base address (bit 51 is an architectural limit; a given implementation may support fewer bits); bits 11–5: reserved; bit 4: PCD; bit 3: PWT; bits 2–0: reserved.]
The legacy CR3 register is described in “CR3 Register” on page 120, and the long-mode CR3 register
is described in “CR3” on page 128.
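As an illustration of the long-mode layout, the fields of a 64-bit CR3 value can be extracted with masks. This sketch assumes the full architectural width, with the base in bits 51–12. An implementation that supports fewer physical-address bits would use a narrower base mask. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_PWT (UINT64_C(1) << 3) /* page-level writethrough */
#define CR3_PCD (UINT64_C(1) << 4) /* page-level cache disable */
/* Bits 51-12 hold the PML4 base; the low 12 address bits are implied 0. */
#define CR3_LM_BASE_MASK UINT64_C(0x000FFFFFFFFFF000)

static uint64_t cr3_pml4_base(uint64_t cr3) { return cr3 & CR3_LM_BASE_MASK; }
static bool cr3_pcd(uint64_t cr3) { return (cr3 & CR3_PCD) != 0; }
static bool cr3_pwt(uint64_t cr3) { return (cr3 & CR3_PWT) != 0; }
```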
[Figure: CR4 register format. Bits 63–32 and 31–11: reserved, MBZ. Bit 10: OSXMMEXCPT; bit 9: OSFXSR; bit 8: PCE; bit 7: PGE; bit 6: MCE; bit 5: PAE; bit 4: PSE; bit 3: DE; bit 2: TSD; bit 1: PVI; bit 0: VME.]
The functions of the CR4 control bits are as follows (all bits are read/write):
Virtual-8086 Mode Extensions (VME) Bit. Bit 0. Setting VME to 1 enables hardware-supported
performance enhancements for software running in virtual-8086 mode. Clearing VME to 0 disables
this support. The enhancements enabled when VME=1 include:
• Virtualized, maskable, external-interrupt control and notification using the VIF and VIP bits in the
rFLAGS register. Virtualizing affects the operation of several instructions that manipulate the
rFLAGS.IF bit.
• Selective intercept of software interrupts (INTn instructions) using the interrupt-redirection bitmap
in the TSS.
Protected-Mode Virtual Interrupts (PVI) Bit. Bit 1. Setting PVI to 1 enables support for protected-
mode virtual interrupts. Clearing PVI to 0 disables this support. When PVI=1, hardware support of two
bits in the rFLAGS register, VIF and VIP, is enabled.
Only the STI and CLI instructions are affected by enabling PVI. Unlike the case when CR4.VME=1,
the interrupt-redirection bitmap in the TSS cannot be used for selective INTn interception.
PVI enhancements are also supported in long mode. See “Virtual Interrupts” on page 247 for more
information on using PVI.
Time-Stamp Disable (TSD) Bit. Bit 2. The TSD bit allows software to control the privilege level at
which the time-stamp counter can be read. When TSD is cleared to 0, software running at any privilege
level can read the time-stamp counter using the RDTSC or RDTSCP instructions. When TSD is set to
1, only software running at privilege-level 0 can execute the RDTSC or RDTSCP instructions.
Debugging Extensions (DE) Bit. Bit 3. Setting the DE bit to 1 enables the I/O breakpoint capability
and enforces treatment of the DR4 and DR5 registers as reserved. Software that accesses DR4 or DR5
when DE=1 causes an invalid-opcode exception (#UD).
When the DE bit is cleared to 0, I/O breakpoint capabilities are disabled. Software references to the
DR4 and DR5 registers are aliased to the DR6 and DR7 registers, respectively.
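The DE rules for DR4 and DR5 can be summarized as a lookup. In this hypothetical model, effective_debug_register returns the number of the debug register actually accessed, or -1 to stand for #UD.

```c
#include <stdint.h>

#define CR4_DE (UINT64_C(1) << 3)

/* Hypothetical model of a MOV to or from DRn under the DE rules. */
static int effective_debug_register(int n, uint64_t cr4) {
    if (n == 4 || n == 5) {
        if (cr4 & CR4_DE)
            return -1;      /* DE=1: DR4/DR5 are reserved, #UD */
        return n + 2;       /* DE=0: DR4 aliases DR6, DR5 aliases DR7 */
    }
    return n;               /* other debug registers are unaffected */
}
```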
Page-Size Extensions (PSE) Bit. Bit 4. Setting PSE to 1 enables the use of 4-Mbyte physical pages.
With PSE=1, the physical-page size is selected between 4 Kbytes and 4 Mbytes using the page-
directory entry page-size field (PS). Clearing PSE to 0 disables the use of 4-Mbyte physical pages and
restricts all physical pages to 4 Kbytes.
The PSE bit has no effect when physical-address extensions are enabled (CR4.PAE=1). Because long
mode requires CR4.PAE=1, the PSE bit is ignored when the processor is running in long mode.
See “4-Mbyte Page Translation” on page 123 for more information on 4-Mbyte page translation.
Physical-Address Extension (PAE) Bit. Bit 5. Setting PAE to 1 enables the use of physical-address
extensions and 2-Mbyte physical pages. Clearing PAE to 0 disables these features.
With PAE=1, the page-translation data structures are expanded from 32 bits to 64 bits, allowing the
translation of up to 52-bit physical addresses. Also, the physical-page size is selectable between
4 Kbytes and 2 Mbytes using the page-directory-entry page-size field (PS). Long mode requires PAE
to be enabled in order to use the 64-bit page-translation data structures to translate 64-bit virtual
addresses to 52-bit physical addresses.
See “PAE Paging” on page 124 for more information on physical-address extensions.
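The interaction of PSE, PAE, and the PS bit in page-directory entries can be sketched as a single function. This is a simplified model that covers only the page-directory level and the legacy page sizes described above. The name pde_page_size is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_PSE (UINT64_C(1) << 4)
#define CR4_PAE (UINT64_C(1) << 5)

/* Size in bytes of the pages reached through a page-directory entry
 * whose PS bit is `ps`. */
static uint64_t pde_page_size(uint64_t cr4, bool ps) {
    if (!ps)
        return 4096;                 /* PDE points to a table of 4-Kbyte pages */
    if (cr4 & CR4_PAE)
        return UINT64_C(2) << 20;    /* PAE=1 (PSE ignored): 2 Mbytes */
    if (cr4 & CR4_PSE)
        return UINT64_C(4) << 20;    /* PSE=1, PAE=0: 4 Mbytes */
    return 4096;                     /* PSE=0: all pages are 4 Kbytes */
}
```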
Machine-Check Enable (MCE) Bit. Bit 6. Setting MCE to 1 enables the machine-check exception
mechanism. Clearing this bit to 0 disables the mechanism. When enabled, a machine-check exception
(#MC) occurs when an uncorrectable machine-check error is encountered.
Regardless of whether machine-check exceptions are enabled, the processor records enabled errors
when they occur. Error reporting is performed by the machine-check error-reporting register banks.
Each bank includes a control register for enabling error reporting and a status register for capturing
errors. Correctable machine-check errors are also reported, but they do not cause a machine-check
exception.
See Chapter 9, “Machine Check Mechanism,” for a description of the machine-check mechanism, the
registers used, and the types of errors captured by the mechanism.
Page-Global Enable (PGE) Bit. Bit 7. When page translation is enabled, system-software
performance can often be improved by making some page translations global to all tasks and
procedures. Setting PGE to 1 enables the global-page mechanism. Clearing this bit to 0 disables the
mechanism.
When PGE is enabled, system software can set the global-page (G) bit in the lowest level of the page-
translation hierarchy to 1, indicating that the page translation is global. Page translations marked as
global are not invalidated in the TLB when the page-translation-table base address (CR3) is updated.
When the G bit is cleared, the page translation is not global. All supported physical-page sizes also
support the global-page mechanism. See “Global Pages” on page 140 for information on using the
global-page mechanism.
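The effect of PGE on a CR3 update can be illustrated with a toy TLB. The structure and function below are hypothetical and do not reflect how hardware actually organizes TLB entries. They only show which cached translations survive a CR3 write.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy TLB entry for illustration only. */
struct tlb_entry {
    bool valid;
    bool global;   /* G bit from the lowest-level translation entry */
};

/* Models a CR3 write: entries marked global survive only when
 * CR4.PGE=1; all other entries are invalidated. */
static void tlb_on_cr3_write(struct tlb_entry *tlb, size_t n, bool cr4_pge) {
    for (size_t i = 0; i < n; i++) {
        if (!(cr4_pge && tlb[i].global))
            tlb[i].valid = false;
    }
}
```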
Performance-Monitoring Counter Enable (PCE) Bit. Bit 8. Setting PCE to 1 allows software
running at any privilege level to use the RDPMC instruction. Software uses the RDPMC instruction to
read the four performance-monitoring MSRs, PerfCTR[3:0]. Clearing PCE to 0 allows only the most-
privileged software (CPL=0) to use the RDPMC instruction.
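TSD and PCE control unprivileged counter reads with opposite polarity, which is easy to get wrong. The two hypothetical predicates below express the rules from the TSD and PCE descriptions. When a predicate is false, the instruction raises #GP instead of executing.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_TSD (UINT64_C(1) << 2)
#define CR4_PCE (UINT64_C(1) << 8)

/* TSD=1 restricts RDTSC/RDTSCP to CPL 0. */
static bool rdtsc_permitted(uint64_t cr4, unsigned cpl) {
    return !(cr4 & CR4_TSD) || cpl == 0;
}

/* PCE=1 opens RDPMC to every privilege level. */
static bool rdpmc_permitted(uint64_t cr4, unsigned cpl) {
    return (cr4 & CR4_PCE) || cpl == 0;
}
```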
FXSAVE/FXRSTOR Support (OSFXSR) Bit. Bit 9. System software must set the OSFXSR bit to 1
to enable use of the 128-bit media instructions. When this bit is set to 1, it also indicates that system
software uses the FXSAVE and FXRSTOR instructions to save and restore the processor state for the
x87, 64-bit media, and 128-bit media instructions.
Clearing the OSFXSR bit to 0 indicates that 128-bit media instructions cannot be used. Attempts to use
those instructions while this bit is clear result in an invalid-opcode exception (#UD). Software can
continue to use the FXSAVE/FXRSTOR instructions for saving and restoring the processor state for
the x87 and 64-bit media instructions.
Unmasked Exception Support (OSXMMEXCPT) Bit. Bit 10. System software must set the
OSXMMEXCPT bit to 1 when it supports the SIMD floating-point exception (#XF) for handling of
unmasked 128-bit media floating-point errors. Clearing the OSXMMEXCPT bit to 0 indicates the
#XF handler is not supported. When OSXMMEXCPT=0, unmasked 128-bit media floating-point
exceptions cause an invalid-opcode exception (#UD). See “SIMD Floating-Point Exception Causes”
in Volume 1 for more information on 128-bit media unmasked floating-point exceptions.
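The combined effect of OSFXSR and OSXMMEXCPT on a 128-bit media instruction can be sketched as a small decision function. The model is deliberately simplified. It considers only these two CR4 bits plus a flag indicating whether the instruction produced an unmasked floating-point exception. Other conditions, such as CR0.EM and CR0.TS, are ignored. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_OSFXSR     (UINT64_C(1) << 9)
#define CR4_OSXMMEXCPT (UINT64_C(1) << 10)

enum sse_outcome { SSE_EXECUTES, SSE_UD, SSE_XF };

static enum sse_outcome classify_sse(uint64_t cr4, bool unmasked_fp_error) {
    if (!(cr4 & CR4_OSFXSR))
        return SSE_UD;              /* 128-bit media instructions unavailable */
    if (unmasked_fp_error)          /* #XF only if the OS supports it */
        return (cr4 & CR4_OSXMMEXCPT) ? SSE_XF : SSE_UD;
    return SSE_EXECUTES;
}
```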
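[Figure: RFLAGS register format. Bits 63–32 and 31–22: reserved, RAZ. Bit 21: ID; bit 20: VIP; bit 19: VIF; bit 18: AC; bit 17: VM; bit 16: RF; bit 15: reserved (0); bit 14: NT; bits 13–12: IOPL; bit 11: OF; bit 10: DF; bit 9: IF; bit 8: TF; bit 7: SF; bit 6: ZF; bit 5: reserved (0); bit 4: AF; bit 3: reserved (0); bit 2: PF; bit 1: reserved (1); bit 0: CF.]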
The functions of the RFLAGS control and status bits used by application software are described in
“Flags Register” in Volume 1. The functions of RFLAGS system bits are (unless otherwise noted, all
bits are read/write):
Trap Flag (TF) Bit. Bit 8. Software sets the TF bit to 1 to enable single-step mode during software
debug. Clearing this bit to 0 disables single-step mode.
When single-step mode is enabled, a debug exception (#DB) occurs after each instruction completes
execution. Single stepping begins with the instruction following the instruction that sets TF. Single
stepping is disabled (TF=0) when the #DB exception occurs or when any exception or interrupt occurs.
See “Single Stepping” on page 339 for information on using the single-step mode during debugging.
Interrupt Flag (IF) Bit. Bit 9. Software sets the IF bit to 1 to enable maskable interrupts. Clearing this
bit to 0 causes the processor to ignore maskable interrupts. The state of the IF bit does not affect the
response of a processor to non-maskable interrupts, software-interrupt instructions, or exceptions.
The ability to modify the IF bit depends on several factors:
• The current privilege-level (CPL)
• The I/O privilege level (RFLAGS.IOPL)
• Whether or not virtual-8086 mode extensions are enabled (CR4.VME=1)
• Whether or not protected-mode virtual interrupts are enabled (CR4.PVI=1)
See “Masking External Interrupts” on page 207 for information on interrupt masking. See “Accessing
the RFLAGS Register” on page 154 for information on the specific instructions used to modify the IF
bit.
I/O Privilege Level (IOPL) Field. Bits 13–12. The IOPL field specifies the privilege level
required to execute I/O address-space instructions (i.e., instructions that address the I/O space rather
than memory-mapped I/O, such as IN, OUT, INS, OUTS, etc.). For software to execute these
instructions, the current privilege-level (CPL) must be equal to or higher than (lower numerical value
than) the privilege specified by IOPL (CPL <= IOPL). If the CPL is lower than (higher numerical value
than) that specified by the IOPL (CPL > IOPL), the processor causes a general-protection exception
(#GP) when software attempts to execute an I/O instruction. See “Protected-Mode I/O” in Volume 1
for information on how IOPL controls access to address-space I/O.
Virtual-8086 mode uses IOPL to control virtual interrupts and the IF bit when virtual-8086 mode
extensions are enabled (CR4.VME=1). The protected-mode virtual-interrupt mechanism (PVI) also
uses IOPL to control virtual interrupts and the IF bit when PVI is enabled (CR4.PVI=1). See “Virtual
Interrupts” on page 247 for information on how IOPL is used by the virtual interrupt mechanism.
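The CPL-versus-IOPL comparison can be written as follows. This sketch covers only the comparison described above. In protected mode, the processor can also consult the I/O-permission bitmap in the TSS before raising #GP (see “Protected-Mode I/O” in Volume 1), and that step is not modeled here. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* IOPL occupies RFLAGS bits 13-12. */
static unsigned rflags_iopl(uint64_t rflags) {
    return (unsigned)((rflags >> 12) & 3);
}

/* CPL must be numerically less than or equal to IOPL. */
static bool io_privilege_ok(unsigned cpl, uint64_t rflags) {
    return cpl <= rflags_iopl(rflags);
}
```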
Nested Task (NT) Bit. Bit 14. IRET reads the NT bit to determine whether the current task is nested
within another task. When NT is set to 1, the current task is nested within another task. When NT is
cleared to 0, the current task is at the top level (not nested).
The processor sets the NT bit during a task switch resulting from a CALL, interrupt, or exception
through a task gate. When an IRET is executed from legacy mode while the NT bit is set, a task switch
occurs. See “Task Switches Using Task Gates” on page 323 for information on switching tasks using
task gates, and “Nesting Tasks” on page 325 for information on task nesting.
Resume Flag (RF) Bit. Bit 16. The RF bit allows an instruction to be restarted following an
instruction breakpoint resulting in a debug exception (#DB). This bit prevents multiple debug
exceptions from occurring on the same instruction.
The processor clears the RF bit after every instruction is successfully executed, except when the
instruction is:
• An IRET that sets the RF bit.
• JMP, CALL, or INTn through a task gate.
In both of the above cases, RF is not cleared to 0 until the next instruction successfully executes.
When an exception occurs (or when a string instruction is interrupted), the processor normally sets
RF=1 in the rFLAGS image saved on the interrupt stack. However, when a #DB exception occurs as a
result of an instruction breakpoint, the processor clears the RF bit to 0 in the interrupt-stack rFLAGS
image.
For instruction restart to work properly following an instruction breakpoint, the #DB exception
handler must set RF to 1 in the interrupt-stack rFLAGS image. When an IRET is later executed to
return to the instruction that caused the instruction-breakpoint #DB exception, the set RF bit (RF=1) is
loaded from the interrupt-stack rFLAGS image. RF is not cleared by the processor until the instruction
causing the #DB exception successfully executes.
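The #DB handler's part of this restart protocol can be sketched as follows. The frame layout matches the long-mode interrupt-stack order (RIP, CS, RFLAGS, RSP, SS). How a real handler locates this frame is left out of the sketch. struct interrupt_frame and db_prepare_restart are hypothetical names.

```c
#include <stdint.h>

#define RFLAGS_RF (UINT64_C(1) << 16)

/* Hypothetical view of the interrupt-stack image a #DB handler sees. */
struct interrupt_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Before IRET back to the instruction that hit the breakpoint, set RF
 * in the saved image so the breakpoint does not fire again on restart. */
static void db_prepare_restart(struct interrupt_frame *frame) {
    frame->rflags |= RFLAGS_RF;
}
```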
Virtual-8086 Mode (VM) Bit. Bit 17. Software sets the VM bit to 1 to enable virtual-8086 mode.
Software clears the VM bit to 0 to disable virtual-8086 mode. System software can only change this bit
using a task switch or an IRET. It cannot modify the bit using the POPFD instruction.
Alignment Check (AC) Bit. Bit 18. Software enables automatic alignment checking by setting the
AC bit to 1 when CR0.AM=1. Alignment checking can be disabled by clearing either AC or CR0.AM
to 0. When automatic alignment checking is enabled and the current privilege-level (CPL) is 3 (least
privileged), a memory reference to an unaligned operand causes an alignment-check exception (#AC).
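The #AC condition combines three separate controls, which can be expressed as one predicate. This sketch assumes the operand size is also its required alignment (a power of two). raises_ac is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (UINT64_C(1) << 18)
#define RFLAGS_AC (UINT64_C(1) << 18)

/* True when a `size`-byte access at `addr` would raise #AC. */
static bool raises_ac(uint64_t cr0, uint64_t rflags, unsigned cpl,
                      uint64_t addr, uint64_t size) {
    bool enabled = (cr0 & CR0_AM) && (rflags & RFLAGS_AC) && cpl == 3;
    return enabled && (addr & (size - 1)) != 0;
}
```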
Virtual Interrupt (VIF) Bit. Bit 19. The VIF bit is a virtual image of the RFLAGS.IF bit. It is enabled
when either virtual-8086 mode extensions are enabled (CR4.VME=1) or protected-mode virtual
interrupts are enabled (CR4.PVI=1), and the RFLAGS.IOPL field is less than 3. When enabled,
instructions that ordinarily would modify the IF bit actually modify the VIF bit with no effect on the
RFLAGS.IF bit.
System software that supports virtual-8086 mode should enable the VIF bit using CR4.VME. This
allows 8086 software to execute instructions that can set and clear the RFLAGS.IF bit without causing
an exception. With VIF enabled in virtual-8086 mode, those instructions set and clear the VIF bit
instead, giving the appearance to the 8086 software that it is modifying the RFLAGS.IF bit. System
software reads the VIF bit to determine whether or not to take the action desired by the 8086 software
(enabling or disabling interrupts by setting or clearing the RFLAGS.IF bit).
In long mode, the use of the VIF bit is supported when CR4.PVI=1. See “Virtual Interrupts” on
page 247 for more information on virtual interrupts.
Virtual Interrupt Pending (VIP) Bit. Bit 20. The VIP bit is provided as an extension to both virtual-
8086 mode and protected mode. It is used by system software to indicate that an external, maskable
interrupt is pending (awaiting) execution by either a virtual-8086 mode or protected-mode interrupt-
service routine. Software must enable virtual-8086 mode extensions (CR4.VME=1) or protected-
mode virtual interrupts (CR4.PVI=1) before using VIP.
VIP is normally set to 1 by a protected-mode interrupt-service routine that was entered from virtual-
8086 mode as a result of an external, maskable interrupt. Before returning to the virtual-8086 mode
application, the service routine sets VIP to 1 if EFLAGS.VIF=0, indicating that the application
currently has virtual interrupts disabled. When the virtual-8086 mode application later attempts to
enable interrupts by setting EFLAGS.VIF to 1 while VIP=1, a general-protection exception (#GP)
occurs. The #GP service routine can then decide whether to allow the virtual-8086 mode service routine
to handle the pending external, maskable interrupt. (EFLAGS is specifically referred to in this case
because virtual-8086 mode is supported only from legacy mode.)
In long mode, the use of the VIP bit is supported when CR4.PVI=1. See “Virtual Interrupts” on
page 247 for more information on virtual-8086 mode interrupts and the VIP bit.
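Assuming the virtual-interrupt mechanism is in effect (CR4.VME or CR4.PVI set and IOPL less than 3), STI and CLI can be modeled as operations on VIF that leave RFLAGS.IF alone. The sketch also assumes the standard behavior in which STI raises #GP when VIP=1, so that system software can deliver the pending interrupt. The function names are hypothetical, and -1 stands for #GP.

```c
#include <stdint.h>

#define RFLAGS_VIF (UINT64_C(1) << 19)
#define RFLAGS_VIP (UINT64_C(1) << 20)

/* STI under virtual interrupts: set VIF unless an interrupt is pending. */
static int virtual_sti(uint64_t *rflags) {
    if (*rflags & RFLAGS_VIP)
        return -1;              /* #GP: let system software deliver it */
    *rflags |= RFLAGS_VIF;      /* RFLAGS.IF is left unchanged */
    return 0;
}

/* CLI under virtual interrupts: clear VIF, again leaving IF alone. */
static void virtual_cli(uint64_t *rflags) {
    *rflags &= ~RFLAGS_VIF;
}
```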
Processor Feature Identification (ID) Bit. Bit 21. The ability of software to modify this bit
indicates that the processor implementation supports the CPUID instruction. See “Processor Feature
Identification” on page 61 for more information on the CPUID instruction.
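The classic detection sequence flips the ID bit, writes RFLAGS, and reads it back. The sketch below separates the comparison from the hardware access. The live check uses GCC/Clang extended inline assembly and is compiled only for x86-64. CPUID is always present on that target, so there the check mainly demonstrates the technique. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_ID (UINT64_C(1) << 21)

/* ID is modifiable if the flipped value was reflected on read-back. */
static bool id_bit_modifiable(uint64_t before, uint64_t after) {
    return ((before ^ after) & RFLAGS_ID) != 0;
}

/* Live check on x86-64 with GCC or Clang; elsewhere returns false.
 * RSP is lowered by 128 bytes first so the pushes cannot overwrite
 * the System V red zone. */
static bool cpuid_supported(void) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    uint64_t before, after;
    __asm__ volatile(
        "sub $128, %%rsp\n\t"
        "pushfq\n\t"
        "pop %0\n\t"               /* before = RFLAGS */
        "mov %0, %1\n\t"
        "xor $0x200000, %1\n\t"    /* flip ID (bit 21) */
        "push %1\n\t"
        "popfq\n\t"                /* attempt the write */
        "pushfq\n\t"
        "pop %1\n\t"               /* after = RFLAGS */
        "push %0\n\t"
        "popfq\n\t"                /* restore the original RFLAGS */
        "add $128, %%rsp\n\t"
        : "=&r"(before), "=&r"(after)
        :
        : "cc", "memory");
    return id_bit_modifiable(before, after);
#else
    return false;
#endif
}
```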
[Figure: EFER register format. Bits 63–32 and 31–15: reserved, MBZ. Bit 14: FFXSR; bit 13: MBZ; bit 12: SVME; bit 11: NXE; bit 10: LMA; bit 9: MBZ; bit 8: LME; bits 7–1: reserved, RAZ; bit 0: SCE.]
System-Call Extension (SCE) Bit. Bit 0. Setting this bit to 1 enables the SYSCALL and SYSRET
instructions. Application software can use these instructions for low-latency system calls and returns
in a non-segmented (flat) address space. See “Fast System Call and Return” on page 149 for additional
information.
Long Mode Enable (LME) Bit. Bit 8. Setting this bit to 1 enables the processor to activate long mode.
Long mode is not activated until software enables paging some time later. When paging is enabled
after LME is set to 1, the processor sets the EFER.LMA bit to 1, indicating that long mode is not only
enabled but also active. See Chapter 14, “Processor Initialization and Long Mode Activation,” for
more information on activating long mode.
Long Mode Active (LMA) Bit. Bit 10, read-only. This bit indicates that long mode is active. The
processor sets LMA to 1 when both long mode and paging have been enabled by system software. See
Chapter 14, “Processor Initialization and Long Mode Activation,” for more information on activating
long mode.
When LMA=1, the processor is running either in compatibility mode or 64-bit mode, depending on the
value of the L bit in a code-segment descriptor, as shown in Figure 1-6 on page 12.
When LMA=0, the processor is running in legacy mode. In this mode, the processor behaves like a
standard 32-bit x86 processor, with none of the new 64-bit features enabled.
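The relationship between LME, paging, LMA, and the code-segment L bit can be summarized in two small functions. This sketch assumes the other long-mode prerequisites described in Chapter 14 (such as CR4.PAE=1) have been met. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (UINT64_C(1) << 8)
#define CR0_PG   (UINT64_C(1) << 31)

enum cpu_mode { MODE_LEGACY, MODE_COMPATIBILITY, MODE_64BIT };

/* LMA becomes 1 once paging is enabled with LME already set. */
static bool lma_after_paging(uint64_t efer, uint64_t cr0) {
    return (efer & EFER_LME) && (cr0 & CR0_PG);
}

/* Operating mode selected by LMA and the CS descriptor's L bit. */
static enum cpu_mode current_mode(bool lma, bool cs_l) {
    if (!lma)
        return MODE_LEGACY;
    return cs_l ? MODE_64BIT : MODE_COMPATIBILITY;
}
```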
No-Execute Enable (NXE) Bit. Bit 11. Setting this bit to 1 enables the no-execute page-protection
feature. The feature is disabled when this bit is cleared to 0. See “No Execute (NX) Bit” on page 143
for more information.
Before setting NXE, system software should verify the processor supports the feature by examining
the extended-feature flags returned by the CPUID instruction. For more information, see the CPUID
Specification, order# 25481.
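Here is a sketch of the recommended check-then-enable order for NXE. It assumes the no-execute feature flag is bit 20 of the EDX value returned by CPUID function 8000_0001h, as documented in the CPUID references. The helper names are hypothetical. The live CPUID read relies on the cpuid.h header shipped with GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#endif

#define CPUID_80000001_EDX_NX (UINT32_C(1) << 20) /* no-execute support */
#define EFER_NXE              (UINT64_C(1) << 11)

/* Returns EFER with NXE set only if EDX reports NX support. */
static uint64_t efer_with_nxe_if_supported(uint64_t efer, uint32_t edx) {
    if (edx & CPUID_80000001_EDX_NX)
        efer |= EFER_NXE;
    return efer;
}

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
/* Reads EDX of CPUID 8000_0001h; returns 0 if that function is absent. */
static uint32_t cpuid_80000001_edx(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx;
}
#endif
```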
Secure Virtual Machine Enable (SVME) Bit. Bit 12. Setting this bit to 1 enables the SVM extensions.
When this bit is cleared to 0, the SVM instructions cause invalid-opcode exceptions (#UD). The reset
value of EFER.SVME is 0. The effect of clearing EFER.SVME while a guest is running is undefined;
therefore, the VMM should always prevent guests from writing EFER. SVM extensions can be disabled
by setting VM_CR.SVME_DISABLE. For more information, see the descriptions of the LOCK and
SVME_DISABLE bits in Section 15.28.1, “VM_CR MSR (C001_0114h),” on page 420.
Fast FXSAVE/FXRSTOR (FFXSR) Bit. Bit 14. Setting this bit to 1 enables the FXSAVE and
FXRSTOR instructions to execute faster in 64-bit mode at CPL 0. This is accomplished by not saving
or restoring the XMM registers (XMM0–XMM15). The FFXSR bit has no effect when the
FXSAVE/FXRSTOR instructions are executed outside 64-bit mode or when CPL > 0. The FFXSR bit
does not affect the save/restore of the legacy x87 floating-point state or of MXCSR.
Before setting FFXSR, system software should verify whether this feature is supported by examining
the extended-feature flags returned by the CPUID instruction. For more information, see
“Function 8000_0001h: Processor Signature and AMD Features” in Volume 3.
The AMD64 architecture includes a number of features that are controlled using MSRs. Those MSRs
are shown in Figure 3-10. The EFER register—described in “Extended Feature Enable Register
(EFER)” on page 54—is also an MSR.
The following sections briefly describe the MSRs in the AMD64 architecture.
specific, and are described in the BIOS writer’s guide for the implementation. Implementation-specific
features are not shown in Figure 3-11.
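[Figure 3-11: SYSCFG register format. Bits 31–22: reserved; bit 21: MtrrTom2En (TOM2); bit 20: MtrrVarDramEn (MVDM); bit 19: MtrrFixDramModEn (MFDM); bit 18: MtrrFixDramEn (MFDE); bits 17–0: reserved.]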
The functions of the SYSCFG bits are as follows (all bits are read/write unless otherwise noted):
MtrrFixDramEn Bit. Bit 18. Setting this bit to 1 enables use of the RdMem and WrMem attributes in
the fixed-range MTRR registers. When cleared, these attributes are disabled. The RdMem and
WrMem attributes allow system software to define fixed-range IORRs using the fixed-range MTRRs.
See “Extended Fixed-Range MTRR Type-Field Encodings” on page 197 for information on using this
feature.
MtrrFixDramModEn Bit. Bit 19. Setting this bit to 1 allows software to read and write the RdMem
and WrMem bits. When cleared, writes do not modify the RdMem and WrMem bits, and reads return
0. See “Extended Fixed-Range MTRR Type-Field Encodings” on page 197 for information on using
this feature.
MtrrVarDramEn Bit. Bit 20. Setting this bit to 1 enables the TOP_MEM register and the variable-
range IORRs. These registers are disabled when the bit is cleared to 0. See “IORRs” on page 199 and
“Top of Memory” on page 201 for information on using these features.
MtrrTom2En Bit. Bit 21. Setting this bit to 1 enables the TOP_MEM2 register. The register is
disabled when this bit is cleared to 0. See “Top of Memory” on page 201 for information on using this
feature.
STAR, LSTAR, CSTAR, and SFMASK Registers. These registers provide mode-dependent linkage
information for the SYSCALL and SYSRET instructions. STAR is used in legacy modes, LSTAR in
64-bit mode, and CSTAR in compatibility mode. In long mode, SFMASK specifies which rFLAGS bits
the SYSCALL instruction clears.
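In long mode, SYSCALL uses SFMASK as a mask of rFLAGS bits to clear. The one-line sketch below models only that step. Saving the return state, loading selectors from STAR, and loading the target RIP from LSTAR or CSTAR are omitted. syscall_rflags is a hypothetical name.

```c
#include <stdint.h>

/* Each bit set in SFMASK is cleared in rFLAGS on SYSCALL entry. */
static uint64_t syscall_rflags(uint64_t rflags, uint64_t sfmask) {
    return rflags & ~sfmask;
}
```

For example, an SFMASK value of 0x300 clears both IF (bit 9) and TF (bit 8) on entry.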
FS.base and GS.base Registers. These registers allow 64-bit base-address values to be specified
for the FS and GS segments, for use in 64-bit mode. See “FS and GS Registers in 64-Bit Mode” on
page 70 for a description of the special treatment the FS and GS segments receive.
KernelGSbase Register. This register is used by the SWAPGS instruction. This instruction
exchanges the value located in KernelGSbase with the value located in GS.base.
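The exchange performed by SWAPGS can be modeled with a plain structure. This is a toy model for illustration only. struct gs_state and swapgs are hypothetical names, and the real instruction operates on the GS.base and KernelGSbase registers directly.

```c
#include <stdint.h>

struct gs_state {
    uint64_t gs_base;        /* GS.base */
    uint64_t kernel_gs_base; /* KernelGSbase */
};

/* Exchange the two values, as SWAPGS does. */
static void swapgs(struct gs_state *s) {
    uint64_t tmp = s->gs_base;
    s->gs_base = s->kernel_gs_base;
    s->kernel_gs_base = tmp;
}
```

Because the operation is its own inverse, executing it twice restores the original values.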
MTRRcap Register. This register contains information describing the level of MTRR support
provided by the processor.
MTRRdefType Register. This register establishes the default memory type to be used for physical
memory that is not specifically characterized using the fixed-range and variable-range MTRRs.
MTRRphysBasen and MTRRphysMaskn Registers. These registers form a register pair that can
be used to characterize any address range within the physical-memory space, including all of physical
memory. Up to eight address ranges of varying sizes can be characterized using these registers.
MTRRfixn Registers. These registers are used to characterize fixed-size memory ranges in the first
1 Mbyte of physical-memory space.
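For the variable-range pairs, the usual matching rule is that a physical address falls in a range when the address and the base agree in every bit set in the mask. The sketch assumes the PhysMask valid bit is bit 11 and the address fields occupy bits 51–12. See “Memory-Type Range Registers” for the authoritative layout. mtrr_var_match is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_ADDR_MASK  UINT64_C(0x000FFFFFFFFFF000) /* bits 51-12 */
#define MTRR_MASK_VALID (UINT64_C(1) << 11)          /* PhysMask valid bit */

/* Does physical address `pa` fall in the range described by one
 * MTRRphysBasen/MTRRphysMaskn pair? */
static bool mtrr_var_match(uint64_t phys_base, uint64_t phys_mask, uint64_t pa) {
    if (!(phys_mask & MTRR_MASK_VALID))
        return false;                         /* pair not in use */
    uint64_t mask = phys_mask & MTRR_ADDR_MASK;
    return (pa & mask) == (phys_base & mask);
}
```

In the test values, a 256-Mbyte range at 1000_0000h uses a mask with bits 51–28 set plus the valid bit.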
PAT Register. This register allows memory-type characterization based on the virtual (linear)
address. It is an extension to the PCD and PWT memory types supported by the legacy paging
mechanism. The PAT mechanism provides the same memory-typing capabilities as the MTRRs, but
with the added flexibility provided by the paging mechanism.
TOP_MEM and TOP_MEM2 Registers. These top-of-memory registers allow system software to
specify physical-address ranges as memory-mapped I/O locations.
Refer to “Memory-Type Range Registers” on page 183 for more information on using these registers.
DebugCtlMSR Register. This register provides control over control-transfer recording, single
stepping, external-breakpoint reporting, and trace messages.
TSC Register. This register is used to count processor-clock cycles. It can be read using the RDMSR
instruction or either of the read time-stamp counter instructions, RDTSC or RDTSCP. System
software can make RDTSC and RDTSCP available for use by non-privileged software by clearing the
time-stamp disable bit (CR4.TSD) to 0.
PerfEvtSeln Registers. These registers are used to specify the events counted by the corresponding
performance counter, and to control other aspects of its operation.
PerfCtrn Registers. These registers are performance counters that hold a count of processor events
or the duration of events, under the control of the corresponding PerfEvtSeln register. Each PerfCtrn
register can be read using the RDMSR instruction or the read performance-monitoring counter
instruction, RDPMC. System software can make RDPMC available for use by non-privileged
software by setting the performance-monitoring counter enable bit (CR4.PCE) to 1.
Refer to “Using Performance Counters” on page 346 for more information on using these registers.
MCG_CAP Register. This register identifies the machine-check capabilities supported by the
processor.
MCG_CTL Register. This register provides global control over machine-check-error reporting.
MCG_STATUS Register. This register reports global status on detected machine-check errors.
The second type is error-reporting register banks, which report on machine-check errors associated
with a specific processor unit (or group of processor units). There can be different numbers of register
banks for each processor implementation, and each bank is numbered from 0 to i. The registers in each
bank perform the following functions:
Refer to “Using Machine Check Features” on page 267 for more information on using these registers.
See “CPUID” in Volume 3 for details on the operation of this instruction, and the CPUID Specification
(order# 25481) for information returned by each processor implementation.
When this occurs, the selector is updated and the segment base is set to selector * 16. The segment
limit and segment attributes are unchanged, but are normally 64 Kbytes (the maximum allowable limit)
and read/write data, respectively.
On far transfers, the CS (code segment) selector is updated to the new value, and the CS segment base
is set to selector * 16. The CS segment limit and attributes are unchanged, but are usually 64 Kbytes
and read/write, respectively.
For information on how the interrupt descriptor table (IDT) is used in real mode, see “Real-Mode
Interrupt Control Transfers” on page 229.
The GDT, LDT, and TSS (see below) are not used in real mode.
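The real-mode base calculation described above leads directly to the familiar selector-times-16 address formation. Here is a minimal sketch with a hypothetical function name. It does not model wraparound at 1 Mbyte.

```c
#include <stdint.h>

/* Real-mode address: segment base (selector * 16) plus 16-bit offset. */
static uint32_t real_mode_linear(uint16_t selector, uint16_t offset) {
    uint32_t base = (uint32_t)selector * 16;
    return base + offset;
}
```

For example, selector 1234h with offset 0010h gives 12340h + 10h = 12350h.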
The multi-segmented memory model provides the greatest level of flexibility for system software
using the segmentation mechanism.
Compatibility mode allows the multi-segmented model to be used in support of legacy software.
However, in compatibility mode, the multi-segmented memory model is restricted to the first 4 Gbytes
of virtual-memory space. Access to virtual memory above 4 Gbytes requires the use of 64-bit mode,
which does not support segmentation.
software stacks. The TSS and task-switch mechanism are described in Chapter 12, “Task
Management.”
• Segment Selectors—Descriptors are selected for use from the descriptor tables using a segment
selector. A segment selector contains an index into either the GDT or LDT. The IDT is indexed
using an interrupt vector, as described in “Legacy Protected-Mode Interrupt Control Transfers” on
page 231, and in “Long-Mode Interrupt Control Transfers” on page 241.
Figure 4-2 on page 67 shows the registers used by the segmentation mechanism. The registers have the
following relationship to the data structures:
• Segment Registers—The six segment registers (CS, DS, ES, FS, GS, and SS) are used to point to
the user segments. A segment selector selects a descriptor when it is loaded into one of the segment
registers. This causes the processor to automatically load the selected descriptor into a software-
invisible portion of the segment register.
• Descriptor-Table Registers—The three descriptor-table registers (GDTR, LDTR, and IDTR) are
used to point to the system segments. The descriptor-table registers identify the virtual-memory
location and size of the descriptor tables.
• Task Register (TR)—Describes the location and limit of the current task state segment (TSS).
[Figure: segmentation registers; recovered labels include FS, GS, the stack-segment register (SS), the local-descriptor-table register (LDTR), and the task register (TR).]
A fourth system-segment register, the TR, points to the TSS. The data structures and registers
associated with task-state segments are described in “Task-Management Resources” on page 308.
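[Figure: segment-selector format. Bits 15–3: selector index (SI); bit 2: table indicator (TI); bits 1–0: requestor privilege level (RPL).]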
Selector Index Field. Bits 15–3. The selector-index field specifies an entry in the descriptor table.
Descriptor-table entries are eight bytes long, so the selector index is scaled by 8 to form a byte offset
into the descriptor table. The offset is then added to either the global or local descriptor-table base
address (as indicated by the table-index bit) to form the descriptor-entry address in virtual-address
space.
Some descriptor entries in long mode are 16 bytes long rather than 8 bytes (see “Legacy Segment
Descriptors” on page 77 for more information on long-mode descriptor-table entries). These expanded
descriptors consume two entries in the descriptor table. Long mode, however, continues to scale the
selector index by eight to form the descriptor-table offset. It is the responsibility of system software to
assign selectors such that they correctly point to the start of an expanded entry.
Table Indicator (TI) Bit. Bit 2. The TI bit indicates which table holds the descriptor referenced by the
selector index. When TI=0 the GDT is used and when TI=1 the LDT is used. The descriptor-table base
address is read from the appropriate descriptor-table register and added to the scaled selector index as
described above.
Requestor Privilege-Level (RPL) Field. Bits 1–0. The RPL represents the privilege level (CPL) the
processor is operating under at the time the selector is created.
RPL is used in segment privilege-checks to prevent software running at lesser privilege levels from
accessing privileged data. See “Data-Access Privilege Checks” on page 95 and “Control-Transfer
Privilege Checks” on page 98 for more information on segment privilege-checks.
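Decoding a selector and locating its descriptor entry follows directly from the three fields above. The sketch assumes 8-byte table entries, so a 16-byte long-mode descriptor simply occupies two consecutive indexes. The structure and function names are hypothetical.

```c
#include <stdint.h>

struct selector_fields {
    unsigned index; /* bits 15-3 */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1-0 */
};

static struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f = { sel >> 3, (sel >> 2) & 1, sel & 3 };
    return f;
}

/* Descriptor address: index scaled by 8, added to the table base
 * selected by TI. */
static uint64_t descriptor_address(uint16_t sel, uint64_t gdt_base,
                                   uint64_t ldt_base) {
    struct selector_fields f = decode_selector(sel);
    return (f.ti ? ldt_base : gdt_base) + (uint64_t)f.index * 8;
}
```

For example, selector 2Bh decodes to index 5, TI=0, RPL=3, so its descriptor is 40 bytes past the GDT base.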
Null Selector. Null selectors have a selector index of 0 and TI=0, corresponding to the first entry in
the GDT. However, null selectors do not reference the first GDT entry but are instead used to invalidate
unused segment registers. A general-protection exception (#GP) occurs if software references memory
through a segment register containing a null selector outside 64-bit mode. By initializing unused
segment registers with null selectors, software can trap references to unused segments.
Null selectors can only be loaded into the DS, ES, FS, and GS data-segment registers, and into the
LDTR descriptor-table register. A #GP occurs if software attempts to load the CS register with a null
selector, or attempts to load the SS register with a null selector outside 64-bit mode or at CPL 3.
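The null-selector load rules can be captured in a small predicate. It models only the conditions stated above, and it ignores RPL when testing for a null selector (any selector with index 0 and TI=0 is null). The enumeration and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum seg_reg { SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS };

/* Index 0 and TI=0; the RPL field does not matter. */
static bool is_null_selector(uint16_t sel) {
    return (sel & 0xFFFC) == 0;
}

/* True if loading `sel` into `reg` raises #GP under the null rules. */
static bool null_load_faults(enum seg_reg reg, uint16_t sel,
                             bool mode64, unsigned cpl) {
    if (!is_null_selector(sel))
        return false;
    if (reg == SEG_CS)
        return true;
    if (reg == SEG_SS)
        return !mode64 || cpl == 3;
    return false;   /* DS, ES, FS, and GS accept null selectors */
}
```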
The processor maintains a hidden portion of the segment register in addition to the selector value
loaded by software. This hidden portion contains the values found in the descriptor-table entry
referenced by the segment selector. The processor loads the descriptor-table entry into the hidden
portion when the segment register is loaded. By keeping the corresponding descriptor-table entry in
hardware, performance is optimized for the majority of memory references.
Figure 4-4 shows the format of the visible and hidden portions of the segment register. Except for the
FS and GS segment base, software cannot directly read or write the hidden portion (shown as gray-
shaded boxes in Figure 4-4).
[Figure 4-4: segment-register format; recovered labels: Selector (visible portion) and Segment Attributes (hidden portion).]
CS Register. The CS register contains the segment selector referencing the current code-segment
descriptor entry. All instruction fetches reference the CS descriptor. When a new selector is loaded into
the CS register, the current-privilege level (CPL) of the processor is set to that of the CS-segment
descriptor-privilege level (DPL).
Data-Segment Registers. The DS register contains the segment selector referencing the default
data-segment descriptor entry. The SS register contains the stack-segment selector. The ES, FS, and
GS registers are optionally loaded with segment selectors referencing other data segments. Data
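accesses default to referencing the DS descriptor except in the following two cases:
• The ES descriptor is referenced for string-instruction destinations.
• The SS descriptor is referenced for stack operations.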
CS Register in 64-Bit Mode. In 64-bit mode, most of the hidden portion of the CS register is
ignored. Only the L (long), D (default operation size), and DPL (descriptor privilege-level) attributes
are recognized by 64-bit mode. Address calculations assume a CS.base value of 0. CS references do
not check the CS.limit value, but instead check that the effective address is in canonical form.
DS, ES, and SS Registers in 64-Bit Mode. In 64-bit mode, the contents of the ES, DS, and SS
segment registers are ignored. All fields (base, limit, and attribute) in the hidden portion of the segment
registers are ignored.
Address calculations in 64-bit mode that reference the ES, DS, or SS segments are treated as if the
segment base is 0. Instead of performing limit checks, the processor checks that all virtual-address
references are in canonical form.
Neither enabling and activating long mode nor switching between 64-bit and compatibility modes
changes the contents of the visible or hidden portions of the segment registers. These registers remain
unchanged during 64-bit mode execution unless explicit segment loads are performed.
FS and GS Registers in 64-Bit Mode. Unlike the CS, DS, ES, and SS segments, the FS and GS
segment overrides can be used in 64-bit mode. When FS and GS segment overrides are used in 64-bit
mode, their respective base addresses are used in the effective-address (EA) calculation. The complete
EA calculation then becomes (FS or GS).base + base + (scale ∗ index) + displacement. The FS.base
and GS.base values are also expanded to the full 64-bit virtual-address size, as shown in Figure 4-5.
The resulting EA calculation is allowed to wrap across positive and negative addresses.
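As a sketch of the 64-bit effective-address formula, the function below adds the FS or GS base to the base, scaled index, and displacement using unsigned 64-bit arithmetic, which wraps modulo 2^64 in the way the text allows. The function name and parameters are hypothetical, the displacement is assumed to be a sign-extended 32-bit value, and the canonical-form check the processor applies to the result is not modeled.

```c
#include <stdint.h>

/* EA with an FS or GS override in 64-bit mode:
   (FS or GS).base + base + (scale * index) + displacement.
   uint64_t arithmetic wraps modulo 2^64, so the sum can wrap across
   positive and negative addresses. */
static uint64_t ea_fs_gs_64(uint64_t seg_base, uint64_t base_reg,
                            uint64_t index_reg, unsigned scale, int32_t disp)
{
    /* Assumption: the 32-bit displacement is sign-extended to 64 bits. */
    uint64_t disp64 = (uint64_t)(int64_t)disp;
    return seg_base + base_reg + (uint64_t)scale * index_reg + disp64;
}
```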
[Figure 4-5. FS and GS Segment-Register Format in 64-Bit Mode: Selector, Segment Attributes, 64-bit base]
In 64-bit mode, FS-segment and GS-segment overrides are not checked for limit or attributes. Instead,
the processor checks that all virtual-address references are in canonical form.
Segment register-load instructions (MOV to Sreg and POP Sreg) load only a 32-bit base-address value
into the hidden portion of the FS and GS segment registers. The base-address bits above the low 32 bits
are cleared to 0 as a result of a segment-register load.
To allow loading all 64 bits of the base address, the FS.base and GS.base hidden descriptor-register
fields are mapped to MSRs. Privileged software (CPL=0) can load the 64-bit base address into FS.base
or GS.base using a single WRMSR instruction. The addresses written into the expanded FS.base and
GS.base registers must be in canonical form. A WRMSR instruction that attempts to write a non-
canonical address to these registers causes a general-protection exception (#GP) to occur.
The FS.base MSR address is C000_0100h while the GS.base MSR address is C000_0101h.
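Before writing FS.base or GS.base, system software can check the value for canonical form itself. A minimal sketch follows. The helper names are hypothetical, and the number of implemented virtual-address bits (`va_bits`, for example 48) is an implementation-specific assumption that this section does not fix. The WRMSR itself is privileged and is not executed here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_FS_BASE 0xC0000100u /* FS.base MSR address (C000_0100h) */
#define MSR_GS_BASE 0xC0000101u /* GS.base MSR address (C000_0101h) */

/* Canonical form: bits 63 through (va_bits - 1) are all equal.
   va_bits (1..64) is implementation-specific; 48 is a common value. */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    uint64_t upper = addr >> (va_bits - 1);
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);
    return upper == 0 || upper == all_ones;
}

/* Hypothetical pre-check: true if a WRMSR of `value` to MSR_FS_BASE or
   MSR_GS_BASE would raise #GP (not at CPL 0, or value not canonical). */
static bool fs_gs_base_write_faults(uint64_t value, unsigned cpl, unsigned va_bits)
{
    return cpl != 0 || !is_canonical(value, va_bits);
}
```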
When in compatibility mode, the FS and GS overrides operate as defined by the legacy x86
architecture regardless of the value loaded into the high 32 bits of the hidden descriptor-register base-
address field. Compatibility mode ignores the high 32 bits when calculating an effective address.
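In compatibility mode the same override reduces to legacy 32-bit arithmetic. The hypothetical helper below sketches this by truncating the segment base to its low 32 bits and letting the sum wrap modulo 2^32.

```c
#include <stdint.h>

/* Compatibility mode: the high 32 bits of FS.base/GS.base are ignored and the
   address is formed with 32-bit (modulo 2^32) arithmetic, as in legacy x86. */
static uint32_t ea_fs_gs_compat(uint64_t seg_base, uint32_t offset)
{
    uint32_t base_low = (uint32_t)seg_base; /* drop bits 63-32 */
    return (uint32_t)(base_low + offset);
}
```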
describe a segment (see “Null Selector” on page 68 for information on using the null selector). The
first usable GDT entry is referenced with a selector index of 1.
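[Figure: Segment-selector indexing. The TI bit selects the global (TI=0) or local (TI=1) descriptor table; the selector index locates the descriptor relative to the descriptor-table base address, the descriptor-table limit bounds the table, and the first GDT entry is unused.]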
Figure 4-8 on page 73 shows the format of the GDTR in 64-bit mode.
Limit. 2 bytes. These bits define the 16-bit limit, or size, of the GDT in bytes. The limit value is added
to the base address to yield the ending byte address of the GDT. A general-protection exception (#GP)
occurs if software attempts to access a descriptor beyond the GDT limit.
The offsets into the descriptor tables are not extended by the AMD64 architecture in support of long
mode. Therefore, the GDTR and IDTR limit-field sizes are unchanged from the legacy sizes. The
processor does check the limits in long mode during GDT and IDT accesses.
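The GDT limit check can be sketched as follows. The descriptor selected by a selector occupies 8 bytes starting at offset index * 8, and because the limit is the offset of the table's last valid byte, the whole descriptor fits only if index * 8 + 7 does not exceed the limit. The function name is hypothetical, and the sketch covers 8-byte legacy descriptors only.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 8-byte descriptor selected by `sel` lies entirely within a
   descriptor table whose 16-bit limit is `limit` (offset of its last byte).
   A false result corresponds to the #GP described above. */
static bool descriptor_within_limit(uint16_t sel, uint16_t limit)
{
    uint32_t index = sel >> 3;             /* bits 15-3: selector index */
    uint32_t last_byte = index * 8u + 7u;  /* offset of the descriptor's last byte */
    return last_byte <= limit;
}
```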
Base Address. 8 bytes. The base-address field holds the starting byte address of the GDT in virtual-
memory space. The GDT can be located at any byte address in virtual memory, but system software
should align the GDT on a doubleword boundary to avoid the potential performance penalties
associated with accessing unaligned data.
The AMD64 architecture increases the base-address field of the GDTR to 64 bits so that system
software running in long mode can locate the GDT anywhere in the 64-bit virtual-address space. The
processor ignores the high-order 4 bytes of the base address when running in legacy mode.
Loading a null selector into the LDTR is useful if software does not use an LDT. This causes a #GP if
an erroneous reference is made to the LDT.
[Figures: Global and local descriptor tables (LDT selector, LDT attributes); LDTR format (Selector, Descriptor Attributes)]
Figure 4-11 shows the format of the LDTR in long mode (both compatibility mode and 64-bit mode).
[Figure 4-11. LDTR Format in Long Mode: Selector, Descriptor Attributes]
LDT Selector. 2 bytes. These bits are loaded explicitly from the TSS during a task switch, or by using
the LLDT instruction. The LDT selector must point to an LDT system-segment descriptor entry in the
GDT. If it does not, a general-protection exception (#GP) occurs.
The following three fields are loaded automatically from the LDT descriptor in the GDT as a result of
loading the LDT selector. The register fields are shown as shaded boxes in Figure 4-10 and
Figure 4-11.
Base Address. The base-address field holds the starting byte address of the LDT in virtual-memory
space. Like the GDT, the LDT can be located anywhere in system memory, but software should align
the LDT on a doubleword boundary to avoid performance penalties associated with accessing
unaligned data.
The AMD64 architecture expands the base-address field of the LDTR to 64 bits so that system
software running in long mode can locate an LDT anywhere in the 64-bit virtual-address space. The
processor ignores the high-order 32 base-address bits when running in legacy mode. Because the
LDTR is loaded from the GDT, the system-segment descriptor format (LDTs are system segments) has
been expanded by the AMD64 architecture in support of 64-bit mode. See “Long Mode Descriptor
Summary” on page 92 for more information on this expanded format. The high-order base-address bits
are only loaded from 64-bit mode using the LLDT instruction (see “LLDT and LTR Instructions” on
page 155 for more information on this instruction).
Limit. This field defines the limit, or size, of the LDT in bytes. The LDT limit as stored in the LDTR
is 32 bits. When the LDT limit is loaded from the GDT descriptor entry, the 20-bit limit field in the
descriptor is expanded to 32 bits and scaled based on the value of the descriptor granularity (G) bit. For
details on the limit biasing and granularity, see “Granularity (G) Bit” on page 79.
If an attempt is made to access a descriptor beyond the LDT limit, a general-protection exception
(#GP) occurs.
The offsets into the descriptor tables are not extended by the AMD64 architecture in support of long
mode. Therefore, the LDTR limit-field size is unchanged from the legacy size. The processor does
check the LDT limit in long mode during LDT accesses.
Attributes. This field holds the descriptor attributes, such as privilege rights, segment presence and
segment granularity.
[Figure: IDT indexing. The interrupt vector, scaled by the descriptor-entry size, is added to the interrupt-descriptor-table base address to locate the gate descriptor.]
• System Segments—System segments consist of LDT segments and task-state segments (TSS).
Gate descriptors are another type of system-segment descriptor. Rather than describing segments,
gate descriptors point to program entry points.
Figure 4-13 shows the generic format for user-segment and system-segment descriptors. User and
system segments are differentiated using the S bit. S=1 indicates a user segment, and S=0 indicates a
system segment. Gray shading indicates the field or bit is reserved. The format for a gate descriptor
differs from the generic segment descriptor, and is described separately in “Gate Descriptors” on
page 84.
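[Figure 4-13. Generic Segment Descriptor, Legacy Mode. Byte +4: Base Address 31–24 (bits 31–24), G (23), D/B (22), reserved (21), AVL (20), Segment Limit 19–16 (19–16), P (15), DPL (14–13), S (12), Type (11–8), Base Address 23–16 (7–0). Byte +0: Base Address 15–0 (31–16), Segment Limit 15–0 (15–0).]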
Figure 4-13 shows the fields in a generic, legacy-mode, 8-byte segment descriptor. In this figure, +0
indicates the address of the descriptor’s first byte, and +4 indicates the address of the descriptor’s fifth
byte. The fields are defined as follows, from least-significant to most-significant bit positions:
Segment Limit. The 20-bit segment limit is formed by concatenating bits 19–16 of byte +4 with bits
15–0 of byte +0. The segment limit defines the segment size, in bytes. The granularity (G) bit controls
how the segment-limit field is scaled (see “Granularity (G) Bit” on page 79). For data segments, the
expand-down (E) bit determines whether the segment limit defines the lower or upper segment-
boundary (see “Expand-Down (E) Bit” on page 82).
If software uses a segment descriptor to reference an address beyond the segment limit, a
general-protection exception (#GP) occurs. The #GP occurs if any part of the memory reference falls outside
the segment limit. For example, a doubleword (4-byte) memory reference causes a #GP if one or more
of its bytes are located beyond the segment limit.
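The rule that every byte of a multi-byte reference must lie within the limit can be sketched for an expand-up segment as follows. The function name is hypothetical, `limit` is assumed to be the effective (already scaled) limit, and 64-bit arithmetic keeps `offset + size - 1` from wrapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Expand-up segment: true if all `size` bytes starting at `offset` are at or
   below the effective limit; false corresponds to the limit violation above. */
static bool access_within_limit(uint32_t offset, uint32_t size, uint32_t limit)
{
    if (size == 0)
        return false;                             /* not a meaningful access */
    uint64_t last = (uint64_t)offset + size - 1;  /* last byte touched */
    return last <= limit;
}
```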
Base Address. The 32-bit base address is formed by concatenating bits 31–24 of byte +4 with bits
7–0 of byte +4, and with bits 31–16 of byte +0. The segment-base address field locates the start of a
segment in virtual-address space.
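The field concatenations above can be expressed as bit operations on the descriptor's two doublewords. This sketch assumes the descriptor has been read as little-endian doublewords, `lo` at byte +0 and `hi` at byte +4; the function names are hypothetical. The limit returned is the raw 20-bit field, before any granularity scaling.

```c
#include <stdint.h>

/* Base address: +4[31:24] : +4[7:0] : +0[31:16]. */
static uint32_t desc_base(uint32_t lo, uint32_t hi)
{
    return (hi & 0xFF000000u)          /* bits 31-24 of +4 -> base 31-24 */
         | ((hi & 0x000000FFu) << 16)  /* bits 7-0 of +4   -> base 23-16 */
         | (lo >> 16);                 /* bits 31-16 of +0 -> base 15-0  */
}

/* Raw 20-bit segment limit: +4[19:16] : +0[15:0]. */
static uint32_t desc_raw_limit(uint32_t lo, uint32_t hi)
{
    return (hi & 0x000F0000u)          /* bits 19-16 of +4 -> limit 19-16 */
         | (lo & 0x0000FFFFu);         /* bits 15-0 of +0  -> limit 15-0  */
}
```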
S Bit and Type Field. Bit 12 of byte +4, and bits 11–8 of byte +4. The S and Type fields, together,
specify the descriptor type and its access characteristics. Table 4-2 summarizes the descriptor types by
S-field encoding and gives a cross reference to descriptions of the Type-field encodings.
Table 4-2. Descriptor Types
S Field       Descriptor Type    Type-Field Encoding
0 (System)    LDT                See Table 4-5 on page 83
              TSS
              Gate
1 (User)      Code               See Table 4-3 on page 81
              Data               See Table 4-4 on page 82
Descriptor Privilege-Level (DPL) Field. Bits 14–13 of byte +4. The DPL field indicates the
descriptor-privilege level of the segment. DPL can be set to any value from 0 to 3, with 0 specifying the
most privilege and 3 the least privilege. See “Data-Access Privilege Checks” on page 95 and “Control-
Transfer Privilege Checks” on page 98 for more information on how the DPL is used during segment
privilege-checks.
Present (P) Bit. Bit 15 of byte +4. The segment-present bit indicates that the segment referenced by
the descriptor is loaded in memory. If a reference is made to a descriptor entry when P=0, a segment-
not-present exception (#NP) occurs. This bit is set and cleared by system software and is never altered
by the processor.
Available To Software (AVL) Bit. Bit 20 of byte +4. This field is available to software, which can
write any value to it. The processor does not set or clear this field.
Default Operand Size (D/B) Bit. Bit 22 of byte +4. The default operand-size bit is found in code-
segment and data-segment descriptors but not in system-segment descriptors. Setting this bit to 1
indicates a 32-bit default operand size, and clearing it to 0 indicates a 16-bit default size. The effect this
bit has on a segment depends on the segment-descriptor type. See “Code-Segment Default-Operand
Size (D) Bit” on page 81 for a description of the D bit in code-segment descriptors. “Data-Segment
Default Operand Size (D/B) Bit” on page 83 describes the D bit in data-segment descriptors, including
stack segments, where the bit is referred to as the “B” bit.
Granularity (G) Bit. Bit 23 of byte +4. The granularity bit specifies how the segment-limit field is
scaled. Clearing the G bit to 0 indicates that the limit field is not scaled; the limit is expressed in
bytes. Setting the G bit to 1 indicates that the limit field is scaled by 4 Kbytes (4096 bytes); the limit
is expressed in 4-Kbyte blocks, and the low 12 bits of the scaled limit are filled with 1s.
A limit field of 0 with G = 0 produces an effective limit of 0, which is a 1-byte segment. The same limit
field of 0 with G = 1 produces an effective limit of 4095 (FFFh), which is a 4096-byte segment.
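Granularity scaling can be sketched as a shift: with G=1 the 20-bit limit field is shifted left by 12 bits and the low 12 bits are filled with 1s, which reproduces both examples above. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective limit from the 20-bit limit field and the G bit. */
static uint32_t effective_limit(uint32_t limit_field, bool g)
{
    limit_field &= 0xFFFFFu;                   /* keep the 20 defined bits */
    return g ? (limit_field << 12) | 0xFFFu    /* 4-Kbyte units, low 12 bits set */
             : limit_field;                    /* byte units */
}
```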
Reserved Bits. Generally, software should clear all reserved bits to 0, so they can be defined in future
revisions to the AMD64 architecture.
[Figure: Code-Segment Descriptor, Legacy Mode. Byte +4: Base Address 31–24, G, D, AVL, Segment Limit 19–16, P, DPL, S=1, bit 11=1, C, R, A, Base Address 23–16.]
Code-segment descriptors have the S bit set to 1, identifying the segments as user segments. Type-field
bit 11 differentiates code-segment descriptors (bit 11 set to 1) from data-segment descriptors (bit 11
cleared to 0). The remaining type-field bits (10–8) define the access characteristics for the code-
segment, as follows:
Conforming (C) Bit. Bit 10 of byte +4. Setting this bit to 1 identifies the code segment as conforming.
When control is transferred to a higher-privilege conforming code-segment (C=1) from a lower-
privilege code segment, the processor CPL does not change. Transfers to non-conforming code-
segments (C=0) with a higher privilege-level than the CPL can occur only through gate descriptors.
See “Control-Transfer Privilege Checks” on page 98 for more information on conforming and non-
conforming code-segments.
Readable (R) Bit. Bit 9 of byte +4. Setting this bit to 1 indicates the code segment is both executable
and readable as data. When this bit is cleared to 0, the code segment is executable, but attempts to read
data from the code segment cause a general-protection exception (#GP) to occur.
Accessed (A) Bit. Bit 8 of byte +4. The accessed bit is set to 1 by the processor when the descriptor is
copied from the GDT or LDT into the CS register. This bit is only cleared by software.
Table 4-3 on page 81 summarizes the code-segment type-field encodings.
Code-Segment Default-Operand Size (D) Bit. Bit 22 of byte +4. In code-segment descriptors, the
D bit selects the default operand size and address size. In legacy mode, when D=0 the default operand
size and address size are 16 bits, and when D=1 they are 32 bits.
Instruction prefixes can be used to override the operand size, the address size, or both.
[Figure: Data-Segment Descriptor, Legacy Mode. Byte +4: Base Address 31–24, G, D/B, AVL, Segment Limit 19–16, P, DPL, S=1, bit 11=0, E, W, A, Base Address 23–16.]
Data-segment descriptors have the S bit set to 1, identifying them as user segments. Type-field bit 11
differentiates data-segment descriptors (bit 11 cleared to 0) from code-segment descriptors (bit 11 set
to 1). The remaining type-field bits (10–8) define the data-segment access characteristics, as follows:
Expand-Down (E) Bit. Bit 10 of byte +4. Setting this bit to 1 identifies the data segment as expand-
down. In expand-down segments, the segment limit defines the lower segment boundary while the
base is the upper boundary. Valid segment offsets in expand-down segments lie in the byte range
limit+1 to FFFFh or FFFF_FFFFh, depending on the value of the data segment default operand size
(D/B) bit.
Expand-down segments are useful for stacks, which grow in the downward direction as elements are
pushed onto the stack. The stack pointer, ESP, is decremented by an amount equal to the operand size
as a result of executing a PUSH instruction.
Clearing the E bit to 0 identifies the data segment as expand-up. Valid segment offsets in expand-up
segments lie in the byte range 0 to segment limit.
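For an expand-down segment, the valid range runs from limit+1 up to 0FFFFh or 0FFFF_FFFFh, depending on the D/B bit. A minimal sketch of that check for a multi-byte access follows; the function name is hypothetical and `limit` is assumed to be the effective (scaled) limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Expand-down data segment: every byte of the access must lie in
   limit+1 .. 0FFFFh (D/B=0) or limit+1 .. 0FFFF_FFFFh (D/B=1). */
static bool expand_down_access_ok(uint32_t offset, uint32_t size,
                                  uint32_t limit, bool d_b)
{
    if (size == 0)
        return false;                                 /* not a meaningful access */
    uint64_t upper = d_b ? 0xFFFFFFFFull : 0xFFFFull; /* upper segment bound */
    uint64_t last = (uint64_t)offset + size - 1;      /* last byte touched */
    return offset > limit && last <= upper;
}
```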
Writable (W) Bit. Bit 9 of byte +4. Setting this bit to 1 identifies the data segment as read/write. When
this bit is cleared to 0, the segment is read-only. A general-protection exception (#GP) occurs if
software attempts to write into a data segment when W=0.
Accessed (A) Bit. Bit 8 of byte +4. The accessed bit is set to 1 by the processor when the descriptor is
copied from the GDT or LDT into one of the data-segment registers or the stack-segment register. This
bit is only cleared by software.
Table 4-4 summarizes the data-segment type-field encodings.
Data-Segment Default Operand Size (D/B) Bit. Bit 22 of byte +4. For expand-down data segments
(E=1), setting D to 1 sets the upper bound of the segment at 0_FFFF_FFFFh. Clearing D to 0 sets the
upper bound of the segment at 0_FFFFh.
In the case where a data segment is referenced by the stack selector (SS), the D bit is referred to as the
B bit. For stack segments, the B bit sets the default stack size. Setting B to 1 establishes a 32-bit stack
referenced by the 32-bit ESP register. Clearing B to 0 establishes a 16-bit stack referenced by the 16-bit
SP register.
Figure 4-16 shows the legacy-mode system-segment descriptor format used for referencing LDT and
TSS segments (gray shading indicates the bit is reserved). This format is also used in compatibility
mode. The system-segments are used as follows:
• The LDT typically holds segment descriptors belonging to a single task (see “Local Descriptor
Table” on page 73).
• The TSS is a data structure for holding processor-state information. Processor state is saved in a
TSS when a task is suspended, and state is restored from the TSS when a task is restarted. System
software must create at least one TSS referenced by the task register, TR. See “Legacy Task-State
Segment” on page 313 for more information on the TSS.
[Figure 4-16. LDT and TSS Descriptor, Legacy and Compatibility Modes. Byte +4: Base Address 31–24, G, IGN, AVL, Segment Limit 19–16, P, DPL, S=0, Type, Base Address 23–16.]
[Figures: Legacy-mode gate descriptors. Call gate, byte +4: Target Code-Segment Offset 31–16, P, DPL, S=0, Type, Reserved (IGN, bits 7–5), Parameter Count (bits 4–0).]
There are several differences between the gate-descriptor format and the system-segment descriptor
format. These differences are described as follows, from least-significant to most-significant bit
positions:
Target Code-Segment Offset. The 32-bit segment offset is formed by concatenating bits 31–16 of
byte +4 with bits 15–0 of byte +0. The segment-offset field specifies the target-procedure entry point
(offset) into the segment. This field is loaded into the EIP register as a result of a control transfer using
the gate descriptor.
Target Code-Segment Selector. Bits 31–16 of byte +0. The segment-selector field identifies the
target-procedure segment descriptor, located in either the GDT or LDT. The segment selector is loaded
into the CS segment register as a result of a control transfer using the gate descriptor.
TSS Selector. Bits 31–16 of byte +0 (task gates only). This field identifies the target-task TSS
descriptor. The task gate itself can be located in any of the three descriptor tables (GDT, LDT, and
IDT), but the TSS descriptor it references must be located in the GDT.
Parameter Count (Call Gates Only). Bits 4–0 of byte +4. Legacy-mode call-gate descriptors
contain a 5-bit parameter-count field. This field specifies the number of parameters to be copied from
the currently-executing program stack to the target program stack during an automatic stack switch.
Automatic stack switches are performed by the processor during a control transfer through a call gate
to a greater privilege-level. The parameter size depends on the call-gate size as specified in the type
field. 32-bit call gates copy 4-byte parameters, and 16-bit call gates copy 2-byte parameters. See
“Stack Switching” on page 106 for more information on call-gate parameter copying.
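The gate fields described above can be decoded from a legacy call gate's two doublewords as follows. This is a sketch: the `call_gate` structure and the function names are hypothetical, the descriptor is assumed to be read as little-endian doublewords (`lo` at +0, `hi` at +4), and whether the gate is 16-bit or 32-bit is passed in rather than decoded from the type field.

```c
#include <stdbool.h>
#include <stdint.h>

struct call_gate {          /* hypothetical decoded view of a legacy call gate */
    uint16_t selector;      /* target code-segment selector (loaded into CS) */
    uint32_t offset;        /* target code-segment offset (loaded into EIP) */
    unsigned param_count;   /* parameters copied on an automatic stack switch */
};

static struct call_gate decode_call_gate(uint32_t lo, uint32_t hi)
{
    struct call_gate g;
    g.selector    = (uint16_t)(lo >> 16);                /* bits 31-16 of +0 */
    g.offset      = (hi & 0xFFFF0000u) | (lo & 0xFFFFu); /* +4[31:16] : +0[15:0] */
    g.param_count = hi & 0x1Fu;                          /* bits 4-0 of +4 */
    return g;
}

/* Bytes copied between stacks: 4-byte parameters for 32-bit gates,
   2-byte parameters for 16-bit gates. */
static unsigned param_bytes(struct call_gate g, bool gate_is_32bit)
{
    return g.param_count * (gate_is_32bit ? 4u : 2u);
}
```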
[Figure: Code-Segment Descriptor, Long Mode. Byte +4: Base Address 31–24, G, D, L, AVL, Segment Limit 19–16, P, DPL, S=1, bit 11=1, C, R, A, Base Address 23–16.]
Fields Ignored in 64-Bit Mode. Segmentation is disabled in 64-bit mode, and code segments span
all of virtual memory. In this mode, code-segment base addresses are ignored. For the purpose of
virtual-address calculations, the base address is treated as if it has a value of zero.
Segment-limit checking is not performed, and both the segment-limit field and granularity (G) bit are
ignored. Instead, the virtual address is checked to see if it is in canonical-address form.
The readable (R) and accessed (A) attributes in the type field are also ignored.
Long (L) Attribute Bit. Bit 21 of byte +4. Long mode introduces a new attribute, the long (L) bit, in
code-segment descriptors. This bit specifies whether the processor is running in 64-bit mode (L=1) or
compatibility mode (L=0). When the processor is running in legacy mode, this bit is reserved.
Compatibility mode maintains binary compatibility with legacy 16-bit and 32-bit applications.
Compatibility mode is selected on a code-segment basis, and it allows legacy applications to coexist
under the same 64-bit system software along with 64-bit applications running in 64-bit mode. System
software running in long mode can execute existing 16-bit and 32-bit applications by clearing the L bit
of the code-segment descriptor to 0.
When L=0, the legacy meaning of the code-segment D bit (see “Code-Segment Default-Operand Size
(D) Bit” on page 81)—and the address-size and operand-size prefixes—are observed. Segmentation is
enabled when L=0. From an application viewpoint, the processor is in a legacy 16-bit or 32-bit
operating environment (depending on the D bit), even though long mode is activated.
If the processor is running in 64-bit mode (L=1), the only valid setting of the D bit is 0. This setting
produces a default operand size of 32 bits and a default address size of 64 bits. The combination L=1
and D=1 is reserved for future use.
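The L and D combinations described above can be sketched as a small lookup. The function name is hypothetical; it assumes long mode is active and ignores operand-size, address-size, and REX prefixes.

```c
#include <stdbool.h>

/* Default operand and address sizes (in bits) selected by CS.L and CS.D
   while long mode is active. Returns false for the reserved L=1, D=1 case. */
static bool cs_default_sizes(bool l, bool d, unsigned *operand, unsigned *address)
{
    if (l && d)
        return false;                 /* reserved for future use */
    if (l) {                          /* 64-bit mode */
        *operand = 32;
        *address = 64;
        return true;
    }
    *operand = *address = d ? 32 : 16; /* compatibility mode: legacy D meaning */
    return true;
}
```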
“Instruction Prefixes” in Volume 3 describes the effect of the code-segment L and D bits on default
operand and address sizes when long mode is activated. These default sizes can be overridden with
operand size, address size, and REX prefixes.
[Figure: Data-Segment Descriptor, Long Mode. Byte +4: Base Address 31–24, G, D/B, AVL, Segment Limit 19–16, P, DPL, S=1, bit 11=0, E, W, A, Base Address 23–16.]
Fields Ignored in 64-Bit Mode. Segmentation is disabled in 64-bit mode. The interpretation of the
segment-base address depends on the segment register used:
• In data-segment descriptors referenced by the DS, ES and SS segment registers, the base-address
field is ignored. For the purpose of virtual-address calculations, the base address is treated as if it
has a value of zero.
• Data segments referenced by the FS and GS segment registers receive special treatment in 64-bit
mode. For these segments, the base address field is not ignored, and a non-zero value can be used
in virtual-address calculations. A 64-bit segment-base address can be specified using model-
specific registers. See “FS and GS Registers in 64-Bit Mode” on page 70 for more information.
Segment-limit checking is not performed on any data segments in 64-bit mode, and both the segment-
limit field and granularity (G) bit are ignored. The D/B bit is unused in 64-bit mode.
The expand-down (E), writable (W), and accessed (A) type-field attributes are ignored.
A data-segment-descriptor DPL field is ignored in 64-bit mode, and segment-privilege checks are not
performed on data segments. System software can use the page-protection mechanisms to isolate and
protect data from unauthorized access.
[Figure: System-Segment Descriptor, 64-Bit Mode. Byte +4: Base Address 31–24, G, AVL, Segment Limit 19–16, P, DPL, S=0, Type, Base Address 23–16.]
The 64-bit system-segment base address must be in canonical form. Otherwise, a general-protection
exception occurs with a selector error-code, #GP(selector), when the system segment is loaded.
System-segment limit values are checked by the processor in both 64-bit and compatibility modes,
under the control of the granularity (G) bit.
Figure 4-22 shows that bits 12–8 of byte +12 must be cleared to 0. These bits correspond to the S and
Type fields in a legacy descriptor. Clearing these bits to 0 corresponds to an illegal type in legacy mode
and causes a #GP if an attempt is made to access the upper half of a 64-bit mode system-segment
descriptor as a legacy descriptor or as the lower half of a 64-bit mode system-segment descriptor.
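The following sketch encodes a 16-byte long-mode system-segment descriptor and leaves the doubleword at +12 zero, so its bits 12–8 read as an illegal legacy type. It assumes the expanded layout in which the doubleword at +8 holds base-address bits 63–32. The function name is hypothetical, and a type such as 9h (available 64-bit TSS) comes from the system-segment type tables rather than from this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode a long-mode system-segment descriptor as four doublewords at
   offsets +0, +4, +8, and +12. `base` should already be canonical. */
static void encode_system_desc64(uint32_t out[4], uint64_t base, uint32_t limit20,
                                 unsigned type, unsigned dpl, bool present, bool g)
{
    uint32_t base_lo = (uint32_t)base;
    out[0] = ((base_lo & 0xFFFFu) << 16) | (limit20 & 0xFFFFu); /* base 15-0, limit 15-0 */
    out[1] = (base_lo & 0xFF000000u)           /* base 31-24 */
           | (g ? 1u << 23 : 0u)               /* G */
           | (limit20 & 0xF0000u)              /* limit 19-16 */
           | (present ? 1u << 15 : 0u)         /* P */
           | ((dpl & 3u) << 13)                /* DPL; bit 12 (S) stays 0 */
           | ((type & 0xFu) << 8)              /* Type */
           | ((base_lo >> 16) & 0xFFu);        /* base 23-16 */
    out[2] = (uint32_t)(base >> 32);           /* base 63-32 */
    out[3] = 0;                                /* reserved; bits 12-8 must be 0 */
}
```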
The target code segment referenced by a long-mode gate descriptor must be a 64-bit code segment
(CS.L=1, CS.D=0). If the target is not a 64-bit code segment, a general-protection exception,
#GP(error), occurs. The error code reported depends on the gate type:
• Call gates report the target code-segment selector as the error code.
• Interrupt and trap gates report the interrupt-vector number as the error code.
A general-protection exception, #GP(0), occurs if software attempts to reference a long-mode gate
descriptor with a target-segment offset that is not in canonical form.
It is possible for software to store legacy and long mode gate descriptors in the same descriptor table.
Figure 4-23 on page 90 shows that bits 12–8 of byte +12 in a long-mode call gate must be cleared to 0.
These bits correspond to the S and Type fields in a legacy call gate. Clearing these bits to 0 corresponds
to an illegal type in legacy mode and causes a #GP if an attempt is made to access the upper half of a
64-bit mode call-gate descriptor as a legacy call-gate descriptor.
It is not necessary to clear these same bits in a long-mode interrupt gate or trap gate. In long mode, the
interrupt-descriptor table (IDT) must contain 64-bit interrupt gates or trap gates. The processor
automatically indexes the IDT by scaling the interrupt vector by 16. This makes it impossible to access
the upper half of a long-mode interrupt gate, or trap gate, as a legacy gate when the processor is
running in long mode.
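The IDT indexing rule can be sketched as a single multiply-and-add. The function name is hypothetical, and 8-byte legacy gates and 16-byte long-mode gates are assumed. Because every long-mode entry starts at a multiple of 16 from the IDT base, the upper 8 bytes of a gate are never selected as an entry of their own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of the gate for `vector`: IDT base + vector * entry size. */
static uint64_t idt_entry_addr(uint64_t idt_base, uint8_t vector, bool long_mode)
{
    return idt_base + (uint64_t)vector * (long_mode ? 16u : 8u);
}
```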
IST Field (Interrupt and Trap Gates). Bits 2–0 of byte +4. Long-mode interrupt gate and trap gate
descriptors contain a new, 3-bit interrupt-stack-table (IST) field not present in legacy gate descriptors.
The IST field is used as an index into the IST portion of a long-mode TSS. If the IST field is not 0, the
index references an IST pointer in the TSS, which the processor loads into the RSP register when an
interrupt occurs. If the IST index is 0, the processor uses the legacy stack-switching mechanism (with
some modifications) when an interrupt occurs. See “Interrupt-Stack Table” on page 245 for more
information.
Count Field (Call Gates). The count field found in legacy call-gate descriptors is not supported in
long-mode call gates. In long mode, the field is reserved and should be cleared to zero.
[Figure: Privilege-level relationships. Privilege levels 0 through 2 are shown with memory management, file allocation, interrupt handling, device drivers, and library routines.]
Current Privilege-Level. The current privilege-level (CPL) is the privilege level at which the
processor is currently executing. The CPL is stored in an internal processor register that is invisible to
software. Software changes the CPL by performing a control transfer to a different code segment with
a new privilege level.
Descriptor Privilege-Level. The descriptor privilege-level (DPL) is the privilege level that system
software assigns to individual segments. The DPL is used in privilege checks to determine whether
software can access the segment referenced by the descriptor. In the case of gate descriptors, the DPL
determines whether software can access the descriptor referenced by the gate. The DPL is stored in the
segment (or gate) descriptor.
Requestor Privilege-Level. The requestor privilege-level (RPL) reflects the privilege level of the
program that created the selector. The RPL can be used to let a called program know the privilege level
of the program that initiated the call. The RPL is stored in the selector used to reference the segment
(or gate) descriptor.
The following sections describe how the CPL, DPL, and RPL are used by the processor in performing
privilege checks on data accesses and control transfers. Failure to pass a protection check generally
causes an exception to occur.
[Figure 4-26. Data-Access Privilege-Check Examples. Example 1: CPL=3, data selector RPL=0, effective privilege = max(CPL, RPL) = 3, data-segment DPL=2; 3 ≤ 2 is false, access denied. Example 2: CPL=0, RPL=0, effective privilege = 0, DPL=2; 0 ≤ 2, access allowed.]
Example 1 in Figure 4-26 shows a failing data-access privilege check. The effective privilege level is the
numerically larger of the CPL and RPL, which is 3 because CPL=3. This value is greater than the descriptor
DPL of 2, so access to the data segment is denied.
Example 2 in Figure 4-26 shows a passing data-access privilege check. Here, the effective privilege
level is 0 because both the CPL and RPL have values of 0. This value is less than the descriptor DPL,
so access to the data segment is allowed, and the data-segment register is successfully loaded.
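The data-access check illustrated in Figure 4-26 can be sketched as follows: the effective privilege is the numerically larger of CPL and RPL, and access is allowed when it is less than or equal to the descriptor DPL. The function name is hypothetical.

```c
#include <stdbool.h>

/* Data-segment access: effective privilege = max(CPL, RPL); allowed if <= DPL. */
static bool data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = cpl > rpl ? cpl : rpl;
    return effective <= dpl;
}
```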
1. The processor checks that the CPL and the stack-selector RPL are equal. If they are not equal, a
general-protection exception (#GP) occurs and the SS register is not loaded.
2. The processor compares the CPL with the DPL in the descriptor-table entry referenced by the
segment selector. The two values must be equal. If they are not equal, a #GP occurs and the SS
register is not loaded.
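The two stack-segment checks can be sketched as one predicate. The function name is hypothetical, and only the privilege checks listed above are modeled.

```c
#include <stdbool.h>

/* Loading SS: step 1 requires CPL == RPL, step 2 requires CPL == DPL. */
static bool ss_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    return rpl == cpl && dpl == cpl;
}
```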
Figure 4-27 shows two examples of stack-access privilege checks. In Example 1 the CPL, stack-
selector RPL, and stack segment-descriptor DPL are all equal, so access to the stack segment using the
SS register is allowed. In Example 2, the stack-selector RPL and stack segment-descriptor DPL are
equal. However, the CPL is not equal to the stack segment-descriptor DPL, and access to the stack
segment through the SS register is denied.
[Figure 4-27. Stack-Access Privilege-Check Examples. Example 1: CPL=3, stack-selector RPL=3, stack-segment DPL=3; all equal, access allowed. Example 2: CPL=2, RPL=3, DPL=3; CPL ≠ DPL, access denied.]
more-privileged, nonconforming code segment (see “Control Transfers Through Call Gates” on
page 102 for more information).
In far calls and jumps, the far pointer (CS:rIP) references the target code-segment descriptor. Before
loading the CS register with a nonconforming code-segment selector, the processor checks as follows
to see if access is allowed:
1. DPL = CPL Check—The processor compares the target code-segment descriptor DPL with the
currently executing program CPL. If they are equal, the processor performs the next check. If they
are not equal, a general-protection exception (#GP) occurs.
2. RPL ≤ CPL Check—The processor compares the target code-segment selector RPL with the
currently executing program CPL. If the RPL is less than or equal to the CPL, access is allowed. If
the RPL is greater than the CPL, a #GP exception occurs.
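The two checks for a far transfer to a nonconforming code segment can be sketched as follows. The function name is hypothetical, and a false result corresponds to the #GP described above.

```c
#include <stdbool.h>

/* Nonconforming target: check 1 requires DPL == CPL, check 2 requires RPL <= CPL. */
static bool nonconforming_transfer_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    if (dpl != cpl)
        return false;   /* DPL = CPL check fails */
    return rpl <= cpl;  /* RPL <= CPL check */
}
```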
If access is allowed, the processor loads the CS and rIP registers with their new values and begins
executing from the target location. The CPL is not changed—the target-CS selector RPL value is
disregarded when the selector is loaded into the CS register.
Figure 4-28 on page 100 shows three examples of privilege checks performed as a result of a far
control transfer to a nonconforming code-segment. In Example 1, access is allowed because CPL =
DPL and RPL ≤ CPL. In Example 2, access is denied because CPL ≠ DPL. In Example 3, access is
denied because RPL > CPL.
[Figure 4-28. Nonconforming Code-Segment Privilege-Check Examples. Example 1: CPL=2, code selector RPL=0, code-segment DPL=2; RPL ≤ CPL and CPL = DPL, access allowed. Example 2: CPL=2, RPL=0, DPL=3; CPL ≠ DPL, access denied. Example 3: CPL=2, RPL=3, DPL=2; RPL > CPL, access denied.]
Conforming Code Segments. On a direct control transfer to a conforming code segment, the target
code-segment descriptor DPL can be numerically lower than the CPL (that is, more privileged). Before
loading the CS register with a conforming code-segment selector, the processor compares the target
code-segment descriptor DPL with the currently executing program CPL. If the DPL is less than or equal
to the CPL, access is allowed. If the DPL is greater than the CPL, a #GP exception occurs.
On an access to a conforming code segment, the RPL is ignored and not involved in the privilege
check.
When access is allowed, the processor loads the CS and rIP registers with their new values and begins
executing from the target location. The CPL is not changed—the target CS-descriptor DPL value is
disregarded when the selector is loaded into the CS register. The target program runs at the same
privilege as the program that called it.
Figure 4-29 shows two examples of privilege checks performed as a result of a direct control transfer
to a conforming code segment. In Example 1, access is allowed because the CPL of 3 is greater than
the DPL of 0. As the target code selector is loaded into the CS register, the old CPL value of 3 replaces
the target-code selector RPL value, and the target program executes with CPL=3. In Example 2, access
is denied because CPL < DPL.
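The conforming-segment rule and the resulting CS value can be sketched as follows. This is a minimal C illustration with hypothetical function names. It assumes the usual selector layout with the RPL in bits 1:0, and it models the statement above that the unchanged CPL replaces the target selector's RPL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct far transfer to a conforming code segment: only DPL <= CPL is
 * checked; the selector RPL takes no part in the check. */
bool conforming_access_allowed(unsigned cpl, unsigned dpl)
{
    return dpl <= cpl;   /* DPL > CPL would raise #GP */
}

/* Value that ends up in CS: the target selector with its RPL field
 * (bits 1:0) replaced by the unchanged CPL. */
uint16_t loaded_cs_selector(uint16_t target_selector, unsigned cpl)
{
    return (uint16_t)((target_selector & ~0x3u) | (cpl & 0x3u));
}
```

For Example 1 (CPL=3, DPL=0), loading a target selector of 0008h would leave 000Bh in CS, so the target runs at CPL=3.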
[Figure 4-29: Example 1: CPL=3, target code-segment DPL=0, access allowed (CPL ≥ DPL). Example 2: CPL=0, target code-segment DPL=3, access denied.]
Transfer Mechanism. The pointer operand of a far-CALL or far-JMP instruction consists of two
pieces: a code-segment selector (CS) and a code-segment offset (rIP). In a call-gate transfer, the CS
selector points to a call-gate descriptor rather than a code-segment descriptor, and the rIP is ignored
(but required by the instruction).
Figure 4-30 shows a call-gate control transfer in legacy mode. The call-gate descriptor contains
segment-selector and segment-offset fields (see “Gate Descriptors” on page 84 for a detailed
description of the call-gate format and fields). These two fields perform the same function as the
pointer operand in a direct control-transfer instruction. The segment-selector field points to the target
code-segment descriptor, and the segment-offset field is the instruction-pointer offset into the target
code-segment. The code-segment base taken from the code-segment descriptor is added to the offset
field in the call-gate descriptor to create the target virtual address (linear address).
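[Figure 4-30: The far-pointer selector references a call-gate descriptor (DPL, code-segment selector, code-segment offset) in the descriptor table. The gate's selector references the code-segment descriptor (DPL, code-segment limit, code-segment base). The code-segment base plus the gate's code-segment offset gives the target virtual address in the code segment.]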
Figure 4-31 shows a call-gate control transfer in long mode. The long-mode call-gate descriptor
format is expanded by 64 bits to hold a full 64-bit offset into the virtual-address space. Only long-
mode call gates can be referenced in long mode (64-bit mode and compatibility mode). The legacy-
mode 32-bit call-gate types are redefined in long mode as 64-bit types, and 16-bit call-gate types are
illegal.
[Figure 4-31: The long-mode call-gate descriptor in the descriptor table references a code-segment descriptor for a flat code segment; the code-segment base and limit are unused.]
A long-mode call gate must reference a 64-bit code-segment descriptor. In 64-bit mode, the code-
segment descriptor base-address and limit fields are ignored. The target virtual-address is the 64-bit
offset field in the expanded call-gate descriptor.
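The difference between the legacy-mode and 64-bit-mode target-address calculations can be sketched in C. The two structs are hypothetical, simplified views that keep only the fields used here; they are not the real descriptor layouts, which are given in "Gate Descriptors".

```c
#include <stdint.h>

/* Hypothetical, simplified views of the descriptor fields involved. */
struct call_gate { uint16_t selector; uint64_t offset; };
struct code_seg  { uint32_t base; };

/* Legacy mode: code-segment base plus the gate's offset. The uint32_t
 * arithmetic means the sum wraps at 4 Gbytes in this sketch. */
uint32_t legacy_gate_target(const struct call_gate *g, const struct code_seg *cs)
{
    return cs->base + (uint32_t)g->offset;
}

/* 64-bit mode: base and limit are ignored; the gate's 64-bit offset is
 * the target virtual address. */
uint64_t long_mode_gate_target(const struct call_gate *g)
{
    return g->offset;
}
```

Limit checking is deliberately left out; the sketch shows only where the target address comes from.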
Privilege Checks. Before loading the CS register with the code-segment selector located in the call
gate, the processor performs three privilege checks. The following checks are performed when either
conforming or nonconforming code segments are referenced:
1. The processor compares the CPL with the call-gate DPL from the call-gate descriptor (DPLG).
The CPL must be numerically less than or equal to DPLG for this check to pass. In other words,
the following expression must be true: CPL ≤ DPLG.
2. The processor compares the RPL in the call-gate selector with DPLG. The RPL must be
numerically less than or equal to DPLG for this check to pass. In other words, the following
expression must be true: RPL ≤ DPLG.
3. The processor compares the CPL with the target code-segment DPL from the code-segment
descriptor (DPLS). The type of comparison varies depending on the type of control transfer.
- When a call—or a jump to a conforming code segment—is used to transfer control through a
call gate, the CPL must be numerically greater than or equal to DPLS for this check to pass.
(This check prevents control transfers to less-privileged programs.) In other words, the
following expression must be true: CPL ≥ DPLS.
- When a JMP instruction is used to transfer control through a call gate to a nonconforming code
segment, the CPL must be numerically equal to DPLS for this check to pass. (JMP instructions
cannot change CPL.) In other words, the following expression must be true: CPL = DPLS.
Figure 4-32 on page 105 shows two examples of call-gate privilege checks. In Example 1, all privilege
checks pass as follows:
• The call-gate DPL (DPLG) is at the lowest privilege (3), specifying that software running at any
privilege level (CPL) can access the gate.
• The selector referencing the call gate passes its privilege check because the RPL is numerically
less than or equal to DPLG.
• The target code segment is at the highest privilege level (DPLS = 0). This means software running
at any privilege level can access the target code segment through the call gate.
[Figure 4-32: Example 1: CPL=2, call-gate selector RPL=3, DPLG=3, DPLS=0, access allowed (privilege check passes). Example 2: CPL=2, call-gate selector RPL=3, DPLG=0, DPLS=3, access denied (privilege check fails).]
In Example 2, all privilege checks fail as follows:
• The call-gate DPL (DPLG) is at the highest privilege (0), so only software running at CPL=0 can
access the gate. The current program runs at CPL=2, which is numerically greater than DPLG.
• The selector referencing the call gate fails its privilege check because its RPL (3) is numerically
greater than DPLG (0).
• The target code segment is at a lower privilege (DPLS = 3) than the currently running software
(CPL = 2). Transitions from more-privileged software to less-privileged software are not allowed,
so this privilege check fails as well.
Although all three privilege checks failed in Example 2, failing only one check is sufficient to deny
access into the target code segment.
Stack Switching. The processor performs an automatic stack switch when a control transfer causes a
change in privilege levels to occur. Switching stacks isolates more-privileged software stacks from
less-privileged software stacks and provides a mechanism for saving the return pointer back to the
program that initiated the call.
When switching to more-privileged software, as is done when transferring control using a call gate, the
processor uses the corresponding stack pointer (privilege-level 0, 1, or 2) stored in the task-state
segment (TSS). The format of the stack pointer stored in the TSS depends on the system-software
operating mode:
• Legacy-mode system software stores a 32-bit ESP value (stack offset) and 16-bit SS selector
register value in the TSS for each of three privilege levels 0, 1, and 2.
• Long-mode system software stores a 64-bit RSP value in the TSS for privilege levels 0, 1, and 2.
No SS register value is stored in the TSS because in long mode a call gate must reference a 64-bit
code-segment descriptor. 64-bit mode does not use segmentation, and the stack pointer consists
solely of the 64-bit RSP. Any value loaded in the SS register is ignored.
See “Task-Management Resources” on page 308 for more information on the legacy-mode and long-
mode TSS formats.
Figure 4-33 on page 107 shows a 32-bit stack in legacy mode before and after the automatic stack
switch. This particular example assumes that parameters are passed from the current program to the
target program. The process followed by legacy mode in switching stacks and copying parameters is:
1. The target code-segment DPL is read by the processor and used as an index into the TSS for
selecting the new stack pointer (SS:ESP). For example, if DPL=1 the processor selects the
SS:ESP for privilege-level 1 from the TSS.
2. The SS and ESP registers are loaded with the new SS:ESP values read from the TSS.
3. The old values of the SS and ESP registers are pushed onto the stack pointed to by the new
SS:ESP.
4. The 5-bit count field is read from the call-gate descriptor.
5. The number of parameters specified in the count field (up to 31) is copied from the old stack to
the new stack. The size of the parameters copied by the processor depends on the call-gate size:
32-bit call gates copy 4-byte parameters and 16-bit call gates copy 2-byte parameters.
6. The return pointer is pushed onto the stack. The return pointer consists of the current CS-register
value and the EIP of the instruction following the calling instruction.
7. The CS register is loaded from the segment-selector field in the call-gate descriptor, and the EIP is
loaded from the offset field in the call-gate descriptor.
8. The target program begins executing with the instruction referenced by the new CS:EIP.
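The resulting layout of the new stack can be reproduced with a small model. This is a C sketch under stated assumptions: each stack is an array of 4-byte slots in which index 0 is the slot at the stack pointer, the struct and function names are hypothetical, and the 16-bit SS and CS values are simply zero-extended into their slots. Descriptor loading and the TSS lookup itself are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Caller state captured at the far CALL (hypothetical model). */
struct old_ctx { uint16_t ss, cs; uint32_t esp, eip_next; };

/* old_stack[0] is parameter n (at the old ESP) and old_stack[n-1] is
 * parameter 1. new_stack receives n + 4 slots, index 0 = new SS:ESP.
 * Returns the number of slots written. */
size_t legacy_stack_switch(const struct old_ctx *old, const uint32_t *old_stack,
                           unsigned count, uint32_t *new_stack)
{
    unsigned n = count & 0x1Fu;          /* 5-bit count field: at most 31 */
    size_t slots = (size_t)n + 4;

    new_stack[slots - 1] = old->ss;      /* step 3: old SS at +(n*4)+12 */
    new_stack[slots - 2] = old->esp;     /* step 3: old ESP at +(n*4)+8 */
    for (unsigned i = 0; i < n; i++)     /* step 5: copy n parameters */
        new_stack[2 + i] = old_stack[i]; /* parameter n lands at +8 */
    new_stack[1] = old->cs;              /* step 6: old CS at +4 */
    new_stack[0] = old->eip_next;        /* step 6: return EIP at new SS:ESP */
    return slots;
}
```

With a count of 0, the same function produces the four-slot frame used when no parameters are passed.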
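[Figure 4-33: legacy-mode 32-bit stack switch with n parameters]
Old 32-bit stack before CALL (offsets from old SS:ESP):
  +(n-1)*4   Parameter 1
  +(n-2)*4   Parameter 2
  ...
  +0         Parameter n   (old SS:ESP)
New 32-bit stack after CALL (offsets from new SS:ESP):
  +(n*4)+12  Old SS
  +(n*4)+8   Old ESP
  +(n*4)+4   Parameter 1
  +(n*4)     Parameter 2
  ...
  +8         Parameter n
  +4         Old CS
  +0         Old EIP       (new SS:ESP)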
Figure 4-34 shows a 32-bit stack in legacy mode before and after the automatic stack switch when no
parameters are passed (count=0). Most software does not use the call-gate descriptor count-field to
pass parameters. System software typically defines linkage mechanisms that do not rely on automatic
parameter copying.
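[Figure 4-34: legacy-mode 32-bit stack switch without parameters]
Old 32-bit stack before CALL: no parameters; the top of stack is at old SS:ESP.
New 32-bit stack after CALL (offsets from new SS:ESP):
  +12  Old SS
  +8   Old ESP
  +4   Old CS
  +0   Old EIP   (new SS:ESP)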
Figure 4-35 on page 108 shows a long-mode stack switch. In long mode, all call gates must reference
64-bit code-segment descriptors, so a long-mode stack switch uses a 64-bit stack. The process of
switching stacks in long mode is similar to switching in legacy mode when no parameters are passed.
The process is as follows:
1. The target code-segment DPL is read by the processor and used as an index into the 64-bit TSS
for selecting the new stack pointer (RSP).
2. The RSP register is loaded with the new RSP value read from the TSS. The SS register is loaded
with a null selector (SS=0). Setting the new SS selector to null allows proper handling of nested
control transfers in 64-bit mode. See “Nested Returns to 64-Bit Mode Procedures” on page 110
for additional information.
As in legacy mode, it is desirable to keep the stack-segment requestor privilege-level (SS.RPL)
equal to the current privilege-level (CPL). When using a call gate to change privilege levels, the
SS.RPL is updated to reflect the new CPL. The SS.RPL is restored from the return-target CS.RPL
on the subsequent privilege-level-changing far return.
3. The old values of the SS and RSP registers are pushed onto the stack pointed to by the new RSP.
The old SS value is popped on a subsequent far return. This allows system software to set up the
SS selector for a compatibility-mode process by executing a RET (or IRET) that changes the
privilege level.
4. The return pointer is pushed onto the stack. The return pointer consists of the current CS-register
value and the RIP of the instruction following the calling instruction.
5. The CS register is loaded from the segment-selector field in the long-mode call-gate descriptor,
and the RIP is loaded from the offset field in the long-mode call-gate descriptor.
The target program begins execution with the instruction referenced by the new RIP.
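The long-mode steps can be modeled in the same way. This is a C sketch with several assumptions. Memory is a flat array of 8-byte slots indexed by address/8. `tss_rsp` stands for the RSP0 to RSP2 fields of the 64-bit TSS. The target is assumed to be a nonconforming segment with DPLS < CPL, so that the new CPL equals DPLS. All names are hypothetical, and descriptor checks are not modeled.

```c
#include <stdint.h>

/* Hypothetical CPU state for this sketch. */
struct cpu64 { uint16_t cs, ss; uint64_t rip, rsp; };

/* Every long-mode push is 8 bytes wide and decrements RSP by 8. */
static void push64(struct cpu64 *c, uint64_t *mem, uint64_t v)
{
    c->rsp -= 8;
    mem[c->rsp / 8] = v;
}

void long_mode_gate_call(struct cpu64 *c, uint64_t *mem, const uint64_t tss_rsp[3],
                         unsigned dpls, uint16_t gate_sel, uint64_t gate_off,
                         uint64_t rip_next)
{
    uint16_t old_ss = c->ss, old_cs = c->cs;
    uint64_t old_rsp = c->rsp;

    c->rsp = tss_rsp[dpls];              /* steps 1-2: DPLS selects RSPn */
    c->ss  = (uint16_t)(dpls & 3u);      /* step 2: null selector, RPL = new CPL */
    push64(c, mem, old_ss);              /* step 3: old SS, then old RSP */
    push64(c, mem, old_rsp);
    push64(c, mem, old_cs);              /* step 4: return CS, then return RIP */
    push64(c, mem, rip_next);
    c->cs  = (uint16_t)((gate_sel & ~3u) | (dpls & 3u)); /* step 5 */
    c->rip = gate_off;                   /* 64-bit offset from the gate */
}
```

After the call, the four pushed values sit at +0, +8, +16 and +24 from the new RSP, in the same order as in the long-mode stack figure.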
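[Figure 4-35: long-mode stack switch]
Old 64-bit stack before CALL: the top of stack is at old SS:RSP.
New 64-bit stack after CALL (offsets from new RSP):
  +24  Old SS
  +16  Old RSP
  +8   Old CS
  +0   Old RIP   (new RSP; SS = null selector with RPL = new CPL)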
All long-mode stack pushes resulting from a privilege-level-changing far call are eight bytes wide, and
each push decrements the RSP by eight. Long mode ignores the call-gate count field and does not support the
automatic parameter-copy feature found in legacy mode. Software can access parameters on the old
stack, if necessary, by referencing the old stack segment selector and stack pointer saved on the new
process stack.
Stack Switching. The stack switch performed by a far return to a lower-privilege level reverses the
stack switch of a call gate to a higher-privilege level, except that parameters are never automatically
copied as part of a return. The process followed by a far-return stack switch in long mode and legacy
mode is:
1. The return code-segment RPL is read by the processor from the CS value stored on the stack to
determine that a lower-privilege control transfer is occurring.
2. The return-program instruction pointer is popped off the current-program (higher privilege) stack
and loaded into the CS and rIP registers.
3. The return instruction can include an immediate operand that specifies the number of additional
bytes to be popped off the stack. These bytes may correspond to the parameters pushed onto the
stack previously by a call through a call gate containing a non-zero parameter-count field. If the
return includes the immediate operand, then the stack pointer is adjusted upward by adding the
specified number of bytes to the rSP.
4. The return-program stack pointer is popped off the current-program (higher privilege) stack and
loaded into the SS and rSP registers. In the case of nested returns to 64-bit mode, a null selector
can be popped into the SS register.
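The four return steps can be modeled for a 64-bit operand size. This is a C sketch with several assumptions. Memory is a flat array of 8-byte slots indexed by address/8. A popped CS naming a more-privileged level is simply reported as an error. Like the step list, the sketch applies the immediate only to the current (higher-privilege) stack, so check the RET instruction reference before relying on how the restored stack pointer is treated. Names are hypothetical, and descriptor checks are not modeled.

```c
#include <stdint.h>

/* Hypothetical CPU state for this sketch. */
struct cpu64r { uint16_t cs, ss; uint64_t rip, rsp; unsigned cpl; };

/* Each 64-bit pop reads one 8-byte slot and increments RSP by 8. */
static uint64_t pop64(struct cpu64r *c, const uint64_t *mem)
{
    uint64_t v = mem[c->rsp / 8];
    c->rsp += 8;
    return v;
}

/* Returns 1 if the return switched stacks, 0 for a same-privilege
 * return, and -1 if the popped CS names a more-privileged level. */
int far_ret64(struct cpu64r *c, const uint64_t *mem, uint16_t imm)
{
    /* Step 1: RPL of the CS value stored just above the return RIP. */
    unsigned new_cpl = (unsigned)(mem[c->rsp / 8 + 1] & 3u);
    if (new_cpl < c->cpl)
        return -1;
    c->rip = pop64(c, mem);                 /* step 2: RIP, then CS */
    c->cs  = (uint16_t)pop64(c, mem);
    c->rsp += imm;                          /* step 3: release imm bytes */
    if (new_cpl == c->cpl)
        return 0;                           /* no privilege change */
    uint64_t new_rsp = pop64(c, mem);       /* step 4: RSP, then SS */
    c->ss  = (uint16_t)pop64(c, mem);       /* may be a null selector */
    c->rsp = new_rsp;
    c->cpl = new_cpl;
    return 1;
}
```

Applied to the frame that a long-mode call-gate CALL leaves behind, this restores the caller's RIP, CS, RSP and SS in that order.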
The operand size of a far return determines the size of stack pops when switching stacks. If a far return
is used in 64-bit mode to return from a prior call through a long-mode call gate, the far return must use
a 64-bit operand size. The 64-bit operand size allows the far return to properly read the stack
established previously by the far call.
Nested Returns to 64-Bit Mode Procedures. In long mode, a far call that changes privilege levels
causes the SS register to be loaded with a null selector (this is the same action taken by an interrupt in
long mode). If the called procedure performs another far call to a higher-privileged procedure, or is
interrupted, the null SS selector is pushed onto the stack frame, and another null selector is loaded into
the SS register. Using a null selector in this way allows the processor to properly handle returns nested
within 64-bit-mode procedures and interrupt handlers.
Normally, a RET that pops a null selector into the SS register causes a general-protection exception
(#GP) to occur. However, in long mode, the null selector acts as a flag indicating the existence of
nested interrupt handlers or other privileged software in 64-bit mode. Long mode allows RET to pop a
null selector into SS from the stack under the following conditions:
• The target mode is 64-bit mode.
• The target CPL is less than 3.
In this case, the processor does not load an SS descriptor, and the null selector is loaded into SS
without causing a #GP exception.
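The exception to the null-SS rule reduces to a short condition. This is a minimal C sketch. It assumes the standard definition of a null selector (index and TI bits all zero, with any RPL), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A selector is null when bits 15:2 (index and TI) are all zero; the
 * RPL bits 1:0 may hold any value. */
bool is_null_selector(uint16_t sel)
{
    return (sel & 0xFFFCu) == 0;
}

/* Whether RET may pop a null selector into SS without #GP: the
 * processor must be in long mode, the return target must be 64-bit
 * mode, and the target CPL must be less than 3. */
bool null_ss_pop_permitted(bool long_mode, bool target_is_64bit, unsigned target_cpl)
{
    return long_mode && target_is_64bit && target_cpl < 3;
}
```

A return to CPL=3 code, or to a compatibility-mode target, therefore still needs a usable SS selector on the stack.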
Three bits from the descriptor entry are used to control how the segment-limit field is interpreted: the
granularity (G) bit, the default operand-size (D) bit, and for data segments, the expand-down (E) bit.
See “Legacy Segment Descriptors” on page 77 for a detailed description of each bit.
For all segments other than expand-down segments, the minimum segment-offset is 0. The maximum
segment-offset depends on the value of the G bit:
• If G=0 (byte granularity), the maximum allowable segment-offset is equal to the value of the
segment-limit field.
• If G=1 (4096-byte granularity), the segment-limit field is first scaled by 4096 (1000h). Then 4095
(0FFFh) is added to the scaled value to arrive at the maximum allowable segment-offset, as shown
in the following equation:
maximum segment-offset = (limit × 1000h) + 0FFFh
For example, if the segment-limit field is 0100h, then the maximum allowable segment-offset is
(0100h × 1000h) + 0FFFh = 10_0FFFh.
In both cases, the maximum segment-size is specified when the descriptor segment-limit field is
0F_FFFFh.
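The maximum-offset rule can be expressed as a single function. This is a minimal C sketch with a hypothetical name. It treats the limit as the 20-bit descriptor field and covers only segments that are not expand-down.

```c
#include <stdbool.h>
#include <stdint.h>

/* Maximum allowable offset for a segment that is not expand-down.
 * limit: 20-bit segment-limit field; g: granularity bit. */
uint32_t max_segment_offset(uint32_t limit, bool g)
{
    limit &= 0xFFFFFu;
    /* G=1: (limit x 1000h) + 0FFFh; the shifted value has zero low bits,
     * so OR-ing in 0FFFh is the same as adding it. */
    return g ? (limit << 12) | 0xFFFu : limit;
}
```

A limit field of 0F_FFFFh yields 0F_FFFFh with G=0 and FFFF_FFFFh with G=1, which are the two maximum segment sizes.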
Expand-Down Segments. Expand-down data segments are supported in legacy mode and
compatibility mode but not in 64-bit mode. With expand-down data segments, the maximum segment
offset depends on the value of the D bit in the data-segment descriptor:
• If D=0 the maximum segment-offset is 0_FFFFh.
• If D=1 the maximum segment-offset is 0_FFFF_FFFFh.
The minimum allowable segment offset in expand-down segments depends on the value of the G bit:
• If G=0 (byte granularity), the minimum allowable segment offset is the segment-limit value plus 1.
For example, if the segment-limit field is 0100h, then the minimum allowable segment-offset is
0101h.
• If G=1 (4096-byte granularity), the segment-limit value in the descriptor is first scaled by 4096
(1000h), and then 4095 (0FFFh) is added to the scaled value to arrive at a scaled segment-limit
value. The minimum allowable segment-offset is this scaled segment-limit value plus 1, as shown
in the following equation:
minimum segment-offset = (limit × 1000h) + 0FFFh + 1
For example, if the segment-limit field is 0100h, then the minimum allowable segment-offset is
(0100h × 1000h) + 0FFFh + 1 = 10_1000h.
For expand-down segments, the maximum segment size is specified when the segment-limit value is 0.
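The expand-down bounds can be combined into one range test. This is a minimal C sketch with hypothetical names. The minimum is computed in 64 bits so that a fully scaled limit of FFFF_FFFFh does not wrap to 0 and wrongly admit every offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum allowable offset for an expand-down data segment. */
uint64_t expand_down_min_offset(uint32_t limit, bool g)
{
    limit &= 0xFFFFFu;
    uint32_t scaled = g ? (limit << 12) | 0xFFFu : limit;
    return (uint64_t)scaled + 1;   /* 64-bit so FFFF_FFFFh + 1 does not wrap */
}

/* Maximum offset depends only on the D bit. */
uint32_t expand_down_max_offset(bool d)
{
    return d ? 0xFFFFFFFFu : 0xFFFFu;
}

/* Single-byte access check: offset must lie in [min, max]. */
bool expand_down_offset_ok(uint32_t off, uint32_t limit, bool g, bool d)
{
    return off >= expand_down_min_offset(limit, g) && off <= expand_down_max_offset(d);
}
```

Multi-byte accesses, which must fit entirely inside the range, are not modeled.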
Descriptor-Table Register Loads. Loads into the LDTR and TR descriptor-table registers are
checked for the appropriate system-segment type. The LDTR can only be loaded with an LDT
descriptor, and the TR only with a TSS descriptor. The checks are performed during any action that
causes these registers to be loaded. This includes execution of the LLDT and LTR instructions and
during task switches.
Segment Register Loads. The following restrictions are placed on the segment-descriptor types that
can be loaded into the six user segment registers:
• Only code segments can be loaded into the CS register.
• Only writable data segments can be loaded into the SS register.
• Only the following segment types can be loaded into the DS, ES, FS, or GS registers:
- Read-only or read/write data segments.
- Readable code segments.
These checks are performed during any action that causes the segment registers to be loaded. This
includes execution of the MOV segment-register instructions, control transfers, and task switches.
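The segment-register type restrictions can be sketched as a check on the 4-bit type field of a code or data descriptor. This is a C sketch, not an exhaustive validator. It assumes the legacy encoding in which type bit 3 is set for code and type bit 1 is W for data or R for code. System descriptors, null selectors, and privilege checks are not modeled, and the names are hypothetical.

```c
#include <stdbool.h>

enum sreg { SREG_CS, SREG_SS, SREG_DS, SREG_ES, SREG_FS, SREG_GS };

/* type: 4-bit type field of a code/data (S=1) descriptor. */
bool sreg_type_ok(enum sreg r, unsigned type)
{
    bool code = (type & 0x8u) != 0;   /* bit 3: executable */
    bool bit1 = (type & 0x2u) != 0;   /* W for data, R for code */

    switch (r) {
    case SREG_CS:
        return code;                  /* only code segments */
    case SREG_SS:
        return !code && bit1;         /* only writable data segments */
    default:
        return !code || bit1;         /* any data segment, or readable code */
    }
}
```

For example, an execute-only code segment (type 8h) passes for CS but is rejected for DS, ES, FS, and GS.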
Control Transfers. Control transfers (branches and interrupts) place additional restrictions on the
segment types that can be referenced during the transfer:
• The segment-descriptor type referenced by far CALLs and far JMPs must be one of the following:
- A code segment
- A call gate
- An available TSS (only allowed in legacy mode)
- A task gate (only allowed in legacy mode)
• Only code-segment descriptors can be referenced by call-gate, interrupt-gate, and trap-gate
descriptors.
• Only TSS descriptors can be referenced by task-gate descriptors.
• The link field (selector) in the TSS can only point to a TSS descriptor. This is checked during an
IRET control transfer to a task.
• The far RET and far IRET instructions can only reference code-segment descriptors.
• The interrupt-descriptor table (IDT), which is referenced during interrupt control transfers, can
only contain interrupt gates, trap gates, and task gates.
Segment Access. After a segment descriptor is successfully loaded into one of the segment
registers, reads and writes into the segments are restricted in the following ways:
• Writes are not allowed into read-only data-segment types.
• Writes are not allowed into code-segment types (executable segments).
• Reads from code-segment types are not allowed if the readable (R) type bit is cleared to 0.
These checks are generally performed during execution of instructions that access memory.
Compatibility Mode and 64-Bit Mode. The following type checks differ in long mode (64-bit mode
and compatibility mode) as compared to legacy mode:
• System Segments—System-segment types are checked, but the following types that are valid in
legacy mode are illegal in long mode:
- 16-bit available TSS.
- 16-bit busy TSS.
- Type-field encoding of 00h in the upper half of a system-segment descriptor to indicate an
illegal type and prevent access as a legacy descriptor.
• Gates—Gate-descriptor types are checked, but the following types that are valid in legacy mode
are illegal in long mode:
- 16-bit call gate.
- 16-bit interrupt gate.
- 16-bit trap gate.
- Task gate.
64-Bit Mode. 64-bit mode disables segmentation, and most of the segment-descriptor fields are
ignored. The following list identifies situations where type checks in 64-bit mode differ from those in
compatibility mode and legacy mode:
• Code Segments—The readable (R) type bit is ignored in 64-bit mode. None of the legacy type-
checks that prevent reads from or writes into code segments are performed in 64-bit mode.
• Data Segments—Data-segment type attributes are ignored in 64-bit mode. The writable (W) and
expand-down (E) type bits are ignored. All data segments are treated as writable.
Virtual addresses are translated to physical addresses through hierarchical translation tables created
and managed by system software. Each table contains a set of entries that point to the next-lower table
in the translation hierarchy. A single table at one level of the hierarchy can have hundreds of entries,
each of which points to a unique table at the next-lower hierarchical level. Each lower-level table can
in turn have hundreds of entries pointing to tables further down the hierarchy. The lowest-level table in
the hierarchy points to the translated physical page.
Figure 5-1 on page 117 shows an overview of the page-translation hierarchy used in long mode.
Legacy mode paging uses a subset of this translation hierarchy (the page-map level-4 table does not
exist in legacy mode and the PDP table may or may not be used, depending on which paging mode is
enabled). As this figure shows, a virtual address is divided into fields, each of which is used as an offset
into a translation table. The complete translation chain is made up of all table entries referenced by the
virtual-address fields. The lowest-order virtual-address bits are used as the byte offset into the physical
page.
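[Figure 5-1: The virtual address is divided into sign-extension, page-map level-4 offset, page-directory-pointer offset, page-directory offset, page-table offset, and physical-page offset fields. These select a PML4E in the page-map level-4 table, then a PDPE, a PDE, and a PTE, which together yield the physical address.]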
Legacy page translation offers a variety of alternatives in translating virtual addresses to physical
addresses. Three physical-page sizes are available: 4 Kbytes, 2 Mbytes, and 4 Mbytes. Virtual
addresses are 32 bits long, and physical addresses up to the supported physical-address size can be
used. The AMD64 architecture enhances the legacy translation support by allowing virtual addresses
up to 64 bits long to be translated into physical addresses up to 52 bits long.
Currently, the AMD64 architecture defines a mechanism for translating 48-bit virtual addresses to 52-
bit physical addresses. The mechanism used to translate a full 64-bit virtual address is reserved and
will be described in a future AMD64 architectural specification.
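The page-size (PS) bit in the page-directory entry (bit 7, referred to as PDE.PS) selects between
standard 4-Kbyte physical-page sizes and larger (2-Mbyte or 4-Mbyte) physical-page sizes. The page-size
(also PS) bit in the PDPE (bit 7, referred to as PDPE.PS) selects between 2-Mbyte and 1-Gbyte
physical-page sizes in long mode.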
When PDE.PS is set to 1, large physical pages are used, and the PDE becomes the lowest level of the
translation hierarchy. The size of the large page is determined by the values of CR4.PAE and
CR4.PSE, as shown in Table 5-1 on page 118. When PDE.PS is cleared to 0, standard 4-Kbyte
physical pages are used, and the PTE is the lowest level of the translation hierarchy.
When PDPE.PS is set to 1, 1-Gbyte physical pages are used, and the PDPE becomes the lowest level of
the translation hierarchy. Neither the PDE nor the PTE is used for 1-Gbyte paging.
Figure 5-2 shows the CR3 format when normal (non-PAE) paging is used (CR4.PAE=0). Figure 5-3
shows the CR3 format when PAE paging is used (CR4.PAE=1).
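[Figure 5-2: CR3 format, legacy-mode non-PAE paging]
Bits 31–12: Page-Directory-Table Base Address; 11–5: Reserved; 4: PCD; 3: PWT; 2–0: Reserved
[Figure 5-3: CR3 format, legacy-mode PAE paging]
Bits 31–5: Page-Directory-Pointer-Table Base Address; 4: PCD; 3: PWT; 2–0: Reserved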
Table Base Address Field. This field points to the starting physical address of the highest-level
page-translation table. The size of this field depends on the form of paging used:
• Normal (Non-PAE) Paging (CR4.PAE=0)—This 20-bit field occupies bits 31–12, and points to the
base address of the page-directory table. The page-directory table is aligned on a 4-Kbyte
boundary, with the low-order 12 address bits (11–0) assumed to be 0. This yields a total base-
address size of 32 bits.
• PAE Paging (CR4.PAE=1)—This field is 27 bits and occupies bits 31–5. The CR3 register points
to the base address of the page-directory-pointer table. The page-directory-pointer table is aligned
on a 32-byte boundary, with the low 5 address bits (4–0) assumed to be 0.
Page-Level Writethrough (PWT) Bit. Bit 3. Page-level writethrough indicates whether the highest-
level page-translation table has a writeback or writethrough caching policy. When PWT=0, the table
has a writeback caching policy. When PWT=1, the table has a writethrough caching policy.
Page-Level Cache Disable (PCD) Bit. Bit 4. Page-level cache disable indicates whether the highest-
level page-translation table is cacheable. When PCD=0, the table is cacheable. When PCD=1, the table
is not cacheable.
Reserved Bits. Reserved fields should be cleared to 0 by software when writing CR3.
4-Kbyte Page Translation. 4-Kbyte physical-page translation is performed by dividing the 32-bit
virtual address into three fields. Each of the upper two fields is used as an index into a two-level
page-translation hierarchy. The virtual-address fields are used as follows, and are shown in Figure 5-4:
• Bits 31–22 index into the 1024-entry page-directory table.
• Bits 21–12 index into the 1024-entry page table.
• Bits 11–0 provide the byte offset into the physical page.
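The field split, and how the final PTE combines with the offset, can be written as bit operations. This is a minimal C sketch with hypothetical names. It assumes that the PTE's physical-page base occupies bits 31–12, as in the PTE format, and it ignores the present bit and the permission bits.

```c
#include <stdint.h>

struct va32_4k { unsigned pd_index, pt_index, offset; };

/* Split a 32-bit virtual address for 4-Kbyte, non-PAE translation. */
struct va32_4k split_va32_4k(uint32_t va)
{
    struct va32_4k f;
    f.pd_index = (va >> 22) & 0x3FFu;   /* bits 31-22: one of 1024 PDEs */
    f.pt_index = (va >> 12) & 0x3FFu;   /* bits 21-12: one of 1024 PTEs */
    f.offset   = va & 0xFFFu;           /* bits 11-0: byte offset */
    return f;
}

/* Physical address: PTE base (bits 31-12) plus the byte offset. */
uint32_t phys_4k(uint32_t pte, uint32_t va)
{
    return (pte & 0xFFFFF000u) | (va & 0xFFFu);
}
```

For virtual address 1234_5678h, the page-directory index is 48h, the page-table index is 345h, and the offset is 678h.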
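[Figure 5-4: 4-Kbyte non-PAE translation. CR3 bits 31–12 give the page-directory base. Virtual-address bits 31–22 (10 bits) select a PDE, which gives the page-table base. Bits 21–12 (10 bits) select a PTE, which gives the base of the 4-Kbyte physical page. Bits 11–0 (12 bits) are the page offset, yielding a 32-bit physical address.]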
Figure 5-5 on page 123 shows the format of the PDE (page-directory entry), and Figure 5-6 on
page 123 shows the format of the PTE (page-table entry). Each table occupies 4 Kbytes and can hold
1024 of the 32-bit table entries. The fields within these table entries are described in “Page-
Translation-Table Entry Fields” on page 135.
Figure 5-5 shows bit 7 cleared to 0. This bit is the page-size bit (PS), and specifies a 4-Kbyte physical-
page translation.
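[Figure 5-5: 4-Kbyte PDE, non-PAE paging]
Bits 31–12: Page-Table Base Address; 11–9: AVL; 8: IGN; 7: 0 (PS); 6: IGN; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P
[Figure 5-6: 4-Kbyte PTE, non-PAE paging]
Bits 31–12: Physical-Page Base Address; 11–9: AVL; 8: G; 7: PAT; 6: D; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P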
4-Mbyte Page Translation. 4-Mbyte page translation is only supported when page-size extensions
are enabled (CR4.PSE=1) and physical-address extensions are disabled (CR4.PAE=0).
PSE defines a page-size bit in the 32-bit PDE format (PDE.PS). This bit is used by the processor
during page translation to support both 4-Mbyte and 4-Kbyte pages. 4-Mbyte pages are selected when
PDE.PS is set to 1, and the PDE points directly to a 4-Mbyte physical page. PTEs are not used in a 4-
Mbyte page translation. If PDE.PS is cleared to 0, or if 4-Mbyte page translation is disabled, the PDE
points to a PTE.
4-Mbyte page translation is performed by dividing the 32-bit virtual address into two fields. The upper
field is used as an index into a single-level page-translation hierarchy, and the lower field is the byte
offset. The virtual-address fields are used as follows, and are shown in Figure 5-7 on page 124:
• Bits 31–22 index into the 1024-entry page-directory table.
• Bits 21–0 provide the byte offset into the physical page.
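[Figure 5-7: 4-Mbyte PSE translation. CR3 bits 31–12 give the page-directory base. Virtual-address bits 31–22 (10 bits) select a PDE, which gives the base of the 4-Mbyte physical page. Bits 21–0 (22 bits) are the page offset, yielding a physical address of up to 40 bits.]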
The AMD64 architecture modifies the legacy 32-bit PDE format in PSE mode to increase physical-
address size support to 40 bits. This increase in address size is accomplished by using bits 20–13 to
hold eight additional high-order physical-address bits. Bit 21 is reserved and must be cleared to 0.
Figure 5-8 shows the format of the PDE when PSE mode is enabled. The physical-page base-address
bits are contained in a split field. The high-order, physical-page base-address bits 39–32 are located in
PDE[20:13], and physical-page base-address bits 31–22 are located in PDE[31:22].
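Reassembling the split base-address field is a common point of confusion. The following minimal C sketch, with hypothetical names, builds the 40-bit physical address from a PSE-mode 4-Mbyte PDE and a virtual address. Permission and present-bit checks are omitted.

```c
#include <stdint.h>

/* Page-directory index for a 4-Mbyte PSE translation: VA bits 31-22. */
unsigned pse_pd_index(uint32_t va)
{
    return (va >> 22) & 0x3FFu;
}

/* Physical address from a PSE 4-Mbyte PDE (PDE.PS=1). */
uint64_t pse_4m_phys(uint32_t pde, uint32_t va)
{
    uint64_t hi = (uint64_t)((pde >> 13) & 0xFFu) << 32;  /* PDE[20:13] -> PA[39:32] */
    uint64_t lo = pde & 0xFFC00000u;                      /* PDE[31:22] -> PA[31:22] */
    return hi | lo | (va & 0x3FFFFFu);                    /* VA[21:0] byte offset */
}
```

As an illustration, a PDE of 00C2_4083h has base bits 31–22 equal to 00C0_0000h and bits 39–32 equal to 12h. Combined with virtual address 0041_2345h, it produces physical address 12_00C1_2345h.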
[Figure 5-8: 4-Mbyte PDE, non-PAE paging with PSE]
Bits 31–22: Physical-Page Base Address [31:22]; 21: 0 (reserved); 20–13: Physical-Page Base Address [39:32]; 12: PAT; 11–9: AVL; 8: G; 7: 1 (PS); 6: D; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P
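PAE paging (CR4.PAE=1) expands the page-translation-table entries from 32 bits to 64 bits, allowing
the use of larger physical addresses (up to 52 bits). The size of each table remains 4 Kbytes, which means
each table can hold 512 of the 64-bit entries. PAE paging also introduces a third-level page-translation
table, known as the page-directory-pointer table (PDP).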
The size of large pages in PAE-paging mode is 2 Mbytes rather than 4 Mbytes. PAE uses the page-
directory page-size bit (PDE.PS) to allow selection between 4-Kbyte and 2-Mbyte page sizes. PAE
automatically uses the page-size bit, so the value of CR4.PSE is ignored by PAE paging.
4-Kbyte Page Translation. With PAE paging, 4-Kbyte physical-page translation is performed by
dividing the 32-bit virtual address into four fields. Each of the upper three fields is used as an index into
a 3-level page-translation hierarchy. The virtual-address fields are described as follows and are shown
in Figure 5-9:
• Bits 31–30 index into a 4-entry page-directory-pointer table.
• Bits 29–21 index into the 512-entry page-directory table.
• Bits 20–12 index into the 512-entry page table.
• Bits 11–0 provide the byte offset into the physical page.
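The four-field PAE split can be sketched in the same style as the non-PAE one. This is a minimal C example with hypothetical names, showing only how the indexes are extracted.

```c
#include <stdint.h>

struct pae_va { unsigned pdp_index, pd_index, pt_index, offset; };

/* Split a 32-bit virtual address for 4-Kbyte PAE translation. */
struct pae_va split_pae_4k(uint32_t va)
{
    struct pae_va f;
    f.pdp_index = (va >> 30) & 0x3u;     /* bits 31-30: one of 4 PDPEs */
    f.pd_index  = (va >> 21) & 0x1FFu;   /* bits 29-21: one of 512 PDEs */
    f.pt_index  = (va >> 12) & 0x1FFu;   /* bits 20-12: one of 512 PTEs */
    f.offset    = va & 0xFFFu;           /* bits 11-0: byte offset */
    return f;
}
```

The 9-bit index fields match the 512 eight-byte entries that fit in a 4-Kbyte table.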
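[Figure 5-9: 4-Kbyte PAE translation. Virtual-address bits 31–30 (2 bits) select a PDPE in the page-directory-pointer table. Bits 29–21 (9 bits) select a PDE in the page-directory table. Bits 20–12 (9 bits) select a PTE in the page table. Bits 11–0 (12 bits) are the offset into the 4-Kbyte physical page. Entries hold physical addresses of up to 52 bits*. (*This is an architectural limit. A given implementation may support fewer bits.)]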
Figures 5-10 through 5-12 show the legacy-mode 4-Kbyte translation-table formats:
• Figure 5-10 shows the PDPE (page-directory-pointer entry) format.
• Figure 5-11 shows the PDE (page-directory entry) format.
• Figure 5-12 shows the PTE (page-table entry) format.
The fields within these table entries are described in “Page-Translation-Table Entry Fields” on
page 135.
Figure 5-11 shows the PDE.PS bit cleared to 0 (bit 7), specifying a 4-Kbyte physical-page translation.
[Figure 5-10: 4-Kbyte PDPE, PAE paging]
Bits 63–52: Reserved, MBZ; 51–12: Page-Directory Base Address*; 11–9: AVL; 8–5: Reserved, MBZ; 4: PCD; 3: PWT; 2–1: Reserved, MBZ; 0: P
[Figure 5-11: 4-Kbyte PDE, PAE paging]
Bits 63: NX; 62–52: Reserved, MBZ; 51–12: Page-Table Base Address*; 11–9: AVL; 8: IGN; 7: 0 (PS); 6: IGN; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P
[Figure 5-12: 4-Kbyte PTE, PAE paging]
Bits 63: NX; 62–52: Reserved, MBZ; 51–12: Physical-Page Base Address*; 11–9: AVL; 8: G; 7: PAT; 6: D; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P
*Bit 51 is an architectural limit. A given implementation may support fewer bits.
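2-Mbyte Page Translation. 2-Mbyte page translation is performed by dividing the 32-bit virtual
address into three fields. The upper two fields are used as indexes into a 2-level page-translation
hierarchy, and the lowest field is the byte offset. The virtual-address fields are described as follows
and are shown in Figure 5-13 on page 127:
• Bits 31–30 index into a 4-entry page-directory-pointer table.
• Bits 29–21 index into the 512-entry page-directory table.
• Bits 20–0 provide the byte offset into the physical page.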
[Figure 5-13: 2-Mbyte PAE translation. Virtual-address bits 31–30 (2 bits) select a PDPE in the page-directory-pointer table. Bits 29–21 (9 bits) select a PDE, which gives the base of the 2-Mbyte physical page. Bits 20–0 (21 bits) are the page offset. Entries hold physical addresses of up to 52 bits*.]
Figure 5-14 shows the format of the PDPE (page-directory-pointer entry) and Figure 5-15 on page 128
shows the format of the PDE (page-directory entry). PTEs are not used in 2-Mbyte page translations.
Figure 5-15 on page 128 shows the PDE.PS bit set to 1 (bit 7), specifying a 2-Mbyte physical-page
translation.
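For a 2-Mbyte PAE page, the physical address is the PDE's base-address bits plus a 21-bit offset. This is a minimal C sketch with hypothetical names. The mask keeps PDE bits 51–21, dropping NX, the reserved bits 20–13, PAT, and the flag bits.

```c
#include <stdint.h>

unsigned pae_pdp_index(uint32_t va) { return (va >> 30) & 0x3u; }    /* bits 31-30 */
unsigned pae_pd_index(uint32_t va)  { return (va >> 21) & 0x1FFu; }  /* bits 29-21 */

/* Physical address from a 2-Mbyte PAE PDE (PDE.PS=1). */
uint64_t pae_2m_phys(uint64_t pde, uint32_t va)
{
    uint64_t base = pde & 0x000FFFFFFFE00000ull;  /* PDE[51:21] */
    return base | (va & 0x1FFFFFu);               /* VA[20:0] byte offset */
}
```

As an example, a PDE of 8000_0001_2340_0083h (NX, PS, R/W and P set) combined with virtual address 8041_2345h gives physical address 1_2341_2345h, from page-directory-pointer index 2 and page-directory index 2.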
[Figure 5-14: 2-Mbyte PDPE, PAE paging]
Bits 63–52: Reserved, MBZ; 51–12: Page-Directory Base Address*; 11–9: AVL; 8–5: Reserved, MBZ; 4: PCD; 3: PWT; 2–1: Reserved, MBZ; 0: P
[Figure 5-15: 2-Mbyte PDE, PAE paging]
Bits 63: NX; 62–52: Reserved, MBZ; 51–21: Physical-Page Base Address*; 20–13: Reserved, MBZ; 12: PAT; 11–9: AVL; 8: G; 7: 1 (PS); 6: D; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P
*Bit 51 is an architectural limit. A given implementation may support fewer bits.
5.3.2 CR3
In long mode, the CR3 register is used to point to the PML4 base address. CR3 is expanded to 64 bits
in long mode, allowing the PML4 table to be located anywhere in the 52-bit physical-address space.
Figure 5-16 on page 129 shows the long-mode CR3 format.
[Figure 5-16: Long-mode CR3 format. Bits 63–52: Reserved, MBZ; 51–12: Page-Map Level-4 Table Base Address; 11–5: Reserved; 4: PCD; 3: PWT; 2–0: Reserved. The 52-bit address width is an architectural limit; a given implementation may support fewer bits.]
Table Base Address Field. Bits 51–12. This 40-bit field points to the PML4 base address. The
PML4 table is aligned on a 4-Kbyte boundary with the low-order 12 address bits (11–0) assumed to be
0. This yields a total base-address size of 52 bits. System software running on processor
implementations supporting fewer than the full 52-bit physical-address space must clear the
unimplemented upper base-address bits to 0.
Page-Level Writethrough (PWT) Bit. Bit 3. Page-level writethrough indicates whether the highest-
level page-translation table has a writeback or writethrough caching policy. When PWT=0, the table
has a writeback caching policy. When PWT=1, the table has a writethrough caching policy.
Page-Level Cache Disable (PCD) Bit. Bit 4. Page-level cache disable indicates whether the highest-
level page-translation table is cacheable. When PCD=0, the table is cacheable. When PCD=1, the table
is not cacheable.
Reserved Bits. Reserved fields should be cleared to 0 by software when writing CR3.
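A minimal C sketch of composing a long-mode CR3 value from the fields above, assuming the PML4 table's physical address is already known. The names make_cr3, CR3_PWT, and CR3_PCD are hypothetical. The sketch checks only 4-Kbyte alignment and the 52-bit architectural limit, not the narrower width a given implementation may support, and it does not show the privileged MOV CR3 that would actually load the value.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_PWT (1ull << 3) /* PML4 table uses a writethrough policy */
#define CR3_PCD (1ull << 4) /* PML4 table is not cacheable */

/* Builds a long-mode CR3 value; reserved bits are left cleared to 0.
 * Returns false if pml4_phys is misaligned or exceeds 52 bits. */
bool make_cr3(uint64_t pml4_phys, bool pwt, bool pcd, uint64_t *out)
{
    if (pml4_phys & 0xFFFull)  /* bits 11-0 are assumed to be 0 */
        return false;
    if (pml4_phys >> 52)       /* beyond the 52-bit architectural limit */
        return false;
    uint64_t cr3 = pml4_phys;  /* occupies bits 51-12 */
    if (pwt)
        cr3 |= CR3_PWT;
    if (pcd)
        cr3 |= CR3_PCD;
    *out = cr3;
    return true;
}
```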
[Figure: Long-mode 4-Kbyte page translation. Virtual-address bits 63–48: Sign Extend; 47–39: Page-Map Level-4 (PML4) Offset; 38–30: Page-Directory-Pointer Offset; 29–21: Page-Directory Offset; 20–12: Page-Table Offset; 11–0: Physical-Page Offset. The PML4E, PDPE, and PDE selected by these offsets each point to the next-lower table, and the PTE points to the 4-Kbyte physical page. Entries hold 52-bit physical addresses (an architectural limit).]
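As a sketch of the 4-Kbyte long-mode split, the following C code extracts the four 9-bit table offsets and the 12-bit page offset, and checks that bits 63–48 are a sign extension of bit 47 (canonical form). The names are hypothetical, and the 48-bit virtual-address width is taken from the figure rather than queried from the processor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Index fields of a long-mode virtual address for 4-Kbyte pages. */
struct lm4k_va {
    unsigned pml4, pdp, pd, pt; /* 9-bit table offsets (0..511) */
    unsigned offset;            /* 12-bit byte offset in the page */
};

/* Canonical if bits 63-47 are all 0 or all 1. */
bool is_canonical48(uint64_t va)
{
    uint64_t top = va >> 47;    /* the 17 bits 63-47 */
    return top == 0 || top == 0x1FFFF;
}

struct lm4k_va lm4k_split(uint64_t va)
{
    struct lm4k_va f;
    f.pml4   = (unsigned)((va >> 39) & 0x1FF);
    f.pdp    = (unsigned)((va >> 30) & 0x1FF);
    f.pd     = (unsigned)((va >> 21) & 0x1FF);
    f.pt     = (unsigned)((va >> 12) & 0x1FF);
    f.offset = (unsigned)(va & 0xFFF);
    return f;
}
```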
Figures 5-18 through 5-21 on page 131 show the long-mode 4-Kbyte translation-table formats:
• Figure 5-18 on page 131 shows the PML4E (page-map level-4 entry) format.
• Figure 5-19 on page 131 shows the PDPE (page-directory-pointer entry) format.
• Figure 5-20 on page 131 shows the PDE (page-directory entry) format.
• Figure 5-21 on page 131 shows the PTE (page-table entry) format.
The fields within these table entries are described in “Page-Translation-Table Entry Fields” on
page 135.
Figure 5-20 on page 131 shows the PDE.PS bit (bit 7) cleared to 0, indicating a 4-Kbyte physical-page
translation.
[Figure 5-18: PML4E format, long mode. Bit 63: NX; 62–52: Available; 51–12: Page-Directory-Pointer Base Address; 11–9: AVL; 8–7: MBZ; 6: IGN; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P. The 52-bit address width is an architectural limit; a given implementation may support fewer bits.]
[Figure 5-19: PDPE format, long mode, 4-Kbyte pages. Bit 63: NX; 62–52: Available; 51–12: Page-Directory Base Address; 11–9: AVL; 8: MBZ; 7: 0 (PS); 6: IGN; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P. The 52-bit address width is an architectural limit; a given implementation may support fewer bits.]
[Figure 5-20: PDE format, long mode, 4-Kbyte pages. Bit 63: NX; 62–52: Available; 51–12: Page-Table Base Address; 11–9: AVL; 8: IGN; 7: 0 (PS); 6: IGN; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P. The 52-bit address width is an architectural limit; a given implementation may support fewer bits.]
[Figure 5-21: PTE format, long mode, 4-Kbyte pages. Bit 63: NX; 62–52: Available; 51–12: Physical-Page Base Address; 11–9: AVL; 8: G; 7: PAT; 6: D; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P. The 52-bit address width is an architectural limit; a given implementation may support fewer bits.]
[Figure: Long-mode 2-Mbyte page translation. Virtual-address bits 63–48: Sign Extend; 47–39: Page-Map Level-4 (PML4) Offset; 38–30: Page-Directory-Pointer Offset; 29–21: Page-Directory Offset; 20–0: Page Offset (21 bits). The PML4E and PDPE point to the next-lower tables, and the PDE points to the 2-Mbyte physical page. Entries hold 52-bit physical addresses (an architectural limit).]
Figures 5-23 through 5-25 on page 133 show the long-mode 2-Mbyte translation-table formats (the
PML4 and PDPT formats are identical to those used for 4-Kbyte page translations and are repeated
here for clarity):
• Figure 5-23 on page 133 shows the PML4E (page-map level-4 entry) format.
• Figure 5-24 on page 133 shows the PDPE (page-directory-pointer entry) format.
• Figure 5-25 on page 133 shows the PDE (page-directory entry) format.
The fields within these table entries are described in “Page-Translation-Table Entry Fields” on
page 135. PTEs are not used in 2-Mbyte page translations.
Figure 5-25 shows the PDE.PS bit (bit 7) set to 1, indicating a 2-Mbyte physical-page translation.
[Figure 5-23: PML4E format, long mode; identical to the 4-Kbyte format in Figure 5-18.]
[Figure 5-24: PDPE format, long mode; identical to the 4-Kbyte format in Figure 5-19.]
[Figure 5-25: PDE format, long mode, 2-Mbyte pages. Bit 63: NX; 62–52: Available; 51–21: Physical-Page Base Address; 20–13: Reserved, MBZ; 12: PAT; 11–9: AVL; 8: G; 7: 1 (PS); 6: D; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P. The 52-bit address width is an architectural limit; a given implementation may support fewer bits.]
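When PDE.PS=1 the walk ends at the PDE, so the physical address combines the PDE's Physical-Page Base Address (bits 51–21) with virtual-address bits 20–0. The following is a minimal C sketch with hypothetical names. Note that the base-address mask must exclude bit 12 (PAT), the reserved bits 20–13, and the NX and Available bits at the top of the entry. Permission and reserved-bit checks are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define PDE_P  (1ull << 0)
#define PDE_PS (1ull << 7)
#define PDE_2M_BASE_MASK 0x000FFFFFFFE00000ull /* bits 51-21 */

/* Returns false if the PDE is not present or does not map a 2-Mbyte
 * page (PS=0 means the walk continues to a PTE). */
bool translate_2m(uint64_t pde, uint64_t va, uint64_t *pa)
{
    if (!(pde & PDE_P) || !(pde & PDE_PS))
        return false;
    *pa = (pde & PDE_2M_BASE_MASK) | (va & 0x1FFFFFull); /* + bits 20-0 */
    return true;
}
```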
[Figure: Long-mode 1-Gbyte page translation. Virtual-address bits 63–48: Sign Extend; 47–39: Page-Map Level-4 (PML4) Offset; 38–30: Page-Directory-Pointer Offset; 29–0: Page Offset (30 bits). The PML4E points to the page-directory-pointer table, and the PDPE points to the 1-Gbyte physical page. Entries hold 52-bit physical addresses (an architectural limit).]
Figure 5-27 and Figure 5-28 on page 135 show the long-mode 1-Gbyte translation-table formats (the
PML4 format is identical to the one used for 4-Kbyte page translations and is repeated here for clarity):
• Figure 5-27 shows the PML4E (page-map level-4 entry) format.
• Figure 5-28 shows the PDPE (page-directory-pointer entry) format.
The fields within these table entries are described in “Page-Translation-Table Entry Fields” on
page 135 in the current volume. PTEs and PDEs are not used in 1-Gbyte page translations.
Figure 5-28 on page 135 shows the PDPE.PS bit (bit 7) set to 1, indicating a 1-Gbyte physical-page
translation.
[Figure 5-27: PML4E format, long mode; identical to the 4-Kbyte format in Figure 5-18.]
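[Figure 5-28: PDPE format, long mode, 1-Gbyte pages. Bit 63: NX; 62–52: Available; 51–30: Physical-Page Base Address; 29–13: Reserved, MBZ; 12: PAT; 11–9: AVL; 8: G; 7: 1 (PS); 6: D; 5: A; 4: PCD; 3: PWT; 2: U/S; 1: R/W; 0: P. The 52-bit address width is an architectural limit; a given implementation may support fewer bits.]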
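The 1-Gbyte case follows the same pattern one level higher. The sketch below uses hypothetical names. It also rejects a 1-Gbyte PDPE whose reserved bits 29–13 are set, since in long mode the processor raises #PF when reserved bits are not cleared to 0, but it allows bit 12 (PAT). The return codes are an arbitrary choice for illustration.

```c
#include <stdint.h>

#define PDPE_P  (1ull << 0)
#define PDPE_PS (1ull << 7)
#define PDPE_1G_BASE_MASK 0x000FFFFFC0000000ull /* bits 51-30 */
#define PDPE_1G_MBZ_MASK  0x000000003FFFE000ull /* bits 29-13 */

/* Returns 0 on success, -1 if the PDPE is not present or PS=0,
 * -2 if reserved bits 29-13 are nonzero (the processor would raise #PF). */
int translate_1g(uint64_t pdpe, uint64_t va, uint64_t *pa)
{
    if (!(pdpe & PDPE_P) || !(pdpe & PDPE_PS))
        return -1;
    if (pdpe & PDPE_1G_MBZ_MASK)
        return -2;
    *pa = (pdpe & PDPE_1G_BASE_MASK) | (va & 0x3FFFFFFFull); /* + bits 29-0 */
    return 0;
}
```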
1-Gbyte Paging Feature Identification. EDX bit 26 as returned by CPUID function 8000_0001h
indicates 1-Gbyte page support. The EAX register as returned by CPUID function 8000_0019h reports
the number of 1-Gbyte L1 TLB entries supported and EBX reports the number of 1-Gbyte L2 TLB
entries. See the CPUID Specification, order# 25481, for details.
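The following is a hedged C sketch of the feature check. It assumes a GCC or Clang toolchain on x86, where the compiler-provided cpuid.h header supplies __get_cpuid, which returns 0 if the requested function is not supported. The decode step is split out so it can be exercised without a particular processor. The function names are hypothetical, and the TLB-size query through function 8000_0019h is not shown.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper header */
#endif

/* EDX bit 26 of CPUID function 8000_0001h: 1-Gbyte page support. */
bool edx_has_1g_pages(uint32_t edx)
{
    return (edx >> 26) & 1u;
}

/* Queries the running processor; false if CPUID or the extended
 * function is unavailable. */
bool cpu_has_1g_pages(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_1g_pages(edx);
#else
    return false;
#endif
}
```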
Translation-Table Base Address Field. The translation-table base-address field points to the
physical base address of the next-lower-level table in the page-translation hierarchy. Page data-
structure tables are always aligned on 4-Kbyte boundaries, so only the address bits above bit 11 are
stored in the translation-table base-address field. Bits 11–0 are assumed to be 0. The size of the field
depends on the mode:
• In normal (non-PAE) paging (CR4.PAE=0), this field specifies a 32-bit physical address.
• In PAE paging (CR4.PAE=1), this field specifies a 52-bit physical address.
52 bits correspond to the maximum physical-address size allowed by the AMD64 architecture. If a
processor implementation supports fewer than the full 52-bit physical address, software must clear the
unimplemented high-order translation-table base-address bits to 0. For example, if a processor
implementation supports a 40-bit physical-address size, software must clear bits 51–40 when writing a
translation-table base-address field in a page data-structure entry.
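The 40-bit example above generalizes to a mask covering bits (maxphyaddr-1) through 12. The following is a minimal C sketch. It assumes that the implemented physical-address width (maxphyaddr, a hypothetical parameter) lies between 32 and 52 and has already been obtained from the processor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base-address bits an implementation with a maxphyaddr-bit physical
 * address supports: bits (maxphyaddr-1) down to 12. */
uint64_t table_base_mask(unsigned maxphyaddr)
{
    return ((1ull << maxphyaddr) - 1) & ~0xFFFull;
}

/* True if base is 4-Kbyte aligned and sets no unimplemented high bits. */
bool table_base_ok(uint64_t base, unsigned maxphyaddr)
{
    return (base & ~table_base_mask(maxphyaddr)) == 0;
}
```

For a 40-bit implementation, table_base_mask(40) is 0x000000FFFFFFF000, so any base address with bits 51–40 set is rejected.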
Physical-Page Base Address Field. The physical-page base-address field points to the base
address of the translated physical page. This field is found only in the lowest level of the page-
translation hierarchy. The size of the field depends on the mode:
• In normal (non-PAE) paging (CR4.PAE=0), this field specifies a 32-bit base address for a physical
page.
• In PAE paging (CR4.PAE=1), this field specifies a 52-bit base address for a physical page.
Physical pages can be 4 Kbytes, 2 Mbytes, 4 Mbytes, or 1 Gbyte, and they are always aligned on an
address boundary corresponding to the physical-page length. For example, a 2-Mbyte physical page is
always aligned on a 2-Mbyte address boundary. Because of this alignment, the low-order address bits
are assumed to be 0, as follows:
• 4-Kbyte pages, bits 11–0 are assumed 0.
• 2-Mbyte pages, bits 20–0 are assumed 0.
• 4-Mbyte pages, bits 21–0 are assumed 0.
• 1-Gbyte pages, bits 29–0 are assumed 0.
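The alignment rule amounts to checking that the assumed-zero low-order bits are clear. The following C sketch uses hypothetical names, and page sizes are passed in bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of low-order address bits assumed 0 for each page size. */
unsigned page_offset_bits(uint64_t page_size)
{
    switch (page_size) {
    case 4ull << 10: return 12; /* 4 Kbytes: bits 11-0 */
    case 2ull << 20: return 21; /* 2 Mbytes: bits 20-0 */
    case 4ull << 20: return 22; /* 4 Mbytes: bits 21-0 */
    case 1ull << 30: return 30; /* 1 Gbyte:  bits 29-0 */
    default:         return 0;  /* not a supported page size */
    }
}

/* True if base is a valid physical-page base for the given size. */
bool page_base_aligned(uint64_t base, uint64_t page_size)
{
    unsigned bits = page_offset_bits(page_size);
    return bits != 0 && (base & ((1ull << bits) - 1)) == 0;
}
```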
Present (P) Bit. Bit 0. This bit indicates whether the page-translation table or physical page is loaded
in physical memory. When the P bit is cleared to 0, the table or physical page is not loaded in physical
memory. When the P bit is set to 1, the table or physical page is loaded in physical memory.
Software clears this bit to 0 to indicate a page table or physical page is not loaded in physical memory.
A page-fault exception (#PF) occurs if an attempt is made to access a table or page when the P bit is 0.
System software is responsible for loading the missing table or page into memory and setting the P bit
to 1.
When the P bit is 0, indicating a not-present page, all remaining bits in the page data-structure entry are
available to software.
Entries with P cleared to 0 are never cached in the TLB, and the processor never sets the Accessed or
Dirty bit in such an entry.
Read/Write (R/W) Bit. Bit 1. This bit controls read/write access to all physical pages mapped by the
table entry. For example, a page-map level-4 R/W bit controls read/write access to all 128M
(512 × 512 × 512) physical pages it maps through the lower-level translation tables. When the R/W bit
is cleared to 0, access is restricted to read-only. When the R/W bit is set to 1, both read and write access
is allowed. See “Page-Protection Checks” on page 142 for a description of the paging read/write
protection mechanism.
User/Supervisor (U/S) Bit. Bit 2. This bit controls user (CPL 3) access to all physical pages mapped
by the table entry. For example, a page-map level-4 U/S bit controls the access allowed to all 128M
(512 × 512 × 512) physical pages it maps through the lower-level translation tables. When the U/S bit
is cleared to 0, access is restricted to supervisor level (CPL 0, 1, 2). When the U/S bit is set to 1, both
user and supervisor access is allowed. See “Page-Protection Checks” on page 142 for a description of
the paging user/supervisor protection mechanism.
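Because R/W and U/S apply to everything mapped below an entry, the effective permission for a page is the AND of these bits along the whole translation path. Likewise, a single entry with NX=1 anywhere on the path makes the page non-executable. The following is a minimal sketch that assumes EFER.NXE=1 and ignores CPL and CR0.WP, which the final access check also considers. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_RW (1ull << 1)
#define PTE_US (1ull << 2)
#define PTE_NX (1ull << 63)

struct eff_perm { bool writable, user, executable; };

/* Combines the entries on one translation path (e.g. PML4E, PDPE,
 * PDE, PTE, in any order). */
struct eff_perm combine_levels(const uint64_t *entries, int n)
{
    struct eff_perm p = { true, true, true };
    for (int i = 0; i < n; i++) {
        p.writable   = p.writable   && (entries[i] & PTE_RW);
        p.user       = p.user       && (entries[i] & PTE_US);
        p.executable = p.executable && !(entries[i] & PTE_NX);
    }
    return p;
}
```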
Page-Level Writethrough (PWT) Bit. Bit 3. This bit indicates whether the page-translation table or
physical page to which this entry points has a writeback or writethrough caching policy. When the
PWT bit is cleared to 0, the table or physical page has a writeback caching policy. When the PWT bit is
set to 1, the table or physical page has a writethrough caching policy. See “Memory Caches” on
page 176 for additional information on caching.
Page-Level Cache Disable (PCD) Bit. Bit 4. This bit indicates whether the page-translation table or
physical page to which this entry points is cacheable. When the PCD bit is cleared to 0, the table or
physical page is cacheable. When the PCD bit is set to 1, the table or physical page is not cacheable.
See “Memory Caches” on page 176 for additional information on caching.
Accessed (A) Bit. Bit 5. This bit indicates whether the page-translation table or physical page to
which this entry points has been accessed. The A bit is set to 1 by the processor the first time the table
or physical page is either read from or written to. The A bit is never cleared by the processor. Instead,
software must clear this bit to 0 when it needs to track the frequency of table or physical-page accesses.
Dirty (D) Bit. Bit 6. This bit is only present in the lowest level of the page-translation hierarchy. It
indicates whether the physical page to which this entry points has been
written. The D bit is set to 1 by the processor the first time there is a write to the physical page. The D
bit is never cleared by the processor. Instead, software must clear this bit to 0 when it needs to track the
frequency of physical-page writes.
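Because the processor sets A and D in memory on its own, software that clears these bits (for example, to age pages) should avoid a plain load-modify-store. Such a sequence could overwrite a D bit the processor sets in between. The following is a minimal C11 sketch using an atomic AND, with hypothetical names. After clearing A, software would still need to invalidate the TLB entry (for example, with INVLPG) so that a later access walks the tables and sets A again. That step is omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_A (1ull << 5)
#define PTE_D (1ull << 6)

/* Atomically clears the A bit and reports whether it was set. Other bits,
 * including a D bit set concurrently by the processor, are preserved. */
bool test_and_clear_accessed(_Atomic uint64_t *pte)
{
    uint64_t old = atomic_fetch_and(pte, ~PTE_A);
    return (old & PTE_A) != 0;
}
```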
Page Size (PS) Bit. Bit 7. This bit is present in page-directory entries and long-mode page-directory-
pointer entries. When the PS bit is set in the page-directory-pointer entry (PDPE) or page-directory
entry (PDE), that entry is the lowest level of the page-translation hierarchy. When the PS bit is cleared
to 0 in all levels, the lowest level of the page-translation hierarchy is the page-table entry (PTE), and
the physical-page size is 4 Kbytes. The physical-page size is determined as follows:
• If EFER.LMA=1 and PDPE.PS=1, the physical-page size is 1 Gbyte.
• If CR4.PAE=0 and PDE.PS=1, the physical-page size is 4 Mbytes.
• If CR4.PAE=1 and PDE.PS=1, the physical-page size is 2 Mbytes.
See Table 5-1 on page 118 for a description of the relationship between the PS bit, PAE, physical-page
sizes, and page-translation hierarchy.
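The rules above can be written as a small decision function. This sketch uses hypothetical names and adds a cr4_pse input, on the assumption that non-PAE 4-Mbyte pages also require the page-size extensions bit (CR4.PSE) to be set. With CR4.PSE=0, the function falls back to 4-Kbyte pages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Physical-page size in bytes for a translation, given the mode bits and
 * the PS bits encountered during the walk. */
uint64_t physical_page_size(bool efer_lma, bool cr4_pae, bool cr4_pse,
                            bool pdpe_ps, bool pde_ps)
{
    if (efer_lma && pdpe_ps)
        return 1ull << 30;  /* 1 Gbyte */
    if (!cr4_pae && cr4_pse && pde_ps)
        return 4ull << 20;  /* 4 Mbytes */
    if (cr4_pae && pde_ps)
        return 2ull << 20;  /* 2 Mbytes */
    return 4ull << 10;      /* PTE is the lowest level: 4 Kbytes */
}
```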
Global Page (G) Bit. Bit 8. This bit is only present in the lowest level of the page-translation
hierarchy. It indicates the physical page is a global page. The TLB entry for a global page (G=1) is not
invalidated when CR3 is loaded either explicitly by a MOV CRn instruction or implicitly during a task
switch. Use of the G bit requires the page-global enable bit in CR4 to be set to 1 (CR4.PGE=1). See
“Global Pages” on page 140 for more information on the global-page mechanism.
Available to Software (AVL) Bits. Bits 11–9. These bits are not interpreted by the processor and are available for
use by system software.
Page-Attribute Table (PAT) Bit. This bit is only present in the lowest level of the page-translation
hierarchy, as follows:
• If the lowest level is a PTE (PDE.PS=0), PAT occupies bit 7.
• If the lowest level is a PDE (PDE.PS=1) or PDPE (PDPE.PS=1), PAT occupies bit 12.
The PAT bit is the high-order bit of a 3-bit index into the PAT register (Figure 7-10 on page 193). The
other two bits involved in forming the index are the PCD and PWT bits. The PAT bit is supported only
by processors that implement the PAT register. See “Page-Attribute Table Mechanism” on page 193 for a
description of the PAT mechanism and how it is used.
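The following C sketch forms the 3-bit PAT index from an entry, with PAT as the high-order bit, followed by PCD and then PWT. The is_large_page flag (hypothetical, like the function name) tells the function whether the PAT bit sits at bit 7 or at bit 12. In a PDE or PDPE with PS=1, bit 7 is the PS bit itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PWT (1ull << 3)
#define PTE_PCD (1ull << 4)

/* Returns the PAT register index {PAT, PCD, PWT} for a lowest-level entry. */
unsigned pat_index(uint64_t entry, bool is_large_page)
{
    unsigned pat_bit = is_large_page ? 12 : 7;
    unsigned pat = (unsigned)((entry >> pat_bit) & 1);
    unsigned pcd = (entry & PTE_PCD) ? 1 : 0;
    unsigned pwt = (entry & PTE_PWT) ? 1 : 0;
    return (pat << 2) | (pcd << 1) | pwt;
}
```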
No Execute (NX) Bit. Bit 63. This bit is present in the translation-table entries defined for PAE
paging, with the exception that the legacy-mode PDPE does not contain this bit. This bit is not
supported by non-PAE paging.
The NX bit can only be set when the no-execute page-protection feature is enabled by setting
EFER.NXE to 1 (see “Extended Feature Enable Register (EFER)” on page 54). If EFER.NXE=0, the
NX bit is treated as reserved. In this case, a page-fault exception (#PF) occurs if the NX bit is not
cleared to 0.
This bit controls the ability to execute code from all physical pages mapped by the table entry. For
example, a page-map level-4 NX bit controls the ability to execute code from all 128M
(512 × 512 × 512) physical pages it maps through the lower-level translation tables. When the NX bit
is cleared to 0, code can be executed from the mapped physical pages. When the NX bit is set to 1, code
cannot be executed from the mapped physical pages. See “No Execute (NX) Bit” on page 143 for a
description of the no-execute page-protection mechanism.
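The following is a minimal C sketch of building a long-mode 4-Kbyte PTE that refuses to set NX unless the caller reports EFER.NXE=1, since otherwise bit 63 is reserved. The function name and its parameters are hypothetical. The sketch does not check the implementation-specific physical-address width.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P  (1ull << 0)
#define PTE_RW (1ull << 1)
#define PTE_US (1ull << 2)
#define PTE_NX (1ull << 63)

bool make_pte_4k(uint64_t page_phys, bool writable, bool user, bool no_exec,
                 bool efer_nxe, uint64_t *out)
{
    if (page_phys & 0xFFFull)  /* must be 4-Kbyte aligned */
        return false;
    if (page_phys >> 52)       /* beyond the 52-bit architectural limit */
        return false;
    if (no_exec && !efer_nxe)  /* NX would be a reserved bit: #PF on use */
        return false;
    uint64_t pte = page_phys | PTE_P;
    if (writable) pte |= PTE_RW;
    if (user)     pte |= PTE_US;
    if (no_exec)  pte |= PTE_NX;
    *out = pte;
    return true;
}
```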
Reserved Bits. Software should clear all reserved bits to 0. If the processor is in long mode, or if
page-size and physical-address extensions are enabled in legacy mode, a page-fault exception (#PF)
occurs if reserved bits are not cleared to 0.
Accessed (A) bit. The Accessed bit can be set for instructions that are speculatively executed by the
processor.
For example, the Accessed bit may be set by instructions in a mispredicted branch path even though
those instructions are never retired. Thus, software must not assume that a translation has not been
cached in the TLB just because no instruction that accessed the page was successfully retired.
Nevertheless, a table entry is never cached in the TLB without its Accessed bit being set at the same
time.
The processor does not order Accessed bit updates with respect to loads done by other instructions.
Dirty (D) bit. The Dirty bit is not updated speculatively. For instructions with multiple writes, the D
bit may be set for any writes completed up to the point of a fault. In rare cases, the Dirty bit may be set
even if a write was not actually performed, including MASKMOVQ with a mask of zero and certain
x87 floating-point instructions that cause an exception. Thus, software cannot assume that the page has
actually been written even when PTE.D is set to 1.
If PTE.D is cleared to 0, software can rely on the fact that the page has not been written.
Dirty bit updates are ordered with respect to other loads and stores. However, to ensure compatibility
with future processors, a serializing operation should be inserted before reading the D bit.
System software is responsible for managing the TLBs when updates are made to the linear-to-
physical mapping of addresses. A change to any paging data-structure entry is not automatically
reflected in the TLB, and hardware snooping of TLBs during memory-reference cycles is not
performed. Software must invalidate the TLB entry of a modified translation-table entry so that the
change is reflected in subsequent address translations. TLB invalidation is described in “TLB
Management” on page 140. Only privileged software running at CPL=0 can manage the TLBs.
Explicit Invalidations. Three mechanisms are provided to explicitly invalidate the TLB:
• The invalidate TLB entry instruction (INVLPG) can be used to invalidate specific entries within
the TLB. This instruction invalidates a page, regardless of whether it is marked as global or not.
The invalidate TLB entry in a specified ASID instruction (INVLPGA) operates similarly, but on the
specified ASID. See “Invalidate Page, Alternate ASID” on page 390.
• Updates to the CR3 register cause the entire TLB to be invalidated except for global pages. The
CR3 register can be updated with the MOV CR3 instruction. CR3 is also updated during a task
switch, with the updated CR3 value read from the TSS of the new task.
• The TLB_CONTROL field of a VMCB can request specific flushes of the TLB to occur when the
VMRUN instruction is executed on that VMCB. See “TLB Flush” on page 390.
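The difference between INVLPG and a CR3 write can be illustrated with a toy software model. This model does not reflect how a hardware TLB is organized. The slot count, struct layout, and function names are hypothetical, and only the global/non-global distinction described above is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_SLOTS 8

struct tlb_entry { bool valid, global; uint64_t vpn, pfn; };
struct tlb { struct tlb_entry e[TLB_SLOTS]; };

/* INVLPG: drop the translation for one 4-Kbyte page, global or not. */
void model_invlpg(struct tlb *t, uint64_t va)
{
    for (int i = 0; i < TLB_SLOTS; i++)
        if (t->e[i].valid && t->e[i].vpn == (va >> 12))
            t->e[i].valid = false;
}

/* MOV CR3: drop every translation except global pages. */
void model_write_cr3(struct tlb *t)
{
    for (int i = 0; i < TLB_SLOTS; i++)
        if (!t->e[i].global)
            t->e[i].valid = false;
}

int model_count_valid(const struct tlb *t)
{
    int n = 0;
    for (int i = 0; i < TLB_SLOTS; i++)
        n += t->e[i].valid;
    return n;
}
```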
Implicit Invalidations. The following operations cause the entire TLB to be invalidated, including
global pages:
• Modifying the CR0.PG bit (paging enable).
• Modifying the CR4.PAE bit (physical-address extensions), the CR4.PSE bit (page-size
extensions), or the CR4.PGE bit (page-global enable).
• Entering SMM as a result of an SMI interrupt.
• Executing the RSM instruction to return from SMM.
• Updating a memory-type range register (MTRR) with the WRMSR instruction.
• External initialization of the processor.
• External masking of the A20 address bit (asserting the A20M# input signal).
• Writes to certain model-specific registers with the WRMSR instruction; see the BIOS and Kernel
Developer's Guide for the processor implementation for more information.
Speculative Caching of Address Translations. For performance reasons, AMD64 processors may
speculatively load valid address translations into the TLB on false execution paths. Such translations
are not based on references that a program makes architecturally; instead, they result from references
the processor makes while speculatively following an instruction path that turns out to be
mispredicted. This may occur for both instruction fetches and data references. Such entries remain
cached in the TLBs and may be used in subsequent translations. Loading a translation speculatively
sets the A bit, if it is not already set.
Caching of Upper Level Translation Table Entries. Similarly, to improve the performance of table
walks on TLB misses, AMD64 processors may save upper level translation table entries in special
table walk caching structures which are kept coherent with the tables in memory via the same
mechanisms as the TLBs—by means of the INVLPG instruction, moves to CR3, and modification of
paging control bits in CR0 and CR4. Like address translations in the TLB, these upper level entries
may also be cached speculatively and by false-path execution. These entries are never cached if their P
(present) bits are cleared to 0.
Under certain circumstances, an upper-level table entry that cannot ultimately lead to a valid
translation (because there are no valid entries in the lower level table to which it points) may also be
cached. This can happen while executing down a false path, when an in-progress table walk gets
cancelled by the branch mispredict before the low level table entry that would cause a fault is
encountered. Said another way, the fact that a page table has no valid entries does not guarantee that
upper level table entries won't be accessed and cached in the processor, as long as those upper level
entries are marked as present. For this reason, it is not safe to modify an upper level entry, even if no
valid lower-level entries exist, without first clearing its present bit, followed by an INVLPG
instruction.
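The required order is: clear the P bit, execute INVLPG, and only then store the new value. The following minimal sketch shows that order. It assumes a hypothetical invlpg_fn hook in place of the privileged INVLPG instruction, which a kernel would issue at CPL=0, so that the sequence can be exercised outside the kernel. The demonstration hook is also hypothetical. The sketch covers only the local processor.

```c
#include <stdatomic.h>
#include <stdint.h>

#define ENTRY_P (1ull << 0)

/* Hypothetical stand-in for the INVLPG instruction. */
typedef void (*invlpg_fn)(uintptr_t va);

/* Replaces a present upper-level entry; va is any address mapped through
 * that entry and is used as the INVLPG operand. */
void replace_upper_entry(_Atomic uint64_t *entry, uint64_t new_value,
                         uintptr_t va, invlpg_fn invlpg)
{
    atomic_fetch_and(entry, ~ENTRY_P); /* 1. mark the entry not present */
    invlpg(va);                        /* 2. flush TLB and cached upper-level entries */
    atomic_store(entry, new_value);    /* 3. install the new entry */
}

/* Demonstration hook: records the entry value seen when INVLPG would run. */
static _Atomic uint64_t *demo_entry;
static uint64_t demo_seen;

static void demo_invlpg(uintptr_t va)
{
    (void)va;
    demo_seen = atomic_load(demo_entry);
}
```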
Use of Cached Entries When Reporting a Page Fault Exception. On current AMD64 processors,
when any type of page fault exception is encountered by the MMU, any cached upper-level entries that
lead to the faulting entry are flushed (along with the TLB entry, if already cached) and the table walk is
repeated to confirm the page fault using the table entries in memory. This is done because a table entry
is allowed to be upgraded (by marking it as present, or by removing its write, execute or supervisor
restrictions) without explicitly maintaining TLB coherency. Such an upgrade will be found when the
table is re-walked, which resolves the fault. If the fault is confirmed on the re-walk however, a page
fault exception is reported, and upper level entries that may have been cached on the re-walk are
flushed.
Handling of D-Bit Updates. When the processor needs to set the D bit in the PTE for a TLB entry
that is already marked as writable at all cached TLB levels, the table walk that is performed to access
the PTE in memory uses cached upper level table entries. This differs from the fault situation
previously described, in which cached entries aren’t used to confirm the fault during the table walk.
Invalidation of Cached Upper-level Entries by INVLPG. Current AMD64 processors invalidate all
cached upper-level entries (in addition to the targeted TLB entry) on any INVLPG instruction. Future
implementations may, however, invalidate only those upper-level entries that are on the table-walk path
of the address targeted by the INVLPG. Because existing memory-management software may rely on
the current behavior, any such selective approach adopted in the future will be implemented as a
software-visible feature that must be explicitly enabled.
Handling of PDPT Entries in PAE Mode. When 32-bit PAE mode is enabled on AMD64 processors
(CR4.PAE is set to 1) a third level of the address translation table hierarchy, the page directory pointer
table (PDPT), is enabled. This table contains four entries. On current AMD64 processors, in native
mode, these four entries are unconditionally loaded into the table walk cache whenever CR3 is written
with the PDPT base address, and remain locked in. At this point they are also checked for reserved bit
violations, and if such violations are present a general protection fault occurs.
Under SVM, however, when the processor is in guest mode with PAE enabled, the guest PDPT entries
are not cached or validated at this point, but instead are loaded and checked on demand in the normal
course of address translation, just like page directory and page table entries. Any reserved bit
violations are detected at the point of use, and result in a page fault (#PF) exception rather than a
general protection (#GP) fault. The cached PDPT entries are subject to displacement from the table
walk cache and reloading from the PDPT, hence software must assume that the PDPT entries may be
read by the processor at any point while those tables are active. Future AMD processors may
implement this same behavior in native mode as well, rather than pre-loading the PDPT entries.
data pages as instructions. All of these forms of protection are available at all levels of the page-
translation hierarchy.
The processor checks a page for execute permission only when the page translation is loaded into the
instruction TLB as a result of a page-table walk. The remaining protection checks are performed when
a virtual address is translated into a physical address. For those checks, the processor examines the
page-level memory-protection bits in the translation tables to determine if the access is allowed. The
bits involved in these checks are:
• User/Supervisor (U/S)—The U/S bit is introduced in “User/Supervisor (U/S) Bit” on page 137.
• Read/Write (R/W)—The R/W bit is introduced in “Read/Write (R/W) Bit” on page 137.
• Write-Protect Enable (CR0.WP)—The CR0.WP bit is introduced in “Write Protect (WP) Bit” on
page 44.
If all table entries in the translation hierarchy are specified as user level, the physical page is a user
page, and both supervisor and user software can access it. In this case the physical page is read-only if
any table entry in the translation hierarchy specifies read-only access. All table entries in the
translation hierarchy must specify read/write access for the physical page to be read/write.
Table 5-3 shows the overall effect that privilege level and access type have on physical-page access
when write protection is enabled (CR0.WP=1). When any translation-table entry is specified as
supervisor level, the physical page is a supervisor page and can only be accessed by supervisor
software. In this case, the physical page is read-only if any table entry in the translation hierarchy
specifies read-only access. All table entries in the translation hierarchy must specify read/write access
for the supervisor page to be read/write.
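The access decision combines the effective U/S and R/W bits with the CPL and CR0.WP. The following minimal sketch covers data accesses only. It treats CPL 3 as user and CPL 0–2 as supervisor, as described for the U/S bit. With CR0.WP=0, supervisor software may write read-only pages, which is the behavior that enabling write protection removes. Execute permission is not covered, and the names are hypothetical.

```c
#include <stdbool.h>

/* Effective bits after ANDing U/S and R/W across all levels. */
struct page_perm { bool user, writable; };

/* Returns true if the data access is allowed. */
bool page_access_allowed(struct page_perm p, unsigned cpl, bool is_write,
                         bool cr0_wp)
{
    bool user_mode = (cpl == 3);
    if (user_mode && !p.user)
        return false;           /* user access to a supervisor page */
    if (!is_write || p.writable)
        return true;
    if (user_mode)
        return false;           /* user write to a read-only page */
    return !cr0_wp;             /* supervisor write to read-only: only if WP=0 */
}
```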
6 System-Management Instructions
System-management instructions provide control over the resources used to manage the processor
operating environment. This includes memory management, memory protection, task management,
interrupt and exception handling, system-management mode, software debug and performance
analysis, and model-specific features. Most instructions used to access these resources are privileged
and can only be executed while the processor is running at CPL=0, although some instructions can be
executed at any privilege level.
Table 6-1 summarizes the instructions used for system management. These include all privileged
instructions, instructions whose privilege requirement is under the control of system software, non-
privileged instructions that are used primarily by system software, and instructions used to transfer
control to system software. Most of the instructions listed in Table 6-1 are summarized in this chapter,
although a few are introduced elsewhere in this manual, as indicated in the Reference column of
Table 6-1.
For details on individual system instructions, see “System Instruction Reference” in Volume 3.
The following instructions are summarized in this chapter but are not categorized as system
instructions, because of their importance to application programming:
• The CPUID instruction returns information critical to system software in initializing the operating
environment. It is fully described in “Processor Feature Identification” on page 61.
• The PUSHF and POPF instructions set and clear certain RFLAGS bits depending on the processor
operating mode and privilege level. These dependencies are described in “POPF and PUSHF
Instructions” on page 154.
• The MOV, PUSH, and POP instructions can be used to load and store segment registers, as
described in “MOV, POP, and PUSH Instructions” on page 155.
either a far CALL instruction or a software interrupt. Transferring control through one of these gates is
slowed by the segmentation-related overhead, as is the later return using a far RET or IRET
instruction. The following checks are performed when control is transferred in this manner:
• Selectors, gate descriptors, and segment descriptors are in the proper form.
• Descriptors lie within the bounds of the descriptor tables.
• Gate descriptors reference the appropriate segment descriptors.
• The caller, gate, and target privileges all allow the control transfer to take place.
• The stack created by the call has sufficient properties to allow the transfer to take place.
In addition to these call-gate checks, other checks are made involving the task-state segment when a
task switch occurs.
SYSCALL and SYSRET Instructions. SYSCALL and SYSRET are low-latency system call and
return instructions. These instructions assume the operating system implements a flat-memory model,
which greatly simplifies calls to and returns from the operating system. This simplification comes
from eliminating unneeded checks, and by loading pre-determined values into the CS and SS segment
registers (both visible and hidden portions). As a result, SYSCALL and SYSRET can complete in fewer
than one-fourth of the internal clock cycles required by the legacy CALL and RET
instructions. SYSCALL and SYSRET are particularly well-suited for use in 64-bit mode, which
requires implementation of a paged, flat-memory model.
SYSCALL and SYSRET require that the code-segment base, limit, and attributes (except for CPL) are
consistent for all application and system processes. Only the CPL is allowed to vary. The processor
assumes (but does not check) that the SYSCALL target CS has CPL=0 and the SYSRET target CS has
CPL=3.
For details on the SYSCALL and SYSRET instructions, see “System Instruction Reference” in
Volume 3.
SYSCALL and SYSRET MSRs. The STAR, LSTAR, and CSTAR registers are model-specific
registers (MSRs) used to specify the target address of a SYSCALL instruction as well as the CS and SS
selectors of the called and returned procedures. The SFMASK register is used in long mode to specify
how rFLAGS is handled by these instructions. Figure 6-1 on page 151 shows the STAR, LSTAR,
CSTAR, and SFMASK register formats.
[Figure 6-1, STAR row: STAR (MSR C000_0081h). Bits 63–48: SYSRET CS and SS; 47–32: SYSCALL CS and SS; 31–0: 32-bit SYSCALL Target EIP.]
• STAR—The STAR register has the following fields (unless otherwise noted, all bits are read/write):
- SYSRET CS and SS Selectors—Bits 63–48. This field is used to specify both the CS and SS
selectors loaded into CS and SS during SYSRET. If SYSRET is returning to 32-bit mode
(either legacy or compatibility), this field is copied directly into the CS selector field. If
SYSRET is returning to 64-bit mode, the CS selector is set to this field + 16. SS.Sel is set to this
field + 8, regardless of the target mode. Because SYSRET always returns to CPL 3, the RPL
bits 49–48 should be initialized to 11b.
- SYSCALL CS and SS Selectors—Bits 47–32. This field is used to specify both the CS and SS
selectors loaded into CS and SS during SYSCALL. This field is copied directly into CS.Sel.
SS.Sel is set to this field + 8. Because SYSCALL always switches to CPL 0, the RPL bits
33–32 should be initialized to 00b.
- 32-bit SYSCALL Target EIP—Bits 31–0. This is the target EIP of the called procedure.
The legacy STAR register is not expanded in long mode to provide a 64-bit target RIP address.
Instead, long mode provides two new STAR registers—long STAR (LSTAR) and compatibility
STAR (CSTAR)—that hold a 64-bit target RIP.
• LSTAR and CSTAR—The LSTAR register holds the target RIP of the called procedure in long
mode when the calling software is in 64-bit mode. The CSTAR register holds the target RIP of the
called procedure in long mode when the calling software is in compatibility mode. The WRMSR
instruction is used to load the target RIP into the LSTAR and CSTAR registers. If the RIP written to
either of the MSRs is not in canonical form, a #GP fault is generated on the WRMSR instruction.
• SFMASK—In long mode, the SFMASK register specifies which RFLAGS bits are cleared when
SYSCALL is executed. If a bit in SFMASK is set to 1, the corresponding bit in RFLAGS is cleared
to 0. If a bit in SFMASK is cleared to 0, the corresponding RFLAGS bit is not modified.
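The STAR field arithmetic, the SFMASK rule, and the canonical-address check on LSTAR/CSTAR writes can be sketched as a small C software model. This code does not program the MSRs. The selector values 10h and 23h used below are a hypothetical GDT layout, the SFMASK value is an arbitrary example, and the canonical check assumes an implementation with 48-bit virtual addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack the three STAR fields at the bit positions given above. */
static uint64_t star_pack(uint16_t sysret_sel, uint16_t syscall_sel, uint32_t eip32)
{
    return ((uint64_t)sysret_sel << 48) | ((uint64_t)syscall_sel << 32) | eip32;
}

/* SYSCALL: CS.Sel = STAR[47:32], SS.Sel = STAR[47:32] + 8. */
static uint16_t syscall_cs(uint64_t star) { return (uint16_t)(star >> 32); }
static uint16_t syscall_ss(uint64_t star) { return (uint16_t)((star >> 32) + 8); }

/* SYSRET: CS.Sel depends on the target mode; SS.Sel = STAR[63:48] + 8 always. */
static uint16_t sysret_cs(uint64_t star, bool to_64bit)
{
    uint16_t base = (uint16_t)(star >> 48);
    return to_64bit ? (uint16_t)(base + 16) : base;
}
static uint16_t sysret_ss(uint64_t star) { return (uint16_t)((star >> 48) + 8); }

/* Long-mode SYSCALL: each RFLAGS bit that is 1 in SFMASK is cleared. */
static uint64_t apply_sfmask(uint64_t rflags, uint64_t sfmask)
{
    return rflags & ~sfmask;
}

/* Assumed 48-bit virtual addresses: canonical means bits 63:47 are all equal. */
static bool is_canonical48(uint64_t va)
{
    uint64_t top = va >> 47;          /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}
```

With this hypothetical layout, SYSCALL loads CS=10h and SS=18h. SYSRET to 64-bit mode loads CS=33h and SS=2Bh, and both selectors carry RPL 3 because bits 49–48 of the field are 11b.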
SYSENTER and SYSEXIT MSRs. Three model-specific registers (MSRs) are used to specify the
target address and stack pointers for the SYSENTER instruction as well as the CS and SS selectors of
the called and returned procedures. The register fields are:
• SYSENTER Target CS—Holds the CS selector of the called procedure.
• SYSENTER Target ESP—Holds the called-procedure stack pointer. The SS selector is updated
automatically to point to the next descriptor entry after the SYSENTER Target CS, and ESP is the
offset into that stack segment.
• SYSENTER Target EIP—Holds the offset into the CS of the called procedure.
Figure 6-2 shows the register formats and their corresponding MSR IDs.
calls, no kernel stack exists at the OS entry point. Neither is there a straightforward method to obtain a
pointer to kernel structures, from which the kernel stack pointer could be read. Thus, the kernel cannot
save GPRs or reference memory. SwapGS does not require any GPR or memory operands, so no
registers need to be saved before using it. Similarly, when the OS kernel is entered via an interrupt or
exception (where the kernel stack is already set up), SwapGS can be used to quickly get a pointer to the
kernel data structures.
See “FS and GS Registers in 64-Bit Mode” on page 70 for more information on using the GS.base
register in 64-bit mode.
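The following is a minimal software model of why SwapGS helps at OS entry. The instruction exchanges GS.base with the KernelGSbase register, which gives the kernel a pointer to its per-processor data without first saving a GPR or touching memory. The struct names and fields are hypothetical. Real kernels execute SWAPGS and then use GS-relative addressing, and do not dereference an integer as this model does.

```c
#include <stdint.h>

/* Hypothetical per-processor area that the kernel needs on entry. */
struct percpu {
    uint64_t kernel_stack_top;
    uint32_t cpu_id;
};

/* Model of the two values SwapGS exchanges. */
struct gs_state {
    uint64_t gs_base;          /* GS.base: used by GS-relative accesses */
    uint64_t kernel_gs_base;   /* KernelGSbase: value parked for the kernel */
};

/* SwapGS: swap GS.base and KernelGSbase; no GPR or memory operand needed. */
static void swapgs(struct gs_state *s)
{
    uint64_t tmp = s->gs_base;
    s->gs_base = s->kernel_gs_base;
    s->kernel_gs_base = tmp;
}

/* After SwapGS at entry, GS.base locates this processor's kernel data. */
static const struct percpu *current_percpu(const struct gs_state *s)
{
    return (const struct percpu *)(uintptr_t)s->gs_base;
}
```

A second SwapGS before returning to user mode restores the user GS.base, which is why entry and exit paths pair the instruction.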
CPUID Instruction. The CPUID instruction provides complete information about the processor
implementation and its capabilities. Software operating at any privilege level can execute the CPUID
instruction to collect this information. System software normally uses the CPUID instruction to
determine which optional features are available so the system can be configured appropriately. The
optional features identified by the CPUID instruction are described in “CPUID” in Volume 3.
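CPUID function 0 returns the processor vendor string in EBX, EDX, and ECX, in that order. Each register holds four ASCII bytes, least-significant byte first. The sketch below decodes those register values portably. The optional `host_vendor` helper assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`.

```c
#include <stdint.h>

/* Assemble the 12-character vendor string from CPUID function 0 outputs. */
static void cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    uint32_t regs[3] = { ebx, edx, ecx };   /* string order is EBX, EDX, ECX */
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
/* Executes CPUID leaf 0 on the host; returns 0 if CPUID is unavailable. */
static int host_vendor(char out[13])
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return 0;
    cpuid_vendor(ebx, edx, ecx, out);
    return 1;
}
#endif
```

For example, the register values 68747541h, 69746E65h, and 444D4163h decode to "AuthenticAMD".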
MOV CRn Instructions. The MOV CRn instructions can be used to copy data between the control
registers and the general-purpose registers. These instructions are privileged and cause a general-
protection exception (#GP) if non-privileged software attempts to execute them.
LMSW and SMSW Instructions. The machine status word is located in CR0 register bits 15–0. The
load machine status word (LMSW) instruction writes only the least-significant four status-word bits
(CR0[3:0]). All remaining status-word bits (CR0[15:4]) are left unmodified by the instruction. The
instruction is privileged and causes a #GP to occur if non-privileged software attempts to execute it.
The store machine status word (SMSW) instruction stores all 16 status-word bits (CR0[15:0]) into the
target GPR or memory location. The instruction is not privileged and can be executed by all software.
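The bit-level effect of LMSW and SMSW on CR0 can be modeled as plain arithmetic. This is a sketch only. It also encodes the rule, given with the LMSW description in Volume 3 as I understand it, that LMSW can set CR0.PE but cannot clear it.

```c
#include <stdint.h>

#define CR0_PE 0x1u   /* bit 0: protection enable */

/* SMSW: stores CR0[15:0]. */
static uint16_t smsw_model(uint64_t cr0)
{
    return (uint16_t)(cr0 & 0xFFFF);
}

/* LMSW: replaces only CR0[3:0]; CR0[15:4] and all higher bits are kept.
   A PE bit that is already set stays set. */
static uint64_t lmsw_model(uint64_t cr0, uint16_t msw)
{
    uint64_t low4 = (uint64_t)(msw & 0xF);
    low4 |= cr0 & CR0_PE;
    return (cr0 & ~(uint64_t)0xF) | low4;
}
```

For example, LMSW with 0008h on CR0=80000011h yields 80000019h. TS is set, PE is kept, and bit 4 (ET) and bit 31 are untouched.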
CLTS Instruction. The clear task-switched bit instruction (CLTS) clears CR0.TS to 0. The CR0.TS
bit is set to 1 by the processor every time a task switch takes place. The bit is useful to system software
in determining when the x87 and multimedia register state should be saved or restored. See “Task
Switched (TS) Bit” on page 44 for more information on using CR0.TS to manage x87-instruction
state. The CLTS instruction is privileged and causes a #GP to occur if non-privileged software attempts
to execute it.
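One common way system software uses CR0.TS is lazy x87/media context switching. The OS sets TS at a task switch. The next x87 or media instruction then raises a device-not-available exception (#NM), and the handler executes CLTS before saving and restoring state. The following is a hypothetical software model of that technique. The names `cpu_model`, `switch_to`, and `nm_handler` are illustrative, and the 512-byte state size mirrors an FXSAVE image. This is not a definitive OS implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct task {
    unsigned char fpu_state[512];    /* saved x87/media state (illustrative) */
};

struct cpu_model {
    bool cr0_ts;                     /* CR0.TS */
    unsigned char live_fpu[512];     /* state currently in the registers */
    struct task *fpu_owner;          /* task whose state is live, or NULL */
};

/* Task switch: leave the register state alone, just set TS. */
static void switch_to(struct cpu_model *c)
{
    c->cr0_ts = true;
}

/* #NM handler: runs on the first x87/media instruction while TS=1. */
static void nm_handler(struct cpu_model *c, struct task *current)
{
    c->cr0_ts = false;                                  /* CLTS */
    if (c->fpu_owner == current)
        return;                                         /* state already live */
    if (c->fpu_owner != NULL)                           /* save previous owner */
        memcpy(c->fpu_owner->fpu_state, c->live_fpu, sizeof c->live_fpu);
    memcpy(c->live_fpu, current->fpu_state, sizeof c->live_fpu);  /* restore */
    c->fpu_owner = current;
}
```

Tasks that never use x87 or media instructions never pay for a save or restore. That saving is the reason to defer the work until #NM.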
POPF and PUSHF Instructions. The pop and push RFLAGS instructions are used for moving data
between the rFLAGS register and the stack. They are not system-management instructions, but their
behavior is mode-dependent.
CLI and STI Instructions. The clear interrupt (CLI) and set interrupt (STI) instructions modify only
the RFLAGS.IF bit or RFLAGS.VIF bit. Clearing rFLAGS.IF to 0 causes the processor to ignore
maskable interrupts. Setting RFLAGS.IF to 1 causes the processor to allow maskable interrupts.
See “Virtual Interrupts” on page 247 for more information on the operation of these instructions when
virtual-8086 mode extensions are enabled (CR4.VME=1).
RDMSR and WRMSR Instructions. The read/write model-specific register instructions (RDMSR
and WRMSR) can be used by privileged software to access the 64-bit MSRs. See “Model-Specific
Registers (MSRs)” on page 56 for details about the MSRs.
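RDMSR and WRMSR select the MSR with ECX and move the 64-bit value through EDX:EAX. The sketch below shows GCC/Clang inline-assembly wrappers plus a read-modify-write helper. The wrappers fault with #GP outside CPL 0, so they are only defined here and never called. The example target is EFER (MSR C000_0080h), whose bit 0 (SCE) enables SYSCALL and SYSRET.

```c
#include <stdint.h>

#define MSR_EFER 0xC0000080u
#define EFER_SCE (1ull << 0)   /* SYSCALL/SYSRET enable */

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Privileged: #GP unless executed at CPL 0. */
static inline uint64_t rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}

static inline void wrmsr(uint32_t msr, uint64_t value)
{
    __asm__ volatile("wrmsr" : : "c"(msr), "a"((uint32_t)value),
                     "d"((uint32_t)(value >> 32)) : "memory");
}
#endif

/* Set bits without disturbing the other fields of the MSR. */
static uint64_t msr_set_bits(uint64_t old, uint64_t bits)
{
    return old | bits;
}

/* Kernel-only usage (CPL 0):
   wrmsr(MSR_EFER, msr_set_bits(rdmsr(MSR_EFER), EFER_SCE)); */
```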
RDPMC Instruction. The read performance-monitoring counter instruction, RDPMC, is used to read
the model-specific performance-monitor registers, PerfCTR[3:0].
RDTSC Instruction. The read time-stamp counter instruction, RDTSC, is used to read the model-
specific time-stamp counter (TSC) register.
RDTSCP Instruction. The read time-stamp counter and processor ID instruction, RDTSCP, is used
to read the model-specific time-stamp counter (TSC) register, as well as the low 32 bits of the
TSC_AUX register (MSR C000_0103h).
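A common use of these instructions is timing a code region. The sketch below assumes GCC or Clang on x86, where `<x86intrin.h>` provides `__rdtsc()` and `__rdtscp()`. It also assumes the OS permits user-mode counter reads (CR4.TSD=0). RDTSCP is used for the closing read because it waits for older instructions to complete, and it also returns the TSC_AUX value. The tick-difference helper is portable.

```c
#include <stdint.h>

/* Elapsed ticks; unsigned subtraction is correct even across wraparound. */
static uint64_t tsc_delta(uint64_t start, uint64_t end)
{
    return end - start;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>
/* Time one call of fn; *aux receives the low 32 bits of TSC_AUX. */
static inline uint64_t time_call(void (*fn)(void), unsigned int *aux)
{
    uint64_t t0 = __rdtsc();
    fn();
    uint64_t t1 = __rdtscp(aux);
    return tsc_delta(t0, t1);
}
#endif
```

TSC ticks are not necessarily core clock cycles on every implementation, so treat the result as a relative measure.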
MOV, POP, and PUSH Instructions. The MOV and POP instructions can be used to load a selector
into a segment register from a general-purpose register or memory (MOV) or from the stack (POP).
Any segment register, except the CS register, can be loaded with the MOV and POP instructions. The
CS register must be loaded with a far-transfer instruction.
All segment register selectors can be stored in a general-purpose register or memory using the MOV
instruction or pushed onto the stack using the PUSH instruction.
When a selector is loaded into a segment register, the processor automatically loads the corresponding
descriptor-table entry into the hidden portion of the selector register. The hidden portion contains the
base address, limit, and segment attributes.
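To find the descriptor-table entry, the processor uses the selector's index field. The TI bit chooses the GDT or LDT, and the low two bits are the RPL. The sketch below decodes those fields and computes the byte offset of an 8-byte code or data descriptor within its table. It is a software illustration, not a processor interface.

```c
#include <stdint.h>

struct selector_fields {
    uint16_t index;   /* bits 15:3: descriptor number in the table */
    uint8_t  ti;      /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;     /* bits 1:0: requested privilege level */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f = {
        (uint16_t)(sel >> 3), (uint8_t)((sel >> 2) & 1), (uint8_t)(sel & 3)
    };
    return f;
}

/* Each code/data descriptor is 8 bytes, so the offset is index * 8. */
static uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel & ~7u);
}
```

For example, selector 2Bh names GDT entry 5 (byte offset 28h) with RPL 3.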
Segment-load and segment-store instructions work normally in 64-bit mode. The appropriate entry is
read from the system descriptor table (GDT or LDT) and is loaded into the hidden portion of the
segment descriptor register. However, the contents of data-segment and stack-segment descriptor
registers are ignored, except in the case of the FS and GS segment-register base fields—see “FS and
GS Registers in 64-Bit Mode” on page 70 for more information.
The ability to use segment-load instructions allows a 64-bit operating system to set up segment
registers for a compatibility-mode application before switching to compatibility mode.
LGDT and LIDT Instructions. The load GDTR (LGDT) and load IDTR (LIDT) instructions load a
pseudo-descriptor from memory into the GDTR or IDTR registers, respectively.
LLDT and LTR Instructions. The load LDTR (LLDT) and load TR (LTR) instructions load a system-
segment descriptor from the GDT into the LDTR and TR segment-descriptor registers (hidden
portion), respectively.
SGDT and SIDT Instructions. The store GDTR (SGDT) and store IDTR (SIDT) instructions reverse
the operation of the LGDT and LIDT instructions. SGDT and SIDT store a pseudo-descriptor from the
GDTR or IDTR register into memory.
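In 64-bit mode the pseudo-descriptor handled by LGDT, LIDT, SGDT, and SIDT is 10 bytes: a 16-bit limit followed by a 64-bit base address. The limit is the table size in bytes minus 1. The sketch below assumes GCC or Clang for the `packed` attribute. The helper name `make_pd` is hypothetical.

```c
#include <stdint.h>

/* Long-mode pseudo-descriptor: no padding between limit and base. */
struct __attribute__((packed)) pseudo_descriptor {
    uint16_t limit;   /* table size in bytes minus 1 */
    uint64_t base;    /* linear address of the table */
};

_Static_assert(sizeof(struct pseudo_descriptor) == 10, "must be 10 bytes");

/* Build a pseudo-descriptor for n entries of entry_size bytes each. */
static struct pseudo_descriptor make_pd(uint64_t base, unsigned n, unsigned entry_size)
{
    struct pseudo_descriptor pd = { (uint16_t)(n * entry_size - 1), base };
    return pd;
}
```

A long-mode IDT with 256 gates of 16 bytes each gets limit 0FFFh.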
SLDT and STR Instructions. In all modes, the store LDTR (SLDT) and store TR (STR) instructions
store the LDT or task selector from the visible portion of the LDTR or TR register into a general-
purpose register or memory, respectively. The hidden portion of the LDTR or TR register is not stored.
LAR Instruction. The load access-rights (LAR) instruction can be used to determine if access to a
segment is allowed, based on privilege checks and type checks. The LAR instruction uses a segment-
selector in the source operand to reference a descriptor in the GDT or LDT. LAR performs a set of
access-rights checks and, if successful, loads the segment-descriptor access rights into the destination
register. Software can further examine the access-rights bits to determine if access into the segment is
allowed.
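After a successful LAR, software typically extracts individual fields from the returned access rights. The sketch below assumes the 32-bit operand-size format, in which bits 15:8 and 23:20 mirror the descriptor's attribute bits. The decoding helpers are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

struct access_rights {
    uint8_t type;     /* bits 11:8 */
    bool    s;        /* bit 12: 1 = code/data, 0 = system */
    uint8_t dpl;      /* bits 14:13 */
    bool    present;  /* bit 15 */
    bool    l;        /* bit 21: 64-bit code segment */
    bool    g;        /* bit 23: granularity */
};

static struct access_rights decode_lar(uint32_t ar)
{
    struct access_rights r;
    r.type = (ar >> 8) & 0xF;
    r.s = (ar >> 12) & 1;
    r.dpl = (ar >> 13) & 3;
    r.present = (ar >> 15) & 1;
    r.l = (ar >> 21) & 1;
    r.g = (ar >> 23) & 1;
    return r;
}

/* Code segments have type bit 3 set; readable code also has type bit 1 set. */
static bool is_readable_code(struct access_rights r)
{
    return r.s && (r.type & 0x8) && (r.type & 0x2);
}
```

For example, 00A0FB00h describes a present, DPL 3, readable 64-bit code segment with page granularity.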
LSL Instruction. The load segment-limit (LSL) instruction uses a segment-selector in the source
operand to reference a descriptor in the GDT or LDT. LSL performs a set of preliminary access-rights
checks and, if successful, loads the segment-descriptor limit field into the destination register.
Software can use the limit value in comparisons with pointer offsets to prevent segment limit
violations.
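The limit comparison can be sketched as follows. The sketch assumes an expand-up segment and the byte-granular limit that LSL returns. Expand-down segments invert the valid range and are not handled. The check is arranged so that `offset + size` cannot overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when bytes [offset, offset + size - 1] are all within the limit. */
static bool within_limit(uint32_t limit, uint32_t offset, uint32_t size)
{
    if (size == 0 || offset > limit)
        return false;
    return size - 1 <= limit - offset;
}
```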
VERR and VERW Instructions. The verify read-rights (VERR) and verify write-rights (VERW) instructions
can be used to determine if a target code or data segment (not a system segment) can be read or written
from the current privilege level (CPL). The source operand for these instructions is a pointer to the
segment selector to be tested. If the tested segment (code or data) is readable from the current CPL, the
VERR instruction sets RFLAGS.ZF to 1; otherwise, it is cleared to zero. Likewise, if the tested data
segment is writable, the VERW instruction sets the RFLAGS.ZF to 1. A code segment cannot be tested
for writability.
ARPL Instruction. The adjust RPL-field (ARPL) instruction can be used by system software to
prevent access into privileged-data segments by lower-privileged software. This can happen if an
application passes a selector to system software and the selector RPL is less than (has greater privilege
than) the calling-application CPL. To prevent this surrogate access, system software executes ARPL
with the following operands:
• The destination operand is the data-segment selector passed to system software by the application.
• The source operand is the application code-segment selector (available on the system-software
stack as a result of the CALL into system software by the application).
ARPL is not supported in 64-bit mode.
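The RPL adjustment ARPL performs can be modeled in a few lines. If the destination selector's RPL is numerically lower (more privileged) than the source selector's RPL, the destination RPL is raised to match and ZF is set. Otherwise ZF is cleared. This is a behavioral sketch, not the instruction itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the resulting ZF value. */
static bool arpl_model(uint16_t *dest_sel, uint16_t src_sel)
{
    uint16_t dest_rpl = *dest_sel & 3, src_rpl = src_sel & 3;
    if (dest_rpl < src_rpl) {
        *dest_sel = (uint16_t)((*dest_sel & ~3u) | src_rpl);
        return true;     /* ZF = 1: RPL was adjusted */
    }
    return false;        /* ZF = 0: selector unchanged */
}
```

For example, an application that passes selector 10h (RPL 0) from code running with CS=23h gets the selector adjusted to 13h.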
WBINVD Instruction. The writeback and invalidate (WBINVD) instruction is used to write all
modified cache lines to memory so that memory contains the most recent copy of data. After the writes
are complete, the instruction invalidates all cache lines. This instruction operates on all caches in the
memory hierarchy, including caches that are external to the processor.
INVD Instruction. The invalidate (INVD) instruction is used to invalidate all cache lines in all caches
in the memory hierarchy. Unlike the WBINVD instruction, no modified cache lines are written to
memory. The INVD instruction should only be used in situations where memory coherency is not
required.
INVLPG Instruction. The invalidate TLB entry (INVLPG) instruction can be used to invalidate
specific entries within the TLB. The source operand is a virtual-memory address that specifies the
TLB entry to be invalidated. Invalidating a TLB entry does not remove the associated page-table entry
from the data cache. See “Translation-Lookaside Buffer (TLB)” on page 139 for more information.
7 Memory System
This chapter describes:
• Cache coherency mechanisms
• Cache control mechanisms
• Memory typing
• Memory mapped I/O
• Memory ordering rules
• Serializing instructions
Figure 7-1 on page 160 shows a conceptual picture of a processor and memory system, and how data
and instructions flow between the various components. This diagram is not intended to represent a
specific microarchitectural implementation but instead is used to illustrate the major memory-system
components covered by this chapter.
Figure 7-1 (diagram not reproduced): execution units, load/store unit, L1 instruction cache, L1 data cache, L2 cache, write buffers, and write-combining buffers within the processor chip, and main memory outside it.
The memory-system components described in this chapter are shown as unshaded boxes in Figure 7-1.
Those items are summarized in the following paragraphs.
Main memory is external to the processor chip and is the memory-hierarchy level farthest from the
processor execution units.
Caches are the memory-hierarchy levels closest to the processor execution units. They are much
smaller and much faster than main memory, and can be either internal or external to the processor chip.
Caches contain copies of the most frequently used instructions and data. Because caches allow fast access
to frequently used data, software can run much faster than if it had to access that data from main memory.
Figure 7-1 shows three caches, all internal to the processor:
• L1 Data Cache—The L1 (level-1) data cache holds the data most recently read or written by the
software running on the processor.
• L1 Instruction Cache—The L1 instruction cache is similar to the L1 data cache except that it holds
only the instructions executed most frequently. In some processor implementations, the L1
instruction cache can be combined with the L1 data cache to form a unified L1 cache.
• L2 Cache—The L2 (level-2) cache is usually several times larger than the L1 caches, but it is also
slower. It is common for L2 caches to be implemented as a unified cache containing both
instructions and data. Recently used instructions and data that do not fit within the L1 caches can
reside in the L2 cache. The L2 cache can be exclusive, meaning it does not cache information
contained in the L1 cache. Conversely, inclusive L2 caches contain a copy of the L1-cached
information.
Memory-read operations from cacheable memory first check the cache to see if the requested
information is available. A read hit occurs if the information is available in the cache, and a read miss
occurs if the information is not available. Likewise, a write hit occurs if the memory write can be
stored in the cache, and a write miss occurs if it cannot be stored in the cache.
Caches are divided into fixed-size blocks called cache lines. The cache allocates lines to correspond to
regions in memory of the same size as the cache line, aligned on an address boundary equal to the
cache-line size. For example, in a cache with 32-byte lines, the cache lines are aligned on 32-byte
boundaries and byte addresses 0007h and 001Eh are both located in the same cache line. The size of a
cache line is implementation dependent. Most implementations have either 32-byte or 64-byte cache
lines.
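Because lines are aligned on their own size, the line containing an address is found by clearing the low-order offset bits. The sketch below assumes a power-of-two line size. It also counts how many lines an access touches, which shows when a small unaligned access splits across two lines.

```c
#include <stdbool.h>
#include <stdint.h>

/* line_size must be a power of two, e.g. 32 or 64. */
static uint64_t line_base(uint64_t addr, uint64_t line_size)
{
    return addr & ~(line_size - 1);
}

static bool same_line(uint64_t a, uint64_t b, uint64_t line_size)
{
    return line_base(a, line_size) == line_base(b, line_size);
}

/* Lines touched by a len-byte access (len > 0) starting at addr. */
static uint64_t lines_touched(uint64_t addr, uint64_t len, uint64_t line_size)
{
    return (line_base(addr + len - 1, line_size) - line_base(addr, line_size)) / line_size + 1;
}
```

With 32-byte lines, 0007h and 001Eh share line 0000h. An 8-byte access at 001Ch spans two lines.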
The process of loading data into a cache is a cache-line fill. Even if only a single byte is requested, all
bytes in a cache line are loaded from memory. Typically, a cache-line fill must remove (evict) an
existing cache line to make room for the new line loaded from memory. This process is called cache-
line replacement. If the existing cache line was modified before the replacement, the processor
performs a cache-line writeback to main memory when it performs the cache-line fill.
Cache-line writebacks help maintain coherency (consistency) between the caches and main memory.
The processor can also maintain cache coherency by internally probing (checking) the other
caches and write buffers for a more recent version of the requested data. External devices can also
check processor caches for more recent versions of data by externally probing the processor.
Throughout this document, the term probe is used to refer to external probes, while internal probes are
always qualified with the word internal.
Write buffers temporarily hold data writes when main memory or the caches are busy with other
memory accesses. The existence of write buffers is implementation dependent.
Implementations of the architecture can use write-combining buffers if the order and size of non-cacheable
writes to main memory are not important to the operation of software. These buffers can
combine multiple, individual writes to main memory and transfer the data in fewer bus transactions.
read instruction requires the result of the write instruction for proper software operation. For
cacheable memory types, the write data can be forwarded to the read instruction before it is
actually written to memory.
Processor 0 Processor 1
Store A ← 1 Load B
Store B ← 1 Load A
Load A cannot read 0 when Load B reads 1. (This rule may be violated in the case of loads as
part of a string operation, in which one iteration of the string reads 0 for Load A while another
iteration reads 1 for Load B.)
- Stores do not pass loads
Processor 0 Processor 1
Load A Load B
Store B ← 1 Store A ← 1
Load A and Load B cannot both read 1.
• Stores from a processor appear to be committed to the memory system in program order; however,
stores can be delayed arbitrarily by store buffering while the processor continues operation. For the
code example below, both load A in processor 1 and load B in processor 0 can read 1 from the first
store in each processor. Therefore, stores from a processor may not appear to be sequentially
consistent.
Processor 0 Processor 1
Store A ← 1 Store B ← 1
… …
Store A ← 2 Store B ← 2
… …
Load B Load A
Processor 0 Processor 1
Store A ← 1 Store B ← 1
Load B Load A
All combinations of Load A and Load B values are allowed. Where sequential consistency is
needed (for example in Dekker’s algorithm for mutual exclusion), an MFENCE instruction
should be used between the store and the subsequent load, or a locked access, such as LOCK
XCHG, should be used for the store.
Processor 0 Processor 1
Store A ← 1 Store B ← 1
MFENCE MFENCE
Load B Load A
• There can be different points of visibility for a memory operation, including local (within a
processor), non-local (within a subset of processors) and global (across the system). Using a data
bypass, a local load can read the result of a local store in a store buffer, before the store becomes
non-locally visible. Program order is still maintained when using such bypasses.
Processor 0 Processor 1
Store A ← 1 Store B ← 1
Load r1 A Load r3 B
Load r2 B Load r4 A
Load A in processor 0 can read 1 using the data bypass, while Load A in processor 1 can read 0.
Similarly, Load B in processor 1 can read 1 while Load B in processor 0 can read 0. Therefore, the
result r1 = 1, r2 = 0, r3 = 1, and r4 = 0 is allowed. There are no constraints on the relative order of
Store A and Load A in processor 0, or of Store B and Load B in processor 1.
If a very strong memory ordering model is required that does not allow local store-load bypasses,
an MFENCE instruction should be used between the store and the subsequent load or a
synchronizing instruction such as LOCK XCHG should be used for the store. This memory
ordering is stronger than total store ordering.
Processor 0 Processor 1
Store A ← 1 Store B ← 1
MFENCE MFENCE
Load r1 A Load r3 B
Load r2 B Load r4 A
The MFENCE instruction ensures that any buffered stores are globally visible before the loads are
allowed to execute, so the result r1 = 1, r2 = 0, r3 = 1 and r4 = 0 is not allowed. Similarly, a LOCK
XCHG would ensure the loads don't execute until its store operation is globally visible.
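The store-then-load pattern above (Dekker-style) can be checked with a small litmus test. This is a sketch using C11 atomics and POSIX threads. It uses `atomic_thread_fence(memory_order_seq_cst)` as the portable way to request the MFENCE-style barrier; x86 compilers typically emit MFENCE or an equivalent locked instruction for it. With the fences, the outcome where both loads read 0 is forbidden, so the count of forbidden outcomes should stay 0. Removing the fences can make that outcome observable, but only occasionally.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int A, B;
static int r1, r2;

/* Processor 0: Store A <- 1; fence; Load B */
static void *proc0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r1 = atomic_load_explicit(&B, memory_order_relaxed);
    return NULL;
}

/* Processor 1: Store B <- 1; fence; Load A */
static void *proc1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&B, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    r2 = atomic_load_explicit(&A, memory_order_relaxed);
    return NULL;
}

/* Runs the test `trials` times; counts results with r1 == 0 && r2 == 0. */
static int count_forbidden(int trials)
{
    int forbidden = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t t0, t1;
        atomic_store(&A, 0);
        atomic_store(&B, 0);
        pthread_create(&t0, NULL, proc0, NULL);
        pthread_create(&t1, NULL, proc1, NULL);
        pthread_join(t0, NULL);   /* join makes r1/r2 safely readable */
        pthread_join(t1, NULL);
        if (r1 == 0 && r2 == 0)
            forbidden++;
    }
    return forbidden;
}
```

Creating fresh threads per trial keeps the sketch simple, but it makes racing interleavings rare. A serious litmus harness would reuse spinning threads.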
To maintain memory coherency, external bus masters (typically other processors with their own
internal caches) need to acquire the most recent copy of data before caching it internally. That copy can
be in main memory or in the internal caches of other bus-mastering devices. When an external master
has a cache read-miss or write-miss, it probes the other mastering devices to determine whether the
most recent copy of data is held in any of their caches. If one of the other mastering devices holds the
most recent copy, it provides it to the requesting device. Otherwise, the most recent copy is provided
by main memory.
To prevent this problem, software must use an INVLPG or MOV CR3 instruction immediately after
the page-table update to ensure that subsequent instruction fetches and data accesses use the correct
virtual-page-to-physical-page translation. It is not necessary to perform a TLB invalidation operation
preceding the table update.
All writes to WT memory update main memory, and writes that hit in the cache update the cache
line (cache lines remain in the same state after a write that hits a cache line). Writes that miss the
cache do not allocate a cache line. Write buffering of WT memory is allowed.
• Writeback (WB)—Reads from WB memory are cacheable and allocate cache lines on a read miss.
Cache lines can be allocated in the shared, exclusive, or modified states. Reads from WB memory
can be speculative.
All writes that hit in the cache update the cache line and place the cache line in the modified state.
Writes that miss the cache allocate a new cache line and place the cache line in the modified state.
Writes to main memory only take place during writeback operations. Write buffering of WB
memory is allowed.
The WB memory type provides the highest-possible performance and is useful for most software
and data stored in system memory (DRAM).
Table 7-1 shows the memory access ordering possible for each memory type supported by the AMD64
architecture. Table 7-3 on page 173 shows the ordering behavior of various operations on various
memory types in greater detail. Table 7-2 on page 172 shows the caching policy for the same memory
types.
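Figure 7-3 (diagram not reproduced): a cache array of Sets 0 through m-1, each holding one line per way (for example, Line Data 0,2 through Line Data n-1,2 in Set 2). The physical address is split into Tag, Index, and Offset fields, per-way tag comparators (=) produce Hit signals, and an n:1 MUX selects the hit data.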
As shown in Figure 7-3, the cache is organized as an array of cache lines. Each cache line consists of
three parts: a cache-data line (a fixed-size copy of a memory block), a tag, and other information. Rows
of cache lines in the cache array are sets, and columns of cache lines are ways. In an n-way set-
associative cache, each set is a collection of n lines. For example, in a four-way set-associative cache,
each set is a collection of four cache lines, one from each way.
The cache is accessed using the physical address of the data or instruction being referenced. To access
data within a cache line, the physical address is used to select the set, way, and byte from the cache.
This is accomplished by dividing the physical address into the following three fields:
• Index—The index field selects the cache set (row) to be examined for a hit. All cache lines within
the set (one from each way) are selected by the index field.
• Tag—The tag field is used to select a specific cache line from the cache set. The physical-address
tag field is compared with each cache-line tag in the set. If a match is found, a cache hit is
signalled, and the appropriate cache line is selected from the set. If a match is not found, a cache
miss is signalled.
• Offset—The offset field points to the first byte in the cache line corresponding to the memory
reference. The referenced data or instruction value is read from (or written to, in the case of
memory writes) the selected cache line starting at the location selected by the offset field.
In Figure 7-3 on page 177, the physical-address index field is shown selecting Set 2 from the cache.
The tag entry for each cache line in the set is compared with the physical-address tag field. The tag
entry for Way 1 matches the physical-address tag field, so the cache-line data for Set 2, Way 1 is
selected using the n:1 multiplexor. Finally, the physical-address offset field is used to point to the first
byte of the referenced data (or instruction) in the selected cache line.
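The address split and set lookup described above can be sketched in C. The geometry is an assumption chosen only for illustration: 64-byte lines (6 offset bits), 512 sets (9 index bits), and 4 ways. Real caches differ by implementation and also keep MOESI and replacement state, which this model omits.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BITS 6    /* 64-byte lines (assumed) */
#define SET_BITS  9    /* 512 sets (assumed) */
#define WAYS      4    /* 4-way set-associative (assumed) */

struct cache_line { bool valid; uint64_t tag; };
static struct cache_line cache[1u << SET_BITS][WAYS];

static uint64_t addr_offset(uint64_t pa) { return pa & ((1u << LINE_BITS) - 1); }
static uint64_t addr_index(uint64_t pa)  { return (pa >> LINE_BITS) & ((1u << SET_BITS) - 1); }
static uint64_t addr_tag(uint64_t pa)    { return pa >> (LINE_BITS + SET_BITS); }

/* Index selects the set; the tag is compared with every way in the set.
   Returns the hitting way, or -1 on a miss. */
static int lookup(uint64_t pa)
{
    struct cache_line *set = cache[addr_index(pa)];
    for (int way = 0; way < WAYS; way++)
        if (set[way].valid && set[way].tag == addr_tag(pa))
            return way;
    return -1;
}
```

For physical address 12345h, this geometry gives offset 5, set 8Dh, and tag 2. Any address in the same 64-byte line hits the same way.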
Cache lines can contain other information in addition to the data and tags, as shown in Figure 7-3 on
page 177. MOESI state and the state bits associated with the cache-replacement algorithm are typical
pieces of information kept with the cache line. Instruction caches can also contain pre-decode or
branch-prediction information. The type of information stored with the cache line is implementation
dependent.
Self-Modifying Code. Software that writes into the code segment from which it was fetched is
classified as self-modifying code. To avoid cache-coherency problems due to self-modifying code, a
check is made during data writes to see whether the data-memory location corresponds to a code-
segment memory location. If it does, implementations of the AMD64 architecture invalidate the
corresponding instruction-cache line(s) during the data-memory write. Entries in the data cache are
not invalidated, and it is possible for the modified instruction to be cached by the data cache following
the memory write. A subsequent fetch of the modified instruction goes to main memory to get the
coherent version of the instruction. If the data cache holds the most recent copy of the instruction
rather than main memory, it provides that copy.
The processor determines whether a write is in a code segment by internally probing the instruction
cache and prefetched instructions. If the internal probe returns a hit, the instruction-cache line and
prefetched instructions are invalidated. The internal probes into the instruction cache and prefetch
hardware are always performed using the physical address of an instruction in order to avoid potential
aliasing problems associated with using virtual (linear) addresses.
Software that stores into a code segment that is executing simultaneously on another processor, with the intent
that the other processor execute the written data as code, is classified as cross-modifying code. To avoid
cache-coherency issues when using cross-modifying code, the processor doing the store should
synchronize with the other processors using locked semaphores. This synchronization is not required when
the modified code resides entirely within a single naturally aligned quadword.
Cache Disable. Bit 30 of the CR0 register is the cache-disable bit, CR0.CD. Caching is enabled
when CR0.CD is cleared to 0, and caching is disabled when CR0.CD is set to 1. When caching is
disabled, reads and writes access main memory.
Software can disable the cache while the cache still holds valid data (or instructions). If a read or write
hits the L1 data cache or the L2 cache when CR0.CD=1, the processor does the following:
1. Writes the cache line back if it is in the modified or owned state.
2. Invalidates the cache line.
3. Performs a non-cacheable main-memory access to read or write the data.
If an instruction fetch hits the L1 instruction cache when CR0.CD=1, the processor reads the cached
instructions rather than accessing main memory.
The processor also responds to cache probes when CR0.CD=1. Probes that hit the cache cause the
processor to perform Step 1. Step 2 (cache-line invalidation) is performed only if the probe is
performed on behalf of a memory write or an exclusive read.
Writethrough Disable. Bit 29 of the CR0 register is the not-writethrough bit, CR0.NW. In
early x86 processors, CR0.NW was used to control cache writethrough behavior, and the combination of
CR0.NW and CR0.CD determined the cache operating mode.
In early x86 processors, clearing CR0.NW to 0 enables writeback caching for main memory,
effectively disabling writethrough caching for main memory. When CR0.NW=0, software can disable
writeback caching for specific memory pages or regions by using other cache control mechanisms.
When software sets CR0.NW to 1, writeback caching is disabled for main memory, while
writethrough caching is enabled.
In implementations of the AMD64 architecture, CR0.NW is not used to qualify the cache operating
mode established by CR0.CD. Table 7-4 shows the effects of CR0.NW and CR0.CD on the AMD64
architecture cache-operating modes.
Page-Level Cache Disable. Bit 4 of all paging data-structure entries controls page-level cache
disable (PCD). When a data-structure-entry PCD bit is cleared to 0, the page table or physical page
pointed to by that entry is cacheable, as determined by the CR0.CD bit. When the PCD bit is set to 1,
the page table or physical page is not cacheable. The PCD bit in the paging data-structure base-register
(bit 4 in CR3) controls the cacheability of the highest-level page table in the page-translation hierarchy.
Page-Level Writethrough Enable. Bit 3 of all paging data-structure entries is the page-level
writethrough enable control (PWT). When a data-structure-entry PWT bit is cleared to 0, the page
table or physical page pointed to by that entry has a writeback caching policy. When the PWT bit is set
to 1, the page table or physical page has a writethrough caching policy. The PWT bit in the paging
data-structure base-register (bit 3 in CR3) controls the caching policy of the highest-level page table in
the page-translation hierarchy.
The corresponding PCD bit must be cleared to 0 (page caching enabled) for the PWT bit to have an
effect.
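The page-level rules can be sketched as a decode of bits 3 (PWT) and 4 (PCD) of a paging-structure entry. The sketch assumes CR0.CD=0 and ignores the MTRRs and PAT, which can further restrict the result.

```c
#include <stdint.h>

enum page_policy { PAGE_WB, PAGE_WT, PAGE_UC };

#define PTE_PWT (1ull << 3)   /* page-level writethrough */
#define PTE_PCD (1ull << 4)   /* page-level cache disable */

/* PCD=1 makes the page uncacheable regardless of PWT;
   with PCD=0, PWT selects writethrough (1) or writeback (0). */
static enum page_policy page_policy(uint64_t entry)
{
    if (entry & PTE_PCD)
        return PAGE_UC;
    return (entry & PTE_PWT) ? PAGE_WT : PAGE_WB;
}
```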
Memory Typing. Two mechanisms are provided for software to control access to and cacheability of
specific memory regions:
• The memory-type range registers (MTRRs) control cacheability based on physical addresses. See
“MTRRs” on page 184 for more information on the use of MTRRs.
• The page-attribute table (PAT) mechanism controls cacheability based on virtual addresses. PAT
extends the capabilities provided by the PCD and PWT page-level cache controls. See “Page-
Attribute Table Mechanism” on page 193 for more information on the use of the PAT mechanism.
System software can combine the use of both the MTRRs and PAT mechanisms to maximize control
over memory cacheability.
If the MTRRs are disabled in implementations that support the MTRR mechanism, the default
memory type is set to uncacheable (UC). Memory accesses are not cached even if the caches are
enabled by clearing CR0.CD to 0. Cacheable memory types must be established using the MTRRs in
order for memory accesses to be cached.
Cache Control Precedence. The cache-control mechanisms are used to define the memory type and
cacheability of main memory and regions of main memory. Taken together, the most restrictive
memory type takes precedence in defining the caching policy of memory. The order of precedence is:
1. Uncacheable (UC)
2. Write-combining (WC)
3. Write-protected (WP)
4. Writethrough (WT)
5. Writeback (WB)
For example, assume a large memory region is designated a writethrough type using the MTRRs.
Individual pages within that region can have caching disabled by setting the appropriate page-table
PCD bits. However, no pages within that region can have a writeback caching policy, regardless of the
page-table PWT values.
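The precedence list can be applied as "keep the more restrictive type." This sketch orders the enum from most to least restrictive, as in the list above. It covers only that list; full MTRR and PAT combinations have additional rules described with those mechanisms.

```c
/* Ordered from most restrictive (UC) to least restrictive (WB). */
enum mem_type { MT_UC, MT_WC, MT_WP, MT_WT, MT_WB };

/* Combine two type sources, e.g. an MTRR type and a page-level type. */
static enum mem_type combine(enum mem_type a, enum mem_type b)
{
    return a < b ? a : b;
}
```

In the example above, a WT region combined with a page that requests WB stays WT, while a page with PCD=1 (UC) becomes UC.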
Data Prefetch. The prefetch instructions are used by software as a hint to the processor that the
referenced data is likely to be used in the near future. The processor can preload the cache line
containing the data in anticipation of its use. PREFETCH provides a hint that the data is to be read.
PREFETCHW provides a hint that the data is to be written. The processor can mark the line as
modified if it is preloaded using PREFETCHW.
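From C, prefetch hints are commonly issued through the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. Here rw=0 hints a read and rw=1 hints a write. Whether the compiler emits PREFETCH, PREFETCHW, or nothing depends on the target options. The prefetch distance of 16 elements is an arbitrary tuning assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_DISTANCE 16   /* elements ahead (assumed tuning value) */

/* Sums src and copies it to dst, hinting upcoming reads and writes. */
static int64_t sum_and_copy(const int32_t *src, int32_t *dst, size_t n)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {           /* stay inside the arrays */
            __builtin_prefetch(&src[i + PREFETCH_DISTANCE], 0, 3);  /* read */
            __builtin_prefetch(&dst[i + PREFETCH_DISTANCE], 1, 3);  /* write */
        }
        sum += src[i];
        dst[i] = src[i];
    }
    return sum;
}
```

The hints never change the result. If the processor ignores them, the loop still computes the same sum and copy.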
Memory Ordering. Instructions are provided for software to enforce memory ordering (serialization)
in weakly-ordered memory types. These instructions are:
• SFENCE (store fence)—forces all memory writes (stores) preceding the SFENCE (in program
order) to be written into memory before memory writes following the SFENCE.
• LFENCE (load fence)—forces all memory reads (loads) preceding the LFENCE (in program
order) to be read from memory before memory reads following the LFENCE.
• MFENCE (memory fence)—forces all memory accesses (reads and writes) preceding the
MFENCE (in program order) to be written into or read from memory before memory accesses
following the MFENCE.
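A typical SFENCE use is ordering weakly-ordered non-temporal stores before a "data ready" flag. The sketch below assumes an SSE2 target (`_mm_stream_si32` emits MOVNTI; `_mm_sfence` emits SFENCE). On other targets it falls back to ordinary stores and a C11 release fence. The function and flag names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32, _mm_sfence */
#endif

/* Fill buf with value, then set *ready only after the buffer writes. */
static void fill_then_publish(int *buf, size_t n, int value, atomic_int *ready)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(buf + i, value);   /* weakly-ordered store */
    _mm_sfence();                          /* earlier stores complete first */
#else
    for (size_t i = 0; i < n; i++)
        buf[i] = value;
    atomic_thread_fence(memory_order_release);
#endif
    atomic_store_explicit(ready, 1, memory_order_release);
}
```

A consumer that reads the flag with acquire ordering and sees 1 can then read the buffer.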
Cache Line Flush. The CLFLUSH instruction (writeback, if modified, and invalidate) takes the byte
memory-address operand (a linear address), and checks to see if the address is cached. If the address is
cached, the entire cache line containing the address is invalidated. If any portion of the cache line is
dirty (in the modified or owned state), the entire line is written to main memory before it is invalidated.
CLFLUSH affects all caches in the memory hierarchy—internal and external to the processor. The
checking and invalidation process continues until the address has been invalidated in all caches.
In most cases, the underlying memory type assigned to the address has no effect on the behavior of this
instruction. However, when the underlying memory type for the address is UC or WC (as defined by
the MTRRs), the processor does not proceed with checking all caches to see if the address is cached. In
both cases, the address is uncacheable, and invalidation is unnecessary. Write-combining buffers are
written back to memory if the corresponding physical address falls within the buffer active-address
range.
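The following hedged C sketch flushes an arbitrary buffer with CLFLUSH using the `_mm_clflush` intrinsic. It assumes a 64-byte cache line; real code could read the flush size from CPUID instead of hard-coding it. It ends with MFENCE (`_mm_mfence`) so that later memory accesses are ordered after the flushes. `flush_range` is a hypothetical helper name.

```c
#include <emmintrin.h>  /* _mm_clflush, _mm_mfence (SSE2) */
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; real code should query it rather than hard-code it. */
#define LINE_SIZE 64

/* Write back (if dirty) and invalidate every cache line overlapping [p, p + len). */
void flush_range(const void *p, size_t len)
{
    if (len == 0)
        return;
    uintptr_t start = (uintptr_t)p & ~(uintptr_t)(LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)p + len;
    for (uintptr_t a = start; a < end; a += LINE_SIZE)
        _mm_clflush((const void *)a);
    _mm_mfence();   /* order the flushes with respect to later memory accesses */
}
```

The data itself is unchanged. Only its cached copies are written back and discarded.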
Cache Writeback and Invalidate. Unlike the CLFLUSH instruction, the WBINVD instruction
operates on the entire cache, rather than a single cache line. The WBINVD instruction first writes back
all cache lines that are dirty (in the modified or owned state) to main memory. After writeback is
complete, the instruction invalidates all cache lines. The checking and invalidation process continues
until all internal caches are invalidated. A special bus cycle is transmitted to higher-level external
caches directing them to perform a writeback-and-invalidate operation.
Cache Invalidate. The INVD instruction is used to invalidate all cache lines. Unlike the WBINVD
instruction, dirty cache lines are not written to main memory. The process continues until all internal
caches have been invalidated. A special bus cycle is transmitted to higher-level external caches
directing them to perform an invalidation.
The INVD instruction should only be used in situations where memory coherency is not required.
If the MTRRs are disabled in implementations that support the MTRR mechanism, the default
memory type is set to uncacheable (UC). Memory accesses are not cached even if the caches are
enabled by clearing CR0.CD to 0. Cacheable memory types must be established using the MTRRs to
enable memory accesses to be cached.
7.7.2 MTRRs
Both fixed-size and variable-size address ranges are supported by the MTRR mechanism. The fixed-
size ranges are restricted to the lower 1 Mbyte of physical-address space, while the variable-size
ranges can be located anywhere in the physical-address space.
Figure 7-4 on page 185 shows an example mapping of physical memory using the fixed-size and
variable-size MTRRs. The areas shaded gray are not mapped by the MTRRs. Unmapped areas are set
to the software-selected default memory type.
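[Figure 7-4: Example mapping of physical memory by the MTRRs. In the example, up to 8 variable ranges map memory from 10_0000h up to 0_FFFF_FFFF_FFFFh. The fixed ranges cover memory through 0F_FFFFh, including 64 4-Kbyte ranges totaling 256 Kbytes.]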
MTRRs are 64-bit model-specific registers (MSRs). They are read using the RDMSR instruction and
written using the WRMSR instruction. See “Memory-Typing MSRs” on page 464 for a listing of the
MTRR MSR numbers. The following sections describe the types of MTRRs and their function.
Fixed-Range MTRRs. The fixed-range MTRRs are used to characterize the first 1 Mbyte of physical
memory. Each fixed-range MTRR contains eight type fields for characterizing a total of eight memory
ranges. Fixed-range MTRRs support extended type-field encodings as described in “Extended Fixed-
Range MTRR Type-Field Encodings” on page 197. The extended type field allows a fixed-range
MTRR to be used as a fixed-range IORR. Figure 7-5 on page 186 shows the format of a fixed-range
MTRR.
[Figure 7-5: Fixed-range MTRR format. Eight 8-bit type fields occupy bits 63–56, 55–48, 47–40, 39–32, 31–24, 23–16, 15–8, and 7–0.]
For the purposes of memory characterization, the first 1 Mbyte of physical memory is segmented into
a total of 88 non-overlapping memory ranges, as follows:
• The 512 Kbytes of memory spanning addresses 00_0000h to 07_FFFFh are segmented into eight
64-Kbyte ranges. A single MTRR is used to characterize this address space.
• The 256 Kbytes of memory spanning addresses 08_0000h to 0B_FFFFh are segmented into 16 16-
Kbyte ranges. Two MTRRs are used to characterize this address space.
• The 256 Kbytes of memory spanning addresses 0C_0000h to 0F_FFFFh are segmented into 64 4-
Kbyte ranges. Eight MTRRs are used to characterize this address space.
Table 7-6 shows the address ranges corresponding to the type fields within each fixed-range MTRR.
The gray-shaded heading boxes represent the bit ranges for each type field in a fixed-range MTRR. See
Table 7-5 on page 184 for the type-field encodings.
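This segmentation of the first 1 Mbyte can be expressed as a lookup from a physical address to a fixed-range MTRR and a type field within it. In this C sketch the eleven registers are numbered 0 through 10 in address order. That numbering is a hypothetical shorthand; the architectural MSR names are listed in “Memory-Typing MSRs”. Type field n is assumed to occupy bits 8n+7 to 8n, with field 0 covering the lowest addresses, which is our reading of Table 7-6.

```c
#include <stdint.h>

/* Hypothetical numbering: MTRR 0 covers 00000h-7FFFFh (64-Kbyte fields),
 * MTRRs 1-2 cover 80000h-BFFFFh (16-Kbyte fields),
 * MTRRs 3-10 cover C0000h-FFFFFh (4-Kbyte fields). */

/* Returns the fixed-range MTRR number (0-10) for addr, or -1 at or above 1 Mbyte. */
int fixed_mtrr_index(uint32_t addr)
{
    if (addr < 0x80000)  return 0;
    if (addr < 0xC0000)  return 1 + (int)((addr - 0x80000) >> 14) / 8;
    if (addr < 0x100000) return 3 + (int)((addr - 0xC0000) >> 12) / 8;
    return -1;
}

/* Returns the type-field number (0-7) within that MTRR, or -1 at or above 1 Mbyte. */
int fixed_mtrr_field(uint32_t addr)
{
    if (addr < 0x80000)  return (int)(addr >> 16);
    if (addr < 0xC0000)  return (int)((addr - 0x80000) >> 14) % 8;
    if (addr < 0x100000) return (int)((addr - 0xC0000) >> 12) % 8;
    return -1;
}
```

Counting fields this way reproduces the totals above: 8 + 16 + 64 = 88 ranges across 1 + 2 + 8 = 11 registers.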
Variable-Range MTRRs. The variable-range MTRRs can be used to characterize any address range
within the physical-memory space, including all of physical memory. Up to eight address ranges of
varying sizes can be characterized using the variable-range MTRRs. Two variable-range MTRRs are used to
characterize each address range: MTRRphysBasen and MTRRphysMaskn (n is the address-range
number from 0 to 7). For example, address-range 3 is characterized using the MTRRphysBase3 and
MTRRphysMask3 register pair.
Figure 7-6 shows the format of the MTRRphysBasen register and Figure 7-7 on page 188 shows the
format of the MTRRphysMaskn register. The fields within the register pair are read/write.
MTRRphysBasen Registers. The fields in these variable-range MTRRs, shown in Figure 7-6, are:
• Type—Bits 7–0. The memory type used to characterize the memory range. See Table 7-5 on
page 184 for the type-field encodings. Variable-range MTRRs do not support the extended type-
field encodings.
• Range Physical Base-Address (PhysBase)—Bits 51–12. The memory-range base-address in
physical-address space. PhysBase is aligned on a 4-Kbyte (or greater) address in the 52-bit
physical-address space supported by the AMD64 architecture. PhysBase represents the 40 most-significant
bits of the physical address (bits 51–12). Physical-address bits 11–0 are assumed to be 0.
[Figure 7-6: MTRRphysBasen register format. Bits 63–52: reserved, MBZ. Bits 51–12: PhysBase (this is an architectural limit; a given implementation may support fewer bits). Bits 11–8: reserved, MBZ. Bits 7–0: Type.]
MTRRphysMaskn Registers. The fields in these variable-range MTRRs, shown in Figure 7-7, are:
• Valid (V)—Bit 11. Indicates that the MTRR pair is valid (enabled) when set to 1. When the valid bit
is cleared to 0 the register pair is not used.
• Range Physical Mask (PhysMask)—Bits 51–12. The mask value used to specify the memory
range. Like PhysBase, PhysMask is aligned on a 4-Kbyte physical-address boundary. Bits 11–0 of
PhysMask are assumed to be 0.
[Figure 7-7: MTRRphysMaskn register format. Bits 63–52: reserved, MBZ. Bits 51–12: PhysMask (this is an architectural limit; a given implementation may support fewer bits). Bit 11: V. Bits 10–0: reserved, MBZ.]
PhysMask and PhysBase are used together to determine whether a target physical-address falls within
the specified address range. PhysMask is logically ANDed with PhysBase and separately ANDed with
the upper 40 bits of the target physical-address. If the results of the two operations are identical, the
target physical-address falls within the specified memory range. The pseudo-code for the operation is:
MaskBase = PhysMask AND PhysBase
MaskTarget = PhysMask AND Target_Address[51:12]
if MaskBase = MaskTarget
then Target_Address_In_Range
else Target_Address_Not_In_Range
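The range test can be written directly in C. In this sketch, `phys_base` and `phys_mask` are the 40-bit field values, already shifted right by 12, matching the PhysBase and PhysMask fields. Checking the Valid bit is left to the caller. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIELD40 0xFFFFFFFFFFULL   /* PhysBase/PhysMask are 40-bit fields (address bits 51-12) */

/* Returns true if target falls within the range described by the field values. */
bool mtrr_var_match(uint64_t phys_base, uint64_t phys_mask, uint64_t target)
{
    uint64_t mask_base   = phys_mask & phys_base;
    uint64_t mask_target = phys_mask & ((target >> 12) & FIELD40);  /* Target_Address[51:12] */
    return mask_base == mask_target;
}
```

For example, with PhysBase = 2000h and PhysMask = FF_FFFF_E000h, which describe the 32-Mbyte range at 200_0000h worked through in this section, every target from 200_0000h through 3FF_FFFFh matches and 400_0000h does not.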
Variable Range Size and Alignment. The size and alignment of variable memory-ranges (MTRRs)
and I/O ranges (IORRs) are restricted as follows:
• The boundary on which a variable range is aligned must be equal to the range size. For example, a
memory range of 16 Mbytes must be aligned on a 16-Mbyte boundary.
• The range size must be a power of 2 (2^n, where 11 < n < 52), with a minimum allowable size of 4 Kbytes.
For example, 4 Mbytes and 8 Mbytes are allowable memory range sizes, but 6 Mbytes is not
allowable.
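Both restrictions reduce to simple bit tests. The C sketch below assumes sizes and base addresses are given in bytes and uses the usual `size & (size - 1)` test for a power of 2. `var_range_ok` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checks the variable-range restrictions: size is a power of 2 from 4 Kbytes
 * (2^12) through 2^51, and base is aligned on a multiple of size. */
bool var_range_ok(uint64_t base, uint64_t size)
{
    if (size < 0x1000 || size > (1ULL << 51))
        return false;
    if ((size & (size - 1)) != 0)        /* not a power of 2 */
        return false;
    return (base & (size - 1)) == 0;     /* aligned on a size boundary */
}
```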
PhysMask and PhysBase Values. Software can calculate the PhysMask value using the following
procedure:
1. Subtract the memory-range physical base-address from the upper physical-address of the memory
range.
2. Subtract the value calculated in Step 1 from the highest physical address (F_FFFF_FFFF_FFFFh in the 52-bit physical-address space).
3. Truncate the lower 12 bits of the result in Step 2 to create the PhysMask value to be loaded into
the MTRRphysMaskn register. Truncation is performed by right-shifting the value 12 bits.
For example, assume a 32-Mbyte memory range is specified within the 52-bit physical address space,
starting at address 200_0000h. The upper address of the range is 3FF_FFFFh. Following the process
outlined above yields:
1. 3FF_FFFFh–200_0000h = 1FF_FFFFh
2. F_FFFF_FFFF_FFFFh–1FF_FFFFh = F_FFFF_FE00_0000h
3. Right shift (F_FFFF_FE00_0000h) by 12 = FF_FFFF_E000h
In this example, the 40-bit value loaded into the PhysMask field is FF_FFFF_E000h.
Software must also truncate the lower 12 bits of the physical base-address before loading it into the
PhysBase field. In the example above, the 40-bit PhysBase field is 00_0000_2000h.
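The three-step procedure and the PhysBase truncation might look as follows in C. This sketch takes the Step 2 subtrahend to be the highest address of the 52-bit space (F_FFFF_FFFF_FFFFh), as in the worked example. It also shows how the fields could be placed into 64-bit register images using the bit positions given above: Type in bits 7–0, V in bit 11, and the fields in bits 51–12. All function names are hypothetical, and actually writing the MSRs requires WRMSR at CPL 0.

```c
#include <stdint.h>

#define PHYS_ADDR_MAX   ((1ULL << 52) - 1)   /* F_FFFF_FFFF_FFFFh */
#define MTRR_MASK_VALID (1ULL << 11)         /* V bit in MTRRphysMaskn */

/* Steps 1-3: compute the 40-bit PhysMask field for the range [base, upper]. */
uint64_t mtrr_phys_mask(uint64_t base, uint64_t upper)
{
    uint64_t span = upper - base;            /* Step 1 */
    uint64_t diff = PHYS_ADDR_MAX - span;    /* Step 2 */
    return diff >> 12;                       /* Step 3: drop bits 11-0 */
}

/* Truncate the low 12 bits of the base address to get the 40-bit PhysBase field. */
uint64_t mtrr_phys_base(uint64_t base)
{
    return base >> 12;
}

/* Assemble register images, placing each field back at bits 51-12. */
uint64_t mtrr_base_image(uint64_t base, uint8_t type)
{
    return (mtrr_phys_base(base) << 12) | type;
}

uint64_t mtrr_mask_image(uint64_t base, uint64_t upper)
{
    return (mtrr_phys_mask(base, upper) << 12) | MTRR_MASK_VALID;
}
```

With the example range, `mtrr_phys_mask(0x2000000, 0x3FFFFFF)` yields FF_FFFF_E000h and `mtrr_phys_base(0x2000000)` yields 2000h, matching the values above.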
Default-Range MTRRs. Physical addresses that are not within ranges established by fixed-range and
variable-range MTRRs are set to a default memory-type using the MTRRdefType register. The format
of this register is shown in Figure 7-8.
[Figure 7-8: MTRRdefType register format. Bits 63–12: reserved, MBZ. Bit 11: E. Bit 10: FE. Bits 9–8: reserved, MBZ. Bits 7–0: Type.]
The fields within the MTRRdefType register are read/write. These fields are:
• Type—Bits 7–0. The default memory-type used to characterize physical-memory space. See
Table 7-5 on page 184 for the type-field encodings. The extended type-field encodings are not
supported by this register.
• Fixed-Range Enable (FE)—Bit 10. All fixed-range MTRRs are enabled when FE is set to 1.
Clearing FE to 0 disables all fixed-range MTRRs. Setting and clearing FE has no effect on the
variable-range MTRRs. The FE bit has no effect unless the E bit is set to 1 (see below).
• MTRR Enable (E)—Bit 11. This is the MTRR enable bit. All fixed-range and variable-range
MTRRs are enabled when E is set to 1. Clearing E to 0 disables all fixed-range and variable-range
MTRRs and sets the default memory-type to uncacheable (UC) regardless of the value of the Type
field.
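Decoding MTRRdefType in software follows the rules above. The Type field counts only when E is set, and the fixed-range MTRRs are in effect only when both E and FE are set. The following C sketch uses hypothetical names, takes the raw register value as read by RDMSR, and uses the UC encoding (00h) stated in this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define MTRR_DEF_E        (1ULL << 11)   /* MTRR Enable */
#define MTRR_DEF_FE       (1ULL << 10)   /* Fixed-Range Enable */
#define MTRR_TYPE_UC_ENC  0x00           /* UC type encoding */

/* Effective default memory type: UC whenever E is clear, otherwise the Type field. */
uint8_t mtrr_default_type(uint64_t def_type)
{
    if (!(def_type & MTRR_DEF_E))
        return MTRR_TYPE_UC_ENC;
    return (uint8_t)(def_type & 0xFF);
}

/* Fixed-range MTRRs take effect only when both E and FE are set. */
bool mtrr_fixed_active(uint64_t def_type)
{
    return (def_type & MTRR_DEF_E) && (def_type & MTRR_DEF_FE);
}
```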
Identifying MTRR Features. Software determines whether a processor supports the MTRR
mechanism by executing the CPUID instruction with either function 1 or function 8000_0001h. If
MTRRs are supported, bit 12 in the EDX register is set to 1 by CPUID. See “Processor Feature
Identification” on page 61 for more information on the CPUID instruction.
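Here is a minimal sketch of the feature check, assuming GCC or Clang on x86, whose `<cpuid.h>` header provides `__get_cpuid`. It queries function 1 only. Checking function 8000_0001h instead would follow the same pattern.

```c
#include <cpuid.h>    /* GCC/Clang-specific x86 helper header */
#include <stdbool.h>

/* Returns true if CPUID function 1 reports MTRR support in EDX bit 12. */
bool cpu_has_mtrr(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* function 1 not available */
    return (edx >> 12) & 1u;
}
```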
The MTRR capability register (MTRRcap) is a read-only register containing information describing
the level of MTRR support provided by the processor. Figure 7-9 shows the format of this register. If
MTRRs are supported, software can read MTRRcap using the RDMSR instruction. Attempting to
write to the MTRRcap register causes a general-protection exception (#GP).
[Figure 7-9: MTRRcap register format. Bits 63–11: reserved. Bit 10: WC. Bit 9: reserved. Bit 8: FIX. Bits 7–0: VCNT.]
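MTRRcap is read-only, and reading it requires RDMSR at CPL 0, so the C sketch below only decodes a raw value that system software has already read. The field meanings in the comments are our reading of the field names and should be checked against the full MTRRcap description: VCNT as the count of variable-range register pairs, FIX as a flag for fixed-range MTRR support, and WC as a flag for support of the WC type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions follow the MTRRcap layout: VCNT bits 7-0, FIX bit 8, WC bit 10. */
unsigned mtrrcap_vcnt(uint64_t cap) { return (unsigned)(cap & 0xFF); } /* variable-range count */
bool     mtrrcap_fix(uint64_t cap)  { return (cap >> 8) & 1u; }        /* fixed ranges supported */
bool     mtrrcap_wc(uint64_t cap)   { return (cap >> 10) & 1u; }       /* WC type supported */
```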
Large Page Sizes. When paging is enabled, software can use large page sizes (2 Mbytes and
4 Mbytes) in addition to the more typical 4-Kbyte page size. When large page sizes are used, it is
possible for multiple MTRRs to span the memory range within a single large page. Each MTRR can
characterize the regions within the page with different memory types. If this occurs, the effective
memory-type used by the processor within the large page is undefined.
Software can avoid the undefined behavior in one of the following ways:
• Avoid using multiple MTRRs to characterize a single large page.
• Use multiple 4-Kbyte pages rather than a single large page.
• If multiple MTRRs must be used within a single large page, software can set the MTRR type fields
to the same value.
• If multiple MTRRs must have different type-field values, software can set the large-page PCD
and PWT bits to the most restrictive memory type defined by those MTRRs.
Overlapping MTRR Registers. If the address ranges of two or more MTRRs overlap, the following
rules are applied to determine the memory type used to characterize the overlapping address range:
1. Fixed-range MTRRs, which characterize only the first 1 Mbyte of physical memory, have
precedence over variable-range MTRRs.
2. If two or more variable-range MTRRs overlap, the following rules apply:
a. If the memory types are identical, then that memory type is used.
b. If at least one of the memory types is UC, the UC memory type is used.
c. If at least one of the memory types is WT, and the only other memory type is WB, then the
WT memory type is used.
d. If the combination of memory types is not listed in steps a through c immediately above, then
the memory type used is undefined.
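Rule 2 can be sketched as a C function over the types of all overlapping variable-range MTRRs. The numeric encodings below (UC 00h, WC 01h, WT 04h, WP 05h, WB 06h) are assumed to match Table 7-5. Rule 1, fixed-range precedence, is left to the caller, and `MT_UNDEFINED` is a hypothetical sentinel for the undefined case.

```c
#include <stdbool.h>

/* Memory-type encodings, assumed to match Table 7-5. */
enum { MT_UC = 0x00, MT_WC = 0x01, MT_WT = 0x04, MT_WP = 0x05, MT_WB = 0x06 };
#define MT_UNDEFINED (-1)   /* hypothetical sentinel: effective type is undefined */

/* Applies rules 2a-2d to the types of n >= 1 overlapping variable-range MTRRs. */
int resolve_var_overlap(const int *types, int n)
{
    bool all_same = true, any_uc = false, any_wt = false, only_wt_wb = true;
    for (int i = 0; i < n; i++) {
        if (types[i] != types[0]) all_same = false;
        if (types[i] == MT_UC) any_uc = true;
        if (types[i] == MT_WT) any_wt = true;
        else if (types[i] != MT_WB) only_wt_wb = false;
    }
    if (all_same) return types[0];             /* rule a */
    if (any_uc) return MT_UC;                  /* rule b */
    if (any_wt && only_wt_wb) return MT_WT;    /* rule c */
    return MT_UNDEFINED;                       /* rule d */
}
```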
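[Figure: PAT register format. Eight 3-bit fields: PA7 (bits 58–56), PA6 (bits 50–48), PA5 (bits 42–40), PA4 (bits 34–32), PA3 (bits 26–24), PA2 (bits 18–16), PA1 (bits 10–8), and PA0 (bits 2–0). Bits 63–59, 55–51, 47–43, 39–35, 31–27, 23–19, 15–11, and 7–3 are reserved.]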
The PAT register contains eight page-attribute (PA) fields, numbered from PA0 to PA7. The PA fields
hold the encoding of a memory type, as found in Table 7-8 on page 194. The PAT type-encodings
match the MTRR type-encodings, with the exception that PAT adds the 07h encoding. The 07h
encoding corresponds to a UC- type. The UC- type (07h) is identical to the UC type (00h) except it can
be overridden by an MTRR type of WC.
Software can write any supported memory-type encoding into any of the eight PA fields. An attempt to
write anything but zeros into the reserved fields causes a general-protection exception (#GP). An
attempt to write an unsupported type encoding into a PA field also causes a #GP exception.
The PAT register fields are initialized at processor reset to the default values shown in Table 7-9 on
page 195.
Page-Translation Table Access. The PAT bit exists only in the PTE (4-Kbyte paging) or in the PDE
(2-Mbyte or 4-Mbyte paging). In the remaining upper levels (PML4E, PDPE, and the PDE used with
4-Kbyte paging), only the PWT and PCD bits are used, and they index into the first four entries of the
PAT register. The resulting memory type applies to accesses of the next-lower-level paging table.
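For a 4-Kbyte PTE or a large-page PDE, the PAT, PCD, and PWT bits together select one of the eight PA fields. The C sketch below assumes the index is formed as 4*PAT + 2*PCD + PWT, the ordering we understand the manual's PAT-index table to use (verify against it). It also assumes PAn occupies bits 8n+2 to 8n, per the register layout. For upper-level entries the PAT bit is simply passed as 0, which restricts the lookup to the first four entries. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: PAT index from the PAT, PCD, and PWT bits of a paging entry.
 * For upper-level entries, pass pat_bit = 0 so only PA0-PA3 can be selected. */
unsigned pat_index(unsigned pat_bit, unsigned pcd, unsigned pwt)
{
    return ((pat_bit & 1u) << 2) | ((pcd & 1u) << 1) | (pwt & 1u);
}

/* Extract PAn (bits 8n+2..8n) from a 64-bit PAT register image. */
uint8_t pat_entry(uint64_t pat_msr, unsigned n)
{
    return (uint8_t)((pat_msr >> (8 * (n & 7u))) & 0x7u);
}
```

With the example PAT value 0007_0406_0007_0406h, PAT=0, PCD=0, PWT=0 selects PA0 (06h), and PAT=1, PCD=1, PWT=0 selects PA6 (07h, the UC- type).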
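[Figure: Extended fixed-range MTRR type-field format. Bits 7–5: reserved. Bit 4: RdMem. Bit 3: WrMem. Bits 2–0: memory type.]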
These extensions are enabled using the following bits in the SYSCFG MSR:
• MtrrFixDramEn—Bit 18. When set to 1, RdMem and WrMem attributes are enabled. When
cleared to 0, these attributes are disabled. When disabled, accesses are directed to memory-mapped
I/O space.
• MtrrFixDramModEn—Bit 19. When set to 1, software can read and write the RdMem and
WrMem bits. When cleared to 0, writes do not modify the RdMem and WrMem bits, and reads
return 0.
To use the MTRR extensions, system software must first set MtrrFixDramModEn=1 to allow
modification to the RdMem and WrMem bits. After the attribute bits are properly initialized in the
fixed-range registers, the extensions can be enabled by setting MtrrFixDramEn=1.
RdMem and WrMem allow the processor to independently direct reads and writes to either system
memory or memory-mapped I/O. The RdMem and WrMem controls are particularly useful when
shadowing ROM devices located in memory-mapped I/O space. It is often useful to shadow such
devices in RAM system memory to improve access performance, but writes into the RAM location can
corrupt the shadowed ROM information. The MTRR extensions solve this problem. System software
can create the shadow location by setting WrMem = 1 and RdMem = 0 for the specified memory range
and then copy the ROM location into itself. Reads are directed to the memory-mapped ROM, but
writes go to the same physical addresses in system memory. After the copy is complete, system
software can change the bit values to WrMem = 0 and RdMem = 1. Now reads are directed to the faster
copy located in system memory, and writes are directed to memory-mapped ROM. The ROM responds
as it would normally to a write, which is to ignore it.
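The enable sequence and the ROM-shadowing steps above can be sketched as bit manipulation on register images. In this C sketch the SYSCFG bit positions (18 and 19) and the RdMem/WrMem positions (bits 4 and 3) come from this section. The helper names are hypothetical, and the memory-type value is left to the caller because Table 7-11 governs which RdMem/WrMem/type combinations are legal. Reading and writing the actual MSRs with RDMSR/WRMSR is omitted.

```c
#include <stdint.h>

/* SYSCFG MSR enable bits. */
#define SYSCFG_MTRR_FIX_DRAM_EN      (1ULL << 18)  /* MtrrFixDramEn */
#define SYSCFG_MTRR_FIX_DRAM_MOD_EN  (1ULL << 19)  /* MtrrFixDramModEn */

/* Extended fixed-range type-field bits. */
#define FIX_RDMEM (1u << 4)   /* 1: reads go to system memory; 0: to memory-mapped I/O */
#define FIX_WRMEM (1u << 3)   /* 1: writes go to system memory; 0: to memory-mapped I/O */

/* Hypothetical helper: build one 8-bit type field; the memory type occupies bits 2-0. */
uint8_t fixed_ext_field(int rdmem, int wrmem, uint8_t type)
{
    return (uint8_t)((rdmem ? FIX_RDMEM : 0u) | (wrmem ? FIX_WRMEM : 0u) | (type & 0x7u));
}

/* Hypothetical helper: replicate one field into all eight fields of a fixed-range MTRR image. */
uint64_t fixed_mtrr_fill(uint8_t field)
{
    return 0x0101010101010101ULL * field;
}
```

Under these assumptions, the procedure would proceed in four steps. First, set SYSCFG_MTRR_FIX_DRAM_MOD_EN. Second, load the affected fixed-range MTRRs with fields built as `fixed_ext_field(0, 1, type)`, so reads go to ROM and writes go to RAM. Third, set SYSCFG_MTRR_FIX_DRAM_EN and copy the ROM onto itself. Finally, switch the fields to `fixed_ext_field(1, 0, type)`.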
Not all combinations of RdMem and WrMem are supported for each memory type encoded by bits
2–0. Table 7-11 on page 199 shows the allowable combinations. The behavior of reserved encoding
combinations (shown as gray-shaded cells) is undefined.
7.9.2 IORRs
The IORRs operate similarly to the variable-range MTRRs. The IORRs specify whether reads and
writes in any physical-address range map to system memory or memory-mapped I/O. Up to two
address ranges of varying sizes can be controlled using the IORRs. A pair of IORRs is used to control
each address range: IORRBasen and IORRMaskn (n is the address-range number from 0 to 1).
Figure 7-12 on page 200 shows the format of the IORRBasen registers and Figure 7-13 on page 201
shows the format of the IORRMaskn registers. The fields within the register pair are read/write.
The intersection of the IORR range with the equivalent effective MTRR range follows the same type
encoding table (Table 7-11) as the fixed-range MTRR, where the RdMem/WrMem and memory type
are directly tied together.
• RdMem—Bit 4. When set to 1, the processor directs read requests for this physical address range to
system memory. When cleared to 0, reads are directed to memory-mapped I/O.
• Range Physical-Base-Address (PhysBase)—Bits 51–12. The memory-range base-address in
physical-address space. PhysBase is aligned on a 4-Kbyte (or greater) address in the 52-bit
physical-address space supported by the AMD64 architecture. PhysBase represents the 40 most-significant
bits of the physical address (bits 51–12). Physical-address bits 11–0 are assumed to be 0.
The format of these registers is shown in Figure 7-12.
[Figure 7-12: IORRBasen register format. Bits 63–52: reserved, MBZ. Bits 51–12: PhysBase (this is an architectural limit; a given implementation may support fewer bits). Bits 11–5: reserved, MBZ. Bit 4: RdMem. Bit 3: WrMem. Bits 2–0: reserved, MBZ.]
[Figure 7-13: IORRMaskn register format. Bits 63–52: reserved, MBZ. Bits 51–12: PhysMask (this is an architectural limit; a given implementation may support fewer bits). Bit 11: V. Bits 10–0: reserved, MBZ.]
The operation of the PhysMask and PhysBase fields is identical to that of the variable-range MTRRs.
See page 188 for a description of this operation.
The intersection of the top-of-memory range with the equivalent effective MTRR range follows the
same type encoding table (Table 7-11 on page 199) as the fixed-range MTRR, where the
RdMem/WrMem and memory type are directly tied together.
[Figure: Memory organization using the top-of-memory registers. Addresses 0 through TOP_MEM – 1: system memory. TOP_MEM through 4GB – 1: memory-mapped I/O. 4GB through TOP_MEM2 – 1: system memory. TOP_MEM2 up to the maximum system memory address: memory-mapped I/O.]
Figure 7-15 shows the format of the TOP_MEM and TOP_MEM2 registers. Bits 51–23 specify an 8-
Mbyte aligned physical address. All remaining bits are reserved and ignored by the processor. System
software should clear those bits to zero to maintain compatibility with possible future extensions to the
registers. The TOP_MEM registers are model-specific registers. See “Memory-Typing MSRs” on
page 464 for information on the MSR address and reset values for these registers.
[Figure 7-15: TOP_MEM and TOP_MEM2 register format. Bits 63–52: reserved, IGN. Bits 51–23: top-of-memory physical address (this is an architectural limit; a given implementation may support fewer bits). Bits 22–0: reserved, IGN.]
The TOP_MEM register is enabled by setting the MtrrVarDramEn bit in the SYSCFG MSR (bit 20) to
1. The TOP_MEM2 register is enabled by setting the MtrrTom2En bit in the SYSCFG MSR (bit 21) to
1. The registers are disabled when their respective enable bits are cleared to 0. When the top-of-
memory registers are disabled, memory accesses default to memory-mapped I/O space.
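Here is a simplified C model of the top-of-memory routing. Addresses below 4 Gbytes are compared against TOP_MEM, and addresses at or above 4 Gbytes against TOP_MEM2. A disabled register leaves its region routed to memory-mapped I/O. The model deliberately ignores the interaction with the fixed-range MTRRs and IORRs described by Table 7-11, and `tom_route`, `tom_addr`, and `mem_dest` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { DEST_DRAM, DEST_MMIO } mem_dest;

/* Bits 51-23 hold an 8-Mbyte-aligned address; all other bits are ignored. */
uint64_t tom_addr(uint64_t reg)
{
    return reg & 0x000FFFFFFF800000ULL;
}

/* Route addr using only TOP_MEM/TOP_MEM2 and their SYSCFG enables. */
mem_dest tom_route(uint64_t addr, uint64_t top_mem, bool tom_en,
                   uint64_t top_mem2, bool tom2_en)
{
    if (addr < 0x100000000ULL)   /* below 4 Gbytes: TOP_MEM applies */
        return (tom_en && addr < tom_addr(top_mem)) ? DEST_DRAM : DEST_MMIO;
    return (tom2_en && addr < tom_addr(top_mem2)) ? DEST_DRAM : DEST_MMIO;
}
```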
8.1.1 Precision
Precision describes how the exception is related to the interrupted program:
• Precise exceptions are reported on a predictable instruction boundary. This boundary is generally
the first instruction that has not completed when the event occurs. All previous instructions (in
program order) are allowed to complete before transferring control to the event handler. The
pointer to the instruction boundary is saved automatically by the processor. When the event handler
completes execution, it returns to the interrupted program and restarts execution at the interrupted-
instruction boundary.
• Imprecise exceptions are not guaranteed to be reported on a predictable instruction boundary. The
boundary can be any instruction that has not completed when the interrupt event occurs. Imprecise
events can be considered asynchronous, because the source of the interrupt is not necessarily
related to the interrupted instruction. Imprecise exception and interrupt handlers typically collect
machine-state information related to the interrupting event for reporting through system-diagnostic
software. The interrupted program is not restartable.
General Masking Capabilities. Software can mask the occurrence of certain exceptions and
interrupts. Masking can delay or even prevent triggering of the exception-handling or interrupt-
handling mechanism when an interrupt-event occurs. External interrupts are classified as maskable or
nonmaskable:
• Maskable interrupts trigger the interrupt-handling mechanism only when RFLAGS.IF=1.
Otherwise they are held pending for as long as the RFLAGS.IF bit is cleared to 0.
• Nonmaskable interrupts (NMI) are unaffected by the value of the rFLAGS.IF bit. However, the
occurrence of an NMI masks further NMIs until an IRET instruction is executed.
Masking During Stack Switches. The processor delays recognition of maskable external interrupts
and debug exceptions during certain instruction sequences that are often used by software to switch
stacks. The typical programming sequence used to switch stacks is:
1. Load a stack selector into the SS register.
2. Load a stack offset into the ESP register.
If an interrupting event occurs after the selector is loaded but before the stack offset is loaded, the
interrupted-program stack pointer is invalid during execution of the interrupt handler.
To prevent interrupts from causing stack-pointer problems, the processor does not allow external
interrupts or debug exceptions to occur until the instruction immediately following the MOV SS or
POP SS instruction completes execution.
The recommended method of performing this sequence is to use the LSS instruction. LSS loads both
SS and ESP, and the instruction inhibits interrupts until both registers are updated successfully.
8.2 Vectors
Specific exception and interrupt sources are assigned a fixed vector-identification number (also called
an “interrupt vector” or simply “vector”). The interrupt vector is used by the interrupt-handling
mechanism to locate the system-software service routine assigned to the exception or interrupt. Up to
256 unique interrupt vectors are available. The first 32 vectors are reserved for predefined exception
and interrupt conditions. Software-interrupt sources can trigger an interrupt using any available
interrupt vector.
Table 8-1 on page 209 lists the supported interrupt-vector numbers, the corresponding exception or
interrupt name, the mnemonic, the source of the interrupt event, and a summary of the possible causes.
Table 8-2 on page 210 shows how each interrupt vector is classified. Reserved interrupt vectors are
indicated by the gray-shaded rows.
The following sections describe each interrupt in detail. The format of the error code reported by each
interrupt is described in “Error Codes” on page 224.
Program Restart. #DE is a fault-type exception. The saved instruction pointer points to the
instruction that caused the #DE.
Error Code Returned. None. #DB information is returned in the debug-status register, DR6.
Program Restart. #DB can be either a fault-type or trap-type exception. In the following cases, the
saved instruction pointer points to the instruction that caused the #DB:
• Instruction execution.
• Invalid debug-register access, or general detect.
In all other cases, the instruction that caused the #DB is completed, and the saved instruction pointer
points to the instruction after the one that caused the #DB.
The RFLAGS.RF bit can be used to restart an instruction following an instruction breakpoint resulting
in a #DB. In most cases, the processor clears RFLAGS.RF to 0 after every instruction is successfully
executed. However, in the case of the IRET, JMP, CALL, and INTn (through a task gate) instructions,
RFLAGS.RF is not cleared to 0 until the next instruction successfully executes.
When a non-debug exception occurs (or when a string instruction is interrupted), the processor
normally sets RFLAGS.RF to 1 in the RFLAGS image that is pushed on the interrupt stack. A
subsequent IRET back to the interrupted program pops the RFLAGS image off the stack and into the
RFLAGS register, with RFLAGS.RF=1. The interrupted instruction executes without causing an
instruction breakpoint, after which the processor clears RFLAGS.RF to 0.
However, when a #DB exception occurs, the processor clears RFLAGS.RF to 0 in the RFLAGS image
that is pushed on the interrupt stack. The #DB handler has two options:
• Disable the instruction breakpoint completely.
• Set RFLAGS.RF to 1 in the interrupt-stack rFLAGS image. The instruction breakpoint condition is
ignored immediately after the IRET, but reoccurs if the instruction address is accessed later, as can
occur in a program loop.
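In a handler written in C, the second option amounts to setting bit 16 (RF) in the saved RFLAGS image before IRET. The frame layout below (RIP, CS, RFLAGS, RSP, SS in ascending address order) is an assumption for illustration, and `db_resume_past_breakpoint` is a hypothetical name. The low-level entry and IRET stubs are not shown.

```c
#include <stdint.h>

#define RFLAGS_RF (1ULL << 16)   /* resume flag */

/* Assumed layout of the interrupt-stack frame the handler edits. */
struct iret_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* Option 2: set RF in the saved image so the instruction breakpoint is
 * ignored for the instruction executed immediately after IRET. */
void db_resume_past_breakpoint(struct iret_frame *f)
{
    f->rflags |= RFLAGS_RF;
}
```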
Program Restart. NMI is an interrupt. The processor recognizes an NMI at an instruction boundary.
The saved instruction pointer points to the instruction immediately following the boundary where the
NMI was recognized.
Masking. NMI cannot be masked. However, when an NMI is recognized by the processor,
recognition of subsequent NMIs is disabled until an IRET instruction is executed.
Program Restart. #BP is a trap-type exception. The saved instruction pointer points to the byte after
the INT3 instruction. This location can be the start of the next instruction. However, if the INT3 is used
to replace the first opcode bytes of an instruction, the restart location is likely to be in the middle of an
instruction. In the latter case, the debug software must replace the INT3 byte with the correct
instruction byte. The saved RIP instruction pointer must then be decremented by one before returning
to the interrupted program. This allows the program to be restarted correctly on the interrupted-
instruction boundary.
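The debugger side of this procedure can be sketched in C on an ordinary byte buffer standing in for the code being debugged. Arming a breakpoint saves the first opcode byte and writes the INT3 opcode (CCh). The #BP handler then restores the byte and backs the saved RIP up by one. The structure and function names are hypothetical. Writing into another process's code normally requires an OS debugging interface, which is not shown.

```c
#include <stdint.h>

#define INT3_OPCODE 0xCC

/* Hypothetical record a debugger keeps for one software breakpoint. */
struct sw_breakpoint {
    uint8_t *addr;       /* first opcode byte that was replaced */
    uint8_t saved_byte;  /* original byte at addr */
};

void bp_arm(struct sw_breakpoint *bp, uint8_t *addr)
{
    bp->addr = addr;
    bp->saved_byte = *addr;
    *addr = INT3_OPCODE;
}

/* Called from the #BP handler: restore the byte and return the corrected
 * instruction pointer (the saved RIP points one byte past the INT3). */
uint64_t bp_disarm(struct sw_breakpoint *bp, uint64_t saved_rip)
{
    *bp->addr = bp->saved_byte;
    return saved_rip - 1;
}
```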
Program Restart. #OF is a trap-type exception. The saved instruction pointer points to the
instruction following the INTO instruction that caused the #OF.
Program Restart. #BR is a fault-type exception. The saved instruction pointer points to the BOUND
instruction that caused the #BR.
• Execution of any 128-bit media instruction (uses XMM registers), or 64-bit media instruction
(uses MMX™ registers) when CR0.EM = 1.
• Execution of any 128-bit media floating-point instruction (uses XMM registers) that causes a
numeric exception when CR4.OSXMMEXCPT = 0.
• Use of the DR4 or DR5 debug registers when CR4.DE = 1.
• Execution of RSM when not in SMM mode.
See the specific instruction description (in the other volumes) for additional information on invalid
conditions.
#UD cannot be disabled.
Program Restart. #UD is a fault-type exception. The saved instruction pointer points to the
instruction that caused the #UD.
Program Restart. #NM is a fault-type exception. The saved instruction pointer points to the
instruction that caused the #NM.
If a third interrupting event occurs while transferring control to the #DF handler, the processor shuts
down. Only an NMI, RESET, or INIT can restart the processor in this case. However, if the processor
shuts down as it is executing an NMI handler, the processor can only be restarted with RESET or INIT.
#DF cannot be disabled.
Program Restart. #DF is an abort-type exception. The saved instruction pointer is undefined, and the
program cannot be restarted.
Error Code Returned. See Table 8-4 for a list of error codes returned by the #TS exception.
Program Restart. #TS is a fault-type exception. If the exception occurs before loading the segment
selectors from the TSS, the saved instruction pointer points to the instruction that caused the #TS.
However, most #TS conditions occur due to errors with the loaded segment selectors. When an error is
found with a segment selector, the hardware task-switch mechanism completes loading the new task
state from the TSS, and then triggers the #TS exception mechanism. In this case, the saved instruction
pointer points to the first instruction in the new task.
In long mode, a #TS cannot be caused by a task switch, because the hardware task-switch mechanism
is disabled. A #TS occurs only as a result of a control transfer through a gate descriptor that results in
an invalid stack-segment reference using an SS selector in the TSS. In this case, the saved instruction
pointer always points to the control-transfer instruction that caused the #TS.
Error Code Returned. The segment-selector index of the segment descriptor causing the #NP
exception.
Program Restart. #NP is a fault-type exception. In most cases, the saved instruction pointer points to
the instruction that loaded the segment selector resulting in the #NP. See “Exceptions During a Task
Switch” on page 224 for a description of the consequences when this exception occurs during a task
switch.
Error Code Returned. The error code depends on the cause of the #SS, as shown in Table 8-5 on
page 218.
Program Restart. #SS is a fault-type exception. In most cases, the saved instruction pointer points to
the instruction that caused the #SS. See “Exceptions During a Task Switch” on page 224 for a
description of the consequences when this exception occurs during a task switch.
Error Code Returned. As shown in Table 8-6, a selector index is reported as the error code if the
#GP is due to a segment-descriptor access. In all other cases, an error code of 0 is returned.
Program Restart. #GP is a fault-type exception. In most cases, the saved instruction pointer points to
the instruction that caused the #GP. See “Exceptions During a Task Switch” on page 224 for a
description of the consequences when this exception occurs during a task switch.
CR2 Register. The virtual (linear) address that caused the #PF is stored in the CR2 register. The
legacy CR2 register is 32 bits long. The CR2 register in the AMD64 architecture is 64 bits long, as
shown in Figure 8-1 on page 220. In AMD64 implementations, when either software or a page fault
causes a write to the CR2 register, only the low-order 32 bits of CR2 are used in legacy mode; the
processor clears the high-order 32 bits.
[Figure 8-1: Control register 2 (CR2). Bits 63–0: virtual (linear) address that caused the page fault.]
Error Code Returned. The page-fault error code is pushed onto the page-fault exception-handler
stack. See “Page-Fault Error Code” on page 225 for a description of this error code.
Program Restart. #PF is a fault-type exception. In most cases, the saved instruction pointer points to
the instruction that caused the #PF. See “Exceptions During a Task Switch” on page 224 for a
description of what can happen if this exception occurs during a task switch.
Error Code Returned. None. Exception information is provided by the x87 status-word register. See
“x87 Floating-Point Programming” in Volume 1 for more information on using this register.
Program Restart. #MF is a fault-type exception. The #MF exception is not precise, because multiple
instructions and exceptions can occur before the #MF handler is invoked. Also, the saved instruction
pointer does not point to the instruction that caused the exception resulting in the #MF. Instead, the
saved instruction pointer points to the x87 floating-point instruction or FWAIT/WAIT instruction that
is about to be executed when the #MF occurs. The address of the last instruction that caused an x87
floating-point exception is in the x87 instruction-pointer register. See “x87 Floating-Point
Programming” in Volume 1 for information on accessing this register.
Masking. Each type of x87 floating-point exception can be masked by setting the appropriate bits in
the x87 control-word register. See “x87 Floating-Point Programming” in Volume 1 for more
information on using this register.
#MF can also be disabled by clearing the CR0.NE bit to 0. See “Numeric Error (NE) Bit” on page 44
for more information on using this bit.
Program Restart. #AC is a fault-type exception. The saved instruction pointer points to the
instruction that caused the #AC.
Error Code Returned. None. Error information is provided by model-specific registers (MSRs)
defined by the machine-check architecture.
Program Restart. #MC is an abort-type exception. There is no reliable way to restart the program. If
the EIPV flag (EIP valid) is set in the MCG_Status MSR, the saved CS and rIP point to the instruction
that caused the error. If EIPV is clear, either the CS:rIP of the instruction causing the failure is not
known or the machine check is not related to a specific instruction.
The CR4.OSXMMEXCPT bit specifies the interrupt vector to be taken when an unmasked 128-bit
media floating-point exception occurs. When CR4.OSXMMEXCPT=1, the #XF interrupt vector is
taken when an exception occurs. When CR4.OSXMMEXCPT=0, the #UD (undefined opcode)
interrupt vector is taken when an exception occurs.
The 128-bit media floating-point exceptions reported by the #XF exception are (including
mnemonics):
• IE—Invalid-operation exception (also called #I).
• DE—Denormalized-operand exception (also called #D).
• ZE—Zero-divide exception (also called #Z).
• OE—Overflow exception (also called #O).
• UE—Underflow exception (also called #U).
• PE—Precision exception (also called #P or inexact-result exception).
Each type of 128-bit media floating-point exception can be masked by setting the appropriate bits in
the MXCSR register. #XF can also be disabled by clearing the CR4.OSXMMEXCPT bit to 0.
Error Code Returned. None. Exception information is provided by the 128-bit media floating-point
MXCSR register. See “128-Bit Media and Scientific Programming” in Volume 1 for more information
on using this register.
Program Restart. #XF is a fault-type exception. Unlike the #MF exception, the #XF exception is
precise. The saved instruction pointer points to the instruction that caused the #XF.
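The vector selection above can be written as a pure function. The sketch makes three assumptions. First, the six MXCSR exception flags occupy bits 5 through 0 and the matching mask bits occupy bits 12 through 7. Second, OSXMMEXCPT is bit 10 of CR4. Third, the detected parameter lists the exceptions raised by the current instruction, using the flag-bit order. The function name is illustrative only.

```c
#include <stdint.h>

#define CR4_OSXMMEXCPT   (UINT64_C(1) << 10)
#define MXCSR_FLAG_BITS  0x3Fu   /* IE, DE, ZE, OE, UE, PE in bits 5..0 */
#define MXCSR_MASK_SHIFT 7       /* IM, DM, ZM, OM, UM, PM in bits 12..7 */
#define VECTOR_UD 6              /* #UD, invalid opcode */
#define VECTOR_XF 19             /* #XF, SIMD floating-point exception */

/* 'detected' holds the exceptions raised by the current instruction, in
   MXCSR flag-bit order. Returns the vector taken, or -1 if every detected
   exception is masked (the processor then uses its default handling). */
static int simd_fp_vector(uint64_t cr4, uint32_t mxcsr, uint32_t detected)
{
    uint32_t masks = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAG_BITS;
    if ((detected & MXCSR_FLAG_BITS & ~masks) == 0)
        return -1;
    return (cr4 & CR4_OSXMMEXCPT) ? VECTOR_XF : VECTOR_UD;
}
```

Suppose MXCSR is 1D80h, so that only ZM is clear, and the instruction detects a zero-divide. The sketch then returns 19 when OSXMMEXCPT is set and 6 when it is clear.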
Program Restart. The saved instruction pointer depends on the interrupt source:
• External interrupts are recognized on instruction boundaries. The saved instruction pointer points
to the instruction immediately following the boundary where the external interrupt was
recognized.
• If the interrupt occurs as a result of executing the INTn instruction, the saved instruction pointer
points to the instruction after the INTn.
Masking. The ability to mask user-defined interrupts depends on the interrupt source:
• External interrupts can be masked using the rFLAGS.IF bit. Setting rFLAGS.IF to 1 enables
external interrupts, while clearing rFLAGS.IF to 0 inhibits them.
• Software interrupts (initiated by the INTn instruction) cannot be disabled.
[Figure: Selector error code. Bits 31–16: Reserved. Bits 15–3: Selector Index. Bit 2: TI. Bit 1: IDT. Bit 0: EXT.]
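• IDT - Bit 1. If this bit is set to 1, the selector-index field references a gate descriptor in the
interrupt-descriptor table (IDT). If cleared to 0, the selector-index field references a descriptor in
either the global-descriptor table (GDT) or local-descriptor table (LDT), as indicated by the TI bit.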
• TI—Bit 2. If this bit is set to 1, the error-code selector-index field references a descriptor in the
LDT. If cleared to 0, the selector-index field references a descriptor in the GDT. The TI bit is
relevant only when the IDT bit is cleared to 0.
• Selector Index—Bits 15–3. The selector-index field specifies the index into either the GDT, LDT,
or IDT, as specified by the IDT and TI bits.
Some exceptions return a zero in the selector-error code.
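The sketch below splits a selector error code into its fields and reports which descriptor table the index refers to. It uses the bit positions given for the selector error code: EXT in bit 0, IDT in bit 1, TI in bit 2, and the selector index in bits 15 through 3. The structure, enumeration, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum sel_table { SEL_TABLE_GDT, SEL_TABLE_LDT, SEL_TABLE_IDT };

struct selector_error_code {
    bool ext;        /* bit 0 */
    bool idt;        /* bit 1: index refers to the IDT */
    bool ti;         /* bit 2: LDT if set, GDT if clear; meaningful only when idt is clear */
    uint16_t index;  /* bits 15..3 */
};

static struct selector_error_code decode_selector_error(uint32_t code)
{
    struct selector_error_code e;
    e.ext = (code & 0x1u) != 0;
    e.idt = (code & 0x2u) != 0;
    e.ti = (code & 0x4u) != 0;
    e.index = (uint16_t)((code >> 3) & 0x1FFFu);
    return e;
}

/* Applies the IDT and TI rules to pick the referenced table. */
static enum sel_table selector_error_table(struct selector_error_code e)
{
    if (e.idt)
        return SEL_TABLE_IDT;
    return e.ti ? SEL_TABLE_LDT : SEL_TABLE_GDT;
}
```

An error code of 42h, for example, decodes to IDT=1 with index 8, which is IDT entry 8.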
[Figure: Page-fault error code. Bits 31–5: Reserved. Bit 4: I/D. Bit 3: RSV. Bit 2: U/S. Bit 1: R/W. Bit 0: P.]
8.5 Priorities
To allow for consistent handling of multiple-interrupt conditions, simultaneous interrupts are
prioritized by the processor. The AMD64 architecture defines priorities between groups of interrupts,
and interrupt prioritization within a group is implementation dependent. Table 8-8 shows the interrupt
priorities defined by the AMD64 architecture.
When simultaneous interrupts occur, the processor transfers control to the highest-priority interrupt
handler. Lower-priority interrupts from external sources are held pending by the processor, and they
are handled after the higher-priority interrupt is handled. Lower-priority interrupts that result from
internal sources are discarded. Those interrupts recur when the higher-priority interrupt handler
completes and transfers control back to the interrupted instruction. Software interrupts are discarded as
well, and recur when the software-interrupt instruction is restarted.
• Unmasked exceptions are reported in the appropriate floating-point status register, and a software-
interrupt handler is invoked. See “#MF—x87 Floating-Point Exception-Pending (Vector 16)” on
page 220 and “#XF—SIMD Floating-Point Exception (Vector 19)” on page 223 for more
information on the floating-point interrupts.
• Masked exceptions are also reported in the appropriate floating-point status register. Instead of
transferring control to an interrupt handler, however, the processor handles the exception in a
default manner and execution proceeds.
If the processor detects more than one exception while executing a single floating-point instruction, it
prioritizes the exceptions in a predictable manner. When responding in a default manner to masked
exceptions, the processor may act only on the highest-priority exception and ignore lower-priority
exceptions. In the case of vector (SIMD) floating-point instructions, priorities are set on
sub-operations, not across all operations. For example, if the processor detects and acts on a QNaN
operand in one sub-operation, it can still detect and act on a denormal operand in another
sub-operation.
When reporting 128-bit media floating-point exceptions before taking an interrupt or handling them in
a default manner, the processor first classifies the exceptions as follows:
• Input exceptions include SNaN operand (#I), invalid operation (#I), denormal operand (#D), and
zero-divide (#Z). Using a NaN operand with a maximum, minimum, compare, or convert
instruction is also considered an input exception.
• Output exceptions include numeric overflow (#O), numeric underflow (#U), and precision (#P).
Using the above classification, the processor applies the following procedure to report the exceptions:
1. The exceptions for all sub-operations are prioritized.
2. The exception conditions for all sub-operations are logically ORed together to form a single set of
exceptions covering all operations. For example, if two sub-operations each raise a denormal-operand
exception (#D), only one #D is reported.
3. If the set of exceptions includes any unmasked input exceptions, all input exceptions are reported
in MXCSR, and no output exceptions are reported. Otherwise, all input and output exceptions are
reported in MXCSR.
4. If any exceptions are unmasked, control is transferred to the appropriate interrupt handler.
Table 8-9 on page 228 lists the priorities for simultaneous floating-point exceptions.
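The four-step reporting procedure can be modeled roughly as follows. This is a simplified sketch and not a description of the hardware. It assumes that each sub-operation's exception set is already prioritized (step 1). Exceptions and masks use the MXCSR flag-bit order, with IE, DE, ZE, OE, UE, and PE in bits 0 through 5. The mask argument is the MXCSR mask field shifted down to those same positions. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define INPUT_EXCEPTIONS  0x07u  /* IE, DE, ZE */
#define OUTPUT_EXCEPTIONS 0x38u  /* OE, UE, PE */

struct simd_report {
    uint32_t reported;    /* exception flags to set in MXCSR */
    bool take_interrupt;  /* true if a reported exception is unmasked */
};

static struct simd_report report_simd_exceptions(const uint32_t *sub_ops, int n,
                                                 uint32_t mask_bits)
{
    uint32_t all = 0;
    for (int i = 0; i < n; i++)
        all |= sub_ops[i];                      /* step 2: OR sub-operation results */

    struct simd_report r;
    if (all & ~mask_bits & INPUT_EXCEPTIONS)
        r.reported = all & INPUT_EXCEPTIONS;    /* step 3: input exceptions only */
    else
        r.reported = all & (INPUT_EXCEPTIONS | OUTPUT_EXCEPTIONS); /* step 3: inputs and outputs */
    r.take_interrupt = (r.reported & ~mask_bits) != 0;           /* step 4 */
    return r;
}
```

Consider one sub-operation that raises #I while another raises #O. If #I is unmasked, only #I is reported and the interrupt is taken. If everything is masked, both flags are recorded and execution continues.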
[Figure: Task-priority register. Bits 63–4: Reserved, MBZ. Bits 3–0: Task Priority (TPR).]
System software can use the TPR register to temporarily block low-priority interrupts from
interrupting a high-priority task. This is accomplished by loading TPR with a value corresponding to
the highest-priority interrupt that is to be blocked. For example, loading TPR with a value of 9 (1001b)
blocks all interrupts with a priority class of 9 or less, while allowing all interrupts with a priority class
of 10 or more to be recognized. Loading TPR with 0 enables all external interrupts. Loading TPR with
15 (1111b) disables all external interrupts. The TPR is cleared to 0 on reset.
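Here is a minimal sketch of the TPR comparison. The interrupt controller decides how an interrupt maps to a priority class, so the sketch assumes the local-APIC convention: the priority class is the upper four bits of the 8-bit interrupt vector. tpr_allows is a hypothetical helper, not an architectural interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: priority class = vector bits 7..4 (local-APIC convention). */
static unsigned priority_class(uint8_t vector)
{
    return (unsigned)(vector >> 4);
}

/* An external interrupt is recognized only if its priority class is
   strictly greater than the value loaded into the TPR. */
static bool tpr_allows(uint8_t tpr, uint8_t vector)
{
    return priority_class(vector) > (tpr & 0xFu);
}
```

With TPR=9, vector 9Fh (class 9) is blocked and vector A0h (class 10) is recognized. These results match the example above.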
System software reads and writes the TPR using a MOV CR8 instruction. The MOV CR8 instruction
requires a privilege level of 0. Programs running at any other privilege level cannot read or write the
TPR, and an attempt to do so results in a general-protection exception (#GP).
A serializing instruction is not required after loading the TPR, because a new priority level is
established when the MOV instruction completes execution. For example, assume two sequential TPR
loads are performed, in which a low value is first loaded into TPR and immediately followed by a load
of a higher value. Any pending, lower-priority interrupt enabled by the first MOV CR8 is recognized
between the two MOVs.
The TPR is an architectural abstraction of the interrupt controller (IC), which prioritizes and manages
external interrupt delivery to the processor. The IC can be an external system device, or it can be
integrated on the chip like the local advanced programmable interrupt controller (APIC). Typically, the
IC contains a priority mechanism similar, if not identical, to the TPR. The IC, however, is
implementation dependent, and the underlying priority mechanisms are subject to change. The TPR,
by contrast, is part of the AMD64 architecture.
Effect of IC on TPR. The features of the implementation-specific IC can impact the operation of the
TPR. For example, the TPR might affect interrupt delivery only if the IC is enabled. Also, the mapping
of an external interrupt to a specific interrupt priority is an implementation-specific behavior of the IC.
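[Figure: Real-mode interrupt control transfer. The interrupt vector is multiplied by 4 and added to the value in the interrupt-descriptor-table register (IDTR) to locate the interrupt-descriptor-table entry, which holds the CS and offset of the interrupt handler in memory.]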
When an exception or interrupt occurs in real mode, the processor performs the following:
1. Pushes the FLAGS register (EFLAGS[15:0]) onto the stack.
2. Clears EFLAGS.IF to 0 and EFLAGS.TF to 0.
3. Saves the CS register and IP register (RIP[15:0]) by pushing them onto the stack.
4. Locates the interrupt-handler pointer (CS:IP) in the IDT by scaling the interrupt vector by four
and adding the result to the value in the IDTR.
5. Transfers control to the interrupt handler referenced by the CS:IP in the IDT.
Figure 8-6 on page 231 shows the stack after control is transferred to the interrupt handler in real
mode.
[Figure 8-6: Interrupt-handler and interrupted-program stack in real mode. Return IP at SS:SP, return CS at SS:SP+2, return FLAGS at SS:SP+4.]
An IRET instruction is used to return to the interrupted program. When an IRET is executed, the
processor performs the following:
1. Pops the saved IP value off the stack into RIP[15:0], and then pops the saved CS value into the CS
register.
2. Pops the FLAGS value off the stack and into EFLAGS[15:0].
3. Execution begins at the saved CS:IP location.
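A flat 1-MiB byte array is enough to simulate the real-mode entry sequence and the matching IRET, and the simulation makes the Figure 8-6 stack layout easy to check. This is a hypothetical model. The memory array, the rm_cpu structure, and the helper names are illustrative. Segment:offset addresses wrap at 1 MiB. Each IDT entry is assumed to hold the handler offset in its first word and the segment in its second word.

```c
#include <stdint.h>

#define MEM_SIZE (1u << 20)   /* 1 MiB real-mode address space */
#define FLAGS_TF (1u << 8)
#define FLAGS_IF (1u << 9)

static uint8_t mem[MEM_SIZE]; /* simulated memory */

struct rm_cpu {
    uint16_t cs, ip, ss, sp, flags;
    uint32_t idtr_base;       /* IDT base from IDTR (normally 0 in real mode) */
};

static uint32_t lin(uint32_t seg, uint32_t off) { return ((seg << 4) + off) & (MEM_SIZE - 1); }

static uint16_t rd16(uint32_t a)
{
    return (uint16_t)(mem[a & (MEM_SIZE - 1)] | (mem[(a + 1) & (MEM_SIZE - 1)] << 8));
}

static void wr16(uint32_t a, uint16_t v)
{
    mem[a & (MEM_SIZE - 1)] = (uint8_t)v;
    mem[(a + 1) & (MEM_SIZE - 1)] = (uint8_t)(v >> 8);
}

static void push16(struct rm_cpu *c, uint16_t v)
{
    c->sp = (uint16_t)(c->sp - 2);
    wr16(lin(c->ss, c->sp), v);
}

static uint16_t pop16(struct rm_cpu *c)
{
    uint16_t v = rd16(lin(c->ss, c->sp));
    c->sp = (uint16_t)(c->sp + 2);
    return v;
}

/* Interrupt entry, steps 1-5. 'c->ip' must already hold the return IP. */
static void rm_interrupt(struct rm_cpu *c, uint8_t vector)
{
    push16(c, c->flags);                                       /* 1. push FLAGS */
    c->flags = (uint16_t)(c->flags & ~(FLAGS_IF | FLAGS_TF));  /* 2. clear IF and TF */
    push16(c, c->cs);                                          /* 3. push CS, then IP */
    push16(c, c->ip);
    uint32_t entry = c->idtr_base + (uint32_t)vector * 4u;     /* 4. vector * 4 + IDTR */
    c->ip = rd16(entry);                                       /* 5. offset word, */
    c->cs = rd16(entry + 2);                                   /*    then segment word */
}

/* IRET: pop IP, CS, then FLAGS, reversing the pushes above. */
static void rm_iret(struct rm_cpu *c)
{
    c->ip = pop16(c);
    c->cs = pop16(c);
    c->flags = pop16(c);
}
```

Because IP was pushed last, rm_iret pops it first, so the round trip restores CS, IP, SP, and FLAGS exactly.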
occurs. See Chapter 12, “Task Management,” for more information on the hardware task-switch
mechanism.
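[Figure: Protected-mode interrupt control transfer. The interrupt vector is multiplied by 8 and added to the IDTR base to locate the gate descriptor (CS selector, DPL, code-segment offset) in the interrupt-descriptor table. The gate's CS selector locates the code-segment descriptor (CS limit, DPL, code-segment base) in the global or local descriptor table. The code-segment base plus the gate's code-segment offset locates the interrupt handler in the virtual-address space.]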
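In protected mode, the processor finds the gate descriptor by scaling the vector by 8 rather than 4. The sketch below computes that address and decodes a legacy 8-byte gate descriptor. It assumes the conventional legacy layout:

- offset bits 15 to 0 in bits 15 to 0
- selector in bits 31 to 16
- type in bits 43 to 40
- DPL in bits 46 to 45
- present in bit 47
- offset bits 31 to 16 in bits 63 to 48

It also assumes type codes 0Eh and 0Fh for 32-bit interrupt and trap gates. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct legacy_gate {
    uint16_t selector;  /* code-segment selector */
    uint32_t offset;    /* handler offset within the code segment */
    uint8_t  type;      /* 0xE = 32-bit interrupt gate, 0xF = 32-bit trap gate */
    uint8_t  dpl;       /* gate DPL */
    bool     present;
};

/* Byte address of the gate for 'vector': IDTR base + vector * 8. */
static uint32_t legacy_gate_address(uint32_t idt_base, uint8_t vector)
{
    return idt_base + (uint32_t)vector * 8u;
}

static struct legacy_gate decode_legacy_gate(uint64_t d)
{
    struct legacy_gate g;
    g.offset   = (uint32_t)(d & 0xFFFFu) | (uint32_t)((d >> 32) & 0xFFFF0000u);
    g.selector = (uint16_t)(d >> 16);
    g.type     = (uint8_t)((d >> 40) & 0xFu);
    g.dpl      = (uint8_t)((d >> 45) & 0x3u);
    g.present  = ((d >> 47) & 1u) != 0;
    return g;
}
```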
[Figure: Interrupt-handler and interrupted-program stack, with error code and with no error code.]
5. The processor handles the EFLAGS.IF bit based on the gate-descriptor type:
- If the gate descriptor is an interrupt gate, EFLAGS.IF is cleared to 0.
- If the gate descriptor is a trap gate, EFLAGS.IF is not modified.
6. Saves the return-address pointer (CS:EIP) by pushing it onto the stack. The CS value is padded
with two bytes to form a doubleword.
7. If the interrupt-vector number has an error code associated with it, the error code is pushed onto
the stack.
8. The CS register is loaded from the segment-selector field in the gate descriptor, and the EIP is
loaded from the offset field in the gate descriptor.
9. The interrupt handler begins executing with the instruction referenced by new CS:EIP.
Figure 8-9 shows the new stack after control is transferred to the interrupt handler.
[Figure 8-9: Interrupt-handler stack after control is transferred to the interrupt handler.]
Offset from new SS:ESP   With error code   With no error code
+20                      Return SS
+16                      Return ESP        Return SS
+12                      Return EFLAGS     Return ESP
+8                       Return CS         Return EFLAGS
+4                       Return EIP        Return CS
+0                       Error code        Return EIP
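The following sketch builds the Figure 8-9 frame on a simulated array of doublewords and applies step 5. It models only the pushes and the IF handling. It does not model loading the stack pointer or the other EFLAGS changes made in earlier steps. The sketch writes the pad bytes above the CS selector as zero, which is an assumption of the sketch. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9)

struct frame32 {
    uint32_t slots[6];  /* simulated inner stack; slots[0] is the lowest address */
    int top;            /* index of the current top of stack */
};

static void push32(struct frame32 *f, uint32_t v) { f->slots[--f->top] = v; }

/* Pushes SS, ESP, EFLAGS, CS, EIP, and an optional error code, in that order.
   Returns the EFLAGS value the handler runs with (step 5). */
static uint32_t legacy_entry_to_inner(struct frame32 *f, bool interrupt_gate,
                                      uint16_t ss, uint32_t esp, uint32_t eflags,
                                      uint16_t cs, uint32_t eip,
                                      bool has_error_code, uint32_t error_code)
{
    f->top = 6;
    push32(f, ss);       /* selector zero-extended to a doubleword */
    push32(f, esp);
    push32(f, eflags);   /* saved before IF is changed */
    push32(f, cs);       /* CS padded to a doubleword (step 6) */
    push32(f, eip);
    if (has_error_code)
        push32(f, error_code);  /* step 7 */
    /* Step 5: an interrupt gate clears IF; a trap gate leaves it alone. */
    return interrupt_gate ? (eflags & ~EFLAGS_IF) : eflags;
}
```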
2. The processor compares the CPL with the interrupt-handler code-segment DPL. For this check to
pass, the CPL must be numerically greater-than or equal-to the code-segment DPL. This check
prevents control transfers to less-privileged interrupt handlers.
Unlike call gates, no RPL comparison takes place. This is because the gate descriptor is referenced in
the IDT using the interrupt-vector number rather than a selector, and no RPL field exists in the
interrupt-vector number.
Exception and interrupt handlers should be made reachable from software running at any privilege
level that requires them. If the gate DPL value is too low (requiring more privilege), or the interrupt-
handler code-segment DPL is too high (runs at lower privilege), the interrupt control transfer can fail
the privilege checks. Setting the gate DPL=3 and interrupt-handler code-segment DPL=0 makes the
exception handler or interrupt handler reachable from any privilege level.
Figure 8-10 on page 237 shows two examples of interrupt privilege checks. In Example 1, both
privilege checks pass:
• The interrupt-gate DPL is at the lowest privilege (3), which means that software running at any
privilege level (CPL) can access the interrupt gate.
• The interrupt-handler code segment is at the highest privilege level, as indicated by DPL=0. This
means software running at any privilege level can enter the interrupt handler through the interrupt gate.
[Figure 8-10: Interrupt privilege-check examples. Example 1: Privilege Check Passes. Example 2: Privilege Check Fails. In both examples the interrupted code runs at CPL=2.]
• The interrupt handler has a lower privilege (DPL=3) than the currently-running software (CPL=2).
Transitions from more-privileged software to less-privileged software are not allowed, so this
privilege check fails as well.
Although both privilege checks fail, only one such failure is required to deny access to the interrupt
handler.
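The two privilege checks can be written as predicates, with privilege levels running from 0 (most privileged) to 3. In this sketch, the gate check applies only when software_interrupt is true. That choice reflects the assumption that the gate DPL is compared with the CPL only for software-initiated interrupts such as INTn. Treat that parameter, and the function names, as assumptions.

```c
#include <stdbool.h>

/* Gate check: software-initiated interrupts need CPL <= gate DPL. */
static bool gate_check(unsigned cpl, unsigned gate_dpl, bool software_interrupt)
{
    return !software_interrupt || cpl <= gate_dpl;
}

/* Code-segment check: CPL must be numerically >= the handler's CS DPL,
   so control never moves to a less-privileged handler. */
static bool handler_cs_check(unsigned cpl, unsigned cs_dpl)
{
    return cpl >= cs_dpl;
}

static bool interrupt_allowed(unsigned cpl, unsigned gate_dpl, unsigned cs_dpl,
                              bool software_interrupt)
{
    return gate_check(cpl, gate_dpl, software_interrupt) && handler_cs_check(cpl, cs_dpl);
}
```

Example 1 corresponds to interrupt_allowed(2, 3, 0, true), which passes. A handler with DPL=3 reached from CPL=2 fails handler_cs_check regardless of the gate DPL.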
IRET, Same Privilege. Before performing the IRET, the stack pointer must point to the return EIP. If
there was an error code pushed onto the stack as a result of the exception or interrupt, that error code
should have been popped off the stack earlier by the handler. The IRET reverses the actions of the
interrupt mechanism:
1. Pops the return pointer off of the stack, loading both the CS register and EIP register (RIP[31:0])
with the saved values. The return code-segment RPL is read by the processor from the CS value
stored on the stack to determine that an equal-privilege control transfer is occurring.
2. Pops the saved EFLAGS image off of the stack and into the EFLAGS register.
3. Transfers control to the return program at the target CS:EIP.
IRET, Less Privilege. If an IRET changes privilege levels, the return program must be at a lower
privilege than the interrupt handler. The IRET in this case causes a stack switch to occur:
1. The return pointer is popped off of the stack, loading both the CS register and EIP register
(RIP[31:0]) with the saved values. The return code-segment RPL is read by the processor from the
CS value stored on the stack to determine that a lower-privilege control transfer is occurring.
2. The saved EFLAGS image is popped off of the stack and loaded into the EFLAGS register.
3. The return-program stack pointer is popped off of the stack, loading both the SS register and ESP
register (RSP[31:0]) with the saved values.
4. Control is transferred to the return program at the target CS:EIP.
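IRET chooses between the two cases above based only on the RPL of the return CS compared with the CPL. The sketch reads a 32-bit frame whose return EIP is at the lowest address and reports how many doublewords IRET would pop. It skips all descriptor and limit checks, and its names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct iret_result {
    uint32_t eip, eflags, esp;
    uint16_t cs, ss;
    bool privilege_change;
    int words_popped;
};

/* 'stack' points at the return EIP; any error code was already popped. */
static struct iret_result legacy_iret(const uint32_t *stack, unsigned cpl)
{
    struct iret_result r = {0};
    r.eip    = stack[0];
    r.cs     = (uint16_t)stack[1];   /* low 16 bits hold the selector */
    r.eflags = stack[2];
    r.words_popped = 3;
    r.privilege_change = (r.cs & 3u) > cpl;  /* return RPL less privileged than CPL */
    if (r.privilege_change) {
        r.esp = stack[3];            /* return-program SS:ESP */
        r.ss  = (uint16_t)stack[4];
        r.words_popped = 5;
    }
    return r;
}
```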
• EFLAGS.IOPL—This field controls interrupt handling based on the CPL. See “I/O Privilege Level
Field (IOPL) Field” on page 52 for more information on this field.
Setting IOPL<3 redirects the interrupt to the general-protection exception (#GP) handler.
• CR4.VME—This bit enables virtual-mode extensions. See “Virtual-8086 Mode Extensions (VME)
Bit” on page 47 for more information on this bit.
• TSS Interrupt-Redirection Bitmap—The TSS interrupt-redirection bitmap contains 256 bits, one
for each possible INTn vector (software interrupt). When CR4.VME=1, the bitmap is used by the
processor to direct interrupts to the handler provided by the currently-running 8086 program
(bitmap entry is 0), or to the protected-mode operating-system interrupt handler (bitmap entry is
1). See “Legacy Task-State Segment” on page 313 for information on the location of this field
within the TSS.
If IOPL<3, CR4.VME=1, and the corresponding interrupt redirection bitmap entry is 0, the processor
uses the virtual-interrupt mechanism. See “Virtual Interrupts” on page 247 for more information on
this mechanism.
Table 8-10 summarizes the actions of the above system controls on interrupts taken when the
processor is running in virtual-8086 mode.
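The sketch below encodes one reading of the rules above for an INTn instruction executed in virtual-8086 mode. The interrupt-redirection bitmap bit for vector n is taken as bit (n mod 8) of byte (n / 8). Cases the text does not state explicitly, such as VME=0 with IOPL=3, are filled in with the conventional behavior. Table 8-10 remains the authoritative summary, and the enumeration and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum v86_int_action {
    V86_TO_PROTECTED_HANDLER, /* protected-mode OS handler through the IDT */
    V86_TO_8086_HANDLER,      /* redirected to the 8086 program's handler */
    V86_VIRTUAL_INTERRUPT,    /* redirected using the virtual-interrupt mechanism */
    V86_GP_FAULT              /* #GP */
};

/* Bitmap bit for 'vector' (assumed: bit vector%8 of byte vector/8). */
static bool irb_bit(const uint8_t irb[32], uint8_t vector)
{
    return ((irb[vector >> 3] >> (vector & 7u)) & 1u) != 0;
}

static enum v86_int_action v86_intn(unsigned iopl, bool vme, const uint8_t irb[32],
                                    uint8_t vector)
{
    if (vme && !irb_bit(irb, vector))   /* bitmap entry 0: stay in the 8086 program */
        return iopl == 3 ? V86_TO_8086_HANDLER : V86_VIRTUAL_INTERRUPT;
    return iopl == 3 ? V86_TO_PROTECTED_HANDLER : V86_GP_FAULT;  /* IOPL<3: #GP */
}
```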
[Figure: Interrupt-handler stack after an interrupt from virtual-8086 mode. The new SS:ESP comes from the TSS (CPL=0).]
Offset from new SS:ESP   With error code   With no error code
+36                      Return GS
+32                      Return FS         Return GS
+28                      Return DS         Return FS
+24                      Return ES         Return DS
+20                      Return SS         Return ES
+16                      Return ESP        Return SS
+12                      Return EFLAGS     Return ESP
+8                       Return CS         Return EFLAGS
+4                       Return EIP        Return CS
+0                       Error code        Return EIP
An IRET from privileged protected-mode software (CPL=0) to virtual-8086 mode reverses the stack-
build process. After the return pointer, EFLAGS, and return stack-pointer are restored, the processor
restores the ES, DS, FS, and GS registers by popping their values off the stack.
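[Figure: Long-mode interrupt control transfer. The interrupt vector is multiplied by 16 and added to the IDT base to locate the gate descriptor (CS selector, DPL, code-segment offset). The CS selector locates the code-segment descriptor (CS limit, DPL, code-segment base), which, together with the offset, locates the interrupt handler in the virtual-address space.]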
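In long mode, the vector is scaled by 16 because each gate descriptor is 16 bytes long. The sketch computes the descriptor address and decodes a long-mode interrupt-gate or trap-gate descriptor stored little-endian in memory, including the IST index in bits 2 through 0 of byte +4. The sketch assumes the conventional long-mode gate layout for the remaining bytes:

- offset bits 15 to 0 in bytes 0 and 1
- selector in bytes 2 and 3
- type, DPL, and present bit in byte 5
- offset bits 63 to 16 in bytes 6 through 11

The names are hypothetical.

```c
#include <stdint.h>

struct lm_gate {
    uint64_t offset;
    uint16_t selector;
    uint8_t  ist;      /* 0 = no IST; 1..7 select IST1..IST7 */
    uint8_t  type;     /* 0xE = interrupt gate, 0xF = trap gate */
    uint8_t  dpl;
    uint8_t  present;
};

/* Byte address of the gate for 'vector': IDT base + vector * 16. */
static uint64_t lm_gate_address(uint64_t idt_base, uint8_t vector)
{
    return idt_base + (uint64_t)vector * 16u;
}

/* Decodes a 16-byte descriptor stored little-endian in b[0..15]. */
static struct lm_gate decode_lm_gate(const uint8_t b[16])
{
    struct lm_gate g;
    g.offset = (uint64_t)b[0] | ((uint64_t)b[1] << 8)
             | ((uint64_t)b[6] << 16) | ((uint64_t)b[7] << 24)
             | ((uint64_t)b[8] << 32) | ((uint64_t)b[9] << 40)
             | ((uint64_t)b[10] << 48) | ((uint64_t)b[11] << 56);
    g.selector = (uint16_t)(b[2] | (b[3] << 8));
    g.ist = (uint8_t)(b[4] & 0x7u);          /* IST index: bits 2..0 of byte +4 */
    g.type = (uint8_t)(b[5] & 0xFu);
    g.dpl = (uint8_t)((b[5] >> 5) & 0x3u);
    g.present = (uint8_t)(b[5] >> 7);
    return g;
}
```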
[Figure: Long-mode interrupt-handler stack.]
Offset from RSP   With error code   With no error code
+40               Return SS
+32               Return RSP        Return SS
+24               Return RFLAGS     Return RSP
+16               Return CS         Return RFLAGS
+8                Return RIP        Return CS
+0                Error code        Return RIP
Interrupt-Stack Alignment. In legacy mode, the interrupt-stack pointer can be aligned at any address
boundary. Long mode, however, aligns the stack on a 16-byte boundary. This alignment is performed
by the processor in hardware before pushing items onto the stack frame. The previous RSP is saved
unconditionally on the new stack by the interrupt mechanism. A subsequent IRET instruction
automatically restores the previous RSP.
Aligning the stack on a 16-byte boundary allows optimal performance for saving and restoring the 16-
byte XMM registers. The interrupt handler can save and restore the XMM registers using the faster 16-
byte aligned loads and stores (MOVAPS), rather than unaligned loads and stores (MOVUPS).
Although the RSP alignment is always performed in long mode, it is only of consequence when the
interrupted program is already running at CPL=0, and it is generally used only within the operating-
system kernel. The operating system should put 16-byte aligned RSP values in the TSS for interrupts
that change privilege levels.
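The alignment arithmetic works as follows. The sketch assumes five eight-byte pushes (SS, RSP, RFLAGS, CS, RIP) plus an optional error code, as in the long-mode stack figures, and it ignores any stack switch. The function names are hypothetical.

```c
#include <stdint.h>

/* Long-mode interrupt entry aligns RSP down to a 16-byte boundary first. */
static uint64_t lm_align_rsp(uint64_t rsp)
{
    return rsp & ~UINT64_C(0xF);
}

/* RSP seen by the handler: aligned base minus five or six 8-byte pushes. */
static uint64_t lm_handler_rsp(uint64_t rsp, int has_error_code)
{
    return lm_align_rsp(rsp) - (uint64_t)(has_error_code ? 6 : 5) * 8u;
}
```

Take an interrupted RSP of 7FF8h. The frame is built below 7FF0h, so the handler starts with RSP=7FC8h when no error code is pushed and RSP=7FC0h when one is. Only the second value is itself 16-byte aligned. A handler that wants to spill XMM registers with MOVAPS may therefore still need to adjust RSP by 8 when no error code is present.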
Stack Switch. In long mode, the stack-switch mechanism differs slightly from the legacy stack-
switch mechanism (see “Interrupt To Higher Privilege” on page 234). When stacks are switched
during a long-mode privilege-level change resulting from an interrupt, a new SS descriptor is not
loaded from the TSS. Long mode only loads an inner-level RSP from the TSS. However, the SS
selector is loaded with a null selector, allowing nested control transfers, including interrupts, to be
handled properly in 64-bit mode. The SS.RPL is set to the new CPL value. See “Nested IRETs to 64-
Bit Mode Procedures” on page 247 for additional information.
The interrupt-handler stack that results from a privilege change in long mode looks identical to a long-
mode stack when no privilege change occurs. Figure 8-14 shows the stack after the switch is
performed and control is transferred to the interrupt handler.
[Figure 8-14: Long-mode interrupt-handler stack after a stack switch. The new RSP comes from the TSS, and SS=0 if the CPL changes.]
Offset from new RSP   With error code   With no error code
+40                   Return SS
+32                   Return RSP        Return SS
+24                   Return RFLAGS     Return RSP
+16                   Return CS         Return RFLAGS
+8                    Return RIP        Return CS
+0                    Error code        Return RIP
• The long-mode interrupt-gate and trap-gate descriptors define a 3-bit IST-index field in bits 2–0 of
byte +4. Figure 4-24 on page 91 shows the format of long-mode interrupt-gate and trap-gate
descriptors and the location of the IST-index field.
To enable the IST mechanism for a specific interrupt, system software stores a non-zero value in the
interrupt gate-descriptor IST-index field. If the IST index is zero, the modified legacy stack-switching
mechanism (described in the previous section) is used.
Figure 8-15 shows how the IST mechanism is used to create the interrupt-handler stack. When an
interrupt occurs and the IST index is non-zero, the processor uses the index to select the corresponding
IST pointer from the TSS. The IST pointer is loaded into the RSP to establish a new stack for the
interrupt handler. The SS register is loaded with a null selector if the CPL changes and the SS.RPL is
set to the new CPL value. After the stack is loaded, the processor pushes the old stack pointer,
RFLAGS, the return pointer, and the error code (if applicable) onto the stack. Control is then
transferred to the interrupt handler.
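Taken together, the stack-switch rules and the IST mechanism determine the new RSP and SS for a long-mode interrupt, as sketched below. The tss64_stacks structure is a hypothetical, simplified view of the 64-bit TSS stack pointers and does not reflect the TSS's in-memory layout. The function handles stack selection only.

```c
#include <stdint.h>

/* Hypothetical simplified view of the 64-bit TSS stack pointers. */
struct tss64_stacks {
    uint64_t rsp[3];   /* RSP0..RSP2 */
    uint64_t ist[8];   /* ist[1]..ist[7] hold IST1..IST7; ist[0] is unused */
};

struct lm_new_stack {
    uint64_t rsp;
    uint16_t ss;
};

static struct lm_new_stack lm_select_stack(const struct tss64_stacks *tss, unsigned ist_index,
                                           unsigned cpl, unsigned new_cpl,
                                           uint64_t current_rsp, uint16_t current_ss)
{
    struct lm_new_stack s = { current_rsp, current_ss };
    int cpl_changes = (new_cpl != cpl);

    if (ist_index != 0)
        s.rsp = tss->ist[ist_index & 7u];  /* IST: switch stacks even without a CPL change */
    else if (cpl_changes)
        s.rsp = tss->rsp[new_cpl];         /* modified legacy switch: RSP for the new CPL */

    if (cpl_changes)
        s.ss = (uint16_t)new_cpl;          /* null selector whose RPL is the new CPL */

    s.rsp &= ~UINT64_C(0xF);               /* 16-byte alignment before any pushes */
    return s;
}
```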
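[Figure 8-15: Long-mode IST mechanism. The IST index in the long-mode interrupt-gate or trap-gate descriptor selects one of the IST1 through IST7 pointers in the 64-bit TSS. That pointer becomes the new RSP (SS=0), and the interrupt-handler stack holds return SS (+40), return RSP (+32), return RFLAGS (+24), return CS (+16), return RIP (+8), and the error code at RSP.]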
which saves the stack pointer whether or not a privilege change occurs. IRET also allows a null
selector to be popped off the stack and into the SS register. See “Nested IRETs to 64-Bit Mode
Procedures” on page 247 for additional information.
• In compatibility mode, IRET behaves as it does in legacy mode. The SS:ESP is popped off the
stack only if a control transfer to less privilege (numerically greater CPL) is performed. Otherwise,
it is assumed that a stack pointer is not present on the interrupt-handler stack.
The long-mode interrupt mechanism always uses a 64-bit stack when saving values for the interrupt
handler, and the interrupt handler is always entered in 64-bit mode. To work properly, an IRET used to
exit the 64-bit-mode interrupt handler requires a series of eight-byte pops off the stack. This is
accomplished by using a REX.W prefix (64-bit operand size) with the IRET instruction, written IRETQ by
most assemblers. The default operand size of IRET in 64-bit mode is 32 bits, so 64-bit-mode interrupt
handlers need the REX.W prefix.
Nested IRETs to 64-Bit Mode Procedures. In long mode, an interrupt causes a null selector to be
loaded into the SS register if the CPL changes (this is the same action taken by a far CALL in long
mode). If the interrupt handler performs a far call, or is itself interrupted, the null SS selector is pushed
onto the stack frame, and another null selector is loaded into the SS register. Using a null selector in
this way allows the processor to properly handle returns nested within 64-bit-mode procedures and
interrupt handlers.
The null selector enables the processor to properly handle nested returns to 64-bit mode (which do not
use the SS register), and returns to compatibility mode (which do use the SS register). Normally, an
IRET that pops a null selector into the SS register causes a general-protection exception (#GP) to
occur. However, in long mode, the null selector indicates the existence of nested interrupt handlers
and/or privileged software in 64-bit mode. Long mode allows an IRET to pop a null selector into SS
from the stack under the following conditions:
• The target mode is 64-bit mode.
• The target CPL<3.
In this case, the processor does not load an SS descriptor, and the null selector is loaded into SS
without causing a #GP exception.
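The conditions above reduce to a small predicate. The sketch treats any selector whose index and TI bits are zero (values 0 through 3) as a null selector. It answers only one question: whether a null SS may be popped without #GP. It does not model the other checks IRET performs, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A null selector has index 0 and TI 0; only the RPL bits may be set. */
static bool is_null_selector(uint16_t sel)
{
    return (sel & 0xFFFCu) == 0;
}

/* Long mode accepts a popped null SS only for a 64-bit-mode target with CPL<3. */
static bool iret_null_ss_allowed(bool target_is_64bit_mode, unsigned target_cpl)
{
    return target_is_64bit_mode && target_cpl < 3;
}
```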
Background. Legacy-8086 programs expect to have full access to the EFLAGS interrupt flag (IF) bit,
allowing programs to enable and disable maskable external interrupts. When those programs run in
virtual-8086 mode under a multitasking protected-mode environment, it can disrupt the operating
system if programs enable or disable interrupts for their own purposes. This is particularly true if
interrupts associated with one program can occur during execution of another program. For example, a
program could request that an area of memory be copied to disk. System software could suspend the
program before external hardware uses an interrupt to acknowledge that the block has been copied.
System software could subsequently start a second program which enables interrupts. This second
program could receive the external interrupt indicating that the memory block of the first program has
been copied. If that were to happen, the second program would probably be unprepared to handle the
interrupt properly.
Access to the IF bit must be managed by system software on a task-by-task basis to prevent corruption
of system resources. In order to completely manage the IF bit, system software must be able to
interrupt all instructions that can read or write the bit. These instructions include STI, CLI, PUSHF,
POPF, INTn, and IRET. These instructions are part of an instruction class that is IOPL-sensitive. The
processor takes a general-protection exception (#GP) whenever an IOPL-sensitive instruction is
executed and the EFLAGS.IOPL field is less than the CPL. Because all virtual-8086 programs run at
CPL=3, system software can interrupt all instructions that modify the IF bit by setting IOPL<3.
System software maintains a virtual image of the IF bit for each virtual-8086 program by emulating the
actions of IOPL-sensitive instructions that modify the IF bit. When a maskable external interrupt
occurs, system software checks the state of the IF image for the current virtual-8086 program to
determine whether the program is masking interrupts. If the program is masking interrupts, system
software saves the interrupt information until the virtual-8086 program attempts to re-enable
interrupts. When the virtual-8086 program unmasks interrupts with an IOPL-sensitive instruction,
system software traps the action with the #GP handler.
The performance of a processor can be significantly degraded by the overhead of trapping and
emulating IOPL-sensitive instructions, and the overhead of maintaining images of the IF bit for each
virtual-8086 program. This performance loss can be eliminated by running virtual-8086 programs with
IOPL set to 3, thus allowing changes to the real IF flag from any privilege level. Unfortunately, this can
leave critical system resources unprotected.
In addition to the performance problems caused by virtualizing the IF bit, software interrupts (INTn
instructions) cannot be masked by the IF bit or virtual copies of the IF bit. The IF bit only affects
maskable external interrupts. Software interrupts in virtual-8086 mode are normally directed to the
real mode interrupt-vector table (IVT), but it can be desirable to redirect certain interrupts to the
protected-mode interrupt-descriptor table (IDT).
The virtual-8086-mode extensions are designed to support both external interrupts and software
interrupts, with mechanisms that preserve high performance without compromising protection.
Virtualization of external interrupts is supported using two bits in the EFLAGS register: the virtual-
interrupt flag (VIF) bit and the virtual-interrupt pending (VIP) bit. Redirection of software interrupts is
supported using the interrupt-redirection bitmap (IRB) in the TSS. A separate TSS can be created for
each virtual-8086 program, allowing system software to control interrupt redirection independently for
each virtual-8086 program.
VIF and VIP Extensions for External Interrupts. When VME extensions are enabled, the IF-
modifying instructions normally trapped by system software are allowed to execute. However, instead
of modifying the IF bit, they modify the EFLAGS VIF bit. This leaves control over maskable interrupts
to the system software. It can also be used as an indicator to system software that the virtual-8086
program is able to, or is expecting to, receive external interrupts.
When an unmasked external interrupt occurs, the processor transfers control from the virtual-8086
program to a protected-mode interrupt handler. If the interrupt handler determines that the interrupt is
for the virtual-8086 program, it can check the state of the VIF bit in the EFLAGS value pushed on the
stack for the virtual-8086 program. If the VIF bit is set (indicating the virtual-8086 program attempted
to unmask interrupts), system software can allow the interrupt to be handled by the appropriate virtual-
8086 interrupt handler.
If the VIF bit is clear (indicating the virtual-8086 program attempted to mask interrupts) and the
interrupt is for the virtual-8086 program, system software can hold the interrupt pending. System
software holds an interrupt pending by saving appropriate information about the interrupt, such as the
interrupt vector, and setting the virtual-8086 program's VIP bit in the EFLAGS image on the stack.
When the virtual-8086 program later attempts to set IF, the previously set VIP bit causes a general-
protection exception (#GP) to occur. System software can then pass the saved interrupt information to
the virtual-8086 interrupt handler.
To summarize, when the VME extensions are enabled (CR4.VME=1), the VIF and VIP bits are set and
cleared as follows:
• VIF Bit—This bit is set and cleared by the processor in virtual-8086 mode in response to an
attempt by a virtual-8086 program to set and clear the EFLAGS.IF bit. VIF is used by system
software to determine whether a maskable external interrupt should be passed on to the virtual-
8086 program, emulated by system software, or held pending. VIF is also cleared during software
interrupts through interrupt gates, with the original VIF value preserved in the EFLAGS image on
the stack.
• VIP Bit—System software sets and clears this bit in the EFLAGS image saved on the stack after an
interrupt. It can be set when an interrupt occurs for a virtual-8086 program that has a clear VIF bit.
The processor examines the VIP bit when an attempt is made by the virtual-8086 program to set
the IF bit. If VIP is set when the program attempts to set IF, a general-protection exception (#GP)
occurs before execution of the IF-setting instruction. System software must clear VIP to avoid
repeated #GP exceptions when returning to the interrupted instruction.
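The VIF and VIP rules can be sketched as follows. The sketch assumes CR4.VME=1, IOPL<3, and the usual EFLAGS bit positions: IF is bit 9, VIF is bit 19, and VIP is bit 20. vme_cli and vme_sti model only the processor's handling of CLI and STI in virtual-8086 mode. vme_route_interrupt and vme_deliver_pending are a hypothetical system-software policy that operates on the EFLAGS image saved on the stack.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_IF  (1u << 9)
#define EFLAGS_VIF (1u << 19)
#define EFLAGS_VIP (1u << 20)

enum vme_result { VME_OK, VME_GP };

/* CLI in virtual-8086 mode with VME: clears VIF, leaves IF untouched. */
static enum vme_result vme_cli(uint32_t *eflags)
{
    *eflags &= ~EFLAGS_VIF;
    return VME_OK;
}

/* STI: #GP before execution if VIP is set; otherwise sets VIF, not IF. */
static enum vme_result vme_sti(uint32_t *eflags)
{
    if (*eflags & EFLAGS_VIP)
        return VME_GP;
    *eflags |= EFLAGS_VIF;
    return VME_OK;
}

/* Hypothetical policy: deliver now if VIF is set, else hold pending via VIP. */
static bool vme_route_interrupt(uint32_t *saved_eflags)
{
    if (*saved_eflags & EFLAGS_VIF)
        return true;
    *saved_eflags |= EFLAGS_VIP;
    return false;
}

/* Hypothetical #GP-handler step: clear VIP before passing on the held interrupt. */
static void vme_deliver_pending(uint32_t *saved_eflags)
{
    *saved_eflags &= ~EFLAGS_VIP;
}
```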
The VIF and VIP bits can be used by system software to minimize the overhead associated with
managing maskable external interrupts because virtual copies of the IF flag do not have to be
maintained by system software. Instead, VIF and VIP are maintained during context switches along
with the remaining EFLAGS bits.
Table 8-11 on page 252 shows how the behavior of instructions that modify the IF bit is affected by
the VME extensions.
• Recoverable—The error has been corrected by the processor. Recoverable errors do not cause a
machine check exception (#MC). However, the error is still logged in the machine-check registers.
It is the responsibility of system software to periodically poll the machine-check registers to
determine whether recoverable errors have occurred.
• Fatal/Unrecoverable—The error cannot be corrected by the processor. Unrecoverable errors cause
a machine check exception if CR4.MCE is set to 1.
In both cases, the contents of the machine-check registers are maintained through a warm reset, which
allows errors to be reported even if a reset occurs.
Machine-Check Global-Capabilities Register. Figure 9-1 shows the format of the machine-check
global-capabilities register (MCG_CAP). MCG_CAP is a read-only register that specifies the
machine-check mechanism capabilities supported by the processor implementation.
[Figure 9-1: MCG_CAP register format. Bits 63–9: Reserved. Bit 8: MCG_CTL_P (CTLP). Bits 7–0: Count.]
• MCG_CTL Register Present (MCG_CTL_P) - Bit 8. This bit specifies whether or not the
MCG_CTL register is supported by the processor. When the bit is set to 1, the
register is supported. When the bit is cleared to 0, the register is unsupported.
All remaining bits in the MCG_CAP register are reserved. Writing values to the MCG_CAP register
produces undefined results.
Machine-Check Global-Status Register. Figure 9-2 shows the format of the machine-check global-
status register (MCG_STATUS). MCG_STATUS provides basic information about the processor state
after the occurrence of a machine-check error.
[Figure 9-2: MCG_STATUS register format. Bits 63–3: Reserved. Bit 2: MCIP. Bit 1: EIPV. Bit 0: RIPV.]
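The sketch below decodes MCG_STATUS using the Figure 9-2 bit positions: RIPV in bit 0, EIPV in bit 1, and MCIP in bit 2. It interprets only EIPV, following the rule given for #MC that the saved CS:rIP points to the instruction that caused the error when EIPV is set. Reading the MSR itself (RDMSR at CPL=0) is not shown, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (UINT64_C(1) << 0)
#define MCG_STATUS_EIPV (UINT64_C(1) << 1)
#define MCG_STATUS_MCIP (UINT64_C(1) << 2)

struct mcg_status {
    bool ripv;
    bool eipv;
    bool mcip;
};

static struct mcg_status decode_mcg_status(uint64_t v)
{
    struct mcg_status s = {
        (v & MCG_STATUS_RIPV) != 0,
        (v & MCG_STATUS_EIPV) != 0,
        (v & MCG_STATUS_MCIP) != 0,
    };
    return s;
}

/* EIPV set: saved CS:rIP points to the instruction that caused the error.
   EIPV clear: that instruction is unknown, or the error is not tied to one. */
static bool mc_saved_rip_is_faulting_insn(uint64_t v)
{
    return decode_mcg_status(v).eipv;
}
```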
Machine-Check Global-Control Register. Figure 9-3 shows the format of the machine-check
global-control register (MCG_CTL). MCG_CTL is used by software to control reporting machine-
check errors from various sources. Each error-reporting register bank supported by the processor is
controlled by a corresponding enable bit in this register. Setting all bits to 1 in this register enables all
error-reporting register banks. The number of controls and how they are used is implementation-
specific (for further information, see the documentation for particular implementations of the
architecture). The presence of the MCG_CTL register is indicated by the MCG_CAP register
MCG_CTL_P bit, described on page 257.
[Figure 9-3: MCG_CTL register format. Bits 63–0: error-reporting register-bank enable bits EN63 through EN0.]
CPU Watchdog Timer Register. The CPU watchdog timer generates a machine-check condition when
an instruction does not complete within a time period specified by the CPU Watchdog Timer register.
When the timer is enabled by the CPU Watchdog Timer Enable bit, it restarts its count each time an
instruction completes. The time period is determined by the Count Select and Time Base fields. The
timer does not count during halt or stop-grant. The machine-check condition is controlled by the
appropriate MCi_CTL register.
The format of the CPU watchdog timer is shown in Figure 9-4.
[Figure 9-4: CPU Watchdog Timer register format. Bits 63–7: Reserved, MBZ. Bits 6–3: Count Select (CS). Bits 2–1: Time Base (TB). Bit 0: Enable (EN).]
CPU Watchdog Timer Enable (EN) - Bit 0. This bit specifies whether the CPU Watchdog Timer is
enabled. When the bit is set to 1, the timer increments and generates a machine check when the timer
expires. When cleared to 0, the timer does not increment and no machine check is generated.
CPU Watchdog Timer Time Base (TB) - Bits 2-1. Specifies the time base for the time-out period
indicated in the Count Select field. The allowable time base values are provided in Table 9-1.
CPU Watchdog Timer Count Select (CS) - Bits 6-3. Specifies the time period required for the CPU
Watchdog Timer to expire. The time period is this value times the time base specified in the Time Base
field. The allowable values are shown in Table 9-2.
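A minimal sketch of packing and unpacking the CPU Watchdog Timer fields, assuming only the bit positions given above. The helper names are hypothetical, the register's MSR address is not given in this section and is deliberately omitted, and because Tables 9-1 and 9-2 are not reproduced here the caller must supply the time-base unit and the count that correspond to a given TB and CS encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field positions from the register description:
 * EN = bit 0, TB = bits 2-1, CS = bits 6-3; bits 63-7 are reserved, MBZ. */
#define CPUWDT_EN       (1ULL << 0)
#define CPUWDT_TB_SHIFT 1
#define CPUWDT_TB_MASK  0x3ULL
#define CPUWDT_CS_SHIFT 3
#define CPUWDT_CS_MASK  0xFULL

/* Build a register value from field encodings; reserved bits stay 0. */
static inline uint64_t cpuwdt_encode(bool enable, unsigned tb, unsigned cs)
{
    assert(tb <= CPUWDT_TB_MASK && cs <= CPUWDT_CS_MASK);
    return (enable ? CPUWDT_EN : 0)
         | ((uint64_t)tb << CPUWDT_TB_SHIFT)
         | ((uint64_t)cs << CPUWDT_CS_SHIFT);
}

static inline bool cpuwdt_enabled(uint64_t r) { return (r & CPUWDT_EN) != 0; }
static inline unsigned cpuwdt_tb(uint64_t r) { return (unsigned)((r >> CPUWDT_TB_SHIFT) & CPUWDT_TB_MASK); }
static inline unsigned cpuwdt_cs(uint64_t r) { return (unsigned)((r >> CPUWDT_CS_SHIFT) & CPUWDT_CS_MASK); }

/* Time-out period = count selected by CS times the time base selected by TB.
 * Both values come from Tables 9-1 and 9-2 and are supplied by the caller. */
static inline uint64_t cpuwdt_period(uint64_t count, uint64_t time_base)
{
    return count * time_base;
}
```

For example, enabling the timer with TB encoding 2 and CS encoding 5 yields the register value 2Dh.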
See the BIOS and Kernel Developer’s Guide for AMD Athlon™ 64 and AMD Opteron™ Processors (order#
26094) for information on particular implementations of the AMD64 architecture.
Software reads the MCG_CAP register to determine the number of supported register banks. The first
error-reporting register (MC0_CTL) always starts with MSR address 400h, followed by
MC0_STATUS (401h), MC0_ADDR (402h), and MC0_MISC (403h). Error-reporting-register MSR
addresses are assigned sequentially through the remaining supported register banks. Using this
information, software can access all error-reporting registers in an implementation-independent
manner.
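The MSR numbering above reduces to simple arithmetic. This sketch assumes that the Count field in MCG_CAP bits 7–0 gives the number of banks and that each bank occupies four consecutive MSRs starting at 400h; the function and enum names are illustrative only, and actually reading an MSR requires RDMSR at CPL 0, which is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_MC0_CTL 0x400u   /* MC0_CTL always starts at MSR 400h */

/* Offsets of the four registers within one error-reporting bank. */
enum mc_reg { MC_CTL = 0, MC_STATUS = 1, MC_ADDR = 2, MC_MISC = 3 };

/* MCG_CAP.Count, bits 7-0: number of error-reporting register banks. */
static inline unsigned mcg_cap_count(uint64_t mcg_cap)
{
    return (unsigned)(mcg_cap & 0xFFu);
}

/* MCG_CAP.MCG_CTL_P, bit 8: 1 if the MCG_CTL register is present. */
static inline int mcg_cap_ctl_present(uint64_t mcg_cap)
{
    return (int)((mcg_cap >> 8) & 1u);
}

/* MSR address of register `reg` in bank `bank`; banks are sequential. */
static inline uint32_t mc_bank_msr(uint64_t mcg_cap, unsigned bank, enum mc_reg reg)
{
    assert(bank < mcg_cap_count(mcg_cap));   /* only banks that exist */
    return MSR_MC0_CTL + 4u * bank + (uint32_t)reg;
}
```

For example, MC2_STATUS is at 400h + 4 × 2 + 1 = 409h.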
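[Figure: MCi_STATUS register format. Bit 63: VAL. Bit 62: OVER. Bit 61: UC. Bit 60: EN. Bit 59: MISCV. Bit 58: ADDRV. Bit 57: PCC. Bits 56–32: Other Information. Bits 31–16: Model-Specific Error Code. Bits 15–0: MCA Error Code.]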
• MCA Error Code—Bits 15–0. See the appropriate implementation-specific BIOS and kernel
developer’s guide for information on the format and encoding of the MCA error code.
• Model-Specific Error Code—Bits 31–16. This field encodes model-specific information about the
error. For further information, see the documentation for particular implementations of the
architecture.
• Other Information—Bits 56–32. This field holds model-specific error information. Software
should not rely on the field definitions being consistent between processor implementations.
Presently, the bits in this field are defined as:
- Bits 44–32—Reserved.
- Bit 45—When set to 1, this bit indicates the error is an uncorrectable ECC error.
- Bit 46—When set to 1, this bit indicates the error is a correctable ECC error.
- Bits 54–47—This field holds the ECC syndrome when an ECC error occurs.
- Bits 56–55—Reserved.
• PCC—Bit 57. When set to 1, this bit indicates that the processor state is likely to be corrupt due to
the machine-check error. In this case, software might not be able to restart the processor reliably.
When this bit is cleared to 0, the processor state is not corrupted by the machine-check error. If the
PCC bit is set in any error bank, the processor clears RIPV and EIPV in the MCG_STATUS register.
• ADDRV—Bit 58. When set to 1, this bit indicates that the address saved in the corresponding error-
reporting address register (MCi_ADDR) is valid, and contains the address where the error was
detected. When this bit is cleared to 0, MCi_ADDR does not contain a valid error address.
• MISCV—Bit 59. When set to 1, this bit indicates that additional information about the machine-
check error is saved in the corresponding error-reporting miscellaneous register (MCi_MISC).
This bit is cleared to 0 when the MCi_MISC registers are not implemented.
• EN—Bit 60. When set to 1, this bit indicates that the error condition is enabled in the
corresponding error-reporting control register (MCi_CTL). Errors disabled by MCi_CTL do not
cause a machine-check exception, but the machine-check mechanism can log errors when error
reporting is disabled in MCi_CTL.
• UC—Bit 61. When set to 1, this bit indicates that the processor did not correct the error condition.
When this bit is cleared to 0, the processor corrected the error condition.
• OVER—Bit 62. This bit is set to 1 by the processor if the VAL bit is already set to 1 as the
processor attempts to load error information into MCi_STATUS. This indicates that the results of a
previous machine-check error are still in the MCi_STATUS register. In this situation, the machine-
check mechanism handles the contents of MCi_STATUS as follows:
- Status for an enabled error replaces status for a disabled error.
- Status for an uncorrectable error replaces status for a correctable error.
- Status for an enabled uncorrectable error is never replaced.
• VAL—Bit 63. This bit is set to 1 by the processor if the contents of MCi_STATUS are valid.
Software should clear the VAL bit after reading the MCi_STATUS register; otherwise, a subsequent
machine-check error sets the OVER bit as described above.
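The following sketch decodes the MCi_STATUS fields described above and expresses the OVER-case replacement rules as a predicate. It is illustrative only: the macro and function names are hypothetical, the Other Information bit assignments are model-specific (as the text warns), and the text does not say what happens when none of the three rules applies, so the sketch assumes the existing status is kept in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCI_STATUS_VAL   (1ULL << 63)
#define MCI_STATUS_OVER  (1ULL << 62)
#define MCI_STATUS_UC    (1ULL << 61)
#define MCI_STATUS_EN    (1ULL << 60)
#define MCI_STATUS_MISCV (1ULL << 59)
#define MCI_STATUS_ADDRV (1ULL << 58)
#define MCI_STATUS_PCC   (1ULL << 57)

static inline uint16_t mci_mca_error_code(uint64_t s)   { return (uint16_t)(s & 0xFFFFu); }
static inline uint16_t mci_model_error_code(uint64_t s) { return (uint16_t)((s >> 16) & 0xFFFFu); }
/* Model-specific: presently bits 54-47 hold the ECC syndrome. */
static inline uint8_t mci_ecc_syndrome(uint64_t s)      { return (uint8_t)((s >> 47) & 0xFFu); }

/* OVER case: does status for a new error replace the valid status already in
 * MCi_STATUS? "Never replaced" is absolute, so it is checked first. Cases the
 * text does not cover are assumed to keep the existing status. */
static bool mci_new_status_replaces(uint64_t old_s, uint64_t new_s)
{
    assert(old_s & MCI_STATUS_VAL);          /* OVER applies only if VAL = 1 */
    bool old_en = old_s & MCI_STATUS_EN, old_uc = old_s & MCI_STATUS_UC;
    bool new_en = new_s & MCI_STATUS_EN, new_uc = new_s & MCI_STATUS_UC;

    if (old_en && old_uc) return false;      /* enabled uncorrectable: never replaced */
    if (new_en && !old_en) return true;      /* enabled replaces disabled */
    if (new_uc && !old_uc) return true;      /* uncorrectable replaces correctable */
    return false;                            /* assumption: keep existing status */
}
```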
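[Figure: error-reporting register bank. MCi_CTL, MCi_STATUS, MCi_ADDR, and MCi_MISC0 are followed by additional miscellaneous registers MCi_MISC1, MCi_MISC2, MCi_MISC3, MCi_MISC4, …; the figure labels the additional block with the address C000_0400h + (MCi_MISC0[BlkPtr] << 3).]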
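[Figure: MCi_MISC0 register format. Bit 63: VAL. Bit 62: CNTP. Bit 61: LOCKED. Bits 60–56: Reserved. Bits 55–52: LVTOFF. Bit 51: CNTEN. Bits 50–49: INTT. Bit 48: OVRFLW. Bits 47–32: ERRCNT. Bits 31–24: BLKPTR. Bits 23–0: Reserved.]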
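A small sketch of the address expression shown in the register-bank figure, assuming BLKPTR occupies MCi_MISC0 bits 31–24 as drawn. This section does not spell out exactly which register the result addresses (beyond the figure's placement next to MCi_MISC1) or how a BLKPTR of 0 should be interpreted, so check the implementation-specific documentation before relying on it; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MCi_MISC0.BLKPTR, bits 31-24, as drawn in the register figure. */
static inline uint8_t mci_misc0_blkptr(uint64_t misc0)
{
    return (uint8_t)((misc0 >> 24) & 0xFFu);
}

/* The figure's address expression: C000_0400h + (MCi_MISC0[BlkPtr] << 3). */
static inline uint32_t mci_misc_block_msr(uint64_t misc0)
{
    uint32_t msr = 0xC0000400u + ((uint32_t)mci_misc0_blkptr(misc0) << 3);
    assert(msr >= 0xC0000400u && msr <= 0xC0000BF8u);   /* 8-bit BLKPTR range */
    return msr;
}
```

With BLKPTR = 01h, the expression gives C000_0400h + 8h = C000_0408h.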
• Software can periodically examine the machine-check status registers for reported errors, and log
any errors found.
• Software can enable the machine-check exception (#MC). When an uncorrectable error occurs, the
processor immediately transfers control to the machine-check exception handler. In this case,
system software provides a machine-check exception handler that, at a minimum, logs detected
errors. The exception handler can be designed for a specific processor implementation or can be
generalized to work on multiple implementations.
• When identifying the error condition, portable exception handlers should examine only the
MCi_STATUS register MCA error-code field. See “Error Codes” on page 262 for information on
interpreting this field.
• If the MCG_STATUS.RIPV bit is set to 1, the interrupted program can be restarted reliably at the
instruction-pointer address pushed onto the exception-handler stack. If RIPV=0, the interrupted
program cannot be restarted reliably at that location, although it can be restarted at that location for
debugging purposes.
• When logging errors, particularly those that are not recoverable, check the MCG_STATUS.EIPV
bit to see if the instruction-pointer address pushed onto the exception-handler stack is related to the
machine-check error. If EIPV=0, the address is not guaranteed to be related to the error.
• Before exiting the machine-check handler, be sure to clear MCG_STATUS.MCIP to 0. MCIP
indicates a machine-check exception occurred. If this bit is set when another machine-check
exception occurs, the processor enters the shutdown state.
• When an exception handler is able to, at a minimum, successfully log an error condition, the
MCi_STATUS registers should be cleared to 0 before exiting the machine-check handler. Software
is responsible for clearing at least the MCi_STATUS.VAL bits.
• Additional machine-check exception-handler portability can be added by having the handler use
the CPUID instruction to identify the processor and its capabilities. Implementation-specific
software can be added to the machine-check exception handler based on the processor information
reported by CPUID.
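The guidelines above can be combined into a handler skeleton. This is a sketch under stated assumptions, not a production handler: MSR accesses go through hypothetical `sim_rdmsr` and `sim_wrmsr` functions backed by an array so the flow can run in user space (a real handler executes RDMSR and WRMSR at CPL 0), and the MSR numbers 179h (MCG_CAP) and 17Ah (MCG_STATUS) are not given in this section and should be checked against the MSR listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural MSR numbers (not given in this section). */
#define MSR_MCG_CAP     0x179u
#define MSR_MCG_STATUS  0x17Au
#define MSR_MC0_CTL     0x400u

#define MCG_STATUS_RIPV (1ULL << 0)
#define MCG_STATUS_EIPV (1ULL << 1)
#define MCG_STATUS_MCIP (1ULL << 2)
#define MCI_STATUS_VAL  (1ULL << 63)

/* Hypothetical accessors over a simulated MSR space; 0x800 entries cover
 * all 255 banks that an 8-bit Count can describe. */
static uint64_t sim_msr[0x800];
static uint64_t sim_rdmsr(uint32_t msr) { assert(msr < 0x800u); return sim_msr[msr]; }
static void sim_wrmsr(uint32_t msr, uint64_t v) { assert(msr < 0x800u); sim_msr[msr] = v; }

/* Logs and clears valid banks; returns true if RIPV says the interrupted
 * program can be restarted at the pushed instruction pointer. */
static bool mc_handler(void)
{
    uint64_t mcg_status = sim_rdmsr(MSR_MCG_STATUS);
    unsigned banks = (unsigned)(sim_rdmsr(MSR_MCG_CAP) & 0xFFu);   /* Count */

    for (unsigned i = 0; i < banks; i++) {
        uint32_t status_msr = MSR_MC0_CTL + 4u * i + 1u;            /* MCi_STATUS */
        uint64_t status = sim_rdmsr(status_msr);
        if (!(status & MCI_STATUS_VAL))
            continue;
        /* A portable handler interprets only the MCA error code (bits 15-0). */
        printf("bank %u: MCA error code %04Xh%s\n", i, (unsigned)(status & 0xFFFFu),
               (mcg_status & MCG_STATUS_EIPV) ? "" : " (saved RIP may be unrelated)");
        sim_wrmsr(status_msr, 0);   /* error logged: clear MCi_STATUS, including VAL */
    }

    bool restartable = (mcg_status & MCG_STATUS_RIPV) != 0;
    /* Clear MCIP before exiting: a #MC while MCIP = 1 causes shutdown. */
    sim_wrmsr(MSR_MCG_STATUS, mcg_status & ~MCG_STATUS_MCIP);
    return restartable;
}
```

Clearing MCIP is done last so that the handler does not reopen the window for a nested machine check before logging has finished.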
10 System-Management Mode
System-management mode (SMM) is an operating mode designed for system-control activities like
power management. Normally, these activities are transparent to conventional operating systems and
applications. SMM is used by system-specific BIOS (basic input-output system) and specialized low-
level device drivers, rather than the operating system.
The SMM interrupt-handling mechanism differs substantially from the standard interrupt-handling
mechanism described in Chapter 8, “Exceptions and Interrupts.” SMM is entered using a special
external interrupt called the system-management interrupt (SMI). After an SMI is received by the
processor, the processor saves the processor state in a separate address space, called SMRAM. The
SMM-handler software and data structures are also located in the SMRAM space. Interrupts and
exceptions that ordinarily cause control transfers to the operating system are disabled when SMM is
entered. The processor exits SMM, restores the saved processor state, and resumes normal execution
by using a special instruction, RSM.
In SMM, address translation is disabled and addressing is similar to real mode. SMM programs can
address up to 4 Gbytes of physical memory. See “SMM Operating-Environment” on page 281 for
additional information on memory addressing in SMM.
The following sections describe the components of the SMM mechanism:
• “SMM Resources” on page 272—this section describes SMRAM, the SMRAM save-state area
used to hold the processor state, and special SMRAM save-state entries used in support of SMM.
• “Using SMM” on page 281—this section describes the mechanism of entering and exiting SMM.
It also describes SMM memory allocation, addressing, and interrupts and exceptions.
Of these mechanisms, only the format of the SMRAM save-state area differs between the AMD64
architecture and the legacy architecture.
• Some previous AMD x86 processors saved and restored the CR2 register in the SMRAM state-
save area. This register is not saved by the SMM implementation in the AMD64 architecture.
SMM handlers that save and restore CR2 must perform the operation in software.
10.2.1 SMRAM
SMRAM is the memory-address space accessed by the processor when in SMM. The default size of
SMRAM is 64 Kbytes and can range in size between 32 Kbytes and 4 Gbytes. System logic can use
physically separate SMRAM and main memory, directing memory transactions to SMRAM after
recognizing SMM is entered, and redirecting memory transactions back to system memory after
recognizing SMM is exited. When separate SMRAM and main memory are used, the system designer
needs to provide a method of mapping SMRAM into main memory so that the SMI handler and data
structures can be loaded.
Figure 10-1 on page 273 shows the default SMRAM memory map. The default SMRAM code-
segment (CS) has a base address of 0003_0000h (the base address is automatically scaled by the
processor using the CS-selector register, which is set to the value 3000h). This default SMRAM-base
address is known as SMBASE. A 64-Kbyte memory region, addressed from 0003_0000h to
0003_FFFFh, makes up the default SMRAM memory space. The top 32 Kbytes (0003_8000h to
0003_FFFFh) must be supported by system logic, with physical memory covering that entire address
range. The top 512 bytes (0003_FE00h to 0003_FFFFh) of this address range are the default SMM
state-save area. The default entry point for the SMM interrupt handler is located at 0003_8000h.
[Figure 10-1: Default SMRAM memory map. SMRAM spans 0003_0000h (SMBASE) to 0003_FFFFh (SMBASE+FFFFh). The SMM handler entry point is at 0003_8000h (SMBASE+8000h), and the SMM state-save area occupies 0003_FE00h–0003_FFFFh.]
[Figure: SMBASE format. Bits 31–0: SMRAM Base.]
In some operating environments, relocation of SMRAM to a higher memory area can provide more
low memory for legacy software. SMBASE relocation is supported when the SMM-base relocation bit
in the SMM-revision identifier (bit 17) is set to 1. In processors implementing the AMD64
architecture, SMBASE relocation is always supported.
Software can only modify SMBASE (relocate the SMRAM-base address) by entering SMM,
modifying the SMBASE image stored in the SMRAM state-save area, and exiting SMM. The SMM-
handler entry point must be loaded at the new memory location specified by SMBASE+8000h. The
next time SMM is entered, the processor saves its state in the new state-save area at
SMBASE+0FE00h, and begins executing the SMM handler at SMBASE+8000h. The new SMBASE
address is used for every SMM until it is changed, or a hardware reset occurs.
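The SMRAM locations that follow from a given SMBASE can be computed directly. This minimal sketch uses hypothetical helper names and 32-bit arithmetic, since SMBASE must be below 4 Gbytes; the assertion that the state-save area (which ends at SMBASE+FFFFh) stays below 4 Gbytes is an added sanity check, not a rule stated here.

```c
#include <assert.h>
#include <stdint.h>

/* SMM handler entry point: SMBASE + 8000h. */
static inline uint32_t smm_entry_point(uint32_t smbase)
{
    assert(smbase <= UINT32_MAX - 0xFFFFu);   /* save area must not wrap past 4 GB */
    return smbase + 0x8000u;
}

/* Start of the SMM state-save area: SMBASE + FE00h (it ends at SMBASE + FFFFh). */
static inline uint32_t smm_save_area(uint32_t smbase)
{
    assert(smbase <= UINT32_MAX - 0xFFFFu);
    return smbase + 0xFE00u;
}
```

With the default SMBASE of 0003_0000h, these return 0003_8000h and 0003_FE00h, matching the default memory map.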
When SMBASE is used to relocate SMRAM to an address above 1 Mbyte, 32-bit address-size-
override prefixes must be used to access this memory. This is because addressing in SMM behaves as it
does in real mode, with a 16-bit default operand size and address size. The values in the 16-bit
segment-selector registers are left-shifted four bits to form a 20-bit segment-base address. Without
using address-size overrides, the maximum computable address is 10FFEFh.
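The 10FFEFh limit follows from the real-mode address calculation. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode-style address formation used in SMM without an address-size
 * override: the 16-bit selector value shifted left 4 bits plus a 16-bit offset. */
static inline uint32_t smm_16bit_address(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 4) + offset;
}
```

The largest result is FFFFh shifted left four bits plus FFFFh, that is, FFFF0h + FFFFh = 10FFEFh.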
Because SMM memory-addressing is similar to real-mode addressing, the SMBASE address must be
less than 4 Gbytes. Physical-address extensions (CR4.PAE) should not be enabled in SMM, restricting
the SMRAM address space to the range 0h to 0FFFF_FFFFh.
A number of other registers are not saved or restored automatically by the SMM mechanism. See
“Saving Additional Processor State” on page 283 for information on using these registers in SMM.
As a reference for legacy processor implementations, the legacy SMM state-save area format is shown
in Table 10-2. Implementations of the AMD64 architecture do not use this format.
[Figure: SMM-revision identifier format. Bits 31–18: Reserved. Bit 17: SMM Base Relocation (shown as 1). Bit 16: I/O Instruction Restart (shown as 1). Bits 15–0: SMM-Revision Level.]
For example, if the TSeg range spans 256 Kbytes starting at address 10_0000h, then SMM_ADDR
= 0010_0000h and SMM_MASK = FFFC_0000h (with zeros in bits 16:0). This results in a TSeg address
range from 0010_0000h to 0013_FFFFh.
[Figure: SMM_ADDR register format. Bits 63–52: Reserved, MBZ. Bits 51–17: BASE (this is an architectural limit; a given implementation may support fewer bits). Bits 16–0: Reserved, MBZ.]
• SMM TSeg Base Address (BASE)—Bits 51-17. Specifies the base address of the TSeg range of
protected addresses.
[Figure: SMM_MASK register format. Bits 63–52: Reserved, MBZ. Bits 51–17: MASK (this is an architectural limit; a given implementation may support fewer bits). Bits 16–2: Reserved, MBZ. Bit 1: TE. Bit 0: AE.]
• ASeg Address Range Enable (AE)—Bit 0. Specifies whether the ASeg address range is enabled for
protection. When the bit is set to 1, the ASeg address range is enabled for protection. When cleared
to 0, the ASeg address range is disabled for protection.
• TSeg Address Range Enable (TE)—Bit 1. Specifies whether the TSeg address range is enabled for
protection. When the bit is set to 1, the TSeg address range is enabled for protection. When cleared
to 0, the TSeg address range is disabled for protection.
• TSeg Mask (MASK)—Bits 51-17. Specifies the mask used to determine the TSeg range of protected
addresses. A physical address is in the TSeg range if the following is true:
PhysAddr[51:17] & SMM_MASK[51:17] = SMM_ADDR[51:17] & SMM_MASK[51:17].
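The TSeg membership test can be sketched as follows, applying the comparison only to bits 51:17. It assumes an implementation with all 52 physical-address bits, so the 256-Kbyte example above corresponds to an SMM_MASK value of 000F_FFFF_FFFC_0002h when TE is set; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 51:17, the architectural extent of SMM_ADDR.BASE and SMM_MASK.MASK. */
#define SMM_BITS_51_17 (((1ULL << 52) - 1) & ~((1ULL << 17) - 1))
#define SMM_MASK_AE    (1ULL << 0)   /* ASeg range protection enable */
#define SMM_MASK_TE    (1ULL << 1)   /* TSeg range protection enable */

/* PhysAddr[51:17] & SMM_MASK[51:17] == SMM_ADDR[51:17] & SMM_MASK[51:17] */
static inline bool in_tseg(uint64_t phys, uint64_t smm_addr, uint64_t smm_mask)
{
    uint64_t m = smm_mask & SMM_BITS_51_17;
    return (phys & m) == (smm_addr & m);
}

/* The range is protected only when SMM_MASK.TE = 1. */
static inline bool tseg_protected(uint64_t phys, uint64_t smm_addr, uint64_t smm_mask)
{
    return (smm_mask & SMM_MASK_TE) != 0 && in_tseg(phys, smm_addr, smm_mask);
}
```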
The SMM handler can write the auto-halt restart entry to specify whether the return from SMM should
take the processor back to the halt state or to the instruction-execution state specified by the SMM
state-save area. The values written are:
• Clear to 00h—The processor returns to the state specified by the SMM state-save area.
• Set to any non-zero value—The processor returns to the halt state.
If the return from SMM takes the processor back to the halt state, the HLT instruction is not re-
executed. However, the halt special bus-cycle is driven on the processor bus after the RSM instruction
executes.
The result of entering SMM from a non-halt state and returning to a halt state is not predictable.
[Figure: 128-bit media registers XMM0–XMM15; 64-bit media registers MMX0–MMX7 aliased onto x87 physical data registers FPR0–FPR7; x87 status word (FSW) TOP and ES fields and x87 tag word (FTW), written by processor hardware.]
The 64-bit media instructions and x87 floating-point instructions share the same physical data
registers. Figure 11-2 shows how the 64-bit registers (MMX0–MMX7) are aliased onto the low 64 bits
of the 80-bit x87 floating-point physical data registers (FPR0–FPR7). Refer to “64-Bit Media
Programming” in Volume 1 for more information on these registers.
Of the registers shown in Figure 11-2, only the eight 64-bit MMX registers are visible to 64-bit media
application software. The processor maintains the contents of the two fields of the x87 status word—
top-of-stack-pointer (TOP) and exception summary (ES)—and the 16-bit x87 tag word during
execution of 64-bit media instructions, as described in “Actions Taken on Executing 64-Bit Media
Instructions” in Volume 1.
64-bit media instructions do not generate x87 floating-point exceptions, nor do they set any status
flags. However, 64-bit media instructions can trigger an unmasked floating-point exception caused by
a previously executed x87 instruction. 64-bit media instructions do this by reading the x87 FSW.ES bit
to determine whether such an exception is pending.
[Figure: x87 physical data registers FPR0–FPR7.]
FSAVE/FNSAVE and FRSTOR Instructions. The FSAVE/FNSAVE and FRSTOR instructions save
and restore the entire register state for 64-bit media instructions and x87 floating-point instructions.
The FSAVE instruction stores the register state, but only after handling any pending unmasked-x87
floating-point exceptions. The FNSAVE instruction stores the register state but skips the reporting and
handling of these exceptions. The state of all MMX/FPR registers is saved, as well as all other x87
state (the control word register, status word register, tag word, instruction pointer, data pointer, and last
opcode). After saving this state, the tag state for all MMX/FPR registers is changed to empty and is
thus available for a new procedure.
Starting on page 297, Figure 11-4 through Figure 11-7 show the memory formats used by the
FSAVE/FNSAVE and FRSTOR instructions when storing the x87 state in various processor modes
and using various effective-operand sizes. This state includes:
• x87 Data Registers
- FPR0–FPR7 80-bit physical data registers.
• x87 Environment
- FCW: x87 control word register
- FSW: x87 status word register
- FTW: x87 tag word
- Last x87 instruction pointer
- Last x87 data pointer
- Last x87 opcode
The eight data registers are stored in the 80 bytes following the environment information. Instead of
storing these registers in their physical order (FPR0–FPR7), the processor stores the registers in
their stack order, ST(0)–ST(7), beginning with the top-of-stack, ST(0).
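Because FSAVE stores registers in stack order, software that wants a particular physical register must translate through TOP. This sketch assumes the standard location of TOP in FSW bits 13–11 (not stated in this section) and derives the FSAVE offsets from the 28-byte and 14-byte environment sizes implied by the register offsets in Figures 11-4 through 11-7; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TOP: x87 status word bits 13-11. */
static inline unsigned fsw_top(uint16_t fsw) { return (fsw >> 11) & 7u; }

/* Physical register FPRn that currently holds stack register ST(i). */
static inline unsigned st_to_fpr(uint16_t fsw, unsigned i)
{
    assert(i < 8);
    return (fsw_top(fsw) + i) & 7u;
}

/* Byte offset of ST(i) in an FSAVE image: 10 bytes per register, after a
 * 28-byte (32-bit operand size) or 14-byte (16-bit operand size) environment. */
static inline unsigned fsave_st_offset(unsigned i, int operand32)
{
    assert(i < 8);
    return (operand32 ? 0x1Cu : 0x0Eu) + 10u * i;
}
```

For example, with TOP = 5, ST(0) is FPR5 and ST(3) wraps around to FPR0.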
[Figures 11-4 through 11-7: FSAVE/FNSAVE and FRSTOR memory formats. With a 32-bit operand size, the x87 environment occupies +00h–1Bh and ST(0)–ST(7) occupy +1Ch–6Bh; with a 16-bit operand size, the environment occupies +00h–0Dh and ST(0)–ST(7) occupy +0Eh–5Dh. Each data register takes 10 bytes.]
FXSAVE and FXRSTOR Instructions. The FXSAVE and FXRSTOR instructions save and restore
the entire 128-bit media, 64-bit media, and x87 state. These instructions usually execute faster than
FSAVE/FNSAVE and FRSTOR because they do not normally save and restore the x87 exception
pointers (last-instruction pointer, last data-operand pointer, and last opcode). The only case in which
they do save the exception pointers is the relatively rare case in which the exception-summary bit in the
x87 status word (FSW.ES) is set to 1, indicating that an unmasked exception has occurred. The
FXSAVE and FXRSTOR memory format contains fields for storing these values.
Unlike FSAVE and FNSAVE, the FXSAVE instruction does not alter the x87 tag word. Therefore, the
contents of the shared 64-bit MMX and 80-bit FPR registers remain valid after an FXSAVE
instruction (or retain whatever other state the tag bits indicated before the save). Also, FXSAVE (like
FNSAVE) does not check for pending unmasked-x87 floating-point exceptions.
Figure 11-8 on page 302 shows the memory format of the media and x87 state in long mode. When in
64-bit mode using a 64-bit operand size, the format shown in Figure 11-8 is used. If a 32-bit operand
size is used in 64-bit mode, the memory format is the same, except that RIP and RDP are stored as
sel:offset pointers, as shown in Figure 11-9 on page 303.
[Figure 11-8: FXSAVE and FXRSTOR memory format in 64-bit mode. XMM0–XMM15 occupy +A0h through +19Fh, 16 bytes each; Reserved, IGN bytes appear at +1F0h.]
1. Stored as sel:offset if the operand size is 32 bits. The 32-bit sel:offset format of the pointers is shown in Figure 11-9.
[Figure: FXSAVE and FXRSTOR memory format with XMM0–XMM7 at +A0h through +11Fh; Reserved, IGN bytes appear at +1F0h.]
Software can read and write all fields within the FXSAVE and FXRSTOR memory image. These fields
include:
• FCW—Bytes 01h–00h. x87 control word.
• FSW—Bytes 03h–02h. x87 status word.
• FTW—Byte 04h. x87 tag word. See “FXSAVE Format for x87 Tag Word” on page 304 for
additional information on the FTW format saved by the FXSAVE instruction.
• (Byte 05h contains the value 00h.)
• FOP—Bytes 07h–06h. Last x87 opcode.
• Last x87 Instruction Pointer—A pointer to the last non-control x87 floating-point instruction
executed by the processor:
- RIP (64-bit format)—Bytes 0Fh–08h. 64-bit offset into the code segment (used without a CS
selector).
- EIP (32-bit format)—Bytes 0Bh–08h. 32-bit offset into the code segment.
- CS (32-bit format)—Bytes 0Dh–0Ch. Segment selector portion of the pointer.
• Last x87 Data Pointer—If the last non-control x87 floating-point instruction referenced memory,
this value is a pointer to the data operand referenced by the last non-control x87 floating-point
instruction executed by the processor:
- RDP (64-bit format)—Bytes 17h–10h. 64-bit offset into the data segment (used without a DS
selector).
- DP (32-bit format)—Bytes 13h–10h. 32-bit offset into the data segment.
- DS (32-bit format)—Bytes 15h–14h. Segment selector portion of the pointer.
If the last non-control x87 instruction did not reference memory, then the value in the pointer is
implementation dependent.
• MXCSR—Bytes 1Bh–18h. 128-bit media-instruction control and status register. This register is
saved only if CR4.OSFXSR is set to 1.
• MXCSR_MASK—Bytes 1Fh–1Ch. Set bits in MXCSR_MASK indicate supported feature bits in
MXCSR. For example, if bit 6 (the DAZ bit) in the returned MXCSR_MASK field is set to 1, the
DAZ mode and the DAZ flag in MXCSR are supported. Cleared bits in MXCSR_MASK indicate
reserved bits in MXCSR. If software attempts to set a reserved bit in the MXCSR register, a #GP
exception will occur. To avoid this exception, after software clears the FXSAVE memory image
and executes the FXSAVE instruction, software should use the value returned by the processor in
the MXCSR_MASK field when writing a value to the MXCSR register, as follows:
- MXCSR_MASK = 0: If the processor writes a zero value into the MXCSR_MASK field, the
denormals-are-zeros (DAZ) mode and the DAZ flag in MXCSR are not supported. Software
should use the default mask value, 0000_FFBFh (bit 6, the DAZ bit, and bits 31–16 cleared to
0), to mask any value it writes to the MXCSR register to ensure that all reserved bits in
MXCSR are written with 0, thus avoiding a #GP exception.
- MXCSR_MASK ≠ 0: If the processor writes a non-zero value into the MXCSR_MASK field,
software should AND this value with any value it writes to the MXCSR register.
• MMXn/FPRn—Bytes 9Fh–20h. Shared 64-bit media and x87 floating-point registers. As in the
case of the x87 FSAVE instruction, these registers are stored in stack order ST(0)–ST(7). The
upper six bytes in the memory image for each register are reserved.
• XMMn—Bytes 11Fh–A0h. 128-bit media registers. These registers are saved only if
CR4.OSFXSR is set to 1.
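The byte offsets listed above can be captured as a C structure with compile-time layout checks, together with the MXCSR masking rule. This is a sketch of the 64-bit-mode format with a 64-bit operand size (XMM8–XMM15 as in Figure 11-8); the structure and function names are hypothetical, and treating bytes 1A0h–1FFh as a single reserved block is a simplification of the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fxsave64 {
    uint16_t fcw;            /* 00h: x87 control word */
    uint16_t fsw;            /* 02h: x87 status word */
    uint8_t  ftw;            /* 04h: single-byte encoded tag word */
    uint8_t  reserved05;     /* 05h: 00h */
    uint16_t fop;            /* 06h: last x87 opcode */
    uint64_t rip;            /* 08h: last x87 instruction pointer */
    uint64_t rdp;            /* 10h: last x87 data pointer */
    uint32_t mxcsr;          /* 18h */
    uint32_t mxcsr_mask;     /* 1Ch */
    uint8_t  st[8][16];      /* 20h: ST(0)-ST(7), 10 bytes used, upper 6 reserved */
    uint8_t  xmm[16][16];    /* A0h: XMM0-XMM15 */
    uint8_t  reserved[96];   /* 1A0h-1FFh */
};

_Static_assert(offsetof(struct fxsave64, mxcsr_mask) == 0x1C, "MXCSR_MASK offset");
_Static_assert(offsetof(struct fxsave64, st) == 0x20, "ST(0) offset");
_Static_assert(offsetof(struct fxsave64, xmm) == 0xA0, "XMM0 offset");
_Static_assert(sizeof(struct fxsave64) == 512, "image size");

/* Mask a value before writing MXCSR, using the saved MXCSR_MASK or, if the
 * processor stored 0, the default mask 0000_FFBFh. */
static inline uint32_t mxcsr_sanitize(uint32_t value, uint32_t mxcsr_mask)
{
    uint32_t mask = mxcsr_mask ? mxcsr_mask : 0x0000FFBFu;
    return value & mask;
}
```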
FXSAVE Format for x87 Tag Word. Rather than saving the entire x87 tag word, FXSAVE saves a
single-byte encoded version. FXSAVE encodes each of the eight two-bit fields in the x87 tag word as
follows:
• Two-bit values of 00, 01, and 10 are encoded as a 1, indicating the corresponding x87 FPRn
register holds a value.
• A two-bit value of 11 is encoded as a 0, indicating the corresponding x87 FPRn register is empty.
For example, assume the x87 tag word holds the value 83F1h:
x87 Register        FPR7     FPR6   FPR5   FPR4   FPR3   FPR2   FPR1   FPR0
Tag Value (binary)  10       00     00     11     11     11     00     01
Meaning             Special  Valid  Valid  Empty  Empty  Empty  Valid  Zero
When FXSAVE writes this x87 tag word to memory, it encodes the value as E3h. This encoded version
describes the x87 FPRn contents as follows:
x87 Register        FPR7   FPR6   FPR5   FPR4   FPR3   FPR2   FPR1   FPR0
Tag Value (binary)  1      1      1      0      0      0      1      1
Meaning             Valid  Valid  Valid  Empty  Empty  Empty  Valid  Valid
Encoded Tag Byte: E3h
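The encoding rule can be written as a loop over the eight two-bit tag fields, where the field for FPRn occupies tag-word bits 2n+1:2n. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Encode a full 16-bit x87 tag word into the FXSAVE single-byte form:
 * 11b (empty) becomes 0; 00b, 01b, and 10b become 1. */
static uint8_t fxsave_encode_tag(uint16_t ftw)
{
    uint8_t out = 0;
    for (unsigned n = 0; n < 8; n++) {
        unsigned tag = (ftw >> (2 * n)) & 3u;
        if (tag != 3u)
            out |= (uint8_t)(1u << n);
    }
    return out;
}
```

Applied to the example tag word 83F1h, it produces E3h.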
If necessary, software can decode the single-bit FXSAVE tag-word fields into the two-bit fields that
FSAVE uses by examining the contents of the corresponding FPR registers saved by FXSAVE. Table 11-1
on page 306 shows how the FPR contents are used to find the equivalent FSAVE tag-field value. The
fraction column refers to the fraction portion of the extended-precision significand (bits 62–0). The
integer-bit column refers to the integer portion of the significand (bit 63). See “x87 Floating-Point
Programming” in Volume 1 for more information on floating-point number formats.
Table 11-1. Deriving FSAVE Tag Field from FXSAVE Tag Field
Encoded FXSAVE Tag  Exponent                    Integer Bit²  Fraction¹   Type of Value                  Equivalent FSAVE Tag
1 (Valid)           All 0s                      0             All 0s      Zero                           01 (Zero)
1 (Valid)           All 0s                      0             Not all 0s  Denormal                       10 (Special)
1 (Valid)           All 0s                      1             All 0s      Pseudo Denormal                10 (Special)
1 (Valid)           All 0s                      1             Not all 0s  Pseudo Denormal                10 (Special)
1 (Valid)           Neither all 0s nor all 1s   0             don’t care  Unnormal                       10 (Special)
1 (Valid)           Neither all 0s nor all 1s   1             don’t care  Normal                         00 (Valid)
1 (Valid)           All 1s                      0             don’t care  Pseudo Infinity or Pseudo NaN  10 (Special)
1 (Valid)           All 1s                      1             All 0s      Infinity                       10 (Special)
1 (Valid)           All 1s                      1             Not all 0s  NaN                            10 (Special)
0 (Empty)           don’t care                  don’t care    don’t care  Empty                          11 (Empty)
Note:
1. Bits 62–0 of the significand. Bit 62, the most-significant bit of the fraction, is also called the M bit.
2. Bit 63 of the significand, also called the J bit.
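Table 11-1 translates into a short classification function. This sketch assumes the standard x87 extended-precision layout of the 10-byte register image: little-endian, bytes 0–7 holding the significand (bit 63 is the J bit) and bytes 8–9 holding the sign and 15-bit exponent. In an FXSAVE image each register occupies the low 10 bytes of its 16-byte slot. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* Derive the two-bit FSAVE tag from the FXSAVE tag bit and the saved register. */
static unsigned fsave_tag_from_fxsave(unsigned abridged_bit, const uint8_t fpr[10])
{
    if (!abridged_bit)
        return TAG_EMPTY;

    uint64_t significand = 0;
    for (int b = 7; b >= 0; b--)
        significand = (significand << 8) | fpr[b];
    unsigned exponent = ((unsigned)fpr[9] << 8 | fpr[8]) & 0x7FFFu;
    unsigned j = (unsigned)(significand >> 63);              /* integer bit */
    uint64_t fraction = significand & 0x7FFFFFFFFFFFFFFFULL;  /* bits 62-0 */

    if (exponent == 0)          /* zero, denormal, or pseudo denormal */
        return (j == 0 && fraction == 0) ? TAG_ZERO : TAG_SPECIAL;
    if (exponent == 0x7FFFu)    /* infinity, NaN, pseudo infinity, pseudo NaN */
        return TAG_SPECIAL;
    return j ? TAG_VALID : TAG_SPECIAL;   /* normal vs. unnormal */
}
```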
Performance Considerations. When system software supports multi-tasking, it must be able to save
the processor state for one task and load the state for another. For performance reasons, the media
and/or x87 processor state is usually saved and loaded only when necessary. System software can save
and load this state at the time a task switch occurs. However, if the new task does not use the state,
loading the state is unnecessary and reduces performance.
The task-switch bit (CR0.TS) is provided as a lazy context-switch mechanism that allows system
software to save and load the processor state only when necessary. When CR0.TS=1, a device-not-
available exception (#NM) occurs when an attempt is made to execute a 128-bit media, 64-bit media,
or x87 instruction. System software can use the #NM exception handler to save the state of the
previous task, and restore the state of the current task. Before returning from the exception handler to
the media or x87 instruction, system software must clear CR0.TS to 0 to allow the instruction to be
executed. Using this approach, the processor state is saved only when the registers are used.
In legacy mode, the hardware task-switch mechanism sets CR0.TS=1 during a task switch (see “Task
Switched (TS) Bit” on page 44 for more information). In long mode, hardware task switching is not
supported, and the processor does not set CR0.TS. Instead, the architecture assumes that system
software handles all task-switching and state-saving functions. If CR0.TS is to be used in long mode
for controlling the save and restore of media or x87 state, system software must set and clear it
explicitly.
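The lazy context-switch protocol can be illustrated with a small simulation. Everything here is hypothetical: `cr0_ts` models CR0.TS, an integer stands in for the saved media and x87 state, and a real kernel would use CLTS and state-save instructions such as FXSAVE and FXRSTOR where the sketch assigns variables. It shows only the control flow, in which state moves only when a task actually executes a media or x87 instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record; fpu_state stands in for a saved state area. */
struct task { int id; int fpu_state; };

/* Simulated machine state (illustration only). */
static bool cr0_ts;                /* models CR0.TS */
static int live_fpu_state;         /* models the physical media/x87 registers */
static struct task *fpu_owner;     /* task whose state is in the registers */
static struct task *current;       /* task now running */
static int saves, restores;

/* Long-mode software task switch: the processor does not set CR0.TS,
 * so system software sets it explicitly. No state is moved here. */
static void switch_to(struct task *next)
{
    current = next;
    cr0_ts = true;
}

/* #NM handler: save the previous owner's state, load the current task's,
 * then clear CR0.TS (CLTS on real hardware) so the instruction can run. */
static void nm_handler(void)
{
    if (fpu_owner != current) {
        if (fpu_owner != NULL) {
            fpu_owner->fpu_state = live_fpu_state;
            saves++;
        }
        live_fpu_state = current->fpu_state;
        restores++;
        fpu_owner = current;
    }
    cr0_ts = false;
}

/* A media or x87 instruction: raises #NM first if CR0.TS = 1. */
static void use_fpu(int new_value)
{
    if (cr0_ts)
        nm_handler();
    live_fpu_state = new_value;
}
```

Switching to a task that never touches the media or x87 registers costs nothing in this model; saves and restores happen only on the #NM path.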
12 Task Management
This chapter describes the hardware task-management features. All of the legacy x86 task-
management features are supported by the AMD64 architecture in legacy mode, but most features are
not available in long mode. Long mode, however, requires system software to initialize and maintain
certain task-management resources. The details of these resource-initialization requirements for long
mode are discussed in “Task-Management Resources” on page 308.
[Figure: task-management resources (global-descriptor table, TSS descriptor, task-state segment, I/O-permission bitmap, interrupt-redirection bitmap).]
A fifth resource is available in legacy mode for use by system software that uses the hardware-
multitasking mechanism to manage more than one task:
• Task-Gate Descriptor—This form of gate descriptor holds a reference to a TSS descriptor and is
used to control access between tasks.
[Figure: TSS selector format. Bits 15–3: Selector Index. Bit 2: TI. Bits 1–0: RPL.]
Selector Index. Bits 15–3. The selector-index field locates the TSS descriptor in the global-
descriptor table.
Table Indicator (TI) Bit. Bit 2. The TI bit must be cleared to 0, which indicates that the GDT is used.
TSS descriptors cannot be located in the LDT. If a reference is made to a TSS descriptor in the LDT, a
general-protection exception (#GP) occurs.
Requestor Privilege-Level (RPL) Field. Bits 1–0. RPL represents the privilege level (CPL) the
processor is operating under at the time the TSS selector is loaded into the task register.
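A sketch of decoding a TSS selector and locating its descriptor. Scaling the selector index by 8 to obtain a byte offset into the GDT is standard descriptor-table addressing rather than something stated in this section; a 16-byte long-mode TSS descriptor simply occupies two consecutive 8-byte slots. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct selector_fields { unsigned index, ti, rpl; };

/* Split a selector into index (bits 15-3), TI (bit 2), and RPL (bits 1-0). */
static inline struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f = { sel >> 3, (sel >> 2) & 1u, sel & 3u };
    return f;
}

/* A TSS selector must reference the GDT (TI = 0); otherwise #GP occurs.
 * On success, stores the descriptor's byte offset within the GDT. */
static inline bool tss_descriptor_offset(uint16_t sel, uint32_t *offset)
{
    struct selector_fields f = decode_selector(sel);
    if (f.ti != 0)
        return false;            /* would raise #GP */
    *offset = f.index * 8u;
    return true;
}
```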
hold the additional information, a #GP exception occurs when an attempt is made to access beyond
the TSS limit. No check for the larger limit is performed during the task switch.
• Type—Four system-descriptor types are defined as TSS types, as shown in Table 4-5 on page 83.
Bit 9 is used as the descriptor busy bit (B). This bit indicates that the task is busy when set to 1, and
available when cleared to 0. Busy tasks are the currently running task and any previous (outer)
tasks in a nested-task hierarchy. Task recursion is not supported, and a #GP exception occurs if an
attempt is made to transfer control to a busy task. See “Nesting Tasks” on page 325 for additional
information.
In long mode, the 32-bit TSS types (available and busy) are redefined as 64-bit TSS types, and only
64-bit TSS descriptors can be used. Loading the task register with an available 64-bit TSS causes
the processor to change the TSS descriptor type to indicate a busy 64-bit TSS. Because long mode
does not support task switching, the TSS-descriptor busy bit is never cleared by the processor to
indicate an available 64-bit TSS.
Sixteen-bit TSS types are illegal in long mode. A general-protection exception (#GP) occurs if a
reference is made to a 16-bit TSS.
Figure 12-4 shows the format of the TR in long mode (both compatibility mode and 64-bit mode).
The AMD64 architecture expands the TSS-descriptor base-address field to 64 bits so that system
software running in long mode can access a TSS located anywhere in the 64-bit virtual-address space.
The processor ignores the 32 high-order base-address bits when running in legacy mode. Because the
TR is loaded from the GDT, the system-segment descriptor format has been expanded to 16 bytes by
the AMD64 architecture in support of 64-bit mode. See “System Descriptors” on page 88 for more
information on this expanded format. The high-order base-address bits are only loaded from 64-bit
mode using the LTR instruction. Figure 12-5 shows the relationship between the TSS and GDT.
Long mode requires the use of a 64-bit TSS type, and this type must be loaded into the TR by
executing the LTR instruction in 64-bit mode. Executing the LTR instruction in 64-bit mode loads the
TR with the full 64-bit TSS base address from the 16-byte TSS descriptor format (compatibility mode
can only load 8-byte system descriptors). A processor running in either compatibility mode or 64-bit
mode uses the full 64-bit TR.base address.
[Figure: legacy 32-bit TSS format (partial). ESP0 +04h, ESP1 +0Ch, ESP2 +14h, SS2 +18h (upper half Reserved, IGN), CR3 +1Ch, EIP +20h, EFLAGS +24h, EAX +28h, ECX +2Ch, EDX +30h, EBX +34h, ESP +38h, EBP +3Ch, ESI +40h, EDI +44h.]
- Whether the port can be accessed when the processor is running in virtual-8086 mode.
Because one bit is used per byte-wide I/O port, this bitmap can take up to 8 Kbytes of TSS space. The
bitmap can be located anywhere within the first 64 Kbytes of the TSS, as long as it is above byte
103. The last byte of the bitmap must contain all ones (0FFh). See “I/O-Permission Bitmap” on
page 316 for more information.
• Interrupt-Redirection Bitmap—Static field. This field defines how each of the 256 possible
software interrupts is directed in a virtual-8086 environment. One bit is used for each interrupt, for
a total bitmap size of 32 bytes. The bitmap can be located anywhere above byte 103 within the first
64 Kbytes of the TSS. See “Interrupt Redirection of Software Interrupts” on page 250 for
information on using this field.
The TSS can be paged by system software. System software that uses the hardware task-switch
mechanism must guarantee that a page fault does not occur during a task switch. Because the processor
only reads and writes the first 104 TSS bytes during a task switch, this restriction only applies to those
bytes. The simplest approach is to align the TSS on a page boundary so that all critical bytes are either
present or not present. Then, if a page fault occurs when the TSS is accessed, it occurs before the first
byte is read. If the page fault occurs after a portion of the TSS is read, the fault is unrecoverable.
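Because only the first 104 bytes are touched during a task switch, it can help to view them as a fixed-layout structure. The sketch below lays out the legacy 32-bit TSS in C and checks a few offsets at compile time. Offsets for fields missing from the recovered figure (link, SS0, SS1, the segment selectors, LDT, and the T bit) are filled in from the standard legacy TSS layout; the struct and field names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy 32-bit TSS: the 104 bytes (00h-67h) the processor reads and writes
   on a task switch. All members are naturally aligned, so no padding occurs. */
struct tss32 {
    uint16_t link, rsvd0;          /* +00h previous-task link */
    uint32_t esp0;                 /* +04h */
    uint16_t ss0, rsvd1;           /* +08h */
    uint32_t esp1;                 /* +0Ch */
    uint16_t ss1, rsvd2;           /* +10h */
    uint32_t esp2;                 /* +14h */
    uint16_t ss2, rsvd3;           /* +18h upper half reserved (IGN) */
    uint32_t cr3;                  /* +1Ch */
    uint32_t eip;                  /* +20h */
    uint32_t eflags;               /* +24h */
    uint32_t eax, ecx, edx, ebx;   /* +28h, +2Ch, +30h, +34h */
    uint32_t esp, ebp, esi, edi;   /* +38h, +3Ch, +40h, +44h */
    uint16_t es, rsvd4;            /* +48h */
    uint16_t cs, rsvd5;            /* +4Ch */
    uint16_t ss, rsvd6;            /* +50h */
    uint16_t ds, rsvd7;            /* +54h */
    uint16_t fs, rsvd8;            /* +58h */
    uint16_t gs, rsvd9;            /* +5Ch */
    uint16_t ldt, rsvd10;          /* +60h LDT selector */
    uint16_t trap;                 /* +64h bit 0 is the T (debug trap) bit */
    uint16_t iopb_base;            /* +66h I/O-permission bitmap base */
};

static_assert(offsetof(struct tss32, edi) == 0x44, "EDI at 44h");
static_assert(offsetof(struct tss32, iopb_base) == 0x66, "IOPB base at 66h");
static_assert(sizeof(struct tss32) == 104, "fixed TSS area is 104 bytes");
```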
I/O-Permission Bitmap. The I/O-permission bitmap (IOPB) allows system software to grant less-
privileged programs access to individual I/O ports, overriding the effect of RFLAGS.IOPL for those
devices. When an I/O instruction is executed, the processor checks the IOPB only if the processor is in
virtual-8086 mode or the CPL is greater than the RFLAGS.IOPL field. Each bit in the IOPB corresponds
to a byte I/O port. A word I/O port corresponds to two consecutive IOPB bits, and a doubleword I/O
port corresponds to four consecutive IOPB bits. Access is granted to an I/O port of a given size when
all IOPB bits corresponding to that port are clear. If any bits are set, a #GP occurs.
The IOPB is located in the TSS, as shown by the example in Figure 12-7 on page 317. Each TSS can
have a different copy of the IOPB, so access to individual I/O devices can be granted on a task-by-task
basis. The I/O-permission bitmap base-address field located at byte 66h in the TSS is an offset into the
TSS locating the start of the IOPB. If all 64K I/O ports are supported, the IOPB base address must not
be greater than 0DFFFh, otherwise accesses to the bitmap cause a #GP to occur. An extra byte must be
present after the last IOPB byte. This byte must have all bits set to 1 (0FFh). This allows the processor
to read two IOPB bytes each time an I/O port is accessed. By reading two IOPB bytes, the processor
can check all bits when unaligned, multi-byte I/O ports are accessed.
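(Figure 12-7: Example I/O-permission bitmap in the TSS. The IOPB begins at offset IOPB from the TSS base (+00h); the doublewords at IOPB, IOPB+4h, and IOPB+8h are shown, with four bits in the doubleword at IOPB+4h cleared to 0.)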
Bits in the IOPB sequentially correspond to I/O port addresses. The example in Figure 12-7 shows bits
12 through 15 in the second doubleword of the IOPB cleared to 0. Those bit positions correspond to
byte I/O ports 2Ch through 2Fh (44 through 47 decimal), or alternatively, doubleword I/O port 2Ch.
Because the bits are cleared to zero, software running at any privilege level can access those I/O ports.
If the TSS limit is set so that the IOPB does not cover the entire I/O-address space, some ports have
no corresponding IOPB entry, and accessing them causes a #GP exception. Referring again to
Figure 12-7, the last IOPB entry is at bit 23 in the fourth IOPB doubleword, which corresponds to
I/O port 77h. In this example, all ports from 78h and above cause a #GP exception, as if their
permission bits were set to 1.
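The IOPB rules above (one bit per byte port, two bytes read per access, a trailing 0FFh byte, and #GP for ports past the TSS limit) can be modeled in a few lines of C. This is a sketch for reasoning about bitmap contents, not a description of processor internals; iopb_allows and example_port_allowed are hypothetical names, and the example TSS places the IOPB immediately after the 104-byte fixed area.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Model of the IOPB permission check. `tss` is the TSS image, `tss_limit` the
   offset of its last valid byte, and `size` the access width (1, 2, or 4). */
static bool iopb_allows(const uint8_t *tss, uint32_t tss_limit,
                        uint16_t port, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    uint32_t iopb_base = (uint32_t)tss[0x66] | ((uint32_t)tss[0x67] << 8); /* field at 66h */
    uint32_t off = iopb_base + port / 8u;
    /* Two bytes are read; both must lie within the TSS limit. */
    if (off + 1u > tss_limit)
        return false;                                   /* #GP */
    uint32_t bits = (uint32_t)tss[off] | ((uint32_t)tss[off + 1u] << 8);
    uint32_t mask = ((1u << size) - 1u) << (port % 8u); /* one bit per byte port */
    return (bits & mask) == 0;                          /* any set bit -> #GP */
}

/* Hypothetical example: a 16-byte IOPB at offset 68h with every bit set except
   bits 12-15 of its second doubleword, followed by the required 0FFh byte. */
static bool example_port_allowed(uint16_t port, unsigned size)
{
    uint8_t tss[104 + 16 + 1];
    memset(tss, 0, sizeof tss);
    tss[0x66] = 104;                   /* IOPB base = 68h */
    memset(tss + 104, 0xFF, 16 + 1);   /* deny everything, plus trailing 0FFh */
    tss[104 + 5] = 0x0F;               /* clear bits 12-15 of doubleword 1 */
    return iopb_allows(tss, (uint32_t)(sizeof tss - 1), port, size);
}
```

In the example bitmap, byte ports 2Ch through 2Fh (and doubleword port 2Ch) are open, a word access at 2Fh is refused because it also touches port 30h, and port 78h and above are refused.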
• ISTn—Bytes 5Bh–24h. The full 64-bit canonical forms of the interrupt-stack-table (IST) pointers.
See “Interrupt-Stack Table” on page 245 for a description of the IST mechanism.
• I/O Map Base Address—Bytes 67h–66h. The 16-bit offset to the I/O-permission bit map from the
64-bit TSS base. The function of this field is identical to that in a legacy 32-bit TSS. See “I/O-
Permission Bitmap” on page 316 for more information.
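The 64-bit TSS can be sketched the same way. The IST range (24h–5Bh) and the I/O map base (66h) come from the list above; the offsets of RSP0–RSP2 and the reserved fields are taken from the standard long-mode TSS layout. Because RSP0 sits at offset 4, the structure must be packed, which this sketch does with the GCC/Clang __attribute__((packed)) extension; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS layout sketch (packed: 64-bit fields start at offset 4). */
struct __attribute__((packed)) tss64 {
    uint32_t reserved0;    /* +00h */
    uint64_t rsp[3];       /* +04h RSP0, +0Ch RSP1, +14h RSP2 */
    uint64_t reserved1;    /* +1Ch */
    uint64_t ist[7];       /* +24h IST1 ... +54h IST7 (bytes 5Bh-24h) */
    uint64_t reserved2;    /* +5Ch */
    uint16_t reserved3;    /* +64h */
    uint16_t iomap_base;   /* +66h I/O Map Base Address (bytes 67h-66h) */
};

static_assert(offsetof(struct tss64, ist) == 0x24, "IST1 starts at 24h");
static_assert(offsetof(struct tss64, iomap_base) == 0x66, "I/O map base at 66h");
static_assert(sizeof(struct tss64) == 0x68, "64-bit TSS is 104 bytes");
```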
Segmented Memory. The segmented memory for a task consists of the segments that are loaded
during a task switch and any segments that are later accessed by the task code. The hardware task-
switch mechanism allows tasks to either share segments with other tasks, or to access segments in
isolation from one another. Tasks that share segments actually share a virtual-address (linear-address)
space, but they do not necessarily share a physical-address space. When paging is enabled, the virtual-
to-physical mapping for each task can differ, as is described in the following section. Shared segments
do share physical memory when paging is disabled, because virtual addresses are used as physical
addresses.
A number of options are available to system software that shares segments between tasks:
• Sharing segment descriptors using the GDT. All tasks have access to the GDT, so it is possible for
segments loaded in the GDT to be shared among tasks.
• Sharing segment descriptors using a single LDT. Each task has its own LDT, and that LDT selector
is automatically saved and restored in the TSS by the processor during task switches. Tasks,
however, can share LDTs simply by storing the same LDT selector in multiple TSSs. Using the
LDT to manage segment sharing and segment isolation provides more flexibility to system
software than using the GDT for the same purpose.
• Copying shared segment descriptors into multiple LDTs. Segment descriptors can be copied by
system software into multiple LDTs that are otherwise not shared between tasks. Allowing
segment sharing at the segment-descriptor level, rather than the LDT level or GDT level, provides
the greatest flexibility to system software.
In all three cases listed above, the actual data and instructions are shared between tasks only when the
tasks’ virtual-to-physical address mappings are identical.
Paged Memory. Each task has its own page-translation table base-address (CR3) register, and that
register is automatically saved and restored in the TSS by the processor during task switches. This
allows each task to point to its own set of page-translation tables, so that each task can translate virtual
addresses to physical addresses independently. Page translation must be enabled for changes in CR3
values to have an effect on virtual-to-physical address mapping. When page translation is disabled, the
tables referenced by CR3 are ignored, and virtual addresses are equivalent to physical addresses.
• The descriptors for all previously-loaded segment selectors are loaded into the hidden portion of
the segment registers. This sets or clears the P bits for the segments as specified by the new
descriptor values.
If the above steps complete successfully, the processor begins executing instructions in the new task
beginning with the instruction referenced by the CS:EIP far pointer loaded from the new TSS. The
privilege level of the new task is taken from the new CS segment selector’s RPL.
Saving Other Processor State. The processor does not automatically save the registers used by the
media or x87 instructions. Instead, the processor sets CR0.TS to 1 during a task switch. Later, when an
attempt is made to execute any of the media or x87 instructions while TS=1, a device-not-available
exception (#NM) occurs. System software can then save the previous state of the media and x87
registers and clear the CR0.TS bit to 0 before executing the next media/x87 instruction. As a result, the
media and x87 registers are saved only when necessary after a task switch.
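The lazy-save scheme described above can be illustrated with a toy model in C. Nothing here touches real hardware: cr0_ts stands in for CR0.TS, and the saves and restores counters stand in for the state-save and state-load instructions a real #NM handler would execute. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of lazy x87/media state switching (no real CR0 access). */
struct cpu_model {
    bool cr0_ts;    /* stand-in for CR0.TS */
    int  fpu_owner; /* task whose state is live in the registers, -1 if none */
    int  current;   /* task currently running */
    int  saves;     /* count of state saves (stand-in for a save instruction) */
    int  restores;  /* count of state loads or initializations */
};

static void task_switch(struct cpu_model *c, int next)
{
    assert(next >= 0);
    c->current = next;
    c->cr0_ts = true;              /* the processor sets CR0.TS on every task switch */
}

/* #NM handler: runs when an x87/media instruction executes with TS=1. */
static void nm_handler(struct cpu_model *c)
{
    if (c->fpu_owner != c->current) {
        if (c->fpu_owner >= 0)
            c->saves++;            /* save the previous owner's state */
        c->restores++;             /* load the current task's state */
        c->fpu_owner = c->current;
    }
    c->cr0_ts = false;             /* clear TS so the instruction can proceed */
}

static void use_fpu(struct cpu_model *c)
{
    if (c->cr0_ts)
        nm_handler(c);             /* #NM is raised before the instruction executes */
}

static struct cpu_model run_demo(void)
{
    struct cpu_model c = { true, -1, 0, 0, 0 };
    use_fpu(&c);          /* task 0: first use loads its state */
    task_switch(&c, 1);   /* task 1 runs without touching x87/media */
    task_switch(&c, 0);   /* back to task 0: its state is still live */
    use_fpu(&c);          /* #NM, but nothing needs saving */
    task_switch(&c, 1);
    use_fpu(&c);          /* task 1 uses x87/media: save task 0, load task 1 */
    return c;
}
```

In run_demo, two task switches occur while no x87/media instruction executes, so task 0's state is never saved until task 1 actually needs the registers.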
Because the legacy task-switch mechanism is not supported in long mode, software cannot use task
gates in long mode. Any attempt to transfer control to another task using a task gate in long mode
causes a general-protection exception (#GP) to occur.
(Figure: Task-gate privilege checks. Example 1, privilege check passes: CS CPL=2, task-gate selector RPL=3, task-gate descriptor DPLG=3; access allowed. Example 2, privilege check fails: CS CPL=2, selector RPL=3, DPLG=0; access denied. Each example also shows the TSS descriptor (DPLS) for the task-state segment.)
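The two examples in the figure reduce to a single comparison. The helper below assumes the usual task-gate rule that both the CPL and the selector RPL must be numerically less than or equal to the gate DPL (DPLG), which is consistent with both examples; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task-gate privilege check: passes when max(CPL, RPL) <= DPLG. */
static bool task_gate_access_ok(unsigned cpl, unsigned rpl, unsigned gate_dpl)
{
    assert(cpl <= 3 && rpl <= 3 && gate_dpl <= 3);
    unsigned effective = cpl > rpl ? cpl : rpl;   /* numerically larger = less privileged */
    return effective <= gate_dpl;
}
```

Example 1 corresponds to task_gate_access_ok(2, 3, 3), which passes; Example 2 corresponds to task_gate_access_ok(2, 3, 0), which fails.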
Programs running at any privilege level can set EFLAGS.NT to 1 and execute the IRET instruction to
transfer control to another task. System software can keep control over improperly nested-task
switches by initializing the link field of all TSSs that it creates. That way, improperly nested-task
switches always transfer control to a known task.
Preventing Recursion. Task recursion is not allowed by the hardware task-switch mechanism. If
recursive-task switches were allowed, they would replace a previous task-state image with a newer
image, discarding the previous information. To prevent recursion from occurring, the processor uses
the busy bit located in the TSS-descriptor type field (bit 9 of the doubleword at byte offset +4). Use of this bit depends on how
the task switch is initiated:
• The JMP instruction clears the busy bit in the old task to 0 and sets the busy bit in the new task to 1.
A general-protection exception (#GP) occurs if an attempt is made to JMP to a task with a set busy
bit.
• The CALL instruction, INTn instructions, interrupts, and exceptions set the busy bit in the new
task to 1. The busy bit in the old task remains set to 1, preventing recursion through task-nesting
levels. A general-protection exception (#GP) occurs if an attempt is made to switch to a task with a
set busy bit.
• An IRET to another task (EFLAGS.NT must be 1) clears the busy bit in the old task to 0. The busy
bit in the new task is not altered, because it was already set to 1.
Table 12-1 on page 325 summarizes the effect various task-switch initiators have on the TSS-busy bit.
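The busy-bit rules listed above can be modeled as a small state update. This C sketch is a hypothetical model that does not cover every check the processor performs on a task switch: apply_busy_rules returns false where the text says #GP occurs, and set_tss_busy shows where the bit lives in the descriptor's second doubleword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum switch_kind { SWITCH_JMP, SWITCH_CALL_INT, SWITCH_IRET };

/* Update the old and new tasks' busy bits; false means #GP. */
static bool apply_busy_rules(enum switch_kind kind, bool *old_busy, bool *new_busy)
{
    assert(old_busy && new_busy);
    switch (kind) {
    case SWITCH_JMP:
        if (*new_busy) return false;   /* #GP: JMP to a busy task */
        *old_busy = false;
        *new_busy = true;
        return true;
    case SWITCH_CALL_INT:              /* CALL, INTn, interrupts, exceptions */
        if (*new_busy) return false;   /* #GP: would recurse into a nested task */
        *new_busy = true;              /* old task stays busy (it is nested) */
        return true;
    case SWITCH_IRET:                  /* requires EFLAGS.NT = 1 */
        *old_busy = false;             /* new task's bit is already 1 */
        return true;
    }
    return false;
}

/* Busy bit: bit 9 of the doubleword at byte offset +4 of the TSS descriptor. */
static uint32_t set_tss_busy(uint32_t dword4, bool busy)
{
    return busy ? (dword4 | (1u << 9)) : (dword4 & ~(1u << 9));
}

/* Task A calls task B, then B tries to call A: the second switch must fail. */
static bool demo_recursion_blocked(void)
{
    bool a = true, b = false;
    bool ok1 = apply_busy_rules(SWITCH_CALL_INT, &a, &b);
    bool ok2 = apply_busy_rules(SWITCH_CALL_INT, &b, &a);
    return ok1 && !ok2 && a && b;
}
```

Setting the busy bit in a doubleword holding type 9h (bits 11:8) turns it into type Bh, so set_tss_busy(0x8900, true) returns 8B00h.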
Address-Breakpoint Registers (DR0-DR3). Figure 13-1 shows the format of the four address-
breakpoint registers, DR0-DR3. Software can load a virtual (linear) address into any of the four
registers, and enable breakpoints to occur when the address matches an instruction or data reference.
The MOV DRn instructions do not check that the virtual addresses loaded into DR0–DR3 are in
canonical form. Breakpoint conditions are enabled using the debug-control register, DR7 (see “Debug-
Control Register (DR7)” on page 331).
Reserved Debug Registers (DR4, DR5). The DR4 and DR5 registers are reserved and should not be
used by software. When debug extensions are disabled (CR4.DE=0), these registers are aliased to the
DR6 and DR7 registers, respectively. When debug extensions are enabled (CR4.DE=1), attempts to
access these registers cause an invalid-opcode exception (#UD).
Debug-Status Register (DR6). Figure 13-2 on page 330 shows the format of the debug-status
register, DR6. Debug status is loaded into DR6 when an enabled debug condition is encountered that
causes a #DB exception.
DR6 format (Figure 13-2):
Bits 63–32  MBZ
Bits 31–16  Reserved, read as 1s
Bit 15      BT
Bit 14      BS
Bit 13      BD
Bit 12      RAZ
Bits 11–4   Reserved, read as 1s
Bit 3       B3
Bit 2       B2
Bit 1       B1
Bit 0       B0
Bits 15:13 of the DR6 register are never cleared by the processor and must be cleared by software after
the contents have been read. Register fields are:
• Breakpoint-Condition Detected (B3–B0)—Bits 3–0. The processor updates these four bits on
every debug breakpoint or general-detect condition. A bit is set to 1 if the corresponding address-
breakpoint register detects an enabled breakpoint condition, as specified by the DR7 Ln, Gn, R/Wn
and LENn controls, and is cleared to 0 otherwise. For example, B1 (bit 1) is set to 1 if an address-
breakpoint condition is detected by DR1.
• Debug-Register-Access Detected (BD)—Bit 13. The processor sets this bit to 1 if software
accesses any debug register (DR0–DR7) while the general-detect condition is enabled
(DR7.GD=1).
• Single Step (BS)—Bit 14. The processor sets this bit to 1 if the #DB exception occurs as a result of
single-step mode (rFLAGS.TF=1). Single-step mode has the highest priority among debug
exceptions. Other status bits within the DR6 register can be set by the processor along with the BS
bit.
• Task-Switch (BT)—Bit 15. The processor sets this bit to 1 if the #DB exception occurred as a result
of a task switch to a task with a TSS T bit set to 1.
All remaining bits in the DR6 register are reserved. Reserved bits 31–16 and 11–4 must all be set to 1,
while reserved bit 12 must be cleared to 0. In 64-bit mode, the upper 32 bits of DR6 are reserved and
must be written with zeros. Writing a 1 to any of the upper 32 bits results in a general-protection
exception, #GP(0).
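A #DB handler typically reads DR6, works out which conditions fired, and then writes DR6 back with the status bits cleared, since the processor does not clear bits 15:13 itself. The C helpers below encode the bit positions listed above; the macro and function names are hypothetical, and the MOV to and from DR6 is omitted because it requires CPL 0.

```c
#include <assert.h>
#include <stdint.h>

#define DR6_BD    (1ull << 13)   /* debug-register access detected */
#define DR6_BS    (1ull << 14)   /* single step */
#define DR6_BT    (1ull << 15)   /* task switch */
#define DR6_BMASK 0xFull         /* B3-B0 in bits 3-0 */
/* Reserved bits 31-16 and 11-4 are 1; bit 12 and bits 63-32 are 0. */
#define DR6_RESERVED_ONES 0xFFFF0FF0ull

/* Which of DR0-DR3 reported a breakpoint condition (bit n set for DRn). */
static unsigned dr6_hits(uint64_t dr6)
{
    return (unsigned)(dr6 & DR6_BMASK);
}

/* Value to write back after reading DR6: all status bits clear and
   reserved bits at their required values. */
static uint64_t dr6_clean_value(void)
{
    return DR6_RESERVED_ONES;
}
```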
Debug-Control Register (DR7). Figure 13-3 shows the format of the debug-control register, DR7.
DR7 is used to establish the breakpoint conditions for the address-breakpoint registers (DR0–DR3)
and to enable debug exceptions for each address-breakpoint register individually. DR7 is also used to
enable the general-detect breakpoint condition.
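DR7 format (Figure 13-3):
Bits 63–32  MBZ
Bits 31–30  LEN3
Bits 29–28  R/W3
Bits 27–26  LEN2
Bits 25–24  R/W2
Bits 23–22  LEN1
Bits 21–20  R/W1
Bits 19–18  LEN0
Bits 17–16  R/W0
Bits 15–14  RAZ
Bit 13      GD
Bits 12–11  RAZ
Bit 10      RA1 (reserved, read as 1)
Bit 9       GE
Bit 8       LE
Bit 7       G3
Bit 6       L3
Bit 5       G2
Bit 4       L2
Bit 3       G1
Bit 2       L1
Bit 1       G0
Bit 0       L0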
The fields within the DR7 register are all read/write. These fields are:
• Local-Breakpoint Enable (L3–L0)—Bits 6, 4, 2, and 0 (respectively). Software individually sets
these bits to 1 to enable debug exceptions to occur when the corresponding address-breakpoint
register (DRn) detects a breakpoint condition while executing the current task. For example, if L1
(bit 2) is set to 1 and an address-breakpoint condition is detected by DR1, a #DB exception occurs.
These bits are cleared to 0 by the processor when a hardware task-switch occurs.
• Global-Breakpoint Enable (G3–G0)—Bits 7, 5, 3, and 1 (respectively). Software sets these bits to
1 to enable debug exceptions to occur when the corresponding address-breakpoint register (DRn)
detects a breakpoint condition while executing any task. For example, if G1 (bit 3) is set to 1 and an
address-breakpoint condition is detected by DR1, a #DB exception occurs. These bits are never
cleared to 0 by the processor.
• Local-Enable (LE)—Bit 8. Software sets this bit to 1 in legacy implementations to enable exact
breakpoints while executing the current task. This bit is ignored by implementations of the
AMD64 architecture. All breakpoint conditions, except certain string operations preceded by a
repeat prefix, are exact.
• Global-Enable (GE)—Bit 9. Software sets this bit to 1 in legacy implementations to enable exact
breakpoints while executing any task. This bit is ignored by implementations of the AMD64
architecture. All breakpoint conditions, except certain string operations preceded by a repeat
prefix, are exact.
• General-Detect Enable (GD)—Bit 13. Software sets this bit to 1 to cause a debug exception to
occur when an attempt is made to execute a MOV DRn instruction to any debug register
(DR0–DR7). This bit is cleared to 0 by the processor when the #DB handler is entered, allowing
the handler to read and write the DRn registers. The #DB exception occurs before executing the
instruction, and DR6.BD is set by the processor. Software debuggers can use this bit to prevent the
currently-executing program from interfering with the debug operation.
• Read/Write (R/W3–R/W0)—Bits 29–28, 25–24, 21–20, and 17–16 (respectively). Software sets
these fields to control the breakpoint conditions used by the corresponding address-breakpoint
registers (DRn). For example, control-field R/W1 (bits 21–20) controls the breakpoint conditions
for the DR1 register. The R/Wn control-field encodings specify the following conditions for an
address-breakpoint to occur:
- 00—Only on instruction execution.
- 01—Only on data write.
- 10—This encoding is further qualified by CR4.DE as follows:
. CR4.DE=0—Condition is undefined.
. CR4.DE=1—Only on I/O read or I/O write.
- 11—Only on data read or data write.
• Length (LEN3–LEN0)—Bits 31–30, 27–26, 23–22, and 19–18 (respectively). Software sets these
fields to control the range used in comparing a memory address with the corresponding address-
breakpoint register (DRn). For example, control-field LEN1 (bits 23–22) controls the breakpoint-
comparison range for the DR1 register.
The value in DRn defines the low-end of the address range used in the comparison. LENn is used
to mask the low-order address bits in the corresponding DRn register so that they are not used in
the address comparison. To work properly, breakpoint boundaries must be aligned on an address
corresponding to the range size specified by LENn. The LENn control-field encodings specify the
following address-breakpoint-comparison ranges:
- 00—1 byte.
- 01—2 byte, must be aligned on a word boundary.
- 10—8 byte, must be aligned on a quadword boundary. (Long mode only; otherwise undefined.)
- 11—4 byte, must be aligned on a doubleword boundary.
If the R/Wn field is used to specify instruction breakpoints (R/Wn=00), the corresponding LENn
field must be set to 00. Setting LENn to any other value produces undefined results.
All remaining bits in the DR7 register are reserved. Reserved bits 15–14 and 12–11 must all be cleared
to 0, while reserved bit 10 must be set to 1. In 64-bit mode, the upper 32 bits of DR7 are reserved and
must be written with zeros. Writing a 1 to any of the upper 32 bits results in a general-protection
#GP(0) exception.
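Putting the DR7 field positions together, the C sketch below computes a DR7 value that enables one address breakpoint. It only builds the value; loading DRn and DR7 with MOV requires CPL 0 and is not shown. The enum and function names are hypothetical, and the assertion enforces the rule that instruction breakpoints use LENn=00.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum dr7_rw  { DR7_RW_EXEC = 0, DR7_RW_WRITE = 1, DR7_RW_IO = 2, DR7_RW_RDWR = 3 };
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

#define DR7_RESERVED_ONE (1ull << 10)   /* bit 10 must be 1 */

/* Return dr7 with breakpoint n (0-3) enabled locally (Ln) or globally (Gn). */
static uint64_t dr7_enable(uint64_t dr7, unsigned n, bool global,
                           enum dr7_rw rw, enum dr7_len len)
{
    assert(n <= 3);
    assert(rw != DR7_RW_EXEC || len == DR7_LEN_1);    /* LENn=00 for instruction bps */
    unsigned field = 16 + 4 * n;                       /* R/Wn at field+1:field */
    dr7 &= ~(0xFull << field);                         /* clear old R/Wn and LENn */
    dr7 |= (uint64_t)rw << field;
    dr7 |= (uint64_t)len << (field + 2);               /* LENn sits just above R/Wn */
    dr7 |= 1ull << (2 * n + (global ? 1u : 0u));       /* Ln = bit 2n, Gn = bit 2n+1 */
    return dr7 | DR7_RESERVED_ONE;
}
```

For example, a local 4-byte write breakpoint on DR1 is dr7_enable(0, 1, false, DR7_RW_WRITE, DR7_LEN_4), which gives 00D00404h: L1 (bit 2), R/W1=01b (bit 20), LEN1=11b (bits 23–22), and reserved bit 10 set to 1. The I/O encoding (R/Wn=10b) additionally requires CR4.DE=1, and LENn=10b requires long mode.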
64-Bit-Mode Extended Debug Registers. In 64-bit mode, additional encodings for debug registers
are available. The REX.R bit, in a REX prefix, is used to modify the ModRM reg field when that field
encodes a debug register, as shown in “REX Prefixes” in Volume 3. These additional encodings
enable the processor to address DR8–DR15.
Access to the DR8–DR15 registers is implementation-dependent. The architecture does not require
any of these extended debug registers to be implemented. Any attempt to access an unimplemented
register results in an invalid-opcode exception (#UD).
Debug-Control MSR (DebugCtlMSR). Figure 13-4 on page 334 shows the format of the debug-
control MSR, DebugCtlMSR. DebugCtlMSR provides additional debug controls over control-transfer
recording and single stepping, and external-breakpoint reporting and trace messages. DebugCtlMSR is
an MSR and is read and written using the RDMSR and WRMSR instructions.
DebugCtlMSR format (Figure 13-4):
Bits 63–6  Reserved
Bit 5      PB3
Bit 4      PB2
Bit 3      PB1
Bit 2      PB0
Bit 1      BTF
Bit 0      LBR
Control-Transfer Recording MSRs. Figure 13-5 on page 335 shows the format of the 64-bit
control-transfer recording MSRs: LastBranchToIP, LastBranchFromIP, LastExceptionToIP, and
LastExceptionFromIP. These registers are loaded automatically by the processor when the
DebugCtlMSR.LBR bit is set to 1. These MSRs are read-only.
13.2 Breakpoints
13.2.1 Setting Breakpoints
Breakpoints can be set to occur on either instruction addresses or data addresses using the breakpoint-
address registers, DR0–DR3 (DRn). The values loaded into these registers represent the breakpoint-
location virtual address. The debug-control register, DR7, is used to enable the breakpoint registers
and to specify the type of access and the range of addresses that can trigger a breakpoint.
Software enables the DRn registers using the corresponding local-breakpoint enable (Ln) or global-
breakpoint enable (Gn) found in the DR7 register. Ln is used to enable breakpoints only while the
current task is active, and it is cleared by the processor when a task switch occurs. Gn is used to enable
breakpoints for all tasks, and it is never cleared by the processor.
The R/Wn fields in DR7, along with the CR4.DE bit, specify the type of access required to trigger a
breakpoint when an address match occurs on the corresponding DRn register. Breakpoints can be set to
occur on instruction execution, data reads and writes, and I/O reads and writes. The R/Wn and
CR4.DE encodings used to specify the access type are described on page 332 of “Debug-Control
Register (DR7).”
The LENn fields in DR7 specify the size of the address range used in comparison with data or
instruction addresses. LENn is used to mask the low-order address bits in the corresponding DRn
register so that they are not used in the address comparison. Breakpoint boundaries must be aligned on
an address corresponding to the range size specified by LENn. Assuming the access type matches the
type specified by R/Wn, a breakpoint occurs if any accessed byte falls within the range specified by
LENn. For instruction breakpoints, LENn must specify a single-byte range. The LENn encodings used
to specify the address range are described on page 332 of “Debug-Control Register (DR7).”
Table 13-1 shows several examples of data accesses, and whether or not they cause a #DB exception to
occur based on the breakpoint address in DRn and the breakpoint-address range specified by LENn. In
this table, R/Wn always specifies read/write access.
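The range comparison described above can be expressed as an interval-overlap test. This C sketch assumes an aligned DRn value, as the text requires, and masks the low-order bits as LENn does; the function names are hypothetical, and it models only the address match, not the R/Wn access-type check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a LENn field (00b, 01b, 10b, 11b) into a byte count. */
static unsigned len_field_bytes(unsigned len)
{
    static const unsigned bytes[4] = { 1, 2, 8, 4 };   /* 10b = 8 bytes (long mode only) */
    assert(len <= 3);
    return bytes[len];
}

/* True if any byte of the access [addr, addr + size) falls in the breakpoint range. */
static bool data_bp_hits(uint64_t drn, unsigned len, uint64_t addr, unsigned size)
{
    assert(size >= 1);
    uint64_t n  = len_field_bytes(len);
    uint64_t lo = drn & ~(n - 1);          /* LENn masks the low-order DRn bits */
    uint64_t hi = lo + n - 1;
    return addr <= hi && lo <= addr + size - 1;
}
```

With DRn = 1000h and LENn = 11b (4 bytes), a 1-byte access at 1003h or a 2-byte access at 0FFFh hits, while a 4-byte access at 1004h does not.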
Instruction breakpoints and general-detect conditions have a lower interrupt-priority than the other
breakpoint and single-stepping conditions (see “Priorities” on page 226). Data-breakpoint conditions
on the previous instruction occur before an instruction-breakpoint condition on the next instruction.
However, if instruction and data breakpoints can occur as a result of executing a single instruction, the
instruction breakpoint occurs first (before the instruction is executed), followed by the data breakpoint
(after the instruction is executed).
stack. If multiple instruction breakpoints are set, the debug handler can use the Bn field to identify
which register caused the breakpoint.
Returning from the debug handler causes the breakpoint instruction to be executed. Before returning
from the debug handler, the rFLAGS.RF bit should be set to 1 to prevent a reoccurrence of the #DB
exception due to the instruction-breakpoint condition. The processor ignores instruction-breakpoint
conditions when rFLAGS.RF=1, until after the next instruction (in this case, the breakpoint
instruction) is executed. After the next instruction is executed, the processor clears rFLAGS.RF to 0.
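The decision described above, whether to set RF in the saved rFLAGS image before returning, might be sketched as follows. The helper takes DR6, DR7, and the rFLAGS value from the handler's stack frame as plain integers; it is a hypothetical illustration, and a real handler would also need to locate and rewrite the stack frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_RF (1ull << 16)   /* resume flag */

/* If DR6 reports an enabled execution breakpoint (Bn set, Ln or Gn set,
   R/Wn = 00b), return the saved rFLAGS with RF set; otherwise unchanged. */
static uint64_t rflags_for_return(uint64_t saved_rflags, uint64_t dr6, uint64_t dr7)
{
    for (unsigned n = 0; n < 4; n++) {
        bool hit     = ((dr6 >> n) & 1u) != 0;               /* Bn */
        bool enabled = ((dr7 >> (2 * n)) & 3u) != 0;         /* Ln or Gn */
        unsigned rw  = (unsigned)((dr7 >> (16 + 4 * n)) & 3u); /* R/Wn */
        if (hit && enabled && rw == 0)
            return saved_rflags | RFLAGS_RF;
    }
    return saved_rflags;
}
```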
Data Breakpoints. Data breakpoints are set by loading a breakpoint-address register (DRn) with the
desired data virtual-address, and then setting the corresponding DR7 fields as follows:
• Ln or Gn is set to 1 to enable the breakpoint for either the local task or all tasks, respectively.
• R/Wn is set to 01b to specify that the data virtual-address is compared with the contents of DRn
only during a memory-write. Setting this field to 11b specifies that the comparison takes place
during both memory reads and memory writes.
• LENn is set to 00b, 01b, 11b, or 10b to specify an address-match range of one, two, four, or eight
bytes, respectively. Long mode must be active to set LENn to 10b.
When a #DB exception occurs due to a data breakpoint address in DRn, the corresponding Bn field in
DR6 is set to 1 to indicate that a breakpoint condition occurred. The breakpoint occurs after the data-
access instruction is executed, which means that the original data is overwritten by the data-access
instruction. If the debug handler needs to report the previous data value, it must save that value before
setting the breakpoint.
Because the breakpoint occurs after the data-access instruction is executed, the address of the
instruction following the data-access instruction is pushed onto the debug-handler stack. Repeated
string instructions, however, can trigger a breakpoint before all iterations of the repeat loop have
completed. When this happens, the address of the string instruction is pushed onto the stack during a
#DB exception if the repeat loop is not complete. A subsequent IRET from the #DB handler returns to
the string instruction, causing the remaining iterations to be executed. Most implementations cannot
report breakpoints exactly for repeated string instructions, but instead report the breakpoint on an
iteration later than the iteration where the breakpoint occurred.
I/O Breakpoints. I/O breakpoints are set by loading a breakpoint-address register (DRn) with the I/O-
port address to be trapped, and then setting the corresponding DR7 fields as follows:
• Ln or Gn is set to 1 to enable the breakpoint for either the local task or all tasks, respectively.
• R/Wn is set to 10b to specify that the I/O-port address is compared with the contents of DRn only
during execution of an I/O instruction. This encoding of R/Wn is valid only when debug extensions
are enabled (CR4.DE=1).
• LENn is set to 00b, 01b, or 11b to specify the breakpoint occurs on a byte, word, or doubleword I/O
operation, respectively.
The I/O-port address specified by the I/O instruction is zero extended by the processor to 64 bits before
comparing it with the DRn registers.
When a #DB exception occurs due to an I/O breakpoint in DRn, the corresponding Bn field in DR6 is
set to 1 to indicate that a breakpoint condition occurred. The breakpoint occurs after the instruction is
executed, which means that the original data is overwritten by the I/O instruction. If the debug
handler needs to report the previous data value, it must save that value before setting the breakpoint.
Because the breakpoint occurs after the instruction is executed, the address of the instruction following
the I/O instruction is pushed onto the debug-handler stack, in most cases. In the case of INS and OUTS
instructions that use the repeat prefix, however, the breakpoint occurs after the first iteration of the
repeat loop. When this happens, the I/O-instruction address can be pushed onto the stack during a #DB
exception if the repeat loop is not complete. A subsequent return from the debug handler causes the
next I/O iteration to be executed. If the breakpoint condition is still set, the #DB exception reoccurs
after that iteration is complete.
Task-Switch Breakpoints. Breakpoints can be set in a task's TSS to raise a #DB exception after a task
switch. Software enables a task breakpoint by setting the T bit in the TSS to 1. When a task switch
occurs into a task with the T bit set, the processor completes loading the new task state. Before the first
instruction is executed, the #DB exception occurs, and the processor sets DR6.BT to 1, indicating that
the #DB exception occurred as a result of a task breakpoint.
The processor does not clear the T bit in the TSS to 0 when the #DB exception occurs. Software must
explicitly clear this bit to disable the task breakpoint. Software should never set the T bit in the
debug-handler TSS if a separate task is used for #DB exception handling; otherwise, the processor
re-enters the debug handler in an endless loop.
Single-step breakpoints have a higher priority than external interrupts. If an external interrupt occurs
during single stepping, control is transferred to the #DB handler first, causing the rFLAGS.TF bit to be
cleared to 0. Next, before the first instruction in the debug handler is executed, the processor transfers
control to the pending-interrupt handler. This allows external interrupts to be handled outside of
single-step mode.
The INTn, INT3, and INTO instructions clear the rFLAGS.TF bit to 0 when they are executed. If a
debugger is used to single-step software that contains these instructions, it must emulate them instead
of executing them.
The single-step mechanism can also be set to single step only control transfers, rather than single step
every instruction. See “Single Stepping Control Transfers” on page 341 for additional information.
interrupted instruction, and the LastBranchToIP register is loaded with the offset of the interrupt or
exception handler.
• LastExceptionFromIP and LastExceptionToIP Registers—The processor loads these from the
LastBranchFromIP register and the LastBranchToIP register, respectively, when most interrupts
and exceptions are taken. These two registers are not updated, however, when #DB or #MC
exceptions are taken, or the ICEBP instruction is executed.
The processor automatically disables control-transfer recording when a debug exception (#DB) occurs
by clearing DebugCtlMSR.LBR to 0. The contents of the control-transfer recording MSRs are not
altered by the processor when the #DB occurs. Before exiting the debug-exception handler, software
can set DebugCtlMSR.LBR to 1 to re-enable the recording mechanism.
Debuggers can trace a control transfer backward from a bug to its source using the recording MSRs
and the breakpoint-address registers. The debug handler does this by updating the breakpoint registers
from the recording MSRs after a #DB exception occurs, and restarting the program. The program takes
a #DB exception on the previous control transfer, and this process can be repeated. The debug handler
cannot simply copy the contents of the recording MSR into the breakpoint-address register. The
recording MSRs hold segment offsets, while the debug registers hold virtual (linear) addresses. The
debug handler must calculate the virtual address by reading the code-segment selector (CS) from the
interrupt-handler stack, then reading the segment-base address from the CS descriptor, and adding that
base address to the offset in the recording MSR. The calculated virtual-address can then be used as a
breakpoint address.
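The offset-to-linear conversion described above can be sketched in C. The helpers assume the CS descriptor has already been fetched from the GDT or LDT as a raw 8-byte value, and that in 64-bit mode the CS base is treated as zero; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the 32-bit base address from a legacy 8-byte segment descriptor. */
static uint32_t seg_desc_base(uint64_t desc)
{
    return (uint32_t)(((desc >> 16) & 0xFFFFFFull)        /* base[23:0]  in bits 39:16 */
                    | (((desc >> 56) & 0xFFull) << 24));  /* base[31:24] in bits 63:56 */
}

/* Convert a recorded offset (for example, from LastBranchFromIP) to a
   linear address suitable for loading into DRn. */
static uint64_t recorded_ip_to_linear(uint64_t cs_desc, uint64_t offset, bool in_64bit_mode)
{
    if (in_64bit_mode)
        return offset;                                 /* CS base treated as zero */
    return (uint32_t)(seg_desc_base(cs_desc) + (uint32_t)offset);  /* 32-bit wrap */
}
```

For a descriptor with base 12345678h (raw value 12CF9A345678FFFFh), a recorded offset of 100h converts to the breakpoint address 12345778h.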
Single Stepping Control Transfers. Software can enable control-transfer single stepping by setting
DebugCtlMSR.BTF to 1 and rFLAGS.TF to 1. The processor automatically disables control-transfer
single stepping when a debug exception (#DB) occurs by clearing DebugCtlMSR.BTF to 0.
rFLAGS.TF is also cleared when a #DB exception occurs. Before exiting the debug-exception handler,
software must set both DebugCtlMSR.BTF and rFLAGS.TF to 1 to restart single stepping.
When enabled, this single-step mechanism causes a #DB exception to occur on every branch
instruction, interrupt, or exception. Debuggers can use this capability to perform a “coarse” single step
across blocks of code (bound by control transfers), and then, as the problem search is narrowed, switch
into a “fine” single-step mode on every instruction (DebugCtlMSR.BTF=0, rFLAGS.TF=1).
Debuggers can use both the single-step mechanism and recording mechanism to support full backward
and forward tracing of control transfers.
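Before returning from the #DB handler, a debugger that wants to keep recording and stepping has to re-set the bits that the exception cleared. The sketch below computes the new DebugCtlMSR and rFLAGS values for either the “coarse” (BTF=1) or “fine” (BTF=0) mode; writing DebugCtlMSR with WRMSR and patching the stack frame are left out, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEBUGCTL_LBR (1ull << 0)   /* DebugCtlMSR bit 0 */
#define DEBUGCTL_BTF (1ull << 1)   /* DebugCtlMSR bit 1 */
#define RFLAGS_TF    (1ull << 8)   /* trap flag */

/* New DebugCtlMSR value to write before leaving the #DB handler. */
static uint64_t rearm_debugctl(uint64_t debugctl, bool coarse_step)
{
    debugctl |= DEBUGCTL_LBR;           /* #DB cleared LBR; resume recording */
    if (coarse_step)
        debugctl |= DEBUGCTL_BTF;       /* trap on branches, interrupts, exceptions */
    else
        debugctl &= ~DEBUGCTL_BTF;      /* trap after every instruction */
    return debugctl;
}

/* New rFLAGS image for the handler's stack frame; TF is needed in both modes. */
static uint64_t rearm_rflags(uint64_t saved_rflags)
{
    return saved_rflags | RFLAGS_TF;
}
```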
PerfCtrn format: Bits 63–0  PerfCtrn (counter value)
The PerfCtrn registers are model-specific registers that can be read using a special read performance-
monitoring counter instruction, RDPMC. The RDPMC instruction loads the contents of the PerfCtrn
register specified by the ECX register into the EDX register and the EAX register. The high 32 bits are
loaded into EDX, and the low 32 bits are loaded into EAX. RDPMC can be executed only at CPL=0,
unless system software enables use of the instruction at all privilege levels. RDPMC can be enabled for
use at all privilege levels by setting CR4.PCE (the performance-monitor counter-enable bit) to 1.
When CR4.PCE = 0 and CPL > 0, attempts to execute RDPMC result in a general-protection
exception (#GP).
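A minimal C sketch of reading a counter with RDPMC and combining EDX:EAX into one 64-bit value follows. The inline-assembly wrapper uses GCC/Clang syntax and is compiled only on x86 targets; whether it can run at CPL 3 depends on CR4.PCE, which the operating system controls. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* RDPMC returns the high 32 bits in EDX and the low 32 bits in EAX. */
static uint64_t edx_eax_to_u64(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

#if defined(__x86_64__) || defined(__i386__)
/* Read PerfCtr[counter]. At CPL > 0 this raises #GP unless CR4.PCE = 1,
   so call it only when the operating system is known to allow it. */
static inline uint64_t read_perfctr(uint32_t counter)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
    return edx_eax_to_u64(hi, lo);
}
#endif
```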
The performance counters can also be read and written by system software running at CPL=0 using the
RDMSR and WRMSR instructions, respectively. Writing the performance counters can be useful if
software wants to count a specific number of events, and then trigger an interrupt when that count is
reached. An interrupt can be triggered when a performance counter overflows (see “Counter
Overflow” on page 346 for additional information). Software should use the WRMSR instruction to
load the count as a two’s-complement negative number into the performance counter. This causes the
counter to overflow after counting the appropriate number of times.
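To make a counter overflow after exactly N events, the value written with WRMSR is -N in two's-complement form, truncated to the counter's implemented width. The sketch below assumes that width is known; the 48 bits used in the usage note is an example value, not something this section specifies, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Preset for PerfCtrn so that it overflows after `events` increments.
   counter_bits is the implementation's counter width (assumed known). */
static uint64_t perfctr_preset(uint64_t events, unsigned counter_bits)
{
    assert(counter_bits > 0 && counter_bits < 64);
    uint64_t mask = (1ull << counter_bits) - 1;
    assert(events > 0 && events <= mask);
    return ((uint64_t)0 - events) & mask;   /* -events, truncated to the counter width */
}
```

perfctr_preset(1000, 48) gives FFFFFFFFFC18h; after 1000 increments the 48-bit counter wraps to zero, which is the overflow condition.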
The performance counters are not guaranteed to produce identical measurements each time they are
used to measure a particular instruction sequence, and they should not be used to take measurements of
very small instruction sequences. The RDPMC instruction is not serializing, and it can be executed
out-of-order with respect to other instructions around it. Even when bound by serializing instructions,
the system environment at the time the instruction is executed can cause events to be counted before
the counter value is loaded into EDX:EAX.
PerfEvtSeln format:
Bits 63–42  Reserved
Bit 41      HO
Bit 40      GO
Bits 39–36  Reserved
Bits 35–32  Event Mask[11–8]
Bits 31–24  Counter Mask
Bit 23      INV
Bit 22      EN
Bit 21      Reserved
Bit 20      INT
Bit 19      PC
Bit 18      E
Bit 17      OS
Bit 16      USR
Bits 15–8   Unit Mask
Bits 7–0    Event Mask[7–0]
• Event Mask—Bits 7–0, read/write. This field specifies the event or event duration to be
counted by the corresponding PerfCtrn register. The events that can be counted are implementation-
dependent. For more information, refer to the BIOS writer’s guide for the implementation.
• Unit Mask—Bits 15–8, read/write. This field can be used to specify a particular processor unit to
be monitored, if the event counted can be produced by multiple units within the processor.
Implementations can also use this field to further specify or qualify a monitored event.
• Operating-System Mode (OS) and User Mode (USR)—Bits 17–16 (respectively), read/write.
Software uses these bits to control the privilege level at which event counting is performed
according to Table 13-3.
• Edge Detect (E)—Bit 18, read/write. Software sets this bit to 1 to count the number of edge
transitions from the negated to asserted state. This feature is useful when coupled with event-
duration monitoring, as it can be used to calculate the average time spent in an event. Clearing this
bit to 0 disables edge detection.
• Pin Control (PC)—Bit 19, read/write. Software sets this bit to 1 to cause the external PMi pins on
the processor to toggle when the counter overflows. When this bit is cleared to 0, the processor
toggles the PMi pins each time it increments the performance counter.
• Interrupt Enable (INT)—Bit 20, read/write. Software sets this bit to 1 to enable an interrupt to
occur when the performance counter overflows (see “Counter Overflow” on page 346 for
additional information). Clearing this bit to 0 disables the triggering of the interrupt.
• Counter Enable (EN)—Bit 22, read/write. Software sets this bit to 1 to enable the PerfEvtSeln
register, and counting in the corresponding PerfCtrn register. Clearing this bit to 0 disables the
register pair.
• Invert Mask (INV)—Bit 23, read/write. Software sets this bit to 1 to invert the comparison result
performed on the counter-mask field, so that a less-than or equal-to comparison can be performed.
Clearing this bit to 0 leaves the comparison result alone, so that a greater-than or equal-to
comparison is reported.
• Counter Mask—Bits 31–24, read/write. This field is used to set a threshold for counting multiple
events that can occur in a single clock. If the number of events occurring in the single clock is
greater than or equal to this field, the corresponding PerfCtrn register is incremented. PerfCtrn is
not incremented if the number of events is less than the count mask.
The INV bit, when set, causes the PerfCtrn register to be incremented when the comparison is less
than or equal to the count mask. In this case, PerfCtrn is not incremented if the number of events is
greater than the count mask.
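The field descriptions above translate directly into bit positions. The encoder below is a hedged sketch with a hypothetical name (`perfevtsel_encode`); it only packs fields and says nothing about which event numbers are valid, since those are implementation dependent.

```c
#include <stdint.h>

/* Flag bits of PerfEvtSeln, positions as listed above. */
#define PES_USR  (UINT64_C(1) << 16)  /* count at CPL > 0 */
#define PES_OS   (UINT64_C(1) << 17)  /* count at CPL = 0 */
#define PES_EDGE (UINT64_C(1) << 18)  /* edge detect */
#define PES_PC   (UINT64_C(1) << 19)  /* pin control */
#define PES_INT  (UINT64_C(1) << 20)  /* interrupt on overflow */
#define PES_EN   (UINT64_C(1) << 22)  /* counter enable */
#define PES_INV  (UINT64_C(1) << 23)  /* invert counter-mask comparison */

/* Pack the event mask (bits 7-0), unit mask (bits 15-8),
 * counter mask (bits 31-24), and any flag bits. */
uint64_t perfevtsel_encode(uint8_t event, uint8_t unit_mask,
                           uint8_t counter_mask, uint64_t flags)
{
    return (uint64_t)event
         | ((uint64_t)unit_mask << 8)
         | ((uint64_t)counter_mask << 24)
         | flags;
}
```

For example, `perfevtsel_encode(0x76, 0, 0, PES_USR | PES_OS | PES_EN)` yields 0x430076; the event number 0x76 is only a placeholder here.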
The performance event-select registers can be read and written only by system software running at
CPL = 0 using the RDMSR and WRMSR instructions, respectively. Any attempt to read or write these
registers at CPL > 0 causes a general-protection exception to occur.
Starting and Stopping. Performance counting in a PerfCtrn register is initiated by setting the
corresponding PerfEvtSeln.EN bit to 1. Counting is stopped by clearing PerfEvtSeln.EN to 0.
Software must initialize the remaining PerfEvtSeln fields with the appropriate setup information
before or at the same time EN is set. Counting begins when the WRMSR instruction that sets
PerfEvtSeln.EN to 1 completes execution. Counting stops when the WRMSR instruction that clears
PerfEvtSeln.EN to 0 completes execution.
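The start and stop rules can be sketched against a simulated register file. Everything here is hypothetical: `write_perfevtsel` and `read_perfevtsel` stand in for WRMSR and RDMSR (which require CPL 0), and the choice of four PerfEvtSel registers is an assumption. The point is the ordering: the configuration goes out in the same write that sets EN, and stopping rewrites the register with only EN cleared.

```c
#include <stdint.h>

#define PES_EN (UINT64_C(1) << 22)  /* PerfEvtSeln.EN */

/* Hypothetical stand-in for four PerfEvtSel MSRs. */
static uint64_t fake_perfevtsel[4];

void write_perfevtsel(unsigned n, uint64_t value) { fake_perfevtsel[n] = value; }
uint64_t read_perfevtsel(unsigned n) { return fake_perfevtsel[n]; }

/* Start: all remaining fields are written together with EN = 1. */
void perf_start(unsigned n, uint64_t config)
{
    write_perfevtsel(n, config | PES_EN);
}

/* Stop: rewrite the register with EN cleared and other fields unchanged. */
void perf_stop(unsigned n)
{
    write_perfevtsel(n, read_perfevtsel(n) & ~PES_EN);
}
```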
TSC register layout: bits 63–0 hold the time-stamp count.
The TSC is a model-specific register that can also be read using one of the special read time-stamp
counter instructions, RDTSC (Read Time-Stamp Counter (TSC)) or RDTSCP (Read Time-Stamp
Counter and Processor ID). The RDTSC and RDTSCP instructions load the contents of the TSC into
the EDX register and the EAX register. The high 32 bits are loaded into EDX, and the low 32 bits are
loaded into EAX. The RDTSC and RDTSCP instructions can be executed at any privilege level and
from any processor mode. However, system software can disable the RDTSC or RDTSCP instructions
for programs that run at CPL > 0 by setting CR4.TSD (the time-stamp disable bit) to 1. When
CR4.TSD = 1 and CPL > 0, attempts to execute RDTSC or RDTSCP result in a general-protection
exception (#GP).
Some implementations allow the TSC register to be read and written using the RDMSR and WRMSR
instructions, respectively. Support of this capability, however, is not required by the architecture, and
software should avoid using these instructions to access the TSC. The programmer should use the
CPUID instruction to determine whether these features are supported. If EDX bit 4 (as returned by
CPUID function 1) is set, then the processor supports TSC, the RDTSC instruction and CR4.TSD. If
EDX bit 27 returned by CPUID function 8000_0001h is set, then the processor supports the RDTSCP
instruction.
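The CPUID checks and the CR4.TSD rule can be combined into a few predicates. This is a sketch under stated assumptions: the caller supplies EDX values already obtained by executing CPUID (for instance through GCC's `__get_cpuid` from `<cpuid.h>`), the helper names are hypothetical, and the CR4.TSD position (bit 2) comes from the standard CR4 layout rather than from this section.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_EDX_TSC      (UINT32_C(1) << 4)   /* CPUID function 1, EDX bit 4 */
#define CPUID81_EDX_RDTSCP  (UINT32_C(1) << 27)  /* CPUID function 8000_0001h, EDX bit 27 */
#define CR4_TSD             (UINT64_C(1) << 2)   /* assumed position of CR4.TSD */

/* TSC, RDTSC, and CR4.TSD are supported. */
bool cpu_has_tsc(uint32_t fn1_edx) { return (fn1_edx & CPUID1_EDX_TSC) != 0; }

/* RDTSCP is supported. */
bool cpu_has_rdtscp(uint32_t fn80000001_edx)
{
    return (fn80000001_edx & CPUID81_EDX_RDTSCP) != 0;
}

/* RDTSC/RDTSCP raise #GP when CR4.TSD = 1 and CPL > 0. */
bool rdtsc_raises_gp(unsigned cpl, uint64_t cr4)
{
    return cpl > 0 && (cr4 & CR4_TSD) != 0;
}
```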
The TSC register can be used by performance-analysis applications, along with the performance-
monitoring registers, to help determine the relative frequency of an event or its duration. Software can
also use the TSC to time software routines to help identify candidates for optimization. In general, the
TSC should not be used to take very short time measurements, because the resulting measurement is
not guaranteed to be identical each time it is made. The RDTSC instruction (unlike the RDTSCP
instruction) is not serializing, and can be executed out-of-order with respect to other instructions
around it. Even when bound by serializing instructions, the system environment at the time the
instruction is executed can cause additional cycles to be counted before the TSC value is loaded into
EDX:EAX.
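Because repeated TSC measurements of the same routine differ, timing code typically takes several samples. The sketch below assumes the raw 64-bit readings are already available; reporting the minimum is one common noise-reduction choice, not a method this manual prescribes, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Elapsed ticks between two TSC readings; unsigned arithmetic stays
 * correct even if the 64-bit counter wraps between the reads. */
uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Smallest elapsed time over n (start, end) pairs. */
uint64_t tsc_min_sample(const uint64_t *starts, const uint64_t *ends, size_t n)
{
    uint64_t best = UINT64_MAX;
    for (size_t i = 0; i < n; i++) {
        uint64_t d = tsc_elapsed(starts[i], ends[i]);
        if (d < best)
            best = d;
    }
    return best;
}
```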
The behavior of the RDTSC (Read Time-Stamp Counter (TSC)) and RDTSCP (Read Time-Stamp
Counter and Processor ID) instructions is implementation dependent. When using these instructions,
programmers must be aware that the TSC counts at a constant rate but, depending on the processor
implementation, may be affected by power-management events (such as frequency changes). Consult the
BIOS and kernel developer’s guide for your AMD processor implementation for information concerning
the effect of power management on the TSC.
• Internal ROMs, such as the microcode ROM and floating-point constant ROM.
• Branch-prediction structures.
EAX is loaded with zero if BIST completes without detecting errors. If any hardware faults are
detected during BIST, a non-zero value is loaded into EAX.
Table 14-2 on page 352 shows the initial state of the segment-register attributes (located in the hidden
portion of the segment registers) following either RESET# or INIT.
x87 Floating-Point State Initialization. Table 14-3 on page 354 shows the differences between the
initial x87 floating-point state following a RESET# and the state established by the FINIT/FNINIT
instruction. An INIT does not modify the x87 floating-point state. The initialization software can
execute an FINIT or FNINIT instruction to prepare the x87 floating-point unit for use by application
software. The FINIT and FNINIT instructions have no effect on the 64-bit media state.
Initialization software should also load the MP, EM, and NE bits in the CR0 register as appropriate for
the operating system. The recommended settings for implementations of the AMD64 architecture are:
• MP=1—Setting MP to 1 causes a device-not-available exception (#NM) to occur when the
FWAIT/WAIT instruction is executed and the task-switched bit (CR0.TS) is set to 1. This supports
operating systems that perform lazy context-switching of x87 floating-point state.
• EM=0—Clearing EM to 0 allows the x87 floating-point unit to execute instructions instead of
raising a #NM exception (as happens when CR0.EM=1). System software sets EM to 1 only when
software emulation of x87 instructions is desired.
• NE=1—Setting NE to 1 causes x87 floating-point exceptions to be handled by the floating-point
exception-pending exception (#MF) handler. Clearing this bit causes the processor to externally
indicate the exception occurred, and an external device can then cause an external interrupt to
occur in response.
Refer to “CR0 Register” on page 42 for additional information on these control bits.
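The recommended CR0 settings amount to setting two bits and clearing one. This hedged sketch assumes the standard CR0 bit positions (MP = bit 1, EM = bit 2, NE = bit 5), which this section does not restate; the function name is hypothetical, and actually loading CR0 (a privileged MOV) is omitted.

```c
#include <stdint.h>

#define CR0_MP (UINT64_C(1) << 1)  /* monitor coprocessor */
#define CR0_EM (UINT64_C(1) << 2)  /* emulate coprocessor */
#define CR0_NE (UINT64_C(1) << 5)  /* numeric error */

/* Apply MP=1, EM=0, NE=1 to a CR0 value, leaving all other bits as-is. */
uint64_t cr0_apply_x87_defaults(uint64_t cr0)
{
    cr0 |= CR0_MP | CR0_NE;
    cr0 &= ~CR0_EM;
    return cr0;
}
```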
64-Bit Media State Initialization. There are no special requirements placed on software to initialize
the processor state used by 64-bit media instructions. This state is initialized completely by the
processor following a RESET#. System software should leave CR0.EM cleared to 0 to allow execution
of the 64-bit media instructions. If CR0.EM is set to 1, attempted execution of the 64-bit media
instructions causes an invalid-opcode exception (#UD).
128-Bit Media State Initialization. BIOS or system software must also prepare the processor to
allow execution of 128-bit media instructions. The required preparations include:
• Leaving CR0.EM cleared to 0 to allow execution of the 128-bit media instructions. If CR0.EM is
set to 1, attempted execution of the 128-bit media instructions causes an invalid-opcode exception
(#UD).
• Enabling the 128-bit media instructions by setting CR4.OSFXSR to 1. Software cannot execute the
128-bit media instructions unless this bit is set. Setting this bit also indicates that system software
uses the FXSAVE and FXRSTOR instructions to save and restore, respectively, the 128-bit media
state. These instructions also save and restore the 64-bit media state and x87 floating-point state.
• Indicating that system software uses the SIMD floating-point exception (#XF) for handling 128-bit
media floating-point exceptions. This is done by setting CR4.OSXMMEXCPT to 1.
• Setting (optionally) the MXCSR mask bits to mask or unmask 128-bit media floating-point
exceptions as desired. Because this register can be read and written by application software, it is
not absolutely necessary for system software to initialize it.
Refer to “CR4 Register” on page 47 for additional information on these CR4 control bits.
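The 128-bit media preparation steps can be sketched as pure bit manipulation. Assumptions, not stated in this section: CR4.OSFXSR is bit 9, CR4.OSXMMEXCPT is bit 10, the MXCSR exception masks occupy bits 7–12 with the divide-by-zero mask (ZM) at bit 9, and MXCSR starts from its usual reset value 1F80h (all six exceptions masked). The function names are hypothetical.

```c
#include <stdint.h>

#define CR0_EM         (UINT64_C(1) << 2)
#define CR4_OSFXSR     (UINT64_C(1) << 9)
#define CR4_OSXMMEXCPT (UINT64_C(1) << 10)
#define MXCSR_ZM       (UINT32_C(1) << 9)  /* divide-by-zero mask */

/* Leave EM clear and set OSFXSR and OSXMMEXCPT, as listed above. */
void enable_128bit_media(uint64_t *cr0, uint64_t *cr4)
{
    *cr0 &= ~CR0_EM;
    *cr4 |= CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/* Optional MXCSR policy example: unmask only divide-by-zero. */
uint32_t mxcsr_unmask_zero_divide(uint32_t mxcsr)
{
    return mxcsr & ~MXCSR_ZM;
}
```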
Some model-specific features are not pervasive across processor implementations and are therefore
not described in this volume. For more information on these features and their initialization
requirements, refer to the BIOS writer’s guide for the implementation.
- A read/write data segment that can be used as a protected-mode stack. This stack can be used
by the interrupt mechanism if interrupts or exceptions occur.
Software can optionally load the GDT with one or more data segment descriptors, a TSS descriptor,
and an LDT descriptor for use by long-mode initialization software.
After the protected-mode data structures are initialized, system software must load the IDTR and
GDTR (and optionally, the LDTR and TR) with pointers to those data structures. Once these registers
are initialized, protected mode can be enabled by setting CR0.PE to 1.
If legacy paging is used during the long-mode initialization process, the page-translation tables must
be initialized before enabling paging. At a minimum, one page directory and one page table are
required to support page translation. The CR3 register must be loaded with the starting physical
address of the highest-level table supported in the page-translation hierarchy. After these structures are
initialized and protected mode is enabled, paging can be enabled by setting CR0.PG to 1.
The existing protected-mode GDT can be used to hold the long-mode descriptors described above.
• A single 64-bit TSS for holding the privilege-level 0, 1, and 2 stack pointers, the interrupt-stack-
table pointers, and the I/O-redirection-bitmap base address (if required). This is the only TSS
required, because hardware task-switching is not supported in long mode. See “64-Bit Task State
Segment” on page 317 for more information.
• The 4-level page-translation tables required by long mode. Long mode also requires the use of
physical-address extensions (PAE) to support physical-address sizes greater than 32 bits. See
“Long-Mode Page Translation” on page 128 for more information.
If paging is enabled during the initialization process, it must be disabled before enabling long mode.
After the long-mode data structures are initialized, and paging is disabled, software can enable and
activate long mode.
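Mode | Submode | EFER.LMA (note 1) | CS.L | CS.D | Default Address Size (bits, note 2) | Default Data Size (bits, note 2)
Long Mode | 64-Bit Mode | 1 | 1 | 0 | 64 | 32
Long Mode | Compatibility Mode | 1 | 0 | 1 | 32 | 32
Long Mode | Compatibility Mode | 1 | 0 | 0 | 16 | 16
Legacy Mode | n/a | 0 | x | 1 | 32 | 32
Legacy Mode | n/a | 0 | x | 0 | 16 | 16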
Note:
1. EFER.LMA is set by the processor when software sets EFER.LME and CR0.PG
according to the sequence described in “Activating Long Mode” on page 359.
2. See “Instruction Prefixes” in Volume 1 for overrides to default sizes.
Long mode uses two code-segment-descriptor bits, CS.L and CS.D, to control the operating
submodes. If long mode is active, CS.L = 1, and CS.D = 0, the processor is running in 64-bit mode, as
shown in Table 14-4 on page 358. With this encoding (CS.L=1, CS.D=0), default operand size is 32
bits and default address size is 64 bits. Using instruction prefixes, the default operand size can be
overridden to 64 bits or 16 bits, and the default address size can be overridden to 32 bits.
The final encoding of CS.L and CS.D in long mode (CS.L=1, CS.D=1) is reserved for future use.
When long mode is active and CS.L is cleared to 0, the processor is in compatibility mode, as shown in
Table 14-4 on page 358. In compatibility mode, CS.D controls default operand and address sizes
exactly as it does in the legacy x86 architecture. Setting CS.D to 1 specifies default operand and
address sizes as 32 bits. Clearing CS.D to 0 specifies default operand and address sizes as 16 bits.
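The mode selection described in Table 14-4 and the paragraphs above can be written as a small decoder. This is an illustrative sketch with hypothetical type and function names; it encodes only the rules stated here, including the reserved CS.L=1, CS.D=1 combination.

```c
#include <stdbool.h>

enum cpu_mode { MODE_64BIT, MODE_COMPAT, MODE_LEGACY, MODE_RESERVED };

struct mode_info {
    enum cpu_mode mode;
    unsigned addr_bits;     /* default address size */
    unsigned operand_bits;  /* default operand (data) size */
};

struct mode_info decode_mode(bool efer_lma, bool cs_l, bool cs_d)
{
    struct mode_info m;
    if (efer_lma && cs_l) {
        if (cs_d) {  /* CS.L=1, CS.D=1 is reserved */
            m.mode = MODE_RESERVED; m.addr_bits = 0; m.operand_bits = 0;
        } else {     /* 64-bit mode: 64-bit addresses, 32-bit operands */
            m.mode = MODE_64BIT; m.addr_bits = 64; m.operand_bits = 32;
        }
        return m;
    }
    /* Compatibility and legacy mode: CS.D alone selects 32- or 16-bit
     * defaults; CS.L is ignored when long mode is inactive. */
    m.mode = efer_lma ? MODE_COMPAT : MODE_LEGACY;
    m.addr_bits = m.operand_bits = cs_d ? 32 : 16;
    return m;
}
```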
relocate the page tables anywhere in physical memory, and re-initialize the CR3 register, after long
mode is activated.
Intercepting physical interrupt delivery. The VMM can request that physical interrupts cause a
running guest to exit, allowing the VMM to process the interrupt.
Virtual interrupts. The VMM can inject virtual interrupts into the guest. Under control of the VMM,
a virtual copy of the EFLAGS.IF interrupt mask bit, and a virtual copy of the APIC's task priority
register are used transparently by the guest instead of the physical resources.
Sharing a physical APIC. SVM allows multiple guests to share a physical APIC while guarding
against malicious or defective guests that might leave high-priority interrupts unacknowledged forever
(and thus shut out other guests' interrupts).
Attestation. The SKINIT instruction and associated system support (the Trusted Platform Module, or
TPM) allow for verifiable startup of trusted software (such as a VMM), based on secure hash
comparison.
“VMSAVE and VMLOAD Instructions” on page 388, “Global Interrupt Flag, STGI and CLGI
Instructions” on page 390.)
• Intercepts—allow the VMM to intercept sensitive operations in the guest. (“Intercept Operation”
on page 375 through “Miscellaneous Intercepts” on page 388)
• Interrupt and APIC assists—physical interrupt intercepts, virtual interrupt support, APIC.TPR
virtualization. (“Global Interrupt Flag, STGI and CLGI Instructions” on page 390 and “Interrupt
and Local APIC Support” on page 393)
• SMM intercepts and assists (“SMM Support” on page 396)
• External (DMA) access protection (“External Access Protection” on page 399)
• Nested paging support for two levels of address translation. (“Nested Paging” on page 406)
• Security—SKINIT instruction. (“Secure Startup with SKINIT” on page 414)
if (VM_CR.SVMDIS == 0)
    return SVM_ALLOWED;
if (CPUID 8000_000A.EDX[SVM_LOCK] == 0)
    // the user must change a BIOS setting to enable SVM
    return SVM_DISABLED_AT_BIOS_NOT_UNLOCKABLE;
else
    // SVMLock may be unlockable; consult the BIOS or TPM to obtain the key.
    return SVM_DISABLED_WITH_KEY;
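A runnable version of the checks in the fragment above might look like this. It is a sketch, not AMD's code: the caller passes in the VM_CR MSR value and CPUID 8000_000Ah EDX, and the bit positions (SVMDIS at VM_CR bit 4, the SVM-lock flag at EDX bit 2) are assumptions taken from AMD's SVM documentation rather than from this excerpt. Only the checks shown in the fragment are reproduced.

```c
#include <stdint.h>

#define VM_CR_SVMDIS        (UINT64_C(1) << 4)  /* assumed bit position */
#define CPUID8A_EDX_SVMLOCK (UINT32_C(1) << 2)  /* assumed bit position */

enum svm_status {
    SVM_ALLOWED,
    SVM_DISABLED_AT_BIOS_NOT_UNLOCKABLE,
    SVM_DISABLED_WITH_KEY
};

enum svm_status svm_check(uint64_t vm_cr, uint32_t cpuid_8000000a_edx)
{
    if ((vm_cr & VM_CR_SVMDIS) == 0)
        return SVM_ALLOWED;
    if ((cpuid_8000000a_edx & CPUID8A_EDX_SVMLOCK) == 0)
        return SVM_DISABLED_AT_BIOS_NOT_UNLOCKABLE;  /* BIOS setting must change */
    return SVM_DISABLED_WITH_KEY;  /* lock may be removable with a key */
}
```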
• various control bits that specify the execution environment of the guest or that indicate special
actions to be taken before running guest code, and
• guest processor state (such as control registers, etc.).
Saving Host State. To ensure that the host can resume operation after #VMEXIT, VMRUN saves at
least the following host state information at the physical address specified in the new MSR
VM_HSAVE_PA:
• CS.SEL, NEXT_RIP—The CS selector and rIP of the instruction following the VMRUN. On
#VMEXIT the host resumes running at this address.
• RFLAGS, RAX—Host processor mode and the register used by VMRUN to address the VMCB.
• SS.SEL, RSP—Stack pointer for host.
• CR0, CR3, CR4, EFER—Paging/operating mode for host.
• IDTR, GDTR—The pseudo-descriptors. VMRUN does not save or restore the host LDTR.
Loading Guest State. After saving host state, VMRUN loads the following guest state from the
VMCB:
• CS, rIP—Guest begins execution at this address. The hidden state of the CS segment register is
also loaded from the VMCB.
• RFLAGS, RAX.
• SS, RSP—Includes the hidden state of the SS segment register.
• CR0, CR2, CR3, CR4, EFER—Guest paging mode. Writing paging-related control registers with
VMRUN does not flush the TLB since address spaces are switched. See Section 15.15, “TLB
Control,” on page 389.
• INTERRUPT_SHADOW—This flag indicates whether the guest is currently in an interrupt
lockout shadow; see “Interrupt Shadows” on page 395.
• IDTR, GDTR.
• ES and DS—Includes the hidden state of the segment registers.
• DR7 and DR6—The guest’s breakpoint state.
• V_TPR—The guest’s virtual TPR.
• V_IRQ—The flag indicating whether a virtual interrupt is pending in the guest.
• CPL—If the guest is in real mode, the CPL is forced to 0; if the guest is in v86 mode, the CPL is
forced to 3. Otherwise, the CPL saved in the VMCB is used.
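The CPL rule in the last bullet can be sketched as a function of the guest's loaded CR0 and RFLAGS. The name `effective_guest_cpl` is hypothetical, and identifying real mode by CR0.PE = 0 (bit 0) and virtual-8086 mode by RFLAGS.VM = 1 (bit 17) relies on the standard x86 definitions rather than on this section.

```c
#include <stdint.h>

#define CR0_PE    (UINT64_C(1) << 0)   /* protection enable */
#define RFLAGS_VM (UINT64_C(1) << 17)  /* virtual-8086 mode */

unsigned effective_guest_cpl(uint64_t cr0, uint64_t rflags, unsigned vmcb_cpl)
{
    if ((cr0 & CR0_PE) == 0)
        return 0;          /* real mode: CPL forced to 0 */
    if (rflags & RFLAGS_VM)
        return 3;          /* virtual-8086 mode: CPL forced to 3 */
    return vmcb_cpl;       /* otherwise the VMCB CPL field is used */
}
```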
The processor checks the loaded guest state for consistency. If a consistency check fails while loading
guest state, the processor performs a #VMEXIT. For additional information, see “Canonicalization
and Consistency Checks” on page 373.
If the guest is in PAE paging mode according to the registers just loaded, the processor will also read
the four PDPEs pointed to by the newly loaded CR3 value; setting any reserved bits in the PDPEs also
causes a #VMEXIT.
It is possible for the VMRUN instruction to load a guest rIP that is outside the limit of the guest code
segment or that is non-canonical (if running in long mode). If this occurs, a #GP fault is delivered
inside the guest; the rIP falling outside the limit of the guest code segment is not considered illegal
guest state.
After all guest state is loaded, and intercepts and other control bits are set up, the processor reenables
interrupts by setting GIF to 1. It is assumed that VMM software cleared GIF some time before
executing the VMRUN instruction, to ensure an atomic state switch.
Control Bits. Besides loading guest state, the VMRUN instruction reads various control fields from
the VMCB; most of these fields are not written back to the VMCB on #VMEXIT, since they cannot
change during guest execution:
• TSC_OFFSET—an offset to add when the guest reads the TSC (time stamp counter). Guest writes
to the TSC can be intercepted and emulated by changing the offset (without writing the physical
TSC). This offset is cleared when the guest exits back to the host.
• V_INTR_PRIO, V_INTR_VECTOR, V_IGN_TPR—fields used to describe a virtual interrupt for
the guest (see “Injecting Virtual (INTR) Interrupts” on page 394).
• V_INTR_MASKING—controls whether masking of interrupts (in EFLAGS.IF and TPR) is to be
virtualized (see Section 15.20 on page 393).
• The address space ID (ASID) to use while running the guest. (See the CPUID Specification, order#
25481, for feature identification, including how many ASIDs are implemented.)
• A field to control flushing of the TLB during a VMRUN (see Section 15.15).
• The intercept vector describing the active intercepts for the guest. On exit from the guest, the
internal intercept registers are cleared so no host operations will be intercepted.
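The TSC_OFFSET mechanism can be sketched as follows, assuming the offset is added to the host TSC modulo 2^64. The function names are hypothetical. `offset_for_guest_write` shows how a VMM might emulate an intercepted guest TSC write by changing the offset rather than the physical TSC.

```c
#include <stdint.h>

/* Guest-visible TSC: host TSC plus TSC_OFFSET, wrapping modulo 2^64. */
uint64_t guest_tsc(uint64_t host_tsc, uint64_t tsc_offset)
{
    return host_tsc + tsc_offset;
}

/* Choose an offset so the guest reads `value` at host time `host_tsc_now`. */
uint64_t offset_for_guest_write(uint64_t host_tsc_now, uint64_t value)
{
    return value - host_tsc_now;
}
```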
Segment State in the VMCB. The segment registers are stored in the VMCB in a format similar to
that for SMM: both base and limit are fully expanded; segment attributes are stored as 12-bit values
formed by the concatenation of bits 55–52 and 47–40 from the original 64-bit (in-memory) segment
descriptors; the descriptor “P” bit is used to signal NULL segments (P==0) where permissible and/or
relevant. When loaded from the VMCB, only some of the attribute bits are observed by hardware,
depending on the segment register in question:
• CS—D, L, R (null code segments are not allowed).
• SS—B, P, DPL, E, W (null stack segments allowed in 64-bit mode only).
• DS, ES, FS, GS—D, P, DPL, E, W, Code/Data.
• LDTR—Only the P bit is observed.
• TR—Only TSS type (32 or 16 bit) is relevant, since a null TSS is not allowed.
The VMM should follow these rules when storing segment attributes into the VMCB:
• For NULL segments, set all attribute bits to zero; otherwise, write the concatenation of bits
[55–52] and [47–40] from the original 64-bit (in-memory) segment descriptors.
• The processor reads the current privilege level from the CPL field in the VMCB, not from SS.DPL.
However, SS.DPL should match the CPL field.
• When in virtual x86 or real mode, the processor ignores the CPL field in the VMCB and forces the
values of 3 and 0, respectively.
When examining segment attributes after a #VMEXIT:
• Test the Present (P) bit to check whether a segment is NULL; note that CS and TR never contain
NULL segments and so their P bit is ignored;
• Retrieve the CPL from the CPL field in the VMCB, not from any segment DPL.
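The 12-bit attribute format described above can be produced from a raw 64-bit descriptor as shown here. This is a minimal sketch with hypothetical helper names; the field names in the comments (type, S, DPL, P and AVL, L, D/B, G) are the standard x86 descriptor fields in those bit ranges.

```c
#include <stdint.h>

/* Descriptor bits 47-40 (type, S, DPL, P) become attribute bits 7-0;
 * descriptor bits 55-52 (AVL, L, D/B, G) become attribute bits 11-8. */
uint16_t vmcb_attr_from_descriptor(uint64_t desc)
{
    uint16_t low  = (uint16_t)((desc >> 40) & 0xFF);
    uint16_t high = (uint16_t)((desc >> 52) & 0xF);
    return (uint16_t)((high << 8) | low);
}

/* The descriptor P bit (bit 47) lands in attribute bit 7; P == 0 marks NULL. */
int vmcb_attr_present(uint16_t attr)
{
    return (attr >> 7) & 1;
}
```

For a flat 64-bit code descriptor such as 00AF_9A00_0000_FFFFh, the packed attribute is A9Ah with the P bit set.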
VMRUN and TF/RF Bits in EFLAGS. When considering interactions of VMRUN with the TF and
RF bits in EFLAGS, one must distinguish the behavior of the host from that of the guest.
From the host point of view, VMRUN acts like a single instruction, even though an arbitrary number of
guest instructions may execute before a #VMEXIT effectively completes the VMRUN. As a single
host instruction, VMRUN interacts with EFLAGS.RF and EFLAGS.TF like ordinary instructions.
EFLAGS.RF suppresses any potential instruction breakpoint match on the VMRUN, and EFLAGS.TF
causes a #DB trap after the VMRUN completes on the host side (i.e., after the #VMEXIT from the
guest). As with any normal instruction, completion of the VMRUN instruction clears the host
EFLAGS.RF bit.
The value of EFLAGS.RF from the VMCB affects the first guest instruction. When VMRUN loads a
guest value of 1 for EFLAGS.RF, that value takes effect and suppresses any potential (guest)
instruction breakpoint on the first guest instruction. When VMRUN loads a guest value of 1 in
EFLAGS.TF, that value does not cause a trace trap between the VMRUN and the first guest
instruction, but rather after completion of the first guest instruction.
Host values of EFLAGS have no effect on the guest and guest values of EFLAGS have no effect on the
host.
See also Section 15.7.1 on page 375 regarding the value of EFLAGS.RF saved on #VMEXIT.
15.6 #VMEXIT
When an intercept triggers, the processor performs a #VMEXIT (i.e., an exit from the guest to the host
context).
On #VMEXIT, the processor:
• Disables interrupts by clearing the GIF, so that after the #VMEXIT, VMM software can complete
the state switch atomically.
• Writes back to the VMCB the current guest state—the same subset of processor state as is loaded
by the VMRUN instruction, including the V_IRQ, V_TPR, and the INTERRUPT_SHADOW bits.
• Saves the reason for exiting the guest in the VMCB’s EXITCODE field; additional information
may be saved in the EXITINFO1 or EXITINFO2 fields, depending on the intercept.
• Clears all intercepts.
• Resets the current ASID register to zero (host ASID).
• Clears the V_IRQ and V_INTR_MASKING bits inside the processor.
• Clears the TSC_OFFSET inside the processor.
• Reloads the host state previously saved by the VMRUN instruction. The processor reloads the
host’s CS, SS, DS, and ES segment registers and, if required, re-reads the descriptors from the
host’s segment descriptor tables, depending on the implementation. The segment descriptor tables
must be mapped as present and writable by the host's page tables. Software should keep the host’s
segment descriptor tables consistent with the segment registers when executing VMRUN
instructions. Because the processor still holds the guest value of LDTR immediately after #VMEXIT,
the VMM must use only segment descriptors from the global descriptor table for CS, SS, DS, and ES.
Any exception encountered while reloading the host segments causes a shutdown.
• If the host is in PAE mode, the processor reloads the host's PDPEs from the page table indicated by
the host's CR3. If the PDPEs contain illegal state, the processor causes a shutdown.
• Forces CR0.PE = 1, RFLAGS.VM = 0.
• Sets the host CPL to zero.
• Disables all breakpoints in the host DR7 register.
• Checks the reloaded host state for consistency; any error causes the processor to shut down. If the
host’s rIP reloaded by #VMEXIT is outside the limit of the host’s code segment or non-canonical
(in the case of long mode), a #GP fault is delivered inside the host.
Exception intercepts. Exception intercepts are checked when normal instruction processing must
raise an exception: before resolving possible double-fault conditions according to Table 8-3 and
before attempting delivery of the exception (which includes pushing an exception frame, accessing the
IDT, etc.).
For some exceptions, the processor still writes certain exception-specific registers even if the
exception is intercepted. (See the descriptions in Section 15.11 on page 383 and following for details.)
When an external or virtual interrupt is intercepted, the interrupt is left pending.
When an intercept occurs while the guest is in the process of delivering a non-intercepted interrupt or
exception using the IDT, SVM provides additional information on #VMEXIT (See Section 15.7.2 on
page 376).
CS:rIP points to the following instruction, and the saved DR7 includes the effects of matching the data
breakpoint.
Some exceptions write special registers even when they are intercepted; see the individual descriptions
in “Exception Intercepts” on page 383 for details.
EXITINTINFO layout: bits 63–32 ERRORCODE; bit 31 V; bits 30–12 Reserved; bit 11 EV; bits 10–8 TYPE; bits 7–0 VECTOR.
Despite the instruction name, the events raised by the INT1 (also known as ICEBP), INT3 and INTO
instructions (opcodes F1h, CCh and CEh) are considered exceptions for the purposes of
EXITINTINFO, not software interrupts. Only events raised by the INTn instruction (opcode CDh) are
considered software interrupts.
• Error Code Valid—Bit 11. Set to 1 if the guest exception would have pushed an error code;
otherwise cleared to zero.
• Valid—Bit 31. Set to 1 if the intercept occurred while the guest attempted to deliver an exception
through the IDT; otherwise cleared to zero.
• Errorcode—Bits 63–32. If EV is set to 1, holds the error code that the guest exception would have
pushed; otherwise is undefined.
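A VMM typically unpacks EXITINTINFO into its fields before deciding whether to re-inject an event. The decoder below is a sketch with hypothetical names. The V, EV, and error-code positions come from the descriptions above; the VECTOR (bits 7–0) and TYPE (bits 10–8) positions are not described in this excerpt and are included as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

struct exitintinfo {
    uint8_t  vector;      /* bits 7-0 (assumed) */
    uint8_t  type;        /* bits 10-8 (assumed) */
    bool     ev;          /* bit 11: error code valid */
    bool     valid;       /* bit 31 */
    uint32_t error_code;  /* bits 63-32, meaningful only if ev is set */
};

struct exitintinfo decode_exitintinfo(uint64_t raw)
{
    struct exitintinfo e;
    e.vector     = (uint8_t)(raw & 0xFF);
    e.type       = (uint8_t)((raw >> 8) & 0x7);
    e.ev         = (raw >> 11) & 1;
    e.valid      = (raw >> 31) & 1;
    e.error_code = (uint32_t)(raw >> 32);
    return e;
}
```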
In the case of multiple exceptions, EXITINTINFO records the aggregate information on all exceptions
but the last (intercepted) one.
Example: A guest raises a #GP, and a #NP is raised during its delivery (a scenario that, according to
x86 rules, resolves to a #DF); an intercepted #PF then occurs during the attempt to deliver the #DF.
Upon intercept of the #PF, EXITINTINFO indicates that the guest was in the process of delivering a
#DF when the #PF occurred. The information about the intercepted page fault itself is encoded in the
EXITCODE, EXITINFO1 and EXITINFO2 fields. If the VMM decides to repair and dismiss the #PF,
it can resume guest execution by re-injecting (see “Event Injection” on page 391) the fault recorded in
EXITINTINFO. If the VMM decides that the #PF should be reflected back to the guest, it must
combine the event in EXITINTINFO with the intercepted exception according to x86 rules (see Table
8-3). In this case, a #DF plus a #PF would result in a triple fault or shutdown.
When an exception triggers an intercept, the EXITCODE, and optionally EXITINFO1 and
EXITINFO2, fields always reflect the intercepted exception, while EXITINTINFO, if marked valid,
indicates the prior exception the guest was attempting to deliver when the intercept occurred.
I/O Permissions Map. The I/O Permissions Map (IOPM) occupies 12 Kbytes of contiguous physical
memory. The table is structured as a linear array of 64K+3 bits (two 4-Kbyte pages, and the first three
bits of a third 4-Kbyte page) and must be aligned on a 4-Kbyte boundary; the physical base address of
the IOPM is specified in the IOPM_BASE_PA field in the VMCB and loaded into the processor by the
VMRUN instruction. The VMRUN instruction ignores the lower 12 bits of the address specified in the
VMCB. If the address of the last byte in the table is greater than or equal to the maximum supported
physical address, this is treated as illegal VMCB state and causes a #VMEXIT(VMEXIT_INVALID).
Each bit in the table corresponds to an 8-bit I/O port. Bit 0 in the table corresponds to I/O port 0, bit 1
to I/O port 1 and so on. A bit set to 1 indicates that accesses to the corresponding port should be
intercepted. The IOPM is accessed by physical address, and should reside in memory that is mapped as
writeback (WB).
IN and OUT Behavior. If the IOIO_PROT intercept bit is set, the IOPM table controls port access.
For IN/OUT instructions that access more than a single byte, the permission bits for all bytes are
checked; if any bit is set to 1, the I/O operation is intercepted.
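The IOPM lookup and the multi-byte rule can be sketched as follows. The function names are hypothetical, and the map is an ordinary byte array with bit 0 of byte 0 corresponding to port 0. The extra three bits beyond 64K leave room for checking the upper bytes of a 4-byte access to port FFFFh.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOPM_BYTES (12 * 1024)  /* 12 Kbytes of contiguous memory */

/* One bit per I/O port: bit `bit` lives in byte bit/8, position bit%8. */
bool iopm_bit(const uint8_t *iopm, uint32_t bit)
{
    return (iopm[bit / 8] >> (bit % 8)) & 1;
}

/* An IN/OUT of `size` bytes (1, 2, or 4) at `port` is intercepted if the
 * bit for any byte it touches is set. */
bool iopm_intercepts(const uint8_t *iopm, uint16_t port, unsigned size)
{
    for (unsigned i = 0; i < size; i++)
        if (iopm_bit(iopm, (uint32_t)port + i))
            return true;
    return false;
}
```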
Exceptions related to virtual x86 mode, IOPL, or the TSS-bitmap are checked before the SVM
intercept check. All other exceptions are checked after the SVM intercept check.
I/O Intercept Information. When an IOIO intercept triggers, the following information (describing
the intercepted operation in order to facilitate emulation) is saved in the VMCB’s EXITINFO1 field:
EXITINFO1 layout for IOIO intercepts: bits 31–16 PORT; bits 15–10 Reserved; bit 9 A64; bit 8 A32; bit 7 A16; bit 6 SZ32; bit 5 SZ16; bit 4 SZ8; bit 3 REP; bit 2 STR; bit 1 (0); bit 0 TYPE.
The rIP of the instruction following the IN/OUT is saved in EXITINFO2, so that the VMM can easily
resume the guest after I/O emulation.
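An I/O-emulation path usually starts by decoding EXITINFO1. The sketch below follows the bit positions of the IOIO EXITINFO1 format. The struct and function names are hypothetical, and the interpretation of TYPE (1 = IN, 0 = OUT) is an assumption not stated in this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

struct ioio_info {
    uint16_t port;       /* bits 31-16 */
    unsigned op_bytes;   /* SZ8/SZ16/SZ32, bits 4-6; 0 if none set */
    unsigned addr_bits;  /* A16/A32/A64, bits 7-9; 0 if none set */
    bool rep;            /* bit 3 */
    bool str;            /* bit 2 */
    bool type_in;        /* bit 0; assumed 1 = IN, 0 = OUT */
};

struct ioio_info decode_ioio(uint64_t exitinfo1)
{
    struct ioio_info io;
    io.port    = (uint16_t)(exitinfo1 >> 16);
    io.type_in = exitinfo1 & 1;
    io.str     = (exitinfo1 >> 2) & 1;
    io.rep     = (exitinfo1 >> 3) & 1;
    io.op_bytes  = (exitinfo1 & (1u << 4)) ? 1 :
                   (exitinfo1 & (1u << 5)) ? 2 :
                   (exitinfo1 & (1u << 6)) ? 4 : 0;
    io.addr_bits = (exitinfo1 & (1u << 7)) ? 16 :
                   (exitinfo1 & (1u << 8)) ? 32 :
                   (exitinfo1 & (1u << 9)) ? 64 : 0;
    return io;
}
```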
MSR Permissions Map. The MSR permissions bitmap consists of a number of smaller separate
bitmaps of 2 Kbytes each, each covering a defined range of 8K MSRs. Four of these smaller bitmaps
reside in two physical pages (8 Kbytes, covering 32K MSRs). The first 8K-MSR range is used for the
Pentium®-compatible MSRs, the next 8K range is used for the AMD sixth generation x86 processor
(AMD-K6®) MSRs, and the third 8K range for the AMD seventh and eighth generation x86 processors
(e.g., the AMD Athlon™ and AMD Opteron™) MSRs. If the MSR_PROT intercept is active, any attempt
to read or write an MSR not covered by the bitmap will automatically cause an intercept.
The MSRPM is accessed by physical address, and should reside in memory that is mapped as
writeback (WB). The MSRPM must be aligned on a 4KB boundary. The physical base address of the
MSRPM is specified in MSRPM_BASE_PA field in the VMCB and is loaded into the processor by the
VMRUN instruction. The VMRUN instruction ignores the lower 12 bits of the address specified in the
VMCB, and if the address of the last byte in the table is greater than or equal to the maximum
supported physical address, this is treated as illegal VMCB state and causes a
#VMEXIT(VMEXIT_INVALID).
Table 15-3 defines the ranges of the MSR permissions map. For each MSR mapped by the table, two
bits are allocated—the lower order of the two bits controls read access to the MSR, and the higher
order of the two bits controls write access. A bit value of 1 indicates that the operation is intercepted.
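Locating an MSR's permission bits can be sketched as below. Table 15-3 is not reproduced in this excerpt, so the three range bases (0000_0000h, C000_0000h, C001_0000h) and their byte offsets (0h, 800h, 1000h) are assumptions matched to the MSR families named above. The function names are hypothetical, and `msrpm_intercepts` assumes the MSR_PROT intercept is active.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ranges: three 2-Kbyte bitmaps, each covering 8K MSRs. */
static const struct { uint32_t first_msr; uint32_t byte_offset; } msrpm_ranges[] = {
    { 0x00000000u, 0x0000 },  /* Pentium-compatible MSRs */
    { 0xC0000000u, 0x0800 },  /* AMD-K6 MSRs */
    { 0xC0010000u, 0x1000 },  /* AMD Athlon / AMD Opteron MSRs */
};

/* Find the byte and the read-bit position for `msr`; the write bit is the
 * next higher bit. Returns false if the MSR is outside every range. */
bool msrpm_locate(uint32_t msr, uint32_t *byte, unsigned *read_bit)
{
    for (unsigned i = 0; i < 3; i++) {
        uint32_t delta = msr - msrpm_ranges[i].first_msr;
        if (msr >= msrpm_ranges[i].first_msr && delta < 0x2000u) {
            uint32_t bitpos = 2 * delta;  /* two bits per MSR */
            *byte = msrpm_ranges[i].byte_offset + bitpos / 8;
            *read_bit = bitpos % 8;
            return true;
        }
    }
    return false;
}

/* MSRs not covered by the map are always intercepted under MSR_PROT. */
bool msrpm_intercepts(const uint8_t *msrpm, uint32_t msr, bool is_write)
{
    uint32_t byte;
    unsigned bit;
    if (!msrpm_locate(msr, &byte, &bit))
        return true;
    return (msrpm[byte] >> (bit + (is_write ? 1 : 0))) & 1;
}
```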
RDMSR and WRMSR Behavior. If the MSR_PROT bit in the VMCB’s intercept vector is clear,
RDMSR/WRMSR instructions are not intercepted.
RDMSR and WRMSR instructions check for exceptions and intercepts in the following order:
• Exceptions common to all MSRs (e.g., #GP if not at CPL-0)
• Check SVM intercepts in the MSR permission map, if the MSR_PROT intercept is requested.
• Exceptions specific to a given MSR (including password protection, unimplemented MSRs,
reserved bits, etc.)
MSR Intercept Information. On #VMEXIT, the processor indicates in the VMCB’s EXITINFO1
whether a RDMSR (EXITINFO1 = 0) or WRMSR (EXITINFO1 = 1) was intercepted.
Example: Assume that the VMM intercepts #GP and #DF exceptions, and the guest raises a (non-
intercepted) #NP, during the delivery of which it also gets a #GP (e.g., due to an illegal IDT entry)—a
situation that, according to x86 semantics, results in a #DF. In this case, #VMEXIT signals an
intercepted #GP, not an intercepted #DF, and fills EXITINTINFO with the #NP fault. On the other
hand, if only the #DF intercept were active in this scenario, #VMEXIT would signal an intercepted
#DF.
The following subsections detail the individual intercepts.
#DB (Debug)
The #DB exception can have fault-type (e.g., instruction breakpoint) or trap-type (e.g., data
breakpoint) behavior; accordingly the intercept differs in what state is saved in the VMCB (see “State
Saved on Exit” on page 375). In either case, however, the value saved for DR6 and DR7 matches what
would be visible to a #DB exception handler (i.e., both #DB faults and traps are permitted to write
DR6 and DR7 before the intercept). The EXITINFO1 and EXITINFO2 fields are undefined.
Fault-type #DB exceptions, whether indicated in EXITCODE or EXITINTINFO, cause the CS:rIP
saved in the VMCB to indicate the instruction that caused the #DB exception. Trap-type #DB
exceptions cause the VMCB’s CS:rIP to indicate the instruction following the instruction that caused
the exception. A vector 1 exception generated by the single-byte INT1 instruction (also known as
ICEBP) does not trigger the #DB intercept. Software should use the dedicated ICEBP intercept to
intercept ICEBP (see “Instruction Intercepts” on page 378).
provide access to hidden processor state that software cannot otherwise access, as well as additional
privileged state.
VMSAVE saves the following state to the VMCB indicated by rAX:
• FS, GS, TR, LDTR (including all hidden state)
• KernelGsBase
• STAR, LSTAR, CSTAR, SFMASK
• SYSENTER_CS, SYSENTER_ESP, SYSENTER_EIP
VMLOAD loads the corresponding state from the VMCB. VMLOAD and VMSAVE are available
only at CPL-0 (#GP otherwise), and in protected mode with SVM enabled in EFER.SVME (#UD
otherwise).
Software Rule. When the VMM changes a guest’s paging mode by changing entries in the guest’s
VMCB, the VMM must ensure that the guest’s TLB entries are flushed from the TLB. The relevant
VMCB state includes:
• CR0—PG, WP, CD, NW.
• CR3—Any bit.
• CR4—PGE, PAE, PSE.
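The rule above reduces to a bit-mask comparison between the old and new guest register values. This is a minimal sketch that uses the architectural bit positions (CR0.WP = 16, NW = 29, CD = 30, PG = 31; CR4.PSE = 4, PAE = 5, PGE = 7). The function name is illustrative, and the sketch does not show how the VMM performs the flush.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_WP  (1ull << 16)
#define CR0_NW  (1ull << 29)
#define CR0_CD  (1ull << 30)
#define CR0_PG  (1ull << 31)
#define CR4_PSE (1ull << 4)
#define CR4_PAE (1ull << 5)
#define CR4_PGE (1ull << 7)

/* True if changing the guest's VMCB state from old to new values touches
 * any of the bits listed above, so the guest's TLB entries must be flushed. */
static bool guest_tlb_flush_needed(uint64_t old_cr0, uint64_t new_cr0,
                                   uint64_t old_cr3, uint64_t new_cr3,
                                   uint64_t old_cr4, uint64_t new_cr4)
{
    const uint64_t cr0_mask = CR0_PG | CR0_WP | CR0_CD | CR0_NW;
    const uint64_t cr4_mask = CR4_PGE | CR4_PAE | CR4_PSE;
    return ((old_cr0 ^ new_cr0) & cr0_mask) != 0 ||
           old_cr3 != new_cr3 ||                  /* any CR3 bit */
           ((old_cr4 ^ new_cr4) & cr4_mask) != 0;
}
```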
When an event is injected by means of this mechanism, the VMRUN instruction causes the guest to
unconditionally take the specified exception or interrupt before executing the first guest instruction.
Injected events are treated in every way as though they had occurred normally in the guest (in
particular, they are recorded in EXITINTINFO) with the following exceptions:
• Injected events are not subject to intercept checks. (Note, however, that if secondary exceptions
occur during delivery of an injected event, those exceptions are subject to exception intercepts.)
• An injected NMI does not block delivery of further NMIs.
• If the VMM attempts to inject an event that is impossible for the guest mode (e.g., a #BR exception
when the guest is in 64-bit mode), the event injection will fail and no guest instructions will be
executed; VMRUN will immediately exit with an error code of VMEXIT_INVALID.
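Event injection field (EVENTINJ) layout: bits 63–32 ERRORCODE; bit 31 V; bits 30–12 Reserved; bit 11 EV; bits 10–8 TYPE; bits 7–0 VECTOR.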
• EV (Error Code Valid)—Bit 11. Set to 1 if the exception should push an error code onto the stack;
clear to 0 otherwise.
• V (Valid)—Bit 31. Set to 1 if an event is to be injected into the guest; clear to 0 otherwise.
• ERRORCODE—Bits 63–32. If EV is set to 1, the error code to be pushed onto the stack, ignored
otherwise. Injecting an exception (TYPE = 3) with vectors 3 or 4 behaves like a trap raised by
INT3 and INTO instructions, respectively, in which case the processor checks the DPL of the IDT
descriptor before dispatching to the handler.
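The field layout can be packed into a 64-bit EVENTINJ value as follows. This is a minimal sketch. The TYPE encodings used here (0 = external interrupt, 2 = NMI, 3 = exception, 4 = software interrupt) are the ones defined for this field. The function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

enum { EVT_INTR = 0, EVT_NMI = 2, EVT_EXCEPTION = 3, EVT_SOFT_INT = 4 };

static uint64_t eventinj(uint8_t vector, unsigned type, bool has_error_code,
                         uint32_t error_code)
{
    uint64_t v = (uint64_t)vector               /* VECTOR, bits 7-0 */
               | ((uint64_t)(type & 7) << 8)    /* TYPE, bits 10-8 */
               | (1ull << 31);                  /* V: inject this event */
    if (has_error_code)
        v |= (1ull << 11)                       /* EV: push an error code */
           | ((uint64_t)error_code << 32);      /* ERRORCODE, bits 63-32 */
    return v;
}
```

For example, injecting #GP(0) (vector 13, an exception with an error code) yields 80000B0Dh.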
To improve the efficiency of TPR accesses in 32-bit mode, SVM makes CR8 available to 32-bit code
by means of an alternate encoding of MOV TO/FROM CR8 (namely, MOV TO/FROM CR0 with a
LOCK prefix). To achieve better performance, 32-bit guests should be modified to use this access
method, instead of the memory-mapped TPR. (For details, see “MOV (CRn)” on page 286 of the
AMD64 Programmer’s Reference Volume 3: General Purpose and System Instructions, order# 24594.)
The alternate encodings of the MOV TO/FROM CR8 instructions are available even if SVM is
disabled in EFER.SVME. They are available in both 64-bit and 32-bit mode.
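The alternate encoding is a LOCK prefix (F0h) placed in front of MOV from CR0 (0F 20 /r) or MOV to CR0 (0F 22 /r). The sketch below emits those bytes, for example for a JIT or an assembler without CR8 support in 32-bit mode. In ModRM, mod is 11b, reg holds the control-register number (0 here), and rm holds the general-purpose register. The function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* gpr: 0 = EAX, 1 = ECX, ..., 7 = EDI. Returns the number of bytes written. */
static size_t emit_mov_cr8(uint8_t *buf, int to_cr8, unsigned gpr)
{
    buf[0] = 0xF0;                        /* LOCK selects CR8 instead of CR0 */
    buf[1] = 0x0F;
    buf[2] = to_cr8 ? 0x22 : 0x20;        /* 22 = MOV CRn, r32; 20 = MOV r32, CRn */
    buf[3] = (uint8_t)(0xC0 | (gpr & 7)); /* mod = 11b, reg = 0, rm = gpr */
    return 4;
}
```

So `mov eax, cr8` in 32-bit code is encoded as F0 0F 20 C0.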
• Physical interrupts take priority over virtual interrupts, whether they are taken directly or through a
#VMEXIT.
• On #VMEXIT, the processor clears its internal copies of V_IRQ and V_INTR_MASKING, so
virtual interrupts do not remain pending in the VMM, and interrupt control reverts to normal.
FFFFFFF0h). Unlike RESET#, INIT is not expected to be visible to the memory controller, and hence
will not trigger automatic clearing of trusted memory pages by memory controller hardware.
To maintain the security of such pages, the VMM can request that INITs be redirected and turned into
#SX exceptions by setting the R_INIT bit in the VM_CR MSR (see Section 15.28.1, “VM_CR MSR
(C001_0114h),” on page 420). This allows the VMM to gain control when an INIT is requested. The
VMM may thus disable the redirection of INIT and then cause the platform to reassert INIT, at which
point the processor will respond in the normal manner. The actions initiated by the INIT pin may also
be initiated by an incoming APIC INIT interrupt; the mechanisms described here apply in either case.
Table 15-6 summarizes the handling of INITs.
By intercepting SMIs, the VMM can gain control before the processor enters SMM.
hypervisor is undefined. To handle a pending SMI due to an I/O instruction, the hypervisor must
either containerize SMM or not intercept SMI.
• The most involved solution is to containerize SMM by placing it in a guest. Containerizing gives
the VMM full control over the state that the SMM handler can access.
Containerizing Platform SMM. A VMM can containerize SMM by creating its own trusted SMM
hypervisor and using that handler to run the platform SMM code in a container. The SMM hypervisor
may be the same code as the VMM itself, or may be an entirely different set of code. The trusted SMM
hypervisor sets up a guest context to run the platform SMM as a guest. The guest context consists of a
VMCB and related state and the guest's (real or virtual) SMM save area. The SMM hypervisor
emulates SMM entry, including setup of the SMM save area, and emulates RSM at the end of SMM
operation. The guest executes the platform SMM code in paged real mode with appropriate SVM
intercepts in place, thus ensuring security.
For this approach to work, the VMM may need to write the SMM_BASE MSR, as well as related
SMM control registers. As part of the emulation of SMM entry and RSM, the VMM needs to access
the SMM_CTL MSR (see Section 15.28.3, “SMM_CTL MSR (C001_0116h),” on page 421).
However, these actions conflict with any BIOS that locks SMM control registers.
A VMM can determine if it is running with a compatible BIOS setup by checking the SMMLOCK bit
in the HWCR MSR (described in the applicable BIOS and Kernel Developer's Guide for your
processor). If the bit is 1, the BIOS has locked the SMM control registers and the VMM is unable to
move them or insert its own SMM hypervisor.
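The compatibility check above can be sketched in C. This sketch assumes HWCR is MSR C001_0015h with SMMLOCK in bit 0; confirm both against the BIOS and Kernel Developer's Guide for the target processor. The HWCR value is passed in rather than read, because RDMSR requires CPL 0.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_HWCR      0xC0010015u   /* assumed MSR number (see BKDG) */
#define HWCR_SMMLOCK  (1ull << 0)   /* assumed bit position (see BKDG) */

/* True if the BIOS has left SMM control registers unlocked, so the VMM
 * may relocate SMM_BASE and install its own SMM hypervisor. */
static bool smm_containerization_possible(uint64_t hwcr)
{
    return (hwcr & HWCR_SMMLOCK) == 0;
}
```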
As the processor physically enters SMM, the SMRAM regions are remapped. The VMM design must
ensure that none of its code or data disappears when the SMRAM areas are mapped or unmapped.
Also note that the ASEG region of the SMRAM overlaps with a portion of video memory, so the SMM
hypervisor should not attempt to write diagnostic messages to the screen. Any attempt by guests to
relocate any of the SMRAM areas (by means of certain MSR writes) must also be intercepted to
prevent malicious SMM code from interfering with VMM operation.
Writes to the SMM_CTL MSR cause a #GP if the BIOS has locked the SMM control registers.
Host Bridge and Processor DEV Caching. For improved performance, the host bridge may cache
portions of the DEV. Any such cached information can be invalidated by setting the DEV_FLUSH flag
in the DEV control register to 1. Software must set this flag after modifying DEV contents to ensure
that the protection logic uses the updated values. The host bridge automatically clears this flag when
the flush operation completes. After setting this flag, software should monitor it until it has cleared, in
order to synchronize DEV updates with subsequent activity.
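The set-then-poll handshake can be sketched as follows. The register accessors are hypothetical callbacks, because DEV_CR is reached indirectly through DEV_OP/DEV_DATA. The DEV_FLUSH bit mask is also a parameter, to be taken from the DEV_CR register definition. A simulated bridge is included only to illustrate the flow.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t (*read_cr)(void *ctx);              /* hypothetical DEV_CR read */
    void (*write_cr)(void *ctx, uint32_t value); /* hypothetical DEV_CR write */
    void *ctx;
} dev_cr_ops;

/* Set DEV_FLUSH, then poll until the host bridge clears it. */
static bool dev_flush(const dev_cr_ops *ops, uint32_t flush_mask, unsigned max_polls)
{
    ops->write_cr(ops->ctx, ops->read_cr(ops->ctx) | flush_mask);
    for (unsigned i = 0; i < max_polls; i++)
        if ((ops->read_cr(ops->ctx) & flush_mask) == 0)
            return true;   /* flush complete: protection logic uses the new DEV */
    return false;          /* timed out; recovery is up to the caller */
}

/* Illustration only: a simulated DEV_CR that clears the flag after `delay` polls. */
typedef struct { uint32_t reg; unsigned delay; uint32_t mask; } fake_bridge;

static uint32_t fake_read(void *p)
{
    fake_bridge *b = p;
    if ((b->reg & b->mask) && b->delay-- == 0)
        b->reg &= ~b->mask;
    return b->reg;
}

static void fake_write(void *p, uint32_t v) { ((fake_bridge *)p)->reg = v; }
```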
By default, the host bridge probes the processor caches for the latest data when it accesses the DEV in
DRAM. However, it is possible to disable probing by means of the DEV_CR register (see “DEV_CR
Register” on page 404); this is recommended in the case of unified memory architecture (UMA)
graphics systems. If cache probing is disabled, host bridge reads of the DEV will not check processor
caches for more recent copies. This requires software on the CPU to map the memory containing the
DEV as uncacheable (UC) or write-through (WT). Alternatively, software must perform a CLFLUSH
before it can expect a change to the DEV to be visible to the Northbridge (and before software flushes
the DEV cache in the host controller).
Multiprocessor Issues. Device-originated memory requests are checked against the DEV at the
point of entry to the system—the Northbridge to which the device is physically attached. Each
Northbridge can have its own set of domains, device-to-domain mappings, and DEV tables (e.g.,
domain #2 on one node can encompass different devices, and can have different access rights than
domain #2 on another node). Thus, the number of protection domains available to software can scale
with the number of Northbridges in the system.
Memory Space Accesses. When a memory-space read or write request is received on an external
host bridge port, the host bridge maps the HyperTransport bus device ID to a protection domain
number, which in turn selects the DEV defining the access permissions for the device (see
Figure 15-5 on page 402). The host bridge then checks the memory address against the DEV contents
by indexing into the DEV with the PFN portion of the address (bits 39–12). The PFN is used as a bit
index within the DEV. If the bit read from the DEV is set to 1, the host bridge inhibits the access by
returning all ones for the data for a read request, or suppressing the store operation on a write request.
A Master Abort error response will be returned to the requesting device.
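The per-page check can be modeled as a bit-vector lookup. This is a sketch under two assumptions: bit n of the DEV lives in byte n/8 at bit position n%8, and addresses beyond the supplied table size are treated as not blocked (the real behavior depends on the DEV_BASE/LIMIT definition). The function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the DEV inhibits a device access to phys_addr. */
static bool dev_blocks(const uint8_t *dev, uint64_t dev_bytes, uint64_t phys_addr)
{
    uint64_t pfn  = (phys_addr >> 12) & ((1ull << 28) - 1); /* PFN = bits 39-12 */
    uint64_t byte = pfn >> 3;
    if (byte >= dev_bytes)
        return false;                       /* outside this sketch's table */
    return (dev[byte] >> (pfn & 7)) & 1;    /* 1 = read returns all ones, write dropped */
}
```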
Peer-to-peer memory accesses routed up to the host bridge are also subjected to checks against the
DEV. Peer-to-peer transfers that may be occurring behind bridges are not checked.
DEV checks are applied before addresses are translated by the GART. The DEV table is never
consulted by accesses originating in the CPU.
I/O Space Accesses. The host bridge can be configured to reject all I/O space accesses from
devices, by setting the IOSPE bit in the DEV_CR control register (see “DEV_CR Register” on
page 404). I/O space peer-to-peer transfers behind bridges are not checked.
Config Space Accesses. Major aspects of host bridge functionality are configured by means of
control registers that are accessed through PCI configuration space. Because this is potentially
accessible by means of device peer-to-peer transfers, the host bridge always blocks access to this space
from anything other than the CPU.
Figure (host bridge DEV path): HyperTransport™ bus/device ID → domain number (zero if no match); DEV_BASE/LIMIT[0]–[3]; DEV table walker; DEV cache tagged with domain number; physical address.
DEV Capability Header. The DEV capability header (DEV_HDR) is defined in Table 15-10.
Table 15-10. DEV Capability Header (DEV_HDR) (in PCI Config Space)
Bit(s) Definition
31–22 Reserved, MBZ
21 Interrupt Reporting Capability
20 Machine Check Exception Reporting Capability
19 Reserved, MBZ
18–16 DEV Capability Block Type; hardwired to 000b.
15–8 PCI Capability pointer; points to next capability in list
7–0 PCI Capability ID; hardwired to 0x0F
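The bit assignments in Table 15-10 translate directly into a decoder. This is a minimal sketch for software that walks the PCI capability list looking for the DEV capability. The struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t cap_id;        /* bits 7-0, hardwired to 0Fh */
    uint8_t next_ptr;      /* bits 15-8, next capability in the list */
    uint8_t block_type;    /* bits 18-16, hardwired to 000b */
    bool    mce_reporting; /* bit 20 */
    bool    int_reporting; /* bit 21 */
} dev_hdr;

/* Returns true if the dword looks like a DEV capability header. */
static bool dev_hdr_decode(uint32_t raw, dev_hdr *h)
{
    h->cap_id        = raw & 0xFF;
    h->next_ptr      = (raw >> 8) & 0xFF;
    h->block_type    = (raw >> 16) & 0x7;
    h->mce_reporting = (raw >> 20) & 1;
    h->int_reporting = (raw >> 21) & 1;
    return h->cap_id == 0x0F && h->block_type == 0;
}
```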
The FUNCTION field in the DEV_OP register selects the function/register to read or write according
to the encoding in Table 15-11; for blocks of registers that have multiple instances (e.g., multiple
DEV_BASE_HI/LO registers), the INDEX field selects the instance; otherwise it is ignored.
For example, to write the DEV_BASE_HI register for protection domain number 2, software sets
DEV_OP.FUNCTION to 1, and DEV_OP.INDEX to 2, and then writes the desired 32-bit value into
DEV_DATA. As the DEV_OP and DEV_DATA registers are accessed through PCI config space (ports
0CF8h–0CFFh), they may be secured from unauthorized access by software executing on the
processor by appropriate settings in the SVM I/O protection bitmap. These registers are also protected
by the host bridge from external access as described in “Config Space Accesses” on page 401.
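The two-step write described above can be expressed as a pair of configuration writes. This sketch makes several assumptions, all labeled as such:
- DEV_OP holds FUNCTION in bits 15–8 and INDEX in bits 7–0.
- DEV_DATA is the dword immediately after DEV_OP.
- dev_op_off is a hypothetical offset of DEV_OP in the host bridge's configuration space.

Each write is expressed as a PCI configuration mechanism #1 address (for port 0CF8h) plus the dword to be written to port 0CFCh.

```c
#include <stdint.h>

typedef struct { uint32_t cf8; uint32_t data; } cfg_write;

/* PCI configuration mechanism #1 address: enable, bus, device, function, dword offset. */
static uint32_t pci_cf8(unsigned bus, unsigned dev, unsigned fn, unsigned off)
{
    return 0x80000000u | (bus & 0xFF) << 16 | (dev & 0x1F) << 11 |
           (fn & 7) << 8 | (off & 0xFC);
}

static void dev_reg_write(unsigned bus, unsigned dev, unsigned fn, unsigned dev_op_off,
                          unsigned function, unsigned index, uint32_t value,
                          cfg_write out[2])
{
    out[0].cf8  = pci_cf8(bus, dev, fn, dev_op_off);      /* DEV_OP: select register */
    out[0].data = (function & 0xFF) << 8 | (index & 0xFF); /* assumed field layout */
    out[1].cf8  = pci_cf8(bus, dev, fn, dev_op_off + 4);  /* DEV_DATA (assumed offset) */
    out[1].data = value;
}
```

Since every step goes through ports 0CF8h–0CFFh, the SVM I/O protection bitmap can keep guests away from the whole sequence.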
DEV_CAP Register. Read-only register; holds implementation specific information: the number of
protection domains supported, the number of DEV_MAP registers (which map device/unit IDs to
domain numbers), and the revision ID.
The initial implementation provides four domains and three map registers.
DEV_CR Register. This is the main control register for the DEV mechanism; it is cleared to zero by
RESET.
DEV_BASE Address/Limit Registers. The DEV base address registers (one set per domain) each
point to the physical address of a DEV table corresponding to a protection domain. The address and
size are encoded in a pair (high/low) of 32-bit registers. The N_DOMAINS field in DEV_CAP
indicates how many (pairs of) DEV_BASE registers are implemented. The register format is as shown
in Figures 15-8 and 15-9.
DEV_MAP Registers. The DEV_MAP registers assign protection domain numbers to device-
originated requests by matching the device ID (HT bus and unit number) associated with the request
against bus and unit numbers in the registers. If no match is found in any of the registers, a domain
number of zero is returned. The number of DEV_MAP registers implemented by the chip is indicated
by the N_MAPS field in DEV_CAP.
The format of the DEV_MAP registers is shown in Figure 15-10.
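The matching rule can be sketched as a linear search. The dev_map_entry struct is a hypothetical unpacked stand-in for one DEV_MAP register, and its valid flag is an assumption of this sketch. The real register packs its fields into 32 bits as shown in Figure 15-10.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    valid;   /* hypothetical: entry in use */
    uint8_t bus;     /* HyperTransport bus number */
    uint8_t unit;    /* HyperTransport unit ID */
    uint8_t domain;  /* protection domain for matching requests */
} dev_map_entry;

static uint8_t dev_domain_for(const dev_map_entry *maps, unsigned n_maps,
                              uint8_t bus, uint8_t unit)
{
    for (unsigned i = 0; i < n_maps; i++)
        if (maps[i].valid && maps[i].bus == bus && maps[i].unit == unit)
            return maps[i].domain;
    return 0;        /* no match in any register: domain zero */
}
```

Domain zero therefore acts as the default domain, and its DEV governs every device that no map register names.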
With nested paging enabled, two levels of address translation are applied; refer to Figure 15-12 below.
• Both guest and host levels have their own copy of CR3, referred to as gCR3 and nCR3,
respectively.
• Guest page tables (gPT) map guest linear addresses to guest physical addresses. The guest page
tables are in guest physical memory, and are pointed to by gCR3.
• Nested page tables (nPT) map guest physical addresses to system physical addresses. The nested
page tables are in system physical memory, and are pointed to by nCR3.
• The most-recently used translations from guest linear to system physical address are cached in the
TLB and used on subsequent guest accesses.
It is important to note that gCR3 and the guest page table entries contain guest physical addresses, not
system physical addresses. Hence, before accessing a guest page table entry, the table walker first
translates that entry’s guest physical address into a system physical address.
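The toy model below shows why the walker must translate each guest page-table address before reading it. It assumes single-level page tables at both levels (real tables are multi-level) and omits permission checks, accessed/dirty updates, and the gCR0.PG = 0 case. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NPAGES 8
typedef struct {
    uint64_t sysmem[NPAGES][512];  /* system physical memory: 4 KB pages of 64-bit entries */
    uint64_t npt[NPAGES];          /* nested table: guest PFN -> system PFN */
    bool     npt_present[NPAGES];
} toy_machine;

/* Guest physical -> system physical (nCR3 level). */
static bool nested_xlate(const toy_machine *m, uint64_t gpa, uint64_t *spa)
{
    uint64_t gpfn = gpa >> 12;
    if (gpfn >= NPAGES || !m->npt_present[gpfn])
        return false;                        /* would be #VMEXIT(NPF) */
    *spa = (m->npt[gpfn] << 12) | (gpa & 0xFFF);
    return true;
}

/* Guest linear -> system physical. gcr3 is a guest physical address. */
static bool guest_xlate(const toy_machine *m, uint64_t gcr3, uint64_t gla, uint64_t *spa)
{
    uint64_t gpt_spa;
    if (!nested_xlate(m, gcr3, &gpt_spa))    /* locate the guest table first */
        return false;
    uint64_t gpte = m->sysmem[gpt_spa >> 12][(gla >> 12) & 511];
    if (!(gpte & 1))
        return false;                        /* guest #PF, delivered to the guest */
    uint64_t gpa = (gpte & ~0xFFFull) | (gla & 0xFFF);
    return nested_xlate(m, gpa, spa);        /* final guest-physical page */
}
```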
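Figure 15-12 (nested paging): guest linear address → guest physical address, paged by the guest's CR3 (gCR3); guest physical address → system physical address, paged by nCR3 (the CR3 used by the VMM); the combined translation is cached in the TLB.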
The VMM can give each guest a different ASID, so that TLB entries from different guests can coexist
in the TLB. The ASID value of zero is reserved for the host; if the VMM attempts to execute VMRUN
with a guest ASID of zero, the result is #VMEXIT(VMEXIT_INVALID).
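A VMM therefore needs an ASID allocator that never returns zero. The sketch below uses a simple round-robin scheme, a common design but not one this section prescribes. max_asids is assumed to come from CPUID Fn8000_000A EBX. When the pool wraps, ASIDs held by earlier guests are reused, so the caller must flush the TLB.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t next; uint32_t max_asids; } asid_pool;  /* start next at 1 */

static uint32_t asid_alloc(asid_pool *p, bool *flush_needed)
{
    *flush_needed = false;
    if (p->next == 0 || p->next >= p->max_asids) {
        p->next = 1;           /* skip ASID 0, which is reserved for the host */
        *flush_needed = true;  /* earlier guests may still have entries under these ASIDs */
    }
    return p->next++;
}
```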
• Final Guest-Physical Page—once a guest linear to guest physical mapping is known, guest
permissions can be checked. If the guest page tables allow the access, the guest physical address is
walked in the nested page tables to find the system physical address.
Table walks for guest page tables are always treated as user writes at the nested page table level. For
this reason,
• the page must be writable by user at the nested page table level, or else a #VMEXIT(NPF) is
raised, and
• the dirty and accessed bits are always set in the nested page table entries that were touched during
nested page table walks for guest page table entries.
A table walk for the guest page itself is always treated as a user access at the nested page table level,
but is treated as a data read, data write, or code read, depending on the guest access.
If the guest has paging disabled (gCR0.PG = 0), there are no guest page table entries to be translated in
the nested page tables. In this case, the final guest-physical address is equal to the guest-linear address,
and is still translated in the nested page tables.
• Bit 4 (ID)—set to 1 if the nested page table level access was a code read. Note that nested table
walks for guest page tables are always treated as data writes, even if the access itself is a code read.
Guest faults are entirely a function of the guest page tables and processor mode; they are delivered to
the guest as normal #PF exceptions without any VMM intervention, unless the VMM is intercepting
guest #PF exceptions.
• Nested and guest PAT types are combined according to Table 15-13 on page 412, producing a
“combined PAT type.”
• The combined PAT type is further combined with the MTRR type according to Table 15-14 on
page 413, where the relevant MTRRs are determined by the system physical address.
• Either gCR0.CD or hCR0.CD can disable caching.
Memory Consistency Issues. Because the guest uses extra fields to determine the memory type, the
VMM may use a different memory type to access a given piece of memory than does the guest. If one
access is cacheable and the other is not, the VMM and guest could observe different memory images,
which is undesirable. (MP systems are particularly sensitive to this problem when the VMM desires to
migrate a virtual processor from one physical processor to another.)
To address this issue, the following mechanisms are provided:
• VMRUN and #VMEXIT flush the write combiners. This ensures that all writes to WC memory by
the guest are visible to the host (or vice-versa) regardless of memory type. (It does not ensure that
cacheable writes by one agent are properly observed by WC reads or writes by the other agent.)
• A new memory type WC+ is introduced. WC+ is an uncacheable memory type, and combines
writes in write-combining buffers like WC. Unlike WC (but like the CD memory type), accesses to
WC+ memory also snoop the caches on all processors (including self-snooping the caches of the
processor issuing the request) to maintain coherency. This ensures that cacheable writes are
observed by WC+ accesses.
• When combining nested and guest memory types that are incompatible with respect to caching, the
WC+ memory type is used instead of WC (and Table 15-14 on page 413 ensures that the snooping
behavior is retained regardless of the host MTRR settings). Refer to Table 15-13 on page 412 for
details.
Table 15-13 shows how guest and host PAT types are combined into an effective PAT type. When
interpreting this table, recall that the intent is for the VMM to use its PAT type to simulate guest
MTRRs.
Guest PAT type (row) combined with nested PAT type (column):
Guest   UC    UC–   WC    WP    WT    WB
UC–     UC    UC    WC    UC    UC    UC
WC      WC    WC    WC    WC+   WC+   WC+
WP      UC    UC    UC    WP    UC    WP
WT      UC    UC    UC    UC    WT    WT
WB      UC    UC    WC    WP    WT    WB
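Table 15-13 can be encoded as a two-dimensional lookup. The guest UC row does not appear in the table above, so mapping it to UC for every nested type is an assumption of this sketch. The enum names are illustrative.

```c
#include <stdint.h>

typedef enum { MT_UC, MT_UC_MINUS, MT_WC, MT_WP, MT_WT, MT_WB, MT_WC_PLUS } memtype;

/* guest and nested must be one of MT_UC .. MT_WB. */
static memtype combine_pat(memtype guest, memtype nested)
{
    static const memtype t[6][6] = {
        /* nested:      UC     UC-          WC     WP          WT          WB         */
        /* UC (assumed) */ { MT_UC, MT_UC, MT_UC, MT_UC,      MT_UC,      MT_UC      },
        /* UC-  */ { MT_UC, MT_UC, MT_WC, MT_UC,      MT_UC,      MT_UC      },
        /* WC   */ { MT_WC, MT_WC, MT_WC, MT_WC_PLUS, MT_WC_PLUS, MT_WC_PLUS },
        /* WP   */ { MT_UC, MT_UC, MT_UC, MT_WP,      MT_UC,      MT_WP      },
        /* WT   */ { MT_UC, MT_UC, MT_UC, MT_UC,      MT_WT,      MT_WT      },
        /* WB   */ { MT_UC, MT_UC, MT_WC, MT_WP,      MT_WT,      MT_WB      },
    };
    return t[guest][nested];
}
```

The WC+ entries are where a guest WC mapping meets a cacheable nested type, which is exactly the incompatible case the snooping WC+ type exists for.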
The existing AMD64 table that defines how PAT types are combined with the physical MTRRs is
extended to handle CD and WC+ PAT types as shown in Table 15-14.
Combined PAT type (row) with MTRR type (column):
PAT     UC    WC    WP    WT    WB
UC–     UC    WC    CD    CD    CD
WC      WC    WC    WC    WC    WC
WC+     WC    WC    WC+   WC+   WC+
WP      UC    CD    WP    CD    WP
WT      UC    CD    CD    WT    WT
WB      UC    WC    WP    WT    WB
15.25 Security
SVM provides additional hardware support that is designed to facilitate the construction of trusted
software systems. While the security features described in this section are orthogonal to SVM’s
virtualization support (and are not required for processor virtualization), the two form building blocks
for trusted systems.
SKINIT Instruction. The SKINIT instruction and associated system support (the Trusted Platform
Module or TPM) are designed to allow for verifiable startup of trusted software (such as a VMM),
based on secure hash comparison.
Security Exception. A security exception (#SX) is used to signal certain security-critical events.
The first word (16 bits) of the SL image must specify the SL entry point as an unsigned offset into the
SL image. The second word must contain the length of the image in bytes; the maximum length
allowed is 65535 bytes. These two values are used by the SKINIT instruction. The layout of the rest of
the image is determined by software conventions. The image typically includes a digital signature for
validation purposes. The digital signature hash must include the entry point and length fields. SKINIT
transfers the SL image to the TPM for validation prior to starting SL execution (see “SKINIT
Operation” on page 417 for further details of this transfer). The SL image for which the hash is
computed must be ready to execute without prior manipulation.
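A loader can sanity-check the two header words before calling SKINIT. This sketch reads both words as little-endian 16-bit values. The checks beyond the header format are assumptions of the sketch, not SKINIT rules: the length is taken to cover the 4-byte header (which is hashed), and the entry point is required to fall inside the image after the header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t entry_offset; uint16_t length; } sl_header;

static bool sl_header_parse(const uint8_t *img, size_t img_size, sl_header *h)
{
    if (img_size < 4)
        return false;
    h->entry_offset = (uint16_t)(img[0] | img[1] << 8);  /* word 0: entry point */
    h->length       = (uint16_t)(img[2] | img[3] << 8);  /* word 1: image length */
    return h->length >= 4 && h->length <= img_size &&
           h->entry_offset >= 4 && h->entry_offset < h->length;
}
```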
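Figure (SLB layout, 64 KB, base pointed to by SKINIT EAX): the SL header at the base (bits 15–0 EP offset, bits 31–16 length), followed by SL code and static data forming the SL image (hash area) that contains the SL entry point, then the SL runtime data area and the SL stack.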
7. Update the ESP register to point to the first byte beyond the end of the SLB (SLB base + 65536),
so that the first item pushed onto the stack by the SL will be at the top of the SLB.
8. Add the unsigned 16-bit entry point offset value from the SLB to the SLB base address to form
the SL entry point address, and jump to it.
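Steps 7 and 8 amount to two additions on the SLB base. The sketch assumes slb_base is the 64 KB-aligned physical address given to SKINIT in EAX. The names are illustrative.

```c
#include <stdint.h>

typedef struct { uint32_t esp; uint32_t entry; } sl_start;

static sl_start sl_start_regs(uint32_t slb_base, uint16_t ep_offset)
{
    sl_start s;
    s.esp   = slb_base + 0x10000u;   /* first byte beyond the 64 KB SLB */
    s.entry = slb_base + ep_offset;  /* unsigned 16-bit offset from the header */
    return s;
}
```

With an SLB at 0020_0000h, the first 32-bit push lands at 0020_FFFCh, which is the top dword of the SLB.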
The validation of the SL image by the TPM is a one-way transaction as far as SKINIT is concerned. It
does not depend on any response from the TPM after transferring the SL image before jumping to the
SL entry point, and initiates execution of the Secure Loader unconditionally. Because of the processor
initialization performed, SKINIT does not honor instruction or data breakpoint traps, or trace traps due
to EFLAGS.TF.
Pending interrupts. Device interrupts that may be pending prior to SKINIT execution due to
EFLAGS.IF being clear, or that assert during the execution of SKINIT, will be held pending until
software subsequently sets GIF to 1. Similarly, SMI, INIT and NMI interrupts that assert after the start
of SKINIT execution will also be held pending until GIF is set to 1.
15.26.7 SL Abort
If the SL determines that it cannot properly initialize a valid SK, it must cause GIF to be set to 1 and
clear the VM_CR MSR to re-enable normal processor operation.
Software Requirements for Secure MP initialization. The driver that starts the SL must execute on
the BSP. Prior to executing the SKINIT instruction, the driver must save any processor-specific system
register contents to memory for restoration after reinitialization of the APs. The driver should also put
all APs in an idle state. The driver must first confirm that all APs are idle and then issue an
INIT IPI to all APs and wait for its local APIC busy indication to clear. This places the APs into a
halted state which is responsive only to a subsequent Startup IPI. APs will still respond to snoops for
cache coherency. The driver may execute SKINIT at any time after this point. Depending on processor
implementation, a fixed delay of no more than 1000 processor cycles may be necessary before
executing SKINIT to ensure reliable sensing of APIC INIT state by the SKINIT.
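The INIT IPI step can be sketched using the local APIC ICR fields:
- delivery mode, bits 10–8 (101b = INIT)
- delivery status, bit 12
- level, bit 14
- destination shorthand, bits 19–18 (11b = all excluding self)

The ICR low dword sits at APIC offset 300h, and the sketch assumes it is memory-mapped. The optional fixed delay before SKINIT is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICR_DM_INIT       (5u << 8)
#define ICR_DELIVS        (1u << 12)
#define ICR_LEVEL_ASSERT  (1u << 14)
#define ICR_ALL_BUT_SELF  (3u << 18)

static const uint32_t INIT_ALL_BUT_SELF = ICR_ALL_BUT_SELF | ICR_LEVEL_ASSERT | ICR_DM_INIT;

/* Poll the ICR until delivery status (the "busy" indication) clears. */
static bool wait_icr_idle(const volatile uint32_t *icr_lo, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++)
        if (!(*icr_lo & ICR_DELIVS))
            return true;
    return false;
}
```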
AP Startup Sequence. While the SL starts executing on the BSP, the APs remain halted in APIC
INIT state. Either the SL or the SK may issue the Startup IPI for the APs at whatever point is deemed
appropriate. The Startup IPI conveys an 8-bit vector specified by the software that issues the IPI to the
APs. This vector provides the upper 8 bits of a 20-bit physical address. Therefore, the AP startup code
must reside in the first 1 MB of physical memory, with the entry point at offset 0 of the 4 KB page
selected by the vector.
In response to the Startup IPI, the APs start executing at the specified location in 16-bit real mode. This
AP startup code must set up protections on each processor as determined by the SL or SK. It must also
set GIF to re-enable interrupts, and restore the pre-SKINIT system context (as directed by the SL or
SK executing on the BSP), before resuming normal system operation.
The SL must guarantee the integrity of the AP startup sequence, for example by including the startup
code in the hashed SL image and setting up DEV protection for it before copying it to the desired area.
The AP startup code does not need to (and should not) execute SKINIT.
Pending interrupts. Device interrupts that may be pending on an AP prior to the APIC INIT IPI due
to EFLAGS.IF being clear, or that assert any time after the processor has accepted the INIT IPI, will be
held pending through the subsequent Startup IPI, and remain pending until software sets GIF to 1 on
that AP. Similarly, SMI, INIT, and NMI interrupts that assert after the processor has accepted the INIT
IPI will also be held pending until GIF is set to 1.
Aborting MP initialization. In the event that the SL or SK on the BSP decides to abort SVM system
initialization for any reason, the following clean-up actions must be performed by SL code executing
on each processor before returning control to the original operating environment:
• The BSP and all APs that responded to the Startup IPI must restore GIF and clear VM_CR on each
processor for normal operation.
• For each processor that has a distinct memory controller associated with it, the SL_DEV_EN flag
in the DEV control register must be cleared in order to restore normal device accessibility to the
64KB SL memory range.
Any secure context created by the SL that should not be exposed to untrusted code should be cleaned
up as appropriate before these steps are taken.
#SX is to redirect external INITs into an exception so that the VMM may, among other possibilities,
destroy sensitive information before re-issuing the INIT, this time without redirection. The INIT
redirection is controlled by the VM_CR.R_INIT bit.
The #SX exception dispatches to vector 30, and behaves like other fault-class exceptions such as
General Protection Fault (#GP). The #SX exception pushes an error code. The only error code
currently defined is 1, and indicates redirection of INIT has occurred.
The #SX exception is a contributory fault.
15.29 SVM-Lock
The SVM-Lock feature allows software to prevent EFER.SVME from being set, either
unconditionally or with a 64-bit key to re-enable SVM functionality.
Support for SVM-Lock is indicated by EDX bit 2 as returned by CPUID function 8000_000Ah. On
processors that support the SVM-Lock feature, SKINIT and STGI can be executed even if
EFER.SVME=0. See descriptions of LOCK and SVMDIS bits in Section 15.28.1, “VM_CR MSR
(C001_0114h),” on page 420. When the SVM-Lock feature is not available, hypervisors can use the
read-only VM_CR.SVMDIS bit to detect SVM (see Section 15.4, “Enabling SVM,” on page 369).
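The detection logic combines three values:
- CPUID Fn8000_0001 ECX bit 2 (SVM)
- VM_CR bit 4 (SVMDIS)
- CPUID Fn8000_000A EDX bit 2 (SVM-Lock)

The sketch takes the raw register values as inputs rather than executing CPUID or RDMSR itself.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SVM_NOT_SUPPORTED,     /* no SVM on this processor */
    SVM_AVAILABLE,         /* EFER.SVME may be set */
    SVM_DISABLED_BY_BIOS,  /* SVMDIS set and no SVM-Lock: cannot be re-enabled */
    SVM_DISABLED_WITH_KEY  /* SVMDIS set, SVM-Lock present: a key may re-enable it */
} svm_status;

static svm_status svm_check(uint32_t cpuid_80000001_ecx, uint64_t vm_cr,
                            uint32_t cpuid_8000000a_edx)
{
    if (!(cpuid_80000001_ecx & (1u << 2)))
        return SVM_NOT_SUPPORTED;
    if (!(vm_cr & (1ull << 4)))
        return SVM_AVAILABLE;
    return (cpuid_8000000a_edx & (1u << 2)) ? SVM_DISABLED_WITH_KEY
                                           : SVM_DISABLED_BY_BIOS;
}
```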
15.30 SMM-Lock
The SMM-Lock feature allows software to prevent System Management Interrupts (SMI) from being
intercepted in SVM. The SmmLock bit is located in the HWCR MSR.
APIC Base Address register layout: bits 63–52 Reserved, MBZ; bits 51–12 ABA (an architectural limit; a given implementation may support fewer bits); bit 11 AE; bits 10–9 Reserved, MBZ; bit 8 BSC; bits 7–0 Reserved, MBZ.
The fields within the APIC Base Address register are as follows:
• Boot Strap CPU Core (BSC)—Bit 8. The BSC bit indicates that this CPU core is the boot core of
the BSP. Each CPU core that is not the boot core of the boot processor is an AP (Application
Processor).
• APIC Enable (AE)—Bit 11. This is the APIC enable bit. The local APIC is enabled and all
interruption types are accepted when AE is set to 1. Clearing AE to 0 disables the local APIC, and
no local vector table interrupts are supported.
• APIC Base Address (ABA)—Bits 51-12. Specifies the base physical address for the APIC register
set. The address is extended by 12 bits at the least significant end to form a base address that is
reset to a value of 0 FEE0 0000h.
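The fields above can be decoded from the raw register value, which is read with RDMSR from MSR 1Bh. This is a minimal sketch; the struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t base; bool bsc; bool ae; } apic_base;

static apic_base apic_base_decode(uint64_t msr)
{
    apic_base a;
    a.bsc  = (msr >> 8) & 1;                /* boot core of the BSP */
    a.ae   = (msr >> 11) & 1;               /* local APIC enabled */
    a.base = msr & 0x000FFFFFFFFFF000ull;   /* ABA (bits 51-12), extended with 12 zero bits */
    return a;
}
```

On the boot core after reset, a typical value is FEE0_0900h: base FEE0_0000h, with BSC and AE both set.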
The state of the APIC registers after reset is provided in Table 16-2.
• APIC ID (AID)—Bits 31-24. The APIC ID field contains the unique APIC ID value assigned to
this specific CPU core. A given implementation may use some bits to represent the CPU core and
other bits to represent the processor.
APIC Version register layout: bit 31 EAS; bits 30–24 Reserved, MBZ; bits 23–16 MLE; bits 15–8 Reserved, MBZ; bits 7–0 VER.
Register layout: bits 31–3 Reserved, MBZ; bit 2 XAIDN; bit 1 SN; bit 0 IERN.
• Extended APIC ID Enable (XAIDN)—Bit 2. Setting XAIDN to 1 enables the upper four bits of the
APIC ID field described in “APIC ID Register (APIC Offset 20h)” on page 430. Clearing this bit
specifies a 4-bit APIC ID using only the lower four bits of the APIC ID field of the APIC ID
register.
• Enable SEOI Generation (SN)—Bit 1. Read-write. This bit enables Specific End of Interrupt
(SEOI) generation when a write to the specific end of interrupt register is received.
• Enable Interrupt Enable Registers (IERN)—Bit 0. This bit enables writes to the interrupt enable
registers.
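The effect of XAIDN on the usable APIC ID can be sketched as a mask on bits 31–24 of the APIC ID register. This is a minimal sketch; the function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8-bit ID (bits 31-24) with XAIDN = 1; otherwise only bits 27-24. */
static uint8_t effective_apic_id(uint32_t id_reg, bool xaidn)
{
    uint8_t aid = (uint8_t)(id_reg >> 24);
    return xaidn ? aid : (uint8_t)(aid & 0x0F);
}
```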
General Local Vector Table register layout: bits 31–18 Reserved, MBZ; bit 17 TMM; bit 16 M; bit 15 TGM; bit 14 RIR; bit 13 Reserved; bit 12 DS; bit 11 Reserved; bits 10–8 MT; bits 7–0 VEC.
The fields within the General Local Vector Table register are as follows:
• Vector (VEC)—Bits 7-0. The VEC field contains the vector that is sent for this interrupt source
when the message type is fixed. It is ignored when the message type is NMI and is set to 00h when
the message type is SMI. Valid values for the vector field are from 16 to 255. A value of 0 to 15
when the message type is fixed results in an illegal vector APIC error.
• Message Type (MT)—Bits 10-8. The MT field specifies the delivery mode sent to the CPU core
interrupt handler. The legal values are:
- 000b = Fixed - The vector field specifies the interrupt delivered.
- 010b = SMI - An SMI interrupt is delivered. In this case, the vector field should be set to 00h.
- 100b = NMI - An NMI interrupt is delivered with the vector field being ignored.
- 111b = External interrupt is delivered.
• Delivery Status (DS)—Bit 12. The DS bit indicates the interrupt delivery status. The DS bit is set to
1 when the interrupt is pending at the CPU core interrupt handler. After a successful delivery of the
interrupt, the associated bit in the IRR is set and this bit is cleared to zero. See Section 16.6.2,
“Lowest Priority Messages and Arbitration,” on page 443 for details. The bit is cleared to 0 when
the interrupt is idle.
• Remote IRR (RIR)—Bit 14. The RIR bit is set to 1 when the local APIC accepts an LINT0 or
LINT1 interrupt with the trigger mode=1 (level sensitive). The bit is cleared to 0 when the interrupt
completes, as indicated when an EOI is received.
• Trigger Mode (TGM)—Bit 15. Specifies how interrupts to the local APIC are triggered. The TGM
bit is set to 1 when the interrupt is level-sensitive. It is cleared to 0 when the interrupt is edge-
triggered. When the message type is SMI or NMI, the trigger mode is edge triggered.
• Mask (M)—Bit 16. When the M bit is set to 1, reception of the interrupt is disabled. When the M
bit is cleared to 0, reception of the interrupt is enabled. For example, the mask bit is set in the
Performance Monitor Counter LVT Register by hardware during a performance monitor counter
interrupt and stays set until software resets it.
• Timer Mode (TMM)—Bit 17. Specifies the timer mode for the APIC Timer interrupt. The TMM bit
set to 1 indicates periodic timer interrupts. The TMM bit cleared to 0 indicates one-shot operation.
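The field rules above can be collected into an encoder. The sketch enforces three of them:
- Vectors 0–15 are rejected for fixed delivery.
- The vector is forced to 00h for SMI.
- SMI and NMI are treated as edge triggered.

Which bits a given LVT register actually implements (for example, TMM only in the timer LVT) is left to the caller. The names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

enum { LVT_MT_FIXED = 0, LVT_MT_SMI = 2, LVT_MT_NMI = 4, LVT_MT_EXTINT = 7 };

static bool lvt_encode(uint8_t vec, unsigned mt, bool level, bool masked,
                       bool periodic, uint32_t *out)
{
    if (mt == LVT_MT_FIXED && vec < 16)
        return false;                    /* would cause an illegal vector APIC error */
    if (mt == LVT_MT_SMI)
        vec = 0;                         /* vector field set to 00h for SMI */
    if (mt == LVT_MT_SMI || mt == LVT_MT_NMI)
        level = false;                   /* SMI and NMI are edge triggered */
    *out = (uint32_t)vec                 /* VEC, bits 7-0 */
         | (mt & 7u) << 8                /* MT, bits 10-8 */
         | (uint32_t)level << 15         /* TGM */
         | (uint32_t)masked << 16        /* M */
         | (uint32_t)periodic << 17;     /* TMM (APIC timer LVT) */
    return true;
}
```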
Figure 16-7. APIC Timer Local Vector Table Register (APIC Offset 320h)
Layout: bits 31–18 Reserved, MBZ; bit 17 TMM; bit 16 M; bits 15–13 Reserved; bit 12 DS; bits 11–8 Reserved; bits 7–0 VEC.
Three APIC registers are defined for the APIC timer function:
• Current Count Register (CCR) is the actual APIC timer. It is loaded with the start count from the
ICR and then decrements. The APIC timer interrupt is generated when the CCR value reaches
zero. The counting rate is controlled by the DCR. See Figure 16-8.
• Initial Count Register (ICR) contains the start count value for the APIC timer. See Figure 16-9.
• Divide Configuration Register (DCR) controls the counting rate of the APIC timer by dividing the
CPU core clock by a programmable amount. See Figure 16-10. Refer to the BIOS and Kernel
Developer’s Guide for the processor for the specific implementation of the base clock rate.
• APIC Timer Current Count (APICTCC)—Bits 31-0. The APICTCC field contains the current
value of the APIC timer.
• APIC Timer Initial Count (APICTIC)—Bits 31-0. The APICTIC field contains the value that is
loaded into the APIC Timer Current Count Register when the APIC timer is initialized.
Bit layout: 31–4 Reserved, MBZ | 3 DV | 2 Reserved | 1–0 DV
• Divide Value (DV)—Bits 3 and 1-0. The DV field specifies the divisor applied to the CPU core
clock. Table 16-3 lists the allowable values.
Table 16-3. Divide Values
Bits 3, 1-0 Resulting Timer Divide
000b Divide by 2
001b Divide by 4
010b Divide by 8
011b Divide by 16
100b Divide by 32
101b Divide by 64
110b Divide by 128
111b Divide by 1
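The DV field is split across bit 3 and bits 1–0, so decoding it takes a small shift. The following C sketch uses hypothetical helper names. It maps a DCR value to the divisor in Table 16-3, finds the DCR encoding for a given divisor, and estimates the one-shot period in base-clock ticks. The period estimate assumes the current count decrements once per divided clock. The actual base clock rate is implementation specific, as noted above.

```c
#include <stdint.h>

/* Rebuild the 3-bit DV code: DCR bit 3 is the high bit, bits 1-0 the low bits;
 * DCR bit 2 is reserved and ignored. Returns the divisor from Table 16-3. */
static inline uint32_t apic_timer_divisor(uint32_t dcr)
{
    uint32_t dv = ((dcr >> 1) & 0x4u) | (dcr & 0x3u);
    return dv == 0x7u ? 1u : (2u << dv);   /* 000b->2 ... 110b->128, 111b->1 */
}

/* DCR value for a divisor in {1, 2, 4, ..., 128}; returns -1 for any other value. */
static inline int64_t apic_timer_dcr_for(uint32_t divisor)
{
    if (divisor == 1u)
        return 0xB;                        /* 111b: bit 3 and bits 1-0 set */
    for (uint32_t dv = 0; dv < 7u; dv++) {
        if ((2u << dv) == divisor)
            return (int64_t)(((dv & 0x4u) << 1) | (dv & 0x3u));
    }
    return -1;
}

/* Base-clock ticks from loading the initial count until the count reaches zero,
 * assuming one decrement per divided clock. */
static inline uint64_t apic_timer_ticks(uint32_t initial_count, uint32_t dcr)
{
    return (uint64_t)initial_count * apic_timer_divisor(dcr);
}
```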
Figure 16-11. Local Interrupt 0/1 (LINT0/1) Local Vector Table Register
(APIC Offset 350h/360h)
In addition to the normal LVT control bits (mask, delivery status and vector offset), the LINT0/LINT1
interrupts provide the following controls:
• Trigger Mode - indicates whether the interrupt pin is edge triggered or level sensitive when the
message type is fixed.
• Remote IRR - When the trigger mode indicates level, this flag is set when the local APIC accepts
the interrupt, and is reset when the local APIC receives an EOI. When the flag is set, no additional
local interrupt requests are sent to the local APIC, and they remain pending.
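The Remote IRR behavior described above amounts to a small state machine. The following C sketch is a simplified software model, not a description of the hardware implementation. The structure and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one LINT0/LINT1 input and its Remote IRR flag. */
struct lint_state {
    bool level_triggered;  /* Trigger Mode: true = level-sensitive */
    bool remote_irr;       /* Remote IRR flag */
    bool pending;          /* a request held off while Remote IRR is set */
};

/* The pin requests an interrupt. Returns true if the local APIC accepts it now. */
static inline bool lint_request(struct lint_state *s)
{
    if (s->level_triggered && s->remote_irr) {
        s->pending = true;           /* not sent to the local APIC; stays pending */
        return false;
    }
    if (s->level_triggered)
        s->remote_irr = true;        /* set when a level interrupt is accepted */
    return true;
}

/* The local APIC receives an EOI. Returns true if a held-off request exists
 * and may now be presented again. */
static inline bool lint_eoi(struct lint_state *s)
{
    bool had_pending = s->pending;
    if (s->level_triggered)
        s->remote_irr = false;       /* reset on EOI */
    s->pending = false;
    return had_pending;
}
```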
Bit layout: 31–17 Reserved, MBZ | 16 M | 15–13 Reserved | 12 DS | 11 Reserved | 10–8 MT | 7–0 VEC
Bit layout: 31–17 Reserved, MBZ | 16 M | 15–13 Reserved | 12 DS | 11 Reserved | 10–8 MT | 7–0 VEC
Figure 16-13. Thermal Sensor Local Vector Table Register (APIC Offset 330h)
Bit layout: 31–17 Reserved, MBZ | 16 M | 15–13 Reserved | 12 DS | 11 Reserved | 10–8 MT | 7–0 VEC
Figure 16-14. APIC Error Local Vector Table Register (APIC Offset 370h)
Error information is recorded in the APIC Error Status Register, which is a read-write register. A write
to the register causes the internal error state to be recorded in the register and clears the original
error. See Figure 16-15.
Bit layout: 31–8 Reserved, MBZ | 7 IRA | 6 RIV | 5 SIV | 4 Reserved | 3 RAE | 2 SAE | 1–0 Reserved, MBZ
The fields within the APIC Error Status register are as follows:
• Sent Accept Error (SAE)—Bit 2. The SAE bit when set to 1 indicates that a message sent by the
local APIC was not accepted by any other APIC.
• Receive Accept Error (RAE)—Bit 3. The RAE bit when set to 1 indicates that a message received
by the local APIC was not accepted by this or any other APIC.
• Sent Illegal Vector (SIV)—Bit 5. The SIV bit when set to 1 indicates that the local APIC attempted
to send a message with an illegal vector value.
• Receive Illegal Vector (RIV)—Bit 6. The RIV bit when set to 1 indicates that the local APIC has
received a message with an illegal vector value.
• Illegal Register Address (IRA)—Bit 7. The IRA bit when set to 1 indicates that an access to an
unimplemented register location within the local APIC register range (APIC Base Address + 4
Kbytes) was attempted.
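The following C sketch decodes the error bits listed above. It assumes a hypothetical pointer `esr` to the memory-mapped APIC Error Status Register; the register's offset is not given in this section. The description above says a write records the internal error state in the register, so the sample function writes first and then reads. All names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_SAE (1u << 2)  /* Sent Accept Error */
#define ESR_RAE (1u << 3)  /* Receive Accept Error */
#define ESR_SIV (1u << 5)  /* Sent Illegal Vector */
#define ESR_RIV (1u << 6)  /* Receive Illegal Vector */
#define ESR_IRA (1u << 7)  /* Illegal Register Address */

struct esr_errors {
    bool sent_accept, receive_accept, sent_illegal_vector,
         receive_illegal_vector, illegal_register_address;
};

/* `esr` is a hypothetical mapping of the register. The write records the
 * internal error state (clearing the original error); the read returns it. */
static inline uint32_t esr_sample(volatile uint32_t *esr)
{
    *esr = 0;
    return *esr;
}

static inline struct esr_errors esr_decode(uint32_t esr)
{
    struct esr_errors e = {
        .sent_accept              = (esr & ESR_SAE) != 0,
        .receive_accept           = (esr & ESR_RAE) != 0,
        .sent_illegal_vector      = (esr & ESR_SIV) != 0,
        .receive_illegal_vector   = (esr & ESR_RIV) != 0,
        .illegal_register_address = (esr & ESR_IRA) != 0,
    };
    return e;
}
```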
Bit layout: 31–10 Reserved, MBZ | 9 FCC | 8 ASE | 7–0 VEC
Bit layout (63–32): 63–56 DES | 55–32 Reserved
Bit layout (31–0): 31–20 Reserved, MBZ | 19–18 DSH | 17–16 RRS | 15 TGM | 14 L | 13 Reserved | 12 DS | 11 DM | 10–8 MT | 7–0 VEC
• Message Type (MT)—Bits 10-8. The MT field specifies the message type sent to the CPU core
interrupt handler. The legal values are:
- 000b = Fixed - The IPI delivers an interrupt to the target local APIC specified in the Destination
field.
- 001b = Lowest Priority - The IPI delivers an interrupt to the local APIC executing at the lowest
priority of all local APICs that match the destination logical ID specified in the Destination
field. See Section 16.6.1, “Receiving System and IPI Interrupts,” on page 442.
- 010b = SMI - The IPI delivers an SMI interrupt to the target local APIC(s). The trigger mode is
edge-triggered and the Vector field must be 00h.
- 011b = Remote read - The IPI delivers a request to read an APIC register in the target local
APIC specified in the Destination field. The trigger mode is edge-triggered and the Vector field
specifies the APIC offset of the APIC register to be read. After the request is issued, the Remote
Read Status field provides its current status. Data is returned from the target local APIC and
captured in the Remote Read Register of the issuing local APIC. See Figure 16-18 on page 441.
- 100b = NMI - The IPI delivers a non-maskable interrupt to the target local APIC specified in
the Destination field. The Vector field is ignored.
- 101b = INIT - The IPI delivers an INIT request to the target local APIC(s) specified in the
Destination field, causing the CPU core to assume the INIT state. The trigger mode is
edge-triggered, and the Vector field must be 00h.
- 110b = Startup - The IPI delivers a start-up request (SIPI) to the target local APIC(s) specified
in the Destination field, causing the CPU core to start processing the BIOS boot-strap routine
whose address is specified by the Vector field.
- 111b = External interrupt - The IPI delivers an external interrupt to the target local APIC
specified in the Destination field. The interrupt can be delivered even if the APIC is disabled.
• Destination Mode (DM)—Bit 11. The DM bit when set to 1 specifies a logical destination which
may be one or more local APICs with a common destination logical ID. When cleared to 0, the DM
bit specifies a physical destination which indicates a single local APIC ID.
• Delivery Status (DS)—Bit 12. The DS bit indicates the interrupt delivery status. The DS bit is set to
1 when the local APIC has sent the IPI and is waiting for it to be accepted by another local APIC (the
ICR is not idle). When the DS bit is cleared to 0, the ICR is idle.
• Level (L)—Bit 14. The L bit when set to 1 indicates assert. Clearing the L bit to 0 indicates
deassert.
• Trigger Mode (TGM)—Bit 15. Specifies how IPIs to the local APIC are triggered. The TGM bit is
set to 1 when the interrupt is level-sensitive. It is cleared to 0 when the interrupt is edge-triggered.
• Remote Read Status (RRS)—Bits 17-16. The RRS field indicates the current read status of a
Remote Read from another local APIC. The encoding for this field is as follows:
- 00b = Read was invalid
- 01b = Delivery pending
- 10b = Delivery done and access was valid. Data available in Remote Read Register.
- 11b = Reserved
• Destination Shorthand (DSH)—Bits 19-18. The DSH field indicates whether a shorthand notation
is used and provides a quick way to specify a destination for a message. When the Destination field
is not required (DSH>00b), the shorthand replaces it, allowing software to use a single write to the
low-order ICR. The encodings are as follows:
- 00b = Destination - The Destination field is required to specify the destination.
- 01b = Self - The issuing APIC is the only destination.
- 10b = All including self - The IPI is sent to all local APICs including itself (destination
field=FFh).
- 11b = All excluding self - The IPI is sent to all local APICs except itself (destination
field=FFh).
Note that if the lowest priority message type is used, the message could end up being reflected back
to this local APIC. If DSH=1xb, the destination mode is ignored and physical mode is automatically used.
• Destination (DES)—Bits 63-56. The DES field identifies the target local APIC(s) for the IPI and
contains the destination encoding used when the Destination Shorthand field=00b. The field
indicates the target local APIC when the destination mode=0 (physical), and the destination logical
ID (as indicated by LDR and DFR) when the destination mode=1 (logical).
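The following C sketch packs the fields described above into the two 32-bit halves of the ICR. The helper names are hypothetical, and the code does not check the valid field combinations of Table 16-4. It also relies on two conventions that are common on x86 local APICs but are not stated in this section:

- Software writes the high doubleword (DES) first, because writing the low doubleword sends the IPI.
- A Startup IPI vector names a 4-Kbyte page, so the start address is the vector multiplied by 1000h.

```c
#include <stdint.h>

/* Message Type (MT) encodings, bits 10-8. */
enum icr_mt { MT_FIXED = 0, MT_LOWEST = 1, MT_SMI = 2, MT_REMOTE_READ = 3,
              MT_NMI = 4, MT_INIT = 5, MT_STARTUP = 6, MT_EXTINT = 7 };

/* Destination Shorthand (DSH) encodings, bits 19-18. */
enum icr_dsh { DSH_DEST = 0, DSH_SELF = 1, DSH_ALL_INCL = 2, DSH_ALL_EXCL = 3 };

/* Low doubleword (ICR bits 31-0). */
static inline uint32_t icr_low(uint8_t vector, enum icr_mt mt, int logical,
                               int level_assert, int level_trigger, enum icr_dsh dsh)
{
    return (uint32_t)vector                        /* VEC, bits 7-0 */
         | ((uint32_t)mt << 8)                     /* MT, bits 10-8 */
         | ((uint32_t)(logical != 0) << 11)        /* DM, bit 11: 1 = logical */
         | ((uint32_t)(level_assert != 0) << 14)   /* L, bit 14: 1 = assert */
         | ((uint32_t)(level_trigger != 0) << 15)  /* TGM, bit 15: 1 = level */
         | ((uint32_t)dsh << 18);                  /* DSH, bits 19-18 */
}

/* High doubleword: DES (ICR bits 63-56) is bits 31-24 of the upper half. */
static inline uint32_t icr_high(uint8_t dest)
{
    return (uint32_t)dest << 24;
}

/* Assumed SIPI convention: the vector selects a 4-Kbyte page. */
static inline uint32_t sipi_start_address(uint8_t vector)
{
    return (uint32_t)vector << 12;
}
```

For example, an edge-triggered INIT with Level asserted to physical APIC ID 1 would be written as `icr_high(1)` followed by `icr_low(0, MT_INIT, 0, 1, 0, DSH_DEST)`.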
• Remote Read Data (RRD)—Bits 31-0. The RRD field contains the data resulting from a valid
completion of a remote read interprocessor interrupt.
Not all combinations of ICR fields are valid. Only the combinations indicated in Table 16-4 are valid.
• Destination Logical ID (DLID)—Bits 31-24. The DLID field contains the logical APIC ID
assigned to this specific CPU core. The logical APIC ID need not be unique, which allows an
interrupt to be sent to multiple local APICs.
Two interrupt models are defined for the logical destination mode, the flat model and the cluster
model, under the control of the Destination Format Register. See Figure 16-20.
• Model (MOD)—Bits 31-28. The MOD field controls which format to use when accepting
interrupts in logical destination mode. The allowable values are 0h = cluster model and Fh = flat
model.
With the flat model, up to eight unique logical APIC ID values can be provided by software by setting
a different bit in the LDR. When the logical ID of the destination is compared with the LDR, if any bit
position is set in both fields, this local APIC is a valid destination. A broadcast to all local APICs
occurs when the LDR is set to all ones.
In the cluster model, bits 31:28 of the logical ID of the destination are compared with bits 31:28 of the
LDR. If there is a match, then bits 27:24 are tested for matching ones, similar to the flat model. If bits
31:28 match, and any of bits 27:24 are set in both fields, this local APIC is a valid destination. The
cluster model allows for 15 unique clusters to be defined, with each cluster having four unique logical
APIC values to be addressed. In cluster logical destination mode, lowest priority message type is not
supported.
In both the flat model and the cluster model, if the destination field = FFh, the interrupt is accepted by
all local APICs.
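The flat and cluster matching rules above can be expressed as a single predicate. In this C sketch the function name is hypothetical. The arguments `ldr` and `dfr` are raw 32-bit register values, and `dest` is the 8-bit logical destination from the ICR. DFR model values other than 0h and Fh are treated as non-matching, which is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

static inline bool logical_match(uint32_t ldr, uint32_t dfr, uint8_t dest)
{
    uint8_t id = (uint8_t)(ldr >> 24);      /* DLID, LDR bits 31-24 */
    uint32_t model = (dfr >> 28) & 0xFu;    /* MOD, DFR bits 31-28 */

    if (dest == 0xFF)                       /* accepted by all local APICs */
        return true;
    if (model == 0xFu)                      /* flat: any bit set in both */
        return (id & dest) != 0;
    if (model == 0x0u)                      /* cluster: upper nibble equal, */
        return (id >> 4) == (dest >> 4)     /* then a common bit in 3-0 */
            && (id & dest & 0x0Fu) != 0;
    return false;                           /* other MOD values: not described */
}
```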
The value in the Arbitration Priority field is the highest of the following: the Task Priority field of the
Task Priority Register (TPR), the priority of the highest bit set in the In-Service Register (ISR) vector,
and the priority of the highest bit set in the Interrupt Request Register (IRR) vector. The value in the
Arbitration Priority Sub-class field equals the Task Priority Sub-class if the APR equals the TPR, and
zero otherwise.
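A minimal C sketch of the rule above rests on three assumptions. First, the 256-bit ISR and IRR are passed as eight 32-bit words, with vector i at bit (i mod 32) of word i / 32. Second, the priority of a vector is its bits 7–4. Third, when the TPR ties with the ISR or IRR priority, the TPR value (including its sub-class) is used. The function names are hypothetical.

```c
#include <stdint.h>

/* Priority (vector bits 7-4) of the highest set bit; 0 if no bit is set. */
static inline uint32_t highest_class(const uint32_t reg[8])
{
    for (int i = 7; i >= 0; i--) {
        if (reg[i] != 0) {
            int bit = 31;
            while (!(reg[i] & (1u << bit)))
                bit--;
            return (uint32_t)(i * 32 + bit) >> 4;
        }
    }
    return 0;
}

/* Arbitration priority: the highest of the TPR, ISR, and IRR priorities.
 * The sub-class comes from the TPR only when the TPR wins (ties included). */
static inline uint8_t apic_apr(uint8_t tpr, const uint32_t isr[8], const uint32_t irr[8])
{
    uint32_t tpr_class = (uint32_t)tpr >> 4;
    uint32_t isr_class = highest_class(isr);
    uint32_t irr_class = highest_class(irr);
    uint32_t other = isr_class > irr_class ? isr_class : irr_class;
    if (tpr_class >= other)
        return tpr;                   /* APR equals TPR, including sub-class */
    return (uint8_t)(other << 4);     /* sub-class is zero */
}
```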
If focus CPU core checking is enabled (Spurious Interrupt Register bit 9=0), the focus CPU core for an
interrupt can always accept the interrupt. A CPU core is the focus of an interrupt if it is already
servicing that interrupt (corresponding ISR bit is set) or if it already has a pending request for that
interrupt (corresponding IRR bit is set). If there is no focus CPU core for an interrupt or if focus CPU
core checking is disabled (Spurious Interrupt Register bit 9=1), all target local APICs identified as
candidates for the interrupt arbitrate to determine which is executing with the lowest arbitration
priority. If there is a tie for lowest priority, the local APIC with the highest APIC ID is selected.
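The selection order above can be sketched in C: the focus core first, then the lowest arbitration priority, then the highest APIC ID on a tie. The candidate structure and function are hypothetical. The `focus_checking` argument corresponds to Spurious Interrupt Register bit 9 being 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-candidate summary for one lowest-priority interrupt. */
struct lp_candidate {
    uint8_t apic_id;
    uint8_t apr;        /* current arbitration priority */
    bool    is_focus;   /* ISR or IRR bit already set for this interrupt */
};

/* Index of the local APIC that accepts the interrupt, or -1 if n == 0. */
static inline int lp_select(const struct lp_candidate *c, size_t n, bool focus_checking)
{
    if (focus_checking) {
        for (size_t i = 0; i < n; i++)
            if (c[i].is_focus)
                return (int)i;        /* the focus core can always accept */
    }
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || c[i].apr < c[best].apr ||
            (c[i].apr == c[best].apr && c[i].apic_id > c[best].apic_id))
            best = (int)i;            /* lowest APR; tie -> highest APIC ID */
    }
    return best;
}
```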
When an interrupt is accepted by the local APIC and the IRR bit is set, the associated TMR bit is set for
level-sensitive interrupts or cleared for edge-triggered interrupts. At the end of the interrupt handler
routine, when the EOI is received at the local APIC, an EOI message is sent to the I/O APIC if the
associated TMR bit is set for a system interrupt. See Figure 16-24 on page 446.
Bit layout: 255–16 IR | 15–0 Reserved, MBZ
• Interrupt Request bits (IR)—Bits 255-16. The corresponding request bit is set when an interrupt is
accepted by the local APIC. The interrupt request registers provide a bit per interrupt to indicate
that the corresponding interrupt has been accepted by the local APIC. Interrupts are mapped as
follows:
Bit layout: 255–16 IS | 15–0 Reserved, MBZ
• In Service bits (IS)—Bits 255–16. These bits are set when the corresponding interrupt is being
serviced by the CPU core. The in-service registers provide a bit per interrupt to indicate that the
corresponding interrupt is being serviced by the CPU core. Interrupts are mapped as follows:
Bit layout: 255–16 TM | 15–0 Reserved, MBZ
• Trigger Mode bits (TM)—Bits 255–16. These bits provide a bit per interrupt to indicate the
assertion mode of each interrupt. Interrupts are mapped as follows:
• End of Interrupt (EOI)—Bits 31-0. This register is write-only. A write to it signals the end of
interrupt processing to the source of the interrupt.
Bit layout: 31–24 Reserved, MBZ | 23–16 XLC | 15–3 Reserved, MBZ | 2 XAIDC | 1 SNIC | 0 INC
• Extended LVT Count (XLC)—Bits 23–16. Specifies the number of extended local vector table
registers in the local APIC.
• Extended APIC ID Capability (XAIDC)—Bit 2. Indicates that the processor is capable of
supporting an 8-bit APIC ID.
• Specific End of Interrupt Capable—Bit 1. Indicates that the Specific End Of Interrupt Register is
present.
• Interrupt Enable Register Capable—Bit 0. Read-only. Indicates that the Interrupt Enable
Registers are present.
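The following C sketch decodes the fields listed above from a raw register value. The register is located at offset 400h in APIC space, as stated later in this section. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct apic_ext_features {
    unsigned xlc;       /* Extended LVT Count, bits 23-16 */
    bool ext_apic_id;   /* bit 2: 8-bit APIC ID supported */
    bool seoi;          /* bit 1: Specific End of Interrupt Register present */
    bool ier;           /* bit 0: Interrupt Enable Registers present */
};

static inline struct apic_ext_features apic_ext_decode(uint32_t v)
{
    struct apic_ext_features f = {
        .xlc         = (v >> 16) & 0xFFu,
        .ext_apic_id = (v >> 2) & 1u,
        .seoi        = (v >> 1) & 1u,
        .ier         = v & 1u,
    };
    return f;
}
```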
The IER is made available to software by means of eight 32-bit registers in the local APIC; bit i of the
256-bit IER is located at bit position (i mod 32) in the local APIC register IER[i / 32]. The eight IER
registers are located at offsets 480h, 490h, ..., 4F0h in APIC space. The IER format is shown in
Figure 16-30.
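The IER indexing rule above translates directly into an offset and a bit position. In this C sketch the names are hypothetical. The arithmetic follows the stated layout of eight registers at 480h through 4F0h, spaced 10h apart.

```c
#include <stdint.h>

struct ier_loc {
    uint32_t offset;  /* APIC-space offset of the 32-bit IER register */
    uint32_t bit;     /* bit position within that register */
};

static inline struct ier_loc ier_locate(uint8_t vector)
{
    struct ier_loc l = {
        .offset = 0x480u + 0x10u * (vector / 32u),  /* IER[i / 32] */
        .bit    = vector % 32u,                      /* bit (i mod 32) */
    };
    return l;
}
```

For example, vector 41h falls in the register at 4A0h, bit 1.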
Bit layout: 255–16 IE | 15–0 Reserved, MBZ
The IER and SEOI registers are located in the APIC Extended Space area. The presence of the APIC
Extended Space area is indicated by bit 31 of the APIC Version Register (at offset 30h in APIC space).
The presence of the IER and SEOI functionality is identified by bits 0 and 1, respectively, of the APIC
Extended Feature Register (located at offset 400h in APIC space). IER and SEOI are enabled by
setting bits 0 and 1, respectively, of the APIC Extended Control Register (located at offset 410h).
Only vectors that are enabled in IER participate in APIC's computation of the highest-priority pending
interrupt. The reset value of IER is all ones.
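The discovery and enable sequence described above can be sketched in C as a pure function over raw register values. Reading and writing the registers at 30h, 400h, and 410h is left to hypothetical platform code. The macro and function names are illustrative only.

```c
#include <stdint.h>

#define APIC_VER_EXT_SPACE (1u << 31)  /* APIC Version Register (30h), bit 31 */
#define APIC_EXT_IER       (1u << 0)   /* Extended Feature (400h) / Control (410h) bit 0 */
#define APIC_EXT_SEOI      (1u << 1)   /* Extended Feature (400h) / Control (410h) bit 1 */

/* Return the Extended Control value that enables whichever of IER and SEOI
 * are present; leave `control` unchanged if the extended space is absent. */
static inline uint32_t apic_ext_enable(uint32_t version, uint32_t features, uint32_t control)
{
    if (!(version & APIC_VER_EXT_SPACE))
        return control;
    return control | (features & (APIC_EXT_IER | APIC_EXT_SEOI));
}
```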
Bit layout: 63–16 Reserved | 15–0 OSVW_ID_Length
OSVW_ID_Length—Bits 15–0. The highest OSVW_ID erratum number supported in the latest
release. If a specific erratum has an OSVW ID greater than OSVW_ID_Length, the erratum is
unknown to the latest release. Otherwise, the associated status bit in the OSVW status MSRs
(MSR1 through MSRn) can be checked to see whether a workaround is required. The reset value is the
highest OSVW_ID of the errata known at the time of release.
OS Valid Workaround Status (OSVW E[i])—Bits n–0. Each bit indicates whether the processor model
is affected by an OS-visible erratum and whether the OS needs to apply a workaround. The
OSVW_ID_Length indicates the highest OSVW_ID for known errata at the time of release. If a
specific erratum has an OSVW_ID lower than or equal to the OSVW_ID_Length, the workaround
information is contained in the corresponding status bit in the OSVW Status register.
For the status bit:
1 = Hardware contains the erratum, and an OS software workaround is required.
0 = Hardware has corrected the erratum, so an OS software workaround is not necessary.
The location of an OSVW ID status bit within a bank of OSVW MSRs is determined as follows:
• MSR address = OSVW_MSR0 + 1 + floor(OSVW_ID / 64)
• Bit offset in MSR = OSVW_ID mod 64
If a specific erratum has an OSVW_ID that is greater than the OSVW_ID_Length, hardware does
not know about the erratum, and the processor model must be used to determine whether the
workaround must be applied. The reset value is the array of workaround status bits known at the time
of processor release.
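The following C sketch applies the location formula and the known/unknown rule above. `OSVW_MSR0` is set to C001_0140h on the assumption that it corresponds to the MSR that Linux names `MSR_AMD64_OSVW_ID_LENGTH`; Linux places `MSR_AMD64_OSVW_STATUS` at C001_0141h. This section does not give that address. Status MSR contents are passed in as an array, with `status[k]` holding the contents of OSVW_MSR0 + 1 + k, so the logic can be exercised without MSR access. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define OSVW_MSR0 0xC0010140u  /* assumed; see lead-in */

enum osvw_result { OSVW_UNKNOWN = -1, OSVW_NOT_NEEDED = 0, OSVW_NEEDED = 1 };

static inline uint32_t osvw_msr(uint32_t osvw_id) { return OSVW_MSR0 + 1u + osvw_id / 64u; }
static inline uint32_t osvw_bit(uint32_t osvw_id) { return osvw_id % 64u; }

/* `id_length` is the raw OSVW_ID_Length MSR value (field in bits 15-0). */
static inline enum osvw_result osvw_check(uint32_t osvw_id, uint64_t id_length,
                                          const uint64_t *status, size_t nstatus)
{
    if (osvw_id > (id_length & 0xFFFFu))
        return OSVW_UNKNOWN;          /* unknown to hardware: use processor model */
    size_t idx = osvw_id / 64u;
    if (idx >= nstatus)
        return OSVW_UNKNOWN;          /* caller did not supply that status MSR */
    return ((status[idx] >> osvw_bit(osvw_id)) & 1u) ? OSVW_NEEDED : OSVW_NOT_NEEDED;
}
```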
Bit layout: 63–4 Reserved | 3–0 PstateCmd
• P-State Change Command (PstateCmd)—Bits 3–0. Writes to this field cause the CPU core to
change to the indicated P-state number, which may be clipped by the PstateMaxVal field of the P-
State Current Limit Register. Reset value is implementation specific.
Bit layout: 63–4 Reserved | 3–0 CurPstate
• Current P-State (CurPstate)—Bits 3–0. This field provides the current P-state of the CPU core
regardless of the source of the P-state change, including writes to the P-State Control Register:
0=P-state 0, 1=P-state 1, etc. The value of this field is updated when the frequency transitions to a
new value associated with the P-state. Reset value is implementation specific.
MSR Address  Register Name        Reset Value           Description
C000_0408h   MC4_MISC1            c00x_xxxx_0000_0000
C000_0409h   MC4_MISC2            c00x_xxxx_0000_0000
C000_040Ah   MC4_MISC3            c00x_xxxx_0000_0000
C001_0074h   CPU_Watchdog_Timer   0000_0000_0000_0000h  Timer that can cause a machine check error if no
                                                        operation completes after a specified time period.
The state-save area within the VMCB starts at offset 400h into the VMCB page. Table B-2 describes
the fields within the state-save area; note that the table lists offsets relative to the state-save area, not
to the VMCB as a whole.
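Converting a Table B-2 offset to an offset within the VMCB page is a single addition. The following C sketch uses hypothetical names. The offset 170h used in the example is arbitrary and does not refer to a specific Table B-2 field.

```c
#include <stdint.h>

#define VMCB_STATE_SAVE_BASE 0x400u  /* state-save area starts at offset 400h */

/* Table B-2 offset (relative to the state-save area) -> offset in the VMCB page. */
static inline uint32_t vmcb_offset_from_ssa(uint32_t ssa_offset)
{
    return VMCB_STATE_SAVE_BASE + ssa_offset;
}
```

For example, a state-save offset of 170h corresponds to VMCB offset 570h.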
Index
Symbols
#AC, 221
#BP, 212
#BR, 213
#D, 221, 223
#DB, 211
#DE, 211
#DF, 214
#GP, 218
#I, 220, 223
#IA, 220
#IS, 221
#MC, 222
#MF, 220
#NM, 214
#NP, 217
#O, 221, 223
#OF, 213
#P, 221, 223
#PF, 219
#SS, 218
#SX, 419
#TS, 216
#U, 221, 223
#UD, 213
#VMEXIT, 370, 371
#XF, 223
#Z, 221, 223

Numerics
128-bit media instructions
  enabling, 290
  feature identification, 289
  MXCSR, 291
  saving state, 295
  XMM registers, 28, 291
16-bit mode, xxix
1-Gbyte page, 133
32-bit mode, xxix
64-bit media instructions
  causing #MF exception, 293
  feature identification, 289
  initializing, 354, 355
  MMX registers, 292
  saving state, 295
64-bit mode, xxix, 13

A
A bit, 80, 82, 137
A20 Masking, 413
abort, 206
AC bit, 53
access checking, 401
accessed (A)
  code segment, 80
  data segment, 82
  page-translation tables, 137
address space identifier (ASID), 368, 389
address-breakpoint registers (DR0-DR3), 329
addressing
  RIP-relative, xxxiv
address-size prefix, 31
ADDRV bit, 263
Advanced Programmable Interrupt Controller (APIC), 425
alignment check (rFLAGS.AC), 53, 221
alignment mask (CR0.AM), 45, 221
alignment-check exception (#AC), 45, 53, 221
AM bit, 45
AP startup sequence, 419
APIC, 425
  base address, 428
  enable, 428
  error interrupts, 436
  internal error, 426
  registers, 428
  timer interrupt, 433
  version register, 430
APIC.TPR, 393
APIC.TPR virtualization, 369
Application Processors (APs), 418
Arbitration, 443
architecture differences, 23
ARPL instruction, 156
ASID, 368, 389
attributes, 76
available to software (AVL)
  descriptor, 79
  page-translation tables, 138
AVL bit, 79, 138

B
B3–B0 bits, 330
base address, 73, 75, 78, 121, 129, 136
BD bit, 330
OVER bit, 263
overflow, xxxiii
overflow exception (#OF), 213
overflow exception (OE), 221, 223
owned state, MOESI, 167

P
P bit, 79, 136, 320
packed, xxxiii
PAE, 371
PAE bit, 48, 119
PAE paging, 25, 120
  CR3 format, 46, 120
  CR3 format, long mode, 128
  legacy mode, 124
  long mode, 129
page directory, 120
  page size (PS), 119, 123, 125
page directory pointer, 120, 125
page faults
  guest level, 410
page size (PS), page-translation tables, 137
page splintering, 413
page table, 120
page translation, 115
page-attribute table (PAT), 193
  combined with MTRR, 196
  effect on memory access, 195
  identifying support, 195
  indexing, 194
  page-translation tables, bit in, 138
Paged Real Mode, 391
page-fault exception (#PF), 136, 143, 144, 219
page-fault virtual address, 220
page-global enable (CR4.PGE), 49, 140
page-level cache disable (PCD), 180
  CR3, bit in, 121
  page-translation tables, bit in, 137
page-level write-through (PWT), 180
  CR3, bit in, 121
  page-translation tables, bit in, 137
page-map level-4, 128
page-size extensions (CR4.PSE), 25, 26, 48, 119, 123
  40-bit physical address support, 119, 124
  unsupported in long mode, 119
page-translation cache, 139
page-translation tables, 25
  accessed (A), 137
  available to software (AVL), 138
  dirty (D), 137
  global page (G), 138
  hierarchy, 117
  no-execute, 138
  page directory entry (PDE), 120
  page size (PS), 137
  page table entry (PTE), 120
  page-attribute table (PAT), 138
  page-directory pointer entry (PDPE), 25, 120, 125
  page-level cache disable (PCD), 137
  page-level write-through (PWT), 137
  page-map level-4 entry (PML4E), 25, 128
  physical-page base address, 136
  present (P), 136
  read/write (R/W), 137
  translation-table base address, 136
  user/supervisor (U/S), 137
paging, 7, 25, 115
  See also PAE paging and non-PAE paging.
  effect of segment protection, 146
  protection across translation hierarchy, 144
  protection checks, 142
  supported translations, 117
paging enable (CR0.PG), 45, 118
  activating long mode, 118, 359
parameter count field, 86
PAT, 412
  See page-attribute table (PAT).
PAT bit, 138
PAT register, 193, 460, 465
PAUSE, 379
PBi bits, 334
PC bit, 345
PCC bit, 263
PCD bit, 121, 129, 137
PCE bit, 49
PDE, 120
PDPE, 120, 371, 413
PE bit, 43
PE exception, 221, 223
PerfCtrn registers, 342, 462, 468
PerfEvtSeln registers, 343, 462, 468
performance counter, 154
performance counter enable (CR4.PCE), 49, 154, 343
Performance Monitor Counter Interrupts, 435
performance optimization, 22, 341
performance-monitoring counter
  overflow, 346
  PerfCtrn, 342
  PerfEvtSeln, 343
  starting and stopping, 346
PG bit, 45, 118
PGE bit, 49
physical address, 3, 24
  as index into cache, 178
physical memory, 4
physical-address extensions (CR4.PAE), 25, 48, 119, 128
  activating long mode, 119, 359
  See also PAE paging.
POP instruction, 154
POPF, 379
precise exceptions and interrupts, 205
precision exception (PE), 221, 223
PREFETCH instruction, 181
present (P)
  descriptor, 79, 320
  page-translation tables, 136
principle of locality, 139
priorities, interrupt, 226
privilege level, 94
probe, cache, 161, 168
  during cache disable, 179
processor feature identification (rFLAGS.ID), 54
processor halt, 156
processor state, 350
protected mode, xxxiv, 14, 370
  initial operating environment, 356
protected-mode virtual interrupts (CR4.PVI), 48
protection checks
  adjusting RPL, 156
  call gate, 103
  checking access rights, 156
  data segment, 95
  direct call, conforming, 100
  direct call, nonconforming, 98
  enabling, 64
  far return, 109
  interrupt return, 238
  interrupt to higher privilege, 235
  limit check, 64-bit mode, 110
  long mode changes, 27
  long mode interrupt, 244
  long mode interrupt return, 246
  stack segment, 96
  type check, 112
  verifying read/write access, 156
protection domains, 400
protection enable (CR0.PE), 43, 64, 71
PS bit, 119, 137
PSE bit, 48
PSE paging, 25
P-State, 457
  control, 457
  current limit register, 457
  status register, 458
PTE, 120
PUSH instruction, 154
PUSHF, 379
PVI bit, 48
PWT bit, 121, 129, 137
Q
quadword, xxxiv
R
R bit, 80
R/W bit, 137, 144
R/W3–R/W0 bits, 332
r8–r15, xxxvii
RAX, 370, 371
rAX–rSP, xxxvii
RAZ, xxxiv
RdMem, MTRR type field, 58, 197
RDMSR, 56, 154, 382
RDP field, 304
RDPMC, 49, 343, 379
RDPMC instruction, 154
RDTSC, 48, 60, 154, 346, 379
RDTSCP, 48, 60, 154, 346, 380
read hit, 161
read miss, 161
read ordering, 181
read/write (R/W)
  page protection, 144
  page-translation tables, bit in, 137
readable (R), code segment, 80
real address, 10
real address mode. See real mode.
real mode, xxxiv, 4, 14
  initial operating environment, 356
registers
  See also entries for individual registers.
  128-bit media registers (XMM), 28
  address-breakpoint registers (DR0-DR3), 329
  control registers, 29, 41
  control-transfer recording MSRs, 334
  CR0, 42
  CR2, 220
  CR3, 25, 46, 120, 128
  CR4, 47
  CSTAR, 150
  debug registers, 29, 328
  debug-control MSR (DebugCtlMSR), 333
  debug-control register (DR7), 331
  debug-extension MSRs, 60
  descriptor-table registers, 26, 66
  eAX–eSP, xxxvi
  EFER, 29, 54
  eFLAGS, xxxvii
  eIP, xxxvii
  FPR, 294, 296
  FS and GS, 70
  FS.base, 71