UNIT-2 (Memory Hierarchy Design)
• Spatial Locality
❖ When a block is referenced, there is a high probability that other data in
the same block will be needed soon.
[Figure: Processor-Memory Performance Gap, 1980-2000 (performance on a log scale). DRAM performance improves 9%/yr (2X/10 yrs); the processor-memory performance gap grows 50%/year.]
Problem: Time
Improvements in access time are not enough to catch up
Solution:
Increase the bandwidth of main memory (improve throughput)
Dr. Sarvesh Vishwakarma TCS-704 ACA
Memory Hierarchy (upper levels are faster; lower levels are larger)

Level          Capacity    Bandwidth               Access Time   Staging Transfer Unit   Managed by
CPU Registers  1 KB        50,000-500,000 MB/sec   < 0.5 ns      Instr. operands         Prog./compiler
Cache          16 MB       5000-20,000 MB/sec      0.5-25 ns     Blocks                  Cache cntl
Main Memory    2-8 GB      2500-10,000 MB/sec      50-250 ns     Pages                   OS
Disk           G-T bytes   50-500 MB/sec           50,000 ns     Files                   User/operator
Tape           Infinite    -                       sec-min       -                       -
Introduction
Memory Hierarchy
• Causes of misses
– Compulsory
• First reference to a block
– Capacity
• Blocks discarded and later retrieved
– Conflict
• Program makes repeated references to multiple addresses from
different blocks that map to the same location in the cache
• Pyramid
• Technical specification of memory elements
• Concepts of cache memory
• Temporal locality
• Spatial locality
• Hit
• Miss
• Hit Rate
• Miss Rate
• Miss penalty
3 C's Model
1. Compulsory misses
2. Capacity misses
3. Conflict misses
[Figure: cache blocks shown above main memory blocks; main memory blocks are numbered 0-31.]
How many groups (sets) can we form within the cache by distributing its N
cache blocks (lines)? For example, N = 16.
Where can a referenced address be stored?
➢ in any location, or
➢ in only one location, or
➢ in a fixed number of locations
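The grouping question above can be worked through numerically. The following is a minimal sketch, assuming the cache is divided evenly so that the number of groups (sets) is N divided by the number of blocks per group; the function name `number_of_sets` is hypothetical and only used for illustration.

```python
def number_of_sets(num_blocks: int, ways: int) -> int:
    """Return how many sets a cache with num_blocks lines forms
    when each set holds `ways` blocks (the associativity)."""
    if ways <= 0 or num_blocks % ways != 0:
        raise ValueError("associativity must evenly divide the number of blocks")
    return num_blocks // ways


N = 16  # number of cache blocks, as in the slide
for ways in (1, 2, 4, 8, 16):
    # ways = 1  -> only one location per address (direct mapped)
    # ways = N  -> any location (fully associative, a single set)
    # otherwise -> a fixed number of locations (set associative)
    print(f"{ways:2d} block(s) per set -> {number_of_sets(N, ways):2d} set(s)")
```

With N = 16, one block per set gives 16 sets, 16 blocks per set gives a single set, and the values in between correspond to the "fixed number of locations" option.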
Fully Associative Caches
Associativity
[Figures: a referenced address from main memory (blocks numbered 0-31) being placed into a cache with blocks numbered 0-7, shown for different associativities.]
Two-way set associative (set = block address MOD 4)
Set 1: 1, 5, 9, 13, 17, 21, 25, 29 (each MOD 4 = 1)
Set 2: 2, 6, 10, 14, 18, 22, 26, 30 (each MOD 4 = 2)
Set 3: 3, 7, 11, 15, 19, 23, 27, 31 (each MOD 4 = 3)
Two sets, numbered 0 and 1 (set = block address MOD 2)
Set 0: 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 (each MOD 2 = 0)
[Figure: main memory blocks 0-31 mapped onto cache sets 0 and 1.]
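The MOD mappings listed above can be reproduced with a few lines of code. This is a small sketch, assuming 32 main memory blocks (0-31) as in the figures; the helper names `set_index` and `blocks_for_set` are hypothetical.

```python
def set_index(block_address: int, num_sets: int) -> int:
    """Set that a main memory block maps to: block address MOD number of sets."""
    return block_address % num_sets


def blocks_for_set(target_set: int, num_sets: int, memory_blocks: int = 32) -> list:
    """All main memory blocks (0 .. memory_blocks-1) that map to target_set."""
    return [b for b in range(memory_blocks) if set_index(b, num_sets) == target_set]


# Four sets (the MOD 4 examples)
for s in (1, 2, 3):
    print(f"MOD 4 = {s}:", blocks_for_set(s, 4))

# Two sets (the MOD 2 example)
print("MOD 2 = 0:", blocks_for_set(0, 2))
```

Blocks that share a set compete for the same few cache locations, which is the source of conflict misses.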
[Figures: a hierarchy of Cache, Memory and Storage, followed by a hierarchy of L1 Cache, L2 Cache, Memory and Storage.]
Miss Penalty L1:Cache = Hit time Memory + (Miss Rate Memory x Miss Penalty Memory)
[Figure sequence: the processor sends its request for data to the L1 cache; the request is passed down through the L2 cache, memory and storage until the data is found.]
Global Miss Rate for L2: cache = (L2: Cache Misses) / (L1: Cache Requests)
Local Miss Rate for L1: cache = (L1: Cache Misses) / (L1: Cache Requests)
Local Miss Rate for L2: cache = (L2: Cache Misses) / (L1: Cache Misses)
Memory Requests = L2: Cache Misses (each L2 miss becomes a request to memory for the missing data, which is then a memory hit or a memory miss)
Global Miss Rate for L2: cache = (Local Miss Rate for L1: cache) x (Local Miss Rate for L2: cache)
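The relationship between the local and global miss rates can be checked with a short calculation. This is a minimal sketch; the request and miss counts are hypothetical values chosen only for illustration, and `miss_rates` is not a name from the slides.

```python
def miss_rates(l1_requests: int, l1_misses: int, l2_misses: int):
    """Return (local L1, local L2, global L2) miss rates from raw counts."""
    local_l1 = l1_misses / l1_requests    # L1 misses / L1 requests
    local_l2 = l2_misses / l1_misses      # L2 misses / L1 misses (= L2 requests)
    global_l2 = l2_misses / l1_requests   # L2 misses / L1 requests
    return local_l1, local_l2, global_l2


# Hypothetical counts: 1000 requests reach L1, 40 miss in L1, 20 of those miss in L2.
local_l1, local_l2, global_l2 = miss_rates(1000, 40, 20)
print(f"local L1 = {local_l1:.2f}, local L2 = {local_l2:.2f}, global L2 = {global_l2:.2f}")

# The global L2 miss rate equals the product of the two local miss rates.
assert abs(global_l2 - local_l1 * local_l2) < 1e-12
```

The local L2 miss rate looks high here because L2 only sees the requests that already missed in L1; the global rate is the fraction of all processor requests that go on to memory.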
Miss Penalty L1:Cache = Hit time L2:Cache + (Miss Rate L2:Cache x Miss Penalty L2:Cache)
Miss Penalty L2:Cache = Hit time Memory + (Miss Rate Memory x Miss Penalty Memory)
Avg. Memory Access Time = Hit time L1:Cache + Miss Rate L1:Cache x (Hit time L2:Cache +
Miss Rate L2:Cache x Miss Penalty L2:Cache)
In short form:
T = Hit time L1 + Miss Rate L1 x (Hit time L2 + Miss Rate L2 x Miss Penalty L2)
Here Miss Rate L2 is the local miss rate of the L2 cache.
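The average memory access time formula can be evaluated directly. The sketch below assumes all times are in clock cycles; the numeric values are hypothetical and `amat_two_level` is an illustrative name, not something defined in the slides.

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, miss_penalty_l2):
    """T = Hit time L1 + Miss Rate L1 x (Hit time L2 + Miss Rate L2 x Miss Penalty L2).

    miss_rate_l2 is the local miss rate of the L2 cache.
    """
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * miss_penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1


# Hypothetical values: 1-cycle L1 hit, 4% L1 miss rate,
# 10-cycle L2 hit, 50% local L2 miss rate, 200-cycle L2 miss penalty.
t = amat_two_level(hit_l1=1, miss_rate_l1=0.04,
                   hit_l2=10, miss_rate_l2=0.5, miss_penalty_l2=200)
print(f"Average memory access time: {t:.1f} cycles")
```

With these numbers the L1 miss penalty is 10 + 0.5 x 200 = 110 cycles, so T = 1 + 0.04 x 110, or about 5.4 cycles.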
[Figure: no write buffering vs. write buffering]
Advanced Optimizations
• Blocking
– Instead of accessing entire rows or columns, subdivide matrices
into blocks
– Requires more memory accesses but improves locality of accesses

Example: X = Y x Z

Y: 1 2 3 1 2 2      Z: 1 1 0 1 2 1
   0 0 2 0 1 1         1 0 0 1 3 1
   1 2 3 2 2 1         2 2 0 3 2 0
   2 3 0 2 0 1         3 3 2 1 1 1
   1 2 3 4 1 0         1 0 1 4 0 2
   3 2 1 2 0 1         0 1 2 0 1 2

Without blocking: the first element of X uses the whole first row of Y and the whole first column of Z, giving X: 14.
With blocking: only the first three elements of that row and column are used at first, giving the partial sum X: 9 (1x1 + 2x1 + 3x2); the rest is added when the next block is processed.
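The blocking idea can be tried on the Y and Z matrices from the slide. The following is a minimal sketch, assuming X = Y x Z and a block size of 3; the function names and the particular loop ordering are illustrative choices, not taken from the slides.

```python
Y = [[1, 2, 3, 1, 2, 2],
     [0, 0, 2, 0, 1, 1],
     [1, 2, 3, 2, 2, 1],
     [2, 3, 0, 2, 0, 1],
     [1, 2, 3, 4, 1, 0],
     [3, 2, 1, 2, 0, 1]]

Z = [[1, 1, 0, 1, 2, 1],
     [1, 0, 0, 1, 3, 1],
     [2, 2, 0, 3, 2, 0],
     [3, 3, 2, 1, 1, 1],
     [1, 0, 1, 4, 0, 2],
     [0, 1, 2, 0, 1, 2]]


def matmul(y, z):
    """Plain matrix multiply: each X[i][j] walks a full row of y and a full column of z."""
    n = len(y)
    x = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x[i][j] += y[i][k] * z[k][j]
    return x


def blocked_matmul(y, z, block):
    """Blocked multiply: work on block-wide strips so the touched data stays small."""
    n = len(y)
    x = [[0] * n for _ in range(n)]
    for jj in range(0, n, block):
        for kk in range(0, n, block):
            for i in range(n):
                for j in range(jj, min(jj + block, n)):
                    partial = 0
                    for k in range(kk, min(kk + block, n)):
                        partial += y[i][k] * z[k][j]
                    x[i][j] += partial  # accumulate partial sums across blocks
    return x


full = matmul(Y, Z)
blocked = blocked_matmul(Y, Z, block=3)
first_partial = sum(Y[0][k] * Z[k][0] for k in range(3))

print("X[0][0] unblocked:", full[0][0])
print("X[0][0] after first block of 3:", first_partial)
print("blocked result matches unblocked:", blocked == full)
```

Both versions compute the same X; the blocked version revisits each X[i][j] more than once, which is the extra memory traffic the slide mentions, in exchange for reusing a small working set.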
Pentium 4 Pre-fetching
Advanced Optimizations
Compiler Prefetching
• Insert prefetch instructions before data is needed
• Non-faulting: prefetch doesn’t cause exceptions
• Register prefetch
– Loads data into register
• Cache prefetch
– Loads data into cache
[Figure: 8085 CPU with 8K ROM (0000H-1FFFH), 8K RAM (2000H-3FFFH) and disk drive storage. A 16 KB program is shown as two 8 K parts (2000H-3FFFH and 4000H-5FFFH), along with a 1 KB "Print bill" routine.]
❖ This means that memory references from one program cannot target the
physical addresses containing another program's data, preventing programs
from accessing each other's data.
[Figure: address translation steps, showing a 128:1 MUX, the page table, main memory and the resulting physical address.]
Step 2: At the same time, the type of memory access is checked for a violation
against protection information in the TLB.
To reduce TLB misses due to context switches, each entry has an 8-bit address
space number (ASN), which plays the same role as the PID. If a context
switch returns to a process with the same ASN, its entries can still match in the TLB.
The process ASN and the page table entry (PTE) ASN must also match for a
valid tag.
Step 4: The page offset is then combined with the physical page frame to form
a full physical address.
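The steps described above (ASN-tagged lookup, protection check, and combining the page frame with the page offset) can be sketched in software. This is an illustrative model only: the 13-bit page offset (8 KB pages), the dictionary-based lookup, and the `TLB` class with its method names are assumptions made for the example, not details given in the slides.

```python
PAGE_OFFSET_BITS = 13  # assumption: 8 KB pages


class TLB:
    """Toy TLB: entries are tagged with (ASN, virtual page number)."""

    def __init__(self):
        self.entries = {}  # (asn, vpn) -> (physical frame, set of allowed access types)

    def insert(self, asn, vpn, frame, allowed):
        self.entries[(asn, vpn)] = (frame, set(allowed))

    def translate(self, asn, virtual_address, access):
        vpn = virtual_address >> PAGE_OFFSET_BITS
        offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)

        # The tag is valid only if both the page number and the ASN match.
        entry = self.entries.get((asn, vpn))
        if entry is None:
            return None  # TLB miss

        frame, allowed = entry
        # Step 2: check the access type against the protection information.
        if access not in allowed:
            raise PermissionError(f"{access} access not permitted")

        # Step 4: combine the physical page frame with the page offset.
        return (frame << PAGE_OFFSET_BITS) | offset


tlb = TLB()
tlb.insert(asn=5, vpn=2, frame=7, allowed={"read"})

vaddr = (2 << PAGE_OFFSET_BITS) | 0x123
print(hex(tlb.translate(5, vaddr, "read")))  # same ASN: hit
print(tlb.translate(6, vaddr, "read"))       # different ASN: miss (None)
```

Because the ASN is part of the tag, entries belonging to one process do not have to be flushed on a context switch, and they cannot be matched by another process.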