Chapter 5
Contents
• Memory hierarchy
• Cache organization
– Direct mapped
– Fully associative
– n-way associative
• Virtual memory (self-learning)
Motivation example (library analogy)
• Publishers / Secondary memory
• Bookshelf / Main memory
• Table / Cache
Principle of Locality
• Memory hierarchy
• Store everything on disk
• Copy recently accessed (and nearby) items from disk to smaller DRAM memory
– Main memory
• Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory
– Cache memory attached to CPU
Memory Technology
Cache Memory
• Cache memory
– The level of the memory hierarchy closest to the CPU
• Given accesses X1, X2, . . . , Xn−1, Xn
• #Blocks is a power of 2
• Use low-order address bits
Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Miss 110
• 64 blocks, 16 bytes/block
– To what block number does address 1200 map?
• Block address = ⌊1200/16⌋ = 75
• Block number = 75 modulo 64 = 11
• Address fields: Tag = bits 31–10, Index = bits 9–4, Offset = bits 3–0
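The mapping above can be checked with a short sketch (block size and block count are hard-coded from this example; the function name is illustrative):

```python
# Direct-mapped address breakdown for this example:
# 16-byte blocks -> 4 offset bits; 64 blocks -> 6 index bits.
BLOCK_SIZE = 16   # bytes per block
NUM_BLOCKS = 64

def split_address(addr):
    offset = addr % BLOCK_SIZE            # byte within the block
    block_address = addr // BLOCK_SIZE    # which memory block
    index = block_address % NUM_BLOCKS    # which cache block (low-order bits)
    tag = block_address // NUM_BLOCKS     # remaining high-order bits
    return tag, index, offset

# Address 1200: block address 1200 // 16 = 75, cache block 75 mod 64 = 11
tag, index, offset = split_address(1200)
```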
Cache Misses
Write-Through
Write-Back
Write Allocation
• Given
– I-cache miss rate = 2%
– D-cache miss rate = 4%
– Miss penalty = 100 cycles
– Base CPI (ideal cache) = 2
– Loads & stores are 36% of instructions
• Miss cycles per instruction
– I-cache: 0.02 × 100 = 2
– D-cache: 0.36 × 0.04 × 100 = 1.44
• Actual CPI = 2 + 2 + 1.44 = 5.44
– Ideal CPU is 5.44/2 = 2.72 times faster
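The miss-cycle arithmetic above, reproduced as a sketch (variable names are illustrative):

```python
# Recompute the CPI example: base CPI plus miss cycles per instruction.
base_cpi = 2
miss_penalty = 100          # cycles
i_miss_rate = 0.02
d_miss_rate = 0.04
mem_instr_frac = 0.36       # loads & stores

i_miss_cycles = i_miss_rate * miss_penalty                    # 2 per instruction
d_miss_cycles = mem_instr_frac * d_miss_rate * miss_penalty   # 1.44 per instruction
actual_cpi = base_cpi + i_miss_cycles + d_miss_cycles         # 5.44
speedup_of_ideal = actual_cpi / base_cpi                      # 2.72
```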
Computer Architecture (c) Cuong Pham-Quoc/HCMUT 29
Performance Summary
Associative Caches
• Fully associative
– Allow a given block to go in any cache entry
– Requires all entries to be searched at once
– Comparator per entry (expensive)
• n-way set associative
– Each set contains n entries
– Block number determines which set
• (Block number) modulo (#Sets in cache)
– Search all entries in a given set at once
– n comparators (less expensive)
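The set-mapping rule above can be sketched as follows (the 8-block cache size is an assumption for illustration; only the modulo rule comes from the slide):

```python
# Which set a block maps to, for a given associativity.
NUM_BLOCKS = 8   # total cache blocks (illustrative)

def set_index(block_number, ways):
    num_sets = NUM_BLOCKS // ways
    return block_number % num_sets   # (Block number) modulo (#Sets in cache)

# 1-way (direct mapped): 8 sets; 2-way: 4 sets; 8-way (fully assoc.): 1 set
```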
Associativity Example
• Fully associative
Block address   Hit/miss   Cache content after access
0 miss Mem[0]
8 miss Mem[0] Mem[8]
0 hit Mem[0] Mem[8]
6 miss Mem[0] Mem[8] Mem[6]
8 hit Mem[0] Mem[8] Mem[6]
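The table above can be reproduced with a small LRU simulation (the four-entry capacity is an assumption; it is large enough that nothing is evicted in this access sequence):

```python
from collections import OrderedDict

def simulate_fully_associative(accesses, capacity=4):
    """Fully associative cache with LRU replacement; returns 'hit'/'miss' per access."""
    cache = OrderedDict()          # insertion order tracks recency
    results = []
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)        # refresh LRU position
            results.append('hit')
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict least recently used
            cache[block] = True
            results.append('miss')
    return results

# Access sequence from the table: 0, 8, 0, 6, 8
outcomes = simulate_fully_associative([0, 8, 0, 6, 8])
```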
Multilevel Caches
• Given
– CPU base CPI = 1, clock rate = 4GHz
– Miss rate/instruction = 2%
– Main memory access time = 100ns
• With just primary cache
– Miss penalty = 100ns/0.25ns = 400 cycles
– Effective CPI = 1 + 0.02 × 400 = 9
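The single-level numbers above, worked out as a sketch (names are illustrative):

```python
# Primary-cache-only case: convert memory access time to cycles, then to CPI.
clock_rate = 4e9                               # 4 GHz
cycle_time_ns = 1e9 / clock_rate               # 0.25 ns per cycle
mem_access_ns = 100
miss_rate = 0.02                               # misses per instruction

miss_penalty = mem_access_ns / cycle_time_ns   # 100 ns / 0.25 ns = 400 cycles
effective_cpi = 1 + miss_rate * miss_penalty   # 1 + 0.02 * 400 = 9
```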
Example (cont.)
• Primary cache
– Focus on minimal hit time
• L-2 cache
– Focus on low miss rate to avoid main memory access
– Hit time has less overall impact
• Results
– L-1 cache usually smaller than a single cache
– L-1 block size smaller than L-2 block size
• Misses depend on memory access patterns
– Algorithm behavior
– Compiler optimization for memory access
[Figure: access patterns to the C, A, and B arrays, unoptimized vs. blocked; shading distinguishes older accesses from new accesses]
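A minimal sketch of cache blocking (tiling) for matrix multiply, the optimization the figure illustrates: working on small tiles of C, A, and B keeps the active data resident in cache. The tile size and pure-Python lists are illustrative only.

```python
BLOCK = 2   # tile size; in practice chosen so three tiles fit in cache

def matmul_blocked(A, B, n):
    """Blocked n x n matrix multiply: C = A * B, one BLOCK x BLOCK tile at a time."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, BLOCK):
        for jj in range(0, n, BLOCK):
            for kk in range(0, n, BLOCK):
                # Multiply one tile of A by one tile of B, accumulating into C.
                for i in range(ii, min(ii + BLOCK, n)):
                    for j in range(jj, min(jj + BLOCK, n)):
                        s = 0.0
                        for k in range(kk, min(kk + BLOCK, n)):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s
    return C
```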
Virtual Memory
Address Translation
Page Tables
TLB Misses
• If page is in memory
– Load the PTE from memory and retry
– Could be handled in hardware
• Can get complex for more complicated page table structures
– Or in software
• Raise a special exception, with optimized handler
• If page is not in memory (page fault)
– OS handles fetching the page and updating the page table
– Then restart the faulting instruction
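The miss-handling cases above, as a toy software-handler sketch. The dict-based page table and TLB are assumptions of this model; a real handler walks a hardware-defined page-table structure, and a real page fault transfers to the OS rather than raising a Python exception.

```python
page_table = {0: 5, 3: 6}   # VPN -> PPN for pages currently in memory (toy data)
tlb = {}                    # VPN -> PPN cache of recent translations

def translate(vpn):
    if vpn in tlb:
        return tlb[vpn]                 # TLB hit: translation already cached
    if vpn in page_table:               # TLB miss, page is in memory:
        tlb[vpn] = page_table[vpn]      #   load the PTE into the TLB and retry
        return tlb[vpn]
    # TLB miss, page not in memory: page fault, the OS must fetch the page
    raise RuntimeError("page fault on VPN %d" % vpn)
```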
Memory Protection
Exercise
• Given the TLB (fully associative) and the page table (4KB pages) below, with LRU replacement
• If pages must be brought in from disk, increment the next largest page number
• Show the final state of the TLB and page table if virtual address requests are as follows:
– 4669, 2227, 13916, 34587, 48870, 12608, 49225
– 12948, 49419, 46814, 13975, 40004, 12707, 52236
• The same question but with 16KB pages instead of 4KB

TLB:
V   Tag   Physical page
1   11    12
1   7     4
1   3     6
0   4     9

Page table:
V   Physical page or in disk
1   5
0   Disk
0   Disk
1   6
1   9
1   11
0   Disk
1   4
0   Disk
0   Disk
1   3
1   12

Hint: Analyse the virtual address to extract the virtual page number
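The hint, worked out in code: with 4KB pages the low 12 bits of the virtual address are the page offset, and the rest is the virtual page number.

```python
def vpn(addr, page_size=4096):
    """Virtual page number: strip off the page-offset bits."""
    return addr // page_size

# First request stream from the exercise
requests = [4669, 2227, 13916, 34587, 48870, 12608, 49225]
vpns = [vpn(a) for a in requests]   # [1, 0, 3, 8, 11, 3, 12]
```

For the 16KB-page variant, the same function with `page_size=16384` gives the page numbers.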
Solution

Block Placement
• Determined by associativity
– Direct mapped (1-way associative)
• One choice for placement
– n-way set associative
• n choices within a set
– Fully associative
• Any location
• Higher associativity reduces miss rate
– Increases complexity, cost, and access time
Finding a Block
• Hardware caches
– Reduce comparisons to reduce cost
• Virtual memory
– Full table lookup makes full associativity feasible
– Benefit in reduced miss rate
Associativity           Location method                             Tag comparisons
Direct mapped           Index                                       1
n-way set associative   Set index, then search entries in the set   n
Fully associative       Search all entries                          #entries
                        Full lookup table                           0
Replacement
Write Policy
• Write-through
– Update both upper and lower levels
– Simplifies replacement, but may require write buffer
• Write-back
– Update upper level only
– Update lower level when block is replaced
– Need to keep more state
• Virtual memory
– Only write-back is feasible, given disk write latency
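The traffic difference between the two policies can be seen with a toy model: a single-block cache where every access is a store (both simplifications are assumptions of this sketch, not part of the slide).

```python
def lower_level_writes(accesses, policy):
    """Count writes reaching the lower level for a one-block, store-only cache."""
    count = 0
    cached = None     # block currently resident
    dirty = False
    for block in accesses:
        if block != cached:                          # miss: replace resident block
            if policy == 'write-back' and dirty:
                count += 1                           # flush dirty block on replacement
            cached, dirty = block, False
        if policy == 'write-through':
            count += 1                               # every store updates the lower level
        else:
            dirty = True                             # write-back: just mark the block dirty
    return count

# Three stores to block 0, then one to block 1:
# write-through sends all 4 down; write-back sends only the one dirty flush.
```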
Sources of Misses
Concluding Remarks