Memory Hierarchy
CS4342 Advanced Computer Architecture
Dilum Bandara
Dilum.Bandara@uom.lk
Slides adapted from “Computer Architecture, A Quantitative Approach” by
John L. Hennessy and David A. Patterson, 5th Edition, 2012, Morgan
Kaufmann Publishers
Processor-Memory Performance Gap
 Gap between processor & memory performance grew ~50% per year
Why Memory Hierarchy?
 Applications want unlimited amounts of memory
with low latency
 Fast memory is more expensive per bit
 Solution
 Organize memory system into a hierarchy
 Entire addressable memory space available in largest,
slowest memory
 Incrementally smaller & faster memories
 Temporal & spatial locality ensure that nearly all
references can be found in the smaller memories
 Gives the illusion of a large, fast memory presented
to the processor
Memory Hierarchy
Why Hierarchical Design?
 Becomes more crucial with multi-core
processors
 Aggregate peak bandwidth grows with the number of cores
 Intel Core i7 can generate 2 references per core per clock
 4 cores and 3.2 GHz clock
 25.6 billion 64-bit data references/second +
12.8 billion 128-bit instruction references
= 409.6 GB/s
 DRAM bandwidth is only 6% of this (25 GB/s)
 Requires
 Multi-port, pipelined caches
 2 levels of cache per core
 Shared third-level cache on chip
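A back-of-the-envelope check of these numbers, as a minimal C sketch (the core count, clock rate, & reference widths are the figures quoted above):

    #include <stdio.h>

    int main(void) {
        double cores = 4, clock_hz = 3.2e9;
        double data_refs = cores * clock_hz * 2; /* 2 data refs/core/clock */
        double inst_refs = cores * clock_hz;     /* 1 instruction ref/core/clock */
        /* 64-bit data refs are 8 bytes; 128-bit instruction refs are 16 bytes */
        double demand_gbs = (data_refs * 8 + inst_refs * 16) / 1e9;
        printf("data refs/s : %.1f billion\n", data_refs / 1e9);  /* 25.6 */
        printf("inst refs/s : %.1f billion\n", inst_refs / 1e9);  /* 12.8 */
        printf("peak demand : %.1f GB/s\n", demand_gbs);          /* 409.6 */
        printf("DRAM share  : %.0f%%\n", 25.0 / demand_gbs * 100); /* ~6% */
        return 0;
    }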
Core i7 Die & Major Components
Source: Intel
Performance vs. Power
 High-end microprocessors have >10 MB on-chip
cache
 Consumes a large share of chip area & power budget
 Leakage current – when not operating
 Active current – when operating
 Power is a major limiting factor for processors
used in mobile devices
Definitions – Blocks
 Data moves between levels of the hierarchy in
multi-word blocks (lines)
 Spatial locality → efficiency
 Blocks are tagged with their memory address
 Tags are searched in parallel
Source: http://archive.arstechnica.com/paedia/c/caching/m-caching-5.html
Definitions – Associativity
 Defines where a block can be placed in a cache;
a sketch of the resulting address breakdown follows
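A minimal C sketch of how a set-associative cache decomposes an address into tag, set index, & block offset (the block size & set count here are illustrative assumptions):

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 64   /* bytes per block -> low 6 bits are the offset */
    #define NUM_SETS   128  /* next 7 bits select the set */

    int main(void) {
        uint64_t addr   = 0x12345678ULL;                  /* hypothetical address */
        uint64_t offset = addr % BLOCK_SIZE;              /* byte within the block */
        uint64_t set    = (addr / BLOCK_SIZE) % NUM_SETS; /* only this set is searched */
        uint64_t tag    = addr / ((uint64_t)BLOCK_SIZE * NUM_SETS);
        /* the tag is compared against every way of the set, in parallel */
        printf("tag=0x%llx set=%llu offset=%llu\n", (unsigned long long)tag,
               (unsigned long long)set, (unsigned long long)offset);
        return 0;
    }

With one way per set this is a direct-mapped cache; with a single set holding all ways it is fully associative.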
Pentium 4 vs. Opteron Memory Hierarchy

                    Pentium 4 (3.2 GHz)             Opteron (2.8 GHz)
Instruction cache   Trace cache (8K micro-ops)      2-way associative, 64 KB, 64B block
Data cache          8-way associative, 16 KB,       2-way associative, 64 KB,
                    64B block, inclusive in L2      64B block, exclusive to L2
L2 cache            8-way associative, 2 MB,        16-way associative, 1 MB,
                    128B block                      64B block
Prefetch            8 streams to L2                 1 stream to L2
Memory              200 MHz x 64 bits               200 MHz x 128 bits
Definitions – Updating Cache
 Write-through
 Update cache block & all other levels below
 Use write buffers to speed up
 Write-back
 Update cache block
 Update lower level only when a modified (dirty) block
is replaced
 Use write buffers to speed up
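A minimal C sketch contrasting the two policies (the structures are illustrative, not a full cache model):

    #include <stdint.h>

    struct line { uint8_t data[64]; int valid, dirty; };

    /* Write-through: update the line & the level below on every write */
    void write_through(struct line *l, int off, uint8_t v, uint8_t *lower) {
        l->data[off] = v;
        lower[off]   = v;   /* in practice staged through a write buffer */
    }

    /* Write-back: update only the line; mark it dirty */
    void write_back(struct line *l, int off, uint8_t v) {
        l->data[off] = v;
        l->dirty = 1;
    }

    /* On replacement, a dirty line is copied to the level below */
    void evict(struct line *l, uint8_t *lower) {
        if (l->dirty)
            for (int i = 0; i < 64; i++) lower[i] = l->data[i];
        l->valid = l->dirty = 0;
    }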
Definitions – Replacing Cached Blocks
 Cache replacement policies
 Random
 Least Recently Used (LRU)
 Need to track last access time
 Least Frequently Used (LFU)
 Need to track number of accesses
 First In First Out (FIFO)
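A minimal C sketch of LRU bookkeeping for one set (full timestamps are used here for clarity; real hardware typically approximates LRU):

    #include <stdint.h>

    #define WAYS 4

    struct line { uint64_t tag; int valid; uint64_t last_used; };
    static uint64_t now;  /* logical access counter */

    void touch(struct line *l) { l->last_used = ++now; }  /* call on every hit */

    /* Way to evict: any invalid way first, else the least recently used */
    int victim(struct line set[WAYS]) {
        int lru = 0;
        for (int w = 0; w < WAYS; w++) {
            if (!set[w].valid) return w;
            if (set[w].last_used < set[lru].last_used) lru = w;
        }
        return lru;
    }

LFU would instead keep an access count per line, & FIFO only the fill order.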
Definitions – Cache Misses
 When a required item is not found in the cache
 Miss rate – fraction of cache accesses that result
in a miss
 Types of misses
 Compulsory – 1st access to a block
 Capacity – limited cache capacity forces blocks to be
evicted & later retrieved
 Conflict – occur when the placement strategy is not
fully associative
 Average memory access time (AMAT)
= Hit time + Miss rate x Miss penalty
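Evaluating the formula, as a minimal C sketch for an assumed cache with a 1-cycle hit time, 2% miss rate, & 25-cycle miss penalty:

    #include <stdio.h>

    int main(void) {
        double hit_time = 1.0, miss_rate = 0.02, miss_penalty = 25.0;
        printf("AMAT = %.2f cycles\n",
               hit_time + miss_rate * miss_penalty);  /* 1 + 0.02*25 = 1.50 */
        return 0;
    }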
Definitions – Cache Misses (Cont.)
 Memory stall cycles
= Instruction count x Memory accesses per instruction
x Miss rate x Miss penalty
 Memory accesses per instruction
= Instruction-fetch accesses per instruction + data
accesses per instruction
 Example
 50% of instructions are loads & stores, miss rate is 2%,
miss penalty is 25 clock cycles, & base CPI is 1. How much
faster would the processor be if all accesses were cache hits?
Accesses per instruction = 1 (fetch) + 0.5 (data) = 1.5
Stall cycles per instruction = 1.5 x 0.02 x 25 = 0.75
Speedup = [IC x (1 + 0.75) x CC] / [IC x 1 x CC] = 1.75
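The same worked example as a C sketch:

    #include <stdio.h>

    int main(void) {
        double cpi    = 1.0;                   /* base CPI, all hits */
        double refs   = 1.0 + 0.5;             /* 1 fetch + 0.5 data refs/inst */
        double stalls = refs * 0.02 * 25.0;    /* 1.5 x 2% x 25 = 0.75 */
        printf("speedup with a perfect cache: %.2fx\n",
               (cpi + stalls) / cpi);          /* prints 1.75x */
        return 0;
    }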
Cache Performance Metrics
 Hit time
 Miss rate
 Miss penalty
 Cache bandwidth
 Power consumption
6 Basic Cache Optimization Techniques
1. Larger block sizes
 Reduce compulsory misses
 Increase capacity & conflict misses
 Increase miss penalty
 Choosing the right block size is a trade-off
2. Larger total cache capacity to reduce miss rate
 Reduce misses
 Increase hit time
 Increase power consumption & cost
3. Higher number of cache levels
 Reduce overall memory access time
6 Basic Cache Optimization Techniques
(Cont.)
4. Higher associativity
 Reduce conflict misses
 Increase hit time
 Increase power consumption
5. Giving priority to read misses over writes
 Allow reads to check write buffer
 Reduce miss penalty
6. Avoiding address translation when indexing the cache
 Index with address bits that need no virtual-to-physical
translation (e.g., the page offset)
 Reduce hit time
10 Advanced Cache Optimization
Techniques
 5 categories
1. Reducing hit time
2. Increasing cache bandwidth
3. Reducing miss penalty
4. Reducing miss rate
5. Reducing miss penalty or miss rate via parallelism
Advanced Optimizations 1
 Small & simple 1st level caches
 Recently, L1 cache sizes have increased only slightly,
if at all
 Critical timing path in a cache hit
 addressing tag memory, then
 comparing tags, then
 selecting correct set
 Direct-mapped caches can overlap tag compare &
transmission of data
 Improve hit time
 Lower associativity reduces power because fewer
cache lines are accessed
L1 Size & Associativity – Access Time
L1 Size & Associativity – Energy
Advanced Optimizations 2
 Way Prediction
 Keep extra bits to predict the way (block within the
set) of the next cache access, so only one way is
probed first
 Improve hit time
 A mis-prediction increases hit time
 Prediction accuracy
 > 90% for 2-way
 > 80% for 4-way
 Prediction accuracy is higher for instruction caches
than for data caches
 First used on MIPS R10000 in mid-90s
 Used on ARM Cortex-A8
Advanced Optimizations 3
 Pipelined cache access
 Spread L1 cache access over multiple clock cycles
 Examples
 Pentium – 1 cycle
 Pentium Pro to Pentium III – 2 cycles
 Pentium 4 to Core i7 – 4 cycles
 Improve bandwidth
 Makes it easier to increase associativity
 Increase hit time
 Increases branch mis-prediction penalty
Advanced Optimizations 4
 Nonblocking Caches
 Allow hits before previous
misses complete
 “Hit under miss”
 “Hit under multiple miss”
 L2 must support this
 In general, processors can
hide L1 miss penalty but
not L2 miss penalty
 Increase bandwidth
Advanced Optimizations 5
 Multibanked Caches
 Organize cache as independent banks to support
simultaneous access
 Examples
 ARM Cortex-A8 supports 1-4 banks for L2
 Intel i7 supports 4 banks for L1 & 8 banks for L2
 Interleave banks according to block address, as in
the sketch below
 Increase bandwidth
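A minimal C sketch of sequential interleaving, where consecutive block addresses map to consecutive banks so sequential accesses proceed in parallel (block size & bank count are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE 64
    #define NUM_BANKS  4   /* e.g., 4 L1 banks, as in the i7 example above */

    int bank_of(uint64_t addr) {
        return (addr / BLOCK_SIZE) % NUM_BANKS;  /* block address mod banks */
    }

    int main(void) {
        for (uint64_t a = 0; a < 5 * BLOCK_SIZE; a += BLOCK_SIZE)
            printf("block at 0x%llx -> bank %d\n",
                   (unsigned long long)a, bank_of(a));
        return 0;
    }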
Advanced Optimizations 6
 Critical Word First, Early Restart
 Critical word first
 Request missed word from memory first
 Send it to processor as soon as it arrives
 Early restart
 Request words in normal order
 Send missed word to processor as soon as it arrives
 Reduce miss penalty
 Effectiveness depends on block size & likelihood of
another access to portion of the block that has not yet
been fetched
Advanced Optimizations 7 - 10
 Merging Write Buffer
 Reduce miss penalty
 Compiler Optimizations
 Examples
 Loop Interchange – swap nested loops to access memory
in sequential order
 Blocking – instead of accessing entire rows or columns,
subdivide matrices into cache-sized blocks
(both are sketched after this list)
 Reduce miss rate
 Hardware Prefetching
 Fetch 2 blocks on a miss (the requested block & the
next consecutive one)
 Reduce miss penalty or miss rate
 Compiler Prefetching
 Reduce miss penalty or miss rate
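Minimal C sketches of loop interchange & blocking for row-major matrices (the matrix size N & blocking factor B are tuning assumptions):

    #define N 512
    #define B 32                 /* B x B blocks should fit in the cache */
    double x[N][N], y[N][N], z[N][N];

    /* Loop interchange: a j-then-i order strides through memory;
       swapping the loops gives stride-1 (row-major) access */
    void interchanged(void) {
        for (int i = 0; i < N; i++)          /* was: for (j) for (i) */
            for (int j = 0; j < N; j++)
                x[i][j] = 2 * x[i][j];
    }

    /* Blocking: x += y * z computed on B x B submatrices, so the
       submatrices stay cache-resident while they are reused */
    void blocked_multiply(void) {
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                for (int i = 0; i < N; i++)
                    for (int j = jj; j < jj + B; j++) {
                        double r = 0;
                        for (int k = kk; k < kk + B; k++)
                            r += y[i][k] * z[k][j];
                        x[i][j] += r;   /* x starts zeroed (global array) */
                    }
    }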
Summary of Techniques
Memory Technologies
 Performance metrics
 Latency is the main concern of caches
 Bandwidth is the main concern of multiprocessors & I/O
 Access time
 Time between read request & when desired word arrives
 Cycle time
 Minimum time between unrelated requests to memory
 DRAM used for main memory
 SRAM used for cache
Memory Technology (Cont.)
 Amdahl's rule of thumb
 Memory capacity should grow linearly with processor
speed
 Unfortunately, memory capacity & speed haven't kept
pace with processors
 Some optimizations
 Multiple accesses to same row
 Synchronous DRAM (SDRAM)
 Added clock to DRAM interface
 Burst mode with critical word first
 Wider interfaces
 Double data rate (DDR)
 Multiple banks on each DRAM device
DRAM Optimizations
Peak transfer rate of a DDR interface:
MB/sec = Clock rate x 2 (transfers per cycle) x 8 bytes (bus width)
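For example, a DDR SDRAM interface clocked at 200 MHz transfers 200M x 2 x 8 bytes = 3,200 MB/sec, which is why such modules are labeled DDR-400 / PC3200.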
DRAM Power Consumption
 Reducing power in DRAMs
 Lower voltage
 Low power mode (ignores clock, continues to refresh)
Flash Memory
 Type of EEPROM
 Must be erased (in blocks) before being
overwritten
 Non volatile
 Limited number of write cycles
 Cheaper than DRAM, more expensive than disk
 Slower than SRAM, faster than disk
Modern Memory Hierarchy
Source: http://blog.teachbook.com.au/index.php/2012/02/memory-hierarchy/
Intel Optane Non-volatile Memory
Source: www.forbes.com/sites/tomcoughlin/2018/06/11/intel-optane-finally-on-dimms/#5792e114190b
Intel Optane (Cont.)
Source: www.anandtech.com/show/9541/intel-announces-optane-storage-brand-for-3d-xpoint-products
Virtual Memory
 Each process has its own address space
 Protection via virtual memory
 Keeps processes in their own memory space
 Role of architecture
 Provide user mode & supervisor mode
 Protect certain aspects of CPU state
 Provide mechanisms for switching between user
mode & supervisor mode
 Provide mechanisms to limit memory accesses
 Provide a TLB to speed up address translation
Paging Hardware With TLB
 Parallel search on TLB
 Address translation (p, d)
 If p is in the associative registers (TLB),
get frame # out
 Otherwise get frame # from
page table in memory
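A minimal C sketch of this lookup (the entry count, page size, & flat page table are illustrative assumptions; hardware searches all TLB entries in parallel, which the loop below only models):

    #include <stdint.h>

    #define TLB_ENTRIES 64
    #define PAGE_SIZE   4096

    struct tlb_entry { uint64_t page, frame; int valid; };
    struct tlb_entry tlb[TLB_ENTRIES];
    uint64_t page_table[1 << 20];   /* frame number per page (toy, flat) */

    uint64_t translate(uint64_t vaddr) {
        uint64_t p = vaddr / PAGE_SIZE, d = vaddr % PAGE_SIZE;
        for (int i = 0; i < TLB_ENTRIES; i++)        /* TLB hit path */
            if (tlb[i].valid && tlb[i].page == p)
                return tlb[i].frame * PAGE_SIZE + d;
        return page_table[p] * PAGE_SIZE + d;        /* miss: walk page
                                                        table, refill TLB */
    }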
Summary
 Caching techniques are continuing to evolve
 Multiple techniques are used in combination
 Cache sizes are unlikely to increase significantly
 Better performance when programs are optimized for
the cache architecture