Lecture: Cache Hierarchies
Topics: cache innovations (Sections B.1-B.3, 2.1)
1
Types of Cache Misses
Compulsory misses: happen the first time a memory
word is accessed -- the misses seen even with an infinite cache
Capacity misses: happen because the program touched
many other words before re-touching the same word -- the
misses seen even with a fully-associative cache
Conflict misses: happen because two words map to the
same location in the cache -- the additional misses generated
when moving from a fully-associative to a direct-mapped cache
(see the sketch after this slide)
Sidenote: can a fully-associative cache have more misses
than a direct-mapped cache of the same size?
2
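The conflict-miss case can be made concrete with a little arithmetic on the set index. Below is a minimal C sketch with assumed parameters (64 B blocks, 512 sets, i.e., a 32 KB direct-mapped cache): two addresses that differ by a multiple of num_sets * block_size land in the same set and keep evicting each other.

```c
#include <stdio.h>

#define BLOCK_SIZE 64      /* bytes per cache line (assumed)  */
#define NUM_SETS   512     /* direct-mapped: one line per set */

static unsigned set_index(unsigned long addr) {
    return (unsigned)((addr / BLOCK_SIZE) % NUM_SETS);
}

int main(void) {
    unsigned long a = 0x10000;
    unsigned long b = a + NUM_SETS * BLOCK_SIZE;    /* 32 KB apart */
    printf("set(a) = %u, set(b) = %u\n", set_index(a), set_index(b));
    /* Both indices are equal: alternating loads of a and b evict each
     * other repeatedly -- exactly the conflict misses that associativity
     * (or a victim cache) would remove. */
    return 0;
}
```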
Reducing Miss Rate
Large block size reduces compulsory misses and reduces
the miss penalty when there is spatial locality; it increases
traffic between levels, wasted space, and conflict misses
Large cache reduces capacity/conflict misses; it incurs an
access time penalty
High associativity reduces conflict misses; rule of thumb:
a 2-way cache of capacity N/2 has about the same miss rate
as a 1-way cache of capacity N; it costs more energy
(see the sketch after this slide)
3
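To make the associativity point concrete, the short C sketch below (with assumed parameters: 32 KB capacity, 64 B blocks) shows how the number of sets, and hence the index bits, shrinks as associativity grows; blocks that fought over one line in the direct-mapped organization can co-reside once a set holds multiple ways.

```c
#include <stdio.h>

int main(void) {
    unsigned capacity = 32 * 1024;   /* 32 KB cache (assumed) */
    unsigned block    = 64;          /* 64 B lines (assumed)  */
    for (unsigned ways = 1; ways <= 8; ways *= 2) {
        unsigned sets = capacity / (block * ways);
        /* Higher associativity means fewer sets: addresses that conflicted
         * in the 1-way cache can now co-reside in the same set. */
        printf("%u-way: %4u sets\n", ways, sets);
    }
    return 0;
}
```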
More Cache Basics
L1 caches are split as instruction and data; L2 and L3
are unified
The L1/L2 hierarchy can be inclusive, exclusive, or
non-inclusive
On a write, you can do write-allocate or write-no-allocate
On a write, you can do write-back or write-through;
write-back reduces traffic, write-through simplifies coherence
(see the write-policy sketch after this slide)
Reads get higher priority; writes are usually buffered
L1 does parallel tag/data access; L2/L3 do serial tag/data access
4
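As a rough illustration of the write policies above, here is a minimal C sketch contrasting a write-back/write-allocate store with a write-through/write-no-allocate store; the helper names, block size, and stubbed next-level write are invented for illustration, not taken from any real design.

```c
#include <stdbool.h>
#include <stdio.h>

#define BLOCK 64

typedef struct { bool valid, dirty; unsigned long tag; } line_t;

static unsigned long tag_of(unsigned long addr) { return addr / BLOCK; }

static void fill(line_t *l, unsigned long addr) {      /* fetch line from below */
    l->valid = true; l->dirty = false; l->tag = tag_of(addr);
}

static void write_next_level(unsigned long addr) {     /* stub for the downstream write */
    printf("  store for 0x%lx sent to next level\n", addr);
}

/* Write-back + write-allocate: a write miss fetches the line, the store
 * only sets the dirty bit, and the next level is updated at eviction. */
static void store_wb_alloc(line_t *l, unsigned long addr) {
    if (!l->valid || l->tag != tag_of(addr)) fill(l, addr);
    l->dirty = true;
}

/* Write-through + write-no-allocate: every store is sent downstream
 * (usually via a write buffer); a write miss does not allocate a line. */
static void store_wt_noalloc(line_t *l, unsigned long addr) {
    /* if the line is present, the cached copy is updated too (not modeled) */
    write_next_level(addr);
    (void)l;
}

int main(void) {
    line_t a = {0}, b = {0};
    store_wb_alloc(&a, 0x1000);       /* no downstream traffic yet */
    store_wt_noalloc(&b, 0x1000);     /* traffic on every store    */
    printf("dirty bits: write-back=%d write-through=%d\n", a.dirty, b.dirty);
    return 0;
}
```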
Techniques to Reduce Cache Misses
Victim caches
Better replacement policies: pseudo-LRU, NRU, DRRIP
Cache compression
5
Victim Caches
A direct-mapped cache suffers from misses because
multiple pieces of data map to the same location
The processor often tries to access data that it recently
discarded; all discards are placed in a small victim cache
(4 or 8 entries), and the victim cache is checked before
going to L2 (see the sketch after this slide)
Can be viewed as additional associativity for a few sets
that tend to have the most conflicts
6
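A minimal C sketch of the victim-cache lookup path described above, with made-up sizes (a 256-set direct-mapped L1 and a 4-entry fully-associative victim cache with FIFO replacement); only tags are modeled and the L2 access is omitted.

```c
#include <stdbool.h>
#include <stdio.h>

#define BLOCK      64
#define L1_SETS    256
#define VC_ENTRIES 4

typedef struct { bool valid; unsigned long tag; } line_t;

static line_t l1[L1_SETS];      /* direct-mapped L1 (tags only)        */
static line_t vc[VC_ENTRIES];   /* fully-associative victim cache      */
static int    vc_next;          /* FIFO replacement pointer for the VC */

static bool access_block(unsigned long addr) {   /* returns true on a hit */
    unsigned long blk = addr / BLOCK;
    unsigned set = (unsigned)(blk % L1_SETS);

    if (l1[set].valid && l1[set].tag == blk) return true;      /* L1 hit */

    for (int i = 0; i < VC_ENTRIES; i++)                        /* VC hit:        */
        if (vc[i].valid && vc[i].tag == blk) {                  /* swap the lines */
            line_t tmp = l1[set]; l1[set] = vc[i]; vc[i] = tmp;
            return true;
        }

    /* Miss in both: the displaced L1 line moves into the victim cache and
     * the requested block is fetched (the L2 access is not modeled). */
    if (l1[set].valid) { vc[vc_next] = l1[set]; vc_next = (vc_next + 1) % VC_ENTRIES; }
    l1[set].valid = true; l1[set].tag = blk;
    return false;
}

int main(void) {
    unsigned long a = 0, b = (unsigned long)L1_SETS * BLOCK;   /* same L1 set */
    printf("%d %d %d %d\n", access_block(a), access_block(b),
           access_block(a), access_block(b));   /* 0 0 1 1: the ping-pong now hits the VC */
    return 0;
}
```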
Replacement Policies
Pseudo-LRU: maintain a tree and keep track of which
side of the tree was touched more recently; simple bit ops
NRU: every block in a set has a bit; the bit is made zero
when the block is touched; if all are zero, make all one;
a block with bit set to 1 is evicted
DRRIP: use multiple (say, 3) NRU bits; incoming blocks
are set to a high number (say 6), so they are close to
being evicted; similar to placing an incoming block near
the head of the LRU list instead of near the tail
7
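A minimal C sketch of the NRU scheme described above, for a single 8-way set. The touch/pick_victim interface is invented for illustration, and the reset step keeps the just-touched block's bit at 0 (a common refinement of "make all one").

```c
#include <stdio.h>

#define WAYS 8
static unsigned char nru[WAYS];          /* 1 = "not recently used" (evictable) */

static void touch(int way) {             /* called on a hit or a fill */
    int any_set = 0;
    nru[way] = 0;
    for (int i = 0; i < WAYS; i++) if (nru[i]) any_set = 1;
    if (!any_set) {                      /* all bits zero: make all one again... */
        for (int i = 0; i < WAYS; i++) nru[i] = 1;
        nru[way] = 0;                    /* ...except the block just touched */
    }
}

static int pick_victim(void) {           /* evict any block whose bit is 1 */
    for (int i = 0; i < WAYS; i++) if (nru[i]) return i;
    return 0;                            /* unreachable: touch() always leaves a 1 */
}

int main(void) {
    for (int i = 0; i < WAYS; i++) nru[i] = 1;    /* initially everything is evictable */
    touch(3); touch(5);
    printf("victim = way %d\n", pick_victim());   /* way 0: an untouched block */
    return 0;
}
```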
Tolerating Miss Penalty
Out of order execution: can do other useful work while
waiting for the miss can have multiple cache misses
-- cache controller has to keep track of multiple
outstanding misses (non-blocking cache)
Hardware and software prefetching into prefetch buffers
aggressive prefetching can increase contention for buses
8
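A minimal sketch of the bookkeeping a non-blocking cache needs: a small file of miss status holding registers (MSHRs) that merges secondary misses to an already-outstanding block. The sizes and return codes are illustrative, not from any particular design.

```c
#include <stdbool.h>
#include <stdio.h>

#define BLOCK   64
#define N_MSHR  8

typedef struct { bool valid; unsigned long blk; int pending; } mshr_t;
static mshr_t mshr[N_MSHR];

/* Returns: 0 = primary miss (new request sent to the next level),
 *          1 = secondary miss (merged with an outstanding request),
 *          2 = stall (all MSHRs busy). */
static int handle_miss(unsigned long addr) {
    unsigned long blk = addr / BLOCK;
    for (int i = 0; i < N_MSHR; i++)
        if (mshr[i].valid && mshr[i].blk == blk) { mshr[i].pending++; return 1; }
    for (int i = 0; i < N_MSHR; i++)
        if (!mshr[i].valid) { mshr[i] = (mshr_t){true, blk, 1}; return 0; }
    return 2;
}

int main(void) {
    printf("%d ", handle_miss(0x1000));   /* 0: primary miss            */
    printf("%d ", handle_miss(0x1008));   /* 1: same block, merged      */
    printf("%d\n", handle_miss(0x2000));  /* 0: another primary miss    */
    return 0;
}
```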
Stream Buffers
Simplest form of prefetch: on every miss, bring in
multiple cache lines
When you read the top of the queue, bring in the next line
[Figure: an L1 cache backed by a stream buffer, a FIFO queue holding the next sequential lines]
9
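A minimal C sketch of a single sequential stream buffer as described above (made-up depth of 4); only block numbers are modeled and the L1 itself is omitted.

```c
#include <stdbool.h>
#include <stdio.h>

#define BLOCK 64
#define DEPTH 4

static unsigned long buf[DEPTH];         /* block numbers, buf[0] is the head */
static bool buf_valid;

static void refill(unsigned long blk) {  /* on a miss that also misses the buffer */
    for (int i = 0; i < DEPTH; i++) buf[i] = blk + 1 + i;   /* fetch the next lines */
    buf_valid = true;
}

/* Returns true if the requested block is at the head of the stream buffer;
 * in that case the buffer shifts and prefetches one more sequential line. */
static bool stream_lookup(unsigned long addr) {
    unsigned long blk = addr / BLOCK;
    if (buf_valid && buf[0] == blk) {
        for (int i = 0; i < DEPTH - 1; i++) buf[i] = buf[i + 1];
        buf[DEPTH - 1] = buf[DEPTH - 2] + 1;
        return true;
    }
    refill(blk);
    return false;
}

int main(void) {
    /* A sequential walk: the first access misses, later lines hit the buffer. */
    for (unsigned long a = 0; a < 4 * BLOCK; a += BLOCK)
        printf("%d ", stream_lookup(a));      /* prints 0 1 1 1 */
    printf("\n");
    return 0;
}
```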
Stride-Based Prefetching
For each load, keep track of the last address accessed
by the load and a possibly consistent stride
FSM detects consistent stride and issues prefetches
[Figure: per-load prediction table indexed by PC, with tag, prev_addr, stride, and state fields, and an FSM with states init, transient, steady, and no-pred; correct stride predictions move toward steady, incorrect ones update the stride and move toward no-pred. A code sketch of this FSM follows.]
10
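A minimal C sketch of one entry of the prediction table in the figure (prev_addr, stride, state; the PC-indexed table and tag check are omitted). The transition rules follow the classic Chen/Baer-style FSM, which is an assumption about the figure's exact arcs.

```c
#include <stdio.h>

typedef enum { INIT, TRANSIENT, STEADY, NOPRED } state_t;
typedef struct { long prev_addr, stride; state_t state; } entry_t;

/* Update one entry for a new access by the same load PC; returns the
 * prefetch address when the entry is confident, or -1 otherwise. */
static long update(entry_t *e, long addr) {
    int correct = (addr - e->prev_addr) == e->stride;
    switch (e->state) {
    case INIT:      e->state = correct ? STEADY : TRANSIENT; break;
    case TRANSIENT: e->state = correct ? STEADY : NOPRED;    break;
    case STEADY:    e->state = correct ? STEADY : INIT;      break;
    case NOPRED:    e->state = correct ? TRANSIENT : NOPRED; break;
    }
    if (!correct) e->stride = addr - e->prev_addr;   /* (update stride) */
    e->prev_addr = addr;
    return (e->state == STEADY) ? addr + e->stride : -1;
}

int main(void) {
    entry_t e = { .prev_addr = 0, .stride = 0, .state = INIT };
    for (long a = 100; a <= 400; a += 100)           /* stride-100 load addresses */
        printf("%ld ", update(&e, a));
    printf("\n");   /* -1 until the stride is confirmed, then addr + 100 */
    return 0;
}
```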
Prefetching
Hardware prefetching can be employed for any of the
cache levels
It can introduce cache pollution; prefetched data is
often placed in a separate prefetch buffer to avoid
pollution, and this buffer must be looked up in parallel
with the cache access
Aggressive prefetching increases coverage, but leads
to a reduction in accuracy and wasted memory bandwidth
(see the example after this slide)
Prefetches must be timely: they must be issued sufficiently
in advance to hide the latency, but not too early (to avoid
pollution and eviction before use)
11
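For concreteness, coverage and accuracy can be computed from three counters; the counter values below are purely illustrative.

```c
#include <stdio.h>

int main(void) {
    double misses_no_prefetch = 1000;    /* baseline misses (assumed)              */
    double prefetches_issued  = 900;     /* prefetches sent to memory (assumed)    */
    double useful_prefetches  = 600;     /* prefetched lines used in time (assumed) */

    double coverage = useful_prefetches / misses_no_prefetch;   /* fraction of misses removed  */
    double accuracy = useful_prefetches / prefetches_issued;    /* fraction of prefetches used */
    printf("coverage = %.2f, accuracy = %.2f\n", coverage, accuracy);
    return 0;
}
```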
Intel Montecito Cache
Two cores, each
with a private
12 MB L3 cache
and 1 MB L2
Naffziger et al., Journal of Solid-State Circuits, 2006
12
Intel 80-Core Prototype Polaris
Prototype chip with an entire
die of SRAM cache stacked
upon the cores
13
Example Intel Studies
[Figure: 8 cores (C) with private L1s, pairs of cores sharing an L2, an interconnect to a shared L3, plus memory and IO interfaces; from Zhao et al., CMP-MSI Workshop 2007]
L3 cache sizes up to 32 MB
14
Shared Vs. Private Caches in Multi-Core
What are the pros/cons to a shared L2 cache?
[Figure: four cores (P1-P4) with private L1s, backed either by a private L2 per core or by a single shared L2]
15
Shared Vs. Private Caches in Multi-Core
Advantages of a shared cache:
Space is dynamically allocated among cores
No waste of space because of replication
Potentially faster cache coherence (and easier to
locate data on a miss)
Advantages of a private cache:
A small private L2 provides faster access time
A private bus to L2 means less contention
16
UCA and NUCA
The small-sized caches so far have all been uniform cache
access (UCA): the latency for any access is a constant, no
matter where the data is found
For a large multi-megabyte cache, it is expensive to limit
access time by the worst-case delay: hence, non-uniform
cache architecture (NUCA)
17
Large NUCA
Issues to be addressed for
Non-Uniform Cache Access:
Mapping
Migration
Search
Replication
18
Shared NUCA Cache
A single tile composed
of a core, L1 caches, and
Core 0 Core 1 Core 2 Core 3
a bank (slice) of the
L1 L1 L1 L1 L1 L1 L1 L1 shared L2 cache
D$ I$ D$ I$ D$ I$ D$ I$
L2 $ L2 $ L2 $ L2 $
Core 4 Core 5 Core 6 Core 7
The cache controller
L1 L1 L1 L1 L1 L1 L1 L1 forwards address requests
D$ I$ D$ I$ D$ I$ D$ I$ to the appropriate L2 bank
L2 $ L2 $ L2 $ L2 $ and handles coherence
operations
Memory Controller for off-chip access
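A minimal sketch of one simple way the controller could pick the home L2 bank: static address interleaving on low block-address bits. The interleaving granularity is an assumption; real designs may interleave on different bits or use OS-managed page coloring instead.

```c
#include <stdio.h>

#define BLOCK  64
#define BANKS  8

static unsigned bank_of(unsigned long addr) {
    return (unsigned)((addr / BLOCK) % BANKS);   /* low block-address bits pick the bank */
}

int main(void) {
    for (unsigned long addr = 0; addr < 4 * BLOCK; addr += BLOCK)
        printf("addr 0x%lx -> L2 bank %u\n", addr, bank_of(addr));
    return 0;
}
```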