CS6290 Memory
Views of Memory
Real machines have limited amounts of memory
640KB? A few GB? (This laptop = 2GB)
The programmer doesn't want to be bothered
Do you think, "oh, this computer only has 128MB, so I'll write my code this way"? What happens if you run on a different machine?
Programmer's View
Example 32-bit memory
When programming, you don't care about how much real memory there is. Even if you use a lot, memory can always be paged to disk.
A.K.A. virtual addresses
[Figure: 32-bit virtual address space from 0 to 4GB, laid out with Kernel, Text, Data, Heap, and Stack regions]
Programmer's View
Really the program's view: each program/process gets its own 4GB space
[Figure: several processes side by side, each with its own Kernel, Text, Data, Heap, and Stack regions in a separate 4GB address space]
CPU's View
At some point, the CPU is going to have to load from / store to memory; all it knows is the real, a.k.a. physical, memory,
which unfortunately is often < 4GB, and is never 4GB per process
Pages
Memory is divided into pages, which are nothing more than fixed-size, aligned regions of memory
Typical size: 4KB/page (but not always)
Page 0: addresses 0-4095
Page 1: addresses 4096-8191
Page 2: addresses 8192-12287
Page 3: addresses 12288-16383
Page Table
Map from virtual addresses to physical locations. The page table implements this V→P mapping.
The physical location may include the hard disk.
[Figure: virtual addresses 0K-28K mapped through the page table onto physical addresses 0K-12K]
Page Tables
[Figure: two processes' page tables, each mapping virtual pages 0K-12K into a shared physical memory of 0K-28K]
Need for Translation
Example: virtual address 0xFC51908B is split into virtual page number 0xFC519 and page offset 0x08B.
The page table maps VPN 0xFC519 to physical page 0x00152, giving physical address 0x0015208B in main memory.
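A minimal sketch of this split and recombination, assuming 4KB pages (12 offset bits); lookup_ppn() is just a stand-in for the page-table lookup, not a real API:

    #include <stdint.h>

    #define PAGE_SHIFT  12                          /* 4KB pages -> 12 offset bits */
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

    /* Stand-in for the page-table lookup (illustrative values only). */
    static uint32_t lookup_ppn(uint32_t vpn)
    {
        return (vpn == 0xFC519) ? 0x00152 : 0;
    }

    static uint32_t translate(uint32_t vaddr)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;      /* 0xFC51908B -> 0xFC519 */
        uint32_t offset = vaddr & OFFSET_MASK;      /* 0xFC51908B -> 0x08B   */
        uint32_t ppn    = lookup_ppn(vpn);          /* page table -> 0x00152 */
        return (ppn << PAGE_SHIFT) | offset;        /* -> 0x0015208B         */
    }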
Simple Page Table
Flat organization: one entry per page.
Each entry contains the physical page number (PPN) or indicates that the page is on disk or invalid.
Also holds meta-data (e.g., permissions, dirtiness, etc.)
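A hedged sketch of such a flat table, assuming 32-bit addresses and 4KB pages; the field widths and flag names are illustrative, not any particular architecture's PTE format:

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12
    #define NUM_VPAGES (1u << (32 - PAGE_SHIFT))    /* 1M entries cover 4GB */

    typedef struct {
        uint32_t ppn      : 20;   /* physical page number                  */
        uint32_t valid    : 1;    /* mapping currently in physical memory? */
        uint32_t on_disk  : 1;    /* paged out to disk?                    */
        uint32_t writable : 1;    /* example permission bit                */
        uint32_t dirty    : 1;    /* example meta-data bit                 */
    } pte_t;

    static pte_t page_table[NUM_VPAGES];            /* one entry per virtual page */

    static bool pt_lookup(uint32_t vpn, uint32_t *ppn_out)
    {
        pte_t e = page_table[vpn];
        if (!e.valid)
            return false;         /* invalid or on disk: page fault */
        *ppn_out = e.ppn;
        return true;
    }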
Multi-Level Page Tables
[Figure: the virtual page number is split into Level 1 and Level 2 indices; together with the page offset, the two-level lookup produces the physical page number]
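A hedged sketch of the two-level walk, assuming the 32-bit VPN is split into a 10-bit level-1 index and a 10-bit level-2 index (the split is an assumption, not a specific architecture's):

    #include <stddef.h>
    #include <stdint.h>

    #define L1_SHIFT 22
    #define L2_SHIFT 12
    #define IDX_MASK 0x3FFu

    typedef struct { uint32_t ppn; int valid; } l2_entry_t;
    typedef struct { l2_entry_t *l2; } l1_entry_t;   /* NULL if no L2 table allocated */

    static l1_entry_t l1_table[1024];

    /* Returns 0 and fills *paddr on success, -1 on a page fault. */
    static int walk(uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t i1 = (vaddr >> L1_SHIFT) & IDX_MASK;
        uint32_t i2 = (vaddr >> L2_SHIFT) & IDX_MASK;
        if (l1_table[i1].l2 == NULL)
            return -1;                               /* whole L2 table absent */
        l2_entry_t e = l1_table[i1].l2[i2];
        if (!e.valid)
            return -1;                               /* page not mapped       */
        *paddr = (e.ppn << 12) | (vaddr & 0xFFFu);
        return 0;
    }

The win is that second-level tables for unused regions of the 4GB space need not exist at all.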
Choosing a Page Size
Page size is inversely proportional to page table overhead.
A large page size permits more efficient transfer to/from disk: one big transfer vs. many small ones (like downloading from the Internet).
A small page leads to less fragmentation: a big page is likely to have more bytes unused.
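A rough illustration of the trade-off, assuming a 4GB virtual space and 4-byte entries (numbers not from the slides): 4KB pages need 4GB / 4KB = 1M entries, about 4MB of page table, while 64KB pages need only 64K entries, about 256KB, but each large page is likely to waste more of its bytes.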
CPU Memory Access
Program deals with virtual addresses
Load R1 = 0[R2]
On a memory instruction:
1. Compute the virtual address (0[R2])
2. Compute the virtual page number
3. Compute the physical address of the VPN's page table entry
4. Load the mapping (could be more loads, depending on the page table organization)
5. Compute the physical address
6. Do the actual load from memory
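A toy sketch of why one virtual load turns into (at least) two physical accesses, using a flat page table kept in a small simulated memory; mem[], pt_base, and the field layout are all assumptions for illustration:

    #include <stdint.h>

    static uint32_t mem[1u << 18];              /* toy "physical" memory (1MB)    */
    static const uint32_t pt_base = 0x1000;     /* assumed base of the page table */

    static uint32_t mem_read(uint32_t paddr) { return mem[paddr >> 2]; }

    static uint32_t load_word(uint32_t vaddr)
    {
        uint32_t vpn   = (vaddr >> 12) & 0xFFu;                    /* toy: 256 pages     */
        uint32_t pte   = mem_read(pt_base + vpn * 4u);             /* access #1: mapping */
        uint32_t paddr = ((pte & 0xFFu) << 12) | (vaddr & 0xFFFu); /* toy PPN field      */
        return mem_read(paddr);                                    /* access #2: data    */
    }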
Impact on Performance?
Every time you load/store, the CPU must perform two (or more) memory accesses! Even worse, every fetch requires translation of the PC!
Observation: once a virtual page is mapped into a physical page, it'll likely stay put for quite some time.
Idea: caching! Not caching of data, but caching of translations.
[Figure: virtual addresses 0K-28K, physical addresses 0K-12K, and a small cache of recent translations; looking up VPN 8 returns PPN 16]
Translation Cache: TLB
TLB = Translation Look-aside Buffer
[Figure: virtual address → TLB → physical address → cache tags and data → hit?]
If the TLB hits, there is no need to do a page table lookup from memory.
Note: the data cache is now accessed by physical addresses.
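A minimal sketch of the fully-associative lookup; the entry count and fields are placeholders, not a particular TLB's design:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 32

    typedef struct { uint32_t vpn, ppn; bool valid; } tlb_entry_t;
    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Fully associative: compare the VPN against every entry (in hardware,
       all of these comparisons happen in parallel). */
    static bool tlb_lookup(uint32_t vpn, uint32_t *ppn_out)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {
                *ppn_out = tlb[i].ppn;
                return true;            /* hit: skip the page-table walk          */
            }
        }
        return false;                   /* miss: walk the page table, then refill */
    }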
PAPT Cache
The previous slide showed a Physically-Addressed, Physically-Tagged cache, sometimes called PIPT (I = Indexed).
Con: TLB lookup and cache access are serialized, and caches already take > 1 cycle.
Pro: cache contents remain valid so long as the page table is not modified.
Virtually Addressed Cache
(VIVT: virtually indexed, virtually tagged)
[Figure: the virtual address indexes and tags the cache directly; the TLB is consulted only on a cache miss to produce the physical address sent to L2]
Pro: latency, no need to check the TLB.
Con: the cache must be flushed on a process change.
How do we enforce permissions?
Virtually Indexed Physically Tagged
[Figure: the virtual address indexes the cache while the TLB translates in parallel; the physical tag from the TLB is compared against the cache tags to determine a hit]
Pro: latency, the TLB lookup is parallelized with cache indexing.
Pro: don't need to flush the cache on a process swap.
Con: limit on cache indexing (can only use bits that are not from the VPN/PPN); a big page size can help here.
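An illustrative calculation (cache parameters assumed, not from the slides): with 4KB pages only the 12 page-offset bits are untranslated. A 32KB direct-mapped cache with 64B lines needs 6 offset + 9 index = 15 bits, so 3 index bits would have to come from the VPN; making the cache 8-way set-associative (64 sets: 6 + 6 = 12 bits) or using bigger pages keeps the index within the untranslated bits.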
TLB Design
Often fully-associative.
For latency, this means few entries; however, each entry covers a whole page.
Ex: 32-entry TLB, 4KB pages: how big a working set can we cover while avoiding TLB misses? (See below.)
If there are many misses:
Increase the TLB size (latency problems)
Increase the page size (fragmentation problems)
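For reference on the working-set question above: 32 entries × 4KB per page = 128KB can be mapped simultaneously without a TLB miss.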
Process Changes
With physically-tagged caches, we don't need to flush the cache on a context switch.
But the TLB is no longer valid! Solution: add a process ID (PID) to each translation.
Only flush the TLB when recycling PIDs.
[Figure: TLB entries tagged with a PID — (PID 0, VPN 8) maps to PPN 28 while (PID 1, VPN 8) maps to PPN 44, and both can coexist]
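Extending the earlier TLB sketch with a process-ID field shows how both mappings for VPN 8 can coexist; field names and sizes are again assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 32

    typedef struct { uint32_t pid, vpn, ppn; bool valid; } tlb_entry_t;
    static tlb_entry_t tlb[TLB_ENTRIES];

    /* A hit now requires both the PID and the VPN to match, so entries from
       different processes never alias. */
    static bool tlb_lookup(uint32_t pid, uint32_t vpn, uint32_t *ppn_out)
    {
        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].pid == pid && tlb[i].vpn == vpn) {
                *ppn_out = tlb[i].ppn;
                return true;
            }
        }
        return false;
    }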
SRAM vs. DRAM
DRAM = Dynamic RAM
SRAM: 6T per bit, built with normal high-speed CMOS technology
DRAM: 1T per bit, built with a special DRAM process optimized for density
Hardware Structures
[Figure: SRAM and DRAM cell circuits, each accessed via a wordline]
Implementing the Capacitor
[Figure: trench-cell DRAM capacitor — cell plate Si, cap insulator, storage node poly, field oxide, and refilling poly over the Si substrate]
DRAM figures on this slide were taken from Prof. Nikolic's EECS141/2003 lecture notes from UC-Berkeley.
DRAM Chip Organization
[Figure: DRAM chip organization — the row address drives a row decoder that selects a row of the memory cell array; sense amps latch that row into the row buffer; the column address and column decoder then steer data onto the data bus]
DRAM Chip Organization (2)
Differences with SRAM:
Reads are destructive: contents are erased after reading.
Row buffer: read lots of bits all at once, and then parcel them out based on different column addresses; similar to reading a full cache line, but only accessing one word at a time.
Fast-Page Mode (FPM) DRAM organizes the DRAM row to contain the bits for a complete page: the row address is held constant, and then reads from different locations in the same page are fast.
DRAM Read Operation
[Figure: DRAM read — row address 0x1FE is decoded to select a row of the memory cell array; the sense amps capture it in the row buffer; column addresses 0x000, 0x001, 0x002 then select words from the row buffer onto the data bus]
Accesses need not be sequential
Destructive Read
[Figure: storage-cell and bitline voltages during a read — the wordline is enabled, then the sense amp is enabled and drives the bitline to a full 0 or Vdd (1)]
After a read of 0 or 1, the cell contains something close to 1/2 of the storage cell voltage.
Refresh
So after a read, the contents of the DRAM cell are gone. The values are stored in the row buffer; write them back into the cells so they can be read again in the future.
[Figure: the sense amps / row buffer writing the values back into the DRAM cells]
Refresh (2)
Fairly gradually, the DRAM cell will lose its contents even if it's not accessed (e.g., through gate leakage).
This is why it's called dynamic. Contrast with SRAM, which is static in that once written, it maintains its value forever (so long as power remains on).
All DRAM rows need to be regularly read and re-written.
If it keeps its value even when power is removed, then it's non-volatile (e.g., flash, HDD, DVDs).
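As a rough sense of scale (typical numbers, not from the slides): with a common 64ms retention target and 8192 rows per bank, one row must be refreshed about every 64ms / 8192 ≈ 7.8µs.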
DRAM Read Timing
Accesses are asynchronous: triggered by RAS and CAS signals, which can in theory occur at arbitrary times (subject to DRAM timing constraints)
SDRAM Read Timing
Double-Data Rate (DDR) DRAM transfers data on both the rising and falling edges of the clock.
The command frequency does not change.
[Figure: SDRAM read timing, with the burst length marked]
Timing figures taken from "A Performance Comparison of Contemporary DRAM Architectures" by Cuppu, Jacob, Davis, and Mudge.
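An illustrative peak-bandwidth calculation (bus width assumed): a 64-bit DDR interface on a 200MHz bus transfers on both edges, so its peak bandwidth is 2 × 200M × 8 bytes ≈ 3.2 GB/s, even though commands still issue at 200MHz.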
Rambus (RDRAM)
Synchronous interface.
Row buffer cache: the last 4 rows accessed are cached, giving a higher probability of a low-latency hit; DRDRAM increases this to 8 entries.
Uses other tricks since adopted by SDRAM: multiple data words per clock, high frequencies.
Chips can self-refresh.
Expensive for PCs; used by the X-Box and PS2.
Example Memory Latency Computation
FSB freq = 200 MHz, SDRAM, RAS delay = 2, CAS delay = 2
Access sequence: A0, A1, B0, C0, D3, A2, D0, C1, A3, C3, C2, D1, B1, D2
What's this in CPU cycles? (assume 2GHz) Impact on AMAT?
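One conversion the exercise needs: at a 2GHz CPU and a 200MHz FSB, each FSB cycle is 2GHz / 200MHz = 10 CPU cycles, so every bus cycle of DRAM latency costs ten processor cycles.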
More Latency
More wire delay getting to the memory chips (plus the return trip).
Significant wire delay just getting from the CPU to the memory controller.
Width/speed varies depending on the memory type.
Memory Controller
Like the write-combining buffer, the scheduler may coalesce multiple accesses together, or re-order them to reduce the number of row accesses.
[Figure: memory controller — read, write, and response queues sit between the CPU (commands and data) and a scheduler and buffer that issue requests to DRAM banks 0 and 1]
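A sketch of one such reordering policy, preferring requests that hit the currently open row (in the spirit of FR-FCFS scheduling); the queue layout and bank count are assumptions, not a description of any real controller:

    #include <stdint.h>

    #define QLEN  16
    #define BANKS 2

    typedef struct { uint32_t row, bank; int valid; } req_t;

    static req_t    read_q[QLEN];       /* assumed ordered oldest-first                  */
    static uint32_t open_row[BANKS];    /* row currently held in each bank's row buffer  */

    /* Prefer a request that hits its bank's open row (no new row access needed);
       otherwise fall back to the oldest pending request. Returns -1 if empty. */
    static int pick_next(void)
    {
        int oldest = -1;
        for (int i = 0; i < QLEN; i++) {
            if (!read_q[i].valid) continue;
            if (read_q[i].bank < BANKS && read_q[i].row == open_row[read_q[i].bank])
                return i;
            if (oldest < 0) oldest = i;
        }
        return oldest;
    }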
Memory Reference Scheduling
Just like registers, we need to enforce RAW, WAW, and WAR dependencies. There is no memory renaming in the memory controller, so all three dependencies must be enforced. Like everything else, we still need to maintain the appearance of sequential access.
Consider multiple read/write requests to the same address.
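A minimal sketch of the ordering rule the controller must respect before reordering two requests (request format assumed):

    #include <stdint.h>

    typedef struct { uint64_t addr; int is_write; } memreq_t;

    /* Two requests to the same address must stay in program order unless both
       are reads: this covers RAW (read after write), WAW, and WAR. */
    static int must_stay_ordered(const memreq_t *older, const memreq_t *younger)
    {
        return older->addr == younger->addr &&
               (older->is_write || younger->is_write);
    }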
Example Memory Latency Computation (3)
FSB freq = 200 MHz, SDRAM, RAS delay = 2, CAS delay = 2
Scheduling in the memory controller
A0, A1, B0, C0, D3, A2, D0, C1, A3, C3, C2, D1, B1, D2
Think about hardware complexity
So what do we do about it?
Caching
reduces average memory instruction latency by avoiding DRAM altogether
Limitations:
Capacity: programs keep increasing in size
Compulsory misses
Faster DRAM Speed
Clock the FSB faster? The DRAM chips may not be able to keep up.
Latency is dominated by wire delay: bandwidth may be improved (DDR vs. regular), but latency doesn't change much.
Instead of 2 cycles for a row access, it may take 3 cycles at a faster bus speed; this doesn't address the latency of the memory access.
On-Chip Memory Controller
The memory controller can run at CPU speed instead of the FSB clock speed.
All on the same chip: no slow PCB wires to drive.
Also enables more sophisticated memory scheduling algorithms.
Disadvantage: the memory type is now tied to the CPU implementation.