In the name of Allah, the Most Gracious, the Most Merciful
Future University
Faculty Of Engineering
Computer Organization
+
Memory
Cache Memory
Prepared by: Duaa Mohammed
Table 4.1
Key Characteristics of Computer Memory Systems
+ Characteristics of Memory Systems
Location
Refers to whether memory is internal or external to the
computer
Internal memory is often equated with main memory
The processor requires its own local memory, in the form of
registers
Cache is another form of internal memory
External memory consists of peripheral storage devices
that are accessible to the processor via I/O controllers
Capacity
Memory is typically expressed in terms of bytes
Unit of transfer
For internal memory the unit of transfer is equal to the
number of electrical lines into and out of the memory
Method of Accessing Units of Data
Sequential access
Memory is organized into units of data called records
Access must be made in a specific linear sequence
Access time is variable
Direct access
Involves a shared read-write mechanism
Individual blocks or records have a unique address based on physical location
Access time is variable
Random access
Each addressable location in memory has a unique, physically wired-in addressing mechanism
The time to access a given location is independent of the sequence of prior accesses and is constant
Any location can be selected at random and directly addressed and accessed
Main memory and some cache systems are random access
Associative
A word is retrieved based on a portion of its contents rather than its address
Each location has its own addressing mechanism and retrieval time is constant independent of location or prior access patterns
Cache memories may employ associative access
Capacity and Performance:
The two most important characteristics of memory
Three performance parameters are used:
Access time (latency)
• For random-access memory it is the time it takes to perform a read or write operation
• For non-random-access memory it is the time it takes to position the read-write mechanism at the desired location
Memory cycle time
• Access time plus any additional time required before a second access can commence
Transfer rate
• The rate at which data can be transferred into or out of a memory unit
• For random-access memory it is equal to 1/(cycle time)
+ Memory
The most common forms are:
Semiconductor memory
Magnetic surface memory
Optical
Magneto-optical
Several physical characteristics of data storage are important:
Volatile memory
Information decays naturally or is lost when electrical power is switched off
Nonvolatile memory
Once recorded, information remains without deterioration until deliberately changed
No electrical power is needed to retain information
Magnetic-surface memories
Are nonvolatile
Semiconductor memory
May be either volatile or nonvolatile
Nonerasable memory
Cannot be altered, except by destroying the storage unit
Semiconductor memory of this type is known as read-only memory (ROM)
For random-access memory the organization is a key design issue
Organization refers to the physical arrangement of bits to form words
+
Memory Hierarchy
Design constraints on a computer’s memory can be
summed up by three questions:
How much, how fast, how expensive
There is a trade-off among capacity, access time, and
cost
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit
Greater capacity, slower access time
The way out of the memory dilemma is not to rely on a
single memory component or technology, but to
employ a memory hierarchy
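The payoff of a hierarchy can be sketched numerically. This is a minimal two-level model with hypothetical timings (the 10 ns / 100 ns figures and the 95% hit ratio are illustrative assumptions, not values from the text), assuming a miss pays for both the fast and the slow level:

```python
# Sketch: why a memory hierarchy works (hypothetical numbers).
def average_access_time(hit_ratio, t_fast, t_slow):
    """Average time per reference; a miss costs t_fast + t_slow
    (check the fast level first, then go to the slow level)."""
    return hit_ratio * t_fast + (1 - hit_ratio) * (t_fast + t_slow)

# Assumed: 10 ns cache, 100 ns main memory, 95% of references hit the cache.
t_avg = average_access_time(0.95, 10, 100)
print(t_avg)  # 15.0 ns, close to cache speed despite mostly cheap, slow memory
```

With a high hit ratio, the average access time stays near the fast component's speed while most of the capacity is provided by the slow, cheap component.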
+
Memory
The use of three levels exploits the fact that
semiconductor memory comes in a variety of types
which differ in speed and cost
Data are stored more permanently on external mass
storage devices
External, nonvolatile memory is also referred to as
secondary memory or auxiliary memory
Disk cache
A portion of main memory can be used as a buffer to hold
data temporarily that is to be read out to disk
A few large transfers of data can be used instead of many
small transfers of data
Data can be retrieved rapidly from the software cache rather
than slowly from the disk
Table 4.2
Elements of Cache Design
+
Cache Addresses
Virtual Memory
Virtual memory
Facility that allows programs to address memory from
a logical point of view, without regard to the amount
of main memory physically available
When used, the address fields of machine
instructions contain virtual addresses
For reads from and writes to main memory, a
hardware memory management unit (MMU)
translates each virtual address into a physical
address in main memory
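The translation step can be sketched in software. This toy model splits a virtual address into a page number and an offset and looks the page up in a hypothetical page table; the real MMU is hardware and far more involved (TLBs, permissions, page faults):

```python
# Minimal sketch of MMU-style address translation (page table contents are assumed).
PAGE_SIZE = 4096  # bytes; page number = high-order bits, offset = low-order bits

page_table = {0: 7, 1: 3, 2: 9}  # virtual page number -> physical frame number

def translate(virtual_address):
    page = virtual_address // PAGE_SIZE
    offset = virtual_address % PAGE_SIZE
    frame = page_table[page]  # an unmapped page would raise here (a "page fault")
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1004)))  # virtual page 1 maps to frame 3 -> 0x3004
```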
Mapping Function
Because there are fewer cache lines than main
memory blocks, an algorithm is needed for mapping
main memory blocks into cache lines
Three techniques can be used:
Direct
• The simplest technique
• Maps each block of main memory into only one possible cache line
Associative
• Permits each main memory block to be loaded into any line of the cache
• The cache control logic interprets a memory address simply as a Tag and a Word field
• To determine whether a block is in the cache, the cache control logic must simultaneously examine every line’s Tag for a match
Set Associative
• A compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages
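Direct mapping's address interpretation can be sketched as follows. The cache geometry here (4 lines, 4 words per block) is an assumed example, not from the text; the point is that each block has exactly one possible line, `block mod number_of_lines`:

```python
# Sketch of direct mapping (cache geometry is assumed for illustration).
NUM_LINES = 4        # cache lines (assumed)
WORDS_PER_BLOCK = 4  # words per block/line (assumed)

def direct_map(address):
    word = address % WORDS_PER_BLOCK     # which word within the block
    block = address // WORDS_PER_BLOCK   # main-memory block number
    line = block % NUM_LINES             # the only possible cache line
    tag = block // NUM_LINES             # stored with the line to identify the block
    return tag, line, word

print(direct_map(37))  # address 37 is word 1 of block 9 -> line 1, tag 2
```

Blocks 1, 5, 9, ... all compete for line 1, which is the source of the conflict misses that associative mapping avoids.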
+
Victim Cache
Originally proposed as an approach to reduce the
conflict misses of a direct-mapped cache without
affecting its fast access time
Fully associative cache
Typical size is 4 to 16 cache lines
Resides between the direct-mapped L1 cache and the
next level of memory
+
Set Associative Mapping
Compromise that exhibits the strengths of both the
direct and associative approaches while reducing their
disadvantages
Cache consists of a number of sets
Each set contains a number of lines
A given block maps to any line in a given set
e.g., 2 lines per set
2-way set-associative mapping
A given block can be in one of 2 lines in only one set
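The set-selection rule above can be sketched as follows, using an assumed geometry of 4 sets with 2 lines each (the numbers are illustrative only):

```python
# Sketch of 2-way set-associative mapping (geometry assumed for illustration).
NUM_SETS = 4  # assumed
WAYS = 2      # lines per set

def set_for_block(block_number):
    # A block maps to exactly one set, but may occupy either line in that set.
    return block_number % NUM_SETS

# Blocks 1, 5, and 9 all map to set 1; any two can be cached at the same
# time, and only a third forces a replacement within that one set.
print([set_for_block(b) for b in (1, 5, 9)])  # [1, 1, 1]
```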
+
Replacement Algorithms
Once the cache has been filled, when a new block is
brought into the cache, one of the existing blocks must
be replaced
For direct mapping there is only one possible line for
any particular block and no choice is possible
For the associative and set-associative techniques a
replacement algorithm is needed
To achieve high speed, an algorithm must be
implemented in hardware
+ The most common replacement
algorithms are:
Least recently used (LRU)
Most effective
Replace that block in the set that has been in the cache longest
with no reference to it
Because of its simplicity of implementation, LRU is the most
popular replacement algorithm
First-in-first-out (FIFO)
Replace that block in the set that has been in the cache longest
Easily implemented as a round-robin or circular buffer technique
Least frequently used (LFU)
Replace that block in the set that has experienced the fewest
references
Could be implemented by associating a counter with each line
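An LRU set can be modeled in a few lines. This is a software sketch only (real caches track recency in hardware, e.g. with per-line use bits); the class and method names are invented for illustration:

```python
from collections import OrderedDict

# Software sketch of LRU replacement for one cache set (names are assumed).
class LRUSet:
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> data, ordered oldest-used first

    def access(self, tag, data=None):
        if tag in self.lines:
            self.lines.move_to_end(tag)    # hit: mark as most recently used
            return self.lines[tag]
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # miss on a full set: evict the LRU tag
        self.lines[tag] = data
        return data

s = LRUSet(ways=2)
s.access("A", 1); s.access("B", 2); s.access("A")  # "A" is now most recent
s.access("C", 3)                                   # evicts "B", the LRU tag
print(list(s.lines))  # ['A', 'C']
```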
Write Policy
When a block that is resident in the cache is to be replaced there are two cases to consider:
If the old block in the cache has not been altered then it may be overwritten with a new block without first writing out the old block
If at least one write operation has been performed on a word in that line of the cache then main memory must be updated by writing the line of cache out to the block of memory before bringing in the new block
There are two problems to contend with:
More than one device may have access to main memory
A more complex problem occurs when multiple processors are attached to the same bus and each processor has its own local cache - if a word is altered in one cache it could conceivably invalidate a word in other caches
+
Write Through
and Write Back
Write through
Simplest technique
All write operations are made to main memory as well as to the
cache
The main disadvantage of this technique is that it generates
substantial memory traffic and may create a bottleneck
Write back
Minimizes memory writes
Updates are made only in the cache
Portions of main memory are invalid and hence accesses by I/O
modules can be allowed only through the cache
This makes for complex circuitry and a potential bottleneck
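The traffic difference between the two policies can be sketched with a toy model (class and counter names are invented for illustration; a real write-back cache tracks a dirty bit per line in hardware):

```python
# Toy sketch contrasting write-through and write-back (names are assumed).
class Cache:
    def __init__(self, write_back=False):
        self.write_back = write_back
        self.lines = {}          # address -> (value, dirty)
        self.memory_writes = 0   # traffic to main memory

    def write(self, addr, value):
        if self.write_back:
            self.lines[addr] = (value, True)   # update cache only, mark dirty
        else:
            self.lines[addr] = (value, False)
            self.memory_writes += 1            # write through to main memory too

    def evict(self, addr):
        value, dirty = self.lines.pop(addr)
        if dirty:
            self.memory_writes += 1            # write back only when dirty

wt, wb = Cache(write_back=False), Cache(write_back=True)
for c in (wt, wb):
    for _ in range(100):
        c.write(0x10, 42)   # 100 writes to the same word
    c.evict(0x10)
print(wt.memory_writes, wb.memory_writes)  # 100 vs 1
```

The repeated writes to one word cost 100 memory accesses under write-through but only a single write-back at eviction, which is exactly the bottleneck trade-off described above.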
Line Size
When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved
As the block size increases more useful data are brought into the cache
As the block size increases the hit ratio will at first increase because of the principle of locality
Two specific effects come into play:
• Larger blocks reduce the number of blocks that fit into a cache
• As a block becomes larger each additional word is farther from the requested word
The hit ratio will begin to decrease as the block becomes bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced
+
Multilevel Caches
As logic density has increased it has become possible to have a
cache on the same chip as the processor
The on-chip cache reduces the processor’s external bus activity,
speeds up execution, and increases overall system performance
When the requested instruction or data is found in the on-chip cache, the
bus access is eliminated
On-chip cache accesses will complete appreciably faster than would even
zero-wait state bus cycles
During this period the bus is free to support other transfers
Two-level cache:
Internal cache designated as level 1 (L1)
External cache designated as level 2 (L2)
Potential savings due to the use of an L2 cache depends on the hit
rates in both the L1 and L2 caches
The use of multilevel caches complicates all of the design issues
related to caches, including size, replacement algorithm, and write
policy
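The L2 savings mentioned above can be sketched as an effective-access-time calculation. The timings and hit rates here are hypothetical, and `h2` is taken as the global hit rate of L2 (the fraction of all references it satisfies), an assumption of this sketch:

```python
# Sketch of two-level cache effective access time (all numbers are illustrative).
def effective_access_time(h1, h2, t1, t2, t_mem):
    """h1: L1 hit rate; h2: fraction of all references satisfied by L2."""
    return (h1 * t1                          # L1 hit
            + h2 * (t1 + t2)                 # L1 miss, L2 hit
            + (1 - h1 - h2) * (t1 + t2 + t_mem))  # miss in both, go to memory

# Assumed: L1 hits 90%, L2 catches another 8%, 2% go to main memory.
print(round(effective_access_time(0.90, 0.08, 1, 10, 100), 3))
```

Raising either hit rate pulls the average toward the fast levels, which is why the payoff of L2 depends on both L1 and L2 hit rates.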
+
Unified Versus Split Caches
Has become common to split the cache:
One dedicated to instructions
One dedicated to data
Both exist at the same level, typically as two L1 caches
Advantages of unified cache:
Higher hit rate
Balances load of instruction and data fetches automatically
Only one cache needs to be designed and implemented
Advantages of split cache:
Eliminates cache contention between the instruction fetch/decode unit and the execution unit
Trend is toward split caches at the L1 level and unified caches for higher levels
Table 4.4
Intel Cache Evolution
(Table is on page 150 in the textbook.)