System Bus in Computer Architecture
• A bus is a set of electrical wires (lines) that connects the various hardware
components of a computer system.
• It works as a communication pathway through which information flows from one
hardware component to the other hardware component.
A bus that connects the major components (CPU, memory, and I/O devices) of a computer system
is called a System Bus.
Why Do We Need Bus?
• A computer system is made of different components such as memory, ALU, registers etc.
• Each component must be able to communicate with the others for proper execution of
instructions and information flow.
• If we tried to implement a mesh topology (a dedicated link between every pair of
components), it would be very expensive.
• So, we use a single shared component to connect each necessary component i.e. a bus.
Components Of A System Bus-
The system bus consists of three major components-
1. Data Bus
2. Address Bus
3. Control Bus
1) Data Bus-
• As the name suggests, the data bus is used for transmitting data and instructions from the
CPU to memory/I/O and vice-versa.
• It is bi-directional.
Data Bus Width
• The width of a data bus refers to the number of bits (electrical wires) that the bus
can carry at a time.
• Each line carries 1 bit at a time. So, the number of lines in the data bus determines how
many bits can be transferred in parallel.
• The width of data bus is an important parameter because it determines how much
data can be transmitted at one time.
• The wider the bus, the faster the data flow on the data bus and thus the
better the system performance.
Examples-
• A 32-bit bus has thirty two (32) wires and thus can transmit 32 bits of data at a time.
• A 64-bit bus has sixty four (64) wires and thus can transmit 64 bits of data at a time.
2) Control Bus-
• As the name suggests, control bus is used to transfer the control and timing signals
from one component to the other component.
• The CPU uses control bus to communicate with the devices that are connected to the
computer system.
• The CPU transmits different types of control signals to the system components.
• It is bi-directional.
What Are Control & Timing Signals?
• Control signals are generated in the control unit of CPU.
• Timing signals are used to synchronize the memory and I/O operations with a CPU
clock.
• Typical control signals carried by the control bus-
• Memory read – Data from memory address location to be placed on data bus.
• Memory write – Data from data bus to be placed on memory address location.
• I/O Read – Data from I/O address location to be placed on data bus.
• I/O Write – Data from data bus to be placed on I/O address location.
• Other control signals carried by the control bus are interrupt, interrupt acknowledge, bus
request, bus grant and several others.
• These control signals indicate the type of action taking place on the system bus.
Example-
When the CPU wants to read or write data, it sends the memory read or memory write control
signal on the control bus to perform the memory read or write operation from the main memory.
Similarly, when the processor wants to read from an I/O device, it generates the I/O read signal.
3) Address Bus-
• As the name suggests, address bus is used to carry address from CPU to memory/IO
devices.
• It is used to identify the particular location in memory.
• It carries the source or destination address of data i.e. where to store or from where to
retrieve the data.
• It is uni-directional.
Example-
When CPU wants to read or write data, it sends the memory read or memory write control signal
on the control bus to perform the memory read or write operation from the main memory and the
address of the memory location is sent on the address bus.
If the CPU wants to read data stored at the memory location (address) 4, the CPU sends the value 4 in
binary on the address bus.
Address Bus Width
• The width of address bus determines the amount of physical memory addressable by the
processor.
• In other words, it determines the size of the memory that the computer can use.
• The wider the address bus, the more memory a computer will be able to use.
• The addressing capacity of the system can be increased by adding more address lines.
Examples-
• An address bus that consists of 16 wires can convey 2^16 (= 64K) different addresses.
• An address bus that consists of 32 wires can convey 2^32 (= 4G) different addresses.
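These relationships can be checked with a minimal Python sketch (the function name is illustrative, not part of the notes):

```python
def addressable_locations(address_lines: int) -> int:
    # Each address line carries 1 bit, so n lines convey 2^n distinct addresses.
    return 2 ** address_lines

print(addressable_locations(16))  # 65536 (= 64K)
print(addressable_locations(32))  # 4294967296 (= 4G)
```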
PRACTICE PROBLEMS BASED ON SYSTEM BUS-
Problem-01:
Which of the following buses is used to designate the source or destination of the data on
the bus itself?
1. Control bus
2. Data bus
3. Address bus
4. System bus
Solution-
The correct option is (C) Address bus.
Address bus carries the source or destination address of data i.e. where to store or from where to
retrieve the data.
Problem-02:
The bus which is used to transfer data from main memory to a peripheral device is-
1. Data bus
2. Input bus
3. DMA bus
4. Output bus
Solution-
The correct option is (A) Data bus.
Data bus carries data / instruction from CPU to memory/IO and vice-versa.
Problem-03:
How many memory locations can a system with a 32-bit address bus address?
1. 2^8
2. 2^16
3. 2^32
4. 2^64
Solution-
The correct option is (C) 2^32.
2^32 memory locations can be addressed by a 32-bit address bus.
Problem-04:
How many bits can be transmitted at a time using a bus with 32 data lines?
1. 8 bits
2. 16 bits
3. 32 bits
4. 1024 bits
Solution-
The correct option is (C) 32 bits.
Each line carries one bit. So, a bus with 32 data lines can transmit 32 bits at a time.
Problem-05:
A microprocessor has a data bus with 64 lines and an address bus with 32 lines. The maximum
number of bits that can be stored in memory is-
1. 32 x 2^12
2. 32 x 2^64
3. 64 x 2^32
4. 64 x 2^64
Solution-
The correct option is (C) 64 x 2^32.
The number of blocks that can be addressed is 2^32. Now, since the data bus has 64 lines, each
block is 64 bits. Thus, the maximum number of bits stored in memory is 2^32 x 64 bits.
Problem-06:
The address bus width for a ROM of size 1024 x 8 bits is-
1. 8 bits
2. 10 bits
3. 12 bits
4. 16 bits
Solution-
The correct option is (B) 10 bits.
The size of the ROM is 1024 x 8 = 2^10 x 8. Here, 10 indicates the address bus width and 8 indicates the
data bus width.
Problem-07:
The data bus width of a ROM of size 2048 x 8 bits is-
1. 8
2. 10
3. 12
4. 16
Solution-
The correct option is (A) 8.
The size of the ROM is 2048 x 8 = 2^11 x 8. Here, 11 indicates the address bus width and 8 indicates the
data bus width.
Cache Memory in Computer Architecture
Cache Memory-
• Cache memory is a Random Access Memory.
• The main advantage of cache memory is its very fast speed.
• It can be accessed by the CPU at a much faster speed than main memory.
Location-
• Cache memory lies on the path between the CPU and the main memory.
• It facilitates the transfer of data between the processor and the main
memory at a speed that matches the speed of the processor.
• Data is transferred in the form of words between the cache memory and
the CPU.
• Data is transferred in the form of blocks or pages between the cache
memory and the main memory.
Purpose-
• The fast speed of the cache memory makes it extremely useful.
• It is used for bridging the speed mismatch between the fastest CPU and
the main memory.
• It does not let the CPU performance suffer due to the slower speed of
the main memory.
Execution Of Program-
• Whenever any program has to be executed, it is first loaded in the main
memory.
• The portion of the program that is most probably going to be executed
in the near future is kept in the cache memory.
• This allows CPU to access the most probable portion at a faster speed.
Step-01:
Whenever the CPU requires any word of memory, it is first searched for in the CPU
registers.
Now, there are two cases possible-
Case-01:
• If the required word is found in the CPU registers, it is read from there.
Case-02:
• If the required word is not found in the CPU registers, Step-02 is
followed.
Step-02:
• When the required word is not found in the CPU registers, it is searched
in the cache memory.
• Tag directory of the cache memory is used to search whether the
required word is present in the cache memory or not.
Now, there are two cases possible-
Case-01:
• If the required word is found in the cache memory, the word is delivered
to the CPU.
• This is known as Cache hit.
Case-02:
• If the required word is not found in the cache memory, Step-03 is
followed.
• This is known as Cache miss.
Step-03:
• When the required word is not found in the cache memory, it is
searched in the main memory.
• Page Table is used to determine whether the required page is present in
the main memory or not.
Now, there are two cases possible-
Case-01:
If the page containing the required word is found in the main memory,
• The page is mapped from the main memory to the cache memory.
• This mapping is performed using cache mapping techniques.
• Then, the required word is delivered to the CPU.
Case-02:
If the page containing the required word is not found in the main memory,
• A page fault occurs.
• The page containing the required word is mapped from the secondary
memory to the main memory.
• Then, the page is mapped from the main memory to the cache memory.
• Then, the required word is delivered to the CPU.
Multilevel Cache Organization-
• A multilevel cache organization is an organization where cache
memories of different sizes are organized at multiple levels to increase
the processing speed to a greater extent.
• The smaller the size of cache, the faster its speed.
• The smallest size cache memory is placed closest to the CPU.
• This helps to achieve better performance in terms of speed.
Example-
Three level cache organization consists of three cache memories of different
size organized at three different levels as shown below-
Size (L1 Cache) < Size (L2 Cache) < Size (L3 Cache) < Size (Main Memory)
Cache Mapping | Cache Mapping Techniques
❖ Cache memory bridges the speed mismatch between the processor and
the main memory.
When cache hit occurs,
• The required word is present in the cache memory.
• The required word is delivered to the CPU from the cache memory.
When cache miss occurs,
• The required word is not present in the cache memory.
• The page containing the required word has to be mapped from the main
memory.
• This mapping is performed using cache mapping techniques.
In this article, we will discuss different cache mapping techniques.
Cache Mapping-
• Cache mapping defines how a block from the main memory is mapped
to the cache memory in case of a cache miss.
OR
• Cache mapping is a technique by which the contents of main memory
are brought into the cache memory.
The following diagram illustrates the mapping process-
NOTES
• Main memory is divided into equal size partitions called
as blocks or frames.
• Cache memory is divided into partitions having same size as that of
blocks called as lines.
• During cache mapping, a block of main memory is simply copied to the
cache; the block is not actually removed from the main memory.
Cache Mapping Techniques-
Cache mapping is performed using following three different techniques-
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
1. Direct Mapping-
In direct mapping,
• A particular block of main memory can map only to a particular line of the
cache.
• The line number of cache to which a particular block can map is given by-
Cache line number
= ( Main Memory Block Address ) Modulo (Number of lines in Cache)
Example-
• Consider a cache memory divided into 'n' lines.
• Then, block 'j' of main memory can map only to line number (j mod n) of
the cache.
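The formula above can be sketched in Python (the helper name and the toy values are illustrative):

```python
def direct_mapped_line(block_address: int, num_lines: int) -> int:
    # Cache line number = (main memory block address) mod (number of lines in cache)
    return block_address % num_lines

# With a cache of n = 8 lines, block 13 of main memory can map
# only to line 13 mod 8 = 5.
print(direct_mapped_line(13, 8))  # 5
```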
Need of Replacement Algorithm-
In direct mapping,
• There is no need of any replacement algorithm.
• This is because a main memory block can map only to a particular line
of the cache.
• Thus, the new incoming block will always replace the existing block (if
any) in that particular line.
Division of Physical Address-
In direct mapping, the physical address is divided as-
2. Fully Associative Mapping-
In fully associative mapping,
A block of main memory can map to any line of the cache that is freely available at
that moment.
This makes fully associative mapping more flexible than direct mapping.
Example-
Consider the following scenario-
Here,
All the lines of cache are freely available.
Thus, any block of main memory can map to any line of the cache.
Had all the cache lines been occupied, one of the existing blocks would have to
be replaced.
Need of Replacement Algorithm-
In fully associative mapping, a replacement algorithm is required.
The replacement algorithm suggests the block to be replaced if all the cache lines are
occupied.
Thus, replacement algorithms like the FCFS Algorithm, LRU Algorithm etc. are
employed.
Division of Physical Address-
In fully associative mapping, the physical address is divided as-
3. K-way Set Associative Mapping-
In k-way set associative mapping,
• Cache lines are grouped into sets, where each set contains k lines.
• A particular block of main memory can map to only one particular set of the
cache.
• However, within that set, the memory block can map to any cache line that is
freely available.
• The set of the cache to which a particular block of the main memory can
map is given by-
Cache set number
= ( Main Memory Block Address ) Modulo (Number of sets in Cache)
Example-
Consider the following example of 2-way set associative mapping-
Here,
• k = 2 suggests that each set contains two cache lines.
• Since the cache contains 6 lines, the number of sets in the cache = 6 / 2 = 3 sets.
• Block ‘j’ of main memory can map to set number (j mod 3) only of the
cache.
• Within that set, block ‘j’ can map to any cache line that is freely available at
that moment.
• If all the cache lines are occupied, then one of the existing blocks will have
to be replaced.
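The set-number computation for this example can be sketched as follows (the helper name is illustrative):

```python
def cache_set_number(block_address: int, num_sets: int) -> int:
    # Cache set number = (main memory block address) mod (number of sets in cache)
    return block_address % num_sets

# 2-way set associative cache with 6 lines => 6 / 2 = 3 sets.
# Block 7 of main memory maps to set 7 mod 3 = 1; within set 1 it may
# occupy either of the two lines, whichever is freely available.
print(cache_set_number(7, 3))  # 1
```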
Need of Replacement Algorithm-
• Set associative mapping is a combination of direct mapping and fully
associative mapping.
• It uses fully associative mapping within each set.
• Thus, set associative mapping requires a replacement algorithm.
Division of Physical Address-
In set associative mapping, the physical address is divided as-
Special Cases-
• If k = 1, then k-way set associative mapping becomes direct mapping i.e.
1-way Set Associative Mapping ≡ Direct Mapping
• If k = Total number of lines in the cache, then k-way set associative mapping
becomes fully associative mapping.
Direct Mapping | Direct Mapped Cache
Cache mapping is a technique by which the contents of main memory are brought
into the cache memory.
Direct Mapping-
In direct mapping,
• A particular block of main memory can map to only one particular line of
the cache.
• The line number of cache to which a particular block can map is given by-
Cache line number
= ( Main Memory Block Address ) Modulo (Number of lines in Cache)
Division of Physical Address-
In direct mapping, the physical address is divided as-
Direct Mapped Cache-
Direct mapped cache employs direct cache mapping technique.
The following steps explain the working of direct mapped cache-
After CPU generates a memory request,
• The line number field of the address is used to access the particular line of
the cache.
• The tag field of the CPU address is then compared with the tag of the line.
• If the two tags match, a cache hit occurs and the desired word is found in the
cache.
• If the two tags do not match, a cache miss occurs.
• In case of a cache miss, the required word has to be brought from the main
memory.
• It is then stored in the cache together with the new tag replacing the previous
one.
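The steps above can be sketched as a toy simulation (a minimal sketch; the function name, field widths, and toy address are illustrative, not from the notes):

```python
def lookup(address: int, tags: list, offset_bits: int, line_bits: int) -> str:
    # Split the physical address into tag | line number | block offset.
    line = (address >> offset_bits) & ((1 << line_bits) - 1)
    tag = address >> (offset_bits + line_bits)
    if tags[line] == tag:
        return "hit"            # tags match: cache hit
    tags[line] = tag            # cache miss: fetch block, store the new tag
    return "miss"

tags = [None] * 4               # toy direct mapped cache with 4 lines
print(lookup(77, tags, 3, 2))   # miss (line 1 is empty)
print(lookup(77, tags, 3, 2))   # hit  (stored tag now matches)
```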
Implementation-
The following diagram shows the implementation of direct mapped cache-
(For simplicity, this diagram does not show all the lines of the multiplexers)
The steps involved are as follows-
Step-01:
• Each multiplexer reads the line number from the generated physical address
using its select lines in parallel.
• To read the line number of L bits, number of select lines each multiplexer
must have = L.
Step-02:
• After reading the line number, each multiplexer goes to the corresponding
line in the cache memory using its input lines in parallel.
• Number of input lines each multiplexer must have = Number of lines in the
cache memory
Step-03:
• Each multiplexer outputs the tag bit it has selected from that line to the
comparator using its output line.
• Number of output lines in each multiplexer = 1.
UNDERSTAND
It is important to understand-
• A multiplexer can output only a single bit on its output line.
• So, to output the complete tag to the comparator,
Number of multiplexers required = Number of bits in the tag
• Each multiplexer is configured to read the tag bit at specific location.
Example-
• 1st multiplexer is configured to output the first bit of the tag.
• 2nd multiplexer is configured to output the second bit of the tag.
• 3rd multiplexer is configured to output the third bit of the tag and so on.
So,
• Each multiplexer selects the tag bit of the selected line for which it has been
configured and outputs on the output line.
• The complete tag as a whole is sent to the comparator for comparison in
parallel.
Step-04:
• Comparator compares the tag coming from the multiplexers with the tag of
the generated address.
• Only one comparator is required for the comparison where-
Size of comparator = Number of bits in the tag
• If the two tags match, a cache hit occurs otherwise a cache miss occurs.
Hit latency-
The time taken to find out whether the required word is present in the Cache Memory or not is
called hit latency.
For direct mapped cache,
Hit latency = Multiplexer latency + Comparator latency
Important Results-
Following are a few important results for direct mapped cache-
• Block j of main memory can map to line number (j mod number of lines in
cache) only of the cache.
• Number of multiplexers required = Number of bits in the tag
• Size of each multiplexer = Number of lines in cache x 1
• Number of comparators required = 1
• Size of comparator = Number of bits in the tag
• Hit latency = Multiplexer latency + Comparator latency
Direct Mapping | Cache | Practice Problems
Problem-01:
Consider a direct mapped cache of size 16 KB with block size 256 bytes. The
size of main memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size
Solution-
Given-
• Cache memory size = 16 KB
• Block size = Frame size = Line size = 256 bytes
• Main memory size = 128 KB
We consider that the memory is byte addressable.
Number of Bits in Physical Address-
We have,
Size of main memory
= 128 KB
= 2^17 bytes
Thus, Number of bits in physical address = 17 bits
Number of Bits in Block Offset-
We have,
Block size
= 256 bytes
= 2^8 bytes
Thus, Number of bits in block offset = 8 bits
Number of Bits in Line Number-
Total number of lines in cache
= Cache size / Line size
= 16 KB / 256 bytes
= 2^14 bytes / 2^8 bytes
= 2^6 lines
Thus, Number of bits in line number = 6 bits
Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in line number +
Number of bits in block offset)
= 17 bits – (6 bits + 8 bits)
= 17 bits – 14 bits
= 3 bits
Thus, Number of bits in tag = 3 bits
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^6 x 3 bits
= 192 bits
= 24 bytes
Thus, size of tag directory = 24 bytes
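The whole calculation can be reproduced in a few lines of Python (byte-addressable memory assumed, as in the solution; variable names are illustrative):

```python
from math import log2

cache_size  = 16 * 1024     # 16 KB
block_size  = 256           # bytes
memory_size = 128 * 1024    # 128 KB

physical_bits = int(log2(memory_size))                     # 17
offset_bits   = int(log2(block_size))                      # 8
lines         = cache_size // block_size                   # 2^6 = 64
line_bits     = int(log2(lines))                           # 6
tag_bits      = physical_bits - (line_bits + offset_bits)  # 3
tag_directory_bits = lines * tag_bits                      # 192 bits = 24 bytes

print(tag_bits, tag_directory_bits // 8)  # 3 24
```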
Problem-02:
Consider a direct mapped cache of size 512 KB with block size 1 KB. There
are 7 bits in the tag. Find-
1. Size of main memory
2. Tag directory size
Solution-
Given-
• Cache memory size = 512 KB
• Block size = Frame size = Line size = 1 KB
• Number of bits in tag = 7 bits
We consider that the memory is byte addressable.
Number of Bits in Block Offset-
We have,
Block size
= 1 KB
= 2^10 bytes
Thus, Number of bits in block offset = 10 bits
Number of Bits in Line Number-
Total number of lines in cache
= Cache size / Line size
= 512 KB / 1 KB
= 2^9 lines
Thus, Number of bits in line number = 9 bits
Number of Bits in Physical Address-
Number of bits in physical address
= Number of bits in tag + Number of bits in line number + Number of bits in
block offset
= 7 bits + 9 bits + 10 bits
= 26 bits
Thus, Number of bits in physical address = 26 bits
Size of Main Memory-
We have,
Number of bits in physical address = 26 bits
Thus, Size of main memory
= 2^26 bytes
= 64 MB
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^9 x 7 bits
= 3584 bits
= 448 bytes
Thus, size of tag directory = 448 bytes
Problem-03:
Consider a direct mapped cache with block size 4 KB. The size of main
memory is 16 GB and there are 10 bits in the tag. Find-
1. Size of cache memory
2. Tag directory size
Solution-
Given-
• Block size = Frame size = Line size = 4 KB
• Size of main memory = 16 GB
• Number of bits in tag = 10 bits
We consider that the memory is byte addressable.
Number of Bits in Physical Address-
We have,
Size of main memory
= 16 GB
= 2^34 bytes
Thus, Number of bits in physical address = 34 bits
Number of Bits in Block Offset-
We have,
Block size
= 4 KB
= 2^12 bytes
Thus, Number of bits in block offset = 12 bits
Number of Bits in Line Number-
Number of bits in line number
= Number of bits in physical address – (Number of bits in tag + Number of
bits in block offset)
= 34 bits – (10 bits + 12 bits)
= 34 bits – 22 bits
= 12 bits
Thus, Number of bits in line number = 12 bits
Number of Lines in Cache-
We have-
Number of bits in line number = 12 bits
Thus, Total number of lines in cache = 2^12 lines
Size of Cache Memory-
Size of cache memory
= Total number of lines in cache x Line size
= 2^12 x 4 KB
= 2^14 KB
= 16 MB
Thus, Size of cache memory = 16 MB
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^12 x 10 bits
= 40960 bits
= 5120 bytes
Thus, size of tag directory = 5120 bytes
Problem-04:
Consider a direct mapped cache of size 32 KB with block size 32 bytes. The
CPU generates 32 bit addresses. The number of bits needed for cache indexing
and the number of tag bits are respectively-
1. 10, 17
2. 10, 22
3. 15, 17
4. 5, 17
Solution-
Given-
• Cache memory size = 32 KB
• Block size = Frame size = Line size = 32 bytes
• Number of bits in physical address = 32 bits
Number of Bits in Block Offset-
We have,
Block size
= 32 bytes
= 2^5 bytes
Thus, Number of bits in block offset = 5 bits
Number of Bits in Line Number-
Total number of lines in cache
= Cache size / Line size
= 32 KB / 32 bytes
= 2^10 lines
Thus, Number of bits in line number = 10 bits
Number of Bits Required For Cache Indexing-
Number of bits required for cache indexing
= Number of bits in line number
= 10 bits
Number Of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in line number +
Number of bits in block offset)
= 32 bits – (10 bits + 5 bits)
= 32 bits – 15 bits
= 17 bits
Thus, Number of bits in tag = 17 bits
Thus, Option (A) is correct.
Problem-05:
Consider a machine with a byte addressable main memory of 2^32 bytes divided
into blocks of size 32 bytes. Assume that a direct mapped cache having 512
cache lines is used with this machine. The size of the tag field in bits is ______.
Solution-
Given-
• Main memory size = 2^32 bytes
• Block size = Frame size = Line size = 32 bytes
• Number of lines in cache = 512 lines
Number of Bits in Physical Address-
We have,
Size of main memory
= 2^32 bytes
Thus, Number of bits in physical address = 32 bits
Number of Bits in Block Offset-
We have,
Block size
= 32 bytes
= 2^5 bytes
Thus, Number of bits in block offset = 5 bits
Number of Bits in Line Number-
Total number of lines in cache
= 512 lines
= 2^9 lines
Thus, Number of bits in line number = 9 bits
Number Of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in line number +
Number of bits in block offset)
= 32 bits – (9 bits + 5 bits)
= 32 bits – 14 bits
= 18 bits
Thus, Number of bits in tag = 18 bits
Problem-06:
An 8 KB direct-mapped write back cache is organized as multiple blocks, each
of size 32 bytes. The processor generates 32 bit addresses. The cache controller
maintains the tag information for each cache block comprising of the
following-
• 1 valid bit
• 1 modified bit
• As many bits as the minimum needed to identify the memory block
mapped in the cache
What is the total size of memory needed at the cache controller to store meta
data (tags) for the cache?
1. 4864 bits
2. 6144 bits
3. 6656 bits
4. 5376 bits
Solution-
Given-
• Cache memory size = 8 KB
• Block size = Frame size = Line size = 32 bytes
• Number of bits in physical address = 32 bits
Number of Bits in Block Offset-
We have,
Block size
= 32 bytes
= 25 bytes
Thus, Number of bits in block offset = 5 bits
Number of Bits in Line Number-
Total number of lines in cache
= Cache memory size / Line size
= 8 KB / 32 bytes
= 2^13 bytes / 2^5 bytes
= 2^8 lines
Thus, Number of bits in line number = 8 bits
Number Of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in line number +
Number of bits in block offset)
= 32 bits – (8 bits + 5 bits)
= 32 bits – 13 bits
= 19 bits
Thus, Number of bits in tag = 19 bits
Memory Size Needed At Cache Controller-
Size of memory needed at cache controller
= Number of lines in cache x (1 valid bit + 1 modified bit + 19 bits to identify
block)
= 2^8 x 21 bits
= 5376 bits
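The same arithmetic can be checked in Python (variable names are illustrative):

```python
cache_size, block_size, address_bits = 8 * 1024, 32, 32

lines       = cache_size // block_size                  # 2^8 = 256 lines
offset_bits = 5                                         # 32-byte block
line_bits   = 8                                         # 256 lines
tag_bits    = address_bits - (line_bits + offset_bits)  # 19
meta_bits   = lines * (1 + 1 + tag_bits)                # valid + modified + tag
print(meta_bits)  # 5376
```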
Fully Associative Mapping | Practice Problems
In fully associative mapping,
• A block of main memory can be mapped to any freely available cache line.
• This makes fully associative mapping more flexible than direct mapping.
• A replacement algorithm is needed to replace a block if the cache is full.
PRACTICE PROBLEMS BASED ON FULLY ASSOCIATIVE MAPPING-
Problem-01:
Consider a fully associative mapped cache of size 16 KB with block size 256
bytes. The size of main memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size
Solution-
Given-
• Cache memory size = 16 KB
• Block size = Frame size = Line size = 256 bytes
• Main memory size = 128 KB
We consider that the memory is byte addressable.
Number of Bits in Physical Address-
We have,
Size of main memory
= 128 KB
= 2^17 bytes
Thus, Number of bits in physical address = 17 bits
Number of Bits in Block Offset-
We have,
Block size
= 256 bytes
= 2^8 bytes
Thus, Number of bits in block offset = 8 bits
Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – Number of bits in block offset
= 17 bits – 8 bits
= 9 bits
Thus, Number of bits in tag = 9 bits
Number of Lines in Cache-
Total number of lines in cache
= Cache size / Line size
= 16 KB / 256 bytes
= 2^14 bytes / 2^8 bytes
= 2^6 lines
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^6 x 9 bits
= 576 bits
= 72 bytes
Thus, size of tag directory = 72 bytes
Problem-02:
Consider a fully associative mapped cache of size 512 KB with block size 1 KB.
There are 17 bits in the tag. Find-
1. Size of main memory
2. Tag directory size
Solution-
Given-
• Cache memory size = 512 KB
• Block size = Frame size = Line size = 1 KB
• Number of bits in tag = 17 bits
We consider that the memory is byte addressable.
Number of Bits in Block Offset-
We have,
Block size
= 1 KB
= 2^10 bytes
Thus, Number of bits in block offset = 10 bits
Number of Bits in Physical Address-
Number of bits in physical address
= Number of bits in tag + Number of bits in block offset
= 17 bits + 10 bits
= 27 bits
Thus, Number of bits in physical address = 27 bits
Size of Main Memory-
We have,
Number of bits in physical address = 27 bits
Thus, Size of main memory
= 2^27 bytes
= 128 MB
Number of Lines in Cache-
Total number of lines in cache
= Cache size / Line size
= 512 KB / 1 KB
= 512 lines
= 2^9 lines
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^9 x 17 bits
= 8704 bits
= 1088 bytes
Thus, size of tag directory = 1088 bytes
Problem-03:
Consider a fully associative mapped cache with block size 4 KB. The size of main
memory is 16 GB. Find the number of bits in tag.
Solution-
Given-
• Block size = Frame size = Line size = 4 KB
• Size of main memory = 16 GB
We consider that the memory is byte addressable.
Number of Bits in Physical Address-
We have,
Size of main memory
= 16 GB
= 2^34 bytes
Thus, Number of bits in physical address = 34 bits
Number of Bits in Block Offset-
We have,
Block size
= 4 KB
= 2^12 bytes
Thus, Number of bits in block offset = 12 bits
Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – Number of bits in block offset
= 34 bits – 12 bits
= 22 bits
Thus, Number of bits in tag = 22 bits
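Since fully associative mapping has no line-number field, the tag computation reduces to a single subtraction; a quick Python check (variable names are illustrative):

```python
from math import log2

memory_size = 16 * 2**30   # 16 GB
block_size  = 4 * 1024     # 4 KB

physical_bits = int(log2(memory_size))   # 34
offset_bits   = int(log2(block_size))    # 12
tag_bits      = physical_bits - offset_bits
print(tag_bits)  # 22
```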
Set Associative Mapping | Set Associative Cache
Cache Mapping-
Cache mapping is a technique by which the contents of main memory are brought into the cache memory.
Set Associative Mapping-
In k-way set associative mapping,
• Cache lines are grouped into sets, where each set contains k lines.
• A particular block of main memory can map to only one particular set of the
cache.
• However, within that set, the memory block can map to any freely available
cache line.
• The set of the cache to which a particular block of the main memory can
map is given by-
Cache set number
= ( Main Memory Block Address ) Modulo (Number of sets in Cache)
Division of Physical Address-
In set associative mapping, the physical address is divided as-
Set Associative Cache-
Set associative cache employs set associative cache mapping technique.
The following steps explain the working of set associative cache-
After CPU generates a memory request,
• The set number field of the address is used to access the particular set of the
cache.
• The tag field of the CPU address is then compared with the tags of all k lines
within that set.
• If the CPU tag matches to the tag of any cache line, a cache hit occurs.
• If the CPU tag does not match to the tag of any cache line, a cache miss
occurs.
• In case of a cache miss, the required word has to be brought from the main
memory.
• If the cache is full, a replacement is made in accordance with the employed
replacement policy.
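The lookup steps above can be sketched as a toy simulation (a minimal sketch; the replacement policy here simply evicts the first line of the set, and all names and values are illustrative):

```python
def lookup(address: int, sets: list, offset_bits: int, set_bits: int) -> str:
    # Split the physical address into tag | set number | block offset.
    s   = (address >> offset_bits) & ((1 << set_bits) - 1)
    tag = address >> (offset_bits + set_bits)
    if tag in sets[s]:
        return "hit"                    # tag matches some line in the set
    if None in sets[s]:                 # miss: use a free line if one exists,
        sets[s][sets[s].index(None)] = tag
    else:                               # otherwise replace per the policy
        sets[s][0] = tag
    return "miss"

sets = [[None, None] for _ in range(4)]  # toy 2-way cache with 4 sets
print(lookup(421, sets, 3, 2))  # miss
print(lookup(421, sets, 3, 2))  # hit
```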
Implementation-
The following diagram shows the implementation of 2-way set associative cache-
The steps involved are as follows-
Step-01:
• Each multiplexer reads the set number from the generated physical address
using its select lines in parallel.
• To read the set number of S bits, number of select lines each multiplexer
must have = S.
Step-02:
• After reading the set number, each multiplexer goes to the corresponding set
in the cache memory.
• Then, each multiplexer goes to the lines of that set using its input lines in
parallel.
• Number of input lines each multiplexer must have = Number of lines in one
set
Step-03:
• Each multiplexer outputs the tag bit it has selected from the lines of selected
set to the comparators using its output line.
• Number of output lines in each multiplexer = 1.
UNDERSTAND
It is important to understand-
• A multiplexer can output only a single bit on its output line.
• So, to output one complete tag to the comparator,
Number of multiplexers required = Number of bits in the tag
• If there are k lines in one set, then number of tags to output = k, thus-
Number of multiplexers required = Number of lines in one set (k) x Number of bits
in the tag
• Each multiplexer is configured to read the tag bit of specific line at specific
location.
• So, each multiplexer selects the tag bit for which it has been configured and
outputs on the output line.
• The complete tags are sent to the comparators for comparison in
parallel.
Step-04:
• Comparators compare the tags coming from the multiplexers with the tag of
the generated address.
• This comparison takes place in parallel.
• If there are k lines in one set (thus k tags), then-
Number of comparators required = k
and
Size of each comparator = Number of bits in the tag
• The output result of each comparator is fed as an input to an OR Gate.
• The OR Gate is usually implemented using a 2 x 1 multiplexer.
• If the output of the OR Gate is 1, a cache hit occurs; otherwise, a cache miss
occurs.
Hit latency-
• The time taken to find out whether the required word is present in the Cache
Memory or not is called as hit latency.
For set associative mapping,
Hit latency = Multiplexer latency + Comparator latency + OR Gate latency
Important Results-
Following are the few important results for set associative cache-
• Block j of main memory maps to set number (j mod number of sets in cache)
of the cache.
• Number of multiplexers required = Number of lines in one set (k) x Number
of bits in tag
• Size of each multiplexer = Number of sets in cache x 1
• Number of comparators required = Number of lines in one set (k)
• Size of each comparator = Number of bits in the tag
• Hit latency = Multiplexer latency + Comparator latency + OR Gate latency
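Taken together, these results can be cross-checked with a short script. The sketch below (the helper name is our own, and a byte-addressable memory is assumed) derives the field widths and hardware counts for a k-way set associative cache:

```python
import math

def set_associative_counts(cache_bytes, block_bytes, k, address_bits):
    """Derive field widths and hardware counts for a k-way set associative cache."""
    lines = cache_bytes // block_bytes            # total lines in the cache
    sets = lines // k                             # k lines are grouped per set
    offset_bits = int(math.log2(block_bytes))     # block offset field
    set_bits = int(math.log2(sets))               # set number field
    tag_bits = address_bits - set_bits - offset_bits
    return {
        "tag_bits": tag_bits,
        "multiplexers": k * tag_bits,             # k lines x tag width
        "comparators": k,                         # one comparator per line in a set
        "comparator_bits": tag_bits,              # each compares a full tag
    }

# 2-way, 16 KB cache, 256-byte blocks, 17-bit physical address
print(set_associative_counts(16 * 1024, 256, 2, 17))
```

For this configuration the script reports 4 tag bits, 8 multiplexers and 2 comparators, matching the formulas above.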
Set Associative Mapping | Practice Problems
Set Associative Mapping-
• A particular block of main memory can be mapped to one particular cache
set only.
• Block ‘j’ of main memory will map to set number (j mod number of sets in
cache) of the cache.
• A replacement algorithm is needed if the cache is full.
Problem-01:
Consider a 2-way set associative mapped cache of size 16 KB with block size 256
bytes. The size of main memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size
Solution-
Given-
• Set size = 2
• Cache memory size = 16 KB
• Block size = Frame size = Line size = 256 bytes
• Main memory size = 128 KB
We consider that the memory is byte addressable.
Number of Bits in Physical Address-
We have,
Size of main memory
= 128 KB
= 2^17 bytes
Thus, Number of bits in physical address = 17 bits
Number of Bits in Block Offset-
We have,
Block size
= 256 bytes
= 2^8 bytes
Thus, Number of bits in block offset = 8 bits
Number of Lines in Cache-
Total number of lines in cache
= Cache size / Line size
= 16 KB / 256 bytes
= 2^14 bytes / 2^8 bytes
= 64 lines
Thus, Number of lines in cache = 64 lines
Number of Sets in Cache-
Total number of sets in cache
= Total number of lines in cache / Set size
= 64 / 2
= 32 sets
= 2^5 sets
Thus, Number of bits in set number = 5 bits
Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of
bits in block offset)
= 17 bits – (5 bits + 8 bits)
= 17 bits – 13 bits
= 4 bits
Thus, Number of bits in tag = 4 bits
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 64 x 4 bits
= 256 bits
= 32 bytes
Thus, size of tag directory = 32 bytes
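The arithmetic above can be verified with a few lines of Python (a sketch; variable names are ours, byte-addressable memory assumed):

```python
import math

cache_size = 16 * 1024       # 16 KB cache
block_size = 256             # 256-byte lines
memory_size = 128 * 1024     # 128 KB main memory
ways = 2                     # 2-way set associative

address_bits = int(math.log2(memory_size))         # 17
offset_bits = int(math.log2(block_size))           # 8
lines = cache_size // block_size                   # 64
set_bits = int(math.log2(lines // ways))           # 5
tag_bits = address_bits - set_bits - offset_bits   # 4
tag_directory_bytes = lines * tag_bits // 8        # 256 bits = 32 bytes
print(tag_bits, tag_directory_bytes)  # → 4 32
```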
Problem-02:
Consider an 8-way set associative mapped cache of size 512 KB with block size 1
KB. There are 7 bits in the tag. Find-
1. Size of main memory
2. Tag directory size
Solution-
Given-
• Set size = 8
• Cache memory size = 512 KB
• Block size = Frame size = Line size = 1 KB
• Number of bits in tag = 7 bits
We consider that the memory is byte addressable.
Number of Bits in Block Offset-
We have,
Block size
= 1 KB
= 2^10 bytes
Thus, Number of bits in block offset = 10 bits
Number of Lines in Cache-
Total number of lines in cache
= Cache size / Line size
= 512 KB / 1 KB
= 512 lines
Thus, Number of lines in cache = 512 lines
Number of Sets in Cache-
Total number of sets in cache
= Total number of lines in cache / Set size
= 512 / 8
= 64 sets
= 2^6 sets
Thus, Number of bits in set number = 6 bits
Number of Bits in Physical Address-
Number of bits in physical address
= Number of bits in tag + Number of bits in set number + Number of bits in block
offset
= 7 bits + 6 bits + 10 bits
= 23 bits
Thus, Number of bits in physical address = 23 bits
Size of Main Memory-
We have,
Number of bits in physical address = 23 bits
Thus, Size of main memory
= 2^23 bytes
= 8 MB
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 512 x 7 bits
= 3584 bits
= 448 bytes
Thus, size of tag directory = 448 bytes
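The same kind of check works in reverse here: given the tag width, the sketch below (our own variable names, byte-addressable memory assumed) recovers the main memory size:

```python
import math

cache_size = 512 * 1024    # 512 KB cache
block_size = 1024          # 1 KB lines
ways = 8                   # 8-way set associative
tag_bits = 7               # given

offset_bits = int(math.log2(block_size))           # 10
lines = cache_size // block_size                   # 512
set_bits = int(math.log2(lines // ways))           # 6
address_bits = tag_bits + set_bits + offset_bits   # 23
main_memory_mb = 2 ** address_bits // 2 ** 20      # 8 MB
tag_directory_bytes = lines * tag_bits // 8        # 3584 bits = 448 bytes
print(main_memory_mb, tag_directory_bytes)  # → 8 448
```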
Problem-03:
Consider a 4-way set associative mapped cache with block size 4 KB. The size of
main memory is 16 GB and there are 10 bits in the tag. Find-
1. Size of cache memory
2. Tag directory size
Solution-
Given-
• Set size = 4
• Block size = Frame size = Line size = 4 KB
• Main memory size = 16 GB
• Number of bits in tag = 10 bits
We consider that the memory is byte addressable.
Number of Bits in Physical Address-
We have,
Size of main memory
= 16 GB
= 2^34 bytes
Thus, Number of bits in physical address = 34 bits
Number of Bits in Block Offset-
We have,
Block size
= 4 KB
= 2^12 bytes
Thus, Number of bits in block offset = 12 bits
Number of Bits in Set Number-
Number of bits in set number
= Number of bits in physical address – (Number of bits in tag + Number of bits in
block offset)
= 34 bits – (10 bits + 12 bits)
= 34 bits – 22 bits
= 12 bits
Thus, Number of bits in set number = 12 bits
Number of Sets in Cache-
We have-
Number of bits in set number = 12 bits
Thus, Total number of sets in cache = 2^12 sets
Number of Lines in Cache-
We have-
Total number of sets in cache = 2^12 sets
Each set contains 4 lines
Thus,
Total number of lines in cache
= Total number of sets in cache x Number of lines in each set
= 2^12 x 4 lines
= 2^14 lines
Size of Cache Memory-
Size of cache memory
= Total number of lines in cache x Line size
= 2^14 x 4 KB
= 2^16 KB
= 64 MB
Thus, Size of cache memory = 64 MB
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^14 x 10 bits
= 163840 bits
= 20480 bytes
= 20 KB
Thus, size of tag directory = 20 KB
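Here the unknown is the cache size; the sketch below (our own variable names, byte-addressable memory assumed) reproduces the derivation:

```python
import math

block_size = 4 * 1024      # 4 KB lines
memory_size = 16 * 2**30   # 16 GB main memory
ways = 4                   # 4-way set associative
tag_bits = 10              # given

address_bits = int(math.log2(memory_size))         # 34
offset_bits = int(math.log2(block_size))           # 12
set_bits = address_bits - tag_bits - offset_bits   # 12
lines = 2 ** set_bits * ways                       # 2^14 lines
cache_mb = lines * block_size // 2 ** 20           # 64 MB
tag_directory_kb = lines * tag_bits // 8 // 1024   # 20 KB
print(cache_mb, tag_directory_kb)  # → 64 20
```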
Problem-04:
Consider an 8-way set associative mapped cache. The size of cache memory is 512
KB and there are 10 bits in the tag. Find the size of main memory.
Solution-
Given-
• Set size = 8
• Cache memory size = 512 KB
• Number of bits in tag = 10 bits
We consider that the memory is byte addressable.
Let-
• Number of bits in set number field = x bits
• Number of bits in block offset field = y bits
Sum of Number Of Bits Of Set Number Field And Block Offset Field-
We have,
Cache memory size = Number of sets in cache x Number of lines in one set x Line
size
Now, substituting the values, we get-
512 KB = 2^x x 8 x 2^y bytes
2^19 bytes = 2^(3 + x + y) bytes
19 = 3 + x + y
x + y = 19 – 3
x + y = 16
Number of Bits in Physical Address-
Number of bits in physical address
= Number of bits in tag + Number of bits in set number + Number of bits in block
offset
= 10 bits + x bits + y bits
= 10 bits + (x + y) bits
= 10 bits + 16 bits
= 26 bits
Thus, Number of bits in physical address = 26 bits
Size of Main Memory-
We have,
Number of bits in physical address = 26 bits
Thus, Size of main memory
= 2^26 bytes
= 64 MB
Thus, size of main memory = 64 MB
Problem-05:
Consider a 4-way set associative mapped cache. The size of main memory is 64
MB and there are 10 bits in the tag. Find the size of cache memory.
Solution-
Given-
• Set size = 4
• Main memory size = 64 MB
• Number of bits in tag = 10 bits
We consider that the memory is byte addressable.
Number of Bits in Physical Address-
We have,
Size of main memory
= 64 MB
= 2^26 bytes
Thus, Number of bits in physical address = 26 bits
Sum Of Number Of Bits Of Set Number Field And Block Offset Field-
Let-
• Number of bits in set number field = x bits
• Number of bits in block offset field = y bits
Then, Number of bits in physical address
= Number of bits in tag + Number of bits in set number + Number of bits in block
offset
So, we have-
26 bits = 10 bits + x bits + y bits
26 = 10 + (x + y)
x + y = 26 – 10
x + y = 16
Thus, Sum of number of bits of set number field and block offset field = 16 bits
Size of Cache Memory-
Cache memory size
= Number of sets in cache x Number of lines in one set x Line size
= 2^x x 4 x 2^y bytes
= 2^(2 + x + y) bytes
= 2^(2 + 16) bytes
= 2^18 bytes
= 256 KB
Thus, size of cache memory = 256 KB
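Note that x and y are never determined individually; only their sum matters, because the cache size is ways x 2^(x + y) bytes. A short check (a sketch) confirms the result:

```python
tag_bits = 10
ways = 4
address_bits = 26                         # 64 MB main memory, byte addressable

x_plus_y = address_bits - tag_bits        # set bits + offset bits = 16
cache_kb = ways * 2 ** x_plus_y // 1024   # 2^2 x 2^16 bytes = 256 KB
print(cache_kb)  # → 256
```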
Cache Line | Cache Line Size | Cache Memory
Cache Lines-
Cache memory is divided into equal size partitions called as cache lines.
• While designing a computer’s cache system, the size of cache lines is an
important parameter.
• The size of cache line affects a lot of parameters in the caching system.
The following results discuss the effect of changing the cache block (or line)
size in a caching system.
Result-01: Effect of Changing Block Size on Spatial Locality-
The larger the block size, better will be the spatial locality.
Explanation-
Keeping the cache size constant, we have-
Case-01: Decreasing the Block Size-
• A smaller block size will contain a smaller number of nearby addresses in it.
• Thus, only a smaller number of nearby addresses will be brought into the
cache.
• This increases the chances of cache miss, which reduces the exploitation of
spatial locality.
• Thus, the smaller the block size, the inferior the spatial locality.
Case-02: Increasing the Block Size-
• A larger block size will contain a larger number of nearby addresses in it.
• Thus, a larger number of nearby addresses will be brought into the cache.
• This increases the chances of cache hit, which increases the exploitation of
spatial locality.
• Thus, the larger the block size, the better the spatial locality.
Result-02: Effect of Changing Block Size On Cache Tag in Direct Mapped
Cache-
In direct mapped cache, block size does not affect the cache tag anyhow.
Explanation-
Keeping the cache size constant, we have-
Case-01: Decreasing the Block Size-
• Decreasing the block size increases the number of lines in cache.
• With the decrease in block size, the number of bits in block offset
decreases.
• However, with the increase in the number of cache lines, number of bits
in line number increases.
• So, number of bits in line number + number of bits in block offset =
remains constant.
• Thus, there is no effect on the cache tag.
Case-02: Increasing the Block Size-
• Increasing the block size decreases the number of lines in cache.
• With the increase in block size, the number of bits in block offset increases.
• However, with the decrease in the number of cache lines, number of bits in
line number decreases.
• Thus, number of bits in line number + number of bits in block offset =
remains constant.
• Thus, there is no effect on the cache tag.
Result-03: Effect of Changing Block Size On Cache Tag in Fully Associative
Cache-
In fully associative cache, on decreasing block size, cache tag is reduced and vice
versa.
Explanation-
Keeping the cache size constant, we have-
Case-01: Decreasing the Block Size-
• Decreasing the block size decreases the number of bits in block offset.
• With the decrease in number of bits in block offset, number of bits in tag
increases.
Case-02: Increasing the Block Size-
• Increasing the block size increases the number of bits in block offset.
• With the increase in number of bits in block offset, number of bits in tag
decreases.
Result-04: Effect of Changing Block Size On Cache Tag in Set Associative
Cache-
In set associative cache, block size does not affect cache tag anyhow.
Explanation-
Keeping the cache size constant, we have-
Case-01: Decreasing the Block Size-
• Decreasing the block size increases the number of lines in cache.
• With the decrease in block size, number of bits in block offset decreases.
• With the increase in the number of cache lines, number of sets in cache
increases.
• With the increase in number of sets in cache, number of bits in set number
increases.
• So, number of bits in set number + number of bits in block offset = remains
constant.
• Thus, there is no effect on the cache tag.
Case-02: Increasing the Block Size-
• Increasing the block size decreases the number of lines in cache.
• With the increase in block size, number of bits in block offset increases.
• With the decrease in the number of cache lines, number of sets in cache
decreases.
• With the decrease in number of sets in cache, number of bits in set number
decreases.
• So, number of bits in set number + number of bits in block offset = remains
constant.
• Thus, there is no effect on the cache tag.
Result-05: Effect of Changing Block Size On Cache Miss Penalty-
A smaller cache block incurs a lower cache miss penalty.
Explanation-
• When a cache miss occurs, block containing the required word has to be
brought from the main memory.
• If the block size is small, then time taken to bring the block in the cache will
be less.
• Hence, less miss penalty will incur.
• But if the block size is large, then time taken to bring the block in the cache
will be more.
• Hence, more miss penalty will incur.
Result-06: Effect of Cache Tag On Cache Hit Time-
A smaller cache tag ensures a lower cache hit time.
Explanation-
• Cache hit time is the time required to find out whether the required block is
in cache or not.
• It involves comparing the tag of generated address with the tag of cache
lines.
• Smaller is the cache tag, lesser will be the time taken to perform the
comparisons.
• Hence, smaller cache tag ensures lower cache hit time.
• On the other hand, larger is the cache tag, more will be time taken to
perform the comparisons.
• Thus, larger cache tag results in higher cache hit time.
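The tag-width behaviour in Results 02 to 04 can be verified numerically. The sketch below (the 16 KB cache and 32-bit addresses are our own illustrative figures) halves the block size and reports the tag width for direct mapped and fully associative organizations:

```python
import math

def tag_bits(address_bits, cache_bytes, block_bytes, ways):
    """Tag width; ways = 1 is direct mapped, ways = total lines is fully associative."""
    lines = cache_bytes // block_bytes
    sets = lines // ways
    return address_bits - int(math.log2(sets)) - int(math.log2(block_bytes))

for block in (64, 32):                      # halve the block size
    direct = tag_bits(32, 16 * 1024, block, 1)
    fully = tag_bits(32, 16 * 1024, block, (16 * 1024) // block)
    print(block, direct, fully)
```

The direct mapped tag stays at 18 bits for both block sizes, while the fully associative tag grows from 26 to 27 bits as the block size shrinks, exactly as Results 02 and 03 state.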
PRACTICE PROBLEM BASED ON CACHE LINE-
Problem-
In designing a computer’s cache system, the cache block or cache line size is an
important parameter. Which of the following statements is correct in this context?
1. A smaller block size implies better spatial locality
2. A smaller block size implies a smaller cache tag and hence lower cache tag
overhead
3. A smaller block size implies a larger cache tag and hence lower cache hit
time
4. A smaller block size incurs a lower cache miss penalty
Solution-
Option (D) is correct. (Result-05)
Reasons-
Option (A) is incorrect because-
• A smaller block size does not imply better spatial locality.
• Rather, the larger the block size, the better the spatial locality.
Option (B) is incorrect because-
• In direct mapped cache and set associative cache, there is no effect of
changing block size on cache tag.
• In fully associative mapped cache, on decreasing block size, cache tag
becomes larger.
• Thus, smaller block size does not imply smaller cache tag in any cache
organization.
Option (C) is incorrect because-
• “A smaller block size implies a larger cache tag” is true only for fully
associative mapped cache.
• Larger cache tag does not imply lower cache hit time rather cache hit time is
increased.
Cache Mapping | Practice Problems
Problem-01:
The main memory of a computer has 2cm blocks while the cache has 2c blocks. If
the cache uses the set associative mapping scheme with 2 blocks per set, then block
k of the main memory maps to the set-
1. (k mod m) of the cache
2. (k mod c) of the cache
3. (k mod 2c) of the cache
4. (k mod 2cm) of the cache
Solution-
Given-
• Number of blocks in main memory = 2cm
• Number of blocks in cache = 2c
• Number of blocks in one set of cache = 2
Number of Sets in Cache-
Number of sets in cache
= Number of blocks in cache / Number of blocks in one set
= 2c / 2
= c
Required Mapping-
In set associative mapping,
• Block ‘j’ of main memory maps to set number (j modulo number of sets in
cache) of the cache.
• So, block ‘k’ of main memory maps to set number (k mod c) of the cache.
• Thus, option (B) is correct.
Problem-02:
In a k-way set associative cache, the cache is divided into v sets, each of which
consists of k lines. The lines of a set are placed in sequence one after another. The
lines in set s are sequenced before the lines in set (s+1). The main memory blocks
are numbered 0 onwards. The main memory block numbered ‘j’ must be mapped to
any one of the cache lines from-
1. (j mod v) x k to (j mod v) x k + (k – 1)
2. (j mod v) to (j mod v) + (k – 1)
3. (j mod k) to (j mod k) + (v – 1)
4. (j mod k) x v to (j mod k) x v + (v – 1)
Solution-
Let-
• 2-way set associative mapping is used, then k = 2
• Number of sets in cache = 4, then v = 4
• Block number ‘3’ has to be mapped, then j = 3
The given options on substituting these values reduce to-
1. (3 mod 4) x 2 to (3 mod 4) x 2 + (2 – 1) = 6 to 7
2. (3 mod 4) to (3 mod 4) + (2 – 1) = 3 to 4
3. (3 mod 2) to (3 mod 2) + (4 – 1) = 1 to 4
4. (3 mod 2) x 4 to (3 mod 2) x 4 + (4 – 1) = 4 to 7
Now, we have been asked to what range of cache lines, block number 3 can be
mapped.
According to the above data, the cache memory and main memory will look like-
In set associative mapping,
• Block ‘j’ of main memory maps to set number (j modulo number of sets in
cache) of the cache.
• So, block 3 of main memory maps to set number (3 mod 4) = 3 of cache.
• Within set number 3, block 3 can be mapped to any of the cache lines.
• Thus, block 3 can be mapped to cache lines ranging from 6 to 7.
Thus, Option (A) is correct.
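The substitution argument can be mirrored in code (a sketch using the same assumed values k = 2, v = 4, j = 3):

```python
k, v, j = 2, 4, 3        # ways, sets, block number (the substituted values)

s = j % v                # block j maps to set 3
first = s * k            # first line of set s, with lines numbered set by set
last = first + k - 1     # last line of set s
print(first, last)  # → 6 7
```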
Problem-03:
A block-set associative cache memory consists of 128 blocks divided into 4-block
sets. The main memory consists of 16,384 blocks and each block contains 256
eight-bit words.
1. How many bits are required for addressing the main memory?
2. How many bits are needed to represent the TAG, SET and WORD fields?
Solution-
Given-
• Number of blocks in cache memory = 128
• Number of blocks in each set of cache = 4
• Main memory size = 16384 blocks
• Block size = 256 bytes
• 1 word = 8 bits = 1 byte
Main Memory Size-
We have-
Size of main memory
= 16384 blocks
= 16384 x 256 bytes
= 2^22 bytes
Thus, Number of bits required to address main memory = 22 bits
Number of Bits in Block Offset-
We have-
Block size
= 256 bytes
= 2^8 bytes
Thus, Number of bits in block offset or word = 8 bits
Number of Bits in Set Number-
Number of sets in cache
= Number of lines in cache / Set size
= 128 blocks / 4 blocks
= 32 sets
= 2^5 sets
Thus, Number of bits in set number = 5 bits
Number of Bits in Tag Number-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of
bits in word)
= 22 bits – (5 bits + 8 bits)
= 22 bits – 13 bits
= 9 bits
Thus, Number of bits in tag = 9 bits
Thus, the physical address is divided as- TAG = 9 bits, SET = 5 bits, WORD = 8 bits.
Problem-04:
A computer has a 256 KB, 4-way set associative, write back data cache with block
size of 32 bytes. The processor sends 32 bit addresses to the cache controller. Each
cache tag directory entry contains, in addition to the address tag, 2 valid bits, 1
modified bit and 1 replacement bit.
Part-01:
The number of bits in the tag field of an address is-
1. 11
2. 14
3. 16
4. 27
Part-02:
The size of the cache tag directory is-
1. 160 Kbits
2. 136 Kbits
3. 40 Kbits
4. 32 Kbits
Solution-
Given-
• Cache memory size = 256 KB
• Set size = 4 blocks
• Block size = 32 bytes
• Number of bits in physical address = 32 bits
Number of Bits in Block Offset-
We have-
Block size
= 32 bytes
= 2^5 bytes
Thus, Number of bits in block offset = 5 bits
Number of Lines in Cache-
Number of lines in cache
= Cache size / Line size
= 256 KB / 32 bytes
= 2^18 bytes / 2^5 bytes
= 2^13 lines
Thus, Number of lines in cache = 2^13 lines
Number of Sets in Cache-
Number of sets in cache
= Number of lines in cache / Set size
= 2^13 lines / 2^2 lines
= 2^11 sets
Thus, Number of bits in set number = 11 bits
Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of
bits in block offset)
= 32 bits – (11 bits + 5 bits)
= 32 bits – 16 bits
= 16 bits
Thus, Number of bits in tag = 16 bits
Tag Directory Size-
Size of tag directory
= Number of lines in cache x Size of tag
= 2^13 x (16 bits + 2 valid bits + 1 modified bit + 1 replacement bit)
= 2^13 x 20 bits
= 163840 bits
= 20 KB or 160 Kbits
Thus,
• For part-01, Option (C) is correct.
• For part-02, Option (A) is correct.
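Both parts can be verified with a short script (a sketch; byte-addressable memory assumed, variable names are ours):

```python
import math

cache_size = 256 * 1024    # 256 KB cache
block_size = 32            # 32-byte lines
ways = 4                   # 4-way set associative
address_bits = 32
status_bits = 2 + 1 + 1    # 2 valid + 1 modified + 1 replacement bit per entry

offset_bits = int(math.log2(block_size))            # 5
lines = cache_size // block_size                    # 2^13
set_bits = int(math.log2(lines // ways))            # 11
tag_bits = address_bits - set_bits - offset_bits    # 16
directory_kbits = lines * (tag_bits + status_bits) // 1024   # 160 Kbits
print(tag_bits, directory_kbits)  # → 16 160
```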
Problem-05:
A 4-way set associative cache memory unit with a capacity of 16 KB is built using
a block size of 8 words. The word length is 32 bits. The size of the physical address
space is 4 GB. The number of bits for the TAG field is _____.
Solution-
Given-
• Set size = 4 lines
• Cache memory size = 16 KB
• Block size = 8 words
• 1 word = 32 bits = 4 bytes
• Main memory size = 4 GB
Number of Bits in Physical Address-
We have,
Main memory size
= 4 GB
= 2^32 bytes
Thus, Number of bits in physical address = 32 bits
Number of Bits in Block Offset-
We have,
Block size
= 8 words
= 8 x 4 bytes
= 32 bytes
= 2^5 bytes
Thus, Number of bits in block offset = 5 bits
Number of Lines in Cache-
Number of lines in cache
= Cache size / Line size
= 16 KB / 32 bytes
= 2^14 bytes / 2^5 bytes
= 2^9 lines
= 512 lines
Thus, Number of lines in cache = 512 lines
Number of Sets in Cache-
Number of sets in cache
= Number of lines in cache / Set size
= 512 lines / 4 lines
= 2^9 lines / 2^2 lines
= 2^7 sets
Thus, Number of bits in set number = 7 bits
Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of
bits in block offset)
= 32 bits – (7 bits + 5 bits)
= 32 bits – 12 bits
= 20 bits
Thus, number of bits in tag = 20 bits
Problem-06:
If the associativity of a processor cache is doubled while keeping the capacity and
block size unchanged, which one of the following is guaranteed to be NOT
affected?
1. Width of tag comparator
2. Width of set index decoder
3. Width of way selection multiplexer
4. Width of processor to main memory data bus
Solution-
Since the block size is unchanged, the number of bits in block offset remains
unchanged.
Effect On Width Of Tag Comparator-
• Associativity of cache is doubled means number of lines in one set is
doubled.
• Since number of lines in one set is doubled, therefore number of sets reduces
to half.
• Since number of sets reduces to half, so number of bits in set number
decrements by 1.
• Since number of bits in set number decrements by 1, so number of bits in tag
increments by 1.
• Since number of bits in tag increases, therefore width of tag comparator also
increases.
Effect On Width of Set Index Decoder-
• Associativity of cache is doubled means number of lines in one set is
doubled.
• Since number of lines in one set is doubled, therefore number of sets reduces
to half.
• Since number of sets reduces to half, so number of bits in set number
decrements by 1.
• Since number of bits in set number decreases, therefore width of set index
decoder also decreases.
Effect On Width Of Way Selection Multiplexer-
• Associativity of cache (k) is doubled means number of lines in one set is
doubled.
• New associativity of cache is 2k.
• To handle new associativity, size of multiplexers must be 2k x 1.
• Therefore, width of way selection multiplexer increases.
Effect On Width Of Processor To Main Memory Data Bus-
• The processor to main memory data bus has nothing to do with cache
associativity.
• Its width is decided by the processor and memory design, which is unchanged
here.
• So, width of processor to main memory data bus is not affected and remains
unchanged.
Thus, Option (D) is correct.
Problem-07:
Consider a direct mapped cache with 8 cache blocks (0-7). If the memory block
requests are in the order-
3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24
Which of the following memory blocks will not be in the cache at the end of the
sequence?
1. 3
2. 18
3. 20
4. 30
Also, calculate the hit ratio and miss ratio.
Solution-
We have,
• There are 8 blocks in cache memory numbered from 0 to 7.
• In direct mapping, a particular block of main memory is mapped to a
particular line of cache memory.
• The line number is given by-
Cache line number = Block address modulo Number of lines in cache
For the given sequence-
• Requests for memory blocks are generated one by one.
• The line number of the block is calculated using the above relation.
• Then, the block is placed in that particular line.
• If already there exists another block in that line, then it is replaced.
Thus,
• Out of the given options, only block-18 is not present in the cache at the end
of the sequence.
• Option-(B) is correct.
• Hit ratio = 3 / 21
• Miss ratio = 18 / 21
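The placement described above can be replayed with a small simulation (a sketch, not part of the original solution); it confirms the final cache contents and the hit count over the 21 accesses:

```python
# Direct mapped cache with 8 lines; line = block number mod 8
requests = [3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25,
            18, 30, 24, 2, 63, 5, 82, 17, 24]
cache = [None] * 8
hits = 0
for block in requests:
    line = block % 8
    if cache[line] == block:
        hits += 1
    else:
        cache[line] = block      # replace whatever occupied the line
print(hits, len(requests), cache)
```

The simulation ends with blocks 3, 20 and 30 resident and block 18 evicted, counting 3 hits in 21 accesses.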
Problem-08:
Consider a fully associative cache with 8 cache blocks (0-7). The memory block
requests are in the order-
4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7
If LRU replacement policy is used, which cache block will have memory block 7?
Also, calculate the hit ratio and miss ratio.
Solution-
We have,
• There are 8 blocks in cache memory numbered from 0 to 7.
• In fully associative mapping, any block of main memory can be mapped to
any line of the cache that is freely available.
• If all the cache lines are already occupied, then a block is replaced in
accordance with the replacement policy.
Thus,
• Line-5 contains the block-7.
• Hit ratio = 5 / 17
• Miss ratio = 12 / 17
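An LRU simulation (a sketch tracking a last-use timestamp per line, not part of the original solution) reproduces both the hit count and the line that finally holds block 7:

```python
# Fully associative cache with 8 lines, LRU replacement
requests = [4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7]
lines = [None] * 8       # block stored in each line
last_use = [-1] * 8      # last access time per line
hits = 0
for t, block in enumerate(requests):
    if block in lines:                        # hit: refresh recency
        i = lines.index(block)
        hits += 1
    elif None in lines:                       # miss: use a free line
        i = lines.index(None)
        lines[i] = block
    else:                                     # miss: evict least recently used
        i = last_use.index(min(last_use))
        lines[i] = block
    last_use[i] = t
print(hits, lines.index(7))  # → 5 5
```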
Problem-09:
Consider a 4-way set associative mapping with 16 cache blocks. The memory
block requests are in the order-
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
If LRU replacement policy is used, which cache block will not be present in the
cache?
1. 3
2. 8
3. 129
4. 216
Also, calculate the hit ratio and miss ratio.
Solution-
We have,
• There are 16 blocks in cache memory numbered from 0 to 15.
• Each set contains 4 cache lines.
• In set associative mapping, a particular block of main memory is mapped to
a particular set of cache memory.
• The set number is given by-
Cache set number = Block address modulo Number of sets in cache
For the given sequence-
• Requests for memory blocks are generated one by one.
• The set number of the block is calculated using the above relation.
• Within that set, the block is placed in any freely available cache line.
• If all the lines of that set are already occupied, then one of the blocks is
replaced in accordance with the employed replacement policy.
Thus,
• Out of the given options, only block-216 is not present in the cache at the
end of the sequence.
• Option-(D) is correct.
• Hit ratio = 1 / 17
• Miss ratio = 16 / 17
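A per-set LRU simulation (a sketch; each set's list is kept ordered from least to most recently used) confirms the answer:

```python
# 4-way set associative cache: 16 lines → 4 sets, LRU within each set
requests = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]
cache_sets = {i: [] for i in range(4)}   # each list: oldest use → newest use
hits = 0
for block in requests:
    s = cache_sets[block % 4]
    if block in s:
        hits += 1
        s.remove(block)      # refresh recency
    elif len(s) == 4:
        s.pop(0)             # evict the set's least recently used block
    s.append(block)
print(hits, 216 in cache_sets[0])  # → 1 False
```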
Problem-10:
Consider a small 2-way set associative mapping with a total of 4 blocks. LRU
replacement policy is used for choosing the block to be replaced. The number of
cache misses for the following sequence of block addresses 8, 12, 0, 12, 8 is ____.
Solution-
• Practice yourself.
• Total number of cache misses = 4
Problem-11:
Consider the cache has 4 blocks. For the memory references-
5, 12, 13, 17, 4, 12, 13, 17, 2, 13, 19, 13, 43, 61, 19
What is the hit ratio for the following cache replacement algorithms-
1. FIFO
2. LRU
3. Direct mapping
4. 2-way set associative mapping using LRU
Solution-
• Practice yourself.
• Using FIFO as the cache replacement algorithm, hit ratio = 5/15 = 1/3.
• Using LRU as the cache replacement algorithm, hit ratio = 6/15 = 2/5.
• Using direct mapping, hit ratio = 1/15.
• Using 2-way set associative mapping with LRU, hit ratio = 5/15 = 1/3.
Problem-12:
A hierarchical memory system has the following specification, 20 MB main
storage with access time of 300 ns, 256 bytes cache with access time of 50 ns,
word size 4 bytes, page size 8 words. What will be the hit ratio if the page address
trace of a program has the pattern 0, 1, 2, 3, 0, 1, 2, 4 following LRU page
replacement technique?
Solution-
Given-
• Main memory size = 20 MB
• Main memory access time = 300 ns
• Cache memory size = 256 bytes
• Cache memory access time = 50 ns
• Word size = 4 bytes
• Page size = 8 words
Line Size-
We have,
Line size
= 8 words
= 8 x 4 bytes
= 32 bytes
Number of Lines in Cache-
Number of lines in cache
= Cache size / Line size
= 256 bytes / 32 bytes
= 8 lines
Thus,
• Number of hits = 3
• Hit ratio = 3 / 8
Problem-13:
Consider an array A[100] and each element occupies 4 words. A 32 word cache is
used and divided into 8 word blocks. What is the hit ratio for the following code-
for (i=0 ; i < 100 ; i++)
A[i] = A[i] + 10;
Solution-
Number of lines in cache
= Cache size / Line size
= 32 words / 8 words
= 4 lines
Since each element of the array occupies 4 words, so-
Number of elements that can be placed in one line = 2
Now, let us analyze the statement-
A[i] = A[i] + 10;
For each i,
• Firstly, the value of A[i] is read.
• Secondly, 10 is added to the A[i].
• Thirdly, the updated value of A[i] is written back.
Thus,
• For each i, A[i] is accessed two times.
• Two memory accesses are required- one for the read operation and the other
for the write operation.
Assume the cache is all empty initially.
In the first loop iteration,
• First, value of A[0] has to be read.
• Since cache is empty, so A[0] is brought in the cache.
• Along with A[0], element A[1] also enters the cache since each block can
hold 2 elements of the array.
• Thus, For A[0], a miss occurred for the read operation.
• Now, 10 is added to the value of A[0].
• Now, the updated value has to be written back to A[0].
• Again cache memory is accessed to get A[0] for writing its updated value.
• This time, for A[0], a hit occurs for the write operation.
In the second loop iteration,
• First, value of A[1] has to be read.
• A[1] is already present in the cache.
• Thus, for A[1], a hit occurs for the read operation.
• Now, 10 is added to the value of A[1].
• Now, the updated value has to be written back to A[1].
• Again cache memory is accessed to get A[1] for writing its updated value.
• Again, for A[1], a hit occurs for the write operation.
In the similar manner, for every next two consecutive elements-
• There will be a miss for the read operation for the first element.
• There will be a hit for the write operation for the first element.
• There will be a hit for both read and write operations for the second element.
Likewise, for 100 elements, we will have 50 such pairs in cache and in every pair,
there will be one miss and three hits.
Thus,
• Total number of hits = 50 x 3 = 150
• Total number of misses = 50 x 1 = 50
• Total number of references = 200 (100 for read and 100 for write)
Thus,
• Hit ratio = 150 / 200 = 3 / 4 = 0.75
• Miss ratio = 50 / 200 = 1 / 4 = 0.25
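The pairing argument can be checked by simulation. The sketch below uses direct mapping for concreteness; since the traversal is sequential, the mapping choice does not affect the counts:

```python
# 100 elements x 4 words, 8-word blocks → 2 elements per block, 4-line cache
cache = [None] * 4
hits = misses = 0
for i in range(100):
    block = i // 2                    # two array elements share one block
    for _ in ("read", "write"):       # A[i] = A[i] + 10 touches A[i] twice
        if cache[block % 4] == block:
            hits += 1
        else:
            misses += 1
            cache[block % 4] = block
print(hits, misses)  # → 150 50
```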
Problem-14:
Consider an array that has 100 elements and each element occupies 4 words. A
32-word cache is used and divided into blocks of 8 words.
What is the hit ratio for the following statement-
for (i=0 ; i < 10 ; i++)
for (j=0 ; j < 10 ; j++)
A[i][j] = A[i][j] + 10;
if-
1. Row major order is used
2. Column major order is used
Solution-
According to the question, the cache is divided into 4 lines of 8 words each.
In each cache line, two elements of the array can reside.
Case-01: When Row Major Order is Used-
In row major order, elements of the array are stored row wise in the memory.
• In the first iteration, when A[0][0] is brought into the cache, A[0][1] is
also brought.
• Then, things go exactly as in the previous problem.
• There will be a miss for the A[0][0] read operation and a hit for the A[0][0]
write operation.
• For A[0][1], there will be a hit for both read and write operations.
• Similar will be the story with every next two consecutive elements.
Thus,
• Total number of hits = 50 x 3 = 150
• Total number of misses = 50 x 1 = 50
• Total number of references = 200 (100 for read and 100 for write)
Thus,
• Hit ratio = 150 / 200 = 3 / 4 = 0.75
• Miss ratio = 50 / 200 = 1 / 4 = 0.25
Case-02: When Column Major Order is Used-
In column major order, elements of the array are stored column wise in the
memory.
• In the first iteration, when A[0][0] is brought into the cache, A[1][0] is
also brought.
• This time, A[1][0] will not be useful in iteration-2.
• A[1][0] will be needed only after the next 10 elements have been accessed.
• But by that time, this block would have been replaced since there are only 4
lines in the cache.
• For A[1][0] to be useful, there have to be at least 10 lines in the cache.
Thus,
Under given scenario, for each element of the array-
• There will be a miss for the read operation.
• There will be a hit for the write operation.
Thus,
• Total number of hits = 100 x 1 = 100
• Total number of misses = 100 x 1 = 100
• Total number of references = 200 (100 for read and 100 for write)
Thus,
• Hit ratio = 100 / 200 = 1 / 2 = 0.50
• Miss ratio = 100 / 200 = 1 / 2 = 0.50
To increase the hit rate, the following measures can be taken-
• Row major order can be used (Hit rate = 75%)
• The statement A[i][j] = A[i][j] + 10 can be replaced with A[j][i] = A[j][i] + 10
(Hit rate = 75%)
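As a cross-check, both cases can be simulated with a tiny cache model. This is only a sketch: it assumes a 4-line fully associative cache with LRU replacement, two array elements per line, and one read plus one write per element, as described in the problem.

```python
from collections import OrderedDict

LINES = 4                       # cache lines (from the problem)
N = 10                          # array is A[10][10]

def hit_ratio(order):
    cache = OrderedDict()       # LRU: least recently used block evicted first
    hits = total = 0
    for i in range(N):
        for j in range(N):
            # element index in memory; one block holds 2 consecutive elements
            addr = i * N + j if order == "row" else j * N + i
            block = addr // 2
            for _ in range(2):  # read A[i][j], then write it back
                total += 1
                if block in cache:
                    hits += 1
                    cache.move_to_end(block)
                else:
                    cache[block] = None
                    if len(cache) > LINES:
                        cache.popitem(last=False)
    return hits / total

print(hit_ratio("row"), hit_ratio("column"))   # 0.75 0.5
```

The simulator reproduces the hit ratios derived above: 0.75 for row major order and 0.50 for column major order.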
Problem-15:
An access sequence of cache block addresses is of length N and contains n unique
block addresses. The number of unique block addresses between two consecutive
accesses to the same block address is bounded above by k. What is the miss ratio if
the access sequence is passed through a cache of associativity A>=k exercising
LRU replacement policy?
1. n/N
2. 1/N
3. 1/A
4. k/n
Solution-
Since the associativity A >= k and at most k unique blocks occur between two
consecutive accesses to the same block, LRU never evicts a block before it is
reused. Hence only the first access to each of the n unique blocks is a miss.
Required miss ratio = n/N.
Thus, Option (A) is correct.
Magnetic Disk in Computer Architecture
In computer architecture,
• Magnetic disk is a storage device that is used to write, rewrite and
access data.
• It uses a magnetization process.
Architecture-
• The entire disk is divided into platters.
• Each platter consists of concentric circles called as tracks.
• These tracks are further divided into sectors which are the smallest
divisions in the disk.
• A cylinder is formed by combining the tracks at a given radius of a disk
pack.
• There exists a mechanical arm called as Read / Write head.
• It is used to read from and write to the disk.
• The head has to reach a particular track and then wait for the rotation of the
platter.
• The rotation causes the required sector of the track to come under the head.
• Each platter has 2 surfaces- top and bottom and both the surfaces are used to
store the data.
• Each surface has its own read / write head.
Disk Performance Parameters-
The time taken by the disk to complete an I/O request is called as disk service
time or disk access time.
Components that contribute to the service time are-
1. Seek Time-
• The time taken by the read / write head to reach the desired track is called
as seek time.
• It is the component which contributes the largest percentage of the disk
service time.
• The lower the seek time, the faster the I/O operation.
Specifications
Seek time specifications include-
1. Full stroke
2. Average
3. Track to Track
1. Full Stroke-
• It is the time taken by the read / write head to move across the entire width of
the disk, from the innermost track to the outermost track.
2. Average-
• It is the average time taken by the read / write head to move from one random
track to another.
Average seek time = 1 / 3 x Full stroke
3. Track to Track-
• It is the time taken by the read-write head to move between the adjacent tracks.
2. Rotational Latency-
• The time taken by the desired sector to come under the read / write head is
called as rotational latency.
• It depends on the rotation speed of the spindle.
Average rotational latency = 1 / 2 x Time taken for full rotation
3. Data Transfer Rate-
• The amount of data that passes under the read / write head in a given amount
of time is called as data transfer rate.
• The time taken to transfer the data is called as transfer time.
It depends on the following factors-
1. Number of bytes to be transferred
2. Rotation speed of the disk
3. Density of the track
4. Speed of the electronics that connects the disk to the computer
4. Controller Overhead-
• The overhead imposed by the disk controller is called as controller
overhead.
• Disk controller is a device that manages the disk.
5. Queuing Delay-
• The time spent waiting for the disk to become free is called as queuing
delay.
NOTE-
All the tracks of a disk have the same storage capacity.
Storage Density-
• All the tracks of a disk have the same storage capacity.
• This is possible because each track has a different storage density.
• Storage density decreases as we move from one track to the next, away from
the center.
Thus,
• The innermost track has the maximum storage density.
• The outermost track has the minimum storage density.
Important Formulas-
1. Disk Access Time-
Disk access time is calculated as-
Disk access time
= Seek time + Rotational delay + Transfer time + Controller overhead + Queuing
delay
2. Average Disk Access Time-
Average disk access time is calculated as-
Average disk access time
= Average seek time + Average rotational delay + Transfer time + Controller
overhead + Queuing delay
3. Average Seek Time-
Average seek time is calculated as-
Average seek time
= 1 / 3 x Time taken for one full stroke
Alternatively,
If time taken by the head to move from one track to adjacent track = t units and
there are total k tracks, then-
Average seek time
= { Time taken to move from track 1 to track 1 + Time taken to move from track 1
to last track } / 2
= { 0 + (k-1)t } / 2
= (k-1)t / 2
4. Average Rotational Latency-
Average rotational latency is calculated as-
Average rotational latency
= 1 / 2 x Time taken for one full rotation
Average rotational latency may also be referred to as-
• Average rotational delay
• Average latency
• Average delay
5. Capacity Of Disk Pack-
Capacity of a disk pack is calculated as-
Capacity of a disk pack
= Total number of surfaces x Number of tracks per surface x Number of sectors per
track x Storage capacity of one sector
6. Formatting Overhead-
Formatting overhead is calculated as-
Formatting overhead
= Number of sectors x Overhead per sector
7. Formatted Disk Space-
Formatted disk space, also called usable disk space, is the disk space excluding
formatting overhead.
It is calculated as-
Formatted disk space
= Total disk space or capacity – Formatting overhead
8. Recording Density Or Storage Density-
Recording density or Storage density is calculated as-
Storage density of a track
= Capacity of the track / Circumference of the track
From here, we can infer-
Storage density of a track ∝ 1 / Circumference of the track
9. Track Capacity-
Capacity of a track is calculated as-
Capacity of a track
= Recording density of the track x Circumference of the track
10. Data Transfer Rate-
Data transfer rate is calculated as-
Data transfer rate
= Number of heads x Bytes that can be read in one full rotation x Number of
rotations in one second
OR
Data transfer rate
= Number of heads x Capacity of one track x Number of rotations in one second
11. Tracks Per Surface-
Total number of tracks per surface is calculated as-
Total number of tracks per surface
= (Outer radius – Inner radius) / Inter track gap
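The core formulas above can be collected into a short Python sketch for checking numerical problems (the function names are mine, not standard):

```python
def disk_capacity(surfaces, tracks_per_surface, sectors_per_track, bytes_per_sector):
    # Formula 5: capacity of a disk pack
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector

def avg_rotational_latency_ms(rpm):
    # Formula 4: half the time of one full rotation
    return (60_000 / rpm) / 2

def data_transfer_rate(heads, track_capacity_bytes, rpm):
    # Formula 10: bytes passing under the heads per second
    return heads * track_capacity_bytes * (rpm / 60)

def tracks_on_surface(outer_radius, inner_radius, inter_track_gap):
    # Formula 11 (all three lengths must be in the same unit)
    return (outer_radius - inner_radius) / inter_track_gap

# Sample check with 16 surfaces, 128 tracks, 256 sectors, 512-byte sectors:
print(disk_capacity(16, 128, 256, 512))   # 268435456 bytes = 2^28 = 256 MB
print(avg_rotational_latency_ms(3000))    # 10.0 (msec)
```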
Points to Remember-
• The entire disk space is not usable for storage because some space is wasted
in formatting.
• When rotational latency is not given, use average rotational latency for
solving numerical problems.
• When seek time is not given, use average seek time for solving numerical
problems.
• It is wrong to say that the capacity increases as we move from one track to
another, away from the center.
• All the tracks have the same storage capacity.
Magnetic Disk | Practice Problems | COA
Problem-01:
Consider a disk pack with the following specifications- 16 surfaces, 128 tracks
per surface, 256 sectors per track and 512 bytes per sector.
Answer the following questions-
1. What is the capacity of disk pack?
2. What is the number of bits required to address the sector?
3. If the format overhead is 32 bytes per sector, what is the formatted disk
space?
4. If the format overhead is 64 bytes per sector, how much amount of
memory is lost due to formatting?
5. If the diameter of innermost track is 21 cm, what is the maximum
recording density?
6. If the diameter of innermost track is 21 cm with 2 KB/cm, what is the
capacity of one track?
7. If the disk is rotating at 3600 RPM, what is the data transfer rate?
8. If the disk system has rotational speed of 3000 RPM, what is the average
access time with a seek time of 11.5 msec?
Solution-
Given-
• Number of surfaces = 16
• Number of tracks per surface = 128
• Number of sectors per track = 256
• Number of bytes per sector = 512 bytes
Part-01: Capacity of Disk Pack-
Capacity of disk pack
= Total number of surfaces x Number of tracks per surface x Number of sectors per
track x Number of bytes per sector
= 16 x 128 x 256 x 512 bytes
= 2^28 bytes
= 256 MB
Part-02: Number of Bits Required To Address Sector-
Total number of sectors
= Total number of surfaces x Number of tracks per surface x Number of sectors per
track
= 16 x 128 x 256 sectors
= 2^19 sectors
Thus, Number of bits required to address the sector = 19 bits
Part-03: Formatted Disk Space-
Formatting overhead
= Total number of sectors x overhead per sector
= 2^19 x 32 bytes
= 2^19 x 2^5 bytes
= 2^24 bytes
= 16 MB
Now, Formatted disk space
= Total disk space – Formatting overhead
= 256 MB – 16 MB
= 240 MB
Part-04: Formatting Overhead-
Amount of memory lost due to formatting
= Formatting overhead
= Total number of sectors x Overhead per sector
= 2^19 x 64 bytes
= 2^19 x 2^6 bytes
= 2^25 bytes
= 32 MB
Part-05: Maximum Recording Density-
Storage capacity of a track
= Number of sectors per track x Number of bytes per sector
= 256 x 512 bytes
= 2^8 x 2^9 bytes
= 2^17 bytes
= 128 KB
Circumference of innermost track
= 2 x π x radius
= π x diameter
= 3.14 x 21 cm
= 65.94 cm
Now, Maximum recording density
= Recording density of innermost track
= Capacity of a track / Circumference of innermost track
= 128 KB / 65.94 cm
= 1.94 KB/cm
Part-06: Capacity Of Track-
Circumference of innermost track
= 2 x π x radius
= π x diameter
= 3.14 x 21 cm
= 65.94 cm
Capacity of a track
= Storage density of the innermost track x Circumference of the innermost track
= 2 KB/cm x 65.94 cm
= 131.88 KB
≈ 132 KB
Part-07: Data Transfer Rate-
Number of rotations in one second
= (3600 / 60) rotations/sec
= 60 rotations/sec
Now, Data transfer rate
= Number of heads x Capacity of one track x Number of rotations in one second
= 16 x (256 x 512 bytes) x 60
= 2^4 x 2^8 x 2^9 x 60 bytes/sec
= 60 x 2^21 bytes/sec
= 120 MBps
Part-08: Average Access Time-
Time taken for one full rotation
= (60 / 3000) sec
= (1 / 50) sec
= 0.02 sec
= 20 msec
Average rotational delay
= 1/2 x Time taken for one full rotation
= 1/2 x 20 msec
= 10 msec
Now, average access time
= Average seek time + Average rotational delay + Other factors
= 11.5 msec + 10 msec + 0
= 21.5 msec
Problem-02:
What is the average access time for transferring 512 bytes of data with the
following specifications-
• Average seek time = 5 msec
• Disk rotation = 6000 RPM
• Data rate = 40 KB/sec
• Controller overhead = 0.1 msec
Solution-
Given-
• Average seek time = 5 msec
• Disk rotation = 6000 RPM
• Data rate = 40 KB/sec
• Controller overhead = 0.1 msec
Time Taken For One Full Rotation-
Time taken for one full rotation
= (60 / 6000) sec
= (1 / 100) sec
= 0.01 sec
= 10 msec
Average Rotational Delay-
Average rotational delay
= 1/2 x Time taken for one full rotation
= 1/2 x 10 msec
= 5 msec
Transfer Time-
Transfer time
= 512 bytes / (40 KB/sec)
= 0.0125 sec
= 12.5 msec
Average Access Time-
Average access time
= Average seek time + Average rotational delay + Transfer time + Controller
overhead + Queuing delay
= 5 msec + 5 msec + 12.5 msec + 0.1 msec + 0
= 22.6 msec
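The same arithmetic can be scripted as a sanity check (a sketch; 40 KB is taken as 40 × 1024 bytes, which is what the solution assumes):

```python
def avg_access_time_ms(seek_ms, rpm, nbytes, rate_bytes_per_sec, ctrl_ms=0.0):
    rotational_ms = (60_000 / rpm) / 2          # average rotational delay
    transfer_ms = nbytes / rate_bytes_per_sec * 1000
    return seek_ms + rotational_ms + transfer_ms + ctrl_ms

# Problem-02: 5 msec seek, 6000 RPM, 512 bytes at 40 KB/sec, 0.1 msec overhead
print(avg_access_time_ms(5, 6000, 512, 40 * 1024, 0.1))   # 22.6
```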
Problem-03:
A certain moving arm disk storage with one head has the following specifications-
• Number of tracks per surface = 200
• Disk rotation speed = 2400 RPM
• Track storage capacity = 62500 bits
• Average latency = P msec
• Data transfer rate = Q bits/sec
What is the value of P and Q?
Solution-
Given-
• Number of tracks per surface = 200
• Disk rotation speed = 2400 RPM
• Track storage capacity = 62500 bits
Time Taken For One Full Rotation-
Time taken for one full rotation
= (60 / 2400) sec
= (1 / 40) sec
= 0.025 sec
= 25 msec
Average Latency-
Average latency or Average rotational latency
= 1/2 x Time taken for one full rotation
= 1/2 x 25 msec
= 12.5 msec
Data Transfer Rate-
Data transfer rate
= Number of heads x Capacity of one track x Number of rotations in one second
= 1 x 62500 bits x (2400 / 60)
= 2500000 bits/sec
= 2.5 x 10^6 bits/sec
Thus, P = 12.5 and Q = 2.5 x 10^6
Problem-04:
A disk pack has 19 surfaces and storage area on each surface has an outer diameter
of 33 cm and inner diameter of 22 cm. The maximum recording storage density on
any track is 200 bits/cm and minimum spacing between tracks is 0.25 mm.
Calculate the capacity of disk pack.
Solution-
Given-
• Number of surfaces = 19
• Outer diameter = 33 cm
• Inner diameter = 22 cm
• Maximum recording density = 200 bits/cm
• Inter track gap = 0.25 mm
Number Of Tracks On Each Surface-
Number of tracks on each surface
= (Outer radius – Inner radius) / Inter track gap
= (16.5 cm – 11 cm) / 0.25 mm
= 5.5 cm / 0.25 mm
= 55 mm / 0.25 mm
= 220 tracks
Capacity Of Each Track-
Capacity of each track
= Maximum recording density x Circumference of innermost track
= 200 bits/cm x (3.14 x 22 cm)
= 200 x 69.08 bits
= 13816 bits
= 1727 bytes
Capacity Of Disk Pack-
Capacity of disk pack
= Total number of surfaces x Number of tracks per surface x Capacity of one track
= 19 x 220 x 1727 bytes
= 7218860 bytes
= 6.88 MB
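The steps above can be replayed numerically (a sketch that uses π ≈ 3.14 and truncates to whole tracks and whole bytes, exactly as the solution does):

```python
def pack_capacity_bytes(surfaces, outer_diam_cm, inner_diam_cm,
                        density_bits_per_cm, track_gap_cm):
    # number of tracks fitting between the inner and outer radii
    tracks = int(((outer_diam_cm - inner_diam_cm) / 2) / track_gap_cm)
    # every track stores as much as the (densest) innermost track
    track_bits = density_bits_per_cm * 3.14 * inner_diam_cm
    track_bytes = int(track_bits) // 8
    return surfaces * tracks * track_bytes

# Problem-04: 19 surfaces, diameters 33/22 cm, 200 bits/cm, 0.25 mm gap
print(pack_capacity_bytes(19, 33, 22, 200, 0.025))   # 7218860
```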
Problem-05:
Consider a typical disk that rotates at 15000 RPM and has a transfer rate of 50 x
10^6 bytes/sec. If the average seek time of the disk is twice the average rotational
delay and the controller’s transfer time is 10 times the disk transfer time. What is
the average time (in milliseconds) to read or write a 512 byte sector of the disk?
Solution-
Given-
• Rotation speed of the disk = 15000 RPM
• Transfer rate = 50 x 10^6 bytes/sec
• Average seek time = 2 x Average rotational delay
• Controller’s transfer time = 10 x Disk transfer time
Time Taken For One Full Rotation-
Time taken for one full rotation
= (60 / 15000) sec
= 0.004 sec
= 4 msec
Average Rotational Delay-
Average rotational delay
= 1/2 x Time taken for one full rotation
= 1/2 x 4 msec
= 2 msec
Average Seek Time-
Average seek time
= 2 x Average rotational delay
= 2 x 2 msec
= 4 msec
Disk Transfer Time-
Disk transfer time
= Time taken to read or write 512 bytes
= 512 bytes / (50 x 10^6 bytes/sec)
= 10.24 x 10^-6 sec
= 0.01024 msec
Controller’s Transfer Time-
Controller’s transfer time
= 10 x Disk transfer time
= 10 x 0.01024 msec
= 0.1024 msec
Average Time To Read Or Write 512 Bytes-
Average time to read or write 512 bytes
= Average seek time + Average rotational delay + Disk transfer time + Controller’s
transfer time + Queuing delay
= 4 msec + 2 msec + 0.01024 msec + 0.1024 msec + 0
= 6.11 msec
Problem-06:
A hard disk system has the following parameters-
• Number of tracks = 500
• Number of sectors per track = 100
• Number of bytes per sector = 500
• Time taken by the head to move from one track to another adjacent track = 1
msec
• Rotation speed = 600 RPM
What is the average time taken for transferring 250 bytes from the disk?
Solution-
Given-
• Number of tracks = 500
• Number of sectors per track = 100
• Number of bytes per sector = 500
• Time taken by the head to move from one track to another adjacent track = 1
msec
• Rotation speed = 600 RPM
Average Seek Time-
Average seek time
= (Time taken by the head to move from track-1 to track-1 + Time taken by the
head to move from track-1 to track-500) / 2
= (0 + 499 x 1 msec) / 2
= 249.5 msec
Time Taken For One Full Rotation-
Time taken for one full rotation
= (60 / 600) sec
= 0.1 sec
= 100 msec
Average Rotational Delay-
Average rotational delay
= 1/2 x Time taken for one full rotation
= 1/2 x 100 msec
= 50 msec
Capacity Of One Track-
Capacity of one track
= Number of sectors per track x Number of bytes per sector
= 100 x 500 bytes
= 50000 bytes
Data Transfer Rate-
Data transfer rate
= Number of heads x Capacity of one track x Number of rotations in one second
= 1 x 50000 bytes x (600 / 60)
= 50000 x 10 bytes/sec
= 5 x 10^5 bytes/sec
Transfer Time-
Transfer time
= 250 bytes / (5 x 10^5 bytes/sec)
= 50 x 10^-5 sec
= 0.5 msec
Average Time Taken To Transfer 250 Bytes-
Average time taken to transfer 250 bytes
= Average seek time + Average rotational delay + Transfer time + Controller
overhead + Queuing delay
= 249.5 msec + 50 msec + 0.5 msec + 0 + 0
= 300 msec
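The whole computation condenses into a short sketch (helper names are mine; the seek time uses the (k-1)t / 2 formula given earlier):

```python
def avg_seek_time_ms(total_tracks, track_to_track_ms):
    # average of 0 moves (same track) and (k - 1) adjacent-track moves
    return (total_tracks - 1) * track_to_track_ms / 2

def transfer_time_ms(nbytes, track_capacity_bytes, rpm):
    rate = track_capacity_bytes * rpm / 60        # one head: bytes per second
    return nbytes / rate * 1000

seek = avg_seek_time_ms(500, 1)                   # 249.5 msec
rotation = (60_000 / 600) / 2                     # 50.0 msec
transfer = transfer_time_ms(250, 100 * 500, 600)  # 0.5 msec
print(seek + rotation + transfer)                 # 300.0
```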
Problem-07:
A hard disk has 63 sectors per track, 10 platters each with 2 recording surfaces and
1000 cylinders.
The address of a sector is given as a triple (c, h, s) where c is the cylinder number,
h is the surface number and s is the sector number. Thus, the 0th sector is addressed
as (0,0,0), the 1st sector as (0,0,1) and so on.
Part-01:
The address <400, 16, 29> corresponds to sector number-
1. 505035
2. 505036
3. 505037
4. 505038
Part-02:
The address of the 1039th sector is-
1. <0, 15, 31>
2. <0, 16, 30>
3. <0, 16, 31>
4. <0, 17, 31>
Solution-
Know this Concept?
In general, when counting of items is started from 0, then-
• For any item-n, number ‘n’ specifies the number of items that must be
crossed in order to reach that item.
Example-
If counting is started from 0, then-
• To reach cylinder-5, the number of cylinders that must be crossed = 5
cylinders
• To reach surface-5, the number of surfaces that must be crossed = 5
surfaces
• To reach sector-5, the number of sectors that must be crossed = 5 sectors
To solve this question, we assume there is only one track on each surface.
Part-01:
We have to calculate the sector number for the address <400, 16, 29>
Step-01:
To reach our desired cylinder, we have to cross 400 cylinders.
Total number of sectors that are crossed in 400 cylinders
= Number of cylinders x Number of surfaces per cylinder x Number of tracks per
surface x Number of sectors per track
= 400 x (10 x 2) x 1 x 63
= 504000
Now, after crossing 400 cylinders (cylinder-0 to cylinder-399), we are at cylinder-
400.
Step-02:
To reach our desired surface, we have to cross 16 surfaces.
Total number of sectors that are crossed in 16 surfaces
= Number of surfaces x Number of tracks per surface x Number of sectors per
track
= 16 x 1 x 63
= 1008
Now, after crossing 16 surfaces (surface-0 to surface-15) in cylinder-400, we are at
surface-16.
Step-03:
To reach our desired sector, we have to cross 29 sectors.
Now, after crossing 29 sectors on surface-16 of cylinder-400, we are at sector-29.
Thus
Total number of sectors that are crossed
= 504000 + 1008 + 29
= 505037
Thus,
• After crossing 505037 sectors, we are at sector-505037.
• So, required address of the sector is 505037.
• Option (C) is correct.
Part-02:
We have to find the address of sector-1039.
Let us check all the options one by one.
Option-A:
For the address <0, 15, 31>, the sector number is-
Sector number = 0 + (15 x 1 x 63) + 31 = 976
Option-B:
For the address <0, 16, 30>, the sector number is-
Sector number = 0 + (16 x 1 x 63) + 30 = 1038
Option-C:
For the address <0, 16, 31>, the sector number is-
Sector number = 0 + (16 x 1 x 63) + 31 = 1039
Option-D:
For the address <0, 17, 31>, the sector number is-
Sector number = 0 + (17 x 1 x 63) + 31 = 1102
Thus, Option (C) is correct.
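The counting argument above is just a linearization of the (c, h, s) triple. A sketch with this problem's geometry (one track per surface assumed, as stated in the solution):

```python
PLATTERS = 10
SURFACES = PLATTERS * 2       # 2 recording surfaces per platter
SECTORS_PER_TRACK = 63

def chs_to_sector(c, h, s):
    # sectors crossed in c full cylinders, then h full surfaces, then s sectors
    return (c * SURFACES + h) * SECTORS_PER_TRACK + s

print(chs_to_sector(400, 16, 29))   # 505037  (Part-01)
print(chs_to_sector(0, 16, 31))     # 1039    (Part-02)
```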
Addressing Modes | Types of Addressing Modes
The different ways of specifying the location of an operand in an instruction are
called as addressing modes.
Types of Addressing Modes-
In computer architecture, there are following types of addressing modes-
1. Implied Addressing Mode-
In this addressing mode,
• The definition of the instruction itself specifies the operands implicitly.
• It is also called as implicit addressing mode.
Examples-
• The instruction “Complement Accumulator” is an implied mode instruction.
• In a stack organized computer, Zero Address Instructions are implied mode
instructions.
(since operands are always implied to be present on the top of the stack)
2. Stack Addressing Mode-
In this addressing mode,
• The operand is contained at the top of the stack.
Example-
ADD
• This instruction simply pops the two operands present at the top of the
stack.
• The addition of those two operands is performed.
• The result so obtained is pushed back onto the top of the stack.
3. Immediate Addressing Mode-
In this addressing mode,
• The operand is specified in the instruction explicitly.
• Instead of address field, an operand field is present that contains the
operand.
Examples-
• ADD 10 will increment the value stored in the accumulator by 10.
• MOV R, #20 initializes register R to the constant value 20.
4. Direct Addressing Mode-
In this addressing mode,
• The address field of the instruction contains the effective address of the
operand.
• Only one reference to memory is required to fetch the operand.
• It is also called as absolute addressing mode.
Example-
• ADD X will increment the value stored in the accumulator by the value
stored at memory location X.
AC ← AC + [X]
5. Indirect Addressing Mode-
In this addressing mode,
• The address field of the instruction specifies the address of memory location
that contains the effective address of the operand.
• Two references to memory are required to fetch the operand.
Example-
• ADD X will increment the value stored in the accumulator by the value
stored at memory location specified by X.
AC ← AC + [[X]]
6. Register Direct Addressing Mode-
In this addressing mode,
• The operand is contained in a register set.
• The address field of the instruction refers to a CPU register that contains the
operand.
• No reference to memory is required to fetch the operand.
Example-
• ADD R will increment the value stored in the accumulator by the content of
register R.
AC ← AC + [R]
NOTE-
It is interesting to note-
• This addressing mode is similar to direct addressing mode.
• The only difference is that the address field of the instruction refers to a
CPU register instead of main memory.
7. Register Indirect Addressing Mode-
In this addressing mode,
• The address field of the instruction refers to a CPU register that contains the
effective address of the operand.
• Only one reference to memory is required to fetch the operand.
Example-
• ADD R will increment the value stored in the accumulator by the content of
memory location specified in register R.
AC ← AC + [[R]]
NOTE-
It is interesting to note-
• This addressing mode is similar to indirect addressing mode.
• The only difference is that the address field of the instruction refers to a
CPU register.
8. Relative Addressing Mode-
In this addressing mode,
• Effective address of the operand is obtained by adding the content of
program counter with the address part of the instruction.
Effective Address
= Content of Program Counter + Address part of the instruction
NOTE-
• Program counter (PC) always contains the address of the next instruction
to be executed.
• After fetching an instruction, the value of the program counter
immediately increases.
• The value increases irrespective of whether the fetched instruction has
completely executed or not.
9. Indexed Addressing Mode-
In this addressing mode,
• Effective address of the operand is obtained by adding the content of index
register with the address part of the instruction.
Effective Address
= Content of Index Register + Address part of the instruction
10. Base Register Addressing Mode-
In this addressing mode,
• Effective address of the operand is obtained by adding the content of
base register with the address part of the instruction.
Effective Address
= Content of Base Register + Address part of the instruction
11. Auto-Increment Addressing Mode-
• This addressing mode is a special case of Register Indirect Addressing Mode
where-
Effective Address of the Operand
= Content of Register
In this addressing mode,
• After accessing the operand, the content of the register is automatically
incremented by step size ‘d’.
• Step size ‘d’ depends on the size of operand accessed.
• Only one reference to memory is required to fetch the operand.
Example-
Assume operand size = 2 bytes.
Here,
• After fetching the operand 6B, the register RAUTO will be automatically
incremented by 2.
• The updated value of RAUTO will then be 3300 + 2 = 3302.
• At memory address 3302, the next operand will be found.
NOTE-
In auto-increment addressing mode,
• First, the operand value is fetched.
• Then, the register RAUTO is incremented by step size ‘d’.
12. Auto-Decrement Addressing Mode-
• This addressing mode is again a special case of Register Indirect Addressing
Mode where-
Effective Address of the Operand
= Content of Register – Step Size
In this addressing mode,
• First, the content of the register is decremented by step size ‘d’.
• Step size ‘d’ depends on the size of operand accessed.
• After decrementing, the operand is read.
• Only one reference to memory is required to fetch the operand.
Example-
Assume operand size = 2 bytes.
Here,
• First, the register RAUTO will be decremented by 2.
• The updated value of RAUTO will then be 3302 – 2 = 3300.
• At memory address 3300, the operand will be found.
NOTE-
In auto-decrement addressing mode,
• First, the register RAUTO is decremented by step size ‘d’.
• Then, the operand value is fetched.
Applications of Addressing Modes-
Immediate Addressing Mode
• To initialize registers to a constant value
Direct Addressing Mode and Register Direct Addressing Mode
• To access static data
• To implement variables
Indirect Addressing Mode and Register Indirect Addressing Mode
• To implement pointers, because pointers are memory locations that store the
address of another variable
• To pass an array as a parameter, because the array name is the base address
and a pointer is needed to point to that address
Relative Addressing Mode
• For program relocation at run time i.e. for position independent code
• To change the normal sequence of execution of instructions
• For branch type instructions, since it directly updates the program counter
Index Addressing Mode
• For array implementation or array addressing
• For records implementation
Base Register Addressing Mode
• For writing relocatable code i.e. for relocation of a program in memory even
at run time
• For handling recursive procedures
Auto-increment Addressing Mode and Auto-decrement Addressing Mode
• For implementing loops
• For stepping through arrays in a loop
• For implementing a stack as push and pop
Syntax Of Addressing Modes-
• The computer instructions may use different addressing modes.
• Different addressing modes use different syntax for their representation.
The syntax used for each addressing mode is discussed below-
1. Immediate Addressing Mode-
Syntax-
# Expression
Interpretation-
Operand = Expression
Examples-
• Load R1, #1000 is interpreted as R1 ← 1000
• ADD R2, #3 is interpreted as R2 ← [R2] + 3
2. Direct Addressing Mode-
Syntax-
Constant
Interpretation-
Effective address of the operand = Constant
Operand = Content of the memory location ‘Constant’
Operand = [Constant]
Examples-
• Load R1, 1000 is interpreted as R1 ← [1000]
• ADD R2, 3 is interpreted as R2 ← [R2] + [3]
3. Register Direct Addressing Mode-
Syntax-
Rn or [Rn]
Interpretation-
Operand = Content of the register Rn
Operand = [Rn]
Examples-
• Load R1, R2 is interpreted as R1 ← [R2]
• ADD R1, R2 is interpreted as R1 ← [R1] + [R2]
4. Indirect Addressing Mode-
Syntax-
@Expression or @(Expression) or (Expression)
Interpretation-
Effective address of the operand = Content of the memory location ‘expression’
Operand = [[ Expression]]
Examples-
• Load R1, @1000 is interpreted as R1 ← [[1000]]
• ADD R1, @(1000) is interpreted as R1 ← [R1] + [[1000]]
• ADD R1, (1000) is interpreted as R1 ← [R1] + [[1000]]
5. Register Indirect Addressing Mode-
Syntax-
@Rn or @(Rn) or (Rn)
Interpretation-
Effective address of the operand = Content of the register Rn
Operand = [[ Rn ]]
Examples-
• Load R1, @R2 is interpreted as R1 ← [[R2]]
• ADD R1, @(R2) is interpreted as R1 ← [R1] + [[R2]]
• ADD R1, (R2) is interpreted as R1 ← [R1] + [[R2]]
6. Displacement Addressing Mode-
This addressing mode may be-
• Relative addressing mode
• Index addressing mode
• Base register addressing mode
Syntax-
disp (Rn)
Interpretation-
Effective address of the operand = disp + Content of the register Rn
Operand = [disp + [ Rn ]]
Examples-
• Load R1, 100(R2) is interpreted as R1 ← [100 + [R2]]
• ADD R1, 100(R2) is interpreted as R1 ← [R1] + [100 + [R2]]
7. Auto-Increment Addressing Mode-
Syntax-
(Rn)+
Interpretation-
Effective address of the operand = Content of the register Rn
Operand = [[ Rn ]]
After fetching the operand, increment [Rn] by step size ‘d’
Examples-
• Load R1, (R2)+ is interpreted as R1 ← [[R2]] followed by R2 ← [R2] + d
• ADD R1, (R2)+ is interpreted as R1 ← [R1] + [[R2]] followed by R2 ←
[R2] + d
8. Auto-Decrement Addressing Mode-
Syntax-
-(Rn)
Interpretation-
Before fetching the operand, decrement [Rn] by step size ‘d’
Then, Effective address of the operand = Content of the register Rn
Operand = [[ Rn ]]
Examples-
• Load R1, -(R2) is interpreted as R2 ← [R2] – d followed by R1 ← [[R2]]
• ADD R1, -(R2) is interpreted as R2 ← [R2] – d followed by R1 ← [R1] +
[[R2]]
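The interpretations above can be mimicked with a small Python model that treats memory as a dictionary indexed by address. The memory and register contents below are made up purely for illustration:

```python
# hypothetical memory and register contents, chosen only for illustration
mem = {1000: 7, 2000: 1000}
reg = {"R2": 2000}
d = 2                          # step size for auto-increment / auto-decrement

def direct(addr):              # operand = [addr]
    return mem[addr]

def indirect(addr):            # operand = [[addr]]
    return mem[mem[addr]]

def register_indirect(rn):     # operand = [[Rn]]
    return mem[reg[rn]]

def displacement(disp, rn):    # operand = [disp + [Rn]]
    return mem[disp + reg[rn]]

def auto_increment(rn):        # operand = [[Rn]], then Rn ← [Rn] + d
    value = mem[reg[rn]]
    reg[rn] += d
    return value

def auto_decrement(rn):        # Rn ← [Rn] - d, then operand = [[Rn]]
    reg[rn] -= d
    return mem[reg[rn]]
```

Note how auto-increment fetches first and updates afterwards, while auto-decrement updates first and fetches afterwards, mirroring the NOTEs in the previous section.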
PRACTICE PROBLEMS BASED ON ADDRESSING MODES-
Problem-01:
The most appropriate matching for the following pairs is-
Column-1:
X: Indirect addressing
Y: Immediate addressing
Z: Auto decrement addressing
Column-2:
1. Loops
2. Pointers
3. Constants
1. X-3, Y-2, Z-1
2. X-1, Y-3, Z-2
3. X-2, Y-3, Z-1
4. X-3, Y-1, Z-2
Solution-
Option (C) is correct.
Problem-02:
In the absolute addressing mode,
1. The operand is inside the instruction
2. The address of the operand is inside the instruction
3. The register containing the address of the operand is specified inside the
instruction
4. The location of the operand is implicit
Solution-
Option (B) is correct.
Problem-03:
Which of the following addressing modes are suitable for program relocation at
run time?
1. Absolute addressing
2. Base addressing
3. Relative addressing
4. Indirect addressing
1. 1 and 4
2. 1 and 2
3. 2 and 3
4. 1, 2 and 4
Solution-
Option (C) is correct.
Problem-04:
What is the most appropriate match for the items in the first column with the items
in the second column-
Column-1:
X: Indirect addressing
Y: Indexed addressing
Z: Base register addressing
Column-2:
1. Array implementation
2. Writing relocatable code
3. Passing array as parameter
1. X-3, Y-1, Z-2
2. X-2, Y-3, Z-1
3. X-3, Y-2, Z-1
4. X-1, Y-3, Z-2
Solution-
Option (A) is correct.
Problem-05:
Which of the following addressing modes permits relocation without any change
whatsoever in the code?
1. Indirect addressing
2. Indexed addressing
3. Base register addressing
4. PC relative addressing
Solution-
Option (C) is correct.
Problem-06:
Consider a three word machine instruction-
ADD A[R0], @B
The first operand (destination) “A[R0]” uses indexed addressing mode with R0 as
the index register. The second operand (source) “@B” uses indirect
addressing mode. A and B are memory addresses residing at the second and the
third words, respectively. The first word of the instruction specifies the opcode, the
index register designation and the source and destination addressing modes. During
execution of ADD instruction, the two operands are added and stored in the
destination (first operand).
The number of memory cycles needed during the execution cycle of the instruction
is-
1. 3
2. 4
3. 5
4. 6
Solution-
For the first operand,
• It uses indexed addressing mode.
• Thus, one memory cycle will be needed to fetch the operand.
For the second operand,
• It uses indirect addressing mode.
• Thus, two memory cycles will be needed to fetch the operand.
After fetching the two operands,
• The operands will be added and result is stored back in the memory.
• Thus, one memory cycle will be needed to store the result.
Total number of memory cycles needed
= 1+2+1
=4
Thus, Option (B) is correct.
Problem-07:
Consider a hypothetical processor with an instruction of type LW R1, 20(R2),
which during execution reads a 32-bit word from memory and stores it in a 32-bit
register R1. The effective address of the memory location is obtained by the
addition of a constant 20 and the contents of register R2. Which of the following
best reflects the addressing mode implemented by this instruction for operand in
memory?
1. Immediate Addressing
2. Register Addressing
3. Register Indirect Scaled Addressing
4. Base Indexed Addressing
Solution-
The effective address is formed by adding a constant displacement (20) to the
contents of register R2, so the instruction uses the base indexed addressing mode.
Thus, Option (D) is correct.
Problem-08:
The memory locations 1000, 1001 and 1020 have data values 18, 1 and 16
respectively before the following program is executed.
MOVI Rs, 1 Move immediate
LOAD Rd, 1000(Rs) Load from memory
ADDI Rd, 1000 Add immediate
STOREI 0(Rd), 20 Store immediate
Which of the statements below is TRUE after the program is executed?
1. Memory location 1000 has value 20
2. Memory location 1020 has value 20
3. Memory location 1021 has value 20
4. Memory location 1001 has value 20
Solution-
Before the execution of the program, the memory is-
[1000] = 18, [1001] = 1, [1020] = 16
Now, let us execute the program instructions one by one-
Instruction-01: MOVI Rs, 1
• This instruction uses immediate addressing mode.
• The instruction is interpreted as Rs ← 1.
• Thus, value = 1 is moved to the register Rs.
Instruction-02: LOAD Rd, 1000(Rs)
• This instruction uses displacement addressing mode.
• The instruction is interpreted as Rd ← [1000 + [Rs]].
• Value of the operand = [1000 + [Rs]] = [1000 + 1] = [1001] = 1.
• Thus, value = 1 is moved to the register Rd.
Instruction-03: ADDI Rd, 1000
• This instruction uses immediate addressing mode.
• The instruction is interpreted as Rd ← [Rd] + 1000.
• Value of the operand = [Rd] + 1000 = 1 + 1000 = 1001.
• Thus, value = 1001 is moved to the register Rd.
Instruction-04: STOREI 0(Rd), 20
• This instruction uses displacement addressing mode.
• The instruction is interpreted as 0 + [Rd] ← 20.
• Value of the destination address = 0 + [Rd] = 0 + 1001 = 1001.
• Thus, value = 20 is moved to the memory location 1001.
Thus,
• After the program execution is completed, memory location 1001 has value
20.
• Option (D) is correct.
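As a quick check, the four instructions above can be replayed in Python (the dictionary stands in for memory; the variable names are illustrative):

```python
# Sketch of Problem-08: memory before execution, then the four instructions.
memory = {1000: 18, 1001: 1, 1020: 16}

Rs = 1                    # MOVI Rs, 1        (immediate)
Rd = memory[1000 + Rs]    # LOAD Rd, 1000(Rs) -> Rd = [1001] = 1
Rd = Rd + 1000            # ADDI Rd, 1000     -> Rd = 1001
memory[0 + Rd] = 20       # STOREI 0(Rd), 20  -> [1001] = 20

print(memory[1001])       # 20
```

Running it confirms that memory location 1001 ends up holding 20, matching Option (D).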
Problem-09:
Consider the following memory values and a one-address machine with an
accumulator, what values do the following instructions load into accumulator?
• Word 20 contains 40
• Word 30 contains 50
• Word 40 contains 60
• Word 50 contains 70
Instructions are-
1. Load immediate 20
2. Load direct 20
3. Load indirect 20
4. Load immediate 30
5. Load direct 30
6. Load indirect 30
Solution-
Instruction-01: Load immediate 20
• This instruction uses immediate addressing mode.
• The instruction is interpreted as Accumulator ← 20.
• Thus, value 20 is loaded into the accumulator.
Instruction-02: Load direct 20
• This instruction uses direct addressing mode.
• The instruction is interpreted as Accumulator ← [20].
• It is given that word 20 contains 40.
• Thus, value 40 is loaded into the accumulator.
Instruction-03: Load indirect 20
• This instruction uses indirect addressing mode.
• The instruction is interpreted as Accumulator ← [[20]].
• It is given that word 20 contains 40 and word 40 contains 60.
• Thus, value 60 is loaded into the accumulator.
Instruction-04: Load immediate 30
• This instruction uses immediate addressing mode.
• The instruction is interpreted as Accumulator ← 30.
• Thus, value 30 is loaded into the accumulator.
Instruction-05: Load direct 30
• This instruction uses direct addressing mode.
• The instruction is interpreted as Accumulator ← [30].
• It is given that word 30 contains 50.
• Thus, value 50 is loaded into the accumulator.
Instruction-06: Load indirect 30
• This instruction uses indirect addressing mode.
• The instruction is interpreted as Accumulator ← [[30]].
• It is given that word 30 contains 50 and word 50 contains 70.
• Thus, value 70 is loaded into the accumulator.
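The three load modes in Problem-09 can be sketched in Python, with the given words stored in a dictionary (the function names are illustrative, not part of any real instruction set):

```python
# Memory words given in Problem-09.
word = {20: 40, 30: 50, 40: 60, 50: 70}

def load_immediate(operand):   # Accumulator <- operand
    return operand

def load_direct(addr):         # Accumulator <- [addr]
    return word[addr]

def load_indirect(addr):       # Accumulator <- [[addr]]
    return word[word[addr]]

print(load_immediate(20), load_direct(20), load_indirect(20))   # 20 40 60
print(load_immediate(30), load_direct(30), load_indirect(30))   # 30 50 70
```

Each extra level of brackets corresponds to one more memory access, which is why indirect addressing also costs the most memory cycles.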
Pipelining in Computer Architecture
• A program consists of a number of instructions.
• These instructions may be executed in the following two ways-
1. Non-Pipelined Execution-
In non-pipelined architecture,
• All the instructions of a program are executed sequentially one after the
other.
• A new instruction executes only after the previous instruction has executed
completely.
• This style of executing the instructions is highly inefficient.
Example-
Consider a program consisting of three instructions.
In a non-pipelined architecture, these instructions execute one after the other as-
If time taken for executing one instruction = t, then-
Time taken for executing ‘n’ instructions = n x t
2. Pipelined Execution-
In pipelined architecture,
• Multiple instructions are executed in parallel.
• This style of executing the instructions is highly efficient.
Now, let us discuss instruction pipelining in detail.
Instruction Pipelining-
Instruction pipelining is a technique that implements a form of parallelism called
instruction level parallelism within a single processor.
• A pipelined processor does not wait until the previous instruction has
executed completely.
• Rather, it fetches the next instruction and begins its execution.
Pipelined Architecture-
In pipelined architecture,
• The hardware of the CPU is split up into several functional units.
• Each functional unit performs a dedicated task.
• The number of functional units may vary from processor to processor.
• These functional units are called the stages of the pipeline.
• Control unit manages all the stages using control signals.
• There is a register associated with each stage that holds the data.
• There is a global clock that synchronizes the working of all the stages.
• At the beginning of each clock cycle, each stage takes the input from its
register.
• Each stage then processes the data and feeds its output to the register of the
next stage.
Four-Stage Pipeline-
In four stage pipelined architecture, the execution of each instruction is completed
in following 4 stages-
1. Instruction fetch (IF)
2. Instruction decode (ID)
3. Instruction Execute (IE)
4. Write back (WB)
To implement four stage pipeline,
• The hardware of the CPU is divided into four functional units.
• Each functional unit performs a dedicated task.
Stage-01:
At stage-01,
• First functional unit performs instruction fetch.
• It fetches the instruction to be executed.
Stage-02:
At stage-02,
• Second functional unit performs instruction decode.
• It decodes the instruction to be executed.
Stage-03:
At stage-03,
• Third functional unit performs instruction execution.
• It executes the instruction.
Stage-04:
At stage-04,
• Fourth functional unit performs write back.
• It writes back the result so obtained after executing the instruction.
Execution-
In pipelined architecture,
• Instructions of the program execute in parallel.
• When one instruction goes from nth stage to (n+1)th stage, another instruction
goes from (n-1)th stage to nth stage.
Phase-Time Diagram-
• Phase-time diagram shows the execution of instructions in the pipelined
architecture.
• The following diagram shows the execution of three instructions in four
stage pipeline architecture.
Time taken to execute three instructions in four stage pipelined architecture = 6
clock cycles.
NOTE-
In non-pipelined architecture,
Time taken to execute three instructions would be
= 3 x Time taken to execute one instruction
= 3 x 4 clock cycles
= 12 clock cycles
Clearly, pipelined execution of instructions is far more efficient than non-pipelined
execution.
Instruction Pipelining | Performance
In instruction pipelining,
• A form of parallelism called instruction level parallelism is implemented.
• Multiple instructions execute simultaneously.
• The efficiency of pipelined execution is more than that of non-pipelined
execution.
Performance of Pipelined Execution-
The following parameters serve as criterion to estimate the performance of
pipelined execution-
• Speed Up
• Efficiency
• Throughput
1. Speed Up-
It gives an idea of “how much faster” the pipelined execution is as compared to
non-pipelined execution.
It is calculated as-
Speed Up = Non-pipelined execution time / Pipelined execution time
2. Efficiency-
The efficiency of pipelined execution is calculated as-
Efficiency = Speed up / Number of stages in the pipeline
3. Throughput-
Throughput is defined as number of instructions executed per unit time.
It is calculated as-
Throughput = Number of instructions executed / Total time taken
Calculation of Important Parameters-
Let us learn how to calculate certain important parameters of pipelined
architecture.
Consider-
• A pipelined architecture consisting of k-stage pipeline
• Total number of instructions to be executed = n
Point-01: Calculating Cycle Time-
In pipelined architecture,
• There is a global clock that synchronizes the working of all the stages.
• Frequency of the clock is set such that all the stages are synchronized.
• At the beginning of each clock cycle, each stage reads the data from its
register and processes it.
• Cycle time is the value of one clock cycle.
There are two cases possible-
Case-01: All the stages offer same delay-
If all the stages offer same delay, then-
Cycle time = Delay offered by one stage including the delay due to its register
Case-02: All the stages do not offer same delay-
If all the stages do not offer same delay, then-
Cycle time = Maximum delay offered by any stage including the delay due to its
register
Point-02: Calculating Frequency Of Clock-
Frequency of the clock (f) = 1 / Cycle time
Point-03: Calculating Non-Pipelined Execution Time-
In non-pipelined architecture,
• The instructions execute one after the other.
• The execution of a new instruction begins only after the previous instruction
has executed completely.
• So, number of clock cycles taken by each instruction = k clock cycles
Thus,
Non-pipelined execution time
= Total number of instructions x Time taken to execute one instruction
= n x k clock cycles
Point-04: Calculating Pipelined Execution Time-
In pipelined architecture,
• Multiple instructions execute in parallel.
• Number of clock cycles taken by the first instruction = k clock cycles
• After first instruction has completely executed, one instruction comes out
per clock cycle.
• So, number of clock cycles taken by each remaining instruction = 1 clock
cycle
Thus,
Pipelined execution time
= Time taken to execute first instruction + Time taken to execute remaining
instructions
= 1 x k clock cycles + (n-1) x 1 clock cycle
= (k + n – 1) clock cycles
Point-04: Calculating Speed Up-
Speed up
= Non-pipelined execution time / Pipelined execution time
= n x k clock cycles / (k + n – 1) clock cycles
= n x k / (k + n – 1)
Dividing the numerator and the denominator by n,
= k / { 1 + (k – 1)/n }
• For very large number of instructions, n→∞. Thus, speed up = k.
• Practically, the total number of instructions never tends to infinity.
• Therefore, speed up is always less than number of stages in pipeline.
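The cycle-count formulas above can be written out as a small Python sketch (k stages, n instructions, one clock cycle per stage under ideal conditions):

```python
# Timing formulas for an ideal k-stage pipeline executing n instructions.
def non_pipelined_cycles(n, k):
    # Each instruction takes k cycles, one after the other.
    return n * k

def pipelined_cycles(n, k):
    # k cycles for the first instruction, then 1 cycle per remaining instruction.
    return k + (n - 1)

def speed_up(n, k):
    return non_pipelined_cycles(n, k) / pipelined_cycles(n, k)

# Speed up approaches k as n grows, but never reaches it:
print(speed_up(10, 4))      # 40 / 13  = about 3.08
print(speed_up(1000, 4))    # 4000 / 1003 = about 3.99
```

Even for 1000 instructions the speed up stays just below the stage count of 4, illustrating the note above.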
Important Notes-
Note-01:
• The aim of pipelined architecture is to execute one complete instruction in
one clock cycle.
• In other words, the aim of pipelining is to maintain CPI ≅ 1.
• Ideally, a pipelined architecture executes one complete instruction per clock
cycle (CPI = 1).
• Practically, CPI stays slightly greater than 1 due to the delays introduced by
the pipeline registers.
Note-02:
• The maximum speed up that can be achieved is always equal to the number
of stages.
• This is achieved when efficiency becomes 100%.
• Practically, efficiency is always less than 100%.
• Therefore speed up is always less than number of stages in pipelined
architecture.
Note-03:
Under ideal conditions,
• One complete instruction is executed per clock cycle i.e. CPI = 1.
• Speed up = Number of stages in pipelined architecture
Note-04:
• Experiments show that a 5 stage pipelined processor gives the best
performance.
Note-05:
In case only one instruction has to be executed, then-
• Non-pipelined execution gives better performance than pipelined execution.
• This is because delays are introduced due to registers in pipelined
architecture.
• Thus, time taken to execute one instruction in non-pipelined architecture is
less.
Note-06:
High efficiency of pipelined processor is achieved when-
• All the stages are of equal duration.
• There are no conditional branch instructions.
• There are no interrupts.
• There are no register and memory conflicts.
Performance degrades in absence of these conditions.
Pipelining | Practice Problems
Problem-01:
Consider a pipeline having 4 phases with duration 60, 50, 90 and 80 ns. Given
latch delay is 10 ns. Calculate-
1. Pipeline cycle time
2. Non-pipeline execution time
3. Speed up ratio
4. Pipeline time for 1000 tasks
5. Sequential time for 1000 tasks
6. Throughput
Solution-
Given-
• Four stage pipeline is used
• Delay of stages = 60, 50, 90 and 80 ns
• Latch delay or delay due to each register = 10 ns
Part-01: Pipeline Cycle Time-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 60, 50, 90, 80 } + 10 ns
= 90 ns + 10 ns
= 100 ns
Part-02: Non-Pipeline Execution Time-
Non-pipeline execution time for one instruction
= 60 ns + 50 ns + 90 ns + 80 ns
= 280 ns
Part-03: Speed Up Ratio-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 280 ns / Cycle time
= 280 ns / 100 ns
= 2.8
Part-04: Pipeline Time For 1000 Tasks-
Pipeline time for 1000 tasks
= Time taken for 1st task + Time taken for remaining 999 tasks
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 100 ns + 999 x 100 ns
= 400 ns + 99900 ns
= 100300 ns
Part-05: Sequential Time For 1000 Tasks-
Non-pipeline time for 1000 tasks
= 1000 x Time taken for one task
= 1000 x 280 ns
= 280000 ns
Part-06: Throughput-
Throughput for pipelined execution
= Number of instructions executed per unit time
= 1000 tasks / 100300 ns
≈ 9.97 x 10⁶ tasks per second
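All six parts of Problem-01 can be verified with a few lines of Python (delays in ns, variable names illustrative):

```python
# Problem-01: 4-stage pipeline, stage delays in ns, latch delay 10 ns.
stages = [60, 50, 90, 80]
latch = 10

cycle = max(stages) + latch          # pipeline cycle time = 100 ns
non_pipe_one = sum(stages)           # non-pipelined time per task = 280 ns
speedup = non_pipe_one / cycle       # 2.8
pipe_1000 = (4 + 999) * cycle        # pipelined time for 1000 tasks = 100300 ns
seq_1000 = 1000 * non_pipe_one       # sequential time for 1000 tasks = 280000 ns
throughput = 1000 / pipe_1000        # about 0.00997 tasks per ns

print(cycle, speedup, pipe_1000, seq_1000)
```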
Problem-02:
A four stage pipeline has the stage delays as 150, 120, 160 and 140 ns respectively.
Registers are used between the stages and have a delay of 5 ns each. Assuming
constant clocking rate, the total time taken to process 1000 data items on the
pipeline will be-
1. 120.4 microseconds
2. 160.5 microseconds
3. 165.5 microseconds
4. 590.0 microseconds
Solution-
Given-
• Four stage pipeline is used
• Delay of stages = 150, 120, 160 and 140 ns
• Delay due to each register = 5 ns
• 1000 data items or instructions are processed
Cycle Time-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 150, 120, 160, 140 } + 5 ns
= 160 ns + 5 ns
= 165 ns
Pipeline Time To Process 1000 Data Items-
Pipeline time to process 1000 data items
= Time taken for 1st data item + Time taken for remaining 999 data items
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 165 ns + 999 x 165 ns
= 660 ns + 164835 ns
= 165495 ns
= 165.5 μs
Thus, Option (C) is correct.
Problem-03:
Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average
cycles per instruction of 4. The same processor is upgraded to a pipelined
processor with five stages but due to the internal pipeline delay, the clock speed is
reduced to 2 gigahertz. Assume there are no stalls in the pipeline. The speed up
achieved in this pipelined processor is-
1. 3.2
2. 3.0
3. 2.2
4. 2.0
Solution-
Cycle Time in Non-Pipelined Processor-
Frequency of the clock = 2.5 gigahertz
Cycle time
= 1 / frequency
= 1 / (2.5 gigahertz)
= 1 / (2.5 x 10⁹ hertz)
= 0.4 ns
Non-Pipeline Execution Time-
Non-pipeline execution time to process 1 instruction
= Number of clock cycles taken to execute one instruction
= 4 clock cycles
= 4 x 0.4 ns
= 1.6 ns
Cycle Time in Pipelined Processor-
Frequency of the clock = 2 gigahertz
Cycle time
= 1 / frequency
= 1 / (2 gigahertz)
= 1 / (2 x 10⁹ hertz)
= 0.5 ns
Pipeline Execution Time-
Since there are no stalls in the pipeline, so ideally one instruction is executed per
clock cycle. So,
Pipeline execution time
= 1 clock cycle
= 0.5 ns
Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 1.6 ns / 0.5 ns
= 3.2
Thus, Option (A) is correct.
Problem-04:
The stage delays in a 4 stage pipeline are 800, 500, 400 and 300 picoseconds. The
first stage is replaced with a functionally equivalent design involving two stages
with respective delays 600 and 350 picoseconds.
The throughput increase of the pipeline is _____%.
Solution-
Execution Time in 4 Stage Pipeline-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 800, 500, 400, 300 } + 0
= 800 picoseconds
Thus, Execution time in 4 stage pipeline = 1 clock cycle = 800 picoseconds.
Throughput in 4 Stage Pipeline-
Throughput
= Number of instructions executed per unit time
= 1 instruction / 800 picoseconds
Execution Time in 2 Stage Pipeline-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 600, 350 } + 0
= 600 picoseconds
Thus, Execution time in 2 stage pipeline = 1 clock cycle = 600 picoseconds.
Throughput in 2 Stage Pipeline-
Throughput
= Number of instructions executed per unit time
= 1 instruction / 600 picoseconds
Throughput Increase-
Throughput increase
= { (Final throughput – Initial throughput) / Initial throughput } x 100
= { (1 / 600 – 1 / 800) / (1 / 800) } x 100
= { (800 / 600) – 1 } x 100
= (1.33 – 1) x 100
= 0.3333 x 100
= 33.33 %
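The throughput-increase figure can be checked in Python (delays in picoseconds; register delay taken as zero, as in the solution):

```python
# Problem-04: throughput before and after splitting the 800 ps stage.
old_cycle = max([800, 500, 400, 300])        # 800 ps
new_cycle = max([600, 350, 500, 400, 300])   # 600 ps after the split

# Throughput is 1 instruction per cycle, so the relative increase is:
increase = (1 / new_cycle - 1 / old_cycle) / (1 / old_cycle) * 100
print(round(increase, 2))                    # 33.33
```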
Problem-05:
A non-pipelined single cycle processor operating at 100 MHz is converted into a
synchronous pipelined processor with five stages requiring 2.5 ns, 1.5 ns, 2 ns, 1.5
ns and 2.5 ns respectively. The delay of the latches is 0.5 ns.
The speed up of the pipeline processor for a large number of instructions is-
1. 4.5
2. 4.0
3. 3.33
4. 3.0
Solution-
Cycle Time in Non-Pipelined Processor-
Frequency of the clock = 100 MHz
Cycle time
= 1 / frequency
= 1 / (100 MHz)
= 1 / (100 x 10⁶ hertz)
= 0.01 μs
Non-Pipeline Execution Time-
Non-pipeline execution time to process 1 instruction
= Number of clock cycles taken to execute one instruction
= 1 clock cycle
= 0.01 μs
= 10 ns
Cycle Time in Pipelined Processor-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 2.5, 1.5, 2, 1.5, 2.5 } + 0.5 ns
= 2.5 ns + 0.5 ns
= 3 ns
Pipeline Execution Time-
Pipeline execution time
= 1 clock cycle
= 3 ns
Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 10 ns / 3 ns
= 3.33
Thus, Option (C) is correct.
Problem-06:
We have 2 designs D1 and D2 for a synchronous pipeline processor. D1 has 5 stage
pipeline with execution time of 3 ns, 2 ns, 4 ns, 2 ns and 3 ns. While the design D2
has 8 pipeline stages each with 2 ns execution time. How much time can be saved
using design D2 over design D1 for executing 100 instructions?
1. 214 ns
2. 202 ns
3. 86 ns
4. 200 ns
Solution-
Cycle Time in Design D1-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 3, 2, 4, 2, 3 } + 0
= 4 ns
Execution Time For 100 Instructions in Design D1-
Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 5 clock cycles + 99 x 1 clock cycle
= 5 x cycle time + 99 x cycle time
= 5 x 4 ns + 99 x 4 ns
= 20 ns + 396 ns
= 416 ns
Cycle Time in Design D2-
Cycle time
= Delay due to a stage + Delay due to its register
= 2 ns + 0
= 2 ns
Execution Time For 100 Instructions in Design D2-
Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 8 clock cycles + 99 x 1 clock cycle
= 8 x cycle time + 99 x cycle time
= 8 x 2 ns + 99 x 2 ns
= 16 ns + 198 ns
= 214 ns
Time Saved-
Time saved
= Execution time in design D1 – Execution time in design D2
= 416 ns – 214 ns
= 202 ns
Thus, Option (B) is correct.
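The comparison of the two designs follows directly from the (k + n − 1) formula, sketched here in Python (register delay taken as zero, as in the solution):

```python
# Problem-06: total pipeline time for n instructions, zero register delay.
def pipeline_time(stage_delays, n):
    cycle = max(stage_delays)        # clock is set by the slowest stage
    k = len(stage_delays)
    return (k + n - 1) * cycle       # k cycles for the first, 1 per remaining

d1 = pipeline_time([3, 2, 4, 2, 3], 100)   # (5 + 99) * 4 ns = 416 ns
d2 = pipeline_time([2] * 8, 100)           # (8 + 99) * 2 ns = 214 ns
print(d1 - d2)                             # 202 ns saved by D2
```

Note the trade-off: D2 has more stages (a longer fill time) but a much shorter cycle, which dominates over 100 instructions.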
Problem-07:
Consider an instruction pipeline with four stages (S1, S2, S3 and S4) each with
combinational circuit only. The pipeline registers are required between each stage
and at the end of the last stage. Delays for the stages and for the pipeline registers
are as given in the figure (stage delays: 5 ns, 6 ns, 11 ns, 8 ns; register delay: 1 ns)-
What is the approximate speed up of the pipeline in steady state under ideal
conditions when compared to the corresponding non-pipeline implementation?
1. 4.0
2. 2.5
3. 1.1
4. 3.0
Solution-
Non-Pipeline Execution Time-
Non-pipeline execution time for 1 instruction
= 5 ns + 6 ns + 11 ns + 8 ns
= 30 ns
Cycle Time in Pipelined Processor-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 5, 6, 11, 8 } + 1 ns
= 11 ns + 1 ns
= 12 ns
Pipeline Execution Time-
Pipeline execution time
= 1 clock cycle
= 12 ns
Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 30 ns / 12 ns
= 2.5
Thus, Option (B) is correct.
Problem-08:
Consider a 4 stage pipeline processor. The number of cycles needed by the four
instructions I1, I2, I3 and I4 in stages S1, S2, S3 and S4 is shown below-
S1 S2 S3 S4
I1 2 1 1 1
I2 1 3 2 2
I3 2 1 1 3
I4 1 2 2 2
What is the number of cycles needed to execute the following loop?
for(i=1 to 2) { I1; I2; I3; I4; }
1. 16
2. 23
3. 28
4. 30
Solution-
The phase-time diagram is-
From here, number of clock cycles required to execute the loop = 23 clock cycles.
Thus, Option (B) is correct.
Problem-09:
Consider a pipelined processor with the following four stages-
IF : Instruction Fetch
ID : Instruction Decode and Operand Fetch
EX : Execute
WB : Write Back
The IF, ID and WB stages take one clock cycle each to complete the operation. The
number of clock cycles for the EX stage depends on the instruction. The ADD and
SUB instructions need 1 clock cycle and the MUL instruction need 3 clock cycles
in the EX stage. Operand forwarding is used in the pipelined processor. What is the
number of clock cycles taken to complete the following sequence of instructions?
ADD R2, R1, R0 R2 ← R0 + R1
MUL R4, R3, R2 R4 ← R3 x R2
SUB R6, R5, R4 R6 ← R5 – R4
1. 7
2. 8
3. 10
4. 14
Solution-
The phase-time diagram is-
From here, number of clock cycles required to execute the instructions = 8 clock
cycles.
Thus, Option (B) is correct.
Problem-10:
Consider the following processors. Assume that the pipeline registers have zero
latency.
P1 : 4 stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns
P2 : 4 stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns
P3 : 5 stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns
P4 : 5 stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns
Which processor has the highest peak clock frequency?
1. P1
2. P2
3. P3
4. P4
Solution-
It is given that pipeline registers have zero latency. Thus,
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Maximum delay due to any stage
For Processor P1:
Cycle time
= Max { 1 ns, 2 ns, 2 ns, 1 ns }
= 2 ns
Clock frequency
= 1 / Cycle time
= 1 / 2 ns
= 0.5 gigahertz
For Processor P2:
Cycle time
= Max { 1 ns, 1.5 ns, 1.5 ns, 1.5 ns }
= 1.5 ns
Clock frequency
= 1 / Cycle time
= 1 / 1.5 ns
= 0.67 gigahertz
For Processor P3:
Cycle time
= Max { 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns }
= 1 ns
Clock frequency
= 1 / Cycle time
= 1 / 1 ns
= 1 gigahertz
For Processor P4:
Cycle time
= Max { 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns }
= 1.1 ns
Clock frequency
= 1 / Cycle time
= 1 / 1.1 ns
= 0.91 gigahertz
Clearly, Processor P3 has the highest peak clock frequency.
Thus, Option (C) is correct.
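Since the registers have zero latency, peak frequency is simply the reciprocal of the longest stage latency, which a short Python sketch makes explicit:

```python
# Problem-10: peak clock frequency = 1 / (maximum stage latency), in GHz
# because latencies are in ns.
designs = {
    "P1": [1, 2, 2, 1],
    "P2": [1, 1.5, 1.5, 1.5],
    "P3": [0.5, 1, 1, 0.6, 1],
    "P4": [0.5, 0.5, 1, 1, 1.1],
}
freq_ghz = {name: 1 / max(latencies) for name, latencies in designs.items()}
print(max(freq_ghz, key=freq_ghz.get))   # P3
```

P3 wins because its slowest stage (1 ns) is shorter than the slowest stage of any other design, even though it has five stages.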
Problem-11:
Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage
latencies T1, T2 and T3 such that T1 = 3T2/4 = 2T3. If the longest pipeline stage is
split into two pipeline stages of equal latency, the new frequency is ____ GHz,
ignoring delays in the pipeline registers.
Solution-
Let T1 = t. Then, from T1 = 3T2/4 = 2T3, we get-
• T1 = t
• T2 = 4t / 3
• T3 = t / 2
Pipeline Cycle Time-
Pipeline cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { t, 4t/3, t/2 } + 0
= 4t/3
Frequency Of Pipeline-
Frequency
= 1 / Pipeline cycle time
= 1 / (4t / 3)
= 3 / 4t
Given frequency = 3 GHz. So,
3 / 4t = 3 GHz
4t = 1 ns
t = 0.25 ns
Stage Latencies Of Pipeline-
Stage latency of different stages are-
• Latency of stage-01 = 0.25 ns
• Latency of stage-02 = 0.33 ns
• Latency of stage-03 = 0.125 ns
Splitting The Pipeline-
The stage with the longest latency i.e. stage-02 is split up into two stages of
equal latency, giving a four stage pipeline.
After splitting, the latency of different stages are-
• Latency of stage-01 = 0.25 ns
• Latency of stage-02 = 0.165 ns
• Latency of stage-03 = 0.165 ns
• Latency of stage-04 = 0.125 ns
Split Pipeline Cycle Time-
Split pipeline cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 0.25, 0.165, 0.165, 0.125 } + 0
= 0.25 ns
Frequency Of Split Pipeline-
Frequency
= 1 / split pipeline cycle time
= 1 / 0.25 ns
= 4 GHz
Thus, new frequency = 4 GHz.
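The derivation above can be replayed numerically (latencies in ns; the exact ratios are kept rather than the rounded 0.33/0.165 values):

```python
# Problem-11: T1 = t, T2 = 4t/3, T3 = t/2, original clock = 3 GHz.
t = 0.25                         # ns, from 1 / (3 GHz) = 4t/3
latencies = [t, 4 * t / 3, t / 2]
freq_old = 1 / max(latencies)    # 1 / (1/3 ns) = 3 GHz, as given

half = max(latencies) / 2        # split the longest stage into two equal halves
latencies_new = [t, half, half, t / 2]
freq_new = 1 / max(latencies_new)   # 1 / 0.25 ns = 4 GHz
print(freq_new)                     # 4.0
```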