Computer Architecture and Design-3117
Md. Arifuzzaman (Soukhin)
Lecturer, Dept. of CSE, Leading University
E-Mail:
[email protected] Cell number : 01998740789
Module-6
Pipeline & Parallel Processing
Parallel Processing
• Parallel processing is a class of techniques that enable a system to
carry out data-processing tasks simultaneously in order to increase
the computational speed of a computer system.
• A parallel processing system can carry out simultaneous data-
processing to achieve faster execution time.
• For instance, while an instruction is being processed in the ALU
component of the CPU, the next instruction can be read from
memory.
Parallel Processing
• The primary purpose of parallel processing is to enhance the
computer processing capability and increase its throughput, i.e. the
amount of processing that can be accomplished during a given
interval of time.
• Parallel processing can be achieved by having a multiplicity of
functional units that perform identical or different operations
simultaneously. The data can be distributed among the multiple
functional units.
Parallel Processing
• The following diagram shows one possible way of separating the
execution unit into eight functional units operating in parallel.
• The operation performed in each functional unit is indicated in each
block of the diagram:
Parallel Processing
The adder and integer multiplier perform arithmetic operations
on integer numbers.
The floating-point operations are separated into three circuits
operating in parallel.
The logic, shift, and increment operations can be performed
concurrently on different data. All units are independent of each
other, so one number can be shifted while another number is being
incremented.
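The independence of the units can be illustrated with a minimal Python sketch. The unit names, the operands, and the use of a thread pool are assumptions made for illustration only; threads here merely model the idea that each unit works on its own data, and they do not reproduce real hardware timing.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical functional units; each one operates only on its own operands.
FUNCTIONAL_UNITS = {
    "shift_left": lambda x: x << 1,
    "increment": lambda x: x + 1,
    "logic_and": lambda x, y: x & y,
    "integer_multiply": lambda x, y: x * y,
}


def run_in_parallel(jobs):
    """Dispatch each (unit_name, operands) job to its unit.

    Results are returned in the same order as the jobs were given.
    """
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [
            pool.submit(FUNCTIONAL_UNITS[name], *operands)
            for name, operands in jobs
        ]
        return [future.result() for future in futures]


results = run_in_parallel([
    ("shift_left", (6,)),        # one number is shifted ...
    ("increment", (9,)),         # ... while another is incremented
    ("logic_and", (12, 10)),
    ("integer_multiply", (3, 7)),
])
print(results)  # [12, 10, 8, 21]
```

No unit reads another unit's result, which is what allows the operations to overlap.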
Pipelining
• The term pipelining refers to a technique of decomposing a sequential
process into sub-operations, with each sub-operation being executed in
a dedicated segment that operates concurrently with all other segments.
• The most important characteristic of a pipeline technique is that
several computations can be in progress in distinct segments at the
same time.
Pipelining
• The overlapping of computation is made possible by associating a
register with each segment in the pipeline.
• The registers provide isolation between each segment so that each
can operate on distinct data simultaneously.
• The structure of a pipeline organization can be represented simply by
including an input register for each segment followed by a
combinational circuit.
• Let us consider an example of combined multiplication and addition
operation to get a better understanding of the pipeline organization.
Pipelining
• The combined multiplication and addition operation is done with a
stream of numbers such as:
• The operation to be performed on the numbers is decomposed into
sub-operations with each sub-operation to be implemented in a
segment within a pipeline.
• The sub-operations performed in each segment of the pipeline are
defined as:
Pipelining
• Registers R1, R2, R3, and R4 hold the data and the combinational
circuits operate in a particular segment.
• The output generated by the combinational circuit in a given segment
is applied to the input register of the next segment. For instance, from
the block diagram, we can see that the register R3 is used as one of
the input registers for the combinational adder circuit.
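The register behaviour described above can be sketched as a clocked simulation in Python. The slides do not reproduce the stream expression or the register transfers, so this sketch assumes the common textbook form a[i] * b[i] + c[i] with three segments: R1 and R2 load the inputs, R3 holds the product while R4 loads c[i], and a hypothetical result register R5 holds the sum.

```python
def multiply_add_pipeline(a, b, c):
    """Simulate an assumed three-segment pipeline computing a[i]*b[i] + c[i].

    Assumed segments:
      segment 1: R1 <- a[i], R2 <- b[i]
      segment 2: R3 <- R1 * R2, R4 <- c[i]
      segment 3: R5 <- R3 + R4
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    results = []
    # n inputs need n + 2 clock pulses to pass through three segments.
    for clock in range(n + 2):
        # Next-state values are computed from the current registers,
        # because all registers latch on the same clock pulse.
        next_r5 = r3 + r4 if r3 is not None else None
        next_r3 = r1 * r2 if r1 is not None else None
        next_r4 = c[clock - 1] if 1 <= clock <= n else None
        next_r1 = a[clock] if clock < n else None
        next_r2 = b[clock] if clock < n else None
        r1, r2, r3, r4, r5 = next_r1, next_r2, next_r3, next_r4, next_r5
        if r5 is not None:
            results.append(r5)
    return results


print(multiply_add_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # [11, 18, 27]
```

After the first two clock pulses fill the pipeline, one result leaves the last segment on every pulse, even though each individual computation still passes through all three segments.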
Arithmetic Pipeline
• Arithmetic Pipelines are mostly used in high-speed
computers. They are used to implement floating-point
operations, multiplication of fixed-point numbers, and
similar computations encountered in scientific
problems.
• To understand the concepts of arithmetic pipeline in a
more convenient way, let us consider an example of a
pipeline unit for floating-point addition and subtraction.
• The inputs to the floating-point adder pipeline are two
normalized floating-point binary numbers defined as:
X = A * 2^a = 0.9504 * 10^3
Y = B * 2^b = 0.8200 * 10^2
Arithmetic Pipeline
• Where A and B are two fractions that represent the mantissa
and a and b are the exponents.
• The combined operation of floating-point addition and
subtraction is divided into four segments. Each segment
contains the corresponding suboperation to be performed in
the given pipeline. The suboperations that are shown in the
four segments are:
1. Compare the exponents by subtraction.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
Arithmetic Pipeline
• We will discuss each suboperation
in a more detailed manner later
in this section.
• The following block diagram represents
the suboperations performed
in each segment of the pipeline.
Arithmetic Pipeline
1. Compare exponents by subtraction:
• The exponents are compared by subtracting them to
determine their difference. The larger exponent is
chosen as the exponent of the result.
• The difference of the exponents,
i.e., 3 - 2 = 1 determines how many times the mantissa
associated with the smaller exponent must be shifted to
the right.
2. Align the mantissas:
• The mantissa associated with the smaller exponent is
shifted according to the difference of exponents
determined in segment one.
X = 0.9504 * 10^3
Y = 0.08200 * 10^3
Arithmetic Pipeline
3. Add mantissas:
• The two mantissas are added in segment three.
Z = X + Y = 1.0324 * 10^3
4. Normalize the result:
• After normalization, the result is written as:
Z = 0.10324 * 10^4
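The four suboperations can be traced with a short Python sketch. It works in decimal, as the numerical example does, and uses the standard `decimal` module so the digits stay exact; the function name `fp_add_pipeline` and the (mantissa, exponent) representation are assumptions for illustration, not a hardware description.

```python
from decimal import Decimal


def fp_add_pipeline(x_mantissa, x_exp, y_mantissa, y_exp):
    """Add two decimal floating-point numbers given as mantissa * 10^exp."""
    # Segment 1: compare the exponents by subtraction; keep the larger one.
    diff = x_exp - y_exp
    exp = max(x_exp, y_exp)

    # Segment 2: align the mantissas by shifting the one with the smaller
    # exponent to the right by |diff| digit positions.
    if diff >= 0:
        y_mantissa = y_mantissa.scaleb(-diff)
    else:
        x_mantissa = x_mantissa.scaleb(diff)

    # Segment 3: add the aligned mantissas.
    z = x_mantissa + y_mantissa

    # Segment 4: normalize so that 0.1 <= |z| < 1 (or z == 0).
    while abs(z) >= 1:
        z = z.scaleb(-1)
        exp += 1
    while z != 0 and abs(z) < Decimal("0.1"):
        z = z.scaleb(1)
        exp -= 1
    return z, exp


# X = 0.9504 * 10^3 and Y = 0.8200 * 10^2 from the example.
z, exp = fp_add_pipeline(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
print(z, exp)
```

For the example inputs the sketch yields a mantissa numerically equal to 0.10324 with exponent 4, that is, 1.0324 * 10^3 shifted right by one digit during normalization.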
Instruction Pipeline
• Pipeline processing can occur not only in the data
stream but in the instruction stream as well.
• Most digital computers with complex instructions
require an instruction pipeline to carry out operations such as
fetching, decoding, and executing instructions.
• In general, the computer needs to process each
instruction with the following sequence of steps.
Instruction Pipeline
1. Fetch instruction from memory. (FI)
2. Decode the instruction. (DI)
3. Calculate the effective address. (CA)
4. Fetch the operands from memory. (FO)
5. Execute the instruction. (EI)
6. Store the result in the proper place. (SR)
Instruction Pipeline
• Each step is executed in a particular segment, and
different segments may take different amounts of time
to operate on the incoming information.
Moreover, two or more segments may sometimes
require memory access at the same time, causing
one segment to wait until another is finished with the
memory.
• The organization of an instruction pipeline will be more
efficient if the instruction cycle is divided into segments
of equal duration. One of the most common examples of
this type of organization is the four-segment
instruction pipeline.
Instruction Pipeline
• A four-segment instruction pipeline combines two or
more of the steps above into a single segment.
For instance, the decoding of the instruction can be
combined with the calculation of the effective address
into one segment.
• The following block diagram shows a typical example of
a four-segment instruction pipeline. The instruction
cycle is completed in four segments.
Instruction Pipeline
• Segment 1:
• The instruction fetch segment can be implemented using a first-in,
first-out (FIFO) buffer.
• Segment 2:
• The instruction fetched from memory is decoded in the second
segment, and the effective address is then calculated in a
separate arithmetic circuit.
• Segment 3:
• An operand from memory is fetched in the third segment.
• Segment 4:
• The instructions are finally executed in the last segment of the pipeline
organization.
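How instructions overlap in the four segments can be sketched as a space-time diagram in Python. The abbreviations FI, DA, FO, and EX (fetch instruction, decode and calculate address, fetch operand, execute) are assumed labels for the four segments, and the sketch assumes an ideal pipeline with no branches and no memory-access conflicts.

```python
# Assumed labels for the four segments described above.
SEGMENTS = ["FI", "DA", "FO", "EX"]


def space_time_diagram(num_instructions, segments=SEGMENTS):
    """Return one row per instruction; each column is one clock cycle.

    Instruction i enters the first segment in cycle i and advances one
    segment per cycle. Cycles in which it is not in the pipeline show "--".
    """
    k = len(segments)
    total_cycles = num_instructions + k - 1
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        for s, name in enumerate(segments):
            row[i + s] = name
        rows.append(row)
    return rows


for number, row in enumerate(space_time_diagram(3), start=1):
    print(f"I{number}: " + " ".join(row))
# I1: FI DA FO EX -- --
# I2: -- FI DA FO EX --
# I3: -- -- FI DA FO EX
```

Under these assumptions, three instructions finish in 3 + 4 - 1 = 6 clock cycles instead of the 12 cycles needed if each instruction had to complete all four segments before the next one started.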
Instruction Pipeline Example