Arithmetic Pipeline in Computer Architecture

The document discusses arithmetic pipelines, which can be used to speed up fixed-point and floating-point arithmetic operations. Fixed-point arithmetic pipelines break multiplication down into a series of shift and add operations that can be pipelined. Floating-point addition and subtraction can be pipelined into four stages: exponent comparison, mantissa alignment, mantissa addition or subtraction, and normalization of the result. Vector and array processors are also discussed as ways to parallelize arithmetic tasks such as matrix multiplication using pipelined multiply-add units.

Uploaded by

s1910576101
Copyright
© All Rights Reserved

Arithmetic Pipeline

• Main topics in pipeline processing:

• Arithmetic pipeline:
• Fixed-point arithmetic pipeline
• Floating-point arithmetic pipeline
• Vector processing: adder/multiplier pipeline
• Array processing: array processor
• Attached array processor
• SIMD array processor
Parallel Processing
• Simultaneous data processing tasks for the purpose of increasing the computational speed
• Perform concurrent data processing to achieve faster execution time
• Multiple functional units: separate the execution unit into eight functional units operating in parallel.

[Figure: processor with multiple functional units. The processor registers connect to memory and to eight units: adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply, floating-point divide]
Pipelining: Laundry Example
• A small laundry has one washer, one dryer, and one operator. It takes 90 minutes to finish one load (the loads are A, B, C, and D):
• Washer takes 30 minutes
• Dryer takes 40 minutes
• "Operator folding" takes 20 minutes
Sequential Laundry
• This operator schedules his loads to be delivered to the laundry every 90 minutes, which is the time required to finish one load.
• In other words, he will not start a new task unless he is already done with the previous task.
• The process is sequential. Sequential laundry takes 6 hours for 4 loads.

[Figure: task order (loads A to D) against time, 6 PM to midnight; each load takes 30 + 40 + 20 = 90 min, one load after another]
Efficiently scheduled laundry: Pipelined Laundry
• Another operator asks for the delivery of loads to the laundry every 40 minutes.
• Pipelined laundry takes 3.5 hours for 4 loads.

[Figure: task order (loads A to D) against time, starting at 6 PM; the intervals along the time axis are 30, 40, 40, 40, 40, and 20 min]
Pipelining Facts
• Multiple tasks operate simultaneously.
• Pipelining doesn't help the latency (response time) of a single task; it helps the throughput of the entire workload.
• Pipeline rate is limited by the slowest pipeline stage.
• Potential speedup = number of pipe stages.
• Unbalanced lengths of pipe stages reduce speedup.

[Figure: pipelined laundry timeline for loads A to D (30, 40, 40, 40, 40, 20 min); the washer waits for the dryer for 10 minutes]
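The laundry timings can be checked with a short simulation. This is a minimal sketch, not part of the original slides: the function names are hypothetical, and it assumes a load can sit in a basket between stages, which gives the same total time as the schedule where the washer waits for the dryer.

```python
def sequential_time(stage_minutes, loads):
    """Total time when a new load starts only after the previous one is done."""
    return sum(stage_minutes) * loads


def pipelined_time(stage_minutes, loads):
    """Total time when each stage starts a load as soon as it can.

    Assumption: a finished load may wait between stages until the
    next stage is free.
    """
    stage_free_at = [0] * len(stage_minutes)  # when each stage is next idle
    for _ in range(loads):
        ready = 0  # when the current load is ready for the next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free_at[s])
            stage_free_at[s] = start + minutes
            ready = stage_free_at[s]
    return stage_free_at[-1]


stages = [30, 40, 20]  # washer, dryer, folding
print(sequential_time(stages, 4))  # 360 minutes = 6 hours
print(pipelined_time(stages, 4))   # 210 minutes = 3.5 hours
```

The speedup here is 360 / 210, about 1.7, which is below the potential speedup of 3 because the stages are unbalanced and only four loads are processed.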
Pipelining
Decomposing a sequential process into suboperations
Each subprocess is executed concurrently in its own dedicated segment

• Instruction execution is divided into k segments or stages
• An instruction exits pipe stage k-1 and proceeds into pipe stage k
• All pipe stages take the same amount of time, called one processor cycle
• The length of the processor cycle is determined by the slowest pipe stage

[Figure: pipeline with k segments]
Pipelining
• Suppose we want to perform the combined multiply and add operations with a stream of numbers:
• Ai * Bi + Ci for i = 1, 2, 3, …, 7

• The suboperations performed in each segment of the pipeline are as follows:

• Segment 1: R1 ← Ai, R2 ← Bi
• Segment 2: R3 ← R1 * R2, R4 ← Ci
• Segment 3: R5 ← R3 + R4
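The register transfers above can be traced clock by clock. The following is a minimal sketch under the assumption that all five registers are loaded simultaneously on each clock pulse; the function name is hypothetical and None marks an empty register.

```python
def multiply_add_pipeline(a, b, c):
    """Simulate the three-segment pipeline computing a[i] * b[i] + c[i]."""
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None  # None means the register holds no data yet
    results = []
    # k + n - 1 clock pulses are needed, with k = 3 segments
    for clock in range(n + 2):
        # Compute every next value from the current register contents,
        # because all registers are clocked at the same time.
        next_r5 = r3 + r4 if r3 is not None else None            # segment 3
        if r1 is not None:                                        # segment 2
            next_r3, next_r4 = r1 * r2, c[clock - 1]
        else:
            next_r3 = next_r4 = None
        if clock < n:                                             # segment 1
            next_r1, next_r2 = a[clock], b[clock]
        else:
            next_r1 = next_r2 = None
        r1, r2, r3, r4, r5 = next_r1, next_r2, next_r3, next_r4, next_r5
        if r5 is not None:
            results.append(r5)
    return results


a = [1, 2, 3, 4, 5, 6, 7]
b = [2, 2, 2, 2, 2, 2, 2]
c = [10, 10, 10, 10, 10, 10, 10]
print(multiply_add_pipeline(a, b, c))  # [12, 14, 16, 18, 20, 22, 24]
```

For the seven input triples the loop runs for 3 + 7 - 1 = 9 clock pulses, and the first result appears in R5 after the third pulse.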
Arithmetic Pipeline
• Pipeline arithmetic units are usually found in very high speed computers.
• Arithmetic pipelines are constructed for:
• simple fixed-point arithmetic operations
• floating-point arithmetic operations

• To implement arithmetic pipelines, we generally use the following two types of adder:

• i) Carry propagation adder (CPA): It adds two numbers such that carries generated in successive digits are propagated.

• ii) Carry save adder (CSA): It adds three numbers such that the carries generated are not propagated; rather, they are saved in a carry vector alongside the sum vector.
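The difference between the two adders can be illustrated on plain integers used as bit vectors. This is a sketch under stated assumptions: the function names and the 16-bit default width are illustrative, and the CSA is modeled as the usual three-input form that returns a sum vector and a carry vector.

```python
def carry_save_add(x, y, z):
    """CSA: reduce three numbers to a sum vector and a carry vector.

    Every bit position is handled independently, so no carry ripples.
    """
    sum_vector = x ^ y ^ z
    carry_vector = ((x & y) | (y & z) | (x & z)) << 1  # carries saved, shifted left
    return sum_vector, carry_vector


def carry_propagate_add(x, y, width=16):
    """CPA: ripple-carry addition, the carry moves from bit to bit."""
    carry = 0
    result = 0
    for i in range(width):
        xb = (x >> i) & 1
        yb = (y >> i) & 1
        result |= (xb ^ yb ^ carry) << i
        carry = (xb & yb) | (xb & carry) | (yb & carry)
    return result | (carry << width)


s, c = carry_save_add(5, 6, 7)
print(s, c)                       # 4 14
print(carry_propagate_add(s, c))  # 18, the same as 5 + 6 + 7
```

The CSA output is not the final answer on its own; a CPA is still needed once at the end to combine the sum and carry vectors.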
Fixed Arithmetic Pipeline
• We take the example of multiplication of fixed-point numbers.
• Two fixed-point numbers are multiplied by the ALU using add and shift operations.
• This sequential execution makes the multiplication a slow process.
• Observe that this is the process of adding multiple copies of shifted multiplicands, as shown below:

[Figure: partial-product rows of shifted multiplicands]
Fixed Arithmetic Pipeline
Now, we can identify the following stages for the pipeline:

• The first stage generates the partial products of the numbers, which form the six rows of shifted multiplicands.
• In the second stage, the six numbers are given to two CSAs, which merge them into four numbers.
• In the third stage, a single CSA merges the numbers into three numbers.
• In the fourth stage, a single CSA merges the three numbers into two numbers.
• In the fifth stage, the last two numbers are added through a CPA to get the final product.
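The five stages can be followed in code for a 6-bit multiplier. This is a minimal sketch, not the slides' own implementation: the function names are hypothetical, which three numbers enter the single CSA in stage 3 is an assumption, and Python's built-in addition stands in for the CPA.

```python
def csa(x, y, z):
    """Carry save adder: three numbers in, sum vector and carry vector out."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def pipelined_multiply(multiplicand, multiplier):
    """Multiply using six partial products, a CSA tree, and a final CPA."""
    # Stage 1: six rows of shifted multiplicands (a row is 0 where the bit is 0)
    rows = [(multiplicand << i) if (multiplier >> i) & 1 else 0 for i in range(6)]
    # Stage 2: two CSAs merge six numbers into four
    s1, c1 = csa(rows[0], rows[1], rows[2])
    s2, c2 = csa(rows[3], rows[4], rows[5])
    # Stage 3: one CSA merges three of the four numbers; c2 passes through
    s3, c3 = csa(s1, c1, s2)
    # Stage 4: one CSA merges the remaining three numbers into two
    s4, c4 = csa(s3, c3, c2)
    # Stage 5: a CPA adds the last two numbers (modeled with ordinary addition)
    return s4 + c4


print(pipelined_multiply(45, 27))  # 1215
```

In a hardware pipeline each stage would be separated by registers so that a new pair of operands can enter stage 1 on every clock; the sketch only shows the data flow for one pair.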
Floating-Point Operations
• The inputs to the floating-point adder pipeline are two normalized floating-point numbers:

$X = A \times 2^{a}$
$Y = B \times 2^{b}$

• A and B are the mantissas, and a and b are the exponents.

• Floating-point addition and subtraction can be performed in four segments.

[Figure: Floating-Point Add/Subtraction Pipeline]
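The four segments are not listed in the surviving slide text. The sketch below assumes the usual textbook decomposition (compare the exponents, align the mantissas, add the mantissas, normalize the result). The function name, the (mantissa, exponent) tuple format, and the normalization range of 1/2 up to 1 are all illustrative assumptions.

```python
from fractions import Fraction


def fp_add(x, y):
    """Add two numbers given as (mantissa, exponent), value = mantissa * 2**exponent."""
    (A, a), (B, b) = x, y
    # Segment 1: compare the exponents
    diff = a - b
    exponent = max(a, b)
    # Segment 2: align the mantissa that belongs to the smaller exponent
    if diff > 0:
        B = B / 2 ** diff
    elif diff < 0:
        A = A / 2 ** (-diff)
    # Segment 3: add the mantissas (for subtraction, pass a negated mantissa)
    mantissa = A + B
    # Segment 4: normalize so that 1/2 <= |mantissa| < 1
    if mantissa == 0:
        return Fraction(0), 0
    while abs(mantissa) >= 1:
        mantissa /= 2
        exponent += 1
    while abs(mantissa) < Fraction(1, 2):
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent


x = (Fraction(3, 4), 3)  # 0.75 * 2**3 = 6
y = (Fraction(1, 2), 2)  # 0.5  * 2**2 = 2
print(fp_add(x, y))      # (Fraction(1, 2), 4), i.e. 0.5 * 2**4 = 8
```

Exact fractions are used instead of hardware mantissa registers so that the alignment shift loses no bits; a real adder would also have to round the result.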
Vector Processing
• Science and engineering applications:
• Long-range weather forecasting
• Petroleum exploration
• Seismic data analysis
• Medical diagnosis
• Aerodynamics and space flight simulations
• Artificial intelligence and expert systems
• Mapping the human genome
• Image processing
Vector Processing
Vector Instruction Format:

Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length
ADD            | A                     | B                     | C                        | 100
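The effect of the ADD instruction above can be sketched on a flat memory. This is an illustrative model only: the function name, the use of a Python list as memory, and the concrete base addresses 0, 100, and 200 for A, B, and C are assumptions.

```python
def execute_vector_add(memory, source1, source2, destination, length):
    """Model of ADD: memory[dest + i] = memory[src1 + i] + memory[src2 + i]."""
    for i in range(length):
        memory[destination + i] = memory[source1 + i] + memory[source2 + i]


# Hypothetical layout: A at address 0, B at address 100, C at address 200
memory = [0] * 300
for i in range(100):
    memory[i] = i             # A(i) = i
    memory[100 + i] = 2 * i   # B(i) = 2i

execute_vector_add(memory, source1=0, source2=100, destination=200, length=100)
print(memory[200:205])  # [0, 3, 6, 9, 12]
```

A single vector instruction replaces the fetch and decode of 100 separate scalar add instructions; the loop here only models the result, not the pipelined timing.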
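Matrix Multiplication
• 3 x 3 matrix multiplication: $n^2 = 9$ inner products

$$
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\times
\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
=
\begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}
$$

• Inner product: $c_{11} = a_{11} b_{11} + a_{12} b_{21} + a_{13} b_{31}$
• Cumulative multiply-add operation: $n^3 = 27$ multiply-adds
• $c \leftarrow c + a \times b$: each inner product needs three such multiply-adds, therefore 9 x 3 multiply-adds = 27
• $c_{11} \leftarrow c_{11} + a_{11} b_{11} + a_{12} b_{21} + a_{13} b_{31}$, where the initial value of $c_{11}$ is 0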
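The count of multiply-add operations can be confirmed directly. The following is a minimal sketch with a hypothetical function name; it multiplies two square matrices using only the cumulative operation c = c + a * b and counts how often that operation runs.

```python
def matrix_multiply_counting(a, b):
    """Multiply two n x n matrices and count the multiply-add operations."""
    n = len(a)
    c = [[0] * n for _ in range(n)]  # every c[i][j] starts at 0
    multiply_adds = 0
    for i in range(n):
        for j in range(n):          # n * n inner products
            for k in range(n):      # n multiply-adds per inner product
                c[i][j] = c[i][j] + a[i][k] * b[k][j]
                multiply_adds += 1
    return c, multiply_adds


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
product, count = matrix_multiply_counting(a, identity)
print(count)    # 27
print(product)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```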
 
• Pipeline for calculating an inner product:
• Floating-point multiplier pipeline: 4 segments
• Floating-point adder pipeline: 4 segments
• Example: $C = A_1 B_1 + A_2 B_2 + A_3 B_3 + \cdots + A_k B_k$

[Figure: Source A and Source B feed the multiplier pipeline, whose output feeds the adder pipeline; the pipeline contents are shown at four points in time]
• After the 1st clock input: A1B1 is in the first segment of the multiplier pipeline.
• After the 4th clock input: the multiplier pipeline holds A4B4, A3B3, A2B2, A1B1.
• After the 8th clock input: the multiplier pipeline holds A8B8, A7B7, A6B6, A5B5, and the adder pipeline holds A4B4, A3B3, A2B2, A1B1.
• After the 9th, 10th, 11th, ... clock inputs: the adder pipeline holds partial sums such as A1B1 + A5B5 and A2B2 + A6B6.

• The four partial sums are added to form the final sum:

$$
\begin{aligned}
C = {} & A_1 B_1 + A_5 B_5 + A_9 B_9 + A_{13} B_{13} + \cdots \\
& + A_2 B_2 + A_6 B_6 + A_{10} B_{10} + A_{14} B_{14} + \cdots \\
& + A_3 B_3 + A_7 B_7 + A_{11} B_{11} + A_{15} B_{15} + \cdots \\
& + A_4 B_4 + A_8 B_8 + A_{12} B_{12} + A_{16} B_{16} + \cdots
\end{aligned}
$$
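The grouping into four partial sums can be reproduced without modeling the clock-level timing. This is a simplified sketch under the assumption that product i joins partial sum i mod 4, which matches the four rows of the final sum; the function name is hypothetical and the segment-by-segment movement of data is not simulated.

```python
def interleaved_inner_product(a, b, adder_segments=4):
    """Accumulate products into one partial sum per adder segment, then combine."""
    partial_sums = [0] * adder_segments
    for i, (x, y) in enumerate(zip(a, b)):
        # Product i is added to the partial sum that entered the adder
        # pipeline adder_segments clocks earlier.
        partial_sums[i % adder_segments] += x * y
    return sum(partial_sums), partial_sums


a = list(range(1, 17))  # A1 .. A16
b = list(range(1, 17))  # B1 .. B16
total, partial_sums = interleaved_inner_product(a, b)
print(partial_sums)  # [276, 336, 404, 480]
print(total)         # 1496
```

With floating-point data the interleaved order of additions can round differently from a simple left-to-right sum, which is why the example uses integers.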
Memory Interleaving
• Memory interleaving:
• Simultaneous access to memory from two or more sources using one memory bus system.
• Select one of 4 memory modules using the lower 2 bits of the address.
• Example: even/odd address memory access

[Figure: four memory modules, each with its own address register (AR), memory array, and data register (DR), connected to a common address bus and a common data bus]
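The module selection can be sketched as a bit-field split of the address. This is an illustrative model: the function name is hypothetical, and using the remaining upper bits as the word address inside the selected module is an assumption about the layout.

```python
def decode_address(address, module_bits=2):
    """Split an address into (module number, word address within the module)."""
    module = address & ((1 << module_bits) - 1)  # lower bits select the module
    word = address >> module_bits                # remaining bits select the word
    return module, word


# Consecutive addresses fall into different modules, so they can be
# accessed in overlapped fashion.
print([decode_address(addr)[0] for addr in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
print(decode_address(13))                              # (1, 3)
```

With module_bits=1 the same function gives the even/odd split from the example: even addresses go to module 0 and odd addresses to module 1.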
Array Processor
• A processor that performs computations on large arrays of data.

• Vector processing: uses an adder/multiplier pipeline
• Array processing: uses a separate array processor

• There are two different types of array processor:
• Attached array processor
• SIMD array processor
Attached Array Processor
• It is designed as a peripheral for a conventional host computer.
• Its purpose is to enhance the performance of the computer by providing vector processing.
• It achieves high performance by means of parallel processing with multiple functional units.

[Figure: a general-purpose computer with main memory connects through an input-output interface to the attached array processor with local memory; a high-speed memory-to-memory bus links main memory and local memory]
SIMD Array Processor
• It is a processor that consists of multiple processing units operating in parallel.
• The processing units are synchronized to perform the same task under the control of a common control unit.
• Each processing element (PE) includes an ALU, a floating-point arithmetic unit, and working registers.

[Figure: a master control unit, connected to main memory, controls processing elements PE 1 to PE n, each with its own local memory M1 to Mn]