CHAPTER NINE
Pipeline and Vector
Processing
IN THIS CHAPTER
9-1 Parallel Processing
9-2 Pipelining
9-3 Arithmetic Pipeline
9-4 Instruction Pipeline
9-5 RISC Pipeline
9-6 Vector Processing
9-7 Array Processors
9-1 Parallel Processing
Parallel processing is a term used to denote a large class of techniques that are
used to provide simultaneous data-processing tasks for the purpose of increas-
ing the computational speed of a computer system. Instead of processing each
instruction sequentially as in a conventional computer, a parallel processing
system is able to perform concurrent data processing to achieve faster execu-
tion time. For example, while an instruction is being executed in the ALU, the
next instruction can be read from memory. The system may have two or more
ALUs and be able to execute two or more instructions at the same time.
Furthermore, the system may have two or more processors operating concur-
rently. The purpose of parallel processing is to speed up the computer process-
ing capability and increase its throughput, that is, the amount of processing
that can be accomplished during a given interval of time. The amount of
hardware increases with parallel processing, and with it, the cost of the system
increases. However, technological developments have reduced hardware costs
to the point where parallel processing techniques are economically feasible.
Parallel processing can be viewed from various levels of complexity. At
the lowest level, we distinguish between parallel and serial operations by the
type of registers used. Shift registers operate in serial fashion one bit at a time,
while registers with parallel load operate with all the bits of the word simulta-
neously. Parallel processing at a higher level of complexity can be achieved by
having a multiplicity of functional units that perform identical or different
operations simultaneously. Parallel processing is established by distributing
the data among the multiple functional units. For example, the arithmetic,
logic, and shift operations can be separated into three units and the operands
diverted to each unit under the supervision of a control unit.
Figure 9-1 shows one possible way of separating the execution unit into
eight functional units operating in parallel. The operands in the registers are
applied to one of the units depending on the operation specified by the instruc-
tion associated with the operands.

Figure 9-1 Processor with multiple functional units. [Diagram: processor
registers, connected to memory, feed eight units operating in parallel:
adder-subtractor, integer multiply, logic unit, shift unit, incrementer,
floating-point add-subtract, floating-point multiply, and floating-point
divide.]

The operation performed in each functional
unit is indicated in each block of the diagram. The adder and integer multiplier
perform the arithmetic operations with integer numbers. The floating-point
operations are separated into three circuits operating in parallel. The logic,
shift, and increment operations can be performed concurrently on different
data. All units are independent of each other, so one number can be shifted
while another number is being incremented. A multifunctional organization
is usually associated with a complex control unit to coordinate all the activities
among the various components.
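The idea of distributing operands to independent functional units under a control unit can be sketched in Python. This is a minimal illustration of the organization of Fig. 9-1; the unit names and the dispatch table are our own assumptions, not taken from the figure:

```python
# Sketch of a control unit routing operands to independent functional
# units, as in Fig. 9-1. Unit names and operations are illustrative.

FUNCTIONAL_UNITS = {
    "add": lambda a, b: a + b,    # adder-subtractor
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,    # integer multiply
    "and": lambda a, b: a & b,    # logic unit
    "shl": lambda a, b: a << b,   # shift unit
    "inc": lambda a, _: a + 1,    # incrementer
}

def dispatch(op, a, b=0):
    """The control unit applies the operands to the unit named by op."""
    return FUNCTIONAL_UNITS[op](a, b)

# The units are independent: one number can be shifted
# while another number is being incremented.
print(dispatch("shl", 3, 2))  # 12
print(dispatch("inc", 7))     # 8
```

In hardware the units operate simultaneously on different registers; the sequential calls here only show how operands are diverted to the unit selected by the operation.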
There are a variety of ways that parallel processing can be classified. It
can be considered from the internal organization of the processors, from the
interconnection structure between processors, or from the flow of information
through the system. One classification introduced by M. J. Flynn considers the
organization of a computer system by the number of instructions and data
items that are manipulated simultaneously. The normal operation of a com-
puter is to fetch instructions from memory and execute them in the processor.
The sequence of instructions read from memory constitutes an instruction
stream. The operations performed on the data in the processor constitute a data
stream. Parallel processing may occur in the instruction stream, in the data
stream, or in both. Flynn’s classification divides computers into four major
groups as follows:
Single instruction stream, single data stream (SISD)
Single instruction stream, multiple data stream (SIMD)
Multiple instruction stream, single data stream (MISD)
Multiple instruction stream, multiple data stream (MIMD)
SISD represents the organization of a single computer containing a con-
trol unit, a processor unit, and a memory unit. Instructions are executed
sequentially and the system may or may not have internal parallel processing
capabilities. Parallel processing in this case may be achieved by means of
multiple functional units or by pipeline processing.
SIMD represents an organization that includes many processing units
under the supervision of a common control unit. All processors receive
the same instruction from the control unit but operate on different items of
data. The shared memory unit must contain multiple modules so that it can
communicate with all the processors simultaneously. MISD structure is only
of theoretical interest since no practical system has been constructed using this
organization. MIMD organization refers to a computer system capable of
processing several programs at the same time. Most multiprocessor and multi-
computer systems can be classified in this category.
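The two-letter codes follow mechanically from the two stream counts, which a toy Python function can make explicit (the function name and example stream counts are ours, not Flynn's):

```python
# Flynn's taxonomy as a lookup: the class depends only on whether the
# machine has one or many instruction streams and one or many data
# streams. A toy illustration, not a rigorous classifier.

def flynn_class(instruction_streams: int, data_streams: int) -> str:
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"

print(flynn_class(1, 1))   # SISD: a single conventional computer
print(flynn_class(1, 64))  # SIMD: many processors, one control unit
print(flynn_class(8, 8))   # MIMD: multiprocessor system
```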
Flynn’s classification depends on the distinction between the perform-
ance of the control unit and the data-processing unit. It emphasizes the be-
havioral characteristics of the computer system rather than its operational and
structural interconnections. One type of parallel processing that does not fit
Flynn’s classification is pipelining. The only two categories used from this
classification are SIMD array processors discussed in Sec. 9-7, and MIMD
multiprocessors presented in Chap. 13.
In this chapter we consider parallel processing under the following main
topics:
1. Pipeline processing
2. Vector processing
3. Array processors
Pipeline processing is an implementation technique where arithmetic suboper-
ations or the phases of a computer instruction cycle overlap in execution.
Vector processing deals with computations involving large vectors and ma-
trices. Array processors perform computations on large arrays of data.
9-2 Pipelining
Pipelining is a technique of decomposing a sequential process into subopera-
tions, with each subprocess being executed in a special dedicated segment that
operates concurrently with all other segments. A pipeline can be visualized as
a collection of processing segments through which binary information flows.
Each segment performs partial processing dictated by the way the task is
partitioned. The result obtained from the computation in each segment is
transferred to the next segment in the pipeline. The final result is obtained after
the data have passed through all segments. The name “pipeline” implies a
flow of information analogous to an industrial assembly line. It is characteristic
of pipelines that several computations can be in progress in distinct segments
at the same time. The overlapping of computation is made possible by associ-
ating a register with each segment in the pipeline. The registers provide
isolation between each segment so that each can operate on distinct data
simultaneously.
Perhaps the simplest way of viewing the pipeline structure is to imagine
that each segment consists of an input register followed by a combinational
circuit. The register holds the data and the combinational circuit performs the
suboperation in the particular segment. The output of the combinational circuit
in a given segment is applied to the input register of the next segment. A clock
is applied to all registers after enough time has elapsed to perform all segment
activity. In this way the information flows through the pipeline one step at a
time.
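The register-plus-combinational-circuit view of a segment can be sketched in Python. Each segment holds its data in a register, and one clock pulse moves every register's contents through that segment's circuit into the next register; the two segment functions below are placeholder suboperations of our own choosing:

```python
# One clock pulse of a pipeline: every register latches simultaneously,
# so the new contents are computed entirely from the old contents.

def step(f, r):
    # An empty register propagates "empty" rather than a value.
    return None if r is None else f(r)

def clock(registers, circuits, new_input):
    # The first register accepts the new input; each later register
    # latches the output of the previous segment's circuit.
    return [new_input] + [step(f, r) for f, r in zip(circuits, registers[:-1])]

circuits = [lambda x: x * 2, lambda x: x + 1]  # placeholder suboperations
regs = [None, None, None]                      # three empty segment registers

for item in [1, 2, 3]:
    regs = clock(regs, circuits, item)

# After three pulses the last register holds (1 * 2) + 1 = 3 for the
# first input, while the later inputs are still mid-pipeline.
print(regs)  # [3, 4, 3]
```

The point of the sketch is that several computations are in progress in distinct segments at the same time: after the third pulse, all three registers hold useful data.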
The pipeline organization will be demonstrated by means of a simple
example. Suppose that we want to perform the combined multiply and add
operations with a stream of numbers.
Ai * Bi + Ci    for i = 1, 2, 3, ..., 7
Each suboperation is to be implemented in a segment within a pipeline. Each
segment has one or two registers and a combinational circuit as shown in Fig.
9-2. R1 through R5 are registers that receive new data with every clock pulse.
The multiplier and adder are combinational circuits. The suboperations per-
formed in each segment of the pipeline are as follows:
R1 ← Ai, R2 ← Bi          Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci     Multiply and input Ci
R5 ← R3 + R4              Add Ci to product
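The three suboperations above can be simulated clock by clock in Python. This is a sketch only: the operand values are illustrative assumptions, and the drain logic at the end simply lets the last products flow out of the pipeline:

```python
# Clock-by-clock sketch of the three-segment multiply-and-add pipeline.
# Operand values are illustrative; the text's Table 9-1 is symbolic.

A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]

R1 = R2 = R3 = R4 = R5 = None
results = []
for t in range(len(A) + 2):            # 7 input pulses + 2 to drain
    # All transfers of one clock pulse, computed from the old values:
    R5 = None if R3 is None else R3 + R4         # R5 <- R3 + R4
    R3 = None if R1 is None else R1 * R2         # R3 <- R1 * R2
    R4 = C[t - 1] if 1 <= t <= len(C) else None  # R4 <- Ci
    if t < len(A):
        R1, R2 = A[t], B[t]                      # R1 <- Ai, R2 <- Bi
    else:
        R1 = R2 = None
    if R5 is not None:
        results.append(R5)

print(results)  # [8, 13, 16, 17, 16, 13, 8], i.e. each Ai * Bi + Ci
```

Note that Ci is loaded into R4 one clock after Ai and Bi enter R1 and R2, so that it arrives in segment three together with the product Ai * Bi.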
The five registers are loaded with new data every clock pulse. The effect of each
clock is shown in Table 9-1. The first clock pulse transfers A1 and B1 into R1 and
Figure 9-2 Example of pipeline processing.
[Diagram: inputs Ai and Bi are loaded into R1 and R2; a multiplier computes
R1 * R2 into R3 while Ci is loaded into R4; an adder combines R3 and R4.]