
Pipelined architecture with its diagram

Pipeline Processor
It consists of a sequence of m data-processing circuits, called stages or segments, which
collectively perform a single operation on a stream of data operands passing through them. Some
processing takes place in each stage, but a final result is obtained only after an operand set has
passed through the entire pipeline. As shown in the figure, a stage S(i) contains a multiword input
register or latch R(i) and a datapath circuit C(i) that is usually combinational. The R(i)'s hold
partially processed results as they move through the pipeline; they also serve as buffers that
prevent neighbouring stages from interfering with one another. A common clock signal causes
the R(i)'s to change state synchronously. Each R(i) receives a new set of input data D(i-1) from
the preceding stage S(i-1), except for R(1), whose data is supplied from an external source.
D(i-1) represents the results computed by C(i-1) during the preceding clock period. Once D(i-1)
has been loaded into R(i), C(i) proceeds to use D(i-1) to compute a new data set D(i). Thus, in
each clock period, every stage transfers its previous results to the next stage and computes a new
set of results.
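The register-and-stage behaviour described above can be sketched in a few lines of Python. This is an illustrative model only: the three stage functions and the dummy initial operands are invented, and each "clock tick" updates all latches simultaneously, as the common clock does.

```python
# Minimal sketch of an m-stage pipeline: each stage S(i) is a register R(i)
# feeding a combinational function C(i). On every clock period, every stage
# passes its previous result to the next stage and computes a new one.

def clock_tick(registers, stage_fns, new_input):
    """Advance the pipeline one clock period.

    registers[i] holds the operand currently latched in R(i+1);
    stage_fns[i] models the combinational circuit C(i+1).
    """
    # Each C(i) computes D(i) from the operand latched in R(i).
    outputs = [fn(r) for fn, r in zip(stage_fns, registers)]
    result = outputs[-1]                      # D(m) leaves the pipeline
    # All latches update together: R(1) takes the external input,
    # each later R(i) takes D(i-1) from the preceding stage.
    new_registers = [new_input] + outputs[:-1]
    return new_registers, result

# Hypothetical 3-stage operation: add 1, then double, then square.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x * x]
regs = [0, 0, 0]                              # dummy initial latch contents
results = []
for operand in [1, 2, 3, 4, 5]:
    regs, out = clock_tick(regs, stages, operand)
    results.append(out)
# The first valid result emerges only after the operand set has passed
# through all 3 stages; earlier outputs are pipeline fill.
```

Note how the first meaningful output appears only after the pipeline fills, matching the text: a final result is obtained only after an operand set has traversed every stage.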

Principle of Pipelining

A pipeline is a technique in which multiple instructions are executed in an overlapping fashion.
The pipeline is divided into stages, and these stages are connected to one another in cascade to
form a pipe-like structure. In a pipeline system, each stage/segment consists of an input register
followed by a combinational circuit. The register holds the data, and the combinational circuit
performs operations on it.
Here the stages are pure combinational circuits performing arithmetic or logic operations on the
data stream flowing through the pipe. The latches are high-speed registers that hold intermediate
results between the stages. Information flows between adjacent stages under the control of a
common clock applied to all the latches simultaneously.

Performance evaluation factors for a pipelined computer:

1. clock period

2. Speedup

3. Efficiency

4. Throughput
Computer Organization and Architecture
Pipelining

Pipelining is a technique used in modern processors to improve performance by executing


multiple instructions simultaneously. It breaks down the execution of instructions into several
stages, where each stage completes a part of the instruction. These stages can overlap, allowing
the processor to work on different instructions at various stages of completion, similar to an
assembly line in manufacturing.

What is Pipelining?
Pipelining is an arrangement of the CPU’s hardware components to raise the CPU’s overall
performance. In a pipelined processor, procedures called ‘stages’ are carried out in parallel,
and more than one instruction is executed at a time. Now let us look at a real-life
example that operates on the pipelined-operation concept. Consider a water bottle
packaging plant. For this case, let there be 3 processes that a bottle should go through: Inserting
the bottle (I), Filling water in the bottle (F), and Sealing the bottle (S).

It will be helpful for us to label these stages as stage 1, stage 2, and stage 3. Let each stage take 1
minute to complete its operation. Now, in a non-pipelined operation, a bottle is first inserted into
the plant, and after 1 minute it is moved to stage 2, where water is filled. During this time, nothing
is happening in stage 1. Likewise, when the bottle is in stage 3, both stage 1 and stage 2 are idle.
But in a pipelined operation, while one bottle is in stage 2, another bottle can be loaded into
stage 1. In the same way, while a bottle is in stage 3, there can be one bottle each in stage 1 and
stage 2. Therefore, at the end of stage 3, we receive a new bottle every minute. Hence, the
average time taken to manufacture 1 bottle is:

Without pipelining = 9/3 minutes = 3m

I F S | | | | | |
| | | I F S | | |
| | | | | | I F S (9 minutes)

With pipelining = 5/3 minutes = 1.67m

I F S | |
| I F S |
| | I F S (5 minutes)

Thus, pipelined operation increases the efficiency of a system.
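The bottle arithmetic above generalizes directly: n items through k equal stages take n·k time units without pipelining, but only (k + n − 1) with it. A small sketch, using the same numbers as the example:

```python
# Total and average time for n items through k equal-length stages
# (stage time t = 1 minute, as in the bottle example above).

def non_pipelined_time(n, k, t=1):
    return n * k * t          # each item occupies the whole plant alone

def pipelined_time(n, k, t=1):
    return (k + n - 1) * t    # first item takes k steps, each later item adds 1

n, k = 3, 3                   # 3 bottles, 3 stages
total_without = non_pipelined_time(n, k)    # 9 minutes
total_with = pipelined_time(n, k)           # 5 minutes
avg_without = total_without / n             # 3 minutes per bottle
avg_with = total_with / n                   # 5/3 ≈ 1.67 minutes per bottle
```

The gap widens as n grows: for a long stream of bottles, the pipelined plant approaches one finished bottle per minute regardless of k.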

Design of a basic Pipeline


In a pipelined processor, a pipeline has two ends: the input end and the output end. Between
these ends there are multiple stages/segments, such that the output of one stage is connected to
the input of the next stage, and each stage performs a specific operation.
Interface registers are used to hold the intermediate output between two stages. These interface
registers are also called latches or buffers.
All the stages in the pipeline, along with the interface registers, are controlled by a
common clock.
Execution in a pipelined processor

The execution sequence of instructions in a pipelined processor can be visualized using a
space-time diagram. For example, consider a processor having 4 stages and let there be 2
instructions to be executed. We can visualize the execution sequence through the following
space-time diagrams:

Non-Overlapped Execution

Total time = 8 cycles

Overlapped Execution

Total time = 5 cycles

Pipeline Stages

A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC
instruction set. Following are the 5 stages of the RISC pipeline with their respective operations:

1. Stage 1 (Instruction Fetch): In this stage, the CPU fetches the instruction from the
memory address stored in the program counter.

2. Stage 2 (Instruction Decode): In this stage, the instruction is decoded and the register file
is accessed to obtain the values of the registers used in the instruction.

3. Stage 3 (Instruction Execute): In this stage, activities such as ALU operations are
performed.

4. Stage 4 (Memory Access): In this stage, memory operands referenced by the instruction
are read from or written to memory.

5. Stage 5 (Write Back): In this stage, the computed/fetched value is written back to the
register specified in the instruction.

Performance of a pipelined processor

Consider a ‘k’-segment pipeline with clock cycle time ‘Tp’. Let there be ‘n’ tasks to be
completed in the pipelined processor. The first instruction takes ‘k’ cycles to come out of the
pipeline, but each of the other ‘n – 1’ instructions takes only 1 further cycle, i.e., a total of
‘n – 1’ cycles. So, the time taken to execute ‘n’ instructions in a pipelined processor is:

ETpipeline = (k + n – 1) cycles
           = (k + n – 1) Tp

where ‘k’ is the number of stages in the pipeline.

Efficiency = Given speedup / Maximum speedup = S / Smax. We know that Smax = k, so
Efficiency = S / k.

Throughput = Number of instructions / Total time to complete the instructions, so
Throughput = n / ((k + n – 1) * Tp).

Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1. Related
topics: dependencies, data hazards, types of pipelines, and stalling.
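The formulas above can be bundled into one small helper. A sketch, using the non-pipelined time n·k·Tp as the baseline for speedup:

```python
# Pipeline performance metrics for k stages, n tasks, clock period tp.

def pipeline_metrics(k, n, tp):
    """Return (execution time, speedup, efficiency, throughput)."""
    et_pipe = (k + n - 1) * tp       # ETpipeline = (k + n - 1) * Tp
    et_nonpipe = n * k * tp          # each task alone needs k cycles
    s = et_nonpipe / et_pipe         # speedup S
    eff = s / k                      # Smax = k, so Efficiency = S / k
    thr = n / et_pipe                # Throughput = n / ((k + n - 1) * Tp)
    return et_pipe, s, eff, thr

# e.g. 4 stages, 100 tasks, Tp = 10 ns:
et, s, eff, thr = pipeline_metrics(4, 100, 10)
# et = 1030 ns; speedup approaches k = 4 as n grows, so efficiency -> 1.
```

As n → ∞, the speedup tends to k and the efficiency tends to 1, which is why long instruction streams benefit most from pipelining.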

The performance of a pipeline is measured using two main metrics: throughput and latency.

What is Throughput?
● It measures the number of instructions completed per unit time.
● It represents the overall processing speed of the pipeline.
● Higher throughput indicates better performance.
● It is calculated as: Throughput = Number of instructions executed / Execution time.
● It can be affected by pipeline length, clock frequency, efficiency of instruction execution,
and the presence of pipeline hazards or stalls.
What is Latency?
● It measures the time taken for a single instruction to complete its execution.
● It represents the delay, or the time it takes for an instruction to pass through the pipeline stages.
● Lower latency indicates better performance.
● It is calculated as: Latency = Execution time / Number of instructions executed.
● It is influenced by pipeline length, depth, clock cycle time, instruction dependencies, and
pipeline hazards.

Advantages of Pipelining
● Increased Throughput: Pipelining enhances the throughput capacity of a CPU by enabling
a number of instructions to be processed at the same time at different stages. This increases
the number of instructions completed in a given period of time, improving the efficiency of
the processor.
● Improved CPU Utilization: By overlapping instructions, pipelining helps ensure that the
different sections of the CPU stay busy. This leaves little idle time in the various segments
of the pipeline and makes optimal use of the hardware resources.
● Higher Instruction Throughput: While one particular instruction is in the execution stage,
other instructions can be at varying stages of fetch, decode, execute, memory access, and
write-back. With this concurrent processing, the CPU can complete more instructions in a
given time frame than a non-pipelined processor.
● Better Performance for Repeated Tasks: Pipelining is particularly effective for workloads
with repetitive instructions, because the pipeline shortens the average time each task takes
to complete.
● Scalability: Pipelining can be implemented in different types of processors, so it scales
from simple CPUs to advanced multi-core processors.

Disadvantages of Pipelining
● Pipeline Hazards: Pipelining may result in data hazards, where instructions depend on
other instructions; control hazards, which arise due to branch instructions; and structural
hazards, where hardware resources are inadequate. These hazards can cause delays, and
careful strategies are needed to manage them so that progress is made.
● Increased Complexity: Pipelining increases the complexity of processor design compared
to non-pipelined structures. Managing the pipeline stages, dealing with hazards, and
preserving correct instruction ordering all add to the design and control considerations.
● Stall Cycles: When hazards are present, pipeline stalls or bubbles can occur, producing
idle time in certain stages of the pipeline. These stalls can cancel out some of the cycles
gained by pipelining, reducing its efficiency.
● Instruction Latency: While pipelining increases the throughput of instructions, the latency
of each individual instruction is not necessarily reduced. Every instruction must still pass
through all the pipeline stages, and the time a single instruction takes to execute may not
decrease significantly because of pipelining overheads.
● Hardware Overhead: Pipelining requires pipeline registers and control logic to manage
the stages and the data, which increases design complexity. This raises not only the
component count but also the cost of the more complicated hardware.

Pipelining is one of the most essential concepts: it improves a CPU’s capability to process
several instructions at the same time across various stages. It greatly increases the system’s
throughput and overall efficiency by making optimal use of the hardware. On its own it
enhances processing speed, but handling pipeline hazards is critical for sustaining that
efficiency. It is thus crucial for any architect developing systems for high-performance
computing to have a toolbox of efficient pipelining strategies to draw on.

Dependencies in a pipelined processor

There are mainly three types of dependencies possible in a pipelined processor:
1) Structural Dependency
2) Control Dependency
3) Data Dependency

These dependencies may introduce stalls in the pipeline. Stall: a stall is a cycle in the pipeline
without new input.

Structural dependency

This dependency arises due to resource conflicts in the pipeline. A resource conflict is a
situation in which more than one instruction tries to access the same resource in the same cycle.
A resource can be a register, memory, or the ALU. Example:

In the above scenario, in cycle 4, instructions I1 and I4 try to access the same resource
(memory), which introduces a resource conflict. To avoid this problem, the instruction must
wait until the required resource (memory, in our case) becomes available. This wait introduces
stalls in the pipeline, as shown below:

Solution for structural dependency

To minimize structural dependency stalls in the pipeline, we use a hardware mechanism called
renaming. Renaming: the memory is divided into two independent modules used to store
instructions and data separately, called code memory (CM) and data memory (DM) respectively.
CM contains all the instructions, and DM contains all the operands required by the instructions.
Control Dependency (Branch Hazards)

This type of dependency occurs with control-transfer instructions such as BRANCH, CALL,
JMP, etc. On many instruction architectures, the processor does not know the target address of
these instructions at the time it needs to insert the next instruction into the pipeline. Because of
this, unwanted instructions are fed into the pipeline. Consider the following sequence of
instructions in a program:

100: I1
101: I2 (JMP 250)
102: I3
.
.
250: BI1

Expected output: I1 -> I2 -> BI1

NOTE: Generally, the target address of the JMP instruction is known only after the ID stage.

Output sequence: I1 -> I2 -> I3 -> BI1

The output sequence is not equal to the expected output, which means the pipeline is not
behaving correctly. To correct this problem, we need to stop instruction fetch until the target
address of the branch instruction is known. This can be implemented by introducing a delay
slot until the target address is available.

Output sequence: I1 -> I2 -> Delay (Stall) -> BI1

As the delay slot performs no operation, this output sequence equals the expected output
sequence. But the slot introduces a stall in the pipeline.

Solution for control dependency

Branch prediction is the method through which stalls due to control dependency can be
eliminated. Here, a prediction about which branch will be taken is made in the first stage; when
the prediction is correct, the branch penalty is zero. Branch penalty: the number of stalls
introduced during branch operations in the pipelined processor is known as the branch penalty.

NOTE: Since the target address is available after the ID stage, the number of stalls introduced
in the pipeline is 1. Had the branch target address been available only after the ALU stage,
there would have been 2 stalls. Generally, if the target address is available after the k-th stage,
there will be (k – 1) stalls in the pipeline.

Total number of stalls introduced in the pipeline due to branch instructions
= Branch frequency * Branch penalty
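The branch-penalty rule above is a one-liner in code. A sketch, where the instruction count and branch fraction are illustrative inputs:

```python
# Stalls due to branches: if the target address becomes known only after the
# m-th stage, each branch costs (m - 1) stall cycles, so the total is
# branch frequency * penalty over the instruction stream.

def branch_stall_cycles(n_instructions, branch_fraction, target_known_after_stage):
    penalty = target_known_after_stage - 1         # (k - 1) stalls per branch
    return n_instructions * branch_fraction * penalty

# e.g. 1000 instructions, 20% branches, target known after ID (stage 2):
stalls = branch_stall_cycles(1000, 0.20, 2)        # 1 stall per branch
```

With the target available after ID (stage 2), each branch costs exactly 1 stall, matching the note above; resolving branches after the ALU stage instead would double the cost.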
Data Dependency (Data Hazard)

Let us consider an ADD instruction S such that S: ADD R1, R2, R3. The addresses read by S
are I(S) = {R2, R3}, and the addresses written by S are O(S) = {R1}. Now, we say that
instruction S2 depends on instruction S1 when their read and write sets overlap; in particular,
a read-after-write dependence exists when O(S1) ∩ I(S2) ≠ ∅.

Example: Let there be two instructions I1 and I2 such that:
I1: ADD R1, R2, R3
I2: SUB R4, R1, R2
When the above instructions are executed in a pipelined processor, a data dependency arises:
I2 tries to read R1 before I1 has written it, so I2 incorrectly gets the old value.
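The read/write-set check described above is easy to mechanize. A sketch, where the tuple encoding (opcode, dest, src1, src2) is an assumption for illustration:

```python
# Detecting a read-after-write (RAW) hazard between two instructions using
# the read set I(S) and write set O(S) from the text.
# Instruction encoding (opcode, dest, src1, src2) is invented for this sketch.

def reads(instr):             # I(S): source registers
    return {instr[2], instr[3]}

def writes(instr):            # O(S): destination register
    return {instr[1]}

def raw_hazard(s1, s2):
    """S2 depends on S1 (RAW) when O(S1) intersects I(S2)."""
    return bool(writes(s1) & reads(s2))

i1 = ("ADD", "R1", "R2", "R3")   # I1: R1 <- R2 + R3
i2 = ("SUB", "R4", "R1", "R2")   # I2: R4 <- R1 - R2
# raw_hazard(i1, i2) is True: I2 reads R1 before I1 writes it back.
```

Real pipelines avoid many of these stalls with operand forwarding, but the detection condition is exactly this set intersection.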

Uniform delay pipeline

In this type of pipeline, all the stages take the same time to complete an operation. In a uniform
delay pipeline, Cycle Time (Tp) = Stage Delay. If buffers are included between the stages, then
Cycle Time (Tp) = Stage Delay + Buffer Delay.

Non-Uniform delay pipeline

In this type of pipeline, different stages take different times to complete an operation. Here,
Cycle Time (Tp) = Maximum (Stage Delay). For example, if there are 4 stages with delays
1 ns, 2 ns, 3 ns, and 4 ns, then Tp = Maximum(1 ns, 2 ns, 3 ns, 4 ns) = 4 ns. If buffers are
included between the stages, Tp = Maximum(Stage Delay + Buffer Delay).

Example: Consider a 4-segment pipeline with stage delays (2 ns, 8 ns, 3 ns, 10 ns). Find the
time taken to execute 100 tasks in the above pipeline.
Solution: As the above pipeline is a non-uniform delay pipeline, Tp = max(2, 8, 3, 10) = 10 ns.
We know that ETpipeline = (k + n – 1) Tp = (4 + 100 – 1) × 10 ns = 1030 ns.
NOTE: MIPS = million instructions per second.
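The worked example above can be checked mechanically. A sketch of the non-uniform-delay calculation:

```python
# Execution time for a non-uniform delay pipeline: the clock period is set
# by the slowest stage (plus any buffer delay), then the usual (k + n - 1)
# formula applies.

def nonuniform_pipeline_time(stage_delays, n, buffer_delay=0):
    tp = max(d + buffer_delay for d in stage_delays)   # Tp = max stage delay
    k = len(stage_delays)
    return (k + n - 1) * tp                            # ET = (k + n - 1) * Tp

# The example above: stage delays (2, 8, 3, 10) ns, 100 tasks.
et = nonuniform_pipeline_time([2, 8, 3, 10], 100)      # 1030 ns
```

Note how the 10 ns stage dominates: making the 2 ns and 3 ns stages faster would not change the result at all, which is why balanced stage delays matter in pipeline design.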
Performance of a pipeline with stalls

Speed Up (S) = Performance(non-pipeline) / Performance(pipeline)
=> S = Average Execution Time(non-pipeline) / Average Execution Time(pipeline)
=> S = [CPI(non-pipeline) * Cycle Time(non-pipeline)] / [CPI(pipeline) * Cycle Time(pipeline)]
=> S = [CPI(non-pipeline) * Clock frequency(pipeline)] / [CPI(pipeline) * Clock frequency(non-pipeline)]

The ideal CPI of a pipelined processor is 1, but due to stalls it becomes greater than 1.

=> S = [CPI(non-pipeline) * Cycle Time(non-pipeline)] /
       [(1 + Number of stalls per instruction) * Cycle Time(pipeline)]

As Cycle Time(non-pipeline) = Cycle Time(pipeline),

Speed Up (S) = CPI(non-pipeline) / (1 + Number of stalls per instruction)

Problems in Instruction Pipelining

● Time Variation: Not all stages take the same amount of time, so the speed gain of a
pipeline is determined by its slowest stage. This problem is particularly acute in instruction
processing, since different instructions have different operand requirements and sometimes
vastly different processing times.
● Data Hazards: When several instructions are executed in parallel, a problem arises if they
reference the same data. We must ensure that a later instruction does not attempt to access
a data item before a preceding instruction has finished updating it; otherwise, incorrect
results occur.
● Branching: In order to fetch the “next” instruction, the processor may not know which one
is required. If the present instruction is a conditional branch, the next instruction may not
be known until the current one is processed.
● Interrupts: Interrupts insert unplanned “extra” instructions into the instruction stream.
An interrupt must take effect between instructions, that is, when one instruction has
completed and the next has not yet begun.
Introduction of Control Unit and its Design

A Central Processing Unit is the most important component of a computer system. A control unit
is a part of the CPU. A control unit controls the operations of all parts of the computer but it does
not carry out any data processing operations.

What is a Control Unit?


The Control Unit is the part of the computer’s central processing unit (CPU), which directs the
operation of the processor. It was included as part of the Von Neumann Architecture by John von
Neumann. It is the responsibility of the control unit to tell the computer’s memory,
arithmetic/logic unit, and input and output devices how to respond to the instructions that have
been sent to the processor. It fetches internal instructions of the programs from the main memory
to the processor instruction register, and based on this register contents, the control unit generates
a control signal that supervises the execution of these instructions. A control unit works by
receiving input information which it converts into control signals, which are then sent to the
central processor. The computer’s processor then tells the attached hardware what operations to
perform. The functions that a control unit performs are dependent on the type of CPU because
the architecture of the CPU varies from manufacturer to manufacturer.

Examples of devices that require a CU are:

● Central Processing Units (CPUs)
● Graphics Processing Units (GPUs)

Functions of the Control Unit


● It coordinates the sequence of data movements into, out of, and between a processor’s
many sub-units.
● It interprets instructions.
● It controls data flow inside the processor.
● It receives external instructions or commands, which it converts into a sequence of control
signals.
● It controls many execution units(i.e. ALU , data buffers and registers ) contained within a
CPU.
● It also handles multiple tasks, such as fetching, decoding, execution handling and storing
results.

Types of Control Unit


There are two types of control units:

● Hardwired control unit
● Microprogrammed (micro-programmable) control unit

Hardwired Control Unit


In the hardwired control unit, the control signals that are important for instruction execution
control are generated by specially designed hardware logic circuits, in which the signal
generation method cannot be modified without physically changing the circuit structure. The
operation code of an instruction contains the basic data for control signal generation. The
operation code is decoded in the instruction decoder, which consists of a set of decoders that
decode different fields of the instruction opcode.
As a result, a few of the output lines coming from the instruction decoder carry active signal
values. These output lines are connected to the inputs of the matrix that generates control
signals for the execution units of the computer. This matrix implements logical combinations
of the decoded signals from the instruction opcode with the outputs of the matrix that generates
signals representing consecutive control unit states, and with signals coming from outside the
processor, e.g. interrupt signals. The matrices are built in a similar way to programmable logic
arrays.
Control signals for an instruction execution have to be generated not in a single time point but
during the entire time interval that corresponds to the instruction execution cycle. Following the
structure of this cycle, the suitable sequence of internal states is organized in the control unit. A
number of signals generated by the control signal generator matrix are sent back to inputs of the
next control state generator matrix.
This matrix combines these signals with timing signals, which are generated by the timing
unit based on the rectangular patterns usually supplied by a quartz generator. When a new
instruction arrives at the control unit, the control unit is in the initial state of new instruction
fetching. Instruction decoding lets the control unit enter the first state relating to execution of
the new instruction, which lasts as long as the timing signals and other input signals, such as
flags and state information of the computer, remain unaltered.
A change in any of the signals mentioned above triggers a change of the control unit state.
As a result, a new respective input is generated for the control signal generator matrix. When
an external signal appears (e.g. an interrupt), the control unit enters the next control state, the
state concerned with the reaction to this external signal (e.g. interrupt processing).
The values of the flags and state variables of the computer are used to select suitable states
for the instruction execution cycle. The last states in the cycle are control states that commence
fetching the next instruction of the program: sending the program counter contents to the main
memory address buffer register and then reading the instruction word into the instruction
register of the computer. When the ongoing instruction is the stop instruction that ends program
execution, the control unit enters an operating system state, in which it waits for the next user
directive.
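The hardwired scheme described above — a matrix mapping the current control state and decoded opcode to control signals and a next state — is essentially a finite state machine. A toy sketch; the states, opcode, and signal names are invented for illustration and do not model any real processor:

```python
# Toy model of a hardwired control unit: the "matrix" is a lookup table from
# (current state, opcode) to (asserted control signals, next state).

CONTROL_MATRIX = {
    # (state, opcode):      (control signals asserted,            next state)
    ("FETCH", None):        ({"mem_read", "ir_load", "pc_inc"},   "DECODE"),
    ("DECODE", "ADD"):      ({"reg_read"},                        "EXEC_ADD"),
    ("EXEC_ADD", "ADD"):    ({"alu_add", "reg_write"},            "FETCH"),
}

def step(state, opcode):
    """One clock: combine state and opcode through the matrix, emit signals."""
    key = (state, None) if state == "FETCH" else (state, opcode)  # fetch ignores opcode
    signals, next_state = CONTROL_MATRIX[key]
    return signals, next_state

# Walk one hypothetical ADD instruction through fetch, decode, execute.
state, trace = "FETCH", []
for _ in range(3):
    signals, state = step(state, "ADD")
    trace.append(sorted(signals))
# After the last step the unit is back in FETCH, ready for the next instruction.
```

Changing what any state emits would mean rewiring `CONTROL_MATRIX` — the software analogue of the physical circuit change the text says hardwired units require.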
Micro Programmable control unit
The fundamental difference between this unit’s structure and that of the hardwired control
unit is the existence of the control store, which is used for storing words containing the encoded
control signals required for instruction execution. In a microprogrammed control unit,
subsequent instruction words are fetched into the instruction register in the normal way.
However, the operation code of each instruction is not directly decoded to enable immediate
control signal generation; instead, it contains the initial address of a microprogram held in the
control store.
With a single-level control store: the instruction opcode from the instruction register is sent
to the control store address register. Based on this address, the first microinstruction of the
microprogram that interprets the execution of this instruction is read into the microinstruction
register. This microinstruction contains, in its operation part, encoded control signals, normally
as a few bit fields. The fields are decoded in a set of microinstruction field decoders. The
microinstruction also contains the address of the next microinstruction of the given
instruction’s microprogram, and a control field used to control the activities of the
microinstruction address generator.

The last-mentioned field determines the addressing mode (addressing operation) to be applied
to the address embedded in the ongoing microinstruction. In microinstructions with a
conditional addressing mode, this address is refined using the processor condition flags that
represent the status of computations in the current program. The last microinstruction in the
given instruction’s microprogram is the one that fetches the next instruction from main
memory into the instruction register.
With a two-level control store: in a control unit with a two-level control store, a nano-instruction
memory is included in addition to the control memory for microinstructions. In such a control
unit, microinstructions do not contain encoded control signals. Instead, the operation part of a
microinstruction contains the address of a word in the nano-instruction memory, which contains
the encoded control signals. The nano-instruction memory contains all the combinations of
control signals that appear in the microprograms interpreting the complete instruction set of a
given computer, each written once in the form of a nano-instruction.

In this way, unnecessary storage of the same operation parts of microinstructions is avoided.
The microinstruction word can then be much shorter than with a single-level control store.
This gives a much smaller microinstruction memory size in bits and, as a result, a much smaller
size for the entire control memory. The microinstruction memory contains the control for
selecting consecutive microinstructions, while the control signals themselves are generated on
the basis of nano-instructions. In nano-instructions, control signals are frequently encoded
using the 1-bit-per-signal method, which eliminates decoding.
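The two-level indirection described above can be sketched as two lookup tables: a micro level that sequences, and a nano level that holds the deduplicated, 1-bit-per-signal control words. All memory contents here are invented for illustration:

```python
# Toy two-level control store: microinstructions carry an address into a
# nano-instruction memory; only the nano level holds actual control signals.

NANO_MEMORY = [                 # each entry: one combination of control signals,
    {"mem_read": 1, "ir_load": 1},   # stored once, 1 bit per signal
    {"alu_add": 1, "reg_write": 1},
]

MICRO_MEMORY = {                # microinstruction: (nano address, next micro address)
    0: (0, 1),                  # emit fetch-style signals, then go to micro addr 1
    1: (1, None),               # emit execute-style signals, end of microprogram
}

def run_microprogram(start_addr):
    """Follow the micro-level sequencing, emitting nano-level control words."""
    addr, emitted = start_addr, []
    while addr is not None:
        nano_addr, addr = MICRO_MEMORY[addr]
        emitted.append(NANO_MEMORY[nano_addr])   # signals come from the nano level
    return emitted

signals = run_microprogram(0)
# Many microinstructions can point at the same nano word, which is exactly
# the storage saving the text describes.
```

A single-level store would copy the full control word into every microinstruction; here each microinstruction carries only a short nano address, which is why the microinstruction word shrinks.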

Advantages of a Well-Designed Control Unit


Efficient instruction execution: A well-designed control unit can execute instructions more
efficiently by optimizing the instruction pipeline and minimizing the number of clock cycles
required for each instruction.

Improved performance: A well-designed control unit can improve the performance of the CPU
by increasing the clock speed, reducing the latency, and improving the throughput.

Support for complex instructions: A well-designed control unit can support complex instructions
that require multiple operations, reducing the number of instructions required to execute a
program.
Improved reliability: A well-designed control unit can improve the reliability of the CPU by
detecting and correcting errors, such as memory errors and pipeline stalls.

Lower power consumption: A well-designed control unit can reduce power consumption by
optimizing the use of resources, such as registers and memory , and reducing the number of
clock cycles required for each instruction.

Better branch prediction: A well-designed control unit can improve branch prediction accuracy,
reducing the number of branch mispredictions and improving performance.

Improved scalability: A well-designed control unit can improve the scalability of the CPU,
allowing it to handle larger and more complex workloads.

Better support for parallelism: A well-designed control unit can better support parallelism,
allowing the CPU to execute multiple instructions simultaneously and improve overall
performance.

Improved security: A well-designed control unit can improve the security of the CPU by
implementing security features such as address space layout randomization and data execution
prevention.

Lower cost: A well-designed control unit can reduce the cost of the CPU by minimizing the
number of components required and improving manufacturing efficiency.
Disadvantages of a Poorly-Designed Control Unit

Reduced performance: A poorly-designed control unit can reduce the performance of the CPU
by introducing pipeline stalls, increasing the latency, and reducing the throughput.

Increased complexity: A poorly-designed control unit can increase the complexity of the CPU,
making it harder to design, test, and maintain.

Higher power consumption: A poorly-designed control unit can increase power consumption by
inefficiently using resources, such as registers and memory, and requiring more clock cycles for
each instruction.

Reduced reliability: A poorly-designed control unit can reduce the reliability of the CPU by
introducing errors, such as memory errors and pipeline stalls.

Limitations on instruction set: A poorly-designed control unit may limit the instruction set of the
CPU, making it harder to execute complex instructions and limiting the functionality of the CPU.
Inefficient use of resources: A poorly-designed control unit may inefficiently use resources such
as registers and memory, leading to wasted resources and reduced performance.

Limited scalability: A poorly-designed control unit may limit the scalability of the CPU, making
it harder to handle larger and more complex workloads.

Poor support for parallelism: A poorly-designed control unit may limit the ability of the CPU to
support parallelism, reducing the overall performance of the system.

Security vulnerabilities: A poorly-designed control unit may introduce security vulnerabilities,
such as buffer overflows or code injection attacks.

Higher cost: A poorly-designed control unit may increase the cost of the CPU by requiring
additional components or increasing the manufacturing complexity.
