Execution and Throughput

Last Updated : 10 Nov, 2025

Pipelining is an arrangement of the CPU's hardware components that raises the CPU's overall performance. In a pipelined processor, the execution of an instruction is divided into steps called 'stages' that operate in parallel, so more than one instruction is in execution at the same time.

Each stage works on a different part of an instruction.
The goal is to complete one instruction per clock cycle once the pipeline is full.

Execution in a Pipelined Processor

Execution sequence of instructions in a pipelined processor can be visualised using a space-time diagram. For example, consider a processor having 4 stages and let there be 2 instructions to be executed. We can visualize the execution sequence through the following space-time diagrams:

Non-Overlapped Execution

Stage / Cycle	1	2	3	4	5	6	7	8
S1	I₁				I₂
S2		I₁				I₂
S3			I₁				I₂
S4				I₁				I₂

Total time = 8 cycles

Overlapped Execution

Stage / Cycle	1	2	3	4	5
S1	I₁	I₂
S2		I₁	I₂
S3			I₁	I₂
S4				I₁	I₂

Total time = 5 cycles

Pipeline Stages

A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set. The following are the 5 stages of the RISC pipeline with their respective operations:

Stage 1 (Instruction Fetch): In this stage, the CPU fetches the instruction from the memory address held in the program counter.
Stage 2 (Instruction Decode): In this stage, the instruction is decoded and the register file is accessed to obtain the values of the registers used in the instruction.
Stage 3 (Instruction Execute): In this stage, operations such as ALU computations are performed.
Stage 4 (Memory Access): In this stage, memory operands are read from or written to the memory location specified by the instruction.
Stage 5 (Write Back): In this stage, the computed or fetched value is written back to the register specified by the instruction.
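The way instructions flow through these stages can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real processor: it assumes an ideal pipeline with no stalls, one cycle per stage, and the stage abbreviations IF, ID, EX, MEM and WB, which are labels chosen here for the five stages listed above. The function names are hypothetical.

```python
# Abbreviations assumed for the five stages described above.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def space_time_diagram(num_instructions, stages=STAGES):
    """Return the rows of an ideal (stall-free) space-time diagram.

    Row s lists, cycle by cycle, which instruction occupies stage s.
    Instruction i (0-based) occupies stage s (0-based) in cycle i + s.
    """
    k = len(stages)
    total_cycles = k + num_instructions - 1
    rows = []
    for s in range(k):
        row = []
        for cycle in range(total_cycles):
            i = cycle - s  # instruction in stage s during this cycle
            row.append(f"I{i + 1}" if 0 <= i < num_instructions else "")
        rows.append(row)
    return rows


def format_diagram(rows, stages=STAGES):
    """Render the rows as a tab-separated table with a cycle header."""
    total_cycles = len(rows[0])
    lines = ["Stage / Cycle\t" + "\t".join(str(c + 1) for c in range(total_cycles))]
    for name, row in zip(stages, rows):
        lines.append(name + "\t" + "\t".join(row))
    return "\n".join(lines)


# Three instructions through the five-stage pipeline.
print(format_diagram(space_time_diagram(3)))
```

Passing a four-element stage list and two instructions reproduces the shape of the overlapped diagram shown earlier, with five cycles in total.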

Performance of a Pipelined Processor

Consider a 'k' segment pipeline with clock cycle time 'Tp'. Let there be 'n' tasks to be completed in the pipelined processor. The first instruction takes 'k' cycles to come out of the pipeline, but each of the other 'n – 1' instructions takes only '1' additional cycle, i.e., a total of 'n – 1' cycles. So, the time taken to execute 'n' instructions in a pipelined processor is:

                     ET_pipeline = k + n – 1 cycles
                              = (k + n – 1) Tp

In the same case, for a non-pipelined processor, the execution time of 'n' instructions will be:

                    ET_non-pipeline = n * k * Tp
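The two execution-time formulas can be checked numerically with a small sketch. The function and parameter names below are chosen for illustration, and the model assumes an ideal pipeline in which every stage takes exactly one cycle of length tp.

```python
def et_pipeline(k, n, tp):
    """Execution time of n instructions on a k-stage pipeline: (k + n - 1) * Tp."""
    return (k + n - 1) * tp


def et_non_pipeline(k, n, tp):
    """Execution time of n instructions without pipelining: n * k * Tp."""
    return n * k * tp


# The 4-stage, 2-instruction example from the space-time diagrams,
# using one time unit per cycle.
print(et_pipeline(4, 2, 1))      # 5
print(et_non_pipeline(4, 2, 1))  # 8
```

These match the totals of 5 and 8 cycles read off the overlapped and non-overlapped diagrams.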

So, speedup (S) of the pipelined processor over the non-pipelined processor, when 'n' tasks are executed on the same processor is:

    S = Performance of non-pipelined processor /
        Performance of pipelined processor

As the performance of a processor is inversely proportional to the execution time, we have,

   S = ET_non-pipeline / ET_pipeline
    => S =  [n * k * Tp] / [(k + n – 1) * Tp]
       S = [n * k] / [k + n – 1]

When the number of tasks 'n' is significantly larger than k, that is, n >> k, the denominator k + n – 1 is approximately n, so:

    S ≈ n * k / n
    S ≈ k
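To see how the speedup approaches k as n grows, the formula S = (n * k) / (k + n – 1) can be evaluated for increasing n. This is a minimal sketch; the choice of k = 5 and the sample values of n are arbitrary.

```python
def speedup(k, n):
    """Speedup of a k-stage pipeline over a non-pipelined processor for n tasks."""
    return (n * k) / (k + n - 1)


k = 5  # arbitrary stage count chosen for illustration
for n in (1, 10, 100, 10_000):
    print(f"n = {n:>6}: S = {speedup(k, n):.3f}")
```

With a single instruction there is no overlap and the speedup is 1; as n increases, the printed values climb toward k without ever exceeding it.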

where 'k' is the number of stages in the pipeline.

Also,

    Efficiency = Given speedup / Maximum speedup = S / S_max

We know that S_max = k, so,

    Efficiency = S / k

Throughput

The performance of a pipeline is measured using two main metrics: throughput and latency.

    Throughput = Number of instructions / Total time to complete the instructions
               = n / [(k + n – 1) * Tp]

Note: The cycles per instruction (CPI) value of an ideal pipelined processor is 1. This is the value to use when n is not given or when n is very large; in that case the throughput approaches 1 / Tp.

It measures the number of instructions completed per unit time.
It represents the overall processing speed of the pipeline.
Higher throughput indicates a faster pipeline.
It is calculated as: Throughput = Number of instructions executed / Execution time.
It can be affected by pipeline length, clock frequency, efficiency of instruction execution, and the presence of pipeline hazards or stalls.
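The throughput formula, together with the efficiency ratio S / k defined earlier, can be sketched as follows. The names are illustrative and the model assumes no hazards or stalls, so real pipelines would generally score lower.

```python
def throughput(k, n, tp):
    """Instructions completed per unit time: n / ((k + n - 1) * Tp)."""
    return n / ((k + n - 1) * tp)


def efficiency(k, n):
    """Achieved speedup divided by the maximum speedup k."""
    s = (n * k) / (k + n - 1)
    return s / k


# 4 stages, 2 instructions, one time unit per cycle:
# 2 instructions finish in 5 cycles.
print(throughput(4, 2, 1))  # 0.4
print(efficiency(4, 2))

# With many instructions, throughput approaches 1 / Tp.
print(throughput(4, 1_000_000, 1))
```

In this ideal model, efficiency simplifies to n / (k + n – 1), which is why it tends to 1 as n grows.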

Latency

It measures the time taken for a single instruction to complete its execution.
It represents the delay for an instruction to pass through all the pipeline stages.
Lower latency indicates better performance.
For a single instruction in a k-stage pipeline with no stalls, it is calculated as: Latency = k * Tp. The ratio Execution time / Number of instructions executed gives the average time per instruction rather than the latency of any one instruction.
It is influenced by pipeline depth, clock cycle time, instruction dependencies, and pipeline hazards.
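The difference between the time one instruction spends in the pipeline and the average time per instruction over a long run can be illustrated with a short sketch. It assumes a stall-free pipeline in which a single instruction passes through all k stages at one cycle of length tp each; the function names and sample numbers are hypothetical.

```python
def instruction_latency(k, tp):
    """Time for one instruction to traverse all k stages, assuming no stalls."""
    return k * tp


def average_time_per_instruction(k, n, tp):
    """Total pipelined execution time divided by the number of instructions."""
    return (k + n - 1) * tp / n


k, tp = 5, 2  # 5 stages, 2 time units per cycle (illustrative values)
print(instruction_latency(k, tp))                 # 10
print(average_time_per_instruction(k, 1, tp))     # 10.0
print(average_time_per_instruction(k, 1000, tp))  # close to tp
```

Under these assumptions, pipelining leaves the per-instruction latency at k * Tp, while the average time per instruction falls toward Tp as more instructions overlap.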

Suggested Quiz

7 Questions

What is the main goal of pipelining in a CPU?

A. Reduce clock speed
B. Complete one instruction per clock cycle after the pipeline is filled
C. Increase memory size
D. Reduce power consumption

Explanation:

The goal is to achieve 1 instruction per cycle after the pipeline is filled, improving throughput.

How does pipelining break down an instruction?

A. Into parallel independent tasks
B. Into smaller sequential stages
C. Into random chunks
D. Into one single stage

Explanation:

Each instruction is divided into sequential stages (e.g., fetch, decode, execute) that are carried out in order.

Why is pipelining compared to a manufacturing assembly line?

A. Both run slower
B. Both process one item fully at a time
C. Both overlap tasks on different items
D. Both use only one worker

Explanation:

Like bottles moving through stations, instructions move through CPU stages in parallel.

In a pipelined CPU, multiple instructions are:

A. Executed one after another completely
B. Processed simultaneously in different stages
C. Stored in cache only
D. Delayed until clock cycle ends

Explanation:

Different instructions occupy different pipeline stages at the same time.

In an ideal pipelined processor, the CPI (Cycles Per Instruction) is:

A. k
B. n
C. 1
D. 0

Explanation:

After pipeline fill, one instruction completes per cycle → CPI = 1.

When n >> k, the maximum possible speedup of a k-stage pipeline is:

A. n
B. k
C. n – 1
D. 1

Explanation:

As n becomes very large, S ≈ k, so max speedup equals number of stages.

In overlapped (pipelined) execution of 2 instructions with 4 stages, total time is:

A. 8 cycles
B. 5 cycles
C. 4 cycles
D. 2 cycles

Explanation:

First instruction: 4 cycles; second finishes 1 cycle later → total 5 cycles.


kartik

Article Tags :

Computer Organization & Architecture
