PIPELINE

Uploaded by vidhi

Q: For an n-stage pipeline implementation of some computation, the maximum speedup that
can be obtained is upper bounded by:
a. 2n
b. n
c. 2n
d. None of the above
Correct answer is (b).
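The maximum speedup that can be obtained in a pipeline is upper bounded by the
number of stages: with equal stage delays and no stalls, m tasks take (n + m - 1) cycles
instead of n*m, so the speedup n*m / (n + m - 1) approaches n as m grows but never exceeds it.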
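A quick numerical check of this bound, as a minimal sketch that assumes equal stage delays, no stalls and zero latch overhead (an idealisation, not part of the question). The function name `ideal_speedup` is illustrative.

```python
def ideal_speedup(stages: int, tasks: int) -> float:
    """Speedup of an ideal pipeline over a non-pipelined unit.

    Assumes equal stage delays, no stalls and zero latch overhead.
    Non-pipelined time: stages * tasks cycles.
    Pipelined time: (stages - 1) fill cycles plus one cycle per task.
    """
    return (stages * tasks) / (stages + tasks - 1)


for tasks in (1, 10, 100, 10_000):
    # The speedup grows with the number of tasks but stays below `stages`.
    print(tasks, round(ideal_speedup(5, tasks), 3))
```

With a single task there is no overlap at all, so the speedup is exactly 1.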

Q: Consider the following processors, where the inter-stage pipeline registers are assumed
to be of zero latency, and the stage delays are specified in nanoseconds. Which of the
following pipelines will have the highest clock frequency?
a. 4-stage pipeline with stage delays 1, 2, 2 and 1
b. 4-stage pipeline with stage delays 1, 1.5, 1.5, and 1.5
c. 5-stage pipeline with stage delays 0.5, 1, 1, 0.6 and 1
d. 5-stage pipeline with stage delays 0.5, 0.5, 0.3, 1 and 1.1
Correct answer is (c).
Maximum clock frequency is limited by the slowest pipeline stage. The slowest stage
delay is the smallest in option (c), namely “1”. For (a), it is “2”; for (b), it is
“1.5”; and for (d), it is “1.1”.
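The same comparison as a short sketch: the clock period must cover the slowest stage, so the achievable frequency is 1 / max(stage delay). Delays are in ns, so the result is in GHz; the dictionary layout is just an illustrative encoding of the four options.

```python
# Stage delays in ns for each option; inter-stage registers add zero delay,
# as the question states.
OPTIONS = {
    "a": [1, 2, 2, 1],
    "b": [1, 1.5, 1.5, 1.5],
    "c": [0.5, 1, 1, 0.6, 1],
    "d": [0.5, 0.5, 0.3, 1, 1.1],
}


def max_clock_ghz(stage_delays_ns):
    """The clock period must cover the slowest stage, so f = 1 / max delay."""
    return 1 / max(stage_delays_ns)


freqs = {name: max_clock_ghz(d) for name, d in OPTIONS.items()}
best = max(freqs, key=freqs.get)
print(best, freqs[best])
```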

Q:
The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first
stage is replaced with a functionally equivalent design involving two stages with
respective delays 600 and 350 picoseconds. The throughput of the pipeline increases by
………… percent.
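Correct answer is 33.3%.
Pipeline 1: To process n data items, time = (3 + n) * 800 ps
Throughput = 1/800 per ps (approx., for large n)
Pipeline 2: The stages are now 600, 350, 500, 400 and 300 ps, so to process n data items, time = (4 + n) * 600 ps
Throughput = 1/600 per ps (approx., for large n)
% improvement = (1/600 - 1/800) / (1/800) * 100 = 33.3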
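A minimal sketch of the same calculation, assuming (as above) that the clock equals the slowest stage delay and that the registers add no delay. `pipeline_time` is an illustrative helper, not from the source.

```python
def pipeline_time(stage_delays_ps, items):
    """Time to process `items` data items: (k - 1) fill cycles plus one per item."""
    clock = max(stage_delays_ps)
    return (len(stage_delays_ps) - 1 + items) * clock


old = [800, 500, 400, 300]
new = [600, 350, 500, 400, 300]  # first stage replaced by 600 + 350 ps

# Steady-state throughput is one item per clock, i.e. 1 / max(stage delay).
gain_pct = (max(old) / max(new) - 1) * 100
print(round(gain_pct, 1))

# For a finite batch the gain is slightly smaller because of the extra fill stage.
n = 1000
print(pipeline_time(old, n), pipeline_time(new, n))
```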

Q:
What are the drawbacks of implementing multicycle operations in a single clock cycle by
slowing down the clock?
a. The pipeline control becomes more complex.
b. Causes severe degradation of performance, as all other operations are also slowed
down.
c. Additional types of data hazards can show up.
d. None of the above.
Correct answer is (b).
Simply slowing down the clock will never result in (a) or (c). However, performance
will degrade because every operation, not only the multicycle ones, then takes the longer clock period.
Q. The following may occur when multicycle operations are allowed in the execution unit
(EX stage):
a. A later instruction may finish earlier.
b. Two or more instructions may try to write into a register simultaneously in WB stage.
c. RAW hazards resulting in several stall cycles can arise.
d. All of the above.
Correct answer is (d).
All of (a), (b) and (c) can happen for multicycle operations in the EX stage.

Q:
The Fetch, Decode, Execute, Memory and Write Back stages of a pipelined processor have
the latencies 200ps, 140ps, 160ps, 190ps and 100ps respectively. Assume that when
pipelining, each pipeline stage costs 10ps extra for the registers between pipeline stages. If
you could split one of the pipeline stages into 2 equal halves, what is the new latency (in ps)
for an instruction? (rounded to one decimal point)
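No worked answer is given for this question. A minimal sketch, assuming the stage that is split is the slowest one (Fetch, 200 ps), since that is the only split that shortens the clock period, and that every stage, including both new halves, pays the 10 ps register overhead. The helper names are illustrative.

```python
REG_OVERHEAD_PS = 10
STAGES_PS = [200, 140, 160, 190, 100]  # Fetch, Decode, Execute, Memory, Write Back


def instruction_latency(stages):
    """Every stage takes one clock = slowest stage + register overhead."""
    return len(stages) * (max(stages) + REG_OVERHEAD_PS)


def split_slowest(stages):
    """Replace the slowest stage with two halves of equal delay."""
    i = stages.index(max(stages))
    half = stages[i] / 2
    return stages[:i] + [half, half] + stages[i + 1:]


before = instruction_latency(STAGES_PS)
after = instruction_latency(split_slowest(STAGES_PS))
print(f"{before:.1f} ps -> {after:.1f} ps")
```

Under these assumptions the clock period drops from 210 ps to 200 ps (Memory becomes the slowest stage), but the latency of a single instruction grows because it now passes through six registered stages.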

Q:
Suppose that an unpipelined processor has a cycle time of 25 ns, and that its datapath is
made up of modules with latencies of 2, 3, 4, 7, 3, 2 and 4 ns (in that order). In pipelining this
processor, it is not possible to rearrange the order of the modules (for example, putting the
register read stage before the instruction decode stage) or to divide a module into multiple
pipeline stages (for complexity reasons). Given pipeline latches with 1 ns latency, if the
processor is divided into the fewest number of stages that allows it to achieve the minimum
latency from part 1, what is the latency of the pipeline?
(a). no latency
(b). 35 ns latency
(c). 40 ns latency
(d). 56 ns latency

Solution:

The question asks for the fewest number of stages.

Also, we cannot change the order of the modules, so we can only combine consecutive modules
such that the maximum stage latency is 7 ns (the 7 ns module cannot be split, so no grouping
can have a smaller maximum, and we want the lowest latency possible). One possible combination is:

2, (3 + 4), 7, 3, (2 + 4) = 2, 7, 7, 3, 6

k = 5 and max(2, 7, 7, 3, 6) = 7
latch latency = 1ns

Therefore, the latency of the pipeline is 5*(7+1) = 40 ns, i.e. option (c).
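The grouping can also be found mechanically. A minimal sketch that greedily packs consecutive modules into one stage until adding the next module would exceed the 7 ns limit; for this contiguous-partition problem greedy packing gives the fewest stages. It finds a different but equally valid grouping from the one above. `fewest_stages` is an illustrative name.

```python
def fewest_stages(modules_ns, limit_ns):
    """Greedily group consecutive modules into stages of at most limit_ns."""
    if max(modules_ns) > limit_ns:
        raise ValueError("a single module exceeds the stage limit")
    stages, current = [], 0
    for m in modules_ns:
        if current + m > limit_ns:
            stages.append(current)  # close the current stage
            current = m
        else:
            current += m
    stages.append(current)
    return stages


MODULES = [2, 3, 4, 7, 3, 2, 4]
LATCH_NS = 1
stages = fewest_stages(MODULES, max(MODULES))
latency = len(stages) * (max(stages) + LATCH_NS)
print(stages, latency)
```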

Q:
A non-pipelined processor X has a clock frequency of 2.5 GHz and an average CPI of 3.
Processor Y, an improved version of X, is designed with a 5-stage linear instruction
pipeline. However, due to latch delay and clock skew, the clock rate of Y is only 2 GHz. If a
program consisting of one million instructions is executed on both processors, the
speedup of processor Y as compared to X is:
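No worked answer is given for this question. A minimal sketch, assuming Y reaches the ideal CPI of 1 and adding the 4 cycles needed to fill its 5-stage pipeline:

```python
def exec_time_s(instructions, cpi, clock_hz, fill_cycles=0):
    """Execution time = (instructions * CPI + pipeline fill cycles) / clock rate."""
    return (instructions * cpi + fill_cycles) / clock_hz


N = 1_000_000
t_x = exec_time_s(N, cpi=3, clock_hz=2.5e9)                 # non-pipelined X
t_y = exec_time_s(N, cpi=1, clock_hz=2.0e9, fill_cycles=4)  # pipelined Y, ideal CPI assumed
print(round(t_x / t_y, 3))
```

Under that assumption the speedup is about 2.4; any stalls in Y would lower it.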

Q:
Consider a 5-stage pipeline - IF (Instruction Fetch), ID (Instruction Decode and register
read), EX (Execute), MEM (Memory) and WB (Write Back). All register reads take place in
the second phase of a clock cycle and all register writes occur in the first phase. Consider
the execution of the following instruction sequence:

• I1: R1 <- R2 + R3
• I2: R3 <- R1 - R2
• I3: M[R1 + 1000] <- R1
• I4: R2 <- R3 * R1

If the number of RAW (Read after write) hazards is denoted by A, WAR (Write after read)
hazards by B, and WAW (Write after write) hazards by C, then A+B+C is:



Q:
Consider an instruction pipeline with five stages without any branch prediction: Fetch
Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and
Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are 4 ns, 5 ns, 12 ns, 7 ns
and 6 ns respectively. There are intermediate storage buffers after each stage and the delay
of each buffer is 1 ns. A program consisting of 12 instructions I1, I2, I3, ..., I12 is executed in this
pipelined processor. Instruction I4 is the only branch instruction and its branch target is I10. If
the branch is taken during the execution of this program, the time (in ns) needed to complete
the program is ________

Q:
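Register renaming is able to overcome which of the data hazards?
A. RAW
B. WAW
C. WAR
D. RAR

Solution: B and C (WAW and WAR)
Renaming gives each new destination a fresh physical register, which removes the name
dependences behind WAW and WAR hazards; RAW hazards are true data dependences and remain.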
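To illustrate why renaming removes WAW and WAR but not RAW, here is a minimal sketch over a hypothetical three-instruction sequence. The sequence, the `rename` helper and the P0, P1, ... physical-register names are illustrative assumptions; real hardware also maintains a free list and recycles physical registers.

```python
from itertools import count


def rename(program):
    """Give every destination a fresh physical register (P0, P1, ...).

    program: list of (destination, [sources]) using architectural names.
    Sources are mapped to the physical register of their latest producer;
    sources with no earlier producer keep their architectural name.
    """
    fresh = count()
    mapping = {}  # architectural name -> current physical name
    renamed = []
    for dst, srcs in program:
        new_srcs = [mapping.get(s, s) for s in srcs]  # read mapping before writing
        phys = f"P{next(fresh)}"
        mapping[dst] = phys
        renamed.append((phys, new_srcs))
    return renamed


program = [
    ("R1", ["R2", "R3"]),  # I1: R1 <- R2 + R3
    ("R2", ["R1", "R4"]),  # I2: R2 <- R1 * R4  (RAW on R1, WAR on R2 with I1)
    ("R1", ["R5", "R6"]),  # I3: R1 <- R5 - R6  (WAW on R1 with I1, WAR with I2)
]
for dst, srcs in rename(program):
    print(dst, "<-", srcs)
```

After renaming, no two instructions write the same register and no instruction overwrites a register that an earlier one still reads, while I2 still reads I1's result through P0.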

Q:
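T1 is the time taken by the first instruction to complete in a non-pipelined system, and
T2 is the time taken by the first instruction to complete in a pipelined system with inter-stage
buffer registers. What is the relationship between T1 and T2?
a. T1<T2
b. T1>=T2
c. T1=T2
d. T1>T2

Solution: a
T1<T2
In the pipelined system every stage takes one full clock period (the slowest stage delay plus
the buffer register delay), so the first instruction takes longer than the plain sum of the stage delays.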

Q:
An instruction pipeline has a single functional unit to perform
arithmetic operations. It consists of 4 stages to implement three instructions (ADD,
MUL, SUB). All stages, except the execution stage, take 1 clock, while the
execution stage for ADD and SUB takes 2 clocks each and for MUL it takes 3
clocks. If all the instructions are executed in the above order, how many clocks
are required to complete these 3 instructions?
a. 7
b. 8
c. 9
d. 10

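Solution: d. 10
Assuming the execution stage is the third of the four stages, the single functional unit is busy
with ADD in cycles 3-4, MUL in cycles 5-7 and SUB in cycles 8-9, so SUB finishes its last stage in cycle 10.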
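A small timing sketch of the same count, assuming the execution stage is the third of four stages, the single functional unit is not pipelined, and stalls in earlier stages only delay instructions without reordering them. `total_cycles` and its parameters are illustrative.

```python
def total_cycles(ex_cycles, stages_before_ex=2, stages_after_ex=1):
    """In-order pipeline with one non-pipelined execution unit.

    ex_cycles: execution-stage latency of each instruction, in program order.
    Instruction i (0-based) is fetched in cycle i + 1, so without stalls it
    could enter the execution stage in cycle i + stages_before_ex + 1.
    """
    unit_busy_until = 0
    finish = 0
    for i, ex in enumerate(ex_cycles):
        earliest = i + stages_before_ex + 1
        start = max(earliest, unit_busy_until + 1)  # wait for the unit to be free
        unit_busy_until = start + ex - 1
        finish = unit_busy_until + stages_after_ex
    return finish


print(total_cycles([2, 3, 2]))  # ADD, MUL, SUB
```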

Q:
Given a non-pipelined architecture running at 1 GHz that takes 5 cycles
to finish an instruction. You want to make it pipelined with 5 stages. The
increase in hardware forces you to run the machine at 800 MHz. The only
stalls are caused by memory and branch instructions. 25% of the total
instructions are memory instructions and a stall of 70 cycles happens in 2%
of the memory instructions. 20% of the total instructions are branch
instructions and a stall of 2 cycles happens in 10% of the branch
instructions. What is the speedup that can be achieved with pipelining as
compared to the non-pipelined design?

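Answer:
≈ 2.88

speed up = Twp (without pipeline) / Tp (with pipeline)

Twp = 5 * (1 / 10^9) s = 5 ns

A stalled memory instruction takes 1 + 70 = 71 cycles and a stalled branch takes 1 + 2 = 3 cycles;
the remaining 55% of instructions take 1 cycle:

Tp = (0.25 * (0.98*1 + 0.02*71) + 0.20 * (0.90*1 + 0.10*3) + 0.55*1) * (1 / (800 * 10^6)) s
   = 1.39 * 1.25 ns = 1.7375 ns

Then speed up = (5 / 10^9) / (1.39 / (800 * 10^6)) = (5 * 0.8) / 1.39

After solving:

speed up = 4 / 1.39

= 2.8777

≈ 2.88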
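The same arithmetic as a short sketch, so each term of the weighted CPI is visible. The helper name is illustrative; the fractions and stall lengths are the ones given in the question.

```python
def time_per_instruction_ns(cpi, clock_hz):
    return cpi / clock_hz * 1e9


# Non-pipelined: 5 cycles per instruction at 1 GHz.
t_np = time_per_instruction_ns(5, 1e9)

# Pipelined: base CPI 1, plus stall cycles weighted by how often they occur.
cpi_p = (0.25 * (0.98 * 1 + 0.02 * (1 + 70))    # memory instructions
         + 0.20 * (0.90 * 1 + 0.10 * (1 + 2))  # branch instructions
         + 0.55 * 1)                           # all other instructions
t_p = time_per_instruction_ns(cpi_p, 800e6)

print(round(cpi_p, 2), round(t_p, 4), round(t_np / t_p, 4))
```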

Q. Consider an instruction pipeline with four stages with the stage delays 5 nsec, 6 nsec, 11
nsec, and 8 nsec respectively. The delay of an inter-stage register stage of the pipeline is
1 nsec. What is the approximate speedup of the pipeline in the steady state under ideal
conditions as compared to the corresponding non-pipelined implementation?
a. 4.0
b. 2.5
c. 1.1
d. 3.0
Correct answer is (b).
Time taken to execute N instructions in non-pipelined implementation will be (5 + 6
+ 11 + 8)N = 30N
Clock period for pipelined implementation = max{5,6,11,8} + 1 = 12. Time taken
for the pipelined implementation = (3 + N)12 ≈ 12N for large N.
Speedup = 30N / 12N = 2.5
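A minimal sketch of how this speedup depends on the number of instructions N, using the same model (non-pipelined time = sum of stage delays per instruction, pipelined clock = slowest stage + 1 ns register). `pipelined_speedup` is an illustrative name.

```python
def pipelined_speedup(stage_delays, latch_delay, n):
    """Speedup for n instructions: non-pipelined sum of stage delays per
    instruction versus a pipeline clocked at max(stage) + latch delay."""
    t_np = sum(stage_delays) * n
    clock = max(stage_delays) + latch_delay
    t_p = (len(stage_delays) - 1 + n) * clock
    return t_np / t_p


delays = [5, 6, 11, 8]
for n in (1, 100, 1_000_000):
    print(n, round(pipelined_speedup(delays, 1, n), 3))
```

For N = 1 the ratio is 30/48 = 0.625, which is the T1 < T2 effect discussed elsewhere in this document; only in the steady state does it approach 2.5.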
Q. Consider an instruction pipeline with five stages without any
branch prediction: Instruction Fetch (IF), Instruction Decode (ID), Operand
Fetch (OF), Execute (EX) and Operand Write (OW). The stage delays for IF,
ID, OF, EX and OW are 5 nsec, 7 nsec, 10 nsec, 8 nsec and 6 nsec,
respectively.
There are intermediate storage buffers after each stage and the delay of each
buffer is 1 nsec. A program consisting of 12 instructions I1, I2, …, I12 is
executed in the pipelined processor. Instruction I4 is the only branch instruction
and its branch target is I9. If the branch is taken during the execution of this
program, the time needed to complete the program is:
a. 132 nsec
b. 154 nsec
c. 176 nsec
d. 328 nsec

Correct answer is (b).

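Minimum clock period = max{5,7,10,8,6} + 1 = 11 nsec
The timing below assumes that the branch target is known at the end of I4's third stage
(cycle 6), so I9 is fetched in cycle 7 and I5 to I8 never complete:
I1: IF ID OF EX OW (cycles 1-5)
I2: IF ID OF EX OW (cycles 2-6)
I3: IF ID OF EX OW (cycles 3-7)
I4: IF ID OF EX OW (cycles 4-8)
I5 to I8: not executed (branch taken)
I9: IF ID OF EX OW (cycles 7-11)
I10: IF ID OF EX OW (cycles 8-12)
I11: IF ID OF EX OW (cycles 9-13)
I12: IF ID OF EX OW (cycles 10-14)

Total 14 clock cycles are needed, i.e. 14 x 11 = 154 nsec.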
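A minimal sketch of the cycle count, assuming one instruction enters the pipeline per cycle, no stalls other than the branch, and that the target is fetched in the cycle after the branch completes stage `resolve_stage`. That parameter is an assumption introduced here; its default of 3 reproduces the 14-cycle timing above.

```python
def cycles_with_taken_branch(branch_pos, instrs_from_target, stages=5, resolve_stage=3):
    """Cycles to finish a program whose only taken branch is at branch_pos.

    branch_pos: 1-based position of the branch in fetch order (I4 -> 4).
    instrs_from_target: instructions executed from the target to the end.
    resolve_stage: stage at whose end the target becomes known (assumption).
    """
    target_fetch = branch_pos + resolve_stage  # cycle after the branch resolves
    last_fetch = target_fetch + instrs_from_target - 1
    return last_fetch + stages - 1  # last instruction drains the pipeline


clock_ns = max(5, 7, 10, 8, 6) + 1  # slowest stage + 1 nsec buffer
cycles = cycles_with_taken_branch(branch_pos=4, instrs_from_target=4)  # I9..I12
print(cycles, cycles * clock_ns)
```

With `resolve_stage=4` (branch resolved at the end of EX in the IF/ID/OF/EX/OW naming) the same model gives 15 cycles, i.e. 165 nsec, so the result depends on where the branch is assumed to resolve.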

Q. Consider a RISC machine where each instruction is 4 bytes long. Conditional
and unconditional branch instructions use PC-relative addressing mode with
Offset specified in bytes to the target location of the branch instruction. Also, the
Offset is always with respect to the address of the next instruction in the program
sequence. Consider the following instruction sequence:
Instruction i: ADD R2,R3,R4
Instruction i+1: SUB R5,R6,R7
Instruction i+2: SEQ R1,R9,R10
Instruction i+3: BEQZ R1,Offset
If the target of the branch instruction is i, the decimal value of Offset will
be …………………
Correct answer is -16.
Assume that instruction “i” starts from memory address X.
Address of instruction i+1 = X + 4
Address of instruction i+2 = X + 8
Address of instruction i+3 = X + 12
Address of instruction i+4 = X + 16
So, Offset = target address - address of the next instruction = X - (X + 16) = -16
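The same address arithmetic as a small sketch, with instruction indices taken relative to instruction i and a 4-byte instruction size as stated in the question. `pc_relative_offset` is an illustrative helper.

```python
INSTRUCTION_BYTES = 4


def pc_relative_offset(branch_index, target_index, base_address=0):
    """Byte offset from the instruction after the branch to the target."""
    branch_address = base_address + INSTRUCTION_BYTES * branch_index
    next_pc = branch_address + INSTRUCTION_BYTES
    target_address = base_address + INSTRUCTION_BYTES * target_index
    return target_address - next_pc


# Branch is instruction i+3, target is instruction i (indices relative to i).
print(pc_relative_offset(branch_index=3, target_index=0))
```

The result does not depend on the starting address X, which is why the offset is a fixed -16.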
Q. A 5-stage pipelined processor has the stages: Instruction Fetch
(IF), Instruction Decode (ID), Operand Fetch (OF), Execute (EX) and
Write Operand (WO). The IF, ID, OF, and WO stages take 1 clock cycle each
for any instruction. The EX stage takes 1 clock cycle for ADD and
SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles for
DIV instruction. Operand forwarding is used in the pipeline (for
data dependency, OF stage of the dependent instruction can be executed only
after the previous instruction completes EX). What is the number of clock cycles
needed to execute the following sequence of instructions?
MUL R2,R10,R1
DIV R5,R3,R4
ADD R2,R5,R2
SUB R5,R2,R6
a. 13
b. 17
c. 15
d. 19
Correct answer is (c).
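MUL R2,R10,R1: IF(1) ID(2) OF(3) EX(4-6) WO(7)
DIV R5,R3,R4: IF(2) ID(3) OF(4) EX(5-10) WO(11)
ADD R2,R5,R2: IF(3) ID(4) stall(5-10) OF(11) EX(12) WO(13)
SUB R5,R2,R6: IF(4) stall(5-10) ID(11) stall(12) OF(13) EX(14) WO(15)
ADD must wait until DIV completes EX (cycle 10) before its OF, and SUB must wait until ADD
completes EX (cycle 12). Number of clock cycles = 15.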
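A minimal scheduling sketch that reproduces the table. Assumptions (not stated as hardware details in the question): in-order flow with one instruction per non-EX stage, a stalled instruction blocks the one behind it, MUL and DIV use separate EX units so their EX cycles may overlap (as in the table), and WO never conflicts. The OF rule is the one the question gives.

```python
# (mnemonic, destination, sources, EX latency in cycles), in program order.
PROGRAM = [
    ("MUL", "R2", ["R10", "R1"], 3),
    ("DIV", "R5", ["R3", "R4"], 6),
    ("ADD", "R2", ["R5", "R2"], 1),
    ("SUB", "R5", ["R2", "R6"], 1),
]


def schedule(program):
    """Cycle numbers for IF, ID, OF, EX (start/end) and WO of each instruction."""
    rows, last_writer = [], {}
    for i, (op, dst, srcs, ex_latency) in enumerate(program):
        prev = rows[-1] if rows else None
        # A stage frees up only when the instruction ahead moves on.
        if_c = 1 if prev is None else max(prev["IF"] + 1, prev["ID"])
        id_c = if_c + 1 if prev is None else max(if_c + 1, prev["OF"])
        # OF waits for every producer to finish EX (rule from the question).
        ready = [rows[last_writer[s]]["EX_end"] + 1 for s in srcs if s in last_writer]
        blocked = [prev["EX_start"]] if prev else []
        of_c = max([id_c + 1] + ready + blocked)
        ex_start = of_c + 1
        ex_end = ex_start + ex_latency - 1
        rows.append({"op": op, "IF": if_c, "ID": id_c, "OF": of_c,
                     "EX_start": ex_start, "EX_end": ex_end, "WO": ex_end + 1})
        last_writer[dst] = i  # update after reading, so ADD R2,R5,R2 sees MUL's R2
    return rows


for r in schedule(PROGRAM):
    print(r)
print("total cycles:", schedule(PROGRAM)[-1]["WO"])
```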

Q. In a pipeline, what measures can be taken to reduce the impact of
data hazards?
a. Splitting the memory into separate Instruction and Data memories.
b. Implement data forwarding in the datapath.
c. Allow split register write and read during the two halves of the same clock
cycle.
d. Replicate the register bank.
Correct answers are (b) and (c).
Option (a) reduces the impact of structural hazards, and option (d) does not help in
mitigating data hazards.
Data forwarding and split register access can reduce the number of stall cycles.
Q. In a pipeline, which of the following scenarios of data dependency will always
result in a pipeline stall due to data hazard without any instruction scheduling?
a. An ADD instruction followed by a SUB instruction.
b. A STORE instruction followed by a LOAD instruction
c. A LOAD instruction followed by an ADD instruction.
d. None of the above.
Correct answer is (c).
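Only a LOAD followed immediately by an instruction that uses the loaded value results in a
mandatory stall: even with forwarding, the loaded data is available only at the end of the MEM
stage, one cycle too late for the next instruction's EX stage.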
Q. Instruction scheduling can be used to eliminate data and control hazards by:
a. Scheduling the execution of the instruction only if there is no hazard.
b. Allowing the compiler to move instructions around to fill the LOAD/BRANCH
delay slot(s) with meaningful instructions.
c. Using special hardware to check for hazards and issue instructions only when
possible.
d. None of the above.
Correct answer is (b).
Instruction scheduling is a compiler technique where instructions are moved around
keeping dependencies in mind so as to reduce the wasted cycles due to stalls.
Q. Consider a pipeline with ideal CPI of 1. Assume that 30% of all instructions
executed are branch, out of which 80% are taken branches. The pipeline
speedup for predict taken and delayed branch approaches to reduce branch
penalties will be:
a. 4.10 and 4.45
b. 3.25 and 4.35
c. 3.67 and 4.25
d. 3.85 and 4.35
Correct answer is (d).
Assuming a 5-stage pipeline, speedup = pipeline depth / (1 + branch frequency x branch penalty).
For predict taken, branch penalty = 1
Speedup = 5 / (1 + 0.30 x 1) = 3.85
For delayed branch, branch penalty = 0.5
Speedup = 5 / (1 + 0.30 x 0.5) = 4.35
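The formula above as a one-line helper, assuming (as the solution does) a pipeline depth of 5 and the given penalties of 1 and 0.5 cycles per branch:

```python
def branch_speedup(depth, branch_freq, branch_penalty, ideal_cpi=1.0):
    """Pipeline speedup = depth / (ideal CPI + stall cycles per instruction)."""
    return depth / (ideal_cpi + branch_freq * branch_penalty)


predict_taken = branch_speedup(5, 0.30, 1.0)
delayed_branch = branch_speedup(5, 0.30, 0.5)
print(round(predict_taken, 2), round(delayed_branch, 2))
```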

The design team for a simple, single-issue processor is choosing between a pipelined
or non-pipelined implementation. Here are some design parameters for the two possibilities:

Parameter                      Pipelined Version   Non-Pipelined Version
Clock Rate                     500 MHz             350 MHz
CPI for ALU instructions       1                   1
CPI for Control instructions   2                   1
CPI for Memory instructions    2.7                 1

(a) For a program with 20% ALU instructions, 10% control instructions and 70%
memory instructions, which design will be faster? Give a quantitative CPI average
for each case.

Average CPI for Pipelined Version = (0.2*1 + 0.1*2 + 0.7*2.7) = 2.29

Average CPI for Non-Pipelined Version = (0.2*1 + 0.1*1 + 0.7*1) = 1.0
Average CPU time per instruction for Pipelined version = 2.29/(500 MHz) = 4.58 ns
Average CPU time per instruction for Non-Pipelined version = 1.0/(350 MHz) = 2.86 ns
The non-pipelined version is faster.

(b) For a program with 80% ALU instructions, 10% control instructions and 10%
memory instructions, which design will be faster? Give a quantitative CPI average
for each case.

Average CPI for Pipelined Version = (0.8*1 + 0.1*2 + 0.1*2.7) = 1.27

Average CPI for Non-Pipelined Version = (0.8*1 + 0.1*1 + 0.1*1) = 1.0
Average CPU time per instruction for Pipelined version = 1.27/(500 MHz) = 2.54 ns
Average CPU time per instruction for Non-Pipelined version = 1.0/(350 MHz) = 2.86 ns
The pipelined version is faster.
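Both cases can be checked with a small sketch that weights each instruction class by its CPI and divides by the clock rate. The dictionary layout is an illustrative encoding of the table.

```python
CPI = {
    "pipelined": {"alu": 1, "control": 2, "memory": 2.7},
    "non_pipelined": {"alu": 1, "control": 1, "memory": 1},
}
CLOCK_HZ = {"pipelined": 500e6, "non_pipelined": 350e6}


def average_cpi_and_time(design, mix):
    """Weighted CPI for an instruction mix, and average time per instruction in ns."""
    cpi = sum(fraction * CPI[design][kind] for kind, fraction in mix.items())
    return cpi, cpi / CLOCK_HZ[design] * 1e9


mixes = {
    "(a)": {"alu": 0.2, "control": 0.1, "memory": 0.7},
    "(b)": {"alu": 0.8, "control": 0.1, "memory": 0.1},
}
for label, mix in mixes.items():
    for design in CPI:
        cpi, ns = average_cpi_and_time(design, mix)
        print(label, design, round(cpi, 2), round(ns, 2))
```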

Q:
Match the following:
A. Branch Prediction
B. Instruction Scheduling
C. Delay Slots
D. Increasing functional units
E. Caches

I. Data hazard
II. Structural hazard
III. Control hazard

Solution:
A. III
B. I & II
C. III
D. II
E. I

Structural, data and control hazards typically require a processor pipeline to stall.

(a) Branch Prediction

It addresses control hazards by guessing the outcome of a branch instruction and
then speculatively executing the instructions on one side of the branch to keep the
pipeline moving. Predictions can be made in hardware or in software by the
compiler.

(b) Instruction Scheduling

It addresses structural hazards and data hazards. It addresses data hazards by
moving instructions that do not depend on an instruction, say A, ahead of
instructions that do depend on A, thus avoiding the stall that would otherwise have
occurred. It addresses structural hazards by making sure that instructions that
use functional units with a limited number of instances are scheduled far apart
from each other, so that there is no unnecessary stall. It can be done
in hardware (superscalar processor) or statically by the compiler.

(c) Delay Slots

It addresses control hazards. It helps to avoid the stall that would result from identifying
the branch target only during the decode stage, by scheduling in the slot some other
instruction that has to execute irrespective of the branch condition.

(d) Increasing Availability of Functional Units (ALUs, adders, etc.)

It helps to avoid structural hazards. With replicated functional units, multiple
instructions of the same type can run at the same time.

(e) Caches

It addresses data hazards. In particular, caches reduce memory latency and
hence the load-use latency, which in turn reduces the stall duration and
improves execution time (by maintaining the pipeline steady state).

Which of the following statements is/are incorrect?

A. An instruction A is said to be dependent on an instruction B if A's execution is
determined by some condition computed by B
B. An instruction A is said to be dependent on an instruction B if A uses some data
value that is produced by B
C. Only data dependencies cause hazards
D. Dependencies always cause hazards

Solution: C and D

An instruction A is said to be dependent on an instruction B if A's execution is
determined by some condition computed by B or if A uses some data value that is
produced by B. A hazard is a situation that prevents the pipelined execution of a
program and causes a stall. Hazards are usually a consequence of data dependencies
between instructions, but hazards can also appear on an architecture where the
instructions have no intrinsic dependences, because of limitations in the number of
resources (registers/functional units), so C is incorrect. D is incorrect because a
dependency causes a hazard only when the dependent instructions are close enough in
the pipeline that the needed result is not yet available.

Q:
Using the code below, count the number of all of the dependence types (RAW, WAR,
WAW).

I0: A = B + C;
I1: C = A - B;
I2: D = A + C;
I3: A = B * C * D;
I4: C = F / D;
I5: F = A ^ G;
I6: G = F + D;

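Solution: RAW = 9, WAR = 7, WAW = 2

Each dependence is counted against the nearest conflicting instruction (the variable involved is shown in parentheses).

RAW dependences (from -> to): I0->I1 (A), I0->I2 (A), I1->I2 (C), I1->I3 (C), I2->I3 (D), I2->I4 (D), I2->I6 (D), I3->I5 (A), I5->I6 (F)

WAR dependences (from -> to): I0->I1 (C), I1->I3 (A), I2->I3 (A), I2->I4 (C), I3->I4 (C), I4->I5 (F), I5->I6 (G)

WAW dependences (from -> to): I0->I3 (A), I1->I4 (C)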
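The counts can be checked mechanically. A minimal sketch, assuming the same nearest-conflict convention: a dependence between two instructions is recorded only if no instruction between them writes the same variable. The helper names are illustrative.

```python
# (destination, sources) for I0..I6, transcribed from the code above.
PROGRAM = [
    ("A", ["B", "C"]),       # I0: A = B + C
    ("C", ["A", "B"]),       # I1: C = A - B
    ("D", ["A", "C"]),       # I2: D = A + C
    ("A", ["B", "C", "D"]),  # I3: A = B * C * D
    ("C", ["F", "D"]),       # I4: C = F / D
    ("F", ["A", "G"]),       # I5: F = A ^ G
    ("G", ["F", "D"]),       # I6: G = F + D
]


def written_between(program, var, i, j):
    """True if some instruction strictly between i and j writes var."""
    return any(program[k][0] == var for k in range(i + 1, j))


def find_dependences(program):
    raw, war, waw = [], [], []
    for j, (dst_j, srcs_j) in enumerate(program):
        for i in range(j):
            dst_i, srcs_i = program[i]
            if dst_i in srcs_j and not written_between(program, dst_i, i, j):
                raw.append((i, j))  # j reads the value i wrote
            if dst_j in srcs_i and not written_between(program, dst_j, i, j):
                war.append((i, j))  # j overwrites a value i read
            if dst_j == dst_i and not written_between(program, dst_j, i, j):
                waw.append((i, j))  # j overwrites the value i wrote
    return raw, war, waw


raw, war, waw = find_dependences(PROGRAM)
print(len(raw), len(war), len(waw))
```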

Given four instructions, how many unique comparisons (between register sources
and destinations) are necessary to find all of the RAW, WAR, and WAW
dependences. Answer for the case of four instructions, and then derive a general
equation for N instructions. Assume that all instructions have one register destination
and two register sources.

For four instructions, the number of unique comparisons:

(2(3) + 2(2) + 2(1)) + (2(3) + 2(2) + 2(1)) + (3 + 2 + 1) = 30

The first summand is for RAW comparisons, the second summand is for WAR
comparisons and the last summand is for WAW comparisons.

The general equation for N instructions = 5N(N - 1)/2, since each of the N(N - 1)/2 instruction pairs needs 2 RAW + 2 WAR + 1 WAW comparisons.
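A brute-force count over instruction pairs, under the same one-destination, two-source assumption, agrees with the closed form:

```python
def brute_force_comparisons(n, sources=2):
    """Count register comparisons over every (earlier i, later j) pair."""
    total = 0
    for j in range(n):
        for i in range(j):
            total += sources  # RAW: each source of j vs destination of i
            total += sources  # WAR: destination of j vs each source of i
            total += 1        # WAW: destination of j vs destination of i
    return total


def closed_form(n):
    return 5 * n * (n - 1) // 2


for n in (2, 4, 8):
    print(n, brute_force_comparisons(n), closed_form(n))
```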

Which of the following are reasons why pipeline throughput will not keep improving as the
pipeline depth is increased indefinitely?
A. Pipelining has a fixed (or relatively fixed) absolute overhead per stage which
results from latch overhead and clock/data skew.
B. Increasing the pipeline depth lengthens hazard penalties, increasing the CPI.
C. The latency of a pipeline stage can be driven to zero.
D. Increasing the depth of the pipeline between the fetch and execute stages
decreases the branch misprediction penalty.

Solution: A and B

Pipelining has a fixed (or relatively fixed) absolute overhead per stage which results
from latch overhead and clock/data skew. This means that the latency of a pipeline
stage cannot be driven to zero, so C is false. Second, increasing the pipeline depth lengthens
hazard penalties, increasing the CPI. For instance, increasing the depth of the pipeline
between the fetch and execute stages increases the branch misprediction penalty, so D is false.

Q:
Consider a machine with a 5-stage pipeline with a cycle time of 10ns. Assume that you
are executing a program where a fraction, f, of all instructions immediately follow a load
upon which they are dependent.

(a) With forwarding enabled, what is the total execution time for N instructions, in terms of f?

When the pipeline is filled,

(1 - f)*N instructions take 1 cycle each
f*N instructions take 2 cycles each (including 1 cycle for the load-use stall)
Total cycles = (1 - f)*N + 2*f*N + 4 (the 4 cycles needed to fill the pipeline)
= N(1 + f) + 4
Total time = 10 * (N(1 + f) + 4) ns
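The same expression as a small function, assuming (as the solution does) exactly one bubble per load-dependent instruction and 4 fill cycles for the 5-stage pipeline. `exec_time_ns` is an illustrative name.

```python
def exec_time_ns(n, f, cycle_ns=10, stages=5):
    """Total time for n instructions when a fraction f suffers one load-use stall."""
    stall_cycles = f * n      # one bubble per dependent instruction
    fill_cycles = stages - 1  # cycles to fill the pipeline
    return cycle_ns * (n + stall_cycles + fill_cycles)


print(exec_time_ns(1000, 0.2))
```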

Q:
A non-pipelined system takes 130 ns to process an instruction. A program of 1000
instructions is executed on the non-pipelined system. Then the same program is processed on a
processor with a 5-segment pipeline with a clock cycle of 30 ns per stage.

Determine the speedup ratio of the pipeline.

Solution:
For the non-pipelined system:
• Total number of instructions/tasks (n) = 1000
• Time required to perform a single task in the non-pipelined processor (Tnp) = 130 ns
For the pipelined system:
• Total number of stages (k) = 5
• Total number of instructions/tasks (n) = 1000
• Clock cycle time of the pipelined processor (Tp) = 30 ns

Speedup = (n * Tnp) / ((k + n - 1) * Tp) = (1000 * 130) / (1004 * 30) = 130000 / 30120

Speedup ≈ 4.316 (a ratio, so it has no unit) is the answer.
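The formula above as a short sketch (the function name is illustrative):

```python
def pipeline_speedup(n, k, t_np, t_p):
    """Speedup = (n * Tnp) / ((k + n - 1) * Tp)."""
    return (n * t_np) / ((k + n - 1) * t_p)


s = pipeline_speedup(n=1000, k=5, t_np=130, t_p=30)
print(round(s, 3))
```

As n grows, the ratio approaches Tnp / Tp = 130 / 30 ≈ 4.33.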
