Lecture-11 (Dynamic Scheduling)
CS422-Spring 2018
Biswa@CSE-IITK
How to Make CPI closer to One
• Let’s assume full pipelining:
– If we have a 4-cycle latency, then we need 3 instructions between a producing
instruction and its use:
multf $F0,$F2,$F4
delay-1
delay-2
delay-3
addf $F6,$F10,$F0
[Figure: pipeline stages Fetch, Decode, Ex1, Ex2, Ex3, Ex4, WB with addf, delay3, delay2, delay1, and multf in flight; arrows mark the earliest forwarding point for 4-cycle instructions and for 1-cycle instructions]
CS422: Spring 2018 Biswabandan Panda, CSE@IITK 2
Where Are Stalls?
Loop: LD F0,0(R1) ;F0=vector element
ADDD F4,F0,F2 ;add scalar from F2
SD 0(R1),F4 ;store result
SUBI R1,R1,8 ;decrement pointer 8B (DW)
BNEZ R1,Loop ;branch R1!=zero
NOP ;delayed branch slot
Instruction producing result   Instruction using result   Execution latency (clock cycles)   Use latency (clock cycles)
FP ALU op                      Another FP ALU op          4                                  3
FP ALU op                      Store double               4                                  2
Load double                    FP ALU op                  2                                  1
Load double                    Store double               2                                  0
Integer op                     Integer op                 1                                  0
• Where are the stalls?
Rewrite The Code
1 Loop: LD F0,0(R1) ;F0=vector element
2 stall
3 ADDD F4,F0,F2 ;add scalar in F2
4 stall
5 stall
6 SD 0(R1),F4 ;store result
7 SUBI R1,R1,8 ;decrement pointer 8B (DW)
8 BNEZ R1,Loop ;branch R1!=zero
9 stall ;delayed branch slot
Instruction producing result   Instruction using result   Use latency (clock cycles)
FP ALU op                      Another FP ALU op          3
FP ALU op                      Store double               2
Load double                    FP ALU op                  1
• 9 clocks: Rewrite code to minimize stalls?
Revised Loop
1 Loop: LD F0,0(R1)
2 stall
3 ADDD F4,F0,F2
4 SUBI R1,R1,8
5 BNEZ R1,Loop ;delayed branch
6 SD 8(R1),F4 ;altered when move past SUBI
Swap BNEZ and SD by changing address of SD
Instruction producing result   Instruction using result   Use latency (clock cycles)
FP ALU op                      Another FP ALU op          3
FP ALU op                      Store double               2
Load double                    FP ALU op                  1
6 clocks: unroll the loop 4 times to make it faster?
Unroll It 4 times
1 Loop: LD F0,0(R1) ;1 cycle stall after this LD
2 ADDD F4,F0,F2 ;2 cycles stall after this ADDD
3 SD 0(R1),F4 ;drop SUBI & BNEZ
4 LD F6,-8(R1)
5 ADDD F8,F6,F2
6 SD -8(R1),F8 ;drop SUBI & BNEZ
7 LD F10,-16(R1)
8 ADDD F12,F10,F2
9 SD -16(R1),F12 ;drop SUBI & BNEZ
10 LD F14,-24(R1)
11 ADDD F16,F14,F2
12 SD -24(R1),F16
13 SUBI R1,R1,#32 ;alter to 4*8
14 BNEZ R1,Loop
15 NOP
15 + 4 x (1+2) = 27 clock cycles, or 6.8 per iteration
Assumes the number of iterations is a multiple of 4 (i.e., R1 is initially a multiple of 32)
Rewrite loop to minimize stalls?
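The unrolling transformation itself can be sketched at the source level. The snippet below is only an illustration in Python, not the DLX code from the slide: the function names are hypothetical, and the list `x` stands in for the vector addressed through R1. It shows that four copies of the body per trip give the same result as the rolled loop, provided the trip count is a multiple of 4.

```python
def add_scalar(x, s):
    """Rolled loop: one element per trip, walking from the end down to 0."""
    for i in range(len(x) - 1, -1, -1):
        x[i] += s
    return x


def add_scalar_unrolled4(x, s):
    """Unrolled by 4: one index update and one loop test per four elements."""
    if len(x) % 4 != 0:
        # Same restriction as on the slide: trip count must be a multiple of 4.
        raise ValueError("trip count must be a multiple of 4")
    for i in range(len(x) - 1, -1, -4):
        x[i] += s
        x[i - 1] += s
        x[i - 2] += s
        x[i - 3] += s
    return x


rolled = add_scalar([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 0.5)
unrolled = add_scalar_unrolled4([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], 0.5)
```

A production compiler would normally add a cleanup loop for the leftover iterations instead of rejecting other trip counts; that part is left out here.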
Even Better?
Unrolled Loop That Minimizes Stalls
1 Loop:LD F0,0(R1)
2 LD F6,-8(R1)
3 LD F10,-16(R1)
4 LD F14,-24(R1)
5 ADDD F4,F0,F2
6 ADDD F8,F6,F2
7 ADDD F12,F10,F2
8 ADDD F16,F14,F2
9 SD 0(R1),F4
10 SD -8(R1),F8
11 SD -16(R1),F12
12 SUBI R1,R1,#32
13 BNEZ R1,LOOP
14 SD 8(R1),F16 ; 8-32 = -24
14 clock cycles, or 3.5 per iteration
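The cycle counts quoted for the four versions of the loop (9, 6, 27 and 14) can be cross-checked with a small model. This is a minimal sketch under stated assumptions: a single-issue in-order pipeline, only register dependences, and the use latencies from the table above (any producer/consumer pair not in the table is assumed to need 0 extra cycles). The tuple encoding and the names `total_cycles`, `body`, `x0`, `y0` and so on are hypothetical.

```python
# Use latency: extra cycles that must separate a producer from a dependent consumer.
USE_LATENCY = {("fp", "fp"): 3, ("fp", "store"): 2, ("load", "fp"): 1,
               ("load", "store"): 0, ("int", "int"): 0}


def total_cycles(program):
    """program: list of (kind, dest, sources). Returns the cycle in which the
    last instruction issues on a single-issue in-order pipeline."""
    produced = {}  # register -> (kind of producer, cycle it issued)
    cycle = 0
    for kind, dest, sources in program:
        cycle += 1  # earliest slot: right after the previous instruction
        for reg in sources:
            if reg in produced:
                pkind, pcycle = produced[reg]
                # Stall until the producer's use latency has elapsed.
                cycle = max(cycle, pcycle + 1 + USE_LATENCY.get((pkind, kind), 0))
        if dest is not None:
            produced[dest] = (kind, cycle)
    return cycle


def body(k):
    """LD / ADDD / SD for the k-th copy of the loop body."""
    return [("load", f"x{k}", ["R1"]),
            ("fp", f"y{k}", [f"x{k}", "F2"]),
            ("store", None, ["R1", f"y{k}"])]


tail = [("int", "R1", ["R1"]), ("int", None, ["R1"])]  # SUBI, BNEZ
nop = [("int", None, [])]                               # empty delay slot

ld, addd, sd = body(0)
original = [ld, addd, sd] + tail + nop
revised = [ld, addd] + tail + [sd]                      # SD fills the delay slot
copies = [body(k) for k in range(4)]
unrolled = [ins for c in copies for ins in c] + tail + nop
scheduled = ([c[0] for c in copies] + [c[1] for c in copies]
             + [c[2] for c in copies[:3]] + tail + [copies[3][2]])

counts = [total_cycles(p) for p in (original, revised, unrolled, scheduled)]
```

Under these assumptions `counts` holds the cycles per trip for the original loop, the revised loop, the unrolled loop, and the scheduled unrolled loop, in that order; dividing the last two by 4 gives the per-iteration figures.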
When Safe to Unroll?
• Example: Where are data dependencies?
(A,B,C distinct & nonoverlapping)
for (i=0; i<100; i=i+1) {
A[i+1] = A[i] + C[i]; /* S1 */
B[i+1] = B[i] + A[i+1]; /* S2 */
}
1. S2 uses the value, A[i+1], computed by S1 in the same iteration.
2. S1 uses a value computed by S1 in an earlier iteration, since iteration i computes A[i+1]
which is read in iteration i+1. The same is true of S2 for B[i] and B[i+1].
This is a “loop-carried dependence”: between iterations
• In our prior example, each iteration was independent of the others
– In this case, the iterations cannot be executed in parallel, right?
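To see why the loop-carried dependence matters, the sketch below runs the loop twice in Python (a stand-in for the C fragment; the function names and the small input arrays are made up for illustration). The second version models what would happen if every iteration read A and B as they were before the loop started, as a naive fully parallel execution would.

```python
def sequential(A0, B0, C):
    """The loop as written: iteration i+1 sees the A[i+1], B[i+1] of iteration i."""
    A, B = list(A0), list(B0)
    for i in range(len(C)):
        A[i + 1] = A[i] + C[i]      # S1
        B[i + 1] = B[i] + A[i + 1]  # S2
    return A, B


def naive_parallel(A0, B0, C):
    """Each iteration reads A[i], B[i] from the ORIGINAL arrays, ignoring the
    values produced by earlier iterations (the loop-carried dependences)."""
    A, B = list(A0), list(B0)
    for i in range(len(C)):
        A[i + 1] = A0[i] + C[i]
        B[i + 1] = B0[i] + A[i + 1]  # same-iteration dependence is still honoured
    return A, B


A0, B0, C = [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1]
seq = sequential(A0, B0, C)
par = naive_parallel(A0, B0, C)
```

The two results differ from the second iteration onward, which is exactly the effect of ignoring the dependence of S1 and S2 on the previous iteration.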
Out-of-order + Dynamic Scheduling ?
• Pipelining: tries to achieve CPI = 1
• Compiler scheduling minimizes the impact of dependences.
• Hardware scheduling so far: in-order execution
– Instructions after a stall must wait even if they are independent.
• Dynamic scheduling: out-of-order execution
– Hardware looks ahead past blocked instructions
• In-order vs. out-of-order (O3)
• In-order issue, out-of-order (O3) execute, in-order completion
Scoreboard
• Out-of-order execution divides ID stage:
1. Issue - decode instructions, check for structural hazards
2. Read operands - wait until no data hazards, then read operands (RAW)
3. Execute - Execute instruction and notify scoreboard when done
4. Write - Wait until earlier instructions read operands before writing to register file (WAR)
• Scoreboards date to CDC 6600 in 1963
• Instructions execute whenever not dependent on previous instructions and no hazards.
• CDC 6600: In order issue, out-of-order execution, out-of-order commit (or completion)
– No forwarding!
– Imprecise interrupt/exception model for now
Four Stages of Scoreboard Control - Details
• Issue—decode instructions & check for structural hazards (ID1)
– Instructions issued in program order (for hazard checking)
– Don’t issue if structural hazard
– Don’t issue if instruction is output dependent on any previously issued but
uncompleted instruction (no WAW hazards)
• Read operands—wait until no data hazards, then read operands (ID2)
– All real dependencies (RAW hazards) resolved in this stage, since we wait for
instructions to write back data.
– No forwarding of data in this model!
Four Stages of Scoreboard Control
• Execution—operate on operands (EX)
– The functional unit begins execution upon receiving operands. When the result is
ready, it notifies the scoreboard that it has completed execution.
• Write result—finish execution (WB)
– Stall until no WAR hazards with previous instructions:
Example: DIVD F0,F2,F4
         ADDD F10,F0,F8
         SUBD F8,F8,F14
The CDC 6600 scoreboard would stall SUBD’s write of F8 until ADDD has read its operands
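The write-result check for the DIVD/ADDD/SUBD example can be sketched as a single predicate. This is a simplified illustration in Python, not the CDC 6600 logic itself: `must_stall_write` and the `(sources, has_read_operands)` encoding are hypothetical, and the list is assumed to hold only instructions issued earlier than the writer.

```python
def must_stall_write(dest, earlier):
    """earlier: (sources, has_read_operands) for each earlier-issued instruction.
    The write must wait while any of them still needs to read the old `dest`."""
    return any(dest in sources and not has_read
               for sources, has_read in earlier)


# SUBD F8,F8,F14 has finished executing and wants to write F8.
earlier = [
    (("F2", "F4"), True),    # DIVD F0,F2,F4: operands already read, still dividing
    (("F0", "F8"), False),   # ADDD F10,F0,F8: waiting for F0, has not read F8 yet
]
stalled_now = must_stall_write("F8", earlier)

# Once DIVD writes F0 and ADDD reads its operands, the hazard is gone.
earlier[1] = (("F0", "F8"), True)
stalled_later = must_stall_write("F8", earlier)
```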
Three Parts of the Scoreboard
• Instruction status:
Which of the 4 steps the instruction is in
• Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit
Busy: Indicates whether the unit is busy or not
Op: Operation to perform in the unit (e.g., + or –)
Fi: Destination register
Fj,Fk: Source-register numbers
Qj,Qk: Functional units producing source registers Fj, Fk
Rj,Rk: Flags indicating when Fj, Fk are ready and not yet read
• Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register
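As a rough illustration of these tables, the nine functional-unit fields and the register result status can be written as a small Python data structure. The class name `FUStatus`, the helper `issue`, and the choice to treat a missing operand as "ready" are assumptions of this sketch; only the field names come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None   # operation to perform
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units that will produce Fj / Fk
    qk: Optional[str] = None
    rj: bool = False           # Fj / Fk ready and not yet read
    rk: bool = False


def issue(fus: Dict[str, FUStatus], result: Dict[str, str],
          unit: str, op: str, fi: str, fj: Optional[str], fk: Optional[str]) -> None:
    """Bookkeeping done when an instruction issues to `unit`."""
    fu = fus[unit]
    fu.busy, fu.op, fu.fi, fu.fj, fu.fk = True, op, fi, fj, fk
    fu.qj = result.get(fj)      # None means no pending writer for that register
    fu.qk = result.get(fk)
    fu.rj = fu.qj is None       # assumption: an absent operand counts as ready
    fu.rk = fu.qk is None
    result[fi] = unit           # register result status: who will write Fi


fus = {name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result: Dict[str, str] = {}
issue(fus, result, "Integer", "Load", "F2", None, "R3")   # LD F2,45(R3)
issue(fus, result, "Mult1", "Mult", "F0", "F2", "F4")     # MULTD F0,F2,F4
```

After these two calls the `Mult1` entry has Qj = Integer, Rj = No, Rk = Yes, and the register result status maps F2 to Integer and F0 to Mult1.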
Possible Architecture
[Figure: block diagram showing the Registers, the Functional Units (FP Mult, FP Mult, FP Divide, FP Add, Integer), Memory, and the SCOREBOARD]
Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards ?
• Solutions for WAR:
– Stall write-back until registers have been read
– Read registers only during Read Operands stage
• Solution for WAW:
– Detect hazard and stall issue of new instruction until other instruction completes
• No register renaming
• Need to have multiple instructions in execution phase => multiple execution units or
pipelined execution units
• Scoreboard keeps track of dependencies between instructions that have already issued
• Scoreboard replaces ID, EX, WB with 4 stages
Scoreboard Example
Latencies: Integer: 1 cycle; FP add: 2 cycles; FP multiply: 10 cycles; FP divide: 40 cycles
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
FU
Cycle 1
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
1 FU Integer
Cycle 2
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
2 FU Integer
• Issue 2nd LD? Can’t since integer unit is busy.
Cycle 3
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F6 R2 No
Mult1 No
Mult2 No
Add No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
3 FU Integer
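• Issue MULTD? What about F2? Issue is in order, so MULTD must wait behind the 2nd LD, which produces F2 and is still waiting for the integer unit.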
Cycle 4
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
4 FU
Cycle 5
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5
MULTD F0 F2 F4
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 Yes
Mult1 No
Mult2 No
Add No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
5 FU Integer
Cycle 6
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6
MULTD F0 F2 F4 6
SUBD F8 F6 F2
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 Yes
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
6 FU Mult1 Integer
Cycle 7
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 No
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add Yes Sub F8 F6 F2 Integer Yes No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
7 FU Mult1 Integer Add
• Read multiply operands? No: the LD that produces F2 is not done yet.
Cycle 8 (1st half)
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer Yes Load F2 R3 No
Mult1 Yes Mult F0 F2 F4 Integer No Yes
Mult2 No
Add Yes Sub F8 F6 F2 Integer Yes No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Integer Add Divide
• DIVD issues. MULTD and SUBD are both waiting for F2. LD #2 writes F2.
Cycle 8 (2nd Half)
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6
SUBD F8 F6 F2 7
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
Add Yes Sub F8 F6 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
8 FU Mult1 Add Divide
Cycle 9
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
(Note: the Time column shows the remaining execution cycles)
10 Mult1 Yes Mult F0 F2 F4 Yes Yes
Mult2 No
2 Add Yes Sub F8 F6 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
9 FU Mult1 Add Divide
• Read operands for MULT & SUB? Issue ADDD?
Cycle 10
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
9 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
1 Add Yes Sub F8 F6 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
10 FU Mult1 Add Divide
Cycle 11
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11
DIVD F10 F0 F6 8
ADDD F6 F8 F2
• ADDD can’t issue because the add unit is busy
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
8 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
0 Add Yes Sub F8 F6 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
11 FU Mult1 Add Divide
Cycle 12
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
7 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
12 FU Mult1 Divide
• Read operands for DIVD?
Cycle 13
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
6 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
13 FU Mult1 Add Divide
Cycle 14
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
5 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
2 Add Yes Add F6 F8 F2 Yes Yes
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
14 FU Mult1 Add Divide
Cycle 15
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
4 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
1 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
15 FU Mult1 Add Divide
Cycle 16
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
3 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
0 Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
16 FU Mult1 Add Divide
Cycle 17
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8          <- WAR hazard!
ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
2 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
17 FU Mult1 Add Divide
• Why not write the result of ADDD? DIVD has not yet read F6, so writing F6 now would be a WAR hazard.
Cycle 18
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
1 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
18 FU Mult1 Add Divide
Cycle 19
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
0 Mult1 Yes Mult F0 F2 F4 No No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Mult1 No Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
19 FU Mult1 Add Divide
Cycle 20
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8
ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Yes Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
20 FU Add Divide
Cycle 21
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21
ADDD F6 F8 F2 13 14 16
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add Yes Add F6 F8 F2 No No
Divide Yes Div F10 F0 F6 Yes Yes
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
21 FU Add Divide
• WAR Hazard is now gone...
Cycle 22
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21
ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
39 Divide Yes Div F10 F0 F6 No No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
22 FU Divide
Cycle 61
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61
ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
0 Divide Yes Div F10 F0 F6 No No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
61 FU Divide
Cycle 62
Instruction status: Read Exec Write
Instruction j k Issue Oper Comp Result
LD F6 34+ R2 1 2 3 4
LD F2 45+ R3 5 6 7 8
MULTD F0 F2 F4 6 9 19 20
SUBD F8 F6 F2 7 9 11 12
DIVD F10 F0 F6 8 21 61 62
ADDD F6 F8 F2 13 14 16 22
Functional unit status: dest S1 S2 FU FU Fj? Fk?
Time Name Busy Op Fi Fj Fk Qj Qk Rj Rk
Integer No
Mult1 No
Mult2 No
Add No
Divide No
Register result status:
Clock F0 F2 F4 F6 F8 F10 F12 ... F30
62 FU
• In-order issue; out-of-order execute & commit
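The whole walkthrough can be replayed with a compact cycle-by-cycle model. The code below is a minimal sketch, not a description of the CDC 6600 hardware: it assumes one issue per cycle, that an event in cycle t is visible to other instructions only from cycle t+1, and the latencies and unit counts from the example; the names `Instr`, `before` and `simulate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LATENCY = {"Integer": 1, "Add": 2, "Mult": 10, "Divide": 40}
UNIT_COUNT = {"Integer": 1, "Add": 1, "Mult": 2, "Divide": 1}


@dataclass
class Instr:
    name: str
    unit: str
    dest: str
    srcs: Tuple[str, ...]
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None    # execution complete
    write: Optional[int] = None


def before(stamp, t):
    """True if the event happened in a cycle strictly earlier than t."""
    return stamp is not None and stamp < t


def simulate(program):
    t, next_to_issue = 0, 0
    while any(i.write is None for i in program):
        t += 1
        for idx, ins in enumerate(program[:next_to_issue]):
            earlier = program[:idx]
            if ins.read is None:
                # Read operands: every earlier producer of a source has written (RAW).
                if all(before(p.write, t) for p in earlier if p.dest in ins.srcs):
                    ins.read = t
                    ins.done = t + LATENCY[ins.unit]
            elif ins.write is None and before(ins.done, t):
                # Write result: every earlier reader of our destination has read (WAR).
                if all(before(x.read, t) for x in earlier if ins.dest in x.srcs):
                    ins.write = t
        if next_to_issue < len(program):
            ins = program[next_to_issue]
            in_flight = [i for i in program[:next_to_issue] if not before(i.write, t)]
            unit_free = sum(i.unit == ins.unit for i in in_flight) < UNIT_COUNT[ins.unit]
            no_waw = all(i.dest != ins.dest for i in in_flight)
            if unit_free and no_waw:   # in-order issue: structural and WAW checks
                ins.issue = t
                next_to_issue += 1
    return [(i.issue, i.read, i.done, i.write) for i in program]


program = [
    Instr("LD F6,34(R2)", "Integer", "F6", ("R2",)),
    Instr("LD F2,45(R3)", "Integer", "F2", ("R3",)),
    Instr("MULTD F0,F2,F4", "Mult", "F0", ("F2", "F4")),
    Instr("SUBD F8,F6,F2", "Add", "F8", ("F6", "F2")),
    Instr("DIVD F10,F0,F6", "Divide", "F10", ("F0", "F6")),
    Instr("ADDD F6,F8,F2", "Add", "F6", ("F8", "F2")),
]
timeline = simulate(program)
```

Each tuple in `timeline` is (Issue, Read Oper, Exec Comp, Write Result) for one instruction, in program order, and can be compared row by row with the instruction status table of the final cycle.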