2023/05/17 - 15:12
gonna be honest i havent really been paying attention
this was the delay slot stuff idk
Analyzing Basic Pipeline Performance
Speedup from pipelining = avg instr time unpipelined / avg instr time pipelined

  CPI(unpiped) x Tclk(unpiped)
= ----------------------------
  CPI(piped) x Tclk(piped)
If our clock period doesn't change, the Tclk terms cancel and everything is about CPI
Ideal CPI is 1
CPI(piped) = ideal CPI + pipeline stall cycles per instruction = 1 + stalls
speedup = CPI(unpiped) / (1 + stalls)
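quick sketch of the formula above in Python so I can plug numbers in. The function and parameter names are made up by me, and the clock periods default to equal (the "clock period doesn't change" case), so treat it as a sanity check and not anything official:

```python
def pipelined_cpi(stalls_per_instr, ideal_cpi=1.0):
    """CPI(piped) = ideal CPI + stall cycles per instruction."""
    return ideal_cpi + stalls_per_instr


def pipeline_speedup(cpi_unpiped, stalls_per_instr,
                     tclk_unpiped=1.0, tclk_piped=1.0):
    """Speedup = (CPI(unpiped) x Tclk(unpiped)) / (CPI(piped) x Tclk(piped)).

    With the default (equal) clock periods this reduces to
    CPI(unpiped) / (1 + stalls).
    """
    time_unpiped = cpi_unpiped * tclk_unpiped
    time_piped = pipelined_cpi(stalls_per_instr) * tclk_piped
    return time_unpiped / time_piped


# No stalls at all: speedup equals the unpipelined CPI.
print(pipeline_speedup(cpi_unpiped=4, stalls_per_instr=0.0))  # 4.0
```

so with zero stalls the best we can do is a speedup equal to CPI(unpiped), and every stall cycle per instruction eats into that.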
Example:
branch frequencies (as a share of all instructions):
- conditional branches: 15%, of which 60% are taken
- jumps and calls: 1%
the architecture consists of a 4 stage pipe in which the branch is resolved:
- at the end of the second cycle for unconditional branches
- at the end of the third cycle for conditional branches
CPI(unpiped) = 4
1: Unconditional branches: frequency of 1%
- the earliest we can know the correct target is the end of the second clock cycle
- the instruction automatically fetched during the second clock cycle is wrong. the one fetched using the target forwarded from the decode stage of the first instruction (the branch) is correct.
- we lose 1 clock cycle
2: conditional branch taken: frequency of 15% x 60% = 9%
- the earliest this can be resolved is the end of the third clock cycle
- we lose 2 clock cycles
3: conditional branch not taken: frequency of 15% x 40% = 6%
- don't know until the end of the third clock cycle
- only lose 1 clock cycle (the fall-through instruction fetched in the second cycle is the right one, so only the wait costs us)
- stall until we know the outcome of the branch
- decode is the earliest you know you've got a branch, and it's not resolved until the next clock cycle (execute)
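stall CPI = sum over branch types of (frequency x stall cycles)
i.e. the frequency of seeing each type of stall, weighted by how many cycles it costs
= (1 x 1 + 6 x 1)% + 2 x 9% = 7% + 18% = 25% = 0.25 stall cycles per instruction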
CPI(piped) = 1 + 0.25 = 1.25
speedup w.r.t. an ideal pipe = 1/1.25 = 0.8, i.e. 20% slower
speedup w.r.t. no pipe = 4/1.25 = 3.2
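redoing the example in Python to double check the arithmetic. The frequencies and penalties are the ones from the example; the variable names and the table layout are just mine:

```python
# (branch type, frequency as a fraction of all instructions, stall cycles lost)
branch_classes = [
    ("unconditional",         0.01,        1),
    ("conditional taken",     0.15 * 0.60, 2),
    ("conditional not taken", 0.15 * 0.40, 1),
]

# Stall CPI = sum of frequency x stall cycles over every branch type.
stall_cpi = sum(freq * penalty for _, freq, penalty in branch_classes)

cpi_ideal = 1.0
cpi_unpiped = 4.0
cpi_piped = cpi_ideal + stall_cpi

speedup_vs_ideal = cpi_ideal / cpi_piped      # relative to a stall-free pipe
speedup_vs_unpiped = cpi_unpiped / cpi_piped  # relative to no pipelining

print(round(stall_cpi, 4))           # 0.25
print(round(speedup_vs_ideal, 4))    # 0.8
print(round(speedup_vs_unpiped, 4))  # 3.2
```

the taken conditionals are the expensive ones: 0.18 of the 0.25 stall cycles per instruction come from them.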