- Consider two instructions i and j, with i preceding j in program order, e.g.:
→ Inst(i)
→ Inst(j) j = i+1
- Read After Write (RAW)
→ instruction j reads a source BEFORE instruction i writes it, so j sees the old value.
→ where possible, solve using forwarding
- Write After Write (WAW)
→ instruction j writes an operand before instruction i writes to the same target
→ once i finally writes, the target is left with i's value instead of j's, i.e. the wrong operand.
→ a problem when multiple pipes exist in the same machine
- Write After Read (WAR)
→ instruction j writes a target before instruction i reads it
→ the result of instruction i is then based on the result of its successor, instruction j
→ a problem with multiple pipes
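
The three cases above can be told apart just by comparing the registers that i and j read and write. A minimal sketch, assuming each instruction is described by one destination register and a set of source registers (the function name and the register names are made up for illustration):

```python
def classify_hazards(i_dest, i_srcs, j_dest, j_srcs):
    """Return the potential hazards between instruction i and a later instruction j.

    i_dest / j_dest: destination register name, or None if nothing is written.
    i_srcs / j_srcs: set of source register names.
    """
    hazards = set()
    if i_dest is not None and i_dest in j_srcs:
        hazards.add("RAW")   # j reads what i writes
    if i_dest is not None and i_dest == j_dest:
        hazards.add("WAW")   # both write the same target
    if j_dest is not None and j_dest in i_srcs:
        hazards.add("WAR")   # j writes what i still has to read
    return hazards


# i: ADD R1, R2, R3    j: SUB R4, R1, R5   (j reads R1, which i writes)
print(classify_hazards("R1", {"R2", "R3"}, "R4", {"R1", "R5"}))
```

This only reports that a hazard is possible between the pair. Whether it actually causes a stall depends on the pipeline (forwarding, number of pipes), which the sketch does not model.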
Pipelining and multicycle operations.
- different operations take a different number of cycles to complete.
- we actually have 4 different execution units:
→ Integer unit
→ FP/integer multiplication
→ FP adder
→ FP/integer divider
- This results in latency: the number of cycles a dependent instruction has to wait for the result
→ one less than the number of stages in that part of the pipe
- Repeat interval: how long you have to wait before releasing another instruction of that type
- the division unit is not necessarily pipelined
→ after 2000, Intel finally pipelined this unit.
- the integer unit only has one stage
- the multiplication unit has 7 stages and is pipelined
- the FP adder has 4 stages and is pipelined
- the division unit is not pipelined.
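
The latency rule and the repeat interval can be tabulated for the four units. This is only a sketch: the stage counts for the integer unit, multiplier and FP adder come from the list above, but the notes give no cycle count for the divider, so the 25 used here is an assumed, illustrative value. The sketch also assumes an unpipelined unit stays busy until its result is produced, so its repeat interval is latency + 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    stages: int       # number of execution stages (cycles spent in the unit)
    pipelined: bool

    @property
    def latency(self):
        # Rule from the notes: one less than the number of stages.
        return self.stages - 1

    @property
    def repeat_interval(self):
        # Pipelined: a new instruction can enter every cycle.
        # Unpipelined (assumption): wait until the unit is free again.
        return 1 if self.pipelined else self.latency + 1


UNITS = [
    Unit("integer", 1, True),
    Unit("FP/integer multiplier", 7, True),
    Unit("FP adder", 4, True),
    Unit("FP/integer divider", 25, False),  # ASSUMPTION: 25 cycles is illustrative
]

for u in UNITS:
    print(f"{u.name:24} latency={u.latency:2}  repeat interval={u.repeat_interval}")
```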
In the case of the FP Adder:
IF / ID / A1 / A2 / A3 / A4 / MEM / WB
/ IF / ID / X / X / X / A1 / ...
Latency = # stages - 1
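This is the case when there is a dependency: the second instruction stalls (X) until the first one's result is available.
If there is no dependency, there will be no latency.
Therefore, for a pipelined unit, the repeat interval will be 1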
( # of stages + 1 ? )
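
The FP adder diagram can be regenerated from the rule Latency = # stages - 1. A small sketch, assuming the dependent instruction is issued one cycle after the producer and that the stalls are drawn after ID, as in the diagram (the function names are hypothetical):

```python
def stages_for(ex_stages, stalls=0, prefix="A"):
    """List the pipeline slots one instruction occupies, cycle by cycle."""
    ex = [f"{prefix}{k}" for k in range(1, ex_stages + 1)]
    return ["IF", "ID"] + ["X"] * stalls + ex + ["MEM", "WB"]


def dependent_pair(ex_stages):
    """Rows for a producer and a dependent consumer issued one cycle later."""
    latency = ex_stages - 1              # rule from the notes
    first = stages_for(ex_stages)
    # The empty first slot shifts the second row right by one cycle.
    second = [""] + stages_for(ex_stages, stalls=latency)
    return first, second


first, second = dependent_pair(4)        # FP adder: 4 stages
print(" / ".join(first))
print(" / ".join(second).strip())
```

With 4 stages the second row gets three X slots, so its A1 lines up with the cycle right after the first instruction's A4.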
Hazards can actually happen when we consider that execution can take a different number of cycles.
There's a structural hazard that's not represented: because the units have different lengths, several instructions can end up trying to do MEM and WB in the same cycle.
Decode has to figure out if these instructions will clash and insert stall cycles.
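
The check that decode has to make can be sketched by computing the cycle in which each instruction would reach WB and looking for collisions. This is a simplified model, not the actual decode logic: it assumes no other stalls, that every instruction goes through MEM and WB, and it uses the stage counts from the notes (the divider is left out). All names are made up for illustration.

```python
from collections import defaultdict

# Execution stages per unit, taken from the notes.
EX_CYCLES = {"int": 1, "fmul": 7, "fadd": 4}


def wb_cycle(issue_cycle, unit):
    """Cycle in which an instruction fetched in `issue_cycle` reaches WB."""
    # IF in issue_cycle, ID one cycle later, then the EX stages, then MEM, then WB.
    return issue_cycle + 1 + EX_CYCLES[unit] + 2


def wb_clashes(program):
    """Map each contested WB cycle to the instructions that collide there."""
    by_cycle = defaultdict(list)
    for issue_cycle, unit in program:
        by_cycle[wb_cycle(issue_cycle, unit)].append((issue_cycle, unit))
    return {c: insts for c, insts in by_cycle.items() if len(insts) > 1}


# A multiply fetched in cycle 1, an FP add in cycle 4 and an integer op in cycle 7.
program = [(1, "fmul"), (4, "fadd"), (7, "int")]
print(wb_clashes(program))
```

In this example all three instructions would reach WB in cycle 11, so two of them would have to be stalled. The same collision shows up one cycle earlier in MEM.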