2023/06/05 - 15:07

What were the three tpyes of data hazards?


Read after write


R[x] ←
__ ← R[x]

Write after write


R[x] ←
R[x] ←

Write after read


___ ← R[x]
R[x] ←

We can force things to act more independently.

Tomasulo Algorithm


Consider the following code snippet:

F0(A) ← F2/ F4
F6(S) ← F0(A) + F8
mem[0+R1] ← F6(S)
F8(T) ← F10 - F14
F6 ← F10 x F8(T)

The last reference to F6 is not dependent on the first one, since the previous value of F6 is not USED but it is simply overwritten. Therefore if you can re-label the first two references as S (since they are dependent) you can perform the operations separately.
What the hardware does is spot the pairs of dependencies and relabel all of them so the dependenceis are local.
As long as the later F6 is decoupled from the previous instance, the two chunks of code are completely independent.

F0(A) ← F2/ F4
S ← F0(A) + F8
mem[0+R1] ← S
T ← F10 - F14
F6 ← F10 x T

The Tomasulo algorithm is essentially: rename the register pairs.
The renaming happens at reservation stations... This will become a new data structure ( a hardware data structure)
In a sense, it becomes a buffer for a particular instance of a register value between two sequential instructions.
“Something of a sleight of hand”
We collect material from GP registers, or we can indicate their dependencies.
We push values into the data structure as soon as they become available.

Split decode into Decode and Issue.

Look at the instruction at the head of the queue. Which execution unit is involved? In his example, the operation is F8 ← F10 + F14 which would use the adder component.
F8 and F10 are known values, put into the adder.
As far as issue is concerned, that is all that needs to be done for that instruction.
The next instruction involved the multiplcation unit.
F6 ← F10 x F8
It sees F8 in the instruction and notices the dependency. It labels F8 to be S
F10 is given another label, M
Every write produces a label.
The instruction at the multiplication unit was waiting for S. When S is received, it can continute execution of M
If there are several insrtuctions in the queue for an execution unit, the one that is not waiting for any operands will be executed first.
As soon as both operands are known, push the instruction to execution.
except NOT in the load and store buffers.
It's important that the loads and stores happen sequentially relative to the other load and store operations.

Q: At what stage does the labelling happen?
A: The instruction queue is testing for structural hazards. Boils down to : do you run out of reservation stations. Data hazards are resolved by the register renaming

Before issue is decode. So it's like saying decode takes 2 clock cycles since its complexity has increased.

Basic execution ordeR:
1. Issue:
1) If theres a reservation station free at a functional unit THEN
1- assign instruction to rese3rvation station ELSE
2- structural hazard detected.

Reservation stations support resolution of WAR and WAW (and RAW ?) hazards.

2. Execute:
1) RAW and WAW hazards
2) IF missing operands (still being evaluated) THEN
1- wait for CDB (common data bus) update
2- ELSE queue for execution
3) It's possible that multiple instructions have operands resolved in the same cycle for the same functional unit.
4) Load/Store require FIFO queue to avoid RAW or WAW hazards. Only commit with regard to the queue head.

Everything that we're looking at here does not involve branches, because branches get resolved in the fetch stage

3. Write result
1) Update reservation stations and GPR once execution is complete.


Reservation Stations


7 fields:
- busy
- OP (opcode)
- Vj, Vk (resolved src operands value
- Oj, Ok (pending
- he changed the slide :(

Consider the following code block:
F6* ← MEM[R2 + 34]
F2 ← MEM[R3 + 45]
F0 ← F2 x F4
F8 ← F2 - F6*
F10 ← F0 / F6
F6 ← F8 + F2




12345678910111213141516171819202122232425
F6 <-- Mem[R2 + 34]FDIEMWLd1
F2 <-- Mem[R3 + 45]FDIEMWLd2
F0 <-- F2 X F4FDI-- (waiting for operands)M1M2M3M4M5M6M7M8W<-- theres no mem stage, it just writes to the common data bus.Mul1
F8 <-- F2 - F6FDI-A1A2A3A4WAdd1
F10 <-- F0 / F6FDI <-- F6 pushed to res. Station. F0 is unknown but mark its dependent reservation station.---------D1.........D24W
F6 <-- F8 + F2FDI----A1A2A3A4W<-- this doesnt cause a WAR hazard because the previous reference to F6 was copied into the reservation station when that instruction was read
KnownsPendings
Clock cycle 6:IDBUSYOPVjVkOjOkA<-- address
Reservation stationsLd1yloady<-- it is in writeback, dont need the address anymore. When writeback completes, the status will flip to not busy.
Ld2yloadR3 + 45
Mul1yxF4Ld2
^ put in the ID that will tell us where to find this result
Target Registers
FieldF0..F2..F4..F6..F8..F10
Q:Mul1Ld2Ld1



Index