Last of Structural Parallelism
What is speculative execution?
- It addresses control hazards.
- Without it, you have to wait until the branch is resolved.
- You can't commence execution of the instructions after the branch until the branch is finally resolved.
- Superscalar means fetching more than one instruction per cycle. This means you fill up the hardware that much more quickly while waiting on a branch, so we've just made the problem bigger.
- What if the branch target buffer predicts taken?
→ You can start executing the predicted path earlier and not wait for resolution, but then you need a way to undo things if the prediction turns out to be wrong.
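The simplest possible "way to undo things" is to snapshot the register state before speculating and restore it on a misprediction. The sketch below shows only that idea; it is not how the reorder buffer described in these notes works. The function name `run_speculatively`, the dict-of-registers model, and the `actually_taken` argument (standing in for the branch resolving later) are all hypothetical.

```python
def run_speculatively(regs, predicted_path, actually_taken, predicted_taken=True):
    """Hypothetical toy: execute the predicted path, then undo it if the guess was wrong.

    regs: dict mapping register names to values (architectural state).
    predicted_path: list of (dest, compute) pairs; compute(regs) returns the new value.
    actually_taken: the branch outcome, standing in for later branch resolution.
    """
    checkpoint = dict(regs)              # snapshot taken before speculating
    for dest, compute in predicted_path:
        regs[dest] = compute(regs)       # speculative updates go straight into regs
    if actually_taken != predicted_taken:
        regs.clear()
        regs.update(checkpoint)          # misprediction: restore the snapshot
        return False
    return True                          # prediction correct: keep the work
```

Copying the whole register state per branch is expensive in hardware, which is part of why designs instead hold speculative results in a buffer until they are known to be safe.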
Dynamic scheduling: Tomasulo's algorithm
Let's say we have a sequence of instructions in the instruction queue.
Let's say instruction A is issued to the multiplication unit's reservation station; at the same time it gets an entry in the reorder buffer.
The reorder buffer holds the instructions in the same order they had in the instruction queue (program order).
The order in which they write to the common data bus (CDB) will generally not be the same as the order in which they were pushed into the reservation stations, since each one finishes whenever its functional unit does.
The CDB has two paths: one goes back to the reservation stations, filling in blank operands, and the other goes to the reorder buffer, recording the result in that instruction's entry.
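A minimal sketch of a CDB broadcast, assuming each result is tagged with its reorder-buffer entry number. The field names follow the common textbook convention (`vj`/`vk` hold operand values, `qj`/`qk` hold the tag of the instruction that will produce a still-missing operand); `RSEntry`, `ROBEntry`, and `broadcast` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RSEntry:
    """One reservation-station slot (hypothetical layout)."""
    op: str
    vj: Optional[int] = None   # first operand value, once known
    vk: Optional[int] = None   # second operand value, once known
    qj: Optional[int] = None   # ROB tag producing vj, or None if vj is ready
    qk: Optional[int] = None   # ROB tag producing vk, or None if vk is ready

    def ready(self) -> bool:
        return self.qj is None and self.qk is None

@dataclass
class ROBEntry:
    dest: str                  # destination register name
    value: Optional[int] = None
    done: bool = False

def broadcast(tag, value, stations, rob):
    """Put (tag, value) on the CDB: path 1 fills waiting operands, path 2 records it in the ROB."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    rob[tag].value = value     # result is held here; registers are not touched
    rob[tag].done = True
```

For example, an add waiting on tag 0 becomes ready as soon as `broadcast(0, 42, [add], rob)` runs, without the value ever passing through the register file.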
We do not write results directly to the registers, because we cannot undo anything once it gets there.
We also can't undo writes to the actual memory.
The only thing we can safely do with memory speculatively is load.
We only commit from the head of the reorder buffer.
If the result value of the entry at the head of the buffer is known, we can commit it to the registers.
Then we move the head of the queue along to the next entry.
Maybe the next one is a multiplication that is still waiting for its result. It will wait at the head of the reorder buffer until it completes, and nothing behind it can commit in the meantime.
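In-order commit from the head can be sketched with a double-ended queue, assuming each entry records its destination register, its value, and whether it has finished. `ROBEntry` and `commit` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class ROBEntry:
    dest: str                    # destination register name
    value: Optional[int] = None
    done: bool = False           # set when the result arrives on the CDB

def commit(rob: deque, regs: dict) -> int:
    """Retire finished entries from the head, in program order; return how many committed."""
    committed = 0
    while rob and rob[0].done:   # stop at the first unfinished entry, even if later ones are done
        entry = rob.popleft()
        regs[entry.dest] = entry.value   # only now does the register file change
        committed += 1
    return committed
```

With a queue of `r1` (done), `r2` (a multiply still in flight), `r3` (done), one call commits only `r1`; `r3` waits behind `r2` even though its value is ready. Real hardware commits at most a fixed number of entries per cycle, whereas this loop drains every ready entry.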
If there's a store instruction, we put its address (and the value to be stored) into the reorder buffer, but do not actually write to memory until the store reaches the head of the queue.
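Extending the commit idea to stores: a minimal sketch, assuming a store's address and value are already held in its reorder-buffer entry, so the only thing commit does for it is perform the memory write. `Entry`, its `kind` field, and `commit` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    kind: str                    # "alu" or "store" (hypothetical encoding)
    dest: Optional[str] = None   # destination register, for ALU results
    addr: Optional[int] = None   # memory address, for stores
    value: Optional[int] = None
    done: bool = False

def commit(rob: deque, regs: dict, memory: dict) -> None:
    """Retire finished entries from the head; stores touch memory only here."""
    while rob and rob[0].done:
        e = rob.popleft()
        if e.kind == "store":
            memory[e.addr] = e.value   # the one point where memory is written
        else:
            regs[e.dest] = e.value
```

A finished store sitting behind an unfinished instruction does not reach memory until that instruction commits first, so a store on a wrongly predicted path can still be discarded before it does any harm.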
Branch instructions also sit in the reorder buffer. We can speculatively execute all kinds of things while waiting, but nothing after the branch can be written to the registers until the branch is resolved. An unresolved branch that reaches the head of the queue waits there.
Then, if the branch is taken as predicted, everything that was put into the reorder buffer after it is still valid. But if the branch was not taken, the reorder buffer is flushed of every entry after the branch, because you know that is not the correct path of execution.
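The flush can be sketched as dropping every entry younger than the branch, assuming the buffer is a deque in program order and the branch entry remembers its prediction. `Entry` and `resolve_branch` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    name: str
    is_branch: bool = False
    predicted_taken: Optional[bool] = None   # only meaningful for branches

def resolve_branch(rob: deque, index: int, actually_taken: bool) -> bool:
    """Resolve the branch at rob[index]; on a misprediction flush everything after it."""
    branch = rob[index]
    if branch.predicted_taken == actually_taken:
        return True                  # prediction correct: younger entries stay valid
    while len(rob) > index + 1:
        rob.pop()                    # discard entries after the branch, youngest first
    return False
```

Because none of the flushed entries has committed, nothing in the registers or memory needs repairing. A real design also has to clear the matching reservation-station slots and redirect fetch to the correct target, which this sketch leaves out.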
Once an instruction passes the head of the reorder buffer (commits), we never need to undo it.
Commits only ever happen at the head of the reorder buffer.
When a "writeback" stage occurs, results are only ever written back to the reservation stations and the reorder buffer, never directly to the registers.
This is as far as instruction-level parallelism goes.
These developments had largely run their course by the early 2000s; the emphasis since then has been on multi-core CPUs.
Now designs sort of get rid of the ROB and instead have an index field in the reservation stations, but that's functionally the same thing; it's just a little lighter from a hardware perspective.
No architecture currently speculates and resolves multiple branches per cycle.
How many outstanding branch resolutions are you willing to service? If you're trying to service multiple branch resolutions at once, the point being made is that you can't: you can't resolve multiple branches in the same cycle.
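The "how many outstanding branches" question can be pictured as a design limit that stalls issue. This is a minimal sketch, assuming a hypothetical cap `max_unresolved` on in-flight branches and, for simplicity, that no branch resolves while this stretch of instructions is being issued; the instruction encoding (`"B"` for a branch) is also hypothetical.

```python
def issue_until_stall(instrs, max_unresolved):
    """Issue instructions in order; stop at the first branch that would exceed the cap.

    instrs: list of opcode strings, where "B" marks a branch (hypothetical encoding).
    Returns the number of instructions issued before the stall.
    Assumes no branch resolves during this window.
    """
    unresolved = 0
    for i, op in enumerate(instrs):
        if op == "B":
            if unresolved >= max_unresolved:
                return i             # issue stalls until some branch resolves
            unresolved += 1
    return len(instrs)
```

With a cap of 2, the sequence `add, B, mul, B, sub, B` issues five instructions and stalls on the third branch; raising the cap to 3 lets all six issue, at the cost of tracking more speculative state.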
End of instruction-level parallelism