A book each of us should read: The Soul of a New Machine by Tracy Kidder
Problem: ADDC.
- Takes a source, a destination, the carry bit, and word/byte
- Handles ADD, ADDC, SUB, SUBC, and CMP
→ carry for ADD: 0
⇒ SRC, 0
→ carry for ADDC: psw.c
⇒ SRC, psw.c
→ SUB
⇒ ~SRC , carry is 1
→ SUBC
⇒ ~SRC, carry is psw.c
→ CMP
⇒ ~SRC, carry is 1
→ ~ = 1's complement; with the carry-in of 1, ~SRC + 1 forms the 2's complement
- because subtraction is the addition of the 2's complement
- The mistake was in the update PSW function, which took src, dst, result, w/b
- it should take SRC + carry, dst, res, w/b
- res = dst + [src + carry] (see the sketch below)
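A minimal C sketch of this decode, assuming 16-bit words; psw_t, do_add, and the update_psw signature are illustrative names, not necessarily the emulator's own:

    #include <stdint.h>

    typedef struct { unsigned c : 1, z : 1, n : 1; } psw_t;

    /* Per the notes, the flag helper must receive (src + carry), not src
       alone, so that res == dst + [src + carry] holds for the flag logic.
       Byte (w/b) handling is omitted in this sketch. */
    static void update_psw(uint32_t srcc, uint16_t dst, uint16_t res,
                           int word, psw_t *psw)
    {
        (void)word;                                /* byte case omitted */
        psw->c = ((uint32_t)dst + srcc) > 0xFFFF;  /* carry out of bit 15 */
        psw->z = (res == 0);
        psw->n = (res >> 15) & 1;
    }

    /* One adder services all five instructions. */
    static uint16_t do_add(uint16_t src, uint16_t dst, unsigned carry,
                           int word, psw_t *psw)
    {
        uint32_t srcc = (uint32_t)src + carry;   /* keep src+carry un-wrapped */
        uint16_t res  = (uint16_t)(dst + srcc);  /* res = dst + [src + carry] */
        update_psw(srcc, dst, res, word, psw);
        return res;
    }

    /* Dispatch, following the table above (CMP discards res):
       ADD : do_add(src,  dst, 0,     w, &psw)
       ADDC: do_add(src,  dst, psw.c, w, &psw)
       SUB : do_add(~src, dst, 1,     w, &psw)
       SUBC: do_add(~src, dst, psw.c, w, &psw)
       CMP : do_add(~src, dst, 1,     w, &psw) */

The key point matches the fix above: update_psw sees src + carry as one operand, so its carry test agrees with the res the adder actually produced.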
CACHE:
- A cache puts a small, fast memory between the CPU and the main memory bank. Since accessing memory is expensive, the hope is that most accesses are served from the cache, speeding up CPU performance.
Two cache organizations:
Direct mapping:
• essentially a hash function
• Take the address from the MAR; it maps to a location in the cache.
• Mask the address down to its least significant 4 bits (for example), giving us 16 possible cache locations that addresses can map to (addresses here are in hex).
• If we have location 1000, it maps to cache[0]
• This is great until we get to 1010, because it also maps to that location.
• Suppose the program is 8 words long. If our entire program can fit in the cache, it will run very fast.
• Address 1002 would map to cache[2], and so on.
The problem is that different addresses keep mapping to the same cache location.
Assume, for the sake of argument, that we have a very small program.
Say it starts at 1000, and the data begins at 2000.
With direct mapping, we read in 1000 and its instruction, then 1002...
Now say the code accesses 2000. Since 2000 also maps to cache[0], 1000 gets overwritten in the cache.
If this happens inside a loop, every iteration overwrites the line again: 1000 and 2000 keep evicting each other.
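A sketch of the direct-mapped lookup in C, under the assumptions above (16 one-word lines, hex addresses); the valid flag and the names are illustrative:

    #include <stdint.h>
    #include <stdbool.h>

    #define DM_LINES 16                      /* indexed by low 4 address bits */

    typedef struct {
        uint16_t address;                    /* full address, for hit checks */
        uint16_t contents;
        bool     valid, dirty;
    } cache_line_t;

    static cache_line_t dm_cache[DM_LINES];

    /* Direct mapping is the hash: the MAR's low 4 bits pick the single
       line an address may occupy. 0x1000 -> line 0, 0x1002 -> line 2,
       but 0x1010 and 0x2000 also -> line 0 (collisions). */
    static cache_line_t *dm_line(uint16_t mar)
    {
        return &dm_cache[mar & (DM_LINES - 1)];
    }

    static bool dm_hit(uint16_t mar)
    {
        cache_line_t *line = dm_line(mar);
        return line->valid && line->address == mar;
    }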
To combat this, we can try an associative cache.
Associative:
• essentially a table lookup.
• Associative lookup is effectively a linear search. The internal structure isn't one, but from a logical/software perspective it can be treated as a linear search.
• In our example, all 4 lines of the program can occupy the first four lines of the cache, and we still have space left over to store address 2000.
• Now, each time we access memory, the lookup is the equivalent of a linear search: it goes through the cache and finds what it needs without overwriting anything. Instead, new items are written to the next available space in the cache (see the sketch after this list).
• We have effectively reduced the collisions that take place.
• What we're trying to maximize: hits vs. misses
◇ Hit: when we access the cache, the contents exist.
◇ Miss: we access the cache and the item is not there; then we have to service the miss (presumably a fetch).
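A sketch of the associative lookup in C, using the same illustrative line layout:

    #include <stdint.h>
    #include <stdbool.h>

    #define CACHE_LINES 16

    typedef struct {
        uint16_t address, contents;
        bool     valid, dirty;
    } cache_line_t;

    static cache_line_t cache[CACHE_LINES];

    /* Logically a linear search over every line; any address can sit in
       any line, so 0x1000 and 0x2000 no longer collide. */
    static int assoc_find(uint16_t mar)
    {
        for (int i = 0; i < CACHE_LINES; i++)
            if (cache[i].valid && cache[i].address == mar)
                return i;                    /* hit */
        return -1;                           /* miss */
    }

    /* On a miss, a new entry goes into the next available (invalid) line
       rather than overwriting a fixed slot. */
    static int assoc_free_line(void)
    {
        for (int i = 0; i < CACHE_LINES; i++)
            if (!cache[i].valid)
                return i;
        return -1;                           /* full: a victim must be chosen */
    }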
What if we have a miss?
That is, the cache is searched and the item is not there. (If the item is found, it is simply returned.)
If the item isn't there, we have to go out to main memory.
We might have to kick something out of the cache.
Assuming we're using direct mapping:
The item we're removing from the cache may need to be written back to main memory, but only if it is data (not an instruction) and has been changed.
The way we know whether a cache line's value has been changed is the dirty bit.
0: not changed
1: changed
We check the state of the dirty bit. If it's set we take the contents of the cache line and write them to main memory.
If it's not set, we fetch the contents indicated by the MAR into the cache line from main memory.
Then we clear the dirty bit.
(we also perform the fetch if the dirty bit is set, after writing the value to memory)
Then we return the value to the CPU.
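That read path as a C sketch (direct-mapped, one-word lines); pm, mem_read, and mem_write are toy stand-ins for primary memory:

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint16_t address, contents;
        bool     valid, dirty;
    } cache_line_t;

    static uint16_t pm[0x8000];                        /* toy primary memory */
    static uint16_t mem_read(uint16_t a)               { return pm[a >> 1]; }
    static void     mem_write(uint16_t a, uint16_t v)  { pm[a >> 1] = v; }

    static uint16_t cache_read(cache_line_t *line, uint16_t mar)
    {
        if (line->valid && line->address == mar)
            return line->contents;                     /* hit */

        if (line->valid && line->dirty)                /* victim was changed */
            mem_write(line->address, line->contents);  /* write it back first */

        line->address  = mar;                          /* service the miss */
        line->contents = mem_read(mar);
        line->valid    = true;
        line->dirty    = false;                        /* clear the dirty bit */
        return line->contents;                         /* return value to CPU */
    }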
When do we set the dirty bit? → When we are performing a store.
If we get a hit on the cache and the operation is a write, then we perform the write to the cache and set the dirty bit.
In the case of a write-miss:
if the dirty bit is set, we write the old contents to primary memory, then update the contents of the cache line, then set the dirty bit.
If the dirty bit is not set, then we can simply write to the cache line and set the dirty bit.
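The matching write path as a sketch, under the same toy types as the read sketch:

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint16_t address, contents;
        bool     valid, dirty;
    } cache_line_t;

    static uint16_t pm[0x8000];
    static void mem_write(uint16_t a, uint16_t v) { pm[a >> 1] = v; }

    static void cache_write(cache_line_t *line, uint16_t mar, uint16_t val)
    {
        if (line->valid && line->address == mar) {     /* write hit */
            line->contents = val;
            line->dirty    = true;
            return;
        }
        if (line->valid && line->dirty)                /* write miss, dirty */
            mem_write(line->address, line->contents);  /* flush old contents */
        line->address  = mar;                          /* take over the line */
        line->contents = val;
        line->valid    = true;
        line->dirty    = true;                         /* set the dirty bit */
    }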
Cache Policy:
Write-through, WT:
- Whenever a write occurs to the cache, we always write to primary memory.
- Keeps cache in sync with PM
- Useful for keeping memory consistent
- Useful for shared memory. If other CPUs or devices are accessing memory, it is important that they are not reading outdated data.
Write-back, WB:
- Write as much as we want to a specific cache line
- If we keep getting hits, no writing to PM
- If we get a miss and the victim line is dirty: write it back to PM
- The write to PM only occurs on a miss (an eviction)
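The two policies differ only in when PM gets the write; a minimal sketch of the write-hit case, under the same toy types as above:

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint16_t address, contents;
        bool     valid, dirty;
    } cache_line_t;

    static uint16_t pm[0x8000];
    static void mem_write(uint16_t a, uint16_t v) { pm[a >> 1] = v; }

    /* WT: every write also goes to PM, so cache and PM never disagree. */
    static void wt_write_hit(cache_line_t *line, uint16_t val)
    {
        line->contents = val;
        mem_write(line->address, val);       /* always write through to PM */
    }

    /* WB: only the cache is written; PM sees the value when the dirty
       line is eventually evicted on a miss. */
    static void wb_write_hit(cache_line_t *line, uint16_t val)
    {
        line->contents = val;
        line->dirty = true;                  /* defer the PM write to eviction */
    }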
Now we know what to do when we get a miss, but how do we decide which line to kick out?
- The victim is usually the least recently used (LRU) line.
How do we get something to become the LRU?
- We have finite space in the cache line. We already have contents, address, and a dirty bit.
- We need some way of indicating the line's usage, i.e., its age
- Ages run from most recently used (MRU) down to least recently used (LRU)
- We don't use actual timestamps because they are expensive
- Instead, we use 2 bits (enough to order a 4-line cache)
- Let's assume we have a program that starts at location 1000
→ 1000: MOVLZ #0, R0
→ 1002: ADD R0, R1
→ 1004: ADD #1, R0
→ 1006:
→ 1008: BLT LOOP
- MRU: cache size - 1
- LRU: 0
- With 4 lines: 4 - 1 = 3, so 3 = MRU
- 2 = next most recently used
- ...
- 0 = LRU
When we fetch something into the cache, it gets the highest usage value (cache size - 1) to show it is the youngest. In this example, it gets 3.
Each time we fetch, we have a new MRU: subtract 1 from every usage value greater than the replaced line's old usage (usually 0, since we replace the LRU).
On a hit, we make the same adjustment: decrement every usage value greater than the hit line's old value, then set the hit line's usage to MRU. This keeps the usage values a permutation of 0 to cache size - 1.
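A sketch of that 2-bit aging scheme for a 4-line cache (names illustrative):

    #include <stdint.h>
    #include <stdbool.h>

    #define LINES 4                          /* 2 usage bits: ages 0..3 */

    typedef struct {
        uint16_t address, contents;
        bool     valid, dirty;
        uint8_t  usage;                      /* 3 = MRU ... 0 = LRU */
    } cache_line_t;

    static cache_line_t cache[LINES];

    /* Make line i the MRU: every line younger than it ages by 1, so the
       usage values stay a permutation of 0..LINES-1. Used on a hit, and
       on a fetch after the victim is replaced (its old usage is then 0). */
    static void touch(int i)
    {
        uint8_t old = cache[i].usage;
        for (int j = 0; j < LINES; j++)
            if (cache[j].usage > old)
                cache[j].usage--;
        cache[i].usage = LINES - 1;          /* cache size - 1 = MRU */
    }

    /* On a miss, evict the line whose usage is 0: the LRU. */
    static int victim(void)
    {
        for (int j = 0; j < LINES; j++)
            if (cache[j].usage == 0)
                return j;
        return 0;                            /* unreachable if invariant holds */
    }

On a miss, victim() picks the usage-0 line, it gets refilled, and touch() promotes it to MRU.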