- why?
→ the motivation
→ suppose you have some input and you want to do something with it and spit it out as output. you want to perform a computation. you generally have two options:
⇒ use software. write some code
⇒ use hardware. design a physical circuit, a chip to do the computation for you
→ to do the first option you need a device with a processor
→ the second involves ASICs, application-specific integrated circuits
→ processors are neat, they are flexible, you can do all kinds of instructions, you can load and store... but that comes with overhead which makes them inefficient
→ in contrast you can skip the overhead with circuits and be more immediate. as soon as you're done you get the output. but it is not as flexible, because once you've made the hardware the design is permanent; changing your design is difficult/impossible because you have to make a new circuit
→ reconfigurable computing tries to be in the middle: something that's more efficient than software but more flexible than traditional hardware
→ to be able to use this we need a new piece of fancy hardware called an FPGA

- FPGA
→ we only talked about it as a concept
→ field programmable gate array
→ programmable device/chip, the closest thing to designing your own chip
→ fully electronically programmable
⇒ you only program it digitally
→ implements thousands of gates on a single integrated circuit
→ designed to be reconfigurable out of the box
→ programmed using a hardware description language such as Verilog (see the sketch after this list)
→ fun fact: when you buy one, it actually doesn't do anything. there is no logic hard-wired on the thing, you have to program it
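
to make that concrete, a minimal Verilog sketch (module and signal names are my own, not from the lecture): a 2-to-1 multiplexer, the kind of small combinational logic a single piece of FPGA fabric can absorb.

    // 2-to-1 multiplexer: y = a when sel is 0, b when sel is 1
    module mux2 (
        input  wire a,    // first data input
        input  wire b,    // second data input
        input  wire sel,  // select line
        output wire y     // selected output
    );
        // purely combinational; the synthesizer maps this onto the fabric
        assign y = sel ? b : a;
    endmodule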

- history
→ initially it was created as a hybrid between PALs and MPGAs
→ because PALs are fully electronically programmable, while MPGAs can implement thousands of gates
→ we also need something for rapid prototyping, because what if you get the design wrong? being able to change the design is nice

general layout
- they have CLBs - configurable logic blocks
- IO banks, configured to be input or output
- interconnecting metal wires which can be joined in any configuration possible
- configuration flash memory
- we all know what static RAM is. SRAM enables you to store bits of data. the caches we learn about use SRAM. most FPGAs are SRAM-programmable: each SRAM bit is connected to a configuration point in the FPGA, so when you configure the SRAM you also configure the FPGA
- the CLB enables you to implement any basic function/logic gate (via a lookup table; see the sketch after this list)
- for the routing structure, there are programming bits that dictate where the signals can pass through
- the metal wires are called <something> style layout
- the intersection between the lines is a switch block
→ this controls which direction the signal can go
- flash memory - the FPGA is volatile: when the power is gone, everything disappears. when you boot it up it takes quite some time to load the configuration back, so usually there is a flash memory that stores the last configuration and loads it back faster
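
a sketch of what the lookup table inside a CLB does, assuming a 4-input LUT (an illustrative module, not a real vendor primitive): 16 stored configuration bits form a truth table, so storing the right bits makes it compute any 4-input boolean function.

    // a 4-input LUT: 16 stored bits, indexed by the 4 inputs
    module lut4 (
        input  wire [3:0] in,   // four logic inputs select one table entry
        output wire       out   // the stored truth-table bit
    );
        // configuration bits, here chosen so that
        // out = (in[0] & in[1]) | (in[2] ^ in[3])
        parameter [15:0] CONFIG = 16'b1000_1111_1111_1000;

        // the inputs simply index into the stored truth table
        assign out = CONFIG[in];
    endmodule

reconfiguring the FPGA amounts to rewriting configuration bits like CONFIG all over the chip, via the SRAM cells described above.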

how do they work?
- a normal CPU is sequential: it does a fetch, then a decode, then an execute, then puts the data back into memory
- in an FPGA there is no software involved in the pipeline: it grabs a bunch of data from memory, moves it along using parallelism, and stores it back into memory
- a single calculation, a CPU can do it better, but it only does one at a time. in an FPGA, one single calculation takes longer but it's doing many at a time (see the sketch below)
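
a made-up Verilog sketch of that trade-off: four additions happen on the same clock edge in four independent adders laid out side by side, where a CPU would loop over them one at a time.

    // four additions per clock edge, in parallel hardware
    module parallel_add4 (
        input  wire       clk,
        input  wire [7:0] a0, a1, a2, a3,  // first operands
        input  wire [7:0] b0, b1, b2, b3,  // second operands
        output reg  [8:0] s0, s1, s2, s3   // sums (extra bit for carry)
    );
        always @(posedge clk) begin
            // each assignment becomes its own adder circuit
            s0 <= a0 + b0;
            s1 <= a1 + b1;
            s2 <= a2 + b2;
            s3 <= a3 + b3;
        end
    endmodule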

how are FPGAs used in reconfigurable computing?
- generally by coupling them with a host microprocessor
→ can use it as a functional unit inside the host microprocessor
⇒ enables you to use the traditional programming environment with additional custom instructions
→ can also use it as a co-processor alongside the microprocessor
⇒ the co-processor can see the cache of the host processor
→ can use the FPGA as a standalone processing unit in a multicore system
⇒ this reduces coupling and offloads workload from the host processor, but the communication overhead is high
→ can use it as a standalone processing unit, independent from the host processor (if one even exists)
⇒ used in network stations because a process can run for a long time without needing to communicate with another processor


but is it just FPGAs?
- partially reconfigurable FPGAs: sometimes only the part that is damaged needs to be reconfigured. a normal FPGA reconfigures all of it, but partial ones only reconfigure parts, which makes them faster and keeps utilization lower
- coarse-grained reconfigurable arrays (CGRAs) are a sweet spot between FPGAs and GPUs: they accelerate parallel loops with better power efficiency
- dynamic binary translation
→ translate and optimize binary code on the fly → execute another architecture's machine code on a reconfigurable system
- all these solutions have their pros and cons


advantages and disadvantages
- you get flexibility and efficiency
- you can take great advantage of parallelism
- you can get good power efficiency

disadvantages:
- expensive if you're not taking advantage of the efficiency/flexibility
- like approximate computing, a programmer needs to be consciously aware of the FPGAs in order to get use out of them
- they are volatile: they forget everything when they're off
- noticeable delay when reconfiguring them
- aren't very standardized. not yet. hard to compare products against each other

what does it look like applied?
case studies:
- performance metrics (a quick worked example follows this list)
→ latency
⇒ time from first input to first output
→ execution time
⇒ time it takes to execute the whole task
→ throughput
⇒ number of operations completed per second; shows how effectively the task is parallelized
→ hardware utilization
⇒ important because the hardware is custom built, so you should be using it fully
→ power consumption
⇒ FPGAs are custom built, there's no extra hardware going to waste, so you should be using less power overall
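
a worked example tying these together, with invented numbers: suppose a pipelined FPGA design accepts a new input every clock cycle at 200 MHz, and the first result appears after 10 cycles. then

\[
\text{latency} = \frac{10 \text{ cycles}}{200 \text{ MHz}} = 50 \text{ ns},
\qquad
\text{throughput} = 2 \times 10^{8} \text{ ops/s}
\]
\[
\text{execution time for } N = 10^{6} \text{ inputs} \approx \frac{(N + 9) \text{ cycles}}{200 \text{ MHz}} \approx 5 \text{ ms}
\]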


Brainwave:
- a project exploring FPGAs to empower Azure data centers
- why?
→ Moore's law has run its course
→ there's a demand for data-driven solutions, and CPUs are not increasing in performance as fast as they used to
→ Project Catapult used FPGAs for the Bing search engine
→ Brainwave applies this to anyone using Azure datacenters
→ they found lower latency, higher throughput... basically improved a lot
- had a bunch of CPUs on racks and then added a layer of FPGAs
- the idea is that instead of using CPUs for parallel tasks you feed them to FPGAs
- in this case the FPGAs are not application-specific, since they don't know what's going on in the data centers
- compared to an NVIDIA GPU, the FPGAs decreased latency by 100x, with +50% throughput and +20% hardware utilization


Square Kilometre Array (SKA):
- world's largest radio telescope
- built in two locations
- the telescope is one square kilometre of antennas
- they're trying to use FPGAs to leverage parallelization and make the SKA perform better
- it creates an exabyte of data a day
- you can't just take the sensor data, feed it through memory, process it, and feed it back...
- FPGAs read the data from the sensors in real time, and only when the processing is done is it stored in memory
- gridding is collecting all the radio information and conglomerating it into a grid (an image)
- degridding is taking the image and producing the radio signals
- the first time you grid an image it's messy and you have to filter out the farther radio waves. all of this can be done in parallel, which makes it great for FPGAs
- the FPGAs did 2.74x better than the Xeon and 2.03x better than the NVIDIA
- power consumption was the big thing for this use case, and the FPGAs always use less power: around 7x less than the comparable CPU and GPU

real-time cancer diagnosis
- use FPGAs on a small compute cluster
- do basic binary classification tasks
- the sooner you detect cancer the better
- using FPGAs gave a 144x speedup over a CPU and 21x over a Tesla GPU


use cases
- accelerate machine learning
- neural networks on embedded systems
- any easily parallelizable task, e.g. scientific computing, signal processing, cryptography
- used in health care
- aerospace applications requiring robust designs


summary
-




- Why was the particular approach to computation adopted in preference to a traditional computer architecture?

-



- What are the architectural innovations supporting the proposed approach to computation?

-



- What advantages and disadvantages result from adopting the proposed approach to computation?

-



- Given the unique properties of the proposed computing platform, how did the authors go about measuring performance?

-



- What are the 'killer applications' for the proposed approach to computation?


-
