Reconfigurable Computing
- why we use it
- advantages and disadvantages
- in traditional computing we have application-specific integrated circuits, ASICs (in the hardware)
- microprocessors
- start with some scenarios
→ Excel 2010, and you want to update to the latest version
→ go to microsoft.com and download the latest version
→ another scenario: you have 1 gigabit Ethernet in your PC and you want to switch to 10 gigabit.
⇒ since the Ethernet controller is hardware, you would have to buy a new one and get rid of the old one
→ this functionality is implemented in hardware
- microprocessors are more flexible than ASICs
- the problem is that their performance is worse than an ASIC's: we have to read each instruction from memory, decode its meaning, and only then execute it, which leads to high execution overhead
- we want the high performance provided by hardware and the flexibility provided by software
- systems incorporating some form of hardware programmability: physical control points we can manipulate to change execution using the same hardware.
- research areas:
→ field programmable gate arrays (FPGAs)
⇒ FPGAs are electronically programmable; a mix of PALs and MPGAs
⇒ complex computation on a single chip
→ glue logic replacement and rapid prototyping vehicles. the flexibility, capacity, and performance of these devices has opened up completely new avenues in high-performance computation, forming the basis of reconfigurable computing
- lookup tables are the logical structure of the FPGA
- hardware
→ one of the primary differences is the degree of coupling with the host microprocessor. programmable logic is inefficient for looping and branch control, so for some of the logic we need a host microprocessor
- software
→ to make reconfigurability more widespread we need to provide a software design environment.
- runtime reconfiguration
→ usually the programs we want to accelerate are too complex to be loaded simultaneously into the available hardware, so we want to be able to swap sections in and out of the reconfigurable hardware
→ more sections of an app can be mapped into hardware than can fit at one time
advantages
- greater functionality
→ possible to achieve greater functionality with a simple hardware design
→ leads to increased speed and decreased energy consumption
- embedded characteristics
→ if we discover a new edge case, engineers program a new component to handle the anomaly; the reconfigurable hardware fabric can then be updated to include it
- lower system cost
→ we can extend the useful life of a system since the reconfigurable components are upgradable
- reduced time to market
→ since we can make changes over time, we can ship earlier and keep improving the product after release
disadvantages
- placement issues
→ we do not have infinite space, so the hardware needs to be able to make changes within the space available; this can be an issue if we have to place the changes near other components
- routing
→ existing components need to be connected to the new ones. if we can't establish the connection, then the new configuration and the old one aren't connected and it won't work. we have to reuse the routing resources that were previously in use, and if we can't find a way around conflicts it will lead to routing issues
- timing
→ new hardware must meet the timing requirements for the continued operation of the circuit.
→ we are making changes while the system runs, so there are timing constraints; if the new configuration cannot meet the timing, it leads to timing violations
- consistency issues
→ static or dynamic reconfiguration of the device should not degrade the computational consistency of the design
architectural support:
fpga:
- different manufacturers use different terminology
- configurable logic block (CLB) - basic unit
- switch block - routing
- connection block
- CLB:
→ set of inputs and one output
→ let's say we work at a company where we need to create a gizmo. we need to create a circuit with XYZ specs.
→ xor gate: we can treat the inputs as indexes into a lookup table.
⇒ the truth table for an xor gate becomes the lookup table.
⇒ this with a MUX is a CLB.
⇒ define the behaviour with a bitstream such that 01101 defines the lookup table (first four bits) and the last bit is the ctrl bit of the mux.
→ we can replace an AND gate with a CLB with bitstream 00010 (lookup table 0001), where the ctrl bit is 0 because we don't need the output to be clocked.
→ we can also make an OR gate with 01111.
→ basically you can program the CLBs to be whatever logic function you need
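The LUT-plus-mux idea above can be sketched in software. This is a toy model, not a real FPGA primitive: the first 2^n bits of the bitstream are the truth table for n inputs, and the final bit stands in for the mux control (only recorded here, since a software model has no clock).

```python
def clb_output(bitstream: str, *inputs: int) -> int:
    """Toy CLB: the first 2**n bits are the lookup table for n inputs;
    the final bit is the mux control (clocked vs. combinational output),
    which this purely combinational model just records."""
    n = len(inputs)
    lut, ctrl = bitstream[:2**n], bitstream[2**n]
    index = 0
    for bit in inputs:          # the inputs form the LUT address, MSB first
        index = (index << 1) | bit
    return int(lut[index])

# "01101": LUT 0110 (xor) with ctrl bit 1
print([clb_output("01101", a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

With bitstream 00010 the same function behaves as the AND gate from the notes, and 01111 as the OR gate; a 3-input CLB just needs an 8-bit lookup table in front of the ctrl bit.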
- switch blocks:
→ what if we have lots of inputs?
→ we can have a bunch of CLBs interspersed with switch blocks, which are bundles of wires; at each intersection there are transistors which we can configure with a bitstream to dictate which path along the wires the signal travels. you can reconfigure the flow of information
- connection blocks are like switch blocks but they are fixed
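The switch-block idea can also be modeled in software. This is a made-up toy, assuming a single junction where four wire segments (N, E, S, W) meet and six configuration bits each enable one pass transistor between a pair of segments:

```python
# toy switch block: four wire segments (N, E, S, W) meet at a junction,
# and six config bits enable pass transistors between pairs of segments
PAIRS = [("N", "E"), ("N", "S"), ("N", "W"), ("E", "S"), ("E", "W"), ("S", "W")]

def connected(config: str, src: str, dst: str) -> bool:
    """Can a signal travel from src to dst given the 6 config bits?"""
    adj = {w: set() for w in "NESW"}
    for bit, (a, b) in zip(config, PAIRS):
        if bit == "1":                 # this pass transistor is switched on
            adj[a].add(b)
            adj[b].add(a)
    seen, frontier = {src}, [src]      # walk over the enabled wire segments
    while frontier:
        for nxt in adj[frontier.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return dst in seen
```

Reprogramming the six bits reroutes signals without touching any physical wire: for example `connected("100100", "N", "S")` holds because the N-E and E-S transistors are on, even though the direct N-S transistor is off.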
- adding complexity
→ we can make a 3-input CLB with a larger (8-entry) lookup table.
- generating the bitstream
→ Verilog allows you to define hardware at varying levels of abstraction
→ define what we want to happen (high-level design) and set constraints (synthesis phase), then physically map to the FPGA (place and route), to get a bitstream
killer uses
- parallelization
→ bing search engine
→ image processing
⇒ it's mostly done in parallel and there are a lot of tasks we can do on an image. having hardware that can change its specialization makes it much faster than general-purpose hardware
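A tiny sketch of why image work parallelizes so well (the function names and the threshold operation are just illustrative): each pixel's result depends only on that pixel, so the rows can be farmed out to independent workers, and on an FPGA those workers are actual parallel circuits rather than scheduled threads.

```python
from concurrent.futures import ThreadPoolExecutor

def threshold_row(row, cutoff=128):
    # per-pixel operation: no pixel depends on any other pixel,
    # so every row can be processed independently
    return [255 if p >= cutoff else 0 for p in row]

def threshold_image(image, workers=4):
    """Apply the per-pixel threshold to all rows in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(threshold_row, image))
```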
- hardware emulation
→ people try to do software emulation for old hardware.
- self-managing real-time systems in high-failure settings
→ imagine you have hardware somewhere where damage is being done to the system, like a satellite in space, and you want to make sure your hardware keeps functioning. an FPGA can reroute tasks when paths become unavailable
- routers and switches
→ need to be reconfigured
- iot and machine learning
→ imagine you're processing millions of results and you need to analyze them in hardware, working on data on the fly
- case study of large scale datacenters
→ datacenters are expensive and delicate
⇒ hardware must be homogeneous
⇒ cutting edge
→ this calls for reconfigurable hardware
- if there are too many FPGAs it becomes too expensive; too few and it makes no difference
- the Catapult fabric is made from many FPGAs: 48 servers get one Catapult fabric(?), and thousands of FPGAs can be used for the entire Bing search engine
- great for large scale... system for recovering from corruption...
- requirements
→ need to avoid complexity; homogeneous
→ scalable
→ power efficient
→ should not require too much space. needs to fit in the gaps of current systems
→ needs to recover/reconfigure itself
- integration:
→ one FPGA server in a rack of other servers. there is no single point of failure and there is no bottleneck
→ scalability
⇒ high-throughput and bidirectional links
board design
- design of the server had to remain the same
- there is a problem: exhaust heat
- resiliency:
- total cost of ownership: less than a 30% increase in cost
- datacenter deployment
- software interface
it has to properly check the hardware requirements, and whether it can replace the old hardware. no optimizations that can only happen on the FPGA
the Catapult fabric has seven (several?) FPGAs
what happens is the host creates queries (pairs), sends them to the FPGA, and it ranks them and returns them
the FPGA will look through the pair and find metadata
to make it faster, data is compressed to below 64 KB
- macro pipeline
→ free
- queue manager
→ has multiple queues
→ fills by longest latency first
→ priority queue
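A longest-latency-first dispatcher like the one described above can be sketched with a priority queue; the class name and the latency estimates are hypothetical, just illustrating the scheduling policy:

```python
import heapq

class QueueManager:
    """Toy dispatcher: always hand out the pending request with the
    longest expected latency first (a max-heap priority queue)."""

    def __init__(self):
        self._heap = []
        self._count = 0  # insertion counter breaks latency ties stably

    def enqueue(self, request, expected_latency):
        # negate the latency so Python's min-heap pops the largest first
        heapq.heappush(self._heap, (-expected_latency, self._count, request))
        self._count += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2]
```

For example, requests with expected latencies 5, 9, and 2 are served in the order 9, 5, 2.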
- model reload
→ the model reload command is time-consuming, but not as much as recreating the entire FPGA configuration
- feature extraction
→ in feature extraction, a stream of processing data is sent; there are finite state machines for calculating the features
- free form extraction
- document scoring
→ free-form expressions are turned into a 4-byte score.. something about 8 microseconds
- evaluation
- throughput gains
conclusion
- FPGAs can be used to robustly accelerate large-scale datacenter services: 95% throughput was gained with only a 10% increase in power consumption
- the tools have to be easier to learn
- Why was the particular approach to computation adopted in preference to a traditional computer architecture?
-
- What are the architectural innovations supporting the proposed approach to computation?
-
- What advantages and disadvantages result from adopting the proposed approach to computation?
-
- Given the unique properties of the proposed computing platform, how did the authors go about measuring performance?
-
- What are the ‘killer applications' for the proposed approach to computation?
- Index