Custom Computing - Imperial College London
wl 2020 1.1
Custom Computing
• theory and practice of customising designs
– one of the fastest growing technologies
– impact on ASIC, CPU, many-core, GPU, multi-scale dataflow
• wide range of architectures and applications
– data-centre/supercomputers with user-customisable accelerators
– message routers, mobile robots, LCD TVs, car audio systems
– invent processors with your own instruction set!
• based mainly on customisable implementation technology
– e.g. Field-Programmable Gate Arrays (FPGAs)
– also called reconfigurable computing, FPGA-based computing
• we focus on concepts, abstractions, design methods
• requirement: willing to learn new ideas, languages, tools
– not afraid of C/Java/functional programs, maths, hardware
wl 2020 1.2
Course coverage
• topics
– custom computing technology overview
– design parametrisation and optimisation
– system-on-chip architecture and design
• 18 lectures, 8 tutorials (flexible), 1 assessed exercise
• course material
– https://www.doc.ic.ac.uk/~wl/teachlocal/cuscomp
– EEE students: may need access via EEE machines
• preparation for projects and research
– many received project prizes or distinctions
– summer projects for non-MSc students
wl 2020 1.3
Why custom computing?
• FPGAs: customisable hardware resources
– data centres for cloud computing
– mobile handsets, Internet of Things (IoT), edge computing
• acceleration of demanding workloads
– big data, finance, genomics, weather/climate modelling
– integrated solution: often with interface to memory, sensors…
– target multiple platforms: need to promote design re-use
• design approach: generalisation + customisation
– often start with design instance: f0
– generalise f0 to become a template f(x), such that f(x0) = f0, where x is a parameter and x0 is a specific value for x
– customise f with values for x to support tradeoff in speed, size…
[Figure: generalise f0 to the template f(x); customise f(x) with x=x0, x=x1, x=x2, x=x3 to give f0, f1, f2, f3]
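The generalise-then-customise approach can be sketched in Python. The moving-average filter, the parameter name `taps` and the values chosen are hypothetical illustrations, not taken from the course material.

```python
from typing import Callable, List

Design = Callable[[List[float]], List[float]]


def f0(samples: List[float]) -> List[float]:
    """Design instance: a moving average hard-wired to 4 taps (hypothetical)."""
    return [sum(samples[i:i + 4]) / 4 for i in range(len(samples) - 3)]


def f(taps: int) -> Design:
    """Template f(x): the same design with the tap count exposed as parameter x."""
    def design(samples: List[float]) -> List[float]:
        return [sum(samples[i:i + taps]) / taps
                for i in range(len(samples) - taps + 1)]
    return design


# Customise: x0 = 4 reproduces the original instance; other values give
# different designs from the same template.
x0 = 4
f1, f2, f3 = f(2), f(8), f(16)

data = [float(v) for v in range(20)]
assert f(x0)(data) == f0(data)   # f(x0) = f0
```

In a hardware setting the parameter would typically control quantities such as word length or degree of parallelism, so that each customised instance occupies a different point in the speed/size tradeoff.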
wl 2020 1.4
Benefits of customisation
• improvements in
– accuracy: as needed, not necessarily 8, 32, 64, 128 bits
– throughput: rate of producing results
– latency: time between first input and first output
– reconfiguration time: speed of adapting to changes
– size: area, volume, weight
– energy and power consumption: mobile and remote applications
– development time: design and validation
– cost: minimise fabrication, post-delivery fixes, enhancements
• need to prioritise design objectives
– e.g. smallest design at a given speed consuming given energy
• opportunities for customisation
– application-oriented, e.g. run-time conditions
– implementation-oriented, e.g. technology used
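One way to read "smallest design at a given speed consuming given energy" is as a constrained selection over candidate designs. The sketch below assumes this reading; the `Design` record, its units and the candidate figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Design:
    name: str
    area: int            # e.g. number of logic cells (hypothetical unit)
    throughput: float    # results per second
    energy: float        # joules per result


def smallest_feasible(designs: List[Design],
                      min_throughput: float,
                      max_energy: float) -> Optional[Design]:
    """Return the smallest design meeting the speed and energy constraints."""
    feasible = [d for d in designs
                if d.throughput >= min_throughput and d.energy <= max_energy]
    return min(feasible, key=lambda d: d.area, default=None)


# Made-up candidates, only to exercise the selection.
candidates = [
    Design("serial",    area=100, throughput=1e6, energy=2e-9),
    Design("pipelined", area=400, throughput=8e6, energy=3e-9),
    Design("parallel",  area=900, throughput=3e7, energy=5e-9),
]

best = smallest_feasible(candidates, min_throughput=5e6, max_energy=4e-9)
print(best.name if best else "no feasible design")
```

Treating speed and energy as constraints and size as the objective is only one possible prioritisation; swapping the roles gives a different selection function.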
wl 2020 1.5
Implementation technology
• application-specific integrated circuit (ASIC)
– high performance, low part cost: cheap if producing large volume
– high risk, high development cost, slow time-to-market
– costly (Moore’s Second Law) to develop, build and test; inflexible
• Field-Programmable Gate Array (FPGA)
– low risk, fast time-to-market, low development cost, high part cost
– post-delivery improvement: fix bugs, update functions
– customisable at run time: adapt to environment changes
– prototype for ASIC
– enable internet routing
• custom computing systems
– stand-alone
– PCIe / Infiniband
– system-on-chip: instruction processor + FPGA
wl 2020 1.6
Technology comparison
[Figure: Flexibility plotted against Efficiency and Performance for General-Purpose Processors, Digital Signal Processors, Special-Purpose Processors, FPGAs and ASICs]
(adapted from K. Fan, HPCA’09)
wl 2020 1.7
Where are FPGAs? Consumer applications
Digital Camera & Editing
LCD Projectors
PDP & HDTV
STB, DVR & VTR
Automotive
Handheld
Automotive
Diagnostics
Home Computing
Home Networking
(source: Xilinx Inc.)
wl 2020 1.8
New: accelerators for data centre servers
• Smart NIC (Network Interface Controller)
– compute accelerator: local / remote
– infrastructure accelerator: network / storage
– flexibility of Software Defined Network + speed of hardware
Source: Microsoft
wl 2020 1.9
Accelerate clouds: Microsoft + Amazon
aws.amazon.com/ec2/instance-types/f1/
www.top500.org/news/microsoft-goes-all-in-for-fpgas-to-build-out-cloud-based-ai/
wl 2020 1.10
Why Intel bought Altera
Source: Intel
IP: Intellectual Property
wl 2020 1.11
Drones + IoT + …
Aerotenna: Octagonal Pilot on Chip
Source: Intel
ASSP: Application-Specific Standard Part
SAM: Serviceable Available Market
wl 2020 1.12
Particle Physics: Large Hadron Collider
• real-time analysis of particle collisions
• combine data from various detectors
[Figure labels: Optical ribbon cable input; Opto-RX, 12 way; 3 x Delay FPGA (ADC clk timing); Virtex II, 2M gate FPGA performs signal processing. Stages: Opto-to-electrical conversion; Digitise & sync data; Find hit clusters]
(source: Xilinx Inc.)
(source: G. Hall)
wl 2020 1.13
Customisation: pre-fab and post-fab
• fabrication: manufacturing the chip
– Xilinx UltraScale FPGA: 16nm, Intel i7-i770T: 22nm
– costly: very small geometry, ultra-clean room
• application-specific integrated circuit (ASIC)
– greatest customisation at pre-fabrication, but could be inflexible
– high performance, low part cost: cheap if producing large volume
– high risk, high development cost, slow time-to-market
– costly (in money and time) to develop and test: Moore’s Law
• field-programmable gate array (FPGA)
– post-fabrication, post-delivery, even run-time customisation
– hardware speed, software flexibility
– most basic, fine-grained unit of programmability
– need larger function blocks for efficiency
wl 2020 1.14
Design metrics
• NRE (non-recurring engineering) cost
– one-time cost of designing system
• total cost = NRE cost + unit cost * number of units
• size, performance, power
• flexibility
– make changes to the hardware with low NRE cost
• time-to-prototype, time-to-market
• maintainability
• correctness, safety, robustness
Source: J. Wong
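The total cost formula can be written out directly. The sketch below also amortises the total over the units produced; the function names and the figures are hypothetical and chosen only to illustrate the arithmetic.

```python
def total_cost(nre_cost: float, unit_cost: float, units: int) -> float:
    """total cost = NRE cost + unit cost * number of units."""
    return nre_cost + unit_cost * units


def cost_per_unit(nre_cost: float, unit_cost: float, units: int) -> float:
    """Total cost amortised over the units produced."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost(nre_cost, unit_cost, units) / units


# Hypothetical figures: the NRE cost dominates at low volume and is
# spread thin at high volume.
for units in (1_000, 100_000):
    print(units, total_cost(50_000, 20, units), cost_per_unit(50_000, 20, units))
```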
wl 2020 1.15
FPGA/ASIC crossover points
[Figure: Cost against Production Volume, with regions labelled FPGA Cost Advantage and ASIC Cost Advantage on either side of each crossover point]
Source: S.S.S.P. Rao
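A crossover point can be derived from the linear model total cost = NRE cost + unit cost * number of units. The symbols below (N for NRE cost, u for unit cost, n for production volume) are introduced here for illustration, and the linear model is itself an assumption: real cost curves need not be straight lines.

```latex
\begin{align*}
C_{\mathrm{FPGA}}(n) &= N_F + u_F\, n, \qquad C_{\mathrm{ASIC}}(n) = N_A + u_A\, n \\
C_{\mathrm{FPGA}}(n^*) &= C_{\mathrm{ASIC}}(n^*)
  \;\Longrightarrow\; n^* = \frac{N_A - N_F}{u_F - u_A}
  \qquad (N_A > N_F,\; u_F > u_A)
\end{align*}
```

Under these assumptions the FPGA is cheaper for volumes below n* and the ASIC is cheaper above it.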
wl 2020 1.16
FPGA vs ASIC
FPGA
• faster time-to-market
– no layout, masks or other manufacturing steps are needed
• no upfront NRE costs
• simpler design cycle
– software tools for routing, placement, and timing
• more predictable project cycle
• field re-programmability
ASIC
• full custom capability
– for design since device is manufactured to design specs
• lower unit costs
– for very high volume
• smaller form factor
– device is made to design specs
• higher raw internal clock speeds
Source: J. Wong
wl 2020 1.17
Design flows
HDL: Hardware Description Language
DFT: Design For Test
Source: J. Wong
wl 2020 1.18
Early FPGA architecture
[Figure labels: Logic Block; Connection Block; Switch Block; Routing Track; Routing Channel (horizontal and vertical); Tile]
Source: S. Wilton
wl 2020 1.19
Basic logic gate: lookup table
Function of each lookup table can be configured by shifting in bit-stream.
[Figure: Reconfigurable logic block with Inputs and Bit-Stream]
Source: S. Wilton
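A lookup table configured by a shifted-in bit-stream can be modelled in a few lines of Python. This is a behavioural sketch only: the class name, the shift direction and the address ordering are assumptions, and real device bit-stream formats differ.

```python
from typing import List


class LookupTable:
    """A k-input lookup table whose contents are shifted in serially."""

    def __init__(self, num_inputs: int) -> None:
        self.num_inputs = num_inputs
        self.cells: List[int] = [0] * (2 ** num_inputs)  # configuration memory

    def shift_in(self, bit: int) -> None:
        """Shift one configuration bit in; the oldest bit falls off the end."""
        self.cells = [bit & 1] + self.cells[:-1]

    def configure(self, bitstream: List[int]) -> None:
        for bit in bitstream:
            self.shift_in(bit)

    def evaluate(self, inputs: List[int]) -> int:
        """Use the inputs as an address into the configuration memory."""
        address = 0
        for value in inputs:          # first input is the most significant bit
            address = (address << 1) | (value & 1)
        return self.cells[address]


# Configure a 2-input table as AND. Bits enter at cell 0 and move towards
# higher cells, so the entry for address 3 is shifted in first.
lut = LookupTable(2)
lut.configure([1, 0, 0, 0])   # entries for addresses 3, 2, 1, 0
print([lut.evaluate([a, b]) for a in (0, 1) for b in (0, 1)])
```

Shifting in a different bit-stream turns the same table into a different gate without changing the circuit around it.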
wl 2020 1.20
Basic logic gate: lookup table
Function of each lookup table can be configured by shifting in bit-stream. By-passable register at output.
[Figure: Reconfigurable logic block with Inputs and a D Q register at the output]
Source: S. Wilton
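The effect of the by-passable register can be seen in a small cycle-level model. The `LogicElement` class and its `use_register` flag are hypothetical names for this sketch, which assumes the register is a simple D flip-flop initialised to 0.

```python
from typing import List


class LogicElement:
    """Lookup table followed by a D flip-flop that can be bypassed."""

    def __init__(self, table: List[int], use_register: bool) -> None:
        self.table = table                # LUT contents, indexed by input value
        self.use_register = use_register  # configuration bit: registered or not
        self.q = 0                        # flip-flop state

    def lut(self, inputs: List[int]) -> int:
        address = 0
        for value in inputs:              # first input is the most significant bit
            address = (address << 1) | (value & 1)
        return self.table[address]

    def clock(self, inputs: List[int]) -> int:
        """Advance one clock cycle and return the output seen during it."""
        d = self.lut(inputs)
        if not self.use_register:
            return d                      # bypass: combinational output
        out = self.q                      # registered: previous cycle's value
        self.q = d
        return out


and_table = [0, 0, 0, 1]
stimulus = [[1, 1], [0, 1], [1, 1]]
combinational = LogicElement(and_table, use_register=False)
registered = LogicElement(and_table, use_register=True)
print([combinational.clock(s) for s in stimulus])
print([registered.clock(s) for s in stimulus])
```

With the register enabled, the same values appear one cycle later, which is what allows logic blocks to be chained into pipelines.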
wl 2020 1.21
Reconfigurable logic
• Connect logic blocks using fixed metal tracks and programmable switches
Source: S. Wilton
wl 2020 1.22
Reconfigurable logic
• Connect logic blocks using fixed metal tracks and programmable switches
Everything can be built using fine-grained logic; why need anything else?
Source: S. Wilton
wl 2020 1.23
Implementing systems in an FPGA
FPGA vendors embed fixed blocks to improve speed and density:
• Embedded Memories (blocks of 2K-18K)
But every user must pay for them, whether used or not…
Source: S. Wilton
wl 2020 1.24
Implementing systems in an FPGA
FPGA vendors embed fixed blocks to improve speed and density:
• Embedded Memories (blocks of 2K-18K)
• Hard Blocks, e.g. multiplier
But every user must pay for them, whether used or not…
Source: S. Wilton
wl 2020 1.25
Implementing systems in an FPGA
FPGA vendors embed fixed blocks to improve speed and density:
• Embedded Memories (blocks of 2K-18K)
• Hard Blocks, e.g. multiplier
• High-Speed I/Os
But every user must pay for them, whether used or not…
Source: S. Wilton
wl 2020 1.26
Example: Xilinx Virtex CLB tile
• CLB tile is composed of:
– switch matrix
– Configurable Logic Block and associated general routing resources
– IMUX and OMUX
• all CLB inputs have access to interconnect on all 4 sides
• fast local feedback within CLB and direct connects to east and west CLBs: support wide functions of up to 19 inputs within a single CLB
[Figure: CLB tile. Labels: SWITCH MATRIX; CLB containing two SLICEs with CARRY chains and Local Feedback; TRISTATE BUSSES; DIRECT CONNECT on two sides; SINGLE, HEX and LONG lines on four sides]
Source: Xilinx Inc.
wl 2020 1.27
Simplified CLB structure
[Figure: CLB containing Slice 0 and Slice 1; each slice has two LUT + Carry stages, each feeding a D Q register with CE, PRE and CLR inputs]
• two slices in each CLB
– two BUFTs associated with each CLB, accessible by all 8 CLB outputs
– carry logic runs vertically upwards, to speed up carry propagation
Source: Xilinx Inc.
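To see why dedicated carry logic helps, consider a ripple-carry adder in which each bit position uses one LUT plus one carry stage. The mapping below (LUT computes the propagate signal, a multiplexer forwards or regenerates the carry) is a common scheme shown as an assumption; it is not claimed to be the exact Virtex circuit.

```python
from typing import List, Tuple


def carry_chain_add(a: List[int], b: List[int],
                    carry_in: int = 0) -> Tuple[List[int], int]:
    """Add two little-endian bit vectors, one LUT plus one carry stage per bit."""
    if len(a) != len(b):
        raise ValueError("operands must have the same width")
    sums = []
    carry = carry_in
    for a_bit, b_bit in zip(a, b):
        propagate = a_bit ^ b_bit                # computed in the LUT
        sums.append(propagate ^ carry)           # sum bit
        carry = carry if propagate else a_bit    # carry multiplexer
    return sums, carry


def to_bits(value: int, width: int) -> List[int]:
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


sums, carry_out = carry_chain_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), carry_out)   # 11 + 6 = 17 = 1 with carry out 1 in 4 bits
```

The carry passes through only the multiplexer at each bit, so in hardware it can bypass the general routing between logic blocks.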
wl 2020 1.28
Look-Up Tables
[Figure: Combinatorial Logic with inputs A, B, C, D and output Z]
A B C D Z
0 0 0 0 0
0 0 0 1 0
0 0 1 0 0
0 0 1 1 1
0 1 0 0 1
0 1 0 1 1
. . .
1 1 0 0 0
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1
• combinatorial logic is stored in Look-Up Tables (LUTs) in a CLB
• capacity is limited by number of inputs, not complexity
• delay through CLB is constant
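The point that capacity depends on the number of inputs, not on complexity, can be checked by tabulating two 4-input functions. The function `candidate_z` is an assumption: it agrees with the table rows shown, but the elided rows and the original gate diagram are not recoverable, so the actual function may differ.

```python
from itertools import product
from typing import Callable, List

LogicFn = Callable[[int, int, int, int], int]


def lut_contents(fn: LogicFn) -> List[int]:
    """Tabulate a 4-input function: one stored bit per input combination."""
    return [fn(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]


def simple(a: int, b: int, c: int, d: int) -> int:
    return a & b & c & d


def candidate_z(a: int, b: int, c: int, d: int) -> int:
    # Assumed function: consistent with the table rows shown on the slide.
    return (a ^ b) | (c & d)


# Rows visible on the slide, as (A, B, C, D) -> Z.
shown = {
    (0, 0, 0, 0): 0, (0, 0, 0, 1): 0, (0, 0, 1, 0): 0, (0, 0, 1, 1): 1,
    (0, 1, 0, 0): 1, (0, 1, 0, 1): 1, (1, 1, 0, 0): 0, (1, 1, 0, 1): 0,
    (1, 1, 1, 0): 0, (1, 1, 1, 1): 1,
}
assert all(candidate_z(*row) == z for row, z in shown.items())

# Both functions occupy the same 16 bits, whatever their gate-level complexity.
print(len(lut_contents(simple)), len(lut_contents(candidate_z)))
```

Since evaluation is a single table read in either case, the lookup takes the same time for both functions, in line with the constant-delay bullet.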
wl 2020 1.29
Stratix IV GX 230: mid-size device
[Figure labels: Adaptive Logic Modules (fine grain); RAM Blocks (M9K & M144K); DSP Blocks (coarse grain); High Speed Serial Interfaces, e.g. connect multiple FPGAs]
(source: V. Betz)
wl 2020 1.30
Stratix IV Overview
| Feature | Stratix III (65 nm) | Stratix IV (40 nm) |
|---|---|---|
| Logic Elements | 340k | 680k |
| RAM bits | 16 Mb + 4 Mb | 33 Mb + 8.5 Mb |
| 18x18 multipliers | 768 | 1360 |
| General I/O | 1104 | 1104 |
| High-speed serial links | 0 | 48 transmit + 48 receive @ 11.3 Gb/s |
| Hard PCIe blocks | 0 | 4 |
| Clock generation | 12 PLL(x10) | 12 PLL(x10) + 32 serial recovered + 24 serial transmit |
| Clock distribution | 16 Global + 88 Quadrant + 132 PCLK | 16 Global + 88 Quadrant + 132 PCLK |
(from V. Betz)
wl 2020 1.31
Current and future: System-on-Chip
[Figure: system-on-chip with Embedded Processor, On-Chip Memory, two Fixed IP Blocks and Reconfigurable Logic, enclosed by I/O Ring and Interface Circuitry]
Fixed Intellectual Property Block
- functionality fixed at design time
- little post-fab flexibility
Processor, e.g. ARM
- functionality specified using software
Programmable Logic
- circuit can be specified / modified after fabrication, possibly at run time
- maybe slower than fixed IP block
Source: S. Wilton
wl 2020 1.32
Summary
• custom computing: theory and practice of customisation
– from data centres/cloud computing to mobile appliances
• customisable off-the-shelf implementation technology
– e.g. FPGAs, coarse-grained/hybrid processors, custom instructions
• factors favouring field-programmability
– rise in FPGA capability: many exciting applications
– rise in integrated circuit fabrication cost: zero for FPGA users!
– customisation: facilitate product evolution and prototyping
• custom computing tools + applications at Imperial College
– financial analysis/trading, multimedia processing, medical imaging
– network firewall, data compression/encryption, mobile robots
– bio-informatics, machine learning, bio-inspired/self-aware systems
see: http://cc.doc.ic.ac.uk