RR Osorio FPGA
Field-Programmable Gate Arrays as tracking devices
Roberto Rodríguez Osorio, Javier Díaz Bruguera
Group of Computer Architecture, Dept. of Electronics and Computer Science
University of Santiago de Compostela
2
Outline
Application-specific computing machines
ASIC vs FPGA
FPGA technology basics
Hard cores in FPGAs
Performance
Design effort
Choices
Applications
3
Application-specific computing machines
[Figure: block diagrams of a microprocessor and of an application-specific integrated circuit, each split into a datapath and a control section]
Microprocessor
• Code memory, data memory, PC, IR, control logic, register file, functional units
• Performance: 10 cycles @ 3 GHz
• Dissipated power: ~35 W
Application-Specific Integrated Circuit
• Control logic, MAC
• Performance: 1 cycle @ 1 GHz
• Dissipated power: ~mW
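The cycle counts and clock rates on this slide can be turned into time and energy per operation. The following is only a rough sketch: the function names are illustrative, and the ASIC power of 1 mW is an assumed stand-in for the slide's "~mW".

```python
def time_per_op_ns(cycles: int, clock_ghz: float) -> float:
    """Time for one operation in nanoseconds (cycles divided by GHz)."""
    return cycles / clock_ghz


def energy_per_op_nj(power_w: float, time_ns: float) -> float:
    """Energy for one operation in nanojoules (watts times nanoseconds)."""
    return power_w * time_ns


# Figures taken from the slide.
cpu_ns = time_per_op_ns(cycles=10, clock_ghz=3.0)
asic_ns = time_per_op_ns(cycles=1, clock_ghz=1.0)
cpu_nj = energy_per_op_nj(35.0, cpu_ns)

# ASSUMPTION: 1 mW is a placeholder for the slide's "~mW".
asic_nj = energy_per_op_nj(1e-3, asic_ns)

print(f"microprocessor: {cpu_ns:.2f} ns/op, {cpu_nj:.1f} nJ/op")
print(f"ASIC:           {asic_ns:.2f} ns/op, {asic_nj:.3f} nJ/op")
```

With these numbers the ASIC finishes an operation in 1 ns against roughly 3.33 ns for the microprocessor, despite its lower clock rate.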
4
ASIC vs FPGA
[Figure: NRE cost ($1M to $4M) versus technology (0.35 to 0.05 micrometers)]
5
ASIC vs FPGA
[Figure: computational efficiency (Mops/W, logarithmic scale from 10^0 to 10^6) versus technology (2 to 0.07 µm, 1986 to 2006), with the maximum efficiency (ASIC) as reference]
Also labelled in the figure: FPGA, ASSP, MPPA, GPGPU, VLIW, ASIP, ManyCore, ...
Source: Theo A.C.M. Claasen, ISSCC 99
6
FPGA technology basics – Computing
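Full adder (FA) truth table:

carry input (cin)   a   b   s   carry output (cout)
0                   0   0   0   0
0                   0   1   1   0
0                   1   0   1   0
0                   1   1   0   1
1                   0   0   1   0
1                   0   1   0   1
1                   1   0   0   1
1                   1   1   1   1

[Figure: FA block with inputs a, b, cin and outputs s, cout, next to its gate-level implementation]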
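The truth table on this slide can be reproduced with the usual Boolean equations of a full adder. This is a minimal sketch; the function name `full_adder` is illustrative and not taken from the slides.

```python
from itertools import product


def full_adder(cin: int, a: int, b: int) -> tuple[int, int]:
    """Return (s, cout) for three one-bit inputs."""
    s = a ^ b ^ cin                          # sum bit: XOR of the three inputs
    cout = (a & b) | (a & cin) | (b & cin)   # carry out: at least two inputs are 1
    return s, cout


# Enumerate the inputs in the same order as the table: cin, a, b.
rows = [(cin, a, b, *full_adder(cin, a, b))
        for cin, a, b in product((0, 1), repeat=3)]
for row in rows:
    print(*row)
```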
7
FPGA technology basics – Do not compute
Logic blocks
[Figure: two 8x1-bit SRAM memories with inputs cin, a, b and outputs s, cout]
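The idea of "not computing" can be sketched as a table lookup: the three inputs form an address into an 8x1-bit memory that already holds the answer. The sketch below assumes the address is formed as cin, a, b from most to least significant bit; the names `S_TABLE`, `COUT_TABLE` and `lut_read` are illustrative.

```python
# Contents of the two 8x1-bit memories, indexed by the address (cin, a, b).
S_TABLE = [0, 1, 1, 0, 1, 0, 0, 1]      # sum bit
COUT_TABLE = [0, 0, 0, 1, 0, 1, 1, 1]   # carry output


def lut_read(table: list[int], cin: int, a: int, b: int) -> int:
    """Read one bit from an 8x1-bit table; no logic gates are evaluated."""
    address = (cin << 2) | (a << 1) | b
    return table[address]


print(lut_read(S_TABLE, 1, 0, 1), lut_read(COUT_TABLE, 1, 0, 1))
```

Loading different contents into the same memory gives a different 3-input function, which is what makes the logic block reconfigurable.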
8
FPGA technology basics – Interconnect
[Figure: regular grid of blocks]
11
FPGA technology basics – Interconnect + memory
FPGA fabric consists of a huge number of simple memory elements connected by means of a reconfigurable network
Design software must break every computing task into 1-bit operations with no more than 4, 5 or 6 variables
Operations are spatially distributed according to proximity criteria
Routing may be troublesome
• Long paths are slow
• Routing through logic blocks increases area
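As an illustration of breaking a task into small 1-bit operations, the sketch below computes the parity of 16 bits using only lookup tables with four inputs, arranged in two levels. It is a toy model of what design software does, not the actual mapping algorithm; `make_lut4`, `read_lut4` and `parity16` are hypothetical names.

```python
def make_lut4(function) -> list[int]:
    """Precompute a 16x1-bit table for any function of four 1-bit inputs."""
    return [function((addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1)
            for addr in range(16)]


def read_lut4(table: list[int], bits) -> int:
    """Look up the output bit for four input bits."""
    a, b, c, d = bits
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# One table content, reused by every LUT in this example.
XOR4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)


def parity16(bits: list[int]) -> int:
    """Parity of 16 bits using five 4-input LUTs in two levels."""
    assert len(bits) == 16
    # Level 1: four LUTs, each reducing four input bits to one bit.
    level1 = [read_lut4(XOR4, bits[i:i + 4]) for i in range(0, 16, 4)]
    # Level 2: one LUT combining the four partial results.
    return read_lut4(XOR4, level1)


print(parity16([1, 0, 1, 1] + [0] * 12))
```

The second level depends on the first, so a signal crosses two logic blocks plus the routing between them, which is where long paths start to cost speed.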
12
Hard cores in FPGAs
Memory blocks
Multipliers
DSP blocks
Microprocessors
Floating point units?
13
Memory blocks
Hundreds or thousands of small memory blocks
Dual-port blocks
18 Kbit each for Xilinx
Flexible configurations
• Many short words or a few large words
Independent access
Huge aggregated bandwidth
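The aggregated bandwidth follows from every block being accessed independently on each of its ports. A back-of-the-envelope sketch, where the block count, port width and clock frequency are assumed example values rather than figures from the slides:

```python
def aggregate_bandwidth_gbit_s(blocks: int, ports_per_block: int,
                               bits_per_port: int, clock_mhz: float) -> float:
    """Peak bandwidth in Gbit/s when every port of every block is used each cycle."""
    bits_per_second = blocks * ports_per_block * bits_per_port * clock_mhz * 1e6
    return bits_per_second / 1e9


# ASSUMED example: 100 dual-port blocks, 36-bit ports, 200 MHz clock.
example = aggregate_bandwidth_gbit_s(blocks=100, ports_per_block=2,
                                     bits_per_port=36, clock_mhz=200.0)
print(f"{example:.0f} Gbit/s")
```

Under these assumptions the peak is 1440 Gbit/s, and it scales linearly with the number of blocks actually used.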
14
Multipliers and DSP blocks
As FPGAs became larger, some people tried to implement DSP algorithms on them
• However: multipliers take too much area
• Therefore: hardwired multipliers were introduced
DSP algorithms are often based on
• multiply & add
• multiply & accumulate
DSP blocks in modern FPGAs implement hardwired:
• multiply, multiply & add, multiply & accumulate
• optional addition before multiplying
• three-input add
• 1 large, 2 medium or 4 small operations on the same hardware
• shifting, comparisons, bit-wise operations, …
Up to 2000 DSP blocks in current FPGAs for massive parallelism
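To show why multiply & accumulate is the core operation, here is a small FIR filter written as a chain of MAC steps. It is a software sketch with illustrative names (`mac`, `fir_output`, `fir_filter`); on an FPGA each tap could be assigned to its own DSP block so that all taps work in parallel.

```python
def mac(acc: int, a: int, b: int) -> int:
    """One multiply & accumulate step: acc + a * b."""
    return acc + a * b


def fir_output(window: list[int], coeffs: list[int]) -> int:
    """One output sample; window[0] is the newest input sample."""
    acc = 0
    for x, c in zip(window, coeffs):
        acc = mac(acc, x, c)
    return acc


def fir_filter(signal: list[int], coeffs: list[int]) -> list[int]:
    """Outputs y[n] = sum over k of coeffs[k] * signal[n - k], for full windows only."""
    taps = len(coeffs)
    return [fir_output(signal[n - taps + 1:n + 1][::-1], coeffs)
            for n in range(taps - 1, len(signal))]


print(fir_filter([1, 2, 3, 4], [2, 1]))
```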
15
Microprocessors
Xilinx
• IBM PowerPC processors: Virtex-II Pro, Virtex-4 FX, Virtex-5 FX
• MicroBlaze soft processor
Altera
• ARM RISC processors
• Nios soft processor
16
Floating point units
Not implemented so far
• Suggested as a way to accelerate scientific computing
• For engineering, fixed-point arithmetic is usually enough
Will it happen?
☺ It happened with multipliers, transceivers, DSP blocks, …
GPUs already have a strong position in this field
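Fixed-point arithmetic needs only integer hardware, which is why it fits FPGAs well. A minimal sketch, assuming a format with 8 fractional bits (the choice of 8 and all function names are illustrative):

```python
FRACTION_BITS = 8  # ASSUMPTION: 8 fractional bits, chosen only for illustration


def to_fixed(x: float) -> int:
    """Convert a real value to its fixed-point integer representation."""
    return round(x * (1 << FRACTION_BITS))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << FRACTION_BITS)


def fixed_mul(qa: int, qb: int) -> int:
    """Multiply two fixed-point values.

    The raw product has twice as many fractional bits, so it is shifted
    back; the discarded low bits are truncated.
    """
    return (qa * qb) >> FRACTION_BITS


a = to_fixed(1.5)
b = to_fixed(2.25)
print(to_float(a + b), to_float(fixed_mul(a, b)))
```

Addition is a plain integer add, and multiplication is an integer multiply followed by a shift, both of which map directly onto hardwired multipliers and adders.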
17
Performance
Compared to an ASIC
• 10 times slower, larger and more power hungry
Compared to a microprocessor
• Fast, depending on:
  • Potential parallelism
  • Required bandwidth
• Small and simple, even standalone
• Reduced power consumption (< 1 W), so they may run on batteries
18
Design effort
Several scenarios:
Pure VHDL or Verilog coding
• Higher flexibility, efficiency and performance
• Long design time
• Costly debugging
Macros combined with VHDL or Verilog
• Libraries of IP blocks ease the design process
• It is not guaranteed that the required functionality can be found
High-level languages (DSP logic (Matlab), Impulse-C, Handel-C, …)
• Efficient and simple implementation for simple algorithms
• Lack of expressiveness for complex algorithms
19
Choices
Xilinx
• Virtex
• Spartan
Altera
• Stratix
• Cyclone
Others
• Actel
• Lattice Semiconductor
• …
20
Choices - Xilinx
                        Spartan 3            Spartan 6       Virtex 6
Logic cells             1728 - 74880         3840 - 147443   74496 - 566784
Block RAM (Kbits)       12 - 1872            216 - 4824      5616 - 32832
Multipliers / DSP       4 - 104 / 84 - 126   8 - 180         288 - 2016
Evaluation board cost   < $200               $300 - $1000    $2000 - $2500
21
In the context of this application
Device choice
• Logic bounded
  • Standard logic
  • Multipliers
• IO bounded
Parallel acquisition
• Switching memory blocks for acquisition and computation
High computing speed
• Via pipelining
Results storage
• Internal or external memory
Power consumption
Configuration
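Switching memory blocks between acquisition and computation is often called ping-pong (double) buffering. The sketch below models it sequentially in software; in an FPGA the computation on the full block would run at the same time as acquisition into the other block. The `process` function is a placeholder assumption, not the actual tracking computation.

```python
def process(block: list[int]) -> int:
    """Placeholder computation (ASSUMED): sum of the samples in one block."""
    return sum(block)


def ping_pong(samples: list[int], block_size: int) -> list[int]:
    """Fill two buffers alternately and process each one once it is full."""
    buffers = [[], []]
    write_index = 0          # buffer currently receiving samples
    results = []
    for sample in samples:
        buffers[write_index].append(sample)
        if len(buffers[write_index]) == block_size:
            full = write_index
            write_index = 1 - write_index    # acquisition switches buffers
            buffers[write_index] = []        # the new acquisition buffer starts empty
            # In hardware this step would overlap with the next acquisitions.
            results.append(process(buffers[full]))
    return results                           # a trailing partial block is not processed


print(ping_pong(list(range(1, 9)), block_size=4))
```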