Reconfigurable Computing
Uploaded by: salvatore-girone
![Page 1: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/1.jpg)
Reconfigurable Computing
![Page 2: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/2.jpg)
Agenda: Reconfigurable Computing
• Needs and Architectures
• Programming Reconfigurable Machines
• Examples of application
![Page 3: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/3.jpg)
Evolution in practice
1946 - ENIAC
• 500 FLOPS
• 19000 vacuum tubes, 1500 relays, 100000 resistors, capacitors, crystal diodes and inductors
• 63 m², 175 kW
• 0.0029 FLOPS/Watt
Source: http://msrb.wordpress.com/2007/12/04/energy-dinosaurs/
2011 – NVIDIA Tegra 3
• 12 GFLOPS
• TSMC 40 nm, 4 Cortex-A9 cores, GPUs, …
• 80 mm², a few watts
• a few GFLOPS/Watt
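The efficiency figures on this slide follow from dividing throughput by power. A minimal sketch of that arithmetic is given here; the 3 W value for the Tegra 3 is an assumed placeholder, since the slide only says "a few watts".

```python
def flops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency as throughput divided by power draw."""
    return flops / watts


# ENIAC figures taken from the slide: 500 FLOPS at 175 kW.
eniac_efficiency = flops_per_watt(500, 175_000)

# ASSUMPTION: 3 W stands in for "a few watts"; it is not a measured value.
ASSUMED_TEGRA3_WATTS = 3.0
tegra3_efficiency = flops_per_watt(12e9, ASSUMED_TEGRA3_WATTS)

print(f"ENIAC:   {eniac_efficiency:.4f} FLOPS/Watt")
print(f"Tegra 3: {tegra3_efficiency / 1e9:.1f} GFLOPS/Watt (assumed power)")
```

With any power value in the range of a few watts, the gap between the two machines stays at roughly twelve orders of magnitude.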
![Page 4: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/4.jpg)
iPhone Teardown
![Page 5: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/5.jpg)
Apple A4
• System-in-Package SoC designed by Apple, manufactured by Samsung in 45 nm
• Single application processor with:
  • ARM single-core Cortex-A8 CPU
  • PowerVR SGX535 GPU (Imagination Technologies)
  • Two stacked 256Mb DDR RAM chips
• Used in: iPad tablet (1 GHz), iPhone 4, iPod Touch (4th generation)
![Page 6: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/6.jpg)
Samsung Exynos 3110 / Hummingbird
• Multichip, System-in-Package SoC designed and manufactured by Samsung in 45 nm
• Single application processor with:
  • ARM Cortex-A8 CPU (same layout as A4)
  • PowerVR SGX540 GPU
  • DDR memory on-package
• Used in Samsung Galaxy S, Google Nexus S, Samsung Galaxy Tab, Enspert Identity, Meizu M9, other tablets, …
![Page 7: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/7.jpg)
Nvidia Tegra
• Multichip, System-in-Package SoC designed by Nvidia, manufactured by TSMC in 40 nm
• One single application processor containing:
  • Two ARM Dual-Core Cortex-A9 CPUs
  • PortalPlayer audio decoder
  • Nvidia proprietary graphics processors (GPU)
  • DDR memory in-package
• Used in Zune/Sony/Samsung MP3 players, LG Optimus, Motorola Atrix, Samsung Galaxy smartphones, Lenovo and other tablets, …
![Page 8: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/8.jpg)
Electronic Devices Market Description
SoC design is a continuous trade-off between semiconductor market demands/needs and architectural/circuit solutions
• Low Performance (MIPS): device too slow, customer unhappy
• Power Dissipation (mW): battery degradation too fast, customer unhappy
• Soft Errors due to heat, power peaks, electromagnetic fields: unexpected behavior, customer unhappy
• Evolution of wireless services offered by handheld devices
  • Phone, Wi-Fi, GPS, Sound, Video, PDA
• Evolution of communication and multimedia storage formats
  • GSM / GPRS / UMTS-HSDPA / LTE
  • 3G, Wi-Fi / WiMAX
  • (All audio-video technologies)
![Page 9: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/9.jpg)
Trade-Off: programmable processor vs ASIC
• Spatial Computation (ASIC)
• Temporal Computation (Processor)
Example: Ax² + Bx + C = (Ax + B)x + C
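The two styles can be contrasted on the polynomial shown on this slide. The sketch below is only an illustration: the temporal version reuses one ALU for a sequence of steps, while the spatial version instantiates one operator per operation and lets independent operators work in parallel. The operator and step counts describe this particular toy mapping, not a real processor or ASIC.

```python
def temporal_eval(a, b, c, x):
    """One shared ALU, one operation per time step, using (Ax + B)x + C."""
    steps = 0
    t = a * x
    steps += 1
    t = t + b
    steps += 1
    t = t * x
    steps += 1
    t = t + c
    steps += 1
    return t, steps


def spatial_eval(a, b, c, x):
    """Dedicated operators wired as a dataflow graph for Ax^2 + Bx + C."""
    # Level 1: two multipliers work in parallel.
    x_squared = x * x
    b_x = b * x
    # Level 2: one multiplier and one adder work in parallel.
    a_x_squared = a * x_squared
    b_x_plus_c = b_x + c
    # Level 3: final adder.
    y = a_x_squared + b_x_plus_c
    operators = 5   # 3 multipliers + 2 adders laid out in hardware
    depth = 3       # operator levels on the critical path
    return y, operators, depth


print(temporal_eval(2, 3, 4, 5))  # one ALU, four sequential steps
print(spatial_eval(2, 3, 4, 5))   # five operators, three levels deep
```

The temporal form spends time (four steps on one unit); the spatial form spends area (five units) to shorten the critical path.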
![Page 10: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/10.jpg)
Programmable Processors: Drawbacks
[Figure: trends over the years for algorithm requirements in GOPS (Shannon’s Law), available computational power in GOPS and GOPS/mW (Moore’s Law), and battery capacity]
• The “gap” between available computational power and the computational requirements of recent algorithms/protocols keeps increasing
![Page 11: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/11.jpg)
ASIC: Drawbacks
• Exponential increase of technology costs
  • (Design)
  • Masks
  • Verification
  • Test
• Reliability and yield of products rapidly decrease
![Page 12: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/12.jpg)
Application Specific Standard Processor Template
[Block diagram: µP (ARM/PowerPC), IO, ASIC cores and Memory, connected through an Interconnect]
• Standard template for an ASSP (Application-Specific Standard Processor)
![Page 13: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/13.jpg)
Programmability on System-on-Chip
• Programmable hardware allows SoC volumes to increase, reducing the per-unit impact of NRE costs
• Programmable hardware increases product lifetime
• Programmable hardware has a negative impact on area and power, thus reducing product margins
[Figure: margin ($) versus time up to the end of product lifetime, comparing an ASIC-oriented SoC with a programmable-HW-oriented SoC]
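The volume/NRE argument can be made concrete with a simple amortization formula: cost per unit = NRE / volume + silicon cost. The sketch below uses purely hypothetical figures (they do not come from the slides) to show how a programmable part can cost more in silicon yet less per unit once its NRE is spread over a larger volume.

```python
def cost_per_unit(nre: float, volume: int, silicon_cost: float) -> float:
    """Amortized cost of one device: share of NRE plus per-die silicon cost."""
    return nre / volume + silicon_cost


# HYPOTHETICAL numbers, chosen only to illustrate the trade-off.
asic_unit_cost = cost_per_unit(nre=10_000_000, volume=500_000, silicon_cost=4.0)
# Programmable hardware: larger die (higher silicon cost), larger volume.
programmable_unit_cost = cost_per_unit(nre=10_000_000, volume=2_000_000, silicon_cost=6.0)

print(f"ASIC-oriented SoC:         ${asic_unit_cost:.2f} per unit")
print(f"Programmable-oriented SoC: ${programmable_unit_cost:.2f} per unit")
```

If the volumes were equal, the area overhead alone would make the programmable part the more expensive one, which is the margin penalty the slide points out.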
![Page 14: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/14.jpg)
Application Specific Standard Processor Template (2)
[Block diagram: µP (ARM/PowerPC), DSPs, IO, ASIC cores and Memory, connected through an Interconnect]
• Adding specialized DSPs reduces the “computational pressure” on the microprocessor and the “full-ASIC” percentage of the system, thus leaving room for hardware reconfigurability
![Page 15: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/15.jpg)
Reconfigurable Processors
• A Reconfigurable Processor is a special DSP architecture, programmable at execution time and based on hardware reconfiguration:
  • Fine-Grain: architectures, such as FPGAs, based on ~1/4-bit parallelism and typically featuring LUTs
  • Coarse-Grain: parallel architectures based on ~8/32-bit hardwired blocks (ALU, Mult, MAC, …)
  • Processor Arrays: architectures obtained by interconnecting a set of simple/small processors featuring 8/16/32-bit datapaths
![Page 16: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/16.jpg)
ASIC-oriented solutions: Configurable Processors
• E.g. Xtensa by Tensilica: a “configurable” microprocessor based on instruction set extension
[Block diagram: Instruction Decode, Register file, Conventional Dpath, Application Specific Dpath]
• Problem 1: building “re-targetable” compilers
• Problem 2: how to define the ideal acceleration (granularity of the acceleration)
• Problem 3: how to bring enough data to the datapath to exploit its computing capacity
![Page 17: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/17.jpg)
Application Specific Standard Processor Template (3)
• “Processorization” of the SoC: in order to widen the market segment of a given product, and to ease bug fixes by moving them from hardware to software, more and more tasks are migrated from ASIC blocks to processors/DSPs or processor-oriented architectures
[Block diagram: µP (ARM/PowerPC), DSPs, ASIC cores, processor pipeline, ASIC cores / reconfigurable cores, Memory and IO, connected through an Interconnect]
![Page 18: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/18.jpg)
FPGAs
FPGAs (Field Programmable Gate Arrays):
• Enable “spatial” computation while retaining run-time programmability
• Incur an overhead of ~100x in area and power, and ~10x in timing
• Require HDL design flows, which are unfamiliar to application developers (C/C++/Java/Matlab)
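How a fine-grain fabric stays programmable can be seen on its basic element, the LUT: a 4-input LUT is a 16-bit truth table, and "programming" it means writing those 16 bits. The sketch below is a software model only; the class and function names are hypothetical and do not correspond to any vendor tool.

```python
class Lut4:
    """Software model of a 4-input lookup table (16 configuration bits)."""

    def __init__(self, truth_table: int):
        self.truth_table = truth_table & 0xFFFF

    def eval(self, a: int, b: int, c: int, d: int) -> int:
        # The four inputs form the address of one configuration bit.
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.truth_table >> index) & 1


def program_lut(func) -> Lut4:
    """Build the configuration bits of a LUT from any 4-input Boolean function."""
    table = 0
    for index in range(16):
        bits = [(index >> k) & 1 for k in range(4)]
        if func(*bits):
            table |= 1 << index
    return Lut4(table)


and4 = program_lut(lambda a, b, c, d: a & b & c & d)
parity4 = program_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(and4.truth_table), hex(parity4.truth_table))
```

The same hardware cell implements either function; only the stored bits change. The configuration storage and the routing around each cell are a large part of the area overhead quoted on this slide.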
![Page 19: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/19.jpg)
FPGA-oriented solutions: Re-Configurable Processors
• E.g. a “configurable” microprocessor based on instruction set extension
[Block diagram: Instruction Decode, Register file, Conventional Dpath, eFPGA]
• Problem 1: building “re-targetable” compilers
• Problem 2: how to define the ideal acceleration (granularity of the acceleration)
• Problem 3: how to bring enough data to the datapath to exploit its computing capacity
• Problem 4: area overhead of the eFPGA compared to an ASIC
![Page 20: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/20.jpg)
Run-time programmable processors: DREAM
• RISC µP to handle control, interrupts and configuration, and to regulate the data/instruction flow
• eFPGA to implement customizable functional units
• Parallel-access memory with programmable “DMA” to offer maximum parallelism (MIMD)
[Block diagram: STxp70 µP, Control Interface, PiCoGA, Address Generators, Interconnect Cross-Bar, High-Bandwidth Memory Bank Registers]
![Page 21: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/21.jpg)
DREAM Computational Model
…
Set_conf
…
Set_io, Set_df
…
for(i=0;i<N;i++)
    Execute ID
…
Unset_conf
[Diagram: Control RISC Processor, IO Banks, PiCoGA]
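The sequence on this slide (load a configuration, bind the I/O, execute in a loop, release the configuration) can be mimicked with a small software model. Everything below is hypothetical: the class, the method signatures and the use of a Python callable as a "configuration" are illustrative assumptions, not the DREAM or PiCoGA programming interface, and Set_df is left out because the slide does not describe it.

```python
class ReconfigurableUnitModel:
    """Toy stand-in for a run-time configurable functional unit."""

    def __init__(self):
        self.configurations = {}   # operation id -> Python callable
        self.source = []
        self.sink = []

    def set_conf(self, op_id, operation):
        """Load a configuration (here simply a callable) under an id."""
        self.configurations[op_id] = operation

    def set_io(self, source, sink):
        """Bind the input and output buffers used by later executions."""
        self.source, self.sink = source, sink

    def execute(self, op_id, index):
        """Run a loaded operation on one input element."""
        if op_id not in self.configurations:
            raise RuntimeError(f"operation {op_id} is not configured")
        self.sink[index] = self.configurations[op_id](self.source[index])

    def unset_conf(self, op_id):
        """Release the configuration so the fabric can host another one."""
        del self.configurations[op_id]


def run_kernel(data):
    """Control-processor side: configure once, execute N times, release."""
    unit = ReconfigurableUnitModel()
    results = [0] * len(data)
    unit.set_conf(op_id=0, operation=lambda x: (x * x) & 0xFFFF)
    unit.set_io(source=data, sink=results)
    for i in range(len(data)):
        unit.execute(op_id=0, index=i)
    unit.unset_conf(op_id=0)
    return results


print(run_kernel([1, 2, 3]))
```

The point of the model is the cost structure: configuration happens once outside the loop, so its overhead is amortized over the N executions.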
![Page 22: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/22.jpg)
Reconfigurability overhead of FPGAs
[Figure: FPGA fabric made of logic blocks (L) and routing switches]
• Yellow: area actually used for computation (LUT)
• Green/Red: routing and configuration
![Page 23: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/23.jpg)
CGRA: Coarse Grained Reconfigurable Architecture
[Figure: 4×4 array of PEs connected by a Reconfigurable Interconnect Fabric; each PE contains an ALU]
• LUTs are replaced by more complex arithmetic operators
• More efficient use of interconnect resources
• Lower computational flexibility (1-bit operations occupy 8/32-bit ALUs)
![Page 24: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/24.jpg)
MorphoSys RC Array
• SIMD Model: all cells in a row execute the same 128-bit operation.
• MIMD Model: each cell executes an independent operation
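The difference between the two models comes down to how many distinct operations are issued per row. The sketch below models a small cell array in software; the 2×2 size, the function names and the shared operand are assumptions made for illustration and say nothing about the actual MorphoSys array dimensions or instruction format.

```python
import operator


def run_simd_rows(cells, row_ops, operand):
    """SIMD: one operation per row, applied to every cell of that row."""
    return [[op(value, operand) for value in row]
            for row, op in zip(cells, row_ops)]


def run_mimd(cells, cell_ops, operand):
    """MIMD: every cell receives its own operation."""
    return [[op(value, operand) for value, op in zip(row, ops)]
            for row, ops in zip(cells, cell_ops)]


cells = [[1, 2], [3, 4]]

# Two operations configure the whole array in the SIMD case.
simd_result = run_simd_rows(cells, [operator.add, operator.mul], 10)

# Four operations are needed in the MIMD case, one per cell.
mimd_result = run_mimd(
    cells,
    [[operator.add, operator.sub], [operator.mul, operator.floordiv]],
    2,
)

print(simd_result)
print(mimd_result)
```

SIMD needs less configuration data per cycle; MIMD trades that for the freedom to map irregular computations.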
![Page 25: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/25.jpg)
Coarse-Grain Reconfigurable Processor: PACT XPP
• XPP is an array of coarse-grain elements (PAEs) with a packet-based interconnect network.
• The computation is distributed over the elements, each of which computes as soon as its required data are available
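The "compute when data are available" rule is a dataflow firing rule. A minimal sketch follows, assuming a simplified model in which every element has one queue per input and fires when all of its queues hold a packet; the `Pae` class and `run` scheduler are hypothetical and do not reproduce the XPP hardware or its tools.

```python
from collections import deque


class Pae:
    """Toy processing element that fires when every input has a packet."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.queues = [deque() for _ in range(n_inputs)]  # n_inputs >= 1
        self.targets = []  # list of (destination Pae, input port)

    def connect(self, target, port):
        self.targets.append((target, port))

    def ready(self):
        return all(self.queues)  # an empty deque is falsy

    def fire(self):
        args = [queue.popleft() for queue in self.queues]
        result = self.func(*args)
        for target, port in self.targets:
            target.queues[port].append(result)
        return result


def run(paes):
    """Fire ready elements until no element can make progress."""
    log = []
    progress = True
    while progress:
        progress = False
        for pae in paes:
            if pae.ready():
                log.append((pae.name, pae.fire()))
                progress = True
    return log


# (2 * 3) + 4 as a two-element dataflow graph.
mul = Pae("mul", lambda a, b: a * b, 2)
add = Pae("add", lambda a, b: a + b, 2)
mul.connect(add, 0)
mul.queues[0].append(2)
mul.queues[1].append(3)
add.queues[1].append(4)

firing_log = run([add, mul])  # listing order does not dictate firing order
print(firing_log)
```

Even though `add` is listed first, it only fires after `mul` has produced its packet: execution order is decided by data availability, not by program order.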
![Page 26: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/26.jpg)
PACT XPP Computing Model
Mixed two-level model:
• Event-based language to describe the synchronization among PAEs
• Assembly for the individual PAEs
• A C compiler exists, but its results are still disappointing
![Page 27: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/27.jpg)
Processor Arrays
• Multi-Processor System-on-Chip
  • A set of processors interconnected by an on-chip network
  • They can be seen as a kind of CGRA in which the PE is a processor
• They allow well-known concepts to be reused [multi-thread, sockets, process scheduling], bringing parallelism back to a standard programming environment
  • OS support
  • C as the programming language
  • Support for well-known communication systems
![Page 28: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/28.jpg)
Processor Arrays: PicoChip Processor
[Block diagram: Array Processing Elements, Switch Matrix, I/O blocks, External Memory, Inter-picoArray Interface]
• 322 PEs made of small 16-bit processors with distributed memory
• Heterogeneous architecture, with 4 different types of PE
  • Standard (STAN)
  • Multiply-accumulate (MAC)
  • Memory (MEM)
  • Control (CTRL)
• Deterministic interconnection fabric based on a time-division multiplexing (TDM) model
• Used commercially for base stations
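Why a TDM fabric is deterministic can be shown with a fixed slot table: each cycle belongs to exactly one source/destination pair, so the delivery cycle of every transfer is known in advance. The sketch below is a generic TDM model under that assumption; the slot table, PE names and payloads are hypothetical and are not taken from the picoChip tools.

```python
def tdm_schedule(slot_table, pending, n_cycles):
    """Deliver queued payloads according to a cyclic, fixed slot table.

    slot_table: list of (source, destination) pairs, one per time slot.
    pending:    dict mapping (source, destination) to a list of payloads.
    Returns a list of (cycle, source, destination, payload) deliveries.
    """
    delivered = []
    for cycle in range(n_cycles):
        source, destination = slot_table[cycle % len(slot_table)]
        queue = pending.get((source, destination))
        if queue:
            delivered.append((cycle, source, destination, queue.pop(0)))
    return delivered


# HYPOTHETICAL two-slot schedule shared by two links.
slot_table = [("A", "B"), ("C", "D")]
pending = {("A", "B"): ["a0", "a1"], ("C", "D"): ["c0"]}
deliveries = tdm_schedule(slot_table, pending, n_cycles=4)
print(deliveries)
```

No arbitration takes place at run time: an unused slot is simply wasted, which is the price paid for predictable latency.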
![Page 29: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/29.jpg)
Tilera Tile Processor (RAW / Tilera)
• Processor mesh:
  • 2-D array of homogeneous cores
  • Basic block: general-purpose processor core + switch connected to the 2-D on-chip network
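In a 2-D mesh each tile's switch forwards a packet one hop at a time. The slides do not state which routing algorithm is used; the sketch below assumes dimension-ordered (XY) routing, a common choice for meshes, purely as an illustration. Coordinates and the function name are hypothetical.

```python
def xy_route(src, dst):
    """Return the list of tiles visited from src to dst, X first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
print(route)
print("hops:", len(route) - 1)
```

The hop count equals the Manhattan distance between the two tiles, so communication cost grows with distance on the mesh and task placement matters.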
![Page 30: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/30.jpg)
SoC Communication: Bus vs Networks-on-Chip
[Figure, bus: M1, M2, S1, S2, B1, S3, M1.1, S1.1, S1.2] M=Master, S=Slave, B=Bridge
Typically, for each bus, M<4 and S+B<16
[Figure, NoC: T1, T2, T3, I1, I2] T=Target, I=Initiator
![Page 31: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/31.jpg)
MPSoC Communication: Networks-on-Chip
• Communication system designed for maximum scalability
• Based on 2 components:
  • Router: routes packets between N inputs and M outputs
  • Network Interface: connects a computing island to the NoC
• Link: bus connecting the routers
[Figure: IP blocks attached through Network Interfaces (NI) to routers joined by links; protocol layers: physical, link, network, transport, application]
![Page 32: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/32.jpg)
MPSoC: Clock, Power & Workload Management
• Multi-Processor System-on-Chip: from the implementation point of view, they allow independent computing islands to be developed:
  • GALS: Globally Asynchronous Locally Synchronous design (much shorter clock trees, regular distribution of power consumption)
  • Power Management: SD, DVFS (Dynamic Voltage and Frequency Scaling: changing the voltage and frequency of a processor according to its workload)
  • Redundancy: workload distribution, failure recovery (migrating a task from a faulty or overloaded processor to a neighbor)
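DVFS pays off because dynamic power scales as P = C·V²·f, so lowering voltage together with frequency reduces power more than linearly. The sketch below picks the slowest operating point that still meets the workload; the table of voltage/frequency pairs and the capacitance value are hypothetical, not data for any real core.

```python
# HYPOTHETICAL operating points: (supply voltage in V, clock frequency in Hz),
# sorted from slowest to fastest.
OPERATING_POINTS = [(0.9, 200e6), (1.0, 400e6), (1.1, 600e6), (1.2, 800e6)]


def dynamic_power(c_eff: float, voltage: float, frequency: float) -> float:
    """Dynamic power P = C * V^2 * f (switched capacitance model)."""
    return c_eff * voltage * voltage * frequency


def select_point(required_hz: float):
    """Return the slowest operating point that satisfies the workload."""
    for voltage, frequency in OPERATING_POINTS:
        if frequency >= required_hz:
            return voltage, frequency
    return OPERATING_POINTS[-1]  # saturate at the fastest point


C_EFF = 1e-9  # assumed effective switched capacitance, in farads
light = select_point(350e6)
heavy = select_point(750e6)
print(light, dynamic_power(C_EFF, *light))
print(heavy, dynamic_power(C_EFF, *heavy))
```

In this toy table, halving the frequency from 800 MHz to 400 MHz cuts power by more than half, because the voltage drops as well.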
![Page 33: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/33.jpg)
Morpheus
• HETEROGENEOUS, NoC-based reconfigurable processor, with independent frequency domains for each core
[Block diagram: ARM9, On-Chip Memory, IO and three reconfigurable engines (PACT XPP-III, DREAM, eFPGA M2K), each with its own Data Interface and Conf. Interface, connected by a NoC carrying data and configuration]
![Page 34: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/34.jpg)
ManyAC Architecture
[Figure: array of computational tiles made of µP, MEM and ASIC blocks, with a ct controller and a cluster controller]
• Device based on REGULAR HETEROGENEITY: an array of cells with identical structure but customizable ASIC acceleration
![Page 35: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/35.jpg)
Conclusion: Reconfigurable Processors
• RPs are computing architectures that exploit configurable hardware to add application-specific instructions to the standard instruction set
  • Design-time programmable
    • Configurable Processors
  • Run-time programmable
    • RISC + FPGA
    • RISC + CGRA
    • Microprocessor Arrays
• They are needed to extend the market share of a device until design costs are covered
• They introduce hardware overhead (area, timing)
• They cause software productivity and porting problems
• For this reason, they are gaining ground mostly in the context of “regularly heterogeneous” multiprocessor systems
![Page 36: Riconfigurable Computing](https://reader033.fdocuments.us/reader033/viewer/2022051218/5695d1241a28ab9b02954f8e/html5/thumbnails/36.jpg)
Acronyms
• ASIC Application Specific Integrated Circuits
• ASSP Application Specific Standard Product
• SoC System-on-Chip
• SiP System-in-Package
• NRE Non Recurring Engineering (costs)
• TTM Time to market
• FPGA Field Programmable Gate Array
• LUT Lookup-table
• CGRA Coarse Grained Reconfigurable Architecture
• PE Processing Element (In CGRA)
• RP Reconfigurable Processor
• RISP Reconfigurable Instruction Set processor
• SIMD Single Instruction Multiple Data
• MIMD Multiple Instruction Multiple Data
• PA Processor Array
• MPSoC Multi-Processor System-on-Chip
• GALS Globally Asynchronous Locally Synchronous
• NoC Network-on-Chip
• DVFS Dynamic Voltage and Frequency Scaling