Monolithic Integration of Energy-efficient CMOS Silicon ...
![Page 1: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/1.jpg)
Integrated Systems Group
Massachusetts Institute of Technology
Monolithic Integration of Energy-efficient CMOS Silicon Photonic Interconnects
Vladimir Stojanović
![Page 2: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/2.jpg)
Manycore SOC roadmap fuels bandwidth demand
64-tile system (64-256 cores)
- 4-way SIMD FMACs @ 2.5 – 5 GHz
- 5-10 TFlops on one chip
- Need 5-10 TB/s of off-chip I/O
- Even higher on-chip bandwidth
[Figure: 2 cm x 2 cm die; labels: Intel 48 core, Xeon]
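The 5-10 TFlops figure can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes the upper core count (256), counts one fused multiply-add as 2 flops, and borrows the 1 Byte/Flop ratio that appears on a later slide; these assumptions are not spelled out on this slide, and the function names are illustrative only.

```python
FLOPS_PER_FMAC = 2  # assumption: one fused multiply-add counts as 2 flops


def peak_tflops(cores, simd_width, clock_ghz):
    """Peak throughput in TFlops for `cores` cores with `simd_width`-way FMAC units."""
    gflops = cores * simd_width * FLOPS_PER_FMAC * clock_ghz
    return gflops / 1000.0


def offchip_tbytes_per_s(tflops, bytes_per_flop=1.0):
    """Off-chip bandwidth in TB/s for a given bytes-per-flop ratio (assumed 1 Byte/Flop)."""
    return tflops * bytes_per_flop


low = peak_tflops(cores=256, simd_width=4, clock_ghz=2.5)
high = peak_tflops(cores=256, simd_width=4, clock_ghz=5.0)
print(f"peak compute: {low:.2f} to {high:.2f} TFlops")
print(f"off-chip I/O: {offchip_tbytes_per_s(low):.2f} to {offchip_tbytes_per_s(high):.2f} TB/s")
```

Under these assumptions the range comes out at roughly 5 to 10 TFlops, and the same numbers in TB/s for off-chip I/O.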
![Page 3: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/3.jpg)
System Bottlenecks
[Figure: manycore system; CPU cores connect through interconnect networks to Cache/MC blocks and DRAM DIMMs]
Bottlenecks due to energy and bandwidth density limitations
![Page 4: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/4.jpg)
Wire and I/O scaling
Increased wire resistivity makes wire caps scale very slowly
Can’t get both energy-efficiency and high-data rate in I/O
[Figure: on-chip wires; copper resistivity]
[Figure: I/O; energy-cost [pJ/b] (0 to 18) vs. data-rate [Gb/s] (0 to 25) for best electrical links; series: Chip2Chip, Backplane; annotations: Loss ~10dB, Loss ~20-25dB]
![Page 5: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/5.jpg)
Bandwidth, pin count and power scaling
Need 16k pins in 2017 for HPC*
[Figure: package pin count; annotations: 1 Byte/Flop, 256 cores, 2 TFlop/s, signal pins @ 20 Gb/s/link, 2,4 cores]
*> half pins for power supply
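The slide does not show how the 16k pin figure is derived. The sketch below gives one set of assumptions that lands on the same order of magnitude: 10 TB/s of off-chip I/O (the upper end of the roadmap slide), 20 Gb/s per link, differential signaling with 2 pins per link, and exactly half of all package pins used for power (the slide says more than half, so this is a lower bound). The differential-signaling assumption and the function names are not from the slides.

```python
import math


def signal_links(bandwidth_tbytes_per_s, link_gbps):
    """Number of links needed to carry the given off-chip bandwidth."""
    bits_per_s = bandwidth_tbytes_per_s * 8e12
    return math.ceil(bits_per_s / (link_gbps * 1e9))


def package_pins(links, pins_per_link=2, power_pin_fraction=0.5):
    """Total package pins, given the fraction of pins reserved for power supply.

    pins_per_link=2 assumes differential signaling (an assumption, not from the slide).
    """
    signal_pins = links * pins_per_link
    return math.ceil(signal_pins / (1.0 - power_pin_fraction))


links = signal_links(bandwidth_tbytes_per_s=10.0, link_gbps=20.0)
print(links, "links ->", package_pins(links), "package pins")
```

With a smaller bandwidth target, such as 2 TB/s (2 TFlop/s at 1 Byte/Flop), the same function gives a few thousand pins, so the result is sensitive to which point on the roadmap is assumed.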
![Page 6: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/6.jpg)
Monolithic CMOS-Photonics in Computer Systems
[Figure: target systems, from supercomputers to embedded apps]
Si-photonics in advanced CMOS and DRAM process
NO costly process changes
![Page 7: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/7.jpg)
Many architectural studies show promise
[Shacham’07]
[Petracca’08]
[Vantrease’08]
[Psota’07]
[Kirman’06]
[Joshi’09]
[Pan’09]
[Batten’08] [Kurian’10] [Koka’08-10]
![Page 8: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/8.jpg)
Optimization requires full system insight
Developed cross-layer modeling framework [Kurian, Chen 2011]
- Cache & Core: Energy & Area
- DSENT: Electrical and optical link and network models
![Page 9: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/9.jpg)
Start at the link level:
Jointly optimize circuits and photonic devices
[Figure: link block diagram. Transmit side: register, mux, pre-driver, modulator driver. Receive side: receiver front-end, samplers & monitoring, phase adjust, demux, register. Clocking: PLL or optical clock.]
Dense WDM – 128 wavelengths/waveguide, >1 Tb/s per waveguide
Need 1000's of transceivers on die with <100 fJ/bit cost at >10 Gb/s!
- Optimized modulator circuits/devices
- Optimized receiver circuits/photo-detector
- Optimized thermal tuning
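The dense-WDM targets can be checked with a few lines of arithmetic. This sketch takes 128 wavelengths per waveguide from the slide, treats 10 Gb/s per wavelength as a floor and 100 fJ/bit as a ceiling, and assumes an off-chip target of 10 TB/s (80 Tb/s) taken from the roadmap slide; that target and the helper names are assumptions for illustration.

```python
import math

WAVELENGTHS_PER_WAVEGUIDE = 128   # from the slide
RATE_PER_WAVELENGTH_GBPS = 10.0   # slide says "> 10Gb/s"; used here as a floor
ENERGY_PER_BIT_FJ = 100.0         # slide says "< 100fJ/bit"; used here as a ceiling
TARGET_IO_TBPS = 80.0             # assumption: 10 TB/s of off-chip I/O


def waveguide_throughput_tbps(wavelengths, rate_gbps):
    """Aggregate throughput of one waveguide in Tb/s."""
    return wavelengths * rate_gbps / 1000.0


def link_power_w(throughput_tbps, energy_fj_per_bit):
    """Transceiver power in watts at the given throughput and energy per bit."""
    return throughput_tbps * 1e12 * energy_fj_per_bit * 1e-15


per_waveguide = waveguide_throughput_tbps(WAVELENGTHS_PER_WAVEGUIDE, RATE_PER_WAVELENGTH_GBPS)
waveguides = math.ceil(TARGET_IO_TBPS / per_waveguide)
channels = math.ceil(TARGET_IO_TBPS * 1000.0 / RATE_PER_WAVELENGTH_GBPS)
power = link_power_w(TARGET_IO_TBPS, ENERGY_PER_BIT_FJ)
print(f"{per_waveguide:.2f} Tb/s per waveguide, {waveguides} waveguides, "
      f"{channels} wavelength channels, {power:.1f} W")
```

Under these assumptions the channel count is in the thousands, which is consistent with the slide's call for 1000's of transceivers on die.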
![Page 10: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/10.jpg)
Laser energy increases with data-rate
Limited Rx sensitivity
Modulation more expensive -> extinction ratio / insertion loss trade-off
Tuning costs decrease with data-rate
Moderate data rates most energy-efficient
[Figure: link block diagram (register, mux, pre-driver, modulator driver, receiver front-end, samplers & monitoring, phase adjust, demux, PLL or optical clock)]
512 Gb/s aggregate throughput, assuming 32nm CMOS
Georgas CICC 2011
Need to optimize carefully
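The trade-off on this slide (laser and modulation energy per bit grow with data rate, while tuning energy per bit shrinks) can be illustrated with a toy model. Every coefficient below is a hypothetical placeholder chosen only to show the shape of the curve; none of them comes from the slides or from Georgas CICC 2011.

```python
# All coefficients are hypothetical placeholders, not measured values.
TUNING_POWER_UW = 100.0     # fixed thermal-tuning power per ring [uW]
LASER_FJ_PER_GBPS = 2.0     # laser energy per bit grows with data rate
MOD_FIXED_FJ = 10.0         # rate-independent modulator/driver energy per bit
MOD_FJ_PER_GBPS = 0.5       # modulation gets more expensive with data rate
AGGREGATE_GBPS = 512.0      # aggregate throughput quoted on the slide


def energy_per_bit_fj(rate_gbps):
    """Total energy per bit [fJ] at a given per-wavelength data rate [Gb/s]."""
    tuning = TUNING_POWER_UW / rate_gbps    # uW / (Gb/s) = fJ/bit
    laser = LASER_FJ_PER_GBPS * rate_gbps
    modulation = MOD_FIXED_FJ + MOD_FJ_PER_GBPS * rate_gbps
    return tuning + laser + modulation


rates = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
best_rate = min(rates, key=energy_per_bit_fj)
wavelengths = AGGREGATE_GBPS / best_rate
for r in rates:
    print(f"{r:5.1f} Gb/s -> {energy_per_bit_fj(r):7.2f} fJ/bit")
print(f"lowest energy at {best_rate} Gb/s using {wavelengths:.0f} wavelengths")
```

With these placeholder numbers the minimum falls at an intermediate rate rather than at either end of the sweep, which is the qualitative point of the slide; the actual optimum depends on device and circuit parameters.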
![Page 11: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/11.jpg)
DWDM link efficiency optimization
Optimize for min energy-cost
Bandwidth density dominated by circuit and photonics area (not coupler pitch)
- >10x better than electrical bump-pitch limit (<1 Tb/s/mm²)
- 200x better than electrical package pin limit (0.05 Tb/s/mm²)
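The two ratios quoted on this slide can be cross-checked against each other. The sketch below treats the stated multipliers and limits as exact, which is an approximation since the slide gives them as bounds; the variable names are illustrative.

```python
BUMP_LIMIT_TBPS_MM2 = 1.0    # electrical bump-pitch limit (slide: "<1Tb/s/mm2")
PIN_LIMIT_TBPS_MM2 = 0.05    # electrical package pin limit
GAIN_OVER_BUMPS = 10.0       # slide: ">10x", treated here as exactly 10x
GAIN_OVER_PINS = 200.0       # slide: "200x"

# Photonic bandwidth density implied by each comparison, in Tb/s/mm^2.
implied_from_bumps = GAIN_OVER_BUMPS * BUMP_LIMIT_TBPS_MM2
implied_from_pins = GAIN_OVER_PINS * PIN_LIMIT_TBPS_MM2
print(implied_from_bumps, implied_from_pins)
```

Both comparisons point to a photonic bandwidth density on the order of 10 Tb/s/mm², so the two ratios are mutually consistent.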
![Page 12: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/12.jpg)
Photonic DRAM Network Organization
Important Concepts
- Power/message switching (only to active DRAM chip in DRAM cube/super DIMM)
- Vertical die-to-die coupling (minimizes cabling - 8 dies per DRAM cube)
- Command distributed electrically (broadcast)
- Data photonic (single writer, multiple readers)
[Figure: CPU processor die with memory scheduler and memory controllers MC 1 ... MC K ... MC 16; each controller connects to a super DIMM of DRAM cubes 1 to 4, with cmd, Dwr and Drd paths to dies 1 to 8 of each cube; legend: die-die switch, laser in, modulator bank, receiver/PD bank, tunable filterbank, through silicon via, through silicon via hole]
Beamer ISCA 2010
![Page 13: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/13.jpg)
Optimizing DRAM with photonics
[Figure: floorplan; labels: P1, P4]
Beamer ISCA 2010
![Page 14: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/14.jpg)
Laser Power Guiding Effectiveness
Beamer ISCA 2010
Enables capacity scaling per channel and significant savings in laser energy
![Page 15: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/15.jpg)
ATAC – On-Chip network Example
1000 core die
64 clusters connected via optical broadcast
![Page 16: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/16.jpg)
Average Energy over Splash2 benchmarks
Ring tuning very expensive
Non-gated laser very expensive
![Page 17: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/17.jpg)
Including the cores gives the full picture
Energy dominated by cores/caches
Faster network saves overall energy (leakage and clock)
Need aggressive clock-gating and supply/retention scaling
![Page 18: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/18.jpg)
Execution time also matters
![Page 19: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/19.jpg)
Feedback to device designers
Waveguide losses up to 2 dB/cm are OK
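To see what a 2 dB/cm waveguide loss means for laser power, the loss in dB can be converted to a linear power ratio. The 2 cm path length below is an assumption, taken from the 2 cm die edge shown on the roadmap slide; real routes may be shorter or longer, and the function names are illustrative.

```python
def waveguide_loss_db(loss_db_per_cm, length_cm):
    """Total propagation loss in dB along a waveguide of the given length."""
    return loss_db_per_cm * length_cm


def db_to_power_ratio(loss_db):
    """Factor by which input optical power must increase to offset `loss_db`."""
    return 10 ** (loss_db / 10.0)


loss = waveguide_loss_db(loss_db_per_cm=2.0, length_cm=2.0)  # assumed 2 cm path
print(f"{loss:.1f} dB -> {db_to_power_ratio(loss):.2f}x laser power")
```

Under this assumption, crossing the die costs about 4 dB, or roughly 2.5 times more laser power than a lossless waveguide.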
![Page 20: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/20.jpg)
Conclusions
Biggest gains if photonics both on-chip and off-chip
- Core-to-MC network
- MC-to-DRAM bank network – immediate 10x gains
Need comprehensive modeling framework to see the full picture
- Link-level – tight interaction of circuits and photonics through good models
- System-level – include all system components: cores, network, caches, memory
![Page 21: Monolithic Integration of Energy-efficient CMOS Silicon ...](https://reader031.fdocuments.us/reader031/viewer/2022012519/619457607544985358112e72/html5/thumbnails/21.jpg)
Acknowledgments
Krste Asanović, Rajeev Ram, Miloš Popović, Christopher Batten, Ajay Joshi
Anant Agarwal, Li-Shiuan Peh, Lionel Kimerling, Jurgen Michel, Dimitri Antoniadis
Jason Miller, Jeff Shainline
Jason Orcutt, Chen Sun, Ben Moss, Jonathan Leu, Michael Georgas, Stevan Urosević, Owen Chen, George Kurian, Yong-Jin Kwon, Scott Beamer
Dr. Jag Shah and Dr. Charles Holland, DARPA
FCRP IFC, NSF
Trusted Foundry, Intel Corporation, APIC