More-than-Moore with Integrated...
Transcript of More-than-Moore with Integrated...
More-than-Moore with Integrated Silicon-Photonics
Vladimir Stojanović
Berkeley Wireless Rearch Center
UC Berkeley
1
• Milos Popović (Boulder/BU), Rajeev Ram, Jason Orcutt, Hanqing Li (MIT), Krste Asanović (UC Berkeley)
• Jeffrey Shainline, Christopher Batten, Ajay Joshi, Anatoly Khilo
• Mark Wade, Karan Mehta, Jie Sun, Josh Wang
• Chen Sun, Sen Lin, Sajjad Moazeni, Nandish Mehta, Michael Georgas, Benjamin Moss, Jonathan Leu
• Yong-Jin Kwon, Scott Beamer, Yunsup Lee, Andrew Waterman, Miquel Planas, Rimas Avizienis, Henry Cook, Huy Vo
• Roy Meade, Gurtej Sandhu and Micron Fab12 team (Zvi, Ofer, Daniel, Efi, Elad, …)
• DARPA, Micron, NSF, BWRC
• IBM Trusted Foundry, Global Foundries
Acknowledgments
Slide 2
More-than-Moore perspective
World’s first 60GHz CMOS AmplifierNiknejad & Brodersen
One of the first CMOS radiosRudell & Gray
1997
2004
2012
World’s first siPhotonic transmitter in 45nm SOIStojanovic, Popovic, Ram
Enhanced CMOS enables new applications
3
Inductors in IC processNguyen & Meyer1990
IBM/GF 12SOI (45nm) CMOS
IBM Cell IBM Power 7IBM Espresso
• 300mm wafer, commercial process• MOSIS and TAPO MPW access• Advanced process used in microprocessors• Photonic enhancement enables VLSI photonic
systems (no required process changes)
“Zero-Change” Photonics in 45nm
Monolithic photonics platform with fastest transistors
[C. Sun, JSSC 2016]
• Photonics for free! (No modification to the process)
• Closest proximity of electronics and photonics
• Single substrate removal post-processing step
• Each λ carries one bit of dataBandwidth Density achieved
through DWDMEnergy-efficiency achieved through
low-loss optical components and tight integration
Integrated photonic interconnects
Slide 6
Single channel link tradeoffs
10-dB
15-dB
Loss
5-fF 25-fFRx Cap7
512 Gb/s aggregate throughput
assuming 32nm CMOS
Georgas CICC 2011
• Laser energy increases with data-rate – Limited Rx sensitivity
– Modulation more expensive -> lower extinction ratio
• Tuning costs decrease with data-rate
• Moderate data rates most energy-efficient
Need to optimize carefully
8
DWDM link efficiency optimization
• Optimize for min energy-cost
• Bandwidth density dominated by circuit and photonics area (not coupler pitch)
Slide 9
Photonic-Interconnected DRAM (PIM)
Towards an Optical DRAM System
[ISCA 2010]
EOS22 Test ChipHigh Performance 45nm SOI
Micron D1L Test ChipPower-optimized0.22µm Bulk
70M transistors1000 optical devices
DARPA POEM
2M transistors1000s optical devices
6m
m
5m
m
Slide 10
11
World’s First Processor to Communicate with Light
Silicon-Photonic components integrated directly in the chip
C. Sun et al. Nature, Dec. 15.
DARPA POEM & PERFECT – Stojanović, Asanovićzero-change 45nm SOI
Processor Cores – 45nm SOI
• RISC-V open ISA
• Scalar-vector cores - Boot Linux
• 0.2-1.35GHz, 4-16 GFlops/W
Slide 12
0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20
15.8 12.8 10.6 8.7 7.1 5.7 4.6 3.7 3.1 2.6 2.1 1.6 200
16.7 14.0 11.6 9.6 7.9 6.4 5.3 4.3 3.6 3.0 2.4 1.9 250
14.9 12.4 10.3 8.6 7.0 5.8 4.8 4.0 3.3 2.8 2.2 300
15.6 13.0 10.9 9.1 7.5 6.2 5.2 4.3 3.7 3.0 2.4 350
13.6 11.3 9.6 7.9 6.6 5.5 4.7 3.9 3.3 2.6 400
14.1 11.8 9.9 8.3 6.9 5.8 4.9 4.2 3.5 2.8 450
12.2 10.3 8.6 7.2 6.1 5.2 4.4 3.7 3.0 500
12.5 10.5 8.8 7.5 6.3 5.4 4.5 3.8 3.1 550
10.8 9.1 7.7 6.5 5.6 4.7 4.0 3.3 600
11.0 9.3 7.9 6.7 5.8 4.9 4.2 3.3 650
11.2 9.5 8.1 6.9 5.9 5.0 4.3 3.5 700
11.4 9.7 8.3 7.1 6.1 5.2 4.4 3.6 750
9.8 8.4 7.2 6.2 5.3 4.5 3.6 800
8.6 7.4 6.4 5.4 4.6 3.7 850
8.7 7.5 6.5 5.5 4.7 3.8 900
8.8 7.6 6.6 5.6 4.8 3.9 950
7.3 6.6 5.7 4.8 4.0 1000
6.7 5.7 4.9 4.0 1050
5.8 5.0 4.1 1100
5.9 5.0 4.2 1150
5.1 4.2 1200
5.1 4.3 1250
4.2 1300
1350
Fre
que
ncy (M
Hz)
Vdd (V)
Not Operational
Gflops/W
[Lee ESSCIRC 2014]
470nm width
700nm width
Si Waveguides
Key Device Components
Slide 14
submitted to Nature – August 2015 – please keep confidential – funded in part by DARPA POEM program, Stojanovic, Asanovic, Popovic, Ram
Vertical couplers
Waveguide Taper
Diffraction GratingWaveguide
[Wade OIC 2015]
Key Device Components
Slide 15
SiGe from PMOS strain engineering used in Photodetectors
SiGe Photodetector
Waveguide
Waveguide Taper
[Orcutt 2013, Alloatti APL 2015]
Key Device Components
Slide 16
Integrated Heater
Drop Waveguide
Output Waveguide
Modulator Microring
Input Waveguide[Shainline OL 2013, Wade OFC 2014]
Transmitter
• Driven by a CMOS logic inverter (1.2Vpp)
• 5 Gb/s data rate at ~30fJ/b with >6dB extinction ratio, 3dB insertion loss
– Up to 12 Gb/s with 2-3dB extinction
InTransmit Site 50um
PRBS31
8:1
In/Out Grating
Couplers
VBIAS
Modulator Driver
Modulator
[Wade OFC 2014][Sun VLSI 2015]
Slide 17
Receiver
• Low parasitics from monolithic integration enable single-stage 5kΩ TIA receiver
• 10 Gb/s operation at 290 fJ/bit with 8.3uA sensitivity
VPD
φ
φ
Clock Buffers
OutA
OutB-
+Dummy TIA
TIA
5k Ω
5k Ω
Photo-
detector
-
+
ReceiverReceive Site 50um
BER Checker
2:8
Input Grating Coupler
[Georgas VLSI 2014]
Slide 18
5 Gb/s Chip-to-Chip Link
Rx Out A
Rx Out B
Chip 1
PRBS31
8:1
Thermal Tuner
PD Tx
Chip 2
BER Check
2:8
PD
RxOptical
Amplifier1189nm
Laser
3.8 dBm
-0.2 dBm
-3.2 dBm
-10.5 dBm
-14.5 dBm
-7.2 dBm
0.8 dBm
-5.9 dBm-3.2 dBm 9.65uA
2.04uA
Bit 1
Bit 0-9.9 dBm
Laser Power
Bit 1
Bit 0
Time [ps]
Decis
ion T
hre
shold
[uA
]
0 50 100 150 200
-2
0
2
1e-0011e-0021e-0031e-0041e-0051e-0061e-0071e-0081e-0091e-010<1e-010
Slide 19
• “zero-change” monolithic competitive with state-of-the-art heterogeneous platforms
• 680 fJ/bit, 14mW optical power [Zheng PTL 2012**]
5 Gb/s Link Efficiency Summary
Slide 20
*Includes all closed-loop circuits + 0.5 nm tuning power **0.5nm tuning power only
662 fJ/bit for circuits
Thermal Tuning*275 fJ/bit (42%)
Receiver357 fJ/bit (54%)
Optical power: 3.6mW (13mWextrapolated without amplifier)
Modulator Driver30 fJ/bit (5%)
• Not using our best devices in the link– 1dB loss couplers [Wade, OIC 2015] (on the same chip instead of 4dB in the link)
– 5-10x better photodetector (0.1-0.2 A/W photodetector on the same chip)
• Expect to obtain >40x smaller laser power (65fJ/b optical)
5 Gb/s Link Efficiency Summary
Slide 21
662 fJ/bit for circuits
Thermal Tuning*275 fJ/bit (42%)
Receiver357 fJ/bit (54%)
Optical power: 3.6mW (13mWextrapolated without amplifier)
*Includes all closed-loop circuits + 0.5 nm tuning power
Modulator Driver30 fJ/bit (5%)
560 fJ/bit for laser – wall-plug**
**11.6% QD laser wall-plug efficiency
Electronic-Photonic Packaging
• Flip-chip onto FR4 PCB using C4 bumps
• Selective substrate removal of optical transceiver regions
Die-thinned chip with selective substrate removal
Processor
and SRAM
regions
WDM
transceiver
regions Epoxy
Slide 22
Optical Memory System Demo
• Chip 1 acts as processor, Chip 2 acts as memory• Custom memory controller, DRAM interface emulator
– Takes advantage of full duplex (as opposed to half-duplex) memory interface
• Video demonstration (thermal stress test)http://www.nature.com/nature/journal/v528/n7583/fig_tab/nature16454_SV1.html
Mem
ory
Co
ntr
olle
r
PD
50/50 Power Splitter
PD
Transmitter
Transmitter
Receiver
Receiver
Inte
rfac
e
RIS
C-V
Pro
cess
or
1M
B M
emo
ry A
rray
Chip (Processor Mode) Chip (Memory Mode)
Processor to memory link
Command + address + write data
Memory to processor link
read data
OpticalAmplifier
OpticalAmplifier
Laser
Single-Mode Fiber
1MB Memory Array (Inactive)
RISC-V Processor (Inactive)
Slide 23
Tx and Rx DWDM Transceiver Banks
Slide 24
-16
-14
-12
-10
-8
-6
-4
-2
0
1170 1172 1174 1176 1178 1180 1182 1184 1186 1188 1190
Tra
nsm
issi
on [
dB]
Wavelength [nm]
01
23
45 6
78
9 10
0
12 3
4
Tx
Rx
-12
-10
-8
-6
-4
-2
0
1170 1172 1174 1176 1178 1180 1182 1184 1186 1188 1190
Tra
nsm
issi
on [
dB]
Wavelength [nm]
0 1
2 34 5
6
78 9
100
9 10
Advanced lithography enables tight ring resonance control
11 x 8 Gbps Tx Demonstration
• 11 rings, each demonstrating 8 Gbps modulation– Independently testing one at a time – Potential for 88 Gb/s on a single fiber/waveguide– Each ring is auto-locked
Slice 8 Slice 9 Slice 10
Slice 4 Slice 5 Slice 6 Slice 7
Slice 0 Slice 1 Slice 2 Slice 3
>8 Gb/s limited by duty-cycle distortion of off-chip clock source
Slide Slide 19
[Sun JSSC 2016]
Going Faster – PAM2 and PAM4
Direct Digital-to-Optical Conversion!
[Moazeni et al, ISSCC 2017]
Chip floorplan
Transmitter eye diagrams
• Extinction ratio (ER): 3dB, Insertion loss (IL): 5.5dB
• PAM4 coding used: (0,5,10,15)
• 42fJ/b driver energy efficiency
Transmitter specs
29
• Leverage tight electronic-photonic integration to create new, more sensitive receiver structures
– Differential, DDR receiver
Improved Rx Topologies
[Nandish Mehta et al. ESSCIRC16]
Platform Performance Summary
Metric [Beamer ISCA 2010]Conservative Estimates
45nm SOI Platform
Bulk Photonics Platform*
Waveguide Loss 4 dB/cm 3.7 dB/cm 10.5 dB/cm
Vertical Coupler Loss 1 dB 1 dB 3 dB
Tx Data Rate 10 Gb/s 20 Gb/s 5 Gb/s
Tx Energy Per Bit 120 fJ/b 42 fJ/b 350 fJ/b
Rx Data Rate 10 Gb/s 12 Gb/s 5 Gb/s
Rx Energy Per Bit 80 fJ/b 297 fJ/b 1700 fJ/b
Rx Sensitivity 10 μA 8 μA 36 μA
PD Responsivity 0.9 A/W 0.44 A/W 0.2 A/W
Thermal Tuning Efficiency 1.6 μW/GHz 3.8 μW/GHz 10 μW/GHz
*considerably slower process than one assumed in [Beamer ISCA 2010]
• Comparison to a proposal for the processor-memory system we published 6 years ago
• Meeting/exceeding most system specsSlide 31
Array
Periphery
DDR3-1333 Technology2 Gb die cost ~90¢
8 mm
8m
m
DRAM processes heavily optimized for cost
Poly Si Photonics in Bulk CMOS
Key constraints:
Bulk Substrate
Low Cost
No SiGeMeade et al. OI 13, VLSI Tech Symp 14
Micron wafers
17
Memory: Bulk photonics integration
DTI adjacent to STI
[Meade et al. VLSI Tech Symp 14, Sun et al VLSI Ckts Symp 14]
Micron D1L Reticle
180nmBulk chip
First-ever link result with bulk CMOS photonics
WDM in bulk-photonics - Tx
• All slices BER checked at 5Gb/s
• 45Gb/s aggregate rate per waveguide34
WDM in bulk-photonics - Rx
• All receive slices functional and BER checked at 5Gb/s
• Single fiber more I/O BW than x16 DDR4 part
Conclusions
• Silicon-photonics – enabler of new capabilities
– Think “new on-chip inductor” or “new on-chip t-line”
• Potentially revolutionize many applications despite
slowdown in CMOS scaling
– VLSI compute and network infrastructure just a start
• Need process, device, circuit and system-level
understanding
36