Building Modern Integrated Systems: A Cross-cut Approach

Building Modern Integrated Systems: A Cross-cut Approach (The Electrical, The Optical and The Mechanical) Vladimir Stojanović Integrated Systems Group Massachusetts Institute of Technology


Page 1: Building Modern Integrated Systems: A Cross-cut Approach

Building Modern Integrated Systems:

A Cross-cut Approach (The Electrical, The Optical and The Mechanical)

Vladimir Stojanović

Integrated Systems Group

Massachusetts Institute of Technology

Page 2: Building Modern Integrated Systems: A Cross-cut Approach


Chip design is going through a change

“The Processor is the new Transistor” [Rowen]

Intel 4004 (1971): 4-bit processor, 2312 transistors, ~100 KIPS, 10 micron PMOS, 11 mm2 chip

Sun Niagara: 8 GPP cores (32 threads)

Intel Network Processor: 1 GPP core, 16 ASPs (128 threads)

[Figure: Intel IXP2800 block diagram: Intel XScale core (32K instruction cache, 32K data cache), 16 MEv2 microengines, Rbuf and Tbuf (64 @ 128B each), hash unit, 16KB scratch, RDRAM 1-3, QDR SRAM 1-4, PCI (64b, 66 MHz), SPI4 or CSIX, gasket, CSRs (Fast_wr, UART, Timers, GPIO, BootROM/SlowPort)]

IBM Cell: 1 GPP (2 threads), 8 ASPs

Picochip DSP: 1 GPP core, 248 ASPs

Cisco CSR-1: 188 Tensilica GPPs

1000s of processor cores and accelerators per die [Asanovic]

Already have more devices than we can use at once

Limited by power density and bandwidth

Page 3: Building Modern Integrated Systems: A Cross-cut Approach

Subthreshold leakage: Game over for CMOS

CMOS circuits have well-defined minimum energy

Caused by leakage and finite sub-threshold swing

Need to balance leakage and active energy

Limits energy-efficiency, regardless of how slowly the circuit runs
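The minimum-energy argument above can be illustrated with a small numerical sketch. It assumes a textbook subthreshold model that is not taken from the slides: dynamic energy scales as C·Vdd², while leakage energy per operation scales as C·Vdd²·exp(-Vdd/(n·kT/q)), because the cycle time grows exponentially as Vdd drops. The parameter values (`activity`, `leak_factor`, `n_vt`) are hypothetical and chosen only to make the trade-off visible.

```python
import math

def energy_per_op(vdd, c_eff=1.0, activity=0.1, leak_factor=1000.0, n_vt=0.039):
    """Return (total, dynamic, leakage) normalized energy per op at supply vdd.

    dynamic = activity * C * Vdd^2
    leakage = leak_factor * C * Vdd^2 * exp(-Vdd / n_vt)
    n_vt is n * kT/q in volts. All values are illustrative assumptions.
    """
    dynamic = activity * c_eff * vdd ** 2
    leakage = leak_factor * c_eff * vdd ** 2 * math.exp(-vdd / n_vt)
    return dynamic + leakage, dynamic, leakage

def minimum_energy_point(v_lo=0.10, v_hi=1.00, steps=901):
    """Sweep Vdd on a uniform grid and return (vdd, energy) at the minimum."""
    best = None
    for i in range(steps):
        vdd = v_lo + (v_hi - v_lo) * i / (steps - 1)
        total, _, _ = energy_per_op(vdd)
        if best is None or total < best[1]:
            best = (vdd, total)
    return best

if __name__ == "__main__":
    vdd_opt, e_opt = minimum_energy_point()
    print(f"minimum normalized energy {e_opt:.4f} at Vdd = {vdd_opt:.3f} V")
```

Under these assumptions the minimum sits at an interior supply voltage: lowering Vdd further makes leakage dominate, and raising it makes dynamic energy dominate.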

[Figure: two plots. Energy/op vs. Vdd: normalized energy/cycle vs. Vdd (V), showing Etotal, Edynamic and Eleak. Energy/op vs. 1/throughput: normalized energy/op vs. 1/throughput (ps/op) when scaling Vdd & VT.]

Page 4: Building Modern Integrated Systems: A Cross-cut Approach


Wire and I/O scaling

Increased wire resistivity makes wire caps scale very slowly

Can’t get both energy-efficiency and high-data rate in I/O

[Figure: two panels. On-chip wires: copper resistivity. I/O: energy-cost [pJ/b] vs. data-rate [Gb/s] for the best electrical links; legend Chip2Chip and Backplane; annotations Loss ~10dB and Loss ~20-25dB.]

Page 5: Building Modern Integrated Systems: A Cross-cut Approach

Opportunity for integrated system design

Energy-efficient computation and communication

CMOS – need cross-cut approach to keep scaling performance

[Figure: cross-cut design stack: Network & µArchitecture; Communications (Eq., Mod, Coding); Circuits & Logic (Tx, Rx, Ctrl, Meas); Circuit modeling, Characterization; Design Optimization; Interconnect and switch technology (Cu, MOSFET). Insets: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for equalized (30mV, 50mV, 90mV eye) and repeated links; receiver schematic with clock phases Φ, inputs in+/in- and IPHOTO.]

Page 6: Building Modern Integrated Systems: A Cross-cut Approach

Manycore SOC roadmap fuels bandwidth demand

64-tile system (64-256 cores)

- 4-way SIMD FMACs @ 2.5 – 5 GHz

- 5-10 TFlops on one chip

- Need 5-10 TB/s of off-chip I/O

- Even higher on-chip bandwidth

[Figure: 2 cm x 2 cm die; Intel 48 core -Xeon]
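The roadmap numbers on this page can be checked with a short calculation. The sketch below assumes 2 floating-point operations per FMAC per cycle and uses the 1 Byte/Flop ratio that appears on the later pin-count slide; both function names are illustrative.

```python
def peak_tflops(cores, simd_lanes, clock_ghz, flops_per_fmac=2):
    """Peak throughput in TFlops; flops_per_fmac=2 is an assumption
    (one multiply plus one add per fused multiply-accumulate)."""
    return cores * simd_lanes * flops_per_fmac * clock_ghz / 1000.0

def required_bandwidth_tbytes_per_s(tflops, bytes_per_flop=1.0):
    """Off-chip bandwidth in TB/s for a given Byte/Flop ratio."""
    return tflops * bytes_per_flop

if __name__ == "__main__":
    for clock in (2.5, 5.0):
        tf = peak_tflops(cores=256, simd_lanes=4, clock_ghz=clock)
        bw = required_bandwidth_tbytes_per_s(tf)
        print(f"256 cores @ {clock} GHz: {tf:.2f} TFlops, {bw:.2f} TB/s")
```

With 256 cores this gives roughly 5 TFlops at 2.5 GHz and 10 TFlops at 5 GHz, which is consistent with the 5-10 TFlops and 5-10 TB/s figures quoted above.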

Page 7: Building Modern Integrated Systems: A Cross-cut Approach

Cross-layer design approach

Build modeling tools for design-space exploration and vertical integration

[Figure: cross-layer stack from Apps, OS and ISA through manycore hardware, NoC topologies and NoC metrics, routers and NoC channel technologies, down to link metrics and link design parameters (Vs, Vp, Sp, Wth, Td, WLCM, w0, w1, w2, y1). Plots: power vs. offered BW for repeated (Rep.) and equalized (Eq.) links; wire width and space (um) vs. throughput density (Gbps/um); energy/bit (pJ/bit) vs. data rate density (Gbps/um) for equalized (30 mV, 50 mV, 90 mV eye) and repeated links. Link block diagram: register, mux, pre-driver, mod-driver, receiver front-end, samplers & monitoring, demux, PLL or optical clock, phase adjust.]

Page 8: Building Modern Integrated Systems: A Cross-cut Approach

Physical modeling – Equalized interconnects

Capturing the wire + circuit interactions

Huge design-space

Modeling flow:

- Technology information: transistor (spice model); wire (metal conductance, dielectric constant, etc.)

- Wire model: 2D field solver, 2D RLGC matrices database; inputs Wth, Sp

- Circuit model: linearized RC switch extraction, normalized R (Ohm-um), C (fF/um) switch model database; inputs WLCM, Vs, Vp; circuit type LCM | Inverter

- Channel model: RLGC parameters, R, C model for LCM & Inverter, target wire length l; output transfer function T(f), Tc(f)

- Link performance model: link architecture (FFE, DFE tap numbers), target data rate density; outputs equalization coefficients w, y1, data rate density, latency, eye opening, sampling delay (Td)

- Link power model: energy-per-bit (Eb)

Kim and Stojanovic, ICCAD07, D&T 2008

Sredojevic and Stojanovic, ICCAD08
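As a rough illustration of what the link performance model computes, the sketch below evaluates the worst-case eye opening of an RC-limited wire with and without a 2-tap feed-forward equalizer (FFE). It is not the authors' tool: the single-pole channel, the tap choice and the peak-swing normalization are all simplifying assumptions, and every function name is hypothetical.

```python
import math

def rc_pulse_response(bit_time_over_tau, n_samples=40):
    """Bit-spaced pulse response of a single-pole RC channel (assumed model)."""
    r = math.exp(-bit_time_over_tau)
    return [(1.0 - r) * r ** k for k in range(n_samples)]

def apply_ffe(pulse, taps):
    """Convolve the channel pulse response with the transmit FFE taps."""
    out = [0.0] * (len(pulse) + len(taps) - 1)
    for i, h in enumerate(pulse):
        for j, w in enumerate(taps):
            out[i + j] += h * w
    return out

def worst_case_eye(pulse):
    """Cursor minus the sum of all other |samples|, in units of transmit swing.
    A negative value means the eye is closed."""
    cursor_index = max(range(len(pulse)), key=lambda k: abs(pulse[k]))
    isi = sum(abs(x) for k, x in enumerate(pulse) if k != cursor_index)
    return abs(pulse[cursor_index]) - isi

def two_tap_ffe(bit_time_over_tau):
    """Taps that cancel the RC tail, scaled so |w0| + |w1| = 1 (peak swing limit)."""
    r = math.exp(-bit_time_over_tau)
    scale = 1.0 / (1.0 + r)
    return [scale, -r * scale]

if __name__ == "__main__":
    ratio = 0.5  # bit time is half the channel time constant (illustrative)
    raw = rc_pulse_response(ratio)
    eq = apply_ffe(raw, two_tap_ffe(ratio))
    print(f"unequalized eye: {worst_case_eye(raw):+.3f}")
    print(f"equalized eye:   {worst_case_eye(eq):+.3f}")
```

In this toy case the unequalized eye is closed, while the equalized eye opens at the cost of a smaller cursor. That is the kind of trade-off between eye opening, data rate and energy that the flow above explores.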

Page 9: Building Modern Integrated Systems: A Cross-cut Approach

Optimized on-chip links

[Figure: transmitter with decoding block, weak driver, strong driver and amplitude control (transition signals P1_P, P2_P, N1_P, N2_P; control bits A<19:0>); effective receiver admittance; voltage swing vs. distance showing channel attenuation over a 10mm wire. Plot: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for equalized (30mV, 50mV, 90mV eye) and repeated links.]

Kim and Stojanovic, ISSCC09, JSSC June 2010

Energy-efficient digital pre-emphasis

Nonlinear predistortion, mismatch robustness

90nm CMOS

Page 10: Building Modern Integrated Systems: A Cross-cut Approach

Optimized off-chip links

Song and Stojanovic, VLSI09, JSSC May 2011

[Figure: receiver block diagram with four interleaved ways (Way0 to Way3): feedforward equalizer (FSE) built from V2T and T2V stages, feedback equalizer (DFE), 7-bit DACs for the references, sensors for adaptation, multi-phase sampling clocks (Φ0 to Φ315), and SnapShot/ScanChain configuration registers (CONF0 to CONF3)]

Sredojevic and Stojanovic,

CICC10, JSSC Aug 2011

[Figure: eye opening (mV) vs. delay between data and CLK (data cycle, UI) at 4Gbps: (a) FSE input eye opening, (b) FSE output eye opening compared with single-tap output eye opening, for I1 = 0 / I2 = 15.5u, I1 = 20.0u / I2 = 4.5u, and I1 = 30.0u / I2 = 0]

Oversampled Discrete-Time Rx Equalizer

- No need for CDR – only Adaptive Eq

5b linear

3pJ/b @ 6Gb/s

90nm CMOS

[Figure: transmitter: serializer, 4-bit sequence coding through a 16 x 6b + 1b LUT (pattern-dependent code), thermometer-coded [63:0] driver with sign bit, driver linearization, scan-chain. Plot: static transfer curve [Vdd] vs. memory code. Eye diagrams: unequalized vs. equalized, annotated 880mV, 300mV and 480 mV.]

Digital Tx Equalizer - Energy-efficient

Dynamic Impedance Modulation

<1pJ/b @ 4Gb/s

90nm CMOS

Page 11: Building Modern Integrated Systems: A Cross-cut Approach

Bandwidth, pin count and power scaling

Need 16k pins in 2017 for HPC*

[Figure: package pin count scaling from 2,4 cores to 256 cores; annotations: 1 Byte/Flop, 2 TFlop/s, signal pins @ 20 Gb/s/link]

*> half pins for power supply
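The pin-count pressure can be reproduced with a back-of-the-envelope estimate. The sketch below takes 1 Byte/Flop and 20 Gb/s per link from the slide; the use of differential signaling (2 pins per link) and the exact 50% signal-pin fraction are assumptions, so the result is a lower bound given the footnote that more than half the pins go to the power supply.

```python
import math

def package_pins(tflops, bytes_per_flop=1.0, link_gbps=20.0,
                 pins_per_link=2, signal_fraction=0.5):
    """Return (links, signal_pins, total_pins) for a given compute rate.

    pins_per_link=2 assumes differential signaling; signal_fraction=0.5
    assumes exactly half the package pins carry signals.
    """
    bandwidth_gbps = tflops * 1e3 * bytes_per_flop * 8.0  # TFlop/s -> Gb/s
    links = math.ceil(bandwidth_gbps / link_gbps)
    signal_pins = links * pins_per_link
    total_pins = math.ceil(signal_pins / signal_fraction)
    return links, signal_pins, total_pins

if __name__ == "__main__":
    for tflops in (2, 10):
        links, sig, total = package_pins(tflops)
        print(f"{tflops} TFlop/s: {links} links, {sig} signal pins, {total} pins total")
```

Under these assumptions a 10 TFlop/s chip needs about 16,000 pins, which lines up with the 16k headline. Pairing the headline with the 10 TFlop/s end of the manycore roadmap is an interpretation, not something the slide states.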

Page 12: Building Modern Integrated Systems: A Cross-cut Approach

Emerging devices can help

Energy-efficient computation and communication

CMOS – need cross-cut approach to keep scaling performance

Post-CMOS – need cross-cut approach to guide new devices/systems

[Figure: cross-cut design stack: Network & µArchitecture; Communications (Eq., Mod, Coding); Circuits & Logic (Tx, Rx, Ctrl, Meas); Circuit modeling, Characterization; Design Optimization; Interconnect and switch technology (Si-Photonics, Cu, MOSFET). Insets: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for equalized (30mV, 50mV, 90mV eye) and repeated links; receiver schematic with clock phases Φ, inputs in+/in- and IPHOTO.]

Page 13: Building Modern Integrated Systems: A Cross-cut Approach

CMOS photonics density and energy advantage

Metric | Energy (pJ/b) | Bandwidth density (Gb/s/μm)
Global on-chip photonic link | 0.1-0.25 | 160-320
Global on-chip optimally repeated electrical link | 1 | 5
Off-chip photonic link (100 μm coupler pitch) | 0.1-0.25 | 6-12
Off-chip electrical SERDES (100 μm pitch) | 5 | 0.1

Assuming 128 x 10Gb/s wavelengths on each waveguide, and 20Gb/s electrical I/O
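The bandwidth-density column follows from dividing link throughput by the pitch it occupies. The sketch below redoes that arithmetic. The 4 μm and 8 μm on-chip waveguide pitches are hypothetical values picked because they reproduce the 160-320 range, and the electrical case assumes a differential pair occupying two 100 μm pitches.

```python
def waveguide_throughput_gbps(wavelengths=128, gbps_per_wavelength=10.0):
    """Aggregate throughput of one DWDM waveguide in Gb/s."""
    return wavelengths * gbps_per_wavelength

def bandwidth_density(throughput_gbps, pitch_um):
    """Bandwidth density in Gb/s per micron of pitch."""
    return throughput_gbps / pitch_um

if __name__ == "__main__":
    wg = waveguide_throughput_gbps()                      # 1280 Gb/s per waveguide
    print("on-chip photonic, 4 um pitch (assumed):", bandwidth_density(wg, 4.0))
    print("on-chip photonic, 8 um pitch (assumed):", bandwidth_density(wg, 8.0))
    print("off-chip photonic, 100 um coupler pitch:", bandwidth_density(wg, 100.0))
    print("off-chip electrical, 2 x 100 um (assumed):", bandwidth_density(20.0, 200.0))
```

The off-chip photonic result (12.8 Gb/s/μm) sits at the top of the 6-12 range, and the electrical result matches the 0.1 Gb/s/μm entry.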

Page 14: Building Modern Integrated Systems: A Cross-cut Approach

Monolithic Si-Photonics for core-to-core and

core-to-DRAM networks


Supercomputers

Embedded apps

Si-photonics in advanced

CMOS and DRAM process

NO costly process changes

Bandwidth density – need dense WDM

Energy-efficiency – need monolithic integration

Page 15: Building Modern Integrated Systems: A Cross-cut Approach

Many architectural studies show promise


[Shacham’07]

[Petracca’08]

[Vantrease’08]

[Psota’07]

[Kirman’06]

[Joshi’09]

[Pan’09]

[Batten’08] [Beamer’10] [Koka’08-10]

Page 16: Building Modern Integrated Systems: A Cross-cut Approach

Big Challenge: Efficient integration with

circuits in advanced CMOS process

[Figure: photonic link block diagram: register, mux, pre-driver, mod-driver, receiver front-end, samplers & monitoring, demux, PLL or optical clock, phase adjust]

Dense WDM – 128 wavelengths/waveguide - >1Tb/s per waveguide

Need 1000’s of transceivers on die with < 100fJ/bit cost at > 10Gb/s !

- Optimized modulator circuits/devices

- Optimized receiver circuits/photo-detector

- Optimized thermal tuning
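The transceiver target above translates directly into a power budget. A minimal sketch, using only the slide's numbers (100 fJ/bit, 10 Gb/s, on the order of 1000 transceivers); the function names are illustrative.

```python
def transceiver_power_mw(energy_fj_per_bit, rate_gbps):
    """Power of one transceiver in mW.
    fJ/bit * Gb/s = 1e-15 J * 1e9 1/s = 1e-6 W (microwatts), then /1000 for mW."""
    return energy_fj_per_bit * rate_gbps / 1000.0

def die_transceiver_power_w(n_transceivers, energy_fj_per_bit, rate_gbps):
    """Total transceiver power on the die in watts."""
    return n_transceivers * transceiver_power_mw(energy_fj_per_bit, rate_gbps) / 1000.0

if __name__ == "__main__":
    print(transceiver_power_mw(100.0, 10.0), "mW per transceiver")
    print(die_transceiver_power_w(1000, 100.0, 10.0), "W for 1000 transceivers")
```

At the 100 fJ/bit limit each 10 Gb/s transceiver burns about 1 mW, so a thousand of them cost about 1 W, which is why the per-bit energy target matters at this scale.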

Page 17: Building Modern Integrated Systems: A Cross-cut Approach

Laser energy increases with data-rate

- Limited Rx sensitivity

- Modulation more expensive -> extinction ratio / insertion loss trade-off

Tuning costs decrease with data-rate

Moderate data rates most energy-efficient

[Figure: photonic link block diagram: register, mux, pre-driver, mod-driver, receiver front-end, samplers & monitoring, demux, PLL or optical clock, phase adjust]

512 Gb/s aggregate throughput

assuming 32nm CMOS

Georgas CICC 2011

Need to optimize carefully
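The claim that moderate data rates are the most energy-efficient can be illustrated with a toy model. It is not the model from Georgas CICC 2011. It assumes that laser energy per bit grows linearly with data rate, that each ring needs a fixed tuning power which is amortized over the bits it carries, and that circuit energy per bit is constant. All coefficients are hypothetical.

```python
def link_energy_fj_per_bit(rate_gbps, laser_fj_per_gbps=2.0,
                           tuning_uw=200.0, circuit_fj=50.0):
    """Energy per bit in fJ for one wavelength channel (toy model).

    laser:   laser_fj_per_gbps * rate   (grows with data rate)
    tuning:  tuning_uw / rate           (uW / (Gb/s) = fJ/bit, shrinks with rate)
    circuit: constant
    """
    laser = laser_fj_per_gbps * rate_gbps
    tuning = tuning_uw / rate_gbps
    return laser + tuning + circuit_fj

def most_efficient_rate(candidate_rates_gbps):
    """Return the candidate data rate with the lowest energy per bit."""
    return min(candidate_rates_gbps, key=link_energy_fj_per_bit)

if __name__ == "__main__":
    rates = [1, 2, 4, 8, 10, 16, 32]
    for r in rates:
        print(f"{r:>2} Gb/s: {link_energy_fj_per_bit(r):6.1f} fJ/bit")
    print("best rate:", most_efficient_rate(rates), "Gb/s")
```

With these made-up coefficients the optimum falls at an intermediate rate: low rates are dominated by tuning overhead and high rates by laser energy.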

Page 18: Building Modern Integrated Systems: A Cross-cut Approach

DWDM link efficiency optimization

Optimize for min energy-cost

Bandwidth density dominated by circuit and

photonics area (not coupler pitch)

10x better than electrical bump limited

200x better than electrical package pin limit

[Figure annotations: electrical bump-pitch limited to <1Tb/s/mm2 (photonics >10x better); package pin limit 0.05 Tb/s/mm2]

Page 19: Building Modern Integrated Systems: A Cross-cut Approach

Photonic DRAM Network Organization

Important Concepts

- Power/message switching (only to active DRAM chip in DRAM cube/super DIMM)

- Vertical die-to-die coupling (minimizes cabling - 8 dies per DRAM cube)

- Command distributed electrically (broadcast)

- Data photonic (single writer multiple readers)

[Figure: processor die with memory scheduler and memory controllers MC 1 to MC 16 (MC K) connected to Super DIMM K, each holding DRAM cubes 1 to 4; per-die cmd, Dwr and Drd channels (cube 1, die 1 to die 8); die-die switch; legend: laser in, modulator bank, receiver/PD bank, tunable filterbank, through silicon via, through silicon via hole]

Beamer ISCA 2010

Page 20: Building Modern Integrated Systems: A Cross-cut Approach

Optimizing DRAM with photonics

[Figure: DRAM floorplan, labels P1 and P4]

Beamer ISCA 2010

Page 21: Building Modern Integrated Systems: A Cross-cut Approach

Cross-layer modeling identifies key device

requirements

Feedback to device designers

Waveguide loss and Through loss limits for 2 W optical laser power

[Figure: optical laser power and die area overhead plotted against waveguide loss (dB/cm) and through loss (dB/ring)]
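The kind of feedback this page describes can be sketched as an optical loss budget: add up the losses along the path, convert them to the laser power needed to meet receiver sensitivity on every channel, and compare against the 2 W limit. The channel count, path length, ring count, fixed loss and receiver sensitivity below are hypothetical placeholders, not values from the talk.

```python
def path_loss_db(waveguide_db_per_cm, length_cm, through_db_per_ring,
                 rings_passed, fixed_db):
    """Total optical loss in dB: waveguide loss, ring through loss,
    and a fixed term (couplers, modulator insertion, drop filters)."""
    return (waveguide_db_per_cm * length_cm
            + through_db_per_ring * rings_passed
            + fixed_db)

def laser_power_w(n_channels, rx_sensitivity_uw, loss_db):
    """Optical laser power needed so each channel still meets Rx sensitivity."""
    return n_channels * rx_sensitivity_uw * 1e-6 * 10 ** (loss_db / 10.0)

def fits_budget(waveguide_db_per_cm, through_db_per_ring, budget_w=2.0,
                n_channels=1024, rx_sensitivity_uw=10.0,
                length_cm=4.0, rings_passed=128, fixed_db=5.0):
    """True if the device losses keep laser power within budget_w.
    Every default except the 2 W budget is an illustrative assumption."""
    loss = path_loss_db(waveguide_db_per_cm, length_cm,
                        through_db_per_ring, rings_passed, fixed_db)
    return laser_power_w(n_channels, rx_sensitivity_uw, loss) <= budget_w

if __name__ == "__main__":
    print("1 dB/cm, 0.01 dB/ring:", fits_budget(1.0, 0.01))
    print("3 dB/cm, 0.05 dB/ring:", fits_budget(3.0, 0.05))
```

Sweeping the two loss parameters over a grid with `fits_budget` gives a feasibility region of the sort the figure presents to device designers.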

Page 22: Building Modern Integrated Systems: A Cross-cut Approach

Significant integration activity,

but hybrid and older processes …

[Luxtera/Oracle/Kotura] [IBM]

[HP]

[Watts/Sandia/MIT]

[Intel]

130nm

thick BOX SOI

130nm

thick BOX SOI

Bulk CMOS

Backend

monolithic

[Lipson/Cornell]

[Kimerling/MIT]

[Many schools]


Page 23: Building Modern Integrated Systems: A Cross-cut Approach

Optical Mode

Monolithic CMOS photonic integration

Photo credit: Intel

Polysilicon - transistor gates, local interconnect and resistors

Use it for photonic components instead of, or together with, the silicon body in SOI

Sub-100nm lithography has 1-5 nm design grid

Enables the low edge roughness necessary for photonic devices

Page 24: Building Modern Integrated Systems: A Cross-cut Approach

EOS Platform for Monolithic CMOS photonic integration

- 65 nm bulk CMOS, Texas Instruments

- 90 nm bulk CMOS, IBM cmos9sf

- 45 nm SOI CMOS, IBM 12SOIs0

- 32 nm bulk CMOS, Texas Instruments

[Figure: transmission (dB) vs. frequency (GHz), 2007 and 2011 results]

Create integration platform to accelerate technology development and adoption

Joint work with Ram and Popovic

Page 25: Building Modern Integrated Systems: A Cross-cut Approach

A 32nm bulk CMOS photonic platform

Monolithic CMOS photonic platform integrated with CMOS circuits

32nm process – fabrication support from Texas Instruments

Robust post-processing steps at MIT

Second-order resonator filterbank shows process precision

Great on-die matching (rings track within 40GHz)

Record thermal heating efficiency 25uW/K

Orcutt et al – CLEO 2008, Optics Express 2011

Page 26: Building Modern Integrated Systems: A Cross-cut Approach

Polysilicon and Silicon Photonics on Thin BOX IBM SOI

[Figure: photonic link block diagram: register, mux, pre-driver, mod-driver, receiver front-end, samplers & monitoring, demux, PLL or optical clock, phase adjust]

Electrical and photonic integration – test row

EOS: A 45nm SOI Monolithic Photonic Platform

6 rows of electronic-photonic WDM links with body and polysilicon photonic devices

54 transmit-receive test-sites, ~3M transistors and hundreds of photonic devices

Body and polysilicon photonic devices: filterbanks, waveguide paperclips, rings, stand-alone modulators and photodetectors

Page 27: Building Modern Integrated Systems: A Cross-cut Approach

Integration of photonics into VLSI tools

modulator.LEF:

    VERSION 5.6 ;
    BUSBITCHARS "[]" ;
    DIVIDERCHAR "/" ;
    MACRO block_electronic_etch_row_1
      CLASS BLOCK ;
      ORIGIN -208 -1794 ;
      FOREIGN block_electronic_etch_row_1 208 1794 ;
      SIZE 2488 BY 165 ;
      SYMMETRY X Y R90 ;
      PIN heater_a_1
        DIRECTION INOUT ;
        USE SIGNAL ;
        PORT
          LAYER ua ;
            RECT 431 1870.5 436.5 1882 ;
        END
      END heater_a_1
      ...
      OBS
        LAYER m1 ;
          RECT 208 1794 2696 1959 ;
        ...
      END
    END block_electronic_etch_row_1
    END LIBRARY

Design flow:

- Layout of photonics -> abstract -> LEF (photonic device p-cell abstract)

- Layout of circuit blocks -> abstract -> LEF

- LEF of standard cells, I/O pads (provided by ARM)

- Chip-level verilog (instantiation of .LEF macros and connectivity)

- Technology files

- Floorplan (macro placement, power grid, routing constraints)

- SOC Encounter place and route -> place&routed layout

- Custom photonics-friendly auto-fill -> layout
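The abstract-to-LEF step in this flow can be sketched as a small generator that turns a block's outline, pins and blockages into a LEF macro like the modulator.LEF fragment on this page. This is an illustrative script, not the tooling used for the EOS chips: the function name and data layout are hypothetical, and it emits only the LEF statements that appear on the slide.

```python
def _fmt(value):
    """Format a coordinate without a trailing .0 (e.g. 1882 or 1870.5)."""
    return f"{value:.10g}"

def lef_macro(name, origin, size, pins, obstructions, symmetry="X Y R90"):
    """Build LEF text for one block-level macro.

    pins:         {pin_name: (layer, (x1, y1, x2, y2))}
    obstructions: [(layer, (x1, y1, x2, y2)), ...]
    """
    ox, oy = origin
    width, height = size
    lines = [
        f"MACRO {name}",
        "  CLASS BLOCK ;",
        f"  ORIGIN {_fmt(ox)} {_fmt(oy)} ;",
        f"  FOREIGN {name} {_fmt(-ox)} {_fmt(-oy)} ;",
        f"  SIZE {_fmt(width)} BY {_fmt(height)} ;",
        f"  SYMMETRY {symmetry} ;",
    ]
    for pin_name, (layer, rect) in pins.items():
        coords = " ".join(_fmt(v) for v in rect)
        lines += [
            f"  PIN {pin_name}",
            "    DIRECTION INOUT ;",
            "    USE SIGNAL ;",
            "    PORT",
            f"      LAYER {layer} ;",
            f"        RECT {coords} ;",
            "    END",
            f"  END {pin_name}",
        ]
    if obstructions:
        lines.append("  OBS")
        for layer, rect in obstructions:
            coords = " ".join(_fmt(v) for v in rect)
            lines += [f"    LAYER {layer} ;", f"      RECT {coords} ;"]
        lines.append("  END")
    lines.append(f"END {name}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(lef_macro(
        "block_electronic_etch_row_1", origin=(-208, -1794), size=(2488, 165),
        pins={"heater_a_1": ("ua", (431, 1870.5, 436.5, 1882))},
        obstructions=[("m1", (208, 1794, 2696, 1959))],
    ))
```

Treating each photonic block as an opaque macro with a few electrical pins and a full-area obstruction lets a standard place-and-route tool keep digital routing out of the photonic region.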

Page 28: Building Modern Integrated Systems: A Cross-cut Approach

Platform Organization


Page 29: Building Modern Integrated Systems: A Cross-cut Approach

A full electro-optical test setup

[Photo labels: DUT chip board, HS clocks, FPGA control board, two fiber positioners, microscope, USB to laptop]

Page 30: Building Modern Integrated Systems: A Cross-cut Approach

Extremely good dimensional tolerances

in 45nm SOI

Good body waveguide loss

3.7dB/cm at ~1220nm


Page 31: Building Modern Integrated Systems: A Cross-cut Approach

Integrated Delta-Sigma Heat Control

Tuning efficiency 2.6mW/nm (32.4mW/2π) on fully substrate removed die

~10mW required to retune all 8 rings

Thermal tuning BW lower than 500kHz

Tuning control overhead negligible
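One way to picture delta-sigma heat control is a first-order digital modulator that turns a multi-bit heater setting into a 1-bit on/off stream, which the slow thermal response of the ring (below 500 kHz) then averages. The sketch below assumes that first-order structure; the slide does not specify the modulator order. The 5 mW heater-on power is a hypothetical value, while the 2.6 mW/nm tuning efficiency is from the slide.

```python
def delta_sigma_bitstream(level, full_scale, n_cycles):
    """First-order delta-sigma: accumulate `level` each cycle and emit a 1
    whenever the accumulator overflows `full_scale` (0 <= level <= full_scale)."""
    acc = 0
    bits = []
    for _ in range(n_cycles):
        acc += level
        if acc >= full_scale:
            acc -= full_scale
            bits.append(1)
        else:
            bits.append(0)
    return bits

def average_heater_power_mw(bits, heater_on_mw):
    """Average power delivered by an on/off heater driven by the bitstream."""
    return heater_on_mw * sum(bits) / len(bits)

if __name__ == "__main__":
    heater_on_mw = 5.0                    # assumed full-on heater power
    target_mw = 0.5 * 2.6                 # 0.5 nm shift at 2.6 mW/nm
    level = round(target_mw / heater_on_mw * 256)
    bits = delta_sigma_bitstream(level, full_scale=256, n_cycles=256)
    print(f"level {level}/256 -> {average_heater_power_mw(bits, heater_on_mw):.3f} mW")
```

Because the heater driver only switches fully on or off, the control logic stays digital and small, which fits the slide's point that the tuning control overhead is negligible.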


Page 32: Building Modern Integrated Systems: A Cross-cut Approach

Current-sensing optical data receiver

Georgas ESSCIRC 2011

Receiver detects photo current

50fJ/b, uA sensitivities, 3-5Gb/s

Page 33: Building Modern Integrated Systems: A Cross-cut Approach

Receiver sensitivity

[Figure: receiver front-end schematic with clock phases Φ, inputs in+/in- and photocurrent IPHOTO]

Exponential dependence on wire capacitance

Linear dependence on photo-detector capacitance

Page 34: Building Modern Integrated Systems: A Cross-cut Approach

Modulator test site

[Figure annotations: 45 GHz 3 dB bandwidth, 19 dB extinction]

• Extinction ratio 19dB

• 45GHz 3dB bandwidth

• Carrier lifetime ~2-3ns

• Requires flexible drive circuits

  • Sub-bit pre-emphasis

  • Split-supplies

Silicon carrier injection modulator monolithically integrated with transistors

Page 35: Building Modern Integrated Systems: A Cross-cut Approach

First ever dynamic electro-optic test in

45nm SOI

[Figure labels: modulator driver, modulator]

Transistors and photonics can be built together in advanced CMOS!

Silicon carrier injection modulator monolithically integrated with transistors

Modulation data-rate up to 1Gb/s

5-10 Gb/s achievable with device and biasing optimization

Lots of room to improve circuit/device designs

Page 36: Building Modern Integrated Systems: A Cross-cut Approach

Improving computation efficiency

Energy-efficient computation and communication

CMOS – need cross-cut approach to keep scaling performance

Post-CMOS – need cross-cut approach to guide new devices/systems

[Figure: cross-cut design stack: Network & µArchitecture; Communications (Eq., Mod, Coding); Circuits & Logic (Tx, Rx, Ctrl, Meas); Circuit modeling, Characterization; Design Optimization; Interconnect and switch technology (Si-Photonics, Cu, NEMS relay, MOSFET). Insets: energy/bit (pJ/bit) vs. data rate density (Gbps/um) for equalized (30mV, 50mV, 90mV eye) and repeated links; receiver schematic with clock phases Φ, inputs in+/in- and IPHOTO.]

Page 37: Building Modern Integrated Systems: A Cross-cut Approach

Nano-electro-mechanical (NEM) relays

Nearly ideal switching characteristics:

- Low on-state resistance (Ron <1kΩ)

- Infinite off-state resistance

- Zero off-state leakage

[Figure: relay schematic and cross-section A-A' showing body, gate, gate oxide, channel, source and drain; dimension labels 30 µm, 27.5 µm and 90nm]

Joint work with T-J. King Liu, E. Alon and D. Markovic (UCB, UCLA)

Page 38: Building Modern Integrated Systems: A Cross-cut Approach

Why not use relays to compute?

- Need to compare at block level -

Delay Comparison vs. CMOS

- Single mechanical delay vs. several electrical gate delays

- For reasonable load, NEMS delay unaffected by fan-out/fan-in

Area Comparison vs. CMOS

- Larger individual devices

- But often need fewer devices to implement same function

[Figure: CMOS implementation with 4 gate delays vs. NEMS implementation with 12 relays and 1 mechanical delay]

F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008

Page 39: Building Modern Integrated Systems: A Cross-cut Approach

Scaled NEMS vs. CMOS adders

Compare vs. Sklansky CMOS adder* in 90nm technology

For similar area: >9x lower E/op, >10x greater delay

- 30x less capacitance (lower device Cg, Cd; fewer devices)

- 2.4x lower Vdd

- No leakage energy

Scaled relays limited by contact surface energy

- 2aJ for 90nm litho – 50x better than 90nm CMOS

[Figure: energy/op vs. delay/op across Vdd, annotated 9x and 10x]

*D. Patil et al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp. on Computer Arithmetic (ARITH'07).

Page 40: Building Modern Integrated Systems: A Cross-cut Approach

Contact resistance

- Feedback from system level -

Low contact R not critical

Good news for reliability…

Can build test-platforms that work

[Figure: energy/op vs. delay/op across Vdd & CL]

Page 41: Building Modern Integrated Systems: A Cross-cut Approach

CLICKR technology development platform:

NEM relay-based circuits ISSCC 2010 – TD Award


F. Chen et al, ISSCC2010

M. Spencer et al, JSSC Jan’11

Page 42: Building Modern Integrated Systems: A Cross-cut Approach

Towards more complex designs

[Figure: energy/op (fJ) vs. delay (ns) for Scaled MEM Relay, OTCT (90nm), Dadda/HC (45nm) and 16X Parallel designs]

Multiplier building block: 7:3 compressor

98 relays – largest working relay circuit to date

[Figure: die photo with outputs Y2, Y1, Y0; relay schematics for compressor outputs Y2 and Y0 with inputs A0 to A6, input code, and Generate/Kill paths, panels (a) to (d)]

Energy-benefit preserved even in more complex functions

16-bit multipliers

Fariborzi ASSCC 2011
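Functionally, a 7:3 compressor counts how many of its seven input bits are 1 and reports the count as a 3-bit number (Y2, Y1, Y0). The sketch below is a behavioral reference model only; it says nothing about the 98-relay circuit topology, and the function name is illustrative.

```python
import itertools

def compressor_7_3(bits):
    """Behavioral 7:3 compressor: return (y2, y1, y0) such that
    4*y2 + 2*y1 + y0 equals the number of 1s among the 7 input bits."""
    if len(bits) != 7 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected exactly 7 bits, each 0 or 1")
    count = sum(bits)
    return (count >> 2) & 1, (count >> 1) & 1, count & 1

if __name__ == "__main__":
    # Exhaustively check all 128 input combinations against the definition.
    for bits in itertools.product((0, 1), repeat=7):
        y2, y1, y0 = compressor_7_3(bits)
        assert 4 * y2 + 2 * y1 + y0 == sum(bits)
    print(compressor_7_3((1, 0, 1, 1, 0, 1, 1)))
```

A model like this is useful as the golden reference when simulating the relay netlist, since every partial-product column in a multiplier reduces to repeated use of this counting function.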

Page 43: Building Modern Integrated Systems: A Cross-cut Approach

Verilog-A model and Logic Synthesis created for NEMS technology

The flow supports multiple device designs and foundries

NEM Relay VLSI design infrastructure

[Figure: design flow: device Verilog-A model and P-cell; schematic and Spectre simulation; Verilog, logic synthesis, place & route; layout, DRC and LVS]

Page 44: Building Modern Integrated Systems: A Cross-cut Approach

Toward full systems - NEM Relay scaling

1um litho

Scaled Relay size

20um x 20um

Sematech

Relay size

120um x 150um

0.25um litho


Page 45: Building Modern Integrated Systems: A Cross-cut Approach

Microcontroller Test-Chip

[Die layout labels:]

- 64x8b scratchpad

- 64x18b program memory

- 32x10b program stack

- Instruction decode

- Register file + ALU

- Control logic

- 2 x 72 I/O pads

12k relays

9mm x 6mm (using 85um x 53um devices)
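The relay count and die size quoted here can be cross-checked with simple arithmetic, as in the sketch below. It uses only numbers from this slide, reading "12k" as 12,000; the function names are illustrative.

```python
def relay_area_mm2(n_relays, width_um, height_um):
    """Total footprint of n_relays devices in mm^2 (1 mm^2 = 1e6 um^2)."""
    return n_relays * width_um * height_um / 1e6

def memory_bits(words, bits_per_word):
    """Storage capacity of a words x bits_per_word array."""
    return words * bits_per_word

if __name__ == "__main__":
    devices = relay_area_mm2(12_000, 85, 53)
    die = 9 * 6
    print(f"device footprint {devices:.2f} mm^2 vs. die {die} mm^2")
    total_bits = (memory_bits(64, 8) + memory_bits(64, 18) + memory_bits(32, 10))
    print(f"on-chip storage: {total_bits} bits")
```

The device footprints alone come to about 54 mm^2, essentially the quoted 9mm x 6mm. The die size therefore reads like an area estimate derived from the device count rather than a floorplan with routing overhead, though that interpretation is not stated on the slide.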

Page 46: Building Modern Integrated Systems: A Cross-cut Approach

Summary

Cross-layer modeling and design key to continued system performance scaling

- Fast design-space exploration

- Feedback to all layers of design hierarchy

Building early technology development platforms

- Feedback to device and circuit designers

- Accelerated adoption

EOS Platform designed for multi-project wafer runs

- 50 fJ/b receivers with uA sensitivities

- Record-high tuning efficiency with undercut ~ 25uW/K

- First modulation demonstrated in 45nm process

CLICKR Platform designed for multiple foundries and devices

- Energy-gains preserved for larger blocks

- Designs moving toward scaled devices and full VLSI systems

Page 47: Building Modern Integrated Systems: A Cross-cut Approach

Acknowledgments

Devices: Tsu-Jae King Liu, Rajeev Ram, Miloš Popović, Henry Smith

Architecture: Krste Asanović, Christopher Batten, Ajay Joshi

Circuits: Elad Alon, Dejan Marković

Students:

Devices - Jason Orcutt, Anatoly Khilo, Jie Sun, Cheryl Sorace, Eugen Zgraggen, Jaeseok Jeon, Rhesa Nathanael, Hei Kam

Circuits – Michael Georgas, Jonathan Leu, Ben Moss, Chen Sun, Fred Chen, Byungsub Kim, Hossein Fariborzi, Matthew Spencer, Chengcheng Wang, Kevin Dwan

Architecture - Yong-Jin Kwon, Scott Beamer, Chen Sun, Imran Shamim

DARPA MTO

Texas Instruments – Dennis Buss and Tom Bonifield

IBM and Trusted Foundry

Intel Corporation – Ian Young and Alex Kern
