DJF_Scaling and Optimization - IIT

8/10/2019 DJF_Scaling and Optimization - IIT

1/64


2/64

2

Time (yr)

l o g ( P e r

f o r m a n c e

)

1. Introduction1. Introduction

The End of Scaling


3/64

3

Time (yr)

l o g ( P e r

f o r m

a n c e

)

The End of Scaling


4/644

Miniaturization l o g

( S y s t e m

P e r f o r m a n c e

)

Stop when you get to the top.The End of Scaling is Optimization


5/645

Outline

1. Review limitations to Scaling2. Optimizing technology within constraints3. Optimization results4. Technology projections


6/64

6

1. Limitations to Scaling

1. Electrostatic constraints2. Quantum mechanical leakage currents

3. Discreteness of matter and energy4. Thermodynamic limitations5. Practical and environmental constraints on power

Basic idea of Scaling:

Adjust dimensions,

voltages, & doping toachieve smaller FETwith same electrostaticbehavior.


7/64

7

Electrostatic constraints on FET design

1. A good design should have A. High output resistance

B. High gainC. Low sensitivity to variationsD. High transconductanceE. High drain currentF. High speed

2. Must choose a compromise:Short, but not so short that 2Deffects kill A, B, C.

}}

Long channelbehavior

Short channelbehavior

Gate

S D


8/64

8

Quantum Mechanical Tunneling Leakage Currents

Currents increase exponentially as the barriers become thinner.

FET 'ON'

Gnd GndVdd

FET 'OFF'

Gnd VddGnd

Gate insulator tunnelingSubthreshold leakageDirect source-to-drain tunneling

Drain-to-body tunnelingSource

DrainChannel

e -e -

e -

Everythingbecomes

leaky.


9/64

9

Atomistic effects

Statistical variation in thenumber of dopants, N, varies

as N1/2

, causing increasing V Tuncertainty for small N.

The number of dopant atomsin the depletion layer of a

MOSFET has been scalingroughly as Leff 1.5 .

249,403,263 Si atoms 68,743 donors 13,042 acceptors

[D. J. Frank, et al., 1999 Symp. VLSI Tech.]


10/64

10

Thermodynamic limitations

IsubVT

= I0 ee(VG-VT)/kT

The Boltzmann distribution determines the subthreshold slopeand leakage current, V T, and diode leakage currents, too.

VT can only be scaled by reducing the temperature, which isnot acceptable for many applications.

Speed is very sensitive to V T/VDD ratio.Irreversible computation => All

switching energy is converted to heat.

All leakage currents and IR drops areirreversible => More heat.

Source

DrainChannel

e -


11/64

11

Practical and Environmental Issues

Power consumption and heat removal arelimited by practical considerations.

Low power applications are often batterypowered Many must be lightweight => power < ~few watts. Disposable batteries can cost >> $500/watt over life of

device. Rechargeables can cost > $50/watt over life of device.

Home electronics is limited to


12/64

12

2. Optimizing Technology within Constraints

Practicality imposes powerconstraints.

Electrostatics imposesgeometric constraints

Thermodynamics imposesvoltage constraints.

Quantum mechanics imposes

miniaturization constraints due totunneling.

Fixed architectural complexity+ Fixed power constraints+ Device physics= Existence of an optimal tech-nology with maximal performance.

Declining availabledynamic power

overwhelms speedimprovements of

scaling

leakage increases dueto tunneling effects

leakage power

dynamic power

Large Small

l o g ( P er f or m a

n c e )

Large Small

P ow er

Miniaturization Miniaturization


13/64

13

Background: Schematic organization of anoptimization program

Area Model

Wire Capacitance

Device Structure

IV Model Leakage Model

Delay

Total Power Adjust for Latencyof Long Paths

Constrainedoptimizer

newvalues:

improvedguess

Fixedparameters

Variables:initial guess

tolerance adjustments tolerance adjustments

Wiringstatistics

Leakage Power

ThermalModel

Goal: optimize devicetechnology tomaximize chip-level

performance, subjectto power constraints.


14/64

14

Models and ApproximationsSystem Assumptions

Processor chip is assumed to have a fixed number of cores, each witha specified number of logic gates.

Only the logic within the cores is considered within the optimizations.

The clock and memory aspects of the chip are assumed to scale inthe same way as the logic (delay, power, and area).

Core-to-core and core-to-memory communication is not dealt with.

LogicMemory

C l o c k

Fudge Treat indetail

Fudge

Repeaters

Treat these by simple scaling from the logic part.


15/64

15

How much area do the processor cores take?

100% to 25%, generally decreasing with generation:

Dothan, 140M FETs

2 cores, 1.72B FETs

Prescott. 125M FETs

Power4, 174M

FETs

Alpha 21264 ('96)15M FETs, L1 cache only

100%

40%

40%

70%

~25%


16/64


17/64

17

Optimization Approaches

1. Engineering approach:Maximize system performance, at fixed power.

Use total logic transition rate (LTR),LTR = N gates x activity factor/logic depth x 1/DelayRelatively little dependence on architectural details.

2. Business approach:Maximize Return on Investment (ROI).


18/64

18

FET ModelUsing a general temperature-dependent short-channel FET model inwhich VT, tD, and t ox are coupled, halo doping effects are included, andVT is set by the doping.

Modified alpha power model:

=

ekT V V

E E

L E FI ekT

ekT

t W

V I T GS C

s

CH C eff ox

I GS D /

F)(/

)(0

0

10W FETLg=28nm

1mW FETLg=45nm

Fermi-Diracintegral of order


19/64


20/64

20

Power Calculation

B BOX subVT DYN TOT P P P P P 2+++== DD L H CKT DYN V V V C N P D )(2

1l

),,,,(7.1 , LW

Gox DDT off DDCKT subVT Lt V V I V N P =

),,()( , = ox DDT ox DD LW oxcoreox t V V J V D A P

),()( 231

2 DD Max B B DD LW

oxcore B B V F J V D A P =

= 1CKT N LTR

Dl

Note thatcross-throughpower is not

included.

The powers are computed separately for logic and for repeaters. = mean delay for a single loaded logic gate

Dl

is activity factor divided by logic depth. Usually ~0.012 inrecent optimizations.


21/64


22/64

22

Repeater Model

0.01 0.1 1

Latency Penalty Factor

50

60

70

80

90

100

R e p e a

t e r S p a c i n g

( u m

)

0.2

0.3

0.4

0.50.6

0.7

0.8

0.9

1

R e p e a

t e r W i d t h ( u m

)Repeater spacingRepeater width

Long wires receive repeaters with a spacing that isoptimized.Long wire delay can be absorbed into pipeline depth, butthe latency causes inefficiency, so we use a latencypenalty factor:

P econ =10 W/cm 2

0.01 0.1 1 10 100

CPU Core Power (W)

0.001

0.01

0.1

1

R e p e a

t e r S p a c i n g

( c m

)

9S10S

11S 12S 13S


23/64

23

Local Variation Modeling Variation sources:

Signal Coupling noise Supply noise

Statistical doping variations LER gate length variations

Consequences modeled: Increased static power

combine 1 sigma of doping, length, and noise Critical path delay distribution

yield-based, using estimated critical pathdistribution,

and 1 sigma of doping and length, and worst casenoise.

Single stage functionality use worst case (~6 sigma) of doping and length,

no noise.


24/64

24

Accounting for variations A complete most-probable worst-case-vector methodology is

used to handle both local and global variability.

123

Two variable example:

MPWCvector

Murphyvector

Worst-case vectors:

Blue curves are contours ofconstant probability.

Red curves are contours offunction to be minimized.


25/64

25

Power vs Frequency Variation Windows

C h i p

P o w e r

( W )

Clock Frequency (GHz)

Optimizations for high-perf processor, 45 nm node technology, PDSOIPower-constrained optimization parameters: V DD, VTn, VTp, LG, repeater size and spacing.

Area constrained is also constrained, to 5.6 cm 2, by adjusting the widths.

VDD-10%

VDD+5%

230 W

Nominaldesign point:

VDD=1.058 V

This is the pointthat is calculated,optimized to, andreported.

These boxes are for

0.675 sigma, for 50%yield.

Power and delay aretreated as independent:25% of yield is lost for

each.The data shows 10%frequency variation persigma.


26/64

26

Impact of variability on performance Atomistic effects are leading to greater device variability. Increasing variability requires larger design margins. Designing for larger margins decreases performance.

0% 50% 100% 150% 200%

Relative Margin

0.7

0.8

0.9

1

1.1

1.2

1.3

R e l a t i v e

P e r f o r m

a n c e

P=0.01 P=1 P=50

Increased variabilityrequires:

Higher supply voltages

Less scaled FETs65nm node, dualprocessor core

S M d l A ti d


27/64

27

Summary: Models, Assumptions andApproximations

Power modelingDynamic switching energy plus static power mechanismsincluding sub-threshold current, gate oxide tunneling, andbody-to-drain band-to-band tunneling.

Device modelingBulk MOSFETs: V T, and depletion depth determined by thehalo doping, 2D effects are taken into account.

Gate length is fully optimized, not set by the technologynode.Circuit modeling

Delay is for FI=FO=2 or 3 NAND gates, based on model

from J.C. Eble's thesis [Ga.Tech. '98].Capacitance includes gate, parasitic, and wire parts (Rent'srule).Wire resistance includes temperature dependence and

surface scattering in small wires.


28/64


29/64

29

3. Optimization Results

General results Evaluating specific possible device

directions Increasing mobility High-k gate dielectric and metal gates 3D stacking Better heat sinks

Sub-ambient cooling Multi-processor tradeoffs


30/64


31/64

31

Optimization results

0.01 0.1 1 10 100

Total Chip Power (W)

0.50.60.70.80.9

11.11.21.31.41.51.6

E q u

i v . O

x y n i

t r i d e

T h i c k n e s s

( n m

)

90 nm 65 nm 45 nm 32 nm

0.01 0.1 1 10 100


0

20

40

60

80

100

120

G a t e

L e n g

t h ( n m

)90 nm 65 nm 45 nm 32 nm

Oxide Thickness vs Power

High-k, for 32nm

Oxynitride

Gate Length vs Power

(High-k case assumes 0.3nmbarrier layer, bandedge metal gate,HfO

2-like insulator characteristics.)

Dual core processor with

aggressive air cooling


32/64

32

Optimization results

Voltages vs Power

0.01 0.1 1 10 100


0.1

0.20.3

0.4

0.5

0.6

0.7

0.8

0.9

S u p p

l y a

n d T h r e s

h o

l d V o

t l a g e

( V )

Vdd,90VT,90

Vdd,65VT,65

Vdd,45VT,45

Vdd,32VT,32

Dual core processorwith aggressive aircooling

Supply voltages are lower for low power applications. High-k lowers V DD ~ 15% at the 45nm generation.

l ll


33/64

33

Optimal Power Allocation Fractions

Active power fraction:70% at low power to40% at high power.

45nm technology with microchannel

heat sink and water cooling.4 core chip.

1 3 10 30 100 300

Chip Power (W)

0%

20%

40%

60%

80%

100%

P o w e r

A l l o c a

t i o n

Oxide pwr, rptrsSubVT pwr, rptrsDyn. pwr, rptrs

Oxide pwr, logicSubVT pwr, logicDyn. pwr, logic

M bili d d


34/64

34

Mobility dependence

Enhanced mobility has greatest benefit at high power.

Even for large mobility enhancements, performance boost ismodest: 10-15%.

32nm technology8 core processor

Air cooled

45nm technologydual core processor water cooled

1 1.5 2 2.5 3Mobility Enhancement Factor

1

1.05

1.1

1.15

1.2

R e l a t

i v e


1 W chip 10 W chip 100W chip

1 1.5 2 2.5 3Mobility Enhancement Factor

1

1.02

1.04

1.06

1.08

1.1

1.12

R e l a t

i v e

P e r f o r m a n c e 1W 10W 100W

Metal-gate workfunction for high-k


35/64

35

eta gate wo u ct o o gand oxynitride

45nm node, dual core processorwith aggressive air cooling

0 0.1 0.2 0.3 0.4 0.5Workfunction offset from bandedge (ev)

0.5

0.60.70.80.9

11.11.21.31.4

1

P e r

f o r m a n c e r e

l a t i v e

t o p o

l y - S

i

5W, oxynitride50W, oxynitride

5W, high-k50W, high-k

3D ki


36/64

36

3D stacking

Multiple layers offer higher performancedue to shorter wires.

RED = 1 Layer , GREEN = 2 Layers

1 10 1000

5

10

15

100200

300400Relative Performance Mean FET Width (nm)

0 1 10 100

Chip Power (W)

Chip Area (cm 2)

1 10 10000.20.40.6

0.81

1.2

T o t

S i a

r e a

F o o

t p r i n t

Chip Power (W)

Mean Wire Length (um)

1 10 10005

10

1520

(4 core, 45nm node, water cooling.)

C li i ti i ti


37/64

37

Cooling scenario optimizations

Forced liquid cooling through microchannel fins may permit very highpower densities.Optimized (maximum) performance increases as the ~log of the power.

Optimized over 7 variables: L g, tox, Nd, , D rptr , , Vdd .

Low temperature case does not include refrigerator power.

8 core processor design

32nm technology

4 core processor design

45nm technology1 10 100 1000 10000


0

1

2

3

4

5

P e r

f o r m a n c e

( a r b u n

i t s )

-40C Liquid18C Water

Hi-Perf. Air Low-Cost Air

1 10 100 1000 10000Total Chip Power (W)

0

1

2

34

5

67

P e r

f o r m a n c e

( a r b u n

i t s )

-40C Liquid18C Water Hi-Perf. Air Low-Cost Air

M lti ti ti


38/64

38

Multiprocessor motivationThe energy / performance tradeoff is very steep at the high end.Lower power, more parallel processors potentially offer morecomputation for the same total power level.

3 0 0

1 0 0

3 0

1 0

3

1

0 . 3 0 . 1 0 . 0 3

0 . 0 1 0 . 0 0 3

0 . 0 0 1

1E+12 1E+13 1E+14 1E+15 1E+16

Total Logic Transistions / sec

0.1

1

10

L o a

d e d S w

i t c h i n g

E n e r g y

( f J )

3x

3x

4 core processors 4-processor chips withmicro-channel watercooling, optimizingeverything.

9 variables: t ox , Lg , N D,, V dd , w HP , S rpt ,

, x halo

Dependence on number of cores


39/64

39

Dependence on number of cores

1 10 100 1000


1

10

P e r

f o r m a n c e

( a r b u n

i t s )

2 cores 4 cores 8 cores 16 cores

Constant total number of transistors, divided equally among n cores:

4 Future Projections (22 11nm)


40/64

40

4. Future Projections (22 11nm)

Device options General results Technology projections Beyond 11nm?

Device Options


41/64

41

Device Options PDSOI

IBMs best understood technology. FinFET

Improved electrostatic control ofchannel offers shorter gates, lowervoltages, higher speed.Workfunction V T control. Notentirely planar.

ETSOI Somewhat improved electrostatic

control compared to PDSOI. More

compatible with conventionalplanar processing. WorkfunctionVT control.

Shallow Bulk

Shallow junctions, raised S/D.

GateGate

Drain

Source

FinFET

Buried OxideSubstrate

Source DrainGateETSOI

Bulk

Source Drain

Gate

The ETSOI and FinFETdevices simulated usingan in-house scalingmodel.

Comparable source/drainresistance and parasiticcapacitance modelswere implemented forETSOI, FinFET, and forshallow bulk MOSFETs.

General optimization results


42/64

42

performance vs power

Conditions: PDSOI, 4 core processor chip, constraining total chip power

Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths,repeater size and spacing.

1.E+07

1.E+08

1.E+09

1.E+10

0.01 0.1 1 10 100 1000

Total Chip Power(W)

F r e q u e n c y

( H z

)

22nm

16nm

11nm

As always, theeasiest way toincreaseperformance isto increase thepower.

Gate Length and Chip Area vs Power


43/64

43

Gate Length and Chip Area vs Power

0

5

10

15

20

25

30

35

40

45

50

0.01 0.1 1 10 100 1000

Total Chip Power(W)

G a

t e L e n g

t h ( n m

)

22nm

16nm

11nm

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.01 0.1 1 10 100 1000

Total Chip Power(W)

C h i p A r e a

( c m

2 )

22nm

16nm

11nm

22nm

11nm

11nm

22nm

Lower power requires longer gatelengths, to reduce variability.

Toxeqv : 0.6 0.82 nm

Device density increases foreach generation. Shrinking areaincreases power density, too.



Voltage and Energy vs Power


44/64

44

Voltage and Energy vs Power

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0.01 0.1 1 10 100 1000

Total Chip Power(W)

S u p p

l y V o

l t a g e

( V )

22nm

16nm

11nm

0

2

4

6

8

10

12

14

16

0.01 0.1 1 10 100 1000

Total Chip Power(W)

E n e r g y p e r

l o g

i c t r a n s i

t i o n ( f

J )

22nm

16nm

11nm22nm11nm

22nm

11nm

Optimal supply voltage can become quite low for low power constraints, leading to verylow energy use per logic transition.



Energy vs performance trade-off


45/64

45

Energy vs performance trade-off

0.1

1

10

1.E+07 1.E+08 1.E+09 1.E+10

Clock Frequency (Hz)

E n e r g y

p e r

l o g

i c t r a n s i t

i o n

( f J )

22nm

16nm

11nm



10x

4x22nm

11nm

Optimal On/Off Ratio


46/64

46

Optimal On/Off Ratio

0.00001

0.0001

0.001

0.01

0.1

0.1 1 10 100

On-current (A/cm)

O f f - c u r r e n

t ( A / c m

)

22 nm

16 nm

11 nm

PDSOI nFETs, currents measured at nominal process and bias conditions

10000:1

1000:1

L o w

b i a s c

o n d i t i o

n s

H i g h b i

a s c o

n d i t i o

n s



Performance at constant power density (PDSOI)


47/64

47

Performance at constant power density (PDSOI)

0

1

2

3

4

5

6

45nm 32nm 22nm 16nm 11nm

Technology Node

F r e q u e n c y

( G H z )

10W/cm2

25W/cm2

50W/cm2

Conditions: PDSOI, 4 core processor chip, constraining total chip power density

Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeatersize and spacing.

Performance increases at 32nm due to hi-k introduction, but then falls as straindiminishes and gate dielectric does not scale further.

00.10.2

0.3

0.40.50.60.7

0.80.9

1


Technology Node

S u p p

l y V o

l t a g e

( V )

10W/cm2

25 W/cm2

50W/cm2

0

1

2

3

4

5

6


Technology Node

C h i p A r e a

( c m

2 )

10 W/cm2

25 W/cm2

50 W/cm2

50 W/cm 2

10 W/cm 2

Performance at constant power density h l


48/64

48

comparing technologies

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5


Technology Node

F r e q u e n c y

( G H z )

Bulk

PDSOI

FinFET

ETSOI

25 W/cm 2

ETSOI and FinFET offer moderate performance advantage over PDSOI for 22nm nodeand beyond. The industry should transition to FinFET at 16nm to avoid performance loss.

Conditions: 4 core processor chip, constraining total chip power at 25 W/cm 2

Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeater sizeand spacing, fin height, sidewall thickness (Fin), Si thickness (ET).

Power density at constant performance


49/64

49

Power density at constant performance

0

10

20

30

40

50

60


Technology node

P o w e r

D e n s i

t y ( W / c m

2 )

PD 1e15

PD 1.5e15

Fin 1e15

Fin 1.5e15

0.1

1

10

45nm 32nm 22nm 16nm 11nmTechnology node

A r e a

( c m

2 ) PD 1e15PD 1.5e15

Fin 1e15Fin 1.5e15

Conditions: PDSOI and FinFET, 4 core processor chip, constraining total chipperformance

Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeatersize and spacing, fin height, sidewall thickness

~2.6 GHz

~3.9 GHz

Required power density drops at32nm due to transition to hi-k, thenrises due to decreasing strain andlack of scaling of gate insulator.

Transition to FinFET at 16nm to

obtain continued improvement.

PDSOI

FinFET

PDSOI

PDSOI

FinFET

FinFET

Performance at constant area and power


50/64

50

0

2

4

6

8

10

12

14

16


Technology node

R e

l a t i v e


PD 25W/cm2

PD 50W/cm2

Fin 25W/cm2

Fin 50W/cm2

p

0

10

20

30

40

50

60

70

80


Technology node

N u m

b e r o

f C o r e s

PD 25W/

PD 50W/

Fin 25W/

Fin 50W/

Area = 4 cm 2, power = 100 W, fixed.

Number of cores is adjusted to maintainconstant area. Chip performance isassumed linear with the number of cores.

Maximizing total chip performance.50 W/cm2

25 W/cm 2

50 W/cm 2 25 W/cm 2

FinFET

PDSOI

Conditions: PDSOI and FinFET, variable # core processor chip, constraining bothchip power and chip area (4 cm 2).

Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeatersize and spacing, fin height, sidewall thickness, and number of cores.

PDSOI

FinFET

cm2

cm2

cm2

cm2

Supply voltage considerations


51/64

51

pp y g

0.8

0.9

1

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

0.4 0.5 0.6 0.7 0.8 0.9

Supply Voltage (V)

R e l a

t i v e

P e r

f o r m a n c e

Vary Vdd

Vary actf

Vary Tol

Vary mu

Conditions: PDSOI, 4 core processor chip, constraining total chip power density

Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeatersize and spacing.

25 W/cm 2 VDD can be reduced byhigher activity, highermobility, and tightertolerances.

18%

6%

0.7X

0.7X

1.4X

1.4XOptimizing everythingexcept V DD.

Activity factor

Variability

Mobility

Optimizations vs gate length for FinFETs and ETSOI


52/64

52

p g g

00.10.20.30.40.50.60.7

0 20 40 60

Gate Length (nm)

S u p p

l y V o

l t a g e ,

V d d ( V

ETSOI

FinFET

01234

5678

0 20 40 60

Gate Length (nm)

S i l i c o n

T h i c k n e s s

( n m

)

ETSOI

FinFET

0

10

20

30

40

50

0 20 40 60

Gate Length (nm)

R e l .


ETSOIFinFET

0

0.02

0.04

0.06

0.08

0.1

0.12

0 20 40 60

Gate Length (nm)

D I B L ( V )

ETSOI

FinFET

Everything except gate length is optimized. The gate length is scanned.

25 W/cm 2 power density constraints.

Shorter gatelengths necessitate:

Higher DIBL

Higher V DD

Thinner t Si

Ultimately, lower

performance

11 nm node

DIBL Scaling enables VDD Scaling


53/64

53

1W

0

5

10

15

20

25

30

35

22nm 16nm 11nm

Technology Node

G a t e

L e n g

t h ( n m

)

Fin ET Bulk

0

5

10

15

20

25

30

35

22nm 16nm 11nm

Technology Node

G a

t e L e n g

t h ( n m

)

Fin ET Bulk

00.10.20.30.40.50.60.70.8

0 0.02 0.04 0.06 0.08DIBL (V/V)

S u p p

l y V o

l t a g

e ,

V D D ( V )

FinFET ETSOI Bulk

g V g

0

0.1

0.2

0.3

0.4

0.50.6

0 0.02 0.04 0.06DIBL (V/V)

S u p p l y

V o

l t a g e ,

V D D ( V ) FinFET ETSOI Bulk

25W/cm 2

2211

2211

Novel devices for 11nm and beyond


54/64

54

y

III-V FinFETs Higher mobility improves drive current

Tunnel FETs

Improved subthreshold slope enables low V DD and low energyoperation. To properly model this device, have to be able to calculate the tunneling

barrier shapes and band-edge alignments.

We are in the process of developing a compact model for TFETs for theoptimizer, but results are not yet available. As an interim measure, wecan alter the Boltzmann constant in the conventional FET model, to seethe impact of steeper subthreshold slope.

Carbon Nanotube Transistors Ballistic current flow in the channel should enable very high switchingspeeds for these devices.

A compact model for CNTs suitable for the optimizer is being presented

at IEDM this year, but results are not available yet.


55/64

Summary


56/64

56

y

CMOS scaling is limited by electrostatic, quantum mechanical,discreteness, thermodynamic and practical effects.

Optimization can and should be used to find the best design pointsin the midst of these various constraints. Example: low power needs somewhat less scaled devices.

Technology performance projections based on optimization forPDSOI, FinFETs, ETSOI, and Bulk MOSFETs show:

Density improvements should continue, as long as wiring densitycontinues to improve

Performance improvements are likely to be rather modest, even with aswitch to FinFETs for 16 and/or 11nm nodes.

Exploratory devices (CNTs and/or TFETs) may offer substantialperformance advantages, someday.

Acknowledgements


57/64

57

Wilfried Haensch Leland Chang Paul Solomon Steve Koester Lan Wei

Philip Wong Ghavam Shahidi

Mike Scheuermann Phillip Restle Omer Dokumaci Mary Wisniewski Steve Kosonocky

Yuan Taur Bob Dennard

Extra slides


58/64

58

Generalized heat sink model


59/64

59

Two level heat flow model: Flow in the silicon wafer Flow in the heat sink material

In each layer, the flow can be: 3D (spherical) for spots smaller

than thickness 2D (cylindrical) at distances

larger than the thickness In silicon layer, inhomogeneous

power dissipation is accountedfor, to estimate maximum junctiontemperature at hottest point.

Si wafer Heat sources

Interface

Interface to final coolant(e.g., air or water)

Si thermal sheet resistance of Si wafer

HS thermal sheet resistance of heat sink

R HS thermal contact resistance of heat sink

R Si thermal contact resistance of Si wafer

Heat spreader

(e.g., SiC or Cu)

T e m p e r a

t u r e

R i s e

( K )

Hot spot size (cm)

My model is red.

Kais data is blue.

T e m p e r a

t u r e

R i s e

( K )

Hot spot size (cm)

My model is red.

Kais data is blue.

This model is redFE model is blue

Comparisonof simplified

analyticmodel withdetailednumericalmodel.

III-V FinFETs


60/64

60

III-V FinFETs are modeled by increasing the mobility in the conventional model.

Increased mobility enables an improved energy/performance tradeoff byreducing the voltage needed for high performance designs.

0.1

1

10

1.E+13 1.E+14 1.E+15 1.E+16

Performance (transitions/sec)

E n e r g y /

t r a n s i

t i o n

( f J )

FinFET III-V 2x III-V 4x

1x

(=Si)

2x(~GaAs)

4x(~InGaAs)

Drive current multiplier:

0

0.2

0.4

0.6

0.8

1

1.2

1.4

0 2 4 6 8 10 12 14 16

Frequency (GHz)

S u p p

l y V o l t a g e

( V )

Fin 2x 4x


61/64

TFET Heterostuctures SiGe Heterostuctures offer loweffective bandgap, which improvesli


62/64

62

ON-state

Off-state

tunneling.

III-V materials offer lower effectivemasses and more heterojunctions,to further improve tunneling.

Nanowire Geometry Offers:

Optimum electrostatics with gateall around

New material combinations

Integration onto silicon possible

Scaling to quantum capacitancelimit

Improved I on

III-V H-TFETPlanar SiGe H-TFET

Si

p ++ SiGep-Si

Buried Oxide

n+ Si

polysource drain

gate

SiGe

Si

V g > 0 Ambipolar

devicesupressed

SiGe

Si

V g = 0

x

Ambipolardevicesupressed

High oncurrent

[Koester, Riel, Koswatta]

CNFET: Good or Bad?


63/64

63

Carbon Nanotube Field Effect Transistor (CNFET) 1D devices Better transport characteristics Worse parasitics Leakage, Variations, etc

CNFET: Good or Bad? The judgment highly depends on the application

Our approach is to build a optimizer for a full technology,with proper consideration of device properties and systemneeds.

Device modeling

Circuit performance benchmarking System application consideration

[Lan Wei, Stanford]

Wire Model


64/64

64

0.001 0.01 0.1 1 10

Total Power (W)

0.85

0.90

0.95

1.00

L T R w

i t h S c a

t t / L T R w

/ o S c a

t t .

50 100 150 200 250 300 350 400Junction Temperature (K)

0.5

1

1.5

2

2.5

3

R e l a t

i v e


10W,const R1W, const R0.1W, const R10W1W0.1W

Relative Performance change due to R(T)

Assumed constant 2:1 height to width wiring with equal linesand spaces. ( 0.062 k BEOL fF/um )

Performance loss due to scattering

++=

261lnnm70

),(10/)40(

K 300

T eW

T W

Consequences of wire resistancemodel:

DJF_Scaling and Optimization - IIT

Documents

Transcript of DJF_Scaling and Optimization - IIT