DJF_Scaling and Optimization - IIT

download DJF_Scaling and Optimization - IIT

of 64

Transcript of DJF_Scaling and Optimization - IIT

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    1/64

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    2/64

    2

    Time (yr)

    l o g ( P e r

    f o r m a n c e

    )

    1. Introduction1. Introduction

    The End of Scaling

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    3/64

    3

    Time (yr)

    l o g ( P e r

    f o r m

    a n c e

    )

    The End of Scaling

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    4/644

    Miniaturization l o g

    ( S y s t e m

    P e r f o r m a n c e

    )

    Stop when you get to the top.The End of Scaling is Optimization

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    5/645

    Outline

    1. Review limitations to Scaling2. Optimizing technology within constraints3. Optimization results4. Technology projections

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    6/64

    6

    1. Limitations to Scaling

    1. Electrostatic constraints2. Quantum mechanical leakage currents

    3. Discreteness of matter and energy4. Thermodynamic limitations5. Practical and environmental constraints on power

    Basic idea of Scaling:

    Adjust dimensions,

    voltages, & doping toachieve smaller FETwith same electrostaticbehavior.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    7/64

    7

    Electrostatic constraints on FET design

    1. A good design should have A. High output resistance

    B. High gainC. Low sensitivity to variationsD. High transconductanceE. High drain currentF. High speed

    2. Must choose a compromise:Short, but not so short that 2Deffects kill A, B, C.

    }}

    Long channelbehavior

    Short channelbehavior

    Gate

    S D

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    8/64

    8

    Quantum Mechanical Tunneling Leakage Currents

    Currents increase exponentially as the barriers become thinner.

    FET 'ON'

    Gnd GndVdd

    FET 'OFF'

    Gnd VddGnd

    Gate insulator tunnelingSubthreshold leakageDirect source-to-drain tunneling

    Drain-to-body tunnelingSource

    DrainChannel

    e -e -

    e -

    Everythingbecomes

    leaky.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    9/64

    9

    Atomistic effects

    Statistical variation in thenumber of dopants, N, varies

    as N1/2

    , causing increasing V Tuncertainty for small N.

    The number of dopant atomsin the depletion layer of a

    MOSFET has been scalingroughly as Leff 1.5 .

    249,403,263 Si atoms 68,743 donors 13,042 acceptors

    [D. J. Frank, et al., 1999 Symp. VLSI Tech.]

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    10/64

    10

    Thermodynamic limitations

    IsubVT

    = I0 ee(VG-VT)/kT

    The Boltzmann distribution determines the subthreshold slopeand leakage current, V T, and diode leakage currents, too.

    VT can only be scaled by reducing the temperature, which isnot acceptable for many applications.

    Speed is very sensitive to V T/VDD ratio.Irreversible computation => All

    switching energy is converted to heat.

    All leakage currents and IR drops areirreversible => More heat.

    Source

    DrainChannel

    e -

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    11/64

    11

    Practical and Environmental Issues

    Power consumption and heat removal arelimited by practical considerations.

    Low power applications are often batterypowered Many must be lightweight => power < ~few watts. Disposable batteries can cost >> $500/watt over life of

    device. Rechargeables can cost > $50/watt over life of device.

    Home electronics is limited to

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    12/64

    12

    2. Optimizing Technology within Constraints

    Practicality imposes powerconstraints.

    Electrostatics imposesgeometric constraints

    Thermodynamics imposesvoltage constraints.

    Quantum mechanics imposes

    miniaturization constraints due totunneling.

    Fixed architectural complexity+ Fixed power constraints+ Device physics= Existence of an optimal tech-nology with maximal performance.

    Declining availabledynamic power

    overwhelms speedimprovements of

    scaling

    leakage increases dueto tunneling effects

    leakage power

    dynamic power

    Large Small

    l o g ( P er f or m a

    n c e )

    Large Small

    P ow er

    Miniaturization Miniaturization

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    13/64

    13

    Background: Schematic organization of anoptimization program

    Area Model

    Wire Capacitance

    Device Structure

    IV Model Leakage Model

    Delay

    Total Power Adjust for Latencyof Long Paths

    Constrainedoptimizer

    newvalues:

    improvedguess

    Fixedparameters

    Variables:initial guess

    tolerance adjustments tolerance adjustments

    Wiringstatistics

    Leakage Power

    ThermalModel

    Goal: optimize devicetechnology tomaximize chip-level

    performance, subjectto power constraints.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    14/64

    14

    Models and ApproximationsSystem Assumptions

    Processor chip is assumed to have a fixed number of cores, each witha specified number of logic gates.

    Only the logic within the cores is considered within the optimizations.

    The clock and memory aspects of the chip are assumed to scale inthe same way as the logic (delay, power, and area).

    Core-to-core and core-to-memory communication is not dealt with.

    LogicMemory

    C l o c k

    Fudge Treat indetail

    Fudge

    Repeaters

    Treat these by simple scaling from the logic part.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    15/64

    15

    How much area do the processor cores take?

    100% to 25%, generally decreasing with generation:

    Dothan, 140M FETs

    2 cores, 1.72B FETs

    Prescott. 125M FETs

    Power4, 174M

    FETs

    Alpha 21264 ('96)15M FETs, L1 cache only

    100%

    40%

    40%

    70%

    ~25%

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    16/64

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    17/64

    17

    Optimization Approaches

    1. Engineering approach:Maximize system performance, at fixed power.

    Use total logic transition rate (LTR),LTR = N gates x activity factor/logic depth x 1/DelayRelatively little dependence on architectural details.

    2. Business approach:Maximize Return on Investment (ROI).

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    18/64

    18

    FET ModelUsing a general temperature-dependent short-channel FET model inwhich VT, tD, and t ox are coupled, halo doping effects are included, andVT is set by the doping.

    Modified alpha power model:

    =

    ekT V V

    E E

    L E FI ekT

    ekT

    t W

    V I T GS C

    s

    CH C eff ox

    I GS D /

    F)(/

    )(0

    0

    10W FETLg=28nm

    1mW FETLg=45nm

    Fermi-Diracintegral of order

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    19/64

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    20/64

    20

    Power Calculation

    B BOX subVT DYN TOT P P P P P 2+++== DD L H CKT DYN V V V C N P D )(2

    1l

    ),,,,(7.1 , LW

    Gox DDT off DDCKT subVT Lt V V I V N P =

    ),,()( , = ox DDT ox DD LW oxcoreox t V V J V D A P

    ),()( 231

    2 DD Max B B DD LW

    oxcore B B V F J V D A P =

    = 1CKT N LTR

    Dl

    Note thatcross-throughpower is not

    included.

    The powers are computed separately for logic and for repeaters. = mean delay for a single loaded logic gate

    Dl

    is activity factor divided by logic depth. Usually ~0.012 inrecent optimizations.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    21/64

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    22/64

    22

    Repeater Model

    0.01 0.1 1

    Latency Penalty Factor

    50

    60

    70

    80

    90

    100

    R e p e a

    t e r S p a c i n g

    ( u m

    )

    0.2

    0.3

    0.4

    0.50.6

    0.7

    0.8

    0.9

    1

    R e p e a

    t e r W i d t h ( u m

    )Repeater spacingRepeater width

    Long wires receive repeaters with a spacing that isoptimized.Long wire delay can be absorbed into pipeline depth, butthe latency causes inefficiency, so we use a latencypenalty factor:

    P econ =10 W/cm 2

    0.01 0.1 1 10 100

    CPU Core Power (W)

    0.001

    0.01

    0.1

    1

    R e p e a

    t e r S p a c i n g

    ( c m

    )

    9S10S

    11S 12S 13S

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    23/64

    23

    Local Variation Modeling Variation sources:

    Signal Coupling noise Supply noise

    Statistical doping variations LER gate length variations

    Consequences modeled: Increased static power

    combine 1 sigma of doping, length, and noise Critical path delay distribution

    yield-based, using estimated critical pathdistribution,

    and 1 sigma of doping and length, and worst casenoise.

    Single stage functionality use worst case (~6 sigma) of doping and length,

    no noise.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    24/64

    24

    Accounting for variations A complete most-probable worst-case-vector methodology is

    used to handle both local and global variability.

    123

    Two variable example:

    MPWCvector

    Murphyvector

    Worst-case vectors:

    Blue curves are contours ofconstant probability.

    Red curves are contours offunction to be minimized.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    25/64

    25

    Power vs Frequency Variation Windows

    C h i p

    P o w e r

    ( W )

    Clock Frequency (GHz)

    Optimizations for high-perf processor, 45 nm node technology, PDSOIPower-constrained optimization parameters: V DD, VTn, VTp, LG, repeater size and spacing.

    Area constrained is also constrained, to 5.6 cm 2, by adjusting the widths.

    VDD-10%

    VDD+5%

    230 W

    Nominaldesign point:

    VDD=1.058 V

    This is the pointthat is calculated,optimized to, andreported.

    These boxes are for

    0.675 sigma, for 50%yield.

    Power and delay aretreated as independent:25% of yield is lost for

    each.The data shows 10%frequency variation persigma.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    26/64

    26

    Impact of variability on performance Atomistic effects are leading to greater device variability. Increasing variability requires larger design margins. Designing for larger margins decreases performance.

    0% 50% 100% 150% 200%

    Relative Margin

    0.7

    0.8

    0.9

    1

    1.1

    1.2

    1.3

    R e l a t i v e

    P e r f o r m

    a n c e

    P=0.01 P=1 P=50

    Increased variabilityrequires:

    Higher supply voltages

    Less scaled FETs65nm node, dualprocessor core

    S M d l A ti d

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    27/64

    27

    Summary: Models, Assumptions andApproximations

    Power modelingDynamic switching energy plus static power mechanismsincluding sub-threshold current, gate oxide tunneling, andbody-to-drain band-to-band tunneling.

    Device modelingBulk MOSFETs: V T, and depletion depth determined by thehalo doping, 2D effects are taken into account.

    Gate length is fully optimized, not set by the technologynode.Circuit modeling

    Delay is for FI=FO=2 or 3 NAND gates, based on model

    from J.C. Eble's thesis [Ga.Tech. '98].Capacitance includes gate, parasitic, and wire parts (Rent'srule).Wire resistance includes temperature dependence and

    surface scattering in small wires.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    28/64

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    29/64

    29

    3. Optimization Results

    General results Evaluating specific possible device

    directions Increasing mobility High-k gate dielectric and metal gates 3D stacking Better heat sinks

    Sub-ambient cooling Multi-processor tradeoffs

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    30/64

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    31/64

    31

    Optimization results

    0.01 0.1 1 10 100

    Total Chip Power (W)

    0.50.60.70.80.9

    11.11.21.31.41.51.6

    E q u

    i v . O

    x y n i

    t r i d e

    T h i c k n e s s

    ( n m

    )

    90 nm 65 nm 45 nm 32 nm

    0.01 0.1 1 10 100

    Total Chip Power (W)

    0

    20

    40

    60

    80

    100

    120

    G a t e

    L e n g

    t h ( n m

    )90 nm 65 nm 45 nm 32 nm

    Oxide Thickness vs Power

    High-k, for 32nm

    Oxynitride

    Gate Length vs Power

    (High-k case assumes 0.3nmbarrier layer, bandedge metal gate,HfO

    2-like insulator characteristics.)

    Dual core processor with

    aggressive air cooling

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    32/64

    32

    Optimization results

    Voltages vs Power

    0.01 0.1 1 10 100

    Total Chip Power (W)

    0.1

    0.20.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    S u p p

    l y a

    n d T h r e s

    h o

    l d V o

    t l a g e

    ( V )

    Vdd,90VT,90

    Vdd,65VT,65

    Vdd,45VT,45

    Vdd,32VT,32

    Dual core processorwith aggressive aircooling

    Supply voltages are lower for low power applications. High-k lowers V DD ~ 15% at the 45nm generation.

    l ll

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    33/64

    33

    Optimal Power Allocation Fractions

    Active power fraction:70% at low power to40% at high power.

    45nm technology with microchannel

    heat sink and water cooling.4 core chip.

    1 3 10 30 100 300

    Chip Power (W)

    0%

    20%

    40%

    60%

    80%

    100%

    P o w e r

    A l l o c a

    t i o n

    Oxide pwr, rptrsSubVT pwr, rptrsDyn. pwr, rptrs

    Oxide pwr, logicSubVT pwr, logicDyn. pwr, logic

    M bili d d

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    34/64

    34

    Mobility dependence

    Enhanced mobility has greatest benefit at high power.

    Even for large mobility enhancements, performance boost ismodest: 10-15%.

    32nm technology8 core processor

    Air cooled

    45nm technologydual core processor water cooled

    1 1.5 2 2.5 3Mobility Enhancement Factor

    1

    1.05

    1.1

    1.15

    1.2

    R e l a t

    i v e

    P e r f o r m a n c e

    1 W chip 10 W chip 100W chip

    1 1.5 2 2.5 3Mobility Enhancement Factor

    1

    1.02

    1.04

    1.06

    1.08

    1.1

    1.12

    R e l a t

    i v e

    P e r f o r m a n c e 1W 10W 100W

    Metal-gate workfunction for high-k

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    35/64

    35

    eta gate wo u ct o o gand oxynitride

    45nm node, dual core processorwith aggressive air cooling

    0 0.1 0.2 0.3 0.4 0.5Workfunction offset from bandedge (ev)

    0.5

    0.60.70.80.9

    11.11.21.31.4

    1

    P e r

    f o r m a n c e r e

    l a t i v e

    t o p o

    l y - S

    i

    5W, oxynitride50W, oxynitride

    5W, high-k50W, high-k

    3D ki

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    36/64

    36

    3D stacking

    Multiple layers offer higher performancedue to shorter wires.

    RED = 1 Layer , GREEN = 2 Layers

    1 10 1000

    5

    10

    15

    100200

    300400Relative Performance Mean FET Width (nm)

    0 1 10 100

    Chip Power (W)

    Chip Area (cm 2)

    1 10 10000.20.40.6

    0.81

    1.2

    T o t

    S i a

    r e a

    F o o

    t p r i n t

    Chip Power (W)

    Mean Wire Length (um)

    1 10 10005

    10

    1520

    (4 core, 45nm node, water cooling.)

    C li i ti i ti

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    37/64

    37

    Cooling scenario optimizations

    Forced liquid cooling through microchannel fins may permit very highpower densities.Optimized (maximum) performance increases as the ~log of the power.

    Optimized over 7 variables: L g, tox, Nd, , D rptr , , Vdd .

    Low temperature case does not include refrigerator power.

    8 core processor design

    32nm technology

    4 core processor design

    45nm technology1 10 100 1000 10000

    Total Chip Power (W)

    0

    1

    2

    3

    4

    5

    P e r

    f o r m a n c e

    ( a r b u n

    i t s )

    -40C Liquid18C Water

    Hi-Perf. Air Low-Cost Air

    1 10 100 1000 10000Total Chip Power (W)

    0

    1

    2

    34

    5

    67

    P e r

    f o r m a n c e

    ( a r b u n

    i t s )

    -40C Liquid18C Water Hi-Perf. Air Low-Cost Air

    M lti ti ti

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    38/64

    38

    Multiprocessor motivationThe energy / performance tradeoff is very steep at the high end.Lower power, more parallel processors potentially offer morecomputation for the same total power level.

    3 0 0

    1 0 0

    3 0

    1 0

    3

    1

    0 . 3 0 . 1 0 . 0 3

    0 . 0 1 0 . 0 0 3

    0 . 0 0 1

    1E+12 1E+13 1E+14 1E+15 1E+16

    Total Logic Transistions / sec

    0.1

    1

    10

    L o a

    d e d S w

    i t c h i n g

    E n e r g y

    ( f J )

    3x

    3x

    4 core processors 4-processor chips withmicro-channel watercooling, optimizingeverything.

    9 variables: t ox , Lg , N D,, V dd , w HP , S rpt ,

    , x halo

    Dependence on number of cores

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    39/64

    39

    Dependence on number of cores

    1 10 100 1000

    Total Chip Power (W)

    1

    10

    P e r

    f o r m a n c e

    ( a r b u n

    i t s )

    2 cores 4 cores 8 cores 16 cores

    Constant total number of transistors, divided equally among n cores:

    4 Future Projections (22 11nm)

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    40/64

    40

    4. Future Projections (22 11nm)

    Device options General results Technology projections Beyond 11nm?

    Device Options

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    41/64

    41

    Device Options PDSOI

    IBMs best understood technology. FinFET

    Improved electrostatic control ofchannel offers shorter gates, lowervoltages, higher speed.Workfunction V T control. Notentirely planar.

    ETSOI Somewhat improved electrostatic

    control compared to PDSOI. More

    compatible with conventionalplanar processing. WorkfunctionVT control.

    Shallow Bulk

    Shallow junctions, raised S/D.

    GateGate

    Drain

    Source

    FinFET

    Buried OxideSubstrate

    Source DrainGateETSOI

    Bulk

    Source Drain

    Gate

    The ETSOI and FinFETdevices simulated usingan in-house scalingmodel.

    Comparable source/drainresistance and parasiticcapacitance modelswere implemented forETSOI, FinFET, and forshallow bulk MOSFETs.

    General optimization results

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    42/64

    42

    performance vs power

    Conditions: PDSOI, 4 core processor chip, constraining total chip power

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths,repeater size and spacing.

    1.E+07

    1.E+08

    1.E+09

    1.E+10

    0.01 0.1 1 10 100 1000

    Total Chip Power(W)

    F r e q u e n c y

    ( H z

    )

    22nm

    16nm

    11nm

    As always, theeasiest way toincreaseperformance isto increase thepower.

    Gate Length and Chip Area vs Power

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    43/64

    43

    Gate Length and Chip Area vs Power

    0

    5

    10

    15

    20

    25

    30

    35

    40

    45

    50

    0.01 0.1 1 10 100 1000

    Total Chip Power(W)

    G a

    t e L e n g

    t h ( n m

    )

    22nm

    16nm

    11nm

    0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    0.8

    0.01 0.1 1 10 100 1000

    Total Chip Power(W)

    C h i p A r e a

    ( c m

    2 )

    22nm

    16nm

    11nm

    22nm

    11nm

    11nm

    22nm

    Lower power requires longer gatelengths, to reduce variability.

    Toxeqv : 0.6 0.82 nm

    Device density increases foreach generation. Shrinking areaincreases power density, too.

    Conditions: PDSOI, 4 core processor chip, constraining total chip power

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths,repeater size and spacing.

    Voltage and Energy vs Power

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    44/64

    44

    Voltage and Energy vs Power

    0

    0.2

    0.4

    0.6

    0.8

    1

    1.2

    1.4

    0.01 0.1 1 10 100 1000

    Total Chip Power(W)

    S u p p

    l y V o

    l t a g e

    ( V )

    22nm

    16nm

    11nm

    0

    2

    4

    6

    8

    10

    12

    14

    16

    0.01 0.1 1 10 100 1000

    Total Chip Power(W)

    E n e r g y p e r

    l o g

    i c t r a n s i

    t i o n ( f

    J )

    22nm

    16nm

    11nm22nm11nm

    22nm

    11nm

    Optimal supply voltage can become quite low for low power constraints, leading to verylow energy use per logic transition.

    Conditions: PDSOI, 4 core processor chip, constraining total chip power

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths,repeater size and spacing.

    Energy vs performance trade-off

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    45/64

    45

    Energy vs performance trade-off

    0.1

    1

    10

    1.E+07 1.E+08 1.E+09 1.E+10

    Clock Frequency (Hz)

    E n e r g y

    p e r

    l o g

    i c t r a n s i t

    i o n

    ( f J )

    22nm

    16nm

    11nm

    Conditions: PDSOI, 4 core processor chip, constraining total chip power

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths,repeater size and spacing.

    10x

    4x22nm

    11nm

    Optimal On/Off Ratio

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    46/64

    46

    Optimal On/Off Ratio

    0.00001

    0.0001

    0.001

    0.01

    0.1

    0.1 1 10 100

    On-current (A/cm)

    O f f - c u r r e n

    t ( A / c m

    )

    22 nm

    16 nm

    11 nm

    PDSOI nFETs, currents measured at nominal process and bias conditions

    10000:1

    1000:1

    L o w

    b i a s c

    o n d i t i o

    n s

    H i g h b i

    a s c o

    n d i t i o

    n s

    Conditions: PDSOI, 4 core processor chip, constraining total chip power

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths,repeater size and spacing.

    Performance at constant power density (PDSOI)

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    47/64

    47

    Performance at constant power density (PDSOI)

    0

    1

    2

    3

    4

    5

    6

    45nm 32nm 22nm 16nm 11nm

    Technology Node

    F r e q u e n c y

    ( G H z )

    10W/cm2

    25W/cm2

    50W/cm2

    Conditions: PDSOI, 4 core processor chip, constraining total chip power density

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeatersize and spacing.

    Performance increases at 32nm due to hi-k introduction, but then falls as straindiminishes and gate dielectric does not scale further.

    00.10.2

    0.3

    0.40.50.60.7

    0.80.9

    1

    45nm 32nm 22nm 16nm 11nm

    Technology Node

    S u p p

    l y V o

    l t a g e

    ( V )

    10W/cm2

    25 W/cm2

    50W/cm2

    0

    1

    2

    3

    4

    5

    6

    45nm 32nm 22nm 16nm 11nm

    Technology Node

    C h i p A r e a

    ( c m

    2 )

    10 W/cm2

    25 W/cm2

    50 W/cm2

    50 W/cm 2

    10 W/cm 2

    Performance at constant power density h l

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    48/64

    48

    comparing technologies

    0

    0.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    45nm 32nm 22nm 16nm 11nm

    Technology Node

    F r e q u e n c y

    ( G H z )

    Bulk

    PDSOI

    FinFET

    ETSOI

    25 W/cm 2

    ETSOI and FinFET offer moderate performance advantage over PDSOI for 22nm nodeand beyond. The industry should transition to FinFET at 16nm to avoid performance loss.

    Conditions: 4 core processor chip, constraining total chip power at 25 W/cm 2

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeater sizeand spacing, fin height, sidewall thickness (Fin), Si thickness (ET).

    Power density at constant performance

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    49/64

    49

    Power density at constant performance

    0

    10

    20

    30

    40

    50

    60

    45nm 32nm 22nm 16nm 11nm

    Technology node

    P o w e r

    D e n s i

    t y ( W / c m

    2 )

    PD 1e15

    PD 1.5e15

    Fin 1e15

    Fin 1.5e15

    0.1

    1

    10

    45nm 32nm 22nm 16nm 11nmTechnology node

    A r e a

    ( c m

    2 ) PD 1e15PD 1.5e15

    Fin 1e15Fin 1.5e15

    Conditions: PDSOI and FinFET, 4 core processor chip, constraining total chipperformance

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeatersize and spacing, fin height, sidewall thickness

    ~2.6 GHz

    ~3.9 GHz

    Required power density drops at32nm due to transition to hi-k, thenrises due to decreasing strain andlack of scaling of gate insulator.

    Transition to FinFET at 16nm to

    obtain continued improvement.

    PDSOI

    FinFET

    PDSOI

    PDSOI

    FinFET

    FinFET

    Performance at constant area and power

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    50/64

    50

    0

    2

    4

    6

    8

    10

    12

    14

    16

    45nm 32nm 22nm 16nm 11nm

    Technology node

    R e

    l a t i v e

    P e r f o r m a n c e

    PD 25W/cm2

    PD 50W/cm2

    Fin 25W/cm2

    Fin 50W/cm2

    p

    0

    10

    20

    30

    40

    50

    60

    70

    80

    45nm 32nm 22nm 16nm 11nm

    Technology node

    N u m

    b e r o

    f C o r e s

    PD 25W/

    PD 50W/

    Fin 25W/

    Fin 50W/

    Area = 4 cm 2, power = 100 W, fixed.

    Number of cores is adjusted to maintainconstant area. Chip performance isassumed linear with the number of cores.

    Maximizing total chip performance.50 W/cm2

    25 W/cm 2

    50 W/cm 2 25 W/cm 2

    FinFET

    PDSOI

    Conditions: PDSOI and FinFET, variable # core processor chip, constraining bothchip power and chip area (4 cm 2).

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeatersize and spacing, fin height, sidewall thickness, and number of cores.

    PDSOI

    FinFET

    cm2

    cm2

    cm2

    cm2

    Supply voltage considerations

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    51/64

    51

    pp y g

    0.8

    0.9

    1

    1.1

    1.2

    1.3

    1.4

    1.5

    1.6

    1.7

    1.8

    0.4 0.5 0.6 0.7 0.8 0.9

    Supply Voltage (V)

    R e l a

    t i v e

    P e r

    f o r m a n c e

    Vary Vdd

    Vary actf

    Vary Tol

    Vary mu

    Conditions: PDSOI, 4 core processor chip, constraining total chip power density

    Optimizing: V DD, tox, dopings (for V Ts), L G, p:n width ratio, mean widths, repeatersize and spacing.

    25 W/cm 2 VDD can be reduced byhigher activity, highermobility, and tightertolerances.

    18%

    6%

    0.7X

    0.7X

    1.4X

    1.4XOptimizing everythingexcept V DD.

    Activity factor

    Variability

    Mobility

    Optimizations vs gate length for FinFETs and ETSOI

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    52/64

    52

    p g g

    00.10.20.30.40.50.60.7

    0 20 40 60

    Gate Length (nm)

    S u p p

    l y V o

    l t a g e ,

    V d d ( V

    ETSOI

    FinFET

    01234

    5678

    0 20 40 60

    Gate Length (nm)

    S i l i c o n

    T h i c k n e s s

    ( n m

    )

    ETSOI

    FinFET

    0

    10

    20

    30

    40

    50

    0 20 40 60

    Gate Length (nm)

    R e l .

    P e r f o r m a n c e

    ETSOIFinFET

    0

    0.02

    0.04

    0.06

    0.08

    0.1

    0.12

    0 20 40 60

    Gate Length (nm)

    D I B L ( V )

    ETSOI

    FinFET

    Everything except gate length is optimized. The gate length is scanned.

    25 W/cm 2 power density constraints.

    Shorter gatelengths necessitate:

    Higher DIBL

    Higher V DD

    Thinner t Si

    Ultimately, lower

    performance

    11 nm node

    DIBL Scaling enables VDD Scaling

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    53/64

    53

    1W

    0

    5

    10

    15

    20

    25

    30

    35

    22nm 16nm 11nm

    Technology Node

    G a t e

    L e n g

    t h ( n m

    )

    Fin ET Bulk

    0

    5

    10

    15

    20

    25

    30

    35

    22nm 16nm 11nm

    Technology Node

    G a

    t e L e n g

    t h ( n m

    )

    Fin ET Bulk

    00.10.20.30.40.50.60.70.8

    0 0.02 0.04 0.06 0.08DIBL (V/V)

    S u p p

    l y V o

    l t a g

    e ,

    V D D ( V )

    FinFET ETSOI Bulk

    g V g

    0

    0.1

    0.2

    0.3

    0.4

    0.50.6

    0 0.02 0.04 0.06DIBL (V/V)

    S u p p l y

    V o

    l t a g e ,

    V D D ( V ) FinFET ETSOI Bulk

    25W/cm 2

    2211

    2211

    Novel devices for 11nm and beyond

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    54/64

    54

    y

    III-V FinFETs Higher mobility improves drive current

    Tunnel FETs

    Improved subthreshold slope enables low V DD and low energyoperation. To properly model this device, have to be able to calculate the tunneling

    barrier shapes and band-edge alignments.

    We are in the process of developing a compact model for TFETs for theoptimizer, but results are not yet available. As an interim measure, wecan alter the Boltzmann constant in the conventional FET model, to seethe impact of steeper subthreshold slope.

    Carbon Nanotube Transistors Ballistic current flow in the channel should enable very high switchingspeeds for these devices.

    A compact model for CNTs suitable for the optimizer is being presented

    at IEDM this year, but results are not available yet.

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    55/64

    Summary

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    56/64

    56

    y

    CMOS scaling is limited by electrostatic, quantum mechanical,discreteness, thermodynamic and practical effects.

    Optimization can and should be used to find the best design pointsin the midst of these various constraints. Example: low power needs somewhat less scaled devices.

    Technology performance projections based on optimization forPDSOI, FinFETs, ETSOI, and Bulk MOSFETs show:

    Density improvements should continue, as long as wiring densitycontinues to improve

    Performance improvements are likely to be rather modest, even with aswitch to FinFETs for 16 and/or 11nm nodes.

    Exploratory devices (CNTs and/or TFETs) may offer substantialperformance advantages, someday.

    Acknowledgements

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    57/64

    57

    Wilfried Haensch Leland Chang Paul Solomon Steve Koester Lan Wei

    Philip Wong Ghavam Shahidi

    Mike Scheuermann Phillip Restle Omer Dokumaci Mary Wisniewski Steve Kosonocky

    Yuan Taur Bob Dennard

    Extra slides

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    58/64

    58

    Generalized heat sink model

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    59/64

    59

    Two level heat flow model: Flow in the silicon wafer Flow in the heat sink material

    In each layer, the flow can be: 3D (spherical) for spots smaller

    than thickness 2D (cylindrical) at distances

    larger than the thickness In silicon layer, inhomogeneous

    power dissipation is accountedfor, to estimate maximum junctiontemperature at hottest point.

    Si wafer Heat sources

    Interface

    Interface to final coolant(e.g., air or water)

    Si thermal sheet resistance of Si wafer

    HS thermal sheet resistance of heat sink

    R HS thermal contact resistance of heat sink

    R Si thermal contact resistance of Si wafer

    Heat spreader

    (e.g., SiC or Cu)

    T e m p e r a

    t u r e

    R i s e

    ( K )

    Hot spot size (cm)

    My model is red.

    Kais data is blue.

    T e m p e r a

    t u r e

    R i s e

    ( K )

    Hot spot size (cm)

    My model is red.

    Kais data is blue.

    This model is redFE model is blue

    Comparisonof simplified

    analyticmodel withdetailednumericalmodel.

    III-V FinFETs

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    60/64

    60

    III-V FinFETs are modeled by increasing the mobility in the conventional model.

    Increased mobility enables an improved energy/performance tradeoff byreducing the voltage needed for high performance designs.

    0.1

    1

    10

    1.E+13 1.E+14 1.E+15 1.E+16

    Performance (transitions/sec)

    E n e r g y /

    t r a n s i

    t i o n

    ( f J )

    FinFET III-V 2x III-V 4x

    1x

    (=Si)

    2x(~GaAs)

    4x(~InGaAs)

    Drive current multiplier:

    0

    0.2

    0.4

    0.6

    0.8

    1

    1.2

    1.4

    0 2 4 6 8 10 12 14 16

    Frequency (GHz)

    S u p p

    l y V o l t a g e

    ( V )

    Fin 2x 4x

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    61/64

    TFET Heterostuctures SiGe Heterostuctures offer loweffective bandgap, which improvesli

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    62/64

    62

    ON-state

    Off-state

    tunneling.

    III-V materials offer lower effectivemasses and more heterojunctions,to further improve tunneling.

    Nanowire Geometry Offers:

    Optimum electrostatics with gateall around

    New material combinations

    Integration onto silicon possible

    Scaling to quantum capacitancelimit

    Improved I on

    III-V H-TFETPlanar SiGe H-TFET

    Si

    p ++ SiGep-Si

    Buried Oxide

    n+ Si

    polysource drain

    gate

    SiGe

    Si

    V g > 0 Ambipolar

    devicesupressed

    SiGe

    Si

    V g = 0

    x

    Ambipolardevicesupressed

    High oncurrent

    [Koester, Riel, Koswatta]

    CNFET: Good or Bad?

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    63/64

    63

    Carbon Nanotube Field Effect Transistor (CNFET) 1D devices Better transport characteristics Worse parasitics Leakage, Variations, etc

    CNFET: Good or Bad? The judgment highly depends on the application

    Our approach is to build a optimizer for a full technology,with proper consideration of device properties and systemneeds.

    Device modeling

    Circuit performance benchmarking System application consideration

    [Lan Wei, Stanford]

    Wire Model

  • 8/10/2019 DJF_Scaling and Optimization - IIT

    64/64

    64

    0.001 0.01 0.1 1 10

    Total Power (W)

    0.85

    0.90

    0.95

    1.00

    L T R w

    i t h S c a

    t t / L T R w

    / o S c a

    t t .

    50 100 150 200 250 300 350 400Junction Temperature (K)

    0.5

    1

    1.5

    2

    2.5

    3

    R e l a t

    i v e

    P e r f o r m a n c e

    10W,const R1W, const R0.1W, const R10W1W0.1W

    Relative Performance change due to R(T)

    Assumed constant 2:1 height to width wiring with equal linesand spaces. ( 0.062 k BEOL fF/um )

    Performance loss due to scattering

    ++=

    261lnnm70

    ),(10/)40(

    K 300

    T eW

    T W

    Consequences of wire resistancemodel: