DJF_Scaling and Optimization - IIT
8/10/2019 DJF_Scaling and Optimization - IIT
1. Introduction
The End of Scaling
[Figures: log(Performance) vs. Time (yr)]
The End of Scaling is Optimization
[Figure: log(System Performance) vs. Miniaturization]
Stop when you get to the top.
Outline
1. Review limitations to scaling
2. Optimizing technology within constraints
3. Optimization results
4. Technology projections
1. Limitations to Scaling
1. Electrostatic constraints
2. Quantum mechanical leakage currents
3. Discreteness of matter and energy
4. Thermodynamic limitations
5. Practical and environmental constraints on power
Basic idea of scaling: adjust dimensions, voltages, and doping to achieve a smaller FET with the same electrostatic behavior.
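A minimal sketch of this idea, assuming the classical constant-field scaling rules (lengths and voltages divided by a factor kappa, doping multiplied by kappa). The slide does not spell these rules out, and the class, function names, and example values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fet:
    gate_length_nm: float
    oxide_thickness_nm: float
    supply_voltage_v: float
    doping_cm3: float


def constant_field_scale(fet: Fet, kappa: float) -> Fet:
    """Assumed constant-field rules: lengths and voltages / kappa, doping * kappa."""
    return Fet(
        gate_length_nm=fet.gate_length_nm / kappa,
        oxide_thickness_nm=fet.oxide_thickness_nm / kappa,
        supply_voltage_v=fet.supply_voltage_v / kappa,
        doping_cm3=fet.doping_cm3 * kappa,
    )


def oxide_field(fet: Fet) -> float:
    """Crude vertical-field proxy in V/nm; it should not change under this scaling."""
    return fet.supply_voltage_v / fet.oxide_thickness_nm


# Hypothetical starting device, scaled by kappa = 2.
old = Fet(gate_length_nm=90.0, oxide_thickness_nm=2.0,
          supply_voltage_v=1.2, doping_cm3=1e18)
new = constant_field_scale(old, kappa=2.0)
```

The field proxy is the same before and after scaling, which is the sense in which the smaller FET keeps the same electrostatic behavior.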
Electrostatic constraints on FET design
1. A good design should have
  A. High output resistance
  B. High gain
  C. Low sensitivity to variations
  D. High transconductance
  E. High drain current
  F. High speed
2. Must choose a compromise: short, but not so short that 2D effects kill A, B, C.
[Figure: FET cross-section with gate, source (S), and drain (D); braces on the list above labeled "Long channel behavior" and "Short channel behavior"]
Quantum Mechanical Tunneling Leakage Currents
Currents increase exponentially as the barriers become thinner.
Leakage mechanisms shown:
  Gate insulator tunneling
  Subthreshold leakage
  Direct source-to-drain tunneling
  Drain-to-body tunneling
[Figure: source/channel/drain band diagrams with electron (e-) leakage paths, for FET 'ON' (Gnd, Vdd, Gnd) and FET 'OFF' (Gnd, Gnd, Vdd) bias conditions]
Everything becomes leaky.
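The exponential dependence on barrier thickness can be illustrated with a textbook WKB estimate for a rectangular barrier, T ~ exp(-2*kappa*d) with kappa = sqrt(2*m*phi)/hbar. This is only a sketch: the barrier height (3 eV) and effective mass ratio (0.4) are hypothetical illustration values, not parameters taken from the slides.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
M_E = 9.1093837015e-31   # electron rest mass, kg
Q = 1.602176634e-19      # elementary charge, C


def wkb_transmission(thickness_nm: float,
                     barrier_ev: float = 3.0,
                     mass_ratio: float = 0.4) -> float:
    """WKB tunneling probability through a rectangular barrier (assumed model)."""
    kappa = math.sqrt(2.0 * mass_ratio * M_E * barrier_ev * Q) / HBAR  # 1/m
    return math.exp(-2.0 * kappa * thickness_nm * 1e-9)


t_thick = wkb_transmission(2.0)  # 2 nm barrier
t_thin = wkb_transmission(1.0)   # 1 nm barrier
```

With these assumed values, halving the barrier thickness raises the transmission by more than four orders of magnitude.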
Atomistic effects
Statistical variation in the number of dopants, $N$, varies as $N^{1/2}$, causing increasing $V_T$ uncertainty for small $N$.
The number of dopant atoms in the depletion layer of a MOSFET has been scaling roughly as $L_{eff}^{1.5}$.
[Figure: atomistic device model containing 249,403,263 Si atoms, 68,743 donors, and 13,042 acceptors]
[D. J. Frank, et al., 1999 Symp. VLSI Tech.]
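A small sketch of the two scaling statements above: the relative dopant-number fluctuation sqrt(N)/N grows as N shrinks, and N itself is taken to scale as Leff^1.5. The reference dopant count and gate lengths are hypothetical illustration values.

```python
import math


def relative_dopant_fluctuation(n_dopants: float) -> float:
    """Relative spread sqrt(N)/N in the dopant count."""
    return math.sqrt(n_dopants) / n_dopants


def scaled_dopant_count(n_ref: float, l_ref_nm: float, l_nm: float,
                        exponent: float = 1.5) -> float:
    """Dopant count assuming N scales as Leff**exponent from a reference device."""
    return n_ref * (l_nm / l_ref_nm) ** exponent


# Hypothetical reference: 1000 dopants in the depletion layer at Leff = 100 nm.
n_small = scaled_dopant_count(n_ref=1000.0, l_ref_nm=100.0, l_nm=25.0)
spread_ref = relative_dopant_fluctuation(1000.0)
spread_small = relative_dopant_fluctuation(n_small)
```

Under these assumptions a 4x shorter device has 8x fewer dopants, and its relative fluctuation is correspondingly larger.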
Thermodynamic limitations
$I_{subVT} = I_0 \exp\left[ e (V_G - V_T) / kT \right]$
The Boltzmann distribution determines the subthreshold slope and leakage current, VT, and diode leakage currents, too.
VT can only be scaled by reducing the temperature, which is not acceptable for many applications.
Speed is very sensitive to the VT/VDD ratio.
Irreversible computation => All switching energy is converted to heat.
All leakage currents and IR drops are irreversible => More heat.
[Figure: source/channel/drain band diagram with an electron (e-) crossing the barrier]
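As a worked illustration of why only temperature can change the subthreshold behavior, the sketch below evaluates the ideal subthreshold swing (kT/e) ln 10 implied by the exponential above. It assumes the ideal Boltzmann limit with no additional ideality factor; the function name is hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
Q = 1.602176634e-19  # elementary charge, C


def ideal_subthreshold_swing_mv(temperature_k: float) -> float:
    """Gate voltage (mV) per decade of subthreshold current in the ideal limit."""
    return 1e3 * (K_B * temperature_k / Q) * math.log(10.0)


swing_300k = ideal_subthreshold_swing_mv(300.0)  # room temperature
swing_77k = ideal_subthreshold_swing_mv(77.0)    # liquid nitrogen temperature
```

The swing is just under 60 mV per decade at 300 K and falls in proportion to the absolute temperature.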
Practical and Environmental Issues
Power consumption and heat removal are limited by practical considerations.
Low power applications are often battery powered.
  Many must be lightweight => power < ~few watts.
  Disposable batteries can cost >> $500/watt over life of device.
  Rechargeables can cost > $50/watt over life of device.
Home electronics is limited to
2. Optimizing Technology within Constraints
Practicality imposes power constraints.
Electrostatics imposes geometric constraints.
Thermodynamics imposes voltage constraints.
Quantum mechanics imposes miniaturization constraints due to tunneling.
Fixed architectural complexity + fixed power constraints + device physics = existence of an optimal technology with maximal performance.
[Figure, two panels plotted against miniaturization (large to small): log(Performance), and Power (dynamic power and leakage power). Annotations: "Declining available dynamic power overwhelms speed improvements of scaling"; "leakage increases due to tunneling effects".]
Background: Schematic organization of an optimization program
Goal: optimize device technology to maximize chip-level performance, subject to power constraints.
[Flowchart blocks: Fixed parameters; Variables (initial guess); Device Structure; Area Model; Wiring statistics; Wire Capacitance; IV Model and Leakage Model (each with tolerance adjustments); Delay; Adjust for Latency of Long Paths; Leakage Power; Total Power; Thermal Model; Constrained optimizer, which returns new values (improved guess).]
Models and Approximations: System Assumptions
Processor chip is assumed to have a fixed number of cores, each with a specified number of logic gates.
Only the logic within the cores is considered within the optimizations.
The clock and memory aspects of the chip are assumed to scale in the same way as the logic (delay, power, and area).
Core-to-core and core-to-memory communication is not dealt with.
[Figure: chip blocks labeled Logic, Repeaters, Memory, and Clock, marked either "Treat in detail" or "Fudge". Fudge: treat these by simple scaling from the logic part.]
How much area do the processor cores take?
100% to 25%, generally decreasing with generation:
[Figure: die photos with core-area fractions. Chips shown: Alpha 21264 ('96), 15M FETs, L1 cache only; Power4, 174M FETs; Prescott, 125M FETs; Dothan, 140M FETs; 2 cores, 1.72B FETs. Fractions shown: 100%, 40%, 40%, 70%, ~25%.]
Optimization Approaches
1. Engineering approach: Maximize system performance, at fixed power.
  Use total logic transition rate (LTR):
  $\mathrm{LTR} = N_{gates} \times \dfrac{\text{activity factor}}{\text{logic depth}} \times \dfrac{1}{\text{Delay}}$
  Relatively little dependence on architectural details.
2. Business approach: Maximize Return on Investment (ROI).
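The LTR figure of merit from the engineering approach can be evaluated directly. The sketch below uses hypothetical inputs (10 million gates, activity factor 0.12, logic depth 10, 20 ps gate delay); only the formula itself comes from the slide.

```python
def logic_transition_rate(n_gates: float, activity_factor: float,
                          logic_depth: float, delay_s: float) -> float:
    """LTR = N_gates x (activity factor / logic depth) x (1 / delay), in transitions/s."""
    return n_gates * (activity_factor / logic_depth) / delay_s


# Hypothetical chip: the ratio activity_factor / logic_depth is 0.012 here.
ltr = logic_transition_rate(n_gates=1e7, activity_factor=0.12,
                            logic_depth=10.0, delay_s=20e-12)
```

Because only the gate count, the activity-to-depth ratio, and the mean gate delay enter, the metric can be compared across technologies without a detailed architecture model.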
FET Model
Using a general temperature-dependent short-channel FET model in which VT, tD, and tox are coupled, halo doping effects are included, and VT is set by the doping.
Modified alpha power model:
[Equation garbled in extraction: drain current $I_D$ as a function of $V_{GS}$. Surviving symbols include $W$, $t_{ox}$, $L_{eff}$, $E_C$, $kT/e$, $V_{GS}$, $V_T$, and a Fermi-Dirac integral $F$.]
[Figure: model curves labeled "10W FET, Lg=28nm" and "1mW FET, Lg=45nm"]
Fermi-Dirac integral of order
Power Calculation
$P_{TOT} = P_{DYN} + P_{subVT} + P_{OX} + P_{B2B}$
[The expressions for the individual terms are garbled in extraction. Recoverable dependencies: $P_{DYN}$ on $N_{CKT}$, a capacitance $C$, and the voltages $V_H$, $V_L$, $V_{DD}$; $P_{subVT}$ on $N_{CKT}$, $V_{DD}$, and $I_{off}(V_T, V_{DD}, t_{ox}, L_G)$ averaged over $L$ and $W$; $P_{OX}$ on the core area, $V_{DD}$, and $J_{ox}(V_T, V_{DD}, t_{ox})$; $P_{B2B}$ on the core area, $V_{DD}$, and $J_{B2B}(F_{Max}, V_{DD})$; LTR on $N_{CKT}$, the activity ratio, and the gate delay.]
Note that cross-through power is not included.
The powers are computed separately for logic and for repeaters.
The delay is the mean delay for a single loaded logic gate.
The activity ratio is the activity factor divided by logic depth. Usually ~0.012 in recent optimizations.
Repeater Model
Long wires receive repeaters with a spacing that is optimized.
Long wire delay can be absorbed into pipeline depth, but the latency causes inefficiency, so we use a latency penalty factor:
[Figure: Repeater Spacing (um) and Repeater Width (um) vs. Latency Penalty Factor (0.01 to 1). Annotation: P_econ = 10 W/cm^2.]
[Figure: Repeater Spacing (cm) vs. CPU Core Power (W), 0.01 to 100, for curves labeled 9S through 13S.]
Local Variation Modeling
Variation sources:
  Signal coupling noise
  Supply noise
  Statistical doping variations
  LER gate length variations
Consequences modeled:
  Increased static power: combine 1 sigma of doping, length, and noise.
  Critical path delay distribution: yield-based, using estimated critical path distribution, 1 sigma of doping and length, and worst case noise.
  Single stage functionality: use worst case (~6 sigma) of doping and length, no noise.
Accounting for variations
A complete most-probable worst-case-vector (MPWC) methodology is used to handle both local and global variability.
[Figure: two variable example, with contours marked 1, 2, 3. Blue curves are contours of constant probability. Red curves are contours of the function to be minimized. Worst-case vectors shown: MPWC vector and Murphy vector.]
Power vs Frequency Variation Windows
Optimizations for high-perf processor, 45 nm node technology, PDSOI.
Power-constrained optimization parameters: VDD, VTn, VTp, LG, repeater size and spacing.
Area is also constrained, to 5.6 cm^2, by adjusting the widths.
[Figure: Chip Power (W) vs. Clock Frequency (GHz), with variation boxes for VDD-10% and VDD+5% and a 230 W level marked.]
Nominal design point: VDD = 1.058 V. This is the point that is calculated, optimized to, and reported.
These boxes are for 0.675 sigma, for 50% yield.
Power and delay are treated as independent: 25% of yield is lost for each.
The data shows 10% frequency variation per sigma.
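The 0.675 sigma and 25% figures can be checked numerically. The sketch below assumes Gaussian statistics and a one-sided cut on each of power and delay; neither assumption is stated on the slide, and the variable names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


pass_fraction = normal_cdf(0.675)         # fraction passing one one-sided cut
lost_each = 1.0 - pass_fraction           # yield lost to power, or to delay
yield_subtractive = 1.0 - 2.0 * lost_each # subtracting both losses
yield_product = pass_fraction ** 2        # multiplying independent pass rates
freq_window = 0.10 * 0.675                # fractional frequency window at 10% per sigma
```

A one-sided 0.675 sigma cut passes about 75% of chips, so each criterion loses about 25%. Subtracting the two losses gives the quoted 50%; multiplying the two pass fractions as strictly independent events would give about 56%.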
Impact of variability on performance
Atomistic effects are leading to greater device variability.
Increasing variability requires larger design margins.
Designing for larger margins decreases performance.
[Figure: Relative Performance vs. Relative Margin (0% to 200%), for P=0.01, P=1, and P=50. 65nm node, dual processor core.]
Increased variability requires:
  Higher supply voltages
  Less scaled FETs
Summary: Models, Assumptions and Approximations
Power modeling
  Dynamic switching energy plus static power mechanisms including sub-threshold current, gate oxide tunneling, and body-to-drain band-to-band tunneling.
Device modeling
  Bulk MOSFETs: VT and depletion depth determined by the halo doping; 2D effects are taken into account.
  Gate length is fully optimized, not set by the technology node.
Circuit modeling
  Delay is for FI=FO=2 or 3 NAND gates, based on model from J.C. Eble's thesis [Ga.Tech. '98].
  Capacitance includes gate, parasitic, and wire parts (Rent's rule).
  Wire resistance includes temperature dependence and surface scattering in small wires.
3. Optimization Results
General results
Evaluating specific possible device directions
  Increasing mobility
  High-k gate dielectric and metal gates
  3D stacking
  Better heat sinks
  Sub-ambient cooling
Multi-processor tradeoffs
Optimization results
Dual core processor with aggressive air cooling.
[Figure: Oxide Thickness vs Power. Equiv. Oxynitride Thickness (nm) vs. Total Chip Power (W), 0.01 to 100, for the 90 nm, 65 nm, 45 nm, and 32 nm nodes; oxynitride curves plus a high-k curve for 32nm.]
[Figure: Gate Length vs Power. Gate Length (nm) vs. Total Chip Power (W), 0.01 to 100, for the 90 nm, 65 nm, 45 nm, and 32 nm nodes.]
(High-k case assumes 0.3nm barrier layer, bandedge metal gate, HfO2-like insulator characteristics.)
Optimization results
[Figure: Voltages vs Power. Supply and Threshold Voltage (V) vs. Total Chip Power (W), 0.01 to 100, with Vdd and VT curves for the 90, 65, 45, and 32 nm nodes. Dual core processor with aggressive air cooling.]
Supply voltages are lower for low power applications.
High-k lowers VDD ~15% at the 45nm generation.
Optimal Power Allocation Fractions
Active power fraction: 70% at low power to 40% at high power.
45nm technology with microchannel heat sink and water cooling. 4 core chip.
[Figure: Power Allocation (0% to 100%) vs. Chip Power (W), 1 to 300, split into oxide, subVT, and dynamic power, each for logic and for repeaters.]
Mobility dependence
Enhanced mobility has greatest benefit at high power.
Even for large mobility enhancements, performance boost is modest: 10-15%.
[Figure, two panels: Relative Performance vs. Mobility Enhancement Factor (1 to 3) for 1 W, 10 W, and 100 W chips. Panel labels: 32nm technology, 8 core processor, air cooled; 45nm technology, dual core processor, water cooled.]
Metal-gate workfunction for high-k and oxynitride
45nm node, dual core processor with aggressive air cooling.
[Figure: Performance relative to poly-Si vs. Workfunction offset from bandedge (eV), 0 to 0.5, for 5W and 50W chips with oxynitride and with high-k.]
3D stacking
Multiple layers offer higher performance due to shorter wires.
RED = 1 Layer, GREEN = 2 Layers
[Figure, four panels vs. Chip Power (W): Relative Performance; Mean FET Width (nm); Chip Area (cm^2), showing total Si area and footprint; Mean Wire Length (um).]
(4 core, 45nm node, water cooling.)
Cooling scenario optimizations
Forced liquid cooling through microchannel fins may permit very high power densities.
Optimized (maximum) performance increases as the ~log of the power.
Optimized over 7 variables, including Lg, tox, Nd, Drptr, and Vdd.
Low temperature case does not include refrigerator power.
[Figure, two panels: Performance (arb. units) vs. Total Chip Power (W), 1 to 10000, for -40C Liquid, 18C Water, Hi-Perf. Air, and Low-Cost Air. Panel labels: 8 core processor design, 32nm technology; 4 core processor design, 45nm technology.]
Multiprocessor motivation
The energy / performance tradeoff is very steep at the high end.
Lower power, more parallel processors potentially offer more computation for the same total power level.
[Figure: Loaded Switching Energy (fJ) vs. Total Logic Transitions / sec (1E+12 to 1E+16), with points labeled from 300 down to 0.001 and two "3x" annotations.]
4 core processors (4-processor chips) with micro-channel water cooling, optimizing everything.
9 variables, including tox, Lg, ND, Vdd, wHP, Srpt, and xhalo.
Dependence on number of cores
Constant total number of transistors, divided equally among n cores:
[Figure: Performance (arb. units) vs. Total Chip Power (W), 1 to 1000, for 2, 4, 8, and 16 cores.]
4. Future Projections (22 to 11nm)
  Device options
  General results
  Technology projections
  Beyond 11nm?
Device Options
PDSOI
  IBM's best understood technology.
FinFET
  Improved electrostatic control of channel offers shorter gates, lower voltages, higher speed. Workfunction VT control. Not entirely planar.
ETSOI
  Somewhat improved electrostatic control compared to PDSOI. More compatible with conventional planar processing. Workfunction VT control.
Shallow Bulk
  Shallow junctions, raised S/D.
[Figure: device cross-sections. FinFET: source, drain, two gates. ETSOI: source, drain, gate, buried oxide, substrate. Bulk: source, drain, gate.]
The ETSOI and FinFET devices were simulated using an in-house scaling model.
Comparable source/drain resistance and parasitic capacitance models were implemented for ETSOI, FinFET, and shallow bulk MOSFETs.
General optimization results: performance vs power
Conditions: PDSOI, 4 core processor chip, constraining total chip power.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing.
[Figure: Frequency (Hz), 1.E+07 to 1.E+10, vs. Total Chip Power (W), 0.01 to 1000, for the 22nm, 16nm, and 11nm nodes.]
As always, the easiest way to increase performance is to increase the power.
Gate Length and Chip Area vs Power
[Figure: Gate Length (nm), 0 to 50, vs. Total Chip Power (W), 0.01 to 1000, for the 22nm, 16nm, and 11nm nodes. Annotation: Toxeqv: 0.6 to 0.82 nm.]
Lower power requires longer gate lengths, to reduce variability.
[Figure: Chip Area (cm^2), 0 to 0.8, vs. Total Chip Power (W), 0.01 to 1000, for the 22nm, 16nm, and 11nm nodes.]
Device density increases for each generation. Shrinking area increases power density, too.
Conditions: PDSOI, 4 core processor chip, constraining total chip power.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing.
Voltage and Energy vs Power
[Figure: Supply Voltage (V), 0 to 1.4, vs. Total Chip Power (W), 0.01 to 1000, for the 22nm, 16nm, and 11nm nodes.]
[Figure: Energy per logic transition (fJ), 0 to 16, vs. Total Chip Power (W), 0.01 to 1000, for the 22nm, 16nm, and 11nm nodes.]
Optimal supply voltage can become quite low for low power constraints, leading to very low energy use per logic transition.
Conditions: PDSOI, 4 core processor chip, constraining total chip power.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing.
Energy vs performance trade-off
[Figure: Energy per logic transition (fJ), 0.1 to 10, vs. Clock Frequency (Hz), 1.E+07 to 1.E+10, for the 22nm, 16nm, and 11nm nodes, with "10x" and "4x" annotations.]
Conditions: PDSOI, 4 core processor chip, constraining total chip power.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing.
Optimal On/Off Ratio
[Figure: Off-current (A/cm), 0.00001 to 0.1, vs. On-current (A/cm), 0.1 to 100, for the 22 nm, 16 nm, and 11 nm nodes, with 10000:1 and 1000:1 ratio lines and low- and high-bias-condition ends marked.]
PDSOI nFETs, currents measured at nominal process and bias conditions.
Conditions: PDSOI, 4 core processor chip, constraining total chip power.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing.
Performance at constant power density (PDSOI)
[Figure, three panels vs. Technology Node (45nm, 32nm, 22nm, 16nm, 11nm) at 10, 25, and 50 W/cm^2: Frequency (GHz); Supply Voltage (V); Chip Area (cm^2).]
Performance increases at 32nm due to hi-k introduction, but then falls as strain diminishes and gate dielectric does not scale further.
Conditions: PDSOI, 4 core processor chip, constraining total chip power density.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing.
Performance at constant power density: comparing technologies
[Figure: Frequency (GHz), 0 to 4.5, vs. Technology Node (45nm, 32nm, 22nm, 16nm, 11nm) for Bulk, PDSOI, FinFET, and ETSOI at 25 W/cm^2.]
ETSOI and FinFET offer moderate performance advantage over PDSOI for 22nm node and beyond. The industry should transition to FinFET at 16nm to avoid performance loss.
Conditions: 4 core processor chip, constraining total chip power at 25 W/cm^2.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing, fin height, sidewall thickness (Fin), Si thickness (ET).
Power density at constant performance
[Figure, two panels vs. Technology node (45nm, 32nm, 22nm, 16nm, 11nm) for PDSOI and FinFET at 1e15 and 1.5e15: Power Density (W/cm^2), 0 to 60; Area (cm^2), 0.1 to 10. Curve labels: ~2.6 GHz and ~3.9 GHz.]
Required power density drops at 32nm due to transition to hi-k, then rises due to decreasing strain and lack of scaling of gate insulator.
Transition to FinFET at 16nm to obtain continued improvement.
Conditions: PDSOI and FinFET, 4 core processor chip, constraining total chip performance.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing, fin height, sidewall thickness.
Performance at constant area and power
Area = 4 cm^2, power = 100 W, fixed.
Number of cores is adjusted to maintain constant area. Chip performance is assumed linear with the number of cores.
Maximizing total chip performance.
[Figure, two panels vs. Technology node (45nm, 32nm, 22nm, 16nm, 11nm) for PDSOI and FinFET at 25 W/cm^2 and 50 W/cm^2: Relative Performance, 0 to 16; Number of Cores, 0 to 80.]
Conditions: PDSOI and FinFET, variable # core processor chip, constraining both chip power and chip area (4 cm^2).
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing, fin height, sidewall thickness, and number of cores.
Supply voltage considerations
[Figure: Relative Performance, 0.8 to 1.8, vs. Supply Voltage (V), 0.4 to 0.9, at 25 W/cm^2. Curves: Vary Vdd (optimizing everything except VDD); Vary actf (activity factor); Vary Tol (variability); Vary mu (mobility). Annotations: 18%, 6%, 0.7X, 1.4X.]
VDD can be reduced by higher activity, higher mobility, and tighter tolerances.
Conditions: PDSOI, 4 core processor chip, constraining total chip power density.
Optimizing: VDD, tox, dopings (for VTs), LG, p:n width ratio, mean widths, repeater size and spacing.
Optimizations vs gate length for FinFETs and ETSOI
Everything except gate length is optimized. The gate length is scanned.
25 W/cm^2 power density constraints. 11 nm node.
[Figure, four panels vs. Gate Length (nm), 0 to 60, for ETSOI and FinFET: Supply Voltage, Vdd (V); Silicon Thickness (nm); Rel. Performance; DIBL (V).]
Shorter gate lengths necessitate:
  Higher DIBL
  Higher VDD
  Thinner tSi
  Ultimately, lower performance
DIBL Scaling enables VDD Scaling
[Figure, four panels for FinFET, ETSOI, and Bulk: two panels of Gate Length (nm), 0 to 35, vs. Technology Node (22nm, 16nm, 11nm); two panels of Supply Voltage, VDD (V) vs. DIBL (V/V), with the 22 and 11 nm ends marked. Panel labels include 1W and 25W/cm^2.]
Novel devices for 11nm and beyond
III-V FinFETs
  Higher mobility improves drive current.
Tunnel FETs
  Improved subthreshold slope enables low VDD and low energy operation.
  To properly model this device, one has to be able to calculate the tunneling barrier shapes and band-edge alignments.
  We are in the process of developing a compact model for TFETs for the optimizer, but results are not yet available. As an interim measure, we can alter the Boltzmann constant in the conventional FET model, to see the impact of steeper subthreshold slope.
Carbon Nanotube Transistors
  Ballistic current flow in the channel should enable very high switching speeds for these devices.
  A compact model for CNTs suitable for the optimizer is being presented at IEDM this year, but results are not available yet.
Summary
CMOS scaling is limited by electrostatic, quantum mechanical, discreteness, thermodynamic and practical effects.
Optimization can and should be used to find the best design points in the midst of these various constraints.
  Example: low power needs somewhat less scaled devices.
Technology performance projections based on optimization for PDSOI, FinFETs, ETSOI, and Bulk MOSFETs show:
  Density improvements should continue, as long as wiring density continues to improve.
  Performance improvements are likely to be rather modest, even with a switch to FinFETs for 16 and/or 11nm nodes.
  Exploratory devices (CNTs and/or TFETs) may offer substantial performance advantages, someday.
Acknowledgements
Wilfried Haensch, Leland Chang, Paul Solomon, Steve Koester, Lan Wei, Philip Wong, Ghavam Shahidi, Mike Scheuermann, Phillip Restle, Omer Dokumaci, Mary Wisniewski, Steve Kosonocky, Yuan Taur, Bob Dennard
Extra slides
Generalized heat sink model
Two level heat flow model:
  Flow in the silicon wafer
  Flow in the heat sink material
In each layer, the flow can be:
  3D (spherical) for spots smaller than thickness
  2D (cylindrical) at distances larger than the thickness
In silicon layer, inhomogeneous power dissipation is accounted for, to estimate maximum junction temperature at hottest point.
[Figure: layer stack with heat sources, Si wafer, interface, heat spreader (e.g., SiC or Cu), and interface to final coolant (e.g., air or water).]
Model parameters:
  Si: thermal sheet resistance of Si wafer
  HS: thermal sheet resistance of heat sink
  R HS: thermal contact resistance of heat sink
  R Si: thermal contact resistance of Si wafer
[Figures: Temperature Rise (K) vs. Hot spot size (cm). Comparison of simplified analytic model with detailed numerical model. Legends: "My model is red. Kai's data is blue."; "This model is red, FE model is blue."]
III-V FinFETs
III-V FinFETs are modeled by increasing the mobility in the conventional model.
Increased mobility enables an improved energy/performance tradeoff by reducing the voltage needed for high performance designs.
Drive current multiplier: 1x (=Si), 2x (~GaAs), 4x (~InGaAs)
[Figure: Energy/transition (fJ), 0.1 to 10, vs. Performance (transitions/sec), 1.E+13 to 1.E+16, for FinFET, III-V 2x, and III-V 4x.]
[Figure: Supply Voltage (V), 0 to 1.4, vs. Frequency (GHz), 0 to 16, for Fin, 2x, and 4x.]
TFET Heterostructures
SiGe heterostructures offer low effective bandgap, which improves tunneling.
III-V materials offer lower effective masses and more heterojunctions, to further improve tunneling.
Nanowire geometry offers:
  Optimum electrostatics with gate all around
  New material combinations
  Integration onto silicon possible
  Scaling to quantum capacitance limit
  Improved Ion
[Figure: planar SiGe H-TFET and III-V H-TFET structures (labels: Si, p++ SiGe, p-Si, n+ Si, buried oxide, poly, source, drain, gate), with SiGe/Si band diagrams for the off-state (Vg = 0) and on-state (Vg > 0). Annotations: high on current; ambipolar device suppressed.]
[Koester, Riel, Koswatta]
Carbon Nanotube Field Effect Transistor (CNFET)
  1D devices
  Better transport characteristics
  Worse parasitics
  Leakage, variations, etc.
CNFET: Good or Bad?
  The judgment highly depends on the application.
Our approach is to build an optimizer for a full technology, with proper consideration of device properties and system needs.
  Device modeling
  Circuit performance benchmarking
  System application consideration
[Lan Wei, Stanford]
Wire Model
Assumed constant 2:1 height to width wiring with equal lines and spaces. (0.062 k_BEOL fF/um)
[Equation garbled in extraction: wire resistance model as a function of wire width W and temperature T. Surviving constants include 70 nm, 300 K, 261, 40, and 10.]
Consequences of wire resistance model:
[Figure: Performance loss due to scattering. LTR with scattering / LTR without scattering, 0.85 to 1.00, vs. Total Power (W), 0.001 to 10.]
[Figure: Relative Performance change due to R(T). Relative Performance, 0.5 to 3, vs. Junction Temperature (K), 50 to 400, for 10W, 1W, and 0.1W, each with and without constant R.]