Designing Reliable Systems With Unreliable Components
Shekhar Borkar
Intel Fellow
Director, Microprocessor Research
November 14, 2005
2
Outline
• Process technology scaling
• Near term challenges
• µArchitecture & design solutions
• Upcoming paradigm shifts
• Long term outlook & challenges
• Summary
3
Technology Scaling
[Figure: MOS transistor cross-sections showing gate, source, drain, and body, with dimensions Xj, Tox, D, and Leff]
• Dimensions scale down by 30%: doubles transistor density
• Oxide thickness scales down: faster transistor, higher performance
• Vdd & Vt scaling: lower active power
Scaling will continue, but with challenges!
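The link between a 30% dimension shrink and doubled transistor density is simple area arithmetic. The following minimal Python sketch works it through; the function names are illustrative and not taken from the talk.

```python
def scaled_area(linear_shrink: float = 0.30) -> float:
    """Relative area of a feature after shrinking both of its dimensions."""
    linear = 1.0 - linear_shrink      # each dimension becomes 0.7x
    return linear * linear            # area scales with the square


def density_gain(linear_shrink: float = 0.30) -> float:
    """How many more transistors fit in the same area after the shrink."""
    return 1.0 / scaled_area(linear_shrink)


print(round(scaled_area(), 2))   # 0.49
print(round(density_gain(), 2))  # 2.04
```

A 0.7x linear shrink leaves roughly half the area per transistor, which is where the "doubles transistor density" figure comes from.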
4
Technology Outlook
High Volume Manufacturing    2004   2006   2008   2010   2012   2014   2016   2018
Technology Node (nm)         90     65     45     32     22     16     11     8
Integration Capacity (BT)    2      4      8      16     32     64     128    256
Delay = CV/I scaling         0.7    ~0.7   >0.7   Delay scaling will slow down
Energy/Logic Op scaling      >0.35  >0.5   >0.5   Energy scaling will slow down
Bulk Planar CMOS             High Probability to Low Probability
Alternate, 3G etc            Low Probability to High Probability
Variability                  Medium, High, Very High
ILD (K)                      ~3     <3     Reduce slowly towards 2-2.5
RC Delay                     1      1      1      1      1      1      1      1
Metal Layers                 6-7    7-8    8-9    0.5 to 1 layer per generation
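The Integration Capacity row (2 BT in 2004 rising to 256 BT in 2018) is consistent with one doubling per two-year generation. A small Python sketch of that progression follows; the function name and default parameters are assumptions chosen to match the slide's numbers, not a model given in the talk.

```python
def integration_capacity_bt(year: int, base_year: int = 2004,
                            base_bt: int = 2, years_per_gen: int = 2) -> int:
    """Integration capacity in billions of transistors (BT),
    assuming one doubling per process generation."""
    generations = (year - base_year) // years_per_gen
    return base_bt * 2 ** generations


for year in range(2004, 2020, 2):
    print(year, integration_capacity_bt(year))
```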
5
6
The Leakage(s)…
[Figure: Ioff (nA/µm), log scale 1 to 10000, vs. temperature (30 to 130 C); curves labeled 0.25, 90, and 45]
[Figure: SD leakage (Watts), log scale 0.1 to 1000, vs. technology (0.25u to 45nm), for 2X and 1.5X transistor growth]
[Figure: 90nm MOS transistor: gate, 1.2 nm SiO2, 50nm Si]
[Figure: Gate leakage (Watts), log scale 1.E-03 to 1.E+06, vs. technology (0.25u to 45nm), for 1.5X and 2X transistor growth; during burn-in, 1.4X Vdd]
7
Must Fit in Power Envelope
[Figure: Power (W) and power density (W/cm2), 0 to 1400, vs. technology (90nm to 16nm) for a 10 mm die; components: active, SD leakage, SiO2 leakage]
Technology, Circuits, and Architecture to constrain the power
8
Between Now and Then:
• Move away from frequency alone to deliver performance
• More on-die memory
• Multi-everywhere
  – Multi-threading
  – Chip level multi-processing
• Throughput oriented designs
• Valued performance by higher level of integration
  – Monolithic & polylithic
9
Leakage Solutions
[Figure: Planar transistor: gate over 1.2 nm SiO2 on silicon substrate]
[Figure: SiGe; gate electrode over 3.0nm high-k on silicon substrate]
[Figure: Tri-gate transistor]
For a few generations, then what?
10
Active Power Reduction
Multiple Supply Voltages
[Figure: slow, fast, and slow sections; low supply voltage and high supply voltage]
Throughput Oriented Designs
• One logic block at Vdd: Freq = 1, Vdd = 1, Throughput = 1, Power = 1, Area = 1, Pwr Den = 1
• Two logic blocks at Vdd/2: Freq = 0.5, Vdd = 0.5, Throughput = 1, Power = 0.25, Area = 2, Pwr Den = 0.125
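The throughput-oriented numbers on this slide can be checked with the usual active-power relation, power proportional to C x Vdd^2 x f per block. The sketch below is a minimal Python illustration under two assumptions that the slide does not state explicitly: capacitance per block is unchanged, and the workload parallelizes perfectly across blocks. Function names are illustrative.

```python
def relative_power(freq: float, vdd: float, blocks: int = 1) -> float:
    """Active power relative to one block at freq = 1, vdd = 1.
    Per-block power is taken as proportional to Vdd^2 * f."""
    return blocks * vdd ** 2 * freq


def summarize(freq: float, vdd: float, blocks: int) -> dict:
    """Relative throughput, power, area, and power density."""
    power = relative_power(freq, vdd, blocks)
    area = float(blocks)                      # area grows with block count
    return {
        "throughput": blocks * freq,          # assumes perfect parallelism
        "power": power,
        "area": area,
        "power_density": power / area,
    }


print(summarize(freq=1.0, vdd=1.0, blocks=1))
print(summarize(freq=0.5, vdd=0.5, blocks=2))
```

With two blocks at half frequency and half supply voltage, the sketch reproduces the slide's values: throughput 1, power 0.25, area 2, power density 0.125.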
11
Leakage Control
• Body bias (Vbp at +Ve above Vdd, Vbn at -Ve): 2-10X reduction
• Sleep transistor (on the logic block): 2-1000X reduction
• Stack effect (equal loading): 5-10X reduction
12
Optimum Frequency
Maximum performance with
• Optimum pipeline depth
• Optimum frequency
[Figure: Process Technology: sub-threshold leakage increases exponentially with relative frequency (1 to 10)]
[Figure: Pipeline Depth: power efficiency vs. relative pipeline depth (1 to 10), with an optimum]
[Figure: Pipeline & Performance: performance vs. relative frequency (pipelining) (1 to 10), showing diminishing return]
13
µArchitecture Techniques
[Figure: Increase on-die memory: cache % of total area (0% to 100%) vs. technology (1u to 65nm) for 486, Pentium®, Pentium® III, Pentium® 4, Pentium® M]
[Figure: Multi-threading: a single thread (ST) waits for memory; multiple threads (MT1, MT2, MT3) overlap their waits for full HW utilization]
Improved performance, no impact on thermals & power delivery
[Figure: Chip multi-processing: cores C1 to C4 with cache vs. one large core; relative performance (1 to 3.5) vs. die area, power (1 to 4) for multi core and single core]
14
Special Purpose Hardware
[Figure: TCP/IP Offload Engine die, 2.23 mm X 3.54 mm, 260K transistors; blocks: TCB, Exec Core, PLL, OOO, ROB, ROM, CAM1, CLB, Input seq, Send buffer]
Opportunities: Network processing engines, MPEG Encode/Decode engines, Speech engines
[Figure: MIPS (1.E+02 to 1.E+06) vs. year (1995 to 2015): GP MIPS @75W and TOE MIPS @~2W]
[Figure: Si monolithic (CPU, special HW, memory) vs. polylithic, heterogeneous Si, SiGe, GaAs (CMOS RF, wireline, RF, opto-electronics, dense memory)]
Special purpose HW and extreme integration
15
16
Sources of Variations
[Figure: Heat flux (W/cm2), 0 to 250: Vcc variation]
[Figure: Temperature (C), 40 to 110: temp variation & hot spots]
[Figure: Random dopant fluctuations: mean number of dopant atoms (10 to 10000, log scale) vs. technology node (1000 to 32 nm)]
[Figure: Sub-wavelength lithography: lithography wavelength (365nm, 248nm, 193nm, 13nm EUV) vs. generation (180nm, 130nm, 90nm, 65nm, 45nm, 32nm), 1980 to 2020, showing the gap. Source: Mark Bohr, Intel]
17
Impact of Static Variations
[Figure: Normalized frequency (0.9 to 1.4) vs. normalized leakage (Isb) (1 to 5) at 130nm, annotated 30% and 5X]
Today…
• Frequency: ~30%
• Leakage power: ~5-10X
18
Circuit Design Tradeoffs
[Figure: power and target frequency probability (0 to 2) vs. transistor size (small to large)]
[Figure: power and target frequency probability (0 to 2) vs. low-Vt usage (low to high)]
Higher probability of target frequency with:
1. Larger transistor sizes
2. Higher low-Vt usage
But with power penalty
19
Impact of Critical Paths
[Figure: mean clock frequency (1.1 to 1.4) vs. # of critical paths (1 to 25)]
[Figure: number of dies (0% to 60%) vs. clock frequency (0.9 to 1.5), by # critical paths]
• With increasing # of critical paths
  – Both σ and µ become smaller
  – Lower mean frequency
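One way to see why both σ and µ shrink as the number of critical paths grows is that a die's clock frequency is set by its slowest path, so the die delay is the maximum of many random path delays. The Monte Carlo sketch below illustrates this in Python. It assumes independent, normally distributed path delays with a 5% sigma and frequencies normalized to a nominal delay of 1; these values and all names are illustrative, so the output will not match the slide's axes.

```python
import random
import statistics


def simulate_die_frequency(num_paths: int, sigma: float, rng: random.Random) -> float:
    """Frequency of one die = 1 / delay of its slowest critical path."""
    worst_delay = max(rng.gauss(1.0, sigma) for _ in range(num_paths))
    return 1.0 / worst_delay


def frequency_stats(num_paths: int, sigma: float = 0.05,
                    dies: int = 5000, seed: int = 0) -> tuple:
    """Mean and standard deviation of frequency across simulated dies."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    freqs = [simulate_die_frequency(num_paths, sigma, rng) for _ in range(dies)]
    return statistics.mean(freqs), statistics.stdev(freqs)


for n in (1, 9, 17, 25):
    mean, std = frequency_stats(n)
    print(f"{n:2d} critical paths: mean freq = {mean:.3f}, sigma = {std:.4f}")
```

Under these assumptions the mean frequency falls and the spread narrows as paths are added, which is the trend the slide describes.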
20
Impact of Logic Depth
[Figure: number of samples (%) vs. variation (%) (-16% to 16%) for delay and for device Ion (NMOS, PMOS); logic depth: 16]
σ/µ at logic depth 16:
NMOS Ion σ/µ    PMOS Ion σ/µ    Delay σ/µ
5.6%            3.0%            4.2%
[Figure: ratio of delay-σ to Ion-σ (0.0 to 1.0) vs. logic depth (16, 49)]
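The falling ratio of delay-σ to Ion-σ with logic depth can be illustrated with an idealized averaging argument: if each gate's delay varied independently, the σ/µ of a path of n gates would shrink as 1/sqrt(n). The Python sketch below checks that by simulation. Independence and the 5% per-gate sigma are assumptions made for illustration; measured silicon, including the slide's values, also contains correlated variation and need not follow this curve.

```python
import math
import random
import statistics


def path_sigma_over_mu(depth: int, gate_sigma: float = 0.05,
                       trials: int = 5000, seed: int = 1) -> float:
    """sigma/mu of the total delay of a path with `depth` gates,
    each gate delay drawn independently from N(1, gate_sigma)."""
    rng = random.Random(seed)
    delays = [sum(rng.gauss(1.0, gate_sigma) for _ in range(depth))
              for _ in range(trials)]
    return statistics.stdev(delays) / statistics.mean(delays)


for depth in (1, 16, 49):
    simulated = path_sigma_over_mu(depth)
    ideal = 0.05 / math.sqrt(depth)   # independent-variation prediction
    print(f"depth {depth:2d}: simulated {simulated:.4f}, ideal {ideal:.4f}")
```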
21
µArchitecture Tradeoffs
[Figure: frequency and target frequency probability (0 to 1.5) vs. logic depth (axis ends labeled small and large)]
[Figure: frequency and target frequency probability (0 to 1.5) vs. # uArch critical paths (less to more)]
Higher target frequency with:
1. Shallow logic depth
2. Larger number of critical paths
But with lower probability
22
Variation-tolerant Design
[Figure: four tradeoff panels: frequency and target frequency probability vs. # uArch critical paths (less to more) and vs. logic depth; power and target frequency probability vs. transistor size (small to large) and vs. low-Vt usage (low to high)]
Balance power & frequency with variation tolerance
23
Probabilistic Design
[Figure: frequency vs. leakage power, deterministic vs. probabilistic; 10X variation, ~50% total power]
[Figure: probability vs. path delay, due to variations in Vdd, Vt, and Temp]
[Figure: # of paths vs. delay with a delay target, deterministic and probabilistic]
Deterministic design techniques inadequate in the future
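The contrast between deterministic and probabilistic design can be made concrete: a deterministic check compares the nominal path delay with the delay target, while a probabilistic check asks how likely a path, or every path on a die, is to meet it. The Python sketch below assumes normally distributed, independent path delays; the distribution, the numbers, and the function names are illustrative assumptions rather than the method used in the talk.

```python
from statistics import NormalDist


def meets_target_deterministic(nominal_delay: float, target: float) -> bool:
    """Deterministic view: only the nominal delay is compared to the target."""
    return nominal_delay <= target


def prob_path_meets_target(mean_delay: float, sigma: float, target: float) -> float:
    """Probabilistic view: chance that one path's delay is within the target."""
    return NormalDist(mean_delay, sigma).cdf(target)


def prob_die_meets_target(mean_delay: float, sigma: float,
                          target: float, num_paths: int) -> float:
    """Chance that all paths meet the target, assuming independent paths."""
    return prob_path_meets_target(mean_delay, sigma, target) ** num_paths


print(meets_target_deterministic(0.95, 1.0))
print(prob_path_meets_target(0.95, 0.05, 1.0))
print(prob_die_meets_target(0.95, 0.05, 1.0, num_paths=10))
```

A path that passes the deterministic check can still have a substantial chance of missing the target once variation is included, and that chance compounds across paths.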
24
Shift in Design Paradigm
• Multi-variable design optimization for:
  – Yield and bin splits
  – Parameter variations
  – Active and leakage power
  – Performance
Today: Local optimization, single variable
Tomorrow: Global optimization, multi-variate
25
26
Today's Freelance Layout
[Figure: cell layout with Vdd, Vss, Ip, and Op]
No layout restrictions
27
Transistor Orientation Restrictions
[Figure: cell layout with Vdd, Vss, Ip, and Op]
Transistor orientation restricted to improve manufacturing control
28
Transistor Width Quantization
[Figure: cell layout with Vdd, Vss, Ip, and Op]
29
Today's Unrestricted Routing
30
Future Metal Restrictions
31
Today's Metric: Maximizing Transistor Density
Dense layout causes hot-spots
32
Tomorrow's Metric: Optimizing Transistor & Power Density
Balanced Design
33
Implications to Design
• Design fabric will be Regular
• Will look like Sea-of-transistors interconnected with regular interconnect fabric
• Shift in the design efficiency metric
  – From Transistor Density to Balanced Design
34
35
Technology Outlook
High Volume Manufacturing    2004   2006   2008   2010   2012   2014   2016   2018
Technology Node (nm)         90     65     45     32     22     16     11     8
Integration Capacity (BT)    2      4      8      16     32     64     128    256
Delay = CV/I scaling         0.7    ~0.7   >0.7   Delay scaling will slow down
Energy/Logic Op scaling      >0.35  >0.5   >0.5   Energy scaling will slow down
Bulk Planar CMOS             High Probability to Low Probability
Alternate, 3G etc            Low Probability to High Probability
Variability                  Medium, High, Very High
ILD (K)                      ~3     <3     Reduce slowly towards 2-2.5
RC Delay                     1      1      1      1      1      1      1      1
Metal Layers                 6-7    7-8    8-9    0.5 to 1 layer per generation
36
Reliability
[Figure: Soft error FIT/chip (logic & mem), relative (0 to 150), vs. technology node (180 to 16 nm); ~8% degradation/bit/generation]
[Figure: Time dependent device degradation: relative Ion (0 to 1) vs. time (1 to 10)]
[Figure: Burn-in may phase out…?: relative Jox (1 to 10000, log scale) vs. technology node (180 to 22 nm); Hi-K?]
[Figure: Extreme device variations: relative count (0 to 100) vs. Vt (100 to 200 mV); distribution becomes wider]
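The soft-error trend can be roughed out by combining the slide's ~8% per-bit degradation per generation with growth in the number of bits per chip. The Python sketch below assumes the bit count doubles every generation, which is an assumption borrowed from the integration-capacity trend and not a figure stated on this slide, so the results illustrate the compounding effect and are not a reproduction of the chart.

```python
NODES_NM = [180, 130, 90, 65, 45, 32, 22, 16]


def relative_chip_fit(generations: int, per_bit_growth: float = 1.08,
                      bit_growth: float = 2.0) -> float:
    """Soft-error FIT per chip relative to the starting node.
    per_bit_growth: ~8% degradation per bit per generation (from the slide).
    bit_growth: bits per chip multiplier per generation (assumed 2x)."""
    return (per_bit_growth * bit_growth) ** generations


for generation, node in enumerate(NODES_NM):
    print(f"{node:3d} nm: relative FIT/chip = {relative_chip_fit(generation):.1f}")
```

Even a modest per-bit degradation compounds quickly once it is multiplied by the growth in bits per chip.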
37
Implications to Reliability
• Extreme variations (static & dynamic) will result in unreliable components
• Impossible to design reliable system as we know today
  – Transient errors (soft errors)
  – Gradual errors (variations)
  – Time dependent (degradation)
Reliable systems with unreliable components: resilient µArchitectures
38
Implications to Test
• One-time-factory testing will be out
• Burn-in to catch chip infant-mortality will not be practical
• Test HW will be part of the design
• Dynamically self-test, detect errors, reconfigure, & adapt
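The self-test, detect, reconfigure, and adapt loop can be sketched in software terms. The Python example below is a hypothetical illustration only: the `Unit` class, its `self_test` callback, and the policy of disabling failing units are assumptions made for this sketch, since the talk does not specify a mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Unit:
    """A hypothetical hardware unit (for example a core) with built-in test."""
    name: str
    self_test: Callable[[], bool]   # returns True when the unit passes
    enabled: bool = True


def self_test_and_reconfigure(units: List[Unit]) -> List[str]:
    """Run each enabled unit's self-test, disable any unit that fails,
    and return the names of the units that remain in service."""
    for unit in units:
        if unit.enabled and not unit.self_test():
            unit.enabled = False    # reconfigure: take the unit out of service
    return [unit.name for unit in units if unit.enabled]


units = [
    Unit("core0", lambda: True),
    Unit("core1", lambda: False),   # simulated failure detected in the field
    Unit("core2", lambda: True),
]
print(self_test_and_reconfigure(units))  # ['core0', 'core2']
```

Running such a check periodically, instead of once at the factory, lets the system keep working with the units that still pass.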
39
In a Nut-shell…
[Figure: 100 Billion Transistors]
• 100 BT integration capacity
• Billions unusable (variations)
• Some will fail over time
• Intermittent failures
Yet, deliver high performance in the power & cost envelope
40
Summary
• Near term:
  – Optimum frequency & µArchitecture
  – Lots of memory & multi-everywhere
  – Valued performance with higher integration
• Paradigm shift:
  – From deterministic to probabilistic design, with multi-variate optimization
  – Evolution of regular design fabric
• Long term:
  – Reliable systems with unreliable components
  – Dynamic self-test, detect, reconfigure, & adapt