Circuit Reliability: Mechanisms, Monitors, and Effects in...
![Page 1: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/1.jpg)
Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold Processors
Chris H. Kim
University of Minnesota, Minneapolis, MN
[email protected]/~chriskim/
![Page 2: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/2.jpg)
Scaling Challenges
[Figure: power (W) versus year, 2000 to 2020, marking the power wall, variability wall, and reliability wall]
![Page 3: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/3.jpg)
Overcoming the Power Wall
• Proven solutions: Multi-core chips, dynamic voltage frequency scaling, clock gating, power gating, …
[Figure: one Y = A x B unit compared with two parallel Y = A x B units]
Baseline: Freq = 1, Vdd = 1, Throughput = 1, Area = 1, Power = 1, Pwr Den = 1
Parallel: Freq = 0.5, Vdd = 0.5, Throughput = 1, Area = 2, Power = 0.25, Pwr Den = 0.125 (87%↓)
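The parallel-unit numbers on this slide follow from the first-order dynamic power relation P ∝ C·Vdd²·f. The sketch below reproduces them, assuming switched capacitance scales with area and ignoring leakage; the function name `parallel_scaling` is illustrative and not from the talk.

```python
def parallel_scaling(freq, vdd, units):
    """First-order dynamic power model, P ~ C * Vdd^2 * f.

    All values are normalized to one unit running at freq = vdd = 1.
    Assumes switched capacitance is proportional to area; leakage is ignored.
    """
    area = units                        # each unit contributes one unit of area
    throughput = units * freq           # operations per unit time
    power = units * vdd ** 2 * freq     # capacitance grows with the number of units
    power_density = power / area
    return {"throughput": throughput, "area": area,
            "power": power, "power_density": power_density}


baseline = parallel_scaling(freq=1.0, vdd=1.0, units=1)
parallel = parallel_scaling(freq=0.5, vdd=0.5, units=2)
reduction = 1.0 - parallel["power_density"] / baseline["power_density"]
print(parallel)
print(f"power density reduction: {reduction:.1%}")
```

Under these assumptions the power density drops to 0.125 of the baseline, which is the reduction of roughly 87% quoted on the slide.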
![Page 4: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/4.jpg)
Overcoming the Variability Wall
[Figure: Intel Foxton Technology block diagram; labels: power supply with VID input, micro-controller with A/D converters and DAC, PLimit, power calculation from VConnector, VDie, and RPackage (package/die)]
• Proven solutions: Variation aware design, memory assist/repair, lithography techniques, adaptive systems
![Page 5: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/5.jpg)
Overcoming the Reliability Wall
• Possible solutions: Guardbanding, sensing and compensation, wear-leveling, failure resistant systems, …
![Page 6: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/6.jpg)
Outline
• Device Reliability Issues
• Reliability Monitors and Measurements
• Reliability Effects in NTV Processors
• Summary
![Page 7: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/7.jpg)
Aging in CMOS Transistors
![Page 8: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/8.jpg)
HCI, BTI, and TDDB in Digital Logic
• Transistors are exposed to different stress conditions during normal digital circuit operation
[Figure: transistor bias states in digital logic under HCI, BTI, and TDDB stress; labels include ID and inverted channel]
![Page 9: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/9.jpg)
Practical Solutions for Preventing Aging Related Failures
• BTI and HCI
– Gradual decline in performance
– Guard banding (static or dynamic), adjust Vmax
– CAD, firmware & architecture level support essential
• TDDB
– Single incident may lead to outright system failure
– Can happen anywhere inside a chip
– Improve fabrication procedure, adjust Vmax
• Bottom line: Precise measurement and understanding of circuit degradation is a key aspect of robust design
![Page 10: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/10.jpg)
Transistor Lifetime Estimation
• Extrapolate stress results with respect to:
– Op. conditions based on acceleration models
– Larger chip areas (e.g., Poisson scaling for TDDB)
– Lower percentiles based on chosen distribution
[Figure: lifetime extrapolation plot; axis label: real supply voltage]
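The three extrapolation steps listed on this slide can be sketched as follows. This is a minimal illustration, assuming a Weibull failure distribution and a power-law voltage acceleration model; the talk does not say which models were used, and every function name and number here is hypothetical.

```python
import math


def weibull_percentile_time(t63, beta, fraction):
    """Time at which `fraction` of devices have failed.

    t63 is the Weibull scale parameter (time to 63.2% failures) and
    beta is the shape parameter.
    """
    return t63 * (-math.log(1.0 - fraction)) ** (1.0 / beta)


def area_scaled_time(t_test, area_test, area_chip, beta):
    """Weakest-link (Poisson) area scaling for a Weibull distribution."""
    return t_test * (area_test / area_chip) ** (1.0 / beta)


def voltage_scaled_time(t_stress, v_stress, v_use, exponent):
    """Power-law voltage acceleration, t ~ V^(-exponent) (assumed model form)."""
    return t_stress * (v_stress / v_use) ** exponent


# Hypothetical stress result: 63.2% of small test structures fail after 1e4 s at 2.4 V.
t_use = voltage_scaled_time(1e4, v_stress=2.4, v_use=1.0, exponent=40.0)
t_chip = area_scaled_time(t_use, area_test=1.0, area_chip=1e4, beta=1.5)
t_100ppm = weibull_percentile_time(t_chip, beta=1.5, fraction=1e-4)
print(f"projected 100 ppm lifetime: {t_100ppm:.3e} s")
```

Each step shortens or lengthens the projected lifetime by a large factor, which is why the choice of acceleration model and distribution matters so much for the final estimate.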
![Page 11: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/11.jpg)
Benefits of In-Situ Reliability Monitors over Device Probing
• Information from actual circuits (test circuit must be representative)
• High (timing) precision + short measurement interrupt
• No expensive equipment
• Short test time and reduced test area
• Measurements at use condition allow realistic lifetime projection
• Complements traditional probing methods
![Page 12: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/12.jpg)
Usage Scenarios and Design Issues of In-situ Reliability Monitors
• Usage scenario 1: Process characterization and yield improvement
– Early technology characterization is often performed before many metallization layers are fabricated
– Library cells may not be available (flip-flops, scan)
– Device probing would still be a competitive solution for extracting analog parameters such as I–V or C–V
• Usage scenario 2: In-field monitoring and data collection
– Workload unknown
– Simple circuits are practical but they have limited capabilities
– Firmware and architecture support needed
![Page 13: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/13.jpg)
Usage Scenarios and Design Issues of In-situ Reliability Monitors
• Usage scenario 3: Sensor for real time aging compensation
– Effectiveness versus overhead
– Measurements are from a proxy circuit
– Practical issues: type of sensor, temporal granularity, spatial granularity, communication with sensors, interface and protocol
• Personally not a big fan
![Page 14: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/14.jpg)
Outline
• Device Reliability Issues
• Monitors and Measurements
• Effects in NTV Processors
• Summary
![Page 15: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/15.jpg)
Circuit Based Reliability Monitors (or Silicon Odometers)
[Figure: die photos of the odometer test chips]
Odometer Projects
• 2007, 130nm: Original Silicon Odometer. Focused reliability issue: NBTI induced frequency degradation
• 2008, 65nm: All-In-One Odometer. Focused reliability issue: separately monitoring NBTI, HCI and TDDB
• 2009, 65nm: Statistical, Duty-Cycle, and RTN Odometer. Focused reliability issues: statistical behavior of NBTI; RTN on logic circuit
• 2010, 65nm: Interconnect Odometer. Focused reliability issue: impact of interconnect on BTI and HCI aging
• 2011, 32nm SOI: PBTI and SRAM Odometer. Focused reliability issues: monitoring PBTI in HKMG process; BTI impact on SRAM read/write
• 2012, 32nm SOI: SRAM and RTN Odometer. Focused reliability issues: SRAM timing issues due to BTI; RTN impact on ring oscillator
![Page 16: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/16.jpg)
Beat Frequency Silicon Odometer
• Beat frequency of two free running ROSCs measured by DFF and edge detector
• Benefits of beat frequency detection system
– Achieve ps resolution with μs measurement interrupt
– Insensitive to common mode noise such as temperature drifts
– Fully digital, scan based interface, easy to implement
![Page 17: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/17.jpg)
Beat Frequency Silicon Odometer
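[Figure: reference and stressed ROSC waveforms and the resulting beat signal, whose frequency is the difference between the reference and stressed ROSC frequencies]
• Sample stressed ROSC output with reference ROSC
– 1% frequency difference before stress: N=100
– 2% frequency difference after stress: N=50
– Δf or ΔT sensing resolution is >0.01%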
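The relation between the count N and the frequency difference can be sketched as below. This assumes N is the number of reference ROSC periods in one beat period, so N ≈ f_ref / |f_ref − f_stress|; the function names are illustrative and the frequencies are normalized.

```python
def beat_count(f_ref, f_stress):
    """Reference periods per beat period: N = f_ref / |f_ref - f_stress|."""
    return f_ref / abs(f_ref - f_stress)


def freq_difference_from_count(n):
    """Fractional frequency difference implied by a count of N."""
    return 1.0 / n


def count_step_resolution(n):
    """Change in fractional frequency difference when the count moves from N to N - 1."""
    return 1.0 / (n - 1) - 1.0 / n


n_fresh = beat_count(f_ref=1.00, f_stress=0.99)     # 1% difference before stress
n_aged = beat_count(f_ref=1.00, f_stress=0.98)      # 2% difference after stress
print(round(n_fresh), round(n_aged))
print(f"one count near N=100 corresponds to {count_step_resolution(100):.4%}")
```

Because the count varies as the inverse of the frequency difference, a single count step near N=100 corresponds to a frequency change of about 0.01%, which is how a short counter reaches fine resolution.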
![Page 18: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/18.jpg)
ROSC Based Aging Sensor Comparison
[Figure: block diagrams of the three systems]
System 1: Single ROSC
– Function: count stress ROSC periods during externally controlled meas. time
– Features: simple; compact
– Issues: voltage and temp. variations; meas. time vs. resolution tradeoff; requires absolute timing reference (e.g., oscilloscope)
– Meas. time for 0.01% max res.*: 30 μs
– Meas. error wrt. common mode variations**: +10.18% / -8.57%
System 2: 2 ROSC, simple
– Function: count stress ROSC periods during N1 periods of ref. ROSC
– Features: simple; immune to common mode variations
– Issues: meas. time vs. resolution tradeoff
– Meas. time for 0.01% max res.*: 30 μs
– Meas. error wrt. common mode variations**: +0.06% / -0.07%
System 3: 2 ROSC, beat freq.
– Function: count ref. ROSC periods during one period of PC_OUT
– Features: high resolution w/ short meas. time; immune to common mode variations
– Issues: requires extra circuits (e.g., phase comp., edge detector, etc.)
– Meas. time for 0.01% max res.*: 0.3 μs
– Meas. error wrt. common mode variations**: +0.26% / -0.38%
* ROSC period = 3 ns
** Simulated with +/- 4% ∆VCC
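The measurement times in this comparison can be checked with a short calculation. The sketch assumes that direct counting needs 1/resolution oscillator periods, and that the beat frequency system needs one beat period at a 1% initial frequency difference (N = 100); both function names are illustrative.

```python
ROSC_PERIOD_S = 3e-9  # ROSC period from the comparison footnote


def direct_count_time(resolution, period=ROSC_PERIOD_S):
    """Counting whole periods: one count in 1/resolution periods gives the target resolution."""
    return period / resolution


def beat_time(freq_difference, period=ROSC_PERIOD_S):
    """One beat period spans about 1/freq_difference oscillator periods."""
    return period / freq_difference


t_direct = direct_count_time(resolution=1e-4)   # 0.01% resolution
t_beat = beat_time(freq_difference=0.01)        # 1% difference, N = 100
print(f"direct counting: {t_direct * 1e6:.1f} us, beat frequency: {t_beat * 1e6:.1f} us")
```

Under these assumptions the direct-counting systems need 30 μs while the beat frequency system needs 0.3 μs, in line with the figures quoted above.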
![Page 19: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/19.jpg)
Separately Monitoring NBTI and PBTI
• PBTI becoming an important concern in high-k metal-gate
• Conventional Ring Oscillator (ROSC) can only provide overall frequency degradation information due to combined NBTI and PBTI effects
• New RO structure separates NBTI and PBTI effects
[Figure: ROSC configurations for PBTI stress, NBTI stress, and combined N/PBTI stress]
J. Kim, et al., IBM, IRPS 2011
![Page 20: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/20.jpg)
Separately Monitoring BTI and HCI
[Figure: block diagram. Stressed oscillators: BTI_ROSC (BTI stress only) and DRIVE_ROSC (BTI & HCI stress). Unstressed references: BTI_REF_ROSC and DRIVE_REF_ROSC. Beat frequency detection circuits 1 and 2 each compare a stressed ROSC with its reference and report through SCANOUT]
![Page 21: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/21.jpg)
Separately Monitoring BTI and HCI
• Backdriving action equalizes BTI in both BTI_ROSC and DRIVE_ROSC
• Negligible HCI in BTI_ROSC: only 3-5% of the switching current in the DRIVE_ROSC
• Fresh power gates are used for frequency measurements
![Page 22: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/22.jpg)
Temp. and Voltage Dependencies
• HCI slightly reduced with temperature
– Due to reduced drain current
• Degradation from both mechanisms increases with stress voltage
– Point when HCI begins to dominate pushed out in time by >1 order of magnitude at 1.8V vs. 2.4V
[Figure, temperature: frequency shift (%) versus stress time (s) for HCI and BTI degradation at 30°C and 120°C; 2.0V stress, 250MHz stress freq.]
[Figure, voltage: frequency shift (%) versus stress time (s) for 2.4V and 1.8V stress; 26°C, 470MHz stress freq.]
![Page 23: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/23.jpg)
Aging Issues in Interconnects
• Interconnect affects the voltage and current shapes
– Increased transition time (decreased slew rate)
– Increased current pulse width; decreased current peak value
• BTI and HCI have different sensitivities to bias conditions
![Page 24: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/24.jpg)
Interconnect Aging Monitor
• Serpentine wires for a dense chip implementation
• Ground shielding on both sides for reducing noise
X. Wang, et al., IRPS 2012, TVLSI 2014
![Page 25: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/25.jpg)
BTI and HCI Aging: With Interconnect
• BTI aging decreases with interconnect length
• HCI degradation peaks at L=500µm
![Page 26: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/26.jpg)
BTI Aging vs. Interconnect Length
• BTI induced frequency degradation decreases with longer interconnect
• Longer transition time → shorter PMOS stress duration → less BTI aging
![Page 27: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/27.jpg)
HCI Aging vs. Interconnect Length
• HCI aging exhibits a non-monotonic behavior with respect to interconnect length
– Current pulse width increases
– Current peak decreases
![Page 28: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/28.jpg)
Statistical Behavior of Aging
• Finite number and random spatial distribution of discrete charges → NBTI & HCI variation
• Inversely proportional to gate area (A_GATE) → worse with scaling
• Small number of aging measurements not sufficient to characterize aging
[Figure: spread in ∆Vt increases with scaling. S. Pae, et al., TDMR '08]
[Figure: CDF of ∆Vt at different stress times. S. Rauch, TDMR, Dec. '07]
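The gate-area dependence can be illustrated with a simple charge-counting model. This is a sketch under stated assumptions, not the model from the cited papers: the number of trapped charges is taken as Poisson distributed, and each charge shifts Vt by q / (Cox · area) regardless of its position. All names and numbers are illustrative.

```python
Q = 1.602e-19  # elementary charge in coulombs


def delta_vt_stats(charge_density, cox, area):
    """Mean and standard deviation of the Vt shift for Poisson-distributed trapped charges.

    charge_density: mean trapped charges per m^2 (illustrative input)
    cox: gate capacitance per unit area in F/m^2
    area: gate area in m^2
    """
    mean_count = charge_density * area             # Poisson mean equals its variance
    vt_per_charge = Q / (cox * area)               # shift caused by a single charge
    mean = mean_count * vt_per_charge              # independent of gate area
    sigma = (mean_count ** 0.5) * vt_per_charge    # scales as 1 / sqrt(area)
    return mean, sigma


large = delta_vt_stats(charge_density=1e15, cox=0.02, area=4e-14)
small = delta_vt_stats(charge_density=1e15, cox=0.02, area=1e-14)
print(f"large device: mean {large[0] * 1e3:.2f} mV, sigma {large[1] * 1e3:.2f} mV")
print(f"small device: mean {small[0] * 1e3:.2f} mV, sigma {small[1] * 1e3:.2f} mV")
```

In this model the mean shift stays the same while the variance grows as the inverse of gate area, so a handful of measurements on small devices says little about the tail of the distribution.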
![Page 29: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/29.jpg)
Statistical Reliability Monitor
• Need stressed & reference ROSC frequencies to be close
• Difficult, costly to tune each stressed ROSC
• Use multiple ref. ROSCs with different frequencies
• Cover the frequency distribution of the stressed array
[Figure: stressed ROSC array with column peripherals, FSM + scan chain (SCANOUT results), Ref ROSC 1 to 3, and 3 Silicon Odometer beat frequency detection systems]
J. Keane, et al., IEDM 2010, JSSC 2011
![Page 30: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/30.jpg)
65nm Test Chip Data
• Fresh and post-stress ROSC frequency PDFs
• No significant correlation of the frequency shift with fresh frequency
[Figure: percentage of ROSCs versus DUT frequency (MHz, 245 to 272), fresh and after 3.1hr stress at 1.8V, 2.0V, and 2.2V; 20°C, DC stress, 120 ROSCs each @ 11200s; correl. coef. values shown: 0.011, -0.126, -0.112]
![Page 31: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/31.jpg)
SRAM Memory Design Challenges at Low Supply Voltages
• Ratio-ed operation leads to poor noise margin at low voltages for 6T SRAM cells
• Conflicting requirements: a stronger access transistor improves write margin but worsens read margin
[Figure: 6T SRAM cell with bitlines BL and BLB]
![Page 32: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/32.jpg)
Impact of BTI on SRAM Read
• With BTI: Read stability degrades
• Cell recovers on a fail
[Figure: bit failure rate (BFR, %)]
![Page 33: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/33.jpg)
Impact of BTI on SRAM Write
• With BTI: Write stability improves or remains unchanged
• Cell recovers on a pass
[Figure: bit failure rate (BFR, %)]
![Page 34: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/34.jpg)
Representative SRAM Reliability Macro
• Represents a product SRAM sub-array
• BIST function done by on-chip FSM with supply switches
P. Jain, et al., IEDM, 2012
[Figure: macro block diagram. 256x128b SRAM array (WL, BL) with decoder, FSM, col. peripheral, 128b scan reg., VCO, CLK, and supply switches for VMEAS and VSTRESS; separate array and peripheral supply domains; off-chip DAQ]
[Figure: read failure rate (%) versus TSTRESS (s), 1 to 1000 s; 32nm, 0.52V, 85°C; annotations: TMEAS increased from 3µs to 2ms, 10x]
![Page 35: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/35.jpg)
Aging Monitor in IBM Microprocessors
Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013
• Implemented on IBM's z196 Enterprise systems to track long-term degradation under real-use conditions
• Over 500 days' worth of ring oscillator degradation data from customer systems
• Other companies have aging monitors too, but they tend not to publish their work
![Page 36: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/36.jpg)
Aging Monitor in IBM Microprocessors
Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013
• Time-zero problem: Some time will elapse between applying voltage (burn-in, test, operation) and making the first measurement → time-zero frequency is completely unknown → incorrect time slope of 0.42
• Use fitting parameters assuming $\Delta f = A(t - t_o)^n - A t^n$ → time slope of 0.172
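The effect of an unknown time zero on the fitted slope can be illustrated numerically. The sketch below uses its own parameterization, which is an assumption and may differ from the fitting expression on the slide: the sample clock starts at the first measurement, taken a time t0 after stress began, so the measured shift is A((t + t0)^n − t0^n). The exponent 0.172 is borrowed from the slide; everything else is illustrative.

```python
import math


def measured_shift(t, t0, a=1.0, n=0.172):
    """Shift seen by a monitor whose first sample is taken t0 after stress began."""
    return a * ((t + t0) ** n - t0 ** n)


def apparent_slope(t1, t2, t0, n=0.172):
    """Log-log slope of the measured shift between sample times t1 and t2."""
    y1 = measured_shift(t1, t0, n=n)
    y2 = measured_shift(t2, t0, n=n)
    return math.log(y2 / y1) / math.log(t2 / t1)


true_slope = apparent_slope(10.0, 100.0, t0=0.0)      # time zero known exactly
skewed_slope = apparent_slope(10.0, 100.0, t0=30.0)   # 30 time units of unseen stress
print(f"slope with known time zero: {true_slope:.3f}")
print(f"slope with unseen stress:   {skewed_slope:.3f}")
```

With unseen stress before the first sample, the apparent power-law slope lands between the true exponent and 1, which is the kind of inflation the slide describes; fitting t0 as a free parameter is one way to remove it.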
![Page 37: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/37.jpg)
Aging Monitor in IBM Microprocessors
Pong-Fei Lu, Keith Jenkins, IBM, IRPS 2013
Each design consideration is listed with examples of practical issues and with the aging sensor implementation in the IBM z196 server [3].
• Type of Sensor
– Practical issues: BTI, HCI, TDDB, RTN, transient errors, memory bit failures, etc.
– IBM z196: Ring oscillator based BTI monitor for long-term frequency degradation measurement
• Temporal Granularity
– Practical issues: Sensing period, threshold setting, dynamic range, etc.
– IBM z196: Sampling period: once a week
• Spatial Granularity
– Practical issues: Per CPU/GPU/memory, per functional unit, per sub-block, etc.
– IBM z196: Total: 5 sensors per chip; one sensor per core (x4 cores) plus one sensor in L2 cache
• Stress and Measurement Condition
– Practical issues: AC vs. DC, accelerated vs. usage condition, fast measurement
– IBM z196: AC stress, usage condition, 0.5ms measurement time
• Communication
– Practical issues: Between data gathering sensor, across sensors, between sensors and processor
– IBM z196: Sensors are integrated with IBM z196 pervasive infrastructure with firmware support
• Interface and Protocol
– Practical issues: Interrupt based, polling, event alarms, performance counter based, etc.
– IBM z196: Interrupt based in-field frequency degradation measurement
• Testing and Calibration
– Practical issues: Similar to any other on-chip monitor circuit
– IBM z196: Time 0 frequency shift unknown since first sample is taken after some stress
![Page 38: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/38.jpg)
Outline
• Device Reliability Issues
• Monitors and Measurements
• Effects in NTV Processors
• Summary
![Page 39: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/39.jpg)
DVFS Systems in ISSCC 2014
• Latest trends: On-chip distributed VRM (fast transients, supply noise suppression), per-core DVS, NTV/Turbo
[Figure: 22nm Intel Haswell processor. N. Kurd, et al., ISSCC, 2014]
[Figure: 22nm IBM POWER8 processor. Z. Toprak-Deniz, et al., ISSCC, 2014]
<1% area overhead
![Page 40: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/40.jpg)
Frequency Fluctuation in DVFS (BTI Example)
• Constant VDD: Frequency degrades with stress
• High VDD to low VDD: Freq. dips due to lower VDD followed by recovery
• Low VDD to high VDD: Freq. jumps and then degrades
[Figure: VDD and frequency versus time for the three cases, with the freq. dip and freq. peak marked]
![Page 41: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/41.jpg)
Frequency Fluctuation in DVFS (BTI Example)
• Constant VDD: Frequency degrades with stress
• High VDD to low VDD: Freq. dips due to lower VDD followed by recovery
• Low VDD to high VDD: Freq. jumps and then degrades
[Figure: VDD and frequency versus time for the three cases, with fCLK and the guardband marked in each]
![Page 42: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/42.jpg)
Modeling Approach using Superposition
• Rationale for empirical superposition method
– Complicated VDD trace can be broken down into multiple pulses
– Suitable for long- and short-term, DC and AC
– Computation is more efficient, short runtime
[Figure: a VDD trace with levels V1, V2, V3 is split into pulses that contribute ΔVT1, ΔVT2, ΔVT3, with ΔVT = ΔVT1 + ΔVT2 + ΔVT3; Δf (%) versus time (a.u.), annotated "Not captured in previous work"]
C. Zhou, et al., IRPS 2014
![Page 43: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/43.jpg)
BTI Recovery Model using Superposition
• Stress model: $t^n$ (power law)
• Recovery model derived from superposition property: $\Delta V_{T,\mathrm{recovery}}(t) = t^n - (t - t_0)^n$
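The superposition idea can be sketched for a piecewise-constant stress trace. Each change in stress level at time t_i adds a term proportional to (t − t_i)^n, so a single stress pulse that ends at t0 reproduces t^n − (t − t0)^n. The per-level amplitudes and the exponent n = 0.2 are illustrative assumptions; how amplitude depends on VDD is not specified here.

```python
def delta_vt(t, steps, n=0.2):
    """Threshold shift at time t from a piecewise-constant stress trace.

    steps: list of (start_time, amplitude) pairs sorted by start_time, where
    amplitude is the stress prefactor that applies from start_time onward
    (0.0 means the stress is removed).
    """
    total = 0.0
    previous_amplitude = 0.0
    for start, amplitude in steps:
        if t <= start:
            break
        # Each change in stress level starts its own power-law term.
        total += (amplitude - previous_amplitude) * (t - start) ** n
        previous_amplitude = amplitude
    return total


single_pulse = [(0.0, 1.0), (5.0, 0.0)]   # stress from t = 0, removed at t0 = 5
during_stress = delta_vt(4.0, single_pulse)
after_release = delta_vt(8.0, single_pulse)
print(during_stress, after_release)
```

Because the trace is only a list of level changes, a long DVFS trace costs one pass over its transitions per evaluation, which fits the short-runtime rationale given for the superposition method.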
![Page 44: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/44.jpg)
Translating VT Shift to Delay Shift
• ROSC mimics logic path
• Translate ΔVT to pull-up, pull-down delay
[Figure: two panels, pull-down delay and pull-up delay. Each plots normalized delay and normalized ΔVT versus normalized VDD (0.6 to 1.0); HSPICE, TT, 60°C]
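The talk derives the ΔVT-to-delay translation from HSPICE simulation. As a rough stand-in, the sketch below uses the alpha-power-law delay model, delay ∝ VDD / (VDD − VT)^α, to show why the same threshold shift costs more delay at low supply voltage. The model choice, the value α = 1.3, and the voltages are assumptions, and the model is only meaningful for VDD above VT.

```python
def gate_delay(vdd, vt, alpha=1.3):
    """Alpha-power-law gate delay, proportional to VDD / (VDD - VT)^alpha."""
    return vdd / (vdd - vt) ** alpha


def delay_shift(vdd, vt, delta_vt, alpha=1.3):
    """Fractional delay increase caused by a threshold shift delta_vt."""
    return gate_delay(vdd, vt + delta_vt, alpha) / gate_delay(vdd, vt, alpha) - 1.0


# Same 10 mV shift evaluated at nominal and at reduced supply (illustrative values).
shift_nominal = delay_shift(vdd=1.0, vt=0.35, delta_vt=0.01)
shift_low_vdd = delay_shift(vdd=0.6, vt=0.35, delta_vt=0.01)
print(f"delay shift at 1.0 V: {shift_nominal:.2%}")
print(f"delay shift at 0.6 V: {shift_low_vdd:.2%}")
```

The growing sensitivity as VDD approaches VT is the reason aging that is tolerable at nominal voltage can consume a much larger share of the timing margin near threshold.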
![Page 45: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/45.jpg)
Android Development Board for Collecting DVFS Traces
• VDD and operating frequency collected in real time
• Navigating websites, running benchmark applications
Test setup:
– System: Samsung Exynos 5410 SoC
– Processor: ARM Cortex A15
– Process: 28nm
– Operating system: Android v 4.2.2
– Linux kernel: v 3.4.5
– Frequency: 0.8 – 1.8 GHz
– Voltage: 0.9 – 1.25 V
– DVFS meas.: National Instr. DAQ
– Sampling frequency: 1000 samples per second
![Page 46: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/46.jpg)
Sample Waveform and Estimated Frequency Shift
• High VDD duration: Freq. degrades with time
• Low VDD duration: Freq. shift dips and then recovers
[Figure: Amazon.com trace, 0 to 6 s. Normalized VDD versus time, and estimated frequency shift (%) versus time with high VDD stress and low VDD recovery intervals marked; annotated value: -0.90%]
![Page 47: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/47.jpg)
Applying Model to Other DVFS Traces
• Worst case frequency dip
– 3D-raytrace: Δf=1.0% at t=6s when VDD drops by 29% after staying in high VDD mode for 5.8s
[Figure: estimated frequency shift for the Sina.com, Google.com, NYTimes.com, and Amazon.com traces]
![Page 48: Circuit Reliability: Mechanisms, Monitors, and Effects in ...web.cse.ohio-state.edu/~teodorescu.1/workshops/... · Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold](https://reader034.fdocuments.us/reader034/viewer/2022042216/5ebeb82c56d95977277537e7/html5/thumbnails/48.jpg)
Summary
• Power wall (2000) → Variability wall (2010) → Reliability wall (2020)
• Example: NTV + RDF + BTI
• Aging sensor deployed for the first time in a commercial processor (IBM z systems)
• Per-Core DVFS with sub-microsecond ramp time becoming a standard feature in new processors
• Turbo boost + NTV: Best of both worlds in terms of power and performance, but presents new reliability challenges