Analyzing Circuit-aware Microarchitectural Reliability

Taniya Siddiqua, Paul Lee ([email protected], [email protected]), University of Virginia, Charlottesville


Page 1: Analyzing Circuit-aware Microarchitectural Reliability

Analyzing Circuit-aware Microarchitectural Reliability

Taniya Siddiqua, Paul Lee

[email protected], [email protected]

University of Virginia, Charlottesville

Page 2: Analyzing Circuit-aware Microarchitectural Reliability

Motivation

[Figure: Hard Errors (EM, TC, SM, TDDB, NBTI) and Transient Faults; axis labels: Transistor Size, Time]

Page 3: Analyzing Circuit-aware Microarchitectural Reliability

Problem Description

Architects address reliability at architecture-level granularity.

The points of focus are architectural structures, e.g. caches and ALUs.

Reliability predictions are circuit-agnostic.

There is a potential gap between architecture-level and circuit-level reliability estimation.

Page 4: Analyzing Circuit-aware Microarchitectural Reliability

Problem Description

We:

Show that circuit-level granularity affects architecture-level reliability simulations.

Examine two hard errors, NBTI (Negative Bias Temperature Instability) and TDDB (Time-Dependent Dielectric Breakdown), at the architecture and circuit levels on an ALU.

Determine how the NBTI and TDDB effects on the ALU scale with technology, down to 22 nm.

Propose an NBTI-aware ALU design that uses architecture-level as well as circuit-level optimizations.

Page 5: Analyzing Circuit-aware Microarchitectural Reliability

NBTI – A quick guide

A key reliability issue for P-channel MOS (PMOS) devices.

Affects MOS devices stressed with negative gate voltages.

Manifests as an increase in threshold voltage and a decrease in drain current.

Consequently the circuit slows down, which puts timing constraints at risk.

Good news: recovery starts as soon as the stress is removed.

Page 6: Analyzing Circuit-aware Microarchitectural Reliability

Architecture-level Reliability Simulation

We simulate:

A 2-wide issue core with 2 INT ALUs.

SimpleScalar 3.0 to model processor behavior.

Wattch and HotSpot to simulate power and temperature behavior, respectively.

We estimate the lifetime of the 1st INT ALU.

ALU lifetimes are projected from the MTTF for NBTI.

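[Equations: (1) the threshold-voltage shift under stress, ΔV_ts, as a function of q, t_ox, K, C_ox, (V_gs − V_t), E_0, T_0, E_a/kT and t_stress^0.25; (2) the shift after recovery, which scales ΔV_ts by a factor depending on t_ox, T_0, E_a/kT, t_stress and t_rec; (3) MTTF_NBTI, expressed through 1/V_gs and e^(E_a,NBTI/kT)]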
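The MTTF expression on this slide carries an Arrhenius-style temperature term, e^(E_a/kT). As a minimal sketch of how such a term scales a lifetime estimate with temperature, the snippet below keeps only that factor and treats every other term as constant. The function names and any activation energy passed in are assumptions for illustration, not values from the slides.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_mttf_ratio(temp_k, ref_temp_k, activation_energy_ev):
    """Return MTTF(temp_k) / MTTF(ref_temp_k), keeping only exp(Ea / kT).

    Assumption: all non-temperature terms (e.g. the Vgs term) are equal at
    both operating points, so they cancel in the ratio.
    """
    if temp_k <= 0 or ref_temp_k <= 0:
        raise ValueError("temperatures must be positive kelvin values")
    inv_kt = 1.0 / (BOLTZMANN_EV_PER_K * temp_k)
    inv_kt_ref = 1.0 / (BOLTZMANN_EV_PER_K * ref_temp_k)
    return math.exp(activation_energy_ev * (inv_kt - inv_kt_ref))


def project_lifetime(ref_lifetime_years, temp_k, ref_temp_k, activation_energy_ev):
    """Scale a reference lifetime to a different average temperature."""
    ratio = arrhenius_mttf_ratio(temp_k, ref_temp_k, activation_energy_ev)
    return ref_lifetime_years * ratio
```

With a positive activation energy, a hotter ALU gives a ratio below 1 (shorter projected lifetime) and a cooler one gives a ratio above 1.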

Page 7: Analyzing Circuit-aware Microarchitectural Reliability

Circuit-level Reliability Simulation

We:

Use a Kogge-Stone adder circuit to represent the ALU.

Take the average temperature of the 1st ALU from the architecture-level reliability simulation and feed it to the Cadence framework.

Calculate stress and recovery times from the utilization pattern obtained from the architecture-level reliability simulation.

Calculate lifetime as the time at which the circuit delay has increased by 25% of the original delay.
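The stress and recovery times come from the ALU utilization pattern. A minimal sketch of that bookkeeping is shown below, under the simplifying assumption that a busy cycle counts as stress and an idle cycle as recovery. The per-cycle trace format and the function names are hypothetical, not the authors' tooling.

```python
def stress_recovery_times(busy_trace, cycle_time_s):
    """Return (stress_seconds, recovery_seconds) for a per-cycle trace.

    busy_trace: iterable of truthy (ALU busy) / falsy (ALU idle) flags.
    Assumption: busy cycles stress the PMOS devices, idle cycles let them recover.
    """
    flags = [bool(flag) for flag in busy_trace]
    busy_cycles = sum(flags)
    idle_cycles = len(flags) - busy_cycles
    return busy_cycles * cycle_time_s, idle_cycles * cycle_time_s


def stress_fraction(busy_trace):
    """Fraction of cycles spent under stress (0.0 for an empty trace)."""
    flags = [bool(flag) for flag in busy_trace]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)
```

In a real flow the trace would come from the architecture-level simulator, and the stress condition would depend on the actual gate voltages seen by each device rather than on ALU activity alone.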

Page 8: Analyzing Circuit-aware Microarchitectural Reliability

Comparison of Approaches

| | NBTI | TDDB |
| --- | --- | --- |
| Architecture-level Simulation | Lifetime: 5.7 Yrs | ? |
| Circuit-level Simulation | Lifetime: 7 Yrs | ? |

Page 9: Analyzing Circuit-aware Microarchitectural Reliability

Scaling Effect

We:

Show the scaling effect at 65 nm, 45 nm, 32 nm and 22 nm.

Show the NBTI-induced output delay increase after 7 years at each technology node: 65 nm (25%), 45 nm (27%), 32 nm (31%), 22 nm (46%).

Conclude that an NBTI-aware ALU design is required.

Page 10: Analyzing Circuit-aware Microarchitectural Reliability

NBTI-aware ALU Design

We:

Determine that 50% of the operands in the SPEC2000 INT benchmarks are of 16-bit size.

Partition the 64-bit ALU into four 8-bit and two 16-bit independent blocks to support 8-, 16-, 32- and 64-bit operations.

Aim to use idle time and narrow-width operands to increase the recovery time of the PMOS devices.

Use power gating.

Use a round-robin mechanism so that all blocks of the ALU experience equal recovery time.

After 7 years the delay increase is only 10% (versus 25% without the optimizations), a 60% improvement over the non-NBTI-aware ALU.

Tradeoff!
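The partitioning and round-robin idea can be sketched as follows. This is an illustrative model only: the block names, the set of block combinations allowed for each operation width, and the per-cycle recovery counter are all assumptions, not the design presented in the slides.

```python
from itertools import cycle

# Hypothetical block names: four 8-bit and two 16-bit blocks (64 bits total).
BLOCK_WIDTHS = {"b8_0": 8, "b8_1": 8, "b8_2": 8, "b8_3": 8, "b16_0": 16, "b16_1": 16}

# Assumed block combinations able to serve each operation width.
CONFIGS = {
    8: [("b8_0",), ("b8_1",), ("b8_2",), ("b8_3",)],
    16: [("b16_0",), ("b16_1",), ("b8_0", "b8_1"), ("b8_2", "b8_3")],
    32: [("b16_0", "b16_1"), ("b8_0", "b8_1", "b8_2", "b8_3")],
    64: [tuple(BLOCK_WIDTHS)],
}


class RoundRobinAlu:
    """Rotates work across blocks; unused blocks are power-gated and recover."""

    def __init__(self):
        self._next_config = {width: cycle(cfgs) for width, cfgs in CONFIGS.items()}
        self.recovery_cycles = {name: 0 for name in BLOCK_WIDTHS}

    def issue(self, operand_bits):
        """Pick blocks for one operation and credit recovery to the rest."""
        if not 1 <= operand_bits <= 64:
            raise ValueError("operand width must be between 1 and 64 bits")
        # Smallest supported operation width that fits the operand.
        width = next(w for w in sorted(CONFIGS) if operand_bits <= w)
        active = set(next(self._next_config[width]))
        for name in BLOCK_WIDTHS:
            if name not in active:
                self.recovery_cycles[name] += 1  # gated off for this cycle
        return active
```

Issuing four 8-bit operations in a row touches each 8-bit block once, so every 8-bit block ends up with the same number of recovery cycles, which is the balancing effect the round-robin mechanism aims for.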

Page 11: Analyzing Circuit-aware Microarchitectural Reliability

TDDB – A quick guide

The gate dielectric wears down over time due to the electric field, and failure occurs when there is a short through the gate oxide.

Ultra-thin gate oxide breakdown is highly dependent on temperature, but also dependent on Vgs.

Page 12: Analyzing Circuit-aware Microarchitectural Reliability

[Figure: The Split. Categories: Less Than 0.55V, Greater Than 0.55V; shares shown: 55%, 45%]

Circuit-level Reliability Simulation

We:

Use Pin to sample the inputs used when running gzip, and derive an input pattern from those samples.

Use the Cadence Spectre simulator.

Use a Kogge-Stone adder circuit to represent the ALU.

Take the average temperature of the 1st ALU from the architecture-level reliability simulation and feed it to the Cadence framework.

Extract Vgs from every device in the Kogge-Stone adder.

[Figure: Input Distribution. Series: inputs1, inputs2; x-axis: bit number (0 to 30); y-axis: # of times high (0 to 3,000,000)]

[Figure: 65nm Vgs Distribution. x-axis: Vgs; y-axis: Number of Devices]
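The input distribution on this slide counts, per bit position, how often that bit is high across the sampled operands. A minimal sketch of that count is given below, assuming the samples are already available as Python integers and that operands are 32 bits wide. Both are assumptions; the function names are hypothetical and the Pin-side collection is not shown.

```python
def bit_high_counts(samples, width=32):
    """Count how many samples have each bit set (index 0 = least significant)."""
    mask = (1 << width) - 1
    counts = [0] * width
    for value in samples:
        value &= mask  # keep only the low `width` bits (also handles negatives)
        for bit in range(width):
            counts[bit] += (value >> bit) & 1
    return counts


def bit_high_probability(samples, width=32):
    """Per-bit fraction of samples in which the bit is high."""
    total = len(samples)
    if total == 0:
        return [0.0] * width
    return [count / total for count in bit_high_counts(samples, width)]
```

The per-bit probabilities could then drive the choice of a representative input pattern for the circuit simulator.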

Page 13: Analyzing Circuit-aware Microarchitectural Reliability

Comparison of Approaches

| | NBTI | TDDB |
| --- | --- | --- |
| Architecture-level Simulation | Lifetime: 5.7 Yrs | Lifetime: 5.09 Yrs |
| Circuit-level Simulation | Lifetime: 7 Yrs | Lifetime: 5.09 Yrs |
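The size of the gap between the two approaches follows directly from the lifetimes in the comparison. A small sketch of that arithmetic, expressing the gap relative to the architecture-level estimate (the choice of baseline and the names are assumptions):

```python
def relative_gap(arch_years, circuit_years):
    """Gap between the two estimates as a fraction of the architecture-level one."""
    if arch_years <= 0:
        raise ValueError("architecture-level lifetime must be positive")
    return (circuit_years - arch_years) / arch_years


# (architecture-level, circuit-level) lifetimes in years, from the comparison.
LIFETIMES_YEARS = {"NBTI": (5.7, 7.0), "TDDB": (5.09, 5.09)}

gaps = {name: relative_gap(arch, circ) for name, (arch, circ) in LIFETIMES_YEARS.items()}
```

For NBTI the circuit-level lifetime is about 23% longer than the architecture-level one; for TDDB the two estimates coincide.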

Page 14: Analyzing Circuit-aware Microarchitectural Reliability

Scaling Effect

We:

Measured Vgs at each technology node; the effect of temperature still needs to be investigated.

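| Group | Statistic | 65 nm | 45 nm | 32 nm | 22 nm |
| --- | --- | --- | --- | --- | --- |
| Vgs < Vdd/2 | Min Vgs | 0 V | 0 V | 0 V | 0 V |
| Vgs < Vdd/2 | Max Vgs | 0.255177 V | 0.248489 V | 0.24522 V | 0.255226 V |
| Vgs < Vdd/2 | Mean Vgs | 0.032351 V | 0.033159 V | 0.034129 V | 0.035485 V |
| Vgs < Vdd/2 | StdDev Vgs | 0.077101 V | 0.077045 V | 0.077484 V | 0.076488 V |
| Vgs > Vdd/2 | Min Vgs | 1.099435 V | 0.999175 V | 0.89365 V | 0.789839 V |
| Vgs > Vdd/2 | Max Vgs | 1.1 V | 1 V | 0.9 V | 0.8 V |
| Vgs > Vdd/2 | Mean Vgs | 1.099834 V | 0.99764 V | 0.899567 V | 0.797425 V |
| Vgs > Vdd/2 | StdDev Vgs | 0.000117 V | 0.000181 V | 0.000335 V | 0.001789 V |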
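The statistics above split the per-device Vgs values at Vdd/2 and summarize each group. A minimal sketch of that summary is shown below, assuming the extracted Vgs values are available as a list of floats. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the slides do not say which form was used.

```python
from statistics import mean, pstdev


def summarize_vgs(vgs_values, vdd):
    """Split Vgs samples at Vdd/2 and return min/max/mean/stddev per group."""
    half_vdd = vdd / 2.0
    groups = {
        "below_half_vdd": [v for v in vgs_values if v < half_vdd],
        "above_half_vdd": [v for v in vgs_values if v >= half_vdd],
    }
    summary = {}
    for name, values in groups.items():
        if not values:
            summary[name] = None  # no devices fell into this group
            continue
        summary[name] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "stddev": pstdev(values),  # population standard deviation (assumed)
        }
    return summary
```

Calling it once per technology node, with that node's Vdd, would produce one column of the table for each group.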

[Figure: 45nm Vgs Distribution. x-axis: Vgs; y-axis: Number of Devices]

[Figure: 32nm Vgs Distribution. x-axis: Vgs; y-axis: Number of Devices]

[Figure: 22nm Vgs Distribution. x-axis: Vgs; y-axis: Number of Devices]

Page 15: Analyzing Circuit-aware Microarchitectural Reliability

Conclusion

For some problems, such as TDDB, the gap between architecture-level and circuit-level simulation is almost nonexistent.

For other problems, such as NBTI, the gap is significant, and combining both approaches can yield better designs.

Page 16: Analyzing Circuit-aware Microarchitectural Reliability

Thank you

Questions?