C2S2 Workshop

SRAM design challenges in nano-scale CMOS

Vivek De
Circuits Research Lab

Acknowledgment: M. Agostinelli, A. Farhang, F. Hamzaoglu, A. Keshavarzi, D. Khalil, M. Khellah, N-S. Kim, G. Pandya, S. Rusu, D. Somasekhar, Y. Wang, C. Webb, Y. Ye, K. Zhang


Outline

SRAM scaling trends

Emerging challenges

Circuit & design techniques

Research opportunities


SRAM cell area & Vmin scaling

• 0.5X cell area scaling at constant Vmin becoming difficult

• Density vs. Vmin trade-off impacts are significant


Array efficiency & cycle time scaling

• Array efficiency degrades with traditional scaling

• Array efficiency vs. cycle time trade-off impacts significant

[Figure: Array efficiency (0% to 100%) vs. technology generation (30 to 180 nm). Example projections assuming traditional scaling.]

[Figure: Frequency (1 to 7 GHz) vs. technology generation (30 to 180 nm). Example projections assuming traditional scaling.]


Memory latency & LLC power density

[Figure: Memory latency (clocks, log scale 1 to 1000) vs. frequency (100 to 10000 MHz). Assume: 50ns memory latency.]

[Figure: Power density (Watts/cm², log scale 1 to 100) for logic and memory at 0.25μ, 0.18μ, 0.13μ, 0.1μ.]

• Memory latency demands bigger last level cache (LLC)

• Cache is more energy-efficient than logic
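The first bullet rests on simple arithmetic: a fixed memory latency in nanoseconds costs more core clocks as frequency rises. A minimal sketch, assuming the 50ns latency stated on the slide; the function name is illustrative and not from the talk.

```python
def latency_in_clocks(latency_ns: float, freq_mhz: float) -> float:
    """Convert a fixed memory latency into core clock cycles."""
    # cycles = time * frequency; ns * MHz = 1e-9 s * 1e6 Hz = 1e-3 cycles
    return latency_ns * freq_mhz / 1000.0


# Sweep the frequency range shown on the slide's x-axis.
for freq_mhz in (100, 1000, 10000):
    clocks = latency_in_clocks(50, freq_mhz)
    print(f"{freq_mhz} MHz -> {clocks:g} clocks")
```

Under this assumption, every 10X increase in frequency makes the same miss 10X more expensive in clocks, which is the pressure toward a larger LLC.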


LLC integration trends

[Figure: Cache % of total area (0% to 100%) vs. technology generation (1u, 0.5u, 0.25u, 0.13u, 65nm); data labels: 486, Pentium®, Pentium® III, Pentium® 4, Pentium® M. Panels: desktop & mobile processors; server processors.]

• LLC area approaching 50% in desktop & mobile processors

• LLC area approaching 80% in server processors


SRAM cell design trends

• Improve CD control by unidirectional poly

• Relax critical layer patterning requirements

• Optimizing design rules is key

• Shorter bitline enables better cycle time and/or array efficiency

• Full metal wordline with wider pitch achieves better RC

Cell on 90nm (1 um²)

Cell on 65nm (0.57 um²)

0.46 x 1.24 um

IEDM'02


Cell transistor scaling challenges

[Figure: Mean number of dopant atoms (log scale, 10 to 10000) vs. year (1980 to 2010); device labels: PMOS, NMOS. Panels: random dopant fluctuation; oxide charge fluctuation; line edge roughness.]

• Cell transistor ratioing – width, length, Vt

• Width variations & defects due to diffusion notches

• Gate dielectric leakage – cell failure & bitline leakage impacts

• Narrow width device performance & leakage

• Fin dimension variations in trigate/FinFET

• Area impact of width-based transistor ratioing in trigate/FinFET


Array Vmin scaling challenges

[Figure: Shrinking voltage range between VMIN and VMAX on a 0.0 to 1.6 volts axis; limiters shown: cache fail rate, performance, SER, total power, reliability.]

[Figure: Cumulative distribution (0.14% to 99.9%) of Vmin (V) for increasing array size (1X, 2X, 3X).]

• Active Vmin limits – read failure, write failure, access failure

• Standby Vmin limits – data retention, SER

• Redundancy & cell size – density vs. Vmin knob

• Vmin degrades over time due to transistor aging

• Power-limited, multi-core processors with single Vcc
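One way to see why array Vmin worsens with increasing array size: if each cell fails independently with probability p at a given voltage, an array of N cells with no redundancy contains at least one failing cell with probability 1 - (1 - p)^N. The sketch below assumes that independence model; the per-cell failure probability and the array sizes are illustrative values, not data from the talk.

```python
import math


def array_fail_prob(p_cell: float, n_cells: int) -> float:
    """P(at least one failing cell), assuming independent cells, no redundancy."""
    # 1 - (1 - p)^N, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_cells * math.log1p(-p_cell))


P_CELL = 1e-9            # hypothetical per-cell failure probability at some Vcc
BASE_CELLS = 8 * 2**20   # hypothetical 1X array of 8 Mbit

for scale in (1, 2, 3):  # 1X, 2X, 3X array sizes, as in the figure legend
    prob = array_fail_prob(P_CELL, scale * BASE_CELLS)
    print(f"{scale}X array: P(fail) = {prob:.4f}")
```

Holding the array failure probability constant as N grows requires a lower per-cell failure probability, which in this model means a higher supply voltage, more redundancy, or a larger cell.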


Erratic bit failures

Increased sampling

• Need more complex error detection & correction schemes

• Dynamic Pellston engine for cache line disabling

• Additional density & performance impacts


Gate dielectric leakage impacts

Stress aggravates impact

Shows 1/f behavior

Rg


Performance impact of bitline leakage

Courtesy: K. Agawa et al., JSSC, May 2001

• Reduced effective cell current

• Negative bitline swing development

Need to use bitline leakage compensation & reduction techniques
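Both effects can be illustrated with a first-order model, which is an assumption here and not taken from the slide: in the worst case, the unaccessed cells on a column leak against the accessed cell, so the current available to develop differential swing is I_cell - (N - 1) * I_leak, and the time to reach a target swing is C * dV / I_eff. All names and numbers below are hypothetical.

```python
from typing import Optional


def effective_cell_current(i_cell_ua: float, i_leak_ua: float,
                           cells_per_bitline: int) -> float:
    """Worst-case effective read current in uA (first-order assumption)."""
    # The N-1 unaccessed cells are assumed to leak against the accessed cell.
    return i_cell_ua - (cells_per_bitline - 1) * i_leak_ua


def time_to_swing_ps(c_bl_ff: float, swing_mv: float,
                     i_eff_ua: float) -> Optional[float]:
    """t = C * dV / I. Units: fF * mV / uA = ps. None if swing never develops."""
    if i_eff_ua <= 0:
        return None  # leakage dominates: the swing develops in the wrong direction
    return c_bl_ff * swing_mv / i_eff_ua


# Hypothetical values: 20 uA cell current, 100 fF bitline, 100 mV target swing.
for i_leak in (0.0, 0.05, 0.2):
    i_eff = effective_cell_current(20.0, i_leak, cells_per_bitline=128)
    print(i_leak, round(i_eff, 2), time_to_swing_ps(100.0, 100.0, i_eff))
```

In this model, shortening the bitline (fewer cells per column) or reducing per-cell leakage both restore effective current.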


Dual-Vcc + dynamic sleep

Embedded level shifters for wordline & write drivers minimize area & power overhead

Push active Vmin limit to Vmax

Reduce idle power: NMOS sleep with passive clamp

Reduce idle power: PMOS sleep with passive clamp


Sleep transistor with active clamp


PVT variation & aging tolerance

Process (P) variation tolerance

Temperature (T) variation tolerance

Voltage (V) variation tolerance

Aging tolerance


Multi-Vcc cell & array design

• Multi-Vcc generation, control, distribution & timing overhead

• Differential noises among multiple Vcc's impact cell failure

• Partial write & pseudo-read support

Optimum voltage choices

Vmax: Max Vcc, Va: Min Vcc

Improved static noise margin (SNM) for read

Improved write noise margin (WNM)


Adaptive array biasing

• Bias generation & selection overheads

• Body effect scaling

• Extensions to trigate/FinFET

NMOS FBB + PMOS RBB: Access & write failures

NMOS RBB + PMOS FBB: Read failures

Courtesy: S. Mukhopadhyay et al., 2006 Symp. VLSI Circuits
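The bias-to-failure pairing on this slide can be written as a small lookup. This is only a sketch of the selection step: the enum, the dictionary, and `select_bias` are hypothetical names, and how the dominant failure type is detected on silicon is not covered here. FBB and RBB denote forward and reverse body bias.

```python
from enum import Enum


class Failure(Enum):
    READ = "read"
    WRITE = "write"
    ACCESS = "access"


# Pairing as stated on the slide: which body bias targets which failure type.
BIAS_FOR_FAILURE = {
    Failure.ACCESS: {"nmos": "FBB", "pmos": "RBB"},
    Failure.WRITE: {"nmos": "FBB", "pmos": "RBB"},
    Failure.READ: {"nmos": "RBB", "pmos": "FBB"},
}


def select_bias(dominant: Failure) -> dict:
    """Return the NMOS/PMOS body-bias setting for the dominant failure type."""
    return dict(BIAS_FOR_FAILURE[dominant])


print(select_bias(Failure.READ))
```

Because the two settings pull in opposite directions, an array cannot target read failures and access/write failures with the same bias at once, which is why bias generation and selection show up as overheads.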


Cell stability: static vs. dynamic

Static read failure (conservative)

Dynamic read failure (realistic)

Static write failure (optimistic)

Dynamic write failure (realistic)

Need to comprehend realistic dynamic stability in statistical failure rate analysis & array Vmin measurements


Exploit dynamic nature of stability

[Figure: Failure rate (normalized, log scale 1.0E-06 to 1.0E+00) vs. WL pulse (0 to 2000 ps); curves: Read, Write.]

[Figure: Array column schematic and read timing; labels: differential SA, SAE, PCH, BL, BL#, RYSEL0 to RYSEL7, VCC, Cell, WL0 to WL127, WL, READ, VL, VR.]

Wordline (WL) pulsing technique

Optimum WL pulse width to balance read, write & access failures

• Reduce bitline & sense-amp cap loading

• Optimize cell & array design for best dynamic failure rate

• Hierarchical bitline for improved dynamic failure rate
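The idea of an optimum WL pulse width can be sketched as a one-dimensional search. The two failure-rate curves below are made-up illustrative shapes, not the measured data from the talk: read failure is assumed to grow with pulse width, while write and access failure (lumped into one curve here) are assumed to shrink.

```python
import math


def read_fail(pw_ps: float) -> float:
    """Hypothetical read failure rate: a longer WL pulse gives more time to upset the cell."""
    return min(1.0, 1e-6 * math.exp(pw_ps / 250.0))


def write_access_fail(pw_ps: float) -> float:
    """Hypothetical write/access failure rate: a short WL pulse leaves too little time."""
    return min(1.0, math.exp(-pw_ps / 60.0))


def total_fail(pw_ps: float) -> float:
    return read_fail(pw_ps) + write_access_fail(pw_ps)


def best_pulse_width(widths_ps):
    """Pick the candidate pulse width with the lowest combined failure rate."""
    return min(widths_ps, key=total_fail)


candidates = range(100, 2001, 50)  # candidate WL pulse widths in ps
best = best_pulse_width(candidates)
print(best, total_fail(best))
```

With real data, the same search would run over measured or simulated dynamic failure rates at each Vcc, since the crossover point moves with voltage.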


Read-assist circuits

• Per-column sense-amp area overhead

• Power overhead of full bitline discharge & precharge

Courtesy: H. Pilo et al., 2006 Symp. VLSI Circuits


Performance & power improvement

• Optimize sense-amp (SA) for input offset, loading, speed & area

• AC offset improvement – bitline segmentation & SA strobe control

• Bitline decoupled sense-amp – reduce timing complexity

• SA offset compensation – cycle time vs. latency

• Asynchronous array design – latency vs. complexity

• Dynamic Intel® smart cache sizing
  – Predict cache usage requirement
  – Dynamically adapt effective cache size
  – Re-power on demand to deliver full performance


Research opportunities

• Dynamic multi-Vcc & other circuit techniques

• Vmin tracking for PVT variations & aging

• Resilient techniques for access failures

• Adaptive cache size, cycle time & latency

• Application of cache compression techniques

• Cache hierarchy, size & performance needs