Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale...

68
©2012 Intel Corporation Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer Intel Corporation IEEE Fellow [email protected] 1

Transcript of Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale...

Page 1: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012 ©2012 Intel Corporation

Microprocessor Design in the Nanoscale Era

Stefan Rusu Senior Principal Engineer

Intel Corporation

IEEE Fellow

[email protected]

1

Page 2: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Agenda

•Microprocessor Design Trends

•Process Technology Directions

•Active Power Management

• Leakage Reduction Techniques

•Packaging and Thermal Modeling

•Future Directions and Summary

2

Page 3: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Microprocessor Evolution

4004 Processor Westmere-EX Processor

Year 1971 2011

Transistors 2300 2.6 B

Process 10 µm 32 nm

Die area 12 mm2 513 mm2

Die photos not at scale

3

Page 4: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Scaling Trends

Transistor dimensions scale to improve performance, reduce power and reduce cost per transistor

M. Bohr

0.01

0.1

1

10

1970 1980 1990 2000 2010 2020

Mic

ron

s

45nm 65nm

32nm 0.7x every

2 years

Feature Size

22nm

4

Page 5: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Client Processor Trend: Integrated Graphics

• Ivy Bridge 22nm client processor with monolithic integrated graphics

• Up to 4 dual-threaded cores and 8MB L3 cache

• Dual channel DDR3 memory controller at 1600MT/s

• Integrated PCIe interface (16 Gen3 + 4 Gen2 + 4 DMI lanes)

- First Client CPU to support PCIe Gen3

• Three independent displays

• 1.4B transistors in 160mm2 die

S. Damaraju, ISSCC 2012

5

Page 6: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Client Processor Trend: Integrated WiFi RF

6

• Sensitive RF circuits integrated with 32nm ATOM and PCH

• Integration of traditional III-V RF components

-21dBm Power amp, and 34dBm T/R switch, 3.5dB NF LNA

H. Lakdawala, ISSCC 2012

Chip

TLine to ANT (50Ω)

Package50Ω Diff.

PA

Balun Filter

TX SWRX SW /

Matching Network

GmTX RF

RX RF

Page 7: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Server Processor Trends: More Cores

Server core count increases every generation,

while keeping within flat power budget

0

4

8

12N

um

ber

of

core

s

Tulsa

Tigerton

Dunnington

Nehalem-EX

Xeon® EX Processors

65nm 45nm 32nm

Westmere-EX

7

Page 8: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Server Processor Trends: More Cache

8

1

4

8

16

24

30

0

4

8

12

16

20

24

28

32

180nm 130nm 90nm 65nm 45nm 32nm

On-Die L3 Cache

[MB]

Cache size increases with every process generation

Xeon® EX Processors

Page 9: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Server Processors Power Trends

0.001

0.01

0.1

1

10

100

1000

1990 1995 2000 2005 2010

Pow

er

[W]

Year

Leakage ►

Total

Power

Active

Power

9

Page 10: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Voltage Scaling Has Slowed Down

0.1

1

10

'91 '93 '95 '97 '99 '01 '03 '05 '07 '09 '11 '13

Su

pp

ly V

olta

ge

[V

]

~0.7X Scaling

~0.95X Scaling

10

Page 11: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Agenda

•Microprocessor Design Trends

•Process Technology Directions

•Active Power Management

•Leakage Reduction Techniques

•Packaging and Thermal Modeling

•Future Directions and Summary

11

Page 12: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

30 Years of MOSFET Scaling

Gate Length: 1.0 mm 35 nm

Gate Oxide Thickness: 35 nm 1.2 nm

Operating Voltage: 4.0 V 1.2 V

1 mm

Dennard 1974 Intel 2005

Classical scaling ended in the early 2000s

due to gate oxide leakage limits M. Bohr, ISSCC 2009

12

Page 13: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

90 nm Strained Silicon Transistors

High Stress Film

NMOS

SiGe SiGe

PMOS

SiN cap layer SiGe source-drain

Tensile channel strain Compressive channel strain

Strained silicon provided increased drive currents,

making up for lack of gate oxide scaling M. Bohr, ISSCC 2009

13

Page 14: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

45 nm High-k + Metal Gate Transistors

65 nm Transistor 45 nm HK+MG

High-k + metal gate transistors

break through gate oxide scaling barrier

SiO2 dielectric Hafnium-based dielectric

Polysilicon gate electrode Metal gate electrode

M. Bohr, ISSCC 2009

14

Page 15: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

HK/MG Gate Leakage Reduction

HK+MG significantly reduces gate leakage

25x 1000x

K. Mistry, IEDM 2007

15

Page 16: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

6T SRAM Bit Cell Leakage Reduction

SRAM bit cell leakage reduced ~10x M. Bohr, ISSCC 2009

0

2

4

6

8

10

12

65nm 45nm

Norm

aliz

ed C

ell

Leaka

ge

I GATE

I OFF

I JUNCT

1.0V, 25C

10x

16

Page 17: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Traditional 2-D planar transistors form a conducting channel in the

silicon region under the gate electrode when in the “on” state

Silicon

Substrate

Oxide

Gate

Source

Drain

High-k

Dielectric

Traditional Planar Transistor

M. Bohr, 2011

17

Page 18: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Silicon

Substrate

Oxide

Source

Drain

Gate

3-D Tri-Gate transistors form conducting channels on three sides

of a vertical fin structure, providing “fully depleted” operation

22 nm Tri-Gate Transistor

18

Page 19: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

32 nm Planar Transistors 22 nm Tri-Gate Transistors

Gates Fins

M. Bohr, 2011

Transistor Scaling Trends

19

Page 20: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

0.6

0.8

1.0

1.2

1.4

1.6

1.8

2.0

2.2

0.5 0.6 0.7 0.8 0.9 1.0 1.1

32nm

Planar

Operating Voltage (V)

Transistor

Gate Delay (normalized)

45nm

Planar

22%

Faster

22%

Faster

Transistor Gate Delay

20

32nm planar transistors 22% faster than 45nm planar

D. Perlmutter, ISSCC 2012

Page 21: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

0.6

0.8

1.0

1.2

1.4

1.6

1.8

2.0

2.2

0.5 0.6 0.7 0.8 0.9 1.0 1.1

32nm

Planar

14%

Faster

Operating Voltage (V)

Transistor

Gate Delay (normalized)

22nm

Planar

45nm

Planar

22%

Faster

14%

Faster

22%

Faster

Transistor Gate Delay

21

22nm planar transistors would have been only 14% faster

Page 22: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

0.6

0.8

1.0

1.2

1.4

1.6

1.8

2.0

2.2

0.5 0.6 0.7 0.8 0.9 1.0 1.1

22nm

Tri-Gate

32nm

Planar

18%

Faster

37%

Faster

Operating Voltage (V)

Transistor

Gate Delay (normalized)

Transistor Gate Delay

22

22nm Tri-Gate transistors provide improved performance at high voltage and unprecedented 37% speedup at low voltage

Page 23: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Intel Transistor Leadership

2003 2005 2007 2009 2011

90 nm 65 nm 45 nm 32 nm 22 nm

SiGeSiGe

Invented

SiGe

Strained Silicon

2nd Gen.

SiGe

Strained Silicon

2nd Gen.

Gate-Last

High-k

Metal Gate

Invented

Gate-Last

High-k

Metal Gate

First to

Implement

Tri-Gate

23

Strained Silicon

High-k Metal Gate

Tri-Gate

Page 24: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Lithography Challenges

Initial Production

1000

100

10 ’89 ’91 ’93 ’95 ’97 ’99 ’01 ’03 ’05 ’07 ’09 ’11 ’13 ‘15

Feature size

13nm (EUVL)

Lithography Wavelength

193nm 248nm

Gap

nm

193 nm enhancements enable the 22 nm generation

24

Page 25: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Extreme Ultraviolet Lithography

•EUV lithography uses extremely short wavelength light

-Visible light – 400 to 700 nm

-DUV lithography – 193 and 248 nm

-EUV lithography – 13 nm

World’s First EUV Mask EUV Micro Exposure Tool

25

Page 26: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Layout Restrictions

26

65 nm Layout Style 32 nm Layout Style

Bi-directional features

Varied gate dimensions

Varied pitches

Uni-directional features

Uniform gate dimension

Gridded layout

M. Bohr, ISSCC 2009

Page 27: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

450mm in the Era of Complex Scaling: Must coordinate demand drivers,

technical requirements and resources

End-User

Demand Drivers

Integrated

IC Maker

Coordination

Equipment

& Materials

Development

University and Government

Support

Projected 2000 Wafer, circa 1975 (Gordon Moore, ISSCC ’03)

Page 28: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Within-Die Variations

Systematic

Die-to-Die Variations

Random Placement

of Dopant Atoms

Lens Aberrations Resist Thickness

Random

Process Variations

28

Page 29: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Voltage and Temperature Variations

• Voltage

-Chip activity change

-Current delivery—RLC

-Dynamic: ns to 10-100µs

-Within-die variation

• Temperature

-Activity & ambient change

-Dynamic: 100-1000µs

-Within-die variation

Temp(oC)

29

Page 30: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Impact on Design Methodology

Delay

Path Delay

Pro

babili

ty

Due to variations in:

Vdd, Vt, and Temp

Delay Target

# o

f P

ath

s

Delay Target

# o

f P

ath

s

Deterministic Probabilistic

Leakage Power

Fre

qu

en

cy Deterministic

Probabilistic

10X variation

~50% total power

Major paradigm shift from deterministic design to probabilistic / statistical design

30

Page 31: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

SRAM Cell Size Scaling

32 nm, 0.171 um2

45 nm, 0.346 um2

Memory density continues to double every 2 years

0.01

0.1

1

10

180 130 90 65 45 32 22

Cell

Are

a (

um

2)

Process technology [nm]

0.5x every 2 years

22 nm, 0.092 um2

31

Page 32: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Interconnect Trends

0

2

4

6

8

10

500 350 250 180 130 90 65 45 32 22

Nu

mb

er

of M

eta

l L

aye

rs

Technology Generation (nm)

Al Cu

32

Page 33: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

22nm Interconnects

• M1 to M8 cross-section

• M1-M6 use ultra-low-k ILD and self-aligned vias providing 13-18% capacitance reduction

• Cross-section of integrated MIM capacitor

• Enables capacitance density of >20fF/mm2

33

C. Auth, VLSI Symposium 2012

Page 34: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

On-chip Interconnect Trend

• Local interconnects scale with gate delay

• Global interconnects do not keep up with scaling

0.1

1

10

100

250 180 130 90 65 45 32

Relative delay

Feature size (nm)

Source: ITRS, 2001 Gate delay (FO4)

Local interconnect (M1,2)

Global interconnect

with repeaters

Global interconnect

without repeaters

34

Page 35: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Agenda

•Microprocessor Design Trends

•Process Technology Directions

•Active Power Management

•Leakage Reduction Techniques

•Packaging and Thermal Modeling

•Future Directions and Summary

35

Page 36: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Voltage and Frequency Scaling

1 - Fixed VDD, Frequency Scaling: Linear Power Reduction

2 - Fixed Frequency, VDD Scaling: Square Power Reduction

3 - Voltage and Frequency scaling: Cubic Power Reduction

Frequency

VDD VT VLOW

Data

Retention

Limit

VHI

sub-threshold

logic

VDD

+/-10%

Max Target

Frequency

Required

Frequency

Performance

Limit

1

2

3

J. Rosal,

ISSCC 2006

Reliability Limit

36

Page 37: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Memory and RF Vmin Reduction

37

• Write Assist circuit temporarily drops the array supply node to make it easier to write into the bit-cell

• Both Cache and Register Files use this technique to improve write Vmin in 22nm Ivy Bridge processor

WL (0 --> 1)

Write

Data ‘0’

Data#

0 --> 1

Data

1 --> 0

Write

Data# ‘1’

Shared

across

several cells

BL BL#

CVCC

22nm transistor and circuit improvements enable Vmin reduction of >100mV for Cache and 60mV for RF

S. Damaraju, ISSCC 2012

Page 38: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

NTV Pentium® Processor

NTV

EN

ERG

Y

EFFIC

EN

CY

HIGH

LOW

VOLTAGE ZERO MAX

~5x Demonstrated

IA-32 Core

Logic

Scan

RO

M

L1$-I L1$-D

Level Shifters + clk spine

1.1 mm

1.8

mm

S. Jain, ISSCC 2012

Normal operating range Subthreshold

Ultra-low Power

Energy Efficient

High Performance

280 mV 0.45 V 1.2 V

3 MHz 60 MHz 915 MHz

2 mW 10 mW 737 mW

1500 Mips/W 5830 Mips/W 1240 Mips/W

38

Page 39: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

D Q 0

1 S

Clk

En

D

REG

D Q

REG

Clk En

D

Clock Gating

• Save power by gating the clock when data activity is low

• Requires detailed logic validation

39

Page 40: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Core Power Management

Modulating the processor core voltage and frequency enables lower power states

Gerosa, A-SSCC 2008

40

Core voltage

Core clock

PLL

L1 caches

L2 caches

Wakeup time

Power

C0 HFM C0 LFM C1/C2 C4 C6

OFF OFF OFF

OFF OFF

partial flush off

active active

off flushed flushed

<1us <30us <100us

Page 41: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Multiple Voltage Domains

Multiple voltage domains minimize power consumption across the core and uncore areas

SMI SMI

QPI QPI QPI QPI

Fuse

Core Supply

Core Supply

Core Supply

Core Supply

Uncore Supply

Core Domain 0.85-1.1V variable

Un-Core Domain 0.9-1.1V

fixed

I/O Domain 1.1V fixed

Rusu, ISSCC 2009

41

Page 42: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Multiple Clock Domains

SMI SMI

QPI QPI QPI QPI

BCLK

IO PLLs

Un - - core PLL

Filter PLL

- -

IO DLLs

Core PLLs

Three primary clock domains: core, un-core, I/O

Total of 16 PLLs and 8 DLLs Rusu, ISSCC 2009

42

Page 43: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Agenda

•Microprocessor Design Trends

•Process Technology Directions

•Active Power Management

•Leakage Reduction Techniques

•Packaging and Thermal Modeling

•Future Directions and Summary

43

Page 44: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Subthreshold Leakage Trend

1.E-14

1.E-12

1.E-10

1.E-08

1.E-06

1.E-04

10 100 1000

Physical Gate Length (nm)

I Off (

A/u

m)

Intel 15nm

transistor

Intel 30nm

transistor

Intel 20nm transistor

Research data

in literature ( )

Production data

in literature ( )

44

Page 45: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Leakage Reduction Techniques

Stack Effect Body Bias Sleep Transistor

Vdd Vbp

Vbn -Ve

+Ve

Equal Loading Logic Block

2-3X reduction 2-1000X reduction

45

Page 46: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Leakage is a Strong Function of Voltage

46

Sub-threshold and gate leakage reduce with lower supply voltage

0102030405060708090

100

0 0.3 0.6 0.9 1.2 1.5

No

rma

lize

d L

ea

ka

ge

Voltage (V)

130nm process

Subthreshold

Leakage

Gate Leakage

Page 47: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Cache Sleep and Shut-off Modes

47

Active Mode Sleep Mode Shut-off Mode

0V

1.1V

Vo

lta

ge

0V

250mV

520mV Virtual

VSS

2x lower leakage

2x lower leakage

X X

Sub-array

X

Sub-array

Virtual VSS

Block

Select

Sub-array

Sleep Bias

Shut off

S. Rusu, US Pat. 7,657,767

Page 48: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Leakage Shut-off Infrared Images

48

Leakage reduction ► 3W (8MB) 5W (4MB)

16MB part 8MB part 4MB part

16MB in

sleep mode

8MB

sleep

8MB

shut-off

4MB

sleep

12MB

shut-off

Page 49: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Cache Dynamic Shut-off

49

Normal Operation

In the full-load state, all 16 ways are enabled (green)

Cache-by-Demand Operation

Under idle or low-load states, cache ways are dynamically flushed out and put in shut-off mode (red)

Tag ►

Data ►

Data ►

Way ► 15 14 • • 3 2 1 0

Controller

15 14 • • 3 2 1 0

Controller

Page 50: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012 50

Three PMOS sleep transistor groups for sub-array leakage reduction

Y. Wang, ISSCC 2009

Cache Leakage Management

Page 51: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Cache Leakage Reduction Benefit

51

Leakage management circuit reduces sub-array leakage by 58%

Page 52: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Leakage Mitigation: Long-Le Transistors

52

• All transistors can be

either nominal or long-Le

• Most library cells are

available in both flavors

• Long-Le transistors are

~10% slower, but have

3x lower leakage

• All paths with timing slack

use long-Le transistors

• Initial design uses only

long channel devices

Nominal Le

Long Le

(Nom+10%) S. Rusu, ISSCC 2006

Page 53: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Long-Le Transistors Usage Map

53

90-100

80-90

70-80

60-70

50-60

Massive long-channel usage in uncore reduces leakage

Nehalem-EX

long-channel

device usage

[percent]

SMI SMI

QPI QPI QPI QPI

Page 54: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Power & Leakage Breakdown

54

Power Breakdown Leakage Breakdown

Vcore

54.6%

Vuncore

33.4%

Vio

11.2% Vpll

0.8%

Active

84%

Leakage

16%

Clock gating

Run uncore at 0.9V

Long channel device usage:

58% cores, 85% uncore

Reduction

techniques

S. Rusu, ISSCC 2009

Nehalem-EX 45nm example

Page 55: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Core and Cache Recovery Example

Defective core and cache slices can be disabled in horizontal pairs

QPI0 QPI1

Disabled

Core5

Core6

Core7

SMI

Disabled

Core2

Core1

Disabled Disabled

System Interface

SMI

QPI2 QPI3

Core0

S. Rusu, ISSCC 2009

55

Page 56: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

• Disabled cores ► Power gated

• Disabled cache slices ► All major arrays in shut-off

Minimize Leakage in Disabled Blocks

0V

0.9V

Vo

lta

ge

Active Sleep Shut-off

Virtual VCC

35% Leakage

Reduction

Active

SRAM array

Sleep/ Shut-off

Core

Active/

Shut-off

83%

0V

0.85V

Vo

lta

ge

Active Shut-off Virtual VCC

40x Leakage

Reduction

S. Rusu, ISSCC 2009

56

Page 57: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Core/Cache Recovery – Infrared Image

All cores and cache slices are enabled

S. Rusu, ISSCC 2009

57

Page 58: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Core/Cache Recovery – Infrared Image

Shut-off 2 cores (top row) and 2 cache slices (bottom row)

Disabled blocks are clock and power gated

S. Rusu, ISSCC 2009

58

Page 59: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Agenda

•Microprocessor Design Trends

•Process Technology Directions

•Active Power Management

•Leakage Reduction Techniques

•Packaging and Thermal Modeling

•Future Directions and Summary

59

Page 60: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Microprocessor Package Evolution

•1971 – 4004 Processor

- 16-pin ceramic package

- Wire bond attach

- 750 kHz I/O

•2012 – Xeon® E5 Processor

- 2011-contact organic package

- Flip-chip attach

- 8.0 GHz I/O

60

Page 61: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Power Density Models

With increasing power density and large on-die caches, detailed, non-uniform power models are required

0

50

100

150

200

250

He

at F

lux (

W/c

m2

)

40

50

60

70

80

90

100

110

Te

mp

era

ture

(C

)

Power Map On-Die Temperature

61

Page 62: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Thermal Modeling

Simulated power density Infrared emission microscope measurement

D. Genossar and N. Shamir “Intel® Pentium® M Processor Power Estimation,

Budgeting, Optimization and Validation”, Intel Technology Journal 5/2003

62

Page 63: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Thermal Sensors • Multiple temperature sensors

-One in each core hot spot

-One in the die center

• Temperature information is available through PECI bus for system fan management

QPI0 QPI1

SMI SMI

QPI2 QPI3

63

Page 64: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Power Management Unit

External Voltage Regulator Control

Power Management

Unit

Pow

er

Gate

Pow

er

Gate

Pow

er

Gate

Pow

er

Gate

Pow

er

Gate

Pow

er

Gate

Pow

er

Gate

Pow

er

Gate

Power Gates Control

Core 0

Core 1

Core 2

Core 3

Core 7

Core 6

Core 5

Core 4

Sensors Sensors Sensors Sensors

Sensors Sensors Sensors Sensors

Sensors

PMU controls processor voltage and frequency based on compute loading and thermal data

64

Page 65: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Agenda

•Microprocessor Design Trends

•Process Technology Directions

•Active Power Management

•Leakage Reduction Techniques

•Packaging and Thermal Modeling

•Future Directions and Summary

65

Page 66: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Future Directions

• 2D mesh network with multiple Voltage / Frequency islands

• Communication across islands achieved through FIFOs

Ogras (CMU), DAC 2007

66

Page 67: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Fine Grain Power Management

Cores with critical tasks Freq = f, at Vdd TPT = 1, Power = 1

Non-critical cores Freq = f/2, at 0.7*Vdd TPT = 0.5, Power = 0.25

Temporarily shut down TPT = 0, Power = 0

0

VDD

Vo

lta

ge

Hi-Act Lo-Act Shut-off

0.7*VDD

0V

Pwr=1 Pwr=¼ Pwr=0

f f

f

f

f f

f/2

f/2

f/2

f/2

f/2

0

0

0

0 0

f 0 f/2 0

0

f/2

f/2

0

0

f

f/2

0

0 Permanently disabled TPT = 0, Power = 0

25-core processor example:

67

Page 68: Microprocessor Design in the Nanoscale Era · 2016-02-17 · Microprocessor Design in the Nanoscale Era Stefan Rusu Senior Principal Engineer ... (16 Gen3 + 4 Gen2 + 4 DMI lanes)

Stefan Rusu – July 2012

Summary

• Moore’s Law has fueled the worldwide technology revolution for

over 40 years and will continue for at least another decade

-0.7x transistor dimension scaling every two years

-Tri-gate devices provide significant benefits

• Continued microprocessor performance improvement

depends on our ability to manage active power and leakage

-Clock and power gate un-used or disabled blocks

-Multiple voltage and clock domains

-Dynamic voltage and frequency adjustment

• Core and cache recovery enables multiple product options

-Disabled cores and cache slices are clock and power gated

68