A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache · 2006-04-24 · • 1 MB unified L2...

A Dual-Core Multi-Threaded Xeon® Processor with 16MB L3 Cache

Stefan Rusu
Intel Corporation
Santa Clara, CA

Intel and the Intel logo are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright © 2006, Intel Corporation.



Outline
• Processor Highlights
• 65nm Process Technology
• Block Diagram
• L3 Cache Overview
• Sleep and Shut-off Modes
• Long-Le Transistor Usage
• Clock and Power Distribution
• Package Details
• DFT/DFM Features
• Thermal Sensors
• Frequency Shmoo
• Summary


Processor Highlights
• Dual core, four threads
• 1 MB unified L2 per core
• 16 MB unified L3
• 435 mm² die size
• 1.328 B transistors
• 121 M transistors per core
• 3.4 GHz at 1.25 V and 150 W TDP
• 800 MT/s 3-load front-side bus interface
• Plugs in existing platforms

Largest cache and transistor count for an x86 processor

[Figure: die photo. Labeled regions: 16MB L3, TAG (two blocks), 1MB L2 (two blocks), Core 0, Core 1, FSB TOP, FSB BOT, Control Logic.]
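The transistor counts on this slide allow a quick budget check. The sketch below is only illustrative arithmetic on the quoted figures; the 6-transistor bit cell is an assumption that the slide does not state, and the function name is hypothetical.

```python
# Figures quoted on the Processor Highlights slide.
TOTAL_TRANSISTORS = 1.328e9
TRANSISTORS_PER_CORE = 121e6
NUM_CORES = 2
L3_BYTES = 16 * 2**20

# Assumption (not stated on the slide): a conventional 6-transistor SRAM bit cell.
TRANSISTORS_PER_BIT = 6


def transistor_budget():
    """Split the total transistor count into cores and everything else."""
    cores = NUM_CORES * TRANSISTORS_PER_CORE
    outside_cores = TOTAL_TRANSISTORS - cores
    # Data bits only: excludes ECC, tag, and redundancy columns.
    l3_data_cells = L3_BYTES * 8 * TRANSISTORS_PER_BIT
    return {
        "cores": cores,
        "outside_cores": outside_cores,
        "l3_data_cells": l3_data_cells,
    }


if __name__ == "__main__":
    for name, count in transistor_budget().items():
        print(f"{name}: {count / 1e6:.0f} M transistors")
```

Under the 6T assumption, the L3 data bits alone would account for roughly 805 M of the 1.328 B transistors, which is consistent with the cache dominating the die.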


65nm Process Technology

• 1.2 nm gate oxide
• NiSi for low resistance
• Second generation strained silicon
• 8 Cu interconnect layers with low-K CDO dielectric

[Figure: interconnect stack cross-section showing metal layers M1 through M8.]


Block Diagram

• Shared 16MB L3 cache
– Better efficiency - one core can use more than half of the total cache
– No need for coherency traffic between caches
• Support for Intel Virtualization Technology
• Hyper-Threading Technology
• Enhanced Intel SpeedStep® Technology

[Figure: block diagram. Blocks shown: Core 0 (1M L2), Core 1 (1M L2), Caching Front-Side Bus Controller, 16MB L3 Cache, External Front-Side Bus Interface, 3-load System Bus.]


L3 Cache Floorplan
• 0.624 µm² bit cell
• 0.75 Watts / MB average power
• Only 0.8% of all array blocks powered-up for each access
• 256 regular 64kB sub-arrays
• 32 redundancy 68kB sub-arrays

[Figure: L3 floorplan. Labels: TAG (1st 8K sets), TAG (2nd 8K sets), two 8M halves (8M0, 8M1), redundancy shift logic & horizontal clock spine, regular sub-array (32 bits), redundancy sub-array (34 bits). Legend: Repeaters; CLK Spine + Repeaters.]
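The floorplan numbers can be cross-checked with simple arithmetic. This is a rough sketch, assuming 1 kB = 1024 bytes and taking the 64kB and 68kB sub-array sizes at face value; the raw cell area covers data bits only and ignores ECC, tags, redundancy, and periphery. The function and constant names are hypothetical.

```python
# Figures quoted on the L3 Cache Floorplan slide.
BIT_CELL_AREA_UM2 = 0.624
POWER_PER_MB_W = 0.75
REGULAR_SUBARRAYS = 256
REGULAR_SUBARRAY_KB = 64
REDUNDANCY_SUBARRAYS = 32
REDUNDANCY_SUBARRAY_KB = 68


def l3_floorplan_estimates():
    """Derive capacity, raw bit-cell area, and average power from the slide."""
    capacity_mb = REGULAR_SUBARRAYS * REGULAR_SUBARRAY_KB / 1024
    data_bits = int(capacity_mb * 2**20 * 8)
    # 1 mm^2 = 1e6 um^2
    raw_cell_area_mm2 = data_bits * BIT_CELL_AREA_UM2 / 1e6
    avg_power_w = capacity_mb * POWER_PER_MB_W
    redundancy_mb = REDUNDANCY_SUBARRAYS * REDUNDANCY_SUBARRAY_KB / 1024
    return {
        "capacity_mb": capacity_mb,
        "raw_cell_area_mm2": raw_cell_area_mm2,
        "avg_power_w": avg_power_w,
        "redundancy_mb": redundancy_mb,
    }


if __name__ == "__main__":
    for name, value in l3_floorplan_estimates().items():
        print(f"{name}: {value:.3f}")
```

The 256 regular sub-arrays add up to exactly 16 MB, the data bit cells alone occupy about 84 mm² of the 435 mm² die, and 0.75 W/MB corresponds to about 12 W average for the full cache.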


L3 Sub-Array and Sleep Partitioning

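[Figure: sub-array organization and sleep partitioning. A sub-array is drawn as four banks (Bank 0 to Bank 3) around a mid-logic strip with a timer; each bank contains blocks blk0 to blk7, and a block is the sleep resolution. Regular sub-array: 128 columns; redundancy sub-array: 136 columns; 256 rows. Other labels: word-line driver, sense amp + write circuit, PMOS-sleep, NMOS-sleep, M2 (BL), M3 (WL).]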


Column Redundancy Features
• Data Array
– Two redundancy columns for each 290-bit chunk
– Repair up to two random defects in each cache line
• Tag Array
– One redundancy column for each 36-bit tag line
– Repair one random defect in each entry

[Figure: bit layouts. Data chunk: data and ECC fields in bits 0 to 287, followed by two redundancy columns (R) at bits 288 and 289. Tag line (bits 0 to 35): lower tag, upper tag, state, and ECC fields plus one redundancy column (R).]
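Column redundancy of this kind is commonly implemented by shifting logical columns past defective physical ones. The sketch below illustrates that general technique, not the actual circuit on this die; the 288 logical + 2 spare split of the 290-bit chunk is an assumption, and `build_column_map` is a hypothetical name.

```python
def build_column_map(n_logical, n_spare, defective):
    """Map each logical column to a defect-free physical column.

    Logical column i is steered to the i-th good physical column, so every
    column at or after a defect shifts over by one position per defect.
    """
    n_physical = n_logical + n_spare
    bad = set(defective)
    if any(col < 0 or col >= n_physical for col in bad):
        raise ValueError("defect outside the physical column range")
    good = [col for col in range(n_physical) if col not in bad]
    if len(good) < n_logical:
        raise ValueError("more defective columns than spare columns")
    return good[:n_logical]


if __name__ == "__main__":
    # Assumed split of a 290-bit chunk: 288 logical columns plus 2 spares.
    column_map = build_column_map(288, 2, defective={5, 100})
    print(column_map[4], column_map[5], column_map[99], column_map[287])
```

With two spares, any two defective columns in a chunk can be bypassed; a third defect raises an error, which mirrors the "up to two random defects" limit on the slide.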


L3 Cache Sleep and Shut-off Modes

[Figure: one sub-array shown in Active Mode, Sleep Mode, and Shut-off Mode. Voltage axis from 0V to 1.1V. Virtual VSS is marked at 0V, 250mV, and ~500mV across the three modes, with a "2x lower leakage" annotation at each step. Control signals shown: Block Select, Sleep Bias, Shutoff.]


Leakage Shut-off Infrared Images

[Figure: infrared images of two SKUs.]
• 16MB SKU: all 16MB in sleep mode
• 8MB SKU: 8MB in sleep mode, 8MB in shut-off mode

Shut-off feature reduces the leakage of the 8MB disabled sub-arrays by about 3W
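The 3W saving can be combined with the "2x lower leakage" annotation from the sleep and shut-off modes slide to back out the implied leakage of the disabled half. This is a rough sketch that assumes the 2x factor applies between sleep and shut-off mode and that the 3W figure is measured against sleep mode; neither assumption is confirmed by the slides.

```python
SHUTOFF_SAVING_W = 3.0   # slide: "about 3W" for the 8MB of disabled sub-arrays
SLEEP_TO_SHUTOFF = 2.0   # assumed: shut-off leaks 2x less than sleep


def implied_leakage(saving_w=SHUTOFF_SAVING_W, ratio=SLEEP_TO_SHUTOFF):
    """Return (sleep_w, shutoff_w) for the disabled sub-arrays.

    saving = sleep - sleep / ratio, so sleep = saving * ratio / (ratio - 1).
    """
    sleep_w = saving_w * ratio / (ratio - 1)
    shutoff_w = sleep_w / ratio
    return sleep_w, shutoff_w


if __name__ == "__main__":
    sleep_w, shutoff_w = implied_leakage()
    print(f"8MB in sleep mode: ~{sleep_w:.1f} W, in shut-off mode: ~{shutoff_w:.1f} W")
```

Under these assumptions the disabled 8MB would leak about 6 W in sleep mode and about 3 W in shut-off mode.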


Leakage Mitigation: Long-Le Transistors

• All transistors can be either nominal or long-Le

• Most library cells are available in both flavors

• Long-Le transistors are about 10% slower, but have 3x lower leakage

• All paths with timing slack use long-Le transistors

[Figure: nominal-Le device compared with long-Le device (Nom + 10%).]
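The rule "all paths with timing slack use long-Le transistors" can be sketched as a per-path check. This is a simplified illustration, assuming the whole path slows by the quoted 10% when converted; a real flow would decide per cell, and the function name is hypothetical.

```python
LONG_LE_DELAY_PENALTY = 0.10  # slide: long-Le devices are about 10% slower


def choose_device(path_delay_ps, slack_ps, penalty=LONG_LE_DELAY_PENALTY):
    """Pick long-Le when the added delay still fits inside the path's slack."""
    extra_delay_ps = path_delay_ps * penalty
    return "long-Le" if extra_delay_ps <= slack_ps else "nominal"


if __name__ == "__main__":
    # Hypothetical paths: (delay in ps, available slack in ps).
    for delay_ps, slack_ps in [(200, 25), (200, 10), (50, 6)]:
        print(delay_ps, slack_ps, choose_device(delay_ps, slack_ps))
```

Paths that absorb the slowdown get the 3x leakage reduction for free; only timing-critical paths keep nominal devices.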


Long-Le Transistors Usage Map

[Figure: die map of long-Le usage (%), color scale from 0% to 100%. Regions labeled: Core 0, Core 1, L3 Cache, Control.]


Long-Le Transistors Summary

Percentage of Long-Le device width excluding RAM arrays:

          Long-Le   Nominal
Cores     54%       46%
Uncore    76%       24%

Moore’s Law will continue to double transistors every 2 years

New Trend ► To reduce sub-threshold leakage, most devices will be slower and only a handful of transistors will be fast
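A first-order estimate of what these long-Le fractions buy can be made from the 3x leakage figure quoted earlier in the deck. The sketch assumes leakage scales with device width and that every long-Le device leaks exactly 3x less than a nominal one; both are simplifications, and the function name is hypothetical.

```python
def relative_leakage(long_le_fraction, leakage_ratio=3.0):
    """Leakage relative to an all-nominal design, by device-width fraction."""
    nominal_fraction = 1.0 - long_le_fraction
    return nominal_fraction + long_le_fraction / leakage_ratio


if __name__ == "__main__":
    for region, fraction in [("Cores", 0.54), ("Uncore", 0.76)]:
        print(f"{region}: {relative_leakage(fraction):.2f}x of all-nominal leakage")
```

Under these assumptions the cores would leak about 0.64x and the uncore logic about 0.49x of what an all-nominal design would.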


Clock Domains

[Figure: die floorplan colored by clock domain. Legend: Uncore, I/O, Core, PLL, System Clock (BCLK). Floorplan regions: 16MB L3, TAG (two blocks), 1MB L2 (two blocks), Core 0, Core 1, FSB TOP, FSB BOT.]


Global Clock Skew Profile

Worst-case global skew is 11ps

More details on the clock distribution in paper 21.2


Voltage Domains

[Figure: die floorplan colored by voltage domain. Legend: Uncore, I/O, Core, PLL. Floorplan regions: 16MB L3, TAG (two blocks), 1MB L2 (two blocks), Core 0, Core 1, FSB TOP, FSB BOT, Control Logic.]


Voltage Profile

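[Figure: voltage profile along a cut line across the die. Voltage axis marks: 0V, 0.25V, 1.10V, 1.25V. Regions along the cut: Cores, Ctrl + Tag, 16MB array; a Virtual VSS level is also marked.]

Cache sleep function enables separate voltage knob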


Power and Leakage Breakdown

Leakage accounts for about 30% of the total power

            Total Power   Leakage
Cores       74%           67%
L3 Cache    12%           22%
Ctrl        11%           9%
I/O         3%            2%
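The percentages can be turned into approximate watts. This sketch assumes the breakdown applies at the 150 W TDP point quoted elsewhere in the deck and treats "about 30%" as exactly 0.30; both are assumptions, so the results are ballpark figures only.

```python
TDP_W = 150.0            # assumed operating point for the breakdown
LEAKAGE_FRACTION = 0.30  # slide: "about 30% of the total power"

TOTAL_SHARE = {"Cores": 0.74, "L3 Cache": 0.12, "Ctrl": 0.11, "I/O": 0.03}
LEAKAGE_SHARE = {"Cores": 0.67, "L3 Cache": 0.22, "Ctrl": 0.09, "I/O": 0.02}


def breakdown_watts(total_w=TDP_W):
    """Convert the percentage breakdowns into watts per unit."""
    leakage_w = total_w * LEAKAGE_FRACTION
    return {
        unit: {
            "total_w": total_w * TOTAL_SHARE[unit],
            "leakage_w": leakage_w * LEAKAGE_SHARE[unit],
        }
        for unit in TOTAL_SHARE
    }


if __name__ == "__main__":
    for unit, watts in breakdown_watts().items():
        print(f"{unit}: total ~{watts['total_w']:.1f} W, leakage ~{watts['leakage_w']:.1f} W")
```

At 150 W this gives roughly 45 W of leakage overall, of which about 30 W is in the cores and about 10 W in the L3 cache.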


Symmetric I/O Pre-driver Circuit

[Figure: circuit schematic of the symmetric I/O pre-driver. Labels: Din, Mux, Bidirectional Delay Control, legs Ron_1 to Ron_6 and R_1 to R_6, Rodt, Vtt, PAD.]


C4 Bump Map

• 13164 C4 bumps
• Perfectly uniform bump pitch over the entire die (including I/O buffers) to improve epoxy underfill
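Because the bump pitch is uniform over the whole die, the bump count and the 435 mm² die size quoted earlier give a rough pitch estimate. The sketch assumes a square grid covering the full die area; the actual bump arrangement is not stated on the slide, so the number is only indicative.

```python
import math

C4_BUMPS = 13164
DIE_AREA_MM2 = 435.0  # from the Processor Highlights slide


def estimated_bump_pitch_um(bumps=C4_BUMPS, area_mm2=DIE_AREA_MM2):
    """Pitch of a square grid that places `bumps` bumps on `area_mm2`."""
    area_per_bump_mm2 = area_mm2 / bumps
    return math.sqrt(area_per_bump_mm2) * 1000.0  # mm -> um


if __name__ == "__main__":
    density = C4_BUMPS / DIE_AREA_MM2
    print(f"density: {density:.1f} bumps/mm^2")
    print(f"estimated pitch: {estimated_bump_pitch_um():.0f} um")
```

This works out to about 30 bumps per mm², or a pitch on the order of 180 µm under the square-grid assumption.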


Package Details

• 12 layers organic substrate (53.3 mm/side)

• 4-4-4 stacking

• Integrated heatspreader (38.5 mm/side)

• 604 total pins

• 366 signal I/Os

• System management components on package


Design for Test and Debug Features

L3 cache DFT/DFM
• Built-in pattern generator (PBIST)
• Programmable weak-write test
• Low-yield analysis
• Stability test mode
• 32-entry cache line disable (Pellston)

FSB DFT/DFM
• I/O loopback
• I/O test generator

Die-level DFT/DFM
• Parallel structural core test with XOR
• Scan and observability registers (scan-out)
• Three TAP controllers (core0, core1, uncore)
• Within-die process monitors
• On-die clock shrink


Thermal Sensors

[Figure: die map marking the temperature sensors and the thermal diode.]


Frequency Shmoo

• Target operating point 3.4GHz at 1.25V core voltage and 150W TDP envelope

[Figure: frequency shmoo plot. Horizontal axis: frequency [GHz], marked 4.40, 4.07, 3.79, 3.55, 3.33, 3.14. Vertical axis: core voltage [V], marked 1.15, 1.20, 1.25, 1.30, 1.35, 1.40.]


Summary
• 65nm dual-core, four-thread Xeon® Processor with on-die 16MB L3 cache
– Shared L3 cache ► best fit for server processors
– Virtualization and Hyper-Threading Technology
• Leakage reduction circuit techniques
– Massive Long-Le usage
– N and P sleep transistors in L3 cache
– L3 leakage shut-off mode ► saves power in lower cache size SKUs
• Multiple voltage and clock domains to reduce active power and leakage