A Dual-Core Multi-Threaded Xeon Processor with 16MB L3 Cache · 2006-04-24 · • 1 MB unified L2...
Copyright © 2006, Intel Corporation
A Dual-Core Multi-Threaded Xeon®
Processor with 16MB L3 Cache
Stefan Rusu
Intel Corporation
Santa Clara, CA
Intel and the Intel logo are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other names and brands may be claimed as the property of others. All products, dates, and figures are preliminary and are subject to change without any notice. Copyright © 2006, Intel Corporation.
Outline
• Processor Highlights
• 65nm Process Technology
• Block Diagram
• L3 Cache Overview
• Sleep and Shut-off Modes
• Long-Le Transistor Usage
• Clock and Power Distribution
• Package Details
• DFT/DFM Features
• Thermal Sensors
• Frequency Shmoo
• Summary
Processor Highlights
• Dual core, four threads
• 1 MB unified L2 per core
• 16 MB unified L3
• 435 mm² die size
• 1.328 B transistors
• 121 M transistors per core
• 3.4 GHz at 1.25 V and 150 W TDP
• 800 MT/s 3-load front-side bus interface
• Plugs into existing platforms
Largest cache and transistor count for an x86 processor
[Figure: die floorplan showing the 16MB L3 with two TAG blocks, the two 1MB L2 caches, Core 0, Core 1, FSB TOP, FSB BOT, and Control Logic]
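The transistor figures on this slide imply how the budget splits between the cores and everything else. The short calculation below is only a sketch of that arithmetic; the variable names are illustrative, and "uncore" here simply means "total minus the two cores" (caches, control logic, and I/O lumped together).

```python
# Figures taken from the Processor Highlights slide.
TOTAL_TRANSISTORS = 1.328e9   # whole die
PER_CORE = 121e6              # each core
CORES = 2

core_total = CORES * PER_CORE
# Assumption: everything that is not in a core is counted as "uncore".
uncore_total = TOTAL_TRANSISTORS - core_total
core_share = core_total / TOTAL_TRANSISTORS

print(f"cores:  {core_total / 1e6:.0f} M transistors ({core_share:.1%})")
print(f"uncore: {uncore_total / 1e6:.0f} M transistors ({1 - core_share:.1%})")
```

Under these numbers the two cores hold well under a fifth of the transistors, which is consistent with the slide's emphasis on the cache.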
65nm Process Technology
• 1.2 nm gate oxide
• NiSi for low resistance
• Second generation strained silicon
• 8 Cu interconnect layers with low-K CDO dielectric
[Figure: cross-section of the eight metal layers, M1 through M8]
Block Diagram
• Shared 16MB L3 cache
– Better efficiency: one core can use more than half of the total cache
– No need for coherency traffic between caches
• Support for Intel Virtualization Technology
• Hyper-Threading Technology
• Enhanced Intel SpeedStep® Technology
[Figure: block diagram with Core 0 (1M L2), Core 1 (1M L2), Caching Front-Side Bus Controller, 16MB L3 Cache, External Front-Side Bus Interface, and 3-load System Bus]
L3 Cache Floorplan
• 0.624 µm² bit cell
• 0.75 Watts / MB average power
• Only 0.8% of all array blocks powered up for each access
• 256 regular 64kB sub-arrays
• 32 redundancy 68kB sub-arrays
[Figure: L3 floorplan with two 8M data halves (8M0, 8M1), TAG (1st 8K sets), TAG (2nd 8K sets), and the redundancy shift logic and horizontal clock spine. Legend: regular sub-array (32 bits), redundancy sub-array (34 bits), repeaters, CLK spine + repeaters]
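The floorplan numbers can be cross-checked against each other. The sketch below assumes 1 kB = 1024 bytes and that a redundancy sub-array is a regular sub-array widened from 32 to 34 bits; both are readings of the slide, not statements from it.

```python
# Figures from the L3 Cache Floorplan slide.
REGULAR_SUBARRAYS = 256
REGULAR_KB = 64
REDUNDANCY_SUBARRAYS = 32
REGULAR_WIDTH_BITS = 32
REDUNDANCY_WIDTH_BITS = 34
WATTS_PER_MB = 0.75

# 256 regular sub-arrays of 64 kB should add up to the 16 MB capacity.
capacity_mb = REGULAR_SUBARRAYS * REGULAR_KB / 1024

# Assumption: same depth, wider word, so size scales with the width ratio.
redundancy_kb = REGULAR_KB * REDUNDANCY_WIDTH_BITS / REGULAR_WIDTH_BITS

# Average cache power implied by the per-MB figure.
average_power_w = capacity_mb * WATTS_PER_MB

# Regular sub-arrays per redundancy sub-array.
regular_per_redundancy = REGULAR_SUBARRAYS // REDUNDANCY_SUBARRAYS

print(capacity_mb, redundancy_kb, average_power_w, regular_per_redundancy)
```

The 68kB size of a redundancy sub-array falls out of the 34/32 width ratio, and the per-MB power figure corresponds to about 12 W for the full 16MB.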
L3 Sub-Array and Sleep Partitioning
[Figure: sub-array organization and sleep partitioning. Labels: Bank 1M; Bank 0 through Bank 3; blocks blk0 through blk7; Mid Logic; Timer; Reg SA: 128 cols; Rdn SA: 136 cols; 256 rows; Sleep Resolution; Word-line Driver; Sense amp + write circuit; PMOS-sleep; NMOS-sleep; M2, BL; M3, WL]
Column Redundancy Features
• Data Array
– Two redundancy columns for each 290-bit chunk
– Repair up to two random defects in each cache line
• Tag Array
– One redundancy column for each 36-bit tag line
– Repair one random defect in each entry
[Figure: bit layout of a 290-bit data chunk (fields labeled Data, ECC, R) and of a 36-bit tag line (fields labeled Lower tag, ECC, R, State, CR, Upper tag, U)]
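Column redundancy of this kind is often implemented by shifting logical bits past a defective column onto its neighbor. The slides mention redundancy shift logic but do not describe the mapping, so the sketch below is an assumed, simplified scheme rather than the actual circuit; `column_map` and its arguments are hypothetical names. It also assumes the 290-bit and 36-bit widths include the spare columns.

```python
def column_map(physical_columns, defective, logical_bits):
    """Map each logical bit to a physical column, skipping defective ones.

    Assumed shift scheme: logical bit i uses the i-th good physical column.
    """
    good = [c for c in range(physical_columns) if c not in defective]
    if len(good) < logical_bits:
        raise ValueError("more defects than the spare columns can repair")
    return good[:logical_bits]


# Data chunk: 290 physical columns, assumed to include 2 spares.
DATA_PHYSICAL, DATA_SPARES = 290, 2
# Tag line: 36 physical columns, assumed to include 1 spare.
TAG_PHYSICAL, TAG_SPARES = 36, 1

# Two defects in one data chunk are still repairable.
data_map = column_map(DATA_PHYSICAL, {5, 200}, DATA_PHYSICAL - DATA_SPARES)
# One defect in a tag line is repairable.
tag_map = column_map(TAG_PHYSICAL, {10}, TAG_PHYSICAL - TAG_SPARES)

print(data_map[4:7])   # bits around the first defect shift by one column
print(tag_map[9:12])
```

With no defects the spare columns at the end are simply unused; a third defect in a data chunk, or a second in a tag line, raises an error because it exceeds the repair capacity stated on the slide.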
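L3 Cache Sleep and Shut-off Modes
[Figure: sub-array voltage in three modes against a 1.1V supply. Active Mode: virtual VSS at 0V. Sleep Mode: virtual VSS at 250mV, 2x lower leakage. Shut-off Mode: virtual VSS at ~500mV, a further 2x lower leakage. Circuit labels: Block Select, Sleep Bias, Shutoff, Virtual VSS, Sub-array]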
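The three modes can be summarized as a small lookup model. This is a rough sketch only: it reads the figure as "each step halves leakage" and treats the voltage across a cell as supply minus virtual VSS, which are interpretations of the slide rather than stated facts. The names `cell_voltage` and `relative_leakage` are illustrative.

```python
SUPPLY_V = 1.1

# Virtual VSS level per mode, as read from the figure (shut-off is approximate).
VIRTUAL_VSS_V = {"active": 0.0, "sleep": 0.25, "shutoff": 0.5}

# Assumption: sleep is 2x lower than active, shut-off another 2x lower.
LEAKAGE_FACTOR = {"active": 1.0, "sleep": 0.5, "shutoff": 0.25}


def cell_voltage(mode):
    """Voltage across the array cells in the given mode."""
    return SUPPLY_V - VIRTUAL_VSS_V[mode]


def relative_leakage(modes):
    """Average leakage of a set of sub-arrays relative to all-active."""
    return sum(LEAKAGE_FACTOR[m] for m in modes) / len(modes)


# Example: half of 256 sub-arrays asleep, the other half shut off.
half_disabled = ["sleep"] * 128 + ["shutoff"] * 128
print(cell_voltage("sleep"), relative_leakage(half_disabled))
```

The example configuration mirrors a reduced-capacity part in which the disabled half of the cache is shut off instead of merely sleeping.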
Leakage Shut-off Infrared Images
[Figure: infrared images of two SKUs. 16MB SKU: all 16MB in sleep mode. 8MB SKU: 8MB in sleep mode, 8MB in shut-off mode]
Shut-off feature reduces the leakage of the 8MB disabled sub-arrays by about 3W
Leakage Mitigation: Long-Le Transistors
• All transistors can be either nominal or long-Le
• Most library cells are available in both flavors
• Long-Le transistors are about 10% slower, but have 3x lower leakage
• All paths with timing slack use long-Le transistors
[Figure: a nominal-Le transistor compared with a long-Le transistor (nominal + 10%)]
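The rule "all paths with timing slack use long-Le transistors" can be illustrated with a simple per-path check. This is a deliberately coarse sketch: it applies the 10% slowdown to a whole path and compares against a cycle time, whereas a real flow would work per cell with proper timing analysis. `choose_flavor` and the example delays are hypothetical.

```python
LONG_LE_DELAY_PENALTY = 1.10   # long-Le devices are about 10% slower
LONG_LE_LEAKAGE_RATIO = 1 / 3  # and leak about 3x less


def choose_flavor(path_delay_ps, cycle_ps):
    """Pick long-Le when the path still meets timing after the slowdown."""
    slowed = path_delay_ps * LONG_LE_DELAY_PENALTY
    return "long-Le" if slowed <= cycle_ps else "nominal"


# Illustrative cycle time for 3.4 GHz, rounded down to whole picoseconds.
CYCLE_PS = 294

for delay in (200, 250, 280):
    print(delay, choose_flavor(delay, CYCLE_PS))
```

Paths that already sit close to the cycle time keep nominal devices, which is why only a fraction of the design ends up fast.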
Long-Le Transistors Usage Map
[Figure: die map of long-Le usage on a 0% to 100% scale, with regions labeled Core 0, Core 1, L3 Cache Control]
Long-Le Transistors Summary
Percentage of Long-Le device width excluding RAM arrays:
• Cores: Long-Le 54%, Nominal 46%
• Uncore: Long-Le 76%, Nominal 24%
Moore’s Law will continue to double transistors every 2 years
New Trend ► To reduce sub-threshold leakage, most devices will be slower and only a handful of transistors will be fast
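Combining these long-Le percentages with the 3x leakage reduction quoted earlier gives a rough estimate of the leakage saved. The sketch assumes leakage is proportional to device width and that every long-Le device leaks exactly one third of a nominal one; neither assumption is stated in the slides.

```python
def relative_leakage(long_le_fraction, reduction=3.0):
    """Leakage relative to an all-nominal design.

    Assumes leakage scales with device width and long-Le devices
    leak `reduction` times less than nominal ones.
    """
    nominal_fraction = 1 - long_le_fraction
    return nominal_fraction + long_le_fraction / reduction


cores = relative_leakage(0.54)    # 54% long-Le device width in the cores
uncore = relative_leakage(0.76)   # 76% long-Le device width in the uncore

print(f"cores:  {cores:.2f} of all-nominal leakage")
print(f"uncore: {uncore:.2f} of all-nominal leakage")
```

Under these assumptions the non-RAM logic leaks roughly a third less in the cores and about half as much in the uncore as an all-nominal design would.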
Clock Domains
[Figure: die floorplan (16MB L3, TAG, 1MB L2, Core 0, Core 1, FSB TOP, FSB BOT) colored by clock domain. Legend: Uncore, I/O, Core, PLL, System Clock (BCLK)]
Global Clock Skew Profile
Worst-case global skew is 11ps
More details on the clock distribution in paper 21.2
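To put the skew figure in context, it can be compared with the cycle time at the 3.4 GHz target frequency. This is only an illustration: the slide does not say which clock domain or frequency the 11ps figure should be measured against.

```python
FREQ_GHZ = 3.4    # target core frequency from the deck
SKEW_PS = 11      # worst-case global skew

# One cycle in picoseconds: 1 / (f in GHz) ns = 1000 / f ps.
cycle_ps = 1000 / FREQ_GHZ

# Assumption: compare the skew against a core-clock cycle.
skew_fraction = SKEW_PS / cycle_ps

print(f"cycle: {cycle_ps:.1f} ps, skew: {skew_fraction:.1%} of a cycle")
```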
Voltage Domains
[Figure: die floorplan (16MB L3, TAG, 1MB L2, Core 0, Core 1, FSB TOP, FSB BOT, Control Logic) colored by voltage domain. Legend: Uncore, I/O, Core, PLL]
Voltage Profile
[Figure: voltage profile along a cut line through the die, with levels marked at 0V, 0.25V (virtual VSS), 1.10V, and 1.25V across the Cores, Ctrl + Tag, and 16MB array regions]
Cache sleep function enables separate voltage knob
Power and Leakage Breakdown
Leakage accounts for about 30% of the total power
Total Power Breakdown
• Cores: 74%
• L3 Cache: 12%
• Ctrl: 11%
• I/O: 3%
Leakage Breakdown
• Cores: 67%
• L3 Cache: 22%
• Ctrl: 9%
• I/O: 2%
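The percentages can be turned into approximate watts. The sketch below assumes total power equals the 150 W TDP and that leakage is exactly 30% of it; the slides give these only as an envelope and an approximate share, so the results are order-of-magnitude estimates.

```python
TDP_W = 150            # assumption: total power taken as the TDP envelope
LEAKAGE_SHARE = 0.30   # "about 30%" from the slide, treated as exact here

TOTAL_BREAKDOWN = {"Cores": 0.74, "L3 Cache": 0.12, "Ctrl": 0.11, "I/O": 0.03}
LEAKAGE_BREAKDOWN = {"Cores": 0.67, "L3 Cache": 0.22, "Ctrl": 0.09, "I/O": 0.02}

leakage_total_w = TDP_W * LEAKAGE_SHARE

# Per-block total power and leakage power in watts.
total_w = {block: TDP_W * share for block, share in TOTAL_BREAKDOWN.items()}
leakage_w = {
    block: leakage_total_w * share for block, share in LEAKAGE_BREAKDOWN.items()
}

for block in TOTAL_BREAKDOWN:
    print(f"{block:9s} total {total_w[block]:6.1f} W, leakage {leakage_w[block]:5.1f} W")
```

Note how the L3 cache accounts for a larger share of leakage than of total power, which is the motivation for the sleep and shut-off modes.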
Symmetric I/O Pre-driver Circuit
[Figure: pre-driver schematic with data input Din, a mux with bidirectional delay control, driver legs labeled Ron_1 through Ron_6 and R_1 through R_6, termination labeled Vtt and Rodt, and the PAD]
C4 Bump Map
• 13164 C4 bumps
• Perfectly uniform bump pitch over the entire die (including I/O buffers) to improve epoxy underfill
Package Details
• 12 layers organic substrate (53.3 mm/side)
• 4-4-4 stacking
• Integrated heatspreader (38.5 mm/side)
• 604 total pins
• 366 signal I/Os
• System management components on package
Design for Test and Debug Features
L3 cache DFT/DFM
• Built-in pattern generator (PBIST)
• Programmable weak-write test
• Low-yield analysis
• Stability test mode
• 32-entry cache line disable (Pellston)
FSB DFT/DFM
• I/O loopback
• I/O test generator
Die-level DFT/DFM
• Parallel structural core test with XOR
• Scan and observability registers (scan-out)
• Three TAP controllers (core0, core1, uncore)
• Within-die process monitors
• On-die clock shrink
Thermal Sensors
[Figure: locations of the temperature sensors and the thermal diode]
Frequency Shmoo
• Target operating point 3.4GHz at 1.25V core voltage and 150W TDP envelope
[Figure: shmoo plot of core voltage (1.15V to 1.40V in 0.05V steps) against frequency (3.14, 3.33, 3.55, 3.79, 4.07, and 4.40 GHz)]
Summary
• 65nm dual-core, four-thread Xeon® Processor with on-die 16MB L3 cache
– Shared L3 cache ► best fit for server processors
– Virtualization and Hyper-Threading Technology
• Leakage reduction circuit techniques
– Massive Long-Le usage
– N and P sleep transistors in L3 cache
– L3 leakage shut-off mode ► saves power in lower cache size SKUs
• Multiple voltage and clock domains to reduce active power and leakage