© Copyright IBM Corporation 2005
From Ingenuity to Impact
Technology Trends Presentation For Power Symposium 2006
8-23-06
Darryl Solie, Distinguished Engineer, Chief System Architect
IBM Systems & Technology Group
IBM Engineering & Technology Services © 2005 IBM Corporation 2
Agenda
What’s driving technology today?
Why Cell BE today?
Emerging System Strategy – Scale Out & Acceleration
Intel’s direction
When Moore Is Less, Watts Happen
A Second Observation: Where have all the Gigahertz gone?
Technology Scaling – We’ve hit the wall
[Figure: Relative Device Performance (log scale, 0.2 to 20) vs. Year (1988 to 2012) for Conventional Bulk CMOS, SOI (silicon-on-insulator), High mobility, and Double-Gate devices, with a "?" annotation]
What’s causing the problem?
Gate Stack (10S): Tox = 11 Å
Gate dielectric approaching a fundamental limit (a few atomic layers)
[Figure: Power Density (W/cm²) vs. Gate Length (microns), log-log axes, showing Active Power and Passive Power curves from 1994 to 2004, with the 65 nm node marked]
Has This Ever Happened Before?
[Figure labels: Steam Iron, 5 W/cm²; "? Opportunity"]
Cell Processor Chip Overview
3 GHz 64 Bit PowerPC Processor
8 SPUs (VMX-like accelerators)
25 GBytes/sec memory bandwidth
Up to 75 GBytes/sec I/O bandwidth
0.5-1 GByte High Speed Memory
~ 95 Watts @ 3 GHz
[Block diagram: 64b Power Processor, Synergistic Processors, Mem. Contr., Flexible IO]
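The headline figures on this slide can be combined into a rough peak-throughput estimate. The sketch below is a back-of-envelope calculation, not a figure from the deck: the SPU count, clock, power, and memory bandwidth come from the slide, while the SIMD width (4 single-precision lanes) and the fused multiply-add counted as 2 operations per lane per cycle are assumptions based on the "VMX-like" description.

```python
# Back-of-envelope peak single-precision throughput for the SPUs alone.
# From the slide: 8 SPUs, 3 GHz, ~95 W, 25 GBytes/sec memory bandwidth.
SPU_COUNT = 8
CLOCK_GHZ = 3.0
POWER_WATTS = 95.0
MEM_BW_GBYTES = 25.0

# ASSUMPTIONS (not stated on the slide): a 128-bit SIMD unit holds
# 4 single-precision lanes, and each lane retires one fused
# multiply-add (counted as 2 floating-point operations) per cycle.
SP_LANES = 4
FLOPS_PER_FMA = 2


def peak_sp_gflops(spus=SPU_COUNT, clock_ghz=CLOCK_GHZ,
                   lanes=SP_LANES, flops_per_op=FLOPS_PER_FMA):
    """Return theoretical peak GFlops under the assumptions above."""
    return spus * lanes * flops_per_op * clock_ghz


peak = peak_sp_gflops()
gflops_per_watt = peak / POWER_WATTS
bytes_per_flop = MEM_BW_GBYTES / peak

print(f"Peak S.P. throughput: {peak:.0f} GFlops")
print(f"Efficiency:           {gflops_per_watt:.2f} GFlops/W")
print(f"Memory balance:       {bytes_per_flop:.3f} bytes/flop")
```

Under these assumptions the estimate is in the same range as the 190 GFlops single-precision matrix multiplication figure quoted later in the deck. The low bytes-per-flop ratio suggests why data-intensive kernels need to be staged carefully through each SPU's local memory.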
Cell BE Processor Overview
Heterogeneous multi-core system architecture
- Power Processor Element for control tasks
- Synergistic Processor Elements for data-intensive processing
Synergistic Processor Element (SPE) consists of
- Synergistic Processor Unit (SPU)
- Synergistic Memory Flow Control (SMF)
  - Data movement and synchronization
  - Interface to high-performance Element Interconnect Bus
[Block diagram: a PPE (64-bit Power Architecture with VMX; PPU containing PXU, L1, and L2) and eight SPEs (each an SPU containing SXU and LS, plus an SMF) attached to the EIB (up to 96B/cycle). The MIC connects to Dual XDR™ memory and the BIC to FlexIO™. Link widths shown: 16B/cycle, 16B/cycle (2x), and 32B/cycle.]
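The per-cycle link widths in the diagram can be turned into bandwidth figures once a bus clock is chosen. The sketch below assumes a 1.6 GHz bus clock (half of a 3.2 GHz core clock); that clock is an assumption and does not appear on the slide, so it is kept as a parameter. With it, a 16B/cycle port works out close to the 25 GBytes/sec memory bandwidth quoted on the chip overview slide.

```python
# Convert a link width in bytes per bus cycle into GBytes/sec.
# ASSUMPTION (not on the slide): the bus runs at 1.6 GHz,
# i.e. half of a 3.2 GHz core clock. Change it to explore other cases.
ASSUMED_BUS_CLOCK_GHZ = 1.6

# Widths copied from the block diagram labels.
LINK_WIDTHS_BYTES_PER_CYCLE = {
    "single 16B/cycle port": 16,
    "EIB (up to 96B/cycle)": 96,
}


def gbytes_per_sec(bytes_per_cycle, bus_clock_ghz=ASSUMED_BUS_CLOCK_GHZ):
    """Bytes per cycle times cycles per nanosecond gives GBytes/sec."""
    return bytes_per_cycle * bus_clock_ghz


bandwidths = {
    name: gbytes_per_sec(width)
    for name, width in LINK_WIDTHS_BYTES_PER_CYCLE.items()
}

for name, bandwidth in bandwidths.items():
    print(f"{name}: {bandwidth:.1f} GBytes/sec")
```

The EIB result is an upper bound under the assumed clock, matching the "up to" wording in the diagram.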
General Purpose Cores vs Synergistic Processor Elements
Optimized acceleration can provide significant advantages
Theoretical Peak Operations
[Bar chart: Billion Ops/sec (axis 0 to 250) for FP (SP), FP (DP), Int (16 bit), and Int (32 bit)]
Processors compared:
- Freescale MPC8641D, 1.5 GHz
- AMD Athlon™ 64 X2, 2.4 GHz
- PowerPC® 970MP, 2.5 GHz
- Cell Broadband Engine™, 3.2 GHz
- Intel Pentium D®, 3.2 GHz
Cell BE Performance
Type              Algorithm                     3 GHz GPP                       3 GHz BE                 BE Perf Advantage
HPC               Matrix Multiplication (S.P.)  25 GFlops                       190 GFlops (8 SPEs)      8x
HPC               Linpack (S.P.)                18 GFlops (IA32)                150 GFlops (BE)          8x
HPC               Linpack (D.P.)                6 GFlops (IA32)                 12 GFlops (BE)           2x
bioinformatic     smith-waterman                570 Mcups (IA32)                420 Mcups (per SPE)      6x
graphics          transform-light               160 MVPS (G5/VMX)               240 MVPS (per SPE)       12x
graphics          TRE                           1.6 fps (G5/VMX)                24 fps (BE)              15x
security          AES                           1.1 Gbps (IA32)                 2 Gbps (per SPE)         14x
security          TDES                          0.12 Gbps (IA32)                0.16 Gbps (per SPE)      10x
security          MD-5                          2.68 Gbps (IA32)                2.3 Gbps (per SPE)       6x
security          SHA-1                         0.85 Gbps (IA32)                1.98 Gbps (per SPE)      18x
communication     EEMBC                         501 Telemark (1.4GHz mpc7447)   770 Telemark (per SPE)   12x
video processing  mpeg2 decoder (sdtv)          200 fps (IA32)                  290 fps (per SPE)        12x
BE’s performance is about an order of magnitude better than traditional GPPs for media and other applications that can take advantage of its SIMD capability
BE can outperform a P4/SSE2 at the same clock rate by 3 to 18x (assuming linear scaling) in various types of application workloads
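The "assuming linear scaling" note can be made concrete for the rows quoted per SPE. The sketch below multiplies each per-SPE figure by 8 (the SPU count from the chip overview slide) and divides by the GPP figure. This is one reading of how the advantage column was derived, not a method stated in the deck; the results land within one unit of the listed values, which appear to be rounded or truncated.

```python
# Reproduce the "BE Perf Advantage" column for per-SPE rows,
# ASSUMING the chip-level figure is simply per-SPE throughput x 8 SPEs.
SPE_COUNT = 8  # from the chip overview slide

# (algorithm, per-SPE throughput, GPP throughput, listed advantage)
# Values copied from the performance table; units cancel in the ratio.
PER_SPE_ROWS = [
    ("mpeg2 decoder (sdtv)", 290.0, 200.0, 12),
    ("EEMBC", 770.0, 501.0, 12),
    ("SHA-1", 1.98, 0.85, 18),
    ("MD-5", 2.3, 2.68, 6),
    ("TDES", 0.16, 0.12, 10),
    ("AES", 2.0, 1.1, 14),
    ("transform-light", 240.0, 160.0, 12),
    ("smith-waterman", 420.0, 570.0, 6),
]


def linear_scaling_advantage(per_spe, gpp, spes=SPE_COUNT):
    """Speedup over the GPP if all SPEs scale perfectly."""
    return per_spe * spes / gpp


advantages = {
    name: linear_scaling_advantage(per_spe, gpp)
    for name, per_spe, gpp, _listed in PER_SPE_ROWS
}

for name, _per_spe, _gpp, listed in PER_SPE_ROWS:
    print(f"{name:22s} computed {advantages[name]:5.1f}x  listed {listed}x")
```

Note that MD-5 and smith-waterman are slower than the GPP on a single SPE, so their advantage depends entirely on the scaling assumption.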
IBM Server Strategy
[Figure: Large SMPs, High Density Racks/Blades, and Clusters and Virtualization positioned on two axes: Scale Up / SMP Computing and Scale Out / Distributed Computing]
IBM BladeCenter
Blue Gene/L – Lawrence Livermore System
131072 Processors / 262144 Floating Point Units
360 Teraflops / 16 Terabytes of Memory
10X Performance / 28X Less Power / 10X smaller
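The system totals break down into per-chip figures with simple arithmetic. The sketch below assumes 2 processors per compute chip and ~10-13 Watts per chip (both taken from the compute SoC slide that follows) and assumes binary units for memory. The power range covers the compute chips only and is an estimate, not a system power figure from the deck.

```python
# Per-chip breakdown of the Blue Gene/L system totals.
PROCESSORS = 131072
FPUS = 262144
PEAK_TERAFLOPS = 360.0
MEMORY_TERABYTES = 16

# From the compute SoC slide: 2 processors and ~10-13 W per chip.
PROCESSORS_PER_CHIP = 2
CHIP_WATTS_RANGE = (10.0, 13.0)

# ASSUMPTION: "Terabytes" is read as binary units (1 TB = 1024 * 1024 MB).
MB_PER_TB = 1024 * 1024

chips = PROCESSORS // PROCESSORS_PER_CHIP
fpus_per_processor = FPUS // PROCESSORS
memory_mb_per_chip = MEMORY_TERABYTES * MB_PER_TB // chips
gflops_per_chip = PEAK_TERAFLOPS * 1000.0 / chips
# Compute chips only: excludes memory, network, cooling, and storage.
chip_power_kw = tuple(watts * chips / 1000.0 for watts in CHIP_WATTS_RANGE)

print(f"Compute chips:      {chips}")
print(f"FPUs per processor: {fpus_per_processor}")
print(f"Memory per chip:    {memory_mb_per_chip} MB")
print(f"Peak per chip:      {gflops_per_chip:.2f} GFlops")
print(f"Chip power (total): {chip_power_kw[0]:.0f} to {chip_power_kw[1]:.0f} kW")
```

The 256 MB per chip result agrees with the DDR size shown on the compute SoC diagram.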
Blue Gene/L – Compute SoC
[Block diagram of the compute SoC: two 440 CPUs (one labeled I/O proc), each with 32k/32k L1, a "Double FPU", and an L2; snoop between the L2s; a multiported shared SRAM buffer; a shared L3 directory for EDRAM (includes ECC); 4MB EDRAM usable as L3 cache or memory; DDR control with ECC driving 144 bit wide DDR (256MB); PLB (4:1); Gbit Ethernet; JTAG access; global interrupt. Network interfaces: Torus with 6 out and 6 in, each at 1.4 Gbit/s link; Tree with 3 out and 3 in, each at 2.8 Gbit/s link. Internal bus widths shown: 128, 256, and 1024+144 ECC.]
2 PPC 440 Processors
4 DP Floating Point Units
4 MB EDRAM
Full Mesh Toroid Interconnect
Integrated Memory Control/Ethernet
~ 10-13 Watts/Chip
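The "6 out and 6 in" torus links correspond to the six nearest neighbors of a node in a three-dimensional torus, with wraparound at the edges. The sketch below illustrates that addressing and the resulting per-node link bandwidth. The 64 x 32 x 32 dimensions are an assumption chosen to match 65536 nodes and are not stated in the deck; `torus_neighbors` is a hypothetical helper, not part of any Blue Gene software.

```python
# Nearest neighbors of a node in a 3D torus with wraparound.
# ASSUMPTION: 64 x 32 x 32 dimensions (65536 nodes); not from the slides.
ASSUMED_TORUS_DIMS = (64, 32, 32)

# From the SoC diagram: 6 torus links out and 6 in, 1.4 Gbit/s each.
TORUS_LINKS_PER_DIRECTION = 6
TORUS_LINK_GBITS = 1.4


def torus_neighbors(coord, dims=ASSUMED_TORUS_DIMS):
    """Return the six neighbors (-x, +x, -y, +y, -z, +z) of coord."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            neighbor = list(coord)
            # The modulo wraps edge nodes around to the opposite face.
            neighbor[axis] = (neighbor[axis] + step) % dims[axis]
            neighbors.append(tuple(neighbor))
    return neighbors


corner_neighbors = torus_neighbors((0, 0, 0))
outbound_gbits = TORUS_LINKS_PER_DIRECTION * TORUS_LINK_GBITS

print("Neighbors of (0, 0, 0):", corner_neighbors)
print(f"Outbound torus bandwidth per node: {outbound_gbits:.1f} Gbit/s")
```

Because every node has the same six links regardless of position, the interconnect scales by adding nodes without changing the chip.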
Intel EMEA Academic Forum 5/05
So... Where do we go next?
- (More) Application Specific Acceleration!