Cell/B.E. processor-based systems and software offerings ... · – Complete, integrated kit –...
© 2008 IBM Corporation
IBM CONFIDENTIAL
IBM Systems and Technology Group
Cell/B.E. processor-based systems and software offerings
IBM BladeCenter® QS22 and SDK 3.0
Sales Conference
The challenge today
• For many years, organizations have relied on performance gains from increasing clock speeds of “traditional” microprocessor architectures
• This approach has been challenged by the physical limitations of semiconductors and by traditional processor architecture implementations
• High performance computing (HPC) applications need a fundamentally new technology and approach to the system-level architecture to achieve the desired level of performance
Cell Broadband Engine™ (Cell/B.E.) Technology
• IBM, Sony, Toshiba Alliance formed in 2000
• March, 2001 - STI Design Center opened in Austin, TX
• April, 2004 - Single Cell/B.E. operational
• July, 2004 - 2-way SMP operational
• February, 2005 - first technical disclosures at ISSCC
• May, 2005 - first public demonstration of Cell/B.E. processor-based system at E3
• August, 2005 - published technical details of Cell/B.E. architecture
• November, 2005 - published open source SDK & Cell/B.E. simulator
• August, 2006 - introduced the very first Cell/B.E. processor-based server to the market
For a higher level of absolute performance and efficiency
IBM commitment to innovation
2006: Create initial platforms for experimentation
• BladeCenter QS20
2007: Produce systems for early adoption and solution enablement
• BladeCenter QS21
• IBM SDK for Multicore Acceleration 3.0
2008: Produce robust production ready systems for targeted industry applications
• IBM BladeCenter QS22 with PowerXCell™ 8i processor: extraordinary double precision floating point performance, large memory capability, ready for the most demanding production applications
Cell Broadband Engine Architecture™ (CBEA) Technology Roadmap
[Roadmap chart, time axis 2006 to 2010]
Performance enhancements/scaling: IBM PowerXCell™ 8i (1+8 eDP SPE), 65nm SOI; IBM PowerXCell 32ii, 45nm SOI
Cost reduction: Cell/B.E. (1+8), 90nm SOI; Cell/B.E. (1+8), 65nm SOI; Cell/B.E. (1+8), 45nm SOI
Legend: Concept, Committed
Compatible code and security base across entire line
All future dates and specifications are estimations only; subject to change without notice. Dashed outlines indicate concept designs.
IBM PowerXCell™ 8i processor benefits
• Sets a new performance standard
– Accelerates computationally intense workloads such as analytics, multimedia and vector processing
– Efficient computation per watt
• Designed for flexibility
– Wide variety of application domains
– Cell can cover a wide range of application space with its capabilities in floating point operations, integer operations, data streaming / throughput support, and real-time support
– Exploits C/C++, Fortran programming models
• Enhanced security capability
– Virtual trusted computing environment for security
The new PowerXCell 8i processor builds on the Cell Broadband Engine Architecture and combines a general-purpose Power Architecture™ core of modest performance with eight enhanced synergistic processing elements optimized for extreme double precision and single precision computational performance.
PowerXCell 8i processor
• 65 nm
• 9 cores, 10 threads
• 230.4 GFlops peak (SP) at 3.2GHz
• 108.8 GFlops peak (DP) at 3.2GHz
• Up to 25 GB/s memory bandwidth
• Up to 75 GB/s I/O bandwidth
• 92 Watts @ 3.2GHz
• Top frequency >4GHz (observed in lab)
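The peak figures in the list above can be reproduced with a back-of-the-envelope calculation. The sketch below rests on a breakdown that the slide does not state and that is assumed here: each of the eight SPEs completes one fused multiply-add (2 flops) per cycle on a 4-wide single-precision or 2-wide double-precision vector, and the PPE adds one 4-wide single-precision or one scalar double-precision fused multiply-add per cycle. All names are illustrative.

```python
# Assumed breakdown (not stated on the slide): every unit retires one
# fused multiply-add per lane per cycle, counted as 2 flops.
CLOCK_GHZ = 3.2
FLOPS_PER_FMA = 2
NUM_SPES = 8


def peak_gflops(spe_lanes: int, ppe_lanes: int, clock_ghz: float = CLOCK_GHZ) -> float:
    """Return theoretical peak GFlops for the whole chip (8 SPEs + 1 PPE)."""
    spe_total = NUM_SPES * spe_lanes * FLOPS_PER_FMA * clock_ghz
    ppe_total = ppe_lanes * FLOPS_PER_FMA * clock_ghz
    return spe_total + ppe_total


# Single precision: 4 lanes per SPE, 4 lanes on the PPE vector unit (assumed).
sp_peak = peak_gflops(spe_lanes=4, ppe_lanes=4)
# Double precision: 2 lanes per SPE, 1 scalar lane on the PPE (assumed).
dp_peak = peak_gflops(spe_lanes=2, ppe_lanes=1)

print(f"SP peak: {sp_peak:.1f} GFlops")
print(f"DP peak: {dp_peak:.1f} GFlops")
```

Under these assumptions the totals match the 230.4 and 108.8 GFlops quoted for 3.2GHz; peak numbers of this kind are theoretical ceilings, not measured application performance.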
Intel’s x86 Quad Core processors are Dual Chip Modules (DCMs): two of these processors stacked vertically and packaged together
PowerXCell 8i uses ½ the space & power and delivers more than 2.3x the GFlops of traditional architecture
On any traditional processor, the ratio of cores to cache, prediction, and related items illustrated here remains at ~50% of the chip area
Example Server Dual Core: 349 mm², 3.4 GHz @ 150 W, 2 cores, ~27.2 SP GFlops, 1.3b transistors @ 65nm
Example Desktop Quad Core: 214 mm², 3 GHz @ 130 W, 4 cores, ~96 SP GFlops, 820m transistors @ 45nm
PowerXCell 8i Nine Core: 109 mm², 3.2 GHz @ 75 W, 9 cores, ~230 SP GFlops, 250m transistors @ 65nm
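The comparison can be checked directly from the three sets of figures on this slide. The sketch below computes single-precision GFlops per watt and per mm² for each chip, then the ratios of the PowerXCell 8i to the quad-core example. Treating the quad core as the baseline for the "½ the space & power" and "2.3x" claims is an assumption; the slide does not name the baseline. The class and variable names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Figures copied from the slide; names are illustrative."""
    name: str
    area_mm2: float
    watts: float
    sp_gflops: float

    @property
    def gflops_per_watt(self) -> float:
        return self.sp_gflops / self.watts

    @property
    def gflops_per_mm2(self) -> float:
        return self.sp_gflops / self.area_mm2


CHIPS = [
    Chip("Example server dual core", 349, 150, 27.2),
    Chip("Example desktop quad core", 214, 130, 96),
    Chip("PowerXCell 8i", 109, 75, 230),
]

for chip in CHIPS:
    print(f"{chip.name}: {chip.gflops_per_watt:.2f} GFlops/W, "
          f"{chip.gflops_per_mm2:.2f} GFlops/mm^2")

# Ratios against the quad-core example (assumed baseline).
quad, cell = CHIPS[1], CHIPS[2]
gflops_ratio = cell.sp_gflops / quad.sp_gflops
area_ratio = cell.area_mm2 / quad.area_mm2
power_ratio = cell.watts / quad.watts
print(f"GFlops x{gflops_ratio:.2f}, area x{area_ratio:.2f}, power x{power_ratio:.2f}")
```

With the quad core as baseline, the GFlops ratio comes out just under 2.4, the area ratio close to one half, and the power ratio a little under 0.6.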
BladeCenter® QS22 – PowerXCell 8i
• Core Electronics
– Two 3.2GHz PowerXCell 8i Processors
– SP: 460 GFlops peak per blade
– DP: 217 GFlops peak per blade
– Up to 32GB DDR2 800MHz
– Standard blade form factor
– Supports BladeCenter H chassis
• Integrated features
– Dual 1Gb Ethernet (BCM5704)
– Serial/Console port, 4x USB on PCI
• Optional
– Pair 1GB DDR2 VLP DIMMs as I/O buffer (2GB total) (46C0501)
– 4x SDR InfiniBand adapter (32R1760)
– SAS expansion card (39Y9190)
– 8GB Flash Drive (43W3934)
[Block diagram of the QS22 blade. Labeled components: two PowerXCell 8i processors; DDR2 DIMMs; Rambus® FlexIO™; two IBM SouthBridge chips; PCI-E x16; PCI-E x8; 2x PCI-E x16; PCI-X; HSC*; PCI legacy connector; 2 UART, SPI; 4x USB 2.0; Flash, RTC & NVRAM; flash drive; USB to BC midplane; 2x 1GbE to BC midplane; optional 2-port IB x4 HCA (HSDC) with IB-4x to BC-H high speed fabric/midplane]
*The HSC interface is not enabled on the standard products. This interface can be enabled on “custom” system implementations for clients by working with the Cell services organization in IBM Industry Systems.
Performance highlights
• Performance is an order of magnitude better than general purpose processors (GPP) for media and certain applications that can take advantage of its Single Instruction Multiple Data (SIMD) capability
– Performance of its simple Power Processor Element (PPE) is comparable to that of a traditional GPP
– Each Synergistic Processor Element (SPE) is able to perform mostly the same as a GPP running at the same frequency
– Key performance advantage comes from its eight de-coupled SPE engines with dedicated resources including large register files and DMA channels
• Accelerates targeted applications with extraordinary processing capabilities
– Floating-point operations
– Integer operations
– Data streaming / throughput support
– Real-time support
• Open architecture allows for optimization at compiler and application level
– Performance gains from tuning compilers and applications can be significant
– Tools/simulators are provided to assist in performance optimization efforts
IBM BladeCenter QS22
• QS22 is the RIGHT choice for intensive streaming and/or single and double precision floating point workloads
• QS22 is OPEN – based on Power Architecture and running Linux® OS
• QS22 is EASY to deploy and to integrate into the existing IT infrastructure and/or workloads:
– Co-exists with and complements all other blade server offerings (Intel®, AMD®, POWER®)
– Ready to scale out and deploy in production environments
• QS22 is GREEN – more than 1.7 SP (or 0.8 DP) GFLOPS per watt.
Premier blade for HPC workloads
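As a rough consistency check on the GREEN bullet, the efficiency ratios can be inverted against the per-blade peak figures quoted earlier in the deck (460 SP and 217 DP GFlops). The slide does not state the power figure behind the ratios, so the result below is only an implied upper bound under the assumption that the ratios refer to one blade at peak; the function name is illustrative.

```python
# Per-blade peak figures and efficiency floors as quoted in the deck.
BLADE_SP_GFLOPS = 460.0
BLADE_DP_GFLOPS = 217.0
MIN_SP_GFLOPS_PER_WATT = 1.7
MIN_DP_GFLOPS_PER_WATT = 0.8


def implied_max_watts(peak_gflops: float, min_gflops_per_watt: float) -> float:
    """Power ceiling implied by 'more than X GFLOPS per watt' at a given peak."""
    return peak_gflops / min_gflops_per_watt


sp_bound = implied_max_watts(BLADE_SP_GFLOPS, MIN_SP_GFLOPS_PER_WATT)
dp_bound = implied_max_watts(BLADE_DP_GFLOPS, MIN_DP_GFLOPS_PER_WATT)

print(f"Implied power ceiling from SP figures: {sp_bound:.0f} W")
print(f"Implied power ceiling from DP figures: {dp_bound:.0f} W")
```

Both ratios point to roughly the same ceiling, a little over 270 W, which suggests the SP and DP efficiency claims were derived from a single power figure.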
IBM SDK for Multicore Acceleration and related tools
The IBM SDK is a complete tools package that simplifies programming for the Cell Broadband Engine Architecture
• Libraries and frameworks
– Accelerated Library Framework (ALF)
– Data Communication and Synchronization (DaCS)
– Basic Linear Algebra Subroutines (BLAS)
– Standardized SIMD math libraries
• GNU tool chain
• Performance Tools
• Eclipse-based IDE
• Simulator
• IBM XL C/C++ compiler*: optimized compiler for use in creating Cell/B.E. optimized applications. Offers improved performance, automatic overlay support, and SPE code generation
*XLC compiler is a complementary product to SDK
Denotes software components included in the SDK for Multicore Acceleration
IBM SDK for Multicore Acceleration value
• Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support
• Based on industry standards to ease the transition to the Cell/B.E.
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in
• Designed to make it easy to port and optimize applications for the QS21 and QS22
– Enhancements to enable new features in QS22
– Performance tuning tools to help optimize algorithms without re-writing the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform
Cell Programming Approaches are fully customizable!
1. “Native” Programming: Compilers, Intrinsics, DMA, etc.
2. Assisted Programming: Libraries, Frameworks
3. Case Tools / Complete Hardware Abstraction: User tool-driven
Increasing programmer control over Cell/B.E. resources
Decreasing programmer attention to architectural details
Workloads ideal for PowerXCell 8i and QS22
Digital Media
Financial Services Sector
Home Media
Consumer Electronics
Information Based Medicine
Digital Video Surveillance
Aerospace and Defense
Electronic Design Automation
Chemicals & Petroleum
Market & Solution Specific Assets
Real-time Analytics
Processing of Data
Information Synthesis
Analysis
Unstructured Data
Multimodal Search
Data Transforms
Pattern Matching
Image/Video Creation/Mgt
Presentation of Data
Visualization
Imaging
Extreme Stream Computation and Bandwidth requirements
PowerXCell 8i is suited for applications which demand extraordinary floating point performance
Public sector HPC solutions
• IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
• ISV applications:
– Development tools from RapidMind, Gedae, Wind River, etc.
– A growing number of university and government research labs with external collaborative missions are exercising existing and emerging science codes
• The solution is designed to offer:
– Petaflop scalability and reliability
– Lower power and space footprint
– Lower total cost of ownership
• Performance advantages:
– Science codes such as SPaSM, VPIC, Milagro, and Sweep3D accelerated 4-9x over a single AMD Opteron™ core (Source: LANL - www.lanl.gov/roadrunner)
Enable government labs, agencies, and academic research centers to run high performance codes faster, less expensively, and with lower power consumption than existing computing architectures
*See Notes on Benchmarks, charts 46 and 47
Aerospace & defense solutions
• IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
• ISV applications:
– Gedae stream, image and signal programming environment
– RapidMind development tools
– Wind River VxWorks RTOS and Workbench tools
• Performance advantages:
– FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*
– Double Precision Matrix Multiplication up to 2.6x faster than 2.66GHz 4-core Clovertown*
Enhance competitiveness, demonstrate innovation and capture significant government contracts through dramatic performance improvements in real time signal and image processing
“As a time-served radar architect, I can say that Cell/Gedae is something of a dream and should rightly impact the new design market… it is an opportunity that the DoD should not fail to grasp.”
- John Roulston, SCImus Solutions, March 2007
*See Notes on Benchmarks, charts 46 and 47
Digital content creation solutions
• IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
– IBM iRT scalable real-time ray tracer
• ISV applications:
– RapidMind development tools
• The solution is designed to offer:
– Rapid turnaround of digital assets
– More realistic simulation
– An open and flexible solution based on standards
– Scalability and reliability
• Performance advantages:
– 1080p Ray-traced images computed in milliseconds*
– 1080p Ambient Occlusion images computed in seconds*
IBM solutions enable Media and Entertainment companies to produce the next generation of animated feature films, games, and advertising content
*See Notes on Benchmarks, charts 46 and 47
Digital video surveillance solutions
• IBM components:
– IBM BladeCenter QS21/QS22
– IBM Total Storage
– IBM DVS ADK
• ISV applications:
– Codec libraries
– Video distribution software
• The solution is designed to offer:
– H.264 encoding
– Encoders for analog cameras
– Transcoding to save storage and network costs
– Decoding acceleration to reduce workstation costs and improve robustness
– Better management and scalability
– Network-based surveillance
– Compute density: with two processors per blade, 14 blades to a chassis, and two chassis to a rack, it is possible to have as many as 672 H.264 encoders in the rack
• Performance advantage:
– One Cell/B.E. processor running at 3.2 GHz can encode 12 channels of standard definition video at 30 fps to H.264 (main profile, including CABAC)[1]
[1] Source: IBM Research benchmark
Solutions deliver hardware and enablement for high-density, highly scalable encoding, transcoding, and compositing for digital video surveillance
[Diagram labels: PTZ; Coax; 16 camera inputs (x2); Aggregation Unit; 14 card slots; IBM BladeCenter QS21/QS22; IBM BladeCenter-H; IBM Total Storage; “672 encoders in a rack!”]
*See Notes on Benchmarks, charts 46 and 47
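The 672-encoder figure on this slide follows from multiplying the rack configuration by the per-processor channel count from the IBM Research benchmark. A minimal sketch of that arithmetic, assuming one H.264 encoder per standard-definition channel (the variable names are illustrative):

```python
# Rack configuration and per-processor throughput as quoted on the slide.
PROCESSORS_PER_BLADE = 2
BLADES_PER_CHASSIS = 14
CHASSIS_PER_RACK = 2
CHANNELS_PER_PROCESSOR = 12  # SD channels at 30 fps, H.264 main profile

processors_per_rack = PROCESSORS_PER_BLADE * BLADES_PER_CHASSIS * CHASSIS_PER_RACK
# Assumption: one encoder instance per encoded channel.
encoders_per_rack = processors_per_rack * CHANNELS_PER_PROCESSOR

print(f"Processors per rack: {processors_per_rack}")
print(f"H.264 encoders per rack: {encoders_per_rack}")
```

This gives 56 processors and 672 encoders per rack, consistent with the slide.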
EDA solutions
• IBM components:
– Cell/B.E. hybrid cluster
– IBM BladeCenter QS21
– IBM System x / IBM BladeCenter
– IBM Cluster 1350 integrated cluster
– Storage: DS4000, N series, DCS9550
• ISV applications:
– Mentor Graphics® Calibre® nmOPC and OPCVerify™
• The solution is designed to offer:
– Significant run time acceleration
– Leverages Cell/B.E. strengths to offer significant speed-up when compared to existing solutions in the market, reducing design turnaround time
– Scalability and reliability
– Blade form factor improves scalability, compute density and reliability
Accelerate computational lithography workload to address turnaround time challenges and at the same time reduce total cost of the computing infrastructure
Financial market analytics solutions
• IBM components:
– IBM BladeCenter QS22
– IBM SDK for Multicore Acceleration
– Dynamic Application Virtualization
– Cell/B.E. math libraries
• ISV applications:
– NAG - Math & Stat Software
– Platform Symphony - Grid Computing Environment
– Encirq - Event Processing Platform
• The solution is designed to offer:
– Flexibility and scalability
– IBM BladeCenter QS22 integrates with other BladeCenter products
– IBM SDK, DAV, third party applications for ease of adoption within existing infrastructure
– Technical Services with skilled programming expertise and subject matter experts
– Power, space and cooling advantages
• Performance advantage:
– Collateralized Debt Obligation (CDO) - 7.5X faster than 2.8 GHz 4-core Harpertown*
– 650 million European options/sec using Monte Carlo simulations on QS22 blade*
Enable financial market professionals to perform highly complex analytics with the required speed and accuracy to support trade execution and improve their firms’ competitive position
*See Notes on Benchmarks, charts 46 and 47
Medical imaging solutions
• IBM components:
– IBM BladeCenter QS21 & QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
• ISV applications:
– Advanced image and text analytics
– High-performance image compression
• The solution is designed to offer:
– 3D image reconstruction, registration, volume rendering, segmentation
– On-demand compression/decompression
• Performance advantage:
– 16x improvement on MRI image reconstruction over Opteron system
– 11x improvement on CT image reconstruction over 3.0GHz Xeon system
– 48x improvement on image registration over 3GHz Pentium 4
– 200x shear-warp volume visualization over TI TMS320C80 processor
– 40:1 CT study data compression
(Source for all above: Mayo Clinic - http://www.mayoclinic.org/news2007-rst/3996.html)*
Improve the efficiency, productivity, and quality of patient care through dramatic performance improvements in the transmission and analysis of medical images
*See Notes on Benchmarks, charts 46 and 47
Seismic solutions
• IBM components:
– IBM BladeCenter QS22
– IBM SDK for Multicore Acceleration
– IBM Cell/B.E. math libraries
– IBM hybrid computing solution (custom offering)
– PXCAB
– Standard math, vector math, FFT, BLAS, MPI and tridiagonal solver
• ISV applications:
– Simudyne
– Customers’ own proprietary code
• The solution is designed to offer:
– High-performance, highly accurate rendering of geologic structures
– Cost effective HPC environment that has significant performance increases
– Scalability and reliability
• Performance advantages:
– FFT workloads up to 7.7x faster than 3.0 GHz 2-core Woodcrest x2*
– Double Precision Matrix Multiplication up to 2.6x faster than 2.66GHz 4-core Clovertown*
Improve the speed and accuracy of geologic visualization to reduce the cost of evaluating targets for their oil and gas yielding potential
*See Notes on Benchmarks, charts 46 and 47
QS22 summary
• The QS22 is based on the new PowerXCell 8i processor – built on an enhanced version of the Cell Broadband Engine Architecture
• The QS22 offers the capabilities you need for your most demanding computational requirements
– Offers extraordinary double precision and single precision floating point performance
– Supports up to 32GB of processor memory
• IBM is working with ISVs and customers to accelerate workloads on the QS22 in targeted application areas
• The QS22 is extremely efficient, offering more than 1.7 SP (or 0.8 DP) GFLOPS per watt of energy
BladeCenter QS22 is Right, Open, Easy and Green
Premier blade for HPC workloads
IBM SDK for Multicore Acceleration summary
• Designed to be highly reliable, simple to acquire and easy to use
– Complete, integrated kit
– Production-ready tools from IBM
– IBM warranty and support
• RHEL 5.2 Enterprise support
• Based on industry standards to ease the transition to the Cell/B.E. architecture
– Eclipse-based Integrated Development Environment
– Standard, base libraries
– Third-party libraries can be plugged in
• Designed to make it easy to port and optimize applications for the QS22
– Performance tuning tools to help optimize algorithms without re-writing the entire application
– Tools designed to help you partition an application across a hybrid Cell/B.E. and x86 platform
Cell/B.E. architecture reaches wide and deep – from consumer products to high performance computing
[Chart: systems arranged along an axis labeled "Increasing support for scale and datacenter"]
Segments: Consumer, Business, Enterprise, High Performance Computing
Systems shown:
– SCE PS3 (Cell/B.E. + GPU)
– Toshiba SpursEngine (SPUs + Host)
– Sony Cell/B.E. Computing Unit (Cell/B.E. + GPU + AV I/O)
– PowerXCell 8i PCI card (Cell/B.E. + Host)
– Mercury 1u Dual Cell
– IBM BladeServer (2 Cell/B.E. or PowerXCell 8i)
– Mini-Roadrunner
– Roadrunner (16,000 PowerXCell 8i + AMD)
– Custom
Common OS’s, Infrastructure, Tools, Libraries, Code… the SAME SPE code runs from end to end