Boost Your Productivity with GPGPUs and IBM Platform Computing
Platform Computing
Boost Your Productivity with GPGPUs and IBM Platform Computing Software
NVIDIA GTC 2013
Chris Porter, IBM
March 2013
© 2012 IBM Corporation
Agenda
• IBM Platform Computing offerings
• GPGPU Adoption in the HPC Market
• GPGPU Scheduling & Management
- IBM Platform Computing Solutions for GPGPUs
- Benefits from Intelligent GPU Scheduling & Management
- Use Case Examples
• Summary
IBM PLATFORM COMPUTING OFFERINGS
IBM Platform Computing: The leader in cluster, grid and HPC cloud management software
• Acquired by IBM in 2012 as part of mainstream Technical Computing strategy
• 20-year history delivering leading workload and resource management software for technical computing and big data/analytics environments
• 2000+ global customers including 23 of 30 largest enterprises
• Market leading scheduling engine with high performance, mission-critical reliability and extreme scalability
• Comprehensive capability from ready-to-deploy complete cluster systems to large global grids to HPC clouds
• Large ISV and global partner ecosystem
• Global services and support coverage
Callouts: De facto standard for commercial HPC; 60% of top Financial Services; over 5 MM CPUs under management
IBM Platform Computing offerings

Workload Management
Platform LSF Family: Scalable, comprehensive workload management suite for heterogeneous compute environments
• Unmatched experience through market share
• Powerful multi-policy scheduling engine
• Unmatched scalability through high end accounts
• Unmatched breadth of offering due to extent of add-ons
Platform HPC for System x: Simplified, integrated, purpose-built HPC management software integrated with systems
• All-in-one integrated solution with leading web interface
• Applicable to the smallest of clusters
• Leverages Platform LSF technology base
• Hardware bundled for turnkey purchasing and deployment

Analytics Infrastructure
Platform Symphony Family: High-throughput, low-latency compute and data intensive analytics applications
• Leading experience due to 50%+ major investment banks as customers (translates to other industries)
• High scalability and better application performance due to fast, low latency processing (sub millisecond)
• Proven business model for sharing grid infrastructure
• Both compute and data intensive applications

Flexible Clusters
Platform Cluster Manager: Provisioning and management of HPC clusters
• Scalable offerings to simplify process of deploying and managing small clusters to global HPC clouds
• Broad heterogeneous support enables managing broad technologies and multiple workload managers
• Enables multi-tenant HPC clouds
The Application Accelerator Storm
• GPU adoption is increasing
– 53 systems on the Top500 list released in November 2012 use GPGPUs
– GPGPUs are penetrating both high-end and mainstream HPC
• NVIDIA is leading the accelerator race
– Hundreds of thousands of trained CUDA developers worldwide
– 50 systems powered by NVIDIA on the latest Top500 list
• Other accelerator technologies are emerging
– Intel: Xeon Phi Coprocessor
– AMD: FireStream
GPGPU ADOPTION IN THE HPC MARKET
Market Landscape: Technical Applications are Exploding
Adoption Drivers: Creativity, Productivity, Quality
Technical Applications: CAE, EDA, Financial, GeoScience, Life Sciences, Government & Education, Visualization, Technical Processing
The Big Buzz in HPC: Hybrid Computing
• Hybrid Computing: CPUs and GPUs working together
– Various published benchmarks showing dramatic performance increases
– Open questions: When do I use them? What is the ROI? How do I schedule jobs to them? How do I maximize utilization?
• Applications Taking Advantage Of GPUs
– Life Sciences: Unipro UGENE, Agile Molecule, many others
– Financial Services: Volmaster FX, ClusterTech Financial Library, many others
– Manufacturing: Fidesys, Ansys, 3ds, many others
– Oil and Gas: Acceleware Seismic Solvers, many others
GPU SCHEDULING & MANAGEMENT
What do Intelligent GPU Scheduling and Management Bring to You?
• Improved application performance by allocating GPU-suitable workloads to GPU resources, freeing up CPUs for other types of workloads.
• Reduced infrastructure cost by maximizing cluster utilization.
• Simplified system management via an easy-to-use GUI and timely alerts.
• Increased productivity for administrators and application developers.
Intelligent scheduling improves cluster efficiency
GPUs: Schedule, Monitor & Manage
• DEPLOY: Quickly deploy workload to GPU resources
– Easy job submission to GPUs in a cluster via CUDA job submission wrappers
– Install CUDA across a cluster in a couple of clicks
• MANAGE: Easily manage heterogeneous clusters
– Deploy & manage both CPU & GPU resources in the same cluster
– Remotely manage & view the status of your jobs
Take immediate advantage of the exceptional HPC performance provided by GPUs
GPUs: Schedule, Monitor & Manage
• MONITOR: Monitor GPU metrics
– GPU slot utilization, temperature & status
– Detect ECC error accumulation
Scheduling to GPGPUs Today
• Managing latest GPGPUs and CUDA (V5.0) applications using:
– IBM Platform LSF
– IBM Platform HPC
– IBM Platform Symphony
• GPU ELIM provides:
– Monitoring and detection of GPUs
– Group hosts with GPU(s) into a resource group
– Compute slots on these hosts are user configured
• GPU-enablement is the responsibility of the application developer
GPU Management:
• # of GPU
• # of GPU in “normal”
• # of GPU in “exclusive”
• # of GPU in “prohibited”
GPU Monitoring:
• Mode (normal, exclusive, prohibited)
• Temperature
• ECC error count
[Diagram: LSF SCHED; compute hosts running LIM and ELIM, with and without GPUs; Resource Group = RG_GPU; "Info on GPU(s)"]
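As an illustration of how the GPU ELIM could report the management and monitoring values listed above: an LSF ELIM is an executable that periodically writes a line to standard output containing the number of reported indices followed by name/value pairs. The sketch below only formats such a line from sample data. The resource names (ngpus, ngpus_normal, gpu_temp0 and so on) and the GpuStatus type are hypothetical, not the names used by the IBM-supplied ELIM, and a real ELIM would query the GPU driver and repeat the report in a loop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuStatus:
    mode: str          # "normal", "exclusive" or "prohibited"
    temperature: int   # degrees Celsius
    ecc_errors: int    # accumulated ECC error count


def elim_report(gpus: List[GpuStatus]) -> str:
    """Format one ELIM report line: '<count> <name1> <value1> <name2> <value2> ...'."""
    # Host-level counts (hypothetical resource names, for illustration only).
    metrics = {
        "ngpus": len(gpus),
        "ngpus_normal": sum(g.mode == "normal" for g in gpus),
        "ngpus_exclusive": sum(g.mode == "exclusive" for g in gpus),
        "ngpus_prohibited": sum(g.mode == "prohibited" for g in gpus),
    }
    # Per-GPU monitoring values, indexed by device number.
    for i, g in enumerate(gpus):
        metrics[f"gpu_temp{i}"] = g.temperature
        metrics[f"gpu_ecc{i}"] = g.ecc_errors
    fields = " ".join(f"{name} {value}" for name, value in metrics.items())
    return f"{len(metrics)} {fields}"


# Sample data standing in for values read from the GPU driver.
sample = [GpuStatus("normal", 61, 0), GpuStatus("exclusive", 74, 2)]
print(elim_report(sample))
```

Each reported name would also have to be declared as a resource in the cluster configuration before the scheduler could use it in resource requirements.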
USE CASES
Use Case #1: Simple Use Case
• NVIDIA GPGPU only
• CUDA 5.0 and older
• Simple monitoring statistics
[Diagram: jobs submitted to an LSF cluster; ELIM]
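To make the simple case concrete, here is a minimal sketch of how simple monitoring statistics (mode, temperature, ECC error count) could be used to find hosts with available and functional GPUs. It is not how LSF itself makes the decision. The Gpu type, the host names, and the thresholds (85 °C, zero ECC errors) are assumptions chosen for illustration, and the sketch ignores whether an exclusive-mode GPU is already in use.

```python
from typing import Dict, List, NamedTuple


class Gpu(NamedTuple):
    mode: str          # "normal", "exclusive" or "prohibited"
    temperature: int   # degrees Celsius
    ecc_errors: int    # accumulated ECC error count


def healthy(gpu: Gpu, max_temp: int = 85, max_ecc: int = 0) -> bool:
    """A GPU is usable if it accepts compute work and looks functional."""
    return (gpu.mode != "prohibited"
            and gpu.temperature <= max_temp
            and gpu.ecc_errors <= max_ecc)


def rank_gpu_hosts(cluster: Dict[str, List[Gpu]]) -> List[str]:
    """Return hosts with at least one usable GPU, most usable GPUs first."""
    usable = {host: sum(healthy(g) for g in gpus) for host, gpus in cluster.items()}
    candidates = [host for host, count in usable.items() if count > 0]
    # Sort by usable GPU count (descending), then by host name for a stable order.
    return sorted(candidates, key=lambda host: (-usable[host], host))


# Hypothetical cluster state, as it might be reported by an ELIM on each host.
cluster = {
    "hostA": [Gpu("normal", 60, 0), Gpu("normal", 62, 0)],
    "hostB": [Gpu("prohibited", 55, 0)],
    "hostC": [Gpu("exclusive", 70, 0), Gpu("normal", 91, 0)],
    "hostD": [],
}
print(rank_gpu_hosts(cluster))
```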
Use Case #2: Complex Use Case
• Multiple GPGPU / accelerators OR
• Use of newer CUDA features > 3.2 OR
• Monitoring of memory and GPU core utilization
[Diagram: jobs submitted to an LSF cluster]
Use Case #3: NUMA optimization within a single server
[Diagram: CPUs with attached memory banks connected to three GPUs over PCI Express links of 16x, 8x and 8x; labeled "Asymmetric Bandwidth"]
Use Case #3: NUMA Optimization within a single server
Asymmetric bandwidth requires:
– LSF: Non-GPU jobs to be scheduled to hosts without GPUs first
– LSF: Non-GPU jobs to be scheduled to cores with low GPU bandwidth
– LSF: GPU jobs to be scheduled to cores with maximum GPU bandwidth
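The three placement rules above can be sketched as a small core-selection function. This is only an illustration of the ordering logic, not the LSF scheduling plugin. The Core type, the host names, and the use of PCI Express lane counts (16x, 8x) as the measure of GPU bandwidth are assumptions made for the example.

```python
from typing import List, NamedTuple


class Core(NamedTuple):
    host: str
    core_id: int
    host_has_gpu: bool
    gpu_lanes: int  # PCI Express lanes between this core's socket and a GPU (0 if none)


def pick_cores(free: List[Core], count: int, needs_gpu: bool) -> List[Core]:
    """Choose up to `count` free cores according to the asymmetric-bandwidth rules."""
    if needs_gpu:
        # GPU jobs: only cores with a path to a GPU, highest bandwidth first.
        eligible = [c for c in free if c.gpu_lanes > 0]
        ordered = sorted(eligible, key=lambda c: -c.gpu_lanes)
    else:
        # Non-GPU jobs: hosts without GPUs first, then cores with the lowest GPU bandwidth.
        ordered = sorted(free, key=lambda c: (c.host_has_gpu, c.gpu_lanes))
    return ordered[:count]


# Hypothetical free cores: one GPU host with asymmetric links and one CPU-only host.
free = [
    Core("gpu01", 0, True, 16),
    Core("gpu01", 1, True, 8),
    Core("cpu01", 0, False, 0),
    Core("gpu01", 2, True, 8),
]
print(pick_cores(free, 1, needs_gpu=True))
print(pick_cores(free, 2, needs_gpu=False))
```

Keeping non-GPU work away from the 16x-attached cores leaves the highest-bandwidth cores free for GPU jobs that arrive later. If fewer eligible cores exist than requested, the function simply returns the ones it found.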
[Diagram: CPUs, memory banks and three GPUs connected over PCI Express links of 16x, 8x and 8x]
Use Case #4: NUMA optimization for multi-server MPI jobs
MPI job optimization
– MPI selects optimal cores for multi-host job MPI processes
[Diagram legend: GPU MPI job, CPU-only MPI job, GPU serial jobs]
Use Case #4: NUMA Optimization for multi-server MPI jobs
MPI based multi-server GPU and non-GPU jobs
– LSF: Single servers – LSF scheduling plugin controls core placement
– LSF: Multiple servers – LSF scheduling plugin does nothing
– MPI: Single servers – MPI scheduler does nothing
– MPI: Multiple servers – MPI controls core placement
SUMMARY
Reality and Conclusions
Don’t have an application developed for GPUs?
• Many ISVs are working hard to adopt CUDA and/or OpenCL for their applications
IBM Platform Computing has available solutions for
• Managing both GPU and CPU resources in a cluster
• Monitoring & visualizing important parameters for GPUs
• Scheduling serial jobs to available and functional GPUs
• Scheduling parallel jobs to available and functional GPUs
• Scheduling & optimizing mixed mode serial and parallel workload
QUESTIONS?