Boost Your Productivity with GPGPUs and IBM Platform Computing

Post on 12-Sep-2021

3 views 0 download

Transcript of Boost Your Productivity with GPGPUs and IBM Platform Computing

Platform Computing

Boost Your Productivity with GPGPUs and IBM Platform Computing Software

NVIDIA GTC 2013Chris Porter, IBM

March, 2013

© 2012 IBM Corporation1

Chris Porter, IBM

Platform Computing

Agenda

• IBM Platform Computing offerings

• GPGPU Adoption in the HPC Market

• GPGPU Scheduling & Management

- IBM Platform Computing Solutions for GPGPUs

© 2012 IBM Corporation2

- IBM Platform Computing Solutions for GPGPUs

- Benefits from Intelligent GPU Scheduling & Management

- Use Case Examples

• Summary

Platform Computing

© 2012 IBM Corporation3

IBM PLATFORM COMPUTING OFFERINGS

Platform Computing

IBM Platform Computing The leader in cluster, grid and HPC cloud management software

• Acquired by IBM in 2012 as part of mainstream Technical Computing strategy

• 20 year history delivering leading workload and resource

management software for technical computing and big data/analytics environments

• 2000+ global customers including 23 of 30 largest enterprises

De facto Standard for Commercial

HPC

60% of top Financial

© 2012 IBM Corporation4

• 2000+ global customers including 23 of 30 largest enterprises

• Market leading scheduling engine with high performance,

mission-critical reliability and extreme scalability

• Comprehensive capability from ready-to-deploy complete cluster systems to large global grids to HPC clouds

• Large ISV and global partner ecosystem

• Global services and support coverage

Over 5 MM CPUs under management

Financial Services

Platform Computing

IBM Platform Computing offerings

Platform LSF Family

Platform HPC for

System x

Scalable, comprehensive workload management suite for heterogeneous compute environments

Simplified, integrated, purpose-built HPC management software integrated with systemsW

ork

load M

anagem

ent • Unmatched experience through market share

• Powerful multi-policy scheduling engine

• Unmatched scalability through high end accounts

• Unmatched breadth of offering due to extent of add-ons

• All-in-one integrated solution with leading web interface

• Applicable to the smallest of clusters

• Leverages Platform LSF technology base

• Hardware bundled for turnkey purchasing and deployment

© 2012 IBM Corporation5

Platform Symphony

Family

Platform Cluster

Manager

integrated with systems

High-throughput, low-latency compute and data intensive analytics applications

Provisioning and management of HPC clusters

Work

load M

anagem

ent

Analy

tics

Infr

astr

uctu

reF

lexib

le

Clu

ste

rs

• Leading experience due to 50%+ major investment banks as customers (translates to other industries)

• High scalability and better application performance due to fast, low latency processing (sub millisecond)

• Proven business model for sharing grid infrastructure

• Both compute and data intensive applications

• Hardware bundled for turnkey purchasing and deployment

• Scalable offerings to simplify process of deploying and managing small clusters to global HPC clouds

• Broad heterogeneous support enables managing broad technologies and multiple workload managers

• Enables multi-tenant HPC clouds

Platform Computing

The Application Accelerator Storm

• GPU adoption is increasing

– 53 systems on the Top500 released in Nov, 2012 are using GPGPUs

– GPGPUs are penetrating both high-end and mainstream HPC

• Nvidia is leading the accelerator race

– 100’s of K’s of trained CUDA developers worldwide

© 2012 IBM Corporation6

– 100’s of K’s of trained CUDA developers worldwide

– 50 systems powered by Nvidia on the latest Top500 list

• Other accelerator technologies are emerging

– Intel: Xeon Phi Coprocessor

– AMD: FireStream

Platform Computing

© 2012 IBM Corporation7

GPGPU ADOPTION IN THEHPC MARKET

Platform Computing

Market Landscape: Technical Applications are Exploding

Creativity

GeoScience Financial

CAE

Adoption Drivers Technical Applications

© 2012 IBM Corporation8

Productivity

Visualization

GeoScience

Life-

Sciences

Government

& Education

EDA

Financial

TechnicalProcessing

Quality

Platform Computing

The Big Buzz in HPC: Hybrid Computing

• Hybrid Computing: CPUs and GPUs working together

• Applications Taking Advantage Of GPUs

When do I use them?What is the ROI?How do I schedule jobs to them?How to maximize utilization, various published

benchmarks showing dramatic performance increases?

© 2012 IBM Corporation9

• Applications Taking Advantage Of GPUs

– Life Sciences• Unipro UGENE, Agile Molecule, many others

– Financial Services• Volmaster FX, ClusterTech Financial Library, many others

– Manufacturing

• Fidesys, Ansys, 3ds, many others

– Oil and Gas• Acceleware Seismic Solvers, many others

Platform Computing

© 2012 IBM Corporation10

GPU SCHEDULING & MANAGEMENT

Platform Computing

What do Intelligent GPU Scheduling and Management Bring to You?

• Improved application performance by allocating GPU suitable workloads on those resources and free up CPUs for other types of workloads.

• Reduced infrastructure cost by maximizing cluster utilization.

• Simplified system management via easy to use GUI and timely alerts.

• Increased productivity for administrators and application developers.

© 2012 IBM Corporation11Intelligent scheduling improves cluster efficiency

Platform Computing

GPUs: Schedule, Monitor & Manage

• DEPLOY: Quickly deploy workload to GPU resources

– Easy job submission to GPUs in a cluster via CUDA job submission wrappers

– Install CUDA across a cluster is a couple of clicks

• MANAGE: Easily manage heterogeneous clusters

– Deploy & manage both CPU & GPU resources in the same cluster

– Remotely manage & view the status of your jobs

© 2012 IBM Corporation12

Take immediate advantage of the exceptional HPC performance provided by GPUs

Platform Computing

GPUs: Schedule, Monitor & Manage

• MONITOR: Monitor GPU metrics

– GPU slot utilization, temperature & status

– Detect ECC error accumulation

© 2012 IBM Corporation13

Platform Computing

Scheduling to GPGPUs Today

• Managing latest GPGPUs and CUDA (V5.0) applications using:– IBM Platform LSF– IBM Platform HPC– IBM Platform Symphony

• GPU ELIM provides:– Monitoring and detection of GPUs– Group hosts with GPU(s) into a

resource group– Compute slots on these hosts are

user configured

Resource Group = RG_GPU

Compute HostGPU

ELIMLIM

Compute Host

Info on GPU(s)

GPU

© 2012 IBM Corporation14

• GPU-enablement is the responsibility of the application developer

GPU Management:• # of GPU• # of GPU in “normal”• # of GPU in “exclusive”• # of GPU in “prohibited”

LSFSCHED

Compute Host

GPUELIMLIM

Compute Host

ELIMLIMGPU Monitoring:• Mode (normal, exclusive, prohibited)• Temperature• ECC error count

Platform Computing

© 2012 IBM Corporation15

USE CASES

Platform Computing

Use Case #1: Simple Use Case

LSF Clusterjobs

jobs

© 2012 IBM Corporation16

• Nvidia GPGPU only• CUDA 5.0 and older• Simple monitoring statistics

jobs

jobs

ELIM

Platform Computing

Use Case #2: Complex Use Case

LSF Clusterjobs

jobs

© 2012 IBM Corporation17

• Multiple GPGPU / accelerators OR• Use of newer CUDA features > 3.2 OR• Monitoring of memory and GPU core utilization

jobs

jobs

Platform Computing

Use Case #3: NUMA optimization within a single server

GPU

Mem

ory

Mem

ory

16xCPU

Asymmetric Bandwidth

© 2012 IBM Corporation18

GPU

GPU

Mem

ory

Mem

ory

PC

I E

xpre

ss

CPU

8x

8x

Platform Computing

Use Case #3: NUMA Optimization within a single server

Asymmetric bandwidth requires:

– LSF: Non-GPU jobs to be scheduled to hosts without GPUs first

– LSF: Non-GPU jobs be scheduled to cores with low GPU bandwidth

– LSF: GPU jobs schedule to cores with maximum GPU bandwidth

© 2012 IBM Corporation19

GPU

GPU

GPU

Me

mo

ry

Me

mo

ryM

em

ory

Me

mo

ry

16x

PC

I E

xp

ress

CPU

CPU

8x

8x

Platform Computing

Use Case #4: NUMA optimization for multi-server MPI jobs

© 2012 IBM Corporation20

MPI job optimization

– MPI selects optimal cores for multi-host job MPI processes

GPU MPI job CPU only MPI jobGPU serial jobs

Platform Computing

Use Case #4: NUMA Optimization for multi-server MPI jobs

MPI based multi-server GPU and non-GPU jobs

– LSF: Single servers – LSF scheduling plugin controls core placement

– LSF: Multiple servers – LSF scheduling plugin does nothing

– MPI: Single servers – MPI scheduler does nothing

– MPI: Multiple servers – LSF scheduling plugin controls core placement

© 2012 IBM Corporation21

Platform Computing

© 2012 IBM Corporation22

SUMMARY

Platform Computing

Reality and Conclusions

Don’t have application developed for GPUs?

• Many ISVs are working hard to adopt CUDA and/or openCL for their applications

© 2012 IBM Corporation23

IBM Platform Computing has available solutions for

• Managing both GPU and CPU resources in a cluster

• Monitoring & visualizing important parameters for GPUs

• Scheduling serial jobs to available and functional GPUs

• Scheduling parallel jobs to available and functional GPUs

• Scheduling & optimizing mixed mode serial and parallel workload

Platform Computing

© 2012 IBM Corporation24

QUESTIONS?