
GTC DC 2019 - DC91209

No Data Center? No Problem! Supporting AI Workgroups with DGX Station and Kubernetes


Michael Balint, Senior Product Manager, NVIDIA (@michaelbalint)

Markus Weber, Senior Product Manager, NVIDIA (@MarkusAtNVIDIA)


Topics

➔ Intro to DGX Station

➔ Sharing Your GPU Compute Resource
   ● Basic
   ● Intermediate
   ● Advanced
   ● Futures

➔ Takeaways


NVIDIA DGX STATION

Groundbreaking AI in Your Office

The AI Workstation for Data Science Teams


Key Features

1. 4 x NVIDIA Tesla V100 GPU (32GB)

2. 2nd-gen NVLink (4-way)

3. Water-cooled design

4. 3 x DisplayPort (4K resolution)

5. Intel Xeon E5-2698 v4 (20 cores)

6. 256GB DDR4 RAM


Deployment Scenarios


Today’s Focus


Basic Sharing


DGX STATION SOFTWARE STACK

Fully Integrated Software for Instant Productivity

Advantages:

• Instant productivity with NVIDIA-optimized deep learning frameworks: Caffe, Caffe2, PyTorch, TensorFlow, MXNet, and others

• Performance optimized across the entire stack

• Faster time-to-insight with pre-built, tested, and ready-to-run framework containers

• Flexibility to use different versions of libraries like libc and cuDNN in each framework container
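These containers are distributed through NGC. As a quick sketch of the workflow (the tag shown is one of the releases used later in this talk), authenticating and pulling one looks like:

$ docker login nvcr.io      # username is the literal string $oauthtoken; password is your NGC API key
$ docker pull nvcr.io/nvidia/tensorflow:19.02-py3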


Using Individual GPUs

$ docker run -e NVIDIA_VISIBLE_DEVICES=0,1 --rm nvidia/cuda nvidia-smi

Thu Mar  7 23:34:24 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0  On |                    0 |
| N/A   36C    P0    37W / 300W |    432MiB / 16125MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   36C    P0    38W / 300W |      0MiB / 16128MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+


$ docker run -e NVIDIA_VISIBLE_DEVICES=2,3 --rm nvidia/cuda nvidia-smi

Thu Mar  7 23:35:13 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
| N/A   36C    P0    38W / 300W |      0MiB / 16128MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
| N/A   36C    P0    40W / 300W |      0MiB / 16128MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
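NVIDIA_VISIBLE_DEVICES is honored by the NVIDIA container runtime that ships with DGX OS. As an aside (an assumption about your setup, not something shown in the talk), on hosts running Docker 19.03+ with nvidia-container-toolkit, the --gpus flag achieves the same per-container GPU pinning:

$ docker run --gpus '"device=0,1"' --rm nvidia/cuda nvidia-smi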


NVIDIA GPU Cloud (NGC)


Using Individual GPUs
Real-World Execution Examples

Joe:

$ docker run -e NVIDIA_VISIBLE_DEVICES=0 --rm nvcr.io/nvidia/pytorch:19.02-py3 \
    python /workspace/examples/upstream/mnist/main.py

$ docker run -e NVIDIA_VISIBLE_DEVICES=1 --rm nvcr.io/nvidia/pytorch:18.11-py2 \
    python /workspace/examples/upstream/mnist/main.py

Jane:

$ docker run -it -e NVIDIA_VISIBLE_DEVICES=2,3 --rm \
    -v /home/jane/data/mnist:/data/mnist nvcr.io/nvidia/tensorflow:19.02-py3


“Manual” Sharing


Using VNC
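A common way to do this safely is to keep the VNC server bound to localhost and tunnel it over SSH. A minimal sketch, assuming x11vnc is installed on the station and that jane and dgx-station are placeholder names:

# On the DGX Station: export the current display, listening only on localhost
$ x11vnc -display :0 -localhost -rfbport 5901

# On the user's machine: forward local port 5901 to the station,
# then point any VNC viewer at localhost:5901
$ ssh -N -L 5901:localhost:5901 jane@dgx-station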


Intermediate Sharing


Data Storage
Internal RAID 0 | Internal RAID 5 | External DAS (eSATA or USB 3.1 Gen 2)

$ lsblk
NAME     MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda        8:0    0  1.8T  0 disk
├─sda1     8:1    0  487M  0 part  /boot/efi
└─sda2     8:2    0  1.8T  0 part  /
sdb        8:16   0  1.8T  0 disk
└─md0      9:0    0  5.2T  0 raid0 /raid
sdc        8:32   0  1.8T  0 disk
└─md0      9:0    0  5.2T  0 raid0 /raid
sdd        8:48   0  1.8T  0 disk
└─md0      9:0    0  5.2T  0 raid0 /raid

$ sudo configure_raid_array.py -m raid5

$ sudo configure_raid_array.py -m raid0
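configure_raid_array.py is the DGX utility for switching between the modes above; the resulting md array can be verified with standard Linux tools:

$ cat /proc/mdstat      # shows md0's RAID level, member disks, and any rebuild progress
$ df -h /raid           # confirms the array is mounted and shows usable capacity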


Configuring an NFS Cache

Diagram: remote NFS shared storage is mounted at /mnt over high-speed Ethernet (10GBASE-T, RJ45). The local /raid array (3 drives, 5.2 TB) is used for FSC (FS-Cache), separate from the 1.8 TB OS/boot drive.
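A minimal sketch of this setup, assuming a server and export named nfs-server:/export/data (hypothetical) and the cachefilesd package providing the FS-Cache backend on /raid:

# Point cachefilesd's cache directory at the RAID array and start it
$ sudo sed -i 's|^dir .*|dir /raid/fscache|' /etc/cachefilesd.conf
$ sudo systemctl enable --now cachefilesd

# Mount the share with the fsc option so repeated reads hit the local cache
$ sudo mount -t nfs -o fsc nfs-server:/export/data /mnt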


Advanced Sharing


DEMO


AI as a Service
Use Case #1: Interactive Session

1. User requests 2 GPUs for an interactive session via browser

2. Cluster finds 2 free GPUs and spawns a container which taps into them

3. User is presented with an interactive Python notebook

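Under the hood, that request might translate into a pod spec like the one below. This is a minimal sketch, not the demo's exact manifest: it assumes the NVIDIA device plugin is installed (so nvidia.com/gpu is a schedulable resource) and that the image bundles Jupyter; the pod name is hypothetical.

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: jane-notebook                  # hypothetical session name
spec:
  containers:
  - name: notebook
    image: nvcr.io/nvidia/tensorflow:19.02-py3
    command: ["jupyter", "notebook", "--ip=0.0.0.0", "--no-browser"]
    ports:
    - containerPort: 8888              # Jupyter's default port
    resources:
      limits:
        nvidia.com/gpu: 2              # the 2 GPUs requested in step 1
EOF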


AI as a Service
Use Case #2: ML Workflow

1. User defines a pipeline in which each step uses a container, and submits it to the cluster

2. Cluster finds resources for each step of the pipeline, spawning the necessary containers and tapping into GPUs

3. Results are written to disk for the user to analyze

Pipeline steps: 1. Preprocess → 2. Train A → 3. Train B


NOTE: Hyperparameter optimization can use a pipeline as well, spawning containers for each operation.
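In the demo this is driven by Kubeflow, but the underlying pattern can be sketched with plain Kubernetes Jobs; preprocess.py and the step names below are hypothetical:

$ cat <<EOF | kubectl apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: preprocess                     # step 1 of the pipeline
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: preprocess
        image: nvcr.io/nvidia/pytorch:19.02-py3
        command: ["python", "/workspace/preprocess.py"]   # hypothetical script
        resources:
          limits:
            nvidia.com/gpu: 1
EOF

# Block until the step completes, then submit the train-a / train-b Jobs the same way
$ kubectl wait --for=condition=complete job/preprocess --timeout=2h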


AI as a Service
Use Case #3: Inference Server

1. User submits a deployment to the cluster with its requirements: redundancy, the model to be served, and a desired URL endpoint (e.g., 3x replicas, a TF model, served on port 8080)

2. Cluster serves the model with a container spawned for each replica

3. If one of the replicas goes down, a new container is automatically spawned to replace it, guaranteeing service

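A minimal sketch of such a deployment, assuming a hypothetical serving image (the talk does not name one) and the NVIDIA device plugin; the Deployment's ReplicaSet is what replaces failed replicas automatically in step 3:

$ cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-inference
spec:
  replicas: 3                          # redundancy from step 1
  selector:
    matchLabels:
      app: tf-inference
  template:
    metadata:
      labels:
        app: tf-inference
    spec:
      containers:
      - name: server
        image: registry.example.com/tf-model-server:latest   # hypothetical serving image
        ports:
        - containerPort: 8080
        resources:
          limits:
            nvidia.com/gpu: 1
---
apiVersion: v1
kind: Service
metadata:
  name: tf-inference                   # stable endpoint in front of the replicas
spec:
  selector:
    app: tf-inference
  ports:
  - port: 8080
    targetPort: 8080
EOF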


Example Tech Stack With K8s

Diagram: Jupyter runs inside NGC containers; Kubeflow deploys those containers onto Kubernetes; Kubernetes schedules them across the cluster's GPUs.

• Jupyter Notebooks provide an interactive interface to the cluster

• NGC Containers include CUDA and encapsulate DL & ML frameworks that can be run as interactive sessions (w/ Jupyter), workflows (involving multiple containers), or model serving

• Kubeflow interfaces with Kubernetes, simplifying the process of creating deployments

• Kubernetes acts as the OS of the cluster, keeping track of hardware resources and scheduling as necessary

The same stack backs all three patterns: Interactive Session, ML Workflow, Inference Server.
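To sanity-check that Kubernetes actually sees the GPUs as schedulable resources (assuming the NVIDIA device plugin is deployed, as DeepOps does below):

$ kubectl describe nodes | grep nvidia.com/gpu
# each DGX Station node should report nvidia.com/gpu: 4 under Capacity and Allocatable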


Deployment Use Cases
Where can it be leveraged?

• Many users, many nodes (on-prem)
• Many users, single node (on-prem)
• Cloud bursting (hybrid)
• Edge/IoT (multi-region)
• Production Inferencing*

Diagram: in each scenario a cluster API fronts the compute, whether that is racks of DGX systems, a single DGX Station, on-prem DGX systems bursting into a CSP, or DGX systems paired with fleets of Jetson TX2 devices at the edge.


DeepOps
For cluster deployment and management

• Opinionated defaults, incorporating NVIDIA best practices

• Highly modular, organized into components that can be customized and installed ad hoc

• Open source and freely available, but requires some DevOps knowledge to customize and deploy

• GitHub: https://github.com/NVIDIA/deepops

• Installs the latest DGX OS on compute nodes

• Manages firmware, drivers, and other software

• Deploys job scheduling (Kubernetes and/or Slurm)

• Provides logging and monitoring services

• Scripts for additional services (Kubeflow, Dask, etc.)

Note: DeepOps can also be used to configure any NVIDIA GPU-accelerated platform. A quickstart sketch follows.
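A rough quickstart following the pattern in the DeepOps README (script and playbook paths may differ between releases, so treat this as a sketch):

$ git clone https://github.com/NVIDIA/deepops.git
$ cd deepops
$ ./scripts/setup.sh                 # installs Ansible and other dependencies
# edit config/inventory to list your node(s), then stand up Kubernetes:
$ ansible-playbook -l k8s-cluster playbooks/k8s-cluster.yml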


To Summarize

Basic                 Intermediate        Advanced
OS Users              Internal Storage    DeepOps
SSH                   External Storage    Kubernetes
Docker / Containers   NFS Cache           Scripts
NGC                                       Scheduling
Manual Scheduling                         Orchestration
VNC                                       Monitoring