GTC DC 2019 - DC91209
No Data Center? No Problem! Supporting AI Workgroups with DGX Station and Kubernetes
2
Michael Balint, Senior Product Manager, NVIDIA (@michaelbalint)
Markus Weber, Senior Product Manager, NVIDIA (@MarkusAtNVIDIA)
3
Topics
➔ Intro to DGX Station
➔ Sharing Your GPU Compute Resource
  ● Basic
  ● Intermediate
  ● Advanced
  ● Futures
➔ Takeaways
4
NVIDIA DGX STATION
Groundbreaking AI in Your Office
The AI Workstation for Data Science Teams
Key Features
1. 4 x NVIDIA Tesla V100 GPU (32GB)
2. 2nd-gen NVLink (4-way)
3. Water-cooled design
4. 3 x DisplayPort (4K resolution)
5. Intel Xeon E5-2698 20-core
6. 256GB DDR4 RAM
5
Deployment Scenarios
7
Today’s Focus
9
Basic Sharing
10
DGX STATION SOFTWARE STACK
Advantages:
Instant productivity with NVIDIA optimized deep learning frameworks
Caffe, Caffe2, PyTorch, TensorFlow, MXNet, and others
Performance optimized across the entire stack
Faster Time-to-Insight with pre-built, tested, and ready-to-run framework containers
Flexibility to use different versions of libraries like libc, cuDNN in each framework container
Fully Integrated Software for Instant Productivity
11
Using Individual GPUs

$ docker run -e NVIDIA_VISIBLE_DEVICES=0,1 --rm nvidia/cuda nvidia-smi
Thu Mar  7 23:34:24 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:07:00.0  On |                    0 |
| N/A   36C    P0    37W / 300W |    432MiB / 16125MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:08:00.0 Off |                    0 |
| N/A   36C    P0    38W / 300W |      0MiB / 16128MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

$ docker run -e NVIDIA_VISIBLE_DEVICES=2,3 --rm nvidia/cuda nvidia-smi
Thu Mar  7 23:35:13 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-DGXS...  On   | 00000000:0E:00.0 Off |                    0 |
| N/A   36C    P0    38W / 300W |      0MiB / 16128MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-DGXS...  On   | 00000000:0F:00.0 Off |                    0 |
| N/A   36C    P0    40W / 300W |      0MiB / 16128MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
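Note that inside the second container the two selected GPUs are renumbered 0 and 1; only the PCI bus IDs (0E:00.0 and 0F:00.0 versus 07:00.0 and 08:00.0) reveal that they are different physical devices. A quick way to confirm the mapping is a sketch like the following, which queries just the index and bus ID:

```shell
# With NVIDIA_VISIBLE_DEVICES=2,3, the container sees GPUs 0 and 1,
# but their PCI bus IDs identify them as the Station's third and
# fourth GPUs.
docker run -e NVIDIA_VISIBLE_DEVICES=2,3 --rm nvidia/cuda \
    nvidia-smi --query-gpu=index,pci.bus_id --format=csv
```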
14
NVIDIA GPU Cloud (NGC)
15
Using Individual GPUs
Real-World Execution Examples

Joe:
$ docker run -e NVIDIA_VISIBLE_DEVICES=0 --rm nvcr.io/nvidia/pytorch:19.02-py3 \
    python /workspace/examples/upstream/mnist/main.py
$ docker run -e NVIDIA_VISIBLE_DEVICES=1 --rm nvcr.io/nvidia/pytorch:18.11-py2 \
    python /workspace/examples/upstream/mnist/main.py

Jane:
$ docker run -it -e NVIDIA_VISIBLE_DEVICES=2,3 --rm \
    -v /home/jane/data/mnist:/data/mnist nvcr.io/nvidia/tensorflow:19.02-py3
17
“Manual” Sharing
18
Using VNC
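The deck does not show the connection commands, but a common way to reach a VNC session on the Station is to tunnel it over SSH so the traffic stays encrypted. A sketch, with a placeholder hostname:

```shell
# On the user's laptop: forward local port 5901 to the VNC server
# listening on the DGX Station ('dgx-station' is a placeholder name).
ssh -L 5901:localhost:5901 jane@dgx-station

# In a second terminal, point a VNC viewer at the tunneled port.
vncviewer localhost:5901
```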
19
Intermediate Sharing
20
Data Storage
Internal RAID 0 | Internal RAID 5 | External DAS (eSATA or USB 3.1 Gen 2)

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda       8:0    0  1.8T  0 disk
├─sda1    8:1    0  487M  0 part  /boot/efi
└─sda2    8:2    0  1.8T  0 part  /
sdb       8:16   0  1.8T  0 disk
└─md0     9:0    0  5.2T  0 raid0 /raid
sdc       8:32   0  1.8T  0 disk
└─md0     9:0    0  5.2T  0 raid0 /raid
sdd       8:48   0  1.8T  0 disk
└─md0     9:0    0  5.2T  0 raid0 /raid

$ sudo configure_raid_array.py -m raid5
$ sudo configure_raid_array.py -m raid0
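After reconfiguring the array, its state can be checked with the standard Linux md tools (a sketch; /dev/md0 is the data array shown in the lsblk output above):

```shell
# The kernel's summary of all software RAID arrays,
# including sync/rebuild progress after a mode change
cat /proc/mdstat

# Detailed state of the /raid array: RAID level, member
# disks, and health of each device
sudo mdadm --detail /dev/md0
```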
23
Configuring an NFS Cache

[Diagram: remote NFS shared storage mounted at /mnt over high-speed Ethernet
(10GBASE-T, RJ45); the internal 3-drive, 5.2 TB /raid volume is used as the
local FS-Cache; /boot lives on the 1.8 TB system drive.]
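The slide shows only the topology; one common way to realize it on Ubuntu is FS-Cache via the cachefilesd daemon. A sketch, with a placeholder NFS server name, and assuming the cache directory is pointed at the /raid volume:

```shell
# Install the FS-Cache userspace daemon
sudo apt-get install -y cachefilesd

# Enable the daemon; by default the cache lives under
# /var/cache/fscache. On a DGX Station you would edit
# /etc/cachefilesd.conf to place it on /raid instead.
sudo sed -i 's/^#RUN=yes/RUN=yes/' /etc/default/cachefilesd
sudo systemctl restart cachefilesd

# Mount the NFS share with the 'fsc' option so reads are
# cached on local disk ('nfs-server:/export' is a placeholder).
sudo mount -t nfs -o fsc nfs-server:/export /mnt
```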
24
Advanced Sharing
25
DEMO
26
AI as a Service
Use Case #1: Interactive Session

1. User requests 2 GPUs for an interactive session via browser
2. Cluster finds 2 free GPUs and spawns a container which taps into them
3. User is presented with an interactive Python notebook

[Diagram: the cluster's GPU pool, with two free GPUs assigned to the new container.]
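The deck does not show the underlying request, but with the NVIDIA device plugin for Kubernetes, step 1 comes down to a pod asking for the `nvidia.com/gpu` resource. A sketch; the pod name, image, and command are placeholders:

```shell
# Submit a notebook pod that asks the scheduler for 2 GPUs;
# Kubernetes finds a node with 2 free GPUs and starts the
# container there (steps 2 and 3 of the slide).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: jane-notebook
spec:
  containers:
  - name: notebook
    image: nvcr.io/nvidia/tensorflow:19.02-py3
    command: ["jupyter", "notebook", "--ip=0.0.0.0"]
    resources:
      limits:
        nvidia.com/gpu: 2   # resource exposed by the NVIDIA device plugin
EOF
```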
27
AI as a Service
Use Case #2: ML Workflow

1. User defines a pipeline in which each step uses a container; submits it to the cluster
2. Cluster finds resources for each step of the pipeline, spawning the necessary containers and tapping into GPUs
3. Results are written to disk; user analyzes them

[Diagram: a three-step pipeline (1. Preprocess, 2. Train A, 3. Train B) scheduled across the cluster's GPU pool.]
NOTE: Hyperparameter optimization can use a pipeline as well, spawning containers for each operation.
28
AI as a Service
Use Case #3: Inference Server

1. User submits to the cluster a deployment with requirements, including redundancy, a model to be served, and a desired URL endpoint (e.g. 3x replicas, a TF model, serve on port 8080)
2. Cluster serves the model with a container spawned for each replica
3. If one of the replicas goes down, a new container is automatically spawned to replace it, guaranteeing service

[Diagram: three replica containers spread across the cluster's GPU pool; a failed replica is automatically replaced.]
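In Kubernetes terms, the redundancy in steps 1 and 3 maps naturally onto a Deployment: the controller keeps the replica count at the requested level, restarting containers as needed. A sketch; the names, image, and port are placeholders:

```shell
# A Deployment keeps 3 serving replicas alive; if one dies,
# Kubernetes spawns a replacement automatically (step 3).
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-inference
spec:
  replicas: 3
  selector:
    matchLabels: {app: tf-inference}
  template:
    metadata:
      labels: {app: tf-inference}
    spec:
      containers:
      - name: server
        image: nvcr.io/nvidia/tensorflow:19.02-py3
        ports:
        - containerPort: 8080
        resources:
          limits:
            nvidia.com/gpu: 1   # one GPU per replica
EOF
```

A Service (or Ingress) in front of the Deployment would then provide the stable URL endpoint the user asked for.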
29
Example Tech Stack With K8s

[Diagram: Jupyter and NGC containers running on top of Kubeflow and Kubernetes,
scheduled across the GPUs of the cluster's DGX systems.]
• Jupyter Notebooks provide an interactive interface to the cluster
• NGC Containers include CUDA and encapsulate DL & ML frameworks that can be run as interactive sessions (w/ Jupyter), workflows (involving multiple containers), or model serving
• Kubeflow interfaces with Kubernetes, simplifying the process of creating deployments
• Kubernetes acts as the OS of the cluster, keeping track of hardware resources and scheduling as necessary
30
Deployment Use Cases
Where can it be leveraged?

● Many users, many nodes (on-prem)
● Many users, single node (on-prem)
● Cloud bursting (hybrid)
● Edge/IoT (multi-region)
● Production Inferencing*

[Diagram: a multi-node DGX cluster, a single DGX Station, a hybrid DGX-plus-CSP
deployment, and DGX systems coordinating Jetson TX2 edge devices, each fronted
by a cluster API.]
31
DeepOps
For cluster deployment and management

• Opinionated defaults, incorporating NVIDIA best practices
• Highly modular, organized into components that can be customized and installed ad hoc
• Open source and freely available, but requires some DevOps knowledge to customize & deploy
• GitHub: https://github.com/NVIDIA/deepops

• Installs the latest DGX OS on compute nodes
• Manages firmware, drivers, and other software
• Deploys job scheduling (Kubernetes and/or Slurm)
• Provides logging and monitoring services
• Scripts for additional services (Kubeflow, Dask, etc.)

Note: DeepOps can also be used to configure any NVIDIA GPU-accelerated platform.
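As a rough sketch of what a DeepOps-driven deployment looks like (script and playbook paths follow the repo's layout at the time and may have changed since):

```shell
# Fetch DeepOps and bootstrap the provisioning machine
git clone https://github.com/NVIDIA/deepops.git
cd deepops
./scripts/setup.sh            # installs Ansible and dependencies

# Describe the cluster's nodes in the Ansible inventory,
# then deploy Kubernetes onto them
vi config/inventory
ansible-playbook -l k8s-cluster playbooks/k8s-cluster.yml
```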
32
To Summarize

Basic: OS Users, SSH, Docker / Containers, NGC, Manual Scheduling, VNC
Intermediate: Internal Storage, External Storage, NFS Cache
Advanced: DeepOps, Kubernetes, Scripts, Scheduling, Orchestration, Monitoring