NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society
![Page 1: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/1.jpg)
May 2015
NVIDIA GPU TECHNOLOGY UPDATE
Axel Koehler
Senior Solutions Architect, NVIDIA
![Page 2: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/2.jpg)
NVIDIA: The VISUAL Computing Company
PC, DATA CENTER, MOBILE
GAMING, DESIGN, ENTERPRISE VIRTUALIZATION, HPC & CLOUD SERVICE PROVIDERS, AUTONOMOUS MACHINES
![Page 3: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/3.jpg)
Tesla Accelerated Computing Platform
![Page 4: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/4.jpg)
Tesla GPU Accelerators for 2015
Tesla K40: Best Single GPU Performance
• Server, Workstation, Liquid Cooled
• Higher Ed, Data Analytics, HPC Labs, Defense
• Double Precision Workloads
Tesla K80: Maximize Throughput within a Server
• Server
• Seismic, Data Analytics, HPC Labs, Defense
• Multi-GPU Accelerated Apps
• Single and Double Precision Workloads
![Page 5: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/5.jpg)
Tesla K40 / K80

| | K40 | K80 |
|---|---|---|
| GPU | GK110B | GK210 |
| Peak SP (board @ base clock) | 4.29 TFLOPS | ~5.6 TFLOPS (Base) |
| Peak DP (per board) | 1.43 TFLOPS; 1.68 TFLOPS (Boost) | ~1.87 TFLOPS (Base); ~2.7 TFLOPS (Boost) |
| # of GPUs | 1 | 2 |
| # of CUDA Cores/board | 2880 | 4992 |
| PCIe Gen | Gen 3 | Gen 3 |
| GDDR5 Memory Size (per board) | 12 GB | 24 GB |
| Memory Bandwidth | 288 GB/s | ~480 GB/s |
| GPUBoost | 2 Levels | >10 Levels |
| Power | 235 W | 300 W |
| Form Factors | PCIe Active, PCIe Passive | PCIe Passive |
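The peak figures in the K40 / K80 comparison can be cross-checked with a little arithmetic. The sketch below assumes two floating-point operations per CUDA core per cycle (one fused multiply-add) and a 1:3 double-to-single throughput ratio for GK110B and GK210; neither assumption is stated on the slide. The 745 MHz and 560 MHz base clocks and the 875 MHz boost clock come from the GPU Boost slide, and the function name `peak_tflops` is purely illustrative.

```python
def peak_tflops(cuda_cores, clock_mhz, flops_per_cycle=2, dp_ratio=1.0):
    """Theoretical peak in TFLOPS: cores x clock x flops per cycle x ratio."""
    return cuda_cores * clock_mhz * 1e6 * flops_per_cycle * dp_ratio / 1e12


# Assumption: double precision runs at one third of the single-precision rate.
DP = 1.0 / 3.0

k40_sp_base = peak_tflops(2880, 745)
k40_dp_base = peak_tflops(2880, 745, dp_ratio=DP)
k40_dp_boost = peak_tflops(2880, 875, dp_ratio=DP)
k80_sp_base = peak_tflops(4992, 560)
k80_dp_base = peak_tflops(4992, 560, dp_ratio=DP)

for label, value in [
    ("K40 SP @ 745 MHz", k40_sp_base),
    ("K40 DP @ 745 MHz", k40_dp_base),
    ("K40 DP @ 875 MHz", k40_dp_boost),
    ("K80 SP @ 560 MHz", k80_sp_base),
    ("K80 DP @ 560 MHz", k80_dp_base),
]:
    print(f"{label}: {value:.2f} TFLOPS")
```

Under these assumptions the results land within rounding of the slide's 4.29, 1.43, 1.68, ~5.6 and ~1.87 TFLOPS values.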
![Page 6: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/6.jpg)
Avg GPU Power in Watts for Real Applications on K20X
[Bar chart: average board power (watts), axis 0 to 180, for AMBER, ANSYS, Black Scholes, Chroma, GROMACS, GTC, LAMMPS, LSMS, NAMD, Nbody, QMCPACK, RTM, SPECFEM3D]
![Page 7: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/7.jpg)
GPU Boost
GPU Boost K40 (235 W for each workload)
• Base Clock, 745 MHz: Workload #1, worst-case reference app
• Boost Clock #1, 810 MHz: Workload #2, e.g. AMBER
• Boost Clock #2, 875 MHz: Workload #3, e.g. ANSYS Fluent
Dynamic GPU Boost K80
• GPU Clock levels: Zero Idle, Base, Boost
• Base: 1.87 Teraflops DP @ 560 MHz
• Boost: 875 MHz, 40-50% more flops with Boost
• Most CUDA apps run at boost clocks
• DGEMM-heavy apps run at base clocks
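To put the boost clocks in perspective, the short sketch below computes the relative gain of each boost level over its base clock. It assumes that peak throughput scales linearly with GPU clock, which the slide implies but does not state; `boost_gain_percent` is an illustrative helper name.

```python
def boost_gain_percent(base_mhz, boost_mhz):
    """Relative clock (and, by assumption, peak flops) gain in percent."""
    return (boost_mhz / base_mhz - 1.0) * 100.0


k40_boost1 = boost_gain_percent(745, 810)  # K40 base -> boost clock #1
k40_boost2 = boost_gain_percent(745, 875)  # K40 base -> boost clock #2
k80_max = boost_gain_percent(560, 875)     # K80 base -> highest boost clock

# Gain implied by the K80 peak DP figures quoted in the comparison table
# (~1.87 TFLOPS base, ~2.7 TFLOPS boost).
k80_table = (2.7 / 1.87 - 1.0) * 100.0

print(f"K40 boost #1: +{k40_boost1:.1f}%")
print(f"K40 boost #2: +{k40_boost2:.1f}%")
print(f"K80 max boost clock: +{k80_max:.1f}%")
print(f"K80 table figures: +{k80_table:.1f}%")
```

The gain implied by the table figures falls inside the quoted 40-50% range, while the ratio of the two K80 clocks is somewhat higher. That would be consistent with applications settling on intermediate boost levels, although the slide does not say so.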
![Page 8: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/8.jpg)
GPU Roadmap
![Page 9: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/9.jpg)
Pascal GPU Features NVLINK and Stacked Memory
NVLINK: GPU high speed interconnect
• 5x PCIe bandwidth
• Move data at CPU memory speed
• 3x lower energy/bit
3D Stacked Memory
• 4x Higher Bandwidth (~1 TB/s)
• 3x Larger Capacity
• 4x More Energy Efficient per bit
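A rough feel for what "5x PCIe bandwidth" means in practice: the sketch below estimates how long it takes to move 12 GB (the K40 memory size) over PCIe Gen 3 x16 and over a link with five times that bandwidth. The PCIe figure is the theoretical per-direction rate (8 GT/s per lane, 16 lanes, 128b/130b encoding), an assumption not taken from the slides; achieved rates are lower in practice. All names are illustrative.

```python
# Assumption: theoretical PCIe Gen 3 x16 bandwidth per direction, in GB/s.
PCIE_GEN3_X16_GB_PER_S = 8.0 * 16 * (128.0 / 130.0) / 8.0


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Idealized transfer time, ignoring latency and protocol overhead."""
    return size_gb / bandwidth_gb_per_s


pcie_time = transfer_seconds(12.0, PCIE_GEN3_X16_GB_PER_S)
nvlink_time = transfer_seconds(12.0, 5.0 * PCIE_GEN3_X16_GB_PER_S)

# Ratio of the quoted ~1 TB/s stacked memory to the K40's 288 GB/s GDDR5.
stacked_vs_k40 = 1000.0 / 288.0

print(f"PCIe Gen 3 x16: {PCIE_GEN3_X16_GB_PER_S:.2f} GB/s, {pcie_time:.2f} s for 12 GB")
print(f"5x PCIe link: {nvlink_time:.2f} s for 12 GB")
print(f"~1 TB/s vs 288 GB/s: {stacked_vs_k40:.1f}x")
```

The slide does not say which baseline its "4x Higher Bandwidth" refers to, so the last ratio is only a comparison against the K40 figure quoted earlier in the deck.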
![Page 10: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/10.jpg)
Developer View Without Unified Memory
Developer View With Unified Memory
Unified Memory System Memory
GPU Memory
Unified Memory Dramatically Lower Developer Effort
![Page 11: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/11.jpg)
NVLink and Unified Memory
Enable Data Transfer At Speed of CPU Memory
![Page 12: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/12.jpg)
Move Data where it is Needed Fast
Accelerated Communication
GPU Direct RDMA
• Fast Access to other Nodes
• Eliminate CPU Latency
• Eliminate CPU Bottleneck
• 2x App Performance
NVLINK
• 5x Faster Than PCIe
• Fast Access to System Memory
GPU Direct P2P
• Multi-GPU Scaling
• Fast GPU Communication
• Fast GPU Memory Access
![Page 13: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/13.jpg)
Improving GPUDirect RDMA
SC14 TALK AT MELLANOX BOOTH
GPUDIRECT RDMA (components: GPU, CPU, IOH, HCA)
• CPU prepares and queues communication tasks on HCA
• CPU synchronizes with GPU tasks
• HCA directly accesses GPU memory
GPUDIRECT ASYNC (components: GPU, CPU, IOH, HCA)
• CPU prepares and queues communication tasks on GPU
• GPU triggers communication on HCA
• HCA directly accesses GPU memory
http://on-demand.gputechconf.com/gtc/2015/presentation/S5412-Davide-Rossetti.pdf
![Page 14: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/14.jpg)
Developer Platform With Open Ecosystem
Accelerate Applications Across Multiple CPUs
• Libraries (AmgX, cuBLAS)
• Programming Languages
• Compiler Directives
• x86
![Page 15: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/15.jpg)
Drop-in Acceleration with GPU Libraries
Speedups out of the box: AmgX, cuFFT, NPP, cuBLAS, cuRAND, cuSPARSE, MATH
Linear Performance Scaling with XT libraries
• cuBLAS-XT: Machine Learning, O&G, Materials Science, Defense, Supercomputing
• cuFFT-XT: O&G, Molecular Dynamics, Defense
• AmgX: CFD, Supercomputing, O&G Reservoir Sim
![Page 16: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/16.jpg)
CUDA 7 – New Features
• C++11 feature support
– auto, lambdas, std::initializer_list, variadic templates, static_assert, constexpr, rvalue references, range-based for loops
• Runtime Compilation (RTC)
• cuSOLVER library
– Routines for solving sparse and dense linear systems and eigenvalue problems
– Three APIs: Dense, Sparse, Refactorization
• Thrust improvements
– Device-side Thrust, API support for CUDA streams, performance
• HyperQ/MPI (MPS): Multiple GPUs per Node
![Page 17: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/17.jpg)
CUDA7: Supported C++11 Features
C++11 language features enabled, including:
• auto
• lambdas
• std::initializer_list
• variadic templates
• static_assert
• constexpr
• rvalue references
• range-based for loops
• …
Not supported:
• thread_local
• Standard libraries (std::thread, etc.)
![Page 18: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/18.jpg)
CUDA 7.0 Runtime Compilation
Compile CUDA kernel source at run time
• Compiled kernels can be cached on disk
Runtime C++ Code Specialization
• Optimize code based on run-time data
• Reduce compile time and compiled code size
• Enables runtime code generation, C++ template-based DSLs
[Diagram: the Application passes the source of __global__ foo(..) { .. } to the Runtime Compilation Library (libnvrtc), receives the Compiled Kernel, and launches foo()]
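Runtime code specialization means baking values that are only known at run time into the kernel source before it is handed to the runtime compilation library. The sketch below shows only that source-generation step in plain Python; it does not call libnvrtc, and the kernel name `scale`, the template text and the helper `specialize_kernel` are all hypothetical.

```python
from string import Template

# Hypothetical CUDA kernel source with two placeholders that are filled in
# once the problem size and scale factor are known.
KERNEL_TEMPLATE = Template("""\
extern "C" __global__ void scale(float *data)
{
    // The bound and the factor are compile-time constants in this source.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < $n) {
        data[i] *= ${factor}f;
    }
}
""")


def specialize_kernel(n, factor):
    """Return kernel source with the run-time values substituted in."""
    return KERNEL_TEMPLATE.substitute(n=int(n), factor=repr(float(factor)))


source = specialize_kernel(1024, 2.5)
print(source)
```

The resulting string is what an application would pass to the runtime compiler; because the bound and the factor are now literals, the compiler is free to optimize around them.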
![Page 19: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/19.jpg)
cuSOLVER
cusolverDN Dense Cholesky, LU, SVD, QR
Optimization, Computer vision, CFD
cusolverSP Sparse direct solvers & Eigensolvers
Newton’s method, Chemical kinetics
cusolverRF Sparse refactorization solver
Chemistry, ODEs, Circuit simulation
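The dense Cholesky routine listed for cusolverDN corresponds to the LAPACK-style POTRF operation named in the benchmark chart (SPOTRF, DPOTRF and so on). As a reference for what that factorization computes, here is a minimal pure-Python sketch; it is not the cuSOLVER API, and `cholesky_lower` is an illustrative name.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with L * L^T == a for a symmetric
    positive-definite matrix given as a list of rows."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Dot product of the already computed parts of rows i and j.
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = cholesky_lower(A)
for row in L:
    print(row)
```

Once L is known, a system A x = b is solved with one forward and one backward triangular substitution, which is why the factorization dominates the cost.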
[Chart: cusolverSP Speedup over CPU, axis 0x to 20x, for matrices mhd4800b, ex33, Muu, gyro_m]
[Chart: cusolverDN Speedup over CPU, axis 0x to 8x, for SPOTRF, DPOTRF, CPOTRF, ZPOTRF with M=N=4096]
cuSOLVER 7.0, MKL 11.0.4, SuiteSparse 3.6.0
K40, i7-3930K CPU @ 3.20GHz
![Page 20: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/20.jpg)
CUDA7: HyperQ/MPI (MPS): Multiple GPUs per Node
[Diagram: four CUDA MPI ranks (Rank 0 to Rank 3) submit work through the MPS Server to GPU 0 and GPU 1]
MPS Server efficiently overlaps work from multiple ranks to each GPU
# Pin each local MPI rank to one GPU and the matching NUMA node
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
case ${lrank} in
[0]) export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=0 ./executable;;
[1]) export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 ./executable;;
[2]) export CUDA_VISIBLE_DEVICES=0; numactl --cpunodebind=0 ./executable;;
[3]) export CUDA_VISIBLE_DEVICES=1; numactl --cpunodebind=1 ./executable;;
esac
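The wrapper script above hard-codes four ranks on two GPUs. The same placement rule can be written as a round-robin mapping, sketched below. It assumes that GPU i is attached to NUMA node i, as the script implies; the function names and the default counts are illustrative and not part of any MPI or CUDA API.

```python
import os


def placement_for_local_rank(local_rank, num_gpus=2, num_numa_nodes=2):
    """Map a node-local MPI rank to (GPU index, NUMA node), round-robin."""
    gpu = local_rank % num_gpus
    # Assumption: GPU i hangs off NUMA node i.
    numa_node = gpu % num_numa_nodes
    return gpu, numa_node


def environment_for_local_rank(local_rank, num_gpus=2):
    """Copy of the current environment with CUDA_VISIBLE_DEVICES set."""
    gpu, _ = placement_for_local_rank(local_rank, num_gpus)
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return env


for rank in range(4):
    gpu, node = placement_for_local_rank(rank)
    print(f"local rank {rank}: CUDA_VISIBLE_DEVICES={gpu}, NUMA node {node}")
```

In a real launcher the local rank would be read from `OMPI_COMM_WORLD_LOCAL_RANK`, as in the shell script, and the returned environment passed to the child process.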
![Page 21: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/21.jpg)
OpenACC Timeline
2008 – PGI Accelerator Model (targeting NVIDIA GPUs)
2011 – OpenACC 1.0 (targeting NVIDIA GPUs, AMD GPUs): data regions, compute regions, gang/worker/vector
2013 – OpenACC 2.0: procedures, dynamic data lifetimes
2015 – OpenACC 2.5: minor fixes, additions
2015/16 – OpenACC 3.0: deep copy
http://on-demand.gputechconf.com/gtc/2015/presentation/S5382-Michael-Wolfe.pdf
http://on-demand.gputechconf.com/gtc/2015/video/S5382.html
![Page 22: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/22.jpg)
Vision: Mainstream Parallel Programming
• Enable more programmers to write parallel software
• Give programmers the choice of language to use
• Embrace and evolve key programming standards
C
![Page 23: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/23.jpg)
http://on-demand.gputechconf.com/gtc/2015/presentation/S5820-Mark-Harris.pdf
http://on-demand.gputechconf.com/gtc/2015/video/S5820B.html
![Page 24: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/24.jpg)
![Page 25: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/25.jpg)
Mixed Precision Computation
![Page 26: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/26.jpg)
Mixed Precision Computation
• Half precision (fp16) data type in addition to single (fp32) and double (fp64)
• fp16: half the bandwidth, twice the throughput
• Format: s1e5m10 (1 sign bit, 5 exponent bits, 10 mantissa bits)
• Range: ~6*10^-8 … 6*10^4 in magnitude, including denormals
• Limitations
– Limited precision: 11-bit mantissa (10 stored bits plus the implicit leading bit)
– Vector operations only: a 32-bit register holds 2 fp16 values
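The range and precision limits of the s1e5m10 format can be checked on the CPU with Python's standard `struct` module, whose `e` format code is IEEE 754 half precision. This is only a host-side illustration of the number format, not CUDA fp16 code; the helper names are illustrative.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fp16_from_bits(bits):
    """Interpret a 16-bit integer as a half-precision bit pattern."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


largest = fp16_from_bits(0x7BFF)            # largest finite value
smallest_normal = fp16_from_bits(0x0400)    # smallest normal value
smallest_denormal = fp16_from_bits(0x0001)  # smallest denormal value

print("largest finite:   ", largest)
print("smallest normal:  ", smallest_normal)
print("smallest denormal:", smallest_denormal)

# With an 11-bit significand, integers above 2048 are no longer all exact.
print("4097 stored as fp16:", to_fp16(4097.0))
print("0.1 stored as fp16: ", to_fp16(0.1))
```

The largest finite value (65504) and the smallest denormal (2^-24, about 6*10^-8) match the range quoted on the slide.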
![Page 27: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/27.jpg)
FP16 Support in CUDA
![Page 28: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/28.jpg)
Deep Learning using Deep Neural Networks
Image “Sara”
Today’s largest networks: ~10 layers, 1B parameters, 10M images, ~30 Exaflops, ~30 GPU days
http://devblogs.nvidia.com/parallelforall/accelerate-machine-learning-cudnn-deep-neural-network-library
NVIDIA cuDNN Library
Low-level Library of GPU-accelerated routines
Out-of-the-box speedup of Neural Networks
Developed and maintained by NVIDIA
First release focused on Convolutional Neural Networks
Already part of major open-source frameworks
Caffe, Torch, Theano
https://developer.nvidia.com/cuDNN
![Page 29: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/29.jpg)
DIGITS
Data Scientists & Researchers:
Quickly design the best deep neural network (DNN) for your data
Visually monitor DNN training quality in real-time
Manage training of many DNNs in parallel on multi-GPU systems
Interactive Deep Learning GPU Training System
developer.nvidia.com/digits
![Page 30: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/30.jpg)
Use Cases
• Image Classification, Object Detection, Localization
• Face Recognition
• Speech & Natural Language Processing
• Medical Imaging & Interpretation
• Seismic Imaging & Interpretation
• Recommendation
![Page 31: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/31.jpg)
NVIDIA DRIVE PX Auto-Pilot Platform
Complex scenes require Deep Learning-based object identification and classification
Two Tegra X1 processors
Up to twelve camera inputs can be processed by one Tegra X1 in real-time
![Page 32: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/32.jpg)
Cars that see better … and Learn
![Page 33: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/33.jpg)
US TO BUILD WORLD’S TWO FASTEST SUPERCOMPUTERS
Major Step Forward on the Path to Exascale
100-300 PFLOPS Peak Performance
IBM POWER CPU + NVIDIA Volta GPU
NVLink High Speed Interconnect
40 TFLOPS per Node, >3,400 Nodes
2017
SUMMIT SIERRA
![Page 34: NVIDIA GPU TECHNOLOGY UPDATE - Max Planck Society€¦ · GPU Boost Base Clock Workload # 1 Worst case Reference App 23 5W Boost Clock #1 Workload # 2 E.g. AMBER 23 5W Boost Clock](https://reader033.fdocuments.us/reader033/viewer/2022050306/5f6e53b4e4d59a3e58126a30/html5/thumbnails/34.jpg)
nvidia.qwiklabs.com: self-paced hands-on sessions that run on real GPUs in the cloud
Using IPython Notebook technology, lab instructions, editing and execution of code, and even interaction with visual tools are all woven together into a single web application.