S3467: Impact of GPU Virtualization on Higher Education
Didier Contis
College of Engineering / Georgia [email protected]
How all of this started…
Back in early 2007, we were trying to address the following issues:
• Student Computer Ownership policy did not address Engineering Applications Licensing problems.
• Despite growing enrollment, funding for computer labs is more difficult to get.
• 24 x 7 access to computer labs = physical security and support issues.
• Computer labs are inefficient (e.g. SPACE, power, cooling).
• Student population is increasingly mobile and geographically dispersed.
A picture is worth a thousand words…
Objective: Support Pedagogy and Delivery Modes
Provide elastic capabilities for
• Design
• Simulation
• Experimentation
Which are accessible on‐demand from
• Anywhere
• Anytime
• Any device (and we mean any)
The Vlab and Matrix Projects @GT
Application: VMware View, Citrix XenDesktop 5.6, Redhat VDI, Windows RDS
Hypervisor: VMware ESX 4.x, Microsoft Hyper-V 2008 R2, Microsoft Hyper-V 2012, Redhat KVM, Xen 6.0.2
Server: CoE Servers, CoA Servers, IAC Servers, CoS Servers, CoB Servers
Storage: EMC NS-120 (30TB), NetApp 3240 (76.8TB), EQL PS-6000E (10TB)
Network: 2 x Cisco Nexus 5596 + Nexus 2K Expanders
Total: 1312 cores, 7.15TB mem
AE / CEE / ME 1770
• Introduction to Engineering Graphics and Visualization
• Required class. Spring 2013 Semester: 12 sections / 40 students per section
• Course description: Introduction to engineering graphics and visualization including sketching, line drawing, and solid modeling. Development and interpretation of drawings and specifications for product realization.
• Course built around latest version of AutoCAD and Autodesk Inventor Professional
• Supported by two computer labs (40 seats each)
VDI and 3D CAD
1770 Course – Example of a team project
Design of the Atlantic Station Millennium Gate
http://www.thegateatlanta.com/
The Problem of Rendering with VDI
The Challenge early Spring 2012
Computer labs used by 1770 Course due for refresh
80 x Dell T3500 workstations to be replaced
What do we do?
• VDI or NOT ??
• If we go VDI, how do we implement virtual GPU?
• Need a solution operational on the 1st day of Fall 2012 Semester class: Monday August 20th !!!
“We couldn’t prove that it couldn’t be done… So we decided it could be.”
Tony Tamasi, NVIDIA Senior Vice President of Content and Strategy
Major risk but only realistic solution with August 20th 2012 as a target deadline.
Could it have been done differently…. Debatable
Lots of things would need to fall into place
Our solution: Win 2012 VDI & GRID K1
How we did it…
Microsoft VDI + RemoteFX Architecture
RemoteFX GPU Enabled Hyper-V nodes
Our Server Hardware Configuration
Dell R720
• 2 x E5‐2660 (16 cores – 95W)
• 192GB of memory
• 2 x 10GB NICs
• 2 x 1100W power supplies
Virtual GPU: NVIDIA GRID K1
GPU: 4 Kepler GPUs
CUDA cores: 768 (192 / GPU)
Memory Size: 16GB DDR3 (4GB / GPU)
Max Power: 130 W
Form Factor: Dual Slot ATX, 10.5”
Display IO: None
Aux power requirement: 6‐pin connector
PCIe: x16
PCIe Generation: Gen3 (Gen2 compatible)
Cooling solution: Passive
# users: 4 ‐ 100
OpenGL: 4.3
Microsoft DirectX: 11
VGX Hypervisor support: Yes
Hyper-V Configuration
Virtualizing GPUs
!!! Microsoft RDVH (Remote Desktop Virtualization Host) role enabled
Virtual Machine RemoteFX Configuration
RemoteFX 3D Adapter Configuration
RemoteFX VM Configuration
220MB
Theoretical density using one 1920x1200 screen per VM
# of VMs per K1 GPU: 18
# of VMs per K1 board: 72
Note: 72 VMs x an average of 2.5GB memory usage per VM = 180GB, which fits within the server's 192GB memory footprint.
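The density figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the 220MB value shown on the slide is the video memory reserved per VM for one 1920x1200 screen, and that the limiting factor is the 4GB of memory on each K1 GPU; the constant and function names are illustrative.

```python
# Inputs taken from the slides. Reading 220 MB as the per-VM video memory
# for a single 1920x1200 screen is an assumption.
GPU_MEMORY_MB = 4 * 1024   # 4GB per Kepler GPU on a GRID K1
GPUS_PER_BOARD = 4         # a K1 board carries 4 GPUs
VRAM_PER_VM_MB = 220
AVG_VM_RAM_GB = 2.5        # average memory usage per VM
SERVER_RAM_GB = 192        # Dell R720 memory


def theoretical_density():
    """Return (VMs per GPU, VMs per board, server RAM needed in GB)."""
    vms_per_gpu = GPU_MEMORY_MB // VRAM_PER_VM_MB   # whole VMs only
    vms_per_board = vms_per_gpu * GPUS_PER_BOARD
    ram_needed_gb = vms_per_board * AVG_VM_RAM_GB
    return vms_per_gpu, vms_per_board, ram_needed_gb


vms_per_gpu, vms_per_board, ram_needed_gb = theoretical_density()
print(vms_per_gpu, vms_per_board, ram_needed_gb)
print("fits in server memory:", ram_needed_gb <= SERVER_RAM_GB)
```

Changing the screen resolution or the number of monitors per VM changes the per-VM video memory and therefore the density, which is why the number is labeled theoretical.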
Microsoft VDI + RemoteFX Architecture
How everything ties together
Our Windows 8 VDI Collections
Civil Engineering Computer Lab – Before
40 Dell Precision T3500 + 19” Screens
Civil Engineering Computer Lab – After
40 Dell Wyse Z90D7 + 23.5” Screens
Customized WES7 Image with RDP 8.0 Client
Access via Microsoft Web UI
Lessons Learned
Students using Win8 on 1st day of class was a non‐event
There is a cost for being on the bleeding edge
Pre‐production hardware (1st GPU card lasted 17h)
Beta drivers
SAN Lun alignment problem == bad performance
Tracking post‐doc who saturates building uplink every day
“Sabotage” by our Friends of Central IT
Most of our problems were self-inflicted
How to monitor virtualized GPUs usage?
Strategy #1: nvidia‐smi tool and scripts
nvidia-smi -q --display=UTILIZATION,PERFORMANCE --loop=60 \
--filename=c:\Temp\NVIDIA_Log_2.txt
==============NVSMI LOG==============
Timestamp : Wed Mar 06 16:35:26 2013
Driver Version : 310.90
Attached GPUs : 4
GPU 0000:07:00.0
    Performance State : P0
    Clocks Throttle Reasons : N/A
    Utilization
        Gpu : 21 %
        Memory : 11 %
GPU 0000:08:00.0
    Performance State : P0
    Clocks Throttle Reasons : N/A
    Utilization
        Gpu : 61 %
        Memory : 33 %
[……]
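A log like the one above has to be turned into numbers before it can be charted. The scripts used for the talk are not shown, so the following is only a sketch of one way to do it, assuming each loop iteration starts with a Timestamp line and each GPU section starts with a line of the form "GPU <bus id>"; the function name parse_nvsmi_log is hypothetical.

```python
import re

# "GPU 0000:07:00.0" opens a per-GPU section (assumed layout).
GPU_HEADER = re.compile(r"^GPU\s+([0-9A-Fa-f:.]+)$")
# "Gpu : 21 %" and "Memory : 11 %" are the utilization values.
UTIL_LINE = re.compile(r"^(Gpu|Memory)\s*:\s*(\d+)\s*%$")


def parse_nvsmi_log(text):
    """Return one dict per sample: {'timestamp': str, 'gpus': {bus_id: {...}}}."""
    samples = []
    sample = None
    gpu = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Timestamp"):
            # A new Timestamp line starts a new sample.
            sample = {"timestamp": line.split(":", 1)[1].strip(), "gpus": {}}
            samples.append(sample)
            gpu = None
            continue
        header = GPU_HEADER.match(line)
        if header and sample is not None:
            gpu = sample["gpus"].setdefault(header.group(1), {})
            continue
        util = UTIL_LINE.match(line)
        if util and gpu is not None:
            gpu[util.group(1).lower()] = int(util.group(2))
    return samples


SAMPLE = """\
==============NVSMI LOG==============
Timestamp : Wed Mar 06 16:35:26 2013
Driver Version : 310.90
Attached GPUs : 4
GPU 0000:07:00.0
    Performance State : P0
    Utilization
        Gpu : 21 %
        Memory : 11 %
GPU 0000:08:00.0
    Performance State : P0
    Utilization
        Gpu : 61 %
        Memory : 33 %
"""

for entry in parse_nvsmi_log(SAMPLE):
    print(entry["timestamp"], entry["gpus"])
```

Other driver versions may print additional fields or a different layout, so the two regular expressions would need to be checked against a real log file first.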
Trying to visualize nvidia-smi results
[Chart: coe-hyperv401g, 3/6/2013. Cumulated usage in % from each GPU (GPU #1 to GPU #4); y-axis 0 to 300, x-axis time of day from 8:00:59 to 17:55:00.]
[Chart: coe-hyperv402g, 3/6/2013. Cumulated usage in % from each GPU (GPU #1 to GPU #4); y-axis 0 to 350, x-axis time of day from 8:00:25 to 17:54:26.]
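The "cumulated usage" plotted for each host is the sum of the four per-GPU utilization percentages at each sample, so a K1 board tops out at 400%. A minimal sketch of that calculation follows; the function names are hypothetical and the sample values are made up for illustration, not measurements from the talk.

```python
def cumulated_usage(samples):
    """samples: list of (time_label, [gpu1 %, gpu2 %, ...]) -> list of (label, total %)."""
    return [(label, sum(per_gpu)) for label, per_gpu in samples]


def busiest_sample(samples):
    """Return the (label, total %) pair with the highest cumulated usage."""
    return max(cumulated_usage(samples), key=lambda item: item[1])


# Illustrative values only.
samples = [
    ("12:55", [20, 35, 10, 5]),
    ("12:56", [60, 40, 30, 25]),
    ("12:57", [15, 10, 5, 0]),
]

print(cumulated_usage(samples))
print(busiest_sample(samples))
```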
A closer look: visualizing nvidia-smi results
[Chart: coe-hyperv401g, March 6th 2013, 12:55 to 14:00, 1 minute sampling. Usage of GPU #1 to GPU #4; y-axis 0 to 300.]
How to monitor GPUs and VMs ? Strategy #2: Microsoft Server 2012 Perfmon tool
Trying to visualize 12h of Perfmon data
157MB of data for 12 hours !!!
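With that much data, averaging the counters into coarser time buckets makes a 12-hour chart manageable. The sketch below is not the approach used in the talk. It assumes the Perfmon log has been exported to CSV, with the timestamp in the first column in the form MM/DD/YYYY HH:MM:SS.mmm and one counter per remaining column; the function name downsample and the counter name in the example are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def downsample(csv_text, bucket_minutes=10):
    """Average each counter per time bucket (bucket_minutes should divide 60)."""
    reader = csv.reader(io.StringIO(csv_text))
    counters = next(reader)[1:]          # first column is the timestamp
    totals = defaultdict(lambda: [0.0] * len(counters))
    counts = defaultdict(lambda: [0] * len(counters))
    for row in reader:
        if not row:
            continue
        stamp = datetime.strptime(row[0], "%m/%d/%Y %H:%M:%S.%f")
        bucket = stamp.replace(
            minute=stamp.minute - stamp.minute % bucket_minutes,
            second=0,
            microsecond=0,
        )
        for i, cell in enumerate(row[1:len(counters) + 1]):
            try:
                value = float(cell)
            except ValueError:
                continue                 # blank cell: no sample for this counter
            totals[bucket][i] += value
            counts[bucket][i] += 1
    return {
        bucket: [t / c if c else None for t, c in zip(totals[bucket], counts[bucket])]
        for bucket in sorted(totals)
    }


# Hypothetical export with a single made-up counter.
EXAMPLE = r'''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\HOST\Hypothetical Counter"
"03/06/2013 08:00:25.000","10"
"03/06/2013 08:05:25.000","20"
"03/06/2013 08:11:25.000"," "
"03/06/2013 08:12:25.000","40"
'''

for bucket, averages in downsample(EXAMPLE).items():
    print(bucket, averages)
```

Selecting only the counters of interest before export would shrink the data further than bucketing alone.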
A closer look at one Perfmon RemoteFX value
Let’s focus on the TDR timeouts from coe‐hyperv401g – March 6th 2013
No TDR timeout – RemoteFX Root GPU
It is a good thing…..
TDR: Timeout Detection and Recovery
Detects when the GPU stops responding. If necessary, tries to fix it via a re‐initialization, avoiding the need for reboots.
GPU… Potential single point of failure?
What happens when the host GPU card or driver crashes....
HOST DRIVER CRASH
ONE OF THE STUDENT VMs
Not all applications are created equal…
Performance Tuner Results Log
Version: 19.0.2.0
Date of Last Tune: 1/7/2013

Machine Configuration
---------------------
Processor Speed : 2.2 GHz
RAM : 1660 MB

3D Device
---------
Name : Microsoft RemoteFX Graphics Device - WDDM
Manufacturer : Microsoft
Chip set : RemoteFX Graphics Device - WDDM
Memory : 191 MB
Driver : 6.2.9200.16384

Your machine contains a 3D Device that is not certified.
[…]
Current application driver: Software
AutoCAD 2013 SP 1.1
Autodesk Inventor 2013 SP 1.1
“Who you gonna call” when your CAD apps crash
• Inventor 2013 SP1.1 Update 1
• Windows 8 64bit patched
• Windows 2012 patched
• NVIDIA WHQL GRID 310.90 driver
• Production GRID K1 board
• Dell R720 with BIOS 1.4.8
Problematic CAD Software certification in a Virtualized Environment…
Final Thoughts and Future Work
Does the technology work? Yes !!!
• Technology is maturing quickly. Expect things to move very quickly in the next 6 to 12 months.
Virtual GPU…. Is it a “Game Changer” for VDI ??
• Not exactly. But it does satisfy a BIG need we had.
We DO need more (management) integration between Virtual GPU and Hypervisors.
• Better dashboard / monitoring from the Hypervisor
• Session load balancing / VM placement based on GPU usage.
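To make the last wish concrete, here is a sketch of what GPU-aware VM placement could look like. It does not describe an existing hypervisor feature or anything deployed for this project. It assumes a hypothetical inventory listing each GPU's current utilization and VM count, and it reuses the theoretical limit of 18 VMs per K1 GPU quoted earlier; the utilization and VM counts are illustrative.

```python
def pick_gpu(gpus, max_vms_per_gpu=18):
    """Return the least-utilized GPU that still has a free VM slot, or None."""
    candidates = [g for g in gpus if g["vms"] < max_vms_per_gpu]
    if not candidates:
        return None
    # Prefer low utilization, then fewer VMs as a tie-breaker.
    return min(candidates, key=lambda g: (g["utilization"], g["vms"]))


# Hypothetical inventory; values are made up for illustration.
gpus = [
    {"host": "coe-hyperv401g", "gpu": 1, "utilization": 61, "vms": 12},
    {"host": "coe-hyperv401g", "gpu": 2, "utilization": 21, "vms": 18},
    {"host": "coe-hyperv402g", "gpu": 1, "utilization": 33, "vms": 9},
]

choice = pick_gpu(gpus)
print(choice["host"], choice["gpu"])
```

In this example the GPU at 21% is skipped because it is already at the VM limit, so the new session lands on the next least-utilized GPU.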
What have we learned?
Mixing hardware
Dell R720 + GRID cards (K1 / K2) == building block
K1 for 80% of workload needs / K2 for remaining 20%
Future direction
K1 K2+
Mixing Technologies
Can run different hypervisor / VDI solution / App publishing on top of the brick.
Testing XenDesktop.Next / XenApp.Next / RemoteFX.Next…
Future direction
Backup Slides
Thin Client Challenges
• Why WES7 and not ThinOS?
• Active Directory not feasible for these devices
• Secure, limited access by local user account
• Access to Citrix and RemoteFX (W8) Pools
– WES7 with RDC 6.2.9200 update
• Reduced workload for local support
• Central management of client image
– Wyse Device Manager
Creating the W8 Master and Collection
• Create Base OS Image
• Provide Departmental RDP Access to Master
• Departmental Software Install and Testing
• Save copy of Virtual Hard Drive
• Sysprep
• Create Collection with Virtual GPU and User Profile Disks enabled
• Apply GPO
sysprep -generalize -oobe -shutdown -mode:vm
Collection Properties
CEE W8 Desktop