SYN 208: Power HDX 3D Applications with Intel and NVIDIA GPUs.

© 2016 Citrix | Confidential

Image from Shutterstock, http://www.shutterstock.com/pic.mhtml?id=395520979&src=lb‐44840047, Standard license. Purchased by David Cottingham.


Audience participation required.

Which of these is NVIDIA’s latest GRID card?

What percentage of XD 7.6 do you think is hosted on XS?

How many vGPUs does XS 6.5 support per host?


(In accordance with https://about.twitter.com/press/brand‐assets.)


Let’s travel back in time…


Citrix has more experience in 3D graphics remoting than any other vendor in our space. Back in 2006 our K2 technology was developed for Boeing, enabling Dassault CATIA to be delivered to Dreamliner (787) design engineers around the world.


After the introduction of XenDesktop, Citrix brought GPU‐accelerated 3D graphics remoting to General Availability in 2009 with the introduction of XenDesktop HDX 3D Pro. At that time, the solution required a blade workstation for each user. In 2010, Citrix introduced high performance GPU Sharing for DirectX based applications, driving down cost per user. 2011 saw the introduction of the first GPU Passthrough technology to the market as part of XenServer 6.0. This allowed customers to install multiple GPUs on the server, again bringing down the cost per user. And in 2012 we introduced XenDesktop 5.6 Feature Pack 1 which was the first product to leverage NVIDIA’s VGX API (rebranded as GRID in 2013) for direct GPU frame buffer access, resulting in an even more responsive user experience. 2012 also saw improvements to our H.264‐based Deep Compression technology for delivering 3D graphics over bandwidth‐constrained WAN connections. And 2013 was the year of high density, high performance GPU sharing for OpenGL and DirectX. In 2014 we saw a steady stream of new hardware designed to take advantage of high performance GPU sharing, including new actively cooled cards from NVIDIA and many new server and data center workstation platforms. And in 2015 VMware added vGPU to vSphere/ESX, which is also fully supported by HDX 3D Pro.


Not really…

Yes, there are applications at the top which need a high‐performance vGPU profile (or even an entire GPU passed‐through to them).

But there is a good crop of power users who run certain applications that need significant GPU power (otherwise the work falls back to the CPU).

And there are _lots_ of task workers who use Office, web browsers (e.g. HTML5 applications), and view increasingly complex PDFs. Yes, everything can be done by the CPU, but at that point desktop density (and user experience) will drop. This is where high‐density vGPU comes into its own. A small slice of GPU goes a long way.

Case in point: try PowerPoint with transitions or diagrams that involve 3‐D and you’ll experience it for yourself. Users who have tried a vGPU‐enabled desktop don’t want to return to the “old world”!


Image from Shutterstock, http://www.shutterstock.com/pic.mhtml?id=370508654&src=lb‐44840047, Standard license. Purchased by David Cottingham.


From an administration and TCO perspective, moving from individually‐managed full size desktops to thin clients accessing virtual desktops that are hosted in a datacentre makes huge amounts of sense.

However, a physical desktop will have a physical GPU onboard.

Moving to a virtual desktop with no GPU acceleration will provide an inferior experience to one that does. This is increasingly the case as more and more applications assume the presence of a GPU (as described previously).

Note that it’s not just user experience that will suffer without a GPU: applications will attempt to compensate by burning CPU resources instead. This then reduces user density, thus increasing TCO.


Many high‐end designers have full workstations underneath their desks, dedicated to their use.

This is clearly inefficient, as for half the day (at least) that workstation is unlikely to be in use.

Hosting these machines in a datacentre not only increases efficiency, but also centralises data for security and ease of cross‐geographical working. It also means that many other devices can be used to access the data, e.g. viewing a 3‐D model of a construction site on an iPad.


Prior to XD‐XA 7.8, HDX 3D Pro required Enterprise or Platinum edition; now these features are supported across all editions.

Blade/rack workstations are ideal, but any form factor can be used for the host.


Both modes also support vGPUs. Standard Mode supports NVIDIA GRID vGPU, Intel GVT‐g and Microsoft RemoteFX vGPU (used with RDP). 3D Pro Mode supports NVIDIA GRID vGPU and Intel GVT‐g.

GPU acceleration for windowed DX 10/11/12 apps running on Windows 10 was introduced in XD 7.8.

Since 7.8, HDX 3D Pro is included in all product editions.


4:2:0 refers to a method of encoding images where less resolution is used for chroma information than for luma information, taking advantage of the human eye’s lower acuity for color differences than for luminance. 4:2:0 uses 33% less bandwidth than 4:4:4 (RGB) but images containing lines and saturated colors will have significant artifacts.

To compensate for the lower visual quality of 4:2:0 compared to 4:4:4, an option is available to overlay text using lossless compression. This comes at the cost of increased CPU consumption on the host.

With 4:4:4, there is no chroma subsampling, so visual quality is excellent, but this comes at the cost of about 50% increased bandwidth consumption.
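
A rough way to see where the saving comes from is to count raw samples per pixel before any compression. The sketch below is illustrative only: the function names are made up for this example, and it counts uncompressed samples, whereas the 33% and 50% figures above describe delivered bandwidth, which need not track the raw ratio.

```python
def samples_per_pixel(j: int, a: int, b: int) -> float:
    """Average raw samples per pixel for J:a:b chroma subsampling.

    J:a:b describes a block J pixels wide and 2 rows high, with `a` chroma
    sample positions in the first row and `b` additional ones in the second.
    """
    pixels = 2 * j           # pixels in the J x 2 reference block
    luma = pixels            # luma is never subsampled: one sample per pixel
    chroma = 2 * (a + b)     # two chroma planes (Cb and Cr)
    return (luma + chroma) / pixels


def increase_for_saving(saving: float) -> float:
    """If scheme A uses `saving` less than scheme B, how much more does B use than A?"""
    return 1 / (1 - saving) - 1


full = samples_per_pixel(4, 4, 4)    # 3.0 samples per pixel
sub = samples_per_pixel(4, 2, 0)     # 1.5 samples per pixel
print(f"raw saving of 4:2:0 vs 4:4:4: {1 - sub / full:.0%}")
print(f"a 33% saving is a {increase_for_saving(0.33):.0%} increase the other way")
```

The second function only shows that "33% less" and "about 50% more" are two views of the same ratio.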


7.6 FP3 introduced Interactive Mode for a superior experience on WAN connections. Available on both XenApp and XenDesktop. However, special key combinations (shortcuts) may not be available on XenApp since the standard 3Dconnexion driver was not designed for Windows Server RDS.

A fix is available on XenDesktop to address an issue with some key mappings, when using 3Dconnexion’s most recent driver (new driver model).

HDX 3D Pro includes USB redirection support for special purpose peripherals used by designers and engineers such as the 3D Space Mouse. The USB redirection virtual channel can be prioritized to receive maximum responsiveness. Zero is the recommended priority. In addition, using QoS policies in HDX, you can prioritize HDX traffic classes such as real time, interactive and display, bulk, and background traffic. We recommend giving interactive and display traffic a higher priority. Note that with XenApp, Generic USB Redirection for specialty devices requires Windows Server 2012 R2. WAN optimizations were introduced in 7.6 FP3 (September 2015).


The 100‐150 Kbps per user bandwidth guidance is based on typical office workloads, and compares to 100 Kbps for Thinwire (non‐video); see https://www.citrix.com/blogs/2014/04/16/from‐the‐field‐xendesktopxenapp‐bandwidth‐update/

3D professional graphics and server‐rendered video workloads require more bandwidth, similar to Thinwire H.264

Framehawk puts user experience ahead of bandwidth frugality
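
For sizing a WAN link, the 100 to 150 Kbps per-user guidance above can be scaled up with simple arithmetic. This is a minimal sketch, assuming every user is active at once and 1 Mbps = 1000 Kbps; the function name and the 200-user figure are hypothetical, and 3D or video workloads would need more.

```python
def aggregate_mbps(users: int, low_kbps: float = 100, high_kbps: float = 150) -> tuple[float, float]:
    """Return the (low, high) aggregate bandwidth in Mbps for `users` concurrent sessions."""
    return users * low_kbps / 1000, users * high_kbps / 1000


low, high = aggregate_mbps(200)   # 200 concurrent users is a made-up example
print(f"200 office users: {low:.0f} to {high:.0f} Mbps")
```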


Real estate investment trust

• “Working in XenDesktop over WAN and working at 35% packet loss! Welcome to the future!”

• “I love how @citrix #framehawk handles scrolling and redraws under adverse conditions.”


We’ve been doing graphics in virtual desktops for a while…

Announced at Synergy 2013, ran a private beta, then a public beta that year.

Released in early 2014.


Poets are able to take advantage of this, hence why shouldn’t we? ;‐).


When I draw one of these, I really mean one of these.

I call it “simplification”. You might call it “not very good at art” ;‐).


This diagram shows how vGPU on XenServer is architected, i.e. the complicated version!

At the bottom of the slide is a physical NVIDIA Kepler GPU. On GRID K1 and GRID K2 cards, there are multiple such physical GPUs on each card.

An NVIDIA driver in XenServer’s control domain (Dom0) is responsible for partitioning up the physical GPU(s).

The control domain presents hardware to the guest through what are known as device emulators (which are then backed by physical hardware slices).

The NVIDIA driver within the guest VM can then communicate with the host driver in Dom0.

This allows graphics to be presented both over the VNC console (to be viewed, e.g., through XenCenter) and as an HDX 3D Pro stream.


The Maxwell generation of cards has increased support from 96 vGPUs per host to 128 vGPUs per host.

This means you can obtain significant user densities on a single host, with each user having a slice of a real GPU to enhance their experience.


These GPUs are very powerful, and hence power‐hungry. For the M60, you are likely to need a power supply upgrade in your chosen server.


This slide depicts the different sizes of “slice” that a physical NVIDIA GRID GPU can be carved into.

Depending on the use case, different resolutions/numbers of heads may be needed.

NVIDIA and Citrix will work to add further slice sizes over time (on current and future hardware).


With NVIDIA GRID, there is a cost for the physical hardware, and also a yearly cost for the support and software maintenance.


The NVIDIA prices for software and support, updates, and maintenance (SUMS) are per concurrent user, and depend on the use case.

This is irrespective of whether you are using vGPU, GPU pass‐through, or on bare metal (no hypervisor at all).

Pricing can be an annual subscription, or an up‐front perpetual license and 1 year of SUMS, followed by a yearly SUMS payment.

Prices taken from NVIDIA sales material on 05/05/2016.


On XS 6.5 SP1, we support NVIDIA GPU pass‐through for Linux guests, and on XS 7.0, we support NVIDIA vGPU for them too!

Note that use with Linux requires an NVIDIA Virtual Workstation license.


This version (1.2) is already released


This is a new release targeted for the end of Q2 2016 (release around Synergy)

Additional click down information

• Client Drive Mapping (CDM) – Already approved in the previous deck

• CDM is client drive mapping. This feature is already available on Windows VDA. It enables access to client side drives inside the XA/XD session enabling data transfer between client device and XA/XD desktop/app

• This is a VDA only feature

Install directory change

• This feature is about changing the install directory location of Linux VDA to a location which is a standard for Citrix products and also does not create conflicts with user install locations


CentOS 6.6, 7 support (Already approved in the previous deck)

• CentOS is a Linux distribution almost the same as RHEL, which we already support. This feature is to support Linux VDA on CentOS as well. This is already approved in the previous deck


Tech preview

• HDX 3D Pro for RHEL 6.7, CentOS 6.7, 7.2: This is to support more Linux distributions with HDX 3D Pro for Linux, but as a tech preview capability in Project Deira

• vGPU support: This is to support vGPU (virtual GPU support for graphics acceleration) with HDX 3D Pro for Linux, but as a tech preview capability in Project Deira

• These are VDA only features, except vGPU, which requires support in the hypervisor. VMware vSphere already supports it; support for vGPU on XenServer is coming in Q2 and is already available as a tech preview.


No release date or project is identified for these features

Additional click down details

Provisioning support

• Provisioning support is support for PVS or MCS technologies for provisioning Linux virtual machines in XenApp and XenDesktop architecture. We are working towards supporting PVS only and for select distributions only to start with.

• This feature is already supported for Windows VDA and it is a separate component from the VDA.

App publishing

• Seamless apps

• This is already approved in the previous approved roadmap deck.

Direct SSL

• Direct SSL is about making the SSL connection from Receiver to the Linux VDA. This is required for higher security.


Policy management


Ubuntu support



Which company do you think this chap is from: Citrix or HP?

He actually works for neither. This is Frank Socqui from Intel. He’s holding up an Intel CPU with Iris Pro graphics, at Synergy 2014.

XenServer 6.5 SP1 was the first hypervisor in the industry to support Intel Iris Pro GPU pass‐through (GVT‐d) in May 2015. XenServer 7.0 is now the only hypervisor in the industry to support Intel Iris Pro GPU virtualisation (GVT‐g).


Architecture is broadly similar to that of NVIDIA vGPU.

Note that the video RAM for the GPU is “stolen” from system RAM, unlike the dedicated video memory found on NVIDIA cards.

The in‐guest driver is the standard Intel GPU driver, as compared to a specific NVIDIA vGPU driver (though note that the latter is Quadro‐certified, i.e. it behaves as a standard Quadro driver would be expected to).


Clearly this will change in the future.


Iris Pro technology is included in some Haswell CPUs today, but Broadwell CPUs are the first where GVT‐g will be officially supported.

It’s available in machines such as this Gigabyte Brix box, HP’s Moonshot ProLiant M710 cartridge, and forthcoming Cisco servers.

See all of these in action at the booths on the expo floor.


As the GPU is onboard the CPU, the additional power budget is zero. This has the advantage that no additional power supply is needed, but the downside that the GPU shares the CPU’s thermal design power (TDP) envelope. In other words, the power available to the (onboard) GPU is limited by how much the CPU is being used.


Integrated GPU must be enabled in the BIOS.

The Intel GPU BAR size affects the maximum number of virtual GPUs. The GPU BAR size is usually configurable in the BIOS and is often referred to as the “Aperture Size”. This needs to be set to 1024MB to support 7 VMs; do not set it any larger than 1024MB.

Host must have a C226 chipset in order for integrated GPU to be enabled.

DVMT settings don’t need to be configured/modified.


Intel GVT‐g’s only cost is the physical CPU in the server, i.e. no add‐on costs, and no recurring ones either.


For VDI, clearly you can achieve much higher density per host (128 users) with NVIDIA GRID than with Intel GVT‐g (7 users today).

However, the cost of a host that has capacity for two GRID cards is significant, whereas a Xeon E3‐based server is likely to be very easy to obtain.

What are your constraints as regards number of failure domains (i.e. if one host goes down, how many users can you afford to lose)?
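
The density versus failure-domain trade-off can be put into numbers. This is a rough sketch using the per-host densities quoted above (128 for NVIDIA GRID, 7 for Intel GVT‐g); the 500-user deployment and the function names are hypothetical, and it ignores spare capacity and any other sizing limits such as CPU or RAM.

```python
import math


def hosts_needed(users: int, users_per_host: int) -> int:
    """Minimum number of hosts to seat `users` at the given density."""
    return math.ceil(users / users_per_host)


def share_lost_per_host(users: int, users_per_host: int) -> float:
    """Fraction of all users affected if one fully loaded host fails."""
    return min(users, users_per_host) / users


USERS = 500  # hypothetical deployment size, not from the session
for label, density in (("NVIDIA GRID", 128), ("Intel GVT-g", 7)):
    hosts = hosts_needed(USERS, density)
    lost = share_lost_per_host(USERS, density)
    print(f"{label}: {hosts} hosts, up to {lost:.1%} of users lost per host failure")
```

Fewer, denser hosts mean less hardware to manage but a larger share of users affected by each outage.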


There are case studies that you can download today on all of these organisations deploying NVIDIA vGPU. And there are plenty of others who we haven’t had a chance to write studies on yet!

• http://www.citrix.com/go/customer/customers.html

• http://international.download.nvidia.com/pdf/grid/resources/nvidia‐grid‐case‐study‐daewooshipbuilding.pdf

• http://international.download.nvidia.com/pdf/grid/resources/nvidia‐grid‐case‐study‐peugeot‐citroen.pdf

• http://international.download.nvidia.com/pdf/grid/resources/nvidia‐grid‐case‐study‐rogerwilliams.pdf

• http://www.citrix.com/customers/university‐of‐sao‐paulo‐en.html and http://www.hpcadvisorycouncil.com/events/2014/brazil‐workshop/preso/12_USP.pdf

• For others, see https://virtuallyvisual.wordpress.com/useful‐links/remote‐graphics‐case‐studies/ .


Video: http://on‐demand.gputechconf.com/gtc/2015/video/S5625.html

200 users in China

Customer contact: Alain Gonzalez

See also GTC 2015 session S5625


Video of GTC seminar by Jeff Retey: http://nvidia.fullviewmedia.com/gtc2014/S4735.html

Gulfstream’s HDX 3D Pro implementation began with SAP 3D Virtual Enterprise (formerly known as Right Hemisphere) and evolved to full desktop replacement.

Used even across the Atlantic (between America and the UK).

iPad users are able to use their iPhone as a trackpad for the mouse, thanks to Citrix technology.

Platform as of Q1 2014: HP SL250 blades with NVIDIA GRID K2 cards


Video: https://youtu.be/QRlod9cPNHk


Using vSphere Desktop Edition.

VMware vSphere Desktop (100 VM Pack) $8,000, plus VMware vSphere Desktop (100 VM Pack) Production Support/Subscription, 3 Years $5,280

Plus 2 vCenter licenses at $17,484

A 7,000 seat solution over three years would cost $947,084

A 5,000 seat solution over three years would cost $681,484

This brings the three-year cost to $136.30 per desktop for the 5,000 seat solution.
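
The totals above can be reproduced with simple arithmetic. The sketch below assumes that the $17,484 covers both vCenter licenses together and that seats are bought in whole 100 VM packs; those readings are assumptions, chosen because they match the quoted totals, and the names are made up for this example.

```python
import math

PACK_SIZE = 100             # VMs per vSphere Desktop pack
PACK_LICENSE = 8_000        # USD per 100 VM pack
PACK_SUPPORT_3YR = 5_280    # USD, 3 years of Production Support/Subscription per pack
VCENTER_TOTAL = 17_484      # USD, assumed to cover both vCenter licenses


def three_year_cost(seats: int) -> int:
    """Three-year hypervisor licensing cost in USD for the given number of seats."""
    packs = math.ceil(seats / PACK_SIZE)
    return packs * (PACK_LICENSE + PACK_SUPPORT_3YR) + VCENTER_TOTAL


for seats in (7_000, 5_000):
    total = three_year_cost(seats)
    print(f"{seats} seats: ${total:,} total, ${total / seats:.2f} per desktop")
```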


Photo copyright Citrix Systems, from https://www.flickr.com/photos/32283893@N02/17807980856/ .
