VMworld 2013: A Technical Deep Dive on VMware Horizon View 5.2 Performance and Best Practices
Transcript of VMworld 2013: A Technical Deep Dive on VMware Horizon View 5.2 Performance and Best Practices
A Technical Deep Dive on VMware Horizon View 5.2
Performance and Best Practices
Banit Agrawal, VMware
Warren Ponder, VMware
EUC5706
#EUC5706
2
Session Outline
Introduction
View 5.2 Shared Graphics (3D) Feature and Performance
VMware View 5.2 Enhancements and Performance Results
• PCoIP protocol Improvements
• Windows 8 performance
• SE-Sparse Performance
Performance Tuning and Best Practices
Conclusion
3
Virtual Desktop User Segmentation
User segments, from Light Users (Fewer Applications) to Heavy Users (Many Applications)
• Task Worker: Basic data entry/usage is central to work
• Productivity / Knowledge Worker: Standard productivity tools are central to work
• Desktop Power User: Some compute intensive apps, require 3D graphics performance
• Workstation Users: Workstation class performance for compute with dedicated graphics
Diagram axes: Image Quality, Interactivity, Cost/Seat, 2D / 3D, -- VRAM to ++ VRAM
Graphics options
• Soft 3D: Software Rendered
• vSGA: Virtualized 3D Hardware Graphics Resources (Accelerated 3D)
• vDGA: GPU PCI Passthrough, Native Driver, Heavy Users (Accelerated 3D)
4
Soft 3D – Basic 3D without GPU
Software renderer provides 3D to productivity apps
Overview
• Basic 3D graphics capabilities for productivity workers
• Targeted at Task and Knowledge Workers who need AERO or applications that require basic 3D graphics
Benefits
• Supports DirectX 9 and OpenGL 2.1 apps
• No physical GPU required
• Lower initial VDI CAPEX
• No client side dependencies
5
vSGA - Shared 3D Graphics Among Multiple Virtual Machines
Run rich 3D applications with higher consolidation
Overview
• Enables shared access to physical graphics cards for 3D and high performance graphical workloads.
• Desktops use VMware SVGA device for maximum virtual machine compatibility & portability.
• Cost effective with multiple VMs sharing single graphics card for maximum benefit
Benefits
• Enable workstation class use cases
• Reduce Cost - with multiple VMs sharing 3D graphics cards
• Compatible with key platform features such as vMotion, DRS
• Support for mixing physical host clusters with and without physical GPUs
6
vDGA – Direct Passthrough to a Specific Virtual Machine
Full workstation class user experience
Overview
• Enables dedicated access to physical GPU hardware for 3D and high performance graphical workloads.
• Uses native NVIDIA drivers
• CUDA available to virtual machine
• Best for super high performance needs like manufacturing, oil & gas
Benefits
• Full capabilities of physical GPUs
• True workstation replacement option
7
Tracking vSGA Performance
On a vSphere host, you can execute the commands below to track
system/GPU performance
System Performance (Run “esxtop”)
GPU Stats (Run “nvidia-smi -l”)
*More details can be found in the View 5.2 vSGA performance whitepaper:
http://www.vmware.com/files/pdf/view/vmware-horizon-view-hardware-accelerated-3Dgraphics-performance-study.pdf
8
vSGA Configuration Best Practices
Virtual Machine Hardware
• Latest Virtual Machine Hardware Setting
• Configure VMs to use VMXNET3 NICs
In Guest Virtual Machine Settings
• Throttle the application frame rate to match the configured PCoIP frame rate.
• This configuration is achieved by using the following registry setting
(REG_DWORD):
HKLM\SOFTWARE\VMware, Inc.\VMware SVGA DevTap\MaxAppFrameRate
• Setting this registry entry has been found to significantly improve performance and
consolidation ratios
• Consider disabling PCoIP’s build-to-lossless mode
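The registry setting above can be sketched as a small helper that produces .reg file text for import on the guest. This is a minimal sketch, not a VMware tool: the function name and the example value of 30 are assumptions for illustration, and only the key path and value name come from the slide.

```python
# Key path and value name are taken from the slide; everything else is illustrative.
KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap"


def max_app_frame_rate_reg(fps: int) -> str:
    """Return .reg file text that sets MaxAppFrameRate as a REG_DWORD."""
    if not 0 < fps <= 0xFFFFFFFF:
        raise ValueError("fps must be a positive 32-bit value")
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[{KEY}]\n"
        # .reg files store DWORD values as 8 hexadecimal digits
        f'"MaxAppFrameRate"=dword:{fps:08x}\n'
    )


if __name__ == "__main__":
    # Assumed example: match a PCoIP maximum frame rate of 30
    print(max_app_frame_rate_reg(30))
```

Generating the text instead of writing the registry directly keeps the sketch runnable on any platform; the output can be reviewed before it is applied to a desktop image.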
9
vSGA Experimental Setup
Desktop VMs Client VMs or Users
10
vSGA Workload Testing: Light 3D workload
Composed of common desktop applications
• View Planner: Office 2010, Adobe Reader, 720p video, IE9 displaying a web
album
• Google Earth
Aero Enabled
Screen Resolution: 1600 x 1200
Represents a use-case scenario typical of a knowledge worker
11
vSGA Performance: Light 3D Workload
• CPU was getting bottlenecked first while peak GPU utilization was around 20%
• 112 VMs ran light 3D workload with good response time
12
vSGA Workloads: Interactive 3D UE benchmark
Composed of common 3D and Interactive operations
• Some simple 3D rendering operations
• Dragging
• Scrolling
• Windows Maximize and Minimize
Screen Resolution: 1600 x 1200
User Experience or responsiveness metric based on frame arrival
and inter-frame delay
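The slide does not define the responsiveness metric beyond frame arrival and inter-frame delay. As a rough illustration only, the sketch below derives frame rate and inter-frame delay statistics from a list of frame arrival timestamps; the function name and the choice of statistics are assumptions, not the benchmark's actual formula.

```python
from statistics import mean


def frame_stats(arrival_times_s):
    """Summarize frame arrivals (seconds, ascending) as fps and inter-frame delays."""
    if len(arrival_times_s) < 2:
        raise ValueError("need at least two frame arrivals")
    # Delay between each pair of consecutive frames
    delays = [b - a for a, b in zip(arrival_times_s, arrival_times_s[1:])]
    duration = arrival_times_s[-1] - arrival_times_s[0]
    return {
        "fps": len(delays) / duration,
        "mean_delay_s": mean(delays),
        "max_delay_s": max(delays),
    }


if __name__ == "__main__":
    # Hypothetical arrival times for four frames
    print(frame_stats([0.0, 0.5, 1.0, 2.0]))
```

A long maximum delay with an acceptable average frame rate would point to stutter, which is why both values are reported.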
13
vSGA Performance: UE Benchmark
• Using hardware accelerated 3D improves responsiveness in comparison with a software solution, even at lower consolidation ratios, where CPU is not exhausted.
• Adding GPUs to an existing software-renderer solution enables the VM consolidation ratio to be almost doubled while maintaining user experience.
14
vSGA Workloads: Light CAD Workload
Composed of some common apps and CAD viewer
• View Planner: Office 2010, Adobe Reader, 720p video, IE9 displaying a web
album
• SolidWorks CAD Viewer with these models
Response metric: 95% response time and FPS
15
vSGA Performance: Light CAD Workload
• Could scale to 64 VMs without reaching the threshold
• CPU utilization below 100% at 64 VMs shows that the View Planner threshold can be crossed without the CPU being pegged at 100%
16
vSGA Workloads: Complex CAD Workload
Solid Edge Viewer ran in isolation
• A 3-1 reducer model was used
Response metric: Remoted Frames per sec (FPS)
17
vSGA Performance: Complex CAD workload
• 30 VMs scale well and remain within the 80% threshold of the normalized best-case frame rate
• The range above each bar shows the FPS variation across VMs. The narrow range suggests that CPU/GPU resources are distributed fairly among the VMs, with little variance.
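The 80% threshold check can be sketched as follows. This is a minimal illustration assuming per-VM remoted FPS samples and a best-case frame rate are already available; the function name and the sample values are hypothetical and are not measurements from the study.

```python
def within_threshold(vm_fps, best_case_fps, threshold=0.80):
    """Normalize each VM's FPS to the best case and check it against the threshold."""
    if best_case_fps <= 0:
        raise ValueError("best_case_fps must be positive")
    normalized = [fps / best_case_fps for fps in vm_fps]
    return all(n >= threshold for n in normalized), normalized


if __name__ == "__main__":
    # Hypothetical per-VM frame rates against a hypothetical best case of 30 FPS
    ok, normalized = within_threshold([26, 27, 25], best_case_fps=30)
    print(ok, [round(n, 2) for n in normalized])
```

The spread of the normalized values corresponds to the range shown above each bar: a narrow spread means the VMs are served evenly.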
18
Session Outline
Introduction
View 5.2 Shared Graphics (3D) Feature and Performance
VMware View 5.2 Enhancements and Performance Results
• PCoIP protocol Improvements
• Windows 8 performance
• SE-Sparse Performance
Performance Tuning and Best Practices
Conclusion
19
PCoIP Protocol Performance Improvements
Efficient client-side caching to improve bandwidth usage
Overview
• Improved client side caching with new compression techniques
• Improved cache handling of progressive build operations
• Caching support of scrolling operations
• Dynamic GPO settings
• Relative mouse support
Benefits
• Better scrolling performance
• Downlink bandwidth reduction
• More users supported in the same network link
20
Experimental Setup: System and Network Configurations
Network conditions (bandwidth and round-trip latency)
• LAN: 100Mbps with 1ms latency
• WAN: 2Mbps connection with 100ms latency
• Extreme WAN: 300kbps connection with 100ms latency
Host configuration
• VMware vSphere 5.1
• Dell T610, 2.53 GHz Nehalem
• 48 GB physical RAM
• On local SSD
Desktop guest VM
• 32-bit Win7 desktop: 1 vCPU, 1GB RAM, 1152x864 resolution
• 32-bit WinXP SP3: 1 vCPU, 768 MB, 1152x864 resolution
Diagram labels: Network link, Display protocol
21
Desktop Workload: VMware View Planner 2.1
Diagram labels: Network link, Display protocol, Office 2007, Other Apps
22
Workload: VMware View Planner
Workload generator and sizing tool
• Platform characterization (CPU, memory, storage)
• Evaluate user experience
• Understand scaling issues and identify bottlenecks
Workload parameters
• All applications selected (PowerPoint, Excel, Word,
Outlook, Web album, Video, Firefox, Adobe, 7Zip, IE9)
• Thinktime of 10 seconds
A newer benchmark version (3.0) was just
released. For more info, send email to
23
Run Configurations
Settings PCoIP (View 5.2)
Resolution and color depth 1152x864 and 32-bit color
Clear Type fonts Enabled (default)
Window-maximize transient effect Disabled
Busy animated cursor Changed to default cursor
Image Quality BTL off
Max. Initial image quality (70)
Frame rate 24
24
PCoIP Caching Improvements
Reducing cache size
• View 5.2 with a 5x smaller cache provides bandwidth savings equivalent to, or slightly higher than, View 5.1 with 250MB RAM
• Good for memory constrained thin-clients and tablet devices
25
PCoIP Caching Improvements
• View 5.2 provides about 5% lower bandwidth usage in LAN and WAN and about 5-10% in extreme WAN conditions
• Lower bandwidth, more caching of display data using new compression techniques
26
Windows 8 Support
Full support of Windows 8 as desktop and client
Overview
• View 5.2 fully supports Windows 8 as desktop
• View clients also supported in Windows 8
Benefits
• Use the latest Windows 8 OS for desktops and clients
27
Windows 8 Performance and Optimizations
• With the optimizations, bandwidth usage can be reduced up to 60%
28
PCoIP and RDP 8 Performance
• Windows 8 PCoIP consumes the least bandwidth once all the optimizations are applied
• PCoIP is 10-20% better than RDP8
29
SE Sparse Disk Utilization
More efficient use of storage capacity
Overview
• Leverages new vSphere capability…
• A new disk format for VMs on VMFS.
• Reduces grain size & more efficiently utilizes every allocated block by filling it with real data.
• Unused space is reclaimed and View Composer desktops stay small.
Benefits
• Reduced storage capacity requirements (lower CAPEX) for Persistent Desktops, even on lower-tier hardware.
• View Composer or Mirage can be used for provisioning simplicity, even if recompose is never used (e.g. knowledge workers).
30
SE Sparse Performance: Experiment Setup
• Dell PowerEdge R710 with 16-core Intel Xeon E5-2660 @ 2.2 GHz with 392G RAM with SSD storage, VMware vSphere 5.1
• Guest VMs: 32-bit Win7 desktop (1-vCPU, 1GB RAM), 32-bit WinXP SP3 (1-vCPU, 768 MB RAM)
• Dell PowerEdge R710 with 12-core Intel Xeon E5645 @ 2.4 GHz with 296G RAM with SSD storage, VMware vSphere 5.1
• Display protocol: PCoIP
31
SE Sparse Performance: Workload and Configurations
View Planner workload with custom apps
• Install and Uninstall VI Client and VLC Player
• Download files from web and delete the files
• Copy some files and delete these files
10s think time, 2 iterations, remote mode with PCoIP protocol
Number of VMs tested : 100 VMs
All desktop VMs are placed on SSD disk
Wipe/shrink done at the rate of 10 in every 6 minutes, so for 100
VMs, it took 60 minutes (1 hour)
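The wipe/shrink timing above is simple batch arithmetic, sketched here for other pool sizes. The function name is hypothetical; the defaults of 10 VMs per batch and 6 minutes per batch are the rate reported on the slide.

```python
import math


def wipe_shrink_minutes(num_vms, batch_size=10, batch_minutes=6):
    """Estimate total wipe/shrink time when VMs are processed in fixed-size batches."""
    if num_vms < 0 or batch_size <= 0:
        raise ValueError("num_vms must be >= 0 and batch_size must be > 0")
    # A partially filled final batch still occupies a full interval
    return math.ceil(num_vms / batch_size) * batch_minutes


if __name__ == "__main__":
    print(wipe_shrink_minutes(100))  # the 100-VM case from the slide
```

For 100 VMs this gives 10 batches of 6 minutes, matching the 60 minutes stated above. Actual duration depends on the underlying storage and the configured concurrency.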
32
SE Sparse Disk Space Reclamation
• Since the wipe/shrink operation can be I/O-intensive for space reclamation, View administrators are encouraged to use the blackout periods appropriately (available in the View admin UI) to minimize any perturbation in the user experience.
• Also, depending upon the underlying storage, administrators can tune the concurrency level in LDAP (under OU=Properties, OU=Virtual Center) and edit the pae-SeSparseOperationsLimit for the desired vCenter.
33
View Admin Operations Enhancements
Significant acceleration of the Admin Backend by serving requests
from an in-memory cache as opposed to fetching data from LDAP
• Improvements in backend time (20 pools, 10K simulated VMs):
• 2x for Inventory -> Desktops
• 4x for Inventory -> Pools
Support of cluster with 32 hosts (now with both NFS and VMFS)
Operational time of View management operations such as
provisioning, recomposing, and rebalancing has improved
significantly (by up to 2x) in View 5.2
34
Session Outline
Introduction
View 5.2 Shared Graphics (3D) Feature and Performance
VMware View 5.2 Enhancements and Performance Results
• PCoIP protocol Improvements
• Windows 8 performance
• SE-Sparse Performance
Performance Tuning and Best Practices
• Platform Best Practices
• Guest-level Optimizations
• Protocol and Network Best Practices
Conclusion
35
Platform Best Practices
Config: Best Practices
View Storage Acceleration (CBRC)
• Always enable CBRC (on by default)
• Will reduce bootstorm IOPS requirement by 80%
• Will also reduce loginstorm IOPS requirement
Space-efficient Sparse (SE-Sparse) disks
• Use SE-sparse disks and you can reclaim the wasted space.
• Use the wipe/shrink operations in blackout periods as IOPS requirement may be high
VDI replica
• Keep the desktop replica on SSD
Memory-overcommitment
• Use memory over-commitment as long as the active memory fits in the physical memory (you can use View Planner custom apps features to get an estimation)
• Try to avoid Ballooning or Swapping
IOPS requirement
• Typical knowledge worker about 10-15 IOPS.
• Depending on your applications, YMMV
CPU requirement
• About 200 to 500 MHz per user depending upon the application requirements
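The per-user IOPS and CPU figures above translate into a first-pass sizing range, sketched below. The function name is hypothetical and the result is only a starting estimate; as the slide notes, real requirements depend on the applications.

```python
def host_sizing(users, iops_per_user=(10, 15), mhz_per_user=(200, 500)):
    """Return (low, high) estimates for steady-state IOPS and CPU in GHz."""
    if users < 0:
        raise ValueError("users must be >= 0")
    return {
        "iops": (users * iops_per_user[0], users * iops_per_user[1]),
        "cpu_ghz": (users * mhz_per_user[0] / 1000, users * mhz_per_user[1] / 1000),
    }


if __name__ == "__main__":
    # 100 knowledge workers, using the per-user ranges from the slide
    print(host_sizing(100))
```

For 100 users this yields 1000 to 1500 IOPS and 20 to 50 GHz of aggregate CPU. Boot and login storms are not covered by these steady-state figures.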
36
Guest Level Optimizations
More details in the white paper: http://www.vmware.com/files/pdf/view/vmware-horizon-view-best-practices-performance-study.pdf
Parameter: Best Practices
vCPU
• 1 for WinXP/Win7/Win8, 2 for multimedia intensive apps
Memory
• 512-768 MB for WinXP, 1GB for 32-bit Win7 and Win8, 2GB for 64-bit Win7 and Win8
• For memory-intensive apps: 1.5-2GB for WinXP and 32-bit Win7 and Win8, 3GB for 64-bit Win7 and Win8
Network adapter
• VMXNET3, Flexible
Storage adapter
• pvSCSI or LSI Logic SAS
VMware Tools
• Latest installed
Visual settings
• “Adjust to Best performance”
• Disable animations for Windows Maximize and Minimize operations
• Use default cursor for busy and working cursor
Disabling services
• Windows Update, Super-fetch, Windows Index
Group policy settings
• Disable Hibernation, disable System Restore, Screensaver to None
Other settings
• Turn off ClearType
• Disable fading effects
• Disable last access timestamp
37
All Desktop / Network Condition Tuning Recommendations
Build to lossless
• Recommendation: Disable for Standard Desktops; Enable for CAD/CAM and Medical Imaging
• Benefit: Saves 10-15% bandwidth
• Description: Used to enable / disable image quality building to fully lossless
Session Audio BW limit
• Recommendation: 50 - 100Kbps
• Benefit: Reduces bandwidth and CPU usage
• Description: Reduces BW usage of audio with usable quality
Maximum frame rate
• Recommendation: 10 / 15 FPS for Standard Desktops
• Benefit: Reduces bandwidth and CPU usage
• Description: In WAN conditions, this will be helpful for video playback and fast graphics operations
Client side cache size
• Recommendation: 50 – 100MB, depending on available client RAM
• Benefit: Avg. 30% reduction in bandwidth
• Description: This allows you to configure the client side image cache size.
More details in the white paper: http://www.vmware.com/files/pdf/view/vmware-horizon-view-best-practices-performance-study.pdf
38
Specific Network Condition Tuning Recommendations
Max Session Bandwidth
• Recommendation: Set for LAN / WAN: 1-2Mb for Standard Desktops, 3 – 5Mb for 3D Desktops
• Note: Always test with your use cases for the most accurate range
• Benefit: Reduced avg. bandwidth and fair sharing
• Description: Caps the peak bandwidth per session
Session Audio BW limit
• Recommendation: 50 - 100Kbps
• Benefit: Reduces bandwidth and CPU usage
• Description: Reduces BW usage of audio with usable quality
Maximum Image Quality
• Recommendation: 60-70%
• Benefit: Reduces bandwidth and CPU usage
• Description: Helps in low bandwidth conditions or with heavy multimedia use cases
Configure Session Floor
• Recommendation: Not lower than 100MB, depending on available client RAM
• Benefit: Improved user experience
• Description: Helps with better bandwidth estimation and improves user experience in high packet loss scenarios or on WiFi, 3G/4G networks
More details in the white paper: http://www.vmware.com/files/pdf/view/vmware-horizon-view-best-practices-performance-study.pdf
39
3D / Intense Graphics Tuning Recommendations
Max Frame Rate
• Recommendation: Set based on client capability: Zero Client – 30FPS, Atom based client – 15FPS, Dual Core ARM client – 20FPS, Desktop 30+FPS
• Benefit: Provides consistent end to end user experience
• Description: Caps the maximum frame rate encoded and sent to the client for decode
Max App Frame Rate
• Recommendation: Set to match the Max PCoIP Frame Rate
• Benefit: Sends only frames that can be encoded from the app to PCoIP
• Description: Limits high frame rate applications from generating excessive FPS
More details in the white paper: http://www.vmware.com/files/pdf/view/vmware-horizon-view-best-practices-performance-study.pdf
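The per-client frame rate guidance can be captured as a lookup that also keeps the application frame rate matched to the PCoIP frame rate. This is a sketch: the dictionary keys and function name are made up for illustration, and the desktop entry uses 30 as the lower bound of the "30+" recommendation.

```python
# Recommended maximum frame rates by client type, from the slide.
# "desktop" is listed as 30+ FPS; 30 is used here as a conservative floor.
CLIENT_MAX_FPS = {
    "zero client": 30,
    "atom": 15,
    "dual-core arm": 20,
    "desktop": 30,
}


def frame_rate_settings(client_type):
    """Return matching PCoIP and application frame rate caps for a client type."""
    fps = CLIENT_MAX_FPS[client_type.lower()]
    # The app frame rate is set to match the PCoIP maximum frame rate
    return {"pcoip_max_frame_rate": fps, "max_app_frame_rate": fps}


if __name__ == "__main__":
    print(frame_rate_settings("Atom"))
```

Keeping both values in one place avoids the case where an application renders more frames than PCoIP will encode.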
40
Conclusion
• View 5.2 supports hardware accelerated shared graphics and scales
easily from 32 to 100 desktop VMs, depending on the intensity of
the 3D graphics workload
• PCoIP caching improvements resulted in 5-10% bandwidth
improvements compared to View 5.1
• SE sparse disk can reclaim wasted space and provide significant
space savings
• View admin operations and UI performance enhancements
• With appropriate best practices, user experience can be improved
for different network conditions
41
Other VMware Activities Related to This Session
HOL:
HOL-MBL-1301
Horizon View from A to Z
Group Discussions:
EUC1001-GD, EUC1006-GD
View with Matt Coppinger or Andre Leibovici
EUC5706
THANK YOU
45
Backup slides
46
Performance Metrics We Care About
Diagram labels: Network link, Display protocol
• Desktop CPU usage: Lower CPU usage, better host consolidation, lower cost
• Bandwidth usage: Lower BW usage, more users supported, better user experience
• User Experience: Lower response time, better user experience, happy VDI users
47
View 5.2 Feature Pack 2
Real-time Audio Video
Flash URL redirection
48
Real-Time Audio-Video
Improved Microphone and Webcam Experience
Overview
• Webcams and Microphones are now generally supported with Horizon View Windows clients
• Broader application support for webcams with Webex, Skype and GoogleTalk
• Compressed audio/video reduces upstream BW to as low as 300kbps
Benefits
• Improved end user experience with broader application support
• Up to 100x bandwidth reduction
• Improves installation and administration of microphone and webcam devices
Diagram labels: View Client, Compressed A/V, Skype, Webex, GoogleTalk
49
“Real-Time Audio-Video” Overview
Before
• Webcams were unsupported with Horizon View desktops, unless specifically
used with optimized UC vendor solutions
• USB redirection of webcams and headsets resulted in bandwidth explosion
• Single webcam stream can result in 60 Mbps upstream to remote desktop
• Some customers redirected anyway, but with poor results
After
• General support for microphones and webcams with Horizon View desktops
• Broader application support for use with webcam video and microphone audio
• Audio/video from microphone/webcam is encoded and compressed on client
endpoint
• Bandwidth reduction to as little as 300-600kbps
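As a quick consistency check on the figures above, the reduction factor can be computed from the before and after bandwidth. The sketch assumes 1 Mbps = 1000 kbps; the function and constant names are hypothetical.

```python
def reduction_factor(before_kbps, after_kbps):
    """Return how many times smaller the bandwidth is after compression."""
    if after_kbps <= 0:
        raise ValueError("after_kbps must be positive")
    return before_kbps / after_kbps


# A single USB-redirected webcam stream, per the slide (60 Mbps, assuming 1 Mbps = 1000 kbps)
USB_REDIRECT_KBPS = 60 * 1000

if __name__ == "__main__":
    for rtav_kbps in (300, 600):
        print(rtav_kbps, "kbps ->", reduction_factor(USB_REDIRECT_KBPS, rtav_kbps), "x")
```

At 600kbps the reduction works out to 100x, and at 300kbps to 200x, so the "up to 100x" figure quoted for the feature corresponds to the upper end of the 300-600kbps range.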
50
How “Real-Time Audio-Video” Works
Diagram labels: Skype, Webex, GoogleTalk, View Client, encoded and compressed audio/video, View Agent
• Audio and video captured on client machine
• Audio/video encoded and compressed
• Compressed audio/video sent back to remote desktop
• On View desktop, audio/video decoded and presented to virtual webcam driver and virtual audio driver
51
Flash URL Redirection
Streaming of live video events from Adobe Media Server
Adobe Media
Server
Overview
Benefits
Stream live video events optimally to
Horizon View desktops
Support for live video streaming on Adobe
Media Server
Supported with Windows
and Linux thin clients
Stream live video events to virtual
desktops without affecting datacenter server
and network
Enables new multimedia use cases with
virtual desktops
Multicast stream
52
Tuning and Optimization Strategies
Disable Build-to-lossless
• No-brainer – first and easiest way to shave 10-15% bandwidth
• Only enable when there is a defined requirement for pixel perfect accuracy
(Medical, CAD/CAM, Graphic Design)
Configure the maximum session bandwidth
• For low bandwidth links set the limit at or slightly below (10%) the max link rate
• Even on the LAN it may make sense to set a max limit
Configure the session floor when…
• PCoIP is experiencing packet loss but the network link has plenty of headroom
• May not always improve user experience – YMMV
• Packet loss is seen on WiFi or 3G/4G networks
• Be careful to avoid unintentional oversaturation
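The guidance to cap session bandwidth at or slightly below (10%) the link rate can be sketched as follows. The function name is hypothetical, and the 10% default is the headroom suggested above for low bandwidth links; the right value should be validated against real traffic.

```python
def max_session_bandwidth_kbps(link_rate_kbps, headroom_pct=10):
    """Return a session bandwidth cap set headroom_pct percent below the link rate."""
    if link_rate_kbps <= 0 or not 0 <= headroom_pct < 100:
        raise ValueError("link rate must be positive and headroom within 0-99 percent")
    # Integer arithmetic avoids floating-point rounding surprises
    return link_rate_kbps * (100 - headroom_pct) // 100


if __name__ == "__main__":
    # The WAN and extreme WAN link rates used earlier in the deck
    for link in (2000, 300):
        print(link, "kbps link ->", max_session_bandwidth_kbps(link), "kbps cap")
```

For the 2Mbps WAN link this gives a cap of 1800kbps, and for the 300kbps extreme WAN link a cap of 270kbps.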
53
PCoIP Best Practices Recommendations
Build to lossless
• Default: On
• Recommendation: Turn Off
• Description: Enables or disables build to lossless
Session Audio BW limit
• Default: 500Kbps
• Recommendation: 50 - 100Kbps
• Description: Reduces BW usage of audio with usable quality
Maximum frame rate
• Default: 30
• Recommendation: Change to 10-15 based on network settings
• Description: In WAN conditions, this will be helpful for video playback and fast graphics operations
Maximum link rate
• Default: -
• Recommendation: Set it as per network conditions
• Description: Good for better bandwidth estimation
Client side cache size
• Default: 250MB
• Recommendation: Set per client-side memory available
• Description: This allows you to configure the client side image cache size.
More details in the white paper: http://www.vmware.com/files/pdf/view/vmware-horizon-view-best-practices-performance-study.pdf
54
Tuning and Optimization Strategies
Configure the maximum frame rate
• In almost all cases the maximum frame rate can be reduced to 18-20fps with
little noticeable impact – but also little gain.
• Settings below 15fps may be noticeable in use cases which require rich media
• Task workers without media requirements can often utilize settings as low as
6-8fps without significant visual impact
• Examine the PCoIP Server log files and WMI Image stats to determine
average frame rate for desired use case:
MGMT_IMG :log: cur_s 0 max_s 30 tbl 2 bwc 0.01 bwt 8.95 fps 5.57
MGMT_IMG :log: cur_s 0 max_s 30 tbl 2 bwc 0.01 bwt 8.95 fps 6.26
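Averaging the fps field across log lines like the two above can be sketched with a regular expression. This assumes every relevant line contains "MGMT_IMG", ":log:" and a trailing "fps <number>" pair as in the samples; the function name is hypothetical and the other fields are ignored.

```python
import re

# Matches the fps value on MGMT_IMG log lines shaped like the samples on the slide
FPS_RE = re.compile(r"MGMT_IMG\s*:log:.*\bfps\s+(\d+(?:\.\d+)?)")


def average_fps(lines):
    """Return the mean fps over all MGMT_IMG log lines that report one."""
    values = [float(m.group(1)) for line in lines if (m := FPS_RE.search(line))]
    if not values:
        raise ValueError("no MGMT_IMG fps entries found")
    return sum(values) / len(values)


if __name__ == "__main__":
    sample = [
        "MGMT_IMG :log: cur_s 0 max_s 30 tbl 2 bwc 0.01 bwt 8.95 fps 5.57",
        "MGMT_IMG :log: cur_s 0 max_s 30 tbl 2 bwc 0.01 bwt 8.95 fps 6.26",
    ]
    print(round(average_fps(sample), 3))
```

An average well below the configured maximum, as in these samples, suggests the frame rate cap can be lowered with little visible effect for that use case.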
Configure the maximum initial image quality
• When on a WAN link with constrained bandwidth reduce this setting to 60-70%
• For use cases that use large amounts of multimedia/video – large impact
• Setting this value too low may result in noticeably “fuzzy” or “blurry” images
55
Tuning and Optimization Strategies
Configure the minimum image quality
• This value must be below the maximum initial image quality setting
• The default value of 50% is acceptable for most cases
Configure the audio bandwidth limit
• For use cases that utilize significant amounts of audio - legal/medical
transcription for example – reducing audio bandwidth may increase user
density
• Audio bandwidth limit is a target, not a literal value
• Vary the audio bandwidth limit between 450Kbps and 50Kbps until the desired
mix of bandwidth savings and audio intelligibility is achieved
Configure the Client-side cache size
• When using thin client devices with limited RAM, using a larger cache size than
the device can support may lead to dropped sessions
• Reduce the cache size until connections are unaffected, typically 50-100MB
56
Horizon Brokering of View Desktops
Horizon Supports User Entitlement to Desktops and SSO
Overview
• View Desktop pools are connected into Horizon after they are provisioned
• Horizon provides single point of access for end users to desktops, data and apps.
• Horizon supports SSO brokering user to available desktops based on entitlement policy
Benefits
• Enhanced Usability: One stop shopping for end user access to all their corporate workloads.