VMworld 2013: Building a Validation Factory for VMware Partners

Building a Validation Factory for VMware Partners Tim Harris, VMware TEX5485 #TEX5485

Page 1: VMworld 2013: Building a Validation Factory for VMware Partners

Building a Validation Factory for VMware Partners

Tim Harris, VMware

TEX5485

#TEX5485

Page 2: VMworld 2013: Building a Validation Factory for VMware Partners

Disclaimer

This session may contain product features that are

currently under development.

This session/overview of the new technology represents

no commitment from VMware to deliver these features in

any generally available product.

Features are subject to change, and must not be included in

contracts, purchase orders, or sales agreements of any kind.

Technical feasibility and market demand will affect final delivery.

Pricing and packaging for any new technologies or features

discussed or presented have not been determined.

Page 3: VMworld 2013: Building a Validation Factory for VMware Partners

About the Speaker…

Tim Harris:

• At VMware since 2007

• Currently running ISV Validation Program

• Engineering and Lab Resources for TAP members

• Oracle Corp for nearly 10 years

• Managed various performance engineering teams

• Ran Oracle Applications Standard Benchmark effort

• PhD in Computer Science

• Focus on Parallel Computing algorithms and architectures

• BS in Electrical Engineering

Page 4: VMworld 2013: Building a Validation Factory for VMware Partners

Agenda

Validation Services Overview

• Goals and Best Practices

Why Build a Validation Factory?

• Business and Technical Value

Process and Procedures

• Org charts, resources, planning and objectives

Tuning Best Practices and Telco

• What’s challenging today, and how best to solve those challenges

Page 5: VMworld 2013: Building a Validation Factory for VMware Partners

Validation Services

Page 6: VMworld 2013: Building a Validation Factory for VMware Partners

Overview of Validation Services

Engineering Back-End to ISV Alliances

• Lab and Engineer Resources

• Free of Cost, Indirect Revenue for VMware

Performance Validations

• Virtualized Net-New App

Business Continuity/Disaster Recovery

• Site Recovery Manager

• VMware HA, vMotion, DRS, FT

Cloud Migration Services

• vCloud Director

• vApps

• vShield

• Hosting and Billing

[Diagram labels: Performance Validations; View and BCDR; Cloud/SaaS]

Page 7: VMworld 2013: Building a Validation Factory for VMware Partners

Setting Goals for a Performance Validation

Primary: Remove blockers for adoption

• As perceived by you, the Partner

VMware in supporting role here

• We do not set requirements

Supportability

• Our mutual customers should be happy

Maximize Value Proposition

• Synergy in combined functionality?

• 1 + 1 = 3 opportunities?

Page 8: VMworld 2013: Building a Validation Factory for VMware Partners

Performance Goals

Same performance as physical?

• Is “nearly the same” enough?

What are the application stress points?

• Realtime access to CPU?

• High throughput access to I/O?

• Dynamic memory footprint?

Infrastructure requirements

• Storage requirements

• Load driver requirements

Application level KPIs?

• For small, medium and large customers

Page 9: VMworld 2013: Building a Validation Factory for VMware Partners

Validation Goals and Common vSphere Use Cases

Validation Collaboration

• Many general learning opportunities

What’s the likely vSphere configuration?

• Existing cluster of 6 to 12 nodes

• DRS turned on

• Reservations turned off

• HA turned on

• Mix of diverse workloads

vSphere admins may

• Prioritize the good of the many

• Vs the good of the few (applications)

Any conflicts with your best practices?

Page 10: VMworld 2013: Building a Validation Factory for VMware Partners

VMware Ready and Validations

VMware Ready is a marketing certification program

• Applications Category requires some performance testing

• Designed as self-service activity

Validations mean we can waive testing requirements

• If you’ve done good performance work

• Can provide testing waiver

Testing requirements are modest

• Apply load and observe behavior and capacity

Page 11: VMworld 2013: Building a Validation Factory for VMware Partners

Why Build a Validation Factory?

Page 12: VMworld 2013: Building a Validation Factory for VMware Partners

What Is a Validation Factory?

Validate All Your Applications

• Solution Level, Suite Level, Company Level

Plan for Capacity with Resource Requirements

• Hardware, Manpower, Marketing, Management

• Move from Event to Service model

Leverage results

• Document, Market, Enable the Field

Broaden solutions

• BC/DR, Hybrid Cloud (Private/Public), VDI

Get Certified

• VMware Ready status for all products

Page 13: VMworld 2013: Building a Validation Factory for VMware Partners

Validation Factory: Why Do It?

Provide Suite level virtualization advice

• Combine point products into virtualized solutions

Differentiate from competitors

• Establish technical leadership across products

Provide broader value of single platform

• Point products not sufficient

Enable delivery of specific deployment architectures

• E.g. 5 product suite on 3 node cluster supports 200 users

Page 14: VMworld 2013: Building a Validation Factory for VMware Partners

Process and Procedures

Page 15: VMworld 2013: Building a Validation Factory for VMware Partners

Org Chart and Process

Centralized Resources are easier

• Center of Expertise Model

Two Major Product Categories

• Need full validation to support

• Just need VMware Ready logo

Build Prioritized List

• Easy/Quick wins

• Hard/Longer Challenges

Internal and External Marketing

• Take credit for incremental achievements

Page 16: VMworld 2013: Building a Validation Factory for VMware Partners

Factory Deliverables

Suite Level VMware Ready Status

• vSphere based solutions

• Reference architectures

• Availability story

• Solution Deployment Guide

Span the Gap from R&D to Field

• Key architects in the loop

• Field enabled to understand and sell

Document and Market

• External doc delivered

• Internal message delivered

Page 17: VMworld 2013: Building a Validation Factory for VMware Partners

Planning Your Validation Effort

Page 18: VMworld 2013: Building a Validation Factory for VMware Partners

Validation Process in Agile Sprints

Planning Sprint: 3 weeks

• Iteratively populate test plan template

• HW resource requirements

• Storage volume and throughput

• Workload and Load Driver Tooling

Execution: 3 weeks

• At VMware Labs or ISV Labs

Wrap up: 3 weeks

• Interactively create Field Facing Documents

• Any joint marketing, press releases, VMware Ready logos, etc.

Add concurrency to increase throughput

• Different products can overlap sprints

[Diagram: Plan, Execute, Wrap-up]
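
The throughput effect of overlapping sprints can be sketched with simple arithmetic. The snippet below assumes three 3-week sprints per product (as on the slide) and a strict pipeline in which a new product enters planning as soon as the previous one leaves it. The function names and the pipeline model are illustrative assumptions, not part of the program itself.

```python
SPRINT_WEEKS = 3          # from the slide: each sprint lasts 3 weeks
SPRINTS_PER_PRODUCT = 3   # plan, execute, wrap-up


def serial_weeks(products: int) -> int:
    """Calendar weeks if each product finishes before the next one starts."""
    return products * SPRINTS_PER_PRODUCT * SPRINT_WEEKS


def pipelined_weeks(products: int) -> int:
    """Calendar weeks if a new product starts planning every sprint.

    Assumes enough staff and lab capacity to run one product per stage.
    """
    if products == 0:
        return 0
    return (SPRINTS_PER_PRODUCT + products - 1) * SPRINT_WEEKS


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(n, serial_weeks(n), pipelined_weeks(n))
```

Under these assumptions, four products take 36 weeks back to back but 18 weeks when their sprints overlap.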

Page 19: VMworld 2013: Building a Validation Factory for VMware Partners

Planning Risk Factors

Infrastructure limitations

• Little is learned by testing with insufficient capacity

• Entire benchmark limited by smallest bottleneck

Storage throughput

• Do we know the requirements?

• Can we verify the device can hit requirements?

• E.g. run IOMeter before testing begins

Length of effort

• Assume problems throughout before locking in dates

• Or choose timeline and work backwards to test schedule

• E.g. We plan 2 weeks of testing and reserve 3 weeks of HW
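
The two capacity checks above can be sketched in a few lines. This is a minimal illustration, assuming every component's capacity has already been expressed in the same unit (for example transactions per second) and that a storage figure has been measured beforehand with a tool such as IOMeter. The function names, the 20% headroom default, and the sample numbers are hypothetical.

```python
from __future__ import annotations


def limiting_component(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the component with the lowest sustainable rate.

    All capacities must use the same unit; the whole benchmark cannot
    run faster than this component.
    """
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


def storage_ready(measured_iops: float, required_iops: float,
                  headroom: float = 0.2) -> bool:
    """True if the measured rate covers the requirement plus a margin."""
    return measured_iops >= required_iops * (1.0 + headroom)


if __name__ == "__main__":
    # Hypothetical capacities, all in transactions per second.
    print(limiting_component({"cpu": 5000, "storage": 1200, "network": 3000}))
    print(storage_ready(measured_iops=1500, required_iops=1200))
```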

Page 20: VMworld 2013: Building a Validation Factory for VMware Partners

Executing on Your Validation Effort

Page 21: VMworld 2013: Building a Validation Factory for VMware Partners

Environment Build-out

Assume Build period largely single threaded

• Not considered full lab time

Start all staging/installs week before

• Assume long copy/install/datagen steps

• May include snail mail steps

• Ship USB drives for items bigger than 20G

• 10G and under via FTP

Full install on greenfield VM

• Most common process

vApps (OVFs) arguably better

• But more likely to break size limits for FTP

Page 22: VMworld 2013: Building a Validation Factory for VMware Partners

Load Drivers and Validations

Good load driver is critical to Performance testing

• Not virtualization specific

Load drivers are expensive to build

• Assume 2 man years and 6 calendar months

Bad load drivers don’t represent realistic use cases

• Focus should be on customer critical activities

• Proving the performance of edge cases is a waste of resources

• Load should represent common production load
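
One way a load driver can stay close to common production load is to draw transactions from a weighted mix. The sketch below assumes a made-up four-transaction mix; the transaction names and weights are hypothetical, and real weights would come from production logs.

```python
import random
from collections import Counter

# Hypothetical transaction mix (relative weights, not measured data).
TRANSACTION_MIX = {
    "browse_catalog": 60,
    "place_order": 25,
    "run_report": 10,
    "admin_task": 5,
}


def next_transaction(rng: random.Random) -> str:
    """Pick the next transaction type in proportion to its weight."""
    names = list(TRANSACTION_MIX)
    weights = list(TRANSACTION_MIX.values())
    return rng.choices(names, weights=weights, k=1)[0]


def sample_mix(count: int, seed: int = 0) -> Counter:
    """Generate `count` transactions and tally how often each type occurs."""
    rng = random.Random(seed)
    return Counter(next_transaction(rng) for _ in range(count))


if __name__ == "__main__":
    print(sample_mix(1000))
```

Rare edge-case transactions get a small weight or are left out entirely, so most of the test time goes to the activities customers depend on.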

Page 23: VMworld 2013: Building a Validation Factory for VMware Partners

Physical vs. Virtual Comparisons

Obvious choice, but not always correct choice

• Costs substantially more

• Adds a bit more value

Assume P-vs-V costs 2X+ more time/resources

• Physical HW setup is slow and inflexible

• Apples to Oranges comparisons common

Apples to Apples is…

• Must remove resources from physical to match VM

• VM must not consume all physical resources

• Hypervisor will have resources in production

• Needs to have resources in testing too

Page 24: VMworld 2013: Building a Validation Factory for VMware Partners

Tuning Best Practices and Telco

Page 25: VMworld 2013: Building a Validation Factory for VMware Partners

Executive Summary: vSphere Tuning in Last 5 Years

Used to be scary – now they just work:

• High I/O Applications: Run at wire speed now

• Monster VM type workloads: Big iron now in a VM

• Enterprise use cases for Linux: Now safer

What’s still hard?

• Realtime requirements under 1 ms

• ESX 3.5 – 100 ms

• ESX 4 and 5 – 10 ms

• ESX 5.1 and 5.5 – working on sub-ms (100s of microseconds) now

• vMotion of Huge Realtime VMs

• 64 GB in-memory DBs like to stay still

Page 26: VMworld 2013: Building a Validation Factory for VMware Partners

Example Telco Workload Challenges

Service Provider Use Cases

• Large SAAS deployments

BC/DR QOS Built into application

• Realtime active/passive failover

Conservative by nature

• “Don’t try and fix it if you might break it”

Realtime Transaction Rates

• Latency requirements of <10ms

Page 27: VMworld 2013: Building a Validation Factory for VMware Partners

Tuning Strategies

Shopping list of tune-ables may be misused

• Change for change’s sake

Experimental science says

• Make one change at a time

• Assess value of change

• Remove or move on to next change

Prioritize by relative impact

• No reason to make a change if it can’t solve a problem
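
The one-change-at-a-time discipline can be written as a small loop. The sketch below assumes a caller-supplied `measure` function that reruns the workload and returns a latency-like score (lower is better). The function name, the 2% minimum gain, and the change labels in the usage example are all hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable


def tune_one_at_a_time(
    baseline: float,
    changes: Iterable[str],
    measure: Callable[[list[str]], float],
    min_gain: float = 0.02,
) -> list[str]:
    """Try candidate changes one by one and keep only those that help.

    `measure(applied)` returns the score with the given changes applied.
    A change is kept only if it beats the best score so far by at least
    `min_gain` (as a fraction); otherwise it is dropped before the next
    change is tried.
    """
    kept: list[str] = []
    best = baseline
    for change in changes:
        score = measure(kept + [change])
        if score <= best * (1.0 - min_gain):
            kept.append(change)
            best = score
    return kept


if __name__ == "__main__":
    # Stand-in for a real test run: each change scales latency by a factor.
    effects = {"reservation": 0.8, "affinity": 1.0, "passthrough": 0.9}

    def fake_measure(applied: list[str]) -> float:
        score = 10.0
        for name in applied:
            score *= effects[name]
        return score

    print(tune_one_at_a_time(10.0, list(effects), fake_measure))
```

Ordering the candidate list by expected impact matches the slide's advice to prioritize by relative impact.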

Page 28: VMworld 2013: Building a Validation Factory for VMware Partners

Large Tuning Knobs Available

Incrementally back off virtualization

• Realtime demands likely can be met

Reservations for CPU and Memory

• Hard allocation of resources

If truly needed – CPU Affinity

• Exclusive or with Halt.desched flag

If truly needed – NIC passthrough

• With SR-IOV or not

Horizontally scaled apps

• Still have less scheduling overhead

Storage design still critical

• Ensure IOPS are available before tuning

Page 29: VMworld 2013: Building a Validation Factory for VMware Partners

Advanced Tuning: CPU Affinity

CPU Affinity (aka Pinning)

• Rumored to be critical for VOIP

• Our data shows little gain with vSphere 4.x and before

Affinity and vSphere 5.0

• Allows “Exclusive Affinity”

• Previously, cores still accessible to other VMs despite affinity

[Chart: Max DSP Execution Time in Milliseconds (axis 0 to 8); series: SLA, Without Exclusive Affinity, With Exclusive Affinity]

Page 30: VMworld 2013: Building a Validation Factory for VMware Partners

Halt Desched vs. Affinity vs. Latency Sensitive

“Pre-Allocating” CPU resources to a VM

• Reducing benefits of virtualization (vMotion, overcommit)

• Reducing scheduling overhead

Hierarchy of Techniques

• Simple reservations first

• Exclusive CPU Affinity (5.0 and beyond)

• Halt Desched option

Latency Sensitive UI available in 5.1 and beyond

• At highest setting, equivalent to Exclusive CPU affinity

Halt Desched

• vCPUs at 100% usage even if no work being done

• monitor_control.halt_desched set to FALSE
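
The hierarchy of techniques can be read as an escalation ladder. The sketch below encodes that order and renders the one setting the slide names explicitly. The technique labels and function names are illustrative, and the `key = "value"` line format is assumed to follow the usual .vmx advanced-setting convention; check it against the documentation for your vSphere version before applying it.

```python
from __future__ import annotations

from typing import Optional

# Escalation order from the slide: least intrusive technique first.
TECHNIQUES = ["cpu_reservation", "exclusive_cpu_affinity", "halt_desched_off"]


def next_technique(already_applied: list[str]) -> Optional[str]:
    """Return the next technique to try, or None if all have been applied."""
    for technique in TECHNIQUES:
        if technique not in already_applied:
            return technique
    return None


def halt_desched_vmx_line() -> str:
    """Setting line for the last step, using the key named on the slide."""
    return 'monitor_control.halt_desched = "FALSE"'


if __name__ == "__main__":
    print(next_technique([]))
    print(halt_desched_vmx_line())
```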

Page 31: VMworld 2013: Building a Validation Factory for VMware Partners

Horizontal Scaling and Latency Sensitivity

Scheduling overhead a function of vCPUs per VM

• 4 to 8 vCPU VMs may be our sweet spot

Many Applications scale horizontally effectively

• Doesn’t need to impact aggregate resources for an application

• E.g. double VM count and halve vCPUs per VM

• Trade-offs with management overhead of more VMs

Expect less jitter with smaller VMs

• Empirical result across many workloads
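
The "double the VM count, halve the vCPUs" trade can be checked with a few lines of arithmetic. The sketch below is a minimal illustration; the `Layout` class and `scale_out` function are hypothetical names, and the model ignores the management overhead of running more VMs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    vm_count: int
    vcpus_per_vm: int

    @property
    def total_vcpus(self) -> int:
        """Aggregate vCPUs across all VMs of the application."""
        return self.vm_count * self.vcpus_per_vm


def scale_out(layout: Layout) -> Layout:
    """Double the VM count and halve vCPUs per VM; total vCPUs is unchanged."""
    if layout.vcpus_per_vm % 2 != 0:
        raise ValueError("vCPUs per VM must be even to halve cleanly")
    return Layout(layout.vm_count * 2, layout.vcpus_per_vm // 2)


if __name__ == "__main__":
    before = Layout(vm_count=2, vcpus_per_vm=16)
    after = scale_out(before)
    print(before, before.total_vcpus)
    print(after, after.total_vcpus)
```

Two 16-vCPU VMs become four 8-vCPU VMs, which lands in the 4 to 8 vCPU range the slide calls the likely sweet spot.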

Page 32: VMworld 2013: Building a Validation Factory for VMware Partners

Non-Uniform Memory Access (NUMA) Impacts

Physical Memory Spread across NUMA Nodes

• Typically one node per socket

Access to remote node’s memory expensive

• Access to local node “cheap”

Monitor from ESXtop

• NUMA stats: %local memory should be 100

• vSphere 5 more NUMA aware than previous

• Smallish VMs with smallish RAM are the best case

Align Core count per socket with vCPUs

• Fully occupy integer socket count

Disable “Node Interleaving” at BIOS to enable NUMA

• Node interleaving (enabled) leads to consistent but poor performance
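
The vCPU alignment advice can be turned into a quick sizing check. The sketch below assumes one NUMA node per socket (the typical case noted above) and looks only at core counts, not memory size; the function name and the category labels are hypothetical.

```python
def numa_fit(vcpus: int, cores_per_socket: int, sockets: int) -> str:
    """Classify how a VM's vCPU count maps onto the host's sockets.

    Assumes one NUMA node per socket and considers cores only.
    """
    if vcpus <= 0 or cores_per_socket <= 0 or sockets <= 0:
        raise ValueError("all arguments must be positive")
    if vcpus > cores_per_socket * sockets:
        return "oversized"        # more vCPUs than physical cores
    if vcpus <= cores_per_socket:
        return "fits-one-node"    # can be scheduled within a single node
    if vcpus % cores_per_socket == 0:
        return "aligned"          # fully occupies a whole number of sockets
    return "misaligned"           # spills partway into another socket


if __name__ == "__main__":
    for v in (4, 12, 16, 24):
        print(v, numa_fit(v, cores_per_socket=8, sockets=2))
```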

Page 33: VMworld 2013: Building a Validation Factory for VMware Partners

Advanced Tuning: Direct Path I/O

Direct Path I/O (aka NIC Pass-through)

• Disables vMotion

• Makes physical NIC available for only one VM

Substantial jitter improvements in realtime workloads

• But at substantial cost in vSphere functionality

SR-IOV provides alternative

• Reusable NIC with vMotion and Pass-through

[Chart: Worst Case Latency in Milliseconds (axis 0 to 100); series: SLA, Without Direct Path I/O, Direct Path I/O]

Page 34: VMworld 2013: Building a Validation Factory for VMware Partners

Interrupt Management and Latency Sensitive Workloads

Interrupt coalescing in vSphere 4.x and 5

• Does “Adaptive Interrupt Coalescing” by default

• Groups interrupts to reduce impact and CPU

• Group size (queue depth) dynamically adjusts to the workload

Adaptive coalescing may introduce latency

• Can disable coalescing for latency sensitive workloads

• Some improvements observed, but not always a win

Pinning of interrupts

• Likely used with CPU pinning

• Keeps all interrupts on vCPU and hence pCPU

• Modest gain – test before using
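
Why coalescing can add latency is easiest to see in a toy model. The sketch below is not vSphere's adaptive algorithm: it assumes a fixed batch size and that one interrupt fires when the last packet of each batch arrives. Function and parameter names are hypothetical.

```python
from __future__ import annotations


def added_latency(arrivals_us: list[float], batch_size: int) -> list[float]:
    """Delay each packet sees when one interrupt fires per full batch.

    Toy model: the interrupt fires at the arrival time of the last packet
    in each batch; a final partial batch is flushed at its last arrival.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    delays = []
    for start in range(0, len(arrivals_us), batch_size):
        batch = arrivals_us[start:start + batch_size]
        fire_time = batch[-1]
        delays.extend(fire_time - t for t in batch)
    return delays


if __name__ == "__main__":
    arrivals = [0, 10, 20, 30]  # packet arrival times in microseconds
    for size in (1, 2, 4):
        print(size, added_latency(arrivals, size))
```

A batch size of 1 corresponds to coalescing being disabled: no packet waits, but every packet costs an interrupt. Larger batches cut the interrupt count while the earliest packet in each batch waits longest.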

Page 35: VMworld 2013: Building a Validation Factory for VMware Partners

Latency Sensitive Tuning and Overcommitment

Safest solution – undercommit physical cores on each host

• E.g. 16 core server runs no more than 14 vCPUs

• 1-2 cores per host and 2G of RAM uncommitted

Challenges with undercommitment

• HW utilization, DRS in cluster with mixed workloads, etc.

• Most viable with dedicated (to one app) clusters

Alternative approaches

• CPU Affinity locks a VM to cores

• Other cores available for general use in cluster
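
The undercommitment rule of thumb can be expressed as a placement check. The sketch below uses the slide's figures (2 cores and 2G of RAM left uncommitted per host) as defaults; the function names and the sample VM sizes are hypothetical.

```python
from __future__ import annotations


def max_vcpus(host_cores: int, reserved_cores: int = 2) -> int:
    """Largest total vCPU count that leaves `reserved_cores` uncommitted."""
    return max(host_cores - reserved_cores, 0)


def placement_ok(
    host_cores: int,
    host_ram_gb: float,
    vm_vcpus: list[int],
    vm_ram_gb: list[float],
    reserved_cores: int = 2,
    reserved_ram_gb: float = 2.0,
) -> bool:
    """True if the VMs fit while leaving the reserved cores and RAM free."""
    cpu_fits = sum(vm_vcpus) <= host_cores - reserved_cores
    ram_fits = sum(vm_ram_gb) <= host_ram_gb - reserved_ram_gb
    return cpu_fits and ram_fits


if __name__ == "__main__":
    print(max_vcpus(16))
    print(placement_ok(16, 128, [8, 4, 2], [32, 32, 32]))
    print(placement_ok(16, 128, [8, 8], [32, 32]))
```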

Page 36: VMworld 2013: Building a Validation Factory for VMware Partners

Realtime Tuning Summary

Start with simple techniques

• Reservations, BIOS tuning, etc

Move towards pre-allocation of resources

• CPU Exclusive Affinity if CPU bound

• NIC-passthrough if network bound

Consider horizontal scaling of configuration

• More, smaller VMs

Test one change at a time and iterate

• Don’t overlap your changes

Page 37: VMworld 2013: Building a Validation Factory for VMware Partners

Telco Progress In-flight

Active Efforts with Nearly Every Global Telco Provider

• Some solutions in market, some on the way

Easy to virtualize pieces definitely exist

• Careful prioritization of efforts underway

Realtime workloads are achievable

• 2ms for compute and packet send consistently achievable (5.1)

• <1ms QOS work in progress (5.5?)

Availability still adds value

• Augment built in availability story

• Protect previously unprotected components

Page 38: VMworld 2013: Building a Validation Factory for VMware Partners

Validation Factory Summary

Vendors see value in Suite Level solutions design

• TAP program can provide support for such efforts

VMware Ready status for all applications

• Detailed performance assessment for some

What was once hard is now possible

• Most challenging applications successfully virtualized today

Page 39: VMworld 2013: Building a Validation Factory for VMware Partners

Questions?

Page 40: VMworld 2013: Building a Validation Factory for VMware Partners

THANK YOU
