VMworld 2013: ESXi Native Networking Driver Model - Delivering on Simplicity and Performance
ESXi Native Networking Driver Model - Delivering on
Simplicity and Performance
Margaret Petrus, VMware
TEX4759
Disclaimer
This presentation may contain product features that are currently
under development.
This overview of new technology represents no commitment from
VMware to deliver these features in any generally available
product.
Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
Key Takeaways
1. The benefits of moving to the native driver model, with an overview of the different layers.
2. A jumpstart to building your own native driver.
3. The significant CPU savings achieved with the native model, while retaining simplicity and supportability.
Agenda
Overview of Native Model
Module Components and Interactions
Native Network Driver Deep Dive
Building Your Driver in the Native Model
Advanced Features
Performance
Summary
Overview of Native Model
Why Native Driver Model?
Foundation to build new extensible features for the ESXi hypervisor.
The increasing number of VMs in growing cloud deployments demands:
• Device driver robustness
• Best performance
• Better supportability, manageability, and debuggability
Provides long-term binary compatibility support.
Better flexibility and support to release new features in the networking, storage, and other areas.
High-level Native Driver Model Overview
[Diagram: within the vmkernel, the I/O subsystems and the Device Manager sit on top of the Device Layer, which tracks device and driver objects and binds drivers to physical and logical devices. Legend: physical device, logical device, relationship.]
Module Components and Interactions
Quick Comparison with VMKLNX Model
[Diagram: in the emulated Linux driver model, VM I/O goes from the I/O subsystems through vmkplexer and vmklinux to a Linux driver inside the vmkernel. In the native driver model, the I/O subsystems talk to the Device Layer and Device Manager, which bind native/ESXi drivers directly.]
Layer Interactions in Native Model vs. vmklinux Model
[Diagram: at user level, vmkdevmgr and vmkctl (with driver.map) manage devices; in the kernel, the PCI and ACPI buses feed the device layer, which connects either native drivers, or vmklinux and vmklnx drivers, to the I/O subsystems (scsi, net).]
Native Network Driver Deep Dive
High-Level Native Networking Driver Model (using elxnet)
elxnet – Emulex Native Driver for BE3 Devices
[Diagram: at user level, vmkdevmgr and vmkctl (with elxnet_devices.py) manage the device; in the kernel, the PCI and ACPI buses feed the device layer, which binds elxnet and exposes it to the I/O subsystems (scsi, net) through the uplink module.]
Native Networking Driver Module Interactions
Module layer – register/unregister the driver with the module layer interface:
• init_module()
• cleanup_module()
Device Driver layer – register with the device driver interface:
• Provide vmk_DriverProps and vmk_DriverOps
• Callbacks for: DriverAttachDevice(), DriverDetachDevice(), DriverScanDevice(), DriverForgetDevice(), DriverStartDevice(), DriverQuiesceDevice()
PCI layer – needed for PCI config access, BAR mapping, SR-IOV, etc.:
• vmk_PCIReadConfig(), vmk_PCIWriteConfig()
• vmk_PCIMapIOResource(), vmk_PCIUnmapIOResource()
Uplink layer – provides access to the networking stack:
• Driver has to interact with the uplink directly for all operations
• Uplink registration results in logical child (vmnicX) creation
• Register networking HW capabilities and provide appropriate callbacks
Management CLI – supported only via esxcli, not ethtool!
Module Layer
Module Layer: init_module()
Key steps:
1. Register the module with the vmkernel via vmk_ModuleRegister().
2. Initialize the driver name via vmk_NameInitialize().
3. Create a heap via vmk_HeapCreate() and a memory pool via vmk_MemPoolCreate().
4. Register for driver logging via vmk_LogRegister().
5. Create a lock domain for the module via vmk_LockDomainCreate().
6. Register the driver with the driver database via vmk_DriverRegister(). This is where you register the driver properties, i.e., the device-layer callback handlers (see the sketch after the structures below).
static vmk_DriverOps elxnetDrvOps = {
.attachDevice = elxnet_attachDevice,
.detachDevice = elxnet_detachDevice,
.scanDevice = elxnet_scanDevice,
.startDevice = elxnet_startDevice,
.quiesceDevice = elxnet_quiesceDevice,
.forgetDevice = elxnet_forgetDevice,
};
static vmk_DriverProps elxnetDrvProps = {
.ops = &elxnetDrvOps,
};
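For orientation, here is a minimal, hypothetical init_module() skeleton tying the numbered steps above to the structures just shown. The vmk_ModuleRegister()/vmk_DriverRegister() argument lists and the VMKAPI_REVISION macro are assumptions, most intermediate steps (name, heap, memory pool, logging, lock domain) are elided, and the remaining vmk_DriverProps fields (module ID, driver name) would also need to be filled in; consult the Native DDK Developer Guide for the exact signatures.
static vmk_ModuleID elxnetModuleID;   /* module ID saved at registration */
static vmk_Driver elxnetDriver;       /* driver handle returned by vmk_DriverRegister() */

int
init_module(void)
{
   VMK_ReturnStatus status;

   /* Step 1: register the module with the vmkernel (argument list assumed). */
   status = vmk_ModuleRegister(&elxnetModuleID, VMKAPI_REVISION);
   if (status != VMK_OK) {
      return status;
   }

   /* Steps 2-5: name init, heap, memory pool, logging, lock domain go here. */

   /* Step 6: register the driver and its device-layer callbacks. */
   status = vmk_DriverRegister(&elxnetDrvProps, &elxnetDriver);
   if (status != VMK_OK) {
      vmk_ModuleUnregister(elxnetModuleID);
      return status;
   }
   return VMK_OK;
}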
Module Layer: cleanup_module()
The steps executed in init_module() are undone in reverse order:
1. Unregister driver via vmk_DriverUnregister().
2. Destroy created lock domain via vmk_LockDomainDestroy().
3. Unregister driver log via vmk_LogUnregister().
4. Destroy heap via vmk_HeapDestroy().
5. Destroy memory pool via vmk_MemPoolDestroy().
6. Unregister module via vmk_ModuleUnregister().
Device Layer
How does a Native Driver claim its devices?
1. The PCI bus driver scans the PCI bus, detects PCI NICs, and produces PCI NIC device objects.
2. The Device Layer notifies the Device Manager of the device's existence; the Device Manager consults the PCI bus plugin to locate the driver.
3. The NIC driver registers with the Device Layer, providing callbacks to claim the PCI NIC device object.
4. The Device Manager binds the NIC driver module to the PCI NIC device object.
5. The Device Layer calls the NIC driver's AttachDevice callback: the NIC driver claims the PCI NIC device object and initializes the hardware.
6. The Device Layer calls the NIC driver's StartDevice callback: the NIC driver leaves the quiesced state.
7. The Device Layer calls the NIC driver's ScanDevice callback: the NIC driver produces the logical uplink device object.
8. The Device Layer notifies the Device Manager of the logical device's existence; the Device Manager consults the logical bus plugin to locate the driver and binds the uplink device to the uplink driver, and the attach, start, and scan callbacks are invoked for the uplink device.
9. The NIC driver registers uplink capabilities in the Uplink Registration callback.
10. The NIC driver can start RX, and the networking subsystem can start TX on this NIC.
Flow to claim the NIC and make it IO-able
[Sequence between the Device Layer, the NIC driver, and the networking subsystem:]
1. Device Layer → driver: vmk_DriverAttachDevice(vmk_PCIDevice); the driver initializes the HW for IO.
2. Device Layer → driver: vmk_DriverStartDevice().
3. Device Layer → driver: vmk_DriverScanDevice(); the driver calls vmk_DeviceRegister(vmk_DeviceProps, vmkDev, &uplinkDev) to create and register the uplinkDev.
4. Networking subsystem → driver: vmk_UplinkAssociate() to asynchronously notify the uplink for the device; the driver calls vmk_UplinkCapRegister() to register each capability.
5. Networking subsystem → driver: vmk_UplinkStartIO(); the driver (1) arms interrupts in the HW, (2) enables interrupts in the vmkernel, and (3) updates the uplink link status. The uplink is now ready for Tx/Rx processing.
Device Layer: DriverAttachDevice()
The attachDevice callback registered in vmk_DriverRegister() is invoked.
• The driver should start driving this device and get it ready for IO.
• If not capable of driving it, return an error and restore the device to its original state.
What is done in this routine?
1. Allocate memory for driver data structures.
2. Invoke vmk_DeviceGetRegistrationData() to get the PCI device handle.
3. Invoke vmk_PCIQueryDeviceID() to validate that the driver can support this device.
4. Invoke vmk_PCIQueryDeviceAddr() to get the PCI device address.
5. Create the DMA engine via vmk_DMAEngineCreate() with the right properties.
6. Map the BARs via vmk_PCIMapIOResource() calls.
7. Initialize the HW and ensure that it comes up fine, else error out.
8. Set up stats collection and other driver-specific state.
9. Allocate interrupt vectors via vmk_PCIAllocIntrCookie() (with typeVec, numVec).
10. Create the UplinkData – fill in the registration data ops and sharedData fields.
11. Do other controller setup and any other needed configuration.
12. Call vmk_DeviceSetAttachedDriverData() to associate drvPrivDataPtr with the vmk_Device handle.
Device Layer: DriverStartDevice()
Callback after a successful attachDevice: the device is not ready, i.e. not in an IO-able state, until this callback completes.
Puts the device in an IO-able state.
Can be invoked to place a device back in an IO-able state any time after vmk_DriverQuiesceDevice() has explicitly put the device in the quiesced state.
What does it do?
1. Get drvPrivDataPtr using vmk_DeviceGetAttachedDriverData().
2. Post Rx fragments for all the Rx queues it supports.
3. Register interrupts allocated during uplink shared data creation:
• Register interrupts via vmk_IntrRegister().
• Set affinity via vmk_NetPollInterruptSet().
4. Create any worker threads as worlds via vmk_WorldCreate().
Device Layer: DriverScanDevice()
Invoked at least once after a device has been attached to a driver.
May be invoked at other device hotplug events as appropriate.
New devices may be registered from this callback only.
Main steps:
1. Find the bus type of the PCI device via vmk_BusTypeFind().
2. Create the logical address via vmk_LogicalCreateBusAddress().
3. Register the device with the vmkernel via vmk_DeviceRegister(), passing in the vmk_DeviceProps structure (a hypothetical sketch follows):
typedef struct {
   vmk_Driver registeringDriver;
   vmk_DeviceID *deviceID;                 // VMK_UPLINK_DEVICE_IDENTIFIER
   vmk_DeviceOps *deviceOps;               // has callback .removeDevice
   vmk_AddrCookie registeringDriverData;   // holds drvPrivDataPtr
   vmk_AddrCookie registrationData;
} vmk_DeviceProps;
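A hypothetical ScanDevice handler populating vmk_DeviceProps and registering the logical uplink child, matching the vmk_DeviceRegister(vmk_DeviceProps, vmkDev, &uplinkDev) call shown in the earlier flow. The elxnet_* names, the adapter layout, and the vmk_DeviceGetAttachedDriverData() argument list are illustrative assumptions.
/* Hypothetical removeDevice callback for the logical uplink device. */
static vmk_DeviceOps elxnetUplinkDevOps = {
   .removeDevice = elxnet_removeUplinkDevice,
};

static VMK_ReturnStatus
elxnet_scanDevice(vmk_Driver driver, vmk_Device vmkDev)
{
   vmk_AddrCookie cookie;
   vmk_DeviceProps props;
   vmk_Device uplinkDev;
   elxnet_adapter *adapter;                            /* hypothetical per-device state */

   vmk_DeviceGetAttachedDriverData(vmkDev, &cookie);   /* argument list assumed */
   adapter = cookie.ptr;

   props.registeringDriver         = driver;
   props.deviceID                  = &adapter->uplinkDevID;    /* VMK_UPLINK_DEVICE_IDENTIFIER */
   props.deviceOps                 = &elxnetUplinkDevOps;
   props.registeringDriverData.ptr = adapter;                  /* drvPrivDataPtr */
   props.registrationData.ptr      = &adapter->uplinkRegData;  /* vmk_UplinkRegData */

   /* Register the logical uplink child (vmnicX) under the PCI NIC device. */
   return vmk_DeviceRegister(&props, vmkDev, &uplinkDev);
}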
Device Layer: DriverForgetDevice()
Notification callback from the vmkernel to indicate the device is no longer accessible.
The driver should no longer wait indefinitely on any device operation.
It must always return success for any subsequent device callbacks:
• vmk_DriverQuiesceDevice()
• vmk_DriverDetachDevice()
Case-specific callback: surprise removal only, not always called.
Device Layer: DriverQuiesceDevice()
This callback places the device in the quiesced state:
Prepare for operations like device removal, driver unload, or system shutdown.
This callback indicates that the driver should:
• Complete any IO on the device
• Flush any device caches to quiesce the device
Steps (reverse of StartDevice):
1. Get drvPrivDataPtr via vmk_DeviceGetAttachedDriverData().
2. Halt and destroy any worker threads created during StartDevice.
3. Handle all Tx completions.
4. Cleanup all Rx queues.
5. Unregister interrupts for all Rx queues:
• Invoke vmk_NetPollInterruptUnSet() to remove affinity.
• Invoke vmk_IntrUnregister() to unregister previously registered interrupt.
Device Layer: DriverDetachDevice()
This is another handler passed in during the vmk_DriverRegister() call.
• The driver should stop driving this device and release its resources.
• The driver should not touch the device after this.
Steps:
1. Get drvPrivDataPtr via vmk_DeviceGetAttachedDriverData().
2. Clean up all the resources allocated for your interface:
• Destroy any queues allocated
• Notify the HW that you are stopping all access
3. Clean up the UplinkData created and set up in DriverAttachDevice().
4. Release all interrupt vectors via vmk_PCIFreeIntrCookie().
5. Clean up any memory allocated for driver structures from the memory pool or heap.
6. Any other control-path cleanup, e.g. destroying spinlocks or semaphores.
7. Unmap BARs via vmk_PCIUnmapIOResource().
8. Destroy the created DMA engine via vmk_DMAEngineDestroy().
9. Free up and clean out any other resources allocated.
Logical Uplink Layer
Uplink Layer Major Data Structures
vmk_UplinkRegData – uplink registration data
• Driver is responsible for allocating and populating this structure
• A pointer to this struct is stored in vmk_DeviceProps->registrationData
vmk_UplinkOps – handlers for basic uplink operations
vmk_UplinkSharedData – data shared between the uplink layer and the NIC driver
• Allocated and initialized by the driver
• Driver readable and writable
• Uplink layer readable only
vmk_UplinkSharedQueueInfo – shared info for all queues between the uplink layer and the driver
vmk_UplinkSharedQueueData – shared data for a single queue
Uplink Layer: vmk_UplinkRegData
Driver associates the following registration data to the vmk_Device
when creating the logical uplink:
typedef struct vmk_UplinkRegData {
vmk_revnum apiRevision; // VMKAPI version
vmk_ModuleID moduleID; // module ID of NIC drv
vmk_UplinkOps ops;
vmk_UplinkSharedData *sharedData; // Runtime data shared
// b/w kernel & driver
vmk_AddrCookie driverData; // Driver context data
} vmk_UplinkRegData;
Uplink Layer: vmk_UplinkOps
Structure containing function pointers for required driver operations.
The functions are callbacks from the vmkernel into the NIC driver (an illustrative wiring sketch follows).
typedef struct vmk_UplinkOps {
   vmk_UplinkTxCB uplinkTx;                       // Tx packet list CB
   vmk_UplinkMTUSetCB uplinkMTUSet;               // modify MTU CB
   vmk_UplinkStateSetCB uplinkStateSet;           // modify state CB
   vmk_UplinkStatsGetCB uplinkStatsGet;           // get stats CB
   vmk_UplinkAssociateCB uplinkAssociate;         // notify drv about assoc uplink
   vmk_UplinkDisassociateCB uplinkDisassociate;   // notify drv of disassoc uplink
   vmk_UplinkCapEnableCB uplinkCapEnable;         // cap enable CB
   vmk_UplinkCapDisableCB uplinkCapDisable;       // cap disable CB
   vmk_UplinkStartIOCB uplinkStartIO;             // start IO CB
   vmk_UplinkQuiesceIOCB uplinkQuiesceIO;         // quiesce all IO CB
   vmk_UplinkResetCB uplinkReset;                 // reset issued uplink
} vmk_UplinkOps;
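As an illustration, a driver might wire these up as below; the elxnet_uplink* callback names and the adapter layout are hypothetical, and the VMKAPI_REVISION macro is the same assumption as in the earlier module sketch.
static vmk_UplinkOps elxnetUplinkOps = {
   .uplinkTx           = elxnet_uplinkTx,
   .uplinkMTUSet       = elxnet_uplinkMTUSet,
   .uplinkStateSet     = elxnet_uplinkStateSet,
   .uplinkStatsGet     = elxnet_uplinkStatsGet,
   .uplinkAssociate    = elxnet_uplinkAssociate,
   .uplinkDisassociate = elxnet_uplinkDisassociate,
   .uplinkCapEnable    = elxnet_uplinkCapEnable,
   .uplinkCapDisable   = elxnet_uplinkCapDisable,
   .uplinkStartIO      = elxnet_uplinkStartIO,
   .uplinkQuiesceIO    = elxnet_uplinkQuiesceIO,
   .uplinkReset        = elxnet_uplinkReset,
};

/* Filled in during DriverAttachDevice() and later pointed to by
 * vmk_DeviceProps->registrationData when the uplink device is registered. */
static void
elxnet_initUplinkRegData(elxnet_adapter *adapter)
{
   adapter->uplinkRegData.apiRevision    = VMKAPI_REVISION;   /* assumed macro */
   adapter->uplinkRegData.moduleID       = elxnetModuleID;    /* saved in init_module() */
   adapter->uplinkRegData.ops            = elxnetUplinkOps;
   adapter->uplinkRegData.sharedData     = &adapter->sharedData;
   adapter->uplinkRegData.driverData.ptr = adapter;
}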
Uplink Layer: vmk_UplinkSharedData
The vmk_UplinkRegData->sharedData points to a driver allocated data
structure shared between vmkernel and NIC driver:
typedef struct vmk_UplinkSharedData {
vmk_VersionedAtomic lock; // ensure snapshot consistency
vmk_UplinkFlags flags; // uplink flags
vmk_UplinkState state; // uplink state
vmk_LinkStatus link; // uplink link status
vmk_uint32 mtu; // uplink mtu
vmk_EthAddress macAddr; // current logical MAC
vmk_EthAddress hwMacAddr; // permanent HW MAC
vmk_UplinkSupportedMode *supportedModes;
vmk_uint32 supportedModesArraySz;
vmk_UplinkDriverInfo driverInfo; // driver info
vmk_UplinkSharedQueueInfo *queueInfo; // shared qinfo
} vmk_UplinkSharedData;
Uplink Layer: vmk_UplinkSharedQueueInfo
Defines uplink-level shared queue info for all queues.
For the queueData field below, drivers need to populate one queue even if they do not support multiple queues.
typedef struct vmk_UplinkSharedQueueInfo {
vmk_UplinkQueueType supportedQueueTypes;
vmk_UplinkQueueFilterClass supportedRxQueueFilterClasses;
vmk_UplinkQueueID defaultRxQueueID;
vmk_UplinkQueueID defaultTxQueueID;
vmk_uint32 maxRxQueues;
vmk_uint32 maxTxQueues;
vmk_uint32 activeRxQueues;
vmk_uint32 activeTxQueues;
vmk_BitVector *activeQueues;
vmk_uint32 maxTotalDeviceFilters;
vmk_UplinkSharedQueueData *queueData;
} vmk_UplinkSharedQueueInfo;
Uplink Layer: vmk_UplinkSharedQueueData
Contains all the info about one specific Tx or Rx queue.
This struct is shared with uplink layer.
typedef struct vmk_UplinkSharedQueueData {
volatile vmk_UplinkQueueFlags flags;
vmk_UplinkQueueType type;
vmk_UplinkQueueID qid;
volatile vmk_UplinkQueueState state;
vmk_UplinkQueueFeature supportedFeatures;
vmk_UplinkQueueFeature activeFeatures;
vmk_uint32 maxFilters;
vmk_uint32 activeFilters;
vmk_NetPoll poll; // associated netPoll context
vmk_DMAEngine dmaEngine; // associated dma engine
vmk_UplinkQueuePriority priority; // tx queue priority
vmk_UplinkCoalesceParams coalesceParams;
} vmk_UplinkSharedQueueData;
Creation of UplinkSharedData during DriverAttachDevice
Create/initialize the sharedData area:
• sharedData has a versioned atomic (not a spinlock)
• Uplink layer can only read from this area
• Driver can read/write to this area
• Driver needs to define its own spinlock for writer serialization
Shared data to populate (a hedged sketch follows):
1. Supported speed/duplex modes to be advertised to the uplink.
2. Current MTU setting, and link/speed/duplex states.
3. Queue info (numQ, supported queue types, supported filter classes).
4. Rx and Tx queue fields (flags, type, state, supportedFeatures, dmaEngine, maxFilters).
5. netPoll for each Rx queue via vmk_NetPollCreate().
6. Allocated default Rx and Tx queues (not yet activated).
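A hedged sketch of items 1-6: populate the shared data and the queue info, then serialize later writers with a driver-owned lock while bumping the versioned atomic so uplink-layer readers see a consistent snapshot. The adapter layout, enum values, vmk_Memcpy()/vmk_Spinlock* usage, and the versioned-atomic begin/end write helper names are assumptions.
static void
elxnet_initSharedData(elxnet_adapter *adapter)
{
   vmk_UplinkSharedData      *sd    = &adapter->sharedData;
   vmk_UplinkSharedQueueInfo *qInfo = &adapter->queueInfo;

   /* Items 1-2: advertised modes, MTU, MAC addresses. */
   sd->supportedModes        = adapter->supportedModes;
   sd->supportedModesArraySz = adapter->numSupportedModes;
   sd->mtu                   = adapter->mtu;
   vmk_Memcpy(sd->macAddr,   adapter->currentMac,   VMK_ETH_ADDR_LENGTH);
   vmk_Memcpy(sd->hwMacAddr, adapter->permanentMac, VMK_ETH_ADDR_LENGTH);

   /* Items 3-4: queue info and the (not yet activated) default queues. */
   qInfo->supportedQueueTypes = VMK_UPLINK_QUEUE_TYPE_RX | VMK_UPLINK_QUEUE_TYPE_TX;
   qInfo->maxRxQueues         = adapter->numRxQueues;
   qInfo->maxTxQueues         = adapter->numTxQueues;
   qInfo->queueData           = adapter->queueData;   /* items 5-6 fill these entries */
   sd->queueInfo              = qInfo;
}

/* Writer pattern for later runtime updates, e.g. link-state changes. */
static void
elxnet_updateLinkStatus(elxnet_adapter *adapter, vmk_LinkStatus *link)
{
   vmk_SpinlockLock(adapter->sharedDataLock);                /* driver-defined spinlock */
   vmk_VersionedAtomicBeginWrite(&adapter->sharedData.lock); /* assumed helper names */
   adapter->sharedData.link = *link;
   vmk_VersionedAtomicEndWrite(&adapter->sharedData.lock);
   vmk_SpinlockUnlock(adapter->sharedDataLock);
}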
Uplink Layer: uplinkStartIO() Callback
1. Arm the interrupts (link, multiQ, etc) in the HW.
2. Configure for VLAN filtering as needed
3. Change internal driver state to IO-able.
4. Set configured flow control.
5. Now, enable interrupts in vmkernel via vmk_IntrEnable().
6. Check for link status changes, update sharedData and invoke
vmk_UplinkUpdateLinkState() as needed.
Uplink Layer: uplinkQuiesceIO() Callback
1. Check if IO is already quiesced due to possible failures.
2. Disarm interrupts.
3. Disable netpoll via vmk_NetPollDisable() and vmk_NetPollFlushRx().
4. Mark link state as down via vmk_UplinkUpdateLinkState().
5. Stop all Tx queues
6. Sync all vectors via vmk_IntrSync().
7. Disable all vectors via vmk_IntrDisable().
8. Change internal driver state to quiesced.
Register NIC Capabilities with the Uplink Layer
Handled when uplinkAssociateCB() is invoked to associate the uplink with the device.
Call vmk_UplinkCapRegister() to register each capability.
Two capability types:
• No callbacks needed, e.g.:
VMK_UPLINK_CAP_IPV4_CSO
VMK_UPLINK_CAP_VLAN_RX_STRIP
• Capabilities that require callbacks, e.g.:
VMK_UPLINK_CAP_MULTI_QUEUE
VMK_UPLINK_CAP_COALESCE_PARAMS
Examples of Capabilities with Callbacks (a registration sketch follows)
Callback Ops for VMK_UPLINK_CAP_COALESCE_PARAMS:
typedef struct vmk_UplinkCoalesceParamsOps {
   vmk_UplinkCoalesceParamsGetCB getParams;
   vmk_UplinkCoalesceParamsSetCB setParams;
} vmk_UplinkCoalesceParamsOps;
Callback Ops for VMK_UPLINK_CAP_PRIV_STATS:
typedef struct vmk_UplinkPrivStatsOps {
   vmk_UplinkPrivStatsLengthGetCB privStatsLengthGet;
   vmk_UplinkPrivStatsGetCB privStatsGet;
} vmk_UplinkPrivStatsOps;
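A hypothetical uplinkAssociate callback showing both capability types being registered; the elxnet_* callback names and the exact vmk_UplinkCapRegister() argument types are assumptions.
static vmk_UplinkCoalesceParamsOps elxnetCoalesceOps = {
   .getParams = elxnet_coalesceParamsGet,
   .setParams = elxnet_coalesceParamsSet,
};

static VMK_ReturnStatus
elxnet_uplinkAssociate(vmk_AddrCookie driverData, vmk_Uplink uplink)
{
   elxnet_adapter *adapter = driverData.ptr;

   adapter->uplink = uplink;   /* remember the uplink handle for later use */

   /* Capability with no callbacks. */
   vmk_UplinkCapRegister(uplink, VMK_UPLINK_CAP_IPV4_CSO, NULL);
   /* Capability that requires callback ops. */
   vmk_UplinkCapRegister(uplink, VMK_UPLINK_CAP_COALESCE_PARAMS, &elxnetCoalesceOps);
   return VMK_OK;
}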
Interrupt/Netpoll Handling
Registering interrupts: vmk_IntrProps is populated and passed to the vmkernel in DriverStartDevice() (a sketch follows the struct below).
Driver Ack handler:
• Ack the interrupt to the HW if needed (INTx)
• Increment the interrupt counter
Driver ISR handler:
• Handle any queue notifications as needed
• Activate the netpoll for the particular queue via vmk_NetPollActivate()
Driver NetPoll callback handler:
• Handle any Tx, Rx, or Ctrl events
• If there is work left but the budget is exceeded, remain in poll mode and return VMK_TRUE
• If there is no more work, go back to interrupt mode and return VMK_FALSE
typedef struct vmk_IntrProps {
   vmk_Device device;
   vmk_Name deviceName;
   vmk_IntrAcknowledge acknowledgeInterrupt;   // driver ack handler
   vmk_IntrHandler handler;                    // driver ISR handler
   void *handlerData;
   vmk_uint64 attrs;
} vmk_IntrProps;
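A hypothetical sketch of populating vmk_IntrProps in DriverStartDevice() and of a netpoll callback honoring the poll/interrupt contract above; the handler names, the adapter layout, and the exact vmk_IntrRegister()/netpoll callback signatures are assumptions.
static void
elxnet_setupQueueIntrProps(elxnet_adapter *adapter, vmk_uint32 qIdx, vmk_IntrProps *props)
{
   props->device               = adapter->vmkDevice;
   props->acknowledgeInterrupt = elxnet_intrAck;       /* ack to HW if needed (INTx) */
   props->handler              = elxnet_intrHandler;   /* activates netpoll for the queue */
   props->handlerData          = &adapter->rxQueue[qIdx];
   props->attrs                = 0;
   vmk_NameInitialize(&props->deviceName, "elxnet");
   /* The props are then passed to vmk_IntrRegister() together with the interrupt
    * cookie from vmk_PCIAllocIntrCookie(); see the DDK for the exact signature. */
}

/* Netpoll callback: return VMK_TRUE to stay in poll mode, VMK_FALSE to re-arm interrupts. */
static vmk_Bool
elxnet_netPollCB(vmk_AddrCookie priv, vmk_uint32 budget)
{
   elxnet_rx_queue *rxq = priv.ptr;
   vmk_uint32 processed = elxnet_processCompletions(rxq, budget);   /* hypothetical helper */

   return (processed >= budget) ? VMK_TRUE : VMK_FALSE;
}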
Packet Management VMKAPIs in the Tx/Rx Path
Basic allocation, release, and field manipulation:
• vmk_PktAlloc()
• vmk_PktRelease()
• vmk_PktReleasePanic()
• vmk_PktFrameLenGet()
• vmk_PktFrameLenSet()
• vmk_PktTrim()
• vmk_PktPartialCopy()
SG handling:
• vmk_PktSgArrayGet()
• vmk_PktSgElemGet()
• vmk_PktFrameMappedPointerGet()
• vmk_PktIsBufDescWritable()
Processing the sent-down packet list:
• vmk_PktListIterStart()
• vmk_PktListIterIsAtEnd()
• vmk_PktListGetFirstPkt()
• vmk_PktListIterInsertPktBefore()
• vmk_PktListIterRemovePkt()
• vmk_PktListAppendPkt()
Packet Management VMKAPIs in the Tx/Rx Path (continued; a short Tx-path sketch follows these lists)
Parse/find the different layer headers:
• vmk_PktHeaderL2Find()
• vmk_PktHeaderL3Find()
• vmk_PktHeaderEntryGet()
• vmk_PktHeaderDataGet()
• vmk_PktHeaderDataRelease()
• vmk_PktHeaderLength()
Offload handling:
• vmk_PktIsMustCsum()
• vmk_PktSetCsumVfd()
• vmk_PktIsLargeTcpPacket()
• vmk_PktGetLargeTcpPacketMss()
VLAN handling:
• vmk_PktMustVlanTag()
• vmk_PktVlanIDGet()
• vmk_PktVlanIDSet()
• vmk_PktPriorityGet()
• vmk_PktPrioritySet()
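A hedged per-packet Tx sketch using a few of the APIs above; the elxnet_tx_desc descriptor layout is hypothetical, and the single-argument getter forms are assumptions based on the API names.
static void
elxnet_prepareTxDesc(vmk_PktHandle *pkt, elxnet_tx_desc *desc)
{
   desc->frameLen = vmk_PktFrameLenGet(pkt);

   /* Checksum offload: the stack requires HW checksumming for this frame. */
   if (vmk_PktIsMustCsum(pkt)) {
      desc->csumOffload = VMK_TRUE;
   }

   /* TSO: pull the MSS so the HW can segment the large TCP frame. */
   if (vmk_PktIsLargeTcpPacket(pkt)) {
      desc->tso = VMK_TRUE;
      desc->mss = vmk_PktGetLargeTcpPacketMss(pkt);
   }

   /* VLAN insertion: tag the frame with the packet's VLAN ID and priority. */
   if (vmk_PktMustVlanTag(pkt)) {
      desc->insertVlan = VMK_TRUE;
      desc->vlanId     = vmk_PktVlanIDGet(pkt);
      desc->priority   = vmk_PktPriorityGet(pkt);
   }
}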
Advanced Features
MultiQueue Handling
SR-IOV
VXLAN Offload
Dynamic Load Balancing
Multi-Queue Support
Register multi-queue support via VMK_UPLINK_CAP_MULTI_QUEUE.
The following callbacks are passed to the uplink layer when registering this capability (a registration sketch follows the struct):
typedef struct vmk_UplinkQueueOps {
vmk_UplinkQueueAllocCB queueAlloc;
vmk_UplinkQueueAllocWithAttrCB queueAllocWithAttr;
vmk_UplinkQueueReallocWithAttrCB queueReallocWithAttr;
vmk_UplinkQueueFreeCB queueFree;
vmk_UplinkQueueQuiesceCB queueQuiesce;
vmk_UplinkQueueStartCB queueStart;
vmk_UplinkQueueFilterApplyCB queueApplyFilter;
vmk_UplinkQueueFilterRemoveCB queueRemoveFilter;
vmk_UplinkQueueStatsGetCB queueGetStats;
vmk_UplinkQueueFeatureToggleCB queueToggleFeature;
vmk_UplinkQueueTxPrioritySetCB queueSetPriority;
vmk_UplinkQueueCoalesceParamsSetCB queueSetCoalesceParams;
} vmk_UplinkQueueOps;
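Registering the multi-queue capability then looks much like the earlier capability example: fill in vmk_UplinkQueueOps with the driver's handlers and pass it to vmk_UplinkCapRegister(). The elxnet_queue* names and the registration call's argument types are assumptions.
static vmk_UplinkQueueOps elxnetQueueOps = {
   .queueAlloc             = elxnet_queueAlloc,
   .queueAllocWithAttr     = elxnet_queueAllocWithAttr,
   .queueReallocWithAttr   = elxnet_queueReallocWithAttr,
   .queueFree              = elxnet_queueFree,
   .queueQuiesce           = elxnet_queueQuiesce,
   .queueStart             = elxnet_queueStart,
   .queueApplyFilter       = elxnet_queueApplyFilter,
   .queueRemoveFilter      = elxnet_queueRemoveFilter,
   .queueGetStats          = elxnet_queueGetStats,
   .queueToggleFeature     = elxnet_queueToggleFeature,
   .queueSetPriority       = elxnet_queueSetPriority,
   .queueSetCoalesceParams = elxnet_queueSetCoalesceParams,
};

/* Typically done in the uplinkAssociate callback alongside the other capabilities. */
static VMK_ReturnStatus
elxnet_registerMultiQueueCap(elxnet_adapter *adapter)
{
   return vmk_UplinkCapRegister(adapter->uplink, VMK_UPLINK_CAP_MULTI_QUEUE, &elxnetQueueOps);
}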
Multi-Queue VMKAPIs in the Tx/Rx Path
Refer to vmkapi_net_queue.h.
Main list of APIs for implementing multi-queue support:
• vmk_UplinkQueueMkFilterID()
• vmk_UplinkQueueMkTxQueueID()
• vmk_UplinkQueueMkRxQueueID()
• vmk_UplinkQueueIDVal()
• vmk_UplinkQueueIDType()
• vmk_UplinkQueueFilterIDVal()
• vmk_UplinkQueueIDUserVal()
• vmk_UplinkQueueSetQueueIDUserVal()
• vmk_UplinkQueueIDQueueDataIndex()
• vmk_UplinkQueueSetQueueIDQueueDataIndex()
• vmk_UplinkQueueGetNumQueuesSupported()
• vmk_UplinkQueueStart()
• vmk_UplinkQueueStop()
• vmk_PktQueueIDGet()
• vmk_PktQueueIDSet()
SR-IOV Support
Setting up VFs:
• During DriverAttachDevice(), if SR-IOV is supported by the device, enable VFs via vmk_PCIEnableVFs().
• During DriverScanDevice(), the driver registers its VFs via vmk_PCIRegisterVF(), passing along its .removeVF callback:
static vmk_PCIVFDeviceOps elxnetVFDevOps = {
   .removeVF = elxnet_removeVFDevice
};
• Set the control callback for a VF with the vmkernel via vmk_PCISetVFPrivateData().
Cleaning up VFs:
• The .removeVF callback registered during registration is called; vmk_PCIUnregisterVF() is invoked to unregister the particular VF from the vmkernel.
• DriverDetachDevice() should call vmk_PCIDisableVFs() to disable all its VFs.
Misc VF VMKAPIs:
• vmk_PCIGetVFPCIDevice() should be used during VF registration to get the vmk_PCIDevice handle of a PCI VF given its parent PF and VF index.
VXLAN Offload Support
Register the VXLAN offload capability via VMK_UPLINK_CAP_ENCAP_OFFLOAD.
Callback Ops for VMK_UPLINK_CAP_ENCAP_OFFLOAD:
typedef struct vmk_UplinkEncapOffloadOps {
   /** Handler used by vmkernel to notify VXLAN port number updated */
   vmk_UplinkVXLANPortUpdateCB vxlanPortUpdate;
} vmk_UplinkEncapOffloadOps;
If supporting the RX_VXLAN filter, indicate it in supportedRxQueueFilterClasses:
vmk_UplinkSharedQueueInfo->supportedRxQueueFilterClasses |= VMK_UPLINK_QUEUE_FILTER_CLASS_VXLAN;
Packet parser APIs to get information on the inner encapsulated headers:
• vmk_PktHeaderEncapFind()
• vmk_PktHeaderEncapL2Find()
• vmk_PktHeaderEncapL3Find()
• vmk_PktHeaderEncapL4Find()
Dynamic Load Balancing
New NetQ feature introduced in the ESXi 5.5 release:
• VMKNETDDI_QUEUEOPS_QUEUE_FEAT_DYNAMIC
NIC requirements to support this feature:
• The device must be able to support different NetQ "features" on any particular NetQ.
• Adding or removing a particular NetQ feature must not require any critical operations.
If the NIC driver registers DYNAMIC feature support, the load balancer can/will:
• Move filters between queues (i.e. bin-packing of filters), hence reducing the number of queues in use.
• Unpack filters onto more queues, either for latency-sensitive VMs or to reduce the burden on oversaturated queues.
Performance
Throughput in Gbps on a 16VM Configuration
[Chart: throughput is essentially identical for be2net and elxnet.]
          Tx (256B)   Rx (256B)   Tx (64KB)   Rx (64KB)
be2net      3.00        2.97        9.40        9.40
elxnet      3.03        3.02        9.41        9.40
Overall CPU Gains on a 16VM Configuration
          Tx CPU Util (256B)   Rx CPU Util (256B)   Tx CPU Util (64KB)   Rx CPU Util (64KB)
be2net         320.89               335.32               29.45                55.40
elxnet         282.56               307.15               29.34                52.04
[Chart annotations: roughly 12%, 8%, and 6% CPU savings with elxnet for Tx 256B, Rx 256B, and Rx 64KB respectively.]
Vmkernel Cost Savings on a 16VM Configuration
          Tx CPU Util (256B)   Rx CPU Util (256B)   Tx CPU Util (64KB)   Rx CPU Util (64KB)
be2net         137.92               132.75                8.06                26.17
elxnet          89.50                96.29                7.03                21.34
Savings          35%                  27%                  13%                  18%
Total Mean Ping Response Time (usec) on a 16VM Config
[Chart: mean ping response time for 128b, 256b, and 512b payloads, be2net vs. elxnet (roughly 116 to 134 usec); elxnet reduces response time by about 1%, 6%, and 8% across the three sizes.]
Getting Started on the Native Driver…
Go to https://developercenter.vmware.com/group/iovp/certs/5.5/dev-kits for:
1. The Native DDK Developer Guide
2. The needed toolchain RPMs:
vmware-esx-common-toolchain
vmware-esx-kmdk-psa-toolchain
3. The Vib-Suite RPM:
vmware-esx-vib-suite-5.5.0-0.0.xxxxxxx.i386.rpm
4. The VMKAPI DDK RPM:
vmware-esx-vmkapiddk-devtools-5.5.0-0.0.xxxxxxx.i386.rpm
Summary
A layered model approach with easy extensibility for new features:
• Overview of the native model
• Interaction of the driver with the different layers
• Basic structs and handler ops for the different layers
The native model does not use the vmklinux compatibility layer:
• A layer of indirection is completely removed
• Translations (e.g. pkt <-> skb) are avoided
o Allocation of skbs is not needed
o Savings from avoiding slab allocation (especially at high packet rates)
• The driver communicates directly with the various layers
• Performance boost through CPU savings
New IO features for ESXi will only be developed for the native model.
Questions?
Contact your VMware PM for more details on native model support and for the devkits.
Other VMware Activities Related to This Session
HOL: HOL-SDC-1302 – vSphere Distributed Switch from A to Z
Session: TEX4759
TAP Membership Renewal – Great Benefits
• TAP Access membership includes:
• New TAP Access NFR Bundle
• Access to NDA Roadmap sessions at VMworld, PEX and Onsite/Online
• VMware Solution Exchange (VSX) and Partner Locator listings
• VMware Ready logo (ISVs)
• Partner University and other resources in Partner Central
• TAP Elite includes all of the above plus:
• 5X the number of licenses in the NFR Bundle
• Unlimited product technical support
• 5 instances of SDK Support
• Services Software Solutions Bundle
• Annual Fees:
• TAP Access - $750
• TAP Elite - $7,500
• Send email to [email protected]
TAP Resources
TAP:
• TAP support: 1-866-524-4966
• Email: [email protected]
• Partner Central: http://www.vmware.com/partners/partners.html
TAP Team:
• Kristen Edwards – Sr. Alliance Program Manager
• Sheela Toor – Marketing Communication Manager
• Michael Thompson – Alliance Web Application Manager
• Audra Bowcutt –
• Ted Dunn –
• Dalene Bishop – Partner Enablement Manager, TAP
VMware Solution Exchange:
• Marketplace support –
• Partner Marketplace @ VMware booth pod TAP1
THANK YOU
ESXi Native Networking Driver Model - Delivering on
Simplicity and Performance
Margaret Petrus, VMware
TEX4759