VMworld 2013: Failsafe at PCIe Level: Enabling PCIe Hot Swap

Post on 13-Jul-2015


Failsafe at PCIe Level: Enabling PCIe Hot Swap

Wenchao Cui, VMware

Caixue Lin, VMware

TEX5316

#TEX5316


Disclaimer

This presentation may contain product features that are currently under development.

This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Technical feasibility and market demand will affect final delivery.

Pricing and packaging for any new technologies or features discussed or presented have not been determined.


Agenda

Background and overview

Behind the scene: Native driver & PCIe hot-plug

Case study: PCIe SSD driver with hot plug capability

Get involved

Challenges and future work

Conclusion

Q&A


Background

Benefits of adopting PCIe hot plug

• Extend hardware with no downtime

• Remove or replace hardware with no downtime

• Enable failsafe operation at the PCIe level

• PCIe SSDs are expected to be a major driving factor

SFF-8639


Background (Cont.)

Differences between PCIe hot plug and disk hot plug

Disk Hot Plug

• Events delivered to HBA

• PCI device still functional

• PCI config space intact

• Interrupts still active

• Impacts SCSI LUNs and Paths

• Minimal resource cleanup

PCIe Hot Plug

• Events delivered to PCI Bridge

• PCI device not functional

• PCI config space invalidated

• Interrupts lost

• Impacts SCSI LUNs, Paths, and VMHBAs

• Full resource cleanup
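The "PCI config space invalidated" point can be illustrated with a small presence check. This is a minimal sketch, not VMkernel code: `FakeConfigSpace` and `device_present` are hypothetical names, and the vendor and device IDs are arbitrary. It relies only on the general PCI behavior that config reads to an absent function return all ones, so the Vendor ID register at offset 0x00 reads back as 0xFFFF.

```python
# Hypothetical sketch: detecting that a PCIe function has gone away.
# Config reads addressed to an absent function return all ones, so the
# Vendor ID register (offset 0x00) reads back as 0xFFFF.
INVALID_VENDOR_ID = 0xFFFF
VENDOR_ID_OFFSET = 0x00
DEVICE_ID_OFFSET = 0x02


class FakeConfigSpace:
    """Stand-in for a real config-space accessor (illustration only)."""

    def __init__(self, vendor_id, device_id):
        self._regs = {VENDOR_ID_OFFSET: vendor_id, DEVICE_ID_OFFSET: device_id}
        self.present = True  # flip to False to simulate a hot unplug

    def read16(self, offset):
        if not self.present:
            return 0xFFFF  # nothing answers on the bus any more
        return self._regs.get(offset, 0)


def device_present(cfg):
    """Return True while the function still answers config reads."""
    return cfg.read16(VENDOR_ID_OFFSET) != INVALID_VENDOR_ID


cfg = FakeConfigSpace(vendor_id=0x1234, device_id=0x5678)  # arbitrary IDs
print(device_present(cfg))   # device is plugged in
cfg.present = False          # simulate surprise removal
print(device_present(cfg))   # config space no longer valid
```

A driver that sees this condition should stop touching the hardware and wait for the removal notification rather than retrying register accesses.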


Overview: What’s New - Native Driver Model

New vSphere Driver Architecture with Native VMkernel API

Reference PCIe SSD Driver

Development Kits and Tools


Behind the Scene

Native Driver Components

Device model

Device lifecycle management

Hot plug in the picture

Hot unplug impact to storage I/O

Quick summary


Native Driver Model Overview (Big Picture)

[Figure: block diagram of the native driver model across three layers: User World, VMkernel, and Hardware. Components shown: Device Manager (in the user world and in the VMkernel), Native Drivers, VMKLinux and a VMKLinux Driver, PSA Support, Network Support, PCI Bus Support, Storage Stack, Network Stack, PCI Subsystem, a PCI Bridge, and three PCI Devices. Legend: existing components, modified components, new components.]


Native Driver Model Components

Device Manager

• Manages device and driver mapping

• Maintains device tree and driver bindings

• Delivers events to drivers

Logical PCI bus driver

• Claims PCI bridges

• Monitors PCI bus changes

• Registers PCI device nodes

Logical function driver

• Registers function objects to the IO stack

Device driver

• Claims and operates devices

[Figure: the same block diagram, with the components above highlighted. Layers: User World, VMkernel, Hardware. Components shown: Device Layer in UW, Device Layer in Kernel, Native Driver, PSA Support, Network Support, PCI Bus Support, HBA Support, Storage Stack, Network Stack, PCI Subsystem, a PCI Bridge, and three PCI Devices.]
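The division of labor among these components can be sketched in a few lines. This is an illustrative model only, assuming a much simpler design than the real device layer: the class names, the `matches` predicate, and the `"attach"` event string are all hypothetical and are not VMkernel API names. It shows a device manager that keeps a device tree, binds each newly registered device node to the first driver that claims it, and delivers an event to that driver.

```python
# Hypothetical model of a device manager: device tree, driver bindings,
# and event delivery. None of these names come from the VMkernel API.
class Device:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.driver = None  # set when a driver claims the device
        if parent is not None:
            parent.children.append(self)


class Driver:
    def __init__(self, name, matches):
        self.name = name
        self.matches = matches  # predicate: Device -> bool
        self.events = []        # events delivered by the device manager

    def notify(self, event, device):
        self.events.append((event, device.name))


class DeviceManager:
    def __init__(self):
        self.root = Device("root")
        self.drivers = []

    def register_driver(self, driver):
        self.drivers.append(driver)

    def register_device(self, name, parent):
        """Add a node to the device tree and try to bind a driver to it."""
        device = Device(name, parent)
        for driver in self.drivers:
            if driver.matches(device):
                device.driver = driver
                driver.notify("attach", device)
                break
        return device


manager = DeviceManager()
bus_driver = Driver("pci_bus", lambda d: d.name.startswith("bridge"))
ssd_driver = Driver("pcie_ssd", lambda d: d.name.startswith("ssd"))
manager.register_driver(bus_driver)
manager.register_driver(ssd_driver)

bridge = manager.register_device("bridge0", manager.root)  # claimed by bus driver
ssd = manager.register_device("ssd0", bridge)              # child node under the bridge
print(bridge.driver.name, ssd.driver.name)
```

In this sketch the bus driver's job of registering child nodes is played by the caller; in a real system the logical PCI bus driver would call into the device layer when it sees a new function behind the bridge.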


Behind the Scene: Device Model

[Figure: the device model inside the vmkernel, showing the I/O subsystems, the Device Manager, the device layer, device and driver objects, and drivers. Legend: physical device, logical device, relationship.]


Behind the Scene: Device Lifecycle Management

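Device states

• Unregistered

• Registered, Unclaimed

• Claimed, Quiesced

• IO-able

Operations

• Register Device: Unregistered → Registered, Unclaimed

• Attach Device: Registered, Unclaimed → Claimed, Quiesced

• Start Device: Claimed, Quiesced → IO-able

• Scan Device

• Quiesce Device: IO-able → Claimed, Quiesced

• Detach Device: Claimed, Quiesced → Registered, Unclaimed

• Unregister Device: Registered, Unclaimed → Unregistered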
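The lifecycle on this slide reads naturally as a small state machine. The sketch below is one way to encode it, using the state and operation names from the slide; the `DeviceLifecycle` class and the transition table are hypothetical, and treating Scan Device as an operation that is valid only in the IO-able state and leaves the state unchanged is an assumption, since the slide text does not say where it applies.

```python
from enum import Enum


class State(Enum):
    UNREGISTERED = "Unregistered"
    REGISTERED_UNCLAIMED = "Registered, Unclaimed"
    CLAIMED_QUIESCED = "Claimed, Quiesced"
    IO_ABLE = "IO-able"


# (operation, current state) -> next state
TRANSITIONS = {
    ("register", State.UNREGISTERED): State.REGISTERED_UNCLAIMED,
    ("attach", State.REGISTERED_UNCLAIMED): State.CLAIMED_QUIESCED,
    ("start", State.CLAIMED_QUIESCED): State.IO_ABLE,
    # Assumption: scanning is allowed only while IO-able and keeps the state.
    ("scan", State.IO_ABLE): State.IO_ABLE,
    ("quiesce", State.IO_ABLE): State.CLAIMED_QUIESCED,
    ("detach", State.CLAIMED_QUIESCED): State.REGISTERED_UNCLAIMED,
    ("unregister", State.REGISTERED_UNCLAIMED): State.UNREGISTERED,
}


class DeviceLifecycle:
    """Tracks one device's state and rejects out-of-order operations."""

    def __init__(self):
        self.state = State.UNREGISTERED

    def apply(self, operation):
        key = (operation, self.state)
        if key not in TRANSITIONS:
            raise ValueError(
                f"'{operation}' is not valid in state '{self.state.value}'"
            )
        self.state = TRANSITIONS[key]
        return self.state


device = DeviceLifecycle()
for op in ("register", "attach", "start", "scan"):   # bring-up path
    device.apply(op)
print(device.state.value)
for op in ("quiesce", "detach", "unregister"):       # teardown path
    device.apply(op)
print(device.state.value)
```

The teardown path mirrors the bring-up path in reverse, which is why a driver that allocates resources in attach and start can release them symmetrically in quiesce and detach.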


PCIe Device Hot Add

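[Figure: PCIe device hot add sequence inside the vmkernel, numbered steps 1 through 6, involving the PCI Bridge and PCI Bus Driver, the Device Manager, the I/O subsystems, device and driver objects, and drivers. Legend: physical device, logical device, relationship.]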


PCIe Device Hot Remove

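[Figure: PCIe device hot remove sequence inside the vmkernel, numbered steps 1 through 7, involving the PCI Bridge and PCI Bus Driver, the Device Manager, the I/O subsystems, device and driver objects, and drivers. Two steps are labeled “Forget Device” and two later steps are labeled “Quiesce/Detach”. Legend: physical device, logical device, relationship.]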


Device Unplug Impact to Storage I/O

Permanent Device Loss (PDL)

• All LUNs of the removed HBA are considered PDL

• Limitations of PDL apply

• Not all of the storage stack supports PDL

• End user needs to ensure all open handles are closed for full cleanup

• Device removal might fail in the middle due to PDL limitations

Driver’s attention required

• Prepare for MMIO read/write failures

• Prepare for I/O disruption

• Return proper codes for all outstanding SCSI commands

• Manage resources according to device life cycle
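The requirement to return proper codes for all outstanding SCSI commands can be sketched as a drain step on removal. This is a simplified model under stated assumptions: `Adapter`, `Command`, and the status strings are hypothetical, not VMkernel or SCSI definitions. The point it illustrates is that every in-flight command must be completed with an error exactly once, and that new submissions after removal must fail immediately instead of queuing against hardware that is gone.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical completion statuses (not real SCSI or VMkernel codes).
STATUS_OK = "OK"
STATUS_NO_CONNECT = "NO_CONNECT"


@dataclass
class Command:
    tag: int
    status: Optional[str] = None  # None while the command is in flight


class Adapter:
    """Tracks in-flight commands for one hot-pluggable device."""

    def __init__(self):
        self.outstanding = {}
        self.removed = False

    def submit(self, cmd):
        if self.removed:
            # Fail fast: the hardware can no longer service I/O.
            cmd.status = STATUS_NO_CONNECT
            return cmd
        self.outstanding[cmd.tag] = cmd
        return cmd

    def complete(self, tag):
        """Normal completion path, driven by the device."""
        cmd = self.outstanding.pop(tag)
        cmd.status = STATUS_OK
        return cmd

    def on_device_removed(self):
        """Complete every in-flight command with an error, exactly once."""
        self.removed = True
        failed = list(self.outstanding.values())
        self.outstanding.clear()
        for cmd in failed:
            cmd.status = STATUS_NO_CONNECT
        return failed


adapter = Adapter()
first = adapter.submit(Command(tag=1))
second = adapter.submit(Command(tag=2))
adapter.complete(1)                       # finished before the unplug
failed = adapter.on_device_removed()      # tag 2 was still in flight
late = adapter.submit(Command(tag=3))     # arrives after removal
print(first.status, [c.tag for c in failed], late.status)
```

Clearing the outstanding table before marking commands failed keeps the drain idempotent: a second removal notification finds nothing left to complete.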


Quick Summary

New VMkernel driver architecture

• Native VMkernel API

• Device tree

• Device lifecycle management

With hot plug/unplug awareness

• Device layer coordinates driver load, attach, and detach

• Parent/bus driver monitors device insertion and removal

• Surprise device removal supported via “Forget Device” notification
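A surprise removal can be modeled as two passes over the removed device's subtree. The sketch below is an assumption-laden illustration: the slides name the “Forget Device” notification and the quiesce and detach steps, but the traversal order used here (notify top-down, then tear down from the leaves up) and all of the names (`Node`, `hot_remove`, the event strings) are hypothetical.

```python
# Hypothetical sketch of surprise-removal handling over a device subtree.
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        if parent is not None:
            parent.children.append(self)


def subtree_top_down(node):
    """Yield a node followed by all of its descendants (pre-order)."""
    yield node
    for child in node.children:
        yield from subtree_top_down(child)


def hot_remove(removed, log):
    """Record the notifications sent when `removed` disappears."""
    nodes = list(subtree_top_down(removed))
    # Pass 1: tell every driver in the subtree the hardware is gone,
    # so none of them touches registers during teardown.
    for node in nodes:
        log.append(("forget", node.name))
    # Pass 2: tear down children before parents (assumed ordering).
    for node in reversed(nodes):
        log.append(("quiesce", node.name))
        log.append(("detach", node.name))
    return log


ssd = Node("ssd0")                 # physical PCIe function
adapter = Node("vmhba", ssd)       # logical device registered by the driver
events = hot_remove(ssd, [])
for event in events:
    print(event)
```

Sending the forget notification first is what distinguishes surprise removal from an orderly one: quiesce and detach handlers can then skip any step that would require talking to the device.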


Case Study: PCIe SSD with Hot Plug



Get Involved

Program

Implementation and Testing

Certification


Get Involved: Program

IO Vendor Program

Developer Center


Get Involved: Development and Testing

Acquire VMKAPI DDK

Review Docs and Sample Code

Develop Drivers

Test with CLI Tools and Manual Cases


Get Involved: Certification

Certification for PCIe hot-plug (In Progress)

• IOVP certification (for testing driver/device functions)

• Hot-plug tests are manual and must be requested on demand

• HCL listing does not include PCIe hot plug feature yet


Challenges and Future Work

Challenges

• Device tree and device life cycle management

• PCI Config Space re-initialization

• I/O disruption handling

Future work

• Development/validation of wider range of devices

• Device manager enhancements

• I/O stack enhancements

• Certification/HCL listing support for hot swappable devices and drivers


Conclusion

Key Takeaways

• New driver architecture with PCIe hot-plug awareness

• Driver development considerations

• Program enrollment and certification


Q&A


TAP Membership Renewal – Great Benefits

• TAP Access membership includes:

• New TAP Access NFR Bundle

• Access to NDA Roadmap sessions at VMworld, PEX and Onsite/Online

• VMware Solution Exchange (VSX) and Partner Locator listings

• VMware Ready logo (ISVs)

• Partner University and other resources in Partner Central

• TAP Elite includes all of the above plus:

• 5X the number of licenses in the NFR Bundle

• Unlimited product technical support

• 5 instances of SDK Support

• Services Software Solutions Bundle

• Annual Fees

• TAP Access - $750

• TAP Elite - $7,500

• Send email to tapalliance@vmware.com


TAP Resources

TAP

• TAP support: 1-866-524-4966

• Email: tapalliance@vmware.com

• Partner Central: http://www.vmware.com/partners/partners.html

TAP Team

• Kristen Edwards – Sr. Alliance Program Manager

• Sheela Toor – Marketing Communication Manager

• Michael Thompson – Alliance Web Application Manager

• Audra Bowcutt –

• Ted Dunn –

• Dalene Bishop – Partner Enablement Manager, TAP

VMware Solution Exchange

• Marketplace support – vsxalliance@vmware.com

• Partner Marketplace @ VMware booth pod TAP1

THANK YOU
