Xen & RDMA · 2011-02-28 · Voltaire The Grid Interconnect Company Fast I/O for XEN using RDMA...
Voltaire: The Grid Interconnect Company
Fast I/O for XEN using RDMA Technologies
April 2005
Yaron Haviv, CTO, Voltaire
© 2005 Voltaire, Inc. April 13, 2005 2
The Enterprise Grid Model and Virtualization
[Diagram: the virtualization stack.
  Hardware: disks/arrays; servers; switches/routers and ports.
  Virtual hardware: LUNs and logical volumes; VMs; VLANs.
  Virtual environment: network/cluster file system; middleware; Layer 3-7 services.]
VMs need to interact efficiently with the rest of the virtual/grid world to be successful
Agenda: ways to virtualize I/O (storage, compute, network) with minimal performance hit
The I/O Virtualization Model
Virtual I/O enables virtual environments and complements XEN
[Diagram: logical vs. physical topology.
  Logical topology: virtual servers, virtual storage, virtual network, and virtual routers attached to a SAN.
  Physical topology: management servers and blades connect through an InfiniBand multi-service switch; GbE links reach remote clients and networks; RAIDs attach via FC, iSCSI, or RDMA.]
XEN Over InfiniBand Benefits
[Diagram: today vs. XEN over InfiniBand.
  Today (per-fabric GbE NIC and FC HCA): slower I/O; no isolation; multiple cards and fabrics; software-based network and storage switching in VM0.
  With InfiniBand (one IB HCA): direct HW access for I/O from each VM (network, storage, and IPC); one 10 Gb/s card for network, storage, and IPC; scale out using clustering (IPC).]
Better performance, isolation, and TCO with InfiniBand
XEN Networking Today
The problems:
- Context switches
- Same receive/send queue for all VMs (I/O dependencies and starvation)
- Configuration/security challenges (multiple VLANs, routing, etc.)
[Diagram: VM1-VM3 send traffic through a Linux bridge in VM0, over the hypervisor, to a single NIC.]
The solution:
- Dedicate a send/receive queue per virtual NIC
- Bypass the hypervisor and VM0 using memory-mapped I/O (separate memory/IO pages per VM)
- Packet switching and QoS in hardware
- And still ensure the VMs are not bound to specific hardware
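The per-virtual-NIC queue idea above can be sketched in plain C. This is a minimal illustration, not the actual driver code: the ring layout, names, and the polling-based "doorbell" are all assumptions made for the sketch. The point is that each VM owns its own producer index in a memory-mapped page, so posting a send requires no hypervisor or VM0 context switch, and the device can schedule rings independently.

```c
#include <stdint.h>

/* Hypothetical per-VM send ring, one per virtual NIC, so VMs cannot
 * starve each other on a shared queue (all names are illustrative). */
#define RING_SIZE 16

struct send_desc {
    uint64_t guest_addr;   /* guest-physical buffer address */
    uint32_t length;
    uint32_t flags;
};

struct vnic_ring {
    struct send_desc desc[RING_SIZE];
    uint32_t head;          /* producer index, written by the guest   */
    uint32_t tail;          /* consumer index, advanced by the device */
};

/* Guest side: post a descriptor directly into its own memory-mapped
 * ring, with no hypervisor involvement. Returns 0 on success. */
int vnic_post_send(struct vnic_ring *r, uint64_t addr, uint32_t len)
{
    uint32_t next = (r->head + 1) % RING_SIZE;
    if (next == r->tail)
        return -1;                      /* ring full */
    r->desc[r->head].guest_addr = addr;
    r->desc[r->head].length = len;
    r->desc[r->head].flags = 0;
    r->head = next;                     /* "doorbell": device sees new head */
    return 0;
}

/* Device side: consume one descriptor, as a HW scheduler applying
 * per-ring QoS would, instead of software bridging in VM0. */
int vnic_consume(struct vnic_ring *r, struct send_desc *out)
{
    if (r->tail == r->head)
        return -1;                      /* ring empty */
    *out = r->desc[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    return 0;
}
```

A real implementation would additionally validate `guest_addr` against registered memory, which is where the protected-DMA requirement comes in.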
XEN Networking With Direct HW Access
Slow path (initialization) is done in VM0
Allow direct access (DA) from the guest for data-path operations (with safe DMA through memory registration)
Can use a generic API (e.g. datagram send/receive), loaded and used by the Ethernet front-end driver
Direct-access code needs to be exported from VM0 to the guests (like PXE/UNDI)
Can be used by different hardware/NIC providers (not just IB), keeping the guest hardware-neutral
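The exported, hardware-neutral API described above could take the shape of an ops table that VM0 hands to the guest front-end, in the spirit of PXE/UNDI. The sketch below is purely illustrative: `struct da_ops` and the loopback backend are invented for this example, standing in for vendor data-path code behind the same interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical HW-neutral direct-access ops table, exported by VM0 to
 * the guest front-end driver; any vendor backend (IB HCA, RNIC, or
 * plain NIC) can fill it in. */
struct da_ops {
    int (*send)(void *ctx, const void *buf, size_t len);
    int (*recv)(void *ctx, void *buf, size_t len);
};

/* Illustrative single-datagram loopback backend. */
static uint8_t lb_buf[2048];
static size_t  lb_len;

static int lb_send(void *ctx, const void *buf, size_t len)
{
    (void)ctx;
    if (len > sizeof(lb_buf))
        return -1;
    memcpy(lb_buf, buf, len);
    lb_len = len;
    return 0;
}

static int lb_recv(void *ctx, void *buf, size_t len)
{
    (void)ctx;
    if (lb_len == 0 || len < lb_len)
        return -1;
    memcpy(buf, lb_buf, lb_len);
    return (int)lb_len;
}

/* The Ethernet front-end only ever sees struct da_ops, so the guest
 * image stays hardware-neutral and can move between providers. */
static const struct da_ops loopback_ops = { lb_send, lb_recv };
```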
[Diagram: VM0 handles initialization (slow path); VM1-VM3 drive the data path directly to the IB HCA.]
iSCSI and iSER Driver Architecture
[Diagram: the iSCSI-Linux open-source project in OpenIB. The iSCSI layer sits under the SCSI low layer and above the Datamover API, which dispatches to TCP, a TOE, or iSER (over kDAPL); the network interface (Ethernet/IPoIB) runs over a NIC, RNIC, or IB HCA; iSNS/SLP provides management.]
The Datamover Architecture (DA) and iSER are optimal for XEN
- One driver and one management path support many different cases
- Centrally managed and provisioned through iSCSI infrastructure (iSNS/SLP)
The same architecture applies to NICs, RNICs (Ethernet/RDMA), and InfiniBand
A virtual machine can migrate from IB to Ethernet and keep working
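The Datamover split can be pictured as a per-connection dispatch: the iSCSI core stays unchanged while discovery selects an iSER or TCP datamover underneath it. This is a hedged sketch; `dm_select` and the transmit stubs are invented names, not the real Datamover API.

```c
#include <stddef.h>

/* Hypothetical Datamover dispatch: one iSCSI core, two transports. */
enum dm_transport { DM_TCP, DM_ISER };

struct datamover {
    enum dm_transport kind;
    int (*xmit_pdu)(const void *pdu, size_t len);
};

/* Stubs standing in for the real TCP and iSER (RDMA) data paths;
 * both just report the length they would have transmitted. */
static int tcp_xmit(const void *pdu, size_t len)  { (void)pdu; return (int)len; }
static int iser_xmit(const void *pdu, size_t len) { (void)pdu; return (int)len; }

/* Select the datamover the way discovery would: prefer RDMA when the
 * resolved path supports it, fall back to plain TCP otherwise. This
 * is what lets a VM migrate from IB to Ethernet and keep working. */
struct datamover dm_select(int path_supports_rdma)
{
    struct datamover dm;
    if (path_supports_rdma) { dm.kind = DM_ISER; dm.xmit_pdu = iser_xmit; }
    else                    { dm.kind = DM_TCP;  dm.xmit_pdu = tcp_xmit;  }
    return dm;
}
```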
Path Resolution in Heterogeneous Environments
Automatic discovery of storage
Automatic selection of Transport (iSER or TCP)
Same standard management across the entire solution, including IP, IB, and FC storage
[Diagram: an initiator at 192.168.3.11 resolves paths across a GbE router, a GbE switch, and an IB switch, with a GbE-to-IB gateway at 192.168.10.77. Interface options: TCP, iSER, or IPoIB. Targets: a TCP/GbE server at 192.168.3.33, an iSER server at 192.168.3.44, a native IB RAID at 192.168.3.55 (TCP or iSER), and FC storage at 192.168.1.10 behind an iSER-to-FC gateway at 192.168.3.10, reached over FCP.]
XEN Block Storage With Direct Access
Similar design to DA networking
Needs a definition of the direct-access API (e.g. send/receive sector)
The storage front-end driver can detect and use the direct-access API
On InfiniBand this maps to separate QPs and CQs per VM; the same model can apply to other HW
Migration is handled by iSCSI recovery mechanisms (rediscover and reconnect)
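A sector-granular direct-access API like the one proposed above might look as follows. This is a sketch under stated assumptions: `struct blk_da_ops` is an invented name, and the in-memory backend merely stands in for a per-VM QP/CQ pair on the HCA. The front-end would probe for such an ops table and fall back to the para-virtual path when it is absent.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical direct-access block API at sector granularity. */
#define SECTOR_SIZE 512
#define NUM_SECTORS 8

struct blk_da_ops {
    int (*read_sector)(uint64_t lba, void *buf);
    int (*write_sector)(uint64_t lba, const void *buf);
};

/* Illustrative in-memory backend standing in for the real data path. */
static uint8_t disk[NUM_SECTORS][SECTOR_SIZE];

static int mem_read(uint64_t lba, void *buf)
{
    if (lba >= NUM_SECTORS)
        return -1;                      /* out of range */
    memcpy(buf, disk[lba], SECTOR_SIZE);
    return 0;
}

static int mem_write(uint64_t lba, const void *buf)
{
    if (lba >= NUM_SECTORS)
        return -1;
    memcpy(disk[lba], buf, SECTOR_SIZE);
    return 0;
}

static const struct blk_da_ops mem_da = { mem_read, mem_write };
```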
[Diagram: as in the networking case, VM0 handles initialization (slow path) while VM1-VM3 drive the data path directly to the IB HCA.]
Using XEN For Scale Out
Low-latency, zero-copy IPC is critical to clustering
New front-end drivers are needed for IPC:
- Socket/TOE abstraction (e.g. a BSD SOCK_STREAM), usable between VMs
- RDMA abstraction (e.g. DAPL/IT)
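The SOCK_STREAM-style inter-VM abstraction could be backed by a shared ring between co-resident VMs, with the RDMA fabric taking over when the peers are on different hosts. The sketch below is an assumption-laden illustration (the layout and names are invented): a byte ring over a shared page giving ordered, reliable stream semantics with short reads/writes, like a socket.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared-page byte stream between two co-resident VMs. */
#define STREAM_CAP 4096

struct vm_stream {
    uint8_t  data[STREAM_CAP];
    uint32_t rd, wr;              /* reader and writer indices */
};

/* Producer VM: returns bytes accepted, like a short socket write. */
size_t stream_write(struct vm_stream *s, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    size_t n = 0;
    while (n < len && (s->wr + 1) % STREAM_CAP != s->rd) {
        s->data[s->wr] = p[n++];
        s->wr = (s->wr + 1) % STREAM_CAP;
    }
    return n;
}

/* Consumer VM: returns bytes read, 0 when the ring is empty. */
size_t stream_read(struct vm_stream *s, void *buf, size_t len)
{
    uint8_t *p = buf;
    size_t n = 0;
    while (n < len && s->rd != s->wr) {
        p[n++] = s->data[s->rd];
        s->rd = (s->rd + 1) % STREAM_CAP;
    }
    return n;
}
```

Because the front-end exposes only stream semantics, the same code path could transparently ride the IB fabric for zero-copy transfers when the peer VM lives on another host.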
[Diagram: three deployment models. A traditional OS runs applications on one multi-CPU host; XEN runs n VMs per host over a hypervisor/VM0; scale-out spreads VMs (VM1/1, VM1/2, ...) across hosts under cluster middleware.]
IPC and Direct User Access to HW
Recommend mapping to an industry standard such as DAPL
- Allows existing RDMA clients such as MPI, Oracle, NFS/RDMA, and iSER to work transparently
- Maps rather well to IB and Ethernet/RDMA
Open issues around migration; will initially require application awareness (rediscover and reconnect on failure)
Opens XEN to new real-time, HPC, media, and transactional markets
[Diagram: as before, VM0 handles initialization (slow path) while VM1-VM3 access the IB HCA data path directly.]
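The interim answer to the migration problem, rediscover and reconnect on failure, amounts to a retry loop in the application or its middleware. The sketch below illustrates only the control flow; `fake_connect` and the retry count are placeholders, not a real DAPL or iSCSI API.

```c
/* Illustrative rediscover-and-reconnect loop: until live migration of
 * RDMA state is solved, the application retries after a move. */
enum { MAX_RETRIES = 3 };

/* Stand-in for discovery plus connection setup: fails twice (as if
 * the old path is gone after migration), then succeeds. */
static int attempts;
static int fake_connect(void)
{
    return ++attempts < 3 ? -1 : 0;
}

/* Returns the zero-based attempt on which the reconnect succeeded,
 * or -1 after exhausting the retries. */
int reconnect_after_migration(void)
{
    for (int i = 0; i < MAX_RETRIES; i++) {
        if (fake_connect() == 0)
            return i;
        /* rediscover the path/target here, then retry */
    }
    return -1;
}
```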
Summary
InfiniBand and RDMA can solve the I/O problem TODAY
- Dedicated I/O per VM
- Direct access from VM to Hardware through channel architecture
- Protected DMA
- Same card can provide virtual network, storage and IPC (and provide the best performance in each category)
Can build a single infrastructure in XEN for:
- Plain Ethernet, InfiniBand and Ethernet/RDMA
- Allow migration between heterogeneous I/O hardware
Everyone is welcome to participate, in an open-source environment
Call for Action
Establish a direct-access I/O working group
- Open discussion on architecture and implementation alternatives
- Looking for volunteers to help code missing pieces
Incorporate required changes in OpenIB (in Linux Kernel)
Incorporate the work into distros (SuSE, RH, ..)
Work with CPU vendors to optimize memory and interrupt sharing mechanisms