Transcript of "Fast I/O for XEN using RDMA Technologies", Voltaire, April 2005.

Page 1: Title slide

Voltaire, The Grid Interconnect Company

Fast I/O for XEN using RDMA Technologies

April 2005

Yaron Haviv, Voltaire, CTO

yaronh@voltaire.com

Page 2: The Enterprise Grid Model and Virtualization



                      Storage                       Compute      Network
Hardware              Disks, Arrays                 Servers      Switches/Routers and Ports
Virtual Hardware      LUNs, Logical Volumes         VMs          VLANs
Virtual Environment   Network/Cluster File System   Middleware   Layer 3-7 services

VMs need to interact efficiently with the rest of the virtual/grid world to be successful.

Agenda: ways for I/O virtualization with minimal performance hit.

Page 3: The I/O Virtualization Model



Virtual I/O enables virtual environments and complements XEN.

[Diagram: a logical topology of virtual servers, virtual storage (SAN), virtual networks, and virtual routers, mapped onto a physical topology: management hosts and server blades on an InfiniBand fabric, a multi-service switch with GbE links to remote clients and networks, and RAIDs attached over FC, iSCSI, or RDMA.]

Page 4: XEN Over InfiniBand Benefits



Today (GbE NIC plus FC HBA, software-based network and storage switching through VM0):

- Slower I/O
- No isolation
- Multiple cards and fabrics

With InfiniBand (one IB HCA, direct HW access from each VM):

- Direct HW access for I/O
- One 10 Gb/s card for network, storage, and IPC
- Scale out using clustering (IPC)

[Diagram: left, XEN with VM0-VM3 funneling traffic through software switching onto a GbE NIC and FC HBA; right, XEN with VM0-VM3 each driving network, storage, and IPC channels directly on a single IB HCA.]

Better performance, isolation, and TCO with InfiniBand.

Page 5: XEN Networking Today



The problems:

- Context switches
- A single shared receive/send queue (I/O dependencies and starvation)
- Configuration/security challenges (multiple VLANs, routing, etc.)

[Diagram: VM1-VM3 reach the NIC through a Linux bridge in VM0, with the hypervisor mediating each transfer.]

The solution:

- Dedicate a send/receive queue per virtual NIC
- Bypass the hypervisor and VM0 using memory-mapped I/O (separate memory/I/O pages per VM)
- Do packet switching and QoS in hardware
- And still make sure the VMs are not bound to specific hardware
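The first two points can be sketched in user space with the OpenIB verbs API (shown here with libibverbs' current naming): each consumer gets its own registered memory, completion queue, and queue pair, so the HCA can only DMA into registered buffers and no queue is shared. In the proposed design the back-end would create these objects per VM; the standalone program below is a minimal illustration, not the XEN implementation, and error handling is trimmed.

```c
/* Minimal libibverbs sketch: one dedicated QP + CQ per consumer,
 * with DMA protected by explicit memory registration. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no IB device\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);     /* scopes this consumer's memory */

    /* Register a buffer: the HCA may only DMA into memory registered here. */
    void *buf = malloc(4096);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096, IBV_ACCESS_LOCAL_WRITE);

    /* Dedicated completion queue and queue pair: nothing is shared, so one
     * consumer's backlog cannot starve another (cf. the problems above). */
    struct ibv_cq *cq = ibv_create_cq(ctx, 64, NULL, NULL, 0);
    struct ibv_qp_init_attr attr = {
        .send_cq = cq, .recv_cq = cq,
        .cap = { .max_send_wr = 32, .max_recv_wr = 32,
                 .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_UD,                 /* datagram, like the vNIC case */
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    printf("QP %u ready\n", qp->qp_num);

    ibv_destroy_qp(qp); ibv_destroy_cq(cq);
    ibv_dereg_mr(mr); free(buf);
    ibv_dealloc_pd(pd); ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```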

Page 6: XEN Networking With Direct HW Access



- Slow path handled in VM0
- Direct access (DA) from the guest for data-path operations (with safe DMA through memory registration)
- Can expose a generic API (e.g. datagram send/receive), loaded and used by the Ethernet front-end driver
- Direct-access code needs to be exported from VM0 to guests (like PXE/UNDI)
- Can be used by different hardware/NIC providers (not just IB), keeping the guest hardware-neutral
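To make "a generic API (e.g. datagram send/receive)" concrete, here is one possible shape for the interface VM0 would export to the guest's Ethernet front-end driver. All names are hypothetical; the slide deliberately leaves the API undefined.

```c
/* Hypothetical direct-access (DA) datagram API; not an existing XEN or
 * OpenIB interface. */
#include <stddef.h>

struct da_queue;                 /* opaque per-guest queue handle */

struct da_netif_ops {
    /* Slow path, handled in VM0: allocate per-guest queues and map the
     * doorbell/queue pages into the guest (separate mem/IO pages per VM). */
    int (*open)(unsigned int guest_id, struct da_queue **q);

    /* Fast path, called by the guest with no VM0 or hypervisor transition.
     * Buffers must sit in guest memory pre-registered for safe DMA. */
    int (*send)(struct da_queue *q, const void *frame, size_t len);
    int (*recv)(struct da_queue *q, void *frame, size_t maxlen);

    int (*close)(struct da_queue *q);
};
```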

[Diagram: VM1-VM3 initialize through VM0 (slow path) and then drive the IB HCA directly on the data path, bypassing the hypervisor.]

Page 7: iSCSI and iSER Driver Architecture


[Diagram: the iSCSI-Linux open-source project (in OpenIB). The SCSI low layer sits on an iSCSI driver that talks through the Datamover API either to TCP (optionally via a TOE) over a plain NIC, or to iSER over kDAPL on an RNIC or IB HCA, with a network interface (Ethernet/IPoIB) and iSNS/SLP management alongside.]

The Datamover Architecture (DA) and iSER are optimal for XEN:

- One driver and one management path support many different cases
- Centrally managed and provisioned through the iSCSI infrastructure (iSNS/SLP)

The same architecture applies to NICs, RNICs (Ethernet/RDMA), and InfiniBand, so a virtual machine can migrate from IB to Ethernet and keep working.
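As a sketch of what the Datamover API boundary looks like, consider the transport-operations table below. Struct and member names are illustrative, in the spirit of the DA specification, not the actual open-iscsi symbols.

```c
/* Rough sketch of the Datamover interface between the common iSCSI layer
 * and a transport (TCP or iSER). */
#include <stddef.h>

struct iscsi_conn;               /* per-connection state (illustrative) */
struct iscsi_task;               /* per-command state (illustrative) */

struct datamover_ops {
    /* TCP connect, or RDMA connection establishment for iSER. */
    int  (*connect)(struct iscsi_conn *conn, const char *portal);

    /* Send a control PDU (login, NOP, task management, ...). */
    int  (*send_pdu)(struct iscsi_conn *conn, const void *pdu, size_t len);

    /* Move SCSI data. On TCP this builds Data-In/Data-Out PDUs; on iSER
     * the target drives RDMA Read/Write against pre-registered buffers,
     * so the initiator side mostly registers memory and posts the command. */
    int  (*xmit_task)(struct iscsi_task *task);

    void (*teardown)(struct iscsi_conn *conn);
};
```

Because the common iSCSI layer only sees these operations, swapping iSER for TCP (or back) is invisible to everything above the Datamover API, which is what makes the IB-to-Ethernet migration on this slide workable.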

Page 8: Path Resolution in Heterogeneous Environments



- Automatic discovery of storage
- Automatic selection of transport (iSER or TCP)
- The same standard management across the entire solution, including IP, IB, and FC storage
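The transport-selection rule amounts to a few lines; in the sketch below, rdma_path_exists() is a hypothetical stand-in for whatever path-resolution query the management layer provides, not a real API.

```c
/* Illustrative transport pick: use iSER when an RDMA path to the target
 * portal exists, otherwise fall back to plain TCP. */
enum xport { XPORT_TCP, XPORT_ISER };

extern int rdma_path_exists(const char *target_ip);   /* hypothetical */

enum xport pick_transport(const char *target_ip)
{
    /* E.g. a native IB RAID resolves to an RDMA path and gets iSER,
     * while a target reachable only over GbE gets TCP. */
    return rdma_path_exists(target_ip) ? XPORT_ISER : XPORT_TCP;
}
```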

[Diagram: an example fabric with an iSER server exposing TCP, iSER, and IPoIB interfaces; a GbE router, GbE switch, and GbE-to-IB gateway bridging to an IB switch; a TCP/GbE iSCSI target; a native IB RAID; and an iSER-to-FC gateway translating to FCP. Example addresses shown include 192.168.3.11, 192.168.3.44, 192.168.3.33, 192.168.3.10, 192.168.3.55, 192.168.1.10, and 192.168.10.77.]

Page 9: XEN Block Storage With Direct Access



- Similar design to DA networking
- Needs a definition for a direct-access API (e.g. send/receive sector)
- The storage front-end driver can detect and use the direct-access API
- On InfiniBand this maps to separate QPs and CQs per VM; the same model can apply to other hardware
- Migration is handled by iSCSI recovery mechanisms (rediscover and reconnect)
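One possible detect-and-use flow in the front-end driver is sketched below; since the slide calls for this API to still be defined, every name here (da_blk_ops, da_queue, xenbus_da_probe, use_shared_ring_path) is hypothetical.

```c
/* Hypothetical probe: use the direct-access sector API when VM0 exports
 * one, otherwise stay on the classic shared-ring path. */
#include <stdint.h>

struct da_queue;    /* opaque per-VM queue (maps to a QP/CQ pair on IB) */

struct da_blk_ops {
    int (*read_sectors)(struct da_queue *q, uint64_t lba,
                        unsigned int count, void *buf);
    int (*write_sectors)(struct da_queue *q, uint64_t lba,
                         unsigned int count, const void *buf);
};

extern const struct da_blk_ops *xenbus_da_probe(const char *devclass);
extern void use_shared_ring_path(void);

static const struct da_blk_ops *blk_ops;

void blkfront_setup(void)
{
    blk_ops = xenbus_da_probe("vbd");    /* NULL if no DA provider */
    if (!blk_ops)
        use_shared_ring_path();          /* fall back to front/back rings */
}
```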

[Diagram: same structure as page 6, with VM1-VM3 initializing through VM0 (slow path) and driving the IB HCA directly on the block data path.]

Page 10: Using XEN For Scale Out



Low-latency and zero-copy IPC is critical to clustering.

New front-end drivers are needed for IPC:

- A socket/TOE abstraction (e.g. a BSD SOCK_STREAM), usable between VMs
- An RDMA abstraction (e.g. DAPL/IT-API)
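The value of the socket abstraction is that ordinary SOCK_STREAM code keeps working while the transport underneath changes: a plain client like the one below could run over a TOE or an SDP-style sockets-over-RDMA layer between VMs without modification.

```c
/* Standard SOCK_STREAM client; the transport is chosen below the API. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

int connect_peer(const char *ip, unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    inet_pton(AF_INET, ip, &sa.sin_addr);
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;    /* ready for send()/recv() regardless of the fabric */
}
```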

[Diagram: three stages. Left, a traditional OS running applications across four CPUs. Middle, scale out with XEN: n VMs per host, with VM1-VM3 on a hypervisor/VM0. Right, scale out with VMs and clustering: VM1/1 and VM1/2 on separate hosts joined by cluster middleware.]

Page 11: IPC and Direct User Access to HW



Recommend mapping to an industry standard such as DAPL:

- Allows existing RDMA clients (MPI, Oracle, NFS/RDMA, iSER) to work transparently
- Maps rather well to IB and Ethernet/RDMA

There are open issues around migration; it will initially require application awareness (rediscover and reconnect on failure).

This opens XEN to new real-time, HPC, media, and transactional markets.
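For a feel of what the DAPL mapping gives consumers, here is roughly how a user-space client brings up a uDAPL endpoint. Calls follow the DAT Collaborative's uDAPL 1.x API; the provider name "ib0" and the queue depths are assumptions, and error handling is omitted for brevity.

```c
/* Sketch: opening a uDAPL interface adapter and creating an endpoint. */
#include <dat/udat.h>

int open_endpoint(void)
{
    DAT_IA_HANDLE  ia;
    DAT_EVD_HANDLE async_evd = DAT_HANDLE_NULL, dto_evd, conn_evd;
    DAT_PZ_HANDLE  pz;
    DAT_EP_HANDLE  ep;

    /* Open the interface adapter: the RDMA device, IB or Ethernet/RDMA.
     * The provider name "ib0" is illustrative. */
    dat_ia_open("ib0", 8, &async_evd, &ia);

    /* Protection zone: scopes memory registrations to this consumer. */
    dat_pz_create(ia, &pz);

    /* Event dispatchers for data-transfer and connection events. */
    dat_evd_create(ia, 64, DAT_HANDLE_NULL, DAT_EVD_DTO_FLAG, &dto_evd);
    dat_evd_create(ia, 8, DAT_HANDLE_NULL, DAT_EVD_CONNECTION_FLAG, &conn_evd);

    /* The endpoint: what MPI, iSER, etc. drive for RDMA transfers. */
    dat_ep_create(ia, pz, dto_evd, dto_evd, conn_evd, NULL, &ep);
    return 0;
}
```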

[Diagram: as on pages 6 and 9, with VM1-VM3 initializing through VM0 (slow path) and user processes driving the IB HCA directly on the data path.]

Page 12: Summary



InfiniBand and RDMA can solve the I/O problem TODAY:

- Dedicated I/O per VM
- Direct access from VM to hardware through a channel architecture
- Protected DMA
- The same card can provide virtual network, storage, and IPC (with the best performance in each category)

We can build a single infrastructure in XEN for:

- Plain Ethernet, InfiniBand, and Ethernet/RDMA
- Migration between heterogeneous I/O hardware

We welcome people to participate, in an open-source environment.

Page 13: Call for Action



Establish a direct-access I/O working group:

- Open discussion on architecture and implementation alternatives
- Looking for volunteers to help code the missing pieces

Incorporate the required changes in OpenIB (in the Linux kernel).

Incorporate the work into distros (SuSE, RH, ...).

Work with CPU vendors to optimize memory and interrupt sharing mechanisms.