High Performance Communication for Oracle using InfiniBand


Transcript of High Performance Communication for Oracle using InfiniBand

Page 1: High Performance Communication for Oracle using InfiniBand
Page 2: High Performance Communication for Oracle using InfiniBand

High Performance Communication for Oracle using InfiniBand

Ross Schibler, CTO

Topspin Communications, Inc.

Session id: #36568

Peter Ogilvie, Principal Member of Technical Staff

Oracle Corporation

Page 3: High Performance Communication for Oracle using InfiniBand

Session Topics
– Why the Interest in InfiniBand Clusters
– InfiniBand Technical Primer
– Performance
– Oracle 10g InfiniBand Support
– Implementation details

Page 4: High Performance Communication for Oracle using InfiniBand

Why the Interest in InfiniBand

InfiniBand is a key new feature in Oracle 10g

Enhances price/performance and scalability; simplifies systems

InfiniBand fits the broad movement towards lower costs: horizontal scalability, converged networks, system virtualization... grid

Initial DB performance and scalability data is superb: network tests are done; application-level benchmarks are now in progress

InfiniBand is a widely supported standard, available today: Oracle, Dell, HP, IBM, Network Appliance, Sun and ~100 others are involved

A tight alliance between Oracle and Topspin enables IB for 10g: integrated and tested; delivers the complete Oracle “wish list” for high-speed interconnects

Page 5: High Performance Communication for Oracle using InfiniBand

[Chart: Server Revenue Mix — share of revenues by price band ($0-2.9K through $3M+) for 1996, 2001 and 2002, with callouts of 23%, 39%, and 43% across the Entry, Mid, and High-End segments. Source: IDC Server Tracker, 12/2002]

System Transition Presents Opportunity

Major shift to standard systems; the blade impact is not even factored in yet. Customers benefit from scaling horizontally across standard systems:

– Lower up-front costs, granular scalability, high availability

Page 6: High Performance Communication for Oracle using InfiniBand

The Near Future

[Chart: Server Revenue Mix by price band, with the market splitting into Scale-Out (database clusters & grids, enterprise apps, web services) and Scale-Up (legacy & big-iron apps)]

Market splits around Scale-Up vs. Scale-Out

Database grids provide the foundation for scale-out

InfiniBand switched computing interconnects are a critical enabler

Page 7: High Performance Communication for Oracle using InfiniBand

Traditional RAC Cluster

[Diagram: Application Servers connect to the Oracle RAC nodes over Gigabit Ethernet; the RAC nodes connect to Shared Storage over Fibre Channel]

Page 8: High Performance Communication for Oracle using InfiniBand

Three Pain Points

[Diagram: the traditional RAC cluster with "OUCH!" markers at the cluster interconnect, the application-tier link, and the storage link]

– Scalability within the database tier is limited by interconnect latency, bandwidth, and overhead (Gigabit Ethernet)

– Throughput between the application tier and the database tier is limited by interconnect bandwidth and overhead

– I/O requirements are driven by the number of servers instead of by application performance requirements (Fibre Channel)

Page 9: High Performance Communication for Oracle using InfiniBand

Clustering with Topspin InfiniBand

[Diagram: Application Servers, Oracle RAC nodes, and Shared Storage all attach to the Topspin InfiniBand fabric]

Page 10: High Performance Communication for Oracle using InfiniBand

Removes all Three Bottlenecks

[Diagram: Application Servers, Oracle RAC, and Shared Storage converged on InfiniBand]

– Central server-to-storage I/O scalability through the InfiniBand switch removes I/O bottlenecks to storage and provides smoother scalability

– InfiniBand provides a 10 Gigabit, low-latency interconnect for the cluster

– The application tier can run over InfiniBand, benefiting from the same high throughput and low latency as the cluster

Page 11: High Performance Communication for Oracle using InfiniBand

Example Cluster with Converged I/O

Ethernet to InfiniBand gateway for LAN access
– Four Gigabit Ethernet ports per gateway
– Creates a virtual Ethernet pipe to each server

Fibre Channel to InfiniBand gateway for storage access
– Two 2Gbps Fibre Channel ports per gateway
– Creates a 10Gbps virtual storage pipe to each server

InfiniBand switches for the cluster interconnect
– Twelve 10Gbps InfiniBand ports per switch card
– Up to 72 total ports with optional modules
– A single fat pipe to each server for all network traffic

[Diagram: industry-standard servers, storage, and networks converged on the InfiniBand fabric]

Page 12: High Performance Communication for Oracle using InfiniBand

Topspin InfiniBand Cluster Solution

Ethernet or Fibre Channel gateway modules

Integrated System and Subnet management

Family of switches

Host Channel Adapter With Upper Layer Protocols

Protocols: uDAPL, SDP, SRP, IPoIB

Platform Support

Linux: Redhat, Redhat AS, SuSE

Solaris: S10

Windows: Win2k & 2003

Processors: Xeon, Itanium, Opteron

Cluster Interconnect with Gateways for I/O Virtualization

Page 13: High Performance Communication for Oracle using InfiniBand

InfiniBand Primer

InfiniBand is a new technology used to interconnect servers, storage and networks together within the datacenter

Runs over copper cables (<17m) or fiber optics (<10km)

Scalable interconnect:
– 1X = 2.5Gb/s
– 4X = 10Gb/s
– 12X = 30Gb/s

Page 14: High Performance Communication for Oracle using InfiniBand

InfiniBand Nomenclature

[Diagram: servers (CPUs, memory controller, system memory, host interconnect) attach to the fabric through an HCA; IB links run to a switch with a Subnet Manager (SM); TCAs on the Topspin 360/90 bridge IB links to Ethernet links (network) and FC links (storage)]

Page 15: High Performance Communication for Oracle using InfiniBand

InfiniBand Nomenclature

[Diagram: same topology as the previous slide, highlighting the HCA in the server and the SM and TCAs in the fabric]

HCA – Host Channel Adapter
SM – Subnet Manager
TCA – Target Channel Adapter

Page 16: High Performance Communication for Oracle using InfiniBand

Kernel Bypass Model

[Diagram: in the traditional path, the application's sockets calls go through the kernel TCP/IP transport and driver to reach the hardware; with kernel bypass, the sockets layer (async sockets) maps onto SDP and uDAPL is used directly, so user-space requests reach the hardware without entering the kernel]

Page 17: High Performance Communication for Oracle using InfiniBand

Copy on Receive

[Diagram: with a conventional NIC, received data lands in an OS buffer in system memory and is then copied into the application buffer; data traverses the host bus three times]

Page 18: High Performance Communication for Oracle using InfiniBand

With RDMA and OS Bypass

[Diagram: the HCA places received data directly into the application buffer in system memory; data traverses the bus once, saving CPU and memory cycles]
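To make the zero-copy path concrete, here is a minimal sketch of an RDMA write posted through the uDAPL 1.x API (the interface the later slides say RAC IPC uses on InfiniBand). It assumes an already-connected endpoint and that the peer's RMR context and target address were exchanged out of band; the struct field names follow common uDAPL 1.x headers and may differ slightly between vendor stacks.

#include <dat/udat.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hedged sketch: 'ep' is an already-connected endpoint; the peer's RMR
 * context and target virtual address were exchanged out of band. */
void rdma_write_example(DAT_IA_HANDLE ia, DAT_PZ_HANDLE pz,
                        DAT_EP_HANDLE ep, DAT_EVD_HANDLE dto_evd,
                        DAT_RMR_CONTEXT peer_rmr, DAT_VADDR peer_addr)
{
    enum { BUF_LEN = 64 * 1024 };
    void *buf = malloc(BUF_LEN);

    /* Register the local buffer once so the HCA can DMA from it directly. */
    DAT_LMR_HANDLE   lmr;
    DAT_LMR_CONTEXT  lmr_ctx;
    DAT_RMR_CONTEXT  rmr_ctx;
    DAT_VLEN         reg_len;
    DAT_VADDR        reg_addr;
    DAT_REGION_DESCRIPTION region;
    region.for_va = buf;
    dat_lmr_create(ia, DAT_MEM_TYPE_VIRTUAL, region, BUF_LEN, pz,
                   DAT_MEM_PRIV_ALL_FLAG,
                   &lmr, &lmr_ctx, &rmr_ctx, &reg_len, &reg_addr);

    /* Describe the local and remote sides of the transfer. */
    DAT_LMR_TRIPLET local_iov;
    memset(&local_iov, 0, sizeof(local_iov));
    local_iov.lmr_context     = lmr_ctx;
    local_iov.virtual_address = (DAT_VADDR)(uintptr_t)buf;
    local_iov.segment_length  = BUF_LEN;

    DAT_RMR_TRIPLET remote_iov;
    memset(&remote_iov, 0, sizeof(remote_iov));
    remote_iov.rmr_context    = peer_rmr;
    remote_iov.target_address = peer_addr;   /* field name per uDAPL 1.x */
    remote_iov.segment_length = BUF_LEN;

    DAT_DTO_COOKIE cookie;
    cookie.as_64 = 1;

    /* Post the RDMA write: the HCA moves the data with no OS buffer copy. */
    dat_ep_post_rdma_write(ep, 1, &local_iov, cookie, &remote_iov,
                           DAT_COMPLETION_DEFAULT_FLAG);

    /* Reap the completion from the data-transfer event dispatcher. */
    DAT_EVENT event;
    DAT_COUNT nmore;
    dat_evd_wait(dto_evd, DAT_TIMEOUT_INFINITE, 1, &event, &nmore);
    printf("RDMA write complete\n");
}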

Page 19: High Performance Communication for Oracle using InfiniBand

APIs and Performance

[Chart: application throughput by API and fabric — BSD Sockets (with the async I/O extension) over TCP/IP on 1 Gigabit Ethernet versus IPoIB, SDP, and uDAPL/RDMA on 10Gb InfiniBand; the rates shown on the slide are 0.8, 1.2, 3.2, 6.4, and 6.4 Gb/s]
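Because SDP keeps BSD sockets semantics, streams code like OracleNet does not have to change to move from TCP over GigE to SDP over InfiniBand. The sketch below is ordinary sockets code with a placeholder address and port; redirecting it onto SDP (for example through a vendor socket switch, or a preload shim such as libsdp in later open-source IB stacks) happens underneath the API and is an assumption about a particular stack, not something this presentation specifies.

/* Plain BSD-sockets client: unchanged whether the transport underneath is
 * TCP/IP over GigE, IPoIB, or SDP over InfiniBand. Host/port are placeholders. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);   /* an SDP shim can redirect this */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(1521);              /* example listener port */
    inet_pton(AF_INET, "192.168.0.10", &addr.sin_addr);  /* placeholder address */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    const char msg[] = "hello over sockets";
    send(fd, msg, sizeof(msg), 0);              /* same call either way */

    char reply[256];
    ssize_t n = recv(fd, reply, sizeof(reply), 0);
    if (n > 0) printf("got %zd bytes back\n", n);

    close(fd);
    return 0;
}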

Page 20: High Performance Communication for Oracle using InfiniBand

Why SDP for OracleNet & uDAPL for RAC?

RAC IPC
– Message based
– Latency sensitive
– Mixture of previous APIs
– Hence the use of uDAPL

OracleNet
– Streams based
– Bandwidth intensive
– Previously written to sockets
– Hence the use of the Sockets Direct Protocol (SDP) API

Page 21: High Performance Communication for Oracle using InfiniBand

InfiniBand Cluster Performance Benefits

Source: Oracle Corporation and Topspin on dual Xeon processor nodes

[Chart: Network-level cluster performance for Oracle RAC — block transfers/sec (16KB) for a 2-node and a 4-node cluster, InfiniBand vs. GigE, y-axis 0 to 30,000]

InfiniBand delivers 2-3X higher block transfers/sec compared to GigE

Page 22: High Performance Communication for Oracle using InfiniBand

InfiniBand Application to Database Performance Benefits

InfiniBand delivers 30-40% lower CPU utilization and 100% higher throughput as compared to Gigabit Ethernet

Source: Oracle Corporation and Topspin

[Chart: CPU utilization and throughput (percent), InfiniBand vs. GigE, y-axis 0 to 250]

Page 23: High Performance Communication for Oracle using InfiniBand

Broad Scope of InfiniBand Benefits

[Diagram: InfiniBand spanning the application servers, Oracle RAC, the network, and shared storage — an Ethernet gateway to the LAN, an FC gateway (host/LUN mapping) to SAN storage, DAFS over IB to NAS, OracleNet over SDP over IB between the app servers and the database, intra-RAC IPC over uDAPL over IB, and sniffer servers for monitoring/analysis. Benefit callouts on the slide: 20% improvement in throughput; 2x improvement in throughput and 45% less CPU; 3-4x improvement in block updates/sec; 30% improvement in DB performance]

Page 24: High Performance Communication for Oracle using InfiniBand

uDAPL Optimization Timeline

[Diagram: optimization layers — database workload, Cache Fusion, LM, skgxp, CM, uDAPL, IB HW/FW]

– Sept 2002: uDAPL functional with 6Gb/s throughput

– Dec 2002: Oracle interconnect performance released, showing improvements in bandwidth (3x), latency (10x) and CPU reduction (3x)

– Jan 2003: added Topspin CM for improved scaling of the number of connections and reduced setup times

– Feb 2003: Cache Block Updates show a fourfold performance improvement in a 4-node RAC

– April-August 2003: gathering OAST and industry-standard workload performance metrics; fine tuning and optimization at the skgxp, uDAPL and IB layers

Page 25: High Performance Communication for Oracle using InfiniBand

RAC Cluster Communication

High speed communication is key
– It must be faster to fetch a block from a remote cache than to read the block from disk
– Scalability is a function of communication CPU overhead

Two primary Oracle consumers
– Lock manager / Oracle buffer cache
– Inter-instance parallel query communication

SKGXP – Oracle’s IPC driver interface
– Oracle is coded to skgxp
– skgxp is coded to vendor high-performance interfaces
– IB support is delivered as a shared library, libskgxp10.so (the sketch below shows the general runtime-loading pattern)
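The shared-library delivery means the database executable does not link the interconnect driver statically; the vendor implementation of the skgxp interface is picked up at runtime. The sketch below shows only the generic dlopen pattern; the entry-point name and signature are hypothetical illustrations, not Oracle's actual skgxp interface.

/* Generic runtime-loading pattern for a vendor IPC driver library.
 * "libskgxp10.so" is the library named on the slide; the symbol name and
 * signature below are hypothetical, for illustration only. */
#include <dlfcn.h>
#include <stdio.h>

typedef int (*ipc_init_fn)(void);

int main(void)
{
    void *lib = dlopen("libskgxp10.so", RTLD_NOW | RTLD_GLOBAL);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Hypothetical entry point -- real skgxp symbols are Oracle-internal. */
    ipc_init_fn init = (ipc_init_fn)dlsym(lib, "example_ipc_init");
    if (init)
        init();
    else
        fprintf(stderr, "symbol not found: %s\n", dlerror());

    dlclose(lib);
    return 0;
}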

Page 26: High Performance Communication for Oracle using InfiniBand

Cache Fusion Communication

[Diagram: a shadow process on one node sends a lock request to the LMS process on the node holding the block; the block moves cache-to-cache via RDMA, and the shadow process returns results to the client]

Page 27: High Performance Communication for Oracle using InfiniBand

Parallel Query Communication

[Diagram: PX servers on different instances exchange control messages and stream data to one another; results flow back to the client]

Page 28: High Performance Communication for Oracle using InfiniBand

Cluster Interconnect Wish List

– OS bypass (user-mode communication)
– Protocol offload
– Efficient asynchronous communication model
– RDMA with high bandwidth and low latency
– Huge memory registrations for Oracle buffer caches
– Support for a large number of processes in an instance
– Commodity hardware
– Software interfaces based on open standards
– Cross-platform availability

InfiniBand is the first interconnect to meet all of these requirements

Page 29: High Performance Communication for Oracle using InfiniBand

Asynchronous Communication

Benefits
– Reduces the impact of latency
– Improves robustness by avoiding communication deadlock
– Increases bandwidth utilization

Drawback
– Historically costly, as synchronous operations are broken into separate submit and reap operations

Page 30: High Performance Communication for Oracle using InfiniBand

Protocol Offload & OS Bypass

Bypass makes submit cheap
– Requests are queued directly to the hardware from Oracle

Offload
– Completions move from the hardware to Oracle’s memory
– Oracle can overlap communication and computation without a trap to the OS or a context switch
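A minimal sketch of the submit-and-reap pattern described on the two slides above, again using uDAPL 1.x names: the send is queued directly to the HCA and returns immediately, the process keeps computing, and the completion is later reaped from an event dispatcher without a kernel trap. The connected endpoint, registered send buffer, and event dispatcher are assumed to already exist.

#include <dat/udat.h>

/* Hedged sketch: 'ep' is a connected endpoint, 'iov' describes an
 * already-registered send buffer, 'dto_evd' is the DTO event dispatcher. */
void submit_and_reap(DAT_EP_HANDLE ep, DAT_LMR_TRIPLET iov,
                     DAT_EVD_HANDLE dto_evd)
{
    DAT_DTO_COOKIE cookie;
    cookie.as_64 = 42;

    /* Submit: the request goes straight to the HCA; no syscall, no copy. */
    dat_ep_post_send(ep, 1, &iov, cookie, DAT_COMPLETION_DEFAULT_FLAG);

    /* Overlap: keep computing while the HCA moves the data. */
    /* do_useful_work(); */

    /* Reap: try a non-blocking dequeue first, then block if nothing is ready. */
    DAT_EVENT event;
    DAT_COUNT nmore;
    if (dat_evd_dequeue(dto_evd, &event) != DAT_SUCCESS)
        dat_evd_wait(dto_evd, DAT_TIMEOUT_INFINITE, 1, &event, &nmore);
}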

Page 31: High Performance Communication for Oracle using InfiniBand

InfiniBand Benefits by Stress Area

Stress Area: Benefit

– Cluster Network: extremely low latency; 10 Gig throughput

– Compute: CPU & kernel offload removes TCP overhead; frees CPU cycles

– Server I/O: single converged 10 Gig network for cluster, storage, and LAN; central I/O scalability

Stress level varies over time with each query. InfiniBand provides substantial benefits in all three areas.

Page 32: High Performance Communication for Oracle using InfiniBand

Benefits for Different Workloads

High bandwidth and low latency benefits for Decision Support (DSS)

– Should enable serious DSS workloads on RAC clusters

Low latency benefits for scaling Online Transaction Processing (OLTP)

Our estimate: One IB Link replaces 6-8 Gigabit Ethernet links

Page 33: High Performance Communication for Oracle using InfiniBand

Commodity Hardware

Higher capabilities and lower cost than proprietary interconnects

InfiniBand’s large bandwidth capability means that a single link can replace multiple GigE and FC interconnects

Page 34: High Performance Communication for Oracle using InfiniBand

Memory Requirements

The Oracle buffer cache can consume 80% of a host’s physical memory

64-bit addressing and decreasing memory prices mean ever larger buffer caches

InfiniBand provides…
– Zero-copy RDMA between very large buffer caches
– Large shared registrations move memory registration out of the performance path
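A minimal sketch of the register-once idea: the large buffer-cache region is registered with the HCA a single time at startup (here via uDAPL's dat_lmr_create), so later RDMA operations only reference the returned LMR/RMR contexts and registration stays off the performance path. The provider name "ib0" and the 1 GB region size are placeholders.

#include <dat/udat.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    DAT_IA_HANDLE  ia;
    DAT_EVD_HANDLE async_evd = DAT_HANDLE_NULL;
    DAT_PZ_HANDLE  pz;

    /* Open the interface adapter; the provider name is platform-specific. */
    if (dat_ia_open("ib0", 8, &async_evd, &ia) != DAT_SUCCESS) {
        fprintf(stderr, "no uDAPL provider\n");
        return 1;
    }
    dat_pz_create(ia, &pz);

    /* One large registration at startup covering the whole buffer cache. */
    size_t cache_len = (size_t)1 << 30;              /* placeholder: 1 GB */
    void  *cache     = malloc(cache_len);
    if (!cache) { fprintf(stderr, "allocation failed\n"); return 1; }

    DAT_LMR_HANDLE  lmr;
    DAT_LMR_CONTEXT lmr_ctx;
    DAT_RMR_CONTEXT rmr_ctx;
    DAT_VLEN        reg_len;
    DAT_VADDR       reg_addr;
    DAT_REGION_DESCRIPTION region;
    region.for_va = cache;

    if (dat_lmr_create(ia, DAT_MEM_TYPE_VIRTUAL, region, cache_len, pz,
                       DAT_MEM_PRIV_ALL_FLAG,
                       &lmr, &lmr_ctx, &rmr_ctx, &reg_len, &reg_addr)
        != DAT_SUCCESS) {
        fprintf(stderr, "registration failed\n");
        return 1;
    }

    /* From here on, RDMA posts just carry lmr_ctx/rmr_ctx; no further
     * registration work sits on the block-transfer fast path. */
    printf("registered %llu bytes\n", (unsigned long long)reg_len);

    dat_lmr_free(lmr);
    dat_ia_close(ia, DAT_CLOSE_GRACEFUL_FLAG);
    return 0;
}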

Page 35: High Performance Communication for Oracle using InfiniBand

Two Efforts Coming Together: RAC/Cache Fusion and Oracle Net

Two Oracle engineering teams working at cluster and application tiers
– 10g incorporates both efforts

Oracle Net benefits from many of the same capabilities as Cache Fusion
– OS kernel bypass
– CPU offload
– New transport protocol (SDP) support
– Efficient asynchronous communication model
– RDMA with high bandwidth and low latency
– Commodity hardware

Working on external and internal deployments

Page 36: High Performance Communication for Oracle using InfiniBand

Open Standard Software APIs: uDAPL and Async Sockets/SDP

Each new communication driver is a large investment for Oracle

One stack which works across multiple platforms means improved robustness

Oracle grows closer to the interfaces over time

Ready today for emerging technologies

Ubiquity and robustness of IP for high speed communication

Page 37: High Performance Communication for Oracle using InfiniBand

Summary

Oracle and major system & storage vendors are supporting InfiniBand

InfiniBand presents a superb opportunity for enhanced horizontal scalability and lower cost

Oracle Net’s InfiniBand support significantly improves performance for both the app server and the database in Oracle 10g

InfiniBand provides the performance to move applications to low-cost Linux RAC databases

Page 38: High Performance Communication for Oracle using InfiniBand

Q U E S T I O N S  &  A N S W E R S

Page 39: High Performance Communication for Oracle using InfiniBand

Next Steps….

See InfiniBand demos first hand on the show floor
– Dell, Intel, NetApp, Sun, Topspin (booth #620)
– Includes clustering, app tier, and storage over InfiniBand

InfiniBand whitepapers on both the Oracle and Topspin websites
– www.topspin.com
– www.oracle.com