Voltaire ufm en_nov10

32
© 2010 Voltaire Inc. November 19, 2010 Unified Fabric Manager Overview Ghislain de Jacquelot

Transcript of Voltaire ufm en_nov10

Page 1: Voltaire ufm en_nov10

© 2010 Voltaire Inc.

November 19, 2010

Unified Fabric Manager Overview

Ghislain de Jacquelot

Page 2: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 2

Voltaire Software Portfolio

Robust RDMA Drivers

Fabric provisioning and

performance monitoring

Robust Drivers

MPI Acceleration

Multicast Acceleration

Storage Access Acceleration

Collective communication

offload

Multicast and TCP transport utilizing

Kernel bypass technology

RDMA based storage iSCSI target

Multicast and TCP

transport utilizing

Kernel bypass

technology

Page 3: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 3

Unified Fabric Manager

“So far, I haven't seen any other solutions claiming to be a "fabric manager" offer the sophisticated insight, resource management, performance trending, and core fabric function extension that UFM can … it fully illustrates what a well architected fabric should be capable of.”

Jeff Boles, Taneja Group, June 2009

Page 4: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 4

Infiniband Traditional Management

Page 5: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 5

An Infiniband Fabric is not a black box (1/2)

► Requires Hardware management

• Detect failures, communication problems

Inside the Infiniband Fabric

- Port counters

- Port status (QDR,DDR,SDR – 4X,2X,1X)

- Firmware upgrades (Switch and HCA ASICs)

Outside the Infiniband Fabric

- Chassis

- Power supplies

- Fans

- Temperature

- Chassis software updates (Switch management)

Page 6: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 6

An Infiniband Fabric is not a black box (2/2)

►What about performance ?

►Some embarrassing questions…

• Blocking vs non-blocking fabrics ?

• Influence of routing algorithms ?

• Congestion ?

• Mixing different protocols on the same fabric ?

• Running multiple jobs on the same fabric ?

• Performance monitoring Tools ?

Page 7: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 7

UFM Central Management Platform

► In-depth visibility into fabric health and traffic

• Central Dashboard, Unique Congestion Map

• Advanced monitoring engine, threshold based alerts

► Optimize application performance

• Quality of Service

• Traffic Aware Routing Algorithm

► Efficient operations of thousands of fabric components

• Automated configuration of hosts and switches, group tasks

• Seamless change management

Unified

Fabric

Manager

Page 8: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 8

Introducing UFM

UFM Server

CLIGUI

(Java)

Web

Services

IB-SM

(OpenSM)

Perf Mng

Providers

Device Mng

Providers

SQL

DB

HA

Daemon

Access

Control

Central administration

of multiple switches

(or hosts)Hierarchal performance

monitoring,

variety of sources

Leverage open

source SM

engine

Transparent

fail-over

Fast retrieval,

historical data

Manage complex

relations and

workflows

Voltaire

Plug-ins

User and

application

interfaces

Page 9: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 10

Advanced Monitoring and Analysis

► Monitor & analyze fabric performance

• Bandwidth utilization

• Unique congestion monitoring

• Dashboard for aggregated fabric view

► Real-time fabric-wide health monitoring

• Monitor events and errors through-out the fabric

• Threshold based alarms

• Granular monitoring of host and switch parameters

► Innovative congestion mapping

• One view for fabric-wide congestion and traffic patterns

• Enables root cause analysis for routing, job placement or resource allocation inefficiencies

► All is managed at the application/aggregation level

• Event effects are clearly visible

• Pro-active measures can be taken

Page 10: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 11

Central Dashboard

Resource Utilization

& StatusCongestion Map

Top 10 alerted nodesEvent Pane

Top 10’s

B/W, Congestion

B/W Consumers

Page 11: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 12

Advanced Monitoring Engine

Multiple sessions

On demand

Sessions per Logical

Groups – no need to

know physical nodes

Aggregation per

Multiple devices

Various graphs (linear,

bar, historgram, pie…)

Correlate switch and

host information

Formulas (AVG, Max,

Min, Sum)

Page 12: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 13

Performance Optimization Cycle with UFM

Characterize

traffic pattern and priorities

Unique logical fabric model

QoS to prioritize critical apps.

Optimize routing with Voltaire’s

Traffic Optimized Routing (TOR)

Show traffic and congestion

information

Unique Congestion Map

Feedback and Analysis

OptionalOrchestrators

& Schedulers

Application Requirements

UFM OptimizationUFM Monitoring

Page 13: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 14

Advanced Performance Optimization Mechanisms

► Fabric virtualization and Quality of Service (QoS)

• Run multiple clusters or multiple jobs on the same infrastructure

• Assure critical applications get priority through QoS policy

• Provide the required isolation for different departments or jobs

► Traffic Aware Routing Algorithm (TARA)

• Voltaire’s major shift from static to traffic aware routing

• Routing enhancements are built on top of OpenSM in a modular plug-in architecture

• Takes into consideration traffic patterns and loads

• Traffic model can be derived automatically from fabric model or via API with 3rd party schedulers

Applicable to both DDR and QDR Environments

Page 14: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 15

Congestion Example

► Degradation due to node oversubscription

• Destination port in saturation (multiple sources)

• Congestion spread across the fabric

• ALL other flows drop to 20% of capacity

• Take time to recover

• Common with storage traffic

drop

recovery

Page 15: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 16

Quality of Service Optimization

UFM Enables QoS Optimization

Page 16: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 17

Test Environment

► 2 nodes running a latency critical job

► 12 nodes running a bandwidth consuming job

► Goal: achieve best performance with Latency critical tasks

Page 17: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 18

W/O Partitioning Latency degradation of ~215%

Latency job running alone

(Latency = ~2.1 us)Bandwidth job added on

same partition

(Latency = ~4.5 us)

Page 18: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 19

UFM Logical Model Creates Partition and Sets QoS

► 2 Logical Groups

• Latency job

• B/W oriented job

► QoS settings

► UFM creates virtual NICs, partitions and assigns Service Levels on the fabric

Page 19: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 20

With UFM QoSCross Application Interference fixed

Single job in cluster

(Latency = 2.1us)

2 jobs, UFM optimization

(Latency = 2.2us)

2nd job added

(Latency = 4.5us)

100% Better Performance Through QoS Implementation

Page 20: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 21

Optimize performance #2: routing

► Existing routing algorithms

• Are not aware of application communication flow

• They distribute paths evenly across the fabric links

► In real life, fabrics have non uniform usage

• Some endpoints “talk” a lot, some don’t “talk” at all

• Many-to-many (cluster) and any-to-many (storage) topologies

► Result

• Unbalanced fabric

• Congestion is created leading to slower performance and high latency

Congestion = Latency

Page 21: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 22

TARA Optimization

► TARA provides the following benefits:

• Reduces competition between fabric resources, thus decreasing congestion

• Increases available bandwidth, resulting in improved fabric utilization

• Delivers lower latency and shorter application runtime

► How ?

• Uses knowledge of cluster usage: logical servers, networks.

• Balances routes depending on usage

• Not based on real-time analysis of bandwidth / congestion

Page 22: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 23

Routing ?

► InfiniBand packets are ‘destination routed’ based on the Destination Logical ID (DLID) field in the header

► In IB: DLID=route (not only remote address)

► DLIDs are 16 bits

• 48K values are used for unicast

• 16K values are used for multicast

► At each switch ASIC, the incoming unicast DLID is used as an index into a Linear ForwardingTable (LFT) that returns the outgoing switchport number

• E.g. the InfiniScale III ASIC supports all 48K possible LFT entries

Out Port #

DLID

01234567891011

Page 23: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 24

The real wording should be « rearrangeably non-blocking »

36p switch

Nodes 1-18

36p switch

Nodes 19-36

36p switch

Nodes 37-54

36p switch

Nodes 55-72

36p switch 36p switch

Each link represents 9 cables

18 uplinks

54 nodes

At boot time, 3 routes are assigned to each uplink, lets assume:

19-37-55 on port #1

20-38-56 on port #2, etc…

What happens if you have a job running on nodes 1-2-3-19-37-55 ?

Unbalanced communication, congestion…

Page 24: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 25

TARA Optimization

► TARA provides the following benefits:

• Reduces competition between fabric resources, thus decreasing congestion

• Increases available bandwidth, resulting in improved fabric utilization

• Delivers lower latency and shorter application runtime

► How ?

• Uses knowledge of cluster usage: logical servers, networks.

• Balances routes depending on usage

• Not based on real-time analysis of bandwidth / congestion

Page 25: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 26

With TARA

36p switch

Nodes 1-18

36p switch

Nodes 19-36

36p switch

Nodes 37-54

36p switch

Nodes 55-72

36p switch 36p switch

Each link represents 9 cables

18 uplinks

3 nodes

job running on nodes 1-2-3-19-37-55

At job launch time, routes to nodes used by the job are balanced over

all uplinks:

19 on port #1

37 on port #2

55 on port #3

Others are unchanged

Page 26: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 30

0

200

400

600

800

1000

1200

1400

1600

1800

2000

1.18

1.28

2.20

2.30

3.22

3.32

4.24

4.34

5.26

6.18

6.28

7.20

7.30

8.22

8.32

9.24

9.34

10.2

6

11.1

8

11.2

8

12.2

0

12.3

0

13.2

2

13.3

2

14.2

4

14.3

4

15.2

6

16.1

8

16.2

8

17.2

0

17.3

0

switch.port

po

rt w

eig

ht

0

200

400

600

800

1000

1200

1400

1600

1800

2000

1.18

1.28

2.20

2.30

3.22

3.32

4.24

4.34

5.26

6.18

6.28

7.20

7.30

8.22

8.32

9.24

9.34

10.2

6

11.1

8

11.2

8

12.2

0

12.3

0

13.2

2

13.3

2

14.2

4

14.3

4

15.2

6

16.1

8

16.2

8

17.2

0

17.3

0

switch.port

po

rt w

eig

ht

Internal ports on the line cards

traff

ic b

an

dw

idth

Traffic Optimized RoutingOpenSM

Job 147 nodes

Job 646 nodes

Job 241 nodes

Job 563 nodes

Job 371 nodes

Job 425 nodes

storage

Nodes (24)

traffic to/from storage

Average of 200MB/s per node

Internal traffic inside each job, 1000 MB/s from each node

Example: TARA with 324 nodes cluster

300 servers

24 storage nodes

Logical

Topology

Physical Topology

Page 27: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 31

Scale-out and Maintain Control on Fabric

► Dozens of switches and 1000s of nodes become a massive operational burden

► UFM automates I/O and switch configuration enabling isolation and QoS

► Central Device Management for switches and hosts

► High-availability and seamless failover of SM and UFM

► Advanced API for seamless integration in existing environments

Automatic, seamless operations save hours of configuration and set-up work

Page 28: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 32

Efficient Troubleshooting

► Dozens of traffic and health events

• Easy central drill-down to counters, alerts and events to the port level

► Configurable thresholds and criticality levels

► GUI and log level alarms

► Alerts correlated to the application level

► Alerts correlated to the DC rack level

Page 29: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 33

Open system

► Extensible architecture based on Web-services

► Open API for users or 3rd party extensions

► Expose entire fabric and datacenter object model

► Allow simple reporting, provisioning, monitoring, and task automation

► Tools already benefiting from UFM API

Scheduler integration (e.g. Moab)

UFM Support tool kit

Various command line tools/extensions to UFM

Web fabric portal

* Provided in UFM Advanced packages

Page 30: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 34

UFM Adaptive Suite- Separate UFM offering integrated with Platform LSF

Intelligent &

automatic resource

allocation

Optimize fabric

performance

Maintain

connectivity upon

changes

Central monitoring

This is the first integrated solution that correlates network fabric

management and workload management for dynamic data centers

Platform LSFService Policy

UFMFabric Provisioning

Control & Optimization

Page 31: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 35

Integration with Platform LSF - how does it work ?

Automation and Optimization

Page 32: Voltaire ufm en_nov10

© 2010 Voltaire Inc. 36

UFM Benefits

Simple and Automated

Lowers administration tasks

time from days to minutes

Increased Performance

Reduce congestion, lower latency

Quicker application runtime

Little Fabric Visibility

Unnoticed performance degradation

Difficult to assess impact

Low Performing Unutilized Fabrics

Arbitrary routing algorithms, QoS seldom implemented

Congested fabrics, latency affected

Complex and Manual Processes

Needs admin skills

Many options left unused at all

Ineffective TroubleshootingLong troubleshooting time

Performance issues take days to analyze

Quick Issue ResolutionDashboard, Alarms, Congestion Map

Reduces downtime, high fabric utilization

In-Depth Visibility and Control

Clear health and performance visualization

Business oriented impact and root analysis

Fabrics w/o UFM UFM Customers