Fca Product Overview Feb222010 As


Page 1: Fca Product Overview Feb222010 As

© 2009 Voltaire Inc.

April 10, 2023

Voltaire Fabric Collective Accelerator™ (FCA)

Accelerate your Fabric

Page 2: Fca Product Overview Feb222010 As

© 2010 Voltaire Inc. 2

The Challenge: Collective Operations Performance

► Collective operations take a large share of application run time and do not scale well

► System/OS “noise” affects scalability

► Simple offload solutions DON’T address the key problems:

• Network congestion due to “All-to-All” communication

• Computation & messaging performance

• Difficult to manage and orchestrate

Poor application scalability and low cluster efficiency
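The effect of system/OS "noise" on scalability can be illustrated with a simple probability sketch. It assumes (this model is not from the slides) that each rank is independently delayed by noise with probability `p_noise` during a collective, and that the collective finishes only when the slowest rank arrives. The function name and the sample values are hypothetical.

```python
def prob_collective_delayed(p_noise: float, ranks: int) -> float:
    """Probability that at least one of `ranks` ranks is delayed by noise.

    Assumes independent ranks, each delayed with probability `p_noise`.
    A collective waits for its slowest participant, so one delayed rank
    is enough to delay the whole operation.
    """
    return 1.0 - (1.0 - p_noise) ** ranks


if __name__ == "__main__":
    # Hypothetical per-rank noise probability of 0.1% per collective.
    for n in (16, 256, 4096):
        print(n, prob_collective_delayed(0.001, n))
```

Under this assumption, a noise level that is negligible on a small cluster affects almost every collective at a few thousand ranks, which is why the slide lists noise as a scalability problem.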

Page 3: Fca Product Overview Feb222010 As

Collective Communication Portion of MPI Runtime

[Bar chart: collective operations as a percentage of MPI job runtime (axis 0 to 100) for ANSYS FLUENT, SAGE, CPMD, LSTC LS-DYNA, CD-adapco STAR-CD, and Dacapo; bar values are not preserved in this transcript.]

Page 4: Fca Product Overview Feb222010 As


Introducing Voltaire Fabric Collective Accelerator (FCA)

[Diagram: a fabric of Voltaire switches (faceplate label 4036SM, 36 ports each) with four callouts:]

► CPU in switch used to offload collective operations

► Collective tree & rank placement optimized to the topology

► Use of IB multicast for result distribution

► Inter-core communication optimized
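To see why a topology-aware collective tree reduces traffic compared with "All-to-All" communication, the following sketch counts messages under a simplified model. It is an illustration only, not Voltaire's algorithm: the function names are hypothetical, and the tree radix (how many children each aggregating node serves) is an assumed parameter.

```python
def all_to_all_messages(ranks: int) -> int:
    """Messages when every rank sends its value directly to every other rank."""
    return ranks * (ranks - 1)


def tree_depth(ranks: int, radix: int) -> int:
    """Number of aggregation levels needed to cover `ranks` leaves."""
    depth, capacity = 0, 1
    while capacity < ranks:
        capacity *= radix
        depth += 1
    return depth


def tree_reduce_messages(ranks: int, radix: int) -> int:
    """Messages sent up a reduction tree: one partial result per child link."""
    messages, nodes = 0, ranks
    while nodes > 1:
        messages += nodes             # each node sends one message to its parent
        nodes = -(-nodes // radix)    # ceiling division: parents at the next level
    return messages


if __name__ == "__main__":
    # Hypothetical example: 36 ranks, assumed radix of 18.
    print(all_to_all_messages(36))        # 1260
    print(tree_reduce_messages(36, 18))   # 38
    print(tree_depth(36, 18))             # 2
```

In this model each link carries a single message on the way up; the slides state that the result is then distributed with IB multicast rather than by per-rank sends.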

Page 5: Fca Product Overview Feb222010 As

FCA: Solution Architecture

First fully integrated solution to offload collectives

Combines intelligence on server, switches, and management

• UFM™ - Automates fabric collective offload/monitoring and integrates with schedulers

• CPUs in Voltaire “smart” switches perform reduction and messaging operations

• Voltaire OMA (Open MPI plug-in) - Addresses host-side (multi-core) collectives

Page 6: Fca Product Overview Feb222010 As

FCA: Addressing the Problem End to End

► Increase performance, reduce congestion

• Reduce fabric traffic to a single message per wire, dramatically reducing congestion

• FCA offload “shields” collective operation from node “noise”

• Enable non-blocking collectives (overlap communication and computation)

• Linear scalability to many thousands of nodes with predictable hardware performance

► Simple, fully integrated

• No change to the application: OMA is a drop-in Open MPI plug-in

• Switches come equipped with FCA offload code out of the box

• UFM automates the process and integrates with scheduler, saving setup burden

• Fully integrated monitoring capabilities

► FCA reduced collective operations runtime by up to 100X

• MPI collective operations across 11K nodes complete within 25 usec

Page 7: Fca Product Overview Feb222010 As

FCA: Preliminary Performance Results

IMB (PALLAS) Allreduce and Reduce for Open MPI and FCA (40 ranks, 5 x Nehalem 5520)

[Bar chart: latency in usec (axis 0 to 50) of AllReduce and Reduce, Open MPI vs. FCA; chart annotations: 78% and 66%.]

Page 8: Fca Product Overview Feb222010 As

FCA: What is the alternative/competitive solution?

[Comparison table: FCA vs. NIC-based offload; the per-feature marks are not preserved in this transcript. Criteria compared:]

• Topology aware

• Network congestion elimination

• Fabric switches offload computation

• Result distribution based on IB multicast

• Integration with job schedulers

• OS “noise” reduction

• Expected MPI job runtime improvement: 30-40% (FCA), 1-2% (NIC-based offload)

A Fabric Wide Challenge requires a Fabric Wide Solution
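The link between the collective share of runtime and the job-level improvement can be sketched with Amdahl-style arithmetic. The formula is a standard approximation and not a calculation taken from the slides; the function name and the input values below are hypothetical.

```python
def job_runtime_saving(collective_fraction: float, collective_speedup: float) -> float:
    """Fraction of total job runtime saved when only collectives are accelerated.

    collective_fraction: share of job runtime spent in collectives (0..1)
    collective_speedup:  factor by which collective time shrinks (>= 1)
    """
    return collective_fraction * (1.0 - 1.0 / collective_speedup)


if __name__ == "__main__":
    # Hypothetical inputs: collectives take 40% of runtime and become 10x faster.
    print(job_runtime_saving(0.40, 10.0))
```

With these assumed inputs the job-level saving approaches, but never exceeds, the collective share of runtime. This is consistent with a solution that barely speeds up collectives yielding only a small job-level gain.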

Page 9: Fca Product Overview Feb222010 As

FCA: Bringing InfiniBand to Capability Clusters

► Scalability of collective operations has been limiting the reach of InfiniBand when it comes to capability computing

► FCA is the first and only solution in the market allowing collective operations to efficiently scale to thousands of ranks

► Voltaire is the only standard-based, high performance fabric solution suitable for both capability & capacity computing

[Chart: axes labeled Price/Complexity and Performance; regions labeled Capacity and Capability.]

Page 10: Fca Product Overview Feb222010 As

Voltaire Fabric Collective Accelerator: Summary

► Fabric computing offload

• Combination of SW & HW in a single solution

• Offloading blocking computational tasks

• Algorithms leveraging the topology for computation (trees)

► Extreme MPI performance & scalability

• Capability computing on commodity clusters

• Up to two orders of magnitude (100X) faster collective runtime

• Linear scalability (O18)

► Transparent to the application

• Standard Open MPI plug-in

• Plug & play - no code changes needed

• Simple SDK for integration with other MPIs

Accelerate your Fabric!

Page 11: Fca Product Overview Feb222010 As


Thank You