Fca Product Overview Feb222010 As
© 2009 Voltaire Inc.
April 10, 2023
Voltaire Fabric Collective Accelerator™ (FCA): Accelerate your Fabric
© 2010 Voltaire Inc. 2
The Challenge: Collective Operations Performance
► Collective operations consume a large share of application run time and don’t scale well
► System/OS “noise” affects scalability
► Simple offload solutions DON’T address the key problems:
• Network congestion due to “All-to-All” communication
• Computation & messaging performance
• Difficult to manage and orchestrate
Poor application scalability and low cluster efficiency
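To make the congestion problem concrete, the following back-of-the-envelope sketch (not from the deck) counts point-to-point messages for an allreduce done as a naive all-to-all exchange versus a reduce-then-broadcast over a spanning tree. The function names are hypothetical, and the model ignores message size and link sharing.

```python
def naive_allreduce_messages(n_ranks: int) -> int:
    """All-to-all style: every rank sends its value to every other rank."""
    return n_ranks * (n_ranks - 1)


def tree_allreduce_messages(n_ranks: int) -> int:
    """Reduce up a spanning tree (n - 1 messages), then broadcast the
    result down the same tree (another n - 1 point-to-point messages)."""
    return 2 * (n_ranks - 1)


if __name__ == "__main__":
    # Illustrative rank counts only.
    for n in (40, 1_000, 11_000):
        print(n, naive_allreduce_messages(n), tree_allreduce_messages(n))
```

The naive count grows quadratically while the tree count grows linearly; replacing the downward broadcast with a single multicast, as FCA does for result distribution, shrinks the second half further.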
Collective Communication Portion of MPI Runtime
[Chart: Collective Operations % of MPI Job Runtime (y-axis: percentage, 0 to 100) for ANSYS FLUENT, SAGE, CPMD, LSTC LS-DYNA, CD-Adapco STAR-CD, and Dacapo]
Introducing Voltaire Fabric Collective Accelerator (FCA)
[Diagram: Voltaire 4036 switches in the fabric, with partial results combined (+) along the collective tree]
• CPU in switch used to offload collective operations
• Collective tree & rank placement optimized to the topology
• Use of IB multicast for result distribution
• Inter-core communication optimized
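The ideas on this slide (switch-side reduction, a collective tree matched to the topology, and multicast result distribution) can be sketched as a two-level allreduce. This is a hypothetical model, not Voltaire's switch code: the function name, the switch labels, and the choice of sum as the reduction operator are all assumptions, and the "multicast" is just a dictionary copy.

```python
from collections import defaultdict
from typing import Dict, List


def switch_tree_allreduce(values: Dict[int, float],
                          rank_to_switch: Dict[int, str]) -> Dict[int, float]:
    """Hypothetical two-level, topology-aware allreduce (sum).

    Level 1: each leaf switch sums the values of the ranks attached to it.
    Level 2: a root switch sums the per-switch partial results.
    Distribution: the final result is delivered once to every rank,
    standing in for an IB multicast.
    """
    # Group ranks by the leaf switch they are attached to (topology awareness).
    ranks_by_switch: Dict[str, List[int]] = defaultdict(list)
    for rank, switch in rank_to_switch.items():
        ranks_by_switch[switch].append(rank)

    # Level 1: one partial sum per leaf switch.
    partials = {sw: sum(values[r] for r in ranks)
                for sw, ranks in ranks_by_switch.items()}

    # Level 2: the root switch combines the partials into the final result.
    total = sum(partials.values())

    # "Multicast": every rank receives the same result.
    return {rank: total for rank in values}


if __name__ == "__main__":
    vals = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
    placement = {0: "sw-a", 1: "sw-a", 2: "sw-b", 3: "sw-b"}  # hypothetical
    print(switch_tree_allreduce(vals, placement))
```

Grouping by switch means each leaf switch forwards one partial result instead of one value per rank, which is the point of placing the tree and ranks according to the topology.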
FCA Solution Architecture
First fully integrated solution to offload collectives
Combines intelligence on server, switches, and management
• UFM™: automates fabric collective offload and monitoring, and integrates with job schedulers
• CPUs in Voltaire “smart” switches perform reduction and messaging operations
• Voltaire OMA (Open MPI plug-in): handles the host-side (multi-core) part of collectives
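The host-side role described for OMA can be illustrated with a minimal hierarchical reduction: ranks on the same node are combined first, so only one contribution per node enters the fabric. This sketch assumes a sum reduction and uses hypothetical node names and function names; it does not reflect OMA's actual implementation.

```python
from typing import Dict, List


def node_level_reduce(core_values: Dict[str, List[float]]) -> Dict[str, float]:
    """Host side: combine all ranks (cores) on each node into one value.

    After this step the fabric sees one contribution per node
    instead of one per core.
    """
    return {node: sum(vals) for node, vals in core_values.items()}


def hierarchical_sum(core_values: Dict[str, List[float]]) -> float:
    # Step 1 (host side): intra-node reduction across cores.
    per_node = node_level_reduce(core_values)
    # Step 2 (fabric side): reduce the per-node results. In the FCA
    # architecture this is where switch CPUs would take over; here it
    # is modelled as a plain sum.
    return sum(per_node.values())


if __name__ == "__main__":
    # Two hypothetical nodes with four cores each: 8 values, 2 fabric inputs.
    data = {"node0": [1.0, 2.0, 3.0, 4.0], "node1": [5.0, 6.0, 7.0, 8.0]}
    print(node_level_reduce(data), hierarchical_sum(data))
```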
FCA: Addressing the Problem End to End
► Increase performance, reduce congestion
• Reduce fabric traffic to a single message per wire, dramatically reducing congestion
• FCA offload “shields” collective operation from node “noise”
• Enable non-blocking collectives (overlapping communication and computation)
• Linear scalability to many thousands of nodes with predictable hardware performance
► Simple, fully integrated
• No application changes: OMA is a drop-in Open MPI plug-in
• Switches come equipped with FCA offload code out of the box
• UFM automates the process and integrates with the job scheduler, reducing setup burden
• Fully integrated monitoring capabilities
► FCA reduces collective operation runtime by up to 100X
• MPI collective operations across 11K nodes complete within 25 µs
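The "single message per wire" claim can be illustrated by counting messages on switch-to-switch links for one reduction. The model below is an assumption for illustration only: leaf switches hang off a single core switch named "core", rank-to-leaf links are not counted, and all names are hypothetical. Without in-switch reduction, raw values pile up on the link into the root's leaf switch; with it, every link carries one combined message.

```python
from collections import Counter
from typing import Dict


def link_loads(rank_to_leaf: Dict[int, str], root_rank: int,
               reduce_in_switches: bool) -> Counter:
    """Messages carried per switch-to-switch link for one reduction.

    Assumed topology: leaf switches connected to one core switch "core".
    """
    root_leaf = rank_to_leaf[root_rank]
    ranks_per_leaf = Counter(rank_to_leaf.values())
    loads: Counter = Counter()
    for leaf, n_ranks in ranks_per_leaf.items():
        if leaf == root_leaf:
            continue  # traffic stays inside the root's own leaf switch
        # Either every raw value crosses the uplink, or the leaf
        # switch pre-combines them into a single message.
        loads[(leaf, "core")] = 1 if reduce_in_switches else n_ranks
    if loads:
        # The core either forwards everything or sends one combined result.
        loads[("core", root_leaf)] = 1 if reduce_in_switches else sum(loads.values())
    return loads


if __name__ == "__main__":
    # Hypothetical layout: 54 ranks, 18 per leaf switch, root rank 0.
    layout = {r: f"leaf{r // 18}" for r in range(54)}
    print(link_loads(layout, 0, reduce_in_switches=False))
    print(link_loads(layout, 0, reduce_in_switches=True))
```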
FCA Preliminary Performance Results
IMB (PALLAS) Allreduce and Reduce for Open MPI and FCA (40 ranks, 5 x Nehalem 5520)
[Chart: latency (µs, axis 0 to 50) of AllReduce and Reduce, Open MPI vs. FCA; bar annotations: 78% (AllReduce) and 66% (Reduce)]
FCA: What Is the Alternative/Competitive Solution?
Comparison: FCA vs. NIC-based offload
Capabilities compared:
• Topology aware
• Network congestion elimination
• Fabric switches offload computation
• Result distribution based on IB multicast
• Integration with job schedulers
• OS “noise” reduction
Expected MPI job runtime improvement: FCA 30-40%, NIC-based offload 1-2%
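The deck does not say how the 30-40% figure was derived. One standard way to relate a collective speedup to whole-job runtime is Amdahl's law: if collectives take a fraction f of runtime and become s times faster, the saved fraction is f(1 - 1/s), which can never exceed f. The sketch below applies that formula; the function name and the sample inputs are illustrative assumptions, not measured data.

```python
def runtime_improvement(collective_fraction: float,
                        collective_speedup: float) -> float:
    """Fraction of total job runtime saved when only the collective
    portion is accelerated (Amdahl's law).

    collective_fraction: share of runtime spent in collectives, in [0, 1].
    collective_speedup:  factor by which collectives get faster, >= 1.
    """
    if not 0.0 <= collective_fraction <= 1.0:
        raise ValueError("collective_fraction must be in [0, 1]")
    if collective_speedup < 1.0:
        raise ValueError("collective_speedup must be >= 1")
    new_runtime = (1.0 - collective_fraction) + collective_fraction / collective_speedup
    return 1.0 - new_runtime


if __name__ == "__main__":
    # Illustrative only: collectives at 40% of runtime, sped up 100X.
    print(round(runtime_improvement(0.4, 100.0), 3))
```

Because the saving is bounded by f, a 30-40% job-level improvement implies that collectives account for at least 30-40% of runtime in the targeted applications.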
A Fabric-Wide Challenge Requires a Fabric-Wide Solution
FCA: Bringing InfiniBand to Capability Clusters
► Poor scalability of collective operations has limited InfiniBand’s reach in capability computing
► FCA is the first and only solution on the market that allows collective operations to scale efficiently to thousands of ranks
► Voltaire is the only standards-based, high-performance fabric solution suitable for both capability & capacity computing
[Chart: Performance vs. Price/Complexity, contrasting capacity and capability computing]
Voltaire Fabric Collective Accelerator: Summary
► Fabric computing offload
• Combination of SW & HW in a single solution
• Offloading blocking computational tasks
• Algorithms leveraging the topology for computation (trees)
► Extreme MPI performance & scalability
• Capability computing on commodity clusters
• Up to two orders of magnitude (100X) faster collective runtime
• Linear scalability (O18)
► Transparent to the application
• Standard Open MPI plug-in
• Plug & play: no code changes needed
• Simple SDK for integration with other MPIs
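Tree-based collectives scale well because the number of combining stages grows only logarithmically with the number of ranks. The sketch below counts those stages for a given fan-in. The fan-in of 18 in the example is a hypothetical choice (for instance, half the ports of a 36-port switch facing down); the deck does not state FCA's actual tree shape.

```python
def tree_depth(n_ranks: int, fan_in: int) -> int:
    """Number of combining levels needed to reduce n_ranks values when
    each tree node combines at most fan_in inputs."""
    if n_ranks < 1 or fan_in < 2:
        raise ValueError("need n_ranks >= 1 and fan_in >= 2")
    depth, remaining = 0, n_ranks
    while remaining > 1:
        remaining = -(-remaining // fan_in)  # ceiling division
        depth += 1
    return depth


if __name__ == "__main__":
    # Hypothetical fan-in of 18; rank counts taken from the slides.
    for n in (40, 11_000):
        print(n, tree_depth(n, 18))
```

Integer ceiling division is used instead of `math.log` to avoid floating-point rounding at exact powers of the fan-in.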
Accelerate your Fabric!
Thank You