In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... ·...

21
Paving the Road to Exascale June 2017 In-Network Computing

Transcript of In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... ·...

Page 1: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

Paving the Road to Exascale

June 2017

In-Network Computing

Page 2: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 2

Exponential Data Growth – The Need for Intelligent and Faster Interconnect

CPU-Centric (Onload) Data-Centric (Offload)

Must Wait for the Data

Creates Performance BottlenecksAnalyze Data as it Moves!

Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale

Page 3: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 3

Data Centric Architecture to Overcome Latency Bottlenecks

HPC / Machine Learning

Communications Latencies of 30-40us

HPC / Machine Learning

Communications Latencies of 3-4us

CPU-Centric (Onload) Data-Centric (Offload)

Network In-Network Computing

Intelligent Interconnect Paves the Road to Exascale Performance

Page 4: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 4

In-Network Computing to Enable Data-Centric Data Center

GPU

GPU

CPUCPU

CPU

CPU

CPU

GPU

GPU

In-Network Computing Key for Highest Return on Investment

Page 5: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 5

In-Network Computing to Enable Data-Centric Data Centers

GPU

GPU

CPUCPU

CPU

CPU

CPU

GPU

GPU

In-Network Computing Key for Highest Return on Investment

CORE-Direct

RDMA

Tag-Matching

SHARP

SHIELD

Programmable

(FPGA)

Programmable (ARM)

Security

GPUDirect

SecurityNVMe Storage

NVMe over Fabrics

Page 6: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 6

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)

Reliable Scalable General Purpose Primitive

• In-network Tree based aggregation mechanism

• Large number of groups

• Multiple simultaneous outstanding operations

Applicable to Multiple Use-cases

• HPC Applications using MPI / SHMEM

• Distributed Machine Learning applications

Scalable High Performance Collective Offload

• Barrier, Reduce, All-Reduce, Broadcast and more

• Sum, Min, Max, Min-loc, max-loc, OR, XOR, AND

• Integer and Floating-Point, 16/32/64/128 bits

SHArP Tree

SHARP Tree Aggregation Node

(Process running on HCA)

SHARP Tree Endnode

(Process running on HCA)

SHARP Tree Root

Page 7: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 7

Allreduce Performance

Allreduce Latency (8 Bytes, 128 Bytes)

Late

ncy (

usec)

Number of Nodes (1PPN)

Number of Nodes (1PPN)

Late

ncy (

usec)

Allreduce Latency (1K Bytes, 2K Bytes)

Page 8: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 8

OpenFOAM Application Performance

OpenFOAM is a popular computational fluid dynamics application

Software In-Network Computing

Page 9: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 9

Tag Matching Logistics

Page 10: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 10

Tag Matching – Common Implementation Protocols

Page 11: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 11

Tag Matching – Hardware Implementation Overview

Page 12: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 12

MPI Tag-Matching Offload Advantages

Lo

wer

is b

ett

er31%

Lo

wer

is b

ett

er97%

31% lower latency and 97% lower CPU utilization for MPI operations

Performance comparisons based on ConnectX-5

Page 13: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 13

In-Network Computing to Enable Data-Centric Data Center

GPU

GPU

CPUCPU

CPU

CPU

CPU

GPU

GPU

In-Network Computing Key for Highest Return on Investment

In-Network Computing

In-Network Memory

Self-Healing

Page 14: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 14

The SHIELD Self Healing Interconnect Technology

Software-based solutions for network failures create long delays: 5-30 seconds for 1K to 10K node clusters

During software-based recovery time, data can be lost, applications can fail

Adaptive Routing creates further issues (failing links may act as “black holes”)

Mellanox SHIELD technology is an innovative hardware-based solution

SHIELD technology enables the generation of Self-Healing Interconnect

The ability to overcome network failures by the network intelligent devices

Accelerates network recovery time by 5000X

Highly scalable and available for EDR and HDR solutions and beyond

Self-Healing Network Enables Unbreakable Data Centers

Page 15: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 15

Consider a Flow From A to B

Data

Server A Server B

Page 16: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 16

The Simple Case: Local Fix

Server A Server B

Data

Page 17: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 17

The Remote Case: Using FRN’s (Fault Recovery Notifications)

Server A Server B

Data

FRN

Data

Page 18: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 18

In-Network Computing Enables Deep Learning Frameworks

CUDA

Mellanox Interconnect SolutionsMellanox Accelerations for Machine Learning and Big Data

SHARP

Middleware (MPI, gRPC) - Optional

GPUDirect RDMA NVMe over FabricsrCUDA

Page 19: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 19

2X Acceleration for Baidu

Machine Learning Software from Baidu

• Usage: word prediction, translation, image processing

RDMA (GPUDirect) speeds training

• Lowers latency, increases throughput

• More cores for training

• Even better results with optimized RDMA

~2X Acceleration for Paddle Training with RDMA

Page 20: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

© 2017 Mellanox Technologies 20

The Ever Growing Demand for Higher Performance

2000 202020102005

“Roadrunner”

1st

2015

Terascale Petascale Exascale

Single-Core to Many-CoreSMP to Clusters

Performance Development

Co-Design

HW SW

APP

Hardware

Software

Application

The Interconnect is the Enabling Technology

Page 21: In-Network Computing - Ohio State Universitynowlab.cse.ohio-state.edu/static/media/workshops/... · 2017. 7. 18. · Software-based solutions for network failures create long delays:

Thank You