
Post on 01-Jan-2020


High Node Count - Scalability Challenges for Interconnection Networks

Professor Olav Lysne

Simula Research Laboratory

Overview

Congestion control

Fault Tolerance

Scalable Modular Routing

State of the Art: State of Technology, State of Knowledge, State of Problem

CONGESTION CONTROL

Figure: a congestion tree, showing HOL-blocked traffic (the victim) and the FECN and BECN congestion notifications.

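The InfiniBand CC mechanism relies on a closed-loop feedback control system to remove the congestion tree: congested switches mark packets with FECN, the destination returns a BECN to the source, and the source reduces its injection rate.

Shared network resources can lead to network congestion and head-of-line (HOL) blocking.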
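Head-of-line blocking can be illustrated with a toy model. The Python sketch below assumes a single FIFO input buffer that forwards at most one packet per cycle and a set of outputs that stay congested for the whole run; neither assumption comes from the slides, and `drain_fifo` is a hypothetical name. Packets bound for an idle output (the victim flow) end up stuck behind a packet bound for the congested output.

```python
from collections import deque
from typing import Deque, List, Set


def drain_fifo(queue: Deque[str], blocked_outputs: Set[str], cycles: int) -> List[str]:
    """Toy single-FIFO input port: only the head packet may be forwarded.

    Each entry in `queue` is the output port the packet is heading for.
    If the head packet's output is congested (in `blocked_outputs`), it
    stays put, and every packet behind it waits as well, even packets
    heading for idle outputs. That is head-of-line (HOL) blocking.
    """
    delivered: List[str] = []
    for _ in range(cycles):
        if not queue:
            break
        if queue[0] in blocked_outputs:
            continue  # head is stuck, so nothing behind it can move this cycle
        delivered.append(queue.popleft())
    return delivered


# Output "A" is a congested hot spot; the packets for "B" are the victims.
print(drain_fifo(deque(["A", "B", "B", "B"]), blocked_outputs={"A"}, cycles=10))  # prints []
print(drain_fifo(deque(["A", "B", "B", "B"]), blocked_outputs=set(), cycles=10))  # prints ['A', 'B', 'B', 'B']
```

The first call delivers nothing even though output B is idle: the victim packets suffer only because they share a buffer with traffic heading into the congestion tree.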

Switch:
- Threshold
- Marking Rate
- Packet Size
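The switch-side parameters can be read as a marking policy. The Python sketch below is a simplified model, not the InfiniBand specification's exact encodings: it assumes a port counts as congested above a plain queue-occupancy threshold (standing in for Threshold), that Marking Rate is the number of eligible packets left unmarked between two marked ones with deterministic spacing, and that packets smaller than a minimum size (standing in for Packet Size) are never marked. `SwitchPortCC` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchPortCC:
    """Simplified FECN marking at one switch output port (a sketch, not the IBA spec).

    Assumed, simplified parameter meanings:
    - occupancy_threshold: queue occupancy (in packets) at which the port
      counts as congested; stands in for the encoded Threshold value.
    - marking_rate: number of eligible packets left unmarked between two
      marked ones (0 = mark every eligible packet); spacing is deterministic.
    - min_packet_size: packets smaller than this are never marked; stands
      in for Packet_Size.
    """

    occupancy_threshold: int
    marking_rate: int
    min_packet_size: int
    _unmarked_since_last: int = field(default=0, init=False)

    def forward(self, queue_occupancy: int, packet_size: int) -> bool:
        """Return True if this packet leaves the port with its FECN bit set."""
        congested = queue_occupancy >= self.occupancy_threshold
        if not congested or packet_size < self.min_packet_size:
            return False  # not eligible for marking
        if self._unmarked_since_last >= self.marking_rate:
            self._unmarked_since_last = 0
            return True
        self._unmarked_since_last += 1
        return False


port = SwitchPortCC(occupancy_threshold=4, marking_rate=1, min_packet_size=2)
print([port.forward(queue_occupancy=q, packet_size=8) for q in [1, 5, 5, 5, 5]])
# prints [False, False, True, False, True]
```

The first packet sees an uncongested port and is never eligible; afterwards every second eligible packet is marked, which is what a marking rate of 1 means under the assumption above.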

Host:
- CCT
- CCT Index Increase
- CCT Index Min
- CCT Index Limit
- CCT Index Timer
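The host side can be sketched the same way. The Python model below assumes the commonly described behaviour: each BECN raises the CCT index (CCTI) by CCTI Increase, capped at CCTI Limit; each expiry of the CCTI Timer lowers it by one, but not below CCTI Min; and the CCT entry at the current index gives the delay inserted between packets. Timer units and table contents are assumptions (abstract ticks and a made-up linear table), and `HostCC` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostCC:
    """Simplified source-side congestion control state for one flow (sketch)."""

    cct: List[int]      # Congestion Control Table: CCT index -> injection delay (abstract units)
    ccti_increase: int  # added to the CCT index for every BECN received
    ccti_limit: int     # upper bound for the CCT index
    ccti_min: int       # lower bound for the CCT index
    ccti_timer: int     # timer period, here in abstract simulation ticks
    ccti: int = field(default=0, init=False)
    _ticks: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        if len(self.cct) <= self.ccti_limit:
            raise ValueError("CCT must have an entry for every index up to ccti_limit")
        self.ccti = self.ccti_min

    def on_becn(self) -> None:
        """A BECN arrived: throttle harder, but never past ccti_limit."""
        self.ccti = min(self.ccti + self.ccti_increase, self.ccti_limit)

    def tick(self) -> None:
        """Advance time by one tick; each timer expiry relaxes the throttle by one step."""
        self._ticks += 1
        if self._ticks >= self.ccti_timer:
            self._ticks = 0
            if self.ccti > self.ccti_min:
                self.ccti -= 1

    def injection_delay(self) -> int:
        return self.cct[self.ccti]


# Parameter values from the slides; the linear CCT is a made-up placeholder.
host = HostCC(cct=list(range(128)), ccti_increase=1, ccti_limit=127, ccti_min=0, ccti_timer=150)
for _ in range(3):
    host.on_becn()
print(host.ccti, host.injection_delay())  # prints 3 3
for _ in range(150):
    host.tick()
print(host.ccti)  # prints 2
```

In this model CCTI Increase and the CCTI Timer pull in opposite directions, which is why the timer value matters for how quickly throttled contributors recover.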

Experiments show that HOL blocking leads to performance degradation when CC is not activated.

The InfiniBand CC mechanism is able to remove both HOL blocking and the parking lot problem.

Figure panels: Without CC / With CC

Parameter values:

Switch: Threshold 15, Marking Rate 1, Packet Size 8

Host: CCTI Increase 1, CCTI Limit 127, CCTI Min 0, CCTI Timer 150


The average throughput of the victim flow as a function of the Marking_Rate (switch) and the CCTI_Timer (host).

The average combined throughput of the contributors as a function of the Marking_Rate and the CCTI_Timer.

Contributors may experience unfairness if an unfortunate CCTI_Timer value is chosen.

When an unfortunate timer value is chosen, contributors experience unfairness among themselves for an extended period of time each time a new contributor is added.

The “treatment variation variable” (TVV):

Figure: for each observation, the spread ∆ between the max value and the min value.

∆ = (max value) – (min value)

TVV = Var(∆1, ∆2, ..., ∆n)

The “treatment variation variable” rules out a large part of the parameter space.
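The TVV can be computed directly from its definition. The Python sketch below assumes each sample is the set of values (for example, per-contributor throughputs) observed at one point in time; the function name and the example data are hypothetical, and whether Var denotes the population or the sample variance is not stated on the slides, so population variance is an assumption. A large TVV means the spread between the best- and worst-treated flows itself fluctuates over time.

```python
from statistics import pvariance
from typing import Sequence


def treatment_variation_variable(samples: Sequence[Sequence[float]]) -> float:
    """TVV = Var(delta_1, ..., delta_n), where delta_i = max(sample_i) - min(sample_i).

    Uses the population variance; swap in statistics.variance for the
    sample variance if that is the intended estimator.
    """
    deltas = [max(s) - min(s) for s in samples]
    return pvariance(deltas)


# Hypothetical per-contributor throughputs at three instants.
steady = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
bursty = [[10, 10, 10], [14, 6, 10], [10, 10, 10]]
print(treatment_variation_variable(steady))
print(treatment_variation_variable(bursty))
```

Screening a parameter grid then amounts to computing the TVV for each (Marking_Rate, CCTI_Timer) pair and discarding pairs above some cut-off; the cut-off is a choice the slides do not specify.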


InfiniBand Congestion Control in M9 (SUN™ DATACENTER INFINIBAND SWITCH 648)

• 20% of the nodes send to everyone
• 80% of the nodes send to 8 hotspots


• IBTA Specification 1.2 compliant
• 648 QDR/DDR/SDR 4x InfiniBand ports
• Three-stage internal full Clos network (non-blocking)
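The synthetic traffic pattern used in this experiment (20% of the nodes sending to everyone, 80% sending to 8 hotspots) can be generated in a few lines. The Python sketch below assumes one end node per switch port, hotspots and uniform senders drawn at random, and each hotspot sender spreading its traffic over all eight hotspots; the slides do not say whether each sender targets one hotspot or all of them. `hotspot_traffic` is a hypothetical helper that only produces the source-to-destination map, not packet timing or rates.

```python
import random
from typing import Dict, List, Tuple


def hotspot_traffic(num_nodes: int, uniform_fraction: float = 0.2,
                    num_hotspots: int = 8, seed: int = 0) -> Tuple[List[int], Dict[int, List[int]]]:
    """Map each source node to the destinations it sends to.

    A uniform_fraction of the nodes send to every other node; all remaining
    nodes send only to the hotspot nodes. Hotspots and uniform senders are
    drawn at random (an assumption; the slides do not say how they are picked).
    """
    rng = random.Random(seed)
    nodes = list(range(num_nodes))
    hotspots = rng.sample(nodes, num_hotspots)
    uniform_senders = set(rng.sample(nodes, round(uniform_fraction * num_nodes)))
    pattern: Dict[int, List[int]] = {}
    for src in nodes:
        targets = nodes if src in uniform_senders else hotspots
        pattern[src] = [dst for dst in targets if dst != src]  # never send to self
    return hotspots, pattern


hotspots, pattern = hotspot_traffic(648)  # one node per port of the 648-port switch
uniform = sum(1 for dsts in pattern.values() if len(dsts) == 647)
print(len(hotspots), uniform)  # prints 8 130
```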

Figure: throughput (Gbps) in two panels, for the cases !HS, !CC; HS, !CC; HS, CC; !HS, CC; and HS, CC – QP (HS: hotspot traffic present; CC: congestion control enabled; !: absent or disabled).

Further simulation studies:
- Different traffic patterns
- Other topologies (M24: SUN DATA CENTER SWITCH 3456)

Congestion Control - State of the art

• State of technology
- InfiniBand Congestion Control (FECN/BECN)
- Datacenter Ethernet: TBD
- Much more to be expected

• State of knowledge
- Regional Explicit Congestion Notification
- Improvements on FECN/BECN
- Parametrizations
- Dynamics…?
- Impact on applications…?
- Much more to do

Fault Tolerance - Living with faults

• Static
• Reconfiguration-based
• End-to-End Rerouting
• Local Rerouting

What is network deadlock?

• Deadlock is a cycle of packets, each waiting for the next packet in the cycle to proceed before it can proceed itself.

• Deadlock freedom is a property of routing functions, not of topologies: for almost all topologies there exist reasonable but deadlocking routing functions, as well as reasonable and deadlock-free routing functions.
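Deadlock freedom of a routing function is usually checked on its channel dependency graph (CDG): one vertex per channel, and an edge from channel a to channel b whenever some route uses b immediately after a. An acyclic CDG is the classic sufficient condition for deadlock freedom (Dally and Seitz). The Python sketch below builds the CDG from explicit routes and looks for a cycle; the channel names and routes are hypothetical, and it only covers deterministic routes without virtual channels.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Set

Graph = Dict[Hashable, Set[Hashable]]


def channel_dependency_graph(routes: Iterable[Sequence[Hashable]]) -> Graph:
    """Add an edge a -> b for every pair of consecutive channels in a route."""
    cdg: Graph = defaultdict(set)
    for path in routes:
        for a, b in zip(path, path[1:]):
            cdg[a].add(b)
    return cdg


def has_cycle(graph: Graph) -> bool:
    """Depth-first search with three colors; reaching a grey vertex again means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[Hashable, int] = defaultdict(int)  # every vertex starts WHITE

    def visit(u: Hashable) -> bool:
        color[u] = GREY
        for v in graph.get(u, ()):
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(graph))


# Unidirectional 4-node ring: channel ci goes from node i to node i+1 (mod 4).
# Every node sends two hops clockwise.
ring_routes = [["c0", "c1"], ["c1", "c2"], ["c2", "c3"], ["c3", "c0"]]
print(has_cycle(channel_dependency_graph(ring_routes)))      # prints True
print(has_cycle(channel_dependency_graph(ring_routes[:3])))  # prints False
```

Both routing functions use the same ring topology; only the set of routes decides whether the dependencies close a cycle.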

Static Fault Tolerance

Checkpoint – Reconfigure – Rollback – Restart

Requires topology-agnostic routing algorithms:

LASH, TOR, LASH/TOR, L-turn, Segment-based, Up*/Down*
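Up*/Down* is a compact example of a topology-agnostic routing algorithm. The Python sketch below assumes the usual construction: a breadth-first spanning tree from a chosen root gives every switch a level, a hop is "up" if it leads to a lower level (ties broken by the lower node ID), and a route is legal only if it never takes an up hop after a down hop. The function names and the example ring are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence


def bfs_levels(adj: Dict[int, List[int]], root: int) -> Dict[int, int]:
    """Distance (in hops) from the root in a breadth-first spanning tree."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def is_up(u: int, v: int, level: Dict[int, int]) -> bool:
    """Hop u -> v goes 'up' if it ends closer to the root; equal levels are broken by node ID."""
    return (level[v], v) < (level[u], u)


def is_legal_updown(path: Sequence[int], level: Dict[int, int]) -> bool:
    """Legal Up*/Down* route: zero or more up hops followed by zero or more down hops."""
    gone_down = False
    for u, v in zip(path, path[1:]):
        if is_up(u, v, level):
            if gone_down:
                return False  # a down -> up turn is forbidden
        else:
            gone_down = True
    return True


# 4-switch ring 0-1-2-3-0 with switch 0 as root.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
lv = bfs_levels(ring, root=0)
print(is_legal_updown([1, 0, 3], lv))  # prints True  (up, then down)
print(is_legal_updown([1, 2, 3], lv))  # prints False (down, then up)
```

Because the (level, ID) pairs totally order the switches, up hops alone can never form a cycle, and neither can down hops; forbidding the down-to-up turn removes the remaining way to close a cycle in the channel dependency graph.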

Dynamic Reconfiguration

A deadlock will contain old packets waiting behind new packets as well as new packets waiting behind old packets.

Key idea: make sure that new packets never wait behind old packets.

Figure (animation): dependencies of Rold and of Rnew (the old and new routing functions), with TOKEN and STOP markers.

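The argument behind the key idea can be checked on a wait-for graph. If Rold and Rnew are each deadlock free, a deadlock cycle must contain both old and new packets, and going around such a mixed cycle requires at least one new packet waiting behind an old one; forbidding that kind of wait leaves no way to close the cycle. The Python sketch below only illustrates this graph argument on a hand-made example (packet names are hypothetical); it does not model how the reconfiguration protocol, with its TOKEN and STOP markers, actually enforces the rule.

```python
from typing import Dict, Set


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Three-color depth-first search for a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {u: WHITE for u in graph}

    def visit(u: str) -> bool:
        color[u] = GREY
        for v in graph.get(u, ()):
            state = color.get(v, WHITE)
            if state == GREY or (state == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in graph)


def forbid_new_behind_old(wait_for: Dict[str, Set[str]], epoch: Dict[str, str]) -> Dict[str, Set[str]]:
    """Drop every edge p -> q in which a new packet p waits behind an old packet q."""
    return {p: {q for q in qs if not (epoch[p] == "new" and epoch[q] == "old")}
            for p, qs in wait_for.items()}


# o* are routed by Rold, n* by Rnew; an edge p -> q means "p waits behind q".
epoch = {"o1": "old", "o2": "old", "n1": "new", "n2": "new"}
wait_for = {"o1": {"n1"}, "n1": {"o2"}, "o2": {"n2"}, "n2": {"o1"}}
print(has_cycle(wait_for))                                # prints True
print(has_cycle(forbid_new_behind_old(wait_for, epoch)))  # prints False
```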

Views – Fully Connected Subnetworks for endpoint fault-tolerance

The fat tree is divided into a set of subnetworks. Each of these constitutes a view.


Close-up of a subtree with 3 views

Each link is present in one, and only one, view.

Any path through the network is contained entirely within one view.

Only bottom-tier switches (and the end-node connections) carry traffic for several views.
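The two invariants of views (each link in exactly one view, each path inside a single view) are easy to check mechanically. The Python sketch below assumes a hypothetical two-level tree with two leaf switches and three top switches, where view i consists of top switch Ti and its links; that construction is only an illustration, not necessarily the partitioning used here. Storing views as a link-to-view map makes the first invariant hold by construction, so only the path property needs checking.

```python
from typing import Dict, FrozenSet, Optional, Sequence

Link = FrozenSet[str]


def link(a: str, b: str) -> Link:
    """Undirected link between two switches."""
    return frozenset((a, b))


def view_of_path(path: Sequence[str], link_view: Dict[Link, int]) -> Optional[int]:
    """Return the one view that contains every link of the path, or None if it spans several."""
    views = {link_view[link(u, v)] for u, v in zip(path, path[1:])}
    return views.pop() if len(views) == 1 else None


# Hypothetical tree: leaves L0 and L1, top switches T0, T1, T2; view i = Ti plus its links.
link_view = {link(leaf, f"T{i}"): i for i in range(3) for leaf in ("L0", "L1")}
print(view_of_path(["L0", "T1", "L1"], link_view))              # prints 1
print(view_of_path(["L0", "T0", "L1", "T2", "L0"], link_view))  # prints None
```

The leaf switches L0 and L1 appear in every view, matching the observation that only bottom-tier switches carry traffic for several views.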

FROOTS – Dynamic Fault Tolerance

Figure: Configuration 1 and Configuration 2 of an example network with nodes 1 to 8.

A full VL for the non-affected traffic, and two VLs for the traffic affected by faults.

Figure: handling faults (the same example network, nodes 1 to 8).

Fault tolerance – State Of The Art

• State of technology
- Topology-agnostic routing algorithms (OFED)
- Static Reconfiguration with LASH (OFED)
- Endpoint Dynamic Reconfiguration (APM in IBA)

• State of knowledge
- Dynamic Reconfiguration
- Local Rerouting

New “Compatible” Routing function

Modularity of routing

What is the problem?

Dependencies aggregate (figure with channels c1 and c2).

The aggregated dependencies in a switch fabric must either be identified and removed, or taken into consideration in how the fabric is used.

So...

A “network of networks” configuration is free from deadlocks if its channel dependency graph, extended with the aggregated dependencies in the switches, is free from cycles.
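This condition can be checked mechanically once the aggregated dependencies are known. In the Python sketch below, `external_deps` are dependencies between channels visible at the network-of-networks level, and `aggregated_deps` are the dependencies a switch fabric introduces internally between the channel a packet enters on and the channel it leaves on. All channel names and dependencies are hypothetical, and how the aggregated dependencies are derived from each fabric's routing is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Dep = Tuple[str, str]


def extended_cdg(external_deps: Iterable[Dep], aggregated_deps: Iterable[Dep]) -> Dict[str, Set[str]]:
    """Channel dependency graph extended with the fabric-internal (aggregated) dependencies."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for deps in (external_deps, aggregated_deps):
        for a, b in deps:
            graph[a].add(b)
    return graph


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Three-color depth-first search for a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = defaultdict(int)  # every vertex starts WHITE

    def visit(u: str) -> bool:
        color[u] = GREY
        for v in graph.get(u, ()):
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(graph))


external = [("c0", "c1")]
# Fabric A forwards from c1 to c2 internally; fabric B forwards from c2 back to c1.
aggregated = [("c1", "c2"), ("c2", "c1")]
print(has_cycle(extended_cdg(external, [])))          # prints False
print(has_cycle(extended_cdg(external, aggregated)))  # prints True
```

The visible dependencies alone look deadlock free; only after the aggregated dependencies are added does the cycle c1 -> c2 -> c1 appear, so this configuration would fail the condition.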

Well… what about local fault tolerance…?

Modularity of routing

• State of technology: Not present

• State of knowledge: Wide open… but there is an approach

“There is a way to do it better – find it!” (T. A. Edison)

“Simplicity is the ultimate sophistication.” (L. da Vinci)