Micro Load Balancing for Low-latency Data Center Networks

DRILL: Micro Load Balancing for Low-latency Data Center Networks. Soudeh Ghorbani, Zibin Yang, Brighten Godfrey, Yashar Ganjali, Amin Firoozshahian.

Transcript of Micro Load Balancing for Low-latency Data Center Networks.

Page 1: Micro Load Balancing for Low-latency Data Center Networks (conferences.sigcomm.org/sigcomm/2017/files/program/ts-6-1-drill.pdf)

DRILL: Micro Load Balancing for Low-latency Data Center Networks

Soudeh Ghorbani, Zibin Yang, Brighten Godfrey, Yashar Ganjali, Amin Firoozshahian

Page 2:

Data center topologies provide high capacity

Page 3:

BUT WE ARE STILL NOT USING THE CAPACITY EFFICIENTLY!

Networks experience high congestion drops as utilization approaches 25% [1].

Further improving fabric congestion response remains an ongoing effort [1].

[1] Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network, Google, SIGCOMM 2015.

Page 4:

THE GAP?

Facebook data center fabric : https://code.facebook.com/posts/360346274145943

High bandwidth provided via massive multipathing.

Page 5:


Congestion happens even when there is spare capacity to mitigate it elsewhere [2].

[2] Network Traffic Characteristics of Data Centers in the Wild, T. Benson et al., IMC 2010.


Page 6:

OPTIMAL LOAD BALANCER?

Page 7:

A STARTING POINT: “EQUAL SPLIT FLUID” (ESF)

Equally split all incoming flow to other leaves along all shortest outgoing paths.

Page 8:


Each of n spines receives 1/n of all inter-leaf traffic.

Page 9:


Therefore, any two paths between the same source and destination experience the same utilization (and mix of traffic) at all corresponding hops.

ESF is optimal for all traffic demands.

Page 10:

APPROXIMATING ESF

[Figure: load balancing schemes plotted by granularity (flow, flowcell/flowlet, packet) against control loop latency (O(µs), O(1ms), O(10ms), O(100ms), O(1s)). Load oblivious: ECMP, Presto. Load aware: CONGA, Planck, Hedera, elephant-flow scheduling.]

Page 11:

APPROXIMATING ESF

[Same figure, with microbursts marked at the O(µs) timescale.]

Page 12:

[Same figure, with the micro load balancing region highlighted.]

MICRO LOAD BALANCING
• Per-packet decisions
• Microsecond timescales
• Microscopic, i.e., local to each switch

Page 13:

SIMPLIFIED MICRO LOAD BALANCING

Switch control plane (~ECMP):
(1) Discover topology.
(2) Compute multiple shortest paths.
(3) Install into FIB.

Switch data plane:
(1) Look up the candidate ports.
(2) Check the queue occupancy of the candidate ports.
(3) Enqueue the packet in the least loaded candidate port.

[Figure: a forwarding engine with FIB entry “1.2.3.0/24 → ports 1, 2” forwarding a packet with Dst: 1.2.3.4.]
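The three data plane steps can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the FIB entry and queue contents mirror the figure:

```python
# Illustrative FIB: destination prefix -> candidate output ports.
FIB = {"1.2.3.0/24": [1, 2]}

def simplified_micro_lb(dst_prefix, queue_len, fib=FIB):
    """(1) Look up candidate ports, (2) check the occupancy of every
    candidate queue, (3) enqueue on the least loaded candidate port."""
    candidates = fib[dst_prefix]                        # step (1)
    best = min(candidates, key=lambda p: queue_len[p])  # step (2)
    queue_len[best] += 1                                # step (3): enqueue
    return best

queues = {1: 5, 2: 2}
print(simplified_micro_lb("1.2.3.0/24", queues))  # prints 2: the shorter queue
```

Unlike ECMP, which hashes a flow to one port, this checks all candidate queues for every packet, which is what becomes hard at line rate for high radix switches.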

Page 14:

CHALLENGES

• Efficient micro load balancing implementation inside a switch
• Reordering caused by per-packet decisions
• Poor decisions in asymmetric topologies

Page 15:

MICRO LOAD BALANCING INSIDE A SWITCH

Checking all queues at line rate for every packet:
• is hard, especially for high radix switches.
• can cause a synchronization effect in switches with distributed architectures.

[Figure: switch with four independent forwarding engines.]

Page 16:

MICRO LOAD BALANCING WITH DRILL

Switch control plane:
• Discover topology.
• Compute multiple shortest paths.
• Install into FIB.

Switch data plane:
• Look up the candidate ports.
• Sample queue lengths of 2 random queues plus the least loaded queue from the previous packet.
• Send the packet to the least loaded one among those three.

[Figure: switch with four independent forwarding engines.]
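The per-packet decision above (sample two random queues, add the remembered best queue from the last packet, pick the shortest) can be modeled as follows. This is an illustrative sketch of the sampling policy, not the switch implementation:

```python
import random

def drill_choice(candidates, queue_len, memory, rng):
    """One DRILL forwarding decision: sample 2 random candidate queues,
    add the least loaded queue from the previous decision, and enqueue
    the packet on the shortest of the three."""
    sampled = rng.sample(candidates, min(2, len(candidates)))
    if memory is not None:
        sampled.append(memory)
    best = min(sampled, key=lambda p: queue_len[p])
    queue_len[best] += 1
    return best  # remembered as `memory` for the next packet

rng = random.Random(0)
ports = list(range(8))
qlen = {p: 0 for p in ports}
memory = None
for _ in range(1000):           # forward 1000 packets
    memory = drill_choice(ports, qlen, memory, rng)
print(sorted(qlen.values()))    # queue lengths stay closely balanced
```

Each decision touches only three queue-length registers, which is why this fits at line rate where "check every queue" does not.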

Page 17:

MICRO LOAD BALANCING WITH DRILL


Verilog switch implementation shows that DRILL imposes less than 1% switch area overhead.

Page 18:

THE POWER OF TWO CHOICES

• n bins and n balls
• Each ball choosing a bin independently and uniformly at random
• Max load: Θ(log n / log log n)

Y. Azar, A. Z. Broder, A. R. Karlin, and E. Upfal. Balanced allocations. SIAM Journal on Computing, 1994.

Page 19:

THE POWER OF TWO CHOICES

• n bins and n balls
• Balls placed sequentially in the least loaded of d ≥ 2 random bins
• Max load: Θ(log log n / log d)

Page 20:

THE POWER OF TWO CHOICES

[Figure: max load for d=1, d=2, and optimal placement (Azar et al., Balanced allocations, SIAM Journal on Computing, 1994).]
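The balls-and-bins gap between one choice and two choices is easy to reproduce with a quick simulation (illustrative code, not from the talk):

```python
import random

def max_load(n, d, rng):
    """Throw n balls into n bins, placing each ball in the least loaded
    of d bins chosen uniformly at random; return the maximum bin load."""
    bins = [0] * n
    for _ in range(n):
        choices = [rng.randrange(n) for _ in range(d)]
        bins[min(choices, key=lambda b: bins[b])] += 1
    return max(bins)

rng = random.Random(0)
n = 100_000
print("d=1:", max_load(n, 1, rng))  # grows like log n / log log n
print("d=2:", max_load(n, 2, rng))  # grows like log log n / log 2: far smaller
```

Even d=2 collapses the max load to within a small additive constant of the optimal round-robin placement, which is the property DRILL exploits.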

Page 21:

WHAT WE WANT

The switch has queues instead of bins, and packets leave the queues as they drain.

Each forwarding engine chooses a queue independently, with no coordination.

Page 22:

[Figure: four forwarding engines independently choosing among shared output queues.]

Page 23:

THE PITFALLS OF CHOICE

Topology: Clos topology with 8 spines, 10 leaves, each connected to 48 hosts. Links: 10 Gbps. Switches have 6 forwarding engines. Network is under 75% load.

Trace: Inside the social network’s (datacenter) network, Facebook, SIGCOMM 2015.

[Figure: standard deviation of queue lengths vs. d (number of sampled queues).]

Page 24:

[Figure repeated with a second panel: standard deviation of queue lengths vs. d.]

Page 25:

[Same figure, with a balls-and-bins curve added for comparison.]

Page 26:

THE PITFALLS OF CHOICE


DRILL is provably stable and delivers 100% throughput.

Page 27:

CHALLENGES

• Efficient micro load balancing implementation inside a switch
• Reordering caused by per-packet decisions
• Poor decisions in asymmetric topologies

Page 28:

THOU SHALT NOT SPLIT FLOWS!

Splitting flows → Reordering → Throughput degradation

Page 29:

DRILL SPLITS FLOWS ALONG MANY PATHS

Splitting flows → Reordering → Throughput degradation

Page 30:

Splitting flows → Reordering → Throughput degradation

DRILL SPLITS FLOWS ALONG MANY PATHS, YET CAUSES LITTLE REORDERING!

Page 31:


Page 32:

Splitting flows → Latency variance → Reordering → Throughput degradation

DRILL SPLITS FLOWS ALONG MANY PATHS, YET CAUSES LITTLE REORDERING!

Page 33:

Splitting flows → Latency variance → Reordering → Throughput degradation

DRILL SPLITS FLOWS ALONG MANY PATHS, BUT HAS LOW QUEUEING DELAY VARIANCE AND THEREFORE CAUSES LITTLE REORDERING!

Insight: Queueing delay variance is so small that it doesn’t matter which path a packet takes.

Page 34:

• Efficient micro load balancing implementation inside a switch
• Reordering caused by per-packet decisions
• Poor decisions in asymmetric topologies

Page 35:

ASYMMETRY: WHAT CAN GO WRONG?

[Figure: asymmetric topology; which path should traffic to dst take?]

KEY PROBLEMS
1. Different queueing results in reordering.
2. TCP flows split along multiple paths will respond to congestion on the worst path, causing bandwidth inefficiency.

Page 36:

ASYMMETRY: WHAT CAN GO WRONG?

ROOT CAUSE: Flow splitting across asymmetric paths.

KEY IDEA: Decompose the network into symmetric components and micro load balance inside components.

Page 37:

ASYMMETRY: GRAPH DECOMPOSITION

Intuition: Two paths are symmetric if they have an equal number of hops and the ith queue carries the same flows on all paths, for all i.

Page 38:

MICRO LOAD BALANCING IN ASYMMETRIC TOPOLOGIES

Switch control plane:
• Discover topology.
• Compute multiple shortest paths.
• Decompose the network into symmetric components.

Switch data plane:
• Hash flows to a component.
• Micro load balance inside the component.

[Figure: forwarding engine hashing a flow to a component, then choosing a queue within it.]
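The asymmetric-topology data plane can be sketched as below. The component list, flow-key format, and hash function are illustrative assumptions, and any capacity-based weighting of components is omitted:

```python
import zlib

def pick_component(flow_key, components):
    """Hash a flow to one symmetric component so that all of the flow's
    packets stay on paths with identical queueing behavior."""
    return components[zlib.crc32(flow_key.encode()) % len(components)]

def forward(flow_key, queue_len, components):
    """Micro load balance within the flow's component: enqueue on the
    least loaded port of that component."""
    comp = pick_component(flow_key, components)
    best = min(comp, key=lambda p: queue_len[p])
    queue_len[best] += 1
    return best

# Example: ports 1 and 2 sit on symmetric paths and form one component;
# port 3 (say, behind a degraded link) forms another.
components = [[1, 2], [3]]
qlen = {1: 0, 2: 0, 3: 0}
flow = "10.0.0.1:5000->10.0.1.1:80"
forward(flow, qlen, components)
forward(flow, qlen, components)  # same flow, same component both times
```

Packets of one flow never cross components, so the asymmetric-path reordering and worst-path TCP problems disappear, while per-packet micro load balancing still runs freely inside each component.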

Page 39:

EXPERIMENTAL EVALUATION

Environment:
• OMNET++ simulator
• Ported Linux 2.6 TCP implementation

Topology:
• 2- and 3-stage Clos
• Asymmetric topologies
• Varying failures

Workload:
• Real measurements [1]
• Synthetic
• Incast patterns

[1] Inside the social network’s (datacenter) network, Facebook, SIGCOMM 2015.

Page 40:

DRILL REDUCES FCT, ESPECIALLY UNDER LOAD
Clos with 16 spines, 16 leaves, each connected to 20 hosts; links: 10 Gbps.

[Figure: flow completion times for ECMP, CONGA, Presto, and DRILL at 30% load and at 80% load.]

Page 41:

DRILL ALLOWS HIGHER UTILIZATION WITH EQUAL LATENCY
Clos with 16 spines, 16 leaves, each connected to 20 hosts; links: 10 Gbps.

[Figure: utilization gains of 1.52x, 1.44x, and 1.28x.]

Page 42:

DRILL HANDLES ASYMMETRY
Clos with 4 spines and 16 leaves, each connected to 20 hosts; edge links: 10 Gbps, core links: 40 Gbps. 5 randomly selected leaf-spine links fail.

[Figure: flow completion times for ECMP, CONGA, Presto, and DRILL under failures.]

Page 43:

DRILL CUTS THE TAIL LATENCY IN INCAST
Clos with 4 spines and 16 leaves, each connected to 20 hosts; edge links: 10 Gbps, core links: 40 Gbps. Network is under 20% load. 10% of hosts send simultaneous requests for 10KB flows to 10% of the other hosts (all randomly selected).

[Figure: tail flow completion times for ECMP, CONGA, Presto, and DRILL.]

Page 44:

UNDER THE HOOD

Leaf → Spine: big improvement
Spine → Leaf: some improvement
Leaf → Host: no improvement

[Figure: duplicate ACKs sufficient to trigger TCP retransmission, for DRILL, Presto, per-packet round robin, and per-packet random.]

Fine granularity is helpful but not enough: load-awareness keeps duplicate ACKs under control.

Page 45:

DRILL’S KEY IDEAS

• Micro-LB: randomized algorithm
• Graph decomposition
• Simple switch design: less than 1% area overhead, no host changes, no global coordination

KEY RESULTS

• Low FCT; 2.5x tail FCT reduction in incast
• Theoretical analysis: stable and 100% throughput