IBM Research GmbH, Zürich Research Laboratory

R3C2: Reactive Route & Rate Control for CEE

Mitch Gusat, Daniel Crisan, Cyriel Minkenberg, and Casimer DeCusatis

Outline

• Introduction
• Background
• R3C2
• Evaluation
• Conclusions

Introduction & Motivation

• Context: Congestion management in L2 datacenter networks
• Existing congestion mgmt schemes for CEE
  – Rate-only: Quantized Congestion Notification (QCN)
    • Rate/window reduction doesn't benefit trading, HPC, BA apps
  – Route-only: Switch Adaptive Routing
    • Based on QCN's load sensor
    • Exploits path diversity
• Proposal: R3C2
  – Dual Route & Rate control
  – Source-driven

QCN 802.1Qau

[Figure: QCN diagram showing end nodes with NICs attached to a network of switches, a rate limiter (RL) at a NIC, and CNMs delivered to the end nodes]

R3C2 Concept

Take advantage of CNMs at the source for adaptive load-balancing

• Congestion Point issues CNMs
  – Where is the hotspot?
  – How severe is the hotspot?
• Source receives the CNMs
  – Identifies the most severe hotspots
  – Reroutes traffic around the hotspots
  – Splits flows and rate-limits subflows
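The "identify the most severe hotspots" step at the source can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the `Cnm` record, its field names, and the rule of keeping the largest severity seen per congestion point are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnm:
    """Hypothetical congestion notification: where the hotspot is and how severe."""
    congestion_point: str  # identifier of the switch/port that issued the CNM
    severity: int          # assumed: a larger value means a more severe hotspot


def rank_hotspots(cnms):
    """Return congestion points ordered from most to least severe.

    Severity per congestion point is the maximum seen so far (an assumption;
    a real source might instead age or average the feedback).
    """
    worst = defaultdict(int)
    for cnm in cnms:
        worst[cnm.congestion_point] = max(worst[cnm.congestion_point], cnm.severity)
    # Sort by descending severity, breaking ties by name for a stable order.
    return sorted(worst, key=lambda cp: (-worst[cp], cp))


if __name__ == "__main__":
    received = [Cnm("S2_0", 12), Cnm("S3_1", 40), Cnm("S2_0", 25)]
    print(rank_hotspots(received))
```

The ranked list is what a source would consult when deciding which paths to steer traffic away from.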

Source Routing in CEE: VLAN

• Plain Ethernet is not source-routed
• Solution: VLAN
  – One tree per VLAN
• Source
  – Set VLAN# at injection (path selection)

[Figure: network with end nodes P0 to P7 and three levels of switches (S1_0 to S1_3, S2_0 to S2_3, S3_0 to S3_3), with one tree each for VLAN0, VLAN1, VLAN2, and VLAN3]
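Since each VLAN corresponds to one tree, choosing a path at the source amounts to choosing the VLAN ID written into the frame's 802.1Q tag. The sketch below builds that 4-byte tag. The tag layout (TPID 0x8100, then PCP, DEI, and a 12-bit VID) is standard 802.1Q; the `tag_for_path` helper and its `base_vid` offset are hypothetical, because the slides label the trees VLAN0 to VLAN3 without saying which VIDs are configured.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an IEEE 802.1Q tag


def vlan_tag(vid, pcp=0, dei=0):
    """Build the 4-byte 802.1Q tag: 16-bit TPID, then PCP(3) | DEI(1) | VID(12)."""
    if not 1 <= vid <= 4094:  # VID 0 and 4095 are reserved by 802.1Q
        raise ValueError("VID must be in 1..4094")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP must be in 0..7 and DEI must be 0 or 1")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)


def tag_for_path(path_index, num_paths=4, base_vid=100):
    """Map a path (tree) index to a VLAN tag.

    base_vid is a hypothetical offset chosen for this example only.
    """
    if not 0 <= path_index < num_paths:
        raise ValueError("unknown path index")
    return vlan_tag(base_vid + path_index)


if __name__ == "__main__":
    for path in range(4):
        print(path, tag_for_path(path).hex())
```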

R3C2 Algorithm

[Figure: alternative paths between P0 and P7 through S1_0, S2_0 to S2_3, S3_0 to S3_3, and S1_3]

• No overload: Deterministic single path
• Congestion: Activate additional paths
• Path activation: avoid hotspots
• Use RL along each path
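The activation rules listed above can be illustrated with a small per-destination state object. This is a sketch under assumptions, not the published algorithm: the class name, the "activate one extra path per congestion event" policy, and the hop lists in the usage example are hypothetical, and the per-path rate limiters are left out.

```python
class PathSelector:
    """Per-destination path set: one default path, more activated under congestion."""

    def __init__(self, paths, default):
        # paths: mapping of path name -> switches it traverses (assumed known)
        self.paths = dict(paths)
        self.active = [default]   # no overload: deterministic single path
        self.hotspots = set()

    def on_congestion(self, congestion_point):
        """Record a hotspot and try to activate one additional hotspot-free path.

        Returns the newly activated path name, or None if no path qualifies.
        The already active paths stay active (they would be rate-limited).
        """
        self.hotspots.add(congestion_point)
        for name, hops in self.paths.items():
            if name not in self.active and self.hotspots.isdisjoint(hops):
                self.active.append(name)
                return name
        return None


if __name__ == "__main__":
    # Hypothetical hop lists; they do not reproduce the wiring in the slides.
    selector = PathSelector(
        {
            "VLAN0": ["S2_0", "S3_0"],
            "VLAN1": ["S2_0", "S3_1"],
            "VLAN2": ["S2_1", "S3_2"],
            "VLAN3": ["S2_1", "S3_3"],
        },
        default="VLAN0",
    )
    print(selector.on_congestion("S2_0"), selector.active)
```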

R3C2 Reaction Point

• Packet assigned the VLAN# of the 1st eligible Rate Limiter

[Figure: Reaction Point for destination D. Packets from upper layers for a given destination D enter a TX queue, pass through one rate limiter per flow (RL flow 1, RL flow 2, RL flow 3) that assigns VLAN1, VLAN2, or VLAN3, and are multiplexed (MUX) to the network]
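The "first eligible Rate Limiter" rule of the Reaction Point can be sketched as follows. The token-bucket model, the class and function names, and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are assumptions made for illustration; the slides do not specify how eligibility is determined.

```python
class RateLimiter:
    """Token-bucket limiter bound to one VLAN, i.e. one path (assumed model)."""

    def __init__(self, vlan, rate_bps, burst_bytes):
        self.vlan = vlan
        self.rate_bps = rate_bps
        self.burst_bytes = burst_bytes
        self.tokens = burst_bytes   # bytes currently available
        self.last = 0.0             # time of the last refill, in seconds

    def refill(self, now):
        """Add the tokens earned since the last refill, capped at the burst size."""
        elapsed = now - self.last
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bps / 8)
        self.last = now

    def eligible(self, size_bytes, now):
        """A limiter is eligible if it holds enough tokens for the packet."""
        self.refill(now)
        return self.tokens >= size_bytes


def assign_vlan(limiters, size_bytes, now):
    """Return the VLAN of the first eligible limiter and charge it, else None."""
    for rl in limiters:
        if rl.eligible(size_bytes, now):
            rl.tokens -= size_bytes
            return rl.vlan
    return None  # packet waits in the TX queue until a limiter becomes eligible


if __name__ == "__main__":
    rls = [RateLimiter(1, 8_000_000, 1500), RateLimiter(2, 8_000_000, 1500)]
    for t in (0.0, 0.0, 0.0, 0.002):
        print(t, assign_vlan(rls, 1500, t))
```

Ordering the limiters by preference means the default path is used whenever it has capacity, and later paths only absorb the excess.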

Evaluation Methodology

• Venus + Dimemas simulator
• Traffic
  – Synthetic: permutations + hotspot
  – HPC Traces:
    • NAS: BT, CG, FT, IS, MG
    • WRF, NAMD, Liso, Airbus
• Model parameters
  – 10Gbps CEE with MTU = 1500B
  – QCN and PFC: 802 DCB settings
• Topology: 2-ary n-tree

Permutation Traffic

[Figure: bar chart of relative total throughput (axis ticks 0.2 to 1) for Switch AR, Switch AR with RL, Random, Random with RL, Deterministic, Deterministic with RL, and R3C2 under the Bit reverse, Bit rotation, Bit shuffle, and Bit transpose permutations]
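For readers unfamiliar with the four permutation patterns named in the chart, the sketch below gives their common textbook definitions as source-to-destination address mappings. The slides do not define the patterns, so the exact variants used in the evaluation are an assumption here, as are the function names; `src` is assumed to fit in `bits` bits.

```python
def bit_reverse(src, bits):
    """Destination is the source address with its bit order reversed."""
    return int(format(src, f"0{bits}b")[::-1], 2)


def bit_rotation(src, bits):
    """Rotate the address right by one bit (the LSB wraps to the MSB)."""
    return (src >> 1) | ((src & 1) << (bits - 1))


def bit_shuffle(src, bits):
    """Rotate the address left by one bit (the MSB wraps to the LSB)."""
    mask = (1 << bits) - 1
    return ((src << 1) & mask) | (src >> (bits - 1))


def bit_transpose(src, bits):
    """Swap the upper and lower halves of the address; needs an even bit count."""
    if bits % 2:
        raise ValueError("transpose needs an even number of address bits")
    half = bits // 2
    low_mask = (1 << half) - 1
    return ((src & low_mask) << half) | (src >> half)


if __name__ == "__main__":
    bits = 4  # 16 end nodes, chosen only for this example
    for pattern in (bit_reverse, bit_rotation, bit_shuffle, bit_transpose):
        print(pattern.__name__, [pattern(s, bits) for s in range(1 << bits)])
```

Each mapping is a permutation: every source sends to exactly one destination and every destination receives from exactly one source.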

Hotspot Traffic Scenario

[Figure: paths between P0 and P7 through S1_0, S2_0 to S2_3, S3_0 to S3_3, and S1_3, annotated with C=25%, C=10%, C=10%, and C=50%, plus a single 95% flow]

Hotspot Traffic

[Figure: throughput in MB/s (0 to 1400) over time in seconds (0 to 0.1) for Random, Switch AR, and R3C2]

HPC Traces: Hotspot

[Figure: bar chart of relative slowdown (0 to 4.5; Min, Average, Max) for R3C2, Switch AR, Switch AR with RL, Deterministic, Random, Random with RL, Deterministic with RL, Hashed, and Hashed with RL]

Conclusions

• Introduced R3C2
  – Source-driven adaptive routing scheme
  – QCN and VLAN are key
• Dual Route & Rate control
  – Improved stability and performance
• Performance benefits
  – 80% over Deterministic
  – 40% over Random

Backup

Routing Schemes

[Figure: three panels (Deterministic, Random, Switch AR), each plotting probability (0 to 1) against path (1 to 4)]

Routing Schemes

[Figure: four panels (Deterministic, Random, Switch AR, R3C2), each plotting probability (0 to 1) against path (1 to 4)]

HPC Traces: No Hotspot

[Figure: bar chart of relative slowdown (0 to 6.5; Min, Average, Max) for Random, Random with RL, R3C2, Switch AR, Switch AR with RL, Deterministic, Hashed, Deterministic with RL, and Hashed with RL]

Hotspot Traffic (1)

[Figure: throughput in MB/s (0 to 1400) over time in seconds (0 to 0.1) for Random and Random with RL]

Hotspot Traffic (2)

[Figure: throughput in MB/s (0 to 1400) over time in seconds (0 to 0.1) for Switch AR and Switch AR with RL]

Hotspot Traffic (3)

[Figure: throughput in MB/s (0 to 1400) over time in seconds (0 to 0.1) for Deterministic and R3C2]