Tagger: Practical PFC Deadlock Prevention in Data Center Networks
Shuihai Hu* (HKUST), Yibo Zhu, Peng Cheng, Chuanxiong Guo* (Toutiao), Kun Tan* (Huawei), Jitendra Padhye, Kai Chen (HKUST)
Microsoft
* Work done while at Microsoft
CoNEXT 2017, Incheon, South Korea
RDMA is Being Widely Deployed
RDMA: Remote Direct Memory Access
• High throughput, low latency with low CPU overhead
• Microsoft, Google, etc. are deploying RDMA
[Figure: two hosts, each with an RDMA application that reaches its RDMA NIC through kernel bypass; the NICs are connected by a lossless network (with PFC).]
Priority Flow Control (PFC)
• PAUSE upstream switch when PFC threshold reached
  - Avoid packet drop due to buffer overflow
[Figure: congestion fills a switch queue up to the PFC threshold (3 pkts in the example), and the switch sends PAUSE to its upstream switch.]
A Simple Illustration of PFC Deadlock
• Due to Cyclic Buffer Dependency (CBD): A->B->C->A
• Not just a theoretical problem: we have seen it in our data centers too!
[Figure: Switch A, Switch B, and Switch C form a cycle; each reaches its PFC threshold and PAUSE frames propagate around the cycle.]
CBD in the Clos Network
[Figure: Clos network with switches T1-T4, L1-L4, and S1-S2.]
• Consider two flows that initially follow shortest UP-DOWN paths
[Figure: the Clos network with flow 1 and flow 2 on shortest UP-DOWN paths.]
• Due to link failures, both flows are locally rerouted to non-shortest paths
[Figure: flow 1 and flow 2 after rerouting.]
• These two DOWN-UP bounced flows create a CBD: L2->S1->L3->S2->L2
[Figure: flow 1 and flow 2 in the Clos network, next to the buffer dependency graph over the RX queues of L2, S1, L3, and S2.]
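A CBD is a cycle in the buffer dependency graph, so checking a set of paths for CBD can be sketched as a cycle search. The sketch below is illustrative only. It assumes one RX queue per incoming link, written as the pair (upstream, switch), and the hop sequences are hypothetical paths chosen to reproduce the cycle L2->S1->L3->S2->L2; they are not taken from the figure.

```python
from collections import defaultdict

def dependency_edges(paths):
    """Build edges between RX queues; queue (u, v) is the RX queue at v for link u->v."""
    edges = defaultdict(set)
    for path in paths:
        queues = list(zip(path, path[1:]))
        # A packet in one RX queue can only drain into the next queue on its path.
        for a, b in zip(queues, queues[1:]):
            edges[a].add(b)
    return edges

def has_cbd(paths):
    """Return True if the buffer dependency graph of `paths` contains a cycle."""
    edges = dependency_edges(paths)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every queue starts WHITE (unvisited)

    def visit(u):
        color[u] = GREY
        for v in edges.get(u, ()):
            if color[v] == GREY:          # back edge: a cycle exists
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(edges))

# Hypothetical shortest UP-DOWN paths (no bounce).
shortest_paths = [["T1", "L2", "S1", "L3", "T4"],
                  ["T3", "L3", "S2", "L2", "T1"]]
# Hypothetical rerouted paths: one bounces at L3, the other at L2.
bounced_paths = [["T1", "L2", "S1", "L3", "S2", "L4", "T4"],
                 ["T3", "L3", "S2", "L2", "S1", "L1", "T1"]]

print(has_cbd(shortest_paths))  # False
print(has_cbd(bounced_paths))   # True
```

Modelling queues per incoming link rather than per switch matters here: a per-switch model would report cycles that do not correspond to any real buffer dependency.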
Real in Production Data Centers?
Packet reroute measurements in more than 20 data centers:
~100,000 DOWN-UP reroutes!
Handling Deadlock is Important
• #1: transient problem -> PERMANENT deadlock
  - Transient loops due to link failures
  - Packet flooding
  - …
• #2: small deadlock can cause large deadlock
[Figure: PAUSE frames spreading from a small deadlock to more switches.]
Three Key Challenges
What are the challenges in designing a practical deadlock prevention solution?
1. No change to existing routing protocols or hardware
2. Link failures & routing errors are unavoidable at scale
3. Switches support a limited number of lossless priorities: at most 8 (and typically only two can be used)
The Existing Deadlock Prevention Solutions
• #1: deadlock-free routing protocols
  - not supported by commodity switches (fail challenge #1)
  - do not work with link failures or routing errors (fail challenge #2)
• #2: buffer management schemes
  - require a lot of lossless priorities (fail challenge #3)
Our answer: Tagger
TAGGER DESIGN
Important Observation
• Fat-tree [Sigcomm’08], VL2 [Sigcomm’09]: desired path set is all shortest paths
• BCube [Sigcomm’09], HyperX [SC’09]: desired path set is dimension-order paths
Takeaway: In a data center, we can ask the operator to supply a set of expected lossless paths (ELP)!
Basic Idea of Tagger
1. Ask operators to provide:
   - topology & expected lossless paths (ELP)
2. Packets carry tags while in the network
3. Pre-install match-action rules at switches for tag manipulation and packet queueing
   - packets that travel over ELP: lossless queues, and CBD never forms
   - packets that deviate from ELP: lossy queue, thus PFC is not triggered
Illustrating Tagger for Clos Topology
• ELP = all shortest paths (CBD-free)
• Root cause of CBD: packets deviate from UP-DOWN routing!
[Figure: Clos network with flow 1 and flow 2.]
• Under Tagger, packets carry tags while travelling in the network
• Initially, tag value = NoBounce
• At switches, Tagger pre-installs match-action rules for tag manipulation

Match-action rules installed at switches (Tag, InPort, OutPort are the match; NewTag is the action):

Tag        InPort   OutPort   NewTag
NoBounce   S1       S2        Bounced
…          …        …         …

[Figure: flow 1 enters the Clos network with tag = NoBounce.]
• Packet of flow 1 received by switch L3 with tag = NoBounce
• Rewrite the tag once a DOWN-UP bounce is detected
• Down-up bounce observed at L3: the packet matches the rule (NoBounce, S1, S2), so its tag is rewritten from NoBounce to Bounced
• S2 knows it is a bounced packet that deviates from ELP -> placed in the lossy queue
• No PFC PAUSE sent from S2 to L3 -> buffer dependency from L3 to S2 removed
[Figure: flow 1 arrives at S2 with tag = Bounced.]
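The tag rewrite at L3 and the queueing decision at S2 can be sketched as a table lookup. This is a minimal illustration, not switch configuration: the `RULES` layout, the function names `rewrite_tag` and `queue_for`, and the queue labels are assumptions made for the example. Only the rule (NoBounce, S1, S2) -> Bounced comes from the slides.

```python
# Hypothetical per-switch rule tables: (tag, in_port, out_port) -> new_tag.
RULES = {
    "L3": {("NoBounce", "S1", "S2"): "Bounced"},  # the rule shown on the slides
}

def rewrite_tag(switch, tag, in_port, out_port):
    """Return the tag a packet carries after `switch` forwards it.

    If no rule matches, the tag is left unchanged.
    """
    return RULES.get(switch, {}).get((tag, in_port, out_port), tag)

def queue_for(tag):
    """Pick the queue class at the next switch based on the tag alone."""
    # Packets still on an expected lossless path stay lossless; bounced
    # packets go to the lossy queue, so they never trigger PFC PAUSE.
    return "lossless" if tag == "NoBounce" else "lossy"

# Flow 1 arrives at L3 from S1 and is sent back up to S2 (a DOWN-UP bounce).
tag = rewrite_tag("L3", "NoBounce", "S1", "S2")
print(tag, queue_for(tag))  # Bounced lossy

# A packet forwarded down from L3 to T3 matches no rule and keeps its tag.
tag = rewrite_tag("L3", "NoBounce", "S1", "T3")
print(tag, queue_for(tag))  # NoBounce lossless
```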
• Tagger will do the same for packets of flow 2
• 2 buffer dependency edges are removed -> CBD is eliminated
[Figure: buffer dependency graph over the RX queues of L2, S1, L3, and S2, shown before (CBD: L2->S1->L3->S2->L2) and after the two edges are removed.]
What If ELP Has CBD?
• ELP = shortest paths + 1-bounce paths (ELP has CBD now!)
Segmenting ELP into CBD-free Subsets
• Two bounced paths are in ELP now
• Path segments before bounce: only have UP-DOWN paths, no CBD
• Path segments after bounce: only have UP-DOWN paths, no CBD
[Figure: three copies of the Clos network with flow 1 and flow 2, showing the full bounced paths, the segments before the bounce, and the segments after the bounce.]
Isolating Path Segments with Tags
• tag 1 -> path segments before bounce
• tag 2 -> path segments after bounce
• Adding a rule at switch L3: (Tag = 1, InPort = S1, OutPort = S2) -> NewTag = 2
[Figure: flow 1 carries tag = 1 before the bounce and tag = 2 after it.]
No CBD after Segmentation
• Packets with tag i -> i-th lossless queue
[Figure: buffer dependency graph over queues 1 and 2 of L2, S1, L3, and S2, in which the former cycle L2->S1->L3->S2->L2 no longer closes; two copies of the Clos network show the tag 1 and tag 2 segments of flow 1 and flow 2.]
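To see why separating segments by tag breaks the cycle, the dependency graph can be rebuilt with one node per (incoming link, tag) queue. The sketch below rests on assumptions: layers are inferred from the first letter of the switch name (T below L below S), the two paths are hypothetical bounced paths, and Kahn's topological sort stands in as the cycle check.

```python
from collections import defaultdict

LAYER = {"T": 0, "L": 1, "S": 2}  # assumed naming: first letter gives the layer

def layer(switch):
    return LAYER[switch[0]]

def tagged_queues(path):
    """Return the (upstream, switch, tag) queue used at every hop of `path`."""
    tag, queues = 1, []
    for i in range(1, len(path)):
        queues.append((path[i - 1], path[i], tag))  # queue chosen by arrival tag
        if i + 1 < len(path):
            came_down = layer(path[i - 1]) > layer(path[i])
            goes_up = layer(path[i + 1]) > layer(path[i])
            if came_down and goes_up:               # DOWN-UP bounce: bump the tag
                tag += 1
    return queues

def has_cycle(queue_lists):
    """Kahn's algorithm: a cycle exists if some node can never be removed."""
    edges, nodes = defaultdict(set), set()
    for queues in queue_lists:
        nodes.update(queues)
        for a, b in zip(queues, queues[1:]):
            edges[a].add(b)
    indegree = {n: 0 for n in nodes}
    for a in edges:
        for b in edges[a]:
            indegree[b] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in edges.get(n, ()):
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    return removed != len(nodes)

paths = [["T1", "L2", "S1", "L3", "S2", "L4", "T4"],   # bounces at L3
         ["T3", "L3", "S2", "L2", "S1", "L1", "T1"]]   # bounces at L2

tagged = [tagged_queues(p) for p in paths]
untagged = [[(u, v, 1) for (u, v, _) in q] for q in tagged]  # single shared queue

print(has_cycle(untagged))  # True
print(has_cycle(tagged))    # False
```

With a single queue per link the two paths share the queues at S1 and S2 and close the loop. With tags, the post-bounce segments land in queue 2, so each path becomes a simple chain.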
What If k-bounce Paths Are All in ELP?
• ELP = shortest up-down paths + 1-bounce paths + … + k-bounce paths
• Solution: just segment ELP into k+1 CBD-free subsets based on the number of bounces!
Summary: Tagger Design for Clos Topology
1. Initially, packets carry tag = 1
2. Pre-install match-action rules at switches:
   • DOWN-UP bounce: increase tag by 1
   • Enqueue packets with tag i to the i-th lossless queue (i <= k+1)
   • Enqueue packets with tag i to the lossy queue (i > k+1)
For Clos topology, Tagger is optimal in terms of # of lossless priorities.
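The summary rules can be traced along a single path as follows. This is a sketch under stated assumptions: a path is described only by the direction of each link ("UP" or "DOWN"), and the function name `queues_along_path` and the queue labels are invented for the example.

```python
def queues_along_path(directions, k):
    """Return the queue used at the receiving switch of each link.

    directions: "UP"/"DOWN" label of every link along the path.
    k: the largest number of bounces allowed in ELP.
    """
    tag = 1                      # rule 1: packets start with tag = 1
    previous = None
    queues = []
    for direction in directions:
        if previous == "DOWN" and direction == "UP":
            tag += 1             # DOWN-UP bounce: increase tag by 1
        if tag <= k + 1:
            queues.append(f"lossless-{tag}")   # i-th lossless queue
        else:
            queues.append("lossy")             # beyond ELP: no PFC
        previous = direction
    return queues

# One bounce, with 1-bounce paths in ELP (k = 1): the path stays lossless.
print(queues_along_path(["UP", "UP", "DOWN", "UP", "DOWN", "DOWN"], k=1))
# ['lossless-1', 'lossless-1', 'lossless-1', 'lossless-2', 'lossless-2', 'lossless-2']

# Two bounces with k = 1: after the second bounce the packet becomes lossy.
print(queues_along_path(["UP", "DOWN", "UP", "DOWN", "UP", "DOWN"], k=1))
# ['lossless-1', 'lossless-1', 'lossless-2', 'lossless-2', 'lossy', 'lossy']
```

A path with at most k bounces never exceeds tag k+1, which is why k+1 lossless queues suffice in this scheme.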
How to Implement Tagger?
• Use the DSCP field in the IP header as the tag carried in packets
• Build a 3-step match-action pipeline with basic ACL rules available in commodity switches
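Carrying the tag in DSCP amounts to reading and writing the upper six bits of the IPv4 TOS byte, whose lower two bits hold ECN. The sketch below shows only that bit manipulation. Using the tag number directly as the DSCP value is an assumption for illustration; the slides do not give the actual mapping, and the function names are invented.

```python
DSCP_SHIFT = 2      # DSCP occupies the upper 6 bits of the IPv4 TOS byte
ECN_MASK = 0b11     # the lower 2 bits carry ECN and must be preserved

def read_tag(tos):
    """Extract the tag (DSCP value) from a TOS byte."""
    return (tos & 0xFF) >> DSCP_SHIFT

def write_tag(tos, tag):
    """Return a TOS byte with DSCP set to `tag` and ECN bits unchanged."""
    if not 0 <= tag < 64:
        raise ValueError("DSCP is a 6-bit field")
    return (tag << DSCP_SHIFT) | (tos & ECN_MASK)

tos = write_tag(0b00000001, 1)   # tag = 1, ECN bits = 01
print(tos, read_tag(tos))        # 5 1

# A DOWN-UP bounce rewrites the tag in place; the ECN bits are preserved.
tos = write_tag(tos, read_tag(tos) + 1)
print(tos, read_tag(tos), tos & ECN_MASK)  # 9 2 1
```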
Tagger Meets All Three Challenges
1. Works with existing routing protocols & hardware
2. Works with link failures & routing errors
3. Works with a limited number of lossless queues
More Details in the Paper
• Proof of deadlock freedom
• Analysis & discussions
  - Algorithm complexity
  - Optimality
  - Compression of match-action rules
  - …
Evaluation-1: Tagger Prevents Deadlock
• Scenario: two flows form a CBD
• Tagger avoids CBD caused by bounced flows, and prevents deadlock!
[Figure: Clos network with flow 1 and flow 2, annotated "deadlock!".]
Evaluation-2: Scalability of Tagger
Tagger is scalable in terms of number of lossless priorities and ACL rules.
[Table: match-action rules and priorities required for Jellyfish topology.]
* Last entry includes additional 20,000 random paths.
Evaluation-3: Overhead of Tagger
Tagger rules have no impact on throughput and latency
Conclusion
• Tagger: a tagging system that guarantees deadlock-freedom
  - Practical:
    • requires no change to existing routing protocols
    • implementable with existing commodity switching ASICs
    • works with a limited number of lossless priorities
  - General:
    • works with any topology
    • works with any ELP
Thanks!