Fault Tolerance in Network-on-Chip by Using Single Error Correction and Double Error Detection
Traffic Engineering with Forward Fault Correction (FFC)
-
Upload
ginger-thornton -
Category
Documents
-
view
25 -
download
1
description
Transcript of Traffic Engineering with Forward Fault Correction (FFC)
![Page 1: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/1.jpg)
Traffic Engineering with Forward Fault Correction (FFC)
Hongqiang “Harry” Liu, Srikanth Kandula, Ratul Mahajan, Ming Zhang,
David Gelernter (Yale University)
1
![Page 2: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/2.jpg)
Cloud services require large network capacity
2
Cloud Services
Cloud NetworksGrowing traffic
Expensive(e.g. cost of WAN: $100M/year)
![Page 3: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/3.jpg)
TE is critical to effectively utilizing networks
3
Traffic Engineering
WAN Network
• Microsoft SWAN• Google B4• ……
Datacenter Network
• Devoflow• MicroTE• ……
![Page 4: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/4.jpg)
But, TE is also vulnerable to faults
4
TE controller
Network
Network view
Network configuration
Frequent updates for high utilization
Control-plane faults
Data-plane faults
![Page 5: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/5.jpg)
Control plan faults
5
Failures or long delays to configure a network device
TE Controller Switch
TE configurations
Memory shortage
RPC failure
Firmware bugs Overloaded CPU
![Page 6: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/6.jpg)
Congestion due to control plane faults
s1
s2
s3
s4
Link Capacity: 10 7
3
3
7
10s1
10
10
10
10
s2
s3
s4
6
New Flows (traffic demands):s1 s2 (10)s1 s3 (10)s1 s4 (10)
Configuration failure
Congestions1
7
3
10
10
10
s2
s3
s4
10
s2 s4 (10)
s3 s4 (10)
![Page 7: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/7.jpg)
Data plane faults
7
Link and switch failures
s1
7
3
3
7
s2
s3
s4
link failure link failure
s110
7
s2
s3
s43congestion
Rescaling: Sending traffic proportionally to residual paths
Link Capacity: 10
s2 s4 (10)
s3 s4 (10)
![Page 8: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/8.jpg)
Control and data plane faults in practice
8
Control plane: fault rate = 0.1% -- 1% per TE update.
Data plane: fault rate = 25% per 5 minutes.
In production networks:• Faults are common.• Faults cause severe congestion.
![Page 9: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/9.jpg)
State of the art for handling faults
•Heavy over-provisioning
•Reactive handling of faults• Control plane faults: retry • Data plane faults: re-compute TE and update networks
9
Cannot prevent congestion
Slow(seconds -- minutes)
Blocked by control plane faults
Big loss in throughput
![Page 10: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/10.jpg)
10
How about handling congestion proactively?
![Page 11: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/11.jpg)
Forward fault correction (FFC) in TE
11
• [Bad News] Individual faults are unpredictable.• [Good News] Simultaneous #faults is small.
Network faults
Packet loss
FFC guarantees no congestion under up to k arbitrary faults.
FEC guarantees no information loss under up to k arbitrary packet drops.
with careful data encoding
with careful traffic distribution
![Page 12: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/12.jpg)
Example: FFC for control plane faults
s1
s2
s3
s4
Link Capacity: 10 7
3
3
7
10s1
10
10
10
10
s2
s3
s4
Configuration failure
Congestions1
7
3
10
10
10
s2
s3
s4
10
Non-FFC
12
![Page 13: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/13.jpg)
Example: FFC for control plane faults
s1
s2
s3
s4
Link Capacity: 10 7
3
3
7
13
Configuration failure
s1
7
3
10
10
10
s2
s3
s4
7
s1
s2
s3
s4
10
10
10
10
7
Control Plane FFC (k=1)
Configuration failure
s1
10
7
10
10
s2
s3
s47
3
![Page 14: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/14.jpg)
Trade-off: network utilization vs. robustness
10s1
10
10
10
10
s2
s3
s4
Non-FFC
s1
s2
s3
s4
10
10
10
10
7
K=1 (Control Plane FFC)
s1
10
4
10
s2
s3
s4
10
10
K=2 (Control Plane FFC)
14
Throughput: 44Throughput: 47Throughput: 50
![Page 15: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/15.jpg)
Systematically realizing FFC in TE
15
Formulation:How to merge FFC into existing TE framework?
Computation:How to find FFC-TE efficiently?
![Page 16: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/16.jpg)
Basic TE linear programming formulations
16
max.
Sizes of flows
control plane faults
Deliver all granted flows
No overloaded link
FFC constraints:
Maximizing throughput
No overloaded link up to link failures
switch failures
TE decisions:Traffic on paths
Basic TE constraints:
TE objective:
…
𝑏𝑓
𝑙 𝑓 , 𝑡
s.t.
…
LP formulations
![Page 17: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/17.jpg)
Formulating control plane FFC
17
s1 s2
𝑓 1
𝑓 2
𝑓 3
𝑙1𝑛𝑒𝑤+𝑙2
𝑛𝑒𝑤+ 𝑙3𝑜𝑙𝑑≤ link cap
𝑙1𝑛𝑒𝑤+𝑙2
𝑜𝑙𝑑+𝑙3𝑛𝑒𝑤 ≤ link cap
𝑙1𝑜𝑙𝑑+𝑙2
𝑛𝑒𝑤+𝑙3𝑛𝑒𝑤 ≤ link cap
(𝟑𝟏)’s load in old TE ’s load in new TE
Fault on :
Fault on :
Fault on :
Total load under faults?
With n flows and FFC protection k: #constraints = for each link.
Challenge: too many constraints
![Page 18: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/18.jpg)
An efficient and precise solution to FFC
19
Our approach:A lossless compression from O() constraints to O(kn) constraints.
Given , FFC requires thatthe sum of arbitrary k elements in is link spare capacity
O()
Define as the mth largest element in : link spare capacity
Expressing with ?
O()
: additional load due to fault-i
Total load under faults link capacity
Total additional load due to faults link spare capacity
![Page 19: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/19.jpg)
Sorting network
19
𝑥1𝑥2𝑥3𝑥4
A comparison𝑥1𝑥2
=max{}
=min{}
𝑧1𝑧 2𝑧 3
𝑧 4 𝑧5 (1st largest) (2nd largest)
𝑧 6𝑧7𝑧 8
1st round 2nd round
• Complexity: O(kn) additional variables and constraints.
• Throughput: optimal in control-plane and data plane if paths are disjoint.
+ link spare capacity
![Page 20: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/20.jpg)
FFC extensions
• Differential protection for different traffic priorities
•Minimizing congestion risks without rate limiters
• Control plane faults on rate limiters
• Uncertainty in current TE
• Different TE objectives (e.g. max-min fairness)
• …20
![Page 21: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/21.jpg)
Evaluation overview
•Testbed experiment• FFC can be implemented in commodity switches• FFC has no data loss due to congestion under faults
• Large-scale simulation
21
A WAN network with O(100) switches and O(1000) links
Injecting faults based on real failure reports
Single priority traffic in a well-provisioned network
Multiple priority traffic in a well-utilized network
![Page 22: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/22.jpg)
FFC prevents congestion with negligible throughput loss
FFC Throughput / Optimal Throughput
40
60
20
0
Rati
o (%
)
22
80
100
High priority (High FFC protection)Medium priority (Low FFC protection)Low priority (No FFC protection)
Single priority
FFC Data-loss / Non-FFC Data-loss
160%
<0.01%
![Page 23: Traffic Engineering with Forward Fault Correction (FFC)](https://reader036.fdocuments.us/reader036/viewer/2022081516/56812a44550346895d8d71d4/html5/thumbnails/23.jpg)
Conclusions
• Centralized TE is critical to high network utilization but is vulnerable to control and data plane faults.
• FFC proactively handle these faults.• Guarantee: no congestion when #faults ≤ k.• Efficiently computable with low throughput overhead in practice.
23
Heavy network over-provisioning
High risk of congestion
FFC