![Page 1: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/1.jpg)
A Network-State Management Service
Peng Sun, Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin
Princeton & Microsoft
![Page 2: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/2.jpg)
2
Complex Infrastructure
Variety of vendors/models/time
| Number of | 2010 | 2014 |
| --- | --- | --- |
| Data Center | A few | 10s |
| Network Device | 1,000s | 10s of 1,000s |
| Network Capacity | 10s of Tbps | Pbps |
Microsoft Azure
![Page 3: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/3.jpg)
3
Management Applications
Traffic Engineering
Load Balancing
Link Corruption Mitigation
Device Firmware Upgrade
……
![Page 4: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/4.jpg)
4
Our Question
How to safely run multiple management applications on shared infrastructure
![Page 5: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/5.jpg)
5
Naïve Solution
• Run independently
• It does not work due to 2 problems
[Figure: Traffic Engineering, Link Corruption Mitigation, and Firmware Upgrade each acting on the Network Devices]
![Page 6: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/6.jpg)
6
Problem #1: Conflict
[Figure: topology with ToRs, Agg A, Agg B, Core 1, and Core 2]
Link-corruption-mitigation adjusts traffic away from Core 1
TE tunes traffic among links to Core 1 and Core 2
![Page 7: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/7.jpg)
7
Problem #2: Safety Violation
[Figure: topology with ToRs, Agg A, Agg B, Core 1, and Core 2]
Link-corruption-mitigation shuts down faulty Agg A
Firmware-upgrade schedules Agg B to upgrade
![Page 8: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/8.jpg)
8
Potential Solution #1
• One monolithic application
• Central control of all actions
Traffic Engineering
Firmware Upgrade
Link Corruption Mitigation
![Page 9: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/9.jpg)
9
Too Complex to Build
• Difficult to develop
  • Combine all applications that are already individually complicated
• High maintenance cost
  • for such huge software in practice
![Page 10: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/10.jpg)
10
Potential Solution #2
• Explicit coordination among applications
• Consensus over network changes
Traffic Engineering
Firmware Upgrade
Link Corruption Mitigation
![Page 11: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/11.jpg)
11
Still Too Complex
• Hard to understand each other
  • Diverse network interactions
[Table: Application versus Routing and Device Config, with rows for Traffic Engineering and Firmware upgrade]
![Page 12: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/12.jpg)
12
Main Enemy: Complexity
• Application development
• Application coordination
[Figure: spectrum from Simple to Complex, placing the Independent, Explicitly coordinate, and Monolithic approaches]
![Page 13: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/13.jpg)
13
What We Advocate
• Loose coupling of applications
• Design principle:
  • Simplicity with safety guarantees
• Forgo joint optimization
  • Worthwhile tradeoff for simplicity
  • Applications could do it out-of-band
![Page 14: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/14.jpg)
14
Overview of Statesman
• Network operating system for safe multi-application operation
• Uses network state abstraction
  • Three views of network state
  • Dependency model of states
![Page 15: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/15.jpg)
15
The “State” in Statesman
• Complexity of dealing with devices
  • Heterogeneity
  • Device-specific commands
[Figure: Network Devices and Network State]
![Page 16: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/16.jpg)
16
State Variable Examples
| State Variable | Value |
| --- | --- |
| Device Power Status | Up, down |
| Device Firmware | Version number |
| Device SDN Agent Boot | Up, down |
| Device Routing State | Routing rules |
| Link Admin Status | Up, down |
| Link Control Plane | BGP, OpenFlow, … |
![Page 17: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/17.jpg)
17
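Simplify Device Interaction
[Figure: Past: each Application reads Device Statistics from, and writes Device-specific cmds to, the Network Devices via SNMP, OF, vendor API, …; Now: each Application reads and writes the Network State, which stands between it and the Network Devices]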
![Page 18: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/18.jpg)
18
Views of Network State
• Observed State: Actual state of the whole network
• Target State: Desired state to be updated on the whole network
[Figure: Applications, Network State (Observed State and Target State), and Network Devices]
![Page 19: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/19.jpg)
19
Two Views Are Not Enough
One More View
• Proposed State: A group of entity-variable-values desired by an application
[Figure: Applications, Proposed State, Observed State, Target State, and Network Devices]
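A minimal sketch of how a group of entity-variable-values might be represented, assuming a plain dictionary keyed by (entity, variable). The entity names, variable names, and the `pending_changes` helper are hypothetical illustrations, not Statesman's actual schema or API.

```python
from typing import Dict, Tuple

# A state variable is addressed by (entity, variable). The names used below
# are illustrative assumptions, not Statesman's real identifiers.
StateKey = Tuple[str, str]
State = Dict[StateKey, str]


def pending_changes(observed: State, proposed: State) -> State:
    """Return the proposed values that differ from what is observed."""
    return {key: value for key, value in proposed.items()
            if observed.get(key) != value}


# Observed state: what the network currently reports.
observed: State = {
    ("AggA", "DeviceFirmwareVersion"): "1.0",
    ("AggA", "DevicePowerStatus"): "up",
    ("AggA-Core1", "LinkAdminStatus"): "up",
}

# Proposed state: only the few variables one application cares about.
firmware_upgrade_proposal: State = {
    ("AggA", "DeviceFirmwareVersion"): "2.0",
    ("AggA", "DevicePowerStatus"): "up",
}

print(pending_changes(observed, firmware_upgrade_proposal))
```

Keeping every view in the same key-value shape means an application only has to read and write variables, not issue device-specific commands.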
![Page 20: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/20.jpg)
20
How Merging Works
• Combine multiple proposed states into a safe target state
• Conflict resolution
  • Last-writer-wins
  • Priority-based locking
  • Sufficient for current deployment
• Safety invariant checking
  • Partial rejection & Skip update
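The two conflict-resolution policies can be sketched as follows. This is an illustration under stated assumptions: the `Proposal` class, the per-entity lock table, and the rule that a higher-priority application may take over a lock are all hypothetical, and safety invariant checking is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

StateKey = Tuple[str, str]           # (entity, variable)
Locks = Dict[str, Tuple[str, int]]   # entity -> (holding app, priority)


@dataclass
class Proposal:
    app: str
    timestamp: int                   # larger means written later
    values: Dict[StateKey, str]


def acquire_lock(locks: Locks, entity: str, app: str, priority: int) -> bool:
    """Assumed rule: succeed if free, already ours, or held at lower priority."""
    holder = locks.get(entity)
    if holder is None or holder[0] == app or holder[1] < priority:
        locks[entity] = (app, priority)
        return True
    return False


def merge(proposals: List[Proposal], locks: Locks):
    """Merge proposals into a target state.

    Writes to an entity locked by another app are rejected individually
    (partial rejection); among the rest, the latest write wins.
    """
    target: Dict[StateKey, str] = {}
    rejected: List[Tuple[str, StateKey]] = []
    for proposal in sorted(proposals, key=lambda p: p.timestamp):
        for key, value in proposal.values.items():
            holder = locks.get(key[0])
            if holder is not None and holder[0] != proposal.app:
                rejected.append((proposal.app, key))
                continue
            target[key] = value      # later timestamps overwrite earlier ones
    return target, rejected
```

With no locks held, two proposals for the same variable resolve to the later one; once an application holds the lock on an entity, other applications' writes to that entity are skipped while the rest of their proposal still goes through.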
![Page 21: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/21.jpg)
21
Choose Safety Invariants
• Our current choice
  • Connectivity: Every pair of ToRs in one DC is connected
  • Capacity: 99% of ToR pairs have at least 50% capacity
• Tight invariants: Hinder application too frequently
• Loose invariants: Cannot protect network operation
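The two invariants can be checked along these lines. This is a sketch, not the deployed checker: it assumes the topology is given as a list of links that are currently up, and that the capacity available to each ToR pair has already been computed elsewhere as a fraction of its full capacity (`pair_capacity` is a hypothetical input).

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def reachable(adj: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first search: every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def connectivity_ok(up_links: Iterable[Tuple[str, str]], tors: List[str]) -> bool:
    """Connectivity invariant: every pair of ToRs is connected."""
    adj: Dict[str, Set[str]] = {}
    for a, b in up_links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    if not tors:
        return True
    # All ToRs are pairwise connected iff all are reachable from one of them.
    return set(tors) <= reachable(adj, tors[0])


def capacity_ok(pair_capacity: Dict[Tuple[str, str], float],
                pair_fraction: float = 0.99,
                min_capacity: float = 0.5) -> bool:
    """Capacity invariant: 99% of ToR pairs keep at least 50% capacity."""
    if not pair_capacity:
        return True
    good = sum(1 for c in pair_capacity.values() if c >= min_capacity)
    return good / len(pair_capacity) >= pair_fraction
```

A checker built this way would evaluate both functions on the network as it would look after a proposed change, and reject the change if either returns `False`.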
![Page 22: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/22.jpg)
22
Recap of Three-View Model
• Simplify network management
• Observed State: What we see from the network
• Proposed State: What we want the network to be
• Target State: What can actually be done on the network
[Figure: Applications and Statesman]
![Page 23: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/23.jpg)
23
Yet Another Problem
• What’s in Proposed State
  • Small number of state variables that the application cares about
• Implicit conflicts arise
  • Caused by state dependency
![Page 24: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/24.jpg)
24
Implicit Conflict
[Figure: topology with nodes A, B, C, and D]
TE writes new value of routing state of B for tunneling traffic
Firmware-upgrade writes new value of firmware state of B
![Page 25: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/25.jpg)
25
Dependency Relations
[Figure: dependency relations among state variables, grouped under Device and Link. Labels: PowerState, FirmwareVersion, ConfigurationState, RoutingState, AdminState, ConfigurationState, PathState, bgpd, SDN]
![Page 26: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/26.jpg)
26
Build in Dependency Model
• Statesman calculates it internally
• Only exposes the result for each state variable
  • Whether the variable is controllable
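One way to derive a per-variable "controllable" flag from a dependency model is sketched below. The chain of edges and the readiness values ("up", "upgrading", "configured") are assumptions made for illustration; the slides name the state variables but this sketch does not claim to reproduce Statesman's actual edges or values.

```python
from typing import Callable, Dict, Optional

# Assumed dependency chain for a device: each variable depends on its parent.
DEPENDS_ON: Dict[str, str] = {
    "FirmwareVersion": "PowerState",
    "ConfigurationState": "FirmwareVersion",
    "RoutingState": "ConfigurationState",
}

# Hypothetical readiness tests: when a parent allows its children to change.
READY: Dict[str, Callable[[Optional[str]], bool]] = {
    "PowerState": lambda v: v == "up",
    "FirmwareVersion": lambda v: v is not None and v != "upgrading",
    "ConfigurationState": lambda v: v == "configured",
}


def is_controllable(device_state: Dict[str, str], variable: str) -> bool:
    """A variable is controllable only if every ancestor is ready."""
    parent = DEPENDS_ON.get(variable)
    while parent is not None:
        if not READY[parent](device_state.get(parent)):
            return False
        parent = DEPENDS_ON.get(parent)
    return True


# Device B while a firmware upgrade is in progress.
device_b = {
    "PowerState": "up",
    "FirmwareVersion": "upgrading",
    "ConfigurationState": "configured",
}
print(is_controllable(device_b, "RoutingState"))     # routing depends on firmware
print(is_controllable(device_b, "FirmwareVersion"))  # firmware depends on power only
```

Under these assumptions, an application that only writes routing state learns that the variable is not controllable during an upgrade, without needing to know that a firmware application exists.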
![Page 27: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/27.jpg)
27
Statesman System
[Figure: Statesman system components: Monitor, Checker, Updater, and a Storage Service holding the Observed State, Proposed State, and Target State]
![Page 28: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/28.jpg)
28
Deployment Overview
• Operational in Microsoft Azure for 10 months
• Covers 10 DCs with 20K devices
![Page 29: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/29.jpg)
29
Production Applications
• 3 diverse applications built
  • Device firmware upgrade
  • Link corruption mitigation
  • Traffic engineering
• Finished within months
• Only thousands of lines of code
![Page 30: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/30.jpg)
30
Case #1: Resolve Conflict
Inter-DC TE & Firmware-upgrade
[Figure: inter-DC network of four data centers (DC 1 to DC 4) and eight border routers (BR 1 to BR 8)]
DC = Data Center, BR = Border Router
![Page 31: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/31.jpg)
31
Firmware-upgrade acquires lock of BR1
TE fails to acquire lock, and moves traffic away
BR1 firmware upgrade starts
BR1 firmware upgrade ends. Lock released.
TE re-acquires lock, and moves traffic back
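The sequence above can be replayed with a toy lock table. This is a sketch under assumptions: the `LockTable` class and `te_step` function are hypothetical, locks are plain mutual exclusion without priorities, and "moving traffic" is reduced to choosing which border routers TE uses.

```python
from typing import Dict, List


class LockTable:
    """Per-entity locks: one holder at a time (priorities omitted)."""

    def __init__(self) -> None:
        self._holders: Dict[str, str] = {}

    def acquire(self, entity: str, app: str) -> bool:
        # setdefault records app as holder only if the entity is free.
        return self._holders.setdefault(entity, app) == app

    def release(self, entity: str, app: str) -> None:
        if self._holders.get(entity) == app:
            del self._holders[entity]


def te_step(locks: LockTable, routers: List[str]) -> List[str]:
    """TE routes over exactly the border routers it manages to lock."""
    return [router for router in routers if locks.acquire(router, "TE")]


locks = LockTable()
routers = ["BR1", "BR2"]

locks.acquire("BR1", "Firmware-upgrade")   # upgrade takes the lock on BR1
print(te_step(locks, routers))             # TE cannot lock BR1, avoids it
locks.release("BR1", "Firmware-upgrade")   # upgrade ends, lock released
print(te_step(locks, routers))             # TE re-acquires BR1, uses it again
```

Neither application refers to the other by name in its own logic: TE only observes that a lock attempt failed and routes around the router.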
![Page 32: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/32.jpg)
32
Case #1 Summary
• Each application:
  • Simple logic
  • Unaware of the other
• Statesman enables:
  • Conflict resolution
  • Necessary coordination
![Page 33: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/33.jpg)
33
Case #2: Maintain Capacity Invariant
Firmware-upgrade & Link-corruption-mitigation
[Figure: data center topology with Core, Agg, and ToR layers organized into pods (Pod 1, …, Pod 4, …, Pod 10); one link is marked "Link corrupting packets"]
![Page 34: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/34.jpg)
34
Upgrade proceeds at normal speed in Pods 3 and 5
Upgrade in Pod 4 is slowed down by the checker due to lost capacity
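Why a pod with lost capacity upgrades more slowly can be illustrated with a deliberately simplified model. The assumptions here are loud ones: each pod has 4 Aggs, a pod's capacity is taken to be the fraction of its Aggs that are up, and the checker admits an upgrade only if that fraction stays at or above 50%. The `admit_upgrades` function is hypothetical.

```python
from typing import List


def admit_upgrades(requested: List[str], total_aggs: int, already_down: int,
                   min_fraction: float = 0.5) -> List[str]:
    """Admit as many requested upgrades as the capacity threshold allows.

    An upgrading Agg counts as down. A request is admitted only if the
    fraction of Aggs still up afterwards is at least min_fraction.
    """
    admitted: List[str] = []
    for agg in requested:
        up_after = total_aggs - already_down - len(admitted) - 1
        if up_after / total_aggs >= min_fraction:
            admitted.append(agg)
    return admitted


# Healthy pod: two of four Aggs may upgrade at the same time.
print(admit_upgrades(["Agg1", "Agg2", "Agg3", "Agg4"], total_aggs=4, already_down=0))

# Pod where mitigation already took one Agg out: only one upgrade at a time.
print(admit_upgrades(["Agg2", "Agg3", "Agg4"], total_aggs=4, already_down=1))
```

The upgrade application submits the same requests in both pods; the slowdown comes entirely from the checker rejecting part of the proposal.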
![Page 35: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/35.jpg)
35
Case #2 Summary
• Statesman:
  • Automatically adjusts application progress
  • Keeps the network within safety requirements
![Page 36: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/36.jpg)
36
Conclusion
• Need network operating system for multiple management applications
• Statesman
  • Loose coupling of applications
  • Network state abstraction
• Deployed and operational in Azure
![Page 37: A Network-State Management Service Peng Sun Ratul Mahajan, Jennifer Rexford, Lihua Yuan, Ming Zhang, Ahsan Arefin Princeton & Microsoft.](https://reader036.fdocuments.us/reader036/viewer/2022062515/56649cb85503460f9497eb87/html5/thumbnails/37.jpg)
37
Thanks!
Questions?
Check paper for related work