
Software-defined networking: Change is hard

Ratul Mahajan
with Chi-Yao Hong, Rohan Gandhi, Xin Jin, Harry Liu, Vijay Gill, Srikanth Kandula, Mohan Nanduri, Roger Wattenhofer, Ming Zhang

Inter-DC WAN: A critical, expensive resource

[Map: inter-DC WAN connecting Seattle, Los Angeles, New York, Miami, Dublin, Barcelona, Hong Kong, and Seoul]

But it is highly inefficient

One cause of inefficiency: Lack of coordination

Another cause of inefficiency: Local, greedy resource allocation

Local, greedy allocation

[Diagram: traffic routed greedily across switches A-H]

Globally optimal allocation

[Diagram: the same traffic across switches A-H under a globally optimal allocation]

[Latency inflation with MPLS-based traffic engineering, IMC 2011]

SWAN: Software-driven WAN

Goals: highly efficient WAN; flexible sharing policies

Key design elements: coordinate across services; centralize resource allocation

[Achieving high utilization with software-driven WAN, SIGCOMM 2013]

SWAN overview

[Architecture diagram: service hosts send traffic demand to a service broker and receive bandwidth allocations, enforced by rate limiting; a network agent reports topology and traffic to the SWAN controller, which pushes network configuration to the WAN]

Key design challenges

Scalably computing BW allocations

Avoiding congestion during network updates

Working with limited switch memory

Congestion during network updates

Congestion-free network updates

Computing congestion-free update plan

Leave scratch capacity s on each link: ensures a plan with at most ⌈1/s⌉ − 1 steps

Find a plan with the minimal number of steps using an LP: search for a feasible plan with 1, 2, …, max steps

Use scratch capacity for background traffic (a toy numeric sketch follows)
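Not from the talk: a minimal sketch of the scratch-capacity argument on an invented two-flow topology. Both endpoint configurations respect the scratch fraction s, each intermediate state is a convex combination of old and new, and the check confirms that no link exceeds capacity even while switches straddle adjacent steps.

```python
import math

# Toy instance: two flows swap paths. Rates are fractions of link
# capacity, and both the old and new configurations respect the
# scratch capacity, i.e., every link load is at most 1 - s.
s = 0.5                                  # scratch fraction per link
flows = {                                # flow: (old_path, new_path, rate)
    "F1": (["A-B"], ["A-C", "C-B"], 0.5),
    "F2": (["A-C", "C-B"], ["A-B"], 0.5),
}
links = ["A-B", "A-C", "C-B"]
steps = math.ceil(1 / s) - 1             # SWAN bound: at most ceil(1/s) - 1 steps

def rates(i):
    """Per-link, per-flow rate in state i (0 = old, steps + 1 = new);
    each intermediate state is a convex combination of old and new."""
    lam = i / (steps + 1)
    return {l: {f: (1 - lam) * r * (l in old) + lam * r * (l in new)
                for f, (old, new, r) in flows.items()}
            for l in links}

# While switches move from state i to i+1 asynchronously, every flow is
# at its state-i or state-(i+1) rate on each link, so the worst-case
# transient load is the per-flow maximum summed over flows.
for i in range(steps + 1):
    a, b = rates(i), rates(i + 1)
    for l in links:
        transient = sum(max(a[l][f], b[l][f]) for f in flows)
        assert transient <= 1.0 + 1e-9, (i, l, transient)

print(f"congestion-free with {steps} intermediate step(s)")
```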

SWAN provides congestion-free updates

[Plot: complementary CDF of extra traffic (MB) for different oversubscription ratios]

SWAN comes close to optimal

[Plot: throughput relative to optimal for SWAN, SWAN w/o rate control, and MPLS TE]

Deploying SWAN

[Diagrams: partial deployment vs. full deployment across the WAN and data centers]

The challenge of data plane updates in SDN

Not just about congestion: blackholes, loops, packet coherence, …


Real-world is even messier

[Plots: CDF of update latency (seconds) in Google's B4 and in our controlled experiments]

Many resulting questions of interest

Fundamental: What consistency properties can be maintained, and how? Are property strength and ease of maintenance related?

Practical: How to update the data plane quickly and safely? This impacts failure recovery time, network utilization, and flow response time

Minimal dependencies for a consistency property

[On consistent updates in software-defined networks, HotNets 2013]

Consistency property  | None              | Self              | Downstream subset      | Downstream all       | Global
Eventual consistency  | Always guaranteed |                   |                        |                      |
Blackhole freedom     | Impossible        | Add before remove |                        |                      |
Loop freedom          | Impossible        |                   | Rule dependency forest | Rule dependency tree |
Packet coherence      | Impossible        |                   |                        | Flow version numbers | Global version numbers
Congestion freedom    | Impossible        |                   |                        |                      | Staged partial moves
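The "version numbers" mechanism in the table is the classic two-phase update: internal switches hold rules for both configurations keyed by version, and the ingress flips the version it stamps only after the new rules are in place, so every packet matches exactly one configuration end to end. A toy simulation of that idea (all switch names and rule encodings invented here):

```python
# Toy two-phase update: internal rules are keyed by (switch, version)
# and packets match only rules of the version stamped at the ingress,
# so every packet traverses purely old or purely new state.
rules = {("S2", 1): "fwd egress"}        # version-1 (old) internal rules
ingress_next_hop = {1: "S2"}             # where each version sends packets
stamp = 1                                # version currently stamped at ingress

def inject():
    """Inject one packet at the ingress and trace it to the egress."""
    version, node, path = stamp, ingress_next_hop[stamp], ["ingress"]
    while node != "egress":
        path.append(node)
        node = rules[(node, version)].split()[-1]   # KeyError = blackhole
    return path + ["egress"]

print(inject())                          # old config: ingress -> S2 -> egress

# Phase 1: install the new configuration under version 2; version-1
# packets still in flight keep matching the untouched version-1 rules.
rules[("S3", 2)] = "fwd egress"
ingress_next_hop[2] = "S3"

# Phase 2: flip the ingress stamp; from now on every packet sees only
# version 2. Version-1 rules can be removed once old packets drain.
stamp = 2
print(inject())                          # new config: ingress -> S3 -> egress
```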

Fast, consistent network updates

[Pipeline: a routing policy feeds the desired state generator, which produces the target network state; the update planner combines the target state, the current network state, and a consistency property into an update plan]

Forward fault correction: computes states that are robust to common faults

Dionysus: dynamically schedules network updates

Overview of forward fault correction

Control and data plane faults cause congestion; today, reactive data plane updates are needed to remove it

FFC handles faults proactively: it guarantees the absence of congestion for up to k faults

Main challenge: too many possible faults. Solved with a constraint reduction technique based on sorting networks

[Traffic engineering with forward fault correction, SIGCOMM 2014 (to appear)]

Congestion due to control plane faults

[Diagrams: current state vs. target state]

FFC for control plane faults

[Diagrams: current state; vulnerable target state; robust target states for k=1 and k=2]

Congestion due to data plane faults

[Diagrams: pre-failure vs. post-failure traffic distribution]

FFC for data plane faults

[Diagrams: vulnerable traffic distribution vs. robust traffic distribution (k=1)]

FFC guarantee needs too many constraints

For every link l and every fault set S (any set of up to k faulty switches), FFC requires

  Σ_{s ∈ S} T_l(s) ≤ spare capacity of link l in the absence of faults

where T_l(s) is the additional traffic on link l when switch s is faulty.

This gives one constraint per fault set: C(n, k) constraints for each link

Efficient solution using sorting networks

Let y_m denote the m-th largest variable in the array.

Use a bubble sort network to compute linear expressions for the k largest variables

O(nk) constraints
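The counting can be checked numerically: k passes of a bubble sort network place the k largest of n values in the top positions using O(nk) compare-exchange steps, and each compare-exchange (x, y) → (max, min) is exactly the operation FFC relaxes into linear constraints (u ≥ x, u ≥ y, v = x + y − u). A sketch of the numeric version, with made-up fault loads:

```python
def top_k_sum_via_bubble_network(values, k):
    """Sum of the k largest values using k bubble-sort passes.

    Each compare-exchange (x, y) -> (max, min) is the operation FFC
    relaxes into linear constraints (u >= x, u >= y, v = x + y - u),
    so O(n*k) comparators replace the C(n, k) per-fault-set constraints.
    """
    v = list(values)
    n = len(v)
    for p in range(k):                   # pass p settles the (p+1)-th largest
        for i in range(n - 1, p, -1):    # bubble larger values toward index p
            if v[i] > v[i - 1]:
                v[i], v[i - 1] = v[i - 1], v[i]
    return sum(v[:k])

# Additional load per single-switch fault on one link (made-up numbers):
extra = [3, 9, 1, 7, 5]
assert top_k_sum_via_bubble_network(extra, 2) == 9 + 7   # worst 2 faults
print("worst-case extra load for k=2:", top_k_sum_via_bubble_network(extra, 2))
```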

FFC performance in practice

[Plots: FFC performance for single-priority and multi-priority traffic]

Fast, consistent network updates

[Pipeline diagram, repeated: desired state generator → update planner]

Forward fault correction: computes states that are robust to common faults

Dionysus: dynamically schedules network updates

Overview of dynamic update scheduling

Current schedulers pre-compute a static update schedule and can get unlucky with actual switch delays

Dynamic scheduling adapts to actual conditions

Main challenge: Tractably exploring “safe” schedules

[Dionysus: Dynamic scheduling of network updates, SIGCOMM 2014 (to appear)]

Downside of static schedules

[Example: current and target states for flows F1, F2, F4 (rate 5 each) and F3 (rate 10) across switches S1-S5; Gantt charts show that static plans A and B order the same updates differently and finish anywhere between 3 and 5 time units depending on switch delays]

Downside of static schedules (continued)

[Example, continued: a dynamic plan achieves low update time regardless of latency variability, whereas static plans A and B each do well only under particular switch delays]

Challenge in dynamic scheduling

Tractably explore valid orderings: the number of orderings is exponential, and planning cannot be avoided entirely

[Example: current and target states for flows F1-F5 across switches S1-S5]

Dionysus pipeline

[Pipeline: the current and target network states and a consistency property feed the dependency graph generator; the resulting dependency graph drives the update scheduler]

Dionysus dependency graph

Nodes: updates and resources. Edges: dependencies among nodes

[Example: dependency graph derived from the current and target states of the earlier example (flows F1-F5, switches S1-S5)]

Dionysus scheduling

Scheduling is an NP-complete problem under capacity and memory constraints

Approach: critical-path scheduling; treat strongly connected components as virtual nodes and favor them; rate limit flows to resolve deadlocks (a toy sketch of the loop follows)
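Not from the talk: a toy sketch of a Dionysus-style scheduling loop on an invented two-update instance. An update becomes runnable once the link capacity it needs is free, capacity it frees is released when it completes, and a crude one-level chain length stands in for true critical-path scheduling:

```python
# Toy Dionysus-style loop on an invented instance: an update is runnable
# once the link capacity it needs is free; capacity it frees becomes
# available when it completes. The schedule thus adapts to whichever
# update actually finishes instead of following a precomputed order.
free = {"S1-S5": 0, "S4-S5": 10}         # spare capacity per link

updates = {
    # update: (capacity needed per link, capacity freed per link when done)
    "move-F4": ({"S4-S5": 5}, {"S1-S5": 5}),
    "move-F1": ({"S1-S5": 5}, {}),
}

def runnable(u):
    need, _ = updates[u]
    return all(free[l] >= c for l, c in need.items())

def chain_len(u):
    """Crude stand-in for critical-path length: count pending updates
    waiting on capacity that u frees (one level deep is enough here)."""
    _, frees = updates[u]
    return 1 + sum(any(l in frees for l in need)
                   for v, (need, _) in updates.items() if v != u)

pending = set(updates)
while pending:
    ready = sorted((u for u in pending if runnable(u)),
                   key=chain_len, reverse=True)
    assert ready, "deadlock: Dionysus would rate-limit flows here"
    u = ready[0]                          # favor the longest chain
    need, frees = updates[u]
    for l, c in need.items():
        free[l] -= c                      # claim capacity
    for l, c in frees.items():
        free[l] = free.get(l, 0) + c      # release capacity to successors
    pending.remove(u)
    print("completed", u)
```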

Dionysus leads to faster updates

Median improvement over static scheduling (SWAN): 60-80%

Dionysus reduces congestion due to failures

99th percentile improvement over static scheduling (SWAN): 40%

Fast, consistent network updates

[Pipeline diagram, repeated: desired state generator → update planner]

Forward fault correction: computes states that are robust to common faults

Dionysus: dynamically schedules network updates

Summary

SDN enables new network operating points, such as high utilization

But it also poses a new challenge: fast, consistent data plane updates