Rethinking Network Control and Management

David A. Maltz

[email protected]

Context for Network Control and Management

Many different network environments

– Access, backbone networks

– Data-center networks, enterprise/campus

Many different technologies

– Longest-prefix routing, label switching, circuit switching

– IP, Ethernet, MPLS, optical circuits

Outsourcing of responsibility into the network

– Middle-boxes: firewalls, network monitoring, …

Many different policies

– Routing, reachability, transit, traffic engineering, robustness

ATT/CMU Study of 31 Production networks

Provider & enterprise networks (10-1200 routers)

Many different routing designs

Packet filters, multiple OSPF instances, multiple ASs

[Figure: lines in config file (axis 0 to 2000) by router ID]

Fundamental Problem: Wrong Abstractions

Management Plane

• Figure out what is happening in network

• Decide how to change it

Control Plane

• Multiple routing processes on each router

• Each router with different configuration program

• Huge number of control knobs: metrics, ACLs, policy

Data Plane

• Distributed routers

• Forwarding, filtering, queueing

• Based on FIB or labels

[Figure labels: shell scripts, traffic engineering, databases, planning tools; SNMP, netflow, modems, configs; OSPF and BGP processes on each router; link metrics; routing policies; packet filters; a FIB per router]

Inside a Single Network

[Figure: the management, control, and data planes of a single network, as in the preceding slide]

State everywhere!

• Dynamic state in FIBs

• Configured state in settings, policies, packet filters

• Programmed state in magic constants, timers

• Many dependencies between bits of state

State updated in uncoordinated, decentralized way!

Logic everywhere!

• Path Computation built into routing protocols

• Routing Policy distributed across the routers

• Packet Filters placed by tools in the Management Plane

No way to arbitrate inconsistencies between logic

Control Plane: The Key Leverage Point

Great Potential: control plane determines the behavior of the network

Reaction to events, reachability, services

Great Opportunities

Each network (administrative domain) has its own control plane

A radical clean-slate control plane can be deployed

– Agnostic to user data format: IPv4/v6, Ethernet, circuit

– No changes to end-system software

Control plane is the nexus of network evolution

– Changing the control plane logic can smooth transitions in network technologies and architectures

An Alternative: The 4D Architecture

Key principles

Network-level objectives

Network-wide views

Direct control

Corollaries

Predictable behavior (including overload threshold)

Zero device-specific or manual configuration

Data plane support for network-wide view

Define objectives in terms of organizationally salient entities

Good Abstractions Reduce Complexity

All decision making logic lifted out of control plane

Eliminates duplicate logic in management plane

Dissemination plane provides robust communication to/from data plane switches

[Figure: today's stack (management plane, control plane, data plane) beside the 4D stack (decision plane, dissemination, data plane); arrows labeled "Configs" and "FIBs, ACLs"]

Overview of the 4D Architecture

Decision Plane:

All management logic implemented on centralized servers making all decisions

Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers

[Figure: the four planes (decision, dissemination, discovery, data) annotated with the three principles: network-level objectives, direct control, network-wide views]
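As a rough illustration of the Decision Plane idea, the Python sketch below shows a toy decision element. It takes a network-wide view (a list of links), computes next hops by breadth-first search, derives drop filters from a list of denied source/destination pairs, and returns the state it would write to each router. The function name `compute_state`, the data shapes, and the hop-count metric are assumptions made for illustration, not the logic of any real decision element.

```python
from collections import deque

def compute_state(links, denied):
    """Toy decision element (hypothetical, for illustration only).

    links:  iterable of (a, b) bidirectional links, i.e. the network-wide view
    denied: set of (src, dst) router pairs that must not communicate
    Returns (fibs, filters): fibs[router][dst] is the next hop toward dst,
    filters[router] lists destinations the router should drop traffic to.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    fibs = {router: {} for router in neighbors}
    for dst in neighbors:
        # Breadth-first search outward from dst: the node we arrive from
        # is the next hop on a minimum-hop path back toward dst.
        seen = {dst}
        queue = deque([dst])
        while queue:
            node = queue.popleft()
            for nbr in sorted(neighbors[node]):
                if nbr not in seen:
                    seen.add(nbr)
                    fibs[nbr][dst] = node
                    queue.append(nbr)

    # Objectives (here, a denied-pairs list) become per-router drop filters.
    filters = {router: sorted(d for s, d in denied if s == router)
               for router in neighbors}
    return fibs, filters

# Three routers in a line, with A forbidden from reaching C.
fibs, filters = compute_state([("A", "B"), ("B", "C")], {("A", "C")})
print(fibs["A"])     # {'B': 'B', 'C': 'B'}
print(filters["A"])  # ['C']
```

The point of the sketch is that routes and filters come out of one computation over one view, so they cannot disagree with each other.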

Concerns and Challenges

Distributed Systems issues

How will communication between routers and DEs survive failures in the network?

Latency means DE’s view of network is behind reality. Will the control loop be stable?

What is the overhead to/from the DEs?

What happens in a network partition?

Networking issues

Does the 4D simplify control and management?

Can we create logic to meet multiple objectives?

Evaluation of the 4D Prototype

Evaluated using Emulab (www.emulab.net)

Linux PCs used as routers (650-800 MHz)

Tested on 9 enterprise network topologies (10-100 routers each)

Example network with 49 switches and 5 DEs

Performance of the 4D Prototype

Trivial prototype has performance comparable to well-tuned production networks

Recovers from single link failure in < 300 ms

< 1 s response considered “excellent”

Faster forwarding reconvergence possible

Survives failure of master Decision Element

New DE takes control within 1 s

No disruption unless second fault occurs

Gracefully handles complete network partitions

Less than 1.5 s of outage

Thanks!

Future Work

Scalability

– Evaluate over 1-10K switches, 10-100K routes

– Networks with backbone-like propagation delays

Structuring decision logic

– Arbitrate among multiple, potentially competing objectives

– Unify control when some logic takes longer than others

Protocol improvements

– Better dissemination and discovery planes

Deployment in today’s networks

– Data center, enterprise, campus, backbone (RCP)

Future Work

Expand relationships with security

Securing the infrastructure

Using 4D as mechanism for monitoring/quarantine

Formulate models that establish bounds of 4D

Scale, latency, stability, failure models, objectives

Generate evidence to support/refute principles

Themes of Network Control & Management

Holistic Design

Many different technologies – a few common problems

Find the right abstractions: exploit commonality

Clean Slate

How much autonomy do routers/switches need?

New principles for controlling networks

Separate networking issues from distributed system issues

Leverage Network Structure

Many different types of networks exist - each with different objectives and topologies

Recent Publications

G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, J. Rexford, “On Static Reachability Analysis of IP Networks,” IEEE INFOCOM 2005, Orlando, FL, March 2005.

J. Rexford, A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, G. Xie, J. Zhan, H. Zhang, “Network-Wide Decision Making: Toward a Wafer-Thin Control Plane,” Proceedings of ACM HotNets-III, San Diego, CA, November 2004.

D. A. Maltz, J. Zhan, G. Xie, G. Hjalmtysson, A. Greenberg, H. Zhang, “Routing Design in Operational Networks: A Look from the Inside,” Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (ACM SIGCOMM 2004), Portland, Oregon, 2004.

D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjalmtysson, A. Greenberg, J. Rexford, “Structure Preserving Anonymization of Router Configuration Data,” Proceedings of ACM/Usenix Internet Measurement Conference (IMC 2004), Sicily, Italy, 2004.

A Clean-slate Design

What are the fundamental causes of network problems?

How to secure the network and protect the infrastructure?

What functionality needs to be distributed – what can be centralized?

How to reduce/simplify the software in networks?

What would a “RISC” router look like?

How to leverage technology trends?

CPU and link-speed growing faster than # of switches

Three Principles for Network Control & Management

Network-level Objectives:

Express goals explicitly

Security policies, QoS, egress point selection

Do not bury goals in box-specific configuration

[Figure: management logic driven by a reachability matrix and traffic engineering rules]

Three Principles for Network Control & Management

Network-wide Views:

Design network to provide timely, accurate info

Topology, traffic, resource limitations

Give logic the inputs it needs

[Figure: management logic driven by a reachability matrix and traffic engineering rules; arrow labeled "Read state info"]

Three Principles for Network Control & Management

Direct Control:

Allow logic to directly set forwarding state

FIB entries, packet filters, queuing parameters

Logic computes the desired network state; let it implement that state

[Figure: management logic driven by a reachability matrix and traffic engineering rules; arrows labeled "Read state info" and "Write state"]

Overview of the 4D Architecture

Dissemination Plane:

Provides a robust communication channel to each router – and robustness is the only goal!

May run over same links as user data, but logically separate and independently controlled

Overview of the 4D Architecture

Discovery Plane:

Each router discovers its own resources and its local environment

E.g., the identity of its immediate neighbors
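To make the link between local discovery and a network-wide view concrete, here is a minimal Python sketch, assuming each router reports only the set of neighbor IDs it has heard from. The function `build_view`, the report format, and the rule that a link counts only when both ends report each other are illustrative assumptions, not details taken from the slides.

```python
def build_view(reports):
    """Assemble a network-wide view from per-router neighbor reports.

    reports: {router_id: set of neighbor ids that router discovered locally}
    Returns a sorted list of (a, b) links confirmed by both endpoints.
    (Hypothetical data shapes, for illustration only.)
    """
    links = set()
    for router, heard in reports.items():
        for nbr in heard:
            # Keep a link only if the neighbor also reports this router.
            if router in reports.get(nbr, set()):
                links.add(tuple(sorted((router, nbr))))
    return sorted(links)

reports = {
    "r1": {"r2", "r3"},
    "r2": {"r1"},
    "r3": set(),        # r3 has not heard r1, so r1-r3 is not confirmed
}
print(build_view(reports))  # [('r1', 'r2')]
```

Each router needs no knowledge beyond its own interfaces; the merging happens wherever the reports are collected.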

Overview of the 4D Architecture

Data Plane:

Spatially distributed routers/switches

Can deploy with today’s technology

Looking at ways to unify forwarding paradigms across technologies

Fundamental Problem: Conflation of Issues

Ideal case: all routing information flooded to all routers inside network

Robustness achieved via flooding

Reality: routing information filtered and aggregated extensively

Route filtering used to implement security and resource policies

Route aggregation used to achieve scalability
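A small Python sketch of what filtering and aggregation do to the routing information a router ends up seeing. It uses the standard library `ipaddress` module; the prefixes and the helper names `filter_routes` and `aggregate` are made up for illustration.

```python
import ipaddress

def filter_routes(prefixes, blocked):
    """Drop every route that falls inside the blocked address block."""
    blocked_net = ipaddress.ip_network(blocked)
    return [p for p in prefixes
            if not ipaddress.ip_network(p).subnet_of(blocked_net)]

def aggregate(prefixes):
    """Collapse adjacent prefixes into the fewest covering prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Hypothetical routes known inside one part of the network.
routes = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "192.168.5.0/24"]

visible = filter_routes(routes, "192.168.0.0/16")  # policy: hide 192.168/16
print(aggregate(visible))  # ['10.0.0.0/23', '10.0.2.0/24']
```

Four routes go in and two come out, so a router downstream of this step no longer has the full picture that flooding was meant to provide.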

4D Separates Distributed Computing Issues from Networking Issues

Distributed computing issues → protocols and network architecture

– Overhead

– Resiliency

– Scalability

Networking issues → management logic

– Traffic engineering and service provisioning

– Egress point selection

– Reachability control (VPNs)

– Precomputation of backup paths

4D Can Leverage Network Structure

Decision plane logic can be specialized for structure of each physical network

Distributed protocols must be prepared for arbitrary topology graphs

4D enables network logic specialized differently for access and for backbone

E.g., creating aggregation tree in access network

Advantages

Faster route computations

Retain flexibility to evolve network as needed

Support transition to 100x100 architecture

The Feasibility of the 4D Architecture

We designed and built a prototype of the 4D Architecture

4D Architecture permits many designs – prototype is a single, simple design point

Decision plane

Contains logic to simultaneously compute routes and enforce reachability matrix

Multiple Decision Elements per network, using simple election protocol to pick master
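The slides do not say how the election works, so the following Python sketch is only one plausible shape for a simple election: every Decision Element sends heartbeats, and the lowest-ID element heard from recently is treated as master. The function `elect_master`, the heartbeat table, and the 1-second timeout are all assumptions for illustration.

```python
def elect_master(last_heard, now, timeout=1.0):
    """Pick the lowest-ID Decision Element heard from within `timeout` seconds.

    last_heard: {de_id: timestamp of that DE's most recent heartbeat}
    Returns the chosen DE id, or None if no DE is considered alive.
    (Hypothetical rule; not the prototype's documented protocol.)
    """
    alive = [de for de, seen in last_heard.items() if now - seen <= timeout]
    return min(alive) if alive else None

# de1 is master while its heartbeats are fresh.
print(elect_master({"de1": 9.5, "de2": 9.75, "de3": 7.0}, now=10.0))  # de1

# Once de1 falls silent, de2 takes over.
print(elect_master({"de1": 8.0, "de2": 9.75, "de3": 7.0}, now=10.0))  # de2
```

Because every element applies the same rule to the same heartbeats, no extra negotiation round is needed in this sketch.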

Dissemination plane

Uses source routes to direct control messages

Extremely simple & robust

Quickly route around failed data links, even multiple failures
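A minimal Python sketch of source-routed control messages, assuming the sender knows the current set of links and simply writes the whole hop list into the message. The function `source_route`, the node names, and the breadth-first search are illustrative assumptions rather than the prototype's code.

```python
from collections import deque

def source_route(links, src, dst, failed=()):
    """Return an explicit hop list from src to dst that avoids failed links.

    links, failed: iterables of (a, b) bidirectional links
    Returns a list of node ids, or None if dst is unreachable.
    (Hypothetical helper, for illustration only.)
    """
    down = {frozenset(link) for link in failed}
    neighbors = {}
    for a, b in links:
        if frozenset((a, b)) in down:
            continue  # skip links known to have failed
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk parents back to src
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in neighbors.get(node, []):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None

links = [("DE", "A"), ("A", "B"), ("DE", "C"), ("C", "B")]
print(source_route(links, "DE", "B"))                       # ['DE', 'A', 'B']
print(source_route(links, "DE", "B", failed=[("A", "B")]))  # ['DE', 'C', 'B']
```

Since the path travels in the message itself, intermediate routers need no routing tables to carry control traffic, and the sender can switch to another path as soon as it learns of a failure.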