
Computer Networks Group
Universität Paderborn

Future Internet Chapter 4: Software Defined Networking
4c: Scalability & resilience

Holger Karl

Scalability & resilience concerns in SDN

• Possible conception: SDN has a single, centralized controller
• Main point: the controller of SDN is considered logically centralized; it need not be physically centralized
  • The central view is important, not the implementation
• Line of argument 1: Even if it were centralized, it would not be a problem (or not a bigger problem than in non-SDN networks)
  • … for flow setup times
  • … for controller performance (# flow establishments / second)
  • … for resilience (need to reach controller)
• Line of argument 2: Let's build distributed SDN controllers

Figures shown on the following slides (from Foundations of Modern Networking: SDN, NFV, QoE, IoT, and Cloud by William Stallings, ISBN 0134175395; Copyright © 2016 Pearson Education, Inc. All rights reserved):

• FIGURE 5.10 SDN Domain Structure
• FIGURE 5.11 Federation of SDN Controllers [GUPT14]
• FIGURE 5.12 Heterogeneous Autonomous Systems with OpenFlow and Non-OpenFlow Domains
• FIGURE 5.13 East-West Connection Establishment, Route, and Flow Setup
• FIGURE 5.14 SDNi Components in OpenDaylight Structure (Helium)
• FIGURE 5.15 OpenDaylight SDNi Wrapper

Content

• Scalability options
• Onix
• Kandoo
• DevoFlow
• Trade-offs

Controller scalability

• Flow establishment in early implementations (NOX): about 30,000 requests/s, to keep establishment time reasonable
• Improved implementations achieve a factor-10 higher rate
• Provide defaults for short-lived flows; only talk to the controller for longer-lived flows (DevoFlow)
• Physically distribute the controller (Onix, HyperFlow)

Flow establishment overhead

• Push state proactively, don't wait for actual requests (DIFANE) — see the sketch after this list
• Keep controllers close to switches (to keep propagation time small)
• Improve update rates on flow tables
  • Bottleneck: slow switch CPUs, limited bandwidth inside a switch between fabric and control CPU
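The first bullet (proactive state push, as in DIFANE) can be made concrete with a small sketch: forwarding entries for all known destinations are installed before any traffic arrives, so the first packet of a flow never waits for the controller. The topology, host list, and table format below are invented for illustration; this is not DIFANE's actual mechanism.

```python
# Proactive rule installation sketch (all names and structures are illustrative).
hosts = {"h1": ("s1", 1), "h2": ("s2", 2), "h3": ("s2", 3)}   # host -> (switch, port)
switch_links = {("s1", "s2"): 4, ("s2", "s1"): 4}             # inter-switch output ports
flow_tables = {"s1": [], "s2": []}

def proactive_install():
    # Install one destination-based entry per (switch, host) pair up front.
    for dst, (dst_switch, dst_port) in hosts.items():
        for switch in flow_tables:
            if switch == dst_switch:
                out_port = dst_port                            # deliver locally
            else:
                out_port = switch_links[(switch, dst_switch)]  # forward towards dst switch
            flow_tables[switch].append({"dst": dst, "output": out_port})

proactive_install()
print(flow_tables["s1"])   # entries for h1, h2, h3 exist before any packet is seen
```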

scalability issues associated with a centralized controller, albeit for a more restricted set of control applications satisfying certain properties.

Kandoo [9] takes a different approach to distributing the control plane. It defines a scope of operations to enable applications with different requirements to coexist: locally scoped applications (i.e., applications that can operate using the local state of a switch) are deployed close to the data path in order to process frequent requests and shield other parts of the control plane from the load. A root controller, on the other hand, takes charge of applications that require network-wide state, and also acts as a mediator for any coordination required between local controllers.

An interesting observation is that control plane scalability challenges in SDN (e.g., convergence and consistency requirements) are not inherently different than those faced in traditional network design. SDN, by itself, is neither likely to eliminate the control plane design complexity nor make it more or less scalable.1 SDN, however:

• Allows us to rethink the constraints traditionally imposed on control protocol designs (e.g., a fixed distribution model) and decide on our own trade-offs in the design space

• Encourages us to apply common software and distributed systems development practices to simplify development, verification, and debugging

Unlike traditional networks, in SDN, we do not need to address basic but challenging issues like topology discovery, state distribution, and resiliency over and over again. As demonstrated in Onix, control applications can rely on the control platform to provide these common functions; functions such as maintaining a cohesive view of the network in a distributed and scalable fashion. In fact, it is significantly easier to develop applications for such cohesive distributed control platforms than a swarm of autonomous applications running on heterogeneous forwarding elements.

OTHER SDN SCALABILITY CONCERNS

Increased load on the controller is only one of the voiced concerns about SDN scalability. Here, we briefly explain other causes of concern, along with potential solutions.

Flow Initiation Overhead — Ethane [10], an early SDN security system, puts a controller in charge of installing forwarding state on switches on a per-flow basis. Even though this reactive form of flow handling introduces a great degree of flexibility (e.g., easy fine-grained high-level network-wide policy enforcement in the case of Ethane), it introduces a flow setup delay and, depending on implementation, may limit scalability. Early designs, such as Ethane and NOX, lead to the widespread assumption that all SDN systems are reactive. In reality, however, proactive designs — in which forwarding entries are set up before the initiation of actual flows — are perfectly acceptable in SDN, and can avoid the flow setup delay penalty altogether.

Let us review the flow setup process to explain the bottlenecks and show how a good design can avoid them. As illustrated in Fig. 2, the flow setup process has four steps:
• A packet arrives at the switch that does not match any existing entry in the flow table.
• The switch generates a new flow request to the controller.
• The controller responds with a new forwarding rule.
• The switch updates its flow table.
The performance in the first three steps and partially the last depends on the switch capabilities and resources (management CPU, memory, etc.) and the performance of its software stack. The delay in the third step is determined by the controller's resources along with the control program's performance. Finally, the switch's FIB update time contributes to the delay in completing the flow setup process.
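To make the four steps concrete, here is a minimal, self-contained Python sketch that walks a packet through table miss, flow request, rule computation, and table update. The `Switch` and `Controller` classes and their methods are invented for illustration; real systems exchange OpenFlow messages instead of Python calls.

```python
# Minimal reactive flow-setup sketch: one switch, one controller (illustrative only).

class Controller:
    """Step 3: compute a forwarding rule for a reported flow."""
    def handle_flow_request(self, match):
        # Toy policy: forward based on the destination address.
        out_port = 1 if match["dst"].endswith(".1") else 2
        return {"match": match, "action": ("output", out_port)}

class Switch:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = []              # list of {"match": ..., "action": ...}

    def lookup(self, match):
        for rule in self.flow_table:
            if rule["match"] == match:
                return rule
        return None

    def receive_packet(self, pkt):
        match = {"src": pkt["src"], "dst": pkt["dst"]}
        rule = self.lookup(match)                              # step 1: table miss?
        if rule is None:
            rule = self.controller.handle_flow_request(match)  # steps 2 + 3
            self.flow_table.append(rule)                       # step 4: FIB update
        return rule["action"]

ctrl = Controller()
sw = Switch(ctrl)
print(sw.receive_packet({"src": "10.0.0.2", "dst": "10.0.0.1"}))  # miss: controller round trip
print(sw.receive_packet({"src": "10.0.0.2", "dst": "10.0.0.1"}))  # hit: handled in the switch
```

The second call hits the installed rule, which is exactly why per-flow reactive setup stresses the switch-controller path only on the first packet of each flow.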

Assuming controllers are placed in close proximity of switches, the controller-switch communication delay is negligible. On the controller side, even on a commodity machine with a single CPU core, state-of-the-art controllers are well capable of responding to flow setup requests

Figure 2. The four steps in the flow setup process.

1 After all, one can replicate a traditional network design with SDN by collocating equal numbers of forwarding and control elements. Even though this obviates the benefits of SDN, it is technically possible.

From [1]

Link failure recovery

• Controllers need to be informed about link failure

• But only the controller! No need to flood network

• Possible problem: the control network itself can be damaged
  • In particular, by misconfiguration
  • Use a separate, slow, reliable out-of-band signalling network
• Similar convergence times
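A minimal controller-side sketch of the recovery idea on this slide, assuming the controller keeps a global topology graph and the installed path of each flow. All names, the topology, and the flow table are made up; a real controller would use its own routing logic, but the point is the same: only the controller learns about the failure, and it pushes updates only to the affected switches.

```python
# Controller-side link-failure handling sketch (illustrative only).
from collections import deque

topology = {                      # adjacency list of the data network
    "s1": {"s2", "s3"},
    "s2": {"s1", "s4"},
    "s3": {"s1", "s4"},
    "s4": {"s2", "s3"},
}
flow_paths = {("h1", "h2"): ["s1", "s2", "s4"]}   # installed path per flow

def shortest_path(graph, src, dst):
    """Plain BFS; real controllers would use weighted or constrained routing."""
    prev, queue, seen = {}, deque([src]), {src}
    while queue:
        node = queue.popleft()
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                prev[nxt] = node
                queue.append(nxt)
    return None                    # network partitioned: no path left

def on_link_down(a, b):
    # The detecting switch reports only to the controller; no flooding.
    topology[a].discard(b)
    topology[b].discard(a)
    for flow, path in flow_paths.items():
        uses_link = any({path[i], path[i + 1]} == {a, b} for i in range(len(path) - 1))
        if uses_link:
            new_path = shortest_path(topology, path[0], path[-1])
            flow_paths[flow] = new_path
            print(f"push new rules for {flow} to {new_path}")   # only affected switches

on_link_down("s2", "s4")           # flow (h1, h2) is rerouted via s3
```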

within a millisecond when the flow initiation requests are on the order of several hundred thousand per second.

While Open vSwitch — an OpenFlow-enabled software switch — is capable of installing tens of thousands of flows per second with sub-millisecond latency, hardware switches only support a few thousand installations per second with a sub-10 ms latency at best. This poor performance is typically attributed to lack of resources on switches (weak management CPUs), poor support for high-frequency communication between the switching chipset and the management CPU, and non-optimal software implementations. We expect these issues to be resolved in a few years as more specialized hardware is built. It is foreseeable that the FIB update time will become the main factor in the switch-side flow setup latency.

While we argue that controllers and, in the near future, switches would be able to sustain sufficient throughput with negligible latency for reactive flow setup, in the end the control logic determines the scalability of a reactive design. A control program installing an end-to-end path on a per-flow basis does not scale, because the per-switch memory is fixed but the number of forwarding entries in the data path grows with the number of active flows in the network. However, the control program may install aggregate rules matching a large number of micro-flows (thereby facing the same scalability challenges as a proactive design), or proactively install rules in the network core to provide end-to-end connectivity and identify quality of service (QoS) classes, while classifying and reactively labeling flows at the edge. A viable solution to the scalability challenges of the proactive designs in the former class due to data path memory scarcity is proposed in DIFANE [5]; while the scalability of the latter class follows from the observation that the fanout of an edge switch and thus the number of flows initiated there is bounded (just add edge controllers as the network grows in size).

Resiliency to Failures — Resiliency to failures and convergence time after a failure have always been a key concern in network performance. SDN is no exception, and, with the early systems setting an example of designs with a single central control, resiliency to failures has been a major concern. A state-synchronized slave controller would be sufficient to recover from controller failures, but a network partition would leave half of the network brainless. In a multi-controller network, with appropriate controller discovery mechanisms, switches can always discover a controller if one exists within their partition. Therefore, given a scalable discovery mechanism, controller failures do not pose a challenge to SDN scalability.

Let us decompose the process of repairing a broken link or switch to see how it is different from the traditional networks. As shown in Fig. 3, convergence in response to a link failure has five steps. The switch detects a change. Then the switch notifies the controller. Upon notification, the control program computes the repair actions and pushes updates to the affected data path elements, which, in turn, update their forwarding tables.2 In traditional networks, link failure notifications are flooded across the network, whereas with SDN, this information is sent directly to a controller. Therefore, the information propagation delay in SDN is no worse than in traditional networks. Also, as an advantage for SDN, the computation is carried out on more capable controller machines as opposed to weak management CPUs of all switches, regardless of whether they are affected by the failure or not.

Note that the above argument was built on the implicit assumption that the failed switch or link does not affect the switch-controller communication channel. The control network itself needs to be repaired first if a failed link or switch was part of it. In that case, if the control network — built with traditional network gear — is running an IGP, the IGP needs to converge first before switches can communicate with the controller to repair the data network. In this corner case, therefore, convergence may be slower than in traditional networks. If this proves to be a problem, the network operator should deploy an out-of-band control network to alleviate this issue.

Overall, the failure recovery process in SDN is no worse than in traditional networks. Consequently, similar scalability concerns exist, and the same techniques used to minimize downtime in traditional networks are applicable to SDN. For instance, SDN design can and should also leverage local fast failover mechanisms available in switches to transiently forward traffic toward preprogrammed backup paths while a failure is being addressed. We stress that, as demonstrated in Onix [2], the control platform provides the essential failover and recovery mechanisms that control applications can reuse and rely upon.

Figure 3. The five steps when converging on a link failure.

2 For switch failures, the process is very similar with the exception that the controller itself detects the failure.

From [1]

Scaling controllers: Case study Onix [3]

• Goal: Design a flexible and reliable control platform for SDN
  • Platform should take care of distribution details; shield protocol developers from them
• Main contributions of Onix
  • Provide an API general enough to be applicable to scenarios ranging from WANs to clouds to data centres, …
  • Provide flexible distribution primitives for network state/rules
• Requirements
  • Scale to millions of ports, high-performance networks

Onix components

Because the control platform simplifies the duties of both switches (which are controlled by the platform) and the control logic (which is implemented on top of the platform) while allowing great generality of function, the control platform is the crucial enabler of the SDN paradigm. The most important challenges in building a production-quality control platform are:

• Generality: The control platform's API must allow management applications to deliver a wide range of functionality in a variety of contexts.

• Scalability: Because networks (particularly in the datacenter) are growing rapidly, any scaling limitations should be due to the inherent problems of state management, not the implementation of the control platform.

• Reliability: The control platform must handle equipment (and other) failures gracefully.

• Simplicity: The control platform should simplify the task of building management applications.

• Control plane performance: The control platform should not introduce significant additional control plane latencies or otherwise impede management applications (note that forwarding path latencies are unaffected by SDN). However, the requirement here is for adequate control-plane performance, not optimal performance. When faced with a tradeoff between generality and control plane performance, we try to optimize the former while satisficing the latter.4

While a number of systems following the basic paradigm of SDN have been proposed, to date there has been little published work on how to build a network control platform satisfying all of these requirements. To fill this void, in this paper we describe the design and implementation of such a control platform called Onix (Sections 2-5). While we do not yet have extensive deployment experience with Onix, we have implemented several management applications which are undergoing production beta trials for commercial deployment. We discuss these and other use cases in Section 6, and present some performance measures of the platform itself in Section 7.

Onix did not arise de novo, but instead derives from a long history of related work, most notably the line of research that started with the 4D project [15] and continued with RCP [3], SANE [6], Ethane [5] and NOX [16] (see [4,23] for other related work). While all of these were steps towards shielding protocol design from low-level details, only NOX could be considered a control platform offering a general-purpose API.5 However, NOX did not adequately address reliability, nor did it give the application designer enough flexibility to achieve scalability.

4 There might be settings where optimizing control plane performance is crucial. For example, if one cannot use backup paths for improved reliability, one can only rely on a fine-tuned routing protocol. In such settings one might not use a general-purpose control platform, but instead adopt a more specialized approach. We consider such settings increasingly uncommon.

Figure 1: There are four components in an Onix controlled network: managed physical infrastructure, connectivity infrastructure, Onix, and the control logic implemented by the management application. This figure depicts two Onix instances coordinating and sharing (via the dashed arrow) their views of the underlying network state, and offering the control logic a read/write interface to that state. Section 2.2 describes the NIB.

The primary contributions of Onix over existing work are thus twofold. First, Onix exposes a far more general API than previous systems. As we describe in Section 6, projects being built on Onix are targeting environments as diverse as the WAN, the public cloud, and the enterprise data center. Second, Onix provides flexible distribution primitives (such as DHT storage and group membership) allowing application designers to implement control applications without re-inventing distribution mechanisms, and while retaining the flexibility to make performance/scalability trade-offs as dictated by the application requirements.

2 Design

Understanding how Onix realizes a production-quality control platform requires discussing two aspects of its design: the context in which it fits into the network, and the API it provides to application designers.

2.1 Components

There are four components in a network controlled by Onix, and they have very distinct roles (see Figure 1).

• Physical infrastructure: This includes network switches and routers, as well as any other network elements (such as load balancers) that support an interface allowing Onix to read and write the

5 Only a brief sketch of NOX has been published; in some ways, this paper can be considered the first in-depth discussion of a NOX-like design, albeit in a second-generation form.

From [3]

Onix network model and API

Figure 2: The default network entity classes provided by Onix's API. Solid lines represent inheritance, while dashed lines correspond to referential relation between entity instances. The numbers on the dashed lines show the quantitative mapping relationship (e.g., one Link maps to two Ports, and two Ports can map to the same Link). Nodes, ports and links constitute the network topology. All entity classes inherit the same base class providing generic key-value pair access.

For example, there is a Port entity class that can belong to a list of ports in a Node entity. Figure 2 illustrates the default set of typed entities Onix provides – all typed entities have a common base class limited to generic key-value pair access. The type-set within Onix is not fixed and applications can subclass these basic classes to extend Onix's data model as needed.6

The NIB provides multiple methods for the control logic to gain access to network entities. It maintains an index of all of its entities based on the entity identifier, allowing for direct querying of a specific entity. It also supports registration for notifications on state changes or the addition/deletion of an entity. Applications can further extend the querying capabilities by listening for notifications of entity arrivals and maintaining their own indices.

The control logic for a typical application is therefore fairly straightforward. It will register to be notified on some state change (e.g., the addition of new switches and ports), and once the notification fires, it will manipulate the network state by modifying the key-value pairs of the affected entities.

The NIB provides neither fine-grained nor distributed locking mechanisms, but rather a mechanism to request and release exclusive access to the NIB data structure of the local instance. While the application is given the guarantee that no other thread is updating the NIB within the same controller instance, it is not guaranteed the state (or related state) remains untouched by other Onix instances or network elements. For such coordination, it must use mechanisms implemented externally to the NIB. We describe this in more detail in Section 4; for now, we assume this coordination is mostly static and requires control logic involvement during failure conditions.

All NIB operations are asynchronous, meaning that updating a network entity only guarantees that the update message will eventually be sent to the corresponding network element and/or other Onix instances – no ordering or latency guarantees are given. While this has the potential to simplify the control logic and make multiple modifications more efficient, often it is useful to know when an update has successfully completed. For instance, to minimize disruption to network traffic, the application may require the updating of forwarding state on multiple switches to happen in a particular order (to minimize, for example, packet drops). For this purpose, the API provides a synchronization primitive: if called for an entity, the control logic will receive a callback once the state has been pushed. After receiving the callback, the control logic may then inspect the contents of the NIB and verify that the state is as expected before proceeding. We note that if the control logic implements distributed coordination, race-conditions in state updates will either not exist or will be transient in nature.

An application may also only rely on NIB notifications to react to failures in modifications as they would any other network state changes. Table 1 lists available NIB-manipulation methods.

Table 1: Functions provided by the Onix NIB API.

Category          | Purpose
Query             | Find entities.
Create, destroy   | Create and remove entities.
Access attributes | Inspect and modify entities.
Notifications     | Receive updates about changes.
Synchronize       | Wait for updates being exported to network elements and controllers.
Configuration     | Configure how state is imported to and exported from the NIB.
Pull              | Ask for entities to be imported on-demand.

3 Scaling and Reliability

To be a viable alternative to the traditional network architecture, Onix must meet the scalability and reliability requirements of today's (and tomorrow's) production networks. Because the NIB is the focal point for the system state and events, its use largely dictates the scalability and reliability properties of the system. For example, as the number of elements in the network increases, a NIB that is not distributed could exhaust system memory. Or, the number of network events (generated by the NIB) or work required to manage them could grow to saturate the CPU of a single Onix instance.7

This and the following section describe the NIB distribution framework that enables Onix to scale to very

6 Subclassing also enables control over how the key-value pairs are stored within the entity. Control logics may prefer different trade-offs between memory and CPU usage.

7 In one of our upcoming deployments, if a single-instance application took one second to analyze the statistics of a single Port and compute a result (e.g., for billing purposes), that application would take two months to process all Ports in the NIB.

From [3]

Onix itself

• Onix: distributed system, consisting of multiple instances, running on (typically) multiple physical servers
  • Facilitates reading/writing network state towards control logic
  • Coordinates with peer instances, exchanges state
  • Does not provide any particular network behaviour as such (has to be realized by control logic)
• Onix Network Information Base (NIB)
  • Collection of network entities
  • One network entity: set of key/value pairs, identified by a flat, 128-bit key
  • Optionally with a type (class): adds methods to key/value pairs
  • Read, write, synchronous updates, register for notification when a value changes (see the sketch below)
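As a rough illustration of the NIB idea summarized above (entities as key/value pairs under a flat 128-bit identifier, plus change notifications), here is a toy Python sketch. It is not Onix's actual API; the class and method names are invented.

```python
# Minimal NIB-like structure (illustrative only, not Onix's real interface).
import uuid

class NIB:
    def __init__(self):
        self.entities = {}        # 128-bit id -> dict of key/value pairs
        self.listeners = []       # callbacks: fn(entity_id, key, value)

    def create_entity(self, **attrs):
        entity_id = uuid.uuid4().int          # flat 128-bit key
        self.entities[entity_id] = dict(attrs)
        return entity_id

    def write(self, entity_id, key, value):
        self.entities[entity_id][key] = value
        for callback in self.listeners:       # notify registered control logic
            callback(entity_id, key, value)

    def read(self, entity_id, key):
        return self.entities[entity_id].get(key)

    def register(self, callback):
        self.listeners.append(callback)

nib = NIB()
nib.register(lambda eid, k, v: print(f"notify: entity {eid:#x} {k} -> {v}"))
port = nib.create_entity(kind="Port", name="eth0", status="up")
nib.write(port, "status", "down")             # triggers the notification
```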

Distributing the NIB

• Partitioning: State can be partitioned across replicated controllers
  • Controllers only connected to some subset of network elements
• Aggregation: Controller can group together network elements, expose them as a single element to peer / higher-layer controllers
  • E.g., make a campus network appear as a single switch
• Consistency: Replicated state needs to be consistent under changes
  • Control logic application can choose (= is responsible); see the sketch below
  • Option 1: strong consistency via a database with distributed transactions (e.g., forwarding state)
  • Option 2: One-hop DHT (e.g., link utilization), eventually consistent
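A toy sketch of the "control logic chooses the consistency level" point: forwarding state goes to a stand-in for a transactional store, link utilization to a stand-in for the one-hop DHT. The classes below are placeholders, not Onix's real storage backends.

```python
# Illustrative per-category backend choice (all classes are mock stand-ins).

class TransactionalStore:
    """Stand-in for strongly consistent, transactional storage."""
    def __init__(self):
        self.data = {}
    def write(self, key, value):
        # imagine a distributed transaction across all controller replicas here
        self.data[key] = value

class OneHopDHT:
    """Stand-in for the eventually consistent one-hop DHT."""
    def __init__(self):
        self.data = {}
    def write(self, key, value):
        # imagine asynchronous replication; readers may briefly see stale data
        self.data[key] = value

BACKENDS = {
    "forwarding_state": TransactionalStore(),   # correctness-critical state
    "link_utilization": OneHopDHT(),            # staleness is tolerable
}

def store(category, key, value):
    BACKENDS[category].write(key, value)

store("forwarding_state", ("s1", "10.0.0.0/24"), {"out_port": 3})
store("link_utilization", ("s1", "port3"), 0.73)
```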

Case study: Kandoo – Scaling by hierarchy

• Source of scaling problems for controllers: Too frequent events in the network-global control plane
  • We could process many events in the data plane, but that is exactly what we do not want
• Kandoo proposal: Build two layers of controllers (see the sketch after this list)
  • Bottom controllers only have a local view of their network entities
    • Only run local control applications
    • Do not even talk to their peer bottom controllers
    • Reduce the rate of events to be processed on the global level
  • Top controllers have a global view
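A compact sketch of the two-level idea, mirroring the App_detect/App_reroute toy example in the Kandoo excerpt below: the local controller absorbs frequent counter updates and only relays the rare "elephant detected" event upward. The class names and the threshold are assumptions made for this illustration, not Kandoo's implementation.

```python
# Two-level control sketch in the spirit of Kandoo (illustrative names only).

ELEPHANT_THRESHOLD = 1_000_000          # bytes; assumed value for the sketch

class RootController:
    def on_elephant(self, flow, byte_count):
        # rare, network-wide decision, e.g. reroute the flow (App_reroute's job)
        print(f"root: rerouting elephant flow {flow} ({byte_count} bytes)")

class LocalController:
    def __init__(self, root):
        self.root = root
        self.byte_counters = {}
        self.reported = set()

    def on_counter_update(self, flow, byte_count):
        # frequent, local event (App_detect's job); never leaves this controller
        self.byte_counters[flow] = byte_count
        if byte_count >= ELEPHANT_THRESHOLD and flow not in self.reported:
            self.reported.add(flow)
            self.root.on_elephant(flow, byte_count)   # the only upward message

root = RootController()
local = LocalController(root)
for sample in (10_000, 400_000, 1_200_000, 1_500_000):
    local.on_counter_update(("10.0.0.1", "10.0.0.2", 80), sample)
# Only one message reaches the root, although four samples were processed locally.
```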

Credible suggestion?

• Questions to ask:
  1. Is there a significant number of control problems that can be solved based on a local view?
     • If not, what's the point?
  2. Is processing power for such local controllers available locally?
     • If not, we would have to ship all the "local" control traffic to a remote place, and then again, what's the point?
• Tentative answers:
  1. Yes, for some problems: link discovery, policy enforcement, detecting big flows
  2. Yes, in some environments: data centre networks, possibly also enterprise networks
• Main difference to Onix: No attempt is made to keep state even only eventually consistent

Kandoo design

Figure 1: Kandoo's Two Levels of Controllers. Local controllers handle frequent events, while a logically centralized root controller handles rare events.

control applications. As illustrated in Figure 1, several local controllers are deployed throughout the network; each of these controllers controls one or a handful of switches. The root controller, on the other hand, controls all local controllers.

It is easy to realize local controllers since they are merely switch proxies for the root controller, and they do not need the network-wide state. They can even be implemented directly in OpenFlow switches. Interestingly, local controllers can linearly scale with the number of switches in a network. Thus, the control plane scales as long as we process frequent events in local applications and shield the root controller from these frequent events. Needless to say, Kandoo cannot help any control applications that require network-wide state (even though it does not hurt them, either). We believe such applications are intrinsically hard to scale, and solutions like Onix [8] and HyperFlow [18] provide the right frameworks for running such applications.

Our implementation of Kandoo is completely compliant with the OpenFlow specifications. Data and control planes are decoupled in Kandoo. Switches can operate without having a local controller; control applications function regardless of their physical location. The main advantage of Kandoo is that it gives network operators the freedom to configure the deployment model of control plane functionalities based on the characteristics of control applications.

The design and implementation of Kandoo are presented in Section 2. Our experiments confirm that Kandoo scales an order of magnitude better than a normal OpenFlow network and would lead to more than 90% of events being processed locally under reasonable assumptions, as described in Section 3. Applications of Kandoo are not limited to the evaluation scenarios presented in this paper. In Section 4, we briefly discuss other potential applications of Kandoo and compare it to existing solutions. We conclude our discussion in Section 5.

2. DESIGN AND IMPLEMENTATION

Design objectives. Kandoo is designed with the following goals in mind. First, Kandoo must be compatible with OpenFlow: we do not introduce any new data plane functionality in switches, and, as long as they support OpenFlow, Kandoo supports them, as well. Second, Kandoo automatically distributes control applications without any manual intervention. In other words, Kandoo control applications are not aware of how they are deployed in the network, and application developers can assume their applications would be run on a centralized OpenFlow controller. The only extra information Kandoo needs is a flag showing whether a control application is local or not.

In what follows, we explain Kandoo's design using a toy example. We show how Kandoo can be used to reroute elephant flows in a simple network of three switches (Figure 2). Our example has two applications: (i) App_detect, and (ii) App_reroute. App_detect constantly queries each switch to detect elephant flows. Once an elephant flow is detected, App_detect notifies App_reroute, which in turn may install or update flow-entries on network switches. It is extremely challenging, if not impossible, to implement this application in current OpenFlow networks without modifying switches [5]. If switches are not modified, a (logically) centralized control needs to frequently query all switches, which would place a considerable load on control channels.

Figure 2: Toy example for Kandoo's design: In this example, two hosts are connected using a simple line topology. Each switch is controlled by one local Kandoo controller. The root controller controls the local controllers. In this example, we have two control applications: App_detect is a local control application, but App_reroute is non-local.

Kandoo Controller. As shown in Figure 3, Kandoo has a controller component at its core. This component has the same role as a general OpenFlow controller, but it has Kandoo-specific extensions for identifying application requirements, hiding the complexity of the underlying distributed application model, and propagating events in the network.

A network controlled by Kandoo has multiple local controllers and a logically centralized root controller.1 These controllers collectively form Kandoo's distributed control plane. Each switch is controlled by only one Kandoo controller, and each Kandoo controller can control multiple switches. If the root controller needs to install flow-entries on switches of a local controller, it delegates the requests to the respective local controller. Note that for high availability, the root controller can register itself as the slave controller for a specific switch (this behavior is supported in OpenFlow 1.2 [1]).

1 We note that the root controller in Kandoo can itself be logically/physically distributed. In fact, it is straightforward to implement Kandoo's root controller using Onix [8] or HyperFlow [18].

From [4]

Figure 3: Kandoo's high level architecture.

Deployment Model. The deployment model of Kandoo controllers depends on the characteristics of a network. For software switches, local controllers can be directly deployed on the same end-host. Similarly, if we can change the software of a physical switch, we can deploy Kandoo directly on the switch. Otherwise, we deploy Kandoo local controllers on the processing resources closest to the switches. In such a setting, one should provision the number of local controllers based on the workload and available processing resources. Note that we can use a hybrid model in real settings. For instance, consider a virtualized deployment environment depicted in Figure 4, where virtual machines are connected to the network using software switches. In this environment, we can place local controllers in end-hosts next to software switches and in separate nodes for other switches.

In our toy example (Figure 2), we have four Kandoo controllers: three local controllers controlling the switches and a root controller. The local controllers can be physically positioned using any deployment model explained above. Note that, in this example, we have the maximum number of local controllers required.

Control Applications. Control applications function using the abstraction provided by the controller and are not aware of Kandoo internals. They are generally OpenFlow applications and can therefore send OpenFlow messages and listen on events. Moreover, they can emit Kandoo events (i.e., internal events), which can be consumed by other applications, and they can reply to the application that emitted an event. Control applications are loaded in local name spaces and can communicate using only Kandoo events. This is to ensure that Kandoo does not introduce faults by offloading applications.

In our example, E_elephant is a Kandoo event that carries matching information about the detected elephant flow (e.g., its OpenFlow match structure) and is emitted by App_detect. A local controller can run an application only if the application is local. In our example, App_reroute is not local, i.e., it may install flow-entries on any switch in the network. Thus, the root controller is the only controller able to run App_reroute. In contrast, App_detect is local; therefore, all controllers can run it.

Event Propagation. The root controller can subscribe to specific events in the local controllers using a simple messaging channel plus a filtering component. Once the local controller receives and locally processes an event, it relays the event to the root controller for further processing. Note that all communications between Kandoo controllers are event-based and asynchronous. In our example, the root controller subscribes to events of type E_elephant in the local controllers since it is running App_reroute listening on E_elephant. E_elephant is fired by an App_detect instance deployed on one of the local controllers and is relayed to the root controller. Note that if the root controller does not subscribe to E_elephant, the local controllers will not relay E_elephant events.

It is important to note that the data flow in Kandoo is not always bottom-up. A local application can explicitly request data from an application deployed on the root controller by emitting an event, and applications on the root controller can send data by replying to that event. For instance, we can have a topology service running on the root controller that sends topology information to local applications by replying to events of a specific type.

Figure 4: Kandoo in a virtualized environment. For software switches, we can leverage the same end-host for local controllers, and, for physical switches, we use separate processing resources.

Reactive vs. Proactive. Although Kandoo provides a scalable method for event handling, we strongly recommend pushing network state proactively. We envision Kandoo to be used as a scalable, adaptive control plane, where the default configuration is pushed proactively and is adaptively refined afterwards. In our toy example, default paths can be pushed proactively, while elephant flows will be rerouted adaptively.

Implementation Details. We implemented Kandoo in a mixture of C, C++, and Python. Our implementation has a low memory footprint and supports dynamically loadable plug-ins, which can be implemented in C, Python, and Java. It also provides an RPC API for more general integration scenarios. Our implementation of Kandoo is extremely modular; any component or back-end can be easily replaced, which simplifies porting Kandoo to physical switches. Currently, Kandoo supports OpenFlow 1.0 (OpenFlow 1.1 and 1.2 support is under development). For the applications, we created a "central application repository" and developed a simple package management

Kandoo performance

Figure 6: Control Plane Load for the Elephant Flow Detection Scenario. The load is based on the number of elephant flows in the network. (a) Average number of messages received by the controllers and (b) average number of bytes received by the controllers, both plotted against the ratio of elephant to mouse flows, for a normal OpenFlow controller and Kandoo's root controller.

4. RELATED WORK

Datapath Extensions. The problem that we tackle in this paper is a generalization of several previous attempts at scaling SDNs. A class of solutions, such as DIFANE [21] and DevoFlow [5], address this problem by extending data plane mechanisms of switches with the objective of reducing the load towards the controller. DIFANE tries to partly offload forwarding decisions from the controller to special switches, called authority switches. Using this approach, network operators can reduce the load on the controller and the latencies of rule installation. DevoFlow, on the other hand, introduces new mechanisms in switches to dispatch far fewer "important" events to the control plane. Kandoo has the same goal, but, in contrast to DIFANE and DevoFlow, it does not extend switches; instead, it moves control plane functions closer to switches. Kandoo's approach is more general and works well in data centers, but it might have a lower throughput than specific extensions implemented in hardware.

Interestingly, we can use Kandoo to prototype and test DIFANE, DevoFlow, or other potential hardware extensions. For instance, an authority switch in DIFANE can be emulated by a local Kandoo controller that manages a subset of switches in the network. As another example, DevoFlow's extensions can also be emulated using Kandoo controllers directly installed on switches. These controllers not only replace the functionality of DIFANE or DevoFlow, but they also provide a platform to run any local control application in their context.

Figure 7: Control Plane Load for the Elephant Flow Detection Scenario. The load is based on the number of nodes in the network. (a) Average number of packets received by the controllers and (b) average number of bytes received by the controllers, both plotted against the fanout, for a normal OpenFlow controller and the root controller.

Distributed Controllers. HyperFlow [18], Onix [8], SiBF [10], and Devolved Controllers [17] try to distribute the control plane while maintaining logically centralized, eventually consistent network state. Although these approaches have their own merits, they impose limitations on applications they can run. This is because they assume that all applications require the network-wide state; hence, they cannot be of much help when it comes to local control applications. That said, the distributed controllers can be used to realize a scalable root controller, the controller that runs non-local applications in Kandoo.

Middleboxes. Middlebox architectures, such as FlowStream [7], SideCar [15] and CoMb [13], provide scalable programmability in the data plane by intercepting flows using processing nodes in which network applications are deployed. Kandoo is orthogonal to these approaches in the sense that it operates in the control plane, but it provides a similar distribution for control applications. In a network equipped with FlowStream, SideCar or CoMb, Kandoo can share the processing resources with middleboxes (given that control and data plane applications are isolated) in order to increase resource utilization and decrease the number of nodes used by Kandoo.

Active Networks. Active Networks (AN) and SDNs represent different schools of thought on programmable networks. SDNs provide programmable control planes, whereas ANs allow programmability in networking elements at packet

From [4]

Case study: DevoFlow [2]

• Claims: OpenFlow suffers from overhead and restrictions in
  • … the amount of work the controller has to do ("distributed system costs")
  • … the amount of effort necessary for the switch to invoke its control plane ("switch-implementation cost")
    • For their environment, data-plane to control-plane is 4x slower than forwarding
  • Relevant both for flow setup and for gathering statistics
• This overhead is caused by the desire to have all flows globally visible (at the controller) and globally controlled
  • Maybe that was an exaggerated goal?
  • Maybe only bother the controller with "important" flows, let the switches deal with the ordinary ones?

DevoFlow design goals

• Keep flows in the data plane as much as possible
• Maintain enough visibility over network flows for effective centralized flow management
• Simplify design and implementation of fast switches
• Goals motivated by measurements on prototype OpenFlow hardware
  • Question: How representative are these measurements?
  • E.g.: HP ProCurve 5406zl switch: 300 Gbps data plane in the linecard, but only 80 Mbps between linecard and switch management CPU

DevoFlow: Mechanisms for devolving control

• Rule cloning: The "action" of a wildcard rule is extended by a CLONE flag (see the sketch below)
  • If set: upon packet arrival, copy the wildcard rule into a specific rule, replacing the wildcard fields by values from the actual packet
  • If unset: follow usual OpenFlow rules
  • Purpose: packets from such a flow will then update counters for the specific rule, not the wildcard rule
• Local actions: Allow more powerful actions
  • E.g., switch between alternative ports (realizing multi-path protocols)
  • Intermediate between "trivial forwarding" and "talk to controller"
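The following sketch illustrates rule cloning conceptually: a packet that matches a wildcard rule with the CLONE flag set triggers the installation of an exact-match copy, filled in from the packet's headers, so that counters accumulate on the specific rule. The flow-table representation and field names are invented for illustration; real switches implement this in the forwarding hardware.

```python
# Conceptual sketch of DevoFlow-style rule cloning (data structures are invented).

WILDCARD = None                              # field value meaning "match anything"

wildcard_rule = {
    "match": {"src": WILDCARD, "dst": WILDCARD, "dst_port": 80},
    "action": ("output", 1),
    "clone": True,                           # CLONE flag set
    "counter": 0,
}
exact_rules = []                             # microflow rules created by cloning

def matches(match, pkt):
    return all(v is WILDCARD or pkt[k] == v for k, v in match.items())

def handle_packet(pkt):
    for rule in exact_rules:                 # exact-match rules take precedence
        if matches(rule["match"], pkt):
            rule["counter"] += len(pkt["payload"])
            return rule["action"]
    if matches(wildcard_rule["match"], pkt):
        if wildcard_rule["clone"]:
            clone = {                        # fill wildcard fields from the packet
                "match": {k: pkt[k] for k in wildcard_rule["match"]},
                "action": wildcard_rule["action"],
                "counter": len(pkt["payload"]),
            }
            exact_rules.append(clone)        # per-flow counters from now on
        return wildcard_rule["action"]
    return ("send_to_controller",)           # table miss: classic OpenFlow path

pkt = {"src": "10.0.0.2", "dst": "10.0.0.9", "dst_port": 80, "payload": b"x" * 1400}
handle_packet(pkt)
handle_packet(pkt)
print(exact_rules[0]["counter"])             # 2800: both packets counted on the cloned rule
```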

DevoFlow: Extended statistics collection

• Sampling: randomly pick 1 in n packets of a flow, send (header) to the controller
• Triggers: Once a counter crosses a threshold, send a report packet to a controller
• Approximate counters: Maintain counters for the k largest flows in more detail
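A small sketch of the sampling and trigger mechanisms (approximate counters are omitted). The parameter values and the report format are assumptions for illustration, not DevoFlow's actual specification.

```python
# Sampling and threshold triggers, sketched (illustrative parameters only).
import random

SAMPLE_1_IN_N = 1000          # send ~1 in 1000 packet headers to the controller
TRIGGER_BYTES = 10_000_000    # report a flow once its byte counter crosses this

byte_counters, reported = {}, set()

def on_packet(flow, size):
    if random.randrange(SAMPLE_1_IN_N) == 0:
        print(f"sample: header of {flow} sent to controller")
    byte_counters[flow] = byte_counters.get(flow, 0) + size
    if byte_counters[flow] >= TRIGGER_BYTES and flow not in reported:
        reported.add(flow)
        print(f"trigger: {flow} crossed {TRIGGER_BYTES} bytes, report sent")

flow = ("10.0.0.1", "10.0.0.2", 443)
for _ in range(10):
    on_packet(flow, 1_500_000)   # crosses the threshold on the 7th packet
```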

DevoFlow example usage: Load balance big flows

• Claims:
  • Load-balancing on all flows is infeasible (controller overhead, …)
  • Only the big flows ("elephants") matter; small flows can take a random route
  • Define: a flow is an elephant flow if it has transferred at least X bytes (e.g., 1-10 MBytes)
• Approach (see the sketch below):
  • Treat all flows with random multi-path wildcard rules
  • Once a counter crosses the threshold, report to the controller
  • Controller computes routes for this newly classified elephant flow and distributes the corresponding forwarding table entries
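A controller-side sketch of the approach: once a switch reports that a flow crossed the elephant threshold, the controller picks the least-loaded of several precomputed candidate paths and installs entries along it. The paths, load values, and names below are made up for illustration.

```python
# Controller-side elephant rerouting sketch (topology and loads are invented).

candidate_paths = [                      # e.g. equal-cost paths in a fat-tree
    ["edge1", "agg1", "core1", "agg3", "edge4"],
    ["edge1", "agg2", "core2", "agg4", "edge4"],
]
link_load = {"edge1": 0.1, "agg1": 0.8, "core1": 0.6, "agg3": 0.5,
             "agg2": 0.3, "core2": 0.2, "agg4": 0.4, "edge4": 0.1}

def on_elephant_report(flow):
    # The bottleneck (maximum) load of a path decides; mice keep their random path.
    best = min(candidate_paths, key=lambda p: max(link_load[n] for n in p))
    for switch in best:
        print(f"install entry for {flow} on {switch}")   # forwarding-table update
    return best

on_elephant_report(("10.0.1.5", "10.0.3.7", 9000))
```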

Controller placement

• Given a large SDN:
  • How many controllers do we need?
  • Where should they be placed?
• Important for convergence time, flow establishment, …
  • E.g.: Internet2 about to run SDN in a 34-node production network
• Metrics: Minimize average-case or worst-case latency

Controller placement – formalization

• All variations are facility location problems; all are NP-hard
  • (see http://www.nada.kth.se/~viggo/wwwcompendium)
• Minimum k-median: not in APX
• Minimum k-center:
  • Under the triangle inequality: approximable within 2, but not within 2 − ε (unless P = NP)
  • Otherwise: not in APX; the capacitated version is approximable within 5 (!)
• Maximum cover: the greedy algorithm achieves the best possible 1 − 1/e approximation
• A greedy 2-approximation for k-center is sketched below
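As a concrete illustration of the k-center result, the following Python sketch implements the classical greedy farthest-point heuristic, which is a 2-approximation under the triangle inequality. The distance matrix is a made-up toy example; for a real network one would use all-pairs shortest-path propagation latencies.

# Greedy farthest-point heuristic for minimum k-center (2-approximation under
# the triangle inequality). `dist` is an assumed precomputed latency matrix.
def greedy_k_center(dist, k, first=0):
    """dist[u][v]: shortest-path latency; returns k controller locations."""
    n = len(dist)
    centers = [first]
    # d_to_c[v] = latency from v to its closest controller chosen so far
    d_to_c = list(dist[first])
    while len(centers) < k:
        # Pick the node currently farthest from every chosen controller ...
        farthest = max(range(n), key=lambda v: d_to_c[v])
        centers.append(farthest)
        # ... and update every node's distance to its nearest controller.
        d_to_c = [min(d_to_c[v], dist[farthest][v]) for v in range(n)]
    worst_case = max(d_to_c)
    return centers, worst_case


# Usage on a toy 5-node metric (latencies in ms):
dist = [[0, 2, 5, 7, 4],
        [2, 0, 4, 8, 5],
        [5, 4, 0, 6, 7],
        [7, 8, 6, 0, 3],
        [4, 5, 7, 3, 0]]
print(greedy_k_center(dist, k=2))   # -> ([0, 3], 5): controllers at nodes 0 and 3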


2. CONTROL PLANE DESIGN

Both decoupled and more traditional distributed architectures have a control plane: a network for propagating events, such as routing updates, new traffic engineering policies, or topology changes, to each packet-forwarding device in the network. The key difference between these designs is the structure of their control-plane network. The control topology of traditional distributed architectures like BGP, OSPF, and IS-IS is peer-to-peer: each forwarding device hears events from its peers and makes autonomous decisions based on a local (and likely inconsistent) view of global state.

In contrast, the control networks in decoupled architectures are closer to client-server networks. Packet-forwarding devices (the “clients”) have limited decision-making capabilities and must implement control decisions made by controllers (the “servers”). A common example is a BGP route reflector that presents each edge router with a subset of advertised prefixes, rather than propagating the full mesh of learned routes. A more recent example is the relationship between an OpenFlow switch and controller; the switch has no local control-plane logic and relies entirely on the controller to populate its forwarding table.¹ Control networks for SDNs may take any form, including a star (a single controller), a hierarchy (a set of controllers connected in a full mesh, which connect to forwarding nodes below), or even a dynamic ring (a set of controllers in a distributed hash table [16]).

Regardless of the exact form, the layout of controllers will affect the network's ability to respond to network events. Understanding where to place controllers² and how many to use is a prerequisite to answering performance and fault tolerance questions for SDNs, and hence also a prerequisite for quantitatively comparing them to traditional architectures. We call this design choice the controller placement problem. In this paper, we consider only wide-area networks where the “best” controller placement minimizes propagation delays; in a data center or in the enterprise, one might instead maximize fault tolerance or actively balance controllers among administrative domains.

For WANs, the best placement depends on propagation latency, a quantity fixed by physics and physical topology. Propagation latency bounds the control reactions with a remote controller that can be executed at reasonable speed and stability. With enough delay, real-time tasks (like link-layer fault recovery) become infeasible, while others may slow down unacceptably (BGP convergence). Note that regardless of the state consistency mechanisms in the control plane implementation, these lower bounds apply.

In this paper, we compare placements using node-to-controller latency, for the fundamental limits imposed on reaction delay, fault discovery, and event propagation efficiency. Other metrics matter, such as availability and fairness of state, processing, and bandwidth, but our focus is the WAN, where latency dominates. One can always reduce the effective delay by adding autonomous intelligence into a switch or pushing failover plans, but these may add complexity and make network evolution harder. One goal of this paper is to understand if, and for which networks, extensions to the “dumb, simple switch” model are warranted.

¹ We ignore “bootstrap state” for the control connection.
² We use “controllers” to refer to geographically distinct controller locations, as opposed to individual servers.

3. MOTIVATING EXAMPLES

Having defined the core problem, we show three types of SDN users and motivate their use for controller placement design guidelines.

Network Operators. Rob Vietzke is the VP of Network Services at Internet2, and his team has committed to a SDN deployment of 34 nodes and about 41 edges, shown in Figure 1. This network, the Open Science, Scholarship and Services Exchange (OS3E) [4], needs to peer with the outside world through BGP. Placement matters to Rob because his network should minimize downtime and multiple controllers are a requirement for high availability.

Controller Application Writers. Nikhil Handigol is a grad student who created Aster*x [13], a distributed load balancer that reactively dispatches requests to servers as well as managing the network path taken by those requests. Nikhil would like to demonstrate the advantages of his algorithm on a service with real users, and ideally on a range of topologies, like GENI. Placement matters to Nikhil because he can't get users if his service goes down or does not perform, but at the same time he would prefer to keep things simple with one controller. Ideally, we could provide Nikhil with guidelines to evaluate the response-time potential of different approaches, from centralized to distributed, before he implements extra code or does a deployment.

Network Management Software Writers. Rob Sherwood built FlowVisor [19], a centralized network slicing tool that enables network access and control to be split among multiple controllers or versions of controllers, given a control policy. Since FlowVisor's only consistent state is its configuration, multiple instances might be used to scale FlowVisor. Placement matters to Rob because FlowVisor sits between controllers and switches, where its presence adds a delay to potentially every network command; this delay should be actively minimized, especially with multiple instances.

In each case, the SDN user must ask the question: “How many controllers should I use, and where should they go?” and benefits from practical methods for analyzing tradeoffs.

4. PLACEMENT METRICS

We now introduce and compare definitions of whole-network latency, along with their corresponding optimization problems. Each is called a facility location problem and appears in many contexts, such as minimizing firefighting response times, locating warehouses near factories, and optimizing the locations of content distribution nodes and proxy servers. All are NP-hard problems with an input for k, the number of controllers to place, and all have weighted variations where nodes have varying importance.

Average-case Latency. For a network graph G(V, E) where edge weights represent propagation latencies, where d(v, s) is the shortest path from node v ∈ V to s ∈ V, and the number of nodes n = |V|, the average propagation latency for a placement of controllers S′ is:

L_{avg}(S') = \frac{1}{n} \sum_{v \in V} \min_{s \in S'} d(v, s)    (1)

In the corresponding optimization problem, minimum k-median [6], the goal is to find the placement S′ from the set of all possible controller placements S, such that |S′| = k and L_avg(S′) is minimum. For an overview of the approaches to solving this problem, along with extensions, see [20].


[Figure 1: Optimal placements for 1 and 5 controllers in the Internet2 OS3E deployment; map markers show the average-latency-optimized and worst-case-latency-optimized locations for k = 1 and k = 5.]

Worst-case latency. An alternative metric is worst-case latency, defined as the maximum node-to-controller propagation delay:

L_{wc}(S') = \max_{v \in V} \min_{s \in S'} d(v, s)    (2)

where again we seek the placement S′ ⊆ S with |S′| = k that minimizes this metric. The related optimization problem is minimum k-center [21].

Nodes within a latency bound. Rather than minimizing the average or worst case, we might place controllers to maximize the number of nodes within a latency bound; the general version of this problem on arbitrary overlapping sets is called maximum cover [14]. An instance of this problem includes a number k and a collection of sets S = {S_1, S_2, ..., S_m}, where S_i ⊆ {v_1, v_2, ..., v_n}. The objective is to find a subset S′ ⊆ S of sets such that |⋃_{S_i ∈ S′} S_i| is maximized and |S′| = k. Each set S_i comprises all nodes within a latency bound from a single node.

In the following sections, we compute only average and worst-case latency, because these metrics consider the distance to every node, unlike nodes within a latency bound. Each optimal placement shown in this paper comes from directly measuring the metrics on all possible combinations of controllers. This method ensures accurate results, but at the cost of weeks of CPU time; the complexity is exponential in k, since brute force must enumerate every combination of controllers. To scale the analysis to larger networks or higher k, the facility location problem literature provides options that trade off solution time and quality, from simple greedy strategies (pick the next vertex that best minimizes latency, or pick the vertex farthest away from the current selections) to ones that transform an instance of k-center into other NP-complete problems like independent set, or even ones that use branch-and-bound solvers with Integer Linear Programming. We leave their application to future work.
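The brute-force evaluation described above can be sketched in a few lines of Python. This is only an illustration: the latency matrix reuses the toy example from the k-center sketch earlier, and computing it for a real topology such as OS3E (e.g., via all-pairs shortest paths over link propagation delays) is assumed to happen elsewhere.

# Brute-force evaluation of all controller placements (exact but exponential in k).
from itertools import combinations


def avg_latency(dist, placement):
    """L_avg(S'): mean node-to-nearest-controller latency, Equation (1)."""
    return sum(min(dist[v][s] for s in placement) for v in range(len(dist))) / len(dist)


def worst_latency(dist, placement):
    """L_wc(S'): maximum node-to-nearest-controller latency, Equation (2)."""
    return max(min(dist[v][s] for s in placement) for v in range(len(dist)))


def best_placement(dist, k, metric):
    """Enumerate all C(n, k) placements and keep the one minimizing the metric."""
    nodes = range(len(dist))
    return min(combinations(nodes, k), key=lambda p: metric(dist, p))


# Usage with the toy 5-node latency matrix from the k-center sketch:
dist = [[0, 2, 5, 7, 4],
        [2, 0, 4, 8, 5],
        [5, 4, 0, 6, 7],
        [7, 8, 6, 0, 3],
        [4, 5, 7, 3, 0]]
for k in (1, 2):
    p_avg = best_placement(dist, k, avg_latency)
    p_wc = best_placement(dist, k, worst_latency)
    print(k, p_avg, round(avg_latency(dist, p_avg), 2), p_wc, worst_latency(dist, p_wc))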

5. ANALYSIS OF INTERNET2 OS3E

Having defined our metrics, we now ask a series of questions to understand the benefits of multiple controllers for the Internet2 OS3E topology [4]. To provide some intuition for placement considerations, Figure 1 shows optimal placements for k = 1 and k = 5; the higher density of nodes in the northeast relative to the west leads to a different optimal set of locations for each metric. For example, to minimize average latency for k = 1, the controller should go in Chicago, which balances the high density of east coast cities with the lower density of cities in the west. To minimize worst-case latency for k = 1, the controller should go in Kansas City instead, which is closest to the geographic center of the US.

[Figure 2: Latency CDFs over all possible controller combinations for k = 1 to 5: average latency (left), worst-case latency (right).]

[Figure 3: Ratio of random choice to optimal.]

5.1 How does placement affect latency?

In this topology, placement quality varies widely. A few placements are pathologically bad, most are mediocre, and only a small percentage approach optimal. Figure 2 shows this data as cumulative distributions, covering all possible placements for k = 1 to k = 5, with optimal placements at the bottom. All graphs in this paper show one-way network distances, with average-optimized values on the left and worst-case-optimized values on the right. If we simply choose a placement at random for a small value of k, the average latency is between 1.4x and 1.7x larger than that of the optimal placement, as seen in Figure 3. This ratio is larger for worst-case latencies; it starts at 1.4x and increases up to 2.5x at k = 12. Spending the cycles to optimize a placement is worthwhile.

5.2 How many controllers should we use?

It depends. Reducing the average latency to half that at k = 1 requires three controllers, while the same reduction for worst-case latency requires four controllers. Assuming we optimize for one metric, potentially at the expense of the other, where is the point of diminishing returns? Figure 4 shows the benefit-to-cost ratios for a range of controllers, defined as (lat_1/lat_k)/k. A ratio of 1.0 implies a proportional reduction; that is, for k controllers, the latency is 1/k of the original one-controller latency.

[Figure 4: Cost-benefit ratios: a value of 1.0 indicates proportional reduction, where k controllers reduce latency to 1/k of the original one-controller latency. Higher is better.]


From [5]
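To make the benefit-to-cost ratio from the excerpt concrete, here is a short worked example using the quoted observation that three controllers halve the average latency:

\[
\mathrm{BCR}(k) = \frac{\mathrm{lat}_1/\mathrm{lat}_k}{k},
\qquad
\mathrm{lat}_3 = \tfrac{1}{2}\,\mathrm{lat}_1
\;\Rightarrow\;
\mathrm{BCR}(3) = \frac{2}{3} \approx 0.67 < 1 .
\]

So the latency gain from three controllers is real, but less than proportional to the number of controllers added.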


Controller placement in Internet2 topology

• Lessons:
  • Random choice is bad
  • Sometimes, even one controller is enough
• More data from the Internet Topology Zoo




References

1. S. H. Yeganeh, A. Tootoonchian, and Y. Ganjali, “On scalability of software-defined networking,” IEEE Communications Magazine, pp. 136–141, Feb. 2013.

2. A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S. Banerjee, “DevoFlow: Scaling flow management for high-performance networks,” in Proc. ACM SIGCOMM, 2011, pp. 254–265.

3. T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M. Zhu, R. Ramanathan, Y. Iwata, H. Inoue, T. Hama, and S. Shenker, “Onix: A distributed control platform for large-scale production networks,” in Proc. 9th USENIX Conference on Operating Systems Design and Implementation (OSDI), 2010.

4. S. H. Yeganeh and Y. Ganjali, “Kandoo: A framework for efficient and scalable offloading of control applications,” in Proc. First Workshop on Hot Topics in Software Defined Networks (HotSDN), 2012, pp. 19–24.

5. B. Heller, R. Sherwood, and N. McKeown, “The controller placement problem,” ACM SIGCOMM Computer Communication Review, vol. 42, no. 4, p. 473, 2012.
