
Plankton: Scalable network configuration verification through model checking

Santhosh Prabhu, Kuan-Yen Chou, Ali Kheradmand, P. Brighten Godfrey, Matthew Caesar
University of Illinois at Urbana-Champaign

Abstract

Network configuration verification enables operators to ensure that the network will behave as intended, prior to deployment of their configurations. Although techniques ranging from graph algorithms to SMT solvers have been proposed, scalable configuration verification with sufficient protocol support continues to be a challenge. In this paper, we show that by combining equivalence partitioning with explicit-state model checking, network configuration verification can be scaled significantly better than the state of the art, while still supporting a rich set of protocol features. We propose Plankton, which uses symbolic partitioning to manage large header spaces and efficient model checking to exhaustively explore protocol behavior. Thanks to a highly effective suite of optimizations including state hashing, partial order reduction, and policy-based pruning, Plankton successfully verifies policies in industrial-scale networks quickly and compactly, at times reaching a 10000× speedup compared to the state of the art.

1 Introduction

Ensuring correctness of networks is a difficult and critical task. A growing number of network verification tools are targeted towards automating this process as much as possible, thereby reducing the burden on the network operator. Verification platforms have improved steadily in recent years, both in scope and scale. Starting from offline data plane verification tools like Anteater [19] and HSA [13], the state of the art has evolved to support real-time data plane verification [15, 12], and more recently, analysis of configurations [6, 5, 7, 1, 25].

Configuration analysis tools such as Batfish [6], ERA [5], ARC [7] and Minesweeper [1] are designed to take as input a given network configuration, a correctness specification, and possibly an “environment” specification, such as the maximum expected number of failures, external route advertisements, etc. Their task is to determine whether, under the given environment specification, the network configuration will always meet the correctness specification. As with most formal verification domains, the biggest challenge in configuration analysis is scalability. Analyzing the behavior of multiple protocols executing together is a non-trivial task. Past verifiers have used a variety of techniques to try to surmount this scalability challenge. While many of them sacrifice correctness or expressiveness in the process, Minesweeper maintains both by modeling the network using SMT constraints and performing the verification using an SMT solver. However, we observe that this approach scales poorly with increasing problem size (4+ hours to check a 245-device network for loops, in our experiments). So, this paper addresses the following question: Can a configuration verification tool have broad protocol support, and also scale well?

We begin our work by observing that scalability challenges in configuration verification stem from two factors — the large space of possible packet headers, and the possibly diverse outcomes of control plane execution, particularly in the presence of failures. We believe that general-purpose SAT/SMT techniques are not as well equipped to tackle these challenges as domain-specific techniques specifically designed to address them. In fact, these challenges have been studied extensively, in the domains of data plane verification and software verification. Data plane verification tools analyze the large header space to determine all possible data plane behaviors and check their correctness. Software verification techniques explore the execution paths of software, including distributed software, and identify undesirable executions that often elude testing. Motivated by the success of the analysis algorithms in these domains, we attempted to combine the two into a scalable configuration verification platform. The result — Plankton — is a configuration verifier that uses equivalence partitioning to manage the large header space, and explicit-state model checking to explore protocol execution. Thanks to these efficient analysis techniques, and an extensive suite of domain-specific optimizations, Plankton delivers consistently high verification performance.


(Table contents: rows compare support for all data planes including failures, support beyond specific protocols, and soundness when assumptions hold, with full, partial, or no support marked per tool; ∗ denotes soundness for segmentation policies only. Columns: Batfish, BagPipe, ARC, ERA, Minesweeper, Plankton.)

Figure 1: Comparison of configuration verification systems

Our key contributions are as follows:

• We define a configuration verification paradigm that combines packet equivalence computation and explicit-state model checking.

• We develop optimizations that make the design feasible for networks of practical scale, including optimizations to reduce the space of event exploration, and techniques to improve the efficiency of exploration.

• We implement a Plankton prototype with support for OSPF, BGP and static routing, and show experimentally that it can verify policies at practical scale (less than a second for networks with over 300 devices). Plankton outperforms the state of the art in all our tests, in some cases by as much as 4 orders of magnitude.

2 Motivation and Design Principles

Configuration verifiers take as input the configuration files for network devices, and combine them with an abstraction of the lower layers — the distributed control plane, which executes to produce data plane state, and the forwarding hardware that will use that state to process traffic. They may additionally take an environment specification, which describes interactions of entities external to the network, such as possible link failures, route advertisements, etc. Their task is to verify that under executions enabled by the supplied configuration and environment, correctness requirements are never violated. Configuration verifiers thus help ensure correctness of proposed configurations prior to deployment.

Figure 1 illustrates the current state of the art in configuration verification. As the figure shows, only Minesweeper [1] can reason about multiple possible converged data plane states of the network (e.g., due to topology changes or control plane non-determinism), while also having the ability to support more than just a specific protocol, and maintaining soundness of analysis. All other tools compromise on one or more key features. ARC [7], for example, uses graph algorithms to compute the multiple converged states enabled by failures, but only for shortest-path routing. As a result, it cannot handle common network configurations such as BGP configurations that use LocalPref, any form of recursive routing, etc. Minesweeper's functionality differs from the others because it makes a different compromise: by using an SMT-based formulation, Minesweeper achieves good data plane coverage and feature coverage, but pays the price in performance. As experiments show [2], Minesweeper scales poorly with network size, and is unable to handle networks larger than a few hundred devices in reasonable time. Our motivation for Plankton is simple — can we design a configuration verification tool without compromising scale or functionality?

Achieving our goal requires tackling two challenges: packet diversity and data plane diversity. Packet diversity, which refers to the large space of packets that needs to be checked, is easier to handle. We leverage the notion of Packet Equivalence Classes (PECs), which are sets of packets that behave identically. Using a trie-based technique similar to VeriFlow [15], we compute PECs as a partitioning of the packet header space such that the behavior of all packets in a PEC remains the same throughout Plankton's exploration. A more interesting aspect of PECs is how to handle dependencies across PECs without compromising performance. In Plankton, this is done by a dependency-aware scheduler designed to maximize independent analysis (§ 3.2).

Data plane diversity refers to the complexity of checking every possible converged data plane that an input network configuration may produce. It is the task of the control plane model to ensure coverage of these possible outcomes. Simulation-based tools (the best example being Batfish [6]) execute the system only along a single non-deterministic path, and can hence miss violations in networks that have multiple stable convergences, such as certain BGP configurations. ARC's graph-based approach accounts for possible failures, but can support only shortest-path routing. In order to overcome these shortcomings, Minesweeper, the current state of the art in terms of functionality, uses an SMT solver to search through possible failures and non-deterministic protocol convergence, to find any converged state that represents a violation of network correctness.

A key intuition behind our approach is that the generic search technique employed by SMT solvers makes the core of the configuration verification problem much more difficult than it has to be. Network control planes operate using simple algorithms which can not only be easily modeled in software, but can also find a protocol's outcome much more quickly than general-purpose SMT solving. In fact, the common case is that the control plane computes some variant of shortest or least-cost paths. To illustrate this point, we implemented simple single-source shortest path solvers in SMT (Z3) and a model checker (SPIN). The SMT formulation is implemented as constraints on the solution, while the model checker explores execution paths of the Bellman-Ford algorithm; and in this simplistic case the software has deterministic execution. The result is that the model checker approach is similar to a normal execution of software, and is around 12,000× faster even in a moderate-sized fat tree network of N = 180 nodes (Figure 2).
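To make the contrast concrete, the following is a minimal C++ sketch of the kind of explicit Bellman-Ford computation that the model-checking side of this comparison effectively walks through. It is illustrative only and is not the code used in the experiment.

#include <climits>
#include <cstdio>
#include <vector>

// One directed link with an additive cost, as in simple least-cost routing.
struct Edge { int u, v; long long w; };

// Explicitly compute single-source shortest paths by repeated relaxation.
// This is the kind of direct execution a model checker steps through,
// as opposed to encoding the same problem as SMT constraints.
std::vector<long long> bellman_ford(int n, const std::vector<Edge>& edges, int src) {
    const long long INF = LLONG_MAX / 4;
    std::vector<long long> dist(n, INF);
    dist[src] = 0;
    for (int round = 0; round + 1 < n; ++round) {   // at most n-1 relaxation rounds
        bool changed = false;
        for (const Edge& e : edges) {
            if (dist[e.u] + e.w < dist[e.v]) {
                dist[e.v] = dist[e.u] + e.w;
                changed = true;
            }
        }
        if (!changed) break;                        // converged early
    }
    return dist;
}

int main() {
    // Tiny example: 0 -> 1 (cost 1), 1 -> 2 (cost 1), 0 -> 2 (cost 5).
    std::vector<Edge> edges = {{0, 1, 1}, {1, 2, 1}, {0, 2, 5}};
    for (long long d : bellman_ford(3, edges, 0)) std::printf("%lld ", d);
    std::printf("\n");
    return 0;
}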

Figure 2: Comparison of two ways to compute shortest paths.

Of course, this is intentionally a simplistic, fully-deterministic case with simple implementations. The point is that the model checking approach begins with a huge advantage — so huge that it could explore many non-deterministic execution paths and still outperform SMT. This leads to our next key intuition: the effect of non-determinism is important, but the amount of “relevant” non-determinism is limited. Networks can experience “relevant” non-determinism like the choice of what failures occur, and protocol execution; as well as “irrelevant” non-determinism like message timing that ultimately doesn't affect the outcome. Configurations and protocols are usually designed to keep the outcome mostly deterministic, with non-deterministic branch points ultimately leading to one or a small number of different converged states.

Motivated by this intuition, we create a control plane model that incorporates the possible non-deterministic behaviors, but we also implement optimizations so that when the outcome of the executions is actually deterministic, the “irrelevant” non-determinism is pruned enough that performance is comparable to simulation. This model is exhaustively explored by SPIN, a model checker designed for Promela programs. SPIN performs a depth-first search on the state space of the program, looking for states that violate the policy being checked. We further assist SPIN through optimizations that minimize the size of individual states, thus making the traversal process more efficient. Thanks to these two types of optimizations, Plankton achieves our goal of scalable, general-purpose configuration verification.

3 Plankton Design

We now present Plankton's design, illustrated in Figure 3.

3.1 Packet Equivalence Classes

The first phase in Plankton's analysis is the computation of Packet Equivalence Classes. As we discussed in § 2, Plankton uses a trie-based technique inspired by VeriFlow. The trie in Plankton holds prefixes obtained from the configuration, including any prefixes that are advertised (explicitly or automatically), any prefixes appearing in route maps, any static routes, etc. Each prefix in the trie is associated with a config object that describes any configuration information specific to that prefix. For example, consider Figure 4, which illustrates a highly simplified example where the prefixes 128.0.0.0/1 and 192.0.0.0/2 are being advertised over OSPF in a topology with 3 devices. The trie holds three config objects — the default, and one for each advertised prefix.

Figure 3: Plankton design

Once the trie is populated, Plankton performs a recursive traversal, simultaneously keeping track of where the prefix boundaries divide the header space. For each known partition, it stores the most up-to-date network-wide config known. When the end of any new prefix is reached, the config object associated with it is merged with the network-wide config for the partition that denotes the prefix. In our simple example, the traversal produces three classes defined by ranges — [192.0.0.0, 255.255.255.255] with two nodes originating prefixes, [128.0.0.0, 191.255.255.255] with only one origin, and [0.0.0.0, 127.255.255.255] without any node originating any prefix. As the example shows, each PEC-specific configuration computed this way will still include information about the original prefixes contributing to the PEC. Storing these prefixes may seem redundant. However, note that even within a PEC, the lengths of the prefixes that get advertised or get matched in route filters play a role in decision making.
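As a concrete illustration of this traversal, the following C++ sketch builds a binary prefix trie and emits one range per maximal region of the IPv4 space covered by the same set of configured prefixes, reproducing the three classes of the example above. It is a simplified sketch under stated assumptions (the Config contents are placeholders), not Plankton's implementation.

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Placeholder for per-prefix configuration information (e.g. originating routers).
struct Config { std::string label; };

struct Node {
    Node* child[2] = {nullptr, nullptr};
    const Config* cfg = nullptr;        // set if a configured prefix ends at this node
};

// Insert an IPv4 prefix (network address + length) and attach its config object.
void insert(Node* root, uint32_t prefix, int len, const Config* cfg) {
    Node* cur = root;
    for (int i = 0; i < len; ++i) {
        int b = (prefix >> (31 - i)) & 1;
        if (!cur->child[b]) cur->child[b] = new Node();
        cur = cur->child[b];
    }
    cur->cfg = cfg;
}

// Emit one packet equivalence class: a range plus the configs of all covering prefixes.
void emit(uint32_t lo, uint32_t hi, const std::vector<const Config*>& covering) {
    std::printf("[%u, %u]:", lo, hi);
    for (const Config* c : covering) std::printf(" %s", c->label.c_str());
    std::printf("\n");
}

// Recursive traversal: split the current range at each prefix boundary found in the trie.
void walk(const Node* n, uint32_t lo, uint32_t hi, std::vector<const Config*>& stack) {
    if (n->cfg) stack.push_back(n->cfg);
    if (!n->child[0] && !n->child[1]) {
        emit(lo, hi, stack);
    } else {
        uint32_t mid = lo + (uint32_t)(((uint64_t)hi - lo) / 2);   // end of the 0-branch half
        if (n->child[0]) walk(n->child[0], lo, mid, stack); else emit(lo, mid, stack);
        if (n->child[1]) walk(n->child[1], mid + 1, hi, stack); else emit(mid + 1, hi, stack);
    }
    if (n->cfg) stack.pop_back();
}

int main() {
    // The example from the text: 128.0.0.0/1 (originated by R0) and 192.0.0.0/2 (by R2).
    Config r0{"origin:R0 128.0.0.0/1"}, r2{"origin:R2 192.0.0.0/2"};
    Node root;
    insert(&root, 0x80000000u, 1, &r0);
    insert(&root, 0xC0000000u, 2, &r2);
    std::vector<const Config*> stack;
    walk(&root, 0u, 0xFFFFFFFFu, stack);   // prints the three ranges of the example
    return 0;
}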

3.2 Dependency-aware Scheduling

It is tempting to believe that Packet Equivalence Classes could be analyzed fully independently of each other, and that an embarrassingly parallel scheme could be used in the verification process. While this is indeed true sometimes, there can often be dependencies between various PECs. For example, consider a network that is running iBGP. For the various peering nodes to be able to communicate with each other, an underlying routing protocol such as OSPF should first establish a data plane state that forwards packets destined to the devices involved in BGP. In such a network, the manner in which OSPF determines the forwarding behavior for the device addresses will influence the routing decisions made in BGP. In other words, the PECs that are handled by BGP depend on the PECs handled by OSPF. In the past, configuration verification tools have either ignored such cases altogether, or, in the case of Minesweeper, modeled these classes simultaneously. Specifically, for a network with n routers running iBGP, Minesweeper creates n + 1 copies of the network, searching for the converged solution for the n loopback addresses and also BGP. Effectively, this turns the verification problem into one quadratically larger than the original. Given that configuration verification scales significantly worse than linearly in input size, such a quadratic increase in input size often makes the problem intractable.

Figure 4: Packet Equivalence Class computation

Figure 5: PEC Dependency Graph

Plankton takes a more surgical approach. Once the PECs are calculated, Plankton identifies dependencies between the Packet Equivalence Classes, based on recursive routing entries, BGP sessions, etc. The dependencies are stored as a directed graph, whose nodes are the PECs, and directed edges indicate which PECs depend on which others. In order to maximize parallelism in verification runs across PECs, a dependency-aware scheduler first identifies strongly connected components in the graph. These SCCs represent groups of PECs that are mutually dependent, and hence need to be analyzed simultaneously through a single verification run. In addition, if an SCC is reachable from another, it indicates that the upstream SCC can be scheduled for analysis only after the one downstream has finished. Each verification run is a separate process. For an SCC S, if there is another SCC S′ that depends on it, Plankton forces all possible outcomes of S to be written to an in-memory filesystem (S will always get scheduled first). Outcomes refer to every possible converged state for S, together with the non-deterministic choices made in the process of arriving at them. When the verification of S′ gets scheduled, it reads these converged states, and uses them when necessary.
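A minimal sketch of how such a scheduler might be organized is shown below: it computes SCCs of the PEC dependency graph (here with Kosaraju's algorithm) and releases each SCC for verification only once every SCC it depends on has finished. The graph encoding and the wave-based release are illustrative assumptions; Plankton's actual scheduler manages separate verification processes and an in-memory filesystem, which are not shown.

#include <cstdio>
#include <functional>
#include <vector>

// Edge u -> v means "PEC u depends on PEC v" (v must be verified first).
using Graph = std::vector<std::vector<int>>;

// Kosaraju's algorithm: label every PEC with the id of its strongly connected component.
std::vector<int> scc_ids(const Graph& adj, int& num_sccs) {
    int n = (int)adj.size();
    Graph radj(n);
    for (int u = 0; u < n; ++u)
        for (int v : adj[u]) radj[v].push_back(u);
    std::vector<int> order;
    std::vector<bool> seen(n, false);
    std::function<void(int)> dfs1 = [&](int u) {
        seen[u] = true;
        for (int v : adj[u]) if (!seen[v]) dfs1(v);
        order.push_back(u);
    };
    for (int u = 0; u < n; ++u) if (!seen[u]) dfs1(u);
    std::vector<int> comp(n, -1);
    num_sccs = 0;
    std::function<void(int)> dfs2 = [&](int u) {
        comp[u] = num_sccs;
        for (int v : radj[u]) if (comp[v] < 0) dfs2(v);
    };
    for (int i = n - 1; i >= 0; --i)
        if (comp[order[i]] < 0) { dfs2(order[i]); ++num_sccs; }
    return comp;
}

// Release SCCs in "waves": an SCC is ready once all SCCs it depends on are done.
// All SCCs within one wave could be verified in parallel.
std::vector<std::vector<int>> schedule(const Graph& adj, const std::vector<int>& comp, int num_sccs) {
    std::vector<int> unmet(num_sccs, 0);                 // outstanding dependency edges per SCC
    std::vector<std::vector<int>> dependents(num_sccs);  // who is waiting on each SCC
    for (int u = 0; u < (int)adj.size(); ++u)
        for (int v : adj[u])
            if (comp[u] != comp[v]) { ++unmet[comp[u]]; dependents[comp[v]].push_back(comp[u]); }
    std::vector<std::vector<int>> waves;
    std::vector<int> ready;
    for (int c = 0; c < num_sccs; ++c) if (unmet[c] == 0) ready.push_back(c);
    while (!ready.empty()) {
        waves.push_back(ready);
        std::vector<int> next;
        for (int c : ready)
            for (int d : dependents[c])
                if (--unmet[d] == 0) next.push_back(d);
        ready = next;
    }
    return waves;
}

int main() {
    // The Figure 5 example: PECs 0-3 are loopbacks, PECs 4 and 5 are iBGP-routed
    // prefixes that depend on all four loopbacks.
    Graph adj(6);
    for (int lo = 0; lo < 4; ++lo) { adj[4].push_back(lo); adj[5].push_back(lo); }
    int num_sccs = 0;
    std::vector<int> comp = scc_ids(adj, num_sccs);
    for (const auto& wave : schedule(adj, comp, num_sccs)) {
        std::printf("wave:");
        for (int c : wave) std::printf(" scc%d", c);
        std::printf("\n");
    }
    return 0;
}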

Minesweeper's technique of replicating the network roughly corresponds to the case where all PECs fall into a single strongly connected component. We expect this to almost never be the case. In fact, in typical cases, the SCCs are likely to be of size 1, meaning that every PEC can be analyzed in isolation, with ordering of the runs being the only constraint∗. For example, Figure 5 illustrates the dependency graph for a typical case where two PECs are being handled in iBGP on a network with 4 different routers. The only constraint in this case is that the loopback PECs should be analyzed before the iBGP PECs can start. In such cases, Plankton keeps the problem size significantly smaller, and maximizes the degree of parallelism that can be achieved.

When a PEC needs the relevant converged states of past PECs for its exploration, the non-deterministic choices may need to be coordinated across all these PECs. In particular, consider the choice of link failures: if we hypothetically executed one PEC assuming link L has failed and another PEC assuming L is working, the result represents an invalid execution of the overall network. Therefore, our current prototype matches topology changes across explorations. A second class of non-deterministic choices is protocol non-determinism. In our experiments, we have not seen cases of protocol non-determinism that require matching across PECs. OSPF by its nature has deterministic outcomes, but on networks which have non-determinism in their internal routing (e.g., non-deterministically configured BGP for internal routing) and where message timing is correlated across PECs (e.g., via route aggregation), the system would need to coordinate this non-determinism to avoid false positives.

∗An example where the SCCs are bigger than one PEC is the contrived case where there exists a static route for destination IP A whose next hop is IP B, but another static route for destination IP B whose next hop is IP A.

3.3 Explicit-state Model Checker

The explicit-state model checker SPIN [9] provides Plankton its exhaustive exploration ability. SPIN verifies models written in the Promela modeling language, which has constructs to describe possible non-deterministic behavior. SPIN's exploration is essentially a depth-first search over the possible states of the supplied model. Points in the execution where non-deterministic choices are made represent branching in the state space graph.

Plankton's network model is essentially an implementation of the control plane in Promela. Our current implementation supports OSPF, BGP and static routing. Recall from § 3.1 that Plankton partitions the header space into Packet Equivalence Classes. For each SCC, Plankton uses SPIN to exhaustively explore control plane behavior. In order to improve scalability, Plankton also performs another round of partitioning, by executing the control plane for each prefix in the PEC separately. This separation of prefixes is helpful in simplifying the protocol model. However, it does limit Plankton's support for route aggregation. While features such as a change in the routing metric can be supported, if there is a route map that performs an exact match on the aggregated prefix, it will not apply to the more specific routes, which Plankton models. Once the converged states of all relevant prefixes are computed, a model of the FIB combines the results from the various prefixes and protocols into a single network-wide data plane for the PEC.
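As a rough illustration of this last step, the sketch below merges per-prefix results for one device into a single forwarding choice by longest-prefix match; the protocol-preference tie-breaker is an assumption made for the sketch and is not taken from the paper.

#include <optional>
#include <string>
#include <vector>

// Converged result for one prefix at one device, produced by one protocol instance.
struct PrefixResult {
    int prefix_len;        // length of the matched prefix
    int protocol_pref;     // assumed tie-breaker: lower value wins (e.g. static < OSPF < BGP)
    std::string next_hop;
};

// Combine per-prefix, per-protocol results into the device's forwarding entry for this PEC.
std::optional<PrefixResult> combine_fib(const std::vector<PrefixResult>& results) {
    std::optional<PrefixResult> best;
    for (const PrefixResult& r : results) {
        if (!best ||
            r.prefix_len > best->prefix_len ||
            (r.prefix_len == best->prefix_len && r.protocol_pref < best->protocol_pref)) {
            best = r;      // longest prefix wins; protocol preference breaks ties
        }
    }
    return best;           // empty means the PEC has no route at this device
}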

In what follows, we present Plankton's network model that will be executed by SPIN. We will initially present a simple, unoptimized model, which is functionally correct but has significant non-determinism that is irrelevant to finding different converged states. Subsequently, in § 4, we discuss how Plankton attempts to minimize irrelevant non-determinism, making the execution of the deterministic fragments of the control plane comparable to simulation.

3.4 Abstract Protocol Model

To define a control plane suitable for modeling real-world protocols such as BGP and OSPF, we look to the technique used by Minesweeper, wherein the protocols were modeled as instances of the stable paths problem. Along similar lines, we consider the Simple Path Vector Protocol (SPVP) [8], which was originally proposed to solve the stable paths problem. We first extend SPVP to support some additional features that we wish to model. Based on this, we construct a protocol we call the Reduced Path Vector Protocol (RPVP), which we show to be sufficient to correctly perform model checking if we are interested only in the converged states of the network. We use RPVP as the common control plane protocol for Plankton. We begin with a brief overview of SPVP, highlighting our extensions to the protocol. Appendix A contains the full details of the protocol and our extensions.

3.4.1 SPVP and its extension

SPVP is an abstract model of real-world BGP, replacing the details of BGP configurations with abstract notions of import/export filters and ranking functions. For each node n and peer n′, the import and export filters dictate which advertisements (i.e., the advertiser's best path to the origin) n can receive from and send to n′, respectively. The ranking function for n dictates the preference among all received advertisements. These notions can be inferred from real-world configurations.

We slightly extend the original SPVP [8] to support more features of BGP. The extensions are as follows: we allow for multiple origins instead of a single one; the ranking function can be a partial order instead of a total one, to allow for time-based tie breaking; and, to be able to model iBGP, we allow the ranking function of any node to change at any time during the execution of the protocol.

It is well known that there are configurations which can make SPVP diverge in some or all execution paths. However, our goal is only to check the forwarding behavior in the converged states, through explicit-state model checking. So, we define a much simpler model that can be used without compromising the soundness or completeness of the analysis (compared to SPVP).

3.4.2 Reduced Path Vector Protocol (RPVP)

We now describe RPVP, which is specifically designed for explicit-state model checking of the converged states of the extended SPVP protocol.

In RPVP, the message-passing model of SPVP is replaced with a shared-memory model. The network state consists only of the values of the best known path of each node at each moment (best-path). In the initial state, the best path of all nodes is ⊥, except origins, whose best path is ε. At each step, the set of all enabled nodes (E) is determined (Algorithm 1, line 5). A node n is considered enabled if either i) the current best path p of n is invalid, meaning that the next hop in p has a best path that is not a prefix of p:

invalid(n) ≜ best-path(best-path(n).head) ≠ best-path(n).rest

Algorithm 1 RPVP
 1: procedure RPVP()
 2:   Init: ∀n ∈ N − Origins. best-path(n) ← ⊥
 3:   Init: ∀o ∈ Origins. best-path(o) ← ε
 4:   while true do
 5:     E ← {n ∈ N | invalid(n) ∨ ∃n′ ∈ peers(n). can-update_n(n′)}
 6:     if E = ∅ then
 7:       break
 8:     end if
 9:     n ← nondet-pick(E)
10:     if invalid(n) then
11:       best-path(n) ← ⊥
12:     end if
13:     U ← best({n′ ∈ peers(n) | can-update_n(n′)})
14:     n′ ← nondet-pick(U)
15:     p ← import_{n,n′}(export_{n′,n}(best-path(n′)))
16:     best-path(n) ← p
17:   end while
18: end procedure

Or ii) there is a node n′ among the peers of n that can produce an advertisement which will change the current best path of n. In other words, n′ has a path better than the current best path of n, and the path is acceptable according to the export and import policies of n′ and n, respectively:

can-update_n(n′) ≜ better_n(import_{n,n′}(export_{n′,n}(best-path(n′))), best-path(n))

where better_n(p, p′) is true when path p is preferred over p′ according to the ranking function of n.

At any step of the execution, if there is no enabled node, RPVP has reached a converged state. Otherwise, a node n is non-deterministically picked from the enabled set (line 9). If the current best path of n is invalid, the best path is set to ⊥. Among all peers of n that can produce advertisements that can update the best path of n, the neighbors that produce the highest-ranking advertisements are selected (line 13). Note that in our model we allow multiple paths to have the same rank, so there may be more than one element in the set U. Among the updates, one peer n′ is non-deterministically selected and the best path of n is updated according to the advertisement of n′ (lines 14-16). By the end of line 16, an iteration of RPVP is finished. Note that there are no explicit advertisements propagated; instead, nodes are polled for the advertisement that they would generate based on their current best path when needed. The protocol terminates once a converged state for the target equivalence class is reached. RPVP does not define the semantics of failure or any change to the ranking functions. Any topology changes to be verified are made before the protocol starts its execution, and the latest version of the ranking functions is considered.

A natural question is whether or not performing the analysis using RPVP is sound and complete with respect to SPVP. Soundness is trivial, as each step of RPVP can be simulated using a few steps of SPVP. If we are only concerned about the converged states, RPVP is complete as well:

Theorem 1. For any converged state reachable from the initial state of the network with a particular set of links L failing at certain steps during the execution of SPVP, there is an execution of RPVP with the same import/export filters and ranking functions equal to the latest version of the ranking functions in the execution of SPVP, which starts from the initial state in which all links in L have failed before the protocol execution starts, and reaches the same converged state. In particular, there is such an execution in which, at each step, each node takes a best path that does not change during the rest of the execution.

Proof. The proof can be found in the Appendix.

Theorem 1 implies that performing model checking using the RPVP model is complete. Note that RPVP does not preserve all the transient states and the divergent behaviors of SPVP. This frees us from checking unnecessary states, as we are only interested in the converged states. Yet, even the reduced state space has a significant amount of irrelevant non-determinism. Consequently, we rely on a suite of other domain-specific optimizations (§ 4) to eliminate much of this non-determinism and make model checking practical.

Note that our presentation of RPVP has assumed that a single best path is picked by each node. This matches our current implementation in that we do not support multipath in all protocols. In a special-case deviation from RPVP, our implementation allows a node running OSPF to maintain multiple best paths, chosen based on multiple neighbors. While we could extend our protocol abstraction to allow multiple best paths at each node, it would not reflect the real-world behavior of BGP which, even when multipath is configured, makes routing decisions based on a single best path. However, such an extension is valid under the constrained filtering and ranking techniques of shortest-path routing. Our theorems can be extended to incorporate multipath in such protocols. We omit that to preserve clarity.

3.5 Policies

We primarily target verification of data plane policies over converged states of the network. Similar to VeriFlow [15], we do not implement a special declarative language for policies; a policy is simply an arbitrary function computed over a data plane state and returning a Boolean value. Plankton implements a Policy API where a policy supplies a callback which will be invoked each time the model checker generates a converged state. Plankton gives the callback the newly computed converged data plane for a particular PEC, as well as the relevant converged states of any other PEC that the current PEC depends on. Plankton checks the callback's return value, and if the policy has failed, it writes a trail file describing the execution path taken to reach the particular converged state.

Our API allows a policy to give additional information to help optimize Plankton's search: source nodes and interesting nodes. We define two converged data plane states for a PEC to be equivalent if their paths from the source nodes have the same length and the same interesting nodes are in the same position on the path. Plankton may suppress checking a converged state if an equivalent one (from the perspective of that policy) has already been checked (§ 4.2 and § 4.3 describe how Plankton does this). If source and interesting nodes are not supplied, Plankton by default assumes that all nodes might be sources and might be interesting.

As an example, consider a waypoint policy: traffic from a set S of sources must pass through firewalls F. The policy specifies sources S, interesting nodes F, and the callback function steps through each path starting at S and fails when it finds a path that does not contain a member of F. As another example, a loop policy cannot optimize as aggressively: it has to consider all sources.
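For illustration, a waypoint policy along these lines could be written roughly as follows; the ConvergedDataPlane type and its paths_from helper are hypothetical stand-ins, since the paper does not spell out the callback's exact signature.

#include <set>
#include <string>
#include <vector>

using Node = std::string;
using Path = std::vector<Node>;   // one forwarding path of the checked PEC

// Hypothetical view of one converged data plane handed to the policy callback.
struct ConvergedDataPlane {
    // Assumed helper: all forwarding paths for this PEC starting at src.
    std::vector<Path> paths_from(const Node& src) const { (void)src; return {}; }
};

struct WaypointPolicy {
    std::set<Node> sources;     // the policy's "source nodes" (the set S)
    std::set<Node> waypoints;   // the policy's "interesting nodes" (the firewalls F)

    // Callback invoked on every converged state; returns true if the policy holds.
    bool check(const ConvergedDataPlane& dp) const {
        for (const Node& s : sources) {
            for (const Path& p : dp.paths_from(s)) {
                bool through_waypoint = false;
                for (const Node& hop : p) {
                    if (waypoints.count(hop)) { through_waypoint = true; break; }
                }
                if (!through_waypoint) return false;   // a path bypasses every firewall
            }
        }
        return true;
    }
};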

In general, this API enables any policy that is a function of a single PEC's converged data plane state. We have found it simple to add new policies, currently including: Reachability, Waypointing, Loop Freedom, BlackHole Freedom, Bounded Path Length and Multipath Consistency [1]. We highlight several classes of policies that fall outside this API: (i) Policies that inspect the converged control plane state, as opposed to the data plane: while not yet strictly in the clean API, this information is easily available, and we implemented a representative example, Path Consistency (§ 5), which asserts that the control plane state as well as the data plane paths for a set of devices should be identical in the converged state (similar to Local Equivalence in Minesweeper [1]). (ii) Policies that require multiple PECs, e.g., “packets to two destinations use the same firewall”. This would be an easy extension, leveraging Plankton's PEC-dependency mechanism, but we have not performed a performance evaluation. (iii) Policies that inspect dynamic behavior, e.g., “no transient loops prior to convergence”, are out of scope, just as they are for all current configuration verification tools.

4 Optimizations

Although Plankton's RPVP-based control plane greatly reduces the state space, naive model checking is still not efficient enough to scale to large networks. We address this challenge through optimizations that fall into two major categories — reducing the search space of the model checker, and making the search more efficient.

4.1 Partial Order Reduction

A well-known optimization technique in explicit-state model checking, Partial Order Reduction (POR) involves exploring a set of events in only one order, if the various orderings will result in the same outcome. In general, precisely answering whether the order of execution affects the outcome can be as hard as model checking itself. Model checkers such as SPIN provide conservative heuristics to achieve some reduction. However, in our experiments, this feature did not yield any reduction. We believe this is because our model of the network has only a single process, and SPIN's heuristics are designed to apply only in a multiprocess environment. Even if we could restructure the model to use SPIN's heuristics, we do not expect significant reductions, as evidenced in past work [4]. Instead, we implement POR heuristics based on our knowledge of the RPVP control plane†.

†Since we wish to check all converged states of the network, it can be argued that any reduction in search space is essentially POR. But here, we are referring to optimizations that have a localized scope.

4.1.1 Explore consistent executions only

To describe this optimization, we first introduce the notion of a consistent execution: for a converged state S and a partial execution of RPVP π, we say that π is consistent with S iff at each step of the execution, a node picks a path that is equal to its best path in S and never changes it.

Readers may notice that Theorem 1 asserts the existence of a consistent execution leading to each converged state of the network, once any failures have happened. This implies that if the model checker were to explore only executions that are consistent with some converged state, completeness of the search would not be compromised (soundness is not violated, since every such execution is a valid execution). Of course, when we start the exploration, we cannot know the exact executions that are consistent with some converged state, and hence need to be checked. So, we conservatively assume that every execution we explore is relevant, and if we get evidence to the contrary (like a device having to change a selected best path), we stop exploring that execution.
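A bare-bones sketch of that pruning rule, written in C++ rather than in the Promela model, might look as follows; it simply records the first best path each node commits to on the current branch and signals the checker to abandon the branch if a later step would revise it. The types are illustrative.

#include <map>
#include <optional>
#include <string>

using Node = std::string;
using PathId = int;   // stand-in for a concrete best-path value

struct Branch {
    // Best path each node has committed to so far on this execution branch.
    std::map<Node, std::optional<PathId>> committed;

    // Apply an update; returns false if the node would have to revise an
    // earlier choice, i.e. this branch cannot be consistent with any
    // converged state and its exploration can stop.
    bool apply_update(const Node& n, PathId new_best) {
        std::optional<PathId>& cur = committed[n];
        if (cur.has_value() && *cur != new_best) return false;   // revision: prune branch
        cur = new_best;                                          // first commitment
        return true;
    }
};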

4.1.2 Deterministic nodes

Even when there is the possibility of non-deterministic convergence, the “relevant” non-determinism is typically localized. In other words, after each non-deterministic choice, there is a significant amount of essentially deterministic behavior before the opportunity for another non-deterministic choice, if any, arises. (We consider it analogous to endgames in chess, but applicable at any point in the protocol execution.) However, this is obscured by “irrelevant” non-determinism – particularly, ordering between node executions that doesn't impact the converged state. Our goal is to prune the irrelevant non-determinism to reduce the search space for Plankton's model checker.

For an enabled node n in state S with a single best update u, we say n is deterministic if in all possible converged states reachable from S, n will have the path selected after n processes u. Of course, with the model checker having only partially executed the protocol, it is highly non-obvious which nodes are deterministic! Nevertheless, suppose for a moment we have a way to identify at least some deterministic nodes. How could we use this information? At each step of RPVP, after finding the set of enabled nodes, if we can identify at least one deterministic enabled node, we choose one of these nodes and instruct SPIN to process its update. (More specifically, we pick one arbitrarily.) This avoids the costly non-deterministic branching caused by having to explore the numerous execution paths where each one of the enabled nodes is the one executed next. The following theorem shows this is safe.

Theorem 2. Any partial execution of RPVP consistent with a converged state can be extended to a complete execution consistent with that state.

Proof. The proof can be found in the Appendix.

By definition, choosing any deterministic node as the single node to execute next produces a new state that remains consistent with all possible converged states reachable from the prior state. Thus, Theorem 2 implies this deterministic choice does not eliminate any converged states from being reachable, preserving completeness. Note that this optimization does not require the entire network to have one converged state; it can apply at every step of the execution, possibly between non-deterministic choices.

What remains is the key question: how can we identify deterministic nodes? Perfect identification is too much to hope for, and we allow our heuristics to return fewer deterministic nodes than actually exist. We build heuristics that are specific to each routing protocol, prioritizing speed and simplicity over handling atypical cases like circular route redistribution.

For OSPF, our detection algorithm runs a network-wide shortest path computation, and picks each node only after all nodes with shorter paths have executed. We cache this computation so it is only run once for a given topology, set of failures, and set of sources.
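A sketch of that heuristic under simplifying assumptions (a single origin, link costs as edge weights) is shown below; the caching per topology, failure set, and source set described above is not shown.

#include <climits>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Cost = long long;
// adj[u] holds (neighbor, link cost) pairs.
using WGraph = std::vector<std::vector<std::pair<int, Cost>>>;

// Network-wide shortest-path distances from the origin (plain Dijkstra).
std::vector<Cost> ospf_distances(const WGraph& adj, int origin) {
    const Cost INF = LLONG_MAX / 4;
    std::vector<Cost> dist(adj.size(), INF);
    std::priority_queue<std::pair<Cost, int>,
                        std::vector<std::pair<Cost, int>>,
                        std::greater<std::pair<Cost, int>>> pq;
    dist[origin] = 0;
    pq.push({0, origin});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d != dist[u]) continue;                 // stale queue entry
        for (auto [v, w] : adj[u]) {
            if (d + w < dist[v]) { dist[v] = d + w; pq.push({dist[v], v}); }
        }
    }
    return dist;
}

// A node may be treated as deterministic once every node strictly closer
// to the origin has already executed.
bool ospf_deterministic(int node, const std::vector<Cost>& dist, const std::vector<bool>& executed) {
    for (int u = 0; u < (int)dist.size(); ++u)
        if (!executed[u] && dist[u] < dist[node]) return false;
    return true;
}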

For BGP, the detection algorithm performs the following computation: for each node which is enabled to update its best path, it checks whether there exists a pending update that would never get replaced, because the update would definitely be preferred over other updates that are enabled now or may be enabled in the future. To check this, we follow the node's BGP decision process, so if the update is tied for most-preferred in one step, it moves to the next. For each step of the decision process, the preference calculation is quite conservative. For local pref, it marks an update as the winner if it matches an import filter that explicitly gives it the highest local pref among all import filters. For AS path, the path length of the current update must be the minimum possible in the topology. For IGP cost, the update must be from the peer with minimum cost. If at any node, any update is found to be a clear winner, the node is picked as a deterministic node, and is allowed to process the update. If no node is found that has a clear winner, but there is a node that has ≥ 2 updates tied for the most preferred, then we deterministically pick any one such node and have SPIN non-deterministically choose which of the multiple updates to process. Figure 6 illustrates these scenarios on a BGP network, highlighting one sequence of node selections (out of many possible).


Figure 6: Step-by-step choice of deterministic nodes (each node has a different AS number). The example is a six-router BGP network with R1 as the origin and a lower local pref configured for R5; the deterministic nodes chosen in each step are: 1. R2 (neighbor R1; tied local pref, best AS path), 2. R5 (neighbor R2; highest local pref for R2), 3. R3 (neighbor R1; tied local pref, best AS path), 4. none (R4 gets picked since all possible winning updates are enabled; use SPIN to decide between neighbors R2 and R3), 5. none (only R6 is enabled; use SPIN to decide between neighbors R4 and R5).

The detection algorithm may fail to detect some deterministic nodes. For instance, suppose node N is deterministic, but its import filter from neighbor M sets the highest local pref for updates with a particular community attribute, and M can never assign that attribute. Then the detection algorithm will fail to mark N as deterministic. But successfully identifying at least one deterministic node in a step will avoid non-deterministic branching at that step. As long as this happens frequently, the optimization will be helpful.

Even if the decision on a node is ambiguous in a particular state, the system will often make progress to a state where ambiguities can be resolved. In the example above, once M selects a path (and therefore will never change its path, as described in § 4.1.1), the detection algorithm no longer needs to account for a possible more-preferred update from it, and may then be able to conclude that N is deterministic.

4.1.3 Decision independence

If node A's actions are independent of any future decisions of node B and vice versa, then the execution ordering between A and B does not matter. We check a sufficient condition for independence: any route advertisements passed between these nodes, in either direction, must pass through a node that has already made its best path decision (and therefore will not transmit any further updates). In this case, we pick a single arbitrary execution order between A and B.

4.1.4 Failure ordering

As stated in § 3.4.2, the model checker performs all topology changes before the protocol starts execution. We also enforce a strict ordering of link failures, reducing branching even further.

4.2 Policy-based Pruning

Policy-based pruning limits protocol execution to those parts of the network that are relevant to the satisfaction or failure of the policy. When a policy defines a set of source nodes (§ 3.5), it indicates that the policy can be checked by analyzing the forwarding from those nodes only. The best example of this is reachability, which is checked from a particular set of starting nodes. When an execution reaches a state where all source nodes have made their best-path decision, Plankton considers the execution, which is assumed to be consistent, to have finished. In the cases where only a single prefix is defined in a PEC, Plankton performs a more aggressive pruning, based on the influence relation: any device that cannot influence a source node is not allowed to execute. With some additional bookkeeping, the optimization can be extended to cases where multiple prefixes contribute to a PEC, but our current implementation does not support this. The optimization is also not sound when applied to PECs on which other PECs depend. A router that does not speak BGP may not directly influence a source node, but it may influence the routing for the router IP addresses, which in turn may affect the chosen path of the source node. So, the optimization is not applied in such cases.

4.3 Choice of Failures

In addition to the total ordering of failures described in § 4.1.4, Plankton also attempts to reduce the number of failures that are explored, using equivalence partitioning of devices as proposed by Bonsai [2]. Bonsai groups devices in the network into abstract nodes, creating a smaller topology overall for verification. Plankton computes Device Equivalence Classes (DECs) similarly, and defines a Link Equivalence Class (LEC) as the set of links between two DECs. For each link failure, Plankton then explores only one representative from each LEC. When exploring multiple failures, we refine the DECs and LECs after each selection. Note that this optimization limits the choice of failed links, but the verification happens on the original input network. In order to avoid remapping interesting nodes (§ 3.5), they are each assigned to a separate DEC. Since the computed DECs can be different for each PEC, this optimization is done only when there are no cross-PEC dependencies.
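The link-grouping step can be sketched as follows, assuming a device-to-DEC mapping has already been computed (Bonsai-style); the DEC computation itself and the per-failure refinement are not shown.

#include <algorithm>
#include <map>
#include <utility>
#include <vector>

struct Link { int a, b; };   // endpoints are device ids in the original topology

// Group links into Link Equivalence Classes, keyed by the pair of DECs they
// connect, and keep one representative link per LEC to try failing.
std::vector<Link> representative_failures(const std::vector<Link>& links,
                                          const std::vector<int>& dec_of) {
    std::map<std::pair<int, int>, Link> rep;
    for (const Link& l : links) {
        std::pair<int, int> lec{std::min(dec_of[l.a], dec_of[l.b]),
                                std::max(dec_of[l.a], dec_of[l.b])};
        rep.emplace(lec, l);             // first link seen for this LEC becomes the representative
    }
    std::vector<Link> out;
    for (const auto& kv : rep) out.push_back(kv.second);
    return out;
}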

4.4 State Hashing

During the exhaustive exploration of the state space, the explicit-state model checker needs to track a large number of states simultaneously. A single network state consists of a copy of all the protocol-specific state variables at all the devices. Maintaining millions of copies of these variables is naturally expensive, and in fact, unnecessary. A routing decision at one device doesn't immediately affect the variables at any of the other devices. Plankton leverages this property to reduce the memory footprint, storing routing table entries as 64-bit pointers to the actual entry, with each entry stored once and indexed in a hash table. We believe this optimization can be applied to other variables in the network state too, as long as they are not updated frequently. Picking the right variables to optimize this way, and developing more advanced hash-based schemes, can be explored in the future.
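A small sketch of this idea: route entries are interned in a pool so each distinct entry is stored once, and per-device state keeps only a compact handle (standing in for the 64-bit pointer). The entry fields and hash function are illustrative, not Plankton's data layout.

#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

struct RouteEntry {
    std::string prefix;      // e.g. "10.0.0.0/8"
    std::string next_hop;
    int metric = 0;
    bool operator==(const RouteEntry& o) const {
        return prefix == o.prefix && next_hop == o.next_hop && metric == o.metric;
    }
};

struct RouteEntryHash {
    size_t operator()(const RouteEntry& e) const {
        size_t h = std::hash<std::string>()(e.prefix);
        h = h * 31 + std::hash<std::string>()(e.next_hop);
        h = h * 31 + std::hash<int>()(e.metric);
        return h;
    }
};

// Each distinct entry is stored exactly once; model-checker states reference it by handle.
class EntryPool {
    std::unordered_map<RouteEntry, uint64_t, RouteEntryHash> ids_;
    std::vector<RouteEntry> entries_;
public:
    uint64_t intern(const RouteEntry& e) {
        auto it = ids_.find(e);
        if (it != ids_.end()) return it->second;   // already stored
        uint64_t id = entries_.size();
        entries_.push_back(e);
        ids_.emplace(e, id);
        return id;
    }
    const RouteEntry& get(uint64_t id) const { return entries_[id]; }
};

// A device's routing table inside a state then shrinks to a vector of handles:
// std::vector<uint64_t> rib;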

5 Evaluation

We prototyped Plankton, including the equivalence class computation, control plane model, policy API and optimizations, in 373 lines of Promela and 4870 lines of C++, excluding the SPIN model checker. We experimented with our prototype on Ubuntu 16.04 running on a 3.4 GHz Intel Xeon processor with 32 hardware threads and 188 GB RAM.

We begin our evaluation with simple hand-created topologies incorporating protocol characteristics such as shortest path routing, non-deterministic protocol convergence, redistribution, recursive routing, etc. Among these tests, we incorporated examples of non-deterministic protocol execution from [8], as well as BGP wedgies, which are policy violations that can occur only under some non-deterministic execution paths. In each of these cases, Plankton correctly finds any violations that are present.

Having tested basic correctness, we next evaluate performance and compare to past verifiers. What sets Plankton apart from tools other than Minesweeper is its higher data plane coverage and ability to handle multiple protocols (Figure 1). We therefore compare primarily with Minesweeper, but also include ARC in some tests.

We also evaluated Bonsai [2], a preprocessing technique that helps improve the scalability of configuration verification for specific policies. Bonsai could assist any configuration verifier. We integrated Bonsai with Plankton, and experimentally compare the performance of Bonsai+Minesweeper and Bonsai+Plankton. However, it is important to study the performance of these tools without Bonsai too: Bonsai's network compression cannot be applied if correctness is to be evaluated under link failures, or if the policy being evaluated is not preserved by Bonsai.

Experiments with synthetic configurations

Our first set of performance tests uses fat trees. We construct fat trees of increasing sizes, with each edge switch originating a prefix into OSPF. Link weights are identical. We check these networks for routing loops. In order to cause violations, we install static routes at the core routers. In our first set of experiments, the static routes match the routes that OSPF would eventually compute, so there are no loops. Then, we change the static routes such that some of the traffic falls into a routing loop. Figure 7(a) illustrates the time and memory consumed, using Plankton running on various numbers of cores, and using Minesweeper. We observed that under default settings, Minesweeper's CPU utilization keeps changing, ranging from 100% to 1,600%. In this experiment and all others where we run both Minesweeper and Plankton, the two tools produced the same policy verification results. This serves as an additional correctness check for Plankton. Bonsai is not used, because its currently available implementation does not appear to support loop policies.

As the results show, Plankton scales well with input network size. The speed and memory consumption vary as expected with the degree of parallelism. Even on a single core, Plankton is quicker than Minesweeper for all topologies. For larger networks, Plankton is several orders of magnitude quicker. On the memory front, even on 16 cores, Plankton's footprint is smaller than Minesweeper's.

Encouraged by the good performance numbers, we scale up to very large fat trees (Figure 7(b)). Here, Minesweeper does not finish even in 4 hours, even with a 500-device network (in the case of a passing loop check, even in a 245-device network). So, we did not run Minesweeper on the larger networks. We run Plankton with a single CPU core only, to illustrate its time-memory tradeoff: since the analyses of individual PECs are fully independent and of identical computational effort, running with n cores would reduce the time by n× and increase memory by n×. For example, in the 2,205-device network, Plankton uses about 170 GB per process. Policies that check a single equivalence class are much cheaper: for example, single-IP reachability finishes in seconds or minutes even on the largest networks (Figure 7(b)).

Next, we test Plankton with a very high degree of non-determinism. We evaluated a data center setting with BGP, which is often employed to provide layer 3 routing down to the rack level in modern data centers [17]. We configure BGP as described in RFC 7938 [17] on fat trees of various sizes. Furthermore, we suppose that the network operator intends traffic to pass through any of a set of acceptable waypoints on the aggregation-layer switches (e.g., imagine the switches implement a certain monitoring function). We pick a random subset of aggregation switches as the waypoints in each experiment. However, we create a “misconfiguration” that prevents multipath and fails to steer routes through the waypoints‡. Thus, in this scenario, whether the selected path passes through a waypoint depends on the order in which updates are received at various nodes, due to age-based tie breaking [16]. We check waypoint policies which state that the path between two edge switches should pass through one of the waypoints. Plankton evaluates various non-deterministic convergence paths in the network, and determines a violating sequence of events. Time and memory both depend somewhat on the chosen set of aggregation switches, but even the worst-case times are less than 2 seconds (Figure 7(c)). We consider this a success of our policy-based pruning optimization: the network has too many converged states to be verified in reasonable time, but many have equivalent results in terms of the policy.

Experiments with semi-synthetic configurations

We use real-world AS topologies and associated OSPF link weights obtained from RocketFuel [24]. We pick a random ingress point that has more than one link incident on it. We verify that with any single link failure, all destination prefixes are reachable from that ingress. Here, Minesweeper's SMT-based search algorithm could be beneficial, due to the large search space created by failures. Nevertheless, Plankton performs consistently better in both time and memory (Figure 7(d)). Both tools find a violation in each case. The time taken by Plankton with 16 and 32 cores is often identical, since a violation is found in the first set of PECs. Note

‡This setup is convenient for practical reasons, as our current Plankton prototype implementation does not support BGP multipath.


Figure 7: Plankton experiments. Bars and lines/points denote time and memory consumption, respectively. The plotted panels are summarized below; the tabular panels (g) and (i) are reproduced in full.

(a) Fat trees with OSPF, loop policy, multi-core: time (seconds) and memory (GB) for K = 10, 12, 14, passing and failing checks; Minesweeper (1+ cores) vs. Plankton with 1, 2, 4, 8, 16, and 32 cores; timed-out runs are marked.

(b) Fat trees with OSPF, multiple policies, 1 core: time (seconds) and memory (GB) for N = 500 to 2205 devices; Loop (Pass), Loop (Fail), and Single IP Reachability.

(c) Fat trees with BGP, waypoint policy, 1 core: max./avg. time (ms) and max./avg. memory (MB) vs. number of devices (20 to 320).

(d) AS topologies with OSPF and failures, reachability policy, multi-core: time (seconds) and memory (GB) for AS 1221, 1239, 1755, 3257, 3967, and 6461; Minesweeper (1+ cores) vs. Plankton with 1 to 32 cores; timed-out runs are marked.

(e) AS topologies with iBGP over OSPF, reachability policy, multi-core: same setup as (d); timed-out runs are marked.

(f) Bonsai-compressed fat trees with OSPF, multiple policies, 8 cores: time (ms) for Reachability and Path Length, Minesweeper vs. Plankton, N = 20 to 245.

(g) Networks with link failures, all-to-all reachability policy, 8 cores (ARC vs. Plankton):

    Network                Links Failed   ARC          Plankton
    Fat tree (20 nodes)    0              1.08 s       0.19 s
                           <= 1           1.05 s       0.23 s
                           <= 2           1.00 s       0.31 s
    Fat tree (45 nodes)    0              12.17 s      0.49 s
                           <= 1           12.40 s      0.55 s
                           <= 2           12.49 s      2.09 s
    Fat tree (80 nodes)    0              300.50 s     0.93 s
                           <= 1           280.00 s     1.98 s
                           <= 2           294.12 s     18.30 s
    Fat tree (125 nodes)   0              4847.57 s    1.90 s
                           <= 1           5096.96 s    9.34 s
                           <= 2           4955.95 s    159.37 s
    AS 1221 (108 nodes)    0              41.62 s      1.12 s
                           <= 1           40.50 s      2.80 s
                           <= 2           38.83 s      106.57 s
    AS 1775 (87 nodes)     0              49.32 s      1.02 s
                           <= 1           48.65 s      2.73 s
                           <= 2           46.52 s      155.28 s

(h) Real-world configs (number of devices in parentheses), multiple policies, 1 core: time (ms) for networks I(52), II(63), III(71), IV(63), V(36), VI(2), VII(30), VIII(30), IX(3), and Stanford; Reachability, Bounded Path Length, and Waypointing, each with and without failures.

(i) Real-world configs, multiple policies, 32 cores:

    Network   Policy                  Links Failed   Memory    Time
    II        Loop                    0              2.37 GB   8.79 s
                                      <= 1           2.47 GB   13.62 s
              Multipath Consistency   0              2.37 GB   16.28 s
                                      <= 1           2.37 GB   22.01 s
              Path Consistency        0              2.37 GB   15.43 s
                                      <= 1           2.37 GB   23.55 s
    III       Loop                    0              2.36 GB   11.49 s
                                      <= 1           2.88 GB   29.81 s
              Multipath Consistency   0              2.37 GB   16.33 s
                                      <= 1           2.67 GB   24.86 s
              Path Consistency        0              2.36 GB   15.53 s
                                      <= 1           2.88 GB   21.01 s
    IV        Loop                    0              2.31 GB   12.37 s
                                      <= 1           2.40 GB   13.14 s
              Multipath Consistency   0              2.34 GB   16.36 s
                                      <= 1           2.37 GB   17.04 s
              Path Consistency        0              2.33 GB   16.33 s
                                      <= 1           2.37 GB   17.00 s

Note that in this experiment and the next, we did not use Bonsai, because (i) it cannot be used for checks involving link failures, and (ii) the topology has hardly any symmetry that Bonsai could exploit.

To evaluate our handling of PEC dependencies, we configure iBGP over OSPF on the AS topologies. The iBGP prefixes rely on the underlying OSPF routing process to reach the next hop. We check that packets destined to the iBGP-announced prefixes are correctly delivered. It is worth noting that this test evaluates a feature that, to the best of our knowledge, is provided only by Plankton and Minesweeper. Thanks to the dependency-aware scheduler, Plankton performs multiple orders of magnitude better (Figure 7(e)). This is unsurprising: Minesweeper's approach of duplicating the network forces it to solve a much harder problem here, sometimes over 300× larger.
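To make the role of the dependency-aware scheduler concrete, here is a small sketch (our own illustration, not Plankton's code; check_pec and the PEC names are hypothetical) that orders PEC checks so that a PEC is verified only after the PECs it depends on, such as the OSPF prefix that resolves an iBGP next hop:

    # Sketch: dependency-aware scheduling of PEC checks (illustrative only).
    from graphlib import TopologicalSorter

    def check_pec(pec, prerequisites):
        """Placeholder for model checking one PEC, given the converged
        results of the PECs it depends on."""
        return f"converged behavior of {pec}"

    # deps maps each PEC to the set of PECs whose converged state it needs.
    deps = {
        "ibgp:10.1.0.0/16": {"ospf:192.168.1.0/24"},  # iBGP next hop learned via OSPF
        "ospf:192.168.1.0/24": set(),
    }

    results = {}
    for pec in TopologicalSorter(deps).static_order():
        # Every prerequisite is already in `results`, so its forwarding
        # behavior can be assumed while exploring this PEC.
        results[pec] = check_pec(pec, {d: results[d] for d in deps[pec]})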

Integration with Bonsai
We integrated Plankton with Bonsai to take advantage of control plane compression when permitted by the specific verification task at hand. We test this integration experimentally by checking Bounded Path Length and Reachability policies on fat trees running OSPF. The symmetric nature of fat trees is key for Bonsai to have a significant impact. We measure the time taken by Plankton and Minesweeper after Bonsai preprocesses the network. Plankton still outperforms Minesweeper by multiple orders of magnitude (Figure 7(f)).

Comparison with ARC
Having evaluated Plankton's performance against Minesweeper, we move on to comparing Plankton and ARC. ARC is specifically designed to check shortest-path routing under failures, so we expected it to perform much better than the more general-purpose Plankton when checking networks compatible with ARC. We check all-to-all reachability in fat trees and AS topologies running OSPF, under a maximum of 0, 1 and 2 link failures. Similar to Minesweeper, ARC's CPU utilization ranges from 100% to 600% under default settings; we allocate 8 cores to Plankton. Plankton is multiple orders of magnitude faster in most cases (Figure 7(g)).§ This is genuinely surprising; one reason that may explain the observation is that ARC always computes separate models for each source-destination pair, whereas Plankton computes them based only on the destination when verifying destination-address routing. Nevertheless, we do not believe there is a fundamental limitation in ARC's design that would prevent it from outperforming Plankton on the networks that can be checked by either tool. Interestingly, while ARC's resiliency-focused algorithm does not scale as easily as Plankton's for larger networks, its performance actually sometimes improves slightly as the number of failures to be checked increases. Plankton, on the other hand, scales poorly when checking increasing levels of resiliency. We do not find this concerning, since most interesting checks in the real world involve only a small number of failures. When we performed these experiments with Minesweeper, no check involving 2 failures ran to completion except on the smallest fat tree.

Testing with real configurations
We used Plankton to verify 10 different real-world configurations from 3 different organizations, including the publicly available Stanford dataset. We first check reachability, waypointing and bounded path length policies on these networks, with and without failures. All except one of these networks use some form of recursive routing, such as indirect static routes or iBGP.

§Our numbers for ARC are similar to those reported by its authors for similar-sized networks, so we believe we have not misconfigured ARC.

    Experiment                        Optimizations                      Time       Memory
    Ring, OSPF, 4 nodes, 1 failure    All                                343 µs     137.43 MB
                                      None                               1.56 ms    137.39 MB
    Ring, OSPF, 8 nodes, 1 failure    All                                623 µs     143.22 MB
                                      None                               0.13 s     137.04 MB
    Ring, OSPF, 16 nodes, 1 failure   All                                2.44 ms    137.89 MB
                                      None                               266.48 s   7615.57 MB
    Fat tree, OSPF, 20 nodes          All                                464 µs     551.73 MB
                                      None                               > 5 min    > 8983.55 MB
    Fat tree, OSPF, 245 nodes         All                                4.297 s    1908 MB
                                      All but link failure opt.          64.97 s    72862 MB
    AS 1221 iBGP                      All                                27.54 s    254.22 MB
                                      All but deterministic node opt.    25.43 s    254.34 MB
    Fat tree, BGP, 20 nodes           All                                46 ms      137 MB
                                      All but deterministic node opt.    > 5 min    > 6144 MB
                                      All but policy-based pruning       > 5 min    > 6144 MB

Figure 8: Experiments with optimizations disabled/limited

We feel that this highlights the significance of Plankton's and Minesweeper's support for such configurations. Moreover, the PEC dependency graph for these networks did not have any strongly connected components larger than a single PEC, which matches yet another of our expectations. Interestingly, we did find that the PEC dependency graph had self-loops, caused by a static route pointing to a next-hop IP within the prefix being matched. It is also noteworthy that in these experiments, the only non-determinism was in the choice of links that failed, which substantiates our argument that network configurations in the real world are largely deterministic. Figure 7(h) illustrates the results, which indicate that Plankton can handle the complexity of real-world configuration verification.
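These structural properties of the PEC dependency graph are easy to check mechanically. The sketch below (our own illustration using networkx, not Plankton's internal representation; the PEC names are made up) flags self-loops such as a static route whose next hop falls inside the matched prefix, and confirms that no strongly connected component spans more than one PEC:

    # Sketch: examining a PEC dependency graph (illustrative only).
    import networkx as nx

    # Edge (a, b) means: resolving PEC `a` needs the converged state of PEC `b`.
    deps = nx.DiGraph()
    deps.add_edge("ibgp:10.1.0.0/16", "ospf:192.168.1.0/24")
    deps.add_edge("static:10.2.0.0/16", "static:10.2.0.0/16")  # next hop inside the matched prefix

    self_loops = [u for u, v in deps.edges() if u == v]        # recursive PECs
    multi_pec_sccs = [c for c in nx.strongly_connected_components(deps) if len(c) > 1]

    print("self-loops:", self_loops)
    print("multi-PEC SCCs:", multi_pec_sccs)  # empty for the networks we tested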

In our next experiment with real-world configs, we identify three networks where Loop, Multipath Consistency and Path Consistency policies are meaningful and non-trivial to check. We check these policies with and without link failures. Figure 7(i) illustrates the results of this experiment, which indicate that Plankton's broader set of policies also scales well on real-world networks. The Batfish parser, which is used by Minesweeper, was incompatible with the configurations, so we could not check these configs with Minesweeper (checking the Stanford dataset failed after parsing). However, the numbers we observe are significantly better than those reported for Minesweeper on similar-sized networks, for similar policies.

Optimization Cost/Effectiveness
To determine the effectiveness of Plankton's optimizations, we perform experiments with some optimizations disabled or limited. Figure 8 illustrates the results of these experiments. When all optimizations are turned off, naive model checking fails to scale beyond the most trivial of networks. The optimizations reduce the state space by 4.95× in smaller networks and by as much as 24,968× in larger ones.

To evaluate the device-equivalence-based optimization for picking failed links, we perform the loop check on fat trees running OSPF under a single link failure with the optimization turned off. We observe a 15× reduction in speed and a 38× increase in memory overhead, indicating the effectiveness of the optimization in networks with high symmetry.


    Experiment                               No Bitstate Hashing   Bitstate Hashing
    180 Node BGP DC Waypoint (Worst Case)    202 MB                67 MB
    320 Node BGP DC Waypoint (Worst Case)    428 MB                215 MB
    AS 1239 Fault Tolerance (2 cores)        7.33 GB               4.52 GB
    AS 1221 Fault Tolerance (1 core)         163.53 MB             60 MB

Figure 9: The effect of bitstate hashing on memory usage

In the next set of experiments, we measure the impact of our partial order reduction technique of prioritizing deterministic nodes (§ 4.1.2). We first try the iBGP reachability experiment on the AS 1221 topology with the detection of deterministic nodes in BGP disabled. In this case, the decision-independence partial order reduction produces reductions identical to the disabled optimization, keeping the overall performance unaffected. In fact, the time improves by a small percentage, since no detection algorithm runs at every step. We see similar results when we disable the optimization on the edge switches in our BGP data center example. However, this does not mean that deterministic node detection can be discarded: in the BGP data center example, when the optimization is disabled altogether, performance falls dramatically. The next optimization that we study is policy-based pruning. On the BGP data center example, we attempt to check a waypoint policy with policy-based pruning turned off. The check times out, since it is forced to generate every converged data plane, not just the ones relevant to the policy.

SPIN provides a built-in optimization called bitstate hashing, which uses a Bloom filter to keep track of explored states rather than storing them explicitly. This can cause false negatives due to reduced coverage of execution paths. We find that bitstate hashing provides a significant reduction in memory in a variety of our test cases (Figure 9). According to SPIN's statistics, our coverage would be over 99.9%. Nevertheless, we have not turned on bitstate hashing in our other experiments, in favor of full correctness.
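As an illustration of the idea (not SPIN's implementation), a bitstate-style visited set can be sketched as a Bloom filter: each explored state costs only a few bits, and the price is that a genuinely new state may occasionally look already visited, so some executions go unexplored:

    # Sketch: bitstate-style visited-state set via a Bloom filter (illustrative only).
    import hashlib

    class BloomVisitedSet:
        def __init__(self, bits=1 << 20, hashes=3):
            self.bits = bits
            self.hashes = hashes
            self.bitmap = bytearray(bits // 8)

        def _positions(self, state_bytes):
            # Derive `hashes` bit positions from the serialized state.
            for i in range(self.hashes):
                digest = hashlib.sha256(state_bytes + bytes([i])).digest()
                yield int.from_bytes(digest[:8], "big") % self.bits

        def add(self, state_bytes):
            for pos in self._positions(state_bytes):
                self.bitmap[pos // 8] |= 1 << (pos % 8)

        def probably_contains(self, state_bytes):
            # May answer True for a state never added (so that state is skipped);
            # never answers False for a state that was added.
            return all(self.bitmap[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(state_bytes))

    visited = BloomVisitedSet()
    visited.add(b"router1=path_A;router2=path_B")
    print(visited.probably_contains(b"router1=path_A;router2=path_B"))  # True
    print(visited.probably_contains(b"router1=path_C;router2=path_B"))  # very likely False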

6 Limitations

Some of the limitations of Plankton, such as the lack of support for BGP multipath and limited support for route aggregation, have been mentioned in previous sections. As discussed in § 3.2, Plankton may also produce false positives when checking networks with cross-PEC dependencies, because it assumes that every converged state of a PEC may coexist with every converged state of the other PECs that depend on it. However, such false positives are unlikely in practice, since real-world cases of cross-PEC dependencies (such as iBGP) usually involve only a single converged state for the recursive PECs. Our current implementation of Plankton assumes full visibility of the system to be verified, and that any dynamics originate from inside the system. So, influences such as external advertisements need to be modeled using stubs that denote the entities which originate them. Plankton's technique is not suited for detecting issues in vendor-specific protocol implementations, a limitation that all existing formal configuration analysis tools share. As with most formal verification tools, one needs to assume that Plankton itself is correct, both in its theoretical foundations and in its implementation. Correct-by-construction program synthesis could help in this regard.

7 Related Work
Data plane verification: The earlier offline network verification techniques [19, 13] have evolved into more efficient and real-time ones (e.g., [15, 12, 10, 26, 3, 11]), including richer data plane models (e.g., [21, 14]). These techniques, however, cannot verify configurations prior to deployment.

Configuration verification: We discussed the state of the art of configuration verification in § 2, and how Plankton improves upon the various tools in existence. CrystalNet [18] emulates actual device VMs, and its results could be fed to a data plane verifier. However, this would not verify non-deterministic control plane dynamics. Simultaneously improving the fidelity of configuration verifiers in both dimensions (capturing dynamics as in Plankton and implementation-specific behavior as in CrystalNet) appears to be a difficult open problem.

Optimizing network verification: Libra [27] is a divide-and-conquer data plane verifier, which is related to our equivalence-class-based partitioning of possible packets. The use of symmetry to scale verification has been studied in the data plane [22] and the control plane (Bonsai [2]). We have discussed how Plankton uses ideas similar to Bonsai, as well as integrates with Bonsai itself.

Model checking in the networking domain: Past approaches that used model checking in the networking domain have focused almost exclusively on network software itself, either SDN controllers or protocol implementations [4, 23, 20]. Plankton uses model checking not to verify software, but to verify configurations.

8 Conclusion and Future Work
We described Plankton, a formal network configuration verification tool that combines equivalence partitioning of the header space with explicit-state model checking of protocol execution. Thanks to pragmatic optimizations such as partial order reduction and state hashing, Plankton produces significant performance gains over the state of the art in configuration verification. Improvements such as checking transient states, incorporating real software, and partial order reduction heuristics that guarantee reduction are interesting avenues for future work.

Acknowledgements: We thank the anonymous reviewers and our shepherd Cole Schlesinger for their comments. We also thank Ratul Mahajan and Ryan Beckett for their help in integrating Bonsai with Plankton. This work was supported by NSF grants CNS-1513906 and CCF-1421575.


References

[1] BECKETT, R., GUPTA, A., MAHAJAN, R., AND WALKER, D. A general approach to network configuration verification. In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (New York, NY, USA, 2017), SIGCOMM '17, ACM, pp. 155–168.

[2] BECKETT, R., GUPTA, A., MAHAJAN, R., AND WALKER, D. Control plane compression. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (New York, NY, USA, 2018), SIGCOMM '18, ACM, pp. 476–489.

[3] BJØRNER, N., JUNIWAL, G., MAHAJAN, R., SESHIA, S. A., AND VARGHESE, G. ddNF: An efficient data structure for header spaces. In Haifa Verification Conference (2016), Springer, pp. 49–64.

[4] CANINI, M., VENZANO, D., PEREŠÍNI, P., KOSTIC, D., AND REXFORD, J. A NICE way to test OpenFlow applications. In Presented as part of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12) (San Jose, CA, 2012), USENIX, pp. 127–140.

[5] FAYAZ, S. K., SHARMA, T., FOGEL, A., MAHAJAN, R., MILLSTEIN, T., SEKAR, V., AND VARGHESE, G. Efficient network reachability analysis using a succinct control plane representation. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (GA, 2016), pp. 217–232.

[6] FOGEL, A., FUNG, S., PEDROSA, L., WALRAED-SULLIVAN, M., GOVINDAN, R., MAHAJAN, R., AND MILLSTEIN, T. A general approach to network configuration analysis. In Proceedings of the 12th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2015), NSDI '15, USENIX Association, pp. 469–483.

[7] GEMBER-JACOBSON, A., VISWANATHAN, R., AKELLA, A., AND MAHAJAN, R. Fast control plane analysis using an abstract representation. In Proceedings of the 2016 ACM SIGCOMM Conference (2016), SIGCOMM '16.

[8] GRIFFIN, T. G., SHEPHERD, F. B., AND WILFONG, G. The stable paths problem and interdomain routing. IEEE/ACM Trans. Netw. 10, 2 (Apr. 2002), 232–243.

[9] HOLZMANN, G. J. The model checker SPIN. IEEE Trans. Softw. Eng. 23, 5 (May 1997), 279–295.

[10] HORN, A., KHERADMAND, A., AND PRASAD, M. Delta-net: Real-time network verification using atoms. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17) (Boston, MA, 2017), USENIX Association, pp. 735–749.

[11] HORN, A., KHERADMAND, A., AND PRASAD, M. R. A precise and expressive lattice-theoretical framework for efficient network verification. In 2019 27th IEEE International Conference on Network Protocols (ICNP) (2019), IEEE.

[12] KAZEMIAN, P., CHANG, M., ZENG, H., VARGHESE, G., MCKEOWN, N., AND WHYTE, S. Real time network policy checking using header space analysis. In Presented as part of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13) (Lombard, IL, 2013).

[13] KAZEMIAN, P., VARGHESE, G., AND MCKEOWN, N. Header space analysis: Static checking for networks. In Presented as part of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12) (San Jose, CA, 2012), USENIX, pp. 113–126.

[14] KHERADMAND, A., AND ROSU, G. P4K: A formal semantics of P4 and applications. CoRR abs/1804.01468 (2018).

[15] KHURSHID, A., ZOU, X., ZHOU, W., CAESAR, M., AND GODFREY, P. B. VeriFlow: Verifying network-wide invariants in real time. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2013), NSDI '13, USENIX Association, pp. 15–28.

[16] LAPUKHOV, P. Equal-Cost Multipath Considerations for BGP. Internet-Draft draft-lapukhov-bgp-ecmp-considerations-00, Internet Engineering Task Force, Oct. 2016. Work in Progress.

[17] LAPUKHOV, P., PREMJI, A., AND MITCHELL, J. Use of BGP for Routing in Large-Scale Data Centers. RFC 7938 (Informational), Aug. 2016.

[18] LIU, H. H., ZHU, Y., PADHYE, J., CAO, J., TALLAPRAGADA, S., LOPES, N. P., RYBALCHENKO, A., LU, G., AND YUAN, L. CrystalNet: Faithfully emulating large production networks. In Proceedings of the 26th Symposium on Operating Systems Principles (New York, NY, USA, 2017), SOSP '17, ACM, pp. 599–613.

[19] MAI, H., KHURSHID, A., AGARWAL, R., CAESAR, M., GODFREY, P. B., AND KING, S. T. Debugging the data plane with Anteater. In Proceedings of the ACM SIGCOMM 2011 Conference (New York, NY, USA, 2011), SIGCOMM '11, pp. 290–301.

[20] MUSUVATHI, M., AND ENGLER, D. R. Model checking large network protocol implementations. In Proceedings of the 1st Conference on Symposium on Networked Systems Design and Implementation - Volume 1 (Berkeley, CA, USA, 2004), NSDI '04, USENIX Association, pp. 12–12.

[21] PANDA, A., LAHAV, O., ARGYRAKI, K., SAGIV, M., AND SHENKER, S. Verifying reachability in networks with mutable datapaths. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17) (Boston, MA, 2017), USENIX Association, pp. 699–718.

[22] PLOTKIN, G. D., BJØRNER, N., LOPES, N. P., RYBALCHENKO, A., AND VARGHESE, G. Scaling network verification using symmetry and surgery. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (New York, NY, USA, 2016), POPL '16, ACM, pp. 69–83.

[23] SETHI, D., NARAYANA, S., AND MALIK, S. Abstractions for model checking SDN controllers. In 2013 Formal Methods in Computer-Aided Design (Oct. 2013), IEEE.

[24] SPRING, N., MAHAJAN, R., WETHERALL, D., AND ANDERSON, T. Measuring ISP topologies with Rocketfuel. IEEE/ACM Trans. Netw. 12, 1 (Feb. 2004), 2–16.

[25] WEITZ, K., WOOS, D., TORLAK, E., ERNST, M. D., KRISHNAMURTHY, A., AND TATLOCK, Z. Scalable verification of Border Gateway Protocol configurations with an SMT solver. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (2016), OOPSLA 2016, ACM.

[26] YANG, H., AND LAM, S. S. Real-time verification of network properties using atomic predicates. In 2013 21st IEEE International Conference on Network Protocols (ICNP) (2013), IEEE, pp. 1–11.

[27] ZENG, H., ZHANG, S., YE, F., JEYAKUMAR, V., JU, M., LIU, J., MCKEOWN, N., AND VAHDAT, A. Libra: Divide and conquer to verify forwarding tables in huge networks. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14) (Seattle, WA, 2014), USENIX Association, pp. 87–99.

A Extended SPVP

In extended SPVP, for each node n and for each peer n′ ∈ peers(n), rib-in_n(n′) keeps track of the most recent advertisement of n′ to n. In addition, best-path(n) keeps the best path that n has to one of the (possibly multiple) origins. Peers are connected using reliable FIFO message buffers to exchange advertisements. Each advertisement consists of a path from the advertising node to an origin. In each step (which we assume is performed atomically), a node n takes an advertisement p from the buffer connected to peer n′ and applies an import filter to it (import_{n,n′}(p)). n then updates rib-in_n(n′) with the newly imported advertisement. In our extension, we assume that each node has a ranking function λ that provides a partial order over the paths acceptable by the node. n then proceeds by updating its best-path to the highest-ranking path in rib-in_n. If the best path in rib-in_n has the same rank as the current best path and the current path is still valid, best-path(n) does not change. If the best path is updated, n advertises the path to its peers. For each peer n′, n applies the export filter to the path (export_{n,n′}(best-path(n))) before sending the advertisement.
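The step just described can be rendered as the following sketch (our own simplification, not code from the paper); the toy ranking and filter functions are placeholders consistent with the assumptions in Appendix B:

    # Sketch of one atomic step of extended SPVP (illustrative only).
    # BOTTOM stands for "no path" (the value written as ⊥ in the text).
    BOTTOM = None

    def rank(node, path):
        # Toy ranking function λ: any path beats BOTTOM; shorter paths rank higher.
        return (-1,) if path is BOTTOM else (0, -len(path))

    def import_filter(node, peer, path):
        # Reject paths that would cause a forwarding loop; otherwise keep unchanged.
        return BOTTOM if path is not BOTTOM and node in path else path

    def export_filter(node, peer, path):
        # Append the advertising node to the path; no other modification.
        return BOTTOM if path is BOTTOM else path + [node]

    def spvp_step(n, peer, adv, rib_in, best_path, buffers, peers):
        # `adv` has already been dequeued from the FIFO buffer connected to `peer`.
        rib_in[n][peer] = import_filter(n, peer, adv)
        candidate = max(rib_in[n].values(), key=lambda p: rank(n, p))
        current = best_path[n]
        # Keep the current path if it is still valid and ranks no worse.
        if rank(n, candidate) == rank(n, current) and current in rib_in[n].values():
            return
        if candidate != current:
            best_path[n] = candidate
            for q in peers[n]:                      # re-advertise the new best path
                exported = export_filter(n, q, candidate)
                if exported is not BOTTOM:
                    buffers[(n, q)].append(exported)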

The import filter, the export filter, and the ranking functions are abstract notions that are inferred from the configuration of the node. We make reasonable assumptions about these notions (Appendix B). Attributes such as local preference, IGP cost, etc. are accounted for in the ranking function and the import/export filters.

If a session between two peers fails, the messages in the buffer are lost and the buffer cannot be used anymore. We assume that when this happens, each peer gets ⊥ as the advertised path. Additionally, to be able to model iBGP, in extended SPVP we allow the ranking function of any node n to change at any time during the execution of the protocol. This is to model cases in which, for example, a link failure causes IGP costs to change. In such cases we assume that n receives a special message to recompute its best-path according to the new ranking function.

The state of the network at each point in time consists of the values of best-path, rib-in, and the contents of the message buffers. In the initial state S0, the best path of the origins is ε and the best path of every other node is ⊥, which indicates that the node has no path. Also, for any n, n′ ∈ V, rib-in_n(n′) is ⊥. We assume that initially the origins put their advertisements in the message buffers to their peers, but the rest of the buffers are empty.

A (partial) execution of SPVP is a sequence of states π which starts from S0 and in which each state is reachable by a single atomic step of SPVP from the state before it. A converged state in SPVP is a state in which all buffers are empty. A complete execution is an execution that ends in a converged state. It is well known that there are configurations that can make SPVP diverge in some or all execution paths. However, our goal is only to check the forwarding behavior in the converged states, through explicit-state model checking. So, we define a much simpler model that can be used without compromising the soundness or completeness of the analysis (compared to SPVP).

B Assumptions

We make the following assumptions in our theoretical model.

• Both import and export filters return ⊥ if the filter rejects the advertisement according to the configuration.

• All import filters reject paths that cause forwarding loops. They also do not alter the path (unless the path is rejected).

• Every export filter of a node n that does not reject an advertisement appends n at the end of the advertised path. No other modification is made to the path.

• Path ⊥ has the lowest ranking in all ranking functions.

• The import/export filters never change during the execution of the protocol. Note that we do not make such an assumption for the ranking functions.

Note that these are reasonable assumptions with respect to how real-world protocols (especially BGP) work.

C Proof of Theorems

Proof of Theorem 1. For a complete execution π = S0, S1, ..., Sc of SPVP and a state Si in that execution, for any node n, we say that n is converged in Si for execution π iff n has already picked the path it has in the converged state (Sc) and does not change it:

    converged_π(n, Si)  ≜  ∀j. i ≤ j ≤ c : best-path_{Sj}(n) = best-path_{Sc}(n)

It is clear that when a node converges in a state, it remains converged (according to the definition above).

Lemma 1. In any complete execution π = S0, S1, ..., Sc of SPVP, for any state Si and for any two nodes n and n′, if ∃l : n′ = best-path_{Si}(n)[l] ¶ and best-path_{Si}(n′) ≠ best-path_{Si}(n)[l:], then there is a j (i < j ≤ c) such that best-path_{Sj}(n) ≠ best-path_{Si}(n).

Proof. This can be shown by a simple induction on the length l of the prefix of the best path of n from n up to n′. If l = 1 (i.e., the two nodes are directly connected), then either n′ advertises its path to n and n changes its path, or the link between n and n′ fails, in which case n receives an advertisement with ⊥ as the path (Section 3.4.1), which causes n to change its path. Note that the argument holds even if the ranking function of n or n′ changes. If l > 1, assume the claim holds for all lengths less than l, and let n′′ = best-path_{Si}(n)[0]. Either best-path_{Si}(n′′) ≠ best-path_{Si}(n)[1:], in which case the claim holds by the induction hypothesis, or best-path_{Si}(n′′) = best-path_{Si}(n)[1:], in which case we note that n′ = best-path_{Si}(n′′)[l − 1]; since the length of the path from n′′ to n′ is less than l, by the induction hypothesis we know that eventually (i.e., for some j > i) best-path_{Sj}(n′′) ≠ best-path_{Sj}(n)[1:], and again by the induction hypothesis this leads to a change in the best path of n.

¶For a path P = p0, p1, ..., pn, we denote pi by P[i] and pi, pi+1, ..., pn by P[i:].


Corollary 1. For any complete execution π of SPVP, for any node n, any node along the best path of n in the converged state converges before n.

Now consider a complete execution π = S0, S1, ..., Sc of SPVP. We will construct a complete execution of RPVP with |N| steps (where N is the set of all nodes) resulting in the same converged state as Sc. We start with a topology in which all the links that failed during the execution of SPVP have already failed. For any node n, we define C_π(n) = min{i | converged_π(n, Si)}. Consider the sequence n1, n2, ..., n_|N| of all nodes sorted in increasing order of C_π. Now consider the execution of RPVP π′ = S′_0, S′_1, ..., S′_|N| which starts from the initial state of RPVP and in which, in each state S′_i, (a) either node n_{i+1} is the picked enabled node and p_i = best-path_{Sc}(n_{i+1})[0] is the picked best peer, or (b) in case p_i = ⊥, nothing happens and S′_{i+1} = S′_i.

First, note that (modulo the repeated states in case b) π′ is a valid execution of RPVP: at each state S′_i (in case a), n_{i+1} is indeed an enabled node, since its best path at that state is ⊥ and, according to Corollary 1, p_i has already picked its path. Also, p_i will be in the set of best peers of n_{i+1} (line 13 in RPVP). Assume this is not the case, i.e., there exists another peer p′ that can advertise a better path. This means that in Sc of SPVP, p′ can send an advertisement that is better (according to the version of the ranking functions in Sc) than the converged path of n_{i+1}. This contradicts the fact that Sc is a converged state.

Second, note that S′_|N| is a converged state for RPVP, because otherwise, by similar reasoning, Sc cannot be converged. Also, it is easy to see that best-path_{S′_|N|} = best-path_{Sc}.

Finally, note that in π′, once a node changes its best path from ⊥, it does not change its best path again.
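The construction is essentially a replay in convergence order. A small sketch (our own illustration with simplified data structures; converged_at plays the role of C_π and paths are listed from next hop to origin) makes the ordering explicit:

    # Sketch: building the RPVP step sequence used in the proof of Theorem 1
    # (illustrative only).

    def rpvp_schedule(nodes, converged_at, best_path_final):
        """Return the order in which RPVP picks (node, best peer) pairs.

        converged_at[n]   : index C_pi(n) at which n converged in the SPVP run
        best_path_final[n]: n's best path in the converged state S_c
                            (empty list for an origin, None for no path)
        """
        schedule = []
        for n in sorted(nodes, key=lambda n: converged_at[n]):
            path = best_path_final[n]
            if not path:                # origins and path-less nodes pick no peer
                continue
            best_peer = path[0]         # first hop of n's converged best path
            # By Corollary 1, best_peer converged before n, so it is already
            # scheduled and n is an enabled node when its turn comes.
            schedule.append((n, best_peer))
        return schedule

    print(rpvp_schedule(
        nodes=["a", "b", "c"],
        converged_at={"c": 0, "b": 2, "a": 5},
        best_path_final={"c": [], "b": ["c"], "a": ["b", "c"]},
    ))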

Proof of Theorem 2. We begin by making two observations about RPVP that are key to the proof:

• RPVP for a prefix can never converge to a state having looping paths.

• If a node u adopts the best path of a neighbor v, v will be the next hop of u.

Consider any converged state S. The theorem states that any partial execution that is consistent with S can be extended to a full execution that leads to S. We prove the theorem by induction on the length of the longest best path in S.

Base case: If a converged state exists in a network where the best path at each node is of length 0, then each node is either an origin or does not have a best path for the prefix. Since no execution apart from the empty execution (where no protocol event happens) is consistent with this state, the theorem holds.

Induction hypothesis: If a converged state exists in a network such that all best paths are of length k or less, then any partial execution that is consistent with the converged state can be extended to a full execution that reaches the converged state.

Induction step: Consider a network with a converged state S such that the longest best path is of length k + 1. We first divide the nodes of the network into two classes: N, the nodes with best paths of length k or less, and N′, the nodes with best paths of length k + 1. Consider a partial execution π that is consistent with S. We identify two possibilities for π:

Case 1: Every node that has executed in π falls into N. In this case, we define a smaller network, namely the subgraph of the original network induced by the nodes in N. In this network, the path selections made in S constitute a converged state, because in the original network, in state S, the nodes in N are not enabled to make state changes. So, we can extend π to an execution π′ in which the nodes in N match the path selections in S. Now, we further extend π′ with steps in which each node in N′ reads the best path from the node that is its next hop in S and updates its best path. When every node in N′ has done this, the overall system state reaches S.

Case 2: At least one node in N′ has executed in π. In this case, we observe that since π is consistent with S, by the definition of a consistent execution, no node in the network has read the state of any node in N′. So, we can construct an execution π′ that has the same steps as π, except that any step taken by a node in N′ is skipped. As in the previous case, π′ can be extended to reach a converged state in the subgraph induced by N. We extend π, first by using the steps that extend π′, and then, if necessary, by taking additional steps at nodes from N′ to reach S.