Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System...

14
Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University {enck,mcdaniel}@cse.psu.edu Subhabrata Sen, Panagiotis Sebos, Sylke Spoerel AT&T Research {sen,psebos}@research.att.com [email protected] Albert Greenberg Microsoft Research [email protected] Sanjay Rao Purdue University [email protected] William Aiello University of British Columbia [email protected] Abstract The development and maintenance of network device configurations is one of the central challenges faced by large network providers. Current network management systems fail to meet this challenge primarily because of their inability to adapt to rapidly evolving customer and provider-network needs, and because of mismatches be- tween the conceptual models of the tools and the services they must support. In this paper, we present the PRESTO configuration management system that attempts to ad- dress these failings in a comprehensive and flexible way. Developed for and deployed over the last 4 years within a large ISP network, PRESTO constructs device-native configurations based on the composition of configlets representing different services or service options. Con- figlets are compiled by extracting and manipulating data from external systems as directed by the PRESTO con- figuration scripting and template language. We outline the configuration management needs of large-scale net- work providers, introduce the PRESTO system and con- figuration language, and demonstrate the use, workflows, and ultimately the platform’s flexibility via an example of VPN service. We conclude by considering future work and reflect on the operators’ experiences with PRESTO. 1 Introduction Configuration management is among the largest cost- drivers in provider networks. Such costs are driven by the immense complexity of the supported services and infrastructure. For a large provider, thousands of en- terprises with diverse services and configurations must be seamlessly and reliably connected across huge geo- graphic areas and rapidly evolving networks. Moreover, the initial turn-up installation and subsequent support of a single customer may span many organizations and sys- tems internal to the provider. The stakes for the sup- ported enterprise are extremely high: an outage may re- sult in loss of business, delays in “getting to revenue” since service turn up precedes revenue, failure to meet contractual obligations, or disruption of key organiza- tional workflows. Given the complexity and stakes, it may be surpris- ing that the common configuration management prac- tice involves pervasive manual work or ad hoc script- ing. The reasons for this are multi-faceted. From a provider perspective, every customer is in some ways unique. Though service orders have a lot in common, many new installations made to realize those orders rep- resent unique combinations of services and network con- figurations. Interactions with customer networks, stale, incomplete, or imperfect information, and service inter- actions make both turn-up as well as ongoing mainte- nance of configurations complex and error-prone pro- cesses. Moreover, the devices in the network and defi- nition of services they support change at a dizzying rate. New firmware versions, customer requirements, or sup- ported applications appear every day. Market demands further dictate that the time-to-market for new services is a critical driver of revenue: delays caused by tool config- uration, extension, or development can mean the differ- ence between profitability and loss. In short, there is an unserved need in provider networks for tools that address these complex and sometimes contradictory challenges while constructing service configurations. The PRESTO configuration management system de- velops network device configurations from composed collections of configlets that define the services to be supported by the target device. Written in our general- purpose hybrid scripting and configuration template lan- guage, the PRESTO system extracts specific informa- tion from external systems and databases and transforms this information into complete device configurations as directed by the PRESTO compiler. Extensive interac- tions with diverse engineering teams charged with man- aging operational IP networks led us to the conclusion that, to gain wide buy-in and adoption, the PRESTO language must adhere closely to the complex and often 2007 USENIX Annual Technical Conference USENIX Association 73

Transcript of Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System...

Page 1: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

Configuration Management at Massive Scale:System Design and Experience

William Enck, Patrick McDanielPennsylvania State University{enck,mcdaniel}@cse.psu.edu

Subhabrata Sen, Panagiotis Sebos, Sylke SpoerelAT&T Research

{sen,psebos}@research.att.com

[email protected]

Albert GreenbergMicrosoft Research

[email protected]

Sanjay RaoPurdue University

[email protected]

William AielloUniversity of British Columbia

[email protected]

AbstractThe development and maintenance of network deviceconfigurations is one of the central challenges faced bylarge network providers. Current network managementsystems fail to meet this challenge primarily because oftheir inability to adapt to rapidly evolving customer andprovider-network needs, and because of mismatches be-tween the conceptual models of the tools and the servicesthey must support. In this paper, we present the PRESTOconfiguration management system that attempts to ad-dress these failings in a comprehensive and flexible way.Developed for and deployed over the last 4 years withina large ISP network, PRESTO constructs device-nativeconfigurations based on the composition of configletsrepresenting different services or service options. Con-figlets are compiled by extracting and manipulating datafrom external systems as directed by the PRESTO con-figuration scripting and template language. We outlinethe configuration management needs of large-scale net-work providers, introduce the PRESTO system and con-figuration language, and demonstrate the use, workflows,and ultimately the platform’s flexibility via an exampleof VPN service. We conclude by considering future workand reflect on the operators’ experiences with PRESTO.

1 Introduction

Configuration management is among the largest cost-drivers in provider networks. Such costs are driven bythe immense complexity of the supported services andinfrastructure. For a large provider, thousands of en-terprises with diverse services and configurations mustbe seamlessly and reliably connected across huge geo-graphic areas and rapidly evolving networks. Moreover,the initial turn-up installation and subsequent support ofa single customer may span many organizations and sys-tems internal to the provider. The stakes for the sup-ported enterprise are extremely high: an outage may re-sult in loss of business, delays in “getting to revenue”

since service turn up precedes revenue, failure to meetcontractual obligations, or disruption of key organiza-tional workflows.

Given the complexity and stakes, it may be surpris-ing that the common configuration management prac-tice involves pervasive manual work or ad hoc script-ing. The reasons for this are multi-faceted. From aprovider perspective, every customer is in some waysunique. Though service orders have a lot in common,many new installations made to realize those orders rep-resent unique combinations of services and network con-figurations. Interactions with customer networks, stale,incomplete, or imperfect information, and service inter-actions make both turn-up as well as ongoing mainte-nance of configurations complex and error-prone pro-cesses. Moreover, the devices in the network and defi-nition of services they support change at a dizzying rate.New firmware versions, customer requirements, or sup-ported applications appear every day. Market demandsfurther dictate that the time-to-market for new services isa critical driver of revenue: delays caused by tool config-uration, extension, or development can mean the differ-ence between profitability and loss. In short, there is anunserved need in provider networks for tools that addressthese complex and sometimes contradictory challengeswhile constructing service configurations.

The PRESTO configuration management system de-velops network device configurations from composedcollections of configlets that define the services to besupported by the target device. Written in our general-purpose hybrid scripting and configuration template lan-guage, the PRESTO system extracts specific informa-tion from external systems and databases and transformsthis information into complete device configurations asdirected by the PRESTO compiler. Extensive interac-tions with diverse engineering teams charged with man-aging operational IP networks led us to the conclusionthat, to gain wide buy-in and adoption, the PRESTOlanguage must adhere closely to the complex and often

2007 USENIX Annual Technical ConferenceUSENIX Association 73

Page 2: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

low level configuration languages supported by networkdevice vendors (e.g., for Cisco devices, the IOS com-mand language). PRESTO empowers network designers,“power users” comfortable with native network deviceconfiguration languages, and automates the unambigu-ous translation of their design rules into precise networkconfigurations. Specifically, the PRESTO system gen-erates complete device-native configurations, which canthen be downloaded into a device and archived by net-work operators.

In this paper, we present the motivation, design, andworkflow of the the PRESTO configuration managementsystem. We outline the challenges faced by a large net-work provider in installing and maintaining millions ofdiverse devices for thousands of customers and orga-nizations, and reflect on the failures of past networkmanagement systems to address these needs. We out-line the PRESTO workflow and configuration languageand demonstrate its features through an extended exam-ple. Finally, we discuss future work and detail prelim-inary experiences in deploying services in operationalnetworks.

To date, PRESTO has concentrated on developing“greenfield” configs–configuration of new routers or ser-vices in a new installation. Such an approach avoidsthe inherent complexity of dealing with post-turn-up ma-nipulation of fielded configurations resulting from soft-ware updates, performance tuning, or other maintenance.PRESTO’s role in the long-term maintenance of routers,called “brownfield”configuration, is currently evolving.While the body of the following work focuses on green-field use, we revisit this latter objective and the chal-lenges therein in our concluding remarks.

The PRESTO system evolved out of decades of expe-rience in network management. Configuration manage-ment is about more than just getting routing and filter-ing correct. It must meld together many different ser-vices that exhibit subtle interactions and dependencies.Therein lies the challenge of configuration managementin a provider network—How do we glue together manyinformation sources of myriad organizations in real timeto build a functioning device configuration? Revisitedin the following section and throughout, it is the lessonsgleaned from our experiences in meeting that goal thatdrive the PRESTO design.

The remainder of the paper proceeds as follows: Sec-tion 2 discusses motivation and requirements; Section 3overviews the PRESTO work flow; Section 4 describesthe language extensions provided by PRESTO; Section 5incorporates these language extensions into a usable sys-tem, recognizing that input data is rarely pristine; Sec-tion 6 describes our experience using PRESTO for a realapplication within an enterprise network; Section 7 dis-cusses related work; and Section 8 concludes.

2 Configuration Automation

In this section, we discuss the need for automation by de-scribing current best practices and their limitations. Wethen describe the challenges an automated configurationgeneration system must face in large provider networks.

A router configuration file1 provides the detailed spec-ification of the router’s configuration, which in turn de-termines the router’s behavior. In essence, the config-uration file is a representation of the sequence of spe-cific commands that if typed through the command lineinterface determine the wide set of interdependent op-tions in router hardware and software configuration. Inpractice, this may represent thousands of lines of com-plex commands, per router. These configuration files aretext artifacts – described in a device specific commandlanguage, with a device specific syntax in a human andmachine readable format, in some cases in XML. It isworth noting that a plethora of network devices beyondrouters, e.g. Ethernet switches and firewalls, rely on con-figuration files of this type. To some degree, this state ofaffairs reflects natural technology evolution and the mar-ketplace – networks started (and often still start) smalland therefore often gravitated toward manual or (ad hoc)scripted configuration.

Today’s configuration languages offer myriad com-plex options, typically described in precise low leveldevice-specific languages, such as Cisco’s IOS commandlanguage [8]. While the learning curve for such lan-guages might be steep and the cost of inadequate learningsevere (small slipups may cause large network outages),these languages are in extremely wide use for the entirelifecycle of network management – starting with configu-ration, but encompassing all other aspects, including per-formance, fault and security management. The tack thatthe PRESTO system takes is to leverage these “native”languages, and empower the user of these languages toenforce the precise translation of design intent into de-tailed device configuration.

Our interactions with network designers revealed thatusing templates to describe design intent is essentiallyuniversal. That is, designers create parameterized chunksof design configurations to describe intent. Accordingly,PRESTO provides full and flexible support for templatecreation in the native device configuration language.

2.1 Need for AutomationDecades of experience in network management havetaught us that manual configuration practices are limitedin the following ways, i.e., the configuration process is:• costly, time-consuming, and unscalable: There is a sig-nificant initial investment in the interpretation and docu-mentation of network standards and device-specific in-

2007 USENIX Annual Technical Conference USENIX Association74

Page 3: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

terfaces in developing any new service or support for adevice. The result of that investment is a “model” con-figuration document (sometimes termed an Engineeringand Troubleshooting Guidelines (ETG) document) usedby enablers2 to manually configure each target device.Typically performed by a large network engineering or-ganization and depending on the complexity of the ser-vice or device, this process can take many person monthsof effort to complete and is an expensive process. Thesubsequent manual application of the model configura-tions to customer networks is also costly—some largecustomers may have tens of thousands of network ele-ments, and applying a new configuration to even a frac-tion of them verges on the intractable.

• prone to misinterpretation and error: Even under thebest of circumstances, engineering guidelines will not beperfect. Because network designers cannot anticipate allpossible target environments, the guidelines are neces-sarily ambiguous, sometimes imprecise, and often sub-ject to multiple interpretations. Thus, different enablersmay interpret the same rules differently or adopt differ-ent local conventions. Differences between interpreta-tions can and often do result in configuration mismatcherrors. Making matters worse, while some errors mightbe easier to detect, others might have no immediate ef-fect. These latter configuration problems are the mostvexing, as they may become manifest at periods of highload (possibly the worst possible time) or introduce un-detected security vulnerabilities.

• fraught with ambiguous, incorrect, changing or un-available input data: Configuration information is notonly spread across multiple data sources, but may be in-complete and imperfect. For example, customer orderdatabases may not reflect the latest needs of the customer,e.g., order updates may only exist as emails to humancontacts and may not quickly (or ever) be reflected in adatabase. As another example, information such as IPaddress assignments may be missing at the time of initialconfiguration. Finally, rules might be ambiguous. Wehave, for example, encountered examples where a partic-ular service mandated that a site may have dual routerswith ISDN backup, but it was not obvious which routermust be the backup and which the primary.

2.2 Requirements

The pervasive practices and technical organizationalproblems detailed above makes automation in providernetworks difficult. These issues lead to the following re-quirements, i.e., the configuration process must:

• Support existing configuration languages: While therehave been many prior strong efforts at automating config-uration generation, most have focused on developing ab-

stract languages or associated formalisms to specify con-figurations in a vendor neutral fashions, e.g., IETF stan-dard SNMP MIBs (Simple Network Management Proto-col, Management Information Bases) [5], and the Com-mon Information Model (CIM) [9]. These informationmodels define and organize configuration semantics fornetworking/computing equipment and services in a man-ner not bound to a particular manufacturer or implemen-tation. However, such generalized abstractions invari-ably introduce layers of interpretation between the spec-ification and device. Gaps open between general abstractspecifications and the concrete language of specific de-vice configuration. It is very difficult to avoid extendingor creating specialized common models to describe therealities of today’s rapidly evolving devices and services.The artifacts of efforts to create standards or libraries of-ten lag the marketplace. In truth, network experts oftendo not have the time or inclination to understand suchabstractions, and today nearly universally find that work-ing within native configuration interfaces is much moreefficient for initial installation and later maintenance.

• Scale with options, combinations, and infrastructure:Customer configurations are dependent on, in particular,selected service offerings, devices, firmware revisions,and local infrastructure. For example, consider a siteconnecting to a provider backbone. The seemingly sim-ple customer order has many options—does the customerrequire multiple routers to connect to the backbone orjust one? Should each have multiple links or one? Fur-ther, each router may have several WAN (Wide Area Net-work) and LAN (Local Area Network) facing interfaces,and each interface may admit specific naming conven-tions that depend on the router model and the WAN card.The physical local infrastructure (e.g., routers and net-work topologies), will often have major impact on theworkflow and content of the configuration.

• Support heterogeneous and diverse data sources:Putting together a router configuration involves collect-ing all necessary router configuration information. Suchconfiguration information may not be all available inone central repository at the time the information isneeded, but rather maybe distributed amongst a varietyof databases, which are populated by upstream work-flows. Take, for example, the customer order database.The customer information may itself have arrived at dif-ferent times, and may be split across various forms. Inlarge operational networks, information regarding cus-tomer orders and the resulting router deployment andmaintenance is potentially spread across various systemsspanning many internal organizations. An automatedconfiguration system has to be cognizant of the diver-sity of information sources (and quality of data). Impor-tantly, these data sources typically have their own persis-

2007 USENIX Annual Technical ConferenceUSENIX Association 75

Page 4: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

Batch request

of related

routers

(short lived)

Parse inputs and

merge with

supplementary data

Tabular

Supplementary

Data

(long lived)

Provisioning

Database

(short lived)

Template

Library

(long lived)

Provisioning

Generator

(per router)

Router

Configuration

Text Files

Application Specific

Data Model

Input

Output

Application Specific

Master Template

Figure 1: PRESTO Workflow

tent databases (each representing large investments), anda configuration management solution faces a huge real-world hurdle if it were to attempt to replace or replicatethese databases, or even add a new persistent databaserather than extend an already existing one. To the great-est extent possible, a configuration management systemshould strive to be stateless if it is to succeed in diverseoperational environments.

3 PRESTO Overview

Two kinds of users interact with PRESTO—domain ex-perts and provisioners. Domain experts initially cod-ify the configuration services and options in active tem-plates. These are the PRESTO equivalent of the ETG,where execution of the “active” template by the compilerdirects the interpretation of the service definition for aparticular environment. At installation time, provision-ers obtain necessary configuration information from cus-tomers and other sources, that, when combined with thetemplates written by the domain expert, produce the endrouter configuration.

A more complete description of the PRESTO work-flow is presented in Figure 1. In practice, deployment isa multi-stage process including requirements reconnais-sance, initial staging configuration creation, and deviceturn-up. PRESTO is concerned with the second part ofthe process, the creation of the configuration. The con-figuration creation process begins with a batch of newconfiguration requests resulting from a network upgraderequest or customer order. The requests are submittedto PRESTO as a collection of specification inputs, wherethe relevant environment data is provided as direct inputor extracted from supplementary inventory and configu-ration databases. The data is cleansed and projected intoa service or device (application-specific) data model cre-ated for the target configurations. Finally, an active tem-plate is executed for the set of specified target devicesresulting in complete device-native configurations.

Note that these active templates, called configlets, mayconditionally or deterministically import arbitrary othertemplates or extract data as needed at compile time tocomplete a configuration. It is this “composition by re-cursive import” and dynamic data extraction from exter-nal sources that allows PRESTO to deal with the poten-tially huge number of configuration options for a config-uration.

PRESTO adopts automation to the extent possible, butallows for human intervention where needed. The pro-cess of mining and preparing configuration informationis decoupled from the process of combining input datawith configlets to produce configurations. Hence, as dis-cussed further in Section 5.1, PRESTO can be viewed asa 2-step process, each of which is automated, but withmanual intervention in between the steps. The first stepinvolves tools that can, in an automated fashion, parsedisparate databases to extract necessary configuration in-formation. The information so produced may potentiallybe inspected by an enabler, overridden if necessary, andaugmented with manually supplied missing information.The cleansed information is then the input to the secondstage of the automated processing to produce the config-uration.

4 Active Template Language

The PRESTO template language lays out the foundationover which the rest of the system is built. PRESTO doesnot define a new language, but rather a language exten-sion. This allows domain experts to leverage knowledgeof flexible native configuration languages, e.g., CiscoIOS, while creating useful abstractions, defining ser-vices, which take the form of configlets in PRESTO.Interestingly, this is in direct opposition to traditionalnetwork management interfaces which provide a singleabstraction to which any policy would have to adhere.PRESTO provides the following key characteristics:

• Data Modeling: The data used to configure a networkdevice is derived from a complex calculus of data frommany diverse sources. Hence, dealing with this requiressome means of organizing and accessing the data. Weuse the most natural choice for this task: a relationaldatabase. The schema of that database, called a model,is specific to the services and devices to which it applies,i.e., each collection of configlets works in tandem with amodel defined by the domain experts.

• Rich Template language: A straightforward view oftemplates is that this merely involves direct substitutionof “variables” by user-supplied inputs. For example,an architect may insert a variable for IP address infor-mation, that is then supplied by inputs from the user.

2007 USENIX Annual Technical Conference USENIX Association76

Page 5: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

However, more complex operations and data manipula-tions are needed. For example, based on whether or nota router has a particular feature, the configuration mayneed to omit or include executable script code. Again, adevice may have one or more interfaces, and configura-tion blocks may need to be created for each (of possiblymany) interfaces. Moreover, the number of interfacesto be configured may only be available at run-time. Inshort, more sophisticated constructs than purely variablesubstitution are needed, e.g., variable expressions, con-ditionals, and loops.

• Support for Template Decomposition and Assembly:PRESTO provides support for the architect to write mul-tiple smaller templates targeted for very specific ele-ments of a configuration, i.e., services. There are sev-eral advantages to such an approach. First, supportingmultiple smaller templates simplifies creation and main-tenance. Second, this allows for templates to be writtenby multiple designers based on their expertise. This isanalogous to programming modularity, where each pro-grammer or group can be develop and maintain a partof the larger system. In the case of PRESTO, for exam-ple, a designer may better understand how to deal withvarious WAN interfaces, while another may better un-derstand issues with BGP configuration. Third, breakingtemplates down promotes reusability, as there is the po-tential to create template libraries, and reuse them acrossmultiple applications.

Before discussing specific details, we provide an ex-ample scenario to aid in the understanding of languageconstructs and design motivations. Consider the config-uration of a gateway router. The gateway connectionmay have one or more external connections. If thereare multiple connections, they may be dispersed acrossmultiple routers. Hence, these routers require config-uration knowledge of the other routers participating inthe gateway connection, e.g., IP addresses, to coordinatefailover, e.g., HSRP [17]. The template language mustsupport these relationships between connections on onerouter and between routers.

We now discuss each part of the template language inturn. After discussing the core language concepts, weintroduce additional language features that enable bettersoftware engineering practices.

4.1 Data ModelThe PRESTO template language revolves around thedata model. The templates require access to smalldata chunks describing router properties. Furthermore,multi-router relationships dictate a need to perform quicklookups for peer specific information. Such a capabil-ity is required for instance when configuring one router

(eg., a spoke) involves extracting information for anotherrouter (eg., the hub). A relational database providesjust this capability: router properties are stored in tablefields and accessed as variables; peer router propertiesare queried by specifying the router hostname. Hence,the data model becomes a database schema. We referto this database as the provisioning database and provi-sioning relational database schema where necessary toremove ambiguity.

The schema definition is application dependent. Eachapplication has different requirements on data accessibil-ity. It is no surprise that defining the schema is the mostdelicate part of applying PRESTO to a new application,hence complete flexibility is required. Despite the sup-ported flexibility, past experience has resulted in a fewrecommended guidelines. The data model should con-tain a ROUTER table indexed by a globally unique identi-fying value, e.g., the router hostname. One row will existfor each router in a provisioning request. The ROUTERtable should contain the bulk properties; however, when-ever multiple instances of a property occur, a new tableshould be created. For example, multiple LAN or WANinterfaces are semantically equivalent. Sub-router tablesshould use a multi-column primary key consisting of theROUTER table index and a unique identifier, e.g., inter-face number. Upon querying the database for all LANinterface records matching a specific router hostname,the template language produces an interface loop. Forexample; suppose LAN is a table holding all LAN-facinginputs. Then

SELECT * FROM LAN WHERE ROUTER=THIS ROUTER

selects each LAN-facing interface on a given router. It-eration specifics and syntax are discussed below.

4.2 Variable Evaluation

Variable substitution is integral to any template language;PRESTO is no exception. Variables are defined by thedata model. Templates gain access to variables by query-ing the provisioning database. The returned record de-fines a variable namespace, or context3, used to accessthe variable, e.g.:

<CONTEXT.VARIABLE>

Variables of this form are directly substituted in the tem-plate text.

Templates are written to produce a configuration filefor one router. PRESTO begins template evaluation byquerying the ROUTER table in the data model for the rowcorresponding to the current router. The returned recordpopulates the ROUTER context, which consequently al-lows templates to use ROUTER variables at any point.The template creates new contexts by making a new

2007 USENIX Annual Technical ConferenceUSENIX Association 77

Page 6: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

database query; however, those variables are only acces-sible within the defined context scope. When a query re-turns multiple records, the template code within the con-text is repeated, producing a loop. For example, SELECT* FROM LAN WHERE ROUTER=THIS ROUTER has the effectof configuring multiple LAN-facing interfaces.

4.3 IterationThe PRESTO template language simulates iteration byexecuting database queries that return multiple records.The template designer creates a data driven loop bydefining a new context name, an SQL-like query, and ascope. Each row returned by the query produces an iter-ation. For example:

[INT:SELECT (*) FROM (WAN_INTERFACE) WHERE(WAN_INTERFACE.HOSTNAME=<ROUTER.HOSTNAME>)]interface serial0/<INT.SLOT>/<INT.PORT>bandwidth <INT.BANDWIDTH>ip address <INT.IP> <INT.MASK>![/INT]

Here, INT is the name of the new context. The statementassociates the INT context with the record returned byquerying the WAN INTERFACE table of the provisioningdatabase for all fields ((*)) related to the current routerhostname (note the use of <ROUTER.HOSTNAME> inthe query). The text within the INT context scope, i.e.,all text between the query statement and the context clos-ing statement, [/INT], is repeated for each returnedrecord. Field names from each record are accessible asvariables within the context, as shown. Note that newcontext definitions can be arbitrarily nested, but theycannot define scopes spanning multiple parent scopes.That is, the nested context’s closing statement must oc-cur before its parent closing statement. This constraint isconsistent with loop structures in common programminglanguages.

4.4 Conditional LogicConfiguration statements are commonly dependent onrouter properties. For example, E1 (a standard widelyused in Europe) line cards required slightly different in-terface specification than T1 (a standard widely used inthe US) cards. The PRESTO template language supportsthe inclusion and omission of configuration options withconditional statements. All conditionals have a label,condition and scope, in general:

[COND LABEL CONDITION]... template text[/LABEL]

COND indicates a conditional statement; LABEL definesa label; and CONDITION contains relational operators

that dictate if the template text between the conditionstatement and the closing statement, [/LABEL], is in-cluded. The template text can contain static strings, newcontexts, or even more conditionals. The CONDITIONitself supports arbitrary complexity of Boolean logic.Statements can be simple:

("<ROUTER.HAS FEATURE X>" eq "YES")

or more complex logic:

(("<ROUTER.HAS FEATURE X>" eq "YES") &&(("<ROUTER.HAS FEATURE Y>" eq "YES") ||

("<ROUTER.FEATURE Z>" ne "BASIC")))

4.5 Data Transformation

Configuration statements commonly require a transfor-mation of an input variable. For example, an interface IPaddress may be specified as IP and mask, i.e., x.x.x.x/y,but the router configuration language requires the IP andmask coded separately, i.e., x.x.x.x z.z.z.z. In anothercase, the template designer may need to configure thenetwork address corresponding to the input value. Toaccommodate such requirements, the PRESTO templatelanguage provides a mechanism for arbitrary extension.

A function added to the language interpreter module isreferenced within a template as a context, variable, func-tion, and argument:

<CONTEXT.NEW VARIABLE:function(args)>

Upon execution, arguments are evaluated (if they arevariables) and passed to the function. The function per-forms a specific manipulation and returns the result to anew variable in the specified context. The new variable’svalue is inserted into the template text, and it’s value isretained for later use within the context.

To aid template design, the PRESTO template lan-guage contains a core set of application agnostic func-tions. Some functions provide generic computationabilities, e.g., calc() performs simple arithmetic,sbsstr() returns a substring specified by an offsetand length, and matchre() provides regular expres-sion substring matching. Other core utility functions per-form useful conversions on common network values suchas IP address. For example:

<INT.NETIP:computeOffsetMaskIP(<INT.IP>,0)>

computes the network address of an IP specified inx.x.x.x/y form. The function, however, can calculate anyoffset of the IP, a useful feature when network policy dic-tates devices on specific offsets, e.g., the gateway is com-monly .1. Realizing a new PRESTO function involvesincluding its code in the PRESTO language interpreter.

2007 USENIX Annual Technical Conference USENIX Association78

Page 7: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

4.6 Hidden Evaluation

Configuration policy occasionally requires values result-ing from complex computations. While additional do-main specific functions provide ample mechanism, tem-plate designers are encouraged to keep domain knowl-edge within the templates themselves. The motivationis twofold. First, this reduces bloat of the core lan-guage. Second, as function definitions require program-ming, and most template designers do not possess thenecessary skills, or are simply unwilling, to create newfunctions. Therefore, we have added only a small num-ber of generic primitive operations to the core languagein an application.

As described to this point, the template language is notconducive to performing complex computation withinthe templates themselves. All functions return text thatis inserted into the end router configuration. Multi-stepcomputations therefore become difficult, if not impossi-ble. To overcome this issue, the language supports hid-den evaluation:

[EVAL LABEL noprint]... template statements[/LABEL]

Statements within the LABEL scope produce no output.Computation within EVAL blocks is not limited to

simple multi-step functional transformations. In prac-tice, we leveraged the hidden evaluation interface to pro-vided a multitude of features. For example, databaseSELECT queries were used to lookup values in supple-mental data tables. Values were assigned to higher levelcontexts, e.g., ROUTER, and used throughout the tem-plate. The EVAL blocks also proved useful to determinevalues that depended on multiple conditionals. The con-ditional logic was performed once, and the value wasused many times thereafter.

4.7 Template Assembly

Managing one large template becomes unwieldy. Soft-ware engineering experience recommends modular code.Templates are no exception. Using many small tem-plates, or configlets provides many beneficial sideeffects. It allows a template designer to concentrate onone feature at a time. For example, a configlet can bewritten for each network access type used for the WANinterface of a router. Later, depending on the router pro-visioning data, the correct configlet is chosen. By in-cluding configlets on demand, complicated conditionallogic is avoided. Additionally, as configlets are only in-serted where applicable, they can be written with certainassumptions in mind. This reduces complexity withinthe configlets themselves. Finally, as configlets can in-

clude other configlets, the template designer can exploitcommonality between configlets.

PRESTO stores all configlets in a template library.Configlets can be included at any point. The languageprovides a special syntax for including configlets:

[INCLUDE FROM (FEATURE) WHERE(FEATURE.TYPE=SOME_TYPE)]

In our above example, the correct WAN interface con-figlet is included using the <ROUTER.ACCESS TYPE>variable:

[INCLUDE FROM (WAN) WHERE(WAN.ACCESS_TYPE=<ROUTER.ACCESS_TYPE>)]

4.8 Example Configlet

Once the data model and configlet organization are de-termined, writing the configlets themselves is straight-forward. We now provide a quick example to show howeach of the language primitives come together. Considera network topology where the Internet edge has two ac-cess lines, each connected to one router, and the tworouters establish load sharing of in and outbound traffic.The most complex configlet will define the WAN routingprotocol. Figure 2 provides an example.

The example begins by including the interface con-figlet defining the back to back connection between thetwo routers. Multiple connection types may be sup-ported. Instead of using a large conditional statementto pick the right interface definition, the INCLUDE state-ment allows the relational database to perform the con-ditional logic and simplify the code the domain expertmust specify.

The BGP block defines the WAN routing protocol con-figuration. Here, a SELECT queries for network ad-dresses specific to the peer router. The configlet alsoperforms a domain specific sanity check that warns theenabler if the two routers’ back to back IP address arein different networks. This check is placed within andEVAL block to keep warning text from leaking into theend configuration. Finally, the SELECT query is alsoused to determine information specific to the WAN con-nection. Here, the data model specifies that WAN in-terface specifics be placed in a separate table. The con-figlet selects the correct table row using the HOSTNAMEforeign key. The Cisco IOS network and neighborcommands require a translation of the available informa-tion, therefore the computeOffsetMaskIP() func-tion is used. In this case, the remote peer is always thesecond IP in the network, therefore the domain expert isable to code the neighbor’s IP directly using the offsetfunction.

2007 USENIX Annual Technical ConferenceUSENIX Association 79

Page 8: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

%% Configure Back-to-Back Interface to Peer Router[INCLUDE FROM (B2B) WHERE (B2B.TYPE=<ROUTER.B2B_TYPE>)]%% Define the BGP configurationrouter bgp <ROUTER.LOCAL_ASN>network <ROUTER.LOOPBACK0> mask 255.255.255.255[PEER:SELECT FROM (ROUTER) WHERE (ROUTER.HOSTNAME=<ROUTER.PEER>)]%% Ensure the peer is in the same network[EVAL B2B_CHECK noprint][COND WRONGNET ("<ROUTER.B2BNET:computeOffsetMaskIP(<ROUTER.B2B_IP>,0)>" \

ne "<PEER.B2BNET:computeOffsetMaskIP(<PEER.B2B_IP>,0)>" ) ]<ROUTER.WARNING:templateWarning(<ROUTER.HOSTNAME> and <PEER.HOSTNAME> different B2B Net)>[/WRONGNET][/B2B_CHECK]network <PEER.NETIP:computeOffsetMaskIP(<PEER.B2B_IP>,0)> mask 255.255.255.252neighbor <PEER.B2B_IP> remote-as <ROUTER.LOCAL_ASN>neighbor <PEER.B2B_IP> next-hop-self[/PEER][WAN:SELECT FROM (WAN) WHERE (WAN.HOSTNAME=<ROUTER.HOSTNAME>)]network <WAN.NETIP:computeOffsetMaskIP(<WAN.IP>,0)> mask <WAN.MASK:computeMask(<WAN.IP>)>%% The gateway is the second IP in the subnet (for this example)neighbor <WAN.GW:computeOffsetMaskIP(<WAN.IP>,1)> remote-as <WAN.REMOTE_ASN>[/WAN]no auto-summary!

Figure 2: Example configlet defining the WAN routing protocol of a two-line two-router configuration

5 The PRESTO SystemThe template language and data model provides a mech-anism to describe configuration policy; however, it mustbe incorporated into a usable system. In an ideal world,an engineer receives a request for a group of relatedrouters with all required input information available andcorrect. This would allow a straightforward applicationof the data model and templates to act upon inputs. Asshown in Figure 1, the per-request input data is parsedand merged with tabular supplementary rules to create aone time database. Then, for each router in the request,a master active template is executed by the provisioninggenerator. This master template includes and executesappropriate configlets according to the input data, result-ing in a completed configuration text file.

Unfortunately, the information required to configurea router is not always readily available. In large opera-tional networks, the input data for the configuration taskspans the outputs of multiple upstream workflows, whichmay arrive at different points in time. It is therefore im-portant to be able to work with such partial informationflows and to able to handle any inconsistencies across theflows. In such a scenario, ubiquitous flow-through or fullautomation becomes extremely difficult to realize. Ac-cordingly, PRESTO provides hooks to overcome thesedifficulties, when and where they arise.

5.1 PRESTO ArchitecturePRESTO achieves nearly full automation using a 2-stepprocess, see Figure 3. The goal of automation is to mini-mize user actions. PRESTO minimizes manual processesin two ways. First, it handles bulk requests. This stream-

lines the process of creating the initial router configura-tion code. Second, it requires only one point of user in-teraction at which point users provide the most minimaleffort to allow automation to complete. In PRESTO, dataprocessing proceeds in two steps, with user integrationcapabilities made available between step 1 and step 2.

Specifically, PRESTO uses a 2-step architecture to re-quest user input at the most ideal moment. The processbegins with the submission of a batch request to step 1.Step 1 pulls together and parses information from avail-able input data sources for the batch request. The outputof step 1 is the complete and unambiguous informationneeded to configure all routers in the batch. The role ofstep 1 then is to normalize and tabulate the input infor-mation and, if possible, apply defaults or inference rulesto fill in missing information. Where defaults or infer-ence are applied, the step 1 output will flag or annotatethe output, for (optional or mandatory) user inspection.In practice, we found that inference rules include manytypes of calculations, for example, ranging from assign-ing incremental BGP AS prepend values to selecting in-terface ports for network connections. The result is pre-sented to the user for validation in tabular form. Theuser is then given the option to change certain of the datato meet customer requirements (which we found some-times change faster than the ordering information sys-tems can be updated), e.g., changing interface cards, andthe batch request is submitted to step 2. Step 2 executesthe PRESTO engine described in Figure 1, combiningcomplete input information with templated policy infor-mation as described above, to produce the configurationfile for each router in the batch request. By dividing the

2007 USENIX Annual Technical Conference USENIX Association80

Page 9: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

Batch request

of related

routers

(short lived)

Parse inputs and

merge with

supplementary data

Tabular

Supplementary

Data

(long lived)

Provisioning

Database

(short lived)

Template

Library

(long lived)

Provisioning

Generator

(per router)

Router

Configuration

Text Files

Application Specific

Data Model

Input

Output

Application Specific

Master Template

Parse inputs and

apply default rules

Application Specific

Default Rules

Step 1

Step 2

Application Specific

Data Model

User corrects

and adds

missing

information

Figure 3: 2-step PRESTO Data Flow

processing in these two steps, we isolate fallout due toinadequate information to step 1, and provide the systemand its users opportunities to repair fallout and resumeautomation before proceeding to step 2, which then pro-duces the desired output.

PRESTO was designed to be input agnostic, and hasbeen adapted to be invoked in a number of differentmodalities (via web services, via database invocations,or via stand-alone interfaces), and to operate on inputsof various forms and origins (database extracts, Excelspreadsheets, XML forms). PRESTO is essentially state-less. By design, the persistent data in PRESTO is limitedto a a repository of configuration policies; the persistentdatabase of associated router configurations is external toPRESTO.

Accordingly, PRESTO applications are built on a pro-visioning data model, describing short-lived data, and apolicy data model, describing long lived data (see Fig-ure 3). We do not have space here to describe the detailsor the precise representations of these data models, andso shall describe salient features. Short lived inputs arelimited to those specific to a given batch request and con-tain information used to populate the active templates.Long lived inputs contain configuration policy. These in-puts exist in a variety of forms, including the default rulesand data models used in both steps 1 and 2. The policydata model captures rarely changing information, suchas the number of ports on an line access card, as well asstable configuration parameters, e.g., domain wide net-work access lists. The policy data model also encom-passes the library of templates for configlets and wholeconfigurations. As noted above, the templates describethe comprehensive configuration policy logic in the na-tive device configuration language. Typically, new tem-plates are released after significant scrutiny and test on arelease schedule (albeit at a rapid pace), accompanied inparallel with customer or user notification – as template

change leads to change in network and service behavior.All data models eventually require maintenance, in-

cluding the policy data model. PRESTO simplifies main-taining tabular supplementary data on hardware configu-ration rules, and PRESTO language templates by stor-ing both in a database. We found that tabulating hard-ware configuration data only allows for easy additions ofnew hardware options, but it also allows non-technicaldomain experts to update tables by maintaining and sub-mitting corresponding spreadsheets. As noted in Sec-tion 4, the active templates are also broken down intoconfiglets (sub-templates). The configlet concept allowslogical components to be easily updated without affect-ing the entire library.

5.2 Validating Step-2 InputsStep 1 cleans up and normalizes short lived input data.PRESTO provides an interface between steps 1 and 2to allows users to update and validate values to ensureall required data is available and correct. Users, how-ever, can make mistakes. Therefore, step 2 must per-form sanity checks; both syntactic and semantic checksare required. Syntax checks occur in the parsing phaseand test for data formats, e.g., an IP addresses are valid“dotted quads,” of the form x.x.x.x/y. Factoring parsingchecks to the parsing phase of step 2 simplifies the sys-tem and improves template readability – as the templatesare written under the assumption that domain agnosticinputs such as IP addresses are syntactically correct. Se-mantic checks are more implemented naturally in theconfiglets within the PRESTO language, as these checksare domain or application specific, e.g., a check may as-sure that an IP address is neither a network or broadcastaddress. To support these various forms of checks, thePRESTO language provides capabilities to emit errorsand warnings, via functions templateError() andtemplateWarning(), each taking a string argument.

5.3 Handling Unknown InformationWe found warning messages to be of remarkable utility,as PRESTO users insisted that the process of generationrouter configuration files proceed in spite of missing in-formation. Users often preferred to obtain configurationswith warnings that some information might be missing orinferred, rather than obtaining just an error message. Asthis eventuality may appear surprising given PRESTO’stwo step architecture, some explanation may be in order.Routers for a given project or customer are not alwaysordered or provisioned at the same time. As routers arerelated to one another through their configuration, situa-tions sometimes arise where information needed to con-figure one set of routers may depend on the ordering and

2007 USENIX Annual Technical ConferenceUSENIX Association 81

Page 10: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

configuration of a second set of routers, and the infor-mation flow for this second set may be missing at thetime that PRESTO is invoked to provision the first set.In other scenarios, essential parameters such as IP ad-dresses which must be written into the configuration maybe unknown at the time of initial configuration genera-tion.

PRESTO accommodates these scenarios by allowingthe active templates to perform a sort of due diligence.Active templates perform conditional checks to see if allnon-mandatory inputs are available. To allow data avail-ability tests of values that drive iteration or looping con-structs, the PRESTO template language was extended tosupport query checking by allowing the standard SQLsyntax count(*) as COUNT. Using such a query, thenew context has access to a COUNT variable, on whichconditional logic is performed, allowing for the detectionof missing information. Where these non-mandatory in-puts are missing, configuration lines are still generatedwith dummy or inferred values, but language specificcomments ensure the produced code is still of some util-ity (e.g., the router will boot and provide basic connec-tivity), leaving to automation downstream of PRESTO tocomplete the configuration task.

5.4 ImplementationThe core PRESTO system was implemented in approxi-mately 3,000 lines of Perl code. The code is divided intotwo modules, PROVGEN.pm and PROVDB.pm, whichimplement the language interpreter and database inter-face, respectively. To accommodate a new application,we found the application specific adapters to deal withstep 1 inputs can easily grow to thousands of lines. For-tunately, however, many identical components or parsingpatterns can be easily ported from one application to an-other.

6 Experience with Real Services

The development of the PRESTO system has benefitedfrom the insights of network designers and engineersresponsible for configuring network elements for largecommercial connectivity services. A key measure of thevalue of such a tool ultimately is how useful and us-able it is in practice for this target user community. ThePRESTO system is currently being used to automate con-figuration generation for a number of different commer-cial network services. In addition to the clear pressingbenefits of a successful configuration tool for networkmanagement, such an exercise is important for the fol-lowing reasons:

• It helps us understand the type and amount of effort

Figure 4: Provider VPN

needed in the process of taking the PRESTO tooland customizing it for a new service.

• It allows us to evaluate and if required evolve thedesign of the tool in the context of a real world ap-plication and its particular requirements.

• It provides valuable feedback for improving thetool.

In this section we present our experiences with,and lessons learned from automating configuration forprovider-based VPNs via PRESTO. We first describe theservice, then outline our experiences with customizingPRESTO for this service.

6.1 Customer edge router configurationfor provider VPNs

Enterprise networks are increasingly using provider-based Virtual Private Networks (VPNs) to connect ge-ographically disparate sites. In such a VPN, each cus-tomer site has one or more customer edge (CE) routerconnecting to a one or more provider edge (PE) routersin the provider network (see Figure 4). Incoming cus-tomer traffic from a CE is encapsulated at the PE and car-ried across the provider network, decapsulated by a re-mote provider edge router and handed off to the customerCE router at a remote site of the same customer. Traf-fic belonging to different customers is segregated in theprovider backbone, and the provider network is opaqueto the customer. The predominant method for supportingprovider-based VPNs uses MPLS as the encapsulatingtechnology across the provider backbone.

From a provider perspective, a critical part of sup-porting the VPN service involves configuring the the CErouters. The tasks include configuring ACLs, interfaces,WAN (Wide Area Network) and LAN (Local Area Net-work) links and routing (e.g., OSPF and BGP) . The keychallenges pertain to heterogeneity and scale. VPN ser-vices enjoy a large and growing number of customers. Asingle customer can have hundreds to thousands of dif-ferent sites. There are a wide range of features and op-tions a customer can select that impact the configuration.

2007 USENIX Annual Technical Conference USENIX Association82

Page 11: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

There are different hardware elements (router models, in-terface cards), different access type options for the CE-PE connection (e.g., Frame Relay and ATM), differentinterface speed options, and so on. Customers can opt fora range of resiliency options for each of their sites, wherethe resiliency option determines the number of routers atthe site, the number of distinct links from the site towardsthe provider network, whether there is a last-resort dialbackup to be used if the data service fails, etc. An exam-ple resiliency option is 2 CEs with 2 different links to theprovider network running in load sharing mode. In addi-tion to being already large, the features and options alsochange as the service offering improves and evolves. Forinstance, newer router models and line cards are beingconstantly added to the list of supported options.

The CE configuration task embodies many of the chal-lenges outlined in Section 2. VPN services involve fastgrowing demand, a large and increasing volume of routerorders to be configured, a huge space of feature combina-tions, and the need to support a steadily increasing slewof new features. All these factors make the overheadswith a manual-intensive configuration workflow unac-ceptably high, and make these services prime candidatesfor PRESTO automation.

6.2 A PRESTO tool for CE configurationDeveloping a PRESTO-based CE configuration tool in-volved knowledge engineering (codifying expert knowl-edge, initially only partially documented) and data mod-eling (identifying service-specific information and busi-ness rules), as well as front end user interface and backend PRESTO template development. The main tasks in-volved were:

• identifying all the service-specific information re-quired for building the CE configs, and develop-ing the resulting VPN-specific configuration datamodel.

• understanding the workflow surrounding the provi-sioning process and available data sources and de-termining how the information in the above datamodel can be extracted.

• collating the service-specific provisioning rules andbuilding the service-specific templates based on en-gineering guidelines from the service designers.

• defining the workflow of the PRESTO-based tooland developing service-specific code around thecore service-agnostic PRESTO system.

Recall that PRESTO requires an application to definetwo types of data models—a provisioning data model for

short lived data, and a policy data model for supplemen-tal data and templates. The provisioning data model pro-vides a normalized repository of data specific to the cur-rent router request set. The VPN instantiation of the pro-visioning data model placed as many fields as possible ina central ROUTER table indexed by the router hostname.This contained router specific information such as modelnumber, software version, and available customer infor-mation. Whenever multiple instances of any type of datawas required to build template iterations, a new table wascreated – that is the data model was highly normalized,as replication has risk in provisioning tasks. For VPN,this led to the creation of tables for WAN interfaces, dialbackup information, and the logical interfaces that de-fine VPN connections. In total, the resulting provision-ing data model consisted of one main ROUTER table andnine secondary tables, each containing foreign keys toreference the ROUTER table.

The longer lived policy data model was split into sup-plemental data and template data sub-models. Supple-mental data consisted mainly of data already naturallyexpressed in tabular form, e.g., mappings from cardnames to interface type and number of ports, and map-pings from strings describing interface speeds to the ac-tual value to code in the configuration. The templatedata model proved much more interesting, as it containedthe configlets used to create the actual router configura-tion. Configlets were grouped by logical features, specif-ically BASE, LAN, WAN, RESILIENCY, DAC, and B2B.as follows. The BASE table consisted of the configletrequired for all routers, e.g., hostname, password, loop-back, and motd commands. The LAN and WAN tablescontained configlets for types of interfaces, e.g., framerelay and ATM interfaces. The RESILIENCY table con-tained configlets defining the different resiliency optionsrequired by the VPN service. The DAC table containedconfiglets specific to various parts of the dial backup con-figuration. The B2B table defined special interface def-initions used where CE routers are organized in back toback configurations. Defining these new feature tablesas primitives or building blocks allowed specialized con-figlets to be easily composed and promoted knowledgeand code reuse. A total of 44 configlets containing 5414lines of statements were created.

6.3 Benefits and Experiences

Various existing processes such as accounting, customerservice, provisioning, and network management interactwith configuration management. For the PRESTO toolto be practically usable, it was critical for it to operatewithin the confines of the surrounding existing config-uration management processes and systems. This re-quirement to operate in pre-existing existing system and

2007 USENIX Annual Technical ConferenceUSENIX Association 83

Page 12: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

tool environments significantly impacted the PRESTOdesign, and the extent to which the configuration genera-tion could be automated. Indeed, the 2-step PRESTO ar-chitecture and the accommodations for potential humanintervention/oversight between the two Steps were keydesign elements that resulted from this requirement.

We next revisit the requirements criteria introduced inSection 2.2 and discuss to what extent the PRESTO real-ization achieves those.• Support existing configuration languages: The

PRESTO template language achieved this goal com-pletely. A PRESTO template consists of active templatecode and configuration statements in an existing config-uration language. The template language was used to ex-tract the particular configuration context, determine thecontrol of flow in a configlet, specify rules for combiningconfiglets, and achieve variable substitution and func-tional substitution. However, the actual configlet wasspecified in the configuration language that network en-gineers are familiar with, e.g., Cicso CLI (Figure 2).• Scale with options, combinations, and infrastruc-

ture: Several aspects of the PRESTO design made it pos-sible to write a configlet once and reuse it for many dif-ferent configuration scenarios. These included the capa-bility to dynamically extract data as needed, the approachof writing small configlets for specific features of a con-figuration, and the support provided for combining con-figlets together deterministically or conditionally. Reuseof configlets was critical in ensuring that the authoringof templates for the VPN service was tractable, despitethe large number of feature combinations in this service.Our experience demonstrated that the effort required toauthor templates the first time was acceptable – any sig-nificant lags were attributable to legitimate debate on thenuances of the precise design intent and associated routercapabilities. The incremental effort in updating the sys-tem to handle new features was also low. Any updatesto the supplementary data such as a new interface cardwere realized by simply updating the relevant table inthe database, without any additional coding effort. Forsupporting a new router model, we were able to reuseall the existing templates for common features, and onlyneeded to write templates for features that were not yetcovered (e.g., a new interface type) or that were routermodel specific (e.g., rules for numbering interfaces). Infact, the ability to reuse existing templates has proven tobe a key strength of the PRESTO approach.• Support heterogeneous and diverse data sources: A

key task involved modeling the information needed forservice configuration and determining how that informa-tion could be obtained from existing data sources. Forthe VPN service, there were multiple sources of input in-formation: (i) a customer order document that containeddetails about a customers request for the service, such

as a list of sites, and the selection of choices for thatsite (as listed above, the choices included the number ofrouters, router models, number of access links, accesstypes, and resiliency option); (ii) other documents thatlisted required auxiliary information, e.g., a list of thesupported router models and cards and the type and num-ber of interfaces on them, (iii) engineering policy docu-ments that specified the configuration rules for all com-binations of customer order options. While a large sub-set of the required information could be directly gleanedfrom the various data sources described before, therewere other important information pieces that for differ-ent reasons could not be pulled automatically from anexternal source. Some of this information required ap-plication of service-specific business rules and compu-tations to available input data, while other informationneeded to be manually assigned. We found that the 2-step PRESTO architecture (see Section 5.1) was wellsuited to handle this data-heterogeneous environment. Inaddition to getting available information from a varietyof input sources, Step 1 marshalled additional requiredvalues and choices in the data model by applying service-specific rules to the available input information. The re-sulting partially populated provisioning data model wasexposed to the user at the end of Step 1 for validationand for filling in missing values. Step 2 of the systemthen successfully drove the task of actually creating theCE configurations.

One measure of the benefit an automation tool is thereduction in the amount of human mediated effort. Whilean exact apple to apple comparison is not easy, we com-pared the traditional VPN CE configuration workflow tothe PRESTO workflow. In the traditional manual config-uration workflow, engineers proceeded through a man-ual time consuming process where they collated the dif-ferent data sources, ran complex computations to de-rive additional necessary information, and then navigatedthe complex options in the customer order to create therouter configuration. In a customer order with manysites and routers, the process had to be repeated manytimes, once for each router. If dependencies existed be-tween multiple routers, engineers had also to be carefulto reflect the dependencies and build consistent configu-rations. For instance, if 2 routers had a connection be-tween them, the configuration of the interfaces on bothsides should be consistent. In contrast, the PRESTO toolfor CE configuration has a significantly more automatedworkflow, where:

1. The configuration engineer uploaded the customerorder, and begins executing the first step ofPRESTO.

2. PRESTO created a document at the end of the firststep that contained for each router in the customer

2007 USENIX Annual Technical Conference USENIX Association84

Page 13: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

order a list of all the information fields needed forconfiguring the router. Of the total of 64 rele-vant fields, about 42 were auto-populated with val-ues parsed directly from the various input sources,and another 14 were auto-populated by applyingdefault engineering rules coded into PRESTO. Asmall number of fields (around eight), had to bemanually filled in.

3. The engineer reviewed this document, filled in themissing field values, overrode auto-populated val-ues if required, and initiated the execution of thesecond step of PRESTO.

4. The tool then created and returned the configura-tions for all the routers in the customer order.

The manual effort with PRESTO was reduced to fillingin a small number of values per router, reviewing auto-populated fields, and sometimes overriding them. Feed-back from user trials indicated that engineers found theapproach of auto-computing, and auto-populating fieldvalues based on default rules to be very useful, eventhough manual intervention was sometimes needed tooverride the values.

The PRESTO CE configuration tool was put throughuser trials involving engineers from across the world.This helped uncover and normalize certain configurationrules that showed regional variations under the manualprocess, reinforcing the need for the PRESTO tool. Onelesson from the user trials was that the system must notonly create correct configurations, but also must supporta streamlined and sparse user interface. Though we donot describe the details here, designing a suitable user in-terface proved non-trivial, and required several iterationsbefore it passed the acceptance threshold of users.

7 Related WorkSeveral industrial products (for example, [6, 16, 20,21, 7]) have emerged that offer support for configu-ration management. Many of these efforts have fo-cused on developing abstract languages to specify con-figurations in a vendor neutral fashion, e.g., IETF stan-dard SNMP5D MIBs [5], and the Common Informa-tion Model (CIM) [9]. These information models de-fine and organize configuration semantics for network-ing/computing equipment and services in a manner notbound to a particular manufacturer or implementation.An example of the success of such an approach is theDSL Forum’s TR-069 effort for DSL router configura-tion [10]. Yet, general router configuration via this ap-proach is challenging given rapid technology evolution,forces driving network operators and vendors towards

competitive and differentiated advantage, feature prolif-eration, and the need to continuously expand networksand features while maintaining backwards compatibility.

Boehm et. al. [3] present a system that raises theabstraction level at which routing policies are specifiedfrom individual BGP statements to a network-wide rout-ing policy. The system includes support to generate theappropriate pieces of router configuration for all routersin the network. An approach to automated provision-ing of BGP-speaking customers is discussed in [15].These efforts focus on BGP, just one component of routerconfiguration. Narain [18] seeks to bridge the gap be-tween end-to-end network service requirements, and de-tailed component configurations, by formal specifica-tion of network service requirements. Such specificationcould aid synthesis of router configurations. In contrastto these efforts, our focus in PRESTO is on the synthesisof complete, precise, and diverse network configurationsthat are readily deployable.

Several initiatives have explored configuration man-agement systems for desktop, and server environ-ments [1, 4, 2, 12]. Networked and router environ-ments often involve more complex options and inter-dependencies than desktop environments, and these so-lutions do not directly apply. That said, there is muchpotential benefit from cross-fertilization between thesedomains. Further, many of these works have emphasizeddeployment of configurations, and have placed relativelylittle effort on deciding what the configuration of a nodeshould be [2].

While the focus of PRESTO is the synthesis of config-uration files, others have looked at important orthogonalissues related to configuration management. The Net-work Configuration Protocol (NETCONF) [19, 11] effortprovides mechanisms to install, manipulate, and deletethe configuration of network devices. The NESTORproject [22] seeks to simplify configuration managementtasks which requires changes in multiple interdependentelements at different network layers by avoiding incon-sistent configuration states among elements, and facili-tating undo of configuration changes to recover an op-erational state. Others [13, 14] have looked at detailedmodeling and detection of errors in deployed configura-tions.

8 ConclusionsThe PRESTO system presented throughout represents astep toward realistic automation of massive scale config-uration management. While its genesis was mandated bythe specific needs of a single provider, the approach andinsights are universal. Central to the success of PRESTOare the satisfied mandates for the treatment of complexand evolving service definitions and customer require-ments, dealing with the hugely diverse and sometimes

2007 USENIX Annual Technical ConferenceUSENIX Association 85

Page 14: Configuration Management at Massive Scale: …...Configuration Management at Massive Scale: System Design and Experience William Enck, Patrick McDaniel Pennsylvania State University

unreliable data sources, and communication within thelingua franca of its user community.

PRESTO attempts to balance these requirements byproviding malleable and composable configlets that en-code configuration business logic directly in the targetlanguage. Integration of external data sources is per-formed by simple code embedded in templates and spe-cialized database query interfaces. This approach pro-vides network engineers with rigorous tools to clearlydefine the workflow and content of service configurationwhile maintaining the flexibility of enablers to manipu-late the resulting configurations to suit the customer orenvironment in which they will be used. Additionally,configlets are not just applicable for routers; other de-vices with text-based configuration can also benefit.

Our experiences with PRESTO in the VPN and otherservices are promising. We learned much that led us toadjust the language and the way it is used in practice, butalso confirmed that PRESTO approach is viable. How-ever, we found one of the greatest challenges of automat-ing router configuration at a massive scale is the abilityto gather and reconcile necessary input data.

Our future work extends PRESTO in two key direc-tions. First, we are deploying the tool in a wider range ofservices. Our goal here is to demonstrate that much of thePRESTO design is general, and new services supportedwith relatively little effort. The second major initiativeis to evaluate PRESTO as a platform for “brownfield”configuration in full generality, where updates may bemade by a set of systems and tools, with PRESTO amongthis set. Such tools tackle the more complex problemof projecting a configuration into a live system withoutnegative consequences, e.g., causing performance, secu-rity, or connectivity problems. These efforts may requiresignificantly more intelligence than the smart templatescurrently supported, and may require the introduction oftechniques that reason about the consequences of con-figuration with respect to the services that are alreadypresent. It is through these efforts that providers will be-gin to ease the burden of costly and error-prone configu-ration management.

Notes1The specification may in fact be split across more than one file, or

modality of description of the command set2Enablers are the personnel who implement a given service, either

staff on-site or within a provider’s Network Operations Center.3Note, this use of context is different than in context-based evalua-

tion of programming language literature.

References[1] P. Anderson. Towards a high-level machine configuration

system. In Proc. of the 8th Large Installations SystemsAdministration (LISA) Conference, 1994.

[2] P. Anderson and E. Smith. Configuration tools: Work-ing together. In Proc. of the Large Installations SystemsAdministration (LISA) Conference, December 2005.

[3] H. Boehm, A. Feldmann, O. Maennel, C. Reiser, andR. Volk. Network-wide inter-domain routing policies:Design and realization. April 2005.

[4] M. Burgess. Cfengine: a site configuration engine. InUSENIX Computing systems, Vol 8, No. 3, 1995.

[5] J. Case, M. Fedor, M. Schoffstall, and J. Davin. A simplenetwork management protocol (snmp). http://www.ietf.org/rfc/rfc1157.txt, May 1990.

[6] Cisco IP solution center. http://www.cisco.com/en/US/products/sw/netmgtsw/ps4748/index.html.

[7] Cisco Systems Inc. Cisco works small net-work management solution version 1.5. http://www.cisco.com/warp/public/cc/pd/wr2k/prodlit/snms_ov.pdf, 2003.

[8] Cisco Systems, Inc. Cisco IOS Configuration Fundamen-tals Command Reference, 2006. Release 12.4.

[9] Distributed Management Task Force, Inc. http://www.dmtf.org.

[10] DSL forum TR-069. http://www.dslforum.org/aboutdsl/tr_table.html.

[11] R. Enns. NETCONF configuration protocol.http://www.ietf.org/internet-drafts/draft-ietf-netconf-prot-12.txt, February2006.

[12] N. Desai et al. A case study in configuration managementtool deployment. In Proc. of the Large Installations Sys-tems Administration (LISA) Conference, December 2005.

[13] N. Feamster and H. Balakrishnan. Detecting BGP con-figuration faults with static analysis. In Proc. of the 2ndSymposium on Networked Systems Design and Implemen-tation (NSDI), May 2005.

[14] A. Feldmann and J. Rexford. IP network configuration forintradomain traffic engineering. In IEEE Network Maga-zine, September 2001.

[15] J. Gottlieb, A. Greenberg, J. Rexford, and Jia Wang. Au-tomated provisioning of BGP customers. In IEEE Net-work Magazine, December 2003.

[16] Intelliden. http://www.intelliden.com/.[17] T. Li, B. Cole, P. Morton, and D. Li. RFC 2281, Cisco

hot standby router protocol (HSRP). Internet EngineeringTask Force, March 1998. http://www.ietf.org/rfc/rfc2281.txt.

[18] S. Narain. Network configuration management via modelfinding. In Proc. of the Large Installations Systems Ad-ministration (LISA) Conference, December 2005.

[19] Network configuration (netconf). http://www.ietf.org/html.charters/netconf-charter.html.

[20] Opsware. http://www.opsware.com/.[21] Voyence. http://www.voyence.com/.[22] Yechiam Yemini, Alexander Konstantinou, and Danilo

Florissi. NESTOR: An architecture for network self-management and organization. IEEE Journal on SelectedAreas in Communications, 18(5):758–766, May 2000.

2007 USENIX Annual Technical Conference USENIX Association86