
The Journal of Information Technology Management

Cutter IT Journal

Vol. 24, No. 8, August 2011

Devops: A Software Revolution in the Making?

Opening Statement by Patrick Debois . . . . . 3

Why Enterprises Must Adopt Devops to Enable Continuous Delivery by Jez Humble and Joanne Molesky . . . . . 6

Devops at Advance Internet: How We Got in the Door by Eric Shamow . . . . . 13

The Business Case for Devops: A Five-Year Retrospective by Lawrence Fitzpatrick and Michael Dillon . . . . . 19

Next-Generation Process Integration: CMMI and ITIL Do Devops by Bill Phifer . . . . . 28

Devops: So You Say You Want a Revolution? by Dominica DeGrandis . . . . . 34

At Arm’s Length

Automating the delivery process reduces the time development and operations have to spend together. This means that when dev and ops must interact, they can focus on the issues that really matter.

Arm in Arm

Development and operations often fight like cats and dogs during deployment. To stop the fighting, we must bring them closer together so they will start collaborating again.

“Some people get stuck on the word ‘devops,’ thinking that it’s just about development and operations working together. Systems thinking advises us to optimize the whole; therefore devops must apply to the whole organization, not only the part between development and operations.”

— Patrick Debois, Guest Editor


Cutter IT Journal®

Cutter Business Technology Council: Rob Austin, Ron Blitstein, Christine Davis, Tom DeMarco, Lynne Ellyn, Israel Gat, Tim Lister, Lou Mazzucchelli, Ken Orr, and Robert D. Scott

Editor Emeritus: Ed Yourdon; Publisher: Karen Fine Coburn; Group Publisher: Chris Generali; Managing Editor: Karen Pasley; Production Editor: Linda M. Dias; Client Services: [email protected]

Cutter IT Journal® is published 12 times a year by Cutter Information LLC, 37 Broadway, Suite 1, Arlington, MA 02474-5552, USA (Tel: +1 781 648 8700; Fax: +1 781 648 8707; Email: [email protected]; Website: www.cutter.com). Print ISSN: 1522-7383; online/electronic ISSN: 1554-5946.

©2011 by Cutter Information LLC. All rights reserved. Cutter IT Journal® is a trademark of Cutter Information LLC. No material in this publication may be reproduced, eaten, or distributed without written permission from the publisher. Unauthorized reproduction in any form, including photocopying, downloading electronic copies, posting on the Internet, image scanning, and faxing is against the law. Reprints make an excellent training tool. For information about reprints and/or back issues of Cutter Consortium publications, call +1 781 648 8700 or email [email protected].

Subscription rates are US $485 a year in North America, US $585 elsewhere, payable to Cutter Information LLC. Reprints, bulk purchases, past issues, and multiple subscription and site license rates are available on request.

Part of Cutter Consortium’s mission is to foster debate and dialogue on the business technology issues challenging enterprises today, helping organizations leverage IT for competitive advantage and business success. Cutter’s philosophy is that most of the issues that managers face are complex enough to merit examination that goes beyond simple pronouncements. Founded in 1987 as American Programmer by Ed Yourdon, Cutter IT Journal is one of Cutter’s key venues for debate.

The monthly Cutter IT Journal and its companion Cutter IT Advisor offer a variety of perspectives on the issues you’re dealing with today. Armed with opinion, data, and advice, you’ll be able to make the best decisions, employ the best practices, and choose the right strategies for your organization.

Unlike academic journals, Cutter IT Journal doesn’t water down or delay its coverage of timely issues with lengthy peer reviews. Each month, our expert Guest Editor delivers articles by internationally known IT practitioners that include case studies, research findings, and experience-based opinion on the IT topics enterprises face today — not issues you were dealing with six months ago, or those that are so esoteric you might not ever need to learn from others’ experiences. No other journal brings together so many cutting-edge thinkers or lets them speak so bluntly.

Cutter IT Journal subscribers consider the Journal a “consultancy in print” and liken each month’s issue to the impassioned debates they participate in at the end of a day at a conference.

Every facet of IT — application integration, security, portfolio management, and testing, to name a few — plays a role in the success or failure of your organization’s IT efforts. Only Cutter IT Journal and Cutter IT Advisor deliver a comprehensive treatment of these critical issues and help you make informed decisions about the strategies that can improve IT’s performance.

Cutter IT Journal is unique in that it is written by IT professionals — people like you who face the same challenges and are under the same pressures to get the job done. The Journal brings you frank, honest accounts of what works, what doesn’t, and why.

Put your IT concerns in a business context. Discover the best ways to pitch new ideas to executive management. Ensure the success of your IT organization in an economy that encourages outsourcing and intense international competition. Avoid the common pitfalls and work smarter while under tighter constraints. You’ll learn how to do all this and more when you subscribe to Cutter IT Journal.



Opening Statement


Despite all the great methodologies we have in IT, delivering a project to production still feels like going to war. Developers are nervous because they have the pressure of delivering new functionality to the customer as fast as possible. On the other side, operations resists making that change a reality because it knows change is a major cause of outages. So the usual finger-pointing begins when problems arise: “It’s a development problem”; “Oh no, it’s an operations problem.” This tragic scene gets repeated over and over again in many companies, much to the frustration of management and the business, which is not able to predict releases and deliver business value as expected and required.

The problem is further amplified by two industry trends: agile development and large-scale/cloud infrastructure. Agile development by its nature embraces change and fosters small but frequent deployments to production. This higher frequency puts additional pressure on operations, which becomes even more of a bottleneck in the delivery process. What is remarkable is that even though agile actively seeks collaboration from all its stakeholders, most agile projects did not extend themselves toward the operations people. Continuous integration and the testing department served as the traditional buffer between development and operations, and in that buffer, pressure has also been building up.

The second driver, large-scale/cloud infrastructure, forced operations to change the way they managed their infrastructure. Traditional manual labor didn’t cut it at the scale of thousands of machines. It’s good to want to do small increments and frequent changes, but you need to have the tools to support that. Virtualization enabled ops to spin up new environments very quickly, and cloud did away with the resource problem. The real differentiator came from two concepts: configuration management and infrastructure as code. The first enables you to describe the desired state of an infrastructure through a domain-specific language, allowing you to both create and manage your infrastructure using the same tools. Infrastructure as code is a result of the increase of SaaS and APIs used to handle infrastructure. Embracing both concepts allows operations to automate a lot of its work — not only the initial provisioning of a new machine, but the whole lifecycle. This gave birth to the concept of agile infrastructure.
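As a rough illustration of the desired-state idea, the sketch below uses plain Python with made-up names, standing in for a real configuration management DSL such as those the article alludes to: the environment is described as data, and a reconciliation step computes what must change to converge the machine onto that description.

```python
# Hypothetical sketch of desired-state configuration management.
# The desired state is declared as data; reconcile() derives the
# actions needed to converge the actual state onto it.
desired_state = {
    "packages": {"nginx", "postgresql"},
    "services_running": {"nginx"},
}

def reconcile(actual_state, desired_state):
    """Return the actions needed to converge actual onto desired."""
    actions = []
    for pkg in sorted(desired_state["packages"] - actual_state["packages"]):
        actions.append(("install", pkg))
    for svc in sorted(desired_state["services_running"] - actual_state["services_running"]):
        actions.append(("start", svc))
    return actions

actual_state = {"packages": {"nginx"}, "services_running": set()}
print(reconcile(actual_state, desired_state))
# → [('install', 'postgresql'), ('start', 'nginx')]
```

Because the same description both creates and manages the machine, running the reconciliation again on a converged machine yields no actions, which is what makes the whole lifecycle, not just initial provisioning, automatable.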

As a response to these two drivers, devops was born. A number of people got together and started a grass-roots movement that set out to remove the traditional boundaries between development and operations. Some consider this picking up where “traditional agile” left off. After all, software doesn’t bring value unless it is deployed in production; otherwise, it’s just inventory. To tackle the problem, devops encourages cross-silo collaboration constantly, not only when things fail.

Operations becomes a valued member of the traditional agile process with an equal voice. Far too often, SLAs negotiated with a customer work counter to the changes requested by the same customer. Now both functional and nonfunctional requirements are evaluated for their business priority. This moves the usual discussion on priorities back before development starts. As both development and operations serve the same customer, the needs of both are discussed at the same time.

Once priorities have been determined and work can start, developers pair together with operations people to get the job done. This pairing allows for better knowledge diffusion across the two traditionally separate teams. Issues such as stability, monitoring, and backup can be addressed immediately instead of being afterthoughts, and operations gets a better understanding of how the application works before it actually deploys it in production.

When it’s hard, do it more often. Like exercise, the more you practice deployment to production, the better you will get at it. Thanks to automation, operations can introduce small changes much more quickly to the infrastructure. Practicing deployment in development, testing, quality assurance, and production environments, and managing it from the same set of defined models in configuration management, results in more repeatable, predictable results.

by Patrick Debois, Guest Editor

As both development and operations are now responsible for the whole, lines are beginning to blur. Developers are allowed to push changes to production (after going through an extensive set of tests), and just like their operations counterparts, they wear pagers in case of emergencies. They are now aware of the pain that issues cause, resulting in feedback on how they program things.

People active in continuous integration were among the first to reach out to operations. In the first article in this issue, Jez Humble and Joanne Molesky elaborate on the concept of an “automated deployment pipeline,” which extends the boundaries of software development to the deployment phase. All changes to a production environment follow the same flow, regardless of whether the changes are to code, infrastructure, or data. The deployment pipeline acts as sonar for issues: the further you get in the pipeline, the more robust the results get. And you want to catch issues as early as possible, since the cost of fixing them goes up the further you go in the process.

Feedback is available to all people: those in operations learn what issues they might expect in production, while developers learn about the production environments. Nor is feedback only of a technical nature — management and the business can learn from production trial runs what customers really want and how they react. This moves product development even further into customer development and allows you to test-drive your ideas in almost real time.

In our second article, Eric Shamow shows that, like any other change, devops is clearly a discovery path — people and processes don’t change overnight. A common pattern in devops adoption seems to be that organizational changes go hand in hand with the introduction of new tools. A tool by itself will not change anything, but the way you use a tool can make a difference in behavior, resulting in a cultural change. Therefore, driving change requires management support; otherwise, the impact of the change will be limited to some local optimization of the global process.

Shamow’s story also shows that a lot of organizations were already taking devops-like approaches, so for them it is nothing new. The great thing about the term “devops” is that people can use it to label their stories. And thanks to this common label, we can search for those stories and learn from others about what works and what doesn’t.

All the focus in devops on automation to reduce cycle time and enhance repeatability makes you wonder if we will all become obsolete and be replaced by tools. There is no denying that automation can be an effective cost cutter, as our next article, a case study by Larry Fitzpatrick and Michael Dillon, clearly demonstrates. But we need to recognize the real opportunity: the time won through automation gives us more time to design and improve our work. This is where the competitive advantage comes in. People sometimes lightly say that their most valuable asset is people, but it’s oh so true. They will give your business the edge it needs if you let them.

Agility in coding, agility in systems — it takes time and effort to nurture these ideas, but the results can be astonishing. The feedback cycle makes the whole difference in quality and results. With all this close collaboration, lines begin to blur. Will we all be developers? Does everybody deploy? In a multidisciplinary approach, you have either the option of generalizing specialists (the approach taken in the organization Fitzpatrick and Dillon discuss) or specializing generalists. Both can work, but if you are generalizing specialists, you get the extra depth of their specialization as well. This is sometimes referred to as having “T-shaped people” in the collaboration: people with in-depth knowledge (vertical) who are also able to understand the impact in a broad spectrum (horizontal). In the traditional organization, this would map to a matrix organization structure, similar to the one the authors describe. It’s important to rotate people through different projects and roles, and it’s equally important for the specialists to touch base again with their respective specialist groups to catch up with new global changes.

UPCOMING TOPICS IN CUTTER IT JOURNAL

SEPTEMBER (Robert D. Scott): 21st-Century IT Personnel: Tooling Up or Tooling Down?

OCTOBER (Dennis A. Adams): Creative Destruction: How to Keep from Being Technologically Disrupted

Many of the initial devops stories originated in Web 2.0 companies, startups, or large-scale Internet applications. In contrast with more traditional enterprises, these organizations have a much lighter and more flexible management structure. It’s worth noting that several of the large Web 2.0 companies such as Amazon and Flickr didn’t use the term “agile” but were working with a similar mindset. Small groups of both developers and operations people delivered new releases and understood that they were working toward the same goals for the same customers.

Given these origins, many people at first dismiss devops, thinking it will never work in their traditionally managed enterprise. They see it conflicting with existing frameworks like CMMI, agile, and Scrum. Consider, however, that most of the stories in this journal come from traditional enterprises, which should serve to debunk that myth. Traditional enterprises need to care even more about collaboration, as the number of people involved makes one-on-one communication more difficult. The term “devops” is only a stub for more global company collaboration.

Our next author, Bill Phifer, clearly shows there is no reason to throw existing good practices away. They might need to be adapted in execution, but nothing in them inherently conflicts with devops. Focusing particularly on CMMI for Development and ITIL V3, Phifer discusses how these two best practice models can be integrated to enable improved collaboration between development and operations. But as useful as this mapping of CMMI and ITIL functions to each other is, Phifer argues that it is not enough. Rather, the “result of effective integration between CMMI and ITIL as applied to the devops challenge is a framework that ensures IT is aligned with the needs of the business.”

Indeed, we sometimes forget that the reason we do IT is for the business, and if the business doesn’t earn money, we can be out of a job before we know it. As Phifer emphasizes, this relation to the business is an important aspect of devops: some people get stuck on the word “devops,” thinking that it’s just about development and operations working together. While this feedback loop is important, it must be seen as part of the complete system. Systems thinking advises us to optimize the whole; therefore, devops must apply to the whole organization, not only the part between development and operations.

In our final article, Dominica DeGrandis focuses on just this kind of systems thinking. She talks about the leadership changes that will be required to enable the “devops revolution” and outlines how statistical process control can be applied to devops to increase predictability and thus customer satisfaction. DeGrandis concludes by describing her experience at Corbis (the same company where David Anderson developed his ideas on applying Kanban to software development), recounting how devops-friendly practices like continuous integration and smaller, more frequent releases greatly improved the IT organization’s ability to deliver business value.

Only by providing positive results to the business and management can IT reverse its bad reputation and become a reliable partner again. In order to do that, we need to break through blockers in our thought process, and devops invites us to challenge traditional organizational barriers. The days of top-down control are over — devops is a grass-roots movement similar to other horizontal revolutions, such as Facebook. The role of management is changing: no longer just directive, it is taking a more supportive role, unleashing the power of the people on the floor to achieve awesome results.

Patrick Debois is a Senior Consultant with Cutter Consortium’s Agile Product & Project Management practice. To understand current IT organizations, he has made a habit of changing both his consultancy role and the domain in which he works: sometimes as a developer, manager, sys admin, tester, or even as the customer. During 15 years of consultancy, one thing has annoyed him badly: the great divide between all these groups. But times are changing now: being a player on the market requires you to get the “battles” between these silos under control. Mr. Debois first presented concepts on agile infrastructure at Agile 2008 in Toronto, and in 2009 he organized the first devopsdays conference. Since then he has been promoting the notion of “devops” to exchange ideas between these different organizational groups and show how they can help each other achieve better results in business. He’s a very active member of the agile and devops communities, sharing a lot of information on devops through his blog (http://jedi.be/blog) and Twitter (@patrickdebois). Mr. Debois can be reached at [email protected].


THE POWER OF THE PIPELINE

Why Enterprises Must Adopt Devops to Enable Continuous Delivery
by Jez Humble and Joanne Molesky

Roughly speaking those who work in connection with the [Automated Computing Engine] will be divided into its masters and its servants. Its masters will plan out instruction tables for it, thinking up deeper and deeper ways of using it. Its servants will feed it with cards as it calls for them. They will put right any parts that go wrong. They will assemble data that it requires. In fact the servants will take the place of limbs. As time goes on the calculator itself will take over the functions both of masters and of servants.

— Alan Turing, “Lecture on the Automatic Computing Engine” [1]

Turing was — as usual — remarkably prescient in this quote, which predicts the dev/ops division. In this article, we will show that this divide acts to retard the delivery of high-quality, valuable software. We argue that the most effective way to provide value with IT systems — and to integrate IT with the business — is through the creation of cross-functional product teams that manage services throughout their lifecycle, along with automation of the process of building, testing, and deploying IT systems. We will discuss the implications of this strategy in terms of how IT organizations should be managed and show that this model provides benefits for IT governance not just in terms of improved service delivery, but also through better risk management.

THE PREDICAMENT OF IT OPERATIONS

IT organizations are facing a serious challenge. On the one hand, they must enable businesses to respond ever faster to changing market conditions, serving customers who use an increasing variety of devices. On the other hand, they are saddled with ever more complex systems that need to be integrated and maintained with a high degree of reliability and availability. The division between projects and operations has become a serious constraint both on the ability of businesses to get new functionality to market faster and, ironically, on the ability of IT to maintain stable, highly available, high-quality systems and services.

The way organizations traditionally deliver technology has been codified in the established discipline of project management. However, while the projects that create new systems are often successful, these projects usually end with the first major release of the system — the point at which it gets exposed to its users. At this stage the project team disbands, the system is thrown over the wall to operations, and making further changes involves either creating a new project or work by a “business as usual” team. (The flow of value in this model is shown in Figure 1.) This creates several problems:

Many developers have never had to run the systems they have built, and thus they don’t understand the tradeoffs involved in creating systems that are reliable, scalable, high performance, and high quality. Operations teams sometimes overcompensate for potential performance or availability problems by buying expensive kit that is ultimately never used.

Operations teams are measured according to the stability of the systems they manage. Their rational response is to restrict deployment to production as much as possible so they don’t have to suffer the instability that releases of poor-quality software inevitably generate. Thus, a vicious cycle is created, and an unhealthy resentment between project teams and operations teams is perpetuated.

Because there are several disincentives for teams to release systems from early on in their lifecycle, solutions usually don’t get near a production environment until close to release time. Operations teams’ tight control over the build and release of physical servers slows the ability to test the functionality and deployment of solutions. This means that their full production readiness is usually not assessed until it is too late to change architectural decisions that affect stability and performance.

The business receives little real feedback on whether what is being built is valuable until the first release, which is usually many months after project approval. Several studies over the years have shown that the biggest source of waste in software development is features that are developed but are never or rarely used — a problem that is exacerbated by long release cycles.

The funding model for projects versus operating expenses creates challenges in measuring the cost of any given system over its lifecycle. Thus, it is nearly impossible to measure the value provided to the business on a per-service basis.

Due to the complexity of current systems, it is difficult to determine what should be decommissioned when a new system is up and running. The tendency is to let the old system run, creating additional costs and complexity that in turn drive up IT operating costs.

The upshot is that IT operations must maintain an ever-increasing variety of heterogeneous systems, while new projects add more. In most organizations, IT operations consumes by far the majority of the IT budget. If you can drive operating costs down by preventing and removing bloat within systems created by projects, you’d have more resources to focus on problem solving and continuous improvement of IT services.

DEVOPS: INTEGRATING PROJECT TEAMS AND IT OPERATIONS

Devops is about aligning the incentives of everybody involved in delivering software, with a particular emphasis on developers, testers, and operations personnel. A fundamental assumption of devops is that achieving both frequent, reliable deployments and a stable production environment is not a zero-sum game. Devops is an approach to fixing the first three problems listed above through culture, automation, measurement, and sharing. [2]

We will address each of these aspects in turn.

Culture

In terms of culture, one important step is for operations to be involved in the design and transition (development and deployment) of systems. This principle is in fact stated in the ITIL V3 literature. [3] Representatives from IT operations should attend applicable inceptions, retrospectives, planning meetings, and showcases of project teams. Meanwhile, developers should rotate through operations teams, and representatives from project teams should have regular meetings with the IT operations people. When an incident occurs in production, a developer should be on call to assist in discovering the root cause of the incident and to help resolve it if necessary.

Automation

Automation of build, deployment, and testing is key to achieving low lead times and thus rapid feedback. Teams should implement a deployment pipeline [4] to achieve this. A deployment pipeline, as shown in Figure 2, is a single path to production for all changes to a given system, whether to code, infrastructure and environments, database schemas and reference data, or configuration. The deployment pipeline models your process for building, testing, and deploying your systems and is thus a manifestation of the part of your value stream from check-in to release. Using the deployment pipeline, each change to any system is validated to see if it is fit for release, passing through a comprehensive series of automated tests. If it is successful, it becomes available for push-button deployment (with approvals, if required) to testing, staging, and production environments.
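The pipeline logic described above can be sketched in a few lines; the stage names and gate functions below are illustrative, not any particular tool's API. Every change, whatever its type, passes through the same ordered gates, a failure stops promotion, and a change that clears every gate becomes deployable.

```python
# Hypothetical sketch of a deployment pipeline: a single path to
# production, with promotion stopping at the first failed gate.
STAGES = ["commit build", "automated acceptance tests", "staging deploy"]

def run_pipeline(change, gates):
    """Run `change` through each gate in order; stop at the first failure."""
    for stage in STAGES:
        if not gates[stage](change):
            return {"change": change, "failed_at": stage, "deployable": False}
    return {"change": change, "failed_at": None, "deployable": True}

# Illustrative gates; real ones would compile, test, and deploy.
gates = {
    "commit build": lambda c: True,
    "automated acceptance tests": lambda c: c != "bad schema migration",
    "staging deploy": lambda c: True,
}

print(run_pipeline("config change", gates)["deployable"])        # → True
print(run_pipeline("bad schema migration", gates)["failed_at"])  # → automated acceptance tests
```

Note how the "sonar" property falls out of the structure: the earlier a gate sits, the cheaper and faster its feedback, while later gates give progressively more production-like confidence.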

[Figure 1 — The flow of value in project-based organizations. The diagram spans the value stream (“concept to cash”): business projects, measured on delivery, flow through project teams into systems; the systems are run by operations, measured on stability, comprising tech, ops, and app management, DBAs, and the service desk, serving users.]



Deployments should include the automated provisioning of all environments, which is where tools such as virtualization, IaaS/PaaS, and data center automation tools such as Puppet, Chef, and BladeLogic come in handy. With automated provisioning and management, all configuration and steps required to recreate the correct environment for the current service are stored and maintained in a central location. This also makes disaster recovery much simpler, provided you regularly back up the source information and your data.
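A minimal sketch of that idea, with entirely hypothetical environment definitions, shows how an environment becomes a pure function of centrally stored data, which is exactly why recovery after a disaster reduces to provisioning again plus a data restore.

```python
# Hypothetical central store of environment definitions: one source
# of truth from which any environment can be recreated.
ENVIRONMENT_DEFINITIONS = {
    "production": {"base_image": "web-v7", "instances": 4, "config": {"tls": True}},
    "staging":    {"base_image": "web-v7", "instances": 1, "config": {"tls": True}},
}

def provision(env_name, definitions):
    """Build an environment purely from its stored definition."""
    spec = definitions[env_name]
    return [
        {"host": f"{env_name}-{i}", "image": spec["base_image"], **spec["config"]}
        for i in range(spec["instances"])
    ]

# Recovering production after a disaster is just provisioning again
# from the same source of truth (the data restore is not shown here).
hosts = provision("production", ENVIRONMENT_DEFINITIONS)
print(len(hosts), hosts[0]["host"])  # → 4 production-0
```

Staging and production differing only in their definitions (here, instance count) is what keeps the environments production-like and the deployment process identical everywhere.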

Measurement

Measurement includes monitoring high-level business metrics such as revenue or end-to-end transactions per unit time. At a lower level, it requires careful choice of key performance indicators, since people change their behavior according to how they are measured. For example, measuring developers according to test coverage can easily lead to many automated tests with no assertions. One way to help developers focus on creating more stable systems might be to measure the effect of releases on the stability of the affected systems. Make sure key metrics are presented on big, visible displays to everybody involved in delivering software so they can see how well they are doing.
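One way to make the suggested release-stability measure concrete (a minimal sketch; the data layout and the seven-day attribution window are assumptions for illustration, not from the article):

```python
# Sketch of a release-stability metric: count production incidents occurring
# within a fixed window after each release. Window length and data shapes are
# invented; a real system would also need incident-to-release attribution.

from datetime import datetime, timedelta

def incidents_per_release(releases, incidents, window_days=7):
    """releases: {name: release datetime}; incidents: list of datetimes.
    Returns {name: count of incidents within `window_days` of the release}."""
    window = timedelta(days=window_days)
    return {
        name: sum(1 for t in incidents if ts <= t < ts + window)
        for name, ts in releases.items()
    }
```

Trending this count per release, on a big visible display, ties the measurement back to the behavior you actually want from developers.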

In terms of process, a critical characteristic of your delivery process is lead time. Mary and Tom Poppendieck ask, “How long would it take your organization to deploy a change that involves just one single line of code? Do you do this on a repeatable, reliable basis?”5

Set a goal for this number, and work to identify and remove the bottlenecks in your delivery process. Often the biggest obstacle to delivering faster is the lengthy time required to provision and deploy to production-like environments for automated acceptance testing, showcases, and exploratory and usability testing, so this is a good place to start.
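The Poppendiecks' lead-time question can be answered mechanically once each step of the delivery process is timed. In this invented sketch the step names and durations are illustrative only:

```python
# Sketch of lead-time analysis: given the elapsed time of each step in the
# delivery process, the total lead time and the bottleneck step fall out
# directly. Step names and durations are invented for illustration.

def lead_time(steps):
    """steps: {step_name: hours}. Returns (total_hours, bottleneck_step)."""
    total = sum(steps.values())
    bottleneck = max(steps, key=steps.get)
    return total, bottleneck
```

The step with the largest share of the total is where bottleneck removal pays off first; as the text notes, that is very often environment provisioning.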

Sharing

Sharing operates at several levels. A simple but effective form of sharing is for development and operations teams to celebrate successful releases together. It also means sharing knowledge, such as making sure the relevant operations team knows what new functionality is coming their way as soon as possible, not on the day of the release. Sharing development tools and techniques to manage environments and infrastructure is also a key part of devops.

DIY Deployments

If you implemented all of the practices described above, testers and operations personnel would be able to self-service deployments of the required version of the system to their environments on demand, and developers would get rapid feedback on the production readiness of the systems they were creating. You would have the ability to perform deployments more frequently and have fewer incidents in production. By implementing continuous delivery, in which systems are production-ready and deployable throughout their lifecycle, you would also get rid of the “crunches” that characterize most projects as they move toward release day.

However, while the practices outlined above can help fix the new systems you have entering production, they don’t help you fix the string and duct tape that is holding your existing production systems together. Let’s turn our attention to that problem now.

• The pipeline tells you the status of what’s currently in each environment
• Everyone gets visibility into the risk of each change
• People can self-service the deployments they want
• The pipeline provides complete traceability and auditing ability into all changes

Figure 2 — The deployment pipeline. A single path to production for all changes to your system. Everything required to recreate your production system (except data) is in version control: source code, infrastructure configuration, automated tests, database scripts, tool chain. Stages: version control; build binaries and run automated tests (every change triggers a new release candidate); self-service automated deployment to testing environments (changes that pass automated tests are available for deployment); self-service automated deployment to staging and production (approved changes can be deployed into user-facing environments on demand).



Get The Cutter Edge free: www.cutter.com. Vol. 24, No. 8, CUTTER IT JOURNAL

DID RUBE GOLDBERG DRAW THE ARCHITECTURE DIAGRAM FOR YOUR PRODUCTION SYSTEMS?

We propose an old-fashioned method for simplifying your production systems, and it rests on an approach to managing the lifecycle of your services. Treat each strategic service like a product, managed end to end by a small team that has firsthand access to all of the information required to run and change the service (see Figure 3). Use the discipline of product management, rather than project management, to evolve your services. Product teams are completely cross-functional, including all personnel required to build and run the service. Each team should be able to calculate the cost of building and running the service and the value it delivers to the organization (preferably directly in terms of revenue).

There are many ways people can be organized to form product teams, but the key is to improve collaboration and share responsibility for the overall quality of service delivered to the customers. As the teams come together and share, you will develop a knowledge base that allows you to make better decisions on what can be retired and when.

What is the role of a centrally run operations group in this model? The ITIL V3 framework divides the work of the operations group into four functions:6

1. The service desk

2. Technical management

3. IT operations management

4. Application management

Although all four functions have some relationship with application development teams, the two most heavily involved in interfacing with development teams are application management and technical management. The application management function is responsible for managing applications throughout their lifecycle. Technical management is also involved in the design, testing, release, and improvement of IT services, in addition to supporting the ongoing operation of the IT infrastructure.

In a product development approach, the central application management function goes away, subsumed into product teams. Nonroutine, application-specific requests to the service desk also go to the product teams. The technical management function remains but becomes focused on providing IaaS to product teams. The teams responsible for this work should also work as product teams.

To be clear, in this model there is more demand for the skills, experience, and mindset of operations people who are willing to work to improve systems, but less for those who create “works of art” — manually configured production systems that are impossible to reproduce or change without their personal knowledge and presence.

Once your organization has reached some level of maturity in terms of the basics of devops as described in the previous section, you can start rearchitecting to reduce waste and unnecessary complexity. Select a service that is already in production but is still under active development and of strategic value to the business. Create a cross-functional product team to manage this service and create a new path to production, implemented using a deployment pipeline, for this service. When you are able to deploy to production using the deployment pipeline exclusively, you can remove the unused, redundant, and legacy infrastructure from your system.

Figure 3 — The flow of value in a product development model. (Figure labels: products/services; operations; ops management; service desk; users; infrastructure; product owner; value stream, “concept to cash”.)

Finally, we are not proposing that the entire service portfolio be managed this way. This methodology is suitable for building strategic systems where the cost allocation model is not artificial. For utility systems that are necessary for the organization but do not differentiate you in the market, COTS software is usually the correct solution. Some of the principles and practices we present here can be applied to these services to improve delivery, but a dependence on the product owner to complete changes will restrict how much you can do and how fast you can go. Certainly, once changes are delivered by a COTS supplier, you can test and deploy the changes much faster if you have the ability to provision suitable test and production environments on demand using automation.

DEVOPS AT AMAZON: IF IT’S A TRENDY BUZZWORD, THEY’VE BEEN DOING IT FOR YEARS

In 2001 Amazon made a decision to take its “big ball of mud”7 architecture and make it service-oriented. This involved changing not only the architecture of the company’s entire system, but also its team organization. In a 2006 interview, Werner Vogels, CTO of Amazon, gives a classic statement not only of the essence of devops, but also of how to create product teams and create a tight feedback loop between users and the business:

Another lesson we’ve learned is that it’s not only the technology side that was improved by using services. The development and operational process has greatly benefited from it as well. The services model has been a key enabler in creating teams that can innovate quickly with a strong customer focus. Each service has a team associated with it, and that team is completely responsible for the service — from scoping out the functionality, to architecting it, to building it, and operating it.

There is another lesson here: Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.8

Many organizations attempt to create small teams, but they often make the mistake of splitting them functionally based on technology and not on product or service. Amazon, in designing its organizational structure, was careful to follow Conway’s Law: “Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.”9

IMPROVED RISK MANAGEMENT WITH CONTINUOUS DELIVERY

A common reason given for not trying devops and continuous delivery in IT shops is that this approach does not comply with industry standards and regulations. Two controls that are often cited are segregation of duties and change management.

Regulations and standards require organizations to prove they know what is happening and why, protect information and services, and perform accurate reporting. Most IT organizations are subject to some kind of regulation and implement controls in order to ensure they are in compliance. Controls are also essential to reducing the risk of having bad things happen that may affect the confidentiality, integrity, and availability of information.

Segregation of duties is a concept derived from the world of accounting to help prevent fraud and reduce the possibility of error. This control is required by regulations and standards such as SOX and PCI DSS. The relevant COBIT control states:

PO4.11 Segregation of Duties

Implement a division of roles and responsibilities that reduces the possibility for a single individual to compromise a critical process. Make sure that personnel are performing only authorized duties relevant to their respective jobs and positions.10




The spirit of the IT control is that one person should not have the ability to make preventable errors or introduce nefarious changes. At a basic level, you have checks and balances to make sure this doesn’t happen. This control can be implemented many different ways; an extreme interpretation of this control by some organizations is that development, operations, and support functions need to be functionally and physically separated and cannot talk to each other or view each other’s systems.

Those of us who have experienced working in these organizations know that this level of control only serves to increase cycle time, delay delivery of valuable functionality and bug fixes, reduce collaboration, and increase frustration levels for everyone. Furthermore, this type of separation actually increases the risk of error and fraud due to the lack of collaboration and understanding between teams. All IT teams should be able to talk and collaborate with each other on how best to reach the common goal of successful, stable deployments to production. If your IT teams don’t talk and collaborate with each other throughout the service/product delivery lifecycle, bad things will happen.

A Better Way: The Automated Deployment Pipeline

Reducing the risk of error or fraud in the delivery process is better achieved through the use of an automated deployment pipeline than through isolated and manual processes. It allows complete traceability from deployment back to source code and requirements. In a fully automated deployment pipeline, every command required to build, test, or deploy a piece of software is recorded, along with its output, when and on which machine it was run, and who authorized it. Automation also allows frequent, early, and comprehensive testing of changes to your systems — including validating conformance to regulations — as they move through the deployment pipeline.
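The record-keeping described here can be sketched as follows (an invented illustration of the idea, not the API of any real release management tool; the log structure is an assumption):

```python
# Sketch of pipeline traceability: every command the pipeline runs is
# recorded with its output, where and when it ran, who ran it, and who
# authorized it. The audit-record fields are invented for illustration.

import getpass
import socket
import subprocess
from datetime import datetime, timezone

AUDIT_LOG = []

def run_recorded(cmd, authorized_by):
    """Run a shell command and append a full audit record for it."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    AUDIT_LOG.append({
        "command": cmd,
        "output": result.stdout,
        "returncode": result.returncode,
        "machine": socket.gethostname(),
        "run_by": getpass.getuser(),
        "authorized_by": authorized_by,
        "when": datetime.now(timezone.utc).isoformat(),
    })
    return result.returncode
```

Because every step passes through one recording function, the audit trail is a byproduct of running the pipeline rather than a separate manual activity.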

This has three important effects:

1. Results are automatically documented, and errors can be detected earlier, when they are cheaper to fix. The actual deployment to production with all associated changes has been tested before the real event, so everyone goes in with a high level of confidence that it will work. If you have to roll back for any reason, it is easier.

2. People downstream who need to approve or implement changes (e.g., change advisory board members, database administrators) can be automatically notified, at an appropriate frequency and level of detail, of what is coming their way. Thus, approvals can be performed electronically in a just-in-time fashion.

3. Automating all aspects of the pipeline, including provisioning and management of infrastructure, allows all environments to be locked down such that they can only be changed using automated processes approved by authorized personnel.

Thus, the stated goals of the change management process are achieved:11

• Responding to the customer’s changing business requirements while maximizing value and reducing incidents, disruptions, and rework

• Responding to business and IT requests for change that will align the services with the business needs

As well as meeting the spirit of the control, this approach makes it possible to conform to the letter of the control. Segregation of duties is achieved by having your release management system run all commands within the deployment pipeline as a special user created for this purpose. Modern release management systems allow you to lock down who can perform any given action and will record who authorized what, and when they did so, for later auditing. Compensating controls (monitoring, alerts, and reviews) should also be applied to detect unauthorized changes.
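A hedged sketch of the letter-of-the-control mechanics: actions are permitted per role, and every authorization decision, allowed or denied, is recorded for later auditing. The role and action names here are invented, not taken from any release management product:

```python
# Sketch of role-based lockdown with an audit trail: who may perform which
# pipeline action is declared in one place, and every decision is recorded.
# Roles, actions, and the record shape are invented for illustration.

PERMISSIONS = {
    "deploy-to-prod": {"release-manager"},
    "deploy-to-test": {"release-manager", "developer", "tester"},
}
AUDIT = []

def authorize(action, person, role):
    """Permit `action` only for authorized roles; record every decision."""
    allowed = role in PERMISSIONS.get(action, set())
    AUDIT.append({"action": action, "person": person,
                  "role": role, "allowed": allowed})
    return allowed
```

Denied attempts are as valuable to an auditor as approvals, which is why both are appended to the log.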

IMPLEMENTING CONTINUOUS DELIVERY

Continuous delivery enables businesses to reduce cycle time so as to get faster feedback from users, reduce the risk and cost of deployments, get better visibility into the delivery process itself, and manage the risks of software delivery more effectively. At the highest level of maturity, continuous delivery means knowing that you can release your system on demand with virtually no technical risk. Deployments become non-events (because they are done on a regular basis), and all team members experience a steadier pace of work with less stress and overtime. IT waits for the business, instead of the other way around. Business risk is reduced because decisions are based on feedback from working software, not vaporware based on hypothesis. Thus, IT becomes integrated into the business.

Achieving these benefits within enterprises requires the discipline of devops: a culture of collaboration between all team members; measurement of process, value, cost, and technical metrics; sharing of knowledge and tools; and regular retrospectives as an input to a process of continuous improvement.

From a risk and compliance perspective, continuous delivery is a more mature, efficient, and effective method for applying controls to meet regulatory requirements than the traditional combination of automated and manual activities, handoffs between teams, and last-minute heroics to get changes to work in production.

It is important not to underestimate the complexity of implementing continuous delivery. We have shown that objections to continuous delivery based on risk management concerns are the result of false reasoning and misinterpretation of IT frameworks and controls. Rather, the main barrier to implementation will be organizational. Success requires a culture that enables collaboration and understanding between the functional groups that deliver IT services.

The hardest part of implementing this change is to determine what will work best in your circumstances and where to begin. Start by mapping out the current deployment pipeline (path to production), engaging everyone who contributes to delivery to identify all items and activities required to make the service work. Measure the elapsed time and feedback cycles. Keep incrementalism and collaboration at the heart of everything you do — whether it’s deployments or organizational change.

ACKNOWLEDGMENTS

The authors would like to thank Evan Bottcher, Tim Coote, Jim Fischer, Jim Highsmith, John Kordyback, and Ian Proctor for giving their feedback on an early draft of this article.

ENDNOTES

1. Turing, Alan. The Essential Turing, edited by B. Jack Copeland. Oxford University Press, 2004.

2. Willis, John. “What Devops Means to Me.” Opscode, Inc., 16 July 2010 (www.opscode.com/blog/2010/07/16/what-devops-means-to-me).

3. ITIL Service Transition. ITIL V3. Office of Government Commerce (OGC), 2007. (ITIL published a “Summary of Updates” in August 2011; see www.itil-officialsite.com.)

4. Humble, Jez, and David Farley. “Continuous Delivery: Anatomy of the Deployment Pipeline.” informIT, 7 September 2010 (www.informit.com/articles/article.aspx?p=1621865).

5. Poppendieck, Mary, and Tom Poppendieck. Implementing Lean Software Development: From Concept to Cash. Addison-Wesley Professional, 2006.

6. ITIL Service Transition. See 3.

7. Foote, Brian, and Joseph Yoder. “Big Ball of Mud.” In Pattern Languages of Program Design 4, edited by Neil Harrison, Brian Foote, and Hans Rohnert. Addison-Wesley, 2000.

8. Vogels, Werner. “A Conversation with Werner Vogels.” Interview by Jim Gray. ACM Queue, 30 June 2006.

9. Conway, Melvin E. “How Do Committees Invent?” Datamation, Vol. 14, No. 5, April 1968, pp. 28-31.

10. COBIT 4.1. IT Governance Institute (ITGI), 2007.

11. ITIL Service Transition. See 3.

Jez Humble is a Principal at ThoughtWorks Studios and author of Continuous Delivery, published in Martin Fowler’s Signature Series (Addison-Wesley, 2010). He has worked with a variety of platforms and technologies, consulting for nonprofits, telecoms, and financial services and online retail companies. His focus is on helping organizations deliver valuable, high-quality software frequently and reliably through implementing effective engineering practices. Mr. Humble can be reached at [email protected].

Joanne Molesky is an IT governance professional and works as a Principal Consultant with ThoughtWorks Australia. Her work focuses on helping companies to reduce risk, optimize IT processes, and meet regulatory compliance through the application of agile methodologies and principles in delivering IT services. She is certified in ITIL and COBIT Foundations and holds CISA and CRISC designations through ISACA. Ms. Molesky can be reached at [email protected].


Advance Internet is a medium-sized company with roughly 75 employees and around 1,000 servers devoted to a wide array of internally written and modified applications. When I arrived here in 2009, the company was in the midst of rolling out a new configuration management system (CMS) that had been in the works for roughly three years. Multiple deployment failures and poor information transfer had resulted in a complete breakdown of communication between operations and development. Developers were finding it increasingly difficult to get code onto test or production servers, or to get new servers provisioned. When releases were deployed, it was often done incorrectly or without following release notes, and the work often had to be redone several times. Operations had no information from development about when to expect releases, and when the releases did come, they would often be incomplete or inconsistent: they had configuration files in varied locations, contained poor or outdated release notes, and often depended on applications not installed from any repository. The entire operations team eventually left under various circumstances, and I was brought in as the new manager of systems operations to evaluate the situation, build a new team, and try to instill a culture of cooperation between development and operations.

I was not aware of “devops” as a movement at the time we started this project, although I became aware of it through my work and by reaching out to other professionals for suggestions and support. However, my experience discovering these problems and solving them from scratch led us straight down the devops path.

THE UNDERLYING ENVIRONMENT

There were several problems that became evident immediately. We began to tackle these issues individually, beginning with monitoring. At that time, the alerts we received were not relevant and timely, and metrics were unavailable, which led to a lot of guesswork and “from-the-gut” decision making. The build situation also needed to be fixed. New servers would be built by operations, be handed over to development for vetting, and then bounce back and forth in a three- or four-week cycle as corrections and modifications were made on both sides. By the end of this process, there was no clear record of who had done what, and the next build was inevitably condemned to failure. Even once we had gotten an idea of what went into a successful build, there was no automated means of repeating the process.

Monitoring and alerting turned out to be the easy part of the equation, but the one that would have the greatest impact on the tenor of the changes to come. Rather than attempting to fix the existing monitoring framework, we built a brand-new one using Nagios and PNP for graphing. We then developed an application severity matrix — based around the organization’s list of known applications and their relative importance — in order to form the basis for a per-application escalation chain. With these established, we then met group by group with each team and the relevant stakeholders for the application or server in question and painstakingly listed which services needed to be monitored, how critical those services were, and when alerts should escalate.
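A severity matrix and its escalation chain can be represented very simply; this sketch is invented for illustration (the applications, tiers, and delays are not Advance's actual data):

```python
# Sketch of an application severity matrix driving per-application
# escalation chains. Tier 1 is most critical; tier-3 alerts can wait
# until morning. All names and delays are invented for illustration.

SEVERITY = {"storefront": 1, "reporting": 3}

# Minutes to wait before each escalation step, keyed by severity tier.
ESCALATION = {
    1: [("on-call engineer", 0), ("software engineer", 15), ("senior management", 60)],
    3: [("on-call engineer", 0)],
}

def escalation_chain(app):
    """Return the (who, delay_minutes) chain for an application's tier."""
    return ESCALATION[SEVERITY[app]]
```

Putting the matrix in data rather than in people's heads is what made the stakeholder meetings productive: each row is something a business owner can read and argue about.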

The key piece of this was involving not just the people responsible for monitoring the system, but the people who “owned” the system on the business side. Because they were participating in the discussion, they could help determine when a software engineer should be paged or when an alert could be ignored until morning, when they themselves should be paged, and how long to wait before senior management was notified. Once these meetings were held, we were able to move over 80% of the servers in our original alerting system to the new one. The remaining outliers enabled us to convert “unknown unknowns” into “known unknowns” — we now had a map of those servers with no clear owner and little documentation, so we knew which servers required the most focus. We also established a recurring quarterly schedule with the same stakeholders to review their application settings and determine whether we needed to make adjustments in thresholds and escalations.

Once this process was complete, access to our new Nagios install was given to everyone inside the technical organization, from the lowest-level support representative to the VP overseeing our group. We met with anyone who was interested and had several training classes that demonstrated how to navigate the interface and how to build graph correlation pages. The goal was to open the metrics to the organization. Operations may not be able to keep its eye on everything at all times, so with more people with different interests looking at our data, problems were more likely to become apparent and decisions could be made and fact-checked immediately.

DISCIPLES OF DISCIPLINE

Devops at Advance Internet: How We Got in the Door
by Eric Shamow

In the meantime, the configuration piece was moving much more slowly. Advance had long grown past the point where server deployments were manual, but what was in place was not much better than that manual deploy. There were two build systems in place — CONF REPO and HOMR — and I will stop to discuss them here purely as cautionary tales. Prior groups had clearly attempted to improve server deployment before and failed. The question I faced was, why? And how could I avoid repeating those mistakes?

PREVIOUS ATTEMPTS AT A TOOLSET, AND MOVING FORWARD

In small-scale environments, making bulk configuration changes is not challenging. Servers are configured individually, and when a change must be made to multiple systems, administrators typically use a tool such as clusterSSH (which sends input from the user’s workstation to multiple remote systems) or a shell script, which repeats the same changes on all of these systems sequentially. As environments become larger, however, it becomes more difficult to maintain systems in this way, as any changes made in an ad hoc fashion must be carefully documented and applied across the board. Therefore, in large-scale environments, it is common to use a configuration management system, which tracks changes and applies them automatically to designated servers.

The original configuration environment evident at the time of my arrival at Advance was CONF REPO. CONF REPO was in fact a heavily modified version of CFEngine. At some point in time, an inadequately tested change in CFEngine apparently caused a widespread systems outage, and the ops team responded by modifying CFEngine’s default behavior. A script called cache_configs now attempted to generate a simulation of CFEngine’s cache from configuration files and automatically copied this to each host. The host would then check what was in the new cache against what was on the filesystem, and it would then stop and send an alert rather than replace the files if there were any discrepancies. Although most of this functionality could have been achieved by CFEngine out of the box, the team had elected to rewrite large parts of it in the new CONF REPO framework, with the result that as CONF REPO’s codebase diverged from CFEngine’s, the framework became unmaintainable. CONF REPO was dropped as an active concern but never fully retired, so many hosts remained on this system.

This was followed by an attempt at writing a config management system from scratch, called HOMR. HOMR was really a wrapper around Subversion and functioned as a pull-only tool. Changes would be checked into HOMR, and then “homr forceconfigs” would be run on the client. This would pull down all configs and replace existing ones without warning. The result was that HOMR wouldn’t be run on a host for weeks or months while individual admins tinkered with settings; then another admin on an entirely unrelated task (changing default SSH or NSCD settings, for instance) would run “homr forceconfigs” across the board and wipe out a bunch of servers’ configs. This often resulted in production outages.

My evaluation of this situation was that operations was trying to solve a discipline problem with a technological solution. CFEngine, the originally chosen configuration management tool, is a well-respected and widely used system. Had it been used properly and left in place, Advance would have had a CMS years ago. Instead, due to poor discipline surrounding testing of changes, the system was abandoned and replaced with increasingly convoluted and difficult-to-maintain frameworks. This was not the way forward.

Although there was appeal in simply returning to CFEngine, there was enough mistrust of the tool around the organization that I instead elected to get a fresh start by bringing in Puppet as our new CMS. The idea was to create a greenfield “safe zone” of servers that were properly configured and managed and then to slowly expand that zone to include more and more of our environment.

Advance is a CentOS/RHEL shop, so I started by building a Cobbler server and cleaning up our existing Kickstart environment so that we had a reasonable install configuration. I then took the base configuration out of HOMR, translated it into Puppet, and set up a bootstrap script in our Kickstart so that we could build Puppet machines with our base environment. We now had the ability to bring up new Puppet hosts. I wrapped the Puppet configuration directories in a Mercurial repository and shut off direct write access to this directory for the entire admin team — all changes would have to be made through Mercurial, ensuring accountability.

The next step was to start porting configs into configuration management (hereafter CM), building more hosts, and training my newly hired team of admins to use this system. More critically, I needed a way to get buy-in from suspicious developers and product owners (who had heard lots of stories of “new” configuration management systems before) so that I could lift up their existing systems, get their time and involvement in building CM classes reflecting their use, and ultimately rebuild their production environments to prove that we had in fact captured the correct configuration. This needed to work, it needed to be surrounded by accountability and visible metrics, and it needed to be done quickly.

BOOTSTRAPPING THE PROCESS

There were a few key pieces of learned wisdom I took into this next phase that helped ensure that it was a success. This was information I’d gathered during my years as a system administrator, and I knew that if I could leverage it, I’d get buy-in both from my new staff and from the rest of the organization. To wit:

Unplanned changes are the root of most failures. Almost all organizations resist anything like change management until you can prove this to them, in black and white.

Engineers do a much better job of selling internal initiatives to each other than management does, even if the manager in question is an engineer. When possible, train a trainer, then let him or her train the group.

Engineers like to play with fun stuff. If you tie fun stuff to what they consider “boring stuff,” they will put up with a certain amount of the “boring” to get at the “fun.”

Organizations generally distrust operations if operations has a lousy track record. The only way to fix this is to demonstrate repeatedly and over a long period of time that operations can execute to the business’s priorities.

The “unplanned changes” principle was the first item to tackle and the easiest to prove within my team. Most engineers are at least somewhat aware of this phenomenon, even if they think that they themselves are never responsible for such problems. Clear metrics, graphs, and logs made this stunningly clear. Problems were almost always caused by changes that had been made outside the approved process, even if those problems didn’t show up until a change made as part of the process ran up against the unplanned change weeks later. It took a lot longer to prove this to the business, because engineers will see the results in a log file and immediately draw the picture, whereas less technical stakeholders will require explanations and summary information. But over time, the concept is one that most people can understand intuitively; logs and audit trails made the case.

I was careful to pick a single person within the group to train on Puppet, a move that had a number of effects. It created something of a mystique around the tool (“Why does he get to play with it? I want in!”). It also enabled me to carefully shape that person’s methodology to ensure that it was in line with my thinking, while giving him a lot of latitude to develop his own best practices and standards. This was a true collaboration, and it gave me a second person who could now answer CM questions as well as I could and who could train others.

Finally, we used the carrot approach and moved the most exciting, interesting new projects into CM. Anything new and shiny went there, regardless of its importance to the business. Want to build a new backup server for everybody’s local Time Machine backups? Fine, but it’s got to go in CM, and I want to see someone else deploy the final build to ensure it works. A few months of this, combined with scheduled training, meant that we quickly had a full group of people at least nominally versed in how Puppet worked, how to interact with Mercurial, and how to push changes out to servers.

At this point we were successful, but we had not yet achieved our goals. We could easily have failed here through the same kind of testing failure that brought down CFEngine within our group. The key step was scaling the toolset to keep pace with the number of people involved.

SCALING

Once we were prepared to move beyond two people making commits to our repository, we needed to worry about integration testing. At this point I identified two possible strategies for dealing with this:

©2011 Cutter Information LLC. CUTTER IT JOURNAL, August 2011

1. Automated integration testing with a toolset such as Cucumber

2. Manual testing with a development environment

Although the first was clearly the optimal solution, the two weren’t mutually exclusive. While the Cucumber solution would require a long period of bootstrapping and development, the development environment could be set up quickly and be used and tested immediately. We thus forked our efforts: I tasked my CM expert on the team with figuring out the development environment and how to get code moved around, and I began the longer-term work on the Cucumber solution.

An issue that immediately arose was how to deal with potential code conflicts in the code repository. The team we had built was comfortable with scripting and low-level coding but had limited familiarity with code repositories. My initial plan had been to use typical developer tools such as branches and tags, or perhaps a more sophisticated Mercurial-specific toolset revolving around Mercurial Queues. As we began to introduce more junior admins into the process, however, we found an enormous amount of confusion around merging branches, the appropriate time to merge, how to deal with conflicts, and other typical small-group issues when people encounter distributed change control for the first time. Rather than attempt to educate a large number of people on this quickly, we decided to build a simple semaphore-based system to prevent two people from working on the same CM module simultaneously.

Some will argue that this solution is suboptimal, but it allowed us to work around the branching issue during the early phases of deployment. There was never — and still isn’t — any doubt that the “right” way to do this was with native repository technologies, but the key point for us was that not adhering to those enabled us to roll out a functioning environment with minimal conflicts and without the need to debug and train on complex merge issues. This kept our admins focused on the core elements of moving to real change control rather than tied up in the details of running it. The goal was to keep it simple initially to encourage more focus on getting systems into CM and writing good, clean CM entries.
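A per-module semaphore of the kind described can be sketched in a few lines of shell. This is a minimal illustration, not the actual system: the lock directory path and function names are assumptions, and it relies on the atomicity of `mkdir` so that two admins cannot both claim the same CM module.

```shell
#!/bin/sh
# Hypothetical sketch: one lock per CM module, using mkdir's atomicity.
# Paths and names are illustrative, not taken from the article.
LOCKROOT="${LOCKROOT:-/var/tmp/cm-locks}"

lock_module() {
    # Succeeds (exit 0) only for the first caller; later callers are
    # told who holds the lock and get a nonzero exit.
    _mod="$1"
    mkdir -p "$LOCKROOT"
    if mkdir "$LOCKROOT/$_mod.lock" 2>/dev/null; then
        echo "${USER:-unknown}" > "$LOCKROOT/$_mod.lock/owner"
        return 0
    fi
    echo "module '$_mod' is locked by $(cat "$LOCKROOT/$_mod.lock/owner" 2>/dev/null)" >&2
    return 1
}

unlock_module() {
    rm -rf "$LOCKROOT/${1}.lock"
}
```

An admin would run the lock command before editing a module and the unlock command after pushing; anyone hitting a held lock simply waits or picks another module, which is exactly the merge-avoidance property the team wanted.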

DISCIPLINE AND CODING STANDARDS

Again drawing from lessons learned as a systems administrator maintaining admin-written scripts, I began work on Puppet, operating from a series of core principles that informed decision making around the environment:

Coding standards are easy. Many style guides are available online, but if your tool of choice doesn’t have one, write your own. It does not have to be deeply complex, but your code should adhere to two principles: code must be clean and readable, and output must be similarly clean and machine parsable.

Development should be as generic as possible so that nobody wastes time reinventing the wheel. If you are building a ticket server that runs behind Apache, take the time to build a generic Apache module that can be used by the entire team; if it uses MySQL, build that MySQL module. Treat the code pool as a commons and make everyone responsible for growing it and keeping it clean.

Rely on self-correction. Once a critical mass is built and everyone is contributing to the same code base, peers won’t tolerate bad code, and you as a manager won’t have to execute as much oversight day-to-day. Oversight and periodic code review are still important, but it’s much more important to have a strong group ethic that contributing “quick” but poorly written code to the repository is unacceptable. The more that peers find errors in each other’s code and help to fix them, the more likely the system is to be stable as a whole and the better the code will likely integrate.

Enforce the use of a development environment. If changes aren’t put into place in a real, running CM environment and run in that environment to verify that they function properly, there is almost no point in doing devops — you are just abstracting the process of “doing it live.” The key isn’t just to put code through a code repository — it’s to then do the same rigorous testing and deployment that developers must do before operations accepts its code as ready to go.
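The “generic modules” principle above might look roughly like this in a Puppet manifest. This is a hedged sketch, not Advance’s code: the class names, the `httpd` package, and the assumed sibling `mysql` module are all illustrative.

```puppet
# Hypothetical generic Apache module, built once and shared by the team.
class apache {
  package { 'httpd': ensure => installed }

  service { 'httpd':
    ensure  => running,
    enable  => true,
    require => Package['httpd'],
  }
}

# A ticket server then composes shared modules instead of reinventing them.
class ticketserver {
  include apache
  include mysql   # assumed sibling module, also built for the whole team
}
```

The point of the design is the second class: each new service becomes a short composition of commons modules, so the cost of writing the Apache module is paid once.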

We did all of the above, to good effect. Within two to three months, everybody on the team had a few personal virtual machines hooked up to the development environment, which they could quickly redeploy to match real-world systems. Code was being contributed, shared, fixed, and extended by the entire group. When the CM system got upgraded and offered us new features we wanted to use, it took only a little training and some examples set by the specialist to enable the rest of the group to start working with the new tools.

Treat the code pool as a commons and make everyone responsible for growing it and keeping it clean.


CLOSING THE LOOP

None of the things we did to this point would have been helpful if we hadn’t achieved the goal of improved communication and better deployment. So how did the company respond to these changes?

Most of the concerns from management and other groups were focused on time to deliver new builds and server updates, the accuracy of those builds, and communication with development about the processes it needed to follow to get material onto a server. Prior to our changeover to devops-style configuration management, the CMS project was a particular sore point. Rebuilding an individual server took an average of two weeks and two full days of admin time. After the change, a single engineer could do that same build in half a day — with active attention not fully on the build — with absolute precision. As a result, we were able to rebuild entire CMS environments in a day. Because developers worked with us to manage configuration files (only operations has commit rights, but we will often grant developers limited read access), it became very clear what needed to be in a package. And to facilitate faster development, for nonproduction or QA environments such as development and load, we used the Puppet ensure => latest directive for key packages. Consequently, developers only needed to check the newest version of their application package into a repository, and within 30 minutes Puppet would install it to their environments. In the meantime, our publicly available and transparent metrics enabled them to measure the results of these changes and ensure that changes were consistent across environments.
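The environment split described above might look roughly like this in a manifest. The package name, the pinned version, and the environment names are illustrative assumptions; only the ensure => latest directive itself comes from the article.

```puppet
# Hypothetical manifest: nonproduction environments track the newest
# package in the repository, while production stays pinned.
case $environment {
  'development', 'qa', 'load': {
    package { 'cms-app': ensure => latest }
  }
  default: {
    package { 'cms-app': ensure => '1.4.2' }  # illustrative pinned version
  }
}
```

With a split like this, checking a new build into the package repository is all a developer needs to do to update a development host, while production changes still go through the normal release process.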

These developments had two immediate effects. First, the folks on the old CMS project started talking loudly about how well our project was going, and soon other product managers whose applications suffered from a situation similar to the former CMS project began to request that their servers move into our new CM. We soon had a queue longer than we could handle of projects fighting to be moved into the new environment. For a while, my ability to move servers in lagged behind my ability to train my staff in doing so.

Second, senior management immediately noticed the improvement in turnaround time and the disappearance of debugging time due to environment inconsistency. They immediately mandated that we roll our new CM out to as many environments as possible, with the goal of moving all of Advance into full CM by year’s end. Management recognized the importance of this approach and was willing to make it a business rather than a technological priority.

I cannot overstate the role of PR in making this project a success. A tremendous amount of time went into meeting with product owners, project managers, development teams, and other stakeholders to assure them about what was changing in their new environment. I was also willing to bend a lot of rules in less critical areas to get what we wanted in place; we made some short-term engineering compromises to assuage doubts in exchange for the ability to gain better control of the system in its entirety. For instance, we would often waive our rules barring code releases on Fridays, or we would handle less urgent requests with an out-of-band response normally reserved for emergencies to ensure that the developers would feel a sense of mutual cooperation. The gamble was that allowing a few expediencies would enable us to establish and maintain trust with the developers so we could accomplish our larger goals. That gamble has paid off.

Puppet is now fully entrenched in every major application stack, and it is at roughly 60% penetration within our environment. The hosts not yet in the system are generally development or low-use production hosts that for a variety of reasons haven’t been ported; we continue slow progress on these. We have also begun to train engineers attached to the development teams in the system’s use, so that they can build modules for us that we can vet and deploy for new environments. This gives the development teams increased insight into and control over deployments, while allowing us to maintain oversight and standards compliance. We have also established some parts of our Mercurial repositories where developers can directly commit files (e.g., often-changing configuration files) themselves, while still leveraging our system and change history.

NEXT STEPS

We’re not finished yet — in fact, we’re just getting started. Exposure to the devops community has left me full of fresh ideas for where to go next, but I would say that our top three priorities are integration and unit testing, elastic deployments, and visibility and monitoring improvements.

For integration and unit testing, we have already selected Cucumber and have begun writing a test suite, with the goal that all modifications to our Puppet environment will have to pass these tests in order to be committed. This suite would not be in lieu of, but rather in addition to, our development environment. The ultimate goal here is to enable development teams or product owners to write agile-style “user stories” in Cucumber, which developers can translate into RSpec and we can then run to ensure that CM classes do what they assert they can do. The integration testing piece is complete; unit tests will take considerably more time, but we will roll these out over the next six months.

Elastic deployment is ready to go but has not been implemented. Discussion of this is beyond the scope of this article, but the summary is that we use a combination of an automated deploy script, Cobbler, Puppet, and Nagios to determine thresholds for startup and shutdown and then automatically deploy hosts around some concrete business rules. The tools are in place, and we are now looking at viable candidate environments for test deployment.
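The decision logic at the heart of such a pipeline can be sketched as a small shell function. This is purely illustrative: the thresholds, the metric, and the function name are assumptions standing in for the business rules the article leaves undescribed, and the actual Nagios query and Cobbler/Puppet provisioning calls are omitted.

```shell
#!/bin/sh
# Hedged sketch of an elastic-deployment decision: compare a load metric
# (which Nagios might supply) against hypothetical business-rule thresholds.
scale_decision() {
    # $1 = current load metric, $2 = scale-up threshold, $3 = scale-down threshold
    awk -v load="$1" -v up="$2" -v down="$3" 'BEGIN {
        if (load > up)        print "deploy"
        else if (load < down) print "retire"
        else                  print "hold"
    }'
}

# In a real pipeline, "deploy" would trigger the provisioning toolchain
# (e.g., a Cobbler build followed by a Puppet run) and "retire" the reverse.
```

Keeping the thresholds in one pure function makes the business rules easy to review and test independently of the provisioning tools.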

Finally, visibility and monitoring, like security, are not a destination but a process. We continue to “sharpen the saw” by examining alternate monitoring tools, different ways to slice and improve metrics, automating simple statistical analysis such as moving trend lines, and developing dashboards. We are currently working on a number of dashboard components that would allow technically minded developers and even senior management to put together correlations and information from the servers on the fly.

Devops is now entrenched in the culture and fully supported by management. Now that we’ve gotten it in the door, it’s up to us to use it to drive our environment further and even more tightly couple operations with development. We’re on the same team … now we share the same information and tools.

FOR THOSE BEGINNING THE JOURNEY

If you are reading this and haven’t yet begun the transition to a functioning devops environment, the length of the road may seem daunting. There are many organization-specific choices we made that may not be appropriate for you, but the key is not to get hung up on the details. I would offer the following thoughts to anyone looking to implement a solution similar to ours:

Above all else, focus on the culture change and remember that you are working in a business environment. This change in culture will only occur if there is total commitment both from the driver of the change and from the majority of the people who will have to participate in the new culture.

If you must give — and be prepared to give, because at the end of the day, there will come an emergency more critical than your adherence to protocol — have a procedure in place for integrating those changes once the crisis is over. Don’t allow management or other teams to see devops as something that prevents critical change; rather, emphasize the seriousness of bypassing it and then facilitate the changes the stakeholders need while working within your own system for addressing exceptions.

Don’t worry about the specific toolset — let the tools you choose select themselves based on the culture and skills of the group you have. Choosing the perfect configuration management system for a team whose strengths are not in the same area as the tool will not provide the best results, and the best-in-breed solution may not be the best for you. If everyone is comfortable with the system, they will be more willing to use it; tools can always be changed once procedures are set.

Your own transparency will open up other groups. The culture of information hiding in IT can be toxic and exists for various reasons, many of which are unintentional. Whether you are on the “dev” or the “ops” side of the divide, ensure that — regardless of the transparency of the other side — your information, your metrics, your documentation, and your procedures are kept open and available for view. Be open and willing to discuss changes in procedure with the people who are affected by them. If you allow the devops change to be just another procedural change, it will fail. The change must fit the contours of your organization, which means give and take on both sides. If there has been a rift, you will need to give first and often to establish trust. The trust you earn at this point will buy you the leeway to do the previously unthinkable once the project really starts rolling.

At the time of writing, Eric Shamow was Manager of Systems Operations for Advance Internet. Mr. Shamow has led and worked in a wide variety of operations teams for the past 10 years. He has spent time in university environments and private industry, worked at a Wall Street startup, and worked at the number three comScore-rated news site. Mr. Shamow has managed a variety of Linux and Unix environments, as well as several large Windows-based deployments. He recently began work as a Professional Services Engineer for Puppet Labs.

Mr. Shamow has extensive experience in automating and documenting large-scale operational environments and has long worked to bring metrics-driven transparency to his teams. He has a paper due to be presented at USENIX’s LISA ’11 on his work building an elastic virtual environment. Mr. Shamow can be reached at [email protected].


The Business Case for Devops: A Five-Year Retrospective
by Lawrence Fitzpatrick and Michael Dillon

IF YOU WANT IT DONE RIGHT, YOU’VE GOT TO DO IT YOURSELF

THE JOURNEY BEGINS

In 2005, a large organization with a fully outsourced IT operation and a sizable application development footprint was relying primarily on manpower for many aspects of development support, engineering, and operations. Having failed to leverage automation, its development, deployment, and operations latencies were measured in months, and manual change controls led to repeated break/fix cycles. Five years into the underperforming outsourcing contract, the organization lacked confidence that the outsourcer could execute on several key initiatives, so it decided to take back control and insource its application development. Today, the speed, capacity, and reliability of the operation are very different, due in large part to a “devops” group that grew within the new development organization, functioning at the juncture of applications and operations.

The devops movement (see Table 1) grew out of an understanding of the interdependence and importance of both the development and operations disciplines in meeting an organization’s goal of rapidly producing software products and services. In this organization, a devops group emerged during the aforementioned insourcing of the company’s application development1 capabilities. Out of a desire to achieve consistency, efficiency, visibility, and predictability in development, and to support the transition to iterative/agile methods, the organization consolidated its development support resources from multiple development teams into a centralized Development Services Group (DSG). The DSG’s mission was to standardize practice, automate development, and drive staffing efficiencies for software configuration management (SCM)/build, quality assurance (testing), and systems analysis activities.

Table 1 — “Devops” in a Nutshell

Increased collaboration between operations and development

Reduced cycle times for operations activities (e.g., provisioning, deployment, change controls)

Extreme focus on automation in tools and processes

Fostering continuous improvement as both a means to the above and as a strategy to adapt to increasingly rapid changes happening in IT

From its inception five years ago, the DSG has led a transformation that delivers value far beyond initial expectations. Serving a development community of over 500, the group operates a centralized build farm that handles 25,000 software builds a year and an automated deployment stack that executes 1,500 software deployments a year. In addition to improving reliability, velocity, and transparency, this foundation has also been used to automate test frameworks, provisioning, monitoring, and security assurance. Most significantly, the group has captured cost savings and avoidance for the entire IT organization.

Bumps in the Road

This journey was particularly challenging — and perhaps different from other devops stories — due to the existence of an outsourcer that was highly resistant to change and frequently cloaked itself in the terms of the contract. This single-vendor contract covered substantially all of the IT function, from operations to maintenance and development. Fortunately, certain contractual terms (activated by outsourcer failures) allowed the organization to begin to take control of development. Minor grumbling proved little hindrance when the new development organization took back control of development tool support (SCM, build, issue tracking, etc.), but the resistance escalated substantially when other functions proved inadequate.

Within two years of insourcing application development, development cycle times had been reduced from months to weeks. But this development acceleration exposed a major impedance mismatch between operations and development teams. Inefficiencies in the outsourced operation’s ticket handling, system provisioning in both development and production environments, deployment, change management, and monitoring were exposed as major bottlenecks as well as a source of tension between the applications and operations organizations. Driven by increasing demands for velocity and the inability of the existing operational model to service these needs, the DSG took on some capability historically reserved for operations.

A number of obstacles dogged the organization as traditionally operational roles were augmented and transformed. The infrastructure groups feared loss of control without loss of accountability and felt constrained by the terms of the outsourcing contract. Finance was concerned that the organization was “paying twice” for the same services. Compliance watchdogs didn’t know how to interpret the DSG’s role in the new landscape. Isolated development teams fought the loss of their “oilers”2 to automation.

Focusing on Results

By focusing on project delivery, sticking to solving problems that impacted development productivity, and paying close attention to cultural issues, the DSG was able to sidestep objections and deliver results. Experienced leaders in the group knew the devops approach would yield agility, flexibility, and cost benefits, but they had no way to quantify those benefits a priori. Now, with five years of experience and metrics behind them, the DSG has been able to substantiate the value proposition. Their experience and lessons learned are reflected in the key decision points and guiding principles that follow.

GUIDING PRINCIPLES FOR THE DSG

In a period of 15 months between 2006 and 2007, the newly formed DSG grew within a development organization that went from 50 people and eight projects to 175 people and 25 projects. Within another two years, the development organization’s size had tripled again. In order to maintain efficiency and effectiveness during this rapid growth, the DSG focused on three principles:

1. Centralize

2. Remain project-focused

3. Automate wherever possible

Not only did these principles keep the group focused, but they also reinforced the value of the devops approach throughout the organization. Corporate stakeholders intuitively accepted the value of centralization and automation for cost containment, and project stakeholders accepted the sharing of project resources in exchange for the emphasis on project delivery, skill, and automation.

Centralize Capabilities as Practices

An obvious way to achieve economies of scale and consistency is to centralize. The DSG initially consolidated practices around SCM, testing, and systems analysis by hiring a practice lead for each area. The practice lead was responsible for delivering skilled staff to project teams, managing staff, and developing best practices and procedures to be used across projects. Initially, project leadership resisted allowing the practice leads to hire for their teams, but by letting the experts hire their own kind, the practice leads took more ownership of project problems and recruited more qualified staff.

Remain Project-Focused — A Hybrid Matrix

In order to neutralize the tendency of centralized organizations to lose touch with those they serve, the DSG staff was organizationally managed by their respective practice leads but was funded by the project teams, embedded into the project teams, and operationally managed by the respective project manager or other project leadership (see Figure 1). This dual-reporting tension — a practice boss (from DSG) and an operational boss (from the project) — was necessary to ensure that both the project’s and the larger organization’s goals were met. This structure reinforced important values: a commitment to delivery, a bidirectional conduit to harness innovation from the floor and introduce it back through the practice to other project teams, the ability to hold to standards in the face of delivery pressure, and visibility into project team risk and performance through multiple management channels. It also helped resolve funding issues common to shared services groups: project teams were required to fund their own DSG resources, even though they were recruited and managed by DSG, while simultaneously holding the practices to delivery expectations.

Automate and Manage to Metrics

While the automation of development procedure and the generation of performance metrics tend to drive efficiency and consistency, development teams often fail to automate. Their reluctance is understandable — among the things that preclude automation are a lack of money for tools, project urgencies that trump the investment of time needed to code procedures, or a bias toward hiring nonprogrammers into support roles in order to save money. The DSG eliminated these obstacles by investing in tools, affirming the importance and expectation of automation (tracked by metrics and published in dashboards), and ensuring that skilled professionals were hired for crucial service roles. All this was provided as a service to the delivery teams.

EARLY “WINS” AND PRACTICE LEADERSHIP

Because the DSG represented a significant change to the organization’s structure (i.e., by introducing matrix practices), the group needed to show stakeholders early successes. A focus on recruiting technical specialists became the single most important step on the road to building DSG credibility and acceptance. The DSG hired staff with a particular profile: a high affinity for automation, a strong sense of service to the development community, a considerable track record of delivery, and expertise in their practice area.

Practice staff were targeted to key projects that had influence across dimensions such as platform (.NET, Java), size, maturity, and importance to the organization.3 Despite being placed and managed from the center, the embedded practice staff was held accountable for delivery as part of the development team in order to defuse sensitivity to remote control from the outside. This focus on accountability helped to mitigate the key weakness of a pure matrix organization — the drop-in engagement model — which doesn’t breed trust or enable the tight communication loops common to high-functioning teams.

As practice staff merged into the development streams of key projects, their unique positions allowed them to observe patterns across the organization that were hindering development and to identify opportunities to introduce improvements (see Table 2). The opportunities and solutions initially identified by each practice were captured in a “state of the practice” roadmap — a set of actionable initiatives determined by attempting to solve delivery issues with the project teams. These roadmaps were used to get concurrence and set expectations with management and peers and to hold practice leads accountable for improvements to their practice.

By building a track record of problem solving, bothproject-centered and enterprise-wide, the practice leadswere ultimately able to move services out of develop-ment teams and into the center. Key examples of thiswere the centralized automated build infrastructure,automated test frameworks, and, later, monitoring

Domain Domain Domain

ProjectProjectProjectProjectProjectProjectProjectProjectProject

• Hire practice resources for the projects

• Help projects establish practice approach

• Provide ongoing coaching, mentoring, training, and performance management

• Oversee practice activities, including quality review of artifacts

• Conduct troubleshooting and surge support

• Establish practice standards and best practices

ConfigurationManagement

DevelopmentInfrastructure

ReleaseImplementation

ApplicationSecurity

QualityAssurance

SystemsAnalysis

Development Services Development Projects

Figure 1 — The DSG engagement strategy.

Identify a common capability whose lack of consistency or maturity is an obstacle to goals.

Recruit a practice lead to take charge of this capability in a hybrid matrix structure.

Execute by staffing the capability for real project needs.

Build a roadmap of improvements, informed by a “state of the practice” assessment.

Harvest capability and back-fill to other portfolio projects.

Need

Lead

Breed

Read

Seed

Table 2 — The Lifecycle of a Practice

Page 22: Cutter - Universitetet i oslo · Automating the delivery process reduces the time development and operations have ... Journalbrings you frank, honest accounts of what works, what

©2011 Cutter Information LLC. CUTTER IT JOURNAL, August 2011

and automated deployment capability. This harvesting of solutions that could be matured and sustained, and their reintroduction into the rest of the enterprise portfolio, had additional benefits. It enabled flexibility, allowing the DSG to adjust staff to the ebb and flow of the organization's needs, and it formed the basis for standardization and intellectual property retention in a company that had a high ratio of transient contractors.

CAPABILITIES, SERVICE, AND VALUE

Three practices made up the initial DSG: SCM, quality assurance (testing), and systems analysis. While the quality assurance and systems analysis practices had a significant positive impact on development teams, their impact on operations was limited. The SCM practice, however, played a key role in the transformation of the development/operations juncture, both directly (by driving automation) and indirectly (by supporting additional areas that led to further operational transformations — release implementation, application security, and development infrastructure). Two factors contributed to the SCM practice's pivotal role in devops innovation:

1. A tool platform useful both to development and operations

2. The consolidation of interaction between development and operations

The SCM toolset provides integration opportunities for capabilities beyond code management and build. Some of these are development-focused (e.g., test automation, issue tracking, and code metrics), while many of them are also operations-focused (e.g., deployment and change management). Figure 2 highlights the pivotal role of the automated build platform.
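The numbered stages in Figure 2 can be read as a single pipeline that the build server drives end to end. The following sketch is a hypothetical Python illustration of that flow; the stage names follow the figure, but every function here is a placeholder stub, not the organization's actual tooling.

```python
# Hypothetical sketch of the automated-build hub in Figure 2: each
# numbered stage becomes a step the build server runs in order.
def check_out(workspace):            # 1. CHECK IN/OUT (SCM client/IDEs)
    return f"{workspace}/src"

def extract_and_build(src):          # 2. EXTRACT/BUILD (Ant/Maven-style)
    return f"{src}/app.jar"

def run_unit_tests(artifact):        # 3. UNIT TEST (xUnit plus coverage)
    return {"passed": 42, "failed": 0, "coverage": 0.78}

def smoke_test(artifact):            # 4. SMOKE TEST (UI-driving tools)
    return True

def label_build(artifact, number):   # 5. LABEL (tag sources/artifact)
    return f"{artifact}@build-{number}"

def security_scan(src):              # 6. SECURITY (static analysis)
    return []                        # list of findings

def deploy(labeled, target):         # 7. DEPLOY (deployment framework)
    return f"{labeled} -> {target}"

def run_pipeline(workspace, build_number, target="DEV"):
    """Drive the seven stages in order; stop and report on failure."""
    src = check_out(workspace)
    artifact = extract_and_build(src)
    tests = run_unit_tests(artifact)
    if tests["failed"]:
        return {"status": "failed", "stage": "unit-test"}
    if not smoke_test(artifact):
        return {"status": "failed", "stage": "smoke-test"}
    labeled = label_build(artifact, build_number)
    findings = security_scan(src)
    return {"status": "ok", "deployed": deploy(labeled, target),
            "coverage": tests["coverage"], "security_findings": findings}
```

The hub shape is the point: because every project passes through the same driver, each new capability (security scanning, smoke tests, deployment) needs to be wired in only once.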

In addition, SCM activity provides an opportunity to facilitate and focus the interaction between development and operations. Prior to centralization, the interaction between development teams and operations was strained and frequently erupted into conflict. The explosion of development activity overwhelmed the few operations staff assigned to coordinate environment changes. Left to fend for themselves, and without clear procedures or an understanding of operational concerns, development teams made matters worse. Because the DSG spoke both dialects — development and operations — it was able to step into the middle to relieve the tension. Operations appreciated having a single, disciplined point of contact to development teams, and the development teams appreciated having responsive, high-level services. This confluence of support created an opportunity for real operational change.

Figure 2 — Automated build as a hub for other development and operations support activities. [Diagram: Hudson, backed by StarTeam source control and a StarTeam data mart, drives a numbered sequence of automated stages across the DEV, INT, OC, and PROD environments: (1) check in/out (StarTeam/IDEs); (2) extract/build (Ant, XML, PackageMake, Maven); (3) unit test (JUnit, Clover, VSPerfMon); (4) smoke test (Selenium/QTP); (5) label (StarTeam SDK); (6) security (Fortify/AppScan); (7) deploy (BladeLogic).]


Get The Cutter Edge free: www.cutter.com. CUTTER IT JOURNAL, Vol. 24, No. 8

INNOVATION AT THE DEVELOPMENT/OPERATIONS BOUNDARY

With an automation mindset and the weight (priority and resources) of a number of business-critical development projects, the development organization, with its DSG, gained significant leverage to drive development-style change into operations. The following sections highlight some of the more significant impacts of the practices at the operations boundary.

Software Configuration Management

SCM provides the safe house for software created as the result of portfolio investment. Control over software configuration provides transparency through build and code-change metrics. A strong SCM capability enables rapid deployment of new projects and serves as a cornerstone for interaction at the operations boundary. Failure to perform sound SCM leads to fragmentation and loss of control of the corporate asset, as each team operates in a vacuum. In order to ensure development soundness, the DSG had to improve configuration management, but gaining control of SCM drove another important innovation: the creation of an automated build infrastructure.

Before the DSG took control of SCM, the configuration management function had ossified inside the outsourced operations group, and the configuration management engineer (CME) had become an administrator who shepherded the numerous tickets necessary to generate builds and deployments within the tightly controlled production environments. In the new embedded technical resource model of the DSG, the SCM practice worked inside the development team to build and deploy software, freeing the rest of the project team to focus on delivery. This "build as a service" offered by the practice lead encouraged teams to migrate their standalone efforts to the center in a short period of time and facilitated the development of an automated software build infrastructure (see Table 3). Within a year, the service had broad penetration within the portfolio. The standardized build structure and metadata enabled metrics generation from a data warehouse without imposing a burden on teams. Initially, the metrics platform provided tracking of the build from inception to staging; it was later extended to capture metadata from other automation efforts, such as unit test and test case code coverage. This gave the organization unprecedented transparency into the development activities of multiple projects.
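The metrics warehouse described above works because every build emits records in one standardized shape. A minimal sketch of that idea follows; the record fields and project names are entirely illustrative, not the organization's actual schema.

```python
from collections import defaultdict

# Hypothetical build-metadata records, as a standardized build
# structure might emit them. All field names are illustrative.
builds = [
    {"project": "alpha", "build_secs": 310, "kloc": 120, "coverage": 0.71},
    {"project": "alpha", "build_secs": 295, "kloc": 121, "coverage": 0.74},
    {"project": "beta",  "build_secs": 540, "kloc": 300, "coverage": 0.55},
]

def portfolio_metrics(records):
    """Aggregate per-project averages without burdening the teams:
    the data is a by-product of the standard build, not a report."""
    by_project = defaultdict(list)
    for r in records:
        by_project[r["project"]].append(r)
    return {
        p: {
            "builds": len(rs),
            "avg_build_secs": sum(r["build_secs"] for r in rs) / len(rs),
            "avg_coverage": sum(r["coverage"] for r in rs) / len(rs),
        }
        for p, rs in by_project.items()
    }
```

Because the records are uniform, adding a new metric (e.g., deploy time) is one field on the record, not a per-team integration effort.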

Centralizing SCM offered both qualitative and quantitative value (see Table 4). An initial investment of US $500,000 launched the capability, and other than the practice lead, all costs were borne by the projects being served. Taking into account only project setup, this practice yielded annual cost avoidance of over $1 million, not counting indirect savings to project teams related to service delays. In addition, consolidating this practice drove efficiencies in hardware acquisition and maintenance by avoiding project-specific build plants, and it captured staff efficiencies. By restructuring SCM, the CM full-time employee (FTE)–to-project ratio increased by a factor of five. Estimated using staffing ratios, the savings to the organization from 2008 onward are greater than $5 million per year.
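The setup-cost arithmetic behind the Table 4 figures can be checked directly. This is a back-of-the-envelope verification of the cited numbers, not the authors' worksheet.

```python
# Back-of-the-envelope check of the Table 4 project-setup figures.
setups_per_year = 30
cost_before = 1_500_000   # annual cost of instance setups, before
cost_after = 250_000      # after centralization

annual_avoidance = cost_before - cost_after   # "over $1 million" per year

effort_before_pm = 3.0    # person-months per setup, before
effort_after_pm = 0.5     # person-months per setup, after
effort_saved_pm = (effort_before_pm - effort_after_pm) * setups_per_year

print(annual_avoidance, effort_saved_pm)   # prints: 1250000 75.0
```

So setup alone avoids $1.25 million and roughly 75 person-months per year, consistent with the "over $1 million per year" claim in the text.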

Application Security

The flexibility and agility afforded by having a centralized, automation-focused SCM group that serviced all of development enabled the rapid addition of new capabilities, such as application security, at a cost that was substantially less than alternative options.

In response to increasing awareness of data security concerns, changing legislation related to the protection of personally identifiable information (PII), and lessons learned in the financial services industry, the operations organization proactively sought to mitigate the risk of security flaws in the substantial quantity of custom code that ran the business. External consultants

Table 3 — Characteristics of the SCM Practice

Labor:      2%-3% of development labor budget
Activities: Approximately 50 concurrent projects, 500-plus developers, 12 million LOC, 9,000 builds, 350 releases, several organization-wide initiatives
Artifacts:  CM plan, build manifest, metrics, standard automated build, and CM structure
Tools:      StarTeam, Hudson, JUnit, Ant, Clover, VSPerfMon, Maven
Metrics:    Build time, KLOC, test results/coverage, deploy time

Table 4 — SCM/Build Metrics

                                          Before           After
Effort per instance setup (30/yr)         3 person-months  0.5 person-months
Annual cost per instance setup            $1.5 million     $250,000
Projects per CM FTE                       2:1              9:1
SCM as percentage of app dev staff        10%              2%



proposed identifying and remediating security flaws with standalone solutions that were expensive and failed to provide a means of institutionalizing security defect prevention, because they relied on experts to both assess and perform the remediation.

Instead, the DSG partnered with the organization's Corporate Information Security Office (CISO) to establish a "bureau model" tied to the SCM platform. Automatic code scanning4 was integrated into the build system, and two FTE application security experts tailored the scanning to match the conditions in the code base and the company's security policy. Issues identified by scanning were screened by the experts and fed into the standard issue-tracking system. The CISO could then prioritize issues for project teams, and the security experts were available to help teams handle complex remediation.
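The bureau model's flow (scan, expert screening, then tickets in the standard tracker) can be sketched as follows. The finding fields, severities, and ticket shape are hypothetical illustrations, not the organization's actual scanner output or tracker schema.

```python
# Hypothetical sketch of the bureau model: raw scanner findings are
# screened by security experts, and only confirmed issues are filed
# into the standard issue tracker for the owning application team.
findings = [
    {"app": "billing", "rule": "sql-injection", "confirmed": True,  "severity": "high"},
    {"app": "billing", "rule": "weak-hash",     "confirmed": False, "severity": "low"},
    {"app": "portal",  "rule": "xss",           "confirmed": True,  "severity": "medium"},
]

def file_tickets(scan_findings):
    """Screen findings and turn confirmed ones into tracker tickets,
    ordered so the CISO can prioritize the worst issues first."""
    tickets = []
    for f in scan_findings:
        if not f["confirmed"]:          # expert screening drops false positives
            continue
        tickets.append({
            "summary": f"[security] {f['rule']} in {f['app']}",
            "priority": {"high": 1, "medium": 2, "low": 3}[f["severity"]],
            "assignee_team": f["app"],  # the team accountable for the fix
        })
    return sorted(tickets, key=lambda t: t["priority"])
```

Routing confirmed issues to the accountable team, rather than to a central remediation squad, is what lets security knowledge accumulate inside the development teams.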

The bureau model enabled the DSG to provide application security remediation for the 31 critical corporate applications at almost one-quarter of the cost of the solutions security consultants had proposed (see Table 5). In addition, because of the high degree of automation, DSG staff were able to partner with the CISO, giving them improved visibility and participation in the process and allowing them to fulfill their mission with little administrative effort. Finally, the centralized application security practice had a considerable impact on application development teams. Because these teams were accountable for fixing security defects detected as a result of the security scanning, their security knowledge increased naturally out of the desire to avoid problems. Thus, application security practice was institutionalized as a routine part of the development process at a sustainable cost.

Change Management Interface

Rapid organizational growth and the proliferation of development teams inexperienced with change management processes in the production environments also led to problems. In one example caused by broken change notification procedures, a scheduled software change would have inadvertently taken a 250-person unit offline. So much information flooded the system that it was nearly impossible to monitor everything. Each of the 1,800 change tickets per year for the upper environments generated 12-15 broadcast emails to coordinate dozens of development teams and six different infrastructure teams. Larger application teams had staff dedicated to watching the email stream for changes affecting their applications and servers, but less robust teams were often surprised.

As a service to the application teams, the SCM practice developed a portal that took a daily data feed from the production ticketing system and aggregated key information around upcoming releases, application contacts, and server inventory into an approval dashboard. By displaying only relevant tickets for each project contact, the dashboard saved time and avoided mistakes (see Table 6). This portal application helped solidify the DSG's position as intrinsic to the organization's success.
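The core of the portal is a simple join: intersect each change ticket's affected servers with each project's server inventory, and show a contact only the matches. A minimal sketch, with hypothetical ticket IDs, server names, and projects:

```python
# Hypothetical sketch of the change-ticket portal: a daily feed of
# all RFCs is reduced to only those relevant to each project contact.
tickets = [
    {"id": "RFC-101", "servers": {"web01", "web02"}, "release": "R4.2"},
    {"id": "RFC-102", "servers": {"db01"},           "release": "R4.2"},
    {"id": "RFC-103", "servers": {"web02", "app03"}, "release": "R4.3"},
]

project_inventory = {
    "storefront": {"web01", "web02"},   # servers this project owns
    "warehouse":  {"db01"},
}

def dashboard_view(project):
    """Show a project contact only the tickets touching their servers."""
    owned = project_inventory[project]
    return [t["id"] for t in tickets if t["servers"] & owned]
```

This is the filtering that replaces a human "watcher" reading every broadcast email: relevance is computed from inventory instead of inferred by staff.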

Release Implementation

Concurrent with the implementation of the build infrastructure and ticketing portal, a number of project teams were investing in automating their software deployments. As build automation increased the rate of software delivery, the services offered by the outsourced operation were unable to meet installation demand for volume, speed, and reliability. The DSG formed a nascent release implementation practice by pulling key release staff and practices from the application team that was most advanced in this regard. After consolidating release services for a number of development teams, this new practice began an initiative to introduce an automated deployment framework as a service offering. The outsourced operations release team responsible for deployments was hesitant to make any changes but was eventually recruited to the effort.

Within six months, the first proof of concept was in place using BMC's BladeLogic product, which was subsequently selected as the organization's platform for automated deployments. In addition to enabling concurrent code drops to multiple environments, the process consolidated all activity into one ticket and six release teams into one. A release manager is assigned to, and gains familiarity with, a significant portion of

Table 5 — Application Security

                    Standalone    Integrated
Total annual cost   $1.9 million  $540,000
FTEs                9             2

Table 6 — Requests for Change (RFCs) Metrics

               Before Portal  After Portal
Emails         34,000         3,800
Effort hours   1,750          400



the application portfolio, thus eliminating development team labor effort. Because the operations team was able to reduce its staff from 12 to eight positions, an initial investment of approximately $350,000 and an annual support cost of $300,000 are more than covered by cost savings (see Table 7). This infrastructure and team now support over 1,500 deployments per year without a staff increase — a feat that could not have been achieved without automation.
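The payback on the deployment investment can be verified from the Table 7 figures. This is illustrative arithmetic only, assuming the quoted annual costs already reflect the staffing change:

```python
# Rough payback check for the automated-deployment investment,
# using the Table 7 annual-cost figures (illustrative only).
initial_investment = 350_000
annual_support = 300_000
annual_cost_before = 1_700_000   # manual release effort
annual_cost_after = 85_000       # automated release effort

annual_savings = annual_cost_before - (annual_cost_after + annual_support)
payback_years = initial_investment / annual_savings
print(round(annual_savings), round(payback_years, 2))   # prints: 1315000 0.27
```

On these numbers the investment pays for itself in roughly a quarter of a year, consistent with the text's claim that savings more than cover the cost.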

RESOURCE FLEXIBILITY: ABSORBING A MERGER

The consistency and level of automation implemented by the DSG were tested when one of the company's business units acquired and merged with a similar business unit from another company, one with related but distinct business processes supported by a different application portfolio. During the two-year initiative to merge more than 80 applications into the existing portfolio, the development organization recruited an additional 125 FTEs and executed an additional 30 projects concurrently.

The practice structure distributed the hiring load and shortened the on-boarding time of new staff as a result of the standardization and automation of all back-end activity. Most critically, the operations boundary had previously been a massive time sink for new teams, but with the DSG practices established and automated, such teams could focus on business reconciliation and wasted no time on operational tasks. Just as importantly, the organization easily scaled back staff in response to the reduced post-merger workload without loss of productivity or knowledge.

TAKEAWAYS

Given the nature of the IT organization in question prior to 2005, a devops approach driven from within the application development organization was the only viable path to achieving the stability, predictability, cost savings, and productivity gains realized in this case study. The conventional organization structure of separate "apps" and "ops" hindered operations from recognizing and acting on their responsibility to adopt emergent capabilities to better support development. The concept of such hard-and-fast boundaries is inherently broken and counterproductive to IT's mission to support the business.

At the time, operations was limited by its contract with a large outsourcer that was marching to a cadence from a different era and had repeatedly demonstrated outright hostility to change. The in-house infrastructure and operations organization wanted to help but had its hands tied by its limited staff, small budget, and oversight role toward the outsourcer. In addition, this group had a compliance burden that discouraged innovation at small scales — any solution they implemented had to scale instantly to the entire enterprise or risk audit findings. Furthermore, because they were not accountable for the success of (and were at arm's length from) application delivery, they didn't experience the consequences of high-latency, low-consistency legacy operational procedures. Therefore, they had no compelling reason to adjust their own or the outsourcer's practices.

By contrast, the development organization was less encumbered by the status quo, was directly accountable for software delivery, and was so severely impacted by operations and infrastructure latencies that it was in its interest to fix the bottlenecks. The development group also had the labor budget and flexibility to hire hands-on development engineers to solve problems. As a trump card, the development group could leverage the political weight of the business units it supported to apply additional pressure to force change. Given the corporate aversion to risk, the DSG could justify a centralized approach on the basis of providing consistency and predictability to development. Fortuitously, the consolidated development practices provided a focal point to connect the application delivery organization with operations. The DSG's emphasis on automation and scalability meshed well with operations for capabilities that were natural extensions of the development support activities.

When the organization's journey began, development leadership was convinced that a centralized approach

Table 7 — Deployments: 2,000 per Year

                        Before        After
Effort per application  1 day         20 min
Annual cost             $1.7 million  $85,000
Release team size       12            8

The conventional organization structure of separate "apps" and "ops" hindered operations from recognizing and acting on their responsibility to adopt emergent capabilities to better support development.



was the right path toward large-scale, reliable delivery and was allowed the time necessary to quantify the benefits. The DSG promised consistency and transparency, consolidated points of control, risk reduction, and economies of scale, and it was able to attain those goals in a relatively short time. We cannot overstate the importance of the hybrid matrix organization model to maintaining a primary focus on delivering for projects while at the same time improving practice across the whole organization and putting downward pressure on the tendency of both project teams and centralized services groups to bloat their ranks.

Quantitatively, the DSG realized significant results. The group has:

• Enabled increased throughput without increased costs
• Maintained low startup costs for new and evolving capabilities
• Decreased latency and administrative burden
• Reduced error rates
• Realized both license and infrastructure savings
• Gained labor efficiencies by sharing resources across projects and replacing humans with automation

The DSG also realized qualitative results by:

• Sharpening hiring skills
• Standardizing procedures
• Maintaining flexibility in scaling up and down as needed with minimal loss of discipline
• Increasing operational stability
• Providing transparency, which frequently mitigated risks and averted costly failures

While the economies of scale offered by the DSG's relentless focus on centralization, project results, and automation are considerable, perhaps the greatest value to the organization of automation at the engineering and operational boundary is increased development productivity. A leading outside consultant benchmarked5 a number of the organization's projects in 2008 and again in 2011. The results show that these methods contribute to a substantial difference in performance (see Table 8). In 2008, the benchmarks showed projects hovering around industry norms for size, cost, and quality. The 2011 numbers contrast markedly, showing factor-of-two improvements in cost and effort and a five-fold improvement in quality.

Figure 3 — Some of the inefficient and inadequate operations services improved by the DSG and the value realized. [Diagram: a cycle of innovation and improvement centered on software lifecycle management (build, continuous integration, test), pairing each inadequate operations service with the improved DSG service and the value realized:

• Software tools ticket requests replaced by software lifecycle management (build, continuous integration, test): project start from 3 person-months to 0.5 person-months; $6 million/year cost avoidance; CM FTEs from 10% to 2% of app staff; builds from hundreds to 25,000.
• Change management broadcast ticketing replaced by a personalized portal: emails from 34,000 to 1,800; effort hours/year from 1,750 to 400; consistency from ad hoc to managed.
• Application security policy replaced by an automated service bureau for the CISO: apps complying from 0 to 31; annual cost $540,000 vs. $1.9 million; dedicated FTEs from 9 to 2.
• Manual release engineering replaced by automated release implementation: deploys/year from under 100 to 1,500; annual cost from $1.7 million to $85,000; release FTEs from 12 to 8.
• App monitoring via system logs and ad hoc requests replaced by a full monitoring stack and service: from 0 to 20+ apps monitored; teams don't staff "watchers"; automatic vs. human notification.]

Table 8 — Fraction of Industry Norms

Project cost  55%
Schedule      50%
Effort        65%
QA defects    20%



The case for investing in devops in 2005, when this development organization formed its DSG, was based on development dogma and speculation. Since then, this devops group has collected a number of concrete measurements of its incremental value (see Figure 3). The greatest value by far is its contribution to a doubling of productivity for this company, which spent many tens of millions of dollars per year on new development during this time. We hope that this case study will provide support to others seeking to justify adopting devops practices and that, in short order, devops will become the norm within the industry.

ENDNOTES

1. Fitzpatrick, Lawrence. "How Craftsmanship Survives Explosive Growth." Cutter IT Journal, Vol. 23, No. 4, 2010.

2. During the industrial era, an army of workers wielding oilcans constantly tended to heavy machinery in order to forestall breakdown. One can draw a parallel to operations and maintenance personnel performing manual break/fix production work.

3. This model is similar to the military approach of assembling unique capabilities from various units to form teams for specific missions.

4. A number of vendors provide tools for scanning source code for security flaws.

5. Mah, Michael. "IT Organization, Benchmark Thyself." IT Metrics Strategies, Vol. 6, No. 3, 2002, pp. 1-6.

Lawrence Fitzpatrick has a record of software delivery and transformational software management that spans academia and government, commercial software product development, and corporate IT over 30 years. Currently, Mr. Fitzpatrick is President of Computech, Inc., a leading application development and consulting firm headquartered in Bethesda, Maryland, USA. Until mid-2009, Mr. Fitzpatrick served as Senior VP of the Financial Industry Regulatory Authority (FINRA). During a time of significant change in the financial sector, he grew and ran an application development team of over 500 individuals and was accountable for the delivery of numerous key systems. Prior to joining FINRA in 2002, Mr. Fitzpatrick led delivery of commercial software products as VP at Open Text Software (OTEX) and as CTO at Personal Library Software. Mr. Fitzpatrick has degrees from the University of Virginia and Georgetown University. He can be reached at [email protected].

Michael Dillon has led government and corporate IT and application delivery over the past 20 years. For the last five-plus years, Mr. Dillon has served as Associate VP in charge of creating, growing, and managing the development services group for a leading financial services regulatory firm. Prior to his present position, Mr. Dillon built and ran multidisciplinary matrix teams that supported application development needs in the government services arena and led a number of quality functions within government and the airline industry. Mr. Dillon can be reached at [email protected].


Computer operations used to be a lot simpler. I recall my first several IT programming jobs, where the computer room was only a set of stairs away. You could actually watch your test job running and even mount the data tapes for it yourself. A production problem could be handled in person (even if you had to get out of bed and drive to the office first) and was usually diagnosed quickly from a system return code or through a printed core dump to find the offending instructions.

THE GROWTH OF IT COMPLEXITY

Today, that's all changed. You may be modifying a computer application that was developed on one platform in India, was implemented by a California release team, now runs in a complex distributed global network of servers, and is monitored from Argentina. You probably have little idea about the physical runtime environments for the executable, but you still need to ensure that your business customer's service levels for availability are maintained.

There are many more challenges now for developers, starting with the cold reality that you probably aren't allowed access to your data center or have never even seen it. The silos and relationships between developers and operations have never been more separate and distant. Cloud computing further abstracts the connection points between infrastructure and applications by adding more virtualization.

These may not represent the biggest problems, however. Solutions often are not designed for the multi-tiered architectures, complex technologies, and platforms in the runtime environment. They are developed without input from infrastructure and operations teams and as a result often run inefficiently and are difficult to maintain. There are few cross-domain experts who are familiar with both development and operations, and fewer still who can also fairly represent the business. Development and operations teams also have conflicting priorities — developers are evaluated on their ability to deliver more enhancements, more releases, and more application components, while operations staff are driven to reduce and eliminate outages.1 These are among the many reasons why the devops movement is timely and relevant, since it aims to break down the barriers that stop organizations from effectively delivering working software to the business at the pace of the business.2

Historically, many major outages and implementation failures have resulted from a lack of communication between applications and infrastructure groups that don't participate effectively within a complete service management framework. Other examples of integration failures include:3

• Communication gaps between delivery groups

• Critical errors that are not resolved in a timely manner or are even ignored

• End-to-end change management that is not effective or is nonexistent

• Poor transition from development to production

• Apps and infrastructure engineering that don't sync, don't collaborate, and don't learn from one another

• Infrastructure that does not always have a strong project management culture, being more reactive than proactive

• No central governance for strategy, architecture, and standards

Best practice reference models provide discipline to development teams (e.g., CMMI for Development [CMMI-DEV]) and to service management/operations (e.g., ITIL V3), but no one model crosses the chasm. CMMI-DEV is not descriptive enough of the operational environment, and ITIL V3 does not really attempt to address application development.


Next-Generation Process Integration: CMMI and ITIL Do Devops
by Bill Phifer

IT TAKES TWO TO TANGO



CMMI AND ITIL PLAY TOGETHER FOR DEVOPS

Devops is concerned with addressing this challenge through better collaboration between development and operations, but how? To begin, we can take a longer look at best practice models such as CMMI-DEV and ITIL V3.

The CMMI for Development4 provides support for the software development lifecycle (SDLC) through disciplines that include requirements development, estimating and project planning, product quality, configuration management, and other processes such as measurement and training. However, the SDLC and best practices in this case end at product integration and validation testing within CMMI. This model is strong in systems and software engineering, but weak in transition.

This is essentially where ITIL V3 picks up, sufficiently describing service design and transition into runtime and the subsequent operations of the completed application. ITIL was written as a means of enabling and managing services that add value to the customer's business by providing a common, proven approach. The complementary aspects of these two models are evident, but do they dance together? I would answer "yes." Each has strengths, but also weaknesses, as indicated in Figure 1. But taken together, they make each other more effective.

Looking at CMMI and ITIL V3 processes and disciplines together allows us to exploit the strengths of each model, but the synergies are most effective when their best practices are fully integrated. What this means is that applications must be developed not only to support intended business processes, but also with full consideration of what's needed to effectively and efficiently sustain the application over its life. To do that, IT operations clearly must be engaged much earlier in development, and there must be a system-wide culture change, giving development and operations more visibility into each other's activities. This is not just a tooling or communication question, but an attitude adjustment. This is one view of the devops movement.

The possibilities for process integration between development and operations are most pronounced within the engineering process areas of the CMMI for Development.5 These five areas — Requirements Development, Technical Solution, Verification, Validation, and Product Integration — provide a framework for applying and integrating service management processes from ITIL V3 for the greatest benefit. Figure 2 describes a straightforward SDLC based on CMMI engineering process areas and indicates relevant integration points for applicable ITIL V3 processes.

IT organizations must effectively perform this process integration in order to ensure that applications are designed and built to run in the complex IT environment demanded by global and Internet business.

DEVOPS BEGINS WITH REQUIREMENTS DETERMINATION

The devops movement has clearly stressed a need for collaboration and integration6 — specifically, that development and operations should design, develop, build, and deploy software together.7 This may be easy to say, but it is more difficult to accomplish. What must change first is that there must be new and detailed bilateral communication between development and operations during the requirements determination stage — not as a final activity within deployment. And let's add the business unit itself, since in addition to business requirements, IT must elicit and negotiate the following among the business, application development, and

Figure 1 — CMMI and ITIL scope targets and strengths. [Diagram: using the strengths of CMMI and ITIL wisely results in devops balance across three phases:

• Requirements: in addition to business and technical design requirements (using CMMI), also elicit operational requirements using ITIL V3 processes and principles.
• Development: use engineering best practices from CMMI for Development V1.3 (which now includes agile support) for technical design, testing, and product integration.
• Transition and Operations: follow ITIL V3 best practices to ensure successful transition to operations and ongoing operations management of the service.]

Page 30: Cutter - Universitetet i oslo · Automating the delivery process reduces the time development and operations have ... Journalbrings you frank, honest accounts of what works, what

©2011 Cutter Information LLCCUTTER IT JOURNAL August 201130

infrastructure operations, using several ITIL V3processes8 to drive the discussion:

Availability Management. Understand and plan for the resources needed to ensure the application performs at levels specified or needed by the business. What hours and days must the service supported by the application be available? What resources and components (experts, knowledge, networks, environments, service desks, etc.) are required to maintain and sustain it? An availability assessment with operations may be required to understand what is needed, but the requirements must clearly capture these needs.

Capacity Management. As is the case with availability, there are three levels of requirements — for business, service, and component capacity. The business capacity is largely determined by the current and targeted market and drives the required service capacity needed by IT to support the business. At the application level, as part of requirements gathering, IT must understand the capacity needed to deploy and maintain the service, including component capacity. How many users? How many database records? What network bandwidth? How many virtual or physical servers? In short, the requirements must specify the existing infrastructure capacity necessary to satisfactorily support the application.

Service-Level Management. Service-level requirements in support of business objectives should also be included in the requirements gathering and identified as constraints on the design. New applications and major changes can easily impact system response times, system availability windows, and even service desk response and resolution if training and documentation requirements are not established up front. This is the time to work with operations to ensure that IT component capacity is sufficient to achieve agreed-upon service levels, or these may have to be renegotiated.
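A service-level target becomes easier to negotiate with operations once an availability percentage is translated into a concrete downtime budget. A minimal sketch (the targets and periods below are illustrative, not figures from the article):

```python
def downtime_budget(availability_pct, period_hours=24 * 365):
    """Maximum allowed downtime (in hours) over a period for a
    given availability target, e.g. 99.9 for 'three nines'."""
    return period_hours * (1 - availability_pct / 100)

# A 99.9% yearly target allows roughly 8.76 hours of downtime;
# the same target over a 30-day month allows roughly 43 minutes.
yearly = downtime_budget(99.9)
monthly = downtime_budget(99.9, period_hours=24 * 30)
```

Framing service levels this way keeps renegotiation concrete: tightening the target by one "nine" cuts the downtime budget tenfold.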

IT Security Management. Security has to be built in; responding to attacks, unauthorized accesses, and theft of data and intellectual property after they occur is expensive and embarrassing. IT must define security requirements and preventive measures to resist vulnerabilities and threats, as well as meet security and privacy compliance regulations such as HIPAA and PCI DSS — all prior to design and build.

IT Service Continuity Management. We all understand that today's business applications run in a complex and widely distributed but interconnected and largely virtual environment. As seen recently with some highly publicized public cloud outages, applications are perpetually exposed to risks of service interruption; in the global enterprise, these types of problems are unacceptable. In conjunction with understanding availability needs, application developers and operations personnel also need to understand critical points of possible failure and set requirements to design appropriate mitigations, such as warning trigger points, automated backups of critical data, and built-in redundancies in the architecture and application when appropriate.

Figure 2 — CMMI engineering lifecycle with key ITIL interactions. ("Devops requires service management attention early and often during the entire product lifecycle.") The figure maps ITIL processes onto four stages:

Requirements Development: determine requirements for availability, capacity (service and components), service levels, and service continuity.

Technical Solution: apply Security Management, IT Asset & Configuration Management, and Change Management.

Product Integration: apply Release & Deployment Management and IT Asset & Configuration Management.

IT Operations: ensure service operations using Incident Management, Problem Management, Availability Management, Capacity Management, Service Continuity Management, and Security Management.

Get The Cutter Edge free: www.cutter.com. Vol. 24, No. 8, Cutter IT Journal.

DEVOPS CONTINUES WITH DESIGN, BUILD, AND TEST

Still following the CMMI for Development lifecycle, the remaining engineering process areas (Requirements Management, Technical Solution, Product Integration, Verification, and Validation)9 also require devops-style collaboration among the business, application developers, and operations, as follows:

Change Management. Few complete sets of requirements or design solutions survive the SDLC intact. Changes happen — repeatedly — and this is only one factor spawning the agile movement. Application design solutions must undergo change review by infrastructure engineering and operations groups to ensure that they do not adversely impact current operations and that the appropriate runtime infrastructure is available with the necessary capacity to support the change and achieve required service levels. The implementation should also be included in what ITIL calls a "Forward Schedule of Change" to ensure coordination with other planned infrastructure change and maintenance activities.
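The Forward Schedule of Change coordination described above lends itself to a simple automated check. A minimal sketch, with an invented schedule and change names:

```python
from datetime import datetime

# Hypothetical forward schedule of change: windows already booked by ops.
SCHEDULE = [
    ("network maintenance", datetime(2011, 8, 12, 22), datetime(2011, 8, 13, 2)),
    ("database patch", datetime(2011, 8, 13, 1), datetime(2011, 8, 13, 3)),
]

def conflicts(start, end, schedule=SCHEDULE):
    """Return the names of booked changes whose windows overlap
    the proposed [start, end) deployment window."""
    return [name for name, s, e in schedule if start < e and end > s]

# A release proposed for 23:00-01:00 collides with the maintenance window.
clash = conflicts(datetime(2011, 8, 12, 23), datetime(2011, 8, 13, 1))
# clash == ["network maintenance"]
```

Even this trivial overlap test surfaces the coordination conversation before release night rather than during it.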

Capacity Management and Availability Management. As the design evolves, additional assessments may be required for capacity and availability through modeling, simulations, prototypes, and design walkthroughs to ensure that the target infrastructure can sustain the application in operational steady state. The closer the integrated system test environment can mimic the live environment, the better. It is unrealistic to expect that system performance with large volumes of data will be the same for a global distributed application as it is in a small clustered data center or environment.

Configuration Management. ITIL V3 takes this process well beyond the basic change and version control discipline expected by CMMI, which focuses primarily on applications. Here Configuration Management defines the relationships among all the components involved in providing each service the business consumes. For this, applications staff must provide missing data and ensure that it is maintained.

Release and Deployment Management. Operations staff hate failed implementations and having to back out and reschedule changes. They are also unhappy when they cannot effectively support newly deployed applications due to a lack of runtime instructions and shared knowledge. The solution to this problem is early and frequent involvement of operations staff in the planning stages of major new releases.10 Release and Deployment Management takes a system view well beyond what CMMI typically expects, suggesting that release documents and objects be provided and traceable back to corresponding change records.

DEVOPS DEPENDS ON COLLABORATION IN SERVICE OPERATIONS

The purpose of IT operations is to coordinate and carry out the activities and processes required to deliver and manage IT-based services at agreed levels for the business.11 The applications staff clearly has a role to play in this to ensure that IT helps the business achieve its objectives. In this phase of the lifecycle, devops principles are not so much concerned with how the inherent ITIL V3 service operations processes are implemented as with the extent to which developers are involved.

In several processes in this phase, developers can provide operations with additional knowledge and understanding that is not currently offered or required. For example, the following information should be included as part of changes and releases:12

Application components with their interdependencies

Directories to locate and store binary files and application-related files

Scripts to run as part of installation, or to start or stop the application

Scripts to check hardware and software configuration of target systems before deployment or installation

Operational and support requirements

SLA targets and requirements related to the application


Specification of metrics and events that can be retrieved from the application and that indicate the performance status of the application

Specification of the details required to track an application's major transactions
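Several of the items above — required directories, scripts to verify the target system's configuration — can be rolled into a pre-deployment check that operations can run. A sketch only; the paths, tool names, and free-space threshold are placeholders, not values from the article:

```python
import shutil
from pathlib import Path

# Hypothetical defaults for one application's target environment.
REQUIRED_DIRS = [Path("/opt/app/releases"), Path("/var/log/app")]
REQUIRED_TOOLS = ["java", "rsync"]
MIN_FREE_BYTES = 500 * 1024 * 1024  # room for the new release

def preflight(dirs=REQUIRED_DIRS, tools=REQUIRED_TOOLS, min_free=MIN_FREE_BYTES):
    """Collect human-readable reasons the target system is not ready."""
    problems = [f"missing directory: {d}" for d in dirs if not d.is_dir()]
    problems += [f"missing tool: {t}" for t in tools if shutil.which(t) is None]
    if shutil.disk_usage("/").free < min_free:
        problems.append("insufficient free disk space for the release")
    return problems

# An empty list means the target passed every check.
issues = preflight(dirs=[Path(".")], tools=[], min_free=0)
```

Running such a script in every environment, long before release night, is one small way the shared knowledge this section calls for becomes executable rather than tribal.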

Event Management could benefit from having applications personnel actively monitoring normal operations and detecting and escalating exception conditions. Further, these individuals can help with the development and installation of monitoring agents and sensors to trigger response. Similarly, Incident Management and Problem Management require application developer involvement to quickly restore service and also to assist in analyzing the root cause of problems, so as to resolve them permanently and prevent future problems. Other ITIL processes already mentioned, such as Configuration Management and Change Management, are fully enhanced when developers are completely engaged and the devops principles are supported.

KNOWLEDGE MANAGEMENT DRIVES DEVOPS

Knowledge management is the process responsible for gathering, analyzing, storing, and especially sharing knowledge and information within the organization.13

It is a tenet of devops and requires an open culture for sharing knowledge and the infrastructure to support it. The implication is that processes supporting devops must include activities directing interaction between application developers and operations, so they can work together to effectively support the business. This also requires identifying knowledge gaps and improving processes to include necessary knowledge sharing, critical cross-silo reviews, and training.

CMMI FOR SERVICES VS. ITIL V3

Many process experts may be looking to the SEI's CMMI for Services (CMMI-SVC), V1.314 as a single guide to combining the strengths of CMMI with those of ITIL in a way that also supports devops principles. While several of the ITIL V3 processes are represented within this model, it is my opinion that CMMI-SVC currently does not provide sufficient guidance to address the devops issues discussed here. However, since ITIL V3 has only limited support for the software development lifecycle and CMMI-DEV does not address service management, CMMI-SVC may offer at least a limited model that could be useful for appraisal and evaluation. The approach I recommend is to use guidance from CMMI-DEV for application development and ITIL V3 for service management and service operations, and to consider CMMI-SVC as a reference model for evaluation.

EFFECTIVE CMMI-ITIL INTEGRATION YIELDS BUSINESS-IT ALIGNMENT

While developing applications following the CMMI model may often be helpful in managing development environments and projects from a software- or systems engineering-centric view, it is far more effective when complemented by an IT management model such as ITIL. However, the simple act of mapping the CMMI and ITIL functions to each other provides only a conceptual view and doesn't get to what is needed at the execution level. After all, an IT organization could be rated at CMMI Level 5 and have practices fully aligned with ITIL, yet still have operational problems that cause it to fall far short of achieving the required value for the business.

DEVOPS-RELATED QUESTIONS FOR DEVELOPERS

Has operations been included in your change advisory board meetings, and is your application change included in the Forward Schedule of Change in order to coordinate with operations maintenance and other change activities?

Have you determined the required availability of your application to the business and ensured the availability of resources (such as databases and licenses, as well as operations staff with appropriate knowledge) to support it?

What capacity requirements (storage, processor, network, database size, etc.) do you require from infrastructure engineering, and have you used a simulation model or other approach to confirm them?

What possible impact will your developed application (or major modification) have on negotiated service levels? Will these need to be modified?

Have you considered security and privacy vulnerabilities? Are there credit card numbers (PCI DSS) or patient records (HIPAA) involved? Will offsite backups of this data need to be encrypted?

Has operations reviewed your release and deployment plans?


The result of effective integration between CMMI and ITIL as applied to the devops challenge is a framework that ensures IT is aligned with the needs of the business. This framework is supported in application development through applications that are built on CMMI engineering principles but are integrated with ITIL V3 service management processes. The latter are applied to ensure that the applications can be effectively deployed and managed in the targeted operational environment and be sustained with ITIL Service Operations disciplines. The outcome is a service lifecycle that leverages the individual strengths of each model — CMMI for systems and software engineering and ITIL for clear business alignment, effective transition, and sustained operation of the IT environment. An integrated lifecycle and associated collaboration provide for the effective sharing (both push and pull) of information between application development and operations, allowing both groups to approach IT complexity and rapid change together in a consistent manner.15

FINAL THOUGHTS AND ACTIONS

As I have described, the key elements of devops to drive collaboration between development and operations functions can be satisfied through implementation of additional process elements as part of the requirements gathering and design/build phases of the SDLC. Success also requires effective two-way communication between these groups and the business. IT organizations would be well advised to train both developers and operations staff in ITIL V3 processes and redesign the SDLC to ensure that runtime environment factors and operational considerations are addressed early (and often) in the lifecycle, beginning with requirements. In addition, successful organizations will build collaboration networks and consider the use of matrix teams of developers and operations staff, especially in release and deployment planning.

ENDNOTES

1. Logue, Clyde. "Bridging the DevOps Gap." CM Crossroads Webcast Series, 29 July 2010 (www.youtube.com/watch?v=ZtewOtFKF4w).

2. West, Dave. "It's Time to Take Agile to the Next Level." Forrester Research, 25 March 2011.

3. ITIL Service Transition. ITIL V3. Office of Government Commerce (OGC), 2007. (ITIL published a "Summary of Updates" in August 2011; see www.itil-officialsite.com.)

4. Chrissis, Mary Beth, Mike Konrad, and Sandra Shrum. CMMI for Development: Guidelines for Process Integration and Product Improvement. 3rd edition. Addison-Wesley Professional, 2011.

5. Chrissis et al. See 4.

6. "DevOps." Wikipedia (http://en.wikipedia.org/wiki/DevOps).

7. Portelli, Bill. "DevOps 101." CM Journal, Vol. 9, No. 2, February 2011.

8. ITIL Service Design. ITIL V3. Office of Government Commerce (OGC), 2007. (ITIL published a "Summary of Updates" in August 2011; see www.itil-officialsite.com.)

9. Chrissis et al. See 4.

10. ITIL Service Transition. See 3.

11. ITIL Service Operation. ITIL V3. Office of Government Commerce (OGC), 2007. (ITIL published a "Summary of Updates" in August 2011; see www.itil-officialsite.com.)

12. ITIL Service Operation. See 11.

13. ITIL Service Transition. See 3.

14. Forrester, Eileen, Brandon Buteau, and Sandra Shrum. CMMI for Services. Addison-Wesley Professional, 2010.

15. West. See 2.

Bill Phifer is a Fellow at HP Enterprise Services, with responsibility for service assurance and service management strategies for Applications Management Services. He is an SEI-certified CMMI Lead Appraiser with over 35 years in IT, including 18 years in software process implementation and improvement, measurement, service management, and project management. Mr. Phifer is also a Lead Evaluator for Carnegie Mellon University's eSourcing Capability Model for Service Providers (eSCM-SP), with an interest in sourcing best practices. He is a regular presenter at IT industry conferences and seminars such as SEI's SEPG, itSMF's USA Fusion, and those by IEEE.

Mr. Phifer is currently involved in research and architecture of end-to-end IT lifecycle process integration between applications and infrastructure using multiple industry reference models and standards, such as CMMI, ITIL/ISO 20000, eSCM, ISO 9000, ISO 27001, and COBIT. He is a member of the Philadelphia Software Process Improvement Network and the itSMF Lehigh Valley Local Interest Group, and a former president and current board member of the Delaware Valley Science Council. Mr. Phifer can be reached at [email protected].


Devops: So You Say You Want a Revolution?
by Dominica DeGrandis

OPTIMIZE THE WHOLE

Coined in Belgium during a 2009 data center migration, the term "devops" sprang from an attempt to apply agile techniques to operations ("ops") activities. With ops on the receiving end of faster and more frequent deliveries from development teams, the pressure to keep up necessitated faster and more efficient practices. And now, with a cloud of thousands of servers to manage, the inefficiencies of the past would clearly no longer suffice.

The realization that the world of building, deploying, and maintaining environments could benefit significantly from a new approach caught on rapidly. The devops momentum accelerated when professionals working in the bowels of software delivery found community support.

The "revolution in the making" is a shift from a focus on separate departments working independently to an organization-wide collaboration — a "systems thinking" approach. It's about addressing all the work as a whole, versus looking only at the bits and pieces. It's about work flowing across functions versus lurking in silos.

As part of a systems thinking approach, devops is about respect, cooperation, and trust among the individuals who do the work versus management-driven control. A systems thinking approach requires a change in behavior from management. In this article, I will explain why this change is valuable and describe the kind of leadership required to make the shift and follow the revolution. I will also discuss how using statistical process control as a mechanism for better decision making can help devops teams drive out variability in their processes and improve customer satisfaction.

THE PROBLEM: LITTLE-PICTURE THINKING

When an individual team within a company focuses only on optimizing its own area, team members are often oblivious to what's needed for the company as a whole to be successful. Each team member doing a terrific job in his or her own cubicle doesn't cut it when their work has to be handled, or mishandled, by someone else before it moves on to its ultimate use.

Consider a not-so-hypothetical example. A developer (let's call him Steve) proclaims his work is done when his code is checked into source control. In this instance, however, it turns out that his code, which worked just fine in the development and test environment, doesn't work in the staging environment.

Invariably, the problem wasn't discovered until the day before a release, and Steve (who has already begun work on another project) looks up to find the project manager, plus one or two others, gathered around his desk. Steve's neighbors become all ears as the scene plays out something like this: "The build isn't working in staging, Steve, and the error log indicates a missing file." To which Steve replies, "It works in dev. Doesn't staging have the latest version of …?"

The huddle outside Steve's cubicle grows until it's determined that changes must be made to allow for backward compatibility and that another build must be deployed to staging. QA and ops will have to work late (again) to test the new build and stage it for production before the release tomorrow. This is a problem, and an even bigger problem, long term, is that this kind of waste spawns disgust from professionals looking to take pride in their work. Ops is disgusted with the development team for the lack of backward compatibility in the code. The development team is disgusted with ops for the lack of consistency between the dev, test, and staging environments. QA is disgusted with both developers and ops for not considering these problems to begin with. And the project manager is disgusted, claiming developers and ops don't talk to each other and that "herding cats" or "babysitting" might be better terms for describing his or her role. This all eats away at team morale.

THE SOLUTION: SYSTEMS THINKING

Imagine a company very much like Steve's — we'll call it "WeNeedHelp.com" — and then further imagine that its heads of software engineering and IT operations agree to try a different approach, pursuing incremental evolutionary change (driven by customer demand) across their departments. Analyzing customer demand across functional boundaries would enable teams to step back and objectively look at their productivity from the "outside in." They would fully describe where their work originates, the types of work, the arrival rate, and the expectations from upstream and downstream customers.

Problems that are hurting the business, but were previously ignored, would surface. If WeNeedHelp.com customers have been complaining that sustainment changes (noncritical changes to production) aren't being delivered, a demand analysis would make visible the fact that sustainment work is not getting done and that WeNeedHelp.com has no adequate "service delivery" capability for sustainment work. Perhaps it's due to a perceived higher value on new work from another business unit. Or perhaps there is no real avenue for sustainment or maintenance fixes to be worked (other than employees burning the midnight oil). Shining a light on the services that teams provide to each other and the blockages preventing them from doing so would create an incentive to use a service delivery approach to improving productivity and customer satisfaction.

Looking at sources of internal team dissatisfaction, WeNeedHelp.com might discover that ops is being supplied with confusing and inadequate information that makes it hard for them to do their job. And it may find that adopting explicit policies that define sufficient handoff information between teams solves that problem.

Considering demand would reveal variability that randomizes the process and prevents work from being delivered on time. Perhaps the codeline branching strategy used at WeNeedHelp.com has crippled the ability to merge certain changes until other changes are also ready for delivery. Using a systems thinking approach, it would become acceptable to experiment with new codeline strategies (as long as they were survivable).

That would be just the start. The team would then gather data to find ways to understand the capability of the system and how it operates. This data would expose risks, bottlenecks (or not), and economic overheads. It would set expectations for objective, data-driven management. As WeNeedHelp.com teams watch metrics trend, they could make well-informed, high-quality decisions.
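The statistical process control mentioned in the introduction gives one concrete way to watch a metric trend: an individuals (XmR) chart flags points that fall outside computed control limits. A minimal sketch with invented deployment cycle-time data:

```python
def xmr_limits(values):
    """Control limits for an individuals (XmR) chart: the centre line is
    the mean, and the limits sit 2.66 mean moving ranges either side."""
    mean = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return mean - 2.66 * mr_bar, mean, mean + 2.66 * mr_bar

# Invented cycle times (days from code-complete to production).
cycle_times = [4, 5, 3, 6, 4, 5, 20, 4, 5, 3]
lcl, centre, ucl = xmr_limits(cycle_times)

# Points outside the limits signal special-cause variation worth a
# root-cause look; here the 20-day deployment is flagged.
outliers = [x for x in cycle_times if not lcl <= x <= ucl]
# outliers == [20]
```

Variation inside the limits is common-cause: the signal to improve the system itself rather than chase individual data points, which is exactly the systems thinking argument being made here.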

Maybe the rate at which developers are fixing bugs is trending faster than the rate at which ops can deliver the fixes to production. It may at first appear that ops is the bottleneck, when in reality the problem lies with a tightly coupled, complex system architecture that requires nothing less than a miracle to maintain. A systems thinking approach will show that problems are typically inherent in the system and not the people. To deliver business value early and often, WeNeedHelp.com will likely need to invest in an automated regression tool rather than continue to overload its testers.

As the WeNeedHelp.com dev and ops teams organizationally focus on common goals, they would seek out and receive feedback to stay aligned. They would work closely together and perhaps merge into one devops team. The devops principle of "optimize the whole" would spread through organization-level continuous improvement.

THE BENEFITS OF SHIFTING TO A DEVOPS APPROACH

Optimizing the whole allows companies to achieve their goals of increasing revenue, driving down costs, and keeping good people. Let's look more closely at how devops can get us there.

Increasing Revenue

Increasing revenue requires releasing our product or service to market faster, which requires ever faster, ever more reliable software build and deployment methods. This begins with a thorough understanding of the interdependencies among build artifacts, database schemas, environment configurations, and the infrastructure they sit on.

Acquiring the knowledge needed to interact with and design a speedy and accurate deployment process takes time — time from a team that understands both software architecture and infrastructure. Dividing this work among traditional functional teams often leads to inadequate system design decisions that are discovered too late.

By merging smaller teams into larger teams focused on common goals, devops facilitates a broader understanding of the idiosyncrasies throughout the system and minimizes handoffs from team to team. By working together early on, devops teams can foresee problems and design a well-thought-through deployment process. The ability to deploy quickly and correctly begins with involving from the very beginning those responsible for deployment.


Driving Down Costs

Driving down costs by reducing rework requires building quality into the process in the first place. Because they perform builds and deployments early and often, far away from the intensity of release night, devops teams using a systems thinking approach have time to fine-tune scripts, discover and solve problems, and cross-train team members. Imagine the increase in quality when staff has the capacity to work on improvements instead of chiefly interruptions and rework. Imagine releases evolving from chaotic middle-of-the-night ordeals into daily non-events.

Wealthfront, an investment advisor whose founder was named one of the "Best Young Tech Entrepreneurs 2011" by Bloomberg Businessweek,1 has just one team of engineers that collectively owns and is responsible for all processes involving development, testing, deployment, and monitoring. The entire team owns quality. They can deploy to production in less than 10 minutes, and they do so, on average, 30 times a day! In an SEC-regulated industry, their full regression suite runs in less than five minutes.2 Fast Company included Wealthfront among its "10 Most Innovative Companies in Finance," noting that the Wealthfront website "has attracted more than $100 million in assets, and its managers, which undergo an intensive selection process, have collectively outperformed the S&P 500 by 6%."3

Keeping Good People

More than once I've seen a team of highly trained professionals crumble because of ongoing systemic frustrations that paralyze the team's effectiveness and pulverize its gung ho spirit. Management will know that this is happening when it discovers team members are moving on, perhaps to the competition — and invariably it's the best who are the first to go.

Keeping good people, by enabling them to take pride in their work, depends on the opportunity to master one's work. More than enormous sums of money, the ability to conquer the job is a huge motivator. When 32-year-old point guard Mike Bibby gave up US $6 million to join the Miami Heat late in his career, it was because he wanted a chance to play for the championship.4 He wanted to belong to a really good team.

We all want to belong to a really good team, but this requires us to be really good at what we do. A devops team using a systems thinking approach allows for continuous improvement of critical skills that results in increased expertise. As at WeNeedHelp.com, this does not occur because of any grand project plan. It just happens. By implementing these changes in an incremental, evolutionary fashion, solutions will be found that no one could have planned for earlier.

LEADERSHIP: THE ESSENTIAL INGREDIENT OF DEVOPS

This all sounds good, yet the devops revolution can't happen without good leadership — and good leadership often seems to be in woefully short supply. Some managers appear incompetent in part due to the structure of the corporation, with CEO fiduciary responsibility focused solely on short-term shareholder profit. Some management is slanted toward new projects (instead of much-needed maintenance work) because new projects can be written off as capital expenditures. Some management breeds contention between teams with annual merit ratings favoring certain teams over others. With developers scoring points for making changes happen and ops guys scoring points for keeping things stable, is it any wonder they find working together for the good of the company difficult? Good teamwork may help the company, but it provides few tangible points on an individual's annual merit review.

If you think I'm overstating the case, consider that Amazon once celebrated heartily after a software release crashed its website. Responsible for website availability at the time, Jesse Robbins (now CEO of Opscode) voiced a concern that the release in question would bring the site down. But an Amazon VP pushed forward anyway, believing the company's stock price would go up regardless of website availability. The software was deployed and — as Robbins predicted — brought the site down. It took two days to stabilize the site and get it back up, but as the VP had foreseen, the stock price rose along with order volume. At the end of that year, Amazon rewarded developers for the new and profitable functionality and penalized ops for the outage!5

Can cross-teamwork find a spot on merit reviews? If not, maybe individual merit reviews should be abolished. After all, the benefits of dumping merit ratings have been discussed for over 50 years, beginning with W. Edwards Deming in his 14 principles for transformation.6


Another ineffective management technique that does more harm than good to the devops movement is pressuring staff into working during the wee hours of the night on top of their regular day job. Working long hours may come with trendy bragging rights these days, implying strength and power. But as Wall Street Journal health writer Melinda Beck says, "Genuine 'short sleepers' only need four hours of sleep per night, but they comprise only 1% to 3% of the total population."7 So for the 97%-99% of us who aren't short sleepers, working the wee hours brings sleep deprivation and mistakes.

When production breaks in the middle of the night, it's typically someone from operations who gets paged to troubleshoot the issue. In problematic systems, ops can get paged a lot, sometimes working night after night (supporting an unstable system) on top of their day job. In teams that practice devops — where ops personnel, architects, and developers collaborate on system architecture and where developers carry duty pagers as often as ops staff do — production failures will likely diminish, reducing middle-of-the-night interruptions.

Management by Fear Breeds Distrust; Good Leadership Drives Out Fear

It's hard to put forth your best effort when you are worried about public humiliation or losing your job. Remarks like "Well, if we don't get this project delivered by October, we'll all be looking for new jobs" or "There are a lot of people out there who would love to have your job" result in people keeping ideas to themselves. Collaboration, not withholding information, is what's needed in the devops revolution.

I once worked with a team where the director announced to her staff, "Do not send any communications out to other teams without my knowledge." To me this screamed, "I don't trust you to do the right thing," and it resulted in a pivotal moment when more than one résumé got updated in preparation for departure.

How do you know if people are fearful? Look at reports — inflated figures are a sure signal that people are telling you what you want to hear instead of the truth you need to know. Sifting through a wordy progress report crafted by someone who has carefully chosen words so as not to offend is time-consuming and rarely tells you what is really going on.

Making the shift to a devops systems thinking approach requires trust. Trust is essential for influencing change, and gaining trust takes time. Sociologists have learned that trust is event-driven and that small, frequent gestures or events enhance trust more than larger, grand gestures made only occasionally. Trust is also asymmetrical in that a single act of bad faith can destroy it, and repairing the damage, if possible at all, will take many small acts completed with competence and delivered as promised.

For some, moving to a systems thinking approach will require a revolution in management behavior. This revolution consists of:

An emphasis on quality, which encourages craftsmanship and pride of workmanship

Establishing avenues to enable collaborative working

A properly designed process, which enables people to work effectively without becoming overloaded

A feeling among workers that their actions are directly connected to business success (or failure)

Management that addresses systemic problems by changing processes and policies, not by blaming or replacing people

Management that demonstrates trust in the workers, and those same workers learning to trust their managers (via the quality of management actions)

Management that drives out fear to increase productivity and innovation

These are management imperatives that are essential to making the shift to devops.

IMPROVING THE QUALITY OF DECISION MAKING

While effective leadership is absolutely essential to the devops revolution, it is not sufficient. High-quality decision making is the second half of the equation.

One means of improving decision making is the SPD (Study-Plan-Do) model, which is used for carrying out continuous improvement. It was inspired by W. Edwards Deming's popular PDSA (Plan-Do-Study-Act) cycle8 and John Seddon's three-step approach for managing change, CPD (Check-Plan-Do).9 The SPD model teaches that first you must Study. Studying the "what and why" of the current performance of a system leads to understanding possible improvements and what prevents them from being achieved. Next is "Plan," or identifying what needs to change in order to improve performance and what steps can be taken to get predictable results. This is followed by "Do," or executing the plan by taking small steps in controlled circumstances. Both Deming and Seddon stress that we must study capability and seek to match capability against demand in order to provide satisfactory service.10, 11 Statistical process control (SPC) charts are a mechanism for the Study part of the Deming cycle and Seddon's CPD (see sidebar).

©2011 Cutter Information LLC. CUTTER IT JOURNAL, August 2011.
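The SPD cycle described above can be sketched as a single improvement iteration. This is an illustrative sketch only, not a model from Deming's or Seddon's texts; the function names and the toy "deploy time" scenario are assumptions made for the example:

```python
def spd_iteration(measure, propose_change, apply_change):
    """One pass of Study -> Plan -> Do; returns the change in the metric."""
    baseline = measure()              # Study: the "what and why" of current performance
    plan = propose_change(baseline)   # Plan: what to change to get predictable results
    apply_change(plan)                # Do: execute in small, controlled steps
    return measure() - baseline      # feed the delta into the next Study

# Toy demo: deploy time improves when a manual step is automated.
state = {"deploy_minutes": 25, "manual_steps": 3}

def measure():
    return state["deploy_minutes"]

def propose_change(baseline):
    # Plan: automate one manual step; expect roughly 5 minutes saved.
    return {"automate_steps": 1, "expected_saving": 5}

def apply_change(plan):
    state["manual_steps"] -= plan["automate_steps"]
    state["deploy_minutes"] -= plan["expected_saving"]

print(spd_iteration(measure, propose_change, apply_change))  # -5 (minutes saved)
```

Each iteration's result becomes the input to the next Study step, which is what makes the cycle continuous rather than a one-off project.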

Removing Variability Results in Greater Predictability

SPC charts can be applied to devops where predictability of "routine" work, such as system maintenance and deployments, has a direct connection to the organization's immediate profitability. SPC charts provide a means for understanding the capability of a team or organization and the variation in their tasks and deliverable work items.

To be meaningful and useful, an SPC chart should be created for work items of variable size processed by a team or department. For example, measuring cycle time across a set of typical ops tasks doesn't tell us much when they are of widely varying sizes. Changing user permissions might take only two minutes, but upgrading databases to the next version might take weeks. Combining cycle times for these would skew results, making it difficult to put any predictability into the process. Breaking out the SPC chart by work item and size — so that, for example, database upgrades all appear on the same chart — helps avoid convoluted metrics. Exceptionally long cycle times for database upgrades would then stand out as special causes and would act as talking points for retrospectives and ops reviews.

By studying and then removing variability in the system, we enable a devops team to become more predictable. SPC charts help us to see variability that we want to act upon as managers. They help us demonstrate that capability is matched with demand and expectations. Using SPC charts as part of the SPD cycle inevitably leads us to improved customer satisfaction.

PRECISION AND QUALITY DRIVE DOWN COST

It sounds counterintuitive, but a focus on cutting costs actually results in higher costs. We've seen evidence of this with companies that relied heavily on manual testing.

When I worked at Corbis (a privately held, Seattle-based company that licenses digital images), we relied entirely on manual build tests during deployment for years — the argument being that automated test tools were too expensive. The reality was that each build required, on average, 25 minutes of manual testing before being released to QA for integration testing. Full regression testing rarely occurred because it took too long. This added rework when previous production fixes were overwritten with newer builds. A contributing factor to deployment issues was the large batch size of the changes incorporated into new project builds, which created too much work-in-progress (WIP).

With a complex system prone to design and code issues, a new build sometimes ate up days of troubleshooting by database developers, architects, sys admins, build engineers, and testers. Imagine the cost of that!

SPC CHARTS

Statistical process control charts (see Figure A) involve plotting events to reveal which ones fall outside of acceptable levels. The levels are demarcated with an upper control limit (UCL) and a lower control limit (LCL). The points that fall outside of the control limits are called "special" or "assignable cause" variation. These tend to be external to the system or process. The term "assignable" means that they can be easily identified and are often pointed to by members of the team. A list of special cause variations should be addressed with risk management — mitigation, reduction, and contingency planning. The variation among the points within the control limits (as in Figure A) is known as "common" or "chance cause" variation. This tends to be internal to the system and is therefore, in theory, affected by the process in use and under the control of local managers. Common cause variation can be affected by policy changes, individual skills, process design, training, recruitment policies, and many more aspects of the workplace that are under management control.

The lesson here is that the control chart can be used to see capability and to react when it is poorly matched with expectations. Control charts show where the process is unstable and hence totally unpredictable.

[Figure A — Control chart showing common cause variation. All plotted points lie between LCL = 9.256 and UCL = 10.860, around a center line of 10.058; x-axis: sample (3-15), y-axis: quality characteristics (9.0-11.0).]
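As a rough sketch of the mechanics described in the sidebar (not code from the article), the control limits and special-cause detection for an individuals chart can be computed as follows. The moving-range estimate of sigma (with the standard d2 = 1.128 constant for subgroups of size 2) and the sample cycle times are illustrative assumptions:

```python
from statistics import mean

def control_limits(samples):
    """Return (lcl, center, ucl) for an individuals (I) chart."""
    center = mean(samples)
    # Estimate sigma from the average moving range, the usual I-chart method.
    moving_ranges = [abs(b - a) for a, b in zip(samples, samples[1:])]
    sigma = mean(moving_ranges) / 1.128  # d2 constant for subgroup size 2
    return center - 3 * sigma, center, center + 3 * sigma

def special_causes(samples):
    """Indices of points outside the limits: 'special' (assignable cause) variation."""
    lcl, _, ucl = control_limits(samples)
    return [i for i, x in enumerate(samples) if x < lcl or x > ucl]

# Hypothetical cycle times (days) for one work-item type, e.g. database upgrades.
cycle_times = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 14.5, 10.0]
print(special_causes(cycle_times))  # [8]: the 14.5-day upgrade stands out
```

The flagged point is exactly the kind of exceptionally long database upgrade the text suggests raising in retrospectives and ops reviews; the points inside the limits are common cause variation, addressed by changing the process rather than chasing individual events.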



The risks involved with the lack of automated testing eventually diminished over time, due in part to the following improvements:

Continuous integration (although a major cost) was implemented, allowing developers to immediately see the impact of their code changes and fix problems on the spot in the development environment.

WIP was limited to available capacity, resulting in less context switching.

The release cadence increased from quarterly to biweekly. This resulted in fewer changes (again, less WIP) per build, allowing each change to be tested more thoroughly.

Smaller, more frequent changes reduced merge and integration issues.
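The WIP and cadence effects in the list above can be illustrated with Little's Law (average cycle time = average WIP / average throughput). The numbers below are hypothetical, chosen only to show the shape of the relationship, and are not figures from Corbis:

```python
def avg_cycle_time(wip, throughput_per_week):
    """Little's Law: average time in process = average WIP / average throughput."""
    return wip / throughput_per_week

# Quarterly releases batching 30 changes, with 10 changes finished per week:
print(avg_cycle_time(30, 10))  # 3.0 weeks in process, on average
# Biweekly releases batching 5 changes at the same throughput:
print(avg_cycle_time(5, 10))   # 0.5 weeks in process
```

Cutting the batch size shortens how long each change sits in flight, which is why smaller, more frequent releases reduce both context switching and merge pain.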

The improvements listed above were not made as part of a project plan or an IT governance initiative, but as part of an ongoing management focus on driving incremental improvements continuously as a regular part of operating our business. Organizationally, we were focused on delivering business value. Teams were aligned around business goals, and individuals were empowered with information on business risk and value, supported by a management team that encouraged them to optimize the whole system through collaboration with peers rather than act locally and independently. This was a significant leap from the previous management approach, and it demonstrated the powerful concepts behind the devops movement.

SUMMING UP

A systems thinking approach to devops results in the ability to increase revenue, drive down cost, and keep good people. These valuable results are achieved by building quality into devops activities from the beginning and establishing proper motivations to drive out resentment and fear, enabling continuous improvement and increased expertise. These attributes, indeed, make for a devops revolution.

In addition, managers can use the Study-Plan-Do model to balance the capability of their organization against the demand. They can improve the quality of their decisions by understanding variability through SPC charts.

They can take actions to change the system (the policies and procedures and practices used) or to deal appropriately with special variations through risk management.

This is how IT organizations should be run. Better managers are systems thinkers who make better decisions and produce better results.

ENDNOTES

1. "Best Young Tech Entrepreneurs 2011." Bloomberg Businessweek, 17 May 2011 (http://images.businessweek.com/slideshows/20110516/best-young-tech-entrepreneurs-2011/slides/4).

2. Perez, Pascal-Louis. "Continuous Deployment in an SEC-Regulated Environment - SLL Conf." Wealthfront Engineering, 25 May 2011.

3. Macsai, Dan. "The 10 Most Innovative Companies in Finance." Fast Company, 14 March 2011 (www.fastcompany.com/1738549/the-10-most-innovative-companies-in-finance).

4. Salter, Chuck. "The World's Greatest Chemistry Experiment." Fast Company, No. 155, May 2011, pp. 78-84.

5. Logan, Martin J. "DevOps Culture Hacks." DevOps.com, 8 March 2011 (http://devops.com/2011/03/08/devops-culture-hacks).

6. Deming, W. Edwards. The New Economics for Industry, Government, Education. 2nd edition. The MIT Press, 2000.

7. Beck, Melinda. "The Sleepless Elite." Wall Street Journal, 5 April 2011.

8. Deming. See 6.

9. Seddon, John. Freedom From Command and Control: Rethinking Management for Lean Service. Productivity Press, 2005.

10. Deming. See 6.

11. Seddon. See 9.

Dominica DeGrandis is an Associate with David J. Anderson & Associates, specializing in Kanban for IT operations and devops. Ms. DeGrandis spent her first 15 years in software engineering deeply embedded in development teams performing builds, deployments, and environment maintenance. Adept at leading teams performing configuration management and release management, she found a passion for improving the way development and operations teams work together. Committed to doing the right thing, Ms. DeGrandis studies sustainability in business, community, and global social matters. She is based in Seattle, Washington, USA, and holds a BS in information computer sciences from the University of Hawaii. Ms. DeGrandis can be reached at [email protected].


CUTTER CONSORTIUM
Cutter Events

Agile Masterclass
Workshop with Israel Gat, Fellow and Director, Agile Product & Project Management Practice, Cutter Consortium; and Hubert Smits, Senior Consultant, Cutter Consortium

In this half-day Masterclass, Israel Gat and Hubert Smits will teach you — in a hands-on, practical way — how you can use Agile and Lean concepts, artifacts, and practices in a broader and deeper context to make your organization more innovative and competitive. They will divulge the tools you need to move well beyond the team-level basics. You will:

- Discover how to scale frameworks like Scrum for larger projects.
- Learn how to weave together the outcomes that result when multiple frameworks (agile, lean, or traditional) are used by different teams, projects, or programs.
- Discover how to get advanced software engineering disciplines like technical debt assessment/remediation and devops' infrastructure as code to take root in your organization.
- Find out, via a case study, how Agile is being extended outside of software into other departments, such as Marketing, and how success in IT/product development can be leveraged to drive change both up- and downstream.

When you combine the experience-based strategies Israel and Hubert share with the techniques and skills they impart in this half-day Masterclass, you'll be another step closer to creating the strong foundation that continuous value delivery requires.

Agile and Outsourcing: A Disciplined Approach
Recorded Webinar with Scott Ambler, Senior Consultant, Cutter Consortium

In this webinar, Senior Consultant Scott Ambler explores strategies for effectively initiating and governing an outsourced IT delivery project in an agile manner. Outsourcing introduces a collection of risks that can be uniquely addressed with a disciplined agile strategy. Luckily, the Disciplined Agile Delivery (DAD) process decision framework provides a foundation from which you can tailor a viable strategy for disciplined agile outsourcing. DAD is a goal-driven, hybrid agile, full delivery methodology that is enterprise aware and scalable.

This webinar addresses:

- Why agile and outsourcing?
- The risks associated with outsourcing
- What the Disciplined Agile Delivery (DAD) framework is
- Disciplined agile outsourcing from the points of view of the customer and the provider
- What you need to do to succeed at disciplined agile outsourcing
- Industry statistics regarding agile outsourcing in practice
- Criteria to determine if you're ready for outsourcing IT delivery projects

Spend an hour learning about DAD from its creator, Scott Ambler, and determine if outsourcing IT delivery projects is a smart move for your organization.

Recorded: 8 May 2013
Posted In: Cutter Agile Product & Project Management Resource Center
To Access: bit.ly/DADwebinar
Requires: Adobe Flash Player 8 or higher
Duration: 60 minutes
Fee: Complimentary

Agile Masterclass logistics:

Date: Wednesday, November 6, 2013
Time: 9:00 am - 2:00 pm EST (luncheon included)
Location: Le Méridien Hotel, Cambridge, MA
Fee: $695 for Agile Masterclass only. Better value: attend Cutter Summit 2013, 4-6 November, and get three days of executive education — including the Agile Masterclass — for $2495. Learn more at www.cutter.com/summit/2013.html.

Cutter Consortium events offer multiple avenues to engage with Cutter research and consultants — from workshops to webinars, Q&A sessions and peer-to-peer events to conferences worldwide. Join the conversation today!


Peopleware
Member Exclusive Peer-to-Peer Session with Tim Lister, Cutter Fellow

In this exclusive Peer-to-Peer for Cutter Members, Cutter Fellow Tim Lister will lead a discussion on how the major issues of software development are human, not technical. Lister, co-author with Tom DeMarco of the groundbreaking book Peopleware: Productive Projects and Teams, will answer your questions on why some projects succeed and others fail. Both success and failure have a dramatic financial impact on the organization — yet we know far too little about the underlying reasons for project effectiveness.

Lister will also address how throwing technology and money at projects is wasted if the project team does not understand how they should respond to unforeseen circumstances, or if the team has some pathological behavior that prevents any worthwhile accomplishment, or lacks the appropriate skills and strategies. Other possible discussion topics include:

- Development of productive persons
- Organizational culture
- Organizational learning
- Development of productive teams
- Modeling of human competencies

Peopleware, with its third edition just published, continues to inspire intense devotion from software project managers around the globe. Please join Tim for what promises to be a lively discussion about Peopleware!

Date: Wednesday, September 18, 2013
Time: 12:00 pm EDT
Duration: 60 minutes
Location: Wherever you are! Just dial in and/or log in.
Fee: Complimentary for Cutter Members


Agile Conference Mexico 2013
Keynotes by Hubert Smits, Scott Ambler, Claude Baudoin, and Edgar Barroso; Senior Consultants, Cutter Consortium

For the third year, Cutter Consortium summons IT to establish a dialogue on how to make the benefits of adopting Agile sustainable. How can Agile practices work in rigid and bureaucratic corporate environments, and on large projects and distributed teams? Intelligent solutions to these challenges require changes in all kinds of development practices, in management, organizational character, and business processes.

This conference program integrates a balance of keynotes and workshops, including:

- Disciplined Agile Delivery with Scott Ambler
- Agile: Theory vs. Practice with Claude Baudoin
- Scaling the Scrum Framework: The Effect on Process and People with Hubert Smits
- Trans-Disciplinary Collaboration for Innovation with Edgar Barroso
- 5 Levels of Agile Planning with Hubert Smits
- Disciplined Agile Delivery Certification with Scott Ambler

Take part in this three-day, sponsor-free event and benefit from small group interactions and the opportunity to interact with Cutter consultants and experts, so that you can increase the value and quality of your products while reducing the time to market.

Dates: 4-6 September 2013
Location: JW Marriott Hotel Mexico City Santa Fe, Mexico City
Duration: 3 days
Register: agileconference.cutter.mx/2013/


Cutter IT Journal

About Cutter Consortium

Cutter Consortium is a truly unique IT advisory firm, comprising a group of more than 100 internationally recognized experts who have come together to offer content, consulting, and training to our clients. These experts are committed to delivering top-level, critical, and objective advice. They have done, and are doing, groundbreaking work in organizations worldwide, helping companies deal with issues in the core areas of software development and agile project management, enterprise architecture, business technology trends and strategies, enterprise risk management, metrics, and sourcing.

Cutter offers a different value proposition than other IT research firms: We give you Access to the Experts. You get practitioners' points of view, derived from hands-on experience with the same critical issues you are facing, not the perspective of a desk-bound analyst who can only make predictions and observations on what's happening in the marketplace. With Cutter Consortium, you get the best practices and lessons learned from the world's leading experts, experts who are implementing these techniques at companies like yours right now.

Cutter's clients are able to tap into its expertise in a variety of formats, including content via online advisory services and journals, mentoring, workshops, training, and consulting. And by customizing our information products and training/consulting services, you get the solutions you need, while staying within your budget.

Cutter Consortium's philosophy is that there is no single right solution for all enterprises, or all departments within one enterprise, or even all projects within a department. Cutter believes that the complexity of the business technology issues confronting corporations today demands multiple detailed perspectives from which a company can view its opportunities and risks in order to make the right strategic and tactical decisions. The simplistic pronouncements other analyst firms make do not take into account the unique situation of each organization. This is another reason to present the several sides to each issue: to enable clients to determine the course of action that best fits their unique situation.

For more information, contact Cutter Consortium at +1 781 648 8700 or [email protected].

The Cutter Business Technology Council

The Cutter Business Technology Council was established by Cutter Consortium to help spot emerging trends in IT, digital technology, and the marketplace. Its members are IT specialists whose ideas have become important building blocks of today's wide-band, digitally connected, global economy. This brain trust includes:

• Rob Austin
• Ron Blitstein
• Christine Davis
• Tom DeMarco
• Lynne Ellyn
• Israel Gat
• Tim Lister
• Lou Mazzucchelli
• Ken Orr
• Robert D. Scott