Experiences Bringing CD to a DoD Project

Gene Gotimer Experiences Bringing Continuous Delivery to a DoD Project


Bringing Continuous Delivery to the DoD


It isn't always feasible to get top-level buy-in to say "let's do DevOps." I'll discuss techniques and tools we've used to bring a DevOps mindset and Continuous Delivery practices into an environment that wasn't already Agile.

This is a war story. I'll talk about how we were able to start in development, where we had the most control, with a "let's start being Agile" initiative and working on "why is continuous integration important?" From there, we incrementally brought our practices through "higher environments" until the project was confidently delivering working, QA-tested, security-tested releases, ready for production every two weeks.

About Coveros
Coveros builds security-critical applications using agile methods.

Coveros Services
- Agile transformations
- Agile development and testing
- DevOps and continuous integration
- Application security analysis
- Agile & Security training

Government qualifications
- DCAA approved rates and accounting
- TS facility clearance

Areas of Expertise

Copyright 2016 Coveros, Inc. All rights reserved.#

Coveros is a consulting company that helps organizations build better software. We provide software development, application security, QA/testing, and software process improvement services. Coveros focuses on organizations that must build and deploy software within the constraints of significant regulatory or compliance requirements. The primary markets we serve include: DoD, Homeland Security & associated critical infrastructure companies, Healthcare providers, and Financial services institutions


Select Clients

The Project
- COTS product integration for DoD
  - custom Python glue
  - and Java, PHP, Perl
- Releases every 6 months or so
  - Freeze 2-4 weeks in advance
  - Deploy Friday evening to Sunday afternoon
  - Repair broken functionality Monday and Tuesday (and on)
- Barely starting Agile
  - Daily Stand-ups (really daily status calls)
  - 2-week Sprints
  - Good, pruned backlog
- No automated testing
  - No unit tests
  - No continuous integration

The Delivery Team

Development (Local)
- 2 Developers
- 1 Business Analyst
- 1 Project Manager

DISA PMO
- 1 Program Manager
- 1 Chief Engineer
- 1 Program Director
- 1 Systems Engineer

Test and Integration (Remote)
- 4-6 Testers
- 4-6 Integrators, including security experts
- 1 Information Assurance

Off-team
- Systems Administrators (hardware and software)

There were other people on the team, but these were the technical people directly involved in getting features and releases out the door.

Test and Integration team was responsible for all testing, all installation processes, all security testing/scanning, and coordination with Operations. We could not talk to the sysadmins. No hands on keyboards.

As is typical for a government contract, the Test/Integration/Security team was on a different contract, although we worked well together and didn't have nearly as many issues as some cross-contract teams do. Or so we thought.

Definitely not trying to slam the other team. They worked better than many govt contractors, and their world was the traditional DoD, long-term, stable, slow-and-steady wins the race type of projects. This journey ended up being a huge culture shock to them.

The Problem
¯\_(ツ)_/¯

"It works on my machine!"
-- Every developer, everywhere, at some point

= HIGH RISK DEPLOYS

In this case it was the test and integration team. Their software ran on servers that were configured differently, with different security restrictions, because it was in a different data center.

High drama, high risk, lots of deliberation

We didn't know it at the time, but in retrospect we had every anti-DevOps stereotype: painful releases, so do fewer, with more changes.

Also, there was a deliberate throwing over the wall between the integrator testing and documenting the deploy and the integrator doing the deploy. That is how the team made sure the deploy notes were complete: could someone install the software sight unseen with the documentation they had prepared? They saw that as part of separation of duties.

DevOps is
"How long would it take your organization to deploy a change that involves just one single line of code? Do you do this on a repeatable, reliable basis?"

- Mary and Tom Poppendieck

Implementing Lean Software Development: From Concept to Cash

Everyone has their own definition, so we were thinking of it as:

DevOps is
"The goal of DevOps is not just to increase the rate of change, but to successfully deploy features into production without causing chaos and disrupting other services, while quickly detecting and correcting incidents when they occur."

- Gene Kim

Top 11 Things You Need to Know About DevOps

Continuous Delivery
Make releasing a business decision, not a technical decision

- High-confidence releases
  - Small releases
  - Fully tested
  - No expectation of problems
- Hotfix releases
  - Possible, no more than moderate risk and moderate coordination

So when I say we got to a Continuous Delivery process, this was what we achieved.

The Approach
- Started with things that were in our control
  - Dev and Test environments
  - Development process
- Make changes behind the scenes
  - Free/open source tools
  - Easy to integrate into our CI system
  - Small changes
- Disclose the changes when there was a win
  - Highlight ease of use
  - Use as justification for higher environments

This is the approach that worked for us.

THIS IS NOT PRESCRIPTIVE! Your situation is different, so your approach may be different.

Also, while it is nothing we did, we had air cover from a really, really strong advocate and a directive to be an exemplar project, and show other DoD projects that they could be Agile and show how to do it. We never would have succeeded without a champion to buy us the time and flexibility to undertake the changes to the process we made.

In all cases, we started with the pieces that were within our control (dev/test) and showed the value there as justification to push out further to higher environments and outside our immediate team.


The Journey
1. Continuous Integration
2. Functional Testing
3. Automated Deploys
4. Security Testing
5. Performance
6. Culture Clash

4 Years

September 2009 to March 2014

A lot more overlap than I'm going to describe, but it did happen roughly in these waves over 4 years.

When we started, we were not driving to a CD process. We didn't even know what CD was.

1. Continuous Integration
- Trouble explaining "integration"
  - between two or more developers
  - not between systems
- Set up SecureCI one afternoon
  - Explained the advantages later
  - Wired to the ALM tool we had
  - Jenkins (Hudson at the time)
  - Nexus
  - SonarQube (Sonar at the time)
- Automated builds
  - Ant, Maven
  - PMD, FindBugs, Checkstyle
  - Cobertura
  - Later added Python tools

"Continuous integration" to them was a two-week test cycle that could be kicked off with roughly 6 weeks of lead time and was run once or twice a year. We explained that was neither continuous nor developer integration.

They didn't see the value in CI, but didn't see any harm since it was just an afternoon to get Jenkins set up anyway. We stuck with primarily open source software, because this wasn't an explicitly funded effort. Plus it made it instant (no lead time for procurement).

1. Continuous Integration
This gave us a strong basis for CD later, although we didn't know it at the time.

Lessons Learned:
- Continuous integration is valuable, but outside the dev team it isn't obvious.
- The biggest advantage to open-source tools is often acquisition time, not acquisition cost.


2. Functional Testing
- Functional testing was done manually
  - from a script written in Microsoft Word
- We waited a year before staging a coup
  - we didn't want to encroach on their domain
- Demo of Selenium
  - demonstrated record-and-playback through the Selenium IDE
  - we recorded the first set of tests
  - then turned it back over to the test team

Sound from soundbible.com, CC BY 3.0 US

After a few months of CI, functional testing became the biggest bottleneck and showed the least value.

2. Functional Testing
- They argued later that automated testing was ineffective
  - the automated script (singular) only worked one time
  - needed to be re-recorded when any changes got made to the app

Lesson Learned: Automated testing isn't just about replacing manual tests with an automated test framework. It requires a different way of thinking.


2. Functional Testing
- We took it back
- Rewrote existing tests in Java
- Showed our business analyst how to clone-and-mutate the Java tests
- Started with JUnit, but went to TestNG
  - better tagging and parameterization
  - pre-test run initialization

The test team was happy not to be burdened with testing.

Since it was COTS, we focused on testing system interfaces, not application functionality.

2. Functional Testing
- Development team had more confidence in releases
- Also began testing user roles
  - Security testing = what can this type of user NOT do

Lesson Learned: Should have focused on demonstrating that there were fewer escaped defects. It was hard to point to a clear benefit.

Positive and negative user role testing was a great idea in retrospect. It gave us a strong basis for security, and roles were the highest-risk point when adding new functionality.

Lesson learned: Should have focused on demonstrating that there were fewer escaped defects. It was hard to point to a clear win.
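To make the positive and negative role testing idea concrete, here is a minimal Python sketch. It is not the project's TestNG/Selenium code: the role names, action names, and the `attempt` callback are all hypothetical, and `leaky_app` is a fake system with one deliberate bug so the check has something to find.

```python
from itertools import product

# Hypothetical matrix of what each role MAY do; every other
# role/action pair is expected to be refused.
EXPECTED = {
    "admin": {"view_report", "edit_record", "manage_users"},
    "analyst": {"view_report", "edit_record"},
    "viewer": {"view_report"},
}


def role_test_cases(expected):
    """Yield (role, action, should_succeed) for every role/action pair."""
    actions = sorted(set().union(*expected.values()))
    for role, action in product(sorted(expected), actions):
        yield role, action, action in expected[role]


def find_role_violations(expected, attempt):
    """Return the cases where the system disagrees with the matrix.

    attempt(role, action) is a stand-in for driving the real application
    and reporting whether the action succeeded.
    """
    return [
        (role, action, should_succeed)
        for role, action, should_succeed in role_test_cases(expected)
        if attempt(role, action) != should_succeed
    ]


def leaky_app(role, action):
    """Fake system under test with one deliberate bug: viewers can edit."""
    return action in EXPECTED[role] or (role, action) == ("viewer", "edit_record")
```

Generating the cases from the matrix means every new action automatically gets a negative test for each role that should not have it, which is where the security value comes from.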

The Book
- Project Manager came across the book in a book store
- Everything made so much sense
  - Logical extension of what we were trying to do
  - Addressed a lot of the issues we were running into
- No money or time for an effort, so we adopted it as our long-term goal

Project manager dropped to part time to free up budget, and we hired a release engineer as our Puppet expert even though he didn't know Puppet at the time.

Chose Puppet almost at random. Chef or Ansible would have worked just as well. None of them is a wrong choice, and anything is better than nothing.

Lucky timing: the integration team was focusing on, or had just finished, a 6-month, full-team effort to update the OS from 32-bit to 64-bit.

3. Automated Deploys
- Started with automating a Drupal web server install
  - new system, not yet in production
  - database server was easy, so we skipped it for now
- Then automated the manual COTS install
- Then started reverse engineering the broken COTS installer

The COTS install was a crap shoot: it didn't always work, took several days, and never seemed to end up installed the same way. When we called the vendor, they confirmed their professional services had a 50% success rate. Their solution was just to try again. It almost always works on the second try.

Used the Drupal success as a clear win for automating the COTS install.

3. Automated Deploys
- Down the road, realized we could automate everything
  - Doesn't just reduce risk, also speeds up the process

Lessons Learned:
- Automate everything, even the easy stuff.
- When it is easier to install, you'll stumble across more reasons to install it.
- Go from "Why?" to "Why not?"

Integration team was happy not to be burdened with integration or documenting the install process.

COTS vendor eventually called us and asked for our Puppet code. We said no. Largely out of spite.

Lesson learned: automate everything, even the easy stuff. It isn't just to reduce risk, but also to speed things up. If you can install it easily, you will stumble across reasons to install it more often. We went from "Why?" to "Why not?"

3. Automated Deploys
- No Puppet Enterprise Server
  - just manually ran puppet apply from the command line
  - every system (DB, Web server, SVN server, ALM tool) used the same puppet apply command
- Vagrant would have been helpful for local deploys
  - Just hadn't heard of it

Couldn't easily get funds for a license or an extra server, nor accreditation for Puppet Enterprise.

Also, Vagrant would have been useful here had we known about it. With so few developers, coordination was easy and we almost never had conflicts. But Vagrant would have made things easier if even one more developer had been added, especially one who was not collocated.

4. Security Testing
- Decided we needed at least some security in dev
  - System hardening
  - Web application scanning
- We knew it couldn't replace the official testing
  - plus, we didn't want to encroach on their domain
- Noticed extra processes running
  - Dev system in cloud with default password
- Tested Security Blanket
  - just purchased by Raytheon
  - couldn't get it purchased

Had a hacker get into a dev system with a default password. Just an email bot, not a directed attack.

But we realized how lax we were with security, even in the safety of a dev env.

Again, open source acquisition was faster. Raytheon had just purchased Security Blanket and no one knew how to sell it to us.

4. Security Testing
- Knew we had some good base for security
  - CI, static analysis, user role testing
- Wanted a security scanner
  - at the time, none worked with client certificates out of the box
- Found w3af
  - Python
  - customizable
  - client certificate support was there, but not exposed
  - handed it over to the security experts on the integration team

4. Security Testing
Found 0 vulnerabilities!

3 months of effort, finally got results.

Announced on daily call.

4. Security Testing
- Never got past the login screen
- Never read the output or log
- So we took it back
  - Eventually had problems getting customized w3af to work properly
  - Switched to OWASP ZAP, run manually
- Security team focused on STIG and SELinux
  - that was their expertise anyway

Our non-technical BA was able to see the issue right away.

The scan never got past the login screen. But it also didn't start at the beginning, so they even missed an XSS bug on the home page.

The security guys were happy not to be burdened with security testing. They were IA, really checklist guys anyway.

STIG: Security Technical Implementation Guide
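A report of zero vulnerabilities only means something if the scanner actually reached the application. As a rough illustration (not something the team describes building), the Python sketch below compares the URLs a scanner reports visiting against the paths it was expected to cover. The function name, the `/login` default, and the example host are hypothetical.

```python
from urllib.parse import urlsplit


def scan_coverage(visited_urls, expected_paths, login_path="/login"):
    """Summarize whether a scan reached the pages it was supposed to reach."""
    # Reduce each visited URL to its path; a bare host counts as "/".
    visited = {urlsplit(url).path or "/" for url in visited_urls}
    missed = sorted(set(expected_paths) - visited)
    # True when the scanner saw nothing at all, or nothing but the login page.
    stuck_at_login = visited <= {login_path}
    return {
        "visited": len(visited),
        "missed": missed,
        "stuck_at_login": stuck_at_login,
        "trustworthy": not missed and not stuck_at_login,
    }
```

For example, `scan_coverage(["https://app.example.test/login"], ["/", "/login", "/reports"])` would flag the scan as stuck at the login page and list `/` and `/reports` as missed, so a "0 vulnerabilities" result from that run should not be announced as a clean bill of health.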

4. Security Testing
- Lost a lot of faith in us when we were hacked
- Information Assurance isn't the same as Security

Lesson Learned: Protect every system, everywhere. Many hacks are just for the system, not the data.

Lesson learned: IA is not the same as security. And protect every system, everywhere. It doesn't matter whether it has production data; many hackers just want the system.

4+. Security Testing
- Over a few days, implemented OpenSCAP in Jenkins for STIG
  - immediately found issues
  - started adding Puppet manifests for remediation
- Started using Nikto2 for web server scanning
  - immediately found issues
- Started running weekly scans of dev and test using OpenVAS
  - no immediate issues, but started seeing package security updates before they became IAVMs
- Discovered SELinux was in permissive mode
  - had never been in enforcing

Pattern recognition began to set in.

In their defense, this was automated vs. manual checks, not really incompetence.

We found the SELinux problem when our Puppet code ran and assumed SELinux was enforcing. The COTS product could not work in enforcing mode.
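The SELinux surprise is the kind of thing a tiny automated check catches on every deploy. The Python sketch below is an illustration only, not the team's OpenSCAP or Puppet code. It assumes the standard `SELINUX=` line found in `/etc/selinux/config`, and the function names are hypothetical. Note that the config file reflects the boot-time setting, which can differ from the runtime mode.

```python
def configured_selinux_mode(config_text):
    """Return the SELINUX= value from /etc/selinux/config-style text, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments and anything that is not a key=value pair.
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "SELINUX":
            return value.strip().lower()
    return None


def selinux_findings(config_text, required="enforcing"):
    """Return a list of findings; an empty list means the check passed."""
    mode = configured_selinux_mode(config_text)
    if mode == required:
        return []
    return [f"SELinux is {mode or 'not configured'}, expected {required}"]
```

A CI job could read the file from each deployed system, pass its contents to `selinux_findings`, and fail the build on any non-empty result.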

4+. Security Testing
- Easier audits
- Proactive security upgrades
- Much better relationship with the data center

Lesson Learned: Benefits of security testing go beyond increased security.

Lesson learned: The benefits aside from increased security are significant: easier audits, proactive security upgrades on our schedule, and in the long term a much better relationship with the data center Ops guys.

5. Performance
- Applying STIG to database server
  - seemed like it was getting slower
- Used JMeter to get baseline
  - Took rough breakdown of most common queries
  - Repeated as a 15-minute test
  - Monitored trend
- Added similar testing to functional tests, another 15 mins
- Also, number of functional tests was growing slowly
  - Watched functional test elapsed time as rough guide

We were applying STIG to database settings and noticed the database got slower.

We never got around to adding true load and performance (L&P) or stress testing, or even measuring response times.

5. Performance
- Watching trends can be very worthwhile
- Some testing can be almost as valuable as full testing

Lesson Learned: A baseline can be a great safety net.

Lesson learned: Even without formal guidance, a baseline is a great safety net. A relatively short test to show the trend over time is very worthwhile.
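The trend-watching idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the team's JMeter setup: it assumes each run records one elapsed time in seconds, and the 20% tolerance and 5-run window are arbitrary values chosen for the example.

```python
from statistics import median


def check_against_baseline(history, latest, tolerance=0.20, window=5):
    """Compare the latest elapsed time with the median of recent runs.

    history: elapsed seconds of earlier runs, oldest first.
    Returns None when there is no history to compare against.
    """
    recent = history[-window:]
    if not recent:
        return None
    baseline = median(recent)
    return {
        "baseline": baseline,
        "latest": latest,
        "ratio": latest / baseline,
        # Flag the run only when it is slower than baseline plus tolerance.
        "regressed": latest > baseline * (1 + tolerance),
    }
```

Using the median of recent runs keeps a single noisy run from moving the baseline much, which suits a short test whose only job is to show the trend.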

6. Culture Clash
- Continuous Delivery was being openly discussed
  - PMO had just started thinking of it as a clear plan
  - Kept asking when continuous delivery would be delivered, and how it would be packaged
- Test and Integration started complaining
  - 3 of us were pushing the 12+ of them too hard
  - moving too fast
  - not a risk or control complaint, merely effort
- People on test and integration team started leaving
  - including Burt

Remember, we were doing the development, testing, writing functional tests, security scans, and automating the deploys.

Burt: no last name because I never met him in 2-plus years. Nor had I heard of him, before or after the status call when they announced he was leaving. He wasn't on the call that day. Never spoke up on the daily status call. Never showed up to a release planning meeting that happened every 6 months, sometimes in their offices. No tasks assigned nor delivered. Never supported a deploy. They were sad to see him go.

No back filling positions, just let the test and integration team atrophy.

6. Culture Clash
- Benefits were growing clear
- Effort was minimal
- No active resistance

Lesson Learned: Do not underestimate cultural inertia. Some will not or cannot ever make the mental shift.

Lesson learned: No matter how many advantages and benefits you show, even if there is little effort to be expended, some people/teams/orgs will never make the mental shift. It wasn't active resistance, they just couldn't/didn't make the mental shift.

The Aftermath
- Test and Integration decided not to renew their contract
  - all remaining personnel ended project with a month's notice
- Security issue found the following week
  - deployed 3 days later
- Went back to 2-week deploy cycles, sometimes faster
- Left 3 people on development team
  - One went back to take over for the test and integration team as hands-on-keyboard
  - BA left project and another came in time for testing
- Dropped into maintenance mode

These results have been reinforced by our experiences at other govt agencies and commercial clients.

5 years in, 4 years since CI introduced. May have been a little more notice, but I don't believe so.

Used to have 2 deploys a month split between two sets of properties, which dwindled to no more than once a month due to "moving too fast".

The Delivery Team

Development (Local)
- 1 Developer
- 1 Release Manager/Tester

DISA PMO
- 1 Program Manager
- 1 Chief Engineer
- 1 Program Director
- 1 Systems Engineer

Test and Integration (Remote)
- 1 Information Assurance

Off-team
- Systems Administrators (hardware and software)

Smaller local team, only 1 on remote.

The Project
- Barely Agile
  - Maintenance only
  - Kanban-ish
  - tracking work in progress
  - Daily Stand-ups (really daily status calls)
  - 2-week Sprints
- Releases prepared every 2 weeks
  - Soft freeze Thursday for Friday release
  - Deploy Friday evening
  - 100% working functionality Friday evening
  - Non-event
- Puppet took the configuration parameters from 200+ untracked values to ~30 Hiera-controlled values
- Biggest coordination issue: 72 hours for user messaging
- Biggest time consumer: 3-6 hours for VM clones

The drop to ~30 Hiera-controlled values came through standardization and composition.
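One way to picture "standardization and composition" is a small set of base values, looked up through layers from most specific to most general, from which the many per-system settings are derived. The Python sketch below only loosely mimics a Hiera-style first-match lookup. The keys, hostnames, paths, and URL patterns are all hypothetical and are not the project's actual configuration.

```python
def lookup(key, layers):
    """Return the first value found for key; layers are most specific first."""
    for layer in layers:
        if key in layer:
            return layer[key]
    raise KeyError(key)


def compose_settings(layers):
    """Derive many concrete settings from a few controlled base values."""
    env = lookup("env", layers)
    domain = lookup("domain", layers)
    db_port = lookup("db_port", layers)
    # Standardized naming: every host and path follows the same pattern,
    # so none of these needs to be tracked as a separate value.
    return {
        "db_host": f"db.{env}.{domain}",
        "db_port": db_port,
        "web_url": f"https://web.{env}.{domain}",
        "log_dir": f"/var/log/app/{env}",
    }
```

With a common layer such as `{"env": "dev", "domain": "example.test", "db_port": 5432}` and an environment layer `{"env": "test"}` placed in front of it, `compose_settings` yields the test environment's hosts and paths while still inheriting the shared domain and port.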

My Advice

Lessons Learned:

- DevOps and Continuous Delivery are not a goal.
- Do not set out to do DevOps or CD.
- Remove road blocks and bottlenecks.
- Fix quality issues.
- Be more responsive to change.
- Adopt change incrementally.
- As you build a repeatable, reliable process for delivering software, CD will magically appear.

Jeff Payne: "Don't do Agile, you have to be Agile."

Not focusing on DevOps/CD as a goal will help you prioritize what needs to be improved and will show you benefits sooner.

Incremental adoption:
- less culture shock
- more visible and concrete benefits sooner

Monty Hall, Let's Make a Deal (1963-1977), said "Actually, I'm an overnight success, but it took twenty years."
http://www.brainyquote.com/quotes/quotes/m/montyhall192769.html#QTimGb64Le97MELL.99

My Advice
Read Continuous Delivery and The Phoenix Project

Missed Opportunities
- Automated deploys
  - more valuable than just reducing risk
- Vagrant
- Some security scanning earlier
  - do not just assume someone else is doing it
- Some performance testing earlier
  - some is a lot better than none
  - maybe almost as good as a lot
- We relied on client-side certificates for authentication
  - EJBCA should have been set up immediately
- Upgrades are a huge time sink
  - components, libraries, applications, system software
  - add tools to track it as early as possible

The Tool Chain
- Jenkins
- Puppet (no Puppet Enterprise)
- Testing
  - TestNG for Java unit tests
  - Nose for Python unit tests
  - Mockito/Mockito for Python
- Static Analysis - Java
  - PMD
  - FindBugs
  - Checkstyle
  - Cobertura
  - SonarQube
- Static Analysis - Python
  - Pylint
  - coverage.py
- JMeter
  - for some representative performance tests
- Security
  - OpenSCAP (every deploy, minutes)
  - OpenVAS (every weekend, hours)
    - included Nikto2
    - used Kali Linux
  - OWASP Dependency Check (on-demand, many minutes)
  - OWASP Zed Attack Proxy (on-demand, few days)
- Full role-based Selenium test coverage (every deploy, overnight)
  - 10k+ Selenium tests via TestNG

5 envs: dev, test, uat, staging/RACE, production
OWASP Dep Check: every new library and periodically
OWASP ZAP: any change in security posture

Enough functional testing, regression testing, performance testing, and security testing to give us confidence

Questions?

Gene Gotimer
[email protected]
@CoverosGene