
Continuous Integration

Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly. This article is a quick overview of Continuous Integration summarizing the technique and its current usage.

I vividly remember one of my first sightings of a large software project. I was taking a summer internship at a large English electronics company. My manager, part of the QA group, gave me a tour of a site and we entered a huge depressing warehouse stacked full with cubes. I was told that this project had been in development for a couple of years and was currently integrating, and had been integrating for several months. My guide told me that nobody really knew how long it would take to finish integrating. From this I learned a common story of software projects: integration is a long and unpredictable process.

    "ut this needn#t be the way. Most projects done by my colleagues at$hought%orks, and by many others around the world, treat integration as anon&event. Any individual developer#s work is only a few hours away from ashared project state and can be integrated back into that state in minutes.

Any integration errors are found rapidly and can be fixed rapidly.

This contrast isn't the result of an expensive and complex tool. The essence of it lies in the simple practice of everyone on the team integrating frequently, usually daily, against a controlled source code repository.

When I've described this practice to people, I commonly find two reactions: "it can't work (here)" and "doing it won't make much difference". What people find out as they try it is that it's much easier than it sounds, and that it makes a huge difference to development. Thus the third common reaction is "yes we do that - how could you live without it?"

The term 'Continuous Integration' originated with the Extreme Programming development process, as one of its original twelve practices.

When I started at ThoughtWorks, as a consultant, I encouraged the project I was working with to use the technique. Matthew Foemmel turned my vague exhortations into solid action and we saw the project go from rare and complex integrations to the non-event I described. Matthew and I wrote up our experience in the original version of this paper, which has been one of the most popular papers on my site.


Building a Feature with Continuous Integration

When my development work is done, I carry out an automated build on my development machine. This takes the source code in my working copy, compiles and links it into an executable, and runs the automated tests. Only if it all builds and tests without errors is the overall build considered to be good.

With a good build, I can then think about committing my changes into the repository. The twist, of course, is that other people may, and usually have, made changes to the mainline before I get a chance to commit. So first I update my working copy with their changes and rebuild. If their changes clash with my changes, it will manifest as a failure either in the compilation or in the tests. In this case it's my responsibility to fix this and repeat until I can build a working copy that is properly synchronized with the mainline.

Once I have made my own build of a properly synchronized working copy I can then finally commit my changes into the mainline, which then updates the repository.

However my commit doesn't finish my work. At this point we build again, but this time on an integration machine based on the mainline code. Only when this build succeeds can we say that my changes are done. There is always a chance that I missed something on my machine and the repository wasn't properly updated. Only when my committed changes build successfully on the integration machine is my job done. This integration build can be executed manually by me, or done automatically by Cruise.
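To make the rhythm concrete, here is a minimal sketch of that pre-commit dance as a script. It assumes a Subversion working copy and an "ant test" build target - both stand-ins for whatever your project uses:

    import subprocess
    import sys

    def run(*cmd):
        # Run a command, reporting whether it exited successfully.
        return subprocess.run(cmd).returncode == 0

    # 1. Build and test my local working copy.
    if not run("ant", "test"):
        sys.exit("local build failed - fix it before going further")

    # 2. Pull down everyone else's changes and rebuild; clashes with my
    #    changes show up here as compile or test failures.
    if not run("svn", "update"):
        sys.exit("update failed")
    if not run("ant", "test"):
        sys.exit("working copy clashes with the mainline - fix and retry")

    # 3. The working copy is synchronized with the mainline, so commit.
    if not run("svn", "commit", "-m", "integrated change"):
        sys.exit("commit failed")

    # My job still isn't done until this commit also builds cleanly on
    # the integration machine (by hand, or via a CI server).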

If a clash occurs between two developers, it is usually caught when the second developer to commit builds their updated working copy. If not the integration build should fail. Either way the error is detected rapidly. At this point the most important task is to fix it, and get the build working properly again. In a Continuous Integration environment you should never have a failed integration build stay failed for long. A good team should have many correct builds a day. Bad builds do occur from time to time, but should be quickly fixed.

The result of doing this is that there is a stable piece of software that works properly and contains few bugs. Everybody develops off that shared stable base and never gets so far away from that base that it takes very long to integrate back with it. Less time is spent trying to find bugs because they show up quickly.

Practices of Continuous Integration

The story above is the overview of CI and how it works in daily life. Getting all this to work smoothly is obviously rather more than that. I'll focus now on the key practices that make up effective CI.


Maintain a Single Source Repository

Software projects involve lots of files that need to be orchestrated together to build a product. Keeping track of all of these is a major effort, particularly when there's multiple people involved. So it's not surprising that over the years software development teams have built tools to manage all this. These tools - called Source Code Management tools, configuration management, version control systems, repositories, or various other names - are an integral part of most development projects. The sad and surprising thing is that they aren't part of all projects. It is rare, but I do run into projects that don't use such a system and use some messy combination of local and shared drives.

So as a simple basis make sure you get a decent source code management system. Cost isn't an issue as good quality open-source tools are available. The current open source repository of choice is Subversion. (The older open-source tool CVS is still widely used, and is much better than nothing, but Subversion is the modern choice.) Interestingly as I talk to developers I know most commercial source code management tools are liked less than Subversion. The only tool I've consistently heard people say is worth paying for is Perforce.

Once you get a source code management system, make sure it is the well known place for everyone to go get source code. Nobody should ever ask "where is the foo-whiffle file?" Everything should be in the repository. Although many teams use repositories a common mistake I see is that they don't put everything in the repository. If people use one they'll put code in there, but everything you need to do a build should be in there including: test scripts, properties files, database schema, install scripts, and third party libraries. I've known projects that check their compilers into the repository (important in the early days of flaky C++ compilers). The basic rule of thumb is that you should be able to walk up to the project with a virgin machine, do a checkout, and be able to fully build the system. Only a minimal amount of things should be on the virgin machine - usually things that are large, complicated to install, and stable. An operating system, Java development environment, or base database system are typical examples.


Source control systems give you the power to create multiple branches, to handle different streams of development. This is a useful feature, but it's frequently overused and gets people into trouble. Keep your use of branches to a minimum. In particular have a mainline: a single branch of the project currently under development. Pretty much everyone should work off this mainline most of the time. (Reasonable branches are bug fixes of prior production releases and temporary experiments.)

In general you should store in source control everything you need to build anything, but nothing that you actually build. Some people do keep the build products in source control, but I consider that to be a smell - an indication of a deeper problem, usually an inability to reliably recreate builds.

    Automate the Build

Getting the sources turned into a running system can often be a complicated process involving compilation, moving files around, loading schemas into the databases, and so on. However like most tasks in this part of software development it can be automated - and as a result should be automated. Asking people to type in strange commands or clicking through dialog boxes is a waste of time and a breeding ground for mistakes.

Automated environments for builds are a common feature of systems. The Unix world has had make for decades, the Java community developed Ant, the .NET community has had NAnt and now has MSBuild. Make sure you can build and launch your system using these scripts using a single command.

A common mistake is not to include everything in the automated build. The build should include getting the database schema out of the repository and firing it up in the execution environment. I'll elaborate my earlier rule of thumb: anyone should be able to bring in a virgin machine, check the sources out of the repository, issue a single command, and have a running system on their machine.
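As an illustration of that rule of thumb, here is a single-command build sketch. The compiler, schema file, and test runner named here are illustrative assumptions, not a prescription:

    import glob
    import os
    import subprocess
    import sys

    def step(name, cmd):
        # Run one build step, stopping the whole build on failure.
        print("==> " + name)
        if subprocess.run(cmd).returncode != 0:
            sys.exit("build failed during: " + name)

    os.makedirs("build/classes", exist_ok=True)

    # Compile everything under src/ (javac stands in for your compiler).
    sources = glob.glob("src/**/*.java", recursive=True)
    step("compile", ["javac", "-d", "build/classes"] + sources)

    # Load the database schema straight out of the repository.
    step("schema", ["psql", "-f", "db/schema.sql", "app_dev"])

    # Run the automated tests against the freshly built system.
    step("test", ["java", "-cp", "build/classes", "AllTests"])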

    "uild scripts come in various flavors and are often particular to a platformor community, but they don#t have to be. Although most of our ;avaprojects use Ant, some have used >uby *the >uby >ake system is a very nice

    build script tool+. %e got a lot of value from automating an early Microsoft-2M project with Ant.

A big build often takes time; you don't want to do all of these steps if you've only made a small change. So a good build tool analyzes what needs to be changed as part of the process. The common way to do this is to check the dates of the source and object files and only compile if the source date is later. Dependencies then get tricky: if one object file changes those that


depend on it may also need to be rebuilt. Compilers may handle this kind of thing, or they may not.
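The timestamp check itself is simple enough to sketch. Here is the essence of what such a build tool does, with a hand-written dependency graph standing in for the real one:

    import os

    # Source file -> the headers it depends on (a hand-written stand-in
    # for the dependency graph a real build tool computes).
    DEPS = {
        "main.c": ["util.h"],
        "util.c": ["util.h"],
    }

    def mtime(path):
        # Missing files count as infinitely old.
        return os.path.getmtime(path) if os.path.exists(path) else 0.0

    def needs_rebuild(source, obj):
        # Rebuild if the object file is missing, or older than the
        # source or any of its dependencies.
        newest_input = max([mtime(source)] + [mtime(d) for d in DEPS[source]])
        return mtime(obj) < newest_input

    for src in DEPS:
        obj = src.replace(".c", ".o")
        if needs_rebuild(src, obj):
            print("compile " + src + " -> " + obj)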

Depending on what you need, you may need different kinds of things to be built.


Make Your Build Self-Testing

Over the last few years the rise of TDD has popularized the XUnit family of open-source tools which are ideal for this kind of testing. XUnit tools have proved very valuable to us at ThoughtWorks and I always suggest to people that they use them. These tools, pioneered by Kent Beck, make it very easy for you to set up a fully self-testing environment.

XUnit tools are certainly the starting point for making your code self-testing.
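To show how little ceremony is involved, here is the XUnit style in miniature using Python's unittest, one member of the family. The Money class is just an illustration:

    import unittest

    class Money:
        # A tiny class to have something to test.
        def __init__(self, amount, currency):
            self.amount = amount
            self.currency = currency
        def add(self, other):
            assert self.currency == other.currency
            return Money(self.amount + other.amount, self.currency)

    class MoneyTest(unittest.TestCase):
        def test_simple_add(self):
            result = Money(12, "CHF").add(Money(14, "CHF"))
            self.assertEqual(26, result.amount)

    if __name__ == "__main__":
        unittest.main()   # one command runs every test and reports failures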


Everyone Commits To the Mainline Every Day

My general rule of thumb is that every developer should commit to the repository every day. In practice it's often useful if developers commit more frequently than that. The more frequently you commit, the fewer places you have to look for conflict errors, and the more rapidly you fix conflicts.

Frequent commits encourage developers to break down their work into small chunks of a few hours each. This helps track progress and provides a sense of progress. Often people initially feel they can't do something meaningful in just a few hours, but we've found that mentoring and practice helps them learn.

Every Commit Should Build the Mainline on an Integration Machine

Using daily commits, a team gets frequent tested builds. This ought to mean that the mainline stays in a healthy state. In practice, however, things still do go wrong. One reason is discipline, people not doing an update and build before they commit. Another is environmental differences between developers' machines.

As a result you should ensure that regular builds happen on an integration machine and only if this integration build succeeds should the commit be considered to be done. Since the developer who commits is responsible for this, that developer needs to monitor the mainline build so they can fix it if it breaks. A corollary of this is that you shouldn't go home until the mainline build has passed with any commits you've added late in the day.

There are two main ways I've seen to ensure this: using a manual build or a continuous integration server.

The manual build approach is the simplest one to describe. Essentially it's a similar thing to the local build that a developer does before the commit into the repository. The developer goes to the integration machine, checks out the head of the mainline (which now houses his last commit) and kicks off the integration build. He keeps an eye on its progress, and if the build succeeds he's done with his commit.

A continuous integration server acts as a monitor to the repository. Every time a commit against the repository finishes the server automatically checks out the sources onto the integration machine, initiates a build, and notifies the committer of the result of the build. The committer isn't done until she gets the notification - usually an email.
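The core of such a server is a simple watch-and-build loop. Here is a bare-bones sketch; the repository URL and build command are assumptions, and real servers like CruiseControl add queuing, email notification, and a web page on top of this:

    import subprocess
    import time

    REPO = "https://example.com/svn/project/trunk"   # placeholder URL

    def head_revision():
        # "svn info <url>" prints a "Revision: NNN" line for the head.
        out = subprocess.run(["svn", "info", REPO],
                             capture_output=True, text=True)
        for line in out.stdout.splitlines():
            if line.startswith("Revision:"):
                return line.split(":", 1)[1].strip()
        return None

    last_built = None
    while True:
        rev = head_revision()
        if rev is not None and rev != last_built:   # a commit has landed
            subprocess.run(["svn", "checkout", REPO, "work"])
            ok = subprocess.run(["ant", "test"], cwd="work").returncode == 0
            # A real server emails the committer; a print stands in here.
            print("revision %s: build %s" % (rev, "good" if ok else "FAILED"))
            last_built = rev
        time.sleep(60)                              # poll once a minute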

At ThoughtWorks, we're big fans of continuous integration servers - indeed we led the original development of CruiseControl and CruiseControl.NET, the widely used open-source CI servers. Since then we've also built the


commercial Cruise CI server. We use a CI server on nearly every project we do and have been very happy with the results.

Not everyone prefers to use a CI server. Jim Shore gave a well argued description of why he prefers the manual approach. I agree with him that CI is much more than just installing some software. All the practices here need to be in play to do Continuous Integration effectively. But equally many teams who do CI well find a CI server to be a helpful tool.

Many organizations do regular builds on a timed schedule, such as every night. This is not the same thing as a continuous build and isn't enough for continuous integration. The whole point of continuous integration is to find problems as soon as you can. Nightly builds mean that bugs lie undetected for a whole day before anyone discovers them. Once they are in the system that long, it takes a long time to find and remove them.

A key part of doing a continuous build is that if the mainline build fails, it needs to be fixed right away. The whole point of working with CI is that you're always developing on a known stable base. It's not a bad thing for the mainline build to break, although if it's happening all the time it suggests people aren't being careful enough about updating and building locally before a commit. When the mainline build does break, however, it's important that it gets fixed fast. To help avoid breaking the mainline you might consider using a pending head.

When teams are introducing CI, often this is one of the hardest things to sort out. Early on a team can struggle to get into the regular habit of working mainline builds, particularly if they are working on an existing code base. Patience and steady application does seem to regularly do the trick, so don't get discouraged.

    &eep the Build Fast

The whole point of Continuous Integration is to provide rapid feedback. Nothing sucks the blood of a CI activity more than a build that takes a long time. Here I must admit a certain crotchety old guy amusement at what's considered to be a long build. Most of my colleagues consider a build that takes an hour to be totally unreasonable. I remember teams dreaming that they could get it so fast - and occasionally we still run into cases where it's very hard to get builds to that speed.

For most projects, however, the XP guideline of a ten minute build is perfectly within reason. Most of our modern projects achieve this. It's worth putting in concentrated effort to make it happen, because every minute you reduce off the build time is a minute saved for each developer every time


they commit. Since CI demands frequent commits, this adds up to a lot of time.

If you're staring at a one hour build time, then getting to a faster build may seem like a daunting prospect. It can even be daunting to work on a new project and think about how to keep things fast. For enterprise applications, at least, we've found the usual bottleneck is testing - particularly tests that involve external services such as a database.

Probably the most crucial step is to start working on setting up a staged build. The idea behind a staged build (also known as build pipeline or deployment pipeline) is that there are in fact multiple builds done in sequence. The commit to the mainline triggers the first build - what I call the commit build. The commit build is the build that's needed when someone commits to the mainline. The commit build is the one that has to be done quickly, as a result it will take a number of shortcuts that will reduce the ability to detect bugs. The trick is to balance the needs of bug finding and speed so that a good commit build is stable enough for other people to work on.

Once the commit build is good then other people can work on the code with confidence. However there are further, slower, tests that you can start to do. Additional machines can run further testing routines on the build that take longer to do.

A simple example of this is a two stage build. The first stage would do the compilation and run tests that are more localized unit tests with the database completely stubbed out. Such tests can run very fast, keeping within the ten minute guideline. However any bugs that involve larger scale interactions, particularly those involving the real database, won't be found. The second stage build runs a different suite of tests that do hit the real database and involve more end-to-end behavior. This suite might take a couple of hours to run.

In this scenario people use the first stage as the commit build and use this as their main CI cycle. The second-stage build is a secondary build which runs when it can, picking up the executable from the latest good commit build for further testing. If the secondary build fails, then this doesn't have the same 'stop everything' quality, but the team does aim to fix such bugs as rapidly as possible, while keeping the commit build running. Indeed the secondary build doesn't have to stay good, as long as each known bug is identified and dealt with in the next few days. As in this example, secondary builds are often pure tests since these days it's usually tests that cause the slowness.
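A sketch of that two-stage arrangement, with assumed build targets and artifact paths:

    import os
    import shutil
    import subprocess

    def commit_build():
        # Stage 1: compile plus the fast, database-stubbed unit tests.
        if subprocess.run(["ant", "unit-test"]).returncode != 0:
            return False
        # Publish the executable for the secondary build to pick up.
        os.makedirs("artifacts", exist_ok=True)
        shutil.copy("build/app.jar", "artifacts/latest-good.jar")
        return True

    def secondary_build():
        # Stage 2: slower end-to-end tests against the real database,
        # run against the last good commit build, not every commit.
        cmd = ["ant", "integration-test", "-Djar=artifacts/latest-good.jar"]
        return subprocess.run(cmd).returncode == 0

    if commit_build():
        secondary_build()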


If the secondary build detects a bug, that's a sign that the commit build could do with another test. As much as possible you want to ensure that any secondary build failure leads to new tests in the commit build that would have caught the bug, so the bug stays fixed in the commit build. This way the commit tests are strengthened whenever something gets past them. There are cases where there's no way to build a fast-running test that exposes the bug, so you may decide to only test for that condition in the secondary build. Most of the time, fortunately, you can add suitable tests to the commit build.

This example is of a two-stage build, but the basic principle can be extended to any number of later builds. The builds after the commit build can also be done in parallel, so if you have two hours of secondary tests you can improve responsiveness by having two machines that run half the tests each. By using parallel secondary builds like this you can introduce all sorts of further automated testing, including performance testing, into the regular build process. (I've run into a lot of interesting techniques around this as I've visited various ThoughtWorks projects over the last couple of years - I'm hoping to persuade some of the developers to write these up.)

    "est in a Clone of the Production #n$ironment

The point of testing is to flush out, under controlled conditions, any problem that the system will have in production. A significant part of this is the environment within which the production system will run. If you test in a different environment, every difference results in a risk that what happens under test won't happen in production.

As a result you want to set up your test environment to be as exact a mimic of your production environment as possible. Use the same database software, with the same versions, use the same version of operating system. Put all the appropriate libraries that are in the production environment into the test environment, even if the system doesn't actually use them. Use the same IP addresses and ports, run it on the same hardware.

Well, in reality there are limits. If you're writing desktop software it's not practicable to test in a clone of every possible desktop with all the third party software that different people are running. Similarly some production environments may be prohibitively expensive to duplicate (although I've often come across false economies by not duplicating moderately expensive environments). Despite these limits your goal should still be to duplicate the production environment as much as you can, and to understand the risks you are accepting for every difference between test and production.

    If you have a pretty simple setup without many awkward communications, you may be able to run your commit build in a mimicked environment.


Often, however, you need to use test doubles because systems respond slowly or intermittently. As a result it's common to have a very artificial environment for the commit tests for speed, and use a production clone for secondary testing.
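Here is what such a test double might look like; all the class names are invented for illustration:

    class RealPaymentGateway:
        def charge(self, amount):
            ...   # slow network call, only exercised in secondary tests

    class StubPaymentGateway:
        # Answers instantly with canned results, for commit-build speed.
        def __init__(self):
            self.charged = []
        def charge(self, amount):
            self.charged.append(amount)
            return "approved"

    def checkout(cart_total, gateway):
        # The code under test doesn't care which gateway it's given.
        return gateway.charge(cart_total)

    # Commit build: fast and deliberately artificial.
    stub = StubPaymentGateway()
    assert checkout(42, stub) == "approved"
    assert stub.charged == [42]
    # The secondary build wires in RealPaymentGateway against a
    # production-clone environment instead.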

I've noticed a growing interest in using virtualization to make it easy to put together test environments. Virtualized machines can be saved with all the necessary elements baked into the virtualization. It's then relatively straightforward to install the latest build and run tests. Furthermore this can allow you to run multiple tests on one machine, or simulate multiple machines in a network on a single machine. As the performance penalty of virtualization decreases, this option makes more and more sense.

Make it Easy for Anyone to Get the Latest Executable

One of the most difficult parts of software development is making sure that you build the right software. We've found that it's very hard to specify what you want in advance and be correct; people find it much easier to see something that's not quite right and say how it needs to be changed. Agile development processes explicitly expect and take advantage of this part of human behavior.

To help make this work, anyone involved with a software project should be able to get the latest executable and be able to run it: for demonstrations, exploratory testing, or just to see what changed this week.

Doing this is pretty straightforward: make sure there's a well known place where people can find the latest executable. It may be useful to put several executables in such a store. For the very latest you should put the latest executable to pass the commit tests - such an executable should be pretty stable providing the commit suite is reasonably strong.
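The mechanics can be as simple as this sketch, with an assumed shared directory:

    import shutil
    import time

    STORE = "/shared/builds"   # the one place everyone knows to look

    def publish(executable):
        # Keep a dated copy for history, and overwrite "latest" so that
        # anyone can always grab the newest good build.
        stamp = time.strftime("%Y%m%d-%H%M")
        shutil.copy(executable, STORE + "/app-" + stamp + ".jar")
        shutil.copy(executable, STORE + "/latest.jar")

    # Called only after the commit build has passed.
    publish("build/app.jar")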

If you are following a process with well defined iterations, it's usually wise to also put the end of iteration builds there too. Demonstrations, in particular, need software whose features are familiar, so then it's usually worth sacrificing the very latest for something that the demonstrator knows how to operate.

Everyone can see what's happening

Continuous Integration is all about communication, so you want to ensure that everyone can easily see the state of the system and the changes that have been made to it.

One of the most important things to communicate is the state of the mainline build. If you're using Cruise there's a built in web site that will show you if there's a build in progress and what was the state of the last


mainline build. Many teams like to make this even more apparent by hooking up a continuous display to the build system - lights that glow green when the build works, or red if it fails are popular. A particularly common touch is red and green lava lamps - not just do these indicate the state of the build, but also how long it's been in that state. Bubbles on a red lamp indicate the build's been broken for too long. Each team makes its own choices on these build sensors - it's good to be playful with your choice (recently I saw someone experimenting with a dancing rabbit.)

If you're using a manual CI process, this visibility is still essential. The monitor of the physical build machine can show the status of the mainline build. Often you have a build token to put on the desk of whoever's currently doing the build (again something silly like a rubber chicken is a good choice). Often people like to make a simple noise on good builds, like ringing a bell.

CI servers' web pages can carry more information than this, of course. Cruise provides an indication not just of who is building, but what changes they made. Cruise also provides a history of changes, allowing team members to get a good sense of recent activity on the project. I know team leads who like to use this to get a sense of what people have been doing and keep a sense of the changes to the system.

Another advantage of using a web site is that those that are not co-located can get a sense of the project's status. In general I prefer to have everyone actively working on a project sitting together, but often there are peripheral people who like to keep an eye on things. It's also useful for groups to aggregate together build information from multiple projects - providing a simple and automated status of different projects.

Good information displays are not only those on a computer screen. One of my favorite displays was for a project that was getting into CI. It had a long history of being unable to make stable builds. We put a calendar on the wall that showed a full year with a small square for each day. Every day the QA group would put a green sticker on the day if they had received one stable build that passed the commit tests, otherwise a red square. Over time the calendar revealed the state of the build process showing a steady improvement until green squares were so common that the calendar disappeared - its purpose fulfilled.

Automate Deployment

To do Continuous Integration you need multiple environments, one to run commit tests, one or more to run secondary tests. Since you are moving executables between these environments multiple times a day, you'll want


to do this automatically. So it's important to have scripts that will allow you to deploy the application into any environment easily.

A natural consequence of this is that you should also have scripts that allow you to deploy into production with similar ease.
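A sketch of such a deployment script; the host names and commands are placeholders:

    import subprocess
    import sys

    ENVIRONMENTS = {
        "integration": "int-server.example.com",
        "staging": "stage-server.example.com",
        "production": "prod-server.example.com",
    }

    def deploy(env):
        # The same script handles every environment; only the target
        # host differs.
        host = ENVIRONMENTS[env]
        subprocess.run(["scp", "artifacts/latest-good.jar",
                        host + ":/opt/app/app.jar"], check=True)
        subprocess.run(["ssh", host, "/opt/app/restart.sh"], check=True)

    if __name__ == "__main__":
        deploy(sys.argv[1])   # e.g. python deploy.py staging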


Benefits of Continuous Integration

The trouble with deferred integration is that it's very hard to predict how long it will take to do, and worse, very hard to see how far you are through the process: you put yourself into a blind spot right at one of the tensest parts of a project. Continuous Integration completely finesses this problem. There's no long integration, you completely eliminate the blind spot. At all times you know where you are, what works, what doesn't, the outstanding bugs you have in your system.

    "ugs & these are the nasty things that destroy confidence and mess upschedules and reputations. "ugs in deployed software make users angry

    with you. "ugs in work in progress get in your way, making it harder to getthe rest of the software working correctly.

Continuous Integration doesn't get rid of bugs, but it does make them dramatically easier to find and remove. In this respect it's rather like self-testing code. If you introduce a bug and detect it quickly it's far easier to get rid of. Since you've only changed a small bit of the system, you don't have far to look. Since that bit of the system is the bit you just worked with, it's fresh in your memory - again making it easier to find the bug.


Introducing Continuous Integration

So you fancy trying out Continuous Integration - where do you start? The full set of practices I outlined above give you the full benefits - but you don't need to start with all of them.

There's no fixed recipe here - much depends on the nature of your setup and team. But here are a few things that we've learned to get things going.

One of the first steps is to get the build automated. Get everything you need into source control and get it so that you can build the whole system with a single command. For many projects this is not a minor undertaking - yet it's essential for any of the other things to work. Initially you may only do builds occasionally on demand, or just do an automated nightly build. While these aren't continuous integration, an automated nightly build is a fine step on the way.

Introduce some automated testing into your build. Try to identify the major areas where things go wrong and get automated tests to expose those failures. Particularly on an existing project it's hard to get a really good suite of tests going rapidly - it takes time to build tests up. You do have to start somewhere though - all those aphorisms about Rome's build schedule apply.

Try to speed up the commit build. Continuous Integration on a build of a few hours is better than nothing, but getting down to that magic ten minute number is much better. This usually requires some pretty serious surgery on your code base to do as you break dependencies on slow parts of the system.

If you are starting a new project, begin with Continuous Integration from the beginning. Keep an eye on build times and take action as soon as you start going slower than the ten minute rule. By acting quickly you'll make the necessary restructurings before the code base gets so big that it becomes a major pain.

Above all get some help. Find someone who has done Continuous Integration before to help you. Like any new technique it's hard to introduce it when you don't know what the final result looks like. It may cost money to get a mentor, but you'll also pay in lost time and productivity if you don't do it. (Disclaimer / Advert - yes we at ThoughtWorks do some consultancy in this area. After all we've made most of the mistakes that there are to make.)

    Final "houghtsIn the years since Matt and I wrote the original paper on this site,-ontinuous Integration has become a mainstream techni(ue for software


development. Hardly any ThoughtWorks project goes without it - and we see others using CI all over the world. I've hardly ever heard negative things about the approach - unlike some of the more controversial Extreme Programming practices.

If you're not using Continuous Integration I strongly urge you to give it a try. If you are, maybe there are some ideas in this article that can help you do it more effectively. We've learned a lot about Continuous Integration in the last few years, and I hope there's still more to learn and improve.