Configuration management - A "love" story


Description

These are a few stories about different ways of approaching the configuration management problem over time, presented as a love story. This talk was given at Infracoders Melbourne on 11 March 2014 by Javier Turegano (@setoide). Activate notes for extra comments on each slide.

Transcript of Configuration management - A "love" story


I tried two experiments:
- Telling a parallel story (love, though it could easily be horror)
- Sketching all the slides on a whiteboard

It is based on a real story (not 100% accurate); I took a few licences.

First love

- Perfect
- Will last forever
- All the things that you need now and ever
- First experiences

A long, long time ago: small teams of devs and ops

- Only a few servers
- Manual deployments
- Config via SSH
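Something like this, one box at a time (the host, user and service names are illustrative):

    # A typical manual change: SSH in, tweak the config by hand, restart the service
    ssh admin@web01
    sudo vi /etc/httpd/conf/httpd.conf
    sudo service httpd restart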

As time passed, the teams grew, and so did the complexity and the number of servers.

It was love at first sight

Puppet running in daemon mode on all the servers. Config is stored in git and pushed to the puppet masters.
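A minimal sketch of that flow (the repo name and deploy mechanism are assumptions; the talk only says the config lives in git and is pushed to the masters):

    # Edit a module, commit, and push to the repo the puppet masters serve from
    git commit -am "Tune apache keepalive settings"
    git push origin master    # assumed: a hook or cron job syncs this to the masters
    # Each node's puppet daemon applies the change on its next scheduled run
    # (the agent's default run interval is 30 minutes)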

Automation works! Hurray

But as the number of modules grew and the dependencies became more complex, a single error while updating a service could take completely unrelated farms down. There was no testing.

Big loss of confidence. Manual testing was required before deployment.

The decision was made to stop running in daemon mode.

To deploy a change, an ops engineer would write it, push it to git, and then test it on the node over SSH using the noop option; if it worked as expected, they would run the puppet agent to apply the change on that node.
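In practice that looks something like this (a sketch; sudo access and agent setup assumed):

    # Preview what would change on this node, without applying anything
    sudo puppet agent --test --noop
    # If the reported changes match the intent, apply them for real
    sudo puppet agent --test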

But what about the rest of the nodes? Slowly they drifted out of sync, and whenever you went to make a change you could find a bunch of unapplied changes and errors waiting for you. Confidence in making changes continued to decrease.

People in a rush would make the changes manually on the servers and...

...afterwards commit the changes to the repository.

The first love didn't look so perfect at that stage.

And then someone else gets into your life... fresh, new, exciting...

But before that, let's explore the problem:

- A single staging environment shared by multiple development teams wanting to test changes: lots of waiting, and lots of work to maintain the system.

Dev story

Who could help us? Gandalf, of course. (Codename for the project to solve the problem)

Gandalf came with a lot of new ideas:

- Use the AWS cloud to avoid the limits of physical infrastructure and enable on-demand resources
- Chef, the blonde girl of this story: fixing the relationship with Puppet was a lot of work... let's start fresh
- The gandalf tooling would allow development teams to create their own stacks, with full or limited functionality, to test

Now a team can deploy a full end-to-end environment as needed.

And then a different developer or team can create another environment with just the components needed for their tests.
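As an illustration only (the talk doesn't show the tooling's interface, so these commands, flags and component names are invented):

    # Spin up a full end-to-end stack for one team
    gandalf create-stack --name team1-e2e --components all
    # Another team brings up only the pieces it needs for its test
    gandalf create-stack --name team3-search --components api,search
    # Tear the stack down when the test is done
    gandalf delete-stack --name team3-search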

The idea worked really well... but going to production was still a problem.

The differences between dev/test and prod could lead to problems.

After trying different experiences, when things get too complicated it is time to simplify and commit.

Over time the company adopted many principles of agile and devops culture. Cross-functional teams were formed and work in parallel streams within lines of business. A central platform team takes care of the common elements of the stack (datacentres, cloud, OS, common services, etc.) while the teams are fully responsible for the application life cycle.

The platform team provides a unified platform image that can run in dev/test and staging.

The image is created using Puppet and includes all the common bits required (LDAP, NFS, Splunk forwarder, New Relic, etc.).
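Baking such an image might look roughly like this (a sketch; the manifest path and module layout are assumptions, and the talk doesn't name the image-building tool):

    # On a temporary build instance: apply the platform manifest...
    sudo puppet apply --modulepath=/srv/platform/modules /srv/platform/manifests/platform.pp
    # ...then snapshot the instance into an image (e.g. an AMI) for reuse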

New versions of the platform image are tested first in dev and staging before being promoted to production.

LoB teams package their applications as RPMs, and their CI systems upload them to a common repository.
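A minimal sketch of that CI step (the spec file, version, paths and repo host are all assumptions):

    # Build the application rpm from its spec file
    rpmbuild -bb myapp.spec
    # Publish it to the shared yum repository and refresh the repo metadata
    scp ~/rpmbuild/RPMS/x86_64/myapp-1.2.0-1.x86_64.rpm repo.internal:/srv/yum/
    ssh repo.internal "createrepo --update /srv/yum"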

As the platform image provides lots of common services, the application's dependencies decrease enormously, which simplifies the whole deployment process.

The combination of the rpm and the platform image can be tested.

When confident, you can deploy to production.

To easily apply different configuration across dev/test and prod environments, a configuration service provides configuration variables specific to each environment.

This way the combination rpm+image is the same in all environments.
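For example, at boot the instance might ask the service for its environment's values (the endpoint and variable names here are hypothetical; the talk doesn't describe the service's API):

    # Same rpm + same image everywhere; only the config service's answers differ
    ENV=$(curl -s http://config.internal/whoami/environment)
    DB_HOST=$(curl -s "http://config.internal/${ENV}/myapp/db_host")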

And then you have kids... you assume that they will follow in your footsteps, but they have their own preferences...

At this point, more environments end up in the cloud.

One of our principles is to value autonomy/accountability over economies of scale. Teams can make their own choices as they will be the ones maintaining the systems.

The idea is that sharing still happens, but instead of imposing a technology/methodology/practice we use other mechanisms to share knowledge:

- Guilds: where people share their wins/losses and experiences, allowing others to pick up ideas. For example: the cloud guild, the delivery engineering guild, the security guild or the Ops Dojo.

- Open Source model: A team creates a tool or service that other teams are interested in. A custodian is designated and guides the roadmap, but other teams/individuals can contribute or even fork if they have different needs.

For example, team 1 decides that they want to use a combination of technologies.

While maybe team 3 has a different preference according to their needs/skills.

Through the sharing process, some ideas will be reused by other teams when they are proven successful.

Next... are you up for the challenge? We are hiring:

http://careers.realestate.com.au