The devops laboratory - 1 year later

The Devops Lab + 1 year

Last Conference Melbourne 2016

About Me: Javier Turegano

#devops

#open-source

#IT-leadership

@setoide

#web-operations

This is me and my passions.

My current gig

REA Group is a market-leading digital media business specialising in property.

I've been working for REA for the last four and a half years.

We operate heavy-traffic sites around the world. Some of the things that make REA special are:

- Innovation
- Thought leadership in areas like agile, lean and devops

The only constant is change, always looking to improve.

DEVOPS DAYS 2015

This talk

The devops Laboratory + 1 year

This talk is about the different experiments we've run to try to create a devops culture at REA.

As Nigel could probably explain better:

Complex systems are complex and organizations like REA are complex in many dimensions: business, engineering, IT systems, etc...

The approach

Change something and observe. Be brave. Repeat.

At the beginning...

[Diagram: Site Operations (4 Ops) sits apart from Delivery Teams 1 to N (9 Devs each)]

Delivery vs Site Operations

If only there was someone around...

Ops:
- To modify the code
- To help understand how the application works

Devs:
- To help us deploy to prod
- To help us with some non-functional requirements

Site Ops Manager

The night is dark and full of incidents.

Days since a full night's sleep counter

3-4 alerts per night

Happy engineer getting off pager.

Hire all the heroes

Ops had to understand and troubleshoot a massive, complex set of systems

Storage/Networks/Systems/Apps/Monitoring/Data/Security etc...

That made hiring difficult, because heroes don't scale

EXPERIMENT 1: Placements

Placements

[Diagram: Site Operations (4 Ops), Delivery Team 1 (9 Devs), Delivery Team 2 (9 Devs plus one Ops on placement), Delivery Team N (9 Devs)]

Short temporary placements of engineers in a different functional area, normally lasting a few weeks.

Allocated capacity

Working closer to where the action is

Rotations

[Diagram: Ops and Devs rotating between Site Operations and the delivery teams]

Knowledge of the full stack

You would never stop learning

Handovers and ramp-up for a new area were difficult

There were still conflicting priorities

Alerts and incidents were still being managed by the central team

Meet ADO, one of our first Devs to be fully knighted by the SiteOps team

Ops in Delivery

Devs in Site Ops

Am I going to fit in there?

EXPERIMENT 2

The tooling team

As many companies have done

Create a centralized team to drive automation, continuous delivery, cloud adoption, etc...

PROBLEMS:

Painful manual deployments

QA blessing to go to prod

Coordination wall

1 staging fits all

Gandalf

[Diagram: Site Operations (3 Ops), Delivery Teams 1 and 2 (8 Devs each), and the new Gandalf team (3 Ops, 2 Devs, 1 QA)]

The approach

Centralized team

Build tools ( #cloud + #chef + #git )

Solution that fits all needs

Influence teams to adopt them

[Diagram: E2E environment with Users, Web 1, Web 2, API 1, Mobile, Search Engine, Database, Backend 1 and Backend 2]

E2E for every developer

This is a simplified version of an E2E environment. It was one of the achievements of the Gandalf team, and for a long time it gave us better opportunities to develop and test changes that affected multiple components.

Challenges

Pie charts, sorry

Just an example of some of the tech challenges the team was going through as they tried to provide stable infrastructure for EVERYONE.

EXPERIMENT 3

Secondments

Send your champions to contaminate other areas with their passion

Secondments

[Diagram: Site Operations (4 Ops) and two delivery teams of 12 Devs, one of them with a seconded Ops]

Longer-term allocations to a team

Ops still reported/belonged to the SiteOps team

Different approach

- Champions in each team to build the needed capabilities: automation, monitoring, performance

Some pluses

Priorities dictated by your functional area

Engagement with the team

Better understanding of pain points

Early input in the project

And what about pager?

[Diagram: Site Operations and the delivery teams, with pager coverage labelled Day, Day and Day/Nights]

EXPERIMENT 4

Automation as part of Delivery

Example of optimization from within a team instead of tackling the full-company problem.

Autobots

[Diagram: Site Operations (3 Ops), Delivery Team 1 (8 Devs), Delivery Team 2 (4 Devs), Gandalf (3 Ops, 2 Devs, 1 QA) and Autobots (2 Devs, 1 Ops)]

The Autobots team was part of one of the Delivery areas and focused on automating parts of their delivery process.

They managed to automate some really complex processes:

- Schemabot: applied database schema changes in an automated manner.
- Deploybot: managed the deployment. One of its components, the netscaler gem, was afterwards used by multiple teams.

The idea of copying the open source model, with teams looking at what other teams have come up with, has repeated over time and become one of the most successful patterns at REA.

EXPERIMENT 5

Ops as an attribute of Business areas

Business Areas + Lean GI

[Diagram: business areas LoB1, LoB2, International and LoBN, four Ops, and a Global Infrastructure layer]

Different business areas, highly independent (Develop + Operation)

A very lean layer of Global Infrastructure to support them

Global Infrastructure

Thin layers of shared services and vendor management

The principle was to promote TMI: Team Managed Infrastructure.

Cloud Many accounts

Cons: Does everybody need to know about infrastructure/networks/etc...?

[Diagram: LoB A with four cross-functional teams (Team 1: midsize initiative X, Team 2: small initiative Y, Team 3: big initiative G, Team 4: midsize initiative Z), with Ops embedded in some of the teams]

Legend: IM = Iteration Manager, BA = Business Analyst, UX = User Experience, TechL = Tech lead, Dev = Developer, QA = Quality Assurance, Ops = Operations

Negative

Priorities dictated by your business area

New Silos

Lost sense of community

Positive

Focus - Get Shit Done

Engagement with the team +++

Input into the roadmap

The AA virtuous circle

Autonomy

Accountability

We give autonomy to the business areas to choose the best tools/practices for their areas. They will have to support and maintain what they create, which drives the Accountability.

Can you spot the Ops engineer?

Devs step up (Pager, deployments, metrics, performance, etc...)

- Day pager going to devs
- Escalate if needed after troubleshooting
- Proxy knowledge
- Pick up BAU
- Deploy something that hasn't been deployed

Tom, our ops engineer, can focus on general improvements to operations, like:

Exploring a new CDN

Regression testing in Operations

Automating Security patches

Etc...

If the problem is beyond the knowledge of the engineers, they can escalate it to the Ops representative, and the good thing is that they will cache the knowledge.

The role of the ops in LoBs has evolved:

Their role (boost operations capacity in their area)

Enable previously disabled people

Early input into the projects

And what about pager?

[Diagram: business areas LoB1, LoB2, International and LoBN, four Ops, and a Global Infrastructure layer]

War room becomes the exception. For example, this all-hands-on-deck collaboration to tackle Heartbleed as soon as possible.

EXPERIMENT 5

The era of Guilds

The rise of NEW Silos :'(

[Diagram: the Delivery and Operations silos next to LoB A, LoB B and LoB C, each LoB containing several teams]

2 challenges so far:

- We need to increase our Ops capability across the organisation
- We need to minimise the walls of the new Silos.

Guilds to the rescue!

Feedback

Happiness

Public speaking

Guild of guilds/metaguild

Cloud

Delivery Engineering

Ruby

Security

Lean/Agile

Ops Dojo

What are guilds?

- Communities of interest around different topics
- Opt-in model
- They are horizontal

EXPERIMENT 6

The rise of the Delivery Engineering teams

[Diagram: an LoB running Teams 1 to N in parallel on midsize, small and big initiatives, with only a few Ops spread across them]

LoB TOO MANY STREAMS

The previous model was quite successful, but as we became faster the business areas tried to run more streams in parallel, and the Ops capability sometimes wasn't readjusted to match...

How many ops are too many ops?

With areas running so many concurrent projects

Push to regroup again

But how is this different?

Previous investments paying off. Devs++

Focus on areas that can boost the full group

[Diagram: LoB A with Teams 1 to 4 on initiatives X, Y, G and Z, plus a new Team 5: Delivery Engineering, staffed with Ops, Dev and QA]

Sometimes called Devops (arrrgggggg) or BAU teams.

Focus: go fast from idea to prod

Examples: MaD walking skeleton, Group Delivery Engineering

Danger: BAU and operations get brought back to this group, undoing the previous benefits

KUDOS TO ANGUS

FIRST GRAD ON PAGER

Night pager improved over time. And finally we had our first grad on Pager.

Kudos to Angus.

EXPERIMENT 7

Sec + DevOps

Adding Security to the equation

[Diagram: business areas LoB1, LoB2, International and LoBN over a Global Infrastructure layer, now with a Security group alongside]

Security improvements roadmap

Security as consultants/coaches/experts

Teams are accountable for security

Lean technique: A3s (find out american sizes)

Story telling

(PIC) A3s

Sec Consulting

EXPERIMENT 8

Leverage vs Autonomy

EXPERIMENT 9

Finance + DevOps

EXPERIMENT 10

????

http://rea.to/careers

TL;DR: Which one worked?

There are only a few problems that can't be solved by cake

QUESTIONS? FEEDBACK?

THANKS!

@setoide

The experiments presented are just examples of what we have tried at some point in time. They had different levels of success, and the results are based on the state of our own business and our own journey.

Run your own experiments. Try new things. Monitor the results.