The devops laboratory - 1 year later
Uploaded by javier-turegano-molina
The Devops Lab + 1 year
Last Conference Melbourne 2016
About Me: Javier Turegano
#devops
#open-source
#IT-leadership
@setoide
#web-operations
This is me and my passions.
My current gig
REA Group is a market-leading digital media business specialising in property.
I've been working for REA for the last four and a half years.
We operate heavy-traffic sites around the world. Some of the things that make REA special are:
- Innovation
- Thought leadership in areas like agile, lean and devops
The only constant is change, always looking to improve.
DEVOPS DAYS 2015
This talk
The devops Laboratory + 1 year
This talk is about the different experiments we've run to try to create a devops culture at REA.
As probably Nigel could explain better:
Complex systems are complex and organizations like REA are complex in many dimensions: business, engineering, IT systems, etc...
The approach
Change something and observe. Be brave. Repeat.
At the beginning...
(Diagram: a central Site Operations team of Ops engineers, separate from Delivery Team 1, Delivery Team 2 ... Delivery Team N, each made up of Devs)
Delivery vs Site Operations
If only there was someone around...
Ops:
- To modify the code
- To help understand how the application works
Devs:
- To help us deploy to prod
- To help us with some non-functional requirements
Site Ops Manager
The night is dark and full of incidents.
Days since a full night's sleep counter
3-4 alerts per night
Happy engineer getting off pager.
Hire all the heroes
Ops had to understand and troubleshoot a large, complex set of systems:
Storage/Networks/Systems/Apps/Monitoring/Data/Security etc...
That made hiring difficult because: Heroes don't scale
EXPERIMENT 1: Placements
Placements
(Diagram: Site Operations with its Ops engineers; Delivery Team 1, Delivery Team 2 ... Delivery Team N with their Devs; one Ops engineer placed in Delivery Team 2)
Short temporary placements of engineers in a different functional area, normally for a few weeks.
Allocated capacity
Working closer to where the action is
Rotations
(Diagram: Ops and Dev engineers rotating between Site Operations and the delivery teams)
Knowledge of the full stack
You would never stop learning
Handovers and ramp-up for a new area were difficult
There were still conflicting priorities
Alerts and incidents were still being managed by the central team
Meet ADO, one of our first Devs to be fully knighted by the SiteOps team
Ops in Delivery
Devs in Site Ops
Am I going to fit in there?
EXPERIMENT 2
The tooling team
As many companies have done
Create a centralized team to drive automation, continuous delivery, cloud adoption, etc...
PROBLEMS:
Painful manual deployments
QA blessing to go to prod
Coordination wall
1 staging fits all
Gandalf
(Diagram: Site Operations, Delivery Team 1 and Delivery Team 2, plus a central Gandalf team made up of Ops, Dev and QA engineers)
The approach
Centralized team
Build tools (#cloud + #chef + #git)
Solution that fits all needs
Influence teams towards adoption
(Diagram: E2E environment with Users, Web 1, Web 2, API 1, Mobile, Search Engine, Database, Backend 1 and Backend 2)
E2E for every developer
This is a simplified version of an E2E environment. It was one of the achievements of the Gandalf team and, for a long time, it gave us better opportunities to develop and test changes that affected multiple components.
Challenges
Pie charts, sorry
Just an example of some of the tech challenges the team was going through as they tried to provide stable infrastructure for EVERYONE.
EXPERIMENT 3
Secondments
Send your champions to contaminate other areas with their passion
Secondments
(Diagram: Site Operations with Ops and Dev engineers, and a delivery team of Devs with one Ops engineer seconded to it)
Longer term allocations to a team
Ops still reported/belonged to the SiteOps team
Different approach
- Champions in each team to build the needed capabilities: automation, monitoring, performance
Some pluses
Priorities dictated by your functional area
Engagement with the team
Better understanding of pain points
Early input in the project
And what about pager?
(Diagram: the same Site Operations and delivery team layout, with pager coverage labelled Day, Day and Day/Nights)
Longer term allocations to a team
Ops still reported/belonged to the SiteOps team
EXPERIMENT 4
Automation as part of Delivery
Example of optimization from within a team instead of tackling the full-company problem.
Autobots
(Diagram: Site Operations, Delivery Team 1, Delivery Team 2 and the central Gandalf team, plus an Autobots team of Devs and Ops)
The Autobots team was part of one of the Delivery areas and was focused on automating some parts of their delivery process.
They managed to automate some really complex processes:
- Schemabot: database schema changes in an automated manner.
- Deploybot: managed the deployment. One of its components, the netscaler gem, was afterwards used by multiple teams.
The idea of copying the open source model, with teams looking at what other teams have come up with, has recurred over time and become one of the most successful patterns at REA.
EXPERIMENT 5
Ops as an attribute of Business areas
Business Areas + Lean GI
(Diagram: lines of business LoB1, LoB2, International ... LoBN and a Global Infrastructure layer, with Ops engineers)
Different business areas, highly independent: Develop + Operation
A very lean layer of Global Infrastructure to support them
Global Infrastructure
Thin layers of shared services and vendor management
The principle was to promote TMI: Team Managed Infrastructure.
Cloud: many accounts
Cons: does everybody need to know about infrastructure, networks, etc.?
(Diagram: LoB A with Team 1 (midsize initiative X), Team 2 (small initiative Y), Team 3 (big initiative G) and Team 4 (midsize initiative Z), each a mix of the roles below)
Legend: IM = Iteration Manager, BA = Business Analyst, UX = User Experience, TechL = Tech lead, Dev = Developer, QA = Quality Assurance, Ops = Operations
Negative
Priorities dictated by your business area
New Silos
Lost sense of community
Positive
Focus - Get Shit Done
Engagement with the team +++
Input into the roadmap
The AA virtuous circle
Autonomy
Accountability
We give autonomy to the business areas to choose the best tools/practices for their areas. They will have to support and maintain what they create, which drives the Accountability.
Can you spot the Ops engineer?
Devs step up (Pager, deployments, metrics, performance, etc...)
Day pager going to devs
Escalate if needed after troubleshooting
Proxy knowledge
Pick up BAU
Deploy something that hasn't been deployed
Tom, our ops engineer, can focus on general improvements to operations, like:
Exploring a new CDN
Regression testing in Operations
Automating Security patches
Etc...
If the problem is beyond the knowledge of the engineers, they can escalate it to the Ops representative, and the good thing is that they will cache the knowledge.
The role of the ops in LoBs has evolved:
Their role (boost operations capacity in their area)
Enable previously disabled people
Early input into the projects
And what about pager?
(Diagram: the same LoB1, LoB2, International ... LoBN and Global Infrastructure layout as before)
Different business areas, highly independent: Develop + Operation
A very lean layer of Global Infrastructure to support them
War room becomes the exception. For example, this all-hands-on-deck collaboration to tackle Heartbleed as soon as possible.
EXPERIMENT 5
The era of Guilds
The rise of NEW Silos :'(
(Diagram: the old Delivery and Operations silos replaced by LoB A, LoB B and LoB C, each containing its own teams)
2 challenges so far:
- We need to increase our Ops capability across the organisation
- We need to minimise the walls of the new Silos.
Guilds to the rescue!
Feedback
Happiness
Public speaking
Guild of guilds/metaguild
Cloud
Delivery Engineering
Ruby
Security
Lean/Agile
Ops Dojo
What are guilds?
- Communities of interest around different topics
- Opt-in model
- They are horizontal
EXPERIMENT 6
The rise of the Delivery Engineering teams
(Diagram: LoB with Team 1 (midsize initiative X), Team 2 (small initiative Y), Team 3 (big initiative G), Team 4 (midsize initiative Z), Team 6 (small initiative Y) ... Team N (small initiative Y), with Ops present in only some of the teams)
LoB: TOO MANY STREAMS
The previous model was quite successful. But, as we can see, as we became faster the business areas tried to run more streams in parallel, and the Ops capability sometimes wasn't readjusted to match.
How many ops are too many ops?
With areas running so many concurrent projects
Push to regroup again
But how is this different?
Previous investments paying off. Devs++
Focus on areas that can boost the full group
(Diagram: LoB A with Team 1 (midsize initiative X), Team 2 (small initiative Y), Team 3 (big initiative G) and Team 4 (midsize initiative Z), plus Team 5 Delivery Engineering, which groups Ops engineers)
Sometimes called Devops (arrrgggggg) or BAU teams.
Focus: go fast from idea to prod
Examples: MaD walking skeleton, Group Delivery Engineering
Danger: BAU and operations being brought back to this group, undoing the previous benefits
KUDOS TO ANGUS: FIRST GRAD ON PAGER
Night pager improved over time. And finally we had our first grad on pager.
EXPERIMENT 7
Sec + DevOps
Adding Security to the equation
(Diagram: LoB1, LoB2, International ... LoBN and Global Infrastructure, now with a Security group of Sec engineers added)
Different business areas, highly independent: Develop + Operation
A very lean layer of Global Infrastructure to support them
Security improvements roadmap
Security as consultants/coaches/experts
Teams are accountable for security
Lean technique: A3s (find out american sizes)
Storytelling
(PIC) A3s
Sec Consulting
(Diagram: the same business areas, Global Infrastructure and Security layout as above)
EXPERIMENT 8
Leverage vs Autonomy
EXPERIMENT 9
Finance + DevOps
EXPERIMENT 10
????
http://rea.to/careers
TL;DR: Which one worked?
There are only a few problems that can't be solved by cake
QUESTIONS? FEEDBACK?
THANKS!
@setoide
The experiments presented are just examples of what we have tried at some point in time. They had different levels of success, and the results are based on the state of our own business and our own journey.
Run your own experiments. Try new things. Monitor the results.