2011 09 18 United "Platitudes, reality and promise"
Transcript of 2011 09 18 United "Platitudes, reality and promise"
Organized by
THE PLATITUDES, THE REALITY AND THE PROMISE
Where Did The High Performers Come From?
High Performing IT Organizations
High performers maintain a posture of compliance
Fewest number of repeat audit findings
One-third amount of audit preparation effort
High performers find and fix security breaches faster
5 times more likely to detect breaches by automated control
5 times less likely to have breaches result in a loss event
When high performers implement changes…
14 times more changes
One-half the change failure rate
One-quarter the first fix failure rate
10x faster MTTR for Sev 1 outages
When high performers manage IT resources…
One-third the amount of unplanned work
8 times more projects and IT services
6 times more applications
Source: IT Process Institute, 2008
Common Traits of High Performers
Source: IT Process Institute
Change management
Causality
Compliance and continual reduction of operational variance
Culture of…
Integration of IT operations/security via problem/change management Processes that serve both organizational needs and business objectives Highest rate of effective change
Highest service levels (MTTR, MTBF) Highest first fix rate (unneeded rework)
Production configurations Highest level of pre-production staffing Effective pre-production controls Effective pairing of preventive and detective controls
Visible Ops: Playbook of High Performers
The IT Process Institute has been studying
high-performing organizations since 1999
• What is common to all the high performers?
• What is different between them and average
and low performers?
• How did they become great?
Answers have been codified in the Visible
Ops Methodology
The “Visible Ops Handbook” is available
from the ITPI www.ITPI.org
2007: Three Controls Predict 60% Of Performance
To what extent does an organization define, monitor and
enforce the following?
• Standardized configuration strategy
• Process discipline
• Controlled access to production systems
Source: IT Process Institute, 2008
Organized by
THE DARKEST MOMENT IN MY JOURNEY
plat·i·tude: noun \ˈpla-tə-ˌtüd, -ˌtyüd\
1: the quality or state of being dull or
insipid
2: a banal, trite, or stale remark
Platitudes
“Buy low, sell high”
More Platitudes
”Speak in the language of the business”
”Help foster the right tone at the top”
"Build a genuine relationship with your fellow stakeholders”
”Be savvy and take advantage of compelling events”
"Create real security programs, so that compliance will be free”
”Because security is everyone's responsibility”
"Don't let the auditors create your compliance program for you”
“Assess, plan, design, execute, monitor”
“Build the right control environment, and security and compliance will come”
Tough Love From Ari Balogh
Why Was I So Unsatisfied With The State Of IT Practice?
IT operations work continued to be viewed as tactical
Information security and compliance programs were sucking all the air out of the
room (due to scoping problems)
The activation energy for successful improvement programs was still too high
The IT operations and Information Security issues overshadowed by development
• Issues are amplified 10x in production: outages, findings, lawsuits
• Technical debt builds up over time
• IT operations is often the constraint in the organization
Linkage of IT performance to business performance not obvious enough
“Why doesn’t the business care? I found the pump handle!”
Seeing A Bigger Problem
Operations Sees… Fragile applications are prone to failure Long time required to figure out “which
bit got flipped” Detective control is a salesperson Too much time required to restore
service Too much firefighting and unplanned
work Urgent security rework and remedation Planned project work cannot complete Frustrated customers leave Market share goes down Business misses Wall Street
commitments Business makes even larger promises to
Wall Street
Dev Sees… More urgent, date-driven projects put
into the queue Even more fragile code (less secure) put
into production More releases have increasingly
“turbulent installs” Release cycles lengthen to amortize
“cost of deployments” Failing bigger deployments more
difficult to diagnose Most senior and constrained IT ops
resources have less time to fix underlying process problems
Ever increasing backlog of infrastructure and security projects that could fix root cause and reduce costs
Ever increasing amount of tension between IT Ops and DevelopmentThese aren’t IT Operations or Infosec problems…
These are business problems!
14
Infosec Can Break A Core Chronic Conflict In IT *
Every IT organization is pressured to simultaneously:
• Respond more quickly to urgent business needs
• Provide stable, secure and predictable IT service
Source: The authors acknowledge Dr. Eliyahu Goldratt, creator of the Theory of Constraints and author of The Goal, has written extensively on the theory and practice of identifying and resolving core, chronic conflicts.
Words often used to describe ITIL process owners:“hysterical, irrelevant, bureaucratic, bottleneck, difficult to understand, not aligned
with the business, immature, shrill, perpetually focused on irrelevant technical minutiae…”
Framed This Way, Help Can Come From A Surprising Place
The VP Application Development will often have the following complaints:
• IT Operations is the bottleneck
• We complete the code, but it takes too long for IT Operations to get the code
into production
• Environments are never available when we need them
• Security changes break the production environment on rollout
• Releases often cause chaos and disruption to all the other production services
• Turbulent installs have become the norm: 30 min installs take 3 days
• Due to slow OS upgrades, applications delayed by 2 quarters
• We are always late getting features to market
Organized by
WHAT THESE BREAKTHROUGHS LOOK LIKE
A Reframed IT Operations Problem Statement
Increase flow from Dev to Production
• Increase throughput
• Decrease WIP
Our goal is to create a system of operations that allows
• Planned work to quickly move to production
• Ensure service is quickly restored when things go wrong
• Information security built in every stage of Development, Project Management, and IT Operations
How does this relate to Visible Ops?
• We focused much on “unplanned work”
• What’s happening to all the planned work?
• At any given time, what should IT Ops be working on?
• Now we are focusing on the flow of planned work
Goal #1: Decrease Cycle Time Of Releases
Create determinism in the release process
Move packaging responsibility to development
Release early and often
Decrease cycle time
• Reduce deployment times from 6 hours to 45 minutes
• Refactor deployment process that had 1300+ steps spanning 4 weeks
Never again “fix forward,” instead “roll back,” escalating any deviation from plan to Dev
Verify for all handoffs (e.g., correctness, accuracy, timeliness, etc…)
Ensure environments are properly built before deployment begins
Control code and environments down the preproduction runways
Hold Dev, QA, Int, and Staging owners accountable for integrity
Goal #2: Increase Production Rigor
Define what work is and where work can come from
Protect the integrity of the work queue (e.g., are checks being written than won’t clear?)
To preserve and increase throughput, elevate preventive projects and maintenance tasks
Document all work, changes and outcomes so that it is repeatable
Ops builds Agile standardized deployment stories, to be completed after Dev sprints are
complete
Maintains adequate situational awareness so that incidents could be quickly detected and
corrected
Standardize unplanned work and escalations
Always seeking to eradicate unplanned work and increase throughput
Lean Principle: “Better -> Faster -> Cheaper”
Organized by
REDUCING TRANSFORMATION ACTIVATION ENERGY
The Prescriptive DevOps Cookbook
Capture and codify how to start and finish
successful DevOps transformations
• Create isomorphic mapping between plant floors and
IT shops
• Co-authoring with Patrick DeBois, Mike Orzen, John
Willis
• Describe in detail how to replicate the transformations
describe in “When IT Fails: The Novel”
Goals
• How does Development, IT Operations and Infosec
become dependable partners
• How do they work together to solve business
problems (and Infosec, too)
By The Visible Ops Team:Gene Kim, Kevin Behr, George Spafford
The Theory of Constraints Approach To Visible Ops
Dr. Goldratt wrote The Goal in 1984,
describing Alex’s challenge to fix his plant’s
cost and due date issues within 90 days
Some tenets that went against common
wisdom:
• Every flow of work has a constraint/bottleneck
• Any improvement not made at the bottleneck
is merely an illusion
• Fallacy of cost accounting as operational
management tool
When IT Fails: The NovelDay 1
Steve Masters, CEO
Bill Palmer,
VP IT Operations
Parts Unlimited
$4B revenue/year
“We’re not Google.
IT isn’t a core competency”
Bill’s First Month On The Job
Day 1: CEO loses chairmanship of the company, due to
inability to deliver critical project that will “close the gap”
Day 2: The payroll outage, due to a tokenization rollout
Day 3: VP IT Operations thrown under the bus by Marketing
and Development: deployment in 9 days
Day 4: 900 IT general control deficiencies in SOX-404 audit
Day 12: The launch…
John Pesche, CISO
CISO for 12 years
39 years old
Aggressive career climber
Ex-Big Four auditor
John Pesche, CISO
John Pesche, CISO
John Pesche, CISO
Assumption #1
Infosec wins based on how much work it can put into the IT
system
How much budget can we get?
How many of the vulnerabilities can we get closed?
How much can we push line managers and workers to close
security holes?
Breakthrough #1
He realizes that excess control complexity continually adds
entropy to the rest of the system
He becomes the “forebrain of the organization:” what data
really needs to be protected, where controls reliance really
resides, and where you don't have sole reliance on a technical
control
Shrinks the scope of the SOX-404 and PCI audit, doubling the
capacity of the IT Operations organization
Assumption #2
Infosec wins when it meddles with IT daily work
Breakthrough #2
He shifts his focus from the work center level, to the plant line level
He realizes that he can help design a control environment and build
a system of work where Dev and Ops can be relied upon so that
they work together to simultaneously achieve:
• fast flow of features into production
• deliver services in production that are:
Attributes of Rugged DevOps
• Scalability, availability, survivability, sustainability, security, supportability,
defensibility
Assumption #3
Cycles for Infosec come at the expense of Development and IT
Operations
Breakthrough #3
IT Operations constraint capacity quadruples
Development release rate goes from quarterly to three times
daily
10% of all Dev and Ops cycles go to security requirements
Security mean time to Find/Fix goes from quarters to days to
hours
High Performing IT Organizations
High performers maintain a posture of compliance
Fewest number of repeat audit findings
One-third amount of audit preparation effort
High performers find and fix security breaches faster
5 times more likely to detect breaches by automated control
5 times less likely to have breaches result in a loss event
When high performers implement changes…
14 times more changes
One-half the change failure rate
One-quarter the first fix failure rate
10x faster MTTR for Sev 1 outages
When high performers manage IT resources…
One-third the amount of unplanned work
8 times more projects and IT services
6 times more applications
Source: IT Process Institute, 2008
Triumph Of The CISO
Interested?
If you’re interested in When IT Fails: The Novel, sign up for the
newsletter at http://whenitfails.org
Or:
# mail [email protected]
Subject: novel
# mail [email protected]
Subject: cookbook
Resources
From the IT Process Institute www.itpi.org• Both Visible Ops Handbooks• ITPI IT Controls Performance Study
“Lean IT” by Orzen and Bell• Winner of the Shingo Prize 2011
Rugged Software by Corman, et al: http://ruggedsoftware.org
“Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation” by Humble, Farley
Follow Gene Kim• @RealGeneKim• mailto:[email protected] • http://realgenekim.me/blog