Posted on 08-May-2015
Scaling the Cloud
Bill Burns, Sr. Manager, Networks & Security
CISO Executive Forum, February 26, 2012
Thursday, March 8, 12
Agenda
• Netflix Background and Culture
• Why We Moved to the Cloud
• InfoSec Challenges in an IaaS Cloud
• InfoSec Perspective: Running In The Cloud
Netflix Business
• 24+ million members globally
• Streaming in 47 countries
• Watch on more than 700 devices
• 33% of US peak evening Internet traffic
(c) 2011 Sandvine
Background and Context
• High Performance Culture
• Fail Fast, Learn Fast ... Get Results
• Core Value: “Freedom & Responsibility”
Engineering-Centric Culture
• Sought the Cloud for Availability, Capacity
• ...and also found Agility
• DevOps / NoOps means engineering teams own:
  • New deployments and upgrades
  • Capacity planning & procurement
Freedom & Responsibility
Why Cloud?
• Transforming Netflix’s Core Business
• Availability, Capacity, Consistency
• Lower operational effort
• Mission Focus
• Agility
Demand vs Capacity
37x growth in 13 months
Data Center Capacity
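The slide gives only the endpoints of the growth. Assuming, purely for illustration, that the 37x increase compounded at a steady monthly rate, the implied month-over-month factor g and doubling time t_2 work out as:

```latex
g = 37^{1/13} = e^{(\ln 37)/13} \approx e^{0.278} \approx 1.32
\qquad
t_{2} = \frac{\ln 2}{\ln g} \approx \frac{0.693}{0.278} \approx 2.5\ \text{months}
```

Under that assumption, demand grew roughly 32% per month and doubled about every two and a half months, which is the pace a fixed data center build-out had to chase.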
Cloud: On-Demand Capacity
1. Demand: customer requests typically rise & fall over time
2. Reaction: the system automatically adds servers to, and removes them from, the application pool
3. Result: overall utilization stays constant
[Figure: (1) Demand, (2) # Servers, (3) Utilization]
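The three-step loop above can be sketched as a simple target-tracking rule: size the pool so that demand divided by total capacity stays at or below a target utilization. This is a minimal illustration, not Netflix's or AWS's autoscaler; the function names, the per-server capacity in requests per second, the 60% target, and the two-server floor are all hypothetical assumptions.

```python
import math


def desired_servers(demand_rps: float, capacity_per_server_rps: float,
                    target_utilization: float, min_servers: int = 2) -> int:
    """Size the pool so demand / (servers * capacity) stays at or below the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    needed = math.ceil(demand_rps / (capacity_per_server_rps * target_utilization))
    # Keep a floor of servers so the pool never shrinks to a single instance.
    return max(min_servers, needed)


def utilization(demand_rps: float, servers: int, capacity_per_server_rps: float) -> float:
    """Fraction of total pool capacity consumed by the current demand."""
    return demand_rps / (servers * capacity_per_server_rps)


if __name__ == "__main__":
    # Hypothetical demand curve (requests/sec) rising and falling over a day.
    for demand in [200, 800, 1500, 600, 150]:
        n = desired_servers(demand, capacity_per_server_rps=100, target_utilization=0.6)
        print(f"demand={demand:>5}  servers={n:>3}  utilization={utilization(demand, n, 100):.2f}")
```

Because the server count follows demand, utilization stays in a narrow band at or just under the target, which is the flat line in step 3. Production autoscalers also add cooldowns and smoothing that this sketch leaves out.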
InfoSec Challenges In An IaaS Cloud
• Utility
• Authenticity
• Possession
• Confidentiality
• Integrity
• Availability
InfoSec Challenge in an IaaS Cloud :: Confidentiality
InfoSec Challenge in an IaaS Cloud :: Integrity
InfoSec Challenge in an IaaS Cloud :: Availability
InfoSec Challenge in an IaaS Cloud :: Possession/Control
InfoSec Challenge in an IaaS Cloud :: Authenticity
Running In The Cloud :: InfoSec Perspective
InfoSec In The Cloud :: Harder
1. “Your host attacked me yesterday. Please stop!”
2. Dealing with other people’s traffic at your front door
3. Herding ephemeral instances with vendor applications
4. Trusting endpoints, infrastructure
5. Key management
InfoSec In The Cloud :: Easier
1.Reacting to business velocity
2.Detecting instance changes
3.Application ownership, management
4.Patching, updating
5.Availability, in a failure-prone environment
6.Embedding security controls
7.Least privilege enforcement
8.Testing/auditing for conformance
9.Consistency, conformity in build and launch
Old IT Way: Hand-Crafted Configuration
(c) Courtesy Flickr (piper, viamoi)
New: Automation
Change Controls :: Patching
• Goal: Running instances do not get patched
• Alternative:
  • Bake a new AMI for any change
  • Launch new instances in parallel
  • Kill the old instances
Change Controls :: Upgrades
• Bake a new AMI for any change
• Launch new instances in parallel
• Kill the old instances
Lesson Learned: Make the secure, consistent behavior the easier alternative.
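The bake, launch, kill sequence can be sketched as an immutable rollout: a change never touches a running instance; it produces a whole new set of instances from a new image, and the old set is discarded only once the new one is healthy. This is a minimal in-memory model, not an AWS API call sequence; `Instance`, `immutable_rollout`, the `healthy` callback, and the image names like `ami-base-v2` are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, List

_instance_ids = itertools.count(1)


@dataclass
class Instance:
    ami: str  # image the instance was launched from; never changed after launch
    instance_id: str = field(default_factory=lambda: f"i-{next(_instance_ids):04d}")


def immutable_rollout(fleet: List[Instance], new_ami: str,
                      healthy: Callable[[Instance], bool]) -> List[Instance]:
    """Replace the fleet with instances from new_ami instead of patching in place."""
    # 1. Launch a full replacement set in parallel with the running fleet.
    replacements = [Instance(ami=new_ami) for _ in fleet]
    # 2. If any replacement fails its health check, keep serving from the old fleet.
    if not all(healthy(inst) for inst in replacements):
        return fleet
    # 3. Cut over and "kill" the old instances (here: stop referencing them).
    return replacements


if __name__ == "__main__":
    old = [Instance(ami="ami-base-v1") for _ in range(3)]
    new = immutable_rollout(old, "ami-base-v2", healthy=lambda inst: True)
    print([(i.instance_id, i.ami) for i in new])
```

The secure path is also the easy one here: there is no function for editing a running instance, so the only way to ship a change is a fresh, consistent build.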
Availability :: Never Launch One of Anything
• Chaos Monkey induces failures, helps us practice recovery
• Balance across Availability Zones
• Applications automatically scale out, regenerate
• Conformity Monkey detects differences, improper settings
(c) Courtesy Flickr - Winton
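Two of the ideas on this slide, spreading a group across Availability Zones and randomly terminating instances in groups that opted in, can be sketched in a few lines. This is an illustrative model only, not the Chaos Monkey implementation; the function names, zone names, instance IDs, and the round-robin placement rule are assumptions.

```python
import random
from typing import Dict, List


def place_across_zones(count: int, zones: List[str]) -> List[str]:
    """Assign instances to zones round-robin so no zone holds the whole group."""
    if count < 2:
        raise ValueError("never launch one of anything: count must be >= 2")
    if len(zones) < 2:
        raise ValueError("balance across at least two Availability Zones")
    return [zones[i % len(zones)] for i in range(count)]


def survives_zone_loss(placement: List[str], lost_zone: str) -> bool:
    """True if at least one instance sits outside the failed zone."""
    return any(zone != lost_zone for zone in placement)


def chaos_monkey(groups: Dict[str, List[str]], opted_in: set,
                 rng: random.Random) -> Dict[str, str]:
    """Choose one random instance to terminate in each opted-in, non-empty group."""
    return {name: rng.choice(instances)
            for name, instances in groups.items()
            if name in opted_in and instances}


if __name__ == "__main__":
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    placement = place_across_zones(6, zones)
    print(placement, all(survives_zone_loss(placement, z) for z in zones))
    groups = {"api": ["i-0001", "i-0002", "i-0003"], "batch": ["i-0004", "i-0005"]}
    print(chaos_monkey(groups, opted_in={"api"}, rng=random.Random(7)))
```

Passing in a seeded `random.Random` keeps a practice run reproducible; in a real environment the auto-scaling group would then regenerate the terminated instance.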
Identity Challenges :: Vendors Lagging
• Cloud instances are ephemeral
• Customers cannot necessarily pick their IP addresses, ranges
• Instances need to base context on apps, services, tagging (not IPs)
• Vendors need better support for ephemeral licensing, stateless instances, self-config
• Machine capacity is no longer a CapEx friction item.
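Basing context on apps and tags rather than IPs can be sketched as an authorization check that looks only at an instance's application tag. The policy table, the `app` tag key, and the service names below are hypothetical; the point is that the IP field is carried but never consulted.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class InstanceInfo:
    instance_id: str
    private_ip: str  # reassigned on every launch, so deliberately ignored below
    tags: Dict[str, str] = field(default_factory=dict)


# Hypothetical policy: target service -> application roles allowed to call it.
ALLOWED_CALLERS: Dict[str, Set[str]] = {
    "billing-db": {"billing-api"},
    "billing-api": {"web-frontend"},
}


def may_connect(caller: InstanceInfo, target_service: str) -> bool:
    """Authorize a connection by the caller's 'app' tag rather than its IP address."""
    app = caller.tags.get("app")
    return app is not None and app in ALLOWED_CALLERS.get(target_service, set())
```

Two instances holding the same IP at different times get different answers, and a replacement instance inherits access simply by carrying the same tag. AWS security groups offer a related idea natively, since a rule can name another security group as its source instead of an IP range.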
Conformity & Consistency
Automation = Conformity & Consistency
• All apps, tiers are Highly Available
• Secure defaults applied automatically
• Replacement instances look just like the originals
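Applying secure defaults automatically can be sketched as a launch template: every launch starts from the same hardened settings, application teams may override only what policy allows, and identical inputs always produce identical specs, so replacements match the originals. The setting names, the placeholder image name `ami-base-hardened`, and the locked-key list are hypothetical.

```python
import copy
from typing import Any, Dict

# Hypothetical secure defaults applied to every launch; values are illustrative.
SECURE_DEFAULTS: Dict[str, Any] = {
    "ami": "ami-base-hardened",  # placeholder name for the baked base image
    "ssh_password_auth": False,
    "open_ports": [443],
    "monitoring_agent": True,
}

# Settings application teams may not override.
LOCKED_KEYS = {"ssh_password_auth", "monitoring_agent"}


def launch_spec(app_overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge permitted app settings over the secure defaults."""
    blocked = LOCKED_KEYS.intersection(app_overrides)
    if blocked:
        raise ValueError(f"cannot override locked settings: {sorted(blocked)}")
    # Deep copies keep callers from mutating the shared defaults.
    spec = copy.deepcopy(SECURE_DEFAULTS)
    spec.update(copy.deepcopy(app_overrides))
    # Same inputs always yield the same spec, so replacements match originals.
    return spec
```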
Baked-In Security Controls :: Netflix Simian Army
• Cloud Ready Dashboard
• Identify and test common failure modes
• Continuous, aggressive monitoring, testing
• Mostly opt-in
• Chaos Monkey - Randomly kills instances
• Conformity Monkey - Various policy checks
• Latency Monkey - Induces random latency
• Janitor Monkey - Kills orphaned instances
• Security Monkey - Various security checks
• Exploit Monkey - Vuln Scans / Pen Tests
• Unnamed - File integrity monitoring, HIDS
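The Janitor, Conformity, and Security Monkey descriptions above are one-liners, so the following is only a loose sketch of the kind of checks they imply, not code from the Simian Army project. It flags instances that no Auto Scaling group owns and instances running unapproved images or exposing unexpected ports; the `Instance` fields, approved-image set, and port allow-list are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Instance:
    instance_id: str
    asg: Optional[str]  # owning Auto Scaling group; None if launched by hand
    ami: str
    open_ports: List[int] = field(default_factory=list)


def orphaned(instances: List[Instance]) -> List[str]:
    """Janitor-style check: instances that no Auto Scaling group owns."""
    return [i.instance_id for i in instances if i.asg is None]


def policy_violations(instances: List[Instance], approved_amis: Set[str],
                      allowed_ports: Set[int]) -> List[Tuple[str, str]]:
    """Conformity/Security-style checks: unapproved images, unexpected open ports."""
    problems = []
    for inst in instances:
        if inst.ami not in approved_amis:
            problems.append((inst.instance_id, f"unapproved AMI {inst.ami}"))
        for port in inst.open_ports:
            if port not in allowed_ports:
                problems.append((inst.instance_id, f"unexpected open port {port}"))
    return problems
```

Run continuously, checks like these turn "consistency" from a review-time aspiration into something measured on every instance.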
Embedded Security Controls
• Controls baked into the “base AMI”
• Controls placed near the data
• Applied as machines die and are reborn
• Security controls are “Data Center agnostic”
• Provide a “single pane of glass” awareness
• Span all regions, data centers
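One way a control can be baked into the base AMI and reapplied on every rebirth is file integrity monitoring: record a digest of each sensitive file when the image is baked, then compare every running instance against that baseline. This is a minimal SHA-256 sketch, not the unnamed tool from the previous slide; the function names and example paths are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def read_files(paths: Iterable[str]) -> Dict[str, bytes]:
    """Load file contents from disk; missing files are simply skipped."""
    return {p: Path(p).read_bytes() for p in paths if Path(p).is_file()}


def snapshot(files: Dict[str, bytes]) -> Dict[str, str]:
    """Baseline: SHA-256 digest per file, recorded when the base image is baked."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def drift(baseline: Dict[str, str], current: Dict[str, bytes]) -> Dict[str, str]:
    """Compare a running instance's files with the baked baseline."""
    live = snapshot(current)
    report = {}
    for path, expected in baseline.items():
        if path not in live:
            report[path] = "missing"
        elif live[path] != expected:
            report[path] = "modified"
    for path in live.keys() - baseline.keys():
        report[path] = "unexpected"
    return report
```

Because instances are never patched in place, any drift from the baked baseline is a finding in its own right rather than noise from routine maintenance.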
CISO Forum Take-Aways
1. The public cloud / IaaS is not just a technology.
2. Cloud IaaS is disruptive to Operations, Engineering, Vendors, Auditors.
3. Your Data is your new perimeter.
4. Design for failures in everything.
5. IaaS providers care about their infrastructure.
6. Public cloud Information Security is still about the basics, but in a new context.
7. There’s still plenty left to resolve, like trusted infrastructure, strong key management, COTS support.
Questions