Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013
![Page 1: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/1.jpg)
Cloud Native at Netflix
What Changed?
July 2013
Adrian Cockcroft
@adrianco #netflixcloud @NetflixOSS
http://www.linkedin.com/in/adriancockcroft
![Page 2: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/2.jpg)
Cloud Native
Netflix Architecture
NetflixOSS
![Page 3: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/3.jpg)
Cloud Native
What is it?
Why?
![Page 4: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/4.jpg)
Engineers
Solve hard problems
Build amazing and complex things
Fix things when they break
![Page 5: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/5.jpg)
Strive for perfection
Perfect code
Perfect hardware
Perfectly operated
![Page 6: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/6.jpg)
But perfection takes too long…
Compromises…
Time to market vs. Quality
Utopia remains out of reach
![Page 7: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/7.jpg)
Where time to market wins big
Making a land-grab
Disrupting competitors (OODA)
Anything delivered as web services
![Page 8: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/8.jpg)
Observe
Orient
Decide
Act
Land grab opportunity
Competitive move
Customer Pain Point
Analysis
Get buy-in
Plan response
Commit resources
Implement
Deliver
Engage customers
Research alternatives
BIG DATA
INNOVATION
CULTURE
CLOUD
Measure customers
Colonel Boyd, USAF: “Get inside your adversaries' OODA loop to disorient them”
![Page 9: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/9.jpg)
How Soon?
Code features in days instead of months
Get hardware in minutes instead of weeks
Incident response in seconds instead of hours
![Page 10: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/10.jpg)
A new engineering challenge
Construct a highly agile and highly available service from ephemeral and assumed broken components
![Page 11: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/11.jpg)
Inspiration
![Page 12: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/12.jpg)
How to get to Cloud Native
Freedom and Responsibility for Developers
Decentralize and Automate Ops Activities
Integrate DevOps into the Business Organization
Re-Org!
![Page 13: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/13.jpg)
Four Transitions
• Management: Integrated Roles in a Single Organization
– Business, Development, Operations -> BusDevOps
• Developers: Denormalized Data – NoSQL
– Decentralized, scalable, available, polyglot
• Responsibility from Ops to Dev: Continuous Delivery
– Decentralized small daily production updates
• Responsibility from Ops to Dev: Agile Infrastructure - Cloud
– Hardware in minutes, provisioned directly by developers
![Page 14: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/14.jpg)
Netflix BusDevOps Organization
Chief Product Officer
VP Product Management
Directors Product
VP UI Engineering
Directors Development
Developers + DevOps
UI Data Sources
AWS
VP Discovery Engineering
Directors Development
Developers + DevOps
Discovery Data Sources
AWS
VP Platform
Directors Platform
Developers + DevOps
Platform Data Sources
AWS
Denormalized, independently updated and scaled data
Cloud, self service updated & scaled infrastructure
Code, independently updated continuous delivery
![Page 15: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/15.jpg)
Decentralized Deployment
![Page 16: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/16.jpg)
Asgard Developer Portal
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
![Page 17: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/17.jpg)
Ephemeral Instances
• Largest services are autoscaled
• Average lifetime of an instance is 36 hours
Push
Autoscale Up
Autoscale Down
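The 36-hour average instance lifetime implies a steady churn of launches. A minimal sketch of that arithmetic, assuming a steady-state fleet and using Little's law (launch rate = fleet size / average lifetime); the fleet size of 1,000 is a hypothetical number chosen for illustration, not a figure from the talk:

```python
def replacements_per_hour(fleet_size: int, avg_lifetime_hours: float) -> float:
    """Little's law: arrival rate = items in system / average time in system."""
    return fleet_size / avg_lifetime_hours


fleet_size = 1000         # hypothetical fleet size, not from the talk
avg_lifetime_hours = 36   # average instance lifetime quoted on the slide

rate = replacements_per_hour(fleet_size, avg_lifetime_hours)
print(f"{rate:.1f} launches per hour to hold {fleet_size} instances steady")
```

At that churn rate, pushes and autoscaling events have to be fully automated; nothing on an individual instance can be treated as long-lived.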
![Page 18: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/18.jpg)
Netflix Streaming
A Cloud Native Application based on an open source platform
![Page 19: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/19.jpg)
Netflix Member Web Site Home Page
Personalization Driven – How Does It Work?
![Page 20: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/20.jpg)
How Netflix Streaming Works
Customer Device (PC, PS3, TV…)
Web Site or Discovery API
User Data
Personalization
Streaming API
DRM
QoS Logging
OpenConnect CDN Boxes
CDN Management and Steering
Content Encoding
Consumer Electronics
AWS Cloud Services
CDN Edge Locations
![Page 21: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/21.jpg)
Streaming Bandwidth
Nov 2012
March 2013
Mean Bandwidth +39% 6mo
![Page 22: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/22.jpg)
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Start Here
memcached
Cassandra
Web service
S3 bucket
Personalization movie group choosers (for US, Canada and Latam)
Each icon is three to a few hundred instances across three AWS zones
![Page 23: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/23.jpg)
Three Balanced Availability Zones
Test with Chaos Gorilla
Cassandra and Evcache Replicas
Zone A
Cassandra and Evcache Replicas
Zone B
Cassandra and Evcache Replicas
Zone C
Load Balancers
Chaos Gorilla
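Balancing traffic across three zones only survives a zone outage if each zone keeps spare capacity. A rough sketch of the headroom arithmetic, assuming load is spread evenly and that the surviving zones absorb the failed zone's share equally; the function names are illustrative and not part of any Netflix tool:

```python
def surviving_zone_load_factor(zones: int, failed: int = 1) -> float:
    """How much more load each surviving zone carries after `failed` zones are lost."""
    if failed >= zones:
        raise ValueError("at least one zone must survive")
    return zones / (zones - failed)


def max_safe_utilization(zones: int, failed: int = 1) -> float:
    """Highest normal utilization that still fits after losing `failed` zones."""
    return (zones - failed) / zones


factor = surviving_zone_load_factor(3)   # each remaining zone takes 1.5x its usual load
limit = max_safe_utilization(3)          # so normal utilization must stay under about 67%
print(f"load factor {factor:.2f}x, utilization limit {limit:.0%}")
```

A zone-level failure test such as Chaos Gorilla is one way to check that this headroom really exists in production rather than only on paper.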
![Page 24: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/24.jpg)
Isolated Regions
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
US-East Load Balancers
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
EU-West Load Balancers
![Page 25: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/25.jpg)
Cross Region Use Cases
• Geographic Isolation
– US to Europe replication of subscriber data
– Read intensive, low update rate
– Production use since late 2011
• Redundancy for regional failover
– US East to US West replication of everything
– Includes write intensive data, high update rate
– Testing now
![Page 26: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/26.jpg)
Benchmarking Global Cassandra
Write intensive test of cross region replication capacity
16 x hi1.4xlarge SSD nodes per zone = 96 total
192 TB of SSD in six locations up and running Cassandra in 20 min
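The node and storage totals follow from the layout in the diagram: three zones in each of two regions gives the six locations. A quick check of the arithmetic, assuming 2 TB of SSD per hi1.4xlarge node (the slide does not state the per-node figure, so treat it as an assumption):

```python
NODES_PER_ZONE = 16     # from the slide
ZONES_PER_REGION = 3    # Zones A, B and C in the diagram
REGIONS = 2             # US-West-2 (Oregon) and US-East-1 (Virginia)
SSD_TB_PER_NODE = 2     # assumption: 2 TB of SSD per hi1.4xlarge node

locations = ZONES_PER_REGION * REGIONS          # six locations
total_nodes = NODES_PER_ZONE * locations        # 96 nodes
total_ssd_tb = total_nodes * SSD_TB_PER_NODE    # 192 TB

print(locations, total_nodes, total_ssd_tb)
```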
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
US-West-2 Region - Oregon
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
US-East-1 Region - Virginia
Test Load
Test Load
Validation Load
Inter-Zone Traffic
1 Million writes
CL.ONE (wait for one replica to ack)
1 Million reads after 500 ms
CL.ONE with no data loss
Inter-Region Traffic
Up to 9 Gbits/s, 83 ms
18 TB backups from S3
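The test above writes at CL.ONE in one region and reads at CL.ONE in the other 500 ms later. The toy model below illustrates why the delay matters. It is a simulation, not Cassandra client code: the `Replica` class and its methods are hypothetical, and treating the slide's 83 ms figure as a one-way replication delay is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    delay_ms: int                                  # hypothetical delay before a write lands
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)    # (ready_at_ms, key, value)

    def send(self, now_ms, key, value):
        self.pending.append((now_ms + self.delay_ms, key, value))

    def advance(self, now_ms):
        """Apply every queued write whose delay has elapsed."""
        still_pending = []
        for ready_at, key, value in self.pending:
            if ready_at <= now_ms:
                self.data[key] = value
            else:
                still_pending.append((ready_at, key, value))
        self.pending = still_pending


def write_cl_one(replicas, now_ms, key, value):
    """Send to every replica; the write is acknowledged when the fastest one has it."""
    for replica in replicas:
        replica.send(now_ms, key, value)
    return now_ms + min(r.delay_ms for r in replicas)


def read_cl_one(replica, now_ms, key):
    """Read from a single replica, which may not have received the write yet."""
    replica.advance(now_ms)
    return replica.data.get(key)


local = Replica("us-east-1 zone A", delay_ms=1)
remote = Replica("us-west-2 zone A", delay_ms=83)   # assumed one-way delay

ack_at = write_cl_one([local, remote], 0, "k", "v")  # acknowledged after 1 ms
early = read_cl_one(remote, 50, "k")                 # None: not yet replicated
late = read_cl_one(remote, 500, "k")                 # "v": replication has caught up
```

In this model a remote read at 50 ms misses the write while a read at 500 ms sees it, which matches the shape of the benchmark: reads issued after the replication delay find every write despite the weakest consistency level.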
![Page 27: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/27.jpg)
Managing Multi-Region Availability
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
Regional Load Balancers
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
Regional Load Balancers
UltraDNS
DynECT DNS
AWS Route53
Denominator – manage traffic via multiple DNS providers with Java code
2013 Timeline - Concept Jan, Code Feb, OSS March, Production use May
Denominator
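The idea of steering traffic through several DNS providers from one piece of code can be sketched as a common interface with one adapter per provider. The example below is a Python illustration of that pattern only. It is not Denominator's API (Denominator is a Java library), and `DnsProvider`, `InMemoryProvider`, `shift_traffic` and the record name are all hypothetical:

```python
from typing import Dict, Iterable, Protocol, Tuple


class DnsProvider(Protocol):
    """Hypothetical common interface that each provider adapter would implement."""

    def set_weight(self, record: str, region: str, weight: int) -> None: ...


class InMemoryProvider:
    """Stand-in for a real provider adapter; it only records the requested weights."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.weights: Dict[Tuple[str, str], int] = {}

    def set_weight(self, record: str, region: str, weight: int) -> None:
        self.weights[(record, region)] = weight


def shift_traffic(providers: Iterable[DnsProvider], record: str,
                  weights: Dict[str, int]) -> None:
    """Apply the same regional weights through every provider."""
    for provider in providers:
        for region, weight in weights.items():
            provider.set_weight(record, region, weight)


providers = [InMemoryProvider(n) for n in ("UltraDNS", "DynECT", "Route53")]
# Drain us-east-1 and send all traffic to us-west-2 (hypothetical record name).
shift_traffic(providers, "api.example.com", {"us-east-1": 0, "us-west-2": 100})
```

Keeping the failover logic in `shift_traffic` and the provider specifics in adapters means a regional evacuation does not depend on any single DNS vendor being reachable.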
![Page 28: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/28.jpg)
Incidents – Impact and Mitigation
PR: X Incidents (Public Relations Media Impact)
CS: XX Incidents (High Customer Service Calls)
Metrics impact – Feature disable: XXX Incidents (Affects AB Test Results)
No Impact – fast retry or automated failover: XXXX Incidents
Y incidents mitigated by Active Active, game day practicing
YY incidents mitigated by better tools and practices
YYY incidents mitigated by better data tagging
![Page 29: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/29.jpg)
Cloud Security
Automated attack surface monitoring
Crypto key store management (CloudHSM)
Scale to resist DDOS attacks
http://www.slideshare.net/jason_chan/resilience-and-security-scale-lessons-learned
![Page 30: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/30.jpg)
What Changed?
“Impossible” deployments are easy
Jointly building code with vendors in public
Highly available and secure despite scale and speed
![Page 31: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/31.jpg)
The DIY Question
Why doesn’t Netflix build and run its own cloud?
![Page 32: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/32.jpg)
Fitting Into Public Scale
Public Grey Area Private
1,000 Instances 100,000 Instances
Netflix
Facebook
Startups
![Page 33: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/33.jpg)
How big is Public?
AWS upper bound estimate based on the number of public IP Addresses
Every provisioned instance gets a public IP by default (some VPC don’t)
AWS Maximum Possible Instance Count 4.2 Million – May 2013
Growth >10x in Three Years, >2x Per Annum - http://bit.ly/awsiprange
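The two growth figures are consistent with each other: tenfold growth over three years corresponds to a compound annual factor of the cube root of ten. A short check, assuming growth compounds evenly year over year:

```python
growth_over_period = 10.0   # ">10x in three years" from the slide
years = 3

# Compound annual growth factor, assuming even year-over-year growth.
annual_factor = growth_over_period ** (1 / years)
print(f"{annual_factor:.2f}x per year")   # about 2.15x, in line with ">2x per annum"
```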
![Page 34: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/34.jpg)
A Cloud Native Open Source Platform
See netflix.github.com
![Page 35: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/35.jpg)
Goals
Establish our solutions as Best Practices / Standards
Hire, Retain and Engage Top Engineers
Build up Netflix Technology Brand
Benefit from a shared ecosystem
![Page 36: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/36.jpg)
Example Application – RSS Reader
ZUUL
Zuul Traffic Processing and Routing
![Page 37: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/37.jpg)
Ice – Detailed AWS “Chargeback”
http://techblog.netflix.com/2013/06/announcing-ice-cloud-spend-and-usage.html
![Page 38: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/38.jpg)
Boosting the @NetflixOSS Ecosystem
See netflix.github.com
![Page 39: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/39.jpg)
What’s Coming Next?
More Use Cases
More Features
Better portability
Higher availability
Easier to deploy
Contributions from end users
Contributions from vendors
![Page 40: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/40.jpg)
Vendor Driven Portability
Interest in using NetflixOSS for Enterprise Private Clouds
“It’s done when it runs Asgard”
Functionally complete
Demonstrated March
Released June in V3.3
Offering $10K prize for integration work
Vendor and end user interest
Openstack “Heat” getting there
Paypal C3 Console based on Asgard
![Page 41: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/41.jpg)
Functionality and scale now, portability coming
Moving from parts to a platform in 2013
Netflix is fostering a cloud native ecosystem
Rapid Evolution - Low MTBIAMSH
(Mean Time Between Idea And Making Stuff Happen)
![Page 42: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/42.jpg)
Slideshare.net/Netflix Details
• Meetup S1E3 July – Featuring Contributors Eucalyptus, IBM, Paypal, Riot Games
– http://techblog.netflix.com/2013/07/netflixoss-meetup-series-1-episode-3.html
• Lightning Talks March S1E2
– http://www.slideshare.net/RuslanMeshenberg/netflixoss-meetup-lightning-talks-and-roadmap
• Lightning Talks Feb S1E1
– http://www.slideshare.net/RuslanMeshenberg/netflixoss-open-house-lightning-talks
• Asgard In Depth Feb S1E1
– http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house
• Security Architecture
– http://www.slideshare.net/jason_chan/resilience-and-security-scale-lessons-learned/
• Cost Aware Cloud Architectures – with Jinesh Varia of AWS
– http://www.slideshare.net/AmazonWebServices/building-costaware-architectures-jinesh-varia-aws-and-adrian-cockroft-netflix
![Page 43: Cloud Native at Netflix: What Changed? - Gartner Catalyst 2013](https://reader036.fdocuments.us/reader036/viewer/2022062702/554a08c6b4c9055b7a8b5807/html5/thumbnails/43.jpg)
What Changed?
Speed wins, Cloud Native helps you get there
NetflixOSS makes it easier for everyone to become Cloud Native
http://netflix.github.com
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud @NetflixOSS