Netflix Global Cloud Architecture
Globally Distributed Cloud Applications at Netflix
October 2012 Adrian Cockcroft @adrianco #netflixcloud
http://www.linkedin.com/in/adriancockcroft
Adrian Cockcroft • Director, Architecture for Cloud Systems, Netflix Inc.
– Previously Director for Personalization Platform
• Distinguished Availability Engineer, eBay Inc. 2004-7 – Founding member of eBay Research Labs
• Distinguished Engineer, Sun Microsystems Inc. 1988-2004 – 2003-4 Chief Architect High Performance Technical Computing – 2001 Author: Capacity Planning for Web Services – 1999 Author: Resource Management – 1995 & 1998 Author: Sun Performance and Tuning – 1996 Japanese Edition of Sun Performance and Tuning
• SPARC & Solarisパフォーマンスチューニング (サンソフトプレスシリーズ)
• More – Twitter @adrianco – Blog http://perfcap.blogspot.com – Presentations at http://www.slideshare.net/adrianco
The Netflix Streaming Service
Now in USA, Canada, Latin America, UK, Ireland, Sweden, Denmark, Norway and Finland
US Non-Member Web Site: Advertising and Marketing Driven
Member Web Site: Personalization Driven
Streaming Device API
Netflix Ready Devices: from May 2008 to May 2010
Content Delivery Service – Distributed storage nodes controlled by Netflix cloud services
Abstract
• Netflix on Cloud – What, Why and When
• Globally Distributed Architecture
• Open Source Components
Why Use Cloud?
Things we don’t do
What Netflix Did
• Moved to SaaS – Corporate IT – OneLogin, Workday, Box, Evernote… – Tools – Pagerduty, AppDynamics, EMR (Hadoop)
• Built our own PaaS – Customized to make our developers productive – Large scale, global, highly available, leveraging AWS
• Moved incremental capacity to IaaS – No new datacenter space since 2008 as we grew – Moved our streaming apps to the cloud
Keeping up with Developer Trends
• Big Data/Hadoop • AWS Cloud • Application Performance Management • Integrated DevOps Practices • Continuous Integration/Delivery • NoSQL • Platform as a Service; Fine grain SOA • Social coding, open development/github
In production at Netflix: 2009, 2009, 2010, 2010, 2010, 2010, 2010, 2011 (adoption year for each trend above)
AWS-specific feature dependence…
Portability vs. Functionality
• Portability – the Operations focus – Avoid vendor lock-in – Support datacenter based use cases – Possible operations cost savings
• Functionality – the Developer focus – Less complex test and debug, one mature supplier – Faster time to market for your products – Possible developer time/cost savings
Functional PaaS
• IaaS base - all the features of AWS – Very large scale, mature, global, evolving rapidly – ELB, Autoscale, VPC, SQS, EIP, EMR, etc., etc. – E.g. Large files (TB) and multipart writes in S3
• Functional PaaS – Netflix added features – Continuous build/deploy, SOA, HA patterns – Asgard console, Monkeys, Big data tools – Cassandra/Zookeeper data store automation
How Netflix Works
[Architecture diagram: Customer Device (PC, PS3, TV…) talks to the Web Site or Discovery API and the Streaming API running as AWS Cloud Services (User Data, Personalization, DRM, QoS Logging, CDN Management and Steering, Content Encoding); content is served from OpenConnect CDN Boxes at CDN Edge Locations to Consumer Electronics devices]
Component Services (Simplified view using AppDynamics)
Web Server Dependencies Flow (Home page business transaction as seen by AppDynamics)
[Flow diagram nodes: Start Here, memcached, Cassandra, Web service, S3 bucket]
One Request Snapshot (captured because it was unusually slow)
Current Architectural Patterns for Availability
• Isolated Services – Resilient Business logic
• Three Balanced Availability Zones – Resilient to Infrastructure outage
• Triple Replicated Persistence – Durable distributed Storage
• Isolated Regions – US and EU don’t take each other down
Isolated Services – Test with Chaos Monkey, Latency Monkey
Three Balanced Availability Zones – Test with Chaos Gorilla
[Diagram: Load Balancers in front of Cassandra and Evcache Replicas spread across Zones A, B and C]
Triple Replicated Persistence – Cassandra maintenance affects individual replicas
Isolated Regions
[Diagram: US-East Load Balancers and EU-West Load Balancers, each fronting Cassandra Replicas in Zones A, B and C of their own region]
Failure Modes and Effects

Failure Mode | Probability | Mitigation Plan
Application Failure | High | Automatic degraded response
AWS Region Failure | Low | Wait for region to recover
AWS Zone Failure | Medium | Continue to run on 2 out of 3 zones
Datacenter Failure | Medium | Migrate more functions to cloud
Data store failure | Low | Restore from S3 backups
S3 failure | Low | Restore from remote archive
Netflix Deployed on AWS
• Content (2009) – Content Management, EC2 Encoding, S3 Petabytes
• Logs (2009) – S3 Terabytes, EMR, Hive & Pig, Business Intelligence
• Play (2010) – DRM, CDN routing, Bookmarks, Logging
• WWW (2010) – Sign-Up, Search (Solr), Movie Choosing, Ratings
• API (2010) – Metadata, Device Config, TV Movie Choosing, Social (Facebook)
• CS (2011) – International CS lookup, Diagnostics & Actions, Customer Call Log, CS Analytics
Content is delivered to customers at Terabit scale via CDNs and ISPs
Cloud Architecture Patterns
Where do we start?
Datacenter to Cloud Transition Goals
• Faster – Lower latency than the equivalent datacenter web pages and API calls – Measured as mean and 99th percentile – For both first hit (e.g. home page) and in-session hits for the same user
• Scalable – Avoid needing any more datacenter capacity as subscriber count increases – No central vertically scaled databases – Leverage AWS elastic capacity effectively
• Available – Substantially higher robustness and availability than datacenter services – Leverage multiple AWS availability zones – No scheduled down time, no central database schema to change
• Productive – Optimize agility of a large development team with automation and tools – Leave behind complex tangled datacenter code base (~8 year old architecture) – Enforce clean layered interfaces and re-usable components
Netflix Datacenter vs. Cloud Arch

Datacenter | Cloud
Central SQL Database | Distributed Key/Value NoSQL
Sticky In-Memory Session | Shared Memcached Session
Chatty Protocols | Latency Tolerant Protocols
Tangled Service Interfaces | Layered Service Interfaces
Instrumented Code | Instrumented Service Patterns
Fat Complex Objects | Lightweight Serializable Objects
Components as Jar Files | Components as Services
Cassandra on AWS
A highly available and durable deployment pattern
Cassandra Service Pattern – Cassandra Cluster Managed by Priam – Between 6 and 72 nodes
Data Access REST Service – Astyanax Cassandra Client
Datacenter Update Flow
Service REST Clients
Appdynamics Service Flow Visualization
Production Deployment – Totally Denormalized Data Model
Over 50 Cassandra Clusters, over 500 nodes, over 30TB of daily backups; biggest cluster 72 nodes; 1 cluster over 250K writes/s
Astyanax - Cassandra Write Data Flows: Single Region, Multiple Availability Zone, Token Aware
[Diagram: Token Aware Clients writing to Cassandra nodes (each with local disks) spread across Zones A, B and C]
1. Client writes to local coordinator
2. Coordinator writes to other zones
3. Nodes return ack
4. Data written to internal commit log disks (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write. SSTable disk writes and compactions occur asynchronously. (A client-side sketch follows below.)
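Not from the deck, but as a rough illustration of the client side of this flow: a minimal Astyanax sketch with a token-aware connection pool and an explicit write consistency level. The cluster, keyspace, column family and seed address below are made up, and the exact builder methods vary slightly between Astyanax versions.

```java
import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ConsistencyLevel;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class TokenAwareWriteSketch {
    public static void main(String[] args) throws Exception {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster("hypothetical_cluster")                                // made-up names
            .forKeyspace("hypothetical_keyspace")
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)             // learn the ring from the cluster
                .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)         // route to the replica owning the key
                .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_QUORUM))  // wait for 2 of 3 zone replicas
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
                .setPort(9160)
                .setSeeds("10.0.0.1:9160"))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        Keyspace keyspace = context.getClient();

        ColumnFamily<String, String> bookmarks =
            new ColumnFamily<>("bookmarks", StringSerializer.get(), StringSerializer.get());

        // The coordinator in the local zone forwards the mutation to the replicas in the other zones
        MutationBatch batch = keyspace.prepareMutationBatch();
        batch.withRow(bookmarks, "customer-1234").putColumn("position", "00:42:15", null);
        batch.execute();

        context.shutdown();
    }
}
```

Switching the default consistency level between CL_ONE, CL_QUORUM and CL_ALL corresponds to the "one node, a quorum, or all nodes" choice described above.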
Data Flows for Multi-Region Writes: Token Aware, Consistency Level = Local Quorum
1. Client writes to local replicas
2. Local write acks returned to client, which continues when 2 of 3 local nodes are committed
3. Local coordinator writes to remote coordinator
4. When data arrives, remote coordinator node acks and copies to other remote zones
5. Remote nodes ack to local coordinator
6. Data flushed to internal commit log disks (no more than 10 seconds later)
If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent. (A keyspace definition sketch follows the diagram below.)
[Diagram: US Clients and EU Clients each write to Cassandra nodes (with local disks) in Zones A, B and C of their own region; the US and EU regions are separated by 100+ms of latency]
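For illustration only: one way the replication topology above might be declared through the Astyanax client. The keyspace is hypothetical, and "us-east" / "eu-west" stand in for whatever data center names the EC2 snitch reports; the deck does not show the real Netflix keyspace definitions.

```java
import com.google.common.collect.ImmutableMap;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;

public class MultiRegionKeyspaceSketch {
    // NetworkTopologyStrategy with 3 replicas per region: writes at LOCAL_QUORUM
    // only wait for 2 of the 3 replicas in the caller's own region, and Cassandra
    // ships the mutation to the other region asynchronously (steps 3-5 above).
    static void createKeyspace(Keyspace keyspace) throws ConnectionException {
        keyspace.createKeyspace(ImmutableMap.<String, Object>builder()
            .put("strategy_class", "NetworkTopologyStrategy")
            .put("strategy_options", ImmutableMap.<String, Object>builder()
                .put("us-east", "3")   // replicas spread over Zones A, B, C in US-East
                .put("eu-west", "3")   // replicas spread over Zones A, B, C in EU-West
                .build())
            .build());
    }
}
```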
ETL for Cassandra
• Data is de-normalized over many clusters! • Too many to restore from backups for ETL • Solution – read backup files using Hadoop • Aegisthus
– http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
– High throughput raw SSTable processing – Re-normalizes many clusters to a consistent view – Extract, Transform, then Load into Teradata
Benchmarks and Scalability
Cloud Deployment Scalability – New Autoscaled AMI: zero to 500 instances from 21:38:52 to 21:46:32, 7m40s
Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)
Launch time distribution: Min. 41.0 | 1st Qu. 104.2 | Median 149.0 | Mean 171.8 | 3rd Qu. 215.8 | Max. 562.0
Scalability from 48 to 288 nodes on AWS – http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Chart: Client Writes/s by node count (Replication Factor = 3) – 174,373 / 366,828 / 537,172 / 1,099,837 writes/s as the cluster scales from 48 to 288 nodes
Used 288 of m1.xlarge (4 CPU, 15 GB RAM, 8 ECU), Cassandra 0.8.6; benchmark config only existed for about 1hr
Cassandra on AWS
Spec | The Past (m2.4xlarge) | The Future (hi1.4xlarge)
Storage | 2 drives, 1.7TB | 2 SSD volumes, 2TB
CPU | 8 cores, 26 ECU | 8 HT cores, 35 ECU
RAM | 68GB | 64GB
Network | 1Gbit | 10Gbit
IOPS | ~500 | ~100,000
Throughput | ~100 Mbyte/s | ~1 Gbyte/s
Cost | $1.80/hr | $3.10/hr

Cassandra Disk vs. SSD Benchmark: Same Throughput, Lower Latency, Half Cost
Availability and Resilience
Chaos Monkey – http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
• Computers (Datacenter or AWS) randomly die – Fact of life, but too infrequent to test resiliency
• Test to make sure systems are resilient – Allow any instance to fail without customer impact
• Chaos Monkey hours – Monday-Friday 9am-3pm random instance kill
• Application configuration option – Apps now have to opt out from Chaos Monkey
Responsibility and Experience
• Make developers responsible for failures – Then they learn and write code that doesn’t fail
• Use Incident Reviews to find gaps to fix – Make sure it's not about finding “who to blame”
• Keep timeouts short, fail fast – Don't let cascading timeouts stack up
• Make configuration options dynamic – You don't want to push code to tweak an option
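As a sketch of the last two points, using the Archaius library listed later in the deck; the property name and default value are invented for illustration.

```java
import com.netflix.config.DynamicIntProperty;
import com.netflix.config.DynamicPropertyFactory;

public class RemoteCallConfig {
    // Re-evaluated at runtime: changing the property in the central configuration
    // source takes effect without pushing code or restarting the service.
    private static final DynamicIntProperty TIMEOUT_MS =
        DynamicPropertyFactory.getInstance()
            .getIntProperty("myservice.client.timeoutMillis", 250); // short default: fail fast

    public static int timeoutMillis() {
        return TIMEOUT_MS.get();
    }
}
```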
Resilient Design – Circuit Breakers – http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
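The techblog post describes a dependency-command pattern; the toy sketch below only shows the general shape of a circuit breaker with a fallback. It is not the Netflix implementation (Netflix later released its version as Hystrix), and the thresholds and structure are made up.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Toy circuit breaker: fail fast while a dependency is unhealthy, serve a fallback instead. */
public class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final AtomicInteger consecutiveFailures = new AtomicInteger();
    private final AtomicLong openedAt = new AtomicLong();

    public SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    public interface Call<T> { T run() throws Exception; }

    public <T> T execute(Call<T> primary, Call<T> fallback) throws Exception {
        // While the breaker is open, skip the remote call entirely and serve the degraded response
        if (consecutiveFailures.get() >= failureThreshold
                && System.currentTimeMillis() - openedAt.get() < openMillis) {
            return fallback.run();
        }
        try {
            T result = primary.run();      // the primary call should carry its own short timeout
            consecutiveFailures.set(0);    // success closes the breaker
            return result;
        } catch (Exception e) {
            if (consecutiveFailures.incrementAndGet() >= failureThreshold) {
                openedAt.set(System.currentTimeMillis());  // trip: fail fast for a while
            }
            return fallback.run();         // degraded response instead of a cascading failure
        }
    }
}
```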
Distributed Operational Model
• Developers – Provision and run their own code in production – Take turns to be on call if it breaks (PagerDuty) – Configure autoscalers to handle capacity needs (see the sketch after this list)
• DevOps and PaaS (aka NoOps) – DevOps is used to build and run the PaaS – PaaS constrains Dev to use automation instead – PaaS puts more responsibility on Dev, with tools
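A hedged sketch of what "configure autoscalers" can look like against the raw AWS Java SDK. The group name, sizes and zones are invented, and at Netflix this was typically driven through Asgard and the autoscaling scripts rather than hand-written SDK calls.

```java
import com.amazonaws.services.autoscaling.AmazonAutoScalingClient;
import com.amazonaws.services.autoscaling.model.CreateAutoScalingGroupRequest;
import com.amazonaws.services.autoscaling.model.PutScalingPolicyRequest;

public class AutoscalerSetupSketch {
    public static void main(String[] args) {
        AmazonAutoScalingClient autoscaling = new AmazonAutoScalingClient();  // default credential chain

        // ASG spread over three zones, so losing one zone leaves 2/3 of capacity running
        autoscaling.createAutoScalingGroup(new CreateAutoScalingGroupRequest()
            .withAutoScalingGroupName("myservice-v042")
            .withLaunchConfigurationName("myservice-v042-launchconfig")
            .withAvailabilityZones("us-east-1a", "us-east-1b", "us-east-1c")
            .withMinSize(12)
            .withMaxSize(60));

        // Simple scale-up policy; in practice it would be attached to a CloudWatch alarm
        autoscaling.putScalingPolicy(new PutScalingPolicyRequest()
            .withAutoScalingGroupName("myservice-v042")
            .withPolicyName("scale-up-on-load")
            .withScalingAdjustment(6)
            .withAdjustmentType("ChangeInCapacity"));
    }
}
```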
Culture
Unconventional Culture – See culture deck at http://jobs.netflix.com
• Brave/Aggressive from the top down • Focus on talent density above everything • Reduce process, remove complexity • Freedom and Responsibility • One product focus for the whole company • (Almost) full information sharing across the company • Simplified manager's role
Manager's Role
• Hiring, Architecture, Project Management • No vacation policy to track • (Almost) no remote employees or contractors • No bonuses to allocate • No expenses to approve • Pay mark to market handled at VP level
Netflix Organization – DevOps Org Reporting into Product Group, not IT Ops
CEO – Reed Hastings
CPO – Chief Product Officer – Neil Hunt
VP – Cloud and Platform Engineering – Yury
[Org chart teams under Cloud and Platform Engineering:]
• Architecture – Future planning, Security Arch, Efficiency; AWS VPC, Hyperguard, Powerpoint ☺
• Platform and Persistence Engineering – Base Platform, Zookeeper, Cassandra Ops, AWS Instances
• Cloud Solutions – Monitoring, Monkeys, Build Tools; AWS Instances, AWS API
• Cloud Ops Reliability Engineering – Alert Routing, Incident Lifecycle, PagerDuty
• Personalization Platform and Performance Eng – Metadata, Benchmarking, Memcached, AWS Instances
• Membership and Billing – Data sources, Vault processing, Cassandra
• Data Science Platform – Business Intelligence, Hadoop on EMR
Build Your Own PaaS
Components
• Continuous build framework turns code into AMIs • AWS accounts for test, production, etc. • Cloud access gateway • Service registry (see the Eureka sketch below) • Configuration properties service • Persistence services • Monitoring, alert forwarding • Backups, archives
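For the service registry component, a minimal sketch of how a service might register itself and find a peer using the Eureka client of that era; the VIP address is hypothetical and the surrounding eureka-client.properties configuration is assumed.

```java
import com.netflix.appinfo.ApplicationInfoManager;
import com.netflix.appinfo.InstanceInfo;
import com.netflix.appinfo.MyDataCenterInstanceConfig;
import com.netflix.discovery.DefaultEurekaClientConfig;
import com.netflix.discovery.DiscoveryManager;

public class ServiceRegistrySketch {
    public static void main(String[] args) {
        // Register this instance with Eureka; the app name and Eureka server URLs
        // come from eureka-client.properties on the classpath.
        DiscoveryManager.getInstance().initComponent(
            new MyDataCenterInstanceConfig(),
            new DefaultEurekaClientConfig());
        ApplicationInfoManager.getInstance().setInstanceStatus(InstanceInfo.InstanceStatus.UP);

        // Look up a healthy instance of another service by its (made-up) VIP address
        InstanceInfo peer = DiscoveryManager.getInstance()
            .getDiscoveryClient()
            .getNextServerFromEureka("someservice:7001", false);
        System.out.println("Calling " + peer.getHostName() + ":" + peer.getPort());
    }
}
```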
Netflix Open Source Strategy
• Release PaaS Components git-by-git – Source at github.com/netflix – we build from it… – Intros and techniques at techblog.netflix.com – Blog post or new code every few weeks
• Motivations – Give back to Apache licensed OSS community – Motivate, retain, hire top engineers – “Peer pressure” code cleanup, external contributions
Instance creation, application launch and runtime (component diagram):
• Instance creation – image baked, ASG / instance started, instance running: Asgard, Autoscaling scripts, Odin, Bakery & Build tools (Base AMI + Application Code → Instance)
• Application Launch – registering, configuration, application initializing: Eureka, Entrypoints, Archaius, Governator (Guice), Async logging, Servo
• Runtime – managing service, resiliency aids, calling other services: Priam, Exhibitor, Explorers, NIWS LB, Astyanax, Curator, Dependency Command, REST client, Chaos Monkey, Latency Monkey, Janitor Monkey, Cass JMeter
Open Source Projects – Github / Techblog (legend on the original slide: Apache Contributions, Techblog Post, Coming Soon)
• Priam – Cassandra as a Service
• Astyanax – Cassandra client for Java
• CassJMeter – Cassandra test suite
• Cassandra – Multi-region EC2 datastore support
• Aegisthus – Hadoop ETL for Cassandra
• Explorers
• Governator – Library lifecycle and dependency injection
• Odin – Workflow orchestration
• Async logging
• Exhibitor – Zookeeper as a Service
• Curator – Zookeeper Patterns
• EVCache – Memcached as a Service
• Eureka / Discovery – Service Directory
• Archaius – Dynamic Properties Service
• EntryPoints – Server-side latency/error injection
• REST Client + mid-tier LB
• Configuration REST endpoints
• Servo and Autoscaling Scripts
• Honu – Log4j streaming to Hadoop
• Circuit Breaker – Robust service pattern
• Asgard – AutoScaleGroup based AWS console
• Chaos Monkey – Robustness verification
• Latency Monkey
• Janitor Monkey
• Bakeries and AMI
• Build dynaslaves
Roadmap for 2012
• More resiliency and improved availability • More automation, orchestration • “Hardening” the platform, code clean-up • Lower latency for web services and devices • IPv6 – now running in prod, rollout in process • More open sourced components • See you at AWS Re:Invent in November…
Takeaway
Netflix has built and deployed a scalable global Platform as a Service.
Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
http://github.com/Netflix http://techblog.netflix.com http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud
Amazon Cloud Terminology Reference See http://aws.amazon.com/ This is not a full list of Amazon Web Service features
• AWS – Amazon Web Services (common name for Amazon cloud) • AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code) • EC2 – Elastic Compute Cloud
– Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations. – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept. – Reserved Instances – pre-paid to reduce cost for long-term usage – Availability Zone – datacenter with own power and cooling hosting cloud instances – Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
• ASG – Auto Scaling Group (instances booting from the same AMI) • S3 – Simple Storage Service (http access) • EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance) • RDS – Relational Database Service (managed MySQL master and slaves) • DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB) • SQS – Simple Queue Service (http based message queue) • SNS – Simple Notification Service (http and email based topics and messages) • EMR – Elastic Map Reduce (automatically managed Hadoop cluster) • ELB – Elastic Load Balancer • EIP – Elastic IP (stable IP address mapping assigned to instance or ELB) • VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs) • DirectConnect – secure pipe from AWS VPC to external datacenter • IAM – Identity and Access Management (fine grain role based security keys)