Experiences from DevOps production: Deployment, performance, failure.
Experiences from production: Deployment, performance, failure
David Mytton
All Your Base - Oct 2014
blog.serverdensity.com
serverdensity.com/allyourbase
Slides: twitter.com/davidmytton
Agenda
● Performance
● Architecture
● Downtime
● Preparation
● Where to host?
Server Density Architecture
● ~100 servers - Ubuntu 12.04
● 50:50 virtual/dedicated
● 200TB/month processed data
● Nginx, Python, MongoDB
● Softlayer > 1TB RAM, 5TB SSDs
Two choices for deployment
● Virtualized
● Bare metal
Advantages of virtualization
● Easy to manage
● Fast boot
● Easier to resize/migrate
● Templating/snapshots
● Containment
Disadvantages of virtualization
● Another layer
● Hypervisor overhead
● Host contention
● I/O performance
Advantages of bare metal
● Dedicated resources
● Direct access to hardware
● Customisable specs
● Performance
Disadvantages of bare metal
● Build/deploy time
● More difficult to resize
● Capex/lifetime
● Difficult to migrate/snapshot
Performance problems?
Easy answer: move to bare metal!
Key performance factors
● Network
● EC2: Cluster compute, high memory, high I/O, high storage
● GCE: Higher CPU instances
Key performance factors
● Network
Location          Ping RTT latency
Within USA        40-80ms
Trans-Atlantic    100ms
Trans-Pacific     150ms
Europe-Japan      300ms
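The round-trip times in the table add up quickly when a request needs several exchanges in a row. The sketch below only illustrates that arithmetic. It assumes a hypothetical number of sequential round trips per request and takes the upper end of the "Within USA" range; the names `RTT_MS` and `sequential_latency_ms` are made up for this example and are not from the talk.

```python
# Approximate ping round-trip times from the table above, in milliseconds.
# "Within USA" uses the upper end of the 40-80ms range (an assumption).
RTT_MS = {
    "Within USA": 80,
    "Trans-Atlantic": 100,
    "Trans-Pacific": 150,
    "Europe-Japan": 300,
}


def sequential_latency_ms(route: str, round_trips: int) -> int:
    """Network time spent if `round_trips` exchanges happen one after another."""
    if round_trips < 0:
        raise ValueError("round_trips must be non-negative")
    return RTT_MS[route] * round_trips


if __name__ == "__main__":
    # Hypothetical request that needs 3 sequential round trips.
    for route in RTT_MS:
        print(f"{route}: {sequential_latency_ms(route, 3)}ms")
```

Under these assumptions, the same three-round-trip request costs several times more network time on the Europe-Japan route than within the USA, which is one reason server location matters.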
Networking performance
AWS
GCE
bit.ly/googlevsamazon
Key performance factors
● Memory
http://blog.pythonisito.com/2011/12/mongodbs-write-lock.html
Key performance factors
● Memory is expensive
Key performance factors
● Disk
● SSDs!
GCE: 256GB = $83.20/month
EC2: 256GB = $35.32/month
SL: 200GB = $81/month
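Because the three quotes cover different capacities, they are easier to compare per gigabyte. This is a small sketch of that normalisation, assuming the figures are monthly SSD storage prices as the slide suggests; the names `SSD_PRICES` and `usd_per_gb_month` are illustrative only.

```python
# Prices as listed on the slide: (capacity in GB, price in USD per month).
SSD_PRICES = {
    "GCE": (256, 83.20),
    "EC2": (256, 35.32),
    "SL": (200, 81.00),
}


def usd_per_gb_month(capacity_gb: float, price_usd: float) -> float:
    """Monthly price divided by capacity, so different sizes can be compared."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    return price_usd / capacity_gb


if __name__ == "__main__":
    for provider, (capacity_gb, price_usd) in SSD_PRICES.items():
        rate = usd_per_gb_month(capacity_gb, price_usd)
        print(f"{provider}: ${rate:.3f}/GB/month")
```

The comparison covers listed storage price only; it says nothing about I/O performance, which the talk treats as a separate factor.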
Why cloud?
● Flexible
● Unlimited resources
● Cheap to get started
● Other products
Why colo?
● Vastly cheaper
● Complete control
Let’s talk about downtime
2013 Spend: ~$5bn
2013 Spend: ~$6bn
2013 Spend: ~$4bn
You will have downtime
How much do you spend?
Preparation
Preparation - On Call
● Rotations
● Off call
● Work the next day?
● Reachability - train, 3G/4G (EDGE?!), Do Not Disturb mode, system updates
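An on-call rotation can be generated rather than maintained by hand. The following is a minimal sketch, assuming weekly shifts handed out round-robin; the talk does not specify shift length or tooling, and the function name and engineer names are hypothetical.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def weekly_rotation(engineers, start, weeks):
    """Assign one on-call engineer per week, round-robin.

    Returns a list of (week_start, engineer) tuples.
    """
    if not engineers:
        raise ValueError("need at least one engineer")
    names = islice(cycle(engineers), weeks)
    return [(start + timedelta(weeks=i), name) for i, name in enumerate(names)]


if __name__ == "__main__":
    # Hypothetical team and start date.
    for week_start, name in weekly_rotation(["alice", "bob", "carol"], date(2014, 10, 6), 4):
        print(week_start.isoformat(), name)
```

With three or more people in the list, everyone gets at least two consecutive weeks off call between shifts.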
Preparation - Documentation
● Searchable
● Easy to edit
● Independent of your infrastructure
● Up to date
Unexpected failures
● Communication systems
● Network connectivity
● Access to support
ALERT!
1. Load up incident response checklist
2. Log incident in JIRA
3. Log into Ops War Room
4. Public status post
5. Initial investigation
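The numbered steps on this slide could be tracked in code so that nothing is skipped under pressure. This is a sketch only, not Server Density's tooling: the `IncidentChecklist` class is hypothetical. It enforces the slide's numbering and records a UTC timestamp for each completed step.

```python
from datetime import datetime, timezone

# Steps in the order numbered on the slide.
CHECKLIST = [
    "Load up incident response checklist",
    "Log incident in JIRA",
    "Log into Ops War Room",
    "Public status post",
    "Initial investigation",
]


class IncidentChecklist:
    """Tracks which steps are done, and when, in checklist order."""

    def __init__(self, steps=CHECKLIST):
        self.steps = list(steps)
        self.completed = []  # list of (step, UTC timestamp) tuples

    def next_step(self):
        """Return the next outstanding step, or None when all are done."""
        if len(self.completed) < len(self.steps):
            return self.steps[len(self.completed)]
        return None

    def complete(self, step):
        """Mark `step` done; refuse out-of-order or unknown steps."""
        expected = self.next_step()
        if expected is None or step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append((step, datetime.now(timezone.utc)))


if __name__ == "__main__":
    checklist = IncidentChecklist()
    while checklist.next_step() is not None:
        checklist.complete(checklist.next_step())
    for step, when in checklist.completed:
        print(when.isoformat(), step)
```

The timestamps double as a simple timeline of who did what and when, which helps when writing up the incident afterwards.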
Key response principles
● Log everything
● Frequent public status updates
● Gather the team
● Escalate!
Summary
● Performance
● Architecture
● Downtime
● Preparation
● Where to host?
Thank you very much
@davidmytton
blog.serverdensity.com
serverdensity.com/allyourbase