Post on 06-May-2015
Automating Life in the Cloud
Joshua Buss & Matthew Kemp
Confidential, Property of BrightTag, Inc. Not to be disclosed, reproduced, or distributed without BrightTag, Inc.'s prior written authorization. © 2011 BrightTag, Inc. All Rights Reserved.
(but if you find one let us know)
There are no Silver Bullets
Have to go with what cloud provider offers. Not always ideal for every workload.
Designing for the Cloud
Virtual Machines
Focus on scaling applications horizontally.
Designing for the Cloud
Scalability
Wikipedia Definition: SOA as an architecture relies on service-orientation as its fundamental design principle. If a service presents a simple interface that abstracts away its underlying complexity, users can access independent services without knowledge of the service's platform implementation.
Layman’s terms: A complex system is broken into simple components that are able to interact with each other (and possibly outside sources).
Application Design
Service Oriented Architecture
Application Design
Case Study: Services
tagserve datahub
database
When should you split services up?
Application Design
Service Division of Labor
Changes need to be allowed, but compatibility needs to be maintained.
Application Design
Backwards Compatibility
Dependent on needs of business, SLAs and technology. Can introduce more complexity and moving pieces.
Application Design
Asynchronous vs Synchronous
Sometimes the failures are outside your control.
Cloud Design
Failures Happen
Keep failures self contained.
Cloud Design
Design for Failure
Run a full stack in each region.
Application Design
Case Study: Redundancy
tagserve datahub
database
tagserve datahub
database
Application Design
Inter-Region Communication
Need some data available in all regions, but keep inter-region communication to a minimum.
Application Design
Case Study: Cassandra
East
cassandra01 [0-63]
cassandra02 [64-127]
cassandra03 [128-191]
cassandra04 [192-255]
West
cassandra01 [1-64]
cassandra02 [65-128]
cassandra03 [129-192]
cassandra04 [193-0]
Key hashes to 157
Writes go here
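One way to read the diagram: each region holds a complete ring, and the West ranges are shifted by one token so the two regions never share a boundary. The following is a minimal sketch of that lookup, assuming the toy 0-255 token space shown on the slide. The `owner` helper is hypothetical and is not part of Cassandra's API.

```python
# Inclusive token ranges copied from the slide. A range whose start is
# greater than its end (West cassandra04) wraps around the end of the ring.
EAST = {
    "cassandra01": (0, 63),
    "cassandra02": (64, 127),
    "cassandra03": (128, 191),
    "cassandra04": (192, 255),
}
WEST = {
    "cassandra01": (1, 64),
    "cassandra02": (65, 128),
    "cassandra03": (129, 192),
    "cassandra04": (193, 0),
}


def owner(ring, token):
    """Return the node whose range contains `token` (hypothetical helper)."""
    for node, (start, end) in ring.items():
        if start <= end:
            if start <= token <= end:
                return node
        elif token >= start or token <= end:  # wrapped range
            return node
    raise ValueError(f"token {token} is not covered by the ring")


# The slide's example key hashes to 157.
print(owner(EAST, 157), owner(WEST, 157))  # cassandra03 cassandra03
```

Under this reading, a write for the example key lands on cassandra03 in both regions, so each region can serve the key locally.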
Provide multiple modes of operation.
Application Design
Runtime Controls
Deployment
Smooth Code Pushes
Easy migrations and upgrade path.
Can be more expensive.
Deployment
Mirror Environment Cutover
More complicated migrations and upgrades. Longer deploy window. Usually cheaper.
Deployment
Rolling Deploy
for region in regions:
    for app in apps:
        for server in region:
            if app on server:
                maintenance app
                scp new code to <d_tag> dir
                symlink app/current to app/<d_tag>
                restart app
                wait for healthy
Deployment
Case Study: Fabric Rolling Deploy
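The loop on this slide can be sketched in plain Python. This is not the authors' fabfile: the `run` callable, the shape of the `regions` mapping, and the example `d_tag` value are all assumptions made for illustration, and no Fabric API is used.

```python
def rolling_deploy(regions, apps, d_tag, run):
    """Deploy one server at a time, in the order given on the slide.

    regions: {region name: {server name: [apps on that server]}}
    run:     hypothetical callable run(server, action) that performs one
             step, blocks until it finishes, and raises on failure so the
             roll stops at the first bad server.
    """
    for region, servers in regions.items():
        for app in apps:
            for server, installed in servers.items():
                if app not in installed:
                    continue
                run(server, f"maintenance {app}")
                run(server, f"scp new code to {app}/{d_tag}")
                run(server, f"symlink {app}/current -> {app}/{d_tag}")
                run(server, f"restart {app}")
                run(server, f"wait for healthy {app}")


# Dry run: record the steps instead of executing them.
log = []
rolling_deploy(
    {"us-east-1": {"server01": ["tagserve"], "server02": ["datahub"]}},
    ["tagserve"],
    "20110601",  # hypothetical deploy tag
    lambda server, action: log.append((server, action)),
)
```

Because each server is taken through all five steps before the next one starts, only one instance of an app is out of rotation at any moment, which is what makes the deploy window longer than a mirror cutover.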
Fabric is push, puppet is pull. Businesses don't move as fast as infrastructure changes, but configs have to stay up to date all the time.
(/etc/hosts) (systempoller.py) (mashed_potatoes.env) (dataserver.war)
puppet ===================================== fabric
(real-time up-to-date) (moderately up-to-date) (weekly)
Deployment
Fabric vs Puppet
Need a rock-solid foundation to deploy onto.
Puppet
Consistency > *
Set environment per-instance: /etc/puppet/puppet.conf
Symlink /etc/puppet/environments/ on master to various folders with read/write access by our main user.
$ cd /etc/puppet/environments
$ sudo ln -s ~/src/puppet/prod_stable
$ sudo ln -s ~/src/puppet/stage_stable
$ sudo ln -s ~/src/puppet/dev_test
Puppet
Single Puppet Master
Each environment has its own branch. Make a new branch for every new feature. Merge into a test branch to test. Merge into stable.
Puppet
Source Controlled Puppet Configs
Management
What is Zerg?
DNS vs /etc/hosts
Management
How to Reach Servers?
$ curl -s 'http://zerg/etchosts/us-west-1'
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
10.0.0.10 server01 # External: 123.123.123.123
10.0.0.11 server02 # External: 123.123.123.124
10.0.0.12 server03 # External: 123.123.123.125
10.0.0.13 server04 # External: 123.123.123.126
10.0.0.14 server05 # External: 123.123.123.127
10.0.0.15 server06 # External: 123.123.123.128
Management
Case Study: /etc/hosts
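The server lines in that output follow a fixed pattern, so generating them from an inventory is straightforward. Below is a minimal sketch of such a renderer; the record fields (`name`, `internal_ip`, `external_ip`) and the `render_hosts` function are assumptions, since the slides do not show how Zerg stores or renders its server list.

```python
def render_hosts(servers):
    """Render /etc/hosts lines in the format shown on the slide.

    servers: iterable of dicts with hypothetical keys
             'name', 'internal_ip' and 'external_ip'.
    """
    lines = []
    # Sort by name so that regenerating the file produces a stable diff.
    for s in sorted(servers, key=lambda s: s["name"]):
        lines.append(
            f"{s['internal_ip']} {s['name']} # External: {s['external_ip']}"
        )
    return "\n".join(lines) + "\n"


inventory = [
    {"name": "server02", "internal_ip": "10.0.0.11", "external_ip": "123.123.123.124"},
    {"name": "server01", "internal_ip": "10.0.0.10", "external_ip": "123.123.123.123"},
]
print(render_hosts(inventory), end="")
```

Serving the rendered text over HTTP per region, as the `curl` call suggests, lets a configuration tool pull the file without every host needing to talk to the cloud provider's API.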
Amazon API is easy ... but just creating the instance is 10% of the work
Getting the right software. Surviving internal API failures. Staying hosting provider agnostic.
Management
Instance Creation
ROLE_MAPPING = {
    "stage": {
        "supercloud": {
            "ops": ["zerg"],
            "awesome": ["awesome", "haproxy_awesome"],
            "shabang": ["shabang", "mashed_potatoes", "haproxy_shabang"],
            "whistles": ["gowhooo", "shabang", "thehardproblem", "redis"],
            "data": ["dataserver", "dataleaf"],
            "nosql": ["itshards", "devnull"],
            "lb": ["haproxy"],
            "redis": ["redis"],
            "graph": ["graphite", "tattle"]
        }
    },
    "prod": {
        "evenmoresupercloud": { ... }
    }
}
Management
Case Study: Instance Creation
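A mapping like this lets instance creation translate a role into the list of applications to install. The sketch below shows one way that lookup could work. The `apps_for` helper is hypothetical, only two roles are copied from the slide, and allowing several roles per instance is an assumption.

```python
# Two roles copied from the slide's "stage" / "supercloud" entry.
ROLE_MAPPING = {
    "stage": {
        "supercloud": {
            "shabang": ["shabang", "mashed_potatoes", "haproxy_shabang"],
            "whistles": ["gowhooo", "shabang", "thehardproblem", "redis"],
        }
    }
}


def apps_for(env, cloud, roles):
    """Return the apps for the given roles, in order, without duplicates."""
    apps = []
    for role in roles:
        for app in ROLE_MAPPING[env][cloud][role]:
            if app not in apps:
                apps.append(app)
    return apps


print(apps_for("stage", "supercloud", ["shabang", "whistles"]))
```

An unknown environment, cloud, or role raises `KeyError`, which is arguably the right behaviour at creation time: failing early is cheaper than booting a server with the wrong software.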
/etc/instanceinfo filled out with values based on Zerg's definitions:
- hostname
- environment
- region
- roles
- where's my local zerg server?
- where's my local graphite server?
- cloud meta info
Management
Case Study: Instance Configuration
Update timing tricky to get right. Too important to leave completely autonomous.
Management
Loadbalancer Configuration
Management
HAProxy Configuration Workflow
Large changes to templates (human)
Git (ops)
Zerg (generation)
Script (human)
Git (puppet)
Server Server Server
APP_DEFS = {
    "zerg": {"type": "http", "healthcheck": {"port": 19999, "resource": "/zerghealth"}},
    "awesome": {"type": "http", "healthcheck": {"port": 20000, "resource": "/ahc"}},
    "haproxy_awesome": {"type": "http", "healthcheck": {"port": 20001, "resource": "/"}},
    "shabang": {"type": "http", "healthcheck": {"port": 20002, "resource": "/"}},
    "mashed_potatoes": {"type": "http", "healthcheck": {"port": 20003, "resource": "/"}},
    "haproxy_shabang": {"type": "http", "healthcheck": {"port": 20004, "resource": "/hc"}},
    "gowhooo": {"type": "http", "healthcheck": {"port": 20005, "resource": "/"}},
    "thehardproblem": {"type": "http", "healthcheck": {"port": 20006, "resource": "/"}},
    "redis": {"type": "tcp", "healthcheck": {"port": 20007, "resource": "/rhc"}},
    "dataserver": {"type": "http", "healthcheck": {"port": 20008, "resource": "/"}},
    "itshards": {"type": "http", "healthcheck": {"port": 20009, "resource": "/"}},
    "devnull": {"type": "http", "healthcheck": {"port": 20010, "resource": "/hc"}},
    "graphite": {"type": "http", "healthcheck": {"port": 80, "resource": "/composer"}}
}
Management
Application Descriptions
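Descriptions like these give every tool one place to look up how to probe an application. As a rough sketch, a deploy script or load balancer generator could derive its healthcheck target as follows. The `healthcheck_target` helper and the choice to return a bare `host:port` for TCP services are assumptions, and only two entries are copied from the slide.

```python
# Two entries copied from the slide.
APP_DEFS = {
    "zerg": {"type": "http", "healthcheck": {"port": 19999, "resource": "/zerghealth"}},
    "redis": {"type": "tcp", "healthcheck": {"port": 20007, "resource": "/rhc"}},
}


def healthcheck_target(host, app):
    """Return what to probe for `app` on `host` (hypothetical helper).

    HTTP apps get a full URL; TCP apps get host:port, since a plain TCP
    connect has no resource path.
    """
    definition = APP_DEFS[app]
    check = definition["healthcheck"]
    if definition["type"] == "http":
        return f"http://{host}:{check['port']}{check['resource']}"
    return f"{host}:{check['port']}"


print(healthcheck_target("server01", "zerg"))
print(healthcheck_target("server01", "redis"))
```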
$ curl -s 'http://zerg/haproxy/<env>/<region>/<service>'
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    stats socket /tmp/haproxy
    blah blah
defaults
    log global
    mode http
    blah blah
frontend mashedpotatoes_vip
    bind *:30000
    default_backend data
backend mashedpotatoes
    blah blah options
    server shabang01 10.0.0.30:30001 check
    server shabang02 10.0.0.31:30001 check
Management
Case Study: HAProxy Configuration
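The frontend and backend sections are the part of that file that changes when servers come and go, so they are the natural part to generate. Below is a minimal sketch of rendering them from a server list. The `render_service` function and its parameters are hypothetical and Zerg's actual templates are not shown in the slides; the directives used (`frontend`, `bind`, `default_backend`, `backend`, `mode`, `server ... check`) are standard HAProxy configuration keywords.

```python
def render_service(name, vip_port, servers, mode="http"):
    """Render one frontend/backend pair for HAProxy.

    servers: list of (server name, ip, port) tuples.
    """
    lines = [
        f"frontend {name}_vip",
        f"    bind *:{vip_port}",
        f"    default_backend {name}",
        f"backend {name}",
        f"    mode {mode}",
    ]
    for server_name, ip, port in servers:
        # 'check' enables HAProxy's health checking for this server.
        lines.append(f"    server {server_name} {ip}:{port} check")
    return "\n".join(lines) + "\n"


config = render_service(
    "mashedpotatoes",
    30000,
    [("shabang01", "10.0.0.30", 30001), ("shabang02", "10.0.0.31", 30001)],
)
print(config, end="")
```

Keeping the generator deterministic means the only lines that differ between two runs are the servers that actually changed, which keeps the diff a human reviews before pushing short.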
$ ./update_haproxy.sh <env> <region> <service>
** Git is clean and in sync with origin.. now waiting for zerg http response..
[prod_stable f4qijo] [puppetry] Haproxy Auto-Commit for <env> <region> <service>
 1 files changed, 2 insertions(+), 2 deletions(-)
** Template pulled and committed
** Here is the diff from origin to the new version:
diff --git a/modules/haproxy/templates/haproxy_<env>_<region>_<service>_cfg.erb b/modules/haproxy/templates/haproxy_<env>_<region>_<service>_cfg.erb
--- a/modules/haproxy/templates/haproxy_prod_us-east-1_tagserve_cfg.erb
+++ b/modules/haproxy/templates/haproxy_prod_us-east-1_tagserve_cfg.erb
- oldyuckyserver01
- oldyuckyserver02
+ fastwonderfulnewserver01
+ fastwonderfulnewserver02
** Do you want to push this change? (y/n) y
blah blah successful git push message
** Commit successfully pushed to origin
** All done!
Management
The Manual Step
Best Practices
Trust is a Luxury
Best Practices
Uniform Environments
How do you know what's going on?
Monitoring
Why monitor?
Identify metrics that act as signals. Add alerts after every incident.
Monitoring
What to monitor?
Need system level metrics and application metrics to get full picture. Everything is in a different format.
Monitoring
Data Collection
Monitoring
Case Study: Metric Polling at BrightTag
[Diagram: two metric-polling layouts built from graphite/carbon, mpoller, tagserve, haproxy, datahub, redis, and cassandra. The first shows a single mpoller; the second shows several mpollers placed among the services.]
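Whatever the layout, a poller has to turn readings from very different sources into something carbon accepts. Below is a minimal sketch of that normalization step using Graphite's plaintext protocol, where each sample is one line of the form `path value timestamp`. The metric naming scheme (`region.host.app.metric`) and both function names are assumptions; the slides do not show how mpoller is implemented.

```python
import time


def to_carbon_line(path, value, timestamp=None):
    """Format one sample as a Graphite plaintext-protocol line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def normalize(region, host, app, readings, timestamp):
    """Turn {metric name: value} into carbon lines under one naming scheme.

    The region.host.app.metric layout is a hypothetical convention.
    """
    return [
        to_carbon_line(f"{region}.{host}.{app}.{name}", value, timestamp)
        for name, value in sorted(readings.items())
    ]


lines = normalize(
    "us-east-1", "server01", "tagserve",
    {"requests": 42, "errors": 1},
    1300000000,
)
print("".join(lines), end="")
```

The resulting lines can be written to a carbon listener over a TCP socket; the host and port depend on the local Graphite setup.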
Storage of historical metrics allows for trending and comparisons. Aggregation is performed on data retrieval via the webapp.
Monitoring
Graphite
Expose a "metrics" service per region. Enables a flexible topology.
Monitoring
Branches and Leaves
Monitoring
Realtime Numbers Across Regions
Requests are farmed out to each metrics service.
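Farming a request out to every region and merging the answers can be sketched with a thread pool. This is an illustration under stated assumptions, not the authors' implementation: each region's metrics service is represented by a hypothetical zero-argument callable that returns a `{metric: number}` dict, and the two-second timeout is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(fetchers, timeout=2.0):
    """Query every region in parallel and sum the numeric results.

    fetchers: {region name: zero-argument callable returning a dict}.
    Returns (totals, failed_regions). A region that raises or times out
    is reported as failed instead of failing the whole request.
    """
    totals, failed = {}, []
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        futures = {region: pool.submit(fn) for region, fn in fetchers.items()}
        for region, future in futures.items():
            try:
                result = future.result(timeout=timeout)
            except Exception:
                failed.append(region)
                continue
            for metric, value in result.items():
                totals[metric] = totals.get(metric, 0) + value
    return totals, failed
```

Reporting failed regions separately keeps one unreachable region from blanking the whole dashboard, in line with the earlier advice to keep failures self-contained.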
Different visualizations tell you different things.
Monitoring
Visualization
Dashboards provide at-a-glance high level overviews.
Monitoring
Red-Yellow-Green
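A red-yellow-green dashboard boils down to comparing each signal against two thresholds and showing the worst colour. Here is a minimal sketch of that idea; the function names and the "higher is worse" convention are assumptions, not details from the talk.

```python
SEVERITY = ["green", "yellow", "red"]  # ordered from best to worst


def status(value, warn, crit):
    """Classify a reading where higher values are worse."""
    if value >= crit:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"


def rollup(statuses):
    """Return the worst status in the list; an empty list is green."""
    return max(statuses, key=SEVERITY.index, default="green")


checks = [status(120, warn=200, crit=500), status(350, warn=200, crit=500)]
print(checks, rollup(checks))  # ['green', 'yellow'] yellow
```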
Questions?