ARCHITECTING IN AWS
for resilience & cost at scale
Jos Boumans - @jiboumans
http://rafaykhan619.wix.com/downhouse
Thursday 22 August 13
RIPE NCC
Engineering manager for RIPE Database
http://www.ripe.net/db
CANONICAL
http://lukeroberts.deviantart.com/art/Destroy-Ubuntu-93235775
Engineering manager for Ubuntu Server 10.04 & 10.10
http://www.ubuntu.com/business/server/overview
KRUX
VP of Operations & Infrastructure
http://www.krux.com/
SOME OF OUR CUSTOMERS
LOTS OF TRAFFIC
http://www.americapictures.net/buenos-aires-traffic-city-night-argentina.html
AVERAGE REQUESTS* / SEC
*Twitter: New tweets, Wikipedia: Articles read, Krux: New data points
[Bar chart; axis 0 to 15,000 requests / sec]
http://mashable.com/2013/03/21/happy-7th-birthday-twitter/
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
MONTHLY UNIQUE USERS
[Bar chart; axis 0 to 800,000,000 users]
http://en.wikipedia.org/wiki/Wikipedia
http://mashable.com/2013/03/21/happy-7th-birthday-twitter/
WE CHOSE 'THE CLOUD'
http://previewnetworks.com/blog/
THERE ARE DOWNSIDES
http://modernsavage.hubpages.com/hub/10-springfield-shopper-headlines
RESILIENCE & COST AT SCALE
FOCUS ON AWS
http://aws.amazon.com/
APRIL 21, 2011
http://aws.amazon.com/message/680587/
http://aws.amazon.com/message/680342/
http://aws.amazon.com/message/67457/
http://aws.amazon.com/message/65648/
Also: June 29, 2012 - October 22, 2012 - December 24, 2012
http://businessnerds.wordpress.com/2011/05/28/so-far-so-good…-the-review/
ROOT CAUSE CATEGORIES
Software bugs & human error
[Chart: Amazon Cloud Major Outage - Issues Categories (# of issues): Software 8, Automation 4, Process 14]
http://www.slideshare.net/rahultyagi50999/amazon-cloud-major-outages-analysis
JUNE 29, 2012
http://www.fanpop.com/spots/thunderstorm/images/25416163/title/thunderstorms-wallpaper http://aws.amazon.com/message/67457/
AWS OUTAGE = YOUR OUTAGE
http://it.mario.wikia.com/wiki/Lakitu
RESILIENCE @ SCALE
Embrace Failure: Hardware will fail. Humans will make errors. Nature will produce thunderstorms.
http://blabitcanada.com/category/twitter-2/
DEFINE 'AVAILABLE'
Things will break, so choose your degraded state.
http://libcom.org/library/occupied-wall-street-some-tactical-thoughts-malcolm-harris
BASIC API CALL
3 potential points of failure
FALLBACK PATTERNS
The cost of resilience should be accuracy or latency
http://redis.io/
http://memcached.org/
http://varnish-cache.org/
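A minimal sketch of the fallback idea on this slide, assuming a primary data source with a cache and a static default behind it. The names `fetch_with_fallback`, `healthy_backend` and `broken_backend` are hypothetical, and the plain dict stands in for whatever cache is actually used (Redis, memcached, Varnish). The price of staying up here is accuracy: a stale or default answer instead of a fresh one.

```python
from typing import Callable, Dict, Tuple


def fetch_with_fallback(
    key: str,
    primary: Callable[[str], str],
    cache: Dict[str, str],
    default: str,
) -> Tuple[str, str]:
    """Return (value, source), degrading from fresh to stale to default."""
    try:
        value = primary(key)
        cache[key] = value  # remember the last good answer
        return value, "primary"
    except Exception:
        # Primary is down: trade accuracy for availability.
        if key in cache:
            return cache[key], "cache"
        return default, "default"


def healthy_backend(key: str) -> str:
    # Hypothetical backend that always answers.
    return f"fresh:{key}"


def broken_backend(key: str) -> str:
    # Hypothetical backend that simulates an outage.
    raise ConnectionError("backend unavailable")


cache: Dict[str, str] = {}
first = fetch_with_fallback("user42", healthy_backend, cache, "anonymous")
second = fetch_with_fallback("user42", broken_backend, cache, "anonymous")
third = fetch_with_fallback("user99", broken_backend, cache, "anonymous")
```

The second call is served from the cache populated by the first; the third has nothing cached and drops to the default.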
USER EXPERIENCE
My tweet got posted
RESILIENCE TOOLS
Storage, Network & ACL
http://wordyou.ru/kolonki/my-teper-ne-na-avrore-a-na-titanike.html
MANY SMALL NODES VERSUS A FEW LARGER NODES
The benefits of the many outweigh the benefits of the few
http://www.stealingfaith.com/2012/07/08/throw-off-the-tiny-ropes/
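One way to read the claim above is as simple arithmetic: with equally sized nodes and evenly spread load, losing one node costs 1/N of capacity. The sketch below assumes exactly that; the function name and the fleet sizes are illustrative, not figures from the talk.

```python
def capacity_lost_on_failure(node_count: int, failed_nodes: int = 1) -> float:
    """Fraction of total capacity lost when `failed_nodes` equal-sized nodes fail."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    failed = min(failed_nodes, node_count)
    return failed / node_count


# Illustrative fleets with the same total capacity.
few_large = capacity_lost_on_failure(4)    # one of 4 large nodes fails
many_small = capacity_lost_on_failure(40)  # one of 40 small nodes fails
```

Under these assumptions a single failure removes a quarter of the large-node fleet but only a fortieth of the small-node fleet.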
DATABASES
CAP Theorem applies.
Your choice: sacrifice availability or consistency. Orange is a lie.
[Diagram labels: RDBMS; BigTable based; Master / Slave based; CouchDB; Dynamo based]
http://ferd.ca/beating-the-cap-theorem-checklist.html
SIMPLE STORAGE SERVICE
S3: Arguably AWS' best feature
http://www.iwallpaper.us/gold-star-fo-christmas-wallpaper-140/
http://aws.amazon.com/s3/
https://forums.aws.amazon.com/message.jspa?messageID=182919#182919
CACHE WHAT YOU CAN
HTTP Responses, DB Queries, User content
Browsers have caches too!
http://cruncht.com/95/drupal-caching/
http://redis.io/
http://memcached.org/
http://varnish-cache.org/
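As a rough illustration of caching a DB query or HTTP response, here is a small in-process cache with a time-to-live. `TTLCache`, `get_or_compute` and `slow_query` are hypothetical names; a real deployment would more likely put Redis, memcached or Varnish in this role. The injectable clock is only there so expiry can be shown without sleeping.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Tiny in-process cache; entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # cache hit: skip the expensive call
        value = compute()  # cache miss or expired entry: do the real work
        self._store[key] = (now, value)
        return value


calls: List[int] = []


def slow_query() -> str:
    calls.append(1)  # count how often the backend is really hit
    return "result"


fake_now = [0.0]  # controllable clock for the demonstration
report_cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
a = report_cache.get_or_compute("report", slow_query)
b = report_cache.get_or_compute("report", slow_query)  # served from cache
fake_now[0] = 61.0
c = report_cache.get_or_compute("report", slow_query)  # expired, recomputed
```

Three lookups hit the backend twice: once to fill the cache and once after the entry expires.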
CLIENT SIDE STORAGE
Keep a copy of your users' data locally
http://www.w3.org/2001/tag/2010/09/ClientSideStorage.html
http://www.wired.com/gadgetlab/2012/03/badass-gadget-ammo-lunch-box/
USE ELASTIC LOAD BALANCERS
They will save you more than once
http://wallpapers5.com/wallpaper/Balance-Green-Tree-Frog/
USE GLOBAL LOAD BALANCING
Fail over to the closest data center on region failure
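The decision behind "fail over to the closest data center" can be sketched as picking the lowest-latency region that still passes health checks. This is only the selection logic, not how any particular DNS or load-balancing provider implements it; `pick_region` is a hypothetical helper and the latency figures are made up.

```python
from typing import Dict, Optional


def pick_region(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the lowest-latency region that is currently healthy, if any."""
    candidates = [region for region in latency_ms if healthy.get(region, False)]
    if not candidates:
        return None  # nothing healthy: caller must serve a degraded response
    return min(candidates, key=lambda region: latency_ms[region])


# Made-up latencies as seen from one client location.
latency_ms = {"us-east-1": 20.0, "us-west-2": 80.0, "eu-west-1": 95.0}

all_up = {"us-east-1": True, "us-west-2": True, "eu-west-1": True}
east_down = {"us-east-1": False, "us-west-2": True, "eu-west-1": True}

normal = pick_region(latency_ms, all_up)
failover = pick_region(latency_ms, east_down)
```

With every region healthy the client lands on the nearest one; when that region fails, traffic moves to the next closest.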
SHOUT OUT: DYN
DNS for Bit.ly, Quora, Twitter, Wikia, Fastly, etc.
http://dyn.com
USE IAM ROLES FOR ACCESS
Humans make mistakes, including your humans
COST @ SCALE
Scaling without breaking the bank
http://mgx.com/blogs/wp-content/uploads/2013/07/piggybank.jpg
EMR + SPOT INSTANCES
On demand rate: $0.165 / hour
http://aws.amazon.com/ec2/spot-instances/
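To make the cost argument concrete, the sketch below compares a batch job at the on-demand rate from the slide with the same job at a spot rate. Only the $0.165 figure comes from the slide. The spot rate, cluster size and duration are assumptions chosen for illustration, since actual spot prices vary by instance type, region and time.

```python
ON_DEMAND_RATE = 0.165  # USD per instance-hour, from the slide
HYPOTHETICAL_SPOT_RATE = 0.033  # assumption for illustration only


def cluster_cost(instances: int, hours: float, hourly_rate: float) -> float:
    """Total cost in USD of running `instances` nodes for `hours`."""
    return instances * hours * hourly_rate


def savings_fraction(on_demand_rate: float, spot_rate: float) -> float:
    """Share of the on-demand bill saved by paying the spot rate instead."""
    return 1.0 - spot_rate / on_demand_rate


# Hypothetical job: 100 instances for 10 hours.
on_demand_total = cluster_cost(100, 10, ON_DEMAND_RATE)
spot_total = cluster_cost(100, 10, HYPOTHETICAL_SPOT_RATE)
saved = savings_fraction(ON_DEMAND_RATE, HYPOTHETICAL_SPOT_RATE)
```

Spot capacity can be reclaimed at any time, so this comparison only holds for work that tolerates interruption, which batch jobs on EMR usually do.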
AMAZON REDSHIFT
Economical Business Intelligence
Scales with data size
http://www.flitemedia.com/music.php
http://aws.amazon.com/redshift
http://www.tableausoftware.com/
AMAZON GLACIER
"Tapes for the Cloud Era"
Writes vastly cheaper than reads
http://aws.amazon.com/glacier/
http://www.gorp.com/parks-guide/glacier-national-park-outdoor-pp2-guide-cid350021.html
AWS SIMPLE EMAIL SERVICE
Dealing with email is boring and time consuming
http://aws.amazon.com/ses/
http://bfsdaniels.copycop.com/blog/all-about-printing/hypertargeting-with-direct-mail/
AWS SIMPLE QUEUE SERVICE
Excellent for latency insensitive, small volume queues
http://www.toledoblade.com/Retail/2013/01/13/Disney-s-magic-bracelet-new-key-to-its-kingdom.html
http://aws.amazon.com/sqs/
http://colby.id.au/benchmarking-sqs
INSTANCE MARKETPLACE
Buy & sell reserved instances
http://commons.wikimedia.org/wiki/File:Javanese_market_place.jpg
http://aws.amazon.com/ec2/reserved-instances/marketplace/
AWS DYNAMO DB
Excellent for small keys & high read rates at known & consistent IOPS
http://hlbike.en.ecplaza.net/2.jpg
http://aws.amazon.com/dynamodb/
MAXIMIZE IOPS
RAID 0 ephemeral drives
Use m1.xlarge or c1.xlarge, or use SSDs if you need >20k IOPS
http://calculator.s3.amazonaws.com/calc5.html
http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#disk-performance
RED FLAGS
Anti-patterns to watch out for
http://grandprix247.com/2012/09/03/spa-pile-up-renews-focus-on-formula-1-safety-matters/
PROVISIONED IOPS EBS
Ephemeral storage on c1/m1.xlarge or SSD is better
If you must: m*large or c1.xlarge for dedicated NIC
http://www.slideshare.net/AmazonWebServices/ebs-mongo-dbwebinarfinal-nn
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
http://navidoo.ru/interest/Nasha_jizn/17676.html
AWS DYNAMO DB
For high write rates or large/variable keys
http://aws.amazon.com/dynamodb/
http://www.walltowall.co.uk/program/standing-tall-worlds-tallest-people_93.aspx
HIGH IO/DISK/RAM NODES
Use them deliberately
http://elledecoration.co.za/2010/07/gigantic/
AWS CLOUDWATCH
Metric collection, Amazon style
Cost prohibitive & resolution too low
http://www.flickr.com/photos/65683080@N08/6893582132/
http://aws.amazon.com/cloudwatch/
LOWER COST PER METRIC
Use graphite & statsd
http://graphite.wikidot.com/
https://github.com/etsy/statsd
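Part of why statsd is cheap per metric is its wire format: one short text line per sample, sent over UDP with no reply expected. The sketch below builds such lines in the `<name>:<value>|<type>` form and sends one to a daemon assumed to listen on the default port 8125 on localhost. The helper names and the metric names are hypothetical.

```python
import socket


def format_metric(name: str, value: int, metric_type: str = "c") -> bytes:
    """Build a statsd line: <name>:<value>|<type> ("c" counter, "ms" timer)."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; the sender does not wait for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


counter = format_metric("api.requests", 1)       # count one request
timer = format_metric("api.latency", 320, "ms")  # record a 320 ms timing

if __name__ == "__main__":
    # Assumes a statsd daemon on localhost; harmless if none is running.
    send_metric(counter)
    send_metric(timer)
```

Because delivery is best effort, instrumented code keeps running even when the metrics pipeline is down.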
HOSTED ALTERNATIVES
Circonus: All the insights you ever wanted
StackDriver: Optimized for AWS
http://circonus.com
http://stackdriver.com
AWS CLOUDFORMATION
Templatize your entire stack
Harder to use as complexity increases
http://aws.amazon.com/cloudwatch/
http://fullnfenil7.blogspot.com/2012/05/amazing-cloud-shapes-photos.html#.UhKrZmRgZHg
RDS FOR ANALYTICS/REPORTS
Paying OLTP prices for BI usage
Sharding will be a matter of time
http://nerds.airbnb.com/redshift-performance-cost
http://business901.com/blog1/understanding-your-customer-problem/
Q & A
http://vickicaruana.blogspot.com/2011/01/are-you-afraid-to-raise-your-hand.html
@jiboumans
http://slideshare.net/jiboumans