AWS Summit Benelux 2013 - Architecting for High Availability
ARCHITECTING FOR HIGH
AVAILABILITY
Carlos Conde, Sr. Mgr. Solutions Architecture
“LET’S BUILD
A ________ WEB
APPLICATION”
“LET’S BUILD
A HIGHLY AVAILABLE
________ WEB
APPLICATION”
“LET’S BUILD
A HIGHLY AVAILABLE
AND SCALABLE
________ WEB
APPLICATION”
“LET’S BUILD A HIGHLY AVAILABLE,
DURABLE AND SCALABLE
________ WEB APPLICATION”
“LET’S BUILD A HIGHLY AVAILABLE, DURABLE, RESILIENT
AND SCALABLE ________ WEB APPLICATION”
AWS BUILDING BLOCKS
Inherently Fault-Tolerant Services:
Amazon S3
Amazon DynamoDB
Amazon CloudFront
Amazon SWF
Amazon SQS
Amazon SNS
Amazon SES
Amazon Route53
Elastic Load Balancing
AWS IAM
AWS Elastic Beanstalk
Amazon ElastiCache
Amazon EMR
Amazon Redshift
Amazon CloudSearch
Fault-Tolerant with the right architecture:
Amazon EC2
Amazon EBS
Amazon RDS
Amazon VPC
1. DESIGN FOR FAILURE
2. USE MULTIPLE AZs
3. BUILD FOR SCALE
4. DECOUPLE COMPONENTS
« Everything fails all the time »
Werner Vogels
CTO of Amazon
YOUR GOAL
APPLICATIONS SHOULD CONTINUE TO FUNCTION
EVEN IF THE UNDERLYING PHYSICAL HARDWARE
FAILS OR IS REMOVED OR REPLACED
#1 DESIGN FOR FAILURE
AVOID SINGLE POINTS OF
FAILURE
ASSUME EVERYTHING FAILS,
AND WORK BACKWARDS
HEALTH CHECKS
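A health check usually does not react to a single failed probe. The sketch below is a minimal illustration, assuming a target is marked unhealthy after a run of consecutive failures and healthy again after a run of consecutive successes. The class name and the threshold values are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one target's health from a stream of probe results."""
    unhealthy_threshold: int = 2   # consecutive failures before marking unhealthy
    healthy_threshold: int = 3     # consecutive successes before marking healthy again
    healthy: bool = True
    _streak: int = 0               # current run of results that disagree with `healthy`

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the current health state."""
        if probe_ok == self.healthy:
            # Result agrees with the current state: reset the opposing streak.
            self._streak = 0
            return self.healthy
        self._streak += 1
        limit = self.healthy_threshold if probe_ok else self.unhealthy_threshold
        if self._streak >= limit:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
states = [tracker.record(ok) for ok in [True, False, False, True, True, True]]
print(states)
```

A single failed probe leaves the target in service; only the second consecutive failure takes it out, and it needs three clean probes to come back.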
#2 USE MULTIPLE
AVAILABILITY ZONES
US-WEST (N. California)
EU-WEST (Ireland)
ASIA PAC (Tokyo)
ASIA PAC (Singapore)
US-WEST (Oregon)
SOUTH AMERICA (Sao Paulo)
US-EAST (Virginia)
GOV CLOUD
ASIA PAC (Sydney)
AMAZON RDS
MULTI-AZ
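To make the multi-AZ idea concrete, here is a small sketch, assuming instances are placed round-robin across zones so that losing one zone removes only a fraction of capacity. The zone names and helper functions are hypothetical; this is plain arithmetic, not an AWS API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin so the counts differ by at most one."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def surviving_capacity(placement, failed_zone):
    """Number of instances still running if one whole zone is lost."""
    return sum(1 for zone in placement if zone != failed_zone)


zones = ["zone-a", "zone-b", "zone-c"]   # hypothetical zone names
placement = spread_across_zones(6, zones)
print(Counter(placement))
print(surviving_capacity(placement, "zone-a"))
```

With six instances over three zones, one zone failure still leaves four instances serving traffic.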
#3 BUILD FOR SCALE
AMAZON CLOUDWATCH: MONITORING FOR AWS RESOURCES
AUTO SCALING: SCALE UP/DOWN EC2 CAPACITY
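The pairing of a monitored metric with a scaling action can be sketched as a simple threshold policy. This is an illustration only: the function name, the CPU thresholds, and the group bounds below are assumptions, not values from the talk or from the Auto Scaling API.

```python
def desired_capacity(current, avg_cpu, low=30.0, high=70.0, minimum=2, maximum=10):
    """Return the new instance count for a simple threshold-based policy."""
    if avg_cpu > high:
        target = current + 1      # scale out by one instance
    elif avg_cpu < low:
        target = current - 1      # scale in by one instance
    else:
        target = current          # inside the comfort band: do nothing
    # Never leave the configured group bounds.
    return max(minimum, min(maximum, target))


print(desired_capacity(4, avg_cpu=85.0))
print(desired_capacity(4, avg_cpu=10.0))
```

Keeping a gap between the low and high thresholds avoids flapping between scale-out and scale-in when the metric hovers around a single value.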
HEALTH CHECKS + AUTO SCALING = SELF-HEALING
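The self-healing idea can be sketched as a reconcile step: drop whatever fails its health check, then launch replacements until the fleet is back at the desired size. The instance ids and the `launch_instance` helper are hypothetical stand-ins, not real EC2 calls.

```python
import itertools

_ids = itertools.count(1)


def launch_instance():
    """Stand-in for launching a new instance; returns a fresh hypothetical id."""
    return f"i-{next(_ids):04d}"


def reconcile(fleet, is_healthy, desired):
    """Drop instances that fail the health check, then launch replacements
    until the fleet is back at the desired size."""
    survivors = [instance for instance in fleet if is_healthy(instance)]
    while len(survivors) < desired:
        survivors.append(launch_instance())
    return survivors


fleet = [launch_instance() for _ in range(3)]   # i-0001, i-0002, i-0003
failed = {"i-0002"}                             # pretend this one stopped responding
fleet = reconcile(fleet, lambda instance: instance not in failed, desired=3)
print(fleet)
```

Nobody repairs the failed instance; it is simply replaced, which is what makes the fleet self-healing.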
#4 DECOUPLE COMPONENTS
BUILD LOOSELY
COUPLED SYSTEMS
The looser they are coupled,
the bigger they scale,
the more fault tolerant they get…
RECEIVE → TRANSCODE → PUBLISH & NOTIFY
AMAZON SQS SIMPLE QUEUE SERVICE
ARCHITECTURE
DESIGN PATTERN
SQS VISIBILITY TIMEOUT
BUFFERING
CLOUDWATCH METRICS FOR AMAZON SQS
+ AUTO SCALING
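The visibility timeout pattern listed above can be illustrated with a small in-memory simulation: a received message is hidden rather than removed, and it reappears if the consumer never deletes it. The `VisibilityQueue` class is a hypothetical teaching model, not the Amazon SQS API, and time is passed in explicitly to keep the example deterministic.

```python
import itertools


class VisibilityQueue:
    """In-memory queue where a received message is hidden, not removed."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}            # message id -> [body, visible_at]
        self._ids = itertools.count(1)

    def send(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0.0]
        return message_id

    def receive(self, now):
        """Return (id, body) of the first visible message, or None."""
        for message_id, entry in self._messages.items():
            body, visible_at = entry
            if visible_at <= now:
                # Hide the message from other consumers for the timeout window.
                entry[1] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Called by a consumer once the work is done."""
        self._messages.pop(message_id, None)


queue = VisibilityQueue(visibility_timeout=30)
queue.send("transcode video-1")
first = queue.receive(now=0)      # worker A takes the message, then crashes
hidden = queue.receive(now=10)    # still invisible to worker B
retry = queue.receive(now=31)     # timeout elapsed: worker B gets it
queue.delete(retry[0])            # worker B finishes and deletes it
```

Because worker A never deleted the message, no work is lost when it crashes; the price is that a slow worker can cause the same message to be processed twice, so handlers should be idempotent.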
[Flowchart: START, CHECK IMAGE, TOO BIG? (YES / NO), RESIZE IMAGE, CAT CHECK, CAT? (YES / NO), TRANSCODE, PUBLISH & NOTIFY, REJECT, STOP. Callout on the first build: “OMG, IT’S A CAT!”]
TASKS
DECISIONS
HISTORY
STATELESS!
STATELESS SCALES
HORIZONTALLY
AMAZON SWF ENABLES RESILIENT, SCALABLE,
DISTRIBUTED WORKFLOWS
WORKFLOW ACTORS
DECIDERS: COORDINATION LOGIC
1. Poll for work on a decision list (long polling: 60 seconds)
2. Evaluate workflow execution history (SWF sends full history in JSON format)
3. Return decision to Amazon SWF (usually scheduling another task)
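The decider steps above can be sketched as a pure function from history to decision. This is a simplified model: the event and decision type names follow Amazon SWF naming, but the flat dictionary shape, the `activity` key, and the `PIPELINE` activity names are assumptions made for illustration, not the real SWF history format.

```python
PIPELINE = ["receive", "transcode", "publish_and_notify"]   # hypothetical activity names


def decide(history):
    """Look at the full event history and return the next decision."""
    completed = [e["activity"] for e in history if e["type"] == "ActivityTaskCompleted"]
    if any(e["type"] == "ActivityTaskFailed" for e in history):
        return {"decision": "FailWorkflowExecution"}
    if len(completed) == len(PIPELINE):
        return {"decision": "CompleteWorkflowExecution"}
    # Otherwise schedule the first activity that has not completed yet.
    return {"decision": "ScheduleActivityTask", "activity": PIPELINE[len(completed)]}


history = [
    {"type": "WorkflowExecutionStarted"},
    {"type": "ActivityTaskCompleted", "activity": "receive"},
]
print(decide(history))
```

The function keeps no state of its own: everything it needs arrives in the history, so any decider process can pick up any decision task.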
WORKERS: EXECUTION LOGIC
1. Poll for work on a specific task list (long polling: 60 seconds)
2. Execute work, send heartbeats (SWF sends input data from deciders)
3. Return success / failure (detailed data can be provided to deciders)
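The worker steps above can be sketched as a function that does the work in chunks, reports a heartbeat after each chunk, and returns either a success or a failure result. The `heartbeat` callback and the result dictionary shape are hypothetical; a real worker would call the SWF API at those points.

```python
def run_activity(task_input, heartbeat, chunk_size=2):
    """Process the input in chunks, reporting progress after each chunk."""
    try:
        results = []
        for start in range(0, len(task_input), chunk_size):
            chunk = task_input[start:start + chunk_size]
            results.extend(item.upper() for item in chunk)   # stand-in for real work
            heartbeat(f"{len(results)}/{len(task_input)} items done")
        return {"status": "success", "result": results}
    except Exception as error:
        # Report the failure with details so the decider can react.
        return {"status": "failure", "reason": str(error)}


beats = []
outcome = run_activity(["a", "b", "c"], heartbeat=beats.append)
print(outcome, beats)
```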
SWF IS WATCHING
TRACKING:
Execution tracking: time to start, time to finish, …
Time to finish for overall workflow
Timeouts controlled for each of these (and more)
Heartbeats for long-running activities (optional)
Decider is informed of timeouts: schedule retries, “mitigation” strategies or cleanup tasks
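How a decider might react to a timeout can be sketched as a retry budget: retry the activity a few times, then fall back to a cleanup task. The flat event dictionaries, the `MAX_ATTEMPTS` value, and the `cleanup` activity name are assumptions for illustration; only the event and decision type names follow Amazon SWF naming.

```python
MAX_ATTEMPTS = 3   # hypothetical retry budget


def on_timeout(history, activity):
    """Decide what to do after `activity` timed out, based on past timeouts."""
    timeouts = sum(
        1 for e in history
        if e["type"] == "ActivityTaskTimedOut" and e["activity"] == activity
    )
    if timeouts < MAX_ATTEMPTS:
        return {"decision": "ScheduleActivityTask", "activity": activity}
    # Retry budget exhausted: schedule a cleanup task instead.
    return {"decision": "ScheduleActivityTask", "activity": "cleanup"}


one_timeout = [{"type": "ActivityTaskTimedOut", "activity": "transcode"}]
print(on_timeout(one_timeout, "transcode"))
print(on_timeout(one_timeout * 3, "transcode"))
```

The attempt count is derived from the history rather than stored in the decider, which keeps the decider stateless.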
NO NEW LANGUAGE
TO LEARN
YOUR CODE IS YOUR WORKFLOW LANGUAGE
AMAZON SWF MAINTAINS STATE
ALL HORIZONTAL SCALING
PATTERNS APPLY
CHAINED TASKS
WITHOUT DECISIONS?
USE AMAZON SQS
RECEIVE → TRANSCODE → PUBLISH & NOTIFY
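A chain of tasks without decisions can be sketched as stages that only talk to each other through queues. The example below uses Python's standard `queue.Queue` as a stand-in for Amazon SQS; the file names and the `run_stage` helper are hypothetical.

```python
from queue import Queue


def run_stage(inbox, outbox, work):
    """Drain `inbox`, apply `work` to each message, and forward the result."""
    while not inbox.empty():
        outbox.put(work(inbox.get()))


received, transcoded, published = Queue(), Queue(), Queue()
received.put("video-1.mov")
received.put("video-2.mov")

run_stage(received, transcoded, lambda name: name.replace(".mov", ".mp4"))
run_stage(transcoded, published, lambda name: f"published {name}")

results = [published.get() for _ in range(published.qsize())]
print(results)
```

Each stage knows only its inbox and outbox, so any stage can be scaled or replaced without touching the others.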
TASK GRAPH WITH DECISIONS?
USE AMAZON SWF
[Flowchart: RECEIVE DATA, SANITY CHECK, CHECK FORMAT, ADJUST FORMAT, TRANSCODE, PUBLISH & NOTIFY, REJECT. Branch labels: OK, SPAM, GOOD, LONG.]
1. DESIGN FOR FAILURE
2. USE MULTIPLE AZs
3. BUILD FOR SCALE
4. DECOUPLE COMPONENTS
YOUR GOAL
APPLICATIONS SHOULD CONTINUE TO FUNCTION
EVEN IF THE UNDERLYING PHYSICAL HARDWARE
FAILS OR IS REMOVED OR REPLACED
AWS ARCHITECTURE CENTER http://aws.amazon.com/architecture
AWS TECHNICAL ARTICLES http://aws.amazon.com/articles
AWS BLOG http://aws.typepad.com
AWS PODCAST http://aws.amazon.com/podcast
THANK YOU!