T1 – Architecting highly available applications on aws

Post on 15-Jan-2015

1.637 views 2 download

Tags:

description

 

Transcript of T1 – Architecting highly available applications on aws

AWS Summit 2014

Architecting Highly Available Applications on AWS

Alex Sinner Solutions Architect @alexsinner

Architecting Highly Available Applications on AWS

•  ME: Alex Sinner – AWS Solutions Architect •  YOU: Here to learn more about running highly

available, scalable Applications on AWS •  TODAY: about best practices and things to think

about when building for large scale

Going from 1 User to >10 Millions

So how do we scale?

Hi, I have NO IDEA what I am doing!!

a lot of things to read

not where we want to start

a lot of things to read

Auto Scaling is a tool. It’s not the single thing that

fixes everything.

What do we need first?

Some basics…

Regions US-WEST (Oregon)

EU-WEST (Ireland) ASIA PAC (Tokyo)

US-WEST (N. California)

SOUTH AMERICA (Sao Paulo)

US-EAST (Virginia)

AWS GovCloud (US)

ASIA PAC (Sydney)

ASIA PAC (Singapore)

CHINA (Beijing)

Availability Zones US-WEST (Oregon)

EU-WEST (Ireland) ASIA PAC (Tokyo)

US-WEST (N. California)

SOUTH AMERICA (Sao Paulo)

US-EAST (Virginia)

AWS GovCloud (US)

ASIA PAC (Sydney)

ASIA PAC (Singapore)

CHINA (Beijing)

Compute  Storage  &  Content  Delivery  

AWS  Global  Infrastructure  

Database  

App  Services  

Deployment  &  Administra=on  

Networking  

Service Reference Model

Compute  Storage  &  Content  Delivery  

AWS  Global  Infrastructure  

Database  

App  Services  

Deployment  &  Administra=on  

Networking  

Amazon CloudSearch

Amazon SQS

Amazon SNS

Amazon Elastic

Transcoder

Amazon SWF Amazon SES

Amazon DynamoDB

Amazon RDS

Amazon ElastiCache

Amazon RedShift

AWS Storage Gateway

Amazon S3

Amazon Glacier

Amazon CloudFront

Amazon CloudWatch AWS IAM AWS

CloudFormation Amazon Elastic

Beanstalk AWS Data

Pipeline

AWS OpsWorks

AWS CloudTrail

Amazon EC2

Amazon EMR

Amazon VPC

Amazon Route 53

AWS Direct

Connect

Amazon Kinesis

So let’s start from day one, user one ( you )

Day One, User One

•  A single EC2 Instance –  With full stack on this host

•  Web app •  Database •  Management •  Etc.

•  A single Elastic IP •  Route53 for DNS

EC2 Instance

Elastic IP

Amazon Route 53

User

“We’re gonna need a bigger box”

•  Simplest approach •  Can now leverage PIOPs •  High I/O instances •  High memory instances •  High CPU instances •  High storage instances •  Easy to change instance sizes •  Will hit an endpoint eventually

i2.4xlarge

m3.xlarge

m1.small

“We’re gonna need a bigger box”

•  Simplest approach •  Can now leverage PIOPs •  High I/O instances •  High memory instances •  High CPU instances •  High storage instances •  Easy to change instance sizes •  Will hit an endpoint eventually

i2.4xlarge

m3.xlarge

m1.small

Day One, User One: •  We could potentially get

to a few hundred to a few thousand depending on application complexity and traffic

•  No failover •  No redundancy •  Too many eggs in one

basket

EC2 instance

Elastic IP address

Amazon Route 53

User

Day One, User One: •  We could potentially get

to a few hundred to a few thousand depending on application complexity and traffic

•  No failover •  No redundancy •  Too many eggs in one

basket

EC2 instance

Elastic IP address

Amazon Route 53

User

Day Two, User >1: First, let’s separate out our single host into more than one: •  Web •  Database

–  Make use of a database service?

Web instance

Database instance

Elastic IP address

Amazon Route 53

User

Self-Managed Fully-Managed

Database server on Amazon EC2

Your choice of

database running on Amazon EC2

Bring Your Own License (BYOL)

Amazon DynamoDB

Managed NoSQL database service

using SSD storage

Seamless scalability Zero administration

Amazon RDS

Microsoft SQL, Oracle, MySQL or PostgreSQL as a managed service

Flexible licensing BYOL or License

Included

Amazon Redshift

Massively parallel,

petabyte-scale, data warehouse service

Fast, powerful and

easy to scale

Database Options

But how do I choose what DB technology I need? SQL? NoSQL?

Some people won’t like this. But…

Start with SQL databases

Why start with SQL? •  Established and well-worn technology •  Lots of existing code, communities, books, background,

tools, etc. •  You aren’t going to break SQL DBs in your first 10 million

users. No really, you won’t*. •  Clear patterns to scalability * Unless you are manipulating data at MASSIVE scale; even then, SQL will have a place in your stack

AH HA! You said “massive amounts”, I will have massive amounts!

If your usage is such that you will be generating several TB of data in the

first year OR have an incredibly data-intensive workload… you might

need NoSQL

Regardless, why NoSQL? •  Super low latency applications •  Metadata driven datasets •  Highly non-relational data •  Need schema-less data constructs* •  Massive amounts of data (again, in the TB range) •  Rapid ingest of data ( thousands of records/sec ) •  Already have skilled staff *Need != “it is easier to do dev without schemas”

But back to the main path… Let’s see how far SQL at the core

can grow

User >100 First let’s separate out our single host into more than one •  Web •  Database

–  Use RDS to make your life easier

Web Instance

Elastic IP

RDS DB Instance

Amazon Route 53

User

User > 1000 Next let’s address our lack of failover and redundancy issues •  Elastic Load Balancing •  Another web instance

–  In another Availability Zone

•  Enable Amazon RDS multi-AZ

Web Instance

RDS DB Instance Active (Multi-AZ) Availability Zone Availability Zone

Web Instance

RDS DB Instance Standby (Multi-AZ)

Elastic Load Balancing

Amazon Route 53

User

Scaling this horizontally and vertically

will get us pretty far ( 10s-100s of thousands )

User >10 ks–100 ks

RDS DB Instance Active (Multi-AZ)

Availability Zone Availability Zone

RDS DB Instance Standby (Multi-AZ)

Elastic Load Balancing

RDS DB Instance Read Replica

RDS DB Instance Read Replica

RDS DB Instance Read Replica

RDS DB Instance Read Replica

Web Instance

Web Instance

Web Instance

Web Instance

Web Instance

Web Instance

Web Instance

Web Instance

Amazon Route 53

User

Shift some load around: Let’s lighten the load on our web and database instances: •  Move static content from the

web instance to Amazon S3 and CloudFront

•  Move dynamic content from the load balancer to CloudFront

•  Move session/state and DB caching to ElastiCache or Amazon DynamoDB

Web instance

RDS DB Instance Active (Multi-AZ) Availability Zone

Elastic Load Balancer

Amazon S3

Amazon CloudFront

Amazon Route 53

User

ElastiCache

Amazon DynamoDB

Now let’s revisit the beginning of our talk…

Auto Scaling!

Automatic resizing of compute clusters based on demand

Trigger auto-scaling policy

Feature   Details  

Control   Define  minimum  and  maximum  instance  pool  sizes  and  when  scaling  and  cool  down  occurs  

Integrated  to  Amazon  CloudWatch  

Use  metrics  gathered  by  CloudWatch  to  drive  scaling  

Instance  types   Run  Auto  Scaling  for  On-­‐Demand  and  Spot  Instances;  compa=ble  with  VPC  

aws  autoscaling  create-­‐auto-­‐scaling-­‐group  -­‐-­‐auto-­‐scaling-­‐group-­‐name  MyGroup  -­‐-­‐launch-­‐configuration-­‐name  MyConfig  -­‐-­‐min-­‐size  4  -­‐-­‐max-­‐size  200  -­‐-­‐availability-­‐zones  us-­‐west-­‐2c  

Auto Scaling Amazon

CloudWatch

Auto Scaling can scale from one instance to thousands

and back down

User >500k+:

Availability Zone

Amazon Route 53

User

Amazon S3

Amazon CloudFront

Availability Zone

Elastic Load Balancing

Amazon DynamoDB RDS DB Instance

Read Replica

Web instance

Web instance

Web instance

ElastiCache RDS DB Instance Read Replica

Web instance

Web instance

Web instance

ElastiCache RDS DB Instance Standby (Multi-AZ)

RDS DB Instance Active (Multi-AZ)

ARCHITECTING DATA-DRIVEN MASS PRODUCED VIDEO

AWS SUMMIT 2014 | JUNE 10, 2014

JASPER JAGER SENIOR DEVELOPER AND AWS ARCHITECT REDNUN, AMSTERDAM

THIS IS WHAT WE DO

‣ Automatically mass produce data-driven, personalised or profiled video

‣ ING, KLM, Essent België, T-Mobile

‣ Run everything in AWS

‣ Small campaign, 25.000 personalised videos

‣ Self hosted 3x 8 core Xserves with 96GB RAM

‣ 300 videos an hour

HOW WE STARTED

PROBLEMS WITH THE OLD IN-HOUSE SETUP‣ 250.000 videos would take us 35 days

‣ Or we would have to buy more hardware

‣ Systems which would idle most of the time

‣ Storing and serving all videos - HELP

REBUILD REDNUN IN THE CLOUD‣ Ability to start 100’s of machines, based on preconfigured AMI

‣ High availability for our campaign sites, behind load balancers

‣ Big campaign, big Dutch lottery, 1.200.000

‣ Batched, pre-rendered videos, stored on S3

‣ Took us just a couple of days

SECOND INFRASTRUCTURE

AUTOSCALE EVERYTHING

‣ Automated daily flows, welcome video, birthday video etc.

‣ API, videos can be produced on the fly

‣ Autoscaling based on Cloudwatch metrics for web and app servers

‣ Custom autoscaling scripts for video rendering

‣ Use spot instances when available

LOOSE COUPLING

‣ Decoupled components

‣ Use SQS as a buffer

‣ Continuous monitoring and adjusting

AUTOMATE EVERYTHING

‣ Cloudformation and Opsworks

‣ Flexibility to start environment in different region

‣ Dev and QA environments

CURRENT INFRASTRUCTURE

THE THINGS WE’VE LEARNED‣ AWS service limits

‣ Autoscale on Cloudwatch or custom metrics

‣ Automate your infrastructure

‣ AWS can help you scale with ease

JASPER@REDNUN.NL WWW.REDNUN.NL

On Tools: Managing your infrastructure will become an ever increasing important part of your time. Use tools to automate repetitive tasks. •  Tools to manage AWS resources – AWS CloudFormation •  Tools to manage software and configuration on your

instances – AWS OpsWorks •  Automated data analysis of logs and user actions

User >500k+: You’ll potentially start to run into issues with speed and performance of your applications: •  Have monitoring/metrics/logging in place

–  If you can’t build it internally, outsource it! (3rd party SaaS) •  Pay attention to what customers are saying works well •  Squeeze as much performance as you can out of each

service/component

HOST LEVEL

METRICS

AGGREGATE LEVEL

METRICS

LOG ANALYSIS

EXTERNAL SITE

PERFORMANCE

Not having proper monitoring/metrics is like flying a plane

with an eye mask on in a thunderstorm.

Oh, and your wing is on fire.

AWS Marketplace & Partners Can Help •  Customer can find, research,

and buy software

•  Simple pricing, aligns with Amazon EC2 usage model

•  Launch in minutes

•  AWS Marketplace billing integrated into your AWS account

•  1300+ products across 20+ categories

Learn more at: aws.amazon.com/marketplace

There are further improvements to be

made in breaking apart our web/app layer

SOA = Service Oriented Architecture

SOA’ing Move services into their own tiers/modules. Treat each of these as 100% separate pieces of your infrastructure and scale them independently. Amazon.com and AWS do this extensively! It offers flexibility and greater understanding of each component.

Loose coupling sets you free! •  The looser they're coupled, the bigger they scale

–  Independent components –  Design everything as a black box –  Decouple interactions –  Favor services with built-in redundancy and scalability rather than

building your own

Controller  A   Controller  B  

Controller  A   Controller  B  

Q   Q  

Tight  coupling  

Use  Amazon  SQS  for  buffers  

Loose  coupling  

Loose coupling + SOA = winning

Examples: •  Email •  Queuing •  Transcoding •  Search •  Databases •  Monitoring •  Metrics •  Logging

Amazon CloudSearch

Amazon SQS Amazon SNS

Amazon Elastic Transcoder

Amazon SWF Amazon SES

In the early days, if someone has a service for it already, opt to use that instead of building it yourself. DON’T RE-INVENT THE WHEEL

On re-inventing the wheel… If you find yourself writing

your own: queue, DNS server, database, storage system,

monitoring tool

Take a deep breath and stop it. Now.

Back to SOA

Users > 1 Million

RDS DB Instance Active (Multi-AZ)

Availability Zone

Elastic Load Balancer

RDS DB Instance Read Replica

RDS DB Instance Read Replica

Web Instance

Web Instance

Web Instance

Web Instance

Amazon Route 53

User

Amazon S3

Amazon Cloudfront

Amazon DynamoDB

Amazon SQS

ElastiCache

Worker Instance

Worker Instance

Amazon CloudWatch

Internal App Instance

Internal App Instance

Amazon SES

The next big steps

From 5 to 10 Million Users You may start to run into issues with your database around contention on the write master. How can you solve it?

•  Federation - splitting into multiple DBs based on function

•  Sharding - splitting one data set up across multiple hosts

•  Moving some functionality to other types of DBs (NoSQL)

…and there you have it. 10 Million

A Quick Review

Review •  Multi-AZ your infrastructure •  Make use of self-scaling services

–  Elastic Load Balancing, Amazon S3, Amazon SNS, Amazon SQS, Amazon SWF, Amazon SES, etc.

•  Build in redundancy at every level •  Most likely start with SQL •  Cache data both inside and outside your

infrastructure •  Use automation tools in your infrastructure

Review (cont) •  Make sure you have good metrics/monitoring/

logging tools in place •  Split tiers into individual services (SOA) •  Use Auto Scaling when you’re ready for it •  Don’t reinvent the wheel •  Move to NoSQL when it really makes sense but

do your best not to administer it

Putting all this together means we should now

easily be able to handle 10+ million users!

To infinity…..

Thank You!

AWS EXPERT? GET CERTIFIED! aws.amazon.com/certification

Alex Sinner Solutions Architect @alexsinner