(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. November 13, 2014 | Las Vegas BDT204 Rendering a Seamless Satellite Map of the World with AWS and NASA Data Eric Gundersen and Will White, Mapbox

description

NASA imaging satellites deliver gigabytes of images to Earth every day. Mapbox uses AWS to process that data in real time and build the most complete, seamless satellite map of the world. Learn how Mapbox uses Amazon S3 and Amazon SQS to stream data from NASA into clusters of EC2 instances running a clever algorithm that stitches images together in parallel. This session includes an in-depth discussion of high-volume storage with Amazon S3, cost-efficient data processing with Amazon EC2 Spot Instances, reliable job orchestration with Amazon SQS, and demand resilience with Auto Scaling.

Transcript of (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Page 1: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

November 13, 2014 | Las Vegas

BDT204

Rendering a Seamless Satellite Map of the World with AWS and NASA Data

Eric Gundersen and Will White, Mapbox

Page 3: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Amazon EC2

Offers low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Amazon SQS

Manages the units of work

Page 10: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Mapbox Satellite

Page 26: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

[Chart annotations: "hmmm, this is slow going"; "upgrade EC2 type"; "w00t! killing it"; "spiked regional spot pricing :p"; "increases $ for spot pricing"]

Page 41: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

One image every day for the last two years.

Page 42: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

17,179,869,184 pixels x 365 days x 2 years

12.5 trillion pixels
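
The total on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers shown above; the note that the per-image count is an exact power of two is an observation added here, not something stated on the slide.

```python
# Pixels in one daily global image, as stated on the slide.
PIXELS_PER_IMAGE = 17_179_869_184
DAYS_PER_YEAR = 365
YEARS = 2

total_pixels = PIXELS_PER_IMAGE * DAYS_PER_YEAR * YEARS

# Observation (not from the slide): 17,179,869,184 is 2**34,
# which is a square image of 131,072 x 131,072 pixels.
side = 2 ** 17
assert side * side == PIXELS_PER_IMAGE

print(f"{total_pixels:,} pixels")
print(f"{total_pixels / 1e12:.1f} trillion pixels")
```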

Page 43: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

That’s a lot of pixels…

Page 44: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

We need to

• Quickly process massive amounts of data

• Distribute processed data to users around the world quickly and reliably

• Low cost

Page 45: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Processing

Page 46: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Processing requirements

• Massive storage for raw and processed data

• Massive computing that we can spin up and down in minutes

• Everything must be fully automated

• Low cost

Page 47: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Amazon EC2

Low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Amazon SQS

Manages the queue of work

Page 48: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

[Architecture diagram: NASA Server -> Watcher Instance -> Source S3 Bucket and SQS Queue -> Worker Instances (Auto Scaling group) -> Destination S3 Bucket (Processed Outputs)]

Page 49: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Watcher EC2 instance

• Copies raw data files from the NASA server to our S3 bucket

• Splits each file into smaller parts and sends them to Amazon SQS as messages
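
The splitting step could look roughly like the sketch below. The slides do not show how files are divided, so the fixed-size pixel windows, the message fields, the bucket name, and the object key are all assumptions made for illustration.

```python
import json
from typing import Iterator


def split_into_messages(bucket: str, key: str, width: int, height: int,
                        window: int) -> Iterator[str]:
    """Yield one JSON message body per window of a raw file.

    Hypothetical scheme: the raw file is cut into square pixel windows.
    Each message names the S3 object and the window to process, so a
    worker needs nothing else to do (or redo) the job.
    """
    for y in range(0, height, window):
        for x in range(0, width, window):
            yield json.dumps({
                "bucket": bucket,
                "key": key,
                "x": x,
                "y": y,
                "w": min(window, width - x),   # clip windows at the right edge
                "h": min(window, height - y),  # clip windows at the bottom edge
            }, sort_keys=True)


# Example with made-up names and dimensions.
messages = list(split_into_messages("example-source-bucket",
                                    "raw/example.tif",
                                    width=1000, height=600, window=512))
print(len(messages))
print(messages[-1])
```

In a real watcher each body would then be handed to the queue, for example with the boto3 SQS client's `send_message(QueueUrl=..., MessageBody=body)`; that call is left out here so the sketch runs without AWS credentials.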

Page 50: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Why stash raw data on Amazon S3?

• Extremely low latency between Amazon S3 and Amazon EC2 in the same AWS region

• Don’t want to hammer NASA servers with requests from our hundreds of workers

• Easy to reprocess data later

Page 51: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Messages for Amazon SQS

• Take a big job and split it up into smaller parts

• Shorter is better: a few minutes per message is ideal

• Messages need to be repeatable in case of failure

Page 52: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Raw data

SQS Messages

Page 53: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

SQS messages

Page 55: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Worker EC2 instance

• Grab message from the queue (SQS Queue)

• Download raw data from S3 (Source S3 Bucket)

• Run software to process the data

• Deliver processed data to S3 (Destination S3 Bucket)

• Delete message from the queue to mark it complete
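
The worker steps above can be sketched as a single function. This is an illustration under stated assumptions, not Mapbox's code: `FakeQueue` is a hypothetical in-memory stand-in for SQS, plain dictionaries stand in for the two S3 buckets, and visibility timeouts are ignored. The point it shows is the ordering: the message is deleted only after the output has been delivered, so a failed job stays in the queue and can be repeated.

```python
from typing import Callable, Dict, List, Optional, Tuple


class FakeQueue:
    """In-memory stand-in for an SQS queue (hypothetical, for illustration).

    A received message stays in the queue until it is explicitly deleted,
    which mimics a message reappearing after a failed attempt.
    """

    def __init__(self, bodies: List[str]) -> None:
        self._messages: Dict[int, str] = dict(enumerate(bodies))

    def receive(self) -> Optional[Tuple[int, str]]:
        # Return the oldest remaining (handle, body) pair, or None if empty.
        return next(iter(self._messages.items()), None)

    def delete(self, handle: int) -> None:
        self._messages.pop(handle, None)

    def __len__(self) -> int:
        return len(self._messages)


def work_once(queue: FakeQueue, source: Dict[str, str], dest: Dict[str, str],
              process: Callable[[str], str]) -> bool:
    """Run one pass of the worker steps; return True if a job completed."""
    received = queue.receive()             # grab message from the queue
    if received is None:
        return False
    handle, key = received
    try:
        raw = source[key]                  # download raw data
        dest[key] = process(raw)           # process, then deliver the output
    except Exception:
        return False                       # message stays queued for a retry
    queue.delete(handle)                   # mark complete only after delivery
    return True


# Demo: the first attempt fails, so job "a" is retried on the next pass.
source = {"a": "raw-a", "b": "raw-b"}
dest: Dict[str, str] = {}
queue = FakeQueue(["a", "b"])
attempts = {"count": 0}


def flaky_upper(raw: str) -> str:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated worker failure")
    return raw.upper()


results = [work_once(queue, source, dest, flaky_upper) for _ in range(3)]
print(results, dest, len(queue))
```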

Page 57: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Worker Auto Scaling Group

• Capacity is controlled by the number of messages in the queue

• Spikes are no problem: more instances come online automatically
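
One way to picture queue-driven capacity is a rule that maps queue depth to a worker count. The slides do not give the actual scaling policy, so the proportional rule, the messages-per-worker ratio, and the limits below are hypothetical values chosen only to make the idea concrete.

```python
import math


def desired_workers(queue_depth: int, messages_per_worker: int = 10,
                    min_workers: int = 0, max_workers: int = 200) -> int:
    """Map the number of queued messages to a worker count.

    Hypothetical rule: one worker per `messages_per_worker` queued
    messages, clamped to the range [min_workers, max_workers].
    """
    if queue_depth <= 0:
        return min_workers
    wanted = math.ceil(queue_depth / messages_per_worker)
    return max(min_workers, min(max_workers, wanted))


for depth in (0, 25, 5000):
    print(depth, desired_workers(depth))
```

An empty queue scales the group down to its minimum, and a spike in messages raises the count until the cap is reached.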

Page 58: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Data processing: Amazon SQS (Queue Size) -> CloudWatch -> Auto Scaling

Page 59: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

[Chart: SQS Messages and EC2 Instances]

Page 66: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

How can we make this cheap?

Page 67: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Spot market

• Bid on unused Amazon EC2 capacity and get a discount

• Instance runs as long as your bid price is higher than the market price

• If the market price spikes, your instances are terminated immediately

• Perfect for big data processing jobs that aren’t on a critical schedule
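
The bidding rules above can be illustrated with a small simulation. It is a simplified sketch: the hourly price series is made up, billing is modeled per whole hour at the market price, and the two bid values are simply reused from labels that appear on later slides.

```python
from typing import List, Tuple


def simulate_spot(prices: List[float], bid: float) -> Tuple[int, float]:
    """Return (hours_run, total_cost) for one instance over hourly prices.

    Simplified model: the instance runs in any hour where the bid is
    higher than the market price and is charged the market price for
    that hour; in other hours it is terminated and costs nothing.
    """
    hours = 0
    cost = 0.0
    for price in prices:
        if bid > price:
            hours += 1
            cost += price
    return hours, cost


# Hypothetical hourly market prices with a short spike in the middle.
PRICES = [0.03, 0.03, 0.65, 1.20, 0.03]

low = simulate_spot(PRICES, bid=0.60)    # loses the instance during the spike
high = simulate_spot(PRICES, bid=1.90)   # keeps running but pays spike prices
print("bid 0.60:", low[0], "hours,", round(low[1], 2), "dollars")
print("bid 1.90:", high[0], "hours,", round(high[1], 2), "dollars")
```

The lower bid waits the spike out at the price of lost hours; the higher bid keeps working but pays the spiked rate.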

Page 68: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

c3.xlarge / us-east-1e / $0.210 per hour

On-Demand Market

Page 69: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

c3.xlarge / us-east-1e / $0.210 per hour

$151.20 per month

On-Demand Market

Page 70: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

avg price $0.032

Page 71: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

c3.xlarge / us-east-1e / $0.0321 per hour

$23.11 per month

Spot Market

Page 72: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Running 200 c3.xlarge instances

$25,618 in savings per month

Spot Market
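
The monthly figures on these slides follow from the hourly prices. The sketch below reproduces them; the 720-hour (30-day) month is an assumption inferred from $151.20 / $0.210, not something the slides state.

```python
HOURS_PER_MONTH = 24 * 30      # assumption: a 30-day month (720 hours)
ON_DEMAND_HOURLY = 0.210       # c3.xlarge on-demand price from the slide
SPOT_HOURLY = 0.0321           # c3.xlarge average spot price from the slide
INSTANCES = 200

on_demand_monthly = ON_DEMAND_HOURLY * HOURS_PER_MONTH
spot_monthly = SPOT_HOURLY * HOURS_PER_MONTH
savings = (on_demand_monthly - spot_monthly) * INSTANCES

print(f"on-demand: ${on_demand_monthly:.2f} per instance per month")
print(f"spot:      ${spot_monthly:.2f} per instance per month")
print(f"savings:   ${savings:,.0f} per month for {INSTANCES} instances")
```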

Page 74: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

The graph isn’t always flat.

Page 76: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

bid price $1.90

Page 78: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

bid price $0.60

Page 80: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

bid price $0.60

Page 81: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

bid price $1.15

Page 82: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Spot market

• Jobs need to be small (just like Amazon SQS messages)

• Be prepared for spikes: wait them out or increase your bid price

Page 84: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

How do we get the data to users?

Page 85: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Distribution

Page 86: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

In the past 30 days we have served

9.8 billion requests

Page 87: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

That’s a lot of requests…

Page 88: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Distribution requirements

• Massive storage for processed data

• HTTP server capacity that we can spin up and down in minutes

• Global distribution for speed and redundancy

• Everything must be fully automated

• Low cost

Page 89: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Amazon EC2

Offers low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Amazon SQS

Manages the units of work

Page 91: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Amazon EC2

Offers low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Elastic Load Balancing

Distributes web traffic between multiple EC2 instances

Page 93: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

[Diagram: Processed Outputs -> S3 Buckets in Virginia, São Paulo, Ireland, Tokyo, California, Singapore, Sydney, Oregon, and Frankfurt]

Page 95: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

[Diagram, per region: S3 Bucket, Server Instances (Auto Scaling group), Elastic Load Balancing]

Page 96: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

[Diagram: users -> Amazon Route 53 -> region, region, region]

Page 98: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Data processing: Amazon SQS (Queue Size) -> CloudWatch -> Auto Scaling

Data distribution: Elastic Load Balancing (Request Count) -> CloudWatch -> Auto Scaling

Page 99: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

[Charts: Requests over 7 days; Running instances over 7 days]

Page 100: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Running instances across all regions over 7 days

Page 103: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

How can we make this cheap?

Page 104: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Instance reservations

• Buy computing up front for long-running instances

• Large upfront charge in exchange for low hourly usage cost

• Save up to 60% or more over the course of a year

• Perfect for critical instances that need to stay online
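
The trade-off between the upfront charge and the lower hourly rate can be worked through with a short calculation. All prices in this sketch are hypothetical placeholders, not AWS rates or figures from the talk; only the structure of the comparison comes from the bullets above.

```python
def reserved_savings(on_demand_hourly: float, upfront: float,
                     reserved_hourly: float, hours: float) -> float:
    """Fraction saved by reserving, for an instance that runs `hours`."""
    on_demand_cost = on_demand_hourly * hours
    reserved_cost = upfront + reserved_hourly * hours
    return 1 - reserved_cost / on_demand_cost


def break_even_hours(on_demand_hourly: float, upfront: float,
                     reserved_hourly: float) -> float:
    """Hours of use after which the reservation becomes cheaper."""
    if reserved_hourly >= on_demand_hourly:
        raise ValueError("reserved hourly rate must be below on-demand rate")
    return upfront / (on_demand_hourly - reserved_hourly)


# Hypothetical prices for one always-on instance over a year.
HOURS_PER_YEAR = 24 * 365
saving = reserved_savings(on_demand_hourly=0.20, upfront=400.0,
                          reserved_hourly=0.05, hours=HOURS_PER_YEAR)
crossover = break_even_hours(on_demand_hourly=0.20, upfront=400.0,
                             reserved_hourly=0.05)
print(f"saved over a year: {saving:.0%}")
print(f"break-even after about {crossover:.0f} hours")
```

The break-even point is why reservations suit instances that stay online: an instance that runs for only a fraction of the year may never recover the upfront charge.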

Page 105: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Reservations about reservations

• Took us over a year to commit

• Changing infrastructure: splitting applications, new instance types

Page 106: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

What made us eventually buy

• Easily swap reservations for instances within the same family

• Sell unused instances on the secondary market

• Cloudability: Great reservation recommendation tool

Page 109: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Amazon EC2

Amazon S3

Auto Scaling

Amazon SQS

CloudWatch

Elastic Load Balancing

Amazon Route 53

CloudFront

Page 112: (BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevalsBDT204