(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014


Description

NASA imaging satellites deliver gigabytes of imagery to Earth every day. Mapbox uses AWS to process that data in near real time and build the most complete, seamless satellite map of the world. Learn how Mapbox uses Amazon S3 and Amazon SQS to stream data from NASA into clusters of EC2 instances running a clever algorithm that stitches images together in parallel. This session includes an in-depth discussion of high-volume storage with Amazon S3, cost-efficient data processing with Amazon EC2 Spot Instances, reliable job orchestration with Amazon SQS, and demand resilience with Auto Scaling.

Transcript

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

November 13, 2014 | Las Vegas

BDT204

Rendering a Seamless Satellite Map of the

World with AWS and NASA Data

Eric Gundersen and Will White, Mapbox

Amazon EC2

Offers low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Amazon SQS

Manages the units of work

Mapbox Satellite

[Annotated spot-price chart: "hmmm, this is slow going" → upgrade EC2 type → "w00t! killing it" → spiked regional spot pricing → increases $ for spot pricing]

One image every day for the last two years.

17,179,869,184 pixels × 365 days × 2 years

12.5 trillion pixels

That’s a lot of pixels…
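The slide's arithmetic checks out; a quick sanity check (one daily global image is 2^34 ≈ 17.2 billion pixels):

```python
# Sanity-check the slide's pixel count.
pixels_per_image = 17_179_869_184           # 2**34 pixels in one daily global image
total_pixels = pixels_per_image * 365 * 2   # one image per day for two years
print(f"{total_pixels / 1e12:.1f} trillion pixels")  # 12.5 trillion
```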

We need to:

• Quickly process massive amounts of data
• Distribute processed data to users around the world quickly and reliably
• Keep costs low

Processing

Processing requirements

• Massive storage for raw and processed data

• Massive computing that we can spin up and down in minutes

• Everything must be fully automated

• Low cost

Amazon EC2

Low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Amazon SQS

Manages the queue of work

[Architecture diagram: NASA Server → Source S3 Bucket ← Watcher Instance → SQS Queue → Auto Scaling group of Worker Instances → Destination S3 Bucket (Processed Outputs)]

Watcher EC2 instance

• Copies raw data files from the NASA server to our S3 bucket
• Splits each file into smaller parts and sends them to Amazon SQS as messages

Why stash raw data on Amazon S3?

• Extremely low latency between Amazon S3 and Amazon EC2 in the same AWS region
• We don’t want to hammer NASA’s servers with requests from our hundreds of workers
• Easy to reprocess the data later

Messages for Amazon SQS

• Take a big job and split it into smaller parts
• Shorter is better: a few minutes of work per message is ideal
• Messages need to be repeatable in case of failure
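A sketch of how a big job might be split along these lines. The message schema, the scene naming, and the 8×8 grid are illustrative, not Mapbox's actual format; a real watcher would send each body to SQS with `SendMessage`:

```python
import json

def split_scene(scene_id, s3_key, grid=8):
    """Split one raw satellite scene into grid*grid small tile jobs.

    Each job is a few minutes of work and repeatable: reprocessing
    the same tile just overwrites the same output key on S3.
    """
    messages = []
    for x in range(grid):
        for y in range(grid):
            messages.append(json.dumps({
                "scene": scene_id,
                "source": s3_key,   # raw data stashed in our source S3 bucket
                "tile": [x, y],     # which part of the scene to process
                "output": f"processed/{scene_id}/{x}-{y}.tif",
            }))
    return messages

msgs = split_scene("scene-2014-11-13", "raw/scene-2014-11-13.hdf")
print(len(msgs))  # 64 jobs for one scene
```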

[Diagram: one raw data file split into many SQS messages]


Worker EC2 instance

1. Grab a message from the SQS queue
2. Download the raw data from the source S3 bucket
3. Run software to process the data
4. Deliver the processed data to the destination S3 bucket
5. Delete the message from the queue to mark it complete
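The worker loop can be sketched as follows; the in-memory `FakeQueue` here stands in for SQS (a real worker would call `ReceiveMessage` and `DeleteMessage`). Because the message is deleted only after the output lands on S3, a worker that dies mid-job simply lets its message reappear after the visibility timeout, and another worker repeats the job:

```python
import json

def run_worker(queue, download, process, upload):
    """Drain the queue: one small, repeatable job per message."""
    while True:
        msg = queue.receive()                # 1. grab a message (SQS: ReceiveMessage)
        if msg is None:
            break                            # queue drained; instance can scale in
        job = json.loads(msg["body"])
        raw = download(job["source"])        # 2. download raw data from S3
        result = process(raw, job["tile"])   # 3. run the processing software
        upload(job["output"], result)        # 4. deliver processed data to S3
        queue.delete(msg["id"])              # 5. delete message = mark complete

class FakeQueue:
    """In-memory stand-in for SQS, for illustration only."""
    def __init__(self, bodies):
        self.msgs = [{"id": i, "body": b} for i, b in enumerate(bodies)]
    def receive(self):
        return self.msgs[0] if self.msgs else None
    def delete(self, msg_id):
        self.msgs = [m for m in self.msgs if m["id"] != msg_id]

q = FakeQueue([json.dumps({"source": "raw/a", "tile": [0, 0], "output": "out/a"})])
done = {}
run_worker(q, download=lambda key: "data",
           process=lambda raw, tile: raw.upper(),
           upload=done.__setitem__)
print(done)  # {'out/a': 'DATA'}
```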

NASA Server

Source S3 Bucket

Watcher Instance

Auto Scaling group

SQS Queue

Worker Instances

Destination

S3 Bucket

Processed Outputs

Worker Auto Scaling group

• Capacity is controlled by the number of messages in the queue
• Spikes are no problem: more instances come online automatically

[Diagram: Amazon SQS (queue size) → CloudWatch → Auto Scaling]
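A sketch of the scaling rule. In practice this is a CloudWatch alarm on the queue's `ApproximateNumberOfMessagesVisible` metric driving Auto Scaling policies; the thresholds below are made up for illustration:

```python
def desired_capacity(queue_size, msgs_per_instance=50, max_instances=200):
    """Scale the worker group in proportion to the SQS backlog."""
    # Round up so a small backlog still gets at least one worker.
    wanted = -(-queue_size // msgs_per_instance)
    return min(wanted, max_instances)

print(desired_capacity(0))        # 0   - queue drained, scale to zero
print(desired_capacity(120))      # 3
print(desired_capacity(100_000))  # 200 - capped at the group's max size
```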

[Graphs: data processing over time — SQS message backlog and running EC2 instance count rising and falling together]


How can we make this cheap?

Spot market

• Bid on unused Amazon EC2 capacity and get a discount
• Your instance runs as long as your bid price is higher than the market price
• If the market price spikes, your instances are terminated immediately
• Perfect for big data processing jobs that aren’t on a critical schedule

On-Demand Market: c3.xlarge / us-east-1e / $0.210 per hour = $151.20 per month

Spot Market: c3.xlarge / us-east-1e / $0.0321 per hour (average) = $23.11 per month

Running 200 c3.xlarge instances on the spot market: $25,618 in savings per month
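The savings figure follows directly from the hourly rates, assuming a 720-hour (30-day) month, which is consistent with the slide's $151.20 figure:

```python
HOURS_PER_MONTH = 720  # 30-day month, consistent with $0.210/hr -> $151.20/mo

on_demand = 0.210   # c3.xlarge, us-east-1e, on-demand $/hour (2014)
spot_avg = 0.0321   # average spot $/hour Mapbox observed

per_instance = (on_demand - spot_avg) * HOURS_PER_MONTH
fleet_savings = per_instance * 200  # 200 c3.xlarge instances
print(f"${fleet_savings:,.0f} in savings per month")  # $25,618 in savings per month
```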

The graph isn’t always flat.

[Spot price charts: price spikes crossing bid prices of $1.90, $0.60, $0.60, and $1.15]

Spot market

• Jobs need to be small (just like with Amazon SQS)
• Be prepared for spikes: wait them out or increase your bid price

How do we get the data to users?

Distribution

In the past 30 days we have served 9.8 billion requests.

That’s a lot of requests…

Distribution requirements

• Massive storage for processed data
• HTTP server capacity that we can spin up and down in minutes
• Global distribution for speed and redundancy
• Everything must be fully automated
• Low cost


Amazon EC2

Offers low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Elastic Load Balancing

Distributes web traffic between multiple EC2 instances


[Diagram: processed outputs replicated to S3 buckets in Virginia, Oregon, California, Ireland, Frankfurt, São Paulo, Tokyo, Singapore, and Sydney]

[Diagram: in each region, Elastic Load Balancing in front of an Auto Scaling group of server instances backed by the regional S3 bucket; Amazon Route 53 directs users to the regions]

[Diagram: data processing — Amazon SQS (queue size) → CloudWatch → Auto Scaling]

[Diagram: data distribution — Elastic Load Balancing (request count) → CloudWatch → Auto Scaling]
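The serving side mirrors the processing side: a sketch of an equivalent rule driven by the load balancer's `RequestCount` CloudWatch metric (numbers illustrative). Unlike the worker fleet, a serving fleet keeps a floor of instances so it never scales to zero:

```python
def desired_servers(requests_per_min, reqs_per_instance=1000,
                    min_instances=2, max_instances=100):
    """Scale the HTTP serving fleet in proportion to request volume."""
    wanted = -(-requests_per_min // reqs_per_instance)  # round up
    return max(min_instances, min(wanted, max_instances))

print(desired_servers(500))     # 2  - never below the floor
print(desired_servers(25_000))  # 25
```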

[Graphs: requests and running instances over 7 days, per region and across all regions]

How can we make this cheap?

Instance reservations

• Buy computing up front for long-running instances
• Large upfront charge in exchange for a low hourly usage cost
• Save up to 60% or more over the course of a year
• Perfect for critical instances that need to stay online
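A back-of-the-envelope comparison. The reserved prices below are hypothetical, shaped roughly like a 2014-era one-year heavy-utilization reservation, so treat the result only as an illustration of the slide's ~60% claim:

```python
HOURS_PER_YEAR = 8760

on_demand_hourly = 0.210   # c3.xlarge on-demand, from the earlier slide
reserved_upfront = 400.0   # hypothetical one-year upfront charge
reserved_hourly = 0.035    # hypothetical discounted hourly rate

on_demand_year = on_demand_hourly * HOURS_PER_YEAR
reserved_year = reserved_upfront + reserved_hourly * HOURS_PER_YEAR
savings_pct = 100 * (1 - reserved_year / on_demand_year)
print(f"on-demand ${on_demand_year:,.0f}/yr vs reserved "
      f"${reserved_year:,.0f}/yr ({savings_pct:.0f}% saved)")
```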

Reservations about reservations

• Took us over a year to commit

• Changing infrastructure: splitting applications, new instance types

What made us eventually buy

• Easily swap reservations for instances within the same family
• Sell unused reservations on the secondary market
• Cloudability: a great reservation recommendation tool

Amazon EC2
Amazon S3
Auto Scaling
Amazon SQS
CloudWatch
Elastic Load Balancing
Amazon Route 53
CloudFront

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevalsBDT204