(BDT204) Rendering a Seamless Satellite Map of the World with AWS and NASA Data | AWS re:Invent 2014


Description

NASA imaging satellites deliver gigabytes of imagery to Earth every day. Mapbox uses AWS to process that data in near real time and build the most complete, seamless satellite map of the world. Learn how Mapbox uses Amazon S3 and Amazon SQS to stream data from NASA into clusters of EC2 instances running a clever algorithm that stitches images together in parallel. This session includes an in-depth discussion of high-volume storage with Amazon S3, cost-efficient data processing with Amazon EC2 Spot Instances, reliable job orchestration with Amazon SQS, and demand resilience with Auto Scaling.

Transcript

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

November 13, 2014 | Las Vegas

BDT204

Rendering a Seamless Satellite Map of the

World with AWS and NASA Data

Eric Gundersen and Will White, Mapbox

Amazon EC2

Offers low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Amazon SQS

Manages the units of work

Mapbox Satellite

[Annotated spot-price chart: "hmmm, this is slow going" → upgrade EC2 type → "w00t! killing it" → spiked regional spot pricing → increases $ for spot pricing]

One image every day for the last two years.

17,179,869,184 pixels × 365 days × 2 years

12.5 trillion pixels

That’s a lot of pixels…
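The slide's arithmetic checks out; a quick sanity check (one daily global image is 2^34 ≈ 17.2 billion pixels):

```python
# Sanity-check the slide's pixel count.
pixels_per_image = 17_179_869_184           # 2**34 pixels in one daily global image
total_pixels = pixels_per_image * 365 * 2   # one image per day for two years
print(f"{total_pixels / 1e12:.1f} trillion pixels")  # 12.5 trillion
```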

We need to:

• Quickly process massive amounts of data
• Distribute processed data to users around the world quickly and reliably
• Keep costs low

Processing

Processing requirements

• Massive storage for raw and processed data

• Massive computing that we can spin up and down in minutes

• Everything must be fully automated

• Low cost

Amazon EC2

Low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Amazon SQS

Manages the queue of work

[Architecture diagram: NASA Server → Source S3 Bucket ← Watcher Instance → SQS Queue → Auto Scaling group of Worker Instances → Destination S3 Bucket (Processed Outputs)]

Watcher EC2 instance

• Copies raw data files from the NASA server to our S3 bucket
• Splits each file into smaller parts and sends them to Amazon SQS as messages

Why stash raw data on Amazon S3?

• Extremely low latency between Amazon S3 and Amazon EC2 in the same AWS region
• We don’t want to hammer NASA’s servers with requests from our hundreds of workers
• Easy to reprocess the data later

Messages for Amazon SQS

• Take a big job and split it into smaller parts
• Shorter is better: a few minutes of work per message is ideal
• Messages need to be repeatable in case of failure
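A sketch of how a big job might be split along these lines. The message schema, the scene naming, and the 8×8 grid are illustrative, not Mapbox's actual format; a real watcher would send each body to SQS with `SendMessage`:

```python
import json

def split_scene(scene_id, s3_key, grid=8):
    """Split one raw satellite scene into grid*grid small tile jobs.

    Each job is a few minutes of work and repeatable: reprocessing
    the same tile just overwrites the same output key on S3.
    """
    messages = []
    for x in range(grid):
        for y in range(grid):
            messages.append(json.dumps({
                "scene": scene_id,
                "source": s3_key,   # raw data stashed in our source S3 bucket
                "tile": [x, y],     # which part of the scene to process
                "output": f"processed/{scene_id}/{x}-{y}.tif",
            }))
    return messages

msgs = split_scene("scene-2014-11-13", "raw/scene-2014-11-13.hdf")
print(len(msgs))  # 64 jobs for one scene
```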

[Diagram: one raw data file split into many SQS messages]


Worker EC2 instance

1. Grab a message from the SQS queue
2. Download the raw data from the source S3 bucket
3. Run software to process the data
4. Deliver the processed data to the destination S3 bucket
5. Delete the message from the queue to mark it complete
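The worker loop can be sketched as follows; the in-memory `FakeQueue` here stands in for SQS (a real worker would call `ReceiveMessage` and `DeleteMessage`). Because the message is deleted only after the output lands on S3, a worker that dies mid-job simply lets its message reappear after the visibility timeout, and another worker repeats the job:

```python
import json

def run_worker(queue, download, process, upload):
    """Drain the queue: one small, repeatable job per message."""
    while True:
        msg = queue.receive()                # 1. grab a message (SQS: ReceiveMessage)
        if msg is None:
            break                            # queue drained; instance can scale in
        job = json.loads(msg["body"])
        raw = download(job["source"])        # 2. download raw data from S3
        result = process(raw, job["tile"])   # 3. run the processing software
        upload(job["output"], result)        # 4. deliver processed data to S3
        queue.delete(msg["id"])              # 5. delete message = mark complete

class FakeQueue:
    """In-memory stand-in for SQS, for illustration only."""
    def __init__(self, bodies):
        self.msgs = [{"id": i, "body": b} for i, b in enumerate(bodies)]
    def receive(self):
        return self.msgs[0] if self.msgs else None
    def delete(self, msg_id):
        self.msgs = [m for m in self.msgs if m["id"] != msg_id]

q = FakeQueue([json.dumps({"source": "raw/a", "tile": [0, 0], "output": "out/a"})])
done = {}
run_worker(q, download=lambda key: "data",
           process=lambda raw, tile: raw.upper(),
           upload=done.__setitem__)
print(done)  # {'out/a': 'DATA'}
```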

NASA Server

Source S3 Bucket

Watcher Instance

Auto Scaling group

SQS Queue

Worker Instances

Destination

S3 Bucket

Processed Outputs

Worker Auto Scaling group

• Capacity is controlled by the number of messages in the queue
• Spikes are no problem: more instances come online automatically

[Diagram: Amazon SQS (queue size) → CloudWatch → Auto Scaling]
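A sketch of the scaling rule. In practice this is a CloudWatch alarm on the queue's `ApproximateNumberOfMessagesVisible` metric driving Auto Scaling policies; the thresholds below are made up for illustration:

```python
def desired_capacity(queue_size, msgs_per_instance=50, max_instances=200):
    """Scale the worker group in proportion to the SQS backlog."""
    # Round up so a small backlog still gets at least one worker.
    wanted = -(-queue_size // msgs_per_instance)
    return min(wanted, max_instances)

print(desired_capacity(0))        # 0   - queue drained, scale to zero
print(desired_capacity(120))      # 3
print(desired_capacity(100_000))  # 200 - capped at the group's max size
```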

[Graphs: data processing over time — SQS message backlog and running EC2 instance count rising and falling together]


How can we make this cheap?

Spot market

• Bid on unused Amazon EC2 capacity and get a discount
• Your instance runs as long as your bid price is higher than the market price
• If the market price spikes, your instances are terminated immediately
• Perfect for big data processing jobs that aren’t on a critical schedule

On-Demand Market: c3.xlarge / us-east-1e / $0.210 per hour = $151.20 per month

Spot Market: c3.xlarge / us-east-1e / $0.0321 per hour (average) = $23.11 per month

Running 200 c3.xlarge instances on the spot market: $25,618 in savings per month
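The savings figure follows directly from the hourly rates, assuming a 720-hour (30-day) month, which is consistent with the slide's $151.20 figure:

```python
HOURS_PER_MONTH = 720  # 30-day month, consistent with $0.210/hr -> $151.20/mo

on_demand = 0.210   # c3.xlarge, us-east-1e, on-demand $/hour (2014)
spot_avg = 0.0321   # average spot $/hour Mapbox observed

per_instance = (on_demand - spot_avg) * HOURS_PER_MONTH
fleet_savings = per_instance * 200  # 200 c3.xlarge instances
print(f"${fleet_savings:,.0f} in savings per month")  # $25,618 in savings per month
```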

The graph isn’t always flat.

[Spot price charts: price spikes crossing bid prices of $1.90, $0.60, $0.60, and $1.15]

Spot market

• Jobs need to be small (just like with Amazon SQS)
• Be prepared for spikes: wait them out or increase your bid price

How do we get the data to users?

Distribution

In the past 30 days we have served 9.8 billion requests.

That’s a lot of requests…

Distribution requirements

• Massive storage for processed data
• HTTP server capacity that we can spin up and down in minutes
• Global distribution for speed and redundancy
• Everything must be fully automated
• Low cost


Amazon EC2

Offers low-cost, scalable computing

Amazon S3

Data storage for input data and processed output

Auto Scaling

Controls the number of worker EC2 instances

Elastic Load Balancing

Distributes web traffic between multiple EC2 instances


[Diagram: processed outputs replicated to S3 buckets in Virginia, Oregon, California, Ireland, Frankfurt, São Paulo, Tokyo, Singapore, and Sydney]

[Diagram: in each region, Elastic Load Balancing in front of an Auto Scaling group of server instances backed by the regional S3 bucket; Amazon Route 53 directs users to the regions]

[Diagram: data processing — Amazon SQS (queue size) → CloudWatch → Auto Scaling]

[Diagram: data distribution — Elastic Load Balancing (request count) → CloudWatch → Auto Scaling]
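The serving side mirrors the processing side: a sketch of an equivalent rule driven by the load balancer's `RequestCount` CloudWatch metric (numbers illustrative). Unlike the worker fleet, a serving fleet keeps a floor of instances so it never scales to zero:

```python
def desired_servers(requests_per_min, reqs_per_instance=1000,
                    min_instances=2, max_instances=100):
    """Scale the HTTP serving fleet in proportion to request volume."""
    wanted = -(-requests_per_min // reqs_per_instance)  # round up
    return max(min_instances, min(wanted, max_instances))

print(desired_servers(500))     # 2  - never below the floor
print(desired_servers(25_000))  # 25
```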

[Graphs: requests and running instances over 7 days, per region and across all regions]

How can we make this cheap?

Instance reservations

• Buy computing up front for long-running instances
• Large upfront charge in exchange for a low hourly usage cost
• Save up to 60% or more over the course of a year
• Perfect for critical instances that need to stay online
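A back-of-the-envelope comparison. The reserved prices below are hypothetical, shaped roughly like a 2014-era one-year heavy-utilization reservation, so treat the result only as an illustration of the slide's ~60% claim:

```python
HOURS_PER_YEAR = 8760

on_demand_hourly = 0.210   # c3.xlarge on-demand, from the earlier slide
reserved_upfront = 400.0   # hypothetical one-year upfront charge
reserved_hourly = 0.035    # hypothetical discounted hourly rate

on_demand_year = on_demand_hourly * HOURS_PER_YEAR
reserved_year = reserved_upfront + reserved_hourly * HOURS_PER_YEAR
savings_pct = 100 * (1 - reserved_year / on_demand_year)
print(f"on-demand ${on_demand_year:,.0f}/yr vs reserved "
      f"${reserved_year:,.0f}/yr ({savings_pct:.0f}% saved)")
```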

Reservations about reservations

• Took us over a year to commit

• Changing infrastructure: splitting applications, new instance types

What made us eventually buy

• Easily swap reservations for instances within the same family
• Sell unused reservations on the secondary market
• Cloudability: a great reservation recommendation tool

Amazon EC2
Amazon S3
Auto Scaling
Amazon SQS
CloudWatch
Elastic Load Balancing
Amazon Route 53
CloudFront

Please give us your feedback on this session.

Complete session evaluations and earn re:Invent swag.

http://bit.ly/awsevalsBDT204