© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
November 13, 2014 | Las Vegas
BDT204
Rendering a Seamless Satellite Map of the World with AWS and NASA Data
Eric Gundersen and Will White, Mapbox
Amazon EC2
Offers low-cost, scalable computing
Amazon S3
Data storage for input data and processed output
Auto Scaling
Controls the number of worker EC2 instances
Amazon SQS
Manages the units of work
Mapbox Satellite
[Figure: chart annotated "hmmm, this is slow going", "upgrade EC2 type", "w00t! killing it", "spiked regional spot pricing :p", and "increases $ for spot pricing"]
One image every day for the last two years.
17,179,869,184 pixels × 365 days × 2 years ≈ 12.5 trillion pixels
That’s a lot of pixels…
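The pixel count on the slide can be checked directly. The per-day figure, 17,179,869,184, is exactly 2^34, i.e. a 131,072 × 131,072 pixel image. Reading that as a global mosaic of 256-pixel tiles at zoom level 9 is our assumption; the slide does not say how the daily image is laid out.

```python
# Per-day pixel count taken from the slide.
PIXELS_PER_DAY = 17_179_869_184

# Assumption (not stated on the slide): the daily image is a square global
# mosaic of 256 x 256 pixel tiles at zoom level 9, i.e. 2**9 tiles per side.
TILE_SIZE = 256
ZOOM = 9
side = TILE_SIZE * 2**ZOOM  # 131,072 pixels per side
assert side * side == PIXELS_PER_DAY == 2**34

DAYS = 365 * 2  # one image per day for two years
total_pixels = PIXELS_PER_DAY * DAYS
print(f"{total_pixels:,} pixels (~{total_pixels / 1e12:.1f} trillion)")
```

This prints 12,541,304,504,320 pixels, which rounds to the 12.5 trillion quoted above.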
We need to
• Quickly process massive amounts of data
• Distribute processed data to users around the world quickly and reliably
• Keep costs low
Processing
Processing requirements
• Massive storage for raw and processed data
• Massive compute capacity that we can spin up and down in minutes
• Everything must be fully automated
• Low cost
[Figure: processing architecture. NASA Server → Source S3 Bucket → Watcher Instance → SQS Queue → Worker Instances (Auto Scaling group) → Destination S3 Bucket (Processed Outputs)]
Watcher EC2 instance
• Copies raw data files from the NASA server to our S3 bucket
• Splits each file into smaller parts and sends them to Amazon SQS as messages
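The watcher's splitting step can be sketched as turning one raw file into a list of small, self-describing work units. This is a minimal sketch assuming each unit is a band of image rows; the message schema (`bucket`, `key`, `row_start`, `row_end`) and the function name are hypothetical, since the deck does not show Mapbox's actual message format.

```python
import json


def split_into_messages(bucket, key, total_rows, rows_per_message):
    """Split one raw file into row-range work units, one JSON message each.

    The schema used here is hypothetical; any format works as long as a
    worker can locate and process its part from the message alone.
    """
    if rows_per_message <= 0:
        raise ValueError("rows_per_message must be positive")
    messages = []
    for start in range(0, total_rows, rows_per_message):
        end = min(start + rows_per_message, total_rows)  # exclusive bound
        messages.append(json.dumps(
            {"bucket": bucket, "key": key, "row_start": start, "row_end": end},
            sort_keys=True,
        ))
    return messages
```

For a 1,000-row file split into 256-row units this yields four messages, the last covering rows 768 to 1,000. Each string would then be sent to the queue, for example with the boto3 SQS client's `send_message` call.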
Why stash raw data on Amazon S3?
• Extremely low latency between Amazon S3 and Amazon EC2 in the same AWS region
• Avoids hammering NASA's servers with requests from our hundreds of workers
• Easy to reprocess data later
Messages for Amazon SQS
• Take a big job and split it up into smaller parts
• Shorter is better: a few minutes of work per message is ideal
• Messages need to be safely repeatable in case of failure
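One common way to make a message repeatable is to make its processing idempotent: derive the output location purely from the message, so a retry overwrites the same object instead of creating a duplicate. A minimal sketch assuming the hypothetical row-range message format; a plain dict stands in for the destination S3 bucket, and the key layout is our own choice.

```python
import json


def output_key(message_body):
    """Derive the output object key from the message contents alone."""
    job = json.loads(message_body)
    return f"processed/{job['key']}/rows-{job['row_start']}-{job['row_end']}"


def process(message_body, bucket):
    """Process one message; running it twice leaves the bucket unchanged."""
    job = json.loads(message_body)
    # Stand-in for the real processing step: the output depends only on the input.
    bucket[output_key(message_body)] = f"processed rows {job['row_start']}-{job['row_end']}"
```

Processing the same message twice leaves exactly one object behind, which is what makes it safe for a failed worker's message to be picked up again by another worker.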
[Figure: raw data split into SQS messages]
Worker EC2 instance
• Grab a message from the SQS queue
• Download the raw data from the source S3 bucket
• Run software to process the data
• Deliver the processed data to the destination S3 bucket
• Delete the message from the queue to mark it complete
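The worker steps rely on one SQS property: a received message is only hidden for a visibility timeout and reappears unless the worker deletes it. A minimal sketch that simulates that behavior in memory so it runs without AWS; `FakeQueue` and `work_one` are hypothetical names, and `upper()` stands in for the real processing software. With boto3, the corresponding calls would be the SQS client's `receive_message` and `delete_message`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeQueue:
    """In-memory stand-in for an SQS queue (hypothetical, for illustration).

    A received message is hidden for `visibility_timeout` seconds and
    becomes visible again unless it is deleted.
    """
    visibility_timeout: float
    _messages: dict = field(default_factory=dict)      # id -> body
    _hidden_until: dict = field(default_factory=dict)  # id -> time it reappears
    _next_id: int = 0

    def send(self, body):
        self._messages[self._next_id] = body
        self._next_id += 1

    def receive(self, now):
        for msg_id, body in self._messages.items():
            if self._hidden_until.get(msg_id, 0) <= now:
                self._hidden_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)
        self._hidden_until.pop(msg_id, None)


def work_one(queue, source, destination, now, crash=False):
    """One pass of the worker loop; returns True if a message was completed."""
    received = queue.receive(now)    # grab a message from the queue
    if received is None:
        return False
    msg_id, key = received
    raw = source[key]                # download the raw data
    result = raw.upper()             # stand-in for processing
    if crash:
        return False                 # instance dies before delivering or deleting
    destination[key] = result        # deliver processed data
    queue.delete(msg_id)             # delete the message to mark it complete
    return True
```

If a worker crashes mid-job, its message stays invisible until the timeout expires and is then picked up again, so no work is lost permanently.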
Worker Auto Scaling Group
• Capacity is controlled by the number of messages in the queue
• Spikes are no problem: more instances come online automatically
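The deck shows CloudWatch watching the queue size and Auto Scaling adjusting the worker group, but not the exact policy. A minimal sketch of one plausible rule, assuming capacity is set proportional to the backlog; `messages_per_worker` is a hypothetical tuning knob, and the cap of 200 echoes the 200-instance fleet mentioned later rather than a stated limit. In CloudWatch the backlog is exposed as the `ApproximateNumberOfMessagesVisible` metric in the `AWS/SQS` namespace.

```python
import math


def desired_workers(queue_size, messages_per_worker, min_workers=0, max_workers=200):
    """Target worker count proportional to the SQS backlog, clamped to limits."""
    if queue_size <= 0:
        return min_workers
    wanted = math.ceil(queue_size / messages_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

A backlog of 5,000 messages at 50 messages per worker asks for 100 workers; a spike to 50,000 is capped at 200, and an empty queue scales the group back down to the minimum.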
[Figure: Amazon SQS (Queue Size) → CloudWatch → Auto Scaling. Chart "Data processing": SQS messages and EC2 instances over time]
How can we make this cheap?
Spot market
• Bid on unused Amazon EC2 capacity and get a discount
• Instance runs as long as your bid price is higher than the market price
• If the market price spikes above your bid, your instances are terminated immediately
• Perfect for big data processing jobs that aren't on a critical schedule
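The spot rule above can be simulated over an hourly price series. A minimal sketch, simplified from the real billing rules: the instance runs while the market price does not exceed the bid, is terminated the first hour it does, and each hour is charged at the market price rather than the bid. The price series in the example is made up.

```python
def simulate_spot(bid, hourly_prices):
    """Return (hours_run, cost) for one spot instance under a simplified rule."""
    hours, cost = 0, 0.0
    for price in hourly_prices:
        if price > bid:
            break  # market price spiked above the bid: instance terminated
        hours += 1
        cost += price  # charged the market price, not the bid
    return hours, round(cost, 4)
```

With hypothetical prices `[0.03, 0.032, 0.031, 0.60, 0.03]`, a $0.10 bid runs for three hours before the spike terminates it, while a $1.00 bid rides out the spike and runs all five hours.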
On-Demand Market
c3.xlarge / us-east-1e / $0.210 per hour: $151.20 per month
Spot Market
c3.xlarge / us-east-1e / $0.0321 per hour (average): $23.11 per month
Running 200 c3.xlarge instances on the Spot Market: $25,618 in savings per month
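The monthly figures above follow from 720 billable hours (30 days × 24 hours). A quick check of the arithmetic using the slide's prices:

```python
HOURS_PER_MONTH = 720  # 30 days x 24 hours reproduces the slide's figures
on_demand_hourly = 0.210  # c3.xlarge, us-east-1e, on-demand
spot_hourly = 0.0321      # c3.xlarge, us-east-1e, average spot price
instances = 200

on_demand_monthly = on_demand_hourly * HOURS_PER_MONTH
spot_monthly = spot_hourly * HOURS_PER_MONTH
savings = (on_demand_monthly - spot_monthly) * instances

print(f"On-demand: ${on_demand_monthly:.2f} per month per instance")
print(f"Spot:      ${spot_monthly:.2f} per month per instance")
print(f"Savings for {instances} instances: ${savings:,.0f} per month")
```

This prints $151.20, $23.11, and $25,618, matching the slide.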
The graph isn't always flat.
[Figure: spot price history with spikes, annotated bid prices of $1.90, $0.60, $0.60, and $1.15]
Spot market
• Jobs need to be small (just like Amazon SQS messages)
• Be prepared for spikes: wait them out or increase your bid price
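Small jobs matter because a terminated spot instance only loses the job it was working on. A minimal sketch, assuming each worker handles one message at a time and deletes it only after its output is saved, so the in-flight message simply returns to the queue; the function name and numbers are illustrative.

```python
def work_lost_on_termination(minutes_until_termination, job_minutes):
    """Return (jobs completed, minutes of work lost) when an instance is killed.

    Assumes one message at a time, deleted only after its output is saved,
    so only the in-flight job is lost and its message is retried later.
    """
    completed = minutes_until_termination // job_minutes
    lost = minutes_until_termination - completed * job_minutes
    return completed, lost
```

An instance terminated after 100 minutes loses 1 minute of work with 3-minute jobs, but 40 minutes with 60-minute jobs.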
How do we get the data to users?
Distribution
In the past 30 days we have served 9.8 billion requests
That's a lot of requests…
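To put 9.8 billion requests in 30 days in per-second terms, a quick calculation. Note this is only the average rate; traffic is not uniform, so peaks are higher.

```python
REQUESTS = 9_800_000_000     # requests served in the past 30 days (from the slide)
SECONDS = 30 * 24 * 60 * 60  # 2,592,000 seconds in 30 days

average_rps = REQUESTS / SECONDS
print(f"{average_rps:,.0f} requests per second on average")
```

That works out to roughly 3,781 requests per second, around the clock.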
Distribution requirements
• Massive storage for processed data
• HTTP server capacity that we can spin up and down in minutes
• Global distribution for speed and redundancy
• Everything must be fully automated
• Low cost
Amazon EC2
Offers low-cost, scalable computing
Amazon S3
Data storage for input data and processed output
Auto Scaling
Controls the number of worker EC2 instances
Elastic Load Balancing
Distributes web traffic between multiple EC2 instances
[Figure: distribution architecture. The processing pipeline's Processed Outputs flow to S3 buckets in nine regions: Virginia, São Paulo, Ireland, Tokyo, California, Singapore, Sydney, Oregon, and Frankfurt. In each region, an Auto Scaling group of server instances sits behind Elastic Load Balancing; Amazon Route 53 directs users to a region.]
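The deck shows Amazon Route 53 sending users to one of the regional stacks but does not say which routing policy is used. Assuming latency-based routing combined with health checks (both are Route 53 features), the choice it makes for each user can be sketched as below; the function name and latency numbers are hypothetical.

```python
def pick_region(latencies_ms, healthy):
    """Return the lowest-latency region whose endpoint is currently healthy.

    latencies_ms: region name -> latency from the user, in milliseconds.
    healthy: set of regions whose load balancers pass health checks.
    """
    candidates = {region: ms for region, ms in latencies_ms.items() if region in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)
```

A user closest to Virginia is sent there; if Virginia fails its health check, the same user falls back to the next-fastest healthy region, which is where the redundancy of multiple regions pays off.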
[Figure: scaling loops. Data processing: Amazon SQS (Queue Size) → CloudWatch → Auto Scaling. Data distribution: Elastic Load Balancing (Request Count) → CloudWatch → Auto Scaling.]
[Charts: requests over 7 days; running instances over 7 days; running instances across all regions over 7 days]
How can we make this cheap?
Instance reservations
• Buy computing capacity up front for long-running instances
• Large upfront charge in exchange for a low hourly usage cost
• Savings can reach 60% or more over the course of a year
• Perfect for critical instances that need to stay online
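Whether a reservation pays off comes down to a break-even calculation: the upfront charge divided by the hourly saving. A minimal sketch; the $0.210 on-demand rate is the c3.xlarge price from the spot slide, but the $400 upfront and $0.03 reserved hourly rate are hypothetical numbers, not actual AWS reservation prices.

```python
def reservation_breakeven_hours(upfront, reserved_hourly, on_demand_hourly):
    """Hours of use after which a reservation is cheaper than on-demand."""
    hourly_saving = on_demand_hourly - reserved_hourly
    if hourly_saving <= 0:
        raise ValueError("reservation never pays off")
    return upfront / hourly_saving


def yearly_savings_pct(upfront, reserved_hourly, on_demand_hourly, hours=8760):
    """Percent saved versus on-demand for an instance running all year."""
    on_demand_cost = on_demand_hourly * hours
    reserved_cost = upfront + reserved_hourly * hours
    return 100 * (1 - reserved_cost / on_demand_cost)


# Hypothetical reservation terms for illustration only.
print(f"{reservation_breakeven_hours(400, 0.03, 0.210):,.0f} hours to break even")
print(f"{yearly_savings_pct(400, 0.03, 0.210):.1f}% saved over a year")
```

With these made-up terms the reservation breaks even after about 2,222 hours (roughly three months) and saves about 64% for an instance that stays online all year, which is why reservations suit long-running, critical instances.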
Reservations about reservations
• Took us over a year to commit
• Our infrastructure kept changing: splitting applications, moving to new instance types
What made us eventually buy
• Reservations can easily be swapped between instances within the same family
• Unused reservations can be sold on the secondary market
• Cloudability: a great tool for reservation recommendations
Amazon EC2
Amazon S3
Auto Scaling
Amazon SQS
CloudWatch
Elastic Load Balancing
Amazon Route 53
CloudFront
Please give us your feedback on this session.
Complete session evaluations and earn re:Invent swag.
http://bit.ly/awsevalsBDT204