Data Processing without Servers | AWS Public Sector Summit 2016
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jed Sundwall, Global Open Data Lead
June 21, 2016
Data Processing Without Servers: Serverless Processing of Landsat 8 Imagery
Using AWS Lambda with Landsat on AWS
What is Landsat?
Landsat
The Landsat program is a joint effort of the U.S. Geological Survey and NASA. It is the longest running program to gather Earth imagery from space and is considered the gold standard for natural resources satellite imagery.
Landsat—not just pretty pictures
Landsat scenes are made up of multiple files, each of which includes data about different kinds of light reflected off of Earth.
Each pixel of each Landsat 8 file represents a 12-bit measurement of light reflected off a 30 m × 30 m part of our planet. Each Landsat 8 scene contains about 840 million pixels and takes up about 800 MB.
We currently host over 400,000 Landsat 8 scenes and make about 700 new scenes available on Amazon S3 every day.
That’s 588 billion pixels a day.
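The daily figure follows directly from the two numbers quoted above. A minimal check of the arithmetic, using the approximate per-scene and per-day counts from the slides:

```python
# Approximate figures quoted on the slides.
SCENES_PER_DAY = 700
PIXELS_PER_SCENE = 840_000_000

pixels_per_day = SCENES_PER_DAY * PIXELS_PER_SCENE
print(f"{pixels_per_day:,} pixels per day")
```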
RGB: visible light
Infrared: vegetation
Shortwave infrared: urban areas
Wellington, New Zealand
What does “serverless” mean?
“Serverless” is an approach to software development that eliminates the need to maintain and administer servers.
Application design is facilitated through interaction with third-party APIs/services and self-created non-server based APIs.
AWS Lambda
Serverless compute service that runs code in response to events and automatically manages the underlying compute resources
COMPUTE SERVICE: Run code at any scale without thinking about servers
EVENT DRIVEN: Code only runs when it needs to run, charged on execution time
AWS Lambda + Landsat
Landsat on AWS
Landsat on AWS makes each band of each scene readily available as objects on Amazon S3.
Data can be accessed programmatically via HTTP and quickly deployed to any of our products for analysis and processing.
An Amazon SNS topic publishes a notification whenever a new scene is available.
Landsat TIFFs represent individual wavelengths of light, and need to be combined to be interpretable by most people.
Using image processing tools, we can combine multiple bands into one “true color” image.
Our goal is to create true color images automatically as each scene is made publicly available.
AWS Lambda
Amazon DynamoDB
Amazon S3
Amazon SNS
We can seamlessly integrate various Amazon Web Services products to create a serverless architecture that will achieve this quickly and cost-effectively.
Serverless architecture
[Diagram components: AWS Lambda, Landsat 8 bucket, Amazon SNS, target bucket, Amazon DynamoDB]
{ "Records": [ { "EventVersion": "1.0", "EventSubscriptionArn": "arn:aws:sns:EXAMPLE", "EventSource": "aws:sns", "Sns": { "SignatureVersion": "1", "Timestamp": "1970-01-01T00:00:00.000Z", "Signature": "EXAMPLE", "SigningCertUrl": "EXAMPLE", "MessageId": "95df01b4-ee98-5cb9-9903-4c221d41eb5e", "Message": "{\"Records\":[{\"eventVersion\":\"2.0\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"us-west-2\",\"eventTime\":\"2016-01-16T01:36:55.014Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AIDAILHHXPNIKSGVUGOZK\"},\"requestParameters\":{\"sourceIPAddress\":\"52.27.39.85\"},\"responseElements\":{\"x-amz-request-id\":\"078952E6C7CC52B4\",\"x-amz-id-2\":\"Xboo1ULzd7PxY27iIaGXjUStV8TmG52JAbiWQpiRJWuRqfaBhLcc0XMUKNmXgd5fbIfRd1IcrgE=\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"NewHTML\",\"bucket\":{\"name\":\"landsat-pds\",\"ownerIdentity\":{\"principalId\":\"A3LZTVCZQ87CNW\"},\"arn\":\"arn:aws:s3:::landsat-pds\"},\"object\":{\"key\":\"L8/169/060/LC81690602016015LGN00/index.html\",\"size\":3780,\"eTag\":\"736e4e5a36cb8a1c6cbfc58659126ff1\",\"sequencer\":\"0056999EB6F8BDBB8D\"}}}]}", "Type": "Notification", "UnsubscribeUrl": "EXAMPLE", "TopicArn": "arn:aws:sns:EXAMPLE", "Subject": "TestInvoke" } } ] }
An Amazon SNS topic publishes a notification whenever a new scene is available.
This is what a notification looks like. It’s a JavaScript Object Notation (JSON) object.
Programmatic access to data
L8/169/060/LC81690602016015LGN00/index.html
→ LC81690602016015LGN00_B1.TIF
→ LC81690602016015LGN00_B2.TIF
→ LC81690602016015LGN00_B3.TIF
…
→ LC81690602016015LGN00_MTL.txt
The notification has given us everything we need to find the data for our task. AWS Lambda can do all of this automatically.
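As a rough illustration of how the notification leads to the data, the sketch below unwraps the SNS envelope and derives sibling object keys from the index.html key. The function name scene_objects and the trimmed sample_event are made up for this example; only the bucket name, key layout, and file suffixes come from the notification and listing shown in the slides.

```python
import json
import posixpath


def scene_objects(event, suffixes=("B1.TIF", "B2.TIF", "B3.TIF", "MTL.txt")):
    """Return (bucket, keys) for the scene announced by an SNS-wrapped S3 event."""
    # The SNS "Message" field is a JSON string that wraps the original S3 event,
    # so it has to be decoded a second time.
    s3_event = json.loads(event["Records"][0]["Sns"]["Message"])
    s3 = s3_event["Records"][0]["s3"]
    bucket = s3["bucket"]["name"]
    prefix = posixpath.dirname(s3["object"]["key"])  # e.g. L8/169/060/<scene id>
    scene_id = posixpath.basename(prefix)
    keys = [f"{prefix}/{scene_id}_{suffix}" for suffix in suffixes]
    return bucket, keys


# Trimmed-down copy of the notification, keeping only the fields used here.
sample_event = {
    "Records": [{
        "EventSource": "aws:sns",
        "Sns": {
            "Message": json.dumps({
                "Records": [{
                    "eventSource": "aws:s3",
                    "s3": {
                        "bucket": {"name": "landsat-pds"},
                        "object": {"key": "L8/169/060/LC81690602016015LGN00/index.html"},
                    },
                }]
            })
        },
    }]
}

bucket, keys = scene_objects(sample_event)
for key in keys:
    print(f"s3://{bucket}/{key}")
```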
The SNS message object is available to the Lambda function on execution.
From this object, we obtain the base Landsat scene information (Path, Row, Scene ID), as well as the location of the MTL text file containing the detailed metadata for the scene.
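One way to recover the path, row, and acquisition date is to slice the scene ID itself. The sketch below assumes the 21-character scene ID convention used by the example in the notification (sensor and satellite, WRS path, WRS row, year, day of year, ground station, version). The function name and the returned field names are illustrative, not taken from the talk.

```python
from datetime import date, timedelta


def parse_scene_id(scene_id):
    """Split a 21-character Landsat 8 scene ID into its parts (assumed layout)."""
    if len(scene_id) != 21:
        raise ValueError(f"unexpected scene ID length: {scene_id!r}")
    path = int(scene_id[3:6])
    row = int(scene_id[6:9])
    year = int(scene_id[9:13])
    day_of_year = int(scene_id[13:16])
    return {
        "scene_id": scene_id,
        "path": path,
        "row": row,
        # Day 1 of the year is January 1, hence the "- 1".
        "acquired": date(year, 1, 1) + timedelta(days=day_of_year - 1),
        "station": scene_id[16:19],
        "version": scene_id[19:21],
        # Key layout copied from the notification: L8/<path>/<row>/<scene id>/...
        "mtl_key": f"L8/{path:03d}/{row:03d}/{scene_id}/{scene_id}_MTL.txt",
    }


info = parse_scene_id("LC81690602016015LGN00")
print(info["path"], info["row"], info["acquired"].isoformat())
print(info["mtl_key"])
```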
Native JSON
Next, the Lambda function retrieves the text file containing the scene metadata.
The metadata is parsed and converted to JSON.
Having the metadata available in JSON makes it much easier to store the metadata in DynamoDB.
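The talk does not show the parsing step, so the following is only a sketch of how the MTL text could be turned into JSON. It assumes the file is a series of KEY = VALUE lines nested inside GROUP / END_GROUP blocks and terminated by END. The SAMPLE_MTL text is abridged, and its cloud cover value is made up for illustration.

```python
import json

# Abridged, illustrative MTL content (values are not from a real file).
SAMPLE_MTL = """\
GROUP = L1_METADATA_FILE
  GROUP = METADATA_FILE_INFO
    LANDSAT_SCENE_ID = "LC81690602016015LGN00"
  END_GROUP = METADATA_FILE_INFO
  GROUP = PRODUCT_METADATA
    WRS_PATH = 169
    WRS_ROW = 60
  END_GROUP = PRODUCT_METADATA
  GROUP = IMAGE_ATTRIBUTES
    CLOUD_COVER = 12.5
  END_GROUP = IMAGE_ATTRIBUTES
END_GROUP = L1_METADATA_FILE
END
"""


def _convert(value):
    """Strip quotes from strings and turn numeric text into int or float."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value[1:-1]
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_mtl(text):
    """Convert GROUP-structured MTL text into nested dictionaries."""
    root = {}
    stack = [root]  # innermost open group is always stack[-1]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "END":
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "GROUP":
            group = {}
            stack[-1][value] = group
            stack.append(group)
        elif key == "END_GROUP":
            stack.pop()
        else:
            stack[-1][key] = _convert(value)
    return root


metadata = parse_mtl(SAMPLE_MTL)
print(json.dumps(metadata, indent=2))
```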
After storing the scene metadata, the function then invokes an additional fleet of Lambda functions.
Each function downloads the .TIF for one of the three bands needed to generate a true color image, converts it to a .JPG, and uploads it back to S3 to make it available to the parent Lambda function.
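A sketch of the fan-out step, under several assumptions: the payload fields, the target bucket name, and the function names are hypothetical, and the actual invocation is passed in as a callable so the example runs without AWS access. With boto3, that callable could wrap the Lambda client's invoke call with InvocationType="Event" for asynchronous execution. Bands 4, 3, and 2 are the red, green, and blue bands of Landsat 8.

```python
import json

# Landsat 8 bands that make up a true color image.
TRUE_COLOR_BANDS = {"red": 4, "green": 3, "blue": 2}


def band_tasks(source_bucket, prefix, scene_id, target_bucket):
    """Build one task payload per band (field names are hypothetical)."""
    return [
        {
            "channel": channel,
            "source_bucket": source_bucket,
            "source_key": f"{prefix}/{scene_id}_B{band}.TIF",
            "target_bucket": target_bucket,
            "target_key": f"{scene_id}/{scene_id}_B{band}.jpg",
        }
        for channel, band in TRUE_COLOR_BANDS.items()
    ]


def fan_out(tasks, invoke):
    """Hand each task to `invoke`, which stands in for an async Lambda call."""
    for task in tasks:
        invoke(json.dumps(task))


tasks = band_tasks(
    "landsat-pds",
    "L8/169/060/LC81690602016015LGN00",
    "LC81690602016015LGN00",
    "example-target-bucket",  # hypothetical bucket name
)
sent = []
fan_out(tasks, sent.append)  # collect payloads instead of calling AWS
print(len(sent), "worker invocations prepared")
```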
Lambda functions natively include the open source image processing library ImageMagick.
We call this library to retrieve the three compressed .JPG bands, assemble them into a single .JPG, and then make color/contrast adjustments.
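The talk does not list the exact ImageMagick options used. As a hedged sketch, the command below uses the -combine operator, which merges separate grayscale images into the red, green, and blue channels of one image, followed by -normalize as a stand-in for the unspecified color and contrast adjustments. File names and function names are placeholders.

```python
import subprocess


def true_color_command(red, green, blue, output):
    """Build an ImageMagick command line; channel order must be R, G, B."""
    return ["convert", red, green, blue, "-combine", "-normalize", output]


def make_true_color(red, green, blue, output):
    """Run the command; requires ImageMagick's `convert` on the PATH."""
    subprocess.run(true_color_command(red, green, blue, output), check=True)


command = true_color_command("B4.jpg", "B3.jpg", "B2.jpg", "true_color.jpg")
print(" ".join(command))
```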
The parent Lambda function uploads the converted bands and the processed true color image to S3.
We can then make these finished .JPGs publicly available, or available only to a specific application, depending on the use case.
Thank you!
Jed Sundwall, Global Open Data Lead – [email protected]
Opsitos, Solutions Architect – [email protected]
Matthew Hanson, Development Seed, @geoskeptic
June 21, 2016
OSM-STATS
Gamification for Humanitarian Mapping
OpenStreetMap
Open map data
  Roads, rivers, buildings (e.g., hospitals)
Crowd-sourced mapping platform
  Users create vectors from satellite imagery
  OSM tasking manager identifies critical areas
Missing Maps
An initiative to map out areas most in need
  Humanitarian response
  Third-world regions with poor coverage
Organize mapathons
  Events with groups of volunteers focus on a region
Website of statistics from mapathons
  Keep track of contributions by hashtags users include in commits
OSM-Stats
Website of statistics by users and hashtags
  Track different groups, different mapathons
  Offer a reward mechanism to encourage contributions
Users earn badges for different statistics
  e.g., km of roads, # of buildings
Leaderboards for users and hashtags
  Produce stats in real time for added fun at mapathons
missingmaps.org
OSM infrastructure
Commits (changesets) by users published every minute
  Include metadata, but not geometries
  http://planet.osm.org/replication/changesets/
Geometries made available by minute via ‘overpass’
  http://overpass-api.de/
OSM-Stats Architecture
planet-stream
Node app
  Streams metadata and geometries from sources
  Combines them using Redis
  Pushes augmented changesets to Amazon Kinesis stream
Docker container running on Amazon EC2
osm-stats-workers
AWS Lambda with Node v4.3.2
Event mapping to Amazon Kinesis stream
Calculates metrics from each changeset
  Geometry calculations from vector data
  Determination of countries edited
  Ancillary data: user, editor used
Add to Amazon RDS database
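As an example of a geometry calculation on vector data, a metric such as kilometers of road could be computed by summing great-circle distances along each way's coordinates. The sketch below uses the haversine formula with a mean Earth radius. It is a Python illustration of the idea, not the Node code the workers actually run, and the function names are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance between two (lon, lat) points in degrees."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def linestring_km(coords):
    """Total length of a line given as a list of (lon, lat) pairs."""
    return sum(haversine_km(p, q) for p, q in zip(coords, coords[1:]))


# A made-up three-point road segment near the equator.
road = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
print(f"{linestring_km(road):.2f} km")
```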
Deployment
Use Python script and boto3
Deploy database
  Create Amazon RDS and osm-stats database, with inbound rules
  Migrate and populate
Create Amazon Kinesis stream
Create AWS Lambda
  Create with appropriate permissions: Amazon Kinesis, Amazon RDS security group pair
  Create event mapping
Deploy Amazon EC2
  Create instance, create security groups
  Use fabric to upload .env file (with URLs and names of above services), Dockerfiles
  docker-compose up -d: starts pushing to stream as soon as augmented changesets are created
Why Lambda and Amazon Kinesis?
Microservices architecture
  Smaller replaceable components
  Easier to scale pieces
Lambda provides low-cost solution at scale
  Activity can vary from a few to 100 changesets/min
Amazon Kinesis stream allows flexible input for historical processing
Lambda Invocations and Durations
Plots using librato
Lambda lessons
Local testing framework would have been useful
Lambda logs take some work
  aws-cli, combined with Python or Bash scripts, can be useful to parse logs
  awslogs: Amazon CloudWatch logs for Humans (https://github.com/jorgebastida/awslogs)
Error handling
  Lambda function design should handle all errors; don't let it return a failure
  Include top-level catch to catch any errors, log, and return success
Database connections using Knex
  Database pools and Lambda container reuse (pool min=0 !)
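The error-handling lesson can be sketched as a handler with a top-level catch. This is a Python rendering of the pattern only (the workers in the talk run on Node), and process_record is a hypothetical placeholder for the real per-changeset work. One likely motivation, stated here as an assumption: when a function fed by a Kinesis stream returns a failure, the same batch is retried, which can hold up the records behind it.

```python
import logging

logger = logging.getLogger("osm-stats-worker")


def process_record(record):
    """Hypothetical per-record work; raises on malformed input."""
    if "changeset" not in record:
        raise ValueError("record has no changeset")


def handler(event, context):
    """Process every record, log any error, and always return success."""
    failed = 0
    try:
        for record in event.get("Records", []):
            try:
                process_record(record)
            except Exception:
                # One bad record should not fail the rest of the batch.
                logger.exception("failed to process record")
                failed += 1
    except Exception:
        # Top-level catch: even a malformed event must not surface as a failure.
        logger.exception("unexpected error in handler")
    return {"status": "ok", "failed": failed}


print(handler({"Records": [{"changeset": 1}, {}]}, None))
```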
Lambda Security and VPCs
Initially configured closed RDS with Lambda access
  Paired security groups for RDS and Lambda
As part of VPC, Lambda is in a bubble
  osm-stats-workers makes requests elsewhere
  OSM API for tasking manager data
Ended up opening up RDS to the world
  Security groups also seem to cause intermittent pool errors
github.com/AmericanRedCross/osm-stats