Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Adding Location and Geospatial Analytics to Your Big Data
Marwa Mabrouk, Esri
November 15, 2013
Why Big Data and Geospatial?
New Challenges for Organizations
• Better decision making
• Intelligence
• Insight / foresight
• Social data analysis
• Log files analysis
• Fraud detection
Collect Data!!!
Big Data Spatial Analytics – An Introduction
Big Data – A New Data Type for Geospatial
[Diagram: geospatial data types: Imagery, DBMS, Services, Sensor Networks, Big Data, Spreadsheets, Maps, Social Media]
Geospatial in Big Data
1. Geo Enable & Enrich Big Data (Geo E&E)
2. Run spatial queries and operations on data where it resides
3. Results in Geospatial tools: Visualize results as a map; Include in a report; Publish in a web or mobile app
Questions in Utilities
Smart Meters
• Billions of readings
• Where are the failures?
• What was the weather like here? Did it impact operations in any of the areas?
• Patterns of usage in specific areas?
Questions in Agriculture
Tractor Control Box readings
• Billions of readings
• What was the yield in a field?
– Broken down by 2 inch x 2 inch
• What was the impact of weather (or other factors) on yield?
• What are the other places with conditions like this place?
Questions in Telco
Smart phones
• Billions of readings
• Where and when do people start using what kind of apps?
• Patterns of usage in certain areas at certain times?
Questions in Healthcare
Service Location
• Doctor/patient/location and time of service
– Fraud detection
– Quality of service
• Health indicator readings related to where the patient has been
– Impact of conditions, like weather
Questions in Social Media
Service Quality
• Where are the most complaints/ praises about a brand?
• Where is it best to start a new product limited roll out?
• What is the impact of other factors on what people say?
• Are there patterns within a certain area on how people react?
Geospatial Analysis
• Beyond a point on the map
• Simple operations
– Geometry relations
• High level analysis
– Hot spot analysis
Implementing Geospatial Analysis in Big Data
select * from cities
where near(x,y,-84.2,39.4);
Geometry Relations
select * from cities
where contains(x,y,'#mypolys');
Geometry Relations
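The two queries above use near and contains as shorthand for geometry relations. As a rough illustration of what such predicates compute, here is a minimal Python sketch. It is not the Esri Geometry API or the Esri Spatial UDF: the function names, the 50 km default radius, and the sample polygon are all assumptions made up for this example. It takes near to mean "within a great-circle (haversine) distance" and contains to mean a planar ray-casting point-in-polygon test.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near(x, y, ref_x, ref_y, radius_km=50.0):
    """True if (x, y) lies within radius_km of (ref_x, ref_y).

    The radius is a hypothetical parameter; the slide does not state one.
    """
    return haversine_km(x, y, ref_x, ref_y) <= radius_km


def contains(x, y, polygon):
    """Ray-casting test: True if (x, y) is inside the polygon.

    polygon is a list of (x, y) vertices; coordinates are treated as planar.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical sample data standing in for the "cities" table and '#mypolys'.
cities = [("A", -84.2, 39.5), ("B", -80.0, 35.0)]
my_poly = [(-85.0, 39.0), (-84.0, 39.0), (-84.0, 40.0), (-85.0, 40.0)]

near_cities = [name for name, x, y in cities if near(x, y, -84.2, 39.4)]
inside_cities = [name for name, x, y in cities if contains(x, y, my_poly)]
```

In a Hadoop setting the same filtering would be expressed in Hive through spatial functions rather than in client-side Python; the sketch only shows the geometry behind the two predicates.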
Esri Geometry API
Esri Spatial UDF
GIS tools for Hadoop libraries
• http://esri.github.com/gis-tools-for-hadoop/
• Support running geometry-based spatial queries inside Hadoop
• Open Source
– Apache 2.0 license
GIS tools for Hadoop libraries
Analysis Tools Integration
[Diagram labels: ArcGIS; Run Hive queries with spatial operators; Build Map/Reduce spatial apps in Java; Esri Geometry API; Esri UDF; Connect from ArcGIS to Hadoop using Geoprocessing (GP) Tools; GIS tools for Hadoop libraries]
GIS Tools for Hadoop Walkthrough
Amazon Elastic MapReduce (Amazon EMR)
• Easy to use
• Elastic
• Low cost
• Reliable
• Secure
• Flexible
Amazon EMR Data Stores
• Amazon S3
• HDFS
• Amazon Redshift
• Amazon Glacier
• Amazon RDS
• Amazon DynamoDB
Amazon EMR for Geospatial Analysis
• Flexible platform to get started and grow large
• Hosted and managed by Amazon Web Services
– No need for large in-house Big Data infrastructure
– No need to hire many people to maintain Hadoop
• Data ecosystem in the cloud is leveraged
– Geospatial data is usually large in size
– Access to third party datasets in the same ecosystem
[Diagram labels: Esri Geometry API; Esri Spatial UDF; Connect from ArcGIS to Hadoop using Geoprocessing (GP) Tools; GIS tools for Hadoop libraries; Amazon Elastic MapReduce (Amazon EMR)]
ArcGIS Geoprocessing Tools
• Framework
– Performing analysis
– Managing geographical data
• Rich library of analysis tools
• Chaining tools to create models
– Drag and drop model builder
• Developing new custom tools
– Python
GP Tools for AWS
• https://github.com/Esri/gptools-for-aws
• GP tools to use
– Amazon EMR
– Amazon S3
• Open Source
– Apache 2.0 license
GP Tools for AWS Walkthrough
Boto: A Python Interface to AWS
• Python package
• Supports multiple AWS services
– Amazon EMR
– Amazon S3
• Complete feature set needed for Amazon EMR
• Reliable Amazon S3 implementation
Boto Walkthrough
A Real World Example
Putting it all together!
Geospatial analysis of log files
• Using: GP tools for AWS
• Goal: Analyze log files of a tile base-map web service
– Real life high demand web service
– Where is the most demand?
• Map visualization
The Architecture
[Diagram labels: ArcGIS Desktop + GP Tools for AWS; Data, Scripts/, Logs/, output; AWS cloud, Availability Zone #1: Amazon EMR Master Node, Amazon EMR Slave Node]
Data Files
• Structured CSV files
– ~8 GB
• Data rows
– Represented 1 month
– More than 700 million records
• Represents all 18 map scales
– To know in which areas users are looking for details
HQL Script
• External tables for data rows
• Calculations run through temp tables
– Consolidate tile scales from most detailed to level 13
– Calculate points (x,y) representing each tile
– Aggregate results
– Format output as CSV, not tab delimited
• Ported from RDBMS operations
– Adapted to Hive
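The calculation steps listed for the HQL script (consolidate to level 13, compute a point per tile, aggregate, write CSV) can be sketched in plain Python to make the logic concrete. This is not the script from the talk. It assumes tile requests are recorded as (level, row, col) in the common XYZ Web Mercator tiling scheme, that coarser-than-13 tiles are simply skipped, and that the point representing a tile is its center; the function names and column headers are hypothetical.

```python
import csv
import io
import math
from collections import Counter

TARGET_LEVEL = 13  # consolidation level named on the slide


def consolidate(level, row, col, target=TARGET_LEVEL):
    """Map a tile at level >= target to its ancestor tile at the target level."""
    if level < target:
        return None  # assumption: coarser tiles are left out of this sketch
    shift = level - target
    # Each level halves the tile size, so the ancestor index is a right shift.
    return (target, row >> shift, col >> shift)


def tile_center(level, row, col):
    """Center (x = lon, y = lat) in degrees, assuming XYZ Web Mercator tiles."""
    n = 2 ** level
    lon = (col + 0.5) / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (row + 0.5) / n))))
    return lon, lat


def aggregate(requests):
    """Count requests per consolidated tile."""
    counts = Counter()
    for level, row, col in requests:
        tile = consolidate(level, row, col)
        if tile is not None:
            counts[tile] += 1
    return counts


def to_csv(counts):
    """Format aggregated counts as comma-separated text with x/y columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["x", "y", "requests"])
    for (level, row, col), total in sorted(counts.items()):
        x, y = tile_center(level, row, col)
        writer.writerow([f"{x:.6f}", f"{y:.6f}", total])
    return buf.getvalue()


# Hypothetical log rows: three requests fall in the same level-13 tile.
sample = [(15, 12345, 6789), (14, 6172, 3394), (13, 3086, 1697), (10, 1, 1)]
counts = aggregate(sample)
output = to_csv(counts)
```

A CSV with x and y columns is what the later visualization step needs in order to add the output as a point layer.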
Visualization
• Download output to local disk
• Add as a layer, set x/y for display
– Set coordinate system
– Use visualization settings to cluster points and categorize
• Use base maps
Demo
Lessons Learned
• External tables and Amazon S3
• Cluster shutdown protection
• Data
– Partitioning
• Cluster sizes vs. execution time
– Standard Large
– High Memory, XLarge vs Quadruple XLarge
• Costs
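One possible reading of the "Data – Partitioning" lesson is to lay the log files out under Hive-style key=value prefixes, so that a partitioned external table scans only the partitions a query needs. The Python sketch below groups log lines by day under such prefixes. The dt partition key, the logs root prefix, and the record layout are hypothetical; the talk does not say how the data was actually partitioned.

```python
from collections import defaultdict
from datetime import date


def partition_prefix(day, root="logs"):
    """Build a Hive-style partition prefix such as logs/dt=2013-10-01/."""
    return f"{root}/dt={day.isoformat()}/"


def group_by_partition(records, root="logs"):
    """Group (day, line) records by the partition prefix they belong to."""
    groups = defaultdict(list)
    for day, line in records:
        groups[partition_prefix(day, root)].append(line)
    return dict(groups)


# Hypothetical log lines: (request day, raw CSV row).
records = [
    (date(2013, 10, 1), "13,3086,1697"),
    (date(2013, 10, 1), "14,6172,3394"),
    (date(2013, 10, 2), "15,12345,6789"),
]
partitions = group_by_partition(records)
```

Each resulting prefix would hold one day's files, which keeps a query over a few days from reading the whole month.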
Summary
• The value of asking Big Data spatial questions
• Hadoop is now spatially enabled
– GIS Tools for Hadoop
• Boto for using Amazon EMR
• Geospatial analysts empowered
– GP Tools for AWS
• Real world scenario using Amazon EMR and GP Tools
Q&A
Please give us your feedback on this presentation
As a thank you, we will select prize winners daily for completed surveys!
BDT210