Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Adding Location and Geospatial Analytics to Your Big Data Marwa Mabrouk, Esri November 15, 2013

description

(Presented by Esri) When people analyze a problem, they often include location at the core of the analysis. Location and spatial context, combined with geographical knowledge, can make the biggest difference in understanding a problem and analyzing it in a more meaningful way. In this session, we show how Amazon EMR can be used with location and geospatial analytics, and how the Amazon EMR API and the Python SDK were used to build tools that integrate Big Data and geospatial analysis. We also show powerful visualization options for displaying your results, using maps which can be shared in reports or distributed online and to mobile apps.

Transcript of Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Page 1: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013


Adding Location and Geospatial Analytics to Your Big Data

Marwa Mabrouk, Esri

November 15, 2013

Page 2: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Why Big Data and Geospatial?

Page 3: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

New Challenges for Organizations

• Better decision making

• Intelligence

• Insight / foresight

• Social data analysis

• Log files analysis

• Fraud detection

Page 4: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Collect Data!!!

Big Data Spatial Analytics – An Introduction

Page 5: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Big Data – A New Data Type for Geospatial

• Imagery

• DBMS

• Services

• Sensor Networks

• Big Data

• Spreadsheets

• Maps

• Social Media

Page 6: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Geospatial in Big Data

1. Geo Enable & Enrich Big Data (Geo E&E)

2. Run spatial queries and operations on data where it resides

3. Results in Geospatial tools: Visualize results as a map; Include in a report; Publish in a web or mobile app

Page 7: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Questions in Utilities

Smart Meters

• Billions of readings

• Where are the failures?

• What was the weather like here? Did it impact operations in any of the areas?

• Patterns of usage in specific areas?

Page 8: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Questions in Agriculture

Tractor Control Box readings

• Billions of readings

• What was the yield in a field?

– Broken down by 2 inch x 2 inch

• What was the impact of weather (or other factors) on yield?

• What are the other places with conditions like this place?

Page 9: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Questions in Telco

Smart phones

• Billions of readings

• Where and when do people start using what kind of apps?

• Patterns of usage in certain areas at certain times?

Page 10: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Questions in Healthcare

Service Location

• Doctor/patient/location and time of service

– Fraud detection

– Quality of service

• Health indicator readings related to where the patient has been

– Impact of conditions, like weather

Page 11: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Questions in Social Media

Service Quality

• Where are the most complaints/praises about a brand?

• Where is it best to start a limited rollout of a new product?

• What is the impact of other factors on what people say?

• Are there patterns within a certain area on how people react?

Page 12: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Geospatial Analysis

• Beyond a point on the map

• Simple operations

– Geometry relations

• High level analysis

– Hot spot analysis

Page 13: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Implementing Geospatial Analysis in Big Data

Page 14: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Geometry Relations

select * from cities where near(x,y,-84.2,39.4);
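The slide does not say how `near` decides that a city is close to the point (-84.2, 39.4). As a minimal sketch, assuming it means "within a fixed great-circle distance", the predicate could look like the Python below. The 50 km default radius, the function names, and the sample rows are all illustrative assumptions, not part of the Esri libraries.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near(x, y, ref_x, ref_y, radius_km=50.0):
    """Hypothetical 'near' predicate: True if (x, y) is within radius_km
    of the reference point. x is longitude, y is latitude."""
    return haversine_km(x, y, ref_x, ref_y) <= radius_km


# Made-up rows standing in for the "cities" table.
cities = [
    {"name": "A", "x": -84.2, "y": 39.5},  # about 11 km north of the reference
    {"name": "B", "x": -80.0, "y": 39.4},  # a few hundred km to the east
]

# Equivalent of: select * from cities where near(x, y, -84.2, 39.4);
nearby = [c["name"] for c in cities if near(c["x"], c["y"], -84.2, 39.4)]
```

Run against billions of rows, the same per-row test is what a spatial UDF would evaluate inside the cluster, so only the matching rows leave Hadoop.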

Page 15: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Geometry Relations

select * from cities where contains(x,y,'#mypolys');
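The `contains` query keeps the rows whose point falls inside one of the polygons in `#mypolys`. The sketch below shows one common way to test that, the ray-casting (even-odd) rule. It is an illustration under stated assumptions rather than the Esri Geometry API implementation: the polygon is a single ring without holes, points exactly on an edge are not treated specially, and the names and sample polygon are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring.

    ring is a list of (x, y) vertices; the last vertex connects back to
    the first. A ray is cast to the right of the point and each edge
    crossing flips the inside/outside state.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contains(x, y, polygons):
    """Hypothetical 'contains' predicate: True if any polygon holds the point."""
    return any(point_in_polygon(x, y, ring) for ring in polygons)


# A made-up one-degree square standing in for '#mypolys'.
my_polys = [[(-85.0, 39.0), (-84.0, 39.0), (-84.0, 40.0), (-85.0, 40.0)]]

inside = contains(-84.2, 39.4, my_polys)
outside = contains(-80.0, 39.4, my_polys)
```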

Page 16: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

GIS tools for Hadoop libraries

• Esri Geometry API

• Esri Spatial UDF

Page 18: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Analysis Tools Integration

ArcGIS

Page 19: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

GIS tools for Hadoop libraries

• Esri UDF – Run Hive Queries with spatial operators

• Esri Geometry API – Build Map/Reduce Spatial Apps in Java

• ArcGIS – Connect from ArcGIS to Hadoop using Geoprocessing (GP) Tools

Page 20: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

GIS Tools for Hadoop Walkthrough

Page 21: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Amazon Elastic MapReduce (Amazon EMR)

• Easy to use

• Elastic

• Low cost

• Reliable

• Secure

• Flexible

Page 22: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Amazon EMR Data Stores

• Amazon S3

• HDFS

• Amazon Redshift

• Amazon Glacier

• Amazon RDS

• Amazon DynamoDB

Page 23: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Amazon EMR for Geospatial Analysis

• Flexible platform to get started and grow large

• Hosted and managed by Amazon Web Services

– No need for large Big Data in-house infrastructure

– No need for hiring many people to maintain Hadoop

• Data ecosystem in the cloud is leveraged

– Geospatial data is usually large in size

– Access to third party datasets in the same ecosystem

Page 24: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

• Esri Geometry API

• Esri Spatial UDF

• Connect from ArcGIS to Hadoop using Geoprocessing (GP) Tools

• GIS tools for Hadoop libraries

• Amazon Elastic MapReduce (Amazon EMR)

Page 25: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

ArcGIS Geoprocessing Tools

• Framework

– Performing analysis

– Managing geographical data

• Rich library of analysis tools

• Chaining tools to create models

– Drag and drop model builder

• Developing new custom tools

– Python

Page 26: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

GP Tools for AWS

• https://github.com/Esri/gptools-for-aws

• GP tools to use

– Amazon EMR

– Amazon S3

• Open Source

– Apache 2.0 license

Page 27: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

GP Tools for AWS Walkthrough

Page 28: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Boto: A Python Interface to AWS

• Python package

• Supports multiple AWS services

– Amazon EMR

– Amazon S3

• Complete feature set needed for Amazon EMR

• Reliable Amazon S3 implementation
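As a rough illustration of what driving Amazon EMR from Boto can look like, the sketch below gathers the cluster settings in a plain dictionary and passes them to the boto 2.x `run_jobflow` call. It is not the code from GP Tools for AWS. The bucket name, key layout, instance type, and helper names are hypothetical, and `launch` needs boto 2.x plus AWS credentials, so it is defined but not called here.

```python
def build_jobflow_args(name, bucket, hive_script_key, num_instances=3,
                       instance_type="m1.large", keep_alive=False):
    """Collect settings for a Hive job flow on Amazon EMR.

    The bucket and key layout are hypothetical placeholders.
    """
    return {
        "name": name,
        "log_uri": "s3://%s/logs/" % bucket,
        "master_instance_type": instance_type,
        "slave_instance_type": instance_type,
        "num_instances": num_instances,
        "keep_alive": keep_alive,
        "hive_script": "s3://%s/%s" % (bucket, hive_script_key),
    }


def launch(args, region="us-east-1"):
    """Start the cluster with boto 2.x and return the job flow id.

    Requires boto 2.x and AWS credentials; not called in this sketch.
    """
    import boto.emr
    from boto.emr.step import InstallHiveStep, HiveStep

    args = dict(args)                      # do not mutate the caller's dict
    hive_script = args.pop("hive_script")  # not a run_jobflow argument
    steps = [
        InstallHiveStep(),
        HiveStep(name="Run HQL script", hive_file=hive_script),
    ]
    conn = boto.emr.connect_to_region(region)
    return conn.run_jobflow(steps=steps, **args)


# Example settings only; nothing is sent to AWS here.
args = build_jobflow_args("tile-log-analysis", "my-bucket", "scripts/tiles.hql")
```

Keeping the settings separate from the call makes it easy to compare cluster sizes by changing `num_instances` or `instance_type` between runs.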

Page 30: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

A Real World Example

Page 31: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Putting it all together!

Geospatial analysis of log files

• Using: GP tools for AWS

• Goal: Analyze log files of a tile base-map web service

– Real life high demand web service

– Where is the most demand?

• Map visualization

Page 32: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

The Architecture

• Data, Scripts/, Logs/, output

• ArcGIS Desktop + GP Tools for AWS

• AWS cloud

– Availability Zone #1

– Amazon EMR Master Node

– Amazon EMR Slave Node

Page 33: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Data Files

• Structured CSV files

– ~8 GB

• Data rows

– Represented 1 month

– More than 700 million records

• Represents all 18 map scales

– To know in which areas users are looking for details

Page 34: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

HQL Script

• External tables for data rows

• Calculations run through temp tables

– Consolidate tile scales from most detailed to level 13

– Calculate points (x,y) representing each tile

– Aggregate results

– Format output as CSV, not tab delimited

• Ported from RDBMS operations

– Adapted to Hive
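The HQL script itself is not shown on the slide. The Python below sketches the same four steps on a handful of records so the logic is easy to follow: consolidate to level 13, compute a point per tile, aggregate, and write CSV. It makes two loud assumptions: tiles are addressed as (level, row, col) in the common Web Mercator scheme where each level doubles the grid, and tiles shallower than level 13 are left at their own level. All names and sample records are hypothetical.

```python
import csv
import io
import math
from collections import Counter

TARGET_LEVEL = 13  # the slide consolidates detailed scales to level 13


def consolidate(level, row, col, target=TARGET_LEVEL):
    """Roll a tile deeper than target up to its ancestor at the target level.

    Assumes each level doubles the grid, so dropping one level halves
    row and col. Shallower tiles are returned unchanged (assumption).
    """
    if level <= target:
        return level, row, col
    shift = level - target
    return target, row >> shift, col >> shift


def tile_center(level, row, col):
    """Longitude/latitude of a tile centre, assuming Web Mercator tiling."""
    n = 2 ** level
    lon = (col + 0.5) / n * 360.0 - 180.0
    lat = math.degrees(
        math.atan(math.sinh(math.pi * (1 - 2 * (row + 0.5) / n))))
    return lon, lat


def aggregate(records):
    """Count requests per consolidated tile."""
    counts = Counter()
    for level, row, col in records:
        counts[consolidate(level, row, col)] += 1
    return counts


def to_csv(counts):
    """Write x, y, requests as comma-separated text (not tab delimited)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["x", "y", "requests"])
    for (level, row, col), hits in sorted(counts.items()):
        lon, lat = tile_center(level, row, col)
        writer.writerow(["%.6f" % lon, "%.6f" % lat, hits])
    return buf.getvalue()


# Made-up log records: (level, row, col) of each requested tile.
records = [(15, 12345, 8765), (15, 12346, 8766), (13, 3086, 2191), (14, 0, 0)]
counts = aggregate(records)
output = to_csv(counts)
```

The resulting file has one row per tile with an x/y pair and a request count, which is the shape needed to add it as a point layer for display.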

Page 35: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Visualization

• Download output to local disk

• Add as a layer, set x/y for display

– Set coordinate system

– Use visualization settings to cluster points and categorize

• Use base maps

Page 36: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Demo

Page 37: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Lessons Learned

• External tables and Amazon S3

• Cluster shutdown protection

• Data

– Partitioning

• Cluster sizes vs. execution time

– Standard Large

– High Memory, XLarge vs Quadruple XLarge

• Costs

Page 38: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Summary

• The value of asking Big Data spatial questions

• Hadoop is now spatially enabled

– GIS Tools for Hadoop

• Boto for using Amazon EMR

• Geospatial analysts empowered

– GP Tools for AWS

• Real world scenario using Amazon EMR and GP Tools

Page 39: Adding Location and Geospatial Analytics to Big Data Analytics (BDT210) | AWS re:Invent 2013

Q&A
