Big Data and Geospatial with HPCC Systems
Transcript of Big Data and Geospatial with HPCC Systems
![Page 1: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/1.jpg)
Big Data and Geospatial with HPCC Systems®
Powered by LexisNexis Risk Solutions
Ignacio Calvo, Greg McRandal
10/05/2016
![Page 2: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/2.jpg)
Concepts in Geospatial
How to use them with HPCC
Use cases
@HPCCSystems
![Page 3: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/3.jpg)
An approach to applying statistical analysis and other analytic techniques to data which has a geographical or spatial aspect
Definition
![Page 4: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/4.jpg)
![Page 5: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/5.jpg)
Origin of Geospatial
John Snow’s original map (1854): using GIS to save lives. This map was used to determine that cholera is water-borne.
![Page 6: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/6.jpg)
Need to know:
• Format
• Projection / coordinate system
Understanding the data
![Page 7: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/7.jpg)
Formats : Vector vs Raster
Vector Raster
![Page 8: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/8.jpg)
Projections are used to represent the world in ways we can process
• The Earth is round and maps are flat
• Physical Maps
• Computer Maps
What is a projection?
Have I seen projections before?
• Peters vs Mercator vs Winkel tripel
• GPS (latitude/longitude)
• Google Maps
![Page 9: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/9.jpg)
Two different projections representing the same place.
Projections
![Page 10: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/10.jpg)
WGS84
• Latitude and longitude
• Our best approximation of the world
• Not always the best for a specific region
• Not technically a projection
Projections to know about
Mercator
• Many different ones; choose one based on your location
• Reduces the area it covers to a simple Cartesian plane
• Good near the central axis, bad far away from it:
  • Web Mercator covers the whole world: good near the equator, gets worse as you travel north or south
  • Irish National Grid: very good for Ireland, awful anywhere else
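To make the "good near the equator, worse further north or south" point concrete, here is a minimal Python sketch of the standard spherical Web Mercator forward formula. The function names are illustrative and not part of HPCC or ECL; the radius is the WGS84 semi-major axis that Web Mercator uses.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius


def wgs84_to_web_mercator(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_scale_factor(lat_deg):
    """How much Mercator stretches distances at a given latitude (1.0 at the equator)."""
    return 1.0 / math.cos(math.radians(lat_deg))


for lat in (0.0, 53.35, 70.0):
    x, y = wgs84_to_web_mercator(lat, -6.26)
    print(f"lat {lat:5.2f}: y = {y:12.1f} m, scale factor = {mercator_scale_factor(lat):.2f}")
```

The scale factor is 1 at the equator and grows without bound towards the poles, which is why distances and areas measured directly in Web Mercator units are unreliable at high latitudes.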
![Page 11: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/11.jpg)
Lies, damned lies, statistics… and maps!
*https://twitter.com/flashboy/status/641221733509373952
![Page 12: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/12.jpg)
Lies, damned lies, statistics… and maps!
Projection Woes:
A straight line in Mercator is not a straight line in WGS84
Four points converted to WGS84
Where the lines should be
Don’t re-project polygons!
This “solution” is only good enough for visuals, not for maths.
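A small Python sketch of why re-projecting only the corner points is not enough. It takes two points, finds the midpoint of the straight segment in Web Mercator, converts that midpoint back to WGS84, and compares it with the midpoint taken directly in latitude/longitude. The two sample points are arbitrary and the helper names are illustrative.

```python
import math

R = 6378137.0  # sphere radius used by Web Mercator


def to_mercator(lat_deg, lon_deg):
    """WGS84 degrees -> Web Mercator metres."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def to_wgs84(x, y):
    """Web Mercator metres -> WGS84 degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lat, lon


a = (50.0, -10.0)  # (lat, lon)
b = (60.0, 10.0)

# Midpoint of the segment drawn straight in lat/lon space.
mid_wgs84 = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

# Midpoint of the segment drawn straight in Mercator space, converted back.
ax, ay = to_mercator(*a)
bx, by = to_mercator(*b)
mid_mercator = to_wgs84((ax + bx) / 2, (ay + by) / 2)

print("midpoint in WGS84:           ", mid_wgs84)
print("Mercator midpoint, as WGS84: ", mid_mercator)
```

The two midpoints share a longitude but differ in latitude by a fraction of a degree, so a polygon edge that is straight in one system bends in the other. Converting only the vertices silently changes the shape.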
![Page 13: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/13.jpg)
Lies, damned lies, statistics… and maps!
![Page 14: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/14.jpg)
Lies, damned lies, statistics… and maps!
Visuals don’t agree with maths: Wind and Hail.
Web Mercator WGS84
![Page 15: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/15.jpg)
Number one bug in Geospatial
*http://twcc.fr
![Page 16: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/16.jpg)
Number one bug in Geospatial
Latitude = Y
Longitude = X
(LatY, LonX)
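One way to guard against swapped axes is to make the axis order explicit wherever a point is parsed and to range-check the result. The sketch below is a hypothetical Python helper (not an HPCC function) that reads a WKT `POINT` and rejects impossible latitudes or longitudes.

```python
import re

# Matches e.g. "POINT (-6.26 53.35)"; only plain decimal numbers are handled.
POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_point(wkt, axis_order="lonlat"):
    """Parse a WKT point, returning a dict with explicit 'lat' and 'lon' keys."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    first, second = float(match.group(1)), float(match.group(2))
    if axis_order == "lonlat":      # X first, then Y
        lon, lat = first, second
    elif axis_order == "latlon":    # Y first, then X
        lat, lon = first, second
    else:
        raise ValueError(f"unknown axis order: {axis_order!r}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} out of range; axes may be swapped")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} out of range")
    return {"lat": lat, "lon": lon}


print(parse_point("POINT (-6.26 53.35)"))
```

The range check only catches swaps where one value exceeds ±90. A point such as (-6.26, 53.35) is valid in either order, so the declared axis order still has to be correct.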
![Page 17: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/17.jpg)
Now I understand my data, what’s next?
Data Ingest Index Query
![Page 18: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/18.jpg)
Bringing Geospatial into HPCC
GOAL
Bring our geospatial processes into the realm of Big Data
![Page 19: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/19.jpg)
STEPS
Extend HPCC and ECL to support the following main capabilities:
• Spatial filtering of vector geometries
• Spatial operations using vector geometries
• Spatial reference projection and transformation
• Reading of compressed geo-raster files
![Page 20: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/20.jpg)
STEPS
Big Data
Integration of open source libraries
![Page 21: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/21.jpg)
Ingesting Vector Data
It’s a CSV file.
| Id | Name | Geometry | Projection | Value |
|----|------|----------|------------|-------|
| 1 | Alice’s place | POINT (53.78925462 -6.08354321) | 4326* | €5,973,000 |
| 2 | Bob’s place | POINT (-34.78925462 7.08354321) | 4326 | €872,000 |
| 3 | Celine’s place | POINT (102.78925462 -6.08354321) | 4326 | €9,324,000 |

* WGS84 (Lat/Lon)
1. Policy data
2. Geocode address
3. Peril tag
Data ready to ingest
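As a rough illustration of what reading such a CSV involves, the Python sketch below parses two rows shaped like the slide's table. The column names, the quoting of the value field and the `load_policies` helper are assumptions for this example; in HPCC the file would be sprayed and read with ECL instead.

```python
import csv
import io
import re

# Two sample rows modelled on the slide; the header names are assumed.
SAMPLE_CSV = """id,name,geometry,projection,value
1,Alice's place,POINT (53.78925462 -6.08354321),4326,"€5,973,000"
2,Bob's place,POINT (-34.78925462 7.08354321),4326,"€872,000"
"""

POINT_RE = re.compile(r"^POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$")


def load_policies(text):
    """Turn CSV text into typed records with separate lat/lon fields."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        match = POINT_RE.match(row["geometry"].strip())
        if match is None:
            raise ValueError(f"unsupported geometry: {row['geometry']!r}")
        records.append({
            "id": int(row["id"]),
            "name": row["name"],
            # The slide's footnote gives the order as Lat/Lon.
            "lat": float(match.group(1)),
            "lon": float(match.group(2)),
            "epsg": int(row["projection"]),
            "value_eur": int(row["value"].lstrip("€").replace(",", "")),
        })
    return records


for record in load_policies(SAMPLE_CSV):
    print(record)
```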
![Page 22: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/22.jpg)
Ingesting Vector Data
It’s a GML / XML file.
1. Shape data
2. Parse XPATH
3. Process and index
Data ready to query
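The "Parse XPATH" step can be sketched in Python with the standard library's `xml.etree.ElementTree`, which supports a subset of XPath. The GML fragment, the `ex` namespace and the `FloodZone` element below are made up for illustration; real shape data will use its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical GML 2 style document with one polygon feature.
SAMPLE_GML = """
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml"
                       xmlns:ex="http://example.com/flood">
  <gml:featureMember>
    <ex:FloodZone>
      <ex:name>Zone A</ex:name>
      <gml:Polygon srsName="EPSG:4326">
        <gml:outerBoundaryIs>
          <gml:LinearRing>
            <gml:coordinates>-6.3,53.3 -6.2,53.3 -6.2,53.4 -6.3,53.4 -6.3,53.3</gml:coordinates>
          </gml:LinearRing>
        </gml:outerBoundaryIs>
      </gml:Polygon>
    </ex:FloodZone>
  </gml:featureMember>
</gml:FeatureCollection>
"""

NS = {"gml": "http://www.opengis.net/gml", "ex": "http://example.com/flood"}


def parse_zones(gml_text):
    """Extract name, spatial reference and outer ring of each zone."""
    root = ET.fromstring(gml_text)
    zones = []
    for zone in root.findall(".//ex:FloodZone", NS):
        polygon = zone.find(".//gml:Polygon", NS)
        coords = polygon.find(".//gml:coordinates", NS).text
        # GML 2 coordinates are "x,y" pairs separated by whitespace.
        ring = [tuple(float(v) for v in pair.split(",")) for pair in coords.split()]
        zones.append({
            "name": zone.findtext("ex:name", namespaces=NS),
            "srs": polygon.get("srsName"),
            "ring": ring,
        })
    return zones


print(parse_zones(SAMPLE_GML))
```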
![Page 23: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/23.jpg)
![Page 24: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/24.jpg)
![Page 25: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/25.jpg)
Indexing vector data
• Outline Box: Biggest rectangle
• Boxes contain boxes
• Bottom box in the tree contains the actual geometries
• Here, 3 levels pictured
• Boxes can overlap (entries are only in one)
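The nesting of boxes can be sketched in a few lines of Python. This is a deliberately simplified packing (sort the points, cut them into fixed-size leaves, wrap the leaves in one outline box), not the insertion algorithm a production R-tree uses, and it builds only two levels instead of the three pictured.

```python
def bbox_of_points(points):
    """Smallest (min_x, min_y, max_x, max_y) rectangle around some points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_union(boxes):
    """Smallest rectangle that contains every box in the list."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def build_tree(points, leaf_size=2):
    """Pack points into leaf boxes, then wrap the leaves in one outline box."""
    ordered = sorted(points)  # sort by x, then y, so neighbours share a leaf
    leaves = []
    for start in range(0, len(ordered), leaf_size):
        chunk = ordered[start:start + leaf_size]
        leaves.append({"box": bbox_of_points(chunk), "entries": chunk})
    return {"box": bbox_union([leaf["box"] for leaf in leaves]), "children": leaves}


tree = build_tree([(1, 1), (2, 3), (8, 8), (9, 7), (5, 5)])
print("outline box:", tree["box"])
for leaf in tree["children"]:
    print("  leaf box:", leaf["box"], "entries:", leaf["entries"])
```

Each point lives in exactly one leaf, even where leaf boxes overlap, which matches the last bullet above.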
![Page 26: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/26.jpg)
Querying vector data
Searching an R-Tree, e.g. finding all buildings (points) inside a flood zone (polygon):
1. Does the query polygon overlap our box?
   • N: Return empty list
   • Y: Go to step 2
2. Is it a leaf node?
   • Y: Return all nodes for verification
   • N: Search our box’s children
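The search flow above can be sketched in Python as a recursive function over a small hand-built tree. The tree layout (dicts with `box`, `children` and `entries` keys) is an assumption for this example. The verification step uses a basic ray-casting point-in-polygon test as a stand-in for whatever exact predicate a geometry library would provide.

```python
def boxes_overlap(a, b):
    """True if two (min_x, min_y, max_x, max_y) rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query_box):
    """Return candidate entries whose leaf box overlaps the query box."""
    if not boxes_overlap(node["box"], query_box):
        return []                         # no overlap: empty list
    if "entries" in node:
        return list(node["entries"])      # leaf: hand everything to verification
    found = []
    for child in node["children"]:        # inner node: search the children
        found.extend(search(child, query_box))
    return found


def point_in_polygon(point, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


tree = {
    "box": (1, 1, 9, 8),
    "children": [
        {"box": (1, 1, 2, 3), "entries": [(1, 1), (2, 3)]},
        {"box": (5, 5, 8, 8), "entries": [(5, 5), (8, 8)]},
        {"box": (9, 7, 9, 7), "entries": [(9, 7)]},
    ],
}
flood_zone = [(0, 0), (6, 0), (0, 6)]     # a triangle
xs = [p[0] for p in flood_zone]
ys = [p[1] for p in flood_zone]
query_box = (min(xs), min(ys), max(xs), max(ys))

candidates = search(tree, query_box)
verified = [p for p in candidates if point_in_polygon(p, flood_zone)]
print("candidates:", candidates)
print("verified:  ", verified)
```

The box test is cheap but approximate: (5, 5) and (8, 8) come back as candidates because their leaf box touches the query box, and only the exact test removes them.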
![Page 27: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/27.jpg)
Ingesting Raster Data
It’s a raster / TIFF file. Bitmap image
1. Raster data
2. Tile and spray
3. Process and index
Data ready to query
![Page 28: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/28.jpg)
Ingesting Raster Data
Tiling divides raster images into small manageable areas of known dimensions.
These tiles have their own metadata:
• Bounding box
• Grid position
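A minimal Python sketch of tiling metadata, assuming a north-up raster described by its pixel dimensions, the coordinates of its top-left corner and a square pixel size. The function and field names are illustrative; edge tiles are simply clipped to the raster.

```python
def make_tiles(width, height, tile_size, origin_x, origin_y, pixel_size):
    """List tile metadata: grid position, pixel window and bounding box.

    origin_x/origin_y are the coordinates of the raster's top-left corner;
    y decreases as pixel rows go down the image.
    """
    tiles = []
    for row, top in enumerate(range(0, height, tile_size)):
        for col, left in enumerate(range(0, width, tile_size)):
            right = min(left + tile_size, width)     # clip the last column
            bottom = min(top + tile_size, height)    # clip the last row
            tiles.append({
                "grid": (col, row),
                "pixels": (left, top, right, bottom),
                "bbox": (
                    origin_x + left * pixel_size,    # min x
                    origin_y - bottom * pixel_size,  # min y
                    origin_x + right * pixel_size,   # max x
                    origin_y - top * pixel_size,     # max y
                ),
            })
    return tiles


for tile in make_tiles(500, 300, 256, origin_x=0.0, origin_y=300.0, pixel_size=1.0):
    print(tile)
```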
![Page 29: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/29.jpg)
Ingesting Raster Data
1. Figure out which grid position the geometry needs
2. Extract the required pixel
3. Interrogate the pixel for its value
4. Interpret its value
5. Return to user
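The five steps can be sketched in Python against a toy raster that is already tiled. The tile contents, the 2×2 tile size and the legend that maps pixel values to labels are all invented for illustration.

```python
import math

TILE_SIZE = 2                   # pixels per tile side
ORIGIN_X, ORIGIN_Y = 0.0, 4.0   # coordinates of the raster's top-left corner
PIXEL_SIZE = 1.0

# Toy 4x4 raster stored as four 2x2 tiles keyed by (column, row).
TILES = {
    (0, 0): [[0, 0], [1, 1]],
    (1, 0): [[0, 2], [1, 2]],
    (0, 1): [[1, 1], [2, 2]],
    (1, 1): [[2, 3], [3, 3]],
}

# Hypothetical meaning of each pixel value.
LEGEND = {0: "none", 1: "low", 2: "medium", 3: "high"}


def query(x, y):
    """Return the interpreted raster value at a coordinate, or None if outside."""
    # 1. Figure out which grid position the point needs.
    px = math.floor((x - ORIGIN_X) / PIXEL_SIZE)
    py = math.floor((ORIGIN_Y - y) / PIXEL_SIZE)
    tile = TILES.get((px // TILE_SIZE, py // TILE_SIZE))
    if tile is None:
        return None
    # 2-3. Extract the pixel inside that tile and read its value.
    value = tile[py % TILE_SIZE][px % TILE_SIZE]
    # 4-5. Interpret the value and return it to the caller.
    return LEGEND.get(value, "unknown")


print(query(3.5, 0.5))
print(query(0.5, 3.5))
print(query(10.0, 10.0))
```

Only the one tile that covers the point is touched, which is what keeps lookups fast once the tiles are distributed and indexed.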
![Page 30: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/30.jpg)
![Page 31: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/31.jpg)
![Page 32: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/32.jpg)
Bringing it all together
* Andrew Farrell, "In pursuit of perils: Geo-spatial risk analysis through HPCC Systems", https://hpccsystems.com/resources/blog/afarrell/pursuit-perils-geo-spatial-risk-analysis-through-hpcc-systems
![Page 33: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/33.jpg)
Add even more value
![Page 34: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/34.jpg)
Add even more value
![Page 35: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/35.jpg)
Why Geospatial with HPCC?
• Efficient parallel processing
• Ability to import libraries from different languages
• Good coverage of functions and spatial predicates
• Fast ingestion
• Support for different formats
• Sub-second queries
![Page 36: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/36.jpg)
![Page 37: Big Data and Geospatial with HPCC Systems](https://reader031.fdocuments.us/reader031/viewer/2022032108/58ec86bc1a28abb32a8b4747/html5/thumbnails/37.jpg)
hpccsystems.com