SpaceCurve - Integrating with Hadoop
Transcript of SpaceCurve - Integrating with Hadoop
© 2015 SpaceCurve, Inc. Confidential. | 1!
• Spatial Data
• Hadoop Ecosystem
• SpaceCurve’s Spatial Data Platform
• Integrating with Hadoop
• Largest datasets are geospatial in nature
– Daily generation of petabytes of data
– Most is not used or simply discarded
• Proliferation of mobile platforms, sensors and IoT
– More geospatial data will be generated in real-time
• Typical big data solutions can scale to ingest and store vast quantities of data
– But these are not designed for real-time, geospatial data
• Devices > People: In 2008, the number of internet devices exceeded the number of people on earth
• 20 - 50 Billion: Estimated number of connected devices by 2020
• 80% of all data has spatial attributes*
• 90% of all mobile data is location aware*
*According to Gartner
• Mobile Platforms
• Operational Intelligence
• Sensored World/Digital Business
• Context Rich Autonomous Systems
• Smart Machines/M2M
Source: Gartner Technology Trends 2015
• THE WORLD IS A STATIC MAP
– Example: Map coordinates of points of interest cataloged and described on the Internet.
• CAPTURING THE MOTION OF THINGS
– Example: Packages have passive sensors; we can track them on the web and know where they passed checkpoints.
• REMOTE CONTROL OF THINGS
– Example: UAVs used as remote sensing platforms for emergency response.
• THINGS TALK TO EACH OTHER
– Example: Aircraft optimize fuel consumption in real-time using data from internal and external sensor networks.
• THINGS BEHAVE INTELLIGENTLY
– Example: Large fleets of autonomous vehicles adapting to weather conditions and traffic congestion.
• Hadoop’s open source platform has become synonymous with big data processing
• Core ecosystem:
– Distributed file system for data storage (HDFS)
– Distributed processing of data at scale (MapReduce)
– Batch-oriented job execution
• Hadoop-based solutions excel at:
– Ingesting and warehousing data from multiple sources
– Creating and updating analytical dashboards on a weekly, daily or hourly basis
– Providing insights from historical data that apply to future scenarios
• The Hadoop ecosystem can scale to geospatial storage requirements
• HDFS is not efficient for organizing and analyzing these data models because:
– Geospatial data does not have a predictable, uniform distribution
– Hash functions can transform unpredictable, non-uniform distributions into uniform ones, but they neither preserve nor expose geospatial biases and relationships efficiently
• Results:
– Reduction in parallelism and efficiency of geospatial analysis
– Inability to implement computational geometry needed for geospatial analytics
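The trade-off described above can be illustrated with a small sketch. It is not taken from the slides: the partition count, the synthetic point set clustered around one city, and the helper names `hash_partition`, `grid_partition` and `make_points` are all assumptions chosen for illustration. Hashing a record id spreads records evenly but scatters neighboring points across every partition, while a naive spatial split keeps neighbors together but overloads one partition.

```python
import hashlib
import random
from collections import Counter

NUM_PARTITIONS = 8  # assumed cluster size, for illustration only


def hash_partition(record_id: int) -> int:
    """Assign a record to a partition by hashing its id; location is ignored."""
    digest = hashlib.md5(str(record_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def grid_partition(lon: float, lat: float) -> int:
    """Assign a record to one of NUM_PARTITIONS equal-width longitude bands."""
    band = int((lon + 180.0) / 360.0 * NUM_PARTITIONS)
    return min(band, NUM_PARTITIONS - 1)


def make_points(n: int, seed: int = 0):
    """Synthetic data: about 90% of points near one city, the rest worldwide."""
    rng = random.Random(seed)
    points = []
    for i in range(n):
        if rng.random() < 0.9:
            lon, lat = rng.gauss(-122.3, 0.5), rng.gauss(47.6, 0.5)
        else:
            lon, lat = rng.uniform(-180.0, 180.0), rng.uniform(-90.0, 90.0)
        points.append((i, lon, lat))
    return points


points = make_points(10_000)

# Load per partition under each scheme.
by_hash = Counter(hash_partition(i) for i, _, _ in points)
by_grid = Counter(grid_partition(lon, lat) for _, lon, lat in points)

# Points within one degree of the city center: which partitions hold them?
near = [p for p in points if abs(p[1] + 122.3) < 1 and abs(p[2] - 47.6) < 1]
hash_parts_for_cluster = {hash_partition(i) for i, _, _ in near}
grid_parts_for_cluster = {grid_partition(lon, lat) for _, lon, lat in near}

print("hash load per partition:", sorted(by_hash.values()))
print("grid load per partition:", sorted(by_grid.values()))
print("partitions touched by a local query (hash):", len(hash_parts_for_cluster))
print("partitions touched by a local query (grid):", len(grid_parts_for_cluster))
```

Under these assumptions a query about one neighborhood has to visit every hash partition, whereas the spatial split answers it from a single partition that also carries most of the data. Neither scheme gives both balance and locality, which is the limitation the slide points to.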
CONTINUOUS HIGH-VELOCITY data ingestion rates are far beyond the limits of traditional spatial analysis platforms.
SPATIAL ANALYTICS required for high-value Internet of Everything applications are not supportable on popular big data platforms.
REAL-TIME operational analysis requirements preclude the use of batch-oriented platforms.
DATA VOLUME greatly exceeds capacity of platforms designed for real-time analysis of human-generated sources.
• SpaceCurve has created the first platform purpose-built from the ground up:
– Designed for organizing multiple streams of very large scale geospatial data
– Optimized for analyzing data in real-time
– Eliminates limitations on geospatial data inherent in other platforms
• The SpaceCurve platform makes it possible to:
– Collect and fuse multiple sources of data in real-time and immediately stream the result to an application
– Run continuous queries and analytics with second and sub-second responses
– Provide insights from real-time data that can apply to current, immediate scenarios
• CONTINUOUS HIGH-VELOCITY INGESTION
• COMPLEX SPATIAL DATA TYPES & OPERATIONS
• EXTREME DATA VOLUMES
• REAL-TIME QUERY EXECUTION & ANALYSIS
• Integration at the HDFS layer
• Enables all current systems and tools to be utilized in their normal workflows
• Leverages existing investments and enables real-time geospatial use cases
• Build combined workflows that operate in parallel, or in which Hadoop components call out to SpaceCurve with queries
• Additional resources can be found below:
– GitHub – https://github.com/SpaceCurve/hadoop
• This resource outlines the mechanics of export/import between SpaceCurve and Hadoop and includes a step-by-step tutorial using California earthquake data
– SpaceCurve VM – available upon request
• This resource lets a user install the SpaceCurve system loaded with sample data and use SpaceCurve SQL to query the data
[Architecture diagram. Labels: SpaceCurve, HTTP/JSON, GeoJSON, Hadoop Ecosystem (HDFS; MapReduce with Mapper and Reducer; Hive with Hive SQL), ESRI Tools]
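As a rough illustration of the Mapper and Reducer stages named in the diagram, the sketch below counts GeoJSON point features per grid cell. It is an assumption-laden example, not SpaceCurve's implementation: it assumes the exported data lands in HDFS as one GeoJSON Feature per line, and the function names, the one-degree cell size and the sample coordinates are all hypothetical. In a real Hadoop Streaming job the mapper and reducer would be separate scripts reading standard input, with the framework sorting keys between them; here they are plain functions so the logic can be run locally.

```python
import json
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

CELL_DEGREES = 1.0  # assumed grid cell size in degrees


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (grid_cell, 1) for each GeoJSON Point feature, one feature per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        feature = json.loads(line)
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # this sketch only handles points
        lon, lat = geometry["coordinates"][:2]
        cell_x = math.floor(lon / CELL_DEGREES)
        cell_y = math.floor(lat / CELL_DEGREES)
        yield f"{cell_x}:{cell_y}", 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts emitted for each grid cell."""
    totals: Dict[str, int] = defaultdict(int)
    for cell, count in pairs:
        totals[cell] += count
    return dict(totals)


# Hypothetical sample input: three point features in California.
sample = [
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-118.2, 34.1]}, "properties": {}}',
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-118.7, 34.9]}, "properties": {}}',
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-122.4, 37.8]}, "properties": {}}',
]

counts = reducer(mapper(sample))
print(counts)
```

The same per-cell aggregation could be expressed in Hive SQL once the features are loaded into a table; the choice between the two depends on the existing workflow.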