Post on 07-Aug-2020
Introduction to Geospatial Data and Analytics on CyberGIS
Yan Liu, Anand Padmanabhan, Shaowen Wang
CyberGIS Center for Advanced Digital and Spatial Studies
CyberInfrastructure and Geospatial Information Laboratory (CIGI)
Department of Geography and Geographic Information Science
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
April 2014
Outline
• Introduction to CyberGIS
o Cyberinfrastructure and Geographic Information Systems (GIS)
o CyberGIS
o CyberGIS & XSEDE
• CyberGIS Data and Analytics
o Computational challenges
o CyberGIS applications
• CyberGIS Gateway
o Open service API (hands-on)
o Gateway applications (hands-on)
CyberInfrastructure
• “The whole is more than the sum of its parts.”
o By Aristotle in the Metaphysics
Borromean rings, after Daniel E. Atkins
Image source:
http://www.phy.ornl.gov/theory/dean/RIATG/web_pages/structure_one_pager.html
Geographic Information Systems
• “Geographic Information Systems (GIS) are simultaneously the telescope, the microscope, the computer, and the Xerox machine of regional analysis and synthesis of spatial data.” (Ronald F. Abler 1988)
Cyber + GIS > Cyber | GIS
Cyber
GIS
CyberGIS Center Vision
[Diagram labels: CyberGIS; Computing- and Data-Intensive Applications and Sciences; Geospatial Sciences and Technologies; Advanced Cyberinfrastructure; Discovery & Innovation; Advanced Digital Technologies]
People
• Director
• Executive committee
o Danny Powell, National Center for Supercomputing Applications
o Stephen Marshak, Director, School of Earth, Society and the Environment
o Sara McLafferty, Head, Department of Geography & Geographic Information Science
o Allen Renear, Interim Dean, Graduate School of Library and Information Science
o Brian Ross, Interim Dean, College of Liberal Arts and Sciences
o Peter Schiffer, Vice Chancellor for Research
o William Shilts, Prairie Research Institute
o Up to two additional faculty or staff user members can be added to the EC by majority vote of the Executive Committee.
• Program coordinator
• Technical coordinator
• Project manager
• Education, training and outreach coordinator
• Faculty and researcher affiliates – 70-100
• Multiple research programmers
• Multiple postdoctoral researchers
Focal Themes and Related Fields
• Sciences and technologies of CyberGIS (e.g., advanced cyberinfrastructure, computer science, computational and data science, geography and geographic information science, library and information science, mathematics, and statistics)
• Research and engineering applications of CyberGIS for enabling creative work, discovery, and innovation (e.g., agriculture, applied health sciences, atmospheric sciences, business, civil and environmental engineering, geography and geographic information science, geology, history, political science, sociology, urban and regional planning, and veterinary medicine)
• Human and societal dimensions of CyberGIS (e.g., business, communication, geography and geographic information science, industrial and enterprise systems engineering, and psychology)
• CyberGIS education, outreach, and training (all of the above fields)
NSF SI2-SSI: CyberGIS Project
$4.43 million, Year: 2010-2015
Principal Investigator
– Shaowen Wang
Project Staff
– ASU: Wenwen Li and Rob Pahle
– ORNL: Ranga Raju Vatsavai
– SDSC: Choonhan Youn
– UIUC: Yan Liu and Anand Padmanabhan
– Graduate and undergraduate students
Industrial Partner: Esri
– Steve Kopp and Dawn Wright
Co-Principal Investigators
– Luc Anselin
– Budhendra Bhaduri
– Timothy Nyerges
– Nancy Wilkins-Diehr
Senior Personnel
– Michael Goodchild
– Sergio Rey
– Xuan Shi
– Marc Snir
– E. Lynn Usery
Project Manager
– Anand Padmanabhan
Chair of the Science Advisory Committee
– Michael Goodchild
CyberGIS Communities
• Science Communities
o Advanced cyberinfrastructure
o Climate change impact assessment
o Emergency management
o Geographic information science
o Geography and spatial sciences
o Geosciences
o Social sciences
o Etc.
• User Communities
o Biologists
o Geographers
o Geoscientists
o Social scientists
o General public
o Broad GIS users
o Etc.
CyberGIS on XSEDE
• Geographic Information Science gateway
o 2007 – present
o Science gateways program on XSEDE
• Allocations
o Annual allocations awarded by XSEDE
o 8M computing hours for the academic year 2013-2014
Discoveries
Questions
Predictions
Grand Challenges?
Big Spatial Data
Big Spatial Simulation
Image created by Eric Shook
Complex Spatial Decision Making
Collaborative Knowledge Discovery
CyberGIS for What and Whom?
[Diagram labels: CyberGIS Gateway; CyberGIS Toolkit]
www.opensciencegrid.org
www.xsede.org
http://lakjeewa.blogspot.com/2011/09/what-is-cloud-computing.html
Integrated Digital and Spatial Sciences
[Diagram labels: CyberGIS Gateway; CyberGIS Toolkit; Space-Time Integration & Synthesis; GISolve Middleware]
http://blogs.esri.com/esri/arcgis/2013/10/01/what-is-cybergis/
                   | Big Spatial Data | Big Spatial Simulation | Complex Spatial Decision Making | Collaborative Knowledge Discovery
CyberGIS Gateway   | Yes / Maybe      | Yes / Maybe            | Yes / Maybe                     | Yes / Maybe
CyberGIS Toolkit   | Yes / Maybe      | Yes / Maybe            | Yes / Maybe                     | Yes / Maybe
GISolve Middleware | Yes / Maybe      | Yes / Maybe            | Yes / Maybe                     | Yes / Maybe
Education
• Curriculum and pedagogy
• Partnerships
• Open ecosystems
Center Services
• CyberGIS Commons
o Short courses
o One-on-one training
• CyberGIS Helpdesk
o Proposal development
o Technical consulting
• CyberGIS Infrastructure
o Hardware
o Software
People
CyberGIS Data and Analytics
• Computational Challenges
o Computational intensity
o Research
o Applications
• Scalable CyberGIS Data and Analytics
o Data
o Scalable computing
• Performance
• Scalability
Heterogeneous
• Syntactic
• Semantic
Dynamic
• Spatial and temporal
• E.g. social media
Massive
• Produced by individuals
• Accessible to individuals
Large-scale
• Global coverage
Fine granularity
• Individual-level
• High-resolution
Distributed access
• Interoperability
• Privacy
• Security
Theory + Experiment + Computation + Big Data
Digital Environments
Parallel
o Once regarded as a way to speed up GIS functions and spatial analysis
o Now becoming a necessary foundation for GIS and spatial analysis
• Multi- and many-core
• GPU (graphics processing unit)
Heterogeneous architecture
Mobile
Distributed
o Service-oriented
o Clouds
Computational Intensity Question
• What is the nature of computational intensity of geographic analysis?
o Why is spatial special?
• Comparable to
o “What is the nature of computational complexity of an algorithm?”
Spatial Computational Principles/Theories
Spatial
• Distribution
• Dependence
• Integration
• Representation
• Uncertainty
• Etc.
Computational
• Complexity vs. intensity
• Uncertainty vs. validity
• Performance vs. reliability
• Etc.
SCALE
Big Geospatial Data Processing and Analysis
• 1/3 arcsec National Elevation Dataset (NED)
o Resolution: 10m
o Size: 0.5TB
• Data operations
o Downloading
o Clipping
o Reprojection
o Transformation
o Visualization
• DEM-based analysis
o TauDEM
• 3DEP
o Multiple PBs
• NED: http://ned.usgs.gov; 3DEP: http://nationalmap.gov/3DEP/
• TauDEM: http://hydrology.usu.edu/taudem
Is Greenland bigger than US?
Scalable Reprojection
• pRasterBlaster (Behzad et al., 2012)
o High-performance map reprojection software developed by the US Geological Survey (USGS)
o Computational performance improvement
• Collaboration between USGS and CIGI@UIUC
• Parallelism
o Spatial data domain decomposition
• Computational bottleneck
o Load balancing
• Programming techniques
• Randomized workload distribution
• Results
o 12GB raster projection: ~200 seconds on Trestles supercomputer @ SDSC, 1024 processor cores
Scalability Issue
[Figure: profiling results, plotted by processor index]
Programming for Scalability
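N rows on P processor cores
[Figure panels: when P is small; when P is big]
int load[P];
for (int i = 0; i < P; i++)
    load[i] = N / P;      // equal base share for every core
load[P - 1] += N % P;     // all leftover rows go to the last core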
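Programming for Scalability
N rows on P processor cores
[Figure panels: when P is small; when P is big]
int load[P];
for (int i = 0; i < P; i++)
    load[i] = N / P;      // equal base share for every core
for (int j = 0; j < N % P; j++)
    load[j] += 1;         // leftover rows are spread one per core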
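The effect of the two remainder-handling schemes can be compared with a short C sketch. This is an illustration only, not pRasterBlaster code: the function names are hypothetical, and it assumes that rows are the only unit of work and that every row costs the same.

```c
#include <assert.h>

/* Scheme A: every core gets N / P rows; the last core also takes all N % P leftover rows. */
void distribute_remainder_last(long n_rows, int n_cores, long *load)
{
    assert(n_cores > 0 && n_rows >= 0);
    for (int i = 0; i < n_cores; i++)
        load[i] = n_rows / n_cores;
    load[n_cores - 1] += n_rows % n_cores;
}

/* Scheme B: the leftover rows are handed out one per core to the first N % P cores. */
void distribute_remainder_spread(long n_rows, int n_cores, long *load)
{
    assert(n_cores > 0 && n_rows >= 0);
    for (int i = 0; i < n_cores; i++)
        load[i] = n_rows / n_cores;
    for (int j = 0; j < n_rows % n_cores; j++)
        load[j] += 1;
}

/* Largest per-core load; the most heavily loaded core bounds the parallel run time. */
long max_load(const long *load, int n_cores)
{
    long max = load[0];
    for (int i = 1; i < n_cores; i++)
        if (load[i] > max)
            max = load[i];
    return max;
}
```

For example, with N = 2047 rows and P = 1024 cores, scheme A leaves 1024 rows on the last core while scheme B caps every core at 2 rows. The imbalance of scheme A grows with P, which is why it only shows up when P is big.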
Input/Output (I/O) Bottleneck
• TauDEM
o Terrain analysis using Digital Elevation Models (TauDEM)
• Scalability Improvement (Fan et al., submitted)
o Before: max DEM size – 6GB. Not scalable to the number of processors
o After: 36GB+
Programming for I/O Performance
Before
On each process:
    foreach input DEM file
        read spatial metadata
        determine input data blocks
        read data
After
On root process:
    foreach input DEM file
        read spatial metadata
    broadcast metadata to all other processes
On each process:
    determine input data blocks
    read data
CyberGIS Gateway - FluMapper
http://flumapper.org/
Redistricting Problem
• Redistricting
o Partitioning a group of indivisible geographic units into a smaller number of political districts
• Redistricting as a combinatorial optimization problem
o Objectives and constraints
• Contiguity, competitiveness, equal-population, preservation of communities of interest and local political subdivisions, minority districts
o Computational complexity
• Number of possible solutions
o Stirling number of the second kind: S(n, k)
o Example: S(55, 6) = 8.7 × 10^39
• Computationally intractable
o NP-hard
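To get a feel for how quickly the number of possible solutions grows, S(n, k) can be approximated with the standard recurrence S(n, k) = k·S(n-1, k) + S(n-1, k-1). The following is a minimal sketch in C. It uses double precision, so large values are approximate magnitudes rather than exact counts, and the function name stirling2 is chosen here for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Approximate Stirling number of the second kind, S(n, k): the number of ways
 * to partition n labeled units into k non-empty groups.
 * Uses S(n, k) = k * S(n-1, k) + S(n-1, k-1) with a single rolling row.
 */
double stirling2(int n, int k)
{
    assert(n >= 0 && k >= 0);
    if (k > n)
        return 0.0;

    double *row = malloc(((size_t)k + 1) * sizeof *row);
    assert(row != NULL);
    for (int j = 0; j <= k; j++)
        row[j] = 0.0;
    row[0] = 1.0;                       /* S(0, 0) = 1 */

    for (int i = 1; i <= n; i++) {
        int top = i < k ? i : k;
        /* Go from high j to low j so row[j - 1] still holds S(i-1, j-1). */
        for (int j = top; j >= 1; j--)
            row[j] = j * row[j] + row[j - 1];
        row[0] = 0.0;                   /* S(i, 0) = 0 for i >= 1 */
    }

    double result = row[k];
    free(row);
    return result;
}
```

Under these assumptions, stirling2(55, 6) evaluates to roughly 8.7 × 10^39, consistent with the S(55, 6) example above. Note that this counts all partitions of the units, whether or not the resulting districts are contiguous.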
Computationally Intractable!
1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, …, 2^63
Genetic Algorithm (GA) Approach
• Principles
o Evolutionary process
• “survival of the fittest”
• Iterative algorithm
o Solution population: a diverse set of initial solutions
o GA operators
• Selection, crossover, mutation, replacement
o Stopping criteria
• Solution quality
• Time or the number of iterations
• Spatial GA operators
o Solution generation
o Crossover
o Mutation
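The generic GA loop listed above (population, selection, crossover, mutation, replacement, stopping criteria) can be sketched as follows. This is a toy example in C, not the redistricting GA from the talk: it maximizes the number of 1 bits in a bit string ("OneMax") instead of scoring a redistricting plan, uses plain one-point crossover rather than the spatial operators, and all names, sizes, and the small random number generator are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define POP_SIZE 20
#define N_GENES 16

/* Small linear congruential generator so runs are reproducible (illustrative only). */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;
}

/* Toy fitness: number of 1 bits. A redistricting GA would score a plan instead. */
static int fitness(const int *genes)
{
    int sum = 0;
    for (int g = 0; g < N_GENES; g++)
        sum += genes[g];
    return sum;
}

/* Selection: binary tournament, returns the fitter of two random individuals. */
static int tournament(int pop[POP_SIZE][N_GENES], uint32_t *rng)
{
    int a = (int)(lcg_next(rng) % POP_SIZE);
    int b = (int)(lcg_next(rng) % POP_SIZE);
    return fitness(pop[a]) >= fitness(pop[b]) ? a : b;
}

/* Runs the GA; returns the best fitness found and stores the initial best in *initial_best. */
int ga_run(uint32_t seed, int max_iters, int *initial_best)
{
    int pop[POP_SIZE][N_GENES];
    uint32_t rng = seed;

    /* Solution population: random initial solutions. */
    for (int i = 0; i < POP_SIZE; i++)
        for (int g = 0; g < N_GENES; g++)
            pop[i][g] = (int)(lcg_next(&rng) & 1u);

    int best = 0;
    for (int i = 0; i < POP_SIZE; i++)
        if (fitness(pop[i]) > best)
            best = fitness(pop[i]);
    *initial_best = best;

    /* Stopping criteria: solution quality (all ones) or the iteration budget. */
    for (int iter = 0; iter < max_iters && best < N_GENES; iter++) {
        int p1 = tournament(pop, &rng);
        int p2 = tournament(pop, &rng);

        /* Crossover: one-point, genes before the cut from p1, the rest from p2. */
        int child[N_GENES];
        int cut = (int)(lcg_next(&rng) % N_GENES);
        for (int g = 0; g < N_GENES; g++)
            child[g] = g < cut ? pop[p1][g] : pop[p2][g];

        /* Mutation: flip one randomly chosen gene. */
        int m = (int)(lcg_next(&rng) % N_GENES);
        child[m] = 1 - child[m];

        /* Replacement: the child replaces the worst individual if it is fitter. */
        int worst = 0;
        for (int i = 1; i < POP_SIZE; i++)
            if (fitness(pop[i]) < fitness(pop[worst]))
                worst = i;
        if (fitness(child) > fitness(pop[worst])) {
            for (int g = 0; g < N_GENES; g++)
                pop[worst][g] = child[g];
            if (fitness(child) > best)
                best = fitness(child);
        }
    }
    return best;
}
```

Because the child only ever replaces the worst individual, the best fitness in the population never decreases. A call such as ga_run(12345u, 2000, &initial) returns a value between the initial best and N_GENES.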
Solution Generation
• Randomization-based district formation
• Contiguity and hole-free
[Figure: neighborhood graph and growth of seeds. (a) Seeding (b) District expansion (c) Initial solution]
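The contiguity requirement can be checked with a breadth-first search over the neighborhood graph. The sketch below is one possible check in C, not the authors' implementation. It assumes the graph is given as a dense adjacency matrix and that each unit carries a district id; the function name and data layout are hypothetical. It tests contiguity only and does not detect holes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Returns true if the units assigned to `district` form one connected piece
 * of the neighborhood graph. `adj` is a row-major n x n matrix in which
 * adj[u * n + v] != 0 means units u and v are neighbors.
 */
bool district_is_contiguous(const int *adj, const int *assignment, int n, int district)
{
    assert(n > 0);
    int *queue = malloc((size_t)n * sizeof *queue);
    bool *seen = calloc((size_t)n, sizeof *seen);
    assert(queue != NULL && seen != NULL);

    /* Count the district's units and remember one of them as the search start. */
    int members = 0, start = -1;
    for (int u = 0; u < n; u++) {
        if (assignment[u] == district) {
            members++;
            if (start < 0)
                start = u;
        }
    }

    /* Breadth-first search restricted to units of the same district. */
    int reached = 0;
    if (start >= 0) {
        int head = 0, tail = 0;
        queue[tail++] = start;
        seen[start] = true;
        while (head < tail) {
            int u = queue[head++];
            reached++;
            for (int v = 0; v < n; v++) {
                if (adj[u * n + v] && !seen[v] && assignment[v] == district) {
                    seen[v] = true;
                    queue[tail++] = v;
                }
            }
        }
    }

    free(queue);
    free(seen);
    /* Contiguous when every unit of the district was reached from the start unit. */
    return members > 0 && reached == members;
}
```

For four units arranged in a chain 0-1-2-3, the assignment {0, 0, 1, 1} yields two contiguous districts, while {0, 1, 0, 1} does not, since units 0 and 2 are not neighbors.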
Crossover
• Binary crossover
• Spatial crossover
o Overlay of two redistricting solutions splits
o Formulation of a finer level (split-level) redistricting problem
o Solution conversion
[Figure: (a) Selection of parent solutions (b) Overlapping (c) Solution conversion]
Mutation
• Exchange of multiple units on district boundary and beyond
[Figure: Pick units: units selected for changing district assignment]
High Performance Solution:
Parallel Genetic Algorithm (Liu & Wang, 2014)
Figure 2. Asynchronous PGA framework
CyberGIS Gateway
• Hands-on: Open Service API
• Hands-on: Use Viewshed Analysis for Planning
Hands-on: Open Service API
• Open Service API
o REST Web services
o Document:
• https://wiki.cigi.illinois.edu/display/DOC/GISolve+Open+Service+API+User+Guide
o Code:
• https://svn2.cigi.uiuc.edu:8443/open-service-api/trunk
• Usage
o Develop your own Web or desktop applications that leverage CyberGIS capabilities
o Results sharing
o Embed analysis results as map layers on Web
Tutorial Server
• tutorial.cigi.uiuc.edu
• Accounts: train1 – train49
• Access:
o ssh client
• Mac/Linux: ssh username@tutorial.cigi.uiuc.edu
o scp
Hands-on: Viewshed Analysis
• Locating fire lookout towers
o Study area: Silver Plume City, Colorado
o Requirements
• The visible area (viewshed) from the lookout towers should be as large as possible to cover the service area identical to the bounding box of the given DEM (https://dl.dropbox.com/u/21798649/competition/problem1/sp_b10m.tif)
• The number of towers to be built should be four or fewer, and be as small as possible (due to budget and maintenance cost reasons)
Acknowledgments
Federal Agencies
• Department of Energy’s Office of Science
• National Science Foundation
– BCS-0846655
– EAR-1239603
– OCI-1047916
– IIS-1354329
– PHY-0621704
– PHY-1148698
– TeraGrid/XSEDE SES070004
• US Geological Survey (USGS)
Industry
• Environmental Systems Research Institute (Esri)
Acknowledgments – U of I
• College of Liberal Arts and Sciences
• Department of Geography and Geographic Information Science
• Graduate School of Library and Information Science
• National Center for Supercomputing Applications
• Office of the Vice Chancellor for Research
• Prairie Research Institute
• School of Earth, Society, and Environment
Acknowledgements - CyberGIS Center
Thanks!
• Comments / Questions?
o Science Gateway questions
• help@xsede.org
• Survey
o http://bit.ly/CSUXSEDE