Modeling & Analyzing Massive Terrain Data Sets
Pankaj K. Agarwal
Duke University
STREAM Project
Scalable Techniques for hi-Resolution Elevation data Analysis & Modeling
In collaboration with: Lars Arge, Helena Mitasova
Students: Andrew Danner, Thomas Molhave, Ke Yi
Diverse elevation point data: density, distribution, accuracy
[Figure: elevation point data sets from 1974, 1995, 1998, 1999, 2001, and 2004]
Lidar: 0.15m vertical accuracy; altitude 700m and 2300m
Photogrammetry: 0.76m vertical accuracy (5ft contours)
RTK-GPS: 0.10m vertical accuracy
Constructing Digital Elevation Models
Grid, TIN, Contour DEMs
Natural feature extraction
Extracted foredunes 1999-2004 using profile curvature and elevation threshold
Maps of topo parameters: computed simultaneously with interpolation using partial derivatives of the RST function
Terrain features: combined thresholds of topo parameters
Tracking evolving features
Slip faces: slope > 30 deg
[Figure: slip faces in 1995, 1999, and 2001; new slip face in 1999]
How to automatically track features that change some of their properties over time?
Dune crests: profile curvature threshold
Hydrology Analysis
• Flow Analysis
• Watershed hierarchy
Challenge: Massive Data Sets
• LIDAR
  – NC Coastline: 200 million points – over 7 GB
  – Neuse River basin (NC): 500 million points – over 17 GB
• Output is also large
  – 10ft grid: 10GB
  – 5ft grid: 40GB
Approximation Algorithms
• Exact computation expensive
  – Many practical problems are intractable
  – Multiple & often conflicting optimization criteria
  – Suffices to find a near-optimal solution
• Tunable approximation algorithms
  – Tradeoff between accuracy and efficiency
  – User specifies tolerance
I/O-Bottleneck
• Data resides in secondary memory
• Disk access is 10^6 times slower than main memory access
  – Maximize useful data transferred with each access
  – Amortize large access time by transferring large contiguous blocks of data
• I/O – not CPU – is often the bottleneck for processing massive data sets
[Figure: disk drive with track, magnetic surface, read/write arm, and read/write head]
I/O-efficient Algorithms [AV88]
• Traditional algorithms optimize CPU computation
  – Not sensitive to penalty of disk access
• I/O model
  – Memory is finite
  – Data is transferred in blocks
  – Complexity measured in disk blocks transferred
[Figure: CPU, RAM of size M, and disk; data moves between disk and RAM in blocks of size B, with B ~ 2K-8K]
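To make the I/O model concrete, the following minimal Python sketch counts block transfers for two access patterns over the same 1000 items. The function name `count_block_transfers`, the LRU eviction policy, and the toy sizes are illustrative assumptions, not something taken from the talk.

```python
from collections import OrderedDict

def count_block_transfers(accesses, block_size, memory_blocks):
    """Count disk-to-memory block transfers for a sequence of item indices.

    Hypothetical toy model: items live in blocks of `block_size` items,
    memory holds `memory_blocks` blocks, and the least recently used
    block is evicted when memory is full.
    """
    cache = OrderedDict()
    transfers = 0
    for index in accesses:
        block = index // block_size
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
            continue
        transfers += 1  # block must be fetched from disk
        cache[block] = True
        if len(cache) > memory_blocks:
            cache.popitem(last=False)  # evict least recently used block
    return transfers

N, B, M_BLOCKS = 1000, 10, 4
sequential = list(range(N))
# Same items, visited with a stride of one block per step.
strided = [(i % 100) * 10 + i // 100 for i in range(N)]

seq_ios = count_block_transfers(sequential, B, M_BLOCKS)
strided_ios = count_block_transfers(strided, B, M_BLOCKS)
```

In this toy setting the sequential scan touches each block once, while the strided pattern pays one transfer per item, which is the gap that I/O-efficient algorithms aim to avoid.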
Elevation Data to Watershed Hierarchies: A Pipeline Approach
Point Cloud to Grid DEM
• Given a point cloud sampled from a bivariate function z=F(x,y) (elevation data)
• Goal: construct a grid DEM for z
• DEM Applications
  – Hydrology
  – Ecology
  – Viewsheds …
• Many algorithms only work on grid data
Sub-meter resolution: improved structures representation
2004 lidar-based 30cm resolution DSM
1999 lidar 2m resolution DSM
Binning (within-cell average) is sufficient for ~1-3m DEMs, but interpolation produces improved geometry representation and allows us to create higher resolution DEMs/DSMs
2004 lidar: 0.5m resolution binning
interpolated DSM
General Approach: Interpolation
• Kriging, IDW, Splines, …
• Modern data sets can have millions of points
• Direct interpolation does not scale
[Diagram: N points → Interpolate, O(N^3) → z=F(x,y)]
Improving Interpolation with Quad Trees
• Segment initial point set using a quad tree
  – Each quad tree leaf has <K points (K = 10-100)
• Interpolate each segment separately
• N/K segments
  – O(K^3) interpolation per segment
  – O(K^2 N) total running time
  – Linear in N
• Boundary smoothness?
  – Use points from nearby segments
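The segmentation step can be sketched in a few lines of Python. This is an in-memory illustration only, assuming points are (x, y, z) tuples; the function name `quadtree_leaves`, the `max_depth` guard against duplicate points, and the random test cloud are hypothetical and not part of the talk.

```python
import random

def quadtree_leaves(points, k, bounds, depth=0, max_depth=32):
    """Split (x, y, z) points into quad-tree leaves with fewer than k points.

    Returns a list of (bounds, points) pairs, one per non-empty leaf.
    `max_depth` is an assumed safety limit for coincident points.
    """
    if not points:
        return []
    if len(points) < k or depth >= max_depth:
        return [(bounds, points)]
    x0, y0, x1, y1 = bounds
    xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Key is (right half?, upper half?); value is (child bounds, child points).
    quadrants = {
        (False, False): ((x0, y0, xm, ym), []),
        (True, False): ((xm, y0, x1, ym), []),
        (False, True): ((x0, ym, xm, y1), []),
        (True, True): ((xm, ym, x1, y1), []),
    }
    for p in points:
        quadrants[(p[0] >= xm, p[1] >= ym)][1].append(p)
    leaves = []
    for child_bounds, child_points in quadrants.values():
        leaves.extend(
            quadtree_leaves(child_points, k, child_bounds, depth + 1, max_depth)
        )
    return leaves

random.seed(0)
cloud = [(random.random(), random.random(), random.random()) for _ in range(1000)]
leaves = quadtree_leaves(cloud, k=16, bounds=(0.0, 0.0, 1.0, 1.0))
```

Each leaf can then be interpolated on its own at O(K^3) cost, so the total work grows linearly with the number of leaves rather than cubically with N.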
Overview of the Algorithm
• Construct quad-tree on disk using bulk loading [AAPV 01]
• For each quad-tree node, find points in neighboring segments
• Arrange points on disk for sequential interpolation
  – Use the algorithm by Mitasova, Mitas, Harmon
  – Can use any other interpolation method
Building an External Quad Tree
• Build top few levels in memory
• Buffer points in lower levels
• Write buffers to disk when full
[Figure: quad tree with K=2; full buffers are written to disk using block I/O]
Building a TIN
TIN: Triangulated Irregular Network
Constrained Delaunay Triangulation
Developed an I/O-efficient algorithm
Builds TIN for Neuse River Basin (14GB) in 7.5 hours
Future Work: Hierarchical DEMs
• Considerable work in graphics and GIS communities on multiresolution hierarchies for terrains
  – Focuses on visualization
  – Most algorithms perform random memory access
• Out-of-core simplification
  – I/O-efficient algorithms for terrain simplification (edge contraction method)
• Feature preserving simplification
  – Preserves main river networks
  – Preserves visibility for most of the points
Coping with Noisy Data
• Nearly all flow models assume [O’Callaghan and Mark 84, de Berg et al. 96]
  – water flows downwards
  – stops at a minimum
Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03]
• Pour water uniformly on terrain
• Water level rises until steady-state
River Network After Flooding
Sort(N) I/Os
After flooding
Denoising via Flooding [Jenson and Domingue 88, Arge et al. 03]
• Pour water uniformly on terrain
• Water level rises until steady-state
• Use topological persistence (Edelsbrunner et al.) to measure the importance of a minimum
• Flood low-persistence minima, preserve high-persistence minima
Topological Persistence [Edelsbrunner et al. 02]
The Union-Find Problem
• A universe of N elements: x1, x2, …, xN
• Initially N singleton sets: {x1}, {x2}, …, {xN}
• Each set has a representative
• Maintain the partition under
  – Union(xi, xj): joins the sets containing xi and xj
  – Find(xi): returns the representative of the set containing xi
Formulated as Batched Union-Find
• On a terrain represented as a TIN
• When reaching
  – A minimum or maximum: do nothing
  – A regular point u: issue union(u,v) for a lower neighbor v
  – A saddle u: let v and w be nodes from u’s two connected pieces in its lower link; issue find(v), find(w), union(u,v), union(u,w)
• Sequence of union/find operations can be computed in advance
[Figure: lower link of a vertex]
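As a rough illustration of how union-find yields the persistence of minima, here is a small in-memory Python sketch. It sweeps the vertices of a graph from low to high and applies the usual "elder rule" (when two basins merge, the one with the higher minimum dies). It is a simplification under stated assumptions: it works on an arbitrary neighbor graph rather than a TIN with lower links, it is not the batched or I/O-efficient algorithm from the talk, and the names `minima_persistence`, `profile`, and `path` are hypothetical.

```python
def minima_persistence(height, neighbors):
    """Pair each non-global minimum with the vertex where its basin merges.

    height[v] is the elevation of vertex v; neighbors[v] lists its adjacent
    vertices. Returns (minimum, merge_vertex, persistence) triples.
    """
    order = sorted(range(len(height)), key=lambda v: (height[v], v))
    rank = {v: r for r, v in enumerate(order)}
    parent = {}
    lowest = {}  # component root -> lowest vertex in that component
    pairs = []

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u in order:  # sweep upward in elevation
        parent[u] = u
        lowest[u] = u
        for v in neighbors[u]:
            if rank[v] > rank[u]:
                continue  # v is higher and has not been reached yet
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            mu, mv = lowest[ru], lowest[rv]
            # Elder rule: the basin with the higher minimum dies at u.
            young, old = (mu, mv) if rank[mu] > rank[mv] else (mv, mu)
            if young != u:  # u joining its first basin kills nothing
                pairs.append((young, u, height[u] - height[young]))
            parent[ru] = rv
            lowest[rv] = old
    return pairs

# A 1D terrain profile; vertex i is adjacent to i-1 and i+1.
profile = [3, 1, 4, 0, 5, 2, 6]
path = [[j for j in (i - 1, i + 1) if 0 <= j < len(profile)]
        for i in range(len(profile))]
pairs = minima_persistence(profile, path)
```

Minima whose persistence falls below a user-chosen threshold would be the candidates for flooding; the global minimum never appears in `pairs` because its basin never dies.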
Classical Union-Find Algorithm
[Figure: sets stored as rooted trees whose roots are the representatives; Union(d, h) uses link-by-rank; Find(n) uses path compression]
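The classical structure with link-by-rank and path compression can be sketched as follows. This is a standard textbook-style Python version written for illustration; the class name `UnionFind` and the integer element encoding are assumptions, not code from the talk.

```python
class UnionFind:
    """Disjoint sets over elements 0..n-1 with link-by-rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on tree height

    def find(self, x):
        """Return the representative (root) of the set containing x."""
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_x = self.parent[x]
            self.parent[x] = root
            x = next_x
        return root

    def union(self, a, b):
        """Join the sets containing a and b; return the new representative."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Link-by-rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra

uf = UnionFind(6)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)
same_set = uf.find(0) == uf.find(2)
different_sets = uf.find(4) != uf.find(5)
```

Both operations chase pointers through memory, which is why this version performs poorly once the data no longer fits in RAM.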
Complexity
• O(N α(N)) for a sequence of N union and find operations [Tarjan 75]
  – α(•): inverse Ackermann function (grows very slowly!)
  – Optimal in the worst case [Tarjan 79, Fredman and Saks 89]
• Batched (off-line) version
  – Entire sequence known in advance
  – Can be improved to linear on RAM [Gabow and Tarjan 85]
  – Not possible on a pointer machine [Tarjan 79]
Our Results
• An I/O-efficient algorithm for the batched union-find problem using O(sort(N)) = O((N/B) log_{M/B}(N/B)) I/Os
  – Same as sorting
  – Optimal in the worst case
• A simpler algorithm using O(sort(N) log(N/M)) I/Os
  – Implemented
• Apply to persistence computation, flooding
  – O(sort(N)) time
• Other applications
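To get a feel for the sort(N) bound, the short Python sketch below evaluates (N/B) log_{M/B}(N/B) for one set of parameters and compares it with one I/O per item. The helper names and the values of N, M, and B are illustrative assumptions (all measured in items), and constant factors are ignored.

```python
import math

def scan_ios(n, b):
    """I/Os to read n items sequentially in blocks of b items."""
    return math.ceil(n / b)

def sort_ios(n, m, b):
    """The sort(N) bound (N/B) * log_{M/B}(N/B), ignoring constant factors."""
    blocks = n / b
    passes = max(1.0, math.log(blocks, m / b))
    return blocks * passes

# Assumed toy parameters: 2^30 items, memory of 2^20 items, blocks of 2^10 items.
N, M, B = 2**30, 2**20, 2**10
scan_cost = scan_ios(N, B)
sort_cost = sort_ios(N, M, B)
random_access_cost = N  # one I/O per item if every access misses
```

With these numbers the logarithm is only 2, so sorting costs about two scans of the data, hundreds of times fewer I/Os than touching each item with its own disk access.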
I/O-Efficient Batched Union-Find
• Two-stage algorithm
  – Convert to interval union-find
    • Compute an order on the elements s.t. each union joins two adjacent sets
  – Solve batched interval union-find
• Each stage formulated as a geometric problem and solved using sort(N) I/Os
Experimental Results
Traditional algorithm good as long as data fits in memory
Experimental Results: Neuse River Basin
On the entire data set: EM (external memory) takes 5 hours, IM (internal memory) fails
0.5 billion points (14GB raw data), sampled to obtain smaller data sets
Contour Trees
Previous Results
• Directly maintain contours
  – O(N log N) time [van Kreveld et al. 97]
  – Needs union-split-find for circular lists
  – Does not extend to higher dimensions
• Two sweeps by maintaining components, then merge
  – O(N log N) time [Carr et al. 03]
  – Extends to arbitrary dimensions
Join Tree and Split Tree [Carr et al. 03]
[Figure: join tree and split tree on nodes 1-9]
Qualified nodes
[Figure: join tree and split tree with node 2 removed]
Final Contour Tree
[Figure: join tree, split tree, and the resulting contour tree on nodes 1-9]
Hard to BATCH!
Another Characterization
Let w be the highest node that is a descendant of v in the join tree and an ancestor of u in the split tree; then (u, w) is a contour tree edge
[Figure: join tree, split tree, and contour tree on nodes 1-9, with nodes u, v, and w marked]
Now can BATCH!
Experimental Results
On the Neuse River Basin data set
Coping with Noise: Future Work
• Constructing hydrologically conditioned DEMs
  – Formulate as an optimization problem
  – Using persistent homology
  – Integrating topology and geometry
  – Handling flat areas
• Scalability remains a big issue
• Handling anthropogenic features
Back to Pipeline: Flow Modeling
From Elevation to River Networks
• Where does water go?
• From higher elevation to lower elevation
• Single flow directions form a tree
[Figure: terrain with elevations from 80 to 110 draining to the sea]
Drainage Area
• How much area is upstream of each node?
• Each node has initial drainage area (1)
• Drainage area of internal nodes depends on drainage area of children
[Figure: flow tree annotated with the drainage area of each node, from 1 at the leaves up to 25]
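The drainage-area computation can be sketched on a small grid as follows. This Python example assumes single flow directions to the lowest strictly lower 8-neighbor and processes cells from high to low so that every upstream area is final before it is passed downstream. It is an in-memory illustration, not the I/O-efficient method, and the function name `drainage_area` and the test grid `slope` are hypothetical.

```python
def drainage_area(elev):
    """Return {(row, col): drainage area} for a grid of elevations.

    Each cell starts with area 1 and flows to its lowest strictly lower
    8-neighbor (an assumed single-flow-direction rule); cells with no
    lower neighbor are sinks.
    """
    rows, cols = len(elev), len(elev[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]

    def downstream(cell):
        r, c = cell
        best = None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    lower = elev[nr][nc] < elev[r][c]
                    if lower and (best is None
                                  or elev[nr][nc] < elev[best[0]][best[1]]):
                        best = (nr, nc)
        return best

    area = {cell: 1 for cell in cells}
    # Highest cells first: all upstream contributions arrive before a cell
    # forwards its own total downstream.
    for cell in sorted(cells, key=lambda rc: elev[rc[0]][rc[1]], reverse=True):
        target = downstream(cell)
        if target is not None:
            area[target] += area[cell]
    return area

# A 3x3 plane sloping toward the bottom-right corner.
slope = [[4 - r - c for c in range(3)] for r in range(3)]
areas = drainage_area(slope)
```

Sorting by elevation plays the same role here that the flow-ordered sort plays in an external-memory setting: it turns tree traversal into a single ordered pass.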
Drainage Area example (Panama)
IFSARE and SRTM based stream analysis
Stream and watershed boundary extraction for Panama for on-going water quality research to provide more detailed and up-to-date data.
• SRTM: 7400x3600 DSM at 90m resolution for entire Panama
• IFSARE: 10800x11300 DEM at 10m resolution for the Panama canal section
• Using r.terraflow (officially released in GRASS6.2 in 2006), streams can be extracted in 3-4 hours; tile-based approach takes days
SRTM DEM for entire Panama with main rivers extracted from DEM in 3 hr
Hierarchical Watershed Decomposition
Watershed Hierarchies
• Decompose a river network into a hierarchy of hydrological units (HUs)
• All water in an HU flows to a common outlet
• Hierarchy provides tunable level of detail
• Method used: Pfafstetter [VV99]
• Want a solution scalable to large modern hi-res terrains
Pfafstetter
• Find main river
• Find four largest tributaries
• Label basins/interbasins
• Recurse until single path
[Figure: river network annotated with drainage areas and Pfafstetter labels 1-9; recursing yields labels such as 21-27 and 71-75]
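One level of Pfafstetter labeling can be sketched on a river tree as shown below. The sketch assumes the usual convention that the four largest tributaries receive even digits 2, 4, 6, 8 from downstream to upstream and the main-stem sections between them receive odd digits 1, 3, 5, 7, 9. Assigning a confluence node and any smaller side tributaries to the interbasin below is an assumption made here for simplicity, and all names (`subtree_areas`, `pfafstetter_level`, the toy network) are hypothetical.

```python
def subtree_areas(root, children):
    """Drainage area of each node: 1 plus the areas of its upstream children."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    area = {}
    for node in reversed(order):  # upstream nodes before downstream ones
        area[node] = 1 + sum(area[c] for c in children.get(node, []))
    return area

def label_subtree(node, digit, children, labels):
    """Give every node upstream of (and including) `node` the same digit."""
    stack = [node]
    while stack:
        n = stack.pop()
        labels[n] = digit
        stack.extend(children.get(n, []))

def pfafstetter_level(root, children, area):
    """Assign one Pfafstetter digit (1-9) to every node of the river tree."""
    stem = [root]  # main river: follow the largest drainage area upstream
    while children.get(stem[-1]):
        stem.append(max(children[stem[-1]], key=lambda c: area[c]))
    on_stem = set(stem)
    tributaries = [c for s in stem for c in children.get(s, []) if c not in on_stem]
    largest = set(sorted(tributaries, key=lambda c: area[c], reverse=True)[:4])
    labels, digit = {}, 1
    for s in stem:
        labels[s] = digit
        side = [c for c in children.get(s, []) if c not in on_stem]
        for c in side:
            if c not in largest:  # small tributary: part of the interbasin
                label_subtree(c, digit, children, labels)
        for c in side:
            if c in largest:      # basin gets the next even digit
                label_subtree(c, digit + 1, children, labels)
                digit += 2
    return labels

# Toy network: children[n] lists the nodes that drain directly into n.
children = {
    "o": ["m1", "A"], "m1": ["m2", "B"], "m2": ["m3", "s"], "m3": ["m4", "C"],
    "m4": ["m5", "D"], "m5": ["m6"], "m6": ["m7"],
    "A": ["A1"], "B": ["B1"], "C": ["C1"], "D": ["D1"],
}
area = subtree_areas("o", children)
labels = pfafstetter_level("o", children, area)
```

Recursing means applying the same function inside each basin or interbasin and appending the new digit to the existing label.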
Recurse
Example Watershed Boundaries
[Figure: watershed boundaries labeled 1-9, 91-99, and 991-999 at successive levels of the hierarchy]
A Complete Pipeline
Implementation
• TPIE: C++ primitives for I/O-efficient algorithms
• GRASS: Open Source GIS
• Interpolation: Regularized spline with tension (in GRASS)
• Data:
  – North Carolina LIDAR
    • Neuse river basin: 400 million points (NC Floodmaps)
    • Outer banks coastal data: 128 million points (NOAA CSC)
  – USGS 30m NED
Grid Construction Results
Resolution (ft)       40       20       10
Grid cells (x10^6)    221      885      3542
Points (x10^6)        205      340      415
Total time            12h32m   14h46m   26h52m
Time spent (%)
  Build quad tree     8.9      7.1      5.7
  Find neighbors      31.6     32.4     29.1
  Interpolate         58.8     58.5     59.3
  Write output        0.7      2        5.9
• Neuse
• Point thinning
• Adjusted interpolation parameters
Sample Watershed Results
Size (MB)               150      713      5819
Size (mln cells)        30.8     147      396.5
Total time              10m29s   58m10s   187m43s
Time spent (%)
  Importing data        8        7        16
  Sorting by flow       16       15       13
  Building river list   31       35       29
  Sorting river list    19       20       19
  Computing labels      7        6        6
  Sort by grid order    14       13       12
  Exporting data        5        4        5
Future Work
• Terrain Modeling
• Terrain Analysis
Multivalent terrain models
- multiple surfaces
- hybrid raster and 3D vector for structured environments (urban)
- voxel for unstructured environments (forested areas)
Multiple surface representation: DSM and DEM
Terrain analysis on DEMs with structures
Investigate multivalent elevation field representations that preserve road and stream connectivity
Dynamic Data
• Continuous data streams from various sensors
  – Maintain smart summaries of data
  – Communication, energy efficient algorithms for sensor networks
• Coping with moving/deformable objects
  – Tracking evolving terrain features
  – Kinetic algorithms that update the structure locally
Terrain Analysis
• Flow modeling
  – Erosion analysis
  – Soil moisture modeling (collaboration with ecologists)
• Tracking evolving features
• Visibility based navigation
  – Finding paths from which most of the terrain is visible
  – Computing well concealed paths
  – Covering terrains from a few points
Acknowledgments
• Collaborators
  – Lars Arge, Helena Mitasova
  – Andrew Danner, Thomas Molhave, Ke Yi
• Sponsored by
  – Army Research Office W911NF-04-1-0278
Relevant Publications
• Danner, Yi, Molhave, A., Arge, Mitasova. TerraStream: From Elevation Data to Watershed Hierarchies. Submitted for publication, 2007.
• A., Arge, Danner. From point cloud to grid DEM: A scalable approach. In Proc. 12th SSDH, 771–788, 2006.
• A., Arge, Yi. I/O-efficient construction of constrained Delaunay triangulations. In Proc. ESA, 355–366, 2005.
• A., Arge, Yi. I/O-efficient batched union-find and its applications to terrain analysis. In Proc. 22nd SoCG, 2006.
• Arge, Danner, Haverkort, Zeh. I/O-efficient hierarchical watershed decomposition of grid terrain models. In Proc. 12th SSDH, 825–844, 2006.
• Danner. I/O Efficient Algorithms and Applications in Geographic Information Systems. PhD thesis, Duke University, 2006.
Thanks!
Future Directions – Noise Removal
• Bridge detection/removal
• Other flow routing methods
• Flow routing on flat surfaces
• Comparing flow networks
Mapping LIDAR point density
Number of points per 2m grid cell
Year   Points   Source
1996   0.2
1997   0.9      ATM II NASA
1998   0.4
1999   1.4
2001   0.2      NCflood Leica
2003   2.0      EARL NASA
2004   15.0     SHOALS USACE
2005   6.0      SHOALS USACE
[Figure: point density maps (pt / 2m grid cell) for 2001 NC Flood and 2004 SHOALS]
1998: point density at the beginning of the project
2004: current industry standard
Building an External Quad Tree
• After first scan of data:
  – Write top of tree to disk
  – Flush all buffers
• Recurse on buffers
• levels in one scan
• I/Os total
Coping with Noisy Data
• Identifying minima likely due to noise
• Don’t want to remove real features
  – Persistent homology [ELZ 02]
  – Computed in Sort(N) I/Os
Interpolation
• Scan through list of points with neighbors
• Interpolate each leaf separately
  – Use any interpolation method
  – Evaluate at grid cells
  – Write grid cell values as (i,j,z) as they are computed
• Sort grid cells by raster order
[Diagram: Interpolate → Evaluate → (i, j, z) → Sort]
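The evaluate-then-sort step can be sketched in Python as follows. Since the slide allows any interpolation method, this sketch swaps in inverse distance weighting (IDW) in place of the regularized spline with tension, purely to keep the example short. It also assumes every leaf is non-empty and ignores points borrowed from neighboring segments; the names `idw`, `center_indices`, and `interpolate_leaves` are hypothetical.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted elevation at (x, y) from (px, py, pz) points."""
    num = den = 0.0
    for px, py, pz in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        if d2 == 0.0:
            return pz  # query coincides with a sample point
        w = d2 ** (-power / 2.0)
        num += w * pz
        den += w
    return num / den

def center_indices(lo, hi, cell):
    """Indices k whose cell center (k + 0.5) * cell lies in [lo, hi)."""
    k = math.ceil(lo / cell - 0.5)
    result = []
    while (k + 0.5) * cell < hi:
        result.append(k)
        k += 1
    return result

def interpolate_leaves(leaves, cell):
    """Evaluate each leaf at its grid cells, then sort (i, j, z) in raster order."""
    out = []
    for (x0, y0, x1, y1), pts in leaves:
        for i in center_indices(y0, y1, cell):      # grid row
            for j in center_indices(x0, x1, cell):  # grid column
                z = idw(pts, (j + 0.5) * cell, (i + 0.5) * cell)
                out.append((i, j, z))
    out.sort()  # row-major (raster) order
    return out

# Two leaves listed out of raster order, all samples at elevation 5.
leaves = [
    ((2, 0, 4, 2), [(3.0, 1.0, 5.0), (2.5, 0.5, 5.0)]),
    ((0, 0, 2, 2), [(1.0, 1.0, 5.0)]),
]
grid = interpolate_leaves(leaves, 1.0)
```

Leaves are visited in whatever order they sit on disk, so the final sort is what restores the raster layout expected by grid-based tools.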