Adaptive Sampling with Topological Scores
description
Transcript of Adaptive Sampling with Topological Scores
Adaptive Sampling with Topological Scores
Dan Maljovec1, Bei Wang1, Ana Kupresanin2, Gardar Johannesson2, Valerio Pascucci1, and Peer-Timo
Bremer2
1SCI Institute, University of Utah2Lawrence Livermore National Laboratory
Select training data & run simulation
Fit a Predicting Model (PM)
Select candidates & predict response values from PM
Score candidates & select candidate to add to training data
1
2
3
4
Framework:
Choosing the “Right” Points4 Models fit by using 10 training points:
True Response Model:
Space-filling SamplingNo prior knowledge of the dataset
Where should we sample the model?
Space-filling SamplingFill the domain space as evenly as possible
Space-filling SamplingObtain responses and fit model
Space-filling Sampling1 round:
Space-filling Sampling2 rounds:
Space-filling Sampling3 rounds:
Space-filling Sampling4 rounds:
Space-filling Sampling5 rounds:
Space-filling SamplingRefit after 5 rounds of “filling voids”:
Space-filling SamplingWhat have we learned?
Initial fit Refit after adding 5 points
Topologically-Inspired Adaptive Sampling
Starting over
Topologically-Inspired Adaptive Sampling
1 round:
Topologically-Inspired Adaptive Sampling
2 rounds:
Topologically-Inspired Adaptive Sampling
3 rounds:
Topologically-Inspired Adaptive Sampling
4 rounds:
Topologically-Inspired Adaptive Sampling
5 rounds:
Topologically-Inspired Adaptive Sampling
Refit after 5 rounds of choosing the most topologically significant point:
What have we learned?
Initial fit Refit after adding 5 points
Topologically-Inspired Adaptive Sampling
Space-filling Method Adaptive Method
Comparison
Initial Fit
Space-filling Method Adaptive Method
Comparison
True Response Model
Measuring Topological SignificanceThese points were selected in topologically significant regions:
How do we measure topological impact?
Morse-Smale ComplexA partition of the domain into monotonic regions
Stable Manifolds Unstable Manifolds Morse-Smale Complex
∩ =
Computing the Morse-Smale Complex
Stable Manifolds Unstable Manifolds Morse-Smale Complex
∩ =
Traditionally:1. Compute steepest ascent/descent gradient from each point in
dataset2. Color points twice once by following ascending gradient to
maximum & another by descending gradient to minimum3. Intersect partitions to form Morse-Smale Complex
Computing the Morse-Smale ComplexTraditionally:
1.a. Compute steepest ascent gradient from each point in dataset2.a. Color points based on maximum where flow halts
Computing the Morse-Smale ComplexTraditionally:
1.b. Compute steepest descent gradient from each point in dataset2.b. Color points based on minimum where flow halts
Computing the Morse-Smale Complex
Stable Manifolds Unstable Manifolds Morse-Smale Complex
∩ =
Traditionally:3. Intersect partitions to form Morse-Smale Complex
Computing the Morse-Smale Complex
Stable Manifolds Unstable Manifolds Morse-Smale Complex
∩ =
Traditionally:1. Compute steepest ascent/descent gradient from each point in
dataset Assumes points have a neighborhood2. Color points twice once by following ascending gradient to
maximum & another by descending gradient to minimum3. Intersect partitions to form Morse-Smale Complex
Computing the Approximate Morse-Smale Complex
Stable Manifolds Unstable Manifolds Morse-Smale Complex
∩ =
Modified Algorithm:1. Compute approximate KNN for a point2. Compute steepest ascent/descent gradient from each point’s
neighborhood in dataset3. Color points by gradient flows4. Intersect partitions to form Morse-Smale Complex
0 1 2 30
1
2
3
x
y
a b
c
d
Tracking Persistence
Count the number of connected components belonging to the sub-level sets of the function
0 1 2 30
1
2
3
x
y
a b
c
d
Tracking Persistence
Every connected component has a birth
0 1 2 30
1
2
3
x
y
a b
c
d
Tracking Persistence
Every connected component has a birth
0 1 2 30
1
2
3
x
y
a b
c
d
Tracking Persistence
And every connected component has a death
0 1 2 30
1
2
3
x
y
a b
c
d
Tracking Persistence
Births and deaths of connected components are tied to critical points of the function
0 1 2 30
1
2
3
x
y
a b
c
d
Tracking Persistence
Births and deaths of connected components are tied to critical points of the function
0 1 2 30
1
2
3
x
y
a b
c
d
Tracking Persistence
0 1 2 30
1
2
3
0 1 2 30
1
2
3
x
y
birth
death
a b
c
d
Tracking Persistence
A persistence diagram plots birth-death pairings
The function value difference between the critical point pairings is what we define as the persistence of a feature
In surfaces (and higher dimension corollaries), we have saddle points
Persistence Simplification in 2D
Persistence Simplification in 2DWe can smooth away noise features by thresholding
critical point pairs above a certain persistence level
Persistence Simplification in 2D
0 1 2 30
1
2
3
x
y
birth
death
0 1 2 30
1
2
3
Bottleneck Distance
Comparing two similar functions persistence diagrams
0 1 2 30
1
2
3
0 1 2 30
1
2
3
x
y
birth
death
Bottleneck Distance
Optimization: minimize the distance between pairs of points in the two plots including the line y=x in both pairs
Bottleneck distance – the maximum distance resulting from this pairing
Select training data & run simulation
Fit a Predicting Model (PM)
Select candidates & predict response values from PM
Score candidates & select candidate to add to training data
1
2
3
4
Framework:
Selecting Initial Training DataTests use Latin Hypercube Sampling
Fitting a Statistical ModelExperimented with:• Gaussian Process Model– Sparse Online Gaussian Process (C++)
• 2 Implementations of MARS with bootstrapping– Earth (CRAN)– MDA (CRAN)
• Neural Network with bootstrapping– NNet (CRAN)
Classical ScoringActive-Learning McKay (ALM)– Sample high-frequency or low-confidence regions
Delta– Distribute samples in the range space or areas of steep
gradientExpected Improvement (EI)– Select points with high uncertainty or large discrepancy
with existing dataDistance (*DP)– Scaling factor applied to above, creating 3 new scoring
functions (ALMDP, DeltaDP, EIDP)
Topological ScoringTOPOP– Average change in persistence of each point computed
by comparing the Morse-Smale before and after inserting a candidate
TOPOB– Bottleneck distance between persistence diagram of
Morse-Smale before and after inserting a candidate point
TOPOHP– Highest persistence critical point in Morse-Smale
complex constructed from combined training and predicted candidate data
Topological ScoringAverage Change in Persistence (TOPOP)– Average change in persistence of each point
computed by comparing the Morse-Smale before and after inserting a candidate
Topological ScoringAverage Change in Persistence (TOPOP)– Average change in persistence of each point
computed by comparing the Morse-Smale before and after inserting a candidate
Topological ScoringBottleneck Distance in Persistence (TOPOB)– Bottleneck distance between persistence diagram of
Morse-Smale before and after inserting a candidate point
Topological ScoringBottleneck Distance in Persistence (TOPOB)– Bottleneck distance between persistence diagram of
Morse-Smale before and after inserting a candidate point
Topological ScoringBottleneck Distance in Persistence (TOPOB)– Bottleneck distance between persistence diagram of
Morse-Smale before and after inserting a candidate point
Topological ScoringHighest Persistence (TOPOHP)– Find highest persistence critical point in Morse-
Smale complex constructed from combined training and predicted candidate data
Initial Test Functions
Initial Test FunctionsDiagnosis:Little hope of modeling the complexity of some of these functions, especially in higher dimensions.
Test FunctionsAckley Diagonal 2 Max Diagonal 4 Max
2-Component GMM
5-Component GMM
Mis-Scaled Generalized
Rastrigin (MGR)
Salomon Generalized Rosenbrock
Comparing Model ResultsGPM Earth
NNet MDA
GPM Results
GPM Results
GPM Results
Moving ForwardLimitations
– Computation time of approximate K-nearest neighborhood is cost-prohibitive to build for every candidate point (TOPOB & TOPOP)• Morse-Smale computation is limited by number of extrema which need processed
– Topology is based on current prediction modelUnknowns
– Good parameter selection (k for KNN, parameters of prior)Future Work
– Integrate visualizations to better understand how points are being selected and gain intuition as to how to improve results
– Better neighborhood estimates– Investigate other means of measuring topological impact– Fitting local models based on topological partitioning– Using better models– Hybrid scoring or hybrid sampling techniques