Adaptive Sampling with Topological Scores

Adaptive Sampling with Topological Scores

Dan Maljovec1, Bei Wang1, Ana Kupresanin2, Gardar Johannesson2, Valerio Pascucci1, and Peer-Timo

Bremer2

1SCI Institute, University of Utah2Lawrence Livermore National Laboratory

Select training data & run simulation

Fit a Predicting Model (PM)

Select candidates & predict response values from PM

Score candidates & select candidate to add to training data

1

2

3

4

Framework:

Choosing the “Right” Points4 Models fit by using 10 training points:

True Response Model:

Space-filling SamplingNo prior knowledge of the dataset

Where should we sample the model?

Space-filling SamplingFill the domain space as evenly as possible

Space-filling SamplingObtain responses and fit model

Space-filling Sampling1 round:

Space-filling Sampling2 rounds:

Space-filling SamplingRefit after 5 rounds of “filling voids”:

Space-filling SamplingWhat have we learned?

Initial fit Refit after adding 5 points

Topologically-Inspired Adaptive Sampling

Starting over


1 round:


2 rounds:


3 rounds:


4 rounds:


5 rounds:


Refit after 5 rounds of choosing the most topologically significant point:

What have we learned?

Initial fit Refit after adding 5 points


Space-filling Method Adaptive Method

Comparison

Initial Fit

Space-filling Method Adaptive Method

Comparison

True Response Model

Measuring Topological SignificanceThese points were selected in topologically significant regions:

How do we measure topological impact?

Morse-Smale ComplexA partition of the domain into monotonic regions

Stable Manifolds Unstable Manifolds Morse-Smale Complex

∩ =

Computing the Morse-Smale Complex


∩ =

Traditionally:1. Compute steepest ascent/descent gradient from each point in

dataset2. Color points twice once by following ascending gradient to

maximum & another by descending gradient to minimum3. Intersect partitions to form Morse-Smale Complex

Computing the Morse-Smale ComplexTraditionally:

1.a. Compute steepest ascent gradient from each point in dataset2.a. Color points based on maximum where flow halts

Computing the Morse-Smale ComplexTraditionally:

1.b. Compute steepest descent gradient from each point in dataset2.b. Color points based on minimum where flow halts



∩ =

Traditionally:3. Intersect partitions to form Morse-Smale Complex



∩ =

Traditionally:1. Compute steepest ascent/descent gradient from each point in

dataset Assumes points have a neighborhood2. Color points twice once by following ascending gradient to

maximum & another by descending gradient to minimum3. Intersect partitions to form Morse-Smale Complex

Computing the Approximate Morse-Smale Complex


∩ =

Modified Algorithm:1. Compute approximate KNN for a point2. Compute steepest ascent/descent gradient from each point’s

neighborhood in dataset3. Color points by gradient flows4. Intersect partitions to form Morse-Smale Complex

0 1 2 30

1

2

3

x

y

a b

c

d

Tracking Persistence

Count the number of connected components belonging to the sub-level sets of the function

0 1 2 30

1

2

3

x

y

a b

c

d


Every connected component has a birth

0 1 2 30

1

2

3

x

y

a b

c

d


And every connected component has a death

0 1 2 30

1

2

3

x

y

a b

c

d


Births and deaths of connected components are tied to critical points of the function

0 1 2 30

1

2

3

x

y

a b

c

d


0 1 2 30

1

2

3

0 1 2 30

1

2

3

x

y

birth

death

a b

c

d


A persistence diagram plots birth-death pairings

The function value difference between the critical point pairings is what we define as the persistence of a feature

In surfaces (and higher dimension corollaries), we have saddle points

Persistence Simplification in 2D

Persistence Simplification in 2DWe can smooth away noise features by thresholding

critical point pairs above a certain persistence level

Persistence Simplification in 2D

0 1 2 30

1

2

3

x

y

birth

death

0 1 2 30

1

2

3

Bottleneck Distance

Comparing two similar functions persistence diagrams

0 1 2 30

1

2

3

0 1 2 30

1

2

3

x

y

birth

death

Bottleneck Distance

Optimization: minimize the distance between pairs of points in the two plots including the line y=x in both pairs

Bottleneck distance – the maximum distance resulting from this pairing

Select training data & run simulation

Fit a Predicting Model (PM)

Select candidates & predict response values from PM

Score candidates & select candidate to add to training data

1

2

3

4

Framework:

Selecting Initial Training DataTests use Latin Hypercube Sampling

Fitting a Statistical ModelExperimented with:• Gaussian Process Model– Sparse Online Gaussian Process (C++)

• 2 Implementations of MARS with bootstrapping– Earth (CRAN)– MDA (CRAN)

• Neural Network with bootstrapping– NNet (CRAN)

Classical ScoringActive-Learning McKay (ALM)– Sample high-frequency or low-confidence regions

Delta– Distribute samples in the range space or areas of steep

gradientExpected Improvement (EI)– Select points with high uncertainty or large discrepancy

with existing dataDistance (*DP)– Scaling factor applied to above, creating 3 new scoring

functions (ALMDP, DeltaDP, EIDP)

Topological ScoringTOPOP– Average change in persistence of each point computed

by comparing the Morse-Smale before and after inserting a candidate

TOPOB– Bottleneck distance between persistence diagram of

Morse-Smale before and after inserting a candidate point

TOPOHP– Highest persistence critical point in Morse-Smale

complex constructed from combined training and predicted candidate data

Topological ScoringAverage Change in Persistence (TOPOP)– Average change in persistence of each point

computed by comparing the Morse-Smale before and after inserting a candidate

Topological ScoringBottleneck Distance in Persistence (TOPOB)– Bottleneck distance between persistence diagram of

Morse-Smale before and after inserting a candidate point

Topological ScoringHighest Persistence (TOPOHP)– Find highest persistence critical point in Morse-

Smale complex constructed from combined training and predicted candidate data

Initial Test Functions

Initial Test FunctionsDiagnosis:Little hope of modeling the complexity of some of these functions, especially in higher dimensions.

Test FunctionsAckley Diagonal 2 Max Diagonal 4 Max

2-Component GMM

5-Component GMM

Mis-Scaled Generalized

Rastrigin (MGR)

Salomon Generalized Rosenbrock

Comparing Model ResultsGPM Earth

NNet MDA

GPM Results

Moving ForwardLimitations

– Computation time of approximate K-nearest neighborhood is cost-prohibitive to build for every candidate point (TOPOB & TOPOP)• Morse-Smale computation is limited by number of extrema which need processed

– Topology is based on current prediction modelUnknowns

– Good parameter selection (k for KNN, parameters of prior)Future Work

– Integrate visualizations to better understand how points are being selected and gain intuition as to how to improve results

– Better neighborhood estimates– Investigate other means of measuring topological impact– Fitting local models based on topological partitioning– Using better models– Hybrid scoring or hybrid sampling techniques

Adaptive Sampling with Topological Scores

Documents

Transcript of Adaptive Sampling with Topological Scores