Post on 06-Feb-2021
1/23
Intro Curvature Application
Topology for Data Science 3
Peter Bubenik
University of FloridaDepartment of Mathematicspeter.bubenik@ufl.edu
http://people.clas.ufl.edu/peterbubenik/
January 23, 2017
Tercera Escuela de Análisis Topológico de Datosy Topoloǵıa Estocástica
ABACUS, Estado de México
Peter Bubenik Topology for Data Science 3
peter.bubenik@ufl.eduhttp://people.clas.ufl.edu/peterbubenik/
2/23
Intro Curvature Application Motivation
Homology
Definition
Homology in degree k is given by k-cycles modulo thek-boundaries.
Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 0Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 1Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 2Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 3Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 4Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 5Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 6Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 7Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 8Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 9Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 10Peter Bubenik Topology for Data Science 3
3/23
Intro Curvature Application Motivation
Persistent homology
Main idea
Vary a parameter and keep track of when homology appears anddisappears.
radius = 11Peter Bubenik Topology for Data Science 3
4/23
Intro Curvature Application Motivation
Barcode and Persistence Landscapes
Barcode:
0 2 4 6 8 10 12 14
Convert to Persistence Landscape:
2 4 6 8 10 12 14
2
4
6
0
λ1
λ2
λ3
λk = 0,
for k ≥ 4
Peter Bubenik Topology for Data Science 3
5/23
Intro Curvature Application Motivation
Persistent homology of sampled points
Peter Bubenik Topology for Data Science 3
6/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Short bars
Question
Can we understand the small bars in terms of the underlyinggeometry – specifically curvature?
This is joint work in progress with
Dhruv Patel (Univ of Florida)
Benjamin Whittle (Univ of Florida)
Peter Bubenik Topology for Data Science 3
7/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Metric geometry
Curvature in a metric space, M
compare triangles in M with triangles in certain spaces
Model spaces of constant curvature K
K = −1: Hyperbolic planeK = 0: Euclidean plane
K = 1: Sphere of radius 1
Assumptions:
sample points independently
from a uniform density
on a unit disk of constant curvature
Peter Bubenik Topology for Data Science 3
8/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Čech complex
Acute triangles 7→ persistent H1 in the Čech complex
Asymptotically almost all H1 is of this form.
Peter Bubenik Topology for Data Science 3
8/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Čech complex
Acute triangles 7→ persistent H1 in the Čech complex
Asymptotically almost all H1 is of this form.
Peter Bubenik Topology for Data Science 3
8/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Čech complex
Acute triangles 7→ persistent H1 in the Čech complex
Asymptotically almost all H1 is of this form.
Peter Bubenik Topology for Data Science 3
8/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Čech complex
Acute triangles 7→ persistent H1 in the Čech complex
Asymptotically almost all H1 is of this form.
Peter Bubenik Topology for Data Science 3
8/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Čech complex
Acute triangles 7→ persistent H1 in the Čech complex
Asymptotically almost all H1 is of this form.
Peter Bubenik Topology for Data Science 3
9/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Čech complex
The most persistent such H1 arises from equilateral triangles.
Consider equilateral triangles with circumcircle of radius 1.
Hyperbolic: death/birth ≈ 1.119Euclidean: death/birth = 2/
√3 ≈ 1.155
Spherical: death/birth ≈ 1.225
Peter Bubenik Topology for Data Science 3
10/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Points sampled from unit disks
Sample 1000 points
−1.0 −0.5 0.0 0.5 1.0
−1.0
−0.5
0.0
0.5
1.0
x
y
−1.0 −0.5 0.0 0.5 1.0
−1.0
−0.5
0.0
0.5
1.0
x
y
Peter Bubenik Topology for Data Science 3
11/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Average Landscapes
0 50 100 150 200
05
1015
2025
Average PL in degree 1 for hyperbolic
Index
Peter Bubenik Topology for Data Science 3
11/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Average Landscapes
0 50 100 150 200
05
1015
20
Average PL in degree 1 for euclidean
Index
Peter Bubenik Topology for Data Science 3
11/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Average Landscapes
0 50 100 150 200
05
1015
20
Average PL in degree 1 for spherical
Index
Peter Bubenik Topology for Data Science 3
12/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Differences in Average Landscapes
0 50 100 150 200
−10
12
34
5
hyperbolic − euclidean in degree 1
Index
0 50 100 150 200−1
01
2
euclidean − spherical in degree 1
Index
Peter Bubenik Topology for Data Science 3
13/23
Intro Curvature Application Setup Theory Computations Statistics & Machine Learning
Classification
100 samples from each of hyperbolic, euclidean and spherical
Classify using SVM and 10-fold cross validation
Classification accuracy
Using degree 0: 100% Using degree 1: 87%
Peter Bubenik Topology for Data Science 3
14/23
Intro Curvature Application Hippocampus Summary
Alzheimers Disease Neuroimaging Initiative (ADNI)
Joint work in progress with Ulrich Bauer (TU Munich), and RolandKwitt (Salzburg).
The data:995 left and right (paired) hippocampi consisting of
1 284 Normal
2 307 Mild Cognitive Impairment
3 178 Late Mild Cognitive Impairment
4 226 Alzheimer’s Disease (AD)
Each hippocampus converted to a 32× 32× 32 binary cubical grid.
Peter Bubenik Topology for Data Science 3
15/23
Intro Curvature Application Hippocampus Summary
Left Hippocampi
Peter Bubenik Topology for Data Science 3
15/23
Intro Curvature Application Hippocampus Summary
Left Hippocampi
Peter Bubenik Topology for Data Science 3
15/23
Intro Curvature Application Hippocampus Summary
Left Hippocampi
Peter Bubenik Topology for Data Science 3
15/23
Intro Curvature Application Hippocampus Summary
Left Hippocampi
Peter Bubenik Topology for Data Science 3
15/23
Intro Curvature Application Hippocampus Summary
Left Hippocampi
Peter Bubenik Topology for Data Science 3
15/23
Intro Curvature Application Hippocampus Summary
Left Hippocampi
Peter Bubenik Topology for Data Science 3
15/23
Intro Curvature Application Hippocampus Summary
Left Hippocampi
Peter Bubenik Topology for Data Science 3
15/23
Intro Curvature Application Hippocampus Summary
Left Hippocampi
Peter Bubenik Topology for Data Science 3
16/23
Intro Curvature Application Hippocampus Summary
Persistent homology transform
Theorem (Turner, Mukherjee, Boyer (2014))
For a surface in R3, persistent homology of sublevel sets in alldirections is a sufficient statistic.
Our approach:
filter each hippocampus in 144 directions
calculate persistent homology
convert to persistence landscape
concatenate
Peter Bubenik Topology for Data Science 3
17/23
Intro Curvature Application Hippocampus Summary
Persistence Landscape Transform
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Index
0
Peter Bubenik Topology for Data Science 3
17/23
Intro Curvature Application Hippocampus Summary
Persistence Landscape Transform
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Index
0
Peter Bubenik Topology for Data Science 3
17/23
Intro Curvature Application Hippocampus Summary
Persistence Landscape Transform
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
2.0
Index
0
Peter Bubenik Topology for Data Science 3
17/23
Intro Curvature Application Hippocampus Summary
Persistence Landscape Transform
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
2.0
2.5
Index
0
Peter Bubenik Topology for Data Science 3
17/23
Intro Curvature Application Hippocampus Summary
Persistence Landscape Transform
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
2.0
2.5
Index
0
Peter Bubenik Topology for Data Science 3
17/23
Intro Curvature Application Hippocampus Summary
Persistence Landscape Transform
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Index
0
Peter Bubenik Topology for Data Science 3
17/23
Intro Curvature Application Hippocampus Summary
Persistence Landscape Transform
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
2.0
2.5
3.0
Index
0
Peter Bubenik Topology for Data Science 3
17/23
Intro Curvature Application Hippocampus Summary
Persistence Landscape Transform
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
2.0
2.5
Index
0
Peter Bubenik Topology for Data Science 3
18/23
Intro Curvature Application Hippocampus Summary
Average Landscapes
0 5000 10000 15000 20000 25000 30000
0.0
0.5
1.0
1.5
Average PL in degree 0 for Normal
Index
0
Peter Bubenik Topology for Data Science 3
18/23
Intro Curvature Application Hippocampus Summary
Average Landscapes
0 5000 10000 15000 20000 25000 30000
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
Average PL in degree 0 for AD
Index
0
Peter Bubenik Topology for Data Science 3
19/23
Intro Curvature Application Hippocampus Summary
Average Landscape Difference
0 5000 10000 15000 20000 25000 30000
−0.1
0.0
0.1
0.2
AD − Normal in degree 0
Index
0
Is this difference significant? Permutation test: Yes (p val 0.000)
Peter Bubenik Topology for Data Science 3
19/23
Intro Curvature Application Hippocampus Summary
Average Landscape Difference
0 5000 10000 15000 20000 25000 30000
−0.1
0.0
0.1
0.2
AD − Normal in degree 0
Index
0
Is this difference significant?
Permutation test: Yes (p val 0.000)
Peter Bubenik Topology for Data Science 3
19/23
Intro Curvature Application Hippocampus Summary
Average Landscape Difference
0 5000 10000 15000 20000 25000 30000
−0.1
0.0
0.1
0.2
AD − Normal in degree 0
Index
0
Is this difference significant? Permutation test: Yes (p val 0.000)
Peter Bubenik Topology for Data Science 3
20/23
Intro Curvature Application Hippocampus Summary
Principal Components Analysis
PCA for PL in degree 0
−50 −40 −30 −20 −10 0 10−20
−10
0 1
0 2
0 3
0 4
0
−30−20
−10 0
10 20
3
pca1
pca2pc
a3
1
11
1
1 111
1
1
11
1 1
1
11
111
1
1111 11
1111
111
11
111
11
1
1
11
11
1
1
11
1
1
111 11
1
11
111
11
11
1 1
1
1
1
11
1
1
11
11
11
11
1
11
111 1
1
1
1
1
1
1
11
1
1
1
1
1
1
11
11111
1
1
1
1
1
1
1
111
1
111
11
1 111
11
11 11
1
11
1
11
1
1
111111
11
11
1111
11
11
1
11
111
11
1
1
1
11
1
1
1
1
1
1
1
1
11
1
1
1
11 1
1
1
1
11
1
11111
1
1
111
1
1
11
1
11
1111
1
11
111
11
1 11
1
1
11 111
1
11
1
1
1
1
1
11
11
1
11
1
1 111
11
11
1 1
11
1
1
11
1
1
11
11
1
111 1
1 14
4 4
44
4 4
4
4
4
4
44
4
4
4
4
4
4
44
4
44
4
444
4
44
4
4
44
4
4
4
44
4
4 4
4 44
44
44 4
4
4
4
44
44
4
4
4
4
44
4
4
4 44
4
444
4
44
4 44
4
4
4 44
4
44
4
4
4
4
4
4 4
444
44
44
4
4
44
44
4
4
4
4
4
4
444
44
4
4
44
4
4
4444
44
4
4
4
4
444
4
4
4
4
4
4
4 4
44
4
4
4
4
4
4
44
4
444
444
44
4
4
4
4
4
4
44
44 444
4
4
4444
4
4
44 4
44
4
44
44
4444
444
4
4 4
444
4
44
4
44
4
4
4
4
4
4
4
4
4
4
4
4
Peter Bubenik Topology for Data Science 3
21/23
Intro Curvature Application Hippocampus Summary
Support Vector Machine on PCA coordinates
Peter Bubenik Topology for Data Science 3
22/23
Intro Curvature Application Hippocampus Summary
Classification on Landscape coordinates
Support vector classification with 10-fold cross validation:
truepred Normal Alzheimer’s Disease
Normal 232 83Alzheimer’s Disease 52 143
Prediction accuracy: 73%
Peter Bubenik Topology for Data Science 3
23/23
Intro Curvature Application Hippocampus Summary
Topological Data Analysis Summary
DataGeometricstructure
SummaryStatistics
& MachineLearning
Encode
Topology
Peter Bubenik Topology for Data Science 3
IntroMotivation
CurvatureSetupTheoryComputationsStatistics & Machine Learning
ApplicationHippocampusSummary