Open Source Software for TDA
ACM-BCB Workshop on TDA, October 2, 2016
by Svetlana Lockwood
Topological Data Analysis
1. Persistence-Way
• Topological analysis using persistent homology
• Finds topological invariants in data (# of connected components, enclosed voids, etc.)
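[Figure: two example shapes with their Betti numbers. First shape: β0 = 1, β1 = 0, β2 = 1. Second shape: β0 = 1, β1 = 2, β2 = 1. These values match a sphere and a torus, respectively.]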
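As a small illustration of the first invariant listed above, β0 (the number of connected components) can be estimated for a point cloud at a fixed scale by joining points that lie within a threshold distance and counting the resulting groups. This is only a toy sketch: the function name `betti0`, the sample points, and the radii are illustrative assumptions, not material from the talk.

```python
import math


def betti0(points, radius):
    """Count connected components of the graph joining points within `radius`."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every pair of points that are close enough.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


# Hypothetical data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.5, 5.0)]
small_scale = betti0(points, radius=1.0)
large_scale = betti0(points, radius=10.0)
print(small_scale, large_scale)
```

The count depends on the chosen scale, which is the motivation for tracking it across all scales in persistent homology.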
2. Mapper-Way
• Applies a filter function to project the data onto a lower-dimensional space
• Performs partial clustering in the level sets
TDA: the Persistence-Way (# 1)
• A number of free software packages have appeared recently
• R package – “TDA”
• A number of benefits:
• Familiar R environment
• Implements two representations of persistence (barcodes and birth-death diagrams)
• Provides an R interface to the efficient C++ libraries GUDHI, Dionysus, and PHAT
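To make the barcode and birth-death representations concrete, the sketch below computes the 0-dimensional persistence intervals of a Rips filtration from a distance matrix. It is written in Python rather than R and does not use or reproduce the TDA package's API. The function name `h0_barcode` and the toy data are assumptions for illustration, and only dimension 0 (connected components) is handled.

```python
import math


def h0_barcode(dist):
    """Return 0-dimensional persistence intervals (birth, death).

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Every point is born at scale 0; a component dies when it merges into another.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges from shortest to longest, as the filtration scale grows.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            bars.append((0.0, length))  # one component dies at this scale
    bars.append((0.0, math.inf))  # the last component never dies
    return bars


# Hypothetical example: three points on a line at positions 0, 1, and 4.
positions = [0.0, 1.0, 4.0]
dist = [[abs(a - b) for b in positions] for a in positions]
bars = h0_barcode(dist)
for birth, death in bars:
    print(f"[{birth}, {death})")
```

Drawn as horizontal segments, these pairs form a barcode; plotted as points with birth on one axis and death on the other, the same pairs form a birth-death diagram.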
• The TDA package for R is developed by
• Brittany T. Fasy, Jisu Kim, Fabrizio Lecci, Clément Maria, Vincent Rouvreau
• Some of the examples are taken from:
• Fasy, Brittany Terese, Jisu Kim, Fabrizio Lecci, and Clément Maria. "Introduction to the R package TDA." arXiv preprint arXiv:1411.1830 (2014).
• Kim, Jisu. "Tutorial on the R package TDA."
• Goal: to discover underlying shape of data
[Figure: Data → Topological Features. Source: Ghrist, R., 2008. Barcodes: the persistent topology of data.]
• (switch to R)
Plasmids Data
(switch to R)
• Plasmids are mobile elements
• They exchange genetic material
• 831 plasmids (see table)
• Original data: 831 plasmids by 81,898 features
• Computed a pairwise genetic distance matrix (831 × 831)
• We want to see whether there is any “interesting” structure
Pictures adapted from http://www.scienceprofonline.com
| Subgroup | Count |
| --- | --- |
| 1. Alpha | 159 |
| 2. Beta | 85 |
| 3. Gamma | 519 |
| 4. Delta/epsilon | 68 |
| Total plasmids | 831 |
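The step from a plasmid-by-feature table to a square pairwise distance matrix can be sketched as follows. The slides do not say which genetic distance was used, so the Jaccard distance on presence/absence features is purely an assumption here, as are the function names and the three-item toy input.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of present features."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(items):
    """Build a symmetric n x n distance matrix with a zero diagonal."""
    n = len(items)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = jaccard_distance(items[i], items[j])
    return dist


# Hypothetical toy input: each plasmid is the set of feature indices it carries.
plasmids = [{0, 1, 2}, {1, 2, 3}, {7, 8}]
dist = pairwise_distances(plasmids)
for row in dist:
    print(row)
```

With 831 plasmids the same procedure yields an 831 × 831 matrix, which can be used as input to a persistence computation in place of raw coordinates.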
[Figure: plasmids data; labels shown: 351, 471, 292, 570]
Other Software for Persistent Homology
Other open source software is available for computing persistent homology.
Table columns on the slide: Software, Installation, Complex, Boundary matrix, Barcodes, Visualization, Data Set Size, Ease of Use. Only the entries below survive in this transcript.
| Software | Entries marked “--” | Data Set Size | Ease of Use |
| --- | --- | --- | --- |
| JavaPlex | 0 | small | easy |
| Perseus | 0 | small | easy |
| Dionysus | 2 | medium | medium |
| DIPHA | 1 | large | hard |
| GUDHI | 2 | large | hard |
Interface to Matlab/Octave
Source: N. Otter, M. A. Porter, U. Tillmann, P. Grindrod, H. A. Harrington, arXiv, 2015.
TDA: the Mapper-Way (# 2)
• Apply a filter function to project the data onto a lower-dimensional space
• Perform partial clustering in the level sets, applying standard clustering algorithms to subsets of the original data
• Goal: to understand how the partial clusters formed in this way interact with each other
• A few open source packages exist
• However, all have some limitations
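The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation of any of the packages discussed: the function names, the cover construction, the single-linkage clustering rule with threshold `eps`, and the circle data set are all chosen here for the example.

```python
import math
from itertools import combinations


def components(indices, points, eps):
    """Single-linkage clusters: connected components of the eps-neighbourhood graph."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in remaining if math.dist(points[p], points[q]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, filter_fn, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are partial clusters, edges link clusters sharing points."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    pad = length * overlap / 2  # widen each interval so that neighbours overlap
    nodes = []
    for k in range(n_intervals):
        a = lo + k * length - pad
        b = lo + (k + 1) * length + pad
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(components(members, points, eps))
    edges = [(s, t) for s, t in combinations(range(len(nodes)), 2)
             if nodes[s] & nodes[t]]
    return nodes, edges


# Hypothetical data: 12 points on a circle, filtered by their x-coordinate.
points = [(math.cos(math.radians(15 + 30 * k)), math.sin(math.radians(15 + 30 * k)))
          for k in range(12)]
nodes, edges = mapper(points, filter_fn=lambda p: p[0],
                      n_intervals=3, overlap=0.5, eps=0.6)
print(len(nodes), "nodes;", "edges:", edges)
```

On this toy input the resulting graph is a single loop of four nodes, reflecting the circular shape of the data; other choices of filter, cover, or clustering threshold can give a different graph.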
• I’ll present the Python-based version developed by MLWave, with examples from https://github.com/MLWave/kepler-mapper
• Pros:
• Simple programming interface
• Makes use of existing Python ML libraries
• Nice visualizations
• Cons:
• Limited coloring
• Not completely automated
Python Mappers: Prerequisites
• I highly recommend installing Anaconda
• Saves a lot of trouble
• Comes with SciPy, NumPy, scikit-learn
• Includes a Python IDE and package managers (conda, pip)
• Copy km.py from the MLWave repository into the Anaconda Lib folder
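One way to confirm that the prerequisites are in place is to ask Python whether each module can be found. The sketch below uses only the standard library; the helper name `missing_modules` is hypothetical, and the module name `km` assumes that km.py was copied under that file name as described above.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the packages listed above; `km` is the copied km.py file.
required = ["numpy", "scipy", "sklearn", "km"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All prerequisites found.")
```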
Intro Mapper Example: MNIST digits
(switch to Python)
Intro example from MLWave
• The MNIST database of handwritten digits
• Thousands of digits
• Each digit is represented by an 8×8-pixel image
• Goal: cluster handwritten digits according to their value
Plasmids Network: Overlap – 10%
Plasmids Network: Overlap – 30%
Plasmids Network: Overlap – 50%
Plasmids Network: Overlap – 70%
Plasmids Network: Overlap – 90%
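The effect of the overlap setting can be seen on the cover alone, before any clustering. In the sketch below, overlap is taken to mean the fraction of each interval shared with the next one; this definition is an assumption and may differ from how a given Mapper package interprets its overlap parameter. The function names, the filter range [0, 1], and the probe value 0.45 are likewise illustrative.

```python
def overlapping_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals.

    `overlap` (at least 0, strictly below 1) is the fraction of each
    interval shared with the next one.
    """
    step = (hi - lo) / (n_intervals - 1 + 1.0 / (1.0 - overlap))
    width = step / (1.0 - overlap)
    return [(lo + k * step, lo + k * step + width) for k in range(n_intervals)]


def membership_count(value, intervals):
    """Number of cover intervals that contain `value`."""
    return sum(1 for a, b in intervals if a <= value <= b)


# The overlap settings listed above, applied to a toy filter range [0, 1].
counts = {}
for overlap in (0.1, 0.3, 0.5, 0.7, 0.9):
    intervals = overlapping_cover(0.0, 1.0, n_intervals=5, overlap=overlap)
    counts[overlap] = membership_count(0.45, intervals)
    print(f"overlap {overlap:.0%}: value 0.45 lies in {counts[overlap]} interval(s)")
```

As the overlap grows, a data point falls into more intervals, so more partial clusters share points and the resulting network gains edges.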
Other Mapper Software
• Mapper by Daniel Müllner
• Installation and the list of dependencies:
• http://danifold.net/mapper/installation/
• The website also contains the Mapper documentation
• Nice GUI (show)
• More complex
Other Mapper Software
• R package “TDAmapper”
• A walkthrough and a tutorial by Frédéric Chazal and Bertrand Michel at http://www.lsta.upmc.fr/michelb/Enseignements/TDA/Mapper_solutions.html
• Familiar R environment
• Visualizations are somewhat limited (show)
References
1. Fasy, Brittany Terese, Jisu Kim, Fabrizio Lecci, and Clément Maria. "Introduction to the R package TDA." arXiv preprint arXiv:1411.1830 (2014).
2. Kim, Jisu. "Tutorial on the R package TDA."
3. Daniel Müllner’s Mapper http://danifold.net/mapper/installation/
4. TDAmapper in R http://www.lsta.upmc.fr/michelb/Enseignements/TDA/Mapper_solutions.html
5. Python Mapper by MLWave https://github.com/MLWave/kepler-mapper
6. Ghrist, R., 2008. Barcodes: the persistent topology of data. Bulletin of the American Mathematical Society, 45(1), pp.61-75.
Thank You!
Questions?