Shape as Organizing Principle for Data
MLConf, SF 2014
Anthony Bak, Principal Data Scientist
The Data Problem: Complexity
Solution: Topological Summaries
Shape as Organizing Principle
Reduce Bias, Discover Models
We want to discover the underlying structure without bias.
TDA analyzes the data you have, not the data you want to have.
Generating Topological Summaries
Remember/Forget
Use multiple lenses/metrics to get the complete picture
Different lenses provide different summaries
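As a rough illustration of how a lens and a metric turn data into a summary, here is a minimal sketch of one well-known construction of this kind, Mapper: cover the range of the lens with overlapping intervals, cluster the points whose lens values fall in each interval, and connect clusters that share points. The helper names (`eps_components`, `mapper_graph`, `cycle_rank`), the parameter values, and the use of single-linkage clustering are assumptions for illustration, not Ayasdi's implementation. Running it on a sampled circle with two different lenses shows the lens deciding what is remembered and what is forgotten.

```python
import numpy as np
from itertools import combinations


def eps_components(P, eps):
    """Connected components of the eps-neighborhood graph on the rows of P,
    i.e. single-linkage clustering cut at height eps."""
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    adj = D <= eps
    labels = np.full(len(P), -1)
    n_comp = 0
    for start in range(len(P)):
        if labels[start] >= 0:
            continue
        labels[start] = n_comp
        stack = [start]
        while stack:  # depth-first flood fill of one component
            u = stack.pop()
            for v in np.nonzero(adj[u] & (labels < 0))[0]:
                labels[v] = n_comp
                stack.append(v)
        n_comp += 1
    return labels, n_comp


def mapper_graph(X, lens, n_intervals=8, overlap=0.3, eps=0.3):
    """Hypothetical Mapper helper: overlapping intervals over the lens range,
    one node per cluster in each interval, edges between nodes sharing points."""
    lo, hi = float(lens.min()), float(lens.max())
    span = hi - lo
    width = span / n_intervals if span > 1e-12 else 1.0  # constant lens -> one bin
    nodes = []
    for i in range(n_intervals):
        a = lo + (i - overlap) * width
        b = lo + (i + 1 + overlap) * width
        idx = np.nonzero((lens >= a) & (lens <= b))[0]
        if idx.size == 0:
            continue
        labels, k = eps_components(X[idx], eps)
        nodes.extend(frozenset(idx[labels == c].tolist()) for c in range(k))
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges


def cycle_rank(n_nodes, edges):
    """Number of independent loops in a graph: E - V + (number of components)."""
    parent = list(range(n_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in edges:
        parent[find(u)] = find(v)
    n_comp = len({find(u) for u in range(n_nodes)})
    return len(edges) - n_nodes + n_comp


theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

lenses = {"x-coordinate": circle[:, 0],
          "distance from origin": np.linalg.norm(circle, axis=1)}
for name, lens in lenses.items():
    nodes, edges = mapper_graph(circle, lens)
    print(f"{name}: {len(nodes)} nodes, {cycle_rank(len(nodes), edges)} loop(s)")
```

With these settings the x-coordinate lens should recover the loop, while the radial lens is constant on the circle and collapses everything into a single node, so the loop is forgotten.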
Lenses: where do they come from?
Statistics: Mean/Max/Min, Variance, n-Moment, Density, …
Machine Learning: PCA/SVD, Autoencoders, Isomap/MDS/t-SNE, …
Geometry: Centrality, Curvature, Harmonic Cycles, …
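To make the three sources of lenses concrete, a minimal NumPy sketch computing one lens from each family: an unnormalized Gaussian kernel density (statistics), the coordinate along the first principal component (machine learning), and the mean distance to all points as a simple centrality measure (geometry). The helper names and the bandwidth `h` are illustrative assumptions; any real-valued function on the points can play the role of a lens.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def density_lens(X, h=0.5):
    """Statistics: unnormalized Gaussian kernel density at each point (bandwidth h)."""
    D = pairwise_distances(X)
    return np.exp(-D ** 2 / (2 * h ** 2)).sum(axis=1)


def pca_lens(X):
    """Machine learning: coordinate along the first principal component."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]


def eccentricity_lens(X):
    """Geometry: mean distance from each point to every point in the data set
    (small = central, large = peripheral)."""
    return pairwise_distances(X).mean(axis=1)


rng = np.random.default_rng(0)
X = rng.standard_normal((300, 2)) * np.array([3.0, 0.5])  # elongated blob

for name, f in [("density", density_lens), ("PCA", pca_lens),
                ("eccentricity", eccentricity_lens)]:
    values = f(X)
    print(f"{name:>12}: min={values.min():.2f}, max={values.max():.2f}")
```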
Why Topology?
Key Properties of TDA
Deformation Invariance
Compressed Representation
Coordinate Freeness
Coordinate Invariance
1. Topology of shape doesn’t depend on the coordinates used to describe the shape
2. Different feature sets can describe the same phenomena
3. While processing data, we frequently alter coordinates: scaling, rotating, whitening
You want to study properties of your data that are invariant under coordinate changes
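One way to see coordinate invariance in practice: pairwise distances, and therefore any summary built only from the metric, are unchanged by rotations and translations, and uniform scaling multiplies every distance by the same factor. The sketch below checks this on synthetic clusters, assuming a component count whose neighborhood scale is set relative to the median distance; the helper names and `rel_eps` are illustrative assumptions. Per-feature rescaling such as whitening changes the metric itself, so it is not covered by this check.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def n_components(X, rel_eps=0.1):
    """Connected components of the eps-neighborhood graph, with eps set to a
    fraction of the median pairwise distance so the answer has no units."""
    D = pairwise_distances(X)
    adj = D <= rel_eps * np.median(D)
    seen = np.zeros(len(X), dtype=bool)
    count = 0
    for s in range(len(X)):
        if seen[s]:
            continue
        count += 1
        seen[s] = True
        stack = [s]
        while stack:  # flood fill the component containing s
            nbrs = np.nonzero(adj[stack.pop()] & ~seen)[0]
            seen[nbrs] = True
            stack.extend(nbrs.tolist())
    return count


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in ([0, 0], [5, 0], [0, 5])])

# A change of coordinates: rotate, scale uniformly, translate.
t = 0.7
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
Y = 3.0 * X @ R.T + np.array([10.0, -4.0])

print(np.allclose(pairwise_distances(Y), 3.0 * pairwise_distances(X)))
print(n_components(X), n_components(Y))
```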
Coordinate Invariance: Gene Expression
Figure panels: NKI, GSE230
Deformation Invariance
• Topological features don’t change when you stretch and distort the data
Advantage: Makes problems easier
• Noise resistance
• Less pre-processing of data
• Robust (stable) data
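A small sketch of deformation invariance, assuming the topological feature of interest is the number of clusters. It uses 0-dimensional persistence: in the Vietoris-Rips filtration every point starts as its own component, and the edge lengths of the Euclidean minimum spanning tree are exactly the scales at which components merge. Stretching, shearing, and adding a little noise change every individual distance, but the number of long-lived components should stay the same. The helper names and the `rel` threshold are illustrative assumptions.

```python
import numpy as np


def mst_edge_lengths(X):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm).
    These are the death times of the 0-dimensional persistence classes in the
    Vietoris-Rips filtration: the scales at which clusters merge."""
    n = len(X)
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)  # distance from each point to the growing tree
    best[0] = 0.0
    lengths = []
    for _ in range(n):
        u = int(np.argmin(np.where(in_tree, np.inf, best)))
        lengths.append(best[u])
        in_tree[u] = True
        best = np.minimum(best, np.linalg.norm(X - X[u], axis=1))
    return np.sort(np.array(lengths[1:]))  # drop the root's placeholder 0


def n_clusters(X, rel=0.25):
    """Count long-lived components: 1 + number of merges at scales above
    rel * (largest merge scale). The threshold rel is an arbitrary choice."""
    lengths = mst_edge_lengths(X)
    return 1 + int((lengths > rel * lengths.max()).sum())


rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((100, 2)) for c in centers])

# Deform: stretch x by 2, shear, then add small noise.
A = np.array([[2.0, 0.5], [0.0, 1.0]])
Y = X @ A.T + 0.05 * rng.standard_normal(X.shape)

print(n_clusters(X), n_clusters(Y))
```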
Compressed Representation
• Replace the metric space with a combinatorial summary: a simplicial complex.
• Data becomes easier to manage, search, and query while maintaining essential features.
• Leverages many known algorithms from graph theory, computational topology, computational geometry.
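As an illustration of replacing a metric space with a simplicial complex, a minimal sketch of the nerve construction: cover the points with balls around a few landmarks and add a simplex for every group of balls that share a point. The landmark choice (every 100th point), the radius, and the helper names are assumptions. Here 1,200 points on a circle compress to 12 vertices and 12 edges, and an ordinary graph algorithm (breadth-first search) then answers queries on the summary instead of the raw data.

```python
import numpy as np
from collections import deque
from itertools import combinations


def ball_cover(X, landmarks, r):
    """Cover the points X with balls of radius r around each landmark;
    each cover element is the set of point indices inside one ball."""
    return [frozenset(np.nonzero(np.linalg.norm(X - L, axis=1) <= r)[0].tolist())
            for L in landmarks]


def nerve(cover, max_dim=2):
    """Nerve of a cover: a k-simplex for every k+1 cover sets with a common point."""
    simplices = {0: [(i,) for i in range(len(cover))]}
    for k in range(1, max_dim + 1):
        simplices[k] = [s for s in combinations(range(len(cover)), k + 1)
                        if frozenset.intersection(*(cover[i] for i in s))]
    return simplices


def bfs_hops(edges, n, source):
    """Breadth-first search on the 1-skeleton: hop count from source to each vertex."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


theta = np.linspace(0, 2 * np.pi, 1200, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
cover = ball_cover(X, X[::100], r=0.35)    # 12 landmarks
cx = nerve(cover)

print({k: len(v) for k, v in cx.items()})  # {0: 12, 1: 12, 2: 0}
print(bfs_hops(cx[1], len(cover), 0)[6])   # 6
```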
Baby Steps: PCA
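PCA can itself be read as the simplest lens. A minimal sketch using NumPy's SVD, assuming a synthetic three-feature data set and a random orthogonal change of coordinates: the projected scores agree up to the sign of each axis, and the explained-variance ratios agree exactly, which ties PCA back to coordinate invariance. The `pca` helper is an illustrative assumption.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the centered data matrix.
    Returns (scores, explained_variance_ratio) for the top components."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T      # projections onto principal axes
    ratio = S ** 2 / np.sum(S ** 2)        # share of total variance per axis
    return scores, ratio[:n_components]


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) * np.array([3.0, 1.0, 0.1])

# Rotate the coordinate system with a random orthogonal matrix.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
scores, ratio = pca(X)
scores_rot, ratio_rot = pca(X @ Q)

print(np.round(ratio, 3))
# Same scores up to the sign of each axis, same variance ratios.
print(np.allclose(np.abs(scores), np.abs(scores_rot)), np.allclose(ratio, ratio_rot))
```

Per-feature rescaling, as in whitening, changes the principal axes, so this check covers rotations only.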
Data Stories
Model Introspection
Predictive Maintenance
Customer Churn
Transaction Fraud
Data has shape, Shape has meaning.
http://www.ayasdi.com/company/careers/