Shape as Organizing Principle for Data

Shape as Organizing Principle for Data, MLConf, SF 2014, Anthony Bak, Principal Data Scientist

Transcript of Shape as Organizing Principle for Data

Page 1: Shape as Organizing Principle for Data

Shape as Organizing Principle for Data

MLConf, SF 2014

Anthony Bak, Principal Data Scientist

Page 2: Shape as Organizing Principle for Data

The Data Problem: Complexity

Page 3: Shape as Organizing Principle for Data

Solution: Topological Summaries

Page 4: Shape as Organizing Principle for Data

Shape as Organizing Principle for Data

Page 5: Shape as Organizing Principle for Data

Shape as Organizing Principle

Page 6: Shape as Organizing Principle for Data

Reduce Bias, Discover Models

We want to discover the underlying structure without bias.

TDA analyzes the data you have, not the data you want to have.

Page 7: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 8: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 9: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 10: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 11: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 12: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 13: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 14: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 15: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 16: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 17: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 18: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 19: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 20: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 21: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 22: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 23: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 24: Shape as Organizing Principle for Data

Remember/Forget

Use multiple lenses/metrics to get the complete picture

Different lenses provide different summaries
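The slides do not spell out how a summary is built from a lens. The sketch below assumes a Mapper-style recipe (cover the lens range with overlapping intervals, cluster the points in each interval, link clusters that share points); the function names `single_linkage` and `mapper_graph`, and all parameter values, are illustrative choices rather than anything taken from the talk. It shows how two lenses on the same circle of points can remember or forget the loop.

```python
import math

def single_linkage(indices, points, eps):
    """Group the given point indices into clusters; points within eps are linked."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=0.3):
    """Hypothetical minimal Mapper-style summary (illustrative, not the talk's code)."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Each interval is widened on both sides so neighbours overlap.
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    # Two nodes are joined when their clusters share at least one point.
    edges = {(s, t)
             for s in range(len(nodes))
             for t in range(s + 1, len(nodes))
             if nodes[s] & nodes[t]}
    return nodes, edges

# 40 evenly spaced points on the unit circle.
circle = [(math.cos(2 * math.pi * i / 40), math.sin(2 * math.pi * i / 40))
          for i in range(40)]

def x_lens(p):
    return p[0]

def angle_lens(p):
    return math.atan2(p[1], p[0])

for name, lens in [("x-coordinate", x_lens), ("angle", angle_lens)]:
    nodes, edges = mapper_graph(circle, lens)
    print(name, "lens:", len(nodes), "nodes,", len(edges), "edges")
```

With these settings the x-coordinate lens yields a closed ring of nodes, so the loop is remembered, while the angle lens cuts the circle open and yields a chain, so the loop is forgotten.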

Page 25: Shape as Organizing Principle for Data

Generating Topological Summaries

Page 26: Shape as Organizing Principle for Data

Lenses: where do they come from?

Statistics: Mean/Max/Min, Variance, n-Moment, Density, …

Machine Learning: PCA/SVD, Autoencoders, Isomap/MDS/TSNE, …

Geometry: Centrality, Curvature, Harmonic Cycles, …
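To make the three families concrete, here is a minimal sketch of one or two lenses from each, written with the Python standard library only. The function names, the Gaussian bandwidth, the use of power iteration for the first principal component, and the "distance to the farthest point" definition of centrality are all assumptions made for illustration; the slides only name the lens families.

```python
import math

def mean_lens(row):
    """Statistics: mean of one row's feature values."""
    return sum(row) / len(row)

def variance_lens(row):
    """Statistics: population variance of one row's feature values."""
    m = mean_lens(row)
    return sum((x - m) ** 2 for x in row) / len(row)

def density_lens(points, bandwidth=1.0):
    """Statistics: unnormalised Gaussian kernel density at each point."""
    return [sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2))
                for q in points)
            for p in points]

def centrality_lens(points):
    """Geometry: distance to the farthest point; small values mean central."""
    return [max(math.dist(p, q) for q in points) for p in points]

def pca_lens(points, n_iter=100):
    """Machine learning: score on the first principal component.

    Uses plain power iteration on the covariance matrix, which assumes the
    data are not degenerate (the covariance must not be all zeros).
    """
    n, d = len(points), len(points[0])
    means = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - means[k] for k in range(d)] for p in points]
    cov = [[sum(r[a] * r[b] for r in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(r[k] * v[k] for k in range(d)) for r in centered]

# Four points along a line plus one far-away outlier (made-up data).
points = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0), (10.0, 0.0)]

density = density_lens(points)
centrality = centrality_lens(points)
pca_scores = pca_lens(points)

print("lowest density at index", density.index(min(density)))
print("most central index", centrality.index(min(centrality)))
print("first-component scores", [round(s, 2) for s in pca_scores])
```

Each lens assigns one number per data point, and each highlights something different: the density lens singles out the outlier, while the centrality lens singles out the point in the middle of the spread.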

Page 27: Shape as Organizing Principle for Data

Why Topology?

Page 28: Shape as Organizing Principle for Data

Key Properties of TDA

Deformation Invariance

Compressed Representation

Coordinate Freeness

Page 29: Shape as Organizing Principle for Data

Coordinate Invariance

1. Topology of shape doesn’t depend on the coordinates used to describe the shape

2. Different feature sets can describe the same phenomena

3. While processing data, we frequently alter coordinates: scaling, rotating, whitening

You want to study properties of your data that are invariant under coordinate changes
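A small check of this idea, under simple assumptions: pairwise distances are unchanged by a rotation, and a uniform rescaling changes the distances but not which points are nearest to which. The helper names and the four sample points are made up for illustration.

```python
import math

def pairwise_distances(points):
    """Full matrix of Euclidean distances between points."""
    return [[math.dist(p, q) for q in points] for p in points]

def rotate(points, angle):
    """Rotate 2-D points about the origin by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def scale(points, factor):
    """Multiply every coordinate by the same factor."""
    return [(factor * x, factor * y) for x, y in points]

def neighbor_order(points):
    """For each point, all indices sorted from nearest to farthest."""
    d = pairwise_distances(points)
    n = len(points)
    return [sorted(range(n), key=lambda j: d[i][j]) for i in range(n)]

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
rotated = rotate(points, 0.7)
scaled = scale(points, 5.0)

# Largest change in any pairwise distance caused by the rotation.
before = pairwise_distances(points)
after = pairwise_distances(rotated)
max_change = max(abs(before[i][j] - after[i][j])
                 for i in range(len(points))
                 for j in range(len(points)))

print("max distance change under rotation:", max_change)
print("neighbor order kept under scaling:",
      neighbor_order(points) == neighbor_order(scaled))
```

Anything computed only from the distance matrix is therefore blind to rotation, and anything computed only from the neighbor ordering is also blind to uniform scaling.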

Page 30: Shape as Organizing Principle for Data

Coordinate Invariance: Gene Expression

NKI

GSE230

Page 31: Shape as Organizing Principle for Data

Deformation Invariance

• Topological features don’t change when you stretch and distort the data

Advantage: Makes problems easier

• Noise resistance

• Less pre-processing of data

• Robust (stable) data
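As a rough illustration of deformation invariance, the sketch below counts connected components and independent loops in a neighborhood graph before and after a stretch-and-shear of the data. The graph construction, the threshold `eps = 0.4`, the particular linear map, and the helper names are assumptions chosen for this example; the invariance holds here because the deformation is mild relative to the spacing of the points.

```python
import math
from itertools import combinations

def graph_summary(points, eps):
    """Return (components, independent loops) of the eps-neighborhood graph."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    n_edges = 0
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            n_edges += 1
            parent[find(i)] = find(j)
    n_components = len({find(i) for i in range(len(points))})
    # For a graph: loops = edges - vertices + components.
    return n_components, n_edges - len(points) + n_components

def diameter(points):
    """Largest pairwise distance, a purely geometric quantity."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

# 24 evenly spaced points on the unit circle.
circle = [(math.cos(2 * math.pi * i / 24), math.sin(2 * math.pi * i / 24))
          for i in range(24)]

# Stretch along x and shear: (x, y) -> (1.3 x + 0.2 y, y).
deformed = [(1.3 * x + 0.2 * y, y) for x, y in circle]

print("circle  :", graph_summary(circle, 0.4), "diameter", diameter(circle))
print("deformed:", graph_summary(deformed, 0.4), "diameter", diameter(deformed))
```

The diameter changes under the deformation, but the topological summary (one component, one loop) does not.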

Page 32: Shape as Organizing Principle for Data

Deformation Invariance

Page 33: Shape as Organizing Principle for Data

Deformation Invariance

Page 34: Shape as Organizing Principle for Data

Deformation Invariance

Page 35: Shape as Organizing Principle for Data

Deformation Invariance

Page 36: Shape as Organizing Principle for Data

Compressed Representation

• Replace the metric space with a combinatorial summary: a simplicial complex.

• Data becomes easier to manage, search, and query while maintaining essential features.

• Leverages many known algorithms from graph theory, computational topology, computational geometry.
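One standard way to turn a metric space into a simplicial complex is the Vietoris-Rips construction, used here purely for illustration since the slides do not say which construction is meant. In this sketch, `rips_complex` and `euler_characteristic` are hypothetical helper names, and the complex is built only up to triangles.

```python
import math
from itertools import combinations

def rips_complex(points, eps):
    """Vietoris-Rips complex up to dimension 2 at scale eps.

    An edge joins two points at distance <= eps; a triangle is added
    whenever all three of its edges are present.
    """
    indices = range(len(points))
    edges = [(i, j) for i, j in combinations(indices, 2)
             if math.dist(points[i], points[j]) <= eps]
    edge_set = set(edges)
    triangles = [t for t in combinations(indices, 3)
                 if all(pair in edge_set for pair in combinations(t, 2))]
    return {"vertices": list(indices), "edges": edges, "triangles": triangles}

def euler_characteristic(cx):
    """Vertices minus edges plus triangles."""
    return len(cx["vertices"]) - len(cx["edges"]) + len(cx["triangles"])

# Corners of a unit square (made-up data).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

coarse = rips_complex(square, 1.1)   # sides only, diagonals too long
dense = rips_complex(square, 1.5)    # sides and diagonals

print("eps=1.1:", coarse["edges"], coarse["triangles"])
print("eps=1.5:", len(dense["edges"]), "edges,",
      len(dense["triangles"]), "triangles")
```

At the smaller scale the coordinates are replaced by four vertices and four edges forming a loop, a purely combinatorial object that graph algorithms can search and query directly.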

Page 37: Shape as Organizing Principle for Data

Compressed Representation

Page 38: Shape as Organizing Principle for Data

Baby Steps: PCA

Page 39: Shape as Organizing Principle for Data

PCA

Page 40: Shape as Organizing Principle for Data

PCA

Page 41: Shape as Organizing Principle for Data

Data Stories

Page 42: Shape as Organizing Principle for Data

Model Introspection

Page 43: Shape as Organizing Principle for Data

Model Introspection

Page 44: Shape as Organizing Principle for Data

Predictive Maintenance

Page 45: Shape as Organizing Principle for Data

Customer Churn

Page 46: Shape as Organizing Principle for Data

Customer Churn

Page 47: Shape as Organizing Principle for Data

Customer Churn

Page 48: Shape as Organizing Principle for Data

Transaction Fraud

Page 49: Shape as Organizing Principle for Data

Transaction Fraud

Page 50: Shape as Organizing Principle for Data

Transaction Fraud

Page 51: Shape as Organizing Principle for Data

Data has shape. Shape has meaning.

http://www.ayasdi.com/company/careers/