Data Visualization · Data Visualization Giri Iyengar Cornell University [email protected] March 26,...

29
Data Visualization Giri Iyengar Cornell University [email protected] March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Transcript of Data Visualization · Data Visualization Giri Iyengar Cornell University [email protected] March 26,...

Page 1: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Data Visualization

Giri Iyengar

Cornell University

[email protected]

March 26, 2018

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Page 2: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Overview

1 Overview

2 t-SNE

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 2 / 26

Page 3: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Overview

1 Overview

2 t-SNE

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 3 / 26

Page 4: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Agenda

Are the models working as expected?Do the metrics make sense?Visualizing multi-dimensional dataTrying to understand DL models

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 4 / 26

Page 5: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Plotting Residuals

Residual ErrorsIn a good model, it is expected that the errors that the model makes willnot have any systematic nature to them. That is, the errors should beessentially random.

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 5 / 26

Page 6: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Plotting Residuals: No systematic errors in prediction

Figure: Source - Databricks blog

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 6 / 26

Page 7: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Plotting Residuals: Systematic errors in prediction

Figure: Source - Databricks blog

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 7 / 26

Page 8: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Visualizing KMeans fit

Figure: Source - Databricks blog

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 8 / 26

Page 9: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

ROC Curve examples

Figure: Source - MLWiki

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 9 / 26

Page 10: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Overview

1 Overview

2 t-SNE

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 10 / 26

Page 11: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-distributed Stochastic Neighbor Embedding (t-SNE)

t-SNEt-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning)technique for dimensionality reduction that is particularly well suited forthe visualization of high-dimensional datasets. The technique can beimplemented via Barnes-Hut approximations, allowing it to be applied onlarge real-world datasets. It has been applied on data sets with up to 30million examples [1].

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 11 / 26

Page 12: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Visualizing/reducing dimensions of high-dimensional data

PCA - preserves large distancesISOMAP - changes similarityfunction and then applies PCALocally linear embedding

Figure: Source: Xiaofei He

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 12 / 26

Page 13: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

ISOMAP

ISOMAP reduces dimensionsnon-linearlyRelated to kernel PCAInstead of Euclidean distance,use a geodesic / manifolddistance

Figure: Source: ESRI

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 13 / 26

Page 14: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Locally Linear Embedding

Figure: Source: Roweis and Saul Figure: Source: Roweis and Saul

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 14 / 26

Page 15: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

SNE Algorithm

Similar to LLE but use probabilities instead of distancesCompute pj|i, conditional probability that xi would pick xj asneighbor under a locally modeled pdf

Formally pj|i =exp(−

|xi−xj |2

2σ2i

)∑k 6=i exp(− |xi−xk|

2

2σ2i

)

Define qj|i = exp(−|yi−yj |2)∑k 6=i exp(−|yk−yj |2)

Define C =∑i

∑j pj|i log pj|i

qj|i, the KL Divergence

Perform gradient descent to minimize C

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 15 / 26

Page 16: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

SNE Algorithm: KL Divergence

KL Divergence is asymmetricNearby points (large pj|i) weigh more than far-away points (low pj|i)Objective function strongly favors preserving distances betweennearby points over far away points

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 16 / 26

Page 17: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE Algorithm

Use pij = pj|i+pi|j2n instead

Use qij = (1+|yi−yj |2)−1∑k 6=i(1+|yi−yk|2)−1 ,

the Student-t distributionStudent-t distribution isheavy-tailed. Allows for a smallprobability for far-away points,forcing them to move furtheraway in low-dim space

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 17 / 26

Page 18: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE: Barnes-Hut approximation

As formulated, O(n2) algorithm

Doesn’t work for really large datasetsWhat can we do to reduce the cost?Insight: Can we approximate roughly equally distance far awaypoints?

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 18 / 26

Page 19: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE: Barnes-Hut approximation

As formulated, O(n2) algorithmDoesn’t work for really large datasets

What can we do to reduce the cost?Insight: Can we approximate roughly equally distance far awaypoints?

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 18 / 26

Page 20: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE: Barnes-Hut approximation

As formulated, O(n2) algorithmDoesn’t work for really large datasetsWhat can we do to reduce the cost?

Insight: Can we approximate roughly equally distance far awaypoints?

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 18 / 26

Page 21: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE: Barnes-Hut approximation

As formulated, O(n2) algorithmDoesn’t work for really large datasetsWhat can we do to reduce the cost?Insight: Can we approximate roughly equally distance far awaypoints?

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 18 / 26

Page 22: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE with Barnes-Hut approximation

Barnes-Hut approximationBarnes-Hut is an approximation algorith used in Astronomy to simulaten-body problem. It uses a octree representation to model bodies in a 3-Dspace and recursively groups them in this octree. In 2D, we replace octreewith quadtree. Converts the n2 search into an n log n search.

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 19 / 26

Page 23: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Barnes-Hut Approximation

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 20 / 26

Page 24: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Quadtree representation

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 21 / 26

Page 25: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE examples: MNIST Digits

Figure: Source: Laurens van der Maaten

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 22 / 26

Page 26: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE examples: Netflix movies

Figure: Source: Laurens van der Maaten

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 23 / 26

Page 27: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE examples: Words

Figure: Source: Laurens van der Maaten

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 24 / 26

Page 28: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

t-SNE Multiple Map extension

Multiple word senses: e.g. (River, Bank, Bailout)In general, how do we deal with non-metric similarities?

Extend qij =∑

mπmi π

mj (1+|ymi −y

mj |

2)−1∑k

∑m′

∑l 6=k(1+|ym′

k−ym′

l|2)−1

Now, you get multiple maps. Each map models a different similaritybetween words

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 25 / 26

Page 29: Data Visualization · Data Visualization Giri Iyengar Cornell University gi43@cornell.edu March 26, 2018 Giri Iyengar (Cornell Tech) Visualization March 26, 2018 1 / 26

Giri Iyengar (Cornell Tech) Visualization March 26, 2018 26 / 26