Visualizing and Communicating High-dimensional Data
![Page 1: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/1.jpg)
Visualizing and Communicating High-dimensional Data
Dr. Stefan Kühn, Lead Data Scientist
Data Natives Berlin - 26.10.2016
![Page 2: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/2.jpg)
Data Visualization Basics
![Page 3: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/3.jpg)
The Modes of Perception
[Figure: two panels of scattered X and O symbols, labeled "Fast" and "Slow"]
Find the outlier
![Page 4: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/4.jpg)
The Modes of Perception
• Pre-attentive
  • fast
  • parallel processing
  • effortless
• Pattern recognition
  • semi-fast
  • governed by laws of Gestalt
• Attentive
  • slow
  • sequential
  • high effort (attention is a very limited resource)
![Page 5: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/5.jpg)
Main Properties of Graphics
Categories:
• Position
• Shape
• Size
• Color
• Orientation (Line)
• Length (Line)
• Type and Size (Line)
• Brightness
![Page 6: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/6.jpg)
Main Properties of Graphics (Humans)
| Category | Amount of pre-attentive information |
|---|---|
| Position | very high |
| Shape | - |
| Size | approx. 4 |
| Color | approx. 8 |
| Orientation (Line) | approx. 4 |
| Length (Line) | - |
| Type and Size (Line) | - |
| Brightness | approx. 8 |
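One way to use this table when choosing an encoding is to check how many levels a categorical variable has before mapping it to a channel. The Python sketch below hard-codes the approximate figures from the table; treating "very high" as unlimited and leaving out the rows without a figure are assumptions, and the names `PREATTENTIVE_CAPACITY` and `channels_for` are hypothetical.

```python
# Approximate number of pre-attentively distinguishable levels per channel,
# copied from the table. Assumption: "very high" is modelled as unlimited,
# and rows without a figure (shape, line length, line type/size) are omitted.
PREATTENTIVE_CAPACITY = {
    "position": float("inf"),
    "size": 4,
    "color": 8,
    "orientation": 4,
    "brightness": 8,
}


def channels_for(n_levels):
    """Return, sorted by name, the channels that can show n_levels distinct values."""
    return sorted(
        name
        for name, capacity in PREATTENTIVE_CAPACITY.items()
        if n_levels <= capacity
    )


print(channels_for(3))   # ['brightness', 'color', 'orientation', 'position', 'size']
print(channels_for(6))   # ['brightness', 'color', 'position']
print(channels_for(12))  # ['position']
```

A variable with a dozen categories leaves only position as a pre-attentive channel, which is one reason to reduce or group categories before encoding them in color.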
![Page 7: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/7.jpg)
Pre-attentive perception
• Position
  • fast
  • effective
  • high number of distinguishable positions
• Color
  • use with care
• Shape
• Orientation

Pre-attentive perception is effortless. Exploit this as much as you can.
![Page 8: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/8.jpg)
Pattern detection
"It is interesting to note that our brain […] subconsciously always prefers meaningful situations and objects."
• Emergence
• Reification
• Multi-stability
• Invariance

Pattern detection can be trained. Exploit this for visualizations you use frequently.
![Page 9: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/9.jpg)
What is this?
![Page 10: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/10.jpg)
What is this?
![Page 11: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/11.jpg)
What is this?
![Page 12: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/12.jpg)
Laws of Gestalt
"It is interesting to note that our brain, in accordance with the laws of Gestalt, subconsciously always prefers meaningful situations and objects."
![Page 13: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/13.jpg)
Accuracy of Graphics
Square Pie vs Stacked Bar vs Pie vs Donut
What do you think?
https://eagereyes.org/blog/2016/a-reanalysis-of-a-study-about-square-pie-charts-from-2009
![Page 14: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/14.jpg)
Why do we visualize data?
• Explore
  • use different techniques
  • avoid "construction" bias
  • be careful with "aesthetics"
  • challenge findings -> use attentive mode
• Explain
  • focus on message or "story"
  • use pre-attentive mode

Don’t trust graphics that you have not falsified yourself.
![Page 15: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/15.jpg)
Beyond the Basics
![Page 16: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/16.jpg)
Pair Plots
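A pair plot is a grid of axis-aligned 2-D projections: each panel keeps two of the original variables and drops the rest, which is why it stays interpretable in terms of those variables. The minimal Python sketch below enumerates the distinct panels and computes one projection without drawing anything; the helpers `pair_plot_panels` and `project` are hypothetical, and the three rows (with Iris-style column names) are illustrative values only.

```python
from itertools import combinations


def pair_plot_panels(columns):
    """List the distinct axis-aligned 2-D projections a pair plot shows.

    The upper and lower triangles of the grid show the same pairs with the
    axes swapped, so only the d*(d-1)/2 unordered pairs are distinct.
    """
    return list(combinations(columns, 2))


def project(rows, columns, x_col, y_col):
    """Axis-aligned projection: keep only two of the original coordinates."""
    i, j = columns.index(x_col), columns.index(y_col)
    return [(row[i], row[j]) for row in rows]


columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
rows = [(5.1, 3.5, 1.4, 0.2), (7.0, 3.2, 4.7, 1.4), (6.3, 3.3, 6.0, 2.5)]

panels = pair_plot_panels(columns)
print(len(panels))  # 6
print(project(rows, columns, "petal_length", "petal_width"))
# [(1.4, 0.2), (4.7, 1.4), (6.0, 2.5)]
```

The number of panels grows quadratically with the number of variables, which limits pair plots to moderate dimensions. Libraries such as seaborn (`seaborn.pairplot`) take care of the drawing.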
![Page 17: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/17.jpg)
Correlation Plots
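A correlation plot typically colors the matrix of pairwise correlation coefficients between variables. As an assumption, the sketch below uses Pearson's coefficient (rank-based measures such as Spearman's are also common) and computes the matrix in plain Python; the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(variables):
    """Matrix of pairwise correlations, i.e. the values a correlation plot colors."""
    return [[pearson(a, b) for b in variables] for a in variables]


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated with x
z = [4.0, 3.0, 2.0, 1.0]  # perfectly anti-correlated with x

for row in correlation_matrix([x, y, z]):
    print([round(v, 2) for v in row])
# [1.0, 1.0, -1.0]
# [1.0, 1.0, -1.0]
# [-1.0, -1.0, 1.0]
```

Like a pair plot, the matrix only captures pairwise (and, for Pearson, linear) relationships between the original variables.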
![Page 18: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/18.jpg)
Fundamental Problems
• No accurate method in higher dimensions
  • Approximation methods
  • "Simulated" dimensions (color, size, shape)
  • Animations?
• No notion of quality or accuracy for visualizations
  • Information theory?
  • "Stability"?

All visualizations are wrong, but some are useful.
![Page 19: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/19.jpg)
Approximation methods
• Pair Plots
  • Axis-aligned projections
  • Interpretable in terms of the original variables
• Singular Value Decomposition
  • Optimal low-rank approximation with respect to the spectral norm (the operator 2-norm, defined as a supremum over unit vectors) and the Frobenius (Euclidean) norm
  • Comes with an error estimate
• Other methods
  • Stochastic Neighbor Embedding ((t-)SNE)
  • "Manifold Learning"
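The SVD's error estimate can be made concrete. By the Eckart-Young-Mirsky theorem, the best rank-k approximation A_k of a matrix A satisfies ||A - A_k||_2 = sigma_(k+1) in the spectral norm, and ||A - A_k||_F equals the square root of the sum of the squared discarded singular values. The NumPy sketch below (hypothetical helper `svd_projection`, random test data) projects centered data to 2-D and checks both identities numerically.

```python
import numpy as np


def svd_projection(X, k=2):
    """Project centered data onto its top-k right singular vectors.

    Returns the centered data, the k-D coordinates used for plotting, the
    best rank-k approximation, and its exact spectral and Frobenius errors.
    """
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = U[:, :k] * s[:k]                    # same as Xc @ Vt[:k].T
    X_k = coords @ Vt[:k]                        # best rank-k approximation
    spectral_err = s[k] if k < s.size else 0.0   # ||Xc - X_k||_2 = sigma_(k+1)
    frobenius_err = np.sqrt(np.sum(s[k:] ** 2))  # ||Xc - X_k||_F
    return Xc, coords, X_k, spectral_err, frobenius_err


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc, coords, X_k, err_2, err_F = svd_projection(X, k=2)

print(coords.shape)                                        # (200, 2)
print(np.isclose(np.linalg.norm(Xc - X_k, 2), err_2))      # True
print(np.isclose(np.linalg.norm(Xc - X_k, "fro"), err_F))  # True
```

The errors are known before anything is plotted: if sigma_3 is large compared to sigma_1 and sigma_2, the 2-D picture hides a lot of the structure.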
![Page 20: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/20.jpg)
Manifold Learning Methods
• Locally Linear Embedding
  • Neighborhood-preserving embedding
• Isomap
  • quasi-isometric
• Multi-dimensional scaling
  • quasi-isometric
• Spectral Embedding
  • Spectral clustering based on similarity
• Stochastic Neighbor Embedding (SNE, t-SNE)
  • preserves conditional probabilities for similarity
• Local Tangent Space Alignment (LTSA)
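To illustrate what "quasi-isometric" means for multi-dimensional scaling, here is a NumPy sketch of classical (Torgerson) MDS. This is only one MDS variant; scikit-learn's `sklearn.manifold.MDS`, for example, minimizes a stress function with the SMACOF algorithm instead. The helper name and test data are assumptions. When the input distances come from points that really lie in a 2-D plane, the 2-D embedding reproduces them exactly.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS from a matrix of pairwise Euclidean distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:k]           # keep the k largest
    L = np.sqrt(np.clip(evals[idx], 0, None))   # guard against tiny negatives
    return evecs[:, idx] * L


def pairwise_distances(P):
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)


rng = np.random.default_rng(1)
P = rng.normal(size=(30, 2))                         # points that are really 2-D ...
R = np.linalg.qr(rng.normal(size=(5, 5)))[0]         # ... and a random 5x5 rotation
Q = np.hstack([P, np.zeros((30, 3))]) @ R            # the same points embedded in 5-D

D = pairwise_distances(Q)
Y = classical_mds(D, k=2)
print(np.allclose(D, pairwise_distances(Y)))  # True: distances are preserved
```

For curved manifolds, scikit-learn's `sklearn.manifold` module also provides `Isomap`, `LocallyLinearEmbedding` (with `method="ltsa"` for LTSA), `SpectralEmbedding` and `TSNE`.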
![Page 21: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/21.jpg)
Manifold Learning
![Page 22: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/22.jpg)
Manifold Learning
![Page 23: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/23.jpg)
Manifold Learning
![Page 24: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/24.jpg)
Local Tangent Space Alignment
![Page 25: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/25.jpg)
Principal Components and Curves
• Principal Component Analysis
  • orthogonal decomposition based on SVD
  • linear in all variables
  • tries to preserve variance
• Principal Curves
  • minimize the sum of squared errors with respect to all variables (as in PCA, variance is preserved)
  • nonlinear
  • smooth
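"Tries to preserve variance" has a precise meaning for PCA: the first principal component is the direction of maximal variance, and the share of total variance each component keeps is its squared singular value divided by the sum of all squared singular values. The NumPy sketch below (hypothetical helper `pca`, synthetic data that is almost one-dimensional) shows this; principal curves are not covered.

```python
import numpy as np


def pca(X, k=2):
    """PCA via the SVD of the centered data matrix.

    Returns the k-D scores, the k principal directions (orthonormal rows),
    and the fraction of total variance each of those directions keeps.
    """
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, components, explained[:k]


rng = np.random.default_rng(2)
t = rng.normal(size=500)
# Synthetic 3-D data that is almost one-dimensional (a noisy line).
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=500), 0.1 * rng.normal(size=500)])

scores, components, ratio = pca(X, k=2)
print(scores.shape)  # (500, 2)
print(ratio)         # first entry close to 1: one direction keeps almost all variance
```

Because PCA is linear, the same ratio would drop sharply for data lying on a curved line, which is the case principal curves are meant to handle.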
![Page 26: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/26.jpg)
Principal Components and Curves
![Page 27: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/27.jpg)
Parallel Coordinates
• Parallel Coordinates
  • especially useful for high-dimensional data
  • depends on the ordering and scaling of the axes
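Why the plot depends on scaling and ordering is easiest to see in the data preparation step. The plain-Python sketch below rescales every variable to [0, 1] and turns one record into polyline vertices for a chosen axis order; min-max scaling is one common choice rather than the only one, and the helper names are hypothetical.

```python
def minmax_scale(rows):
    """Rescale every column to [0, 1] so that all axes share one vertical range.

    Constant columns are mapped to 0.5 to avoid division by zero.
    """
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.5 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def polyline(row, order):
    """Vertices of one record's polyline: (axis position, value) per variable."""
    return [(pos, row[i]) for pos, i in enumerate(order)]


rows = [[1.0, 200.0, 0.3], [2.0, 100.0, 0.9], [3.0, 300.0, 0.6]]
scaled = minmax_scale(rows)
print(scaled[0])                      # [0.0, 0.5, 0.0]
print(polyline(scaled[0], [2, 0, 1]))  # [(0, 0.0), (1, 0.0), (2, 0.5)]
```

Changing `order` changes which variables sit next to each other, and only adjacent axes show their relationship directly. pandas offers `pandas.plotting.parallel_coordinates` for the drawing.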
![Page 28: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/28.jpg)
The Grand Tour
• Animated sequence of 2-D projections
  • https://en.wikipedia.org/wiki/Grand_Tour_(data_visualisation)
  • Asimov (1985): The grand tour: a tool for viewing multidimensional data.
• Underlying idea
  • Randomly generate 2-D projections (random walk)
  • Over time, generate a dense subset of all possible 2-D projections
  • Optional: Follow a given path / guided tour
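A minimal NumPy sketch of the random-walk idea follows, under a stated simplification: instead of Asimov's interpolation between randomly chosen target planes, each step perturbs the current 2-D projection plane with small Gaussian noise and re-orthonormalizes it. The helper names and the `step` size are assumptions. Each yielded array is one animation frame.

```python
import numpy as np


def orthonormalize(M):
    """Orthonormal basis of the column space of M (a p x 2 projection frame).

    Column signs are fixed so that the diagonal of R is positive; without
    this, QR may flip a column between steps and the animation would jump.
    """
    q, r = np.linalg.qr(M)
    return q * np.sign(np.diag(r))


def grand_tour(X, n_steps=100, step=0.05, seed=0):
    """Yield 2-D views of X along a random walk over projection planes."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                               # center the data
    F = orthonormalize(rng.normal(size=(X.shape[1], 2)))  # random start plane
    for _ in range(n_steps):
        yield Xc @ F                                      # n x 2 coordinates, one frame
        F = orthonormalize(F + step * rng.normal(size=F.shape))


X = np.random.default_rng(3).normal(size=(150, 6))
views = list(grand_tour(X, n_steps=20))
print(len(views), views[0].shape)  # 20 (150, 2)
```

How densely a walk like this covers the space of 2-D projections in practice depends on `step` and `n_steps`; a guided tour would replace the random perturbation with steps toward chosen target planes.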
![Page 29: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/29.jpg)
The Grand Tour
![Page 30: Visualizing and Communicating High-dimensional Data](https://reader031.fdocuments.us/reader031/viewer/2022030309/58f1dc491a28ab0a728b45a5/html5/thumbnails/30.jpg)
The Grand Tour