of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel...

Post on 10-Aug-2020

5 views 2 download

Transcript of of multivariate data (part 2) ETC3250/5250: Visualisation · Parallel coordinate plots Parallel...

ETC3250/5250: Visualisationof multivariate data (part 2)Semester 1, 2020

Professor Di Cook

Econometrics and Business Statistics Monash University

Week 5 (b)

Parallel coordinate plots

Parallel coordinate plots show the data using parallel axes

Scatterplots use orthogonal axes, and are thus limited to twovariables on the page. Turning the axes parallel allows for many more variables to bedisplayed together. Lines connecting the points show associations between variables.

2 / 19

Comparison with tours

Compare the tour of the �ea data, with three clusters:

tars1

tars2

headaede1aede2aede3

1 , 00 , 10 , 00 , 00 , 00 , 0

~indx: 11

11 20 29 38 47 56 65 74 83 92 101 110 119 128 137 146 155

Play

3 / 19

Comparison with tours

With the parallel coordinate plot:

4 / 19

How to read parallel coordinate plots

A set of points in -dimensional space map to a set of lines in parallel axes The pattern among and between the lines indicate structure in high-dimensions

Groups of lines trending together indicate clustering Single lines trending differently to other indicate outliersBetween pairs of axes intersecting lines indicate strong negativeassociation, and parallel lines indicate strong positive association

Points in Euclidean space lines in parallel coordinates

p p

5 / 19

Parallel coordinate plots - controls

Details that need to be controlled:

Order of axes can affect perception of structure. Placing axes nextto each other emphasizes that association Variables need to be on a similar scale, and may need to bestandardised

6 / 19

Parallel coordinate plots- controls

Unordered, default standardisedscale.

7 / 19

Parallel coordinate plots- controls

Ordered, default standardised scale.

8 / 19

Parallel coordinate plots- controls

Ordered, centered standardisedscale.

9 / 19

Parallel coordinate plots- controls

Ordered, scaled univariately to 0-1.

10 / 19

Parallel coordinate plots- controls

Ordered, scaled globally to 0-1.

11 / 19

Andrews curves

For the variables make a fourier transform

Preceded the tour, but like a tour laid out on a page, butalgorithm is not space-�lling.

p (x1, . . . ,xp)fx(t) = x1/√2+x2 sin(t)+x3 cos(t)+x4 sin(2t)+x5 cos(2t)+. . .

(d = 1)

12 / 19

Categorical variables

Hammock plots are a variation of parallel coordinate plots forcategorical variables. They show the �ow of groups between stackedbarcharts, providing information about the association betweenmultiple categorical variables.

13 / 19

Categorical variables

Mosaic plots partition the axes sequentially on the categoricalvariables.

For this data, titanic, there is a response variable, "Survived" andgood practice would have this variable mapped to �ll colour, becausewe are interested in the proportion change in response across thepredictor categories.

14 / 19

Categorical variables

Mosaic plots can handle multiple categorical variables.

15 / 19

Data in the model space

(Chapter3/3.9.pdf)

Model in the data space

16 / 19

Inference

Its hard to read residual plots. How do you know if the residual plot"really" has no structure?

17 / 19

Boostrapping

Bootstrapped residuals. If there is still structure in the residual plot, the "true" residualplot should be identi�able from the plots of bootstrapped residuals.

18 / 19

� Made by a human with a computerSlides at https://iml.numbat.space.

Code and data at https://github.com/numbats/iml.

Reading: Wickham, Cook, Hofmann (2015) Visualizing statisticalmodels: Removing the blindfold, Statistical Analysis and Data Mining,8(4):203-225.

Created using R Markdown with �air by xaringan, andkunoichi (female ninja) style.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

19 / 19