ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and...

25
ENV 2006 3.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie

Transcript of ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and...

Page 1: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.1

Envisioning Information

Lecture 3 – Multivariate Data Exploration

Scatter plots and parallel coordinates

Ken Brodlie

Page 2: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.2

Data Tables

• Multivariate datasets can be expressed as a data table

– Each entry in table is an observation

– An observation consists of values of a set of variables, or variates

• Exercise– Create a data table from the

MSc class… A B C

1 .. .. ..

2 .. .. ..

variables

observations

Page 3: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.3

Scatter Plot

• For two variates, we have already met the scatter plot technique

• It is useful for showing what happens to one variable as another changes…

Page 4: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.4

Scatter Plot

• Visicube from Datamology is a useful free charting tool

• Here is an example scatter plot, visualizing the speed of the (receding) galaxy NGC7531 relative to the earth, measurements of speed being taken at different points on galaxy

• Circles represent measurements at 133o to horizon; pluses at 43o

• What can you observe?

http://www.datamology.com/sample-S2.shtml

Page 5: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.5

3D Scatter Plot

• Visicube has a tool specifically for 3D scatter plots

• Third variate expressed as a

vertical axis and widget lets you take slices at different heights

• Here we have same dataset but X and Y are positions, and Z axis is velocity … ie layered by velocity – here 3rd layer (1482 – 1519 km/sec)

• Observations less than 1500 km/sec highlighted in yellow (almost allowing 4D)

• Conclusion?

http://www.datamology.com/sample-S3.shtml

Page 6: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.6

3D Scatter Plots

• Here is an alternative approach, using 3D plotting…

• … does this work?

XRT/3d

http://www.ist.co.uk/XRT/xrt3d.html

Page 7: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.7

Extending to Higher Numbers of Variables

• Additional variables can be visualized by colour and shape coding

• IRIS Explorer ( a scientific visualization system!) used to visualize data from BMW

– Five variables displayed using spatial arrangement for three, colour and object type for others

– Notice the clusters…

• But there are clearly limits to how much this will scale

Kraus & Ertl, U Stuttgarthttp://wscg.zcu.cz/wscg2001/Papers_2001/R54.pdf

Page 8: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.8

Multivariate Visualization Techniques

• Software:– Xmdvtool

Matthew Ward Techniques designed for any number of variables

– Scatter plot matrices

– Parallel co-ordinates

– Glyph techniques

Acknowledgement:Many of images in followingslides taken from Ward’s work

http://davis.wpi.edu/~xmdv

Page 9: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.9

What are these?

Page 10: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.10

Multivariate Visualization

• Example of iris data set– 150 observations of 4

variables (length, width of petal and sepal)

– Check wikipedia for explanations of petals & sepals

– Techniques aim to display relationships between variables – the analytical task

Challenge in visualization is to design the visualization to match the analytical task

Page 11: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.11

Scatter Plot Matrices

Page 12: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.12

• For table data of M variables, we can look at pairs in 2D scatter plots

• The pairs can be juxtaposed:

A

B

C

CBA

With luck, you may spotcorrelations between pairsas linear structures… oryou may observe clusters

..

..

..

..

.

. . .

..

.

. . .

Scatter Plot Matrices

Page 13: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.13

Scatter Plot Matrix – Iris Data Set

Page 14: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.14

Scatter Plot Matrix – Car Data Set

Data represents7 aspects of cars:what relationshipscan we notice?

For example, what correlates with high MPG?

Page 15: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.15

A B C D E F

- create M equidistant vertical axes, each corresponding to a variable- each axis scaled to [min, max] range of the variable- each observation corresponds to a line drawn through point on each axis corresponding to value of the variable

Parallel Coordinates

Page 16: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.16

A B C D E F

- correlations may start to appear as the observations are plotted on the chart- here there appears to be negative correlation between values of A and B for example- this has been used for applications with thousands of data items

Parallel Coordinates

Page 17: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.17

Parallel Coordinates – Iris Data

Page 18: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.18

Parallel Coordinates Example

Detroit homicidedata7 variables13 observations

1961 -1973

Page 19: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.19

Parallel Coordinates

• Concept due to Alfred Inselberg

• Conceived the idea as a research student in 1959…

• … idea gradually refined over next 40 years

http://www.math.tau.ac.il/~aiisreal/

Page 20: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.20

Parallel Coordinates

• Parallel coordinates is a clever mechanism for transforming geometry from one space to another

• To get a handle on the idea, consider two variables X,Y

• In parallel coordinates, a point (X,Y) becomes… what?

• A line becomes… what?

• Why is the ordering of the axes important?

• Use this space to sketch the answers…

Page 21: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.21

The Screen Space Problem

• All techniques, sooner or later, run out of screen space

• Parallel co-ordinates– Usable for up to 150

variates– Unworkable greater than

250 variates

Remote sensing: 5 variates, 16,384 observations)

Page 22: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.22

Brushing as a Solution

• Brushing selects a restricted range of one or more variables

• Selection then highlighted

Page 23: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.23

Scatter Plot

Use of a‘brushing’ toolcan highlight subsets of data

..now we can seewhat correlateswith high MPG

Page 24: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.24

Parallel Coordinates

Brushing picksout the high MPGdata

Can you observethe same relationsas with scatterplots?

More or less easy?

Page 25: ENV 20063.1 Envisioning Information Lecture 3 – Multivariate Data Exploration Scatter plots and parallel coordinates Ken Brodlie.

ENV 2006 3.25

Parallel Coordinates

Here we highlighthigh MPG andnot 4 cylinders