Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and...

40
Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan

Transcript of Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and...

Page 1: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Visualizing and ExploringData

Based on Chapter 3 of Hand, Manilla, & Smyth

David Madigan

Page 2: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Introduction

•Exploratory Data Analysis legitimized by Tukey (1997)

• ~ Data based hypothesis generation

•Always need to be skeptical about findings since thesearch space can be very large

•Useful tools: S-Plus, Ggobi, DataDesk, JMP

•http://otal.umd.edu/Olive/ (On-line Library of Information Visualization Environments)

Page 3: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Displaying Single Variables

credit card example

Page 4: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Pima indian example

Page 5: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Smoothing Estimates

• Kernel estimates smooth out the contribution of eachdatapoint over a local neighborhood of that point.

!=

"=

n

i

nhh

ixxKxf

1

1 ))(

()(ˆ

h is the kernel width

• Gaussian kernel is common:2

)(

2

1!"

#$%

& ''

h

ixx

Ce

• Formal procedures for optimal bandwidth choice

• Gray & Moore’s work on speeding this up…

Page 6: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

ATE3hour

Page 7: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Displaying Two Variables

Page 8: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 9: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 10: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 11: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Correlation = 0.5

Page 12: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 13: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 14: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 15: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 16: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Tinting

• Experiment to model the effects of car window tinting onvisual performance

• csoa: critical stimulus onset asynchrony (time to recognizean alphanumeric target)

• it: inspection time (time required for a simplediscrimination task)

• age, tint (no,lo,hi), target (locon,hicon), sex

Page 17: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

xyplot(csoa~it | sex*agegp, data=tinting, groups=target, auto.key=list(columns=2))

Page 18: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

xyplot(csoa~it | sex*agegp, data=tinting, groups=tint, auto.key=list(columns=3))

Page 19: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

xyplot(csoa~it | sex*agegp, data=tinting, groups=tint, auto.key=list(columns=3), type=c("p","smooth"),span=0.8)

Page 20: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Source: Michael Friendly

Page 21: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Half-space location depth of z in R2 relative to z1,…,zn is thesmallest number of zi contained in any closed half-plane withboundary line through z

Bagplot

Rousseeuw,Ruts, andTukey

Page 22: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 23: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Four-fold display for categorical data

Page 24: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Four-fold display for categorical data

Page 25: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Mosaic plots for categorical data

Page 26: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 27: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Visual ScalabilityEick and Karr

•Human perception: 6.5 million pixels?

•Monitor resolution: 640X480=307,300; 1600X1200=1,920,000

•Visual metaphors:

•Bar charts: can display 500; realistic limit about 50; color

•Matrix views: 1280X1024 can display 13,000 10X10 entities

•Landscapes: 3-D matrix view; color, height, and shape; occlusion?

•Network views: scalability depends on connectivity

•Scatterplots: 100,000 points?

•Histograms: smoothing calculations become expensive

•Interactivity

Page 28: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Dimensionality Reduction•Scatterplot = 2-D projection defined by 2 variables(e.g. x1 vs. x4)

•Other projections? e.g. 2x1+3x2 vs. 6x1 + 2x4

•Projection pursuit…issues with scalability

•PCA: scales quite well; used in text retrieval

•MDS: metric versus non-metric

•Random projections

Page 29: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Tufte:

Graphical excellence is the well-designed presentation ofinteresting data - a matter of substance, of statistics, and ofdesign.

Graphical excellence consists of complex ideascommunicated with clarity, precision, and efficiency.

Graphical excellence is that which gives the viewer thegreatest number of ideas in the shortest time with the leastink in the smallest space.

Graphical excellence is nearly always multivariate.

And graphical excellence requires telling the truth about thedata.

Page 30: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Tufte also insists that graphical displays should:

induce the viewer to think about the substance rather thanabout methodology, graphic design, the technology ofgraphic production or something else

reveal the data at several levels of detail, from a broadoverview to the fine structure

Page 31: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 32: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 33: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

In the following example, from The Times of Saturday 1/2/3 is a superb example of this form of abuse. The the two shells supposedly represent twoquantities in the ratio 500 to 364, so the first should be 500/364 or 1.374 times bigger than the second, representing a 37.4% increase. But their lengths are inthe ratio 102mm to 65mm, making the first 1.569 times longer than the second, and giving it a volume greater than that of the second by a factor of 1.569cubed, or 3.864. This gives a shocking lie factor of 3.864/1.374 or 2.8 times!

Page 34: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 35: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

Tufte’s worst graphic ever!

Page 36: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 37: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 38: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction
Page 39: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction

http://www.ted.com/talks/view/id/92

Page 40: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction