Visualizations and statistics - MIT OpenCourseWare · 2020. 12. 31. · • Years of Potential Life...

How I learned to stop visualizing and love statistics

You have a hunch


Visualizations: sanity check

Statistics: quantify the hunch

(Visualizations: storytelling)


Someone says: “Obama got more small campaign contributions than McCain”


???


© Jhguch on Wikipedia. All rights reserved. This content is excluded from our Creative Commons license. For more information, see http://ocw.mit.edu/fairuse.


Median


Quartiles: 25% and 75%


Interquartile Range
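The median, the 25% and 75% marks (first and third quartiles), and the interquartile range between them can be computed directly. A minimal Python sketch, assuming the "inclusive" quartile convention of the standard library's statistics.quantiles (other conventions shift Q1 and Q3 slightly); five_number_summary and the donation values are hypothetical:

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) and the interquartile range.

    Quartiles use statistics.quantiles with method="inclusive"; other
    quartile conventions can give slightly different Q1/Q3 values.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # width of the middle 50% of the data (the "box")
    return (data[0], q1, median, q3, data[-1]), iqr

# Hypothetical donation amounts in dollars (illustrative only, not real data)
donations = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary, iqr = five_number_summary(donations)
print(summary, iqr)
```

For these values the box spans 3.0 to 7.0 around a median of 5.0, so the interquartile range is 4.0.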


Whiskers / Extremes


Outliers


Box-and-Whiskers Plot
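Where the whiskers stop, and which points count as outliers, is a convention the slides leave open. A common choice (assumed here, not taken from the slides) is Tukey's rule: points more than 1.5 × IQR beyond the quartiles are outliers, and each whisker ends at the most extreme point still inside that fence. Some plots instead draw the whiskers all the way to the extremes. whiskers_and_outliers and the amounts are hypothetical:

```python
import statistics

def whiskers_and_outliers(values, k=1.5):
    """Split values into whisker ends and outliers using the k * IQR rule.

    k=1.5 is Tukey's common convention (an assumption here; some plots
    instead draw whiskers all the way to the minimum and maximum).
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    # Whiskers end at the most extreme data points still inside the fences
    return (inside[0], inside[-1]), outliers

# Hypothetical small-donation amounts with one very large gift (illustrative only)
amounts = [1, 2, 3, 4, 5, 6, 7, 8, 100]
whiskers, outliers = whiskers_and_outliers(amounts)
print(whiskers, outliers)
```

With one $100 gift among small amounts, the whiskers end at 1 and 8 and the 100 is drawn as a lone outlier point.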


???


Are they actually different?

T-Test

[Figure: Obama vs. McCain, shown in two panels]

Assume Reality

[Figure: Obama vs. McCain, shown in two panels]

How likely is [pictured observation] given [pictured reality]? (asked for three cases)

[Figure: Obama and McCain distributions, with averages avg1 and avg2]

[Figure: Obama and McCain distributions; the gap between avg1 and avg2 is marked]

Effect Size
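The effect size in the picture is the gap between avg1 and avg2. The sketch below reports that raw gap and, as an added assumption not on the slide, Cohen's d, which divides the gap by the pooled standard deviation so the size is measured relative to the spread. effect_size and the sample values are hypothetical:

```python
import statistics

def effect_size(group1, group2):
    """Raw effect size (difference in means) and Cohen's d.

    Cohen's d divides the difference by the pooled standard deviation; it is
    one common standardized effect size, included here as an illustration.
    """
    avg1, avg2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    n1, n2 = len(group1), len(group2)
    # Pooled standard deviation: variances weighted by degrees of freedom
    pooled_sd = (((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)) ** 0.5
    diff = avg1 - avg2
    return diff, diff / pooled_sd

# Hypothetical contribution amounts for two groups (illustrative only)
group_a = [10, 12, 14]
group_b = [6, 8, 10]
diff, d = effect_size(group_a, group_b)
print(diff, d)
```

Here the averages differ by 4 and each group has a standard deviation of 2, so d is 2.0: the gap is two standard deviations wide.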

[Figure: Obama and McCain distributions with averages avg1, avg2 and spreads variance 1, variance 2]

How likely is [pictured observation] given [pictured reality]? (asked for three cases)

[Figure: pairs of distributions with averages avg1, avg2 and variances variance 1, variance 2]

If Obama and McCain were actually equal, how likely is a difference this large, given the averages and variances?

Probability p

p is low: Obama and McCain are different (significant)

p is high: don’t trust the difference (not significant)
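The slides do not say which t-test variant is meant. A minimal sketch, assuming Welch's t-test (which does not require equal variances) and a normal approximation for the p-value. The approximation is close when each group has many observations, as with the >1M contributions per campaign; for small samples an exact t distribution (for example scipy.stats.ttest_ind with equal_var=False) is the safer choice. welch_t_test and the data are hypothetical:

```python
import math
import statistics

def welch_t_test(group1, group2):
    """Two-sided Welch's t-test (unequal variances) on two samples.

    The p-value uses a normal approximation to the t distribution, which is
    only accurate for large samples; small samples need the exact t distribution.
    """
    n1, n2 = len(group1), len(group2)
    avg1, avg2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    se = math.sqrt(var1 / n1 + var2 / n2)  # standard error of the difference
    t = (avg1 - avg2) / se
    # Two-sided p: chance of a |t| at least this large if the true means were equal
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Hypothetical contribution amounts, 50 per group (illustrative only)
group_a = [12, 14, 11, 13, 15] * 10
group_b = [10, 11, 9, 12, 10] * 10
t, p = welch_t_test(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.2g}")
```

Comparing a group with itself gives t = 0 and p = 1, the "no evidence of a difference" extreme.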


Significance is binary

• Pick a threshold: .01? .05?

• Is p > threshold, or < threshold?

p < .05? significant

p > .05? don’t trust the difference

T-Test Significance depends on: averages (avg1, avg2) + variances (variance 1, variance 2) + # samples (Obama: >1M; McCain: >1M)
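To see how the number of samples drives significance, the sketch below holds the averages and variances fixed, changes only the sample size, and then applies a binary p < .05 threshold. It works from summary statistics using the Welch standard error and a normal approximation to the t distribution; welch_p_from_summary, is_significant, and all numbers are hypothetical:

```python
import math

def welch_p_from_summary(avg1, var1, n1, avg2, var2, n2):
    """Two-sided p-value for a difference in means from summary statistics.

    Uses the Welch standard error and a normal approximation, which is
    reasonable for large samples.
    """
    se = math.sqrt(var1 / n1 + var2 / n2)
    t = (avg1 - avg2) / se
    return math.erfc(abs(t) / math.sqrt(2))

def is_significant(p, threshold=0.05):
    """Significance is binary: compare p against a threshold picked in advance."""
    return p < threshold

# Same hypothetical averages and variances; only the sample size changes
small = welch_p_from_summary(50.0, 400.0, 100, 48.0, 400.0, 100)
large = welch_p_from_summary(50.0, 400.0, 1_000_000, 48.0, 400.0, 1_000_000)
print(small, is_significant(small))
print(large, is_significant(large))
```

With 100 samples per group the 2-point gap gives p ≈ 0.48 (not significant); with a million per group the same gap is overwhelmingly significant.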


Correlation, Linear Regression


County Health Rankings

• Every county in the USA

• Years of Potential Life Lost (YPLL): premature mortality (early death)

– less is good

– more is bad

• Median income, % population w/ diabetes,

% population under 18, …
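YPLL adds up, for each death, how many years short of a reference age it occurred. A minimal sketch, assuming a reference age of 75 and a rate per 100,000 residents so counties of different sizes are comparable (both are assumptions here; the slide does not state them); the function names and ages are hypothetical:

```python
def years_of_potential_life_lost(ages_at_death, reference_age=75):
    """Sum of years each death fell short of a reference age.

    reference_age=75 is an assumed cutoff; deaths at or above it add zero.
    """
    return sum(max(0, reference_age - age) for age in ages_at_death)

def ypll_rate(ages_at_death, population, per=100_000, reference_age=75):
    """YPLL per `per` residents, so counties of different sizes can be compared."""
    years = years_of_potential_life_lost(ages_at_death, reference_age)
    return years * per / population

# Hypothetical deaths in a hypothetical county of 10,000 people
ages = [30, 60, 80, 74]
print(years_of_potential_life_lost(ages), ypll_rate(ages, population=10_000))
```

The four deaths contribute 45 + 15 + 0 + 1 = 61 years, or 610 years per 100,000 residents.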


What is correlated with early death in a community?

• Burgers
• Sleep
• Education
• Exercise
• # Rappers
• Your theory here


Line coefficients: y = mx + b

Correlation amount: R² (0 to 1)

Significance: p < .05?
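The line coefficients and the correlation amount can be computed by hand for a single predictor. A minimal sketch of ordinary least squares, assuming one x variable (for example median income) and one y variable (YPLL); fit_line and the numbers are hypothetical. The slope's p-value needs the t distribution and is left out here; scipy.stats.linregress, for instance, reports slope, intercept, rvalue, and pvalue together.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b, plus R^2.

    Returns (m, b, r_squared). A significance test on the slope is not
    computed here.
    """
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    m = sxy / sxx                        # slope
    b = mean_y - m * mean_x              # intercept: line passes through the means
    r_squared = sxy ** 2 / (sxx * syy)   # square of Pearson's r, between 0 and 1
    return m, b, r_squared

# Made-up, perfectly linear data so the result is easy to check:
# median income in $1000s vs. YPLL per 100,000
income = [30, 40, 50, 60, 70]
ypll = [10000, 9000, 8000, 7000, 6000]
m, b, r_squared = fit_line(income, ypll)
print(m, b, r_squared)  # prints -100.0 13000.0 1.0
```

Real county data would scatter around the line, giving an R² below 1.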


Correlation != Causation

Correlation → Causal Hunch → Randomized Trial → T-Test!
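The randomized-trial step is what lets a t-test speak to causation: chance alone decides who gets the treatment, so other factors are balanced between the groups in expectation, and a later t-test on the outcomes compares like with like. A minimal sketch of the random assignment step; randomly_assign, the seed, and the county names are hypothetical:

```python
import random

def randomly_assign(units, seed=None):
    """Shuffle units and split them into treatment and control halves.

    Because chance alone decides who is treated, other factors are balanced
    between the groups in expectation.
    """
    rng = random.Random(seed)  # a seeded generator makes the split reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical study units
counties = [f"county_{i}" for i in range(10)]
treatment, control = randomly_assign(counties, seed=42)
print(treatment)
print(control)
```

After the trial, the outcomes of the two groups would be compared with a t-test.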


MIT OpenCourseWare
http://ocw.mit.edu

Resource: How to Process, Analyze and Visualize Data
Adam Marcus and Eugene Wu

The following may not correspond to a particular course on MIT OpenCourseWare, but has been provided by the author as an individual learning resource.

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.