Stats 60 - Session 2: Visualizing data · 2020. 9. 14. · Stats 60/Psych 10 Ismael Lemhadri Summer...
Session 2: Visualizing data
Stats 60/Psych 10 · Ismael Lemhadri · Summer 2020
This time
• Visualizing data
• How to spot bad graphs
• How to create good graphs
How better data visualization could have saved 7 lives
January 28, 1986
What happened?
Tufte, 1997
https://www.slideshare.net/catalyst00/truth-lies-and-orings-inside-the-space-shuttle-challenger-disaster
http://www.aerospaceweb.org/question/investigations/q0122.shtml
What does this have to do with data visualization?
• Temperatures were forecast to be very cold on Jan 28
• Engineers from the rocket contractor Morton Thiokol presented 13 charts in an attempt to convince NASA to postpone the launch due to concerns about the O-rings failing at low temperature
• They failed
Ineffective presentation of data
A more effective summary of the data
Tufte, 1997
An even more effective visualization of the data
adapted from Tufte, 1997
What are the two important takeaway messages?
Figure annotation: 26-29 °F, the range of forecast temperatures for the launch of Challenger on Jan 28
It’s very easy to find bad graphs
https://flowingdata.com/2013/07/15/open-thread-what-is-wrong-with-these-charts/
http://viz.wtf/
Principles of good visualizations
1. Show the data and make them stand out
   • Avoid clutter and chartjunk
2. Avoid distorting the data
   • Use proper scales
3. Keep human limitations in mind
4. Reveal the underlying message of the data
   • Make captions and labels clear and informative
Show us the data!
The “Datasaurus Dozen”
https://www.autodeskresearch.com/publications/samestats
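The same lesson can be checked numerically. The Datasaurus data are not bundled with base R, so this sketch swaps in Anscombe's quartet (the built-in `anscombe` data frame), an older example of the same phenomenon: four x-y datasets with nearly identical means, variances and correlations, but very different scatterplots. The helper name `quartet_stats` is illustrative only.

```r
# `anscombe` ships with R (datasets package): columns x1..x4 and y1..y4
quartet_stats <- function(df = anscombe) {
  do.call(rbind, lapply(1:4, function(i) {
    x <- df[[paste0("x", i)]]
    y <- df[[paste0("y", i)]]
    data.frame(set = i, mean_x = mean(x), mean_y = mean(y),
               var_x = var(x), var_y = var(y), cor_xy = cor(x, y))
  }))
}

stats <- quartet_stats()
print(round(stats, 2))   # the four rows are nearly identical

# Plot the four sets: same summary statistics, very different shapes
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), pch = 19,
       main = paste("Anscombe set", i))
}
par(op)
```

Only the plots reveal that one set is curved, one is a line with a single outlier, and one is driven by a single extreme x value.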
Not a very good graph
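```r
library(dplyr); library(ggplot2)
# Assumes a data frame NHANES_adult with Gender and Height columns is already loaded
dfmean <- NHANES_adult %>% group_by(Gender) %>% summarise(Height = mean(Height, na.rm = TRUE))
ggplot(dfmean, aes(x = Gender, y = Height)) + geom_bar(stat = "identity")
```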
Much better: Box plot
```r
library(ggplot2)
ggplot(NHANES_adult, aes(x = Gender, y = Height)) + geom_boxplot()
```
Figure annotations: median; first quartile; third quartile; IQR (the span between the two quartiles); “outliers” (≥ 1.5 IQR outside a quartile)
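The box-plot annotations can be computed directly. A minimal base-R sketch, assuming R's default quantile definition (type 7) and the 1.5 × IQR fence rule; ggplot2's geom_boxplot() also flags points beyond 1.5 × IQR by default, but its quartile computation may differ slightly from this sketch. The function name and the heights are hypothetical.

```r
# Quartiles, IQR and 1.5 x IQR "outliers" for a numeric vector
boxplot_summary <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)  # type 7 (R default)
  iqr <- q[3] - q[1]
  lower_fence <- q[1] - 1.5 * iqr
  upper_fence <- q[3] + 1.5 * iqr
  list(q1 = q[1], median = q[2], q3 = q[3], iqr = iqr,
       outliers = x[x < lower_fence | x > upper_fence])
}

heights <- c(160, 162, 165, 167, 170, 172, 175, 210)  # hypothetical heights (cm)
s <- boxplot_summary(heights)
str(s)   # q1 164.25, median 168.5, q3 172.75; 210 is flagged as an outlier
```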
Also great: Violin plot
```r
library(ggplot2)
ggplot(NHANES_adult, aes(x = Gender, y = Height)) + geom_violin()
```
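Each violin is a kernel density estimate of the group's values, mirrored around a vertical axis. A minimal base-R sketch of that density, using simulated (hypothetical) data drawn from two subgroups rather than NHANES, shows what the violin reveals and a bar of means hides:

```r
set.seed(1)
# Hypothetical data: two subgroups centred at 160 and 180 (not NHANES values)
x <- c(rnorm(500, mean = 160, sd = 5), rnorm(500, mean = 180, sd = 5))

d <- density(x)   # Gaussian kernel; bandwidth from bw.nrd0() by default

# Local maxima of the estimate, ignoring tiny bumps in the tails
idx   <- which(diff(sign(diff(d$y))) == -2) + 1
peaks <- d$x[idx[d$y[idx] > 0.1 * max(d$y)]]

mean(x)        # a single number near 170: all a bar of means would show
round(peaks)   # peaks near 160 and 180: the shape a violin makes visible
plot(d, main = "Kernel density estimate (one half of a violin)")
```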
Maximize the data-ink ratio
$$\text{Data-ink ratio} = \frac{\text{Amount of ink used on data}}{\text{Total amount of ink}}$$
Maximizing the data-ink ratio
Avoid “chartjunk”
• Extraneous visual elements
Figure: “Chart Junk?” (Nigel Holmes style graphics)
http://classes.engr.oregonstate.edu/eecs/spring2015/cs419-001/Slides/tufteDesign.pdf
http://junkcharts.typepad.com/junk_charts/2014/10/index.html
Rule #1 for avoiding bad visualizations: Don’t use Microsoft Office to generate them
Avoiding chartjunk
• Avoid textures and images in plots
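Figure: bar chart with the default labels “Chart Title” and “Series1” left in place and a 0-50 y axis, plotting these values:

| Affiliation | Percent |
|---|---|
| Protestant | 46.6 |
| None | 22.8 |
| Catholic | 20.8 |
| Jewish | 1.9 |
| Other | 1.8 |
| Mormon | 1.6 |
| Other Christian | 1.6 |
| Muslim | 0.9 |
| Buddhist | 0.7 |
| Don't know | 0.6 |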
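As a sketch of the alternative, the same values from the chart (treated here as percentages) can be drawn in base R as plain, sorted, horizontal bars with a single flat fill, no textures or images, and no leftover default labels. The object name `religion` is illustrative.

```r
religion <- data.frame(
  group = c("Protestant", "None", "Catholic", "Jewish", "Other", "Mormon",
            "Other Christian", "Muslim", "Buddhist", "Don't know"),
  percent = c(46.6, 22.8, 20.8, 1.9, 1.8, 1.6, 1.6, 0.9, 0.7, 0.6),
  stringsAsFactors = FALSE
)
# Ascending order: horizontal barplot draws the first row at the bottom,
# so the largest group ends up on top
religion <- religion[order(religion$percent), ]

op <- par(mar = c(4, 9, 1, 1))   # wide left margin for the category labels
barplot(religion$percent, names.arg = religion$group, horiz = TRUE,
        las = 1, col = "grey40", border = NA, xlab = "Percent")
par(op)
```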
Avoid distorting the data
• Use appropriate scales for the Y axis
• Beware of effects that distort the data
Violent crime was flat from 1990-2014
Wait… Violent crime has plummeted since 1990!
Should you always include zero in the y axis?
Using zero as the basis often makes no sense
It’s ok not to start your Y axis at zero
“In general, in a time-series, use a baseline that shows the data not the zero point; don’t spend a lot of empty vertical space trying to reach down to the zero point at the cost of hiding what is going on in the data line itself.” Edward Tufte
https://qz.com/418083/its-ok-not-to-start-your-y-axis-at-zero/
The “Lie Factor”
• Tufte, 1983
• The size of the effect on the physical graphic, relative to the size of the effect in the data
• A lie factor of about 1 is good
The Lie Factor
• Change in fuel economy from 1978 to 1985 = 53% (0.53)
• Change in the graphic: from 0.6” to 5.3”
• (5.3 - 0.6)/0.6 ≈ 7.83 = 783%
• Lie factor = 7.83/0.53 ≈ 14.8, almost 15 times reality
Tufte, 1983/R. Smith
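The lie-factor arithmetic above as a small R sketch. The function names are illustrative, not from any package; the only inputs are the slide's 0.6” and 5.3” graphic sizes and the 53% change in the data.

```r
# Relative change between a start value and an end value
rel_change <- function(from, to) (to - from) / from

# Lie factor: size of the effect in the graphic / size of the effect in the data
lie_factor <- function(graphic_change, data_change) graphic_change / data_change

g  <- rel_change(0.6, 5.3)   # graphic grows from 0.6" to 5.3"
lf <- lie_factor(g, 0.53)    # data change of 53%, as on the slide
round(c(graphic = g, lie_factor = lf), 2)   # graphic 7.83, lie factor 14.78
```

A value near 1 means the graphic changes in proportion to the data; here the graphic exaggerates the change almost fifteenfold.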
Always use zero as the basis for bar/column charts
• Doing otherwise introduces a potential lie factor
Figure annotation: lie factor ≈ 2.8
Remember human limitations
• Perceptual limitations
   • Many people have problematic color vision
   • Volume/area is harder to perceive than length
• Cognitive limitations
   • We have limited working memory capacity
   • Don’t make the viewer remember too much
Always use brightness contrast in addition to color
Figure: chart with April, May, June and July on the x axis and a 0-100 y axis
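One way to check brightness contrast is to compare the relative luminance of the colors. This sketch swaps in the WCAG 2 accessibility definitions of relative luminance and contrast ratio (the slide does not prescribe a formula) and uses base R's col2rgb(); the function names are illustrative.

```r
# WCAG 2 relative luminance of one or more R colors (0 = black, 1 = white)
relative_luminance <- function(col) {
  srgb <- col2rgb(col) / 255                       # 3 x n matrix in [0, 1]
  lin <- ifelse(srgb <= 0.03928, srgb / 12.92,     # undo the sRGB gamma curve
                ((srgb + 0.055) / 1.055)^2.4)
  colSums(lin * c(0.2126, 0.7152, 0.0722))         # weight R, G, B channels
}

# Contrast ratio between two colors: 1 (none) up to 21 (black vs white)
contrast_ratio <- function(a, b) {
  l <- relative_luminance(c(a, b))
  (max(l) + 0.05) / (min(l) + 0.05)
}

contrast_ratio("red", "#009900")   # about 1.06: hue differs, brightness barely does
contrast_ratio("red", "#99FF99")   # about 3.3: the colors also differ in brightness
```

A pair with a ratio close to 1, like the first one, is hard to tell apart in grayscale or for viewers with red-green color vision deficiency.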
Figure: “Religion in the United States”, with categories Protestant, Catholic, Mormon, Other Christian, Jewish, Muslim, Buddhist, Other, None, Don't know
Volume can be very hard to distinguish visually
Don’t make your viewer remember too much
Group exercise
• What is the message of this visualization?
• How could that message be better conveyed?
https://howmuch.net/articles/bitcoin-wealth-distribution
Correcting for other factors
• Inflation
• Population size
• Seasonal adjustment
Figure: Gasoline prices, with and without adjustment for inflation (using CPI)
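The first two corrections above are simple rescalings. A minimal sketch with hypothetical function names and made-up numbers: nominal prices are converted to constant dollars of a base period with real = nominal × CPI_base / CPI_t, and counts are expressed per 100,000 people. Seasonal adjustment needs a time-series model and is not sketched here.

```r
# Convert nominal prices to constant dollars of a chosen base period
to_real <- function(nominal, cpi, base_cpi) {
  stopifnot(length(nominal) == length(cpi))
  nominal * base_cpi / cpi
}

# Express counts per `per` people so populations of different size compare fairly
per_capita <- function(count, population, per = 1e5) {
  count / population * per
}

# Hypothetical numbers, for illustration only
price <- c(1.20, 1.50, 3.00)   # nominal price per gallon in three periods
cpi   <- c(100, 150, 250)      # CPI in the same three periods
to_real(price, cpi, base_cpi = 250)                        # 3.0 2.5 3.0
per_capita(count = c(500, 900), population = c(1e6, 3e6))  # 50 30
```

In this made-up series the nominal price more than doubles, but in constant dollars the first and last periods cost the same.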
Recap
• Focus on showing the data and revealing its story
• Don’t misrepresent the data through graphics