Tips for Effective Data Visualization · Kandel, Heer, Plaisant, et al. (2011) Query using...

Post on 03-Aug-2020

1 views 0 download

Transcript of Tips for Effective Data Visualization · Kandel, Heer, Plaisant, et al. (2011) Query using...

TipsforEffectiveDataVisualizationAngelaZoss·EricMonsonDataandVisualizationServicesSTA112FS·Fall2017

Slides:http://bit.ly/STA112FSVisFall2017

Visualexplorationcanrevealdataqualityproblems

Query using Facebook API• Node-link diagram

Kandel,Heer,Plaisant,etal.(2011)http://dx.doi.org/10.1177/1473871611415994

Kandel,Heer,Plaisant,etal.(2011)http://dx.doi.org/10.1177/1473871611415994

Query using Facebook API• Node-link diagram• Matrix display, API return order

5000-item result limitSilent failure

Visualexplorationcanrevealdataqualityproblems

Fisher’sIrisdataset(1936)

http://sebastianraschka.com/images/blog/2014/linear-discriminant-analysis/iris_petal_sepal.png

Youcanseeavariabledistributionbyjustplottingallthepoints

Youcanseeavariabledistributionbyjustplottingallthepoints(+jitter)

Youcanseeavariabledistributionbyjustplottingallthepoints(swarm)

Histogramscanshowthedistributionofonevariable

…but theresultswilldependonthebinwidth

Boxplotscansummarizethedistributionofonevariable

…andaregreatforcomparingdistributionsacrosscategories

Scatterplotsexplorerelationshipsbetweenvariablepairs

Pairsplotscanhelpyouexplorerelationshipsbetweenvariables

Pairsplotscanhelpyouexplorerelationshipsbetweenvariables

Pairsplotscanhelpyouexplorerelationshipsbetweenvariables

https://www.youtube.com/watch?v=AuJFuEq-qD8

ggplot2

PrinciplesforEffectiveVisualizations

Principle1:Ordermatters

Orderbymeaningdata$answer <-

factor(data$answer, levels=c("None", "A little", "Some", "A lot"),ordered = TRUE)

Orderbyvaluedata$academic_field <-

factor(data$academic_field,levels=names(

sort(table(

data$academic_field),decreasing=TRUE)))

Principle2:Putlongcategoriesony-axis

Fliptheaxescoord_flip()

Oops!data$academic_field <-

factor(data$academic_field,levels=names(

sort(table(data$academic_field),decreasing=TRUE)))

data$academic_field <-factor(data$academic_field,

levels=names(sort(

table(data$academic_field))))

Principle3:Pickapurpose

Differentplacementhelpswithdifferentcomparisons

fill=highest_degree

facet_grid(.~highest_degree)

Principle4:Keepscalesconsistent

Keepallcategories,manuallysetaxes

scale_x_discrete(drop=FALSE)

scale_y_continuous(limits=c(0,40),breaks=c(0,10,20,30,40),minor_breaks=NULL)

Principle5:Selectmeaningfulcolors

Selectcolorsmanually,orusealternatepalette

scale_fill_manual(values=c("snow4","snow3",

"tan3","tan1","turquoise2","turquoise4"))

scale_fill_manual(values=c("#fee391","#fe9929", "#cc4c02"))

# Also see package RColorBrewerscale_fill_brewer(palette="BrBG")

ggplot2Resources

• Generalggplot2informationhttp://ggplot2.tidyverse.org/

• RGraphicsCookbook(recipesforplots)http://www.cookbook-r.com/Graphs/index.html

• RforDataScience(onlinebookthatincludesggplot2)http://r4ds.had.co.nz/

• ggplot2:ElegantGraphsforDataAnalysis(bookbyHadleyWickham)http://ggplot2.org/book/

• ggplot2cheatsheet(alsoinRStudio)http://bit.ly/ggplot2-cheatsheet

Resources

DataandVisualizationServices

http://library.duke.edu/dataaskdata@duke.edu

InformationaboutDVS• Datacollections,LibGuides,etc.http://library.duke.edu/data/

• Blog(tutorials,announcements,etc.)http://blogs.library.duke.edu/data/

• E-mailconsultationsaskdata@duke.edu

• Mailinglistforannouncements:https://lists.duke.edu/sympa/subscribe/dvs-announce

• Twitteraccounts@duke_data,@duke_vis

Videosofpastworkshops

http://bit.ly/DVSvideos

Questions?askdata@duke.edu