Intro to data visualization - Evaluation Ontario · 2016-06-23 · Data exploration or conveying...
Transcript of Intro to data visualization - Evaluation Ontario · 2016-06-23 · Data exploration or conveying...
THINKING VISUALLY:AN INTRODUCTION TO DATA & INFORMATION VISUALIZATION
Learning Event for CES Ontario Li Ka Shing Knowledge Institute
June 22, 2016
Jesse Carliner
ACTING COMMUNICATIONS & REFERENCE LIBRARIAN
Leanne Trimble
DATA & STATISTICS LIBRARIAN
University of Toronto Libraries
WHY DATA VISUALIZATION?
COMMUNICATION
GAIN INSIGHTS
VISUALIZATION WORKFLOW
WORKFLOW DESIGN
Audience & Purpose
Select and Prepare Data
Select Visualization
Form
Select Visualization
Elements
Visualize Data
Share
STAGE 1: AUDIENCE & PURPOSE
Audience & Purpose
STAGE 2: SELECT & PREPARE DATA
WORKFLOW DESIGN
Audience & Purpose
Select and Prepare Data
Select Visualization
Form
Select Visualization
Elements
Visualize Data
Share
STAGE 2: TYPES OF DATA
QUANTITATIVE
QUALITATIVE
STAGE 2: MEASUREMENT SCALE TYPE
CATEGORICAL (NOMINAL)
ORDINAL
INTERVAL
RATIO
STAGE 2: TYPES OF DATA
GEOGRAPHIC DATA
NETWORK DATA
TEMPORAL DATA
TOPICAL DATA
NUMERIC DATA
STAGE 2: UNIT OF OBSERVATION
Microdata: Unaggregated, responses at the individual level
Statistics: Aggregated from microdata, responses grouped into categories; shows counts, percentages, averages, etc.
STAGE 2: LEVEL OF ANALYSIS
SMALL LARGE
STAGE 2: OTHER FACTORS
PRIVACY?
HOW MANY VARIABLES?
STAGE 2: PREPARE YOUR DATA
WORKFLOW DESIGN
Audience & Purpose
Select and Prepare Data
Select Visualization
Form
Select Visualization
Elements
Visualize Data
Share
STAGE 2: PREPARE YOUR DATA
Aggregating
Dealing with missing values & outliers
Cleaning, sorting, labelling variables
Summary statistics, algorithms, correlation coefficients, etc.
Merging with other datasets
It’s iterative!
STAGE 2: PREPARE YOUR DATA
Before
After
STAGE 3: SELECT VISUALIZATION FORM
WORKFLOW DESIGN
Audience & Purpose
Select and Prepare Data
Select Visualization
Form
Select Visualization
Elements
Visualize Data
Share
STAGE 3: CHOOSE VISUALIZATION TYPE
Based on purpose
Data exploration or conveying thesis?Overall trends or accurate judgments?
Based on audience
Academic or hospital stakeholders?
Based on data
Quantitative, qualitative, categorical, scale (ordinal, interval), scope, temporal & geospatial considerations
Source: Cairo, Alberto. The Functional Art, p.120
Graphical perceptual accuracy,Cleveland & McGill 1984
RESOURCES FOR CHOOSING VISUALIZATION TYPES
http://datavizcatalogue.com/ Handout
PIE CHARTS
• Lack an inherent reference system (no x, y, z axis or grid lines)• Use angle to convey difference
BAR GRAPHS
• Use with when comparing quantitative variables across discrete categorical variables
• Counts or sums rather than characteristics of a distribution• Length or height more accurate than angles
b
BAR GRAPHS
• Use consistent axes when comparing two graphs
BAR GRAPHS
Use a zero baseline!
SCATTER PLOTS
• Effective for comparing how one variable is affected by another, e.g. correlation, cause & effect relationships
• Pros: Easy to spot outliers & trends• Cons: Hard to use with REALLY large datasets, occlusion issues with multivariate
scatterplots
SCATTER PLOTS
Best practices:• Prevent occlusion & ambiguity issues with easily distinguishable symbols • Consider colour, size, & symbols to capture more dimensions of your data
SCATTER PLOTS
Separate multivariate scatter plots with occlusion issues into multiple scatter plotsConsider scatterplot matrices (aka Trellis plots ) for large, multivariate sets
age
0.80 0.84 0.88 0.92 0.1 0.2 0.3 0.4 30000 50000 70000
25
35
45
55
0.8
00.8
60
.92
high_school
bach_degree
0.1
0.3
0.5
0.1
0.3
adv_degree
unemploy
0.0
80.1
20
.16
300
00
50
00
070
00
0
hh_income
25 35 45 55 0.1 0.2 0.3 0.4 0.5 0.08 0.12 0.16 300 500 700
300
50
07
00
rent
“SMALL MULTIPLES” & “SPARKLINES”
HEAT MAPS
VISUALIZING PARTS OF A WHOLE: TREEMAPS
positionlengthdirectionangleareavolumecurvatureshadingcolour
Accuracy
VISUALIZING PARTS OF A WHOLE: STREAMGRAPHS
BOX PLOTS
• Effective for displaying distribution of data• Data dense: displays outliers, minimum, maximum, median, quartiles• Take up little space
NETWORK GRAPHS
• Track activity & relationships across temporal, geo., & categorical variables• Can analyze hubs & clusters of activity• May need to apply algorithms to fix layout issues• Labels can often get obscured
TABLES
Use for a small set of quantitative data that can’t be quickly summarized in text
Edward Tufte’s rule: Fewer than 20 numbers, just use a table*
Takes up less space than graphs
Author submission guidelines often spell out when to just use a table
*Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, Conn. : Graphics Press. p.56
EXERCISE
1. I want to compare obesity rates across countries across a 5 year span.
Which variables will you need?
What are you trying to observe? (e.g. distribution, relationships, data overtime, etc.)
Which visualization type(s) would you use?
Country Obese % of population2005
Obese % of population 2010
Canada 23.7 25.4
New Zealand 26 27.8
United States 33 36.1
Spain etc. etc.
Russia etc. etc.
Use http://datavizcatalogue.com or the handout
EXERCISE
2. I want to explore the relationship between obesity rates across countries, across a 5 year span in relation to GDP per capita.
Which variables will you need?
What are you trying to observe? (e.g. distribution, relationships, data overtime, etc.)
Which visualization type(s) would you use?
Country Obese % of population2005
Obese % of population2010
GDP per capita2005
GDP per capita2010
Canada 23.7 25.4 36000 47465
New Zealand
26 27.8 27525 32845
United States
33 36.1 44307 48377
Russia etc. etc. etc. etc.
Use http://datavizcatalogue.com or the handout
STAGE 4: SELECT VISUAL ELEMENTS
WORKFLOW DESIGN
Audience & Purpose
Select and Prepare Data
Select Visualization
Form
Select Visualization
Elements
Visualize Data
Share
STAGE 4: SELECT VISUAL ELEMENTS
Select Visualization
Elements
POSITION FORM COLOUR OPTICS TEXTURE
STAGE 4: SELECT VISUAL ELEMENTS
POSITION
STAGE 4: SELECT VISUAL ELEMENTS
POSITION
STAGE 4: SELECT VISUAL ELEMENTS
POSITION
STAGE 4: SELECT VISUAL ELEMENTS
FORM
STAGE 4: SELECT VISUAL ELEMENTS
FORM
Visualization from Information is beautiful: http://www.informationisbeautiful.net/visualizations/best-in-show-whats-the-top-data-dog/
STAGE 4: SELECT VISUAL ELEMENTS
FORM
Visualization from Information is beautiful: http://www.informationisbeautiful.net/visualizations/snake-oil-supplements/
STAGE 4: SELECT VISUAL ELEMENTS
COLOUR
STAGE 4: SELECT VISUAL ELEMENTS
COLOUR ACCESSIBILITY
COLOUR BLINDNESS SIMULATOR: http://www.color-blindness.com/coblis-color-blindness-simulator/ :
STAGE 4: SELECT VISUAL ELEMENTS
COLOUR
COLOUR BREWER: http://www.colorbrewer2.org/#
STAGE 4: SELECT VISUAL ELEMENTS
QUALITATIVECOLOUR PALETTE
Visualization from Information is beautiful: http://www.informationisbeautiful.net/visualizations/oil-well-every-cooking-oil-compared/
STAGE 4: SELECT VISUAL ELEMENTS
SEQUENTIALCOLOUR PALETTE
Visualization from Guardian: http://www.theguardian.com/money/ng-interactive/2014/may/23/-sp-see-how-house-prices-have-risen
STAGE 4: SELECT VISUAL ELEMENTS
DIVERGINGCOLOUR PALETTE
Visualization from Guardian:http://www.theguardian.com/society/ng-interactive/2015/sep/02/unaffordable-country-where-can-you-afford-to-buy-a-house
STAGE 4: SELECT VISUAL ELEMENTS
WHAT COLOUR PALETTE (S) WOULD YOU CHOOSE FOR THIS DATA SET?
:COLOUR BREWER: http://www.colorbrewer2.org/#
FAMILIESACRES OF LAND
RANK
RIDOLFI 30 5
SALVATTI 15 3
ALBIZZI 20 4
GINORI 10 2
STAGE 4: CREATE TEXT
INCLUDE ALL
NECESSARY CONTENT
CLARITY
LESS IS MORE
TYPOGRAPHIC
BEST PRACTICES
STAGE 4: LAYOUT AND CREATION
DATA-INK RATIO
GESTALT
GRIDS
STAGE 4: DATA-INK RATIO
LOW HIGH
Images from: http://www.infovis-wiki.net/index.php/Data-Ink_Ratio
STAGE 4: LAYOUT AND CREATION
GESTALT
STAGE 4: LAYOUT AND CREATION
SIMILARITY
PROXIMITY
STAGE 4: LAYOUT AND CREATION
SIMILARITY
PROXIMITY
STAGE 4: LAYOUT AND CREATION
GROUPING
STAGE 4: LAYOUT AND CREATION
VISUAL COMPLETION
OR
LESS IS MORE
STAGE 4: LAYOUT AND CREATION
AVOID
DISTRACTIONS
THIS OR THIS?
STAGE 4: LAYOUT AND CREATION
GRIDS
/:
STAGE 5: SHARE YOUR VISUALIZATION
WORKFLOW DESIGN
Audience & Purpose
Select and Prepare Data
Select Visualization
Form
Select Visualization
Elements
Visualize Data
Share
STAGE 5: SHARING YOUR VISUALIZATION
SHARE
GET FEEDBACK
WAS IT EFFECTIVE?
REVISEShare
EXERCISE: IMPROVING VISUALIZATIONS
In groups of 2 or 3, examine the visualization on your handout.
These are problematic visualizations. What are the issues and how could they be improved?
What’s up with the x axis?
How the countries being plotted on the chart?
Appropriate use of visualization type?
Enough contextual information?
What’s up with the colour scale?
Appropriate colour choices?
Efficient way to display info?
Watch out for dual scales!
Exaggerating data?
Appropriate colour choice?
Proximity causes the reader to group together similar images.
Consistency with scales if drawing comparisons btw visualizations.
STATISTICAL SOFTWARE PACKAGES
SPSS: Viz types: Standard graphs, charts, plotsProgramming?: Not requiredStorage: Computer hard driveCost: Annual license
*note: Produces low quality images, not suitable for publication, consider rebuilding visualizations in Excel or using R or Stata
STATISTICAL SOFTWARE PACKAGES
R: Viz types: Almost any two dimensional visualization typesProgramming?: Requires knowledge of R language, steep learning curve, but
robust sets of tutorials & learning material onlineStorage: Computer hard driveCost: Free, open source
*note: • Higher quality images• Install ggplot2 & lattice packages for excellent
visualization libraries• New packages continuously being developed
COMMERCIAL VISUALIZATION SOFTWARE
Tableau: Viz types: Many 2-d & 3-d visualization types, including dynamic &
geospatial visualizationsProgramming?: None, drag & drop functionalityStorage: Computer hard driveCost: Expensive $$
*note: • High quality images• Allows users to build custom interactive
dashboard• Suggests visualization types based on your data• Free version with limited functionality, Tableau
Public
EXCEL
Viz types: Most standard 2-d visualizationsProgramming?: None, easy to use drag & drop functionalityStorage: Computer hard driveCost: Included in Microsoft Office
*note: • Produces serviceable visualizations• Don’t settle for the default charts!• Download additional Excel templates for more
visualization types• See Jorge Comoes’ blog for tutorials:
http://www.excelcharts.com/blog/
OPEN SOURCE VISUALIZATION TOOLS
Google Fusion TablesViz types: Most standard 2-d visualizations, network graphs, & geospatial
visualizations with Google Maps’ data, static & dynamic visualizations
Programming?: None, easy to use drag & drop functionalityStorage: Cloud based, attached to Google accountCost: Free, open source
OPEN SOURCE VISUALIZATION TOOLS
Raw DensityViz types: Dynamic visualizations, visualizations that show relationships &
hierarchies, tree maps, scatterplots, network visualizationsProgramming?: None, easy to use drag & drop functionalityStorage: Cloud basedCost: Free, open source
OPEN SOURCE VISUALIZATION TOOLS
d3.jsViz types: Interactive, geospatial, & network visualizationsProgramming?: Light javascript, many tutorialsStorage: Computer hard drive, interactive visualization onlineCost: Free, open source
OPEN SOURCE VISUALIZATION TOOLS
Gephi & Sci2Viz types: Dynamic visualizations, visualizations that show relationships &
hierarchies, scatterplots, networks, geospatial visualizationsProgramming?: Light Python, easy to use drag & drop functionalityStorage: Cloud basedCost: Free, open source
IMAGE EDITORS & COLOUR TOOLS
InkscapeOpen source image editor
Useful for cleaning up images, editing axes or adding legends
ColorBrewer2Choose appropriate colour schemes according to brewer palatesCreates RGB, CMYK, & HEX codes for coloursPerforms various colour blindness tests
ADDITIONAL RESOURCES
The Data Visualization Catalogue: Detailed descriptions of data visualization types, their uses, & limitations
http://www.datavizcatalogue.com/
Tools organized by visualization type, Duke University Libraries:http://bit.ly/1EHjs5K
Curated list of visualization toolshttp://selection.datavisualization.ch/
Books
Cairo, A. (2013). The functional art: An introduction to information graphics and visualization.
Bo ̈rner, K., & Polley, D. E. (2014). Visual insights: A practical guide to making sense of data.
Tufte, E. R. (1983). The visual display of quantitative information.
Nature Methods Points of View Columns:
http://blogs.nature.com/methagora/2013/07/data-visualization-points-of-view.html
QUESTIONS?
Jesse Carliner
Leanne Trimble
THANK YOU!