Data & Analytics Club - Data Visualization Workshop

Post on 15-Jan-2017

Data Visualization Nikhil Srivastava, 2015

Nikhil Srivastava

Wharton Data & Analytics Club

hoster@wharton.upenn.edu

About this Lecture

• Shortened version of longer course

– Slides, demos, extra material

– Code samples and libraries

– Sample projects

• Questions

About You

Outline

• What is Data Visualization? (introduction)
• Thinking and Seeing (foundation & theory)
• From Data to Graphics (building blocks)
• Principles and Guidelines (design & critique)
• Building Visualizations (construction)
• Advanced

What is Data Visualization? (introduction)

Data Visualization

Information Visualization
Scientific Visualization
Infographics
Statistical Graphics
Informative Art
Art
Science
Statistics
Journalism
Design
Visual Analytics
Business

City            State        Population
Baton Rouge     Louisiana       191,741
Birmingham      Alabama         220,927
Broken Arrow    Oklahoma         58,018
Eugene          Oregon          115,890
Glendale        Arizona         245,868
Huntsville      Alabama          55,741
Lafayette       Louisiana        87,737
Mobile          Alabama          98,147
Montgomery      Alabama         126,250
New Orleans     Louisiana       322,172
Norman          Oklahoma        101,590
Peoria          Arizona         167,868
Portland        Oregon          514,108
Salem           Oregon          147,631
Scottsdale      Arizona         134,335
Shreveport      Louisiana        68,756
Surprise        Arizona          90,548
Tempe           Arizona         143,369
Tulsa           Oklahoma        392,138

• Which is the most populous city in the list?
• Which state in the list has the most cities?
• Which state in the list has the largest average city?

• Which is the most populous city in the list?
• Which state in the list has the most cities?
• Which state in the list has the largest average city?
• What is the population of Montgomery, Alabama?
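The four questions can also be answered programmatically, which makes explicit how much aggregation the reader of a raw table has to do mentally. The following is a minimal sketch using only the Python standard library; the rows are copied from the table in this section, while the function names are illustrative and not part of the lecture.

```python
from collections import defaultdict

# (city, state, population) rows copied from the table in this section.
CITIES = [
    ("Baton Rouge", "Louisiana", 191_741),
    ("Birmingham", "Alabama", 220_927),
    ("Broken Arrow", "Oklahoma", 58_018),
    ("Eugene", "Oregon", 115_890),
    ("Glendale", "Arizona", 245_868),
    ("Huntsville", "Alabama", 55_741),
    ("Lafayette", "Louisiana", 87_737),
    ("Mobile", "Alabama", 98_147),
    ("Montgomery", "Alabama", 126_250),
    ("New Orleans", "Louisiana", 322_172),
    ("Norman", "Oklahoma", 101_590),
    ("Peoria", "Arizona", 167_868),
    ("Portland", "Oregon", 514_108),
    ("Salem", "Oregon", 147_631),
    ("Scottsdale", "Arizona", 134_335),
    ("Shreveport", "Louisiana", 68_756),
    ("Surprise", "Arizona", 90_548),
    ("Tempe", "Arizona", 143_369),
    ("Tulsa", "Oklahoma", 392_138),
]


def most_populous_city(rows):
    """Name of the city with the largest population."""
    return max(rows, key=lambda row: row[2])[0]


def populations_by_state(rows):
    """Group the population figures by state."""
    groups = defaultdict(list)
    for _city, state, population in rows:
        groups[state].append(population)
    return dict(groups)


def state_with_most_cities(rows):
    """State that appears most often in the list."""
    groups = populations_by_state(rows)
    return max(groups, key=lambda state: len(groups[state]))


def state_with_largest_average_city(rows):
    """State whose listed cities have the highest mean population."""
    groups = populations_by_state(rows)
    return max(groups, key=lambda state: sum(groups[state]) / len(groups[state]))


def population_of(rows, city, state):
    """Direct lookup: the one question a plain table answers well."""
    for name, row_state, population in rows:
        if (name, row_state) == (city, state):
            return population
    raise KeyError((city, state))
```

The first three questions need a scan or a group-by over the whole list, which is the work a chart can do for the reader; the last one is a single lookup, where the table is already a good fit.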

Data Visualization is:

• Useful
– Answers user questions
– Reduces user workload
(by design, not by default)

Anscombe’s quartet (1973)
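The point of the quartet is that four datasets with nearly identical summary statistics look completely different when plotted. The sketch below checks the "nearly identical" half of that claim. The values are the commonly published quartet data and do not appear on the slides; the helper names are illustrative.

```python
from math import sqrt
from statistics import mean

# Commonly published values of Anscombe's quartet (not taken from the slides).
_X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (_X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (_X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (_X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def summarize(xs, ys):
    """The summary statistics that fail to tell the four datasets apart."""
    return {"mean_x": mean(xs), "mean_y": mean(ys), "r": pearson_r(xs, ys)}


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
```

Printing `SUMMARIES` shows four almost indistinguishable rows; only a plot of each dataset reveals the linear trend, the curve, and the two outlier-driven cases.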

Data Visualization is:

• Useful
– Understand structure and patterns
– Resolve ambiguity
– Locate outliers

Data Visualization is:

• Important
– Design decisions affect interpretation

Crimean War Deaths

Florence Nightingale, 1858 (re-colorized)

Gapminder Foundation

Data Visualization is:

• Powerful
– Communicate, teach, inspire

             Our Focus
purpose      communicate              explore, analyze
data type    numerical, categorical   text, maps, graphs, networks
method       static representation    animation, interactivity

Thinking and Seeing (foundation & theory)

The Hardware

The Software

Visual Perception

“Bottom-up”
• Sensory input
• Low-level features: orientation, shape, color, movement
• Rapid, parallel, automatic

“Top-down”
• High-level concepts: objects, symbols
• Involves working memory
• Slow, sequential, conscious

Task: Counting

How many 3’s?

1281768756138976546984506985604982826762
9809858458224509856458945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686

Task: Counting

Slow, sequential, conscious

1281768756138976546984506985604982826762
9809858458224509856358945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686

Rapid, parallel, automatic

1281768756138976546984506985604982826762
9809858458224509856358945098450980943585
9091030209905959595772564675050678904567
8845789809821677654876364908560912949686
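The two copies of the digit block are identical in this transcript; presumably the second copy on the slide had the 3’s set in a contrasting color, which is what makes the count "rapid, parallel, automatic". The sketch below reproduces that effect in a terminal, assuming one that understands ANSI color codes. The function names and the choice of bold red are illustrative.

```python
# Digit rows copied from the slide above.
DIGIT_ROWS = [
    "1281768756138976546984506985604982826762",
    "9809858458224509856358945098450980943585",
    "9091030209905959595772564675050678904567",
    "8845789809821677654876364908560912949686",
]


def count_digit(rows, target):
    """The slow way: scan every character of every row."""
    return sum(row.count(target) for row in rows)


def highlight(rows, target, start="\033[1;31m", end="\033[0m"):
    """Wrap each target digit in markers (bold red by default) so it pops out."""
    return [row.replace(target, f"{start}{target}{end}") for row in rows]


if __name__ == "__main__":
    print("\n".join(DIGIT_ROWS))
    print()
    print("\n".join(highlight(DIGIT_ROWS, "3")))
    print("count:", count_digit(DIGIT_ROWS, "3"))
```

Color is used here because it is one of the pre-attentive attributes: the highlighted digits can be counted at a glance, while the plain block forces a serial scan.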

Task: (Distracted) Search

Which side has the red circle?

Slow, sequential, conscious

Rapid, parallel, automatic

(n=7)

(n=5)

(n=3)

Lessons for Visualization

• Use “pre-attentive” attributes when possible

– Color, shape, orientation (depth, motion)

– Faster, higher bandwidth

• Caveats

– Beware limits of working memory (<7)

– Be careful mixing attributes

Example: Inefficient Attributes

Example: Too Many Attributes

From Data to Graphics (building blocks)

What kind of data do we have?

How can we represent the data visually?

How can we organize this into a visualization?

Visual Encoding

Data Types

             CATEGORICAL              ORDINAL               NUMERICAL (Interval)   NUMERICAL (Ratio)
Examples     Male / Female            Small / Med / Large   Latitude/Longitude     Length
             Asia / Africa / Europe   Low / High            Compass direction      Count
             True / False             Yes / Maybe / No      Time (event)           Time (duration)
Operations   =                        = < >                 = < > -                = < > + - * /

Data Types

• Bin/Categorize: numerical → ordinal or categorical
• Difference/Normalize: interval → ratio
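Both conversions are one-liners in practice. The sketch below bins a numerical value into ordinal labels and turns two interval values (event times) into a ratio value (a duration). The thresholds and labels are hypothetical and were chosen only for illustration.

```python
from datetime import datetime


def bin_value(value, edges, labels):
    """Bin/Categorize: map a number to an ordinal label.

    `edges` are ascending upper bounds (exclusive); `labels` has one more
    entry than `edges`, the last one catching everything above the top edge.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def duration_seconds(start, end):
    """Difference: two event times (interval) give a duration (ratio)."""
    return (end - start).total_seconds()


# Hypothetical thresholds for city size.
SIZE_EDGES = [100_000, 250_000]
SIZE_LABELS = ["Small", "Med", "Large"]
```

For example, `bin_value(191_741, SIZE_EDGES, SIZE_LABELS)` gives `"Med"`. A duration has a meaningful zero, so statements such as "twice as long" make sense for it but not for the event times it was derived from.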

Data Types (Advanced)

• Networks/Graphs

– Hierarchies/Trees

• Text

• Maps: points, regions, routes

Visual Encodings

Marks

point

line

area

volume

Channels

position

size

shape

color

angle/tilt

Channel Effectiveness

“Spatial position is such a good visual coding of data that the first decision of visualization design is which variables get spatial encoding at the expense of others”

What kind of data do we have?

How can we represent the data visually?

How can we organize this into a visualization?

Athi River      Machakos       139,380
Awasi           Kisumu          93,369
Kangundo-Tala   Machakos       218,557
Karuri          Kiambu         129,934
Kiambu          Kiambu          88,869
Kikuyu          Kiambu         233,231
Kisumu          Kisumu         409,928
Kitale          Trans-Nzoia    106,187
Kitui           Kitui          155,896
Limuru          Kiambu         104,282
Machakos        Machakos       150,041
Molo            Nakuru         107,806
Mwingi          Kitui           83,803
Naivasha        Nakuru         181,966
Nakuru          Nakuru         307,990
Nandi Hills     Trans-Nzoia     73,626

type                        mark    channel           data represented
Scatter Plot                point   position          2 quantitative
Scatter + Hue               point   position, color   2 quantitative, 1 categorical
Scatter + Size (“Bubble”)   point   position, size    3 quantitative

Scatter Plot – Applications

RELATIONSHIP

GROUPING

OUTLIERS

Scatter Plot – Dangers

OCCLUSION (DENSITY)

OCCLUSION (OVERLAP)

3-D

type         mark   channel                  data represented
Line Chart   line   position (orientation)   2 quantitative
Area Chart   area   size (length)            2 quantitative

Line Chart – Applications

PATTERN OVER TIME

COMPARISON

Line Chart – Dangers

Y SCALING

X SCALING

OVERLOAD

type        mark   channel         data represented
Bar Chart   line   size (length)   1 categorical, 1 quantitative
Histogram   line   size (length)   1 ordinal/quantitative, 1 quantitative (count)
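A histogram is a bar chart whose second variable is a count per bin. The sketch below derives those counts from the population column of the table shown earlier in this part and renders them as text bars. The bin width of 100,000 is an arbitrary choice for illustration, and the function names are not from the lecture.

```python
from collections import Counter

# Population column copied from the table earlier in this part.
POPULATIONS = [
    139_380, 93_369, 218_557, 129_934, 88_869, 233_231, 409_928, 106_187,
    155_896, 104_282, 150_041, 107_806, 83_803, 181_966, 307_990, 73_626,
]


def histogram(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge.

    Empty bins between the lowest and highest occupied bin are kept with a
    count of 0 so that gaps in the distribution stay visible.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return {edge: counts[edge] for edge in range(first, last + bin_width, bin_width)}


def render(counts, bin_width):
    """One text bar per bin; bar length encodes the count."""
    return [
        f"{edge:>7,} to {edge + bin_width - 1:>7,} | {'#' * n}"
        for edge, n in counts.items()
    ]


if __name__ == "__main__":
    print("\n".join(render(histogram(POPULATIONS, 100_000), 100_000)))
```

Changing `bin_width` changes the shape of the distribution the reader sees, so the bin width is a design decision rather than a detail.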

Bar Chart – Applications

COMPARE CATEGORIES

DISTRIBUTION

Bar Chart – Dangers

TOO MANY CATEGORIES

POORLY SORTED CATEGORIES

ZERO AXIS

type        mark   channel        data represented
Pie Chart   area   size (angle)   1 quantitative

Pie Chart – Dangers

AREA/ANGLE SCALE

SIMILAR AREAS

OVERLOAD

Multi-Series: Bar

“GROUPED” BAR CHART

“STACKED” BAR CHART

Multi-Series: Line

MULTIPLE LINE

STACKED AREA CHART

Normalization

NORMALIZED BAR

NORMALIZED AREA
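A normalized bar or area chart plots each series as a share of its category total rather than as an absolute value. The sketch below shows the transformation that would sit between the raw data and the chart. The input shape and the sample numbers in the usage note are hypothetical.

```python
def normalize_to_shares(values_by_category):
    """Convert {category: {series: value}} into shares of each category total.

    Every category then sums to 1.0, so stacked bars or areas all reach the
    same height and only the composition differs.
    """
    shares = {}
    for category, values in values_by_category.items():
        total = sum(values.values())
        shares[category] = {
            series: (value / total if total else 0.0)
            for series, value in values.items()
        }
    return shares
```

With hypothetical input `{"2014": {"A": 30, "B": 70}, "2015": {"A": 60, "B": 180}}`, series A doubles in absolute terms while its share falls from 0.30 to 0.25. The normalized view shows composition and hides totals, so the choice between the two depends on the question being asked.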

Principles and Guidelines (design & critique)

From Science to Art

• Design principles*

• Style guidelines*

*dependent on context and objective (and author)

Design Principles

• Integrity

– Tell the truth with data

• Effectiveness

– Achieve visualization objectives

• Aesthetics

– Be compelling, vivid, beautiful

Integrity

Lie Ratio = (size of effect in graphic) / (size of effect in data)
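A worked example makes the ratio concrete. The sketch below assumes that "size of effect" means the relative change between two values, which is how Tufte defines it for his "lie factor"; the slide itself does not spell this out. The bar-chart scenario and the function names are illustrative.

```python
def effect_size(first, second):
    """Relative change from `first` to `second` (assumed definition of 'effect')."""
    return abs(second - first) / abs(first)


def lie_ratio(data_first, data_second, drawn_first, drawn_second):
    """Size of effect in the graphic divided by size of effect in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)


def bar_length(value, axis_start):
    """Length of a bar when the value axis starts at `axis_start`, not at zero."""
    return value - axis_start


# Hypothetical data: a value grows from 100 to 110 (a 10% increase).
HONEST = lie_ratio(100, 110, bar_length(100, 0), bar_length(110, 0))
TRUNCATED = lie_ratio(100, 110, bar_length(100, 90), bar_length(110, 90))
```

With the axis starting at zero the ratio is 1: the bars grow by the same 10% as the data. Starting the axis at 90 makes the second bar twice as long as the first, a 100% visual increase for a 10% change, so the ratio is about 10. This is the "zero axis" danger listed for bar charts.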

“show data variation, not design variation”

Effectiveness*

Data/Ink Ratio = (ink representing data) / (total ink)

*Tufte

avoid “chart junk”

Avoid Chart Junk

Effectiveness (Few)

Practical Guidelines

• Avoid 3-D charts

• Focus on substance over graphics

• Avoid separate legends and keys

• Use faint grids/guidelines

• Avoid unnecessary textures and colors

A Note on Color

• To label

• To emphasize

• To liven or decorate

Color as a Channel

             Categorical      Quantitative
Hue          Good (6-8 max)   Poor
Value        Poor             Good
Saturation   Poor             Okay
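The table translates directly into two kinds of palette: distinct hues for categories and a single-hue ramp that varies value for quantities. The sketch below builds both with the standard `colorsys` module. The saturation and value constants, the default hue, and the function names are illustrative choices rather than recommendations from the lecture.

```python
import colorsys


def to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in 0..1 to a '#rrggbb' string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(channel * 255) for channel in rgb))


def categorical_palette(n, saturation=0.65, value=0.85):
    """Evenly spaced hues for up to 8 categories (the table's 6-8 maximum)."""
    if not 1 <= n <= 8:
        raise ValueError("hue distinguishes only about 6-8 categories reliably")
    return [to_hex(colorsys.hsv_to_rgb(i / n, saturation, value)) for i in range(n)]


def sequential_color(x, low, high, hue=0.6, saturation=0.5):
    """Map a quantity in [low, high] to one hue, with larger numbers darker."""
    t = (x - low) / (high - low)
    t = min(1.0, max(0.0, t))          # clamp out-of-range inputs
    value = 1.0 - 0.7 * t              # value runs from 1.0 (light) to 0.3 (dark)
    return to_hex(colorsys.hsv_to_rgb(hue, saturation, value))
```

For published work a tested palette is usually a better starting point than hand-built ramps like these, since equal steps in HSV value are not perceived as equal steps in lightness.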

Bad Color

Good Color

More Color Guidelines

• Use color only when necessary
• Saturated colors for small areas, labels
• Less saturated colors for large areas, backgrounds
• Use tools like ColorBrewer

Building Visualizations (construction)

What Tools to Use?

Clean / Restructure

Explore / Analyze

DATA

Visualization Goals

Visualization Tools

Excel

Tableau

Plotly

Python

R

Matlab

Ubiq/Silk

How hard is it to learn?

How powerful & flexible is it?

Google Charts

Highcharts

d3

I’ll have to write code
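As one illustration of the "write code" end of this spectrum, the sketch below goes from the table shown earlier to a sorted horizontal bar chart in Python. It assumes matplotlib is installed; the meaning of the second column (treated here as a region), the output file name, and the styling choices are assumptions made for this example and not the lecture's own code.

```python
from collections import defaultdict

# (town, region, population) rows copied from the table shown earlier.
TOWNS = [
    ("Athi River", "Machakos", 139_380), ("Awasi", "Kisumu", 93_369),
    ("Kangundo-Tala", "Machakos", 218_557), ("Karuri", "Kiambu", 129_934),
    ("Kiambu", "Kiambu", 88_869), ("Kikuyu", "Kiambu", 233_231),
    ("Kisumu", "Kisumu", 409_928), ("Kitale", "Trans-Nzoia", 106_187),
    ("Kitui", "Kitui", 155_896), ("Limuru", "Kiambu", 104_282),
    ("Machakos", "Machakos", 150_041), ("Molo", "Nakuru", 107_806),
    ("Mwingi", "Kitui", 83_803), ("Naivasha", "Nakuru", 181_966),
    ("Nakuru", "Nakuru", 307_990), ("Nandi Hills", "Trans-Nzoia", 73_626),
]


def total_by_region(rows):
    """Restructure: sum population per region, largest first."""
    totals = defaultdict(int)
    for _town, region, population in rows:
        totals[region] += population
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def plot_totals(totals, path="region_totals.png"):
    """Draw a sorted horizontal bar chart and save it to `path`."""
    import matplotlib
    matplotlib.use("Agg")              # render to a file, no display needed
    import matplotlib.pyplot as plt

    labels = [region for region, _ in totals]
    values = [total for _, total in totals]
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.barh(labels, values, color="0.4")   # one neutral color: no color channel needed
    ax.invert_yaxis()                      # largest bar on top
    ax.set_xlabel("Population")            # bars start at zero by default
    ax.xaxis.grid(True, color="0.9")       # faint guidelines
    ax.set_axisbelow(True)
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_totals(total_by_region(TOWNS))` writes the chart to disk. The sorting, zero-based axis, faint grid, and single color follow the guidelines from the previous part.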

Cheat Sheets

• For Hackathon participants

• Otherwise, email me

Advanced

Small Multiples

Treemap (Hierarchical Data)

Strengths: nested relationships

Concerns: order, aspect ratio
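The simplest treemap layout, often called slice-and-dice, shows where both concerns come from. The sketch below divides a rectangle into strips proportional to each node's weight and alternates the split direction at every level. It is a minimal illustration with a hypothetical input format, not the layout used by any particular tool.

```python
def weight(node):
    """Total size of a node: a number for a leaf, the sum of children otherwise."""
    if isinstance(node, dict):
        return sum(weight(child) for child in node.values())
    return node


def slice_layout(sizes, x, y, width, height, vertical):
    """Split one rectangle into strips whose areas are proportional to `sizes`."""
    total = sum(sizes.values())
    rects, offset = {}, 0.0
    for name, size in sizes.items():
        fraction = size / total
        if vertical:    # strips placed side by side, left to right
            rects[name] = (x + offset * width, y, fraction * width, height)
        else:           # strips stacked top to bottom
            rects[name] = (x, y + offset * height, width, fraction * height)
        offset += fraction
    return rects


def treemap(tree, x=0.0, y=0.0, width=1.0, height=1.0, vertical=True, path=()):
    """Return {leaf path: (x, y, width, height)} for a nested dict of sizes."""
    sizes = {name: weight(child) for name, child in tree.items()}
    strips = slice_layout(sizes, x, y, width, height, vertical)
    rects = {}
    for name, child in tree.items():
        rx, ry, rw, rh = strips[name]
        if isinstance(child, dict):
            # Alternate the split direction at each level of the hierarchy.
            rects.update(treemap(child, rx, ry, rw, rh, not vertical, path + (name,)))
        else:
            rects[path + (name,)] = (rx, ry, rw, rh)
    return rects
```

The rectangles follow the insertion order of the dictionary, so a different ordering gives a different picture, and small weights produce long thin slivers. Squarified layouts exist to keep the aspect ratios closer to 1.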

Multi-Level Pie Chart (Hierarchical Data)

Strengths: nested relationships

Concerns: readability

Heat Map (Table/Field Data)

Strengths: pattern/outlier detection

Concerns: ordering, clustering, color

Choropleth (Region Data)

Strengths: geography

Concerns: region size, color

Cartogram (Region Data)

Strengths: geographic pattern

Concerns: base map knowledge

The Ebb and Flow of Movies

NY Times, 2008

Streamgraph

“Data Visualization” Wikipedia Page

Wordle

Word Cloud

Twitter Networks

PJ Lamberson, 2012

Blogs/Reference

• Infosthetics.com

• Visualizing.org

• FlowingData.com

Nikhil Srivastava

nsri@wharton.upenn.edu