Representation - Universidade de Aveiro

66
Representation InfoVis, Universidade de Aveiro Beatriz Sousa Santos (R. Spence, 2007)

Transcript of Representation - Universidade de Aveiro

Page 1: Representation - Universidade de Aveiro

Representation

InfoVis, Universidade de Aveiro Beatriz Sousa Santos

(R. Spence, 2007)

Page 2: Representation - Universidade de Aveiro

Data

Ah HA ! !We look at

that picture

and gain

insight

Information visualization

The process of information visualization: graphically encoded data is

viewed in order to form a mental model of that data (Spence, 2007)

Page 3: Representation - Universidade de Aveiro

3

Interaction with data governed by high-order cognitive processes:

- Representation (how to code visually the data)

- Presentation (what/when/where to show on the screen)

- Interaction (how to let users explore the data)

DATA

PERCEPTION

INTERPRETATION

REPRESENTAT ION of data

PRESENTATION of the

represented data

INTERACT ION to select

the required view of data

The scope of this book

HIGHER-ORDER

COGNITIVE

PROCESSES

Internal modelling

Strategy formulation

Problem (re)formulation

Evaluation of options

Decision making

etc.

(Spence, 2007)

Page 4: Representation - Universidade de Aveiro

4

• The Human Visual system is the product of millions of years of

evolution

• Although very flexible, it is tuned to

data represented in specific ways

• If we understand how its mechanisms

work we will be able to produce better results

Page 6: Representation - Universidade de Aveiro

6

Representing quantitative data using color is particularly difficult

Which map is easier to understand?

(Tufte, 1990)

Page 8: Representation - Universidade de Aveiro

• Visual attributes as size, proximity, color are quickly processed by visual

perception, before the cognitive processes come into play

320

260

380

280

Example: mapping numerical values

to the length of bars:

• Computers have been responsible for advances in InfoVis, due to:

- Less expensive and more rapid memory

- More processing power and faster

- High resolution graphic displays

(Mazza, 2009)

Page 9: Representation - Universidade de Aveiro

An example illustrating the potential benefit of rearranging the visual

representation of the data:

Results of an experiment: ten crops received seven different treatments.

• black square – successful treatment

• white square – not successful

Rearranging rows and columns allows to better notice that groups of treatments seem effective for groups of crops

Treatments

A B C D E F G

1

2

3

4

5

6

7

8

9

10

Crops10

A D C E G B F

1

3

8

2

6

4

7

9

5

Treatments

Crops

Rearrange

(Spence,2007)

Page 10: Representation - Universidade de Aveiro

But, after all,

How to create a Visual representation?

Page 11: Representation - Universidade de Aveiro

Creating Visual Representations

(Mazza, 2009)

• Good design is the key to success in producing a Visualization

• Visualization S/W can provide many visual templates

• In spite of variation, all S/W packages follow the same generation process

reference model:

Page 12: Representation - Universidade de Aveiro

1. Preprocessing:

• Abstract data (which don´t have a specific connection with physical space)

are rarely in a suitable format for automatic treatment and visualization

• Raw data (data supplied by the world around us, a.k.a. datasets) have to be

given an organized logical structure to be processed by the Visualization S/W

Page 13: Representation - Universidade de Aveiro

2. Visual mapping:

• It is necessary to decide:

- which visual structures use to map the data

• Some types of abstract data can be easily

mapped to a spatial location

• Examples:

. data with a topological or geographical structure

http://www.Visualcomplexity.com

http://www3.cancer.gov/atlas

• Many types of data don’t have an easy

correspondence with the dimensions of the

physical space around us

Page 14: Representation - Universidade de Aveiro

• Three structures must be defined in the visual mapping:

- spatial substrate

- graphical elements

- graphical properties

• Graphical substrate - dimensions in physical space where the visual representation

is created (can be defined in terms of axes and type of data)

• Graphical elements - anything visible appearing in the space

points, lines, surfaces, volumes

• Graphical properties – properties of the graphical elements to which the human retina

is very sensitive - retinal variables:

size, orientation, color, texture, and shape

Page 15: Representation - Universidade de Aveiro

- Spatial substrate axes (x, y, …)

type of data (quantitative, ordinal, categorical)

- Graphical elements points

lines

surfaces

volumes

- Graphical properties retinal variables:

size,

orientation

color (depends on physiology and culture)

texture

shape

Page 16: Representation - Universidade de Aveiro

Interpretation of Bertin’s guidance

regarding the suitability of various

encoding methods to support

common tasks (Spence, 2007)

The marks are perceived as PROPORTIONAL to each other

Association Selection Order Quantity

Size

Value

Texture

Colour

Orientation

Shape

The marks canbe perceived as SIMILAR

The marks are perceived as DIFFERENT,forming families

The marks are perceived as ORDERED

Page 17: Representation - Universidade de Aveiro

The relative difficulty of assessing quantitative value as a function of

encoding mechanism, as established by Cleveland and McGill (Spence, 2007)

Length

Position

Angle

Slope

Area

Volume

Colour

Density

Most accurate

Least accurate

Page 18: Representation - Universidade de Aveiro

Mackinlay’s guidance for the encoding of quantitative, ordinal and categorical data (Spence, 2007)

Quantitative

PositionLength

Angle

SlopeArea

Volume

Density

Shape

Ordinal

PositionDensity

Colour saturation

Texture

ConnectionContainment

Length

AngleSlope

AreaVolume

Colour hue

Categorical

PositionColour hue

Texture

ConnectionContainment

Density

Colour saturation

Shape

LengthAngle

SlopeArea

Volume

Treble

Ba ss

Page 19: Representation - Universidade de Aveiro

3. Creation of views :

• Views are the final result of the generation process

• Producing them corresponds to the computer graphics phase:

• Often the quantity of data to represent is too large for the available space

• To overcome this problem there are presentation techniques as:

• Zooming

• Panning

• Scrolling

• Focus + context

• Magic lenses

• ...

(Spence, 2007)

Page 20: Representation - Universidade de Aveiro

Designing a Visualization application

• The generation process must be preceded by good design

• The main problem in designing a visual representation is the choice of mapping,

as to:

- faithfully reproduce the information codified in the data

- help the user to attain their goals

• The visual representation suitable depends on:

- the nature of the data

- the users’ profile

- the type of information to represent

- how it will be used, …

Page 21: Representation - Universidade de Aveiro

Procedure to follow to create visual representations of abstract data

1. Define the problem

2. Examine the nature of the data to represent

3. Determine the number of attributes

4. Choose the data structures

5. Establish the type of interaction

Page 22: Representation - Universidade de Aveiro

• Five reasons for Tukey’s progress:

Cleveland, B., The Collected Works of John W. Tukey: Graphics 1965-1985, Vol. 5,

Chapman Hill, 1988 (p. xxxv)

• Tukey has defined the term Exploratory Data Analysis

Page 23: Representation - Universidade de Aveiro

quantitative

nature of the data to represent ordinal

categorical

univariate

number of attributes bivariate

trivariate

multivariate

linear

temporal

data structures spatial or geographical

hierarchical

network

,

static type of interaction transformable manipulable

communicate

nature of the problem explore

confirm

Next: representation methods

organized according the n. of

attributes

Page 24: Representation - Universidade de Aveiro

28

Common Visualization Techniques

for univariate, bivariate and trivariate data

Univariate data dot plot

box plot

bar chart

histogram

pie chart

Bivariate data scatter plot

line plot

time series

Trivariate data surface plot

contour plot

3D representation

bubble plot

Cherries

Apples

Kiwis

Grapes

Oranges

y

x

x

t

x

z

y

Page 25: Representation - Universidade de Aveiro

Representing univariate data

• A single number can be difficult to represent

ensuring a user is made aware of it

Example: the altimeter

(Spence, 2007)

The original aircraft altimeter,

responsible for many accidents Two altimeter representations

easily assumed to be the same

due to change blindness

2000

1600

2200

182000

stop1200

1400

The modern

aircraft altimeter,

Page 26: Representation - Universidade de Aveiro

Example of change blindness (Spence, 2007)

Page 27: Representation - Universidade de Aveiro

Example of change blindness (Spence, 2007)

Page 28: Representation - Universidade de Aveiro

- Innattentional blindness

https://www.youtube.com/watch?v=IGQmdoK_ZfY

- Change blindness

http://www.youtube.com/watch?v=vBPG_OBgTWg&feature=related

Page 29: Representation - Universidade de Aveiro

Representing univariate data (cont.)

• A more common situation consists in representing a set of values

• Well established techniques exist

• But new ones can be invented!

Example:

Price for a number of cars:

- dots on a linear scale

- box plot

(that will answer many questions:

median value, outliers,...)

60

50

40

30

20

10

Price (£K)

(Spence, 2007)

Dot plot Tukey boxplot

Page 30: Representation - Universidade de Aveiro

• much of the data is aggregated

• precise detail is often not needed

• We can represent derived values

• The histogram is a well known technique

representing derived values

• Tipping over the bars of the histogram a bargram is obtained

10-20 20-30 30-40 40-50 50-60

Price (£K)

2

4

6

8

(Spence, 2007)

10 - 12 12 - 14 16 - 18£kPrice

• Categorical or ordinal data can also be represented in bargrams

Nissan Ford Ferrari MG Cadillac

Missing characteristic:Empty bin

Page 31: Representation - Universidade de Aveiro

Simple (and common) representations of data

• Two common techniques not to be confused !

Histogram represents a distribution of numerical data

Bar chart represents the number of occurrences of a categorical/ordinal

data

Both represent data by rectangular bars(vertical or horizontal) with length

proportional to the values they represent

Cherries

Apples

Kiwis

Grapes

Oranges

Tons Marks of the Exam

Fruit production

Bins

0-2 3-4 5-6 7-8 9-10 11-12 13-14 15-16 17-18 19-20

(mark)

Page 32: Representation - Universidade de Aveiro
Page 33: Representation - Universidade de Aveiro

day Max T Min. T

1 15 7

2 14 8

3 13 6

4 13 6

5 12 6

6 13 7

7 13 7

8 14 8

9 15 5

10 12 5

11 13 6

12 12 7

13 11 8

14 11 8

15 12 8

16 12 9

17 13 9

18 14 9

19 14 8

20 13 8

21 13 8

22 12 7

23 12 7

24 11 7

25 11 6

26 11 7

27 13 6

28 14 6

39

Example: how to select simple charts?

Temperatures along the month of February (in ºC):

Q1- What were the maximum and minimum values of MaxT?

Q2- What was the most frequent Max temperature?

Q3- In how many days was that Max temperature attained?

day

n. of days

11 12 13 14 15 Max (ºC)

Which chart would you use to

answer Q1?

Q2? And Q3?

ºC

11 ºC

12

13

14

15

Page 34: Representation - Universidade de Aveiro

day Max T Min. T

1 15 7

2 14 8

3 13 6

4 13 6

5 12 6

6 13 7

7 13 7

8 14 8

9 15 5

10 12 5

11 13 6

12 12 7

13 11 8

14 11 8

15 12 8

16 12 9

17 13 9

18 14 9

19 14 8

20 13 8

21 13 8

22 12 7

23 12 7

24 11 7

25 11 6

26 11 7

27 13 6

28 14 6

40

Simple example

Temperatures along the month of February (in ºC):

day

Anything “odd” about this chart? ºC

day

ºC

Would you prefer this one?

Page 35: Representation - Universidade de Aveiro

Representing bivariate data

• The scatterplot is the conventional representation

Example: A collection of houses characterized by: - Price

- Number of bedrooms Each represented by a point on a two dimensional space

The axes are associated with these two attributes This representation affords awareness of: - general trends - local trade-offs - outliers

100 120 140 160

5 4 3 2 1

Price £

N. of bedrooms

(Spence, 2007)

Page 36: Representation - Universidade de Aveiro

• The time series is a special (and important) case of the scatterplot

• One of the axes represents time and the other some function of time

Example

Data set of 52 weekly stock prices

for 1430 stocks

The graph overview shows the entire

data providing some idea of densities

and distributions

Timeboxes limit the display to items

according to some time and price criteria

(Spence, 2007)

Page 37: Representation - Universidade de Aveiro

• Alternative representation of a time series

• More suited for gaining an initial impression of data

Example:

Level of ozone concentration over 10 years

above Los Angeles

- each square represents one day

- and is colored to indicate the ozone level

It is apparent that:

- ozone levels are higher in Summer

- the concentration has been decreasing

(Spence, 2007)

Page 38: Representation - Universidade de Aveiro

• If one attribute is more important than the other or must be examined first,

• it may be appropriate to employ logical or semantic zoom

Example:

Analyzing a list of cars:

- price is the first attribute to examine

- semantic zoom reveals data about

a second attribute

• This technique is quite general: it can encompass many attributes and many levels of progressive zoom

60

50

20

10

Pri ce (£ K)

40

30

40

30

35

FordNissanVWM erc

JagJagFordSE AT

(Spence, 2007)

Page 39: Representation - Universidade de Aveiro

Example: Zoom in Google Maps

Page 40: Representation - Universidade de Aveiro

Representing Trivariate data

• Since we live in a 3D world, representing trivariate data as points in a 3D

space and displaying a 2D view is natural

Price

Time

Bedrooms

AB

C

D• However, these representations

can be ambiguous

• This can be solved by interaction,

allowing the user to reorient the

representation

“for 3D to be useful, you’ ve got to

be able to move it” (Spence, 2007)

Page 41: Representation - Universidade de Aveiro

• Interaction (brushing) can help – objects identified in one view are highlighted

in the other two planes

• change blindness must be taken into account and ensure that the user

notices the highlight in the other two planes

The highlighting of houses in one

plane is brushed into the remaining

planes.

(Spence, 2007)

Page 42: Representation - Universidade de Aveiro

“Augmenting” a scatterplot: representing 5 variables

Hans Rosling's 200 Countries, 200 Years, 4 Minutes: 120 000 values

https://www.youtube.com/watch?v=jbkSRLYSojo

Income (x), Age expectancy (y) , Time (t), Continent (colour), Population (size of circle)

Page 43: Representation - Universidade de Aveiro

• An alternative representation for trivariate (and hypervariate) data is a

structure formed from the three possible 2D views of the data

Price

Time

Bedrooms

AB

C

D

Scatterplot matrix

Example: houses (price, number of

bedrooms, time of journey to work )

A

B C

D

Price

Bedrooms

(Spence, 2007)

Page 44: Representation - Universidade de Aveiro

Simple representations of a function

(field) of two variables

57

• Contour plots

• contour line (also isoline, isopleth,

or equipotential curve) of a a function of

two variables is a curve along which the

function has a constant value, so that the

curve joins points of equal value.

• Typical in meteorological charts (isobars

and isothermal curves)

• and maps (to represent altitude or depth)

Page 45: Representation - Universidade de Aveiro

58

• Surface plots

• May be combined with color

Page 46: Representation - Universidade de Aveiro

Population of major cities in England,

Wales and Scotland. Circle area is

proportional to population. (Spence, 2007)

• A special category of trivariate data:

maps (latitute and longitude + a value)

Things that “pop-out”

Page 47: Representation - Universidade de Aveiro

Pre-attentive processing: Things that “pop out”

“We can do certain things to symbols to make it much more likely that they will

be visually identified even after a very brief exposure” (Ware, 2004)

Orientation

Shape Enclosure

Colour

Where is the blue square?

(Spence, 2007)

Page 48: Representation - Universidade de Aveiro

Representing Hypervariate (or multivariate) data

• Many real problems are of high dimensionality

• The challenge of representing hypervariate data is substantial and continues

to stimulate invention

• Some of the mentioned representation techniques can be scaled to represent

hypervariate data (to a limited extent)

Page 49: Representation - Universidade de Aveiro

Techniques for Hypervariate (or multivariate) data Visualization

• Coordinate plots parallel coordinate plots

star plots

• Scatterplot Matrix

• Linked histograms

• Mosaic Plots

• Icons

Page 50: Representation - Universidade de Aveiro

• Parallel coordinates plots are one of the most popular techniques for

hypervariate data

• They have a very simple basis

(Spence, 2007)

Page 51: Representation - Universidade de Aveiro

Consider a simple case of bivariate data:

1- A scatterplot represents the price and number of

bedrooms associated with two houses

2- the axes are detached and made parallel; each

house is represented by a point on each axis

3- To avoid ambiguity the pair of points representing a

house are joined and labeled

A

B

Price

Numberofbedrooms

Price Numberofbedrooms

Price Numberofbedrooms

A

B

Page 52: Representation - Universidade de Aveiro

• For objects characterized by many attributes the parallel coordinate plots

offer many advantages

A example for six objects, each characterized by seven attributes:

A B C D E F G

The trade-off between A and B, and the correlation between B and C, are

immediately apparent. The trade-off between B and E, and the correlation

between C and G, are not.

trade-off between A and B

correlation between B and C

objects

attributes

Page 53: Representation - Universidade de Aveiro

A parallel coordinate plot representation of a collection of cars, in which a

range of the attribute Year has been selected to cause all those cars

manufactured during that period to be highlighted.

Page 54: Representation - Universidade de Aveiro

Properties of parallel coordinate plots:

• Suitable to identify relations between attributes

• Objects are not easily discriminable; each object is represented by a polyline

which intersects many others

• They offer attribute visibility (the characteristics of the separate attributes are

particularly visible)

• The complexity of parallel coordinate plots (number of axes) is directly

proportional to the number of attributes

• All attributes receive uniform treatment

Page 55: Representation - Universidade de Aveiro

• Star plots have many features in common

with parallel coordinate plots

• An attribute value is represented by a point

on a coordinate axis

• Attribute axes radiate from a common origin

• For a given object, points are joined by straight lines

• Other useful information such as average values or thresholds can be

encoded

Sport

Literature

Mathematics

Physics

History

Geography

Art

Chemistry

(Spence, 2007)

object

Page 56: Representation - Universidade de Aveiro

Properties of star plots:

• Their shape can provide a reasonably rapid appreciation of the attributes of

the objects

• They offer object visibility and are suitable to compare objects

(by visibility it is meant the

ability to gain insight pre-attentively;

without a great cognitive effort)

Sport

Literature

Mathematics

Physics

History

Geography

Art

Chemistry

Bob’s performance Tony’s performance

(Spence, 2007)

Page 57: Representation - Universidade de Aveiro

• The scatterplot matrix is applicable to higher dimensions

• However, as the number of attributes increase, the number of different pairs

of attributes increases rapidly:

– 2 attributes -> 1 scatterplot

– 3 attributes -> 3 scatterplots

– 4 attributes -> 6 scatterplots

Scatterplot matrix for 6 attributes of a car dataset

Page 58: Representation - Universidade de Aveiro

• A single scatterplot can be

used together with other

encoding techniques to

represent data of higher

dimension

https://spotfire.tibco.com/resources

/product-demonstration-

interactive/expense-analytics

https://www.mathworks.com/matla

bcentral/fileexchange/48005-

bubbleplot-multidimensional-

scatter-plots

Page 59: Representation - Universidade de Aveiro

Details of the Titanic disaster involving 4 attributes:

• Gender

• Survival

• Class

• Adult/child

Survived Age Gender

Class

1st 2nd 3rd Crew

NoYesNoYesNoYesNoYes

Adult

Child

Adult

Child

Male

Female

118 57 0 5 4140 0 1

154 14 0 11 13 80 0 13

387 75 35 13 89 76 17 14

670192 0 0 3 20 0 0

Mosaic plots can be demonstrated by an example using data from the Titanic (http://www.math.yorku.ca/SCS/Online/mosaics/about.html)

(https://en.wikipedia.org/wiki/Mosaic_plot)

(Spence, 2007)

Page 60: Representation - Universidade de Aveiro

Steps in the creation of a Mosaic Plot representing the Titanic disaster.

2201 885706285325

First Second Third Crew

Female

Male

First Second Third Crew

Female

Male

Adult

First Second Third Crew

Ch ild

Survived

Died

Survived

Died

(a) (b)

(c)(d)

Total number of passengers + crew

Page 61: Representation - Universidade de Aveiro

Chernoff Faces allow attribute values to be encoded in the features of

cartoon faces

They were originally used to study geological samples, each characterized

by 18 attributes

Icons represent a number of attributes qualitatively or quantitatively

(https://en.wikipedia.org/wiki/Chernoff_face)

y

x

A stranger

(Tufte, 2001)

Page 62: Representation - Universidade de Aveiro

Multidimensional icons representing eight attributes of a dwelling

house£400,000garagecentral heatingfour bedroomsgood repairlarge gardenVictoria 15 mins

flat£300,000no garagecentral heatingtwo bedroomspoor repairsmall gardenVictoria 20 mins

houseboat£200,000no garageno central heatingthree bedroomsgood repairno gardenVictoria 15 mins

Textual descriptions of the dwellings represented by the multidimensional icons (Spence, 2007)

Page 63: Representation - Universidade de Aveiro

• Data related glyphs (icons) seem more favored by users (Siirtola, 2005)

(doi>10.1016/j.intcom.2006.03.006)

• Direct metaphorical icons find wide application (Miller and Stasko, 2001)

http://www.youtube.com/watch?v=w_J2ojj5Fu0

• Two examples of metaphorical icons:

- with direct relation between icon and object (house icon)

- no direct relation between facial features and attributes they represent

(Chernoff faces)

The InfoCanvas

Page 64: Representation - Universidade de Aveiro

Example:

Use visualization techniques to help answer the following questions:

Is there a relation between wanted salary and experience?

How many candidates ask for a salary in [30000, 50000] and in [55000, 75000]?

How many candidates have an advanced level of English?

Education

Age

Prof.

Experience

Gender

English Wanted

salary

# (MSc/PhD) (years) (years) (F/M) (Bas/Adv) ($$/year)

1 MSc 22 0 M Advanced 36000

2 MSc 23 0 M Basic 36000

3 MSc 24 1 F Advanced 36000

4 PhD 30 7 F Advanced 72000

5 MSc 25 1 M Basic 40000

6 PhD 29 5 M Advanced 60000

7 MSc 31 7 M Advanced 55000

8 MSc 23 0 F Advanced 36000

9 MSc 26 2 F Intermediate 40000

10 PhD 32 9 M Intermediate 65000

11 BSc 30 7 M Intermediate 30000

12 PhD 40 17 F Advanced 80000

13 MSc 28 4 M Advanced 40000

… the complete table has many more candidates, but you may test with these …

Page 65: Representation - Universidade de Aveiro

Bibliography

Main Bibliography

• Spence, R., Information Visualization, Design for Interaction, 2nd ed.,

Prentice Hall, 2007

• Munzner, T., Visualization Analysis and Design, A K Peters/CRC Press, 2014

• Mazza, R., Introduction to Information Visualization, Springer, 2009

• Ware, C., Information Visualization, Perception to Design, 2nd ed.,Morgan

Kaufmann,2004

• Tufte, E., The Visual Display of Quantitative Information, 2nd ed., Graphics

Press, 2001

Acknowledgement The author of these slides is very grateful to Professor Robert Spence as he

provided the electronic version of his book figures

Page 66: Representation - Universidade de Aveiro

Links

• http://www.datavis.ca/

• http://www.infovis-wiki.net/

Prize winning pie chart! Mandelblubs: 3D Fractals