Ethics of data representation v2.0. Collect Raw Data Process and Filter Data Clean Dataset...


Page 1

Ethics of data representation

v2.0

Page 2

Data Visualisation Process

Collect Raw Data → Process and Filter Data → Clean Dataset → Exploratory Analysis → Generate Conclusion → Generate Visualisation

Page 3

What does ethics mean when it comes to data visualisation?

• The figure/graph/image should show what is actually happening, not what you want to happen.

• Different ways of being unethical:
– knowingly:
• deliberately showing the data in a misleading manner,
• choosing the ‘most representative’ image/experiment.
– unknowingly:
• not exploring/getting to know the data well enough,
• misusing your chosen graphical representation.

Page 4

Cheating knowingly: Choice of graph

• Hypothesis (what you want to see): applying a treatment will decrease the levels of a variable.

[Figure: two panels showing the same Before/After values (y axis 0–1400) for Exp1–Exp4. One is captioned "You know what is going on", the other "You choose to plot your data like that".]
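The same four experiments can tell two different stories depending on the graph. The minimal Python sketch below uses hypothetical Before/After values (not the slide's data) to show how a bar chart of the means can suggest a clean decrease while the paired, per-experiment view shows that half of the experiments went up:

```python
from statistics import mean

# Hypothetical paired measurements (NOT the slide's data): one Before/After pair per experiment
before = {"Exp1": 1000, "Exp2": 400, "Exp3": 1200, "Exp4": 300}
after = {"Exp1": 200, "Exp2": 500, "Exp3": 300, "Exp4": 400}


def summarise(before, after):
    """Return the two bar heights (group means) and each experiment's own change."""
    mean_before = mean(list(before.values()))
    mean_after = mean(list(after.values()))
    changes = {exp: after[exp] - before[exp] for exp in before}
    return mean_before, mean_after, changes


mean_before, mean_after, changes = summarise(before, after)
went_up = [exp for exp, delta in changes.items() if delta > 0]

print(f"Bar chart of means: Before = {mean_before}, After = {mean_after}")
print(f"Per-experiment changes: {changes}")
print(f"Experiments that increased: {went_up}")
```

The bars drop from 725 to 350, yet Exp2 and Exp4 both increased. Drawing one Before→After line per experiment keeps that information; the bar chart of means throws it away.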

Page 5

Cheating knowingly: Choice of axis/scale

• You want to show an increase in salary over the last term.

[Figure: monthly Salary from June to December, plotted twice. "You know what is going on": y axis from 0 to 25,000. "You choose to plot your data like that": y axis from 19,200 to 20,200.]
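How much a truncated axis exaggerates a change can be computed directly. This sketch assumes hypothetical June and December salaries that fall inside the slide's 19,200–20,200 axis range; it compares the real relative increase with the increase as drawn when the y axis starts at 19,200 instead of 0:

```python
def drawn_ratio(start, end, axis_min):
    """How many times taller the 'end' bar/point is drawn than the 'start' one
    when the y axis begins at axis_min instead of 0."""
    return (end - axis_min) / (start - axis_min)


june, december = 19_400, 19_900  # hypothetical salaries, not the slide's data

real = drawn_ratio(june, december, axis_min=0)        # honest axis starting at 0
shown = drawn_ratio(june, december, axis_min=19_200)  # truncated axis

print(f"Real increase:                      {real - 1:.1%}")
print(f"Increase as drawn (axis at 19,200): {shown - 1:.0%}")
```

A raise of about 2.6% is drawn as a 250% jump: the December value sits 3.5 times higher above the bottom of the plot than the June value.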

Page 6

Cheating knowingly: Choice of axis/scale

• Be careful with linear vs. logarithmic scales.

Page 7

Cheating knowingly: Choice of axis/scale

• If you want to cheat, a bar graph using a log axis is a great tool, as it lets you either exaggerate differences between groups or minimize them.

[Figure: the same bar graph drawn on a linear scale and on a logarithmic scale.]
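A log axis has no zero, so the bottom of the axis (where every bar starts) is an arbitrary choice, and that choice sets how different the bars look. The sketch below assumes two hypothetical group values where B is exactly 10 times A, and reports how much taller bar B is drawn for several axis minimums:

```python
import math


def log_bar_height(value, axis_min):
    """Drawn length of a bar on a log10 axis whose bottom edge sits at axis_min."""
    return math.log10(value) - math.log10(axis_min)


group_a, group_b = 10, 100  # hypothetical values: B is exactly 10x A

for axis_min in (9, 1, 0.1, 0.001):
    ratio = log_bar_height(group_b, axis_min) / log_bar_height(group_a, axis_min)
    print(f"axis starts at {axis_min:g}: bar B drawn {ratio:.2f}x as tall as bar A")
```

The same tenfold difference can be drawn as roughly 23x (axis starting at 9) or as only 1.25x (axis starting at 0.001), which is exactly the exaggerate-or-minimise lever described above.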

Page 8

Cheating knowingly: Choice of axis/scale

• A logarithmic axis should be used for:
– lognormal data,
– logarithmically spaced values.
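Why lognormal data suit a log axis can be checked numerically: on a linear axis the values are strongly right-skewed, while their logarithms are symmetric (normal). This is a sketch using a hypothetical simulated sample with log(x) ~ Normal(2, 1) and a simple moment-based skewness:

```python
import math
import random


def skewness(values):
    """Sample skewness (population-moment version): 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)
# Hypothetical lognormal sample: log(x) is Normal(mu=2, sigma=1)
sample = [rng.lognormvariate(2, 1) for _ in range(10_000)]
log_sample = [math.log(x) for x in sample]

print(f"skewness on a linear axis: {skewness(sample):.2f}")
print(f"skewness on a log axis:    {skewness(log_sample):.2f}")
```

For logarithmically spaced values such as a tenfold dilution series (1, 10, 100, 1000), the base-10 logarithms are 0, 1, 2, 3, so the points sit at equal distances on a log axis instead of bunching up near zero.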

Page 9

Cheating knowingly: Manipulating images: Western blot

• Presenting bands out of context.
• ‘Playing’ too much with contrast.
• ‘Rebuilding’ a Western blot from several cuts.

[Figure: the same blot shown with original brightness and contrast, with adjusted brightness and contrast, and adjusted too much (oversaturation).]
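The "adjusted too much" case can be reproduced with a few numbers. This sketch assumes a simple linear gain with clipping to the 8-bit range 0–255 (real image software may apply more complex curves) and hypothetical band intensities; once values hit the ceiling, bands that differed become indistinguishable:

```python
def adjust(pixels, gain, offset=0, max_value=255):
    """Linear brightness/contrast change (gain * p + offset), clipped to 0..max_value."""
    return [min(max_value, max(0, round(gain * p + offset))) for p in pixels]


bands = [120, 160, 200]  # hypothetical mean intensities of three bands in an 8-bit image

print(adjust(bands, gain=1.2))  # [144, 192, 240]: differences between bands preserved
print(adjust(bands, gain=2.0))  # [240, 255, 255]: the two strongest bands now look identical
```

Saturation destroys information irreversibly: after the second adjustment there is no way to tell that the third band was stronger than the second.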

Page 10

Cheating unknowingly: Not exploring/getting to know the data well enough

[Figure: three bar charts comparing CondA and CondB (y axes 0–70, 0–120 and 0–120).]

• Hypothesis: an increase from CondA to CondB. You run the experiment once and choose to plot the data as a bar chart.
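Running an experiment once says little about whether an apparent increase is real. The sketch below simulates many repeats of a hypothetical experiment in which CondA and CondB have the same true mean (no effect at all); the sample size, mean and spread are assumptions chosen for illustration:

```python
import random
from statistics import mean


def one_experiment(rng, n=5, mu=50, sd=15):
    """Hypothetical experiment with NO real effect: CondA and CondB share the same mean."""
    cond_a = [rng.gauss(mu, sd) for _ in range(n)]
    cond_b = [rng.gauss(mu, sd) for _ in range(n)]
    return mean(cond_b) - mean(cond_a)


rng = random.Random(42)
differences = [one_experiment(rng) for _ in range(1_000)]

share_up = sum(d > 0 for d in differences) / len(differences)
share_big_up = sum(d > 10 for d in differences) / len(differences)
print(f"runs where CondB > CondA:               {share_up:.0%}")
print(f"runs with an 'increase' above 10 units: {share_big_up:.0%}")
```

About half of the single runs "confirm" the hypothesis, and a noticeable fraction show a sizeable increase, even though nothing is going on. Only repeating the experiment and looking at the spread of the data reveals this.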

Page 11

Cheating unknowingly: Not exploring/getting to know the data well enough

[Figure: three views of Control vs. Treatments 1–3. A bar chart of Value with p-values for the treatments vs. control (p=0.04, p=0.32, p=0.001), titled "Comparisons: Treatments vs. Control"; Value plotted for each of Exp1–Exp5; and standardised values (−100 to 100) for Treat1–Treat3.]
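The slide does not say how its "standardised values" were computed; one common choice, assumed here, is the percent change of each treatment relative to the control of the same experiment. With hypothetical data whose baselines differ a lot between experiments, pooling hides a perfectly consistent effect that the within-experiment view makes obvious:

```python
from statistics import mean, stdev

# Hypothetical data: one value per experiment (Exp1..Exp5); baselines differ a lot
control = [20, 60, 100, 40, 80]
treatment = [30, 90, 150, 60, 120]  # each is 50% above its own experiment's control


def percent_change(ctrl, treat):
    """Within-experiment change relative to that experiment's control, in %."""
    return [100 * (t - c) / c for c, t in zip(ctrl, treat)]


print(f"pooled control:   {mean(control):.0f} ± {stdev(control):.0f} (SD)")
print(f"pooled treatment: {mean(treatment):.0f} ± {stdev(treatment):.0f} (SD)")
print(f"% change per experiment: {percent_change(control, treatment)}")
```

Pooled, the two bars (60 ± 32 vs. 90 ± 47) overlap heavily; per experiment, every single one shows the same +50%.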

Page 12

Types of plot

Things you can illustrate

Page 13

Plot types – Distribution/Exploration: Histograms

• Very good for exploring data; works best on big datasets.
• Rule of thumb: number of intervals ≈ √N and interval width ≈ range ÷ √N.
• Histograms are great, but be careful with the resolution (= number of bins), as it affects the apparent shape of the distribution.
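The square-root rule above translates directly into code. In this sketch, rounding the number of intervals up with `ceil` is an assumption (the rule only says "≈"), and the data are a hypothetical evenly spread sample of N = 100 values:

```python
import math


def sqrt_rule(values):
    """Square-root rule of thumb: about sqrt(N) intervals, each about range / sqrt(N) wide."""
    k = math.ceil(math.sqrt(len(values)))
    width = (max(values) - min(values)) / k
    return k, width


def histogram(values, k):
    """Counts in k equal-width bins spanning min..max (the maximum goes in the last bin)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        counts[min(int((v - lo) / width), k - 1)] += 1
    return counts


data = list(range(100))  # hypothetical: N = 100 evenly spread values
k, width = sqrt_rule(data)
print(k, width)            # 10 intervals, each 9.9 wide
print(histogram(data, k))
```

NumPy's `numpy.histogram` also accepts `bins="sqrt"` for the same rule of thumb. Either way, it is only a starting point; trying a few other bin counts is the quickest way to check that the shape you see is not an artefact of the resolution.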

Page 14

Plot types – Distribution/Exploration: Histograms

• Be careful with the resolution … and with the type of data you are dealing with.

[Figure: the same values (0–10) plotted as histograms with bin width = 1, 1.25 and 1.5; y axis: Number of values.]

• Histograms are great, but be careful with discrete data.
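With discrete data, a bin width that is not a multiple of the spacing between possible values makes some bins catch two values and others only one, creating peaks that do not exist. This sketch uses hypothetical, perfectly flat integer data from 0 to 10 and the bin widths of 1.25 and 1.5 from the figure:

```python
def bin_counts(values, start, width, n_bins):
    """Counts in bins [start + i*width, start + (i+1)*width); the last bin also takes the top edge."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - start) // width), n_bins - 1)] += 1
    return counts


# Hypothetical discrete data: every integer from 0 to 10 observed exactly 3 times (perfectly flat)
scores = [s for s in range(11) for _ in range(3)]

print(bin_counts(scores, start=0, width=1.25, n_bins=8))  # [6, 3, 3, 3, 6, 3, 3, 6]
print(bin_counts(scores, start=0, width=1.5, n_bins=7))   # [6, 3, 6, 3, 6, 3, 6]
```

A perfectly uniform distribution turns into a spiky or alternating pattern purely because of the binning.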

Page 15

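Plot types – Distribution/Exploration: Boxplots and Bean plots

[Figure: boxplots of Length (cm), y axis 60–110, for Male and Female, annotated with the parts of a boxplot.]

• Median.
• Lower Quartile (Q1): 25th percentile (1st quartile).
• Upper Quartile (Q3): 75th percentile (3rd quartile).
• Interquartile Range (IQR): the box, holding the middle 50% of the data.
• Minimum and Maximum: the ends of the whiskers.
• Cutoff = Q1 − 1.5 × IQR (and Q3 + 1.5 × IQR at the top): points beyond the cutoffs are drawn individually as outliers.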
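The boxplot annotations above can be computed in a few lines. This is a sketch on a hypothetical sample of lengths; the quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, and other tools may use a slightly different quartile definition, so their numbers can differ a little:

```python
from statistics import quantiles


def boxplot_stats(data, k=1.5):
    """Tukey-style boxplot summary with cutoffs at Q1 - k*IQR and Q3 + k*IQR."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_cut, high_cut = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_cut <= x <= high_cut]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_min": min(inside),  # the 'Minimum' drawn on the plot
        "whisker_max": max(inside),  # the 'Maximum' drawn on the plot
        "outliers": [x for x in data if x < low_cut or x > high_cut],
    }


lengths_cm = [72, 75, 78, 80, 81, 83, 85, 88, 90, 108]  # hypothetical sample
print(boxplot_stats(lengths_cm))
```

Here Q1 = 78.5, Q3 = 87.25 and IQR = 8.75, so the upper cutoff is 100.375: the whisker stops at 90 and 108 is drawn as an outlier.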

Page 16

Plot types – Distribution/Exploration: Boxplots and Bean plots

[Figure: boxplots and bean plots of bimodal, uniform and normal distributions.]

• A bean = a ‘batch’ of data.
• Data density is mirrored by the shape of the polygon.
• A scatterplot shows the individual data points.

• Very good for exploring data; works best on medium-size datasets.
• Boxplots are great, but be careful with the underlying distribution.
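A boxplot only draws five numbers, so very different distributions can produce identical boxes. This sketch uses two hypothetical datasets, one evenly spread and one clustered around 2 and 6, and the same `statistics.quantiles` quartile convention as before:

```python
from collections import Counter
from statistics import quantiles


def five_numbers(data):
    """Minimum, Q1, median, Q3, maximum: everything a basic boxplot draws (no outliers here)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical
two_clusters = [0, 2, 2, 2, 4, 6, 6, 6, 8]   # hypothetical, clustered at 2 and 6

print(five_numbers(evenly_spread))  # (0, 2.0, 4.0, 6.0, 8)
print(five_numbers(two_clusters))   # (0, 2.0, 4.0, 6.0, 8)
print(Counter(two_clusters))        # the clusters that a bean plot would reveal
```

Both boxplots would look the same; the bean plot's density outline and individual points are what expose the two clusters.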

Page 17

Plot types – Exploration/Comparison: Stripcharts/Scatterplots

[Figure: stripchart of Values (0.0–2.0) for Control and CondA–CondD.]

• Very good for exploring data; works best on small/medium datasets.
• Very informative: exploration AND comparison.
• Very hard to cheat with these.
• Stripcharts are great, but they don't work so well with big samples.
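Why stripcharts struggle with big samples comes down to overplotting: a strip has only so many distinguishable positions. This rough sketch assumes a hypothetical 200-pixel-tall axis from 0.0 to 2.0 and uniformly spread values, and counts how many distinct pixel rows the points occupy relative to the number of points:

```python
import random


def visible_fraction(n, pixels=200, lo=0.0, hi=2.0, seed=0):
    """Distinct pixel rows occupied by n points in one strip, divided by n.
    Assumes a hypothetical axis `pixels` tall from lo to hi and uniformly spread values."""
    rng = random.Random(seed)
    values = [rng.uniform(lo, hi) for _ in range(n)]
    occupied = {min(int((v - lo) / (hi - lo) * pixels), pixels - 1) for v in values}
    return len(occupied) / n


for n in (20, 200, 10_000):
    print(f"n = {n:>6}: distinct positions per point = {visible_fraction(n):.0%}")
```

With 20 points almost every point gets its own position; with 10,000 points at most 2% can, and the strip turns into a solid bar.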

Page 18

Plot types – Comparisons: Barcharts

[Figure: four versions of a bar chart of Control and CondA–CondD, captioned Standard deviation, Standard error, Confidence interval and Star wars (cool graph!), the last one drawn with horizontal bars.]
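The three error bars in the figure measure different things, so a bar chart should always say which one it uses. This sketch computes all three for one hypothetical sample of five values: SD (spread of the individual values), SE = SD / √n (uncertainty of the mean) and a 95% confidence interval for the mean. The t critical value is hard-coded for n = 5 (4 degrees of freedom), which is an assumption of this sketch:

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_95_DF4 = 2.776  # two-sided 95% Student's t critical value for 4 degrees of freedom


def error_bar_half_widths(sample):
    """Half-widths of three common error bars for a sample of exactly 5 values."""
    if len(sample) != 5:
        raise ValueError("T_CRIT_95_DF4 is only valid for n = 5 (4 degrees of freedom)")
    sd = stdev(sample)           # spread of the individual values
    se = sd / sqrt(len(sample))  # uncertainty of the mean
    ci95 = T_CRIT_95_DF4 * se    # half-width of the 95% confidence interval for the mean
    return {"mean": mean(sample), "sd": sd, "se": se, "ci95": ci95}


cond_a = [1.2, 1.5, 1.9, 2.1, 2.3]  # hypothetical measurements
for name, value in error_bar_half_widths(cond_a).items():
    print(f"{name:>4}: {value:.3f}")
```

For the same bar, the SE bar is less than half as long as the SD bar, and the 95% CI bar is longer than the SE bar, so identical data can look far more or far less convincing depending on the (often unstated) choice.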

Page 19

Plot types – Comparisons: Barcharts

• Very good for presenting results and emphasizing differences.
• Effectiveness: put the most important information in the most effective channel.
• Barcharts are great, but only after data exploration, and the y axis needs to be chosen wisely.
• Be careful with the scale when plotting ratios.

[Figure: the same values plotted as ratio (y axis 0–6) and as log2(ratio) (y axis −3 to 3), against an x axis from 0 to 100.]
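On a linear axis every decrease is squeezed between 0 and 1 while increases stretch upwards without limit, so halving looks smaller than doubling. Taking log2 of the ratio makes them symmetric around 0. A minimal sketch with hypothetical fold changes:

```python
import math

# Hypothetical fold changes: each value below 1 mirrors one above 1 (halving vs doubling)
ratios = [0.25, 0.5, 1, 2, 4]

for r in ratios:
    print(f"ratio {r:>4}: distance from 1 on a linear axis = {r - 1:+.2f}, "
          f"log2(ratio) = {math.log2(r):+.0f}")
```

A fourfold decrease (0.25) sits only 0.75 below 1 on the linear axis but 3.00 above it for a fourfold increase; on the log2 axis they sit at −2 and +2.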

Page 20

Plot types – Relationship/Comparison: Line graphs

Except for exploration …

[Figure: three line graphs: Value for Control and Treatments 1–3 across 5 experiments; an arbitrary change over time (x axis 0–100); and Percent survival over Time for CaPO, CaPA, CaPOA and CaP.]

• Very good for presenting results of matched/paired/repeated data.
• Line charts are great, but be careful with the axes.

Page 21

Plot types – Relationships: Scatterplot

• Very good for understanding the relationship between quantitative variables.
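A classic reason to look at the scatterplot rather than a single summary number is Anscombe's quartet (F. J. Anscombe, 1973). This sketch computes Pearson's r for data sets I and II of the quartet: the coefficients are essentially identical, yet the scatterplots look completely different:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Anscombe's quartet, data sets I and II (they share the same x values)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

print(f"r(x, y1) = {pearson_r(x, y1):.3f}")
print(f"r(x, y2) = {pearson_r(x, y2):.3f}")
```

Both give r ≈ 0.816, but plotted, set I is a noisy straight line and set II is a smooth curve; only the scatterplot shows the difference.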

Page 22

Plot types – Relationships: Scatterplots

• Scatterplots are great, but big data can be tricky: with many points, they overlap and hide where the data are densest.
• Solution: a smoothed-density colour representation.
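The idea behind a density colour representation is to colour regions by how many points fall there instead of drawing every point. The sketch below swaps in a simpler, unsmoothed variant (counting points per square grid cell) rather than a true smoothed density; the cell size and the simulated "big data" are hypothetical:

```python
import random
from collections import Counter


def grid_counts(points, cell=0.5):
    """Number of points in each square grid cell of side `cell`.
    A simple, unsmoothed stand-in for a smoothed density: colour each cell by its count."""
    return Counter((int(x // cell), int(y // cell)) for x, y in points)


rng = random.Random(0)
# Hypothetical "big data": 50,000 points, far too many to read as individual dots
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50_000)]

counts = grid_counts(points)
densest_cell, densest_count = counts.most_common(1)[0]
print(f"{len(counts)} occupied cells; densest cell {densest_cell} holds {densest_count} points")
```

Mapping each cell's count to a colour scale gives a readable picture where the raw scatterplot would be a solid blob; smoothing the counts (for example with a kernel density estimate) removes the blocky grid.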

Page 23

Plot types – Relationships: Heatmaps

• Great for big datasets: they let you plot a third quantitative value as colour, and the colour scheme can also be used for grouping.

[Figure panels: Euclidean distance | Correlation | Colour scheme]

• Heatmaps are great, but only plot data that are changing.
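The figure compares heatmaps grouped with Euclidean distance and with correlation, and the choice changes which rows end up next to each other. This sketch uses three hypothetical heatmap rows: Euclidean distance groups rows by magnitude, correlation distance (1 − Pearson r) groups them by pattern:

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance: sensitive to the size of the values."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson r: 0 for identical patterns, 2 for opposite patterns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return 1 - sab / sqrt(saa * sbb)


# Hypothetical heatmap rows (e.g. one row per gene, one column per sample)
row_a = [1, 2, 3, 4]
row_b = [10, 20, 30, 40]  # same pattern as row_a, ten times larger
row_c = [4, 3, 2, 1]      # similar magnitude to row_a, opposite pattern

print(f"Euclidean:   a-b {euclidean(row_a, row_b):.1f}, a-c {euclidean(row_a, row_c):.1f}")
print(f"Correlation: a-b {correlation_distance(row_a, row_b):.1f}, "
      f"a-c {correlation_distance(row_a, row_c):.1f}")
```

By Euclidean distance row_a clusters with row_c (4.5 vs. 49.3); by correlation it clusters with row_b (0.0 vs. 2.0).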

Page 24

A heatmap is basically a table that has colours in place of numbers.

Simon’s data: from simple numbers to correlation
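"A table with colours in place of numbers" can be sketched in plain text, with characters standing in for colours from light to dark. The equal-width value bins and the five-character palette are assumptions of this sketch; real heatmaps usually use a continuous colour scale:

```python
def to_shades(table, palette=" .:*#"):
    """Replace each number by a character from `palette`, light (low) to dark (high),
    using equal-width value bins between the table's minimum and maximum."""
    lo = min(min(row) for row in table)
    hi = max(max(row) for row in table)
    span = (hi - lo) or 1  # avoid dividing by zero when every value is the same
    steps = len(palette)
    return ["".join(palette[min(int((v - lo) / span * steps), steps - 1)] for v in row)
            for row in table]


table = [[0, 1, 2], [2, 3, 4], [4, 4, 0]]  # hypothetical numbers
for line in to_shades(table):
    print(f"|{line}|")
```

The same table of numbers becomes a pattern of light and dark cells, which is easier to scan than the digits once the table gets large.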

Page 25

Plot types – Composition: Stack charts/Pie charts

[Figure: two pie charts of categories A–E (Total = 62 in each) and a stacked bar chart of Percentage (0–100) for Group A and Group B, split into categories A–E.]

• Stack/pie charts are great, but keep an eye on the sample size.
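Percentages hide the sample size: a group of 4 and a group of 400 can give identical pie charts, but one extra observation moves the small group's chart dramatically. A minimal sketch with hypothetical counts:

```python
def percentages(counts):
    """Share of each category in %, as drawn by a pie chart or 100% stacked bar."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


small = {"A": 1, "B": 1, "C": 2}        # hypothetical group, n = 4
large = {"A": 100, "B": 100, "C": 200}  # hypothetical group, n = 400

print(percentages(small) == percentages(large))  # True: identical charts

# One extra observation in category A:
print(percentages({**small, "A": 2}))    # A jumps from 25% to 40%
print(percentages({**large, "A": 101}))  # A barely moves (about 25.2%)
```

Printing the total next to each chart, as the slide does with "Total=62", lets readers judge how stable the proportions are.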

Page 26