
UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA SCHOOL OF SOFTWARE ENGINEERING OF USTC

Data Visualisation & Interpretation

The art of reading datasets

Devert Alexandre, School of Software Engineering of USTC

14 February 2012


Descriptive statistics

Descriptive statistics help to give a general summary of data


Mean

Example of a descriptive statistics quantity: the arithmetic mean

$$\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i = \frac{1}{n}(a_1 + a_2 + \cdots + a_n)$$


Mean

The mean is defined in $\mathbb{R}^n$ ⇒ geometric center
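As a small illustration, the mean of points in several dimensions can be sketched by averaging each coordinate separately. The function name `centroid` and the sample points are assumptions made for this example.

```python
def centroid(points):
    """Arithmetic mean of equal-length points, one coordinate at a time."""
    n = len(points)
    # zip(*points) regroups the data by dimension: all x values, all y values, ...
    return [sum(coords) / n for coords in zip(*points)]

# Four corners of a rectangle: the mean is the center of the rectangle
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(centroid(points))   # [2.0, 1.0]
```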


Mean computation

Do you think it is easy to compute the mean?

0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1


Mean computation

A naive summation algorithm will return this

    >>> 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1
    0.8999999999999999


Mean computation

An accurate summation algorithm will return this

    >>> import math
    >>> math.fsum([0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
    0.9


Mean computation

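Algorithms such as the Kahan summation algorithm or the Shewchuk summation algorithm reduce the numerical error

    def KahanSum(data):
        s = 0.0  # running sum
        c = 0.0  # compensation: low-order bits lost so far
        for x in data:
            y = x - c        # apply the correction to the next value
            t = s + y        # low-order bits of y may be lost here
            c = (t - s) - y  # recover what was lost
            s = t
        return s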

Listing 1: Kahan summation


Central tendency

The mean is a measure of central tendency ⇒ the main behaviour, the main value of some phenomenon


Mean robustness

The mean is not a robust estimator of the central tendency: a single extreme value can shift it arbitrarily far


Median

The median is the value such that 50% of the values are higher and 50% of the values are lower

a = [6, 1, 7, 9, 6, 3, 4, 5, 2]

sorted: a = [1, 2, 3, 4, 5, 6, 6, 7, 9]

median: $\tilde{a} = 5$


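With an even number of values, the median is the average of the two middle values

a = [6, 1, 7, 9, 6, 3, 4, 8, 5, 2]

sorted: a = [1, 2, 3, 4, 5, 6, 6, 7, 8, 9]

median: $\tilde{a} = \frac{1}{2}(5 + 6) = 5.5$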


Median computation

To compute the median, you can

1. sort the list of samples

2. • if the size $n$ is odd → $\tilde{a} = a_{\frac{n+1}{2}}$

   • if the size $n$ is even → $\tilde{a} = \frac{1}{2}\left(a_{\frac{n}{2}} + a_{\frac{n}{2}+1}\right)$

Note that this is for indexes starting from 1


Median computation

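Let's code some Python

    def median(data):
        # Work on a sorted copy so the caller's list is left untouched
        data = sorted(data)
        if len(data) % 2 == 0:
            # Even size: average the two middle values
            m = len(data) // 2
            return 0.5 * (data[m - 1] + data[m])
        else:
            # Odd size: take the middle value (integer division for the index)
            return data[(len(data) - 1) // 2]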


    >>> a = [6, 1, 7, 9, 6, 3, 4, 5, 2]
    >>> median(a)
    5


Median computation

The median has an equivalent in $\mathbb{R}^n$ ⇒ the median center

Compute the median for each dimension to get the median center
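The per-dimension rule above can be sketched as follows. The function name `median_center` and the sample points are assumptions for this example; `statistics.median` is the standard library median.

```python
import statistics

def median_center(points):
    """Median taken independently along each dimension."""
    # zip(*points) groups the coordinates by dimension
    return [statistics.median(coords) for coords in zip(*points)]

# The point (100, 25) is far from the others along the first dimension
points = [(1, 10), (2, 30), (3, 20), (100, 25), (4, 15)]
print(median_center(points))   # [3, 20]
```

Note that the result does not have to be one of the input points.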


Median robustness

The median is a more robust estimator of the central tendency

• green is the median

• pink is the arithmetic mean
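A quick way to see the difference in robustness is to add one wrong measurement to a small dataset. The values below are invented for the illustration only.

```python
import statistics

clean = [4.8, 5.1, 4.9, 5.0, 5.2]
corrupted = clean + [500.0]   # one wrong measurement (hypothetical)

# The mean is dragged towards the outlier, the median barely moves
print("mean  :", statistics.mean(clean), "->", statistics.mean(corrupted))
print("median:", statistics.median(clean), "->", statistics.median(corrupted))
```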


Statistical dispersion

The following datasets have the same central tendency


Statistical dispersion

But they have different dispersions


Standard deviation

A traditional measure of dispersion is the standard deviation σ

$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(a_i - \bar{a})^2$$


Standard deviation computation

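Robust computation of the standard deviation ⇒ Knuth-Welford algorithm

    import math

    def stdDev(data):
        n = 0
        mean = 0.0
        M2 = 0.0
        # Shift the data by an accurate estimate of the mean to limit cancellation
        meanEstimate = math.fsum(data) / len(data)

        for x in data:
            y = x - meanEstimate
            n = n + 1
            # Update the running mean and the running sum of squared deviations
            delta = y - mean
            mean = mean + delta / n
            M2 = M2 + delta * (y - mean)

        # Sample standard deviation (n - 1 in the denominator)
        return math.sqrt(M2 / (n - 1))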


Standard deviation

The standard deviation suffers from the same robustness issues as the mean. We will see why later.


Quartiles

The lower quartile or first quartile is the value such that 75% of the values are higher and 25% of the values are lower

a = [6, 1, 2, 7, 9, 6, 3, 4, 5, 2, 6]

sorted: a = [1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]

$q_1 = 2$


The upper quartile or third quartile is the value such that 25% of the values are higher and 75% of the values are lower

a = [6, 1, 2, 7, 9, 6, 3, 4, 5, 2, 6]

sorted: a = [1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]

$q_3 = 6$


Quartiles

Where is the second quartile? ⇒ it's the median


Interquartile range

The difference Q3 − Q1 is the interquartile range or IQR ⇒ it's a more robust dispersion measure
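The quartiles and the IQR of the example list can be sketched as below. This uses one convention among several (quartiles as medians of the lower and upper halves, middle value excluded when the size is odd); other conventions give slightly different values. The names `median_sorted` and `quartiles` are assumptions for this example.

```python
def median_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        return 0.5 * (ordered[mid - 1] + ordered[mid])
    return ordered[mid]

def quartiles(values):
    """Return (q1, median, q3); needs at least 2 values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when the size is odd
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median_sorted(lower), median_sorted(ordered), median_sorted(upper)

a = [6, 1, 2, 7, 9, 6, 3, 4, 5, 2, 6]
q1, q2, q3 = quartiles(a)
print(q1, q2, q3)   # 2 5 6
print(q3 - q1)      # interquartile range: 4
```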


normal distribution

A model for random variables, with 2 parameters µ and σ

[Figure: density curve of a normal distribution; x axis from −6 to 6, y axis from 0.00 to 0.40]


normal distribution

The normal distributions have 2 parameters µ and σ.

$$\Phi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

This is the probability density of the normal distribution. It tells how likely values close to x are, according to this distribution.
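The density formula translates directly into code. This is a minimal sketch; the function name `normal_pdf` and its default parameters are assumptions for the example.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of the normal distribution at x."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

print(normal_pdf(0.0))                      # peak for mu = 0, sigma = 1: about 0.3989
print(normal_pdf(1.0) == normal_pdf(-1.0))  # the curve is symmetric around mu: True
```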


normal distribution

µ is the mode, the central tendency of the normal distribution

[Figure: density curve of a normal distribution; x axis from −6 to 6, y axis from 0.00 to 0.40]


normal distribution

If some data follow a normal distribution, then

$$\mu = \bar{a}$$

The more samples, the more "true" it will be


normal distribution

σ controls the shape of the normal distribution

[Figure: normal density curves; x axis from −6 to 6, y axis from 0.0 to 1.6]


normal distribution

If some data follow a normal distribution

$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}(a_i - \bar{a})^2$$

The standard deviation comes from here ⇒ dispersion of a normal distribution
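A small simulation can make the link between the parameters and the sample statistics concrete. The "true" values of µ and σ and the seed below are arbitrary choices for this sketch; `statistics.fmean` requires Python 3.8 or later.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the run is repeatable
mu, sigma = 3.0, 2.0      # arbitrary "true" parameters for the demonstration

# The sample mean and sample standard deviation get closer to mu and sigma
# as the number of samples grows
for n in (10, 1000, 100000):
    samples = [rng.gauss(mu, sigma) for _ in range(n)]
    print(n, statistics.fmean(samples), statistics.stdev(samples))
```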


normal distribution

µ and σ are completely independent parameters

[Figure: normal density curves; x axis from −6 to 6, y axis from 0.0 to 1.6]


normal distribution

Practical interpretation of the normal distribution

[Figure: normal density, y axis from 0.0 to 0.4, x axis marked at µ, µ ± 1σ, µ ± 2σ and µ ± 3σ; on each side of µ the successive bands hold 34.1%, 13.6%, 2.1% and 0.1% of the values]

• 68% of the values are within [µ − σ, µ + σ]

• 95% of the values are within [µ − 2σ, µ + 2σ]

• 99.7% of the values are within [µ − 3σ, µ + 3σ]
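These percentages can be checked numerically: for a normal distribution, the probability of falling within k standard deviations of µ is erf(k / √2). The function name `coverage` is an assumption for this sketch.

```python
import math

def coverage(k):
    """Probability that a normal value lies within k standard deviations of mu."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    # Prints roughly 68.3, 95.4 and 99.7 percent
    print(k, 100.0 * coverage(k))
```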


skewed distributions

Your data might not have a symmetric distribution ⇒ they might have a skewed distribution

[Figures: three skewed density curves; x axis from 0.0 to 3.0]

• red is the true central tendency

• green is the median

• pink is the arithmetic mean


skewed distributions

You can compute the skewness of your data

$$\frac{\frac{1}{n}\sum_{i=1}^{n}(a_i - \bar{a})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(a_i - \bar{a})^2\right)^{\frac{3}{2}}}$$
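The skewness formula can be sketched in a few lines. The function name `skewness` and the sample lists are assumptions for this example; the data must contain at least two distinct values, otherwise the denominator is zero.

```python
import math

def skewness(data):
    """Third central moment divided by the second central moment to the power 3/2."""
    n = len(data)
    mean = math.fsum(data) / n
    m2 = math.fsum((x - mean) ** 2 for x in data) / n
    m3 = math.fsum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))   # symmetric data: 0.0
print(skewness([1, 1, 1, 1, 6]))   # long tail to the right: positive
print(skewness([1, 6, 6, 6, 6]))   # long tail to the left: negative
```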


multimodal distributions

Your data might have multiple modes

[Figure: density curve with several modes; x axis from −3 to 5, y axis from 0.0 to 0.9]


multimodal distributions

In such a case, the mean, median and other descriptive quantities might have no reliable meaning

[Figures: four density curves with several modes; x axis from −3 to 5]
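A tiny invented sample with two well separated groups shows the problem: both the mean and the median land in a region where there is no data at all. The values are hypothetical.

```python
import statistics

# Hypothetical sample drawn from two well separated groups
data = [-0.2, -0.1, 0.0, 0.1, 0.2, 3.8, 3.9, 4.0, 4.1, 4.2]

mean = statistics.mean(data)
median = statistics.median(data)
# Distance from the mean to the closest actual observation
nearest = min(abs(x - mean) for x in data)

print(mean, median)   # both close to 2
print(nearest)        # yet no observation is anywhere near 2
```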


Observe your data

Descriptive statistics can completely miss important information in your data!


Observe your data

Anscombe's quartet

[Figure: scatter plots of the datasets of Anscombe's quartet; x axis from 0 to 20, y axis from 4 to 12]


Observe your data

These 4 datasets have almost exactly the same

• mean

• variance

• regression line

But they are clearly not the same thing!
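This can be checked on the first two datasets of the quartet, which share the same x values. The numbers below are the values as commonly tabulated for Anscombe's quartet; check them against a reference copy before relying on them. The helper `regression_line` is an assumption for this sketch.

```python
import statistics

# First two datasets of Anscombe's quartet (same x values for both)
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def regression_line(xs, ys):
    """Least-squares slope and intercept of ys as a function of xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Mean, variance, slope and intercept agree to two decimals,
# although the point clouds look very different when plotted
for ys in (y1, y2):
    slope, intercept = regression_line(x, ys)
    print(round(statistics.mean(ys), 2), round(statistics.variance(ys), 2),
          round(slope, 2), round(intercept, 2))
```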


Boxplot

A nice way to summarize a data distribution is the boxplot


Boxplot

The red mark shows the mean

The box goes from the lower quartile to the upper quartile

The box thus always contains the median

The whiskers are the minimum and maximum values, outliers excluded


Boxplot

Outlier values are shown as blue crosses

Outliers are values that lie more than 1.5 × IQR below the lower quartile or above the upper quartile
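The outlier rule can be sketched with the standard library. `statistics.quantiles` requires Python 3.8 or later and uses its own default quartile convention, so the fences may differ slightly from those drawn by a plotting library. The function name `outliers` and the sample list are assumptions for this example.

```python
import statistics

def outliers(data):
    """Values more than 1.5 x IQR away from the quartiles."""
    # Quartile convention: default of statistics.quantiles (an assumption here)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

print(outliers([1, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9, 30]))   # [30]
```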


Scatter plot

A scatter plot is simply a plot with the data as points along 2 dimensions

[Figure: scatter plot; x axis from −3 to 3, y axis from −5 to 4]
