Experimental design and analysis Graphical Exploration of Data Gerry Quinn & Mick Keough, 1998 Do not copy or distribute without permission of authors.


Page 1:

Experimental design and analysis

Graphical Exploration of Data

Gerry Quinn & Mick Keough, 1998

Do not copy or distribute without permission of authors.

Page 2:

Graphical displays

• Exploration
  – assumptions (normality, equal variances)
  – unusual values
  – which analysis?

• Analysis
  – model fitting

• Presentation/communication of results

Page 3:

Space shuttle data

• NASA meeting, Jan 27th 1986
  – day before launch of shuttle Challenger

• Concern about low air temperatures at launch

• Affect O-rings that seal joints of rocket motors

• Previous data studied

Page 4:

O-ring failure vs temperature, pre-1986

[Figure: number of incidents (y-axis, 0 to 3) against joint temperature in °F (x-axis, 50 to 85).]

Page 5:

O-ring failure vs temperature

[Figure: number of incidents (y-axis, 0 to 3) against joint temperature in °F (x-axis, 50 to 85).]

Page 6:

Checking assumptions - exploratory data analysis (EDA)

• Shape of sample (and therefore population)
  – is distribution normal (symmetrical) or skewed?

• Spread of sample
  – are variances similar in different groups?

• Are outliers present?
  – observations very different from the rest of the sample

Page 7:

Distributions of biological data

Bell-shaped symmetrical distribution:

• normal

Skewed asymmetrical distribution:

• log-normal

• Poisson

[Figure: two probability curves, Pr(y) against y, one for each distribution shape.]

Page 8:

Common skewed distributions

Log-normal distribution:

• $\mu$ proportional to $\sigma$

• measurement data, e.g. length, weight, etc.

Poisson distribution:

• $\mu = \sigma^2$

• count data, e.g. numbers of individuals
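
The defining Poisson property, that the mean equals the variance, can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the cut-off `k_max` are illustrative choices and do not come from the original slides.

```python
import math

def poisson_pmf(k, lam):
    """Probability of a count of exactly k when the mean count is lam."""
    # Work on the log scale so that large k or lam cannot overflow.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_mean_and_variance(lam, k_max=200):
    """Mean and variance computed from the pmf, summed over k = 0..k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance

# Hypothetical mean counts per quadrat, chosen only for illustration.
for lam in (0.5, 2.0, 8.0):
    mean, variance = poisson_mean_and_variance(lam)
    print(f"mean count {lam}: mean = {mean:.4f}, variance = {variance:.4f}")
```

For each mean count the two printed values agree, which is why count data with larger means also show larger variances.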

Page 9:

Exploring sample data

Page 10:

Example data set

• Quinn & Keough (in press)

• Surveys of 8 rocky shores along Point Nepean coast

• 10 sampling times (1988 - 1993)

• 15 quadrats (0.25 m²) at each site

• Numbers of all gastropod species and % cover of macroalgae recorded from each quadrat

Page 11:

Frequency distributions

[Figure: two frequency histograms, labelled NORMAL and LOG-NORMAL. x-axis: value of variable (class); y-axis: number of observations.]

Observations grouped into classes
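
Grouping observations into classes can be sketched in a few lines. This is an illustrative example only: the counts are made up (not the survey data), and the class width of 10 and the function name `frequency_table` are assumptions.

```python
from collections import Counter

def frequency_table(values, class_width):
    """Count integer observations in classes [0, w), [w, 2w), ...

    Returns a dict keyed by each class's lower bound, including empty
    classes between the smallest and largest occupied class.
    """
    counts = Counter((v // class_width) * class_width for v in values)
    lowest, highest = min(counts), max(counts)
    return {
        lower: counts.get(lower, 0)
        for lower in range(lowest, highest + class_width, class_width)
    }

# Made-up counts per quadrat, for illustration only.
counts_per_quadrat = [0, 3, 4, 7, 8, 8, 11, 12, 15, 19, 23, 31, 38, 52, 77]
table = frequency_table(counts_per_quadrat, class_width=10)

# Text histogram: one '#' per observation in the class.
for lower, n in table.items():
    print(f"{lower:3d}-{lower + 9:3d} | {'#' * n}")
```

The long right-hand tail in the printed histogram is the kind of skew the frequency distribution is meant to reveal.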

Page 12:

Number of Cellana per quadrat

[Figure: histogram of frequency (y-axis, 0 to 30) against number of Cellana per quadrat (x-axis, 0 to 100).]

Survey 5, all shores combined

Total no. quadrats = 120

Page 13:

Dotplots

[Figure: dotplot of number of Cellana per quadrat (x-axis, 0 to 40).]

• Each observation represented by a dot

• Number of Cellana per quadrat, Cheviot Beach survey 5

• No. quadrats = 15

Page 14:

Boxplot

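[Figure: annotated boxplot of VARIABLE (y-axis) for a GROUP (x-axis). Labelled parts: median, two hinges, spread, largest value, smallest value, and an outlier marked *. Each of the four bracketed segments contains 25% of values.]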
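
The quantities labelled on a boxplot can be computed directly. The sketch below is illustrative: the data are made up, the hinges are taken as the medians of the lower and upper halves of the sorted sample, and the outlier rule (more than 1.5 × spread beyond a hinge) is a common convention assumed here, since the slide does not state which rule it uses.

```python
from statistics import median

def boxplot_summary(values, fence_factor=1.5):
    """Median, hinges, spread, whisker ends and outliers for one group."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # each half includes the median when n is odd
    lower_hinge = median(xs[:half])
    upper_hinge = median(xs[n - half:])
    spread = upper_hinge - lower_hinge
    # Assumed convention: points beyond these fences are drawn as outliers.
    low_fence = lower_hinge - fence_factor * spread
    high_fence = upper_hinge + fence_factor * spread
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "median": median(xs),
        "hinges": (lower_hinge, upper_hinge),
        "spread": spread,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

# Made-up counts per quadrat, for illustration only.
summary = boxplot_summary([2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 40])
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these numbers the single large count falls outside the upper fence, so it would be plotted as a separate symbol rather than stretching the whisker.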

Page 15:

[Figure: four panels of boxplots: 1. IDEAL; 2. SKEWED; 3. OUTLIERS (outlying values marked *); 4. UNEQUAL VARIANCES.]

Page 16:

Boxplots of Cellana numbers in survey 5

[Figure: boxplots of number of Cellana per quadrat (y-axis, 0 to 100) for each site (x-axis: S, FPE, RR, SP, CPE, CB, LB, CPW).]

Page 17:

Scatterplots

• Plotting bivariate data

• Value of two variables recorded for each observation

• Each variable plotted on one axis (x or y)

• Symbols represent each observation

• Assess relationship between two variables

Page 18:

Cheviot Beach survey 5, n = 15

[Figure: scatterplot of number of Cellana per quadrat (y-axis, 0 to 40) against % cover of Hormosira per quadrat (x-axis, 0 to 70).]

Page 19:

Scatterplot matrix

• Abbreviated to SPLOM

• Extension of scatterplot

• For plotting relationships between 3 or more variables on one plot

• Bivariate plots in multiple panels on SPLOM

Page 20:

SPLOM for Cheviot Beach survey 5

CELLANA: numbers of Cellana

SIPHALL: numbers of Siphonaria

HORMOS: % cover of Hormosira

n = 15 quadrats

Page 21:

Transformations

• Improve normality.

• Remove relationship between mean and variance.

• Make variances more similar in different populations.

• Reduce influence of outliers.

• Make relationships between variables more linear (regression analysis).

Page 22:

Log transformation

Lognormal → Normal

$y' = \log(y)$

Measurement data
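
The effect of a log transformation on right-skewed measurement data can be illustrated with a simple skewness calculation. This is a sketch under stated assumptions: the lengths are made up, base-10 logs are used (the slide does not specify a base), and `skewness` is a plain moment-based coefficient written for this example.

```python
import math

def skewness(values):
    """Moment-based skewness: near 0 if symmetrical, positive if right-skewed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up right-skewed measurements (e.g. lengths), for illustration only.
lengths = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.3, 7.1, 9.8, 14.0, 21.5]
logged = [math.log10(y) for y in lengths]

print(f"skewness before transformation: {skewness(lengths):.2f}")
print(f"skewness after log transformation: {skewness(logged):.2f}")
```

The skewness coefficient drops substantially after transformation, which is the sense in which logging pulls a lognormal sample towards a symmetrical shape.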

Page 23:

Power transformation

Poisson → Normal

$y' = \sqrt{y}$, i.e. $y' = y^{0.5}$; also $y' = y^{0.25}$

Count data
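
One reason the square-root transformation suits count data is that it roughly removes the dependence of the variance on the mean. The sketch below checks this for Poisson counts by summing over the probability mass function; the mean counts used and the helper names are illustrative assumptions, not part of the slides.

```python
import math

def poisson_pmf(k, lam):
    """Probability of a count of exactly k when the mean count is lam."""
    # Log scale avoids overflow for large k or lam.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def variance_of(transform, lam, k_max=500):
    """Variance of transform(count) when count is Poisson with mean lam."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(transform(k) * p for k, p in enumerate(probs))
    return sum((transform(k) - mean) ** 2 * p for k, p in enumerate(probs))

# Hypothetical mean counts, chosen only for illustration.
for lam in (5, 20, 80):
    raw = variance_of(float, lam)
    rooted = variance_of(math.sqrt, lam)
    print(f"mean {lam:2d}: variance of y = {raw:6.2f}, variance of sqrt(y) = {rooted:.3f}")
```

The variance of the raw counts grows with the mean, while the variance of the square-rooted counts stays close to a constant across the three means.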

Page 24:

Arcsin transformation

Square → Normal

$y' = \sin^{-1}(\sqrt{y})$

Proportions and percentages
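
A minimal sketch of the arcsine transformation follows. It assumes the input is a proportion between 0 and 1, so percentages are divided by 100 first; the function name and the example percentages are made up for illustration, and the result is in radians.

```python
import math

def arcsine_sqrt(proportion):
    """Arcsine of the square root of a proportion in [0, 1], in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))

# Made-up % cover values, for illustration only.
percent_cover = [0, 5, 25, 50, 75, 95, 100]
transformed = [arcsine_sqrt(pc / 100) for pc in percent_cover]

for pc, t in zip(percent_cover, transformed):
    print(f"{pc:3d}% -> {t:.3f}")
```

The transformed values run from 0 to π/2, and the steps near 0% and 100% are stretched relative to those in the middle of the range.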

Page 25:

Outliers

• Observations very different from rest of sample - identified in boxplots.

• Check if mistakes (e.g. typos, broken measuring device) - if so, omit.

• Extreme values in skewed distribution - transform.

• Alternatively, do analysis twice - outliers in and outliers excluded. Worry if influential.
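
Running a summary twice, with and without the suspect observations, might look like the sketch below. The data and the choice of mean and standard deviation as the summary are assumptions made for illustration; in practice the comparison would be made on whatever analysis is being run.

```python
from statistics import mean, stdev

def compare_with_without(values, suspects):
    """Mean and standard deviation with all values and with suspects excluded."""
    kept = [v for v in values if v not in suspects]
    return {
        "all": (mean(values), stdev(values)),
        "excluded": (mean(kept), stdev(kept)),
    }

# Made-up counts per quadrat; 40 is the suspected outlier.
counts = [2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 40]
result = compare_with_without(counts, suspects={40})

for label, (m, s) in result.items():
    print(f"{label:>8}: mean = {m:.2f}, sd = {s:.2f}")
```

If the two sets of results lead to different conclusions, the outlier is influential and deserves closer scrutiny.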

Page 26:

Assumptions not met?

• Check and deal with outliers

• Transformation
  – might fix non-normality and unequal variances

• Nonparametric rank test
  – does not assume normality
  – does assume similar variances
  – Mann-Whitney-Wilcoxon
  – only suitable for simple analyses
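
The rank-based idea behind the Mann-Whitney-Wilcoxon test can be sketched as follows. This computes only the U statistics, using average ranks for ties; it does not produce a P value, and the two samples are made up for illustration. The function names are assumptions for this example.

```python
def midranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """U statistics for both samples, from the rank sum of sample_a."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = midranks(list(sample_a) + list(sample_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

# Made-up counts from two sites, for illustration only.
u_a, u_b = mann_whitney_u([3, 5, 8, 12], [7, 9, 15, 20, 22])
print(f"U for first sample = {u_a}, U for second sample = {u_b}")
```

Because only the ranks enter the calculation, an extreme value has no more influence than any other largest observation, which is why the test does not depend on normality.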

Page 27:

Category or line plot

[Figure: two panels, each plotting mean number of Cellana per quadrat (y-axis, 0 to 30) against survey (x-axis, 1 to 10). Legend: Cheviot Beach, Sorrento.]