Experimental design and analysis
Graphical Exploration of Data
Gerry Quinn & Mick Keough, 1998
Do not copy or distribute without permission of authors.
Graphical displays
• Exploration
  – assumptions (normality, equal variances)
  – unusual values
  – which analysis?
• Analysis
  – model fitting
• Presentation/communication of results
Space shuttle data
• NASA meeting Jan 27th 1986
  – day before launch of shuttle Challenger
• Concern about low air temperatures at launch
• Affect O-rings that seal joints of rocket motors
• Previous data studied
[Figure: O-ring failure vs temperature, pre-1986. x-axis: joint temp. (°F), 50 to 85; y-axis: number of incidents, 0 to 3.]
[Figure: O-ring failure vs temperature. x-axis: joint temp. (°F), 50 to 85; y-axis: number of incidents, 0 to 3.]
Checking assumptions - exploratory data analysis (EDA)
• Shape of sample (and therefore population)
  – is distribution normal (symmetrical) or skewed?
• Spread of sample
  – are variances similar in different groups?
• Are outliers present
  – observations very different from the rest of the sample?
Distributions of biological data
Bell-shaped symmetrical distribution:
• normal
Skewed asymmetrical distribution:
• log-normal
• Poisson
[Figure: two probability curves, one symmetrical and one skewed. x-axis: y; y-axis: Pr(y).]
Common skewed distributions
Log-normal distribution:
• mean proportional to standard deviation ($\mu \propto \sigma$)
• measurement data, e.g. length, weight etc.
Poisson distribution:
• mean equals variance ($\mu = \sigma^2$)
• count data, e.g. numbers of individuals
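The mean-variance property of the Poisson distribution can be checked numerically. The sketch below is only an illustration: the function names and the chosen values of the mean are assumptions, not part of the original slides. It computes the mean and variance directly from the Poisson probabilities.

```python
import math

def poisson_pmf(k, lam):
    """Probability of count k under a Poisson distribution with mean lam.

    Computed on the log scale to avoid overflow for large k.
    """
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def mean_and_variance(lam, max_k=200):
    """Mean and variance summed from the probabilities, truncated at max_k."""
    ks = range(max_k + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, var

# Illustrative (hypothetical) means; any positive value works.
for lam in (0.5, 2.0, 10.0):
    mean, var = mean_and_variance(lam)
    print(f"lambda = {lam}: mean = {mean:.4f}, variance = {var:.4f}")
```

For each value the printed mean and variance should agree, which is why count data often show a variance that rises with the mean.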
Exploring sample data
Example data set
• Quinn & Keough (in press)
• Surveys of 8 rocky shores along Point Nepean coast
• 10 sampling times (1988 - 1993)
• 15 quadrats (0.25 m²) at each site
• Numbers of all gastropod species and % cover of macroalgae recorded from each quadrat
Frequency distributions
• Observations grouped into classes
[Figure: schematic frequency distributions, panels NORMAL and LOG-NORMAL. x-axis: value of variable (class); y-axis: number of observations.]
[Figure: frequency distribution of number of Cellana per quadrat. Survey 5, all shores combined; total no. quadrats = 120. x-axis: number of Cellana per quadrat, 0 to 100; y-axis: frequency, 0 to 30.]
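Grouping observations into classes can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the counts are hypothetical (they are not the Point Nepean data), and the class width of 20 and the function name are assumptions.

```python
from collections import Counter

def frequency_table(values, class_width):
    """Count observations per class, keyed by the lower bound of each class."""
    counts = Counter((v // class_width) * class_width for v in values)
    lowest, highest = min(counts), max(counts)
    # Include empty classes so gaps in the distribution stay visible.
    return {start: counts.get(start, 0)
            for start in range(lowest, highest + class_width, class_width)}

# Hypothetical counts per quadrat, for illustration only.
counts_per_quadrat = [0, 2, 3, 5, 8, 11, 12, 14, 19, 23, 27, 35, 41, 58, 76]
width = 20
table = frequency_table(counts_per_quadrat, width)

# Crude text histogram: one '#' per observation in the class.
for start, n in table.items():
    print(f"{start:3d}-{start + width - 1:3d} | {'#' * n} ({n})")
```

A long right-hand tail in the printed bars is the signature of a skewed distribution.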
Dotplots
• Each observation represented by a dot
• Number of Cellana per quadrat, Cheviot Beach survey 5
• No. quadrats = 15
[Figure: dotplot. x-axis: number of Cellana per quadrat, 0 to 40.]
Boxplot
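[Figure: anatomy of a boxplot, VARIABLE plotted for each GROUP. Labelled parts: largest value, hinge, median, hinge, smallest value, spread, outlier (*). Each bracketed section contains 25% of values.]
[Figure: four example boxplot panels: 1. IDEAL, 2. SKEWED, 3. OUTLIERS, 4. UNEQUAL VARIANCES.]
[Figure: Boxplots of Cellana numbers in survey 5. x-axis: site (S, FPE, RR, SP, CPE, CB, LB, CPW); y-axis: number of Cellana per quadrat, 0 to 100.]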
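The quantities a boxplot displays can be computed by hand. The sketch below rests on assumptions the slides do not state: hinges are taken as Tukey hinges (medians of the lower and upper halves), the spread is the distance between hinges, and outliers are flagged with the common 1.5 x spread rule. The data are hypothetical.

```python
import statistics

def hinges(values):
    """Tukey hinges: medians of the lower and upper halves of the sorted data.

    When n is odd, the median is included in both halves.
    """
    s = sorted(values)
    half = (len(s) + 1) // 2
    return statistics.median(s[:half]), statistics.median(s[-half:])

def boxplot_summary(values, k=1.5):
    """Median, hinges, spread, extremes, and points beyond k * spread."""
    s = sorted(values)
    lower, upper = hinges(s)
    spread = upper - lower
    low_fence, high_fence = lower - k * spread, upper + k * spread
    return {
        "smallest": s[0],
        "lower_hinge": lower,
        "median": statistics.median(s),
        "upper_hinge": upper,
        "largest": s[-1],
        "spread": spread,
        "outliers": [v for v in s if v < low_fence or v > high_fence],
    }

# Hypothetical counts per quadrat, for illustration only.
summary = boxplot_summary([1, 3, 4, 6, 7, 8, 9, 11, 12, 40])
print(summary)
```

Other software may use slightly different quartile definitions, so hinge values can differ a little between packages.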
Scatterplots
• Plotting bivariate data
• Value of two variables recorded for each observation
• Each variable plotted on one axis (x or y)
• Symbols represent each observation
• Assess relationship between two variables
[Figure: scatterplot, Cheviot Beach survey 5, n = 15. x-axis: % cover of Hormosira per quadrat, 0 to 70; y-axis: number of Cellana per quadrat, 0 to 40.]
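A scatterplot is usually read by eye, but the strength of a linear relationship between the two variables can also be summarised with a correlation coefficient. The sketch below uses Pearson's r, which is a choice made here and not something the slides prescribe; the paired values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired values per quadrat, for illustration only.
percent_cover = [0, 10, 20, 30, 40, 50, 60, 70]
limpet_counts = [35, 30, 28, 20, 15, 12, 5, 2]

r = pearson_r(percent_cover, limpet_counts)
print(f"r = {r:.3f}")
```

A value near -1 or +1 suggests a strong linear trend; the plot itself is still needed to spot curvature or outliers that r cannot reveal.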
Scatterplot matrix
• Abbreviated to SPLOM
• Extension of scatterplot
• For plotting relationships between 3 or more variables on one plot
• Bivariate plots in multiple panels on SPLOM
SPLOM for Cheviot Beach survey 5
CELLANA - numbers of Cellana
SIPHALL - numbers of Siphonaria
HORMOS - % cover of Hormosira
n = 15 quadrats
Transformations
• Improve normality.
• Remove relationship between mean and variance.
• Make variances more similar in different populations.
• Reduce influence of outliers.
• Make relationships between variables more linear (regression analysis).
Log transformation
Lognormal → Normal
$y' = \log(y)$
Measurement data
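The effect of a log transformation on a right-skewed sample can be illustrated with a simple skewness measure. The sketch below is an assumption-laden example: the data are hypothetical, base 10 is an arbitrary choice, and skewness is computed as the third standardised moment.

```python
import math

def skewness(values):
    """Third standardised moment: 0 for a symmetrical sample, > 0 for right skew."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical measurements that double at each step (strongly right-skewed).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log10(y) for y in raw]

print(f"skewness before: {skewness(raw):.3f}")
print(f"skewness after:  {skewness(logged):.3f}")
```

The logged values are evenly spaced, so their skewness is essentially zero. Note that the log of zero is undefined, so data containing zeros need separate handling.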
Power transformation
Poisson → Normal
$y' = \sqrt{y}$, i.e. $y' = y^{0.5}$; also $y' = y^{0.25}$
Count data
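One reason the square-root transformation suits count data is that it weakens the link between mean and variance. The sketch below, with assumed function names and illustrative means, computes the variance of Poisson counts before and after taking square roots, directly from the Poisson probabilities.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of count k, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def variance_of(transform, lam, max_k=400):
    """Variance of transform(Y) for Poisson Y with mean lam, truncated at max_k."""
    probs = [poisson_pmf(k, lam) for k in range(max_k + 1)]
    mean = sum(transform(k) * p for k, p in enumerate(probs))
    return sum((transform(k) - mean) ** 2 * p for k, p in enumerate(probs))

# Illustrative (hypothetical) means.
for lam in (5, 20, 80):
    raw_var = variance_of(float, lam)
    root_var = variance_of(math.sqrt, lam)
    print(f"mean {lam:3d}: var(y) = {raw_var:7.3f}, var(sqrt(y)) = {root_var:.3f}")
```

The raw variance grows with the mean, while the variance of the square-rooted counts settles near 0.25 once the mean is moderately large.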
Arcsin transformation
Square → Normal
$y' = \sin^{-1}(\sqrt{y})$
Proportions and percentages
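The arcsin transformation expects proportions between 0 and 1, so percentages are divided by 100 first. A minimal sketch, with a hypothetical helper name and hypothetical % cover values, returning the result in radians:

```python
import math

def arcsin_sqrt(proportion):
    """Arcsin square-root transform of a proportion in [0, 1], in radians."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))

# Hypothetical % cover values, for illustration only.
percent_cover = [0, 5, 25, 50, 75, 95, 100]
transformed = [arcsin_sqrt(pc / 100) for pc in percent_cover]

for pc, t in zip(percent_cover, transformed):
    print(f"{pc:3d}% -> {t:.3f}")
```

The transform stretches values near 0% and 100% relative to those near 50%, which is where proportions are most compressed.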
Outliers
• Observations very different from rest of sample - identified in boxplots.
• Check if mistakes (e.g. typos, broken measuring device) - if so, omit.
• Extreme values in skewed distribution - transform.
• Alternatively, do analysis twice - outliers in and outliers excluded. Worry if influential.
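Running an analysis with outliers in and then excluded can be as simple as computing the same statistic twice. The sketch below uses the sample mean as the "analysis"; the data, the suspected outlier, and the helper name are all hypothetical.

```python
import statistics

def compare_with_and_without(values, suspected):
    """Return the mean of all values and the mean with suspected outliers excluded."""
    kept = [v for v in values if v not in suspected]
    return statistics.mean(values), statistics.mean(kept)

# Hypothetical counts; 95 is the value flagged as a possible outlier.
counts = [12, 15, 11, 14, 13, 16, 95]
mean_all, mean_trimmed = compare_with_and_without(counts, suspected={95})
relative_change = abs(mean_all - mean_trimmed) / mean_trimmed

print(f"mean with outlier:    {mean_all:.2f}")
print(f"mean without outlier: {mean_trimmed:.2f}")
print(f"relative change:      {relative_change:.0%}")
```

A large change between the two runs indicates the outlier is influential and the conclusions depend on it.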
Assumptions not met?
• Check and deal with outliers
• Transformation
  – might fix non-normality and unequal variances
• Nonparametric rank test
  – does not assume normality
  – does assume similar variances
  – Mann-Whitney-Wilcoxon
  – only suitable for simple analyses
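The Mann-Whitney-Wilcoxon test works on ranks, not on the raw values. The sketch below computes only the U statistics, not a P value, using average ranks for ties; the function names and the two small samples are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) from the rank sum of sample_a in the pooled data."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a

# Hypothetical samples with no overlap: U for the lower group is 0.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

In practice the U statistic is compared against tables or a statistical package to obtain a P value.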
Category or line plot
[Figure: line plot of mean number of Cellana per quadrat (y-axis, 0 to 30) against survey (x-axis, 1 to 10).]
[Figure: line plot on the same axes, with separate lines for Cheviot Beach and Sorrento.]
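The points on a category or line plot are group means, one per survey. A minimal sketch of that summary step, using hypothetical counts for three surveys (the real data set has 10 surveys and 15 quadrats per site):

```python
import statistics

# Hypothetical counts per quadrat, keyed by survey number.
quadrat_counts = {
    1: [10, 14, 12],
    2: [20, 22, 18],
    3: [5, 7, 9],
}

# One mean per survey, in survey order: these are the plotted points.
survey_means = {survey: statistics.mean(counts)
                for survey, counts in sorted(quadrat_counts.items())}

for survey, mean in survey_means.items():
    print(f"survey {survey}: mean = {mean}")
```

Repeating the calculation for each site gives one line per site on the same axes.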