Ecology reporting and statistical analysis Chris Luszczek Biol2050.

Page 1:

Ecology reporting and statistical analysis

Chris Luszczek, Biol2050

Page 2:

Introduction

• Please treat this slide show as a statistics manual for Biol2050.
• This tutorial will provide you with the basics of various common statistical methods and examples of how to perform these tests using SPSS statistical software, which is available in York computer labs and accessible from home using York’s remote File Access System (FAS).

• *WARNING* The FAS may involve a lengthy installation procedure, and I have found it to be finicky, sometimes requiring multiple attempts at installation. Be aware of this if you are downloading the software at home… at midnight the evening before your report is due.

Page 3:

Outline

1) Hypothesis Building
– Null hypothesis/alternate hypothesis
2) Hypothesis Testing
3) Common Statistical tests and how to run them
A) t-test
B) ANOVA
C) Correlation
4) Graphing
– How to present your findings
– Types of graphs and usage
– Formatting

Page 4:

1) Hypothesis Building

• Creating testable hypotheses is central to the scientific method
– Null (H0) hypothesis – ‘no effect’ or ‘no difference’ between samples or treatments
– Alternative (HA) hypothesis – the experimental treatment has a statistically significant effect
– A claim for which we are trying to find evidence

Example

H0: “Different habitats on the York University campus display no differences in diversity” (H0: x2 = x1 or x2 - x1 = 0)

HA: “Grassland habitats at York University contain higher diversity than managed or landscaped areas” (HA: x2 > x1 or x2 - x1 > 0)

Page 5:

2) Hypothesis Testing

• Either reject or fail to reject the H0 based on statistical testing

• Statistical testing compares the p-value of the observed data to an assigned significance level (α)
– p-value – the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true
– α – the significance level: the probability of rejecting a true null hypothesis that we are willing to accept
• Popular levels of significance are 5% (0.05), 1% (0.01), and 0.1% (0.001)

IF the p-value is SMALLER than α, reject the null hypothesis (H0)
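To make the p-value and the decision rule concrete, here is a minimal Python sketch. Note that it uses an exact permutation test, which is not one of the SPSS tests covered in this tutorial; it is used here only because it shows directly what "at least as extreme as observed" means. The data values and the names `permutation_p_value`, `habitat_1` and `habitat_2` are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-tailed exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Try every way of relabelling the pooled values into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabellings whose difference is at least as large as observed.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


ALPHA = 0.05
habitat_1 = [1, 2, 3, 4]  # hypothetical diversity scores
habitat_2 = [5, 6, 7, 8]  # hypothetical diversity scores

p = permutation_p_value(habitat_1, habitat_2)
decision = "reject H0" if p < ALPHA else "fail to reject H0"
print(f"p = {p:.4f}: {decision}")
```

Only 2 of the 70 possible relabellings give a difference as large as the observed one, so p = 2/70 (about 0.029), which is smaller than α = 0.05.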

Page 6:

Hypothesis Testing Visual Summary

Sample distribution 1 Sample distribution 2

Mean 1 Mean 2

Are the means significantly different?

Page 7:

3) Common Statistical Tests in Ecology

• T-tests
• ANOVA - Analysis Of VAriance
• Correlation

Page 8:

Common Statistical Tests in Ecology

• T-tests: used to determine if two sets of data (2 means) are significantly different from each other. The test assumes that the data are normally distributed and that the samples have equal variances.
– 2 decisions must be made when selecting a t-test:
• Paired vs. independent
• 1-tailed vs. 2-tailed

• ANOVA - Analysis Of VAriance
• Correlation

Page 9:

3A) T-test

• One-sample (paired) t-test: compares two samples in cases where each value in one sample has a natural partner in the other (data are not independent). Used on pre/post data. Also compares a sample mean to a specified value.
– Comparing patient performance before and after the application of a drug (repeated measures sampling – the same subjects are measured before and after treatment)

• Two-sample (independent) t-test: compares means for two groups of cases.
– Comparing patient performance in a group receiving a drug versus a separate group receiving a trial drug
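The difference between the two designs can be sketched in a few lines of Python. This is only an illustration with made-up numbers, not what SPSS does internally; the function names `paired_t` and `independent_t` are hypothetical, and the independent version assumes equal variances (pooled variance).

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom


def independent_t(group_a, group_b):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_b) - mean(group_a)) / se
    return t, n_a + n_b - 2


# Hypothetical scores for the same four subjects before and after treatment.
before = [10, 12, 9, 11]
after = [12, 14, 10, 14]

t_paired, df_paired = paired_t(before, after)
# Same numbers treated (incorrectly, for this design) as two separate groups.
t_indep, df_indep = independent_t(before, after)

print(f"paired:      t({df_paired}) = {t_paired:.3f}")
print(f"independent: t({df_indep}) = {t_indep:.3f}")
```

With these numbers the paired test gives a much larger t statistic than the independent test, because pairing removes the variation between subjects. This is why the design of your sampling should decide which test you pick.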

Page 10:

3A) T-test

• One-tailed/sided t-test: expect the effect to be in a certain direction
– “is the sample mean greater than µ0?”
– “is the sample mean less than µ0?”

H0 : µ = µ0 where µ0 is known

HA : µ > µ0 or µ < µ0

• Two-tailed/sided t-test: testing for different means regardless of direction
– “is there a significant difference?”

H0 : µ1 = µ2

HA : µ1 ≠ µ2

Page 11:

Match Your Hypothesis and Test!

A carefully stated experimental hypothesis will indicate the type of effect you are looking for.

For example, the hypothesis that "Coffee improves memory" suggests a paired, one-tailed test because you will repeatedly measure the same participants and expect an improvement.

"Men weigh a different amount from women" suggests an independent, two-tailed test as no direction is implied.

So remember, don't be vague with your hypothesis if you are looking for a specific effect! Be careful with the null hypothesis too: avoid "A does not affect B" if you really mean "A does not improve B".

Page 12:

Running a T-test in SPSS

• Question: Do the fish in lake 1 and lake 2 weigh the same?

• Null hypothesis: µ1 = µ2 (the fish in lake 1 weigh the same as the fish in lake 2)

• Alternative hypothesis: µ1 ≠ µ2 (the fish in lake 1 and lake 2 DO NOT weigh the same)
– An independent, 2-tailed test!

Page 13:

1) Data from an Excel sheet can be opened in SPSS. Sometimes you will automatically see a summary of your data rather than the data itself. To correct this:

2) Click the Data View tab rather than Variable View

Page 14:

Data View / entry

Page 15:

Select Analyze → Compare Means → Independent-Samples T Test

Weight is the test variable

Lake is the grouping variable (click on Define Groups and type the two names used in the Data View)

Page 16:

Output

Levene’s test – assesses whether the variances are equal; if p > 0.05 you can interpret the t-test results

Our example: Levene’s test gives p = 0.669, so we can interpret the t-test. The t-test gives p = 0.01, so we can reject the null hypothesis; thus the fish from lake 1 and lake 2 DO NOT weigh the same.

How to report: two-sample t(df) = t-value, p = p-value
(two-sample t(12) = -3.065, p = 0.01)

* Given the quality of data collected in these labs, assume that the data fulfills the Levene’s test and go on to interpret the t-test *
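If you want to check where a reported p-value comes from, the sketch below recovers it from the t-value and degrees of freedom in the example above (t = -3.065, df = 12). It integrates the t distribution numerically with Simpson's rule using only the Python standard library. The function names `t_pdf` and `t_test_p_value` are made up for this illustration, and SPSS may compute the value differently.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_test_p_value(t, df, tails=2, steps=2000):
    """p-value for a t statistic via Simpson's rule (steps must be even).

    For tails=1 this returns the area beyond |t| in one tail, which is the
    correct p-value only if the effect is in the hypothesised direction.
    """
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3  # integral of the density from 0 to |t|
    one_tail = 0.5 - area
    return one_tail * tails


p = t_test_p_value(-3.065, 12)
print(f"two-sample t(12) = -3.065, p = {p:.3f}")
```

The one-tailed p-value is half the two-tailed value, which is why choosing a one-tailed test after seeing the data is not acceptable: it halves the p-value without any new evidence.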

Page 17:

Common Statistical Tests in Ecology

• T-tests

• ANOVA - Analysis Of VAriance
– Compares the means of more than two groups
– Compares variance within groups and between groups
– Parametric, extension of two-tailed t-test

• Correlation

Page 18:

3B) ANOVA

• Analysis Of VAriance (ANOVA)

Examples:
• Is tree density at all York habitats the same?
• Does insect diversity in York grasslands differ from insect diversity in York woodlots and human-impacted areas?

• 3 means being compared

H0 : µ1 = µ2 = µ3 = … = µk where k = number of groups

HA : one or more means are different

Page 19:

Running an ANOVA

You sample four fish from each of three lakes to determine if the fish from the three lakes all weigh the same.

H0 : There is no difference in fish weight between lakes
H0 : µ1 = µ2 = µ3
HA : one or more means are different

Select Analyze → Compare Means → One-Way ANOVA

*IMPORTANT* Select Post Hoc → Tukey → Continue → OK

Running the ANOVA will identify IF differences between groups exist.

Running a post hoc test will test all combinations to determine WHICH groups are different from each other.
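For those curious about what the ANOVA is calculating, here is a minimal Python sketch of the F statistic for a one-way ANOVA: the variance between group means divided by the variance within groups. The fish weights are invented for illustration, `one_way_anova` is a hypothetical helper name, and the sketch does not include the Tukey post hoc test.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical weights of four fish from each of three lakes.
lakes = [[1, 2, 3, 2], [2, 3, 4, 3], [6, 7, 8, 7]]

f_stat, df_b, df_w = one_way_anova(lakes)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by much more than the scatter within each group would explain. The F value alone does not say which lakes differ, which is why the post hoc test is still needed.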

Page 20:

Sig. difference between groups

Lake 1 and 2 are not significantly different but both are sig. different from lake 3 (based on α = 0.05)

Page 21:

Common Statistical Tests in Ecology

• T-tests
• ANOVA - Analysis Of VAriance

• Correlation: Indicates the strength and direction of a linear relationship between two random variables

H0 : no relationship between variables

HA : there is a relationship between variables

Page 22:

3C) Correlation

• Pearson’s Correlation Coefficient (r) – measures the strength and direction of the linear relationship between two variables
• r always lies between -1 and +1
– Positive r-values mean that the two variables increase together. Negative r-values mean that one variable decreases as the other increases.
– r-values close to zero mean the variables have no linear relationship. r-values close to either -1 or 1 mean the relationship is strong.
– Generally, for ecological data, an r with magnitude greater than 0.5 is considered very strong and one with magnitude less than 0.2 is considered weak.
– R2 (coefficient of determination) is the proportion of the variation in one variable that is explained by the line of best fit, i.e. a measure of how well the regression line represents the data.
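As a rough illustration of how r and R2 are calculated, the Python sketch below computes both from the standard formula. The diversity values are made up, and `pearson_r` is a hypothetical helper name rather than anything from SPSS.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    return sxy / sqrt(sxx * syy)


# Hypothetical diversity scores from five habitats.
plant_diversity = [1, 2, 3, 4, 5]
bird_diversity = [2, 4, 5, 4, 5]

r = pearson_r(plant_diversity, bird_diversity)
r_squared = r ** 2
print(f"r = {r:.3f}, R2 = {r_squared:.2f}")
```

Here r is about 0.77 and R2 is 0.6, meaning roughly 60% of the variation in one variable is accounted for by its linear relationship with the other.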

Page 23:

Correlation Example 1

• Is there a relationship between the bird diversity and plant diversity in a given habitat?

H0 : no relationship between variables

HA : there is a relationship between variables

r=0.3

Page 24:

Correlation Example 2

• Is there a relationship between plant density and a) bare ground b) soil pH c) species richness?

H0 : no relationship between variables

HA : there is a relationship between variables

Page 25:

Running a Correlation

Hypothesize that there is a relationship between mean fish length and lake size (larger lakes might have larger fish). Collected data from 21 lakes.

Select Graphs → Legacy Dialogs → Scatter → Define

(lake size is the x variable and fish length is the y variable)

Page 26:

Select Graphs → Legacy Dialogs → Scatter → Define (lake size is the x variable and fish length is the y variable)

Page 27:

r = 0.824, p < 0.001

Therefore, there is a HIGHLY SIGNIFICANT, STRONG, POSITIVE relationship between fish length and lake size.
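The significance of a correlation comes from converting r into a t statistic with n - 2 degrees of freedom. The Python sketch below does this for the values reported above (r = 0.824, 21 lakes). The function name `correlation_t` is made up for illustration, and the sketch shows the standard textbook formula rather than SPSS's own output.

```python
from math import sqrt


def correlation_t(r, n):
    """t statistic and degrees of freedom for testing H0: no correlation."""
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r * r)
    return t, df


t, df = correlation_t(0.824, 21)
print(f"t({df}) = {t:.2f}")
```

The resulting t of about 6.3 on 19 degrees of freedom is well beyond the tabulated two-tailed critical value for α = 0.001 (roughly 3.88), which is consistent with the reported p < 0.001.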

Page 28:

Outline

1) Hypothesis Building
– Null hypothesis/alternate hypothesis
2) Hypothesis Testing
3) Common Statistical tests and how to run them
A) t-test
B) ANOVA
C) Correlation
4) Graphing
– How to present your findings
– Types of graphs and usage
– Formatting

Page 29:

Choosing Graphs

• As we have seen, some tests are related to specific figures
– Correlations and scatter plots

The following slides outline the basic use of several common graphs
– Scatter plots
– Line graphs
– Bar graphs
– Histograms

Your hypothesis and statistical test should guide your choice of figures!

Page 30:

Scatter plot

• Displays 2 variables for a set of data
• Dependent vs. independent – one variable is under the control of the other variable (Regression Analysis)

OR

• If we have no dependent variable, a scatter plot will show the degree of correlation (NOT CAUSATION!)

Page 31:

Line graph

• Shows the relationship between values plotted on each axis (dependent vs. independent)
– Used on continuous variables

Page 32:

Bar graph

• Used for discrete quantitative variables which are similar but not necessarily related
• ANOVA is often used to test for differences

Page 33:

Making Proper Error Bars in Excel

Excel will apply the same error to all bars if you use the automatic error bar feature.

To produce proper, interpretable error bars you must:

1) Calculate standard error for your data:
- First calculate standard deviation using the “STDEV.S” function
- Then divide standard deviation by the square root of n (observations per group) to give you Standard Error.

2) Different versions of Excel hide the ‘custom error bar’ option in different places:
- try selecting the data bars → right click → select ‘format data series’ → ‘error bars’
OR
- try clicking the graph → move to ‘layout’ under ‘chart tools’ tab → ‘error bars’

3) Select ‘custom’ and ‘specify value’

4) Be sure to select the ‘range’ of SE values to match the range of selected data for both the positive and negative error value
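The standard error calculation in step 1 can be checked outside Excel with a short Python sketch. `statistics.stdev` is the sample standard deviation, the same quantity as Excel's STDEV.S. The habitat names and values below are made up, and `standard_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return stdev(values) / sqrt(len(values))


# Hypothetical observations per group (one bar on the graph per group).
groups = {
    "grassland": [2, 4, 6, 8],
    "woodlot": [3, 3, 5, 5],
}

# One (mean, SE) pair per group: each bar gets its own error value.
summary = {name: (mean(v), standard_error(v)) for name, v in groups.items()}

for name, (m, se) in summary.items():
    print(f"{name}: mean = {m:.2f}, SE = {se:.3f}")
```

Each group ends up with a different SE, which is exactly why the automatic error bar feature (one error value for all bars) is not appropriate.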

Page 34:

Proper Error Bars

Steps 1) to 4): see previous slide for explanation of steps.

Page 35:

Histogram

• Used exclusively for showing the distribution of data that are continuous.

Page 36:

Conclusion

• This tutorial has provided you with the basic theory, mechanics and applications of common statistical tests.

• You should now be able to carry out scientific reporting from hypothesis formation to statistical testing and figure formatting.