Big questions in Science…

What do I need to know about statistics to succeed in IB Biology?


Page 2: Big questions in Science…

Statisticians…

‘..people who like figures, but don’t have the personality skills to become accountants…’

• Do uncertainty, randomness and chance have a place in science?

• How should we react to them?

Page 4: Big questions in Science…

Hummingbirds are nectarivores (herbivores that feed on the nectar of some species of flower).

In return for food, they pollinate the flower. This is an example of mutualism: both species benefit.

Hummingbird bills have been shaped by natural selection. Birds with a bill best suited to their preferred food source have a greater chance of survival.

Photo: Archilochus colubris, from Wikimedia Commons, by Dick Daniels.

Page 5: Big questions in Science…

Researchers studying comparative anatomy collect data on bill length in two species of hummingbird: Archilochus colubris (ruby-throated hummingbird) and Cynanthus latirostris (broad-billed hummingbird).

To do this, they need to collect sufficient, relevant, reliable data so they can test the null hypothesis (H0) that:

“there is no significant difference in bill length between the two species.”

Photo: Archilochus colubris (male), Wikimedia Commons, by Joe Schneid.

Page 6: Big questions in Science…

The sample size must be large enough to provide sufficient reliable data and to allow us to carry out relevant statistical tests for significance.

We must also be mindful of uncertainty in our measuring tools and error in our results.

Photo: Broad-billed hummingbird (Wikimedia Commons).

Page 8: Big questions in Science…

The mean is a measure of the central tendency of a set of data.

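Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

| n | A. colubris bill length (±0.1 mm) | C. latirostris bill length (±0.1 mm) |
|---|---|---|
| 1 | 13.0 | 17.0 |
| 2 | 14.0 | 18.0 |
| 3 | 15.0 | 18.0 |
| 4 | 15.0 | 18.0 |
| 5 | 15.0 | 19.0 |
| 6 | 16.0 | 19.0 |
| 7 | 16.0 | 19.0 |
| 8 | 18.0 | 20.0 |
| 9 | 18.0 | 20.0 |
| 10 | 19.0 | 20.0 |
| Mean |  |  |
| s |  |  |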

Calculate the mean using:

• Your calculator (sum of values / n)

• Excel: =AVERAGE(highlight raw data)

n = sample size (the bigger, the better). In this case n = 10 for each group.

All values should be centred in the cell, with decimal places consistent with the measuring tool uncertainty.
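As a sketch of the same calculation outside a calculator or Excel, the Python below computes both means from the Table 1 values. The list and function names are illustrative only; `sum(values) / len(values)` is the same "sum of values / n" rule given above.

```python
# Raw bill lengths from Table 1 (one list per species)
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def sample_mean(values):
    """Sum of values divided by n (what Excel's =AVERAGE returns)."""
    return sum(values) / len(values)


mean_a = sample_mean(a_colubris)
mean_c = sample_mean(c_latirostris)

# Report to one decimal place, matching the raw data
print(f"A. colubris mean: {mean_a:.1f}")     # 15.9
print(f"C. latirostris mean: {mean_c:.1f}")  # 18.8
```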

Page 9: Big questions in Science…


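Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

| n | A. colubris bill length (±0.1 mm) | C. latirostris bill length (±0.1 mm) |
|---|---|---|
| 1 | 13.0 | 17.0 |
| 2 | 14.0 | 18.0 |
| 3 | 15.0 | 18.0 |
| 4 | 15.0 | 18.0 |
| 5 | 15.0 | 19.0 |
| 6 | 16.0 | 19.0 |
| 7 | 16.0 | 19.0 |
| 8 | 18.0 | 20.0 |
| 9 | 18.0 | 20.0 |
| 10 | 19.0 | 20.0 |
| Mean | 15.9 | 18.8 |
| s |  |  |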

Raw data and the mean need to have consistent decimal places (in line with the uncertainty of the measuring tool).

Uncertainties must be included.

Descriptive table title and number.

Page 10: Big questions in Science…

[Graph 1: Comparing mean bill lengths in two hummingbird species, A. colubris and C. latirostris. Bar chart; x-axis: species of hummingbird; y-axis: mean bill length (±0.1 mm), from 0.0 to 20.0; bars: A. colubris (15.9 mm) and C. latirostris (18.8 mm).]

Annotations on the slide:

• Descriptive title, with graph number (graph number must match table number)

• Labeled points

• Y-axis clearly labeled, with uncertainty. Make sure that the y-axis begins at zero.

• X-axis labeled

Page 11: Big questions in Science…


From the means alone you might conclude that C. latirostris has a longer bill than A. colubris.

But the mean only tells part of the story.

Page 21: Big questions in Science…

How can I find the mean and the standard deviation on my calculator?

http://click4biology.info/c4b/1/gcStat.htm

Head over to click4biology for instructions on how to calculate the mean and standard deviation using the TI-83 Plus and TI-84 Plus calculators.

Page 22: Big questions in Science…

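Standard deviation is a measure of how spread out the data are around the mean. For normally distributed data, about 68% of values lie within one standard deviation of the mean.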

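Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

| n | A. colubris bill length (±0.1 mm) | C. latirostris bill length (±0.1 mm) |
|---|---|---|
| 1 | 13.0 | 17.0 |
| 2 | 14.0 | 18.0 |
| 3 | 15.0 | 18.0 |
| 4 | 15.0 | 18.0 |
| 5 | 15.0 | 19.0 |
| 6 | 16.0 | 19.0 |
| 7 | 16.0 | 19.0 |
| 8 | 18.0 | 20.0 |
| 9 | 18.0 | 20.0 |
| 10 | 19.0 | 20.0 |
| Mean | 15.9 | 18.8 |
| s | 1.91 | 1.03 |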

Standard deviation may be given to one more decimal place than the raw data. In Excel: =STDEV(highlight raw data).
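A minimal Python sketch of the sample standard deviation (dividing by n - 1), which is what Excel's =STDEV returns. The function name is hypothetical; the sketch reproduces the s values in Table 1.

```python
import math

# Raw bill lengths from Table 1
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def sample_sd(values):
    """Sample standard deviation: divides the squared deviations by n - 1."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


sd_a = sample_sd(a_colubris)
sd_c = sample_sd(c_latirostris)

# One more decimal place than the raw data
print(f"A. colubris s = {sd_a:.2f}")     # 1.91
print(f"C. latirostris s = {sd_c:.2f}")  # 1.03
```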

Which of the two sets of data has:

a. The longest mean bill length?

b. The greatest variability in the data?

Answers: a. C. latirostris; b. A. colubris.

Page 24: Big questions in Science…

Standard deviation is a measure of the spread of most of the data. Error bars are a graphical representation of the variability of data.

Which of the two sets of data has:

a. The highest mean?

b. The greatest variability in the data?

Error bars could represent standard deviation, range or confidence intervals.

Answers: a. A; b. B.

Page 26: Big questions in Science…

[Graph 1: Comparing mean bill lengths in two hummingbird species, A. colubris and C. latirostris (error bars = standard deviation). Bar chart; x-axis: species of hummingbird; y-axis: mean bill length (±0.1 mm), from 0.0 to 20.0; bars: A. colubris (15.9 mm) and C. latirostris (18.8 mm).]

The title is adjusted to show what the error bars represent. This is very important.

You can see the clear difference in the size of the error bars.

Variability has been visualised.

The error bars overlap somewhat.

What does this mean?

Page 27: Big questions in Science…

The overlap of a set of error bars gives a clue as to the significance of the difference between two sets of data.

Large overlap:

• Lots of shared data points between the two data sets.

• Results are not likely to be significantly different from each other.

• Any difference is most likely due to chance.

No overlap:

• No (or very few) shared data points between the two data sets.

• Results are more likely to be significantly different from each other.

• The difference is more likely to be ‘real’.
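A rough sketch of checking whether two mean ± 1 SD error bars overlap, using the means and s values from Table 1. The helper names are hypothetical, and this only measures the overlap; as the slide says, overlap is a clue, not a significance test.

```python
def error_bar(mean, sd):
    """Return the (low, high) ends of a mean ± 1 SD error bar."""
    return mean - sd, mean + sd


def overlap(bar1, bar2):
    """Length of the shared region of two error bars (0.0 if they do not overlap)."""
    low = max(bar1[0], bar2[0])
    high = min(bar1[1], bar2[1])
    return max(0.0, high - low)


bar_a = error_bar(15.9, 1.91)  # A. colubris: mean and s from Table 1
bar_c = error_bar(18.8, 1.03)  # C. latirostris: mean and s from Table 1

print(f"A. colubris bar: {bar_a[0]:.2f} to {bar_a[1]:.2f}")
print(f"C. latirostris bar: {bar_c[0]:.2f} to {bar_c[1]:.2f}")
print(f"overlap: {overlap(bar_a, bar_c):.2f}")
```

For these data the bars share only a very small region (about 0.04), which is why a statistical test is needed to decide whether the difference is significant.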

Page 31: Big questions in Science…

[Graph 1: Comparing mean bill lengths in two hummingbird species, A. colubris and C. latirostris (error bars = standard deviation). Bar chart; x-axis: species of hummingbird; y-axis: mean bill length (±0.1 mm); bars: A. colubris (15.9 mm, n = 10) and C. latirostris (18.8 mm, n = 10).]

Our results show a very small overlap between the two sets of data.

So how do we know if the difference is significant or not?

We need to use a statistical test.

The t-test is a statistical test that helps us determine the significance of the difference between the means of two sets of data.

Page 32: Big questions in Science…

Inferential statistics: comparing two data sets with the t-test

• Used to compare two normally distributed data sets (ideally with similar variances).

• The t-test checks whether the means of two groups are reliably different.

• Just looking at the means may show you that they are different, but it doesn't show whether the difference is reliable.

• We always test the null hypothesis (H0).
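A minimal sketch of where the value of t comes from, assuming Student's two-sample t-test with pooled (equal) variances, which fits the "similar variances" condition above. The slides do not say which form of the t-test they use, and the function name is hypothetical; in practice a calculator, Excel or a statistics package does this step for you.

```python
import math

# Raw bill lengths from Table 1
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def pooled_t(x, y):
    """Student's two-sample t statistic, assuming equal variances."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)  # squared deviations, group 1
    ss2 = sum((v - m2) ** 2 for v in y)  # squared deviations, group 2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Size of the difference between means, in units of its standard error
    return abs(m1 - m2) / standard_error


t = pooled_t(a_colubris, c_latirostris)
print(f"t = {t:.2f}")
```

For the Table 1 data this sketch gives t ≈ 4.22, with df = 18. The calculated t is then compared with a critical value from a t-table.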

Page 35: Big questions in Science…

So what are degrees of freedom?

Degrees of freedom (df) depend on sample size. For only one group, df = n - 1, where n = number of samples. Usually we are comparing two groups, so df = (n1 + n2) - 2.
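The two rules above as a small Python helper (the function name is illustrative). For the hummingbird data, n1 = n2 = 10, so df = 18.

```python
def degrees_of_freedom(n1, n2=None):
    """df = n - 1 for one group; df = (n1 + n2) - 2 for two groups."""
    if n2 is None:
        return n1 - 1
    return (n1 + n2) - 2


print(degrees_of_freedom(10))      # one group of 10 -> 9
print(degrees_of_freedom(10, 10))  # two groups of 10 -> 18
```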

Page 36: Big questions in Science…

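| Degrees of freedom | P = 0.1 (90% confidence) | P = 0.05 (95% confidence) | P = 0.02 (98% confidence) | P = 0.01 (99% confidence) |
|---|---|---|---|---|
| 1 | 6.31 | 12.71 | 31.82 | 63.66 |
| 2 | 2.92 | 4.30 | 6.96 | 9.92 |
| 3 | 2.35 | 3.18 | 4.54 | 5.84 |
| 4 | 2.13 | 2.78 | 3.75 | 4.60 |
| 5 | 2.02 | 2.57 | 3.37 | 4.03 |
| 6 | 1.94 | 2.45 | 3.14 | 3.71 |
| 7 | 1.89 | 2.36 | 3.00 | 3.50 |
| 8 | 1.86 | 2.31 | 2.90 | 3.36 |
| 9 | 1.83 | 2.26 | 2.82 | 3.25 |
| 10 | 1.81 | 2.23 | 2.76 | 3.17 |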

We can calculate the value of ‘t’ for a given set of data and compare it to critical values that depend on the size of our sample and the level of confidence we need.

Example two-tailed t-table.

What happens to the value of P as the confidence in the results increases?

What happens to the critical value as the confidence level increases?

The values in the body of the table are the “critical values”.
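A sketch of reading a critical value from the example t-table, stored as a Python dictionary. It only holds df 1 to 10 from this slide; for larger samples (such as df = 18 for the hummingbird data) you would need a fuller table, such as the two-tailed t-table cited on these slides, or a statistics package. The names are hypothetical.

```python
# Two-tailed critical values copied from the example t-table above.
# CRITICAL_T[df][p] -> critical value
CRITICAL_T = {
    1: {0.10: 6.31, 0.05: 12.71, 0.02: 31.82, 0.01: 63.66},
    2: {0.10: 2.92, 0.05: 4.30, 0.02: 6.96, 0.01: 9.92},
    3: {0.10: 2.35, 0.05: 3.18, 0.02: 4.54, 0.01: 5.84},
    4: {0.10: 2.13, 0.05: 2.78, 0.02: 3.75, 0.01: 4.60},
    5: {0.10: 2.02, 0.05: 2.57, 0.02: 3.37, 0.01: 4.03},
    6: {0.10: 1.94, 0.05: 2.45, 0.02: 3.14, 0.01: 3.71},
    7: {0.10: 1.89, 0.05: 2.36, 0.02: 3.00, 0.01: 3.50},
    8: {0.10: 1.86, 0.05: 2.31, 0.02: 2.90, 0.01: 3.36},
    9: {0.10: 1.83, 0.05: 2.26, 0.02: 2.82, 0.01: 3.25},
    10: {0.10: 1.81, 0.05: 2.23, 0.02: 2.76, 0.01: 3.17},
}


def critical_value(df, p=0.05):
    """Look up a critical t value; raises KeyError if df or p is not in this small table."""
    return CRITICAL_T[df][p]


print(critical_value(9))        # 2.26 (95% confidence)
print(critical_value(9, 0.01))  # 3.25 (99% confidence)
```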

Page 37: Big questions in Science…


We usually use P < 0.05 (95% confidence) in Biology, because biological data can be highly variable.

Page 38: Big questions in Science…

2-tailed t-table source: http://www.medcalc.org/manual/t-distribution.php

Page 39: Big questions in Science…

t was calculated as 2.15 (this is done for you)

t = 2.15; cv = ? (read from the t-table)

If t < cv, accept H0 (there is no significant difference).
If t > cv, reject H0 (there is a significant difference).
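The decision rule above as a small Python function, using t = 2.15 from this worked example and, as an illustration, the critical value of 2.069 used for it on the following slides. The function name is hypothetical, and treating t equal to cv as significant is an assumption, because the rule above only covers t < cv and t > cv.

```python
def t_test_decision(t, cv):
    """Compare a calculated t value with a critical value (cv).

    The absolute value of t is used because the test is two-tailed.
    Treating t == cv as significant is an assumption, not part of the slide's rule.
    """
    if abs(t) >= cv:
        return "reject H0 (there is a significant difference)"
    return "accept H0 (there is no significant difference)"


print(t_test_decision(2.15, 2.069))  # reject H0 (there is a significant difference)
```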


Page 40: Big questions in Science…

Significance level: P = 0.05 (95% confidence).

Page 41: Big questions in Science…

At P = 0.05, the critical value (cv) from the t-table is 2.069.

t = 2.15 > cv = 2.069

If t < cv, accept H0 (there is no significant difference).
If t > cv, reject H0 (there is a significant difference).

Page 42: Big questions in Science…


Conclusion: “There is a significant difference in the wing spans of the two populations of birds.”


Page 45: Big questions in Science…

Critical value (cv) = 2.045


“There is no significant difference in the size of shells between north-side and south-side snail populations.”


Page 47: Big questions in Science…

Critical value (cv) = 2.086


“There is a significant difference in the resting heart rates between the two groups of swimmers.”

Page 48: Big questions in Science…

Cartoon from: http://www.xkcd.com/552/

Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there."

Page 52: Big questions in Science…

http://diabetes-obesity.findthedata.org/b/240/Correlations-between-diabetes-obesity-and-physical-activity

Diabetes and obesity are ‘risk factors’ for each other. There is a strong correlation between them, but does this mean one causes the other?

Page 54: Big questions in Science…

Correlation does not imply causality.

Pirates vs global warming, from http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming

Where correlations exist, we must then design solid scientific experiments to determine the cause of the relationship. Sometimes a correlation exists because of a confounding variable: a third factor that affects both correlated variables, even though they do not directly affect each other.
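A toy illustration of a confounding variable, using invented numbers (not data from these slides): a hypothetical variable z drives both x and y, so x and y end up strongly correlated even though neither is calculated from the other. The Pearson correlation coefficient is written out by hand so the sketch needs no extra libraries; all names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical confounding variable (invented values)
z = [1, 2, 3, 4, 5, 6, 7, 8]
# x and y each depend on z plus a small fixed wobble, but not on each other
wobble_x = [0.3, -0.2, 0.1, -0.3, 0.2, 0.0, -0.1, 0.2]
wobble_y = [-0.1, 0.2, -0.3, 0.1, 0.0, 0.3, -0.2, -0.1]
x = [2 * zi + w for zi, w in zip(z, wobble_x)]
y = [3 * zi + w for zi, w in zip(z, wobble_y)]

# Very close to 1, yet changing x would not change y: both simply follow z
print(f"r(x, y) = {pearson_r(x, y):.3f}")
```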

To be able to determine causality through experimentation we need:

• One clearly identified independent variable

• Carefully measured dependent variable(s) that can be attributed to change in the independent variable

• Strict control of all other variables that might have a measurable impact on the dependent variable.

We also need sufficient, relevant, repeatable and statistically significant data.

Some known causal relationships:

• Atmospheric CO2 concentrations and global warming

• Atmospheric CO2 concentrations and the rate of photosynthesis

• Temperature and enzyme activity

Page 55: Big questions in Science…

Question check:
