Ecology is a Science – Queen of Sciences Follows Scientific Method Hypothetico-deductive approach...
moses-george -
Ecology is a Science – Queen of Sciences
Follows Scientific Method
Hypothetico-deductive approach (Popper), based on the principle of falsification: theories can be disproved but never proved, because proof is logically impossible. A theory is disproved if there exists a logically possible explanation that is inconsistent with it
Pattern: Observation. Rigorously describe
Model: Explanation or theory (maybe >1)
Hypothesis: Prediction deduced from model. Generate null hypothesis – H0: Falsification test
Test: Experiment
• IF H0 rejected – model supported
• IF H0 accepted – model wrong
Statistics
Can only really test hypotheses by experimentation
Pattern: Observation. Rigorously describe
Noctiluca give off light when disturbed
Model: Explanation or theory (maybe >1)
Give off light when attacked by copepods to attract fish (to eat the copepods)
Hypothesis: Prediction deduced from model. Generate null hypothesis – H0: Falsification test
H0: Bioluminescence has no effect on predation of copepods by fish (or decreases predation)
H1: Bioluminescence increases predation of copepods by fish
Test: Experiment
• IF H0 rejected – model supported
• IF H0 accepted – model wrong
Statistics – summary, analysis and interpretation of data
Data (plural; singular: datum) are observations, numerical facts
Nominal data – gender, colour, species, genus, class, town, country, model etc
Continuous data – concentration, depth, height, weight, temperature, rate etc
Discrete data – numbers per unit space, numbers per entity etc
RAW MATERIAL OF SCIENCE
Often referred to as VARIABLES because they vary
Types of Data
The type of data collected influences their analysis
Variability – key feature of the natural world
Genotypic/Phenotypic variation – differences between individuals of the same species (blood-type, colour, height etc)
Variability in time/space – changes in numbers per unit space, time
Uniform Random Clumped
Space/Time
Measurement variability – experimental error (bias)
Variability = Uncertainty
Variability means that it is impossible to describe data exactly – Accuracy, Precision
Accuracy – how close a measure is to the real value
20 cm +
20.63 cm
6 mm +
300 μm +
20.631506542 cm
Accept a level of measurement error: be upfront
Precision – how close repeat measures are to each other
Repeat measures: 20.632, 19.986, 21.102, 20.493, 20.578, 20.710, 22.356, 20.623, 20.755
Describing data and variability
Population – the entire collection of measurements, e.g. mass of 19 yr old elephants, the blood pressure of women between 16-18 yrs of age, number of earthworms on UWC rugby field, height of UWC BSc II students, oxygen content of water
When taking samples it is vital that they are Random and Independent
One sample from a large population is meaningless – need to take replicate samples and obtain a sample measure, which is then assumed to be representative of the population
If population small, then possible to obtain all measurements in the population. However, if population very large, then impractical or impossible to measure all
- must take Samples
What is the weight of a UWC second-year student?
Data sheet columns: Number | Student No | Mass 1 (kg): estimate | Mass 2 (kg): scale | Height 1 (m): estimate | Height 2 (m): measure | Gender: M or F | Year of study: 2, 3 etc | Age: months
What do you notice from the data?
Differences – variability! – natural and machine
When might your sample be the population?
IF: what is the weight of a BDC222 student in 2012?
Describing data and variability
How tall is a UWC BSc II student?
What is the NO3 concentration in the Black River?
Sample measure - Central Tendency
Arithmetic mean or Average
Population mean = μ; sample mean = $\bar{x}$
We use $\bar{x}$ as a proxy for μ
Data (x): 2, 5, 4, 2, 4, 1, 7, 5, 2, 1, 4, 3, 2, 1, 5, 4, 6, 3, 2, 6
$\bar{x} = \frac{\sum x}{N}$
N = 20, Σx = 69, Mean = 69 / 20 = 3.45
Units?
Enter data (x) into MSExcel spreadsheet
Calculate N: =COUNT(DATA:RANGE)
Calculate Total: =SUM(DATA:RANGE)
Calculate Mean: =TOTAL / N
MSExcel also allows you to calculate the mean from a data series…
=AVERAGE(DATA:RANGE)
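The spreadsheet steps above can also be sketched in Python. This is only an illustration: the list `data` holds the 20 observations from the slide, and the variable names are assumptions, not part of the original material.

```python
# The 20 observations (x) from the slide; units are not given in the source.
data = [2, 5, 4, 2, 4, 1, 7, 5, 2, 1, 4, 3, 2, 1, 5, 4, 6, 3, 2, 6]

n = len(data)       # counterpart of =COUNT(DATA:RANGE)
total = sum(data)   # counterpart of =SUM(DATA:RANGE)
mean = total / n    # counterpart of =TOTAL / N or =AVERAGE(DATA:RANGE)

print(n, total, mean)
```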
What are we actually doing here?
In the numerator we are probably doing this…
2 + 5 + 4 + 2 + 4 + 1 + 7 + 5 … etc.
But there is another way of looking at it…
(3*1)+(5*2)+(2*3)+(4*4)+(3*5)+(2*6)+(1*7)
In other words, we are summing the products of x and F(x) in our data set
We have now generated a Frequency Table
Unordered data: 2, 5, 4, 2, 4, 1, 7, 5, 2, 1, 4, 3, 2, 1, 5, 4, 6, 3, 2, 6
Ordered data: 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7
Minimum = 1, Maximum = 7
Even and equal CLASS intervals
X      F      X.F
1      3      3
2      5      10
3      2      6
4      4      16
5      3      15
6      2      12
7      1      7
SUMS   20     69
Sum x.F(x) = 69
Sum F(x) = 20 – what is this?
Mean = Sum x.F(x) / Sum F(x) = 69 / 20 = 3.45
Units?
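As a rough sketch of the same idea in Python, the frequency table can be built with `collections.Counter` and the mean recovered from the sums of F(x) and x.F(x). The variable names are illustrative assumptions.

```python
from collections import Counter

# The same 20 observations used for the arithmetic mean above.
data = [2, 5, 4, 2, 4, 1, 7, 5, 2, 1, 4, 3, 2, 1, 5, 4, 6, 3, 2, 6]

freq = Counter(data)                          # maps each value x to F(x)
sum_f = sum(freq.values())                    # Sum F(x), i.e. N
sum_xf = sum(x * f for x, f in freq.items())  # Sum x.F(x)
mean = sum_xf / sum_f

# Print the table rows: X, F, X.F
for x in sorted(freq):
    print(x, freq[x], x * freq[x])
print("SUMS", sum_f, sum_xf, "Mean", mean)
```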
Length (cm) of femur from domestic rabbits shot on Robben Island during March 2010
2 8 3 9 2 1 4 4 5 5 3 7 6 3 4 4 8 5 6 7 3 3 2 5
1) Calculate the mean femur length
2) Construct a frequency table of femur length
Femur Length (cm)   Frequency
1
2
3
3) Recalculate the mean femur length
MSExcel allows you to calculate a frequency table
[Figure: histogram of Frequency against Height (m), 1.2 m to 2.2 m]
Height (m)   Frequency
1.2          1
1.25         1
1.3          1
1.35         2
1.4          2
1.45         3
1.5          5
1.55         6
1.6          8
1.65         10
1.7          12
1.75         15
1.8          11
1.85         9
1.9          7
1.95         5
2            3
2.05         2
2.1          1
2.15         1
2.2          1
Mode – the most commonly represented value
How? Construct a frequency table from the data: whichever “class” of data occurs at the highest frequency is the MODE
It also allows you to calculate MODE: = MODE(DATA:RANGE)
[Figure: three histograms of Frequency against Data Class (classes 1 to 10), labelled UNIMODAL, BIMODAL and TRIMODAL]
Datum No 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Datum 1.2 1.25 1.3 1.35 1.35 1.4 1.4 1.45 1.45 1.45 1.5 1.5 1.5 1.5 1.5 1.55 1.55 1.55 1.55 1.55 1.55 1.6 1.6 1.6 1.6 1.6 1.6
Datum No 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
Datum 1.6 1.6 1.65 1.65 1.65 1.7 1.7 1.65 1.65 1.65 1.65 1.65 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.7 1.75 1.75 1.75
Datum No 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
Datum 1.75 1.75 1.75 1.75 1.75 1.8 1.8 1.75 1.75 1.75 1.75 1.75 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.8 1.85 1.85 1.85 1.85
Datum No 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106
Datum 1.85 1.85 1.85 1.85 1.85 1.9 1.9 1.9 1.9 1.9 1.9 1.9 1.95 1.95 1.95 1.95 1.95 2 2 2 2.05 2.05 2.1 2.15 2.2
Median – the middle value in a ranked data set
Step 1 – Order the data from low to high
Step 2 – Determine the middle data point
If there are an odd number of data points this is easy
Data (x)   Ordered Data   Order No
1.85       1.4            1
1.6        1.45           2
1.95       1.6            3
1.65       1.65           4
1.45       1.8            5
1.9        1.85           6
1.4        1.9            7
1.8        1.9            8
1.9        1.95           9
If there are an even number of data points you will need to interpolate
Data (x)   Ordered Data   Order No
1.85       1.4            1
1.6        1.45           2
1.95       1.6            3
1.65       1.65           4
1.45       1.75           5
1.9        1.8            6
1.4        1.85           7
1.8        1.9            8
1.9        1.9            9
1.75       1.95           10
The middle data point lies half way between that associated with observation no 5 (1.75) and observation no 6 (1.8) = 1.775
Can be calculated as either (1.75 + 1.8) / 2
OR as ((1.8 – 1.75) / 2) + 1.75
MSExcel also allows you to calculate the median from a data series…
=MEDIAN(DATA:RANGE)
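The two steps (order the data, then find or interpolate the middle point) might be sketched in Python as follows. The function name `median_of` is an assumption made for illustration; the standard library's `statistics.median` is shown alongside as a cross-check.

```python
import statistics

def median_of(values):
    """Order the data, then take the middle point (or interpolate)."""
    ordered = sorted(values)            # Step 1: order from low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd count: a single middle point
        return ordered[mid]
    # even count: half way between the two middle points
    return (ordered[mid - 1] + ordered[mid]) / 2

# The nine-point and ten-point height examples from the slides.
odd_data = [1.85, 1.6, 1.95, 1.65, 1.45, 1.9, 1.4, 1.8, 1.9]
even_data = odd_data + [1.75]

median_odd = median_of(odd_data)
median_even = median_of(even_data)
print(median_odd, median_even, statistics.median(even_data))
```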
Unordered data: 24.6, 25.9, 25.9, 26.6, 25.8, 25.4, 25.2, 24.7, 24.1, 25, 24.8, 24.5, 26.4, 25.3, 25, 25.4, 25.8, 25.3, 25.9, 23.6
In MSExcel: Highlight data range to be sorted (including variable header row), select data, sort…
1. Tick “data has headers” box…
2. Select variable to sort by…
3. Sort from smallest to largest…
THINGS TO REMEMBER…
…COUNTER…
Ordered data:
Rank Order   X
1            23.6
2            24.1
3            24.5
4            24.6
5            24.7
6            24.8
7            25
8            25
9            25.2
10           25.3
11           25.3
12           25.4
13           25.4
14           25.8
15           25.8
16           25.9
17           25.9
18           25.9
19           26.4
20           26.6
Measures of Dispersion – how data are distributed around the mean
1.85 1.65 1.55 1.9
1.6 1.95 1.7 1.7
1.95 1.75 1.8 1.7
1.65 1.55 1.65 1.75
1.45 1.85 1.85 1.8
1.9 1.75 1.7 2.05
1.4 2 1.35 2
1.8 1.65 1.5 1.8
1.9 2.1 1.8 1.5
1.75 1.2 1.5 2.15
1.3 1.7 1.6 1.55
1.85 1.45 1.8 1.85
1.5 1.75 1.75 1.25
1.8 1.95 1.75 2
1.9 1.7 1.8 1.9
1.75 1.85 1.8 1.75
1.7 1.9 1.45 1.65
1.35 1.65 1.7 1.6
1.75 1.5 1.55 1.55
1.6 1.8 1.75 1.85
2.05 1.6 1.85 1.7
1.65 1.7 1.4 1.75
1.95 1.9 1.65 1.6
1.75 1.65 1.7 1.85
1.8 1.75 1.95 1.65
1.55 2.2 1.75
1.7 1.6 1.6
Range:
Essentially the lowest and highest value in the data set
N.B. Subject to measurement errors, typographic mistakes and freaks
In MSExcel: =MIN(DATA:RANGE), =MAX(DATA:RANGE)
Mean Deviation
Data (x)   x - mean   |x - mean|
3          -1         1
4          0          0
5          1          1
6          2          2
7          3          3
2          -2         2
3          -1         1
4          0          0
5          1          1
6          2          2
3          -1         1
2          -2         2
3          -1         1
4          0          0
5          1          1
2          -2         2
Σ          64    0    20
N          16         15
mean       4          1.33
Σ(x - mean): Always = Zero
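Convert negatives to positives to give overall deviation from the mean; SUM; divide by N to give the average deviation of any data point from the mean – MEAN DEVIATION. For these data, 20 / 16 = 1.25; the 1.33 shown in the table comes from dividing by N - 1 = 15 instead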
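A minimal Python sketch of the mean deviation calculation, using the 16 measures from the table. The variable names are assumptions; both divisors are computed so that the 1.25 (divide by N) and the table's 1.33 (divide by N - 1) can be compared.

```python
# The 16 measures from the Mean Deviation table.
data = [3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 3, 2, 3, 4, 5, 2]

n = len(data)
mean = sum(data) / n
deviations = [x - mean for x in data]        # these always sum to zero
abs_deviations = [abs(d) for d in deviations]

sum_dev = sum(deviations)
sum_abs_dev = sum(abs_deviations)
mean_deviation = sum_abs_dev / n             # divide by N
table_value = sum_abs_dev / (n - 1)          # divisor used in the table

print(sum_dev, sum_abs_dev, mean_deviation, round(table_value, 2))
```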
Length (cm) of femur from domestic rabbits shot on Robben Island during March 2010
2 8 3 9 2 1 4 4 5 5 3 7 6 3 4 4 8 5 6 7 3 3 2 5
Calculate the Mean Deviation
Variance and Standard Deviation
There is another way to remove the negatives – and that is to square the (x – mean) values
Length (mm) of Drosophila melanogaster Instar III larvae
Data (x)   x - mean   (x - mean)²
3          -1         1
4          0          0
5          1          1
6          2          4
7          3          9
2          -2         4
3          -1         1
4          0          0
5          1          1
6          2          4
3          -1         1
2          -2         4
3          -1         1
4          0          0
5          1          1
2          -2         4
Σ          64    0    36
N          16         15
mean       4          2.4
Σ(x - mean): Always = Zero
Sum of Squares (sample) = Σ(x - mean)² = 36
Mean Sum of Squares (sample) (Variance) = 36 / 16 = 2.25. Square units?
Standard Deviation (sample) = √(Variance) = √2.25 = 1.5
The values of $\bar{x}$ = 4, sample variance (2.25) and sample standard deviation (1.5) ALL refer to the sample of 16 measures
(n-1) = Degrees of Freedom = v; dividing the Sum of Squares by (n-1) = 15 gives the 2.4 shown in the table
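The two divisors can be compared directly in a short Python sketch. The names below are illustrative assumptions; the data are the 16 larval lengths from the table.

```python
import math

# The 16 larval lengths (mm) from the table.
data = [3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 3, 2, 3, 4, 5, 2]

n = len(data)
mean = sum(data) / n
sum_of_squares = sum((x - mean) ** 2 for x in data)

variance_n = sum_of_squares / n           # describes the sample itself
sd_n = math.sqrt(variance_n)
variance_n1 = sum_of_squares / (n - 1)    # divides by degrees of freedom
sd_n1 = math.sqrt(variance_n1)

print(sum_of_squares, variance_n, sd_n, variance_n1, round(sd_n1, 2))
```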
Are they the best estimators of these properties for the population?
In the case of the mean (x), there is no reason to suppose that the mean of all observations in the sample will not provide the best estimator of the population mean (μ)
However, we cannot use sample variance and sample standard deviation as estimators of σ² and σ respectively! WHY? Because not all the measures are completely independent of each other.
X: 3, 4, 5, 6, 7
Total = 25, N = 5, Mean = 5
In this table of five measures, the total is 25 and the mean ($\bar{x}$) is 5
If you had collected only the first four of the measures (in pink), then the total would be 18. In order for you to get a mean of 5 from five measures, the last value would HAVE TO BE seven (7).
In other words the last number is not independent of the others, and when we deal with the population we have to use independent data.
Consequently, when we estimate σ² from a sample we divide the sum of squares by (n-1) and NOT n (as previously); the standard deviation is still calculated as the square root of the variance:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$
Sample derived estimates of population variance and population standard deviation are referred to as s² and s respectively
IF the sample mean is 13.5, N = 8 AND
X1 = 5, X2 = 18, X3 = 22, X4 = 38, X5 = 2, X6 = 10, X7 = 19
WHAT IS THE VALUE OF X8?
Length (cm) of femur from domestic rabbits shot on Robben Island during March 2010
2 8 3 9 2 1 4 4 5 5 3 7 6 3 4 4 8 5 6 7 3 3 2 5
Calculate the Variance and Standard Deviation…
The smaller the standard deviation, the closer the data are to the mean
The bigger the standard deviation, the greater the spread of data around the mean – the greater the variability
        A      B      C
        5      2      0
        6      4      12
        7      6      11
        5      8      1
        6      10     0
        7      2      12
        5      4      11
        6      6      1
        7      8      12
        6      10     0
Mean    6      6      6
N       10     10     10
s²      0.67   8.89   35.11
s       0.82   2.98   5.93
Calculating Variance and Standard Deviation using MSExcel
=VAR(DATA:RANGE)
=STDEV(DATA:RANGE)
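For comparison, Python's standard `statistics` module offers `variance` and `stdev`, which (like the n-1 formulas above) divide by the degrees of freedom. The sketch below applies them to the three data sets A, B and C from the table; the dictionary names are assumptions for illustration.

```python
import statistics

# Data sets A, B and C from the table: same mean, different spread.
samples = {
    "A": [5, 6, 7, 5, 6, 7, 5, 6, 7, 6],
    "B": [2, 4, 6, 8, 10, 2, 4, 6, 8, 10],
    "C": [0, 12, 11, 1, 0, 12, 11, 1, 12, 0],
}

summary = {}
for name, values in samples.items():
    summary[name] = {
        "mean": statistics.mean(values),
        "s2": round(statistics.variance(values), 2),  # divides by n-1
        "s": round(statistics.stdev(values), 2),
    }
    print(name, summary[name])
```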
Length (cm) of femur from domestic rabbits shot on Robben Island during March 2010
2 8 3 9 2 1 4 4 5 5 3 7 6 3 4 4 8 5 6 7 3 3 2 5
Sum x.F(x) = 109
Sum F(x) = 24 – what is this?
Calculating Variance and Standard Deviation from a frequency table
X      F    XF    X-Mean   (X-Mean)²   F.(X-Mean)²
1      1    1     -3.54    12.54       12.54
2      3    6     -2.54    6.46        19.38
3      5    15    -1.54    2.38        11.88
4      4    16    -0.54    0.29        1.17
5      4    20    0.46     0.21        0.84
6      2    12    1.46     2.13        4.25
7      2    14    2.46     6.04        12.09
8      2    16    3.46     11.96       23.92
9      1    9     4.46     19.88       19.88
N      24   109                        105.96
Mean   4.54
Variance 4.61
STDEV  2.15
REMEMBER…
N = Sum F = 24, Σx = Sum XF = 109
Mean = Σx / N = 109 / 24 = 4.54
Units?
Sum F.(X-mean)² = 105.96
Var = Sum F.(X-mean)² / (Sum F - 1) = 105.96 / 23 = 4.61
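The frequency-table calculation can be sketched in Python as below. The dictionary `freq` holds the femur-length frequency table from the slides; the other names are assumptions for illustration.

```python
import math

# Frequency table of femur length (cm): value X -> frequency F.
freq = {1: 1, 2: 3, 3: 5, 4: 4, 5: 4, 6: 2, 7: 2, 8: 2, 9: 1}

n = sum(freq.values())                          # Sum F
sum_xf = sum(x * f for x, f in freq.items())    # Sum XF
mean = sum_xf / n

# Sum of F.(X - mean)^2, then divide by (Sum F - 1)
sum_sq = sum(f * (x - mean) ** 2 for x, f in freq.items())
variance = sum_sq / (n - 1)
stdev = math.sqrt(variance)

print(n, sum_xf, round(mean, 2), round(sum_sq, 2),
      round(variance, 2), round(stdev, 2))
```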
So……
Mean – measure of central tendency of sample data
Variance and Standard Deviation – index of dispersion of data around the sample and/or population mean
Two other commonly reported measures (of how reliable the sample mean is, rather than of central tendency):
Standard Error – index of dispersion of sample means around population mean
95% confidence intervals – describes limits around your sample mean within which you are 95% confident that the REAL value of the population mean lies
To calculate the last two measures, it is necessary to digress a little……
View uncertainty in terms of probability
• What is the probability of a particular event occurring?
• What is the probability of a particular observation being made?
Variability = Uncertainty
A coin has two sides: 1 heads and 1 tails: 1 + 1 = 2
The probability of throwing heads = ½ = 0.5
P(heads) = 0.5
P(tails) = 1 – P(heads) = 1 – 0.5 = 0.5
A die has six sides:
The probability of throwing any particular face = 1/6 = 0.167
The probability of NOT throwing that face = 1 – 0.167 = 0.833
NB – the sum of probabilities = 1.0
What is the probability of picking a student of 1.65 m high from the class?
Depends on how the data are distributed
[Figure: frequency distribution of Height (m) for the class of 106 students, repeating the Height (m) and Frequency table given above]
If the total number of students in the class is 106, and 10 of them are 1.65 m high, then the chance of picking (at random) a student measuring 1.65 m is 10 in 106: P(1.65) = 0.094
If the total number of students in the class is 106, and 96 (106-10) of them are NOT 1.65 m high, then the chance of picking (at random) a student NOT measuring 1.65 m is 96 in 106: P(NOT 1.65) = 0.906
P(NOT 1.65) = 1 – P(1.65) = 1 – 0.094 = 0.906
If you know the probability of picking a student of 1.65 m high from the class (0.094), and you know how many students there are in TOTAL (106) in the class, you can calculate the number of students that are 1.65 m high.
106 * 0.094 = 10
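The same reasoning can be sketched in Python from the height frequency table. The names used here are assumptions for illustration only.

```python
# Height (m) -> Frequency, from the class frequency table.
freq = {
    1.2: 1, 1.25: 1, 1.3: 1, 1.35: 2, 1.4: 2, 1.45: 3, 1.5: 5,
    1.55: 6, 1.6: 8, 1.65: 10, 1.7: 12, 1.75: 15, 1.8: 11,
    1.85: 9, 1.9: 7, 1.95: 5, 2.0: 3, 2.05: 2, 2.1: 1, 2.15: 1, 2.2: 1,
}

total = sum(freq.values())            # number of students in the class
p_165 = freq[1.65] / total            # P(1.65)
p_not_165 = 1 - p_165                 # P(NOT 1.65)
expected_count = total * p_165        # back from probability to a count

print(total, round(p_165, 3), round(p_not_165, 3), round(expected_count))
```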
When data are displayed as a frequency distribution, the area under any part of the curve reflects the number of observations involved.
[Figure: frequency distribution of Height (m)]
In this case, 10 observations are of 1.65 m (in red) and 96 are not (in blue)
Frequency distributions do not only have to be displayed in terms of numbers, they can also be displayed as proportions or percentages.
[Figure: frequency distribution of Height (m) with Frequency expressed as a percentage (%)]
Same rules – the area under any part of the curve reflects the proportion of observations involved...or PROBABILITIES
In this case, 0.094 (9.4%) are of 1.65 m (in red) and 0.906 (90.6%) are not (in blue)
When frequencies are numbers: the total area under the curve = the total number of observations
When frequencies are proportions: the total area under the curve = 1.0
[Figure: frequency distribution (%) of Height (m)]
Most of the data are clustered around the mean, which means that there is a fairly good chance (high probability) of your picking at random from the class a student with a height close to the mean
On the other hand, there is a relatively small chance that you will pick a student (by random) that is either very tall or very short: i.e. those whose measures are located in the tails of the distribution
Most data that scientists collect is what we call normally distributed – but NOT all.
[Figure: frequency distribution of No Worms per quadrat (0 to 24), N = 4064, shown as Frequency and as Frequency (%), Σ = 100%]
The shape of the curve depends on the variance or standard deviation: the spread of values about the mean
[Figure: normal curves of Frequency against X (1 to 20), all with Mean = 10, for s² = 4, s² = 8, s² = 12 and s² = 16]
For data that are normally distributed:
• The mean, median and mode are the same
• The frequency distribution is completely symmetrical either side of the mean
• The area under the curve is proportional to number of observations
The normal curve has fixed mathematical properties, irrespective of
• The scale on which it is drawn
• The magnitude or units of its mean
• The magnitude or units of its Standard Deviation
…and these render it susceptible to statistical analysis
To calculate the probability of a particular value x being drawn from a normally distributed population of data, you need to know the mean AND the standard deviation of the data
$Z = \frac{x - \mu}{\sigma}$ (Equation 1)
μ = population mean, σ = population standard deviation
What Z describes is the difference between the mean and any value x, expressed as a proportion of the standard deviation, i.e. how many standard deviations away from the mean is the value x
Obviously, the smaller the absolute value of Z, the closer the value of x is to the mean
Because Z is based on data that are normally distributed, it too is normally distributed (the Z distribution).
With a knowledge of Z, we can go to statistical tables drawn up based on the normal distribution and calculate the associated probability
Calculating proportions of a Normal Distribution
e.g. if μ = 1.55 m, σ = 0.3 m, what is the probability of a student measuring more than 1.95 m being drawn at random from the population?
$Z = \frac{x - \mu}{\sigma} = \frac{1.95 - 1.55}{0.3} = \frac{0.4}{0.3} = 1.33$
A student measuring 1.95 m is 1.33 times the standard deviation away from the mean, and this corresponds to a value of 0.0918 from the Z Tables
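Where no Z Tables are to hand, the upper-tail area can be approximated numerically. The sketch below uses `math.erfc` from the Python standard library; the helper name `upper_tail` is an assumption, and Z is rounded to two decimals to mimic a table lookup.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve beyond z (one tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 1.55, 0.3     # population mean and standard deviation (m)
x = 1.95                  # height of interest (m)

z = round((x - mu) / sigma, 2)   # rounded as it would be for the tables
p = upper_tail(z)

print(z, round(p, 4))
```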
[Figure: normal curve of Frequency against Height (m), 0.45 to 2.55, with the matching Z scale, -3.67 to 3.33; the area beyond Z = 1.33 is 0.0918]
Z     0       1       2       3       4       5
0.0   0.5000  0.4960  0.4920  0.4880  0.4840  0.4801
0.1   0.4602  0.4562  0.4522  0.4483  0.4443  0.4404
0.2   0.4207  0.4168  0.4129  0.4090  0.4052  0.4013
0.3   0.3821  0.3783  0.3745  0.3707  0.3669  0.3632
0.4   0.3446  0.3409  0.3372  0.3336  0.3300  0.3264
0.5   0.3085  0.3050  0.3015  0.2981  0.2946  0.2912
0.6   0.2743  0.2709  0.2676  0.2643  0.2611  0.2578
0.7   0.2420  0.2389  0.2358  0.2327  0.2297  0.2266
0.8   0.2119  0.2090  0.2061  0.2033  0.2005  0.1977
0.9   0.1841  0.1814  0.1788  0.1762  0.1736  0.1711
1.0   0.1587  0.1562  0.1539  0.1515  0.1492  0.1469
1.1   0.1357  0.1335  0.1314  0.1292  0.1271  0.1251
1.2   0.1151  0.1131  0.1112  0.1093  0.1075  0.1056
1.3   0.0968  0.0951  0.0934  0.0918  0.0901  0.0885
1.4   0.0808  0.0793  0.0778  0.0764  0.0749  0.0735
The Distribution of Means
If random samples of size n are drawn from a normal population, the means of those samples will form a normal distribution
The variance of the distribution of means will decrease as n increases
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}$ (Equation 2)
$\sigma_{\bar{x}}^2$ = population variance of the mean
n          1     2     3     4     5     6     7     8     9     10    11    12    13
           3     3.5   4.0   4.5   5.0   4.5   4.3   4.3   4.3   4.5   4.4   4.2   4.1
           4     4.5   5.0   5.5   4.8   4.5   4.4   4.5   4.7   4.5   4.3   4.2   4.2
           5     5.5   6.0   5.0   4.6   4.5   4.6   4.8   4.6   4.3   4.2   4.2   4.2
           6     6.5   5.0   4.5   4.4   4.5   4.7   4.5   4.2   4.1   4.1   4.2   4.0
           7     4.5   4.0   4.0   4.2   4.5   4.3   4.0   3.9   3.9   4.0   3.8
           2     2.5   3.0   3.5   4.0   3.8   3.6   3.5   3.6   3.7   3.5
           3     3.5   4.0   4.5   4.2   3.8   3.7   3.8   3.9   3.7
           4     4.5   5.0   4.5   4.0   3.8   3.9   4.0   3.8
           5     5.5   4.7   4.0   3.8   3.8   4.0   3.8
           6     4.5   3.7   3.5   3.6   3.8   3.6
           3     2.5   2.7   3.0   3.4   3.2
           2     2.5   3.0   3.5   3.2
           3     3.5   4.0   3.5
           4     4.5   3.7
           5     3.5
           2
Mean       4.0   4.1   4.1   4.1   4.1   4.1   4.1   4.1   4.1   4.1   4.1   4.1   4.1
Variance   2.40  1.40  0.87  0.51  0.30  0.20  0.17  0.17  0.16  0.12  0.08  0.02  0.01
N          16    15    14    13    12    11    10    9     8     7     6     5     4
Get students to do this for themselves
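One way to try this is the Python sketch below, which rebuilds a few columns of the table: each column holds the means of n consecutive measures from the first column. The helper name `running_means` is an assumption. Note that consecutive windows overlap, so this only illustrates the shrinking variance; it is not a set of independent random samples.

```python
import statistics

# First column of the table: the 16 original measures.
data = [3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 3, 2, 3, 4, 5, 2]

def running_means(values, n):
    """Means of every run of n consecutive values."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]

# Variance (n-1 divisor) of the column of means for a few window sizes.
variances = {n: statistics.variance(running_means(data, n)) for n in (1, 2, 4)}

for n, var in variances.items():
    print(n, round(var, 2))
```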
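$\sigma_{\bar{x}}$ = standard error of the mean
So… $\sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$
Just as $Z = \frac{x - \mu}{\sigma}$ is a normal deviate referring to the normal distribution of Xi values
So $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ is a normal deviate referring to the normal distribution of means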
What is the probability of obtaining a random sample of nine measurements with a mean greater than 50.0 mm, from a population having a mean of 47 mm and a standard deviation of 12.0 mm?
N = 9, $\bar{x}$ = 50.0 mm, μ = 47.0 mm, σ = 12.0 mm
$\sigma_{\bar{x}} = \frac{12.0}{\sqrt{9}} = \frac{12}{3} = 4$
$Z = \frac{50.0 - 47.0}{4} = \frac{3}{4} = 0.75$
Looking up 0.75 on the Z Tables gives – 0.2266
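The worked example can be checked with a short Python sketch. The helper `upper_tail` is an assumed name; it approximates the Z Tables using `math.erfc` from the standard library.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve beyond z (one tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n = 9            # sample size
x_bar = 50.0     # sample mean (mm)
mu = 47.0        # population mean (mm)
sigma = 12.0     # population standard deviation (mm)

se = sigma / math.sqrt(n)     # standard error of the mean
z = (x_bar - mu) / se         # normal deviate for a sample mean
p = upper_tail(z)

print(se, z, round(p, 4))
```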
The observant amongst you will have noted that in the last couple of equations for Z we have used the population parameters: μ, σ and $\sigma_{\bar{x}}$
Trouble is we don’t usually have access to population data and must make do with sample estimators $\bar{x}$, s and $s_{\bar{x}}$
IF n is large: we use Z distribution to calculate normal deviates: $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
IF n is small, then must use t distribution: $t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$ (Equation 3)
Shape of the t distribution varies with v (Degrees of Freedom: n-1): the bigger the n, the less spread the distribution
[Figure: t distributions plotted from t = -9 to 9 for v = 100, v = 10, v = 5 and v = 1]
The t distribution has given rise to many statistical tests!
Because it is based on the normal distribution, the t distribution has all the attributes of the normal distribution:
• Completely symmetrical
• Area under any part of the curve reflects proportion of t values involved
• etc…
For a particular area of the curve we can calculate the associated t values, using t-tables at the end of most text books on statistics
For example: if our sample size is 11 (v = 10), what is the value of t beyond which 10% (0.1) of the curve is enclosed? – Two possible answers
One-Tailed: α (1) = 0.1, t = 1.372 (upper tail) or t = -1.372 (lower tail)
Two-Tailed: α (2) = 0.1 (0.05 in each tail), t = -1.812 and t = 1.812
How do you get the t values from the t-tables?
α (2)   0.5     0.2     0.1     0.05    0.02
α (1)   0.25    0.1     0.05    0.025   0.01
v
1       1.000   3.078   6.314   12.706
2       0.816   1.886   2.920   4.303
3       0.765   1.638   2.353   3.182
4       0.741   1.533   2.132   2.776
5       0.727   1.476   2.015   2.571
6       0.718   1.440   1.943   2.447
7       0.711   1.415   1.895   2.365
8       0.706   1.397   1.860   2.306
9       0.703   1.383   1.833   2.262
10      0.700   1.372   1.812   2.228
11      0.697   1.363   1.796   2.201
12      0.695   1.356   1.782   2.179
13      0.694   1.350   1.771   2.160
14      0.692   1.345   1.761   2.145
15      0.691   1.341   1.753   2.131
16      0.690   1.337   1.746   2.120
17      0.689   1.333   1.740   2.110
18      0.688   1.330   1.734   2.101
19      0.688   1.328   1.729   2.093
20      0.687   1.325   1.725   2.086
21      0.686   1.323   1.721   2.080
22      0.686   1.321   1.717   2.074
23      0.685   1.319   1.714   2.069
24      0.685   1.318   1.711   2.064
25      0.684   1.316   1.708   2.060
We can now use the t distribution to demonstrate the term statistical significance – which is something that you will get confronted with regularly when reading EIA reports…
The mean nitrate concentration of water in all the upstream tributaries of a large river prior to intensive agriculture is 22 mg.l-1.
Afterwards the mean nitrate concentration in 25 of these tributaries is 24.23 mg.l-1 and s = 4.24 mg.l-1
This is an observation, and we want to determine if the intensification of agricultural practices has resulted in any change to the nitrate concentration of the freshwater resources.
Step 1: establish the hypotheses H0: μ = 22 H1: μ ≠ 22
Step 2: Need to determine the probability that a random sample (size 25) will generate a mean of 24.23 mg.l-1 from a population with a mean of 22 mg.l-1?
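How? – use t-test: $t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$
$\bar{x}$ = 24.23, s = 4.24, μ = 22.00, n = 25
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.24}{\sqrt{25}} = \frac{4.24}{5} = 0.848$
$t = \frac{24.23 - 22}{0.848} = \frac{2.23}{0.848} \approx 2.63$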
Step 3: Determine, from the t-tables, the (critical) value of t, beyond which we consider such a random sample mean as being unlikely
Generally we consider an event as being unlikely if it occurs in the extreme 5% of the distribution
One-Tailed: α (1) = 0.05 (all 5% in one tail)
Two-Tailed: α (2) = 0.05 (0.025 in each tail)
So we need to determine the (critical) value of t, beyond which 5% of the curve is enclosed – for v = 24
But do we use α (1) or α (2)?
Go to the hypotheses H0: μ = 22 H1: μ ≠ 22
The critical value of t, α (2) 0.05, v = 24, is 2.064
[Figure: t distribution for v = 24 with 0.025 in each tail, beyond t = -2.064 and t = 2.064]
Our value of t is 2.63, which lies beyond the critical value of t (2.064)
That means it is very unlikely that a random sample (size 25) would generate a mean of 24.23 mg.l-1 from a population with a mean of 22 mg.l-1
So unlikely, in fact, that we don’t believe it happened by chance
Reject H0 and accept H1
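The three steps of the nitrate example might be sketched in Python as follows. The critical value 2.064 is copied from the t-tables above (α (2) = 0.05, v = 24) rather than computed, and the variable names are assumptions. The computed t is about 2.63.

```python
import math

x_bar = 24.23    # sample mean nitrate concentration (mg/l)
mu_0 = 22.0      # mean under H0 (mg/l)
s = 4.24         # sample standard deviation (mg/l)
n = 25           # number of tributaries sampled

se = s / math.sqrt(n)        # standard error of the mean
t = (x_bar - mu_0) / se      # t statistic

t_critical = 2.064           # from the t-tables: alpha(2) = 0.05, v = n - 1 = 24
reject_h0 = abs(t) > t_critical   # two-tailed test, because H1 is "not equal"

print(round(se, 3), round(t, 2), reject_h0)
```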
What we can then say, is that the before and after nitrate levels in the water are (statistically) significantly different from each other (p < 0.05)
We are not making any judgment about whether there is more nitrate in the water after than before, only that the concentrations are different – though some things are self evident!
You will frequently come across the terms p<0.05 and p<0.01: these mean that the probability of a particular event occurring by chance alone is less than 5% and 1% respectively, which is unlikely
On the other hand if results are reported as p>0.05, it means that the probability of a particular event occurring by chance alone is greater than 5%, which is possible.
The t-Distribution allows us to calculate the 95% (or 99%) confidence intervals around an estimate of the population mean
In other words, what are the limits around our estimate of the population mean, WITHIN which we are 95% (or 99%) confident that the REAL value of the population mean lies?
To do this, we need a set of t-tables, v (N-1) and $s_{\bar{x}}$
$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$, so $t \times s_{\bar{x}} = \bar{x} - \mu$: the difference between population and sample mean
IF $\bar{x}$ = 42.3 mm, N = 26 (v = 25) and $s_{\bar{x}}$ = 2.15
Then the 95% CI around the mean will be
$s_{\bar{x}} \times t_{\alpha(2)}$ = 2.15 × 2.06 = 4.429
The expression is then written as:
42.3 mm ± 4.43 mm
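A small Python sketch of the same confidence interval calculation. The critical value 2.060 is taken from the t-tables (α (2) = 0.05, v = 25), and the variable names are illustrative assumptions.

```python
x_bar = 42.3         # sample mean (mm)
se = 2.15            # standard error of the mean (mm)
t_critical = 2.060   # from the t-tables: alpha(2) = 0.05, v = 25

half_width = se * t_critical      # distance from the mean to each limit
lower = x_bar - half_width
upper = x_bar + half_width

print(f"{x_bar} mm ± {half_width:.2f} mm  ({lower:.2f} to {upper:.2f} mm)")
```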