1.6 Probability 9.7 Probability of Multiple Events 12.2 Conditional Probability.
CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20...
-
Upload
christian-sparks -
Category
Documents
-
view
214 -
download
1
Transcript of CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20...
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
1
CS1512Foundations of
Computing Science 2
Lecture 20
Probability and statistics (2)
© J R W Hunter, 2006
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
2
Ordinal data
• X is an ordinal variable with values: a1, a2, a3, ... ak, ... aK
• ‘ordinal’ means that:
a1 ≤ a2 ≤ a3 ≤ ... ≤ ak ≤ ... ≤ aK
• cumulative frequency at level k:
ck = sum of frequencies of values less than or equal to ak
ck = f1 + f2 + f3 + ... + fk = (f1 + f2 + f3 + ... + fk-1 ) + fk = ck-1 + fk
• also (%) cumulative relative frequency
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
3
CAS marks
0
10
20
30
40
50
60
70
80
90
100
0 2 4 6 8 10 12 14 16 18 20
CAS%
% relative frequencies % cumulative relative frequencies
0
5
10
15
20
25
0 2 4 6 8 10 12 14 16 18 20
CAS
%
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
4
Enzyme concentrations
Concentration Freq. Rel.Freq. % Cum. Rel. Freq.
19.5 ≤ c < 39.5 1 0.033 3.3%
39.5 ≤ c < 59.5 2 0.067 10.0%
59.5 ≤ c < 79.5 7 0.233 33.3%
79.5 ≤ c < 99.5 7 0.233 56.6%
99.5 ≤ c < 119.5 7 0.233 79.9%
119.5 ≤ c < 139.5 3 0.100 89.9%
139.5 ≤ c < 159.5 2 0.067 96.6%
159.5 ≤ c < 179.5 0 0.000 96.6%
179.5 ≤ c < 199.5 0 0.000 96.6%
199.5 ≤ c < 219.5 1 0.033 100.0%
Totals 30 1.000
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
5
Cumulative histogram
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
6
Discrete two variable data
0
5
10
15
20
25
0 5 10 15 20 25
CS1512 Assessment 1 CAS
CS1012 C
AS
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
7
Continuous two variable data
X Y
4.37 24.19
8.10 39.57
11.45 55.53
10.40 51.16
3.89 20.66
11.30 51.04
11.00 49.89
6.74 35.50
5.41 31.53
13.97 65.51
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
8
Time Series
•Time and space are fundamental (especially time)
•Time series: variation of a particular variable with time
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
9
Summarising data by numerical means
Further summarisation (beyond frequencies)
Measures of location (Where is the middle?)
• Mean • Median• Mode
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
10
Mean
_ sum of observed values of XSample Mean (X) = number of observed values
x = n
use only for quantitative data
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
11
Sigma
Sum of n observations
n
xi = x1 + x2 + ... + xi + ... + xn-1 + xn
i = 1
If it is clear that the sum is from 1 to n then:
x = x1 + x2 + ... + xi + ... + xn-1 + xn
Sum of squares
x2 = x12 + x2
2 + ... + xi2 + ... + xn-1
2 + xn2
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
12
x from frequencies
If X is a categorical variable with values: a1, a2, a3, ... ak, ... aK
x = x1 + x2 + ... + xi + ... + xn-1 + xn (order of summation isn’t important)
(e.g. piglets: 5 + 11 + 12 + 7 + + 8 + 14 + 7 + ... + 14 + ...)
Group together those x’s which have value a1, those with value a2, ...
x = x.. + x.. + x.. ... + x’s which have value a1 - there are f1 of them x.. + x.. ... + x’s which have value a2 - there are f2 of them
... x.. + x.. x’s which have value aK - there are fK of them
= f1 * a1 + f2 * a2 + ... + fk * ak + ... + fK * aK
K
= fk * ak k = 1
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
13
Mean
Litter size Frequency Cum. Freq
ak fk
5 1 1 6 0 1 7 2 3 8 3 6 9 3 9 10 9 18 11 8 26 12 5 31 13 3 34 14 2 36
Total 36
Kx = fk * ak k = 1
= 1*5 + 0* 6 + 2*7 + 3*8 3*9 + 9*10 + 8*11 5*12 + 3*13 + 2*14 = 375
_ X = 375 / 36
= 10.42
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
14
Median
Sample median of X = middle value when n sample observations are ranked in increasing order
= the ((n + 1)/2)th value
n odd: values: 183, 163, 152, 157 and 157 rank order: 152, 157, 157, 163, 183 median: 157
n even: values: 165, 173, 180, 164 rank order: 164, 165, 173, 180 median: (165 + 173)/2 = 169
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
15
Median
Litter size Frequency Cum. Freq
5 1 1 6 0 1 7 2 3 8 3 6 9 3 9 10 9 18 11 8 26 12 5 31 13 3 34 14 2 36
Total 36
Median = 10.5
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
16
Median from cumulative distribution
cumulative % frequency polygon
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
17
Mode
Sample mode = value with highest frequency (may not be unique)
Litter size Frequency Cum. Freq
5 1 1 6 0 1 7 2 3 8 3 6 9 3 9 10 9 18 11 8 26 12 5 31 13 3 34 14 2 36
Mode = 10
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
18
Skew
left skewed symmetric right skewed
mean < mode mean mode mean > mode
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
19
Variance
Measure of spread: variance
0
5
10
15
20
25
30
35
40
45
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 0
5
10
15
20
25
30
35
40
45
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
20
Variance
sample variance = s2
sample standard deviation = s = √ variance
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
21
Variance and standard deviation
Litter size Frequency Cum. Freq ak fk
5 1 1 6 0 1 7 2 3 8 3 6 9 3 9 10 9 18 11 8 26 12 5 31 13 3 34 14 2 36
Total 36
K
x2 = fk * ak2
k = 1
= 1*25 + 2*49 + 3*64 3*81 + 9*100 + 8*121 5*144 + 3*169 + 2*196 = 4145
x = 375
(x)2 / n = 375*375 / 36
= 3906
s2 = (4145-3906) / (36-1)
= 6.83
s = 2.6
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
22
Piglets
Mean = 10.42
Median = 10.5
Mode = 10
Std. devn. = 2.6
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
23
Quartiles and Range
Lower quartile: value such that 25% of observations are below it (Q1).Median: value such that 50% of observations are below (above) it (Q2).Upper quartile: value such that 25% of observations are above it (Q3).
Range: the minimum (m) and maximum (M) observations.
Box and Whisker plot:
m Q1 Q2 Q3 M
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
24
Estimating quartiles
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
25
Linear Regression
y = mc + c
Calculate m and c so that (distance of point from line)2 is minimised
y
x
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/
CS1512
CS1512
26
Time Series - Moving Average
Time Y 3 point MA 0 24 * 1 18 23.0000 2 27 22.3333 3 22 25.6667 4 28 28.0000 5 34 31.0000 6 31 36.6667 7 45 38.0000 8 38 39.3333 9 35 *
• smoothing function• can compute median, max, min, std. devn, etc. in window