CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20...

26
www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/ CS1512 CS1512 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J R W Hunter, 2006

Transcript of CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20...

Page 1: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

1

CS1512Foundations of

Computing Science 2

Lecture 20

Probability and statistics (2)

© J R W Hunter, 2006

Page 2: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

2

Ordinal data

• X is an ordinal variable with values: a1, a2, a3, ... ak, ... aK

• ‘ordinal’ means that:

a1 ≤ a2 ≤ a3 ≤ ... ≤ ak ≤ ... ≤ aK

• cumulative frequency at level k:

ck = sum of frequencies of values less than or equal to ak

ck = f1 + f2 + f3 + ... + fk = (f1 + f2 + f3 + ... + fk-1 ) + fk = ck-1 + fk

• also (%) cumulative relative frequency

Page 3: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

3

CAS marks

0

10

20

30

40

50

60

70

80

90

100

0 2 4 6 8 10 12 14 16 18 20

CAS%

% relative frequencies % cumulative relative frequencies

0

5

10

15

20

25

0 2 4 6 8 10 12 14 16 18 20

CAS

%

Page 4: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

4

Enzyme concentrations

Concentration Freq. Rel.Freq. % Cum. Rel. Freq.

19.5 ≤ c < 39.5 1 0.033 3.3%

39.5 ≤ c < 59.5 2 0.067 10.0%

59.5 ≤ c < 79.5 7 0.233 33.3%

79.5 ≤ c < 99.5 7 0.233 56.6%

99.5 ≤ c < 119.5 7 0.233 79.9%

119.5 ≤ c < 139.5 3 0.100 89.9%

139.5 ≤ c < 159.5 2 0.067 96.6%

159.5 ≤ c < 179.5 0 0.000 96.6%

179.5 ≤ c < 199.5 0 0.000 96.6%

199.5 ≤ c < 219.5 1 0.033 100.0%

Totals 30 1.000

Page 5: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

5

Cumulative histogram

Page 6: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

6

Discrete two variable data

0

5

10

15

20

25

0 5 10 15 20 25

CS1512 Assessment 1 CAS

CS1012 C

AS

Page 7: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

7

Continuous two variable data

X Y

4.37 24.19

8.10 39.57

11.45 55.53

10.40 51.16

3.89 20.66

11.30 51.04

11.00 49.89

6.74 35.50

5.41 31.53

13.97 65.51

Page 8: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

8

Time Series

•Time and space are fundamental (especially time)

•Time series: variation of a particular variable with time

Page 9: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

9

Summarising data by numerical means

Further summarisation (beyond frequencies)

Measures of location (Where is the middle?)

• Mean • Median• Mode

Page 10: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

10

Mean

_ sum of observed values of XSample Mean (X) = number of observed values

x = n

use only for quantitative data

Page 11: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

11

Sigma

Sum of n observations

n

xi = x1 + x2 + ... + xi + ... + xn-1 + xn

i = 1

If it is clear that the sum is from 1 to n then:

x = x1 + x2 + ... + xi + ... + xn-1 + xn

Sum of squares

x2 = x12 + x2

2 + ... + xi2 + ... + xn-1

2 + xn2

Page 12: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

12

x from frequencies

If X is a categorical variable with values: a1, a2, a3, ... ak, ... aK

x = x1 + x2 + ... + xi + ... + xn-1 + xn (order of summation isn’t important)

(e.g. piglets: 5 + 11 + 12 + 7 + + 8 + 14 + 7 + ... + 14 + ...)

Group together those x’s which have value a1, those with value a2, ...

x = x.. + x.. + x.. ... + x’s which have value a1 - there are f1 of them x.. + x.. ... + x’s which have value a2 - there are f2 of them

... x.. + x.. x’s which have value aK - there are fK of them

= f1 * a1 + f2 * a2 + ... + fk * ak + ... + fK * aK

K

= fk * ak k = 1

Page 13: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

13

Mean

Litter size Frequency Cum. Freq

ak fk

5 1 1 6 0 1 7 2 3 8 3 6 9 3 9 10 9 18 11 8 26 12 5 31 13 3 34 14 2 36

Total 36

Kx = fk * ak k = 1

= 1*5 + 0* 6 + 2*7 + 3*8 3*9 + 9*10 + 8*11 5*12 + 3*13 + 2*14 = 375

_ X = 375 / 36

= 10.42

Page 14: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

14

Median

Sample median of X = middle value when n sample observations are ranked in increasing order

= the ((n + 1)/2)th value

n odd: values: 183, 163, 152, 157 and 157 rank order: 152, 157, 157, 163, 183 median: 157

n even: values: 165, 173, 180, 164 rank order: 164, 165, 173, 180 median: (165 + 173)/2 = 169

Page 15: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

15

Median

Litter size Frequency Cum. Freq

5 1 1 6 0 1 7 2 3 8 3 6 9 3 9 10 9 18 11 8 26 12 5 31 13 3 34 14 2 36

Total 36

Median = 10.5

Page 16: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

16

Median from cumulative distribution

cumulative % frequency polygon

Page 17: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

17

Mode

Sample mode = value with highest frequency (may not be unique)

Litter size Frequency Cum. Freq

5 1 1 6 0 1 7 2 3 8 3 6 9 3 9 10 9 18 11 8 26 12 5 31 13 3 34 14 2 36

Mode = 10

Page 18: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

18

Skew

left skewed symmetric right skewed

mean < mode mean mode mean > mode

Page 19: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

19

Variance

Measure of spread: variance

0

5

10

15

20

25

30

35

40

45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 0

5

10

15

20

25

30

35

40

45

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Page 20: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

20

Variance

sample variance = s2

sample standard deviation = s = √ variance

Page 21: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

21

Variance and standard deviation

Litter size Frequency Cum. Freq ak fk

5 1 1 6 0 1 7 2 3 8 3 6 9 3 9 10 9 18 11 8 26 12 5 31 13 3 34 14 2 36

Total 36

K

x2 = fk * ak2

k = 1

= 1*25 + 2*49 + 3*64 3*81 + 9*100 + 8*121 5*144 + 3*169 + 2*196 = 4145

x = 375

(x)2 / n = 375*375 / 36

= 3906

s2 = (4145-3906) / (36-1)

= 6.83

s = 2.6

Page 22: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

22

Piglets

Mean = 10.42

Median = 10.5

Mode = 10

Std. devn. = 2.6

Page 23: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

23

Quartiles and Range

Lower quartile: value such that 25% of observations are below it (Q1).Median: value such that 50% of observations are below (above) it (Q2).Upper quartile: value such that 25% of observations are above it (Q3).

Range: the minimum (m) and maximum (M) observations.

Box and Whisker plot:

m Q1 Q2 Q3 M

Page 24: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

24

Estimating quartiles

Page 25: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

25

Linear Regression

y = mc + c

Calculate m and c so that (distance of point from line)2 is minimised

y

x

Page 26: CS1512 jhunter/teaching/CS1512/lectures/ 1 CS1512 Foundations of Computing Science 2 Lecture 20 Probability and statistics (2) © J.

www.csd.abdn.ac.uk/~jhunter/teaching/CS1512/lectures/

CS1512

CS1512

26

Time Series - Moving Average

Time Y 3 point MA 0 24 * 1 18 23.0000 2 27 22.3333 3 22 25.6667 4 28 28.0000 5 34 31.0000 6 31 36.6667 7 45 38.0000 8 38 39.3333 9 35 *

• smoothing function• can compute median, max, min, std. devn, etc. in window