The two-way frequency table
The χ² statistic
Techniques for examining dependence amongst two categorical variables
Situation
• We have two categorical variables R and C.
• The number of categories of R is r.
• The number of categories of C is c.
• We observe n subjects from the population and count $x_{ij}$ = the number of subjects for which R = i and C = j.
• R = rows, C = columns
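The counts $x_{ij}$ can be tallied directly from raw (R, C) observations. The sketch below uses only the Python standard library. The function name `two_way_table` and the tiny data set are hypothetical illustrations and do not come from the source.

```python
from collections import Counter

def two_way_table(pairs, row_levels, col_levels):
    """Return the r x c table of counts x_ij = #subjects with R = row_levels[i] and C = col_levels[j]."""
    counts = Counter(pairs)  # maps (row value, column value) -> frequency; missing pairs count as 0
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical raw data: one (R, C) pair per subject.
observations = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
table = two_way_table(observations, row_levels=["a", "b"], col_levels=["x", "y"])
print(table)  # [[2, 1], [1, 2]]
```

The grand total of the table always equals the number of subjects, n.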
Example
Both Systolic Blood Pressure (C) and Serum Cholesterol (R) were measured for a sample of n = 1237 subjects.
The categories for Blood Pressure are:
<127 127-146 147-166 167+
The categories for Cholesterol are:
<200 200-219 220-259 260+
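Table: two-way frequency

| Serum Cholesterol \ Systolic Blood Pressure | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 117 | 121 | 47 | 22 | 307 |
| 200-219 | 85 | 98 | 43 | 20 | 246 |
| 220-259 | 119 | 209 | 68 | 43 | 439 |
| 260+ | 67 | 99 | 46 | 33 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |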
Figure: 3-dimensional bar graph of the two-way frequency table
Example
This comes from the drug use data.
The two variables are:
1. Age (C) and
2. Antidepressant Use (R)
measured for a sample of n = 33,957 subjects.
Two-way Frequency Table
Took anti-depressants - 12 mo * Age (G) Crosstabulation (counts)

| Took anti-depressants - 12 mo | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70+ | Total |
|---|---|---|---|---|---|---|---|
| YES | 322 | 523 | 570 | 522 | 265 | 249 | 2451 |
| NO | 5007 | 6201 | 5822 | 4982 | 4114 | 5380 | 31506 |
| Total | 5329 | 6724 | 6392 | 5504 | 4379 | 5629 | 33957 |

Percentage antidepressant use vs Age

| Age (G) | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70+ |
|---|---|---|---|---|---|---|
| % YES | 6.04% | 7.78% | 8.92% | 9.48% | 6.05% | 4.42% |
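Each percentage in this row is a column percentage: YES count ÷ column total × 100. The short Python sketch below reproduces the percentages from the counts in the crosstabulation, rounded to two decimals.

```python
# Counts from the antidepressant-use vs age crosstabulation.
ages = ["20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
yes = [322, 523, 570, 522, 265, 249]
no = [5007, 6201, 5822, 4982, 4114, 5380]

# Column total for each age group, then the share answering YES (in %).
col_totals = [y + n for y, n in zip(yes, no)]
percent_yes = [round(100 * y / t, 2) for y, t in zip(yes, col_totals)]

for age, pct in zip(ages, percent_yes):
    print(f"{age}: {pct:.2f}%")
```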
Figure: Antidepressant Use vs Age (bar chart of % YES for age groups 20-29 to 70+)
The χ² statistic for measuring dependence amongst two categorical variables
Define
$$R_i = \sum_{j=1}^{c} x_{ij} = i\text{th row total}, \qquad C_j = \sum_{i=1}^{r} x_{ij} = j\text{th column total}$$
$$E_{ij} = \frac{R_i C_j}{n}$$
= Expected frequency in the (i, j)th cell in the case of independence.
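The definitions of $R_i$, $C_j$ and $E_{ij}$ translate directly into code. The Python sketch below applies them to the cholesterol × blood pressure table. It assumes a count of 119 for the (220-259, <127) cell, since that value makes the row and column totals agree. The helper names `margins` and `expected_frequencies` are illustrative only.

```python
def margins(table):
    """Return row totals R_i, column totals C_j and the grand total n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals, sum(row_totals)

def expected_frequencies(table):
    """E_ij = R_i * C_j / n: expected count in cell (i, j) under independence."""
    row_totals, col_totals, n = margins(table)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: cholesterol <200, 200-219, 220-259, 260+; columns: blood pressure <127, 127-146, 147-166, 167+.
observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]
for row in expected_frequencies(observed):
    print(["%.2f" % e for e in row])
```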
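| Rows \ Columns | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| 1 | $x_{11}$ | $x_{12}$ | $x_{13}$ | $x_{14}$ | $x_{15}$ | $R_1$ |
| 2 | $x_{21}$ | $x_{22}$ | $x_{23}$ | $x_{24}$ | $x_{25}$ | $R_2$ |
| 3 | $x_{31}$ | $x_{32}$ | $x_{33}$ | $x_{34}$ | $x_{35}$ | $R_3$ |
| 4 | $x_{41}$ | $x_{42}$ | $x_{43}$ | $x_{44}$ | $x_{45}$ | $R_4$ |
| Total | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $n$ |

$$R_i = \sum_{j=1}^{c} x_{ij} = i\text{th row total}, \qquad C_j = \sum_{i=1}^{r} x_{ij} = j\text{th column total}$$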
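| Rows \ Columns | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| 1 | $E_{11}$ | $E_{12}$ | $E_{13}$ | $E_{14}$ | $E_{15}$ | $R_1$ |
| 2 | $E_{21}$ | $E_{22}$ | $E_{23}$ | $E_{24}$ | $E_{25}$ | $R_2$ |
| 3 | $E_{31}$ | $E_{32}$ | $E_{33}$ | $E_{34}$ | $E_{35}$ | $R_3$ |
| 4 | $E_{41}$ | $E_{42}$ | $E_{43}$ | $E_{44}$ | $E_{45}$ | $R_4$ |
| Total | $C_1$ | $C_2$ | $C_3$ | $C_4$ | $C_5$ | $n$ |

$$E_{ij} = \frac{R_i C_j}{n}$$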
Justification: if
$$E_{ij} = \frac{R_i C_j}{n} \quad\text{then}\quad \frac{E_{ij}}{R_i} = \frac{C_j}{n}.$$
Here $E_{ij}/R_i$ is the proportion in column j for row i, and $C_j/n$ is the overall proportion in column j.
and if
$$E_{ij} = \frac{R_i C_j}{n} \quad\text{then}\quad \frac{E_{ij}}{C_j} = \frac{R_i}{n}.$$
Here $E_{ij}/C_j$ is the proportion in row i for column j, and $R_i/n$ is the overall proportion in row i.
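Both identities can be checked exactly on a real table using rational arithmetic. The Python sketch below uses only the margins of the cholesterol × blood pressure table. It confirms that every row of $E$ has the same column profile $C_j/n$, and that every column has the same row profile $R_i/n$.

```python
from fractions import Fraction

# Margins of the cholesterol x blood pressure table.
R = [307, 246, 439, 245]  # row totals R_i
C = [388, 527, 204, 118]  # column totals C_j
n = sum(R)                # 1237

# Exact expected frequencies E_ij = R_i C_j / n.
E = [[Fraction(r * c, n) for c in C] for r in R]

# E_ij / R_i = C_j / n for every cell: each row has the overall column profile.
rows_match = all(E[i][j] / R[i] == Fraction(C[j], n) for i in range(len(R)) for j in range(len(C)))
# E_ij / C_j = R_i / n for every cell: each column has the overall row profile.
cols_match = all(E[i][j] / C[j] == Fraction(R[i], n) for i in range(len(R)) for j in range(len(C)))
print(rows_match, cols_match)  # True True
```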
The χ² statistic
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$
$E_{ij}$ = expected frequency in the (i, j)th cell in the case of independence.
$x_{ij}$ = observed frequency in the (i, j)th cell.
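The formula can be written as a short Python function. The sketch below assumes every row and column total is positive, so that no $E_{ij}$ is zero. It applies the function to the cholesterol × blood pressure counts, taking 119 for the (220-259, <127) cell so that the margins agree. The name `chi_square` is illustrative.

```python
def chi_square(table):
    """Pearson chi-square: sum over cells of (x_ij - E_ij)^2 / E_ij with E_ij = R_i C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected frequency under independence
            stat += (table[i][j] - e) ** 2 / e
    return stat

observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]
print(round(chi_square(observed), 2))  # 20.85
```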
Example: studying the relationship between Systolic Blood pressure and Serum Cholesterol
In this example we are interested in whether Systolic Blood pressure and Serum Cholesterol are related or whether they are independent.
Both were measured for a sample of n = 1237 cases.
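| Serum Cholesterol \ Systolic Blood Pressure | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 117 | 121 | 47 | 22 | 307 |
| 200-219 | 85 | 98 | 43 | 20 | 246 |
| 220-259 | 119 | 209 | 68 | 43 | 439 |
| 260+ | 67 | 99 | 46 | 33 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |

Observed frequencies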
| Serum Cholesterol \ Systolic Blood Pressure | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 96.29 | 130.79 | 50.63 | 29.29 | 307 |
| 200-219 | 77.16 | 104.80 | 40.57 | 23.47 | 246 |
| 220-259 | 137.70 | 187.03 | 72.40 | 41.88 | 439 |
| 260+ | 76.85 | 104.38 | 40.40 | 23.37 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |

Expected frequencies
In the case of independence, the proportions across a row are the same for every row, and the proportions down a column are the same for every column.
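Table: Expected frequencies, Observed frequencies, Standardized Residuals
Each cell shows the expected frequency $E_{ij}$, the observed frequency $x_{ij}$ in parentheses, and the standardized residual $r_{ij}$.

| Serum Cholesterol \ Systolic Blood Pressure | <127 | 127-146 | 147-166 | 167+ | Total |
|---|---|---|---|---|---|
| <200 | 96.29 (117) 2.11 | 130.79 (121) -0.86 | 50.63 (47) -0.51 | 29.29 (22) -1.35 | 307 |
| 200-219 | 77.16 (85) 0.89 | 104.80 (98) -0.66 | 40.57 (43) 0.38 | 23.47 (20) -0.72 | 246 |
| 220-259 | 137.70 (119) -1.59 | 187.03 (209) 1.61 | 72.40 (68) -0.52 | 41.88 (43) 0.17 | 439 |
| 260+ | 76.85 (67) -1.12 | 104.38 (99) -0.53 | 40.40 (46) 0.88 | 23.37 (33) 1.99 | 245 |
| Total | 388 | 527 | 204 | 118 | 1237 |

χ² = 20.85
$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$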
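Standardized residuals
$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$
The χ² statistic
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i=1}^{r} \sum_{j=1}^{c} r_{ij}^2 = 20.85$$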
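Each standardized residual is a cell's signed contribution to χ², and squaring and summing the residuals gives χ² back. The Python sketch below shows this on the cholesterol × blood pressure counts, again taking 119 for the (220-259, <127) cell. The function name `standardized_residuals` is illustrative.

```python
import math

def standardized_residuals(table):
    """r_ij = (x_ij - E_ij) / sqrt(E_ij), with E_ij = R_i C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(table[i][j] - r * c / n) / math.sqrt(r * c / n) for j, c in enumerate(col_totals)]
        for i, r in enumerate(row_totals)
    ]

observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]
res = standardized_residuals(observed)
chi2 = sum(r ** 2 for row in res for r in row)  # chi-square = sum of squared residuals
for row in res:
    print(["%+.2f" % r for r in row])
print(round(chi2, 2))
```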
Properties of the χ² statistic
1. The χ² statistic is never negative: χ² ≥ 0, with χ² = 0 only when every observed frequency equals its expected frequency.
2. Small values of χ² indicate that rows and columns are independent. In this case χ² will typically be around (r – 1)(c – 1).
3. Large values of χ² indicate that rows and columns are not independent.
4. Later on we will discuss this in more detail (when we study hypothesis testing).
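Property 2 can be illustrated by simulation. The Python sketch below generates tables in which the row and column categories really are chosen independently, then averages the resulting χ² values. The category weights are taken, as an assumption, from the cholesterol × blood pressure margins, with n = 1237, 500 replications and a fixed seed. These settings are hypothetical choices for illustration. The average should land near (r – 1)(c – 1) = 9, for comparison with the observed 20.85.

```python
import random

def chi_square(table):
    """Pearson chi-square statistic with E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e > 0:  # an empty row or column contributes nothing
                stat += (table[i][j] - e) ** 2 / e
    return stat

def simulate_independent_table(row_weights, col_weights, n, rng):
    """Draw n subjects whose row and column categories are chosen independently."""
    rows = rng.choices(range(len(row_weights)), weights=row_weights, k=n)
    cols = rng.choices(range(len(col_weights)), weights=col_weights, k=n)
    table = [[0] * len(col_weights) for _ in row_weights]
    for i, j in zip(rows, cols):
        table[i][j] += 1
    return table

# Hypothetical category weights, borrowed from the cholesterol x blood pressure margins.
row_weights = [307, 246, 439, 245]
col_weights = [388, 527, 204, 118]
rng = random.Random(12345)  # fixed seed so the run is reproducible
reps = 500
values = [chi_square(simulate_independent_table(row_weights, col_weights, 1237, rng))
          for _ in range(reps)]

df = (len(row_weights) - 1) * (len(col_weights) - 1)
mean_chi2 = sum(values) / reps
print(f"(r - 1)(c - 1) = {df}, average simulated chi-square = {mean_chi2:.2f}")
```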
Example
This comes from the drug use data.
The two variables are:
1. Role (C) and
2. Antidepressant Use (R)
measured for a sample of n = 26,968 subjects.
Two-way Frequency Table
Took anti-depressants - 12 mo * role Crosstabulation (counts)

| Took anti-depressants - 12 mo | parent, partner, worker | parent, partner | parent, worker | partner, worker | worker only | parent only | partner only | no roles | Total |
|---|---|---|---|---|---|---|---|---|---|
| YES | 344 | 101 | 201 | 275 | 455 | 63 | 224 | 414 | 2077 |
| NO | 6268 | 967 | 1150 | 5150 | 5249 | 392 | 3036 | 2679 | 24891 |
| Total | 6612 | 1068 | 1351 | 5425 | 5704 | 455 | 3260 | 3093 | 26968 |

Percentage antidepressant use vs Role

| Role | parent, partner, worker | parent, partner | parent, worker | partner, worker | worker only | parent only | partner only | no roles |
|---|---|---|---|---|---|---|---|---|
| % YES | 5.20% | 9.46% | 14.88% | 5.07% | 7.98% | 13.85% | 6.87% | 13.39% |
Figure: Antidepressant Use vs Role (bar chart of % YES for each role category)
χ² = 381.961
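Calculation of χ²
The raw data (columns 1-8 are the role categories, in the order of the crosstabulation above):

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total |
|---|---|---|---|---|---|---|---|---|---|
| YES | 344 | 101 | 201 | 275 | 455 | 63 | 224 | 414 | 2077 |
| NO | 6268 | 967 | 1150 | 5150 | 5249 | 392 | 3036 | 2679 | 24891 |
| Total | 6612 | 1068 | 1351 | 5425 | 5704 | 455 | 3260 | 3093 | 26968 |

Expected frequencies
$$E_{ij} = \frac{R_i C_j}{n}$$

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total ($R_i$) |
|---|---|---|---|---|---|---|---|---|---|
| YES | 509.24 | 82.25 | 104.05 | 417.82 | 439.31 | 35.04 | 251.08 | 238.21 | 2077 |
| NO | 6102.76 | 985.75 | 1246.95 | 5007.18 | 5264.69 | 419.96 | 3008.92 | 2854.79 | 24891 |
| Total ($C_j$) | 6612 | 1068 | 1351 | 5425 | 5704 | 455 | 3260 | 3093 | 26968 |

$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$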
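The Residuals
$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| YES | -7.32 | 2.07 | 9.50 | -6.99 | 0.75 | 4.72 | -1.71 | 11.39 |
| NO | 2.12 | -0.60 | -2.75 | 2.02 | -0.22 | -1.36 | 0.49 | -3.29 |

The calculation of χ²
$$\chi^2 = \sum_{i}\sum_{j} \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i}\sum_{j} r_{ij}^2 = 381.961$$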
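The same three steps (expected frequencies, residuals, sum of squared residuals) can be run on the role data. The Python sketch below takes columns 1-8 in the order of the crosstabulation. The function name `chi_square_with_residuals` is illustrative.

```python
def chi_square_with_residuals(table):
    """Return (E, r, chi2): expected counts, standardized residuals and the chi-square statistic."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]  # E_ij = R_i C_j / n
    residuals = [
        [(x - e) / e ** 0.5 for x, e in zip(obs_row, exp_row)]        # r_ij = (x_ij - E_ij) / sqrt(E_ij)
        for obs_row, exp_row in zip(table, expected)
    ]
    chi2 = sum(r * r for row in residuals for r in row)
    return expected, residuals, chi2

# Antidepressant use (YES, NO) by role, columns 1-8.
role_table = [
    [344, 101, 201, 275, 455, 63, 224, 414],
    [6268, 967, 1150, 5150, 5249, 392, 3036, 2679],
]
expected, residuals, chi2 = chi_square_with_residuals(role_table)
print(round(expected[0][0], 2), round(residuals[0][7], 2), round(chi2, 3))  # about 509.24, 11.39, 381.96
```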
Probability Theory
Modelling random phenomena
Some counting formulae
Permutations
the number of ways that you can order n objects is:
n! = n(n-1)(n-2)(n-3)…(3)(2)(1)
Example:
the number of ways you can order the three letters A, B, and C is 3! = 3(2)(1) = 6
ABC ACB BAC BCA CAB CBA
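The six orderings can be generated mechanically. Python's `itertools.permutations` lists them in the same order as above, and `math.factorial` gives the count.

```python
import itertools
import math

# Every ordering of the three letters A, B, C.
orderings = ["".join(p) for p in itertools.permutations("ABC")]
print(orderings)          # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(math.factorial(3))  # 6
```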
Definition
0! = 1
Reason
mathematical consistency.
In many of the formulae given later, this definition leads to consistency.
Permutations
the number of ways that you can choose k objects from n objects in a specific order is:
$${}_nP_k = \frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$$
Example:
the number of ways you can choose two letters from the four letters A, B, C, D in a specific order is
$${}_4P_2 = \frac{4!}{(4-2)!} = \frac{4!}{2!} = (4)(3) = 12$$
AB BA AC CA AD DA
BC CB BD DB CD DC
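The 12 ordered pairs can also be enumerated with Python's `itertools.permutations`, passing the selection size as the second argument.

```python
import itertools

# Ordered selections of 2 letters out of A, B, C, D.
ordered_pairs = ["".join(p) for p in itertools.permutations("ABCD", 2)]
print(len(ordered_pairs))  # 12
print(ordered_pairs)
```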
Example:
Suppose that we have a committee of 10 people. We want to choose a chairman, a vice-chairman, and a treasurer for the committee. The chairman is chosen first, the vice-chairman second and the treasurer third. How many ways can this be done?
$${}_nP_k = \frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$$
$${}_{10}P_3 = \frac{10!}{(10-3)!} = \frac{10!}{7!} = (10)(9)(8) = 720$$
Example:
How many ways can we order n objects?
Answer: n!
or
Choose n objects from n objects in a specific order:
$${}_nP_n = \frac{n!}{(n-n)!} = \frac{n!}{0!} = n! \quad\text{if } 0! = 1.$$
This is what is meant by the statement that the definition 0! = 1 leads to mathematical consistency.
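The formula ${}_nP_k = n!/(n-k)!$ can be coded directly with factorials. The Python sketch below checks it against the three examples above. The helper name `n_P_k` is hypothetical. The check against the built-in `math.perm` assumes Python 3.8 or later.

```python
import math

def n_P_k(n, k):
    """Ordered selections of k objects from n: n! / (n - k)!."""
    return math.factorial(n) // math.factorial(n - k)

print(n_P_k(4, 2))   # 12
print(n_P_k(10, 3))  # 720  (chairman, vice-chairman, treasurer)
print(n_P_k(5, 5))   # 120 = 5!, which works because math.factorial(0) == 1
```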
Combinations
the number of ways that you can choose k objects from n objects (order irrelevant) is:
$${}_nC_k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots(1)}$$
Example:
the number of ways you can choose two letters from the four letters A, B, C, D:
{A,B} {A,C} {A,D} {B,C} {B,D} {C,D}
$${}_4C_2 = \binom{4}{2} = \frac{4!}{2!\,(4-2)!} = \frac{4!}{2!\,2!} = \frac{(4)(3)}{(2)(1)} = \frac{12}{2} = 6$$
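Python's `itertools.combinations` enumerates unordered selections, so it lists the six pairs directly. `math.comb` gives the count (Python 3.8+).

```python
import itertools
import math

# Unordered selections of 2 letters from A, B, C, D.
pairs = ["".join(c) for c in itertools.combinations("ABCD", 2)]
print(pairs)                       # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
print(len(pairs), math.comb(4, 2)) # 6 6
```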
Example:
Suppose we have a committee of 10 people and we want to choose a sub-committee of 3 people. How many ways can this be done?
$${}_{10}C_3 = \binom{10}{3} = \frac{10!}{3!\,7!} = \frac{(10)(9)(8)}{(3)(2)(1)} = \frac{720}{6} = 120$$
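The right-hand form of the combinations formula, $n(n-1)\cdots(n-k+1)/k!$, avoids computing large factorials. The Python sketch below implements it. The helper name `n_C_k` is hypothetical, and the result is checked against `math.comb`.

```python
import math

def n_C_k(n, k):
    """n! / (k! (n - k)!), computed as n(n-1)...(n-k+1) / k!."""
    numerator = 1
    for i in range(k):
        numerator *= n - i  # builds n(n-1)...(n-k+1)
    return numerator // math.factorial(k)  # exact: k! always divides k consecutive integers' product

print(n_C_k(10, 3))  # 120 = (10)(9)(8) / ((3)(2)(1))
```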
Example: Random sampling
Suppose we have a club of N = 1000 persons and we want to choose a sample of k = 250 of these individuals to determine their opinion on a given issue. How many ways can this be performed?
The choice of the sample is called random sampling if every possible choice has the same probability of being selected.
$${}_{1000}C_{250} = \binom{1000}{250} = \frac{1000!}{250!\,750!} \approx 4.823 \times 10^{242}$$
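Python integers have arbitrary precision, so $\binom{1000}{250}$ can be computed exactly. `random.sample` draws a subset in which every 250-member subset is equally likely. The member IDs 1 to 1000 and the seed are hypothetical.

```python
import math
import random

ways = math.comb(1000, 250)    # exact integer, no overflow
print(len(str(ways)))          # 243 decimal digits
print(f"{float(ways):.3e}")    # roughly 4.823e+242

# One simple random sample of 250 member IDs (hypothetical IDs 1..1000, fixed seed).
sample = random.Random(1).sample(range(1, 1001), 250)
```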
Important Note:
0! is always defined to be 1.
Also
$${}_nC_k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
are called Binomial Coefficients.
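Reason:
The Binomial Theorem
$$(x+y)^n = {}_nC_0\, x^n y^0 + {}_nC_1\, x^{n-1} y^1 + {}_nC_2\, x^{n-2} y^2 + \cdots + {}_nC_n\, x^0 y^n$$
$$= \binom{n}{0} x^n y^0 + \binom{n}{1} x^{n-1} y^1 + \binom{n}{2} x^{n-2} y^2 + \cdots + \binom{n}{n} x^0 y^n$$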
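The theorem can be checked numerically with exact integer arithmetic. The Python sketch below evaluates the right-hand side with `math.comb` and compares it with $(x+y)^n$ for a few integer inputs. The function name `binomial_expansion` and the test inputs are illustrative.

```python
import math

def binomial_expansion(x, y, n):
    """Right-hand side of the binomial theorem: sum over k of C(n, k) x^(n-k) y^k."""
    return sum(math.comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))

# Integer inputs keep the comparison exact.
for x, y, n in [(2, 3, 5), (1, 1, 10), (-4, 7, 6)]:
    assert binomial_expansion(x, y, n) == (x + y) ** n

print(binomial_expansion(1, 1, 10))  # 1024 = 2**10: row n of the coefficients sums to 2^n
```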
Binomial Coefficients can also be calculated using Pascal’s triangle
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
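Each inner entry of Pascal's triangle is the sum of the two entries above it. The Python sketch below builds the triangle row by row using that rule. The function name `pascal_rows` is illustrative.

```python
def pascal_rows(n_rows):
    """Build the first n_rows rows of Pascal's triangle."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Inner entries: sums of adjacent pairs in the previous row; edges are 1.
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows

for row in pascal_rows(7):
    print(" ".join(map(str, row)).center(25))
```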
Random Variables
Probability distributions
Definition:
A random variable X is a number whose value is determined by the outcome of a random experiment (a random phenomenon).
Examples
1. A die is rolled and X = number of spots showing on the upper face.
2. Two dice are rolled and X = total number of spots showing on the two upper faces.
3. A coin is tossed n = 100 times and X = number of times the coin toss resulted in a head.
4. A person is selected at random from a population and X = weight of that individual.
5. A sample of n = 100 individuals is selected at random from a population (i.e. all samples of n = 100 have the same probability of being selected).
X = the average weight of the 100 individuals.
In all of these examples X fits the definition of a random variable, namely:
– a number whose value is determined by the outcome of a random experiment (a random phenomenon).
Random variables are either
• Discrete: integer valued; the set of possible values for X is a set of integers.
• Continuous: the possible values for X are real numbers that range over a continuum.
Examples
• Discrete
– A die is rolled and X = number of spots showing on the upper face.
– Two dice are rolled and X = Total number of spots showing on the two upper faces.
– A coin is tossed n = 100 times and X = number of times the coin toss resulted in a head.
Examples
• Continuous
– A person is selected at random from a population and X = weight of that individual.
– A sample of n = 100 individuals is selected at random from a population (i.e. all samples of n = 100 have the same probability of being selected). X = the average weight of the 100 individuals.
Probability distribution of a Random Variable
The probability distribution of a discrete random variable is described by its probability function p(x):
p(x) = the probability that X takes on the value x.
Examples
• Discrete
– A die is rolled and X = number of spots showing on the upper face.
– Two dice are rolled and X = Total number of spots showing on the two upper faces.
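Rolling one die:

| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| p(x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |

Rolling two dice:

| x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| p(x) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |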
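The two-dice table comes from listing the 36 equally likely outcomes and counting how often each total occurs. The Python sketch below does this with exact fractions. Note that `Fraction` reduces automatically, so 6/36 prints as 1/6.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Tally the totals over the 36 equally likely (die 1, die 2) outcomes.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
p = {x: Fraction(count, 36) for x, count in sorted(totals.items())}

for x, prob in p.items():
    print(x, prob)
```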
Graphs
To plot a graph of p(x), draw bars of height p(x) above each value of x.
Figure: Rolling a die (bar graph of p(x) for x = 1, 2, …, 6)
Figure: Rolling two dice (bar graph of p(x) for x = 2, 3, …, 12)
Note:
1. $0 \le p(x) \le 1$
2. $\sum_{x} p(x) = 1$
3. $P(a \le X \le b) = \sum_{x=a}^{b} p(x)$
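The three properties can be checked mechanically for the two-dice distribution. The Python sketch below writes that distribution in the compact form $p(x) = (6 - |x - 7|)/36$, which reproduces the table for x = 2, …, 12. It then computes $P(4 \le X \le 6)$ as a sum. The helper name `prob_between` is illustrative.

```python
from fractions import Fraction

# p(x) for the total of two dice: (6 - |x - 7|) / 36 for x = 2, ..., 12.
p = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

def prob_between(pmf, a, b):
    """P(a <= X <= b) = sum of p(x) over a <= x <= b."""
    return sum((px for x, px in pmf.items() if a <= x <= b), Fraction(0))

assert all(0 <= px <= 1 for px in p.values())  # property 1
assert sum(p.values()) == 1                     # property 2
print(prob_between(p, 4, 6))                    # 1/3 = (3 + 4 + 5) / 36
```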
The probability distribution of a continuous random variable is described by its probability density curve f(x).
i.e. a curve which has the following properties:
• 1. f(x) is never negative (f(x) ≥ 0).
• 2. The total area under the curve f(x) is one.
• 3. The area under the curve f(x) between a and b is the probability that X lies between the two values a and b.
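Properties 2 and 3 can be illustrated numerically by approximating areas with the trapezoid rule. The Python sketch below uses an exponential density with mean 40, a hypothetical choice that is not the curve plotted in the source. The integration range 0 to 2000 and the step count are also assumptions. For this density the exact area between a and b is $e^{-a/40} - e^{-b/40}$, which allows a direct comparison.

```python
import math

THETA = 40.0  # hypothetical mean of an exponential density (an assumption, not from the source)

def f(x):
    """Exponential density: exp(-x / THETA) / THETA for x >= 0, and 0 otherwise."""
    return math.exp(-x / THETA) / THETA if x >= 0 else 0.0

def area(func, a, b, steps=100_000):
    """Trapezoid-rule approximation of the area under func between a and b."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b)) + sum(func(a + k * h) for k in range(1, steps))
    return total * h

print(round(area(f, 0, 2000), 4))  # total area: close to 1
print(round(area(f, 20, 60), 4))   # P(20 <= X <= 60)
print(round(math.exp(-20 / THETA) - math.exp(-60 / THETA), 4))  # exact value for comparison
```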
Figure: an example probability density curve f(x), plotted for x from 0 to 120
An Important discrete distribution
The Binomial distribution
Suppose we have an experiment with two outcomes – Success (S) and Failure (F).
Let p denote the probability of S (Success).
In this case q = 1 - p denotes the probability of Failure (F).
Now suppose this experiment is repeated n times independently.
Let X denote the number of successes occurring in the n repetitions.
Then X is a random variable.
Its possible values are
0, 1, 2, 3, 4, … , (n – 2), (n – 1), n
and p(x) for any of the above values of x is given by:
$$p(x) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} p^x q^{n-x}$$
X is said to have the Binomial distribution with parameters n and p.
Summary:
X is said to have the Binomial distribution with parameters n and p.
1. X is the number of successes occurring in the n repetitions of a Success-Failure Experiment.
2. The probability of success is p.
3. $$p(x) = \binom{n}{x} p^x (1-p)^{n-x}$$
Examples:
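1. A coin is tossed n = 5 times. X is the number of heads occurring in the 5 tosses of the coin. In this case p = ½ and
$$p(x) = \binom{5}{x}\left(\tfrac{1}{2}\right)^x \left(1-\tfrac{1}{2}\right)^{5-x} = \binom{5}{x}\left(\tfrac{1}{2}\right)^5 = \frac{1}{32}\binom{5}{x}$$

| x | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| p(x) | 1/32 | 5/32 | 10/32 | 10/32 | 5/32 | 1/32 |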
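The binomial probability function fits in one line of Python. The sketch below passes p as a `Fraction` so that the coin example comes out exactly. `Fraction` reduces 10/32 to 5/16. The function name `binomial_pmf` is illustrative.

```python
import math
from fractions import Fraction

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Coin tossed n = 5 times with p = 1/2.
probs = [binomial_pmf(x, 5, Fraction(1, 2)) for x in range(6)]
print([str(pr) for pr in probs])  # ['1/32', '5/32', '5/16', '5/16', '5/32', '1/32']
```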
Random Variables
Numerical quantities whose values are determined by the outcome of a random experiment
Discrete Random Variables
Discrete Random Variable: A random variable usually assuming an integer value.
• A discrete random variable assumes values that are isolated points along the real line. That is, neighbouring values are not "possible values" for a discrete random variable.
Note: Usually associated with counting
• The number of times a head occurs in 10 tosses of a coin
• The number of auto accidents occurring on a weekend
• The size of a family
Continuous Random Variables
Continuous Random Variable: A quantitative random variable that can vary over a continuum
• A continuous random variable can assume any value along a line interval, including every possible value between any two points on the line
Note: Usually associated with a measurement
• Blood Pressure
• Weight gain
• Height
Probability Distributions of a Discrete Random Variable
Probability Distribution & Function
Probability Distribution: A mathematical description of how probabilities are distributed over each of the possible values of a random variable.
Notes: The probability distribution allows one to determine probabilities of events related to the values of a random variable. The probability distribution may be presented in the form of a table, chart, or formula.
Probability Function: A rule that assigns probabilities to the values of the random variable
Example: In baseball the number of individuals, X, on base when a home run is hit ranges in value from 0 to 3. The probability distribution is known and is given below:

| x | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| p(x) | 6/14 | 4/14 | 3/14 | 1/14 |

P(the random variable equals 2) = P(X = 2) = p(2) = 3/14

Note: This chart implies the only values X takes on are 0, 1, 2, and 3. If the random variable X is observed repeatedly, the probability p(x) represents the proportion of times the value x appears in that sequence.

P(the random variable is at least 2) = P(X ≥ 2) = p(2) + p(3) = 3/14 + 1/14 = 4/14
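The two probabilities, and the long-run interpretation of p(x), can be reproduced in Python. The sketch below computes P(X = 2) and P(X ≥ 2) exactly with `Fraction`. It then simulates 100,000 hypothetical observations of X with a fixed seed and reports the share equal to 2, which should come out close to 3/14 ≈ 0.214.

```python
import random
from fractions import Fraction

# p(x) for X = number of persons on base when a home run is hit.
p = {0: Fraction(6, 14), 1: Fraction(4, 14), 2: Fraction(3, 14), 3: Fraction(1, 14)}

p_equals_2 = p[2]                                        # P(X = 2)
p_at_least_2 = sum(px for x, px in p.items() if x >= 2)  # P(X >= 2) = p(2) + p(3)
print(p_equals_2, p_at_least_2)                          # 3/14 2/7  (4/14 reduced)

# Long-run proportion: simulate repeated observations of X (seed fixed for reproducibility).
rng = random.Random(0)
draws = rng.choices(list(p), weights=[float(v) for v in p.values()], k=100_000)
print(round(draws.count(2) / len(draws), 3))             # close to 0.214
```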
A Bar Graph
Figure: No. of persons on base when a home run is hit (# on base); bars of height p(x) = 0.429, 0.286, 0.214, 0.071 for x = 0, 1, 2, 3
Comments: Every probability function must satisfy:
1. The probability assigned to each value of the random variable must be between 0 and 1, inclusive:
$$0 \le p(x) \le 1$$
2. The sum of the probabilities assigned to all the values of the random variable must equal 1:
$$\sum_{x} p(x) = 1$$
3. $$P(a \le X \le b) = \sum_{x=a}^{b} p(x) = p(a) + p(a+1) + \cdots + p(b)$$
Mean and Variance of a Discrete Probability Distribution
• Describe the centre and spread of a probability distribution
• The mean (denoted by the Greek letter μ, mu) measures the centre of the distribution.
• The variance (σ²) and the standard deviation (σ) measure the spread of the distribution.
σ (sigma) is the Greek letter for s.
Mean of a Discrete Random Variable
• The mean, μ, of a discrete random variable X is found by multiplying each possible value of X by its own probability and then adding all the products together:
$\mu = \sum_{x} x\,p(x) = x_1 p(x_1) + x_2 p(x_2) + \cdots + x_k p(x_k)$
Notes: The mean is a weighted average of the values of X.
The mean is the long-run average value of the random variable.
The mean is the centre of gravity of the probability distribution of the random variable.
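A short sketch of the weighted-average formula, followed by a simulation that illustrates the "long-run average" interpretation. It reuses the home-run distribution (with p(0) and p(1) inferred from the bar graph); the sample size and seed are arbitrary choices for illustration.

```python
import random
from fractions import Fraction

def mean(p):
    """mu = sum over x of x * p(x): the probability-weighted average of the values."""
    return sum(x * px for x, px in p.items())

# Home-run distribution (assumption: p(0), p(1) inferred from the bar graph).
p = {0: Fraction(6, 14), 1: Fraction(4, 14), 2: Fraction(3, 14), 3: Fraction(1, 14)}
mu = mean(p)
print(mu)                                    # 13/14

# Long-run average: observe X many times independently and average the results.
rng = random.Random(0)                       # fixed seed so the run is reproducible
values = list(p)
weights = [float(px) for px in p.values()]
draws = rng.choices(values, weights=weights, k=100_000)
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 3))                 # should be close to 13/14, about 0.929
```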
![Page 72: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/72.jpg)
![Page 73: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/73.jpg)
Variance and Standard Deviation
Variance of a Discrete Random Variable: The variance, σ², of a discrete random variable X is found by multiplying each possible value of the squared deviation from the mean, (x − μ)², by its own probability and then adding all the products together:
$\sigma^2 = \sum_{x} (x - \mu)^2 p(x) = \sum_{x} x^2 p(x) - \left[\sum_{x} x\,p(x)\right]^2 = \sum_{x} x^2 p(x) - \mu^2$
Standard Deviation of a Discrete Random Variable: The positive square root of the variance:
$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{x} (x - \mu)^2 p(x)}$
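Both forms of the variance formula can be coded directly, which makes it easy to confirm they agree. This is a sketch with hypothetical function names; the toy distribution (X = 1 or 3, each with probability 1/2) is an invented illustration, chosen because μ = 2 and σ² = 1 are easy to verify by hand.

```python
import math
from fractions import Fraction

def _mean(p):
    return sum(x * px for x, px in p.items())

def variance_definition(p):
    """sigma^2 = sum over x of (x - mu)^2 p(x)."""
    mu = _mean(p)
    return sum((x - mu) ** 2 * px for x, px in p.items())

def variance_shortcut(p):
    """sigma^2 = sum over x of x^2 p(x), minus mu^2 (algebraically the same)."""
    mu = _mean(p)
    return sum(x * x * px for x, px in p.items()) - mu ** 2

def std_dev(p):
    """sigma = positive square root of the variance."""
    return math.sqrt(variance_definition(p))

# Hypothetical toy distribution: X = 1 or X = 3, each with probability 1/2.
p_toy = {1: Fraction(1, 2), 3: Fraction(1, 2)}
print(variance_definition(p_toy), variance_shortcut(p_toy), std_dev(p_toy))   # 1 1 1.0
```

The shortcut form needs only the column of x²p(x) values and the mean, which is why the worked tables in these slides carry an x²p(x) column.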
![Page 74: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/74.jpg)
Example
The number of individuals, X, on base when a home run is hit ranges in value from 0 to 3.
x      p(x)    x p(x)   x²   x² p(x)
0      0.429   0.000    0    0.000
1      0.286   0.286    1    0.286
2      0.214   0.429    4    0.857
3      0.071   0.214    9    0.643
Total  1.000   0.929         1.786
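The table above can be generated from the probability function. This sketch assumes p(0) = 6/14 and p(1) = 4/14 (inferred from the bar graph) alongside the slide values p(2) = 3/14 and p(3) = 1/14; exact fractions are used internally and only the printout is rounded to three decimals.

```python
from fractions import Fraction

# Home-run example (assumption: p(0), p(1) inferred from the bar graph).
p = {0: Fraction(6, 14), 1: Fraction(4, 14), 2: Fraction(3, 14), 3: Fraction(1, 14)}

print(f"{'x':>5} {'p(x)':>6} {'xp(x)':>6} {'x^2':>4} {'x^2p(x)':>8}")
totals = [0, 0, 0]   # running sums of p(x), x p(x) and x^2 p(x)
for x, px in p.items():
    row = (px, x * px, x * x * px)
    totals = [t + r for t, r in zip(totals, row)]
    print(f"{x:>5} {float(px):6.3f} {float(x * px):6.3f} {x * x:>4} {float(x * x * px):8.3f}")
print(f"{'Total':>5} {float(totals[0]):6.3f} {float(totals[1]):6.3f} {'':>4} {float(totals[2]):8.3f}")
```

Rounded to three decimals, the rows and totals reproduce the slide's table (totals 1.000, 0.929, 1.786).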
![Page 75: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/75.jpg)
• Computing the mean:
$\mu = \sum_{x} x\,p(x) = 0.929$
Note:
• 0.929 is the long-run average value of the random variable.
• 0.929 is the centre of gravity of the probability distribution of the random variable.
![Page 76: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/76.jpg)
• Computing the variance:
$\sigma^2 = \sum_{x} (x - \mu)^2 p(x) = \sum_{x} x^2 p(x) - \mu^2 = 1.786 - (0.929)^2 = 0.923$
• Computing the standard deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{0.923} = 0.961$
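Carrying exact fractions instead of the rounded table entries avoids any rounding drift. A sketch, again assuming p(0) = 6/14 and p(1) = 4/14 from the bar graph; it gives μ = 13/14 and σ² = 181/196, which round to the values above.

```python
import math
from fractions import Fraction

# Home-run example (assumption: p(0), p(1) inferred from the bar graph).
p = {0: Fraction(6, 14), 1: Fraction(4, 14), 2: Fraction(3, 14), 3: Fraction(1, 14)}

mu = sum(x * px for x, px in p.items())          # 13/14
ex2 = sum(x * x * px for x, px in p.items())     # sum of x^2 p(x) = 25/14
var = ex2 - mu ** 2                              # shortcut formula: 181/196
sd = math.sqrt(var)

print(f"mu = {float(mu):.3f}, sigma^2 = {float(var):.3f}, sigma = {sd:.3f}")
# mu = 0.929, sigma^2 = 0.923, sigma = 0.961
```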
![Page 77: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/77.jpg)
The Binomial distribution
1. We have an experiment with two outcomes: Success (S) and Failure (F).
2. Let p denote the probability of S (Success).
3. In this case q = 1 − p denotes the probability of Failure (F).
4. This experiment is repeated n times independently.
5. Let X denote the number of successes occurring in the n repetitions.
![Page 78: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/78.jpg)
The possible values of X are
0, 1, 2, 3, 4, … , (n – 2), (n – 1), n
and p(x) for any of the above values of x is given by:
$p(x) = \binom{n}{x} p^x (1 - p)^{n - x} = \binom{n}{x} p^x q^{n - x}$, where $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$
X is said to have the Binomial distribution with parameters n and p.
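The binomial probability function translates directly into code. A minimal sketch using `math.comb` (Python 3.8+) for the binomial coefficient; the name `binomial_pmf` is hypothetical, and returning 0 outside x = 0, …, n is a convenience choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """p(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n; 0 otherwise."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binomial_pmf(2, 4, 0.5))                           # 0.375  (= 6 * 0.5^2 * 0.5^2)
print(sum(binomial_pmf(x, 10, 0.8) for x in range(11)))  # sums to 1 up to rounding
```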
![Page 79: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/79.jpg)
Summary:
X is said to have the Binomial distribution with parameters n and p.
1. X is the number of successes occurring in the n repetitions of a Success-Failure Experiment.
2. The probability of success is p.
3. The probability function:
$p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$
![Page 80: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/80.jpg)
Example:
1. A coin is tossed n = 5 times. X is the number of heads occurring in the 5 tosses of the coin. In this case p = ½ and
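$p(x) = \binom{5}{x} \left(\tfrac{1}{2}\right)^x \left(\tfrac{1}{2}\right)^{5 - x} = \binom{5}{x} \left(\tfrac{1}{2}\right)^5 = \binom{5}{x} \frac{1}{32}$
x      0      1      2       3       4      5
p(x)   1/32   5/32   10/32   10/32   5/32   1/32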
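The coin-tossing probabilities can be reproduced exactly. A short sketch using `fractions.Fraction` and `math.comb`; each probability is printed over the common denominator 2⁵ = 32 to match the table.

```python
from fractions import Fraction
from math import comb

n, p = 5, Fraction(1, 2)   # 5 tosses of a fair coin
pmf = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

for x, px in pmf.items():
    print(f"p({x}) = {px * 2 ** n}/{2 ** n}")   # p(0) = 1/32, p(1) = 5/32, ..., p(5) = 1/32
```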
![Page 81: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/81.jpg)
[Bar graph: p(x) versus number of heads]
![Page 82: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/82.jpg)
Computing the summary parameters for the distribution: μ, σ², σ
x      p(x)      x p(x)   x²   x² p(x)
0      0.03125   0.000    0    0.000
1      0.15625   0.156    1    0.156
2      0.31250   0.625    4    1.250
3      0.31250   0.938    9    2.813
4      0.15625   0.625    16   2.500
5      0.03125   0.156    25   0.781
Total  1.000     2.500         7.500
![Page 83: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/83.jpg)
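• Computing the mean:
$\mu = \sum_{x} x\,p(x) = 2.5$
• Computing the variance:
$\sigma^2 = \sum_{x} (x - \mu)^2 p(x) = \sum_{x} x^2 p(x) - \mu^2 = 7.5 - (2.5)^2 = 1.25$
• Computing the standard deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{1.25} = 1.118$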
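The same summary parameters can be computed exactly from the probability function. As a cross-check, the sketch also compares them with the standard binomial identities μ = np and σ² = npq, a known result that these slides do not derive.

```python
from fractions import Fraction
from math import comb, sqrt

n, p = 5, Fraction(1, 2)
q = 1 - p
pmf = {x: comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)}

mu = sum(x * px for x, px in pmf.items())                 # 5/2
var = sum(x * x * px for x, px in pmf.items()) - mu ** 2  # 15/2 - 25/4 = 5/4
sigma = sqrt(var)

print(float(mu), float(var), round(sigma, 3))   # 2.5 1.25 1.118
# Standard binomial identities (not derived on these slides): mu = np, sigma^2 = npq
print(mu == n * p, var == n * p * q)            # True True
```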
![Page 84: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/84.jpg)
Example:
• A surgeon performs a difficult operation n = 10 times.
• X is the number of times that the operation is a success.
• The success rate for the operation is 80%. In this case p = 0.80 and
• X has a Binomial distribution with n = 10 and p = 0.80.
![Page 85: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/85.jpg)
Computing p(x) for x = 0, 1, 2, … , 10
$p(x) = \binom{10}{x} (0.80)^x (0.20)^{10 - x}$
x      0       1       2       3       4       5
p(x)   0.0000  0.0000  0.0001  0.0008  0.0055  0.0264
x      6       7       8       9       10
p(x)   0.0881  0.2013  0.3020  0.2684  0.1074
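The surgeon table can be regenerated with a few lines of code. A sketch with n = 10 and p = 0.80 as in the example, printing each probability to four decimals as the slide does.

```python
from math import comb

n, p = 10, 0.80   # 10 operations, 80% success rate
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

for x, px in enumerate(pmf):
    print(f"p({x}) = {px:.4f}")
```

Rounded to four decimals, the output matches the table; for example p(8) = 0.3020 and p(10) = 0.1074.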
![Page 86: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/86.jpg)
The Graph
[Bar graph: p(x) versus number of successes, x = 0, 1, …, 10]
![Page 87: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/87.jpg)
Computing the summary parameters for the distribution: μ, σ², σ
x      p(x)     x p(x)   x²    x² p(x)
0      0.0000   0.000    0     0.000
1      0.0000   0.000    1     0.000
2      0.0001   0.000    4     0.000
3      0.0008   0.002    9     0.007
4      0.0055   0.022    16    0.088
5      0.0264   0.132    25    0.661
6      0.0881   0.528    36    3.171
7      0.2013   1.409    49    9.865
8      0.3020   2.416    64    19.327
9      0.2684   2.416    81    21.743
10     0.1074   1.074    100   10.737
Total  1.000    8.000          65.600
![Page 88: The two way frequency table The 2 statistic Techniques for examining dependence amongst two categorical variables.](https://reader036.fdocuments.us/reader036/viewer/2022081603/56649f325503460f94c4e40b/html5/thumbnails/88.jpg)
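• Computing the mean:
$\mu = \sum_{x} x\,p(x) = 8.0$
• Computing the variance:
$\sigma^2 = \sum_{x} (x - \mu)^2 p(x) = \sum_{x} x^2 p(x) - \mu^2 = 65.6 - (8.0)^2 = 1.60$
• Computing the standard deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{1.60} = 1.265$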
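A sketch that recomputes μ, σ² and σ for the surgeon example from the unrounded probabilities. Floating-point arithmetic is used here, so results agree with 8.0, 1.60 and 1.265 up to tiny rounding error; these also match the standard binomial identities np = 8 and npq = 1.6.

```python
from math import comb, sqrt

n, p = 10, 0.80
pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

mu = sum(x * px for x, px in enumerate(pmf))                  # sum of x p(x)
var = sum(x * x * px for x, px in enumerate(pmf)) - mu ** 2   # shortcut formula
sigma = sqrt(var)

print(f"mu = {mu:.3f}, sigma^2 = {var:.3f}, sigma = {sigma:.3f}")
# mu = 8.000, sigma^2 = 1.600, sigma = 1.265
```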