Post on 21-Dec-2015
Chi Square Test
Dealing with a categorical dependent variable
So far:
•Categorical IV, continuous DV: t-test, ANOVA
•Continuous IV, continuous DV: correlation, regression
•Categorical IV, categorical DV: chi-square
Pearson Chi-Square:
•Works on frequencies: no mean and SD
•$\chi^2$ statistic
•No assumption of normality: non-parametric test
Chi-Square test for goodness of fit
•Is the frequency of balls with different colors equal in our bag?
Observed frequencies: 50 30 30 10
Expected frequencies: 25% 25% 25% 25%
Total = 120, so under H0 each expected frequency is 25% × 120 = 30:
Observed frequencies: 50 30 30 10
Expected frequencies (H0): 30 30 30 30
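The chi-square statistic sums, over all categories, the squared difference between the observed ($f_o$) and expected ($f_e$) frequencies, normalized by the expected frequency:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(50-30)^2}{30} + \frac{(30-30)^2}{30} + \frac{(30-30)^2}{30} + \frac{(10-30)^2}{30} = 26.67$$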
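As a quick check of the arithmetic, here is a minimal Python sketch of the goodness-of-fit statistic for the bag example. The variable names are illustrative and not taken from the slides.

```python
# Observed counts of the four ball colors, as given on the slide.
observed = [50, 30, 30, 10]

# H0: all colors are equally likely, so each expected count is total / categories.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Pearson chi-square: squared difference, normalized by the expected count, summed.
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

print(f"expected = {expected}")
print(f"chi2 = {chi2:.2f}")
```

Only the first and last colors contribute here, because the two middle colors match their expected count exactly.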
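Degrees of freedom:
$$df = C - 1 = 4 - 1 = 3$$
With the total fixed at 100%, once three of the proportions are known (25%, 25%, 25%) the fourth is fixed at 25%, so only three categories are free to vary.
Chi-square distribution with $k$ degrees of freedom:
$$f(x; k) = \frac{(1/2)^{k/2}}{\Gamma(k/2)} \, x^{k/2-1} e^{-x/2}$$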
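The chi-square density can be evaluated directly with Python's standard library. The sketch below is only an illustration; the function name `chi2_pdf` is made up here and is not part of the slides.

```python
import math

def chi2_pdf(x, k):
    """Density of the chi-square distribution with k degrees of freedom, for x > 0."""
    return (0.5 ** (k / 2) / math.gamma(k / 2)) * x ** (k / 2 - 1) * math.exp(-x / 2)

# Density for df = 3 at a few points, including the critical value 7.81.
for x in (1.0, 3.0, 7.81):
    print(f"f({x}; 3) = {chi2_pdf(x, 3):.4f}")
```

For k = 2 the formula reduces to 0.5·exp(-x/2), which is a convenient sanity check.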
$\chi^2 = 26.67$, $df = 3$
Critical value (α = 0.05) = 7.81
Since 26.67 > 7.81, we reject H0.
$\chi^2(3, n=120) = 26.67$, $p < 0.001$
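For df = 3 the upper-tail area has a closed form in terms of the complementary error function, which makes it possible to check the critical value and the p-value without a statistics package. This is a sketch under that assumption: the helper name `chi2_sf_df3` is hypothetical, and the formula is valid only for three degrees of freedom.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

alpha_at_critical = chi2_sf_df3(7.81)  # tail area at the slide's critical value
p_value = chi2_sf_df3(800 / 30)        # tail area at the observed statistic

print(f"P(X > 7.81)  = {alpha_at_critical:.3f}")
print(f"P(X > 26.67) = {p_value:.2e}")
```

The tail area at 7.81 comes out very close to 0.05, which is the sense in which 7.81 is the critical value.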
•The chi-square test for goodness of fit is like a one-sample t-test
•You can test your sample against any possible expected values, for example:
H0: 25% 25% 25% 25%
H0: 10% 10% 10% 70%
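To illustrate testing against arbitrary expected values, the sketch below wraps the statistic in a small function and applies both null hypotheses to the same counts. Reusing the bag counts (50, 30, 30, 10) for the second hypothesis is an assumption made for illustration, and `goodness_of_fit` is a hypothetical name.

```python
def goodness_of_fit(observed, proportions):
    """Pearson chi-square of observed counts against hypothesized proportions."""
    total = sum(observed)
    expected = [p * total for p in proportions]
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

bag = [50, 30, 30, 10]  # counts from the bag example

chi2_equal = goodness_of_fit(bag, [0.25, 0.25, 0.25, 0.25])
chi2_skewed = goodness_of_fit(bag, [0.10, 0.10, 0.10, 0.70])

print(f"H0 25/25/25/25: chi2 = {chi2_equal:.2f}")
print(f"H0 10/10/10/70: chi2 = {chi2_skewed:.2f}")
```

Both statistics are compared against the same df = 3 distribution, since the number of categories does not change.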
Chi-Square test for independence
•When we have two or more sets of categorical data (IV and DV both categorical)
Observed frequencies (FO):
          None   Obama   McCain   Total
Male        10      50       35      95
Female      15      60       40     115
Total       25     110       75     210
•Also called contingency table analysis
•H0: There is no relation between gender and voting preference (like correlation)
OR
•H0: There is no difference between the voting preferences of males and females (like a t-test)
•The logic is the same as in the goodness-of-fit test: compare the observed frequencies with the frequencies expected if the two variables were independent
Expected frequencies (FE) start from the column totals expressed as proportions of n = 210:
          None   Obama   McCain   Total
Total      12%     52%      36%    100%
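In case of independence, males and females share the same column proportions:
          None   Obama   McCain   Total
Male       12%     52%      36%      95
Female     12%     52%      36%     115
Total      12%     52%      36%    100%
Finally, multiplying each row total by the proportions gives the expected frequencies (FE):
          None   Obama   McCain
Male      11.4    49.4     34.2
Female    13.8    59.8     41.4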
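•Another way: compute each expected frequency from its row total and column total. For the Male/None cell (row total 95, column total 25, n = 210):
$$f_e = \frac{f_c \times f_r}{n} = \frac{\text{column total} \times \text{row total}}{\text{total}} = \frac{25 \times 95}{210} \approx 11.3$$
The 11.4 obtained earlier for this cell comes from rounding the column proportion to 12%.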
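The row-total times column-total rule can be sketched in a few lines of Python for the whole voting table. Variable names are illustrative; the counts are the observed frequencies from the slides.

```python
# Observed counts: rows are Male, Female; columns are None, Obama, McCain.
observed = [[10, 50, 35],
            [15, 60, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(("Male", "Female"), expected):
    print(label, [round(fe, 2) for fe in row])
```

These unrounded values differ slightly from the slide's table, which was built from percentages rounded to whole numbers.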
•Now we can calculate the chi-square value:
FO:
10 50 35
15 60 40
FE:
11.4 49.4 34.2
13.8 59.8 41.4
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
$$\chi^2 = \frac{(10-11.4)^2}{11.4} + \frac{(15-13.8)^2}{13.8} + \dots = 0.35$$
$$df = (C-1)\times(R-1) = (3-1)\times(2-1) = 2$$
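The sum over all six cells can be written out as a short Python sketch. It computes the statistic twice: once with the expected table as printed on the slide, and once with unrounded expected counts derived from the marginal totals. The function name `pearson_chi2` is hypothetical.

```python
def pearson_chi2(observed, expected):
    """Sum of (fo - fe)^2 / fe over every cell of two same-shaped tables."""
    return sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))

observed = [[10, 50, 35], [15, 60, 40]]

# Expected table as printed on the slide (built from rounded percentages).
slide_expected = [[11.4, 49.4, 34.2], [13.8, 59.8, 41.4]]

# Unrounded expected counts from the row and column totals.
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
n = sum(rows)
exact_expected = [[r * c / n for c in cols] for r in rows]

chi2_slide = pearson_chi2(observed, slide_expected)
chi2_exact = pearson_chi2(observed, exact_expected)
df = (len(cols) - 1) * (len(rows) - 1)

print(f"chi2 (slide FE) = {chi2_slide:.2f}, chi2 (exact FE) = {chi2_exact:.2f}, df = {df}")
```

The two values differ only in the second decimal place, so the conclusion is the same either way.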
With the row and column totals fixed, only two cells are free to vary and the rest are determined by the totals, which is why $df = (C-1)\times(R-1) = 2$:
          None    Obama   McCain   Total
Male      11.4     49.4    Fixed      95
Female    Fixed   Fixed    Fixed     115
Total       25      110       75     210
$\chi^2(2, n=210) = 0.35$, $p = 0.84$
There is no significant effect of gender on vote preference
OR
We cannot reject the null hypothesis that gender and vote preference are independent
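With df = 2 the chi-square tail area simplifies to exp(-x/2), so the p-value and the 0.05 critical value can be checked by hand or with the sketch below. The closed form holds only for two degrees of freedom, and the function name is made up for illustration.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 2 df."""
    return math.exp(-x / 2)

p_value = chi2_sf_df2(0.35)

# Critical value at alpha = 0.05: solve exp(-x / 2) = 0.05 for x.
critical = -2 * math.log(0.05)

print(f"p = {p_value:.2f}, critical value = {critical:.2f}")
```

Because 0.35 is far below the critical value, the null hypothesis of independence is not rejected.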
Effect size in Chi square
•For a 2 x 2 table -> Phi coefficient
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
•For larger tables -> Cramer’s V coefficient
$$V = \sqrt{\frac{\chi^2}{n \times df^*}}$$
•$df^*$ is the smaller of C-1 and R-1
•Both are read like a correlation between two categorical variables
•Phi of 0.1 is small, 0.3 medium, 0.5 large
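Both effect sizes are one-line computations once the chi-square value is known. The sketch below applies Cramer's V to the gender by vote table (chi-square 0.35, n = 210, 2 rows, 3 columns); the function names and the 2 x 2 input in the usage lines are illustrative assumptions.

```python
import math

def phi_coefficient(chi2, n):
    """Effect size for a 2 x 2 table."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, n_rows, n_cols):
    """Effect size for larger tables; df* is the smaller of R-1 and C-1."""
    df_star = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * df_star))

v = cramers_v(0.35, 210, n_rows=2, n_cols=3)
print(f"Cramer's V = {v:.3f}")

# For a 2 x 2 table df* is 1, so V and phi coincide (hypothetical chi2 and n).
print(phi_coefficient(4.0, 100) == cramers_v(4.0, 100, 2, 2))
```

A V of about 0.04 is well below the 0.1 benchmark for a small effect, consistent with the non-significant test.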
Assumptions of Chi Square
•Independence of observations: each subject appears in only one category
•Size of expected frequencies: be cautious with small cell frequencies
•No assumption of Normality: Nonparametric test
Likelihood ratio test: an alternative
•Instead of using chi-square, when dealing with categorical data we can calculate the log-likelihood ratio:
$$G = 2 \sum f_o \ln\left(\frac{f_o}{f_e}\right)$$
•It is based on the ratio of observed to expected frequencies
FO:
10 50 35
15 60 40
FE:
11.4 49.4 34.2
13.8 59.8 41.4
$$G = 2 \times \left(10 \ln\frac{10}{11.4} + 15 \ln\frac{15}{13.8} + \dots\right) = 0.355$$
•G follows a chi-square distribution with df of (R-1)(C-1)
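The G statistic for the same table can be sketched as follows, using the observed and expected frequencies exactly as printed on the slides. The function name `g_statistic` is hypothetical.

```python
import math

def g_statistic(observed, expected):
    """Log-likelihood ratio statistic: 2 * sum of fo * ln(fo / fe) over all cells."""
    return 2 * sum(fo * math.log(fo / fe)
                   for obs_row, exp_row in zip(observed, expected)
                   for fo, fe in zip(obs_row, exp_row))

observed = [[10, 50, 35], [15, 60, 40]]
expected = [[11.4, 49.4, 34.2], [13.8, 59.8, 41.4]]  # FE table from the slide

g = g_statistic(observed, expected)
print(f"G = {g:.3f}")
```

Cells with an observed count of zero would need special handling, since ln(0) is undefined; this table has none.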
Chi Square test with rank ordered data
•Rank order your data for the two variables
•Get the correlation of the two variables: Spearman r
•Calculate chi-square as follows:
$$M^2 = (N-1) r^2$$
Frequency table with rank-coded categories (columns: Anxiety Level):
       1    2    3
1     10   50   35
2     15   60   40
Raw data layout:
S  A  G
1  1  1
2  3  2
3  2  2
4  2  1
5  1  1
6  2  1
7  1  2
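The three steps (rank, correlate, multiply) can be sketched in plain Python. This assumes the seven-row table lists a subject number (S) followed by two rank-coded variables (A and G); that reading of the columns is a guess from the slide layout, and all function names are illustrative. Spearman r is computed here as the Pearson correlation of average ranks, which handles ties.

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Columns A and G of the seven-subject table (assumed interpretation).
a = [1, 3, 2, 2, 1, 2, 1]
g = [1, 2, 2, 1, 1, 1, 2]

r = pearson_r(average_ranks(a), average_ranks(g))  # Spearman r
m2 = (len(a) - 1) * r ** 2

print(f"Spearman r = {r:.3f}, M^2 = {m2:.3f}")
```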