Quantitative Methods Partly based on materials by Sherry O’Sullivan


Page 1: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Quantitative Methods

Partly based on materials by Sherry O’Sullivan

Part 3: Chi-Squared Statistic

Page 2: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Recap on T-Statistic

• It used the mean and standard error of a population sample

• The data is on an “interval” or scale

• Mean and standard error are the parameters

• This approach is known as parametric

• Another approach is non-parametric testing

Page 3: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Introduction to Chi-Squared

• It does not use the mean and standard error of a population sample

• Each respondent can only choose one category (unlike scale in t-Statistic)

• The expected frequency must be at least 5 in each category for the test to be valid.

• If any category has an expected frequency below 5, you need to increase your sample size
– Or merge categories

Page 4: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Example using Chi-Squared

• “Is there a preference amongst the UW student population for a particular web browser?” (Dr C Price’s Data)
– They could only indicate one choice
– These are the observed frequencies of responses from the sample
– This is called a ‘contingency table’

                      Firefox  IExplorer  Safari  Chrome  Opera
Observed frequencies  30       6          4       8       2

Page 5: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Was it just chance?

• How confident am I?
– Was the sample representative of all UW students?
– Was the variation in the measurements just chance?

• Chi-Squared test for significance
– Several ways to use the test
– Simplest is Null Hypothesis

• H0: The students show “no preference” for a particular browser

Page 6: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared: “Goodness of fit” (No preference)

• H0: The students show “no preference” for a particular browser

• This leads to a hypothetical or expected distribution of frequencies
– We would expect an equal number of respondents per category
– We had 50 respondents and 5 categories, so 50 / 5 = 10 per category

Expected frequency table

                      Firefox  IExplorer  Safari  Chrome  Opera
Expected frequencies  10       10         10      10      10

Page 7: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Stage 1: Formulation of Hypothesis

• H0: There is no preference in the underlying population for the factor suggested.

• H1: There is a preference in the underlying population for the factor suggested.

• The basis of the chi-squared test is to compare the observed frequencies against the expected frequencies

Page 8: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Stage 2: Expected Distribution

• As our null hypothesis is “no preference”, we need to work out the expected frequency:
– You would expect each category to have the same number of respondents
– Show this in an “Expected frequency” table
– Each expected frequency must be at least 5 for the test to be valid

                      Firefox  IExplorer  Safari  Chrome  Opera
Expected frequencies  10       10         10      10      10
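
As a minimal sketch of Stage 2, the expected frequencies under “no preference” can be computed by spreading the sample evenly over the categories. The function names and the `minimum=5` threshold (which treats an expected frequency of exactly 5 as acceptable) are assumptions made for illustration, not part of the original slides.

```python
def expected_no_preference(observed):
    """Expected counts under H0 'no preference': the total spread evenly."""
    total = sum(observed)
    categories = len(observed)
    return [total / categories] * categories


def all_expected_large_enough(expected, minimum=5):
    """Check the rule of thumb that every expected frequency reaches the minimum."""
    return all(value >= minimum for value in expected)


# Observed browser counts from the slides: Firefox, IExplorer, Safari, Chrome, Opera
observed = [30, 6, 4, 8, 2]
expected = expected_no_preference(observed)

print(expected)                             # five equal expected counts
print(all_expected_large_enough(expected))  # whether the test's precondition holds
```

If the check fails, the slides' advice applies: collect a larger sample or merge categories before running the test.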

Page 9: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Stage 3a: Level of confidence

• Choose the level of confidence, expressed as a significance level (often 0.05; sometimes 0.01)
– 0.05 means we accept a 5% chance of rejecting H0 when the variation was in fact just chance
– This corresponds to a 95% level of confidence in our conclusion

Stage 3b: Degrees of freedom

• We need to find the degrees of freedom
• This is calculated from the number of categories
◦ We had 5 categories, so df = 5 − 1 = 4

Page 10: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Stage 3c: Critical value of Chi-Squared

• In order to compare our calculated chi-square value with the “critical value” in the chi-squared table we need:

– Level of confidence (0.05)
– Degree of freedom (4)

• Our critical value from the table = 9.49
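
The table lookup can be sketched as a small dictionary keyed by degrees of freedom and significance level. The values below are standard chi-squared critical values rounded to two decimals, not a copy of the table cited in the slides, and the names `CRITICAL`, `degrees_of_freedom` and `critical_value` are illustrative assumptions.

```python
# Standard chi-squared critical values (rounded to 2 d.p.) for small df.
# Outer key: degrees of freedom. Inner key: significance level.
CRITICAL = {
    1: {0.05: 3.84, 0.01: 6.63},
    2: {0.05: 5.99, 0.01: 9.21},
    3: {0.05: 7.81, 0.01: 11.34},
    4: {0.05: 9.49, 0.01: 13.28},
    5: {0.05: 11.07, 0.01: 15.09},
}


def degrees_of_freedom(n_categories):
    """Goodness-of-fit test: df is the number of categories minus one."""
    return n_categories - 1


def critical_value(df, alpha=0.05):
    """Look up the critical value for the given df and significance level."""
    return CRITICAL[df][alpha]


df = degrees_of_freedom(5)            # five browsers
print(df, critical_value(df, 0.05))   # the row and column used in the slides
```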

Page 11: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared Table from http://ourwayit.com/CA517/LearningActivities.htm

Page 12: Quantitative Methods Partly based on materials by Sherry O’Sullivan

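Stage 4: Calculate statistics

• We find the differences between the observed and the expected values for each category
• We square each difference, and divide the answer by its expected frequency
• We add all of them up

          Firefox  IExplorer  Safari  Chrome  Opera
Observed  30       6          4       8       2
Expected  10       10         10      10      10

With F_O the observed and F_E the expected frequency:

$$\chi^2 = \sum \frac{(F_O - F_E)^2}{F_E} = \frac{(30-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(4-10)^2}{10} + \frac{(8-10)^2}{10} + \frac{(2-10)^2}{10} = 52$$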
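
The three steps of Stage 4 translate directly into a few lines of code. This is a sketch using only the observed and expected counts given in the slides; the function name `chi_squared` is an assumption for illustration.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [30, 6, 4, 8, 2]        # Firefox, IExplorer, Safari, Chrome, Opera
expected = [10, 10, 10, 10, 10]    # 50 respondents / 5 categories

statistic = chi_squared(observed, expected)
print(round(statistic, 2))
```

The per-category terms are 40, 1.6, 3.6, 0.4 and 6.4, so most of the total comes from the Firefox category.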

Page 13: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Stage 5: Decision

• Can we reject the H0 that students show no preference for a particular browser?
– Our value of 52 is way beyond 9.49. We are (at least) 95% confident the value did not occur by chance
– And probably much more confident than that

• So yes we can safely reject the null hypothesis

• Which browser do they prefer?
– Firefox as it is way above expected frequency of 10

Page 14: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Alternative Method

• Outline: Calculate chi-squared, and use the table to find the confidence

• In this case, calculated χ² = 52

• Go to the appropriate row of the table, and look across for the highest value that is LOWER than the measured value

• The top of that column gives our confidence that the effect is real

Page 15: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared Table from http://ourwayit.com/CA517/LearningActivities.htm

• The probability of this result happening by chance is less than 0.001
• We can be at least 99.9% confident of our result
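
The “look across the row” procedure can be sketched as a scan over one table row. The column headings and values below are standard chi-squared critical values rounded to two decimals; the layout of the table cited in the slides is not reproduced in this transcript, so the columns chosen here are an assumption.

```python
# Column headings: probability of exceeding the value by chance.
PROBABILITIES = [0.10, 0.05, 0.025, 0.01, 0.001]

# One row per degrees of freedom (standard values, 2 d.p.).
TABLE = {
    2: [4.61, 5.99, 7.38, 9.21, 13.82],
    4: [7.78, 9.49, 11.14, 13.28, 18.47],
}


def smallest_probability_reached(statistic, df):
    """Scan the df row for the highest value below the statistic.

    Returns the heading of that column, or None if the statistic is
    smaller than every value in the row.
    """
    reached = None
    for probability, value in zip(PROBABILITIES, TABLE[df]):
        if value < statistic:
            reached = probability
    return reached


print(smallest_probability_reached(52, 4))   # browser example, df = 4
```

A return value of `None` means the statistic does not exceed even the first column, so the null hypothesis cannot be rejected at any level in this table.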

Page 16: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared: “No Difference from a Comparison Population”.

• RQ: Are drivers of high performance cars more likely to be involved in accidents?
– Sample n = 50 and Market Research data of proportion of people driving these categories

                                  High Performance  Compact  Midsize  Full size
FO = observed accident frequency  20                14       9        9
Ownership (%)                     10%               40%      30%      20%

Page 17: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Contingency Table

– Null hypothesis H0: type of car has no effect on accident frequency
– Once the expected frequencies (under the null hypothesis) have been calculated, the analysis is the same as the ‘no preference’ calculation

                                  High Performance  Compact  Midsize  Full size
FO = observed accident frequency  20                14       9        9
Ownership (%)                     10%               40%      30%      20%
FE = expected accident frequency  5 (10% of 50)     20       15       10
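
Here the expected frequencies come from the comparison population rather than from an even split. A minimal sketch, assuming the sample size n = 50 stated on the slide and a hypothetical helper name `expected_from_percentages`; note that the observed counts as transcribed add up to 52, so only the expected row is reproduced here.

```python
def expected_from_percentages(sample_size, percentages):
    """Expected counts if accidents simply follow the ownership percentages."""
    if sum(percentages) != 100:
        raise ValueError("percentages must add up to 100")
    return [sample_size * pct / 100 for pct in percentages]


# Ownership shares from the market research data on the slide
ownership_pct = {
    "High Performance": 10,
    "Compact": 40,
    "Midsize": 30,
    "Full size": 20,
}

expected = expected_from_percentages(50, list(ownership_pct.values()))
for car_type, value in zip(ownership_pct, expected):
    print(car_type, value)
```

Every expected frequency is at least 5, so the usual precondition for the test is met, although the High Performance category sits right at the boundary.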

Page 18: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared test for “Independence”.

• What makes computer games fun?

• Review found the following
– Factors (Mastery, Challenge and Fantasy)
– Is there a different opinion depending on gender?

• Research sample of 50 males and 50 females

Observed frequency table

        Mastery  Challenge  Fantasy
Male    10       32         8
Female  24       8          18

Page 19: Quantitative Methods Partly based on materials by Sherry O’Sullivan

What is the research question?

1. A single sample with individuals measured on 2 variables
– RQ: “Is there a relationship between fun factor and gender?”
– H0: “There is no such relationship”

2. Two separate samples representing 2 populations (male and female)
– RQ: “Do male and female players have different preferences for fun factors?”
– H0: “Male and female players do not have different preferences”

Page 20: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared analysis for “Independence”.

• Establish the null hypothesis (previous slide)

• Determine the critical value of chi-squared dependent on the confidence limit (0.05) and the degrees of freedom.
– df = (Rows – 1)*(Columns – 1) = 1 * 2 = 2 (R=2, C=3)

• Look up in chi-squared table
– Critical chi-squared value = 5.99

        Mastery  Challenge  Fantasy
Male    10       32         8
Female  24       8          18

Page 21: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared Table from http://ourwayit.com/CA517/LearningActivities.htm

Page 22: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared analysis for “Independence”.

• Calculate the expected frequencies
– Add each column and divide by the number of groups (in this case 2 genders)
– Easier if you have equal number for each gender (if not come and see me)

               Mastery  Challenge  Fantasy  Respondents
Male (FObs)    10       32         8        50
Female (FObs)  24       8          18       50
Cat total      34       40         26
Male (FExp)    17       20         13
Female (FExp)  17       20         13
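
The “add each column and divide by 2” shortcut relies on the two groups being the same size. A sketch of the general rule, which also covers unequal groups, takes each expected cell as row total × column total / grand total. The function name is an assumption for illustration.

```python
def expected_independence(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female. Columns: Mastery, Challenge, Fantasy.
observed = [
    [10, 32, 8],
    [24, 8, 18],
]

expected = expected_independence(observed)
for row in expected:
    print(row)
```

With 50 respondents in each row, this reduces to half of each column total, which matches the FExp rows in the table.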

Page 23: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared analysis for “Independence”.

• Calculate the statistics using the chi-squared formula
– Ensure you include both male and female data

               Mastery  Challenge  Fantasy
Male (FObs)    10       32         8
Female (FObs)  24       8          18
Male (FExp)    17       20         13
Female (FExp)  17       20         13

$$\chi^2 = \frac{(10-17)^2}{17} + \frac{(32-20)^2}{20} + \frac{(24-17)^2}{17} + \frac{(8-20)^2}{20} + \dots = 24.01$$
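
The same sum can be checked in code by running over every cell of the table, male and female rows alike. This is a sketch using the observed and expected counts from the slides; the function name `chi_squared_table` is an assumption.

```python
def chi_squared_table(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Male, Female. Columns: Mastery, Challenge, Fantasy.
observed = [[10, 32, 8], [24, 8, 18]]
expected = [[17, 20, 13], [17, 20, 13]]

statistic = chi_squared_table(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1) * (columns - 1)

print(round(statistic, 2), df)
```

All six cells contribute, including the two Fantasy cells, and the total rounds to the 24.01 shown on the slide.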

Page 24: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Stage 5: Decision

• Can we reject the null hypothesis?
– Our value of 24.01 is way beyond 5.99. We are 95% confident the value did not occur by chance

• Conclusion: We are 95% confident that there is a relationship between gender and fun factor

• But what else can we get from this?
– Significant fun factor for males = Challenge
– Significant fun factor for females = Mastery and Fantasy

               Mastery  Challenge  Fantasy
Male (FObs)    10       32         8
Female (FObs)  24       8          18
Male (FExp)    17       20         13
Female (FExp)  17       20         13
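
One way to read the fun factors off the table is to list, for each gender, the factors whose observed count is above the expected count. This is a descriptive sketch only, not a separate significance test for each cell, and the function name is an assumption for illustration.

```python
FACTORS = ["Mastery", "Challenge", "Fantasy"]

observed = {"Male": [10, 32, 8], "Female": [24, 8, 18]}
expected = {"Male": [17, 20, 13], "Female": [17, 20, 13]}


def over_represented(observed, expected, factors):
    """For each group, list the factors chosen more often than expected."""
    return {
        group: [
            factor
            for factor, o, e in zip(factors, observed[group], expected[group])
            if o > e
        ]
        for group in observed
    }


result = over_represented(observed, expected, FACTORS)
for group, chosen in result.items():
    print(group, chosen)
```

The output agrees with the slide: Challenge for males, Mastery and Fantasy for females.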

Page 25: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Alternative Method:

• Outline: Calculate chi-squared, and use the table to find the confidence

• In this case, calculated χ² = 24.01

• Go to the appropriate row of the table, and look across for the highest value that is LOWER than the measured value

• The top of that column gives our confidence that the effect is real

Page 26: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Chi-Squared Table from http://ourwayit.com/CA517/LearningActivities.htm

• The probability of this result happening by chance is less than 0.001
• We can be at least 99.9% confident of our result

Page 27: Quantitative Methods Partly based on materials by Sherry O’Sullivan

Computers

• A computer can be used to calculate the expected values – but you have to tell it how
– Use formulae in Excel

• Then the computer will calculate the p value for you
– p = probability of seeing a difference at least this large if only chance were at work
– There is a nice command in Excel that will do this
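
Outside Excel, the p value for the two worked examples can be sketched with the Python standard library alone. This relies on a closed form for the chi-squared tail probability that holds only when the degrees of freedom are even (true for both examples, df = 4 and df = 2); the function name is an assumption, and a statistics package would normally be used for the general case.

```python
import math


def chi2_p_value_even_df(statistic, df):
    """Tail probability P(X >= statistic) for a chi-squared variable with even df.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!, valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = statistic / 2
    term = 1.0    # the i = 0 term of the series
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_browsers = chi2_p_value_even_df(52, 4)      # goodness-of-fit example
p_games = chi2_p_value_even_df(24.01, 2)      # independence example

print(p_browsers < 0.001, p_games < 0.001)
```

As a sanity check, feeding in the critical values from the slides (9.49 with df = 4, 5.99 with df = 2) gives a probability very close to 0.05 in both cases.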

Page 28: Quantitative Methods Partly based on materials by Sherry O’Sullivan

End