Lesson 2 Chi squared


Transcript of Lesson 2 Chi squared

Page 1: Lesson 2 Chi squared

Lesson Layout

› Theory
› Introduction
› Worked Example
› Application
› Past paper questions
› Further application & review

Page 2: Lesson 2 Chi squared

Statistics 2 – (Chi-square hypothesis testing) IB Mathematical Studies SL

Page 3: Lesson 2 Chi squared

Syllabus reference

Content Detail

Page 4: Lesson 2 Chi squared

Hypothesis Testing

Hypothesis testing involves making a conjecture (assumption) about some facet of our world, collecting data from a sample, performing a specific mathematical test, and then deciding whether or not the data support the conjecture.

A conjecture must be stated in two parts:
› The null hypothesis (H0) states that there is no significant difference between the two parameters being tested (they are “not related to” each other, i.e. independent).
› The alternative hypothesis (H1) states that there is a significant difference (they are “related” in some way, i.e. dependent).

The only hypothesis test covered by the Studies SL course is the Chi Squared test.

Page 5: Lesson 2 Chi squared

Chi-square (χ²) test by GDC

The Chi-square test itself is quite straightforward. Your GDC can do it in two steps, but you must also know the formula and be able to do it by hand.

The hypothesis test which uses Chi-square determines whether or not two variables are related. It follows a general pattern:
(1) Make a conjecture
(2) Write the null hypothesis using “is not related to” or “independent”, and write the alternative hypothesis using “is related to” or “dependent”
(3) Calculate the chi-square test statistic
(4) Determine reference values
(5) Compare the two and either accept or reject the null hypothesis

Page 6: Lesson 2 Chi squared

Using the GDC

You can find chi-squared on your GDC by using the statistics mode:
› Press F3 {Test}
› Press F3 again for {Chi}

Note: you must have entered the data into Matrix A from Matrix mode first!

Page 7: Lesson 2 Chi squared

Example 1 - Question

A researcher conjectures that seat belt usage, for drivers, is related to gender. She gathers data by recording seat belt usage at several randomly selected intersections. The data has been recorded in the table as shown.

Construct a chi-square hypothesis test to determine if there is enough evidence to support the researcher’s conjecture.

            Seat belt usage
Gender      Yes     No
Female      50      25
Male        40      45

This type of table is called a contingency table. (It’s what you put in Matrix A on the GDC.)

Page 8: Lesson 2 Chi squared

Example 1 - Solution

Since the conjecture has already been made we can start at step (2) – write the null and alternative hypotheses:
› H0 – Seat belt usage is not related to gender
› H1 – Seat belt usage is related to gender

Step (3) – calculate chi-square
› Enter the data into Matrix A: from the Run screen press F1 {MAT}
› Enter the dimensions for Matrix A, which are 2 x 2, then press EXE
› Enter the contingency table data into the matrix, then press EXIT twice to go back to the RUN screen
› In STAT mode, press F3 then F3 again
› Highlight EXECUTE and press EXE

Exam hint – you must also be able to do a contingency table by hand. The largest to be tested will be 4 x 4.

GDC output:
χ² Test
χ² = 6.22471211
p = 0.01259793
df = 1
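As a cross-check on the GDC output, the same three numbers can be reproduced in a few lines of Python. This is only a sketch, not part of the IB course: the function names are made up for illustration, and the p-value shortcut used here is an assumption that holds only when df = 1.

```python
import math

# Observed counts from Example 1 (rows: Female, Male; columns: Yes, No).
observed = [[50, 25],
            [40, 45]]

def chi_square_test(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / total  # expected frequency
            chi2 += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

def p_value_df1(chi2):
    """Right-tail p-value; this closed form is valid only for df = 1."""
    return math.erfc(math.sqrt(chi2 / 2))

chi2, df = chi_square_test(observed)
print(round(chi2, 4), df, round(p_value_df1(chi2), 4))  # 6.2247 1 0.0126
```

The printed values agree with the GDC screen to the digits shown.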

Page 9: Lesson 2 Chi squared

Example 1 - Solution

Step (4) – determine reference values
› There are two reference values of importance: the p-value (which was calculated during the chi-square test) and the Critical Value (CV), which you read off the distribution table on your IB formula sheet.
› In this case assume α = 0.01 (1%)

Step (5) – make a comparison between either
› p-value and the significance level
OR
› Chi-square test statistic and the Critical Value

Here p-value > α level, since 0.0126 > 0.01.

In other words, we accept the null hypothesis that there is no relationship between seat belt usage and gender.

Exam hint – the only significance levels that will be tested are 1%, 5% and 10%.

If p-value > α level, then we can accept H0, i.e. there is not enough evidence to reject it.

Page 10: Lesson 2 Chi squared

Example 2 - Question

From what Lauren observed, she believes that the number of hours exercised per week is dependent on gender. She collected data randomly and organised the results in the table shown.

Determine whether there is enough evidence to accept or reject the null hypothesis:
› a) for α = 0.01
› b) for α = 0.05
› c) for α = 0.10

            Hours exercised per week
Male        5       10      12
Female      9       8       4

Page 11: Lesson 2 Chi squared

Example 2 - Solution

Write the null and alternative hypotheses:
› H0 – The number of hours exercised each week is independent of gender
› H1 – The number of hours exercised each week is dependent on gender

Calculate chi-square and the p-value:
χ² Test
χ² = 4.69 (3sf)
p = 0.0959 (3sf)
df = 2

• Compare the p-value to each significance level

a) 0.0959 > 0.01, hence accept the null hypothesis

b) 0.0959 > 0.05, hence accept the null hypothesis

c) 0.0959 < 0.10, hence reject the null hypothesis

Whilst it is not technically correct to say “accept H0”, it is still accepted in the IB.
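The three comparisons in Example 2 can be sketched in Python as follows. This is an illustrative check rather than an exam method; the closed-form p-value used here is an assumption that is valid only for df = 2, and the wording “accept H0” simply follows the lesson's convention.

```python
import math

# Observed counts from Example 2 (rows: Male, Female; three exercise categories).
observed = [[5, 10, 12],
            [9, 8, 4]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / total  # expected frequency
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(col_totals) - 1)

# For df = 2 only, the right-tail p-value has the closed form exp(-chi2 / 2).
p = math.exp(-chi2 / 2)

# Apply the rule "p < alpha: reject H0, otherwise accept H0" at each level.
decisions = {alpha: ("reject H0" if p < alpha else "accept H0")
             for alpha in (0.01, 0.05, 0.10)}

print(round(chi2, 2), df, round(p, 4))  # 4.69 2 0.0959
for alpha, decision in decisions.items():
    print(alpha, decision)
```

Only the 10% level leads to rejecting H0, matching parts a) to c).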

Page 12: Lesson 2 Chi squared

Questions

Page 13: Lesson 2 Chi squared

Questions

Page 14: Lesson 2 Chi squared

Questions

Page 15: Lesson 2 Chi squared

The chi-square test formula

χ²calc = Σ (fo − fe)² / fe

This formula is on the IB formula sheet
› fo is the observed frequencies (i.e. the raw data)
› fe is the expected frequencies

It is easiest to perform this sum one step at a time using a table.

Exam hint – you are expected to be able to calculate the χ² test statistic with your GDC when raw data is given. You should also be able to perform an entire χ² hypothesis test without your GDC.

* Don’t forget you can check your expected values using Matrix B.

Page 16: Lesson 2 Chi squared

Chi-square (χ²) test by hand

Remember these steps below can be checked using your GDC, especially the expected values, using Matrix B.

Completing a hypothesis test which uses Chi-square by hand follows a similar process to the previous one, except some of the steps are much longer:
(1) Make a conjecture (same as before)
(2) Write the null hypothesis using “is not related to” or “independent”, and write the alternative hypothesis using “is related to” or “dependent” (same as before)
(3) Calculate the chi-square test statistic (χ²)
(4) Determine the reference value called the Critical Value (CV)
(5) Compare the two and either accept or reject the null hypothesis

Steps 3 & 4 are much longer!

Page 17: Lesson 2 Chi squared

Step (3) - Calculating the chi-square (χ²) test statistic

This step has a series of sub-parts:
(A) Expand the contingency table to include row totals, column totals and an overall total. The raw data in the table are called “observed values”.
(B) Calculate the “expected values” for each cell in the table based on the probabilities using the totals of each row and column.
(C) Organise the frequencies into a new table to calculate χ².

Using Example 1 from before, part A is shown below.

            Seat belt usage
Gender      Yes     No      Row total
Female      50      25      75
Male        40      45      85
Col total   90      70      160

Page 18: Lesson 2 Chi squared

Part B (cont)

Using Example 1 from before, part B is shown below.

To calculate the expected frequency (fe) in each cell we use the formula [Col total] x [Row total] / [Total sum]

            Seat belt usage
Gender      Yes     No      Row total
Female      50      25      75
Male        40      45      85
Col total   90      70      160

These values are the observed frequencies (fo)

Expected frequencies (fe)

            Yes                     No
Female      90*75/160 = 42.1875     70*75/160 = 32.8125
Male        90*85/160 = 47.8125     70*85/160 = 37.1875

Remember to check these with Matrix B
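The rule [Col total] x [Row total] / [Total sum] can also be sketched in Python, which gives another way to check hand-calculated expected values. The function name `expected_frequencies` is hypothetical, chosen only for this illustration.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Observed counts from Example 1 (rows: Female, Male; columns: Yes, No).
observed = [[50, 25],
            [40, 45]]

expected = expected_frequencies(observed)
for label, row in zip(("Female", "Male"), expected):
    print(label, row)
# Female [42.1875, 32.8125]
# Male [47.8125, 37.1875]
```

Each row of expected values sums to the same row total as the observed data (75 and 85), which is a quick sanity check when working by hand.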

Page 19: Lesson 2 Chi squared

Part C (cont)

Using Example 1 from before, part C is shown below.

fo      fe          fo-fe       (fo-fe)²    (fo-fe)²/fe
50      42.1875     7.8125      61.035      1.4468
25      32.8125     -7.8125     61.035      1.8601
40      47.8125     -7.8125     61.035      1.2765
45      37.1875     7.8125      61.035      1.6413

Sum = 6.22 (3sf)

Don’t round to 3sf during these calculations!
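The table for part C can be built one column at a time in Python, keeping full precision until the final sum. This is a sketch for checking hand working; the variable names are illustrative only.

```python
# Observed and expected frequencies for Example 1, cell by cell.
observed = [50, 25, 40, 45]
expected = [42.1875, 32.8125, 47.8125, 37.1875]

rows = []
for fo, fe in zip(observed, expected):
    diff = fo - fe
    # Columns: fo, fe, fo-fe, (fo-fe)^2, (fo-fe)^2/fe
    rows.append((fo, fe, diff, diff ** 2, diff ** 2 / fe))

# The chi-square statistic is the sum of the last column.
chi2 = sum(row[4] for row in rows)

print(f"{'fo':>4} {'fe':>9} {'fo-fe':>9} {'(fo-fe)^2':>10} {'(fo-fe)^2/fe':>13}")
for fo, fe, diff, sq, contrib in rows:
    print(f"{fo:>4} {fe:>9.4f} {diff:>9.4f} {sq:>10.3f} {contrib:>13.4f}")
print(f"Sum = {chi2:.4f}")
```

Rounding happens only when printing, so the sum is not affected by intermediate rounding.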

Page 20: Lesson 2 Chi squared

Step (4) – Determine the Critical Value (CV)

If you had a Million Dollars and you gave $1 away, how much would you say you had? (When does it become significant?)

A critical value is a number which represents the boundary in determining whether a statistic is significant or not. That is, it separates the choice to accept or reject the null hypothesis.

If the chi-square test value falls to the left of (is less than) the CV then we accept the null hypothesis (H0).

If the chi-square test value falls to the right of (is greater than) the CV then we reject the null hypothesis (H0).

The critical value is found using the distribution table on the IB formula sheet.
› The left-side column represents degrees of freedom: df = (#cols − 1) × (#rows − 1)
› The top row has the table's p values. Since the chi-squared test is a right-tail test we will always use the five right columns, and since the IB only uses significance levels of 0.01, 0.05 and 0.1 we will only ever need the corresponding columns p = 0.99, 0.95 and 0.9 (that is, p = 1 − α). This table p is not the same thing as the p-value given by the GDC.

For our example, p = 0.99, df = 1, hence CV = 6.635
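A minimal Python sketch of the look-up is shown here. The small table holds standard chi-square critical values for df = 1 to 4 only; apart from 6.635, these numbers are not quoted in this lesson and should be checked against the IB formula sheet. The function names are hypothetical.

```python
# Standard right-tail critical values, keyed by df and then significance level.
# Assumption: values copied from a standard chi-square table, df 1 to 4 only.
CRITICAL_VALUES = {
    1: {0.10: 2.706, 0.05: 3.841, 0.01: 6.635},
    2: {0.10: 4.605, 0.05: 5.991, 0.01: 9.210},
    3: {0.10: 6.251, 0.05: 7.815, 0.01: 11.345},
    4: {0.10: 7.779, 0.05: 9.488, 0.01: 13.277},
}

def degrees_of_freedom(n_rows, n_cols):
    """df = (#rows - 1) * (#cols - 1) for a contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def critical_value(n_rows, n_cols, alpha):
    """Look up the critical value for a table of the given size."""
    return CRITICAL_VALUES[degrees_of_freedom(n_rows, n_cols)][alpha]

# Example 1 is a 2 x 2 table tested at the 1% level.
print(degrees_of_freedom(2, 2), critical_value(2, 2, 0.01))  # 1 6.635
```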

Page 21: Lesson 2 Chi squared

Step (5) – Compare χ² to CV and make conclusion

χ² = 6.22
CV = 6.635
Hence χ² < CV and we will accept the null hypothesis that there is no relationship between seat belt usage and gender.

Remember: Step (5) can be done in two ways, by comparing either
› p-value and the significance level
OR
› Chi-square test statistic and the Critical Value

Previously we found p-value > α level, since 0.0126 > 0.01, and we accepted the null hypothesis.

Can you spot the difference?

Page 22: Lesson 2 Chi squared

Understanding the final comparison method

If you are comparing p-value with α-level then if:
› p > α accept the null hypothesis
› p < α reject the null hypothesis

If you are comparing χ² with CV then if:
› χ² < CV accept the null hypothesis
› χ² > CV reject the null hypothesis
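The two comparison methods can be written as a pair of small Python functions to show that they lead to the same decision. This is a sketch with hypothetical function names; the 5% critical value 3.841 for df = 1 is a standard table value and is an assumption not quoted in this lesson.

```python
def decide_by_p_value(p, alpha):
    """p > alpha: accept H0; p < alpha: reject H0."""
    return "reject H0" if p < alpha else "accept H0"

def decide_by_critical_value(chi2, cv):
    """chi2 < CV: accept H0; chi2 > CV: reject H0."""
    return "reject H0" if chi2 > cv else "accept H0"

# Example 1 values: chi2 = 6.22, p = 0.0126, df = 1.
chi2, p = 6.22, 0.0126

# At the 1% level the critical value is 6.635 (from the lesson).
print(decide_by_p_value(p, 0.01), "|", decide_by_critical_value(chi2, 6.635))
# accept H0 | accept H0

# At the 5% level the critical value is 3.841 (assumed standard table value).
print(decide_by_p_value(p, 0.05), "|", decide_by_critical_value(chi2, 3.841))
# reject H0 | reject H0
```

Note that the inequality signs point in opposite directions in the two methods: a small p-value corresponds to a large χ² value.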

Page 23: Lesson 2 Chi squared

Questions

H&H 2nd Ed – Exercise 20E.1 a-d (p615)

Worked example 8 (p618)

Exercise 20E.2

Exercise 20E.3 (p621)