Chi Square Test

14
Chi-Square Test Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness to fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors. How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected. The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result. The formula for calculating chi-square ( 2 ) is: 2 = (o-e) 2 /e That is, chi-square is the sum of the squared difference between observed (o) and the expected (e) data (or the deviation, d), divided by the expected data in all possible categories. For example, suppose that a cross between two pea plants yields a population of 880 plants, 639 with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of the parents. Yourhypothesis is that the allele for green is dominant to the allele for yellow and that the parent plants were both heterozygous for this trait. If your hypothesis is true, then the predicted ratio of offspring from this cross would be 3:1 (based on Mendel's

description

chi square test

Transcript of Chi Square Test

Page 1: Chi Square Test

Chi-Square Test 

Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness to fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors. How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected. The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result. 

The formula for calculating chi-square (  2) is:

2=   (o-e)2/e

That is, chi-square is the sum of the squared difference between observed (o) and the expected (e) data (or the deviation, d), divided by the expected data in all possible categories. 

For example, suppose that a cross between two pea plants yields a population of 880 plants, 639 with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of the parents. Yourhypothesis is that the allele for green is dominant to the allele for yellow and that the parent plants were both heterozygous for this trait. If your hypothesis is true, then the predicted ratio of offspring from this cross would be 3:1 (based on Mendel's laws) as predicted from the results of the Punnett square (Figure B. 1).

Figure B.1 - Punnett Square. Predicted offspring from cross between green and yellow-seeded plants. Green (G) is dominant (3/4 green; 1/4 yellow).

Page 2: Chi Square Test

To calculate  2 , first determine the number expected in each category. If the ratio is 3:1 and the total number of observed individuals is 880, then the expected numerical valuesshould be 660 green and 220 yellow. 

Chi-square requires that you use numerical values, not percentages or ratios.

Then calculate  2 using this formula, as shown in Table B.1. Note that we get a value of 2.668 for  2. But what does this number mean? Here's how to interpret the  2 value:

1. Determine degrees of freedom (df). Degrees of freedom can be calculated as the number of categories in the problem minus 1. In our example, there are two categories (green and yellow); therefore, there is I degree of freedom.

2. Determine a relative standard to serve as the basis for accepting or rejecting the hypothesis. The relative standard commonly used in biological research is p > 0.05. The p value is the probability that the deviation of the observed from that expected is due to chance alone (no other forces acting). In this case, using p > 0.05, you would expect any deviation to be due to chance alone 5% of the time or less.

3. Refer to a chi-square distribution table (Table B.2). Using the appropriate degrees of 'freedom, locate the value closest to your calculated chi-square in the table. Determine the closestp (probability) value associated with your chi-square and degrees of freedom. In this case ( 2=2.668), the p value is about 0.10, which means that there is a 10% probability that any deviation from expected results is due to chance only. Based on our standard p > 0.05, this is within the range of acceptable deviation. In terms of your hypothesis for this example, the observed chi-squareis not significantly different from expected. The observed numbers are consistent with those expected under Mendel's law.

Step-by-Step Procedure for Testing Your Hypothesis and Calculating Chi-Square

1. State the hypothesis being tested and the predicted results. Gather the data by conducting the proper experiment (or, if working genetics problems, use the data provided in the problem).

2. Determine the expected numbers for each observational class. Remember to use numbers, not percentages. 

Page 3: Chi Square Test

Chi-square should not be calculated if the expected value in any category is less than 5.

3. Calculate  2 using the formula. Complete all calculations to three significant digits. Round off your answer to two significant digits.

4. Use the chi-square distribution table to determine significance of the value.

a. Determine degrees of freedom and locate the value in the appropriate column.b. Locate the value closest to your calculated  2 on that degrees of

freedom df row.c. Move up the column to determine the p value.

5. State your conclusion in terms of your hypothesis.

a. If the p value for the calculated  2 is p > 0.05, accept your hypothesis. 'The deviation is small enough that chance alone accounts for it. A p value of 0.6, for example, means that there is a 60% probability that any deviation from expected is due to chance only. This is within the range of acceptable deviation.

b. If the p value for the calculated  2 is p < 0.05, reject your hypothesis, and conclude that some factor other than chance is operating for the deviation to be so great. For example, a p value of 0.01 means that there is only a 1% chance that this deviation is due to chance alone. Therefore, other factors must be involved.

The chi-square test will be used to test for the "goodness to fit" between observed and expected data from several laboratory investigations in this lab manual. 

Table B.1Calculating Chi-Square

Green YellowObserved (o) 639 241Expected (e) 660 220Deviation (o - e) -21 21Deviation2 (d2) 441 441d2/e 0.668 2

2 =  d2/e = 2.668 . .

Page 4: Chi Square Test

Table B.2Chi-Square Distribution

Degrees of

Freedom

(df)Probability (p)

0.95 0.90 0.80 0.70 0.50 0.30 0.20 0.10 0.05 0.01 0.0011 0.004 0.02 0.06 0.15 0.46 1.07 1.64 2.71 3.84 6.64 10.832 0.10 0.21 0.45 0.71 1.39 2.41 3.22 4.60 5.99 9.21 13.823 0.35 0.58 1.01 1.42 2.37 3.66 4.64 6.25 7.82 11.34 16.274 0.71 1.06 1.65 2.20 3.36 4.88 5.99 7.78 9.49 13.28 18.475 1.14 1.61 2.34 3.00 4.35 6.06 7.29 9.24 11.07 15.09 20.526 1.63 2.20 3.07 3.83 5.35 7.23 8.56 10.64 12.59 16.81 22.467 2.17 2.83 3.82 4.67 6.35 8.38 9.80 12.02 14.07 18.48 24.328 2.73 3.49 4.59 5.53 7.34 9.52 11.03 13.36 15.51 20.09 26.129 3.32 4.17 5.38 6.39 8.34 10.66 12.24 14.68 16.92 21.67 27.8810 3.94 4.86 6.18 7.27 9.34 11.78 13.44 15.99 18.31 23.21 29.59

Nonsignificant Significant

Source: R.A. Fisher and F. Yates, Statistical Tables for Biological Agricultural and Medical Research, 6th ed., Table IV, Oliver & Boyd, Ltd., Edinburgh, by permission of the authors and publishers.

Main Page | Introduction and Objectives | Scientific Investigation | Experimental Procedures | Writing Procedures | Mendelian Inheritance |  Monohybrid and Dihybrid

Exercises | Reference | Miscellaneous |Scientific Writing | Chi-Square Test | Graphing Techniques

Page 5: Chi Square Test

Chi-square testFrom Wikipedia, the free encyclopedia

This article includes a list of references, but its sources remain unclear because it has insufficient inline citations. Please help to improve this article by introducing more precise citations. (November 2014)

Chi-square distribution, showing X2 on the x-axis and P-value on the y-axis.

A chi-square test, also referred to as   test (infrequently as the chi-squared test), is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true. Also considered a chi-square test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough. The chi-square (I) test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Do the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed due to sampling variation, or is it a real difference?

Contents  [hide] 

1 Examples of chi-square testso 1.1 Pearson's chi-square testo 1.2 Yates's correction for continuityo 1.3 Other chi-square tests

2 Exact chi-square distribution 3 Chi-square test requirements 4 Chi-square test for variance in a normal population 5 Chi-square test for independence and homogeneity in tables 6 See also 7 References

Examples of chi-square tests[edit]

Page 6: Chi Square Test

The following are examples of chi-square tests where the chi-square distribution is approximately valid:

Pearson's chi-square test[edit]Main article: Pearson's chi-square test

Pearson's chi-square test, also known as the chi-square goodness-of-fit test or chi-square test for independence. When the chi-square test is mentioned without any modifiers or without other

precluding context, this test is often meant (for an exact test used in place of  , see Fisher's exact test).

Yates's correction for continuity[edit]Main article: Yates's correction for continuity

Using the chi-square distribution to interpret Pearson's chi-square statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-square distribution. This assumption is not quite correct, and introduces some error.

To reduce the error in approximation, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-square test by subtracting 0.5 from the difference between each observed value and its expected value in a 2 × 2 contingency table.[1] This reduces the chi-square value obtained and thus increases its p-value.

Other chi-square tests[edit]

Cochran–Mantel–Haenszel chi-square test . McNemar's test , used in certain 2 × 2 tables with pairing Tukey's test of additivity The portmanteau test in time-series analysis, testing for the

presence of autocorrelation Likelihood-ratio tests  in general statistical modelling, for testing

whether there is evidence of the need to move from a simple model to a more complicated one (where the simple model is nested within the complicated one).

Exact chi-square distribution[edit]

One case where the distribution of the test statistic is an exact chi-square distribution is the test that the variance of a normally distributed population has a given value based on a sample variance. Such a test is uncommon in practice because values of variances to test against are seldom known exactly.

Chi-square test requirements[edit]

1. Quantitative data.2. One or more categories.3. Independent observations.4. Adequate sample size (at least 10).5. Simple random sample.6. Data in frequency form.7. All observations must be used.

Page 7: Chi Square Test

Chi-square test for variance in a normal population[edit]

If a sample of size n is taken from a population having a normal distribution, then there is a result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of n product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e. the value to be tested as holding). Then T has a chi-square distribution with n − 1 degrees of freedom. For example if the sample size is 21, the acceptance region for T for a significance level of 5% is the interval 9.59 to 34.17.

Chi-square test for independence and homogeneity in tables[edit]

Suppose a random sample of 650 of the 1 million residents of a city is taken, in which every resident of each of four neighborhoods, A, B, C, and D, is equally likely to be chosen. A null hypothesis says the randomly chosen person's neighborhood of residence is independent of the person's occupational classification, which is either "blue collar", "white collar", or "service". The data are tabulated:

Let us take the sample proportion living in neighborhood A, 150/650, to estimate what proportion of the whole 1 million people live in neighborhood A. Similarly we take 349/650 to estimate what proportion of the 1 million people are blue-collar workers. Then the null hypothesis independence tells us that we should "expect" the number of blue-collar workers in neighborhood A to be

Then in that "cell" of the table, we have

The sum of these quantities over all of the cells is the test statistic. Under the null hypothesis, it has approximately a chi-square distribution whose number of degrees of freedom is

Page 8: Chi Square Test

If the test statistic is improbably large according to that chi-square distribution, then one rejects the null hypothesis of independence.

A related issue is a test of homogeneity. Suppose that instead of giving every resident of each of the four neighborhoods an equal chance of inclusion in the sample, we decide in advance how many residents of each neighborhood to include. Then each resident has the same chance of being chosen as do all residents of the same neighborhood, but residents of different neighborhoods would have different probabilities of being chosen if the four sample sizes are not proportional to the populations of the four neighborhoods. In such a case, we would be testing "homogeneity" rather than "independence". The question is whether the proportions of blue-collar, white-collar, and service workers in the four neighborhoods are the same. However, the test is done in the same way.

See also[edit]

Statistics portal

Chi-square test nomogram G -test Minimum chi-square estimation The Wald test can be evaluated against a chi-

square distribution.

References[edit]

1. Jump up^ Yates, F (1934). "Contingency table involving small numbers and the χ2 test". Supplement to the Journal of the Royal Statistical Society 1(2): 217–235. JSTOR 2983604

Weisstein, Eric W. , Test.html "Chi-Square Test", MathWorld.

Corder, G.W. & Foreman, D.I. (2014). Nonparametric Statistics: A Step-by-Step Approach. Wiley, New York. ISBN 978-1118840313

Greenwood, P.E., Nikulin, M.S. (1996) A guide to chi-square testing. Wiley, New York. ISBN 0-471-55779-X

Nikulin, M.S. (1973). "Chi-square test for normality". In: Proceedings of the International

Page 9: Chi Square Test

Vilnius Conference on Probability Theory and Mathematical Statistics, v.2, pp. 119–122.

Bagdonavicius, V., Nikulin, M.S. (2011) "Chi-square goodness-of-fit test for right censored data". The International Journal of Applied Mathematics and Statistics, p. 30-50.[full citation needed]

[hide]

V

T

E

Statistics

[show]

Descriptive statistics

[show]

Data collection

[show]

Statistical inference

[show] Correlation

Regression analysis

[show]

Categorical / Multivariate / Time-series

[show]

Applications

Category Portal Outline Index

Categories:  Statistical tests Non-parametric statistics Categorical data

Navigation menu Create account

Log in

Article Talk

Page 10: Chi Square Test

Read Edit View history

Main page Contents Featured content Current events Random article Donate to Wikipedia Wikimedia Shop

Interaction Help About Wikipedia Community portal Recent changes Contact page Tools What links here Related changes Upload file Special pages Permanent link Page information Wikidata item Cite this page Print/export Create a book Download as PDF Printable version Languages العربية Dansk Deutsch Español Euskara فارسی Gaeilge 한국어 हि�न्दी Italiano עברית Latviešu Magyar Nederlands 日本語 Polski Português

Go

Page 11: Chi Square Test

Русский Simple English Basa Sunda Suomi Türkçe Українська Tiếng Việt 中文

Edit links

This page was last modified on 8 December 2014 at 04:09.

Text is available under the Creative Commons Attribution-ShareAlike

License; additional terms may apply. By using this site, you agree to

the Terms of Use and Privacy Policy. Wikipedia® is a registered

trademark of the Wikimedia Foundation, Inc., a non-profit

organization.

Privacy policy

About Wikipedia

Disclaimers

Contact Wikipedia

Developers

Mobile view