315H Handout ChiSquare

  • 8/4/2019 315H Handout ChiSquare

    Bio 315H Mendelian Genetics and Chi-Square Test (Aug 31 and following)

    In the study of transmission genetics we use Punnett squares to predict the possible genotypes and phenotypes of offspring based on known information about the parent individuals. Each Punnett square represents one cross (the products of one mating) and gives an expected ratio of offspring phenotypes. Geneticists use the ratio of phenotypes produced by a mating to infer the genotype of the parents. However, in some cases there is more than one possible explanation of the offspring phenotype results. How do scientists use the concept of probability to choose between alternative hypotheses? Learn how to apply the Chi-square test to a particular hypothesis.

    Probabilities, Null Hypotheses, and the Chi-Square (χ²) Test (see Genetics, section 3.4)

    Many predictions are stochastic (= probabilistic), such as the outcome of flipping a coin or dealing cards from a shuffled deck. Transmission genetics is heavily influenced by stochastic factors. A diploid organism generates haploid gametes, and if that organism is heterozygous at a given genetic locus, it will place one allele in some of its gametes and the second allele in the remaining gametes. Which allele is actually passed along to its offspring is a matter of chance. A Punnett square is a quantitative prediction of possible offspring genotypes from a specific parental cross; think of it as a pattern or model of how offspring inherit traits. Mendel's expected ratios of offspring traits depend on the assumptions that alleles segregate normally, that there is independent assortment of traits, and that random fertilization occurs. All of these are influenced by chance and random events. When we look at the actual products of a genetic cross (actual offspring phenotypes), what can we conclude with confidence? Are the results exactly the same as predicted by our genetic model (the Punnett square)? If not, are they close enough? (Hey! What is "close enough" anyway?)

    In statistics we do not "prove" a hypothesis; rather, we discount or disprove the "null hypothesis". To address genetics questions statistically, the first step is to construct a null hypothesis (null = having no real value, zero). The null hypothesis proposes that there is no real difference between the observed results and the results predicted by the hypothesis; any deviation of observed results from the predicted outcome is small and due to chance alone. In the coin flipping example, the null hypothesis would be something like this: "When a coin is flipped repeatedly, there is no difference between the number of heads results and the number of tails results." In a test of the Punnett square genetic model, the null hypothesis would be: "In a genetic cross, there is no difference between the ratio of offspring phenotypes expected in the cross by our genetic model and the actual observed ratio of offspring phenotypes. Any difference between observed and expected is negligible and must be due to chance alone."

    Using the Chi-Square test:

    The Chi-Square test is a quantitative examination of how well the observed data fit the expected data (i.e., the results expected from a particular genetic model). When the value of χ² is low, the probability calculated from the chi-square test is high (greater than p = 0.05). In that case, we assume that chance alone could have produced the difference (that is, we cannot reject the null hypothesis). When the value of χ² is high (above a certain threshold), the probability calculated from the chi-square test is low (less than p = 0.05). In this case, there is so much difference between observed and expected data that chance alone is an unlikely explanation, and we reject the null hypothesis.

    For example, say that a cross of two short-haired cats yields 8 kittens: 4 short-haired kittens and 4 long-haired kittens. Fur length is controlled by a single genetic locus, with the short-hair allele (H) being dominant to the long-hair allele (h). To produce these kittens, both of these short-haired parents would have to be heterozygotes (Hh), because each long-haired (hh) kitten must inherit an h allele from each parent. Their offspring should therefore exhibit a 3:1 ratio of short hair to long hair. We apply the χ² test, with the Null Hypothesis being that the difference between 3:1 (predicted ratio) and 4:4 (observed numbers) is not significant, that is, it is due to chance alone. Once the Null Hypothesis is formulated, we compare the observed results to the predicted results and calculate a statistical parameter called χ². Calculation of this parameter is straightforward:

    \[ \chi^2 = \sum \frac{(\text{Number observed} - \text{Number expected})^2}{\text{Number expected}} \]

    \[ \chi^2 = \frac{(4-6)^2}{6} + \frac{(4-2)^2}{2} = 2.667 \]

    For each class of predicted results, just square the difference between the number of observed cases and the number of expected cases, and divide that value by the number of expected cases. It is important to use the actual number of cases, not percentages. The value of χ² is the sum (represented by the symbol 'Σ') of the numbers calculated for all of the different classes of predicted result.
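The calculation just described can be sketched in a few lines of Python. This is only an illustration, not part of the original handout: the function name `chi_square` and its arguments are made up for this example, and it takes raw counts (not percentages), one per outcome class.

```python
def chi_square(observed, expected):
    """Sum (o - e)^2 / e over all outcome classes, using raw counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cat cross: 4 short-haired and 4 long-haired kittens observed,
# 6 short-haired and 2 long-haired kittens expected.
cat_chi2 = chi_square([4, 4], [6, 2])
print(round(cat_chi2, 3))  # 2.667
```

When the observed counts equal the expected counts, every term is zero and the function returns 0.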

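    In the cross of two heterozygous short-haired cats, we expect to see a 3:1 ratio of short:long hair in the 8 kittens. Thus we expect 3/4 × 8 = 6 short-haired kittens and 1/4 × 8 = 2 long-haired kittens. In calculating chi-square it is helpful to use a chart with one row per outcome class and columns for the observed number (o), the expected number (e), the deviation (d = o − e), d², and d²/e.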
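As a rough sketch of how such a chart could be filled in programmatically, the Python below turns a hypothesized ratio and the observed counts into one row per class (o, e, d, d², d²/e). The helper name `chart_rows` is hypothetical, and exact fractions are used only so that the expected counts are not affected by rounding.

```python
from fractions import Fraction


def chart_rows(ratio, observed):
    """Build one (o, e, d, d^2, d^2/e) row per outcome class."""
    total = sum(observed)   # total number of offspring counted
    parts = sum(ratio)      # e.g. 3 + 1 = 4 for a 3:1 ratio
    rows = []
    for share, o in zip(ratio, observed):
        e = Fraction(share, parts) * total  # expected count for this class
        d = o - e                           # deviation, observed minus expected
        rows.append((o, e, d, d ** 2, d ** 2 / e))
    return rows


rows = chart_rows(ratio=(3, 1), observed=(4, 4))
for o, e, d, d2, d2_over_e in rows:
    print(o, e, d, d2, round(float(d2_over_e), 3))

# The chi-square value is the total of the last column.
print("chi-square:", round(float(sum(row[4] for row in rows)), 3))
```

For the cat cross, the expected column comes out as 6 and 2, matching 3/4 × 8 and 1/4 × 8.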

    If observed and expected numbers are a perfect match, then χ² = 0. But if there is disparity between the predicted numbers and the observed numbers, then χ² will be > 0, and χ² will increase in value as the amount of disparity increases. If the disparity is large enough, we can discount the null hypothesis. That is, the disparity is too high to be explained by chance alone.

                  Probability (p)
    df      0.90    0.50    0.20    0.10    0.05    0.01    0.001
    1       0.02    0.46    1.64    2.71    3.84    6.64    10.83
    2       0.21    1.39    3.22    4.60    5.99    9.21    13.82
    3       0.58    2.37    4.64    6.25    7.82   11.35    16.27
    4       1.06    3.36    5.99    7.78    9.49   13.28    18.47
    5       1.61    4.35    7.29    9.24   11.07   15.09    20.52
    6       2.20    5.35    8.56   10.64   12.59   16.81    22.46
    7       2.83    6.35    9.80   12.02   14.07   18.48    24.32
    8       3.49    7.34   11.03   13.36   15.51   20.09    26.13
    9       4.17    8.34   12.24   14.68   16.92   21.67    27.88
    10      4.87    9.34   13.44   15.99   18.31   23.21    29.59
    15      8.55   14.34   19.31   22.31   25.00   30.58    37.70
    25     16.47   24.34   30.68   34.38   37.65   44.31    52.62

    This table allows one to use the value of χ² to estimate a probability p. This is the probability that, if the Null Hypothesis were true, a difference as large as the observed one (or larger) would arise by chance alone. To use this table, you need to have calculated a value of χ² as shown above, and to know the number of 'degrees of freedom' (df = the number of outcome classes minus 1).
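Reading the table amounts to finding the two columns whose χ² values bracket the calculated value. The Python sketch below illustrates that lookup under stated assumptions: only the first three rows of the table are copied in, and the names `P_VALUES`, `CRITICAL`, `degrees_of_freedom`, and `bracket_p` are invented for this example.

```python
# Column headings of the table (probabilities), left to right.
P_VALUES = [0.90, 0.50, 0.20, 0.10, 0.05, 0.01, 0.001]

# Chi-square values copied from the table rows for df = 1, 2, 3.
CRITICAL = {
    1: [0.02, 0.46, 1.64, 2.71, 3.84, 6.64, 10.83],
    2: [0.21, 1.39, 3.22, 4.60, 5.99, 9.21, 13.82],
    3: [0.58, 2.37, 4.64, 6.25, 7.82, 11.35, 16.27],
}


def degrees_of_freedom(n_classes):
    """df = number of outcome classes minus 1."""
    return n_classes - 1


def bracket_p(chi2, df):
    """Return (lower, upper) bounds on p read from the table.

    None means the value falls off that edge of the table:
    (0.90, None) reads as p > 0.90, (None, 0.001) as p < 0.001.
    """
    upper = None
    for p, critical_value in zip(P_VALUES, CRITICAL[df]):
        if chi2 < critical_value:
            return (p, upper)
        upper = p
    return (None, upper)


df = degrees_of_freedom(2)   # two classes: short hair, long hair
print(bracket_p(2.667, df))  # (0.1, 0.2)
```

A larger χ² moves the bracket to the right of the table, toward smaller probabilities.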

    Hypothesized Ratio    o      e      d      d²     d²/e
    3/4
    1/4
    Total cases:

    For the cross of the short-haired cats, there is only 1 degree of freedom (two possible outcomes minus 1) and χ² = 2.67. In the table of χ² values, this falls between 1.64 (p = 0.20) and 2.71 (p = 0.10) in the df 1 row, so p is between 0.10 and 0.20 and close to 0.10. This means that if the Null Hypothesis were correct, there would be about a 10% chance of encountering a deviation this large from the expected 3:1 ratio by chance alone. Because p is greater than 0.05, the Null Hypothesis cannot be discounted or rejected, given the information from just this one litter. The observed values do not differ significantly from the expected values, so our genetic hypothesis (a heterozygote cross) is consistent with the data.
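The table only brackets p, but for the special case of 1 degree of freedom the probability can be computed directly, because a χ² variable with 1 df is the square of a standard normal variable. The sketch below relies on that fact and on Python's standard `math.erfc`; the function name `p_value_df1` is made up for this example, and the approach does not apply to other df values.

```python
import math


def p_value_df1(chi2):
    """Probability of a chi-square value this large or larger, for df = 1 only.

    With 1 df, chi-square is the square of a standard normal variable,
    so the upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


# Cat cross: observed 4 and 4, expected 6 and 2.
chi2 = (4 - 6) ** 2 / 6 + (4 - 2) ** 2 / 2
p = p_value_df1(chi2)

# Reject the null hypothesis only if p falls below the 0.05 threshold.
reject_null = p < 0.05
print(round(p, 2), reject_null)
```

The computed p lands just above 0.10, in line with the table reading, so the null hypothesis is not rejected.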