Chi Square Basics


Intro to chi-square

  • Chi-square Basics

  • The chi-square distribution

    Positively skewed, but becomes more symmetrical with increasing degrees of freedom.

    Mean = k, where k = degrees of freedom.

    Variance = 2k.

    Assuming a normally distributed dataset and sampling a single squared z value at a time: $\chi^2(1) = z^2$

    If more than one: $\chi^2(N) = \sum_{i=1}^{N} z_i^2$
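
As a rough check on the mean and variance stated above, the following sketch simulates chi-square values by summing k squared standard-normal draws. The function name, the number of draws, and the seed are illustrative assumptions, not part of the original slides.

```python
import random
import statistics

def simulate_chi_square(k, n_draws=20000, seed=0):
    """Return n_draws values, each the sum of k squared standard-normal z scores."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n_draws)]

for k in (1, 5, 20):
    draws = simulate_chi_square(k)
    # The sample mean should land near k and the sample variance near 2k.
    print(k, round(statistics.mean(draws), 2), round(statistics.variance(draws), 2))
```

The simulated values only approximate k and 2k; the agreement tightens as the number of draws grows.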

  • Why used?

    Chi-square analysis is primarily used to deal with categorical (frequency) data.

    We measure the goodness of fit between our observed outcome and the expected outcome for some variable.

    With two variables, we test in particular whether they are independent of one another, using the same basic approach.

  • One-dimensional

    Suppose we want to know how people in a particular area will vote in general, so we go around asking them.

    How will we go about seeing what's really going on?

    Republican   Democrat   Other
    20           30         10

  • Hypothesis: Dems should win the district

    Solution: chi-square analysis to determine whether our outcome is different from what would be expected if there were no preference.

  • Plug in to formula

                Republican   Democrat   Other
    Observed    20           30         10
    Expected    20           20         20
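
The formula itself did not survive in this transcript. The sketch below assumes it is the standard Pearson statistic, the sum over categories of (observed - expected)² / expected, and applies it to the counts in the table. The function name is illustrative, and the p-value shortcut is valid only because df = 2 here.

```python
import math

def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [20, 30, 10]   # Republican, Democrat, Other
expected = [20, 20, 20]   # 60 respondents spread evenly, i.e. no preference

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1            # number of categories minus one
p_value = math.exp(-chi2 / 2)     # upper-tail area; this closed form holds only for df == 2

print(chi2, df, round(p_value, 4))   # 10.0 2 0.0067
```

With a statistic of 10 on 2 degrees of freedom, the result exceeds the tabled .05 critical value of 5.99, which matches the decision to reject H0.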

  • Reject H0

    The district will probably vote Democratic.

    However,

  • Conclusion

    Note that all we can really conclude is that our data differ from the outcome expected under a given situation (here, no preference).

    Although it would appear that the district will vote Democratic, really we can only conclude that people were not responding by chance.

    Regardless of the position of the frequencies, we'd have come up with the same result.

    In other words, it is a non-directional test regardless of the prediction.
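
To illustrate the point about the position of the frequencies, this sketch shuffles the observed counts 20, 30, and 10 across the three parties and recomputes the statistic each time. It assumes the standard Pearson formula and an expected count of 20 per category; the names are illustrative.

```python
from itertools import permutations

def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [20, 20, 20]

# Every assignment of the counts 20, 30, 10 to Republican, Democrat, Other.
values_seen = {chi_square_gof(counts, expected) for counts in permutations([20, 30, 10])}

print(values_seen)   # {10.0}
```

Because every ordering gives the same value, the statistic alone cannot say which party is ahead.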

  • More complex

    What do stats kids do with their free time?

               TV   Nap   Worry   Stare at Ceiling
    Males      30   40    20      10
    Females    20   30    40      10

  • Is there a relationship between gender and what the stats kids do with their free time?

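    Expected = (Ri*Cj)/N, where Ri is the total for row i, Cj is the total for column j, and N is the overall total.

    Example for males, TV: (100*50)/200 = 25

               TV   Nap   Worry   Stare at Ceiling   Total
    Males      30   40    20      10                 100
    Females    20   30    40      10                 100
    Total      50   70    60      20                 200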
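
The expected-frequency rule can be applied to every cell at once. The sketch below is a minimal illustration using the observed counts from the table; the function name is an assumption made for this example.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: males, females. Columns: TV, nap, worry, stare at ceiling.
observed = [
    [30, 40, 20, 10],
    [20, 30, 40, 10],
]

print(expected_counts(observed))
# [[25.0, 35.0, 30.0, 10.0], [25.0, 35.0, 30.0, 10.0]]
```

Both rows have the same expected counts because the two row totals are equal (100 each).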

  • df = (R-1)(C-1)

    R = number of rows

    C = number of columns

    Here, df = (2-1)(4-1) = 3.

                  TV        Nap       Worry     Stare at Ceiling   Total
    Males (E)     30 (25)   40 (35)   20 (30)   10 (10)            100
    Females (E)   20 (25)   30 (35)   40 (30)   10 (10)            100
    Total         50        70        60        20                 200
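
Putting the observed and expected counts together, the test statistic for this table can be sketched as follows. The code assumes the standard Pearson formula, and the critical value 7.815 is the tabled chi-square cutoff for alpha = .05 with 3 degrees of freedom; the function and constant names are illustrative.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / n
            chi2 += (observed_count - expected_count) ** 2 / expected_count
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: males, females. Columns: TV, nap, worry, stare at ceiling.
observed = [
    [30, 40, 20, 10],
    [20, 30, 40, 10],
]

CRITICAL_05_DF3 = 7.815   # tabled critical value, alpha = .05, df = 3

chi2, df = chi_square_independence(observed)
print(round(chi2, 3), df, chi2 > CRITICAL_05_DF3)   # 10.095 3 True
```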

  • Interpretation

    Reject H0: there is some relationship between gender and how stats students spend their free time.

  • Other

    An important point about the non-directional nature of the test: the chi-square test by itself cannot speak to specific hypotheses about the way the results would come out.

    Not useful for ordinal data because of this.

  • Assumptions

    Normality: the rule of thumb is that we need a value of at least 5 for each expected frequency.

    Inclusion of non-occurrences: must include all responses, not just the positive ones.

    Independence: not that the variables are independent or related (that's what the test can be used for), but rather, as with our t-tests, that the observations (data points) don't have any bearing on one another.

    To help with the last two, make sure that your N equals the total number of people who responded.
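
Two of these checks are mechanical and can be sketched in code: the expected-frequency rule of thumb, and the check that the table total equals the number of respondents. The function name, the returned keys, and the example respondent count of 200 are assumptions for illustration.

```python
def check_assumptions(table, n_respondents, min_expected=5):
    """Check the expected-count rule of thumb and that N matches the respondent count."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    smallest_expected = min(r * c / n for r in row_totals for c in col_totals)
    return {
        # Every expected frequency should be at least min_expected.
        "expected_ok": smallest_expected >= min_expected,
        # Each respondent should be counted exactly once in the table.
        "n_matches_respondents": n == n_respondents,
    }

# Rows: males, females. Columns: TV, nap, worry, stare at ceiling.
observed = [
    [30, 40, 20, 10],
    [20, 30, 40, 10],
]

print(check_assumptions(observed, n_respondents=200))
# {'expected_ok': True, 'n_matches_respondents': True}
```

A mismatch between the table total and the respondent count suggests either that non-occurrences were left out or that some people were counted more than once.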

  • Measures of Association

    Contingency coefficient
    Phi
    Cramér's phi
    Odds ratios
    Kappa

    These were discussed in 5700
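
The slides only name these measures. As a hedged sketch, two of them can be computed directly from a chi-square statistic using their usual textbook definitions: the contingency coefficient, sqrt(χ² / (χ² + N)), and Cramér's phi, sqrt(χ² / (N · min(R-1, C-1))), which reduces to phi for a 2 x 2 table. The function names are illustrative, and the data are the gender by free-time counts from the earlier slides.

```python
import math

def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells of a two-way table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

def contingency_coefficient(chi2, n):
    """Contingency coefficient: sqrt(chi2 / (chi2 + N))."""
    return math.sqrt(chi2 / (chi2 + n))

def cramers_phi(chi2, n, n_rows, n_cols):
    """Cramer's phi: sqrt(chi2 / (N * min(R - 1, C - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Observed and expected counts for males and females across the four activities.
observed = [[30, 40, 20, 10], [20, 30, 40, 10]]
expected = [[25, 35, 30, 10], [25, 35, 30, 10]]
n = sum(sum(row) for row in observed)

chi2 = pearson_chi_square(observed, expected)
print(round(contingency_coefficient(chi2, n), 3))
print(round(cramers_phi(chi2, n, n_rows=2, n_cols=4), 3))
```

Odds ratios and kappa are computed from the cell counts rather than from the chi-square statistic, so they are not covered by this sketch.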