Chi Square

Chapter 18, Andy Field text: Categorical Data

Transcript of Chi Square

Page 1: Chi Square

Chapter 18, Andy Field text

Categorical Data

Page 2: Chi Square


Outline

Statistical
• Categorical Data
• Contingency Tables
• Chi-Square test
• Likelihood Ratio Statistic
• Odds Ratio

Diagnostic
• Sensitivity
• Specificity
• Likelihood ratios in diagnostic testing

Page 3: Chi Square


Categorical Data

Numerical
Values or observations that can be measured. These numbers can be placed in ascending or descending order.
Examples: height, arm span and weight.
Scatter plots and line graphs are used to graph numerical data.

Categorical
Values or observations that can be sorted into groups or categories.
Examples: sex, eye colour and favourite colour.
Bar charts and pie graphs are used to graph categorical data.

Sometimes we have data consisting of the frequency of cases falling into unique categories.
Examples:
• Number of people voting for different politicians.
• Number of students who pass or fail their degree in different subject areas.
• Number of patients or waiting-list controls who are ‘free from diagnosis’ (or not) following a treatment.

Page 4: Chi Square

An Example: Dancing Cats

Analyzing two or more categorical variables
• The mean of a categorical variable is meaningless.
  • The numeric values you attach to different categories are arbitrary.
  • The mean of those numeric values will depend on how many members each category has.
• Therefore, we analyze frequencies.

An example
• Can animals be trained to line-dance with different rewards?
• Participants: 200 cats
• Training
  • The animal was trained using either food or affection, not both.
• Dance
  • The animal either learnt to line-dance or it did not.
• Outcome:
  • The number of animals (frequency) that could dance or not in each reward condition.
• We can tabulate these frequencies in a contingency table.

https://www.youtube.com/watch?v=pLCOTpjBGcs

Page 5: Chi Square

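A Contingency Table

(Counts as used in the worked calculations on the later slides.)

                  Food   Affection   Total
Danced: Yes         28          48      76
Danced: No          10         114     124
Total               38         162     200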

Page 6: Chi Square

Chi Squared (χ²)

Page 7: Chi Square

Pearson’s Chi-Square Test

Use to see whether there’s a relationship between two categorical variables.
• Compares the frequencies you observe in certain categories to the frequencies you might expect to get in those categories by chance.

The equation:

$$\chi^2 = \sum \frac{(\text{Observed}_{ij} - \text{Model}_{ij})^2}{\text{Model}_{ij}}$$

• i represents the rows in the contingency table and j represents the columns.
• The observed data are the frequencies in the contingency table.

Page 8: Chi Square

Pearson’s Chi-Square Test

The ‘Model’ is based on ‘expected frequencies’.
• Calculated for each of the cells in the contingency table.
• n is the total number of observations (in this case 200).

$$\text{Model}_{ij} = E_{ij} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{n}$$
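The expected-frequency rule can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` is made up for this example, and the observed counts (28, 10, 48, 114) are the dancing-cats frequencies that appear in the worked calculations on the later slides.

```python
# Observed counts for the dancing-cats example (taken from the later slides).
# Rows: training (food, affection); columns: danced (yes, no).
observed = [
    [28, 10],    # food:      danced, did not dance
    [48, 114],   # affection: danced, did not dance
]

def expected_frequencies(table):
    """Return Model_ij = (row total_i * column total_j) / n for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

expected = expected_frequencies(observed)
for label, row in zip(["food", "affection"], expected):
    print(label, [round(e, 2) for e in row])
```

The four values should match the expected frequencies worked out by hand for this example (14.44, 23.56, 61.56 and 100.44).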

Page 9: Chi Square

Pearson’s Chi-Square Test

Test Statistic
• Checked against a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
• If significant then there is a significant association between the categorical variables in the population.
• The test distribution is approximate so in small samples use Fisher’s exact test.

Page 10: Chi Square

Video

https://www.youtube.com/watch?v=WXPBoFDqNVk (first 8 mins)

Page 11: Chi Square

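Pearson’s Chi-Square Test

Expected frequencies for each cell (RT = row total, CT = column total):

$$\text{Model}_{\text{Food, Yes}} = \frac{RT_{\text{Food}} \times CT_{\text{Yes}}}{n} = \frac{38 \times 76}{200} = 14.44$$

$$\text{Model}_{\text{Food, No}} = \frac{RT_{\text{Food}} \times CT_{\text{No}}}{n} = \frac{38 \times 124}{200} = 23.56$$

$$\text{Model}_{\text{Affection, Yes}} = \frac{RT_{\text{Affection}} \times CT_{\text{Yes}}}{n} = \frac{162 \times 76}{200} = 61.56$$

$$\text{Model}_{\text{Affection, No}} = \frac{RT_{\text{Affection}} \times CT_{\text{No}}}{n} = \frac{162 \times 124}{200} = 100.44$$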
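With the expected frequencies available, the Pearson statistic itself is a short sum. The sketch below is a minimal illustration using only the Python standard library; `pearson_chi_square` is a hypothetical helper name, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom, which is the case for this 2 × 2 table.

```python
import math

# Dancing-cats counts. Rows: food, affection; columns: danced, did not dance.
observed = [[28, 10], [48, 114]]

def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            model = row_totals[i] * col_totals[j] / n   # expected frequency
            chi2 += (obs - model) ** 2 / model
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = pearson_chi_square(observed)

# For df = 1 only: the upper-tail chi-square probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(chi2, 2), df, p_value < 0.001)
```

For larger tables the degrees of freedom exceed 1 and the `erfc` shortcut no longer applies; a statistics package would be the usual choice there.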

Page 12: Chi Square

Interpretation of Pearson Chi Square

There was a significant association between the type of training and whether or not cats would dance, χ²(1) = 25.36, p < .001.

Page 13: Chi Square

Chi Square as a regression

$$Y_i = (b_0 + b_1 X_{1i}) + \text{Error}_i$$

$$Y_{ij} = (b_0 + b_1 \text{Training}_i + b_2 \text{Dance}_j) + \text{Error}_{ij}$$

You can use dummy variables (zeros and ones) to code your training and dance groups. Substitute the zeros and ones into the equation and you will get your 4 conditions. See page 727 of text.

Page 14: Chi Square

Likelihood Ratio Statistic

• An alternative to Pearson’s chi-square
• Preferred when the samples are small

Page 15: Chi Square

Likelihood Ratio Statistic

Based on maximum-likelihood theory.
• Create a model for which the probability of obtaining the observed set of data is maximized.
• This model is compared to the probability of obtaining those data under the null hypothesis.
• The resulting statistic compares observed frequencies with those predicted by the model:

$$L\chi^2 = 2 \sum \text{Observed}_{ij} \ln\left(\frac{\text{Observed}_{ij}}{\text{Model}_{ij}}\right)$$

• i and j are the rows and columns of the contingency table and ln is the natural logarithm.

Test Statistic
• Has a chi-square distribution with (r − 1)(c − 1) degrees of freedom.
• Preferred to the Pearson’s chi-square when samples are small.

Page 16: Chi Square

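Likelihood Ratio Statistic

$$
\begin{aligned}
L\chi^2 &= 2\left[28 \ln\left(\frac{28}{14.44}\right) + 10 \ln\left(\frac{10}{23.56}\right) + 48 \ln\left(\frac{48}{61.56}\right) + 114 \ln\left(\frac{114}{100.44}\right)\right] \\
&= 2\left[(28 \times 0.662) + (10 \times -0.857) + (48 \times -0.249) + (114 \times 0.127)\right] \\
&= 2\left[18.54 - 8.57 - 11.94 + 14.44\right] \\
&= 24.94
\end{aligned}
$$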
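The same calculation can be checked numerically. The following is a minimal sketch, with `likelihood_ratio_statistic` as an illustrative name and the dancing-cats counts hard-coded; treating a zero cell as contributing nothing is an assumption of this sketch rather than something stated on the slide.

```python
import math

# Dancing-cats counts. Rows: food, affection; columns: danced, did not dance.
observed = [[28, 10], [48, 114]]

def likelihood_ratio_statistic(table):
    """Return 2 * sum(Observed_ij * ln(Observed_ij / Model_ij))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            model = row_totals[i] * col_totals[j] / n   # expected frequency
            if obs > 0:   # assumption: an empty cell adds nothing to the sum
                total += obs * math.log(obs / model)
    return 2 * total

l_chi2 = likelihood_ratio_statistic(observed)
print(round(l_chi2, 2))
```

Carried at full precision the result comes out at about 24.93; the hand calculation reaches 24.94 because the logarithms are rounded to three decimals before multiplying.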

Page 17: Chi Square

Interpreting Chi-Square

The test statistic gives an ‘overall’ result.
We can break this result down using ratios.

Effect Size
• The odds ratio can be used as an effect size measure.

Page 18: Chi Square

Important Points

The chi-square test has two important assumptions:
• Independence:
  • Each person, item or entity contributes to only one cell of the contingency table.
• The expected frequencies should be greater than 5.
  • In larger contingency tables up to 20% of expected frequencies can be below 5, but there is a loss of statistical power.
  • Even in larger contingency tables no expected frequencies should be below 1.
  • If you find yourself in this situation consider using Fisher’s exact test.

Proportionately small differences in cell frequencies can result in statistically significant associations between variables if the sample is large enough.
• Look at row and column percentages or ratios to interpret effects.
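The expected-frequency rules above lend themselves to a quick automated check. The sketch below is one possible encoding: `check_expected_counts` and the `acceptable` flag are illustrative names, and folding the rules into "no cell below 1 and at most 20% of cells below 5" is this sketch's reading of the slide, not an official criterion.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def check_expected_counts(table):
    """Summarise how well a table meets the expected-frequency assumption."""
    cells = [e for row in expected_frequencies(table) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return {
        "smallest_expected": min(cells),
        "share_below_5": share_below_5,
        # Assumed reading: none below 1, and no more than 20% below 5.
        "acceptable": min(cells) >= 1 and share_below_5 <= 0.20,
    }

cats = [[28, 10], [48, 114]]      # rows: food, affection; cols: danced, not
report = check_expected_counts(cats)
print(report)

# Row percentages help interpret the effect: share of cats that danced.
danced_share = [row[0] / sum(row) for row in cats]
print([f"{share:.1%}" for share in danced_share])
```

For the cats data every expected frequency is well above 5, so the check passes; in a 2 × 2 table a single cell below 5 already exceeds the 20% allowance and would point towards Fisher’s exact test.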

Page 19: Chi Square

General Procedure for analysing categorical outcomes

Page 20: Chi Square

SPSS

Page 21: Chi Square

How to do Chi Squared in SPSS

https://www.youtube.com/watch?v=532QXt1PM-Q

Or link from http://www.statisticshell.com/html/limbo.html

Page 22: Chi Square

Chi-Square in SPSS: Weighting Cases

Page 23: Chi Square
Page 24: Chi Square

Output

Page 25: Chi Square

Output

Page 26: Chi Square

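The Odds Ratio

$$\text{Odds}_{\text{dancing after food}} = \frac{\text{Number that had food and danced}}{\text{Number that had food but didn't dance}} = \frac{28}{10} = 2.8$$

$$\text{Odds}_{\text{dancing after affection}} = \frac{\text{Number that had affection and danced}}{\text{Number that had affection but didn't dance}} = \frac{48}{114} = 0.421$$

$$\text{Odds Ratio} = \frac{\text{Odds}_{\text{dancing after food}}}{\text{Odds}_{\text{dancing after affection}}} = \frac{2.8}{0.421} = 6.65$$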
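The odds ratio arithmetic is simple enough to reproduce directly. This is a small sketch with illustrative variable names, using the dancing-cats counts from the slides.

```python
# Dancing-cats counts from the slides.
food_danced, food_not = 28, 10
affection_danced, affection_not = 48, 114

# Odds of dancing within each training condition.
odds_food = food_danced / food_not
odds_affection = affection_danced / affection_not

# Odds ratio: how many times higher the odds are after food than after affection.
odds_ratio = odds_food / odds_affection
print(round(odds_food, 1), round(odds_affection, 3), round(odds_ratio, 2))
```

An odds ratio of 1 would mean the odds of dancing are the same under both rewards; values further from 1 indicate a larger effect.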

Page 27: Chi Square

Interpretation

There was a significant association between the type of training and whether or not cats would dance, χ²(1) = 25.36, p < .001. Based on the odds ratio, the odds of cats dancing were 6.65 times higher if they were trained with food than if trained with affection.

Page 28: Chi Square

To Sum Up …

We approach categorical data in much the same way as any other kind of data:
• We fit a model, we calculate the deviation between our model and the observed data, and we use that to evaluate the model we’ve fitted.
• We fit a linear model.

Two categorical variables
• Pearson’s chi-square test
• Likelihood ratio test

Effect Sizes
• The odds ratio is a useful measure of the size of effect for categorical data.

Page 30: Chi Square

Dancing cats

https://www.youtube.com/watch?v=wXQUhX89vtQ

Page 31: Chi Square

Sensitivity. Specificity.…& Likelihood ratios in diagnostic testing.

Credit to Steve Simon at P. Mean consulting

www.pmean.com

http://www.pmean.com/webinars/20100217/Sensitivity

Page 32: Chi Square
Page 33: Chi Square

Contingency Table

                                 “True” Dry Eye Status (the disease)
InflammaDry Status (the test)    Dry Eye    No Dry Eye    Total
Positive                             127             2      129
Negative                              30            78      108
Total                                157            80      237

Questions about detecting the disease
• Can the test accurately detect the ‘diseased’ people? Sensitivity (Sn): 127/157 = 81%
• Can the test accurately detect ‘well’ people? Specificity (Sp): 78/80 = 98%

Questions about the test result
• If your patient’s test comes back positive, what’s the chance it’s correct? Positive predictive value (PPV): 127/129 = 98%
• If your patient’s test comes back negative, what’s the chance it’s correct? Negative predictive value (NPV): 78/108 = 72%

Accuracy: 205/237 = 86%
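All five summary figures come from the same four cell counts, which a short function makes explicit. This is a sketch only; `diagnostic_summary` and its argument names are illustrative, and the counts are the InflammaDry figures from the table on this slide.

```python
# Cell counts from the dry-eye contingency table.
true_positive = 127    # test positive, dry eye
false_positive = 2     # test positive, no dry eye
false_negative = 30    # test negative, dry eye
true_negative = 78     # test negative, no dry eye

def diagnostic_summary(tp, fp, fn, tn):
    """Return the standard 2 x 2 diagnostic measures as proportions."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # denominator: all with the disease
        "specificity": tn / (tn + fp),   # denominator: all without the disease
        "ppv": tp / (tp + fp),           # denominator: all positive tests
        "npv": tn / (tn + fn),           # denominator: all negative tests
        "accuracy": (tp + tn) / total,
    }

summary = diagnostic_summary(true_positive, false_positive,
                             false_negative, true_negative)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

Note that sensitivity and specificity condition on the true disease status, while PPV and NPV condition on the test result, which is why the latter two depend on how common the disease is in the sample.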

Page 34: Chi Square

Sensitivity

The sensitivity (Sn) of a test is the probability that the test is positive when given to a group of patients with the disease. Notice that the denominator for sensitivity is the number of patients who have the disease. A large sensitivity means that a negative test can rule out the disease.

Page 35: Chi Square

Specificity

The specificity (Sp) of a test is the probability that the test will be negative among patients who do not have the disease. Notice that the denominator for specificity is the number of healthy patients. A large specificity means that a positive test can rule in the disease.

Page 36: Chi Square

PPV

The positive predictive value (PPV) of a test is the probability that the patient has the disease when restricted to those patients who test positive.

Page 37: Chi Square

NPV

The negative predictive value (NPV) of a test is the probability that the patient will not have the disease when restricted to those patients who test negative.

Page 38: Chi Square

There are 2 types of Likelihood ratios

Likelihood ratio test statistic
• Compare the fit of two models (the null model vs the alternative).
• The test is based on the likelihood ratio LR.
• Use LR to compute a p-value, or compare it to a critical value. Then decide whether to reject the null model in favour of the alternative model.

Likelihood ratios in diagnostic testing
• Assess the value of performing a diagnostic test.
• Use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists.

Page 39: Chi Square

Likelihood ratios in diagnostic testing

Likelihood ratios:
• Summarize information about the diagnostic test itself.
• Combine information about the sensitivity and specificity.
• Tell you how much a positive or negative result changes the likelihood that a patient would have the disease.

Page 40: Chi Square

Likelihood ratios in diagnostic testing continued

• Likelihood ratio positive (LR+) = sensitivity / (1 − specificity) = 32
• Likelihood ratio negative (LR−) = (1 − sensitivity) / specificity = 0.19
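The two likelihood ratios follow directly from sensitivity and specificity. The sketch below uses the dry-eye counts from the earlier contingency table (127/157 and 78/80); `likelihood_ratios` is an illustrative function name.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a diagnostic test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

# Unrounded sensitivity and specificity from the dry-eye table.
sensitivity = 127 / 157
specificity = 78 / 80

lr_positive, lr_negative = likelihood_ratios(sensitivity, specificity)
print(round(lr_positive, 1), round(lr_negative, 3))
```

With unrounded inputs LR+ is about 32.4 and LR− about 0.196. The slide's 0.19 is consistent with using the rounded values 0.81 and 0.98 instead, though that is an inference from the arithmetic rather than something the slide states.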

Page 41: Chi Square

Values for likelihood ratios

General rules.

Positive likelihood ratio: Anything less than 2 is worthless. A good likelihood ratio should be 10 or higher. Anything bigger than 50 represents an excellent diagnostic test.

Negative likelihood ratio: The corresponding boundaries are 0.5 (1/2), 0.1 (1/10), and 0.02 (1/50).

Page 42: Chi Square

What is the point of LR?

You combine the likelihood ratio with information about:
• the prevalence of the disease,
• characteristics of your patient pool, and
• information about this particular patient
to determine the post-test odds of disease.

Page 43: Chi Square

Hip Dysplasia Example

The patient is a boy with no special risk factors. The diagnostic test is positive. What can we say about the chances that this boy will develop hip dysplasia?

The prevalence of this condition is 1.5% in boys. This corresponds to odds of 1 to 66 (1/66 ≈ 1.5%). Multiply the odds by the likelihood ratio (6.6, as the arithmetic implies) and you get 6.6 to 66, or roughly 1 to 10. The post-test odds of having the disease are 1 to 10, which corresponds to a probability of 9% (1/11).
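The odds bookkeeping in this example can be written as a three-step conversion: probability to odds, multiply by the likelihood ratio, odds back to probability. The sketch below assumes a positive likelihood ratio of 6.6, which is inferred from the "6.6 to 66" step rather than stated outright; `post_test_probability` is an illustrative name.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

# Boy with no special risk factors: prevalence 1.5%, positive test.
# The likelihood ratio of 6.6 is an assumption read off the slide's arithmetic.
positive_result = post_test_probability(0.015, 6.6)
print(f"{positive_result:.0%}")
```

The same function handles a negative result by passing LR− instead of LR+, and a different patient by changing the pre-test probability.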

Page 44: Chi Square
Page 45: Chi Square

Hip Dysplasia Example continued

Suppose we had a negative result, but it was with a boy who had a family history of hip dysplasia. Suppose the family history would change the pre-test probability to 25%. How likely is hip dysplasia, factoring in both the family history and the negative test result?

A probability of 25% corresponds to odds of 1 to 3. The likelihood ratio for a negative result is 0.09, or 1/11. So the post-test odds would be roughly 1 to 33, which corresponds to a probability of 3%.
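Written out step by step, the negative-result calculation looks like this. It is only a worked restatement of the figures quoted in this example (a pre-test probability of 25% and a negative likelihood ratio of 0.09):

$$
\begin{aligned}
\text{pre-test odds} &= \frac{0.25}{1 - 0.25} = \frac{1}{3} \\
\text{post-test odds} &= \frac{1}{3} \times 0.09 = 0.03 \approx \frac{1}{33} \\
\text{post-test probability} &= \frac{0.03}{1 + 0.03} \approx 0.029 \approx 3\%
\end{aligned}
$$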

Page 46: Chi Square
Page 47: Chi Square

Confidence Interval (CI) (Table 4).

Boundaries within which we believe the population value will fall. In 95% of samples, a 95% confidence interval will genuinely contain the population mean. By comparing the CIs of different means we can get some idea about whether the means came from the same or different populations.

Page 48: Chi Square

Acknowledgement

For further explanation, see Steve Simon: http://www.pmean.com/

http://www.pmean.com/webinars/20100217/Sensitivity

Page 49: Chi Square

See: http://www.pmean.com/webinars/20100217/Sensitivity

Page 50: Chi Square