RELIABILITY OF DISEASE CLASSIFICATION


Page 1: RELIABILITY OF DISEASE CLASSIFICATION

RELIABILITY OF DISEASE CLASSIFICATION

Nigel Paneth

Page 2: RELIABILITY OF DISEASE CLASSIFICATION

TERMINOLOGY

Reliability is analogous to precision

Validity is analogous to accuracy

Reliability is how well an observer classifies the same individual under different circumstances.

Validity is how well a given test reflects another test of known greater accuracy.

Page 3: RELIABILITY OF DISEASE CLASSIFICATION

RELIABILITY AND VALIDITY

Reliability includes:

• assessments by the same observer at different times - INTRA-OBSERVER RELIABILITY

• assessments by different observers at the same time - INTER-OBSERVER RELIABILITY

Reliability assumes that all tests or observers are equal; Validity assumes that there is a gold standard to which a test or observer should be compared.

Page 4: RELIABILITY OF DISEASE CLASSIFICATION

ASSESSING RELIABILITY

How do we assess reliability?

One way is to look simply at percent agreement.

Percent agreement is the proportion of all diagnoses classified the same way by two observers.
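A minimal sketch (ours, not part of the original slides) of how percent agreement could be computed from two observers' classifications; the data here are made up:

```python
def percent_agreement(obs1, obs2):
    """Proportion of cases classified the same way by two observers."""
    assert len(obs1) == len(obs2)
    agree = sum(a == b for a, b in zip(obs1, obs2))
    return agree / len(obs1)

# Hypothetical example: two readers rating 5 X-rays for pneumonia.
md1 = ["yes", "no", "no", "yes", "no"]
md2 = ["yes", "no", "yes", "yes", "no"]
print(percent_agreement(md1, md2))  # 0.8, i.e. 80% agreement
```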

Page 5: RELIABILITY OF DISEASE CLASSIFICATION

EXAMPLE OF PERCENT AGREEMENT

Two physicians are each given a set of 100 X-rays to look at independently and asked to judge whether pneumonia is present or absent. When both sets of diagnoses are tallied, it is found that 95% of the diagnoses are the same.

Page 6: RELIABILITY OF DISEASE CLASSIFICATION

IS PERCENT AGREEMENT GOOD ENOUGH?

Do these two physicians exhibit high diagnostic reliability?

Can there be 95% agreement between two observers without really having good reliability?

Page 7: RELIABILITY OF DISEASE CLASSIFICATION

Compare the two tables below:

Table 1
                  MD#1
                 Yes    No
  MD#2   Yes       1     3
         No        2    94

Table 2
                  MD#1
                 Yes    No
  MD#2   Yes      43     3
         No        2    52

In both instances, the physicians agree 95% of the time. Are the two physicians equally reliable in the two tables?


Page 8: RELIABILITY OF DISEASE CLASSIFICATION

• What is the essential difference between the two tables?

• The problem arises from the ease of agreement on common events (e.g. not having pneumonia in the first table).

• So a measure of agreement should take into account the “ease” of agreement due to chance alone.

Page 9: RELIABILITY OF DISEASE CLASSIFICATION

USE OF THE KAPPA STATISTIC TO ASSESS RELIABILITY

Kappa is a widely used measure of inter- or intra-observer agreement (reliability) that corrects for chance agreement.
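For reference (not from the slides), widely used statistics libraries compute kappa directly; a minimal sketch assuming scikit-learn is available:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings by two observers on the same 10 cases (1 = disease present).
md1 = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
md2 = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
print(cohen_kappa_score(md1, md2))  # chance-corrected agreement, approximately 0.58 here
```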

Page 10: RELIABILITY OF DISEASE CLASSIFICATION

KAPPA VARIES FROM +1 TO -1

+1 means that the two observers are perfectly reliable. They classify everyone exactly the same way.

0 means there is no relationship at all between the two observers' classifications, beyond the agreement that would be expected by chance.

-1 means the two observers classify exactly the opposite of each other. If one observer says yes, the other always says no.

Page 11: RELIABILITY OF DISEASE CLASSIFICATION

GUIDE TO USE OF KAPPAS IN EPIDEMIOLOGY AND MEDICINE

Kappa > .80 is considered excellent

Kappa .60 - .80 is considered good

Kappa .40 - .60 is considered fair

Kappa < .40 is considered poor

Page 12: RELIABILITY OF DISEASE CLASSIFICATION

1st WAY TO CALCULATE KAPPA

1. Calculate observed agreement (the number of observations on which the two observers agree, divided by the total number of observations). In both Table 1 and Table 2 it is 95%.

 

2. Calculate expected agreement (chance agreement) based on the marginal totals

Page 13: RELIABILITY OF DISEASE CLASSIFICATION

Table 1’s marginal totals are:

OBSERVED
                  MD#1
                 Yes    No   Total
  MD#2   Yes       1     3       4
         No        2    94      96
         Total     3    97     100

Page 14: RELIABILITY OF DISEASE CLASSIFICATION

• How do we calculate the N expected by chance in each cell?

• We assume that each cell should reflect the marginal distributions, i.e. the proportion of yes and no answers should be the same within the four-fold table as in the marginal totals.

OBSERVED
                  MD#1
                 Yes    No   Total
  MD#2   Yes       1     3       4
         No        2    94      96
         Total     3    97     100

EXPECTED
                  MD#1
                 Yes    No   Total
  MD#2   Yes                     4
         No                     96
         Total     3    97     100

Page 15: RELIABILITY OF DISEASE CLASSIFICATION

To do this, we take the proportions in one set of marginal totals - either the columns (3% Yes and 97% No for MD #1) or the rows (4% Yes and 96% No for MD #2) - and apply them to the other marginal total. For example, 96% of the row totals are in the "No" category, so by chance 96% of MD #1's 97 "No" readings should also fall in MD #2's "No" row. 96% of 97 is 93.12.

EXPECTED
                  MD#1
                 Yes    No   Total
  MD#2   Yes                     4
         No           93.12     96
         Total     3    97     100

Page 16: RELIABILITY OF DISEASE CLASSIFICATION

By subtraction, all other cells fill in automatically, and each yes/no distribution reflects the marginal distribution. Any cell could have been used to make the calculation, because once one cell is specified in a 2x2 table with fixed marginal distributions, all other cells are also specified.

EXPECTED
                  MD#1
                 Yes    No   Total
  MD#2   Yes    0.12  3.88       4
         No     2.88 93.12      96
         Total     3    97     100

Page 17: RELIABILITY OF DISEASE CLASSIFICATION

Now you can see that just by the operation of chance, 93.24 of the 100 observations (93.12 + 0.12) should have been agreed on by the two observers.

EXPECTED
                  MD#1
                 Yes    No   Total
  MD#2   Yes    0.12  3.88       4
         No     2.88 93.12      96
         Total     3    97     100
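As a concrete check (ours, not part of the original slides), the expected cell counts and the 93.24% chance agreement for Table 1 can be reproduced in a few lines of Python:

```python
# Observed counts for Table 1: rows are MD#2 (Yes, No), columns are MD#1 (Yes, No).
observed = [[1, 3],
            [2, 94]]

n = sum(sum(row) for row in observed)               # 100 X-rays in total
row_totals = [sum(row) for row in observed]         # [4, 96]
col_totals = [sum(col) for col in zip(*observed)]   # [3, 97]

# Expected count in each cell under chance agreement: row total x column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)                                     # [[0.12, 3.88], [2.88, 93.12]]

# Expected agreement is the sum of the diagonal (Yes-Yes and No-No) cells.
print((expected[0][0] + expected[1][1]) / n)        # 0.9324, i.e. 93.24%
```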

Page 18: RELIABILITY OF DISEASE CLASSIFICATION

Let's now compare the actual agreement with the expected agreement.

• Expected agreement is 6.76% from perfect agreement of 100% (100 – 93.24)

• Actual agreement is 5.0% from perfect agreement (100 – 95).

• So our two observers were 1.76% better than chance, but perfect agreement would have been 6.76% better than chance. So they achieved only about one quarter of the possible improvement over chance (1.76/6.76 = .26).

Page 19: RELIABILITY OF DISEASE CLASSIFICATION

Below is the formula for calculating Kappa from observed and expected agreement:

Kappa = (Observed agreement - Expected agreement) / (1 - Expected agreement)

With the Table 1 figures (writing 1 as 100%):

(95% - 93.24%) / (100% - 93.24%) = 1.76% / 6.76% = .26
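The same arithmetic as a small Python sketch (ours, not from the slides), working in proportions rather than percentages:

```python
def kappa(observed_agreement, expected_agreement):
    """Chance-corrected agreement: (Po - Pe) / (1 - Pe)."""
    return (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(round(kappa(0.95, 0.9324), 2))   # 0.26 for Table 1
```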

Page 20: RELIABILITY OF DISEASE CLASSIFICATION

How good is a Kappa of 0.26?

Kappa > .80 is considered excellent

Kappa .60 - .80 is considered good

Kappa .40 - .60 is considered fair

Kappa < .40 is considered poor

Page 21: RELIABILITY OF DISEASE CLASSIFICATION

In the second example, the observed agreement was also 95%, but the marginal totals were very different.

OBSERVED (marginal totals)
                  MD#1
                 Yes    No   Total
  MD#2   Yes                    46
         No                     54
         Total    45    55    100

Page 22: RELIABILITY OF DISEASE CLASSIFICATION

Using the same procedure as before, we calculate the expected N in any one cell from the marginal totals. For example, the lower right cell is 54% of 55, which is 29.7.

EXPECTED
                  MD#1
                 Yes    No   Total
  MD#2   Yes                    46
         No           29.7      54
         Total    45    55     100

Page 23: RELIABILITY OF DISEASE CLASSIFICATION

And, by subtraction, the other cells are as below. The cells that indicate agreement (the Yes-Yes and No-No diagonal) add up to 50.4%.

EXPECTED
                  MD#1
                 Yes    No   Total
  MD#2   Yes    20.7  25.3      46
         No     24.3  29.7      54
         Total    45    55     100

Page 24: RELIABILITY OF DISEASE CLASSIFICATION

Enter the two agreements into the formula:

Kappa = (Observed agreement - Expected agreement) / (1 - Expected agreement)

(95% - 50.4%) / (100% - 50.4%) = 44.6% / 49.6% = .90

In this example, the observers have the same % agreement, but now they are much better than chance. A Kappa of 0.90 is considered excellent.
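A quick numeric check of the Table 2 result (ours, not from the slides), again working in proportions:

```python
# (Po - Pe) / (1 - Pe), with expected agreement (20.7 + 29.7) / 100 = 0.504.
print(round((0.95 - 0.504) / (1 - 0.504), 2))   # 0.90 for Table 2
```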

Page 25: RELIABILITY OF DISEASE CLASSIFICATION

A 2nd WAY TO CALCULATE THE KAPPA STATISTIC

                  MD#1
                 Yes    No
  MD#2   Yes       A     B     N1
         No        C     D     N2
                  N3    N4    Total

where the Ns are the marginal totals, labeled as above.

Kappa = 2(AD - BC) / (N1 x N4 + N2 x N3)

Page 26: RELIABILITY OF DISEASE CLASSIFICATION

Look again at the tables on slide 7.

For Table 1:

Kappa = 2(94 x 1 - 2 x 3) / (4 x 97 + 3 x 96) = 176 / 676 = .26

For Table 2:

Kappa = 2(52 x 43 - 3 x 2) / (46 x 55 + 45 x 54) = 4460 / 4960 = .90
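A minimal Python sketch of this shortcut formula (ours, not from the slides), checked against both tables:

```python
def kappa_2x2(a, b, c, d):
    """Kappa for a 2x2 agreement table via the shortcut 2(AD - BC) / (N1*N4 + N2*N3)."""
    n1, n2 = a + b, c + d        # row totals
    n3, n4 = a + c, b + d        # column totals
    return 2 * (a * d - b * c) / (n1 * n4 + n2 * n3)

print(round(kappa_2x2(1, 3, 2, 94), 2))    # 0.26 for Table 1
print(round(kappa_2x2(43, 3, 2, 52), 2))   # 0.90 for Table 2
```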

Page 27: RELIABILITY OF DISEASE CLASSIFICATION

Note parallels between:

THE ODDS RATIO

THE CHI-SQUARE STATISTIC

THE KAPPA STATISTIC

Note that the cross-products of the four-fold table, and their relation to the marginal totals, are central to all three expressions.