RELIABILITY OF DISEASE CLASSIFICATION

Nigel Paneth

TERMINOLOGY

Reliability is analogous to precision

Validity is analogous to accuracy

Reliability is how well an observer classifies the same individual under different circumstances.

Validity is how well a given test reflects another test of known greater accuracy.

RELIABILITY AND VALIDITY

Reliability includes:

• assessments of the same observer at different times - INTRA-OBSERVER RELIABILITY

• assessments of different observers at the same time - INTER-OBSERVER RELIABILITY

Reliability assumes that all tests or observers are equal; Validity assumes that there is a gold standard to which a test or observer should be compared.

ASSESSING RELIABILITY

How do we assess reliability?

One way is to look simply at percent agreement.

Percent agreement is the proportion of all diagnoses classified the same way by two observers.

EXAMPLE OF PERCENT AGREEMENT

Two physicians are each given a set of 100 X-rays to look at independently and asked to judge whether pneumonia is present or absent. When both sets of diagnoses are tallied, it is found that 95% of the diagnoses are the same.

IS PERCENT AGREEMENT GOOD ENOUGH?

Do these two physicians exhibit high diagnostic reliability?

Can there be 95% agreement between two observers without really having good reliability?

Compare the two tables below:

Table 1

              MD#1
              Yes   No
MD#2   Yes      1    3
       No       2   94

Table 2

              MD#1
              Yes   No
MD#2   Yes     43    3
       No       2   52

In both instances, the physicians agree 95% of the time. Are the two physicians equally reliable in the two tables?
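As a quick check of the 95% figure, the percent agreement for both tables can be computed directly. This is a minimal sketch; the function name `percent_agreement` and the list-of-rows layout (rows for MD#2, columns for MD#1) are illustrative choices, not part of the original slides.

```python
# Rows are MD#2 (Yes, No); columns are MD#1 (Yes, No).
table_1 = [[1, 3], [2, 94]]
table_2 = [[43, 3], [2, 52]]

def percent_agreement(table):
    """Share of all X-rays on which the two physicians gave the same answer."""
    agree = table[0][0] + table[1][1]        # Yes/Yes plus No/No
    total = sum(sum(row) for row in table)   # all X-rays read
    return agree / total

print(percent_agreement(table_1))  # 0.95
print(percent_agreement(table_2))  # 0.95
```

Both tables give the same percent agreement even though the agreeing X-rays are distributed very differently between the Yes/Yes and No/No cells.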

• What is the essential difference between the two tables?

• The problem arises from the ease of agreement on common events (e.g. not having pneumonia in the first table).

• So a measure of agreement should take into account the “ease” of agreement due to chance alone.

USE OF THE KAPPA STATISTIC TO ASSESS RELIABILITY

Kappa is a widely used measure of inter- or intra-observer agreement (or reliability) which corrects for chance agreement.

KAPPA VARIES FROM +1 TO -1

+1 means that the two observers are perfectly reliable. They classify everyone exactly the same way.

0 means there is no relationship at all between the two observers' classifications, above the agreement that would be expected by chance.

-1 means the two observers classify exactly the opposite of each other. If one observer says yes, the other always says no.
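The three reference points can be illustrated with small made-up tables. This is a sketch only: the three 2x2 tables are hypothetical data chosen for illustration, and the function `kappa` applies the chance-corrected formula that these slides go on to explain (observed agreement minus expected agreement, divided by one minus expected agreement).

```python
def kappa(table):
    """Kappa for a 2x2 table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement implied by the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical tables, not from the slides.
perfect = [[50, 0], [0, 50]]     # observers always agree
chance = [[25, 25], [25, 25]]    # agreement no better than chance
opposite = [[0, 50], [50, 0]]    # observers always disagree

print(kappa(perfect))   # 1.0
print(kappa(chance))    # 0.0
print(kappa(opposite))  # -1.0
```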

GUIDE TO USE OF KAPPAS IN EPIDEMIOLOGY AND MEDICINE

Kappa > .80 is considered excellent

Kappa .60 - .80 is considered good

Kappa .40 - .60 is considered fair

Kappa < .40 is considered poor

1st WAY TO CALCULATE KAPPA

1. Calculate observed agreement (the number of observations in the cells where the observers agree, divided by the total number of observations). In both Table 1 and Table 2 it is 95%.

2. Calculate expected agreement (chance agreement) based on the marginal totals

Table 1’s marginal totals are:

OBSERVED

              MD#1
              Yes   No   Total
MD#2   Yes      1    3       4
       No       2   94      96
       Total    3   97     100

• How do we calculate the N expected by chance in each cell?

• We assume that each cell should reflect the marginal distributions, i.e. the proportion of yes and no answers should be the same within the four-fold table as in the marginal totals.

OBSERVED

              MD#1
              Yes   No   Total
MD#2   Yes      1    3       4
       No       2   94      96
       Total    3   97     100

EXPECTED

              MD#1
              Yes   No   Total
MD#2   Yes      ?    ?       4
       No       ?    ?      96
       Total    3   97     100

To do this, we find the proportion of answers in either the column marginal totals (3% yes and 97% no for MD #1) or the row marginal totals (4% yes and 96% no for MD #2), and apply one of the two sets of proportions to the other marginal total. For example, 96% of the row totals are in the “No” category. Therefore, by chance, 96% of MD #1’s 97 “No” answers should also fall in MD #2’s “No” row. 96% of 97 is 93.12.

EXPECTED

              MD#1
              Yes      No   Total
MD#2   Yes      ?       ?       4
       No       ?   93.12      96
       Total    3      97     100

By subtraction, all other cells fill in automatically, and each yes/no distribution reflects the marginal distribution. Any cell could have been used to make the calculation, because once one cell is specified in a 2x2 table with fixed marginal distributions, all other cells are also specified.

EXPECTED

              MD#1
              Yes      No   Total
MD#2   Yes   0.12    3.88       4
       No    2.88   93.12      96
       Total    3      97     100

Now you can see that just by the operation of chance, 93.24 of the 100 observations should have been agreed to by the two observers (93.12 + 0.12).
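The expected table can be reproduced in a few lines. This is a minimal sketch, assuming the same layout as the slides (rows for MD#2, columns for MD#1); the name `expected_counts` is illustrative. Each expected cell is its row total times its column total, divided by the grand total, which is the same as applying one marginal proportion to the other marginal total.

```python
observed = [[1, 3], [2, 94]]   # Table 1: rows MD#2, columns MD#1

def expected_counts(table):
    """Counts expected by chance alone, given the table's marginal totals."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # 4 and 96
    col_totals = [sum(col) for col in zip(*table)]  # 3 and 97
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(x, 2) for x in row])
# [0.12, 3.88]
# [2.88, 93.12]

# Agreement expected by chance: the Yes/Yes and No/No cells.
expected_agreement = expected[0][0] + expected[1][1]
print(round(expected_agreement, 2))  # 93.24
```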

Let's now compare the actual agreement with the expected agreement.

• Expected agreement is 6.76% from perfect agreement of 100% (100 – 93.24)

• Actual agreement is 5.0% from perfect agreement (100 – 95).

• So our two observers were 1.76 percentage points better than chance (95 – 93.24), but if they had agreed perfectly they would have been 6.76 percentage points better than chance. So they achieved only about ¼ of the possible improvement over chance (1.76/6.76 = 0.26).

Below is the formula for calculating Kappa from expected agreement:

$$\kappa = \frac{\text{Observed agreement} - \text{Expected agreement}}{1 - \text{Expected agreement}}$$

For Table 1:

$$\kappa = \frac{95\% - 93.24\%}{100\% - 93.24\%} = \frac{1.76\%}{6.76\%} = 0.26$$
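The same arithmetic can be written as a small function. This is a sketch under the assumption that both agreements are passed as proportions between 0 and 1; the name `kappa_from_agreement` is illustrative.

```python
def kappa_from_agreement(p_observed, p_expected):
    """Kappa from observed and chance-expected agreement (both as proportions)."""
    return (p_observed - p_expected) / (1 - p_expected)

# Table 1: 95% observed agreement, 93.24% expected by chance.
print(round(kappa_from_agreement(0.95, 0.9324), 2))  # 0.26
```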

How good is a Kappa of 0.26?

Kappa > .80 is considered excellent

Kappa .60 - .80 is considered good

Kappa .40 - .60 is considered fair

Kappa < .40 is considered poor
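The guide can be expressed as a simple lookup. This is a sketch only: the guide does not say which category a value exactly on a boundary (0.40, 0.60 or 0.80) belongs to, so the boundary handling below is an assumption, and the name `interpret_kappa` is illustrative.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the verbal guide above.

    Assumption: 0.80 counts as "good" (the guide says "> .80" for excellent),
    while 0.60 and 0.40 are placed in the higher of their two categories.
    """
    if kappa > 0.80:
        return "excellent"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "fair"
    return "poor"

print(interpret_kappa(0.26))  # poor
```

Under this guide, the Kappa of 0.26 from Table 1 falls in the "poor" category despite the 95% agreement.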

In the second example, the observed agreement was also 95%, but the marginal totals were very different:

ACTUAL

              MD#1
              Yes   No   Total
MD#2   Yes      ?    ?      46
       No       ?    ?      54
       Total   45   55     100

Using the same procedure as before, we calculate the expected N in any one cell, based on the marginal totals. For example, the lower right cell is 54% of 55, which is 29.7

EXPECTED

              MD#1
              Yes     No   Total
MD#2   Yes      ?      ?      46
       No       ?   29.7      54
       Total   45     55     100

And, by subtraction, the other cells are as below. The cells that indicate agreement (Yes/Yes and No/No) add up to 50.4 of the 100 observations (20.7 + 29.7), i.e. 50.4%.

EXPECTED

              MD#1
              Yes     No   Total
MD#2   Yes   20.7   25.3      46
       No    24.3   29.7      54
       Total   45     55     100

Enter the two agreements into the formula:

$$\kappa = \frac{\text{Observed agreement} - \text{Expected agreement}}{1 - \text{Expected agreement}}$$

$$\kappa = \frac{95\% - 50.4\%}{100\% - 50.4\%} = \frac{44.6\%}{49.6\%} = 0.90$$

In this example, the observers have the same percent agreement as in Table 1, but their agreement is now far greater than chance alone would produce. A Kappa of 0.90 is considered excellent.

A 2nd WAY TO CALCULATE THE KAPPA STATISTIC

$$\kappa = \frac{2(AD - BC)}{N_1 N_4 + N_2 N_3}$$

where the Ns are the marginal totals, labeled thus:

              MD#1
              Yes   No
MD#2   Yes      A    B    N1
       No       C    D    N2
               N3   N4    total

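Look again at the tables on slide 7. For Table 1:

$$\kappa = \frac{2(94 \times 1 - 2 \times 3)}{4 \times 97 + 3 \times 96} = \frac{176}{676} = 0.26$$

For Table 2:

$$\kappa = \frac{2(52 \times 43 - 3 \times 2)}{46 \times 55 + 45 \times 54} = \frac{4460}{4960} = 0.90$$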
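The shortcut formula is easy to check against the first method. This is a minimal sketch; the function names are illustrative, and the cell letters A to D and totals N1 to N4 follow the labeling used in the slides.

```python
def kappa_shortcut(a, b, c, d):
    """Kappa from the cross-products of a 2x2 table and its marginal totals."""
    n1, n2 = a + b, c + d   # row totals (MD#2 Yes, No)
    n3, n4 = a + c, b + d   # column totals (MD#1 Yes, No)
    return 2 * (a * d - b * c) / (n1 * n4 + n2 * n3)

def kappa_first_way(a, b, c, d):
    """Kappa from observed and expected agreement, for comparison."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)

table_1 = (1, 3, 2, 94)
table_2 = (43, 3, 2, 52)

print(round(kappa_shortcut(*table_1), 2))  # 0.26
print(round(kappa_shortcut(*table_2), 2))  # 0.9

# The two methods agree up to floating-point rounding.
print(abs(kappa_shortcut(*table_1) - kappa_first_way(*table_1)) < 1e-9)  # True
print(abs(kappa_shortcut(*table_2) - kappa_first_way(*table_2)) < 1e-9)  # True
```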

Note parallels between:

THE ODDS RATIO

THE CHI-SQUARE STATISTIC

THE KAPPA STATISTIC

Note that the cross-products of the four-fold table, and their relation to marginal totals, are central to all three expressions
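To make the parallel concrete, the three expressions can be written side by side using the same cell labels (A, B, C, D) and marginal totals (N1 to N4). The odds ratio and chi-square forms below are the standard textbook expressions for a four-fold table, supplied here for comparison rather than taken from the slides. The chi-square is shown without continuity correction, and N denotes the grand total.

$$\text{Odds ratio} = \frac{AD}{BC}$$

$$\chi^2 = \frac{N\,(AD - BC)^2}{N_1 N_2 N_3 N_4}$$

$$\kappa = \frac{2(AD - BC)}{N_1 N_4 + N_2 N_3}$$

In each case the cross-products AD and BC carry the association, and the chi-square and Kappa expressions scale their difference by the marginal totals.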
