EPIDEMIOLOGY 4 RELIABILITY AND VALIDITY (TRAINING AND CALIBRATION), Dr. Seyed Ebrahim Jabarifar


Transcript of EPIDEMIOLOGY 4 RELIABILITY AND VALIDITY (TRAINING AND CALIBRATION) by Dr. Seyed Ebrahim Jabarifar

Page 1:

EPIDEMIOLOGY 4

RELIABILITY AND VALIDITY (TRAINING AND CALIBRATION)

Dr. Seyed Ebrahim Jabarifar (Dr. Jabarifar), Date: 1388 / 2010

Associate Professor, Isfahan University of Medical Sciences, Department of Community Dentistry

Page 2:

RELIABILITY AND VALIDITY OF DATA

Two main reasons for variability in scoring (WHO, 1997):

1. Difficulty in scoring the different levels of oral diseases, particularly dental caries and periodontal diseases.

2. Physical and psychological factors (fatigue, fluctuations in interest in the study, variations in visual acuity and tactile sense) that affect the judgement of examiners from time to time and to a different degree.

Page 3:

RELIABILITY AND VALIDITY OF DATA

What is the principal problem/issue with the variability?

To decide whether examiners are sufficiently close to each other in their interpretation and application of the clinical criteria. In this sense, data from their samples can be pooled together to provide area/district estimates, whose variances reflect true inter-subject differences in oral health and not an inflation due to examiner differences (Pine et al., 1997).

Page 4:

RELIABILITY AND VALIDITY OF DATA

Objectives of standardisation and calibration (WHO, 1997):

1. To ensure uniform interpretation, understanding and application by all examiners of the codes and criteria for the various diseases and conditions to be observed and recorded.

2. To ensure that each examiner can examine consistently.

Page 5:

RELIABILITY AND VALIDITY OF DATA

How can this problem/issue be tackled?

1. Training of examiners and interviewers

2. Calibration exercise

3. Repeat examinations

Page 6:

TRAINING EXERCISE

What do we mean by a training exercise?

The training exercise aims to teach the survey examiners, thoroughly and intensively, the logistics of the examination protocol and the agreed interpretation of the diagnostic criteria.

In practical terms, the full range of diagnostic situations is presented and discussed in detail: a) on slides, b) on actual subjects. It takes place before the survey and requires at least 2 days of intensive work. It may be repeated at specific intervals during the survey.

Page 7:

CALIBRATION EXERCISE

What do we mean by a calibration exercise?

The calibration exercise completes the training and provides a formal measure of how well the examiner can interpret the criteria, compared to the "gold standard" set by the trainer.

It takes place before the survey and may be repeated annually.

Page 8:

CALIBRATION EXERCISE

How does this practically happen?

Some subjects are examined by some (or even all) examiners and by the gold-standard examiner, and the data are compared.

The exercise is repeated annually, in order to ensure consistency in the interpretation of criteria and familiarity with new measures.

The calibration exercise should include a sufficient number of cases (≥20 subjects) on which a wide range of diagnostic decisions have to be made (i.e. treated and untreated caries, as well as caries-free subjects).

Page 9:

CALIBRATION EXERCISE

What is the action taken?

"Outlier" examiners and the specific areas of over- or under-scoring are identified. The issue is discussed and thoroughly clarified. A repeat calibration exercise should be undertaken. On a repeatedly unsatisfactory result, the outlier may be excluded from the survey (practical difficulties).

NB: The ability to standardise clinical examination results is not a measure of clinical skill (clarify this in advance).

Page 10:

REPEAT EXAMINATIONS

What do we mean by repeat examinations?

The repeat examinations can be carried out: a) by the same examiner, aiming to monitor intra-examiner diagnostic consistency (single examiner), or b) by the gold-standard examiner, aiming to ensure inter-examiner diagnostic consistency (group of examiners).

In practical terms, this implies performing duplicate examinations on 5-10% of the survey sample (≥25 subjects). They should take place at various stages of the survey (beginning, half-way, end).
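To make the duplicate-examination rule above concrete, here is a minimal Python sketch (an assumption about how one might operationalise the figures quoted on this slide, not part of the source protocol; the function name and defaults are illustrative only):

```python
import math

def duplicate_exams(sample_size: int, fraction: float = 0.05, minimum: int = 25) -> int:
    """Duplicate examinations: 5-10% of the survey sample, but at least 25 subjects."""
    return max(math.ceil(sample_size * fraction), minimum)

# Example: a survey of 400 subjects
print(duplicate_exams(400))        # 25 (the minimum of 25 subjects applies at the 5% rate)
print(duplicate_exams(400, 0.10))  # 40 (at the 10% rate)
```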

Page 11:

TRAINING AND CALIBRATION OF EXAMINERS

1. Intensive training in the examination protocol and criteria, guided by gold-standard examiner(s).

2. Calibration exercise for key measures.

3. Identification of problems, clarification with respective examiners.

4. Final training session and meeting with interviewers before each wave of examinations (refresh knowledge, highlight key problematic areas).

5. Repeat examinations by the same examiner (single examiner) or by the gold-standard examiner (group of examiners).

Page 12:

TRAINING OF INTERVIEWERS

1. Familiarisation with the procedure and appropriate order of the clinical examination (gold-standard examiner).

2. Training in the administration of the questionnaire (explanation, instructions on the format and the administration of questions, practical exercises).

3. Final meeting with examiners before each wave of fieldwork (meet examiners, highlight key points, discuss issues raised during fieldwork in previous waves).

4. Re-training for interviewers who have not participated in the survey for a predefined period (e.g. 1 month).

Page 13:

Page 14:

ASSESSMENT OF REPRODUCIBILITY: METHODS

1. Use of master sheets.

2. Calculation of mean indices by examiner and of the size and direction of deviation from the gold-standard examiner.

3. Calculation of group means and 95% confidence limits.

4. Assessment of the percentage of agreement between examiner and gold-standard examiner.

5. Sensitivity and specificity estimations.

6. Dice's concordance index.

7. Kappa and weighted Kappa statistics.

Page 15:

Page 16:

DEVIATION FROM GOLD STANDARD EXAMINER

1. Establishment of an arbitrary cut-off point for acceptable deviation from the gold-standard examiner (e.g. ±0.5 dmft/DMFT).

2. Calculation of mean dmft/DMFT for the gold-standard examiner.

3. Calculation of mean dmft/DMFT for each examiner.

4. Estimation of the size and direction of deviation from the gold-standard examiner for each examiner, and comparison with the chosen level of acceptance.
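As an illustration of these steps, the following Python sketch (assumed, not from the source; all scores are hypothetical) compares each examiner's mean DMFT with the gold-standard examiner's mean against a ±0.5 cut-off:

```python
from statistics import mean

gold_scores = [2, 0, 5, 3, 1, 4, 0, 2, 6, 3]        # gold-standard DMFT per subject (hypothetical)
examiner_scores = {
    "Examiner A": [2, 0, 5, 4, 1, 4, 0, 2, 6, 3],    # hypothetical duplicate scores
    "Examiner B": [3, 1, 6, 4, 2, 5, 1, 3, 7, 4],
}

CUTOFF = 0.5                                         # acceptable deviation in mean DMFT
gold_mean = mean(gold_scores)

for name, scores in examiner_scores.items():
    deviation = mean(scores) - gold_mean             # sign gives direction of over-/under-scoring
    verdict = "acceptable" if abs(deviation) <= CUTOFF else "outside cut-off"
    print(f"{name}: mean = {mean(scores):.2f}, deviation = {deviation:+.2f} ({verdict})")
```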

Page 17:

GROUP MEAN AND 95% CONFIDENCE LIMITS

The basic concept is to identify the outliers, if any, whose mean scores fall outside the 95% confidence interval of the mean score for all examiners.

The calculation of the group mean score excludes the gold-standard examiner. The value of t varies according to the number of examiners.

The general formula for the 95% confidence limits is:

group mean ± t(0.05, df = n-1) × SD
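A minimal sketch of this check, assuming Python with SciPy for the t critical value (the examiner means are hypothetical, and the gold-standard examiner is already excluded from the group):

```python
from statistics import mean, stdev
from scipy.stats import t

examiner_means = {"A": 2.7, "B": 2.5, "C": 2.6, "D": 3.9, "E": 2.4}  # hypothetical mean DMFT

values = list(examiner_means.values())
n = len(values)
group_mean = mean(values)
sd = stdev(values)                           # sample standard deviation of examiner means
t_crit = t.ppf(1 - 0.05 / 2, df=n - 1)       # two-sided 5% critical value of Student's t
lower, upper = group_mean - t_crit * sd, group_mean + t_crit * sd

print(f"95% limits: {lower:.2f} to {upper:.2f}")
for name, m in examiner_means.items():
    status = "within limits" if lower <= m <= upper else "OUTLIER"
    print(f"Examiner {name}: mean {m} ({status})")
```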

Page 18:

PERCENTAGE OF AGREEMENT

Estimated as the exact number of agreements expressed as a percentage of the total.

Very simple.

Takes no account of where in the table the agreement was.

Some agreement is expected even by chance.

Lacks accuracy when the prevalence of the disease or condition is rather low.
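A minimal sketch of the percentage-of-agreement calculation (assumed, not from the source; the caries calls below are hypothetical):

```python
def percent_agreement(examiner: list, gold: list) -> float:
    """Exact agreements expressed as a percentage of all paired decisions."""
    agreements = sum(e == g for e, g in zip(examiner, gold))
    return 100 * agreements / len(gold)

gold     = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # 1 = carious, 0 = sound (gold standard)
examiner = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]   # the examiner's calls on the same 10 teeth
print(f"{percent_agreement(examiner, gold):.0f}% agreement")  # 80%
```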

Page 19:

SENSITIVITY AND SPECIFICITY

Sensitivity refers to the ability to correctly identify the true positive cases. It is the proportion of true positive cases which test positive.

Specificity refers to the ability to detect the true negative cases. It is the proportion of true negative cases which test negative.

Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP)

Both are affected by disease experience and treatment provision (e.g. caries experience and proportion restored).
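A minimal sketch of the two formulas above (assumed, not from the source; the calls are hypothetical, with the gold-standard examiner defining the true cases):

```python
def sensitivity_specificity(examiner: list, gold: list) -> tuple:
    tp = sum(e == 1 and g == 1 for e, g in zip(examiner, gold))  # true positives
    tn = sum(e == 0 and g == 0 for e, g in zip(examiner, gold))  # true negatives
    fp = sum(e == 1 and g == 0 for e, g in zip(examiner, gold))  # false positives
    fn = sum(e == 0 and g == 1 for e, g in zip(examiner, gold))  # false negatives
    return tp / (tp + fn), tn / (tn + fp)

gold     = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
examiner = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
se, sp = sensitivity_specificity(examiner, gold)
print(f"Sensitivity = {se:.2f}, Specificity = {sp:.2f}")  # 0.67 and 0.86
```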

Page 20:

DICE’S CONCORDANCE INDEX

Appropriate when only one outcome is the object of interest (e.g. decayed teeth).

Quick and easy.

Does NOT use all available data.

                Examiner A
                  +     -
Examiner B  +     a     b
            -     c     d

D = 2a / (2a + b + c)
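A minimal sketch of Dice's index for the 2x2 table above (assumed, not from the source; the calls are hypothetical):

```python
def dice_index(examiner_a: list, examiner_b: list) -> float:
    """D = 2a / (2a + b + c), where a = both positive, b and c = the discordant cells."""
    a = sum(x == 1 and y == 1 for x, y in zip(examiner_a, examiner_b))  # both positive
    b = sum(x == 0 and y == 1 for x, y in zip(examiner_a, examiner_b))  # only examiner B positive
    c = sum(x == 1 and y == 0 for x, y in zip(examiner_a, examiner_b))  # only examiner A positive
    return 2 * a / (2 * a + b + c)

exam_a = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]   # "decayed" calls by examiner A
exam_b = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]   # "decayed" calls by examiner B
print(f"D = {dice_index(exam_a, exam_b):.2f}")  # 0.75
```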

Page 21:

KAPPA (K) STATISTIC

Kappa (Cohen, 1960) is a measure of agreement that can be calculated between a pair of examiners (examiner and gold-standard examiner) and that takes chance agreement into account. It reflects the chance-corrected proportional agreement.

It may involve a comparison at the surface or the tooth level, or even on aggregate indices (e.g. DMF). It may also include all possible codes for a condition, as well as different groupings of data (flexibility in application).

Page 22:

KAPPA CALCULATION

                     Examiner 1
                  Sound   Caries   Total
Examiner 2 Sound    a       b       a+b
           Caries   c       d       c+d
           Total   a+c     b+d       n

K = (Po - Pe) / (1 - Pe)

Po = (a + d) / n

Pe = [(a + c) × (a + b) + (b + d) × (c + d)] / n²

Po reflects the proportion of observed agreement and Pe the proportion of agreement that could be expected by chance.
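A minimal sketch of the Kappa calculation from the 2x2 table above (assumed, not from the source; the cell counts are hypothetical):

```python
def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """a = both sound, b = Examiner 1 caries / Examiner 2 sound,
    c = Examiner 1 sound / Examiner 2 caries, d = both caries."""
    n = a + b + c + d
    po = (a + d) / n                                        # observed agreement
    pe = ((a + c) * (a + b) + (b + d) * (c + d)) / n ** 2   # chance-expected agreement
    return (po - pe) / (1 - pe)

print(round(cohen_kappa(a=40, b=5, c=7, d=48), 2))  # 0.76
```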

Page 23:

Kappa does NOT take into account the degree of disagreement. For ordinal variables it is preferable to use the weighted Kappa, which assigns weights to disagreements according to the magnitude of the discrepancy (the closer to the diagonal, the better).

Kappa and weighted Kappa represent the best approach to measuring variability; nevertheless, "statistics cannot provide a simple substitute to clinical judgement" (Altman, 1991).
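A minimal sketch of weighted Kappa with linear weights (assumed, not from the source; the weights w_ij = |i - j| / (k - 1) are a standard formulation, and the 3-category table is hypothetical):

```python
def weighted_kappa(table: list) -> float:
    """Weighted Kappa: disagreements are penalised by their distance from the diagonal."""
    k = len(table)                                      # number of ordinal categories
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    obs = exp = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)                    # linear disagreement weight
            obs += w * table[i][j] / n                  # weighted observed disagreement
            exp += w * row_tot[i] * col_tot[j] / n**2   # weighted chance disagreement
    return 1 - obs / exp

# Hypothetical counts: sound / enamel caries / dentine caries (rows = examiner, columns = gold standard)
table = [[30, 4, 1],
         [5, 20, 3],
         [1, 2, 14]]
print(round(weighted_kappa(table), 2))  # about 0.73
```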

Page 24:

KAPPA INTERPRETATION

Strength of agreement    Value of K
Poor                     < 0.20
Fair                     0.21 - 0.40
Moderate                 0.41 - 0.60
Good                     0.61 - 0.80
Very good                0.81 - 1.00

(Landis and Koch, 1977)
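A small helper (assumed, not from the source) that maps a Kappa value onto the strength-of-agreement labels in the table above:

```python
def agreement_strength(kappa: float) -> str:
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"

print(agreement_strength(0.76))  # Good
```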

Page 25:

CORRELATION

Correlation is an expression of how much two variables vary together; it does not reflect their proximity to 1:1 correspondence.

Correlation is a measure of the strength of the association between two variables, not of their agreement.

Consequently, correlation should be avoided in the analysis of a calibration exercise.
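A minimal illustration (assumed, not from the source) of why correlation is not agreement: an examiner who scores exactly 2 DMFT higher than the gold standard on every subject is perfectly correlated with it yet never agrees.

```python
from statistics import correlation  # Python 3.10+

gold     = [0, 1, 2, 3, 4, 5]
examiner = [2, 3, 4, 5, 6, 7]       # systematic over-scoring by +2

print(correlation(gold, examiner))                   # 1.0 (perfect correlation)
print(sum(g == e for g, e in zip(gold, examiner)))   # 0 exact agreements
```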

Page 26:

TRAINING AND CALIBRATION

KEY POINTS

Use the minimum number of examiners in surveys.

Carry out training and a calibration exercise at baseline, and repeat them at later stages.

Follow standardised procedures and agreed criteria.

Include a sufficient number of cases in calibration, so as to cover a wide range of diagnostic decisions.

Determine the key clinical variables and appropriate data groupings to be included in the calibration exercise.

Calculate and interpret Kappa scores.

Re-calibrate or exclude outliers.

Plan repeat examinations during the survey.