EPIDEMIOLOGY 4 RELIABILITY
AND VALIDITY (TRAINING AND
CALIBRATION)
Dr. Seyed Ebrahim Jabarifar. Date: 1388 / 2010
Associate Professor, Department of Community Dentistry, Isfahan University of Medical Sciences
RELIABILITY AND VALIDITY OF DATA
Two main reasons for variability in scoring (WHO, 1997):
Difficulty in scoring the different levels of oral diseases,
particularly dental caries and periodontal diseases
Physical and psychological factors (fatigue, fluctuations in
interest in the study, variations in visual acuity and tactile
sense) that affect the judgement of examiners from time to
time and to different degrees
RELIABILITY AND VALIDITY OF DATA
What is the principal problem/ issue with the variability?
To decide whether examiners are sufficiently close to each
other in their interpretation and application of the clinical
criteria. In this sense, data from their samples can be pooled
together to provide area/district estimates, whose variances
reflect true inter-subject differences in oral health and not an
inflation due to examiner differences (Pine et al. 1997).
RELIABILITY AND VALIDITY OF DATA
Objectives of standardisation and calibration (WHO, 1997):
To ensure uniform interpretation, understanding and
application by all examiners of the codes and criteria for the
various diseases and conditions to be observed and recorded.
To ensure that each examiner can examine consistently.
RELIABILITY AND VALIDITY OF DATA
How can this problem / issue be tackled?
1. Training of examiners and interviewers
2. Calibration exercise
3. Repeat examinations
TRAINING EXERCISE
What do we mean by a training exercise?
The training exercise aims to teach the survey examiners, thoroughly
and intensively, the logistics of the examination protocol and the
agreed interpretation of the diagnostic criteria.
In practical terms, the full range of diagnostic situations
is presented and discussed in detail: a) on slides, b) on actual
subjects. It takes place before the survey and requires at least 2 days
of intensive work. It may be repeated at specific intervals during the
survey.
CALIBRATION EXERCISE
What do we mean by a calibration exercise?
The calibration exercise completes the training and reflects a
formal measure of how well the examiner can interpret the
criteria, compared to the "gold standard" set by the trainer.
It takes place before the survey and may be repeated
annually.
CALIBRATION EXERCISE
How does this practically happen?
Some subjects are examined by some (or even all) examiners and
by the gold-standard examiner and the data are compared.
The exercise is repeated annually, in order to ensure consistency in the
interpretation of criteria and familiarity with new measures.
The calibration exercise should include a sufficient number of
cases (≥20 subjects), on which a wide range of diagnostic
decisions have to be made (i.e. treated and untreated caries, as
well as caries-free subjects).
CALIBRATION EXERCISE
What is the action taken?
"Outlier" examiners and the specific areas of over- or under-scoring
are identified. The issue is discussed and thoroughly
clarified. A repeat calibration exercise should be undertaken. If
the result is repeatedly unsatisfactory, the outlier may be excluded
from the survey (practical difficulties).
NB: Ability to standardise clinical examination results is not a
measure of clinical skill (clarify this in advance).
REPEAT EXAMINATIONS
What do we mean by repeat examinations?
The repeat examinations can be carried out: a) by the same
examiner, aiming to monitor the intra-examiner diagnostic
consistency (single examiner), or b) by the gold-standard
examiner, aiming to ensure inter-examiner diagnostic
consistency (group of examiners).
In practical terms, this implies performing duplicate
examinations on 5-10% of the survey sample (≥ 25 subjects). It
should take place in various stages of the survey (beginning,
half-way, end).
TRAINING AND CALIBRATION OF EXAMINERS
1. Intensive training in the examination protocol and criteria,
guided by gold-standard examiner(s).
2. Calibration exercise for key measures.
3. Identification of problems, clarification with respective
examiners.
4. Final training session and meeting with interviewers before
each wave of examinations (refresh knowledge, highlight key
problematic areas).
5. Repeat examinations by examiner (single examiner) or by
gold standard examiner (group of examiners).
TRAINING OF INTERVIEWERS
1. Familiarisation with the procedure and appropriate order of clinical
examination (gold-standard examiner).
2. Training in the administration of the questionnaire
(explanation, instructions on the format and the administration
of questions, practical exercises).
3. Final meeting with examiners before each wave of fieldwork
(meet examiners, highlight key points, discuss issues raised
during fieldwork in previous waves).
4. Re-training for interviewers who have not participated in the
survey for a predefined period (e.g. 1 month).
ASSESSMENT OF REPRODUCIBILITY: METHODS
1. Use of master sheets.
2. Calculation of mean indices by examiner and the size and
direction of deviation from gold-standard examiner.
3. Calculation of group means and 95% confidence limits.
4. Assessment of percentage of agreement between examiner
and gold-standard examiner.
5. Sensitivity and Specificity estimations.
6. Dice’s concordance index.
7. Kappa and weighted Kappa statistic.
DEVIATION FROM GOLD STANDARD EXAMINER
1. Establishment of an arbitrary cut-off point for acceptable
deviation from the gold-standard examiner (e.g. ±0.5
dmft/DMFT).
2. Calculation of mean dmft/DMFT for the gold-standard
examiner.
3. Estimation of the size and direction of deviation from the
gold-standard examiner for each examiner, and comparison with
the chosen level of acceptance.
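The steps above can be sketched in a few lines of Python. This is only an illustration: the scores, the examiner names and the helper `deviation_report` are hypothetical, and the 0.5 cut-off is simply the example value quoted above.

```python
from statistics import mean

# Hypothetical DMFT scores recorded on the same eight subjects (illustrative only).
gold_standard = [2, 0, 4, 1, 3, 0, 5, 2]
examiners = {
    "examiner_1": [2, 0, 4, 1, 3, 1, 5, 2],
    "examiner_2": [3, 1, 5, 2, 4, 1, 6, 3],
}
CUT_OFF = 0.5  # arbitrary acceptable deviation in mean DMFT


def deviation_report(gold, scores_by_examiner, cut_off):
    """Return {examiner: (deviation, acceptable)} relative to the gold standard."""
    gold_mean = mean(gold)
    report = {}
    for name, scores in scores_by_examiner.items():
        deviation = mean(scores) - gold_mean  # the sign gives the direction
        report[name] = (deviation, abs(deviation) <= cut_off)
    return report


report = deviation_report(gold_standard, examiners, CUT_OFF)
for name, (deviation, acceptable) in report.items():
    print(f"{name}: deviation {deviation:+.3f}, acceptable: {acceptable}")
```

A positive deviation indicates over-scoring relative to the gold-standard examiner, a negative one under-scoring.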
GROUP MEAN AND 95% CONFIDENCE LIMITS
The basic concept is to identify the outliers, if any, whose
mean scores fall outside the 95% confidence interval of
the mean score for all examiners.
The calculation of the group mean score excludes the
gold-standard examiner. The value of t varies
according to the number of examiners.
The general formula for the 95% confidence limits is:
group mean ± t(0.05, df = n - 1) × sd    (n = number of examiners)
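A minimal sketch of this check follows, assuming that sd is the sample standard deviation of the examiners' mean scores and that n is the number of examiners. The function names and the mean DMFT values are hypothetical; the t values are the usual two-sided 5% critical values.

```python
from statistics import mean, stdev

# Two-sided 5% critical values of Student's t by degrees of freedom (standard table).
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_limits(examiner_means):
    """Return (lower, upper) = group mean -/+ t(0.05, df=n-1) * sd."""
    n = len(examiner_means)
    if n - 1 not in T_CRIT_05:
        raise ValueError("this sketch only tabulates t for 2 to 11 examiners")
    half_width = T_CRIT_05[n - 1] * stdev(examiner_means)
    group_mean = mean(examiner_means)
    return group_mean - half_width, group_mean + half_width


def find_outliers(means_by_examiner):
    """List the examiners whose mean score lies outside the limits."""
    lower, upper = confidence_limits(list(means_by_examiner.values()))
    return [name for name, value in means_by_examiner.items()
            if not lower <= value <= upper]


# Hypothetical mean DMFT per examiner (gold-standard examiner excluded).
mean_dmft = {
    "examiner_1": 2.0, "examiner_2": 2.1, "examiner_3": 2.0, "examiner_4": 2.2,
    "examiner_5": 2.1, "examiner_6": 2.0, "examiner_7": 2.1, "examiner_8": 2.2,
    "examiner_9": 2.1, "examiner_10": 3.6,
}
lower, upper = confidence_limits(list(mean_dmft.values()))
outliers = find_outliers(mean_dmft)
print(f"limits: {lower:.2f} to {upper:.2f}, outliers: {outliers}")
```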
PERCENTAGE OF AGREEMENT
Estimated as the exact number of agreements expressed
as a percentage of the total.
Very simple.
Takes no account of where in the table the agreement was.
Some agreement is expected even by chance.
Lack of accuracy when the prevalence of disease or
condition is rather low.
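The low-prevalence limitation is easy to see in a small sketch. The tooth codes below are hypothetical and `percentage_agreement` is an illustrative helper, not part of any survey protocol.

```python
def percentage_agreement(codes_a, codes_b):
    """Exact agreements expressed as a percentage of all paired scores."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both examiners must score the same, non-empty set")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)


# Hypothetical data: 20 teeth, only one of which is carious.
gold_standard = ["caries"] + ["sound"] * 19
examiner = ["sound"] * 20  # misses the single carious tooth

agreement = percentage_agreement(gold_standard, examiner)
print(f"agreement: {agreement:.1f}%")
```

Agreement is high here even though the examiner detected none of the disease, which is why percentage agreement alone can mislead when the condition is rare.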
SENSITIVITY AND SPECIFICITY
Sensitivity refers to the ability to correctly identify the true
positive cases. It is the proportion of true positive cases that
test positive.
Specificity refers to the ability to correctly identify the true negative
cases. It is the proportion of true negative cases that
test negative.
Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP)
Affected by disease experience and treatment provision (e.g.
caries experience and proportion restored).
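As a rough illustration, the two proportions can be computed against the gold-standard examiner as follows. The tooth codes and the function name `sensitivity_specificity` are hypothetical.

```python
def sensitivity_specificity(gold, examiner, positive="caries"):
    """Return (sensitivity, specificity), treating the gold standard as truth."""
    tp = fn = tn = fp = 0
    for truth, call in zip(gold, examiner):
        if truth == positive:
            if call == positive:
                tp += 1  # true positive
            else:
                fn += 1  # false negative
        else:
            if call == positive:
                fp += 1  # false positive
            else:
                tn += 1  # true negative
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: 4 carious and 6 sound teeth according to the gold standard.
gold_standard = ["caries"] * 4 + ["sound"] * 6
examiner = ["caries", "caries", "caries", "sound",            # one caries missed
            "caries", "sound", "sound", "sound", "sound", "sound"]  # one false alarm

sensitivity, specificity = sensitivity_specificity(gold_standard, examiner)
print(f"sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```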
DICE’S CONCORDANCE INDEX
Appropriate when only one outcome is the object of interest (e.g. decayed teeth).
Quick and easy.
Does NOT use all available data.
D = 2a / (2a + b + c)

                  Examiner B
Examiner A      +       -
+               a       b
-               c       d
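A small sketch of the index D = 2a / (2a + b + c), where a is the number of units both examiners score positive and b and c are the two kinds of disagreement. The counts are hypothetical and `dice_concordance` is an illustrative name.

```python
def dice_concordance(a, b, c):
    """Dice's index: 2a / (2a + b + c). Joint negatives (d) are not used."""
    denominator = 2 * a + b + c
    if denominator == 0:
        raise ValueError("no positive scores from either examiner")
    return 2 * a / denominator


# Hypothetical counts: 8 teeth scored decayed by both, 2 by A only, 2 by B only.
d_index = dice_concordance(a=8, b=2, c=2)
print(f"D = {d_index:.2f}")
```

Because d never enters the calculation, the index ignores agreement on sound teeth, which is the sense in which it does not use all available data.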
KAPPA (K) STATISTIC
Kappa (Cohen, 1960) is a measure of agreement that can be
calculated between a pair of examiners (examiner and gold-
standard examiner) that takes chance agreement into account.
It reflects the chance corrected proportional agreement.
It may involve a comparison on a surface or on a tooth level,
or even on aggregate indices (e.g. DMF). It may also include
all possible codes for a condition, as well as different
groupings of data (flexibility in application).
KAPPA CALCULATION
                      Examiner 1
Examiner 2    Sound    Caries    Total
Sound         a        b         a+b
Caries        c        d         c+d
Total         a+c      b+d       n

K = (Po - Pe) / (1 - Pe)
Po = (a + d) / n
Pe = [(a + c) × (a + b) + (b + d) × (c + d)] / n²
Po reflects the proportion of observed agreement and Pe the
proportion of agreement that could be expected by chance.
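A minimal sketch of Cohen's Kappa for a 2×2 table, using K = (Po - Pe) / (1 - Pe) with the cell labels a, b, c, d from the table. The counts are hypothetical and `cohen_kappa` is an illustrative name.

```python
def cohen_kappa(a, b, c, d):
    """Chance-corrected agreement for a 2x2 table with cells a, b, c, d."""
    n = a + b + c + d
    p_o = (a + d) / n                                       # observed agreement
    p_e = ((a + c) * (a + b) + (b + d) * (c + d)) / n ** 2  # chance agreement
    if p_e == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical counts: 40 sound/sound, 50 caries/caries, 5 + 5 disagreements.
kappa = cohen_kappa(a=40, b=5, c=5, d=50)
print(f"Kappa = {kappa:.3f}")
```

Here observed agreement is 0.90 but Kappa is lower, because about half of that agreement would be expected by chance alone.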
Kappa does NOT take into account the degree of
disagreement. In ordinal variables, it is preferable to use
the weighted Kappa, which provides weights to
disagreements according to the magnitude of discrepancy
(the closer to the diagonal, the better).
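The sketch below shows one common form of weighted Kappa. It assumes linear disagreement weights, |i - j| / (k - 1) for k ordered categories; the source does not say which weighting scheme is intended. The 3×3 table of counts is hypothetical.

```python
def weighted_kappa(table):
    """Weighted Kappa for a square table of counts (rows: one examiner,
    columns: the other), using linear disagreement weights (an assumption)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the corners
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n ** 2
    return 1.0 - observed / expected


# Hypothetical ordinal scores (e.g. three severity grades) for 60 subjects.
scores = [
    [20, 5, 0],
    [3, 15, 2],
    [0, 4, 11],
]
kappa_w = weighted_kappa(scores)
print(f"weighted Kappa = {kappa_w:.3f}")
```

With only two categories every disagreement has weight 1, so this reduces to the unweighted Kappa.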
Kappa and weighted Kappa represent the best approach to
measuring variability, but "statistics cannot provide a simple
substitute to clinical judgement" (Altman, 1991).
KAPPA INTERPRETATION
Strength of agreement    Value of K
Poor                     <0.20
Fair                     0.21-0.40
Moderate                 0.41-0.60
Good                     0.61-0.80
Very good                0.81-1.00
Landis and Koch (1977)
CORRELATION
Correlation is an expression of how much two variables
vary together; it does not reflect their proximity to 1:1
correspondence.
Correlation is a measure of the strength of the association
between two variables, not of their agreement.
Consequently:
Correlation should be avoided for the analysis of
calibration exercise data.
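A short illustration of the point, using hypothetical scores: an examiner who systematically over-scores by 2 is perfectly correlated with the gold standard, yet never agrees with it. `pearson_r` is a hand-written helper for this sketch.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical DMFT scores: the examiner always scores 2 higher.
gold_standard = [0, 1, 2, 3, 4, 5]
examiner = [score + 2 for score in gold_standard]

r = pearson_r(gold_standard, examiner)
exact_agreements = sum(g == e for g, e in zip(gold_standard, examiner))
print(f"r = {r:.2f}, exact agreements = {exact_agreements} of {len(gold_standard)}")
```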
TRAINING AND CALIBRATION
KEY POINTS
Use the minimum number of examiners in surveys.
Carry out the training and calibration exercise at baseline and repeat it at
later stages.
Follow standardised procedures and agreed criteria.
Include a sufficient number of cases in calibration, so as
to cover a wide range of diagnostic decisions.
Determine the key clinical variables and appropriate data
groupings to be included in the calibration exercise.
Calculate and interpret Kappa scores.
Re-calibrate and, if necessary, exclude outliers.
Plan repeat examinations during the survey.