Post on 08-Aug-2020
Systematic reviews of diagnostic test accuracy
Karen R Steingart, MD, MPH
Madhukar Pai, MD, PhD
What is diagnostic test accuracy?
• Diagnosis
- Does this patient have this disease at this point in time?
• Test accuracy
- What proportion of those with the disease does the test detect? (sensitivity)
- What proportion of those without the disease get negative test results? (specificity)
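The two definitions above can be sketched in a few lines of Python. The 2x2 counts are invented for illustration and do not come from any study cited in these slides.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of people with the disease whom the test detects."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of people without the disease who test negative."""
    return tn / (tn + fp)


# Hypothetical 2x2 counts, invented for illustration only.
tp, fp, fn, tn = 90, 20, 10, 180

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 90 / (90 + 10)
print(f"specificity = {specificity(tn, fp):.2f}")  # 180 / (180 + 20)
```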
Test accuracy may not capture clinical impact
di Ruffano, BMJ 2012
Clinical impact of test results on diagnostic and treatment decisions, and eventually, patient outcomes
Test results → Change in physician's decisions → Correct treatment choices → Improved patient outcomes
“Improved accuracy is not always a necessary prerequisite for improving patient health, nor does it guarantee other downstream improvements” [di Ruffano et al. BMJ 2012;344:e686]
Accuracy vs Impact: Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure
Maisel et al, N Engl J Med. 2002 Jul 18;347(3):
Road map for diagnostic accuracy reviews
Avoid simple pooling of sens and spec; need to use HSROC or bivariate random effects models; do not use funnel plots for publication bias
Stata (metandi command) for bivariate random effects pooling + HSROC
Use PRISMA for reporting of SR
Pai M et al. Evid Based Med 2004;9:101-103
Key steps in a diagnostic test accuracy review
1. Framing focused questions
2. Searching for studies
3. Assessing study quality
4. Analyzing the data; undertaking meta-analyses
5. Drawing robust conclusions and informative presentation of results
1. Framing focused questions
Begin with a well-framed question
The objectives of the review
PICO: Population, Intervention, Comparison, Outcome
+ Study design
+ Purpose of the test/strategy
+ Reference standard
Richardson et al. ACP Journal Club 1995;A-12
PICO or PPPICPTR for systematic review of diagnostic test accuracy?
• Patients, Presentation, Prior tests
• Index test, Comparator tests
• Purpose: comparative question, role of test
• Target condition, Reference standard
2. Searching for studies
Sources of studies for diagnostic accuracy reviews
• MEDLINE, EMBASE, the Cochrane Register of Diagnostic Test Accuracy Studies (under development)
• Search related diagnostic test accuracy reviews (for example HTA database, DARE etc)
• Check references of relevant studies/reviews
• Use a highly sensitive (broad) search strategy
• Use a wide variety of search terms, both text words and database subject headings (MeSH terms)
• Routine use of search filters should generally be avoided
Bossuyt PM, Leeflang MM. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 0.4 [updated September 2008]. The Cochrane Collaboration, 2008
+ Influenza rapid tests: Search strategy
Influenza, Human [Mesh]
Influenza A virus [Mesh]
Influenza B virus [Mesh]
Influenza
Flu
grippe
Rapid test, rapid diagnos*, rapid diagnostic test*, point-of-care test*, antigen detection test*, antigen detection, rapid antigen test*, immunoassay*, immunochromatographic test*
Binax NOW, Directigen Flu, FluOIA, QuickVue Influenza, Rapide grippedetection Flu, SAS Influenza, TRU FLU, XPECT FLU, Zstat flu
Databases: MEDLINE via PubMed, EMBASE, Biosis and Web of Science
March 2010, updated December 2011
Chartrand C et al. Ann Intern Med 2012
The medical literature can be compared to a jungle. It is fast growing, full of deadwood, sprinkled with hidden treasure and infested with spiders and snakes. Morgan. Can Med Assoc J, 134, Jan 15, 1986
3. Assessing study quality
Sources of bias in diagnostic studies: 3 key issues
• Inclusion of right spectrum of patients
• Verification of patients
- choice of reference standard
- complete verification
• Independent assessment of index test and reference standard (blinding)
Effects of study design, A Rutjes, CMAJ 2006
The Lancet Infect Dis 2003
Case-control studies had a two-fold higher DOR than cross-sectional studies
QUADAS, 2003
QUADAS-2, 2011
Suggested displays – QUADAS-2
http://www.bris.ac.uk/quadas/
In general, diagnostic studies are poorly done and reported (contacting authors is helpful)
4. Analyzing the data; undertaking meta-analyses
Key steps
• Extract TP, FP, FN, and TN to determine paired estimates of sensitivity and specificity
• Visually examine results of individual studies
• Calculate overall summary estimates using HSROC/bivariate meta-analysis
• Look for and investigate possible reasons for heterogeneity
http://ims.cochrane.org/revman
Forest plot – diagnostic test accuracy review
One row is displayed for each study
Extracted data are presented as TP, FP, FN, TN
Data shown in the graph are also displayed numerically
Each study result is given a box for a point estimate
Horizontal line = confidence interval
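As a rough sketch of what sits behind each forest plot row, the snippet below turns extracted TP, FP, FN, TN counts into paired point estimates with confidence intervals. The study names and counts are hypothetical, and the Wilson score interval is only one common choice; the slides do not say which interval method the plotting software uses.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical studies: (name, TP, FP, FN, TN)
studies = [("Study A", 45, 5, 15, 135), ("Study B", 30, 2, 30, 98)]


def forest_rows(studies):
    """One row per study: paired sensitivity and specificity with intervals."""
    rows = []
    for name, tp, fp, fn, tn in studies:
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        rows.append((name, sens, wilson_ci(tp, tp + fn), spec, wilson_ci(tn, tn + fp)))
    return rows


for name, sens, sens_ci, spec, spec_ci in forest_rows(studies):
    print(f"{name}: sens {sens:.2f} ({sens_ci[0]:.2f}, {sens_ci[1]:.2f}); "
          f"spec {spec:.2f} ({spec_ci[0]:.2f}, {spec_ci[1]:.2f})")
```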
Steingart, PLoS Med 2011
Statistical models for meta-analysis of diagnostic studies
• Simple, separate pooling of sens and spec should not be done
• Two recommended approaches:
– hierarchical summary ROC model (HSROC, Gatsonis and Rutter 2001)
– bivariate regression of sensitivity and specificity (Bivariate, Reitsma 2005)
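For orientation, one commonly used way of writing the bivariate model is sketched below, with a binomial within-study likelihood and a bivariate normal distribution for the logit-transformed sensitivity and specificity of study i. The symbols (n_{1i}, n_{0i}, \mu_A, \mu_B, \sigma_A, \sigma_B, \sigma_{AB}) are notation assumed here and are not taken from the slides.

```latex
\begin{aligned}
\mathrm{TP}_i &\sim \mathrm{Binomial}(n_{1i}, \mathrm{sens}_i), \qquad
\mathrm{TN}_i \sim \mathrm{Binomial}(n_{0i}, \mathrm{spec}_i), \\
\begin{pmatrix} \operatorname{logit}(\mathrm{sens}_i) \\ \operatorname{logit}(\mathrm{spec}_i) \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}
\right)
\end{aligned}
```

Here n_{1i} and n_{0i} are the numbers of participants with and without the target condition in study i. The covariance term \sigma_{AB} lets sensitivity and specificity be correlated across studies, which separate pooling ignores.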
+ Influenza rapid tests
Sensitivity: 62.3% (57.9 – 66.6)
Specificity: 98.2% (97.5 – 98.7)
LR+: 34.5 (23.8 – 45.2)
LR-: 0.38 (0.34 – 0.43)
Chartrand C et al. Ann Intern Med 2012
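The likelihood ratios relate to sensitivity and specificity through LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec. The sketch below plugs in the rounded pooled point estimates reported above. The results land close to the reported ratios but need not match exactly, since published ratios are usually derived from the fitted model rather than from rounded percentages.

```python
def likelihood_ratios(sens: float, spec: float):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


# Rounded pooled estimates for influenza rapid tests, as shown on the slide.
lr_pos, lr_neg = likelihood_ratios(0.623, 0.982)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```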
Stata command, metandi
Stata output
Pooled sensitivity = 80.8% (95% CI 74.3, 86.0)
Pooled specificity = 99.3% (95% CI 97.1, 99.8)
Heterogeneity: very common in diagnostic SRs
• Refers to variation in results among studies
• May be caused by variation in
– test thresholds (unique to meta-analyses of diagnostic tests)
– prevalence of disease
– patient spectrum
– study quality
– chance variation
Variation due to threshold differences
• Explicit threshold differences
– studies have used different cut-off values to define positive test results
• Implicit threshold differences
– differences in observers
– differences in equipment
• Consequence: negative correlation arises between sensitivity and specificity
J Reitsma, Cochrane DTA Workshop, Amsterdam, Sept 2011
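A small simulation can illustrate why threshold differences trade sensitivity against specificity. It assumes, purely for illustration, that a continuous test result is normally distributed in people with and without the disease; the means and standard deviations are hypothetical.

```python
from statistics import NormalDist

# Assumed distributions of a continuous test result (hypothetical values).
diseased = NormalDist(mu=2.0, sigma=1.0)
non_diseased = NormalDist(mu=0.0, sigma=1.0)


def accuracy_at(threshold: float):
    """Sensitivity and specificity when results above the threshold count as positive."""
    sens = 1 - diseased.cdf(threshold)
    spec = non_diseased.cdf(threshold)
    return sens, spec


thresholds = [0.5, 1.0, 1.5]
results = [accuracy_at(t) for t in thresholds]
for t, (sens, spec) in zip(thresholds, results):
    print(f"cut-off {t:.1f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the cut-off lowers sensitivity and raises specificity, so studies of the same test that use different cut-offs produce pairs that move in opposite directions.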
Exploring heterogeneity
• Subgroup analysis
• Meta-regression analysis
Example: subgroup analysis
Chartrand C et al. Ann Intern Med 2012
Meta-regression
• Is a form of linear regression in which studies are the unit of analysis
• Aims to relate the size of effect to one or more characteristics of the studies involved
• DOR is the dependent variable
• Covariates that might be associated with the variability in DOR are the independent variables
• Tip: Specify covariates that you want to explore in advance
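The idea can be sketched with a deliberately simplified example: ln(DOR) is regressed on one binary study-level covariate using inverse-variance weights. This is a fixed-effect weighted least squares fit written for illustration, not a reproduction of the random-effects model that dedicated meta-regression software fits. The study counts and the covariate are hypothetical, and studies with zero cells would need a continuity correction that is not shown.

```python
import math


def ln_dor(tp, fp, fn, tn):
    """Natural log of the diagnostic odds ratio, (TP * TN) / (FP * FN)."""
    return math.log((tp * tn) / (fp * fn))


def var_ln_dor(tp, fp, fn, tn):
    """Approximate variance of ln(DOR): sum of reciprocal cell counts."""
    return 1 / tp + 1 / fp + 1 / fn + 1 / tn


def weighted_regression(xs, ys, ws):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical studies: (TP, FP, FN, TN, case_control), 1 = case-control design.
studies = [
    (80, 10, 20, 90, 1),
    (70, 8, 30, 92, 1),
    (60, 15, 40, 85, 0),
    (55, 20, 45, 80, 0),
]
xs = [s[4] for s in studies]
ys = [ln_dor(*s[:4]) for s in studies]
ws = [1 / var_ln_dor(*s[:4]) for s in studies]

intercept, slope = weighted_regression(xs, ys, ws)
# exp(slope) is the relative DOR: how many times higher the DOR is when x = 1.
print(f"relative DOR (case-control vs cross-sectional) = {math.exp(slope):.2f}")
```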
The threshold effect (-0.21) was significant (p = 0.01). This was also seen in the SROC plot,
Ling D et al. PLoS ONE 2008.
Determined using ‘Metareg’ command in Stata
Exploration of heterogeneity – urine LAM ELISA for TB
Minion J et al. ERJ 2011
Publication bias
• Formal assessment of publication bias using methods such as funnel plots or regression tests is not recommended for diagnostic test accuracy studies
5. Drawing robust conclusions and informative presentation of results
- summary of findings tables
Issues to discuss
• What are the consequences of using the test in terms of the numbers of TP, FP, FN, and TN?
• How applicable are the results?
• To what extent were the primary studies biased? If serious study limitations were identified, could these impact the results?
• What were the limitations of the SR itself?
• What are the implications for future research?
Steingart draft template
Some general limitations of diagnostic SRs
• Literature search strategies are imperfect and studies can be missed
• Publication bias is always a concern
• Poor quality studies or poorly reported studies
• Unexplained heterogeneity
• Not enough studies on clinical impact of tests
• Industry supported studies or COI of study authors
• COI of systematic reviewers
• Keeping up to date in rapidly evolving fields
Keeping systematic reviews updated!
2004
2007
2008
2012
Lateral flow urine lipoarabinomannan assay for detecting active tuberculosis in HIV-positive adults
Shah et al, The Cochrane Library 2016
Open access
http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD011420.pub2/full
Urine lateral-flow lipoarabinomannan (LF-LAM) Background - 1
• LAM: 17.5 kilodalton structural component of mycobacterial cell walls
• Detectable in urine of patients with TB
• Meta-analysis of ELISA-LAM*
• increased sensitivity in HIV-positive compared with HIV-negative patients
• Pooled sensitivity 56% (40-71%)
• Pooled specificity 95% (77-99%)
• Higher sensitivity with more severe immunosuppression
*Minion et al. ERJ 2011; Image: Se-Ho Park and Albert Bendelac. Nature 406, 788-792
LF-LAM Background – 2
• Urine LF-LAM is a new diagnostic test that may overcome limitations of other approaches
– Point of care, lateral flow format
– Fast (< 20 minutes)
– No equipment needed
– Low cost (~$3.50)
– Accuracy has varied across published studies
Objectives
• To assess the accuracy of LF-LAM for the diagnosis of active TB disease in HIV-positive adults who have signs and symptoms suggestive of TB (TB diagnosis).
• To assess the accuracy of LF-LAM as a screening test for active TB disease in HIV-positive adults irrespective of signs and symptoms suggestive of TB (TB screening).
Selection criteria
• Eligible study types included randomized controlled trials, cross-sectional studies, and cohort studies that determined LF-LAM accuracy for TB against a microbiological reference standard (culture or nucleic acid amplification test from any body site).
• 12 studies were identified:
– 6 for TB screening
– 6 for TB diagnosis
Data collection and analysis
• We determined accuracy of LF-LAM ... We determined accuracy of LF-LAM combined with sputum microscopy or Xpert® MTB/RIF. In addition, we explored the influence of CD4 count on the accuracy estimates.
Risk of bias
Shah Cochrane 2016
Forest plots of urine LAM sensitivity and specificity for TB diagnosis measured against a microbiological reference standard.
The studies are ordered by decreasing sensitivity
TP = True Positive; FP = False Positive; FN = False Negative; TN = True Negative.
Pooled estimates (95% CrI)
Analysis               Participants (number of studies)   Sensitivity     Specificity
LF-LAM, ALL            2313 (5 studies)                   45% (29, 63)    92% (80, 97)
Microscopy alone       1876 (4 studies)                   40% (27, 54)    95% (94, 97)
LAM alone              1876 (4 studies)                   38% (34, 42)    98% (93, 100)
LAM and microscopy*    1876 (4 studies)                   59% (47, 70)    92% (73, 97)
* Either test positive
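To see how an "either test positive" rule behaves in principle, the sketch below combines two tests under the assumption that their errors are conditionally independent given disease status. That assumption is made here for illustration only; the pooled combined estimates in the table were derived from the study data and need not agree with this calculation.

```python
def either_positive(sens_a: float, spec_a: float, sens_b: float, spec_b: float):
    """Accuracy of 'positive if either test is positive', assuming independent errors."""
    sens = 1 - (1 - sens_a) * (1 - sens_b)  # missed only if both tests miss
    spec = spec_a * spec_b                  # negative only if both are negative
    return sens, spec


# Point estimates for microscopy alone and LAM alone, taken from the table.
sens, spec = either_positive(0.40, 0.95, 0.38, 0.98)
print(f"combined sensitivity = {sens:.3f}, combined specificity = {spec:.3f}")
```

The rule raises sensitivity at some cost in specificity, which is the same direction of change as in the pooled results for LAM and microscopy.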
Plots of sensitivity and specificity of urine LAM for TB diagnosis stratified by CD4 count
Summary of Findings Table - 1
• Question: what is the diagnostic accuracy of LF-LAM for diagnosing TB in adults living with HIV?
• Participants: HIV-positive adults with symptoms of TB
• Index test: LF-LAM
• Role: a replacement test or test in combination with sputum smear microscopy or sputum Xpert® MTB/RIF
• Reference standard: microbiological (mainly mycobacterial culture)
Summary of Findings Table - 2
• Studies: cross-sectional
• Setting: inpatient and outpatient
• Limitations: the main limitations of the review were the use of a lower quality reference standard in most included studies, and the small number of studies and participants included in the analyses
• Pooled sensitivity: 45% (95% CrI: 29 to 63); pooled specificity: 92% (95% CrI: 80 to 97)
Summary of Findings Table - 3
Pooled sensitivity: 45% (29, 63)
Pooled specificity: 92% (80, 97)
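Pooled estimates like these are often translated into expected numbers of TP, FP, FN, and TN in a hypothetical cohort. The sketch below does this for 1000 people. The cohort size and the 30% prevalence are assumptions chosen for illustration and are not taken from the review.

```python
def expected_counts(n: int, prevalence: float, sens: float, spec: float):
    """Expected 2x2 counts when a test is applied to n people."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sens * diseased
    fn = diseased - tp
    tn = spec * healthy
    fp = healthy - tn
    return {"TP": round(tp), "FP": round(fp), "FN": round(fn), "TN": round(tn)}


# Pooled LF-LAM estimates from the slide; prevalence of 30% is hypothetical.
counts = expected_counts(n=1000, prevalence=0.30, sens=0.45, spec=0.92)
print(counts)
```

Under these assumptions more than half of the people with TB would be missed by the test used alone, which is one way to present the consequences of the pooled sensitivity.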
References and Tools
• Cochrane Diagnostic Test Accuracy Working Group http://srdta.cochrane.org/
- Cochrane Handbook for Diagnostic Test Accuracy Reviews
- Macaskill P. Analyzing and Presenting Results. Chapter 10. Cochrane Handbook for Diagnostic Test Accuracy Reviews
- http://training.cochrane.org/authors/dta-reviews
• Leeflang. Ann Intern Med. 2008;149:889-897
• Whiting PF et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011 Oct 18;155(8):529-36.
• www.tbevidence.org
• RevMan http://ims.cochrane.org/revman