QC test

Quality control arrives in echo lab.... November 2013 www.EchoQC.com

Transcript of QC test

Page 1: QC test

Quality control arrives in echo lab....

November 2013 www.EchoQC.com

Page 2: QC test

Metrics for image analysis – validity and reliability

Validity (accuracy)

External gold standard.

Reliability

Focus on precision (i.e. control of error), which is influenced by random error and bias.

But also influenced by true variance: the tool has to be sufficiently sensitive to pick up small differences.

For much of what we do, reliability may be more important than accuracy (a 5% variance in EF between readers or reads may be a greater problem than if EF consistently underestimates by 5% relative to CMR).

Page 3: QC test

The goal ...and defining quality

One test (the right test)

In the right patient

Done and interpreted the right way

With the right impact on patient management

Douglas P. Achieving Quality in CV Imaging II. JACC-Imaging 2009

Page 4: QC test

Quality control arrives in echo lab....

Page 5: QC test

Why QC of interpretation

Much of echocardiography interpretation is based upon complex multiparametric assessments.

No hierarchy of individual parameters is provided by the guidelines. Consequently, patients with discordant findings may be readily classified differently by different observers.

Prominent examples of these multiparametric evaluations include assessment of regurgitant heart valves and assessment of diastolic dysfunction. Inter- and intra-observer variability in measuring EF is a further concern (the 95% confidence interval of EF from repeat 2D echos is >10%).

This section summarizes four different studies that audit observer variability and show how it might be rectified.

Page 6: QC test

Readily attainable

Targets

Inter reader variability

Intra reader variability

Concordance sessions

Serial testing for variability

Study review process

(1% studies)

Accuracy

Concordance with:

Catheterization

Cardiac MRI

Internet based

Standard image set

Management of paranoia

IT challenges

To date, we have completed studies on concordance in evaluation of EF, RV function, aortic regurgitation, and LV diastolic function.

In progress: Aortic stenosis, Tricuspid regurgitation, Pericardial effusion.

We highly recommend that you adopt this process in your lab. Remember that statistical power is dependent on the number of readers.

Unfortunately, for young investigators, our advice is that this may not be a good research topic - this work has been difficult to publish.

Page 7: QC test

Complete diastolic evaluation (transmitral flow, left atrial volume, TDI, pulmonary venous flow, mitral flow propagation, LV images) was obtained in 20 consecutive patients and interpreted by 14 experts in 8 countries (280 case reads). Each investigator was asked to interpret diastolic class and LV filling pressure. BNP was drawn on the same day to corroborate filling pressures.

This is a great example of the impact of lack of a hierarchy on how experts come to different conclusions from the same data.

Page 8: QC test

Recognition of raised filling pressure: complete agreement between all readers was obtained in 10/20, sensitivity 83%, specificity 88%, kappa 0.71 (range 0.60-0.80).

Diagnosis of diastolic stages: correct in 71-95%, lowest values obtained for normal and pseudonormal filling. Kappa for diastolic class was 0.68 (range 0.54-0.86).

Variations appeared to be attributable to differences in weighting of conflicting observations.
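The kappa values quoted above measure agreement beyond chance. As a minimal sketch, assuming a pairwise Cohen's kappa (the slides do not say which kappa variant was used), agreement between readers could be computed as below. The function names and the tiny data set are hypothetical and for illustration only.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two readers' categorical labels on the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("both readers must grade the same non-empty case list")
    n = len(a)
    # Observed agreement: fraction of cases with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each reader's marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (observed - expected) / (1 - expected)


def pairwise_kappas(reads):
    """reads maps reader name -> list of labels; returns kappa per reader pair."""
    return {
        (r1, r2): cohen_kappa(reads[r1], reads[r2])
        for r1, r2 in combinations(sorted(reads), 2)
    }


if __name__ == "__main__":
    # Hypothetical reads of filling pressure on four cases (not study data).
    reads = {
        "reader_a": ["raised", "normal", "raised", "normal"],
        "reader_b": ["raised", "normal", "normal", "normal"],
    }
    for pair, kappa in pairwise_kappas(reads).items():
        print(pair, round(kappa, 2))
```

Reporting the minimum and maximum of the pairwise values gives a range of the kind quoted on this slide, although the study may have derived its range differently.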

Page 9: QC test

Moderate Severe

This is a typical case in which incongruent parameters lead to inter-observer variability: jet width/LV outflow tract ratio and pressure half-time (PHT) indicate moderate AR, whereas the dilated left ventricle and holo-diastolic flow reversal indicate severe AR.

We sought to create a consensus document to improve accuracy and concordance.

Page 10: QC test

Seventeen level 3 readers graded 20 randomly selected patients with AR. At baseline, no uniform approach was used to combine the parameters, which contributed to interobserver variability (kappa 0.5).

A consensus strategy was formulated and validated against CMR in a separate group of 80 patients. Readers were recalibrated (same cases).

A consensus strategy to categorize AR severity was developed in which the LV volume took precedence over the other parameters and was used to differentiate chronic severe AR from less severe categories.

Consensus schema for hierarchical grouping of key echocardiographic parameters in chronic AR. The key parameters are divided into 2 hierarchical groups: a diagnostic parameter (LV size) and specific parameters (indexes of regurgitant volume). It should be noted that the LV size/volume criteria might not be valid for acute AR or LV dilation from other causes.

Diagnostic parameter: LV size (volume/index)

Specific parameters: vena contracta width, holo-diastolic flow reversal, jet width to LVOT ratio

Groupings shown in the schema:

Dilated LV with one or more specific parameters in the severe range

Normal LV size with one or more specific parameters in the moderate range

Normal LV size with one or more specific parameters in the severe range

Grades shown in the schema: chronic severe AR, chronic moderate AR, chronic mild AR
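The precedence of LV size over the specific parameters can be sketched in code. This is an illustration only, not the published algorithm: it encodes just the one rule stated in the text (a dilated LV together with at least one specific parameter in the severe range indicates chronic severe AR) and deliberately does not split the remaining cases into moderate and mild. The function name, parameter keys, and grade strings are hypothetical.

```python
SEVERE = "severe"


def grade_chronic_ar(lv_dilated, specific_grades):
    """Apply the LV-size-first rule to one chronic AR case.

    lv_dilated: True if LV size (volume/index) is in the dilated range.
    specific_grades: dict mapping a specific parameter name to the grade
        that parameter suggests on its own ("mild", "moderate" or "severe").

    Assumes chronic AR with no other cause of LV dilation, because the
    LV size criterion may not be valid otherwise.
    """
    if not specific_grades:
        raise ValueError("at least one specific parameter is required")
    any_severe = any(grade == SEVERE for grade in specific_grades.values())
    if lv_dilated and any_severe:
        return "chronic severe AR"
    # Assumption: everything else is reported only as "less than severe";
    # the moderate/mild split is not encoded in this sketch.
    return "less than severe AR"


if __name__ == "__main__":
    # Hypothetical discordant case: one parameter severe, one moderate.
    case = {
        "holo_diastolic_flow_reversal": "severe",
        "jet_width_to_lvot_ratio": "moderate",
    }
    print(grade_chronic_ar(lv_dilated=True, specific_grades=case))
    print(grade_chronic_ar(lv_dilated=False, specific_grades=case))
```

Writing the hierarchy as an explicit rule means every reader resolves a discordant case the same way, which is the point of the consensus document.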

Page 11: QC test

Recalibration with the consensus strategy improved concordance (kappa rose from 0.5 to 0.7).

The new strategy also improved the accuracy relative to CMR, evidenced by full agreement on severe AR between the consensus document-based grading and AR severity defined by CMR in the separate validation group of 80 patients.

Figure panels: overall agreement, agreement on mild AR, agreement on moderate AR, agreement on severe AR.

Page 12: QC test

31 readers provided EF estimates for 30 echos with a spectrum of EF, image quality, and contexts in patients undergoing CMR within 48 hours.

Participants received their own case-by-case variance from CMR EF, and the 10 cases with the largest reader variability were discussed along with corresponding CMR images. Self-directed learning was undertaken by side-by-side review of echo and CMR images.

Two months later, 20 new cases were shown to the same 31 readers, using the same methodology.

Page 13: QC test

Baseline interobserver variability of ±0.120 improved to ±0.097.

EF misclassification (defined as an echo EF more than 0.05 from the CMR EF) was reduced from 56% to 47% (p<.001).

The absolute difference between CMR and echo EF decreased across all cases and all readers (from 0.07±0.01 to 0.06±0.01, p=.0001), most prominently for the readers with lower baseline accuracy.

A combined physician-sonographer EF estimate improved the precision of EF determination by 25% compared with individual reads.
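The misclassification rate and absolute difference reported here are simple to compute once each reader's EF estimates are paired with the CMR values. The sketch below assumes EF is expressed as a fraction and that a read counts as misclassified when it differs from CMR by more than 0.05. The function names and sample values are hypothetical, not study data.

```python
from statistics import mean


def _paired(echo_ef, cmr_ef):
    """Pair echo and CMR EF values case by case, checking the lengths match."""
    if len(echo_ef) != len(cmr_ef) or not echo_ef:
        raise ValueError("echo and CMR lists must be non-empty and equal length")
    return list(zip(echo_ef, cmr_ef))


def misclassification_rate(echo_ef, cmr_ef, tolerance=0.05):
    """Fraction of reads whose echo EF is more than `tolerance` from CMR EF."""
    pairs = _paired(echo_ef, cmr_ef)
    return sum(abs(e - c) > tolerance for e, c in pairs) / len(pairs)


def mean_absolute_difference(echo_ef, cmr_ef):
    """Mean of |echo EF - CMR EF| over all reads."""
    return mean(abs(e - c) for e, c in _paired(echo_ef, cmr_ef))


if __name__ == "__main__":
    # Hypothetical reads by one reader on three cases.
    echo = [0.62, 0.40, 0.33]
    cmr = [0.55, 0.42, 0.30]
    print(misclassification_rate(echo, cmr))
    print(mean_absolute_difference(echo, cmr))
```

Running the same two functions on the baseline and follow-up reads of each reader gives the per-reader feedback described on the previous slide.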

Figure: echo visual EF pre-intervention (%) plotted against CMR EF (%), and the CMR minus echo visual EF difference (%) plotted against the average of CMR and echo visual EF (%). Reported values: bias 2.4% with LOA 12.5%, and bias 1.7% with LOA 9%.
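The bias values quoted for the CMR-versus-echo comparison are the kind of summary a Bland-Altman analysis produces. As a minimal sketch, assuming the conventional definition (bias is the mean difference, and the limits of agreement are the bias plus or minus 1.96 standard deviations of the differences), the calculation looks like this. The function name and sample values are hypothetical, and the study may have used a different convention.

```python
from statistics import mean, stdev


def bland_altman(reference, test):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of (reference - test); the limits of agreement are
    bias -/+ 1.96 sample standard deviations of those differences.
    """
    if len(reference) != len(test) or len(reference) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [r - t for r, t in zip(reference, test)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    # Hypothetical EF values in percent (not study data).
    cmr_ef = [60, 50, 40, 30]
    echo_ef = [58, 47, 39, 28]
    bias, lower, upper = bland_altman(cmr_ef, echo_ef)
    print(f"bias {bias:.1f}%, limits of agreement {lower:.1f}% to {upper:.1f}%")
```

A small bias with wide limits describes a method that is accurate on average but imprecise, which is the reliability problem raised at the start of the talk.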

Page 14: QC test

In this study, 15 readers evaluated RV function in 12 patients (360 readings), in order to define the accuracy and interobserver concordance of qualitative and quantitative RV echo vs CMR.

Accurate echocardiographic assessment of RV size and systolic function is challenging.

The use of quantitative measurements increased accuracy and inter-reader agreement compared to qualitative assessment alone, with the mean kappa increasing from 0.28 to 0.55 across all levels of grading, and intra-class correlation coefficient increasing from 0.63 to 0.76. 

Findings support the use of quantitation to improve accuracy and reliability, especially in distinction of normal and abnormal.
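For continuous measurements, agreement between readers is summarized by the intra-class correlation coefficient rather than kappa. The slides do not state which ICC form was used, so the sketch below assumes the simplest one, the one-way random-effects ICC for single ratings, often written ICC(1,1). The function name and data are hypothetical.

```python
from statistics import mean


def icc_one_way(ratings):
    """One-way random-effects ICC for single ratings, ICC(1,1).

    ratings: list of rows, one row per patient, each row holding the
    measurements made by the k readers for that patient.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two patients")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every patient needs the same number (>= 2) of reads")
    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-patient and within-patient mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Hypothetical measurements: three patients, two readers each.
    print(icc_one_way([[1, 2], [2, 3], [3, 4]]))
```

Values near 1 mean that most of the spread in the measurements comes from real differences between patients rather than from disagreement between readers.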

Page 15: QC test

Figure (not reproduced); p-value annotations: p<.0001, p=.38, p<.0001, p=.48, p<.0001, p=.69, p=.51, p=.02, p<.0001, p=.08, p=.81.