QUANTIFYING THE IMPACT OF DIFFERENT ......Biostatistics: the Bare Essentials, 2008) Slide adapted...

QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDICTORS ON THE PERFORMANCE OF A PROGNOSTIC MODEL Gary Collins, Emmanuel Ogundimu, Jonathan Cook, Yannick Le Manach, Doug Altman Centre for Statistics in Medicine University of Oxford 20-July-2016 [email protected]


Page 1

QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDICTORS ON THE PERFORMANCE OF A PROGNOSTIC MODEL

Gary Collins, Emmanuel Ogundimu, Jonathan Cook,

Yannick Le Manach, Doug Altman

Centre for Statistics in Medicine, University of Oxford

20-July-2016


Page 2

Outline

Existing guidance

What’s done in practice?

Brief overview of the study sample & simulation set-up

Findings & Discussion

Page 3

Basis of this presentation

Page 4

Not a new idea…

Page 5

It’s all in the title… (1994-2006)

1. Problems in dichotomizing continuous variables (Altman 1994)
2. Dangers of using "optimal" cutpoints in the evaluation of prognostic factors (Altman et al 1994)
3. How bad is categorization? (Weinberg 1995)
4. Seven reasons why you should NOT categorize continuous data (Dinero 1996)
5. Breaking Up is Hard to Do: The Heartbreak of Dichotomizing Continuous Data (Streiner 2002)
6. Negative consequences of dichotomizing continuous predictor variables (Irwin & McClelland 2003)
7. Why carve up your continuous data? (Owen 2005)
8. Chopped liver? OK. Chopped data? Not OK (Butts & Ng 2005)
9. Categorizing continuous variables resulted in different predictors in a prognostic model for nonspecific neck pain (Schellingerhout et al 2006)

Page 6

It’s all in the title… (2006-2014)

10. Dichotomizing continuous predictors in multiple regression: a bad idea (Royston et al 2006)
11. The cost of dichotomising continuous variables (Altman & Royston 2006)
12. Leave 'em alone - why continuous variables should be analyzed as such (van Walraven & Hart 2008)
13. Dichotomization of continuous data - a pitfall in prognostic factor studies (Metze 2008)
14. Analysis by categorizing or dichotomizing continuous variables is inadvisable: an example from the natural history of unruptured aneurysms (Naggara et al 2011)
15. Against quantiles: categorization of continuous variables in epidemiologic research, and its discontents (Bennette & Vickers 2012)
16. Dichotomizing continuous variables in statistical analysis: a practice to avoid (Dawson & Weiss 2012)
17. The danger of dichotomizing continuous variables: A visualization (Kuss 2013)
18. The “anathema” of arbitrary categorization of continuous predictors (Vintzileos et al 2014)
19. Ophthalmic statistics note: the perils of dichotomising continuous variables (Cumberland et al 2014)

Page 7

[Figure: a prognostic factor (PF) axis with points labelled A, B and C and a single cut-point dividing ‘PF not present (low risk)’ from ‘PF present (high risk)’, annotated ‘Biologically implausible’]

“Convoluted Reasoning and Anti-intellectual Pomposity”

“C.R.A.P”

(Norman & Streiner; Biostatistics: the Bare Essentials, 2008) Slide adapted from Michael Babyak (‘Modeling with Observational Data’)

Page 8

Still, what happens in practice…?

Breast cancer models (Altman 2009)
– Categorised some/all: 34/53 (64%)

Diabetes models (Collins et al 2011)
– Categorised some/all: 21/43 (49%)

General medical journals (Bouwmeester et al 2012)
– Categorised: 30/64 (47%)
– Dichotomised: 21/64 (33%)

Cancer models (Mallett et al 2010)
– All categorised/dichotomised: 24/47 (51%)

Page 9

Aim of the study

Investigate the impact of different approaches for handling continuous predictors on the
– apparent performance (same data)
– validation performance (different data; geographical validation)

Investigate the influence of sample size on the approach for handling continuous predictors

Page 10

Sample characteristics (THIN: The Health Improvement Network)

Development data: 80,800 CVD events; 7721 hip fractures

Validation data: 4688 CVD events; 565 hip fractures

Page 11

Models

Cox models to predict
– 10-year risk of CVD (men & women)
– 10-year risk of hip fracture (women only)

CVD model contained 7 predictors
– Age, sex, family history, cholesterol, SBP, BMI, hypertension

Hip fracture model contained 5 predictors
– Age, BMI, Townsend score, asthma, antidepressants

Page 12

Resampling strategy

MODEL DEVELOPMENT
– To fix the number of events in each sample at 25, 50, 100 and 2000 events, samples were drawn separately from those with and those without the event
– 200 samples were randomly drawn (with replacement)

MODEL VALIDATION
– All available data were used
• CVD: n=110,934 (4688 CVD events)
• Hip fracture: n=61,563 (565 hip fractures)
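A minimal Python sketch of the development-sample resampling described above, for illustration only (it is not the study's code). The `(time, event)` record layout, the function name `draw_fixed_event_samples` and the seed are hypothetical, and sizing the non-event draw so that the full-data event proportion is preserved is an assumption: the slide does not say how many non-events were drawn.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record layout: (follow-up time, event indicator 0/1)
Record = Tuple[float, int]


def draw_fixed_event_samples(
    records: Sequence[Record],
    n_events: int,
    n_samples: int = 200,
    seed: int = 2016,
) -> List[List[Record]]:
    """Draw bootstrap samples that each contain exactly `n_events` events.

    Events and non-events are sampled separately, with replacement.
    ASSUMPTION: the number of non-events keeps the event proportion of
    the full data; the slide does not state how this was chosen.
    """
    rng = random.Random(seed)
    events = [r for r in records if r[1] == 1]
    non_events = [r for r in records if r[1] == 0]
    if not events:
        raise ValueError("no events available to sample from")
    # Scale the non-event draw so the sample keeps the full-data event rate.
    n_non_events = round(n_events * len(non_events) / len(events))
    samples = []
    for _ in range(n_samples):
        sample = rng.choices(events, k=n_events) + rng.choices(non_events, k=n_non_events)
        rng.shuffle(sample)  # mix events and non-events
        samples.append(sample)
    return samples
```

With 100 events and 900 non-events in the full data and `n_events=25`, for instance, each sample contains 25 events and 225 non-events. Sampling the two groups separately is what pins the event count exactly; a plain bootstrap of all records would let it vary from sample to sample.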

Page 13

Approaches considered

Dichotomised at the
– median predictor value
– ‘optimal’ cut-point based on the logrank test
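The two dichotomisation rules can be sketched in plain Python. `median_split`, `logrank_chi2` and `optimal_cutpoint` are hypothetical names; restricting candidate cut-points to values between the 10th and 90th percentiles is an assumed convention (to avoid very small groups), not something the slide specifies. The logrank statistic uses the usual hypergeometric variance.

```python
import statistics
from typing import List, Sequence, Tuple


def median_split(x: Sequence[float]) -> List[int]:
    """Dichotomise at the sample median: 1 if above the median, else 0."""
    m = statistics.median(x)
    return [1 if v > m else 0 for v in x]


def logrank_chi2(times: Sequence[float], events: Sequence[int], group: Sequence[int]) -> float:
    """Two-group logrank chi-square statistic (1 df); group is coded 0/1."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for u in event_times:
        at_risk = [g for t, g in zip(times, group) if t >= u]
        n = len(at_risk)
        n1 = sum(at_risk)  # number at risk in group 1
        d = sum(1 for t, e in zip(times, events) if e and t == u)
        d1 = sum(1 for t, e, g in zip(times, events, group) if e and t == u and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0


def optimal_cutpoint(
    x: Sequence[float], times: Sequence[float], events: Sequence[int]
) -> Tuple[float, float]:
    """Return (cut-point, chi-square) that maximises the logrank statistic.

    ASSUMPTION: candidates are observed values between the 10th and 90th
    percentiles of x.
    """
    deciles = statistics.quantiles(x, n=10, method="inclusive")
    lo, hi = deciles[0], deciles[-1]
    best = (float("nan"), -1.0)
    for c in sorted(set(x)):
        if not lo <= c <= hi:
            continue
        group = [1 if v > c else 0 for v in x]
        stat = logrank_chi2(times, events, group)
        if stat > best[1]:
            best = (c, stat)
    return best
```

Because `optimal_cutpoint` picks whichever split maximises the statistic in the development data, it is prone to fitting noise, which is consistent with the large gap between apparent and validation performance reported for this approach in the results.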

Categorised into
– 3 groups (using tertile predictor values)
– 4 groups (using quartile predictor values)
– 5 groups (using quintile predictor values)
– 5-year age categories
– 10-year age categories
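Categorisation by quantile groups and by fixed-width age bands might look like the following. The function names are hypothetical, and using `statistics.quantiles(..., method="inclusive")` with values equal to a cut-point assigned to the upper group is an assumption; other quantile definitions shift the group boundaries slightly.

```python
import bisect
import statistics
from typing import List, Sequence


def quantile_groups(x: Sequence[float], k: int) -> List[int]:
    """Assign each value to one of k groups (0..k-1) using sample quantiles.

    k=3, 4 and 5 give tertile, quartile and quintile groups. Values equal
    to a cut-point go to the upper group (bisect_right), so ties near a
    cut-point can make group sizes unequal.
    """
    cuts = statistics.quantiles(x, n=k, method="inclusive")
    return [bisect.bisect_right(cuts, v) for v in x]


def age_band(age: float, width: int) -> int:
    """Lower bound of the fixed-width age band containing `age`."""
    return int(age // width) * width
```

For example, `age_band(47, 5)` gives the 45-49 band and `age_band(47, 10)` the 40-49 band. Either way, every patient within a group receives the same estimated effect, however far apart their actual values are.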

Linear relationship

Nonlinear relationship
– fractional polynomials (FP2; 4 degrees of freedom per predictor)
– restricted cubic splines (3 knots)
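The two nonlinear approaches replace a single linear term with a small set of derived terms. The sketch below builds the FP2 terms (two powers from the conventional set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where 0 means log x and a repeated power p gives x^p and x^p log x) and the single nonlinear term of a 3-knot restricted cubic spline. Knots at the 10th, 50th and 90th percentiles and the division by (t3 - t1)^2 are assumed conventions following Harrell's approach; all names are hypothetical, and FP terms require x > 0.

```python
import itertools
import math
import statistics
from typing import Sequence, Tuple

# Conventional FP power set; power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

# All 36 candidate FP2 power pairs (28 distinct pairs + 8 repeated powers).
FP2_PAIRS = list(itertools.combinations_with_replacement(FP_POWERS, 2))


def fp_term(x: float, p: float) -> float:
    """Single FP term: log(x) for p == 0, otherwise x ** p (x must be > 0)."""
    return math.log(x) if p == 0 else x ** p


def fp2_basis(x: float, p1: float, p2: float) -> Tuple[float, float]:
    """The two FP2 terms; a repeated power p gives x^p and x^p * log(x)."""
    if p1 == p2:
        return fp_term(x, p1), fp_term(x, p1) * math.log(x)
    return fp_term(x, p1), fp_term(x, p2)


def rcs3_knots(x: Sequence[float]) -> Tuple[float, float, float]:
    """ASSUMED knot placement: 10th, 50th and 90th percentiles of x."""
    d = statistics.quantiles(x, n=10, method="inclusive")
    return d[0], d[4], d[8]


def _pos_cube(u: float) -> float:
    """Truncated cube (u)+^3."""
    return max(u, 0.0) ** 3


def rcs3_term(x: float, knots: Tuple[float, float, float]) -> float:
    """Nonlinear term of a 3-knot restricted cubic spline.

    Zero below the first knot and linear beyond the last knot.
    """
    t1, t2, t3 = knots
    value = (
        _pos_cube(x - t1)
        - _pos_cube(x - t2) * (t3 - t1) / (t3 - t2)
        + _pos_cube(x - t3) * (t2 - t1) / (t3 - t2)
    )
    return value / (t3 - t1) ** 2  # scaling for numerical convenience
```

In a Cox model the RCS version enters x and `rcs3_term(x, knots)` as two covariates (2 df). An FP2 fit enters the two `fp2_basis` terms, with the power pair chosen from `FP2_PAIRS` by best fit; the two chosen powers plus the two coefficients account for the 4 degrees of freedom per predictor.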

Page 14

Performance measures calculated

Calibration
– Calibration plot
– Harrell’s “val.surv” function; hazard regression with linear splines

Discrimination
– Harrell’s c-index

Clinical utility
– Decision curve analysis (Vickers & Elkin 2006)
– Net benefit
• weighted difference between true positives and false positives

D-statistic, Brier score and R-squared were also examined
– Not reported here, but available in the supplementary material of Collins et al, Stat Med 2016
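Harrell's c-index, the discrimination measure listed above, can be written from scratch in a few lines. This is an O(n²) illustrative sketch with a hypothetical function name, not the software used in the study; skipping pairs with tied follow-up times is a simplification.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair is usable when the subject with the shorter follow-up had the
    event. It is concordant when that subject also has the higher
    predicted risk; tied predicted risks count 1/2. Pairs with tied
    follow-up times are skipped (a simplification).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 means the model ranks patients no better than chance and 1.0 means perfect ranking, which is the scale on which the apparent and validation c-indexes in the results are reported.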

Page 15

Net benefit (recap)

pt is the probability threshold used to denote ‘high risk’
– Its odds, pt/(1 - pt), weight false positive (FP) results relative to true positive (TP) results

TP and FP calculated using Kaplan-Meier estimates of the percentage surviving at 10 years among those with predicted risks greater than pt

Bottom line: model with highest NB ‘wins’
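A plain-Python sketch of net benefit for a survival outcome, using the standard decision-curve definition NB = TP/n - (FP/n) × pt/(1 - pt), with TP and FP obtained from a Kaplan-Meier estimate among those whose predicted risk exceeds pt, as described above. The function names, the `horizon` argument (10, matching the 10-year risk) and returning 0 when nobody is above the threshold are assumptions.

```python
from typing import Sequence


def km_survival(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of the survival probability at `horizon`."""
    surv = 1.0
    for u in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        at_risk = sum(1 for t in times if t >= u)
        deaths = sum(1 for t, e in zip(times, events) if e and t == u)
        surv *= 1.0 - deaths / at_risk
    return surv


def net_benefit(
    times: Sequence[float],
    events: Sequence[int],
    risk: Sequence[float],
    pt: float,
    horizon: float = 10.0,
) -> float:
    """Net benefit at threshold pt for a time-to-event outcome.

    Among 'high-risk' subjects (predicted risk > pt):
      TP/n = (1 - S_hr(horizon)) * P(high risk)
      FP/n = S_hr(horizon) * P(high risk)
    where S_hr is the Kaplan-Meier survival within that subgroup.
    """
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    n = len(times)
    high = [i for i in range(n) if risk[i] > pt]
    if not high:
        return 0.0  # nobody classified as high risk: no TPs and no FPs
    p_high = len(high) / n
    s_high = km_survival([times[i] for i in high], [events[i] for i in high], horizon)
    tp_rate = (1.0 - s_high) * p_high
    fp_rate = s_high * p_high
    return tp_rate - fp_rate * pt / (1.0 - pt)
```

Evaluating `net_benefit` over a range of `pt` values for each model traces its decision curve. Multiplying NB by 1000 gives the net number of true positives per 1000 patients, which is the usual reading of "net cases found per 1000".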

Page 16

Age & CVD

Page 17

Total serum cholesterol & CVD

Page 18

Age, cholesterol, BMI, SBP & CVD

Page 19

Age, BMI & Hip fracture

Page 20

RESULTS: CVD 25 events

Page 21

RESULTS: CVD 50 events

Page 22

RESULTS: CVD 100 events

Page 23

RESULTS: CVD 2000 events

Page 24

RESULTS: Hip fracture 25 events

Page 25

RESULTS: Hip fracture 50 events

Page 26

RESULTS: Hip fracture 100 events

Page 27

RESULTS: Hip fracture 2000 events

Page 28

RESULTS: Discrimination CVD

At small sample sizes (25 events)
– Large difference between apparent performance and validation performance for ‘optimal’ dichotomisation
• 0.84 (apparent); 0.72 (validation)
– Smaller differences observed for FP/RCS/Linear
• 0.84 (apparent); 0.78 (validation)

Observed difference between dichotomisation (at the median) and linear/FP/RCS
– Apparent performance: difference of 0.05
– Validation performance: difference of 0.05
– Observed over all 4 sample sizes examined

Negligible differences between linear/FP/RCS

Page 29

RESULTS: Discrimination Hip Fracture

At small sample sizes (25 events)
– Large difference between apparent performance and validation performance for ‘optimal’ dichotomisation
• 0.86 (apparent); 0.76 (validation)
– Smaller differences observed for FP/RCS/Linear
• 0.90 (apparent); 0.87 (validation)

Observed difference between dichotomisation (at the median) and linear/FP/RCS
– Apparent performance: difference of 0.1
– Validation performance: difference of 0.1
– Observed over all 4 sample sizes examined

Negligible differences between linear/FP/RCS

Page 30

RESULTS: Discrimination Hip Fracture

Page 31

RESULTS: Decision Curve Analysis (CVD only) [higher NB = better model]

[Figure: decision curves of net benefit, with curves labelled FP/RCS and dichotomisation]

Page 32

RESULTS: Net cases found per 1000

Page 33

Conclusions

Systematic reviews show that dichotomising/categorising continuous predictors is routinely done when developing a prediction model

Dichotomising, either at the median or at the ‘optimal’ predictor value, leads to models with substantially poorer performance
– Poor discrimination; poor calibration; poor clinical utility

Large discrepancies between apparent and validation performance were observed for ‘optimal’-split dichotomisation

The impact of dichotomising continuous predictors is more pronounced at smaller sample sizes
