Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based...
-
Upload
conrad-green -
Category
Documents
-
view
216 -
download
1
Transcript of Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based...
![Page 1: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/1.jpg)
Modelling continuous variables with a spike at zero – on issues of a
fractional polynomial based procedure
Willi Sauerbrei
Institut of Medical Biometry and Informatics
University Medical Center Freiburg, Germany
Patrick Royston
MRC Clinical Trials Unit,
London, UK
![Page 2: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/2.jpg)
2
• Problem: A variable X has value 0 for a proportion of individuals “spike at zero”), and a quantitative value for the others • Examples: cigarette consumption, occupational exposure.
• How to model this?
• Setting here: case-control study
1. Motivation
![Page 3: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/3.jpg)
3
1. Motivation
Example : Distribution of smoking in a lung cancer case-control study______________________________________________________
Controls Cases n % n % No cigarettes/day
0 (Non-smokers) 289 21.5 16 2.71-9 78 810-19 247 7320-29 459 78.5 273 97.3 30-39 184 12340+ 86 107 .
100.0 100.0
![Page 4: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/4.jpg)
4
Ad hoc solution:
appropriate?
Adding binary variable smoker yes/no
![Page 5: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/5.jpg)
5
2. Theoretical results
The odds ratio
can be expressed as
where f1 and f0 are the probability density functions of X in cases and controls, respectively
Simplest case:
X is normal distributed with expectations μi with i=0 (1) for controls (cases) and equal variance 2.
We get ORX=x vs X=x0 = exp (β(x-x0)) with .
*)|0(
)|0(
)|1(
*)|1( 0
0 xXDP
xXDP
xXDP
xXDPOR
)()(
)()(*
001
00*
1
xfxf
xfxf
01
![Page 6: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/6.jpg)
6
Next case (spike at zero):
.
0
0
)()1()()(
,11
11 X
Xif
xp
pxfxfcases
and
0
0
)()1()()(
,00
00 X
Xif
xp
pxfxfcontrols
where ,0
is the probability density function of a
normal distributed variable with mean 0 in controls and 1 in cases and equal variance 2 .
2. Theoretical results
![Page 7: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/7.jpg)
7
If 0,0,0, *
010 xxpp we get
*012
21
20
0
1
1
0*
01
0*
10 2)1(
)1(lnexp
)()0(
)0()(* x
p
p
p
p
xff
fxfOR
XvsxX
= exp(0 + 1x*) Thus, the correct model requires X untransformed and a binary variable as indicator for X>0.
2. Theoretical results
![Page 8: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/8.jpg)
8
• So we have theoretically shown that the above situation requires the binary indicator for the correct model.
• Some other distributions also have simple solutions • In reality, we rarely have simple distributions
procedures are more complicated
New proposal:
Extension of fractional polynomial procedure
2. Theoretical results
![Page 9: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/9.jpg)
9
3. Fractional polynomial models
Standard procedure (FP degree 2, FP2 for one covariate X)
• Fractional polynomial of degree 2 for X with powers p1, p2 is given byFP2(X) = 1 X p1 + 2 X p2
• Powers p1, p2 are taken from a special set {2, 1, 0.5, 0, 0.5, 1, 2, 3} (0 = log )
• Repeated powers (p1=p2)
1 X p1 + 2 X p1log X• 36 FP2 models• 8 FP1 models
• Linear pre-transformation of X such that values are positive
![Page 10: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/10.jpg)
10
3. Fractional polynomial models
Standard procedure for one variable:
Test best FP2 against
1. Null model – not significant no effect
2. Straight line – not significant X linear
3. Best FP1– Not significant FP1
– significant FP2
![Page 11: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/11.jpg)
11
3. Fractional polynomial models
Extended procedure for variable with spike at zero
1. Generate binary indicator for exposure
2. Fit the most complex model (binary indicator z + 2nd degree FP)
3. If significant, follow same FP function selection procedure WITH z included (first stage)
4. Test both z and the remaining FP (resp the linear component) for removal(second stage)
![Page 12: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/12.jpg)
12
4. Examples 4.1 Cigarette consumption and lung cancer
Case-control study, 600 cases, 1343 controls.
X – average number of cigarettes smoked per day
FP2 Model with added binary variable:
)()()(x)X|1P(Ylogit 032211 xIxfxf X
![Page 13: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/13.jpg)
13
4. Examples4.1 Cigarette consumption and lung cancer
Model Deviance diff. d.f. P Power
First stage
Null 2402.1 225.7 5 <0.001 -
Linear + z 2195.4 19.0 3 <0.001 1
FP1+ + z 2177.0 0.6 2 0.76 -0.5
FP2+ + z 2176.4 - - - -2, -1
Second stage
FP1+ + z 2177.0 - 3 -0.5
FP1+ [dropping z] 2384.9 208.0 1 <0.001 -0.5
z [dropping FP1] 2259.4 82.4 2 <0.001 - Standard FP analysis (as alternative)
2176.8 -1, -1
![Page 14: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/14.jpg)
14
4. Examples4.1 Cigarette consumption and lung cancer
Result:• First step: selects FP1 transformation• Second step: Both the binary and the FP1 term are required
• FP2 without binary term gives similar result
![Page 15: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/15.jpg)
15
05
1015
2025
Odd
s ra
tio, s
mok
er v
s no
n-sm
oker
0 20 40 60 80Cigarette consumption
FP1-spike FP2
4. Examples4.1 Cigarette consumption and lung cancer
![Page 16: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/16.jpg)
16
4. Examples 4.2 Gleason Score and prostate cancer (predictors of PSA level)
Model Deviance Dev. diff. d.f. P Power
First stage
Null 302.1 29.8 5 0.001
Linear + z 273.7 1.4 3 0.73 1
FP1+ + z 272.7 0.4 2 0.84 0.5
FP2+ + z 272.3 1, 3
Second stage
Linear + z 273.7 2
Linear [dropping z] 282.7 9.0 1 0.003
z [dropping linear] 276.7 2.5 1 0.1
![Page 17: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/17.jpg)
17
4. Examples 4.2 Gleason Score and prostate cancer
Result:
The selected model from first stage is Linear + z
Dropping the linear does not worsen the fit
Dropping the binary is highly significant
The selected model only comprises the binary variable
![Page 18: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/18.jpg)
18
4. Examples 4.3 Alcohol consumption and breast cancer
(case-control study, 706 cases, 1381 controls) Model Deviance diff d.f. P Power
First stage
Null 2670.9 35.5 5 0.000 -
Linear + z 2644.1 8.7 3 0.033 1
FP1+ + z 2642.5 7.1 2 0.028 2
FP2+ + z 2635.4 - - - -0.5, 0.5
Second stage
FP2+ + z 2635.4 - 5 -0.5, 0.5
FP2+ [dropping z] 2661.31 24.9 1 0.000 -0.5, 0.5
z [dropping FP2] 2665.17 29.8 4 0.000 -
Standard FP analysis (as alternative)
2636.2 0, 0.5
![Page 19: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/19.jpg)
19
Result:• First step: FP2 is best transformation• Second step: Dropping of FP2 or binary variable worsens fit
FP2+ + z is best model
• Standard FP (other powers!) has similar fit
4. Examples 4.3 Alcohol consumption and breast cancer
![Page 20: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/20.jpg)
20
4. Examples 4.3 Alcohol consumption and breast cancer
![Page 21: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/21.jpg)
21
5. Summary
• Procedure to add binary indicator supported by theoretical results
• Subject matter knowledge (SMK) is an important criteria to decide whether inclusion of indicator is required
• SMK: indicator required – procedure useful to determine dose-response part
• SMK: indicator not required – nevertheless, indicator may improve model fit
• Suggested 2-step FP procedure with adding binary indicator appears to be a useful in practical applications
![Page 22: Modelling continuous variables with a spike at zero – on issues of a fractional polynomial based procedure Willi Sauerbrei Institut of Medical Biometry.](https://reader034.fdocuments.us/reader034/viewer/2022050714/56649d0a5503460f949dc419/html5/thumbnails/22.jpg)
22
References
• Becher, H. (2005). General principles of data analysis: continuous covariables in epidemiological studies, in W. Ahrens and I. Pigeot (eds), Handbook of Epidemiology, Springer, Berlin, pp. 595–624.
• Robertson, C., Boyle, P., Hsieh, C.-C., Macfarlane, G. J. and Maisonneuve, P. (1994). Some statistical considerations in the analysis of case-control studies when the exposure variables are continuous measurements, Epidemiology 5: 164–170.
• Royston P, Sauerbrei W (2008) Multivariable model-building - a pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. Wiley.