statistik regresi logistik (Logistic regression statistics)
-
8/7/2019 statistik regresi logistik
1/36
Introduction to
Logistic Regression
Rachid Salmi, Jean-Claude Desenclos, Alain Moren, Thomas Grein
-
Content
Simple and multiple linear regression
Simple logistic regression
The logistic function
Estimation of parameters
Interpretation of coefficients
Multiple logistic regression
Interpretation of coefficients
Coding of variables
Examples in Epiinfo 2002
-
Simple linear regression
Age  SBP    Age  SBP    Age  SBP
 22  131     41  139     52  128
 23  128     41  171     54  105
 24  116     46  137     56  145
 27  106     47  111     57  141
 28  114     48  115     58  153
 29  123     49  133     59  157
 30  117     49  128     63  155
 32  122     50  183     67  176
 33   99     51  130     71  172
 35  121     51  133     77  178
 40  147     51  144     81  217
Table 1. Age (years) and systolic blood pressure (SBP, mm Hg) among 33 adult women
-
[Figure: SBP (mm Hg, axis 80 to 220) plotted against age (years, axis 20 to 90)]
Adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974
-
Simple linear regression
[Figure: regression line of y plotted against x]
$y = \alpha + \beta_1 x_1$
Slope: $\beta_1$
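As a rough illustration of the straight-line model, the sketch below fits $y = \alpha + \beta_1 x_1$ by ordinary least squares to the age and SBP pairs transcribed from Table 1. The function name `fit_simple_linear` is hypothetical and the slides do not state the fitted values, so none are quoted here.

```python
# Age (years) and SBP (mm Hg) for the 33 women, transcribed from Table 1.
ages = [22, 23, 24, 27, 28, 29, 30, 32, 33, 35, 40,
        41, 41, 46, 47, 48, 49, 49, 50, 51, 51, 51,
        52, 54, 56, 57, 58, 59, 63, 67, 71, 77, 81]
sbps = [131, 128, 116, 106, 114, 123, 117, 122, 99, 121, 147,
        139, 171, 137, 111, 115, 133, 128, 183, 130, 133, 144,
        128, 105, 145, 141, 153, 157, 155, 176, 172, 178, 217]


def fit_simple_linear(x, y):
    """Ordinary least squares estimates for y = alpha + beta1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx                  # slope
    alpha = mean_y - beta1 * mean_x    # intercept
    return alpha, beta1


alpha, beta1 = fit_simple_linear(ages, sbps)
print(f"SBP = {alpha:.1f} + {beta1:.2f} * Age")
```

The slope is the average change in SBP (mm Hg) per additional year of age.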
-
Multiple linear regression
Relation between a continuous variable y and a set of i continuous variables $x_1, \dots, x_i$
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Partial regression coefficients $\beta_i$
  Amount by which y changes on average when $x_i$ changes by one unit and all the other $x_i$ remain constant
  Measures association between $x_i$ and y adjusted for all other $x_i$
Example: SBP versus age, weight, height, etc.
-
Multiple linear regression
$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
Names for y          Names for the x variables
Predicted            Predictor variables
Response variable    Explanatory variables
Outcome variable     Covariables
Dependent            Independent variables
-
General linear models
Family of regression models
Outcome variable determines choice of model
Uses
Control of confounding
Model building, risk prediction
Outcome       Model
Continuous    Linear regression
Counts        Poisson regression
Survival      Cox model
Binomial      Logistic regression
-
Logistic regression
Models the relationship between a set of variables $x_i$, which may be
  dichotomous (yes/no)
  categorical (social class, ...)
  continuous (age, ...)
and a dichotomous (binary) variable Y
Dichotomous outcome is the most common situation in biology and epidemiology
-
Logistic regression (1)
Table 2 Age and signs of coronary heart disease (CD)
-
How can we analyse these data?
Compare mean age of diseased and non-diseased
Non-diseased: 38.6 years
Diseased: 58.7 years (p
-
Dot-plot: Data from Table 2
-
Logistic regression (2)
Table 3 Prevalence (%) of signs of CD according to age group
-
Dot-plot: Data from Table 3
[Figure: percentage diseased (0 to 100%) plotted against age group (axis 0 to 7)]
-
Logistic function (1)
[Figure: probability of disease (0.0 to 1.0) plotted against x]
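To make the shape of the curve concrete, here is a minimal sketch of the logistic function $P(y|x) = 1 / (1 + e^{-(\alpha + \beta x)})$. The default values `alpha=0.0` and `beta=1.0` are illustrative assumptions, not values from the slides.

```python
import math


def logistic(x, alpha=0.0, beta=1.0):
    """Probability of disease at x under a logistic model.

    alpha and beta are illustrative defaults, not estimates from the slides.
    """
    z = alpha + beta * x
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


for x in (-6, -2, 0, 2, 6):
    print(x, round(logistic(x), 3))
```

Whatever the value of x, the result stays between 0 and 1, and it equals 0.5 where $\alpha + \beta x = 0$.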
-
Logistic transformation
logit of $P(y|x)$: $\ln\left[\frac{P(y|x)}{1 - P(y|x)}\right] = \alpha + \beta x$
-
Advantages of Logit
Properties of a linear regression model
Logit between $-\infty$ and $+\infty$
Probability (P) constrained between 0 and 1
Directly related to notion of odds of disease
$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta x$
$\frac{P}{1-P} = e^{\alpha + \beta x}$
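The link between probability, odds and logit can be checked numerically. The sketch below uses hypothetical helper names (`logit`, `odds_from_logit`, `prob_from_logit`) and arbitrary example probabilities.

```python
import math


def logit(p):
    """Log-odds ln(P / (1 - P)) for a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def odds_from_logit(value):
    """Odds P / (1 - P) recovered from a logit."""
    return math.exp(value)


def prob_from_logit(value):
    """Probability P recovered from a logit."""
    return 1.0 / (1.0 + math.exp(-value))


# Arbitrary example probabilities: print probability, logit, odds, and
# the probability recovered from the logit.
for p in (0.1, 0.5, 0.9):
    value = logit(p)
    print(p, round(value, 3), round(odds_from_logit(value), 3),
          round(prob_from_logit(value), 3))
```

Probabilities below 0.5 give negative logits and probabilities above 0.5 give positive ones, so the logit is unbounded even though P is not.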
-
Interpretation of coefficient
$\frac{P}{1-P} = e^{\alpha + \beta x}$
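A short derivation, assuming the model $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta x$ from the preceding slides, shows why the coefficient is read as a log odds ratio. Comparing the odds at $x + 1$ with the odds at $x$:

```latex
\begin{align*}
\text{odds}(x)   &= e^{\alpha + \beta x} \\
\text{odds}(x+1) &= e^{\alpha + \beta (x+1)} \\
\text{OR} = \frac{\text{odds}(x+1)}{\text{odds}(x)}
                 &= \frac{e^{\alpha + \beta x + \beta}}{e^{\alpha + \beta x}} = e^{\beta} \\
\ln(\text{OR})   &= \beta
\end{align*}
```

For a dichotomous exposure coded 1 and 0, $e^{\beta}$ is therefore the odds ratio of disease in the exposed compared with the unexposed.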
-
Example
Risk of developing coronary heart disease (CD) by age (
-
Logistic Regression Model
$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 \cdot \text{Age} = -0.841 + 2.094 \cdot \text{Age}$
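The fitted coefficients can be turned into an odds ratio and predicted probabilities. The sketch below assumes that Age enters the model as a 0/1 indicator; the slide's definition of the age groups is truncated in this transcript, so that coding is an assumption.

```python
import math

ALPHA = -0.841     # intercept reported on the slide
BETA_AGE = 2.094   # coefficient for Age reported on the slide


def predicted_probability(age_value):
    """P(CD) for a given value of the Age variable (assumed coded 0 or 1)."""
    z = ALPHA + BETA_AGE * age_value
    return 1.0 / (1.0 + math.exp(-z))


odds_ratio = math.exp(BETA_AGE)   # OR for a one unit increase in Age
p0 = predicted_probability(0)     # assumed reference age group
p1 = predicted_probability(1)     # assumed older age group

print(f"OR = {odds_ratio:.2f}, P(Age=0) = {p0:.3f}, P(Age=1) = {p1:.3f}")
```

Under that assumption the odds of CD are roughly eight times higher in the group coded 1 than in the group coded 0.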
-
Fitting equation to the data
Linear regression: Least squares
Logistic regression: Maximum likelihood
Likelihood function
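Estimates parameters $\alpha$ and $\beta$ with property that likelihood (probability) of observed data is higher than for any other values
Practically easier to work with log-likelihood
$L(\beta) = \ln[l(\beta)] = \sum_{i=1}^{n} \left\{ y_i \ln[\pi(x_i)] + (1 - y_i) \ln[1 - \pi(x_i)] \right\}$
where $\pi(x_i)$ is the modelled probability that $y_i = 1$ given $x_i$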
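As a sketch of how the log-likelihood is evaluated for one candidate pair of coefficients, the function below sums $y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i)$ over the observations, with $p_i$ the modelled probability. The six observations are hypothetical and serve only to exercise the code.

```python
import math


def log_likelihood(alpha, beta, xs, ys):
    """Log-likelihood of binary outcomes ys under logit(P) = alpha + beta * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(alpha + beta * x)))   # modelled P(y = 1 | x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical data: exposure x (0/1) and disease status y (0/1).
xs = [0, 0, 0, 1, 1, 1]
ys = [0, 0, 1, 0, 1, 1]

ll_start = log_likelihood(0.0, 0.0, xs, ys)                     # all P = 0.5
ll_better = log_likelihood(math.log(0.5), math.log(4), xs, ys)  # matches observed proportions
print(ll_start, ll_better)
```

The second pair of coefficients reproduces the observed proportions (1/3 and 2/3 diseased), so its log-likelihood is higher (closer to zero) than that of the starting values.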
-
Maximum likelihood
Iterative computing
  Choice of an arbitrary value for the coefficients (usually 0)
  Computing of log-likelihood
  Variation of coefficients' values
  Reiteration until maximisation (plateau)
Results
  Maximum Likelihood Estimates (MLE) for $\alpha$ and $\beta$
  Estimates of P(y) for a given value of x
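The iterative scheme can be sketched with Newton-Raphson updates, one common way of varying the coefficients; the slides do not say which algorithm the software uses, so this choice is an assumption. The function name `fit_logistic` and the small data set are hypothetical.

```python
import math


def fit_logistic(xs, ys, tol=1e-10, max_iter=50):
    """Maximum likelihood fit of logit(P) = alpha + beta * x by Newton-Raphson."""

    def probabilities(alpha, beta):
        return [1.0 / (1.0 + math.exp(-(alpha + beta * x))) for x in xs]

    def log_likelihood(ps):
        return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                   for y, p in zip(ys, ps))

    alpha, beta = 0.0, 0.0             # arbitrary starting values
    previous = -math.inf
    for _ in range(max_iter):
        ps = probabilities(alpha, beta)
        current = log_likelihood(ps)
        if current - previous < tol:   # plateau reached
            break
        previous = current
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian), 2 x 2.
        w = [p * (1 - p) for p in ps]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, xs))
        h11 = sum(wi * x * x for wi, x in zip(w, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: coefficients += inverse(information) * gradient.
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


# Hypothetical data: 1/3 diseased among unexposed, 2/3 among exposed.
xs = [0, 0, 0, 1, 1, 1]
ys = [0, 0, 1, 0, 1, 1]
alpha_hat, beta_hat = fit_logistic(xs, ys)
print(alpha_hat, beta_hat, math.exp(beta_hat))
```

With a single dichotomous exposure the estimates can be verified by hand: $\alpha$ is the log odds among the unexposed and $e^{\beta}$ is the crude odds ratio, here (2/1) / (1/2) = 4.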
-
Effect modification
$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
-
Statistical testing
Question: Does a model including a given independent variable provide more information about the dependent variable than the model without this variable?
Three tests
Likelihood ratio statistic (LRS)
Wald test
Score test
-
Likelihood ratio statistic
Compares two nested models
  Log(odds) = $\alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$ (model 1)
  Log(odds) = $\alpha + \beta_1 x_1 + \beta_2 x_2$ (model 2)
LR statistic
  -2 log (likelihood model 2 / likelihood model 1) =
  -2 log (likelihood model 2) minus -2 log (likelihood model 1)
LR statistic follows a $\chi^2$ distribution with DF = number of extra parameters in the larger model (model 1)
-
Example
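$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 \cdot \text{Exc} + \beta_2 \cdot \text{Smk}$
$= 0.7102 + 1.0047 \cdot \text{Exc} + 0.7005 \cdot \text{Smk}$
(SE 0.2614) (SE 0.2664)
P: probability of cardiac arrest
Exc: 1 = lack of exercise, 0 = exercise
Smk: 1 = smokers, 0 = non-smokers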
adapted from Kerr, Handbook of Public Health Methods, McGraw-Hill, 1998
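The coefficients of this example can be converted to adjusted odds ratios with approximate 95% confidence intervals, $e^{\beta \pm 1.96 \cdot SE}$. The sketch below assumes that the two standard errors on the slide belong to the Exc and Smk coefficients in that order; the transcript does not make the pairing explicit. The helper name `odds_ratio_with_ci` is hypothetical.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald-type confidence limits from a coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Assumed pairing: SE 0.2614 with Exc, SE 0.2664 with Smk.
or_exc, low_exc, high_exc = odds_ratio_with_ci(1.0047, 0.2614)
or_smk, low_smk, high_smk = odds_ratio_with_ci(0.7005, 0.2664)

print(f"Lack of exercise: OR = {or_exc:.2f} ({low_exc:.2f} to {high_exc:.2f})")
print(f"Smoking:          OR = {or_smk:.2f} ({low_smk:.2f} to {high_smk:.2f})")
```

Each odds ratio is adjusted for the other variable in the model, as described for partial coefficients earlier in the deck.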
-
Interaction between smoking and exercise?
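$\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 \cdot \text{Exc} + \beta_2 \cdot \text{Smk} + \beta_3 \cdot \text{Smk} \times \text{Exc}$
Product term $\beta_3$ = -0.4604 (SE 0.5332)
Wald test = 0.75 (1 df)
-2log(L) = 342.092 with interaction term
         = 342.836 without interaction term
LR statistic = 0.74 (1 df), p = 0.39
No evidence of any interaction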
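Both test statistics on this slide can be recomputed from the reported numbers. In the sketch below the Wald statistic is taken as $(\beta_3 / SE)^2$ and the p-values come from the $\chi^2$ distribution with 1 df, evaluated through the complementary error function; the helper `chi2_upper_tail_1df` is a hypothetical name and only covers the 1 df case.

```python
import math


def chi2_upper_tail_1df(x):
    """P(chi-square with 1 df > x), valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(x / 2.0))


beta3, se3 = -0.4604, 0.5332          # product term and its SE from the slide
wald = (beta3 / se3) ** 2             # Wald chi-square, 1 df

minus2logl_without = 342.836          # -2log(L) without interaction term
minus2logl_with = 342.092             # -2log(L) with interaction term
lr = minus2logl_without - minus2logl_with   # LR statistic, 1 df

p_wald = chi2_upper_tail_1df(wald)
p_lr = chi2_upper_tail_1df(lr)
print(f"Wald = {wald:.2f}, p = {p_wald:.2f}")
print(f"LR   = {lr:.2f}, p = {p_lr:.2f}")
```

The two tests agree closely here, which is usual when the sample is reasonably large.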
-
Coding of variables (1)
Dichotomous variables: yes = 1, no = 0
Continuous variables
  Increase in OR for a one unit change in exposure variable
  Logistic model is multiplicative: OR increases exponentially with x
    If OR = 2 for a one unit change in exposure and x increases from 2 to 5: OR = 2 x 2 x 2 = $2^3$ = 8
  Verify that OR increases exponentially with x
  When in doubt, treat as qualitative variable
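The multiplicative behaviour follows from $OR = e^{\beta \Delta x}$. A minimal sketch, with the hypothetical helper `odds_ratio_for_change`, reproduces the slide's example of an OR of 2 per unit and x moving from 2 to 5:

```python
import math


def odds_ratio_for_change(or_per_unit, x_from, x_to):
    """OR for moving from x_from to x_to, given the OR for a one unit change."""
    beta = math.log(or_per_unit)              # coefficient on the logit scale
    return math.exp(beta * (x_to - x_from))   # effects multiply across units


or_2_to_5 = odds_ratio_for_change(2.0, 2, 5)
print(round(or_2_to_5, 6))
```

A three unit increase therefore multiplies the odds by 2 three times over, which is why a continuous variable should only be entered as such when this exponential pattern is plausible.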
-
Continuous variable?
Relationship between SBP>160 mmHg and body weight
Introduce BW as continuous variable?
Code weight as single variable, e.g. 3 equal classes: 40-60 kg = 0, 60-80 kg = 1, 80-100 kg = 2
Compatible with assumption of multiplicative model
If not compatible, use indicator variables
-
Indicator variables: Type of tobacco
Neutralises artificial hierarchy between classes in the variable "type of tobacco"
No assumptions made
3 variables (3 df) in model using same reference
OR for each type of tobacco adjusted for the others in reference to non-smoking
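Indicator (dummy) coding can be sketched as follows. The category labels are hypothetical, since the transcript does not list the tobacco types; only the structure (three indicators sharing non-smoking as the reference) comes from the slide.

```python
# Hypothetical labels: the slides only say there are three tobacco types
# compared with non-smoking.
CATEGORIES = ["non_smoker", "cigarettes", "pipe", "cigar"]
REFERENCE = "non_smoker"


def indicator_code(value, categories=CATEGORIES, reference=REFERENCE):
    """Return one 0/1 indicator per non-reference category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"is_{c}": int(value == c) for c in categories if c != reference}


for person in ("non_smoker", "pipe"):
    print(person, indicator_code(person))
```

The reference category is coded 0 on all three indicators, so each coefficient compares one type of tobacco with non-smoking without imposing any ordering between the types.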
-
Risk of death from bacterial meningitis according to treatment
161 observations
Death (yes, no)
Treatment: 1 = Chloramphenicol, 2 = Ampicillin
Delay before treatment (onset, in days)
Convulsions (1, 0)
Level of consciousness (1-3)
Severity of dehydration (1-3)
Age in years
Pathogen: 1 = Others, 2 = HiB, 3 = Streptococcus pneumoniae
-
Reference
Hosmer DW, Lemeshow S. Applied logistic regression. Wiley & Sons, New York, 1989