Logistic regression statistics

  • 8/7/2019 statistik regresi logistik

    Introduction to Logistic Regression

    Rachid Salmi, Jean-Claude Desenclos, Alain Moren, Thomas Grein

    Content

    Simple and multiple linear regression

    Simple logistic regression

    The logistic function

    Estimation of parameters

    Interpretation of coefficients

    Multiple logistic regression

    Interpretation of coefficients

    Coding of variables

    Examples in Epiinfo 2002

    Simple linear regression

    Age  SBP    Age  SBP    Age  SBP
    22   131    41   139    52   128
    23   128    41   171    54   105
    24   116    46   137    56   145
    27   106    47   111    57   141
    28   114    48   115    58   153
    29   123    49   133    59   157
    30   117    49   128    63   155
    32   122    50   183    67   176
    33    99    51   130    71   172
    35   121    51   133    77   178
    40   147    51   144    81   217

    Table 1. Age (years) and systolic blood pressure (SBP, mm Hg) among 33 adult women

    Figure: SBP (mm Hg) plotted against age (years) for the 33 women in Table 1

    adapted from Colton T. Statistics in Medicine. Boston: Little Brown, 1974

    Simple linear regression

    Figure: fitted straight line of y against x

    $y = \alpha + \beta_1 x_1$, where $\beta_1$ is the slope
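The sketch below shows how the least-squares fit works for this model. It estimates α and β1 from the Table 1 data (x1 = age, y = SBP) with the closed-form formulas β1 = Sxy / Sxx and α = mean(y) - β1 × mean(x). The function name `least_squares` is our own.

```python
# Least-squares fit of y = alpha + beta1 * x1 to the Table 1 data
# (age in years, SBP in mm Hg), using the closed-form estimates.

AGE = [22, 23, 24, 27, 28, 29, 30, 32, 33, 35, 40,
       41, 41, 46, 47, 48, 49, 49, 50, 51, 51, 51,
       52, 54, 56, 57, 58, 59, 63, 67, 71, 77, 81]
SBP = [131, 128, 116, 106, 114, 123, 117, 122, 99, 121, 147,
       139, 171, 137, 111, 115, 133, 128, 183, 130, 133, 144,
       128, 105, 145, 141, 153, 157, 155, 176, 172, 178, 217]


def least_squares(x, y):
    """Return (alpha, beta1) minimising the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta1 = sxy / sxx                  # slope
    alpha = mean_y - beta1 * mean_x    # intercept: line passes through the means
    return alpha, beta1


alpha, beta1 = least_squares(AGE, SBP)
print(f"alpha = {alpha:.2f} mm Hg, beta1 = {beta1:.3f} mm Hg per year of age")
```

With an intercept in the model, the residuals of a least-squares fit always sum to zero. This gives a quick sanity check on the arithmetic.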

    Multiple linear regression

    Relation between a continuous variable and a set of i continuous variables

    Partial regression coefficients βi: amount by which y changes on average when xi changes by one unit and all the other xi remain constant

    Measure the association between xi and y adjusted for all the other xi

    Example: SBP versus age, weight, height, etc.

    $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$
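Here is a minimal sketch of fitting such a model by least squares. It solves the normal equations (XᵀX)b = Xᵀy in plain Python with Gaussian elimination. The two predictors and the noise-free data are hypothetical: they are generated from α = 2, β1 = 0.5 and β2 = -1.5, so the fit should recover those partial coefficients up to rounding.

```python
# Multiple linear regression y = alpha + beta1*x1 + beta2*x2 by least squares,
# solving the normal equations (X^T X) b = X^T y with Gaussian elimination.
# The data below are hypothetical and noise-free.

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple(rows, y):
    """Return [alpha, beta1, ..., beta_i] for predictor rows [x1, ..., xi]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[j] * d[k] for d in design) for k in range(p)] for j in range(p)]
    xty = [sum(d[j] * yi for d, yi in zip(design, y)) for j in range(p)]
    return solve(xtx, xty)


rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]  # hypothetical (x1, x2)
y = [2 + 0.5 * x1 - 1.5 * x2 for x1, x2 in rows]
coef = fit_multiple(rows, y)
print(coef)
```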

    Multiple linear regression

    y                      x1 ... xi
    Predicted variable     Predictor variables
    Response variable      Explanatory variables
    Outcome variable       Covariables
    Dependent variable     Independent variables

    $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$

    General linear models

    Family of regression models

    Outcome variable determines choice of model

    Uses

    Control of confounding

    Model building, risk prediction

    Outcome      Model
    Continuous   Linear regression
    Counts       Poisson regression
    Survival     Cox model
    Binomial     Logistic regression

    Logistic regression

    Models the relationship between a set of variables xi, which may be:

    dichotomous (yes/no)

    categorical (social class, ...)

    continuous (age, ...)

    and a dichotomous (binary) variable Y

    A dichotomous outcome is the most common situation in biology and epidemiology

    Logistic regression (1)

    Table 2 Age and signs of coronary heart disease (CD)

    How can we analyse these data?

    Compare mean age of diseased and non-diseased

    Non-diseased: 38.6 years

    Diseased: 58.7 years (p

    Dot-plot: Data from Table 2

    Logistic regression (2)

    Table 3 Prevalence (%) of signs of CD according to age group

    Dot-plot: Data from Table 3 (diseased %, 0 to 100, by age group)

    Logistic function (1)

    Figure: S-shaped logistic curve of the probability of disease (0.0 to 1.0) as a function of x

    Logistic transformation

    $\ln\left[\frac{P(y|x)}{1 - P(y|x)}\right] = \alpha + \beta x$: the logit of P(y|x)

    Advantages of Logit

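    Has the properties of a linear regression model

    The logit ranges from -∞ to +∞, whereas the probability (P) is constrained between 0 and 1

    Directly related to the notion of odds of disease

    $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta x$

    $\frac{P}{1-P} = e^{\alpha + \beta x}$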
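The logit and the logistic function are inverses of each other. This small Python sketch converts a probability to its logit and back, using function names of our own. It shows that the logit can take any real value while the back-transformed probability stays between 0 and 1, and that exponentiating the logit gives the odds.

```python
import math


def logit(p):
    """ln(P / (1 - P)): maps a probability in (0, 1) to the whole real line."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Logistic function e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    z = logit(p)
    print(f"P = {p:.2f}  logit = {z:+.3f}  odds = {math.exp(z):.3f}  back = {inv_logit(z):.2f}")
```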

    Interpretation of coefficient

    $\frac{P}{1-P} = e^{\alpha + \beta x}$
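Start from the odds P/(1-P) = e^(α + βx) and compare the odds at x + 1 with the odds at x. The intercept α cancels, so the odds ratio for a one-unit increase in x is e^β. For an exposure coded 1 (exposed) and 0 (unexposed), this is the OR between exposed and unexposed.

```latex
\begin{aligned}
\text{OR} &= \frac{\text{odds}(x+1)}{\text{odds}(x)}
          = \frac{e^{\alpha + \beta (x+1)}}{e^{\alpha + \beta x}}
          = e^{\beta}, \\
\ln(\text{OR}) &= \beta .
\end{aligned}
```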

    Example

    Risk of developing coronary heart disease (CD) by age (

    Logistic Regression Model

    $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1\,\text{Age} = -0.841 + 2.094\,\text{Age}$
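Plugging the estimates α = -0.841 and β1 = 2.094 into the model gives the odds ratio and the fitted probabilities. The example slide is truncated, so this sketch assumes nothing about how Age is coded. It simply evaluates the fitted equation at Age = 0 and Age = 1, i.e. for a one-unit difference, where e^β1 is the OR.

```python
import math

ALPHA = -0.841  # intercept of the fitted CD model
BETA1 = 2.094   # coefficient of Age in the fitted CD model


def prob(age):
    """Fitted P(CD) = 1 / (1 + exp(-(alpha + beta1 * age)))."""
    return 1.0 / (1.0 + math.exp(-(ALPHA + BETA1 * age)))


def odds(p):
    return p / (1.0 - p)


odds_ratio = math.exp(BETA1)  # OR for a one-unit increase in Age
p0, p1 = prob(0), prob(1)
print(f"OR = {odds_ratio:.2f}")
print(f"P(Age=0) = {p0:.3f}, P(Age=1) = {p1:.3f}")
print(f"ratio of fitted odds = {odds(p1) / odds(p0):.2f}")  # equals the OR
```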

    Fitting equation to the data

    Linear regression: Least squares

    Logistic regression: Maximum likelihood

    Likelihood function

    Estimates the parameters α and β with the property that the likelihood (probability) of the observed data is higher than for any other values

    In practice, it is easier to work with the log-likelihood

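    $L(\alpha, \beta) = \ln[l(\alpha, \beta)] = \sum_{i=1}^{n} \{ y_i \ln[\pi(x_i)] + (1 - y_i) \ln[1 - \pi(x_i)] \}$, where $\pi(x_i)$ is the probability that $y_i = 1$ given $x_i$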
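The log-likelihood sum y_i ln π(x_i) + (1 - y_i) ln[1 - π(x_i)] translates directly into code. This is a minimal Python sketch for one explanatory variable, with π(x) taken from the logistic model. The ages and outcomes are hypothetical toy data. At α = β = 0 every π(x_i) equals 0.5, so the log-likelihood reduces to n ln 0.5, which is a convenient check.

```python
import math


def log_likelihood(alpha, beta, x, y):
    """Sum over subjects of y*ln(pi) + (1 - y)*ln(1 - pi), pi = logistic(alpha + beta*x)."""
    total = 0.0
    for xi, yi in zip(x, y):
        pi = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


x = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]  # hypothetical ages
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]            # hypothetical 0/1 outcome
print(log_likelihood(0.0, 0.0, x, y))         # equals n * ln(0.5)
print(log_likelihood(-5.0, 0.1, x, y))        # another candidate (alpha, beta)
```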

    Maximum likelihood

    Iterative computing:

    choice of an arbitrary starting value for the coefficients (usually 0)

    computation of the log-likelihood

    variation of the coefficient values

    reiteration until the log-likelihood is maximised (plateau)

    Results

    Maximum likelihood estimates (MLE) for α and β

    Estimates of P(y) for a given value of x
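One common way to carry out this iteration is the Newton-Raphson method. The slides do not say which algorithm the software uses, so treat the Python below as an illustrative sketch. It starts from α = β = 0 and updates both coefficients using the gradient and the information matrix of the log-likelihood. It stops when the log-likelihood reaches a plateau. The data are hypothetical. With a single 0/1 exposure the MLEs have a closed form (α is the log odds among the unexposed, β the log odds ratio), which makes the result easy to check.

```python
import math


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(alpha, beta, x, y):
    ll = 0.0
    for xi, yi in zip(x, y):
        p = inv_logit(alpha + beta * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def fit_logistic(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of the log-likelihood for one covariate."""
    alpha, beta = 0.0, 0.0                    # arbitrary starting values
    ll_old = log_likelihood(alpha, beta, x, y)
    ll_new = ll_old
    for _ in range(max_iter):
        p = [inv_logit(alpha + beta * xi) for xi in x]
        # Gradient (score) of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[a, b], [b, c]] with weights pi * (1 - pi)
        w = [pi * (1.0 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: add (information matrix)^-1 @ gradient
        alpha += (c * g0 - b * g1) / det
        beta += (a * g1 - b * g0) / det
        ll_new = log_likelihood(alpha, beta, x, y)
        if abs(ll_new - ll_old) < tol:        # plateau reached
            break
        ll_old = ll_new
    return alpha, beta, ll_new


# Hypothetical data: 3/10 diseased among unexposed (x = 0), 7/10 among exposed (x = 1)
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
alpha, beta, ll = fit_logistic(x, y)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, log-likelihood = {ll:.4f}")
print(f"OR = {math.exp(beta):.3f}")
```

For these counts the closed-form answers are α = ln(3/7) and β = ln[(7/3)/(3/7)] = ln(49/9). The iterative fit should agree with them.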

    Effect modification

    $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
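In the model ln[P/(1-P)] = α + β1x1 + β2x2 + β3x1x2, the product term captures effect modification. To see why, compute the odds ratio for x1 at a fixed value of x2, assuming for this worked step that x1 is coded 1 vs 0. The OR for x1 then depends on x2 unless β3 = 0.

```latex
\begin{aligned}
\ln(\text{OR}_{x_1 \mid x_2})
  &= [\alpha + \beta_1 + \beta_2 x_2 + \beta_3 x_2] - [\alpha + \beta_2 x_2]
   = \beta_1 + \beta_3 x_2, \\
\text{OR}_{x_1 \mid x_2 = 0} &= e^{\beta_1}, \qquad
\text{OR}_{x_1 \mid x_2 = 1} = e^{\beta_1 + \beta_3}.
\end{aligned}
```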

    Statistical testing

    Question: does a model including a given independent variable provide more information about the dependent variable than a model without this variable?

    Three tests

    Likelihood ratio statistic (LRS)

    Wald test

    Score test

    Likelihood ratio statistic

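    Compares two nested models:

    $\log(\text{odds}) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$ (model 1)

    $\log(\text{odds}) = \alpha + \beta_1 x_1 + \beta_2 x_2$ (model 2)

    LR statistic

    -2 log (likelihood model 2 / likelihood model 1) =

    -2 log (likelihood model 2) minus [-2 log (likelihood model 1)]

    The LR statistic follows (approximately) a χ² distribution with DF = number of extra parameters in the larger model (here β3 and β4, so DF = 2)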
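Here is a sketch of the LR test in Python using only the standard library. The χ² upper-tail probability is computed from the power series of the regularized incomplete gamma function. The helper `chi2_sf` is our own; it is adequate for the moderate values met here but is not a replacement for a statistics library. As usage, it is applied to the -2 log-likelihood values reported in the smoking and exercise example: 342.836 without and 342.092 with the interaction term, 1 extra parameter.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (s + k)
        total += term
    lower = math.exp(s * math.log(z) - z + math.log(total) - math.lgamma(s))
    return max(0.0, 1.0 - lower)


def lr_test(minus2ll_reduced, minus2ll_full, extra_params):
    """LR statistic = (-2 log L of smaller model) - (-2 log L of larger model)."""
    lr = minus2ll_reduced - minus2ll_full
    return lr, chi2_sf(lr, extra_params)


lr, p = lr_test(342.836, 342.092, 1)
print(f"LR statistic = {lr:.2f} (1 df), p = {p:.2f}")
```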

    Example

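    $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1\,\text{Exc} + \beta_2\,\text{Smk} = 0.7102 + 1.0047\,\text{Exc}\ (\text{SE } 0.2614) + 0.7005\,\text{Smk}\ (\text{SE } 0.2664)$

    P: probability of cardiac arrest

    Exc: 1 = lack of exercise, 0 = exercise

    Smk: 1 = smoker, 0 = non-smoker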

    adapted from Kerr, Handbook of Public Health Methods, McGraw-Hill, 1998
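Exponentiating the coefficients 1.0047 (Exc) and 0.7005 (Smk) gives adjusted odds ratios. A Wald-type 95% confidence interval follows from exp(β ± 1.96 SE). The Python sketch below assumes that SE 0.2614 belongs to Exc and SE 0.2664 to Smk; that pairing is our reading of the garbled slide.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval

# Coefficients and standard errors from the cardiac-arrest example
# (assumed pairing: SE 0.2614 with Exc, SE 0.2664 with Smk).
estimates = {"Exc": (1.0047, 0.2614), "Smk": (0.7005, 0.2664)}

results = {}
for name, (beta, se) in estimates.items():
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - Z_95 * se)
    upper = math.exp(beta + Z_95 * se)
    results[name] = (odds_ratio, lower, upper)
    print(f"{name}: OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f}), z = {beta / se:.2f}")

# Without an interaction term the model is multiplicative: the OR for
# exposure to both factors (vs neither) is the product of the two ORs.
joint_or = results["Exc"][0] * results["Smk"][0]
print(f"Exc and Smk together vs neither: OR = {joint_or:.2f}")
```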

    Interaction between smoking and exercise?

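    Product term: β3 = -0.4604 (SE 0.5332); Wald test = 0.75 (1 df)

    -2 log(L) = 342.092 with the interaction term

    -2 log(L) = 342.836 without the interaction term

    LR statistic = 0.74 (1 df), p = 0.39: no evidence of any interaction

    $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1\,\text{Exc} + \beta_2\,\text{Smk} + \beta_3\,\text{Exc}\times\text{Smk}$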
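The Wald statistic reported here is (β3 / SE)² referred to a χ² distribution with 1 df. This is equivalent to a two-sided normal test on z = β3 / SE. A short standard-library check reproduces the reported value of 0.75 and shows that the Wald and LR p-values are close in this example. For 1 df the χ² upper tail equals erfc(√(x/2)).

```python
import math

beta3, se3 = -0.4604, 0.5332               # product term and its SE
z = beta3 / se3
wald = z ** 2                              # Wald chi-square, 1 df
p_wald = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p = chi-square(1 df) tail

lr = 342.836 - 342.092                     # -2logL without minus -2logL with interaction
p_lr = math.erfc(math.sqrt(lr / 2))        # chi-square(1 df) upper tail

print(f"Wald = {wald:.2f}, p = {p_wald:.2f}")
print(f"LR   = {lr:.2f}, p = {p_lr:.2f}")
```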

    Coding of variables (1)

    Dichotomous variables: yes = 1, no = 0

    Continuous variables:

    Increase in OR for a one-unit change in the exposure variable

    The logistic model is multiplicative: OR increases exponentially with x

    If OR = 2 for a one-unit change in exposure and x increases from 2 to 5: OR = 2 × 2 × 2 = 2³ = 8

    Verify that OR increases exponentially with x

    When in doubt, treat as a qualitative variable
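The multiplicative rule can be written as OR(Δx) = OR₁^Δx = e^(βΔx), where OR₁ = e^β is the OR for a one-unit change. This tiny Python sketch reproduces the example of OR = 2 per unit with x going from 2 to 5. The function name is our own.

```python
import math


def or_for_change(or_per_unit, x_from, x_to):
    """OR implied by a logistic model when x moves from x_from to x_to."""
    beta = math.log(or_per_unit)  # coefficient for a one-unit change
    return math.exp(beta * (x_to - x_from))


print(f"{or_for_change(2, 2, 5):.1f}")  # 2 x 2 x 2
print(f"{or_for_change(2, 2, 3):.1f}")  # a single unit
```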

    Continuous variable?

    Relationship between SBP > 160 mmHg and body weight (BW)

    Introduce BW as a continuous variable?

    Code weight as a single variable, e.g. 3 equal classes: 40-60 kg = 0, 60-80 kg = 1, 80-100 kg = 2

    Check that this coding is compatible with the assumption of a multiplicative model

    If not compatible, use indicator variables
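Below is a sketch of the single-variable weight coding in Python. The class limits (40-60, 60-80, 80-100 kg) do not say which class a boundary value such as 60 kg falls into. The helper therefore assumes lower-inclusive classes: 40 ≤ w < 60, 60 ≤ w < 80, 80 ≤ w ≤ 100. The sketch also shows what the multiplicative assumption implies. With a single coefficient β for the class code, the OR for class 2 versus class 0 is forced to be the square of the OR for class 1 versus class 0. The coefficient value used is hypothetical.

```python
import math


def weight_class(weight_kg):
    """Code body weight as 0, 1 or 2 (assumed lower-inclusive 20 kg classes)."""
    if not 40 <= weight_kg <= 100:
        raise ValueError("weight outside the 40-100 kg range of the coding")
    if weight_kg < 60:
        return 0
    if weight_kg < 80:
        return 1
    return 2


def implied_or(beta, code):
    """OR versus class 0 implied by a single coefficient beta for the class code."""
    return math.exp(beta * code)


beta = math.log(1.5)  # hypothetical coefficient: OR 1.5 per class
print([weight_class(w) for w in (45, 60, 79.9, 80, 100)])
print(implied_or(beta, 1), implied_or(beta, 2))  # 1.5 and 1.5 squared
```

If the observed ORs per class do not follow this pattern, the single-variable coding is not compatible with the model, and indicator variables are the alternative.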

    Indicator variables: type of tobacco

    Neutralises the artificial hierarchy between classes in the variable "type of tobacco"

    No assumption made about the ordering or spacing of the classes

    3 indicator variables (3 df) in the model, all using the same reference category

    OR for each type of tobacco, adjusted for the others, in reference to non-smoking
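Building the indicator variables is mechanical. In this sketch the three tobacco types are hypothetical placeholders (`type_a`, `type_b`, `type_c`), because the category names were not preserved. Each subject gets three 0/1 variables. A non-smoker, the reference category, has all three equal to 0, so each coefficient compares one tobacco type with non-smoking.

```python
REFERENCE = "non-smoker"
TYPES = ["type_a", "type_b", "type_c"]  # hypothetical tobacco categories


def indicators(category):
    """Return the three 0/1 indicator variables for one subject."""
    if category != REFERENCE and category not in TYPES:
        raise ValueError(f"unknown category: {category!r}")
    return [1 if category == t else 0 for t in TYPES]


subjects = ["non-smoker", "type_a", "type_c", "type_b", "non-smoker"]
for s in subjects:
    print(f"{s:<11} {indicators(s)}")
```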

    Risk of death from bacterial meningitis according to treatment

    161 observations; outcome: death (yes, no)

    Treatment

    1 = Chloramphenicol, 2 = Ampicillin

    Delay before treatment (from onset, in days)

    Convulsions (1, 0)

    Level of consciousness (1-3)

    Severity of dehydration (1-3)

    Age in years

    Pathogen

    1 = others, 2 = Hib, 3 = Streptococcus pneumoniae
