HSRP 734: Advanced Statistical Methods June 19, 2008



Extensions of Logistic Regression

• Outcomes with more than 2 categories

– Categories have order

– Unordered

• Conditional logistic regression

– Analysis of matched data

Extensions of Logistic Regression

• Exact methods for small samples

– Fisher’s exact

– Exact logistic regression

• Correlated/Clustered data

– GEE method

– Mixed models

Extensions of Logistic Regression

• Outcomes with more than 2 categories (polytomous or polychotomous)

• Cumulative logit model – Proportional odds model for ordinal outcomes (ordered categories)

• Generalized logit model for nominal outcomes or non-proportional odds models (unordered categories)

Extensions of Logistic Regression

• Cumulative logit model

– Fits a logistic regression model with g-1 intercepts for a g category outcome and one model coefficient for each predictor

– Models cumulative probability of being in a “lower” category
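A minimal sketch of the cumulative logit model described above, assuming one predictor x and the parameterization logit P(Y <= j | x) = intercept_j + beta * x. The intercepts and slope are hypothetical values chosen for illustration, and sign conventions differ between software packages.

```python
import math

def cumulative_logit_probs(intercepts, beta, x):
    """Category probabilities for a g-category ordinal outcome.

    Assumed model: logit P(Y <= j | x) = intercepts[j] + beta * x,
    with g-1 increasing intercepts and a single slope shared by all cut-points.
    """
    # Cumulative probabilities P(Y <= j) for the first g-1 categories.
    cum = [1.0 / (1.0 + math.exp(-(a + beta * x))) for a in intercepts]
    cum.append(1.0)  # P(Y <= last category) is always 1
    # Differences of adjacent cumulative probabilities give P(Y = j).
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def cumulative_odds(probs, j):
    """Odds of being in category j or lower."""
    lower = sum(probs[: j + 1])
    return lower / (1.0 - lower)

# Hypothetical parameters: g = 3 categories, so g-1 = 2 intercepts.
intercepts = [-1.0, 0.5]
beta = 0.8

p0 = cumulative_logit_probs(intercepts, beta, 0.0)
p1 = cumulative_logit_probs(intercepts, beta, 1.0)

# The odds ratio for a one-unit increase in x is the same at every cut-point.
odds_ratios = [cumulative_odds(p1, j) / cumulative_odds(p0, j) for j in range(2)]
```

Under these assumptions both entries of odds_ratios equal exp(beta), which is the single odds ratio the model reports for the predictor.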

Ordinal Logistic Regression

• Odds ratios are interpreted as the “% increase/decrease in the odds of being in a lower/higher category”

• Subject to the “Proportional Odds” assumption

Extensions of Logistic Regression

• Generalized logit model

– Fits a logistic regression model with g-1 intercepts and g-1 model coefficients for each predictor for a g category outcome

– Model captures the multinomial probability of being in a particular category using generalized logits
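A minimal sketch of the generalized logit model, assuming one predictor x and taking the last category as the reference. All parameter values are hypothetical.

```python
import math

def generalized_logit_probs(intercepts, betas, x):
    """Multinomial probabilities for a g-category nominal outcome.

    Assumed model: log(P(Y = j) / P(Y = ref)) = intercepts[j] + betas[j] * x
    for each of the g-1 non-reference categories. The reference category
    (placed last) has its linear predictor fixed at 0.
    """
    etas = [a + b * x for a, b in zip(intercepts, betas)] + [0.0]
    denom = sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas]

# Hypothetical parameters: g = 3 categories, so 2 intercepts and 2 slopes.
intercepts = [0.2, -0.5]
betas = [1.0, -0.3]

p0 = generalized_logit_probs(intercepts, betas, 0.0)
p1 = generalized_logit_probs(intercepts, betas, 1.0)

# Odds ratio for category 0 versus the reference, per one-unit increase in x.
or_cat0_vs_ref = (p1[0] / p1[2]) / (p0[0] / p0[2])
```

Here or_cat0_vs_ref equals exp(betas[0]). Each non-reference category has its own coefficient, so there is one such odds ratio per category.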

Nominal Logistic Regression

• Odds ratios have the regular interpretation, but each one compares a given category against the reference category, so be careful about which comparison is being made

• Does not assume “Proportional Odds”

SAS

Conditional logistic regression

• Can use for matched data (e.g., case-control studies)

• Provides unbiased estimates of odds ratios and confidence intervals (CIs)
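For the special case of 1:1 matched pairs with a single binary exposure, the conditional maximum likelihood estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs. The sketch below illustrates only that special case, with hypothetical counts and a Wald interval on the log scale. It is not a general conditional logistic regression fit.

```python
import math

def matched_pairs_odds_ratio(case_only_exposed, control_only_exposed, z=1.96):
    """Conditional odds ratio estimate for 1:1 matched case-control pairs.

    case_only_exposed: pairs where the case is exposed and the control is not.
    control_only_exposed: pairs where the control is exposed and the case is not.
    Concordant pairs carry no information about the odds ratio and are ignored.
    """
    or_hat = case_only_exposed / control_only_exposed
    # Large-sample standard error of log(OR) based on the discordant pairs.
    se_log = math.sqrt(1.0 / case_only_exposed + 1.0 / control_only_exposed)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Hypothetical discordant-pair counts.
or_hat, ci_lower, ci_upper = matched_pairs_odds_ratio(30, 10)
```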

SAS

Extensions to Logistic Regression

• Exact Logistic Regression

• Small Sample Size

• Adequate sample size but rare event (sparse data)

Fisher’s exact test

• Exact test for RxC table where Chi-square test assumptions are doubtful

• Why not always use Fisher’s exact test and Exact logistic regression?
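A minimal sketch of Fisher’s exact test for the 2x2 special case, using only the hypergeometric distribution. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one, which is one common convention. The counts are a hypothetical example.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Hypothetical small-sample table.
p_value = fisher_exact_2x2(3, 1, 1, 3)
```

The loop enumerates every table compatible with the margins, which hints at the cost of exact methods: the number of tables to enumerate grows quickly with sample size and table dimensions.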

SAS

Extensions of Logistic Regression

• Longitudinal data / repeated measures data / Clustered data with binary outcomes

• Multilevel models (nested data structures)

GEE (Generalized Estimating Equations)

GLMM (Generalized Linear Mixed Models)

Two methods for handling clustered outcomes

• Mixed models

– Likelihood based

– Use random effects to model clustered observations

– Continuous outcome (but now extended for categorical)

• Generalized Estimating Equation (GEE)

– Non-likelihood based

– Can handle large number of clusters

– Categorical outcome

GEE

• GEE can be used in

– Longitudinal studies

• repeated measures of the same individual form a cluster

– Community studies

• subjects clustered by neighborhood

– Familial studies

• subjects clustered by family

– Epidemiological studies

• Different forms of clusters – e.g., pedigree

GEE

• In general GEE has 3 sets of parameters to estimate:

– Regression parameter (population-averaged effects)

– Correlation parameter (cluster parameter)

– Scale factor (not uncommon to assume =1)
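One way to see how the three sets of parameters fit together is the usual form of the estimating equations, sketched here in notation that is not in the slides and is introduced only for illustration: $\beta$ for the regression parameters, $\alpha$ for the correlation parameters, $\phi$ for the scale factor, and $i = 1, \dots, K$ indexing clusters.

\[
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2}
\]

Here $y_i$ is the vector of outcomes in cluster $i$, $\mu_i(\beta)$ is its mean vector, $D_i = \partial \mu_i / \partial \beta$, $A_i$ is the diagonal matrix of variance-function values (for a binary outcome, $\mu_{ij}(1 - \mu_{ij})$), and $R(\alpha)$ is the working correlation matrix. With $R(\alpha)$ equal to the identity and $\phi = 1$, the equations reduce to the score equations of ordinary logistic regression.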

Comparing SLR and GEE

| SLR | GEE |
|---|---|
| No dispersion allowed for variance: Var(y) = mu(1-mu) | Dispersion allowed for variance: Var(y) = mu(1-mu)*scale_factor |
| No need to specify correlation matrix | Need to specify correlation structure |
| Has odds ratio interpretation of exp(coefficient) | Has odds ratio interpretation of exp(coefficient) |

GEE

• In its simplest form, GEE can be considered an extension of logistic regression for clustered data

• Clustered data are common

– Time: Longitudinal analysis with repeated measurements on individual (e.g., BL, 1m, 2m, 6m follow-up)

– Individual: Cross-sectional analysis with multiple outcomes (e.g., left eye, right eye)

– Background: Subjects clustered because of common geographical or social background (e.g., clinic)

Correlation structure

• Correlation structure

– Often called the working correlation structure in GEE

– Specifies how the observations within a cluster are related

– Often assumes correlation structure uniform throughout clusters

• Unstructured

– All correlation coefficients free to take any value

– E.g.,

\[
R = \begin{pmatrix}
1 & & & \\
0.3 & 1 & & \\
0.1 & 0.5 & 1 & \\
0.05 & 0.2 & 0.4 & 1
\end{pmatrix}
\]

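• Exchangeable

– Any two responses within the same cluster have the same correlation

– Simple (1 parameter to estimate)

\[
R = \begin{pmatrix}
1 & & & \\
\rho & 1 & & \\
\rho & \rho & 1 & \\
\rho & \rho & \rho & 1
\end{pmatrix}
\]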

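• Autoregressive AR(1)

– Correlation between responses depends on the interval of time between responses

– Farther apart responses => weaker correlation

– Only 1 parameter to estimate!

\[
R = \begin{pmatrix}
1 & & & \\
\rho & 1 & & \\
\rho^2 & \rho & 1 & \\
\rho^3 & \rho^2 & \rho & 1
\end{pmatrix}
\]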
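The exchangeable and AR(1) structures above can each be generated from a single parameter. A minimal sketch follows, with hypothetical function names and hypothetical values of the correlation parameter rho:

```python
def exchangeable(n, rho):
    """n x n working correlation: every pair of responses has correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def ar1(n, rho):
    """n x n working correlation: correlation rho**|i - j| decays with the lag."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

# Four repeated measurements per cluster, hypothetical rho values.
R_exch = exchangeable(4, 0.3)
R_ar1 = ar1(4, 0.5)
```

An unstructured matrix of the same size would instead need 6 separate correlation parameters (one per pair of time points).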

Correlation matrix

• Selection of a “working correlation structure” is at the discretion of the researcher!

• How does the correlation structure affect the results?

Properties of GEE estimators

• How about the variance (standard error) estimates if the “working” correlation matrix is not correctly specified?

• Model-based estimate => not consistent

• Empirical (robust) estimate => still consistent
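A minimal sketch of the difference between the two variance estimates, using the simplest possible case: an overall proportion estimated from clustered binary data under an independence working correlation. This is an intercept-only mean model on the probability scale, not a full GEE fit, and the data are hypothetical.

```python
import math

def proportion_se(clusters):
    """Model-based and empirical (robust) SEs for an overall proportion.

    clusters: list of lists of 0/1 outcomes, one inner list per cluster.
    """
    n = sum(len(c) for c in clusters)
    p = sum(sum(c) for c in clusters) / n
    # Model-based variance: treats all n observations as independent.
    model_var = p * (1 - p) / n
    # Empirical (sandwich) variance: sums residuals within each cluster
    # before squaring, so within-cluster correlation is reflected.
    robust_var = sum(sum(y - p for y in c) ** 2 for c in clusters) / n ** 2
    return p, math.sqrt(model_var), math.sqrt(robust_var)

# Hypothetical data: 4 clusters of 3 outcomes, similar within each cluster.
clusters = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
p_hat, se_model, se_robust = proportion_se(clusters)
```

With these data the robust SE is larger than the model-based SE, because the independence assumption ignores the positive within-cluster correlation.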

Properties of GEE estimators

• Even if the correlation structure is misspecified, the estimate of the logistic regression coefficients is still consistent

– If the correlation is misspecified, the estimate is not as efficient (SE is larger)

– This property contributes to the popularity of GEE

• GEE works well with larger #’s of clusters

SAS

Review