Analyzing Binary Data - SAS Group Presentations...

Analyzing complex binary data using SAS (by a non-statistician)
Jaswant Singh, Veterinary Biomedical Sciences



"Most researchers use statistics the way a drunkard uses a lamp-post: more for support than illumination."

- Winfred Castle

Stat 101

Dependent Variable (Outcome)

Independent Variables (Predictor)

Covariates (Confounders)

Variable types:
– Categorical (Qualitative)
  • Nominal, Dichotomous, Ordinal/count
– Continuous (Quantitative)

Fixed versus random factors

First things first…

What is the primary question that I am going to answer?

Are there any secondary questions?

Understand your model:
– What is my dependent variable?
– What is/are my independent variable(s)?
– What is the type of data?

Are there any confounders (covariates)?

Simplest Scenario

Response variable: Binary

Independent variable: Categorical

For example, my Dean would like to know: does the Maclean’s prestige rating of an institution matter for admission into a graduate program at UofS?

Let’s generate a frequency table (number of students):

              Prestige
              High    Low
  Rejected     125    148
  Admitted      87     40

2x2 contingency table: Chi-square test

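Data Grad;
   Input Prestige $ Admission $ Number;   /* Prestige: 1 = High, 2 = Low */
   Cards;
1 Rejected 125
1 Admitted 87
2 Rejected 148
2 Admitted 40
;
Proc freq data=Grad;
   Weight Number;   /* each row holds a cell count, not one student */
   Tables Prestige*Admission / chisq exact nocol norow;
Run;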

Chi-Square: P-value=0.001
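As a hand check on the PROC FREQ output, the Pearson (uncorrected) chi-square for a 2x2 table has a closed form in the four cell counts a, b, c, d and the total N. Plugging in the counts above (a = 125 and b = 148 rejected; c = 87 and d = 40 admitted; High in the first column; N = 400):

```latex
\begin{aligned}
\chi^2 &= \frac{N\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} \\
       &= \frac{400\,(125 \cdot 40 - 148 \cdot 87)^2}{273 \cdot 127 \cdot 212 \cdot 188}
        = \frac{400\,(-7876)^2}{1\,381\,847\,376} \approx 17.96
\end{aligned}
```

With 1 degree of freedom the 0.001 critical value is 10.83, so P < 0.001, in line with the reported result. The High and Low counts equal the sums of ranks 1 + 2 and 3 + 4 in the four-level table that follows, which suggests how the two prestige groups were formed.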

The Dean got interested, but…

Now the Dean wants us to test the institutional prestige rating on a 1 to 4 scale (best to worst).

              Prestige rank (highest to lowest)
              1     2     3     4
  Rejected   28    97    93    55
  Admitted   33    54    28    12

(Number of students)

Chi-square test: P-value = 0.001, degrees of freedom = 3
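The slides show no code for the four-level table, so here is a minimal sketch that adapts the 2x2 PROC FREQ step above. The data set name Grad4 and the variable name Rank are assumptions chosen for illustration; the counts are the ones in the table.

```sas
/* Hypothetical data set Grad4: one row per cell of the 4 x 2 table */
data Grad4;
   input Rank Admission $ Number;
   datalines;
1 Rejected 28
1 Admitted 33
2 Rejected 97
2 Admitted 54
3 Rejected 93
3 Admitted 28
4 Rejected 55
4 Admitted 12
;
run;

proc freq data=Grad4;
   weight Number;   /* cell counts, not individual students */
   /* CHISQ: Pearson chi-square; EXPECTED: print expected counts per cell */
   tables Rank*Admission / chisq expected nocol norow;
run;
```

By hand, each expected count is (row total x column total) / N; for these counts the Pearson statistic, the sum of (O - E)^2 / E, works out to about 25.24 on (2 - 1)(4 - 1) = 3 degrees of freedom, beyond the 0.001 critical value of 16.27.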

Simple Situation

An Associate Vice-President (Research) is interested in knowing what other factors affect admission into graduate school.

Variables of interest (independent variables):
– GRE score: continuous
– Percent marks: continuous
– Prestige of the undergraduate institution: rank (1 to 4)

Outcome or response variable:
– Admission to graduate school: Yes / No (binary)

Example and data from UCLA Academic Technology Services: http://www.ats.ucla.edu/stat/sas/dae/logit.htm
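Written as an equation, the question becomes a logistic regression of the probability of admission p on the three predictors. The sketch below assumes reference (dummy) coding for rank with rank 4 as the baseline, so R_1, R_2 and R_3 are 0/1 indicators for ranks 1 to 3:

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1\,\mathrm{GRE} + \beta_2\,\mathrm{Mark}
  + \beta_3 R_1 + \beta_4 R_2 + \beta_5 R_3
```

Each slope is a change in log odds: e^{\beta_1} is the odds ratio for a one-point increase in GRE with the other predictors held fixed.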

Logistic Regression

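(Rank = prestige of the undergraduate institution, 1 = highest to 4 = lowest; Admission: 1 = admitted, 0 = rejected)

  GRE   Mark   Rank   Admission
  660    82     3       1
  800    90     1       1
  640    70     4       1
  520    63     4       0
  760    65     2       1
  560    65     1       1
  400    67     2       0
  540    75     3       1
  700    88     2       0
  800    90     4       0
  440    71     1       0
  760    90     1       1
  700    67     2       0
  700    90     1       1
  480    76     3       0
  780    87     4       0
  …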

Data

proc means;
   var gre mark;
run;

proc freq;
   tables rank admission admission*rank;
run;

SAS Code

Proc Logistic

proc logistic descending;
   class rank / param=ref;
   model admission = gre mark rank;
   contrast 'Rank 1 vs 2' rank 1 -1 0 / estimate=parm;
   contrast 'Rank 2 vs 3' rank 0 1 -1 / estimate=parm;
   contrast 'GRE200' intercept 1 gre 200 mark 74.78 rank 0 1 0 / estimate=prob;
   contrast 'GRE300' intercept 1 gre 300 mark 74.78 rank 0 1 0 / estimate=prob;
   contrast 'GRE400' intercept 1 gre 400 mark 74.78 rank 0 1 0 / estimate=prob;
   contrast 'GRE500' intercept 1 gre 500 mark 74.78 rank 0 1 0 / estimate=prob;
   contrast 'GRE600' intercept 1 gre 600 mark 74.78 rank 0 1 0 / estimate=prob;
   contrast 'GRE700' intercept 1 gre 700 mark 74.78 rank 0 1 0 / estimate=prob;
   contrast 'GRE800' intercept 1 gre 800 mark 74.78 rank 0 1 0 / estimate=prob;
Run;
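To see what the CONTRAST rows estimate, write the model as logit(p) = β0 + β1 GRE + β2 Mark + β3 R1 + β4 R2 + β5 R3, where R1 to R3 are the rank indicators created by param=ref (rank 4 is the reference level by default). Each contrast is a linear combination Lβ of these coefficients:

```latex
\begin{aligned}
\text{'Rank 1 vs 2'}:&\quad L\beta = \beta_3 - \beta_4 = \ln \mathrm{OR}_{1\ \mathrm{vs}\ 2} \\
\text{'Rank 2 vs 3'}:&\quad L\beta = \beta_4 - \beta_5 = \ln \mathrm{OR}_{2\ \mathrm{vs}\ 3} \\
\text{'GRE200'}:&\quad \eta = \beta_0 + 200\,\beta_1 + 74.78\,\beta_2 + \beta_4,
  \qquad \hat p = \frac{1}{1 + e^{-\eta}}
\end{aligned}
```

ESTIMATE=PARM reports Lβ on the log-odds scale (exponentiate it to get the odds ratio), while ESTIMATE=PROB applies the inverse logit, so the GRE200 to GRE800 rows trace predicted admission probability across GRE scores for a rank-2 applicant with Mark = 74.78.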

How about crossed categorical factors?

A researcher (me!) is interested in examining factors that lead to a successful pregnancy outcome:
– Blood progesterone levels during the previous cycle (luteal- vs. subluteal-P4)
– Time between luteolysis and exogenous LH (long vs. short)
– Can subluteal progesterone compensate for a short treatment time? (P4*LH interaction)
– Does parity matter? (first-time moms vs. others)
– Data were gathered over 2 years (replicates 1 and 2)

Approaches

LOGISTIC

GENMOD

GLIMMIX

GLM / PROC MIXED
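Of the approaches listed, PROC GENMOD can fit the same fixed-effects logistic model as the first GLIMMIX run below. This is a minimal sketch, assuming a data set named Preg (hypothetical) that holds the variables in the data listing, with Preg coded 1 = pregnant and 0 = not pregnant.

```sas
/* Sketch: fixed-effects logistic model with PROC GENMOD.            */
/* Assumption: data set Preg (hypothetical name) with Preg coded 0/1. */
proc genmod data=Preg descending;   /* DESCENDING: model Pr(Preg = 1) */
   class Progest Proest Type Replicate;
   /* binomial distribution + logit link = logistic regression;      */
   /* TYPE3 requests likelihood-ratio tests for each model effect    */
   model Preg = Progest Proest Type Replicate
                Progest*Proest Progest*Type Proest*Type
                / dist=bin link=logit type3;
   /* LS-means on the logit scale; ILINK adds back-transformed probabilities */
   lsmeans Progest*Proest / diff ilink;
run;
```

GENMOD treats Replicate only as a fixed effect here; a random Replicate intercept needs GLIMMIX, as in the mixed-factor version.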

Glimmix – Fixed Factors

PROC glimmix method=quad;
   CLASS Progest Proest Type Replicate;
   MODEL Preg (event="1") = Progest Proest Type Replicate
         Progest*Proest Progest*Type Proest*Type / dist=bin link=logit;
   LSMEANS Progest*Proest / diff lines ilink or adjust=tukey;
run;

  ID    Replicate   Progest   Proest   Type   Foll_Dia   Preg
  32    1           High      Long     A      14         0
  46    1           High      Long     A      12         1
  134   1           High      Long     A      11         1
  171   1           High      Long     B      11         0
  178   2           High      Long     B      12         1
  12    2           High      Long     A      16         1
  34    2           High      Long     A      15         1
  36    2           High      Long     A      15         0
  82    2           High      Long     B      15         1
  1     1           High      Short    B      9          0
  17    1           High      Short    A      9          0
  21    1           High      Short    A      10         0
  53    1           High      Short    A      12         0
  …

Data
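The fixed-factor GLIMMIX model can be written out to show which terms it estimates. In this sketch the symbols are chosen for illustration: i indexes Progest, j Proest, k Type and l Replicate, and π is the probability that Preg = 1 (dist=bin treats each cow's outcome as Bernoulli):

```latex
\operatorname{logit}(\pi_{ijkl}) = \mu + \alpha_i + \tau_j + \gamma_k + \rho_l
  + (\alpha\tau)_{ij} + (\alpha\gamma)_{ik} + (\tau\gamma)_{jk}
```

Replicate enters as an ordinary fixed effect ρ_l. The LSMEANS ILINK option back-transforms each Progest*Proest least-squares mean η̂ to a probability 1/(1 + e^{-η̂}), and OR reports e raised to each pairwise difference, that is, an odds ratio.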

Glimmix – Mixed Factors

PROC glimmix method=quad;
   CLASS Progest Proest Type Replicate;
   MODEL Preg (event="1") = Progest Proest Type
         Progest*Proest Progest*Type Proest*Type / dist=bin link=logit;
   Random intercept / subject=Replicate;
   LSMEANS Progest*Proest / diff lines ilink or adjust=tukey;
run;
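The mixed-factor version drops Replicate from the fixed part and replaces it with a random intercept for each replicate. With i indexing Progest, j Proest, k Type and l Replicate (symbols chosen for illustration), the model becomes:

```latex
\operatorname{logit}(\pi_{ijkl}) = \mu + \alpha_i + \tau_j + \gamma_k
  + (\alpha\tau)_{ij} + (\alpha\gamma)_{ik} + (\tau\gamma)_{jk} + u_l,
  \qquad u_l \sim N(0, \sigma_u^2)
```

METHOD=QUAD fits this by maximum likelihood, integrating over u_l with adaptive Gauss-Hermite quadrature. With only two replicates, σ_u² is estimated from two levels, so expect it to be imprecise.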


Conclusions

Use the KISS principle

A dichotomous response variable can be analyzed by:
– Chi-square test
– Logistic regression
– GENMOD / GLIMMIX