4.3 GENERALIZED LINEAR MODELS FOR COUNTS

Posted on 03-Jan-2016

STA 517 – Chp4 Introduction to Generalized Linear Models

4.3 GENERALIZED LINEAR MODELS FOR COUNTS

Count data: usually assumed to follow a Poisson distribution.

Counts arise, for example, in contingency tables with categorical response variables.

This section: modeling count or rate data for a single discrete response variable.

4.3.1 Poisson Loglinear Models

The Poisson distribution has a positive mean µ. Although a GLM can model a positive mean using the identity link, it is more common to model the log of the mean.

Like the linear predictor $\alpha + \beta x$, the log mean can take any real value.

The log mean is the natural parameter for the Poisson distribution, and the log link is the canonical link for a Poisson GLM.

A Poisson loglinear GLM assumes a Poisson distribution for Y and uses the log link.
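To see why log µ is called the natural parameter, write the Poisson pmf in exponential-family form: $f(y;\mu) = \exp(y \log\mu - \mu - \log y!)$, so the term multiplying y is log µ. The Python sketch below (µ = 2.5 is an arbitrary illustrative value; the function names are hypothetical) checks numerically that the usual formula and the natural-parameter form agree.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson distribution with mean mu, from the usual formula."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def poisson_pmf_natural(y: int, theta: float) -> float:
    """The same pmf in exponential-family form with natural parameter theta = log(mu):
    f(y; theta) = exp(y * theta - exp(theta) - log y!)."""
    return math.exp(y * theta - math.exp(theta) - math.lgamma(y + 1))

mu = 2.5                  # hypothetical mean, for illustration only
theta = math.log(mu)      # the log link maps mu to the natural parameter
for y in range(8):
    print(y, poisson_pmf(y, mu), poisson_pmf_natural(y, theta))
```

Because the log link sets the linear predictor equal to this natural parameter, it is the canonical link.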

Loglinear model

The Poisson loglinear model with explanatory variable X is

$$\log \mu = \alpha + \beta x.$$

For this model, the mean satisfies the exponential relationship

$$\mu = \exp(\alpha + \beta x) = e^{\alpha}\,(e^{\beta})^{x}.$$

A 1-unit increase in x has a multiplicative impact of $e^{\beta}$ on µ: the mean at x + 1 equals the mean at x multiplied by $e^{\beta}$.
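A quick numerical check of the multiplicative interpretation, as a Python sketch: the coefficients alpha and beta below are hypothetical values chosen only for illustration, not estimates from any data set. Whatever x is, the ratio of successive means is $e^{\beta}$.

```python
import math

# Hypothetical coefficients for illustration only (not fitted values).
alpha, beta = -3.0, 0.16

def mean_count(x: float) -> float:
    """Poisson loglinear mean: mu(x) = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

for x in (22.0, 26.0, 30.0):
    # The ratio mu(x+1)/mu(x) is the same at every x and equals exp(beta).
    print(x, mean_count(x + 1) / mean_count(x), math.exp(beta))
```

On the log scale the effect of x is additive (β); on the mean scale it is multiplicative ($e^{\beta}$).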

4.3.2 Horseshoe Crab Mating Example

A study of nesting horseshoe crabs: each female horseshoe crab had a male crab resident in her nest. Aim: identify factors affecting whether the female crab had any other males, called satellites, residing nearby.

Explanatory variables: C, the female crab’s color; S, spine condition; Wt, weight; W, carapace width.

Outcome: number of satellites (Sa) of a female crab.

For now, we study only W (carapace width).

Number of satellites as a function of width: Sa = f(W)

Scatter plot (N = 173): only weakly linear?

Grouped plot: to get a clearer picture, we grouped the female crabs into width categories and calculated the sample mean number of satellites for female crabs in each category.

Figure 4.4 plots these sample means against the sample mean width for crabs in each category.

The sample means show a strong increasing trend.

Why?

SAS code

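data table4_3;
   /* C = color, S = spine condition, W = carapace width, Wt = weight, Sa = number of satellites */
   input C S W Wt Sa @@;   /* @@ reads several observations from each data line */
   cards;
2 3 28.3 3.05 8 3 3 22.5 …
;
run;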

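/* Poisson GLM with identity link: mu = alpha + beta*W */
proc genmod data=table4_3;
   model Sa=W / dist=poisson link=identity;
   ods output ParameterEstimates=PE1;
run;

/* Poisson loglinear GLM (canonical log link): log(mu) = alpha + beta*W */
proc genmod data=table4_3;
   model Sa=W / dist=poisson link=log;
   ods output ParameterEstimates=PE2;
run;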
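GENMOD fits these models by maximum likelihood. For the log link, the core fitting step can be sketched in Python as follows. This is a hypothetical helper, not how GENMOD is implemented internally; it uses Newton-Raphson, which for the canonical log link coincides with Fisher scoring (IRLS), and it does not cover the identity-link fit.

```python
import math

def fit_poisson_loglinear(x, y, max_iter=50, tol=1e-10):
    """ML fit of log(mu_i) = a + b * x_i by Newton-Raphson
    (equal to Fisher scoring / IRLS for the canonical log link)."""
    n = len(x)
    xbar = sum(x) / n
    xc = [xi - xbar for xi in x]          # center x for numerical stability
    a, b = math.log(sum(y) / n), 0.0      # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in xc]
        # score vector X'(y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(xc, y, mu))
        # information matrix X' diag(mu) X (2 x 2, symmetric)
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(xc, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(xc, mu))
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # undo the centering: a + b*(x - xbar) = (a - b*xbar) + b*x
    return a - b * xbar, b

# Hypothetical data: the group means are 2 at x = 0 and 4 at x = 1,
# so the fit should give a = log 2 and b = log 2.
a, b = fit_poisson_loglinear([0, 0, 1, 1], [1, 3, 2, 6])
print(a, b)
```

At convergence the fitted means satisfy the likelihood equations Σ yᵢ = Σ µ̂ᵢ and Σ xᵢyᵢ = Σ xᵢµ̂ᵢ.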

/* store the fitted coefficients in macro variables;
   CALL SYMPUTX keeps more significant digits than CALL SYMPUT and strips blanks */
data _NULL_;
   set PE1;
   if Parameter="Intercept" then call symputx("intercp1", Estimate);
   if Parameter="W" then call symputx("b1", Estimate);
run;

data _NULL_;
   set PE2;
   if Parameter="Intercept" then call symputx("intercp2", Estimate);
   if Parameter="W" then call symputx("b2", Estimate);
run;

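data tmp;
   /* an integer index avoids accumulating rounding error from BY 0.01,
      so grid values of W can match the data values in the merges that follow */
   do i=2200 to 3200;
      W=i/100;
      mu1=&intercp1 + &b1*W;          /* identity-link fit */
      mu2=exp(&intercp2 + &b2*W);     /* log-link fit */
      output;
   end;
   drop i;
run;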
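The grid of widths is later merged with the observed data BY W, and a BY-group merge matches values exactly. A grid built by repeatedly adding 0.01 (which is what an iterative DO loop with BY 0.01 does, assuming the index is updated by repeated addition) drifts away from the decimal values it is meant to hit, while computing each point as i/100 from an integer index gives the correctly rounded value. The Python sketch below illustrates the effect with IEEE double arithmetic; the function names are hypothetical.

```python
def accumulated_grid(start: float, step: float, n: int) -> list:
    """n grid values built by repeatedly adding `step` (rounding error accumulates)."""
    values, w = [], start
    for _ in range(n):
        values.append(w)
        w += step
    return values

def indexed_grid(first: int, last: int, scale: int = 100) -> list:
    """Grid values i/scale from integer indices; each is a single correctly rounded division."""
    return [i / scale for i in range(first, last + 1)]

drifting = accumulated_grid(22.0, 0.01, 1001)
exact = indexed_grid(2200, 3200)
print(drifting[50], exact[50])           # the 51st point should be 22.5
print(drifting[50] == 22.5, exact[50] == 22.5)
```

With the indexed grid, a grid value such as 28.3 is the same double that results from reading "28.3" as data, so exact matching works.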

Graphs

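proc sort data=table4_3; by W; run;

/* interleave the observed data with the fitted-curve grid, matching on W */
data tmp1; merge table4_3 tmp; by W; run;

symbol1 i=join line=1 color=green value=none;   /* mu1: identity-link fit */
symbol2 i=join line=2 color=red value=none;     /* mu2: log-link fit */
symbol3 i=none value=circle;                    /* Sa: observed counts */

proc gplot data=tmp1;
   plot mu1*W mu2*W Sa*W / overlay;
run;
quit;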

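Group data

/* group the crabs into width categories of 1 cm:
   e.g. widths in [22.25, 23.25) map to W_g = 22.75 */
data table4_3a;
   set table4_3;
   W_g=round(W-0.75)+0.75;
   /* optional pooling of the extreme categories (commented out) */
   *if W<23.25 then W_g=22.5;
   *if W>29.25 then W_g=30.5;
run;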

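/* number of crabs, total and mean satellites, and sample variance per width category */
proc sql;
   create table table4_3g as
   select W_g, count(W_g) as Num_of_Cases,
          sum(Sa) as Num_of_Satellites,
          mean(Sa) as Sa_g, var(Sa) as Var_SA
   from table4_3a
   group by W_g
   order by W_g;   /* sorted by W_g, as the later MERGE ... BY requires */
quit;

proc print data=table4_3g; run;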
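The same binning and per-group summaries can be sketched in Python. One detail matters when porting: SAS ROUND rounds halves away from zero, whereas Python's built-in round() rounds halves to even, so the sketch uses floor(x + 0.5), which agrees with SAS ROUND for the positive values here. The (width, satellites) pairs are made up for illustration and are not the crab data; the variance uses divisor n − 1 and is reported as missing for single-crab groups, as in the SAS output.

```python
import math
from collections import defaultdict
from statistics import mean, variance

def width_bin(w: float) -> float:
    """Mimic W_g = round(W - 0.75) + 0.75 with SAS-style half-up rounding."""
    return math.floor(w - 0.75 + 0.5) + 0.75

def group_summary(widths, counts):
    """Rows of (W_g, cases, total satellites, mean, sample variance or None)."""
    groups = defaultdict(list)
    for w, y in zip(widths, counts):
        groups[width_bin(w)].append(y)
    rows = []
    for w_g in sorted(groups):
        ys = groups[w_g]
        var_y = variance(ys) if len(ys) > 1 else None   # SAS prints '.' when n = 1
        rows.append((w_g, len(ys), sum(ys), mean(ys), var_y))
    return rows

# Hypothetical (width, satellites) pairs for illustration only; not the crab data.
widths = [22.3, 22.9, 24.1, 24.6, 25.2, 25.9, 26.4]
counts = [0, 2, 1, 4, 3, 0, 5]
for row in group_summary(widths, counts):
    print(row)
```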

SAS output

Obs    W_g  Num_of_Cases  Num_of_Satellites     Sa_g   Var_SA
  1  20.75             1                  0  0.00000        .
  2  21.75             1                  0  0.00000        .
  3  22.75            12                 14  1.16667   3.0606
  4  23.75            14                 20  1.42857   8.8791
  5  24.75            28                 67  2.39286   6.5437
  6  25.75            39                105  2.69231  11.3765
  7  26.75            22                 63  2.86364   6.8853
  8  27.75            24                 93  3.87500   8.8098
  9  28.75            18                 71  3.94444  16.8791
 10  29.75             9                 53  5.88889   9.8611
 11  30.75             2                  6  3.00000   0.0000
 12  31.75             2                  6  3.00000   2.0000
 13  33.75             1                  7  7.00000        .
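Because a Poisson variable has Var(Y) = E(Y), comparing the Var_SA and Sa_g columns gives an informal check of the Poisson assumption. The Python sketch below recomputes variance/mean from the values in the table for the groups with at least 5 crabs (the cutoff is an arbitrary choice made here, since variances from 1 or 2 crabs are missing or unstable). Every ratio exceeds 1 and several exceed 4, which points toward more variability than the Poisson model allows.

```python
# (W_g, Num_of_Cases, Sa_g, Var_SA) for groups with at least 5 crabs,
# copied from the grouped SAS output above.
groups = [
    (22.75, 12, 1.16667, 3.0606),
    (23.75, 14, 1.42857, 8.8791),
    (24.75, 28, 2.39286, 6.5437),
    (25.75, 39, 2.69231, 11.3765),
    (26.75, 22, 2.86364, 6.8853),
    (27.75, 24, 3.87500, 8.8098),
    (28.75, 18, 3.94444, 16.8791),
    (29.75, 9, 5.88889, 9.8611),
]

ratios = {w_g: var / mean for w_g, n, mean, var in groups}
for w_g, ratio in ratios.items():
    # Under a Poisson model, Var(Y) = E(Y), so ratios near 1 would be expected.
    print(f"W_g = {w_g:5.2f}   variance/mean = {ratio:.2f}")
```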

Graphs

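/* combine the grouped means with the fitted-curve grid, matching on W */
data tmp2; merge table4_3g(rename=(W_g=W)) tmp; by W; run;

symbol1 i=join line=1 color=green value=none;   /* mu1: identity-link fit */
symbol2 i=join line=2 color=red value=none;     /* mu2: log-link fit */
symbol3 i=none value=circle;                    /* Sa_g: grouped sample means */

proc gplot data=tmp2;
   plot mu1*W mu2*W Sa_g*W / overlay;
run;
quit;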

4.3.3 Overdispersion for Poisson GLMs

Solution?
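One widely used remedy is a quasi-likelihood adjustment: keep the Poisson fit, estimate a dispersion factor φ as the Pearson statistic divided by its residual degrees of freedom, and multiply the model-based standard errors by √φ. A minimal Python sketch follows; the counts, fitted means, and standard errors are made up for illustration, and the helper names are hypothetical. The negative binomial model is another remedy.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Pearson X^2 / (n - p). Values well above 1 suggest more variation
    than the Poisson variance Var(Y) = mu allows."""
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2 / (len(y) - n_params)

def scaled_standard_errors(std_errors, phi):
    """Quasi-likelihood adjustment: multiply model-based SEs by sqrt(phi)."""
    return [se * math.sqrt(phi) for se in std_errors]

# Hypothetical counts and fitted means from a 2-parameter model (illustration only).
y = [1, 3, 2, 6]
mu = [2.0, 2.0, 4.0, 4.0]
phi = pearson_dispersion(y, mu, n_params=2)
print(phi, scaled_standard_errors([0.2, 0.05], phi))
```

The point estimates are unchanged; only the standard errors (and hence tests and intervals) are inflated.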

4.3.4 Negative binomial GLMs

/* fit a negative binomial GLM with identity link to account for overdispersion */
proc genmod data=table4_3;
   model Sa=W / dist=NEGBIN link=identity;
   ods output ParameterEstimates=PE3;
run;
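A negative binomial GLM keeps E(Y) = µ but allows Var(Y) = µ + Dµ², where D > 0 is a dispersion parameter (the Poisson is the limit D → 0). The Python sketch below writes the pmf in this (µ, D) parameterization and checks the mean and variance numerically by summing the pmf; µ = 3 and D = 1.1 are arbitrary illustrative values, not estimates from the crab data.

```python
import math

def negbin_pmf(y: int, mu: float, d: float) -> float:
    """Negative binomial P(Y = y) with mean mu and dispersion d > 0,
    so that Var(Y) = mu + d * mu**2 (shape k = 1/d)."""
    k = 1.0 / d
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def numeric_moments(pmf, max_y: int = 1000):
    """Total probability, mean and variance from summing the pmf over 0..max_y."""
    probs = [pmf(y) for y in range(max_y + 1)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

mu, d = 3.0, 1.1      # illustrative values only
total, mean, var = numeric_moments(lambda y: negbin_pmf(y, mu, d))
print(total, mean, var, mu + d * mu ** 2)
```

The quadratic term Dµ² lets the variance grow faster than the mean, which is the pattern seen in the grouped crab data.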

4.3.6 Poisson GLM of independence in I × J contingency tables
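Treating the cell counts of an I × J table as Poisson, independence corresponds to the loglinear model $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$, whose ML fitted values are $\hat\mu_{ij} = n_{i+} n_{+j} / n$. A small Python sketch with a made-up 2 × 2 table computes these fitted values and the Pearson statistic X², which has (I − 1)(J − 1) degrees of freedom; the helper names are hypothetical.

```python
import math

def independence_fit(table):
    """ML fitted counts under independence: mu_ij = (row total i)(column total j) / n."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]

def pearson_x2(table, fitted):
    """Pearson statistic: sum over cells of (observed - fitted)^2 / fitted."""
    return sum((o - e) ** 2 / e
               for obs_row, fit_row in zip(table, fitted)
               for o, e in zip(obs_row, fit_row))

counts = [[10, 20],
          [30, 40]]          # hypothetical 2 x 2 table
fitted = independence_fit(counts)
x2 = pearson_x2(counts, fitted)
df = (len(counts) - 1) * (len(counts[0]) - 1)
print(fitted, x2, df)

# Additivity on the log scale: the log odds ratio of the fitted values is 0.
log_or = (math.log(fitted[0][0]) + math.log(fitted[1][1])
          - math.log(fitted[0][1]) - math.log(fitted[1][0]))
print(log_or)
```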
