THE POISSON & NEGATIVE BINOMIAL MODELS By: ALVARD AYRAPETYAN.




OUTLINE OF PRESENTATION

Poisson Regression
• Model Assumptions, Assessment, and Interpretations
• Applications in SAS and R
• Quick Programming in SPSS and MINITAB

Negative Binomial
• Model Assumptions, Assessment, and Interpretations
• Applications in SAS and R
• Quick Programming in SPSS


ASSUMPTIONS FOR POISSON MODEL

• Events are counted over a fixed period of time
• Events occur at a constant rate
• Events must be independent
• The dependent variable's conditional mean and variance must be equal
• The dependent variable must be a non-negative integer


THE POISSON MODEL

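Random Component: Poisson distribution for the number of lead changes

Mass Function:

$$P(Y = y \mid X_1, X_2, X_3) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$

with E(Y) = µ and V(Y) = µ

Systematic Component:

$$\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3$$

Link Function: g(µ) = log(µ), so that

$$g(\mu) = \log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 \quad\Longleftrightarrow\quad \mu = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3}$$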
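As a rough illustration of how the random component, systematic component, and log link fit together, the R sketch below evaluates a linear predictor, applies the inverse link, and computes the Poisson mass function by hand. The coefficient and predictor values are hypothetical and chosen only for illustration; they are not estimates from the NBA data.

```r
# Hypothetical coefficients (beta0..beta3) and predictor values (X1..X3).
# These numbers are illustrative assumptions, not fitted values.
beta <- c(0.5, 0.10, -0.20, 0.05)
x    <- c(1, 2.0, 1.0, 4.0)          # the leading 1 multiplies the intercept

eta <- sum(beta * x)                  # systematic component (linear predictor)
mu  <- exp(eta)                       # inverse of the log link: mu = e^eta

# Poisson mass function written out from its formula
pois_pmf <- function(y, mu) exp(-mu) * mu^y / factorial(y)

y       <- 0:5
manual  <- pois_pmf(y, mu)
builtin <- dpois(y, lambda = mu)      # base R implementation, for comparison

print(cbind(y, manual, builtin))
```

The two probability columns should agree, which is a quick way to confirm that the formula and the link are being read correctly.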


EXAMPLES OF POISSON DISTRIBUTION

• Number of earthquakes in a region

• Number of accidents on a highway in a certain area in a specified time

• Number of telephone calls received in one hour

• Number of customers that enter a bank in one hour

• Number of times an elderly person will fall in a month


INTERPRETING COEFFICIENTS

CONTINUOUS PREDICTOR

Keeping all else constant, when $x_1$ is increased by one unit, the expected number of Y will go up/down (+/-) by $(\exp(\hat{\beta}_1) - 1) \times 100\%$.

CATEGORICAL PREDICTOR

Keeping all else constant, when $x_1 = 1$, the expected number of Y is $\exp(\hat{\beta}_1) \times 100\%$ of its value in the reference category ($x_1 = 0$), so it goes up/down (+/-) by $(\exp(\hat{\beta}_1) - 1) \times 100\%$.
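The conversion from a log-scale coefficient to a percent change can be sketched in R as follows. The helper names `pct_change` and `pct_of_reference` and both coefficient values are hypothetical, introduced only to illustrate the arithmetic.

```r
# Percent change in the expected count for a one-unit increase in a predictor
pct_change <- function(beta_hat) 100 * (exp(beta_hat) - 1)

# Expected count at x1 = 1 expressed as a percentage of the count at x1 = 0
pct_of_reference <- function(beta_hat) 100 * exp(beta_hat)

b_continuous  <- 0.05    # hypothetical coefficient of a continuous predictor
b_categorical <- -0.30   # hypothetical coefficient of a 0/1 indicator

pct_change(b_continuous)          # positive value: expected count goes up
pct_of_reference(b_categorical)   # below 100: expected count is lower at x1 = 1
pct_change(b_categorical)         # the same comparison stated as a percent change
```

A positive coefficient gives a percent increase and a negative one a percent decrease; a coefficient of zero leaves the expected count unchanged.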


POTENTIAL PROBLEM WITH POISSON

• OVERDISPERSION-the variance is much larger than the mean

• Negative Binomial is the solution!


THE DATA

• Trying to predict the number of field goal attempts (FGA) in the NBA
• Extracted the top 100 highest scoring players in the NBA for the 2013-2014 season
• The following were used as predictors:
  • Number of games played (GP)
  • Number of defensive rebounds (DREB)
  • Number of assists (AST)
  • Number of steals (STL)
  • Number of blocks (BLK)
  • Number of turnovers (TOV)
  • Number of free throws made (FTM)


SAMPLE OF THE DATA

Rank Player GP FGA DREB AST STL FTM TOV

1 Kevin Love (MIN) 15 268 146 68 13 95 41

2 Kevin Durant (OKC) 12 209 72 62 17 131 45

3 Monta Ellis (DAL) 14 235 42 76 22 85 55

4 Blake Griffin (LAC) 15 242 129 47 19 59 40

5 LeBron James (MIA) 13 201 67 88 12 71 49

6 Evan Turner (PHI) 15 272 85 53 15 71 56

7 Kevin Martin (MIN) 14 248 48 33 18 71 20

8 Paul George (IND) 13 231 72 41 23 70 33

9 LaMarcus Aldridge (POR) 14 285 105 35 19 54 34

10 Carmelo Anthony (NYK) 12 264 79 33 15 72 36

11 Kyrie Irving (CLE) 14 268 40 89 14 55 47

12 Klay Thompson (GSW) 14 212 30 22 12 30 20

13 Dirk Nowitzki (DAL) 14 206 82 33 16 74 25

14 James Harden (HOU) 12 195 45 65 21 91 52

15 Chris Paul (LAC) 15 208 65 188 36 81 44

16 Arron Afflalo (ORL) 13 197 62 61 10 56 33

17 Damian Lillard (POR) 14 225 54 85 11 64 31

18 DeMarcus Cousins (SAC) 13 230 103 31 22 65 36


POISSON-EXAMPLE WITH SAS

/* fit the Poisson regression */
proc genmod data = nba;
  model FGA = GP DREB AST STL TOV FTM / dist=poisson;
run;

/* check goodness of fit for the model: deviance and its DF from the output */
data pvalue;
  df = 93; chisq = 511.6210;
  pvalue = 1 - probchi(chisq, df);
run;

proc print data = pvalue noobs;
run;

/* The p-value is < .0001 (significant), so the model does NOT fit well.
   Deviance/DF = 5.5013 >> 1 indicates major overdispersion. */
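For readers working in R, the same deviance goodness-of-fit check can be sketched with `pchisq()`. The deviance and degrees of freedom are the values used in the SAS data step above; the variable names are only illustrative.

```r
# Deviance and residual degrees of freedom from the Poisson fit
chisq <- 511.6210
df    <- 93

# Upper-tail chi-square probability; the same quantity as
# 1 - probchi(chisq, df) in the SAS data step
pvalue <- pchisq(chisq, df, lower.tail = FALSE)

# Deviance/DF ratio; values far above 1 point to overdispersion
dispersion <- chisq / df

pvalue < 0.0001        # TRUE means the test rejects adequate fit
round(dispersion, 4)
```

A very small p-value here is evidence against the model, because the null hypothesis of this test is that the model fits adequately.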


EXAMPLE RESULTS-GOODNESS OF FIT

The GENMOD Procedure

Model Information
Data Set                       WORK.NBA
Distribution                   Poisson
Link Function                  Log
Dependent Variable             FGA

Number of Observations Read    100
Number of Observations Used    100

Criteria For Assessing Goodness Of Fit
Criterion                   DF    Value         Value/DF
Deviance                    93    511.6210      5.5013
Scaled Deviance             93    511.6210      5.5013
Pearson Chi-Square          93    518.3345      5.5735
Scaled Pearson X2           93    518.3345      5.5735
Log Likelihood                    72301.7048
Full Log Likelihood               -604.2412
AIC (smaller is better)           1222.4824
AICC (smaller is better)          1223.6998
BIC (smaller is better)           1240.7186


RESULTS: Analysis of Maximum Likelihood Parameter Estimates

PARAMETER   DF   ESTIMATE   STANDARD ERROR   WALD 95% CONFIDENCE LIMITS   WALD CHI-SQUARE   PR>CHISQ
Intercept   1    4.1864     0.0749           (4.0396, 4.3332)             3125.02           <.0001
GP          1    0.0422     0.0057           (0.0310, 0.0534)             54.93             <.0001
DREB        1    0.0004     0.0003           (-0.0002, 0.0010)            1.55              0.2131
AST         1    -0.0002    0.0003           (-0.0008, 0.0005)            0.28              0.5995
STL         1    0.0028     0.0012           (0.0004, 0.0052)             5.17              0.0230
TOV         1    0.0057     0.0010           (0.0038, 0.0077)             33.53             <.0001
FTM         1    0.0040     0.0004           (0.0032, 0.0048)             98.23             <.0001
Scale       0    1.000      0                (1.0, 1.0)
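To see what these estimates imply for a single player, the R sketch below plugs Kevin Love's row from the sample of the data into the fitted log-linear equation. It uses the rounded estimates from the table, so the result is approximate; the vector names are illustrative.

```r
# Rounded estimates from the parameter table
coefs <- c(Intercept = 4.1864, GP = 0.0422, DREB = 0.0004, AST = -0.0002,
           STL = 0.0028, TOV = 0.0057, FTM = 0.0040)

# Kevin Love's row from the sample of the data (observed FGA = 268);
# the 1 multiplies the intercept
player <- c(Intercept = 1, GP = 15, DREB = 146, AST = 68,
            STL = 13, TOV = 41, FTM = 95)

eta <- sum(coefs * player[names(coefs)])  # linear predictor on the log scale
mu  <- exp(eta)                           # expected FGA under the Poisson fit
mu
```

With these rounded coefficients the expected value comes out near 248 field goal attempts, against an observed 268.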


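ASSESSMENT OF RESULTS

• Ratio of Deviance/DF = 5.5013 >> 1, which indicates major overdispersion
• Deviance = 511.6210; the model does not fit well because pvalue = 1 - probchi(chisq, df) is < .0001, which IS significant
• Every term is significant except for AST and DREB
• False results are possible if the model is inaccurate: overdispersion makes the standard errors too small, so significance may be overstated
• Must fit a NEGATIVE BINOMIAL model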


POISSON-EXAMPLE WITH R

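# read the NBA data (file path as used in the presentation)
nba <- read.csv("F:/STATS544/nba.csv", header = TRUE)

# Poisson regression of field goal attempts on the six predictors
poiss <- glm(FGA ~ GP + DREB + AST + STL + TOV + FTM, family = "poisson", data = nba)

summary(poiss)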


R-GOODNESS OF FIT

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-5.5397  -1.2614  -0.1643   1.2650   6.2786

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 926.60 on 99 degrees of freedom
Residual deviance: 511.62 on 93 degrees of freedom
AIC: 1222.5


R-ANALYSIS OF PARAMETER ESTIMATES

Call:
glm(formula = FGA ~ GP + DREB + AST + STL + TOV + FTM, family = "poisson",
    data = nba)

Coefficients:
              Estimate    Std. Error  z value  Pr(>|z|)
(Intercept)   4.1864100   0.0748885   55.902   < 2e-16  ***
GP            0.0422013   0.0056940    7.411   1.25e-13 ***
DREB          0.0003719   0.0002987    1.245   0.213
AST          -0.0001778   0.0003387   -0.525   0.600
STL           0.0027777   0.0012221    2.273   0.023    *
TOV           0.0057220   0.0009882    5.790   7.02e-09 ***
FTM           0.0040405   0.0004077    9.911   < 2e-16  ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


POISSON WITH SPSS & MINITAB

SPSS

genlin FGA with GP DREB AST STL TOV FTM
  /model GP DREB AST STL TOV FTM INTERCEPT=YES distribution = poisson link = log
  /print FIT SUMMARY SOLUTION.

MINITAB

Stat > Regression > Poisson Regression > Fit Poisson Model.


Detecting over-dispersion with SAS

Poisson regression gives a DEVIANCE/DF ratio much greater than 1:

proc genmod data = nba;
  model FGA = GP DREB AST STL TOV FTM / dist=poisson;
run;

PROC MEANS shows that the variance of FGA (Y) is much higher than its mean:

proc means data = nba n mean var min max;
  var FGA;
run;


Detecting over-dispersion with R

Poisson regression gives a RESIDUAL DEVIANCE/DF ratio > 1:

poiss <- glm(FGA ~ GP + DREB + AST + STL + TOV + FTM, family = "poisson", data = nba)

summary(poiss)

mean(nba$FGA)
[1] 173.47

var(nba$FGA)
[1] 1684.858
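The two over-dispersion checks can be reduced to simple ratios. The R sketch below uses only numbers already reported in this presentation (the sample mean and variance of FGA, and the Pearson chi-square with its degrees of freedom from the Poisson fit); the variable names are illustrative. Note that the raw variance-to-mean comparison ignores the predictors, so it is only a rough screen.

```r
m <- 173.47      # mean(nba$FGA) as reported above
v <- 1684.858    # var(nba$FGA) as reported above

# Raw variance-to-mean ratio; close to 1 would be consistent with a
# Poisson count that has no predictors
var_to_mean <- v / m
var_to_mean

# Pearson-based dispersion estimate from the Poisson fit statistics
pearson_chisq <- 518.3345
resid_df      <- 93
pearson_disp  <- pearson_chisq / resid_df
pearson_disp
```

Both ratios are well above 1, which agrees with the Deviance/DF ratio reported for the Poisson fit.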


NEGATIVE BINOMIAL REGRESSION

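• Generalization of Poisson regression
• Used for over-dispersed count data
• PMF, with mean µ and dispersion parameter k:

$$P(Y = y \mid X_1, X_2, X_3, k) = \frac{\Gamma(y + 1/k)}{\Gamma(1/k)\,\Gamma(y + 1)} \left(\frac{k\mu}{1 + k\mu}\right)^{y} \left(\frac{1}{1 + k\mu}\right)^{1/k}, \qquad y = 0, 1, 2, \ldots$$

• E(Y) = µ, V(Y) = µ + kµ², where k is the dispersion parameter
• As k → 0, V(Y) → µ, so the NB approaches the Poisson with V(Y) = E(Y) = µ
• Link function same as Poisson: g(µ) = log(µ)
• Equation: log(µ) = β0 + β1X1 + β2X2 + ... + βp-1Xp-1
• Goodness-of-fit test: same as Poisson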
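The R sketch below writes the negative binomial mass function in terms of the mean µ and the dispersion parameter k, so that V(Y) = µ + kµ², and compares it with base R's `dnbinom()` using `size = 1/k`. The values of `mu` and `k` are hypothetical, and `nb_pmf` is an illustrative helper rather than a library function.

```r
# NB mass function in terms of the mean mu and the dispersion parameter k,
# computed on the log scale for numerical stability
nb_pmf <- function(y, mu, k) {
  size <- 1 / k
  exp(lgamma(y + size) - lgamma(size) - lgamma(y + 1) +
        y * log(k * mu / (1 + k * mu)) - size * log(1 + k * mu))
}

mu <- 4      # hypothetical mean
k  <- 0.5    # hypothetical dispersion
y  <- 0:10

manual  <- nb_pmf(y, mu, k)
builtin <- dnbinom(y, size = 1 / k, mu = mu)  # variance is mu + mu^2 / size

# As k shrinks toward 0, the NB probabilities approach the Poisson ones
near_poisson <- nb_pmf(y, mu, k = 1e-6)
max_gap      <- max(abs(near_poisson - dpois(y, mu)))
max_gap
```

The hand-written and built-in probabilities should match, and the gap to the Poisson probabilities shrinks as k gets closer to 0.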


NEGATIVE BINOMIAL-EXAMPLE WITH SAS

/* dist=negbin is the ONLY difference from the Poisson call */
proc genmod data = nba;
  model FGA = GP DREB AST STL TOV FTM / dist=negbin;
run;

/* check goodness of fit for the model: deviance and its DF from the output */
data pvalue;
  df = 93; chisq = 99.3405;
  pvalue = 1 - probchi(chisq, df);
run;

proc print data = pvalue noobs;
run;
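The same goodness-of-fit check for the negative binomial fit can be sketched in R with `pchisq()`, using the deviance and degrees of freedom from the SAS data step above. The variable names are illustrative.

```r
# Deviance and residual degrees of freedom from the negative binomial fit
chisq <- 99.3405
df    <- 93

# Upper-tail chi-square probability; equivalent to 1 - probchi(chisq, df)
pvalue <- pchisq(chisq, df, lower.tail = FALSE)

# Deviance/DF ratio; a value near 1 suggests the dispersion is accounted for
ratio <- chisq / df

pvalue > 0.05      # TRUE means no evidence of lack of fit at the 5% level
round(ratio, 4)
```

Because the null hypothesis is that the model fits adequately, a large (non-significant) p-value is the desirable outcome here.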


EXAMPLE RESULTS-GOODNESS OF FIT

Data Set                       WORK.NBA
Distribution                   Negative Binomial
Link Function                  Log
Dependent Variable             FGA

Number of Observations Read    100
Number of Observations Used    100

Criteria For Assessing Goodness Of Fit
Criterion                   DF    Value         Value/DF
Deviance                    93    99.3405       1.0682
Scaled Deviance             93    99.3405       1.0682
Pearson Chi-Square          93    100.7383      1.0832
Scaled Pearson X2           93    100.7383      1.0832
Log Likelihood                    72428.1189
Full Log Likelihood               -477.8271
AIC (smaller is better)           971.6543
AICC (smaller is better)          973.2367
BIC (smaller is better)           992.4957
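The two fits can also be compared directly from the reported fit statistics. The R sketch below uses the Full Log Likelihood and AIC values from the Poisson and negative binomial SAS output. It assumes the common convention of halving the likelihood-ratio p-value because the Poisson model (dispersion = 0) lies on the boundary of the parameter space; this test is an addition for illustration and is not part of the original analysis.

```r
# Full log likelihoods reported by PROC GENMOD
ll_pois <- -604.2412
ll_nb   <- -477.8271

# Likelihood-ratio statistic for Poisson vs. negative binomial
# (the NB model has one extra parameter, the dispersion)
lr <- 2 * (ll_nb - ll_pois)

# Assumed convention: halve the chi-square(1) p-value for a boundary test
p_boundary <- pchisq(lr, df = 1, lower.tail = FALSE) / 2

# AIC = -2 * logLik + 2 * (number of parameters): 7 for Poisson, 8 for NB
aic_pois <- -2 * ll_pois + 2 * 7
aic_nb   <- -2 * ll_nb   + 2 * 8

c(LR = lr, AIC_Poisson = aic_pois, AIC_NB = aic_nb)
```

The recomputed AIC values should agree with the SAS output up to rounding, and the lower AIC favors the negative binomial model.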


RESULTS: Analysis of Maximum Likelihood Parameter Estimates

PARAMETER    DF   ESTIMATE   STANDARD ERROR   WALD 95% CONFIDENCE LIMITS   WALD CHI-SQUARE   PR>CHI-SQ
INTERCEPT    1    4.1742     0.1641           (3.8525, 4.4958)             647.01            <.0001
GP           1    0.0426     0.0125           (0.0181, 0.0671)             11.62             0.0007
DREB         1    0.0003     0.0007           (-0.0011, 0.0016)            0.15              0.7028
AST          1    -0.0001    0.0008           (-0.0017, 0.0014)            0.03              0.8619
STL          1    0.0024     0.0027           (-0.0029, 0.0077)            0.78              0.3756
TOV          1    0.0060     0.0023           (0.0015, 0.0105)             6.95              0.0084
FTM          1    0.0042     0.0010           (0.0023, 0.0061)             19.32             <.0001
DISPERSION   1    0.0230     0.0040           (0.0163, 0.0325)


Assessment of Results

• Ratio of Deviance/DF = 1.0682 ≈ 1 (over-dispersion fixed!)
• Deviance = 99.3405; the model now fits well because pvalue = 1 - probchi(chisq, df) is about 0.31, which is NOT significant
• There is an extra parameter in the “Analysis of Maximum Likelihood Parameter Estimates” called “Dispersion” (aka ALPHA)
  • It accounts for the over-dispersion we came across in the Poisson regression
  • This estimate has a value of 0.0230 with a Wald 95% confidence interval of (0.0163, 0.0325)
  • Because the 95% confidence limits exclude 0, the dispersion is significantly different from 0, which justifies the negative binomial model as more appropriate
• GP, TOV, & FTM are the only significant predictors


NEGATIVE BINOMIAL-EXAMPLE WITH R

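# read the NBA data (file path as used in the presentation)
nba <- read.csv("F:/STATS544/nba.csv", header = TRUE)

# install.packages("MASS")   # run once if MASS is not already installed
library(MASS)

# negative binomial regression with a log link
nb <- glm.nb(FGA ~ GP + DREB + AST + STL + TOV + FTM, data = nba)

summary(nb)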


EXAMPLE RESULTS-GOODNESS OF FIT

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.36322  -0.60467  -0.06083   0.55227   2.72053

(Dispersion parameter for Negative Binomial(43.4291) family taken to be 1)

Null deviance: 182.54 on 99 degrees of freedom
Residual deviance: 99.34 on 93 degrees of freedom
AIC: 971.65

Number of Fisher Scoring iterations: 1

Theta: 43.43
Std. Err.: 7.62
2 x log-likelihood: -955.654
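R's Theta and the SAS "Dispersion" estimate describe the same quantity on reciprocal scales: `glm.nb` models the variance as µ + µ²/θ, while PROC GENMOD uses µ + kµ², so k = 1/θ. The R sketch below checks this with the reported values; the delta-method standard error is an approximation added for illustration.

```r
theta    <- 43.43    # Theta from the glm.nb output
se_theta <- 7.62     # its standard error

k_sas    <- 0.0230   # "Dispersion" estimate from PROC GENMOD
se_k_sas <- 0.0040   # its standard error

# Convert Theta to the SAS dispersion scale
k_from_theta <- 1 / theta

# Delta-method approximation: se(1/theta) is roughly se(theta) / theta^2
se_k_from_theta <- se_theta / theta^2

round(c(k = k_from_theta, se = se_k_from_theta), 4)
```

The converted values line up with the SAS estimates to about three decimal places, so the two programs are reporting the same fit.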


RESULTS: Analysis of Maximum Likelihood Parameter Estimates

Call:
glm.nb(formula = FGA ~ GP + DREB + AST + STL + TOV + FTM, data = nba,
    init.theta = 43.42912732, link = log)

Coefficients:
              ESTIMATE    STD.ERROR   Z-VALUE  PR(>|Z|)
(Intercept)   4.1741833   0.1626544   25.663   < 2e-16  ***
GP            0.0425988   0.0123895    3.438   0.000585 ***
DREB          0.0002619   0.0006835    0.383   0.701571
AST          -0.000139    0.0007904   -0.176   0.860433
STL           0.0023962   0.0027055    0.886   0.375794
TOV           0.0060360   0.0022760    2.652   0.008001 **
FTM           0.0042121   0.0009430    4.467   7.95e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


INTERPRETATION OF SIGNIFICANT COEFFICIENTS

GP: Holding all other variables constant, each additional game played raises the log of the expected number of field goal attempts by 0.0426. Equivalently, for every additional game played, the expected number of field goal attempts increases by 4.35%.

TOV: Holding all other variables constant, each additional turnover raises the log of the expected number of field goal attempts by 0.0060. Equivalently, for every additional turnover, the expected number of field goal attempts increases by 0.60%.

FTM: Holding all other variables constant, each additional free throw made raises the log of the expected number of field goal attempts by 0.0042. Equivalently, for every additional free throw made, the expected number of field goal attempts increases by 0.42%.
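The percentages quoted above follow from exponentiating the negative binomial estimates. A short R sketch of that arithmetic, using the rounded SAS estimates for the three significant predictors (the vector names are illustrative):

```r
# Rounded estimates for the significant predictors
est <- c(GP = 0.0426, TOV = 0.0060, FTM = 0.0042)

# Percent change in expected FGA for a one-unit increase in each predictor
pct <- 100 * (exp(est) - 1)

round(pct, 2)
```

Rounded to two decimals, this gives 4.35, 0.60, and 0.42 percent for GP, TOV, and FTM respectively.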


NEGATIVE BINOMIAL WITH SPSS & MINITAB

SPSS

genlin FGA with GP DREB AST STL TOV FTM
  /model GP DREB AST STL TOV FTM INTERCEPT=YES Distribution=negbin(mle) link = log
  /print FIT SUMMARY SOLUTION.

MINITAB

Not available


SUMMARY

• Use Poisson regression when dealing with COUNT data
• If there is overdispersion, switch to negative binomial
• Assumptions for Poisson and NB are the same, except that NB does not require the variance to equal the mean
• Coefficients of both models are interpreted in the same manner
• Both regressions can be performed in SAS, R, & SPSS
