Statistics for Social and Behavioral Sciences Part IV: Causality Multivariate Regression Chapter 11 Prof. Amine Ouazad

Page 1:

Statistics for Social and Behavioral Sciences

Part IV: Causality
Multivariate Regression

Chapter 11
Prof. Amine Ouazad

Page 2:

Movie Buzz

• Can we predict the success of a movie?

1. Avatar (2009) $760,505,847

2. Titanic (1997) $658,672,302

3. The Avengers (2012) $623,279,547

4. The Dark Knight (2008) $533,316,061

5. Star Wars: Episode I – The Phantom Menace (1999) $474,544,677

Page 3:

Data

• Box_mil = First run U.S. box office (Millions of $)
• MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG.
• Budget = Production budget (Millions of $)
• Starpowr = Index of star power
• Sequel = 1 if movie is a sequel, 0 if not
• Action = 1 if action film, 0 if not
• Comedy = 1 if comedy film, 0 if not
• Animated = 1 if animated film, 0 if not
• Horror = 1 if horror film, 0 if not
• Addict = Trailer views at traileraddict.com
• Cmngsoon = Message board comments at comingsoon.net
• Fandango = Attention at fandango.com
• Cntwait3 = Percentage of Fandango votes that can't wait to see.

Page 4:

Statistics Course Outline

PART I. INTRODUCTION AND RESEARCH DESIGN (Week 1)
Four Steps of “Thinking Like a Statistician”. Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling. Biases: Nonresponse bias, Response bias, Sampling bias.

PART II. DESCRIBING DATA (Weeks 2-4)
Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule. Bivariate sample statistics: Correlation, Slope.

PART III. DRAWING CONCLUSIONS FROM DATA: INFERENTIAL STATISTICS (Weeks 5-9)
Estimating a parameter using sample statistics. Confidence Interval at 90%, 95%, 99%. Testing a hypothesis using the CI method and the t method.

PART IV. CORRELATION AND CAUSATION: TWO GROUPS, REGRESSION ANALYSIS (Weeks 10-14)
Multivariate regression now!

Page 5:

Coming up

• “Comparison of Two Groups”: Last week.

• “Univariate Regression Analysis”: Last Saturday, Section 9.5.

• “Association and Causality: Multivariate Regression”: Last Saturday, Chapter 10. Today, Tomorrow, Chapter 11.

• “Randomized Experiments and ANOVA”: Wednesday, Chapter 12.

• “Robustness Checks and Wrap Up”: Last Thursday.

Page 6:

Outline

1. Multivariate regression

2. Interpreting coefficients: Ceteris Paribus

3. Standardized Coefficient

4. Multiple Correlation and R Squared

Next time: Multivariate regression: the F test (Continued)

Page 7:

Data: Variables

• y Box = First run U.S. box office ($Mil)

• x1 MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG.

• x2 Budget = Production budget ($Mil)

• x3 Starpowr = Index of star power

• x4 Sequel = 1 if movie is a sequel, 0 if not

• x5 Action = 1 if action film, 0 if not

• x6 Comedy = 1 if comedy film, 0 if not

• x7 Animated = 1 if animated film, 0 if not

• x8 Horror = 1 if horror film, 0 if not

• x9 Addict = Trailer views at traileraddict.com

• x10 Cmngsoon = Message board comments at comingsoon.net

• x11 Fandango = Attention at fandango.com

• x12 Cntwait3 = Percentage of Fandango votes that can't wait to see.

Page 8:

Multivariate Regression

• With variables x1, x2, …, x12.

• We are trying to get the true impact:
– b1 of variable x1 on y.
– b2 of variable x2 on y.
– …
– b12 of variable x12 on y.

• True model: y = a + b1 x1 + b2 x2 + b3 x3 + … + b12 x12 + e

We would get those if we had the population of all possible movies.

Page 9:

Multivariate Regression

• Instead we estimate b1, b2, …, bK (here K = 12) on the sample:
– Minimizing the sum of the squared prediction errors.

• With these we can predict the success of a movie:
predicted y = a + b1 x1 + b2 x2 + … + b12 x12, using the estimated a and b1, …, b12.
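The estimation step on this slide can be sketched in a few lines of pure Python. This is only an illustration under stated assumptions: the data are made up (two variables rather than twelve, generated without noise), and the helper names `solve_linear_system`, `ols`, and `predict` are hypothetical, not part of the course material. The sketch solves the normal equations, which is one standard way to minimize the sum of squared prediction errors.

```python
# Minimal least-squares sketch in pure Python: solve (X'X) b = X'y.
# The data below are made up for illustration; they are not the movie data.

def solve_linear_system(a, rhs):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [a | rhs] so the inputs are not modified.
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def ols(rows, y):
    """Return [a, b1, ..., bK] minimizing the sum of squared prediction errors."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept a
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Predicted y = a + b1 x1 + ... + bK xK."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

# Hypothetical sample: (x1, x2) pairs with y generated as 5 + 2*x1 - 3*x2.
rows = [(1, 0), (2, 1), (3, 1), (4, 3), (5, 2), (6, 5)]
y = [5 + 2 * x1 - 3 * x2 for x1, x2 in rows]

coefs = ols(rows, y)
print([round(c, 6) for c in coefs])
print(round(predict(coefs, (7, 4)), 6))
```

Because the made-up y values contain no error term, the estimates coincide with the coefficients used to generate them; with real data the estimates would differ from the true values from sample to sample.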

Page 10:

Sampling Distribution of b3

• We only observe one coefficient estimate b3, because we have only one sample.

• But across all possible samples, the sampling distribution of b3 is bell-shaped.

• Hence we can design a test of H0: “b3 = 0”.

Under H0, the t statistic (the estimate of b3 divided by its standard error) follows a t distribution with N – (K + 1) degrees of freedom, where N is the sample size and K the number of explanatory variables.
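As a rough illustration of this test, the sketch below computes a t statistic, its degrees of freedom, and a two-sided p value. All numbers are assumptions made up for the example (the slides do not give N, the estimate, or its standard error), and the function names are hypothetical. The p value is obtained by numerically integrating the t density with Simpson's rule, so that only the standard library is needed; a statistics package would normally do this step.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|): integrate the density on [0, |t_stat|] by Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    return max(0.0, 1 - 2 * central_area)

# Hypothetical numbers, for illustration only.
n_movies = 62      # N: assumed sample size
k_variables = 12   # K: number of explanatory variables, as in the slides
b3_hat = 0.5       # assumed estimate of b3
se_b3 = 0.2        # assumed standard error of that estimate

df = n_movies - (k_variables + 1)
t_stat = b3_hat / se_b3
p_value = two_sided_p_value(t_stat, df)
print(df, t_stat, round(p_value, 4))
```

With these assumed inputs the p value falls between 0.01 and 0.05, so H0 would be rejected at the 5% level but not at the 1% level.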

Page 11:

Hypothesis testing for H0 : “b3=0”

• Reject the null hypothesis at 95% if:

– The absolute value of the t statistic is greater than the t score with N – (K+1) degrees of freedom at 95%.

– Equivalently, if the p value is lower than 0.05.

There are as many null hypotheses as there are coefficients to estimate:

Here, there are

Page 12:

Outline

1. Multivariate regression

2. Interpreting coefficients: Ceteris Paribus

3. Standardized Coefficient

4. Multiple Correlation and R Squared

Next time: Multivariate regression (Continued)

Page 13:

Ceteris Paribus = “All other things equal”

• “All other things equal”, what is the impact of variable x3 on box office outcome in millions of $?

Increase in star power (variable x3), all other things equal: keep x1, x2, x4, x5, x6, x7, x8, x9, x10, x11, x12 constant, and change x3.

Page 14:

Ceteris Paribus = “All other things equal”

• “All other things equal”, what is the impact of variable x2 on box office outcome in millions of $?

Increase in budget (variable x2) by 1 million $, all other things equal: keep x1, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12 constant, and change x2.
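A small sketch may help make “all other things equal” concrete. It assumes a three-variable linear model in which only the budget coefficient (0.144) comes from the slides; the intercept, the other coefficients, the two example movies, and the function names are hypothetical. The point it illustrates is that, in a linear model, changing the budget by 1 million $ while holding the other variables fixed shifts the prediction by the budget coefficient, whatever the values of the other variables.

```python
# Hypothetical three-variable model; only the budget coefficient (0.144)
# is taken from the slides, the intercept and other coefficients are made up.
a = 10.0
b = {"budget": 0.144, "starpowr": 0.3, "sequel": 20.0}

def predict_box_office(movie):
    """Predicted box office in millions of $."""
    return a + sum(b[name] * movie[name] for name in b)

def budget_effect(movie, extra_budget=1.0):
    """Change the budget only, keep every other variable constant."""
    changed = dict(movie, budget=movie["budget"] + extra_budget)
    return predict_box_office(changed) - predict_box_office(movie)

# Two very different made-up movies.
small_film = {"budget": 15.0, "starpowr": 5.0, "sequel": 0}
blockbuster_sequel = {"budget": 150.0, "starpowr": 25.0, "sequel": 1}

effect_small = budget_effect(small_film)
effect_big = budget_effect(blockbuster_sequel)
print(round(effect_small, 3), round(effect_big, 3))
```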

Page 15:

Page 16:

Reading the coefficients

• An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.

• An action movie has, on average and all other things equal, a lower box office outcome, by 12 million $.

• An increase in the ‘Percentage of Fandango votes that can't wait to see’ (cntwait3) by 1 percentage point leads to a 0.01 * 32.15 = 0.3215 million $ increase in box office outcome, all other things equal.

We multiply by 0.01 (1%) because cntwait3 ranges from 0 to 1.
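The arithmetic behind these three readings can be written out as follows. The coefficients are the ones quoted on the slide (the action coefficient is entered as -12, the rounded figure given there); the dictionary and function names are hypothetical.

```python
# Coefficients quoted on the slide (box office measured in millions of $).
coefficients = {
    "budget": 0.144,    # per additional million $ of budget
    "action": -12.0,    # action film vs. non-action film (rounded on the slide)
    "cntwait3": 32.15,  # per unit of cntwait3, which ranges from 0 to 1
}

def effect_on_box_office(variable, change):
    """Change in predicted box office (millions of $), all other things equal."""
    return coefficients[variable] * change

budget_effect = effect_on_box_office("budget", 1)        # +1 million $ of budget
action_effect = effect_on_box_office("action", 1)        # dummy switches from 0 to 1
cntwait_effect = effect_on_box_office("cntwait3", 0.01)  # +1 percentage point
print(budget_effect, action_effect, round(cntwait_effect, 4))
```

The last line uses a change of 0.01 rather than 1 because a one-percentage-point increase is 0.01 on the 0-to-1 scale of cntwait3.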

Page 17:

Which coefficients are statistically significant?

• x1 MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG. ❏❏❏

• x2 Budget = Production budget ($Mil) ❏❏❏

• x3 Starpowr = Index of star power ❏❏❏

• x4 Sequel = 1 if movie is a sequel, 0 if not ❏❏❏

• x5 Action = 1 if action film, 0 if not ❏❏❏

• x6 Comedy = 1 if comedy film, 0 if not ❏❏❏

• x7 Animated = 1 if animated film, 0 if not ❏❏❏

• x8 Horror = 1 if horror film, 0 if not ❏❏❏

• x9 Addict = Trailer views at traileraddict.com ❏❏❏

• x10 Cmngsoon = Message board comments at comingsoon.net ❏❏❏

• x11 Fandango = Attention at fandango.com ❏❏❏

• x12 Cntwait3 = Percentage of Fandango votes that can't wait to see. ❏❏❏

The three boxes next to each variable stand for: significant at 10%, at 5%, at 1%.

Read the p value! Or compare the t stat to the t score with N – 13 degrees of freedom.
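Filling in the three boxes amounts to comparing each p value with 0.10, 0.05, and 0.01. The sketch below does this for three made-up p values; they are assumptions for illustration, not the values from the regression output, and the function name is hypothetical.

```python
def significance_levels(p_value, levels=(0.10, 0.05, 0.01)):
    """For each level, report whether the coefficient is significant at that level."""
    return {level: p_value < level for level in levels}

# Hypothetical p values, NOT taken from the course's regression output.
p_values = {"Budget": 0.003, "Starpowr": 0.07, "Horror": 0.40}

table = {name: significance_levels(p) for name, p in p_values.items()}
for name, row in table.items():
    boxes = "  ".join("[x]" if row[level] else "[ ]" for level in (0.10, 0.05, 0.01))
    print(f"{name:10s} {boxes}")
```

A coefficient that is significant at 1% is automatically significant at 5% and 10%, so the ticked boxes always run from left to right.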

Page 18:

[Regression output: with Budget]

Page 19:

[Regression output: without Budget]

Page 20:

Budget and Can’t Wait to See the movie!

• Without budget among the variables, the popularity measure cntwait3 has a bigger impact than with budget included.

[Diagram: Budget, Cntwait3, and Other variables each point to Box office (box_mil); Budget and Cntwait3 are linked to each other.]

We know that Budget and Cntwait3 are correlated (an arrow either in one direction or in the other, or both) because including Budget affects the coefficient of Cntwait3.
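The mechanism on this slide can be reproduced with a tiny made-up data set. In the sketch below, cntwait3 and budget are positively correlated by construction and box office is generated as 10 + 0.1 × budget + 30 × cntwait3; all of these numbers are assumptions, not the course data. Leaving budget out makes the cntwait3 coefficient pick up part of the budget effect. The two-regressor slope uses the standard closed-form least-squares solution on deviations from the mean.

```python
def mean(values):
    return sum(values) / len(values)

def cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Made-up data: cntwait3 and budget are positively correlated.
cntwait3 = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
budget = [20, 50, 40, 90, 80, 120]
box_mil = [10 + 0.1 * b + 30 * c for b, c in zip(budget, cntwait3)]

# Regression WITHOUT budget: simple slope of box_mil on cntwait3.
slope_without_budget = cross(box_mil, cntwait3) / cross(cntwait3, cntwait3)

# Regression WITH budget: closed-form least squares with two regressors.
s_cc = cross(cntwait3, cntwait3)
s_bb = cross(budget, budget)
s_bc = cross(budget, cntwait3)
s_yc = cross(box_mil, cntwait3)
s_yb = cross(box_mil, budget)
slope_with_budget = (s_yc * s_bb - s_yb * s_bc) / (s_cc * s_bb - s_bc ** 2)

print(round(slope_without_budget, 2), round(slope_with_budget, 2))
```

In this constructed example the coefficient with budget included recovers the value 30 used to generate the data, while the coefficient without budget is larger, in line with the pattern described on the slide.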

Page 21:

Outline

1. Multivariate regression

2. Interpreting coefficients: Ceteris Paribus

3. Standardized Coefficient

4. Multiple Correlation and R Squared

Next time: Multivariate regression (Continued)

Page 22:

Standardized Coefficient

We just saw:

• An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.

But is 1 million $ big? Is 0.144 million $ big?

Page 23:

Standardized Coefficient

• “A 1 standard deviation increase in x2 leads to a …. % standard deviation increase in y.”

• Standard deviation of x2 (budget): 42.9.
• Standard deviation of y (box office outcome): 17.5.
• Coefficient of budget: 0.144.
• Fill in the blank.
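One way to work through the exercise is to rescale the coefficient by the two standard deviations given on the slide: a one standard deviation increase in budget is 42.9 million $, which changes predicted box office by 0.144 × 42.9 million $, and dividing by the standard deviation of y expresses that change in standard deviations of y. The variable names below are hypothetical; the three numbers are the ones on the slide.

```python
coef_budget = 0.144     # coefficient of budget (from the slide)
sd_budget = 42.9        # standard deviation of x2, budget (from the slide)
sd_box_office = 17.5    # standard deviation of y, box office (from the slide)

# Effect of a one-SD increase in budget, expressed in SDs of box office.
standardized_coef = coef_budget * sd_budget / sd_box_office
print(f"{standardized_coef:.3f} standard deviations ({standardized_coef:.1%})")
```

With these inputs, a one standard deviation increase in budget corresponds to roughly 0.35 of a standard deviation of box office, all other things equal.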

Page 24:

Standardized Coefficient

• An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.

• An action movie has, on average and all other things equal, a lower box office outcome, by 12 million $.

• An increase in the ‘Percentage of Fandango votes that can't wait to see’ (cntwait3) by 1 percentage point leads to a 0.01 * 32.15 = 0.3215 million $ increase in box office outcome, all other things equal.

We multiply by 0.01 (1%) because cntwait3 ranges from 0 to 1.

Page 25:

Outline

1. Multivariate regression

2. Interpreting coefficients: Ceteris Paribus

3. Standardized Coefficient

4. Multiple Correlation and R Squared

Next time: Multivariate regression (Continued)

Page 26:

R Squared

• How good are we at predicting the success of a movie?

• The multiple correlation is 1 if we are absolutely correct in our predictions: ei = 0 for every movie.

• The multiple correlation is 0 if we do no better than taking the average: ei = yi – (average of y) for every movie.

Page 27:

R squared = ESS/TSS = 13356/18665 = 0.7156
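The ratio above, and the two boundary cases from the R Squared slide, can be checked with a short sketch. The ESS and TSS figures are the ones on the slide; the four y values and the function name are made up for illustration. The function computes 1 – RSS/TSS, which for a least-squares fit with an intercept equals ESS/TSS (explained over total sum of squares).

```python
import math

def r_squared(y, y_hat):
    """R squared as 1 - RSS/TSS, given observed y and predicted y_hat."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - rss / tss

# Hypothetical box office outcomes (millions of $) for four movies.
y = [12.0, 30.0, 45.0, 80.0]
perfect = r_squared(y, y)                            # ei = 0 for every movie
mean_only = r_squared(y, [sum(y) / len(y)] * len(y)) # always predict the average

# Figures from the slide.
ess, tss = 13356, 18665
r2_slide = ess / tss
multiple_correlation = math.sqrt(r2_slide)
print(perfect, mean_only, round(r2_slide, 4), round(multiple_correlation, 4))
```

The multiple correlation is the square root of R squared, so an R squared of about 0.72 corresponds to a multiple correlation of about 0.85.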

Page 28:

Wrap up

• We can use a number of variables to explain a dependent variable.

• Multiple regression accounts for multiple causes.

• The coefficients minimize the sum of the squared residuals.

• Understand the t test and the p value.

• The coefficients should be understood “all other things equal” or “ceteris paribus”.

• The standardized coefficients express effects in terms of standard deviations.

• The R squared, between 0 and 100%, measures how accurate our predictions are.

Page 29:

Coming up:

• Schedule for next week:
• Chapter on “Association and Causality”, and “Multivariate Regression”.
• Make sure you come to sessions and recitations.

Sunday
Monday: Multivariate Regression
Tuesday: Multivariate Regression, The F test
Wednesday: Randomized Experiments and ANOVA
Thursday: Wrap up

Sessions:
Recitation: Evening session, 7.30pm, West Administration 002
Usual class: 12.45pm, usual room
Evening session: 7.30pm, West Administration 001
Usual class: 12.45pm, usual room