Regression and interpretation: low R-squared! · Conclusion: In social science, to examine the...



Regression and interpretation: low R-squared!

Social Research Network 3rd Meeting, Noosa

April 12-13, 2012

Kenshi Itaoka, Mizuho Information & Research Institute, Inc.


Contents

• Motivation
• About R-squared
• Purpose of regression
• Example
• Conclusion


Motivation

We sometimes encounter a low R-squared (the "fitness function", i.e. the goodness-of-fit measure, of a regression) in the results of a regression analysis. We sometimes hear that it is not possible, or not appropriate, to draw any meaningful insights from a regression analysis because its R-squared is very low. Is this true?


Fitness function in regression

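$R^2 = 1 - \dfrac{\mathrm{SSE}}{\mathrm{SST}}$, where $\mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares and $\mathrm{SST} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares around the mean.

Defined as the ratio of the sum of squares explained by a regression model to the "total" sum of squares around the mean. Interpreted as the proportion of variance explained by the regression model.

Adjusted R-squared: $\bar{R}^2 = 1 - \dfrac{\mathrm{MSE}}{\mathrm{MST}}$, with $\mathrm{MST} = \mathrm{SST}/(n-1)$ and $\mathrm{MSE} = \mathrm{SSE}/(n-p-1)$, where $n$ is the number of observations and $p$ the number of independent variables.

Other indicators such as AIC and BIC are also sometimes used for model selection.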
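As a rough illustration, the two formulas above can be computed directly in Python. This is a minimal sketch: the function names are illustrative (not from any library), `p` is assumed to be the number of independent variables excluding the intercept, and the fitted values in the toy example are made up purely to exercise the arithmetic.

```python
def _sums_of_squares(y, y_hat):
    """Return (SSE, SST) for observed values y and fitted values y_hat."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares around the mean
    return sse, sst


def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST."""
    sse, sst = _sums_of_squares(y, y_hat)
    return 1 - sse / sst


def adjusted_r_squared(y, y_hat, p):
    """Adjusted R^2 = 1 - MSE / MST, with MSE = SSE/(n-p-1) and MST = SST/(n-1)."""
    n = len(y)
    sse, sst = _sums_of_squares(y, y_hat)
    mse = sse / (n - p - 1)
    mst = sst / (n - 1)
    return 1 - mse / mst


# Toy example: 4 observations and made-up fitted values from a hypothetical 1-predictor model.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y, y_hat), 4))                # 0.98
print(round(adjusted_r_squared(y, y_hat, p=1), 4))  # 0.97
```

Here SSE = 0.10 and SST = 5, so R-squared is 0.98; the adjustment for one predictor with only four observations lowers it to 0.97.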


Purpose of regression

• Analysis of partial correlations between factors: to avoid the risk of misinterpreting simple correlations in multivariable analyses
• Examination of the factors that influence the phenomena to be explained
• Causal inference? Yes / No: the logic of causality needs to be checked.
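The point about simple versus partial correlation can be illustrated with a small simulation. This is a sketch on assumed, simulated data: a hypothetical confounder `z` drives both `x` and `y`, and `x` has no direct effect on `y`. The simple correlation is sizable, while the partial correlation (computed here by correlating the residuals of `x` and `y` after regressing each on `z`) is close to zero.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of a simple least-squares regression of y on z (with intercept)."""
    mz, my = mean(z), mean(y)
    slope = sum((c - mz) * (b - my) for c, b in zip(z, y)) / sum((c - mz) ** 2 for c in z)
    return [b - my - slope * (c - mz) for c, b in zip(z, y)]


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return corr(residuals(x, z), residuals(y, z))


# Hypothetical data: z drives both x and y; x has no direct effect on y.
rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

print(f"simple correlation  r(x, y)     = {corr(x, y):.2f}")          # about 0.5
print(f"partial correlation r(x, y | z) = {partial_corr(x, y, z):.2f}")  # about 0.0
```

Interpreting the simple correlation alone would suggest that `x` influences `y`; controlling for `z` removes that impression.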


Purpose of regression 2

• Forecast
  Example: temperature and energy consumption
  Predictors (independent variables); prediction (dependent variable)

• Detection of the influence of internal/external factors; internal consistency check
  Example: demographics and public opinions
  Explanatory variables (independent variables); explained variable (dependent variable)


Model estimation in regression

True model: $y = b_0 + b_1 x_1 + b_2 x_2 + u$
Estimated model: $y = a_0 + a_1 x_1 + v$

• $x_1$ and $x_2$ should be independent: least-squares theory requires no correlation between $x_1$ and the error term $v$, and here $v = b_2 x_2 + u$.

• In this case, $a_1$ estimates $b_1$ without bias.
• However, the variance of the error term is influenced by $x_2$: $\mathrm{Var}(v) = b_2^2\,\mathrm{Var}(x_2) + \mathrm{Var}(u)$, so the variance of $u$ is smaller than that of $v$ by the contribution of $x_2$.
• R-squared should therefore be smaller in the estimated model than in the true model, because of $x_2$.

• To examine the effectiveness of $a_1$, the size of R-squared does not matter; only the significance of $a_1$ matters.
• In practice, examining the correlation between $x_1$ and $x_2$ (if $x_2$ is observed) is important.
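The argument above can be checked with a small simulation, sketched here under assumed settings: the coefficients `b0`, `b1`, `b2`, the standard-normal variables and the sample size are all hypothetical choices. Omitting the independent variable `x2` leaves the estimate of `b1` roughly unbiased and highly significant, while R-squared falls sharply.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def fit_one(x, y):
    """OLS fit y = a0 + a1*x; returns (a0, a1, t-statistic of a1)."""
    n = len(y)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    a1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    a0 = my - a1 * mx
    sse = sum((b - a0 - a1 * a) ** 2 for a, b in zip(x, y))
    se_a1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return a0, a1, a1 / se_a1


def fit_two(x1, x2, y):
    """OLS fit y = c0 + c1*x1 + c2*x2 via the 2x2 normal equations (Cramer's rule)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    c1 = (s22 * s1y - s12 * s2y) / det
    c2 = (s11 * s2y - s12 * s1y) / det
    return my - c1 * m1 - c2 * m2, c1, c2


def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST."""
    my = mean(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - my) ** 2 for a in y)
    return 1 - sse / sst


# Hypothetical true model: y = b0 + b1*x1 + b2*x2 + u, with x1 and x2 independent.
b0, b1, b2 = 1.0, 0.5, 2.0
rng = random.Random(1)
n = 20000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
y = [b0 + b1 * a + b2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Estimated model omits x2, so its error term is v = b2*x2 + u.
a0, a1, t_a1 = fit_one(x1, y)
r2_short = r_squared(y, [a0 + a1 * a for a in x1])

# Model including x2, for comparison.
c0, c1, c2 = fit_two(x1, x2, y)
r2_full = r_squared(y, [c0 + c1 * a + c2 * b for a, b in zip(x1, x2)])

print(f"estimated model: a1 = {a1:.2f}, t = {t_a1:.1f}, R^2 = {r2_short:.3f}")
print(f"full model:      estimate of b1 = {c1:.2f}, R^2 = {r2_full:.3f}")
```

With these settings the estimated model should give `a1` near 0.5 with a large t-statistic but an R-squared of only about 0.05, while the full model's R-squared is about 0.8. These are expectations from the chosen parameters, not results from any study.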


Classification of independent variables
Examples are from a CCS-related public perception study, "Understanding how individuals perceive carbon dioxide: its relevance to CCS acceptance".

• Exogenous variables
  Example: age, education…

• Endogenous variables
  - Not directly related to the dependent variable to be explained
    Example: values and beliefs, CO2 knowledge…
  - Directly related to the dependent variable to be explained
    Example: CCS knowledge, CCS perception…



Change of R-squared in regression analyses in "Understanding how individuals perceive carbon dioxide: its relevance to CCS acceptance" (values are R-squared)

| Model | Independent variables | Country | Onshore | Offshore |
|---|---|---|---|---|
| Opinion 1 | Demographics, values and beliefs, CO2 knowledge, CCS awareness, trustworthy sources | 0.137 | 0.134 | 0.173 |
| Opinion change (ANOVA) | Only the type of information package provided | 0.009 | 0.007 | 0.008 |
| Opinion change (regression) | Only the perception of the pieces of information included in the provided information package | 0.017 | 0.015 | 0.019 |
| Opinion 2 | Full set (not including CCS impression variables) | 0.440 | 0.361 | 0.424 |

When the CCS impression variables (positivity, cleanness, usefulness, safety and maturity) are added, R-squared increases to more than 0.6, but these variables tend to hide the effects of some of the other variables.
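A hedged simulation of how a variable that is directly related to the dependent variable can raise R-squared while hiding another variable's effect. The variable names (`knowledge`, `impression`, `opinion`) and all coefficients are hypothetical and are not taken from the study; by construction, `knowledge` affects `opinion` only through `impression`.

```python
import random


def mean(v):
    return sum(v) / len(v)


def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST."""
    my = mean(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - my) ** 2 for a in y)
    return 1 - sse / sst


def fit_one(x, y):
    """OLS fit y = a0 + a1*x; returns (a0, a1)."""
    mx, my = mean(x), mean(y)
    a1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - a1 * mx, a1


def fit_two(x1, x2, y):
    """OLS fit y = c0 + c1*x1 + c2*x2 via the 2x2 normal equations (Cramer's rule)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    c1 = (s22 * s1y - s12 * s2y) / det
    c2 = (s11 * s2y - s12 * s1y) / det
    return my - c1 * m1 - c2 * m2, c1, c2


rng = random.Random(2)
n = 20000
knowledge = [rng.gauss(0, 1) for _ in range(n)]
impression = [0.8 * k + rng.gauss(0, 1) for k in knowledge]  # directly related to opinion
opinion = [m + rng.gauss(0, 0.5) for m in impression]

# Without the impression variable: knowledge shows its (indirect) effect.
a0, a1 = fit_one(knowledge, opinion)
r2_without = r_squared(opinion, [a0 + a1 * k for k in knowledge])

# With the impression variable: R^2 jumps and the knowledge coefficient vanishes.
c0, c1, c2 = fit_two(knowledge, impression, opinion)
r2_with = r_squared(opinion, [c0 + c1 * k + c2 * m for k, m in zip(knowledge, impression)])

print(f"without impression: knowledge coef = {a1:.2f}, R^2 = {r2_without:.2f}")
print(f"with impression:    knowledge coef = {c1:.2f}, R^2 = {r2_with:.2f}")
```

With these hypothetical coefficients, the `knowledge` coefficient should be about 0.8 when `impression` is omitted (R-squared about 0.34) and close to zero once `impression` is added (R-squared about 0.87).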


Example from the literature: the case where exogenous variables are mainly used

Literature cited: "Geologic Storage of Carbon Dioxide: Risk Analyses and Implications for Public Acceptance", by Gregory R. Singleton, B.S., Systems Engineering, University of Virginia, 2002.

Example from the literature: the case where endogenous variables are mainly used

Literature cited: L. Wallquist, V. H. M. Visschers and M. Siegrist, "Impact of Knowledge and Misconceptions on Benefit and Risk Perception of CCS", Environ. Sci. Technol., 2010.

Example from the literature: a researcher's comment:

…in many social science settings, an R-square of 9% is considered respectable. That's about as good as it gets in most psychology studies where two distinct variables are correlated with each other. Example: extraversion explains only about that much of the variation in sales effectiveness.

When you measure variables with error, that can lower your R-square. What is the S&P500 taken to be a measure of? The overall health of the economy, or just the stock market? Or just itself?

http://www.marketingprofs.com/ea/qst_question.asp?qstID=21047
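For a simple regression with one independent variable and an intercept, R-squared equals the squared Pearson correlation between the two variables. This gives one way to read the "9%" figure in the comment: it corresponds to a correlation of about 0.3.

```latex
R^2 = r_{xy}^2, \qquad r_{xy} = 0.3 \;\Rightarrow\; R^2 = 0.3^2 = 0.09 = 9\%
```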


Examples of factors that influence the fit
• Accuracy of measurements (size of error)
• Resolution of measurements
• Number of unobserved factors (fewer is better)
• Strength of causality
• Fundamental randomness
• …
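The effect of measurement accuracy on the fit can be sketched with simulated data. All settings here (slope 1, noise levels, sample size) are hypothetical. Error in the dependent variable lowers R-squared but leaves the slope roughly unbiased; error in the independent variable lowers R-squared and also shrinks the slope toward zero.

```python
import random


def fit(x, y):
    """Simple least-squares regression of y on x; returns (slope, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # With a single predictor, R^2 equals the squared correlation.
    return sxy / sxx, sxy ** 2 / (sxx * syy)


rng = random.Random(3)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]              # true predictor
y = [1.0 + xi + rng.gauss(0, 0.5) for xi in x]        # true outcome, slope 1
y_obs = [yi + rng.gauss(0, 1.5) for yi in y]          # outcome measured with error
x_obs = [xi + rng.gauss(0, 1.0) for xi in x]          # predictor measured with error

clean = fit(x, y)
noisy_y = fit(x, y_obs)
noisy_x = fit(x_obs, y)
for label, (slope, r2) in [("no measurement error", clean),
                           ("error in y", noisy_y),
                           ("error in x", noisy_x)]:
    print(f"{label:22s} slope = {slope:.2f}  R^2 = {r2:.2f}")
```

Expected values with these hypothetical settings are roughly (1.0, 0.80) without error, (1.0, 0.29) with error in `y`, and (0.5, 0.40) with error in `x`.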


Conclusion
• In social science, when examining the effectiveness of a factor, the size of R-squared does not matter.
• However, if R-squared is low, we need to explain why.
• At a minimum, we should state whether potentially important covariates (independent variables) are included or not.
  - If those covariates are included in the model and R-squared is still low, we could argue that the measurements of the dependent variable and of some independent variables are not accurate.
  - If those covariates are not included, we should explain why they cannot be included and state that "further research is necessary!"

• A high R-squared can sometimes be dangerous when we use endogenous factors that are directly related to the dependent variable: they may hide the effects of some of the other variables. It may be better to conduct SEM (path analysis).


Conclusion 2
In a social science setting:
• List all potentially influential factors
• Check simple correlations
• Conduct multiple regression
• Check residuals (linearity)
• Again, try to find hidden factors

If the list of input variables for the regression is defensible and there is not much multicollinearity, the model can be considered fine even with a low R-squared. It may be better to conduct SEM (path analysis).
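The "simple correlation" and "multicollinearity" checks above can be sketched with variance inflation factors (VIFs). This sketch relies on the standard identity that the VIF of each independent variable equals the corresponding diagonal element of the inverse of the predictors' correlation matrix; the data and the function names are hypothetical.

```python
import math
import random


def corr_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    k, n = len(columns), len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    inv = invert(corr_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical predictors: x3 is almost a copy of x1, x2 is unrelated.
rng = random.Random(4)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [a + rng.gauss(0, 0.1) for a in x1]
columns = [x1, x2, x3]

for row in corr_matrix(columns):                     # simple correlations
    print(" ".join(f"{v:6.3f}" for v in row))
print("VIF:", [round(v, 1) for v in vifs(columns)])  # multicollinearity check
```

A common rule of thumb flags VIF values above about 5 or 10; with the hypothetical data here, `x1` and `x3` should show VIFs near 100 while `x2` stays near 1.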


Thank you!

Contact: [email protected]


Backup


Example from the literature: analysis

Literature cited: H. Duan, "The public perspective of carbon capture and storage for CO2 emission reductions in China", Energy Policy, 2010.

[Figure: Recognition of measures against global warming, and "To what extent do the public know about CCS?" Share of respondents (%) answering "I know to some extent", "I have heard of it" or "I don't know at all" for energy saving, fuel-efficient vehicles, hydrogen vehicles, nuclear energy, biomass energy, CO2 sink & fixation by …, solar energy, CO2 capture & …, wind energy and iron dispersal in …, across the '03 Visit, '07 Visit and '07 Web samples, plus a CCS item from the 2007 survey. P-values of χ² tests compare 2003 with 2007; numbers in parentheses in the original chart indicate valid responses.]


Background & objective


Background
Recently, the implementation of large demonstration projects and commercial projects has become an important agenda item for GHG mitigation worldwide. With this shift, the issue of public acceptance of CCS has expanded from policy formulation in the national policy arena to project implementation sites in the local policy arena. Therefore, the need to assess public opinion on CCS has changed to include not only the general public in the national policy context but also the local public in the project implementation context.
