Day 13: Multiple Regression and Reporting Results

Daniel J. Mallinson

School of Public Affairs
Penn State Harrisburg
mallinson@psu.edu

PADM-HADM 503

Mallinson Day 13 November 16, 2017 1 / 42

Road map

Multiple Regression
Multicollinearity
Autoregression

Things to keep in mind when reporting results

Recap

Bivariate Regression and Correlation

Figure: “U.S. Phillips Curve” by Farcaster, CC BY-SA 3.0

Multiple Regression

Two or more IVs

Principles and interpretations from bivariate regression apply

Multi-dimensional analysis, so scatterplots are not as useful, but the basic notion still applies

Multiple Regression

A generic multiple regression formula:

Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_n X_n    (1)

Multiple Regression

Statistics You Should Know About:

R-squared (R^2)

A measure of association; indicates the amount of variance in the DV explained by the model (IVs)

Adjusted R^2

A more conservative version of R^2; adjusted for the number of IVs

Unstandardized coefficients (b)

Specific relationship between an individual IV and the DV
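
As a rough illustration of why adjusted R^2 is the more conservative statistic, the sketch below applies the standard textbook adjustment, 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of cases and k the number of IVs. The function name and the input values are hypothetical and do not come from the country.sav example.

```python
def adjusted_r_squared(r2, n_cases, n_ivs):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    df_resid = n_cases - n_ivs - 1
    if df_resid <= 0:
        raise ValueError("need more cases than IVs plus one")
    return 1.0 - (1.0 - r2) * (n_cases - 1) / df_resid


# Hypothetical values: same R^2 and same number of cases, but more IVs.
few_ivs = adjusted_r_squared(0.50, n_cases=101, n_ivs=4)
many_ivs = adjusted_r_squared(0.50, n_cases=101, n_ivs=20)

# The penalty grows as IVs are added without improving R^2.
print(round(few_ivs, 3), round(many_ivs, 3))
```

With the same R^2, the model that uses more IVs receives the lower adjusted value, which is the sense in which the statistic is "adjusted for the number of IVs."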

Multiple Regression

Statistics You Should Know About:

Beta Weights

Standardized beta coefficients; used for comparing variables measured on different scales

F-Ratio

Significance test for the model, recall F from ANOVA
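
To make the idea of a beta weight concrete, here is a minimal sketch using the usual conversion beta = b × sd(X) / sd(Y). The data are invented for illustration, and the bivariate case is used only to keep the code short. The same IV is expressed on two scales (doctors per 10,000 and per 1,000) to show that the raw slope changes with the scale while the beta weight does not.

```python
from statistics import mean, stdev


def slope(x, y):
    """Unstandardized bivariate slope b = S_xy / S_xx."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def beta_weight(b, x, y):
    """Standardized coefficient: b rescaled by sd(x) / sd(y)."""
    return b * stdev(x) / stdev(y)


# Hypothetical data, invented for illustration only.
doctors_per_10k = [2.0, 5.0, 9.0, 14.0, 20.0]
life_expectancy = [55.0, 60.0, 63.0, 70.0, 72.0]
doctors_per_1k = [d / 10 for d in doctors_per_10k]  # same IV, different scale

b_10k = slope(doctors_per_10k, life_expectancy)
b_1k = slope(doctors_per_1k, life_expectancy)
beta_10k = beta_weight(b_10k, doctors_per_10k, life_expectancy)
beta_1k = beta_weight(b_1k, doctors_per_1k, life_expectancy)

print(b_10k, b_1k)        # the raw slopes differ by a factor of 10
print(beta_10k, beta_1k)  # the beta weights agree
```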

Multiple Regression

Steps in Analysis:

1 Need a theoretical model → prevents “garbage can” models

2 Enter all relevant IVs into analysis

3 Search for violations of regression assumptions (e.g., multicollinearity, autocorrelation)

4 Make careful decisions about variables to remove/add/transform; do not remove variables just because they are not statistically significant

5 Interpret your final (“well-specified”) model

Multiple Regression

An SPSS example:

Use country.sav data file

122 countries
Tip: if you have multiple variables, you need more cases (e.g., countries)

DV: Female Life Expectancy

Multiple Regression

The Model:

Multiple Regression

In SPSS:

Analyze

Regression

Linear

Select variables (see model above)

Method: “Enter”

Under “Statistics” select:

Estimates
Model fit
Collinearity diagnostics

Multiple Regression

There are four methods of selecting variables to enter into an analysis:

Enter

Forward selection

Backward selection

Stepwise selection

Enter is the most common method. For the other three, consult a book on regression analysis

Multiple Regression

Results:

Multicollinearity

IVs are not independent of each other

Violates regression assumption: each IV affects the DV, and the IVs do not affect each other

Thus, their effects on DV cannot be isolated from each other

Need to identify if this problem exists and correct

Sampling problem, but not always a solution

Multicollinearity

How to detect:

Look at last two columns of output

If any Tolerance value is close to zero, there is a multicollinearity problem for the IV it is associated with (listed on the left of the table)
If the VIF is larger than 5 (or larger than 10, according to some statisticians), then there is a multicollinearity problem for the IV it is associated with
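
For readers who want to see where these two columns come from, the sketch below computes them outside SPSS using the standard definitions: Tolerance for an IV is 1 - R^2 from regressing that IV on all the other IVs, and VIF is 1 / Tolerance. The helper names and the three small variables are hypothetical (this is not the country.sav data), and the tiny least-squares solver is only meant for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 from a least-squares regression of y on the predictors (with intercept)."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def collinearity_stats(ivs):
    """Return {IV name: (tolerance, VIF)}, regressing each IV on the others."""
    out = {}
    for name, values in ivs.items():
        others = [v for other, v in ivs.items() if other != name]
        tolerance = 1.0 - r_squared(values, others)
        out[name] = (tolerance, 1.0 / tolerance)
    return out


# Hypothetical IVs: phones_per_100 is almost exactly twice gdp_per_capita.
ivs = {
    "gdp_per_capita": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "phones_per_100": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 14.2, 15.9],
    "pct_urban": [30.0, 55.0, 20.0, 70.0, 45.0, 60.0, 25.0, 50.0],
}
stats = collinearity_stats(ivs)
for name, (tolerance, vif) in stats.items():
    print(f"{name}: tolerance={tolerance:.4f}, VIF={vif:.1f}")
```

In this made-up example the two nearly redundant IVs show Tolerance values near zero and VIFs far above 10, while the third IV stays well under the VIF threshold of 5.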

Multicollinearity

In our results:

Variables with multicollinearity problems include:

GDP per capita
Phones per 100 people

See the tolerance and VIF statistics

These are correlated highly with at least some of the other variables

Multicollinearity

Another method of detection:

A simpler method is to run bivariate Pearson’s r correlations for all IVs

Can create a correlation table and make decisions about which variables to include

Helps you identify IVs with high correlations

Ask whether they are measuring the same concept
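
The correlation-table approach can be sketched in a few lines. The example below computes Pearson's r for every pair of IVs and lists the pairs above a cutoff. The data are invented, and the 0.8 cutoff is an assumption chosen for illustration; the slides do not specify a threshold.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_high(ivs, threshold=0.8):
    """List (IV, IV, r) for each pair with |r| at or above the threshold."""
    names = list(ivs)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(ivs[a], ivs[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical IVs, invented for illustration only.
ivs = {
    "gdp_per_capita": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "phones_per_100": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 14.2, 15.9],
    "pct_urban": [30.0, 55.0, 20.0, 70.0, 45.0, 60.0, 25.0, 50.0],
}
pairs = flag_high(ivs)
for a, b, r in pairs:
    print(f"{a} and {b}: r = {r:.3f}")
```

A flagged pair is a prompt to ask the question on this slide (are the two IVs measuring the same concept?), not an automatic reason to drop one of them.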

In SPSS: Analyze → Correlate → Bivariate Correlations → Choose all variables

Multiple Regression

Removing variables and re-running model

Only remove insignificant variables if they are not vital to your theory or models in past literature

Removing even insignificant variables will have an impact on the estimation of the model (including other coefficients)

Have more latitude in this area if the model is simply exploratory

Do need to address multicollinearity problem

Multiple Regression

Re-run the model without Phones per 100 people

Multiple Regression

SPSS Output

Table 1: Variables Entered/Removed

Shows the variables included, type of selection, and any removed variables

Table 3: ANOVA

Shows F-ratio for the entire model; a Sig. of .000 (p < .001) means the model as a whole is statistically significant
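
The F-ratio in the ANOVA table is tied to R^2. For a least-squares model with an intercept, F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k IVs and n cases. The sketch below applies that identity to hypothetical inputs, not to the SPSS output shown in class. It does not compute the Sig. value, which requires the F distribution and is reported by SPSS.

```python
def f_ratio(r2, n_cases, n_ivs):
    """Return (F, df_model, df_residual) for a model with an intercept."""
    df_model = n_ivs
    df_resid = n_cases - n_ivs - 1
    if df_resid <= 0:
        raise ValueError("need more cases than IVs plus one")
    f_value = (r2 / df_model) / ((1.0 - r2) / df_resid)
    return f_value, df_model, df_resid


# Hypothetical values chosen so the arithmetic is easy to follow.
f_value, df_model, df_resid = f_ratio(0.50, n_cases=101, n_ivs=4)
print(f"F({df_model}, {df_resid}) = {f_value:.2f}")
```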

Multiple Regression

SPSS Output

Table 2: Model Summary

Look at R (the multiple correlation), R-squared, and adjusted R-squared

These are scores for the two IVs combined

R^2 (.66) means the two IVs together explain 66% of the variance in female life expectancy

Multiple Regression

SPSS Output

Table 4: Coefficients

Tolerance and VIF are good

Pct Urban and Doctors are statistically significant predictors

The model is good

Multiple Regression

What does the regression equation mean?

Full model (Model 1): Female life expectancy = 53.905 + .111 Percent urban + .000 GDP + .040 Radios + .009 Hospital beds + .465 Doctors

Reduced model (Model 2): Female life expectancy = 53.546 + .142 Percent urban + .568 Doctors

Interpreting the reduced model:

The base for female life expectancy (no urban population and no doctors) is 53.5 years

Every percentage point increase in urbanization increases female life expectancy by .142 years

Every additional doctor per 10,000 people increases female life expectancy by .568 years

Isolated effects: each coefficient holds the other IV constant
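
The interpretation above can be checked with a short calculation. The sketch uses the two-IV coefficients quoted in the bullets (.142 and .568) and the constant 53.546 from the Model 2 column of the results table in these slides. The example country (50 percent urban, 10 doctors per 10,000 people) is hypothetical.

```python
# Model 2 coefficients as reported in the slides' results table.
CONSTANT = 53.546
B_URBAN = 0.142
B_DOCTORS = 0.568


def predicted_life_expectancy(pct_urban, doctors_per_10k):
    """Predicted female life expectancy (years) from the two-IV equation."""
    return CONSTANT + B_URBAN * pct_urban + B_DOCTORS * doctors_per_10k


baseline = predicted_life_expectancy(0, 0)    # no urban population, no doctors
example = predicted_life_expectancy(50, 10)   # hypothetical country

# Isolated effect: add one doctor per 10,000 while holding urbanization fixed.
one_more_doctor = predicted_life_expectancy(50, 11) - example

print(round(baseline, 3), round(example, 3), round(one_more_doctor, 3))
```

The difference between the two predictions equals the Doctors coefficient regardless of the level at which urbanization is held, which is what "isolated effect" means in a linear model without interaction terms.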

Multiple Regression

The Beta Coefficients (Standardized Coefficients) in Table 4:

Which of the two IVs is more important?

The betas of the two variables can be compared (values from the full model; in the reduced model they are .565 and .313):

Doctors per 10,000 people: .469
Percent urban: .248

If you want to increase female life expectancy, the best method is to increase the number of doctors per 10,000 people

Urbanization helps, but comes in second

Multiple Regression

SPSS Output

Table 5: Collinearity Diagnostics

Useful if you want to conduct more detailed analyses of multicollinearity

Guidelines for interpreting the table:

If an eigenvalue is close to zero, there is a problem of multicollinearity
If the condition index is larger than 15, there is a problem of multicollinearity

Multiple Regression

Presenting Results of a Single Model

                            Unstandardized Regression
                            Coefficient (s.e.)           Beta Weight
Percent Urban, 1992         0.111* (0.035)               0.248
GDP Per Capita              0.000 (0.000)                0.093
Radios per 100 people       0.040 (0.027)                0.110
Hospital Beds per 10,000    0.009 (0.030)                0.025
Doctors per 10,000          0.465* (0.104)               0.469
Constant                    53.905* (1.483)
R^2                         .679
Adjusted R^2                .664
* p < 0.05

Multiple Regression

Presenting Results of Multiple Models

                            Model 1                  Model 2
                            Coefficient    Beta      Coefficient    Beta
Percent Urban, 1992         0.111*         0.248     0.142*         0.313
                            (0.035)                  (0.033)
GDP Per Capita              0.000          0.093
                            (0.000)
Radios per 100 people       0.040          0.110
                            (0.027)
Hospital Beds per 10,000    0.009          0.025
                            (0.030)
Doctors per 10,000          0.465*         0.469     0.568*         0.565
                            (0.104)                  (0.074)
Constant                    53.905*                  53.546*
                            (1.483)                  (1.379)
R^2                         .679                     .660
Adjusted R^2                .664                     .654
* p < 0.05
Standard errors in parentheses

Multiple Regression

Causal Inference:

Remember: Correlation ≠ Causation

Regression implies a causal direction (IVs → DV), but does not by itself establish causation

Statistical controls vs. experimental control

Beware of potential omitted variable bias

Interpret findings with appropriate caution

Communicating Research Findings

Outline

Preliminary Reminders

Guidelines for Paper Writing

Components of a Quantitative Research Paper

Oral Presentations

Ethical Issues

Preliminary Reminders

Audience

Presentation should be adjusted for the needs of the audience

Audience analysis:

Tailoring to audience
Making contents and message clear

Contents

For all kinds of audiences, presentations should be accurate, clear, coherent, and concise

Guidelines for Paper Writing

Accuracy

Appropriate uses of concepts cited in a paper

Appropriate uses of analytical methods and accuracy in calculations

Citing sources properly

Guidelines for Paper Writing

Clarity

Use the following carefully and sparingly:

Complex sentences
Conjunctions (“although,” “however”) and pronouns (“this,” “that”)
Allusions (indirect, vague references): avoid
Metaphors, embellishments, poetic expressions, and clichés

Double-check and revise your draft

Guidelines for Paper Writing

Coherence

Make an outline of your paper

Discuss a few concepts or issues, and clarify and/or elaborate on them (focus)

Do not casually list many concepts or issues

Paragraphs should be the right size (not too long or too short)

Each paragraph should make one or a few clear points

Guidelines for Paper Writing

Conciseness

Keep it brief!

Make sure that every point you make is directly relevant to your main point(s) in the paper

Make sure that every word you use has a specific function in a sentence

Components of a Quantitative Research Paper

See Example 15.1 in textbook

A generic outline:

Executive Summary or Abstract
Introduction
Review/Theory
Methodology
Findings/Results
Recommendations/Conclusions

Qualitative research papers may have somewhat different outlines

Components of a Quantitative Research Paper

Executive Summaries and Abstracts

Differences between the two (see Example 15.2 in text)

Background Information

Problem, research questions, purpose of study

Literature Review

Three types:

Chronological order

Organize discussion around key variables (method for classes)

Organize discussion around theoretical approaches

Components of a Quantitative Research Paper

Methodology

Design, sampling, operational definitions of variables, procedures of data collection, and data analysis methods used

Findings/Results

Focus on important findings (do not cover everything); use tables and graphs

Recommendations/Conclusions

Include a summary and conclusions; recommendations are part of a professional (not academic) report

Oral Presentations

Use the traditional plan of research papers (background, methods, results, recommendations), but with different emphases

May be necessary to be informal during the presentation

Practice, practice, practice!

Ethical Issues

Fabrication, falsification

Plagiarism

Full disclosure for handling research errors

Peer review: blind and double-blind review

Saving (and publishing) data for others’ use

APA guidelines: Keep data for at least 5 years
Use a Dataverse: https://dataverse.harvard.edu/

Questions?

Figure: Q&A by Libby Levi, CC BY-SA 2.0

Lab/Homework

Using the county data file edited.sav, I want you to conduct a brief study of the correlates of crime in North Carolina. For this, you will use the crime index variable (CrimeIndex) as the dependent variable. Look through the dataset and documentation to identify at least 5 variables that you believe to be associated with crime. Provide me with the following:

Hypothesis for each variable
Table of initial and final model results
Discussion of whether there is multicollinearity among some of the variables and what you will do about it
Interpretation of the statistically significant coefficients, both in their raw form and their beta weights
Report of which of your hypotheses are supported and which are not

Include SPSS output in an appendix, but do not rely on it for your brief report. Instead, make actual tables in Word.
