Introduction to Statistics: Political Science (Class 9) Review.

Transcript of Introduction to Statistics: Political Science (Class 9) Review.

Page 1: Introduction to Statistics: Political Science (Class 9) Review.

Introduction to Statistics: Political Science (Class 9)

Review

Page 2: Introduction to Statistics: Political Science (Class 9) Review.

Probability of having cardiovascular disease

• Purpose of statistics: – Inferences about populations using samples

• We draw a random sample of 1,000 adults and 405 have some form of CVD

• Based on our sample, if we randomly select one adult from the population: what is the probability that they have cardiovascular disease?
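The arithmetic behind this question can be sketched in a few lines of Python. The variable names are illustrative and not from the slides; the idea is simply that the sample proportion serves as the estimate of the population probability.

```python
# Sample counts from the slide: 405 of 1,000 randomly sampled adults have CVD.
n_sample = 1000
n_cvd = 405

# The sample proportion is our estimate of the population probability.
p_cvd = n_cvd / n_sample
print(f"Estimated P(CVD) = {p_cvd:.3f}")
```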

Page 3: Introduction to Statistics: Political Science (Class 9) Review.

Conditional Probability

                                          No CVD    CVD
Exercise less than 3 days/week (N=602)    30.3%     28.9%
Exercise 3 or more days/week (N=398)      30.2%     10.6%

• Probability of exercising <3 days/week?

• Probability of CVD among those who exercise <3 days/week?

• Probability of CVD among those who exercise 3 or more days/week?

Page 4: Introduction to Statistics: Political Science (Class 9) Review.

Association between exercise and CVD?

                                          No CVD    CVD
Exercise less than 3 days/week (N=602)    30.3%     28.9%
Exercise 3 or more days/week (N=398)      30.2%     10.6%

p1 = 28.9/(30.3+28.9) = 0.488

p2 = 10.6/(30.2+10.6) = 0.260

Difference = 0.488 - 0.260 = .228

Those who exercise less than 3 days/week are .228 (22.8 percentage points) more likely to have CVD
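A minimal Python sketch of the conditional-probability arithmetic, using the cell percentages exactly as printed in the table. The dictionary layout and function name are assumptions made for illustration.

```python
# Cell percentages of the full sample, as printed in the table.
cells = {
    ("<3 days", "no CVD"): 30.3,
    ("<3 days", "CVD"): 28.9,
    ("3+ days", "no CVD"): 30.2,
    ("3+ days", "CVD"): 10.6,
}

def p_cvd_given(group):
    """P(CVD | exercise group) = share with CVD in the group / share in the group."""
    with_cvd = cells[(group, "CVD")]
    total = with_cvd + cells[(group, "no CVD")]
    return with_cvd / total

p1 = p_cvd_given("<3 days")
p2 = p_cvd_given("3+ days")
diff = p1 - p2
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, difference = {diff:.3f}")
```

Dividing by the row total is what makes the probability conditional: it restricts attention to one exercise group before asking how common CVD is.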

Page 5: Introduction to Statistics: Political Science (Class 9) Review.

Specifying and testing hypotheses

• Difference of proportions = .228

• What’s our null hypothesis?

• Why a “null hypothesis”? Why not test whether the difference is .228?

• Central limit theorem – In repeated sampling, the distribution of our estimates of the mean (or difference of means or slope) will be approximately normally distributed and centered over the true population value

Page 6: Introduction to Statistics: Political Science (Class 9) Review.

Central limit theorem

[Figure: sampling distribution centered on the proposed true value (0), with a distance of 1 standard error marked]
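The repeated-sampling idea can be illustrated with a small simulation. This is only a sketch: the "true" proportion of 0.405, the number of draws, and all names are assumptions chosen for illustration, not values the slides treat as the population truth.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_P = 0.405   # assumed "true" population proportion (illustrative)
N = 1000         # size of each random sample
DRAWS = 2000     # number of repeated samples

def sample_proportion(p, n):
    """Draw one random sample of size n and return the share of 'successes'."""
    return sum(random.random() < p for _ in range(n)) / n

estimates = [sample_proportion(TRUE_P, N) for _ in range(DRAWS)]

mean_est = statistics.mean(estimates)         # should sit close to TRUE_P
sd_est = statistics.stdev(estimates)          # spread of the estimates
theory_se = (TRUE_P * (1 - TRUE_P) / N) ** 0.5

# Share of estimates within one standard error of the true value
# (roughly two-thirds if the estimates are approximately normal).
within_1se = sum(abs(e - TRUE_P) <= theory_se for e in estimates) / DRAWS

print(f"mean of estimates = {mean_est:.4f}")
print(f"sd of estimates   = {sd_est:.4f} (formula SE = {theory_se:.4f})")
print(f"share within 1 SE = {within_1se:.2f}")
```

The estimates pile up around the true value, and their spread is close to the standard error given by the formula, which is the property the hypothesis test relies on.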

Page 7: Introduction to Statistics: Political Science (Class 9) Review.

Comparing proportions

• Difference of proportions = .228

p1 = 28.9/(30.3+28.9) = 0.488 (N=602)

p2 = 10.6/(30.2+10.6) = 0.260 (N=398)

• Standard error of this difference: square root of (p1*(1-p1)/N1) + (p2*(1-p2)/N2)

Page 8: Introduction to Statistics: Political Science (Class 9) Review.

Comparing proportions

• So, standard error of difference is the square root of: (.488*(1-.488)/602) + (.260*(1-.260)/398)

– Which is .0299

• Difference of proportions = .228
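The standard error calculation can be checked with a short Python sketch. The function name is hypothetical; the inputs are the rounded proportions and group sizes from the slides.

```python
import math

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of (p1 - p2) for two independent samples."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

se = se_diff_proportions(0.488, 602, 0.260, 398)
print(f"SE of the difference = {se:.4f}")
```

With these rounded inputs the result is roughly .030, in line with the .0299 on the slide.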

Page 9: Introduction to Statistics: Political Science (Class 9) Review.

Hypotheses

• Null hypothesis: – There is no difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week

• Alternate hypothesis: – There is a difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week

• (i.e., the difference is not 0)

Page 10: Introduction to Statistics: Political Science (Class 9) Review.

If 0 were the true difference, it would be very unlikely that we would find a difference 7.63 (.228/.0299) standard errors from that value by chance

[Figure: sampling distribution centered on the proposed true value (0), with a distance of 1 standard error marked]
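A hedged sketch of the test itself, using the difference of .228 computed on the earlier slides and the standard error of .0299. The two-sided p-value from the normal curve is an illustrative addition; the slides only state that such a result would be very unlikely by chance.

```python
import math

diff = 0.228       # observed difference of proportions
se = 0.0299        # standard error of the difference
null_value = 0.0   # difference proposed by the null hypothesis

# How many standard errors the estimate sits from the proposed true value.
z = (diff - null_value) / se

# Two-sided p-value under the normal curve: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-sided p-value = {p_value:.2e}")
```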

Page 11: Introduction to Statistics: Political Science (Class 9) Review.

Does exercise cause lower CVD?

• Reverse causation? Might CVD cause exercise?

• Failure to account for confounds – Typically leads to over-estimating the strength of a relationship (not always… but usually)

Page 12: Introduction to Statistics: Political Science (Class 9) Review.

[Figure: Obama FT (vertical axis, 0 to 100) plotted against Bush FT (horizontal axis, 0 to 100); legend: Democrats, Republicans]

Page 13: Introduction to Statistics: Political Science (Class 9) Review.

Specification and Interpretation

Multivariate Regression

Page 14: Introduction to Statistics: Political Science (Class 9) Review.

Does exercise make CVD less likely?

• Regression (predict CVD)

• Estimated likelihood of CVD if exercise 4 days/week?

• What might confound our estimate of the relationship between exercise and CVD?

                       Coef.   SE     T    P-value
Days Exercise (0-7)    -0.06   .001   ?    0.000
Constant                0.56   .002   ?    0.000
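One way to work through the questions on this slide, sketched in Python. The names are illustrative; the T statistic is taken to be the coefficient divided by its standard error, and the numbers are the ones printed in the table.

```python
# Coefficients and standard errors from the bivariate regression table.
constant = 0.56
b_exercise = -0.06
se_exercise = 0.001
se_constant = 0.002

def predict_cvd(days_exercise):
    """Predicted likelihood of CVD for a given number of exercise days per week."""
    return constant + b_exercise * days_exercise

pred_4 = predict_cvd(4)

# T statistic = coefficient / standard error.
t_exercise = b_exercise / se_exercise
t_constant = constant / se_constant

print(f"Predicted likelihood at 4 days/week = {pred_4:.2f}")
print(f"T (Days Exercise) = {t_exercise:.1f}, T (Constant) = {t_constant:.1f}")
```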

Page 15: Introduction to Statistics: Political Science (Class 9) Review.

Controlling for confounds

                        Coef.   SE     T      P-value
Days Exercise (0-7)     -0.03   .001   -3.0   0.002
Days Fast Food (0-7)     0.04   .002    2.0   0.048
Constant                 0.42   .002   21.0   0.000

Page 16: Introduction to Statistics: Political Science (Class 9) Review.

[Figure: % Chance CVD (vertical axis) by Days per Week Exercise (horizontal axis), with separate lines for High Fast Food and Low Fast Food]

Page 17: Introduction to Statistics: Political Science (Class 9) Review.

Controlling for dichotomous confounds

• Predicted probability of CVD for – 2 days exercise, 2 days Fast food, smoker

                        Coef.   SE     T      P-value
Days Exercise (0-7)     -0.03   .001   -3.0   0.002
Days Fast Food (0-7)     0.04   .002    2.0   0.048
Smoker (1=yes)           0.11   .001   11.0   0.000
Constant                 0.38   .002   19.0   0.000
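The predicted probability asked for on this slide can be sketched as follows. The dictionary keys and function name are assumptions for illustration; the coefficients come from the table.

```python
# Coefficients from the regression with the dichotomous smoker indicator.
coefs = {
    "constant": 0.38,
    "days_exercise": -0.03,
    "days_fast_food": 0.04,
    "smoker": 0.11,
}

def predict_cvd(days_exercise, days_fast_food, smoker):
    """Predicted probability of CVD; smoker is 1 for yes and 0 for no."""
    return (coefs["constant"]
            + coefs["days_exercise"] * days_exercise
            + coefs["days_fast_food"] * days_fast_food
            + coefs["smoker"] * smoker)

smoker_pred = predict_cvd(2, 2, 1)
nonsmoker_pred = predict_cvd(2, 2, 0)
print(f"2 days exercise, 2 days fast food, smoker:     {smoker_pred:.2f}")
print(f"2 days exercise, 2 days fast food, non-smoker: {nonsmoker_pred:.2f}")
```

Because the indicator only takes the values 0 and 1, its coefficient is the gap between the two predictions when everything else is held fixed.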

Page 18: Introduction to Statistics: Political Science (Class 9) Review.

Nominal Variables

• Variable that does not have an “order” to it – Nothing is “higher” or “lower”

• Create set of dichotomous variables

• Always interpret coefficients with respect to the reference category
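A minimal sketch of turning a nominal variable into a set of dichotomous indicators. Region is used as the example, with Midwest as the excluded reference category to match the regression on a later slide; the function and variable names are hypothetical.

```python
REGIONS = ["Midwest", "South", "West", "Northeast"]
REFERENCE = "Midwest"  # excluded (reference) category

def make_indicators(region):
    """Return one 0/1 indicator for every category except the reference."""
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region}")
    return {name: int(region == name) for name in REGIONS if name != REFERENCE}

for respondent_region in ["South", "Midwest", "West"]:
    print(respondent_region, make_indicators(respondent_region))
```

A respondent in the reference category gets 0 on every indicator, which is why each coefficient is read as a difference from that category.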

Page 19: Introduction to Statistics: Political Science (Class 9) Review.
Page 20: Introduction to Statistics: Political Science (Class 9) Review.

Controlling for nominal confounds

                        Coef.   SE     T      P-value
Days Exercise (0-7)     -0.03   .001   -3.0   0.002
Days Fast Food (0-7)     0.03   .002    1.5   0.135
Smoker (1=yes)           0.09   .001    9.0   0.000
South (1=yes)            0.03   .002    1.5   0.137
West (1=yes)            -0.01   .002   -0.5   0.642
Northeast (1=yes)        0.02   .002    1.0   0.410
Constant                 0.34   .002   17.0   0.000

(Midwest is excluded category)

What if we wanted to test whether including region indicators improves fit of the model?

Page 21: Introduction to Statistics: Political Science (Class 9) Review.

Non-linear relationships

Page 22: Introduction to Statistics: Political Science (Class 9) Review.

Logarithms

Why use a logarithmic transformation? You think the relationship looks like this…

[Figure: Home Value ($s) (vertical axis, 0 to 1,800,000) plotted against Yearly Income ($s) (horizontal axis, 60,000 to 6,660,000)]

Page 23: Introduction to Statistics: Political Science (Class 9) Review.

Logarithms

[Figure: Home Value (vertical axis, 0 to 1,800,000) plotted against Logged Yearly Income (horizontal axis, 10 to 16)]
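A small sketch of what the transformation does to the income axis. It assumes the natural logarithm, which is consistent with the 10 to 16 range on the logged axis; the income values are taken from the axis range of the unlogged chart plus a few round numbers chosen for illustration.

```python
import math

# Lowest, middle and highest incomes from the unlogged axis.
incomes = [60_000, 660_000, 6_660_000]
logged = [math.log(income) for income in incomes]

for income, value in zip(incomes, logged):
    print(f"{income:>10,} -> {value:.2f}")

# Doubling income adds the same amount on the log scale at any income level.
step_low = math.log(120_000) - math.log(60_000)
step_high = math.log(2_000_000) - math.log(1_000_000)
print(f"log step for doubling: {step_low:.3f} vs {step_high:.3f}")
```

Equal proportional changes become equal distances after logging, which is useful when each additional dollar matters less at higher incomes.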

Page 24: Introduction to Statistics: Political Science (Class 9) Review.

Squared term – U-shaped (or ∩-shaped) relationship

Age and political ideology (-2=very conservative, 2=very liberal)

               Coef.    SE      T        P
Age            -0.007   0.004   -1.740   0.082
Constant        0.122   0.209    0.580   0.561

               Coef.    SE      T        P
Age            -0.065   0.025   -2.630   0.009
Age-squared     0.001   0.000    2.390   0.017
Constant        1.554   0.635    2.450   0.015

Page 25: Introduction to Statistics: Political Science (Class 9) Review.

Age and Political Ideology

               Coef.    SE      T        P
Age            -0.065   0.025   -2.630   0.009
Age-squared     0.001   0.000    2.390   0.017
Constant        1.554   0.635    2.450   0.015

Age   Age2   -0.065*Age   .0005574*Age2   Constant   Predicted Value
18    324    -1.178       0.181           1.554       0.557
28    784    -1.832       0.437           1.554       0.159
38    1444   -2.487       0.805           1.554      -0.128
48    2304   -3.141       1.284           1.554      -0.303
58    3364   -3.795       1.875           1.554      -0.366
68    4624   -4.450       2.577           1.554      -0.319
78    6084   -5.104       3.391           1.554      -0.159
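A sketch of how the predicted values and the turning point of the curve can be computed. It uses the coefficients as printed in the column headers (-0.065 and .0005574); since the printed Age coefficient is rounded, the predictions differ slightly from the table. The names are illustrative.

```python
B_AGE = -0.065        # rounded coefficient on Age, as printed
B_AGE_SQ = 0.0005574  # coefficient on Age-squared, from the column header
CONSTANT = 1.554

def predict_ideology(age):
    """Predicted ideology (-2=very conservative, 2=very liberal) at a given age."""
    return CONSTANT + B_AGE * age + B_AGE_SQ * age ** 2

# For y = a + b*x + c*x^2, the curve turns at x = -b / (2c).
turning_point = -B_AGE / (2 * B_AGE_SQ)

for age in range(18, 79, 10):
    print(f"age {age}: {predict_ideology(age):.3f}")
print(f"curve turns at about age {turning_point:.1f}")
```

The turning point falls in the late 50s, matching the table, where the lowest predicted value is at age 58.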

Page 26: Introduction to Statistics: Political Science (Class 9) Review.

[Figure: Ideology (-2=very conservative, 2=very liberal; vertical axis shown from -1 to 1) by Age (horizontal axis, 18 to 88)]

Page 27: Introduction to Statistics: Political Science (Class 9) Review.

Create indicators from an ordered variable

Party Identification (-3 to 3)

Seven Variables:

Strong Republican (1=yes)

Weak Republican (1=yes)

Lean Republican (1=yes)

Pure Independent (1=yes)

Lean Democrat (1=yes)

Weak Democrat (1=yes)

Strong Democrat (1=yes)

Page 28: Introduction to Statistics: Political Science (Class 9) Review.

Predict Obama Favorability (1-4)

Coef. SE T P

Strong Republican -1.632 0.161 -10.160 0.000

Weak Republican -0.707 0.198 -3.580 0.000

Lean Republican -1.235 0.181 -6.810 0.000

Lean Democrat 0.674 0.197 3.430 0.001

Weak Democrat 0.494 0.187 2.640 0.009

Strong Democrat 0.595 0.159 3.750 0.000

Constant 2.940 0.134 21.870 0.000

Excluded category: Pure Independents

Page 29: Introduction to Statistics: Political Science (Class 9) Review.

[Figure: Obama Favorability (vertical axis, 1 to 4) by party identification category, from Strong Republican to Strong Democrat]

Page 30: Introduction to Statistics: Political Science (Class 9) Review.

Predict Obama Favorability (1-4)

Coef. SE T P

Strong Republican -0.397 0.150 -2.650 0.008

Weak Republican 0.528 0.189 2.790 0.006

Pure Independent 1.235 0.181 6.810 0.000

Lean Democrat 1.909 0.188 10.150 0.000

Weak Democrat 1.729 0.179 9.680 0.000

Strong Democrat 1.831 0.148 12.360 0.000

Constant 1.705 0.122 14.010 0.000

New excluded category: Leaning Republicans
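A short sketch showing that changing the excluded category changes the coefficients but not the predicted favorability of any group. Both sets of coefficients are copied from the two tables; the helper name and data layout are assumptions for illustration.

```python
def group_means(constant, coefs, excluded):
    """Predicted value per group: the constant for the excluded category,
    constant + coefficient for every other category."""
    means = {excluded: constant}
    for group, coef in coefs.items():
        means[group] = constant + coef
    return means

# Excluded category: Pure Independents.
means_a = group_means(2.940, {
    "Strong Republican": -1.632, "Weak Republican": -0.707,
    "Lean Republican": -1.235, "Lean Democrat": 0.674,
    "Weak Democrat": 0.494, "Strong Democrat": 0.595,
}, excluded="Pure Independent")

# Excluded category: Leaning Republicans.
means_b = group_means(1.705, {
    "Strong Republican": -0.397, "Weak Republican": 0.528,
    "Pure Independent": 1.235, "Lean Democrat": 1.909,
    "Weak Democrat": 1.729, "Strong Democrat": 1.831,
}, excluded="Lean Republican")

for group in means_a:
    print(f"{group:<18} {means_a[group]:.3f}  {means_b[group]:.3f}")
```

The two columns agree up to rounding, so the choice of reference category affects only how the coefficients are read.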

Page 31: Introduction to Statistics: Political Science (Class 9) Review.

Interactions

• One variable moderates the effect of another – i.e., the relationship between one variable and an outcome depends on the value of another variable

Page 32: Introduction to Statistics: Political Science (Class 9) Review.

Regression estimates an equation…

                                               Coef.    SE      T        P
Party Affiliation (-3=strong R; 3=strong D)     1.286   0.878    1.460   0.143
Voted in 2008                                  -1.138   1.484   -0.770   0.443
Party Affiliation x Voted in 2008               3.575   0.918    3.900   0.000
Constant                                       61.100   1.358   44.980   0.000

61.100 + 1.286*Party – 1.138*Voted + 3.575*Party*Voted + u

61.100 + Party*1.286 + Party*Voted*3.575 – 1.138*Voted + u

61.100 + Party(1.286 + Voted*3.575) – 1.138*Voted + u

OR

61.100 + Party*1.286 + Voted*Party*3.575 – Voted*1.138 + u

61.100 + Party*1.286 + Voted(Party*3.575 – 1.138) + u

Page 33: Introduction to Statistics: Political Science (Class 9) Review.

               Party Aff.   Voted   Party Aff.   Voted      Party x Voted   Constant   Predicted Value
Coefficients                         1.286       -1.138      3.575          61.100
               -3           0       -3.858        0          0              61.100     57.242
               -2           0       -2.572        0          0              61.100     58.528
               -1           0       -1.286        0          0              61.100     59.814
                0           0        0.000        0          0              61.100     61.100
                1           0        1.286        0          0              61.100     62.386
                2           0        2.572        0          0              61.100     63.672
                3           0        3.858        0          0              61.100     64.959

               Party Aff.   Voted   Party Aff.   Voted      Party x Voted   Constant   Predicted Value
Coefficients                         1.286       -1.138      3.575          61.100
               -3           1       -3.858       -1.13775   -10.7258        61.100     45.378
               -2           1       -2.572       -1.13775    -7.1505        61.100     50.240
               -1           1       -1.286       -1.13775    -3.57525       61.100     55.101
                0           1        0.000       -1.13775     0             61.100     59.962
                1           1        1.286       -1.13775     3.575252      61.100     64.824
                2           1        2.572       -1.13775     7.150504      61.100     69.685
                3           1        3.858       -1.13775    10.72576       61.100     74.547
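The predicted values in these tables follow directly from the estimated equation. A minimal sketch with the rounded coefficients (names are illustrative), so results can differ from the table in the last decimal:

```python
CONSTANT = 61.100
B_PARTY = 1.286        # Party Affiliation (-3=strong R; 3=strong D)
B_VOTED = -1.138       # Voted in 2008 (1=yes)
B_INTERACTION = 3.575  # Party Affiliation x Voted in 2008

def predict_support(party, voted):
    """Predicted outcome for a given party affiliation and voting status."""
    return (CONSTANT + B_PARTY * party + B_VOTED * voted
            + B_INTERACTION * party * voted)

def party_slope(voted):
    """Effect of a one-unit change in party affiliation, given voting status."""
    return B_PARTY + B_INTERACTION * voted

for voted in (0, 1):
    row = [round(predict_support(party, voted), 3) for party in range(-3, 4)]
    print(f"voted={voted}, slope={party_slope(voted):.3f}: {row}")
```

The slope on party affiliation is 1.286 among non-voters and 1.286 + 3.575 = 4.861 among voters, which is what it means for voting to moderate the relationship.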

Page 34: Introduction to Statistics: Political Science (Class 9) Review.

[Figure: Support for Comparative Effectiveness Research (vertical axis, 40 to 80) by party identification, from Strong Republican to Strong Democrat, with separate series for Did not Vote and Voted]

Page 35: Introduction to Statistics: Political Science (Class 9) Review.

Establishing causality

Page 36: Introduction to Statistics: Political Science (Class 9) Review.

Dealing with confounds

• Theory + multivariate regression

• Experiments

Page 37: Introduction to Statistics: Political Science (Class 9) Review.

Dealing with reverse causation

• Theory

• Experiments

Page 38: Introduction to Statistics: Political Science (Class 9) Review.

Experiments

• What is the key characteristic of an experiment?

• How does this address reverse causality?

• How does it address confounds?

• Weaknesses/limitations of experiments?

Page 39: Introduction to Statistics: Political Science (Class 9) Review.

Exam Expectations

• Describe probabilities / conditional probabilities

• Write hypotheses

– Demonstrate understanding of how null hypotheses relate to the central limit theorem

• Test difference of proportions (formula for SE will be provided)

• Interpreting multivariate regression

– Relationships (slopes)

– Predicted values

– Sketch graphs of relationships

• Discuss strengths and limitations of analyses

– Why an estimated slope might be biased

– Benefits and limitations of experiments

Page 40: Introduction to Statistics: Political Science (Class 9) Review.

Notes

• Homework 3 graded

• Homework 4 due Thursday 12/9

• Office hours next week – email to come

• Exam December 14 at 2pm