Alternatively, dependent variable and independent variable. Alternatively, endogenous variable and...


Page 1: Alternatively, dependent variable and independent variable. Alternatively, endogenous variable and exogenous variable.

Alternatively, dependent variable and independent variable.

Alternatively, endogenous variable and exogenous variable.


Association versus causation


Scatterplots


[Figure: scatterplot. x-axis: Weeks since beginning of semester. y-axis: Percentage of computers used in computer labs free.]


Stata Exercise 1


Stata Exercise 2

Suppose we were considering the effect of hiring more people into the firm. On average, what total billings can we expect from a staff of 50? 150?


Stata Exercise 3


Stata Exercise 4


Stata Exercise 5

Adding Categorical Values to a Scatterplot

Often it is useful to have a way of distinguishing groups of data in a scatterplot.


Stata Exercise 6


Transforming Data

Data analysts often look for a transformation of the data that simplifies the overall pattern.

The transformation typically involves turning a non-Normally distributed variable into a more-or-less Normally distributed variable.
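A minimal sketch of the idea, assuming a log transformation and a small made-up right-skewed sample (neither is taken from the slides). Skewness near 0 is used here as a rough indicator of a more symmetric, more Normal-looking variable.

```python
import math
import statistics

def skewness(values):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical right-skewed data (a few very large values); not from the slides.
raw = [1, 2, 2, 3, 4, 5, 8, 12, 20, 45, 110, 400]

# The log transformation pulls in the long right tail.
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```

The log is only one possible choice; it applies to strictly positive data.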

Stata Exercise 7


Categorical Explanatory Variable

What if the explanation for the numbers is not another number but the category?

For example, investing in a particular sector of the economy might be great in some years or terrible in others.

Stata Exercise 8


More scatterplots

Relations between competitors

Stata Exercise 9


Correlation


Which one has the stronger correlation?


r = covariance(x,y) / [stdev(x)*stdev(y)]

r = (1/(n-1)) * sum of [(standardized values of x) * (standardized values of y)]


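week (w) | w - mean of w | z-score of w | prop of comps (p) | p - mean of p | z-score of p | z-score * z-score
1  |  |  | 73.1 |  |  |
2  |  |  | 89.7 |  |  |
3  |  |  | 71.3 |  |  |
4  |  |  | 65.3 |  |  |
5  |  |  | 54.6 |  |  |
6  |  |  | 57.9 |  |  |
7  |  |  | 51.6 |  |  |
8  |  |  | 41.2 |  |  |
9  |  |  | 59.1 |  |  |
10 |  |  | 48.5 |  |  |
11 |  |  | 24   |  |  |
12 |  |  | 43   |  |  |
13 |  |  | 29.1 |  |  |
14 |  |  | 19.7 |  |  |
15 |  |  | 12.1 |  |  |
16 |  |  | 10.1 |  |  |

sum 0.00

count 16, mean of w 8.5, stdev of w 4.8, mean of p 46.9, stdev of p 23.1, corr (blank)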
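The empty columns of this table can be filled in mechanically. The sketch below uses the week and "prop of comps" values from the table and applies the z-score formula for r; the function names are illustrative, not from the slides.

```python
import statistics

# Weeks 1..16 and the "prop of comps" values read from the table.
weeks = list(range(1, 17))
prop = [73.1, 89.7, 71.3, 65.3, 54.6, 57.9, 51.6, 41.2,
        59.1, 48.5, 24.0, 43.0, 29.1, 19.7, 12.1, 10.1]

def z_scores(values):
    """Standardize with the sample mean and the sample (n-1) standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

def correlation(x, y):
    """r = (1/(n-1)) * sum of products of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

r = correlation(weeks, prop)
print(f"mean of w = {statistics.mean(weeks):.1f}, stdev of w = {statistics.stdev(weeks):.1f}")
print(f"mean of p = {statistics.mean(prop):.1f}, stdev of p = {statistics.stdev(prop):.1f}")
print(f"corr = {r:.2f}")
```

The printed means and standard deviations should agree with the summary row of the table; the correlation is strongly negative because the percentage falls as the weeks go by.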


Correlation

The r coefficient between measures of height and weight is positive because people who are of above-average height tend to be of above-average weight … so if the z-score for height is large, the z-score for weight tends to be large.

r = (1/(n-1)) * sum of [(standardized values of x) * (standardized values of y)]

Correlation applet at www.whfreeman.com/pbs


Stata Exercise 11


Correlation

Correlation coefficients, as well as scatterplots, can be used for comparisons.

For example, how well did Vanguard International Growth Fund (an investment vehicle) do compared to an average of the stocks in Europe, Australasia and the Far East?

Stata Exercise 12


Correlation

Doesn’t tell you anything about causality
Variables must be numerical
It is indifferent to units of measurement
r > 0 means positive association; r < 0, negative
-1 ≤ r ≤ 1. r = -1 means a perfectly straight downward-sloping line. r = 0 means no linear relation.
r only measures linear relations
r is not resistant to outliers
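Three of these properties can be checked numerically. The sketch below uses made-up data (not from the slides) to show that r ignores units, that one outlier can move r a long way, and that a perfect but curved relation can still give r = 0.

```python
import statistics

def correlation(x, y):
    """Pearson r as covariance(x, y) / (stdev(x) * stdev(y)), using n-1 throughout."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical, nearly linear data chosen only for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.0]
r_clean = correlation(x, y)

# Changing units (e.g. inches to centimetres) leaves r unchanged.
r_rescaled = correlation([2.54 * v for v in x], y)

# A single outlier can change r a great deal.
r_outlier = correlation(x + [7], y + [0.0])

# A perfect but symmetric U-shaped relation has no linear association.
u_x = [-2, -1, 0, 1, 2]
u_y = [v * v for v in u_x]
r_curve = correlation(u_x, u_y)

print(f"clean: {r_clean:.3f}  rescaled: {r_rescaled:.3f}")
print(f"with outlier: {r_outlier:.3f}  U-shape: {r_curve:.3f}")
```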

Stata Exercise 13


Regression


The Linear Regression Model

Errors have mean 0 and a constant standard deviation $\sigma$, and are independent of x.

$y_i = a + b x_i + \text{error}_i$

[Figure: linear prediction of Price of Homes (0 to 1,000,000) from Square Footage of Homes (1000 to 4000).]

[Figure: Price of Homes (0 to 1,000,000) against Square Footage of Homes (1000 to 4000), with the linear prediction.]

[Figure: six histograms of Price of Homes (x-axis 0 to 1,000,000; y-axis Frequency), one per square-footage band: 1000<sqft<=1500, 1500<sqft<=2000, 2000<sqft<=2500, 2500<sqft<=3000, 3000<sqft<=3500, 3500<sqft<=4000.]

[Figure: Price of Homes (0 to 1,000,000) against Square Footage of Homes (1000 to 4000), with the linear prediction.]


[Figure: scatterplot of annual earnings (dollars, 0 to 200,000) against height (inches, 55 to 80). Legend: earn, Fitted values. Annotations on the plot: "$37,694" and "95% of values".]

Points on the fitted line: (66.5 in, $20,000), (76.5 in, $35,600), (61.5 in, $12,200)

y - 20,000 = 1560 (x - 66.5)

y = -83,740 + 1560 x, or roughly y = -84,000 + 1560 x

Sketch a scatterplot of the data consistent with this line
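The point-slope arithmetic can be reproduced from the three labelled points. The sketch below is only a check of that arithmetic; the helper names are illustrative.

```python
# Three points read off the fitted line on the slide: (height in inches, earnings in dollars).
points = [(61.5, 12_200), (66.5, 20_000), (76.5, 35_600)]

def slope_between(p, q):
    """Rise over run between two points on a line."""
    return (q[1] - p[1]) / (q[0] - p[0])

slope = slope_between(points[1], points[2])   # dollars per inch of height
x0, y0 = points[1]

# Point-slope form: y - y0 = slope * (x - x0), so intercept = y0 - slope * x0.
intercept = y0 - slope * x0

def predict(height):
    """Predicted annual earnings for a given height, from slope-intercept form."""
    return intercept + slope * height

print(f"slope = {slope}, intercept = {intercept}")
print(f"predicted earnings at 61.5 in: {predict(61.5)}")
```

The intercept is far outside the range of observed heights, so it is a feature of the line's algebra rather than a meaningful prediction.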

[Figure: scatterplot of annual earnings (dollars, 0 to 200,000) against height (inches, 55 to 80). Legend: earn, Fitted values.]

[Figure: circles plotted with x from 0 to 3 and y from 0 to 3.]

Draw the best-fitting line through the circles

[Figure: circles plotted with x from 0 to 6 and y from 0 to 4.]

Draw the best-fitting line through the circles

[Figure: circles plotted with x from 0 to 3 and y from 0 to 3.]

Mark with an “X” the average “y” value for each “x” value. Then draw the best-fitting line through the Xs

[Figure: circles plotted with x from 0 to 6 and y from 0 to 4.]

Mark with an “X” the average “y” value for each “x” value. Then draw the best-fitting line through the Xs
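The "average y for each x" exercise can be mimicked in code. This is a sketch on made-up data, assuming the same number of observations at every x value; under that assumption the least-squares line through the averages coincides with the least-squares line through all the points.

```python
import statistics

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: three y values observed at each of x = 1, 2, 3.
data = {1: [0.5, 1.0, 1.5], 2: [1.0, 2.0, 2.4], 3: [2.0, 2.5, 3.3]}

# All individual points (the "circles").
xs = [x for x, vals in data.items() for _ in vals]
ys = [y for vals in data.values() for y in vals]

# One average y per x value (the "Xs").
group_x = list(data)
group_mean_y = [statistics.mean(vals) for vals in data.values()]

line_all = least_squares(xs, ys)
line_means = least_squares(group_x, group_mean_y)
print("line through all points:", line_all)
print("line through the averages:", line_means)
```

With unequal group sizes the two lines can differ, because the line through the averages gives every x value the same weight.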


Fact 1

Regression (unlike correlation) is sensitive to your determination of which variable is explanatory and which is the response.

Sales = a + b(item)

Item = a + b(sales)
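A small numerical illustration of this asymmetry, using made-up item and sales figures (not from the slides). If the two regressions described the same line, the second slope would be the reciprocal of the first; in general it is not.

```python
import statistics

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical item counts and sales figures.
item = [1, 2, 3, 4, 5, 6]
sales = [12, 15, 21, 20, 28, 30]

a1, b1 = least_squares(item, sales)   # Sales = a + b(item)
a2, b2 = least_squares(sales, item)   # Item = a + b(sales)

# The same line would require b2 == 1 / b1.
print(f"Sales on item: slope {b1:.3f}")
print(f"Item on sales: slope {b2:.3f} (1 / first slope = {1 / b1:.3f})")

# The product of the two slopes equals r squared, which is 1 only for a perfect fit.
print(f"product of slopes = {b1 * b2:.3f}")
```

Correlation, by contrast, gives the same value whichever variable is listed first.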

Stata Exercise 14


Facts 2 and 3

If x changes by one standard deviation of x, y changes by r standard deviations of y.
– E.g., sx = 1, sy = 2, and r = 0.61. If x changes by 1, y will change by 2*0.61 = 1.22

The regression line goes through the point $(\bar{x}, \bar{y})$.
– The point-slope form of the line requires only the information on this slide to draw a line.
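Facts 2 and 3 together determine the line from five summary numbers. In the sketch below, sx, sy and r are the slide's example values, while the two means are hypothetical placeholders, since the slide does not give them.

```python
def regression_line_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Line through (mean_x, mean_y) with slope r * sd_y / sd_x."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# sd_x, sd_y and r come from the slide's example; the means are assumed for illustration.
intercept, slope = regression_line_from_summary(
    mean_x=10.0, mean_y=50.0, sd_x=1.0, sd_y=2.0, r=0.61)

print(f"slope = {slope:.2f}")          # change in predicted y per unit change in x
print(f"intercept = {intercept:.2f}")
print(f"prediction at mean of x = {intercept + slope * 10.0:.2f}")
```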


Fact 4

Correlation r is related to the slope of the regression line and therefore to the relation between x and y.

Actually, the square of r, that is, $R^2$, is the fraction of the variation in y that is explained by the variation in x.

$R^2 = \dfrac{\text{variation in } \hat{y} \text{ as } x \text{ pulls it along the line}}{\text{total variation in observed values of } y}$
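This ratio can be computed directly. The sketch below uses made-up data (not from the slides), measures "variation" with the sample variance, and compares the ratio with the square of r.

```python
import statistics

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.8, 7.4, 8.1, 11.2, 11.9, 15.3, 15.8]

mx, my = statistics.mean(x), statistics.mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

# Least-squares line and its predicted values (y-hat).
slope = sxy / sxx
intercept = my - slope * mx
y_hat = [intercept + slope * a for a in x]

# Correlation, and R^2 as variance of predictions over variance of observations.
r = sxy / (sxx * syy) ** 0.5
r_squared = statistics.variance(y_hat) / statistics.variance(y)

print(f"r = {r:.4f}, r squared = {r ** 2:.4f}")
print(f"var(y_hat) / var(y) = {r_squared:.4f}")
```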


Because most of the variation in gas consumption is explained by temperature, the $R^2$ of this regression is very high.


tbill98 | tbill98_hat | residuals
11.5 | 10.84649 |
12.6 | 12.19961 |
13.8 | 14.81564 |
6.4  | 5.975251 |
5.3  | 6.336083 |
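The residuals column is the observed value minus the predicted value. A minimal sketch using the five rows listed, with variable names taken from the column headers:

```python
# Observed and predicted values copied from the table.
tbill98 = [11.5, 12.6, 13.8, 6.4, 5.3]
tbill98_hat = [10.84649, 12.19961, 14.81564, 5.975251, 6.336083]

# residual = observed value - predicted value
residuals = [obs - pred for obs, pred in zip(tbill98, tbill98_hat)]

for obs, pred, res in zip(tbill98, tbill98_hat, residuals):
    print(f"{obs:6.1f} {pred:10.6f} {res:10.6f}")
```

A positive residual means the observation lies above the regression line; a negative one means it lies below.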

Excel Exercise 1


Stata Exercises 15 and 16


With influential observations

Without influential observation 21


Stata Exercise 17


Cautions about Correlation and Regression

Don’t extrapolate too far
Correlations are stronger for averages than for individuals
Beware of lurking (latent, hidden, excluded, neglected) variables
Association is not causation
– Establishing causation takes a lot of work (see p. 139).
