Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics


Page 1:

Linear correlation and linear regression + summary of tests

Dr. Omar Al Jadaan, Assistant Professor – Computer Science & Mathematics

Page 2:

Recall: Covariance

$$\operatorname{cov}(x,y) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}$$

Page 3:

Interpreting Covariance

cov(X,Y) > 0: X and Y are positively correlated

cov(X,Y) < 0: X and Y are inversely correlated

cov(X,Y) = 0: X and Y are uncorrelated (no linear relationship). Independent variables have zero covariance, but zero covariance alone does not imply independence.
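The three cases can be checked with a small sketch. The function name `covariance` and the toy data are illustrative assumptions, not part of the slides. The last data set is chosen to show that a covariance of zero can occur even when Y is completely determined by X.

```python
def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical toy data, symmetric around zero.
xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]   # Y increases with X
falling = [-3 * x for x in xs]     # Y decreases with X
squared = [x * x for x in xs]      # Y is a function of X, but not a linear one

print(covariance(xs, rising))    # positive
print(covariance(xs, falling))   # negative
print(covariance(xs, squared))   # zero, although Y depends on X
```

The third result is why a covariance of zero is read as "no linear relationship" rather than "no relationship".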

Page 4:

Correlation coefficient

Pearson’s Correlation Coefficient is standardized covariance (unitless):

$$r = \frac{\operatorname{cov}(x,y)}{\sqrt{\operatorname{var}(x)}\,\sqrt{\operatorname{var}(y)}}$$

Page 5:

Correlation

Measures the relative strength of the linear relationship between two variables

Unit-less

Ranges between –1 and 1

The closer to –1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker any linear relationship
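As a rough illustration of these properties, the sketch below computes Pearson's r as covariance divided by the product of the standard deviations. The function name `pearson_r` and the data (study hours and test scores) are made up for the example. Converting hours to minutes leaves r unchanged, which is what "unit-less" means in practice.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: sample covariance over the product of sample SDs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: hours studied and test score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 60, 68]

r = pearson_r(hours, score)
r_minutes = pearson_r([60 * h for h in hours], score)  # same data, X in minutes

print(r, r_minutes)                                     # the two values agree
print(pearson_r(hours, [2 * h + 1 for h in hours]))     # perfect positive line
print(pearson_r(hours, [-h for h in hours]))            # perfect negative line
```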

Page 6:

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with panel labels r = -1, r = -.6, r = 0, r = +.3, r = +1, and r = 0]

Page 7:

Linear Correlation

[Figure: scatter plots of Y against X, grouped as linear relationships and curvilinear relationships]

Page 8:

Linear Correlation

[Figure: scatter plots of Y against X, grouped as strong relationships and weak relationships]

Page 9:

Linear Correlation

[Figure: two scatter plots of Y against X showing no relationship]

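Page 10:

Some calculation formulas…

$$\hat{r} = \frac{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}\,\sqrt{\dfrac{\sum_{i=1}^{n}(y_i-\bar{y})^2}{n-1}}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} = \frac{SS_{xy}}{\sqrt{SS_x\,SS_y}}$$

$$\hat{r} = \frac{SS_{xy}}{\sqrt{SS_x\,SS_y}}$$

Note: Easier computation formulas:

$$SS_{xy} = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}$$

$$SS_x = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2$$

$$SS_y = \sum_{i=1}^{n} y_i^2 - n\,\bar{y}^2$$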
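A quick numerical check, under made-up data, that the easier computation formulas give the same sums of squares as the definitional ones, and that r follows from them. The function names and the data are illustrative assumptions.

```python
import math


def ss_definitional(xs, ys):
    """SS_xy, SS_x, SS_y from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy, ss_x, ss_y


def ss_shortcut(xs, ys):
    """Same quantities from raw sums: sum(x*y) - n*x_bar*y_bar, and so on."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    ss_x = sum(x * x for x in xs) - n * x_bar ** 2
    ss_y = sum(y * y for y in ys) - n * y_bar ** 2
    return ss_xy, ss_x, ss_y


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 60, 68]

definitional = ss_definitional(xs, ys)
shortcut = ss_shortcut(xs, ys)
ss_xy, ss_x, ss_y = shortcut
r_hat = ss_xy / math.sqrt(ss_x * ss_y)

print(definitional)
print(shortcut)
print(r_hat)
```

The n - 1 denominators cancel between numerator and denominator, which is why r can be computed from the sums of squares alone. With large values and small spread the shortcut form can lose precision to rounding, so the definitional form is the safer choice in code.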

Page 11:

Sampling distribution of correlation coefficient:

$$SE(\hat{r}) = \sqrt{\frac{1-r^2}{n-2}}$$

*Note: like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itself, so substitute in the estimated r.

Under the null hypothesis of no correlation, the test statistic $\hat{r}/SE(\hat{r})$ follows a T-distribution with n-2 degrees of freedom (since you have to estimate the standard error).
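A minimal sketch of how the standard error feeds a test of "no correlation". The function name `correlation_t_statistic` and the inputs (r = 0.5 from n = 27 pairs) are hypothetical values chosen for illustration. The p-value lookup is left out because it needs a T-distribution table or a statistics library.

```python
import math


def correlation_t_statistic(r, n):
    """Return (standard error, t statistic, degrees of freedom) for a sample r."""
    se = math.sqrt((1 - r ** 2) / (n - 2))  # estimated r substituted into the SE
    t = r / se                              # compared against a T with n - 2 df
    return se, t, n - 2


# Hypothetical result: r = 0.5 observed in n = 27 pairs.
se, t, df = correlation_t_statistic(0.5, 27)
print(se, t, df)
```

A larger sample shrinks the standard error, so the same r gives a larger t statistic.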

Page 12:

What is “Linear”?

Remember this: Y=mX+B?

[Figure: a straight line with intercept B and slope m]

Page 13:

What’s Slope?

A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.

Page 14:

Simple linear regression

The linear regression model:

Love of Math = 5 + .01*math SAT score

intercept: 5

slope: .01 (P=.22; not significant)

Page 15:

Prediction

If you know something about X, this knowledge helps you predict something about Y. (Sound familiar?…sound like conditional probabilities?)

Page 16:

EXAMPLE

The distribution of baby weights at Stanford ~ N(3400, 360000), i.e. mean 3400 grams and variance 360000 (standard deviation 600 grams)

Your “Best guess” at a random baby’s weight, given no information about the baby, is what?

3400 grams

But, what if you have relevant information? Can you make a better guess?

Page 17:

Predictor variable X=gestation time

Assume that babies that gestate for longer are born heavier, all other things being equal.

Pretend (at least for the purposes of this example) that this relationship is linear.

Example: suppose a one-week increase in gestation, on average, leads to a 100-gram increase in birth-weight

Page 18:

Y depends on X

[Figure: scatter plot of Y = birth-weight (g) against X = gestation time (weeks), with the best-fit line]

The best-fit line is chosen so that the sum of the squared (why squared?) vertical distances of the points (the Yi’s) from the line is minimized.

Or mathematically… (remember max and mins from calculus)…

$$\frac{\partial}{\partial m}\sum_{i=1}^{n}\bigl(Y_i-(mx_i+b)\bigr)^2 = 0, \qquad \frac{\partial}{\partial b}\sum_{i=1}^{n}\bigl(Y_i-(mx_i+b)\bigr)^2 = 0$$
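Setting the derivatives of the sum of squared distances to zero and solving gives the standard closed-form answer: slope m = SS_xy / SS_x and intercept b = (mean of Y) - m * (mean of X). The sketch below applies it; the function names and the noisy gestation data are invented for illustration. It then checks that nudging either m or b away from the fitted values makes the sum of squares larger.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    m = ss_xy / ss_x
    b = y_bar - m * x_bar
    return m, b


def sum_squared_errors(xs, ys, m, b):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: gestation time (weeks) and birth-weight (g).
weeks = [20, 25, 30, 35, 40]
grams = [2100, 2400, 3050, 3450, 4000]

m, b = fit_line(weeks, grams)
best = sum_squared_errors(weeks, grams, m, b)

print(m, b, best)
print(sum_squared_errors(weeks, grams, m + 1, b) > best)   # worse slope
print(sum_squared_errors(weeks, grams, m, b + 10) > best)  # worse intercept
```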

Page 19:

Prediction

A new baby is born that had gestated for just 30 weeks. What’s your best guess at the birth-weight?

Are you still best off guessing 3400? NO!

Page 20:

At 30 weeks…

[Figure: Y = birth-weight (g) against X = gestation time (weeks); at X = 30 weeks the line reads Y = 3000 g]

Page 21:

At 30 weeks…

[Figure: Y = birth weight (g) against X = gestation time (weeks), with the point (x,y) = (30,3000) marked on the line]

Page 22:

At 30 weeks…

The babies that gestate for 30 weeks appear to center around a weight of 3000 grams.

In Math-Speak… E(Y/X=30 weeks)=3000 grams

Note the conditional expectation

Page 23:

But… note that not every Y-value (Yi) sits on the line. There’s variability.

Y_i = 3000 + random error_i

In fact, babies that gestate for 30 weeks have birth-weights that center at 3000 grams, but vary around 3000 with some variance $\sigma^2$.

Approximately what distribution do birth-weights follow? Normal. Y/X=30 weeks ~ N(3000, $\sigma^2$)

Page 24:

And, if X=20, 30, or 40…

[Figure: Y = birth-weight (g) against X = gestation time (weeks), with X = 20, 30, and 40 marked on the axis]

Page 25:

If X=20, 30, or 40…

[Figure: Y = baby weights (g) against X = gestation times (weeks), with a normal curve drawn at each of X = 20, 30, and 40]

Y/X=40 weeks ~ N(4000, $\sigma^2$)

Y/X=30 weeks ~ N(3000, $\sigma^2$)

Y/X=20 weeks ~ N(2000, $\sigma^2$)

Page 26:

Mean values fall on the line

E(Y/X=40 weeks)=4000

E(Y/X=30 weeks)=3000

E(Y/X=20 weeks)=2000

E(Y/X) = $\mu_{Y/X}$ = 100 grams/week * X weeks

Page 27:

Linear Regression Model

Y’s are modeled…

Y_i = 100*X + random error_i

100*X: fixed, exactly on the line

random error_i: follows a normal distribution
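The model can be simulated to see that the mean at each gestation time lands on the line while individual babies scatter around it. The slope of 100 grams per week comes from the example; the error standard deviation of 300 grams, the sample size, and all names below are assumptions made for this sketch, since the slides leave the variance unspecified.

```python
import random

SLOPE = 100.0   # grams per week, from the example
SIGMA = 300.0   # assumed SD of the random error, in grams (not given in the slides)


def simulate_weights(weeks, n, rng):
    """Draw n birth-weights: a fixed part on the line plus normal random error."""
    return [SLOPE * weeks + rng.gauss(0.0, SIGMA) for _ in range(n)]


rng = random.Random(0)  # seeded so the run is repeatable
means = {}
for weeks in (20, 30, 40):
    sample = simulate_weights(weeks, 5000, rng)
    means[weeks] = sum(sample) / len(sample)

for weeks, mean in means.items():
    print(weeks, round(mean))  # each mean sits close to 100 * weeks
```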