Cor. & Regression

7/29/2019 Cor. & Regression




The correlation coefficient, r:
◦ Measures the relative strength of the linear relationship between two variables
◦ Is unit-less
◦ Ranges between –1 and 1
◦ The closer to –1, the stronger the negative linear relationship
◦ The closer to 1, the stronger the positive linear relationship
◦ The closer to 0, the weaker any linear relationship
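As a minimal sketch, r can be computed with the standard Pearson formula: the sum of cross-products of deviations from the means, divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the sample values are hypothetical, chosen only for illustration. The last line shows why r is unit-less: rescaling X (for example, weeks to days) leaves r unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))   # perfect positive line: 1.0
print(pearson_r(x, [10 - 3 * v for v in x]))  # perfect negative line: -1.0
# Unit-less: multiplying X by 7 (weeks -> days) does not change r
print(pearson_r([7 * v for v in x], [2 * v + 1 for v in x]))  # 1.0
```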



[Figure: scatter plots of Y against X for r = –1, r = –.6, r = 0, r = +.3, and r = +1, plus a second pattern with r = 0]



[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]



[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]



[Figure: scatter plots of Y against X showing no relationship]



In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (predictor) variable, X, and the other the dependent (outcome) variable, Y.



Y = mX + B, where B is the intercept and m is the slope



A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.



The linear regression model:

Love of Math = 5 + .01 × math SAT score

Here 5 is the intercept and .01 is the slope. P = .22, so the slope is not significant.
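A minimal sketch of using the fitted equation for prediction. The function name is hypothetical; the intercept 5 and slope .01 are the values from the model above. Because P = .22, the slope is not significant, so predictions like these should not be over-interpreted.

```python
def predict_love_of_math(math_sat: float) -> float:
    """Predicted 'Love of Math' score from the fitted line 5 + .01 * SAT."""
    intercept = 5.0
    slope = 0.01  # change in Love of Math per 1-point change in math SAT
    return intercept + slope * math_sat

print(predict_love_of_math(600))  # 5 + .01 * 600
# A 100-point higher SAT score predicts about 1 more unit of Love of Math
print(predict_love_of_math(700) - predict_love_of_math(600))
```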



If you know something about X, this knowledge helps you predict something about Y.



The average birth weight of babies in Mumbai is 3400 grams.

Your “best guess” at a random baby’s weight, given no information about the baby, is what?

3400 grams

But what if you have relevant information? Can you make a better guess?
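Why is the mean the best guess? If “best” means the smallest sum of squared errors, the mean is the constant that minimizes it. A small sketch with hypothetical birth weights (made-up numbers, not data from this example) compares the mean against two other guesses:

```python
def sum_squared_error(guess, values):
    """Sum of squared differences between a constant guess and each value."""
    return sum((v - guess) ** 2 for v in values)

# Hypothetical birth weights in grams (illustrative only)
weights = [3100, 3300, 3400, 3500, 3700]
mean_weight = sum(weights) / len(weights)

# The mean gives a smaller sum of squared errors than either alternative
for guess in (3300, mean_weight, 3500):
    print(guess, sum_squared_error(guess, weights))
```

Any guess other than the mean adds n × (guess − mean)² to the sum of squared errors, which is why moving away from the mean in either direction makes things worse.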



X = gestation time

Assume that babies that gestate for longer are born heavier, all other things being equal. Pretend (at least for the purposes of this example) that this relationship is linear.

Example: suppose a one-week increase in gestation, on average, leads to a 100-gram increase in birth weight.



[Figure: Y = birth weight (g) plotted against X = gestation time (weeks)]

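The best-fit line is chosen such that the sum of the squared (why squared?) distances of the points ($Y_i$’s) from the line is minimized:

$$\text{minimize } \sum_i \left(Y_i - (mX_i + b)\right)^2$$

Or mathematically (maxima and minima from calculus), set the derivatives with respect to the slope m and the intercept b equal to zero:

$$\frac{\partial}{\partial m} \sum_i \left(Y_i - (mX_i + b)\right)^2 = 0, \qquad \frac{\partial}{\partial b} \sum_i \left(Y_i - (mX_i + b)\right)^2 = 0$$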
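Solving the two derivative equations gives the standard closed-form least-squares estimates: $m = \sum_i (X_i - \bar X)(Y_i - \bar Y) / \sum_i (X_i - \bar X)^2$ and $b = \bar Y - m \bar X$. A minimal sketch (the function name and the three data points are hypothetical, chosen to lie exactly on Y = 100X):

```python
def least_squares_line(xs, ys):
    """Return (slope m, intercept b) minimizing sum((y - (m*x + b))**2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx            # solution of both derivative equations for the slope
    b = mean_y - m * mean_x  # d/db = 0 forces the line through (mean_x, mean_y)
    return m, b

# Hypothetical (gestation weeks, birth weight g) pairs lying exactly on Y = 100X
m, b = least_squares_line([20, 30, 40], [2000, 3000, 4000])
print(m, b)  # 100.0 0.0
```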



A new baby is born that had gestated for just 30 weeks. What’s your best guess at the birth weight? Are you still best off guessing 3400? NO!



[Figure: Y = birth weight (g) against X = gestation time (weeks); the line passes through (x, y) = (30, 3000)]



The babies that gestate for 30 weeks appear to center around a weight of 3000 grams.

In math-speak: $E(Y \mid X = 30\text{ weeks}) = 3000$ grams



Note that not every Y-value ($Y_i$) sits on the line. There’s variability.

$Y_i = 3000 + \text{random error}_i$

In fact, babies that gestate for 30 weeks have birth weights that center at 3000 grams but vary around 3000 with some variance $\sigma^2$.

◦ Approximately what distribution do birth weights follow? Normal: $Y \mid X = 30\text{ weeks} \sim N(3000, \sigma^2)$
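A minimal simulation sketch of this idea: each birth weight at 30 weeks is 3000 g plus a normally distributed random error. The slides do not give σ, so the value 400 g below is a hypothetical assumption, as is the sample size.

```python
import random
import statistics

random.seed(0)  # reproducible illustration

MEAN_AT_30_WEEKS = 3000.0  # E(Y | X = 30 weeks), from the example
SIGMA = 400.0              # hypothetical standard deviation in grams (assumption)

# Each simulated weight = 3000 + a normally distributed random error
weights = [MEAN_AT_30_WEEKS + random.gauss(0.0, SIGMA) for _ in range(10_000)]

print(round(statistics.mean(weights)))   # close to 3000
print(round(statistics.stdev(weights)))  # close to SIGMA
```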



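[Figure: Y = birth weight (g) against X = gestation time (weeks), illustrating the normal distribution of Y at X = 20, 30, and 40 weeks]

$Y \mid X = 40\text{ weeks} \sim N(4000, \sigma^2)$

$Y \mid X = 30\text{ weeks} \sim N(3000, \sigma^2)$

$Y \mid X = 20\text{ weeks} \sim N(2000, \sigma^2)$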



$E(Y \mid X = 40\text{ weeks}) = 4000$

$E(Y \mid X = 30\text{ weeks}) = 3000$

$E(Y \mid X = 20\text{ weeks}) = 2000$

$E(Y \mid X) = \mu_{Y \mid X} = 100\text{ grams/week} \times X\text{ weeks}$



The Y’s are modeled as

$Y_i = 100 X_i + \text{random error}_i$

The term $100X_i$ is fixed: it lies exactly on the line. The random error follows a normal distribution.
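To see the model in action, this sketch simulates data from $Y_i = 100X_i + \text{random error}_i$ and then recovers the line by least squares. The error standard deviation (400 g), the set of gestation times, and the sample size are all hypothetical assumptions, not values from the slides.

```python
import random

random.seed(1)  # reproducible illustration
SIGMA = 400.0   # hypothetical error standard deviation in grams (assumption)

# Simulate Y_i = 100 * X_i + e_i, with e_i ~ N(0, SIGMA^2)
xs = [random.choice([20, 25, 30, 35, 40]) for _ in range(5_000)]  # gestation, weeks
ys = [100.0 * x + random.gauss(0.0, SIGMA) for x in xs]           # birth weight, grams

# Recover slope and intercept with the closed-form least-squares estimates
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
print(round(slope, 1), round(intercept))  # slope near 100, intercept near 0
```

The fitted values differ slightly from 100 and 0 because of the random errors; with more simulated babies they get closer.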



Linear regression assumes that…
◦ 1. The relationship between X and Y is linear.
◦ 2. Y is distributed normally at each value of X.
◦ 3. The variance of Y at every value of X is the same (homogeneity of variances).

Why? The math requires it. The mathematical process is called “least squares” because it fits the regression line by minimizing the squared errors from the line. This is mathematically easy but not general: the resulting tests and P-values rely on the above assumptions.
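A rough, informal way to look at assumption 3 (homogeneity of variances) is to compare the spread of the residuals (Y minus the fitted line) at each value of X. The helper name and the data below are hypothetical, and the line Y = 100X is taken from the birth-weight example; formal tests for equal variances exist but are not shown here.

```python
from collections import defaultdict
import statistics

def residual_variance_by_x(xs, ys, slope, intercept):
    """Sample variance of residuals (Y - fitted Y) at each distinct X value."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y - (slope * x + intercept))
    # statistics.variance needs at least two residuals per X value
    return {x: statistics.variance(r) for x, r in groups.items()}

# Hypothetical data: three babies at each gestation time, line Y = 100X
xs = [20, 20, 20, 30, 30, 30, 40, 40, 40]
ys = [1900, 2000, 2100, 2900, 3000, 3100, 3800, 4000, 4200]
print(residual_variance_by_x(xs, ys, slope=100, intercept=0))
```

In this made-up data the residual variance at 40 weeks is four times that at 20 or 30 weeks, the kind of pattern that would cast doubt on assumption 3.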



More than one predictor…

$E(Y) = \alpha + \beta_1 X + \beta_2 W + \beta_3 Z$

Each regression coefficient is the amount of change in the outcome variable that would be expected per one-unit change of the predictor, if all other variables in the model were held constant.
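A minimal pure-Python sketch of fitting a model with several predictors by least squares, solving the normal equations $(X^\top X)\beta = X^\top Y$ with Gaussian elimination. The function names and data are hypothetical: the outcomes are built exactly from Y = 2 + 3X − 1W + 0.5Z, so the fit should recover those coefficients. In practice a library routine such as NumPy's `numpy.linalg.lstsq` would usually be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [alpha, beta_1, ..., beta_k] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept alpha
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical predictors (X, W, Z); outcomes follow Y = 2 + 3X - 1W + 0.5Z exactly
rows = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0)]
ys = [2 + 3 * x - w + 0.5 * z for x, w, z in rows]
coefficients = fit_multiple_regression(rows, ys)
print([round(c, 6) for c in coefficients])  # alpha, beta_1, beta_2, beta_3
```

Here β₁ ≈ 3 means that raising X by one unit while holding W and Z constant raises the expected Y by about 3 units, matching the interpretation above.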



[Figure: research model diagram with the constructs Product Quality, Purchase Satisfaction, and Revisit Intention plus Control Variables; measured with a 5-item scale, a 5-item scale, and a 10-item scale]



• Cluster Sampling

• Sample Size: 450
