Chapter 3 Examining Relationships
![Page 1: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/1.jpg)
Chapter 3 Examining Relationships
“Get the facts first, and then you can distort them as much as you please.”
Mark Twain
![Page 2: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/2.jpg)
3.1 Scatterplots
Many statistical studies involve MORE THAN ONE variable.
A SCATTERPLOT is a graphical display that lets one observe a possible relationship between two quantitative variables.
![Page 3: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/3.jpg)
Response Variable vs. Explanatory Variable
Response Variable
– Measures an outcome of a study
Explanatory variable
– Attempts to explain the observed outcomes
![Page 4: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/4.jpg)
Response Variable vs. Explanatory Variable
When we think changes in a variable x explain, or even cause, changes in a second variable, y, we call x an explanatory variable and y a response variable.
[Figure: scatterplot axes, with y labeled as the response variable and x labeled as the explanatory variable]
![Page 5: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/5.jpg)
IMPORTANT!
Even if it appears that y can be “predicted” from x, it does not follow that x causes y.
ASSOCIATION DOES NOT IMPLY CAUSATION.
![Page 6: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/6.jpg)
When examining a scatterplot, look for an overall PATTERN.
Consider:
– Direction
– Form
– Strength
– Positive association
– Negative association
– Outliers
![Page 7: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/7.jpg)
Positive vs. Negative Association
Positive Association
(between two variables)
– Above-average values of one tend to accompany above-average values of the other
– Below-average values of one tend to accompany below-average values of the other
Negative Association
(between two variables)
– Above-average values of one tend to accompany below-average values of the other, and vice versa
![Page 8: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/8.jpg)
3.2 Correlation
Describes the direction and strength of a straight-line relationship between two quantitative variables.
Usually written as r.
$$r = \frac{1}{n-1} \sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
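As a minimal sketch, the definition of r (the sum of products of standardized x and y values, divided by n − 1) can be computed directly in Python. The function name `correlation` and the three data points are illustrative assumptions, not part of the original slides; the standard deviations use the usual sample divisor n − 1.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r, computed straight from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum the products of standardized values, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Illustrative data (an assumption for this sketch).
xs = [2, 4, 6]
ys = [6, 12, 15]
r = correlation(xs, ys)
print(round(r, 3))
```

Because each value is standardized before the products are taken, r carries no units.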
![Page 9: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/9.jpg)
Facts About Correlation
Positive r indicates positive association between the variables and negative r indicates negative association.
The correlation r always falls between –1 and 1 inclusive.
The correlation between x and y does NOT change when we change the units of measurement of x, y, or both.
Correlation ignores the distinction between explanatory and response variables.
Correlation measures the strength of ONLY straight-line association between two variables.
The correlation is STRONGLY affected by a few outlying observations.
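Several of these facts can be checked numerically. The sketch below is illustrative only: the helper `correlation` and the data are made up for the demonstration. It rescales both variables (a change of units), swaps the roles of x and y, and then appends a single outlying observation.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r (hypothetical helper for this sketch)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up data with a positive association.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)

# Changing units (inches to centimeters, Celsius to Fahrenheit) leaves r unchanged.
xs_cm = [2.54 * x for x in xs]
ys_f = [1.8 * y + 32 for y in ys]
r_units = correlation(xs_cm, ys_f)

# Swapping explanatory and response variables leaves r unchanged.
r_swapped = correlation(ys, xs)

# One outlying observation is enough to change r drastically.
r_outlier = correlation(xs + [6], ys + [-10])

print(r, r_units, r_swapped, r_outlier)
```

In this made-up data set, the single added point is enough to flip the sign of r.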
![Page 10: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/10.jpg)
3.3 Least-Squares Regression
If a scatterplot shows a linear relationship between two quantitative variables, least-squares regression is a method for finding a line that summarizes the relationship between the two variables, at least within the domain of the explanatory variable x.
The least-squares regression line (LSRL) is a mathematical model for the data.
![Page 11: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/11.jpg)
Regression Line
Straight line
Describes how a response variable y changes as an explanatory variable x changes.
Sometimes it is used to PREDICT the value of y for a given value of x.
Makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
![Page 12: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/12.jpg)
Residual
A difference between an OBSERVED y and a PREDICTED y:
$$\text{residual} = \text{observed } y - \text{predicted } y = y - \hat{y}$$
![Page 13: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/13.jpg)
Some Important Facts About the LSRL
It is a mathematical model for the data.
It is the line that makes the sum of the squares of the residuals AS SMALL AS POSSIBLE.
The point $(\bar{x}, \bar{y})$ is on the line, where $\bar{x}$ is the mean of the x values, and $\bar{y}$ is the mean of the y values.
The form is $\hat{y} = a + bx$ (N.B. b is the slope and a is the y-intercept).
$b = r \dfrac{s_y}{s_x}$ (On the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.)
![Page 14: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/14.jpg)
Some Important Facts About the LSRL
The slope b is the approximate change in y when x increases by 1.
The y-intercept a is the predicted value of y when $x = 0$.
$$a = \bar{y} - b\bar{x}$$
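The slope and intercept formulas (slope b equal to r times s_y over s_x, intercept a equal to the mean of y minus b times the mean of x) can be sketched in a few lines of Python. The function name `lsrl` is a hypothetical helper; the three data points are the ones used in the Example slide later in this chapter.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # Correlation r from its definition.
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept, so the line passes through (x_bar, y_bar)
    return a, b

xs = [2, 4, 6]
ys = [6, 12, 15]
a, b = lsrl(xs, ys)
print(a, b)
```

Choosing a this way is what guarantees that the point formed by the two means lies on the fitted line.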
![Page 15: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/15.jpg)
Coefficient of Determination
Symbolism: $r^2$
It is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
Measure of HOW SUCCESSFUL the regression is in explaining the response.
![Page 16: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/16.jpg)
Calculation of $r^2$
$$r^2 = \frac{\text{SSM} - \text{SSE}}{\text{SSM}}$$
where
SSM = $\sum (y - \bar{y})^2$ = sum of squares about the mean
SSE = $\sum (y - \hat{y})^2$ = sum of squares of residuals
![Page 17: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/17.jpg)
Example
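| L1 | L2 | $(y - \bar{y})^2$ | $(y - \hat{y})^2$ |
|----|----|-------------------|-------------------|
| 2  | 6  |                   |                   |
| 4  | 12 |                   |                   |
| 6  | 15 |                   |                   |

$\bar{x} = ?$, $\bar{y} = ?$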
![Page 18: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/18.jpg)
Example Solution
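| L1  | L2 | $(y - \bar{y})^2$ | $(y - \hat{y})^2$ |
|-----|----|-------------------|-------------------|
| 2   | 6  | 25                | .25               |
| 4   | 12 | 1                 | 1                 |
| 6   | 15 | 16                | .25               |
| Sum |    | 42                | 1.50              |

$\bar{x} = 4$, $\bar{y} = 11$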
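The numbers in this solution can be reproduced with a short Python sketch. The variable names are illustrative. The slope is computed as the sum of products of deviations divided by the sum of squared x deviations, which is algebraically the same as r times s_y over s_x.

```python
xs = [2, 4, 6]    # list L1
ys = [6, 12, 15]  # list L2

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

predicted = [a + b * x for x in xs]

# SSM: sum of squares about the mean; SSE: sum of squares of residuals.
ssm = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))
r_squared = (ssm - sse) / ssm

print(ssm, sse, round(r_squared, 4))  # prints: 42.0 1.5 0.9643
```

So about 96% of the variation in y is explained by the regression of y on x for these three points.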
![Page 19: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/19.jpg)
Things to Note:
Sum of deviations from mean = 0.
Sum of residuals = 0.
$r^2 > 0$ does not mean $r > 0$. If x and y are negatively associated, then $r < 0$.
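These notes can be checked on a small, negatively associated data set. The data below are made up for illustration, and the sums are expected to be zero only up to floating-point rounding.

```python
from math import sqrt

# Made-up data: y tends to fall as x rises.
xs = [1, 2, 3, 4]
ys = [9, 7, 4, 2]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / sqrt(s_xx * s_yy)   # correlation
b = s_xy / s_xx                # LSRL slope
a = y_bar - b * x_bar          # LSRL intercept

deviations = [y - y_bar for y in ys]                   # deviations from the mean
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # observed minus predicted

print(sum(deviations), sum(residuals))  # both essentially zero
print(r, r ** 2)                        # r is negative, r squared is positive
```

Because squaring discards the sign, the direction of the association has to be read from r or from the slope, not from the coefficient of determination.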
![Page 20: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/20.jpg)
Outlier
A point that lies outside the overall pattern of the other points in a scatterplot.
It can be an outlier in the x direction, in the y direction, or in both directions.
![Page 21: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/21.jpg)
Influential Point
A point that, if removed, would considerably change the position of the regression line.
Points that are outliers in the x direction are often influential.
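A rough way to see influence is to fit the line with and without a suspect point and compare the slopes. The sketch below uses made-up data in which the last point is an outlier in the x direction; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Made-up data; the point (20, 3) lies far from the others in the x direction.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 3, 5, 4, 6, 3]

a_all, b_all = fit_line(xs, ys)              # fit with every point
a_trim, b_trim = fit_line(xs[:-1], ys[:-1])  # fit with the x-outlier removed

print(b_all, b_trim)
```

In this example, removing the one point changes the slope from roughly flat to clearly positive, which is what makes the point influential.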
![Page 22: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/22.jpg)
Words of Caution
Do NOT CONFUSE the slope b of the LSRL with the correlation r.
– The relation between the two is given by the formula $b = r \dfrac{s_y}{s_x}$
– If you are working with normalized data, then b does equal r since $s_y = s_x = 1$
When you normalize a data set, the normalized data has a mean = 0 and standard deviation = 1.
![Page 23: Chapter 3 Examining Relationships](https://reader035.fdocuments.us/reader035/viewer/2022062409/56814c50550346895db96050/html5/thumbnails/23.jpg)
More Words of Caution
If you are working with normalized data, the regression line has the simple form $y_n = r x_n$, where $x_n$ and $y_n$ are normalized x and y values respectively.
Since the regression line contains the mean of x and the mean of y, and since normalized data has a mean of 0, the regression line for normalized x and y values contains (0, 0).
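A short sketch can confirm this for a small data set: after standardizing x and y (subtracting the mean and dividing by the sample standard deviation), the fitted slope should match r and the intercept should be 0, up to rounding. The helper names `standardize` and `fit_line` are hypothetical, and the data are the three points from the earlier Example slide.

```python
from math import sqrt
from statistics import mean, stdev

def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

xs = [2, 4, 6]
ys = [6, 12, 15]

# Correlation of the original data.
x_bar, y_bar = mean(xs), mean(ys)
r = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sqrt(sum((x - x_bar) ** 2 for x in xs)
            * sum((y - y_bar) ** 2 for y in ys)))

# Regression line for the normalized data.
x_n = standardize(xs)
y_n = standardize(ys)
a_n, b_n = fit_line(x_n, y_n)

print(r, b_n, a_n)  # slope matches r; intercept is essentially 0
```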