Chapters 8 and 9: Correlations Between Data Sets Math 1680.
![Page 1: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/1.jpg)
Chapters 8 and 9: Correlations Between Data Sets
Math 1680
![Page 2: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/2.jpg)
Overview
Scatter Plots
Associations
The Correlation Coefficient
Sketching Scatter Plots
Changes of Scale
Summary
![Page 3: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/3.jpg)
Scatter Plots
Often, we are interested in comparing two related data sets, such as heights and weights of students, SAT scores and freshman GPA, or age and fuel efficiency of vehicles.
We can draw a scatter plot of the data set by plotting the paired data points on a Cartesian plane.
![Page 4: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/4.jpg)
Scatter Plots
Scatter plot for the heights of 1,078 fathers and their adult sons (from the HANES study)
![Page 5: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/5.jpg)
Scatter Plots
What does the dashed diagonal line represent?
Find the point representing a 5'3¼" father who has a 5'6½" son
![Page 6: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/6.jpg)
Scatter Plots
What does the vertical dashed column represent?
Consider the families where the father was 72" tall, to the nearest inch. How tall was the tallest son? The shortest?
![Page 7: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/7.jpg)
Scatter Plots
Was the average height of the fathers around 64", 68", or 72"?
Was the SD of the fathers’ heights around 3", 6", or 9"?
![Page 8: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/8.jpg)
Scatter Plots
The points form a swarm that is more or less football-shaped.
This indicates that there is a linear association between the fathers’ heights and the sons’ heights.
![Page 9: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/9.jpg)
Scatter Plots
Short fathers tend to have short sons, and tall fathers tend to have tall sons.
We say there is a positive association between the heights of fathers and sons.
What would it mean for there to be a negative association between the heights?
![Page 10: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/10.jpg)
Scatter Plots
Does knowing the father’s height give a precise prediction of his son’s height?
Does knowing the father’s height let you better predict his son’s height?
![Page 11: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/11.jpg)
Scatter Plots
We will generally assume the scatter plots are football-shaped: the association is linear in nature, and each data set is approximately normal.
![Page 12: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/12.jpg)
Scatter Plots
Key features of scatter plots: given two data sets X and Y, …
The point of averages is the point $(\mu_X, \mu_Y)$.
The average of a data set is denoted by μ (Greek mu, for mean). The subscript indicates which set is being referenced.
The point of averages will be in the center of the cloud.
Due to the normal approximation, the vast majority (about 95%) of the cloud should fall within 2 SDs of the average, for both X and Y.
![Page 13: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/13.jpg)
Scatter Plots
![Page 14: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/14.jpg)
Associations
When given a value in one data set, we often want to make a prediction for the other data set.
We call our given value the independent variable.
We call the value we are trying to predict the dependent variable.
![Page 15: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/15.jpg)
Associations
If there is indeed a relationship between the two data sets, we can say various things about their association:
Strong: Knowing X helps you a lot in predicting Y, and vice versa
Weak: Knowing X doesn’t really help you predict Y, and vice versa
Positive: X and Y tend to rise together. The higher in one you look, the higher in the other you should be
Negative: X and Y tend to move in opposite directions. The higher in one you look, the lower in the other you should be
![Page 16: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/16.jpg)
Associations
Positive associations: study time/final grade, height/weight, SAT score/GPA, clouds in sky/chance of rain, bowling practice/bowling score, age of husband/age of wife
Negative associations: age of car/fuel efficiency, golfing practice/golf score, dental hygiene/cavities formed, pollution/air quality, speed/mile time
![Page 17: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/17.jpg)
Associations What kind of association is this?
![Page 18: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/18.jpg)
Associations What kind of association is this?
![Page 19: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/19.jpg)
Associations
Remember that even a very strong association does not necessarily imply a causal relationship.
There may be a confounding influence at play.
![Page 20: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/20.jpg)
The Correlation Coefficient
While strong/weak and positive/negative give a sense of the association, we want a way to quantify the strength and direction of the association.
The correlation coefficient (r) is the statistic which accomplishes this.
![Page 21: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/21.jpg)
The Correlation Coefficient
The correlation coefficient is always between –1 and 1.
A positive r means that there is a positive association between the sets.
A negative r means that there is a negative association between the sets.
If r is close to 0, then there is only a weak association between the sets.
If r is close to 1 or –1, then there is a strong association between the sets.
![Page 22: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/22.jpg)
The Correlation Coefficient
The following plots have $\mu_X = \mu_Y = 3$ and $\mathrm{SD}_X = \mathrm{SD}_Y = 1$, with 50 points in them.
The only difference between them is the correlation coefficient.
Note how the points fall into a line as r approaches 1 or –1.
![Page 23: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/23.jpg)
![Page 24: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/24.jpg)
The Correlation Coefficient
To calculate r:
Find the average and SD of each data set.
Multiply the data sets pairwise and find the average of the products.
The correlation is the average of the product minus the product of the averages, all divided by the product of the SDs:
$r = \dfrac{\mu_{XY} - \mu_X \mu_Y}{\mathrm{SD}_X \, \mathrm{SD}_Y}$
![Page 25: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/25.jpg)
The Correlation Coefficient
X Y XY
1 5 5
3 9 27
4 7 28
5 1 5
7 13 91
$\mu_X = 4$, $\mathrm{SD}_X = 2$
$\mu_Y = 7$, $\mathrm{SD}_Y = 4$
$\mu_{XY} = 31.2$
$r = \dfrac{31.2 - (4)(7)}{(2)(4)} = 0.4$
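The worked example on this slide can be checked with a short Python sketch. It assumes the SD is the population SD (dividing by n, not n − 1), which is what reproduces the slide's values of 2 and 4. The function name `correlation` is illustrative and does not come from the slides.

```python
from statistics import fmean, pstdev


def correlation(xs, ys):
    """Average of the products minus the product of the averages,
    divided by the product of the (population) SDs."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    avg_xy = fmean([x * y for x, y in zip(xs, ys)])
    return (avg_xy - fmean(xs) * fmean(ys)) / (pstdev(xs) * pstdev(ys))


# Data from the worked example on the slide.
xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

avg_xy = fmean([x * y for x, y in zip(xs, ys)])  # average of the products
r = correlation(xs, ys)

print(fmean(xs), pstdev(xs))  # average and SD of X
print(fmean(ys), pstdev(ys))  # average and SD of Y
print(round(avg_xy, 4))       # 31.2
print(round(r, 4))            # 0.4
```

Rounding is used only because floating-point arithmetic leaves a tiny error in the last digits.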
![Page 26: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/26.jpg)
The Correlation Coefficient
Compute r for the following data sets.
X Y
1 2
2 1
3 4
4 3
5 7
6 5
7 6
X Y
1 3
3 7
4 9
5 11
7 15
r ≈ 0.8214 for the first data set
r = 1 for the second data set
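One way to check these two answers is through standard scores: r also equals the average of the products of the standard scores, which is algebraically the same as the formula on the earlier slide. The sketch below takes that route. The function names are illustrative, and the SD is again assumed to be the population SD.

```python
from statistics import fmean, pstdev


def standard_scores(values):
    """Convert each value to standard units: (value - average) / SD."""
    avg, sd = fmean(values), pstdev(values)
    return [(v - avg) / sd for v in values]


def correlation_from_z(xs, ys):
    """Average of the products of the paired standard scores."""
    zx, zy = standard_scores(xs), standard_scores(ys)
    return fmean([a * b for a, b in zip(zx, zy)])


# First data set from the slide.
r_first = correlation_from_z([1, 2, 3, 4, 5, 6, 7], [2, 1, 4, 3, 7, 5, 6])
# Second data set from the slide; every point satisfies Y = 2X + 1.
r_second = correlation_from_z([1, 3, 4, 5, 7], [3, 7, 9, 11, 15])

print(round(r_first, 4))   # 0.8214
print(round(r_second, 4))  # 1.0
```

The second data set gives r = 1 because its points lie exactly on a line with positive slope.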
![Page 27: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/27.jpg)
The Correlation Coefficient Estimate the correlation
![Page 28: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/28.jpg)
The Correlation Coefficient Estimate the correlation
![Page 29: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/29.jpg)
Sketching Scatter Plots
The SD line is the line consisting of all the points where the standard score in X equals the standard score in Y: $z_X = z_Y$ (when the association is negative, $z_Y = -z_X$).
To sketch the SD line, draw a line running along the long axis of the football shape.
Note that the SD line always goes through the point of averages.
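As a rough sketch, the SD line can be written in slope-intercept form: it passes through the point of averages and rises one SD of Y for every SD of X. The function `sd_line` is a hypothetical helper, not something from the slides. Taking the slope as negative when r is negative follows the usual textbook convention and is an assumption here.

```python
def sd_line(avg_x, sd_x, avg_y, sd_y, r=1.0):
    """Return (slope, intercept) of the SD line.

    The line passes through the point of averages. Its slope is
    SD_Y / SD_X, taken as negative when r is negative (assumed convention).
    """
    slope = sd_y / sd_x if r >= 0 else -sd_y / sd_x
    intercept = avg_y - slope * avg_x
    return slope, intercept


# Summary statistics from the earlier worked example:
# average of X = 4, SD of X = 2, average of Y = 7, SD of Y = 4.
slope, intercept = sd_line(4, 2, 7, 4)
print(slope, intercept)  # 2.0 -1.0

# The point 1 SD above average in X (x = 6) should land 1 SD above
# average in Y (y = 11).
print(slope * 6 + intercept)  # 11.0
```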
![Page 30: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/30.jpg)
Sketching Scatter Plots
Given the five-statistic summary (averages, SDs, and correlation) for a pair of data sets, we can sketch the scatter plot:
Plot the point of averages in the center.
Mark two SDs in both directions, on both axes.
Plot the point 1 SD above average for both data sets, and draw a line connecting this point and the point of averages. This is the SD line.
Draw an ellipse with the SD line as its long axis. The ellipse should go just beyond the 2 SD marks in all directions.
The value of r determines how oblong the ellipse is.
![Page 31: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/31.jpg)
Sketching Scatter Plots
A study of the IQs of husbands and wives obtained the following results:
Husbands: average IQ = 100, SD = 15
Wives: average IQ = 100, SD = 15
r = 0.6
Sketch the scatter plot.
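To get a feel for what this cloud looks like, one can simulate points with roughly these five statistics. The sketch below assumes the pairs are bivariate normal, an idealization of "football-shaped" that the slides do not state. The function names, the seed, and the sample size of 5,000 are all illustrative choices.

```python
import random
from statistics import fmean, pstdev


def correlation(xs, ys):
    """Average of products minus product of averages, over product of SDs."""
    avg_xy = fmean([x * y for x, y in zip(xs, ys)])
    return (avg_xy - fmean(xs) * fmean(ys)) / (pstdev(xs) * pstdev(ys))


def simulate_pairs(n, avg_x, sd_x, avg_y, sd_y, r, seed=0):
    """Draw n (x, y) pairs from a bivariate normal with the given summary."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z_x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mixing z_x with independent noise gives z_y correlation r with z_x.
        z_y = r * z_x + (1 - r * r) ** 0.5 * noise
        pairs.append((avg_x + sd_x * z_x, avg_y + sd_y * z_y))
    return pairs


pairs = simulate_pairs(5000, 100, 15, 100, 15, 0.6)
husbands = [h for h, _ in pairs]
wives = [w for _, w in pairs]
sample_r = correlation(husbands, wives)

# The sample statistics should land near 100, 15, 100, 15 and 0.6.
print(round(fmean(husbands), 1), round(pstdev(husbands), 1))
print(round(fmean(wives), 1), round(pstdev(wives), 1))
print(round(sample_r, 2))
```

Plotting `pairs` with any charting tool should show an ellipse centered at (100, 100) that reaches just beyond 70 and 130 on each axis.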
![Page 32: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/32.jpg)
Changes of Scale
The correlation coefficient is not affected by changes of scale:
Moving: adding the same number to all of the values of one variable
Stretching: multiplying all of the values of one variable by the same positive number
Would r change if we multiplied by a negative number?
The correlation coefficient is also unaffected by interchanging the two data sets.
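These claims can be checked numerically. The sketch below reuses the data from the earlier worked example (r = 0.4). The shift of 10 and the factors 3 and −2 are arbitrary illustrative choices. It also tries a negative multiplier, which flips the sign of r while leaving its size unchanged.

```python
from statistics import fmean, pstdev


def correlation(xs, ys):
    """Average of products minus product of averages, over product of SDs."""
    avg_xy = fmean([x * y for x, y in zip(xs, ys)])
    return (avg_xy - fmean(xs) * fmean(ys)) / (pstdev(xs) * pstdev(ys))


xs = [1, 3, 4, 5, 7]
ys = [5, 9, 7, 1, 13]

base = correlation(xs, ys)
moved = correlation([x + 10 for x in xs], ys)     # add a constant to X
stretched = correlation([3 * x for x in xs], ys)  # multiply X by a positive number
swapped = correlation(ys, xs)                     # interchange the data sets
flipped = correlation([-2 * x for x in xs], ys)   # multiply X by a negative number

print(round(base, 4))       # 0.4
print(round(moved, 4))      # 0.4
print(round(stretched, 4))  # 0.4
print(round(swapped, 4))    # 0.4
print(round(flipped, 4))    # -0.4
```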
![Page 33: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/33.jpg)
Changes of Scale
![Page 34: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/34.jpg)
Changes of Scale
![Page 35: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/35.jpg)
Changes of Scale
Compute r for each of the following data sets.
X Y
0 8
4 9
6 10
8 12
12 6
X Y
0 2
2 3
3 4
4 6
6 0
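r = −0.15 for both data sets (the first is the second with every X value doubled and 6 added to every Y value)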
![Page 36: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/36.jpg)
Summary
The relationship between two variables, X and Y, can be graphed in a scatter plot.
When the scatter plot is tightly clustered around a line, there is a strong linear association between X and Y.
A scatter plot can be characterized by its five-statistic summary: the average and SD of the X values, the average and SD of the Y values, and the correlation coefficient.
![Page 37: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/37.jpg)
Summary
When the correlation coefficient gets closer to 1 or –1, the points cluster more tightly around a line.
Positive association has a positive r-value.
Negative association has a negative r-value.
Calculating the correlation coefficient: take the average of the product, subtract the product of the averages, and divide the difference by the product of the SDs.
![Page 38: Chapters 8 and 9: Correlations Between Data Sets Math 1680.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649e405503460f94b30fe3/html5/thumbnails/38.jpg)
Summary
The correlation coefficient is not affected by changes of scale or transposing the variables.
Correlation does not measure causation!