Chapter 3 EXAMINING RELATIONSHIPS. LESSON 3-1 VARIABLES Response variable (dependent) Measures the...

Chapter 3 EXAMINING RELATIONSHIPS

LESSON 3-1

VARIABLES

Response variable (dependent): Measures the outcome of a study.

Explanatory variable (independent): Attempts to explain the observed outcomes.

The most common treatment for breast cancer was once the removal of the breast. It is now usual to remove only the tumor and nearby lymph nodes, followed by radiation. The change in policy was due to a large medical experiment that compared the two treatments. Some breast cancer patients, chosen at random, were given each treatment. What are the explanatory and response variables? Are they categorical or quantitative?

            Explanatory variable      Response variable
Variable    Treatment (old or new)    Survival time
Type        Categorical               Quantitative

SCATTERPLOT

A scatterplot shows the relationship between two quantitative variables measured on the same individuals.

Always plot the explanatory variable (if there is one) on the horizontal axis (x-axis) of a scatterplot, and the response variable on the vertical axis (y-axis).

If there is no explanatory-response distinction, either variable can go on the horizontal axis.

Manatees are large, gentle sea creatures that live along the Florida coast. Many manatees are killed or injured by powerboats. Here are data on power boat registrations (in thousands) and the number of manatees killed by boats in Florida in the years 1977 to 1990:

A. We want to examine the relationship between the number of powerboats and the number of manatees killed by boats. Which is the explanatory variable?

Explanatory Variable = Number of powerboat registrations

MANATEES KILLED BY BOATS IN FLORIDA FROM 1977 TO 1990

Year    Powerboats (thousands)    Manatees killed
1977    447                       13
1978    460                       21
1979    481                       24
1980    498                       26
1981    513                       24
1982    512                       20
1983    526                       15
1984    559                       34
1985    585                       33
1986    615                       33
1987    645                       39
1988    675                       43
1989    711                       50
1990    719                       47

Make a scatterplot of these data. (Be sure to label the axes with variable names, not just x and y.) What does the scatterplot show about the relationship between these variables?

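On the calculator: press 2nd Y= (STAT PLOT) to turn on the plot, then ZOOM 9 (ZoomStat) to fit the window to the data.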

The plot shows a moderately strong linear relationship. As registrations increase, the number of manatee deaths also tends to increase.

Describing a Scatterplot

Direction (think slope): positive direction or negative direction.

Form: linear relationship, curved relationship, or clusters.

Scatter: how closely the points follow a clear form.

Outliers: an individual value that falls outside the overall pattern of the relationship.

A. Describe the direction of the relationship. Are the variables positively or negatively associated?

The variables are positively associated; that is, as the number of powerboat registrations increases, the number of manatees killed also increases.

B. Describe the form of the association. Is it linear?

The association is linear.

C. Describe the strength of the relationship. Can the number of manatees killed be predicted accurately from powerboat registrations? If powerboat registrations remained constant at 719,000, about how many manatees would be killed by boats each year?

The association is relatively strong. The number of manatees killed can be predicted accurately from the number of powerboat registrations. If the number of registrations remains constant at 719,000, we would expect between 45 and 50 manatees to be killed per year.
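
One way to check the "between 45 and 50" estimate is to fit a straight line to the table values and read off the prediction at 719 (thousand) registrations. The sketch below is only an illustration, not the calculator method used in class: it assumes the values in the table above are transcribed correctly, and the helper name `fit_line` is made up for this example. It uses the least-squares line that Lesson 3-3 introduces.

```python
# Values copied from the manatee table (registrations in thousands, manatees killed).
boats = [447, 460, 481, 498, 513, 512, 526, 559, 585, 615, 645, 675, 711, 719]
killed = [13, 21, 24, 26, 24, 20, 15, 34, 33, 33, 39, 43, 50, 47]

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r

intercept, slope, r = fit_line(boats, killed)
predicted_at_719 = intercept + slope * 719  # x is in thousands of registrations

print(f"r = {r:.3f}")
print(f"predicted manatees killed at 719 thousand boats: {predicted_at_719:.1f}")
```

The prediction should land inside the 45 to 50 range quoted in the answer, and r should be well above .8, which matches the description of a relatively strong positive association.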

Calories and salt in hot dogs. Are hot dogs that are high in calories also high in salt? The following slide is a scatterplot of the calories and salt content (measured as milligrams of sodium) in 17 brands of meat hot dogs.

A. Roughly what are the lowest and highest calorie counts among these brands? Roughly what is the sodium level in the brands with the fewest and with the most calories?

Lowest: about 107 calories with about 145 mg of sodium.
Highest: about 195 calories with about 510 mg of sodium.

B. Does the scatterplot show a clear positive or negative association? Say in words what this association means about calories and salt in hot dogs.

There is a positive association: high-calorie hot dogs tend to be high in salt, and low-calorie hot dogs tend to have low sodium.

C. Are there any outliers? Is the relationship (ignoring outliers) roughly linear in form? Still ignoring outliers, how strong would you say the relationship between calories and sodium is?

The lower-left point is an outlier, but ignoring it we have a linear pattern.

LESSON 3-2 CORRELATION

Look at these scatterplots, what can you tell me about them?

They are the same graph; I just changed the scale, so be careful when describing things. So we will use a number, the CORRELATION, to describe our plots accurately.

CORRELATION

Correlation measures the direction and strength of the linear relationship between two quantitative variables.

Correlation is usually written as r.

The lengths of two bones in five fossil specimens of the extinct beast Archaeopteryx:

Femur      56   59   64   74
Humerus    63   70   72   84

A. Find the correlation r
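
The slide does not show the calculation, so here is a minimal sketch of it. It assumes the usual textbook definition of r (the average product of standardized values, dividing by n - 1), and it uses only the four femur/humerus pairs that actually appear on the slide, so the result applies to those four pairs only. The function name `correlation` is just for this example.

```python
from statistics import mean, stdev

# Only the four pairs listed on the slide are used here.
femur = [56, 59, 64, 74]
humerus = [63, 70, 72, 84]

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y over all individuals."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

r = correlation(femur, humerus)
print(f"r = {r:.3f}")
```

The value comes out close to 1, which says the two bone lengths have a very strong positive linear relationship. Swapping the two lists gives the same r, since correlation makes no distinction between explanatory and response variables.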

FACTS ABOUT CORRELATION

Correlation makes no distinction between explanatory and response variable.

Correlation requires both variables to be quantitative.

Correlation (r) itself has no unit of measurement; it's just a number.

Positive (r) indicates positive association.

Negative (r) indicates negative association.

The correlation (r) is always between -1 and 1.

FACTS ABOUT CORRELATION

The closer (r) is to +1, the stronger the evidence of positive association between two variables.

The closer (r) is to -1, the stronger the evidence of negative association between two variables.

An (r) close to 0 does not rule out a strong relationship between x and y; there could still be a strong relationship, but one that is not linear.

Correlation is strongly affected by a few outlying observations.
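
To see why an r near 0 does not rule out a strong relationship, consider a made-up example (not from the slides): y is exactly x squared, so y is completely determined by x, yet the relationship is a curve, not a line. The data and the helper `correlation` below are hypothetical and only meant as a sketch.

```python
# Hypothetical data: a perfect curved relationship, y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

def correlation(xs, ys):
    """Correlation r computed from sums of squares and cross products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5

r = correlation(xs, ys)
print(f"r = {r:.3f}")  # r is 0 even though y depends entirely on x
```

The positive and negative halves of the curve cancel, so r is 0. This is why the scatterplot should always be checked before trusting r.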

POSITIVE LINEAR CORRELATION

Perfect Positive Linear Correlation r = 1

Strong Positive Linear Correlation r ≈ .9

Weak Positive Linear Correlation r ≈ .4

NEGATIVE LINEAR CORRELATION

Perfect Negative Linear Correlation r = -1

Strong Negative Linear Correlation r ≈ -.9

Weak Negative Linear Correlation r ≈ -.4

NO LINEAR CORRELATION

No Linear Correlation r is close to 0

DESCRIBING THE STRENGTH OF A LINEAR RELATIONSHIP

r from -1 to -.8     strong
r from -.8 to -.5    moderate
r from -.5 to .5     weak
r from .5 to .8      moderate
r from .8 to 1       strong
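
The scale above can be written as a small rule of thumb. This is only a sketch: the slide does not say which label applies exactly at .5 or .8, so the choice below (the cutoff values count toward the stronger label) is an assumption, and the function name `describe_strength` is made up for this example.

```python
def describe_strength(r):
    """Label the strength of a linear relationship from its correlation r."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)  # strength ignores the sign; the sign gives the direction
    if size >= 0.8:   # assumption: exactly .8 counts as strong
        return "strong"
    if size >= 0.5:   # assumption: exactly .5 counts as moderate
        return "moderate"
    return "weak"

for r in (-0.9, -0.4, 0.0, 0.6, 1.0):
    print(r, describe_strength(r))
```

These labels are rough guides, not strict rules; always describe direction and form along with strength.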

Do people with larger brains have higher IQ scores? A study looked at 40 volunteer subjects, 20 men and 20 women. Brain size was measured by magnetic resonance imaging. Table 3-3 gives the data. The MRI count is the number of “pixels” the brain covered in the image. IQ was measured by the Wechsler test.

A) Make a scatterplot of IQ score versus MRI count, using distinct symbols for the men and women. In addition, find the correlation between IQ and MRI for all 40 subjects, for the men alone, and for the women alone.

r (all) = .3576
r (men) = .4984
r (women) = .3257

B) Men are larger than women on the average, so they have larger brains. How is this size effect visible in your graph?

The points for men are generally located on the right side of the plot, while the women’s points are generally on the left.

C) Your result in (b) suggests separating men and women in looking at the relationship between brain size and IQ. Use your work in (a) to comment on the nature and strength of this relationship for women and for men.

The correlations for men and women suggest that there is a moderate positive association for men and a weak one for women. However, one significant feature of the data that can be observed in the scatterplot is that the sample group was highly stratified; that is, there were 10 men and 10 women with high IQs (at least 130), while the other 10 of each gender had IQs of no more than 103. The men’s higher correlation can be attributed partly to the two subjects with large brains and 103 IQs (which are high relative to the low-IQ group). The men’s correlation might not remain so high with a larger sample.

LESSON 3-3 LEAST-SQUARES REGRESSION

BEST FIT LINE

A best-fit line is a straight line (equation) that describes how a response variable y changes as an explanatory variable x changes.

A best-fit line (equation) is used to predict the value of y for a given value of x.

A best-fit line, unlike correlation, requires that we have an explanatory variable and a response variable.

BEST FIT LINE

The best-fit line is the line that comes closest to all the points. y is the actual value; $\hat{y}$ is the predicted value.

A residual is the difference between the observed value and the associated predicted value. A negative residual means the model overestimates the observed value. A positive residual means the model underestimates the observed value.

“BEST FIT” MEANS LEAST-SQUARES

The line of “Best Fit” is the line for which the sum of the squared residuals is the smallest

The line of “Best Fit” is called a Least-squares Regression Line (LSRL)
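
A small made-up example can make "smallest sum of squared residuals" concrete. The four data points below are hypothetical, and the comparison line y = x is just one arbitrary competitor; the sketch shows that the least-squares line has a smaller sum of squared residuals than that competitor, not a proof for every possible line.

```python
# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

def sum_squared_residuals(intercept, slope):
    """Sum of (observed - predicted)^2 for the line y-hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Least-squares slope and intercept from the usual formulas.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

sse_least_squares = sum_squared_residuals(intercept, slope)
sse_other_line = sum_squared_residuals(0.0, 1.0)  # competitor: y-hat = x

print(f"least-squares line: y-hat = {intercept:.2f} + {slope:.2f}x, SSE = {sse_least_squares:.2f}")
print(f"other line:         y-hat = x, SSE = {sse_other_line:.2f}")
```

Any other intercept and slope can be tried in `sum_squared_residuals`; none will give a smaller total than the least-squares line.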

EQUATION OF LSRL

Equation: $\hat{y} = a + bx$

Slope: $b = r \dfrac{s_y}{s_x}$

Intercept: $a = \bar{y} - b\bar{x}$

The line must go through the point $(\bar{x}, \bar{y})$.

Keeping water supplies clean requires regular measurement of levels of pollutants. The measurements are indirect: a typical analysis involves forming a dye by a chemical reaction with the dissolved pollutant, then passing light through the solution and measuring its “absorbance.” To calibrate such measurements, the laboratory measures known standard solutions and uses regression to relate absorbance to concentration. Here is a series of data on the absorbance for different levels of nitrates. Nitrates are measured in milligrams per liter of water.

Nitrates      50    50    100    200    400    800    1200    1600    2000    2000
Absorbance    7.0   7.5   12.8   24.0   47.0   93.0   138.0   183.0   230.0   226.0

A. What is the equation of the least-squares line for predicting absorbance from concentration? Find the slope and intercept.

How do I get these from the calculator? Use 2-Var Stats.

r = .9999

Slope: b = .1133

Intercept: $a = \bar{y} - b\bar{x}$ = 96.83 - (.1133)(840) = 1.658

Equation: $\hat{y} = 1.658 + .1133x$

If the lab analyzed a specimen with 500 milligrams of nitrates per liter, what do you expect the absorbance to be? Based on your plot and the correlation, do you expect your predicted absorbance to be very accurate?

$\hat{y}$ = 1.658 + (.1133)(500) = 58.31

This prediction should be very accurate, since the relationship is so strong.
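
The calculator results for the nitrate data can be reproduced by hand. The sketch below is one way to do it, assuming the ten nitrate/absorbance pairs above are transcribed correctly; the variable names are chosen for this example. It computes the slope and intercept from the means and sums of squares, then predicts the absorbance at 500 mg/L.

```python
# Calibration data from the nitrate example (mg per liter, absorbance).
nitrates = [50, 50, 100, 200, 400, 800, 1200, 1600, 2000, 2000]
absorbance = [7.0, 7.5, 12.8, 24.0, 47.0, 93.0, 138.0, 183.0, 230.0, 226.0]

n = len(nitrates)
x_bar = sum(nitrates) / n      # mean nitrate level
y_bar = sum(absorbance) / n    # mean absorbance

sxx = sum((x - x_bar) ** 2 for x in nitrates)
syy = sum((y - y_bar) ** 2 for y in absorbance)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(nitrates, absorbance))

slope = sxy / sxx                    # same value as r * s_y / s_x
intercept = y_bar - slope * x_bar    # forces the line through (x-bar, y-bar)
r = sxy / (sxx * syy) ** 0.5

predicted_at_500 = intercept + slope * 500

print(f"x-bar = {x_bar:.0f}, y-bar = {y_bar:.2f}")
print(f"y-hat = {intercept:.3f} + {slope:.4f}x, r = {r:.4f}")
print(f"predicted absorbance at 500 mg/L: {predicted_at_500:.2f}")
```

Because the slide rounds the slope to .1133 before computing the intercept, the unrounded intercept printed here can differ from 1.658 in the third decimal place; the prediction at 500 still rounds to about 58.3.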

ASSESSING THE ACCURACY OF THE LINEAR MODEL

The standard deviation and r² are numerical measures used in assessing how well the model fits.

r², the coefficient of determination:

Measures the proportion of variability in the y variable that can be “explained” by the linear relationship between x and y.

100*r² is the percentage of variation in y that can be attributed to an approximate linear relationship between x and y.

Is a number between 0 and 1; the closer r² is to 1, the better the line describes how the change in the explanatory variable affects the value of the response variable, meaning a better fit.

r² is the square of the linear correlation coefficient for the least-squares regression model.

A natural measure of variation about the least-squares regression line is the sum of the squared residuals. It measures the amount of variation in y that cannot be explained by the linear relationship between x and y.
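
The two descriptions of r² (proportion of variation explained, and the square of r) can be checked against each other. This sketch reuses the nitrate calibration data from earlier in the lesson and assumes the standard identity r² = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total variation of y about its mean; the names `sse` and `sst` are introduced here only for illustration.

```python
# Nitrate calibration data from earlier in the lesson.
xs = [50, 50, 100, 200, 400, 800, 1200, 1600, 2000, 2000]
ys = [7.0, 7.5, 12.8, 24.0, 47.0, 93.0, 138.0, 183.0, 230.0, 226.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# SST: total variation in y. SSE: variation the line leaves unexplained.
sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / (sxx * sst) ** 0.5
r_squared = 1 - sse / sst

print(f"r^2 from residuals:  {r_squared:.5f}")
print(f"r^2 from squaring r: {r * r:.5f}")
print(f"about {100 * r_squared:.2f}% of the variation in absorbance is explained")
```

Both routes give the same number, so a small sum of squared residuals relative to the total variation means r² is close to 1.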

A study of class attendance and grades among first year students at a state university showed that in general students who attended a higher percent of their classes earned higher grades. Class attendance explained 16% of the variation in grade index among students. What is the numerical value of the correlation between percent of classes attended and grade index?

Class attendance explains 16% of the variation, so r² = .16. Higher attendance goes with higher grades, so the correlation must be positive: r = +√.16 = .40.

Some people think that the behavior of the stock market in January predicts its behavior for the rest of the year. Take the explanatory variable x to be the percent change in a stock market index in January and the response variable y to be the change in the index for the entire year. We expect a positive correlation between x and y because the change in January contributes to the full year’s change. Calculation from data for the years 1960 to 1997 gives

$\bar{x}$ = 1.75%, $\bar{y}$ = 9.07%, r = .596

A. What percent of the observed variation in yearly changes in the index is explained by a straight-line relationship with the change during January?

r² = (.596)² = .355, so about 35.5% of the variation is explained.

B. What is the equation of the least-squares line for predicting full-year change from January change?

Slope: b = 1.707

Intercept: $a = \bar{y} - b\bar{x}$ = .0907 - (1.707)(.0175) = 0.06083 = 6.083%

$\hat{y} = 6.083\% + 1.707x$

C. The mean change in January is $\bar{x}$ = 1.75%. Do you have to use the regression line to predict the change in the index in a year in which the index rises 1.75% in January?

No. The predicted change is $\bar{y}$ = 9.07%, since the regression line must pass through $(\bar{x}, \bar{y})$ = (1.75%, 9.07%).
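
The arithmetic in parts A through C can be checked in a few lines. This sketch uses only the numbers that appear in the worked solution (r = .596, slope 1.707, and the means .0175 and .0907 written as decimals); the variable names are chosen for this example.

```python
r = 0.596        # correlation between January change and full-year change
slope = 1.707    # slope b quoted in the worked solution
x_bar = 0.0175   # mean January change (1.75%)
y_bar = 0.0907   # mean full-year change (9.07%)

# Part A: fraction of variation in yearly change explained by the line.
r_squared = r ** 2

# Part B: the intercept follows from the line passing through (x-bar, y-bar).
intercept = y_bar - slope * x_bar

# Part C: predicting at x = x-bar always gives back y-bar.
predicted_at_mean = intercept + slope * x_bar

print(f"r^2 = {r_squared:.4f}")
print(f"intercept = {intercept:.5f}")
print(f"prediction when January change is 1.75%: {predicted_at_mean:.4f}")
```

Part C needs no regression calculation at all: plugging the mean of x into any least-squares line returns the mean of y.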

ASSUMPTIONS AND CONDITIONS

Check the scatterplot. The shape must be linear or we can’t use regression at all.

Watch out for outliers. Outlying values have large residuals, and squaring makes their influence that much greater.

Outlying points can dramatically change a regression model. They can change the sign of the slope, misleading us about the underlying relationship between variables.

Be suspicious of an r² of 100%. You may have accidentally regressed two variables that measure the same thing.

Don’t extrapolate beyond the data. A linear model will often do a reasonable job of summarizing a relationship in the narrow range of observed x-values. Beware of predicting y-values for x-values that lie outside the range of the original data. If you must extrapolate into the future, at least don’t believe that the prediction will come true!

Don’t infer that x causes y just because there is a good linear model for their relationship.

Correlation and regression describe only linear relationships. You can do the calculations on any two quantitative variables, but the results are only useful if the scatterplot is linear.

Correlation is not resistant. Look for unusual observations.

RESIDUALS

A residual is the difference between an observed value of the response variable and the value predicted by the regression line.

residual = observed - predicted = $y - \hat{y}$

Plotting the residuals: a residual plot is a scatterplot of the (x, residual) pairs. A residual plot is a good place to start when assessing the appropriateness of the regression line.

For a least-squares line, the residuals always sum to 0, so the mean of the residuals also equals 0.
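
The claim that least-squares residuals sum to 0 is easy to check on a small example. The five data points below are hypothetical and were picked only so the numbers stay simple; the list `residuals` also holds the y-coordinates one would plot against x in a residual plot.

```python
# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar

# residual = observed - predicted, one per data point
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    print(f"x = {x}: residual = {res:+.2f}")
print(f"sum of residuals = {sum(residuals):.10f}")
```

The sum prints as 0 up to floating-point rounding. This only holds for the least-squares line; residuals from some other line through the data generally do not cancel.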

CALCULATE RESIDUAL

We have a study of fat gained due to a change in exercise.

Fat gain = 3.505 - .00344(Change)

One subject’s Change rose by 135 calories. That subject gained 2.7 kg of fat.

What is the predicted fat gain for a Change of 135 calories? 3.505 - .00344(135) = 3.04 kg

So the residual is actual - predicted: 2.7 - 3.04 = -.34 kg
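
The same residual calculation, written out as a short sketch. The regression equation and the subject's values come from the slide; the function name `predicted_fat_gain` is made up for this example.

```python
def predicted_fat_gain(change_in_calories):
    """Predicted fat gain (kg) from the regression line on the slide."""
    return 3.505 - 0.00344 * change_in_calories

observed = 2.7                        # kg of fat this subject actually gained
predicted = predicted_fat_gain(135)   # the subject's Change rose by 135 calories
residual = observed - predicted       # residual = observed - predicted

print(f"predicted = {predicted:.2f} kg, residual = {residual:.2f} kg")
```

The residual is negative, so this subject gained less fat than the line predicts; the line overestimates for this subject.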

Residuals

If your residual is positive, what has happened?

That means the actual value exceeds the prediction, so the model underestimates.

What about when it is negative?

Then the actual value falls short of the predicted value, so the model overestimates.

RESIDUAL PLOT

Determine whether a linear model is appropriate to describe the relationship between the explanatory and response variables.

Residuals are what is “left over” after the model describes the relationship; they often reveal subtleties that were not clear from a plot of the original data.

Determine whether the variance of the residuals is constant.

Check for outliers.

RESIDUAL PLOTS – UNIFORM

The uniform scatter of points indicates that the regression line is a good model.

RESIDUAL PLOTS – CURVED

The residuals have a curved pattern, so a straight line is an inappropriate model

RESIDUAL PLOTS – INCREASING/DECREASING

The response variable y has more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.

INFLUENTIAL OBSERVATIONS

An outlier is an observation that lies outside the overall pattern of the other observations.

An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation.

Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.
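
A made-up example shows how much one point that is extreme in the x direction can matter. The data below are hypothetical: five ordinary points plus one added point far to the right. The sketch fits the least-squares slope with and without that point; the helper name `ls_slope` is just for this example.

```python
def ls_slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical data: five ordinary points, then one point extreme in x.
ordinary = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
extreme_point = (20, 2)

slope_without = ls_slope(ordinary)
slope_with = ls_slope(ordinary + [extreme_point])

print(f"slope without the extreme point: {slope_without:+.3f}")
print(f"slope with the extreme point:    {slope_with:+.3f}")
```

Adding the single far-right point flips the slope from positive to negative, so that point is influential. A point with an unusual y but an x near the middle of the data would change the line much less.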

RESIDUAL PLOT – INDIVIDUAL POINTS

Individual points with a large residual (Child 19) are outliers.

Individual points that are extreme in the x direction (Child 18) are influential observations.

Lean body mass as a predictor of metabolic rate. Exercise 3.12, page 132, provides data from a study of dieting for 12 women and 7 men. We explore the data further.

Enter mass in L1 and rate in L2.

A. Make a scatterplot

B. Perform least-squares regression on your calculator and record the equation and the correlation. Lean body mass explains what percent of the variation in metabolic rate for women?

Lean body mass explains about 76.82% of the variation in metabolic rate.

C. Does the least-squares line provide an adequate model for the data?

We will make a residual plot on the calculator and look at the pattern.

Graphing residuals:

Y1 → VARS / Y-VARS / Function

Turn on Stat Plot 1

ZOOM 9

C. From the residual plot, the line does appear to provide an adequate model. The residuals are scattered about the horizontal axis and no patterns are evident.