The Big Picture Where we are coming from and where we are headed… Chapter 5 showed us methods for...

Post on 13-Dec-2015


The Big Picture

Where we are coming from and where we are headed…

• Chapter 5 showed us methods for summarizing data using descriptive statistics, but only one variable at a time.

• In Chapter 6, we learn how to analyze the relationship between two quantitative variables using scatterplots, correlation, and regression.

• In Chapter 7, we will learn about probability, which we will need in order to perform statistical inference.


• Scatterplots and Correlation

Section 6.1

Objectives:

Construct and interpret scatterplots for two quantitative variables.

Calculate and interpret the correlation coefficient.

Determine whether a linear correlation exists between two variables.

Explanatory and Response Variables

• A response variable measures an outcome of a study.

• An explanatory variable explains, influences, or causes change in a response variable.

• The explanatory variable is also called the independent variable, and the response variable is also called the dependent variable.

• Be careful!! The relationship between two variables can be strongly influenced by other variables that are lurking in the background.

Explanatory and response variables

In each of the following examples, determine if there is a clear explanatory and response variable, or if it is just best to explore the relationship.

• Price of a house and square footage of a house

• The arm span and height of a person

• Amount of snow in the Colorado mountains and the volume of water in area rivers

Explanatory: square footage; response: price

Explanatory: arm span; response: height

Explore the relationship

Displaying relationships: Scatterplots

– A scatterplot displays the relationship between two quantitative variables measured on the same individuals.

– It is the most common way to display the relation between two quantitative variables.

– It displays the form, direction, and strength of the relationship between two quantitative variables.

– The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.

Example:

Lot              x = square footage (100s of sq ft)    y = sales price ($1000s)
Harding St       75                                    155
Newton Ave       125                                   210
Stacy Ct         125                                   290
Eastern Ave      175                                   360
Second St        175                                   250
Sunnybrook Rd    225                                   450
Ahlstrand Rd     225                                   530
Eastern Ave      275                                   635
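To see how each lot becomes one point fixed by its x and y values, the lot data can be drawn as a rough text scatterplot. The following is only a minimal sketch in Python, which is not part of the original slides; the function name `ascii_scatter` and the grid size are illustrative assumptions.

```python
# Lot data from the example: (square footage in 100s of sq ft, price in $1000s)
lots = [(75, 155), (125, 210), (125, 290), (175, 360),
        (175, 250), (225, 450), (225, 530), (275, 635)]

def ascii_scatter(points, width=41, height=11):
    """Return a crude text scatterplot: x on the horizontal axis, y on the vertical axis."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each value to a column/row index inside the grid
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 of the grid is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

plot = ascii_scatter(lots)
print(plot)
```

The points rise from the lower left to the upper right, which is what a positive relationship looks like. A graphing calculator or plotting software would give a more precise picture.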


Scatterplots

The relationship between two quantitative variables can take many different forms. Four of the most common are:

Positive linear relationship: As x increases, y also tends to increase.

Negative linear relationship: As x increases, y tends to decrease.

No apparent relationship: As x increases, y tends to remain unchanged.

Nonlinear relationship: The x and y variables are related, but not in a way that can be approximated using a straight line.

Interpreting scatterplots

• How to examine a scatterplot:

– Determine the overall pattern, showing the form, direction, and strength of the relationship.

– Identify any outliers or other deviations from this pattern.

Interpreting scatterplots

• Overall Pattern

– Form: Linear relationships, where the points show a straight-line pattern, are an important form of relationship between two variables. Curved relationships and clusters (a number of similar individuals that occur together) are other forms to watch for.

– Direction: If the relationship has a clear direction, we speak of either positive association (the more the x, the more the y) or negative association (the more the x, the less the y).

– Strength: The strength of a relationship is determined by how close the points in the scatterplot lie to a line.

Describe the scatterplot:

Strong positive linear

Strong negative linear

Strong negative curved

Sketch a scatterplot of the data and then describe the overall pattern.

Is there an obvious explanatory and response variable?

Exercises:

Pg. 337/6.1, 6.3, 6.4

Pg. 343-345/6.5, 6.6, 6.8

Scatterplot & Correlation

• Scatterplots provide a visual tool for looking at the relationship between two variables. Unfortunately, our eyes are not good tools for judging the strength of the relationship. Changes in the scale or the amount of white space in the graph can easily change our judgment of the strength of the relationship.

• Correlation is a numerical measure we use to show the strength of linear association.

A scatter plot is helpful in understanding the form, direction, and strength of the relationship between two variables.

Correlation allows us to quantify the direction and strength of the relationship.

Ex 1: Describe the correlation illustrated by the scatter plot.

There is a positive correlation between the two data sets.

As the average daily temperature increased, the number of visitors increased.

Ex. 2: Describe the correlation illustrated by the scatter plot.

There is a negative correlation between elevation and mean annual temp.

As the elevation in Nevada increases, the mean annual temperature decreases.

Facts about correlation

• What kind of variables do we use?

– 1. Correlation makes no distinction between explanatory and response variables.

– 2. Both variables must be quantitative.

• Numerical properties

– 1. −1 ≤ r ≤ 1.

– 2. r > 0: positive association between the variables.

– 3. r < 0: negative association between the variables.

– 4. r = 1 or r = −1 indicates a perfect linear relationship.

– 5. The closer |r| is to 1, the stronger the linear relationship; the closer r is to 0, the weaker it is.

– 6. r is not resistant: it is affected by even a few outliers.

– 7. r does not describe curved relationships.

– 8. r is not affected by changing the units of measurement.
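Properties 6 and 8 can be checked numerically. The sketch below, written in Python as an illustration only, computes r for the lot data from the scatterplot example, then again with x converted to plain square feet, and once more after adding one made-up outlier. The extra point (275, 100) is a hypothetical value chosen for the demonstration and is not part of the original data.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r computed from standardized values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

sq_ft_100s = [75, 125, 125, 175, 175, 225, 225, 275]    # x, in 100s of sq ft
price_1000s = [155, 210, 290, 360, 250, 450, 530, 635]  # y, in $1000s

r_original = correlation(sq_ft_100s, price_1000s)

# Property 8: changing units (100s of sq ft -> sq ft) leaves r unchanged
r_new_units = correlation([x * 100 for x in sq_ft_100s], price_1000s)

# Property 6: a single hypothetical outlier (large lot, very low price) changes r a lot
r_with_outlier = correlation(sq_ft_100s + [275], price_1000s + [100])

print(round(r_original, 3), round(r_new_units, 3), round(r_with_outlier, 3))
```

The first two values agree, while the third is much smaller, so one unusual point is enough to weaken the measured linear association considerably.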


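Measuring linear association: correlation r

(The Pearson Product-Moment Correlation Coefficient, or simply the Correlation Coefficient)

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$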

Don’t worry, that’s why we have graphing calculators!!!

You can use a graphing calculator to perform a linear regression and find the correlation coefficient r.

To display the correlation coefficient r, you may have to turn on the diagnostic mode. To do this, choose the DiagnosticOn mode. Press enter, and then press enter again to activate it.
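For readers without a graphing calculator, the correlation formula can also be evaluated step by step. The following is a minimal Python sketch (Python is not used in the original slides) applied to the lot data from the scatterplot example; the variable names are illustrative.

```python
from math import sqrt

x = [75, 125, 125, 175, 175, 225, 225, 275]    # square footage (100s of sq ft)
y = [155, 210, 290, 360, 250, 450, 530, 635]   # sales price ($1000s)
n = len(x)

# Step 1: means of x and y
x_bar = sum(x) / n
y_bar = sum(y) / n

# Step 2: sample standard deviations (divide by n - 1)
s_x = sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Step 3: standardize each value, multiply the pairs, add them up, divide by n - 1
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

print(f"x_bar = {x_bar}, y_bar = {y_bar}, r = {r:.3f}")
```

The result is fairly close to 1, which agrees with the strong positive linear pattern in the lot data.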

Example 1:

1.) Sketch a scatterplot

2.) State the overall pattern

3.) Are there any outliers?

4.) Calculate the correlation coefficient

In one of the Boston city parks, there has been a problem with muggings in the summer months. A police officer took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

Example 2:

x    10   15   16   1    4    6    18   12   14   7

y    5    2    1    9    7    8    1    5    3    6

a. Construct a scatterplot

b. Estimate a value for r.

c. Calculate the actual r value.
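One way to check the answer to part c is sketched below in Python, which is an addition to the original slides. It assumes that the stray values 7 and 6 in the extracted data are the tenth (x, y) pair, since the problem describes a sample of 10 days.

```python
from math import sqrt

# x = police officers on duty, y = reported muggings.
# Assumption: (7, 6) is the tenth day; it appears detached from the table in the source.
officers = [10, 15, 16, 1, 4, 6, 18, 12, 14, 7]
muggings = [5, 2, 1, 9, 7, 8, 1, 5, 3, 6]

def correlation(xs, ys):
    """Correlation coefficient r computed from standardized values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

r = correlation(officers, muggings)
print(f"r = {r:.3f}")
```

Under that assumption r comes out close to −1, a strong negative linear association: days with more officers on duty tended to have fewer reported muggings.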

A Caution

• The correlation coefficient measures the strength of the linear relationship between two variables.

• A strong correlation does not imply a cause and effect relationship.

• A correlation between two variables may be caused by other (either known or unknown) variables called lurking variables.

Example Cause-Effect Relationship

During the months of March and April, the weekly weight increases of a puppy in New York were collected.  For the same time frame, the retail price increases of snowshoes in Alaska were collected.

Weight of a growing puppy in NY    Retail price of snowshoes in Alaska
8 pounds                           $32.45
8.5                                $32.95
9                                  $33.45
9.6                                $34.00
10.1                               $34.50
10.7                               $35.10
11.5                               $35.63

Example Cause-Effect Relationship cont.

• The data were examined and found to have a very strong linear correlation. So, this must mean that the weight increase of a puppy in New York is causing snowshoe prices in Alaska to increase. Of course this is not true!

• The moral of this example is: "be careful what you infer from your statistical analyses." Be sure your relationship makes sense. Also keep in mind that other factors may be involved in a cause-effect relationship.
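The strength of this correlation can be confirmed from the seven pairs in the puppy and snowshoe table. The following is a small Python sketch, added for illustration and not part of the original slides.

```python
from math import sqrt

puppy_weight = [8, 8.5, 9, 9.6, 10.1, 10.7, 11.5]                      # pounds
snowshoe_price = [32.45, 32.95, 33.45, 34.00, 34.50, 35.10, 35.63]    # dollars

def correlation(xs, ys):
    """Correlation coefficient r computed from standardized values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

r = correlation(puppy_weight, snowshoe_price)
print(f"r = {r:.3f}")
```

The value of r is very close to 1 even though the two variables have nothing to do with each other. Both simply increased over the same weeks, so the calculation alone cannot tell us anything about cause and effect.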

Exercises:

Pg. 350/6.10-6.12

Pg. 355/6.13-6.15

Pg. 359/6.17-6.22 (section review)