11/13/2015

AP Statistics Student Curriculum Review‐ Fall 2015

Mrs. Daniel

Alonzo & Tracy Mourning Sr. High

[email protected]

This PowerPoint is posted at:

teachers.dadeschools.net/sdaniel

Agenda

1. Z‐scores & Normal Distributions

2. FRQ: 2011 #1

3. Calculator Skills Review: Calculating Probabilities/Percentages using z‐scores

4. Multiple Choice Practice

5. Calculator Skills Review: Scatterplots, Correlation & Linear Regression

6. FRQ: 2007B #4

7. Multiple Choice Practice

8. FRQ: 2013 #6

Normal Distributions

• All Normal curves are symmetric, single‐peaked, and bell‐shaped.

• A specific Normal curve is described by giving its mean µ and standard deviation σ.

Two Normal curves, showing the mean µ and standard deviation σ.

Normal Distributions

• We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ).

• Any particular Normal distribution is completely specified by two numbers: its mean µ and standard deviation σ.

• The mean of a Normal distribution is the center of the symmetric Normal curve.

• The standard deviation is the distance from the center to the change‐of‐curvature points on either side.

Normal Distributions are Useful…

• Normal distributions are good descriptions for some distributions of real data.

• Normal distributions are good approximations of the results of many kinds of chance outcomes.

• Many statistical inference procedures are based on Normal distributions.

Importance of Standardizing

• There are infinitely many different Normal distributions, each with its own mean and standard deviation.

• In order to more effectively compare different Normal distributions we “standardize”.

• Standardizing allows us to compare apples to apples.

• We can compare SAT and ACT scores by standardizing.

The Standardized Normal Distribution

All Normal distributions are the same if we measure in units of size σ from the mean µ as center.

The standardized Normal distribution is the Normal distribution with mean 0 and standard deviation 1.

Z‐score Formula

z = (x − µ) / σ

x = variable

µ = mean

σ = standard deviation

Let’s Practice…

Venus Williams has a very fast first serve. Historically, Ms. Williams’ first serve averages 88 mph with a standard deviation of 12 mph.

A. What is the standard normal score for a first serve at 75 mph?

z = (75 − 88) / 12 = ‐1.08

B. What is the standard normal score for a first serve at 105 mph?

z = (105 − 88) / 12 = 1.42
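The two answers above can be checked with a few lines of Python. This is only a minimal sketch: the function name `z_score` is our own label for the z‐score formula and does not come from the slides.

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

# Venus Williams' first serve: mean 88 mph, standard deviation 12 mph
z_a = z_score(75, 88, 12)    # serve at 75 mph
z_b = z_score(105, 88, 12)   # serve at 105 mph

print(round(z_a, 2), round(z_b, 2))
```

A negative z‐score means the serve is slower than average; a positive z‐score means it is faster than average.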

FRQ: 2011 #1

FRQ 2011 #1a

No, it is not reasonable to believe that the distribution of 40‐yard running times is approximately Normal, because the minimum time is only 1.33 standard deviations below the mean. In a Normal distribution we would expect observations to extend to about 3 standard deviations below the mean.

z = (minimum time − mean) / standard deviation = ‐1.33

FRQ 2011 #1b

The z‐score for a player who can lift a weight of 370 pounds is z = (370 − mean) / standard deviation = 2.4.

The z‐score indicates that the amount of weight the player can lift is 2.4 standard deviations above the mean for all previous players in this position. He is very strong!

FRQ 2011 #1c

Player A.

Although the z‐scores are similar for weight, Player A has a much lower 40‐yard dash time, as evidenced by the z‐scores (‐1.2 for A vs. ‐0.2 for B).

Calculating Probabilities/Percentages using z‐scores

By Hand/Table

1. Draw and label Normal curve.

2. Use z‐score formula, plug in values and solve.

3. Using z‐score, look up the probability (area under the curve) in the Standard Normal Table.

4. Conclude in context.

Calculator

1. Draw and label Normal curve.

2. Plug in lower bound, upper bound, mean and standard deviation.

3. Conclude in context.

By Hand vs. Calculator

• AP awards FULL credit for answers done “by hand/table” or using the calculator.

• Must show work

– Hand/Table: z‐score formula plugged in

– Calculator: syntax with labels

• Lower/upper bounds, mean and standard deviation

• Calculator leads to fewer errors and is faster

TI‐84 Calculator: NormalCDF

1. 2nd, VARS (Distr)

2. 2:normalcdf(

3. Enter the following information:

• Lower: (the lower bound of the region OR ‐1E99)

• Upper: (the upper bound of the region OR 1,000,000)

• µ: (mean)

• σ: (standard deviation)

4. Press enter; the number that appears is the probability (area under the curve)

TI‐Nspire: NormalCDF

1. Select Calculator (on home screen), press center button.

2. Press menu, press enter.

3. Select 6: Statistics, press enter.

4. Select 5: Distributions, press enter.

5. Select 2: Normal Cdf, press enter.

6. Enter the following information:

• Lower: (the lower bound of the region OR ‐1E99/‐∞)

• Upper: (the upper bound of the region OR 1,000,000/+∞)

• µ: (mean)

• σ: (standard deviation)

7. Press enter; the number that appears is the probability (area under the curve)
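For readers without a TI calculator, the same calculation can be sketched in Python with the standard library's `statistics.NormalDist`. The helper name `normalcdf` below is our own, chosen to mirror the calculator command; it is not a built‐in Python function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the Normal(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# As on the calculator, a very large negative number stands in for
# "no lower bound".
p = normalcdf(-1e99, 1.0)   # area to the left of z = 1 on the standard Normal curve

print(round(p, 4))
```

The arguments follow the calculator's order: lower bound, upper bound, mean, standard deviation.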

Let’s Practice…

According to Edmunds, 2015 Honda Civics have an average fuel efficiency of 25 mpg with a standard deviation of 4.5 mpg. What is the probability that a randomly selected car will have a gas mileage of 30 or lower?

Solution
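One way to work this problem, assuming (as the question implies) that fuel efficiency is Normally distributed, is sketched below in Python. The variable names are ours.

```python
from statistics import NormalDist

# Assumed model: fuel efficiency ~ Normal(mean 25 mpg, sd 4.5 mpg)
civic_mpg = NormalDist(mu=25, sigma=4.5)

# P(X <= 30): area under the curve to the left of 30 mpg
p_30_or_lower = civic_mpg.cdf(30)

print(round(p_30_or_lower, 4))
```

In context: there is roughly an 87% chance that a randomly selected car gets 30 mpg or lower.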

Let’s Practice…

According to ACT, the average ACT score for college bound seniors was 20.8 with a standard deviation of 4.8. Jose knows he was in the 82nd percentile. What was his ACT score?

Solution
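A percentile question runs the calculation in reverse: start from the area to the left and find the x‐value. A minimal Python sketch, assuming ACT scores are Normally distributed, uses `inv_cdf`, which plays the role of the calculator's InvNorm.

```python
from statistics import NormalDist

# Assumed model: ACT scores ~ Normal(mean 20.8, sd 4.8)
act = NormalDist(mu=20.8, sigma=4.8)

# 82nd percentile: the score with 82% of the area to its left
jose_score = act.inv_cdf(0.82)

print(round(jose_score, 2))
```

The result is a little above 25, so Jose scored about 25 on the ACT.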

Normal Calculations on Calculator

Function | Calculates | Example

NormalCDF | Probability of obtaining a value BETWEEN two values | What percent of students scored between 70 and 95 on the test?

InvNorm | X‐value given probability or percentile | Tommy scored in the 92nd percentile on the test; what was his raw score?

NormalPDF (RARE) | Probability of obtaining PRECISELY or EXACTLY a specific x‐value | What is the probability that Suzy scored exactly a 75 on the test?

Let’s Practice…

According to ACT, the average ACT score for college bound seniors was 20.8 with a standard deviation of 4.8.

A. What percentage of college bound seniors scored lower than 19 on the ACT?

B. What percentage of college bound seniors scored between 27 and 32 on the ACT?

Solution

A. normalcdf(0, 19, 20.8, 4.8) = 0.3538…, so about 35.4% of college bound seniors scored lower than 19 on the ACT.

B. normalcdf(27, 32, 20.8, 4.8) = 0.0884…, so about 8.8% of college bound seniors scored between 27 and 32 on the ACT.

Let’s Practice…


C. What percentage of college bound seniors scored a 33 or greater on the ACT?

D. If Juan scored in the 90th percentile, what was his ACT score?

Solution

C. normalcdf(33, 36, 20.8, 4.8) = 0.0047…, so about 0.5% of college bound seniors scored a 33 or greater on the ACT.

D. invNorm(.90, 20.8, 4.8) = 26.95 (or 27 on the ACT)
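The four ACT answers (A through D) can be reproduced in Python as a check. This sketch uses the same bounds as the calculator commands on the slides (0 as the lowest and 36 as the highest possible score); the variable names are ours.

```python
from statistics import NormalDist

act = NormalDist(mu=20.8, sigma=4.8)

# A. lower than 19 (0 is the lowest possible score)
part_a = act.cdf(19) - act.cdf(0)

# B. between 27 and 32
part_b = act.cdf(32) - act.cdf(27)

# C. 33 or greater (36 is the highest possible score)
part_c = act.cdf(36) - act.cdf(33)

# D. 90th percentile
part_d = act.inv_cdf(0.90)

print(round(part_a, 4), round(part_b, 4), round(part_c, 4), round(part_d, 2))
```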

MC #1

Scores on the ACT college entrance exam follow a bell‐shaped distribution with mean 18 and standard deviation 6. Wayne’s standardized score on the ACT was −0.7. What was Wayne’s actual ACT score?

(a) 4.2 (b) −4.2 (c) 13.8

(d) 17.3 (e) 22.2

MC #2

Which of the following is least likely to have a nearly Normal distribution?

(a) Heights of all female students taking STAT 001 at State Tech.

(b) IQ scores of all students taking STAT 001 at State Tech.

(c) SAT Math scores of all students taking STAT 001 at State Tech.

(d) Family incomes of all students taking STAT 001 at State Tech.

(e) All of (a)–(d) will be approximately Normal.

MC #3

The scores on the real estate licensing exam given in Florida are Normally distributed with a standard deviation of 70. What is the mean test score if 25% of the applicants score above 475?

a. 416 b. 428 c. 468

d. 522 e. Not enough information to answer question.
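One possible way to check MC #3 is to work backwards from the z‐score formula: if 25% of applicants score above 475, then 475 is the 75th percentile, and the mean is 475 minus that percentile's z‐score times the standard deviation. The sketch below uses variable names of our own choosing.

```python
from statistics import NormalDist

sd = 70
cutoff = 475

# 25% above the cutoff means 75% below it
z_cutoff = NormalDist().inv_cdf(0.75)

# Rearranging z = (x - mean) / sd gives mean = x - z * sd
mean_score = cutoff - z_cutoff * sd

print(round(mean_score))
```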

MC #4

Polly takes three standardized tests. She scores 600 on all three tests. The scores are Normally distributed. Rank her performance on the three tests.

a. I, II and III b. III, II, and I c. I, III and II

d. III, I, and II e. II, I and III

MC #5

The heights of American men aged 15 to 24 are approximately normally distributed with a mean of 68 inches and a standard deviation of 2.5 inches. About 20% of these men are taller than…

a. 66 inches b. 68 inches c. 70 inches

d. 72 inches e. 74 inches

Scatterplots & Correlation

Scatterplots

Make a scatterplot of the relationship between body weight and pack weight. Body weight is our eXplanatory variable.

Body weight (lb) | 120 | 187 | 109 | 103 | 131 | 165 | 158 | 116

Backpack weight (lb) | 26 | 30 | 26 | 24 | 29 | 35 | 31 | 28

Constructing a Scatterplot: TI‐Nspire

1. Enter x values into list 1 and enter y values into list 2.

2. Label each column. Label column x : weight and column y: bpack.

3. Press HOME/On, click Add Data & Statistics

Constructing a Scatterplot: TI‐Nspire

4. Move the cursor to the bottom of the screen and “click to add variable”. Select weight.

5. Move the cursor to the left of the screen and “click to add variable”. Select bpack.

Making a Scatterplot: TI‐84

• Using List editor enter data into list 1 and list 2.

• Press 2nd, Y = (Stat Plot), 1, enter,

– Select: On

– Select: Scatter

– X: list 1

– Y: list 2

– Select: Box

• Press “Zoom”, 9

What is Correlation?

• A mathematical value that describes the strength of a linear relationship between two quantitative variables.

• Correlation values are between ‐1 and 1.

• Correlation is abbreviated: r

• The strength of the linear relationship increases as r moves away from 0 towards ‐1 or 1.

What does “r” tell us?!

• The square of the correlation, r², describes what percent of variation in y is ‘explained’ by the linear relationship with x.

• Notice that the formula is the sum of the z‐scores of x multiplied by the z‐scores of y, divided by n − 1.

What does “r” mean?

R Value Strength

‐1 Perfectly linear; negative

‐0.75 Strong negative relationship

‐0.50 Moderately strong negative relationship

‐0.25 Weak negative relationship

0 nonexistent

0.25 Weak positive relationship

0.50 Moderately strong positive relationship

0.75 Strong positive relationship

1 Perfectly linear; positive

Calculate Correlation: TI‐Nspire

1. Enter x values in list 1 and y values in list 2.

2. Press MENU, then 4: Statistics

3. Option 1: Stat Calculations

4. Option 3: Linear Regression mx + b

5. X: a[] , Y: b[] , ENTER

6. Correlation = r

Correlation should be 0.79

Calculate Correlation: TI‐84

1. Enter x values in list 1 and y values in list 2.

2. Press STAT, arrow right to CALC

3. Option 4: LinReg(ax + b)

4. Enter Information: Xlist: L1, Ylist: L2

5. Calculate

Correlation should be 0.79

Facts about Correlation

1. Correlation requires that both variables be quantitative.

2. Correlation does not describe curved relationships between variables, no matter how strong the relationship is.

3. Correlation is not resistant. r is strongly affected by a few outlying observations.

4. Correlation makes no distinction between explanatory and response variables.

5. r does not change when we change the units of measurement of x, y, or both.

6. r does not change when we add or subtract a constant to either x, y or both.

7. The correlation r itself has no unit of measurement.

r: Ignores Distinctions Between x & y

r: Highly Affected by Outliers

Why?!

• Since r is calculated using standardized values (z‐scores), the correlation value will not change if the units of measure are changed (feet to inches, etc.)

• Adding a constant to either x or y or both will not change the correlation because neither the standard deviation nor distance from the mean will be impacted.

Correlation Formula:

Suppose that we have data on variables x and y for n individuals.

The values for the first individual are x1 and y1, the values for the second individual are x2 and y2, and so on.

The means and standard deviations of the two variables are x‐bar and sx for the x‐values and y‐bar and sy for the y‐values.

The correlation r between x and y is:

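\[
r = \frac{1}{n-1}\left[\left(\frac{x_1-\bar{x}}{s_x}\right)\left(\frac{y_1-\bar{y}}{s_y}\right) + \left(\frac{x_2-\bar{x}}{s_x}\right)\left(\frac{y_2-\bar{y}}{s_y}\right) + \cdots + \left(\frac{x_n-\bar{x}}{s_x}\right)\left(\frac{y_n-\bar{y}}{s_y}\right)\right]
\]

\[
r = \frac{1}{n-1}\sum\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
\]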
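The formula above can be sketched directly in Python and applied to the body weight and backpack weight data from the scatterplot slides. The function name `correlation` and the variable names are our own. The same sketch illustrates why changing units or adding a constant leaves r unchanged.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = 1/(n - 1) * sum of (z-score of x_i) * (z-score of y_i)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

body_lb = [120, 187, 109, 103, 131, 165, 158, 116]
pack_lb = [26, 30, 26, 24, 29, 35, 31, 28]

r = correlation(body_lb, pack_lb)

# Change units (pounds to kilograms) and add a constant of 10:
# the z-scores are unchanged, so r is unchanged.
body_kg = [w * 0.45359237 for w in body_lb]
pack_shifted = [w + 10 for w in pack_lb]
r_transformed = correlation(body_kg, pack_shifted)

print(round(r, 2), round(r_transformed, 2))
```

Both values round to 0.79, matching the correlation reported on the calculator slides.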

Least Squares Regressions

Regression Lines

A regression line summarizes the relationship between two variables, but only in settings where one of the variables helps explain or predict the other.

A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Least‐Squares Regression Line

Different regression lines produce different residuals. The regression line we use in AP Stats is the least‐squares regression line.

The least‐squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.

Regression Line Equation

Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form:

ŷ = ax + b

In this equation,

•ŷ (read “y hat”) is the predicted value of the response variable y for a given value of the explanatory variable x.

•a is the slope, the amount by which y is predicted to change when x increases by one unit.

•b is the y intercept, the predicted value of y when x = 0.

Format of Regression Lines

Format 1:

ŷ = 0.0908x + 16.3

ŷ = predicted backpack weight

x = student’s weight

Format 2:

Predicted backpack weight = 16.3 + 0.0908(student’s weight)
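The slope and intercept above can be reproduced from the body weight and backpack weight data without a calculator. The sketch below uses the standard closed‐form least‐squares solution (slope = sum of products of deviations divided by sum of squared x deviations); the slides themselves only show the calculator route, and the variable names are ours.

```python
from statistics import mean

body_lb = [120, 187, 109, 103, 131, 165, 158, 116]   # x: student's weight
pack_lb = [26, 30, 26, 24, 29, 35, 31, 28]           # y: backpack weight

x_bar, y_bar = mean(body_lb), mean(pack_lb)

# Sums of products and squares of deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(body_lb, pack_lb))
s_xx = sum((x - x_bar) ** 2 for x in body_lb)

slope = s_xy / s_xx                  # predicted change in y per 1 lb of x
intercept = y_bar - slope * x_bar    # the line passes through (x_bar, y_bar)

print(f"predicted backpack weight = {intercept:.1f} + {slope:.4f}(student's weight)")
```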

TI‐Nspire: LSRL

1. Enter x data into list 1 and y data into list 2.

2. Press MENU, 4: Statistics, 1: Stat Calculations

3. Select Option 4: Linear Regression.

4. Insert either name of list or a[] for x and name of list or b[] for y. Press ENTER.

TI‐84: LSRL

1. Enter x values in list 1 and y values in list 2.

2. Press STAT, arrow right to CALC

3. Option 4: LinReg(ax + b)

4. Enter Information: Xlist: L1, Ylist: L2, StoreRegEQ:Y1 (VARS, arrow right to Y‐VARS, enter, enter, 1. Y1)

5. Calculate

6. To view: “Zoom”, 10

TI‐NSPIRE: LSRL to View Graph

1. Enter x data into list 1 and y data into list 2. Be sure to name the lists.

2. Press HOME/ON, Add Data & Statistics

3. Add the variables to the x and y axes.

4. Click MENU, 4: Analyze

5. Option 6: Regression

6. Option 2: Show Linear (a + bx), ENTER

Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line. That is,

residual = observed y – predicted y

residual = y ‐ ŷ

[Figure: scatterplot with the regression line. Positive residuals lie above the line; negative residuals lie below the line.]

How to Calculate the Residual

1. Calculate the predicted value, by plugging in x to the LSRE.

2. Determine the observed/actual value.

3. Subtract.

Calculate the Residual

1. If a student weighs 170 pounds and their backpack weighs 35 pounds, what is the value of the residual?

Predicted: ŷ = 16.3 + 0.0908 (170) = 31.736

Observed: 35

Residual: 35 ‐ 31.736 = 3.264 pounds

The student’s backpack weighs 3.264 pounds more than predicted.

Calculate the Residual

2. If a student weighs 105 pounds and their backpack weighs 24 pounds, what is the value of the residual?

Predicted: ŷ = 16.3 + 0.0908 (105) = 25.834

Observed: 24

Residual: 24 ‐ 25.834 = ‐1.834 pounds

The student’s backpack weighs 1.834 pounds less than predicted.
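The three steps for calculating a residual (predict, observe, subtract) can be sketched as a pair of small Python functions. The function names are ours; the coefficients 16.3 and 0.0908 come from the regression line on the slides.

```python
def predicted_pack_weight(student_weight):
    """Predicted backpack weight (lb) from the regression line on the slides."""
    return 16.3 + 0.0908 * student_weight

def residual(student_weight, observed_pack_weight):
    """residual = observed y - predicted y"""
    return observed_pack_weight - predicted_pack_weight(student_weight)

res_1 = residual(170, 35)   # student weighs 170 lb, backpack weighs 35 lb
res_2 = residual(105, 24)   # student weighs 105 lb, backpack weighs 24 lb

print(round(res_1, 3), round(res_2, 3))
```

A positive residual means the backpack is heavier than predicted; a negative residual means it is lighter than predicted.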

FRQ: 2007B #4

2007B #4a & b

2007B #4c

The slope would stay the same, since the new point fits the existing pattern.

The correlation coefficient would increase because the additional point fits the existing pattern and thus makes the linear relationship even stronger. A stronger relationship results in a greater correlation coefficient.

MC #1

If women always married men who were 2 years older than themselves, what would the correlation between the ages of husband and wife be?

(a) 2 (b) 1 (c) 0.5

(d) 0 (e) Can’t tell without seeing the data

MC #2

Smokers don’t live as long (on average) as nonsmokers, and heavy smokers don’t live as long as light smokers. You perform least‐squares regression on the age at death of a group of male smokers y and the number of packs per day they smoked x. The slope of your regression line

(a) will be greater than 0. (b) will be less than 0.

(c) will be equal to 0.

(d) You can’t perform regression on these data.

(e) You can’t tell without seeing the data.

MC #3

Measurements on young children in Mumbai, India, found this least‐squares line for predicting height (y) from arm span (x): ŷ = 6.4 + 0.93x. Measurements are in centimeters (cm).

How much does height increase on average for each additional centimeter of arm span?

(a) 0.93 cm (b) 1.08 cm (c) 5.81 cm

(d) 6.4 cm (e) 7.33 cm

MC #4


According to the regression line, the predicted height of a child with an arm span of 100 cm is about

(a) 106.4 cm. (b) 99.4 cm. (c) 93 cm.

(d) 15.7 cm. (e) 7.33 cm.

MC #5


One child in the Mumbai study had height 59 cm and arm span 60 cm. This child’s residual is

(a) −3.2 cm. (b) −2.2 cm. (c) −1.3 cm.

(d) 3.2 cm. (e) 62.2 cm.

FRQ: 2013 #6