S1: Chapter 7 Regression

Dr J Frost ([email protected]) www.drfrostmaths.com

Last modified: 22nd January 2016



Page 2:

What is regression?

[Scatter plot: exam mark ($y$) against time spent revising ($x$), with the line of best fit $y = 20 + 3x$]

I record people’s exam marks as well as the time they spent revising. I want to predict how well someone will do based on the time they spent revising. How would I do this?

What we’ve done here is come up with a model to explain the data, i.e. a line $y = a + bx$. We’ve then tried to set $a$ and $b$ such that the resulting $y$ value matches the actual exam marks as closely as possible.

The ‘regression’ bit is the act of setting the parameters of our model (here the gradient and y-intercept of the line of best fit) to best explain the data.
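As a minimal sketch of what "using the model to predict" means, the line $y = 20 + 3x$ from the slide can be written as a function. The name `predicted_mark` is made up for illustration, and the slide does not say what units the revision time is in.

```python
def predicted_mark(time_revising: float) -> float:
    """Predict an exam mark from the slide's line y = 20 + 3x."""
    a = 20  # y-intercept: the predicted mark with no revision at all
    b = 3   # gradient: extra marks predicted per extra unit of revision time
    return a + b * time_revising


# Predictions for a few revision times (units as on the slide's x-axis).
for time_revising in (0, 5, 10):
    print(time_revising, predicted_mark(time_revising))
```

Regression is the step before this one: choosing the 20 and the 3 so that the predictions sit as close as possible to the recorded marks.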

Page 3:

What is regression?

[Graph: rabbit population against time]

In this chapter we only cover linear regression, where our chosen model is a straight line.

But in general we could use any model that might best explain the data. Population tends to grow exponentially rather than linearly, so we might make our model $y = a \times b^x$ and then try to use regression to work out the best $a$ and $b$ to use.
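This chapter stays with straight lines, but the following sketch shows one common way an exponential model could be fitted, assuming it has the form $y = a \times b^x$. Taking logs gives $\ln y = \ln a + x \ln b$, which is a straight line in $x$, so the usual least squares formulas for a line can be applied to the points $(x, \ln y)$. The function name and the population figures are invented for illustration. Note that this minimises squared errors in $\ln y$ rather than in $y$ itself.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * b**x by fitting a least squares line to (x, ln y)."""
    n = len(xs)
    log_ys = [math.log(y) for y in ys]
    x_bar = sum(xs) / n
    log_y_bar = sum(log_ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xl = sum((x - x_bar) * (l - log_y_bar) for x, l in zip(xs, log_ys))
    slope = s_xl / s_xx                    # estimate of ln b
    intercept = log_y_bar - slope * x_bar  # estimate of ln a
    return math.exp(intercept), math.exp(slope)


# Invented data that follows y = 5 * 2**x exactly, so the fit should
# recover a close to 5 and b close to 2.
times = [0, 1, 2, 3, 4, 5]
populations = [5 * 2 ** t for t in times]
a, b = fit_exponential(times, populations)
print(round(a, 6), round(b, 6))
```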

Page 4:

Explanatory and Response Variables

[Scatter plot: exam mark against time spent revising]

! An independent (or explanatory) variable is one that is set independently of other variables. It goes on the x-axis.

! A dependent (or response) variable is one whose values are determined by the values of the independent variable. It goes on the y-axis.

Page 5:

So how do we numerically find the line of best fit?

[Diagram: scatter plot of $y$ against $x$ with a line of best fit; the residuals $e_1, e_2, \ldots, e_7$ are marked between each data point and the line]

The residuals are the errors between the value predicted by the model and the y value of each data point.

We minimise the total of the squares of the residuals.

This is known as a least squares regression line.

Why squared?
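The sketch below computes residuals and their squared total for two candidate lines on a small invented data set. It also illustrates one standard reason for squaring (offered here as a general observation, not as the slide's own hidden answer): positive and negative residuals cancel when simply added, so a poor line can still have residuals that sum to zero.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: actual y minus the y predicted by y = a + bx."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_of_squares(values):
    """Total of the squared values."""
    return sum(v * v for v in values)


# Invented data for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

sloped = residuals(xs, ys, a=0.0, b=1.9)   # the line y = 1.9x
flat = residuals(xs, ys, a=4.75, b=0.0)    # the flat line y = 4.75 (mean of ys)

# Both sets of residuals add up to (almost exactly) zero...
print(round(sum(sloped), 6), round(sum(flat), 6))
# ...but the squared totals are very different.
print(round(sum_of_squares(sloped), 6), round(sum_of_squares(flat), 6))
```

The squared totals separate the two lines clearly, while the plain totals do not.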

Page 6:

So how do we numerically find the line of best fit?

[Diagram: the same scatter plot, with residuals $e_1, e_2, \ldots, e_7$]

It turns out (using differentiation techniques you’ll see in C2) that the $a$ and $b$ we use to minimise the total (squared) error are:

$$y = a + bx, \qquad b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}$$

To remember the gradient, I think of the chromosomes of men and women. Men come out top!

The mean of $x$ and $y$ is on the line, i.e. $\bar{y} = a + b\bar{x}$. Hence this gives us $a = \bar{y} - b\bar{x}$.

Notice that in regression, we write the terms in ascending powers of $x$, contrary to algebraic convention. Hence $a$ is the $y$-intercept, not the gradient.
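A minimal sketch of these formulas, assuming the usual definitions $S_{xx} = \sum (x - \bar{x})^2$ and $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ (these definitions are not stated on this slide). The data are invented. The sketch also checks the two claims above: the mean point lies on the line, and nudging $a$ or $b$ away from the computed values makes the squared error larger.

```python
def regression_line(xs, ys):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx        # gradient
    a = y_bar - b * x_bar  # y-intercept
    return a, b


def squared_error(xs, ys, a, b):
    """Total of the squared residuals for the line y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Invented data for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
a, b = regression_line(xs, ys)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
on_line = abs((a + b * x_bar) - y_bar) < 1e-9  # is the mean point on the line?

best = squared_error(xs, ys, a, b)
nudged = [
    squared_error(xs, ys, a + 0.1, b),
    squared_error(xs, ys, a - 0.1, b),
    squared_error(xs, ys, a, b + 0.1),
    squared_error(xs, ys, a, b - 0.1),
]
print(round(a, 6), round(b, 6), on_line, all(e > best for e in nudged))
```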

Page 7:

Example

$$b = \frac{S_{xy}}{S_{xx}}, \qquad a = \bar{y} - b\bar{x}$$

Mass, $x$ (kg)      20    40    60    80    100

Length, $y$ (cm)    48    55.1  56.3  61.2  68

a) Calculate $S_{xx}$ and $S_{xy}$. (You may use that $\sum x = 300$, $\sum x^2 = 22000$, $\bar{x} = 60$, $\sum xy = 18238$, $\sum y^2 = 16879.14$, $\sum y = 288.6$, $\bar{y} = 57.72$.)

b) Calculate the regression line of $y$ on $x$.

Broculator Tip: Your calculator will calculate $a$ and $b$ while in STATS mode (under the Reg menu).
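The example can be checked numerically. This sketch works from the mass and length table above, using the summation forms $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ and $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ (assumed here as the standard forms; the slide itself does not restate them). Variable names are illustrative only.

```python
# Data from the example table: mass x (kg) and length y (cm).
masses = [20, 40, 60, 80, 100]
lengths = [48, 55.1, 56.3, 61.2, 68]

n = len(masses)
sum_x = sum(masses)
sum_y = sum(lengths)
sum_x_squared = sum(x * x for x in masses)
sum_xy = sum(x * y for x, y in zip(masses, lengths))

# Part (a): the summary statistics.
s_xx = sum_x_squared - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Part (b): gradient and intercept of the regression line of y on x.
b = s_xy / s_xx
a = sum_y / n - b * (sum_x / n)

print(f"S_xx = {s_xx:.0f}, S_xy = {s_xy:.0f}")
print(f"y = {a:.2f} + {b:.4f}x")
```

Worked by hand, the same arithmetic gives $S_{xx} = 22000 - \frac{300^2}{5} = 4000$ and $S_{xy} = 18238 - \frac{300 \times 288.6}{5} = 922$, so $b = 0.2305$ and $a = 57.72 - 0.2305 \times 60 = 43.89$.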

Page 8:

Test Your Understanding

May 2009 Q5

! For ‘comment on reliability of estimate’ questions, the answer is always one of:

• Reliable (1) because inside the range of the data/interpolating (1).

• Unreliable (1) because outside the range of the data/extrapolating (1).

• Reliable (1) because just outside the range of the data (1).
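The first two cases come down to a range check, sketched below. The function name is hypothetical, and the sketch deliberately does not attempt the third case: whether a value counts as "just outside" the range is a judgment call that depends on the question.

```python
def comment_on_reliability(x_new, xs):
    """Say whether estimating at x_new interpolates or extrapolates."""
    lowest, highest = min(xs), max(xs)
    if lowest <= x_new <= highest:
        return "reliable: inside the range of the data (interpolating)"
    return "unreliable: outside the range of the data (extrapolating)"


# Using explanatory values running from 20 to 100 as an illustration.
masses = [20, 40, 60, 80, 100]
print(comment_on_reliability(50, masses))
print(comment_on_reliability(300, masses))
```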

Note that after finding $a$ and $b$, you still need to write the equation at the end for the final mark!

A common error is to do . The first row (the explanatory variable) is always the ‘$x$’ one.

Page 9:

Exercises

On provided sheet. Answers on next slides.

(Note that Q7 and Q8 use ‘coding’. We will cover this next lesson.)

Help with wordy questions:

“Explain why this diagram would support the fitting of a regression line of $y$ onto $x$.”
The variables have a linear relationship, i.e. the points are close to the implied straight line of best fit.

“Interpret the gradient/slope of the line/interpret $b$.”
As (x) increases by 1, (y) increases/decreases by ___.

“Interpret the y-intercept/interpret $a$.”
The value (y) takes when (x) is 0.

“Which is the explanatory variable? Explain your answer.”
(x) is the explanatory variable because (x) influences (y).

“Explain the method of least squares.”
“We minimise the sum of the squares of the residuals.” (Draw a diagram.)

Page 10:

Exercises

Page 11:

Exercises

Page 12:

Exercises

Page 13:

Exercises

Page 14:

Exercises

Page 15:

Exercises

Page 16:

Coding

We’ve previously considered how coding affects means, variances and the PMCC. So how does it affect the regression line?

Eight samples of carbon steel were produced with different percentages, of carbon in them. Each sample was heated in a furnace until it melted and the temperature, in °C, at which it melted was recorded.

The results were coded such that and .

Suppose that we found the regression line of on was . Then what is the regression line in terms of the original variables and ?

Just replace the variables using the substitution and rearrange. That’s it!
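The substitute-and-rearrange step can be sketched in general form. Everything in this example is hypothetical: the coding $p = \frac{x - 10}{2}$, $q = \frac{y - 100}{5}$ and the coded line $q = 3 + 4p$ are invented, and are not the values from the carbon steel question. Substituting gives $\frac{y - 100}{5} = 3 + 4\left(\frac{x - 10}{2}\right)$, which rearranges to $y = 15 + 10x$.

```python
def decode_line(a_coded, b_coded, x_shift, x_scale, y_shift, y_scale):
    """Convert a coded regression line back to the original variables.

    Assumes the coding p = (x - x_shift) / x_scale, q = (y - y_shift) / y_scale
    and a coded line q = a_coded + b_coded * p.
    Returns (a, b) such that y = a + b * x.
    """
    b = y_scale * b_coded / x_scale
    a = y_shift + y_scale * a_coded - b * x_shift
    return a, b


# Hypothetical coding and coded line, for illustration only.
a, b = decode_line(a_coded=3, b_coded=4, x_shift=10, x_scale=2,
                   y_shift=100, y_scale=5)
print(a, b)

# Cross-check at one point: x = 12 codes to p = 1, so q = 3 + 4 * 1 = 7,
# which decodes to y = 100 + 5 * 7 = 135. The decoded line should agree.
print(a + b * 12)
```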

Page 17:

More Examples

The length and height of an Ewok were coded using and . If the equation of the regression line of on is:

what is the equation of the regression line of $y$ on $x$?

$$2y + 11 = 20(x - 30) - 3$$

The maths mark and English mark of some stormtroopers are coded using and . If the equation of the regression line of on is:

what is the equation of the regression line of $y$ on $x$?

$$y - 10 = 5\left(\frac{x}{2}\right) + 4$$

Page 18:

Exercises (continued)

Page 19:

Exercises

Page 20:

Just For Fun…