Video Conference 1 AS 2013/2012 Chapters 10 – Correlation and Regression 15 December 2013 10 am...

Video Conference 1, AS 2013/2012
Chapter 10 – Correlation and Regression
15 December 2013, 10 am – 11 am
Puan Hasmawati Binti Hassan
[email protected]
04-6532285


Chapter 10 Overview

Introduction
10-1 Scatter Plots and Correlation
10-2 Regression
10-3 Coefficient of Determination and Standard Error of the Estimate
10-4 Multiple Regression (Optional)

Objectives

JIM 212

After going through this lesson, you should be able to:

• Draw a scatter plot for a set of ordered pairs
• Compute the correlation coefficient, r
• Test the hypothesis H0: ρ = 0 (test the significance of the correlation coefficient)

Objectives

1. Draw a scatter plot for a set of ordered pairs.
2. Compute the correlation coefficient.
3. Test the hypothesis H0: ρ = 0.
4. Compute the equation of the regression line.
5. Compute the standard error of the estimate.
6. Find a prediction interval.
7. Be familiar with the concept of multiple regression (determining whether a relationship between two or more numerical or quantitative variables exists).

Terminology

1. Correlation
2. Independent variable
3. Dependent variable
4. Relationship
5. Simple relationship
6. Multiple relationship
7. Positive relationship
8. Negative relationship
9. Linear relationship
10. Correlation coefficient
11. Prediction

Introduction

In addition to hypothesis testing and confidence intervals, inferential statistics involves determining whether a relationship between two or more numerical or quantitative variables exists.

Introduction (cont…)

• Correlation is a statistical method used to determine whether a linear relationship between variables exists.

Introduction (cont…)

• The purpose of this chapter is to answer these questions statistically:

1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?

Introduction (cont…)

1. Are two or more variables related?
2. If so, what is the strength of the relationship?

To answer these two questions, statisticians use the correlation coefficient, a numerical measure to determine whether two or more variables are related and to determine the strength of the relationship between or among the variables.

Introduction (cont…)

3. What type of relationship exists?

There are two types of relationships: simple and multiple.

In a simple relationship, there are two variables: an independent variable (predictor variable) and a dependent variable (response variable).

In a multiple relationship, there are two or more independent variables that are used to predict one dependent variable.

Introduction (cont…)

4. What kind of predictions can be made from the relationship?

Predictions are made daily and in all areas. Examples include weather forecasting, stock market analyses, sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate than others because the relationships behind them differ in strength. That is, the stronger the relationship between the variables, the more accurate the prediction.

Correlation & Regression

• Both are STATISTICAL METHODS
• Correlation - to determine whether a relationship between variables exists
• Regression - to describe the nature of the relationship between variables (+ or -, linear or nonlinear)

The purpose of this chapter is to answer these questions statistically:

1. Are two or more variables related?
2. If so, what is the strength of the relationship?
   Answer to 1 and 2: correlation coefficient
3. What type of relationship exists?
   Answer: simple & multiple
4. What kind of predictions can be made from the relationship?
   Answer: all areas and daily

Scatter Plots

• Graph of ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
• Independent variable?
• Dependent variable?

Q1(i) Forest Fires and Acres Burned
a) Page 549 Ex. 10 – 1 No. 14

[Scatter plot: number of fires vs. number of acres burned]

Correlation

Correlation is a statistical method used to determine whether a linear relationship between variables exists.

Correlation (cont…)

• The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.

• There are several types of correlation coefficients. The one explained in this section is called the Pearson product moment correlation coefficient (PPMC).

• The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ.

Correlation (cont…)

• The range of the correlation coefficient is from −1 to +1.

• If there is a strong positive linear relationship between the variables, the value of r will be close to +1.

• If there is a strong negative linear relationship between the variables, the value of r will be close to −1.


Correlation Coefficient

• Numerical measure to determine whether two or more variables are linearly related, and
• to determine the strength of the relationship between or among the variables.

Correlation Coefficient (cont…)

Measures the strength (strong, weak) and direction (+, -) of a linear relationship between two variables.

r : sample correlation coefficient
ρ : population correlation coefficient
Range: −1 ≤ r ≤ 1

**Look at page 540 Figure 10-6

Formula for Correlation Coefficient

One of the formulas for r:

$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$

where n is the number of data pairs.

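1(i) b) Page 549 Ex. 10 – 1 No. 14

$$\sum x = 494, \quad \sum y = 260, \quad \sum x^2 = 31{,}692, \quad \sum y^2 = 10{,}596, \quad \sum xy = 17{,}285, \quad n = 8$$

$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$

$$= \frac{8(17{,}285) - (494)(260)}{\sqrt{\left[8(31{,}692) - 494^2\right]\left[8(10{,}596) - 260^2\right]}}$$

$$= 0.771$$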
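The computation of r for the forest fires exercise can be checked with a short script. This is a minimal sketch that works from the summary sums quoted on the slide; the function name `pearson_r` and its parameter names are illustrative choices, not part of the course material.

```python
import math


def pearson_r(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from summary sums (computational formula)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Sums from the slide for Ex. 10-1 No. 14 (forest fires vs. acres burned).
r = pearson_r(n=8, sum_x=494, sum_y=260, sum_xy=17_285,
              sum_x2=31_692, sum_y2=10_596)
print(f"r = {r:.3f}")
```

Working from the sums mirrors the hand calculation, so each intermediate value (numerator 9,840; bracketed terms 9,500 and 17,168) can be compared with the written solution.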

The Significance of the Correlation Coefficient

Use a hypothesis-testing procedure in order to make the decision.

3 ways:
1. Traditional method
2. P-value method
3. Using Table I in Appendix C

Hypothesis Testing

• In hypothesis testing, one of the following is true:

H0: ρ = 0  This null hypothesis means that there is no correlation between the x and y variables in the population.

H1: ρ ≠ 0  This alternative hypothesis means that there is a significant correlation between the variables in the population.

1(i) (c, d, e) Page 549 Ex. 10 – 1 No. 14 cont...

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

$$t = r\sqrt{\frac{n-2}{1-r^2}} = 0.771\sqrt{\frac{8-2}{1-0.771^2}} = 2.966$$

$$\text{c.v.} = \pm 2.447$$

Decision: Reject the null hypothesis, since the test value falls in the critical region. There is a significant linear relationship between the number of forest fires and the number of acres burned.
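The traditional method for the forest fires exercise can be sketched in a few lines. The critical value 2.447 is taken from the slide rather than computed; the names `t_statistic` and `reject_h0` are illustrative, and the two-tailed comparison is an assumption that follows from H1: ρ ≠ 0.

```python
import math


def t_statistic(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r, n = 0.771, 8
critical_value = 2.447  # value quoted on the slide

t = t_statistic(r, n)
# Two-tailed test: reject H0 when the test value falls beyond either critical value.
reject_h0 = abs(t) > critical_value
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```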

Now try using the other two procedures.

10.2 Regression

If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data’s line of best fit.

Regression

Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.

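Regression Line

$$y' = a + bx$$

$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$

$$b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$

where a = y' intercept and b = the slope of the line.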

Q1(ii) Forest Fires and Acres Burned
Page 559 Ex. 10 – 2 No. 14

$$\sum x = 494, \quad \sum y = 260, \quad \sum xy = 17{,}285, \quad \sum x^2 = 31{,}692, \quad \sum y^2 = 10{,}596, \quad n = 8$$

$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$

$$= \frac{(260)(31{,}692) - (494)(17{,}285)}{8(31{,}692) - 494^2}$$

$$= \frac{-298{,}870}{9500}$$

$$= -31.46$$

(Q.1(ii)) Page 559 Ex. 10 – 2 No. 14 cont...

$$b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$

$$= \frac{8(17{,}285) - (494)(260)}{8(31{,}692) - 494^2}$$

$$= \frac{9840}{9500}$$

$$= 1.036$$

$$y' = -31.46 + 1.036x$$

(Q.1(ii)) Page 559 Ex. 10 – 2 No. 14 cont...

$$y' = -31.46 + 1.036x$$

[Scatter plot with regression line: number of fires vs. number of acres burned]

(Q.1(ii)) Page 559 Ex. 10 – 2 No. 14 cont...

$$y' = -31.46 + 1.036x$$

Find y' when x = 60:

$$y' = -31.46 + 1.036(60) = 30.7 \text{ acres}$$
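The intercept, slope, and prediction for the forest fires exercise can be reproduced from the same summary sums. This is a minimal sketch; `regression_coefficients` is an illustrative name, and because the code keeps b unrounded the prediction differs slightly from the slide before rounding.

```python
def regression_coefficients(n, sum_x, sum_y, sum_xy, sum_x2):
    """Intercept a and slope b of the least-squares line y' = a + b*x."""
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


# Sums from the slide for Ex. 10-2 No. 14.
a, b = regression_coefficients(n=8, sum_x=494, sum_y=260,
                               sum_xy=17_285, sum_x2=31_692)

x = 60
y_pred = a + b * x  # predicted value y' at x = 60
print(f"a = {a:.2f}, b = {b:.3f}, y' at x = {x}: {y_pred:.1f}")
```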

Q1(iii) Page 574 Ex. 10 – 3 No. 16 (Forest Fires and Acres Burned)

Regression line: $y' = -31.46 + 1.036x$

$$\sum y^2 = 10{,}596, \quad \sum y = 260, \quad \sum xy = 17{,}285, \quad n = 8$$

$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}}$$

$$= \sqrt{\frac{10{,}596 - (-31.46)(260) - (1.036)(17{,}285)}{8-2}}$$

$$= 12.03$$
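The standard error of the estimate for the forest fires exercise can be checked as follows. This sketch deliberately uses the rounded coefficients a = -31.46 and b = 1.036, as the hand calculation does; with unrounded coefficients the result shifts slightly in the second decimal place. The function name is illustrative.

```python
import math


def standard_error_of_estimate(n, sum_y, sum_y2, sum_xy, a, b):
    """s_est from summary sums, using the computational formula."""
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


# Rounded regression coefficients, as used in the hand calculation.
s_est = standard_error_of_estimate(n=8, sum_y=260, sum_y2=10_596,
                                   sum_xy=17_285, a=-31.46, b=1.036)
print(f"s_est = {s_est:.2f}")
```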

Q1(iv) Page 574 Ex. 10 – 3 No. 20 (Forest Fires and Acres Burned)

$$y' \pm t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

$$\sum x = 494, \quad \sum x^2 = 31{,}692$$

When x = 60, y' = 30.7

$$\bar{X} = \frac{494}{8} = 61.75, \quad s_{est} = 12.03, \quad t_{\alpha/2} = 2.447$$

(Q1(iv)) Page 574 Ex. 10 – 3 No. 20 cont... (Forest Fires and Acres Burned)

$$y' \pm t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

$$= 30.7 \pm 2.447(12.03)\sqrt{1 + \frac{1}{8} + \frac{8(60 - 61.75)^2}{8(31{,}692) - 494^2}}$$

$$= 30.7 \pm 31.259$$

$$-0.559 < y < 61.959$$
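The prediction interval for the forest fires exercise can be reproduced numerically. This is a sketch under the slide's rounded inputs (y' = 30.7, s_est = 12.03, t = 2.447); the function name `prediction_interval` and its parameters are illustrative.

```python
import math


def prediction_interval(y_pred, t_crit, s_est, n, x, sum_x, sum_x2):
    """Lower and upper limits of the prediction interval for y at a given x."""
    x_bar = sum_x / n
    spread = math.sqrt(
        1 + 1 / n + n * (x - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    margin = t_crit * s_est * spread
    return y_pred - margin, y_pred + margin


# Rounded values taken from the slides for Ex. 10-3 No. 20.
lower, upper = prediction_interval(y_pred=30.7, t_crit=2.447, s_est=12.03,
                                   n=8, x=60, sum_x=494, sum_x2=31_692)
print(f"{lower:.3f} < y < {upper:.3f}")
```

The lower limit comes out negative, which the formula allows even though a negative number of acres has no practical meaning.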

Q2(i) State Debt and Per Capita Tax
a) Page 549 Ex. 10 – 1 No. 16

[Scatter plot: x and y axes each running from 500 to 1900]

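2(i) b) Page 549 Ex. 10 – 1 No. 16

$$\sum x = 6545, \quad \sum y = 8416, \quad \sum xy = 11{,}247{,}109, \quad \sum x^2 = 9{,}635{,}035, \quad \sum y^2 = 14{,}351{,}678$$

$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$$

$$= \frac{5(11{,}247{,}109) - (6545)(8416)}{\sqrt{\left[5(9{,}635{,}035) - 6545^2\right]\left[5(14{,}351{,}678) - 8416^2\right]}}$$

$$= 0.518$$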

2(i) (c, d, e) Page 549 Ex. 10 – 1 No. 16 cont...

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$

$$\text{d.f.} = 5 - 2 = 3, \quad \alpha = 0.05, \quad \text{c.v.} = \pm 0.878$$

$$r = 0.518$$

$$-0.878 < 0.518 < 0.878$$

Decision: Do not reject. There is no significant linear relationship between per capita debt and tax.
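The table-based decision for the state debt exercise can be sketched the same way. The critical value 0.878 (d.f. = 3, α = 0.05) is taken from the slide rather than looked up in code; `pearson_r` and `significant_by_table` are illustrative names.

```python
import math


def pearson_r(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def significant_by_table(r, critical_r):
    """Reject H0: rho = 0 when |r| exceeds the tabulated critical value."""
    return abs(r) > critical_r


# Sums from the slide for Ex. 10-1 No. 16 (state debt vs. per capita tax).
r = pearson_r(n=5, sum_x=6545, sum_y=8416, sum_xy=11_247_109,
              sum_x2=9_635_035, sum_y2=14_351_678)
critical_r = 0.878  # value quoted on the slide for d.f. = 3, alpha = 0.05
reject_h0 = significant_by_table(r, critical_r)
print(f"r = {r:.3f}, reject H0: {reject_h0}")
```

With only five data pairs the critical value is large, so even a moderate r such as this one does not reach significance.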

Q2(ii) State Debt and Per Capita Tax
Page 549 Ex. 10 – 2 No. 16

$$r = 0.518$$

From the hypothesis testing done, the null hypothesis is not rejected (r is not significant).

Therefore, there is no significant linear relationship between state debt and per capita tax.

Therefore, no regression should be done.

Q2(ii) State Debt and Per Capita Tax
Page 549 Ex. 10 – 2 No. 16 (cont...)

No regression line, no prediction???
When r is not significant, ......?........ is the best predictor of y.

Standard Error of the Estimate

The standard error of estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y' values. The formula for the standard error of estimate is:

$$s_{est} = \sqrt{\frac{\sum \left(y - y'\right)^2}{n-2}}$$

$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n-2}}$$

Q2(iii) Page 574 Ex. 10 – 3 No. 18 (State Debt and Per Capita Tax)

Since r is not significant, the standard error should not be calculated.

Prediction Interval

$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n\left(x - \bar{X}\right)^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

Q2(iv) Page 574 Ex. 10 – 3 No. 22 (State Debt and Per Capita Tax)

Since r is not significant, the prediction interval should not be calculated.

Multiple Regression

In multiple regression, there are several independent variables and one dependent variable, and the equation is

$$y' = a + b_1x_1 + b_2x_2 + \cdots + b_kx_k$$

where $x_1, x_2, \ldots, x_k$ = independent variables.

Assumptions for Multiple Regression

1. normality assumption - for any specific value of the independent variable, the values of the y variable are normally distributed.
2. equal-variance assumption - the variances (or standard deviations) for the y variables are the same for each value of the independent variable.
3. linearity assumption - there is a linear relationship between the dependent variable and the independent variables.
4. nonmulticollinearity assumption - the independent variables are not correlated.
5. independence assumption - the values for the y variables are independent.

Q3. Special Occasion Cakes
Page 581 Ex. 10 – 4 No. 8

$$y' = -26.279 + 14.855x_1 + 3.1035x_2 + 0.73079x_3$$

where
x_1 = number of layers desired
x_2 = number of servings needed
x_3 = amount of filling mix used
y' = price of a cake

$$y' = -26.279 + 14.855(3) + 3.1035(48) + 0.73079(40) = \$196.49$$
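The cake price prediction can be checked with a small helper. The signs of the coefficients are an assumption here: the extracted slide text lost them, and the combination used below (negative intercept, positive slopes) is the one that reproduces the stated answer of $196.49. The function name `predict_price` is illustrative.

```python
def predict_price(layers, servings, filling):
    """Predicted cake price from the multiple regression equation.

    Coefficient signs are assumed (negative intercept, positive slopes),
    chosen because they reproduce the answer given on the slide.
    """
    a = -26.279
    b1, b2, b3 = 14.855, 3.1035, 0.73079
    return a + b1 * layers + b2 * servings + b3 * filling


price = predict_price(layers=3, servings=48, filling=40)
print(f"Predicted price: ${price:.2f}")
```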

Thank You