004

Post on 11-May-2015


description

Lesson 4 Correlational Analysis

Transcript of 004

IBS Statistics Year 1

Dr. Ning DING

Table of contents

• Review

• Learning Goals

• Chapter 12: Simple Regression and Correlation

• Exercises

Chapter 3: Describing Data

Review

Find the interquartile range: 1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038, 2047, 2054, 2097, 2205, 2287, 2311, 2406

Interquartile Range=Q3-Q1

=2205-1721=484
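The quartile positions used in this review can be sketched in Python. This is a minimal illustration, assuming the location rule L = (n + 1) * p that appears in the Excel correction on the next slide, with linear interpolation between neighbouring values; the function name `quantile_by_position` is made up for this example.

```python
def quantile_by_position(sorted_values, p):
    """Value at location L = (n + 1) * p in already-sorted data.

    A fractional location is resolved by linear interpolation
    between the two neighbouring observations.
    """
    n = len(sorted_values)
    location = (n + 1) * p        # 1-based position in the ordered data
    lower = int(location)         # whole part of the position
    fraction = location - lower   # fractional part, used to interpolate
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)


data = [1460, 1471, 1637, 1721, 1758, 1787, 1940, 2038,
        2047, 2054, 2097, 2205, 2287, 2311, 2406]

q1 = quantile_by_position(data, 0.25)   # location 4  -> 1721
q3 = quantile_by_position(data, 0.75)   # location 12 -> 2205
iqr = q3 - q1
print(q1, q3, iqr)   # 1721.0 2205.0 484.0
```

Spreadsheet and statistics packages may use a different quartile convention, so their results can differ slightly from this rule.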

Correction of EXCEL Exercise 5

L = (8+1) * 25% = 2.25

Q1 = 133.5

L = (8+1) * 75% = 6.75

Q3 = 274.5

Interquartile Range = 274.5 - 133.5 = 141

Boxplot

Data: 1, 2, 2, 4, 5, 7, 8, 9, 12

Median = 5

Quartiles: Q1 = 2, Q3 = 8.5

Interquartile Range = Q3 - Q1

Deciles: 1st D, 9th D

Percentile

http://cnx.org/content/m11192/latest/

How to interpret?

Boxplot

The distribution is skewed to __________ because the mean is __________ the median.

Answer: the right; larger than

http://cnx.org/content/m11192/latest/

Boxplot values: lowest = € 20, Q1 = € 250, Median = € 350, Q3 = € 850, highest = € 2000

Mean = € 450

a: 0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0

Mean > Median: positively skewed

b: 2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0

Mean < Median: negatively skewed

http://qudata.com/online/statcalc/
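The mean-versus-median rule can be checked on the two series from this slide. The sketch below assumes the run-together digits split into series a (ending in 4.0) and series b (starting with 2.0); the helper `skew_direction` is a hypothetical name, and the rule itself is only a rule of thumb, not a formal skewness coefficient.

```python
from statistics import mean, median

# Series as read from the slide (the split into values is an assumption).
series_a = [0.8, 1.0, 1.0, 1.2, 1.2, 1.3, 1.5, 1.7, 2.0, 2.0, 2.1, 2.2, 4.0]
series_b = [2.0, 3.2, 3.6, 3.7, 4.0, 4.2, 4.2, 4.5, 4.5, 4.6, 4.8, 5.0, 5.0]


def skew_direction(values):
    """Classify skew by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "positively skewed"
    if m < med:
        return "negatively skewed"
    return "zero skewness"


for name, values in [("a", series_a), ("b", series_b)]:
    print(name, round(mean(values), 2), median(values), skew_direction(values))
```

In series a the single large value 4.0 pulls the mean above the median; in series b the single small value 2.0 pulls it below.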

Zero skewness: mode = median = mean

This means that the data is symmetrically distributed.

Learning Goals

• Chapter 12:

– Learn how many business decisions depend on knowing the specific relationship between two or more variables

– Use scatter diagrams to visualize the relationship between two variables

– Use regression analysis to estimate the relationship between two variables

– Use the least-squares estimating equation to predict future values of the dependent variable

– Learn how correlation analysis describes the degree to which two variables are linearly related to each other

– Understand the coefficient of determination as a measure of the strength of the relationship between two variables

– Learn limitations of regression and correlation analyses and caveats about their use.

Chapter 12: Sim Reg & Corr

1. Introduction

Regression and Correlation Analyses:

– How to determine both the nature and the strength of a relationship between variables.

Scatter Diagram:

Describing Relationship between Two Variables – Scatter Diagram Examples

• Positive correlation

• Negative correlation

• No correlation

2. Types of Relationships

Variables:

– Independent variables: known

– Dependent variables: to predict

Correlation & Cause and Effect?

• The relationships found by regression are relationships of association.

• They are not necessarily relationships of cause and effect.

2. Estimation Using the Regression Line

Chapter 12: Sim Reg & Corr

Scatter Diagrams:

• Patterns indicating that the variables are related

• If related, we can describe the relationship

Strong & Positive correlation

Strong & Negative correlation

Weak & Positive correlation

Weak & Negative correlation

No correlation


Simple Linear Regression:

• The dependent variable Y is determined by the independent variable X

Ŷ = a + bX

X: Independent Variable

Y: Dependent Variable

Slope of the Best-Fitting Regression Line:

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$

$$\bar{Y} = a + b\bar{X} \quad\Rightarrow\quad a = \bar{Y} - b\bar{X}$$

Example: What is the relationship between the age of a truck and the annual repair expense?

X̄ = 3, Ȳ = 6

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{78 - 4 \cdot 3 \cdot 6}{44 - 4 \cdot 9} = 0.75$$

a = Ȳ - bX̄ = 6 - 0.75 * 3 = 3.75

Ŷ = 3.75 + 0.75X

If the city has a truck that is 4 years old, the director could use the equation to predict $675 annually in repairs (Ŷ is in hundreds of dollars):

Ŷ = 3.75 + 0.75 * 4 = 6.75
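The truck example can be reproduced from its summary sums. The sketch below assumes the slide's substitution reads as n = 4, ΣXY = 78, ΣX² = 44, with means 3 and 6; the function name `fit_line_from_sums` is invented for this illustration.

```python
def fit_line_from_sums(n, sum_xy, sum_x2, mean_x, mean_y):
    """Return intercept a and slope b of Y-hat = a + bX from summary sums."""
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b


# Values as read from the slide's substitution (an assumption, see above).
a, b = fit_line_from_sums(n=4, sum_xy=78, sum_x2=44, mean_x=3, mean_y=6)

age = 4
predicted = a + b * age   # in hundreds of dollars
print(a, b, predicted)    # 3.75 0.75 6.75
```

Working from the sums rather than the raw observations mirrors the hand calculation on the slide.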

Example:

• To find the simple linear regression of Personal Income (X) and Auto Sales (Y)

Exercise

Step 1: Count the number of values.

N = 5

Step 2: Find XY and X² for each observation (see the table on the slide).

Question: if X = 64, what about Y?

Step 3: Find ΣX, ΣY, ΣXY, ΣX².

ΣX = 311, Mean = 62.2

ΣY = 18.6, Mean = 3.72

ΣXY = 1159.7

ΣX² = 19359

Step 4: Substitute in the slope formula.

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{1159.7 - 5 \cdot 62.2 \cdot 3.72}{19359 - 5 \cdot 62.2 \cdot 62.2} \approx 0.19$$

Step 5: Substitute the slope into the intercept formula.

Slope (b) = 0.19

Intercept (a) = Ȳ - bX̄ = 3.72 - 0.19 * 62.2 = -8.098

Step 6: Substitute these values into the regression equation formula.

Regression Equation: Ŷ = a + bX

Ŷ = -8.098 + 0.19X

Suppose we want to know the approximate Y value for X = 64. Then we can substitute the value into the above equation.

Regression Equation: Ŷ = a + bX

= -8.098 + 0.19(64)

= -8.098 + 12.16

= 4.06
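The exercise rounds the slope to 0.19 before computing the intercept. A short Python check, using only the sums given in Step 3, suggests that this rounding shifts the intercept noticeably while the prediction at X = 64 stays close to 4.06. The variable names are chosen for this sketch only.

```python
n = 5
sum_x, sum_y = 311, 18.6
sum_xy, sum_x2 = 1159.7, 19359

mean_x = sum_x / n   # 62.2
mean_y = sum_y / n   # 3.72

# Slope and intercept without intermediate rounding.
b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
a = mean_y - b * mean_x

# Slope rounded to two decimals first, as in the worked exercise.
b_rounded = round(b, 2)
a_from_rounded = mean_y - b_rounded * mean_x

x_new = 64
pred = a + b * x_new
pred_from_rounded = a_from_rounded + b_rounded * x_new

print(f"unrounded: b={b:.4f}, a={a:.3f}, prediction={pred:.2f}")
print(f"rounded:   b={b_rounded}, a={a_from_rounded:.3f}, "
      f"prediction={pred_from_rounded:.2f}")
```

The two intercepts differ (roughly -7.96 against -8.098) because a small change in b is multiplied by the mean of X, which is 62.2 here. Carrying more decimals in b until the last step avoids this.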

Least Squares Method: minimize the sum of the squares of the errors to measure the goodness of fit of a line.

e_i = residual i = Y_i - Ŷ_i
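To see what "minimize the sum of the squares of the errors" means in practice, the sketch below fits a line and then nudges its coefficients. The data are hypothetical, invented only for this illustration, and the function names are not from the lecture.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y-hat = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of squared residuals e_i = Y_i - (a + b * X_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]

a, b = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
print(f"least-squares line: a={a:.2f}, b={b:.2f}, SSE={best:.3f}")

# Nudging either coefficient away from the fitted value raises the SSE.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    other = sum_squared_errors(xs, ys, a + da, b + db)
    print(f"a={a + da:.2f}, b={b + db:.2f}, SSE={other:.3f}")
```

Every perturbed line has a larger sum of squared errors than the fitted one, which is the property that defines the least-squares line.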


3. Correlation Analysis

Correlation Analysis: describes the degree to which one variable is linearly related to another.

Coefficient of Determination (r²): measures the extent, or strength, of the association that exists between two variables.

Coefficient of Correlation (r): the square root of the coefficient of determination.

Coefficient of Determination:

• 0 ≤ r² ≤ 1.

• The larger r², the stronger the linear relationship.

• The closer r² is to 1, the more confident we are in our prediction.

$$r^2 = \frac{a\sum Y + b\sum XY - n\bar{Y}^2}{\sum Y^2 - n\bar{Y}^2}$$
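The shortcut formula for r² can be compared with the definition r² = 1 - SSE/SST. The sketch below uses hypothetical data (the lecture's examples do not list ΣY²), and `regression_summary` is a name made up for this example. It also gives r the sign of the slope b, which is the usual convention when taking the square root.

```python
import math


def regression_summary(xs, ys):
    """Return a, b, r-squared (shortcut formula) and r for Y-hat = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
    a = mean_y - b * mean_x
    r2 = (a * sum(ys) + b * sum_xy - n * mean_y ** 2) / (sum_y2 - n * mean_y ** 2)
    r = math.copysign(math.sqrt(r2), b)   # r carries the sign of the slope
    return a, b, r2, r


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]

a, b, r2, r = regression_summary(xs, ys)

# Cross-check against the definition: explained share of the variation in Y.
mean_y = sum(ys) / len(ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - mean_y) ** 2 for y in ys)
r2_check = 1 - sse / sst

print(f"r^2 (shortcut)   = {r2:.4f}")
print(f"r^2 (1 - SSE/SST) = {r2_check:.4f}")
print(f"r = {r:.4f}")
```

Both routes give the same r², so the shortcut is a convenience for hand calculation rather than a different measure.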


Chapter 3: Describing Data

Review

Which value of r indicates a stronger correlation than 0.40?

A. -0.30

B. -0.50

C. +0.38

D. 0

If all the points on a scatter diagram lie on a straight line, what is the standard error of estimate?

A. -1

B. +1

C. 0

D. Infinity

In the least squares equation Ŷ = 10 + 20X, the value of 20 indicates

A. the Y intercept.

B. for each unit increase in X, Y increases by 20.

C. for each unit increase in Y, X increases by 20.

D. none of these.

Chapter 3: Describing Data

Exercise

A sales manager for an advertising agency believes there is a relationship between the number of contacts and the amount of the sales. To verify this belief, the following data was collected: 

What is the Y-intercept of the linear equation?

A. -12.201

B. 2.1946

C. -2.1946

D. 12.201

Exercise

Sample Exam P.4

Ŷ = -1.8182 + 0.1329X

Chapter 1: What is Statistics?

Summary

• Chapter 3:

– Calculate the arithmetic mean, weighted mean, median, mode, and geometric mean

– Explain the characteristics, uses, advantages, and disadvantages of each measure of location

– Identify the position of the mean, median, and mode for both symmetric and skewed distributions

– Compute and interpret the range, mean deviation, variance, and standard deviation

– Understand the characteristics, uses, advantages, and disadvantages of each measure of dispersion

– Understand Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations