Course Title: Business Statistics
BBA (Hons)
2nd Semester
Course Instructor: Atiq ur Rehman Shah
Lecturer, Federal Urdu University of Arts, Science & Technology, Islamabad
+92-345-5271959
Correlation
• Correlation is a LINEAR association between two random variables
• Correlation is a statistical technique used to determine the degree to which two variables are related
Scatter diagram
• Rectangular coordinate
• Two quantitative variables
• One variable is called independent (X) and the second is called dependent (Y)
Scatter diagram of weight and systolic blood pressure
Scatter plots
The pattern of data is indicative of the type of relationship between your two variables:
• positive relationship
• negative relationship
• no relationship
Positive relationship
Negative relationship
[Figure: scatter plot of Reliability against Age of Car]
No relation
Correlation Coefficient
• The correlation coefficient (r) measures the strength and direction of relationship between two variables
How to interpret the value of r?
• r lies between -1 and 1. Values near 0 mean no (linear) correlation, and values near ±1 mean very strong correlation.
• The negative sign means that the two variables are inversely related, that is, as one variable increases the other variable decreases.
Pearson’s r
• r = 0.9 is a strong positive association (as one variable rises, so does the other)
• r = -0.9 is a strong negative association (as one variable rises, the other falls)
r=correlation coefficient
Coefficient of Determination Defined
• Pearson’s r can be squared, r², to derive a coefficient of determination.
• Coefficient of determination: the proportion of variability in one of the variables that can be accounted for by variability in the second variable
• Example of depression and CGPA
  - Pearson’s r shows negative correlation, r = -0.5
  - r² = 0.25
• In this example we can say that 1/4 or 0.25 of the variability in CGPA scores can be accounted for by depression (remaining 75% of variability is other factors, habits, ability, motivation, courses studied, etc)
Coefficient of Determination and Pearson’s r
• If r = 0.5, then r² = 0.25
• If r = 0.7, then r² = 0.49
• Thus, while r = 0.5 versus 0.7 might not look so different in terms of strength, r² tells us that r = 0.7 accounts for about twice the variability relative to r = 0.5
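The comparison above can be checked with a few lines of Python. This is only an illustrative sketch; the variable names (`r_values`, `ratio`) are chosen here and are not part of the course material.

```python
# Square each correlation coefficient to get the coefficient of determination.
r_values = (0.5, 0.7)
for r in r_values:
    r_squared = r ** 2
    print(f"r = {r}: r^2 = {r_squared:.2f} "
          f"({r_squared:.0%} of variability accounted for)")

# How much more variability does r = 0.7 account for than r = 0.5?
ratio = 0.7 ** 2 / 0.5 ** 2
print(f"ratio of r^2 values: {ratio:.2f}")
```

The ratio of 0.49 to 0.25 is 1.96, which is the "about twice" stated above.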
Example
• Calculate the coefficient of correlation between the value X and Y given below:
X 78 89 97 69 59 79 68 61
Y 125 137 156 112 107 136 123 108
            X      Y      X²      Y²       XY
            78     125    6084    15625    9750
            89     137    7921    18769    12193
            97     156    9409    24336    15132
            69     112    4761    12544    7728
            59     107    3481    11449    6313
            79     136    6241    18496    10744
            68     123    4624    15129    8364
            61     108    3721    11664    6588
Summation   600    1004   46242   128012   76812
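With n = 8 pairs of observations:
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{8(76812) - (600)(1004)}{\sqrt{\left[8(46242) - (600)^2\right]\left[8(128012) - (1004)^2\right]}} = \frac{12096}{\sqrt{(9936)(16080)}} \approx 0.957$$
Hence the correlation coefficient between X and Y is approximately 0.96.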
(What does this value tell us?)
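As a cross-check on the hand calculation, the same sums can be computed in Python. This is a minimal sketch using the computational formula for Pearson's r; the function name `pearson_r` is an illustrative choice, not something from the course.

```python
from math import sqrt

# Data from the example above.
x = [78, 89, 97, 69, 59, 79, 68, 61]
y = [125, 137, 156, 112, 107, 136, 123, 108]


def pearson_r(xs, ys):
    """Pearson's r from the sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 3))  # 0.957
```

A value this close to +1 indicates a very strong positive linear relationship: larger values of X tend to go with larger values of Y.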
Regression
• A statistical tool that is used to investigate the dependence of one variable (dependent variable) on one or more other variables (independent variables)
• The dependent variable (Y) is the variable for which we want to make a prediction.
• The independent variable (X) is the variable on the basis of which we are making predictions.
• The linear relationship between two variables can either be positive or negative.
• For instance, an increase in the advertisement budget will bring more sales (positive), and an increase in temperature will decrease the cooling efficiency of a room AC (negative)
Simple Linear Regression
• Positive Linear Relationship
[Figure: regression line plotted on x and y axes, with intercept (a); slope (b) is positive]
• Negative Linear Relationship
[Figure: regression line plotted on x and y axes, with intercept (a); slope (b) is negative]
• No Relationship
[Figure: regression line plotted on x and y axes, with intercept (a); slope (b) is 0]
Simple Linear Regression Equation
• Hence the equation for linear regression line can be written as:
y= a + bx
Where:
y= dependent variable
x= independent variable
a= y-intercept (i.e., the value of y when x=0)
b= slope
Least-squares estimates
• For a simple linear regression equation:
y= a + bx
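We have,
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} \quad \text{and} \quad a = \bar{Y} - b\bar{X}$$
Where, $\bar{X} = \frac{\sum X}{n}$ and $\bar{Y} = \frac{\sum Y}{n}$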
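The least-squares estimates can be sketched as a small Python function. It assumes the usual computational formulas, slope b = [nΣXY − ΣXΣY] / [nΣX² − (ΣX)²] and intercept a = Ȳ − bX̄. The function name `least_squares` and the toy data are illustrative choices, not part of the course material.

```python
def least_squares(xs, ys):
    """Return (a, b) for the fitted line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: how much y changes for a one-unit increase in x.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the line passes through the point of means.
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Toy data lying exactly on y = 1 + 2x, so the fit should recover a = 1, b = 2.
a, b = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Because the toy points lie exactly on a line, the estimates reproduce that line; with real data the line will not pass through every point.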
Example
• Compute the least-squares regression equation of Y on X for the following data. What is the regression coefficient, and what does it mean?
X 5 6 8 10 12 13 15 16 17
Y 16 19 23 28 36 41 44 45 50
            X      Y      XY     X²
            5      16     80     25
            6      19     114    36
            8      23     184    64
            10     28     280    100
            12     36     432    144
            13     41     533    169
            15     44     660    225
            16     45     720    256
            17     50     850    289
Summation   102    302    3853   1308
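Now $\bar{X} = 102/9 = 11.33$
And $\bar{Y} = 302/9 = 33.56$
$$b = \frac{9(3853) - (102)(302)}{9(1308) - (102)^2} = \frac{3873}{1368}$$
So b = 2.831
And, carrying one more decimal place in the means,
$$a = \bar{Y} - b\bar{X} = 33.556 - (2.831)(11.333) = 1.47$$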
Hence the desired estimated regression line of Y on X is
y= 1.47 + 2.831x
The estimated regression coefficient is b = 2.831, which means that the value of y increases by 2.831 units for a unit increase in x.
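The arithmetic in this example can be reproduced directly from the summation row of the table. The sketch below is illustrative only; the helper `predict` and the choice of x = 10 are assumptions made here to show how the fitted line is used for prediction.

```python
# Column totals taken from the summation row of the table above.
n = 9
sum_x, sum_y, sum_xy, sum_x2 = 102, 302, 3853, 1308

# Slope and intercept from the least-squares formulas.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 3873 / 1368
a = sum_y / n - b * (sum_x / n)

print(round(b, 3))  # 2.831
print(round(a, 2))  # 1.47


def predict(x):
    """Predicted y on the fitted line for a given x (hypothetical helper)."""
    return a + b * x


# Prediction at x = 10; the observed value in the data is y = 28.
print(round(predict(10), 2))
```

Raising x by one unit raises the predicted y by b, which is the interpretation of the regression coefficient given above.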