Correlation analysis

Divyanshu Singh, Yalin Wang

MBA, NGO-Management

Business Statistics

Hand Out: Date: 16/17 December 2011

Correlation Analysis

Introduction to Correlation Analysis

Why is correlation analysis done?

Methods and Tools

SPSS

Application of Correlation Analysis in Business Management

Correlation Analysis

Introduction to Correlation Analysis:

Correlation analysis is a statistical technique that quantifies the dependence between two or
more variables. The degree of correlation is expressed by a correlation coefficient, written ‘r’
or ‘ρ’. Two variables are dependent when a change in the value of one tends to accompany a
change in the value of the other. The correlation coefficient lies between -1 and +1. A value
above 0 indicates a positive correlation: both variables tend to increase together, and the
closer the value is to +1, the stronger the linear dependence. A value below 0 indicates a
negative correlation: as one variable increases, the other tends to decrease, and the closer the
value is to -1, the stronger the relationship. Values near 0 indicate little or no linear
relationship. Several types of correlation coefficient exist, and the choice among them depends
on the type of data to be analysed.

Methods and Tools:

Different correlation coefficients exist for different types of data, namely:

1) Pearson’s coefficient (r)

2) Spearman’s coefficient (ρ)

3) Point biserial coefficient, etc.

The degree of linear relationship between two variables (Pearson’s r) is calculated as the ratio
of the covariance of the two variables to the product of their standard deviations.
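
The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original handout: the function name pearson_r and the household data are hypothetical, and the sample (n - 1) forms of covariance and standard deviation are an assumption (the divisor cancels in the ratio either way).

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (divisor n - 1).
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return cov_xy / (sd_x * sd_y)


# Hypothetical data: household size and weekly grocery spending.
household_size = [1, 2, 3, 4, 5]
weekly_spend = [40, 65, 80, 110, 125]
r = pearson_r(household_size, weekly_spend)
print(round(r, 3))
```

For these made-up figures r comes out close to +1, which is what a strong positive linear relationship looks like numerically.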

An r value of zero indicates that there is no linear relationship between the two variables; a
non-linear relationship may still exist. Note that the Pearson correlation coefficient only
detects linear relationships, and that the usual significance test for it assumes the variables
are approximately normally distributed.

How are the results of Correlation Analysis interpreted?

In the example scatter plots, (a) shows a perfect positive linear correlation with r = 1, and (d)
shows a positive linear correlation with 0 < r < 1. Example (b) shows a perfect negative linear
correlation with r = -1, and (e) shows a negative linear correlation with -1 < r < 0. Example (c)
shows no correlation, with r = 0, and (f) shows a non-linear relationship, for which r is close
to zero even though the variables are clearly related.
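
The cases listed for the scatter plots can be reproduced numerically. The sketch below uses small made-up data sets chosen to mimic panels (a), (b), (d), (e) and (f); the data, the panel labels in the dictionary and the helper pearson_r are all illustrative assumptions, not taken from the original figure.

```python
import math


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products (equivalent to cov / (sd_x * sd_y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]

# Hypothetical y values, one list per scatter-plot pattern.
patterns = {
    "(a) perfect positive": [2 * v + 1 for v in x],
    "(b) perfect negative": [-2 * v + 1 for v in x],
    "(d) positive with scatter": [-5, -6, -1, 0, 3, 2, 7],
    "(e) negative with scatter": [5, 6, 1, 0, -3, -2, -7],
    "(f) non-linear (parabola)": [v * v for v in x],
}

results = {label: pearson_r(x, y) for label, y in patterns.items()}
for label, value in results.items():
    print(f"{label}: r = {value:.3f}")
```

The parabola in (f) is a perfectly deterministic relationship, yet its r is zero, which illustrates why r should be read as a measure of linear association only.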

Interpretation of results of Correlation Analysis:

Linear correlation is measured by calculating the Pearson correlation coefficient. This

coefficient is symbolized by r for a sample of data values and by the Greek letter ρ for a

population. It is common practice to simply refer to this as the correlation coefficient.

The correlation coefficient varies between -1.00 and +1.00. An r value of 1 indicates a perfect

positive linear correlation. This happens when the values of both variables increase together

and their coordinates on a scatter plot form a straight line. An r value of -1 indicates a perfect

negative linear correlation. This happens when the values of one variable increase while those of
the other decrease and their coordinates on a scatter plot form a straight line. The closer r is
to zero, the weaker the linear relationship. The scatter plot of variables with r values not
equal to 1 or -1 does not form a straight line.

What is the purpose of Correlation Analysis?

Correlation analysis shows the extent to which two quantitative variables vary together,

including the strength and direction of their relationship. The strength of the relationship

refers to the extent to which one variable predicts the other. For example, in a study of

consumers, you might find that the amount of money spent weekly on groceries varies

directly with the size of the household; you'd expect it to be a strong positive relationship.

However, you'd expect to find a weak correlation (or none at all) between amount spent

weekly on groceries and scores on a customer satisfaction survey. The direction of the

relationship shows whether the two variables vary together directly or inversely. In a direct

relationship, the two variables increase together. In an inverse relationship, one variable tends

to decrease as the other increases.

Correlation analysis can be used to make inferences about one variable which cannot be

easily measured based on one which can be. For example, we cannot measure sales of a

product which hasn't yet been produced or marketed. Correlation analysis of similar products

may show us the variables which affect sales. The snack food industry might do a correlation

analysis of sales of snack foods with salt content, discovering that the more salt in potato

chips, the higher the sales. This might lead to a business decision to produce snack foods with

increasing amounts of salt, with the goal of driving sales.

SPSS software supports such statistical calculations and analysis. Details follow:

SPSS

Steps involved:

1) Create the data for analysis or open an existing data file.

2) Select the variables to analyse.

3) Correlate the variables using the Bivariate option (Analyze > Correlate > Bivariate).

Using SPSS:

The Bivariate Correlations procedure computes the pairwise associations for a set

of variables and displays the results in a matrix. It is useful for determining the

strength and direction of the association between two scale or ordinal variables.

For quantitative, normally distributed variables, choose the Pearson correlation

coefficient. If your data are not normally distributed or have ordered categories,

choose Kendall's tau-b or Spearman, which measure the association between rank

orders.
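
To illustrate what "association between rank orders" means, the sketch below computes Spearman's coefficient by replacing each value with its rank and then applying the Pearson formula to the ranks. It is a simplified illustration with hypothetical data and helper names (average_ranks, spearman_rho), not a description of how SPSS implements the procedure.

```python
import math


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y rises with x, but along a curve rather than a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print("Pearson: ", round(pearson_r(x, y), 3))
print("Spearman:", round(spearman_rho(x, y), 3))
```

Because the ranks of x and y agree perfectly, Spearman's coefficient reaches 1 here while Pearson's r stays below 1, which is one reason to prefer a rank-based coefficient when the relationship is monotonic but not linear.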

Test of significance. The significance test checks whether the observed correlation could have
arisen by chance. When your research hypothesis states the direction of the relationship, you
can use a one-tailed probability; otherwise use a two-tailed probability. It is generally safest
to use a two-tailed test, although there are situations where a one-tailed test is more
appropriate.

Flag significant correlations. Correlation coefficients significant at the 0.05 level

are identified with a single asterisk, and those significant at the 0.01 level are

identified with two asterisks.
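
The flagging convention can be illustrated with a short sketch. Two things here are assumptions rather than SPSS behaviour: the p-value is estimated with a permutation test (chosen so the example needs only the Python standard library, in place of the t-based test that statistical packages typically use), and the thresholds are applied as strict inequalities. The data and the names permutation_p_value and flag are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-tailed p-value: share of random shuffles of y whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def flag(r, p):
    """Format r with one asterisk if p < 0.05 and two asterisks if p < 0.01."""
    stars = "**" if p < 0.01 else "*" if p < 0.05 else ""
    return f"{r:.3f}{stars}"


# Hypothetical data: an automation score and an efficiency score for ten branches.
automation = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
efficiency = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1, 9.9, 11.2]

r = pearson_r(automation, efficiency)
p = permutation_p_value(automation, efficiency)
print(flag(r, p))
```

With data this strongly related, almost no random shuffle matches the observed correlation, so the estimated p-value falls below 0.01 and the coefficient is printed with two asterisks.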

How to interpret the results from SPSS?

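Pearson's correlation coefficient (0.891) is significant at the 0.000 level, which is how SPSS
displays p < 0.001. The coefficient is positive and close to 1, so the two variables are strongly
positively correlated, and the result is unlikely to be due to chance.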

The plot shows that efficiency increases with automation, and that the number of working hours
decreases because fewer staff are required.

Interpretation of Case Study

The use of technology and the efficiency of the bank's service increase together.

Computerized techniques and ATMs made banking easier and faster.

Automated procedures reduced manual errors.

The same applies to the other variables as well.

However, correlation alone does not establish causation between these variables.