Session One presentation: 7:30-8:00 Session Two presentation: 8-8:30.
8 Correlations [Session 8]
-
Upload
rahul-agarwal -
8/3/2019 8 Correlations [Session 8]
Correlation Analysis
Session 10
-
Introduction
Is there an association between two or more variables? If yes, what is the form and degree of that relationship?
Is the relationship strong or significant enough to be useful in arriving at a desirable conclusion?
Can the relationship be used for predictive purposes, that is, to predict the most likely value of a dependent variable corresponding to a given value of the independent variable or variables?
-
Definition
Correlation exists between two variables when one of them is related to the other in some way.
-
Assumptions
1) The sample of paired data (x, y) is a random sample.
2) The pairs of (x, y) data have a bivariate normal distribution.
-
Methods of Correlation Analysis
In this chapter, the following methods of finding the correlation coefficient between two variables x and y are discussed:
Scatter Diagram method
Karl Pearson's Coefficient of Correlation method
Spearman's Rank Correlation method
Method of Least-squares
The figure shows how the strength of the association between two variables is represented by the coefficient of correlation.
[Figure: scale of the coefficient of correlation from -1.00 to +1.00. Negative correlation: perfect at -1.00, then strong, moderate (around -0.50), and weak. No correlation at 0. Positive correlation: weak, moderate (around +0.50), strong, and perfect at +1.00.]
-
Definition
Scatterplot (or scatter diagram): a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.
-
Scatter Diagram of Paired Data
-
Positive Linear Correlation
[Scatter plots, x on the horizontal axis and y on the vertical axis: (a) Positive, (b) Strong positive, (c) Perfect positive]
-
Negative Linear Correlation
[Scatter plots, x on the horizontal axis and y on the vertical axis: (d) Negative, (e) Strong negative, (f) Perfect negative]
-
No Linear Correlation
[Scatter plots, x on the horizontal axis and y on the vertical axis: (g) No correlation, (h) Nonlinear correlation]
-
Karl Pearson's Correlation Coefficient
-
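Definition: Karl Pearson Correlation Coefficient r [for classified data]
$$ r = \frac{\sum f d_x d_y - \dfrac{(\sum f d_x)(\sum f d_y)}{N}}{(SD_x)(SD_y)} $$
$$ SD_x = \sqrt{\sum f d_x^2 - \frac{(\sum f d_x)^2}{N}} $$
$$ SD_y = \sqrt{\sum f d_y^2 - \frac{(\sum f d_y)^2}{N}} $$
$$ d_x = X_i - A, \qquad d_y = Y_i - A $$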
-
Example 1
Find the coefficient of correlation between height (X) and weight (Y) from the following data. Also, obtain the two regression lines.
Height (X)   61   65   68   62   60
Weight (Y)   62   55   70   60   53
-
Answer 1
r = 0.65
X - 63.2 = 0.32 (Y - 60)
Y - 60 = 1.33 (X - 63.2)
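As a check on Answer 1, the following Python sketch computes r and the two regression slopes from the height and weight data, using deviations from the means. The function name `pearson_and_slopes` is illustrative only and does not come from the slides.

```python
from math import sqrt

def pearson_and_slopes(xs, ys):
    """Return (r, b_xy, b_yx, mean_x, mean_y) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    b_xy = sxy / syy  # slope of the regression line of X on Y
    b_yx = sxy / sxx  # slope of the regression line of Y on X
    return r, b_xy, b_yx, mean_x, mean_y

height = [61, 65, 68, 62, 60]
weight = [62, 55, 70, 60, 53]
r, b_xy, b_yx, mean_x, mean_y = pearson_and_slopes(height, weight)
print(f"r = {r:.2f}")
print(f"X - {mean_x:.1f} = {b_xy:.2f} (Y - {mean_y:.0f})")
print(f"Y - {mean_y:.0f} = {b_yx:.2f} (X - {mean_x:.1f})")
```

Note that r squared equals the product of the two slopes, which is the relation Example 2 relies on.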
-
Example 2
Given the two regression lines
4x - 5y + 33 = 0
20x - 9y - 107 = 0
and the variance of x being 9, calculate
1) Mean x and Mean y
2) Correlation coefficient of x and y
3) SD of y
-
Answer 2
Mean X = 13, Mean Y = 17
r = 0.6
Variance of y = 16 (so SD of y = 4)
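One way to reach Answer 2 is sketched below in Python. It assumes the first line is the regression of y on x and the second the regression of x on y; the opposite assignment would give a product of slopes greater than 1, which is impossible for r squared. Variable names are illustrative.

```python
from math import sqrt

# Regression lines written as a*x + b*y + c = 0.
a1, b1, c1 = 4, -5, 33     # 4x - 5y + 33 = 0, taken as the line of y on x
a2, b2, c2 = 20, -9, -107  # 20x - 9y - 107 = 0, taken as the line of x on y
var_x = 9

# 1) Both lines pass through (mean x, mean y): solve the pair by Cramer's rule.
det = a1 * b2 - a2 * b1
mean_x = (b1 * c2 - b2 * c1) / det
mean_y = (a2 * c1 - a1 * c2) / det

# 2) Slopes: y = -(a1/b1) x + const and x = -(b2/a2) y + const.
b_yx = -a1 / b1
b_xy = -b2 / a2
# r^2 = b_yx * b_xy, and r takes the common sign of the two slopes.
r = sqrt(b_yx * b_xy)
if b_yx < 0:
    r = -r

# 3) b_yx = r * sd_y / sd_x, so sd_y = b_yx * sd_x / r.
sd_x = sqrt(var_x)
sd_y = b_yx * sd_x / r

print(mean_x, mean_y, round(r, 2), round(sd_y, 2), round(sd_y ** 2, 2))
```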
-
Rounding the Linear Correlation Coefficient r
Round to three decimal places
Use calculator or computer if possible
-
Properties of the Linear Correlation Coefficient r
1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.
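The four properties can be checked numerically. The sketch below reuses the height and weight data from Example 1; the rescaling (multiply by 2.54, add 10) and the parabola data are arbitrary illustrations, not taken from the slides. Property 2 is checked for a positive scale factor, since a negative factor would flip the sign of r.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [61, 65, 68, 62, 60]
y = [62, 55, 70, 60, 53]
r = pearson_r(x, y)

# Property 1: r lies between -1 and 1.
assert -1 <= r <= 1

# Property 2: converting x to a different scale leaves r unchanged.
x_rescaled = [2.54 * v + 10 for v in x]
assert isclose(pearson_r(x_rescaled, y), r)

# Property 3: interchanging x and y leaves r unchanged.
assert isclose(pearson_r(y, x), r)

# Property 4: r measures only linear association; points lying exactly
# on a symmetric parabola are perfectly related yet give r = 0.
u = [-2, -1, 0, 1, 2]
v = [t * t for t in u]
assert isclose(pearson_r(u, v), 0.0, abs_tol=1e-12)
```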
-
Interpreting the Linear Correlation Coefficient
r = +1: Perfect positive correlation
r = -1: Perfect negative correlation
r = 0: No correlation (uncorrelated)
Standard Error (S.E.) = (1 - r²)/√N, where N = number of pairs of observations
Probable Error = 0.6745 × S.E.
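A minimal Python sketch of the two error measures, assuming the usual textbook form S.E. = (1 - r²)/√N. The function names and the sample values (r = 0.6, N = 9) are illustrative and do not come from the slides.

```python
from math import sqrt

def standard_error(r, n):
    """Standard error of r: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / sqrt(n)

def probable_error(r, n):
    """Probable error of r: 0.6745 times the standard error."""
    return 0.6745 * standard_error(r, n)

# Illustrative values only: r = 0.6 computed from n = 9 pairs.
se = standard_error(0.6, 9)
pe = probable_error(0.6, 9)
print(f"S.E. = {se:.4f}, P.E. = {pe:.4f}")
```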
-
Spearman's Rank Correlation Coefficient
This method is applied to measure the association between two variables when only ordinal (or rank) data are available. In other words, it is applied in situations where certain qualitative factors, such as judgment, brand personalities, TV programmes, leadership, colour, or taste, cannot be given a fixed quantitative measure, but individual observations can be arranged in a definite order (also called rank). The ranking is decided by using a set of ordinal rank numbers, with 1 for the individual observation ranked first, either in terms of quantity or quality, and n for the individual observation ranked last in a group of n pairs of observations.
Mathematically, Spearman's rank correlation coefficient is defined as:
$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$
where R = rank correlation coefficient
R1 = rank of observations with respect to first variable
R2 = rank of observations with respect to second variable
d = R1 - R2, difference in a pair of ranks
n = number of paired observations or individuals being ranked
The number 6 is placed in the formula as a scaling device; it ensures that the possible range of R is from -1 to 1. While using this method we may come across three types of cases.
-
Advantages
This method is easy to understand and its application is simpler than Pearson's method.
This method is useful for correlation analysis when variables are expressed in qualitative terms like beauty, intelligence, honesty, efficiency, and so on.
This method is appropriate to measure the association between two variables if the data type is at least ordinal scaled (ranked).
The sample data of values of two variables is converted into ranks, either in ascending or descending order, for calculating the degree of correlation between the two variables.
-
Disadvantages
Values of both variables are assumed to be normally distributed and to describe a linear rather than a non-linear relationship.
A large computational time is required when the number of pairs of values of the two variables exceeds 30.
This method cannot be applied to measure the association between two variables in grouped data.
-
Rank Order Correlation
Hits   Rank   HR   Rank   D    D²
1      10     3    8      2    4
2      9      4    7      2    4
3      8      5    6      2    4
4      7      1    10     -3   9
5      6      7    4      2    4
6      5      6    5      0    0
7      4      2    9      -5   25
8      3      10   1      2    4
9      2      9    2      0    0
10     1      8    3      -2   4
ΣD² = 58, N = 10
Rho = 1 - [6 ΣD² / N(N² - 1)]
Rho = 1 - [6(58) / 10(10² - 1)]
Rho = 1 - [348 / 10(100 - 1)]
Rho = 1 - [348 / 990]
Rho = 1 - 0.352
Rho = 0.648
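The rank-order calculation can be reproduced with a short Python sketch. It assumes no tied values, which holds for this table, and ranks the largest value as 1 to match the Rank columns. The helper names `rank_descending` and `spearman_rho` are illustrative.

```python
def rank_descending(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Return (rho, sum of D^2) using rho = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(xs)
    rank_x = rank_descending(xs)
    rank_y = rank_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return rho, sum_d2

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]
rho, sum_d2 = spearman_rho(hits, hr)
print(sum_d2, round(rho, 3))
```

With tied values the simple formula no longer applies as written, so this sketch should not be used unchanged on data containing ties.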
-
Pearson's r
Hits (x)   HR (y)   xy
1          3        3
2          4        8
3          5        15
4          1        4
5          7        35
6          6        36
7          2        14
8          10       80
9          9        81
10         8        80
Σx/n = 5.5
Σy/n = 5.5
Σxy/n = 356/10 = 35.6
SDx = SDy = √8.25 ≈ 2.87 (dividing by n, to match the Σ/n terms)
r = [Σxy/n - (Σx/n)(Σy/n)] / [(SDx)(SDy)]
r = [35.6 - (5.5)(5.5)] / [(2.87)(2.87)]
r = (35.6 - 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648
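The Pearson computation can be checked directly from the Hits and HR columns. The Python sketch below uses population standard deviations (dividing by n) so that they are consistent with the Σxy/n form of the formula; that choice is an assumption about the intended convention. Because Hits and HR are each a permutation of 1 to 10 with no ties, Pearson's r on the raw values coincides with the rank correlation.

```python
from math import sqrt

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]
n = len(hits)

mean_x = sum(hits) / n
mean_y = sum(hr) / n
mean_xy = sum(x * y for x, y in zip(hits, hr)) / n

# Population standard deviations (divide by n), matching the sum/n terms above.
sd_x = sqrt(sum(x * x for x in hits) / n - mean_x ** 2)
sd_y = sqrt(sum(y * y for y in hr) / n - mean_y ** 2)

r = (mean_xy - mean_x * mean_y) / (sd_x * sd_y)
print(f"mean of xy = {mean_xy}, SDx = {sd_x:.2f}, SDy = {sd_y:.2f}, r = {r:.3f}")
```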
-
Example 3
Compute the correlation coefficient:
Age of husbands / Age of wives
          15-25   25-35   35-45   45-55   55-65   65-75   Total
15-25     1       1       -       -       -       -       2
25-35     2       12      1       -       -       -       15
35-45     -       4       10      1       -       -       15
45-55     -       -       3       6       1       -       10
55-65     -       -       -       2       4       2       8
65-75     -       -       -       -       1       2       3
Total     3       17      14      9       6       4       53
-
Sol. 3
Columns are the X classes with deviations dx; rows are the Y classes with deviations dy. Deviations are in class-width units from the 35-45 class. Each cell shows the frequency f, with the product f·dx·dy in parentheses.
Y \ X    dy    15-25    25-35     35-45    45-55    55-65    65-75     f     fdy    fdy²   fdxdy
dx             -2       -1        0        +1       +2       +3
15-25    -2    1 (4)    1 (2)     -        -        -        -         2     -4     8      6
25-35    -1    2 (4)    12 (12)   1 (0)    -        -        -         15    -15    15     16
35-45    0     -        4 (0)     10 (0)   1 (0)    -        -         15    0      0      0
45-55    +1    -        -         3 (0)    6 (6)    1 (2)    -         10    +10    10     8
55-65    +2    -        -         -        2 (4)    4 (16)   2 (12)    8     +16    32     32
65-75    +3    -        -         -        -        1 (6)    2 (18)    3     +9     27     24
f              3        17        14       9        6        4         53    +16    92     86
fdx            -6       -17       0        9        12       12        +10
fdx²           12       17        0        9        24       36        98
fdxdy          8        14        0        10       24       30        86
r = 0.907
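The value r = 0.907 can be reproduced from the frequency table with the classified-data formula. The Python sketch below is one possible arrangement: it stores the table row by row (Y classes as rows, X classes as columns) and uses the step deviations -2 to +3 shown in the solution. Variable names are illustrative.

```python
from math import sqrt

# Bivariate frequency table from Example 3: rows are Y classes, columns are X classes.
freq = [
    [1, 1, 0, 0, 0, 0],
    [2, 12, 1, 0, 0, 0],
    [0, 4, 10, 1, 0, 0],
    [0, 0, 3, 6, 1, 0],
    [0, 0, 0, 2, 4, 2],
    [0, 0, 0, 0, 1, 2],
]
# Step deviations from the 35-45 class, in units of the class width.
d = [-2, -1, 0, 1, 2, 3]

n = sum(sum(row) for row in freq)
sum_fdx = sum(f * d[j] for row in freq for j, f in enumerate(row))
sum_fdy = sum(f * d[i] for i, row in enumerate(freq) for f in row)
sum_fdx2 = sum(f * d[j] ** 2 for row in freq for j, f in enumerate(row))
sum_fdy2 = sum(f * d[i] ** 2 for i, row in enumerate(freq) for f in row)
sum_fdxdy = sum(
    f * d[i] * d[j] for i, row in enumerate(freq) for j, f in enumerate(row)
)

# SDx and SDy as defined for classified data (square roots of corrected sums).
sd_x = sqrt(sum_fdx2 - sum_fdx ** 2 / n)
sd_y = sqrt(sum_fdy2 - sum_fdy ** 2 / n)
r = (sum_fdxdy - sum_fdx * sum_fdy / n) / (sd_x * sd_y)
print(n, sum_fdx, sum_fdy, sum_fdx2, sum_fdy2, sum_fdxdy, round(r, 3))
```

Working in class-width units is safe here because r does not change when either variable is converted to a different scale.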
-
Data from the Garbage Project
x Plastic (lb)   y Household
0.27             2
1.41             3
2.19             3
2.83             6
2.19             4
1.81             2
0.85             1
3.05             5
Is there a significant linear correlation?
r = 0.842
R² = 0.71
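The slide does not show how significance was judged. The Python sketch below recomputes r and r² from the eight pairs and then applies one common check, the t statistic with n - 2 degrees of freedom. The 5% two-tailed significance level and its critical value 2.447 for 6 degrees of freedom are assumptions made here, not something stated in the slides.

```python
from math import sqrt

plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]
n = len(plastic)

sx, sy = sum(plastic), sum(household)
sxx = sum(x * x for x in plastic)
syy = sum(y * y for y in household)
sxy = sum(x * y for x, y in zip(plastic, household))

# Raw-sums form of Pearson's r.
r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
r_squared = r ** 2

# Test statistic for the hypothesis of no linear correlation (n - 2 d.f.).
t = r * sqrt((n - 2) / (1 - r_squared))
T_CRITICAL = 2.447  # assumed: two-tailed 5% critical value of t with 6 d.f.
significant = abs(t) > T_CRITICAL

print(f"r = {r:.3f}, r^2 = {r_squared:.2f}, t = {t:.2f}, significant: {significant}")
```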
-
Correlation Analysis Vs. Regression Analysis
Correlation analysis measures the direction and degree of the linear relationship between two or more variables.
Regression analysis aims at establishing the functional relationship.
-
Correlation does not imply causation
"Correlation does not imply causation" is a phrase used in the sciences and statistics to emphasize that correlation between two variables does not imply there is a cause-and-effect relationship between the two. Its converse, "correlation proves causation", is a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. For example,
A occurs in correlation with B.
Therefore, A causes B.
This is a logical fallacy because there are at least four other possibilities:
1. B may be the cause of A, or
2. some unknown third factor is actually the cause of the relationship between A and B, or
3. the "relationship" is so complex it can be labeled coincidental (i.e., two events occurring at the same time that have no simple relationship to each other besides the fact that they are occurring at the same time).
4. B may be the cause of A at the same time as A is the cause of B (contradicting that the only relationship between A and B is that A causes B). This describes a self-reinforcing system.