Chapter 5: Description of Bivariate Data

James B. Ramsey

Department of Economics, NYU

September 2007

Bivariate Data: Some Preliminaries

Bivariate data are pairs of observations: two measurements on a single referent;

e.g. mid-term and final grades for students; heights & weights of enrollees in a fitness program; changes in GNP & investment levels for a country.

Could plot a 3-dimensional histogram, i.e. plot the frequencies, $F_{i,j}$, by cell over cells (i,j) defined by intervals on each measurement; but this requires a lot of data and is difficult to analyze.

The analysis of bivariate data comprises two aspects:

1 Analysis of each of the component series by themselves;

2 Explore the "association" between them; this will be our concentration.

An early step in the analysis is to standardize the individual component series.

Create a "scatter plot."
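
A minimal sketch of the standardization step, assuming the divisor-N second moment used elsewhere in these notes. The heights and weights are hypothetical illustration data, and the helper name `standardize` is not from the slides.

```python
import math

def standardize(values):
    """Return (v - mean) / sqrt(m2), where m2 is the second moment about the mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    sd = math.sqrt(m2)
    return [(v - mean) / sd for v in values]

# Hypothetical heights (cm) and weights (kg) for five enrollees.
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
weights = [55.0, 62.0, 66.0, 74.0, 78.0]

x_st = standardize(heights)
y_st = standardize(weights)

# The pairs one would draw in a scatter plot of the standardized variables.
points = list(zip(x_st, y_st))
```

After standardizing, each series has mean zero and second moment one, so the scatter plot no longer depends on the original units.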

There are Three Types of Scatter Plots of Standardized Variables

1 Positive association with scatter;

2 Negative association with scatter;

3 No association; just random scatter.

Review the sets of Box & Whisker plots of bivariate data.

Plot the line of the medians of the B & W plots;

Consider the product of the measures, $x_i y_i$.

There are three cases.

1 Average product is positive;

2 Average product is negative;

3 Average product is nearly zero.

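Define the First Cross Product Moment

We define:

\[ m_{1,1}(x^{st}, y^{st}) = \frac{1}{N} \sum_{i=1}^{N} x^{st}_i \, y^{st}_i ; \]

this is the first cross product moment of the standardized variables. Note immediately that if we substitute for $x^{st}$, $y^{st}$ in the definition of $m_{1,1}(x^{st}, y^{st})$ we obtain:

\[ \frac{1}{N} \sum_{i=1}^{N} x^{st}_i \, y^{st}_i = \frac{N^{-1} \sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{m_2(x)} \, \sqrt{m_2(y)}} \]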

This is the first cross product moment standardized; the unstandardized version,

\[ m_{1,1}(x, y) = N^{-1} \sum (x_i - \bar{x})(y_i - \bar{y}), \]

is termed the covariance between x and y.

The unit of measurement for the covariance is "unit of x" times "unit of y".

$m_{1,1}(x^{st}, y^{st})$ is dimensionless, a pure number.

It occurs so often that it has its own symbol, r, and name: the "correlation coefficient".
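
The definitions above can be sketched in a few lines. This is an illustration only: the function names (`m2`, `m11`, `correlation`) and the data are assumptions made here, not part of the slides. The rescaling at the end is meant to show that the covariance carries units while r does not.

```python
import math

def mean(v):
    return sum(v) / len(v)

def m2(v):
    """Second moment about the mean (divisor N)."""
    mu = mean(v)
    return sum((a - mu) ** 2 for a in v) / len(v)

def m11(x, y):
    """First cross product moment about the means: the covariance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Cross product moment of the standardized variables: r."""
    return m11(x, y) / (math.sqrt(m2(x)) * math.sqrt(m2(y)))

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

cov = m11(x, y)
r = correlation(x, y)

# Change the units of both series: the covariance is rescaled, r is unchanged.
x_scaled = [100.0 * a for a in x]
y_scaled = [0.001 * b for b in y]
cov_scaled = m11(x_scaled, y_scaled)
r_scaled = correlation(x_scaled, y_scaled)
```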

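A Special Case: Linear (Functional) Relationship

If $y_i = \alpha + \beta x_i$, what is the value of $m_{1,1}(x^{st}, y^{st})$?

\[
\begin{aligned}
\frac{m_{1,1}(x, y)}{\sqrt{m_2(x)} \, \sqrt{m_2(y)}}
 &= \frac{N^{-1} \sum (x_i - \bar{x})(\alpha + \beta x_i - [\alpha + \beta \bar{x}])}{\sqrt{m_2(x)} \, \sqrt{m_2(y)}} \\
 &= \frac{\beta \, m_2(x)}{\sqrt{m_2(x)} \, \sqrt{m_2(y)}}
  = \frac{\beta \sqrt{m_2(x)}}{\sqrt{m_2(y)}}
\end{aligned}
\]

\[ m_2(y) = N^{-1} \sum (\alpha + \beta x_i - [\alpha + \beta \bar{x}])^2 = \beta^2 m_2(x) \]

\[ \frac{\beta \sqrt{m_2(x)}}{\sqrt{\beta^2 m_2(x)}} = \frac{\beta \sqrt{m_2(x)}}{|\beta| \sqrt{m_2(x)}} = \pm 1 \]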
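
A quick numerical check of this result, as a sketch with made-up values for x, α and β: an exact linear relationship should give r = +1 for a positive slope and r = -1 for a negative slope, up to floating-point rounding.

```python
import math

def correlation(x, y):
    """r = m11(x, y) / (sqrt(m2(x)) * sqrt(m2(y))), all moments with divisor N."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m11 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    m2x = sum((a - mx) ** 2 for a in x) / n
    m2y = sum((b - my) ** 2 for b in y) / n
    return m11 / (math.sqrt(m2x) * math.sqrt(m2y))

# Hypothetical x values and intercept; only the sign of the slope should matter.
x = [1.0, 2.0, 4.0, 7.0]
alpha = 3.0

r_positive_slope = correlation(x, [alpha + 2.5 * a for a in x])
r_negative_slope = correlation(x, [alpha - 0.5 * a for a in x])
```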

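The Structural Model.

We consider:

\[ y_i = \alpha + \beta x_i + e_i , \]

where $e_i$ is an unobserved random variable.

For simplicity, let $\bar{x} = \bar{e} = \alpha = 0$.

Important: $\sum x_i e_i \approx 0$.

\[
\begin{aligned}
m_{1,1}(x, y) &= N^{-1} \sum_i x_i y_i \\
 &= N^{-1} \Big[ \beta \sum_i x_i^2 + \sum x_i e_i \Big] \\
 &\approx \beta \, m_2(x)
\end{aligned}
\]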

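Structural Model continued:

\[
\begin{aligned}
m_2(y) &= m_2(\beta x_i + e_i) \\
 &= N^{-1} \sum (\beta x_i + e_i)^2 \\
 &= \beta^2 m_2(x) + m_2(e) + 2\beta \, m_{1,1}(x, e) \\
 &\approx \beta^2 m_2(x) \Big[ 1 + \frac{m_2(e)}{\beta^2 m_2(x)} \Big]
\end{aligned}
\]

The last step drops the cross term, since $m_{1,1}(x, e) = N^{-1} \sum x_i e_i \approx 0$.

Calculating the correlation coefficient, r, we obtain by substitution: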

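\[
r = \frac{m_{1,1}(x, y)}{\sqrt{m_2(x)} \, \sqrt{m_2(y)}}
  \approx \frac{\beta \, m_2(x)}{\sqrt{m_2(x)} \, \sqrt{\beta^2 m_2(x) \Big[ 1 + \frac{m_2(e)}{\beta^2 m_2(x)} \Big]}}
  = \frac{\pm 1}{\sqrt{1 + \frac{m_2(e)}{\beta^2 m_2(x)}}}
\]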

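Explore Correlation & Variation in Random Variable

\[ r = \frac{\pm 1}{\sqrt{1 + \frac{m_2(e)}{\beta^2 m_2(x)}}} \]

If $m_2(e) \approx 0$, then we see that $r \approx \pm 1$, reflecting the near functionality of y on x.

If $\beta^2 m_2(x)$ is small relative to $m_2(e)$, i.e. the ratio $m_2(e) / (\beta^2 m_2(x))$ is large, then $r \approx 0$.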
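
To see the formula at work, here is a small sketch under stated assumptions: the x and e values are made up and deliberately chosen so that both means are zero and the sum of x times e is exactly zero, which makes the formula exact rather than approximate. The helper names and the noise scaling factor are illustrative only.

```python
import math

def second_moment(v):
    """m2: mean squared deviation from the mean (divisor N)."""
    mu = sum(v) / len(v)
    return sum((a - mu) ** 2 for a in v) / len(v)

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m11 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return m11 / math.sqrt(second_moment(x) * second_moment(y))

# Illustrative data: mean(x) = mean(e) = 0 and sum(x * e) = 0 exactly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
e = [1.0, -2.0, 2.0, -2.0, 1.0]
beta = 2.0

def compare(noise_scale):
    """Return (r computed from the data, r from the formula) for y = beta*x + noise_scale*e."""
    noise = [noise_scale * v for v in e]
    y = [beta * a + b for a, b in zip(x, noise)]
    ratio = second_moment(noise) / (beta ** 2 * second_moment(x))
    r_formula = math.copysign(1.0, beta) / math.sqrt(1.0 + ratio)
    return correlation(x, y), r_formula

r_small_noise = compare(1.0)    # ratio m2(e) / (beta^2 m2(x)) = 0.35
r_large_noise = compare(10.0)   # ratio = 35, so r = 1/6
```

Increasing the noise variance relative to $\beta^2 m_2(x)$ pulls r toward zero even though the slope β is unchanged.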

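Correlation & Slope of y on x

For $y_i = \beta x_i + e_i$, and retaining the restriction that $\sum x_i e_i \approx 0$, we have from above:

\[ m_{1,1}(x, y) \approx \beta \, m_2(x), \quad \text{so that:} \quad \beta \approx \frac{m_{1,1}(x, y)}{m_2(x)} \]

The units of β are:

\[ \beta \, [\text{units}] = \frac{\text{unit}(x) \times \text{unit}(y)}{\text{unit}(x)^2} = \frac{\text{unit}(y)}{\text{unit}(x)} \]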

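To convert to a dimensionless measure, we multiply β by $\sqrt{m_2(x)} / \sqrt{m_2(y)}$, which has units unit(x)/unit(y):

\[
\beta \, \frac{\sqrt{m_2(x)}}{\sqrt{m_2(y)}}
 \approx \frac{m_{1,1}(x, y)}{m_2(x)} \cdot \frac{\sqrt{m_2(x)}}{\sqrt{m_2(y)}}
 = \frac{m_{1,1}(x, y)}{\sqrt{m_2(x)} \, \sqrt{m_2(y)}}
\]

The slope coefficient is similar to the correlation coefficient, but in units of y per unit of x.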
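
The relation between slope and correlation can be sketched as follows. The data are hypothetical and the function names are assumptions of this example. The slope is computed as $m_{1,1}(x,y)/m_2(x)$ and then rescaled into r; changing the units of y changes the slope but leaves r alone.

```python
import math

def moments(x, y):
    """Return (m2(x), m2(y), m11(x, y)), all about the means with divisor N."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    m2x = sum((a - mx) ** 2 for a in x) / n
    m2y = sum((b - my) ** 2 for b in y) / n
    m11 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return m2x, m2y, m11

def slope_and_r(x, y):
    m2x, m2y, m11 = moments(x, y)
    slope = m11 / m2x                              # units of y per unit of x
    r = slope * math.sqrt(m2x) / math.sqrt(m2y)    # dimensionless
    return slope, r

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [20.0, 10.0, 40.0, 30.0, 50.0]

slope, r = slope_and_r(x, y)

# Express y in units 1000 times larger: the slope shrinks by 1000, r does not move.
y_rescaled = [b / 1000.0 for b in y]
slope_rescaled, r_rescaled = slope_and_r(x, y_rescaled)
```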

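Approximations to the slope coefficient.

Results above were obtained assuming

\[ \sum_i x_i e_i = 0 \]

But what if this is only approximately true? We define $\hat{\beta}$ by:

\[
\hat{\beta} = \frac{m_{1,1}(x, y)}{m_2(x)}
 = \frac{N^{-1} \sum (x_i y_i)}{m_2(x)}
 = \frac{N^{-1} \sum (x_i [\beta x_i + e_i])}{m_2(x)}
 = \beta + \frac{N^{-1} \sum (x_i e_i)}{m_2(x)}
\]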

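The degree of approximation of the computed slope $\hat{\beta} = m_{1,1}(x, y) / m_2(x)$ to β depends on the magnitude of $N^{-1} \sum (x_i e_i)$ relative to the magnitude of $m_2(x)$.

$\sum (x_i e_i)$ may be positive or negative, so $\hat{\beta}$ may be larger or smaller than β.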
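
A small numerical illustration of this decomposition, under the simplifying assumptions used above (zero means, α = 0). The x and e values are invented so that the sum of x times e is small but not zero; the variable names are assumptions of this sketch.

```python
# Illustrative data with mean(x) = mean(e) = 0 but sum(x * e) = 1.5, not zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
e = [0.5, -1.0, 0.0, -0.5, 1.0]
beta = 2.0

y = [beta * a + b for a, b in zip(x, e)]
n = len(x)

# With zero means, the moments about the mean are plain averages of products.
m2_x = sum(a * a for a in x) / n
m11_xy = sum(a * b for a, b in zip(x, y)) / n
m11_xe = sum(a * b for a, b in zip(x, e)) / n

beta_hat = m11_xy / m2_x      # the computed slope
discrepancy = m11_xe / m2_x   # N^{-1} sum(x e) / m2(x)
```

Here the computed slope comes out as 2.15, that is, the true β of 2 plus a discrepancy of 0.15; with a negative sum of x times e it would fall below β instead.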

Spearman's Rank Correlation Coefficient

Note first the loss of information from using ranked data instead of the raw observations; ordinal data as opposed to cardinal.

For pairs of observations $(x_i, y_i)$, let $S_i$ be the rank of $x_i$ and $T_i$ the rank of $y_i$; Spearman's rank correlation, $r_S$, is:

\[
r_S = \frac{\sum S_i T_i - \sum S_i \sum T_i / N}
           {\sqrt{\sum S_i^2 - \frac{(\sum S_i)^2}{N}} \; \sqrt{\sum T_i^2 - \frac{(\sum T_i)^2}{N}}}
\]

Recognize that division by N in numerator & denominator has been cancelled.

Or calculation of $r_S$ can be simplified by setting $D_i = S_i - T_i$:

\[ r_S = 1 - \frac{6 \sum D_i^2}{N (N^2 - 1)} \]

Result is obtained by recognizing that, for untied ranks 1, ..., N:

\[ \sum S_i = \sum T_i = \frac{N(N + 1)}{2} \]

\[ \sum S_i^2 = \sum T_i^2 = \frac{N(N + 1)(2N + 1)}{6} \]

If $D_i$ is zero for all i, $r_S$ will be one; cf. the functional case.

If $S_i$, $T_i$ are perfectly negatively correlated, $r_S$ will be -1.
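
As a check that the two expressions for $r_S$ agree, here is a sketch that computes both on a small invented data set. It assumes there are no tied observations, since the shortcut formula relies on the rank sums above; the helper names are hypothetical.

```python
import math

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_full(s, t):
    """Product-moment formula applied to the ranks."""
    n = len(s)
    numerator = sum(a * b for a, b in zip(s, t)) - sum(s) * sum(t) / n
    denom_s = math.sqrt(sum(a * a for a in s) - sum(s) ** 2 / n)
    denom_t = math.sqrt(sum(b * b for b in t) - sum(t) ** 2 / n)
    return numerator / (denom_s * denom_t)

def spearman_shortcut(s, t):
    """1 - 6 * sum(D^2) / (N (N^2 - 1)), with D the rank differences."""
    n = len(s)
    d_squared = sum((a - b) ** 2 for a, b in zip(s, t))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

# Hypothetical observations without ties.
x = [10.2, 3.1, 7.7, 5.0, 8.4]
y = [2.0, 1.5, 3.9, 1.0, 3.0]

S = ranks(x)   # [5, 1, 3, 2, 4]
T = ranks(y)   # [3, 2, 5, 1, 4]

r_full = spearman_full(S, T)
r_short = spearman_shortcut(S, T)
```

Both routes give 0.5 for these ranks. With tied observations the shortcut no longer matches the full formula in general, so the full formula is the safer choice there.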

Analysis of Bivariate Categorical Data

See overheads for an example of bivariate categorical data; a 2x2 table of frequencies, $F_{i,j}$ = count in the ith row and jth column.

Three classes of relative frequencies:

1 Row relative frequencies; divide entries by row totals; entries sum to one within each row.

2 Column relative frequencies; divide entries by column totals; entries sum to one within each column.

3 Joint relative frequencies; divide all entries by the overall total; the sum of all relative frequencies, over rows & columns, adds to 1.
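
The three classes of relative frequencies can be sketched for a single table. The counts below are hypothetical (the overhead example is not reproduced in these notes), and the variable names are chosen for this illustration.

```python
# Hypothetical 2x2 table of counts: table[i][j] is the count in row i, column j.
table = [[30, 10],
         [20, 40]]

row_totals = [sum(row) for row in table]          # [40, 60]
col_totals = [sum(col) for col in zip(*table)]    # [50, 50]
total = sum(row_totals)                           # 100

# 1. Row relative frequencies: each entry divided by its row total.
row_rel = [[f / row_totals[i] for f in row] for i, row in enumerate(table)]

# 2. Column relative frequencies: each entry divided by its column total.
col_rel = [[f / col_totals[j] for j, f in enumerate(row)] for row in table]

# 3. Joint relative frequencies: each entry divided by the overall total.
joint_rel = [[f / total for f in row] for row in table]
```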

There are three types of comparisons:

1 Comparison across rows; how dissimilar are the rows?

2 Comparison across columns; how dissimilar are the columns?

3 Are the entries associated, or non-associated?

Questions answered by using the coefficient of association

Consider a 2x2 table of relative/absolute frequencies, labeled:

a        b        a + b
c        d        c + d
a + c    b + d    a + b + c + d

Introduction to Measures of Non-association:

If non-association across rows:

\[ \frac{a}{a + b} = \frac{c}{c + d} \;\Rightarrow\; a(c + d) = c(a + b) \;\Rightarrow\; ad = cb \]

Similarly for non-association across columns:

\[ \frac{a}{a + c} = \frac{b}{b + d} \;\Rightarrow\; a(b + d) = b(a + c) \;\Rightarrow\; ad = cb \]

Non-association is represented by [ad - cb] = 0;

Association is represented by [ad - cb] ≠ 0.
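
The equivalence between equal proportions and ad = cb can be checked with exact arithmetic. The following is a sketch with two invented tables; the function names are assumptions made here, and all row and column totals are assumed to be nonzero.

```python
from fractions import Fraction

def rows_alike(a, b, c, d):
    """Row relative frequencies agree: a/(a+b) == c/(c+d)."""
    return Fraction(a, a + b) == Fraction(c, c + d)

def columns_alike(a, b, c, d):
    """Column relative frequencies agree: a/(a+c) == b/(b+d)."""
    return Fraction(a, a + c) == Fraction(b, b + d)

def cross_difference(a, b, c, d):
    """The quantity ad - cb from the slide."""
    return a * d - c * b

# Hypothetical tables, written as (a, b, c, d).
non_associated = (10, 20, 30, 60)   # ad = 600 and cb = 600
associated = (30, 10, 20, 40)       # ad = 1200 and cb = 200

checks = {
    "non_associated": (rows_alike(*non_associated),
                       columns_alike(*non_associated),
                       cross_difference(*non_associated)),
    "associated": (rows_alike(*associated),
                   columns_alike(*associated),
                   cross_difference(*associated)),
}
```

Using `Fraction` avoids the rounding issues that a floating-point equality test on proportions could introduce.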

The Coefficient of Association defined.

Two problems:

1 How to interpret the sign of the difference [ad - cb]?

2 What constitutes a "big" difference?

We define the coefficient of association, φ, by:

\[ \phi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} \]

Maximum absolute value for φ is obtained when b = c = 0, or when a = d = 0.

φ is dimensionless in that the numerator is a difference of products of two terms, and the denominator is the square root of a product of four terms.

The maximum of |φ| is 1.
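
A minimal sketch of the coefficient, assuming every row and column total is positive (otherwise the denominator is zero and φ is undefined). The tables are hypothetical and the function name `phi` is chosen for this example.

```python
import math

def phi(a, b, c, d):
    """Coefficient of association for the 2x2 table [[a, b], [c, d]].

    Assumes all four marginal totals are positive.
    """
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator

phi_moderate = phi(30, 10, 20, 40)   # some association
phi_none = phi(10, 20, 30, 60)       # ad = bc, so no association
phi_max = phi(25, 0, 0, 75)          # b = c = 0
phi_min = phi(0, 25, 75, 0)          # a = d = 0
```

The last two tables reach the extreme values +1 and -1, matching the statement that the maximum absolute value occurs when b = c = 0 or a = d = 0.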

Interpretation of Sign of Coefficient

Where the categories are merely categories, with no ordinal ranking, the sign of φ is uninterpretable.

Where the categories have an ordinal interpretation, the sign of φ is interpretable; e.g. the categories of income & blood pressure.

For those with some matrix algebra, the term (ad - cb) is the determinant of the 2x2 table of entries a, b, c, d.

END OF CHAPTER FIVE.
