Sum of Variances


    Lecture 30: Covariance, Variance of Sums

    Friday, April 20

1 Expectation of Products of Independent Random Variables

We shall assume (unless otherwise mentioned) that the random variables we deal with have expectations. We shall use the notation

µ_X = E[X],

where X is a random variable.

Proposition. Let X and Y be independent random variables on the same probability space. Let g and h be functions of one variable such that the random variables g(X) and h(Y) are defined. Then

E[g(X)h(Y)] = E[g(X)] E[h(Y)],

provided that either X and Y are both discrete or X and Y both have densities.

Proof. We first assume that X and Y are both discrete. Let p_X and p_Y be their probability mass functions. Since they are independent, their joint probability mass function is given by p(x, y) = p_X(x) p_Y(y). Thus

E[g(X)h(Y)] = Σ_y Σ_x g(x)h(y) p(x, y)
            = Σ_y Σ_x g(x)h(y) p_X(x) p_Y(y)
            = Σ_y h(y) p_Y(y) Σ_x g(x) p_X(x)
            = Σ_y h(y) p_Y(y) E[g(X)]
            = E[g(X)] E[h(Y)]


Next we assume that X and Y both have densities. Let f_X and f_Y be their densities. Since they are independent, their joint density is given by f(x, y) = f_X(x) f_Y(y). Thus

E[g(X)h(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f(x, y) dx dy
            = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f_X(x) f_Y(y) dx dy
            = ∫_{−∞}^{∞} h(y) f_Y(y) ( ∫_{−∞}^{∞} g(x) f_X(x) dx ) dy
            = E[g(X)] ∫_{−∞}^{∞} h(y) f_Y(y) dy
            = E[g(X)] E[h(Y)]

Corollary. If X and Y are independent random variables with expectations, then

E[XY] = E[X] E[Y].
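A quick numerical sanity check of the corollary is easy to run. The sketch below is illustrative only (it is not part of the notes); the choice of distributions is arbitrary, and any pair of independent distributions with finite means would serve. It draws a large independent sample and compares the empirical mean of XY with the product of the empirical means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent X ~ Exponential(1) and Y ~ Uniform(0, 3): arbitrary choices.
x = rng.exponential(scale=1.0, size=n)
y = rng.uniform(0.0, 3.0, size=n)

print("E[XY]    ≈", np.mean(x * y))              # ≈ 1 · 1.5 = 1.5
print("E[X]E[Y] ≈", np.mean(x) * np.mean(y))     # ≈ 1.5 as well
```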


    2 Covariance

Definition. Let X and Y be random variables defined on the same probability space. Assume that both X and Y have expectations and variances. The covariance of X and Y is defined by

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)].

Proposition (Some Properties of Covariance). Let X, Y, X_j, j = 1, …, m, and Y_k, k = 1, …, n, be random variables with expectations and variances, and assume that they are all defined on the same probability space.

1. Cov(X, Y) = E[XY] − E[X] E[Y].

2. If X and Y are independent, then Cov(X, Y) = 0.

3. Cov(X, Y) = Cov(Y, X).

4. Cov(aX, Y) = a Cov(X, Y) for a ∈ R.

5. Cov(X_1 + X_2, Y) = Cov(X_1, Y) + Cov(X_2, Y).

6. Cov(Σ_{j=1}^{m} X_j, Σ_{k=1}^{n} Y_k) = Σ_{j=1}^{m} Σ_{k=1}^{n} Cov(X_j, Y_k).

Proof of 1.

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
          = E[XY − µ_X Y − µ_Y X + µ_X µ_Y]
          = E[XY] − µ_X E[Y] − µ_Y E[X] + µ_X µ_Y
          = E[XY] − E[X] E[Y]

Proof of 2. Assertion 2 follows from 1 because E[XY] = E[X] E[Y] for independent random variables X and Y.

    Remark.   The converse is not true, as the following example shows.


Example 1. Let X be a random variable which satisfies

P{X = −1} = P{X = 0} = P{X = 1} = 1/3,

and let Y = 1 − X². Then E[X] = 0 and XY = 0. Thus

Cov(X, Y) = E[XY] − E[X] E[Y] = 0,

but X and Y are not independent. Indeed, Y is a function of X.
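Because the distribution in Example 1 is finite, both the zero covariance and the failure of independence can be checked exactly. The short script below is an illustrative sketch (not part of the notes) that enumerates the three equally likely values of X using exact rational arithmetic.

```python
from fractions import Fraction

# X takes the values -1, 0, 1, each with probability 1/3; Y = 1 - X^2.
support = [(-1, Fraction(1, 3)), (0, Fraction(1, 3)), (1, Fraction(1, 3))]

E_X  = sum(p * x for x, p in support)
E_Y  = sum(p * (1 - x**2) for x, p in support)
E_XY = sum(p * x * (1 - x**2) for x, p in support)

print("Cov(X, Y) =", E_XY - E_X * E_Y)          # 0

# But X and Y are dependent: P(X = 1, Y = 1) differs from P(X = 1) P(Y = 1).
P_Y1    = sum(p for x, p in support if 1 - x**2 == 1)               # 1/3 (Y = 1 iff X = 0)
P_X1    = Fraction(1, 3)
P_X1_Y1 = sum(p for x, p in support if x == 1 and 1 - x**2 == 1)    # 0
print(P_X1_Y1, "vs", P_X1 * P_Y1)
```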

    Proof of 3.   Assertion  3   follows immediately from the definition of covariance.

Proof of 4.

Cov(aX, Y) = E[aXY] − E[aX] E[Y]
           = a (E[XY] − E[X] E[Y])
           = a Cov(X, Y)

Proof of 5.

Cov(X_1 + X_2, Y) = E[(X_1 + X_2)Y] − E[X_1 + X_2] E[Y]
                  = E[X_1 Y + X_2 Y] − (E[X_1] + E[X_2]) E[Y]
                  = E[X_1 Y] − E[X_1] E[Y] + E[X_2 Y] − E[X_2] E[Y]
                  = Cov(X_1, Y) + Cov(X_2, Y)


Proof of 6. It follows from 5, by induction on m, that for any m ∈ N

Cov(Σ_{j=1}^{m} X_j, Y) = Σ_{j=1}^{m} Cov(X_j, Y).

Using this and 3, we see that for any n ∈ N

Cov(X, Σ_{k=1}^{n} Y_k) = Cov(Σ_{k=1}^{n} Y_k, X)
                        = Σ_{k=1}^{n} Cov(Y_k, X)
                        = Σ_{k=1}^{n} Cov(X, Y_k)

Thus,

Cov(Σ_{j=1}^{m} X_j, Σ_{k=1}^{n} Y_k) = Σ_{j=1}^{m} Cov(X_j, Σ_{k=1}^{n} Y_k)
                                      = Σ_{j=1}^{m} Σ_{k=1}^{n} Cov(X_j, Y_k)
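Property 6 can also be seen numerically on a data set, because the sample covariance (with a common normalization) is bilinear in exactly the same way. The sketch below is illustrative only; the dimensions and the random data are arbitrary choices, not part of the notes.

```python
import numpy as np

def cov(a, b):
    """Sample covariance of two equal-length 1-D arrays (divide-by-N form)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rng = np.random.default_rng(1)
m, n, size = 3, 4, 10_000

X = rng.normal(size=(m, size))                  # rows play the role of X_1, ..., X_m
Y = rng.normal(size=(n, size)) + X.sum(axis=0)  # rows play the role of Y_1, ..., Y_n (correlated with the X_j)

lhs = cov(X.sum(axis=0), Y.sum(axis=0))
rhs = sum(cov(X[j], Y[k]) for j in range(m) for k in range(n))
print(lhs, rhs)   # equal up to floating-point round-off
```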

    3 Variance of a Sum

Proposition. Let X_1, . . . , X_n be random variables defined on the same probability space and all having expectations and variances. Then

Var(Σ_{k=1}^{n} X_k) = Σ_{k=1}^{n} Var(X_k) + 2 Σ_{j<k} Cov(X_j, X_k).


Proof.

Var(Σ_{k=1}^{n} X_k) = Cov(Σ_{j=1}^{n} X_j, Σ_{k=1}^{n} X_k)
                     = Σ_{j=1}^{n} Σ_{k=1}^{n} Cov(X_j, X_k)
                     = Σ_{j=k} Cov(X_j, X_k) + Σ_{j≠k} Cov(X_j, X_k)

The first sum is Σ_{k=1}^{n} Cov(X_k, X_k) = Σ_{k=1}^{n} Var(X_k). Each pair α, β of indices with α ≠ β occurs twice in the second sum: once as (α, β) and once as (β, α). Since the terms Cov(X_α, X_β) and Cov(X_β, X_α) are equal, we have

Var(Σ_{k=1}^{n} X_k) = Σ_{k=1}^{n} Var(X_k) + 2 Σ_{j<k} Cov(X_j, X_k).


Example 4. Suppose we want to estimate the mean height of ND undergraduate males, that is, the arithmetic mean µ of the heights of all the individuals in this population of N ≈ 4000. The parameter µ is called the population mean. We are also interested in estimating the population variance σ², that is, the arithmetic mean of the squares of the deviations of the individual heights from µ.

We select a random sample of size n of them. (Here random means that each of the (N choose n) possible samples of size n is equally likely.) We measure and record the heights of the individuals in the sample.

Now let X_1 be the height of the first person selected, X_2 the height of the second, and so on. We assume that the sampling is done in such a way that we can treat these random variables as independent. Each X_k has mean µ and variance σ². Let

X̄ = (1/n) Σ_{k=1}^{n} X_k.

Now X̄ is called the sample mean; it is the arithmetic mean of the heights of the individuals in the sample. Now,

E[X_k] = µ

for each k, k = 1, . . . , n. Thus,

E[X̄] = E[(1/n) Σ_{k=1}^{n} X_k]
      = (1/n) Σ_{k=1}^{n} E[X_k]
      = (1/n) · n · µ
      = µ

Statisticians would summarize this calculation by saying that X̄ is an unbiased estimator of the population mean µ. This is one of the (many) reasons that we would use the observed value of X̄ as our estimate of the population mean µ.


We also have

Var(X̄) = Var((1/n) Σ_{k=1}^{n} X_k)
        = (1/n²) Var(Σ_{k=1}^{n} X_k)
        = (1/n²) Σ_{k=1}^{n} Var(X_k)     (since the X_k are independent)
        = (1/n²) · n σ²
        = σ²/n,

where σ² is the population variance. Note that the variance of the sample mean decreases as the sample size n increases.
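Both facts, E[X̄] = µ and Var(X̄) = σ²/n, are easy to see in a simulation. The sketch below is purely illustrative: the "population" of roughly 4000 heights is synthetic, and sampling is done with replacement so that the X_k really are independent, as assumed above.

```python
import numpy as np

rng = np.random.default_rng(3)

# A synthetic "population" of N ≈ 4000 heights (inches); the numbers are made up.
population = rng.normal(loc=70.0, scale=3.0, size=4000)
mu, sigma2 = population.mean(), population.var()

n, trials = 25, 20_000
# Draw many independent samples of size n (with replacement) and form X-bar for each.
samples = rng.choice(population, size=(trials, n), replace=True)
xbar = samples.mean(axis=1)

print("mean of X-bar ≈", xbar.mean(), " vs µ      =", mu)
print("Var(X-bar)    ≈", xbar.var(),  " vs σ²/n   =", sigma2 / n)
```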

To estimate the population variance, σ², we would use the sample variance:

S² = (1/(n − 1)) Σ_{k=1}^{n} (X_k − X̄)²

We show that E[S²] = σ².

Now

(n − 1)S² = Σ_{k=1}^{n} (X_k − X̄)²
          = Σ_{k=1}^{n} (X_k − µ + µ − X̄)²
          = Σ_{k=1}^{n} ((X_k − µ) + (µ − X̄))²
          = Σ_{k=1}^{n} [(X_k − µ)² − 2(X_k − µ)(X̄ − µ) + (X̄ − µ)²]
          = Σ_{k=1}^{n} (X_k − µ)² − 2(X̄ − µ) Σ_{k=1}^{n} (X_k − µ) + n(X̄ − µ)²


Since

Σ_{k=1}^{n} (X_k − µ) = Σ_{k=1}^{n} X_k − nµ = nX̄ − nµ,

we have

(n − 1)S² = Σ_{k=1}^{n} (X_k − µ)² − 2n(X̄ − µ)² + n(X̄ − µ)²
          = Σ_{k=1}^{n} (X_k − µ)² − n(X̄ − µ)²

Therefore,

(n − 1) E[S²] = Σ_{k=1}^{n} E[(X_k − µ)²] − n E[(X̄ − µ)²]
              = n σ² − n Var(X̄)
              = n σ² − n · (σ²/n)
              = (n − 1) σ²

Thus the sample variance S² is an unbiased estimator of the population variance σ².
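Numerically, the n − 1 divisor matters: dividing by n instead gives a systematically low estimate of σ². The sketch below is an illustration only (normal data with made-up parameters); in numpy, ddof=1 selects the n − 1 divisor and ddof=0 the n divisor.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 70.0, 3.0, 10, 50_000

# Many independent samples of size n, one per row.
samples = rng.normal(loc=mu, scale=sigma, size=(trials, n))

s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1 (the sample variance S²)
s2_biased   = samples.var(axis=1, ddof=0)   # divide by n

print("mean of S² (n-1 divisor) ≈", s2_unbiased.mean(), " vs σ² =", sigma**2)
print("mean with n divisor      ≈", s2_biased.mean(),
      " ≈ (n-1)/n · σ² =", (n - 1) / n * sigma**2)
```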

    4 Correlation

Let X and Y be random variables defined on the same probability space, and assume that the expectations and variances of X and Y exist.

Definition. The correlation of X and Y is denoted ρ(X, Y) and is defined by

ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)),

provided that Var(X) ≠ 0 ≠ Var(Y).
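A small numerical example of the definition, with arbitrary made-up data (not from the notes): correlation is covariance rescaled to lie between −1 and 1, and numpy exposes the same quantity directly via np.corrcoef.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)

y_pos = 2.0 * x + rng.normal(size=x.size)    # strong positive linear relationship
y_neg = -0.5 * x + rng.normal(size=x.size)   # weaker negative linear relationship

def rho(a, b):
    """Correlation from the definition: Cov(a, b) / sqrt(Var(a) Var(b))."""
    c = np.mean((a - a.mean()) * (b - b.mean()))
    return c / np.sqrt(a.var() * b.var())

print(rho(x, y_pos), np.corrcoef(x, y_pos)[0, 1])   # both ≈ +0.89
print(rho(x, y_neg), np.corrcoef(x, y_neg)[0, 1])   # both ≈ -0.45
```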


We remarked earlier that we can think of the covariance as an inner product. This analogy works most precisely if we think of

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

as the inner product of the random variables X − µ_X and Y − µ_Y, both of which have mean 0. Then the variance is analogous to the square of the length of the vector, and the correlation is analogous to the cosine of the angle between the two vectors.

If we have vectors x and y with x ≠ 0, then there is a scalar a and there is a vector z such that

y = ax + z
z · x = 0

Now |ax| = |cos θ| |y| and |z| = (sin θ)|y|, where θ is the angle between x and y. Thus cos θ is a measure of the strength of the component of y in the direction of x.

Returning to the random variables X and Y, and assuming that Var(X) ≠ 0, we have that x corresponds to X − µ_X and y corresponds to Y − µ_Y. Also cos θ corresponds to ρ(X, Y). A multiple of X − µ_X has the form

a(X − µ_X) = aX − aµ_X
           = aX + b,

a linear function of X. Thus ρ(X, Y) measures the extent to which Y is a linear function of X. (Since ρ(X, Y) = ρ(Y, X), this is also the extent to which X is a linear function of Y.)

Terminology. If ρ(X, Y) > 0, we say that X and Y are positively correlated. If ρ(X, Y) < 0, we say that X and Y are negatively correlated. If ρ(X, Y) = 0, we say that X and Y are uncorrelated.

    Remark.   ρ(X, Y  ) and Cov(X, Y  ) have the same sign.

Example 5. Let A be an event with P(A) > 0. Let I_A be the indicator of A; that is, I_A = 1 if A occurs, and I_A = 0 otherwise. Similarly let B be an event (in the same probability space) with P(B) > 0, and let I_B be the indicator of B. We shall show that

Cov(I_A, I_B) = P(AB) − P(A)P(B).


Solution.

E[I_A] = P(A)
E[I_B] = P(B)
E[I_A I_B] = P(AB)

Thus

Cov(I_A, I_B) = E[I_A I_B] − E[I_A] E[I_B]
              = P(AB) − P(A)P(B)

Remark. Now P(AB) = P(A|B)P(B), so

Cov(I_A, I_B) = P(A|B)P(B) − P(A)P(B)
              = (P(A|B) − P(A)) P(B)

Thus I_A and I_B are positively correlated if and only if P(A|B) > P(A); negatively correlated if and only if P(A|B) < P(A); and uncorrelated if and only if P(A|B) = P(A), i.e., if and only if A and B are independent.

    Note that the argument above is symmetric in  A  and  B .

Example 6. Let's specialize the example above. Consider rolling a fair die twice. Let A be the event that the resulting sum is even and let B be the event that the sum is a perfect square, i.e., 4 or 9.

Cov(I_A, I_B) = E[I_A I_B] − E[I_A] E[I_B]
              = 1/12 − (1/2)(7/36)
              = −1/72

Thus I_A and I_B are negatively correlated. This reflects the fact that if the sum is a perfect square, it is more likely to be 9 than 4.
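Since there are only 36 equally likely outcomes, Example 6 can be verified by brute-force enumeration. A short sketch (not part of the notes), using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # the 36 equally likely rolls
p = Fraction(1, len(outcomes))

A  = [s for s in outcomes if (s[0] + s[1]) % 2 == 0]   # sum is even
B  = [s for s in outcomes if s[0] + s[1] in (4, 9)]    # sum is a perfect square
AB = [s for s in A if s in B]

P_A, P_B, P_AB = p * len(A), p * len(B), p * len(AB)
print("P(A) =", P_A, " P(B) =", P_B, " P(AB) =", P_AB)   # 1/2, 7/36, 1/12
print("Cov(I_A, I_B) =", P_AB - P_A * P_B)               # -1/72
```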
