Sum of Variances
Transcript of Sum of Variances
8/19/2019 Sum of Variances
Lecture 30: Covariance, Variance of Sums
Friday, April 20
1 Expectation of Products of Independent Random Variables
We shall assume (unless otherwise mentioned) that the random variables we deal with have expectations. We shall use the notation

µ_X = E[X],

where X is a random variable.
Proposition. Let X and Y be independent random variables on the same probability space. Let g and h be functions of one variable such that the random variables g(X) and h(Y) are defined. Then

E[g(X)h(Y)] = E[g(X)] E[h(Y)],

provided that either X and Y are both discrete or X and Y both have densities.
Proof. We first assume that X and Y are both discrete. Let p_X and p_Y be their probability mass functions. Since they are independent, their joint probability mass function is given by p(x, y) = p_X(x) p_Y(y). Thus

E[g(X)h(Y)] = ∑_y ∑_x g(x)h(y) p(x, y)
            = ∑_y ∑_x g(x)h(y) p_X(x) p_Y(y)
            = ∑_y h(y) p_Y(y) ∑_x g(x) p_X(x)
            = ∑_y h(y) p_Y(y) E[g(X)]
            = E[g(X)] E[h(Y)]
Next we assume that X and Y both have densities. Let f_X and f_Y be their densities. Since they are independent, their joint density is given by f(x, y) = f_X(x) f_Y(y). Thus

E[g(X)h(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f(x, y) dx dy
            = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f_X(x) f_Y(y) dx dy
            = ∫_{−∞}^{∞} h(y) f_Y(y) [∫_{−∞}^{∞} g(x) f_X(x) dx] dy
            = E[g(X)] ∫_{−∞}^{∞} h(y) f_Y(y) dy
            = E[g(X)] E[h(Y)]
Corollary. If X and Y are independent random variables with expectations, then
E[XY] = E[X] E[Y].
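The corollary is easy to check numerically. The following is a quick Monte Carlo sketch; the choice of distributions, sample size, and seed are arbitrary illustrations, not from the lecture:

```python
import random

random.seed(0)
n = 200_000

# X ~ Uniform(0, 1) and Y ~ Exponential(1), drawn independently.
xs = [random.random() for _ in range(n)]
ys = [random.expovariate(1.0) for _ in range(n)]

e_x = sum(xs) / n
e_y = sum(ys) / n
e_xy = sum(x * y for x, y in zip(xs, ys)) / n

# By the corollary, E[XY] should be close to E[X]E[Y] (= 1/2 here).
print(e_xy, e_x * e_y)
```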
2 Covariance
Definition. Let X and Y be random variables defined on the same probability space.
Assume that both X and Y have expectations and variances. The covariance of X and Y is defined by
Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)].
Proposition (Some Properties of Covariance). Let X, Y, X_j (j = 1, . . . , m), and Y_k (k = 1, . . . , n) be random variables with expectations and variances, and assume that they are all defined on the same probability space.
1. Cov(X, Y ) = E [XY ] − E [X ]E [Y ].
2. If X and Y are independent, then Cov(X, Y ) = 0.
3. Cov(X, Y ) = Cov(Y, X ).
4. Cov(aX, Y ) = a Cov(X, Y ), for a ∈ R.
5. Cov(X 1 + X 2, Y ) = Cov(X 1, Y ) + Cov(X 2, Y ).
6. Cov(∑_{j=1}^{m} X_j, ∑_{k=1}^{n} Y_k) = ∑_{j=1}^{m} ∑_{k=1}^{n} Cov(X_j, Y_k).
Proof of 1.
Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
          = E[XY − µ_X Y − µ_Y X + µ_X µ_Y]
          = E[XY] − µ_X E[Y] − µ_Y E[X] + µ_X µ_Y
          = E[XY] − E[X] E[Y]
Proof of 2. Assertion 2 follows from 1 because E[XY] = E[X]E[Y] for independent random variables X and Y.
Remark. The converse is not true, as the following example shows.
Example 1. Let X be a random variable which satisfies

P{X = −1} = P{X = 0} = P{X = 1} = 1/3,

and let Y = 1 − X². Then E[X] = 0 and XY = 0. Thus

Cov(X, Y) = E[XY] − E[X]E[Y] = 0,

but X and Y are not independent. Indeed, Y is a function of X.
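The computation in Example 1 can be carried out exactly with Python's fractions module (a small sketch; the variable names are mine):

```python
from fractions import Fraction

# The distribution from Example 1: X is -1, 0, 1 with probability 1/3 each,
# and Y = 1 - X^2 is a deterministic function of X.
p = Fraction(1, 3)
support = [-1, 0, 1]

e_x  = sum(p * x for x in support)
e_y  = sum(p * (1 - x**2) for x in support)
e_xy = sum(p * x * (1 - x**2) for x in support)

cov = e_xy - e_x * e_y
print(cov)   # 0: X and Y are uncorrelated

# Yet X and Y are dependent: P(X=0, Y=0) = 0 but P(X=0)P(Y=0) > 0.
p_x0_y0 = Fraction(0)           # Y = 1 when X = 0, so this event is impossible
p_x0 = p
p_y0 = 2 * p                    # Y = 0 exactly when X = -1 or X = 1
print(p_x0_y0 == p_x0 * p_y0)   # False
```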
Proof of 3. Assertion 3 follows immediately from the definition of covariance.
Proof of 4.

Cov(aX, Y) = E[aXY] − E[aX] E[Y]
           = a E[XY] − a E[X] E[Y]
           = a Cov(X, Y)
Proof of 5.
Cov(X_1 + X_2, Y) = E[(X_1 + X_2) Y] − E[X_1 + X_2] E[Y]
                  = E[X_1 Y + X_2 Y] − (E[X_1] + E[X_2]) E[Y]
                  = E[X_1 Y] + E[X_2 Y] − E[X_1] E[Y] − E[X_2] E[Y]
                  = Cov(X_1, Y) + Cov(X_2, Y)
Proof of 6. It follows from 5, by induction on m, that for any m ∈ N,

Cov(∑_{j=1}^{m} X_j, Y) = ∑_{j=1}^{m} Cov(X_j, Y).

Using this and 3, we see that for any n ∈ N,

Cov(X, ∑_{k=1}^{n} Y_k) = Cov(∑_{k=1}^{n} Y_k, X)
                        = ∑_{k=1}^{n} Cov(Y_k, X)
                        = ∑_{k=1}^{n} Cov(X, Y_k).

Thus,

Cov(∑_{j=1}^{m} X_j, ∑_{k=1}^{n} Y_k) = ∑_{j=1}^{m} Cov(X_j, ∑_{k=1}^{n} Y_k)
                                      = ∑_{j=1}^{m} ∑_{k=1}^{n} Cov(X_j, Y_k)
3 Variance of a Sum
Proposition. Let X_1, . . . , X_n be random variables defined on the same probability space, all having expectations and variances. Then
Var(∑_{k=1}^{n} X_k) = ∑_{k=1}^{n} Var(X_k) + 2 ∑_{j<k} Cov(X_j, X_k).
Proof.

Var(∑_{k=1}^{n} X_k) = Cov(∑_{j=1}^{n} X_j, ∑_{k=1}^{n} X_k)
                     = ∑_{j=1}^{n} ∑_{k=1}^{n} Cov(X_j, X_k)
                     = ∑_{j=k} Cov(X_j, X_k) + ∑_{j≠k} Cov(X_j, X_k)
Each pair, α, β, of indices with α ≠ β occurs twice in the sum: once as (α, β) and once as (β, α). Since the terms Cov(X_α, X_β) and Cov(X_β, X_α) are equal, we have
Var(∑_{k=1}^{n} X_k) = ∑_{k=1}^{n} Var(X_k) + 2 ∑_{j<k} Cov(X_j, X_k).
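The variance-of-a-sum identity can be checked exactly on a small finite probability space. Below is a sketch in Python; the space of four equally likely outcomes and the integer-valued random variables are arbitrary, deliberately dependent choices:

```python
import itertools
import random

random.seed(1)

# A small finite probability space: 4 equally likely outcomes, and three
# arbitrary random variables X_1, X_2, X_3 given by their value tables.
outcomes = range(4)
prob = 1 / 4
X = [[random.randint(0, 9) for _ in outcomes] for _ in range(3)]

def E(vals):
    return sum(prob * v for v in vals)

def cov(a, b):
    return E([x * y for x, y in zip(a, b)]) - E(a) * E(b)

# Left side: the variance of the sum, computed as Cov of the sum with itself.
total = [sum(col) for col in zip(*X)]
lhs = cov(total, total)

# Right side: sum of variances plus twice the pairwise covariances (j < k).
rhs = sum(cov(x, x) for x in X) + 2 * sum(
    cov(X[j], X[k]) for j, k in itertools.combinations(range(3), 2)
)

print(abs(lhs - rhs) < 1e-9)
```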
Example 4. Suppose we want to estimate the mean height of ND undergraduate males, that is, the arithmetic mean µ of the heights of all the individuals in this population of N ≈ 4000. The parameter µ is called the population mean. We are also interested in estimating the population variance σ², that is, the arithmetic mean of the squares of the deviations of the individual heights from µ.

We select a random sample of size n of them. (Here random means that each of the (N choose n) possible samples of size n is equally likely.) We measure and record the heights of the individuals in the sample.
Now let X_1 be the height of the first person selected, X_2 the height of the second, and so on. We assume that the sampling is done in such a way that we can treat these random variables as independent. Each X_k has mean µ and variance σ². Let

X̄ = (1/n) ∑_{k=1}^{n} X_k.
Now X̄ is called the sample mean; it is the arithmetic mean of the heights of the individuals in the sample. Now,
E[X_k] = µ

for each k = 1, . . . , n. Thus,
E[X̄] = E[(1/n) ∑_{k=1}^{n} X_k]
      = (1/n) ∑_{k=1}^{n} E[X_k]
      = (1/n) · n · µ
      = µ
Statisticians would summarize this calculation by saying that X̄ is an unbiased estimator of the population mean µ. This is one of the (many) reasons that we would use the observed value of X̄ as our estimate of the population mean µ.
We also have

Var(X̄) = Var((1/n) ∑_{k=1}^{n} X_k)
        = (1/n²) Var(∑_{k=1}^{n} X_k)
        = (1/n²) ∑_{k=1}^{n} Var(X_k)     (the X_k are independent, so the covariances vanish)
        = (1/n²) · n · σ²
        = σ²/n,

where σ² is the population variance. Note that the variance of the sample mean decreases as the sample size n increases.
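A simulation makes both results concrete. The sketch below assumes a hypothetical normally distributed population with µ = 70 and σ = 3; these numbers are illustrative, not from the lecture:

```python
import random
import statistics

random.seed(2)
mu, sigma = 70.0, 3.0   # hypothetical population mean and sd (inches)
n, reps = 25, 20_000

# Draw many independent samples of size n; record the sample mean of each.
xbars = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]

# E[X-bar] = mu, and Var(X-bar) = sigma^2 / n.
print(statistics.fmean(xbars))
print(statistics.variance(xbars))
```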
To estimate the population variance, σ², we would use the sample variance:

S² = (1/(n − 1)) ∑_{k=1}^{n} (X_k − X̄)².

We show that E[S²] = σ².
Now

(n − 1)S² = ∑_{k=1}^{n} (X_k − X̄)²
          = ∑_{k=1}^{n} (X_k − µ + µ − X̄)²
          = ∑_{k=1}^{n} [(X_k − µ) + (µ − X̄)]²
          = ∑_{k=1}^{n} [(X_k − µ)² − 2(X_k − µ)(X̄ − µ) + (X̄ − µ)²]
          = ∑_{k=1}^{n} (X_k − µ)² − 2(X̄ − µ) ∑_{k=1}^{n} (X_k − µ) + n(X̄ − µ)²
Since

∑_{k=1}^{n} (X_k − µ) = ∑_{k=1}^{n} X_k − nµ = n X̄ − nµ,
we have
(n − 1)S² = ∑_{k=1}^{n} (X_k − µ)² − 2n(X̄ − µ)² + n(X̄ − µ)²
          = ∑_{k=1}^{n} (X_k − µ)² − n(X̄ − µ)²
Therefore,

(n − 1) E[S²] = ∑_{k=1}^{n} E[(X_k − µ)²] − n E[(X̄ − µ)²]
             = nσ² − n Var(X̄)
             = nσ² − n · (σ²/n)
             = (n − 1)σ²
Thus the sample variance S 2 is an unbiased estimator of the population variance σ2.
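The role of the n − 1 divisor can be seen in a simulation: dividing by n − 1 centers the estimates on σ², while dividing by n centers them on (n − 1)σ²/n. The population parameters below are arbitrary choices, not from the lecture:

```python
import random
import statistics

random.seed(3)
mu, sigma2 = 0.0, 4.0   # arbitrary population mean and variance
n, reps = 10, 40_000

s2_unbiased, s2_biased = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    s2_unbiased.append(ss / (n - 1))   # the sample variance S^2
    s2_biased.append(ss / n)           # the n-divisor alternative

# S^2 averages near sigma2 = 4; the n-divisor version near (n-1)/n * sigma2 = 3.6.
print(statistics.fmean(s2_unbiased))
print(statistics.fmean(s2_biased))
```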
4 Correlation
Let X and Y be random variables defined on the same probability space, and assume that the expectations and variances of X and Y exist.
Definition. The correlation of X and Y is denoted ρ(X, Y) and is defined by

ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)),

provided that Var(X) ≠ 0 ≠ Var(Y).
We remarked earlier that we can think of the covariance as an inner product. This analogy works most precisely if we think of

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

as the inner product of the random variables X − µ_X and Y − µ_Y, both of which have mean 0. Then the variance is analogous to the square of the length of the vector, and the correlation is analogous to the cosine of the angle between the two vectors.
If we have vectors x and y with x ≠ 0, then there is a scalar a and there is a vector z such that

y = ax + z
z · x = 0

Now |ax| = |cos θ| |y| and |z| = sin θ |y|, where θ is the angle between x and y. Thus cos θ is a measure of the strength of the component of y in the direction of x.
Returning to the random variables X and Y, and assuming that Var(X) ≠ 0, we have that x corresponds to X − µ_X and y corresponds to Y − µ_Y. Also cos θ corresponds to ρ(X, Y). A multiple of X − µ_X has the form

a(X − µ_X) = aX − aµ_X = aX + b,

which is a linear function of X. Thus ρ(X, Y) measures the extent to which Y is a linear function of X. (Since ρ(X, Y) = ρ(Y, X), this is also the extent to which X is a linear function of Y.)
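One consequence worth checking: when Y is exactly a linear function of X with positive slope, ρ(X, Y) = 1. The sketch below verifies ρ² = 1 in exact arithmetic for a small example of my own; the distribution and the coefficients 3 and 2 are arbitrary:

```python
from fractions import Fraction

# X uniform on {1, 2, 3, 4}; Y = 3X + 2 is an exact linear function of X.
support = [1, 2, 3, 4]
p = Fraction(1, 4)

def E(f):
    return sum(p * f(x) for x in support)

mu_x = E(lambda x: x)
mu_y = E(lambda x: 3 * x + 2)
cov = E(lambda x: (x - mu_x) * (3 * x + 2 - mu_y))
var_x = E(lambda x: (x - mu_x) ** 2)
var_y = E(lambda x: (3 * x + 2 - mu_y) ** 2)

# rho^2 = Cov^2 / (Var(X) Var(Y)); squaring avoids a square root,
# so the whole check stays in exact rational arithmetic.
print(cov ** 2 == var_x * var_y)   # True, and Cov > 0, so rho = +1 here
```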
Terminology. If ρ(X, Y) > 0, we say that X and Y are positively correlated. If ρ(X, Y) < 0, we say that X and Y are negatively correlated. If ρ(X, Y) = 0, we say that X and Y are uncorrelated.
Remark. ρ(X, Y ) and Cov(X, Y ) have the same sign.
Example 5. Let A be an event with P(A) > 0. Let I_A be the indicator of A; that is, I_A = 1 if A occurs, and I_A = 0 otherwise. Similarly let B be an event (in the same probability space) with P(B) > 0, and let I_B be the indicator of B. We shall show that

Cov(I_A, I_B) = P(AB) − P(A)P(B).
Solution.
E[I_A] = P(A)
E[I_B] = P(B)
E[I_A I_B] = P(AB)
Thus
Cov(I_A, I_B) = E[I_A I_B] − E[I_A] E[I_B]
             = P(AB) − P(A)P(B)
Remark. Now P(AB) = P(A|B)P(B), so

Cov(I_A, I_B) = P(A|B)P(B) − P(A)P(B)
             = (P(A|B) − P(A)) P(B)
Thus I_A and I_B are positively correlated if and only if P(A|B) > P(A); negatively correlated if and only if P(A|B) < P(A); and uncorrelated if and only if P(A|B) = P(A), i.e., if and only if A and B are independent.
Note that the argument above is symmetric in A and B .
Example 6. Let's specialize the example above. Consider rolling a fair die twice. Let A be the event that the resulting sum is even and let B be the event that the sum is a perfect square, i.e., 4 or 9.
Cov(I_A, I_B) = E[I_A I_B] − E[I_A] E[I_B]
             = 1/12 − (1/2)(7/36)
             = −1/72

Thus I_A and I_B are negatively correlated. This reflects the fact that if the sum is a perfect square, it is more likely to be 9 than 4.
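This covariance can be verified by enumerating all 36 outcomes in exact rational arithmetic (a sketch; the event definitions follow Example 6):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling a fair die twice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def indicator(event):
    return [1 if event(a, b) else 0 for a, b in outcomes]

I_A = indicator(lambda a, b: (a + b) % 2 == 0)   # A: the sum is even
I_B = indicator(lambda a, b: a + b in (4, 9))    # B: the sum is a perfect square

def E(vals):
    return sum(p * v for v in vals)

cov = E([x * y for x, y in zip(I_A, I_B)]) - E(I_A) * E(I_B)
print(cov)   # -1/72
```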