Sum of Variances


    Lecture 30: Covariance, Variance of Sums

    Friday, April 20

1 Expectation of Products of Independent Random Variables

We shall assume (unless otherwise mentioned) that the random variables we deal with have expectations. We shall use the notation

µ_X = E[X],

where X is a random variable.

Proposition. Let X and Y be independent random variables on the same probability space. Let g and h be functions of one variable such that the random variables g(X) and h(Y) are defined. Then

E[g(X)h(Y)] = E[g(X)] E[h(Y)],

provided that either X and Y are both discrete or X and Y both have densities.

Proof. We first assume that X and Y are both discrete. Let p_X and p_Y be their probability mass functions. Since they are independent, their joint probability mass function is given by p(x, y) = p_X(x) p_Y(y). Thus

E[g(X)h(Y)] = Σ_y Σ_x g(x)h(y) p(x, y)
            = Σ_y Σ_x g(x)h(y) p_X(x) p_Y(y)
            = Σ_y h(y) p_Y(y) Σ_x g(x) p_X(x)
            = Σ_y h(y) p_Y(y) E[g(X)]
            = E[g(X)] E[h(Y)]


Next we assume that X and Y both have densities. Let f_X and f_Y be their densities. Since they are independent, their joint density is given by f(x, y) = f_X(x) f_Y(y). Thus

E[g(X)h(Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f(x, y) dx dy
            = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x)h(y) f_X(x) f_Y(y) dx dy
            = ∫_{−∞}^{∞} h(y) f_Y(y) ( ∫_{−∞}^{∞} g(x) f_X(x) dx ) dy
            = E[g(X)] ∫_{−∞}^{∞} h(y) f_Y(y) dy
            = E[g(X)] E[h(Y)]

Corollary. If X and Y are independent random variables with expectations, then

E[XY] = E[X] E[Y].
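A quick numerical sanity check of the corollary is easy to run. The sketch below is illustrative only (it is not part of the notes); the choice of distributions is arbitrary, and any pair of independent distributions with finite means would serve. It draws a large independent sample and compares the empirical mean of XY with the product of the empirical means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Independent X ~ Exponential(1) and Y ~ Uniform(0, 3): arbitrary choices.
x = rng.exponential(scale=1.0, size=n)
y = rng.uniform(0.0, 3.0, size=n)

print("E[XY]    ≈", np.mean(x * y))              # ≈ 1 · 1.5 = 1.5
print("E[X]E[Y] ≈", np.mean(x) * np.mean(y))     # ≈ 1.5 as well
```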


    2 Covariance

Definition. Let X and Y be random variables defined on the same probability space. Assume that both X and Y have expectations and variances. The covariance of X and Y is defined by

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)].

Proposition (Some Properties of Covariance). Let X, Y, X_j, j = 1, …, m, and Y_k, k = 1, …, n, be random variables with expectations and variances, and assume that they are all defined on the same probability space.

1. Cov(X, Y) = E[XY] − E[X] E[Y].

2. If X and Y are independent, then Cov(X, Y) = 0.

3. Cov(X, Y) = Cov(Y, X).

4. Cov(aX, Y) = a Cov(X, Y) for a ∈ R.

5. Cov(X_1 + X_2, Y) = Cov(X_1, Y) + Cov(X_2, Y).

6. Cov(Σ_{j=1}^{m} X_j, Σ_{k=1}^{n} Y_k) = Σ_{j=1}^{m} Σ_{k=1}^{n} Cov(X_j, Y_k).

Proof of 1.

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
          = E[XY − µ_X Y − µ_Y X + µ_X µ_Y]
          = E[XY] − µ_X E[Y] − µ_Y E[X] + µ_X µ_Y
          = E[XY] − E[X] E[Y]

Proof of 2. Assertion 2 follows from 1 because E[XY] = E[X] E[Y] for independent random variables X and Y.

    Remark.   The converse is not true, as the following example shows.


Example 1. Let X be a random variable which satisfies

P{X = −1} = P{X = 0} = P{X = 1} = 1/3,

and let Y = 1 − X². Then E[X] = 0 and XY = 0. Thus

Cov(X, Y) = E[XY] − E[X] E[Y] = 0,

but X and Y are not independent. Indeed, Y is a function of X.
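Because the distribution in Example 1 is finite, both the zero covariance and the failure of independence can be checked exactly. The short script below is an illustrative sketch (not part of the notes) that enumerates the three equally likely values of X using exact rational arithmetic.

```python
from fractions import Fraction

# X takes the values -1, 0, 1, each with probability 1/3; Y = 1 - X^2.
support = [(-1, Fraction(1, 3)), (0, Fraction(1, 3)), (1, Fraction(1, 3))]

E_X  = sum(p * x for x, p in support)
E_Y  = sum(p * (1 - x**2) for x, p in support)
E_XY = sum(p * x * (1 - x**2) for x, p in support)

print("Cov(X, Y) =", E_XY - E_X * E_Y)          # 0

# But X and Y are dependent: P(X = 1, Y = 1) differs from P(X = 1) P(Y = 1).
P_Y1    = sum(p for x, p in support if 1 - x**2 == 1)               # 1/3 (Y = 1 iff X = 0)
P_X1    = Fraction(1, 3)
P_X1_Y1 = sum(p for x, p in support if x == 1 and 1 - x**2 == 1)    # 0
print(P_X1_Y1, "vs", P_X1 * P_Y1)
```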

    Proof of 3.   Assertion  3   follows immediately from the definition of covariance.

Proof of 4.

Cov(aX, Y) = E[aXY] − E[aX] E[Y]
           = a (E[XY] − E[X] E[Y])
           = a Cov(X, Y)

Proof of 5.

Cov(X_1 + X_2, Y) = E[(X_1 + X_2)Y] − E[X_1 + X_2] E[Y]
                  = E[X_1 Y + X_2 Y] − (E[X_1] + E[X_2]) E[Y]
                  = E[X_1 Y] − E[X_1] E[Y] + E[X_2 Y] − E[X_2] E[Y]
                  = Cov(X_1, Y) + Cov(X_2, Y)


Proof of 6. It follows from 5, by induction on m, that for any m ∈ N

Cov(Σ_{j=1}^{m} X_j, Y) = Σ_{j=1}^{m} Cov(X_j, Y).

Using this and 3, we see that for any n ∈ N

Cov(X, Σ_{k=1}^{n} Y_k) = Cov(Σ_{k=1}^{n} Y_k, X)
                        = Σ_{k=1}^{n} Cov(Y_k, X)
                        = Σ_{k=1}^{n} Cov(X, Y_k)

Thus,

Cov(Σ_{j=1}^{m} X_j, Σ_{k=1}^{n} Y_k) = Σ_{j=1}^{m} Cov(X_j, Σ_{k=1}^{n} Y_k)
                                      = Σ_{j=1}^{m} Σ_{k=1}^{n} Cov(X_j, Y_k)
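Property 6 can also be seen numerically on a data set, because the sample covariance (with a common normalization) is bilinear in exactly the same way. The sketch below is illustrative only; the dimensions and the random data are arbitrary choices, not part of the notes.

```python
import numpy as np

def cov(a, b):
    """Sample covariance of two equal-length 1-D arrays (divide-by-N form)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rng = np.random.default_rng(1)
m, n, size = 3, 4, 10_000

X = rng.normal(size=(m, size))                  # rows play the role of X_1, ..., X_m
Y = rng.normal(size=(n, size)) + X.sum(axis=0)  # rows play the role of Y_1, ..., Y_n (correlated with the X_j)

lhs = cov(X.sum(axis=0), Y.sum(axis=0))
rhs = sum(cov(X[j], Y[k]) for j in range(m) for k in range(n))
print(lhs, rhs)   # equal up to floating-point round-off
```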

    3 Variance of a Sum

Proposition. Let X_1, . . . , X_n be random variables defined on the same probability space and all having expectations and variances. Then

Var(Σ_{k=1}^{n} X_k) = Σ_{k=1}^{n} Var(X_k) + 2 Σ_{j<k} Cov(X_j, X_k).


Proof.

Var(Σ_{k=1}^{n} X_k) = Cov(Σ_{j=1}^{n} X_j, Σ_{k=1}^{n} X_k)
                     = Σ_{j=1}^{n} Σ_{k=1}^{n} Cov(X_j, X_k)
                     = Σ_{j=k} Cov(X_j, X_k) + Σ_{j≠k} Cov(X_j, X_k)

The first sum is Σ_{k=1}^{n} Cov(X_k, X_k) = Σ_{k=1}^{n} Var(X_k). Each pair α, β of indices with α ≠ β occurs twice in the second sum: once as (α, β) and once as (β, α). Since the terms Cov(X_α, X_β) and Cov(X_β, X_α) are equal, we have

Var(Σ_{k=1}^{n} X_k) = Σ_{k=1}^{n} Var(X_k) + 2 Σ_{j<k} Cov(X_j, X_k).


Example 4. Suppose we want to estimate the mean height of ND undergraduate males, that is, the arithmetic mean µ of the heights of all the individuals in this population of N ≈ 4000. The parameter µ is called the population mean. We are also interested in estimating the population variance σ², that is, the arithmetic mean of the squares of the deviations of the individual heights from µ.

We select a random sample of size n of them. (Here random means that each of the (N choose n) possible samples of size n is equally likely.) We measure and record the heights of the individuals in the sample.

Now let X_1 be the height of the first person selected, X_2 the height of the second, and so on. We assume that the sampling is done in such a way that we can treat these random variables as independent. Each X_k has mean µ and variance σ². Let

X̄ = (1/n) Σ_{k=1}^{n} X_k.

Now X̄ is called the sample mean; it is the arithmetic mean of the heights of the individuals in the sample. Now,

E[X_k] = µ

for each k, k = 1, . . . , n. Thus,

E[X̄] = E[(1/n) Σ_{k=1}^{n} X_k]
      = (1/n) Σ_{k=1}^{n} E[X_k]
      = (1/n) · n · µ
      = µ

Statisticians would summarize this calculation by saying that X̄ is an unbiased estimator of the population mean µ. This is one of the (many) reasons that we would use the observed value of X̄ as our estimate of the population mean µ.


We also have

Var(X̄) = Var((1/n) Σ_{k=1}^{n} X_k)
        = (1/n²) Var(Σ_{k=1}^{n} X_k)
        = (1/n²) Σ_{k=1}^{n} Var(X_k)     (since the X_k are independent)
        = (1/n²) · n σ²
        = σ²/n,

where σ² is the population variance. Note that the variance of the sample mean decreases as the sample size n increases.
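Both facts, E[X̄] = µ and Var(X̄) = σ²/n, are easy to see in a simulation. The sketch below is purely illustrative: the "population" of roughly 4000 heights is synthetic, and sampling is done with replacement so that the X_k really are independent, as assumed above.

```python
import numpy as np

rng = np.random.default_rng(3)

# A synthetic "population" of N ≈ 4000 heights (inches); the numbers are made up.
population = rng.normal(loc=70.0, scale=3.0, size=4000)
mu, sigma2 = population.mean(), population.var()

n, trials = 25, 20_000
# Draw many independent samples of size n (with replacement) and form X-bar for each.
samples = rng.choice(population, size=(trials, n), replace=True)
xbar = samples.mean(axis=1)

print("mean of X-bar ≈", xbar.mean(), " vs µ      =", mu)
print("Var(X-bar)    ≈", xbar.var(),  " vs σ²/n   =", sigma2 / n)
```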

To estimate the population variance, σ², we would use the sample variance:

S² = (1/(n − 1)) Σ_{k=1}^{n} (X_k − X̄)²

We show that E[S²] = σ².

Now

(n − 1)S² = Σ_{k=1}^{n} (X_k − X̄)²
          = Σ_{k=1}^{n} (X_k − µ + µ − X̄)²
          = Σ_{k=1}^{n} ((X_k − µ) + (µ − X̄))²
          = Σ_{k=1}^{n} [(X_k − µ)² − 2(X_k − µ)(X̄ − µ) + (X̄ − µ)²]
          = Σ_{k=1}^{n} (X_k − µ)² − 2(X̄ − µ) Σ_{k=1}^{n} (X_k − µ) + n(X̄ − µ)²


Since

Σ_{k=1}^{n} (X_k − µ) = Σ_{k=1}^{n} X_k − nµ = nX̄ − nµ,

we have

(n − 1)S² = Σ_{k=1}^{n} (X_k − µ)² − 2n(X̄ − µ)² + n(X̄ − µ)²
          = Σ_{k=1}^{n} (X_k − µ)² − n(X̄ − µ)²

Therefore,

(n − 1) E[S²] = Σ_{k=1}^{n} E[(X_k − µ)²] − n E[(X̄ − µ)²]
              = n σ² − n Var(X̄)
              = n σ² − n · (σ²/n)
              = (n − 1) σ²

Thus the sample variance S² is an unbiased estimator of the population variance σ².
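Numerically, the n − 1 divisor matters: dividing by n instead gives a systematically low estimate of σ². The sketch below is an illustration only (normal data with made-up parameters); in numpy, ddof=1 selects the n − 1 divisor and ddof=0 the n divisor.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, trials = 70.0, 3.0, 10, 50_000

# Many independent samples of size n, one per row.
samples = rng.normal(loc=mu, scale=sigma, size=(trials, n))

s2_unbiased = samples.var(axis=1, ddof=1)   # divide by n - 1 (the sample variance S²)
s2_biased   = samples.var(axis=1, ddof=0)   # divide by n

print("mean of S² (n-1 divisor) ≈", s2_unbiased.mean(), " vs σ² =", sigma**2)
print("mean with n divisor      ≈", s2_biased.mean(),
      " ≈ (n-1)/n · σ² =", (n - 1) / n * sigma**2)
```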

    4 Correlation

Let X and Y be random variables defined on the same probability space, and assume that the expectations and variances of X and Y exist.

Definition. The correlation of X and Y is denoted ρ(X, Y) and is defined by

ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)),

provided that Var(X) ≠ 0 ≠ Var(Y).
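A small numerical example of the definition, with arbitrary made-up data (not from the notes): correlation is covariance rescaled to lie between −1 and 1, and numpy exposes the same quantity directly via np.corrcoef.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)

y_pos = 2.0 * x + rng.normal(size=x.size)    # strong positive linear relationship
y_neg = -0.5 * x + rng.normal(size=x.size)   # weaker negative linear relationship

def rho(a, b):
    """Correlation from the definition: Cov(a, b) / sqrt(Var(a) Var(b))."""
    c = np.mean((a - a.mean()) * (b - b.mean()))
    return c / np.sqrt(a.var() * b.var())

print(rho(x, y_pos), np.corrcoef(x, y_pos)[0, 1])   # both ≈ +0.89
print(rho(x, y_neg), np.corrcoef(x, y_neg)[0, 1])   # both ≈ -0.45
```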


We remarked earlier that we can think of the covariance as an inner product. This analogy works most precisely if we think of

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

as the inner product of the random variables X − µ_X and Y − µ_Y, both of which have mean 0. Then the variance is analogous to the square of the length of the vector, and the correlation is analogous to the cosine of the angle between the two vectors.

If we have vectors x and y with x ≠ 0, then there is a scalar a and there is a vector z such that

y = ax + z
z · x = 0

Now |ax| = |cos θ| |y| and |z| = (sin θ)|y|, where θ is the angle between x and y. Thus cos θ is a measure of the strength of the component of y in the direction of x.

Returning to the random variables X and Y, and assuming that Var(X) ≠ 0, we have that x corresponds to X − µ_X and y corresponds to Y − µ_Y. Also cos θ corresponds to ρ(X, Y). A multiple of X − µ_X has the form

a(X − µ_X) = aX − aµ_X
           = aX + b,

a linear function of X. Thus ρ(X, Y) measures the extent to which Y is a linear function of X. (Since ρ(X, Y) = ρ(Y, X), this is also the extent to which X is a linear function of Y.)

Terminology. If ρ(X, Y) > 0, we say that X and Y are positively correlated. If ρ(X, Y) < 0, we say that X and Y are negatively correlated. If ρ(X, Y) = 0, we say that X and Y are uncorrelated.

    Remark.   ρ(X, Y  ) and Cov(X, Y  ) have the same sign.

Example 5. Let A be an event with P(A) > 0. Let I_A be the indicator of A; that is, I_A = 1 if A occurs, and I_A = 0 otherwise. Similarly let B be an event (in the same probability space) with P(B) > 0, and let I_B be the indicator of B. We shall show that

Cov(I_A, I_B) = P(AB) − P(A)P(B).


Solution.

E[I_A] = P(A)
E[I_B] = P(B)
E[I_A I_B] = P(AB)

Thus

Cov(I_A, I_B) = E[I_A I_B] − E[I_A] E[I_B]
              = P(AB) − P(A)P(B)

Remark. Now P(AB) = P(A|B)P(B), so

Cov(I_A, I_B) = P(A|B)P(B) − P(A)P(B)
              = (P(A|B) − P(A)) P(B)

Thus I_A and I_B are positively correlated if and only if P(A|B) > P(A); negatively correlated if and only if P(A|B) < P(A); and uncorrelated if and only if P(A|B) = P(A), i.e., if and only if A and B are independent.

    Note that the argument above is symmetric in  A  and  B .

Example 6. Let's specialize the example above. Consider rolling a fair die twice. Let A be the event that the resulting sum is even and let B be the event that the sum is a perfect square, i.e., 4 or 9.

Cov(I_A, I_B) = E[I_A I_B] − E[I_A] E[I_B]
              = 1/12 − (1/2)(7/36)
              = −1/72

Thus I_A and I_B are negatively correlated. This reflects the fact that if the sum is a perfect square, it is more likely to be 9 than 4.
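Since there are only 36 equally likely outcomes, Example 6 can be verified by brute-force enumeration. A short sketch (not part of the notes), using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # the 36 equally likely rolls
p = Fraction(1, len(outcomes))

A  = [s for s in outcomes if (s[0] + s[1]) % 2 == 0]   # sum is even
B  = [s for s in outcomes if s[0] + s[1] in (4, 9)]    # sum is a perfect square
AB = [s for s in A if s in B]

P_A, P_B, P_AB = p * len(A), p * len(B), p * len(AB)
print("P(A) =", P_A, " P(B) =", P_B, " P(AB) =", P_AB)   # 1/2, 7/36, 1/12
print("Cov(I_A, I_B) =", P_AB - P_A * P_B)               # -1/72
```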
