
  • Random Variables: Expectations and Variances

    Charlie Gibbons, Political Science 236

    September 22, 2008

  • Outline

    1 Expectations
        Of Random Variables
        Of Functions of Random Variables
        Properties
        Conditional Expectations

    2 Dispersion
        Variance
        Conditional Variance
        Covariance and Correlation

  • Expectations of Random Variables

    The expectation, E(X) ≡ µ, of a random variable X is simply the weighted average of the possible realizations, weighted by their probability.

    For discrete random variables, this can be written as

    E(X) = ∑_{x∈X} Pr(X = x) x = ∑_{x∈X} f(x) x.

    For continuous random variables:

    E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{−∞}^{∞} x dF
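  • As a quick numerical sketch (illustrative, not from the slides): the discrete expectation is a probability-weighted sum, and the continuous integral can be approximated by a Riemann sum on a fine grid.

    ```python
    import numpy as np

    # Discrete case: a fair six-sided die, f(x) = 1/6 for x in {1, ..., 6}.
    x = np.arange(1, 7)
    f = np.full(6, 1 / 6)
    print(np.sum(f * x))            # E(X) = 3.5

    # Continuous case: X ~ Uniform(0, 1), so f(x) = 1 on [0, 1].
    # Approximate E(X) = ∫ x f(x) dx by a left Riemann sum.
    dx = 1e-4
    grid = np.arange(0, 1, dx)
    print(np.sum(grid * 1.0 * dx))  # ≈ 0.5
    ```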

  • Expectations of Functions of Random Variables

    The definition of expectation can be generalized for functions of random variables, g(X).

    We have the equations

    E[g(X)] = ∑_{x∈X} f(x) g(x)

    E[g(X)] = ∫_{−∞}^{∞} g(x) dF

    for discrete and continuous random variables respectively.
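  • A one-line check of the discrete formula (illustrative): E[g(X)] weights g(x), not x, by the pmf.

    ```python
    import numpy as np

    # E[g(X)] for g(x) = x**2 on a fair die: sum over x of f(x) * g(x).
    x = np.arange(1, 7)
    f = np.full(6, 1 / 6)
    print(np.sum(f * x**2))   # 91/6 ≈ 15.17, not (3.5)**2 = 12.25
    ```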

  • Properties of Expectations

    Expectations are linear operators, i.e.,

    E(a · g(X) + b · h(X) + c) = a · E[g(X)] + b · E[h(X)] + c.

    Note that, in general, E[g(X)] ≠ g[E(X)].

    Jensen's inequality states that, for a convex function g(x), E[g(X)] ≥ g[E(X)]. For concave functions, the inequality is reversed.
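  • A simulation sketch of Jensen's inequality (the distribution here is illustrative): with the convex g(x) = x², E[g(X)] exceeds g[E(X)].

    ```python
    import numpy as np

    # X ~ Normal(1, 2): E[X^2] = Var(X) + [E(X)]^2 = 4 + 1 = 5, while [E(X)]^2 = 1.
    rng = np.random.default_rng(0)
    X = rng.normal(loc=1.0, scale=2.0, size=200_000)
    print(np.mean(X**2))    # ≈ 5, this is E[g(X)]
    print(np.mean(X)**2)    # ≈ 1, this is g[E(X)]; convexity makes the first larger
    ```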

  • Properties of Expectations

    Expectations preserve monotonicity: if g(x) ≥ h(x) ∀x ∈ X, then E[g(X)] ≥ E[h(X)].

    As a special case, let h(x) = 0. If g(x) ≥ 0, then E[g(X)] ≥ 0.

    Expectations also preserve equality: if g(x) = h(x) ∀x ∈ X, then E[g(X)] = E[h(X)].

  • Conditional Expectations

    The expectation of a random variable Y conditional on (or given) X is defined analogously to the preceding formulations, but uses the conditional distribution fY|X(y|X = x) rather than the unconditional fY(y). Conditional distributions are about changing the population that you are considering.

  • Conditional Expectations

    For discrete random variables:

    E(Y|X = x) = ∑_{y∈Y} Pr(Y = y|X = x) y = ∑_{y∈Y} fY|X(y|X = x) y.

    For continuous random variables:

    E(Y|X = x) = ∫_{−∞}^{∞} y fY|X(y|X = x) dy = ∫_{−∞}^{∞} y dFY|X(y|X = x)
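  • The sketch below computes E(Y | X = x) from a small, hypothetical joint pmf (the numbers are made up for illustration): condition by renormalizing the row for X = x, then take the weighted average of y.

    ```python
    import numpy as np

    # Hypothetical joint pmf over X in {0, 1} (rows) and Y in {0, 1, 2} (columns).
    f_XY = np.array([[0.10, 0.20, 0.10],
                     [0.15, 0.15, 0.30]])
    y = np.array([0, 1, 2])

    x = 1
    f_Y_given_x = f_XY[x] / f_XY[x].sum()   # conditional pmf f_{Y|X}(y | X = x)
    print(np.sum(f_Y_given_x * y))          # E(Y | X = 1) = 1.25
    ```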

  • Conditional Expectations

    Recall from the previous set of slides that, for independent random variables X and Y,

    fY|X(y|X = x) = fY(y) and fX|Y(x|Y = y) = fX(x)

    Hence,

    E(Y|X) = E(Y) and E(X|Y) = E(X)

    This will be a heavily-used result later in the course.

  • Conditional Expectations

    Let's pause and consider what is random here.

    First, there is one conditional distribution of Y given X = x, but a family of distributions of Y given X, since X takes many values and each value imparts its own conditional distribution for Y.

    Y|X = x derives its randomness solely from the fact that Y is random; X is fixed at x here.

    E(Y|X = x) is not random, since X is fixed and we are integrating over all possible values of Y.

    E(Y|X) is random, since X is no longer fixed (i.e., it is random), though Y is integrated over.
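  • A simulation sketch of the distinction (the model is hypothetical): E(Y | X = 1) is a single number, while E(Y | X) is a transformation of X and so has a distribution of its own.

    ```python
    import numpy as np

    # Hypothetical model: X ~ Bernoulli(0.5), Y | X = x ~ Normal(2x, 1).
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=200_000)
    Y = rng.normal(loc=2 * X, scale=1.0)

    print(Y[X == 1].mean())      # E(Y | X = 1) = 2: one fixed number
    E_Y_given_X = 2.0 * X        # E(Y | X) = 2X: a random variable
    print(E_Y_given_X.mean(), E_Y_given_X.var())   # it has a mean (≈ 1) and variance (≈ 1)
    ```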

  • Conditional Expectations

    The unconditional expected value of a function of X and Y, g(x, y), can be written as:

    E[g(X, Y)] = ∑_{x∈X} ∑_{y∈Y} g(x, y) fX,Y(x, y)

    E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dy dx

    for discrete and continuous random variables respectively.

  • Conditional Expectations

    The following holds for the conditional expectation of Y given X:

    E[g(X)h(Y)|X] = g(X) E[h(Y)|X]

    Question: Why does this hold when X is a random variable?
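  • One way to see it (an illustrative simulation, not from the slides): once we condition on X = x, g(X) is the constant g(x) and factors out of the expectation.

    ```python
    import numpy as np

    # Hypothetical model: X uniform on {1, 2, 3}, Y | X ~ Normal(X, 1); g(x) = x**2.
    rng = np.random.default_rng(0)
    X = rng.integers(1, 4, size=300_000)
    Y = rng.normal(loc=X, scale=1.0)

    x = 2
    mask = X == x
    print((X[mask]**2 * Y[mask]).mean())   # E[g(X)h(Y) | X = 2] with h(y) = y
    print(x**2 * Y[mask].mean())           # g(2) * E[h(Y) | X = 2]; both ≈ 8
    ```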

  • Conditional Expectations

    The law of iterated expectations states that

    EY(Y) = EX[E(Y|X)],

    which can be shown by

    EY(Y) = ∫∫ y fX,Y(x, y) dx dy
          = ∫∫ y fY|X(y|x) fX(x) dx dy
          = ∫ [∫ y fY|X(y|x) dy] fX(x) dx
          = ∫ E(Y|X = x) fX(x) dx
          = EX[E(Y|X)]

  • Conditional Expectations

    Note that the LIE only holds when the necessary marginal moments exist.
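  • A simulation sketch of the LIE under the same hypothetical Bernoulli-Normal model used above: averaging the conditional means over the distribution of X recovers E(Y).

    ```python
    import numpy as np

    # X ~ Bernoulli(0.5), Y | X = x ~ Normal(2x, 1), so E(Y | X) = 2X.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=200_000)
    Y = rng.normal(loc=2 * X, scale=1.0)

    print(Y.mean())           # E(Y) directly, ≈ 1.0
    print((2.0 * X).mean())   # E_X[E(Y | X)], also ≈ 1.0
    ```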

  • Variance

    The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of X:

    Var(X) = E[(X − µ)²]

    Multiplying this out yields:

    Var(X) = E(X² − 2µX + µ²)
           = E(X²) − 2µ E(X) + µ²
           = E(X²) − [E(X)]²

    using E(X) = µ in the last step.

    The standard deviation, σ, of a random variable is the square root of its variance; i.e., σ = √Var(X).

    See that Var(aX + b) = a²Var(X).
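  • A numerical check (illustrative) of the second-moment identity and of Var(aX + b) = a²Var(X): the shift b drops out and the scale a enters squared.

    ```python
    import numpy as np

    # X ~ Exponential with mean 2, so Var(X) = 4.
    rng = np.random.default_rng(0)
    X = rng.exponential(scale=2.0, size=200_000)

    print(X.var())                            # direct sample variance, ≈ 4
    print((X**2).mean() - X.mean()**2)        # E(X^2) - [E(X)]^2, same number
    a, b = 3.0, 5.0
    print((a * X + b).var(), a**2 * X.var())  # both ≈ 36; b has no effect
    ```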

  • Conditional Variance

    The conditional variance of Y given X is

    E[(Y − E(Y|X))²|X] = E(Y²|X) − [E(Y|X)]²

    Note that

    Var[g(X)h(Y)|X] = [g(X)]² Var[h(Y)|X]

  • Conditional Variance

    Var(Y) = EY[(Y − EY(Y))²]
           = EY[(Y − EY|X(Y|X) + EY|X(Y|X) − EY(Y))²]
           = EY[(Y − EY|X(Y|X))²] + EY[(EY|X(Y|X) − EY(Y))²]
           = EX[E[(Y − EY|X(Y|X))²|X]] + EX[E[(EY|X(Y|X) − EY(Y))²|X]]
           = EX[Var(Y|X)] + EX[(EY|X(Y|X) − EY(Y))²]
           = EX[Var(Y|X)] + Var[EY|X(Y|X)]

    (The cross term from expanding the square drops out because its iterated expectation is zero.)
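  • A simulation sketch of the decomposition under the same hypothetical model as before: the "within" piece EX[Var(Y|X)] and the "between" piece Var[E(Y|X)] sum to Var(Y).

    ```python
    import numpy as np

    # X ~ Bernoulli(0.5), Y | X = x ~ Normal(2x, 1).
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=200_000)
    Y = rng.normal(loc=2 * X, scale=1.0)

    within = 1.0               # Var(Y | X = x) = 1 for every x
    between = (2.0 * X).var()  # Var[E(Y | X)] = Var(2X) = 1
    print(Y.var(), within + between)   # both ≈ 2.0
    ```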

  • Conditional Variance

    Var(Y) = EX[Var(Y|X)] + Var[EY|X(Y|X)]

    So what?

    Var(Y) ≥ Var[EY|X(Y|X)]

    Conditioning on more X variables yields a (weakly) smaller variance.

    Intuition: Adding more predictors can only make your guess of Y better. If you add something that isn't relevant or too noisy, you can just ignore it (in a regression, e.g., you give that predictor a 0 coefficient).

    See Casella and Berger, pp. 167–168 for a similar, detailed exposition.

  • Covariance and Correlation

    The covariance of random variables X and Y is defined as

    Cov(X, Y) ≡ σXY = EX,Y[(X − EX(X))(Y − EY(Y))]
                    = E(XY) − µXµY

    We have

    Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)

    Note that covariance only measures the linear relationship between two random variables.
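  • An illustrative check of both displayed identities (the data-generating process here is made up):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=200_000)
    Y = 0.5 * X + rng.normal(size=200_000)   # built to covary with X

    cov = (X * Y).mean() - X.mean() * Y.mean()   # E(XY) - mu_X mu_Y
    print(cov, np.cov(X, Y, ddof=0)[0, 1])       # ≈ 0.5 both ways
    a, b = 2.0, -1.0
    print((a * X + b * Y).var(),
          a**2 * X.var() + b**2 * Y.var() + 2 * a * b * cov)   # both ≈ 3.25
    ```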

  • Covariance and Correlation

    The correlation of random variables X and Y is defined as

    ρXY = σXY / (σX σY)

    Correlation is a normalized version of covariance: how big is the covariance relative to the variation in X and Y? Both will have the same sign.
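  • Continuing the same illustrative example: normalizing the covariance by the two standard deviations gives the correlation.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=200_000)
    Y = 0.5 * X + rng.normal(size=200_000)

    cov = (X * Y).mean() - X.mean() * Y.mean()
    print(cov / (X.std() * Y.std()))    # rho ≈ 0.5 / sqrt(1.25) ≈ 0.45
    print(np.corrcoef(X, Y)[0, 1])      # same value from NumPy
    ```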

  • Covariance and Correlation

    If X and Y are independent, then

    E[g(X)h(Y)] = E[g(X)] E[h(Y)]

    Hence, for independent random variables, the covariance is 0 (take g and h to be the identity functions, so that E(XY) = E(X)E(Y)).
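  • A final illustrative simulation: independent draws give zero covariance, while the converse fails, echoing the earlier note that covariance only captures linear relationships.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=200_000)
    Y = rng.normal(size=200_000)    # drawn independently of X

    print((X * Y).mean() - X.mean() * Y.mean())           # ≈ 0: independence kills covariance
    # Converse fails: X**2 is fully dependent on X, yet Cov(X, X^2) = E(X^3) = 0 here.
    print((X * X**2).mean() - X.mean() * (X**2).mean())   # also ≈ 0 despite dependence
    ```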
