Random Variables: Expectations and Variances
Charlie Gibbons
Political Science 236
September 22, 2008
-
Outline
1 Expectations
   Of Random Variables
   Of Functions of Random Variables
   Properties
   Conditional Expectations
2 Dispersion
   Variance
   Conditional Variance
   Covariance and Correlation
-
Expectations of Random Variables
The expectation, E(X) ≡ µ, of a random variable X is simply the weighted average of the possible realizations, weighted by their probability.
For discrete random variables, this can be written as
E(X) = ∑_{x∈X} Pr(X = x) · x = ∑_{x∈X} f(x) · x.
For continuous random variables:
E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{−∞}^{∞} x dF
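As a quick numerical check (a sketch in Python, not part of the original slides, with a made-up distribution), the discrete case is an explicit weighted sum and the continuous case can be approximated by Monte Carlo, where the sample mean converges to the integral:

import numpy as np

# Discrete: X takes values 1, 2, 3 with probabilities 0.2, 0.5, 0.3.
values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])
print(np.sum(probs * values))          # E(X) = sum of f(x)·x = 2.1

# Continuous: X ~ N(5, 1); the sample mean approximates ∫ x f(x) dx = 5.
rng = np.random.default_rng(0)
draws = rng.normal(loc=5.0, scale=1.0, size=100_000)
print(draws.mean())                    # ≈ 5.0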
-
Expectations of Functions of Random Variables
The definition of expectation can be generalized for functions of random variables, g(X).
We have the equations
E[g(X)] = ∑_{x∈X} f(x) g(x)
E[g(X)] = ∫_{−∞}^{∞} g(x) dF
for discrete and continuous random variables, respectively.
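The same recipe computes E[g(X)] without ever deriving the distribution of g(X). A sketch continuing the hypothetical discrete example above (not from the slides), with g(x) = x²:

import numpy as np

values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])
g = lambda x: x ** 2
# Weight g(x), not x, by f(x): E[g(X)] = 0.2·1 + 0.5·4 + 0.3·9 = 4.9
print(np.sum(probs * g(values)))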
-
Properties of Expectations
Expectations are linear operators, i.e.,
E(a · g(X) + b · h(X) + c) = a · E[g(X)] + b · E[h(X)] + c.
Note that, in general, E[g(X)] ≠ g[E(X)].
Jensen’s inequality states that, for a convex function g, E[g(X)] ≥ g[E(X)]. For concave functions, the inequality is reversed.
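A quick check of Jensen’s inequality on the same hypothetical distribution (a sketch, not from the slides): g(x) = x² is convex, so E[g(X)] should be at least g[E(X)]:

import numpy as np

values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])
E_X = np.sum(probs * values)         # E(X) = 2.1
E_g = np.sum(probs * values ** 2)    # E(X²) = 4.9
print(E_g >= E_X ** 2)               # True: 4.9 ≥ 2.1² = 4.41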
-
Properties of Expectations
Expectations preserve monotonicity: if g(x) ≥ h(x) ∀ x ∈ X, then E[g(X)] ≥ E[h(X)].
As a special case, let h(x) = 0. If g(x) ≥ 0, then E[g(X)] ≥ 0.
Expectations also preserve equality: if g(x) = h(x) ∀ x ∈ X, then E[g(X)] = E[h(X)].
-
Conditional Expectations
The expectation of a random variable Y conditional on, or given, X is defined analogously to the preceding formulations, but uses the conditional distribution f_{Y|X}(y|X = x) rather than the unconditional f_Y(y).
Conditional distributions are about changing the population that you are considering.
-
Conditional Expectations
Conditional distributions are about changing the population that you are considering.
For discrete random variables:
E(Y|X = x) = ∑_{y∈Y} Pr(Y = y|X = x) · y = ∑_{y∈Y} f_{Y|X}(y|x) · y.
For continuous random variables:
E(Y|X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y|X = x) dy = ∫_{−∞}^{∞} y dF_{Y|X}(y|x)
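A small discrete illustration (a sketch with a made-up joint pmf, not from the slides): conditioning on X = x means renormalizing the corresponding slice of the joint distribution and averaging Y over it:

import numpy as np

# Joint pmf f_{X,Y}: rows are X ∈ {0, 1}, columns are Y ∈ {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.15, 0.30]])
y_vals = np.array([0.0, 1.0, 2.0])

# f_{Y|X}(y | X = 1): take the X = 1 row and renormalize it.
f_cond = joint[1] / joint[1].sum()
print(np.sum(f_cond * y_vals))       # E(Y | X = 1) = 1.25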
-
Conditional Expectations
Recall from the previous set of slides that, for independent random variables X and Y,
f_{Y|X}(y|X = x) = f_Y(y) and f_{X|Y}(x|Y = y) = f_X(x)
Hence,
E(Y|X) = E(Y) and E(X|Y) = E(X)
This will be a heavily used result later in the course.
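A simulation sketch of this result (not from the slides): when Y is generated independently of X, the average of Y within each X group is just E(Y):

import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=200_000)         # X ∈ {0, 1}
y = rng.normal(size=200_000)                 # Y drawn independently of X
print(y[x == 0].mean(), y[x == 1].mean())    # both ≈ E(Y) = 0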
-
Conditional Expectations
Let’s pause and consider what is random here.
First, there is one conditional distribution of Y given X = x, but a family of distributions of Y given X, since X takes many values and each value imparts its own conditional distribution for Y.
Y|X = x derives its randomness solely from the fact that Y is random; X is fixed at x here.
E(Y|X = x) is not random, since X is fixed and we are integrating over all possible values of Y.
E(Y|X) is random, since X is no longer fixed (i.e., it is random), though Y is integrated over.
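The distinction can be made concrete by simulation (a sketch with a made-up data-generating process, not from the slides): each E(Y | X = x) is a fixed number, while E(Y | X) inherits the randomness of X:

import numpy as np

rng = np.random.default_rng(2)
x = rng.integers(1, 4, size=500_000)      # X ∈ {1, 2, 3}
y = 2.0 * x + rng.normal(size=x.size)     # Y | X = x ~ N(2x, 1)

# E(Y | X = x): one fixed number per value x.
cond_mean = {v: y[x == v].mean() for v in (1, 2, 3)}
print(cond_mean)                          # ≈ {1: 2.0, 2: 4.0, 3: 6.0}

# E(Y | X): a random variable, since it is a function of random X.
e_y_given_x = np.array([cond_mean[v] for v in x[:5]])
print(e_y_given_x)                        # varies draw to draw with X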
-
Conditional Expectations
The unconditional expected value of a function of X and Y, g(x, y), can be written as:
E[g(X, Y)] = ∑_{x∈X} ∑_{y∈Y} g(x, y) f_{X,Y}(x, y)
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dy dx
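A discrete sketch (reusing the made-up joint pmf from above, not from the slides) with g(x, y) = x·y:

import numpy as np

joint = np.array([[0.10, 0.20, 0.10],     # rows: X ∈ {0, 1}
                  [0.15, 0.15, 0.30]])    # cols: Y ∈ {0, 1, 2}
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])

g = np.multiply.outer(x_vals, y_vals)     # grid of g(x, y) = x·y
print(np.sum(joint * g))                  # E(XY) = 0.15·1 + 0.30·2 = 0.75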
-
Conditional Expectations
The following holds for the conditional expectation of Y given X:
E[g(X)h(Y)|X] = g(X)E[h(Y)|X]
Question: Why does this hold when X is a random variable?
-
Conditional Expectations
The law of iterated expectations states that
E_Y(Y) = E_X[E(Y|X)],
which can be shown by
E_Y(Y) = ∫∫ y f_{X,Y}(x, y) dx dy
= ∫∫ y f_{Y|X}(y|x) f_X(x) dx dy
= ∫ [∫ y f_{Y|X}(y|x) dy] f_X(x) dx
= ∫ E(Y|X = x) f_X(x) dx
= E_X[E(Y|X)]
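The LIE is easy to verify numerically (a sketch reusing the hypothetical process from above, not from the slides): averaging the conditional means E(Y | X = x) over the distribution of X recovers E(Y):

import numpy as np

rng = np.random.default_rng(3)
x = rng.integers(1, 4, size=500_000)
y = 2.0 * x + rng.normal(size=x.size)

inner = np.array([y[x == v].mean() for v in (1, 2, 3)])   # E(Y | X = x)
p_x = np.array([(x == v).mean() for v in (1, 2, 3)])      # f_X(x)
print(np.sum(p_x * inner), y.mean())                      # both ≈ 4.0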
-
Conditional Expectations
Note that the LIE only holds when the necessary marginal moments exist.
-
Variance
The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of X:
Var(X) = E[(X − µ)²]
Multiplying this out yields:
Var(X) = E(X² − 2µX + µ²)
= E(X²) − 2µE(X) + µ²
= E(X²) − [E(X)]²
The standard deviation, σ, of a random variable is the square root of its variance; i.e., σ = √Var(X).
See that Var(aX + b) = a²Var(X).
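Both identities are easy to confirm on simulated draws (a sketch, not from the slides):

import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=500_000)   # E(X) = 2, Var(X) = 4
print(np.mean(x ** 2) - np.mean(x) ** 2)       # E(X²) − [E(X)]² ≈ 4
print(np.var(3 * x + 7) / np.var(x))           # ≈ 9 = a² for a = 3, any b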
-
Conditional Variance
The conditional variance of Y given X is
Var(Y|X) = E[(Y − E(Y|X))² | X] = E(Y²|X) − [E(Y|X)]²
Note that
Var[g(X)h(Y )|X] = [g(X)]2Var[h(Y )|X]
-
Conditional Variance
Var(Y) = E_Y[(Y − E_Y(Y))²]
= E_Y[(Y − E_{Y|X}(Y|X) + E_{Y|X}(Y|X) − E_Y(Y))²]
= E_Y[(Y − E_{Y|X}(Y|X))²] + E_Y[(E_{Y|X}(Y|X) − E_Y(Y))²]
(the cross term drops out by the law of iterated expectations)
= E_X[E[(Y − E_{Y|X}(Y|X))² | X]] + E_X[E[(E_{Y|X}(Y|X) − E_Y(Y))² | X]]
= E_X[Var(Y|X)] + E_X[(E_{Y|X}(Y|X) − E_Y(Y))²]
= E_X[Var(Y|X)] + Var[E_{Y|X}(Y|X)]
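This decomposition can be checked by simulation (a sketch with the same hypothetical process as before, not from the slides): a within-group piece, E_X[Var(Y|X)], plus a between-group piece, Var[E(Y|X)], reproduce Var(Y):

import numpy as np

rng = np.random.default_rng(5)
x = rng.integers(1, 4, size=500_000)
y = 2.0 * x + rng.normal(size=x.size)

groups = [y[x == v] for v in (1, 2, 3)]
p = np.array([g.size / y.size for g in groups])           # f_X(x)
within = np.sum(p * np.array([g.var() for g in groups]))  # E_X[Var(Y|X)] ≈ 1
means = np.array([g.mean() for g in groups])
between = np.sum(p * (means - y.mean()) ** 2)             # Var[E(Y|X)] ≈ 8/3
print(within + between, y.var())                          # both ≈ 11/3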
-
Conditional Variance
Var(Y) = E_X[Var(Y|X)] + Var[E_{Y|X}(Y|X)]
So what?
Var(Y) ≥ Var[E_{Y|X}(Y|X)]
Conditioning on more X variables yields a smaller variance.
Intuition: Adding more predictors can only make your guess of Y better. If you add something that isn’t relevant or is too noisy, you can just ignore it (in a regression, e.g., you give that predictor a 0 coefficient).
See Casella and Berger, pp. 167–168 for a similar, detailed exposition.
-
Covariance and Correlation
The covariance of random variables X and Y is defined as
Cov(X, Y) ≡ σ_{XY} = E_{X,Y}[(X − E_X(X))(Y − E_Y(Y))] = E(XY) − µ_X µ_Y
We have
Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
Note that covariance only measures the linear relationship between two random variables.
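Both claims are easy to verify on simulated data (a sketch with a made-up bivariate process, not from the slides):

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=500_000)
y = 0.5 * x + rng.normal(size=x.size)        # Y correlated with X

cov = np.mean(x * y) - x.mean() * y.mean()   # E(XY) − µ_X·µ_Y
print(cov, np.cov(x, y, bias=True)[0, 1])    # same number two ways
# Variance of a linear combination with a = 2, b = 3:
print(np.var(2 * x + 3 * y),
      4 * np.var(x) + 9 * np.var(y) + 12 * cov)   # ≈ equal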
-
Covariance and Correlation
The correlation of random variables X and Y is defined as
ρ_{XY} = σ_{XY} / (σ_X σ_Y)
Correlation is a normalized version of covariance: how big is the covariance relative to the variation in X and Y? Both will have the same sign.
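Continuing the same sketch (not from the slides), dividing the covariance by the two standard deviations gives the correlation:

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=500_000)
y = 0.5 * x + rng.normal(size=x.size)

cov = np.mean(x * y) - x.mean() * y.mean()
print(cov / (x.std() * y.std()),           # ρ_XY = σ_XY / (σ_X σ_Y)
      np.corrcoef(x, y)[0, 1])             # ≈ 0.447 both ways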
-
Covariance and Correlation
If X and Y are independent, then
E [g(X)h(Y )] = E [g(X)] E [h(Y )]
Hence, for independent random variables, the covariance is 0.
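The converse does not hold, which is worth seeing once (a sketch, not from the slides): with X standard normal and Y = X², Cov(X, Y) = E(X³) = 0 even though Y is a deterministic function of X. This is exactly the sense in which covariance only captures linear relationships:

import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=1_000_000)
y = x ** 2                                 # perfectly dependent on X
print(np.cov(x, y, bias=True)[0, 1])       # ≈ 0 nonetheless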