Random Variables: Expectations and Variances
Charlie Gibbons
Political Science 236
September 22, 2008
-
Outline
1 Expectations
   Of Random Variables
   Of Functions of Random Variables
   Properties
   Conditional Expectations
2 Dispersion
   Variance
   Conditional Variance
   Covariance and Correlation
-
Expectations of Random Variables
The expectation, E(X) ≡ µ, of a random variable X is simply the weighted average of the possible realizations, weighted by their probability.
For discrete random variables, this can be written as
E(X) = ∑_{x∈X} Pr(X = x) · x = ∑_{x∈X} f(x) · x.
For continuous random variables:
E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{−∞}^{∞} x dF
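As a quick numerical check (a sketch in Python, not part of the original slides, with a made-up distribution), the discrete case is an explicit weighted sum and the continuous case can be approximated by Monte Carlo, where the sample mean converges to the integral:

import numpy as np

# Discrete: X takes values 1, 2, 3 with probabilities 0.2, 0.5, 0.3.
values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])
print(np.sum(probs * values))          # E(X) = sum of f(x)·x = 2.1

# Continuous: X ~ N(5, 1); the sample mean approximates ∫ x f(x) dx = 5.
rng = np.random.default_rng(0)
draws = rng.normal(loc=5.0, scale=1.0, size=100_000)
print(draws.mean())                    # ≈ 5.0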
-
Expectations of Functions of Random Variables
The definition of expectation can be generalized for functions of random variables, g(X).
We have the equations
E[g(X)] = ∑_{x∈X} f(x) g(x)
E[g(X)] = ∫_{−∞}^{∞} g(x) dF
for discrete and continuous random variables, respectively.
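The same recipe computes E[g(X)] without ever deriving the distribution of g(X). A sketch continuing the hypothetical discrete example above (not from the slides), with g(x) = x²:

import numpy as np

values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])
g = lambda x: x ** 2
# Weight g(x), not x, by f(x): E[g(X)] = 0.2·1 + 0.5·4 + 0.3·9 = 4.9
print(np.sum(probs * g(values)))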
-
Properties of Expectations
Expectations are linear operators, i.e.,
E(a · g(X) + b · h(X) + c) = a · E[g(X)] + b · E[h(X)] + c.
Note that, in general, E[g(X)] ≠ g[E(X)].
Jensen’s inequality states that, for a convex function g, E[g(X)] ≥ g[E(X)]. For concave functions, the inequality is reversed.
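A quick check of Jensen’s inequality on the same hypothetical distribution (a sketch, not from the slides): g(x) = x² is convex, so E[g(X)] should be at least g[E(X)]:

import numpy as np

values = np.array([1.0, 2.0, 3.0])
probs = np.array([0.2, 0.5, 0.3])
E_X = np.sum(probs * values)         # E(X) = 2.1
E_g = np.sum(probs * values ** 2)    # E(X²) = 4.9
print(E_g >= E_X ** 2)               # True: 4.9 ≥ 2.1² = 4.41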
-
Properties of Expectations
Expectations preserve monotonicity: if g(x) ≥ h(x) ∀ x ∈ X, then E[g(X)] ≥ E[h(X)].
As a special case, let h(x) = 0. If g(x) ≥ 0, then E[g(X)] ≥ 0.
Expectations also preserve equality: if g(x) = h(x) ∀ x ∈ X, then E[g(X)] = E[h(X)].
-
Conditional Expectations
The expectation of a random variable Y conditional on, or given, X is defined analogously to the preceding formulations, but uses the conditional distribution f_{Y|X}(y|X = x) rather than the unconditional f_Y(y).
Conditional distributions are about changing the population that you are considering.
-
Conditional Expectations
Conditional distributions are about changing the population that you are considering.
For discrete random variables:
E(Y|X = x) = ∑_{y∈Y} Pr(Y = y|X = x) · y = ∑_{y∈Y} f_{Y|X}(y|x) · y.
For continuous random variables:
E(Y|X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y|X = x) dy = ∫_{−∞}^{∞} y dF_{Y|X}(y|x)
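A small discrete illustration (a sketch with a made-up joint pmf, not from the slides): conditioning on X = x means renormalizing the corresponding slice of the joint distribution and averaging Y over it:

import numpy as np

# Joint pmf f_{X,Y}: rows are X ∈ {0, 1}, columns are Y ∈ {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.15, 0.30]])
y_vals = np.array([0.0, 1.0, 2.0])

# f_{Y|X}(y | X = 1): take the X = 1 row and renormalize it.
f_cond = joint[1] / joint[1].sum()
print(np.sum(f_cond * y_vals))       # E(Y | X = 1) = 1.25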
-
Conditional Expectations
Recall from the previous set of slides that, for independent random variables X and Y,
f_{Y|X}(y|X = x) = f_Y(y) and f_{X|Y}(x|Y = y) = f_X(x)
Hence,
E(Y|X) = E(Y) and E(X|Y) = E(X)
This will be a heavily used result later in the course.
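A simulation sketch of this result (not from the slides): when Y is generated independently of X, the average of Y within each X group is just E(Y):

import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=200_000)         # X ∈ {0, 1}
y = rng.normal(size=200_000)                 # Y drawn independently of X
print(y[x == 0].mean(), y[x == 1].mean())    # both ≈ E(Y) = 0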
-
Conditional Expectations
Let’s pause and consider what is random here.
First, there is one conditional distribution of Y given X = x, but a family of distributions of Y given X, since X takes many values and each value imparts its own conditional distribution for Y.
Y|X = x derives its randomness solely from the fact that Y is random; X is fixed at x here.
E(Y|X = x) is not random, since X is fixed and we are integrating over all possible values of Y.
E(Y|X) is random, since X is no longer fixed (i.e., it is random), though Y is integrated over.
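The distinction can be made concrete by simulation (a sketch with a made-up data-generating process, not from the slides): each E(Y | X = x) is a fixed number, while E(Y | X) inherits the randomness of X:

import numpy as np

rng = np.random.default_rng(2)
x = rng.integers(1, 4, size=500_000)      # X ∈ {1, 2, 3}
y = 2.0 * x + rng.normal(size=x.size)     # Y | X = x ~ N(2x, 1)

# E(Y | X = x): one fixed number per value x.
cond_mean = {v: y[x == v].mean() for v in (1, 2, 3)}
print(cond_mean)                          # ≈ {1: 2.0, 2: 4.0, 3: 6.0}

# E(Y | X): a random variable, since it is a function of random X.
e_y_given_x = np.array([cond_mean[v] for v in x[:5]])
print(e_y_given_x)                        # varies draw to draw with X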
-
Conditional Expectations
The unconditional expected value of a function of X and Y, g(x, y), can be written as:
E[g(X, Y)] = ∑_{x∈X} ∑_{y∈Y} g(x, y) f_{X,Y}(x, y)
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dy dx
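A discrete sketch (reusing the made-up joint pmf from above, not from the slides) with g(x, y) = x·y:

import numpy as np

joint = np.array([[0.10, 0.20, 0.10],     # rows: X ∈ {0, 1}
                  [0.15, 0.15, 0.30]])    # cols: Y ∈ {0, 1, 2}
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])

g = np.multiply.outer(x_vals, y_vals)     # grid of g(x, y) = x·y
print(np.sum(joint * g))                  # E(XY) = 0.15·1 + 0.30·2 = 0.75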
-
Conditional Expectations
The following holds for the conditional expectation of Y given X:
E[g(X)h(Y)|X] = g(X)E[h(Y)|X]
Question: Why does this hold when X is a random variable?
-
Conditional Expectations
The law of iterated expectations states that
E_Y(Y) = E_X[E(Y|X)],
which can be shown by
E_Y(Y) = ∫∫ y f_{X,Y}(x, y) dx dy
= ∫∫ y f_{Y|X}(y|x) f_X(x) dx dy
= ∫ [∫ y f_{Y|X}(y|x) dy] f_X(x) dx
= ∫ E(Y|X = x) f_X(x) dx
= E_X[E(Y|X)]
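The LIE is easy to verify numerically (a sketch reusing the hypothetical process from above, not from the slides): averaging the conditional means E(Y | X = x) over the distribution of X recovers E(Y):

import numpy as np

rng = np.random.default_rng(3)
x = rng.integers(1, 4, size=500_000)
y = 2.0 * x + rng.normal(size=x.size)

inner = np.array([y[x == v].mean() for v in (1, 2, 3)])   # E(Y | X = x)
p_x = np.array([(x == v).mean() for v in (1, 2, 3)])      # f_X(x)
print(np.sum(p_x * inner), y.mean())                      # both ≈ 4.0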
-
Conditional Expectations
Note that the LIE only holds when the necessary marginal moments exist.
-
Variance
The variance of a random variable is a measure of its dispersion around its mean. It is defined as the second central moment of X:
Var(X) = E[(X − µ)²]
Multiplying this out yields:
Var(X) = E(X² − 2µX + µ²)
= E(X²) − 2µE(X) + µ²
= E(X²) − [E(X)]²
The standard deviation, σ, of a random variable is the square root of its variance; i.e., σ = √Var(X).
See that Var(aX + b) = a²Var(X).
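Both identities are easy to confirm on simulated draws (a sketch, not from the slides):

import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=500_000)   # E(X) = 2, Var(X) = 4
print(np.mean(x ** 2) - np.mean(x) ** 2)       # E(X²) − [E(X)]² ≈ 4
print(np.var(3 * x + 7) / np.var(x))           # ≈ 9 = a² for a = 3, any b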
-
Conditional Variance
The conditional variance of Y given X is
Var(Y|X) = E[(Y − E(Y|X))² | X] = E(Y²|X) − [E(Y|X)]²
Note that
Var[g(X)h(Y )|X] = [g(X)]2Var[h(Y )|X]
-
Conditional Variance
Var(Y) = E_Y[(Y − E_Y(Y))²]
= E_Y[(Y − E_{Y|X}(Y|X) + E_{Y|X}(Y|X) − E_Y(Y))²]
= E_Y[(Y − E_{Y|X}(Y|X))²] + E_Y[(E_{Y|X}(Y|X) − E_Y(Y))²]
(the cross term drops out by the law of iterated expectations)
= E_X[E[(Y − E_{Y|X}(Y|X))² | X]] + E_X[E[(E_{Y|X}(Y|X) − E_Y(Y))² | X]]
= E_X[Var(Y|X)] + E_X[(E_{Y|X}(Y|X) − E_Y(Y))²]
= E_X[Var(Y|X)] + Var[E_{Y|X}(Y|X)]
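This decomposition can be checked by simulation (a sketch with the same hypothetical process as before, not from the slides): a within-group piece, E_X[Var(Y|X)], plus a between-group piece, Var[E(Y|X)], reproduce Var(Y):

import numpy as np

rng = np.random.default_rng(5)
x = rng.integers(1, 4, size=500_000)
y = 2.0 * x + rng.normal(size=x.size)

groups = [y[x == v] for v in (1, 2, 3)]
p = np.array([g.size / y.size for g in groups])           # f_X(x)
within = np.sum(p * np.array([g.var() for g in groups]))  # E_X[Var(Y|X)] ≈ 1
means = np.array([g.mean() for g in groups])
between = np.sum(p * (means - y.mean()) ** 2)             # Var[E(Y|X)] ≈ 8/3
print(within + between, y.var())                          # both ≈ 11/3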
-
Conditional Variance
Var(Y) = E_X[Var(Y|X)] + Var[E_{Y|X}(Y|X)]
So what?
Var(Y) ≥ Var[E_{Y|X}(Y|X)]
Conditioning on more X variables yields a smaller variance.
Intuition: Adding more predictors can only make your guess of Y better. If you add something that isn’t relevant or is too noisy, you can just ignore it (in a regression, e.g., you give that predictor a 0 coefficient).
See Casella and Berger, pp. 167–168 for a similar, detailed exposition.
-
Covariance and Correlation
The covariance of random variables X and Y is defined as
Cov(X, Y) ≡ σ_{XY} = E_{X,Y}[(X − E_X(X))(Y − E_Y(Y))] = E(XY) − µ_X µ_Y
We have
Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y)
Note that covariance only measures the linear relationship between two random variables.
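Both claims are easy to verify on simulated data (a sketch with a made-up bivariate process, not from the slides):

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=500_000)
y = 0.5 * x + rng.normal(size=x.size)        # Y correlated with X

cov = np.mean(x * y) - x.mean() * y.mean()   # E(XY) − µ_X·µ_Y
print(cov, np.cov(x, y, bias=True)[0, 1])    # same number two ways
# Variance of a linear combination with a = 2, b = 3:
print(np.var(2 * x + 3 * y),
      4 * np.var(x) + 9 * np.var(y) + 12 * cov)   # ≈ equal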
-
Covariance and Correlation
The correlation of random variables X and Y is defined as
ρ_{XY} = σ_{XY} / (σ_X σ_Y)
Correlation is a normalized version of covariance: how big is the covariance relative to the variation in X and Y? Both will have the same sign.
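Continuing the same sketch (not from the slides), dividing the covariance by the two standard deviations gives the correlation:

import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=500_000)
y = 0.5 * x + rng.normal(size=x.size)

cov = np.mean(x * y) - x.mean() * y.mean()
print(cov / (x.std() * y.std()),           # ρ_XY = σ_XY / (σ_X σ_Y)
      np.corrcoef(x, y)[0, 1])             # ≈ 0.447 both ways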
-
Covariance and Correlation
If X and Y are independent, then
E [g(X)h(Y )] = E [g(X)] E [h(Y )]
Hence, for independent random variables, the covariance is 0.
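The converse does not hold, which is worth seeing once (a sketch, not from the slides): with X standard normal and Y = X², Cov(X, Y) = E(X³) = 0 even though Y is a deterministic function of X. This is exactly the sense in which covariance only captures linear relationships:

import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=1_000_000)
y = x ** 2                                 # perfectly dependent on X
print(np.cov(x, y, bias=True)[0, 1])       # ≈ 0 nonetheless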