Introduction to Probability and Statistics
Probability & Statistics for Engineers & Scientists, 9th Ed.
2009
Handout #3
Instructor: Lingzhou Xue
TA: Daniel Eck
Goal

Mean, Expectation, Expected Value: The value you expect to get in a statistical experiment is the mean.

Variance: Measures how spread out a distribution is. In other words, it is a measure of variability.

Covariance: A measure of the linear relationship between two random variables.

Chebyshev's Inequality: Places an upper bound on the probability that a random variable deviates from its mean by at least a set amount. No other information about that variable's distribution is required.
Chapter 4
Mathematical Expectation

4.1 Mean of a Random Variable
Example

Experiment: Toss a coin once.

1. What is the probability of getting one head?
2. Repeat the experiment 10 times. On average, how many heads would there be per experiment?
Let X denote the number of heads. The probability distribution of X is:

x                 0    1
f(x) = P(X = x)  1/2  1/2

Intuition: Suppose we play n times; then the total number of heads we expect to have is n × (1/2) = 0.5n. Then, on average, in each experiment, we have 0.5n/n = 0.5.

Mathematics:

E(X) = 0×(1/2) + 1×(1/2) = 0.5 = ∑_{x=0}^{1} x·f(x).
Example

Experiment: Toss a coin twice.

1. What is the probability of getting one head?
2. Repeat the experiment 10 times. On average, how many heads would there be per experiment?
Let X denote the number of heads. The probability distribution of X is:

x                 0    1    2
f(x) = P(X = x)  1/4  1/2  1/4

Intuition: Expected value of the number of heads: 1.

Mathematics:

E(X) = 0×(1/4) + 1×(1/2) + 2×(1/4) = 1 = ∑_{x=0}^{2} x·f(x).
Example 1
If two coins are tossed 16 times and X is the number of heads
occurring per toss, then the value of X can be 0, 1, and 2.
Suppose that the experiment yields

x      0  1  2
times  4  7  5

The average number of heads per toss of the two coins is then

(0·4 + 1·7 + 2·5)/16 = 1.06.
An average value is not necessarily a possible outcome for the
experiment.
Motivation 1

Consider a casino game in which the probability of losing $1 per game is 0.8 and the probability of winning $2 per game is 0.2. The gain or loss of a gambler who plays this game only a few times depends on his luck more than anything else. For example, in one play of the game, a lucky gambler might win $2, but he has an 80% chance of losing $1. However, if a gambler decides to play the game a large number of times, his loss or gain depends more on the number of plays than on his luck. A calculating player argues that if he plays the game n times, for a large n, then in approximately (0.8)n games he will lose $1 per game, and in (0.2)n games he will win $2. Therefore, his total gain is

(0.8)n·(−1) + (0.2)n·2 = (−0.4)n.

This gives an average loss of $0.40 per game.
If X is the random variable denoting the gain in one play, then the number −0.4 is the average value of X. In this example, X is a discrete random variable with the set of possible values {−1, 2}. The probability function of X, f(x), is given by

x                -1    2
f(x) = P(X = x)  0.8  0.2

Hence

(−1)·f(−1) + 2·f(2) = −0.4,

a relation showing that the expected value of X can be calculated directly by summing the products of the possible values of X with their probabilities.
Expected value is used to describe the long-term average outcome of a given scenario. To calculate an expected value, you take every possible outcome, multiply each by the probability of that outcome happening, and then add those numbers together.
Definition: Expectation

Let X be a random variable with probability distribution f(x). The mean, expected value, or expectation of X is:

• if X is discrete,

  µ = E(X) = ∑_x x·f(x);

• if X is continuous,

  µ = E(X) = ∫_{−∞}^{∞} x·f(x) dx.
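As a quick numerical companion to the two formulas, here is a minimal Python sketch (assuming NumPy and SciPy are available). The discrete pmf is the two-toss coin table above; the continuous density is 2(x − 1) on (1, 2), which reappears as Example 11 later in this handout.

```python
import numpy as np
from scipy.integrate import quad

# Discrete case: X = number of heads in two fair coin tosses.
x = np.array([0, 1, 2])
fx = np.array([1/4, 1/2, 1/4])
mu_discrete = np.sum(x * fx)                  # computes the sum of x * f(x)
print(mu_discrete)                            # 1.0

# Continuous case: the density f(x) = 2(x - 1) on (1, 2) from Example 11 below.
mu_continuous, _ = quad(lambda t: t * 2 * (t - 1), 1, 2)
print(mu_continuous)                          # ~1.6667, i.e. 5/3
```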
Example 2

A lot containing 7 components is sampled by a quality inspector; the lot contains 4 good components and 3 defective components. A sample of 3 is taken by the inspector. Find the expected value of the number of good components in this sample.

Solution:

Let X represent the number of good components in the sample. The probability distribution of X is

f(x) = P(X = x) = (4 choose x)·(3 choose 3−x) / (7 choose 3),   x = 0, 1, 2, 3.

µ = E(X) = ∑_x x·f(x) = 0·(1/35) + 1·(12/35) + 2·(18/35) + 3·(4/35) = 12/7.
Thus, if a sample of size 3 is selected at random over and over again from a lot of 4 good components and 3 defective components, it would contain, on average, 1.7 good components.
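Example 2 is exactly hypergeometric sampling, so (as a sketch, assuming SciPy is available) the pmf and the mean can be cross-checked in a couple of lines:

```python
from scipy.stats import hypergeom

# Lot of M = 7 components, n = 4 of them good, sample of N = 3 drawn.
rv = hypergeom(M=7, n=4, N=3)
print(rv.pmf([0, 1, 2, 3]))   # [1/35, 12/35, 18/35, 4/35]
print(rv.mean())              # 1.714285... = 12/7
```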
Example 3

In a gambling game a man is paid $5 if he gets all heads or all tails when three coins are tossed, and he will pay out $3 if either one or two heads show. What is his expected gain?

Solution:

The sample space for the possible outcomes when three coins are tossed simultaneously is

S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.

The random variable of interest is Y, the amount the gambler can win; the possible values of Y are $5 if event E1 = {HHH, TTT} occurs and −$3 if event E2 = {HHT, HTH, THH, HTT, THT, TTH} occurs. That is, the probability function of Y is given by

f(y) = P(Y = y) = 1/4, y = 5;  3/4, y = −3;  0, elsewhere.

µ = E(Y) = 5·(1/4) + (−3)·(3/4) = −1.
Example 4

Let X be the random variable that denotes the life in hours of a certain electronic device. The probability density function is

f(x) = 20,000/x³, x > 100;  0, elsewhere.

Find the expected life of this type of device.

Solution:

µ = E(X) = ∫_{100}^{∞} x·(20,000/x³) dx = 20,000 ∫_{100}^{∞} (1/x²) dx = 20,000·(−1/x)|_{100}^{∞} = 200.
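A one-line numerical check of Example 4 (a sketch assuming SciPy; quad accepts an infinite upper limit):

```python
import numpy as np
from scipy.integrate import quad

# E(X) = integral from 100 to infinity of x * 20000/x^3 dx.
mean_life, _ = quad(lambda x: x * 20000 / x**3, 100, np.inf)
print(mean_life)  # ~200.0 hours
```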
Question

Suppose g(X) is a function of a random variable X.

• Is g(X) also a random variable?
• If yes, how do we find the mean of g(X), E[g(X)]?
Illustration 1

Now let us consider a new random variable g(X), which depends on X; that is, each value of g(X) is determined by knowing the values of X.

For instance, let Y = g(X) = X². Suppose X is a discrete random variable with probability distribution f(x):

x                 -1    0    1    2
f(x) = P(X = x)  1/8  3/8  3/8  1/8

Then

P[g(X) = 0] = P(X² = 0) = f(0) = 3/8,
P[g(X) = 1] = P(X² = 1) = f(−1) + f(1) = 4/8,
P[g(X) = 4] = P(X² = 4) = f(2) = 1/8,

so that the probability distribution of Y = g(X) may be written

y = g(x)                           0    1    4
h(y) = P(Y = y) = P[g(X) = g(x)]  3/8  4/8  1/8

µ_{g(X)} = E[g(X)] = E(Y) = ∑_y y·h(y)
  = 0·(3/8) + 1·(4/8) + 4·(1/8)
  = 8/8 = 1
  = (−1)²·(1/8) + 0²·(3/8) + 1²·(3/8) + 2²·(1/8) = ∑_x g(x)·f(x).
Illustration 1

Now let us consider the same construction in general. Let Y = g(X) = X². If X is a discrete random variable with probability distribution f(x), for x = −1, 0, 1, 2, then

P[g(X) = 0] = P(X² = 0) = P(X = 0) = f(0),
P[g(X) = 1] = P(X² = 1) = P(X = −1) + P(X = 1) = f(−1) + f(1),
P[g(X) = 4] = P(X² = 4) = P(X = 2) = f(2),

so that the probability distribution of Y = g(X) may be written

y = g(x)                            0      1              4
h(y) = P(Y = y) = P[g(X) = g(x)]   f(0)   f(−1) + f(1)   f(2)

µ_{g(X)} = E[g(X)] = E(Y) = ∑_y y·h(y)
  = 0·h(0) + 1·h(1) + 4·h(4)
  = 0·f(0) + 1·[f(1) + f(−1)] + 4·f(2)
  = (−1)²·f(−1) + 0²·f(0) + 1²·f(1) + 2²·f(2)
  = ∑_x g(x)·f(x).
Illustration 2

Let Y = g(X) = 3X − 1. If X is a discrete random variable with probability distribution f(x), for x = −1, 0, 1, 2, then

P[g(X) = −4] = P(3X − 1 = −4) = P(X = −1) = f(−1),
P[g(X) = −1] = P(3X − 1 = −1) = P(X = 0) = f(0),
P[g(X) = 2] = P(3X − 1 = 2) = P(X = 1) = f(1),
P[g(X) = 5] = P(3X − 1 = 5) = P(X = 2) = f(2),

so that the probability distribution of g(X) may be written

y = g(x)                            -4      -1     2      5
h(y) = P(Y = y) = P[g(X) = g(x)]   f(−1)   f(0)  f(1)   f(2)

µ_{g(X)} = E[g(X)] = E(Y) = ∑_y y·h(y)
  = (−4)·h(−4) + (−1)·h(−1) + 2·h(2) + 5·h(5)
  = (−4)·f(−1) + (−1)·f(0) + 2·f(1) + 5·f(2)
  = ∑_x (3x − 1)·f(x)
  = ∑_x g(x)·f(x).
Theorem

Let X be a random variable with probability function f(x). The expected value of the random variable g(X) is

• Discrete: if X is discrete,

  µ_{g(X)} = E[g(X)] = ∑_x g(x)·f(x).

• Continuous: if X is continuous,

  µ_{g(X)} = E[g(X)] = ∫_{−∞}^{∞} g(x)·f(x) dx.
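The theorem says the two routes to E[g(X)] agree: build the distribution h(y) of Y = g(X) and sum y·h(y), or sum g(x)·f(x) directly. A small sketch using the pmf from Illustration 1:

```python
import numpy as np
from collections import defaultdict

x = np.array([-1, 0, 1, 2])
fx = np.array([1/8, 3/8, 3/8, 1/8])
g = lambda v: v**2

# Route 1: form h(y) = P(Y = y) for Y = g(X), then take the sum of y * h(y).
h = defaultdict(float)
for xi, fi in zip(x, fx):
    h[g(xi)] += fi
mean_via_h = sum(y * p for y, p in h.items())

# Route 2: take the sum of g(x) * f(x) without ever forming h.
mean_via_g = np.sum(g(x) * fx)

print(mean_via_h, mean_via_g)  # both 1.0
```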
Example 5

Suppose that the number of cars X that pass through a car wash between 4:00 P.M. and 5:00 P.M. on any sunny Friday has the following probability distribution:

x          4     5     6    7    8    9
P(X = x)  1/12  1/12  1/4  1/4  1/6  1/6

Let g(X) = 2X − 1 represent the amount of money, in dollars, paid to the attendant by the manager. Find the attendant's expected earnings for this particular time period.
Solution:

By the theorem, the attendant can expect to receive

E[g(X)] = E(2X − 1) = ∑_{x=4}^{9} (2x − 1)·f(x)
  = 7×(1/12) + 9×(1/12) + 11×(1/4) + 13×(1/4) + 15×(1/6) + 17×(1/6)
  = 12.67.
Example 6

Let X be a random variable with density function

f(x) = x²/3, −1 < x < 2;  0, elsewhere.

Find the expected value of g(X) = 4X + 3.

Solution:

E(4X + 3) = ∫_{−1}^{2} (4x + 3)·(x²/3) dx = (1/3) ∫_{−1}^{2} (4x³ + 3x²) dx
  = (1/3)·(x⁴ + x³)|_{x=−1}^{x=2} = 8.
Definition

Let X and Y be random variables with joint probability distribution f(x, y). The mean or expected value of g(X, Y) is:

• if both X and Y are discrete,

  µ_{g(X,Y)} = E[g(X, Y)] = ∑_x ∑_y g(x, y)·f(x, y);

• if both X and Y are continuous,

  µ_{g(X,Y)} = E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y)·f(x, y) dx dy.
Example 7: Example 15 in Handout #2

Let X and Y be the random variables with joint probability distribution indicated in the following table.

                         x                 Row
f(x, y)         0      1      2        Totals
       0      3/28   9/28   3/28       15/28
y      1      6/28   6/28     0        12/28
       2      1/28     0      0         1/28
Column
Totals       10/28  15/28   3/28           1

Find the expected value of g(X, Y) = XY.

Solution:

E[g(X, Y)] = E(XY) = ∑_{x=0}^{2} ∑_{y=0}^{2} x·y·f(x, y) = 3/14.
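The double sum in Example 7 is a natural fit for an array computation; a sketch with the joint table re-entered as a matrix:

```python
import numpy as np

# Joint pmf from Example 7: rows are y = 0, 1, 2; columns are x = 0, 1, 2.
f = np.array([[3, 9, 3],
              [6, 6, 0],
              [1, 0, 0]]) / 28
y = np.arange(3)
x = np.arange(3)

# E(XY) = sum over x and y of x*y*f(x, y); np.outer(y, x) holds the products y*x.
print(np.sum(np.outer(y, x) * f))  # 0.214285... = 3/14
```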
Example 8

Find E(Y/X) for the density function

f(x, y) = x(1 + 3y²)/4, 0 < x < 2, 0 < y < 1;  0, elsewhere.

Solution:

E(Y/X) = ∫_0^1 ∫_0^2 (y/x)·x(1 + 3y²)/4 dx dy = ∫_0^1 (y + 3y³)/2 dy = 5/8.
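Example 8's iterated integral can also be checked with scipy.integrate.dblquad; a minimal sketch (note that dblquad expects the integrand as func(y, x), with x as the outer variable):

```python
from scipy.integrate import dblquad

# E(Y/X): integrate (y/x) * x(1 + 3y^2)/4 over 0 < x < 2, 0 < y < 1.
val, _ = dblquad(lambda y, x: (y / x) * x * (1 + 3 * y**2) / 4,
                 0, 2,                       # x from 0 to 2 (outer)
                 lambda x: 0, lambda x: 1)   # y from 0 to 1 (inner)
print(val)  # 0.625 = 5/8
```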
Notes

Let X and Y be random variables with joint probability distribution f(x, y).

• if g(X, Y) = X,

  E(X) = ∑_x ∑_y x·f(x, y) = ∑_x x·m1(x)   (discrete case);
  E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x·f(x, y) dx dy = ∫_{−∞}^{∞} x·m1(x) dx   (continuous case),

  where m1(x) is the marginal distribution of X.

• if g(X, Y) = Y,

  E(Y) = ∑_x ∑_y y·f(x, y) = ∑_y y·m2(y)   (discrete case);
  E(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y·f(x, y) dx dy = ∫_{−∞}^{∞} y·m2(y) dy   (continuous case),

  where m2(y) is the marginal distribution of Y.
Therefore, in calculating E(X) over a two-dimensional space,
one may use either the joint probability function of X and Y or
the marginal distribution of X.
Riddle
Why are the mean, median, and mode like a valuable piece of
real estate?
LOCATION! LOCATION! LOCATION!
4.2 Variance and Covariance of Random Variables
If I put your feet in boiling water and your head in ice, then on average you'd be comfortable. However, you'd die eventually. It is not enough to just consider the average.
The variance of a random variable is a measure of its statistical dispersion, indicating how its possible values are spread around the expected value. While the expected value shows the location of the distribution, the variance indicates the variability of the values.

[Figure: distributions with equal means µ = 2 and unequal dispersions.]
Definition: Variance

Let X be a random variable with probability function f(x) and mean µ. The variance of the random variable X is

• Discrete: if X is discrete,

  σ² = Var(X) = E[(X − µ)²] = ∑_x (x − µ)²·f(x).

• Continuous: if X is continuous,

  σ² = Var(X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)²·f(x) dx.

The variance is the average squared distance of the possible values of X from the expected value µ.
Definition: Standard Deviation

The positive square root of the variance, σ, is called the standard deviation of X:

σ = √Var(X).
Example 9

Let the random variable X represent the number of automobiles that are used for official business purposes on any given workday. The probability distribution for company A is

x     1    2    3
f(x)  0.3  0.4  0.3

and for company B

x     0    1    2    3    4
f(x)  0.2  0.1  0.3  0.3  0.1
Show that the variance of the probability distribution for company B is greater than that of company A.

Solution:

µ_A = 1·0.3 + 2·0.4 + 3·0.3 = 2;
µ_B = 0·0.2 + 1·0.1 + 2·0.3 + 3·0.3 + 4·0.1 = 2;

σ²_A = ∑_{x=1}^{3} (x − 2)²·f_A(x) = 0.6;
σ²_B = ∑_{x=0}^{4} (x − 2)²·f_B(x) = 1.6.
Theorem

The variance of a random variable X is

σ² = Var(X) = E(X²) − µ².

Proof:

For the discrete case we can write

σ² = ∑_x (x − µ)²·f(x) = ∑_x (x² − 2µx + µ²)·f(x)
   = ∑_x x²·f(x) − 2µ·∑_x x·f(x) + µ²·∑_x f(x)
   = ∑_x x²·f(x) − 2µ·µ + µ²·1
   = ∑_x x²·f(x) − µ²
   = E(X²) − µ².
Example 10

Let the random variable X represent the number of defective parts for a machine when 3 parts are sampled from a production line and tested. The following is the probability distribution of X.

x     0     1     2     3
f(x)  0.51  0.38  0.10  0.01

Calculate the variance σ².

Solution:

µ = 0·0.51 + 1·0.38 + 2·0.10 + 3·0.01 = 0.61;
E(X²) = 0·0.51 + 1·0.38 + 4·0.10 + 9·0.01 = 0.87;
σ² = E(X²) − µ² = 0.87 − 0.61² = 0.4979.
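A quick sketch checking Example 10 both by the definition of variance and by the shortcut σ² = E(X²) − µ²:

```python
import numpy as np

x = np.array([0, 1, 2, 3])
fx = np.array([0.51, 0.38, 0.10, 0.01])

mu = np.sum(x * fx)                          # 0.61
var_by_definition = np.sum((x - mu)**2 * fx) # E[(X - mu)^2]
var_by_shortcut = np.sum(x**2 * fx) - mu**2  # E(X^2) - mu^2
print(var_by_definition, var_by_shortcut)    # both 0.4979
```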
Example 11

The weekly demand for Pepsi, in thousands of liters, from a local chain of efficiency stores is a continuous random variable X having the probability density

f(x) = 2(x − 1), 1 < x < 2;  0, elsewhere.

Find the mean and variance of X.

Solution:

µ = E(X) = ∫_1^2 x·2(x − 1) dx = 5/3.

E(X²) = ∫_1^2 x²·2(x − 1) dx = 17/6.

σ² = Var(X) = E(X²) − µ² = 17/6 − (5/3)² = 1/18.
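The same shortcut works numerically in the continuous case; a minimal sketch of Example 11 with SciPy:

```python
from scipy.integrate import quad

f = lambda x: 2 * (x - 1)                   # density on (1, 2)
mu, _ = quad(lambda x: x * f(x), 1, 2)      # E(X)   = 5/3
ex2, _ = quad(lambda x: x**2 * f(x), 1, 2)  # E(X^2) = 17/6
print(mu, ex2 - mu**2)                      # ~1.6667 and ~0.05556 = 1/18
```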
Theorem

Let X be a random variable with probability function f(x). The variance of the random variable g(X) is

• Discrete: if X is discrete,

  σ²_{g(X)} = E{[g(X) − µ_{g(X)}]²} = ∑_x [g(x) − µ_{g(X)}]²·f(x).

• Continuous: if X is continuous,

  σ²_{g(X)} = E{[g(X) − µ_{g(X)}]²} = ∫_{−∞}^{∞} [g(x) − µ_{g(X)}]²·f(x) dx.
Example 12

Calculate the variance of g(X) = 2X + 3, where X is a random variable with probability distribution

x     0    1    2    3
f(x)  1/4  1/8  1/2  1/8

Solution:

µ_{2X+3} = E(2X + 3) = ∑_{x=0}^{3} (2x + 3)·f(x) = 6.

σ²_{2X+3} = E{[(2X + 3) − µ_{2X+3}]²} = E[(2X + 3 − 6)²]
  = E(4X² − 12X + 9) = ∑_{x=0}^{3} (4x² − 12x + 9)·f(x)
  = 4.
Example 6 Cont'd

Find the variance of the random variable g(X) = 4X + 3.

Solution:

σ²_{4X+3} = E{[(4X + 3) − µ_{4X+3}]²} = E[(4X + 3 − 8)²]
  = E(16X² − 40X + 25) = ∫_{−1}^{2} (16x² − 40x + 25)·(x²/3) dx
  = 51/5.
Question
Does there exist a quantity to measure how much two variables
change together or provide a measure of the strength of the
correlation between two random variables?
Example
Let X denote the father’s height, Y the daughter’s height, and
Z the mother’s height.
[Two scatter plots (unit of measurement: foot): Father's Height vs. Daughter's Height with Cov(X, Y) = 0.3, and Father's Height vs. Mother's Height with Cov(X, Z) = 0.02.]
[Figure: (a) Cov(X, Y) > 0; (b) Cov(X, Y) < 0.]
Definition: Covariance

Let X and Y be random variables with joint probability distribution f(x, y). The covariance of the random variables X and Y is

• Discrete: if both X and Y are discrete,

  σ_{XY} = Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
         = ∑_x ∑_y (x − µ_X)(y − µ_Y)·f(x, y);

• Continuous: if both X and Y are continuous,

  σ_{XY} = Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]
         = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − µ_X)(y − µ_Y)·f(x, y) dx dy.
The covariance between two random variables is a measurement of the nature of the association between the two. If large values of X often result in large values of Y, or small values of X result in small values of Y, positive X − µ_X will often result in positive Y − µ_Y, and negative X − µ_X will often result in negative Y − µ_Y. Thus the product (X − µ_X)(Y − µ_Y) will tend to be positive. On the other hand, if large X values often result in small Y values, the product (X − µ_X)(Y − µ_Y) will tend to be negative. Thus the sign of the covariance indicates whether the relationship between two dependent random variables is positive or negative.

When X and Y are statistically independent, it can be shown that the covariance is zero. The converse, however, is not generally true: two variables may have zero covariance and still not be statistically independent. Note that the covariance describes only the linear relationship between two random variables. Therefore, if the covariance between X and Y is zero, X and Y may have a nonlinear relationship, which means that they are not necessarily independent.
Example 13

Let X be a random variable with probability distribution

x     -2   -1    0    1    2
f(x)  0.2  0.2  0.2  0.2  0.2

Let Y = g(X) = X². Then Cov(X, Y) = 0, but X and Y have a quadratic relationship.
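Example 13 can be verified mechanically from the table; a short sketch:

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
fx = np.full(5, 0.2)
y = x**2                               # Y = g(X) = X^2

mu_x = np.sum(x * fx)                  # 0
mu_y = np.sum(y * fx)                  # 2
print(np.sum((x - mu_x) * (y - mu_y) * fx))  # 0.0, despite Y being exactly X^2
```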
Theorem

The covariance of two random variables X and Y with means µ_X and µ_Y, respectively, is given by

σ_{XY} = Cov(X, Y) = E(XY) − µ_X·µ_Y.

Proof:

For the discrete case we can write

σ_{XY} = ∑_x ∑_y (x − µ_X)(y − µ_Y)·f(x, y)
       = ∑_x ∑_y (xy − µ_X·y − µ_Y·x + µ_X·µ_Y)·f(x, y)
       = ∑_x ∑_y xy·f(x, y) − µ_X·∑_x ∑_y y·f(x, y)
         − µ_Y·∑_x ∑_y x·f(x, y) + µ_X·µ_Y·∑_x ∑_y f(x, y)
       = E(XY) − µ_X·µ_Y − µ_Y·µ_X + µ_X·µ_Y
       = E(XY) − µ_X·µ_Y.
Example 14

The fraction X of male runners and the fraction Y of female runners who compete in marathon races are described by the joint probability density function

f(x, y) = 8xy, 0 ≤ y ≤ x ≤ 1;  0, elsewhere.

Find the covariance of X and Y.

Solution:

We first compute the marginal density functions:

g(x) = 4x³, 0 < x < 1;  0, elsewhere.
h(y) = 4y(1 − y²), 0 < y < 1;  0, elsewhere.

µ_X = E(X) = ∫_0^1 4x⁴ dx = 4/5;
µ_Y = ∫_0^1 4y²(1 − y²) dy = 8/15.

E(XY) = ∫_0^1 ∫_y^1 8x²y² dx dy = 4/9.

σ_{XY} = E(XY) − µ_X·µ_Y = 4/9 − (4/5)·(8/15) = 4/225.
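Because the support 0 ≤ y ≤ x ≤ 1 is triangular, the inner limit of integration depends on x; dblquad handles that through its limit functions. A sketch verifying Example 14:

```python
from scipy.integrate import dblquad

f = lambda y, x: 8 * x * y   # joint density on the triangle 0 <= y <= x <= 1

def expect(g):
    # Integrates g(x, y) * f(x, y) for x in (0, 1) and y in (0, x).
    val, _ = dblquad(lambda y, x: g(x, y) * f(y, x),
                     0, 1, lambda x: 0, lambda x: x)
    return val

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
print(cov)  # ~0.017778 = 4/225
```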
Although the covariance between two random variables does provide information regarding the nature of the relationship, the magnitude of σ_{XY} does not indicate anything regarding the strength of the relationship, since σ_{XY} is not scale-free. Its magnitude will depend on the units of measurement of both X and Y. There is a scale-free version of the covariance, called the correlation coefficient, that is used widely in statistics.
Example
Let X denote the father’s height, Y the daughter’s height, and
Z the mother’s height.
[Two scatter plots of Father's Height vs. Daughter's Height: measured in feet, Cov(X, Y) = 0.3; the same data measured in inches, Cov(X, Y) = 30.]
Definition: Correlation Coefficient

Let X and Y be random variables with covariance σ_{XY} and standard deviations σ_X and σ_Y, respectively. The correlation coefficient of X and Y is

ρ_{XY} = Cov(X, Y)/√(Var(X)·Var(Y)) = σ_{XY}/(σ_X·σ_Y).

Notice:

• −1 ≤ ρ_{XY} ≤ 1.
• When Y = a + bX:
  – If b > 0, ρ_{XY} = 1.
  – If b < 0, ρ_{XY} = −1.
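The scale-free claim is easy to see in a simulation. The sketch below uses synthetic father/daughter heights (arbitrary parameters, not the handout's data): rescaling from feet to inches multiplies the covariance by 12², but the correlation is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
father = rng.normal(5.8, 0.25, size=10_000)                   # heights in feet
daughter = 0.5 * father + rng.normal(2.5, 0.20, size=10_000)  # linear trend + noise

for unit, scale in [("feet", 1.0), ("inches", 12.0)]:
    x, y = father * scale, daughter * scale
    cov = np.cov(x, y)[0, 1]           # off-diagonal of the sample covariance matrix
    rho = np.corrcoef(x, y)[0, 1]
    print(f"{unit}: cov={cov:.4f}, rho={rho:.4f}")
# The covariance grows by a factor of 144; rho is identical in both rows.
```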
Example 14 Cont'd

Find the correlation coefficient of X and Y.

E(X²) = ∫_0^1 x²·4x³ dx = 2/3, so σ²_X = 2/3 − (4/5)² = 2/75;
E(Y²) = ∫_0^1 y²·4y(1 − y²) dy = 1/3, so σ²_Y = 1/3 − (8/15)² = 11/225.

ρ_{XY} = σ_{XY}/(σ_X·σ_Y) = (4/225)/√((2/75)·(11/225)) = 4/√66 ≈ 0.49.
4.3 Means and Variances of Linear Combinations of Random Variables
Theorem

If a and b are constants, then

E(aX + b) = a·E(X) + b.

Proof:

E(aX + b) = ∫_{−∞}^{∞} (ax + b)·f(x) dx
          = a·∫_{−∞}^{∞} x·f(x) dx + b·∫_{−∞}^{∞} f(x) dx
          = a·E(X) + b.
Example 15

Let X be a random variable with density function

f(x) = x²/3, −1 < x < 2;  0, elsewhere.

Find the expected value of g(X) = 4X + 3.

Solution:

Directly: E(4X + 3) = ∫_{−1}^{2} (4x + 3)·(x²/3) dx = (1/3) ∫_{−1}^{2} (4x³ + 3x²) dx = 8.

By the theorem: E(X) = ∫_{−1}^{2} x·(x²/3) dx = 15/12, so

E(4X + 3) = 4·E(X) + 3 = 4·(15/12) + 3 = 8.
Theorem

The expected value of the sum or difference of two or more functions of a random variable X is the sum or difference of the expected values of the functions. That is,

E[g(X) ± h(X)] = E[g(X)] ± E[h(X)].

Proof:

E[g(X) ± h(X)] = ∫_{−∞}^{∞} [g(x) ± h(x)]·f(x) dx
               = ∫_{−∞}^{∞} g(x)·f(x) dx ± ∫_{−∞}^{∞} h(x)·f(x) dx
               = E[g(X)] ± E[h(X)].
Theorem

The expected value of the sum or difference of two or more functions of random variables X and Y is the sum or difference of the expected values of the functions. That is,

E[g(X, Y) ± h(X, Y)] = E[g(X, Y)] ± E[h(X, Y)].

Proof:

E[g(X, Y) ± h(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [g(x, y) ± h(x, y)]·f(x, y) dx dy
  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y)·f(x, y) dx dy ± ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y)·f(x, y) dx dy
  = E[g(X, Y)] ± E[h(X, Y)].
Theorem

Let X and Y be two independent random variables. Then

E(XY) = E(X)·E(Y).

Proof:

Suppose that g(x) and h(y) are the marginal distributions of X and Y, respectively. Since X and Y are independent, we may write f(x, y) = g(x)·h(y). By definition,

E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy·f(x, y) dx dy
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy·g(x)·h(y) dx dy
      = ∫_{−∞}^{∞} x·g(x) dx · ∫_{−∞}^{∞} y·h(y) dy
      = E(X)·E(Y).
Corollary

Let X and Y be two independent random variables. Then σ_{XY} = 0.

Proof:

Cov(X, Y) = E(XY) − E(X)·E(Y) = 0.
Example 16

Given the joint density function

f(x, y) = x(1 + 3y²)/4, 0 < x < 2, 0 < y < 1;  0, elsewhere,

verify that E(XY) = E(X)·E(Y).

Solution:

• g(x) = x/2, 0 < x < 2.
• h(y) = (1 + 3y²)/2, 0 < y < 1.
• f(x, y) = g(x)·h(y) for all x and y, so X and Y are independent.

E(XY) = ∫_0^1 ∫_0^2 xy·f(x, y) dx dy = 5/6.

E(X) = ∫_0^1 ∫_0^2 x·f(x, y) dx dy = ∫_0^2 x·g(x) dx = 4/3.

E(Y) = ∫_0^1 ∫_0^2 y·f(x, y) dx dy = ∫_0^1 y·h(y) dy = 5/8.

Hence,

E(X)·E(Y) = (4/3)·(5/8) = 5/6 = E(XY).
Example 17

The joint density function of X and Y is given by

f(x, y) = (2/7)(x + 2y), 0 < x < 1, 1 < y < 2;  0, elsewhere.

Find the expected value of X/Y.

Solution:

E(X/Y) = ∫_1^2 ∫_0^1 (x/y)·(2/7)(x + 2y) dx dy = 2/7 + (2/21)·ln 2 ≈ 0.352 ≠ E(X)/E(Y).

Question:

• Does E(XY) = E(X)·E(Y)?
• Does E(X/Y) = E(X)·E(1/Y)?
Theorem

If a and b are constants, then

σ²_{aX+b} = Var(aX + b) = a²·Var(X).

Proof:

By definition,

Var(aX + b) = E[aX + b − µ_{aX+b}]²
            = E[aX + b − (a·µ_X + b)]²
            = E[(aX − a·µ_X)²]
            = E[a²·(X − µ_X)²]
            = a²·E[(X − µ_X)²] = a²·Var(X).
Theorem

If X and Y are random variables with joint probability distribution f(x, y), and a and b are constants, then

σ²_{aX+bY} = Var(aX + bY) = a²·Var(X) + 2ab·Cov(X, Y) + b²·Var(Y).

Proof:

By definition,

Var(aX + bY) = E[aX + bY − µ_{aX+bY}]²
  = E[aX + bY − (a·µ_X + b·µ_Y)]²
  = E[a(X − µ_X) + b(Y − µ_Y)]²
  = E[a²(X − µ_X)² + 2ab(X − µ_X)(Y − µ_Y) + b²(Y − µ_Y)²]
  = a²·E[(X − µ_X)²] + 2ab·E[(X − µ_X)(Y − µ_Y)] + b²·E[(Y − µ_Y)²]
  = a²·Var(X) + 2ab·Cov(X, Y) + b²·Var(Y).
Corollary

Let X and Y be two independent random variables. Then

Var(aX ± bY) = a²·Var(X) + b²·Var(Y).
Example 18

Let X be a random variable with density function

f(x) = x²/3, −1 < x < 2;  0, elsewhere.

Find the variance of g(X) = 4X + 3.

Solution:

Directly (as in Example 6 Cont'd):

Var(4X + 3) = ∫_{−1}^{2} (4x + 3 − 8)²·(x²/3) dx = 51/5.

By the theorem, with µ_X = 15/12 = 5/4:

Var(X) = ∫_{−1}^{2} (x − 5/4)²·(x²/3) dx = 51/80,

Var(4X + 3) = 4²·Var(X) = 16·(51/80) = 51/5.
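The formula Var(aX + bY) = a²Var(X) + 2ab·Cov(X, Y) + b²Var(Y) above can also be checked by simulation; a sketch with a correlated Gaussian pair and hypothetical constants a and b:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical pair with Var(X) = 4, Var(Y) = 1, Cov(X, Y) = 1.
cov = np.array([[4.0, 1.0],
                [1.0, 1.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=200_000).T

a, b = 3.0, -2.0
empirical = np.var(a * x + b * y)
theoretical = a**2 * 4 + 2 * a * b * 1 + b**2 * 1  # a^2 Var(X) + 2ab Cov + b^2 Var(Y)
print(empirical, theoretical)  # both ~28
```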
4.4 Chebyshev’s Theorem
Question

Let X have a probability density function f(x). Then

µ = E(X) = ∫ x·f(x) dx,   σ² = Var(X) = ∫ (x − µ)²·f(x) dx.

P(µ − kσ < X < µ + kσ) = ∫_{µ−kσ}^{µ+kσ} f(x) dx.

Suppose f(x) is unknown, but the mean µ and the variance σ² are known. What is the probability P(µ − kσ < X < µ + kσ)?

Although we are unable to obtain the exact probability, we can estimate it.
Markov's Inequality

Let X be a nonnegative random variable; then for any t > 0,

P(X ≥ t) ≤ E(X)/t.

Proof:

By definition, and since X is nonnegative,

E(X) = ∫_{−∞}^{∞} x·f(x) dx
     = ∫_0^{∞} x·f(x) dx
     = ∫_0^t x·f(x) dx + ∫_t^{∞} x·f(x) dx
     ≥ ∫_t^{∞} x·f(x) dx
     ≥ ∫_t^{∞} t·f(x) dx
     = t·∫_t^{∞} f(x) dx
     = t·P(X ≥ t).

Thus,

P(X ≥ t) ≤ E(X)/t.
Theorem (Chebyshev's Inequality)

The probability that any random variable X will assume a value within k standard deviations of the mean is at least 1 − 1/k². That is,

P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k².

Proof:

By definition,

P(µ − kσ < X < µ + kσ) = P[|X − µ| < kσ]
  = 1 − P[(X − µ)² ≥ (kσ)²]    (since (X − µ)² ≥ (kσ)² ⇔ |X − µ| ≥ kσ)
  ≥ 1 − E[(X − µ)²]/(kσ)²      (by Markov's inequality applied to (X − µ)²)
  = 1 − σ²/(kσ)² = 1 − 1/k².
Example 19

A random variable X has a mean µ = 8, a variance σ² = 9, and an unknown probability distribution. Find

1. P(−4 < X < 20);
2. P(|X − 8| ≥ 6).

Solution:

1. P(−4 < X < 20) = P(8 − 4·3 < X < 8 + 4·3) ≥ 1 − 1/4² = 15/16.
2. P(|X − 8| ≥ 6) = 1 − P(|X − 8| < 6) = 1 − P(−6 < X − 8 < 6) = 1 − P(8 − 2·3 < X < 8 + 2·3) ≤ 1/2² = 1/4.
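Chebyshev's bound can be sanity-checked by simulation. As a sketch, take one particular distribution with µ = 8 and σ² = 9 (a Gamma, chosen arbitrarily since Example 19 leaves the distribution unspecified) and compare the empirical probabilities with the bounds:

```python
import numpy as np

rng = np.random.default_rng(1)
# One arbitrary distribution with mu = 8 and variance 9: a Gamma with
# shape k = 64/9 and scale theta = 9/8, so mean k*theta = 8, var k*theta^2 = 9.
x = rng.gamma(shape=64/9, scale=9/8, size=100_000)

print(np.mean((-4 < x) & (x < 20)))  # ~0.999, consistent with the bound >= 15/16
print(np.mean(np.abs(x - 8) >= 6))   # ~0.04, consistent with the bound <= 1/4
```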
L'Hopital's Rule

In simple cases, L'Hopital's rule states that for functions f(x) and g(x), if

lim_{x→c} f(x) = lim_{x→c} g(x) = 0

or

lim_{x→c} f(x) = lim_{x→c} g(x) = ±∞,

then

lim_{x→c} f(x)/g(x) = lim_{x→c} f′(x)/g′(x),

where the prime (′) denotes the derivative.

Among other requirements, for this rule to hold, the limit lim_{x→c} f′(x)/g′(x) must exist.
Integration by Parts

∫ u dv = u·v − ∫ v du.