G89.2228 Lecture 3b

• Why are means and variances so useful?

• Recap of random variables and expectations with examples

• Further consideration of random variables

• Expected mean and variance of averages

• Estimates of population variance

• Bias of Variance Estimator


Why are Means and Variances so useful?

• The commonly observed NORMAL distribution is indexed by two parameters, $\mu$ and $\sigma^2$, the mean and variance

• $\mu$ is the index of location, and $\sigma^2$ is the index of spread. We can estimate the relative frequency of values given $\mu$ and $\sigma^2$:

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$

[Figure: Family of normal curves, plotting f(X) against X from -3 to 3 for Normal(0,1) and Normal(-.5,.25)]


An example: learning about distributions

• Suppose we were planning to study performance variables that are known to be affected by anxiety.

• Is the distribution of performance scores obtained in the month following the WTC attack systematically different from previous studies?

• Suppose we plan to measure performance with a measure that goes from 1 to 10, but published studies used a measure that ranged from 0 to 5. How are the means and variances affected by this difference in range?


Expectations Recap

• A Random Variable is a real-valued function defined on a sample space.

• f(X) is a function that describes the likelihood of each value of X

» Density function for continuous X

» Probability mass function for discrete X

• Suppose that g(X) is any arbitrary function of values of X.

• E(g(X)) is the expectation of g(X), the average value of g(X) in the population

» For continuous variables:

$$E[g(X)] = \int g(x)\, f(x)\, dx$$

» For discrete variables:

$$E[g(X)] = \sum_i g(x_i)\, f(x_i)$$
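The discrete-variable formula can be sketched in a few lines of Python. The fair six-sided die used as the probability mass function is an assumption chosen for illustration, not an example from the lecture, and `expectation` is a hypothetical helper name.

```python
from fractions import Fraction


def expectation(g, pmf):
    """Return E[g(X)] = sum_i g(x_i) * f(x_i) for a pmf given as {x: f(x)}."""
    return sum(g(x) * p for x, p in pmf.items())


# Hypothetical pmf: a fair six-sided die (assumption, not from the lecture).
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expectation(lambda x: x, die_pmf)              # E(X)
mean_square = expectation(lambda x: x ** 2, die_pmf)  # E(X^2)
print(mean, mean_square)
```

Exact fractions are used so that the sums are not blurred by floating-point rounding.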


Recap: First Moment (the Mean $\mu_x$)

• $E(X) = \mu_x$ is the first moment, the mean

• For k an arbitrary fixed constant:

» $E(X+k) = E(X) + k = \mu_x + k$

» $E(k \cdot X) = k \cdot E(X) = k \cdot \mu_x$

• Let Y be a second random variable (perhaps related to X, perhaps not):

» $E(X+Y) = E(X) + E(Y) = \mu_x + \mu_y$

» $E(X-Y) = E(X) - E(Y) = \mu_x - \mu_y$


Example

• We can relate the 1-10 scale to the 0-5 scale with a simple linear function.

» Let X be on the 0-5 scale.

» G(X) is on the 1-10 scale

• G(X) = (9/5)X + 1

• If E(X), the mean of X, is $\mu_X$ then E[G(X)], the mean of G(X), is

» $E[G(X)] = (9/5)\mu_X + 1$
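As a quick numerical check of this rule, the sketch below rescales a handful of scores and compares the mean of the rescaled scores with the rescaled mean. The five scores are made up for illustration, and `rescale` is a hypothetical helper name.

```python
from statistics import mean


def rescale(x):
    """Map a score on the 0-5 scale to the 1-10 scale: G(X) = (9/5)X + 1."""
    return 9 / 5 * x + 1


# Hypothetical scores on the 0-5 scale (made up for illustration).
scores = [0, 1, 2.5, 4, 5]

mean_x = mean(scores)
mean_g = mean([rescale(x) for x in scores])  # mean of the rescaled scores

# The two printed values should agree up to floating-point rounding.
print(mean_g, rescale(mean_x))
```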


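Recap: Second Moment (the Variance V(X))

• $E[(X-\mu_x)^2] = V(X) = \sigma_x^2$

• Let k be a fixed constant

» $V(X+k) = V(X) = \sigma_x^2$

» $V(k \cdot X) = k^2 \cdot V(X) = k^2 \cdot \sigma_x^2$

• Let Y be another random variable independent of X, then:

» $V(X+Y) = V(X) + V(Y) = \sigma_x^2 + \sigma_y^2$

» $V(X-Y) = V(X) + V(Y) = \sigma_x^2 + \sigma_y^2$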


Example

• Let X be on the 0-5 scale.

» G(X) is on the 1-10 scale

• G(X) = (9/5)X + 1

• If V(X), the variance of X, is $\sigma_X^2$ then V[G(X)], the variance of G(X), is

» $V[G(X)] = (9/5)^2 \sigma_X^2$

• The standard deviation is the square root of the variance

» The standard deviation of G(X) is simply (9/5) times the standard deviation of X.
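The variance and standard deviation rules can be checked the same way. In this sketch the scores are again made up for illustration and `rescale` is a hypothetical helper. The added constant 1 drops out of both results, and only the factor 9/5 matters.

```python
from statistics import pstdev, pvariance


def rescale(x):
    """Map a score on the 0-5 scale to the 1-10 scale: G(X) = (9/5)X + 1."""
    return 9 / 5 * x + 1


# Hypothetical scores on the 0-5 scale (made up for illustration).
scores = [0, 1, 2.5, 4, 5]
rescaled = [rescale(x) for x in scores]

var_x, var_g = pvariance(scores), pvariance(rescaled)
sd_x, sd_g = pstdev(scores), pstdev(rescaled)

# Each pair should agree up to floating-point rounding.
print(var_g, (9 / 5) ** 2 * var_x)
print(sd_g, (9 / 5) * sd_x)
```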


Notes on Random Variables

• Statisticians consider all instances of X to be random variables

• E.g., a sample of 10 women measured on CESD gives 10 random variables

» independent if sampled randomly

» identically distributed if from same population

» hence, same f(X)

• i.i.d. is shorthand for “independent, identically distributed”

• Note that data analysts use the term “variable” to refer to one kind of measure. If the sample has n subjects, the variable describes the set of n random variables in the statistician’s sense.


Random Variables Need Not Be Independent

• Three outcomes measured on a single subject are three random variables

» They are not likely to be independent, nor to have the same f(X)

» We would then consider the multivariate joint density, $f(X_1, X_2, X_3)$

• Random variables can be nonindependent in other ways

» Unit of analysis issue

» E.g., randomly selected employees within randomly selected supervisor’s teams

• If supervisor level is ignored, employees are not sampled randomly (rather in “clusters”)

• Within a team, the employees may be considered independent

» Average team score may be assumed to be independent over supervisors, however


Example: Sample of Size 10

Values: 6, 4, 6, 8, 8, 9, 3, 6, 8, 4

• The values above have a variance of 3.8 (standard deviation of 1.9).

• The mean of the sample is 6.2.

• What can we say about the population from which the numbers are sampled?

• What can we say about the sample statistics themselves?
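The sample statistics quoted for this example can be reproduced with Python's `statistics` module. This is a sketch that reads the ten values as single digits (6, 4, 6, 8, 8, 9, 3, 6, 8, 4), which is an assumption about the original layout. Dividing by n gives 3.76, which rounds to the 3.8 quoted above, while dividing by n - 1 gives a somewhat larger value.

```python
from statistics import mean, pstdev, pvariance, variance

# The ten sample values, read as single digits (assumption about the layout).
sample = [6, 4, 6, 8, 8, 9, 3, 6, 8, 4]

sample_mean = mean(sample)        # average of the ten values
var_n = pvariance(sample)         # sum of squared deviations divided by n
sd_n = pstdev(sample)             # square root of var_n
var_n_minus_1 = variance(sample)  # sum of squared deviations divided by n - 1

print(sample_mean, var_n, sd_n, var_n_minus_1)
```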


Studying sample statistics using expectation operators

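• Let $\bar{X}$ be the sample average of n random variables that are independently sampled from the same distribution (i.i.d.). (The expected mean of each X is the same, as is the expected variance.)

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$

$$\begin{aligned}
E(\bar{X}) &= E\left(\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right) \\
&= \frac{1}{n}\, E(X_1 + X_2 + \cdots + X_n) \\
&= \frac{1}{n}\left(E(X_1) + E(X_2) + \cdots + E(X_n)\right) \\
&= \frac{1}{n}(\mu_X + \mu_X + \cdots + \mu_X) \\
&= \frac{1}{n}(n\mu_X) = \mu_X
\end{aligned}$$

• Because the expectation of the sample mean is equal to the parameter it is estimating, we say it is unbiased.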
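Unbiasedness of the sample mean can also be illustrated by simulation. The sketch below draws many i.i.d. samples and averages their sample means. The exponential population with mean 2, the sample size, the number of replicates, and the seed are all assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
mu = 2.0                # mean of the hypothetical exponential population
n = 10                  # sample size
replicates = 20_000     # number of simulated samples


def sample_mean():
    """Draw one i.i.d. sample of size n and return its average."""
    return sum(rng.expovariate(1 / mu) for _ in range(n)) / n


# Averaging the sample means over many replicates approximates E(X-bar).
average_of_means = sum(sample_mean() for _ in range(replicates)) / replicates
print(average_of_means, mu)
```

The population here is deliberately skewed: the result does not depend on normality, only on each observation having the same expected value.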


Expected variance of the sample mean

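• The expected variance of the sample mean is inversely proportional to the sample size, n. (Because the $X_i$ are independent, the variance of their sum is the sum of their variances.)

$$\begin{aligned}
V(\bar{X}) &= E\left[(\bar{X} - E(\bar{X}))^2\right] = \sigma_{\bar{X}}^2 \\
&= V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
&= \frac{1}{n^2}\, V\left(\sum_{i=1}^{n} X_i\right) \\
&= \frac{1}{n^2}\, V(X_1 + X_2 + \cdots + X_n) \\
&= \frac{1}{n^2}\left(V(X_1) + V(X_2) + \cdots + V(X_n)\right) \\
&= \frac{1}{n^2}(n\sigma_X^2) = \sigma_X^2 / n
\end{aligned}$$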
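The result $\sigma_X^2 / n$ can be confirmed exactly for a small discrete population by enumerating every possible sample. The fair six-sided die is an assumption chosen for illustration, and `variance_of_sample_mean` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: a fair six-sided die (assumption for illustration).
values = range(1, 7)
mu = Fraction(sum(values), len(values))
sigma2 = sum((x - mu) ** 2 for x in values) / len(values)


def variance_of_sample_mean(n):
    """Exact V(X-bar): average squared deviation of X-bar from mu over all
    equally likely ordered samples of size n."""
    means = [Fraction(sum(s), n) for s in product(values, repeat=n)]
    return sum((m - mu) ** 2 for m in means) / len(means)


for n in (1, 2, 3, 4):
    # The last two columns should be identical fractions.
    print(n, variance_of_sample_mean(n), sigma2 / n)
```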


Bias of a Variance Estimator

• If variance is defined as the average squared deviation from the mean, consider the estimate,

$$\hat{\sigma}_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$

• On the average, will this function of the data give an unbiased estimate?

» The answer is NO!

» The conceptual reason is that the sample mean is itself variable

» The expected value of the above sample estimate is $[(n-1)/n]\,\sigma_X^2$


Bias of a Variance Estimator 2

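First, let's derive an alternative definition of variance:

$$\begin{aligned}
V(X) &= E\left[(X - \mu_X)^2\right] = E(X^2 - 2X\mu_X + \mu_X^2) \\
&= E(X^2) - 2\mu_X E(X) + \mu_X^2 E(1) \\
&= E(X^2) - 2\mu_X^2 + \mu_X^2 \\
&= E(X^2) - \mu_X^2 \\
&= E(X^2) - (E(X))^2
\end{aligned}$$

Next, let's do the same for our biased variance estimator:

$$\begin{aligned}
\hat{\sigma}_X^2 &= \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i^2 - 2X_i\bar{X} + \bar{X}^2) \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i^2 - 2\bar{X}\,\frac{1}{n}\sum_{i=1}^{n} X_i + \frac{1}{n}\, n\bar{X}^2 \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i^2 - 2\bar{X}^2 + \bar{X}^2 \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2
\end{aligned}$$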
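The shortcut form of the estimator (mean of squares minus squared mean) can be checked against the definitional form on a concrete sample. This sketch uses the ten values from the sample-of-size-10 example, read as single digits, which is an assumption about the original layout. Exact fractions avoid rounding differences between the two forms.

```python
from fractions import Fraction

# Ten values from the earlier example, read as single digits (assumption).
sample = [6, 4, 6, 8, 8, 9, 3, 6, 8, 4]
n = len(sample)
x_bar = Fraction(sum(sample), n)

# Definitional form: average squared deviation from the sample mean.
definitional = sum((x - x_bar) ** 2 for x in sample) / n

# Shortcut form: average of the squares minus the square of the average.
shortcut = Fraction(sum(x * x for x in sample), n) - x_bar ** 2

print(definitional, shortcut)
```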



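To determine bias, we determine the expected value:

$$E(\hat{\sigma}_X^2) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2\right) = \frac{1}{n}\, E\left(\sum_{i=1}^{n} X_i^2\right) - E(\bar{X}^2)$$

The first term is (using $E(X^2) = \sigma_X^2 + \mu_X^2$, the alternative definition of variance rearranged):

$$\frac{1}{n}\, E\left(\sum_{i=1}^{n} X_i^2\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i^2) = \frac{1}{n}\sum_{i=1}^{n} (\sigma_X^2 + \mu_X^2) = \frac{n(\sigma_X^2 + \mu_X^2)}{n} = \sigma_X^2 + \mu_X^2$$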



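The second term is:

$$E(\bar{X}^2) = \sigma_{\bar{X}}^2 + \mu_X^2 = \frac{\sigma_X^2}{n} + \mu_X^2$$

Hence,

$$E(\hat{\sigma}_X^2) = (\sigma_X^2 + \mu_X^2) - \left(\frac{\sigma_X^2}{n} + \mu_X^2\right) = \sigma_X^2 - \frac{\sigma_X^2}{n} = \frac{n-1}{n}\,\sigma_X^2$$

Biased!!!

To make it unbiased,

$$s^2 = \frac{n}{n-1}\,\hat{\sigma}_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$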
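Both expectations can be verified exactly for a small discrete population by averaging each estimator over every possible sample. The fair six-sided die and the sample size of 3 are assumptions chosen for illustration, and `expected_estimates` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: a fair six-sided die (assumption for illustration).
values = range(1, 7)
mu = Fraction(sum(values), len(values))
sigma2 = sum((x - mu) ** 2 for x in values) / len(values)


def expected_estimates(n):
    """Return (E[sigma_hat^2], E[s^2]) by enumerating all equally likely
    ordered samples of size n (requires n >= 2)."""
    total_biased = Fraction(0)
    total_unbiased = Fraction(0)
    count = 0
    for s in product(values, repeat=n):
        x_bar = Fraction(sum(s), n)
        ss = sum((x - x_bar) ** 2 for x in s)  # sum of squared deviations
        total_biased += ss / n                 # divide by n
        total_unbiased += ss / (n - 1)         # divide by n - 1
        count += 1
    return total_biased / count, total_unbiased / count


n = 3
e_biased, e_unbiased = expected_estimates(n)
print(e_biased, Fraction(n - 1, n) * sigma2)  # divisor n: (n-1)/n times sigma^2
print(e_unbiased, sigma2)                     # divisor n-1: sigma^2 itself
```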
