G89.2228 Lecture 3b
• Why are means and variances so useful?
• Recap of random variables and expectations with examples
• Further consideration of random variables
• Expected mean and variance of averages
• Estimates of population variance
• Bias of Variance Estimator
Why are Means and Variances so useful?
• The commonly observed NORMAL distribution is indexed by two parameters, $\mu$ and $\sigma^2$, the mean and variance
• $\mu$ is the index of location, and $\sigma^2$ is the index of spread. We can estimate the relative frequency of values given $\mu$ and $\sigma^2$:
$$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$$
[Figure: Family of normal curves. f(X) plotted against X for X from -3 to 3; curves shown: Normal(0,1) and Normal(-.5,.25).]
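As a rough illustration of how the two parameters shape the curve, the sketch below evaluates the normal density for the two curves named in the figure legend. It assumes the second number in each label is the variance (so Normal(-.5,.25) has a standard deviation of 0.5); the function name `normal_pdf` is made up for this example.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Normal density f(x) for mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The two curves named in the figure legend. Assumption: the second
# number in each label is the variance, not the standard deviation.
curves = {"Normal(0,1)": (0.0, 1.0), "Normal(-.5,.25)": (-0.5, 0.25)}

for label, (mu, sigma2) in curves.items():
    peak = normal_pdf(mu, mu, sigma2)  # the density is highest at x = mu
    print(f"{label}: peak height {peak:.3f} at X = {mu}")
```

Under that assumption the peaks come out near 0.40 and 0.80: shifting the mean moves the curve, and shrinking the variance makes it taller and narrower.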
An example: learning about distributions
• Suppose we were planning to study performance variables that are known to be affected by anxiety.
• Is the distribution of performance scores obtained in the month following the WTC attack systematically different from the distributions reported in previous studies?
• Suppose we plan to measure performance with a measure that goes from 1 to 10, but published studies used a measure that ranged from 0 to 5. How are the means and variances affected by this difference in range?
Expectations Recap
• A Random Variable is a real-valued function defined on a sample space.
• f(X) is a function that describes the likelihood of each value of X
» Density function for continuous X
» Probability mass function for discrete X
• Suppose that g(X) is any arbitrary function of values of X.
• E(g(X)) is the expectation of g(X), the average value of g(X) in the population
» For continuous variables: $E[g(X)] = \int g(x)\, f(x)\, dx$
» For discrete variables: $E[g(X)] = \sum_i g(x_i)\, f(x_i)$
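The discrete form of the expectation can be sketched in a few lines of Python. The fair six-sided die used here is a hypothetical example distribution, not one from the lecture, and `expectation` is an illustrative helper name.

```python
from fractions import Fraction

def expectation(g, pmf):
    """E[g(X)] = sum of g(x) * f(x) over the values x of a discrete X."""
    return sum(g(x) * p for x, p in pmf.items())

# Hypothetical distribution: a fair six-sided die, f(x) = 1/6 for x = 1..6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expectation(lambda x: x, die)          # g(X) = X gives E(X)
mean_sq = expectation(lambda x: x ** 2, die)  # g(X) = X^2 gives E(X^2)
print(mean, mean_sq)
```

Exact fractions are used so the results are not blurred by floating-point rounding.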
Recap: First Moment (the Mean $\mu_x$)
• $E(X) = \mu_x$ is the first moment, the mean
• For k an arbitrary fixed constant:
» $E(X+k) = E(X) + k = \mu_x + k$
» $E(k*X) = k*E(X) = k*\mu_x$
• Let Y be a second random variable (perhaps related to X, perhaps not):
» $E(X+Y) = E(X) + E(Y) = \mu_x + \mu_y$
» $E(X-Y) = E(X) - E(Y) = \mu_x - \mu_y$
Example
• We can relate the 1-10 scale to the 0-5 scale with a simple linear function.
» Let X be on the 0-5 scale.
» G(X) is on the 1-10 scale
• G(X) = (9/5)X + 1
• If E(X), the mean of X, is $\mu_X$, then E[G(X)], the mean of G(X), is
» $E[G(X)] = (9/5)\mu_X + 1$
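A quick numerical check of this rule, as a sketch: the scores below are invented for illustration, and `to_ten_point` is a hypothetical helper name. Rescaling every score and then averaging should give the same answer as averaging first and then rescaling.

```python
from statistics import mean

def to_ten_point(x):
    """Map a score on the 0-5 scale onto the 1-10 scale: G(X) = (9/5)X + 1."""
    return 9 / 5 * x + 1

# Hypothetical scores on the 0-5 scale, made up for illustration.
scores = [0, 1, 2.5, 3, 4, 5]
rescaled = [to_ten_point(x) for x in scores]

mean_of_rescaled = mean(rescaled)              # E[G(X)] computed directly
rescaled_mean = to_ten_point(mean(scores))     # (9/5) * mean + 1
print(mean_of_rescaled, rescaled_mean)
```

The two printed values agree up to floating-point rounding, and the endpoints map as intended (0 to 1, 5 to 10).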
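Recap: Second Moment (the Variance V(X))
• $E[(X - \mu_x)^2] = V(X) = \sigma_x^2$
• Let k be a fixed constant
» $V(X+k) = V(X) = \sigma_x^2$
» $V(k*X) = k^2 * V(X) = k^2 * \sigma_x^2$
• Let Y be another random variable independent of X, then:
» $V(X+Y) = V(X) + V(Y) = \sigma_x^2 + \sigma_y^2$
» $V(X-Y) = V(X) + V(Y) = \sigma_x^2 + \sigma_y^2$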
Example
• Let X be on the 0-5 scale.
» G(X) is on the 1-10 scale
• G(X) = (9/5)X + 1
• If V(X), the variance of X, is $\sigma^2_X$, then V[G(X)], the variance of G(X), is
» $V[G(X)] = (9/5)^2 \sigma^2_X$
• The standard deviation is the square root of the variance
» The standard deviation of G(X) is simply (9/5) times the standard deviation of X.
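The scaling of variance and standard deviation can be checked numerically. This is a minimal sketch using invented scores on the 0-5 scale; the additive constant 1 should drop out, the variance ratio should be (9/5)^2 = 3.24, and the standard deviation ratio should be 9/5.

```python
from statistics import pvariance, pstdev

# Hypothetical scores on the 0-5 scale, made up for illustration.
scores = [0, 1, 2.5, 3, 4, 5]
rescaled = [9 / 5 * x + 1 for x in scores]   # G(X) = (9/5)X + 1

var_ratio = pvariance(rescaled) / pvariance(scores)  # expect (9/5)**2
sd_ratio = pstdev(rescaled) / pstdev(scores)         # expect 9/5
print(var_ratio, sd_ratio)
```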
Notes on Random Variables
• Statisticians consider all instances of X to be random variables
• E.g., a sample of 10 women measured on CESD gives 10 random variables
» independent if sampled randomly
» identically distributed if from same population
• hence, same f(X)
• i.i.d. is shorthand for “independent, identically distributed”
• Note that data analysts use the term “variable” to refer to one kind of measure. If the sample has n subjects, the variable describes the set of n random variables in the statistician’s sense.
Random Variables Need Not Be Independent
• Three outcomes measured on a single subject are three random variables
» They are not likely to be independent, nor to have the same f(X)
» We would then consider the multivariate joint density, f(X1,X2,X3)
• Random variables can be nonindependent in other ways
» Unit of analysis issue
» E.g., randomly selected employees within randomly selected supervisors’ teams
• If supervisor level is ignored, employees are not sampled randomly (rather in “clusters”)
• Within a team, the employees may be considered independent
» Average team score may be assumed to be independent over supervisors, however
Example: Sample of Size 10
• The values 6, 4, 6, 8, 8, 9, 3, 6, 8, 4 have a variance of 3.8 (standard deviation of 1.9).
• The mean of the sample is 6.2.
• What can we say about the population from which the numbers are sampled?
• What can we say about the sample statistics themselves?
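The quoted statistics can be reproduced with Python's standard library. In this sketch the ten values are read off the slide as single digits (6, 4, 6, 8, 8, 9, 3, 6, 8, 4), which is consistent with the stated mean of 6.2. The quoted variance of 3.8 appears to match the divide-by-n figure (about 3.76); the divide-by-(n-1) figure is somewhat larger.

```python
from statistics import mean, pvariance, pstdev, variance

# The ten sample values, read as single digits from the slide.
sample = [6, 4, 6, 8, 8, 9, 3, 6, 8, 4]

sample_mean = mean(sample)        # average of the ten values
var_n = pvariance(sample)         # sum of squared deviations divided by n
sd_n = pstdev(sample)             # square root of var_n
var_n_minus_1 = variance(sample)  # sum of squared deviations divided by n - 1

print(sample_mean, var_n, round(sd_n, 2), round(var_n_minus_1, 2))
```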
Studying sample statistics using expectation operators
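• Let $\bar{X}$ be the sample average of n random variables that are independently sampled from the same distribution (i.i.d.). (The expected mean of each X is the same, as is the expected variance.)
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$
$$\begin{aligned}
E(\bar{X}) &= E\left(\tfrac{1}{n}(X_1 + X_2 + \cdots + X_n)\right) \\
&= \tfrac{1}{n}\, E(X_1 + X_2 + \cdots + X_n) \\
&= \tfrac{1}{n}\,(E(X_1) + E(X_2) + \cdots + E(X_n)) \\
&= \tfrac{1}{n}\,(\mu_X + \mu_X + \cdots + \mu_X) \\
&= \tfrac{1}{n}\,(n\mu_X) = \mu_X
\end{aligned}$$
• Because the expectation of the sample mean is equal to the parameter it is estimating, we say it is unbiased.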
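Unbiasedness of the sample mean can be verified exactly for a small case by listing every possible sample. The sketch below assumes a hypothetical population (a fair six-sided die) and samples of size 3; neither comes from the lecture.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)            # hypothetical population: a fair die
n = 3                          # sample size chosen for illustration
mu = Fraction(sum(faces), 6)   # population mean

# Every ordered sample of size n is equally likely under i.i.d. sampling.
samples = list(product(faces, repeat=n))

# Average the sample mean over all possible samples to get E(X-bar) exactly.
expected_sample_mean = sum(Fraction(sum(s), n) for s in samples) / len(samples)
print(expected_sample_mean, mu)
```

Individual sample means range from 1 to 6, but their average over all 216 samples equals the population mean.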
Expected variance of the sample mean
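• The expected variance of the sample mean is inversely proportional to the sample size, n.
$$\begin{aligned}
\sigma^2_{\bar{X}} = V(\bar{X}) &= E[(\bar{X} - E(\bar{X}))^2] \\
&= V\left(\tfrac{1}{n}\sum_{i=1}^{n} X_i\right) \\
&= \tfrac{1}{n^2}\, V\left(\sum_{i=1}^{n} X_i\right) \\
&= \tfrac{1}{n^2}\, V(X_1 + X_2 + \cdots + X_n) \\
&= \tfrac{1}{n^2}\,(V(X_1) + V(X_2) + \cdots + V(X_n)) \quad \text{(by independence)} \\
&= \tfrac{1}{n^2}\,(n\sigma^2_X) = \sigma^2_X / n
\end{aligned}$$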
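A small simulation can make the $\sigma^2_X / n$ result concrete. This is only a sketch: the normal population with variance 4, the sample sizes, the number of replications, and the helper name `variance_of_sample_means` are all assumptions chosen for illustration.

```python
import random
from statistics import pvariance

def variance_of_sample_means(n, reps, mu, sigma, rng):
    """Draw `reps` samples of size n and return the variance of their means."""
    means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]
    return pvariance(means)

rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 0.0, 2.0     # hypothetical population: variance sigma**2 = 4

results = {}
for n in (4, 16, 64):
    results[n] = variance_of_sample_means(n, reps=5000, mu=mu, sigma=sigma, rng=rng)
    # Compare the simulated variance with the theoretical sigma^2 / n.
    print(n, round(results[n], 3), sigma ** 2 / n)
```

Each fourfold increase in n should cut the variance of the sample mean to roughly a quarter, up to simulation noise.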
Bias of a Variance Estimator
• If variance is defined as the average squared deviation from the mean, consider the estimate,
$$\hat{\sigma}^2_X = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$
• On the average, will this function of the data give an unbiased estimate?
» The answer is NO!
» The conceptual reason is that the sample mean is itself variable
» The expected value of the above sample estimate is $[(n-1)/n]\,\sigma^2_X$
Bias of a Variance Estimator 2
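First, let’s derive an alternative definition of variance:
$$\begin{aligned}
V(X) = E[(X - \mu_X)^2] &= E(X^2 - 2X\mu_X + \mu_X^2) \\
&= E(X^2) - 2\mu_X E(X) + \mu_X^2 E(1) \\
&= E(X^2) - 2\mu_X^2 + \mu_X^2 \\
&= E(X^2) - \mu_X^2 \\
&= E(X^2) - (E(X))^2
\end{aligned}$$
Next, let’s do the same for our biased variance estimator:
$$\begin{aligned}
\hat{\sigma}^2_X = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 &= \frac{1}{n}\sum_{i=1}^{n} (X_i^2 - 2X_i\bar{X} + \bar{X}^2) \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i^2 - 2\bar{X}\,\frac{1}{n}\sum_{i=1}^{n} X_i + \frac{1}{n}\, n\bar{X}^2 \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i^2 - 2\bar{X}^2 + \bar{X}^2 \\
&= \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2
\end{aligned}$$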
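The two forms of the biased estimator can be compared on the ten values from the sample-of-10 example (read as single digits from that slide). This is a small check, not part of the original derivation: the average squared deviation and the "mean of squares minus squared mean" form should agree up to floating-point rounding.

```python
# Ten values from the sample-of-10 example, read as single digits.
sample = [6, 4, 6, 8, 8, 9, 3, 6, 8, 4]
n = len(sample)
xbar = sum(sample) / n

# Definitional form: average squared deviation from the sample mean.
definitional = sum((x - xbar) ** 2 for x in sample) / n

# Alternative form: mean of the squares minus the square of the mean.
computational = sum(x ** 2 for x in sample) / n - xbar ** 2

print(definitional, computational)
```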
Bias of a Variance Estimator 3
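To determine bias, we determine the expected value:
$$E(\hat{\sigma}^2_X) = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2\right] = \frac{1}{n}\, E\left(\sum_{i=1}^{n} X_i^2\right) - E(\bar{X}^2)$$
The first term is (using $E(X^2) = V(X) + (E(X))^2 = \sigma^2_X + \mu^2_X$):
$$\frac{1}{n}\, E\left(\sum_{i=1}^{n} X_i^2\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i^2) = \frac{1}{n}\sum_{i=1}^{n} (\sigma^2_X + \mu^2_X) = \frac{n}{n}(\sigma^2_X + \mu^2_X) = \sigma^2_X + \mu^2_X$$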
Bias of a Variance Estimator 4
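The second term is:
$$E(\bar{X}^2) = \sigma^2_{\bar{X}} + \mu^2_{\bar{X}} = \frac{\sigma^2_X}{n} + \mu^2_X$$
Hence,
$$E(\hat{\sigma}^2_X) = (\sigma^2_X + \mu^2_X) - \left(\frac{\sigma^2_X}{n} + \mu^2_X\right) = \sigma^2_X - \frac{\sigma^2_X}{n} = \frac{n-1}{n}\,\sigma^2_X \quad \text{Biased!!!}$$
To make it unbiased,
$$s^2 = \frac{n}{n-1}\,\hat{\sigma}^2_X = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$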
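The bias result can be confirmed exactly for a small case by enumerating all possible samples. The sketch below assumes a hypothetical population (a fair six-sided die) and n = 3, and the helper name `variance_estimate` is made up for this example. With the divisor n the expected estimate should be $[(n-1)/n]\,\sigma^2_X$; with the divisor n - 1 it should equal $\sigma^2_X$.

```python
from fractions import Fraction
from itertools import product

def variance_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

faces = range(1, 7)                               # hypothetical population: a fair die
n = 3                                             # sample size chosen for illustration
mu = Fraction(sum(faces), 6)
sigma2 = sum((x - mu) ** 2 for x in faces) / 6    # population variance

# All ordered samples of size n are equally likely under i.i.d. sampling.
samples = list(product(faces, repeat=n))
e_biased = sum(variance_estimate(s, n) for s in samples) / len(samples)
e_unbiased = sum(variance_estimate(s, n - 1) for s in samples) / len(samples)

print(sigma2, e_biased, e_unbiased)
```

Exact fractions make the comparison clean: the divide-by-n estimate falls short of the population variance by the factor (n-1)/n, and the divide-by-(n-1) estimate recovers it.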