Math 362: Mathematical Statistics II
Transcript of slides by Le Chen (le.chen@emory.edu), Emory University, Atlanta, GA
Source: math.emory.edu/~lchen41/teaching/2020_Spring/Chapter-5_full.pdf
2020 Spring. Last updated on February 20, 2020.
Chapter 5: Estimation
Plan
§ 5.1 Introduction
§ 5.2 Estimating parameters: MLE and MME
§ 5.3 Interval Estimation
§ 5.4 Properties of Estimators
§ 5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound
§ 5.6 Sufficient Estimators
§ 5.7 Consistency
§ 5.8 Bayesian Estimation
§ 5.1 Introduction
Motivating example: Given an unfair coin, or p-coin, such that

$$X = \begin{cases} 1 \ (\text{head}) & \text{with probability } p,\\ 0 \ (\text{tail}) & \text{with probability } 1-p,\end{cases}$$

how would you determine the value of p?

Solution:
1. Try the coin several times, say, three times. Suppose what you obtain is "HHT".
2. Draw a conclusion from the experiment you just made.
Rationale: The choice of the parameter p should be the value that maximizes the probability of the sample:

$$P(X_1 = 1, X_2 = 1, X_3 = 0) = P(X_1 = 1)\,P(X_2 = 1)\,P(X_3 = 0) = p^2(1-p).$$

```r
# Hello, R.
p <- seq(0, 1, 0.01)
plot(p, p^2 * (1 - p),
     type = "l",
     col = "red")
title("Likelihood")
# add a vertical dotted (lty = 4) blue line
abline(v = 0.67, col = "blue", lty = 4)
# add some text
text(0.67, 0.01, "2/3")
```

Maximize $f(p) = p^2(1-p)$: the maximizer is $p = 2/3$.
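The calculus behind the vertical line at 2/3 can be double-checked numerically. Here is a hedged sketch (in Python rather than the slides' R, with an assumed grid resolution): it maximizes $f(p) = p^2(1-p)$ over a fine grid on [0, 1] and recovers p = 2/3.

```python
# Grid-search sketch (assumption: a grid of step 1e-6 is fine enough here).
n_grid = 1_000_001  # points on [0, 1]
best_p, best_f = 0.0, -1.0
for i in range(n_grid):
    p = i / (n_grid - 1)
    f = p * p * (1 - p)  # the likelihood p^2 (1 - p) from the HHT sample
    if f > best_f:
        best_p, best_f = p, f

print(best_p)  # close to 2/3
```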
A random sample of size n from the population Bernoulli(p):

- $X_1, \cdots, X_n$ are i.i.d.¹ random variables, each following Bernoulli(p).
- Suppose the outcomes of the random sample are $X_1 = k_1, \cdots, X_n = k_n$.
- What is your choice of p based on the above random sample?

$$p = \frac{1}{n}\sum_{i=1}^n k_i =: \bar{k}.$$

¹ independent and identically distributed
A random sample of size n from a population with a given pdf:

- $X_1, \cdots, X_n$ are i.i.d. random variables, each following the same given pdf.
- A statistic, or an estimator, is a function of the random sample. A statistic/estimator is a random variable! E.g.,

$$p = \frac{1}{n}\sum_{i=1}^n X_i.$$

- The outcome of a statistic/estimator is called an estimate. E.g.,

$$p_e = \frac{1}{n}\sum_{i=1}^n k_i.$$
§ 5.2 Estimating parameters: MLE and MME

Two methods and their corresponding estimators:

1. Method of maximum likelihood → MLE
2. Method of moments → MME
Maximum Likelihood Estimation

Definition 5.2.1. For a random sample of size n from the discrete (resp. continuous) population/pdf $p_X(k;\theta)$ (resp. $f_Y(y;\theta)$), the likelihood function $L(\theta)$ is the product of the pdf evaluated at $X_i = k_i$ (resp. $Y_i = y_i$), i.e.,

$$L(\theta) = \prod_{i=1}^n p_X(k_i;\theta) \qquad \Big(\text{resp. } L(\theta) = \prod_{i=1}^n f_Y(y_i;\theta)\Big).$$

Definition 5.2.2. Let $L(\theta)$ be as defined in Definition 5.2.1. If $\theta_e$ is a value of the parameter such that $L(\theta_e) \ge L(\theta)$ for all possible values of $\theta$, then we call $\theta_e$ the maximum likelihood estimate for $\theta$.
Examples for MLE

Often, but not always, the MLE can be obtained by setting the first derivative equal to zero.

E.g. 1. Poisson distribution: $p_X(k) = e^{-\lambda}\dfrac{\lambda^k}{k!}$, $k = 0, 1, \cdots$.

$$L(\lambda) = \prod_{i=1}^n e^{-\lambda}\frac{\lambda^{k_i}}{k_i!} = e^{-n\lambda}\,\lambda^{\sum_{i=1}^n k_i}\left(\prod_{i=1}^n k_i!\right)^{-1}.$$

$$\ln L(\lambda) = -n\lambda + \left(\sum_{i=1}^n k_i\right)\ln\lambda - \ln\left(\prod_{i=1}^n k_i!\right).$$

$$\frac{d}{d\lambda}\ln L(\lambda) = -n + \frac{1}{\lambda}\sum_{i=1}^n k_i.$$

$$\frac{d}{d\lambda}\ln L(\lambda) = 0 \implies \lambda_e = \frac{1}{n}\sum_{i=1}^n k_i =: \bar{k}.$$

Comment: The critical point is indeed a global maximum because

$$\frac{d^2}{d\lambda^2}\ln L(\lambda) = -\frac{1}{\lambda^2}\sum_{i=1}^n k_i < 0.$$
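The closed form $\lambda_e = \bar{k}$ can be sanity-checked by brute force. The snippet below is a sketch, not part of the slides (the data vector is made up): it maximizes $\ln L(\lambda)$ over a grid and compares the result with the sample mean.

```python
import math

ks = [2, 0, 3, 1, 4, 2, 2]  # hypothetical Poisson counts
n = len(ks)

def log_lik(lam):
    # ln L(lambda) = -n*lambda + (sum k_i) * ln(lambda) - ln(prod k_i!)
    return (-n * lam + sum(ks) * math.log(lam)
            - sum(math.lgamma(k + 1) for k in ks))

grid = [j / 100 for j in range(1, 1001)]  # lambda in (0, 10]
lam_e = max(grid, key=log_lik)

print(lam_e, sum(ks) / n)  # both equal the sample mean 2.0
```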
The following two cases are related to waiting times.

E.g. 2. Exponential distribution: $f_Y(y) = \lambda e^{-\lambda y}$ for $y \ge 0$.

$$L(\lambda) = \prod_{i=1}^n \lambda e^{-\lambda y_i} = \lambda^n \exp\left(-\lambda \sum_{i=1}^n y_i\right).$$

$$\ln L(\lambda) = n\ln\lambda - \lambda\sum_{i=1}^n y_i.$$

$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^n y_i.$$

$$\frac{d}{d\lambda}\ln L(\lambda) = 0 \implies \lambda_e = \frac{n}{\sum_{i=1}^n y_i} =: \frac{1}{\bar{y}}.$$
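A simulation check, sketched under assumptions not in the slides (the seed, true rate, and sample size are arbitrary choices): draw exponential waiting times with a known rate and verify that $\lambda_e = 1/\bar{y}$ lands near the truth.

```python
import random

random.seed(1)    # arbitrary seed for reproducibility
lam_true = 2.0    # assumed true rate
n = 100_000
ys = [random.expovariate(lam_true) for _ in range(n)]

lam_e = n / sum(ys)  # the MLE lambda_e = 1 / ybar
print(lam_e)         # should be close to 2.0
```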
A random sample of size n from the following population:

E.g. 3. Gamma distribution: $f_Y(y;\lambda) = \dfrac{\lambda^r}{\Gamma(r)}\, y^{r-1} e^{-\lambda y}$ for $y \ge 0$, with $r > 1$ known.

$$L(\lambda) = \prod_{i=1}^n \frac{\lambda^r}{\Gamma(r)}\, y_i^{r-1} e^{-\lambda y_i} = \lambda^{rn}\,\Gamma(r)^{-n}\left(\prod_{i=1}^n y_i^{r-1}\right)\exp\left(-\lambda\sum_{i=1}^n y_i\right).$$

$$\ln L(\lambda) = rn\ln\lambda - n\ln\Gamma(r) + \ln\left(\prod_{i=1}^n y_i^{r-1}\right) - \lambda\sum_{i=1}^n y_i.$$

$$\frac{d}{d\lambda}\ln L(\lambda) = \frac{rn}{\lambda} - \sum_{i=1}^n y_i.$$

$$\frac{d}{d\lambda}\ln L(\lambda) = 0 \implies \lambda_e = \frac{rn}{\sum_{i=1}^n y_i} = \frac{r}{\bar{y}}.$$

Comments:
- When r = 1, this reduces to the exponential distribution case.
- If r is also unknown, the problem becomes much more complicated: there is no closed-form solution, and one needs a numerical solver². Try MME instead.

² [DW, Example 7.2.25]
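With r known, the same kind of simulation check works (a sketch; the shape, rate, seed, and sample size below are assumptions, not from the slides): draw Gamma data and verify $\lambda_e = r/\bar{y}$.

```python
import random

random.seed(7)
r = 3.0          # known shape, as in the slide's setup
lam_true = 2.0   # assumed true rate
n = 50_000
# random.gammavariate takes (shape, scale); scale = 1/rate matches the
# rate parameterization f(y) = lambda^r / Gamma(r) * y^(r-1) * e^(-lambda y).
ys = [random.gammavariate(r, 1 / lam_true) for _ in range(n)]

lam_e = r * n / sum(ys)  # the MLE lambda_e = r / ybar
print(lam_e)             # should be close to 2.0
```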
A detailed study with data:

E.g. 4. Geometric distribution: $p_X(k;p) = (1-p)^{k-1}p$, $k = 1, 2, \cdots$.

$$L(p) = \prod_{i=1}^n (1-p)^{k_i-1}p = (1-p)^{-n+\sum_{i=1}^n k_i}\,p^n.$$

$$\ln L(p) = \left(-n + \sum_{i=1}^n k_i\right)\ln(1-p) + n\ln p.$$

$$\frac{d}{dp}\ln L(p) = -\frac{-n+\sum_{i=1}^n k_i}{1-p} + \frac{n}{p}.$$

$$\frac{d}{dp}\ln L(p) = 0 \implies p_e = \frac{n}{\sum_{i=1}^n k_i} = \frac{1}{\bar{k}}.$$

Comment: Its cousin distribution, the negative binomial distribution, can be worked out similarly (see Ex 5.2.14).
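The estimate reported in the lecture's frequency-table example can be reproduced directly from $p_e = 1/\bar{k}$. A sketch in Python (the counts are the observed frequencies from that example; everything else is straightforward arithmetic):

```python
obs = {1: 72, 2: 35, 3: 11, 4: 6, 5: 2, 6: 2}       # k -> observed frequency
n = sum(obs.values())                                # sample size
total_k = sum(k * freq for k, freq in obs.items())   # sum of all k_i

p_e = n / total_k  # the MLE p_e = 1 / kbar
print(n, round(p_e, 4))  # 128 0.5792
```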
| k | Observed frequency | Predicted frequency |
|---|--------------------|---------------------|
| 1 | 72 | 74.14 |
| 2 | 35 | 31.2  |
| 3 | 11 | 13.13 |
| 4 | 6  | 5.52  |
| 5 | 2  | 2.32  |
| 6 | 2  | 0.98  |

MLE for the geometric distribution, sample size n = 128. Real p: unknown; MLE: p_e = 0.5792.
```r
# The example from the book.
library(pracma)     # "Practical Numerical Math Functions": provides dot()
library(gridExtra)  # provides grid.table()
k <- c(72, 35, 11, 6, 2, 2)  # observed freq.
a <- 1:6
pe <- sum(k) / dot(k, a)     # MLE for p
f <- a
for (i in 1:6) {
  f[i] <- round((1 - pe)^(i - 1) * pe * sum(k), 2)  # predicted freq.
}
# Initialize the table
d <- matrix(1:18, nrow = 6, ncol = 3)
# Now add the column names
colnames(d) <- c("k", "Observed freq.", "Predicted freq.")
d[1:6, 1] <- a
d[1:6, 2] <- k
d[1:6, 3] <- f
grid.table(d)  # show the table
PlotResults("unknown", pe, d, "Geometric.pdf")  # output the results using a user-defined function
```
| k  | Observed frequency | Predicted frequency |
|----|--------------------|---------------------|
| 1  | 42 | 40.96 |
| 2  | 31 | 27.85 |
| 3  | 15 | 18.94 |
| 4  | 11 | 12.88 |
| 5  | 9  | 8.76  |
| 6  | 5  | 5.96  |
| 7  | 7  | 4.05  |
| 8  | 2  | 2.75  |
| 9  | 1  | 1.87  |
| 10 | 2  | 1.27  |
| 11 | 1  | 0.87  |
| 13 | 1  | 0.59  |
| 14 | 1  | 0.4   |

MLE for the geometric distribution, sample size n = 128. Real p = 0.3333; MLE p_e = 0.32.
![Page 48: Math 362: Mathmatical Statistics IImath.emory.edu/~lchen41/teaching/2020_Spring/Chapter-5_full.pdf · Math 362: Mathmatical Statistics II Le Chen le.chen@emory.edu Emory University](https://reader033.fdocuments.us/reader033/viewer/2022042420/5f380061258acb1e401a5f9a/html5/thumbnails/48.jpg)
# Now let's generate random samples from a Geometric distribution with p = 1/3,
# with the same sample size.
p <- 1/3
n <- 128
gdata <- rgeom(n, p) + 1                     # Generate random samples
g <- table(gdata)                            # Count the frequency of your data.
g <- t(rbind(as.numeric(rownames(g)), g))    # Transpose and combine two columns.
pe <- n/dot(g[, 1], g[, 2])                  # MLE for p.
f <- g[, 1]                                  # Initialize f
for (i in 1:nrow(g)) {
  f[i] <- round((1 - pe)^(i - 1) * pe * n, 2)
}   # Compute the expected frequency (note: this uses the row index i, which
    # differs from (1 - pe)^(k - 1) when some k values are absent from the sample)
g <- cbind(g, f)                             # Add one column to your matrix.
colnames(g) <- c("k",
                 "Observed freq.",
                 "Predicted freq.")          # Specify the column names.
g_df <- as.data.frame(g)                     # One can use a data frame to store data
g_df                                         # Show the data on your terminal
PlotResults(p, pe, g, "Geometric2.pdf")      # Output the results using a user-defined function
[Figure: MLE for Geom. Distr. of sample size n = 300; Real p = 0.3333 and MLE for p = 0.3529.]

    k    Observed frequency    Predicted frequency
    1          99                  105.88
    2          69                   68.51
    3          47                   44.33
    4          28                   28.69
    5          27                   18.56
    6           9                   12.01
    7           8                    7.77
    8           5                    5.03
    9           5                    3.25
   10           3                    2.11
In case we have several parameters:

E.g. 5. Normal distribution: $f_Y(y;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$, $y \in \mathbb{R}$.

$$L(\mu,\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right)$$

$$\ln L(\mu,\sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2.$$

$$\frac{\partial}{\partial\mu}\ln L(\mu,\sigma^2) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i-\mu), \qquad \frac{\partial}{\partial\sigma^2}\ln L(\mu,\sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i-\mu)^2$$

Setting $\frac{\partial}{\partial\mu}\ln L(\mu,\sigma^2) = 0$ and $\frac{\partial}{\partial\sigma^2}\ln L(\mu,\sigma^2) = 0$ yields

$$\mu_e = \bar{y}, \qquad \sigma^2_e = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2.$$
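These closed forms can be sanity-checked numerically. A minimal Python sketch (not part of the slides; NumPy assumed available) computes $\mu_e$ and $\sigma^2_e$ for the sample that reappears in § 5.3 and verifies that the log-likelihood is largest there:

```python
import numpy as np

y = np.array([6.5, 9.2, 9.9, 12.4])    # sample reused in the CI example of § 5.3
n = len(y)

mu_e = y.mean()                        # MLE for mu: the sample mean
sigma2_e = ((y - mu_e) ** 2).mean()    # MLE for sigma^2: note 1/n, not 1/(n-1)

def log_lik(mu, sigma2):
    # ln L(mu, sigma^2) = -(n/2) ln(2 pi sigma^2) - sum (y_i - mu)^2 / (2 sigma^2)
    return (-n / 2 * np.log(2 * np.pi * sigma2)
            - ((y - mu) ** 2).sum() / (2 * sigma2))

# The log-likelihood at (mu_e, sigma2_e) is no smaller than at nearby points.
for dmu in (-0.1, 0.0, 0.1):
    for ds in (-0.1, 0.0, 0.1):
        assert log_lik(mu_e, sigma2_e) >= log_lik(mu_e + dmu, sigma2_e + ds) - 1e-12
```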
In case the parameters determine the support of the density (non-regular case):

E.g. 6. Uniform distribution on $[a,b]$ with $a < b$: $f_Y(y;a,b) = \frac{1}{b-a}$ if $y \in [a,b]$.

$$L(a,b) = \begin{cases} \prod_{i=1}^{n} \frac{1}{b-a} = \frac{1}{(b-a)^n} & \text{if } a \le y_1, \dots, y_n \le b, \\ 0 & \text{otherwise.} \end{cases}$$

$L(a,b)$ is monotone increasing in $a$ and decreasing in $b$. Hence, in order to maximize $L(a,b)$, one needs to choose

$$a_e = y_{\min} \quad\text{and}\quad b_e = y_{\max}.$$

E.g. 7. $f_Y(y;\theta) = \frac{2y}{\theta^2}$ for $y \in [0,\theta]$.

$$L(\theta) = \begin{cases} \prod_{i=1}^{n} \frac{2y_i}{\theta^2} = 2^n \theta^{-2n} \prod_{i=1}^{n} y_i & \text{if } 0 \le y_1, \dots, y_n \le \theta, \\ 0 & \text{otherwise,} \end{cases}$$

which is decreasing in $\theta$ on $[y_{\max}, \infty)$, so

$$\theta_e = y_{\max}.$$
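The support-driven argument of E.g. 7 can be illustrated numerically. In the Python sketch below (an illustration, not from the slides; NumPy assumed), samples are drawn by inverse-CDF sampling, $F^{-1}(u) = \theta\sqrt{u}$, and the log-likelihood is checked to be decreasing on $[y_{\max}, \infty)$, so the maximum sits at the left endpoint $\theta_e = y_{\max}$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 5.0
# Sample from f(y; theta) = 2y/theta^2 on [0, theta]: F(y) = (y/theta)^2,
# so F^{-1}(u) = theta * sqrt(u).
y = theta * np.sqrt(rng.uniform(size=200))
n = len(y)

def log_lik(t):
    # log L(t) = n log 2 - 2n log t + sum_i log y_i, valid only for t >= max(y_i)
    return n * np.log(2.0) - 2 * n * np.log(t) + np.log(y).sum()

t_grid = np.linspace(y.max(), y.max() + 3.0, 500)
vals = log_lik(t_grid)
assert vals.argmax() == 0    # the likelihood peaks at the left endpoint
theta_e = y.max()
```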
In case of a discrete parameter:

E.g. 8. Wildlife sampling (capture-tag-recapture). In an earlier stage, $a$ animals were captured and tagged. In order to estimate the population size $N$, one randomly captures $n$ animals, of which $k$ are tagged. Find the MLE for $N$.

Sol. The number of tagged animals in the recapture follows a hypergeometric distribution:

$$p_X(k; N) = \frac{\binom{a}{k}\binom{N-a}{n-k}}{\binom{N}{n}}, \qquad L(N) = \frac{\binom{a}{k}\binom{N-a}{n-k}}{\binom{N}{n}}.$$

How to maximize $L(N)$?

> a <- 10
> k <- 5
> n <- 20
> N <- seq(a, a + 100)
> p <- choose(a, k) * choose(N - a, n - k) / choose(N, n)
> plot(N, p, type = "p")
> print(paste("The MLE is", n * a / k))
[1] "The MLE is 40"
The graph suggests studying the ratio of successive likelihoods:

$$r(N) := \frac{L(N)}{L(N-1)} = \frac{N-n}{N} \times \frac{N-a}{N-a-n+k}$$

$$r(N) < 1 \iff na < Nk, \quad\text{i.e.,}\quad N > \frac{na}{k}.$$

So $L(N)$ increases while $N < na/k$ and decreases once $N > na/k$, whence

$$N_e = \arg\max\left\{L(N) : N = \left\lfloor\frac{na}{k}\right\rfloor, \left\lceil\frac{na}{k}\right\rceil\right\}. \qquad \square$$
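The floor/ceiling answer can be confirmed by brute force. The sketch below (Python; `mle_population` is a hypothetical helper, not from the slides) searches all feasible $N$, using exact rational arithmetic so that the tie $L(na/k - 1) = L(na/k)$, which occurs exactly when $na/k$ is an integer, is detected without floating-point noise:

```python
from fractions import Fraction
from math import comb

def mle_population(a, n, k, n_max=1000):
    """Brute-force MLE of N for L(N) = C(a,k) C(N-a, n-k) / C(N, n)."""
    def L(N):
        # Exact rational value of the likelihood, so ties compare exactly.
        return Fraction(comb(a, k) * comb(N - a, n - k), comb(N, n))
    # When na/k is an integer, L(na/k - 1) == L(na/k); break the tie upward,
    # matching the slide's answer N_e = na/k.
    return max(range(max(a, n), n_max + 1), key=lambda N: (L(N), N))

assert mle_population(10, 20, 5) == 40   # the slide's example: na/k = 200/5 = 40
```

For the slide's numbers ($a = 10$, $n = 20$, $k = 5$), $r(40) = \frac{20}{40}\cdot\frac{30}{15} = 1$, so $L(39) = L(40)$; breaking the tie upward reproduces the answer 40.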
Method of Moments Estimation

Rationale: the population moments should be close to the sample moments, i.e.,

$$E(Y^k) \approx \frac{1}{n}\sum_{i=1}^{n} y_i^k, \qquad k = 1, 2, 3, \dots$$

Definition 5.2.3. For a random sample of size $n$ from the discrete (resp. continuous) population pdf $p_X(k; \theta_1, \dots, \theta_s)$ (resp. $f_Y(y; \theta_1, \dots, \theta_s)$), the solutions of

$$E(Y) = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad \dots, \quad E(Y^s) = \frac{1}{n}\sum_{i=1}^{n} y_i^s,$$

denoted $\theta_{1e}, \dots, \theta_{se}$, are called the method of moments estimates of $\theta_1, \dots, \theta_s$.
Examples for MME

The MME is often the same as the MLE:

E.g. 1. Normal distribution: $f_Y(y;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$, $y \in \mathbb{R}$.

$$\begin{cases} \mu = E(Y) = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y} \\[4pt] \sigma^2 + \mu^2 = E(Y^2) = \frac{1}{n}\sum_{i=1}^{n} y_i^2 \end{cases} \;\Rightarrow\; \begin{cases} \mu_e = \bar{y} \\[4pt] \sigma^2_e = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \mu_e^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 \end{cases}$$

More examples where the MLE coincides with the MME: Poisson, exponential, geometric.
The MME is often much more tractable than the MLE:

E.g. 2. Gamma distribution³: $f_Y(y; r, \lambda) = \frac{\lambda^r}{\Gamma(r)} y^{r-1} e^{-\lambda y}$ for $y \ge 0$.

$$\begin{cases} \frac{r}{\lambda} = E(Y) = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y} \\[4pt] \frac{r}{\lambda^2} + \frac{r^2}{\lambda^2} = E(Y^2) = \frac{1}{n}\sum_{i=1}^{n} y_i^2 \end{cases} \;\Rightarrow\; \begin{cases} r_e = \frac{\bar{y}^2}{\hat\sigma^2} \\[4pt] \lambda_e = \frac{\bar{y}}{\hat\sigma^2} = \frac{r_e}{\bar{y}} \end{cases}$$

where $\bar{y}$ is the sample mean and $\hat\sigma^2$ is the sample variance $\hat\sigma^2 := \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$.

Comments: the MME for $\lambda$ is consistent with the MLE when $r$ is known.

³Check Theorem 4.6.3 on p. 269 for the mean and variance.
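A quick simulation check of these formulas (a Python sketch with arbitrarily chosen true parameters, not from the slides; note that NumPy's gamma sampler takes a shape and a scale $= 1/\lambda$):

```python
import numpy as np

rng = np.random.default_rng(1)
r_true, lam_true = 3.0, 2.0
# NumPy parametrizes the gamma by shape and scale = 1/lambda.
y = rng.gamma(shape=r_true, scale=1.0 / lam_true, size=100_000)

ybar = y.mean()
s2 = ((y - ybar) ** 2).mean()   # the 1/n sample variance, as on the slide

r_e = ybar ** 2 / s2            # MME for r
lam_e = ybar / s2               # MME for lambda (equivalently r_e / ybar)
```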
Another example that is tractable for the MME but less so for the MLE:

E.g. 3. Negative binomial distribution: $p_X(k; p, r) = \binom{k+r-1}{k}(1-p)^k p^r$, $k = 0, 1, \dots$

$$\begin{cases} \frac{r(1-p)}{p} = E(X) = \bar{k} \\[4pt] \frac{r(1-p)}{p^2} = \operatorname{Var}(X) = \hat\sigma^2 \end{cases} \;\Rightarrow\; \begin{cases} p_e = \frac{\bar{k}}{\hat\sigma^2} \\[4pt] r_e = \frac{\bar{k}^2}{\hat\sigma^2 - \bar{k}} \end{cases}$$
For the data set shown on this slide, the MME gives $r_e = 12.74$ and $p_e = 0.391$.
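These estimates are easy to check by simulation (a Python sketch with arbitrarily chosen true parameters, not from the slides; NumPy's `negative_binomial(n, p)` counts failures before the $n$-th success, which matches the slide's parametrization with mean $r(1-p)/p$):

```python
import numpy as np

rng = np.random.default_rng(3)
r_true, p_true = 5, 0.4
# negative_binomial(n, p) = number of failures before the n-th success,
# so E(X) = n(1-p)/p and Var(X) = n(1-p)/p^2, as on the slide.
x = rng.negative_binomial(r_true, p_true, size=200_000)

kbar = x.mean()
s2 = x.var()                     # 1/n sample variance

p_e = kbar / s2                  # MME: p = mean / variance
r_e = kbar ** 2 / (s2 - kbar)    # MME: r = mean^2 / (variance - mean)
```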
E.g. 4. $f_Y(y; \theta) = \frac{2y}{\theta^2}$ for $y \in [0, \theta]$.

$$\bar{y} = E[Y] = \int_0^\theta \frac{2y^2}{\theta^2}\,dy = \frac{2}{3}\,\frac{y^3}{\theta^2}\,\bigg|_{y=0}^{y=\theta} = \frac{2}{3}\theta \;\Longrightarrow\; \theta_e = \frac{3}{2}\,\bar{y}.$$
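Since E.g. 7 of the MLE part (support-determined case) and E.g. 4 here treat the same density, the two estimators can be compared on simulated data. A Python sketch, not from the slides, using inverse-CDF sampling via $F^{-1}(u) = \theta\sqrt{u}$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 4.0
# Inverse-CDF sampling from f(y; theta) = 2y/theta^2: F(y) = (y/theta)^2,
# so F^{-1}(u) = theta * sqrt(u).
y = theta * np.sqrt(rng.uniform(size=50_000))

theta_mle = y.max()           # MLE from E.g. 7: the sample maximum
theta_mme = 1.5 * y.mean()    # MME from E.g. 4: theta_e = (3/2) ybar

assert theta_mle <= theta     # the sample maximum can only undershoot theta
```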
§ 5.3 Interval Estimation

Rationale. A point estimate by itself carries no information about its precision.

Using the variance of the estimator, one can construct an interval that, with high probability, contains the unknown parameter.

- The interval is called a confidence interval.
- The high probability is called the confidence level.
E.g. 1. A random sample of size 4, $(Y_1 = 6.5, Y_2 = 9.2, Y_3 = 9.9, Y_4 = 12.4)$, from a normal population:

$$f_Y(y; \mu) = \frac{1}{\sqrt{2\pi}\,0.8}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{0.8}\right)^2}.$$

Both MLE and MME give $\mu_e = \bar{y} = \frac{1}{4}(6.5 + 9.2 + 9.9 + 12.4) = 9.5$. The estimator $\hat\mu = \bar{Y}$ follows a normal distribution.

Construct a 95%-confidence interval for $\mu$ ...
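Carrying the construction out numerically: with $\sigma = 0.8$ and $n = 4$, the interval is $\bar{y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. A Python sketch (not from the slides; the 97.5% standard-normal quantile is hard-coded rather than pulled from a statistics library):

```python
from math import sqrt

y = [6.5, 9.2, 9.9, 12.4]
sigma, n = 0.8, len(y)
ybar = sum(y) / n                 # 9.5

z = 1.959963984540054             # z_{0.025}: the 97.5% standard-normal quantile
half = z * sigma / sqrt(n)        # margin: z_{alpha/2} * sigma / sqrt(n)
ci = (ybar - half, ybar + half)
print(ci)                         # roughly (8.716, 10.284)
```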
"The parameter is an unknown constant and no probability state-ment concerning its value may be made."
–Jerzy Neyman, original developer of confidence intervals.
35
In general, for a normal population with $\sigma$ known, the $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{y} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{y} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$

Comment: There are many variations.

1. One-sided intervals such as $\left(\bar{y} - z_{\alpha}\frac{\sigma}{\sqrt{n}},\; \bar{y}\right)$ or $\left(\bar{y},\; \bar{y} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right)$
2. $\sigma$ is unknown and the sample size is small: z-score → t-score
3. $\sigma$ is unknown and the sample size is large: z-score by the CLT
4. Non-Gaussian population but the sample size is large: z-score by the CLT
Theorem. Let $k$ be the number of successes in $n$ independent trials, where $n$ is large and $p = P(\text{success})$ is unknown. An approximate $100(1-\alpha)\%$ confidence interval for $p$ is the set of numbers

$$\left(\frac{k}{n} - z_{\alpha/2}\sqrt{\frac{(k/n)(1-k/n)}{n}},\; \frac{k}{n} + z_{\alpha/2}\sqrt{\frac{(k/n)(1-k/n)}{n}}\right).$$

Proof. ... $\square$
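A direct transcription of the theorem (a Python sketch, not from the slides; the counts are hypothetical and `proportion_ci` is a made-up helper name):

```python
from math import sqrt

def proportion_ci(k, n, z=1.96):
    """Approximate 100(1-alpha)% CI for p; z = z_{alpha/2} (1.96 for alpha = 5%)."""
    phat = k / n
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

lo, hi = proportion_ci(55, 100)   # hypothetical data: 55 successes in 100 trials
print(lo, hi)                     # roughly (0.4525, 0.6475)
```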
E.g. 1. Use the median test to check the randomness of a random number generator.

Suppose $y_1, \dots, y_n$ denote measurements presumed to have come from a continuous pdf $f_Y(y)$. Let $k$ denote the number of $y_i$'s that are less than the median of $f_Y(y)$. If the sample is random, we would expect the difference between $\frac{k}{n}$ and $\frac{1}{2}$ to be small. More specifically, a 95% confidence interval based on $k$ should contain the value 0.5.

Let $f_Y(y) = e^{-y}$. The median is $m = 0.69315$.
#!/usr/bin/Rscript
main <- function() {
  args <- commandArgs(trailingOnly = TRUE)
  n <- 100                       # Number of random samples.
  r <- as.numeric(args[1])       # Rate of the exponential
  # Check if the rate argument is given.
  if (is.na(r)) return("Please provide the rate and try again.")

  # Now start computing ...
  f <- function(y) pexp(y, rate = r) - 0.5
  m <- uniroot(f, lower = 0, upper = 100, tol = 1e-9)$root
  print(paste("For rate", r, "exponential distribution,",
              "the median is equal to", round(m, 3)))
  data <- rexp(n, r)             # Generate n random samples
  data <- round(data, 3)         # Round to 3 digits after the decimal
  data <- matrix(data, nrow = 10, ncol = 10)  # Turn the data into a matrix
  prmatrix(data)                 # Show the data on the terminal
  k <- sum(data > m)             # Count how many entries are bigger than m
  lowerbd <- k/n - 1.96 * sqrt((k/n)*(1-k/n)/n)
  upperbd <- k/n + 1.96 * sqrt((k/n)*(1-k/n)/n)
  print(paste("The 95% confidence interval is (",
              round(lowerbd, 3), ",",
              round(upperbd, 3), ")"))
}
main()

Try the command line ...
Instead of the C.I. $\left(\frac{k}{n} - z_{\alpha/2}\sqrt{\frac{(k/n)(1-k/n)}{n}},\; \frac{k}{n} + z_{\alpha/2}\sqrt{\frac{(k/n)(1-k/n)}{n}}\right)$, one can simply specify the mean $\frac{k}{n}$ and the margin of error

$$d := z_{\alpha/2}\sqrt{\frac{(k/n)(1-k/n)}{n}}.$$

$$\max_{p \in (0,1)} p(1-p) = p(1-p)\Big|_{p=1/2} = 1/4 \;\Longrightarrow\; d \le \frac{z_{\alpha/2}}{2\sqrt{n}} =: d_m.$$
Comment:

1. When $p$ is close to $1/2$, $d \approx \frac{z_{\alpha/2}}{2\sqrt{n}}$, which is equivalent to $\sigma_p \approx \frac{1}{2\sqrt{n}}$.

E.g., $n = 1000$, $k/n = 0.48$, and $\alpha = 5\%$; then

$$d = 1.96\sqrt{\frac{0.48 \times 0.52}{1000}} = 0.03097 \quad\text{and}\quad d_m = \frac{1.96}{2\sqrt{1000}} = 0.03099,$$

$$\sigma_p = \sqrt{\frac{0.48 \times 0.52}{1000}} = 0.01579873 \quad\text{and}\quad \sigma_p \approx \frac{1}{2\sqrt{1000}} = 0.01581139.$$

2. When $p$ is away from $1/2$, the discrepancy between $d$ and $d_m$ becomes large....
E.g. Running for the presidency. Max and Sirius obtained 480 and 520 votes, respectively (n = 1000). What is the probability that Max will win?

What if the sample size is n = 5000 and Max obtained 2400 votes?
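One way to frame both questions with the tools of this section (a Python sketch, not from the slides; `proportion_ci` is just the approximate interval from the theorem above): at $n = 1000$ the 95% interval around Max's share still contains 0.5, while at $n = 5000$ it does not.

```python
from math import sqrt

def proportion_ci(k, n, z=1.96):
    # Approximate 95% CI for p from the theorem above.
    phat = k / n
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

lo1, hi1 = proportion_ci(480, 1000)    # Max's share, first scenario
lo2, hi2 = proportion_ci(2400, 5000)   # same share, five times the sample

assert lo1 < 0.5 < hi1   # n = 1000: 0.5 is inside the interval, too close to call
assert hi2 < 0.5         # n = 5000: the entire interval lies below 0.5
```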
Choosing sample sizes

$$z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le d \iff n \ge \frac{z_{\alpha/2}^2\, p(1-p)}{d^2} \quad\text{(when $p$ is known)}$$

$$\frac{z_{\alpha/2}}{2\sqrt{n}} \le d \iff n \ge \frac{z_{\alpha/2}^2}{4d^2} \quad\text{(when $p$ is unknown)}$$

E.g. Anti-smoking campaign. We need a 95% C.I. with a margin of error equal to 1%. Determine the sample size.

Answer: $n \ge \frac{1.96^2}{4\times 0.01^2} = 9604$.

E.g.' In order to reduce the sample size, a small pilot sample is used to estimate p. One finds that p ≈ 0.22. Determine the sample size again.

Answer: $n \ge \frac{1.96^2\times 0.22\times 0.78}{0.01^2} = 6592.2$, i.e. n = 6593.

44
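A quick check of both answers (a sketch; `min_sample_size` is a hypothetical helper name, not from the course):

```python
z = 1.96  # z_{alpha/2} for a 95% confidence interval

def min_sample_size(d, p=None):
    # Lower bound on n: z^2 p(1-p)/d^2 with a pilot estimate p,
    # worst case z^2/(4 d^2) when p is unknown (since p(1-p) <= 1/4).
    return z**2 * p * (1 - p) / d**2 if p is not None else z**2 / (4 * d**2)

print(min_sample_size(0.01))          # ≈ 9604, so take n = 9604
print(min_sample_size(0.01, p=0.22))  # ≈ 6592.2, so take n = 6593
```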
§ 5.4 Properties of Estimators

Question: Estimators are not in general unique (MLE, MME, ...). How to select one estimator?

Recall: For a random sample of size n from a population with a given pdf, we have $X_1, \cdots, X_n$, which are i.i.d. r.v.'s. The estimator $\hat\theta$ is a function of the $X_i$'s:

$$\hat\theta = \hat\theta(X_1, \cdots, X_n).$$

Criteria:

1. Unbiasedness. (Mean)

2. Efficiency, the minimum-variance estimator. (Variance)

3. Sufficiency.

4. Consistency. (Asymptotic behavior)

47
Unbiasedness

Definition 5.4.1. Given a random sample of size n whose population distribution depends on an unknown parameter θ, let $\hat\theta$ be an estimator of θ.

Then $\hat\theta$ is called unbiased if $E(\hat\theta) = \theta$;

and $\hat\theta$ is called asymptotically unbiased if $\lim_{n\to\infty} E(\hat\theta) = \theta$.

48
E.g. 1. $f_Y(y;\theta) = \frac{2y}{\theta^2}$ if $y \in [0, \theta]$.

- $\hat\theta_1 = \frac{3}{2}\,\bar Y$
- $\hat\theta_2 = Y_{\max}$
- $\hat\theta_3 = \frac{2n+1}{2n}\, Y_{\max}$

E.g. 2. Let $X_1, \cdots, X_n$ be a random sample of size n with the unknown parameter $\theta = E(X)$. Show that for any constants $a_i$'s,

$$\hat\theta = \sum_{i=1}^n a_i X_i \text{ is unbiased} \iff \sum_{i=1}^n a_i = 1.$$

49
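These claims are easy to probe by simulation. A sketch (sampling via the inverse CDF $Y = \theta\sqrt{U}$, since $F_Y(y) = y^2/\theta^2$ on $[0,\theta]$): $\hat\theta_1$ and $\hat\theta_3$ should average to θ, while $\hat\theta_2 = Y_{\max}$ sits below it.

```python
import random
import statistics

random.seed(1)
theta, n, reps = 2.0, 10, 20000

est1, est2, est3 = [], [], []
for _ in range(reps):
    # inverse-CDF sampling: F(y) = y^2/theta^2 on [0, theta], so Y = theta*sqrt(U)
    ys = [theta * random.random() ** 0.5 for _ in range(n)]
    ybar, ymax = statistics.mean(ys), max(ys)
    est1.append(1.5 * ybar)                    # theta1-hat: unbiased
    est2.append(ymax)                          # theta2-hat: biased low
    est3.append((2 * n + 1) / (2 * n) * ymax)  # theta3-hat: unbiased

print(statistics.mean(est1))  # ≈ 2.0
print(statistics.mean(est2))  # ≈ 2n/(2n+1) * theta ≈ 1.905
print(statistics.mean(est3))  # ≈ 2.0
```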
E.g. 3. Let $X_1, \cdots, X_n$ be a random sample of size n with the unknown parameter $\sigma^2 = \mathrm{Var}(X)$.

- $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n \left(X_i - \bar X\right)^2$
- $S^2 = \text{Sample Variance} = \frac{1}{n-1}\sum_{i=1}^n \left(X_i - \bar X\right)^2$
- $S = \text{Sample Standard Deviation} = \sqrt{\frac{1}{n-1}\sum_{i=1}^n \left(X_i - \bar X\right)^2}$. (Biased for σ!)

50
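A simulation sketch of all three claims (a normal population is assumed here purely for illustration; with n = 5 the biases are easy to see):

```python
import random
import statistics

random.seed(2)
sigma, n, reps = 1.0, 5, 40000

v_n, v_n1, s_vals = [], [], []
for _ in range(reps):
    xs = [random.gauss(0, sigma) for _ in range(n)]
    xbar = statistics.mean(xs)
    ss = sum((x - xbar) ** 2 for x in xs)
    v_n.append(ss / n)                    # divides by n: biased for sigma^2
    v_n1.append(ss / (n - 1))             # sample variance S^2: unbiased
    s_vals.append((ss / (n - 1)) ** 0.5)  # S: biased (low) for sigma

print(statistics.mean(v_n))     # ≈ (n-1)/n * sigma^2 = 0.8
print(statistics.mean(v_n1))    # ≈ 1.0
print(statistics.mean(s_vals))  # < 1.0, by Jensen's inequality
```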
E.g. 4. Exponential distr.: $f_Y(y;\lambda) = \lambda e^{-\lambda y}$ for $y \ge 0$. $\hat\lambda = 1/\bar Y$ is biased.

$n\bar Y = \sum_{i=1}^n Y_i \sim \text{Gamma}(n, \lambda)$. Hence,

$$E(\hat\lambda) = E(1/\bar Y) = n \int_0^\infty \frac{1}{y}\,\frac{\lambda^n}{\Gamma(n)}\, y^{n-1} e^{-\lambda y}\, dy
= \frac{n\lambda}{n-1} \int_0^\infty \underbrace{\frac{\lambda^{n-1}}{\Gamma(n-1)}\, y^{(n-1)-1} e^{-\lambda y}}_{\text{pdf of Gamma}(n-1,\,\lambda)}\, dy
= \frac{n}{n-1}\,\lambda.$$

Biased! But $E(\hat\lambda) = \frac{n}{n-1}\lambda \to \lambda$ as $n \to \infty$. (Asymptotically unbiased.)

Note: $\lambda^* = \frac{n-1}{n\bar Y}$ is unbiased.

E.g. 4'. Exponential distr.: $f_Y(y;\theta) = \frac{1}{\theta}\, e^{-y/\theta}$ for $y \ge 0$. $\hat\theta = \bar Y$ is unbiased:

$$E(\hat\theta) = \frac{1}{n}\sum_{i=1}^n E(Y_i) = \frac{1}{n}\sum_{i=1}^n \theta = \theta.$$

51
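The bias factor $\frac{n}{n-1}$ shows up clearly in simulation (a sketch with λ = 2 and n = 5, so $E(\hat\lambda)$ should be about 2.5):

```python
import random
import statistics

random.seed(3)
lam, n, reps = 2.0, 5, 50000

lam_hats = []
for _ in range(reps):
    ys = [random.expovariate(lam) for _ in range(n)]
    lam_hats.append(1 / statistics.mean(ys))  # lambda-hat = 1 / Y-bar

print(statistics.mean(lam_hats))                # ≈ n/(n-1) * lam = 2.5
print((n - 1) / n * statistics.mean(lam_hats))  # bias-corrected: ≈ 2.0
```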
Efficiency

Definition 5.4.2. Let $\hat\theta_1$ and $\hat\theta_2$ be two unbiased estimators for a parameter θ. If $\mathrm{Var}(\hat\theta_1) < \mathrm{Var}(\hat\theta_2)$, then we say that $\hat\theta_1$ is more efficient than $\hat\theta_2$. The relative efficiency of $\hat\theta_1$ w.r.t. $\hat\theta_2$ is the ratio $\mathrm{Var}(\hat\theta_1)/\mathrm{Var}(\hat\theta_2)$.

52
E.g. 1. $f_Y(y;\theta) = \frac{2y}{\theta^2}$ if $y \in [0, \theta]$. Which is more efficient? Find the relative efficiency of $\hat\theta_1$ w.r.t. $\hat\theta_3$.

- $\hat\theta_1 = \frac{3}{2}\,\bar Y$
- $\hat\theta_3 = \frac{2n+1}{2n}\, Y_{\max}$

E.g. 2. Let $X_1, \cdots, X_n$ be a random sample of size n with the unknown parameter $\theta = E(X)$ (suppose $\sigma^2 = \mathrm{Var}(X) < \infty$). Among all unbiased estimators $\hat\theta = \sum_{i=1}^n a_i X_i$ with $\sum_{i=1}^n a_i = 1$, find the most efficient one.

Sol:

$$\mathrm{Var}(\hat\theta) = \sum_{i=1}^n a_i^2\, \mathrm{Var}(X) = \sigma^2 \sum_{i=1}^n a_i^2 \ge \sigma^2\, \frac{1}{n}\left(\sum_{i=1}^n a_i\right)^2 = \frac{\sigma^2}{n},$$

with equality iff $a_1 = \cdots = a_n = 1/n$ (Cauchy–Schwarz).

Hence, the most efficient one is the sample mean $\hat\theta = \bar X$. □

53
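For E.g. 1 one can check by direct computation with the pdf above (a worked claim, not stated on the slide) that $\mathrm{Var}(\hat\theta_1) = \frac{\theta^2}{8n}$ and $\mathrm{Var}(\hat\theta_3) = \frac{\theta^2}{4n(n+1)}$, so $\hat\theta_3$ is more efficient and the relative efficiency of $\hat\theta_1$ w.r.t. $\hat\theta_3$ is $\frac{n+1}{2}$. A Monte Carlo sketch agreeing with those values:

```python
import random
import statistics

random.seed(4)
theta, n, reps = 2.0, 10, 40000

e1, e3 = [], []
for _ in range(reps):
    # inverse-CDF sampling from f(y; theta) = 2y/theta^2: Y = theta*sqrt(U)
    ys = [theta * random.random() ** 0.5 for _ in range(n)]
    e1.append(1.5 * statistics.mean(ys))
    e3.append((2 * n + 1) / (2 * n) * max(ys))

v1 = statistics.variance(e1)
v3 = statistics.variance(e3)
print(v1)       # ≈ theta^2/(8n) = 0.05
print(v3)       # ≈ theta^2/(4n(n+1)) ≈ 0.00909
print(v1 / v3)  # relative efficiency ≈ (n+1)/2 = 5.5
```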
§ 5.5 MVE: The Cramér-Rao Lower Bound

Question: Can one identify the unbiased estimator having the smallest variance?

Short answer: In many cases, yes!

We are going to develop the theory to answer this question in detail.

56
Regular Estimation/Condition: The set of y (resp. k) values where $f_Y(y;\theta) \ne 0$ (resp. $p_X(k;\theta) \ne 0$) does not depend on θ.

I.e., the support of the pdf does not depend on the parameter (so that one can differentiate under the integral sign).

Definition. The Fisher information of a continuous (resp. discrete) random variable Y (resp. X) with pdf $f_Y(y;\theta)$ (resp. pmf $p_X(k;\theta)$) is defined as

$$I(\theta) = E\left[\left(\frac{\partial \ln f_Y(Y;\theta)}{\partial\theta}\right)^2\right] \quad \left(\text{resp. } E\left[\left(\frac{\partial \ln p_X(X;\theta)}{\partial\theta}\right)^2\right]\right).$$

57
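As a concrete sketch: for the exponential density $f_Y(y;\lambda) = \lambda e^{-\lambda y}$, the score is $\frac{\partial}{\partial\lambda}\ln f_Y(Y;\lambda) = \frac{1}{\lambda} - Y$, so $I(\lambda) = E\left[\left(\frac{1}{\lambda} - Y\right)^2\right] = \mathrm{Var}(Y) = \frac{1}{\lambda^2}$. A Monte Carlo check:

```python
import random
import statistics

random.seed(5)
lam, reps = 2.0, 200000

# Score of the exponential density f(y; lam) = lam * exp(-lam * y):
# d/d(lam) ln f = 1/lam - y.  Fisher information = E[score^2].
scores = [(1 / lam - random.expovariate(lam)) ** 2 for _ in range(reps)]
print(statistics.mean(scores))  # ≈ I(lam) = 1/lam^2 = 0.25
```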
Lemma. Under the regularity condition, let $Y_1, \cdots, Y_n$ be a random sample of size n from the continuous population pdf $f_Y(y;\theta)$. Then the Fisher information in the random sample $Y_1, \cdots, Y_n$ equals n times the Fisher information in a single observation Y:

$$E\left[\left(\frac{\partial \ln f_{Y_1,\cdots,Y_n}(Y_1,\cdots,Y_n;\theta)}{\partial\theta}\right)^2\right] = n\, E\left[\left(\frac{\partial \ln f_Y(Y;\theta)}{\partial\theta}\right)^2\right] = n\, I(\theta). \tag{1}$$

(A similar statement holds for the discrete case $p_X(k;\theta)$.)

Proof. Based on two observations:

$$\text{LHS} = E\left[\left(\sum_{i=1}^n \frac{\partial}{\partial\theta} \ln f_{Y_i}(Y_i;\theta)\right)^2\right],$$

$$E\left(\frac{\partial}{\partial\theta} \ln f_{Y_i}(Y_i;\theta)\right) = \int_{\mathbb{R}} \frac{\frac{\partial}{\partial\theta} f_Y(y;\theta)}{f_Y(y;\theta)}\, f_Y(y;\theta)\, dy = \int_{\mathbb{R}} \frac{\partial}{\partial\theta} f_Y(y;\theta)\, dy \overset{\text{R.C.}}{=} \frac{\partial}{\partial\theta} \int_{\mathbb{R}} f_Y(y;\theta)\, dy = \frac{\partial}{\partial\theta} 1 = 0.$$

By independence and the second observation, the cross terms in the expanded square have zero expectation, so the LHS reduces to $\sum_{i=1}^n E\left[\left(\frac{\partial}{\partial\theta}\ln f_{Y_i}(Y_i;\theta)\right)^2\right] = n\, I(\theta)$. □

58
Lemma. Under the regularity condition, if $\ln f_Y(y;\theta)$ is twice differentiable in θ, then

$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2} \ln f_Y(Y;\theta)\right]. \tag{2}$$

(A similar statement holds for the discrete case $p_X(k;\theta)$.)

Proof. This is due to the two facts:

$$\frac{\partial^2}{\partial\theta^2} \ln f_Y(Y;\theta) = \frac{\frac{\partial^2}{\partial\theta^2} f_Y(Y;\theta)}{f_Y(Y;\theta)} - \underbrace{\left(\frac{\frac{\partial}{\partial\theta} f_Y(Y;\theta)}{f_Y(Y;\theta)}\right)^2}_{=\,\left(\frac{\partial}{\partial\theta} \ln f_Y(Y;\theta)\right)^2},$$

$$E\left(\frac{\frac{\partial^2}{\partial\theta^2} f_Y(Y;\theta)}{f_Y(Y;\theta)}\right) = \int_{\mathbb{R}} \frac{\frac{\partial^2}{\partial\theta^2} f_Y(y;\theta)}{f_Y(y;\theta)}\, f_Y(y;\theta)\, dy = \int_{\mathbb{R}} \frac{\partial^2}{\partial\theta^2} f_Y(y;\theta)\, dy \overset{\text{R.C.}}{=} \frac{\partial^2}{\partial\theta^2} \int_{\mathbb{R}} f_Y(y;\theta)\, dy = \frac{\partial^2}{\partial\theta^2} 1 = 0.$$

□

59
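The two expressions for $I(\theta)$ can be compared numerically, here for the exponential with rate λ (finite differences in λ; both should be close to $1/\lambda^2 = 0.25$). This is an illustrative sketch, not part of the proof:

```python
import math
import random
import statistics

random.seed(6)
lam, reps, h = 2.0, 200000, 1e-4

def logf(y, lam):
    # log-density of the exponential: ln f(y; lam) = ln(lam) - lam * y
    return math.log(lam) - lam * y

ys = [random.expovariate(lam) for _ in range(reps)]

# I(lam) via the squared score, with a central finite difference in lam
sq = statistics.mean(
    ((logf(y, lam + h) - logf(y, lam - h)) / (2 * h)) ** 2 for y in ys
)

# I(lam) via minus the expected second derivative of ln f, as in (2)
d2 = -statistics.mean(
    (logf(y, lam + h) - 2 * logf(y, lam) + logf(y, lam - h)) / h**2 for y in ys
)

print(sq, d2)  # both ≈ 1/lam^2 = 0.25
```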
Theorem (Cramér-Rao Inequality). Under regularity conditions, let Y_1, ..., Y_n be a random sample of size n from the continuous population pdf f_Y(y;θ), and let θ̂ = θ̂(Y_1, ..., Y_n) be any unbiased estimator of θ. Then

\[
\mathrm{Var}(\hat\theta) \ge \frac{1}{n\, I(\theta)}.
\]

(A similar statement holds for the discrete case p_X(k;θ).)

Proof. If n = 1, then by the Cauchy-Schwarz inequality,

\[
E\left[(\hat\theta-\theta)\,\frac{\partial}{\partial\theta}\ln f_Y(Y;\theta)\right]
\le \sqrt{\mathrm{Var}(\hat\theta)\, I(\theta)}.
\]

On the other hand,

\[
E\left[(\hat\theta-\theta)\,\frac{\partial}{\partial\theta}\ln f_Y(Y;\theta)\right]
= \int_{\mathbb{R}} (\hat\theta-\theta)\,\frac{\frac{\partial}{\partial\theta} f_Y(y;\theta)}{f_Y(y;\theta)}\, f_Y(y;\theta)\,dy
= \int_{\mathbb{R}} (\hat\theta-\theta)\,\frac{\partial}{\partial\theta} f_Y(y;\theta)\,dy
= \frac{\partial}{\partial\theta}\underbrace{\int_{\mathbb{R}} (\hat\theta-\theta)\, f_Y(y;\theta)\,dy}_{=\,E(\hat\theta-\theta)\,=\,0} + 1 = 1.
\]

For general n, apply (1). □

60
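The inequality can be illustrated by simulation. A sketch (NumPy assumed; N(θ, 1) samples, where I(θ) = 1, so the bound is 1/n): the sample mean attains the bound, while another unbiased estimator, the first observation alone, does not.

```python
import numpy as np

# Cramér-Rao illustration for N(θ, 1): CRLB = 1/(n I(θ)) = 1/n.
rng = np.random.default_rng(2)
theta, n, reps = 0.0, 25, 20_000
samples = rng.normal(theta, 1.0, size=(reps, n))

ybar = samples.mean(axis=1)  # unbiased; Var(Ȳ) = 1/n attains the bound
y1 = samples[:, 0]           # also unbiased, but Var(Y1) = 1 > 1/n

print(ybar.var(), 1 / n)  # both ≈ 0.04
print(y1.var())           # ≈ 1
```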
Definition. Let Θ be the set of all estimators θ̂ that are unbiased for the parameter θ. We say that θ̂* is a best or minimum-variance estimator (MVE) if θ̂* ∈ Θ and

\[
\mathrm{Var}(\hat\theta^{*}) \le \mathrm{Var}(\hat\theta) \quad \text{for all } \hat\theta \in \Theta.
\]

Definition. An unbiased estimator θ̂ is efficient if Var(θ̂) is equal to the Cramér-Rao lower bound, i.e., Var(θ̂) = (n I(θ))⁻¹.

The efficiency of an unbiased estimator θ̂ is defined to be

\[
\left(n\, I(\theta)\, \mathrm{Var}(\hat\theta)\right)^{-1}.
\]

(Figure: nested sets. The efficient estimators sit inside the MVEs, which sit inside the set Θ of all unbiased estimators.)

61
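The efficiency ratio defined above can be estimated by simulation. A sketch (NumPy assumed) for a classical example not in the slides: the sample median as an estimator of a normal mean, whose efficiency is known to approach 2/π ≈ 0.64.

```python
import numpy as np

# Efficiency of the sample median for θ in N(θ, 1): since I(θ) = 1,
# efficiency = (n I(θ) Var(θ̂))⁻¹ = 1 / (n Var(median)).
rng = np.random.default_rng(3)
n, reps = 101, 20_000
samples = rng.normal(0.0, 1.0, size=(reps, n))
med = np.median(samples, axis=1)

efficiency = (n * 1.0 * med.var()) ** -1
print(efficiency)  # ≈ 2/π ≈ 0.64: the median is unbiased but not efficient
```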
E.g. 1. X ∼ Bernoulli(p). Check whether p̂ = X̄ is efficient.

Step 1. Compute the Fisher information:

\[
p_X(k;p) = p^{k}(1-p)^{1-k},
\qquad
\ln p_X(k;p) = k\ln p + (1-k)\ln(1-p),
\]
\[
\frac{\partial}{\partial p}\ln p_X(k;p) = \frac{k}{p} - \frac{1-k}{1-p},
\qquad
-\frac{\partial^2}{\partial p^2}\ln p_X(k;p) = \frac{k}{p^2} + \frac{1-k}{(1-p)^2},
\]
\[
-E\left[\frac{\partial^2}{\partial p^2}\ln p_X(X;p)\right]
= E\left[\frac{X}{p^2} + \frac{1-X}{(1-p)^2}\right]
= \frac{1}{p} + \frac{1}{1-p}
= \frac{1}{pq}.
\]

Hence I(p) = 1/(pq), where q = 1 − p.

Step 2. Compute Var(p̂):

\[
\mathrm{Var}(\hat p) = \frac{1}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\, npq = \frac{pq}{n}.
\]

Conclusion. Because p̂ is unbiased and Var(p̂) = (n I(p))⁻¹, p̂ is efficient.

62
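The two steps above can be checked empirically. A minimal sketch (NumPy assumed; the values of p and n are arbitrary):

```python
import numpy as np

# Bernoulli(p): check Var(p̂) ≈ pq/n = 1/(n I(p)), so X̄ attains the CRLB.
rng = np.random.default_rng(4)
p, n, reps = 0.3, 50, 20_000
x = rng.binomial(1, p, size=(reps, n))
phat = x.mean(axis=1)

crlb = p * (1 - p) / n
print(phat.var(), crlb)  # the two numbers should nearly coincide
```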
E.g. 2. Exponential distribution: f_Y(y;λ) = λe^{−λy} for y ≥ 0. Is λ̂ = 1/Ȳ efficient?

Answer. No, because λ̂ is biased. Nevertheless, we can still compute the Fisher information:

\[
\ln f_Y(y;\lambda) = \ln\lambda - \lambda y,
\qquad
\frac{\partial}{\partial\lambda}\ln f_Y(y;\lambda) = \frac{1}{\lambda} - y,
\qquad
-\frac{\partial^2}{\partial\lambda^2}\ln f_Y(y;\lambda) = \frac{1}{\lambda^2},
\]
\[
-E\left[\frac{\partial^2}{\partial\lambda^2}\ln f_Y(Y;\lambda)\right] = E\left[\frac{1}{\lambda^2}\right] = \frac{1}{\lambda^2}.
\]

Hence I(λ) = λ⁻².

Try λ̂* := ((n−1)/n) · (1/Ȳ). It is unbiased. Is it efficient?

63
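The bias of λ̂ = 1/Ȳ, and the unbiasedness of λ̂*, can be seen by simulation. A sketch (NumPy assumed; it uses the fact that E[1/Ȳ] = nλ/(n−1), which is why the (n−1)/n correction works):

```python
import numpy as np

# Exponential(λ): 1/Ȳ is biased upward, E[1/Ȳ] = nλ/(n-1);
# the corrected estimator λ* = (n-1)/n · 1/Ȳ is unbiased.
rng = np.random.default_rng(5)
lam, n, reps = 2.0, 10, 200_000
y = rng.exponential(scale=1 / lam, size=(reps, n))
ybar = y.mean(axis=1)

lam_hat = 1 / ybar
lam_star = (n - 1) / n * lam_hat

print(lam_hat.mean(), n * lam / (n - 1))  # both ≈ 2.22: biased
print(lam_star.mean(), lam)               # both ≈ 2.00: unbiased
```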
E.g. 2'. Exponential distribution: f_Y(y;θ) = θ⁻¹e^{−y/θ} for y ≥ 0. Is θ̂ = Ȳ efficient?

Step 1. Compute the Fisher information:

\[
\ln f_Y(y;\theta) = -\ln\theta - \frac{y}{\theta},
\qquad
\frac{\partial}{\partial\theta}\ln f_Y(y;\theta) = -\frac{1}{\theta} + \frac{y}{\theta^2},
\qquad
-\frac{\partial^2}{\partial\theta^2}\ln f_Y(y;\theta) = -\frac{1}{\theta^2} + \frac{2y}{\theta^3},
\]
\[
-E\left[\frac{\partial^2}{\partial\theta^2}\ln f_Y(Y;\theta)\right]
= E\left[-\frac{1}{\theta^2} + \frac{2Y}{\theta^3}\right]
= -\frac{1}{\theta^2} + \frac{2\theta}{\theta^3}
= \theta^{-2}.
\]

Hence I(θ) = θ⁻².

Step 2. Compute Var(θ̂):

\[
\mathrm{Var}(\bar Y) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(Y_i) = \frac{1}{n^2}\, n\theta^2 = \frac{\theta^2}{n}.
\]

Conclusion. Because θ̂ is unbiased and Var(θ̂) = (n I(θ))⁻¹, θ̂ is efficient.

64
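Here the bound is attained, which a simulation confirms. A sketch (NumPy assumed; the values of θ and n are arbitrary):

```python
import numpy as np

# Exponential with mean θ: Var(Ȳ) = θ²/n = 1/(n I(θ)), so Ȳ is efficient.
rng = np.random.default_rng(6)
theta, n, reps = 2.0, 30, 20_000
y = rng.exponential(scale=theta, size=(reps, n))
theta_hat = y.mean(axis=1)

crlb = theta**2 / n
print(theta_hat.var(), crlb)  # the two numbers should nearly coincide
```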
E.g. 3. f_Y(y;θ) = 2y/θ² for y ∈ [0, θ]. Is θ̂ = (3/2)Ȳ efficient?

Step 1. Compute the Fisher information:

\[
\ln f_Y(y;\theta) = \ln(2y) - 2\ln\theta,
\qquad
\frac{\partial}{\partial\theta}\ln f_Y(y;\theta) = -\frac{2}{\theta}.
\]

By the definition of Fisher information,

\[
I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\ln f_Y(Y;\theta)\right)^{2}\right]
= E\left[\left(-\frac{2}{\theta}\right)^{2}\right]
= \frac{4}{\theta^2}.
\]

However, if we compute

\[
-\frac{\partial^2}{\partial\theta^2}\ln f_Y(y;\theta) = -\frac{2}{\theta^2},
\qquad
-E\left[\frac{\partial^2}{\partial\theta^2}\ln f_Y(Y;\theta)\right]
= E\left[-\frac{2}{\theta^2}\right]
= -\frac{2}{\theta^2} \ne \frac{4}{\theta^2} = I(\theta). \tag{†}
\]

Step 2. Compute Var(θ̂):

\[
\mathrm{Var}(\hat\theta) = \frac{9}{4n}\,\mathrm{Var}(Y) = \frac{9}{4n}\cdot\frac{\theta^2}{18} = \frac{\theta^2}{8n}.
\]

Discussion. Even though θ̂ is unbiased, we have two discrepancies: (†) and

\[
\mathrm{Var}(\hat\theta) = \frac{\theta^2}{8n} < \frac{\theta^2}{4n} = \frac{1}{n\,I(\theta)}.
\]

This happens because the estimation problem is not regular: the support [0, θ] depends on θ, so the regularity conditions fail.

65
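The non-regular behavior, an unbiased estimator whose variance falls below the formal Cramér-Rao bound, shows up clearly in simulation. A sketch (NumPy assumed; samples are drawn by inverse-CDF, since F(y) = y²/θ² gives Y = θ√U for U uniform):

```python
import numpy as np

# Non-regular model f(y; θ) = 2y/θ² on [0, θ]: θ̂ = (3/2)Ȳ is unbiased with
# Var(θ̂) = θ²/(8n), strictly below the formal CRLB θ²/(4n).
rng = np.random.default_rng(7)
theta, n, reps = 1.0, 20, 50_000
u = rng.uniform(size=(reps, n))
y = theta * np.sqrt(u)  # inverse-CDF sampling: F(y) = y²/θ²

theta_hat = 1.5 * y.mean(axis=1)  # unbiased since E[Y] = 2θ/3
print(theta_hat.mean())           # ≈ θ
print(theta_hat.var())            # ≈ θ²/(8n)
print(theta**2 / (4 * n))         # formal CRLB, larger than the variance
```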
Plan
§ 5.1 Introduction
§ 5.2 Estimating parameters: MLE and MME
§ 5.3 Interval Estimation
§ 5.4 Properties of Estimators
§ 5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound
§ 5.6 Sufficient Estimators
§ 5.7 Consistency
§ 5.8 Bayesian Estimation
66
Chapter 5. Estimation
§ 5.1 Introduction
§ 5.2 Estimating parameters: MLE and MME
§ 5.3 Interval Estimation
§ 5.4 Properties of Estimators
§ 5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound
§ 5.6 Sufficient Estimators
§ 5.7 Consistency
§ 5.8 Bayesian Estimation
67
§ 5.6 Sufficient Estimators

Rationale: Let θ̂ be an estimator of the unknown parameter θ. Does θ̂ contain all the information about θ?

Equivalently, how can one reduce the random sample of size n, denoted (X_1, · · · , X_n), to a function of the sample without losing any information about θ?

E.g., choose the function h(X_1, · · · , X_n) := (1/n) Σᵢ₌₁ⁿ Xᵢ, the sample mean. In many cases, h(X_1, · · · , X_n) contains all relevant information about the true mean E(X). In that case, h(X_1, · · · , X_n), as an estimator, is sufficient.

68
Definition. Let $(X_1,\cdots,X_n)$ be a random sample of size $n$ from a discrete population with an unknown parameter $\theta$, and let $\hat\theta$ (resp. $\theta_e$) be an estimator (resp. estimate). We call $\hat\theta$ and $\theta_e$ sufficient if

$$P\left(X_1=k_1,\cdots,X_n=k_n \,\middle|\, \hat\theta=\theta_e\right) = b(k_1,\cdots,k_n) \qquad \text{(Sufficiency-1)}$$

is a function that does not depend on $\theta$.

In the case of a random sample $(Y_1,\cdots,Y_n)$ from a continuous population, (Sufficiency-1) should be

$$f_{Y_1,\cdots,Y_n\,\mid\,\hat\theta=\theta_e}\left(y_1,\cdots,y_n \,\middle|\, \hat\theta=\theta_e\right) = b(y_1,\cdots,y_n).$$

Note: $\hat\theta = h(X_1,\cdots,X_n)$ and $\theta_e = h(k_1,\cdots,k_n)$, or $\hat\theta = h(Y_1,\cdots,Y_n)$ and $\theta_e = h(y_1,\cdots,y_n)$.
Equivalently,

Definition. ... $\hat\theta$ (or $\theta_e$) is sufficient if the likelihood function can be factorized as:

$$L(\theta)=\prod_{i=1}^n p_X(k_i;\theta) = g(\theta_e,\theta)\,b(k_1,\cdots,k_n) \quad \text{(discrete)}$$
$$L(\theta)=\prod_{i=1}^n f_Y(y_i;\theta) = g(\theta_e,\theta)\,b(y_1,\cdots,y_n) \quad \text{(continuous)} \qquad \text{(Sufficiency-2)}$$

where $g$ is a function of two arguments only and $b$ is a function that does not depend on $\theta$.
E.g. 1. A random sample of size $n$ from Bernoulli($p$). Let $\hat p = \sum_{i=1}^n X_i$. Check sufficiency of $\hat p$ for $p$ by (Sufficiency-1):

Case I: If $k_1,\cdots,k_n\in\{0,1\}$ are such that $\sum_{i=1}^n k_i\neq c$, then

$$P\left(X_1=k_1,\cdots,X_n=k_n \,\middle|\, \hat p=c\right)=0.$$

Case II: If $k_1,\cdots,k_n\in\{0,1\}$ are such that $\sum_{i=1}^n k_i=c$, then
$$\begin{aligned}
P\left(X_1=k_1,\cdots,X_n=k_n \,\middle|\, \hat p=c\right)
&= \frac{P\left(X_1=k_1,\cdots,X_n=k_n,\ \hat p=c\right)}{P(\hat p=c)}\\
&= \frac{P\left(X_1=k_1,\cdots,X_n=k_n,\ X_n+\sum_{i=1}^{n-1}X_i=c\right)}{P\left(\sum_{i=1}^n X_i=c\right)}\\
&= \frac{P\left(X_1=k_1,\cdots,X_{n-1}=k_{n-1},\ X_n=c-\sum_{i=1}^{n-1}k_i\right)}{P\left(\sum_{i=1}^n X_i=c\right)}\\
&= \frac{\left(\prod_{i=1}^{n-1}p^{k_i}(1-p)^{1-k_i}\right)\times p^{\,c-\sum_{i=1}^{n-1}k_i}(1-p)^{\,1-c+\sum_{i=1}^{n-1}k_i}}{\binom{n}{c}p^c(1-p)^{n-c}}\\
&= \frac{1}{\binom{n}{c}}.
\end{aligned}$$
In summary,

$$P\left(X_1=k_1,\cdots,X_n=k_n \,\middle|\, \hat p=c\right)=
\begin{cases}
\dfrac{1}{\binom{n}{c}} & \text{if } k_i\in\{0,1\} \text{ s.t. } \sum_{i=1}^n k_i=c,\\[1ex]
0 & \text{otherwise.}
\end{cases}$$

Hence, by (Sufficiency-1), $\hat p=\sum_{i=1}^n X_i$ is a sufficient estimator for $p$.
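This conditional uniformity can be verified by exact enumeration for a small sample. The following sketch (not from the original slides) loops over all $\{0,1\}^n$ sequences and checks that, given $\hat p = c$, every compatible sequence has probability $1/\binom{n}{c}$ regardless of $p$:

```python
from itertools import product
from math import comb, isclose

def conditional_probs(n, c, p):
    """Exact conditional law of (X_1, ..., X_n) given sum(X_i) = c, X_i ~ Bernoulli(p)."""
    seq_prob = {}
    for ks in product([0, 1], repeat=n):
        pr = 1.0
        for k in ks:
            pr *= p if k == 1 else 1 - p
        seq_prob[ks] = pr
    # P(sum = c) is the total mass over compatible sequences.
    total = sum(pr for ks, pr in seq_prob.items() if sum(ks) == c)
    return {ks: pr / total for ks, pr in seq_prob.items() if sum(ks) == c}

n, c = 5, 2
for p in (0.2, 0.5, 0.9):
    probs = conditional_probs(n, c, p)
    # Each compatible sequence gets probability 1/C(n, c), independent of p.
    assert all(isclose(v, 1 / comb(n, c)) for v in probs.values())
print("conditional law is uniform 1/C(n,c), free of p")
```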
E.g. 1'. As in E.g. 1, check sufficiency of $\hat p$ for $p$ by (Sufficiency-2):

Notice that $p_e=\sum_{i=1}^n k_i$. Then

$$L(p)=\prod_{i=1}^n p_X(k_i;p)=\prod_{i=1}^n p^{k_i}(1-p)^{1-k_i}
= p^{\sum_{i=1}^n k_i}(1-p)^{\,n-\sum_{i=1}^n k_i}
= p^{p_e}(1-p)^{\,n-p_e}.$$

Therefore, $p_e$ (or $\hat p$) is sufficient since (Sufficiency-2) is satisfied with

$$g(p_e,p)=p^{p_e}(1-p)^{\,n-p_e} \quad\text{and}\quad b(k_1,\cdots,k_n)=1.$$

Comments:
1. The estimator $\hat p$ is sufficient but not unbiased, since $E(\hat p)=np\neq p$.
2. Any one-to-one function of a sufficient estimator is again a sufficient estimator. E.g., $\hat p_2:=\frac1n\hat p$, which is unbiased, sufficient, and an MVE.
3. $\hat p_3:=X_1$ is not sufficient!
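A small numerical illustration of what this factorization means in practice (a sketch, not from the original slides): two Bernoulli samples with the same total $\sum k_i$ produce exactly the same likelihood function of $p$, so nothing beyond the total carries information about $p$:

```python
def bernoulli_likelihood(ks, p):
    # L(p) depends on the data only through s = sum(ks) and n = len(ks).
    s = sum(ks)
    return p ** s * (1 - p) ** (len(ks) - s)

ks1 = [1, 0, 0, 1, 1, 0]   # sum = 3
ks2 = [0, 1, 1, 0, 0, 1]   # different arrangement, same sum = 3
for p in [i / 10 for i in range(1, 10)]:
    assert bernoulli_likelihood(ks1, p) == bernoulli_likelihood(ks2, p)
print("the two samples give identical likelihood functions of p")
```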
E.g. 2. Poisson($\lambda$): $p_X(k;\lambda)=e^{-\lambda}\lambda^k/k!$, $k=0,1,\cdots$. Show that $\hat\lambda=\left(\sum_{i=1}^n X_i\right)^2$ is sufficient for $\lambda$ for a sample of size $n$.

Sol: The corresponding estimate is $\lambda_e=\left(\sum_{i=1}^n k_i\right)^2$.

$$L(\lambda)=\prod_{i=1}^n \frac{e^{-\lambda}\lambda^{k_i}}{k_i!}
= e^{-n\lambda}\lambda^{\sum_{i=1}^n k_i}\left(\prod_{i=1}^n k_i!\right)^{-1}
= \underbrace{e^{-n\lambda}\lambda^{\sqrt{\lambda_e}}}_{g(\lambda_e,\lambda)}\times\underbrace{\left(\prod_{i=1}^n k_i!\right)^{-1}}_{b(k_1,\cdots,k_n)}.$$

Hence, $\hat\lambda$ is a sufficient estimator for $\lambda$. □
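The factorization above can be checked numerically. This sketch (not from the original slides) confirms for one hypothetical data set that $L(\lambda)=g(\lambda_e,\lambda)\,b(k_1,\cdots,k_n)$ with $\lambda_e=(\sum k_i)^2$:

```python
from math import exp, factorial, isclose, prod, sqrt

ks = [2, 0, 3, 1, 4]          # hypothetical Poisson observations
n = len(ks)
lam_e = sum(ks) ** 2          # estimate lambda_e = (sum k_i)^2

for lam in (0.5, 1.7, 3.0):
    # Likelihood computed term by term ...
    L = prod(exp(-lam) * lam**k / factorial(k) for k in ks)
    # ... and via the factorization g(lam_e, lam) * b(k_1, ..., k_n).
    g = exp(-n * lam) * lam ** sqrt(lam_e)   # sqrt(lam_e) = sum(k_i)
    b = 1 / prod(factorial(k) for k in ks)
    assert isclose(L, g * b)
print("factorization L = g * b verified")
```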
E.g. 3. Let $Y_1,\cdots,Y_n$ be a random sample from $f_Y(y;\theta)=\frac{2y}{\theta^2}$ for $y\in[0,\theta]$. Is the MLE $\hat\theta=Y_{\max}$ sufficient for $\theta$?

Sol: The corresponding estimate is $\theta_e=y_{\max}$.

$$\begin{aligned}
L(\theta)&=\prod_{i=1}^n \frac{2y_i}{\theta^2}\,I_{[0,\theta]}(y_i)
= 2^n\theta^{-2n}\left(\prod_{i=1}^n y_i\right)\times\prod_{i=1}^n I_{[0,\theta]}(y_i)\\
&= 2^n\theta^{-2n}\left(\prod_{i=1}^n y_i\right)\times I_{[0,\theta]}(y_{\max})\\
&= \underbrace{2^n\theta^{-2n}I_{[0,\theta]}(\theta_e)}_{=g(\theta_e,\theta)}\times\underbrace{\prod_{i=1}^n y_i}_{=b(y_1,\cdots,y_n)}.
\end{aligned}$$

Hence, $\hat\theta$ is a sufficient estimator for $\theta$. □

Note: The MME $\hat\theta=\frac32\bar Y$ is NOT sufficient for $\theta$!
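The contrast between the MLE and the MME can be seen concretely. In this sketch (hypothetical data, not from the original slides), two samples sharing the same $y_{\max}$ have likelihoods whose ratio is constant in $\theta$, while two samples sharing (approximately) the same mean do not, since the indicator $I_{[0,\theta]}(y_{\max})$ cuts them off at different points:

```python
from math import isclose

def likelihood(ys, theta):
    # L(theta) = prod 2*y_i/theta^2 on {theta >= y_max}, else 0.
    if max(ys) > theta:
        return 0.0
    out = 1.0
    for y in ys:
        out *= 2 * y / theta**2
    return out

a = [0.5, 1.0, 2.0]            # y_max = 2.0, mean ~ 1.167
b = [1.9, 0.3, 2.0]            # same y_max = 2.0
c = [1.17, 1.17, 1.16]         # ~ same mean as a, but y_max = 1.17

# Same y_max: the ratio L(a)/L(b) is free of theta (theta^{-2n} cancels).
for t in (2.0, 3.0, 5.0):
    r = likelihood(a, t) / likelihood(b, t)
    assert isclose(r, (0.5 * 1.0 * 2.0) / (1.9 * 0.3 * 2.0))

# Same mean, different y_max: at theta = 1.5 one likelihood vanishes
# while the other is positive, so the mean does not pin down L(theta).
assert likelihood(a, 1.5) == 0.0 and likelihood(c, 1.5) > 0.0
print("same y_max => proportional likelihoods; same mean => not")
```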
Plan
§ 5.1 Introduction
§ 5.2 Estimating parameters: MLE and MME
§ 5.3 Interval Estimation
§ 5.4 Properties of Estimators
§ 5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound
§ 5.6 Sufficient Estimators
§ 5.7 Consistency
§ 5.8 Bayesian Estimation
§ 5.7 Consistency
Definition. An estimator θ̂n = h(W1, · · · , Wn) is said to be consistent if it converges to θ in probability, i.e., for any ε > 0,

lim_{n→∞} P(|θ̂n − θ| < ε) = 1.

Comment: In the ε-δ language, the above convergence in probability says

∀ε > 0, ∀δ > 0, ∃ n(ε, δ) > 0, s.t. ∀ n ≥ n(ε, δ), P(|θ̂n − θ| < ε) > 1 − δ.
A useful tool to check convergence in probability is

Theorem (Chebyshev's inequality). Let W be any r.v. with finite mean µ and variance σ². Then for any ε > 0,

P(|W − µ| < ε) ≥ 1 − σ²/ε²,

or, equivalently,

P(|W − µ| ≥ ε) ≤ σ²/ε².
Proof. ... �
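The bound can be checked empirically (an illustrative sketch, not part of the slides). Here W ~ Exp(1) is an arbitrary choice; it has µ = 1 and σ² = 1, so the tail probability P(|W − µ| ≥ ε) should sit below σ²/ε² for every ε.

```r
# Sketch: empirical check of Chebyshev's inequality for W ~ Exp(rate = 1),
# which has mean mu = 1 and variance sigma^2 = 1.
set.seed(42)
w <- rexp(1e5, rate = 1)
mu <- 1; sigma2 <- 1
for (eps in c(1.5, 2, 3)) {
  tail_prob <- mean(abs(w - mu) >= eps)   # empirical P(|W - mu| >= eps)
  stopifnot(tail_prob <= sigma2 / eps^2)  # Chebyshev upper bound
}
```

Chebyshev is loose here (the exponential tail decays much faster than σ²/ε²), which is exactly why it is so widely applicable.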
As a consequence of Chebyshev's inequality, we have

Proposition. The sample mean µ̂n = (1/n) ∑_{i=1}^n Wi is consistent for E(W) = µ, provided that the population W has finite mean µ and variance σ².

Proof.

E(µ̂n) = µ and Var(µ̂n) = σ²/n.

∀ε > 0, P(|µ̂n − µ| < ε) ≥ 1 − σ²/(nε²) → 1 as n → ∞.

�
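A small simulation illustrates the proposition (a sketch with arbitrary choices: Exp(1) population, so µ = 1, and ε = 0.1): the probability that the sample mean lands within ε of µ climbs toward 1 as n grows.

```r
# Sketch: P(|mean_n - mu| < eps) -> 1 as n grows, for an Exp(1)
# population (mu = 1) and tolerance eps = 0.1.
set.seed(1)
eps <- 0.1
sizes <- c(10, 100, 1000)
p_hat <- sapply(sizes, function(n) {
  mean(replicate(2000, abs(mean(rexp(n, rate = 1)) - 1) < eps))
})
stopifnot(p_hat[1] < 0.5)   # small n: mean often misses the eps-band
stopifnot(p_hat[3] > 0.9)   # large n: mean almost surely inside it
```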
E.g. 1. Let Y1, · · · , Yn be a random sample of size n from the uniform pdf fY(y; θ) = 1/θ, y ∈ [0, θ]. Let θ̂n = Ymax. We know that Ymax is biased. Is it consistent?

Sol. The c.d.f. of Y is equal to FY(y) = y/θ for y ∈ [0, θ]. Hence,

f_Ymax(y) = n FY(y)^(n−1) fY(y) = n y^(n−1)/θ^n, y ∈ [0, θ].

Therefore, for any 0 < ε < θ,

P(|θ̂n − θ| < ε) = P(θ − ε < θ̂n < θ + ε)
                = ∫_{θ−ε}^{θ} n y^(n−1)/θ^n dy + ∫_{θ}^{θ+ε} 0 dy
                = 1 − ((θ − ε)/θ)^n
                → 1 as n → ∞.

Hence θ̂n = Ymax is consistent. �
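The closed form 1 − ((θ − ε)/θ)^n can be compared against simulation (a sketch with arbitrary choices θ = 1, ε = 0.05): the empirical probability matches the formula and approaches 1.

```r
# Sketch: with theta = 1, P(|Ymax - theta| < eps) = 1 - (1 - eps)^n -> 1.
set.seed(2)
theta <- 1; eps <- 0.05
sizes <- c(10, 100, 500)
p_hat <- sapply(sizes, function(n) {
  mean(replicate(4000, abs(max(runif(n, 0, theta)) - theta) < eps))
})
exact <- 1 - (1 - eps / theta)^sizes
stopifnot(max(abs(p_hat - exact)) < 0.05)  # simulation matches the formula
stopifnot(p_hat[3] > 0.99)                 # essentially 1 for n = 500
```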
E.g. 2. Suppose Y1, Y2, · · · , Yn is a random sample from the exponential pdf fY(y; λ) = λe^(−λy), y > 0. Show that λ̂n = Y1 is not consistent for λ.

Sol. To prove that λ̂n is not consistent for λ, we only need to find one ε > 0 for which the following limit fails:

lim_{n→∞} P(|λ̂n − λ| < ε) = 1.   (3)

We can choose ε = λ/m for any m ≥ 1. Then

|λ̂n − λ| ≤ λ/m ⇐⇒ (1 − 1/m)λ ≤ λ̂n ≤ (1 + 1/m)λ =⇒ λ̂n ≥ (1 − 1/m)λ.
Hence,

P(|λ̂n − λ| < λ/m) ≤ P(λ̂n ≥ (1 − 1/m)λ) = P(Y1 ≥ (1 − 1/m)λ)
                  = ∫_{(1−1/m)λ}^{∞} λe^(−λy) dy
                  = e^(−(1−1/m)λ²)
                  < 1.

Since λ̂n = Y1 does not change with n, this bound holds for every n, so the limit in (3) cannot hold. �
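The failure is easy to see numerically (a sketch with arbitrary choices λ = 1, ε = 0.5): since λ̂n = Y1 ignores the sample size, P(|λ̂n − λ| < ε) is one fixed number strictly below 1 no matter how large n is.

```r
# Sketch: lambda_hat = Y1 ignores n, so P(|Y1 - lambda| < eps) is the
# same constant < 1 for every sample size; here lambda = 1, eps = 0.5.
lambda <- 1; eps <- 0.5
p <- pexp(lambda + eps, rate = lambda) - pexp(lambda - eps, rate = lambda)
# agrees with the closed-form exponential tail difference
stopifnot(abs(p - (exp(-lambda * (lambda - eps)) -
                   exp(-lambda * (lambda + eps)))) < 1e-12)
stopifnot(p < 1)  # bounded away from 1, independently of n
```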
Plan
§ 5.1 Introduction
§ 5.2 Estimating parameters: MLE and MME
§ 5.3 Interval Estimation
§ 5.4 Properties of Estimators
§ 5.5 Minimum-Variance Estimators: The Cramér-Rao Lower Bound
§ 5.6 Sufficient Estimators
§ 5.7 Consistency
§ 5.8 Bayesian Estimation
§ 5.8 Bayesian Estimation

Rationale: Let W be an estimator that depends on a parameter θ.

1. Frequentists view θ as a fixed parameter whose exact value is to be estimated.

2. Bayesians view θ as the value of a random variable Θ.

One can incorporate our knowledge about Θ — the prior distribution, pΘ(θ) if Θ is discrete and fΘ(θ) if Θ is continuous — and use Bayes' formula to update that knowledge upon a new observation W = w:

gΘ(θ|W = w) = pW(w|Θ = θ) fΘ(θ) / P(W = w)   if W is discrete,
gΘ(θ|W = w) = fW(w|Θ = θ) fΘ(θ) / fW(w)      if W is continuous,   (4)

where gΘ(θ|W = w) is called the posterior distribution of Θ.
Four cases for computing the posterior distribution gΘ(θ|W = w):

Θ discrete,   W discrete:     pW(w|Θ = θ) pΘ(θ) / ∑_i pW(w|Θ = θi) pΘ(θi)
Θ discrete,   W continuous:   fW(w|Θ = θ) pΘ(θ) / ∑_i fW(w|Θ = θi) pΘ(θi)
Θ continuous, W discrete:     pW(w|Θ = θ) fΘ(θ) / ∫_R pW(w|Θ = θ′) fΘ(θ′) dθ′
Θ continuous, W continuous:   fW(w|Θ = θ) fΘ(θ) / ∫_R fW(w|Θ = θ′) fΘ(θ′) dθ′
Gamma distributions

Γ(r) := ∫_0^∞ y^(r−1) e^(−y) dy, r > 0.

Two parametrizations for Gamma distributions:

1. With a shape parameter r and a scale parameter θ:

   fY(y; r, θ) = y^(r−1) e^(−y/θ) / (θ^r Γ(r)), y > 0, r, θ > 0.

2. With a shape parameter r and a rate parameter λ = 1/θ:

   fY(y; r, λ) = λ^r y^(r−1) e^(−λy) / Γ(r), y > 0, r, λ > 0.

E[Y] = r/λ = rθ and Var(Y) = r/λ² = rθ².
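The moment formulas can be verified numerically (an illustrative sketch using the same shape r = 3 and scale θ = 0.5 as the plot below): integrating y·fY and y²·fY against R's `dgamma` recovers rθ and rθ².

```r
# Sketch: numerical check that E[Y] = r*theta and Var(Y) = r*theta^2
# for the shape/scale parametrization, using R's dgamma and integrate.
r <- 3; theta <- 0.5
m1 <- integrate(function(y) y   * dgamma(y, shape = r, scale = theta), 0, Inf)$value
m2 <- integrate(function(y) y^2 * dgamma(y, shape = r, scale = theta), 0, Inf)$value
stopifnot(abs(m1 - r * theta) < 1e-3)            # E[Y]   = r*theta  = 1.5
stopifnot(abs(m2 - m1^2 - r * theta^2) < 1e-3)   # Var(Y) = r*theta^2 = 0.75
```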
# Plot gamma distributions
x = seq(0, 20, 0.01)
k = 3        # Shape parameter
theta = 0.5  # Scale parameter
plot(x, dgamma(x, k, scale = theta),
     type = "l",
     col = "red")
Beta distributions

B(α, β) := ∫_0^1 y^(α−1) (1 − y)^(β−1) dy, α, β > 0
         = Γ(α)Γ(β) / Γ(α + β). (see Appendix)

Beta distribution:

fY(y; α, β) = y^(α−1) (1 − y)^(β−1) / B(α, β), y ∈ [0, 1], α, β > 0.

E[Y] = α/(α + β) and Var(Y) = αβ / ((α + β)²(α + β + 1)).
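As with the gamma case, the beta moment formulas can be checked numerically (a sketch using the same α = 13, β = 2 as the plot below).

```r
# Sketch: numerical check of E[Y] = a/(a+b) and
# Var(Y) = a*b / ((a+b)^2 * (a+b+1)), using R's dbeta and integrate.
a <- 13; b <- 2
m1 <- integrate(function(y) y   * dbeta(y, a, b), 0, 1)$value
m2 <- integrate(function(y) y^2 * dbeta(y, a, b), 0, 1)$value
stopifnot(abs(m1 - a / (a + b)) < 1e-3)                        # mean 13/15
stopifnot(abs(m2 - m1^2 - a * b / ((a + b)^2 * (a + b + 1))) < 1e-3)
```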
# Plot Beta distributions
x = seq(0, 1, 0.01)
a = 13
b = 2
plot(x, dbeta(x, a, b),
     type = "l",
     col = "red")
E.g. 1. Let X1, · · · , Xn be a random sample from Bernoulli(θ): pXi(k; θ) = θ^k (1 − θ)^(1−k) for k = 0, 1.

Let X = ∑_{i=1}^n Xi. Then X follows Binomial(n, θ).

Prior distribution: Θ ∼ Beta(r, s), i.e., fΘ(θ) = [Γ(r+s)/(Γ(r)Γ(s))] θ^(r−1) (1 − θ)^(s−1) for θ ∈ [0, 1].

In summary:

X1, · · · , Xn | θ ∼ Bernoulli(θ), Θ ∼ Beta(r, s), with r and s known;

equivalently, X = ∑_{i=1}^n Xi | θ ∼ Binomial(n, θ), Θ ∼ Beta(r, s), with r and s known.
X is discrete and Θ is continuous, so

gΘ(θ|X = k) = pX(k|Θ = θ) fΘ(θ) / ∫_R pX(k|Θ = θ′) fΘ(θ′) dθ′.

Numerator (writing C(n, k) for the binomial coefficient):

pX(k|Θ = θ) fΘ(θ) = C(n, k) θ^k (1 − θ)^(n−k) × [Γ(r+s)/(Γ(r)Γ(s))] θ^(r−1) (1 − θ)^(s−1)
                  = C(n, k) [Γ(r+s)/(Γ(r)Γ(s))] θ^(k+r−1) (1 − θ)^(n−k+s−1), θ ∈ [0, 1].

Denominator:

pX(k) = ∫_R pX(k|Θ = θ′) fΘ(θ′) dθ′
      = C(n, k) [Γ(r+s)/(Γ(r)Γ(s))] ∫_0^1 θ′^(k+r−1) (1 − θ′)^(n−k+s−1) dθ′
      = C(n, k) [Γ(r+s)/(Γ(r)Γ(s))] × Γ(k+r) Γ(n−k+s) / Γ((k+r) + (n−k+s)).
$$g_\Theta(\theta \mid X = k) = \frac{\dbinom{n}{k} \dfrac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} \times \theta^{k+r-1} (1-\theta)^{n-k+s-1}}{\dbinom{n}{k} \dfrac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} \times \dfrac{\Gamma(k+r)\,\Gamma(n-k+s)}{\Gamma\big((k+r)+(n-k+s)\big)}} = \frac{\Gamma(n+r+s)}{\Gamma(k+r)\,\Gamma(n-k+s)}\, \theta^{k+r-1} (1-\theta)^{n-k+s-1}, \qquad \theta \in [0,1]$$

Conclusion: the posterior ∼ Beta(k + r, n − k + s).

Recall that the prior ∼ Beta(r, s).

97
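As a quick cross-check of this normalization: the product (likelihood × prior), divided by a numerically computed marginal, should reproduce the Beta(k + r, n − k + s) density. A sketch with the same assumed example values:

```python
# Check: (likelihood x prior) / marginal equals the Beta(k+r, n-k+s) pdf.
from math import comb
from scipy.integrate import quad
from scipy.stats import beta

n, k, r, s = 10, 2, 7, 193  # example values, as in the slides
prior = beta(r, s)

def joint(t):
    return comb(n, k) * t**k * (1 - t)**(n - k) * prior.pdf(t)

marginal, _ = quad(joint, 0, 1, points=[0.02, 0.05, 0.1])
posterior = beta(k + r, n - k + s)
for t in (0.01, 0.03, 0.05, 0.08):
    assert abs(joint(t) / marginal - posterior.pdf(t)) < 1e-6
```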
It remains to determine the values of r and s that incorporate the prior knowledge:

PK 1. The mean is about 0.035:
$$E(\Theta) = \frac{r}{r+s} = 0.035 \iff \frac{r}{s} = \frac{7}{193}$$

PK 2. The pdf is mostly concentrated over [0, 0.07]. ... trial ...

98
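PK 1 pins down only the ratio r/s; a one-line check of the arithmetic:

```python
# E(Theta) = r/(r+s) = 0.035 = 7/200, so r : s = 7 : 193.
r, s = 7, 193
assert abs(r / (r + s) - 0.035) < 1e-12
```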
x <- seq(0, 1, length = 1025)
plot(x, dbeta(x, 4, 102), type = "l")
plot(x, dbeta(x, 7, 193), type = "l")
dev.off()

pdf <- cbind(dbeta(x, 4, 102), dbeta(x, 7, 193))
matplot(x, pdf,
        type = "l",
        lty = 1:2,
        xlab = "theta", ylab = "PDF",
        lwd = 2)                # line width
legend(0.2, 25,                 # position of legend
       c("Beta(4,102)", "Beta(7,193)"),
       col = 1:2, lty = 1:2,
       ncol = 1,                # number of columns
       cex = 1.5,               # font size
       lwd = 2)                 # line width
abline(v = 0.07, col = "blue", lty = 1, lwd = 1.5)
text(0.07, -0.5, "0.07")
abline(v = 0.035, col = "gray60", lty = 3, lwd = 2)
text(0.035, 1, "0.035")

[Figure: densities of Beta(4,102) and Beta(7,193) over theta ∈ [0,1], with vertical lines marking 0.035 and 0.07.]

99
If we choose r = 7 and s = 193:

$$g_\Theta(\theta \mid X = k) = \frac{\Gamma(n+200)}{\Gamma(k+7)\,\Gamma(n-k+193)}\, \theta^{k+6} (1-\theta)^{n-k+192}, \qquad \theta \in [0,1]$$

Moreover, if n = 10 and k = 2,

$$g_\Theta(\theta \mid X = k) = \frac{\Gamma(210)}{\Gamma(9)\,\Gamma(201)}\, \theta^{8} (1-\theta)^{200}, \qquad \theta \in [0,1]$$

100
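A short check that the update rule gives Beta(9, 201) here, and that the posterior mean moves from the prior mean 0.035 toward the MLE k/n = 0.2:

```python
# Posterior for prior Beta(7, 193) after observing k = 2 successes in n = 10 trials.
from scipy.stats import beta

r, s, n, k = 7, 193, 10, 2
posterior = beta(k + r, n - k + s)   # Beta(9, 201)
assert abs(posterior.mean() - 9 / 210) < 1e-12
assert 0.035 < posterior.mean() < k / n   # pulled from prior mean toward the MLE
```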
x <- seq(0, 0.1, length = 1025)
pdf <- cbind(dbeta(x, 7, 193), dbeta(x, 9, 201))
matplot(x, pdf,
        type = "l",
        lty = 1:2,
        xlab = "theta", ylab = "PDF",
        lwd = 2)                # line width
legend(0.05, 25,                # position of legend
       c("Prior Beta(7,193)", "Posterior Beta(9,201)"),  # labels ordered to match the columns
       col = 1:2, lty = 1:2,
       ncol = 1,                # number of columns
       cex = 1.5,               # font size
       lwd = 2)                 # line width
abline(v = 0.07, col = "blue", lty = 1, lwd = 1.5)
text(0.07, -0.5, "0.07")
abline(v = 0.035, col = "black", lty = 3, lwd = 2)
text(0.035, 1, "0.035")

[Figure: prior Beta(7,193) and posterior Beta(9,201) densities over theta ∈ [0, 0.1], with vertical lines marking 0.035 and 0.07.]

101
Definition. If the posterior distributions p(Θ|X) are in the same probability distribution family as the prior probability distribution p(Θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.

1. Beta distributions are conjugate priors for Bernoulli, binomial, negative binomial, and geometric likelihoods.

2. Gamma distributions are conjugate priors for Poisson and exponential likelihoods.

102
E.g. 2. Let X₁, · · · , Xₙ be a random sample from Poisson(θ): $p_X(k; \theta) = \frac{e^{-\theta}\theta^k}{k!}$ for k = 0, 1, · · · .

Let $W = \sum_{i=1}^{n} X_i$. Then W follows Poisson(nθ).

Prior distribution: Θ ∼ Gamma(s, µ), i.e., $f_\Theta(\theta) = \frac{\mu^s}{\Gamma(s)}\theta^{s-1}e^{-\mu\theta}$ for θ > 0.

$$X_1, \cdots, X_n \,\big|\, \theta \sim \text{Poisson}(\theta), \quad \Theta \sim \text{Gamma}(s, \mu), \quad s \ \&\ \mu \text{ are known}$$

equivalently,

$$W = \sum_{i=1}^{n} X_i \,\Big|\, \theta \sim \text{Poisson}(n\theta), \quad \Theta \sim \text{Gamma}(s, \mu), \quad s \ \&\ \mu \text{ are known}$$

103
$$g_\Theta(\theta \mid W = w) = \frac{p_W(w \mid \Theta = \theta)\, f_\Theta(\theta)}{\int_{\mathbb{R}} p_W(w \mid \Theta = \theta')\, f_\Theta(\theta')\, d\theta'}$$

$$p_W(w \mid \Theta = \theta)\, f_\Theta(\theta) = \frac{e^{-n\theta}(n\theta)^w}{w!} \times \frac{\mu^s}{\Gamma(s)} \theta^{s-1} e^{-\mu\theta} = \frac{n^w}{w!} \frac{\mu^s}{\Gamma(s)} \times \theta^{w+s-1} e^{-(\mu+n)\theta}, \qquad \theta > 0.$$

$$p_W(w) = \int_{\mathbb{R}} p_W(w \mid \Theta = \theta')\, f_\Theta(\theta')\, d\theta' = \frac{n^w}{w!} \frac{\mu^s}{\Gamma(s)} \int_0^\infty \theta'^{\,w+s-1} e^{-(\mu+n)\theta'}\, d\theta' = \frac{n^w}{w!} \frac{\mu^s}{\Gamma(s)} \times \frac{\Gamma(w+s)}{(\mu+n)^{w+s}}$$

104
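The same kind of numeric verification works for this marginal; a sketch with assumed values n = 5, w = 8, s = 3, µ = 2:

```python
# Check the closed form of p_W(w) for a Poisson(n*theta) likelihood
# with a Gamma(s, mu) prior.
from math import exp, lgamma, log
from scipy.integrate import quad

n, w, s, mu = 5, 8, 3.0, 2.0  # assumed example values

def joint(t):
    # p_W(w | Theta = t) * f_Theta(t)
    return (exp(-n * t) * (n * t)**w / exp(lgamma(w + 1))
            * mu**s / exp(lgamma(s)) * t**(s - 1) * exp(-mu * t))

numeric, _ = quad(joint, 0, float('inf'))
closed = exp(w * log(n) - lgamma(w + 1) + s * log(mu) - lgamma(s)
             + lgamma(w + s) - (w + s) * log(mu + n))
assert abs(numeric - closed) / closed < 1e-6
```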
$$g_\Theta(\theta \mid W = w) = \frac{\dfrac{n^w}{w!} \dfrac{\mu^s}{\Gamma(s)} \times \theta^{w+s-1} e^{-(\mu+n)\theta}}{\dfrac{n^w}{w!} \dfrac{\mu^s}{\Gamma(s)} \times \dfrac{\Gamma(w+s)}{(\mu+n)^{w+s}}} = \frac{(\mu+n)^{w+s}}{\Gamma(w+s)}\, \theta^{w+s-1} e^{-(\mu+n)\theta}, \qquad \theta > 0.$$

Conclusion: the posterior of Θ ∼ Gamma(w + s, n + µ).

Recall that the prior of Θ ∼ Gamma(s, µ).

105
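Checking the conclusion numerically: the normalized product should match the Gamma(w + s, rate = µ + n) density. (SciPy parameterizes the gamma by shape and scale = 1/rate; the values are the same assumed example as above.)

```python
# (likelihood x prior) / marginal should equal the Gamma(w+s, rate mu+n) pdf.
from math import exp, lgamma
from scipy.integrate import quad
from scipy.stats import gamma

n, w, s, mu = 5, 8, 3.0, 2.0  # assumed example values

def joint(t):
    return (exp(-n * t) * (n * t)**w / exp(lgamma(w + 1))
            * mu**s / exp(lgamma(s)) * t**(s - 1) * exp(-mu * t))

marginal, _ = quad(joint, 0, float('inf'))
posterior = gamma(a=w + s, scale=1 / (mu + n))  # shape w+s, rate mu+n
for t in (0.5, 1.0, 1.5, 2.5):
    assert abs(joint(t) / marginal - posterior.pdf(t)) < 1e-6
```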
Case Study 5.8.1

x <- seq(0, 4, length = 1025)
pdf <- cbind(dgamma(x, shape = 88, rate = 50),
             dgamma(x, shape = 88 + 92, rate = 100),
             dgamma(x, shape = 88 + 92 + 72, rate = 150))
matplot(x, pdf,
        type = "l",
        lty = 1:3,
        xlab = "theta", ylab = "PDF",
        lwd = 2)                 # line width
legend(2, 3.5,                   # position of legend
       c("Prior Gamma(88,50)",
         "Posterior1 Gamma(180,100)",   # "Beta" corrected: these are gamma densities
         "Posterior2 Gamma(252,150)"),
       col = 1:3, lty = 1:3,
       ncol = 1,                 # number of columns
       cex = 1.5,                # font size
       lwd = 2)                  # line width

[Figure: prior Gamma(88,50) and successive posteriors Gamma(180,100) and Gamma(252,150) over theta ∈ [0, 4].]

106
Bayesian Point Estimation

Question. Can one calculate an appropriate point estimate θe given the posterior gΘ(θ|W = w)?

Definitions. Let θe be an estimate for θ based on a statistic W. The loss function associated with θe is denoted L(θe, θ), where L(θe, θ) ≥ 0 and L(θ, θ) = 0.

Let gΘ(θ|W = w) be the posterior distribution of the random variable Θ. Then the risk associated with θe is the expected value of the loss function with respect to the posterior distribution of Θ:

$$\text{risk} = \begin{cases} \displaystyle\int_{\mathbb{R}} L(\theta_e, \theta)\, g_\Theta(\theta \mid W = w)\, d\theta & \text{if } \Theta \text{ is continuous} \\[1ex] \displaystyle\sum_i L(\theta_e, \theta_i)\, g_\Theta(\theta_i \mid W = w) & \text{if } \Theta \text{ is discrete} \end{cases}$$

107
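For squared-error loss the risk decomposes as posterior variance plus squared distance from the posterior mean. A numeric sketch against the Beta(9, 201) posterior from the earlier example:

```python
# risk(theta_e) = E[(theta_e - Theta)^2 | W = w]
#              = Var(Theta | w) + (theta_e - E[Theta | w])^2
from scipy.integrate import quad
from scipy.stats import beta

posterior = beta(9, 201)

def risk(theta_e):
    val, _ = quad(lambda t: (theta_e - t)**2 * posterior.pdf(t), 0, 1,
                  points=[0.02, 0.05, 0.1])
    return val

m, v = posterior.mean(), posterior.var()
assert abs(risk(m) - v) < 1e-7   # at the posterior mean, risk = posterior variance
assert risk(0.1) > risk(m)       # any other estimate has larger risk
```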
Theorem. Let gΘ(θ|W = w) be the posterior distribution of the random variable Θ.

a. If L(θe, θ) = |θe − θ|, then the Bayes point estimate for θ is the median of gΘ(θ|W = w).

b. If L(θe, θ) = (θe − θ)², then the Bayes point estimate for θ is the mean of gΘ(θ|W = w).

Proof. ...... □

Remarks
▶ The mean usually has a closed-form formula.
▶ The median usually does not have a closed-form formula.

108
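The theorem can be illustrated by brute force: compute the posterior risk on a grid of candidate estimates and locate the minimizer for each loss. A sketch using the Beta(9, 201) posterior from the earlier example (the grids are assumptions):

```python
# Numerically locate argmin of the posterior risk for absolute and squared loss.
import numpy as np
from scipy.stats import beta

posterior = beta(9, 201)
grid = np.linspace(0.0, 0.1, 2001)     # candidate estimates theta_e
theta = np.linspace(1e-9, 0.3, 6001)   # posterior mass beyond 0.3 is negligible
pdf = posterior.pdf(theta)
dt = theta[1] - theta[0]

def risk(loss):
    # crude Riemann-sum risk for each candidate estimate
    return np.array([np.sum(loss(te, theta) * pdf) * dt for te in grid])

l1_argmin = grid[np.argmin(risk(lambda te, t: np.abs(te - t)))]
l2_argmin = grid[np.argmin(risk(lambda te, t: (te - t)**2))]

assert abs(l1_argmin - posterior.median()) < 1e-3  # absolute loss -> median
assert abs(l2_argmin - posterior.mean()) < 1e-3    # squared loss  -> mean
```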
E.g. 1'.

$$X_1, \cdots, X_n \,\big|\, \theta \sim \text{Bernoulli}(\theta), \quad \Theta \sim \text{Beta}(r, s), \quad r \ \&\ s \text{ are known}$$

equivalently,

$$X = \sum_{i=1}^{n} X_i \,\Big|\, \theta \sim \text{Binomial}(n, \theta), \quad \Theta \sim \text{Beta}(r, s), \quad r \ \&\ s \text{ are known}$$

Prior Beta(r, s) → posterior Beta(k + r, n − k + s) upon observing X = k for a random sample of size n.

Consider the L² loss function.

$$\theta_e = \text{mean of Beta}(k+r,\, n-k+s) = \frac{k+r}{n+r+s} = \frac{n}{n+r+s} \times \underbrace{\left(\frac{k}{n}\right)}_{\text{MLE}} + \frac{r+s}{n+r+s} \times \underbrace{\left(\frac{r}{r+s}\right)}_{\text{mean of prior}}$$

110
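The last line says the Bayes estimate is a convex combination of the MLE and the prior mean, with weights n/(n + r + s) and (r + s)/(n + r + s). A one-line numeric check with the example values r = 7, s = 193, n = 10, k = 2:

```python
# Posterior mean = weighted average of the MLE and the prior mean.
n, k, r, s = 10, 2, 7, 193
post_mean = (k + r) / (n + r + s)
weight = n / (n + r + s)
assert abs(post_mean - (weight * (k / n) + (1 - weight) * (r / (r + s)))) < 1e-12
```

As n grows, the weight on the MLE tends to 1, so the data swamp the prior.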
E.g. 2'.

$$X_1, \cdots, X_n \,\big|\, \theta \sim \text{Poisson}(\theta), \quad \Theta \sim \text{Gamma}(s, \mu), \quad s \ \&\ \mu \text{ are known}$$

equivalently,

$$W = \sum_{i=1}^{n} X_i \,\Big|\, \theta \sim \text{Poisson}(n\theta), \quad \Theta \sim \text{Gamma}(s, \mu), \quad s \ \&\ \mu \text{ are known}$$

Prior Gamma(s, µ) → posterior Gamma(w + s, µ + n) upon observing W = w for a random sample of size n.

Consider the L² loss function.

$$\theta_e = \text{mean of Gamma}(w+s,\, \mu+n) = \frac{w+s}{\mu+n} = \frac{n}{\mu+n} \times \underbrace{\left(\frac{w}{n}\right)}_{\text{MLE}} + \frac{\mu}{\mu+n} \times \underbrace{\left(\frac{s}{\mu}\right)}_{\text{mean of prior}}$$

111
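The same shrinkage structure holds here, checked numerically with assumed values n = 5, w = 8, s = 3, µ = 2:

```python
# Gamma posterior mean = weighted average of the MLE w/n and the prior mean s/mu.
n, w, s, mu = 5, 8, 3.0, 2.0
post_mean = (w + s) / (mu + n)
weight = n / (mu + n)
assert abs(post_mean - (weight * (w / n) + (1 - weight) * (s / mu))) < 1e-12
```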
Appendix: Beta integral

Lemma.
\[
B(\alpha,\beta) := \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx
= \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}
\]

Proof. Notice that
\[
\Gamma(\alpha) = \int_0^\infty x^{\alpha-1}e^{-x}\,dx
\quad\text{and}\quad
\Gamma(\beta) = \int_0^\infty y^{\beta-1}e^{-y}\,dy.
\]
Hence,
\[
\Gamma(\alpha)\Gamma(\beta) = \int_0^\infty\!\!\int_0^\infty x^{\alpha-1}y^{\beta-1}e^{-(x+y)}\,dx\,dy.
\]

112
The key in the proof is the following change of variables:
\[
\begin{cases}
x = r^2\cos^2(\theta) \\
y = r^2\sin^2(\theta)
\end{cases}
\implies
\frac{\partial(x,y)}{\partial(r,\theta)} =
\begin{pmatrix}
2r\cos^2(\theta) & 2r\sin^2(\theta) \\
-2r^2\cos(\theta)\sin(\theta) & 2r^2\cos(\theta)\sin(\theta)
\end{pmatrix}
\implies
\left|\det\left(\frac{\partial(x,y)}{\partial(r,\theta)}\right)\right| = 4r^3\sin(\theta)\cos(\theta).
\]
113
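The Jacobian determinant above can be sanity-checked with central finite differences; the following sketch evaluates it at an arbitrary illustrative point (r, θ) = (1.3, 0.7):

```python
# Finite-difference check of |det d(x,y)/d(r,theta)| = 4 r^3 sin(theta) cos(theta).
from math import cos, sin

def xy(r, t):
    """The change of variables x = r^2 cos^2(t), y = r^2 sin^2(t)."""
    return r * r * cos(t) ** 2, r * r * sin(t) ** 2

def det_jacobian_fd(r, t, h=1e-6):
    """Central-difference approximation of the 2x2 Jacobian determinant."""
    x_rp, y_rp = xy(r + h, t); x_rm, y_rm = xy(r - h, t)
    x_tp, y_tp = xy(r, t + h); x_tm, y_tm = xy(r, t - h)
    dxdr, dydr = (x_rp - x_rm) / (2 * h), (y_rp - y_rm) / (2 * h)
    dxdt, dydt = (x_tp - x_tm) / (2 * h), (y_tp - y_tm) / (2 * h)
    return dxdr * dydt - dxdt * dydr

r, t = 1.3, 0.7
exact = 4 * r ** 3 * sin(t) * cos(t)
assert abs(det_jacobian_fd(r, t) - exact) < 1e-5
```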
Therefore,
\[
\Gamma(\alpha)\Gamma(\beta)
= \int_0^{\pi/2}\!d\theta \int_0^\infty\!dr\;
r^{2(\alpha+\beta)-4}e^{-r^2}\cos^{2\alpha-2}(\theta)\sin^{2\beta-2}(\theta)
\times \underbrace{4r^3\sin(\theta)\cos(\theta)}_{\text{Jacobian}}
= 4\left(\int_0^{\pi/2}\cos^{2\alpha-1}(\theta)\sin^{2\beta-1}(\theta)\,d\theta\right)
\left(\int_0^\infty r^{2(\alpha+\beta)-1}e^{-r^2}\,dr\right).
\]
Now let us compute the following two integrals separately:
\[
I_1 := \int_0^{\pi/2}\cos^{2\alpha-1}(\theta)\sin^{2\beta-1}(\theta)\,d\theta
\qquad
I_2 := \int_0^\infty r^{2(\alpha+\beta)-1}e^{-r^2}\,dr
\]
114
For \(I_2\), by the change of variable \(r^2 = u\) (so that \(2r\,dr = du\)),
\[
I_2 = \int_0^\infty r^{2(\alpha+\beta)-1}e^{-r^2}\,dr
= \frac{1}{2}\int_0^\infty r^{2(\alpha+\beta)-2}e^{-r^2}\underbrace{2r\,dr}_{=\,du}
= \frac{1}{2}\int_0^\infty u^{\alpha+\beta-1}e^{-u}\,du
= \frac{1}{2}\,\Gamma(\alpha+\beta).
\]
115
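The closed form \(I_2 = \tfrac12\Gamma(\alpha+\beta)\) can be confirmed by direct quadrature; the following sketch uses a plain trapezoidal rule on a truncated domain (the cutoff R = 12 and the test values of a, b are illustrative choices, not from the slides):

```python
# Numerical check that I2 = (1/2) * Gamma(a + b) for the integral above.
from math import gamma, exp

def I2(a, b, R=12.0, steps=200_000):
    """Trapezoidal approximation of the integral of r^(2(a+b)-1) e^(-r^2)
    over [0, R]; the tail beyond R ~ 12 is negligible for moderate a + b."""
    h = R / steps
    f = lambda r: r ** (2 * (a + b) - 1) * exp(-r * r)
    total = 0.5 * (f(0.0) + f(R))
    for k in range(1, steps):
        total += f(k * h)
    return total * h

a, b = 2.0, 3.0
exact = 0.5 * gamma(a + b)      # = Gamma(5)/2 = 12
assert abs(I2(a, b) - exact) < 1e-4
```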
For \(I_1\), by the change of variables \(\sqrt{x} = \cos(\theta)\) (so that \(-\sin(\theta)\,d\theta = \frac{1}{2\sqrt{x}}\,dx\)),
\[
I_1 = \int_0^{\pi/2}\cos^{2\alpha-1}(\theta)\sin^{2\beta-1}(\theta)\,d\theta
= \int_0^{\pi/2}\cos^{2\alpha-1}(\theta)\sin^{2\beta-2}(\theta)\times\underbrace{\sin(\theta)\,d\theta}_{=\,-\frac{1}{2\sqrt{x}}dx}
= \int_1^0 x^{\alpha-\frac{1}{2}}(1-x)^{\beta-1}\,\frac{-1}{2\sqrt{x}}\,dx
= \frac{1}{2}\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx
= \frac{1}{2}\,B(\alpha,\beta)
\]
116
Therefore,
\[
\Gamma(\alpha)\Gamma(\beta) = 4\,I_1 \times I_2
= 4 \times \frac{1}{2}\,\Gamma(\alpha+\beta) \times \frac{1}{2}\,B(\alpha,\beta),
\]
i.e.,
\[
B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}.
\]
�
117
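The lemma just proved can also be verified numerically: integrate \(x^{\alpha-1}(1-x)^{\beta-1}\) over \([0,1]\) with a midpoint rule and compare against the Gamma-function ratio (the parameter pairs below are illustrative choices with \(\alpha,\beta \ge 1\) so the integrand stays bounded):

```python
# Midpoint-rule check of B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b).
from math import gamma

def beta_numeric(a, b, steps=100_000):
    """Midpoint-rule approximation of the Beta integral on [0, 1]."""
    h = 1.0 / steps
    return h * sum(((k + 0.5) * h) ** (a - 1) * (1 - (k + 0.5) * h) ** (b - 1)
                   for k in range(steps))

for a, b in [(2.0, 3.0), (1.5, 2.5), (4.0, 1.0)]:
    exact = gamma(a) * gamma(b) / gamma(a + b)
    assert abs(beta_numeric(a, b) - exact) < 1e-6
```

For example, B(2, 3) = Γ(2)Γ(3)/Γ(5) = 2/24 = 1/12, and the quadrature reproduces that value to well within the tolerance.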