Maximum Likelihood and Method of Moments Estimation
Transcript of Maximum Likelihood and Method of Moments Estimation
![Page 1: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/1.jpg)
Maximum Likelihood & Method of Moments Estimation
Patrick Zheng 01/30/14
1
![Page 2: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/2.jpg)
Introduction
Goal: find a good POINT estimate of a population parameter.
Data: we begin with a random sample of size n taken from the population. We estimate the parameter based on this sample.
Distribution: the first step is to identify the probability distribution of the sample, which is characterized by the parameter.
The form of the distribution is often easy to identify; the parameter is unknown.
2
![Page 3: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/3.jpg)
Notations
Sample: X₁, X₂, …, Xₙ
Distribution: Xᵢ iid ~ f(x; θ)
Parameter: θ
Examples:
e.g., the distribution is normal (f = Normal) with unknown parameters μ and σ² (θ = (μ, σ²)); e.g., the distribution is binomial (f = binomial) with unknown parameter p (θ = p).
3
![Page 4: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/4.jpg)
It’s important to have a good estimate!
The importance of point estimates lies in the fact that many statistical formulas are based on them, such as confidence intervals and formulas for hypothesis testing. A good estimate should: 1. be unbiased; 2. have small variance; 3. be efficient; 4. be consistent.
4
![Page 5: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/5.jpg)
Unbiasedness
An estimator is unbiased if its mean equals the parameter. It does not systematically overestimate or underestimate the target parameter. The sample mean (x̄) / sample proportion (p̂) is an unbiased estimator of the population mean/proportion.
5
![Page 6: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/6.jpg)
Small variance
We also prefer that the sampling distribution of the estimator have a small spread or variability, i.e., a small standard deviation.
6
![Page 7: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/7.jpg)
Efficiency
An estimator θ̂ is said to be efficient if its Mean Square Error (MSE) is minimum among all competitors.
Relative Efficiency(θ̂₁, θ̂₂) = MSE(θ̂₂) / MSE(θ̂₁)
If > 1, θ̂₁ is more efficient than θ̂₂. If < 1, θ̂₂ is more efficient than θ̂₁.
MSE(θ̂) = E(θ̂ − θ)² = Bias²(θ̂) + var(θ̂), where Bias(θ̂) = E(θ̂) − θ.
7
![Page 8: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/8.jpg)
Example: efficiency
Suppose X₁, X₂, …, Xₙ iid ~ N(μ, σ²).
If μ̂₁ = X₁, then MSE(μ̂₁) = Bias²(μ̂₁) + var(μ̂₁) = 0 + σ² = σ².
If μ̂₂ = X̄ = (X₁ + X₂ + … + Xₙ)/n, then MSE(μ̂₂) = Bias²(μ̂₂) + var(μ̂₂) = 0 + σ²/n = σ²/n.
Since R.E.(μ̂₁, μ̂₂) = MSE(μ̂₂) / MSE(μ̂₁) = (σ²/n) / σ² = 1/n ≤ 1,
μ̂₂ is more efficient than μ̂₁.
8
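The slides later use R; here is a quick Monte Carlo sketch in Python (not from the slides, parameter values are illustrative) confirming that the sample mean μ̂₂ beats a single observation μ̂₁ in MSE, as the example above computes.

```python
import random

# Compare empirical MSEs of two estimators of mu for X_i iid N(mu, sigma^2):
# mu1_hat = X_1 (MSE = sigma^2) vs mu2_hat = X-bar (MSE = sigma^2 / n).
random.seed(0)
mu, sigma, n, reps = 5.0, 2.0, 25, 20000

mse1 = mse2 = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    mu1_hat = xs[0]            # single observation
    mu2_hat = sum(xs) / n      # sample mean
    mse1 += (mu1_hat - mu) ** 2
    mse2 += (mu2_hat - mu) ** 2
mse1 /= reps
mse2 /= reps

print(mse1, mse2, mse2 / mse1)  # roughly sigma^2, sigma^2/n, and 1/n
```

The ratio printed last estimates the relative efficiency 1/n from the slide.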
![Page 9: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/9.jpg)
Consistency
An estimator θ̂ is said to be consistent if, as the sample size n goes to +∞, θ̂ converges in probability to θ:
∀ε > 0, Pr(|θ̂ − θ| > ε) → 0 as n → +∞.
Chebyshev's rule:
∀ε > 0, Pr(|θ̂ − θ| ≥ ε) ≤ E(θ̂ − θ)²/ε² = MSE(θ̂)/ε².
So if one can prove that the MSE of θ̂ tends to 0 as n goes to +∞, then θ̂ is consistent.
9
![Page 10: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/10.jpg)
Example: Consistency
Suppose X₁, X₂, …, Xₙ iid ~ N(μ, σ²). The estimator μ̂ = X̄ = (X₁ + X₂ + … + Xₙ)/n is consistent, since
∀ε > 0, Pr(|μ̂ − μ| ≥ ε) ≤ E(μ̂ − μ)²/ε² = MSE(μ̂)/ε² = σ²/(nε²) → 0 as n → +∞.
10
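A small Python sketch (illustrative, not from the slides) of the MSE → 0 criterion above: the empirical MSE of X̄ shrinks like σ²/n as n grows.

```python
import random

# Empirical MSE of the sample mean X-bar for X_i iid N(mu, sigma^2),
# at several sample sizes n; expected values are sigma^2 / n.
random.seed(1)
mu, sigma, reps = 5.0, 2.0, 2000

def empirical_mse(n):
    """Average squared error of X-bar over many simulated samples of size n."""
    total = 0.0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
        total += (xbar - mu) ** 2
    return total / reps

mses = {n: empirical_mse(n) for n in (10, 100, 1000)}
print(mses)  # values near sigma^2/n = 0.4, 0.04, 0.004
```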
![Page 11: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/11.jpg)
Point Estimation Methods There are many methods available for estimating the parameter(s) of interest. Three of the most popular methods of estimation are:
The method of moments (MM)
The method of maximum likelihood (ML)
The Bayesian method
11
![Page 12: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/12.jpg)
1. The Method of Moments
12
![Page 13: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/13.jpg)
The Method of Moments
One of the oldest methods; a very simple procedure.
What is a moment? The k-th population moment is μ′ₖ = E(Xᵏ); the k-th sample moment is m′ₖ = (1/n)Σᵢ₌₁ⁿ Xᵢᵏ.
The method is based on the assumption that sample moments should provide GOOD ESTIMATES of the corresponding population moments.
13
![Page 14: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/14.jpg)
How it works?
Step 1: compute the population moments μ′ₖ = E(Xᵏ) as functions of the parameter(s).
Step 2: compute the sample moments, e.g., m′₁ = X̄; m′₂ = (1/n)Σᵢ₌₁ⁿ Xᵢ².
Step 3: set the sample moments equal to the population moments and solve for the parameter(s).
14
![Page 15: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/15.jpg)
Example: normal distribution
Suppose X₁, X₂, …, Xₙ iid ~ N(θ, σ²).
Step 1: μ′₁ = E(X) = θ; μ′₂ = E(X²) = θ² + σ².
Step 2: m′₁ = X̄; m′₂ = (1/n)Σᵢ₌₁ⁿ Xᵢ².
Step 3: set m′₁ = μ′₁, m′₂ = μ′₂; therefore
X̄ = θ,
(1/n)Σᵢ₌₁ⁿ Xᵢ² = θ² + σ².
Solving the two equations, we get θ̂ = X̄, σ̂² = (1/n)Σᵢ₌₁ⁿ Xᵢ² − X̄².
15
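The normal-distribution MoM steps can be sketched in a few lines of Python (illustrative parameter values, not from the slides):

```python
import random

# Method-of-moments estimates for N(theta, sigma^2):
# theta_hat = X-bar, sigma2_hat = (1/n) * sum(X_i^2) - X-bar^2.
random.seed(2)
theta, sigma, n = 3.0, 1.5, 100000
xs = [random.gauss(theta, sigma) for _ in range(n)]

m1 = sum(xs) / n                  # first sample moment
m2 = sum(x * x for x in xs) / n   # second sample moment

theta_hat = m1
sigma2_hat = m2 - m1 ** 2

print(theta_hat, sigma2_hat)      # close to theta = 3.0 and sigma^2 = 2.25
```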
![Page 16: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/16.jpg)
Example: Bernoulli Distribution
X follows a Bernoulli distribution if
P(X = x) = p if x = 1; 1 − p if x = 0.
Since E(X) = p and m′₁ = X̄, setting m′₁ = μ′₁ gives the MME p̂ = X̄ (the sample proportion).
16
![Page 17: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/17.jpg)
Example: Poisson distribution
For X₁, X₂, …, Xₙ iid ~ Poi(λ), E(X) = λ, so setting m′₁ = μ′₁ gives the MME λ̂ = X̄.
17
![Page 18: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/18.jpg)
Note: the MME may not be unique. In general, the minimum number of moment conditions needed equals the number of parameters.
Question: Can these two estimators be combined in some optimal way?
Answer: Generalized method of moments.
18
![Page 19: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/19.jpg)
Pros of Method of Moments
Easy to compute and always works: the method often provides estimators when other methods fail to do so, or when estimators are hard to obtain (as in the case of the gamma distribution).
The MME is consistent.
19
![Page 20: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/20.jpg)
Cons of Method of Moments
They are usually not the “best estimators” available. By best, we mean most efficient, i.e., achieving minimum MSE. Sometimes the MME may be meaningless.
(see next page for example)
20
![Page 21: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/21.jpg)
Sometimes, the MME is meaningless
Suppose we observe 3, 5, 6, 18 from a U(0, θ) distribution. Since E(X) = θ/2,
the MME of θ is 2X̄ = 2 · (3 + 5 + 6 + 18)/4 = 16, which is
not acceptable, because we have already observed a value of 18.
21
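The arithmetic above is small enough to check directly; a two-line Python sketch:

```python
# U(0, theta) example from the slide: the MME 2 * X-bar falls below an
# observed data point, which is impossible for the true theta.
data = [3, 5, 6, 18]

theta_mme = 2 * sum(data) / len(data)   # MME: 2 * sample mean
print(theta_mme)                        # 16.0, yet we observed 18
print(theta_mme >= max(data))           # False: the estimate is infeasible
```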
![Page 22: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/22.jpg)
2. The Method of Maximum Likelihood
22
![Page 23: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/23.jpg)
The Method of Maximum Likelihood
Proposed by the geneticist and statistician Sir Ronald A. Fisher in 1922.
Idea: We attempt to find the values of the parameters which would have most likely produced the data that we in fact observed.
23
![Page 24: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/24.jpg)
What is likelihood?
The likelihood of a parameter value is the probability (or density) of observing the data X₁, X₂, …, Xₙ when the parameter takes that value.
E.g., the likelihood of θ = 1 is the chance of observing X₁, X₂, …, Xₙ when θ = 1.
24
![Page 25: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/25.jpg)
How to compute likelihood?
For a discrete distribution, the likelihood of an iid sample x₁, …, xₙ is the joint probability L(θ) = P(X₁ = x₁, …, Xₙ = xₙ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ).
For a continuous distribution, the likelihood is the joint density L(θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ).
25
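A short Python sketch of the discrete case (the Poisson data and candidate λ values are illustrative assumptions, not from the slides):

```python
import math

# Likelihood of an iid Poisson sample: L(lam) = prod_i f(x_i; lam),
# where f(x; lam) = exp(-lam) * lam^x / x! is the Poisson pmf.
data = [2, 3, 1, 4]

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

def likelihood(lam):
    prod = 1.0
    for x in data:
        prod *= poisson_pmf(x, lam)
    return prod

# The sample mean is 2.5, so lam = 2.5 makes the data more likely than lam = 1.
print(likelihood(2.5), likelihood(1.0))
```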
![Page 26: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/26.jpg)
Example of computing likelihood (discrete case)
26
![Page 27: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/27.jpg)
Example of computing likelihood (continuous case)
27
![Page 28: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/28.jpg)
Definition of MLE
The maximum likelihood estimate (MLE) θ̂ is the value of θ that maximizes the likelihood: θ̂ = argmaxθ L(θ).
In general, the method of ML results in the problem of maximizing a function of one or several parameters. One way to do the maximization is to take the derivative.
28
![Page 29: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/29.jpg)
Procedure to find MLE
1. Write down the likelihood L(θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ).
2. Take the log: ln L(θ).
3. Differentiate ln L(θ) with respect to θ and set the derivative to 0.
4. Solve for θ̂, and check that it is a maximum.
29
![Page 30: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/30.jpg)
Example: Poisson Distribution
Suppose x₁, …, xₙ are observed from Poi(λ), with pmf f(x; λ) = e^(−λ) λˣ / x!.
L(λ) = ∏ᵢ₌₁ⁿ e^(−λ) λ^(xᵢ) / xᵢ!, so ln L(λ) = −nλ + (Σᵢxᵢ) ln λ − Σᵢ ln(xᵢ!).
30
![Page 31: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/31.jpg)
Example cont’d
Setting d ln L(λ)/dλ = Σᵢxᵢ/λ − n = 0 gives the MLE λ̂ = X̄.
31
![Page 32: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/32.jpg)
Example: Uniform Distribution
Suppose x₁, …, xₙ are observed from U(0, θ), with density f(x; θ) = 1/θ for 0 ≤ x ≤ θ.
L(θ) = 1/θⁿ if θ ≥ max(x₁, …, xₙ), and 0 otherwise.
32
![Page 33: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/33.jpg)
Example cont’d
L(θ) is decreasing in θ on [max xᵢ, +∞), so the derivative is never 0 there; the maximum is at the boundary, giving the MLE θ̂ = max(X₁, …, Xₙ).
33
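Applied to the earlier U(0, θ) sample 3, 5, 6, 18, the MLE avoids the problem the method of moments ran into; a quick Python check:

```python
# For 3, 5, 6, 18 observed from U(0, theta): the MLE max(data) is always
# feasible, unlike the MME 2 * X-bar computed earlier.
data = [3, 5, 6, 18]

theta_mle = max(data)                   # MLE for U(0, theta)
theta_mme = 2 * sum(data) / len(data)   # MME, for comparison

print(theta_mle, theta_mme)             # 18 vs 16.0
print(theta_mle >= max(data))           # True: the MLE covers every observation
```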
![Page 34: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/34.jpg)
More than one parameter
34
![Page 35: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/35.jpg)
Pros of Method of ML
When the sample size n is large (n > 30), the MLE is approximately unbiased, consistent, normally distributed, and efficient (under "regularity conditions"). "Efficient" means it achieves a smaller MSE than other methods, including the method of moments.
It is more useful in statistical inference.
35
![Page 36: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/36.jpg)
Cons of Method of ML
The MLE can be highly biased for small samples. Sometimes the MLE has no closed-form solution. The MLE can be sensitive to starting values, which might not give a global optimum; this is common when θ is of high dimension.
36
![Page 37: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/37.jpg)
How to maximize Likelihood
1. Take derivative and solve analytically (as aforementioned)
2. Apply numerical maximization techniques, including Newton's method, quasi-Newton methods (Broyden 1970), and direct search methods (Nelder and Mead 1965). These methods can be implemented with the R functions optimize() and optim().
37
![Page 38: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/38.jpg)
Newton’s Method
A method for finding successively better approximations to the roots (or zeroes) of a real-valued function.
Pick an x₀ close to the root of a continuous, differentiable function f(x).
Take the derivative of f(x) to get f′(x).
Plug into xₙ₊₁ = xₙ − f(xₙ)/f′(xₙ), with f′(xₙ) ≠ 0.
Repeat until it converges, i.e., until xₙ₊₁ ≈ xₙ.
38
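The iteration above can be sketched as a small Python function (the name, tolerance, and test function are illustrative choices, not from the slides):

```python
# Newton's method: iterate x_{n+1} = x_n - f(x_n)/f'(x_n) until
# successive iterates differ by less than tol.
def newton(f, fprime, x0, tol=1e-5, max_iter=100):
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Usage: root of f(x) = x^2 - 2 (i.e., sqrt(2)) starting from x0 = 1.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
print(root)   # close to 1.41421356
```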
![Page 39: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/39.jpg)
Example
Solve eˣ − 1 = 0.
Denote f(x) = eˣ − 1; let the starting point be x₀ = 0.1; f′(x) = eˣ.
xₙ₊₁ = xₙ − f(xₙ)/f′(xₙ):
x₁ = x₀ − f(x₀)/f′(x₀) = 0.1 − (e^0.1 − 1)/e^0.1 = 0.0048374
x₂ = x₁ − f(x₁)/f′(x₁) = …
Repeat until |xₙ₊₁ − xₙ| < 0.00001; the iterates converge to the root x = 0.
39
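A Python sketch of this worked example; the first step reproduces the x₁ value computed above.

```python
import math

# Newton's method on f(x) = e^x - 1, f'(x) = e^x, starting from x0 = 0.1.
def step(x):
    return x - (math.exp(x) - 1) / math.exp(x)   # x - f(x)/f'(x)

x0 = 0.1
x1 = step(x0)
print(x1)   # 0.0048374...

# Iterate to the stated tolerance.
x = x0
while True:
    x_new = step(x)
    if abs(x_new - x) < 0.00001:
        break
    x = x_new
print(x_new)   # essentially 0, the true root
```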
![Page 40: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/40.jpg)
Example: find MLE by Newton’s Method
In the Poisson distribution, finding λ̂, i.e., maximizing ln L(λ), is equivalent to finding the root of
d ln L(λ)/dλ = Σᵢxᵢ/λ − n.
To implement Newton’s method here, define
f(λ) = d ln L(λ)/dλ = Σᵢxᵢ/λ − n,
f′(λ) = −Σᵢxᵢ/λ²,
λₖ₊₁ = λₖ − f(λₖ)/f′(λₖ).
Given x₁, x₂, …, xₙ and a starting value λ₀, we can find λ̂.
40
![Page 41: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/41.jpg)
Example cont’d Suppose we collected a sample from Poi(𝜆):
18,10,8,13,7,17,11,6,7,7,10,10,12,4,12,4,12,10,7,14,13,7
Implement Newton’s method in R, iterating λₖ₊₁ = λₖ − f(λₖ)/f′(λₖ):
41
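The slides implement this in R; an analogous sketch in Python, using the sample above (the starting value λ₀ = 5 is an illustrative assumption):

```python
# Newton's method for the Poisson MLE on the sample from the slide.
# f(lam) = sum(x)/lam - n is the score; the iteration converges to
# the MLE lam_hat = X-bar.
data = [18, 10, 8, 13, 7, 17, 11, 6, 7, 7, 10, 10,
        12, 4, 12, 4, 12, 10, 7, 14, 13, 7]
n, s = len(data), sum(data)

def f(lam):        # score: d ln L / d lam
    return s / lam - n

def fprime(lam):   # derivative of the score
    return -s / lam ** 2

lam = 5.0          # starting value lam_0 (illustrative)
for _ in range(100):
    lam_new = lam - f(lam) / fprime(lam)
    if abs(lam_new - lam) < 1e-8:
        break
    lam = lam_new

print(lam_new, s / n)   # both equal the sample mean, about 9.9545
```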
![Page 42: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/42.jpg)
Use R function optim()
42
f(λ) = Σᵢxᵢ/λ − n
![Page 43: Maximum Likelihood and Method of Moments Estimation](https://reader034.fdocuments.us/reader034/viewer/2022042705/577c84321a28abe054b7e5cd/html5/thumbnails/43.jpg)
The End! Thank you!
43