20: Maximum Likelihood Estimationweb.stanford.edu/.../20_mle_blank-announcements.pdf5. / = 12 +4 11...

20: Maximum Likelihood Estimation, Lisa Yan, May 20, 2020


Page 2

Lisa Yan, CS109, 2020

Quick slide reference


3 Intro to parameter estimation 20a_intro

14 Maximum Likelihood Estimator 20b_mle

21 argmax and log-likelihood 20c_argmax

30 MLE: Bernoulli 20d_mle_bernoulli

42 MLE exercises: Poisson, Uniform, Gaussian LIVE

Page 3

Intro to parameter estimation



Page 4

Story so far

At this point:

If you are given a model with all the necessary probabilities, you can make predictions, for example with $Y \sim \text{Poi}(5)$, or with $X_1, \ldots, X_n$ i.i.d., $X_i \sim \text{Ber}(0.2)$, $X = \sum_{i=1}^n X_i$.

But what if you want to learn the probabilities in the model? (Machine Learning)

What if you want to learn the structure of the model, too? (I wish… another day)


Page 5

ML: Rooted in probability theory

[Figure: AI and Machine Learning, with regions labeled Artificial Intelligence, Machine Learning, Deep Learning]

Page 6

[Figure: TensorFlow]

Alright, so Deep Learning now?

Not so fast…


Page 9

Once upon a time…

…there was parameter estimation.

Page 10

Recall some estimators

$X_1, X_2, \ldots, X_n$ are $n$ i.i.d. random variables, where $X_i$ is drawn from distribution $F$ with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$.

Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$, an unbiased estimate of $\mu$

Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$, an unbiased estimate of $\sigma^2$
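As a quick numerical illustration, both estimators can be computed by hand and cross-checked against Python's standard `statistics` module, whose `variance` function uses the $n-1$ denominator, matching $S^2$. The data values are made up for illustration only.

```python
import statistics

# Hypothetical sample, made up for illustration only
data = [2.0, 4.0, 4.0, 5.0, 7.0]
n = len(data)

# Sample mean: (1/n) * sum of X_i
x_bar = sum(data) / n

# Sample variance with the n - 1 denominator (unbiased estimate of sigma^2)
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# The standard library computes the same quantities
assert abs(x_bar - statistics.mean(data)) < 1e-12
assert abs(s2 - statistics.variance(data)) < 1e-12
print(x_bar, s2)
```

Dividing by $n$ instead of $n-1$ would give a slightly smaller, biased estimate of $\sigma^2$.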


Page 12

What are parameters?

def: Many random variables we have learned so far are parametric models:
Distribution = model + parameter $\theta$
ex: The distribution $\text{Ber}(0.2)$ = Bernoulli model, parameter $\theta = 0.2$.

For each of the distributions below, what is the parameter $\theta$?

1. $\text{Ber}(p)$: $\theta = p$
2. $\text{Poi}(\lambda)$: $\theta = \lambda$
3. $\text{Uni}(\alpha, \beta)$: $\theta = (\alpha, \beta)$
4. $\mathcal{N}(\mu, \sigma^2)$: $\theta = (\mu, \sigma^2)$
5. $Y = mX + b$: $\theta = (m, b)$

$\theta$ is the parameter of a distribution. $\theta$ can be a vector of parameters!

Page 13

Why do we care?

In the real world, we don't know the "true" parameters.
• But we do get to observe data (# times a coin comes up heads, lifetimes of disk drives produced, # visitors to a website per day, etc.).

def estimator $\hat{\theta}$: a random variable estimating parameter $\theta$ from data.

In parameter estimation, we use the point estimate of the parameter (the best single value) for:
• Better understanding of the process producing data
• Future predictions based on the model
• Simulation of future processes

Page 14

Maximum Likelihood Estimator



Page 15

Defining the likelihood of data: Bernoulli

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• $X_i$ was drawn from distribution $F = \text{Ber}(\theta)$ with unknown parameter $\theta$.
• Observed data ($n = 10$): 0, 0, 1, 1, 1, 1, 1, 1, 1, 1

How likely was the observed data if $\theta = 0.4$?

$P(\text{sample} \mid \theta = 0.4) = 0.4^8 \cdot 0.6^2 \approx 0.000236$

This is the likelihood of the data given parameter $\theta = 0.4$. Is there a better parameter $\theta$?
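To make the arithmetic concrete, here is a small sketch that multiplies the Bernoulli PMF over the ten observations for a few candidate values of $\theta$. The candidate list is an arbitrary choice for illustration.

```python
import math

# Observed sample from the slide (n = 10): two 0s and eight 1s
sample = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

def bernoulli_pmf(x, theta):
    """P(X = x | theta) for X ~ Ber(theta), x in {0, 1}."""
    return theta if x == 1 else 1 - theta

def likelihood(theta, data):
    """Product of per-observation probabilities (valid because the X_i are i.i.d.)."""
    return math.prod(bernoulli_pmf(x, theta) for x in data)

for theta in [0.2, 0.4, 0.6, 0.8]:
    print(f"theta = {theta}: P(sample | theta) = {likelihood(theta, sample):.6f}")
```

Among these candidates, $\theta = 0.8$ gives the largest likelihood (about 0.006711).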

Page 16

Defining the likelihood of data

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• $X_i$ was drawn from a distribution with density function (or mass function) $f(X_i \mid \theta)$.
• Observed data: $X_1, X_2, \ldots, X_n$

Likelihood question: How likely is the observed data $X_1, X_2, \ldots, X_n$ given parameter $\theta$?

Likelihood function, $L(\theta)$:

$L(\theta) = f(X_1, X_2, \ldots, X_n \mid \theta) = \prod_{i=1}^n f(X_i \mid \theta)$

This is just a product, since the $X_i$ are i.i.d.



Page 19

Maximum Likelihood Estimator

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$, drawn from a distribution $f(X_i \mid \theta)$.

def: The Maximum Likelihood Estimator (MLE) of $\theta$ is the value of $\theta$ that maximizes $L(\theta)$:

$\hat{\theta}_{MLE} = \arg\max_\theta L(\theta)$, where $L(\theta) = \prod_{i=1}^n f(X_i \mid \theta)$ is the likelihood of your sample.

For continuous $X_i$, $f(X_i \mid \theta)$ is the PDF; for discrete $X_i$, $f(X_i \mid \theta)$ is the PMF.
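The same product works whichever kind of $f$ is used. The sketch below is an illustrative assumption, not course code: a hypothetical helper `likelihood` takes any per-observation function `f(x, theta)` and multiplies its values, then plugs in a Bernoulli PMF (discrete) and a Normal PDF (continuous, with $\theta = (\mu, \sigma)$). The data values are made up.

```python
import math
from typing import Callable, Sequence

def likelihood(f: Callable[[float, object], float], data: Sequence[float], theta) -> float:
    """L(theta) = prod_i f(X_i | theta) for i.i.d. data."""
    return math.prod(f(x, theta) for x in data)

def bernoulli_pmf(x: float, p: float) -> float:
    # Discrete case: f is a PMF
    return p ** x * (1 - p) ** (1 - x)

def normal_pdf(x: float, theta: tuple) -> float:
    # Continuous case: f is a PDF; theta = (mu, sigma)
    mu, sigma = theta
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

print(likelihood(bernoulli_pmf, [0, 1, 1], 0.5))           # 0.125, i.e. 0.5 ** 3
print(likelihood(normal_pdf, [0.1, -0.3, 0.2], (0.0, 1.0)))
```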

Page 20

In the definition $\hat{\theta}_{MLE} = \arg\max_\theta L(\theta)$, the expression $\arg\max_\theta L(\theta)$ means the argument $\theta$ that maximizes $L(\theta)$.

Stay tuned!
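Before reaching for calculus, one crude but transparent way to approximate $\arg\max_\theta L(\theta)$ is to evaluate $L$ on a fine grid of candidate $\theta$ values and keep the best. This is a sketch for intuition only; the grid spacing of 0.001 is an arbitrary choice, and the data are the ten Bernoulli observations from page 15.

```python
import math

sample = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]  # n = 10, eight 1s

def likelihood(theta: float) -> float:
    # L(theta) = prod_i theta^x_i (1 - theta)^(1 - x_i)
    return math.prod(theta if x == 1 else 1 - theta for x in sample)

# Candidate thetas 0.001, 0.002, ..., 0.999 (endpoints excluded)
grid = [k / 1000 for k in range(1, 1000)]

# arg max: the argument theta that gives the largest L(theta)
theta_hat = max(grid, key=likelihood)
print(theta_hat)  # 0.8
```

A grid only finds the maximizer up to its spacing, which is why the lecture moves on to an exact, calculus-based approach.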

Page 21

argmax and log-likelihood





Page 23

New function: arg max

$\arg\max_x f(x)$: the argument $x$ that maximizes the function $f(x)$.

Let $f(x) = -x^2 + 4$, where $-2 < x < 2$.

[Figure: plot of $f(x)$ for $-2 < x < 2$]

1. $\max_x f(x) = 4$
2. $\arg\max_x f(x) = 0$
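A brute-force check on a grid reproduces both answers. This is only an illustration; the 3999-point grid over the open interval $(-2, 2)$ is an arbitrary choice.

```python
def f(x: float) -> float:
    return -x ** 2 + 4

# Grid over the open interval (-2, 2): -1.999, ..., 1.999
xs = [k / 1000 for k in range(-1999, 2000)]

max_value = max(f(x) for x in xs)   # max_x f(x): the largest value of f
arg_max = max(xs, key=f)            # arg max_x f(x): the x that attains it
print(max_value, arg_max)  # 4.0 0.0
```

The distinction matters for MLE: we want the parameter value (the arg max), not the height of the likelihood (the max).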

Page 24

Argmax and log

Let $f(x) = -x^2 + 4$, where $-2 < x < 2$. Then

$\arg\max_x f(x) = \arg\max_x \log f(x) = 0$

[Figure: plots of $f(x)$ and $\log f(x)$ for $-2 < x < 2$; both peak at $x = 0$]

$\arg\max_x f(x)$: the argument $x$ that maximizes the function $f(x)$.
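Numerically, taking the log changes the heights of the curve but not where it peaks. A quick grid check follows; the grid over the open interval is an arbitrary choice, and $f(x) > 0$ there, so $\log f(x)$ is defined.

```python
import math

def f(x: float) -> float:
    return -x ** 2 + 4

xs = [k / 1000 for k in range(-1999, 2000)]  # grid over (-2, 2)

argmax_f = max(xs, key=f)
argmax_log_f = max(xs, key=lambda x: math.log(f(x)))

print(argmax_f, argmax_log_f)    # 0.0 0.0
print(f(0.0), math.log(f(0.0)))  # the maximum values differ: 4.0 vs ln 4
```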

Page 25

Logs all around

[Figure: plot of $\log x$]

• Log is monotonic: $x \le y \iff \log x \le \log y$ (for $x, y > 0$)
• Log of product = sum of logs: $\log(ab) = \log a + \log b$
• Natural logs: $\log x = \ln x$
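The product-to-sum rule also matters numerically. A likelihood is a product of many numbers below 1, which can underflow to 0.0 in double-precision floating point, while the sum of logs stays representable. A small sketch with an artificial example (2000 factors of 0.5, chosen only to force underflow):

```python
import math

probs = [0.5] * 2000  # artificial: 2000 observations, each with probability 0.5

product = math.prod(probs)                     # 0.5 ** 2000 underflows to 0.0
sum_of_logs = sum(math.log(p) for p in probs)  # log of the same product

print(product)       # 0.0
print(sum_of_logs)   # about -1386.29, i.e. 2000 * ln 0.5
```

Because log is monotonic, maximizing the sum of logs still finds the same maximizer as maximizing the product.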


Page 27

Argmax properties

$\arg\max_x f(x)$: the argument $x$ that maximizes the function $f(x)$.

$\arg\max_x f(x) = \arg\max_x \log f(x)$ (log is monotonic: $x \le y \iff \log x \le \log y$; this requires $f(x) > 0$)

$\arg\max_x f(x) = \arg\max_x c \log f(x)$, for any positive constant $c$ ($x \le y \iff c \log x \le c \log y$)

How do we compute argmax?

Page 28

Finding the argmax with calculus

$\hat{x} = \arg\max_x f(x)$. Let $f(x) = -x^2 + 4$, where $-2 < x < 2$.

[Figure: plot of $f(x)$ for $-2 < x < 2$]

Differentiate w.r.t. argmax's argument: $f'(x) = \frac{d}{dx}(-x^2 + 4) = -2x$

Set to 0 and solve: $-2x = 0 \Rightarrow \hat{x} = 0$

Make sure $\hat{x}$ is a maximum:
• Check $f(\hat{x} \pm \epsilon) < f(\hat{x})$
• Often ignored in expository derivations
• We'll ignore it here too (and won't require it in class)
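The three steps can be checked numerically. In this sketch, a central finite difference (step $h = 10^{-6}$, an arbitrary choice) stands in for the derivative, and the final check, with an arbitrary $\epsilon = 10^{-3}$, is the one the slide says we usually skip.

```python
def f(x: float) -> float:
    return -x ** 2 + 4

def derivative(g, x: float, h: float = 1e-6) -> float:
    # Central finite-difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

x_hat = 0.0  # solution of -2x = 0

# Differentiate and set to 0: the derivative vanishes at x_hat and matches -2x elsewhere
assert abs(derivative(f, x_hat)) < 1e-9
assert abs(derivative(f, 0.5) - (-2 * 0.5)) < 1e-6

# Make sure x_hat is a maximum: neighbors on both sides are lower
eps = 1e-3
assert f(x_hat - eps) < f(x_hat) and f(x_hat + eps) < f(x_hat)
print("x_hat =", x_hat, "is a maximum")
```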

Page 29

MLE: Bernoulli



Page 30

Computing the MLE

General approach for finding $\hat{\theta}_{MLE}$, the MLE of $\theta$:

$\hat{\theta}_{MLE} = \arg\max_\theta LL(\theta)$, where $LL(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(X_i \mid \theta)$

$LL(\theta)$ is often easier to differentiate than $L(\theta)$.

1. Determine the formula for $LL(\theta)$.
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$: $\frac{\partial LL(\theta)}{\partial \theta}$
3. Solve the resulting (simultaneous) equations, by algebra or by computer. To maximize: $\frac{\partial LL(\theta)}{\partial \theta} = 0$
4. Make sure the derived $\hat{\theta}_{MLE}$ is a maximum.
• Check $LL(\hat{\theta}_{MLE} \pm \epsilon) < LL(\hat{\theta}_{MLE})$
• Often ignored in expository derivations
• We'll ignore it here too (and won't require it in class)

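Step 3 allows solving "by computer". When the equations have no convenient closed form, a computer can maximize $LL(\theta)$ directly. Below is a minimal sketch, assuming a single parameter and an $LL$ with exactly one peak on the search interval; golden-section search is a swapped-in general-purpose technique, not part of the course's recipe. It uses the Bernoulli sample from page 15, so the result can be compared against the closed-form Bernoulli MLE later in this lecture.

```python
import math

sample = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

def log_likelihood(p: float) -> float:
    # LL(p) = sum_i log f(X_i | p) for X_i ~ Ber(p)
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in sample)

def golden_section_max(g, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Approximate argmax of g on [lo, hi], assuming g has a single peak there."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if g(c) > g(d):
            b = d  # peak lies in [a, d]
        else:
            a = c  # peak lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

p_hat = golden_section_max(log_likelihood, 1e-6, 1 - 1e-6)
print(round(p_hat, 6))  # 0.8
```

The search bracket stays strictly inside $(0, 1)$ because $\log p$ and $\log(1 - p)$ are undefined at the endpoints.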

Page 32

Maximum Likelihood with Bernoulli

Consider a sample of $n$ i.i.d. RVs $X_1, X_2, \ldots, X_n$. What is $\hat{\theta}_{MLE} = \hat{p}_{MLE}$?

1. Determine the formula for $LL(\theta) = \sum_{i=1}^n \log f(X_i \mid p)$.
• Let $X_i \sim \text{Ber}(p)$.
• $f(X_i \mid p) = \begin{cases} p & \text{if } X_i = 1 \\ 1 - p & \text{if } X_i = 0 \end{cases}$
• Equivalently, $f(X_i \mid p) = p^{X_i}(1 - p)^{1 - X_i}$, where $X_i \in \{0, 1\}$. This form:
  • Is differentiable with respect to $p$
  • Is a valid PMF over the discrete domain ✅
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0.
3. Solve the resulting equations.

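A quick check that the compact form agrees with the case-by-case PMF; the value $p = 0.3$ is arbitrary.

```python
def pmf_cases(x: int, p: float) -> float:
    # Piecewise definition: p if x == 1, 1 - p if x == 0
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    raise ValueError("Bernoulli outcomes must be 0 or 1")

def pmf_compact(x: int, p: float) -> float:
    # Single expression p^x (1 - p)^(1 - x), convenient for differentiation
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
for x in (0, 1):
    assert abs(pmf_cases(x, p) - pmf_compact(x, p)) < 1e-15
# Probabilities over {0, 1} sum to 1, as a valid PMF must
assert abs(pmf_compact(0, p) + pmf_compact(1, p) - 1) < 1e-15
```

The compact form is what makes the log-likelihood a single differentiable expression: $\log f(X_i \mid p) = X_i \log p + (1 - X_i)\log(1 - p)$.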
Page 33

Maximum Likelihood with Bernoulli

Consider a sample of $n$ i.i.d. RVs $X_1, X_2, \ldots, X_n$. What is $\hat{\theta}_{MLE} = \hat{p}_{MLE}$?

1. Determine the formula for $LL(\theta)$.
• Let $X_i \sim \text{Ber}(p)$, with $f(X_i \mid p) = p^{X_i}(1 - p)^{1 - X_i}$.

$LL(\theta) = \sum_{i=1}^n \log f(X_i \mid p) = \sum_{i=1}^n \log\left(p^{X_i}(1 - p)^{1 - X_i}\right)$
$= \sum_{i=1}^n \left[X_i \log p + (1 - X_i)\log(1 - p)\right]$
$= Y \log p + (n - Y)\log(1 - p)$, where $Y = \sum_{i=1}^n X_i$

2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0.
3. Solve the resulting equations.
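The simplification says the log-likelihood depends on the data only through $Y$, the number of 1s. A numerical check of each line of the derivation, on the page-15 sample at an arbitrary $p = 0.4$:

```python
import math

sample = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
n, Y = len(sample), sum(sample)  # n = 10, Y = 8
p = 0.4

# First line: sum of log PMF values
ll_direct = sum(math.log(p ** x * (1 - p) ** (1 - x)) for x in sample)
# Second line: sum of X_i log p + (1 - X_i) log(1 - p)
ll_expanded = sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in sample)
# Third line: Y log p + (n - Y) log(1 - p)
ll_compact = Y * math.log(p) + (n - Y) * math.log(1 - p)

assert abs(ll_direct - ll_expanded) < 1e-12
assert abs(ll_expanded - ll_compact) < 1e-12
print(ll_compact)
```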


Page 36

Maximum Likelihood with Bernoulli

Consider a sample of $n$ i.i.d. RVs $X_1, X_2, \ldots, X_n$. What is $\hat{\theta}_{MLE} = \hat{p}_{MLE}$?

1. Determine the formula for $LL(\theta)$.
• Let $X_i \sim \text{Ber}(p)$, with $f(X_i \mid p) = p^{X_i}(1 - p)^{1 - X_i}$.

$LL(\theta) = \sum_{i=1}^n \left[X_i \log p + (1 - X_i)\log(1 - p)\right] = Y \log p + (n - Y)\log(1 - p)$, where $Y = \sum_{i=1}^n X_i$

2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0.

$\frac{\partial LL(\theta)}{\partial p} = Y \cdot \frac{1}{p} + (n - Y) \cdot \frac{-1}{1 - p} = 0$

3. Solve the resulting equations. Multiplying through by $p(1 - p)$ gives $Y(1 - p) - (n - Y)p = 0$, i.e. $Y = np$, so

$\hat{p}_{MLE} = \frac{1}{n} Y = \frac{1}{n}\sum_{i=1}^n X_i$

The MLE of the Bernoulli parameter, $\hat{p}_{MLE}$, is the unbiased estimate of the mean, $\bar{X}$ (the sample mean).
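As a sanity check on step 3, the sketch below plugs $\hat{p}_{MLE} = Y/n$ back into the derivative from step 2 (for the page-15 sample) and confirms it is zero, with nearby values of $p$ (an arbitrary offset of 0.01) giving a lower $LL$.

```python
import math

sample = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
n, Y = len(sample), sum(sample)

def ll(p: float) -> float:
    # LL(p) = Y log p + (n - Y) log(1 - p)
    return Y * math.log(p) + (n - Y) * math.log(1 - p)

def dll(p: float) -> float:
    # dLL/dp = Y / p - (n - Y) / (1 - p)
    return Y / p - (n - Y) / (1 - p)

p_mle = Y / n  # the sample mean
print(p_mle)   # 0.8

assert abs(dll(p_mle)) < 1e-9
assert ll(p_mle - 0.01) < ll(p_mle) and ll(p_mle + 0.01) < ll(p_mle)
```

The last assertion is the "make sure it is a maximum" check from the general recipe.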

Page 37

MLE of Bernoulli is the sample mean

Bernoulli: $f(X_i \mid p) = p^{X_i}(1 - p)^{1 - X_i}$, where $X_i \in \{0, 1\}$

$LL(\theta) = \sum_{i=1}^n \log f(X_i \mid p)$

$\hat{p}_{MLE} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$


Page 38

Quick check

• You draw $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$ from the distribution $F$, yielding the following sample ($n = 10$): 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
• Suppose distribution $F = \text{Ber}(p)$ with unknown parameter $p$.

1. What is $\hat{p}_{MLE}$, the MLE of the parameter $p$?

A. 1.0
B. 0.5
C. 0.8
D. 0.2
E. None/other

$\hat{p}_{MLE} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$


Page 40

Quick check

• You draw $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$ from the distribution $F$, yielding the following sample ($n = 10$): 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
• Suppose distribution $F = \text{Ber}(p)$ with unknown parameter $p$.

1. What is $\hat{p}_{MLE}$, the MLE of the parameter $p$?
C. 0.8

2. What is the likelihood $L(\theta)$ of this particular sample?

$L(\theta) = \prod_{i=1}^n f(X_i \mid p)$, where $\theta = p$ and $f(X_i \mid p) = p^{X_i}(1 - p)^{1 - X_i}$, $X_i \in \{0, 1\}$
$= p^8(1 - p)^2$
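Evaluating that likelihood at the MLE gives a concrete number that is larger than the value at $\theta = 0.4$ computed on page 15. A tiny sketch:

```python
def L(p: float) -> float:
    # Likelihood of the sample 0, 0, 1, 1, 1, 1, 1, 1, 1, 1: eight 1s and two 0s
    return p ** 8 * (1 - p) ** 2

print(L(0.8))  # about 0.00671
print(L(0.4))  # about 0.000236, the value computed on page 15
assert L(0.8) > L(0.4)
```

Even at its maximum the likelihood is small; what matters for MLE is how it compares across values of $p$, not its absolute size.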

Page 41

(live) 20: Maximum Likelihood Estimation, Lisa Yan, May 20, 2020


Page 42

Computing the MLE: the same general four-step approach as on page 30 (determine $LL(\theta)$, differentiate w.r.t. each parameter, solve, check for a maximum).



Page 44

Maximum Likelihood with Poisson

Consider a sample of $n$ i.i.d. RVs $X_1, X_2, \ldots, X_n$. What is $\hat{\theta}_{MLE} = \hat{\lambda}_{MLE}$?

1. Determine the formula for $LL(\theta)$.
• Let $X_i \sim \text{Poi}(\lambda)$.
• PMF: $f(X_i \mid \lambda) = \frac{e^{-\lambda}\lambda^{X_i}}{X_i!}$

$LL(\theta) = \sum_{i=1}^n \log\frac{e^{-\lambda}\lambda^{X_i}}{X_i!} = \sum_{i=1}^n \left[-\lambda \log e + X_i \log\lambda - \log X_i!\right]$
$= -n\lambda + \log\lambda \sum_{i=1}^n X_i - \sum_{i=1}^n \log X_i!$ (using natural log, $\ln e = 1$)

2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0.

$\frac{\partial LL(\theta)}{\partial \lambda} = \;?$

A. $-n + \frac{1}{\lambda}\sum_{i=1}^n X_i + n\log\lambda - \sum_{i=1}^n \ldots$
B. $-n + \frac{1}{\lambda}\sum_{i=1}^n X_i$
C. None/other/don't know 🤔



Page 47

Maximum Likelihood with Poisson

Consider a sample of $n$ i.i.d. RVs $X_1, X_2, \ldots, X_n$. What is $\hat{\theta}_{MLE} = \hat{\lambda}_{MLE}$?

1. Determine the formula for $LL(\theta)$.
• Let $X_i \sim \text{Poi}(\lambda)$.
• PMF: $f(X_i \mid \lambda) = \frac{e^{-\lambda}\lambda^{X_i}}{X_i!}$

$LL(\theta) = \sum_{i=1}^n \log\frac{e^{-\lambda}\lambda^{X_i}}{X_i!} = \sum_{i=1}^n \left[-\lambda \log e + X_i \log\lambda - \log X_i!\right]$
$= -n\lambda + \log\lambda \sum_{i=1}^n X_i - \sum_{i=1}^n \log X_i!$ (using natural log, $\ln e = 1$)

2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0.

$\frac{\partial LL(\theta)}{\partial \lambda} = -n + \frac{1}{\lambda}\sum_{i=1}^n X_i = 0$

3. Solve the resulting equations.

$\hat{\lambda}_{MLE} = \frac{1}{n}\sum_{i=1}^n X_i$

The MLE of the Poisson parameter, $\hat{\lambda}_{MLE}$, is the unbiased estimate of the mean, $\bar{X}$ (the sample mean).
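A numerical cross-check, assuming some made-up count data: the sketch confirms the derivative is zero at the sample mean and that a grid search over $\lambda$ (grid chosen arbitrarily) lands on the same value. It uses `math.lgamma(x + 1)` for $\log X_i!$.

```python
import math

counts = [4, 7, 5, 3, 6]  # hypothetical Poisson counts, for illustration only
n = len(counts)

def ll(lam: float) -> float:
    # LL(lambda) = -n*lambda + log(lambda) * sum X_i - sum log(X_i!)
    return -n * lam + math.log(lam) * sum(counts) - sum(math.lgamma(x + 1) for x in counts)

def dll(lam: float) -> float:
    # dLL/dlambda = -n + (1/lambda) * sum X_i
    return -n + sum(counts) / lam

lam_mle = sum(counts) / n  # sample mean = 25 / 5
print(lam_mle)             # 5.0
assert dll(lam_mle) == 0.0

grid = [k / 100 for k in range(1, 1001)]  # 0.01, ..., 10.00
assert max(grid, key=ll) == lam_mle
```

The $\log X_i!$ term does not depend on $\lambda$, so it drops out of the derivative; it is kept in `ll` only so the value matches the full log-likelihood.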


Page 49

Quick check

1. A particular experiment can be modeled as a Poisson RV with parameter $\lambda$, in terms of events/minute. Collect data: observe 53 events over the next 10 minutes. What is $\hat{\lambda}_{MLE}$?
2. Is the Bernoulli MLE an unbiased estimator of the Bernoulli parameter $p$?
3. Is the Poisson MLE an unbiased estimator of the Poisson variance?
4. What does unbiased mean?

$\hat{\lambda}_{MLE} = \frac{1}{n}\sum_{i=1}^n X_i$

Unbiased: $E[\text{estimator}] = \text{true thing}$. If you could repeat your experiment, on average you would get what you are looking for.

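Two small computations related to these questions. First, treating each of the 10 minutes as one observation $X_i$, only the total matters, since $\hat{\lambda}_{MLE} = \frac{1}{n}\sum X_i$. Second, the definition of unbiased can be checked exactly for the Bernoulli MLE with a small $n$ by enumerating every possible sample; the choices $n = 3$ and $p = 3/10$ are arbitrary, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# Question 1: 53 events over n = 10 one-minute intervals; only the sum is needed
total_events, n_minutes = 53, 10
lam_mle = total_events / n_minutes
print(lam_mle)  # 5.3 events/minute

# Unbiasedness check for the Bernoulli MLE p_hat = (1/n) * sum X_i
p, n = Fraction(3, 10), 3
expected_p_hat = Fraction(0)
for xs in product([0, 1], repeat=n):        # every possible sample
    prob = Fraction(1)
    for x in xs:
        prob *= p if x == 1 else 1 - p      # i.i.d. Bernoulli probabilities
    expected_p_hat += prob * Fraction(sum(xs), n)
print(expected_p_hat)  # 3/10, equal to p
```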
Page 50

Interlude for jokes/announcements


Page 51

Problem Set 5: Only do problems on the official Pset handout.

Problem Set 6: Released today! Due Wed. August 12 (no late days or on-time bonus).

Regrade Requests: Pset 1-5 and midterm regrade requests are due by August 11 via Gradescope. Submit Pset 6 regrade requests by email, and only in extreme cases (e.g. we didn't see your answers because of mislabeled pages).

Completely Optional Project: You may be able to replace an early Pset grade that you're unhappy with by completing a CS109-related project. Details here: https://us.edstem.org/courses/667/discussion/98951

Page 52

Interesting probability news

Are these trials independent? Are probabilities consistent across jobs?




Page 53

Maximum Likelihood with Uniform

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$. Let $X_i \sim \text{Uni}(\alpha, \beta)$.

$f(X_i \mid \alpha, \beta) = \begin{cases} \frac{1}{\beta - \alpha} & \text{if } \alpha \le x_i \le \beta \\ 0 & \text{otherwise} \end{cases}$

1. Determine the formula for $L(\theta)$:

$L(\theta) = \begin{cases} \left(\frac{1}{\beta - \alpha}\right)^n & \text{if } \alpha \le x_1, x_2, \ldots, x_n \le \beta \\ 0 & \text{otherwise} \end{cases}$

2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0.

A. Great, let's do it
B. Differentiation is hard
C. Constraint $\alpha \le x_1, x_2, \ldots, x_n \le \beta$ makes differentiation hard 🤔


Page 55

Example sample from a Uniform

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$. Let $X_i \sim \text{Uni}(\alpha, \beta)$, so

$L(\theta) = \begin{cases} \left(\frac{1}{\beta - \alpha}\right)^n & \text{if } \alpha \le x_1, x_2, \ldots, x_n \le \beta \\ 0 & \text{otherwise} \end{cases}$

Suppose $X_i \sim \text{Uni}(0, 1)$. You observe data ($n = 7$):

0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75

Which parameters would give you maximum $L(\theta)$?

A. $\text{Uni}(\alpha = 0, \beta = 1)$: $L(\theta) = 1^7 = 1$
B. $\text{Uni}(\alpha = 0.15, \beta = 0.75)$: $L(\theta) = \left(\frac{1}{0.6}\right)^7 \approx 35.7$
C. $\text{Uni}(\alpha = 0.15, \beta = 0.70)$: $L(\theta) = 0$, since $0.75$ lies outside $[0.15, 0.70]$

⚠ The original parameters may not yield the maximum likelihood.

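The three candidate likelihoods can be reproduced directly from the formula for $L(\theta)$. A minimal sketch; the helper name `uniform_likelihood` is ours, for illustration.

```python
data = [0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75]

def uniform_likelihood(alpha: float, beta: float, xs) -> float:
    """L(alpha, beta) = (1 / (beta - alpha))^n if every x is in [alpha, beta], else 0."""
    if all(alpha <= x <= beta for x in xs):
        return (1 / (beta - alpha)) ** len(xs)
    return 0.0

for alpha, beta in [(0, 1), (0.15, 0.75), (0.15, 0.70)]:
    print(alpha, beta, round(uniform_likelihood(alpha, beta, data), 1))
# 0 1 1.0
# 0.15 0.75 35.7
# 0.15 0.7 0.0
```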
Page 56

Maximum Likelihood with Uniform

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$. Let $X_i \sim \text{Uni}(\alpha, \beta)$.

$L(\theta) = \begin{cases} \left(\frac{1}{\beta - \alpha}\right)^n & \text{if } \alpha \le x_1, x_2, \ldots, x_n \le \beta \\ 0 & \text{otherwise} \end{cases}$

$\hat{\theta}_{MLE}$: $\hat{\alpha}_{MLE} = \min(x_1, x_2, \ldots, x_n)$, $\hat{\beta}_{MLE} = \max(x_1, x_2, \ldots, x_n)$

Intuition:
• Want the interval size $\beta - \alpha$ to be as small as possible, to maximize the likelihood function per data point.
• Need to make sure all observed data is in the interval (if not, then $L(\theta) = 0$).

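A sketch that checks this intuition on the page-55 data: compute $\hat{\alpha}_{MLE}$ and $\hat{\beta}_{MLE}$ as the sample min and max, then confirm that shrinking the interval any further makes $L(\theta) = 0$, while widening it lowers $L(\theta)$. The perturbation of 0.01 is arbitrary, and `uniform_likelihood` is a hypothetical helper written for this illustration.

```python
data = [0.15, 0.20, 0.30, 0.40, 0.65, 0.70, 0.75]

def uniform_likelihood(alpha: float, beta: float, xs) -> float:
    # (1 / (beta - alpha))^n if all points lie in [alpha, beta], else 0
    if all(alpha <= x <= beta for x in xs):
        return (1 / (beta - alpha)) ** len(xs)
    return 0.0

alpha_mle, beta_mle = min(data), max(data)
best = uniform_likelihood(alpha_mle, beta_mle, data)
print(alpha_mle, beta_mle)  # 0.15 0.75

d = 0.01
assert uniform_likelihood(alpha_mle + d, beta_mle, data) == 0.0  # excludes 0.15
assert uniform_likelihood(alpha_mle, beta_mle - d, data) == 0.0  # excludes 0.75
assert uniform_likelihood(alpha_mle - d, beta_mle, data) < best  # wider interval
assert uniform_likelihood(alpha_mle, beta_mle + d, data) < best
```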
Page 57

Small samples = problems with MLE

Maximum Likelihood Estimator $\hat{\theta}_{MLE} = \arg\max_\theta L(\theta)$:
• Best explains the data we have seen
• Does not attempt to generalize to unseen data

In many cases, such as the sample mean $\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^n X_i$ (the MLE for Bernoulli $p$, Poisson $\lambda$, Normal $\mu$):
• Unbiased ($E[\hat{\mu}_{MLE}] = \mu$ regardless of the sample size $n$)

For some cases, like Uniform ($\hat{\alpha}_{MLE} \ge \alpha$, $\hat{\beta}_{MLE} \le \beta$):
• Biased. Problematic for small sample sizes.
• Example: if $n = 1$, then $\hat{\alpha}_{MLE} = \hat{\beta}_{MLE}$, yielding an invalid distribution.

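The bias of $\hat{\beta}_{MLE} = \max(X_1, \ldots, X_n)$ shows up in a quick simulation. This sketch assumes true parameters $\alpha = 0$, $\beta = 1$, sample size $n = 5$, 20,000 trials, and a fixed seed, all arbitrary choices. Every sample maximum is at most $\beta$, so its average falls short of $\beta$ (for this setup, near $n/(n+1)$).

```python
import random

rng = random.Random(109)      # fixed seed so the run is reproducible
alpha, beta, n = 0.0, 1.0, 5  # assumed true Uni(alpha, beta) and sample size
trials = 20_000

beta_hats = []
for _ in range(trials):
    sample = [rng.uniform(alpha, beta) for _ in range(n)]
    beta_hats.append(max(sample))  # beta_MLE = sample max

avg_beta_hat = sum(beta_hats) / trials
print(round(avg_beta_hat, 3))  # close to n / (n + 1) = 0.833, below beta = 1
```

Increasing `n` pushes the average toward $\beta$, which matches the note that the bias is a small-sample problem.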
Page 58

Properties of MLE

Maximum Likelihood Estimator $\hat{\theta}_{MLE} = \arg\max_\theta L(\theta)$:
• Best explains the data we have seen
• Does not attempt to generalize to unseen data
• Often used when sample size $n$ is large relative to the parameter space
• Potentially biased (though asymptotically less so, as $n \to \infty$)
• Consistent: as $n \to \infty$ (i.e., more data), the probability that $\hat{\theta}$ differs from $\theta$ by more than any fixed amount goes to zero:

$\lim_{n \to \infty} P(|\hat{\theta} - \theta| < \varepsilon) = 1$ for every $\varepsilon > 0$

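Consistency can be seen exactly, without simulation, for the Bernoulli MLE: $\hat{p}_{MLE} = Y/n$ with $Y \sim \text{Bin}(n, p)$, so $P(|\hat{p} - p| < \varepsilon)$ is a finite binomial sum. A sketch with arbitrary choices $p = 0.3$ and $\varepsilon = 0.05$; the binomial PMF is evaluated in log space via `math.lgamma` to avoid overflow for large $n$.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    # C(n, k) p^k (1 - p)^(n - k), computed via logs for numerical safety
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log(1 - p))

def prob_close(n: int, p: float, eps: float) -> float:
    """P(|Y/n - p| < eps) for Y ~ Bin(n, p): how often the Bernoulli MLE lands near p."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1) if abs(k / n - p) < eps)

p, eps = 0.3, 0.05
sizes = (10, 100, 1000)
probs = [prob_close(n, p, eps) for n in sizes]
for n, q in zip(sizes, probs):
    print(n, round(q, 4))
```

The printed probabilities increase toward 1 as $n$ grows, which is what consistency predicts.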
Page 59

Maximum Likelihood with Normal

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Let $X_i \sim \mathcal{N}(\mu, \sigma^2)$, with $f(X_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(X_i - \mu)^2/(2\sigma^2)}$.

What is $\hat{\theta}_{MLE} = (\hat{\mu}_{MLE}, \hat{\sigma}^2_{MLE})$?

1. Determine the formula for $LL(\theta)$.

$LL(\theta) = \sum_{i=1}^n \log\left(\frac{1}{\sqrt{2\pi}\,\sigma} e^{-(X_i - \mu)^2/(2\sigma^2)}\right) = \sum_{i=1}^n \left[-\log(\sqrt{2\pi}\,\sigma) - (X_i - \mu)^2/(2\sigma^2)\right]$ (using natural log)
$= -\sum_{i=1}^n \log(\sqrt{2\pi}\,\sigma) - \sum_{i=1}^n (X_i - \mu)^2/(2\sigma^2)$

2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0.
3. Solve the resulting equations.
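A numerical check that the expanded form matches summing $\log f$ directly. The data and the values $\mu = 1$, $\sigma = 2$ are arbitrary assumptions for illustration.

```python
import math

data = [0.5, 1.7, -0.2, 2.9, 1.1]  # hypothetical observations
mu, sigma = 1.0, 2.0               # arbitrary parameter values
n = len(data)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Direct: sum of log f(X_i | mu, sigma^2)
ll_direct = sum(math.log(normal_pdf(x, mu, sigma)) for x in data)

# Expanded: -sum log(sqrt(2 pi) sigma) - sum (X_i - mu)^2 / (2 sigma^2)
ll_expanded = (-n * math.log(math.sqrt(2 * math.pi) * sigma)
               - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

assert abs(ll_direct - ll_expanded) < 1e-12
print(ll_expanded)
```

The expanded form avoids calling `exp` and then `log`, which is both cheaper and less prone to underflow for points far from $\mu$.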

Page 60

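Maximum Likelihood with Normal

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Let $X_i \sim \mathcal{N}(\mu, \sigma^2)$, with $f(X_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(X_i - \mu)^2 / (2\sigma^2)}$.

What is $\theta_{MLE} = (\mu_{MLE}, \sigma^2_{MLE})$?

1. Determine formula for $LL(\theta)$:
$LL(\theta) = -\sum_{i=1}^{n} \log(\sqrt{2\pi}\,\sigma) - \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{2\sigma^2}$
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0
With respect to $\mu$:
$\frac{\partial LL(\theta)}{\partial \mu} = \sum_{i=1}^{n} \frac{2(X_i - \mu)}{2\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu) = 0$
3. Solve resulting equations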

Page 61

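Maximum Likelihood with Normal

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Let $X_i \sim \mathcal{N}(\mu, \sigma^2)$, with $f(X_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(X_i - \mu)^2 / (2\sigma^2)}$.

What is $\theta_{MLE} = (\mu_{MLE}, \sigma^2_{MLE})$?

1. Determine formula for $LL(\theta)$:
$LL(\theta) = -\sum_{i=1}^{n} \log(\sqrt{2\pi}\,\sigma) - \sum_{i=1}^{n} \frac{(X_i - \mu)^2}{2\sigma^2}$
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0
With respect to $\mu$:
$\frac{\partial LL(\theta)}{\partial \mu} = \sum_{i=1}^{n} \frac{2(X_i - \mu)}{2\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu) = 0$
With respect to $\sigma$:
$\frac{\partial LL(\theta)}{\partial \sigma} = -\sum_{i=1}^{n} \frac{1}{\sigma} + \sum_{i=1}^{n} \frac{2(X_i - \mu)^2}{2\sigma^3} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (X_i - \mu)^2 = 0$
3. Solve resulting equations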
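The same finite-difference check works for the $\sigma$ derivative, $-\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_i (X_i - \mu)^2$. A minimal sketch under the same assumptions (hypothetical sample, trial values, illustrative function names), this time holding $\mu$ fixed and perturbing $\sigma$.

```python
import math


def normal_ll(xs, mu, sigma):
    """Normal log-likelihood, natural log."""
    n = len(xs)
    return (-n * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))


def dll_dsigma(xs, mu, sigma):
    """Analytic partial derivative: -n / sigma + (1 / sigma^3) * sum (x_i - mu)^2."""
    return -len(xs) / sigma + sum((x - mu) ** 2 for x in xs) / sigma ** 3


xs = [2.1, 1.7, 3.4, 2.8, 2.0]  # hypothetical sample
mu, sigma, h = 1.5, 0.8, 1e-5

# Central finite difference in sigma, holding mu fixed.
numeric = (normal_ll(xs, mu, sigma + h) - normal_ll(xs, mu, sigma - h)) / (2 * h)
print(dll_dsigma(xs, mu, sigma), numeric)
```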

Page 62

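Maximum Likelihood with Normal

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Let $X_i \sim \mathcal{N}(\mu, \sigma^2)$, with $f(X_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(X_i - \mu)^2 / (2\sigma^2)}$.

What is $\theta_{MLE} = (\mu_{MLE}, \sigma^2_{MLE})$?

3. Solve resulting equations
Two equations, two unknowns:
$\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu) = 0$
$-\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (X_i - \mu)^2 = 0$

First, solve for $\mu_{MLE}$:
$\frac{1}{\sigma^2} \sum_{i=1}^{n} X_i - \frac{1}{\sigma^2} \sum_{i=1}^{n} \mu = 0 \Rightarrow \sum_{i=1}^{n} X_i = n\mu \Rightarrow \mu_{MLE} = \frac{1}{n} \sum_{i=1}^{n} X_i$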
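Because $\sigma^2$ factors out of the $\mu$ equation, $\mu_{MLE}$ does not depend on $\sigma$. A minimal sketch of that observation, assuming a hypothetical sample: the $\mu$ equation evaluates to (numerically) zero at the sample mean for every $\sigma$ tried. Function names are illustrative.

```python
def mu_mle(xs):
    """mu_MLE = (1/n) * sum(x_i), the sample mean."""
    return sum(xs) / len(xs)


def dll_dmu(xs, mu, sigma):
    """(1 / sigma^2) * sum(x_i - mu), the first equation of the system."""
    return sum(x - mu for x in xs) / sigma ** 2


xs = [2.1, 1.7, 3.4, 2.8, 2.0]  # hypothetical sample
mu_hat = mu_mle(xs)
print(mu_hat)

# The mu equation is zero at mu_MLE no matter which sigma is plugged in,
# which is why mu can be solved first.
for sigma in (0.5, 1.0, 2.0):
    print(sigma, dll_dmu(xs, mu_hat, sigma))
```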

Page 63

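Maximum Likelihood with Normal

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Let $X_i \sim \mathcal{N}(\mu, \sigma^2)$, with $f(X_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(X_i - \mu)^2 / (2\sigma^2)}$.

What is $\theta_{MLE} = (\mu_{MLE}, \sigma^2_{MLE})$?

3. Solve resulting equations
Two equations, two unknowns:
$\frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu) = 0$
$-\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{n} (X_i - \mu)^2 = 0$

First, solve for $\mu_{MLE}$:
$\frac{1}{\sigma^2} \sum_{i=1}^{n} X_i - \frac{1}{\sigma^2} \sum_{i=1}^{n} \mu = 0 \Rightarrow \sum_{i=1}^{n} X_i = n\mu \Rightarrow \mu_{MLE} = \frac{1}{n} \sum_{i=1}^{n} X_i$

Next, solve for $\sigma_{MLE}$:
$\frac{1}{\sigma^3} \sum_{i=1}^{n} (X_i - \mu)^2 = \frac{n}{\sigma} \Rightarrow \sum_{i=1}^{n} (X_i - \mu)^2 = \sigma^2 n \Rightarrow \sigma^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_{MLE})^2$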
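The closed-form solution divides by $n$, not $n - 1$. A minimal sketch, assuming a hypothetical sample: it computes $(\mu_{MLE}, \sigma^2_{MLE})$, compares $\sigma^2_{MLE}$ with Python's `statistics.pvariance` (which also divides by $n$) and `statistics.variance` (which divides by $n - 1$, the usual unbiased sample variance), and checks that the $\sigma$ equation is zero at the solution. Function names are illustrative.

```python
from statistics import pvariance, variance


def normal_mle(xs):
    """Return (mu_MLE, sigma2_MLE) for an i.i.d. Normal sample."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divide by n, not n - 1
    return mu_hat, sigma2_hat


def dll_dsigma(xs, mu, sigma):
    """-n / sigma + (1 / sigma^3) * sum (x_i - mu)^2, the second equation."""
    return -len(xs) / sigma + sum((x - mu) ** 2 for x in xs) / sigma ** 3


xs = [2.1, 1.7, 3.4, 2.8, 2.0]  # hypothetical sample
mu_hat, sigma2_hat = normal_mle(xs)
print(mu_hat, sigma2_hat)
print(pvariance(xs))  # population variance: also divides by n
print(variance(xs))   # sample variance: divides by n - 1, so it is larger
print(dll_dsigma(xs, mu_hat, sigma2_hat ** 0.5))  # ~0 at the MLE
```

The gap between the two variance functions is one concrete instance of the small-sample bias discussed on the "Properties of MLE" slide.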

Page 64

Estimating a Bernoulli parameter

Consider $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Suppose distribution $F = \text{Ber}(\theta)$ with unknown parameter $\theta$.
• Say you have three coins: $\theta_1 = 0.5$, $\theta_2 = 0.8$, or $\theta_3 = 1$.

Which coin is most likely to give you the following sample ($n = 10$)?
0, 0, 1, 1, 1, 1, 1, 1, 1, 1



Page 65

Estimating a Bernoulli parameter

Consider $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Suppose distribution $F = \text{Ber}(\theta)$ with unknown parameter $\theta$.
• Say you have three coins: $\theta_1 = 0.5$, $\theta_2 = 0.8$, or $\theta_3 = 1$.

Which estimate is most likely to give you the following sample ($n = 10$)?
0, 0, 1, 1, 1, 1, 1, 1, 1, 1

$P(\text{sample} \mid \theta = 0.5) = 0.5^8 \cdot 0.5^2 \approx 0.00098$
$P(\text{sample} \mid \theta = 0.8) = 0.8^8 \cdot 0.2^2 \approx 0.00671$ (most likely, so choose this coin)
$P(\text{sample} \mid \theta = 1.0) = 1.0^8 \cdot 0^2 = 0$

How do we write this process mathematically?

Page 66

Estimating a Bernoulli parameter

Consider $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Suppose distribution $F = \text{Ber}(\theta)$ with unknown parameter $\theta$.
• Say you have three coins: $\theta_1 = 0.5$, $\theta_2 = 0.8$, or $\theta_3 = 1$.

Which estimate is most likely to give you the following sample ($n = 10$)?
0, 0, 1, 1, 1, 1, 1, 1, 1, 1

$P(\text{sample} \mid \theta = 0.5) = 0.5^8 \cdot 0.5^2 \approx 0.00098$
$P(\text{sample} \mid \theta = 0.8) = 0.8^8 \cdot 0.2^2 \approx 0.00671$ (most likely, so choose this coin)
$P(\text{sample} \mid \theta = 1.0) = 1.0^8 \cdot 0^2 = 0$

$\hat{\theta} = \arg\max_{\theta \in \{0.5,\, 0.8,\, 1\}} \theta^8 (1 - \theta)^2 = 0.8$

Page 67

Maximum Likelihood with Bernoulli

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Let $X_i \sim \text{Ber}(p)$.

What is $\theta_{MLE} = p_{MLE}$?

What is the PMF $f(X_i \mid p)$?
A. $p$
B. $1 - p$
C. $p$ if $X_i = 1$; $1 - p$ if $X_i = 0$
D. $p^{X_i} (1 - p)^{1 - X_i}$, where $X_i \in \{0, 1\}$

1. Determine formula for $LL(\theta)$:
$LL(\theta) = \sum_{i=1}^{n} \log f(X_i \mid p)$
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0
3. Solve resulting equations


Page 69

Maximum Likelihood with Bernoulli

Consider a sample of $n$ i.i.d. random variables $X_1, X_2, \ldots, X_n$.
• Let $X_i \sim \text{Ber}(p)$.

What is $\theta_{MLE} = p_{MLE}$?

What is the PMF $f(X_i \mid p)$?
A. $p$
B. $1 - p$
C. $p$ if $X_i = 1$; $1 - p$ if $X_i = 0$
D. $p^{X_i} (1 - p)^{1 - X_i}$, where $X_i \in \{0, 1\}$
   • Is differentiable
   • Valid PMF over discrete domain

1. Determine formula for $LL(\theta)$:
$LL(\theta) = \sum_{i=1}^{n} \log f(X_i \mid p)$
2. Differentiate $LL(\theta)$ w.r.t. (each) $\theta$, set to 0
3. Solve resulting equations