Maximum likelihood estimates
![Page 1: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/1.jpg)
Maximum likelihood estimates
What are they and why do we care?
Relationship to AIC and other model selection criteria
![Page 2: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/2.jpg)
Maximum Likelihood Estimates (MLE)
Given a model, the MLE is (are) the parameter value(s) under which the observed data are most probable; they serve as the estimate(s) of the parameter(s) of interest.
That is, they maximize the likelihood of the model given the data.
For independent observations, the likelihood of a model is the product of the probabilities of the observations.
![Page 3: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/3.jpg)
Maximum Likelihood Estimation
For linear models (e.g., ANOVA and regression) these are usually determined using the linear equations that minimize the sum of the squared residuals – closed form.
For nonlinear models and some distributions we determine MLEs by setting the first derivative of the likelihood equal to zero and then making sure it is a maximum by checking that the second derivative is negative – closed form.
Or we can search for values that maximize the probabilities of all of the observations – numerical estimation.
Search stops when certain criteria are met:
Precision of the estimate
Change in the likelihood
Solution seems unlikely (stops after n iterations)
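A minimal sketch of such a numerical search, assuming a binomial model with 7 successes in 10 trials. The slides do not name a search method; Newton-Raphson is used here only as one possible choice, and the function names, starting value, and tolerances are illustrative assumptions.

```python
import math


def log_lik(p, y, n):
    """Binomial log-likelihood without the constant binomial coefficient."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)


def search_mle(y, n, start=0.3, tol_p=1e-10, tol_ll=1e-12, max_iter=50):
    """Newton-Raphson search for the p that maximizes log_lik (assumes 0 < y < n).

    Returns (estimate, iterations used, reason the search stopped).
    """
    p = start
    for iteration in range(1, max_iter + 1):
        score = y / p - (n - y) / (1.0 - p)              # first derivative
        curvature = -y / p**2 - (n - y) / (1.0 - p)**2   # second derivative (< 0)
        step = -score / curvature
        # Keep the candidate strictly inside (0, 1).
        new_p = min(max(p + step, 1e-9), 1.0 - 1e-9)
        change = abs(log_lik(new_p, y, n) - log_lik(p, y, n))
        moved = abs(new_p - p)
        p = new_p
        if moved < tol_p:
            return p, iteration, "precision of the estimate"
        if change < tol_ll:
            return p, iteration, "change in the likelihood"
    return p, max_iter, "iteration limit reached"


estimate, iterations, reason = search_mle(y=7, n=10)
print(f"p-hat = {estimate:.6f} after {iterations} iterations ({reason})")
```

The three return paths correspond to the three stopping criteria listed above; the estimate should agree with the closed-form value y/n.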
![Page 4: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/4.jpg)
Binomial probability
Some theory and math
An example
Assumptions
Adding a link function
Additional assumptions about βs
![Page 5: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/5.jpg)
Binomial Sampling
Characterized by two mutually exclusive events:
Heads or tails
On or off
Dead or alive
Used or not used, or
Occupied or not occupied.
Often referred to as Bernoulli trials
![Page 6: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/6.jpg)
Models
Trials have an associated parameter p
p = probability of success.
1 - p = probability of failure (= q)
p + q = 1
p also represents a model
Single parameter
p is equal for every trial
![Page 7: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/7.jpg)
Binomial Sampling
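p is a continuous variable between 0 and 1 (0 < p < 1)
y is the number of successful outcomes
n is the number of trials.
$$\hat{p} = \frac{y}{n}$$
This estimator is unbiased.
$$E(y) = np$$
$$\operatorname{var}(y) = npq = np(1-p)$$
$$\operatorname{var}(\hat{p}) = \frac{pq}{n}$$
$$\widehat{\operatorname{var}}(\hat{p}) = \frac{\hat{p}\hat{q}}{n}$$
$$\widehat{\operatorname{se}}(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$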
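As a small illustration, the estimator and its estimated variance and standard error can be computed directly from y and n. The function name and the example values (7 successes in 10 trials) are assumptions chosen for the sketch.

```python
import math


def binomial_estimates(y, n):
    """Return (p_hat, estimated variance of p_hat, estimated standard error)."""
    p_hat = y / n
    q_hat = 1.0 - p_hat
    var_hat = p_hat * q_hat / n
    se_hat = math.sqrt(var_hat)
    return p_hat, var_hat, se_hat


p_hat, var_hat, se_hat = binomial_estimates(y=7, n=10)
print(f"p-hat = {p_hat:.2f}, var = {var_hat:.4f}, se = {se_hat:.4f}")
```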
![Page 8: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/8.jpg)
Binomial Probability Function
The probability of observing y successes given n trials with the underlying probability p is
$$f(y \mid n, p) = \binom{n}{y} p^{y} (1-p)^{n-y}$$
Example: 10 flips of a fair coin (p = 0.5), 7 of which turn up heads, is written
$$f(7 \mid 10, 0.5) = \binom{10}{7}\, 0.5^{7}\, (1-0.5)^{10-7}$$
![Page 9: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/9.jpg)
Binomial Probability Function (2)
$$f(7 \mid 10, 0.5) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$$
evaluated numerically:
$$f(7 \mid 10, 0.5) = 120 \times 0.5^{7} \times 0.5^{3} = 0.1172$$
In Excel:
=BINOMDIST(y, n, p, FALSE)
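The same calculation can be sketched in Python using only the standard library. The function name `binomial_pmf` is an illustrative choice; `math.comb` supplies the binomial coefficient.

```python
import math


def binomial_pmf(y, n, p):
    """Probability of y successes in n trials with success probability p."""
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)


prob_7_heads = binomial_pmf(7, 10, 0.5)
print(round(prob_7_heads, 4))

# Probabilities for every possible y with n = 10 and p = 0.5.
all_probs = [binomial_pmf(y, 10, 0.5) for y in range(11)]
```

Because the 11 outcomes are exhaustive and mutually exclusive, the values in `all_probs` sum to 1.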
![Page 10: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/10.jpg)
Binomial Probability Function (3)
$$f(y \mid 10, 0.5) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$$
[Figure: bar chart of Probability (vertical axis, 0 to 1) against y (horizontal axis, 0 to 10) for n = 10 and p = 0.5]
y BINPROB
0 0.0010
1 0.0098
2 0.0439
3 0.1172
4 0.2051
5 0.2461
6 0.2051
7 0.1172
8 0.0439
9 0.0098
10 0.0010
![Page 11: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/11.jpg)
Likelihood Function of Binomial Probability
Reality: we have data (n and y) but don't know the model (p). This leads us to the likelihood function:
$$L(p \mid n, y) = \binom{n}{y} p^{y} (1-p)^{n-y}$$
Read: the likelihood of p given n and y is ...
It is not a probability function.
It is a positive function (0 < p < 1).
![Page 12: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/12.jpg)
Likelihood Function of Binomial Probability(2)
Alternatively, the likelihood of the data given the model can be thought of as the product of the probabilities of the individual observations.
The probability of the observations is:
$$P(y_i) = p^{f_i} (1-p)^{1-f_i}$$
f_i = 1 for success,
f_i = 0 for failure
Therefore,
$$L(p \mid n, y) = \prod_{i=1}^{n} p^{f_i} (1-p)^{1-f_i}$$
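A short sketch of the product form, assuming 10 trials with 7 successes. The particular order of successes and failures below is hypothetical; the product does not depend on it.

```python
import math


def bernoulli_likelihood(p, outcomes):
    """Product of the individual trial probabilities (1 = success, 0 = failure)."""
    return math.prod(p**f * (1.0 - p)**(1 - f) for f in outcomes)


# Hypothetical ordering of 7 successes and 3 failures.
outcomes = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
lik = bernoulli_likelihood(0.5, outcomes)
print(lik)

# Multiplying by the binomial coefficient recovers the binomial probability.
binomial_prob = math.comb(10, 7) * lik
```

The product form omits the binomial coefficient, which does not involve p and therefore does not change where the likelihood peaks.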
![Page 13: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/13.jpg)
Binomial Probability Function and its likelihood
$$L(p \mid 10, 7) = p^{7} (1-p)^{10-7}$$
p Likelihood
0.00 0.00000000
0.05 0.00000000
0.10 0.00000007
0.15 0.00000105
0.20 0.00000655
0.25 0.00002575
0.30 0.00007501
0.35 0.00017669
0.40 0.00035389
0.45 0.00062169
0.50 0.00097656
0.55 0.00138732
0.60 0.00179159
0.65 0.00210183
0.70 0.00222357
0.75 0.00208569
0.80 0.00167772
0.85 0.00108195
0.90 0.00047830
0.95 0.00008729
1.00 0.00000000
[Figure: Likelihood (vertical axis, 0 to 2.5E-03) plotted against p, with the maximum marked (p = 0.70 in the table)]
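The likelihood values for 7 successes in 10 trials can be reproduced with a simple grid over p. This is a sketch; the step of 0.05 mirrors the tabulated values and the variable names are illustrative.

```python
def likelihood(p, y=7, n=10):
    """Binomial likelihood without the binomial coefficient."""
    return p**y * (1.0 - p)**(n - y)


grid = [i / 20 for i in range(21)]           # 0.00, 0.05, ..., 1.00
table = [(p, likelihood(p)) for p in grid]

for p, lik in table:
    print(f"{p:.2f} {lik:.8f}")

# The grid point with the largest likelihood approximates the MLE.
best_p, best_lik = max(table, key=lambda row: row[1])
```

A grid only locates the maximum to the nearest grid point; here that point coincides with y/n.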
![Page 14: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/14.jpg)
Log likelihood
Although the likelihood function is useful, the log-likelihood has some desirable properties in that the terms are additive and the binomial coefficient does not include p.
$$\ln L(p \mid n, y) = \ln\binom{n}{y} + y \ln(p) + (n-y) \ln(1-p)$$
![Page 15: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/15.jpg)
Log likelihood
Using the alternative:
$$\ln L(p \mid n, y) = \sum_{i=1}^{n} \ln\left(p^{f_i} (1-p)^{1-f_i}\right)$$
The estimate of p that maximizes the value of ln(L) is the MLE.
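As a worked step, the closed-form MLE for the binomial case follows from differentiating the log-likelihood with respect to p. This is the standard derivation, written here with the binomial coefficient dropped because it does not involve p:

$$\frac{d}{dp}\ln L(p \mid n, y) = \frac{y}{p} - \frac{n-y}{1-p} = 0 \;\Rightarrow\; y(1-p) = (n-y)\,p \;\Rightarrow\; \hat{p} = \frac{y}{n}$$

$$\frac{d^{2}}{dp^{2}}\ln L(p \mid n, y) = -\frac{y}{p^{2}} - \frac{n-y}{(1-p)^{2}} < 0 \quad \text{for } 0 < p < 1$$

The second derivative is negative everywhere on (0, 1), so the stationary point is a maximum.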
![Page 16: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/16.jpg)
Precision
[Figure: likelihood curves L(p | 10, 7) and L(p | 100, 70)]
As n increases, precision increases and variance decreases.
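One way to see the effect of sample size is to compare how quickly the likelihood falls away from its maximum. The sketch below computes the likelihood at p = 0.5 relative to its value at the MLE (0.7) for both data sets; the choice of p = 0.5 as the comparison point is arbitrary.

```python
import math


def log_lik(p, y, n):
    """Binomial log-likelihood without the constant binomial coefficient."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)


def relative_likelihood(p, y, n):
    """Likelihood at p divided by the likelihood at the MLE y/n."""
    p_hat = y / n
    return math.exp(log_lik(p, y, n) - log_lik(p_hat, y, n))


ratio_n10 = relative_likelihood(0.5, 7, 10)
ratio_n100 = relative_likelihood(0.5, 70, 100)
print(f"n = 10:  {ratio_n10:.4f}")
print(f"n = 100: {ratio_n100:.6f}")
```

With ten times the data and the same proportion of successes, the relative likelihood is the n = 10 value raised to the tenth power, so the curve is far more sharply peaked around the MLE.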
![Page 17: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/17.jpg)
Properties of MLEs
Asymptotically normally distributed
Asymptotically minimize variance
Asymptotically unbiased as n → ∞
One-to-one transformations of MLEs are also MLEs.
For example, mean lifespan:
$$\hat{L} = -\frac{1}{\ln(\hat{S})}$$
is also an MLE.
![Page 18: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/18.jpg)
Assumptions:
n trials must be identical – i.e., the population is well defined (e.g., 20 coin flips, 50 Kirtland's warbler nests, 75 radio-marked black bears in the Pisgah Bear Sanctuary).
Each trial results in one of two mutually exclusive outcomes. (e.g., heads or tails, survived or died, successful or failed, etc.)
The probability of success on each trial remains constant. (homogeneous)
Trials are independent events (the outcome of one does not depend on the outcome of another).
y, the number of successes after n trials, is the random variable.
![Page 19: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/19.jpg)
Example – use/non-use survey
Selected 50 sites (n) at random (or systematically) within a study area.
Visited each site once and ‘surveyed’ for species x
Species was detected at 10 sites (y)
Meets binomial assumptions:
Sites selected without bias
Surveys conducted using same methods
Sites could only be used or not used (occupied)
No knowledge of habitat differences or species preferences
Sites are independent
Additional assumption – perfect detection
![Page 20: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/20.jpg)
Example – calculating the likelihood
$$L(p \mid 50, 10) = p^{10} (1-p)^{50-10}$$
p Likelihood Variance SE
0.00 0.00E+00
0.05 1.26E-14 0.00095 0.030822
0.10 1.48E-12 0.0018 0.042426
0.15 8.66E-12 0.00255 0.050498
0.20 1.36E-11 0.0032 0.056569 (maximum)
0.25 9.59E-12 0.00375 0.061237
0.30 3.76E-12 0.0042 0.064807
0.35 9.06E-13 0.00455 0.067454
0.40 1.40E-13 0.0048 0.069282
0.45 1.40E-14 0.00495 0.070356
0.50 8.88E-16 0.005 0.070711
0.55 3.41E-17 0.00495 0.070356
0.60 7.31E-19 0.0048 0.069282
0.65 7.80E-21 0.00455 0.067454
0.70 3.43E-23 0.0042 0.064807
0.75 4.66E-26 0.00375 0.061237
0.80 1.18E-29 0.0032 0.056569
0.85 2.18E-34 0.00255 0.050498
0.90 3.49E-41 0.0018 0.042426
0.95 5.45E-53 0.00095 0.030822
1.00 0.00E+00 0 0
![Page 21: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/21.jpg)
Example – results
MLE = 20% ± 6% of the area is occupied (p̂ = 0.20, SE ≈ 0.057)
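The survey example (10 detections at 50 sites) can be reproduced with a few lines. This is a sketch using the variance formula p(1 - p)/n; the variable names are illustrative.

```python
import math

n_sites, n_detected = 50, 10


def likelihood(p):
    """Binomial likelihood (without the coefficient) for the survey data."""
    return p**n_detected * (1.0 - p)**(n_sites - n_detected)


def variance(p):
    """Variance of the estimated proportion when the true value is p."""
    return p * (1.0 - p) / n_sites


# One row per candidate p: (p, likelihood, variance, standard error).
rows = [(i / 20, likelihood(i / 20), variance(i / 20), math.sqrt(variance(i / 20)))
        for i in range(21)]

mle_p, mle_lik, mle_var, mle_se = max(rows, key=lambda row: row[1])
print(f"MLE = {mle_p:.0%} +/- {mle_se:.0%} (likelihood {mle_lik:.2e})")
```

The reported result is the MLE plus or minus one standard error, rounded to whole percentages.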
![Page 22: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/22.jpg)
Link functions - adding covariates
“Link” the covariates, the data (X), with the response variable (i.e., use or occupancy)
Usually done with logit link:
$$p_i = \frac{e^{\beta_0 + \beta_1 x_{i1}}}{1 + e^{\beta_0 + \beta_1 x_{i1}}}$$
Nice properties:
Constrains result 0 < p_i < 1
![Page 23: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/23.jpg)
Link functions - adding covariates
“Link” the covariates, the data (X), with the response variable (i.e., use or occupancy)
Usually done with logit link:
$$p_i = \frac{e^{\beta_0 + \beta_1 x_{i1}}}{1 + e^{\beta_0 + \beta_1 x_{i1}}} = \frac{e^{X\beta}}{1 + e^{X\beta}}$$
Nice properties:
Constrains result 0 < p_i < 1
βs can be -∞ < β < +∞
Additional assumption – βs are normally distributed
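A minimal sketch of the logit link for a single covariate. The coefficient values and covariate values below are hypothetical, chosen only to show that any real-valued linear predictor maps to a probability between 0 and 1.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor eta to a probability, avoiding overflow in exp."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients: intercept and one covariate effect.
beta0, beta1 = -1.4, 0.8

# Hypothetical covariate values for five sites.
x_values = [-5.0, -1.0, 0.0, 1.0, 5.0]
site_probs = [inverse_logit(beta0 + beta1 * x) for x in x_values]

for x, p in zip(x_values, site_probs):
    print(f"x = {x:5.1f}  p = {p:.4f}")
```

With a positive coefficient the probability increases with the covariate, and each site can have its own probability.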
![Page 24: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/24.jpg)
Link function
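Binomial likelihood:
$$L(p \mid n, y) = \binom{n}{y} p^{y} (1-p)^{n-y}$$
Substitute the link for p
$$L(p \mid n, y) = \binom{n}{y} \left(\frac{\exp(X\beta)}{1+\exp(X\beta)}\right)^{y} \left(1 - \frac{\exp(X\beta)}{1+\exp(X\beta)}\right)^{n-y}$$
Voila! – logistic regression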
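To illustrate the substitution, the sketch below fits the simplest possible logistic regression, an intercept-only model, to 10 detections at 50 sites. The Newton-Raphson update and all names are assumptions for illustration; with no covariates the fitted probability should simply equal y/n.

```python
import math


def inverse_logit(eta):
    """Logit link: map the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_intercept(y, n, beta0=0.0, tol=1e-10, max_iter=50):
    """Maximize the binomial log-likelihood over the intercept beta0."""
    for _ in range(max_iter):
        p = inverse_logit(beta0)
        score = y - n * p                # d lnL / d beta0
        curvature = -n * p * (1.0 - p)   # d2 lnL / d beta0^2 (always negative)
        step = -score / curvature
        beta0 += step
        if abs(step) < tol:
            break
    return beta0


beta0_hat = fit_intercept(y=10, n=50)
p_hat = inverse_logit(beta0_hat)
print(f"beta0-hat = {beta0_hat:.4f}, back-transformed p-hat = {p_hat:.4f}")
```

The estimate is made on the β scale, and the real parameter of interest is recovered by back-transforming through the link.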
![Page 25: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/25.jpg)
Link function
More than one covariate can be included
Extend the logit (linear equation).
βs are the estimated parameters (effects);
estimated for each period or group
constrained to be equal using the data (xij).
![Page 26: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/26.jpg)
Link function
The use rates or real parameters of interest are calculated from the βs as in this equation:
$$p_i = \frac{e^{\beta_0 + \beta_1 x_{i1}}}{1 + e^{\beta_0 + \beta_1 x_{i1}}} = \frac{e^{X\beta}}{1 + e^{X\beta}}$$
HUGE concept and applicable to EVERY estimator we examine.
Occupancy and detection probabilities are replaced by the link function submodel of the covariate(s).
Conceivably every site has a different probability of use that is related to the value of the covariates.
![Page 27: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/27.jpg)
Multinomial probability
An example
Adding a link function
![Page 28: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/28.jpg)
Multinomial Distribution and Likelihoods
Extension of the binomial coefficient with more than two possible mutually exclusive outcomes.
Nearly always introduced by way of die tossing.
Another example
Multiple presence/absence surveys at multiple sites
![Page 29: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/29.jpg)
Binomial Coefficient
The binomial coefficient was the number of ways y successes could be obtained from the n trials
$$\binom{n}{y} = \frac{n!}{y!\,(n-y)!}$$
Example: 7 successes in 10 trials
$$\binom{10}{7} = \frac{10!}{7!\,3!} = \frac{3628800}{5040 \times 6} = \frac{720}{6} = 120$$
![Page 30: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/30.jpg)
Multinomial coefficient
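The multinomial coefficient or the number of possible outcomes for die tossing (6 possibilities):
$$\binom{n}{y_1\, y_2\, y_3\, y_4\, y_5\, y_6} = \frac{n!}{y_1!\, y_2!\, y_3!\, y_4!\, y_5!\, y_6!} = \frac{n!}{\prod_{i=1}^{6} y_i!}$$
Example rolling each die face once in 6 trials:
$$\binom{n}{y_1\, y_2\, y_3\, y_4\, y_5\, y_6} = \frac{n!}{\prod_{i=1}^{6} y_i!} = \frac{6!}{1!\,1!\,1!\,1!\,1!\,1!} = \frac{720}{1} = 720$$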
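Both coefficients can be computed with one small function, since the binomial coefficient is the two-category case of the multinomial coefficient. The function name is an illustrative choice.

```python
import math


def multinomial_coefficient(counts):
    """n! divided by the product of the factorials of the category counts."""
    n = sum(counts)
    denominator = math.prod(math.factorial(c) for c in counts)
    return math.factorial(n) // denominator


# Binomial case: 7 successes and 3 failures in 10 trials.
binomial_case = multinomial_coefficient([7, 3])

# Die case: each of the six faces turns up once in 6 rolls.
die_case = multinomial_coefficient([1, 1, 1, 1, 1, 1])

print(binomial_case, die_case)
```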
![Page 31: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/31.jpg)
Properties of multinomials
Dependency among the counts.
For example, if a die is thrown and it is not a 1, 2, 3, 4, or 5, then it must be a 6.
$$\sum_{i=1}^{6} y_i = n$$
Face Number Variable
1 10 y1
2 11 y2
3 13 y3
4 9 y4
5 8 y5
6 9 y6
TOTAL 60 n
![Page 32: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/32.jpg)
Multinomial pdf
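Probability of an outcome or series of outcomes:
$$f(y_i \mid n, p_i) = \binom{n}{y_i} p_1^{y_1}\, p_2^{y_2}\, p_3^{y_3}\, p_4^{y_4}\, p_5^{y_5}\, p_6^{y_6}$$
$$\sum_{i=1}^{6} p_i = 1$$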
![Page 33: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/33.jpg)
Die example 1
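The probability of rolling a fair die (p_i = 1/6) six times (n) and turning up each face only once (y_i = 1) is:
$$f(1,1,1,1,1,1 \mid 6, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}) = \frac{6!}{1!\,1!\,1!\,1!\,1!\,1!} \left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right)\left(\tfrac{1}{6}\right) = 0.01543$$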
![Page 34: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/34.jpg)
Die example 1
Dependency:
$$\sum_{i=1}^{6} p_i = 0.167 + 0.167 + 0.167 + 0.167 + 0.167 + 0.167 = 1$$
(each 0.167 is 1/6 rounded to three decimals)
![Page 35: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/35.jpg)
Example 2
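Another example, the probability of rolling two 2s, three 3s, and one 4 is:
$$f(0,2,3,1,0,0 \mid 6, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}, \tfrac{1}{6}) = \frac{6!}{0!\,2!\,3!\,1!\,0!\,0!} \left(\tfrac{1}{6}\right)^{0}\left(\tfrac{1}{6}\right)^{2}\left(\tfrac{1}{6}\right)^{3}\left(\tfrac{1}{6}\right)^{1}\left(\tfrac{1}{6}\right)^{0}\left(\tfrac{1}{6}\right)^{0} = \frac{720}{12} \times \left(\tfrac{1}{6}\right)^{6} = 0.001286$$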
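Both die examples can be checked with a short multinomial probability function. This is a sketch using only the standard library; the function name is illustrative.

```python
import math


def multinomial_pmf(counts, probs):
    """Probability of observing the given category counts in sum(counts) trials."""
    coefficient = math.factorial(sum(counts))
    for c in counts:
        coefficient //= math.factorial(c)
    return coefficient * math.prod(p**c for p, c in zip(probs, counts))


fair_die = [1 / 6] * 6

# Example 1: each face once in six rolls.
example_1 = multinomial_pmf([1, 1, 1, 1, 1, 1], fair_die)

# Example 2: two 2s, three 3s, and one 4 in six rolls.
example_2 = multinomial_pmf([0, 2, 3, 1, 0, 0], fair_die)

print(f"{example_1:.5f} {example_2:.6f}")
```

Faces with a count of zero contribute a factor of 1 to both the coefficient (0! = 1) and the product of probabilities.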
![Page 36: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/36.jpg)
Likelihood
As you might have expected, the likelihood of the multinomial is of greater interest to us.
We frequently have data (n, y_1, …, y_m) and are seeking to determine the model (p_1, …, p_m). The likelihood for our example with the die is:
$$L(p_i \mid n, y_i) = \binom{n}{y_i} p_1^{y_1}\, p_2^{y_2}\, p_3^{y_3}\, p_4^{y_4}\, p_5^{y_5}\, p_6^{y_6} = \binom{n}{y_i} \prod_{i=1}^{m} p_i^{y_i}$$
![Page 37: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/37.jpg)
Log-likelihood
This likelihood has all of the same properties we discussed for the binomial case.
$$L(p_i \mid n, y_i) = \binom{n}{y_i} p_1^{y_1}\, p_2^{y_2}\, p_3^{y_3}\, p_4^{y_4}\, p_5^{y_5}\, p_6^{y_6} = \binom{n}{y_i} \prod_{i=1}^{m} p_i^{y_i}$$
Usually solve to maximize the ln(L)
$$\ln L(p_i \mid n, y_i) = \ln\binom{n}{y_i} + y_1 \ln p_1 + y_2 \ln p_2 + y_3 \ln p_3 + \cdots + y_m \ln p_m = \ln\binom{n}{y_i} + \sum_{i=1}^{m} y_i \ln p_i$$
![Page 38: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/38.jpg)
Log-likelihood
Ignoring the multinomial coefficient (constant):
$$\ln(L(p_i \mid n, y_i)) = \sum_{i=1}^{m} y_i \ln(p_i) + \text{constant}$$
Here the y_i are the data and the p_i are the probabilities.
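As an illustration, the log-likelihood without the coefficient can be evaluated for the die counts tabulated earlier (10, 11, 13, 9, 8, 9 in 60 rolls). The comparison between a fair die and the observed proportions is an added example; the observed proportions y_i/n are the standard multinomial MLEs.

```python
import math

counts = [10, 11, 13, 9, 8, 9]   # observed counts for faces 1 to 6
n = sum(counts)


def log_lik(probs, counts):
    """Sum of y_i * ln(p_i); the multinomial coefficient is omitted."""
    return sum(y * math.log(p) for y, p in zip(counts, probs) if y > 0)


fair_probs = [1 / 6] * 6
mle_probs = [y / n for y in counts]   # observed proportions

ll_fair = log_lik(fair_probs, counts)
ll_mle = log_lik(mle_probs, counts)
print(f"fair die: {ll_fair:.3f}, observed proportions: {ll_mle:.3f}")
```

No other set of probabilities summing to 1 gives a higher log-likelihood than the observed proportions, so `ll_mle` is at least as large as `ll_fair`.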
![Page 39: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/39.jpg)
Presence-absence surveys & multinomials
Procedure:
Select a sample of sites
Conduct repeated presence-absence surveys at each site
Usually temporal replication
Sometimes spatial replication
Record presence or absence of species during survey
![Page 40: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/40.jpg)
Encounter histories for each site & species
Encounter history matrix
Each row represents a site
Each column represents a sampling occasion.
On each occasion each species is recorded as:
‘1’ if encountered (captured)
‘0’ if not encountered.
Occasion
Site No. 1 2 3
211 0 0 1
212 0 0 1
213 0 1 0
214 1 0 0
215 1 0 1
216 1 1 0
217 1 1 1
218 0 0 1
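The encounter history matrix can be tallied into counts per history with a few lines. This sketch uses the eight sites listed above; the dictionary layout and names are illustrative choices.

```python
from collections import Counter

# Site number -> detections (1) or non-detections (0) on occasions 1 to 3.
encounter_matrix = {
    211: (0, 0, 1),
    212: (0, 0, 1),
    213: (0, 1, 0),
    214: (1, 0, 0),
    215: (1, 0, 1),
    216: (1, 1, 0),
    217: (1, 1, 1),
    218: (0, 0, 1),
}

# Count how many sites share each encounter history, e.g. "001".
history_counts = Counter(
    "".join(str(value) for value in history)
    for history in encounter_matrix.values()
)

for history, count in sorted(history_counts.items()):
    print(history, count)
```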
![Page 41: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/41.jpg)
Encounter history - example
For sites sampled on 3 occasions there are 8 ($= 2^m = 2^3$) possible encounter histories
10 sites were sampled 3 times (not enough for a good estimate)
1 – Detected during survey
0 – Not-detected during survey
Separate encounter history for each species
yi Encounter History
1 1 0 0
2 1 0 1
0 1 1 0
1 1 1 1
1 0 1 0
0 0 1 1
3 0 0 1
2 0 0 0
10
![Page 42: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/42.jpg)
Encounter history - example
Each capture history is a possible outcome,
Analogous to one face of the die (ni).
Data consist of the number of times each capture history appears (yi).
yi Encounter History
1 1 0 0
2 1 0 1
0 1 1 0
1 1 1 1
1 0 1 0
0 0 1 1
3 0 0 1
2 0 0 0
10
![Page 43: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/43.jpg)
Encounter history - example
Each encounter history has an associated probability (pi)
Each pij can be different
yi Encounter History
1 1 0 0
2 1 0 1
0 1 1 0
1 1 1 1
1 0 1 0
0 0 1 1
3 0 0 1
2 0 0 0
10
![Page 44: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/44.jpg)
Log-likelihood example
Log-likelihood:
Calculate log of the probability of each encounter history (ln(P_i))
Multiply ln(P_i) by the number of times observed (y_i)
Sum the products
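The three steps can be sketched with the counts from the 10-site example. The slides do not give the history probabilities P_i, so this sketch assumes, purely for illustration, that every occasion is an independent encounter with the same probability p; that simplification ignores the distinction between occupancy and detection.

```python
import math

# Counts (y_i) of each encounter history among the 10 sites.
history_counts = {
    "100": 1, "101": 2, "110": 0, "111": 1,
    "010": 1, "011": 0, "001": 3, "000": 2,
}


def history_probability(history, p):
    """P_i under the illustrative assumption of a constant encounter probability p."""
    return math.prod(p if h == "1" else 1.0 - p for h in history)


def log_likelihood(p):
    """Sum over histories of y_i * ln(P_i)."""
    return sum(
        count * math.log(history_probability(history, p))
        for history, count in history_counts.items()
    )


ll_at_04 = log_likelihood(0.4)
ll_at_05 = log_likelihood(0.5)
print(f"ln L(0.4) = {ll_at_04:.3f}, ln L(0.5) = {ll_at_05:.3f}")
```

Under this assumption there are 12 encounters in 30 site-occasions, so the log-likelihood is higher at p = 0.4 than at p = 0.5.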
![Page 45: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/45.jpg)
Link function in binomial
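Binomial likelihood:
$$L(p \mid n, y) = \binom{n}{y} p^{y} (1-p)^{n-y}$$
Substitute the link for p
$$L(p \mid n, y) = \binom{n}{y} \left(\frac{\exp(X\beta)}{1+\exp(X\beta)}\right)^{y} \left(1 - \frac{\exp(X\beta)}{1+\exp(X\beta)}\right)^{n-y}$$
Voila! – logistic regression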
![Page 46: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/46.jpg)
Multinomial with link function
Substitute the logit link for the p_i
$$\ln(L(p_i \mid n, y_i)) = \ln\binom{n}{y_i} + \sum_{i=1}^{m} y_i \ln\left(\frac{\exp(X\beta)}{1+\exp(X\beta)}\right)$$
![Page 47: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/47.jpg)
But wait a minute!
Is Pr(Occupancy) = Pr(Encounter)?
![Page 48: Maximum likelihood estimates](https://reader036.fdocuments.us/reader036/viewer/2022062301/56812fad550346895d95337c/html5/thumbnails/48.jpg)
Is Pr(Occupancy) = Pr(Encounter)?
Probability of encounter includes both detection and use (occupancy).
Occupancy analysis estimates each separately, thus providing conditional estimates of use of sites.
yi Encounter History
1 1 0 0
2 1 0 1
0 1 1 0
1 1 1 1
1 0 1 0
0 0 1 1
3 0 0 1
2 0 0 0
10
Sites known to be used (histories with at least one detection)
Absent or not detected? (history 0 0 0)