Max Likelihood for BNs

10
Daphne Koller Parameter Estimation Max Likelihood for BNs Probabilistic Graphical Models Learning

description

Learning. Probabilistic Graphical Models. Parameter Estimation. Max Likelihood for BNs. MLE for Bayesian Networks. Parameters: Data instances: . X. Y. MLE for Bayesian Networks.  X. Parameters:. X.  Y|X. Y. Data d. MLE for Bayesian Networks. - PowerPoint PPT Presentation

Transcript of Max Likelihood for BNs

Page 1: Max Likelihood for BNs

Daphne Koller

Parameter Estimation

Max Likelihoodfor BNs

ProbabilisticGraphicalModels

Learning

Page 2: Max Likelihood for BNs

Daphne Koller

MLE for Bayesian Networks• Parameters:

• Data instances: <x[m],y[m]> X

Y

Y

X y0 y1

x0 0.95 0.05

x1 0.2 0.8

X

x0 x1

0.7 0.3

Page 3: Max Likelihood for BNs

Daphne Koller

MLE for Bayesian Networks• Parameters:

):][],[():(1

M

m

mymxPDL

):][|][():][(

):][|][():][(

1|

1

11M

mXY

M

mX

M

m

M

m

mxmyPmxP

mxmyPmxP

):][|][():][(1

M

m

mxmyPmxP

X

X

Y

Y|X

Data d

Page 4: Max Likelihood for BNs

Daphne Koller

MLE for Bayesian Networks• Likelihood for Bayesian network

if Xi|Ui are disjoint, then MLE can be computed by

maximizing each local likelihood separately

iii

i miii

m iiii

m

DL

mmxP

mmxP

mxPDL

):(

):][|][(

):][|][(

):][():(

U

U

Page 5: Max Likelihood for BNs

Daphne Koller

MLE for Table CPDs

uu

Uu

u][,][:

|,

):][|][(mxmxm

Xx

mmxP

][

],[

],'[

],[

'

| u

u

u

uu M

xM

xM

xM

x

x

):][|][():][|][(1

|1

M

mX

M

m

mmxPmmxP Uuu

uu

uu ][,][:

|, mxmxm

xx

],[

,|

u

uu

xM

xx

Page 6: Max Likelihood for BNs

Daphne Koller

MLE for Linear Gaussians

Page 7: Max Likelihood for BNs

Daphne Koller

Shared ParametersS(1)S(0) S(2) S(3)

S’|S

Page 8: Max Likelihood for BNs

Daphne Koller

Shared Parameters

S(1)

O(1)

S(0) S(2)

O(2)

S(3)

O(3)

S’|S

O’|S’

Page 9: Max Likelihood for BNs

Daphne Koller

Summary• For BN with disjoint sets of parameters in

CPDs, likelihood decomposes as product of local likelihood functions, one per variable

• For table CPDs, local likelihood further decomposes as product of likelihood for multinomials, one for each parent combination

• For networks with shared CPDs, sufficient statistics accumulate over all uses of CPD

Page 10: Max Likelihood for BNs

Daphne Koller

END END END