A derivation of the sampling formulas for
An Entity-Topic Model for Entity Linking [Han+ EMNLP-CoNLL12]
and
A Context-Aware Topic Model for Statistical Machine Translation [Su+ ACL15]
Tomonari MASADA @ Nagasaki University
September 17, 2015
The full joint distribution is obtained as follows.
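\begin{align*}
& p(m, w, z, e, a, \theta, \phi, \psi, \xi \mid \alpha, \beta, \gamma, \iota) \\
&= \prod_{d=1}^{D} \Big[ p(m_d \mid e_d, \psi)\, p(e_d \mid z_d, \phi)\, p(z_d \mid \theta_d)\, p(w_d \mid a_d, \xi)\, p(a_d \mid e_d) \Big] \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) \\
&= \prod_{d=1}^{D} \Big[ \Big\{ \prod_{i=1}^{M_d} p(m_{di} \mid \psi_{e_{di}})\, p(e_{di} \mid \phi_{z_{di}})\, p(z_{di} \mid \theta_d) \Big\} \Big\{ \prod_{n=1}^{N_d} p(w_{dn} \mid \xi_{a_{dn}})\, p(a_{dn} \mid e_d) \Big\} \Big] \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) \\
&= \prod_{d=1}^{D} \Big[ \Big\{ \prod_{i=1}^{M_d} \prod_{k=1}^{K} \prod_{t=1}^{T} \big( \psi_{t,m_{di}}\, \phi_{k,t}\, \theta_{d,k} \big)^{\Delta(z_{di}=k \wedge e_{di}=t)} \Big\} \Big\{ \prod_{n=1}^{N_d} \prod_{t=1}^{T} \Big( \xi_{t,w_{dn}} \frac{\sum_{i=1}^{M_d} \Delta(e_{di}=t)}{M_d} \Big)^{\Delta(a_{dn}=t)} \Big\} \Big] \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) \\
&= \prod_{d=1}^{D} \Big[ \Big\{ \prod_{i=1}^{M_d} \prod_{t=1}^{T} \psi_{t,m_{di}}^{\Delta(e_{di}=t)} \Big\} \Big\{ \prod_{i=1}^{M_d} \prod_{k=1}^{K} \prod_{t=1}^{T} \phi_{k,t}^{\Delta(z_{di}=k \wedge e_{di}=t)} \Big\} \Big\{ \prod_{i=1}^{M_d} \prod_{k=1}^{K} \theta_{d,k}^{\Delta(z_{di}=k)} \Big\} \Big\{ \prod_{n=1}^{N_d} \prod_{t=1}^{T} \xi_{t,w_{dn}}^{\Delta(a_{dn}=t)} \Big\} \Big] \\
&\quad \cdot \prod_{d=1}^{D} \Big[ \prod_{n=1}^{N_d} \prod_{t=1}^{T} \Big( \frac{\sum_{i=1}^{M_d} \Delta(e_{di}=t)}{M_d} \Big)^{\Delta(a_{dn}=t)} \Big] \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) \\
&= \prod_{u=1}^{U} \prod_{t=1}^{T} \psi_{t,u}^{C_{t,u}} \cdot \prod_{k=1}^{K} \prod_{t=1}^{T} \phi_{k,t}^{C_{k,t}} \cdot \prod_{d=1}^{D} \prod_{k=1}^{K} \theta_{d,k}^{C_{d,k}} \cdot \prod_{t=1}^{T} \prod_{v=1}^{V} \xi_{t,v}^{C_{t,v}} \cdot \prod_{d=1}^{D} \prod_{t=1}^{T} \Big( \frac{M_{d,t}}{M_d} \Big)^{N_{d,t}} \\
&\quad \cdot \prod_{d=1}^{D} p(\theta_d \mid \alpha) \cdot \prod_{k=1}^{K} p(\phi_k \mid \beta) \cdot \prod_{t=1}^{T} p(\psi_t \mid \gamma) \cdot \prod_{t=1}^{T} p(\xi_t \mid \iota) , \tag{1}
\end{align*}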
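where $\Delta(\cdot)$ is 1 if the proposition in the parentheses is true and 0 otherwise.
$N_{d,t}$ and $M_{d,t}$ are defined as follows: $N_{d,t} \equiv \sum_{n=1}^{N_d} \Delta(a_{dn} = t)$; $M_{d,t} \equiv \sum_{i=1}^{M_d} \Delta(e_{di} = t)$.
The $C$s are defined as follows: $C_{t,u} \equiv \sum_{d=1}^{D} \sum_{i=1}^{M_d} \Delta(e_{di} = t \wedge m_{di} = u)$; $C_{k,t} \equiv \sum_{d=1}^{D} \sum_{i=1}^{M_d} \Delta(z_{di} = k \wedge e_{di} = t)$; $C_{d,k} \equiv \sum_{i=1}^{M_d} \Delta(z_{di} = k)$; $C_{t,v} \equiv \sum_{d=1}^{D} \sum_{n=1}^{N_d} \Delta(a_{dn} = t \wedge w_{dn} = v)$.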
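The count definitions above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the nested-list layout of the assignments, the function name `count_statistics`, and the toy corpus are all hypothetical and do not come from the original papers.

```python
from collections import Counter

def count_statistics(mentions, entities, topics, words, word_entities):
    """Build the counts of Eq. (1) from per-document assignments.

    Assumed (hypothetical) layout: mentions[d][i] = m_di, entities[d][i] = e_di,
    topics[d][i] = z_di, words[d][n] = w_dn, word_entities[d][n] = a_dn.
    """
    C_tu, C_kt, C_dk, C_tv = Counter(), Counter(), Counter(), Counter()
    M_dt, N_dt = Counter(), Counter()
    for d in range(len(mentions)):
        for m, e, z in zip(mentions[d], entities[d], topics[d]):
            C_tu[e, m] += 1   # entity t = e emitted mention type u = m
            C_kt[z, e] += 1   # topic k = z emitted entity t = e
            C_dk[d, z] += 1   # document d used topic k = z
            M_dt[d, e] += 1   # mentions of document d assigned to entity e
        for w, a in zip(words[d], word_entities[d]):
            C_tv[a, w] += 1   # entity t = a emitted word type v = w
            N_dt[d, a] += 1   # words of document d assigned to entity a
    return C_tu, C_kt, C_dk, C_tv, M_dt, N_dt

# Toy corpus with one document: three mentions and four context words.
mentions = [[0, 1, 0]]
entities = [[2, 0, 2]]
topics = [[1, 1, 0]]
words = [[5, 6, 5, 7]]
word_entities = [[2, 2, 0, 2]]
C_tu, C_kt, C_dk, C_tv, M_dt, N_dt = count_statistics(
    mentions, entities, topics, words, word_entities)
```

Sparse `Counter` tables keep the sketch short; a real sampler would more likely use dense arrays indexed by entity, topic, and word type.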
We marginalize the multinomial parameters out.
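\begin{align*}
& p(m, w, z, e, a \mid \alpha, \beta, \gamma, \iota) = \int p(m, w, z, e, a, \theta, \phi, \psi, \xi \mid \alpha, \beta, \gamma, \iota)\, d\theta\, d\phi\, d\psi\, d\xi \\
&= \prod_{t=1}^{T} \Big[ \frac{\prod_u \Gamma(C_{t,u} + \gamma_u)}{\Gamma(C_t + \sum_u \gamma_u)} \frac{\Gamma(\sum_u \gamma_u)}{\prod_u \Gamma(\gamma_u)} \Big]
\cdot \prod_{k=1}^{K} \Big[ \frac{\prod_t \Gamma(C_{k,t} + \beta_t)}{\Gamma(C_k + \sum_t \beta_t)} \frac{\Gamma(\sum_t \beta_t)}{\prod_t \Gamma(\beta_t)} \Big] \\
&\quad \cdot \prod_{d=1}^{D} \Big[ \frac{\prod_k \Gamma(C_{d,k} + \alpha_k)}{\Gamma(M_d + \sum_k \alpha_k)} \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \Big]
\cdot \prod_{t=1}^{T} \Big[ \frac{\prod_v \Gamma(C_{t,v} + \iota_v)}{\Gamma(C_t + \sum_v \iota_v)} \frac{\Gamma(\sum_v \iota_v)}{\prod_v \Gamma(\iota_v)} \Big] \\
&\quad \cdot \prod_{d=1}^{D} \prod_{t=1}^{T} \Big( \frac{M_{d,t}}{M_d} \Big)^{N_{d,t}} \tag{2}
\end{align*}
Here $C_k \equiv \sum_t C_{k,t}$. The symbol $C_t$ is a row total of whichever count it accompanies: $C_t \equiv \sum_u C_{t,u}$ in the factor with $\gamma$, and $C_t \equiv \sum_v C_{t,v}$ in the factor with $\iota$.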
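As a numerical companion to Eq. (2), the following sketch evaluates the logarithm of the collapsed joint with the standard-library `math.lgamma`. The function names and the dense row-per-entity (or per-topic, per-document) layout of the counts are assumptions made for this illustration, and the toy values are invented.

```python
import math

def log_dirichlet_multinomial(counts, prior):
    """log of prod_i Gamma(c_i + p_i) / Gamma(sum c + sum p) * Gamma(sum p) / prod_i Gamma(p_i)."""
    value = sum(math.lgamma(c + p) for c, p in zip(counts, prior))
    value -= math.lgamma(sum(counts) + sum(prior))
    value += math.lgamma(sum(prior))
    value -= sum(math.lgamma(p) for p in prior)
    return value

def log_link_term(M_row, N_row):
    """log of prod_t (M_dt / M_d)^(N_dt) for a single document."""
    M_d = sum(M_row)
    total = 0.0
    for M_dt, N_dt in zip(M_row, N_row):
        if N_dt == 0:
            continue  # the factor equals 1
        if M_dt == 0:
            return float("-inf")  # a word points at an entity with no mention
        total += N_dt * math.log(M_dt / M_d)
    return total

def log_collapsed_joint(C_tu, C_kt, C_dk, C_tv, M, N, gamma, beta, alpha, iota):
    """Sum of the five groups of factors in Eq. (2), in log space."""
    total = sum(log_dirichlet_multinomial(row, gamma) for row in C_tu)
    total += sum(log_dirichlet_multinomial(row, beta) for row in C_kt)
    total += sum(log_dirichlet_multinomial(row, alpha) for row in C_dk)
    total += sum(log_dirichlet_multinomial(row, iota) for row in C_tv)
    total += sum(log_link_term(m, n) for m, n in zip(M, N))
    return total

# Toy case: T = 2 entities, U = 2 mention types, K = 1 topic, V = 2 word types,
# one document holding one mention and one word, all priors equal to 1.
log_joint = log_collapsed_joint(
    C_tu=[[1, 0], [0, 0]], C_kt=[[1, 0]], C_dk=[[1]], C_tv=[[0, 1], [0, 0]],
    M=[[1, 0]], N=[[1, 0]],
    gamma=[1.0, 1.0], beta=[1.0, 1.0], alpha=[1.0], iota=[1.0, 1.0])
```

Working in log space avoids overflow of the Gamma function for realistic counts.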
We remove the ith mention in the dth document.
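\begin{align*}
& p(m^{-di}, w, z^{-di}, e^{-di}, a \mid \alpha, \beta, \gamma, \iota) \\
&= \prod_{t=1}^{T} \Big[ \frac{\prod_u \Gamma(C^{-di}_{t,u} + \gamma_u)}{\Gamma(C^{-di}_t + \sum_u \gamma_u)} \frac{\Gamma(\sum_u \gamma_u)}{\prod_u \Gamma(\gamma_u)} \Big]
\cdot \prod_{k=1}^{K} \Big[ \frac{\prod_t \Gamma(C^{-di}_{k,t} + \beta_t)}{\Gamma(C^{-di}_k + \sum_t \beta_t)} \frac{\Gamma(\sum_t \beta_t)}{\prod_t \Gamma(\beta_t)} \Big] \\
&\quad \cdot \prod_{d=1}^{D} \Big[ \frac{\prod_k \Gamma(C^{-di}_{d,k} + \alpha_k)}{\Gamma(M_d - 1 + \sum_k \alpha_k)} \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \Big]
\cdot \prod_{t=1}^{T} \Big[ \frac{\prod_v \Gamma(C_{t,v} + \iota_v)}{\Gamma(C_t + \sum_v \iota_v)} \frac{\Gamma(\sum_v \iota_v)}{\prod_v \Gamma(\iota_v)} \Big] \\
&\quad \cdot \prod_{d=1}^{D} \prod_{t=1}^{T} \Big( \frac{M^{-di}_{d,t}}{M_d - 1} \Big)^{N_{d,t}} \tag{3}
\end{align*}
The superscript $-di$ marks a count computed with the $i$th mention of the $d$th document excluded. The reduced total $M_d - 1$ and the counts $C^{-di}_{d,k}$, $M^{-di}_{d,t}$ differ from their full counterparts only for the document that contains the removed mention.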
We then add the mention back, with the same mention type $m_{di}$ and the latent variable values $z_{di} = k$ and $e_{di} = t$.
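\begin{align*}
& p(m_{di}, z_{di} = k, e_{di} = t \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(m_{di}, z_{di} = k, e_{di} = t, m^{-di}, w, z^{-di}, e^{-di}, a \mid \alpha, \beta, \gamma, \iota)}{p(m^{-di}, w, z^{-di}, e^{-di}, a \mid \alpha, \beta, \gamma, \iota)} \\
&= \frac{\Gamma(C^{-di}_{t,m_{di}} + 1 + \gamma_{m_{di}})}{\Gamma(C^{-di}_t + 1 + \sum_u \gamma_u)} \frac{\Gamma(C^{-di}_t + \sum_u \gamma_u)}{\Gamma(C^{-di}_{t,m_{di}} + \gamma_{m_{di}})}
\cdot \frac{\Gamma(C^{-di}_{k,t} + 1 + \beta_t)}{\Gamma(C^{-di}_k + 1 + \sum_{t'} \beta_{t'})} \frac{\Gamma(C^{-di}_k + \sum_{t'} \beta_{t'})}{\Gamma(C^{-di}_{k,t} + \beta_t)} \\
&\quad \cdot \frac{\Gamma(C^{-di}_{d,k} + 1 + \alpha_k)}{\Gamma(M_d + \sum_{k'} \alpha_{k'})} \frac{\Gamma(M_d - 1 + \sum_{k'} \alpha_{k'})}{\Gamma(C^{-di}_{d,k} + \alpha_k)}
\cdot \Big( \frac{M_d - 1}{M_d} \Big)^{N_d} \Big( \frac{M^{-di}_{d,t} + 1}{M^{-di}_{d,t}} \Big)^{N_{d,t}} \\
&= \frac{C^{-di}_{t,m_{di}} + \gamma_{m_{di}}}{C^{-di}_t + \sum_u \gamma_u}
\cdot \frac{C^{-di}_{k,t} + \beta_t}{C^{-di}_k + \sum_{t'} \beta_{t'}}
\cdot \frac{C^{-di}_{d,k} + \alpha_k}{M_d - 1 + \sum_{k'} \alpha_{k'}}
\cdot \Big( \frac{M_d - 1}{M_d} \Big)^{N_d} \Big( \frac{M^{-di}_{d,t} + 1}{M^{-di}_{d,t}} \Big)^{N_{d,t}} \tag{4}
\end{align*}
The last equality uses $\Gamma(x + 1) = x\,\Gamma(x)$. The final two factors collect the ratio of the terms $(M_{d,t'}/M_d)^{N_{d,t'}}$ over all entities $t'$: the entity $t$ that receives the mention contributes $\big( \frac{M^{-di}_{d,t} + 1}{M_d} \cdot \frac{M_d - 1}{M^{-di}_{d,t}} \big)^{N_{d,t}}$, and every other entity $t'$ contributes $\big( \frac{M_d - 1}{M_d} \big)^{N_{d,t'}}$, where $N_d = \sum_{t'} N_{d,t'}$.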
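The passage from Gamma-function ratios to count ratios in Eq. (4) relies on the identity Γ(x + 1) = xΓ(x). The short check below evaluates the mention factor both ways; the function names and the numbers are hypothetical and serve only to illustrate the identity.

```python
import math

def gamma_ratio_factor(c_tm, c_t, gamma_m, gamma_sum):
    """Mention factor of Eq. (4), evaluated directly from the Gamma functions."""
    log_value = (math.lgamma(c_tm + 1 + gamma_m) - math.lgamma(c_t + 1 + gamma_sum)
                 + math.lgamma(c_t + gamma_sum) - math.lgamma(c_tm + gamma_m))
    return math.exp(log_value)

def count_ratio_factor(c_tm, c_t, gamma_m, gamma_sum):
    """The same factor after applying Gamma(x + 1) = x * Gamma(x) twice."""
    return (c_tm + gamma_m) / (c_t + gamma_sum)

# Invented counts: the entity emitted this mention type 3 times out of 10 mentions.
direct = gamma_ratio_factor(3, 10, 0.1, 5.0)
simplified = count_ratio_factor(3, 10, 0.1, 5.0)
```

The two values agree up to floating-point error, which is why the sampler never needs to evaluate a Gamma function.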
Therefore, zdi can be updated based on the following probabilities:
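\begin{align*}
& p(z_{di} = k \mid m, w, z^{-di}, e, a, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(m_{di}, z_{di} = k, e_{di} = t \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota)}{\sum_{k'=1}^{K} p(m_{di}, z_{di} = k', e_{di} = t \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota)} \\
&= \frac{\dfrac{C^{-di}_{t,m_{di}} + \gamma_{m_{di}}}{C^{-di}_t + \sum_u \gamma_u} \cdot \dfrac{C^{-di}_{k,t} + \beta_t}{C^{-di}_k + \sum_{t'} \beta_{t'}} \cdot \dfrac{C^{-di}_{d,k} + \alpha_k}{M_d - 1 + \sum_{k''} \alpha_{k''}} \cdot \Big( \dfrac{M_d - 1}{M_d} \Big)^{N_d} \Big( \dfrac{M^{-di}_{d,t} + 1}{M^{-di}_{d,t}} \Big)^{N_{d,t}}}
{\sum_{k'=1}^{K} \Big[ \dfrac{C^{-di}_{t,m_{di}} + \gamma_{m_{di}}}{C^{-di}_t + \sum_u \gamma_u} \cdot \dfrac{C^{-di}_{k',t} + \beta_t}{C^{-di}_{k'} + \sum_{t'} \beta_{t'}} \cdot \dfrac{C^{-di}_{d,k'} + \alpha_{k'}}{M_d - 1 + \sum_{k''} \alpha_{k''}} \cdot \Big( \dfrac{M_d - 1}{M_d} \Big)^{N_d} \Big( \dfrac{M^{-di}_{d,t} + 1}{M^{-di}_{d,t}} \Big)^{N_{d,t}} \Big]} \\
&\propto \frac{C^{-di}_{k,t} + \beta_t}{C^{-di}_k + \sum_{t'} \beta_{t'}} \cdot \frac{C^{-di}_{d,k} + \alpha_k}{M_d - 1 + \sum_{k'} \alpha_{k'}} \tag{5}
\end{align*}
Here $t$ is the current value of $e_{di}$. The denominator $M_d - 1 + \sum_{k'} \alpha_{k'}$ follows from $\Gamma(x + 1) = x\,\Gamma(x)$; it does not depend on $k$ and may be dropped when sampling.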
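A minimal sketch of the topic update in Eq. (5) follows. It assumes the counts passed in already exclude the current mention, drops the document-level denominator because it is constant in k, and uses hypothetical names (`topic_weights`, `sample_index`) and invented toy counts.

```python
import random

def topic_weights(t, C_kt, C_k, C_dk, alpha, beta):
    """Unnormalised p(z_di = k | ...) of Eq. (5) for every topic k.

    t is the current entity e_di; C_kt[k][t], C_k[k] and C_dk[k] are assumed
    to be the counts with mention i of document d already removed.
    """
    beta_sum = sum(beta)
    return [(C_kt[k][t] + beta[t]) / (C_k[k] + beta_sum) * (C_dk[k] + alpha[k])
            for k in range(len(alpha))]

def sample_index(weights, rng):
    """Draw an index with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights)[0]

# Toy counts with K = 2 topics and T = 2 entities.
C_kt = [[3, 1], [0, 2]]   # topic-entity counts
C_k = [4, 2]              # row totals of C_kt
C_dk = [2, 0]             # topic counts inside the current document
weights = topic_weights(0, C_kt, C_k, C_dk, alpha=[1.0, 1.0], beta=[0.5, 0.5])
new_topic = sample_index(weights, random.Random(0))
```

After drawing `new_topic`, a full sampler would add the mention back into `C_kt`, `C_k`, and `C_dk` under the new assignment.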
Further, edi can be updated based on the following probabilities:
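\begin{align*}
& p(e_{di} = t \mid m, w, z, e^{-di}, a, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(m_{di}, z_{di} = k, e_{di} = t \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota)}{\sum_{t'=1}^{T} p(m_{di}, z_{di} = k, e_{di} = t' \mid m^{-di}, w, z^{-di}, e^{-di}, a, \alpha, \beta, \gamma, \iota)} \\
&\propto \frac{C^{-di}_{t,m_{di}} + \gamma_{m_{di}}}{C^{-di}_t + \sum_u \gamma_u}
\cdot \frac{C^{-di}_{k,t} + \beta_t}{C^{-di}_k + \sum_{t'} \beta_{t'}}
\cdot \Big( \frac{M^{-di}_{d,t} + 1}{M^{-di}_{d,t}} \Big)^{N_{d,t}} \tag{6}
\end{align*}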
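The entity update in Eq. (6) can be sketched as below. Two points are assumptions of this sketch rather than statements from the source: the factor that is constant in t (the denominator with the β sum) is dropped, and the case of an entity that has no remaining mention in the document but still has words assigned to it is handled by forcing the mention back onto that entity, since the ratio in Eq. (6) is undefined there. All names and toy counts are hypothetical, and the priors are assumed strictly positive.

```python
import math

def link_factor(M_minus, N):
    """((M + 1) / M)^N from Eq. (6), with the degenerate cases made explicit."""
    if N == 0:
        return 1.0          # no word is linked to this entity
    if M_minus == 0:
        return math.inf     # assumption: words require the entity to stay present
    return ((M_minus + 1) / M_minus) ** N

def entity_weights(m, k, C_tu, C_t, C_kt, M_dt, N_dt, gamma, beta):
    """Unnormalised p(e_di = t | ...) for every entity t.

    m is the mention type m_di and k the current topic z_di; all counts are
    assumed to exclude mention i of document d.
    """
    gamma_sum = sum(gamma)
    weights = []
    for t in range(len(C_t)):
        mention = (C_tu[t][m] + gamma[m]) / (C_t[t] + gamma_sum)
        topic = C_kt[k][t] + beta[t]
        weights.append(mention * topic * link_factor(M_dt[t], N_dt[t]))
    if any(math.isinf(x) for x in weights):
        # Put all mass on the entity that the document's words still need.
        return [1.0 if math.isinf(x) else 0.0 for x in weights]
    return weights

# Toy counts with T = 2 entities, U = 2 mention types, K = 1 topic.
C_tu = [[2, 0], [1, 1]]
C_t = [2, 2]
C_kt = [[3, 1]]
weights = entity_weights(0, 0, C_tu, C_t, C_kt, M_dt=[1, 2], N_dt=[2, 0],
                         gamma=[1.0, 1.0], beta=[0.5, 0.5])
forced = entity_weights(0, 0, C_tu, C_t, C_kt, M_dt=[0, 2], N_dt=[2, 0],
                        gamma=[1.0, 1.0], beta=[0.5, 0.5])
```

The returned weights can be normalised and sampled in the same way as for the topic update.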
We remove the nth word token in the dth document.
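\begin{align*}
& p(m, w^{-dn}, z, e, a^{-dn} \mid \alpha, \beta, \gamma, \iota) \\
&= \prod_{t=1}^{T} \Big[ \frac{\prod_u \Gamma(C_{t,u} + \gamma_u)}{\Gamma(C_t + \sum_u \gamma_u)} \frac{\Gamma(\sum_u \gamma_u)}{\prod_u \Gamma(\gamma_u)} \Big]
\cdot \prod_{k=1}^{K} \Big[ \frac{\prod_t \Gamma(C_{k,t} + \beta_t)}{\Gamma(C_k + \sum_t \beta_t)} \frac{\Gamma(\sum_t \beta_t)}{\prod_t \Gamma(\beta_t)} \Big] \\
&\quad \cdot \prod_{d=1}^{D} \Big[ \frac{\prod_k \Gamma(C_{d,k} + \alpha_k)}{\Gamma(M_d + \sum_k \alpha_k)} \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \Big]
\cdot \prod_{t=1}^{T} \Big[ \frac{\prod_v \Gamma(C^{-dn}_{t,v} + \iota_v)}{\Gamma(C^{-dn}_t + \sum_v \iota_v)} \frac{\Gamma(\sum_v \iota_v)}{\prod_v \Gamma(\iota_v)} \Big] \\
&\quad \cdot \prod_{d=1}^{D} \prod_{t=1}^{T} \Big( \frac{M_{d,t}}{M_d} \Big)^{N^{-dn}_{d,t}} \tag{7}
\end{align*}
The superscript $-dn$ marks a count computed with the $n$th word token of the $d$th document excluded; $N^{-dn}_{d,t}$ differs from $N_{d,t}$ only for the document that contains the removed token.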
We then add the word token back, with the same word type $w_{dn}$ and the latent variable value $a_{dn} = t$.
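\begin{align*}
& p(w_{dn}, a_{dn} = t \mid m, w^{-dn}, z, e, a^{-dn}, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(w_{dn}, a_{dn} = t, m, w^{-dn}, z, e, a^{-dn} \mid \alpha, \beta, \gamma, \iota)}{p(m, w^{-dn}, z, e, a^{-dn} \mid \alpha, \beta, \gamma, \iota)} \\
&= \frac{\Gamma(C^{-dn}_{t,w_{dn}} + 1 + \iota_{w_{dn}})}{\Gamma(C^{-dn}_t + 1 + \sum_v \iota_v)} \frac{\Gamma(C^{-dn}_t + \sum_v \iota_v)}{\Gamma(C^{-dn}_{t,w_{dn}} + \iota_{w_{dn}})}
\cdot \Big( \frac{M_{d,t}}{M_d} \Big)^{N^{-dn}_{d,t} + 1} \Big( \frac{M_d}{M_{d,t}} \Big)^{N^{-dn}_{d,t}} \\
&= \frac{C^{-dn}_{t,w_{dn}} + \iota_{w_{dn}}}{C^{-dn}_t + \sum_v \iota_v} \cdot \Big( \frac{M_{d,t}}{M_d} \Big) \tag{8}
\end{align*}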
Therefore, adn can be updated based on the following probabilities:
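\begin{align*}
& p(a_{dn} = t \mid m, w, z, e, a^{-dn}, \alpha, \beta, \gamma, \iota) \\
&= \frac{p(w_{dn}, a_{dn} = t \mid m, w^{-dn}, z, e, a^{-dn}, \alpha, \beta, \gamma, \iota)}{\sum_{t'=1}^{T} p(w_{dn}, a_{dn} = t' \mid m, w^{-dn}, z, e, a^{-dn}, \alpha, \beta, \gamma, \iota)} \\
&\propto \frac{C^{-dn}_{t,w_{dn}} + \iota_{w_{dn}}}{C^{-dn}_t + \sum_v \iota_v} \cdot \Big( \frac{M_{d,t}}{M_d} \Big) \tag{9}
\end{align*}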
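The word-level update in Eq. (9) is the simplest of the three. The sketch below assumes the word counts passed in already exclude the current token; the function name and the toy numbers are hypothetical.

```python
import random

def word_entity_weights(w, C_tv, C_t, M_dt, iota):
    """Unnormalised p(a_dn = t | ...) of Eq. (9) for every entity t.

    w is the word type w_dn; C_tv[t][w] and C_t[t] are assumed to be the
    entity-word counts with token n of document d removed, and M_dt[t] is the
    number of mentions in document d assigned to entity t.
    """
    iota_sum = sum(iota)
    M_d = sum(M_dt)
    return [(C_tv[t][w] + iota[w]) / (C_t[t] + iota_sum) * M_dt[t] / M_d
            for t in range(len(M_dt))]

# Toy counts with T = 2 entities and V = 2 word types.
C_tv = [[0, 4], [3, 0]]
C_t = [4, 3]
iota = [0.1, 0.1]
weights = word_entity_weights(1, C_tv, C_t, M_dt=[1, 3], iota=iota)
rng = random.Random(0)
new_entity = rng.choices(range(len(weights)), weights=weights)[0]

# An entity with no mention in the document receives zero weight.
absent = word_entity_weights(1, C_tv, C_t, M_dt=[0, 4], iota=iota)
```

Because of the factor M_dt[t] / M_d, a word can only be attached to an entity that is currently assigned to at least one mention of the same document.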