A Note on PCVB0 for HDP-LDA
Tomonari MASADA @ Nagasaki University
August 22, 2014
This note gives a derivation of the variational posterior updates presented in the following paper:
Sato, Issei, Kurihara, Kenichi, and Nakagawa, Hiroshi. Practical Collapsed Variational Bayes Inference for Hierarchical Dirichlet Process. In Proc. of KDD '12.
A lower bound of the log of the evidence p(w) is obtained as follows:
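\begin{align}
\ln p(\mathbf{w} \mid \alpha_0, \beta_0, \gamma_0, \boldsymbol{\tau})
&= \ln \int \sum_{\mathbf{z}} p(\mathbf{z}, \mathbf{w} \mid \alpha_0, \beta_0, \boldsymbol{\pi}, \boldsymbol{\tau})\, p(\boldsymbol{\pi} \mid \gamma_0)\, d\boldsymbol{\pi} \notag \\
&= \ln \int \sum_{\mathbf{z}} q(\mathbf{z}) q(\boldsymbol{\pi}) \frac{p(\mathbf{z}, \mathbf{w} \mid \alpha_0, \beta_0, \boldsymbol{\pi}, \boldsymbol{\tau})\, p(\boldsymbol{\pi} \mid \gamma_0)}{q(\mathbf{z}) q(\boldsymbol{\pi})}\, d\boldsymbol{\pi} \notag \\
&\geq \int \sum_{\mathbf{z}} q(\mathbf{z}) q(\boldsymbol{\pi}) \ln \frac{p(\mathbf{z}, \mathbf{w} \mid \alpha_0, \beta_0, \boldsymbol{\pi}, \boldsymbol{\tau})\, p(\boldsymbol{\pi} \mid \gamma_0)}{q(\mathbf{z}) q(\boldsymbol{\pi})}\, d\boldsymbol{\pi} \quad \text{(Jensen's inequality)} \notag \\
&= \int \sum_{\mathbf{z}} q(\mathbf{z}) q(\boldsymbol{\pi}) \ln p(\mathbf{z}, \mathbf{w} \mid \alpha_0, \beta_0, \boldsymbol{\pi}, \boldsymbol{\tau})\, d\boldsymbol{\pi}
 - \sum_{\mathbf{z}} q(\mathbf{z}) \ln q(\mathbf{z})
 + \int q(\boldsymbol{\pi}) \ln p(\boldsymbol{\pi} \mid \gamma_0)\, d\boldsymbol{\pi}
 - \int q(\boldsymbol{\pi}) \ln q(\boldsymbol{\pi})\, d\boldsymbol{\pi} . \tag{1}
\end{align}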
Based on the proposed approximation, we can approximate the joint probability of z and w by
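\begin{align}
p(\mathbf{z}, \mathbf{w} \mid \alpha_0, \beta_0, \boldsymbol{\pi}, \boldsymbol{\tau})
\approx
\left[ \prod_{d=1}^{N} \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_0 + n_d)} \prod_{k=1}^{T} \left[ \Gamma(n_{d,k})\, \alpha_0 \pi_k \right]^{I(n_{d,k} > 0)} \right]
\left[ \prod_{k=1}^{T} \frac{\Gamma(\beta_0)}{\Gamma(\beta_0 + n_{k,\cdot})} \prod_{v=1}^{V} \left[ \Gamma(n_{k,v})\, \beta_0 \tau_v \right]^{I(n_{k,v} > 0)} \right] \tag{2}
\end{align}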
as in Eq. (21) of the paper.
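As a small illustration of Eq. (2), the following sketch evaluates the logarithm of the approximate joint for a fixed (hard) assignment of topics. The function name and the nested-list layout of the count tables are assumptions made here for illustration; they do not come from the paper.

```python
import math

def log_joint_approx(n_dk, n_kv, alpha0, beta0, pi, tau):
    """Log of the approximate joint in Eq. (2) for hard topic assignments.

    n_dk[d][k]: number of tokens in document d assigned to topic k
    n_kv[k][v]: number of tokens of word type v assigned to topic k
    pi[k], tau[v]: global topic weights and base word distribution
    (the data layout is an assumption of this sketch)
    """
    total = 0.0
    # Document side: Gamma(alpha0) / Gamma(alpha0 + n_d) and the occupied-topic factors.
    for row in n_dk:
        total += math.lgamma(alpha0) - math.lgamma(alpha0 + sum(row))
        for k, n in enumerate(row):
            if n > 0:  # indicator I(n_{d,k} > 0)
                total += math.lgamma(n) + math.log(alpha0 * pi[k])
    # Topic side: Gamma(beta0) / Gamma(beta0 + n_{k,.}) and the occupied-word factors.
    for row in n_kv:
        total += math.lgamma(beta0) - math.lgamma(beta0 + sum(row))
        for v, n in enumerate(row):
            if n > 0:  # indicator I(n_{k,v} > 0)
                total += math.lgamma(n) + math.log(beta0 * tau[v])
    return total
```

Working in log space with `math.lgamma` avoids overflow of the Gamma functions for long documents.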
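The first term of the lower bound in Eq. (1) can be rewritten as follows:
\begin{align}
&\sum_{\mathbf{z}} \int q(\mathbf{z}) q(\boldsymbol{\pi}) \ln p(\mathbf{z}, \mathbf{w} \mid \alpha_0, \beta_0, \boldsymbol{\pi}, \boldsymbol{\tau})\, d\boldsymbol{\pi} \notag \\
&= \sum_{\mathbf{z}} \int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} q(\mathbf{z}) \ln \left[ \prod_{d=1}^{N} \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_0 + n_d)} \prod_{k=1}^{T} \left[ \Gamma(n_{d,k})\, \alpha_0 \pi_k \right]^{I(n_{d,k} > 0)} \right] d\tilde{\pi}_1 \cdots d\tilde{\pi}_T \notag \\
&\quad + \sum_{\mathbf{z}} q(\mathbf{z}) \ln \left[ \prod_{k=1}^{T} \frac{\Gamma(\beta_0)}{\Gamma(\beta_0 + n_{k,\cdot})} \prod_{v=1}^{V} \left[ \Gamma(n_{k,v})\, \beta_0 \tau_v \right]^{I(n_{k,v} > 0)} \right] . \tag{3}
\end{align}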
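The first term of the right hand side in Eq. (3) can be rewritten as follows:
\begin{align}
&\sum_{\mathbf{z}} \int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} q(\mathbf{z}) \ln \left[ \prod_{d=1}^{N} \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_0 + n_d)} \prod_{k=1}^{T} \left[ \Gamma(n_{d,k})\, \alpha_0 \pi_k \right]^{I(n_{d,k} > 0)} \right] d\tilde{\pi}_1 \cdots d\tilde{\pi}_T \notag \\
&= N \ln \Gamma(\alpha_0) - \sum_{d=1}^{N} \ln \Gamma(\alpha_0 + n_d) + \sum_{\mathbf{z}} q(\mathbf{z}) \ln \prod_{d=1}^{N} \prod_{k=1}^{T} \left[ \Gamma(n_{d,k}) \right]^{I(n_{d,k} > 0)} \notag \\
&\quad + \sum_{\mathbf{z}} \int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} q(\mathbf{z}) \ln \left[ \prod_{d=1}^{N} \prod_{k=1}^{T} \left[ \alpha_0 \pi_k \right]^{I(n_{d,k} > 0)} \right] d\tilde{\pi}_1 \cdots d\tilde{\pi}_T . \tag{4}
\end{align}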
The last term of the right hand side in Eq. (4) can be rewritten as follows:
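\begin{align}
&\sum_{\mathbf{z}} \int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} q(\mathbf{z}) \ln \left[ \prod_{d=1}^{N} \prod_{k=1}^{T} \left[ \alpha_0 \pi_k \right]^{I(n_{d,k} > 0)} \right] d\tilde{\pi}_1 \cdots d\tilde{\pi}_T \notag \\
&= \sum_{\mathbf{z}} \int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} q(\mathbf{z}) \left[ \sum_{d=1}^{N} \sum_{k=1}^{T} I(n_{d,k} > 0) \left[ \ln \alpha_0 + \ln \pi_k \right] \right] d\tilde{\pi}_1 \cdots d\tilde{\pi}_T \notag \\
&= \ln \alpha_0 \sum_{\mathbf{z}} q(\mathbf{z}) \left[ \sum_{d=1}^{N} \sum_{k=1}^{T} I(n_{d,k} > 0) \right]
 + \sum_{\mathbf{z}} \int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} q(\mathbf{z}) \left[ \sum_{k=1}^{T} \left\{ \sum_{d=1}^{N} I(n_{d,k} > 0) \right\} \ln \pi_k \right] d\tilde{\pi}_1 \cdots d\tilde{\pi}_T \notag \\
&= \ln \alpha_0 \sum_{d} \sum_{k} \mathbb{E}[I(n_{d,k} > 0)]
 + \int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} \left\{ \sum_{k} \sum_{d} \mathbb{E}[I(n_{d,k} > 0)] \ln \pi_k \right\} d\tilde{\pi}_1 \cdots d\tilde{\pi}_T , \tag{5}
\end{align}
where $\mathbb{E}[I(n_{d,k} > 0)] = 1 - \prod_{i=1}^{n_d} \{ 1 - q(z_{d,i} = k) \}$. The second term in Eq. (5) can be rewritten as follows: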
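\begin{align}
&\int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} \left\{ \sum_{k} \sum_{d} \mathbb{E}[I(n_{d,k} > 0)] \ln \pi_k \right\} d\tilde{\pi}_1 \cdots d\tilde{\pi}_T \notag \\
&= \int \left\{ \prod_{k=1}^{T} q(\tilde{\pi}_k) \right\} \left[ \sum_{k} \sum_{d} \mathbb{E}[I(n_{d,k} > 0)] \ln \left\{ \tilde{\pi}_k \prod_{l=1}^{k-1} (1 - \tilde{\pi}_l) \right\} \right] d\tilde{\pi}_1 \cdots d\tilde{\pi}_T \notag \\
&= \sum_{k=1}^{T-1} \int q(\tilde{\pi}_k) \left[ \left\{ \sum_{d} \mathbb{E}[I(n_{d,k} > 0)] \right\} \ln \tilde{\pi}_k + \left\{ \sum_{l=k+1}^{T} \sum_{d} \mathbb{E}[I(n_{d,l} > 0)] \right\} \ln (1 - \tilde{\pi}_k) \right] d\tilde{\pi}_k \notag \\
&= \sum_{k=1}^{T-1} \int q(\tilde{\pi}_k) \left\{ \sum_{d} \mathbb{E}[I(n_{d,k} > 0)] \right\} \ln \tilde{\pi}_k \, d\tilde{\pi}_k
 + \sum_{k=1}^{T-1} \int q(\tilde{\pi}_k) \left\{ \sum_{l=k+1}^{T} \sum_{d} \mathbb{E}[I(n_{d,l} > 0)] \right\} \ln (1 - \tilde{\pi}_k) \, d\tilde{\pi}_k , \tag{6}
\end{align}
where $\pi_k = \tilde{\pi}_k \prod_{l=1}^{k-1} (1 - \tilde{\pi}_l)$ and it should be noted that $\tilde{\pi}_T \equiv 1$.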
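The quantities entering Eq. (5) and Eq. (6) are easy to compute once $q(z_{d,i} = k)$ is available. The sketch below assumes that the variational posterior is stored as nested lists `q[d][i][k]` and that every document is non-empty; both the layout and the function names are illustrative choices, not part of the paper.

```python
def expected_presence(q_d):
    """E[I(n_{d,k} > 0)] for one document.

    q_d[i][k] is q(z_{d,i} = k); tokens are independent under q, so the
    probability that topic k is unused is the product of (1 - q_d[i][k]).
    """
    T = len(q_d[0])
    out = []
    for k in range(T):
        prob_absent = 1.0
        for q_i in q_d:
            prob_absent *= 1.0 - q_i[k]
        out.append(1.0 - prob_absent)
    return out

def stick_coefficients(q):
    """Coefficients of ln(pi~_k) and ln(1 - pi~_k) in Eq. (6), k = 1..T-1."""
    T = len(q[0][0])
    # R[k] = sum_d E[I(n_{d,k} > 0)]
    R = [0.0] * T
    for q_d in q:
        for k, e in enumerate(expected_presence(q_d)):
            R[k] += e
    # S[k] = sum over l > k of R[l]
    S = [sum(R[k + 1:]) for k in range(T)]
    # pi~_T is fixed to 1, so only the first T - 1 sticks carry coefficients.
    return R[:T - 1], S[:T - 1]
```

For a single document with two tokens, each split evenly between the first two of three topics, `stick_coefficients` returns `([0.75, 0.75], [0.75, 0.0])`.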
The second term of the right hand side in Eq. (3) can be rewritten as follows:
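\begin{align}
&\sum_{\mathbf{z}} q(\mathbf{z}) \ln \left[ \prod_{k=1}^{T} \frac{\Gamma(\beta_0)}{\Gamma(\beta_0 + n_{k,\cdot})} \prod_{v=1}^{V} \left[ \Gamma(n_{k,v})\, \beta_0 \tau_v \right]^{I(n_{k,v} > 0)} \right] \notag \\
&= T \ln \Gamma(\beta_0) - \sum_{\mathbf{z}} q(\mathbf{z}) \left\{ \sum_{k=1}^{T} \ln \Gamma(\beta_0 + n_{k,\cdot}) \right\}
 + \sum_{\mathbf{z}} q(\mathbf{z}) \left[ \sum_{k=1}^{T} \sum_{v=1}^{V} I(n_{k,v} > 0) \ln \left\{ \Gamma(n_{k,v})\, \beta_0 \tau_v \right\} \right] . \tag{7}
\end{align}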
The last term of the right hand side in Eq. (7) can be rewritten as follows:
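\begin{align}
&\sum_{\mathbf{z}} q(\mathbf{z}) \left[ \sum_{k=1}^{T} \sum_{v=1}^{V} I(n_{k,v} > 0) \ln \left\{ \Gamma(n_{k,v})\, \beta_0 \tau_v \right\} \right] \notag \\
&= \ln \beta_0 \sum_{k=1}^{T} \sum_{v=1}^{V} \mathbb{E}[I(n_{k,v} > 0)]
 + \sum_{v=1}^{V} \ln \tau_v \sum_{k=1}^{T} \mathbb{E}[I(n_{k,v} > 0)]
 + \sum_{\mathbf{z}} q(\mathbf{z}) \left[ \sum_{k=1}^{T} \sum_{v=1}^{V} I(n_{k,v} > 0) \ln \Gamma(n_{k,v}) \right] . \tag{8}
\end{align}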
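The third term of the lower bound in Eq. (1) can be rewritten as follows:
\begin{align}
\int q(\boldsymbol{\pi}) \ln p(\boldsymbol{\pi} \mid \gamma_0)\, d\boldsymbol{\pi}
&= \sum_{k=1}^{T-1} \int q(\tilde{\pi}_k) \ln \left\{ \gamma_0 (1 - \tilde{\pi}_k)^{\gamma_0 - 1} \right\} d\tilde{\pi}_k \notag \\
&= (T - 1) \ln \gamma_0 + (\gamma_0 - 1) \sum_{k=1}^{T-1} \mathbb{E}[\ln (1 - \tilde{\pi}_k)] \notag \\
&= (T - 1) \ln \gamma_0 + (\gamma_0 - 1) \sum_{k=1}^{T-1} \left\{ \psi(b_k) - \psi(a_k + b_k) \right\} , \tag{9}
\end{align}
where $\psi$ is the digamma function and $a_k$, $b_k$ are the parameters of the Beta posterior $q(\tilde{\pi}_k)$ given in Eq. (12). The last term of the lower bound in Eq. (1) can be rewritten as follows:
\begin{align}
- \int q(\boldsymbol{\pi}) \ln q(\boldsymbol{\pi})\, d\boldsymbol{\pi} = - \sum_{k=1}^{T-1} \int q(\tilde{\pi}_k) \ln q(\tilde{\pi}_k)\, d\tilde{\pi}_k . \tag{10}
\end{align}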
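The identity $\mathbb{E}[\ln(1 - \tilde{\pi}_k)] = \psi(b_k) - \psi(a_k + b_k)$ used in Eq. (9) can be checked numerically. The sketch below is only a sanity check under stated assumptions: `digamma` is a hand-rolled approximation (recurrence plus a truncated asymptotic series) written to keep the example free of dependencies, and the helper names are hypothetical.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x up to at least 6, then use an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def expected_log_one_minus(a, b):
    """Closed form of E[ln(1 - x)] for x ~ Beta(a, b)."""
    return digamma(b) - digamma(a + b)

def expected_log_one_minus_numeric(a, b, steps=20000):
    """Midpoint-rule estimate of the same expectation, for comparison only."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 1.0 / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        log_density = log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
        total += math.exp(log_density) * math.log(1 - x) * h
    return total

def prior_term(a, b, gamma0):
    """Right hand side of Eq. (9); a and b hold the T - 1 Beta parameters."""
    s = sum(expected_log_one_minus(ak, bk) for ak, bk in zip(a, b))
    return len(a) * math.log(gamma0) + (gamma0 - 1.0) * s
```

For $a = 2$ and $b = 3$ the closed form equals $-7/12$, and the midpoint estimate agrees with it to several decimal places.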
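We would like to know the distribution of $\tilde{\pi}_k$. For simplicity, we denote $\sum_{d} \mathbb{E}[I(n_{d,k} > 0)]$ and $\sum_{l=k+1}^{T} \sum_{d} \mathbb{E}[I(n_{d,l} > 0)]$ by $R$ and $S$, respectively. We obtain the functional derivative of the sum of Eq. (6) and Eq. (10) as follows:
\begin{align}
&\frac{\delta}{\delta q(\tilde{\pi}'_k)} \int q(\tilde{\pi}_k) \left\{ R \ln \tilde{\pi}_k + S \ln (1 - \tilde{\pi}_k) - \ln q(\tilde{\pi}_k) \right\} d\tilde{\pi}_k \notag \\
&= \lim_{\epsilon \to 0} \frac{1}{\epsilon} \int \Big( \left\{ q(\tilde{\pi}_k) + \epsilon \delta(\tilde{\pi}_k - \tilde{\pi}'_k) \right\} \left[ R \ln \tilde{\pi}_k + S \ln (1 - \tilde{\pi}_k) - \ln \left\{ q(\tilde{\pi}_k) + \epsilon \delta(\tilde{\pi}_k - \tilde{\pi}'_k) \right\} \right] \notag \\
&\qquad\qquad\qquad - q(\tilde{\pi}_k) \left[ R \ln \tilde{\pi}_k + S \ln (1 - \tilde{\pi}_k) - \ln q(\tilde{\pi}_k) \right] \Big) d\tilde{\pi}_k \notag \\
&= \lim_{\epsilon \to 0} \frac{1}{\epsilon} \left[ \epsilon \left\{ R \ln \tilde{\pi}'_k + S \ln (1 - \tilde{\pi}'_k) - \ln q(\tilde{\pi}'_k) \right\} - \int q(\tilde{\pi}_k) \left\{ \ln \left\{ q(\tilde{\pi}_k) + \epsilon \delta(\tilde{\pi}_k - \tilde{\pi}'_k) \right\} - \ln q(\tilde{\pi}_k) \right\} d\tilde{\pi}_k \right] \notag \\
&= \lim_{\epsilon \to 0} \frac{1}{\epsilon} \left[ \epsilon \left\{ R \ln \tilde{\pi}'_k + S \ln (1 - \tilde{\pi}'_k) - \ln q(\tilde{\pi}'_k) \right\} - \int q(\tilde{\pi}_k) \ln \frac{q(\tilde{\pi}_k) + \epsilon \delta(\tilde{\pi}_k - \tilde{\pi}'_k)}{q(\tilde{\pi}_k)}\, d\tilde{\pi}_k \right] \notag \\
&= \lim_{\epsilon \to 0} \frac{1}{\epsilon} \left[ \epsilon \left\{ R \ln \tilde{\pi}'_k + S \ln (1 - \tilde{\pi}'_k) - \ln q(\tilde{\pi}'_k) \right\} - \int q(\tilde{\pi}_k) \left\{ \frac{\epsilon \delta(\tilde{\pi}_k - \tilde{\pi}'_k)}{q(\tilde{\pi}_k)} + O(\epsilon^2) \right\} d\tilde{\pi}_k \right] \notag \\
&= R \ln \tilde{\pi}'_k + S \ln (1 - \tilde{\pi}'_k) - \ln q(\tilde{\pi}'_k) - 1 . \tag{11}
\end{align}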
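Setting this derivative equal to a constant, which accounts for the normalization constraint on $q(\tilde{\pi}_k)$, gives $q(\tilde{\pi}_k) \propto \tilde{\pi}_k^{R} (1 - \tilde{\pi}_k)^{S}$. This implies that $q(\tilde{\pi}_k)$ is a density function of the Beta distribution. We parametrize it as $\tilde{\pi}_k \sim \mathrm{Beta}(a_k, b_k)$ and obtain the following result:
\begin{align}
a_k = 1 + \sum_{d=1}^{N} \mathbb{E}[I(n_{d,k} > 0)] , \qquad
b_k = 1 + \sum_{l=k+1}^{T} \sum_{d=1}^{N} \mathbb{E}[I(n_{d,l} > 0)] . \tag{12}
\end{align}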
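A minimal sketch of Eq. (12) follows, together with the expectation $\mathbb{E}[\pi_k]$ that is used later when updating $q(z_{d,i} = k)$. It assumes the per-topic sums $\sum_d \mathbb{E}[I(n_{d,k} > 0)]$ have already been collected in a list `R` (0-based indexing); the function names are illustrative. The expectation of $\pi_k$ uses the stick-breaking form together with the independence of the factors $q(\tilde{\pi}_k)$.

```python
def beta_parameters(R):
    """Eq. (12) as stated in this note.

    R[k] = sum_d E[I(n_{d,k} > 0)] for topics k = 0..T-1.
    Returns the T - 1 pairs (a_k, b_k); the last stick is fixed to 1.
    """
    T = len(R)
    a = [1.0 + R[k] for k in range(T - 1)]
    b = [1.0 + sum(R[k + 1:]) for k in range(T - 1)]
    return a, b

def expected_pi(a, b):
    """E[pi_k] under independent Beta(a_k, b_k) factors, with pi~_T = 1."""
    remaining = 1.0  # product of E[1 - pi~_l] over l < k
    out = []
    for ak, bk in zip(a, b):
        mean = ak / (ak + bk)  # E[pi~_k]
        out.append(remaining * mean)
        remaining *= 1.0 - mean
    out.append(remaining)  # the last topic takes the rest of the stick
    return out
```

By construction the returned expectations sum to one, which is a convenient check when experimenting with the truncation level $T$.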
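Incidentally, based on Eq. (2), we obtain the following conditional distribution:
\begin{align}
&p(z_{d,i} = k \mid \mathbf{w}, \mathbf{z}^{-d,i}, \alpha_0, \beta_0, \boldsymbol{\pi}, \boldsymbol{\tau}) \notag \\
&\propto \left\{ I(n_{d,k}^{-d,i} > 0)\, n_{d,k}^{-d,i} + I(n_{d,k}^{-d,i} = 0)\, \alpha_0 \pi_k \right\}
\frac{ I(n_{k,w_{d,i}}^{-d,i} > 0)\, n_{k,w_{d,i}}^{-d,i} + I(n_{k,w_{d,i}}^{-d,i} = 0)\, \beta_0 \tau_{w_{d,i}} }{ n_{k}^{-d,i} + \beta_0 } . \tag{13}
\end{align}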
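However, this does not lead to the variational posterior $q(z_{d,i} = k)$ given in the original paper (cf. Eq. (34) and Eq. (35)). When we adopt the conditional
\begin{align}
p(z_{d,i} = k \mid \mathbf{w}, \mathbf{z}^{-d,i}, \alpha_0, \beta_0, \boldsymbol{\pi}, \boldsymbol{\tau})
\propto \left( n_{d,k}^{-d,i} + \alpha_0 \pi_k \right) \frac{ n_{k,w_{d,i}}^{-d,i} + \beta_0 \tau_{w_{d,i}} }{ n_{k}^{-d,i} + \beta_0 } \tag{14}
\end{align}
as usual and use $\mathbb{E}[\pi_k]$ for approximating the posterior $q(z_{d,i} = k)$, we obtain the result given in the paper.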
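The update implied by Eq. (14) can be sketched as follows. This is an illustration under explicit assumptions rather than the paper's exact procedure: the counts are replaced by their expectations under $q$ (the usual zero-order, CVB0-style approximation), the token's own contribution is subtracted to form the $-d,i$ quantities, and all argument names are hypothetical.

```python
def update_token(q_old, n_dk, n_kw, n_k, alpha0, beta0, e_pi, tau_w):
    """One update of q(z_{d,i} = .) from Eq. (14) with pi_k replaced by E[pi_k].

    q_old: current q(z_{d,i} = k) for this token
    n_dk:  expected topic counts in the token's document
    n_kw:  expected topic counts for the token's word type
    n_k:   expected total counts per topic
    All three count vectors include this token's current contribution.
    tau_w: base probability of the token's word type
    """
    weights = []
    for k in range(len(q_old)):
        # Subtract the token itself to obtain the "-d,i" expected counts.
        doc = n_dk[k] - q_old[k]
        word = n_kw[k] - q_old[k]
        topic = n_k[k] - q_old[k]
        weights.append((doc + alpha0 * e_pi[k]) * (word + beta0 * tau_w) / (topic + beta0))
    z = sum(weights)
    return [w / z for w in weights]
```

After the update, the caller would add the new responsibilities back into the three count vectors before moving on to the next token.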
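Let the lower bound in Eq. (1) be denoted as $\mathcal{L}$. We differentiate $\mathcal{L}$ with respect to $\alpha_0$:
\begin{align}
\frac{\partial \mathcal{L}}{\partial \alpha_0} = N \psi(\alpha_0) - \sum_{d=1}^{N} \psi(\alpha_0 + n_d) + \frac{ \sum_{d} \sum_{k} \mathbb{E}[I(n_{d,k} > 0)] }{ \alpha_0 } . \tag{15}
\end{align}
Setting $\partial \mathcal{L} / \partial \alpha_0 = 0$ gives the following update:
\begin{align}
\alpha_0 \leftarrow \frac{ \sum_{d} \sum_{k} \mathbb{E}[I(n_{d,k} > 0)] }{ \sum_{d} \psi(\alpha_0 + n_d) - N \psi(\alpha_0) } . \tag{16}
\end{align}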
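The fixed-point update in Eq. (16) needs only the document lengths and the total of the expected indicators. The sketch below assumes a hand-rolled `digamma` approximation so that it runs without external libraries; in practice a library routine such as `scipy.special.digamma` could be used instead. Function and argument names are illustrative.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x up to at least 6, then use an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def grad_alpha0(alpha0, doc_lengths, presence_sum):
    """Eq. (15); presence_sum = sum_d sum_k E[I(n_{d,k} > 0)]."""
    N = len(doc_lengths)
    return (N * digamma(alpha0)
            - sum(digamma(alpha0 + n_d) for n_d in doc_lengths)
            + presence_sum / alpha0)

def update_alpha0(alpha0, doc_lengths, presence_sum):
    """One application of the fixed-point update in Eq. (16)."""
    N = len(doc_lengths)
    denom = sum(digamma(alpha0 + n_d) for n_d in doc_lengths) - N * digamma(alpha0)
    return presence_sum / denom
```

The denominator is positive whenever every document contains at least one token, because $\psi$ is increasing on the positive reals. The update is typically applied a few times per outer iteration until `grad_alpha0` is close to zero.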
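In a similar manner, we obtain the following update:
\begin{align}
\beta_0 \leftarrow \frac{ \sum_{k} \sum_{v} \mathbb{E}[I(n_{k,v} > 0)] }{ \sum_{k} \psi(\beta_0 + n_{k,\cdot}) - T \psi(\beta_0) } , \tag{17}
\end{align}
where $\mathbb{E}[I(n_{k,v} > 0)] = 1 - \prod_{d} \prod_{i \,:\, w_{d,i} = v} q(z_{d,i} \neq k)$, i.e. the product runs only over the tokens whose word is $v$.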
For $\tau_v$, we assume it is a multinomial parameter and obtain the following update:
\begin{align}
\tau_v = \frac{ \sum_{k} \mathbb{E}[I(n_{k,v} > 0)] }{ \sum_{v} \sum_{k} \mathbb{E}[I(n_{k,v} > 0)] } . \tag{18}
\end{align}
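The word-side expectations and the update in Eq. (18) can be sketched in the same style. The sketch assumes that the product defining $\mathbb{E}[I(n_{k,v} > 0)]$ runs over the tokens of word type $v$ only, that the corpus is stored as lists of word ids `docs[d][i]`, and that the responsibilities are stored as `q[d][i][k]`; all names are hypothetical.

```python
def word_topic_presence(docs, q, T, V):
    """E[I(n_{k,v} > 0)] = 1 - product over tokens of type v of (1 - q(z = k)).

    docs[d][i]: word id of token i in document d
    q[d][i][k]: q(z_{d,i} = k)
    """
    absent = [[1.0] * V for _ in range(T)]
    for words, q_d in zip(docs, q):
        for v, q_i in zip(words, q_d):
            for k in range(T):
                absent[k][v] *= 1.0 - q_i[k]
    return [[1.0 - absent[k][v] for v in range(V)] for k in range(T)]

def update_tau(presence):
    """Eq. (18): normalize the per-word sums of E[I(n_{k,v} > 0)] over topics."""
    V = len(presence[0])
    col = [sum(row[v] for row in presence) for v in range(V)]
    total = sum(col)
    return [c / total for c in col]
```

A word type that never occurs in the corpus gets $\tau_v = 0$ under this update, since all of its expected indicators are zero.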
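By differentiating Eq. (9) with respect to $\gamma_0$ and setting the derivative to zero, we obtain the following update:
\begin{align}
\gamma_0 = \frac{ T - 1 }{ \sum_{k=1}^{T-1} \left\{ \psi(a_k + b_k) - \psi(b_k) \right\} } . \tag{19}
\end{align}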
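Eq. (19) is a closed-form update once the Beta parameters are known. A minimal sketch, again assuming a hand-rolled `digamma` approximation and illustrative names:

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x up to at least 6, then use an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def update_gamma0(a, b):
    """Eq. (19); a and b hold the T - 1 Beta parameters (a_k, b_k)."""
    denom = sum(digamma(ak + bk) - digamma(bk) for ak, bk in zip(a, b))
    return len(a) / denom
```

Each summand $\psi(a_k + b_k) - \psi(b_k)$ is positive because $a_k > 0$, so the update always returns a positive $\gamma_0$.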