LectureNotes for Bayesian methods in Recommendation system
Bayesian method in Recommendation system
Lecturer: Xudong Sun
DSOR-AISBI
January 20, 2016
1 Preliminaries
Exponential family model
Sufficient Statistics
Bayesian formula and Conjugate Prior
statistical sampling
2 Probabilistic Matrix Factorization
3 Bayesian Matrix factorization
4 Bayesian Factorization machine
Preliminaries
Exponential family model
Exponential family model of order 1
Exponential tilting: $f(y;\theta) = \dfrac{e^{s(y)\theta} f_0(y)}{\int e^{s(x)\theta} f_0(x)\,dx}$
$f_0(x)$ is a uniform distribution on a defined support
a valid $\theta$ is called a natural parameter
$s(y)$ is a transform of the random variable, which defines the order of the exponential family
cumulant-generating function: $\kappa(\theta) = \log \int e^{s(y)\theta} f_0(y)\,dy$
the natural parameter set is a convex set: $N = \{\theta : \kappa(\theta) < \infty\}$
partition function: $Z(\theta) = e^{\kappa(\theta)}$, so the normalizing factor of the density is $e^{-\kappa(\theta)}$
moment-generating function: $E\{e^{tY}\} = \int e^{ty} f(y)\,dy$
Physics: if you have the partition function, you have everything!
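Suppose $s(y) = y$ and $f_0$ is uniform on $(0, 1)$
$f(y;\theta) = \dfrac{e^{y\theta} f_0(y)}{\int e^{x\theta} f_0(x)\,dx}$
$\kappa(\theta) = \log\{(e^{\theta} - 1)/\theta\}$. Why? Because $\int_0^1 e^{x\theta}\,dx = (e^{\theta} - 1)/\theta$
$f(y;\theta) = \theta e^{\theta y}/(e^{\theta} - 1)$ for $0 < y < 1$
Exponential family model without tilting:
$f(y;\theta) = \dfrac{e^{s(y)\theta} f_0(y)}{e^{\kappa(\theta)}} = \exp\{s(y)\theta - \kappa(\theta) + \log f_0(y)\}$
$f(y;w) = \exp\{s(y)\theta(w) - b(w) + c(y)\}$
$s(y)$ is called a natural observation
$\theta$ is called a natural parameter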
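The closed form for $\kappa(\theta)$ in the tilted example can be checked numerically. The sketch below assumes $s(y) = y$ and $f_0$ uniform on $(0, 1)$; the function names and the grid size are illustrative choices, not part of the lecture.

```python
import numpy as np

def trapezoid(values, grid):
    """Composite trapezoid rule on a 1-D grid."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

def kappa_numeric(theta, n_grid=20001):
    """log of the integral of exp(theta * y) over (0, 1), with f0 = 1 there."""
    grid = np.linspace(0.0, 1.0, n_grid)
    return np.log(trapezoid(np.exp(theta * grid), grid))

def kappa_closed(theta):
    """Closed form from the slide: log((e^theta - 1) / theta)."""
    return np.log((np.exp(theta) - 1.0) / theta)

def tilted_density(y, theta):
    """Tilted density theta * exp(theta * y) / (e^theta - 1) on (0, 1)."""
    return theta * np.exp(theta * y) / (np.exp(theta) - 1.0)

theta = 2.5
y = np.linspace(0.0, 1.0, 20001)
print(kappa_numeric(theta), kappa_closed(theta))   # the two values should agree
print(trapezoid(tilted_density(y, theta), y))      # total mass, should be close to 1
```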
Case study for exponential family model of order 1
Find the natural observation and natural parameter for the following distributions:
exponential p.d.f. $f(y;w) = w^{-1}\exp(-y/w)$
binomial p.m.f. $f(r;\pi) = C_m^r\,\pi^r (1-\pi)^{m-r}$
Poisson p.m.f. $f(x;\lambda) = \lambda^x e^{-\lambda}/x!$
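One way to work these three cases is to take the logarithm of each density and match it against $\exp\{s(y)\theta(w) - b(w) + c(y)\}$. The identifications below follow from that matching and are not taken from the slides.

```latex
\begin{align*}
\text{exponential:}\quad & f(y;w) = \exp\{ y\,(-1/w) - \log w \},
  && s(y) = y,\quad \theta = -1/w \\
\text{binomial:}\quad & f(r;\pi) = \exp\Big\{ r \log\frac{\pi}{1-\pi} + m\log(1-\pi) + \log C_m^r \Big\},
  && s(r) = r,\quad \theta = \log\frac{\pi}{1-\pi} \\
\text{Poisson:}\quad & f(x;\lambda) = \exp\{ x\log\lambda - \lambda - \log x! \},
  && s(x) = x,\quad \theta = \log\lambda
\end{align*}
```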
Exponential family model of order p
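Exponential tilting: $f(y;\theta) = \dfrac{e^{s^T(y)\theta} f_0(y)}{\int e^{s^T(x)\theta} f_0(x)\,dx}$
$\sum_i c_i s_i(y) = 0$ implies $c_i = 0$ (the $s_i$ are linearly independent)
$\theta = [1, \theta_1(w), \ldots, \theta_p(w)]$
$f(y;w) = \exp\{s^T(y)\theta(w) - b(w)\} f_0(y)$
if there is a one-to-one mapping between $w$ and $\theta$, $f(y;w) = \exp\{s^T(y)\theta - \kappa(\theta)\} f_0(y)$
$E\,s(Y) = \dfrac{d\kappa(\theta)}{d\theta}$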
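The identity $E\,s(Y) = d\kappa(\theta)/d\theta$ can be checked on a concrete family. The sketch below uses the Poisson distribution in natural form as an assumed example, where $s(y) = y$, $\theta = \log\lambda$ and $\kappa(\theta) = e^{\theta}$. The truncation point `y_max` and the finite-difference step are illustrative choices.

```python
import math

def poisson_kappa(theta):
    """Cumulant function of the Poisson family in natural form."""
    return math.exp(theta)

def poisson_mean(theta, y_max=200):
    """E[Y] under the pmf exp(y * theta - kappa(theta)) / y!, truncated at y_max."""
    total = 0.0
    for y in range(y_max + 1):
        log_pmf = y * theta - poisson_kappa(theta) - math.lgamma(y + 1)
        total += y * math.exp(log_pmf)
    return total

def central_difference(func, x, h=1e-5):
    """Numerical derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

theta = math.log(3.0)                      # natural parameter for lambda = 3
mean_from_pmf = poisson_mean(theta)        # left-hand side, E s(Y)
mean_from_kappa = central_difference(poisson_kappa, theta)  # right-hand side
print(mean_from_pmf, mean_from_kappa)      # both should be close to lambda
```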
Case study for exponential family model of order p
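Find the natural observation and natural parameter for the following distributions:
beta p.d.f. $f(y;a,b) = \dfrac{y^{a-1}(1-y)^{b-1}}{B(a,b)}$, $0 < y < 1$
one-dimensional Gaussian $f(y \mid \mu, \sigma^2) = \dfrac{1}{(2\pi)^{0.5}\sigma} \exp\Big\{-\dfrac{1}{2\sigma^2}(y-\mu)^2\Big\}$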
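For the Gaussian case, expanding the square suggests $s(y) = (y, y^2)$ and $\theta = (\mu/\sigma^2, -1/(2\sigma^2))$ with $f_0(y) = 1$. The sketch below checks that this order-2 form reproduces the usual density; the expression for $\kappa(\theta)$ and all function names are worked out here and are not taken from the slides.

```python
import numpy as np

def gaussian_natural_params(mu, sigma2):
    """theta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return np.array([mu / sigma2, -0.5 / sigma2])

def gaussian_kappa(theta):
    """Log-normalizer in terms of theta, with f0(y) = 1."""
    theta1, theta2 = theta
    return -theta1 ** 2 / (4.0 * theta2) + 0.5 * np.log(-np.pi / theta2)

def density_natural_form(y, theta):
    """exp{s(y)^T theta - kappa(theta)} with natural observation s(y) = (y, y^2)."""
    s = np.stack([y, y ** 2], axis=-1)
    return np.exp(s @ theta - gaussian_kappa(theta))

def density_usual_form(y, mu, sigma2):
    """Textbook Gaussian density for comparison."""
    return np.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

mu, sigma2 = 1.5, 0.8
y = np.linspace(-3.0, 6.0, 50)
theta = gaussian_natural_params(mu, sigma2)
same = np.allclose(density_natural_form(y, theta), density_usual_form(y, mu, sigma2))
print(same)
```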
Sufficient Statistics
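$f(y;\theta) = g\{s(y);\theta\}\,h(y)$ <=> $f_{Y|S}(y \mid s;\theta)$ does not depend on $\theta$
$s(y)$ is called a sufficient statistic
$Y = [y_1, y_2, \ldots, y_n]$ is the vector of observations
What can sufficient statistics do?
e.g. $f(y;\lambda) = \lambda e^{-\lambda y_1}\,\lambda e^{-\lambda y_2} = \lambda^2 e^{-\lambda(y_1+y_2)} \times 1$, so $s(y) = y_1 + y_2$ and $h(y) = 1$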
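One thing a sufficient statistic does is compress the data without changing the likelihood. The sketch below illustrates this for the exponential example: two made-up samples with the same size and the same sum give identical log-likelihoods for every $\lambda$. The sample values are arbitrary.

```python
import numpy as np

def exponential_log_likelihood(y, lam):
    """Sum over observations of log(lam * exp(-lam * y_i))."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.log(lam) - lam * y))

def log_likelihood_from_statistic(n, total, lam):
    """Same quantity written only in terms of n and s(y) = sum of y_i."""
    return n * np.log(lam) - lam * total

sample_a = [0.2, 1.3, 2.5]
sample_b = [1.0, 1.0, 2.0]   # different values, same n and same sum
for lam in (0.5, 1.0, 2.0):
    ll_a = exponential_log_likelihood(sample_a, lam)
    ll_b = exponential_log_likelihood(sample_b, lam)
    ll_s = log_likelihood_from_statistic(3, 4.0, lam)
    print(lam, ll_a, ll_b, ll_s)
```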
Sufficient Statistics for Exponential family model
for i.i.d. observations $y_1, \ldots, y_n$: $S = \sum_{i=1}^{n} s(y_i)$
Bayesian formula and Conjugate Prior
conjugate prior in Exponential family models
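likelihood in exponential family model: $\prod_{i=1}^{n} f(y_i \mid w) \propto \exp\{S^T(y)\theta(w) - n\,b(w)\}$ (factors free of $w$ dropped)
prior: $\pi(w) = \exp\{\varepsilon^T\theta(w) - v\,b(w) + c(\varepsilon, v)\}$
posterior: $p(w \mid D) = \dfrac{1}{Z}\, p(D \mid w)\, \pi(w) \propto \exp\{(\varepsilon + S)^T\theta(w) - (v + n)\,b(w)\}$, where $Z$ is the normalizing constant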
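The update $(\varepsilon, v) \to (\varepsilon + S, v + n)$ can be tried on a concrete pair. The sketch below assumes a Poisson likelihood, for which $\theta(w) = \log w$, $b(w) = w$ and $s(y) = y$, so the conjugate prior is a Gamma density with shape $\varepsilon + 1$ and rate $v$. The counts and the prior values are made up for illustration.

```python
import numpy as np

def conjugate_update(epsilon, v, S, n):
    """Posterior hyperparameters: (epsilon, v) -> (epsilon + S, v + n)."""
    return epsilon + S, v + n

counts = np.array([2, 4, 3, 5, 1])        # hypothetical Poisson observations
epsilon0, v0 = 1.0, 1.0                   # hypothetical prior hyperparameters
epsilon_post, v_post = conjugate_update(epsilon0, v0, counts.sum(), counts.size)

# Posterior is Gamma(shape = epsilon_post + 1, rate = v_post); its mean is shape / rate.
posterior_mean_closed_form = (epsilon_post + 1.0) / v_post

# Brute-force check: normalize prior x likelihood on a grid of w values.
w = np.linspace(1e-6, 30.0, 300001)
log_post = (epsilon0 * np.log(w) - v0 * w) + (counts.sum() * np.log(w) - counts.size * w)
weights = np.exp(log_post - log_post.max())
posterior_mean_grid = np.sum(w * weights) / np.sum(weights)
print(posterior_mean_closed_form, posterior_mean_grid)
```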
Posterior $\propto$ prior × likelihood
If the prior is conjugate to the likelihood, then the posterior has the same form as the prior
Multivariate Gaussian
$p(x_1, x_2, \ldots, x_n) = \dfrac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\{-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\}$ (1)
variance: $\mathrm{cov}(x, x) = E\{(x - E\{x\})^2\}$
covariance: $\mathrm{cov}(x, y) = E\{(x - E\{x\})(y - E\{y\})\}$
question: how do you estimate the covariance of a Gaussian given $N$ samples?
Cholesky decomposition
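The two items above fit together in a short experiment: a Cholesky factor turns standard normal draws into samples with a chosen covariance, and the sample covariance estimates it back. The mean, covariance and sample size below are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Cholesky factor L with Sigma = L @ L.T; then x = mu + L z with z ~ N(0, I).
L = np.linalg.cholesky(Sigma)
z = rng.standard_normal((50_000, 2))
samples = mu + z @ L.T

# Unbiased sample covariance: centered outer products divided by N - 1.
mean_hat = samples.mean(axis=0)
centered = samples - mean_hat
Sigma_hat = centered.T @ centered / (samples.shape[0] - 1)
print(mean_hat)
print(Sigma_hat)
```

`np.cov(samples, rowvar=False)` computes the same estimate in one call.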
Wishart distribution
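$W(\Lambda \mid W_0, \nu_0) = \dfrac{1}{Z}\,|\Lambda|^{(\nu_0 - D - 1)/2} \exp\{-\tfrac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda)\}$, where $\nu_0$ is the number of degrees of freedom, $D$ is the dimension of $\Lambda$ and $Z$ is the normalizing constant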
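For integer $\nu_0 \ge D$, a Wishart draw can be built as a sum of outer products of Gaussian vectors with covariance $W_0$. The sketch below uses that construction (an assumption of this example, not something stated on the slide) and compares the empirical mean with the known mean $\nu_0 W_0$. The scale matrix and sample count are illustrative.

```python
import numpy as np

def sample_wishart(W0, nu0, rng):
    """One draw from W(W0, nu0) for integer nu0 >= D, as a sum of outer products."""
    D = W0.shape[0]
    chol = np.linalg.cholesky(W0)
    X = rng.standard_normal((nu0, D)) @ chol.T   # rows are x_k ~ N(0, W0)
    return X.T @ X                               # sum over k of x_k x_k^T

rng = np.random.default_rng(1)
W0 = np.array([[1.0, 0.3],
               [0.3, 0.5]])
nu0 = 5
draws = np.stack([sample_wishart(W0, nu0, rng) for _ in range(20_000)])
print(draws.mean(axis=0))   # empirical mean of the draws
print(nu0 * W0)             # theoretical mean nu0 * W0
```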
statistical sampling
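Metropolis-Hastings algorithm: $A_k(z^*, z^{(\tau)}) = \min\Big(1, \dfrac{\tilde p(z^*)\, q(z^{(\tau)} \mid z^*)}{\tilde p(z^{(\tau)})\, q(z^* \mid z^{(\tau)})}\Big)$, where $q(z^* \mid z^{(\tau)})$ is the proposal distribution of the new sample $z^*$ conditioned on the current sample $z^{(\tau)}$.
Gibbs sampler: sample from one conditional distribution at a time.
Convergence: the chain satisfies detailed balance with respect to $p(z)$, so once the first batch of samples is discarded (burn-in), the remaining samples are approximately draws from $p(z)$.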
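The acceptance rule can be sketched for a one-dimensional target. The example below assumes a symmetric Gaussian random-walk proposal, so the two $q$ terms cancel, and an unnormalized Gaussian target centered at 3. The step size, chain length and burn-in are arbitrary illustrative choices.

```python
import numpy as np

def metropolis_hastings(log_p_tilde, z0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings for a 1-D unnormalized log-density."""
    samples = np.empty(n_samples)
    z = z0
    for t in range(n_samples):
        z_star = z + step * rng.standard_normal()   # symmetric proposal
        # log of min(1, p~(z*) / p~(z)); the q ratio equals 1 here
        log_accept = min(0.0, log_p_tilde(z_star) - log_p_tilde(z))
        if np.log(rng.uniform()) < log_accept:
            z = z_star                               # accept the proposal
        samples[t] = z                               # otherwise keep the current state
    return samples

def log_target(z):
    """Unnormalized log-density of a Gaussian with mean 3 and unit variance."""
    return -0.5 * (z - 3.0) ** 2

rng = np.random.default_rng(0)
chain = metropolis_hastings(log_target, z0=0.0, n_samples=20_000, step=1.0, rng=rng)
kept = chain[2_000:]          # discard the first batch of samples (burn-in)
print(kept.mean(), kept.std())
```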
Probabilistic Matrix Factorization
Figure: graphical model for probabilistic matrix factorization (nodes $\sigma_U$, $\sigma_V$, $\sigma_R$, $U$, $V$, $R_{ij}$; plates over users and items)
Point estimate of MAP
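$\mathrm{err}_{ij} = R_{ij} - U_i^T V_j$
$E = \tfrac{1}{2}\sum_{i=1}^{N_u}\sum_{j=1}^{N_{item}} I_{ij}\,\mathrm{err}_{ij}^2 + \tfrac{\lambda_U}{2}\sum_i \|U_i\|^2 + \tfrac{\lambda_V}{2}\sum_j \|V_j\|^2$ (negative log-posterior up to constants), where $I_{ij} = 1$ if user $i$ rated item $j$ and 0 otherwise
$\dfrac{\partial E}{\partial U_i} = \sum_{j=1}^{N_{item}} I_{ij}\,\mathrm{err}_{ij}\,(-V_j) + \lambda_U U_i$ (2)
$\dfrac{\partial E}{\partial V_j} = \sum_{i=1}^{N_u} I_{ij}\,\mathrm{err}_{ij}\,(-U_i) + \lambda_V V_j$ (3)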
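Gradients (2) and (3) can be used directly in a plain gradient-descent loop. The sketch below assumes a squared-error objective with L2 penalties matching those gradients, and runs on synthetic low-rank ratings. The data, rank, learning rate and regularization weights are all made up for illustration.

```python
import numpy as np

def pmf_objective(R, I, U, V, lam_u, lam_v):
    """Squared error on observed entries plus L2 penalties on U and V."""
    err = I * (R - U @ V.T)
    return 0.5 * np.sum(err ** 2) + 0.5 * lam_u * np.sum(U ** 2) + 0.5 * lam_v * np.sum(V ** 2)

def pmf_gradients(R, I, U, V, lam_u, lam_v):
    """Gradients (2) and (3), stacked row-wise for all users and all items."""
    err = I * (R - U @ V.T)            # err_ij, zeroed where unobserved
    grad_U = -err @ V + lam_u * U      # row i: sum_j I_ij err_ij (-V_j) + lam_u U_i
    grad_V = -err.T @ U + lam_v * V    # row j: sum_i I_ij err_ij (-U_i) + lam_v V_j
    return grad_U, grad_V

rng = np.random.default_rng(0)
n_users, n_items, rank = 30, 40, 3
R = rng.standard_normal((n_users, rank)) @ rng.standard_normal((n_items, rank)).T
I = (rng.uniform(size=R.shape) < 0.5).astype(float)   # about half the ratings observed

U = 0.1 * rng.standard_normal((n_users, rank))
V = 0.1 * rng.standard_normal((n_items, rank))
lam_u = lam_v = 0.01
learning_rate = 0.005
history = []
for step in range(1000):
    grad_U, grad_V = pmf_gradients(R, I, U, V, lam_u, lam_v)
    U -= learning_rate * grad_U
    V -= learning_rate * grad_V
    history.append(pmf_objective(R, I, U, V, lam_u, lam_v))
print(history[0], history[-1])   # the objective should have decreased
```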
Bayesian Matrix factorization
Wishart distribution
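$W(\Lambda \mid W_0, \nu_0) = \dfrac{1}{Z}\,|\Lambda|^{(\nu_0 - D - 1)/2} \exp\{-\tfrac{1}{2}\mathrm{Tr}(W_0^{-1}\Lambda)\}$, where $\nu_0$ is the number of degrees of freedom, $D$ is the dimension of $\Lambda$ and $Z$ is the normalizing constant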
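$p(\Theta_U \mid \Theta_0) = N(\mu_U \mid \mu_0, (\beta_0\Lambda_U)^{-1})\, W(\Lambda_U \mid W_0, \nu_0)$, where $\Theta_U = \{\mu_U, \Lambda_U\}$ and $\nu_0$ is the number of degrees of freedom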
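$p(R, U, V, \Theta_U, \Theta_V \mid \Theta_0) = p(R \mid U, V, \alpha)\, p(U \mid \mu_U, \Lambda_U)\, p(V \mid \mu_V, \Lambda_V)\, p(\Theta_U \mid \Theta_0)\, p(\Theta_V \mid \Theta_0)$, where $\Theta_0$ collects the fixed hyperparameters (including $\alpha$)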
sampling on U and V
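$p(U_i \mid U_{-i}, R, V, \Theta_U, \Theta_V, \Theta_0) = \dfrac{p(R \mid U, V, \alpha)\, p(U \mid \mu_U, \Lambda_U)\, [\,p(V \mid \mu_V, \Lambda_V)\, p(\Theta_U \mid \Theta_0)\, p(\Theta_V \mid \Theta_0)\,]}{p(R, U_{-i}, V, \Theta_U, \Theta_V \mid \Theta_0)}$
The bracketed factors and the denominator are constant in $U_i$, since $R, U_{-i}, V, \Theta_U, \Theta_V, \Theta_0$ are held fixed, so
$p(U_i \mid \cdot) \propto \prod_{j} [N(R_{ij} \mid \langle U_i, V_j\rangle, \alpha^{-1})]^{I_{ij}}\; p(U_i \mid \mu_U, \Lambda_U) = \prod_{j} [N(R_{ij} \mid \langle U_i, V_j\rangle, \alpha^{-1})]^{I_{ij}}\; N(U_i \mid \mu_U, \Lambda_U^{-1})$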
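Because the conditional for $U_i$ is a product of Gaussians in $U_i$, it is itself Gaussian. Completing the square gives precision $\Lambda_U + \alpha\sum_j I_{ij} V_j V_j^T$ and mean equal to the inverse precision times $\alpha\sum_j I_{ij} R_{ij} V_j + \Lambda_U\mu_U$. The sketch below implements one such Gibbs draw, treating $\Lambda_U$ as a precision matrix. The function names and the synthetic data are assumptions of this example.

```python
import numpy as np

def user_conditional(R_i, I_i, V, mu_U, Lambda_U, alpha):
    """Mean and precision of the Gaussian conditional for one user's factor U_i."""
    observed = I_i.astype(bool)
    V_obs = V[observed]                      # factors of the items this user rated
    r_obs = R_i[observed]
    precision = Lambda_U + alpha * V_obs.T @ V_obs
    mean = np.linalg.solve(precision, alpha * V_obs.T @ r_obs + Lambda_U @ mu_U)
    return mean, precision

def sample_user_factor(mean, precision, rng):
    """Draw from N(mean, precision^{-1}) using a Cholesky factor of the precision."""
    L = np.linalg.cholesky(precision)        # precision = L @ L.T
    return mean + np.linalg.solve(L.T, rng.standard_normal(mean.shape[0]))

rng = np.random.default_rng(0)
D, n_items, alpha = 3, 200, 25.0
V = rng.standard_normal((n_items, D))
true_U_i = np.array([1.0, -0.5, 2.0])        # hypothetical ground truth
R_i = V @ true_U_i + rng.standard_normal(n_items) / np.sqrt(alpha)
I_i = (rng.uniform(size=n_items) < 0.5).astype(float)

mean, precision = user_conditional(R_i, I_i, V, np.zeros(D), np.eye(D), alpha)
draw = sample_user_factor(mean, precision, rng)
print(mean)
print(draw)
```

With no observed ratings for a user, the conditional falls back to the prior $N(\mu_U, \Lambda_U^{-1})$.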
sampling on hyperparameter
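$p(\Theta_U \mid R, U, V, \Theta_V, \Theta_0) = \dfrac{p(R \mid U, V, \alpha)\, p(U \mid \mu_U, \Lambda_U)\, p(V \mid \mu_V, \Lambda_V)\, p(\Theta_U \mid \Theta_0)\, p(\Theta_V \mid \Theta_0)}{p(R, U, V, \Theta_V \mid \Theta_0)}$, with $\Theta_U = \{\mu_U, \Lambda_U\}$
The denominator is constant, and so is every factor that does not involve $\Theta_U$ when $R, U, V, \Theta_V, \Theta_0$ are held fixed: $\prod_{i,j} [N(R_{ij} \mid \langle U_i, V_j\rangle, \alpha^{-1})]^{I_{ij}}$, $N(V \mid \mu_V, \Lambda_V^{-1})$ and $N(\mu_V \mid \mu_0, (\beta_0\Lambda_V)^{-1})\, W(\Lambda_V \mid W_0, \nu_0)$. Therefore
$p(\Theta_U \mid \cdot) \propto N(U \mid \mu_U, \Lambda_U^{-1})\; N(\mu_U \mid \mu_0, (\beta_0\Lambda_U)^{-1})\, W(\Lambda_U \mid W_0, \nu_0) = N(\mu_U \mid \mu_0^*, (\beta_0^*\Lambda_U)^{-1})\, W(\Lambda_U \mid W_0^*, \nu_0^*)$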
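The starred quantities are not spelled out on the slide. The sketch below fills them in with the standard Normal-Wishart conjugate update for Gaussian data with unknown mean and precision, which is an assumption of this example rather than the lecture's own formula. The factor matrix `U` and the prior values are synthetic.

```python
import numpy as np

def normal_wishart_update(U, mu0, beta0, W0, nu0):
    """Posterior (mu*, beta*, W*, nu*) given the rows of U as Gaussian observations."""
    N = U.shape[0]
    U_bar = U.mean(axis=0)
    centered = U - U_bar
    scatter = centered.T @ centered                    # sum_i (U_i - U_bar)(U_i - U_bar)^T
    beta_star = beta0 + N
    nu_star = nu0 + N
    mu_star = (beta0 * mu0 + N * U_bar) / beta_star
    diff = (U_bar - mu0).reshape(-1, 1)
    W_star_inv = np.linalg.inv(W0) + scatter + (beta0 * N / beta_star) * (diff @ diff.T)
    return mu_star, beta_star, np.linalg.inv(W_star_inv), nu_star

rng = np.random.default_rng(0)
U = rng.standard_normal((500, 3)) + np.array([1.0, 2.0, 3.0])   # synthetic user factors
mu_star, beta_star, W_star, nu_star = normal_wishart_update(
    U, mu0=np.zeros(3), beta0=2.0, W0=np.eye(3), nu0=3
)
print(mu_star, beta_star, nu_star)
print(W_star)
```

A Gibbs step for $\Theta_U$ would then draw $\Lambda_U$ from $W(W_0^*, \nu_0^*)$ and $\mu_U$ from $N(\mu_0^*, (\beta_0^*\Lambda_U)^{-1})$.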