Tempered Variational Bayes · ELBO maximization · Consistency of VB
Consistency of Variational Inference
Badr-Eddine Chérief-Abdellatif
Under the supervision of Pierre Alquier
Second-year PhD students team days
Ecole Doctorale Mathématiques Hadamard
20 May 2019
Badr-Eddine Chérief-Abdellatif Consistency of variational inference
Outline of the talk
1 Tempered Variational Bayes
  Tempered posteriors
  Variational Bayes
2 ELBO maximization
  Mixture models
  Model selection
3 Consistency of VB
  Theoretical results
  Efficient algorithms
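Notations
Assume that we observe $X_1, \dots, X_n$ i.i.d. from $P_0$ in a model $\mathcal{M} = \{P_\theta, \theta \in \Theta\}$ associated with a likelihood $L_n$. We define a prior $\pi$ on $\Theta$.
The posterior
$$\pi_n(d\theta) \propto L_n(\theta)\,\pi(d\theta).$$
The tempered posterior, for $0 < \alpha < 1$
$$\pi_{n,\alpha}(d\theta) \propto [L_n(\theta)]^{\alpha}\,\pi(d\theta).$$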
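A minimal sketch of what tempering does, under assumptions that are not in the slides: a Gaussian location model $X_i \sim \mathcal{N}(\theta, 1)$ with prior $\mathcal{N}(0, s^2)$. In that conjugate case the tempered posterior is Gaussian with precision $\alpha n + 1/s^2$, so $\alpha < 1$ simply discounts the data. The function name and the toy data are hypothetical.

```python
def tempered_gaussian_posterior(xs, alpha=1.0, prior_var=1.0):
    """Mean and variance of pi_{n,alpha} when X_i ~ N(theta, 1), theta ~ N(0, prior_var).

    Raising the likelihood to the power alpha multiplies its contribution
    to the posterior precision by alpha; alpha = 1 gives the usual posterior.
    """
    n = len(xs)
    precision = alpha * n + 1.0 / prior_var
    mean = alpha * sum(xs) / precision
    return mean, 1.0 / precision


xs = [0.8, 1.3, 0.4, 1.1, 0.9]  # toy data, assumed for illustration
mean_full, var_full = tempered_gaussian_posterior(xs, alpha=1.0)
mean_half, var_half = tempered_gaussian_posterior(xs, alpha=0.5)
```

With $\alpha = 0.5$ the posterior is wider and shrunk further towards the prior mean than with $\alpha = 1$.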
Various reasons to use a tempered posterior
Easier to sample from
G. Behrens, N. Friel & M. Hurn (2012). Tuning tempered transitions. Statistics and Computing.
Robust to model misspecification
P. Grünwald & T. Van Ommen (2017). Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It. Bayesian Analysis.
Theoretical analysis easier
A. Bhattacharya, D. Pati & Y. Yang (2016). Bayesian fractional posteriors. Preprint arXiv:1611.01125.
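Variational Bayes
$$\tilde\pi_{n,\alpha} = \arg\min_{\rho \in \mathcal{F}} \mathcal{K}(\rho, \pi_{n,\alpha}) = \arg\max_{\rho \in \mathcal{F}} \left\{ \alpha \int \ell_n(\theta)\,\rho(d\theta) - \mathcal{K}(\rho, \pi) \right\},$$
where $\ell_n = \log L_n$ is the log-likelihood and $\mathcal{K}$ is the Kullback-Leibler divergence.
Examples:
parametric approximation
$$\mathcal{F} = \left\{ \mathcal{N}(\mu, \Sigma) : \mu \in \mathbb{R}^d, \Sigma \in \mathcal{S}_d^+ \right\}.$$
mean-field approximation, $\Theta = \Theta_1 \times \Theta_2$ and
$$\mathcal{F} = \{ \rho : \rho(d\theta) = \rho_1(d\theta_1) \times \rho_2(d\theta_2) \}.$$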
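To make the ELBO objective concrete, here is a sketch under assumptions that are not in the slides: $X_i \sim \mathcal{N}(\theta, 1)$, prior $\mathcal{N}(0, s^2)$, and $\mathcal{F}$ the family of univariate Gaussians $\mathcal{N}(m, v)$. Both terms of the objective are then available in closed form, and the first-order conditions give the maximizer explicitly. The function names and data are hypothetical.

```python
import math


def tempered_elbo(m, v, xs, alpha, prior_var):
    """alpha * E_rho[l_n(theta)] - KL(rho || pi) for rho = N(m, v).

    Assumed model: X_i ~ N(theta, 1), prior pi = N(0, prior_var).
    """
    n = len(xs)
    # E_rho[(x - theta)^2] = (x - m)^2 + v
    expected_loglik = -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - m) ** 2 + v for x in xs)
    kl = 0.5 * (v / prior_var + m * m / prior_var - 1.0 - math.log(v / prior_var))
    return alpha * expected_loglik - kl


def elbo_maximizer(xs, alpha, prior_var):
    """Closed-form maximizer over the Gaussian family (first-order conditions)."""
    precision = alpha * len(xs) + 1.0 / prior_var
    return alpha * sum(xs) / precision, 1.0 / precision


xs = [0.8, 1.3, 0.4, 1.1, 0.9]  # toy data, assumed for illustration
alpha, prior_var = 0.5, 1.0
m_star, v_star = elbo_maximizer(xs, alpha, prior_var)
best = tempered_elbo(m_star, v_star, xs, alpha, prior_var)
```

In this conjugate setting the Gaussian family contains the tempered posterior itself, so the variational approximation is exact; in general it is only the closest member of $\mathcal{F}$ in Kullback-Leibler divergence.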
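Mixture models
$$P_\theta = P_{p,\theta_1,\dots,\theta_K} = \sum_{j=1}^K p_j q_{\theta_j},$$
prior $\pi$: $p = (p_1, \dots, p_K) \sim \pi_p = \mathcal{D}(\alpha_1, \dots, \alpha_K)$ and the $\theta_j$'s are independent draws from $\pi_\theta$.
Tempered posterior:
$$L_n(\theta)^{\alpha}\,\pi(\theta) \propto \left( \prod_{i=1}^n \sum_{j=1}^K p_j q_{\theta_j}(X_i) \right)^{\alpha} \pi_p(p) \prod_{j=1}^K \pi_\theta(\theta_j).$$
Variational approximation:
$$\tilde\pi_{n,\alpha}(p, \theta) = \rho_p(p) \prod_{j=1}^K \rho_j(\theta_j).$$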
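ELBO maximization for mixtures
Optimization program
$$\min_{\rho = (\rho_p, \rho_1, \dots, \rho_K)} \left\{ -\alpha \sum_{i=1}^n \int \log\left( \sum_{j=1}^K p_j q_{\theta_j}(X_i) \right) \rho(d\theta) + \mathcal{K}(\rho_p, \pi_p) + \sum_{j=1}^K \mathcal{K}(\rho_j, \pi_j) \right\}$$
$$-\log\left( \sum_{j=1}^K p_j q_{\theta_j}(X_i) \right) = \min_{\omega^i \in \mathcal{S}_K} \left\{ -\sum_{j=1}^K \omega^i_j \log\big(p_j q_{\theta_j}(X_i)\big) + \sum_{j=1}^K \omega^i_j \log(\omega^i_j) \right\}$$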
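The variational identity for $-\log(\sum_j p_j q_{\theta_j}(X_i))$ can be checked numerically. The sketch below uses made-up values for the weights and for the component densities at a single observation; the minimizer is $\omega^i_j \propto p_j q_{\theta_j}(X_i)$, a standard fact that the slides do not state explicitly. All names are hypothetical.

```python
import math


def neg_log_mixture(weights, densities):
    """Left-hand side: -log(sum_j p_j q_j)."""
    return -math.log(sum(p * q for p, q in zip(weights, densities)))


def omega_objective(omega, weights, densities):
    """Right-hand side objective for a given omega in the simplex."""
    return sum(
        w * (math.log(w) - math.log(p * q))
        for w, p, q in zip(omega, weights, densities)
        if w > 0.0
    )


def optimal_omega(weights, densities):
    """Minimizer: omega_j proportional to p_j q_j."""
    joint = [p * q for p, q in zip(weights, densities)]
    total = sum(joint)
    return [g / total for g in joint]


p = [0.5, 0.3, 0.2]       # mixture weights (toy values)
q = [0.10, 0.40, 0.05]    # component densities at one observation (toy values)
omega_star = optimal_omega(p, q)
lhs = neg_log_mixture(p, q)
rhs = omega_objective(omega_star, p, q)
```

Any other point of the simplex, for instance the uniform vector, gives a value of the objective at least as large as `lhs`.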
Coordinate Descent algorithm
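The algorithm shown on this slide did not survive extraction. What follows is only a sketch of a coordinate descent scheme for a special case, not the authors' pseudocode: unit-variance Gaussian components, a symmetric Dirichlet prior on $p$ and a $\mathcal{N}(0, V^2)$ prior on each $\theta_j$, with weights $\omega^i$ that do not depend on $\theta$ (which gives an upper bound on the objective). Under these assumptions each block update is conjugate. The function names, defaults and simulated data are all hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma function via upward recurrence and an asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def tempered_cavi(xs, n_components, alpha=0.5, dirichlet=1.0, prior_var=100.0,
                  n_iter=200, seed=0):
    """Coordinate descent for a unit-variance Gaussian mixture (sketch).

    rho_p = Dirichlet(phi), rho_j = N(means[j], variances[j]),
    omega[i] is a point of the simplex attached to observation i.
    """
    rng = random.Random(seed)
    means = rng.sample(xs, n_components)   # initialise at data points
    variances = [1.0] * n_components
    phi = [dirichlet] * n_components
    for _ in range(n_iter):
        # omega step: omega_ij proportional to exp(E[log p_j] + E[log q_{theta_j}(X_i)])
        total_phi = sum(phi)
        e_log_p = [digamma(f) - digamma(total_phi) for f in phi]
        omega = []
        for x in xs:
            scores = [e_log_p[j] - 0.5 * ((x - means[j]) ** 2 + variances[j])
                      for j in range(n_components)]
            top = max(scores)               # subtract the max for numerical stability
            unnorm = [math.exp(s - top) for s in scores]
            total = sum(unnorm)
            omega.append([u / total for u in unnorm])
        # rho step: conjugate updates, with the data term scaled by alpha
        for j in range(n_components):
            resp = sum(w[j] for w in omega)
            phi[j] = dirichlet + alpha * resp
            precision = 1.0 / prior_var + alpha * resp
            means[j] = alpha * sum(w[j] * x for w, x in zip(omega, xs)) / precision
            variances[j] = 1.0 / precision
    return phi, means, variances


# Simulated data: two well-separated clusters (assumed for illustration).
rng = random.Random(1)
xs = [rng.gauss(-3.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
phi, means, variances = tempered_cavi(xs, n_components=2, alpha=0.5)
```

Setting `alpha=1.0` recovers the usual mean-field updates for this model; smaller values of $\alpha$ down-weight the data term in both the Dirichlet and the Gaussian updates.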
Numerical example on Gaussian mixtures
B.-E. Chérief-Abdellatif & P. Alquier (2018). Consistency of Variational Bayes Inference for Estimation and Model Selection in Mixtures. Electronic Journal of Statistics.
Gaussian mixture $\sum_{j=1}^3 p_j \mathcal{N}(\theta_j, 1)$ and Gaussian prior on $\theta_j$. Sample size $n = 1000$; we report the MAE over 10 replications.

| Algo. | $p$ | $\theta_1$ | $\theta_2$ | $\theta_3$ |
|---|---|---|---|---|
| VB, $\alpha = 0.5$ | 0.03 (0.02) | 0.14 (0.30) | 0.38 (1.11) | 0.05 (0.05) |
| VB, $\alpha = 1$ | 0.03 (0.02) | 0.14 (0.21) | 0.36 (0.97) | 0.06 (0.04) |
| EM | 0.03 (0.02) | 0.14 (0.22) | 0.36 (0.97) | 0.06 (0.05) |
Model selection
D. Blei, A. Kucukelbir & J. McAuliffe. Variational inference: A review for statisticians. JASA, 2017.
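Assume that we have a countable number of models, and define $\tilde\pi^K_{n,\alpha}$, a variational approximation of the tempered posterior in model $K$:
ELBO maximization program
$$\tilde\pi^K_{n,\alpha} = \arg\max_{\rho_K \in \mathcal{F}_K} \left\{ \alpha \int \ell_n(\theta_K)\,\rho_K(d\theta_K) - \mathcal{K}(\rho_K, \pi_K) \right\}$$
ELBO
$$\mathrm{ELBO}(K) = \alpha \int \ell_n(\theta_K)\,\tilde\pi^K_{n,\alpha}(d\theta_K) - \mathcal{K}(\tilde\pi^K_{n,\alpha}, \pi_K)$$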
ELBO criterion
Model selection criterion
$$\hat K = \arg\max_{K \ge 1} \left\{ \mathrm{ELBO}(K) - \log\left( \frac{1}{b_K} \right) \right\}$$
B.-E. Chérief-Abdellatif. Consistency of ELBO maximization for model selection. Proceedings of AABI 2018.
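The criterion is a penalized argmax, which the following sketch spells out. The ELBO values are invented for illustration, and the weights $b_K = 2^{-K}$ are the choice that appears later in the talk on the misspecification slide; `select_model` is a hypothetical helper name.

```python
import math


def select_model(elbo_by_k, log_prior_weight):
    """Return the K maximising ELBO(K) - log(1 / b_K) = ELBO(K) + log(b_K)."""
    return max(elbo_by_k, key=lambda k: elbo_by_k[k] + log_prior_weight(k))


# Hypothetical ELBO values for K = 1..4 (made up for illustration).
elbo_by_k = {1: -1520.3, 2: -1431.8, 3: -1431.5, 4: -1431.4}
k_hat = select_model(elbo_by_k, lambda k: -k * math.log(2.0))  # b_K = 2^{-K}
k_unpenalized = max(elbo_by_k, key=elbo_by_k.get)
```

On these made-up numbers the raw ELBO keeps creeping up with $K$, while the penalty $\log(1/b_K) = K \log 2$ stops the selection at the point where the gains become negligible.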
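Technical condition for posterior concentration
If the model is well-specified ($\exists \theta_0 \in \Theta$, $P_{\theta_0} = P_0$):
Prior mass condition for concentration of tempered posteriors
The rate $(r_n)$ is such that
$$\pi[B(r_n)] \ge e^{-n r_n}$$
where $B(r) = \{\theta \in \Theta : \mathcal{K}(P_{\theta_0}, P_\theta) \le r\}$.
Prior mass condition for concentration of Variational Bayes
The rate $(r_n)$ is such that there exists $\rho_n \in \mathcal{F}$ such that
$$\int \mathcal{K}(P_{\theta_0}, P_\theta)\,\rho_n(d\theta) \le r_n \quad \text{and} \quad \mathcal{K}(\rho_n, \pi) \le n r_n.$$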
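As a worked instance of the Variational Bayes condition, assume (this example is not in the slides) the model $X \sim \mathcal{N}(\theta, 1)$, a prior $\mathcal{N}(0, s^2)$, and a Gaussian family $\mathcal{F}$. Then $\mathcal{K}(P_{\theta_0}, P_\theta) = (\theta - \theta_0)^2 / 2$, and the candidate $\rho_n = \mathcal{N}(\theta_0, 1/n)$ makes both quantities explicit. The sketch returns the smallest $r_n$ that this particular candidate certifies; the function name and default values are hypothetical.

```python
import math


def vb_prior_mass_rate(n, theta0=1.0, prior_var=1.0):
    """Smallest r_n certified by the candidate rho_n = N(theta0, 1/n)."""
    v = 1.0 / n
    # Integral of (theta - theta0)^2 / 2 under rho_n.
    expected_kl_to_truth = 0.5 * v
    # KL( N(theta0, v) || N(0, prior_var) ).
    kl_to_prior = 0.5 * (v / prior_var + theta0 ** 2 / prior_var - 1.0
                         - math.log(v / prior_var))
    return max(expected_kl_to_truth, kl_to_prior / n)


rates = {n: vb_prior_mass_rate(n) for n in (100, 1000, 10000)}
```

For this candidate the second constraint is the binding one, and the certified rate behaves like $\log(n) / (2n)$ for large $n$.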
Consistency of the variational approximation
P. Alquier & J. Ridgway (2017). Concentration of Tempered Posteriors and of their Variational Approximations. Preprint arXiv:1706.09293.
Theorem (Alquier, Ridgway)
Under the prior mass condition, for any $\alpha \in (0, 1)$,
$$\mathbb{E}\left[ \int D_\alpha(P_\theta, P_0)\,\tilde\pi_{n,\alpha}(d\theta) \right] \le \frac{1+\alpha}{1-\alpha}\, r_n.$$
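The slides do not define $D_\alpha$. Assuming it is the $\alpha$-Rényi divergence, $D_\alpha(P, R) = \frac{1}{\alpha - 1} \log \int p^\alpha r^{1-\alpha}$, as in the cited preprint, the sketch below evaluates it by numerical integration for two unit-variance Gaussians and compares the result with the closed form $\alpha(\mu_P - \mu_R)^2 / 2$. The function names, integration range and step count are arbitrary illustrative choices.

```python
import math


def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def renyi_divergence(mean_p, mean_r, alpha, lo=-30.0, hi=30.0, steps=60000):
    """D_alpha(P, R) = log(int p^alpha r^(1 - alpha)) / (alpha - 1), midpoint rule."""
    h = (hi - lo) / steps
    integral = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        integral += normal_pdf(x, mean_p) ** alpha * normal_pdf(x, mean_r) ** (1.0 - alpha)
    return math.log(integral * h) / (alpha - 1.0)


alpha = 0.5
numeric = renyi_divergence(0.0, 1.0, alpha)
closed_form = alpha * (0.0 - 1.0) ** 2 / 2.0  # unit-variance Gaussians
```

For $\alpha \in (0, 1)$ this divergence is bounded above by the Kullback-Leibler divergence, which equals $(\mu_P - \mu_R)^2 / 2$ in this example.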
Consistency for mixture models
B.-E. Chérief-Abdellatif & P. Alquier. Consistency of Variational Bayes Inference for Estimation and Model Selection in Mixtures. Electronic Journal of Statistics, 2018.
Theorem (C.-A., Alquier)
Choose $\frac{2}{K} \le \alpha_j \le 1$ and assume that estimation in $(q_\theta)$ (without mixture) is possible at rate $r_n$. Then
$$\mathbb{E}\left[ \int D_\alpha\big(P_{p,\theta_1,\dots,\theta_K}, P_{p^0,\theta^0_1,\dots,\theta^0_K}\big)\,\tilde\pi_{n,\alpha}(d\theta) \right] \le \frac{1+\alpha}{1-\alpha}\, 2K r_n.$$
Consistency of the true approximation
P. Alquier & J. Ridgway (2017). Concentration of Tempered Posteriors and of their Variational Approximations. Preprint arXiv:1706.09293.
Theorem (Alquier, Ridgway)
If there is a true model ($\exists K_0$, $\exists \theta_0 \in \Theta_{K_0}$, $P_{\theta_0} = P_0$), then under the prior mass condition, for any $\alpha \in (0, 1)$,
$$\mathbb{E}\left[ \int D_\alpha(P_\theta, P_0)\,\tilde\pi^{K_0}_{n,\alpha}(d\theta) \right] \le \frac{1+\alpha}{1-\alpha}\, r_n.$$
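Consistency of the selected approximation
B.-E. Chérief-Abdellatif. Consistency of ELBO maximization for model selection. Proceedings of AABI 2018.
Theorem (C.-A.)
If there is a true model ($\exists K_0$, $\exists \theta_0 \in \Theta_{K_0}$, $P_{\theta_0} = P_0$), then under the prior mass condition, for any $\alpha \in (0, 1)$,
$$\mathbb{E}\left[ \int D_\alpha(P_\theta, P_0)\,\tilde\pi^{\hat K}_{n,\alpha}(d\theta) \right] \le \frac{1+\alpha}{1-\alpha}\, r_n + \frac{\log\left( \frac{1}{b_{K_0}} \right)}{n(1-\alpha)}.$$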
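Robustness to misspecification (Gaussian mixtures)
The true distribution $P_0$ is such that $\mathbb{E}|X| < +\infty$.
Let $L \ge 1$, $b_K = 2^{-K}$, $\pi_K = \mathcal{D}_K(\alpha_1, \dots, \alpha_K) \otimes \mathcal{N}(0, V^2)^{\otimes n}$ and
$$r_{n,K} = \left[ \frac{8K \log(nK)}{n} \vee \left( \frac{8K \log(nV)}{n} + \frac{8KL^2}{nV^2} \right) \right] + \frac{K \log(2)}{n(1-\alpha)}.$$
Theorem
For any $\alpha \in (0, 1)$,
$$\mathbb{E}\left[ \int D_\alpha(P_\theta, P_0)\,\tilde\pi^{\hat K}_{n,\alpha}(d\theta) \right] \le \inf_{K \ge 0} \left\{ \frac{\alpha}{1-\alpha} \inf_{\theta^* \in \mathcal{S}_K \times [-L, L]^K} \mathcal{K}(P_0, P_{\theta^*}) + \frac{1+\alpha}{1-\alpha}\, r_{n,K} \right\}.$$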
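Applications
If there is a true model ($\exists K_0$, $\exists \theta_0 \in \Theta_{K_0}$, $P_{\theta_0} = P_0$):
Gaussian mixtures: $\theta = (p, (m_1, \sigma_1^2), \dots, (m_K, \sigma_K^2))$
$$\mathbb{E}\left[ \int D_\alpha(P_\theta, P_0)\,\tilde\pi^{\hat K}_{n,\alpha}(d\theta) \right] = O\left( \frac{K_0 \log(nK_0)}{n} \right)$$
Probabilistic PCA: $\theta \in \mathbb{R}^{d \times K}$
$$\mathbb{E}\left[ \int D_\alpha(P_\theta, P_0)\,\tilde\pi^{\hat K}_{n,\alpha}(d\theta) \right] = O\left( \frac{dK_0 \log(dn)}{n} \right)$$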
Efficient algorithms
Question: are there efficient algorithms to (provably) compute $\tilde{\pi}_{n,\alpha}$?
B.-E. Chérief-Abdellatif, P. Alquier & M. E. Khan. A Generalization Bound for Online Variational Inference. Preprint arXiv, 2018.
Parametric variational approximation:
$\mathcal{F} = \{ q_\mu, \mu \in M \}$.
Objective: propose a way to update $\mu_t \to \mu_{t+1}$ so that $q_{\mu_t}$ leads to performance similar to that of the tempered posterior...
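One concrete instance of such a parametric family, assumed here purely for illustration, is the mean-field Gaussian family, where µ = (m, s) collects a mean vector and a vector of standard deviations. The sketch below computes the Kullback-Leibler term between a member of this family and a Gaussian prior in closed form; the function name `kl_diag_gaussians` is hypothetical.

```python
import numpy as np


def kl_diag_gaussians(m, s, m0, s0):
    """KL( N(m, diag(s^2)) || N(m0, diag(s0^2)) ), summed over coordinates."""
    m, s, m0, s0 = (np.asarray(a, dtype=float) for a in (m, s, m0, s0))
    per_coord = np.log(s0 / s) + (s**2 + (m - m0) ** 2) / (2.0 * s0**2) - 0.5
    return float(np.sum(per_coord))


# q_mu with mu = (mean, scale), compared with a standard normal prior (assumption).
mu_mean, mu_scale = np.array([1.0, 0.0]), np.array([1.0, 1.0])
kl_to_prior = kl_diag_gaussians(mu_mean, mu_scale, np.zeros(2), np.ones(2))
```

With the scales held fixed, this divergence is a quadratic function of the mean m, which is one simple way to obtain a strongly convex regularizer in µ.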
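Some online strategies
Algorithm 3: SVA (Sequential Variational Approximation)
1: for $t = 1, 2, \dots$ do
2:   $\theta_t = \mathbb{E}_{\theta \sim q_{\mu_t}}[\theta]$,
3:   $x_t$ revealed, update
$$\mu_{t+1} = \arg\min_{\mu \in M} \left[ \mu^T \nabla_\mu \sum_{i=1}^{t} \mathbb{E}_{\theta \sim q_\mu}[-\log p_\theta(x_i)] + \frac{K(q_\mu, \pi)}{\alpha} \right].$$
4: end for
SVB (Streaming Variational Bayes) has update
$$\mu_{t+1} = \arg\min_{\mu \in M} \left[ \mu^T \nabla_\mu \mathbb{E}_{\theta \sim q_\mu}[-\log p_\theta(x_t)] + \frac{K(q_\mu, q_{\mu_t})}{\alpha} \right].$$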
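The two updates can be sketched in a toy setting where both arg min problems have closed forms. Everything specific here is an assumption made for illustration and is not stated on the slides: a Gaussian location model p_θ = N(θ, I), a fixed-variance Gaussian family q_µ = N(µ, v I), a prior π = N(0, v0 I), and gradients evaluated at the past iterates µ_i. The names `sva` and `svb` are hypothetical. The quadratic loss used here is not globally Lipschitz, so this toy example does not claim to satisfy the assumptions of a regret bound.

```python
import numpy as np


def grad_expected_nll(mu, x):
    """Gradient in mu of E_{theta ~ N(mu, v I)}[-log p_theta(x)] for p_theta = N(theta, I).

    The expectation equals (||x - mu||^2 + d * v) / 2 + const, so the gradient is mu - x.
    """
    return mu - x


def sva(xs, alpha, prior_var=1.0):
    """SVA sketch: K(q_mu, pi) = ||mu||^2 / (2 * prior_var) + const, hence
    argmin_mu [ mu^T G_t + K(q_mu, pi) / alpha ] = -alpha * prior_var * G_t."""
    mu = np.zeros(xs.shape[1])
    grad_sum = np.zeros_like(mu)
    path = [mu.copy()]
    for x in xs:
        grad_sum += grad_expected_nll(mu, x)  # gradient taken at the current iterate
        mu = -alpha * prior_var * grad_sum
        path.append(mu.copy())
    return np.array(path)


def svb(xs, alpha, q_var=1.0):
    """SVB sketch: K(q_mu, q_mu_t) = ||mu - mu_t||^2 / (2 * q_var), hence
    the update is a gradient step mu_t - alpha * q_var * g_t."""
    mu = np.zeros(xs.shape[1])
    path = [mu.copy()]
    for x in xs:
        mu = mu - alpha * q_var * grad_expected_nll(mu, x)
        path.append(mu.copy())
    return np.array(path)


rng = np.random.default_rng(0)
xs = rng.normal(loc=2.0, scale=1.0, size=(500, 1))  # synthetic data stream
sva_path = sva(xs, alpha=0.05)
svb_path = svb(xs, alpha=0.05)
```

In this Gaussian setting the prediction θ_t is simply µ_t. When `prior_var` equals `q_var` and both runs start from µ_1 = 0, the two recursions coincide, because the SVA iterate satisfies µ_{t+1} − µ_t = −α v0 g_t; the two strategies generally differ for other families or constrained parameter sets.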
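A regret bound for SVA
Theorem (C.-A., Alquier & Khan)
Assume that $\mu \mapsto \mathbb{E}_{\theta \sim q_\mu}[-\log p_\theta(x_t)]$ is $L$-Lipschitz and convex (this is for example the case as soon as the log-likelihood is concave in $\theta$ and $L$-Lipschitz, and $\mu$ is a location-scale parameter).
Assume that $\mu \mapsto K(q_\mu, \pi)$ is $\gamma$-strongly convex. Then SVA satisfies:
$$\sum_{t=1}^{T} [-\log p_{\theta_t}(x_t)] \le \inf_{\mu \in M} \left\{ \mathbb{E}_{\theta \sim q_\mu}\left[ \sum_{t=1}^{T} [-\log p_\theta(x_t)] \right] + \frac{\alpha L^2 T}{\gamma} + \frac{K(q_\mu, \pi)}{\alpha} \right\}.$$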
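A short worked computation, not on the slide, shows how the two remainder terms trade off in α. For a fixed comparator µ, minimizing them over α gives:

```latex
\[
\min_{\alpha > 0} \left\{ \frac{\alpha L^2 T}{\gamma} + \frac{K(q_\mu, \pi)}{\alpha} \right\}
= 2L \sqrt{\frac{T \, K(q_\mu, \pi)}{\gamma}},
\qquad \text{attained at } \alpha^\star = \frac{1}{L} \sqrt{\frac{\gamma \, K(q_\mu, \pi)}{T}}.
\]
```

Since this minimizer depends on µ, a µ-free choice such as α = c/√T (with c > 0 a constant introduced here for illustration) gives a remainder of (cL²/γ + K(q_µ, π)/c)√T, which is still of order √T.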
Thank you!