Deep Learning, Neural Networks and Kernel
Machines
Johan Suykens
KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Email: [email protected]
https://www.esat.kuleuven.be/stadius/
DeepLearn 2019, Warsaw, Poland, July 2019
Part II: RBMs, kernel machines and deep learning
• Restricted Boltzmann Machines (RBM)
• Deep Boltzmann Machines (Deep BM)
• Restricted Kernel Machines (RKM)
• Deep RKM (see Part III)
• Generative RKM
Generative models: RBM, GAN and deep learning
Restricted Boltzmann Machines (RBM)
• Markov random field, bipartite graph, stochastic binary units
  Layer of visible units v and layer of hidden units h
  No hidden-to-hidden connections

• Energy:

  E(v, h; θ) = −v^T W h − b^T v − a^T h   with θ = {W, b, a}

  Joint distribution:

  P(v, h; θ) = (1/Z(θ)) exp(−E(v, h; θ))

  with partition function Z(θ) = Σ_v Σ_h exp(−E(v, h; θ))
[Hinton, Osindero, Teh, Neural Computation 2006]
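As a sanity check on these definitions, the energy and the joint distribution can be computed exactly for a toy RBM by enumerating all binary configurations. A minimal numpy sketch; the model size and parameters are hypothetical, and brute-force computation of Z is only feasible for tiny models:

```python
import numpy as np

# Hypothetical tiny RBM: 3 visible and 2 hidden binary units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 2))
b = np.zeros(3)   # visible biases
a = np.zeros(2)   # hidden biases

def energy(v, h):
    """E(v, h; theta) = -v^T W h - b^T v - a^T h."""
    return -v @ W @ h - b @ v - a @ h

def states(n):
    """All binary vectors of length n."""
    return [np.array([(i >> k) & 1 for k in range(n)], dtype=float)
            for i in range(2 ** n)]

# Partition function Z(theta) by brute-force enumeration (tiny models only).
Z = sum(np.exp(-energy(v, h)) for v in states(3) for h in states(2))

def joint(v, h):
    """P(v, h; theta) = exp(-E(v, h; theta)) / Z(theta)."""
    return np.exp(-energy(v, h)) / Z

# The joint distribution must sum to one over all configurations.
total = sum(joint(v, h) for v in states(3) for h in states(2))
```

For realistic model sizes Z is intractable, which is exactly why the training procedures on the following slides avoid computing it.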
RBM and deep learning
[Figure: an RBM models p(v, h); stacking hidden layers gives a deep model p(v, h^1, h^2, h^3, ...)]
[Hinton et al., 2006; Salakhutdinov, 2015]
Convolutional Deep Belief Networks
Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief
Networks [Lee et al. 2011]
Energy function
• RBM:
  E = −v^T W h

• Deep Boltzmann machine (two layers):
  E = −v^T W^1 h^1 − (h^1)^T W^2 h^2

• Deep Boltzmann machine (three layers):
  E = −v^T W^1 h^1 − (h^1)^T W^2 h^2 − (h^2)^T W^3 h^3
RBM: example on MNIST
MNIST training data:
Generating new images:
source: https://www.kaggle.com/nicw102168/restricted-boltzmann-machine-rbm-on-mnist
RBM training (1)

Thanks to the special bipartite structure, explicit marginalization is possible:

P(v; θ) = (1/Z(θ)) Σ_h exp(−E(v, h; θ)) = (1/Z(θ)) exp(b^T v) Π_j (1 + exp(a_j + Σ_i W_ij v_i))

with v_i ∈ {0, 1}, h_j ∈ {0, 1}.

Conditional distributions:

P(h|v; θ) = Π_j p(h_j|v)   with   p(h_j = 1|v) = σ(Σ_i W_ij v_i + a_j)

and

P(v|h; θ) = Π_i p(v_i|h)   with   p(v_i = 1|h) = σ(Σ_j W_ij h_j + b_i)

with σ the sigmoid activation.
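The factorized conditionals translate directly into code; a minimal numpy sketch (all sizes and parameters below are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 6 visible, 4 hidden units; W has shape (visible, hidden).
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(6, 4))
a = np.zeros(4)   # hidden biases
b = np.zeros(6)   # visible biases

def p_h_given_v(v):
    """p(h_j = 1 | v) = sigma(sum_i W_ij v_i + a_j), factorized over j."""
    return sigmoid(v @ W + a)

def p_v_given_h(h):
    """p(v_i = 1 | h) = sigma(sum_j W_ij h_j + b_i), factorized over i."""
    return sigmoid(W @ h + b)

# One Gibbs half-step: sample h from its conditional given a binary v.
v = rng.integers(0, 2, size=6).astype(float)
ph = p_h_given_v(v)
h = (rng.random(4) < ph).astype(float)
pv = p_v_given_h(h)
```

Because both conditionals factorize, a full Gibbs step is just these two vectorized updates in sequence.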
RBM training (2)
Given observations {v_n}_{n=1}^N, the derivatives of the log-likelihood are

(1/N) Σ_n ∂ log P(v_n; θ) / ∂W_ij = E_{P_data}[v_i h_j] − E_{P_model}[v_i h_j]

(1/N) Σ_n ∂ log P(v_n; θ) / ∂a_j = E_{P_data}[h_j] − E_{P_model}[h_j]

(1/N) Σ_n ∂ log P(v_n; θ) / ∂b_i = E_{P_data}[v_i] − E_{P_model}[v_i]

with

• Data-dependent expectation E_{P_data}[·] (a form of Hebbian learning): an expectation with respect to the data distribution P_data(h, v; θ) = P(h|v; θ) P_data(v), with P_data(v) = (1/N) Σ_n δ(v − v_n) the empirical distribution.

• Model expectation E_{P_model}[·] (unlearning): an expectation with respect to the distribution defined by the model, P(v, h; θ) = (1/Z(θ)) exp(−E(v, h; θ)).
RBM training (3)
Exact maximum likelihood learning is intractable (due to the computation of E_{P_model}[·]). In practice, the Contrastive Divergence (CD) algorithm [Hinton 2002] is used:

ΔW = α (E_{P_data}[v h^T] − E_{P_T}[v h^T])

with α the learning rate and P_T a distribution defined by running a Gibbs chain initialized at the data for T full steps (often T = 1, i.e. CD1, in practice).

CD1 scheme:

1. Start the Gibbs sampler at v^(1) := v_n and generate h^(1) ∼ P(h|v^(1))
2. After obtaining h^(1), generate v^(2) ∼ P(v|h^(1)) (called fantasy data)
3. After obtaining v^(2), generate h^(2) ∼ P(h|v^(2))

with ΔW ∝ v_n h^(1)T − v^(2) h^(2)T.
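The CD1 scheme above can be sketched as a single update function. A minimal numpy sketch; the minibatch, sizes, and learning rate are hypothetical, and the negative phase uses the probabilities P(h = 1 | v^(2)) in place of a sampled h^(2), a common variance-reduction choice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, vdata, alpha, rng):
    """One CD1 step on a minibatch vdata (rows are samples v_n)."""
    # 1. Positive phase: h(1) ~ P(h | v(1) = data)
    ph1 = sigmoid(vdata @ W + a)
    h1 = (rng.random(ph1.shape) < ph1).astype(float)
    # 2. Fantasy data: v(2) ~ P(v | h(1))
    pv2 = sigmoid(h1 @ W.T + b)
    v2 = (rng.random(pv2.shape) < pv2).astype(float)
    # 3. Negative phase: probabilities P(h = 1 | v(2)) instead of sampled h(2)
    ph2 = sigmoid(v2 @ W + a)
    n = vdata.shape[0]
    dW = (vdata.T @ h1 - v2.T @ ph2) / n   # Delta W ∝ v_n h(1)^T - v(2) h(2)^T
    da = (h1 - ph2).mean(axis=0)
    db = (vdata - v2).mean(axis=0)
    return W + alpha * dW, a + alpha * da, b + alpha * db

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(6, 3))
a, b = np.zeros(3), np.zeros(6)
data = rng.integers(0, 2, size=(20, 6)).astype(float)
W1, a1, b1 = cd1_update(W, a, b, data, alpha=0.1, rng=rng)
```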
Deep Boltzmann machine training (1)
Consider a 3-layer Deep BM with energy function [Salakhutdinov 2015]:

E(v, h^1, h^2, h^3; θ) = −v^T W^1 h^1 − (h^1)^T W^2 h^2 − (h^2)^T W^3 h^3

with unknown model parameters θ = {W^1, W^2, W^3}.

The model assigns the following probability to a visible vector v:

P(v; θ) = (1/Z(θ)) Σ_{h^1, h^2, h^3} exp(−E(v, h^1, h^2, h^3; θ))
Deep Boltzmann machine training (2)
For training:

∂ log P(v; θ) / ∂W^1 = E_{P_data}[v (h^1)^T] − E_{P_model}[v (h^1)^T]

∂ log P(v; θ) / ∂W^2 = E_{P_data}[h^1 (h^2)^T] − E_{P_model}[h^1 (h^2)^T]

∂ log P(v; θ) / ∂W^3 = E_{P_data}[h^2 (h^3)^T] − E_{P_model}[h^2 (h^3)^T]

Problem: the conditional distribution over the states of the hidden variables conditioned on the data is no longer factorial. For simplicity and speed one can assume and impose a fully factorized distribution, corresponding to a naive mean-field approximation [Salakhutdinov 2015].
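The naive mean-field approximation amounts to iterating factorized fixed-point updates for the hidden means given the clamped visible vector. A minimal numpy sketch for the 3-layer energy above (all sizes and parameters are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical layer sizes; W1: (d, n1), W2: (n1, n2), W3: (n2, n3).
rng = np.random.default_rng(0)
d, n1, n2, n3 = 8, 5, 4, 3
W1 = rng.normal(scale=0.1, size=(d, n1))
W2 = rng.normal(scale=0.1, size=(n1, n2))
W3 = rng.normal(scale=0.1, size=(n2, n3))
v = rng.integers(0, 2, size=d).astype(float)

# Fully factorized Q(h1, h2, h3) with means mu1, mu2, mu3:
# iterate the mean-field fixed-point updates given the clamped v.
mu1, mu2, mu3 = 0.5 * np.ones(n1), 0.5 * np.ones(n2), 0.5 * np.ones(n3)
for _ in range(50):
    mu1 = sigmoid(v @ W1 + W2 @ mu2)     # layer 1 sees v and mu2
    mu2 = sigmoid(mu1 @ W2 + W3 @ mu3)   # layer 2 sees mu1 and mu3
    mu3 = sigmoid(mu2 @ W3)              # top layer sees mu2
```

Each hidden unit's mean is updated from the means of its neighbors, which is cheap but ignores correlations between hidden units.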
Multimodal Deep Boltzmann Machine
From [Srivastava & Salakhutdinov 2014]
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN) [Goodfellow et al., 2014]
Training of two competing models in a zero-sum game:

• Generator: generates fake examples from random noise
• Discriminator: discriminates between fake examples and real examples
source: https://deeplearning4j.org/generative-adversarial-network
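The zero-sum objective can be written as the value function V(D, G) = E_x[log D(x)] + E_z[log(1 − D(G(z)))], which the discriminator ascends and the generator descends. A toy 1-D illustration of evaluating this value function (the generator, discriminator, and all parameters below are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Toy 1-D setup: real data around 2, generator output around theta.
real = rng.normal(2.0, 0.5, size=1000)
z = rng.normal(0.0, 1.0, size=1000)

def generator(z, theta):
    return z + theta                 # hypothetical one-parameter generator

def discriminator(x, w, c):
    return sigmoid(w * x + c)        # hypothetical logistic discriminator

def value(theta, w, c):
    """V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]."""
    fake = generator(z, theta)
    return (np.mean(np.log(discriminator(real, w, c) + 1e-12))
            + np.mean(np.log(1.0 - discriminator(fake, w, c) + 1e-12)))

# A blind discriminator (w = 0) scores log(1/2) + log(1/2); one that
# separates real (around 2) from fake (around 0) scores strictly higher.
v_blind = value(theta=0.0, w=0.0, c=0.0)
v_sharp = value(theta=0.0, w=4.0, c=-4.0)
```

In actual GAN training both networks update their parameters against this value by stochastic gradients, alternating between the two players.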
GAN: example on MNIST
MNIST training data:
GAN generated examples:
source: https://www.kdnuggets.com/2016/07/mnist-generative-adversarial-model-keras.html
Kernel methods and deep learning
Kernel machines & deep learning
previous approaches:
• kernels for deep learning [Cho & Saul, 2009]
• mathematics of the neural response [Smale et al., 2010]
• deep Gaussian processes [Damianou & Lawrence, 2013]
• convolutional kernel networks [Mairal et al., 2014]
• multi-layer support vector machines [Wiering & Schomaker, 2014]
• other
Kernel machines & deep learning: New Challenges
• new synergies and new foundations between support vector machines & kernel methods and deep learning architectures?

• possible to extend primal and dual model representations (as occurring in SVM and LS-SVM models) from shallow to deep architectures?

• possible to handle deep feedforward neural networks and deep kernel machines within a common setting?

→ new framework:
"Deep Restricted Kernel Machines" [Suykens, Neural Computation, 2017]
https://www.mitpressjournals.org/doi/pdf/10.1162/neco_a_00984
Restricted Kernel Machines
Restricted Kernel Machines (RKM)
Main characteristics:
• Kernel machine interpretations in terms of visible and hidden units (similar to Restricted Boltzmann Machines (RBM))

• Restricted Kernel Machine (RKM) representations for
  – LS-SVM regression/classification
  – Kernel PCA
  – Matrix SVD
  – Parzen-type models
  – other

• Based on the principle of conjugate feature duality (with hidden features corresponding to dual variables)
LS-SVM regression model: classical approach
LS-SVM regression model, given input & output data x_i ∈ R^d, y_i ∈ R:

min_{w,b,e_i}  (1/2) w^T w + (γ/2) Σ_{i=1}^N e_i^2

subject to y_i = w^T φ(x_i) + b + e_i, i = 1, ..., N.

Solution in the Lagrange multipliers α_i:

[ K + I/γ , 1_N ; 1_N^T , 0 ] [ α ; b ] = [ y_{1:N} ; 0 ]

with K_ij = K(x_i, x_j) = φ(x_i)^T φ(x_j), y_{1:N} = [y_1; ...; y_N],

and the model ŷ = Σ_i α_i K(x, x_i) + b.
→ How to achieve a representation with visible and hidden units?
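The dual linear system can be assembled and solved directly. A minimal numpy sketch on toy data (the data set, RBF kernel, and hyperparameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, gamma, sigma2 = 50, 10.0, 0.5          # hypothetical hyperparameters
X = rng.uniform(-3, 3, size=(N, 1))
y = np.sinc(X[:, 0]) + 0.05 * rng.normal(size=N)

def rbf(A, B, s2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s2))

K = rbf(X, X, sigma2)

# Assemble and solve the (N+1) x (N+1) system
# [K + I/gamma, 1_N; 1_N^T, 0] [alpha; b] = [y; 0].
A = np.zeros((N + 1, N + 1))
A[:N, :N] = K + np.eye(N) / gamma
A[:N, N] = 1.0
A[N, :N] = 1.0
sol = np.linalg.solve(A, np.concatenate([y, [0.0]]))
alpha, b = sol[:N], sol[N]

def predict(Xq):
    """yhat(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf(Xq, X, sigma2) @ alpha + b

train_err = np.mean((predict(X) - y) ** 2)
```

The last row of the system enforces Σ_i α_i = 0, which the solution satisfies up to numerical precision.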
Conjugate feature duality
Property. For λ > 0, the following quadratic form property holds:

(1/(2λ)) e^T e ≥ e^T h − (λ/2) h^T h,   ∀ e, h ∈ R^p.

Proof: This is verified by writing the quadratic form as

(1/2) [ e^T h^T ] [ (1/λ)I , I ; I , λI ] [ e ; h ] ≥ 0,   ∀ e, h ∈ R^p.

It is known that

Q = [ A , B ; B^T , C ] ≥ 0

if and only if A > 0 and the Schur complement C − B^T A^{-1} B ≥ 0. This results in the condition (1/2)(λI − I(λI)I) ≥ 0, which holds (with equality).

Note. One has (1/(2λ)) e^T e = max_h (e^T h − (λ/2) h^T h).
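The property and its equality case are easy to verify numerically; a minimal numpy sketch (the values of λ, p, and e are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, p = 1.7, 5                      # hypothetical lambda > 0 and dimension
e = rng.normal(size=p)
lhs = e @ e / (2 * lam)              # (1/(2 lambda)) e^T e

# The bound holds for arbitrary h ...
for _ in range(1000):
    h = rng.normal(size=p)
    assert lhs >= e @ h - (lam / 2) * (h @ h)

# ... and is attained at the maximizer h* = e / lambda.
hstar = e / lam
gap = lhs - (e @ hstar - (lam / 2) * (hstar @ hstar))
```

Setting the gradient of e^T h − (λ/2) h^T h to zero gives h* = e/λ, where the right-hand side equals e^T e/(2λ) exactly; this maximizer is what makes the hidden features h conjugate to the errors e.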
Model: living in two worlds ...
Original model:

ŷ = W^T x + b,   e = y − ŷ
objective J = regularization term Tr(W^T W) + (1/λ) × error term Σ_i e_i^T e_i

↓ apply (1/(2λ)) e^T e ≥ e^T h − (λ/2) h^T h

New representation:

ŷ = Σ_j h_j x_j^T x + b

obtain J ≥ J(h_i, W, b); solution from the stationary points of J:
∂J/∂h_i = 0,   ∂J/∂W = 0,   ∂J/∂b = 0
Model: living in two worlds ...
Original model:

ŷ = W^T φ(x) + b,   e = y − ŷ
objective J = regularization term Tr(W^T W) + (1/λ) × error term Σ_i e_i^T e_i

↓ apply (1/(2λ)) e^T e ≥ e^T h − (λ/2) h^T h

New representation:

ŷ = Σ_j h_j K(x_j, x) + b

obtain J ≥ J(h_i, W, b); solution from the stationary points of J:
∂J/∂h_i = 0,   ∂J/∂W = 0,   ∂J/∂b = 0
Simplest example: line fitting
Given data: {(x_i, y_i)}_{i=1}^N, with x_i, y_i ∈ R

[Figure: scatter plot of the data with a fitted line]

Linear model:

ŷ = wx + b,   e = y − ŷ

RKM representation:

ŷ = Σ_i h_i x_i x + b

3 visible units: v = [x; 1; −y]
1 hidden unit: h ∈ R

[Figure: RKM with visible units v connected to a single hidden unit h]
From LS-SVM to the RKM representation
Multi-output model: ŷ = W^T x + b, e = y − ŷ

Objective in LS-SVM regression (linear case):

J = (η/2) Tr(W^T W) + (1/(2λ)) Σ_{i=1}^N e_i^T e_i   s.t. e_i = y_i − W^T x_i − b, ∀i

  ≥ Σ_{i=1}^N e_i^T h_i − (λ/2) Σ_{i=1}^N h_i^T h_i + (η/2) Tr(W^T W)   s.t. e_i = y_i − W^T x_i − b, ∀i

  = Σ_{i=1}^N (y_i^T − x_i^T W − b^T) h_i − (λ/2) Σ_{i=1}^N h_i^T h_i + (η/2) Tr(W^T W)  =: J(h_i, W, b)

  = R_RKM^train − (λ/2) Σ_{i=1}^N h_i^T h_i + (η/2) Tr(W^T W)
Connection between RKM and RBM
• RKM & RBM: interpretation in terms of visible and hidden units
• RKM: energy form as in RBM:

R_RKM^train = Σ_{i=1}^N R_RKM(v_i, h_i) = −Σ_{i=1}^N (x_i^T W h_i + b^T h_i − y_i^T h_i) = Σ_{i=1}^N e_i^T h_i

with R_RKM(v, h) = −v^T W h = −(x^T W h + b^T h − y^T h) = e^T h.

• Conjugate feature duality: the hidden features h_i are conjugated to the e_i and serve as dual variables.
From LS-SVM to RKM representation (2)
• Stationary points of J(h_i, W, b) (nonlinear case, feature map φ(·)):

∂J/∂h_i = 0 ⇒ y_i = W^T φ(x_i) + b + λ h_i, ∀i

∂J/∂W = 0 ⇒ W = (1/η) Σ_i φ(x_i) h_i^T

∂J/∂b = 0 ⇒ Σ_i h_i = 0.

• Solution in h_i and b with positive definite kernel K(x_i, x_j) = φ(x_i)^T φ(x_j):

[ (1/η)K + λ I_N , 1_N ; 1_N^T , 0 ] [ H^T ; b^T ] = [ Y^T ; 0 ]

with K = [K(x_i, x_j)], H = [h_1 ... h_N], Y = [y_1 ... y_N].
From LS-SVM to RKM representation (3)
[Figure: RKM diagram linking x → φ(x) → hidden units h, with output ŷ, target y and error e]

Note: φ(x) can be multi-layered; visible units: [φ(x); 1; −y].
Conjugate feature duality gives primal and dual model representations:

(P)_RKM : ŷ = W^T φ(x) + b

(D)_RKM : ŷ = (1/η) Σ_j h_j K(x_j, x) + b.

(primal: large N, small d) versus (dual: large d, small N)
Kernel principal component analysis (KPCA)
[Figure: toy data set projected with linear PCA (left) and kernel PCA with an RBF kernel (right)]

Kernel PCA [Schölkopf et al., 1998]: take the eigenvalue decomposition of the kernel matrix

[ K(x_1, x_1) ... K(x_1, x_N) ]
[     ...            ...      ]
[ K(x_N, x_1) ... K(x_N, x_N) ]

(applications in dimensionality reduction and denoising)
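Kernel PCA is a few lines once the kernel matrix is centered; a minimal numpy sketch (the toy data and kernel parameters are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma2, s = 100, 0.5, 2               # hypothetical sizes and kernel width
# Toy data on a noisy circle.
t = rng.uniform(0, 2 * np.pi, N)
X = np.stack([np.cos(t), np.sin(t)], axis=1) + 0.05 * rng.normal(size=(N, 2))

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * sigma2))

# Center in feature space: Kc = (I - 11^T/N) K (I - 11^T/N).
C = np.eye(N) - np.ones((N, N)) / N
Kc = C @ K @ C

# Kernel PCA = eigendecomposition of the centered kernel matrix.
eigvals, eigvecs = np.linalg.eigh(Kc)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Projections of the training points onto the top s components.
scores = eigvecs[:, :s] * np.sqrt(np.maximum(eigvals[:s], 0))
```

Centering in feature space is done entirely through the kernel matrix, so the feature map itself never has to be computed.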
Kernel PCA: classical LS-SVM approach
• Primal problem [Suykens et al., 2002]:

min_{w,b,e}  (1/2) w^T w − (1/(2γ)) Σ_{i=1}^N e_i^2   s.t. e_i = w^T φ(x_i) + b, i = 1, ..., N.

• The dual problem corresponds to kernel PCA:

Ω^(c) α = λ α   with λ = 1/γ

with Ω^(c)_ij = (φ(x_i) − μ_φ)^T (φ(x_j) − μ_φ) the centered kernel matrix and μ_φ = (1/N) Σ_{i=1}^N φ(x_i).

• Interpretation:
1. pool of candidate components (objective function equals zero)
2. select relevant components
From KPCA to RKM representation
Model:

e = W^T φ(x)
objective J = regularization term Tr(W^T W) − (1/λ) × variance term Σ_i e_i^T e_i

↓ apply −(1/(2λ)) e^T e ≤ −e^T h + (λ/2) h^T h

RKM representation:

e = Σ_j h_j K(x_j, x)

obtain J ≤ J(h_i, W); solution from the stationary points of J:
∂J/∂h_i = 0,   ∂J/∂W = 0
From KPCA to RKM representation (2)
• Objective:

J = (η/2) Tr(W^T W) − (1/(2λ)) Σ_{i=1}^N e_i^T e_i   s.t. e_i = W^T φ(x_i), ∀i

  ≤ −Σ_{i=1}^N e_i^T h_i + (λ/2) Σ_{i=1}^N h_i^T h_i + (η/2) Tr(W^T W)   s.t. e_i = W^T φ(x_i), ∀i

  = −Σ_{i=1}^N φ(x_i)^T W h_i + (λ/2) Σ_{i=1}^N h_i^T h_i + (η/2) Tr(W^T W)  =: J

• Stationary points of J(h_i, W):

∂J/∂h_i = 0 ⇒ W^T φ(x_i) = λ h_i, ∀i

∂J/∂W = 0 ⇒ W = (1/η) Σ_i φ(x_i) h_i^T
From KPCA to RKM representation (3)
• Elimination of W gives the eigenvalue decomposition:

(1/η) K H^T = H^T Λ

where H = [h_1 ... h_N] ∈ R^{s×N} and Λ = diag{λ_1, ..., λ_s} with s ≤ N.

• Primal and dual model representations:

(P)_RKM : e = W^T φ(x)

(D)_RKM : e = (1/η) Σ_j h_j K(x_j, x).
Singular value decomposition
• Objective: given x_i, z_j the row and column data of a (non-square) matrix:

J = −(η/2) Tr(V^T W) + (1/(2λ)) Σ_{i=1}^N e_i^T e_i + (1/(2λ)) Σ_{j=1}^M r_j^T r_j
    s.t. e_i = W^T φ(x_i), ∀i  and  r_j = V^T ψ(z_j), ∀j

• Primal and dual representations (relates to non-symmetric kernels):

(P)_RKM : e = W^T φ(x),  r = V^T ψ(z)

(D)_RKM : e = (1/η) Σ_j h_j^r ψ(z_j)^T φ(x),  r = (1/η) Σ_i h_i^e φ(x_i)^T ψ(z)
Kernel probability mass function estimation
• Objective:

J = Σ_{i=1}^N (p_i − φ(x_i)^T w) h_i − Σ_{i=1}^N p_i + (η/2) w^T w

• Primal and dual representations:

(P)_RKM : p_i = w^T φ(x_i)

(D)_RKM : p_i = (1/η) Σ_j K(x_j, x_i)
Deep Restricted Kernel Machines
Deep RKM: example
[Figure: deep RKM chaining x → φ_1(x) → h^(1) → φ_2(h^(1)) → h^(2) → φ_3(h^(2)) → h^(3), with errors e^(1), e^(2), e^(3) and output y]

Deep RKM: KPCA + KPCA + LS-SVM
Coupling of the RKMs by taking the sum of the objectives:

J_deep = J_1 + J_2 + J_3
Generative kernel PCA
RKM objective for training and generating (1)
• RBM energy function:

E(v, h; θ) = −v^T W h − c^T v − a^T h

with model parameters θ = {W, c, a}

• RKM objective function:

J(v, h, W) = −v^T W h + (λ/2) h^T h + (1/2) v^T v + (η/2) Tr(W^T W)

Training: clamp v → J_train(h, W)
Generating: clamp h, W → J_gen(v)
[Schreurs & Suykens, ESANN 2018]
RKM objective for training and generating (2)
• Training: (clamp v)

J_train(h_i, W) = −Σ_{i=1}^N v_i^T W h_i + (λ/2) Σ_{i=1}^N h_i^T h_i + (η/2) Tr(W^T W)

Stationary points:

∂J_train/∂h_i = 0 ⇒ W^T v_i = λ h_i, ∀i

∂J_train/∂W = 0 ⇒ W = (1/η) Σ_{i=1}^N v_i h_i^T

Elimination of W:

(1/η) K H^T = H^T Δ,

where H = [h_1, ..., h_N] ∈ R^{s×N}, Δ = diag{λ_1, ..., λ_s} with s ≤ N the number of selected components, and K_ij = v_i^T v_j the kernel matrix elements.
RKM objective for training and generating (3)
• Generating: (clamp h, W)

Estimate the distribution p(h) from the h_i, i = 1, ..., N (or assume it normal). Obtain a new value h⋆ and generate v⋆ from

J_gen(v⋆) = −v⋆^T W h⋆ + (1/2) v⋆^T v⋆

Stationary points: ∂J_gen/∂v⋆ = 0, which gives

v⋆ = W h⋆
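The train-then-generate procedure can be sketched end to end for a linear kernel: train via the eigendecomposition, form W from the hidden features, sample h⋆ from a fitted normal distribution, and set v⋆ = W h⋆. A minimal numpy sketch (the toy data, s, and η are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, s, eta = 200, 5, 2, 1.0            # hypothetical sizes
# Toy data with two dominant directions plus small noise.
latent = rng.normal(size=(N, s))
mix = rng.normal(size=(s, d))
V = latent @ mix + 0.05 * rng.normal(size=(N, d))

# Training with a linear kernel K_ij = v_i^T v_j: (1/eta) K H^T = H^T Delta.
K = V @ V.T
eigvals, eigvecs = np.linalg.eigh(K / eta)
top = np.argsort(eigvals)[::-1][:s]
lam = eigvals[top]
Ht = eigvecs[:, top]                     # H^T: columns are hidden features
W = V.T @ Ht / eta                       # W = (1/eta) sum_i v_i h_i^T

# Generating: fit a normal distribution on the hidden features,
# sample h*, and set v* = W h* (stationary point of J_gen).
mu, cov = Ht.mean(axis=0), np.cov(Ht.T)
hstar = rng.multivariate_normal(mu, cov)
vstar = W @ hstar
```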
Dimensionality reduction and denoising: linear case
• Given training data v_i = x_i with X ∈ R^{d×N}, obtain hidden features H ∈ R^{s×N}:

X̂ = W H = ((1/η) Σ_{i=1}^N x_i h_i^T) H = (1/η) X H^T H

• Reconstruction error: ‖X − X̂‖^2

[Figure: encoder-decoder view x_i → G(·) → h_i → F(·) → x̂_i]
Dimensionality reduction and denoising: nonlinear case (1)
• A new datapoint x⋆ is generated from h⋆ by

φ(x⋆) = W h⋆ = ((1/η) Σ_{i=1}^N φ(x_i) h_i^T) h⋆

• Multiplying both sides by φ(x_j)^T gives:

K(x_j, x⋆) = (1/η) (Σ_{i=1}^N K(x_j, x_i) h_i^T) h⋆

On the training data:

Ω = (1/η) Ω H^T H

with H ∈ R^{s×N} and Ω_ij = K(x_i, x_j) = φ(x_i)^T φ(x_j).
Dimensionality reduction and denoising: nonlinear case (2)

• Estimated value \hat{x} for x^\star by the kernel smoother:

\hat{x} = \frac{\sum_{j=1}^S K(x_j, x^\star) x_j}{\sum_{j=1}^S K(x_j, x^\star)}

with K(x_j, x^\star) (e.g. an RBF kernel) a scaled similarity between 0 and 1, and S \leq N a design parameter (the S closest points based on the similarity K(x_j, x^\star)).
[Schreurs & Suykens, ESANN 2018]
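A minimal sketch of such a kernel smoother, assuming an RBF similarity exp(−‖x_j − x⋆‖²/(2σ²)) and selecting the S most similar training points (the function name and parameterization are illustrative):

```python
import numpy as np

def kernel_smoother(X, x_star, S=10, sigma2=0.2):
    """Estimate x_hat for x_star as an RBF-weighted average of the
    S most similar training points (columns of X)."""
    sims = np.exp(-np.sum((X - x_star[:, None]) ** 2, axis=0) / (2 * sigma2))
    idx = np.argsort(sims)[::-1][:S]          # S closest points by similarity
    return X[:, idx] @ sims[idx] / sims[idx].sum()

# illustrative data
rng = np.random.default_rng(2)
X = rng.standard_normal((2, 500))
x_hat = kernel_smoother(X, X[:, 0], S=1)      # S = 1 returns the nearest point
print(np.allclose(x_hat, X[:, 0]))            # True: x_star is a training point
```

Since the weights are nonnegative and sum to one, \hat{x} always lies in the convex hull of the S selected training points, which is what yields the denoising behaviour.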
Example: denoising

Synthetic data sets:

[Figure: scatter plots of the noisy data and the denoised result in the (X1, X2) plane]

X \in \mathbb{R}^{2 \times 500} (d = 2, N = 500)
Kernel PCA using an RBF kernel with \sigma^2 = 1 (left: s = 2; right: s = 8)
Kernel smoother: S = 100 closest points, \sigma^2 = 0.2
Example: generating new data
From MNIST data:
Training data: 50 images per digit; kernel PCA (left: s = 20; right: s = 50)
Normal distribution fitted on the h_i, used to generate h^\star
Kernel smoother: (left) S = 10 (digits 0); (right) S = 100 (digits 0,8)
[Schreurs & Suykens, ESANN 2018]
Towards explainable AI
Understanding the role of the hidden units:
[Figure: scatter plot of digits 0, 1, 6 in the (H1, H2) latent plane, together with images generated while varying a single latent component h(1,1), h(1,2) or h(1,3) from negative to positive values]
[figures by Joachim Schreurs]
Tensor-based RKM for Multi-view KPCA

\min \; \langle W, W \rangle - \sum_{i=1}^N \langle \Phi^{(i)}, W \rangle h_i + \lambda \sum_{i=1}^N h_i^2 \quad {\rm with} \quad \Phi^{(i)} = \varphi^{[1]}(x_i^{[1]}) \otimes \cdots \otimes \varphi^{[V]}(x_i^{[V]})
[Houthuys & Suykens, ICANN 2018]
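Since \langle a \otimes b, c \otimes d \rangle = (a^T c)(b^T d), the inner products \langle \Phi^{(i)}, \Phi^{(j)} \rangle factor into a product of per-view kernels, i.e. the joint kernel matrix is the entrywise (Hadamard) product of the per-view kernel matrices. A small numerical check of this identity (random feature maps, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
N, V = 6, 3
feats = [rng.standard_normal((4, N)) for _ in range(V)]   # phi^[v](x_i^[v]) as columns

def Phi(i):
    """Phi^(i) = phi^[1](x_i^[1]) (x) ... (x) phi^[V](x_i^[V])."""
    out = feats[0][:, i]
    for v in range(1, V):
        out = np.kron(out, feats[v][:, i])
    return out

# <Phi^(i), Phi^(j)> equals the product of the per-view kernel values
i, j = 1, 4
lhs = Phi(i) @ Phi(j)
rhs = np.prod([feats[v][:, i] @ feats[v][:, j] for v in range(V)])
print(np.allclose(lhs, rhs))   # True
```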
Generative RKM (1)

The objective

J_{\rm train}(h_i, V, U) = \sum_{i=1}^N \left( -\varphi_1(x_i)^T V h_i - \varphi_2(y_i)^T U h_i + \frac{\lambda_i}{2} h_i^T h_i \right) + \frac{\eta_1}{2} {\rm Tr}(V^T V) + \frac{\eta_2}{2} {\rm Tr}(U^T U)

results for training in the eigenvalue problem

\left( \frac{1}{\eta_1} K_1 + \frac{1}{\eta_2} K_2 \right) H^T = H^T \Lambda

with H = [h_1, \ldots, h_N] and kernel matrices K_1, K_2 related to \varphi_1, \varphi_2.
[Pandey, Schreurs & Suykens, 2019, arXiv:1906.08144]
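A minimal numerical sketch of this two-view training step, assuming linear feature maps so that K_1 and K_2 can be formed explicitly (sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, s = 40, 5
eta1, eta2 = 1.0, 2.0

# two views of the same N samples (illustrative linear feature maps)
X = rng.standard_normal((8, N))
Y = rng.standard_normal((3, N))
K1, K2 = X.T @ X, Y.T @ Y                      # per-view kernel matrices

M = K1 / eta1 + K2 / eta2                      # (1/eta1) K1 + (1/eta2) K2
eigvals, eigvecs = np.linalg.eigh(M)
idx = np.argsort(eigvals)[::-1][:s]
H = eigvecs[:, idx].T                          # shared latent codes, H in R^{s x N}
Lam = np.diag(eigvals[idx])

print(np.allclose(M @ H.T, H.T @ Lam))         # True: the eigenvalue problem holds
```

The eigenvectors are shared across the two views, which is what couples x and y through the common latent code h.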
Generative RKM (2)

Generating data is based on a newly generated h^\star and the objective

J_{\rm generate}(\varphi_1(x^\star), \varphi_2(y^\star)) = -\varphi_1(x^\star)^T V h^\star - \varphi_2(y^\star)^T U h^\star + \frac{1}{2} \varphi_1(x^\star)^T \varphi_1(x^\star) + \frac{1}{2} \varphi_2(y^\star)^T \varphi_2(y^\star)

giving

\varphi_1(x^\star) = \frac{1}{\eta_1} \sum_{i=1}^N \varphi_1(x_i) h_i^T h^\star, \qquad \varphi_2(y^\star) = \frac{1}{\eta_2} \sum_{i=1}^N \varphi_2(y_i) h_i^T h^\star.

For generating \hat{x}, \hat{y} one can either work with the kernel smoother or with an explicit feature map given by a feedforward neural network.
[Pandey, Schreurs & Suykens, 2019, arXiv:1906.08144]
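With implicit feature maps, the similarities needed by the kernel smoother follow from multiplying the expression for \varphi_1(x^\star) by \varphi_1(x_j)^T, giving K_1(x_j, x^\star) = (1/\eta_1) \sum_i K_1(x_j, x_i) h_i^T h^\star. A hedged single-view sketch of this generation pipeline (linear kernel for simplicity; the y view is analogous; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, s, eta1 = 40, 4, 1.0
X = rng.standard_normal((2, N))
K1 = X.T @ X

eigvals, eigvecs = np.linalg.eigh(K1 / eta1)
H = eigvecs[:, np.argsort(eigvals)[::-1][:s]].T   # latent codes from training

# sample a new latent code from a normal distribution fitted on the h_i
mu, cov = H.mean(axis=1), np.cov(H)
h_star = rng.multivariate_normal(mu, cov)

# similarities K1(x_j, x*) = (1/eta1) sum_i K1(x_j, x_i) h_i^T h_star
sims = K1 @ H.T @ h_star / eta1

# kernel smoother over the S most similar training points
S = 10
idx = np.argsort(sims)[::-1][:S]
x_hat = X[:, idx] @ sims[idx] / sims[idx].sum()
print(x_hat.shape)    # (2,)
```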
Generative RKM (3)
Train:
Generate:
Generative RKM (4)
[Pandey, Schreurs & Suykens, 2019, arXiv:1906.08144]
Generative RKM (5)
Figure: Image generation using neural networks as feature map:
(left) MNIST; (right) Small-NORB
[Pandey, Schreurs & Suykens, 2019, arXiv:1906.08144]
Generative RKM (6)
Figure: Targeted image generation through corresponding latent variable.
[Pandey, Schreurs & Suykens, 2019, arXiv:1906.08144]
Conclusions
• From RBM to deep BM
• From RKM to deep RKM
• RKM and RBM representation: visible and hidden units
• RKM representation for LS-SVM, KPCA, SVD and others
• RKM representation obtained by conjugate feature duality
• Generative RKM
Acknowledgements (1)
• Current and former co-workers at ESAT-STADIUS:
C. Alzate, Y. Chen, J. De Brabanter, K. De Brabanter, L. De Lathauwer, H. De Meulemeester, B. De Moor, H. De Plaen, Ph. Dreesen, M. Espinoza, T. Falck, M. Fanuel, Y. Feng, B. Gauthier, X. Huang, L. Houthuys, V. Jumutc, Z. Karevan, R. Langone, R. Mall, S. Mehrkanoon, G. Nisol, M. Orchel, A. Pandey, K. Pelckmans, S. RoyChowdhury, S. Salzo, J. Schreurs, M. Signoretto, Q. Tao, J. Vandewalle, T. Van Gestel, S. Van Huffel, C. Varon, Y. Yang, and others
• Many other people for joint work, discussions, invitations, organizations
• Support from ERC AdG E-DUALITY, ERC AdG A-DATADRIVE-B, KU Leuven, OPTEC, IUAP DYSCO, FWO projects, IWT, iMinds, BIL, COST
Acknowledgements (2)
Acknowledgements (3)
NEW: ERC Advanced Grant E-DUALITY
Exploring duality for future data-driven modelling
Thank you