Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

M. Defferrard (1), X. Bresson (2), P. Vandergheynst (1)

(1) EPFL, Lausanne, Switzerland

(2) Nanyang Technological University, Singapore

Presented by Itay Boneh and Asher Kabakovitch
Tel-Aviv University, Deep Learning Seminar

2017

Spectral Filtering: Non-parametric

From the convolution theorem:

$x \ast_G g = U\,(U^T g \odot U^T x) = U\,(\hat g \odot U^T x)$

or algebraically:

$x \ast_G g = U \operatorname{diag}(\hat g_1, \ldots, \hat g_n)\, U^T x$

- Not localized
- O(n) parameters to train
- O(n^2) multiplications (no FFT)
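
To make the cost concrete, here is a minimal numpy sketch of this non-parametric filtering (all names are illustrative; `g_hat` holds one free parameter per eigenvalue):

```python
import numpy as np

def spectral_filter(x, L, g_hat):
    """Non-parametric spectral filtering: y = U diag(g_hat) U^T x.

    x     : (n,) graph signal
    L     : (n, n) symmetric graph Laplacian
    g_hat : (n,) filter response, one free parameter per eigenvalue
    """
    lam, U = np.linalg.eigh(L)   # eigendecomposition: L = U diag(lam) U^T
    x_hat = U.T @ x              # graph Fourier transform of the signal
    y_hat = g_hat * x_hat        # pointwise filtering in the spectral domain
    return U @ y_hat             # inverse graph Fourier transform
```

The two dense products with U are the O(n^2) cost noted above, and there is no graph analogue of the FFT to avoid them.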

Spectral Filtering: Polynomial Parametrization

$g_\theta$ is a continuous 1-D function of the eigenvalues, parametrized by $\theta$:

$x \ast_G g = U\,(g_\theta(\lambda) \odot U^T x) = U \operatorname{diag}(g_\theta(\lambda_1), \ldots, g_\theta(\lambda_n))\, U^T x = U g_\theta(\Lambda) U^T x = g_\theta(L)\, x$

(Recall: if $T\phi_i = \lambda_i \phi_i$, then $f(T) = \Phi f(\Lambda) \Phi^{-1}$.)

Polynomial parametrization:

$g_\theta(\Lambda) = \sum_{k=0}^{K-1} \theta_k \Lambda^k$

- K-localized: each application of L spreads the filter by one hop
- K parameters to train, independent of the graph's size
- Still O(n^2) multiplications (dense multiplications with the Fourier basis U)
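
Since $g_\theta(L)x = \sum_k \theta_k L^k x$, the filter can also be evaluated by repeatedly applying L, without ever forming U; a sketch (with `theta` holding the K coefficients):

```python
import numpy as np

def poly_filter(x, L, theta):
    """Evaluate y = sum_k theta[k] * L^k x by iterated matrix-vector products.

    K-localized: L^k x only mixes values within k hops of each vertex."""
    y = np.zeros_like(x)
    Lk_x = x.copy()              # starts at L^0 x = x
    for theta_k in theta:
        y += theta_k * Lk_x
        Lk_x = L @ Lk_x          # advance to the next power of L
    return y
```

With a sparse L each product costs O(|E|) rather than O(n^2); the Chebyshev recursion below keeps this cheap evaluation while being numerically better behaved than raw monomials.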

Spectral Filtering: Polynomial Parametrization

[Figures: impulse response of a localized filter on a 2D Euclidean domain vs. on a graph]

Spectral Filtering: Recursive Polynomial Parametrization

Chebyshev polynomials: $T_k(\lambda) = 2\lambda T_{k-1}(\lambda) - T_{k-2}(\lambda)$, with $T_0(\lambda) = 1$ and $T_1(\lambda) = \lambda$.

Parametrization:

$g_\theta(\tilde L) = \sum_{k=0}^{K-1} \theta_k T_k(\tilde L)$

$\tilde L = 2L/\lambda_{\max} - I_n$ (rescaled so the spectrum lies in $[-1, 1]$, where the $T_k$ form an orthogonal basis)

Spectral Filtering: Recursive Polynomial Parametrization

Filtering:

$y = g_\theta(\tilde L)\, x = \sum_{k=0}^{K-1} \theta_k T_k(\tilde L)\, x = \sum_{k=0}^{K-1} \theta_k \bar x_k$

Recurrence: $\bar x_k = T_k(\tilde L)\, x = 2\tilde L \bar x_{k-1} - \bar x_{k-2}$, with $\bar x_0 = x$ and $\bar x_1 = \tilde L x$.

- K-localized: each application of L spreads the filter by one hop
- K parameters to train, independent of the graph's size
- O(Kn) multiplications (actually $O(K|\mathcal{E}|)$ for a sparse Laplacian)
- No EVD of L needed
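
A scipy sketch of this recurrence, assuming a sparse Laplacian `L` and its largest eigenvalue `lmax` (obtainable once with `scipy.sparse.linalg.eigsh`):

```python
import numpy as np
import scipy.sparse as sp

def chebyshev_filter(x, L, theta, lmax):
    """y = sum_k theta[k] T_k(L_tilde) x, computed in O(K |E|) without any EVD."""
    n = L.shape[0]
    L_tilde = (2.0 / lmax) * L - sp.identity(n)  # rescale spectrum into [-1, 1]
    x0 = x                                       # T_0(L~) x = x
    y = theta[0] * x0
    if len(theta) > 1:
        x1 = L_tilde @ x                         # T_1(L~) x = L~ x
        y = y + theta[1] * x1
    for theta_k in theta[2:]:
        x2 = 2 * (L_tilde @ x1) - x0             # T_k = 2 L~ T_{k-1} - T_{k-2}
        y = y + theta_k * x2
        x0, x1 = x1, x2
    return y
```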

Graph CNN

Graph CNN: Learning Filters

$y_{s,j} = \sum_{i=1}^{F_{in}} g_{\theta_{i,j}}(L)\, x_{s,i}$

$\frac{\partial L}{\partial \theta_{i,j}} = \sum_{s=1}^{S} [\bar x_{s,i,0}, \ldots, \bar x_{s,i,K-1}]^T \frac{\partial L}{\partial y_{s,j}}$

$\frac{\partial L}{\partial x_{s,i}} = \sum_{j=1}^{F_{out}} g_{\theta_{i,j}}(L)\, \frac{\partial L}{\partial y_{s,j}}$

- $s = 1, \ldots, S$: sample index
- $i = 1, \ldots, F_{in}$: input feature map index
- $j = 1, \ldots, F_{out}$: output feature map index
- $\theta_{i,j}$: one of the $F_{in} \times F_{out}$ Chebyshev coefficient vectors, each of order K
- $L$: loss over a mini-batch of S samples (context distinguishes it from the Laplacian)

Cost: $O(K |\mathcal{E}| F_{in} F_{out} S)$ operations

Easily parallelized
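
A dense-numpy sketch of the forward pass only (an autograd framework would supply the two gradients above); the shapes and the `einsum` formulation are assumptions for illustration:

```python
import numpy as np

def graph_conv(X, L_tilde, Theta):
    """One graph-convolutional layer: y_{s,j} = sum_i g_{theta_{i,j}}(L) x_{s,i}.

    X       : (S, n, F_in)     mini-batch of multi-channel graph signals
    L_tilde : (n, n)           rescaled Laplacian (dense here, for clarity)
    Theta   : (K, F_in, F_out) Chebyshev coefficients
    returns : (S, n, F_out)
    """
    K = Theta.shape[0]
    Xb = [X]                                               # T_0(L~) X = X
    if K > 1:
        Xb.append(np.einsum('nm,smi->sni', L_tilde, X))    # T_1(L~) X = L~ X
    for _ in range(2, K):
        Xb.append(2 * np.einsum('nm,smi->sni', L_tilde, Xb[-1]) - Xb[-2])
    Xb = np.stack(Xb)                                      # (K, S, n, F_in)
    # Contract the Chebyshev order and the input feature maps with Theta.
    return np.einsum('ksni,kio->sno', Xb, Theta)
```

Each sample and each output feature map is independent of the others, which is what makes the computation easy to parallelize.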

Graph Pooling: Coarsening

- Multilevel clustering algorithm
- Reduces the size of the graph by a specified factor (here, 2)
- Does all this efficiently

Graclus multilevel clustering algorithm:

- Maximizes the local normalized cut.
- Greedily pick an unmarked vertex i and match it with an unmatched vertex j that maximizes the local normalized cut $W_{i,j}(1/d_i + 1/d_j)$ (see the sketch after this list).
- Extremely fast.
- Divides the number of nodes by approximately 2 at each level.
- May generate singleton (unmatched) nodes; this is solved by adding fake nodes.
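
A simplified dense sketch of one level of this greedy matching (Graclus itself visits vertices in random order and works on sparse adjacency structures; `W` and the cluster representation here are illustrative):

```python
import numpy as np

def greedy_local_cut_matching(W):
    """Match vertex pairs by greedily maximizing W_ij * (1/d_i + 1/d_j).

    W: (n, n) symmetric weight matrix with no isolated vertices.
    Returns clusters as 1- or 2-element lists of vertex ids."""
    n = W.shape[0]
    d = W.sum(axis=1)                      # vertex degrees
    unmatched = set(range(n))
    clusters = []
    for i in range(n):
        if i not in unmatched:
            continue
        unmatched.discard(i)
        best_j, best_cut = None, 0.0
        for j in unmatched:
            if W[i, j] > 0:
                cut = W[i, j] * (1.0 / d[i] + 1.0 / d[j])  # local normalized cut
                if cut > best_cut:
                    best_j, best_cut = j, cut
        if best_j is None:
            clusters.append([i])           # singleton: later paired with a fake node
        else:
            unmatched.discard(best_j)
            clusters.append([i, best_j])
    return clusters
```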

Graph Pooling: Fast Pooling

Example: pooling by 4

- Graclus generates singletons: $n_0 = 8 \to n_1 = 5 \to n_2 = 3$
- Adding fake nodes gives: $n_2 = 3 \to n_1 = 6 \to n_0 = 12$
- $z = [\max(x_0, x_1),\ \max(x_4, x_5, x_6),\ \max(x_8, x_9, x_{10})]$ (fake nodes are ignored by the max)
- Balanced binary trees $\Rightarrow$ pooling becomes regular 1-D pooling, efficient on GPUs.
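
With the vertices of each level reordered so that siblings are adjacent (and fake nodes marked), pooling reduces to regular 1-D pooling; a sketch, where `perm` is assumed to come from the coarsening step:

```python
import numpy as np

def fake_node_max_pool(x, perm):
    """Pool pairs of sibling vertices with max, treating fake nodes as -inf.

    x    : (n,) signal on the real vertices
    perm : (n_padded,) vertex ids rearranged so siblings are adjacent,
           with -1 marking fake nodes; n_padded is even.
    """
    xp = np.full(len(perm), -np.inf)
    real = perm >= 0
    xp[real] = x[perm[real]]                 # scatter real values; fakes stay -inf
    return xp.reshape(-1, 2).max(axis=1)     # regular 1-D max-pooling by 2
```

Pooling by 4, as in the example above, is just two such levels.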

Experiments: MNIST

$W_{i,j} = e^{-\|x_i - x_j\|_2^2 / \sigma^2}$

- 28×28 pixels + 192 fake nodes $\Rightarrow n = |V| = 976$
- 8-NN graph $\Rightarrow |\mathcal{E}| = 3{,}198$
- Architecture based on LeNet-5 $\Rightarrow K = 5$

- Isotropic filters (no orientation)
- Optimizations and initializations were left uninvestigated
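
A scikit-learn sketch of the graph construction described above (the $\sigma$ heuristic is an assumption; the slide does not pin it down):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_gaussian_graph(coords, k=8):
    """k-NN graph with Gaussian weights W_ij = exp(-||x_i - x_j||^2 / sigma^2)."""
    dist = kneighbors_graph(coords, k, mode='distance')  # sparse distances
    sigma2 = dist.data.mean() ** 2                       # heuristic bandwidth
    dist.data = np.exp(-dist.data ** 2 / sigma2)
    return dist.maximum(dist.T)                          # symmetrize

# MNIST: vertices are the 28x28 pixel positions.
grid = np.stack(np.meshgrid(np.arange(28), np.arange(28)),
                axis=-1).reshape(-1, 2).astype(float)
W = knn_gaussian_graph(grid, k=8)
```

The same recipe with word2vec embeddings and k = 16 produces the 20NEWS word graph of the next slide.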

Experiments: 20NEWS

- 18,846 text documents associated with 20 classes
- 10K most common words out of the 94K unique words $\Rightarrow n = |V| = 10{,}000$
- 16-NN graph with $W_{i,j} = e^{-\|z_i - z_j\|_2^2 / \sigma^2}$, where $z_i$ is the word2vec embedding of word $i$ $\Rightarrow |\mathcal{E}| = 132{,}834$
- Each signal $x$ is a bag-of-words representation of a document (see the sketch after this list)
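
A sketch of the document signals with scikit-learn (vocabulary size matches the slide; the paper's exact preprocessing may differ):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

docs = fetch_20newsgroups(subset='train')
# Keep the 10,000 most frequent words: one graph vertex per word.
vectorizer = CountVectorizer(max_features=10_000, stop_words='english')
X = vectorizer.fit_transform(docs.data)       # (n_docs, 10000) bag-of-words signals
words = vectorizer.get_feature_names_out()    # the vertices of the word graph
```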

Experiments: 20NEWS

Comparison with other methods (K = 5):

- Slightly worse than a multinomial naive Bayes classifier.
- Beats fully-connected networks with far fewer parameters.

Total training time divided by the number of gradient steps:

- Scales as O(n), as opposed to O(n^2) for non-parametric spectral filtering.

Experiments: Graph Quality

$\Rightarrow$ The structure of the data is important.

$\Rightarrow$ A well-constructed graph is important.

$\Rightarrow$ Proper approximations (e.g., LSHForest) can be used for larger databases.

Conclusions and Future Work

- Introduced a model with linear computational complexity.
- The quality of the input graph is of paramount importance.
- Local stationarity and compositionality are verified for text documents as long as the graph is well constructed.