
Graph Neural Networks

Graph Deep Learning 2021 - Lecture 2

Daniele Grattarola

March 1, 2021

Roadmap

Things we are going to cover:

• Practical introduction to GNNs

• Message passing

• Advanced GNNs (attention, edge attributes)

• Demo

After the break:

• Spectral graph theory and GNNs

1

Recall: what are we doing?

From CNNs to GNNs

[Figure: 1-D signals sampled on a regular grid (x from 0 to 20, y from −1 to 1)]

• The receptive field of a CNN reflects the underlying grid structure.

• The CNN has an inductive bias on how to process the individual pixels/timesteps/nodes.

2

From CNNs to GNNs

• Drop assumptions about the underlying structure: it is now an input of the problem.

• The only thing we know: the representation of a node depends on its neighbors.

3

Discrete Convolution

[1 2 3 4 5 6] ⋆ [1 0 1] = [4 6 8 10]

Discrete convolution:

$(f \star g)[n] = \sum_{m=-M}^{M} f[n-m]\, g[m]$
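As a quick illustration (not from the slides), here is a minimal NumPy sketch of this sum; the kernel [1, 0, 1] and signal [1, 2, 3, 4, 5, 6] match the example above, evaluated only where the kernel fits:

```python
import numpy as np

def discrete_conv(f, g):
    """(f * g)[n] = sum_{m=-M..M} f[n - m] g[m], evaluated only where the kernel fits."""
    M = len(g) // 2
    return np.array([
        sum(f[n - m] * g[m + M] for m in range(-M, M + 1))
        for n in range(M, len(f) - M)
    ])

f = np.array([1, 2, 3, 4, 5, 6])
g = np.array([1, 0, 1])
print(discrete_conv(f, g))              # [ 4  6  8 10]
print(np.convolve(f, g, mode="valid"))  # same result here, since the kernel is symmetric
```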

Problems:

• Variable degree of nodes

• Orientation

4

Notation recap

• Graph: nodes connected by edges;

• $X = [x_1, \ldots, x_N]$, $x_i \in \mathbb{R}^F$: node attributes or "graph signal";

• $e_{ij} \in \mathbb{R}^S$: edge attribute for edge $i \to j$;

• $A$: $N \times N$ adjacency matrix;

• $D = \mathrm{diag}([d_1, \ldots, d_N])$: diagonal degree matrix;

• $L = D - A$: Laplacian;

• $A_n = D^{-1/2} A D^{-1/2}$: normalized adjacency matrix;

• Reference operator $R$: $r_{ij} \neq 0$ if $\exists\, i \to j$.

NOTE: all of these matrices are symmetric (we assume undirected graphs).

[Figures: adjacency matrix of an example graph; a small graph with nodes $x_1, \ldots, x_4$ and edges $e_{12}$, $e_{13}$, $e_{14}$]

5

A quick recipe for a local learnable filter

Applying R to graph signal X is a local action:

$(RX)_i = \sum_{j=1}^{N} r_{ij}\, x_j = \sum_{j \in \mathcal{N}(i)} r_{ij}\, x_j$

Instead of having a different weight for each neighbor, we share weights among nodes in the same neighborhood:

$X' = R X \Theta$

where $\Theta \in \mathbb{R}^{F \times F'}$.

[Figure: the example graph with edge weights $r_{12}$, $r_{13}$, $r_{14}$]
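A minimal NumPy sketch of this layer (names and the 4-node adjacency matrix are illustrative; R is taken to be the adjacency itself, but any operator with the same sparsity pattern works):

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 1, 1],          # adjacency of the 4-node example graph
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], dtype=float)
R = A                                 # reference operator: same sparsity pattern as the graph
X = rng.normal(size=(4, 8))           # N=4 nodes, F=8 features
Theta = rng.normal(size=(8, 16))      # shared weights, F=8 -> F'=16

X_out = R @ X @ Theta                 # each node mixes its neighbours; weights shared across nodes
print(X_out.shape)                    # (4, 16)
```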

6

Powers of R

Let's consider the effect of applying $R^2$ to $X$:

$(RRX)_i = \sum_{j \in \mathcal{N}(i)} r_{ij}\, (RX)_j = \sum_{j \in \mathcal{N}(i)} \sum_{k \in \mathcal{N}(j)} r_{ij}\, r_{jk}\, x_k$

Key idea: by applying $R^K$ we read from the $K$-th order neighborhood of a node.

[Figure: sparsity patterns of $R^K$ for $K = 1, 2, 3, 4$, and the corresponding neighborhoods of a node for $K = 0, 1, 2$]

7

Polynomials of R

To cover all neighbors of order $0$ to $K$, we can just take a polynomial with weights $\Theta^{(k)}$:

$X' = \sigma\!\left( \sum_{k=0}^{K} R^k X\, \Theta^{(k)} \right)$

[Figure: neighborhoods of increasing order, weighted by $\Theta^{(0)}$, $\Theta^{(1)}$, $\Theta^{(2)}$]
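A possible NumPy sketch of the polynomial layer, assuming σ = ReLU and taking R and the list of weights Θ^(k) as inputs (all names illustrative):

```python
import numpy as np

def poly_gnn_layer(R, X, thetas):
    """X' = sigma(sum_k R^k X Theta^(k)); thetas[k] is Theta^(k), sigma is ReLU."""
    out = np.zeros((X.shape[0], thetas[0].shape[1]))
    R_k = np.eye(R.shape[0])            # R^0 = I
    for Theta_k in thetas:
        out += R_k @ X @ Theta_k        # contribution of the k-th order neighbourhood
        R_k = R_k @ R                   # move to the next power of R
    return np.maximum(out, 0)           # ReLU
```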

8

Chebyshev Polynomials [1]

A recursive definition using Chebyshev polynomials:

$T^{(0)} = I$
$T^{(1)} = \tilde{L}$
$T^{(k)} = 2 \tilde{L}\, T^{(k-1)} - T^{(k-2)}$

where $\tilde{L} = \dfrac{2 L_n}{\lambda_{\max}} - I$ and $L_n = I - D^{-1/2} A D^{-1/2}$.

Layer: $X' = \sigma\!\left( \sum_{k=0}^{K} T^{(k)} X\, \Theta^{(k)} \right)$

[Figure: the Chebyshev polynomials $T^{(1)}, \ldots, T^{(6)}$ on $[-1, 1]$]
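A sketch of the same layer in NumPy: instead of forming dense powers of the operator, the Chebyshev recursion is applied directly to X (σ = ReLU; all names illustrative):

```python
import numpy as np

def cheb_layer(L_tilde, X, thetas):
    """X' = sigma(sum_k T^(k) X Theta^(k)), computing T^(k) X with the recursion above."""
    T_prev, T_curr = X, L_tilde @ X                # T^(0) X and T^(1) X
    out = T_prev @ thetas[0]
    if len(thetas) > 1:
        out += T_curr @ thetas[1]
    for Theta_k in thetas[2:]:
        T_next = 2 * (L_tilde @ T_curr) - T_prev   # T^(k) X = 2 L~ T^(k-1) X - T^(k-2) X
        out += T_next @ Theta_k
        T_prev, T_curr = T_curr, T_next
    return np.maximum(out, 0)                      # ReLU
```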

[1] M. Defferrard et al., “Convolutional neural networks on graphs with fast localized spectral filtering,” 2016.

9

Graph Convolutional Networks [2]

Polynomial of order K → K layers of order 1;

Three simplifications:

1. $\lambda_{\max} = 2 \;\Rightarrow\; \tilde{L} = \dfrac{2 L_n}{\lambda_{\max}} - I = -D^{-1/2} A D^{-1/2} = -A_n$

2. $K = 1 \;\Rightarrow\; X' = X \Theta^{(0)} - A_n X \Theta^{(1)}$

3. $\Theta = \Theta^{(0)} = -\Theta^{(1)}$

Layer: $X' = \sigma\big( (\underbrace{I + A_n}_{\hat{A}})\, X\, \Theta \big) = \sigma\big( \hat{A} X \Theta \big)$

For stability: $\hat{A} = \hat{D}^{-1/2} (I + A)\, \hat{D}^{-1/2}$, where $\hat{D}$ is the degree matrix of $I + A$.

[Figure: the example graph with normalized edge weights $a_{12}$, $a_{13}$, $a_{14}$]
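A minimal NumPy sketch of this layer with the renormalization trick (ReLU as σ; names illustrative):

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """X' = ReLU(A_hat X Theta), with A_hat = D_hat^{-1/2} (I + A) D_hat^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])              # add self-loops
    d = A_tilde.sum(axis=1)                       # degrees of I + A
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # renormalized adjacency
    return np.maximum(A_hat @ X @ Theta, 0)       # ReLU
```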

[2] T. N. Kipf et al., “Semi-supervised classification with graph convolutional networks,” 2016.

10

A General Paradigm

Message Passing Neural Networks [3]

[Figure: message passing on the example graph in three stages: Graph (nodes $x_1, \ldots, x_4$, edges $e_{21}, e_{31}, e_{41}$), Messages (messages $m_{21}, m_{31}, m_{41}$ computed along the edges), Propagation (messages aggregated into the new attribute $x'_1$)]

[3] J. Gilmer et al., “Neural message passing for quantum chemistry,” 2017.

11

Message Passing Neural Networks [3]

A general scheme for message-passing networks:

$x'_i = \gamma\!\left( x_i,\; \square_{j \in \mathcal{N}(i)}\, \phi(x_i, x_j, e_{ji}) \right)$

• $\phi$: message function; it depends on $x_i$, $x_j$ and possibly the edge attribute $e_{ji}$ (we call the messages $m_{ji}$);

• $\square_{j \in \mathcal{N}(i)}$: aggregation function (sum, average, max, or something else...);

• $\gamma$: update function, the final transformation applied to obtain the new attributes after aggregating the messages.

[Figure: propagation step: messages $m_{21}, m_{31}, m_{41}$ along edges $e_{21}, e_{31}, e_{41}$ are aggregated into $x'_1$]
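A generic, library-free sketch of one message-passing step, with φ, the aggregation, and γ passed in as plain functions (the edge-list convention and the example instantiation are illustrative):

```python
import numpy as np

def mpnn_step(X, edges, E, phi, aggregate, gamma):
    """One step of x'_i = gamma(x_i, agg_{j in N(i)} phi(x_i, x_j, e_ji)).

    `edges` is a list of (j, i) pairs ("message from j to i"); E[k] is the attribute of edges[k].
    """
    N, F = X.shape
    inbox = [[] for _ in range(N)]
    for k, (j, i) in enumerate(edges):
        inbox[i].append(phi(X[i], X[j], E[k]))                 # message m_ji
    agg = [aggregate(m) if m else np.zeros(F) for m in inbox]  # isolated nodes get a zero message
    return np.stack([gamma(X[i], agg[i]) for i in range(N)])

# One possible instantiation: scale the neighbour by a scalar edge weight, sum, then average with x_i.
phi = lambda xi, xj, eji: eji * xj
aggregate = lambda msgs: np.sum(msgs, axis=0)
gamma = lambda xi, m: 0.5 * (xi + m)
```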

[3] J. Gilmer et al., “Neural message passing for quantum chemistry,” 2017.

12

Models

Graph Attention Networks [4]

1. Update the node features: $h_i = \Theta_f x_i$, with $\Theta_f \in \mathbb{R}^{F' \times F}$.

2. Compute attention logits: $e_{ij} = \sigma\!\left( \theta_a^\top [\, h_i \,\|\, h_j \,] \right)$, with $\theta_a \in \mathbb{R}^{2F'}$.¹

3. Normalize with a softmax: $a_{ij} = \mathrm{softmax}_j(e_{ij}) = \dfrac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}$

4. Propagate using the attention coefficients: $x'_i = \sum_{j \in \mathcal{N}(i)} a_{ij}\, h_j$

[Figure: the example graph with attention coefficients $a_{21}, a_{31}, a_{41}$ weighting the features $h_2, h_3, h_4$ sent to node 1]
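A single-head, single-node sketch of steps 1-4 in NumPy, using LeakyReLU as σ (function and argument names are illustrative):

```python
import numpy as np

def gat_node(i, X, neighbours, Theta_f, theta_a, slope=0.2):
    """Single-head GAT update for node i, following steps 1-4 (LeakyReLU as sigma)."""
    H = X @ Theta_f.T                                          # step 1: h_j = Theta_f x_j
    logits = np.array([np.concatenate([H[i], H[j]]) @ theta_a  # step 2: theta_a^T [h_i || h_j]
                       for j in neighbours])
    logits = np.where(logits > 0, logits, slope * logits)      # LeakyReLU
    alphas = np.exp(logits) / np.exp(logits).sum()             # step 3: softmax over N(i)
    return sum(a * H[j] for a, j in zip(alphas, neighbours))   # step 4: weighted propagation
```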

¹ $\|$ indicates concatenation.

[4] P. Velickovic et al., "Graph attention networks," 2017.

13

Edge-Conditioned Convolution [5]

Key idea: incorporate edge attributes into the messages.

Consider an MLP $\phi: \mathbb{R}^S \to \mathbb{R}^{F F'}$, called a filter-generating network:

$\Theta^{(ji)} = \mathrm{reshape}(\phi(e_{ji}))$

Use the edge-dependent weights to compute the messages:

$x'_i = \Theta^{(i)} x_i + \sum_{j \in \mathcal{N}(i)} \Theta^{(ji)} x_j + b$

[Figure: messages $m_{21}, m_{31}, m_{41}$ conditioned on the edge attributes $e_{21}, e_{31}, e_{41}$]
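A per-node NumPy sketch of this update; `filter_net` stands in for the filter-generating MLP, and `Theta_root` plays the role of $\Theta^{(i)}$ (both names are illustrative):

```python
import numpy as np

def ecc_node(i, X, neighbours, edge_attrs, filter_net, Theta_root, b):
    """x'_i = Theta_root x_i + sum_j Theta^(ji) x_j + b, with Theta^(ji) = reshape(filter_net(e_ji))."""
    F = X.shape[1]
    F_out = Theta_root.shape[0]
    out = Theta_root @ X[i] + b                          # self contribution
    for j, e_ji in zip(neighbours, edge_attrs):
        Theta_ji = filter_net(e_ji).reshape(F_out, F)    # edge-dependent weights from the MLP
        out += Theta_ji @ X[j]
    return out
```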

[5] M. Simonovsky et al., “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” 2017.

14

So which GNN do I use?

• GCNConv (Kipf & Welling)
• ChebConv (Defferrard et al.)
• GraphSageConv (Hamilton et al.)
• ARMAConv (Bianchi et al.)
• ECCConv (Simonovsky & Komodakis)
• GATConv (Velickovic et al.)
• GCSConv (Bianchi et al.)
• APPNPConv (Klicpera et al.)
• GINConv (Xu et al.)
• DiffusionConv (Li et al.)
• GatedGraphConv (Li et al.)
• AGNNConv (Thekumparampil et al.)
• TAGConv (Du et al.)
• CrystalConv (Xie & Grossman)
• EdgeConv (Wang et al.)
• MessagePassing (Gilmer et al.)

15

A Good Recipe [6]

Message passing scheme:

• Message: $m_{ji} = \mathrm{PReLU}\big(\mathrm{BatchNorm}(\Theta x_j + b)\big)$

• Aggregate: $m_{\mathrm{agg}} = \sum_{j \in \mathcal{N}(i)} m_{ji}$

• Update: $x' = x \,\|\, m_{\mathrm{agg}}$

Architecture:

• Pre- and post-process node features using 2-layer MLPs;

• 4-6 message passing steps;

[Diagram: 2-layer MLP → Message Passing → Message Passing → Message Passing → Message Passing → 2-layer MLP]
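A rough NumPy sketch of one such message-passing block on a dense adjacency matrix (the batch norm here has no learned scale/shift and the PReLU slope is fixed; all names illustrative):

```python
import numpy as np

def recipe_block(A, X, Theta, b, alpha=0.25, eps=1e-5):
    """m_ji = PReLU(BatchNorm(Theta x_j + b)); aggregate by sum; update by concatenation."""
    M = X @ Theta + b                                         # per-node message content
    M = (M - M.mean(axis=0)) / np.sqrt(M.var(axis=0) + eps)   # batch norm over the nodes
    M = np.where(M > 0, M, alpha * M)                         # PReLU with a fixed slope alpha
    M_agg = A @ M                                             # sum of messages from the neighbours
    return np.concatenate([X, M_agg], axis=1)                 # x' = x || m_agg
```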

[6] J. You et al., “Design space for graph neural networks,” 2020.

16

How do we use this?

Node-level learning (e.g., social networks).

Graph-level learning (e.g., molecules).

17

GNN libraries

18

Graph Convolution

Discrete Convolution

Recall: CNNs compute a discrete convolution, e.g. [1 2 3 4 5 6] ⋆ [1 0 1] = [4 6 8 10]:

$(f \star g)[n] = \sum_{m=-M}^{M} f[n-m]\, g[m] \qquad (1)$

19

Convolution Theorem

Given two functions $f$ and $g$, their convolution $f \star g$ can be expressed as:

$f \star g = \mathcal{F}^{-1}\big\{ \mathcal{F}\{f\} \cdot \mathcal{F}\{g\} \big\} \qquad (2)$

where $\mathcal{F}$ is the Fourier transform and $\mathcal{F}^{-1}$ its inverse.
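A quick NumPy check of the theorem (it holds exactly for circular convolution, which is what the FFT computes):

```python
import numpy as np

rng = np.random.default_rng(0)
f, g = rng.normal(size=32), rng.normal(size=32)

# Convolution theorem: f * g = IFFT(FFT(f) · FFT(g))   (circular convolution)
spectral = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

# Direct circular convolution for comparison
direct = np.array([sum(f[(n - m) % 32] * g[m] for m in range(32)) for n in range(32)])

print(np.allclose(spectral, direct))  # True
```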

Can we exploit this key property on graphs?

20

What is the Fourier transform?

Key intuition – we are representing a function in a different basis.

$\mathcal{F}\{f\}[k] = \hat{f}[k] = \sum_{n=0}^{N-1} f[n]\, e^{-i \frac{2\pi}{N} k n}$

$\mathcal{F}^{-1}\{\hat{f}\}[n] = f[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat{f}[k]\, e^{\,i \frac{2\pi}{N} k n}$

[Figure: a signal represented as a sum of sinusoidal basis functions]

21

From FT to GFT

The eigenvectors of the Laplacian for a path graph can be obtained analytically:

$u_k[n] = \begin{cases} 1, & \text{for } k = 0 \\ e^{\,i \pi (k+1) n / N}, & \text{for odd } k,\; k < N-1 \\ e^{-i \pi k n / N}, & \text{for even } k,\; k > 0 \\ \cos(\pi n), & \text{for odd } k,\; k = N-1 \end{cases}$

Looks familiar?

[Figure: the eigenvectors $u_1$, $u_2$, $u_4$ plotted over the nodes: they look like sinusoids]

22

From FT to GFT

[Diagram linking path graph, Laplacian, Fourier eigenbasis, Fourier transform, and convolution]

• Drop the "grid" assumption.

• Replace $e^{-i \frac{2\pi}{N} k n}$ with a generic $u_k[n]$:

$\mathcal{F}_G\{f\}[k] = \sum_{n=0}^{N-1} f[n]\, u_k[n]$

• GFT: $\mathcal{F}_G\{f\} = \hat{f} = U^\top f$

• IGFT: $\mathcal{F}_G^{-1}\{\hat{f}\} = f = U \hat{f}$
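A minimal NumPy sketch of the GFT and its inverse via the eigendecomposition of $L = D - A$ (names illustrative):

```python
import numpy as np

def graph_fourier(A, f):
    """GFT of a graph signal f via the eigenvectors of the Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L)     # eigenvalues act as graph 'frequencies'
    f_hat = U.T @ f                # GFT
    f_rec = U @ f_hat              # inverse GFT recovers f
    return lam, f_hat, f_rec
```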

23

Graph Convolution

Recall:

• Convolution theorem: $f \star g = \mathcal{F}^{-1}\big\{ \mathcal{F}\{f\} \cdot \mathcal{F}\{g\} \big\}$

• Spectral theorem: $L = U \Lambda U^\top = \sum_{i=0}^{N-1} \lambda_i u_i u_i^\top$

Graph signals: $f, g$ → GFT: $U^\top f,\; U^\top g$ → Multiply:² $U^\top f \odot U^\top g$ → IGFT: $U\big(U^\top f \odot U^\top g\big)$

Graph filter:

$U\big(U^\top f \odot U^\top g\big) = U \underbrace{\mathrm{diag}(U^\top g)}_{g(\Lambda)} U^\top f = \underbrace{U\, g(\Lambda)\, U^\top}_{g(L)}\, f = g(L)\, f$
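A NumPy sketch of filtering a graph signal with a scalar function g of the eigenvalues (names illustrative; the commented line shows one possible low-pass choice of g):

```python
import numpy as np

def spectral_filter(A, f, g_of_lambda):
    """Compute g(L) f = U g(Lambda) U^T f for a scalar function g of the eigenvalues."""
    L = np.diag(A.sum(axis=1)) - A
    lam, U = np.linalg.eigh(L)
    return U @ (g_of_lambda(lam) * (U.T @ f))   # filter in the spectral domain, transform back

# e.g., a low-pass filter: y = spectral_filter(A, f, lambda lam: np.exp(-2.0 * lam))
```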

² $\odot$ indicates element-wise multiplication.

24

Spectral GCNs

Spectral GCNs

A first idea [7]: the transformation of each individual eigenvalue is learned with a free parameter $\theta_i$.

Problems:

• O(N) parameters;

• not localized in node space (the only thing that we want);

• $U \cdot g(\Lambda) \cdot U^\top$ costs $O(N^2)$;

$g_\theta(\Lambda) = \mathrm{diag}(\theta_0, \theta_1, \ldots, \theta_{N-2}, \theta_{N-1})$

[7] J. Bruna et al., “Spectral networks and locally connected networks on graphs,” 2013.

25

Spectral GCNs

Better idea [7]:

• Localized in the node domain ↔ smooth in the spectral domain;

• Learn only a few parameters $\theta_i$;

• Interpolate the other eigenvalues using a smooth cubic spline.

Localized and O(1) parameters, but multiplying by U twice is still expensive.

[Figure: filter response over the eigenvalue index $n$: a few learned values, the rest interpolated]

[7] J. Bruna et al., “Spectral networks and locally connected networks on graphs,” 2013.

26

Chebyshev Polynomials [1]

The same recursion is used to filter the eigenvalues:

$T^{(0)} = I$
$T^{(1)} = \tilde{\Lambda}$
$T^{(k)} = 2 \tilde{\Lambda}\, T^{(k-1)} - T^{(k-2)}$

where $\tilde{\Lambda} = \dfrac{2\Lambda}{\lambda_{\max}} - I$ are the rescaled eigenvalues.

[Figure: the Chebyshev polynomials $T^{(1)}, \ldots, T^{(6)}$ on $[-1, 1]$]

[1] M. Defferrard et al., “Convolutional neural networks on graphs with fast localized spectral filtering,” 2016.

27

References i

[1] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in Advances in Neural Information Processing Systems, 2016, pp. 3844–3852.

[2] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in International Conference on Learning Representations (ICLR), 2016.

[3] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, "Neural message passing for quantum chemistry," arXiv preprint arXiv:1704.01212, 2017.

[4] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," arXiv preprint arXiv:1710.10903, 2017.

[5] M. Simonovsky and N. Komodakis, "Dynamic edge-conditioned filters in convolutional neural networks on graphs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.

28

References ii

[6] J. You, R. Ying, and J. Leskovec, "Design space for graph neural networks," arXiv preprint arXiv:2011.08843, 2020.

[7] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," arXiv preprint arXiv:1312.6203, 2013.

29