Graph Neural NetworksGraph Neural Networks Alejandro Ribeiro Dept. of Electrical and Systems...
Transcript of Graph Neural NetworksGraph Neural Networks Alejandro Ribeiro Dept. of Electrical and Systems...
Graph Neural Networks
Alejandro Ribeiro
Dept. of Electrical and Systems Engineering
University of Pennsylvania
Email: [email protected]
Web: alelab.seas.upenn.edu
August 31, 2020
A. Ribeiro Graph Neural Networks 1
Course Objectives
I This professor is very excited today. Happy to have a captive audience to talk about his research
I Graph Neural Networks (GNNs) are exciting tools with broad applicability and interesting properties
These are, therefore, the two objectives of this course
Develop the ability to use Graph NeuralNetworks in practical applications
Understand the fundamental propertiesof Graph Neural Networks
I Identify situations where GNNs have potential. Formulate problems with GNNs. Develop solutions.
A. Ribeiro Graph Neural Networks 2
Are These Course Goals Relevant?
I Definitely! ⇒ GNNs are the tool of choicefor performing machine learning on graphs
⇒ Authorship attribution ⇒ Identify author of anonymous text. See [Segarra et al ’14]
⇒ Recommendation systems ⇒ Predict product ratings of different customers. See [Ruiz et al ’18]
⇒ Resource allocation in wireless communication networks See [Eisen-Ribeiro ’19]
⇒ Learning Decentralized Controllers in Collaborative Autonomy. See [Tolstaya et al ’19]
A. Ribeiro Graph Neural Networks 3
Go Call Your Mother
I My mom would always ask me about my first day of
school. All good mothers are equal.
I I would hate it if you don’t have anything interesting to
tell your moms when you call them later tonight
I Thus, allow me to use the next half hour to tell you some
interesting things for this conversation
A. Ribeiro Graph Neural Networks 4
Machine Learning on Graphs
(I) The why ⇒ Graphs appear in scores of settings. ⇒ They are pervasive models of structure
(II) The how ⇒ We should use a neural network ⇒ Fully connected neural networks do not scale
⇒ Convolutions (in time or graphs) are the key to scalable machine learning
(III) Convolutional filters in Euclidean space and convolutional filters on graphs
(IV) Convolutional neural networks and Graph (convolutional) neural networks
A. Ribeiro Graph Neural Networks 5
Machine Learning on Graphs: The Why
A. Ribeiro Graph Neural Networks 6
Why Are Graphs so Common in Information Processing?
I Graphs are generic models of signal structure that can help to learn in several practical problems
Authorship Attribution
aaboutafteragainstallalong
amongamongstanand
anotheranyasasid
eatawayba
rbeca
use
befo
re
beyo
nd
bothbu
tbycanclos
e
cons
eque
ntly
dare
desp
ite
dow
n
due
each
eith
eren
ough
ever
yex
cept
few
for
from
heap
she
nce
how
ever
ifinspite
insi
dein
toit
its
like
little
lots
many
may
might
more
most
much
must
near
neither
next
nononenornothingofononceoneopposite or
other ourout
outside part past plenty regardinground saving
shall
should
since
so some
such
than
that
the them
themselves
thenthence
therefore
thesetheythisthosethroughthroughouttill
totow
ardtow
ardsunder
underneathunless
untilunto
upupon
usused
what
when
where
whether
which
while
who
whom
whose
willwithwithinwouldyet
aaboutafteragainstallalong
amongamongstanand
anotheranyasasid
eatawayba
rbeca
use
befo
re
beyo
nd
bothbu
tbycanclos
e
cons
eque
ntly
dare
desp
ite
dow
n
due
each
eith
eren
ough
ever
yex
cept
few
for
from
heap
she
nce
how
ever
ifinspite
insi
dein
toit
its
like
little
lots
many
may
might
more
most
much
must
near
neither
next
nononenornothingofononceoneopposite or
other ourout
outside part past plenty regardinground saving
shall
should
since
so some
such
than
that
the them
themselves
thenthence
therefore
thesetheythisthosethroughthroughouttill
totow
ardtow
ardsunder
underneathunless
untilunto
upupon
usused
what
when
where
whether
which
while
who
whom
whose
willwithwithinwouldyet
Identify the author of a text of unknown provenance
Segarra et al ’16,, arxiv.org/abs/1805.00165
Recommendation Systems
Predict the rating a customer would give to a product
Ruiz et al ’18,, arxiv.org/abs/1903.12575
I In both cases there exists a graph that contains meaningful information about the problem to solve
A. Ribeiro Graph Neural Networks 7
Authorship Attribution with Word Adjacency Networks (WANs)
I Nodes represent different function words and edges how often words appear close to each other
⇒ A proxy for the different ways in which different authors use the English language grammar
William Shakespeare
aaboutafteragainstallalong
amongamongstanand
anotheranyasasid
eatawayba
rbeca
use
befo
re
beyo
nd
bothbu
tbycanclos
e
cons
eque
ntly
dare
desp
ite
dow
n
due
each
eith
eren
ough
ever
yex
cept
few
for
from
heap
she
nce
how
ever
ifinspite
insi
dein
toit
itslik
e
little
lots
many
may
might
more
most
much
must
near
neither
next
nononenornothingofononceoneopposite or
other ourout
outside part past plenty regardinground saving
shall
should
since
so some
such
than
that
the them
themselves
thenthence
therefore
thesetheythisthosethroughthroughouttill
totow
ardtow
ardsunder
underneathunless
untilunto
upupon
usused
what
when
where
whether
which
while
who
whom
whose
willwithwithinwouldyet
Christopher Marlowe
aaboutafteragainstallalong
amongamongstanand
anotheranyasasid
eatawayba
rbeca
use
befo
re
beyo
nd
bothbu
tbycanclos
e
cons
eque
ntly
dare
desp
ite
dow
n
due
each
eith
eren
ough
ever
yex
cept
few
for
from
heap
she
nce
how
ever
ifinspite
insi
dein
toit
its
like
little
lots
many
may
might
more
most
much
must
near
neither
next
nononenornothingofononceoneopposite or
other ourout
outside part past plenty regardinground saving
shall
should
since
so some
such
than
that
the them
themselves
thenthence
therefore
thesetheythisthosethroughthroughouttill
totow
ardtow
ardsunder
underneathunless
untilunto
upupon
usused
what
when
where
whether
which
while
who
whom
whose
willwithwithinwouldyet
I WAN differences differentiate the writing styles of Marlowe and Shakespeare in, e.g., Henry VI
Segarra-Eisen-Egan-Ribeiro, Attributing the Authorship of the Henry VI Plays by Word Adjacency, Shakespeare Quarterly 2016, doi.org/10.1353/shq.2016.0024
A. Ribeiro Graph Neural Networks 8
Recommendation System with Collaborative Filtering
I Nodes represent different customers and edges their average similarity in product ratings
⇒ The graph informs the completion of ratings when some are unknown and are to be predicted
Variation Diagram for Original (sampled) ratings Variation Diagram for Reconstructed (predicted) ratings
I Variation energy of reconstructed signal is (much) smaller than variation energy of sampled signal
Ruiz-Gama-Marques-Ribeiro, Invariance-Preserving Localized Activation Functions for Graph Neural Networks, arxiv.org/abs/1903.12575
A. Ribeiro Graph Neural Networks 9
Graphs in Multiagent Physical Systems
I Graphs are more than data structures ⇒ They are models of physical systems with multiple agents
Decentralized Control of Autonomous Systems
Coordinate a team of agents without central coordination
Tolstaya et al ’19,, arxiv.org/abs/1903.10527
Wireless Communications Networks
Manage interference when allocating bandwidth and power
Eisen-Ribeiro ’19,, arxiv.org/abs/1909.01865
I The graph is the source of the problem ⇒ Challenge is that goals are global but information is local
A. Ribeiro Graph Neural Networks 10
Machine Learning on Graphs: The How
A. Ribeiro Graph Neural Networks 11
Neural Networks and Convolutional Neural Networks
I There is overwhelming empirical and theoretical justification to choose a neural network (NN)
Challenge is we want to run a NN over this But we are good at running NNs over this
I Generic NNs do not scale to large dimensions ⇒ Convolutional Neural Networks (CNNs) do scale
A. Ribeiro Graph Neural Networks 12
Convolutional Neural Networks and Graph Neural Networks
I CNNs are made up of layers composing convolutional filter banks with pointwise nonlinearities
Process graphs with graph convolutional NNs Process images with convolutional NNs
I Generalize convolutions to graphs ⇒ Compose graph filter banks with pointwise nonlinearities
I Stack in layers to create a graph (convolutional) Neural Network (GNN)
A. Ribeiro Graph Neural Networks 13
Convolutions in Time, in Space, and on Graphs
I How do we generalize convolutions in time and space to operate on graphs?
⇒ Even though we do not often think of them as such, convolutions are operations on graphs
A. Ribeiro Graph Neural Networks 14
Time and Space are Representable by Graphs
I We can describe discrete time and space using graphs that support time or space signals
Description of time with a directed line graph Description of images (space) with a grid graph
0
x0
1
x1
2
x2
3
x3
4
x4
5
x5
6
x6
00
x00
01
x01
02
x02
03
x03
04
x04
05
x05
06
x06
07
x07
10
x10
11
x11
12
x12
13
x13
14
x14
15
x15
16
x16
17
x17
20
x20
21
x21
22
x22
23
x23
24
x24
25
x25
26
x26
27
x27
30
x30
31
x31
32
x32
33
x33
34
x34
35
x35
36
x36
37
x37
I Line graph represents adjacency of points in time. Grid graph represents adjacency of points in space
A. Ribeiro Graph Neural Networks 15
Convolutions in Time and Space
I Use line and grid graphs to write convolutions as polynomials on respective adjacency matrices S
Description of time with a directed line graph Description of images (space) with a grid graph
0
x0
1
x1
2
x2
3
x3
4
x4
5
x5
4321
00
x00
01
x01
02
x02
03
x03
04
x04
05
x05
06
x06
07
x07
10
x10
11
x11
12
x12
13
x13
14
x14
15
x15
16
x16
17
x17
20
x20
21
x21
22
x22
23
x23
24
x24
25
x25
26
x26
27
x27
30
x30
31
x31
32
x32
33
x33
34
x34
35
x35
36
x36
37
x37
24
14
34
23 25
04
13 15
22 26
33 35
03 05
12 16
32 36
21 27
I Filter with coefficients hk ⇒ Output z = h0 S0x + h1 S1x + h2 S2x + h3 S3x + . . . =∞∑k=0
hk Skx
A. Ribeiro Graph Neural Networks 16
Arbitrary Graphs and Arbitrary Graph Signals
I Time and Space are pervasive and important, but still a (very) limited class of signals
I Use graphs as generic descriptors of signal structure with signal values associated to nodes andedges expressing expected similarity between signal components
A signal supported on a graph Another signal supported on another graph
1
2
3
4
5
6
7
8
w12
w24
w25
w13
w23
w34
w46
w47
w35
w56w67
w68
w57
w78
x1
x2
x3
x4
x5
x6
x7
x81
23
4
5 6
7
89
10
11 12
x1
x2x3
x4
x5 x6
x7
x8x9
x10
x11 x12
I Nodes are customers. Signal values are product ratings. Edges are cosine similarities of past scores
A. Ribeiro Graph Neural Networks 17
Arbitrary Graphs and Arbitrary Graph Signals
I Time and Space are pervasive and important, but still a (very) limited class of signals
I Use graphs as generic descriptors of signal structure with signal values associated to nodes andedges expressing expected similarity between signal components
A signal supported on a graph Another signal supported on another graph
1
2
3
4
5
6
7
8
w12
w24
w25
w13
w23
w34
w46
w47
w35
w56w67
w68
w57
w78
x1
x2
x3
x4
x5
x6
x7
x81
23
4
5 6
7
89
10
11 12
x1
x2x3
x4
x5 x6
x7
x8x9
x10
x11 x12
I Nodes are drones. Signal values are velocities. Edges are sensing and communication ranges
A. Ribeiro Graph Neural Networks 17
Arbitrary Graphs and Arbitrary Graph Signals
I Time and Space are pervasive and important, but still a (very) limited class of signals
I Use graphs as generic descriptors of signal structure with signal values associated to nodes andedges expressing expected similarity between signal components
A signal supported on a graph Another signal supported on another graph
1
2
3
4
5
6
7
8
w12
w24
w25
w13
w23
w34
w46
w47
w35
w56w67
w68
w57
w78
x1
x2
x3
x4
x5
x6
x7
x81
23
4
5 6
7
89
10
11 12
x1
x2x3
x4
x5 x6
x7
x8x9
x10
x11 x12
I Nodes are transceivers. Signal values are QoS requirements. Edges are wireless channels strength
A. Ribeiro Graph Neural Networks 17
Convolutions on Graphs
I We’ve already seen that convolutions in time and space are polynomials on adjacency matrices
Description of time with a directed line graph Description of images (space) with a grid graph
0
x0
1
x1
2
x2
3
x3
4
x4
5
x5
4321
00
x00
01
x01
02
x02
03
x03
04
x04
05
x05
06
x06
07
x07
10
x10
11
x11
12
x12
13
x13
14
x14
15
x15
16
x16
17
x17
20
x20
21
x21
22
x22
23
x23
24
x24
25
x25
26
x26
27
x27
30
x30
31
x31
32
x32
33
x33
34
x34
35
x35
36
x36
37
x37
24
14
34
23 25
04
13 15
22 26
33 35
03 05
12 16
32 36
21 27
I Filter with coefficients hk ⇒ Output z = h0 S0x + h1 S1x + h2 S2x + h3 S3x + . . . =∞∑k=0
hk Skx
A. Ribeiro Graph Neural Networks 18
Convolutions on Graphs
I For graph signals we define graph convolutions as polynomials on matrix representations of graphs
A signal supported on a graph Another signal supported on another graph
1
2
3
4
5
6
7
8
w12
w24
w25
w13
w23
w34
w46
w47
w35
w56w67
w68
w57
w78
x1
x2
x3
x4
x5
x6
x7
x81
2
3
4
5
6
7
1
23
4
5 6
7
89
10
11 12
x1
x2x3
x4
x5 x6
x7
x8x9
x10
x11 x12
2
1
3
4
65
10
210
10 10
I Filter with coefficients hk ⇒ Output z = h0 S0x + h1 S1x + h2 S2x + h3 S3x + . . . =∞∑k=0
hk Skx
I Graph convolutions share the locality of conventional convolutions. Recovered as particular case
A. Ribeiro Graph Neural Networks 19
Convolutional Neural Networks and Graph Neural Networks
I CNNs and GNNe are minor variations of linear convolutional filters
⇒ Compose filters with pointwise nonlinearities and compose these compositions into several layers
A. Ribeiro Graph Neural Networks 20
Neural Networks (NNs)
I A neural network composes a cascade of layers
I Each of which are themselves compositions of
linear maps with pointwise nonlinearities
I Does not scale to large dimensional signals x
Layer 1
Layer 2
Layer 3
x
z1 = H1 x x1 = σ[
z1
]z1
z2 = H2 x1 x2 = σ[
z2
]z2
z3 = H3, x2 x3 = σ[
z3
]z3
x1
x1
x2
x2
x3 = Φ(x;H)
A. Ribeiro Graph Neural Networks 21
Convolutional Neural Networks (CNNs)
I A convolutional NN composes a cascade of layers
I Each of which are themselves compositions of
convolutions with pointwise nonlinearities
I Scales well. The Deep Learning workhorse
I A CNNs are minor variation of convolutional filters
⇒ Just add nonlinearity and compose
⇒ They scale because convolutions scale
Layer 1
Layer 2
Layer 3
x
z1 = h1 ? x x1 = σ[
z1
]z1
z2 = h2 ? x1 x2 = σ[
z2
]z2
z3 = h3 ? x2 x3 = σ[
z3
]z3
x1
x1
x2
x2
x3 = Φ(x;H)
A. Ribeiro Graph Neural Networks 22
When we Think of Time Signals as Supported on a Line Graph
I Those convolutions are polynomials on the
adjacency matrix of a line graph
0
x0
1
x1
2
x2
3
x3
4
x4
5
x5
I Just another way of writing convolutions and
Just another way of writing CNNs
I But one that lends itself to generalization
Layer 1
Layer 2
Layer 3
x
z1 =
K−1∑k=0
h1k Sk x x1 = σ[
z1
]z1
z2 =
K−1∑k=0
h2k Sk x1 x2 = σ[
z2
]z2
z3 =
K−1∑k=0
h3k Sk x2 x3 = σ[
z3
]z3
x1
x1
x2
x2
x3 = Φ(x;H)
A. Ribeiro Graph Neural Networks 23
Graph Neural Networks (GNNs)
I The graph can be any arbitrary graph
I The polynomial on the matrix representation S
becomes a graph convolutional filter
1
2
3
4
5
6
7
8
w12
w24
w25
w13
w23
w34
w46
w47
w35
w56w67
w68
w57
w78
Layer 1
Layer 2
Layer 3
x
z1 =
K−1∑k=0
h1k Sk x x1 = σ[
z1
]z1
z2 =
K−1∑k=0
h2k Sk x1 x2 = σ[
z2
]z2
z3 =
K−1∑k=0
h3k Sk x2 x3 = σ[
z3
]z3
x1
x1
x2
x2
x3 = Φ(x; S,H)
Gama-Marques-Leus-Ribeiro, Convolutional Neural Network Architectures for Signals Supported on Graphs, TSP 2019, arxiv.org/abs/1805.00165
A. Ribeiro Graph Neural Networks 24
Graph Neural Networks (GNNs)
I A graph NN composes a cascade of layers
I Each of which are themselves compositions of
graph convolutions with pointwise nonlinearities
I A NN with linear maps restricted to convolutions
I Recovers a CNN if S describes a line graph
Layer 1
Layer 2
Layer 3
x
z1 =
K−1∑k=0
h1k Sk x x1 = σ[
z1
]z1
z2 =
K−1∑k=0
h2k Sk x1 x2 = σ[
z2
]z2
z3 =
K−1∑k=0
h3k Sk x2 x3 = σ[
z3
]z3
x1
x1
x2
x2
x3 = Φ(x; S,H)
Gama-Marques-Leus-Ribeiro, Convolutional Neural Network Architectures for Signals Supported on Graphs, TSP 2019, arxiv.org/abs/1805.00165
A. Ribeiro Graph Neural Networks 25
Graph Neural Networks (GNNs)
I There is growing evidence of scalability.
I A GNN is a minor variation of a graph filter
⇒ Just add nonlinearity and compose
I Both are scalable because they leverage the
signal structure codified by the graph
Layer 1
Layer 2
Layer 3
x
z1 =
K−1∑k=0
h1k Sk x x1 = σ[
z1
]z1
z2 =
K−1∑k=0
h2k Sk x1 x2 = σ[
z2
]z2
z3 =
K−1∑k=0
h3k Sk x2 x3 = σ[
z3
]z3
x1
x1
x2
x2
x3 = Φ(x; S,H)
Gama-Marques-Leus-Ribeiro, Convolutional Neural Network Architectures for Signals Supported on Graphs, TSP 2019, arxiv.org/abs/1805.00165
A. Ribeiro Graph Neural Networks 26
The Road Ahead
A. Ribeiro Graph Neural Networks 27
Course Objectives Revisited
We said there were two objectives in this course but there is obviously a third one
Develop the ability to use Graph NeuralNetworks in practical applications
Understand the fundamental propertiesof Graph Neural Networks
Define Graph Neural NetworkArchitectures
A. Ribeiro Graph Neural Networks 28
Lectures
I I told you a lot about architectures today in the form of convolutions. Just to give you a taste.
⇒ Don’t worry if you didn’t understand. Will revisit graph filters and graph neural networks
⇒ We will also study graph recurrent neural networks
I Can’t use blindly ⇒ GNNs have properties that explain why they work. And why they don’t
⇒ Permutation Equivariance. Stability to deformations. Transferability
Ruiz-Gama-Ribeiro, Graph Neural Networks: Architectures, Stability and Transferability, PIEEE 2020, arxiv.org/pdf/2008.01767
A. Ribeiro Graph Neural Networks 29
Labs
I Labs will focus on building practical skills
Lab 1: Statistical and Empirical Risk Minimization. Just a warmup
Lab 2: Recommendation SystemsHuang-Marques-Ribeiro, Rating Prediction via Graph Signal Processing, TSP 2018, DOI: 10.1109/TSP.2018.2864654
Lab 3: Learning Controllers in Distributed Collaborative Intelligent SystemsTolstaya-Gama-Paulos-Pappas-Kumar-Ribeiro, Learning Decentralized Controllers for Robot Swarms with Graph Neural Networks,
arxiv.org/pdf/1903.10527
Lab 4: Learning Resource Allocations in Wireless Communication NetworksEisen-Ribeiro, Optimal Wireless Resource Allocation with Random Edge Graph Neural Networks, arxiv.org/pdf/1909.01865
Lab 5: Multirobot Path PlanningLi-Gama-Ribeiro-Prorok, Graph Neural Networks for Decentralized Multi-Robot Path Planning, arxiv.org/pdf/1912.06095
A. Ribeiro Graph Neural Networks 30
So Long and Thanks for All the Fish
Looking forward toWorking with You!
A. Ribeiro Graph Neural Networks 31