
SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS

Thomas N. Kipf, Max Welling, ICLR 2017

Presented by Devansh Shah

Semi-Supervised Learning

Goal: Learn a better prediction rule than one based on labeled data alone

Why bother?

• Unlabeled data is cheap

• Labeled data can be hard to get

  • Human annotation is boring

  • Labels may require experts

Can unlabeled data help?

• Assume each class forms a coherent group (e.g., a Gaussian)

• The decision boundary shifts when the unlabeled data is taken into account

Can unlabeled data help?

“Similar” data points have “similar” labels
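One classic way to act on this assumption is iterative label propagation (the LP baseline listed under Experiments). The sketch below is a minimal NumPy version under stated assumptions: a symmetric non-negative affinity matrix `W` in which every node has at least one neighbor, hard clamping of the known labels, and a fixed iteration count. The function name and its parameters are illustrative, not taken from either paper.

```python
import numpy as np

def label_propagation(W, y, labeled_mask, num_classes, num_iters=100):
    """Spread known labels over a graph (illustrative sketch, not a reference implementation).

    W: (N, N) symmetric non-negative affinity matrix; every node needs at least one neighbor.
    y: (N,) integer labels; only entries where labeled_mask is True are read.
    labeled_mask: (N,) boolean array marking the labeled nodes.
    """
    P = W / W.sum(axis=1, keepdims=True)        # row-normalized transition matrix D^-1 W
    Y0 = np.zeros((W.shape[0], num_classes))
    Y0[labeled_mask, y[labeled_mask]] = 1.0     # one-hot scores for the labeled nodes
    F = Y0.copy()
    for _ in range(num_iters):
        F = P @ F                               # each node takes the average of its neighbors' scores
        F[labeled_mask] = Y0[labeled_mask]      # clamp: known labels never change
    return F.argmax(axis=1)

# Path graph 0-1-2-3 with node 0 labeled class 0 and node 3 labeled class 1
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([0, -1, -1, 1])                    # -1 marks "unknown"; never read
mask = np.array([True, False, False, True])
pred = label_propagation(W, y, mask, num_classes=2)
```

Each unlabeled node ends up with the label of the labeled node it is closer to, so `pred` is `[0, 0, 1, 1]`.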

Semi-supervised vs transductive learning

• Labeled data $(X_l, Y_l) = \{(x_{1:l}, y_{1:l})\}$

• Unlabeled data $X_u = \{x_{l+1:n}\}$, available during training

• Test data $X_{test} = \{x_{n+1:}\}$, not available during training

Inductive learning is ultimately applied to the test data.

Transductive learning is only concerned with the unlabeled data.

Graph Convolutional Networks

Applications

• Social Networks

• Protein-Protein Interaction

• 3D Meshes

• Clustering

• Scene Graphs

Graph Learning Problem

Inputs:

• Graph $G = (V, E)$

• A feature description $x_i$ for every node $i$, summarized in an $N \times D$ feature matrix $X$ ($N$: number of nodes, $D$: number of input features)

• Adjacency matrix $A$

Outputs:

• Node-level output $Z$ (an $N \times F$ feature matrix, where $F$ is the number of output features per node)

Understanding Graph Neural Networks

Every neural network layer can be written as a non-linear function

$H^{(l+1)} = f(H^{(l)}, A)$, with

• $H^{(0)} = X$

• $H^{(L)} = Z$, where $L$ is the number of layers

A simple choice is $f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$, where

• $W^{(l)}$ is the weight matrix of the $l$-th layer

• $\sigma(\cdot)$ is a non-linear activation function such as ReLU
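As a rough illustration of the rule $f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$ with ReLU as $\sigma$, here is a NumPy sketch on a hypothetical three-node path graph. The function name and the toy inputs are made up for illustration.

```python
import numpy as np

def simple_gnn_layer(H, A, W):
    """f(H, A) = ReLU(A H W).

    H: (N, D_in) node features, A: (N, N) adjacency matrix, W: (D_in, D_out) weight matrix.
    """
    return np.maximum(A @ H @ W, 0.0)   # ReLU, applied element-wise

# Hypothetical path graph 0-1-2, one-hot input features, identity weights
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.eye(3)            # H^(0) = X, so N = D = 3
W = np.eye(3)
H1 = simple_gnn_layer(X, A, W)
```

With one-hot features, row $i$ of `H1` records which nodes contributed to node $i$. Here `H1` equals `A`, so `H1[0]` is `[0, 1, 0]`: node 0 receives only its neighbor's features and never its own, which is exactly Limitation I.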

Limitation I:

• Multiplication with $A$ means that, for every node, we sum up the feature vectors of all neighboring nodes but not the node itself

• Fix: enforce self-loops in the graph by adding the identity matrix to $A$
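A minimal NumPy check of the self-loop fix, on a hypothetical path graph with one scalar feature per node: multiplying by $A$ sums only the neighbors' features, while $A + I$ also includes each node's own feature.

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # path graph 0-1-2
X = np.array([[1.0], [2.0], [3.0]])      # one scalar feature per node
A_self = A + np.eye(A.shape[0])          # add a self-loop to every node

neighbors_only = (A @ X).ravel()         # [2., 4., 2.]
with_self = (A_self @ X).ravel()         # [3., 6., 5.]
```

Node 1, which has the most neighbors, also gets the largest sum, which previews the scaling problem described in Limitation II.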

Limitation II:

• $A$ is typically not normalized, so multiplying by $A$ completely changes the scale of the feature vectors

• Fix: normalize $A$ so that all rows sum to one, i.e. $D^{-1}A$, where $D$ is the diagonal node degree matrix. Multiplying by $D^{-1}A$ then corresponds to taking the average of the neighboring node features
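The effect of $D^{-1}A$ can be seen on a hypothetical star graph. This sketch assumes every node has at least one neighbor, since an isolated node would have zero degree and make $D^{-1}$ undefined.

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)   # star graph: node 0 is linked to nodes 1 and 2
X = np.array([[0.0], [2.0], [4.0]])      # one scalar feature per node

deg = A.sum(axis=1)                      # diagonal of D (node degrees)
A_rw = A / deg[:, None]                  # D^{-1} A: divide each row by its degree

summed = (A @ X).ravel()                 # [6., 0., 0.]  scale grows with degree
averaged = (A_rw @ X).ravel()            # [3., 0., 0.]  mean of neighbor features
```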

Propagation Rule: $f(H^{(l)}, A) = \sigma(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)})$

• $\hat{A} = A + I$, where $I$ is the identity matrix

• $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$
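Putting the pieces together, here is a minimal NumPy sketch of this propagation rule: add self-loops, normalize symmetrically with the degree matrix of $A + I$, then apply $\sigma(\cdot)$. The two-layer forward pass, the hidden width, the random weights, and the function names are illustrative assumptions; a real model would learn the weights by minimizing the loss on the labeled nodes.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D_hat^{-1/2} A_hat D_hat^{-1/2} with A_hat = A + I."""
    A_hat = A + np.eye(A.shape[0])             # self-loops (Limitation I)
    d_hat = A_hat.sum(axis=1)                  # degrees of A_hat; always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(d_hat)
    # Entry (i, j) becomes A_hat[i, j] / sqrt(d_i * d_j) (Limitation II)
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_layer(H, A_norm, W, activation=lambda x: np.maximum(x, 0.0)):
    """f(H, A) = sigma(A_norm H W); ReLU by default."""
    return activation(A_norm @ H @ W)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # star-like graph with hub node 1
X = rng.normal(size=(4, 5))                 # N = 4 nodes, D = 5 input features
W0 = rng.normal(size=(5, 8))                # hypothetical hidden width of 8
W1 = rng.normal(size=(8, 3))                # F = 3 output features

A_norm = normalize_adjacency(A)             # computed once, reused by every layer
H1 = gcn_layer(X, A_norm, W0)               # H^(1), ReLU
H2 = gcn_layer(H1, A_norm, W1, activation=lambda x: x)  # H^(2): last layer left linear here
```

Precomputing `A_norm` once is the natural design choice, since it does not depend on the layer. The last layer is left linear so that a softmax can be applied on top of its output, as on the loss slide.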

Semi-Supervised Node Classification

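Cross-entropy error over all labeled examples:

$Z = \mathrm{softmax}(H^{(L)})$

$\mathcal{L} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$

• $H^{(L)}$ is the output of the last layer

• $\mathcal{Y}_L$ is the set of node indices that have labels

• $F$ is the number of distinct output classes

• $Y_{lf}$ is 1 if node $l$ has label $f$ and 0 otherwise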
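A minimal NumPy sketch of this loss, assuming integer class labels instead of an explicit one-hot matrix $Y$, so the inner sum over $f$ reduces to picking one log-probability per labeled node. Function and argument names are illustrative.

```python
import numpy as np

def softmax(H):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    E = np.exp(H - H.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def masked_cross_entropy(H_L, labels, labeled_idx):
    """-sum_{l in Y_L} sum_f Y_lf ln Z_lf with Z = softmax(H_L).

    labels: (N,) integer class per node (entries outside labeled_idx are ignored).
    labeled_idx: integer indices of the labeled nodes, i.e. the set Y_L.
    """
    Z = softmax(H_L)
    # Y_lf is one-hot, so only the true class f = labels[l] contributes for each labeled node l
    return -np.sum(np.log(Z[labeled_idx, labels[labeled_idx]]))

# Uniform predictions over 2 classes; only nodes 0 and 2 are labeled
H_L = np.zeros((4, 2))
labels = np.array([0, -1, 1, -1])            # -1 marks unlabeled nodes; never read
loss = masked_cross_entropy(H_L, labels, np.array([0, 2]))   # 2 * ln 2
```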

Experiments

Datasets

Experiments

Baselines

• Label Propagation (LP)

• Semi-supervised embedding (SemiEmb)

• Manifold regularization (ManiReg)

• Skip-gram-based graph embeddings (DeepWalk)

• Iterative classification algorithm (ICA)

Experiments

Results

Robust Graph Convolutional Networks Against Adversarial Attacks

Dingyuan Zhu, Ziwei Zhang, Peng Cui, Wenwu Zhu, ACM SIGKDD 2019

Presented by Devansh Shah

Adversarial Attacks on Graphs

RELATED WORK

• Adversarial Attack on Graph Structured Data

• Adversarial Attacks on Neural Networks for Graph Data

Graph adversarial attack

Transductive Node Classification Setting

• A single graph $G_0 = (V_0, E_0)$ is used for the entire dataset

• A target node $c_i \in V_i$ of graph $G_i$ is associated with a corresponding node label $y_i \in \mathcal{Y}$

• Test nodes (but not their labels) are also observed during training

• $\mathcal{D}^{(tra)} = \{(G_0, c_i, y_i)\}_{i=1}^{N}$

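Problem Definition

Given:

• A learned classifier $f$

• An instance from the dataset $(G, c, y) \in \mathcal{D}$

The graph adversarial attacker $g(\cdot, \cdot): \mathcal{G} \times \mathcal{D} \to \mathcal{G}$ modifies the graph $G = (V, E)$ into $\tilde{G} = (\tilde{V}, \tilde{E})$ by solving:

$\max_{\tilde{G}} \; \mathbb{1}\big(f(\tilde{G}, c) \neq y\big)$

s.t. $\tilde{G} = g\big(f, (G, c, y)\big)$

$\mathrm{Eq}(G, \tilde{G}, c) = 1$

Here $\mathrm{Eq}(\cdot, \cdot, \cdot): \mathcal{G} \times \mathcal{G} \times V \to \{0, 1\}$ is an equivalence indicator that tells whether two graphs $G$ and $\tilde{G}$ are semantically equivalent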
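The definition leaves $\mathrm{Eq}$ abstract. As one hypothetical instantiation, the sketch below treats two graphs as equivalent when at most `budget` undirected edges differ, and scores a candidate perturbation with the indicator $\mathbb{1}(f(\tilde{G}, c) \neq y)$. The budget rule, the function names, and the toy classifier are assumptions for illustration, not the method of either paper.

```python
import numpy as np

def eq_within_budget(A, A_tilde, budget):
    """Hypothetical Eq(G, G~, c): equivalent iff at most `budget` undirected edges were flipped."""
    flipped = np.triu(A != A_tilde, k=1).sum()   # count each undirected edge once
    return int(flipped <= budget)

def attack_succeeds(f, A, A_tilde, X, c, y, budget):
    """1(f(G~, c) != y) for a feasible perturbation; infeasible ones score 0."""
    if not eq_within_budget(A, A_tilde, budget):
        return 0
    return int(f(A_tilde, X, c) != y)

def f(A, X, c):
    """Toy classifier (hypothetical): predict 1 if node c's summed neighbor scores are positive."""
    return int(A[c] @ X > 0)

A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)
X = np.array([0.0, 1.0, -2.0])                   # one scalar score per node
A_tilde = A.copy()
A_tilde[0, 2] = A_tilde[2, 0] = 1.0              # attacker inserts edge (0, 2)

clean_pred = f(A, X, 0)                                       # 1, which is the true label y
success_1 = attack_succeeds(f, A, A_tilde, X, 0, 1, budget=1)  # 1: one flip flips the prediction
success_0 = attack_succeeds(f, A, A_tilde, X, 0, 1, budget=0)  # 0: the flip exceeds the budget
```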

Robust Graph Convolutional Network (RGCN)

Crux of the paper

• Instead of representing nodes as vectors, they are represented as Gaussian distributions in each convolutional layer

• When the graph is attacked, the model can automatically absorb the effects of adversarial changes in the variances of the Gaussian distributions

• To remedy the propagation of adversarial attacks in GCNs, a variance-based attention mechanism is used when performing convolutions

Gaussian-based Graph Convolution Layer

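Latent representation of node $v_i$ in layer $l$:

$h_i^{(l)} = \mathcal{N}(\mu_i^{(l)}, \mathrm{diag}(\sigma_i^{(l)}))$

$\mu_i^{(l)} \in \mathbb{R}^{f_l}$ is the mean vector

$\mathrm{diag}(\sigma_i^{(l)}) \in \mathbb{R}^{f_l \times f_l}$ is the diagonal variance matrix

Notation:

$M^{(l)} = [\mu_1^{(l)}, \ldots, \mu_N^{(l)}] \in \mathbb{R}^{N \times f_l}$ is the mean matrix

$\mathrm{Cov}^{(l)} = [\sigma_1^{(l)}, \ldots, \sigma_N^{(l)}] \in \mathbb{R}^{N \times f_l}$ is the variance matrix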

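Theorem: If $x_i \sim \mathcal{N}(\mu_i, \mathrm{diag}(\sigma_i))$, $i = 1, \ldots, n$, and they are independent, then for any fixed weights $w_i$ we have:

$\sum_{i=1}^{n} w_i x_i \sim \mathcal{N}\Big(\sum_{i=1}^{n} w_i \mu_i, \; \mathrm{diag}\big(\sum_{i=1}^{n} w_i^2 \sigma_i\big)\Big)$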
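A quick Monte Carlo check of the theorem, with arbitrary illustrative means, variances, and weights. Note that $\sigma_i$ holds variances, so the standard-normal samples are scaled by $\sqrt{\sigma_i}$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([[0.0, 1.0], [2.0, -1.0], [1.0, 0.5]])     # mu_i for n = 3 variables in 2 dimensions
sigma = np.array([[1.0, 0.5], [0.2, 2.0], [0.3, 0.3]])   # diagonal variances sigma_i
w = np.array([0.5, -1.0, 2.0])                            # fixed weights w_i

mean_theory = w @ mu                 # sum_i w_i mu_i
var_theory = (w ** 2) @ sigma        # sum_i w_i^2 sigma_i

# Independent draws x_i ~ N(mu_i, diag(sigma_i)), shape (samples, n, dims)
samples = mu + np.sqrt(sigma) * rng.standard_normal((200_000, 3, 2))
weighted = np.einsum("i,sid->sd", w, samples)   # sum_i w_i x_i for every sample

empirical_mean = weighted.mean(axis=0)
empirical_var = weighted.var(axis=0)
```

Here `mean_theory` is `[0.0, 2.5]` and `var_theory` is `[1.65, 3.325]`; the empirical estimates should agree with them up to sampling error.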

RGCN Node Aggregation

To prevent the propagation of adversarial attacks in GCNs, we propose an attention mechanism that assigns different weights to neighbors based on their variances, since larger variances indicate more uncertainty in the latent representations and a larger probability of having been attacked:

$\alpha_j^{(l)} = \exp(-\gamma \sigma_j^{(l)})$ (applied element-wise)

Here $\alpha_j^{(l)}$ are the attention weights of node $v_j$ in layer $l$, and $\gamma$ is a hyper-parameter
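A small sketch of the attention weights on hypothetical variance vectors, with an arbitrary choice of $\gamma = 1$: the higher-variance node receives smaller weights and is therefore down-weighted during aggregation.

```python
import numpy as np

def variance_attention(Cov, gamma):
    """alpha_j = exp(-gamma * sigma_j), element-wise over each node's variance vector."""
    return np.exp(-gamma * Cov)

Cov = np.array([[0.1, 0.2],    # node with low variance (confident representation)
                [2.0, 3.0]])   # node with high variance (possibly attacked)
alpha = variance_attention(Cov, gamma=1.0)
```

Here `alpha[0]` is about `[0.90, 0.82]` while `alpha[1]` is about `[0.14, 0.05]`; a larger $\gamma$ penalizes high variance more sharply.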

RGCN Node Aggregation

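$\mu_i^{(l+1)} = \mathrm{ReLU}\Big(\sum_{j \in ne(i)} \frac{1}{\sqrt{D_{i,i} D_{j,j}}} \big(\mu_j^{(l)} \odot \alpha_j^{(l)}\big) W_\mu^{(l)}\Big)$

$\sigma_i^{(l+1)} = \mathrm{ReLU}\Big(\sum_{j \in ne(i)} \frac{1}{D_{i,i} D_{j,j}} \big(\sigma_j^{(l)} \odot \alpha_j^{(l)} \odot \alpha_j^{(l)}\big) W_\sigma^{(l)}\Big)$

where $\odot$ denotes element-wise multiplication and $ne(i)$ is the set of neighbors of node $v_i$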
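A dense NumPy sketch of one such layer, written in matrix form (the sum over $j \in ne(i)$ becomes a product with a normalized adjacency matrix). Assumptions: $D_{i,i}$ is taken as the row sum of the adjacency matrix passed in, every node has at least one neighbor, and whether $ne(i)$ includes $i$ itself is left to the caller (pass $A + I$ to include self-loops). Names and toy inputs are illustrative.

```python
import numpy as np

def rgcn_layer(M, Cov, A, W_mu, W_sigma, gamma=1.0):
    """One Gaussian-based graph convolution with variance-based attention (illustrative sketch).

    M, Cov: (N, f_l) mean and variance matrices of layer l.
    A: (N, N) adjacency; its nonzero pattern defines ne(i).
    W_mu, W_sigma: (f_l, f_{l+1}) weights of the mean and variance branches.
    """
    deg = A.sum(axis=1)                            # D_{i,i}
    alpha = np.exp(-gamma * Cov)                   # attention alpha_j^(l), shape (N, f_l)
    outer = np.outer(deg, deg)                     # D_{i,i} * D_{j,j} for every pair (i, j)
    norm_mu = A / np.sqrt(outer)                   # 1 / sqrt(D_ii D_jj) on edges, 0 elsewhere
    norm_sigma = A / outer                         # 1 / (D_ii D_jj): square of the mean coefficient
    M_next = np.maximum(norm_mu @ (M * alpha) @ W_mu, 0.0)
    Cov_next = np.maximum(norm_sigma @ (Cov * alpha * alpha) @ W_sigma, 0.0)
    return M_next, Cov_next

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_self = A + np.eye(3)                             # include each node in its own neighborhood
M0 = rng.normal(size=(3, 4))
Cov0 = rng.uniform(0.1, 1.0, size=(3, 4))          # variances must be positive
W_mu = rng.normal(size=(4, 2))
W_sigma = rng.uniform(size=(4, 2))                 # hypothetical non-negative variance weights
M1, Cov1 = rgcn_layer(M0, Cov0, A_self, W_mu, W_sigma)
```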

Loss Functions

Considering that the hidden representations of our method are Gaussian distributions, we first adopt a sampling process in the last hidden layer:

$z_i \sim \mathcal{N}(\mu_i^{(L)}, \mathrm{diag}(\sigma_i^{(L)}))$

Next, $z_i$ is passed to a softmax function to get the predicted labels:

$\hat{Y} = \mathrm{softmax}(Z), \quad Z = [z_1, \ldots, z_N]$

$\mathcal{L}_{cls}$ is the cross-entropy loss between the actual labels and the predicted probabilities for the labeled nodes
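A minimal sketch of the sampling step followed by the softmax and $\mathcal{L}_{cls}$. Drawing $z_i$ as $\mu_i + \sqrt{\sigma_i} \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ (the reparameterization trick) is one standard way to keep the sample differentiable; treat its use here, the summed (rather than averaged) loss, and the function names as assumptions.

```python
import numpy as np

def sample_predictions(M_L, Cov_L, rng):
    """Draw z_i ~ N(mu_i^(L), diag(sigma_i^(L))) for every node and return softmax(Z)."""
    eps = rng.standard_normal(M_L.shape)
    Z = M_L + np.sqrt(Cov_L) * eps                  # Cov_L holds variances, so scale by std devs
    E = np.exp(Z - Z.max(axis=1, keepdims=True))    # row-wise softmax, stabilized
    return E / E.sum(axis=1, keepdims=True)

def classification_loss(Y_hat, labels, labeled_idx):
    """L_cls: cross-entropy summed over the labeled nodes only."""
    return -np.sum(np.log(Y_hat[labeled_idx, labels[labeled_idx]]))

rng = np.random.default_rng(0)
M_L = np.array([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
Cov_L = np.full((3, 2), 0.5)
Y_hat = sample_predictions(M_L, Cov_L, rng)
loss = classification_loss(Y_hat, np.array([0, 1, -1]), np.array([0, 1]))  # node 2 is unlabeled
```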

Loss Functions

To ensure that the learned representations are indeed Gaussian distributions, we use an explicit regularization to constrain the latent representations in the first layer as follows:

$\mathcal{L}_{reg1} = \sum_{i=1}^{N} \mathrm{KL}\big(\mathcal{N}(\mu_i, \mathrm{diag}(\sigma_i)) \,\|\, \mathcal{N}(0, I)\big)$

where $\mathrm{KL}(\cdot \| \cdot)$ is the KL-divergence between two distributions
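For a diagonal Gaussian against $\mathcal{N}(0, I)$ the KL-divergence has a closed form, $\frac{1}{2}\sum_d (\sigma_d + \mu_d^2 - 1 - \ln \sigma_d)$ with $\sigma_d$ the variances, so $\mathcal{L}_{reg1}$ can be computed without sampling. A minimal NumPy sketch, with an illustrative function name and assuming strictly positive variances:

```python
import numpy as np

def kl_to_standard_normal(M, Cov):
    """sum_i KL(N(mu_i, diag(sigma_i)) || N(0, I)) for rows mu_i of M and variances sigma_i of Cov."""
    # Closed form per node and dimension: 0.5 * (sigma + mu^2 - 1 - ln sigma)
    return 0.5 * np.sum(Cov + M ** 2 - 1.0 - np.log(Cov))

kl_zero = kl_to_standard_normal(np.zeros((2, 3)), np.ones((2, 3)))     # 0.0: already standard normal
kl_shift = kl_to_standard_normal(np.array([[1.0]]), np.array([[1.0]]))  # 0.5: mean shifted by 1
```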

We also impose L2 regularization on the parameters of the first layer as follows:

$\mathcal{L}_{reg2} = \|W_\mu^{(0)}\|_2^2 + \|W_\sigma^{(0)}\|_2^2$

Loss Functions

$\mathcal{L} = \mathcal{L}_{cls} + \beta_1 \mathcal{L}_{reg1} + \beta_2 \mathcal{L}_{reg2}$

where $\beta_1$ and $\beta_2$ are hyper-parameters that control the impact of the two regularization terms
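A tiny sketch of how the terms combine. It reads $\|W\|_2^2$ on a weight matrix as the sum of its squared entries (the usual weight-decay convention; an assumption here), and the $\beta$ values in the example are arbitrary.

```python
import numpy as np

def total_loss(L_cls, L_reg1, W_mu0, W_sigma0, beta1, beta2):
    """L = L_cls + beta1 * L_reg1 + beta2 * L_reg2, with L2 regularization on first-layer weights."""
    L_reg2 = np.sum(W_mu0 ** 2) + np.sum(W_sigma0 ** 2)   # squared norms of W_mu^(0) and W_sigma^(0)
    return L_cls + beta1 * L_reg1 + beta2 * L_reg2

# Example: L_cls = 1.0, L_reg1 = 2.0, two 2x2 all-ones weight matrices (so L_reg2 = 8.0)
L = total_loss(1.0, 2.0, np.ones((2, 2)), np.ones((2, 2)), beta1=0.5, beta2=0.25)   # 4.0
```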

Results

Node Classification on Clean Datasets

RGCN slightly outperforms the baseline methods on Pubmed, while having comparable performance on Cora and Citeseer

Results

Against Non-targeted Adversarial Attacks

Results

Against Targeted Adversarial Attacks

Thank You!
