New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously...

17
Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 1 The Weisfeiler-Lehman Kernel Karsten Borgwardt and Nino Shervashidze Machine Learning and Computational Biology Research Group, Max Planck Institute for Biological Cybernetics and Max Planck Institute for Developmental Biology, Tübingen

Transcript of New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously...

Page 1: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 1

The Weisfeiler-Lehman KernelKarsten Borgwardt and Nino Shervashidze

Machine Learning andComputational Biology Research Group,

Max Planck Institute for Biological Cybernetics andMax Planck Institute for Developmental Biology, Tübingen

Page 2: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Graph Kernels Are All About...

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 2

How similar are two graphs?How similar is their structure?How similar are their node labels and edge labels?

G G’

ApplicationsFunction prediction for molecules and proteinsComparison of parts of images in computer visionComparison of semantic structures in NLP

Page 3: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Today’s Talk

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 3

Introduction: What are graph kernels?

Motivation: Why is there a need for scalable graph ker-nels?

Past: Fast computation of random walk graph kernels

Present: Scalable hash-based graph kernels

Future: Scalable interpretable graph kernels

Page 4: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Graph Mining

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 4

Graphs are everywhereBioinformaticsComputer VisionNatural Language Processing

Hot topics in databases/data miningFrequent subgraph miningDense subgraph miningGraph indexing and search

Recent trendsData: Growing size of graphs is achallenge for classic approachesMethods: Machine Learningapproaches to graph mining

Page 5: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Kernels in a Nutshell I

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 5

KernelsKey concept: Move learning task to feature space H.Naive explicit approach:

Map objects x and x′ via mapping φ to H.Measure their similarity in H as 〈φ(x), φ(x′)〉.

Kernel Trick: Compute inner product in H as kernel ininput space k(x, x′) = 〈φ(x), φ(x′)〉.

R2 ⇒ H

Page 6: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Kernels in a Nutshell II

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 6

Graph kernelsKernels on pairs of graphs(not pairs of nodes)Instance of R-Convolution kernels (Haussler, 1999):

Decompose objects x and x′ into substructures.Pairwise comparison of substructures via kernels tocompare x and x′.

A graph kernel makes the whole family of kernel me-thods applicable to graphs.

G G’

Page 7: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Learning with Graph Kernels

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 7

Define a graph kernelLots of attention (decades of research in chemoinforma-tics, early work on graph kernels)

Compute this graph kernelLittle attention on efficiency on large graphs (ICDM 2005, NIPS 2006c,

AISTATS 2009a, NIPS 2009, JMLR 2010)

Plug kernel values into a kernel methodLots of interest in developing new kernel methods

Apply kernel method to an application problemBioinformatics

Page 8: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Complete Graph Kernels

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 8

Complete graph kernels (Gärtner et al., 2003)If the mapping φ is injective, then computing the corre-sponding kernel k is as hard as deciding graph isomor-phism.Wanted: Polynomial-time kernel (non-injective) whichcombines expressivity with efficiency

Walk-based graph kernels (Kashima et al., 2003)Count common walks in two graphs G and G′.The more common walks, the more similar G and G′.Common walk means series of nodes and edges withidentical labels.

Page 9: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Product Graph Kernels

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 9

Elegant computation (Gärtner et al., 2003)Form a direct product graph of G and G′.Count walks in this product graph.Each walk in product graph corresponds to one walk ineach of the two input graphs:

k×(G,G′) =

|V×|∑i,j=1

[

∞∑k=0

λkAk×]ij = e> (1− λA×)−1︸ ︷︷ ︸

n2×n2

e

SetbacksLack of efficiency: scales as O(n6)

Tottering: walks allow to go back and forth between thesame two nodesHalting: short walks are overweighted

Page 10: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Speeding Up Graph Kernels

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 10

Fast product graph kernelCompute product graph kernel via Sylvester Equationsand Kronecker Products in O(n3) (NIPS 2006c).

Shortest path kernelAvoid tottering and halting by a graph kernel based onshortest path distances (ICDM 2005); scales as O(n4).

Graphlet kernelEnumerate or sample unlabeled subgraphs of limitedsize g from each graph (AISTATS 2009); scales asO(ndg−1).

Unresolved problemHow do scale up graph kernels to large, labeled graphs?

Page 11: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Subtree Kernels

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 11

Classic ‘subtree kernel’ (Ramon et al., 2003)Given two graphs G and G′ with n nodes each.Compare all pairs of nodes v and v′ from G and G′

Compare all subsets of their neighbours recursivelyRuntime O(n2 4d h)

4:3:

2: 1:

3: 2: 1:

G'

G

Page 12: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Weisfeiler-Lehman Procedure

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 12

12345

6235

1

2 3

5 4

3

6

2

5

Sort RelabelCompress

G

G'

A new subtree kernelCount common labels after each iterationProcess N graphs simultaneouslyUse a global hash function for compression

Page 13: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Fast Subtree Kernels

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 13

Weisfeiler-Lehman Kernel (NIPS 2009)Algorithm: Perform the following three steps h times:1. Sorting: Represent each node v as a sorted list Lv of

its neighbours (O(m))2. Compression: Compress this list into a hash va-

lue h(Lv) (O(m))3. Relabeling: Relabel v with h(Lv) as its new node label

(O(n))

Analysisper pair of graphs: Runtime O(m h) (versus O(n2 4d h)for the classic kernel)for N graphs: Runtime O(N m h + N 2 n h) (naiveO(N 2 m h))

Page 14: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

WL Kernel: Pairwise vs. Global

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 14

Page 15: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

WL Kernel: Runtime & Accuracy

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 15

MUTAG NCI1 NCI109 D&D

10 sec1 minute

1 hour

1 day

10 days

100 days

1000 days

50 %

55 %

60 %

65 %

70 %

75 %

80 %

85 %

WLRG3 GraphletRWSP

graph size

Page 16: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Outlook: Feature Selection

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 16

InterpretabilityWhy are two graphs similar? → feature selectionamong graphs, that is selecting discriminative subgra-phsTheir number may grow exponentially with graph size.In general, selecting an optimal group of k features re-quires a runtime which is exponential in k.

RoadmapOur subtree kernels operate in a feature space of sizeN n h — not exponential in n and hence more feasibleto search.Alternatively, explore theoretically-founded strategiesfor finding discriminative subgraphs of arbitrary shape(Thoma et al., SDM 2009; Thoma et al., SADM 2010).

Page 17: New The Weisfeiler-Lehman Kernel - ETH Zürich · 2014. 10. 29. · Process Ngraphs simultaneously Use a global hash function for compression. Fast Subtree Kernels ... Computational

Summary

Karsten Borgwardt and Oliver Stegle: Computational Approaches for Analysing Complex Biological Systems, Page 17

Today’s talkMotivation

Graph kernels allow to apply kernel methods to graphdata.Efficient graph kernels are the key to exploit this inpractice.

Past: Speed-up of random walk kernels to O(n3)

Present: Hash-based subtree kernels in O(m h)

Future: Feature selection on graphs using kernels